EthioCoders
Centre: Mean, median and mode
Four main aspects to analyse data: centre, spread, shape and outliers
The row in spread sheet is commonly known as instance or individual.
spread: range, interquartile range, standard deviation and variance
Histogram is mostly and commonly used for data spread
The five number summaries: Consists of five values: 1. Minimum, 2. Q1, 25% data
fall below; 3.Q2. 50% data fall below; 4. Q3. 75% of data fall below and 5.
Maximum, the largest value in the data set.
The interquartile range = Q3-Q1
Q2 is median
Box plot is used to compare quartiles, minimum and maximum
The box itself represents IQR
Standard deviation is the average distance of each observation from the mean
To create column B for calculation of variance in spread sheet for 10 data set, =A1-
AVERAGE(A$1:A$10), then for variance we create column C using formula =B1^2 in
cell C1, and in Cell C11, type =SUM(C1:C10). In Cell C12, =C11/10 is therefore
variance of the data set. For standard deviation, in cell C12, =sqrt(C12)
Advantages of single number (standard deviation):it simplifies amount of information,
useful for inferential statistics
normal: mean = median; right skewed: mean>median; left skewed: mean<median
when we consider outliers:,
1. We focus on noting they exist and impact on summary statistics
2. If typo-remove or fix
3. Understand why they exist, and the impact on questions we are trying to answer about
our data
4. The 5 number summary are important than mean and standard deviation when we have
outliers
5. be careful in reporting
To check whether your data is normally distributed, use normal quintile plot and
statistical methods like Kolmogorov-Smirnov test.
-Parameter: numerical summary about population
-statistic: numeric summary about sample
Limitations of spread sheets:
-lots of users and large datasets; variance on sheet size and how many there can be
between programs
-In excel, for help menu ‘press-F1
- to customize the quick access toolbar with file > Options> Quick Access Toolbar.
- the data menu has items such as sorting and filtering data
-Undo: Crrl+Z and Redo: Ctrl+Y
- the Range is the group of cells: =SUM(B2:C7) for example
- to quickly toggle from the relative to absolute address, use F4
- Adding dollar sign to an address makes it absolute based on where dollar signs are
placed. A dollar sign in front of the letter fixes a column, while the dollar sign in front of
the number fixes the row; if both are prefixed with a dollar sign, it is fixed to the exact
cell.
- to save data for the first time, save in SAVE AS, then save continuously in the same file.
- the excel format files are *.xlsx, *.xls and *.csv
-Excel formulas: converts a text string to upper case letters: UPPER(text); returns the
logical value TRUE: TRUE(); returns the number of characters in a text string:
LEN(text); estimates standard deviation based on sample (ignores logical values and
text): STDEV(n1, n2, …); adds all the numbers in a range of cells: SUM(n1, n2, …);
removes all spaces from a text string except single spaces between words: TRIM(text)
- to substitute will not change the original cell, instead showing the transformed data in
the new cell. SUBSTITUTE({text}, {old_text}, {new_text}), where {new_text} is the
cell to change, {old_text} is the string sequence to be replaced, and {new_text} is the new
string in place of the old one.
-To extract word from the middle: MID(text, start_num, num_chars)
text: text string containing the characters you want to extract
start_num: position of the first character you want to extract in the text
num_chars: the number of characters you want MID to return from the text
-For FIND function: FIND(find_text, within_text, [start_num])
find_text: the text you want to find
within_text: the text containing the text you want to find
start_num: OPTIONAL character position at which to start the search
- CONCATENATE: will join together two or more strings.
- TRIM: will help to remove excess white space from a string
- PROPER: sets the first letter of each word to upper case, with the rest lower case
- UPPER: sets all letters to upper case
- LOWER: sets all letters to lower case
Data Analysing in Spread Sheet
*aggregation functions: SUM, AVERAGE, MAX, MIN, MEDIAN, STDEV, COUNT
* Logical functions IF, AND, OR, NOT: =IF(condition, value if TRUE, [value if
FALSE]); for AND: IF(AND({condition1}, {condition2}), …)
*Conditional aggregation functions COUNTIF and SUMIF
*Pivot tables
*Named ranges
- The operators: <> for not equal, >= greater than or equal to, <= less than or equal to, =
equal to
Example: If the maximum number of any one fruit exceeds 10 pieces in an order, or the
total order exceeds 20 pieces (all fruit) then mark it with ‘‘Special order’’. Otherwise,
mark it with ‘‘NO’’ is given as =IF(OR(Max(B2:D2)>10, E2>20), ‘‘Special order’’,
‘‘NO’’)
- Lookup functions: uses a keyword and index to ‘’look up’’ a value in a table. There
are both horizontal and vertical lookup functions, although we will focus on a vertical
one called VLOOKUP
-Data Visualization: Pie chart, bar charts, scatter and line pilots, histograms, box plots
A pie chart illustrate the proportionality
To draw pie chart: select categories at the top of the tabe> hold down the control key and
select the bottom row with your mouse>copy highlighted data to another location>paste
using the transpose feature so it creates columns instead of rows > select and choose
‘’insert pie chart’’ as before.
- Bar chart is used for comparing values
- Pie chart is used to sow proportionality but used for categorical data
- Line chart uses for numerical values
- To add trend line in scatter: scatter chart> design>add chart element>trendline>linear.
5. Business Metrics
-Marketing: sales: leads, bookings, deals, margins; customers: CAC, CPA, Life time
value, Active users, chum; web traffic: CTR, CPC, CPL, Stickiness
-Growth:
-Sale
-Finance
Click Though Rate (CTR) = (Cicks/Impressions)*100
Cost Per Click (CPC) = Cost of advertisement/number of views
Cost Per Lead = cost of advertising/total cost
Customer Acquisition Cost (CAC) = (total marketing expenses + total marketing and
sales salaries)/number of customers converted to paid customers
-conversion rates based on impression = numbers/arrived on time numbers
-conversion rates based on each level = current level numbers/ prior level numbers
-total sales and marketing cost = (marketing cost + sales and marketing salaries +
overhead costs for sales and marketing)/number of leads
-Life Time Value (LTV)= Average sale*number of repeated sales *expected
retention*profit margin
-Profit Margin = (average sale-average cost of sale)/average sale