
Sampling & Statistics Guide

The document discusses different types of sampling methods including probability and non-probability sampling. It also covers descriptive statistics such as measures of central tendency, correlation analysis, and percentage and ranking. Descriptive statistics are used to describe data while inferential statistics are used to make predictions from samples.

Uploaded by Anacleto Estoque
Copyright: © All Rights Reserved

General Types of Sampling

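1. Non-Probability Sampling- elements do not have an equal chance of being selected.
a. Accidental Sampling- subjects are those who happen to be met by chance.
b. Quota Sampling- subjects are drawn from many sectors, but without proportional representation.
c. Convenience Sampling- subjects are picked in the most convenient and fastest way, for example to get immediate reactions to a hot and controversial issue.
d. Purposive sampling- subjects are chosen on the basis of their knowledge of the information desired. (Because the researcher selects the subjects, this is a non-probability method.)
2. Probability Sampling- every element has an equal chance of being selected.
a. Pure random sampling- every element in the sample frame has an equal chance of being chosen.
b. Systematic random sampling- subjects are arranged in some systematic or logical order, and every k-th subject is taken.
c. Stratified random sampling- the population is segmented into differentiated groups (strata), and a random sample is drawn from each group.
d. Cluster sampling- used when the total population covers a big geographical area; whole groups (clusters) are selected at random.
e. Multistage sampling- used when subjects are scattered all over a big geographical area; sampling is done in successive stages.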
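The probability methods above can be sketched in a few lines of Python. This is only an illustration, assuming the sample frame is a plain list; the function names and the example frame are hypothetical and do not come from the notes.

```python
import random


def simple_random_sample(frame, size, rng):
    """Pure random sampling: every element has an equal chance."""
    return rng.sample(frame, size)


def systematic_sample(frame, size, rng):
    """Systematic sampling: random start, then every k-th element."""
    k = len(frame) // size          # sampling interval
    start = rng.randrange(k)        # random starting position in [0, k)
    return [frame[start + i * k] for i in range(size)]


def stratified_sample(strata, fraction, rng):
    """Stratified sampling: draw the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        take = round(len(members) * fraction)
        sample.extend(rng.sample(members, take))
    return sample


rng = random.Random(0)              # fixed seed so the run is repeatable
frame = list(range(100))            # hypothetical frame of 100 subject IDs
strata = {"group_a": frame[:60], "group_b": frame[60:]}

pure = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 0.10, rng)
```

With a 10% fraction, the stratified sample takes 6 subjects from the group of 60 and 4 from the group of 40, so each group keeps its share of the sample.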
Slovin’s Formula: $n = \frac{N}{1 + N e^{2}}$
In which:
n = size of the sample
N = size of the population
e = margin of error, written as a decimal (for example, 0.05 for 5%)
*The margin of error should not be higher than 5% (3% is ideal).
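As a quick check of Slovin's formula, here is a small Python sketch. It assumes the usual form of the formula, sample size = population / (1 + population × margin²), and rounds up to a whole respondent; the function name and the population of 1,000 are made up for illustration.

```python
import math


def slovin_sample_size(population, margin_of_error):
    """Sample size from Slovin's formula, rounded up to a whole subject."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Hypothetical population of 1,000 at a 5% and at a 3% margin of error.
size_at_5_percent = slovin_sample_size(1000, 0.05)
size_at_3_percent = slovin_sample_size(1000, 0.03)
```

A smaller margin of error calls for a larger sample, which is why the 3% case needs more respondents than the 5% case.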
TYPES OF DATA USED

QUALITATIVE DATA
- Non-numerical data
- Activities, attitudes, behavior
- In-depth interviews
- Artefacts
- Relationships between identified and emerging themes

QUANTITATIVE DATA
- Consists of numbers and statistics
- Testing hypotheses
- Misses contextual detail
- Graphs and charts
- Seeking patterns and relationships
- Comparing means and correlations
Unit 5 Basic Statistics
The word “statistics” comes from the Italian word “statista”, which means statesman. It refers to the theory and method of collecting, tabulating, and interpreting numerical data.

Types:

1. Descriptive Statistics- methods for describing the data; it deals with the whole population.
2. Inferential Statistics- methods that use sample results to make decisions or predictions about the population.

Descriptive Statistics

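A. Measures of Central Tendency

1. Mean ($\mu$ = population mean; $\bar{x}$ = sample mean)- the average score or observation.
a. Ungrouped Data
Formula: $\bar{x} = \frac{\sum x}{n}$
where: x = each score; n = number of scores
b. Grouped Data (Midpoint method)
Formula: $\bar{x} = \frac{\sum f x}{n}$
where: f = frequency; x = midpoint of the class; n = total frequency
2. Weighted Mean ($\bar{x}_w$)- used if the answers in the tests are multi-scored.
Formula: $\bar{x}_w = \frac{\sum f x}{n}$
where: f = frequency; x = scale value; n = total number of responses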

Answer/Response          Score (Scale Value)
Strongly Agree (SA)      5
Agree (A)                4
Undecided (U)            3
Disagree (D)             2
Strongly Disagree (SD)   1

For Interpretation:
Arbitrary Scale (I)
I = (Highest Possible Score - Lowest Possible Score) / No. of categories
Example: I = (5 - 1) / 5 = 0.80

Statistical Limit        Answer/Response
4.20-5.00                Strongly Agree
3.40-4.19                Agree
2.60-3.39                Undecided
1.80-2.59                Disagree
1.00-1.79                Strongly Disagree
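The weighted mean and its interpretation can be sketched in Python as follows. The scale values and statistical limits are taken from the tables above; the response counts are invented purely for illustration, and the function names are hypothetical.

```python
# Scale values from the Answer/Response table.
SCALE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

# Lower limits of each interval (interval width I = (5 - 1) / 5 = 0.80).
LIMITS = [
    (4.20, "Strongly Agree"),
    (3.40, "Agree"),
    (2.60, "Undecided"),
    (1.80, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def weighted_mean(frequencies):
    """Sum of (frequency x scale value) divided by total responses."""
    total = sum(frequencies.values())
    return sum(SCALE[answer] * f for answer, f in frequencies.items()) / total


def interpret(mean_score):
    """Map a weighted mean onto the statistical limits above."""
    for lower_limit, label in LIMITS:
        if mean_score >= lower_limit:
            return label
    raise ValueError("weighted mean is below the lowest possible score")


# Hypothetical answers of 35 respondents to one questionnaire item.
responses = {"SA": 10, "A": 15, "U": 5, "D": 3, "SD": 2}
item_mean = weighted_mean(responses)      # (50 + 60 + 15 + 6 + 2) / 35
item_label = interpret(item_mean)
```

Here the sum of f·x is 133 over 35 responses, giving a weighted mean of 3.80, which falls in the 3.40-4.19 interval.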

3. Median- the point or score that divides the distribution into two equal parts. The median is appropriate for ordinal data. For ungrouped data:
a. Odd number of scores- arrange the scores from highest to lowest and identify the middle score.
b. Even number of scores- arrange the scores from highest to lowest, identify the two middle scores, add them, and divide the sum by 2.
4. Mode- the most frequent number in a data set, that is, the number that occurs the highest number of times.
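A short Python sketch of the median and mode, using the standard `statistics` module. The scores are made up for illustration.

```python
from statistics import median, multimode

odd_scores = [7, 3, 9, 3, 5]        # five scores: the middle one is the median
even_scores = [7, 3, 9, 3, 5, 8]    # six scores: average the two middle ones

median_odd = median(odd_scores)     # sorted: 3, 3, 5, 7, 9
median_even = median(even_scores)   # sorted: 3, 3, 5, 7, 8, 9
modes = multimode(odd_scores)       # list of the most frequent value(s)
```

`multimode` returns a list because a data set can have more than one mode when several values tie for the highest frequency.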
B. CORRELATION ANALYSIS
Correlation analysis is a statistical method used to evaluate the
strength and direction of the relationship between two or
more variables. The correlation coefficient ranges from -1 to 1.
A correlation coefficient of 1 indicates a perfect positive
correlation. This means that as one variable increases, the other
variable also increases.
A correlation coefficient of -1 indicates a perfect negative
correlation. This means that as one variable increases, the other
variable decreases.
A correlation coefficient of 0 means that there’s no linear
relationship between the two variables.
Correlation Analysis Methodology
Conducting a correlation analysis involves a series of steps, as
described below:
1. Define the Problem: Identify the variables that you think might be related. For Pearson’s correlation, the variables must be measurable on an interval or ratio scale. For example, if you’re interested in studying the relationship between the amount of time spent studying and exam scores, these would be your two variables.
2. Data Collection: Collect data on the variables of interest. The data could be collected through various means such as surveys, observations, or experiments. It’s crucial to ensure that the data collected is accurate and reliable.
3. Data Inspection: Check the data for any errors or anomalies such as outliers or missing values. Outliers can greatly affect the correlation coefficient, so it’s crucial to handle them appropriately.
4. Choose the Appropriate Correlation Method: Select the correlation method that’s most appropriate for your data. If your data meets the assumptions for Pearson’s correlation (interval or ratio level, linear relationship, variables are normally distributed), use that. If your data is ordinal or doesn’t meet the assumptions for Pearson’s correlation, consider using Spearman’s rank correlation or Kendall’s Tau.
5. Compute the Correlation Coefficient: Once you’ve selected the appropriate method, compute the correlation coefficient. This can be done using statistical software such as R, Python, or SPSS, or manually using the formulas.
6. Interpret the Results: Interpret the correlation coefficient you obtained. If the correlation is close to 1 or -1, the variables are strongly correlated. If the correlation is close to 0, the variables have little to no linear relationship. Also consider the sign of the correlation coefficient: a positive sign indicates a positive relationship (as one variable increases, so does the other), while a negative sign indicates a negative relationship (as one variable increases, the other decreases).
7. Check the Significance: It’s also important to test the statistical significance of the correlation. This typically involves performing a t-test. A small p-value (commonly less than 0.05) suggests that the observed correlation is statistically significant, that is, unlikely to be due to random chance alone.
8. Report the Results: The final step is to report your findings. This should include the correlation coefficient, the significance level, and a discussion of what these findings mean in the context of your research question.
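The computing and significance-checking steps above can be sketched in plain Python. This is a minimal illustration, not a full analysis: the study-hours data are invented, the function names are hypothetical, and the significance check uses the standard t statistic for a correlation, t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom, compared against a critical value read from a t-table.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from deviations about the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing whether the correlation differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: hours spent studying and exam scores of 6 students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]

r = pearson_r(hours, scores)
t = t_statistic(r, len(hours))

# Two-tailed critical t at the 0.05 level for 6 - 2 = 4 degrees of freedom.
T_CRITICAL = 2.776
is_significant = abs(t) > T_CRITICAL
```

If the computed |t| exceeds the critical value, the correlation is reported as significant at the 5% level; statistical software would report the exact p-value instead.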

Pearson Product-Moment Correlation- determines the linear relationship between variables.
This is the most common type of correlation analysis. Pearson correlation measures the linear relationship between two continuous variables. It assumes that the variables are normally distributed and that the spread of the points around the line is roughly constant. The correlation coefficient (r) ranges from -1 to +1, with -1 indicating a perfect negative linear relationship, +1 indicating a perfect positive linear relationship, and 0 indicating no linear relationship.
The formula is:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2} \sum (y - \bar{y})^{2}}}$
where: x, y = paired scores; $\bar{x}$, $\bar{y}$ = means of the two variables

Visualizing the Pearson correlation coefficient

Another way to think of the Pearson correlation coefficient (r) is as a measure of how close the observations are to a line of best fit.
The Pearson correlation coefficient also tells you whether the slope of the line of best fit is negative or positive. When the slope is negative, r is negative. When the slope is positive, r is positive.
When r is 1 or –1, all the points fall exactly on the line of best fit.
When r is greater than .5 or less than –.5, the points are close to the line of best fit.
When r is between 0 and .3 or between 0 and –.3, the points are far from the line of best fit.
When r is 0, a line of best fit is not helpful in describing the relationship between the variables.

When to use the Pearson correlation coefficient


The Pearson correlation coefficient (r) is one of
several correlation coefficients that you need to choose between
when you want to measure a correlation. The Pearson correlation
coefficient is a good choice when all of the following are true:
Both variables are quantitative: You will need to use a different
method if either of the variables is qualitative.
The variables are normally distributed: You can create a
histogram of each variable to verify whether the distributions
are approximately normal. It’s not a problem if the variables are
a little non-normal.
The data have no outliers: Outliers are observations that don’t
follow the same patterns as the rest of the data. A scatterplot
is one way to check for outliers—look for points that are far
away from the others.
The relationship is linear: “Linear” means that the relationship
between the two variables can be described reasonably well by a
straight line. You can use a scatterplot to check whether the
relationship between two variables is linear.

C. PERCENTAGE and RANKING - A percentage is a portion of a whole expressed in hundredths.
The term ‘percentage’ is widely used to express anything, from changes in the tax rates, to the rate of unemployment, to the number of people using smartphones, to resource allocation by the government, to the change in price or quantity of a product or results of an examination. It is a way of writing any number as a fraction with denominator 100.
The terms ‘percentage’ and ‘percentile’ are often confused, especially by students appearing in different examinations. For a given dataset, a percentile is the value in the distribution at or below which a certain percentage of the scores lies.
BASIS FOR COMPARISON | PERCENTAGE | PERCENTILE
Meaning | The percentage refers to the unit of measurement indicating, for every hundred. | Percentile implies a value at or below which a specific proportion of the observations lies.
What does it depict? | Scores out of hundred, or per hundred | Position or standing on the basis of appearance
Represents | Rate, number or amount | Rank
Symbol | % | pth
Based on | Individual performance | Relative performance
Comparison of | Actual scores with the total scores | Individual's rank with the total number of students who appeared for the examination
Objective | To show fractional numbers as whole numbers | To show where the scores stand in relation to other scores
Quartiles | No | Yes
Based on the normal frequency distribution | No | Yes

Definition of Percentage
The word ‘percentage’ is a combination of two words, ‘per’ and ‘cent’, i.e. ‘per hundred’ or ‘/100’, signifying ‘out of 100’. In mathematical terminology, ‘out of’ refers to ‘divide by’. For instance, 30% denotes 30 per 100, which can be expressed as a fraction (30/100), or as a decimal (0.30).
Hence, to convert a number stated in percentage terms into a fraction or decimal, divide it by 100.
To convert a fraction into a percentage, all you need to do is divide the numerator by the denominator and multiply the result by 100.

For example:
If 60% of people in India use Amazon for online shopping, it means that out of every 100 people, 60 would use Amazon to shop online.
A person donates 15% of his income to the orphanage. This implies that 15 out of every 100 rupees of his income is donated.
Ratio Comparison
The percentage can also be used to compare ratios by representing them as percentages.
For example:
In two exams, 420 out of 500 and 355 out of 400 candidates were present. Now, we can express and compare the percentage present as:
In the first exam,
420 out of 500 candidates appeared = (420/500)×100 = 84%
In the second exam,
355 out of 400 candidates appeared = (355/400)×100 = 88.75%
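The ratio comparison can be checked with a couple of lines of Python. The function name is illustrative; the figures are the attendance counts 420 of 500 and 355 of 400 stated at the start of the example.

```python
def percentage(part, whole):
    """Express part/whole as a percentage."""
    return 100 * part / whole


first_exam = percentage(420, 500)     # candidates present in the first exam
second_exam = percentage(355, 400)    # candidates present in the second exam
higher_attendance = "second" if second_exam > first_exam else "first"
```

Although fewer candidates appeared in the second exam in absolute terms, its attendance rate is the higher of the two, which is the point of converting both ratios to percentages before comparing them.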
Definition of Percentile
In statistics, a percentile is the point on a measurement scale at or below which a specified percentage of the cases lies. The percentile rank of a score is the percentage of scores in a frequency distribution that it is higher than or equal to. It reflects how a score compares to other scores in the given dataset.
Alternatively, it can also refer to the values which divide the dataset into 100 equal parts.
For example:
Suppose a student obtained 85 marks in an exam, and this mark is higher than or equal to the marks of 79% of the students who took the exam. Then the percentile rank of the student would be 79.
In a group of 15 people, Robin is the 3rd oldest person, so 12 of the 15 people (80%) are younger than Robin. This indicates that Robin is at the 80th percentile. Hence, the age of Robin, i.e. 61 years, is the 80th percentile age in that dataset.
The height of Alex is 168 cm, and he is the 5th tallest person in a group of 40 people, so 35 of the 40 people (87.5%) are shorter than Alex. This puts him at the 87.5th percentile.
In an examination, the range of percentile ranks shown in the result indicates the range within which the candidate’s ‘true’ percentile rank lies.
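A small Python sketch of a percentile rank. It uses the "percentage of cases below the score" convention, which is the one that gives 80% for someone who is 3rd oldest of 15; other conventions (for example, counting ties) give slightly different numbers. The function name and the individual ages and heights are invented so that the rankings match the two examples.

```python
def percentile_rank(score, values):
    """Percentage of values that are strictly below the given score."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


# 15 hypothetical ages: 12 people younger than 61, and 2 people older.
ages = list(range(49, 61)) + [61, 62, 63]
age_rank = percentile_rank(61, ages)          # 12 of 15 are younger

# 40 hypothetical heights: 35 people shorter than 168 cm, and 4 taller.
heights = [150 + i * 0.5 for i in range(35)] + [168, 170, 171, 172, 173]
height_rank = percentile_rank(168, heights)   # 35 of 40 are shorter
```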

Key Differences Between Percentage and Percentile


The difference between percentage and percentile can be drawn clearly on the following grounds:
1. A percentage is a mathematical value which can be expressed as a fraction with denominator hundred. A percentile is a point on the scale of the measured variable, at or below which a certain percentage of the measures fall.
2. Percentage depicts scores out of hundred, per hundred or for every hundred. Conversely, percentile indicates rank relative to the others who appeared.
3. Percentage shows the rate, number or amount, whereas percentile indicates the position or standing of a person.
4. To indicate a percentage, the ‘%’ symbol is used, which means ‘divide by 100’. In contrast, the percentile is denoted by pth, where ‘p’ is a number.
5. While the percentage is based on individual performance or score, the percentile is based on comparative performance or score.
6. In the case of percentage, actual scores are compared with the total scores. In percentile, an individual’s rank is compared with the total number of students who appeared for the examination.
7. The percentage is used to show fractional numbers as whole numbers, wherein the denominator is 100. The percentile is used to indicate where the scores stand in relation to other scores.
8. Percentiles have quartiles, since the dataset is divided into 100 equal parts (the 25th, 50th and 75th percentiles are the quartiles), but the percentage does not have quartiles.
9. While percentile is based on the normal frequency distribution, the percentage is not based on it.

D. TEST OF SIGNIFICANCE
1. Z-TEST- a parametric test used to determine whether a sample mean differs significantly from the expected population mean.
Let us have an example:
A factory has a machine that dispenses 80 ml of fluid in a bottle. An employee believes the average amount of fluid is not 80 ml. Using 40 samples, he measures the average amount dispensed by the machine to be 78 ml with a standard deviation of 2.5. (a) State the null and alternative hypotheses. (b) At a 95% confidence level, is there enough evidence to support the idea that the machine is not working properly?
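One way to work through this example is sketched below in Python. It assumes a two-tailed test with H0: µ = 80 and Ha: µ ≠ 80 (the employee only claims the mean is "not 80 ml"), and it treats the sample standard deviation of 2.5 as a stand-in for the population value, which is a common approximation for a sample of 40. The function name is illustrative.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, population_mean, std_dev, n):
    """z = (sample mean - expected mean) / (std dev / sqrt(n))."""
    return (sample_mean - population_mean) / (std_dev / math.sqrt(n))


z = one_sample_z(sample_mean=78, population_mean=80, std_dev=2.5, n=40)

standard_normal = NormalDist()                   # mean 0, standard deviation 1
critical_z = standard_normal.inv_cdf(0.975)      # two-tailed, 95% confidence
p_value = 2 * standard_normal.cdf(-abs(z))       # two-tailed p-value
reject_null = abs(z) > critical_z
```

The computed z is about -5.06, far beyond the critical value of about ±1.96, so under these assumptions the null hypothesis is rejected: there is enough evidence that the machine is not dispensing 80 ml on average.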
2. T-TEST for INDEPENDENT MEANS- used to determine if the observed difference between the means of two groups is statistically significant.

T-Test example (before-and-after weights of the same subjects, so the differences are treated as dependent samples):
1. A study was conducted to determine the effectiveness of a weight loss program. The table below shows the before and after weights of 10 subjects in the program. (a) Is the program effective for reducing weight? Use a 5% significance level. (b) Construct a 95% confidence interval and determine the margin of error.

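Subject   Before (Wb)   After (Wa)   Xd = Wa - Wb
1         185           169          -16
2         192           187          -5
3         206           193          -13
4         177           176          -1
5         225           194          -31
6         168           171          +3
7         256           228          -28
8         239           217          -22
9         199           204          +5
10        218           195          -23

(The before weight of subject 5 is taken as 225, which is the value consistent with the listed difference of -31 and the mean of -13.1.)

Mean of the differences: x̄d = Σ(Xd)/n = -131/10 = -13.1
Standard deviation of the differences: sd = 13.025

Ho: µd ≥ 0
Ha: µd < 0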
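The rest of the weight-loss example can be sketched in Python from the listed differences. This is one possible working, not necessarily the one intended by the notes: it uses the paired t statistic t = x̄d / (sd / √n) with n − 1 = 9 degrees of freedom, and the two critical values are hard-coded from a standard t-table rather than computed.

```python
import math
from statistics import mean, stdev

# Differences Xd = after - before, as listed in the table.
differences = [-16, -5, -13, -1, -31, 3, -28, -22, 5, -23]

n = len(differences)
mean_diff = mean(differences)             # mean of the differences
sd_diff = stdev(differences)              # sample standard deviation
standard_error = sd_diff / math.sqrt(n)
t = mean_diff / standard_error            # paired t statistic, df = n - 1

# Critical values for df = 9, read from a t-table.
T_ONE_TAILED_05 = 1.833                   # one-tailed test at the 5% level
T_TWO_TAILED_05 = 2.262                   # for a 95% confidence interval

# (a) Left-tailed test of Ho: mean difference >= 0.
program_effective = t < -T_ONE_TAILED_05

# (b) 95% confidence interval for the mean difference.
margin_of_error = T_TWO_TAILED_05 * standard_error
interval = (mean_diff - margin_of_error, mean_diff + margin_of_error)
```

The t statistic comes out near -3.18, below the critical value of -1.833, so Ho is rejected and the program appears effective at the 5% level. The margin of error is about 9.32, giving an interval of roughly -22.42 to -3.78 for the mean weight change.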

3. T-TEST for DEPENDENT SAMPLE MEAN- a more precise test, with its use limited to scores that are correlated, such as those from a pre-test and post-test.
4. F-TEST- used when the study compares the means of two or more groups.
