Module 2 - Part 1

University of Mumbai

Program – Bachelor of Engineering in Computer Science and Engineering
(Artificial Intelligence and Machine Learning)

Class – T.E.
Course Code – CSDLO5011
Course Name – Statistics for Artificial Intelligence Data Science

By
Prof. A. V. Phanse
Sampling –
 In statistics, sampling is a method of selecting a subset of the population to
make statistical inferences.

 From the sample, the characteristics of the whole population can be estimated.

 Sampling can be classified into two different types, namely probability
sampling and non-probability sampling.
What is Probability Sampling?

 The probability sampling method utilizes some form of random selection. In
this method, every eligible individual in the whole sample space has a chance
of being selected in the sample.
 This method is more time-consuming and expensive than the non-probability
sampling method.
 The benefit of using probability sampling is that it helps ensure the sample is
representative of the population.

What is Non-Probability Sampling?

 The non-probability sampling method is a technique in which the researcher
selects the sample based on subjective judgment rather than random
selection.
 In this method, not all members of the population have a chance to
participate in the study.
Simple Random Sampling

 In the simple random sampling technique, every item in the population has an
equal and likely chance of being selected in the sample.
 Since the item selection depends entirely on chance, this method is known as
the "Method of Chance Selection".
 When the sample size is large and items are chosen randomly, it is known as
"Representative Sampling".

Example:
Suppose we want to select a simple random sample of 200 students from a school.
Here, we can assign a number to every student in the school database from 1 to
500 and use a random number generator to select a sample of 200 numbers.
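The example above can be sketched in a few lines of Python; the roster of 500 student IDs is hypothetical.

```python
import random

# Hypothetical roster: student IDs numbered 1 to 500, as in the example above
student_ids = list(range(1, 501))

random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(student_ids, k=200)  # simple random sample, no repeats

print(len(sample))       # 200
print(len(set(sample)))  # 200 -- every selected student is distinct
```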
Systematic Sampling

 In the systematic sampling method, items are selected from the target
population by choosing a random starting point and then selecting every item
after a fixed sampling interval.
 The interval is calculated by dividing the total population size by the desired
sample size.

Example:
 Suppose the names of 300 students of a school are sorted in reverse
alphabetical order.
 To select a sample of 20 students, the sampling interval is 300 / 20 = 15. We
randomly select a starting number, say 5. From number 5 onwards, we select
every 15th person from the sorted list. Finally, we end up with a sample of 20
students.
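The selection can be sketched as a list slice: with 300 names and a desired sample of 20, the interval is 300 / 20 = 15, and a start of 5 picks every 15th name thereafter (the student names below are hypothetical).

```python
# Systematic sampling sketch: population of 300 sorted names (hypothetical),
# desired sample size 20, so the interval k = 300 // 20 = 15
population = [f"student_{i}" for i in range(1, 301)]

k = len(population) // 20   # sampling interval: 15
start = 5                   # random starting point between 1 and k, say 5
sample = population[start - 1::k]  # positions 5, 20, 35, ...

print(len(sample))  # 20
```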
Stratified Sampling

 In a stratified sampling method, the total population is divided into smaller
groups (strata) to complete the sampling process.
 Each group is formed based on shared characteristics within the population.
 After separating the population into strata, statisticians randomly select a
sample from each group.

Example :
There are three bags (A, B and C), each with different balls. Bag A has 50 balls, bag
B has 100 balls, and bag C has 200 balls. We have to choose a sample of balls from
each bag proportionally. Suppose 5 balls from bag A, 10 balls from bag B and 20
balls from bag C.
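The bag example can be sketched with proportional allocation, drawing 10% from each stratum (the ball labels A0, B0, ... are hypothetical):

```python
import random

random.seed(0)
# Hypothetical strata mirroring the bag example: 50, 100 and 200 balls
strata = {
    "A": [f"A{i}" for i in range(50)],
    "B": [f"B{i}" for i in range(100)],
    "C": [f"C{i}" for i in range(200)],
}

fraction = 0.10  # proportional allocation: sample 10% of each stratum
sample = {name: random.sample(balls, k=int(len(balls) * fraction))
          for name, balls in strata.items()}

print({name: len(chosen) for name, chosen in sample.items()})
# {'A': 5, 'B': 10, 'C': 20}
```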
Clustered Sampling

 In the clustered sampling method, clusters or groups of people are formed
from the population set.
 Each group has similar characteristics, and every cluster has an equal chance
of being part of the sample.
 This method uses simple random sampling on the clusters of the population.

Example:
An educational institution has ten branches across the country with almost the
same number of students. If we want to collect some data regarding facilities and
other things, we can’t travel to every unit to collect the required data. Hence, we
can use random sampling to select three or four branches as clusters.
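A minimal sketch of the branch example, assuming ten hypothetical branch names; whole clusters are drawn at random, and every unit inside a chosen cluster would then be surveyed:

```python
import random

random.seed(1)
# Hypothetical institution with ten branches, each acting as a cluster
branches = [f"branch_{i}" for i in range(1, 11)]

# Select three entire clusters by simple random sampling; all students
# in the chosen branches would then be surveyed
chosen = random.sample(branches, k=3)
print(chosen)
```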
Data and Sampling Distributions
 The left hand side figure represents
a population that is assumed to
follow an underlying but unknown
distribution.
 The right hand side figure is the
sample data and its empirical
distribution.
 To get from the left hand side to the
right hand side, a sampling
procedure is used (represented by
an arrow).

 Traditional statistics focused very much on the left-hand side, using theory
based on strong assumptions about the population.
 Modern statistics has moved to the right hand side, where such assumptions are
not needed.
Random Sampling and Sample Bias

 A sample is a subset of data from a larger data set i.e. population.

 A population is a large, defined set of data.

 Random sampling is a process in which each available member of the


population being sampled has an equal chance of being chosen for the sample
at each draw. The sample that results is called a simple random sample.

 Simple Random Sampling with Replacement (SRSWR) –
When simple random samples are selected such that each unit, after being
selected as a sample unit, is remixed or replaced into the population before the
selection of the next unit, the method is known as simple random sampling
with replacement.

 Simple Random Sampling without Replacement (SRSWOR) –
When simple random samples are selected such that a unit, once selected as a
sample unit, is not mixed or replaced into the population before the selection
of the next unit, the method is known as simple random sampling without
replacement; i.e., once a unit is selected in the sample, it will never be selected
again.
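The two schemes map directly onto Python's standard library: `random.sample` draws without replacement (SRSWOR) and `random.choices` draws with replacement (SRSWR). A small sketch with a hypothetical population of ten units:

```python
import random

random.seed(7)
population = list(range(1, 11))  # hypothetical population of 10 units

# SRSWOR: a unit, once drawn, is not returned, so no repeats are possible
srswor = random.sample(population, k=5)

# SRSWR: each unit is replaced before the next draw, so repeats may occur
srswr = random.choices(population, k=5)

print(len(set(srswor)))  # 5 -- all distinct, guaranteed
print(srswr)             # may contain the same unit more than once
```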
Sampling bias occurs when some members of a population are systematically more
likely to be selected in a sample than others.

Types of sample bias -

Self-Selection Bias
 For example, suppose you're conducting a survey about local water quality.
 People already interested in this topic are more likely to respond and, thus, be
overrepresented in the results.
 This group likely has opinions that differ from the general population.

Nonresponse Bias
 For example, subjects with health issues might not be able to complete a study
for a physical fitness program.
 Consequently, the program appears more effective in the sample than in the
population.
Survivorship Bias
 Studies that assess a sample of existing companies are a classic example of
this bias.
 By focusing on the financial status of active companies, these studies don't
include those that have gone out of business.

Pre-screening or Advertising Bias
 For example, a study that advertises a fitness improvement program is more
likely to find subjects who are already motivated to get fit.
 Hence, the program might be more effective in this sample than in the general
population.
Undercoverage Bias
 For example, homeless people are unlikely to appear on various lists and won't
have an address or phone number.
 Consequently, samples are unlikely to include them.

Healthy User Bias
 A study is conducted to assess the effectiveness of a new exercise program on
reducing cardiovascular disease risk.
 The participants are selected from a group of people who use a fitness app.
 The people who use a fitness app are likely more health-conscious and
physically active than the average person, making them less representative of
the general population.
How to Avoid Sampling Bias

1. Use random or stratified sampling - Stratified random sampling will help ensure
you get a representative research sample and reduce the interference of
irrelevant variables in your systematic investigation.
2. Avoid convenience sampling - Rather than collecting data from only easily
accessible or available participants, you should gather data from the different
subgroups that make up your population of interest.
3. Clearly define a target population and a sampling frame - Matching the
sampling frame to the target population as much as possible will reduce the risk
of sampling bias.
4. Follow up on non-responders - When people drop out or fail to respond to your
survey, do not ignore them, but rather follow up to determine why they are
unresponsive and see if you can garner a response.
5. Oversampling - Oversampling can be used to avoid sampling bias in cases where
members of the defined population are underrepresented.
6. Aim for a large research sample - The larger your sample population, the more
likely you are to represent all subgroups from your population of interest.
7. Set up quotas for each identified demographic - If you think participant gender,
age, or some other demographic characteristic is a potential source of bias
within your study, quotas will allow you to evenly sample people from different
demographic groups within the study.
Bias

 Statistical bias refers to a systematic error that causes an estimator or a


measurement process to consistently differ from the true value or the expected
result.

 In other words, bias refers to a flaw in the experiment design or data collection
process, which generates results that don’t accurately represent the population.

 Bias can lead to inaccuracies in data analysis, making the results less reliable.

 There are various types of statistical bias, each arising from different sources,
such as the design of a study, data collection methods, or data analysis
techniques.
Data Size Versus Data Quality
 Data quality is often more important than data quantity, and random sampling
can reduce bias and facilitate quality improvement that would otherwise be
prohibitively expensive.

 In the era of big data, it is sometimes surprising that smaller is better.

 Time and effort spent on random sampling not only reduces bias but also
allows greater attention to data exploration and data quality.

 For example, missing data and outliers may contain useful information. It might
be difficult to track down missing values or evaluate outliers in millions of
records, but doing so in a sample of several thousand records may be feasible.
Data Size

Pros:

 Increased Accuracy: Larger datasets can provide more information and


reduce the margin of error in statistical analyses.
 Better Representation: A bigger sample size is more likely to represent the
population accurately, reducing sampling bias.
 Powerful Models: More data can lead to more powerful machine learning
models that can capture complex patterns.

Cons:

 Storage and Processing: Large datasets require more storage space and
computational power, which can be costly and time-consuming.
 Noise and Redundancy: With more data, there's a higher chance of including
irrelevant or duplicate information, which can obscure important patterns.
Data Quality

Pros:

 Accuracy and Reliability: High-quality data ensures that the information is


accurate, complete, and reliable, leading to better decision-making.
 Efficiency: Good quality data requires less cleaning and preprocessing, saving
time and resources.
 Trust: Reliable data builds trust among stakeholders and users.

Cons:

 Cost and Effort: Ensuring high data quality can be resource-intensive, requiring
thorough validation, cleaning, and maintenance.
 Limited Scope: High-quality data might be more challenging to obtain in large
volumes, limiting the scope of analysis.
Balancing Data Size and Data Quality

Trade-offs:
Sometimes, there is a trade-off between data size and quality. It's crucial to find
a balance that suits the specific use case. For example, in some cases, a smaller
dataset of high-quality data may be more valuable than a larger, low-quality
dataset.

Context and Purpose:


The importance of data size versus quality depends on the context. For instance,
in medical research, high data quality is crucial, while in marketing, larger
datasets may be more beneficial for capturing trends and patterns.

Data Governance:
Implementing strong data governance practices can help ensure both data
quality and manageability, regardless of size.
In practice, prioritizing both large amounts of high-quality data is ideal, but the
focus may shift depending on the specific needs and constraints of a project.
Sample Mean Versus Population Mean
Sample Mean
 Definition: The sample mean is the average of a set of observations taken
from a larger population.
 It is used to estimate the population mean when it is impractical or impossible
to measure the entire population.
 The value of the sample mean can vary from one sample to another.

Population Mean
 Definition: The population mean is the average of all possible observations in
the entire population.
 It represents the true average of the entire population and is a parameter of
the population.
 It is a fixed value; it does not change unless the population itself changes.

In summary, the sample mean is a practical tool for estimating the population
mean, when the population is too large or impractical to measure its entirety.
Selection Bias

 Selection bias refers to the practice of selectively choosing data consciously or


unconsciously in a way that leads to a conclusion that is misleading.

 If you specify a hypothesis and conduct a well-designed experiment to test it,
you can have high confidence in the conclusion. But this is frequently not what
happens in practice.

 Often, one looks at available data and tries to understand or derive patterns.
These patterns are many a time the result of data snooping.

 Data snooping is the process of extensive hunting through the data until
something interesting emerges.
Regression to the Mean
 Regression to the mean is a statistical phenomenon that occurs when extreme
values in a data set tend to be closer to the average on subsequent
measurements. The phenomenon was first identified by Francis Galton in 1886.

 This effect is particularly noticeable in cases where there is some degree of


random variability or error in the measurements.

 When a variable is extreme on its first measurement, it will likely be closer to


the mean on its next measurement.

 This happens due to random variation or measurement error, not because of


any actual change in the underlying variable.

Example:
Imagine measuring students' scores on two tests.
Students who score extremely high or low on the first test are likely to score closer
to the average on the second test, simply due to random fluctuations in test
performance.
Sampling Distribution of a Statistic

 The term sampling distribution of a statistic refers to the distribution of some


sample statistic over many samples drawn from the same population.
 It is constructed by repeatedly drawing samples of the same size from a
population, calculating the statistic for each sample, and then plotting the
distribution of these statistics.

 Typically, a sample is drawn with the goal of measuring something or modeling


something.
 Since our estimate or model is based on a sample, it might contain error.
 We are therefore interested in sampling variability. If the data is huge, we could
draw additional samples and observe the distribution of a sample statistic
directly
 It is important to distinguish between the distribution of the individual data
points, known as the data distribution, and the distribution of a sample
statistic, known as the sampling distribution.
 The distribution of a sample statistic such as the mean is likely to be more
regular and bell-shaped than the distribution of the data itself.
[Figure: Difference between the population distribution and the sampling
distribution, illustrated for a population of 10,000 people with an average
height of 5'4"]
Central Limit Theorem

 Central limit theorem says that the means drawn from multiple samples will
resemble the familiar bell-shaped normal curve even if the source population
is not normally distributed, provided that the sample size is large enough and
the departure of the data from normality is not too great.

 As the sample size increases, the distribution of the sample means becomes
increasingly normal. This is particularly useful because the normal distribution
has well-known properties, making it easier to make inferences about the
population.

 The theorem applies when the sample size is "sufficiently large," often
considered as n ≥ 30. However, if the original population distribution is normal,
the CLT holds even for small sample sizes.
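The theorem can be illustrated with a short simulation; the exponential source population below is a hypothetical choice, picked only because it is clearly non-normal:

```python
import random
import statistics

random.seed(0)

# A skewed, non-normal source population (hypothetical: exponential, mean ~1)
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw many samples of size n = 30 and record each sample mean
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(2_000)]

# The sample means cluster symmetrically around the population mean (~1.0)
# and vary far less than the raw data, as the CLT predicts
print(round(statistics.mean(sample_means), 2))
print(statistics.stdev(sample_means) < statistics.stdev(population))  # True
```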
Standard Error
 The Standard Error (SE) is a statistical measure that quantifies the variability of a
sample mean as an estimate of a population parameter.
 In simpler terms, it indicates how much the sample mean is expected to
fluctuate from the true population mean if you were to repeatedly draw
samples.

SE = σ / √n (or SE ≈ s / √n when σ is unknown)

where:
σ is the standard deviation of the population
s is the standard deviation of the sample
n is the sample size.
 A smaller standard error suggests that the sample mean is a more precise
estimate of the population mean.
 Conversely, a larger standard error indicates more variability in the sample
means and less precision.
 Standard error is often used in hypothesis testing, confidence intervals, and
regression analysis to understand the precision of an estimate.
 While the standard deviation measures the variability within a single sample,
the standard error measures the variability of the sample mean from one
sample to another.
Following steps are used for measuring standard error:

1. Collect a number of brand-new samples from the population.


2. For each new sample, calculate the statistic (e.g. mean).
3. Calculate the standard deviation of the means computed in step 2 and use this
as your estimate of the standard error.
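The steps above can be sketched directly; the normal population below is hypothetical, and the empirical estimate is compared against the formula σ/√n:

```python
import math
import random
import statistics

random.seed(0)
# Hypothetical population: 50,000 values with mean 50 and std dev 10
population = [random.gauss(50, 10) for _ in range(50_000)]

# Steps 1-2: collect many fresh samples of size n and compute each mean
n = 40
means = [statistics.mean(random.sample(population, n)) for _ in range(1_000)]

# Step 3: the standard deviation of those means estimates the standard error
se_empirical = statistics.stdev(means)
se_formula = statistics.pstdev(population) / math.sqrt(n)  # sigma / sqrt(n)

print(round(se_empirical, 2), round(se_formula, 2))  # the two agree closely
```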

Numerical for Practice

In a certain property investment company with an international presence, workers


have a mean hourly wage of $12 with a population standard deviation of $3.
Given a sample size of 30, estimate and interpret the SE of the sample mean
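A quick check of this exercise, using SE = σ/√n with σ = 3 and n = 30:

```python
import math

sigma, n = 3, 30          # population std dev and sample size from the problem
se = sigma / math.sqrt(n)
print(round(se, 3))       # 0.548
```

So the means of samples of 30 workers would typically fluctuate about $0.55 around the true mean hourly wage of $12.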

 In practice, this approach of collecting new samples to estimate the standard


error is typically not feasible (and time consuming).

 In modern statistics, the bootstrap has become the standard way to estimate
standard error.
The Bootstrap

 Bootstrap is a powerful statistical method used for estimating the distribution of


a statistic (like mean, median etc.) by resampling with replacement from the
original data.
 In this method, additional samples are drawn with replacement from the sample
itself and the statistic for each resample is recalculated.

 The bootstrap process involves repeatedly drawing samples from the original
dataset, where each sample is of the same size as the original dataset but is
drawn with replacement (or remixing).
 This means that some data points may appear multiple times in a resampled
dataset, while others may not appear at all.
 Each resampled dataset is called a bootstrap sample. From each bootstrap
sample, the statistic of interest (e.g., mean, median) is calculated.
 By repeating the resampling process many times (typically thousands), you
create a distribution of the statistic of interest.
 This distribution is called the bootstrap distribution and can be used to estimate
standard errors, confidence intervals, and more.
Process:

1. Original Dataset: Suppose you have a dataset with n observations.


2. Resampling: Draw a bootstrap sample by randomly selecting n observations
from the original dataset with replacement.
3. Statistic Calculation: Calculate the statistic of interest (e.g., mean) from the
bootstrap sample.
4. Repeat: Repeat steps 2 and 3 many times (e.g., 1000 times) to create the
bootstrap distribution.
5. Analysis: Use the bootstrap distribution to estimate the standard error,
construct confidence intervals, or perform hypothesis testing.
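The five steps can be sketched in a few lines; the ten data values below are hypothetical:

```python
import random
import statistics

random.seed(0)
data = [150, 203, 176, 190, 168, 193, 189, 178, 197, 172]  # hypothetical sample

# Steps 2-4: resample with replacement, same size as the original, many times
boot_means = sorted(statistics.mean(random.choices(data, k=len(data)))
                    for _ in range(5_000))

# Step 5: the spread of the bootstrap distribution estimates the standard
# error, and its percentiles give a simple confidence interval
se = statistics.stdev(boot_means)
ci_95 = (boot_means[int(0.025 * 5_000)], boot_means[int(0.975 * 5_000)])
print(round(se, 1), ci_95)
```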
Resampling Versus Bootstrapping

Resampling

 Resampling is a broad statistical technique that involves repeatedly drawing


samples from a dataset and assessing the variation in a statistic of interest.
 It is a general term that includes various methods like permutation tests, cross-
validation, and bootstrapping.
 Resampling methods are used to test hypotheses, validate models, or estimate
the variability of a statistic without relying on traditional parametric
assumptions.

Bootstrapping

 Bootstrapping is a specific type of resampling method. It involves repeatedly


drawing samples from the original data with replacement to estimate the
sampling distribution of a statistic.
 Bootstrapping is particularly useful when the underlying distribution of the data
is unknown or when the sample size is small, making traditional assumptions
unreliable.
Key Differences:

Scope:

 Resampling is an umbrella term that includes multiple methods.


 Bootstrapping is one specific method within the broader category of
resampling.

Resampling Method:

 Resampling can involve either sampling with or without replacement,


depending on the specific method being used (e.g., cross-validation
involves sampling without replacement).
 Bootstrapping always involves sampling with replacement.
Confidence Intervals
 A Confidence Interval (CI) is a range of values, derived from a sample, that is
likely to contain the true population parameter (such as the mean, proportion,
or difference between means) with a certain level of confidence.
 It's a fundamental concept in statistics used to express the uncertainty around
an estimate.
 A 95% confidence level means that if you were to take 100 different samples
and compute a confidence interval for each, about 95 of those intervals would
be expected to contain the true population parameter.
 If you calculate a 95% confidence interval for a mean as [10, 15], it means you
can be 95% confident that the true population mean lies between 10 and 15.

Factors Affecting Confidence Intervals:

Sample Size (n):


Larger sample sizes lead to narrower confidence intervals, as the estimate is more
precise.
Variability (σ or s):
Higher variability in the data leads to wider confidence intervals, indicating less
precision.
Confidence Level:
Higher confidence levels lead to wider intervals, as they provide more "certainty"
about containing the true parameter.
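A minimal sketch of a normal-approximation confidence interval, with hypothetical hourly-wage data; z = 1.96 corresponds to the 95% confidence level (a t critical value would give a slightly wider interval for small n):

```python
import math
import statistics

# Hypothetical sample of hourly wages
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))

# 95% CI: point estimate +/- critical value * standard error
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 2), round(upper, 2))
```

Note how the interval narrows as n grows (smaller SE) and widens if a higher confidence level (larger critical value) is chosen, matching the factors listed above.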
Numerical for Practice
1. Find the standard error of the estimate of the mean weight of high school
football players using the data given of weights of high school football players from
your school. Then find a 95% confidence interval for the data
Player Number   Weight in Pounds
1               150
2               203
3               176
4               190
5               168
6               193
7               189
8               178
9               197
10              172
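One way to check your work on problem 1 (a sketch; it uses the normal critical value 1.96, whereas the t critical value 2.262 for df = 9 would widen the interval slightly):

```python
import math
import statistics

weights = [150, 203, 176, 190, 168, 193, 189, 178, 197, 172]

mean = statistics.mean(weights)                          # 181.6
se = statistics.stdev(weights) / math.sqrt(len(weights))

# Approximate 95% confidence interval using z = 1.96
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(round(se, 2))                    # 5.02
print(tuple(round(x, 1) for x in ci))  # (171.8, 191.4)
```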
2. Find the standard error of the estimate for the average number of children in a
household in your city by using the data collected from a sample of households in
your city. Then find a 95% confidence interval for the data
Thank You…
