Statistics - Material
The Central Limit Theorem (CLT)
Definition: The Central Limit Theorem states that the distribution of the sample mean
approaches a normal distribution as the sample size increases, even if the population
distribution is not normal. This holds for sufficiently large sample sizes (typically n ≥ 30).
Why It Matters:
o The CLT is the foundation for much of inferential statistics because it allows
us to make statistical inferences (such as hypothesis testing and confidence
intervals) based on the assumption of normality, even when the underlying
population is not normally distributed.
o This means that we can use methods such as z-tests and t-tests, which assume
normality, even for non-normally distributed populations, as long as we have a
sufficiently large sample.
Application of CLT:
The CLT allows us to estimate population parameters (such as the population mean)
based on the sample mean, and to calculate the standard error of the mean.
The sampling distribution of the sample mean is normal (or nearly normal)
regardless of the shape of the population distribution, as long as the sample size is
large enough.
Example:
Simulate sampling from a skewed dataset (e.g., income distribution) and demonstrate
how the sampling distribution of the mean approaches a normal distribution as the
sample size increases.
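A minimal sketch of this simulation in Python, assuming NumPy is available; the lognormal "income" population and its parameters are illustrative choices, not part of the material:

```python
import numpy as np

rng = np.random.default_rng(42)

# Heavily right-skewed "income" population (lognormal is a common stand-in)
population = rng.lognormal(mean=10, sigma=1.0, size=1_000_000)

for n in (5, 30, 200):
    # Draw 5,000 samples of size n and record each sample mean
    idx = rng.integers(0, population.size, size=(5_000, n))
    means = population[idx].mean(axis=1)
    # Sample skewness of the means: shrinks toward 0 as n grows, per the CLT
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(f"n={n:>3}: skewness of sample means = {skew:+.2f}")
```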
6. Hypothesis Testing
Null Hypothesis (H₀): The default assumption that there is no effect, no difference,
or no relationship in the population (e.g., the mean income is $50,000).
Alternative Hypothesis (H₁): The hypothesis we seek evidence for, asserting that
there is an effect or a difference (e.g., the mean income is not $50,000).
Steps in Hypothesis Testing:
1. Formulate the Hypotheses: Define H₀ and H₁.
2. Select the Significance Level (α): This is the probability threshold for
rejecting the null hypothesis (commonly 0.05 or 0.01).
3. Collect Data: Gather a sample to test the hypothesis.
4. Perform the Test: Use a test statistic (e.g., t-test or z-test) to analyze the data.
5. Make a Decision: If the p-value ≤ α, reject H₀. If p-value > α, fail to reject
H₀.
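A sketch of these five steps in code, assuming SciPy is available; the $50,000 figure comes from the example above, while the simulated sample is hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Step 3: a hypothetical sample of 40 incomes
sample = rng.normal(loc=52_000, scale=8_000, size=40)

alpha = 0.05                                     # Step 2: significance level
# Steps 1 & 4: H0: mean = 50,000 vs H1: mean != 50,000, via one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample, popmean=50_000)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")   # Step 5
```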
P-value:
The probability of obtaining a result at least as extreme as the observed one, assuming
the null hypothesis is true.
Interpretation: A small p-value (typically ≤ 0.05) suggests that the observed result is
unlikely to have occurred under the null hypothesis, so we reject H₀. If the p-value is
large, we fail to reject H₀, indicating insufficient evidence to support H₁.
Types of Errors:
Type I Error (False Positive): Rejecting a true null hypothesis (e.g., concluding
there is a difference when there is none).
Type II Error (False Negative): Failing to reject a false null hypothesis (e.g.,
concluding there is no difference when there actually is).
T-test:
o One-sample T-test: Tests if the sample mean differs from a known value.
o Two-sample T-test: Compares means from two independent groups (e.g.,
testing if male and female salaries differ).
o Paired T-test: Compares means of two related groups (e.g., pre-test vs post-
test scores).
Practical Example: Perform a two-sample t-test to compare the average scores of
two groups.
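One possible implementation of this practical example with SciPy; the group scores are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=75, scale=10, size=35)   # hypothetical scores, group A
group_b = rng.normal(loc=70, scale=10, size=35)   # hypothetical scores, group B

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```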
7. Confidence Intervals
A confidence interval is a range of values that is likely to contain the true population
parameter with a certain level of confidence (e.g., 95% confidence interval).
For a population mean, a common form is CI = x̄ ± z* · (s / √n)
Where:
o x̄ is the sample mean,
o z* is the critical value for the chosen confidence level (1.96 for 95%),
o s is the sample standard deviation,
o n is the sample size.
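A short sketch of computing such an interval with SciPy; it uses the t distribution, which is appropriate when the population standard deviation is estimated from the sample, and the income data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=50_000, scale=8_000, size=40)   # hypothetical incomes

mean = sample.mean()
sem = stats.sem(sample)                                  # standard error of the mean
low, high = stats.t.interval(0.95, df=sample.size - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({low:,.0f}, {high:,.0f})")
```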
A/B Testing:
A/B testing uses inferential statistics to determine whether the observed difference between
two groups is statistically significant. This allows businesses to infer whether one
version of a product is superior to another.
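One common way to run such a test is a two-proportion z-test; a sketch assuming statsmodels is installed, with made-up conversion counts:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

conversions = np.array([120, 150])    # hypothetical conversions for versions A, B
visitors = np.array([2_400, 2_500])   # hypothetical visitors shown each version

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# p <= 0.05 would suggest the two conversion rates genuinely differ
```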
Inferential Statistics vs. Descriptive Statistics
Objective: Understand the key distinctions between Inferential Statistics and Descriptive
Statistics, and how each is used in data analysis.
Descriptive Statistics:
Definition: Descriptive statistics refers to methods that summarize and describe the main
features of a dataset. It is focused on organizing, presenting, and analyzing data in a way
that is easy to understand, without making predictions or inferences about a larger population.
Purpose:
o The main goal is to describe the data that you have collected in a meaningful
way. This includes measures like averages, percentages, and frequencies that
help give a clear picture of the dataset.
Key Operations:
o Central Tendency: Measures such as mean, median, and mode, which tell us
about the "center" of the data.
o Dispersion: Measures like range, variance, and standard deviation that
describe how spread out the data is.
o Shape of the Distribution: Measures such as skewness and kurtosis that
describe the shape of the data distribution.
o Visualizations: Histograms, bar charts, pie charts, and box plots that
graphically represent the data.
Limitations:
o No Generalizations: Descriptive statistics do not allow us to make
generalizations or predictions about a population. They only describe the
sample data at hand.
o No Hypothesis Testing: It does not test hypotheses or predict future
outcomes.
Example:
o Suppose you have a dataset of the heights of 100 students. Descriptive
statistics would allow you to calculate the average height, the range, and the
standard deviation of heights, but it wouldn't tell you about the population of
all students outside this sample.
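A brief sketch of these calculations for a small made-up sample of heights, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

heights = np.array([160, 165, 170, 170, 172, 175, 180, 185, 190, 210])  # cm, illustrative

print("mean:", heights.mean())
print("median:", np.median(heights))
print("mode:", stats.mode(heights, keepdims=False).mode)
print("std dev:", heights.std(ddof=1))        # sample standard deviation
print("range:", heights.max() - heights.min())
print("skewness:", stats.skew(heights))
print("kurtosis:", stats.kurtosis(heights))   # excess kurtosis
```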
Inferential Statistics:
Purpose:
o The main goal is to make inferences or draw conclusions about a population
based on a sample. It uses probability theory to make predictions or decisions
about a population's characteristics.
Key Operations:
o Estimation: Estimating population parameters (e.g., estimating the mean or
variance of a population from sample data).
o Hypothesis Testing: Testing assumptions or claims about a population (e.g.,
testing whether the average income of a group is different from a known
value).
o Confidence Intervals: Estimating a range of values for a population
parameter with a certain level of confidence (e.g., estimating the population
mean with a 95% confidence interval).
o Predictive Modeling: Using statistical models to predict future outcomes
(e.g., predicting future sales based on historical data).
Limitations:
o Sample-Dependent: The conclusions drawn are based on the sample, and
there is always a level of uncertainty. The accuracy of inferences depends on
the sample size and representativeness.
o Requires Assumptions: Many inferential techniques (e.g., t-tests, regression
models) require assumptions, such as normality or independence of
observations.
Example:
o Suppose you take a sample of 100 students' heights from a university and find
an average height of 5.5 feet with a standard deviation of 0.2 feet. Inferential
statistics could help you estimate the average height of all students at that
university (population) and test whether the sample mean significantly differs
from the average height in other populations.
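A sketch of that inference using only the summary statistics above; the comparison mean of 5.4 feet is a hypothetical value chosen for illustration:

```python
import math
from scipy import stats

xbar, s, n = 5.5, 0.2, 100     # sample mean, std dev, size (from the example)
mu0 = 5.4                      # hypothetical population mean to compare against

sem = s / math.sqrt(n)
t_stat = (xbar - mu0) / sem
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)          # two-sided p-value
low, high = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=sem)

print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
print(f"95% CI for the mean height: ({low:.3f}, {high:.3f}) feet")
```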
Key Differences:
o Goal: Descriptive statistics summarize the data at hand; inferential statistics draw
conclusions about a larger population.
o Scope: Descriptive methods apply only to the collected sample; inferential methods
generalize beyond it.
o Techniques: Descriptive statistics use measures of central tendency and dispersion
plus charts; inferential statistics use estimation, hypothesis tests, confidence
intervals, and predictive models.
o Uncertainty: Descriptive summaries involve no sampling uncertainty; inferential
conclusions always carry uncertainty and rest on assumptions.
Conclusion:
Recap:
o We covered data types, descriptive statistics, probability, the Central Limit
Theorem, hypothesis testing, confidence intervals, and A/B testing.
o We saw how these concepts help in making decisions based on data in machine
learning and data science.
Q&A: Open the floor for final questions and clarifications.
Statistics in Exploratory Data Analysis (EDA)
Statistics plays a crucial role in Exploratory Data Analysis (EDA) by helping data
scientists and analysts summarize, visualize, and understand the underlying structure of data
before moving to more complex modeling tasks. Here’s how statistics contributes to the EDA
process:
1. Understanding Data Distributions
Descriptive Statistics:
o Descriptive statistics (mean, median, mode, variance, standard deviation)
provide an initial understanding of the central tendency and spread of the
data. This helps in identifying the key characteristics of each variable.
o For example, knowing the mean and standard deviation of a numeric
column gives you a sense of the average value and how spread out the values
are.
Visualizing Distribution:
o Histograms and density plots are used to visualize how data is distributed
(e.g., normal, skewed, uniform).
o Understanding the distribution helps you know which statistical techniques to
apply and whether certain transformations (e.g., log transformation) might be
needed to normalize the data for modeling.
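For instance, a quick sketch with Matplotlib; the skewed data are simulated:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
income = rng.lognormal(mean=10, sigma=1.0, size=10_000)   # simulated skewed data

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(income, bins=50)
axes[0].set_title("Raw data (right-skewed)")
axes[1].hist(np.log(income), bins=50)                     # log transformation
axes[1].set_title("Log-transformed (near normal)")
plt.show()
```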
2. Identifying Outliers
Box Plots:
o Box plots provide a clear visualization of the spread and help identify
outliers (data points that fall outside the whiskers). Outliers can significantly
affect model performance and are thus crucial to identify during EDA.
Z-scores:
o Z-scores can also help detect outliers. Data points with a Z-score greater than
3 or less than -3 are typically considered outliers. This step is useful in
detecting extreme values that might need further investigation or removal from
the dataset.
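A minimal sketch of the Z-score rule with NumPy; the data and the two injected outliers are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
data = np.append(rng.normal(50, 5, size=200), [95.0, 12.0])  # two injected outliers

z_scores = (data - data.mean()) / data.std(ddof=1)
outliers = data[np.abs(z_scores) > 3]                        # |z| > 3 rule
print("flagged outliers:", outliers)
```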
Statistical Tests:
o During EDA, statistical tests (e.g., t-tests, ANOVA) help assess the
significance of variables. These tests provide a way to check the reliability and
importance of features in explaining the target variable.
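As an illustration, a one-way ANOVA across three simulated groups with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
g1, g2, g3 = (rng.normal(mu, 5, size=30) for mu in (50, 52, 58))  # synthetic groups

f_stat, p_value = stats.f_oneway(g1, g2, g3)   # H0: all group means are equal
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```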
Normalization and Standardization:
o Standardizing or normalizing features (scaling them to a standard range or
distribution) is essential for some machine learning algorithms. This can be
done using statistical techniques, ensuring that no feature dominates due to
differences in scale.
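A sketch using scikit-learn's scalers on a tiny made-up feature matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[50_000, 1.2],
              [80_000, 3.4],
              [120_000, 2.1]], dtype=float)   # hypothetical features on very different scales

X_std = StandardScaler().fit_transform(X)     # standardize: zero mean, unit variance
X_norm = MinMaxScaler().fit_transform(X)      # normalize: rescale each column to [0, 1]
print(X_std.round(2))
print(X_norm.round(2))
```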
Identifying Data Types:
o During EDA, statistical summaries allow you to classify data types (nominal,
ordinal, interval, ratio) and choose the appropriate analysis method. This is
especially important when deciding between techniques like chi-square tests
for categorical data or regression for continuous data.
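For the categorical case, a chi-square test of independence on a hypothetical contingency table (SciPy):

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = two customer segments, columns = three product choices
table = np.array([[30, 20, 10],
                  [25, 25, 15]])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```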
7. Hypothesis Generation
Statistical Reasoning:
o During EDA, you often generate hypotheses about the relationships in the data
or patterns that might exist. Statistical reasoning helps you form questions
like:
Is the mean of one group different from another group?
Is there a significant relationship between two variables?
o This step is essential for guiding further analyses and model-building steps.
Conclusion:
Statistics is at the heart of Exploratory Data Analysis (EDA) because it helps uncover the
distribution, relationships, and patterns in data. By applying descriptive and inferential
statistical techniques, data scientists can better understand the data, identify problems such as
outliers or missing values, and prepare the data for subsequent modeling. This statistical
foundation is crucial for making informed decisions, refining models, and ensuring that
insights drawn from the data are valid and reliable.