Probability and Applied Statistics
(BCSS 0052)
Probability and Statistical Inference
Lecture 1
Data Science: Why, What and How?
Pooja Pathak
pooja.Pathak@gla.ac.in
9897249598
What is Data Science?
What is Science?
‐ A systematic approach to finding the answer to a query.
‐ We want to know the reason ‐ How, Why, Where, Who…?
What is Data?
‐ Source of reliable information.
What is Data Science?
‐ A scientific approach to retrieving information from data.
Why is data needed?
Why do we collect the data?
To quench the thirst for knowledge.
We want to know the reason ‐ How, Why, Where, Who…?
Easiest option: get some data relevant to the question.
Answer the questions on the basis of the collected data.
Data and Statistics
Data: a very important source of information, but it cannot speak
for itself.
We cannot understand what data is telling us.
Statistics is the language of data.
How to collect the data, how to analyze it, how to draw
correct statistical inferences, and how to choose the correct
statistical tool for the numerical facts are together referred to as data analysis.
Statistics is a science of turning data into information to be used
for decision making.
Data and Statistics
Proper interpretation of inferences is important.
Statistics cannot do miracles.
Statistics cannot change the process or phenomenon.
It is a scientific way of extracting and retrieving information.
Why collect data?
• To verify theoretical findings,
• To draw inferences solely on the basis of the collected data,
• To develop statistical models, which can be further used
for policy decisions, classification, forecasting, etc.
Data and Statistics
Statistics is a language of data.
• Correct data, correct statistical tool: correct decision
• Wrong data, correct statistical tool: incorrect decision
• Correct data, wrong statistical tool: incorrect decision
• Wrong data, wrong statistical tool: incorrect decision
Rule: Garbage in – Garbage out
Statistics has its own derived rules.
These rules are framed so that the decisions taken are the correct
ones indicated by the data and the information hidden in it.
Statistics does forecasting, but not like an astrologer's parrot.
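A minimal sketch of "Garbage in – Garbage out" in Python. The income figures are hypothetical and chosen only for illustration: the same correct tool (the arithmetic mean) is applied to a correct data set and to a copy in which a single value was mis-keyed (45 typed as 4500).

```python
from statistics import mean

# Hypothetical monthly incomes (in thousands), used only for illustration.
correct_data = [32, 35, 38, 40, 41, 45, 47]

# The same records with one value mis-keyed (45 typed as 4500): "garbage in".
wrong_data = [32, 35, 38, 40, 41, 4500, 47]

# Apply the same (correct) statistical tool, the arithmetic mean, to both.
correct_mean = mean(correct_data)
wrong_mean = mean(wrong_data)

print(f"mean of correct data: {correct_mean:.1f}")  # mean of correct data: 39.7
print(f"mean of wrong data:   {wrong_mean:.1f}")    # mean of wrong data:   676.1
```

The tool works correctly in both cases; only the input changed, and a decision based on the second summary would be an incorrect one.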
Statistics and Data Science
How Statistics got transformed to Data Science?
What is expected from Data Science that was not expected from
“Statistics”?
The advent and rapid development of computers have had a major impact on
Statistics.
Earlier, it was difficult to collect data, and many times the data
was not even available.
Now, data is easily available, and often there is too much of it.
Big data analysis is the latest trend; data sizes are now measured in
petabytes.
Statistics, Computers and Data Science
Earlier, the emphasis was on theoretical developments in Statistics.
Computers helped in the development of “Computational
Statistics”.
When theory and mathematical analysis became complicated,
computational statistics supplemented them.
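As a small illustration, assuming the simple example of two fair dice (not taken from the lecture), the sketch below computes a probability exactly by counting outcomes and then approximates it by simulation, a basic Monte Carlo method. Here the theory is easy; the point is that a simulated estimate remains available even when an exact derivation is hard to obtain. The function names are hypothetical.

```python
import random


def exact_probability():
    """Exact P(sum of two fair dice >= 10), by counting all 36 outcomes."""
    favourable = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b >= 10)
    return favourable / 36


def simulated_probability(trials, seed=0):
    """Monte Carlo estimate of the same probability from `trials` simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(
        1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) >= 10
    )
    return hits / trials


print(exact_probability())              # 6/36, about 0.1667
print(simulated_probability(100_000))   # close to the exact value
```

Increasing `trials` makes the estimate more precise, at the cost of more computation.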
With computational support, the theoretical developments in
statistics gained more relevance and applications.
Computation and statistics became the two inseparable parts
of data science.
Statistics, Computers and Data Science
Once we venture into Computational Statistics, the role and
use of computers become very important.
Computers require programming language, software, data
management and several other aspects.
The areas of applications of statistics have increased.
Topics like artificial intelligence, machine learning, supervised
learning, unsupervised learning and reinforcement learning are based
on statistics, but they also depend heavily on computers.
Statistics, Computer Science and Data Science
Data science has various ingredients: statistics, mathematics,
computer science, ...
The objectives of statistics and data science are the same.
Statistics aims to extract the information contained in the data, and
so does data science.
Data science, when applied to different fields, can lead to incredible
new insights.
Statistics, Computer Science and Data Science
The only form of data that matters in decision science is digital
data.
Digital data is information that is not easily interpretable by an
individual; it depends on machines to interpret, process and/or
alter it.
What we see on a computer screen (text, photos, movies, etc.) is
digital data, which is essentially a systematic collection of
coded ones and zeros.
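A short sketch of the idea that text on a screen is stored as coded ones and zeros, assuming UTF-8 encoding (one common choice among several). It turns the word "Data" into bits and then decodes the bits back into the same text.

```python
# Encode a short piece of text as bytes (UTF-8 is an assumed choice of encoding).
text = "Data"
encoded = text.encode("utf-8")

# Show each byte as 8 binary digits: the ones and zeros a machine stores.
bits = " ".join(format(byte, "08b") for byte in encoded)
print(bits)  # 01000100 01100001 01110100 01100001

# Decoding the bits recovers the original text, so no information is lost.
decoded = bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")
print(decoded)  # Data
```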
Expectation from Data Scientist
What is needed to become a data scientist?
First decide what we want to become: a doctor or a compounder?
Decide:
Want to only use the tools?
Want to understand the utility of tools?
Or want to develop the tools?
In my opinion, all are needed.
Role of Statistics in Data Science
Statistics is the soul of data science.
• Descriptive statistics
• Probability theory
• Statistical inference
• Decision theory
• Bayesian inference
• Frequentist inference
• Parametric inference
• Nonparametric inference
• Multivariate analysis
• Linear regression analysis
• Nonlinear regression analysis
• Simulation techniques
• Monte Carlo methods
• ………
Role of Statistics in Data Science
Theoretical developments are essential, and they need to be
implemented through computational procedures.
Computational procedures have their own limitations, so
optimization methods are required.
Statistical, mathematical and optimization methods, etc. have to be
implemented simultaneously on a data set, and for that, data
management is required.
All these aspects are implemented logically and systematically,
and correct statistical inferences are drawn.
Role of Statistics in Data Science
Based on the obtained inferences, proper interpretations are made
and used for policy formulation, policy prescription and further
applications such as forecasting.
Without learning the basic tools of Statistics, it is not possible to
learn data science. Hence, proper knowledge of all these fields is
required to become a data scientist.
My role?
My job?