Introduction to data analytics
PH 618
# Lecture: 01
Topic: Introduction
by
Dr. Joy Prakash Das
Department of Physics
NIT Tiruchirappalli
Date: 06/01/2025 (Mon)
Questions
What is data?
Data and information?
Questions
➢ Examples
Syllabus
➢ Tools of probability
➢ Statistical inference
➢ Linear regression
➢ Resampling methods
➢ Tree based methods
Course Objectives
➢ To introduce the language and core concepts of
probability theory.
➢ To understand basic principles of statistical inference.
➢ To perform linear regression and apply various
classification techniques using software.
➢ To apply resampling methods to enhance model
performance.
➢ To introduce tree-based methods and foundational
concepts in deep learning, including neural networks
and convolutional networks.
Tools of probability
➢ Concept of Probability
➢ Random variables
➢ Central limit theorem
➢ Conditional probability
➢ Total probability theorem
➢ Bayes theorem
➢ Collecting Data
➢ Summarizing and Exploring Data
Statistical Inference
➢ Basic Concepts of Inference
➢ Inferences for Single Samples
➢ Inferences for two samples
➢ Z-test
➢ Student’s t-test
➢ Implementation in R
Linear regression
➢ Simple linear regression
➢ Multiple linear regression
➢ Qualitative predictors
➢ A few applications using a programming language
➢ Classification
➢ Qualitative variables
➢ Logistic regression
➢ Linear discriminant analysis
➢ Quadratic discriminant analysis
➢ Naive Bayes
➢ K-nearest neighbors
Resampling methods
➢ Validation set approach
➢ Leave-one-out cross-validation
➢ Bootstrap
➢ Linear model selection and regularization
➢ Subset selection
➢ Stepwise selection
➢ Shrinkage methods
➢ Nonlinear regression
➢ Polynomial regression, step functions, and splines
Tree based methods
➢ Decision trees
➢ Bagging
➢ Deep learning
➢ Single layer neural networks
➢ Multilayer neural networks
➢ Convolutional neural networks
Text Books and References
1. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani,
An Introduction to Statistical Learning with Applications in R
(2nd Edition), Springer, 2021
Lecture slides will be available at
tinyurl.com/PH618joy
Dropbox link
Lecture slides and e-books will be available at tinyurl.com/PH618joy
Plan of action
JAN 2025 to APR 2025
4 months, 14 weeks, 40 lectures
Topic                 # Lectures
Tools                 12
Regression            10
Resampling            10
Tree based methods    8
Evaluation
➢ Assessment at regular intervals of time will be carried out.
➢ The following examinations will be held.
Exam                    Marks   Tentative date
Cycle Test I (CT 1)     15      Feb last week
Cycle Test II (CT 2)    15      Mar last week
Assignments             20      TBA
End semester exam       50      May
➢ Out of 100, the passing marks will be 35 or Class-Average/2
whichever is higher.
➢ End semester: score at least 20%.
➢ Attendance: at least 75% (with medical reasons: 65%).
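The pass conditions above can be sketched in Python as a quick self-check. This is an illustration rather than the official rule: the function names are made up here, the 20% requirement is assumed to apply to the 50 end semester marks, and the three conditions are assumed to apply together.

```python
def pass_mark(class_average: float) -> float:
    """Pass mark out of 100: the higher of 35 and half the class average."""
    return max(35.0, class_average / 2)


def has_passed(total: float, end_sem: float, attendance: float,
               class_average: float, medical: bool = False) -> bool:
    """Check the three conditions listed above (hypothetical helper).

    total         -- total marks out of 100
    end_sem       -- end semester marks out of 50
    attendance    -- attendance in percent
    class_average -- class average of the total marks out of 100
    medical       -- True if the relaxed 65% attendance limit applies
    """
    min_attendance = 65.0 if medical else 75.0
    min_end_sem = 0.20 * 50  # assumption: 20% of the 50 end semester marks
    return (total >= pass_mark(class_average)
            and end_sem >= min_end_sem
            and attendance >= min_attendance)
```

For example, with a class average of 80 the pass mark rises from 35 to 40, so a total of 38 would not be enough under this reading.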
Evaluation
➢ Relative grading will be followed.
➢ Based on the performance, each student is awarded a final letter
grade at the end of the semester, in each subject. The letter
grades and the corresponding grade points are as follows:
➢ GPA (Grade Point Average) will be calculated as (C is credit)
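A minimal sketch of the GPA calculation, assuming the usual credit-weighted average: $C_i$ is the credit of course $i$, while $GP_i$ (a symbol introduced here for illustration) is the grade point earned in that course, and the sums run over all courses of the semester.

```latex
\[
  \mathrm{GPA} = \frac{\sum_i C_i \, GP_i}{\sum_i C_i}
\]
```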
Data analytics vs Data science
➢ One of the biggest differences between data analysts and
data scientists is what they do with data.
➢ Data analysts typically work with structured data to solve tangible
business problems using tools like SQL, R or Python programming
languages, data visualization software, and statistical analysis.
➢ Common tasks for a data analyst might include:
➢ Collaborating with organizational leaders to identify
informational needs
➢ Acquiring data from primary and secondary sources
➢ Cleaning and reorganizing data for analysis
➢ Analyzing data sets to spot trends and patterns that can be
translated into actionable insights
➢ Presenting findings in an easy-to-understand way to inform
data-driven decisions
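As a toy illustration of the cleaning and trend-spotting tasks listed above, the sketch below uses plain Python. The monthly sales figures are invented for the example, and dropping incomplete records is only one of several possible cleaning choices.

```python
from statistics import mean

# Made-up monthly sales records; None marks a missing value.
raw = [
    {"month": "Jan", "sales": 120.0},
    {"month": "Feb", "sales": None},
    {"month": "Mar", "sales": 150.0},
    {"month": "Apr", "sales": 165.0},
]

# Cleaning: drop records with a missing value.
clean = [r for r in raw if r["sales"] is not None]

# Analysis: average sales and the change between consecutive retained records.
average = mean(r["sales"] for r in clean)
changes = [b["sales"] - a["sales"] for a, b in zip(clean, clean[1:])]
trend = "rising" if all(c > 0 for c in changes) else "mixed"

# Presentation: a one-line summary for a non-technical reader.
print(f"{len(clean)} usable records, average {average:.1f}, trend {trend}")
```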
Data analytics vs Data science
➢ Data scientists often deal with the unknown by using more
advanced data techniques to make predictions about the future.
➢ They might automate their own machine learning algorithms or
design predictive modeling processes that can handle both
structured and unstructured data. This role is generally considered
a more advanced version of a data analyst.
➢ Some day-to-day tasks might include:
➢ Gathering, cleaning, and processing raw data
➢ Designing predictive models and machine learning algorithms
to mine big data sets
➢ Developing tools and processes to monitor and analyze data
accuracy
➢ Building data visualization tools, dashboards, and reports
➢ Writing programs to automate data collection and processing
Companies using data analytics
and many more…
Course outcome
Upon completion of this course, students will be able to
➢ gain a foundational understanding of probability concepts and
apply these concepts to real world data problems.
➢ develop skills in summarizing and exploring data, and perform
basic inferential statistics to draw conclusions from samples.
➢ apply regression analysis and classification methods to solve
practical data analysis problems.
➢ learn and implement various resampling methods, such as the
validation approach, cross-validation, and bootstrapping, to
evaluate model performance.
➢ build and interpret decision trees, and gain an introduction to
neural networks and deep learning.
[Image: A smart data analyst]
Thank You