Lec PH618.01

The document outlines a course on data analytics, focusing on core concepts of probability, statistical inference, linear regression, resampling methods, and tree-based methods. It includes course objectives, evaluation criteria, and a comparison between data analytics and data science. Upon completion, students will gain foundational skills in data analysis, model performance evaluation, and an introduction to neural networks and deep learning.


Introduction to data analytics

PH 618

Lecture: 01
Topic: Introduction
by

Dr. Joy Prakash Das


Department of Physics
NIT Tiruchirappalli
Date: 06/01/2025 (Mon)
Questions

What is data?

Data and information?

Questions

➢ Examples

Syllabus

➢ Tools of probability
➢ Statistical inference
➢ Linear regression
➢ Resampling methods
➢ Tree-based methods
Course Objectives

➢ To introduce the language and core concepts of probability theory.
➢ To understand basic principles of statistical inference.
➢ To perform linear regression and apply various classification techniques using software.
➢ To apply resampling methods to enhance model performance.
➢ To introduce tree-based methods and foundational concepts in deep learning, including neural networks and convolutional networks.
Tools of probability

➢ Concept of Probability
➢ Random variables
➢ Central limit theorem
➢ Conditional probability
➢ Total probability theorem
➢ Bayes theorem
➢ Collecting Data
➢ Summarizing and Exploring Data
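
As a minimal sketch of how the total probability theorem and Bayes theorem listed above fit together, the Python snippet below works through a made-up diagnostic-test example. The numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) and the function names are illustrative assumptions, not course material.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum_i P(B | A_i) * P(A_i) over a partition A_1..A_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posterior(priors, likelihoods):
    """P(A_i | B) for every event A_i in the partition."""
    evidence = total_probability(priors, likelihoods)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


# Hypothetical numbers: partition = (has condition, does not have condition)
priors = [0.01, 0.99]        # P(A_i)
likelihoods = [0.95, 0.10]   # P(positive test | A_i)

p_positive = total_probability(priors, likelihoods)
posterior = bayes_posterior(priors, likelihoods)
print(f"P(positive) = {p_positive:.4f}")
print(f"P(condition | positive) = {posterior[0]:.4f}")
```

Even with a fairly accurate test, the posterior probability stays low here because the assumed prior is small; that dependence on the prior is the point of the example.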

Statistical Inference

➢ Basic Concepts of Inference


➢ Inferences for Single Samples
➢ Inferences for two samples
➢ Z-test
➢ Student’s t-test
➢ Implementation in R
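
The course implements these tests in R; purely as an illustration, the sketch below shows a one-sample, two-sided Z-test using only Python's standard library. The sample values, the hypothesized mean `mu0`, and the known standard deviation `sigma` are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements; mu0 and sigma are assumed known in advance
sample = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 5.1, 5.2]
z, p = one_sample_z_test(sample, mu0=5.0, sigma=0.2)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Student's t-test follows the same pattern, but replaces `sigma` with the sample standard deviation and the normal distribution with a t distribution on n - 1 degrees of freedom.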

Linear regression

➢ Simple linear regression


➢ Multiple linear regression
➢ Qualitative predictors
➢ A few applications using a programming language
➢ Classification
➢ Qualitative variables
➢ Logistic regression
➢ Linear discriminant analysis
➢ Quadratic discriminant analysis
➢ Naive Bayes
➢ K-nearest neighbors
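
To make the first item above concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points and the function name `fit_simple_linear_regression` are hypothetical; in practice a statistics package would normally be used instead.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Least-squares estimates for the model y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying close to the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

intercept, slope = fit_simple_linear_regression(x, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Multiple linear regression generalizes this to several predictors, where the estimates come from solving a linear system rather than from two closed-form ratios.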

Resampling methods

➢ Validation set approach
➢ Leave-one-out cross-validation
➢ Bootstrap
➢ Linear model selection and regularization
➢ Subset selection
➢ Stepwise selection
➢ Shrinkage methods
➢ Nonlinear regression
➢ Polynomials, step functions, and splines
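
The two resampling ideas at the top of this list can be sketched in a few lines of standard-library Python. The helper names `k_fold_indices` and `bootstrap_standard_error`, the sample values, and the choice of 1000 resamples are all assumptions for illustration.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def bootstrap_standard_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]
    return stdev(estimates)


data = [2.3, 3.1, 2.8, 3.6, 2.9, 3.3, 2.7, 3.0]  # hypothetical sample

# Leave-one-out cross-validation is k-fold with k equal to the sample size:
# each fold holds out exactly one observation.
loo_folds = k_fold_indices(len(data), k=len(data))
print(loo_folds)

se = bootstrap_standard_error(data, mean)
print(f"bootstrap SE of the mean = {se:.3f}")
```

In a cross-validation loop, a model would be fitted on all indices outside a fold and scored on the fold itself; the bootstrap instead reuses the whole sample many times to gauge the variability of an estimate.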


Tree based methods

➢ Decision trees
➢ Bagging
➢ Deep learning
➢ Single layer neural networks
➢ Multilayer neural networks
➢ Convolutional neural networks
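
Decision trees, the first item above, are built by repeatedly choosing a binary split. As a rough sketch of that single step for a regression tree with one predictor, the code below searches for the threshold that minimizes the total squared error of the two resulting leaves. The data and the function names are hypothetical.

```python
from statistics import mean


def sum_squared_error(values):
    """Squared error of predicting every value by the group mean."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing total squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal x values
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


# Hypothetical data with an obvious jump between x = 3 and x = 10
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

threshold, left_mean, right_mean = best_split(x, y)
print(threshold, left_mean, right_mean)
```

A full tree applies the same search recursively inside each leaf, and bagging averages many such trees fitted to bootstrap resamples of the data.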

Text Books and References

1. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning with Applications in R (2nd Edition), Springer, 2021

Lecture slides will be available at tinyurl.com/PH618joy

Dropbox link

Lecture slides and e-books will be available at tinyurl.com/PH618joy

Plan of action

JAN 2025 to APR 2025: 4 months, 14 weeks, 40 lectures

Topic                 # Lectures
Tools                 12
Regression            10
Resampling            10
Tree based methods    8
Evaluation

➢ Assessment at regular intervals of time will be carried out.

➢ The following examinations will be held.

Exam                    Marks   Tentative date
Cycle Test I (CT 1)     15      Feb last week
Cycle Test II (CT 2)    15      Mar last week
Assignments             20      TBA
End semester exam       50      May

➢ Out of 100, the passing marks will be 35 or Class-Average/2, whichever is higher.

➢ End semester: score at least 20%.

➢ Attendance: at least 75% (with medical reasons: 65%).
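
The passing rule above is simple arithmetic, and a short sketch makes the "whichever is higher" part explicit. Two readings here are assumptions rather than statements from the slide: that "at least 20%" means 20% of the 50 end-semester marks, and that all three conditions must hold together. The function names are hypothetical.

```python
def passing_mark(class_average):
    """Passing mark out of 100: 35 or half the class average, whichever is higher."""
    return max(35.0, class_average / 2.0)


def meets_requirements(total, end_sem, attendance_pct, class_average,
                       medical=False):
    """Check the three stated conditions for one student.

    Assumption: "at least 20%" is read as 20% of the 50 end-semester marks,
    and all three conditions are required at once.
    """
    min_end_sem = 0.20 * 50
    min_attendance = 65.0 if medical else 75.0
    return (
        total >= passing_mark(class_average)
        and end_sem >= min_end_sem
        and attendance_pct >= min_attendance
    )


print(passing_mark(60))   # class average 60 -> passing mark stays at 35.0
print(passing_mark(80))   # class average 80 -> passing mark rises to 40.0
print(meets_requirements(total=42, end_sem=12, attendance_pct=70,
                         class_average=80, medical=True))
```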


Evaluation

➢ Relative grading will be followed.

➢ Based on the performance, each student is awarded a final letter grade at the end of the semester, in each subject. The letter grades and the corresponding grade points are as follows:

➢ GPA (Grade Point Average) will be calculated as (C is credit)
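
A common credit-weighted form of this calculation, assumed here rather than taken from the slide, is shown below. The symbols $C_i$ (credits of course $i$) and $GP_i$ (grade points earned in course $i$) are introduced only for this illustration.

```latex
\mathrm{GPA} = \frac{\sum_{i} C_i \, GP_i}{\sum_{i} C_i}
```

For example, under this assumed form, grade points of 8 in a 3-credit course and 10 in a 4-credit course would give $(3 \cdot 8 + 4 \cdot 10) / (3 + 4) = 64/7 \approx 9.14$.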

Data analytics vs Data science

➢ One of the biggest differences between data analysts and scientists is what they do with data.
➢ Data analysts typically work with structured data to solve tangible business problems using tools like SQL, R or Python programming languages, data visualization software, and statistical analysis.

➢ Common tasks for a data analyst might include:
➢ Collaborating with organizational leaders to identify informational needs
➢ Acquiring data from primary and secondary sources
➢ Cleaning and reorganizing data for analysis
➢ Analyzing data sets to spot trends and patterns that can be translated into actionable insights
➢ Presenting findings in an easy-to-understand way to inform data-driven decisions
Data analytics vs Data science

➢ Data scientists often deal with the unknown by using more advanced data techniques to make predictions about the future.
➢ They might automate their own machine learning algorithms or design predictive modeling processes that can handle both structured and unstructured data. This role is generally considered a more advanced version of a data analyst.
➢ Some day-to-day tasks might include:
➢ Gathering, cleaning, and processing raw data
➢ Designing predictive models and machine learning algorithms to mine big data sets
➢ Developing tools and processes to monitor and analyze data accuracy
➢ Building data visualization tools, dashboards, and reports
➢ Writing programs to automate data collection and processing
Companies using data analytics

and many more…


Course outcome

Upon completion of this course, students will be able to


➢ gain a foundational understanding of probability concepts and apply these concepts to real-world data problems.
➢ develop skills in summarizing and exploring data, and perform basic inferential statistics to draw conclusions from samples.
➢ apply regression analysis and classification methods to solve practical data analysis problems.
➢ learn and implement various resampling methods, such as the validation set approach, cross-validation, and bootstrapping, to evaluate model performance.
➢ build and interpret decision trees, and gain an introduction to neural networks and deep learning.

[Figure: A smart data analyst]
Thank You
