Course Outline
Workshop on Data Science and Analytics
Course Description
Internet-based digital businesses have grown rapidly in the last decade, with Covid-19 providing a
further impetus. As such businesses multiply, the amount of data they generate is also growing by
leaps and bounds, and is now measured in exabytes and yottabytes [1]. With so much data being
accumulated, it is imperative to have a scientific mechanism for extracting knowledge from it, and
data science provides such a method. A data scientist is expected to possess skills in mathematics,
statistics, machine learning, databases and other branches of computer science, along with a good
understanding of problem formulation, in order to create an effective solution. This course
introduces students to the rapidly growing field of data science and analytics and its applications
in different functional areas of business. Students will be exposed to various aspects of data
science practice, including data collection and integration, exploratory data analysis, predictive
modelling, descriptive modelling and solution presentation.
Learning Outcomes
On completing this course, students will be able to:
• Explain the concepts of data science and its components.
• Develop the skill sets required to be a data scientist.
• Understand the applications of statistical tools.
• Use the Python programming language for statistical modelling and analysis.
• Carry out exploratory data analysis (EDA).
• Apply machine learning algorithms (linear regression, logistic regression) for predictive
modelling.
[1] Tibi Puiu, “How big is a petabyte, exabyte or yottabyte? What’s the biggest byte for that matter?”, ZME Science,
accessed on April 20, 2020, https://www.zmescience.com/science/how-big-data-can-get/
Evaluation methods:
Students will be evaluated on the basis of:
Class participation: 10%
Quiz: 30%
Individual Assignment: 10%
Group Project: 50%
The session-wise contents are listed below; each session runs for 75 minutes.
Session Number Topics Learning Resource
1 Introduction: What is Data Science?
- Why Data Science?
- Current landscape of perspectives
- Discussion on skills required
2-3 Data Science Toolkit
- Understanding Python as a programming environment
- Setting up the environment
  - Python and Jupyter Notebook
  - First Python sheet
4-5 Python Constructs
- Lists, dictionaries and tuples
- Strings
- Iterations
6-7 Libraries and Packages
- NumPy and pandas
8-10 Exploratory Data Analysis (EDA)
- Basic tools of EDA (plots, graphs and summary statistics)
- Reading data from various sources and platforms
- Data cleaning and missing value imputation
11 Statistical Inference through statsmodels
- Populations and samples
- Statistical modelling, probability distributions, concept of hypothesis testing
  (one-sample, two-tailed test)
12-13 Introduction to Machine Learning Algorithms and Applications
- Supervised machine learning
- Simple linear regression
- Multiple linear regression
14 Introduction to classification techniques and related steps
- Logistic regression (classification technique)
- Data science ethics
15-16 Recap of the skill sets
- Student presentations on projects
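
Illustrative exercise: to give a flavour of the hands-on work, the following is a minimal sketch of simple linear regression, one of the topics listed for sessions 12-13, fitted by ordinary least squares. It uses only the Python standard library so that it runs without any setup. The function name, variable names and toy numbers are assumptions made up for illustration; they are not course material, and in class the same idea would more likely be explored with the libraries named in the outline.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data (made up for illustration): sales follow 1 + 2 * spend exactly.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(spend, sales)
predicted = intercept + slope * 6.0  # prediction for an unseen spend value of 6.0

print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction at 6.0={predicted:.2f}")
```

Because the toy data lie exactly on a straight line, the fitted slope and intercept recover that line; real data would leave residuals, which is where model evaluation comes in.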
Recommended Textbooks
Python for Data Analysis, by Wes McKinney (O’Reilly)
Python for Everybody, by Charles Severance
Doing Data Science, by Rachel Schutt (O’Reilly)
Python Data Science Handbook, by Jake VanderPlas
Reference Books
Mastering Python for Data Science, by Samir Madhavan
Hands-On Data Analysis with NumPy and Pandas, by Curtis Miller