Data Science Specialization Track
Sno.  Subject                                        Lecture (L)  Practical Lab (P)
1     Foundations of Data Science with Python Lab    3            1
2     Statistical Modelling                          3            1
3     Big Data Overview                              3            1
4     Supervised & Unsupervised Learning             3            1
5     Neural Networks & Deep Learning                3            1
6     R for Data Science                             3            1
7     Data Visualization                             2            2
Total Credits: 28
Subject 1 Lecture: 3 Credits
Total Credits: 4 Lab: 1 Credit
FOUNDATIONAL DATA SCIENCE AND PYTHON LAB
Unit 1: Introduction to Data Science
Defining Data Science and Big Data, Benefits and Uses of Data Science and Big Data, Facets of
Data, Structured Data, Unstructured Data, Semi-Structured Data, Natural Language, Machine-
generated Data, Graph-based or Network Data, Audio, Image, Video, Streaming Data, Data Science
Process, Big Data ecosystem and data science, Distributed file systems, Distributed programming
framework, data integration framework, machine learning framework, NoSQL Databases,
scheduling tools, benchmarking tools, and system deployment, Introduction to Machine Learning,
History of AI, Applications of Data Science
Unit 2: Application of Data Science
Use cases and market trends of Data Science, Technologies for visualization, Bokeh (Python),
recent trends in various data collection and analysis techniques, various visualization techniques,
application development methods used in data science.
Unit 3: Setting up the Python Environment
Compiler vs. Interpreter, Statically vs. Dynamically Typed Languages, Introduction to Python,
Installing Python, Anaconda, Jupyter Notebook, Spyder, Components and Versions of Python,
Difference between Python 2 and Python 3, Python Distributions
Unit 4: Programming with Python
Python REPL, Variables, control structures, functions and objects, First-class functions, immutable
data, strict and non-strict evaluation, Recursion instead of an explicit loop state, Functions, iterators,
and generators, writing pure functions, functions as first-class objects, using strings, tuples and
named tuples, using lists, dicts, and sets, The itertools module, best practices and clean coding,
reading data files into Python, writing files, Introduction to Python libraries.
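As an illustration of several Unit 4 topics (generators, the itertools module, named tuples, recursion, pure functions, and functions as first-class objects), a minimal sketch follows. The names Reading, squares, scale, and apply_twice are hypothetical and chosen only for this example; they are not part of the course material.

```python
from collections import namedtuple
from itertools import islice, takewhile

# Hypothetical record type used only for this illustration.
Reading = namedtuple("Reading", ["sensor", "value"])


def squares():
    """Generator: yields 0, 1, 4, 9, ... lazily (non-strict evaluation)."""
    n = 0
    while True:
        yield n * n
        n += 1


def factorial(n):
    """Recursion in place of an explicit loop counter."""
    return 1 if n <= 1 else n * factorial(n - 1)


def scale(reading, factor):
    """Pure function: returns a new tuple and leaves its input unchanged."""
    return reading._replace(value=reading.value * factor)


def apply_twice(func, x):
    """Takes another function as an argument (first-class functions)."""
    return func(func(x))


# islice and takewhile consume only as much of the infinite generator as needed.
first_five = list(islice(squares(), 5))
below_twenty = list(takewhile(lambda x: x < 20, squares()))

original = Reading("s1", 2.0)
scaled = scale(original, 10)   # original is immutable and stays as it was
```

Here first_five and below_twenty both hold the first five squares, and original keeps its value after scale is called because named tuples are immutable.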
Unit 5: Data Preprocessing
Introduction to Pandas and Basic Concepts of Pandas, Data Cleaning and Preparation, Handling
Missing Data, Filtering Out Missing Data, Filling in Missing Data, Data Transformation, Removing
Duplicates, Transforming Data Using a Function or Mapping, Replacing Values, Renaming Axis
Indexes, Discretization and Binning, Detecting and Filtering Outliers, Permutation and Random
Sampling, String Manipulation, Feature Engineering.
Subject 2 Lecture: 3 Credits
Total Credits: 3 Lab: 0 Credits
STATISTICAL MODELLING
Unit 1: Basic Statistical Concepts
Introduction to Statistics, Classification of Statistical Methods, Descriptive Statistics, Inferential
Statistics, Scales of Measurement (Nominal, Ordinal, Ratio and Interval), Nominal Scales,
Ratio Scales, Mean, Median, Mode, Measures of Variability/Spread, Range, Quartiles and
Interquartile Range, Standard Deviation (SD), Measures of Shape, Skewness, Kurtosis.
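The descriptive measures listed in this unit can be computed with Python's standard statistics module. The sketch below uses a small made-up sample; the data values are an assumption for illustration only, and statistics.quantiles requires Python 3.8 or later.

```python
import statistics

# Made-up sample, used only to illustrate the measures named in this unit.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # middle value of the sorted sample
mode = statistics.mode(data)          # most frequent value
data_range = max(data) - min(data)    # simplest measure of spread

# Quartiles with the module's default ("exclusive") interpolation method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # interquartile range

sd = statistics.stdev(data)           # sample standard deviation (n - 1 divisor)
```

Other tools may use a different quartile convention, so the quartiles and the interquartile range can differ slightly between packages for the same data.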
Unit 2: Probability Theory
Principles of Counting, Introduction and Definitions of Probability Theory, Conditional Probability,
Bayes' Theorem, Discrete Probability Distribution, Covariance and Correlation, Continuous
Probability Distribution, Central Limit Theorem, Hypothesis Testing.
Unit 3: Matrices
Introduction to Matrices, Matrix Notations and Types, Matrix Equality, Operations on Matrices,
Determinants, Singularity of a Matrix, Orthogonal Matrix, Elementary Transformations and
elementary matrices, Echelon forms and echelon transformations, Matrix Rank and Normal Form of
a matrix, Vector Spaces and the axioms, Linear Dependence and Independence of vectors,
Consistency of a linear system of equations, Eigenvalues and eigenvectors, Cayley-Hamilton
Theorem, Linear Transformation and Orthogonal transformation, Matrix Factorization and Types.
Unit 4: Linear Algebra
Introduction to Linear Algebra, Notations in Linear Algebra, Important Concepts of Linear Algebra,
Definitions of Linear Algebra, Introduction to mathematical modelling, Applications of mathematical
modelling, Principles and stages involved in developing a mathematical model, Classification of
mathematical modelling, Conceptualizing a mathematical model, Concept of boundary conditions.
Unit 5: Statistical Modelling
Derived Variables, Basic Exploratory Data Analysis, Methods for EDA and Examples, Statistical
Modelling, Curve Fitting: Linear Regression, Nonlinear Regression
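For the curve-fitting topic, simple linear regression can be sketched with the closed-form least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name fit_line and the sample points are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x about its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1, so the fit should recover it.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
```

With noisy data the same formulas give the line that minimizes the sum of squared vertical residuals.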
Subject 3 Lecture: 3 Credits
Total Credits: 4 Lab: 1 Credit
BIG DATA OVERVIEW
Unit 1: Data Growth Explosion
Data is everywhere, Different sources of data, Types of data, Data explosion, What has led to the data
explosion?, Increase in Storage Capacities, Data Processing Abilities, Emerging Data Formats and
Data Availability, of Data Explosion
Unit 2: Categories of Data
Data Classification, Organization of structured data, Examples of structured data, How Structured
Data expands, Advantages and Disadvantages of Structured Data, What is unstructured data?,
Examples of Unstructured Data, Advantages and Disadvantages of Unstructured Data, What is
Semi-structured data?, Examples of semi-structured data, Advantages and disadvantages of semi-
structured data, Comparison of structured, unstructured and semi-structured data
Unit 3: Different Data Storage Mechanisms
Data Storage - An Introduction, Mechanisms of data storage, Introduction to Databases, Database
Architecture, Common Database Types, Tabular databases, Advantages and Limitations of tabular
databases, Entity Relationships and Tables, Characteristics of NoSQL databases, Types of NoSQL
Datastores, Advantages and disadvantages of NoSQL, Definition(s) of Big Data, Know the history,
How Big is Big Data?, Sources of Big Data, Characteristics of Big Data
Unit 4: Data Lake Essentials
What is a data lake?, Key attributes of a data lake, Traditional Analytics Pipeline, Data Lake Pipeline,
How Data Lake Compares to Enterprise Data Warehouse, Components of a Data Lake – Ingestion,
Components of a Data Lake – Storage, Components of a Data Lake - Catalogue and Search,
Components of Data, Benefits, Use cases of data lake, Stores used in data lake, Data Processing
Requirements, Scalability, Improve the Availability and Performance of Systems, Elasticity,
How to measure scalability?
Unit 5: Big Data Ecosystem
The Big Data Ecosystem, Big Data storage, NoSQL Databases, Distributed File Systems, Big Data
Processing, MapReduce - An Introduction, Map, Reduce, Other User Interfaces of MapReduce, An
Example for MapReduce – Wordcount, Daemons of MapReduce, Key Benefits of using
MapReduce, Use case examples, Data Locality, Categories of Data Locality, Advantages of Data
Locality, Challenges and Ways to Optimize Data Locality, Resiliency, Fault Tolerance
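The word-count example named in this unit can be sketched in plain Python to show the three logical stages: map, shuffle (grouping by key), and reduce. This is a single-process illustration under that assumption, not the Hadoop MapReduce API; the function names and the input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


# Made-up input; in a real cluster each line could be handled by a different mapper.
lines = ["big data is big", "data is everywhere"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a distributed framework the map and reduce calls run on many nodes and the shuffle moves data between them, which is where data locality and fault tolerance become relevant.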
Subject 4 Lecture: 3 Credits
Total Credits: 4 Lab: 1 Credit
SUPERVISED & UNSUPERVISED LEARNING
Unit 1: Difference Between Supervised and Unsupervised Learning
Machine learning, why we need machine learning, machine learning process, types of
learning: supervised, unsupervised and reinforcement learning, labeled data and its types,
classification and regression models, unlabeled data and its types, clustering model, Gradient Descent -
Overview, Gradient Descent, Finding a Minimum Using Gradient Descent, Estimating the Gradient, Using
the Gradient Descent, Example, Loss Function, Different Loss Functions
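The topics "Estimating the Gradient" and "Finding a Minimum Using Gradient Descent" can be illustrated with a short sketch. It assumes a one-dimensional squared-error style loss, (x - 3)^2, and estimates the gradient with a central difference; the loss, learning rate, and step count are assumptions chosen for this example.

```python
def loss(x):
    """Example loss with its minimum at x = 3."""
    return (x - 3.0) ** 2


def estimate_gradient(f, x, h=1e-5):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient_descent(f, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the estimated gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * estimate_gradient(f, x)
    return x


x_min = gradient_descent(loss, x0=0.0)   # approaches 3.0
```

A learning rate that is too large makes the iterates overshoot and diverge, while one that is too small makes convergence slow.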
Unit 2: Regression Techniques
Regression Technique, Origin of Regression, Regression in Real World, regression concepts, Regression
Types, Linear Regression Types, Linear Regression Variance, Co-Variance, Linear Regression Correlation
Coefficient, OLS, R Squared, Goodness of fit, Linear Regression Using Gradient Descent, Gradient Descent
Explained with an Example, Stochastic Gradient Descent, Cost Function – Partial Derivative, Testing Model
Using Cross Validation, Cross Validation Types, Regularized Regression, Ridge Regression, Lasso Regression,
L1 vs L2 Norm – Regression, Generalized Linear Regression, Random Component of a GLM
Unit 3: Classification Techniques - Decision Trees & Naïve Bayes
Classification Technique, Decision Tree, Decision Tree Illustration using Sample Dataset, concept of
homogeneity, entropy, Entropy Explained with Rainfall Example, plot of entropy versus the proportions,
Information Gain, Algorithms to Create a Decision Tree, Gini Index, Truncation and Pruning, Decision Tree
Working Methodology, Decision Tree Tuning Parameters, Naïve Bayes, Bayes' theorem, Example, Naïve
Bayes Algorithm for Categorical Data, Popular Naïve Bayes Classifiers, Types of Naïve Bayes Classifier,
Naïve Bayes for Text Classification, Naïve Bayes Algorithm, K Nearest
Neighbour classification, Curse of Dimensionality, K-Factor, Implementation of KNN using Python
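Entropy and information gain, which decision-tree algorithms use to choose splits, can be sketched as follows. The "rain"/"dry" labels are made-up data in the spirit of a rainfall example and are not taken from the course material; the function names are likewise assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up labels: a perfectly mixed parent node and one candidate split.
parent = ["rain"] * 5 + ["dry"] * 5
split = [["rain"] * 4 + ["dry"], ["rain"] + ["dry"] * 4]

parent_entropy = entropy(parent)          # 1 bit for a 50/50 mix
gain = information_gain(parent, split)    # positive: children are more homogeneous
```

A split that produces purer (more homogeneous) child nodes yields a larger information gain, and a node containing a single class has zero entropy.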
Unit 4: Dimensionality Reduction
Introduction, Singular Value Decomposition, SVD code: Principal Component Analysis (PCA), Isometric
Maps (Isomaps), Multidimensional Scaling (MDS), ISOMAPS with MDS, ISOMAPS (Code), Visualizing the
ISOMAPS Data, Applying PCA on the Same Data, Visualization of PCA, Feature Selection Techniques,
Wrapper Method
Unit 5: Clustering
What is Clustering and Why is it Important?, Techniques in Clustering, K-Means Clustering, Steps for K-
Means Algorithms, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Types of Points in DBSCAN, DBSCAN
Example, DBSCAN: Advantages, DBSCAN: Disadvantages, Hierarchical Clustering, Dendrograms,
Hierarchical Clustering Code, DBSCAN Dendrogram Visualization
Subject 5 Lecture: 3 Credits
Total Credits: 3 Lab: 0 Credits
NEURAL NETWORKS AND DEEP LEARNING
Unit 1: RNN
Gradient Descents, Gradient Descent Terminologies, Types of Gradient Descents, Recurrent Neural
Network, Using MLP instead of RNN, Steps in Recurrent Neuron, RNN Mathematically, Example of
Feedforward Propagation, Backpropagation, Steps in Back Propagation, Limitations of RNN, Long
Short Term Memory (LSTM), Architecture of LSTM, Gates in LSTM, Forget Gate, Input Gate, Output
Gate, Predicting the next character using RNNs, Hopfield Network, Gated Recurrent Unit (GRU),
GRU Reset Gate, Bidirectional RNN
Unit 2: Deep Learning
Introduction to Deep Learning, Deep Learning Subset of AI and ML, Machine Learning vs Deep
Learning, Deep Learning Network Structure, Types of Deep Learning Networks, Convolutional Neural
Network (ConvNet), Tensor, Introduction to TensorFlow, Advantages of TensorFlow, Deep Learning
Libraries, Creating a Deep Learning Network using TensorFlow
Unit 3: Boltzmann Machines
Introduction to Boltzmann Machines, Working of Boltzmann Network, Restricted Boltzmann
Machines, Working, Deep Boltzmann Machine (DBM), DBM Training, Collaborative Filtering using
Boltzmann Machines, Collaborative Filtering Using RBM, RBM Net Architecture, Markov Random
Fields
Unit 4: Deep Belief Networks
Introduction to Deep Belief Network, Stacking RBM to create Deep Belief Network, Working of DBN,
Greedy Layer Wise Learning, Need of Fine Tuning, Wake Sleep Algorithm
Unit 5: Modern Statistical Concepts
Model Free Confidence Intervals, Confidence Interval Data Requirements, construct confidence
interval, Jackknife Regression, Hidden Decision Trees, learn about confidence intervals, define
jackknife regression, Probabilistic Graphical Models (PGM), Bayesian Network (BN), Inference in
Bayesian Network, Explain graphical models, Describe better goodness of fit and yield metrics
Subject 6 Lecture: 3 Credits
Total Credits: 4 Lab: 1 Credit
R FOR DATA SCIENCE
Unit 1: Getting Started with R and R Workspace
Introducing R, R as a programming Language, the need of R, Installing R, RStudio, RStudio’s user
interface, console, editor, environment pane, history pane, file pane, plots pane, package pane, help
and viewer pane. R Workspace, R’s working directory, R Project in RStudio, absolute and relative
path, Inspecting an Environment, Library of Packages, Getting to know a package, Installing a
Package from CRAN, Updating Package from CRAN, Installing package from online repository,
Package Function, Masking and name conflicts.
Unit 2: Basic Objects and Basic Expressions
Vectors, Numeric Vectors, Logical Vectors, Character Vectors, subset vectors, Named Vectors,
extracting element, converting vector, Arithmetic operators, create Matrix, Naming row and columns,
subsetting matrix, matrix operators, creating and subsetting an Array, Creating a List, extracting
element from list, subsetting a list, setting value, creating a data frame, subsetting a data
frame, setting values, factors, useful functions of a data frame, loading and writing data on disk,
creating a function, calling a function, dynamic typing, generalizing a function. Assignment
Operators, Conditional Expression, using if as expression and statement, using if with vectors,
vectorized if, if else, using switch, using for loop, nested for loop, while loop.
Unit 3: Working with Basic Objects and Strings
Working with object function, getting data dimensions, reshaping data structures, iterating over one
dimension, logical operators, logical functions, dealing with missing values, logical coercion, math
function, number rounding functions, trigonometric functions, hyperbolic functions, extreme
functions, finding roots, derivatives and integration, Statistical function, sampling from a vector,
Working with random distributions, computing summary statistics, covariance and correlation matrix
Unit 4: Working with Data – Visualize and Analyze
Reading and Writing Data, importing data using built-in functions, readr package, export a
data frame to file, reading and writing Excel worksheets, reading and writing native data files, loading
built-in data sets, create scatter plot, bar chart, pie chart, histogram and density plots, box plot, fitting
linear model and regression tree.
Unit 5: Introduction to Statistics
Role of statistics in scientific methods, current applications of statistics. Scientific data gathering:
sampling techniques, scientific studies, observational studies, data management. Data description:
displaying data on a single variable (graphical methods, measure of central tendency, measure of
spread), displaying relationship between two or more variables, measure of association between
two or more variables.
Subject 7 Lecture: 2 Credits
Total Credits: 4 Lab: 2 Credits
DATA VISUALIZATION
Unit 1: Introduction to Visualization
Visualization, History and Evolution of Visualization, Need of Visualization. Data, types of data and
their representations. Effectiveness of Dataset in Visualization, Visualization Principles, Scope of
Data Visualization in Business, Data Visualization Use Cases, Tools for Data Visualization.
Unit 2: Data Visualization Techniques
Basic Data Visualization Techniques: Bar charts and column charts, Line charts, Pie charts,
Histograms and Density plots, Scatter plots, Advanced Data Visualization Techniques: Heatmaps,
Box plots, Treemaps, Geospatial visualization, Advanced Visualization features: Customizing
visualization: colors, labels, interaction. Implementing slicers and filters for interactivity,
Introduction to DAX, Aggregation Functions, Logical Functions, Variables in DAX, Advanced
Analytics, Creating Forecasting and Trend Analysis.
Unit 3: Data Preparation and Transformation Techniques
Data cleaning, Feature engineering: Feature extraction, deriving new features, finding relationships
between features, data normalization, data merging & appending.
Unit 4: Interactive Data Visualization
Understanding Dashboards, Creating Simple Dashboard, creating simple storyboards, working with
real world data, Building interactive dashboards and reports, Ethical considerations in Data
Visualization.