Data Science Course Content Yashoda Technologies
Data Science Introduction
Introduction to Data Science
What is Data Science, and why is it important?
Case Studies and Real-Time Examples
How Data Science can help us in each domain
Data Science Project Cycle Overview
How is Data Science different from Business Intelligence & Reporting?
Data science Career opportunities and Roles
What are the basic skills required to learn Data science?
Introduction to Big Data Analytics
Big data and analytics?
Why Big Data?
What is Big Data?
Big Data Scalability
Big Data Architectures
Handling Big Data
Introduction to Hadoop
Hadoop Distributed File System (HDFS)
Map and Reduce blocks
What is Hive?
What is Pig?
Python For Data Science
Introduction
Why Python for Data Science?
Working Environment Setup
Python Distributions for Data Science
Python 2.x vs. Python 3.x
Installing the Anaconda Distribution
Jupyter Notebook
o Basics
o Magic Functions
Extracting Data
o From Databases
o Through APIs
o Using Web Scraping
Reproducible Script for Getting Data
Public Datasets
Exploring and Processing Data
Introduction to NumPy and Pandas
Investigating Basic Structure
Selection, Indexing, and Filtering
Basic Statistics with Python
Univariate Distributions and Distribution Plots
Grouping and Aggregation
Crosstab
Pivot Table
Data Munging
Missing Values
Missing Value Imputation Techniques
Treating Missing Values Using Pandas
Detecting and Treating Outliers Using Pandas and NumPy
Feature Creation Using Pandas and NumPy
Categorical Feature Encoding: Binary Encoding Using Pandas
Reproducible Script for Data Processing Using Pandas and NumPy
Introduction to R Programming
A Primer on R Programming
History of R
R Overview
Why R?
Why Learn R Programming?
Install R on Windows (or Install R on Linux or Install R on Mac)
Hello World in R
Editors and IDEs for R
Install RStudio on Windows (or on Linux or on Mac)
RStudio Desktop Overview
Built-In Help
a. Using Help Commands
b. Using Demo Commands
c. Using Vignettes
Web Search
Community Support
a. Mailing List
b. Forums
c. Blogs
R - Variables
a. Naming Convention
b. Naming Guide
c. Assign Variable
Environments and Variables
Operators
a. Arithmetic Operators
b. Special Numbers: Inf, NaN, NA
c. Logical Operators
Vectorized Operations
a. Types of Vectorized Operations
Data Structures in R
a. Atomic Vectors & Common Operations on Atomic Vectors
b. Factors & Operations
c. Lists & Common Operations on Lists
d. Data Frames & Common Operations on Data Frames
e. Matrices & Common Operations on Matrices
f. Arrays & examples
Functions
a. Overview
b. Components
c. Naming Guidelines
d. Argument Matching
e. Arguments with Default Values
f. Additional Arguments Using Ellipsis
g. Lazy Evaluation
h. Multiple Return Values
i. Functions as Objects
j. Anonymous Function
R - Flow Control
a. If
b. If-Else
c. Multiple If-Else
d. Switch
e. Vectorized If
f. Repeat
g. Repeat With Break
h. Repeat With Next
i. While
j. For
k. Apply
l. Functions in Apply Family
R - Packages
a. About R Package
b. Load R Package
c. Install R Package
d. Manage R Package
R - Import Data
a. Working Directory
b. Import CSV Files
c. Import Table
d. Import from URL
e. Import XML Files
f. Import Excel Files
g. Import Other File Types
h. Import Built-In Datasets
i. Import from Database
j. Import Database Using RODBC Package
Miscellaneous
a. Creating new variables or Updating Existing Variables
b. String Manipulations
c. Subsetting data from matrices and data frames
d. Melting and casting data between long and wide formats
e. Merging data frames
Statistics Basics
Basics of Statistics
Definitions of Basic Statistical Terms
o Three Ms (Mean, Median, and Mode)
o Variance
o Standard Deviation
Significant Difference
o Significance & P-value
Correlation
o Positive
o Negative
o No Correlation
Spurious correlation
Correlation vs causation
Sampling
Business Statistics
Data types
Variables
o Continuous Variables
o Ordinal Variables
o Categorical Variables
Time Series
Miscellaneous
Descriptive Statistics
Sampling
o Need of Sampling
o Types of Sampling
Simple random sampling
Systematic sampling
Stratified sampling
Data distributions
Normal Distribution and its characteristics
Binomial Distribution
Inferential Statistics
Hypothesis Testing
Type I error
Type II error
Null and alternative hypotheses
Rejection or acceptance criterion
Exploratory data analysis and visualization
Working with data
Getting data into R
o Reading from files, Connecting to DB
Data Munging
Cleaning and preparing the data
o Converting data types (Character to Numeric etc.)
Handling Missing values
o Imputation or replacing with placeholder values
Cleaning Data with tidyr
o What is tidy Data?
o Wide to Long Conversion
o Long to Wide Conversion
o Splitting Cells
o Joins in dplyr
Data Filtering and Querying with dplyr and data.table
o Queries at Row and Column Level
o Combined Queries
o Converting to data.table
o Filtering Big Data
Data Visualizations with R
Visualization in R using ggplot2 (plots and charts)
o Histograms
o Bar charts
o Boxplots
o Scatterplots
Adding more dimensions to the plots
o Geoms, dodge positioning, etc.
Visualization using Tableau
Introduction to Machine learning
Machine Learning Basics
Spam Classification
Performance Metrics
o Accuracy & F1 Score
o Precision and Recall
Types of Machine Learning
o Supervised
o Unsupervised
Machine Learning Workflow
o Data Preparation
o Algorithm Selection
o Training Process
o Testing Model's Accuracy
o Improving Model Performance
Statistical Modelling
Linear Regression
Modeling relationships between Variables using Regression
Understanding Simple Regression Models
Solving the Regression Problem
Residuals and the Regression Assumptions
R-squared as Proportion of Variance Explained
Prediction Using Simple Regression
Least Squares (Minimizing the Sum of Squared Residuals)
Multiple Regression in R
Disadvantages of Linear Models
Logistic Regression
Importance of Logistic Regression
Modeling relationships between Variables using Regression
Applications of Logistic Regression
i. Analysis
ii. Allocation
iii. Prediction
iv. Classification
Understanding the S-curve
Maximum likelihood estimation (MLE)
Confusion Matrix
ROC Curve
Logistic Regression and Linear Regression – Similarities and Differences
Advantages and disadvantages of logistic regression models
Underfitting vs. Overfitting
Cross validation
K-Fold Cross validation
Decision Trees
Classification and Regression Trees (CART)
Process of Tree Building
Entropy and Gini Index
Problem of Overfitting
Pruning a tree back
Trees for Prediction (Linear) – example
Trees for Classification models – example
Advantages of tree-based models
KNN – K Nearest Neighbors
Advantages and disadvantages of KNN
Resampling and Ensemble Methods
Bagging
Random Forests
Boosting – Gradient Boosting Machines
Advanced methods
Support Vector Machines (SVM)
Probabilistic methods
Naïve Bayes
Unsupervised Learning
Cluster Analysis
o Hierarchical clustering
o K-Means Clustering
o Distance measures
o Applications of Cluster Analysis
Principal Component Analysis (PCA)
Advantages of Principal Components
Applications of PCA
Time series analysis – Forecasting
Simple Moving Averages
Exponential Smoothing
Time series decomposition
ARIMA
Association Rules (Market Basket Analysis)
Apriori
Recommender Systems
Collaborative filtering
o User based filtering
o Item based filtering
Text Analytics
Introduction to Natural Language Processing (NLP)
Finding frequently occurring words in a document corpus
WordCloud
Term Document Matrix
Sentiment Analysis
Text classification models (Spam Detection)
Introduction to Deep learning
Overview of Deep Learning
Relationships among
o Artificial Intelligence (AI)
o Machine Learning (ML)
o Deep Learning (DL)
Artificial Neurons
Neural networks
Deep Neural networks
Deep learning Techniques
Convolutional Neural Networks
Recurrent Neural Networks
Fully Connected Neural Networks
Generative Adversarial Networks
Deep Reinforcement Learning
Deep Learning Applications
Tables
Text files
Audio files
Video files
Impact of Deep Learning