EdYoda Data Scientist Program Curriculum

The document outlines the curriculum for EdYoda's Data Scientist Program. The program covers topics such as machine learning, data wrangling, mathematics, modeling techniques like linear regression and decision trees, ensemble methods, clustering, anomaly detection, and big data tools like Spark. It aims to provide strong fundamental concepts and industry-standard knowledge to help students gain skills in areas like frontend development, database usage, and creating websites.

EdYoda

Data Scientist Program

Program Curriculum

www.edyoda.com hello@edyoda.com
Learning outcomes:

• Strong fundamental concepts of Frontend Development

• Strong fundamental concepts of Server-side Development using REST APIs

• Work with databases seamlessly

• Create beautiful end-to-end websites

• Gain industry-standard knowledge

Data Wrangling

1. Black Box Introduction to Machine Learning

• What is not Machine Learning


• What is Machine Learning
• Types of ML - Supervised, Unsupervised
• Supervised - Classification, Regression
• Unsupervised - Clustering, Association
• Machine Learning Pipeline

2. Essential NumPy

• Introduction to NumPy
• Creation
• Access
• Stacking and Splitting
• Methods
• Broadcasting

3. Pandas for Machine Learning

• Introduction to Pandas
• Understanding Series & DataFrames
• Loading CSV, JSON
• Connecting databases

• Descriptive Statistics
• Accessing subsets of data - Rows, Columns, Filters
• Handling Missing Data
• Dropping rows & columns
• Handling Duplicates
• Function Application - map, apply, groupby, rolling, str
• Merge, Join & Concatenate
• Stacking, Unstacking & Melting
• Pivot-tables
• Normalizing JSON
• Application - EDA on Employee data, sales data

4. Understanding Visualization:

• Introduction to matplotlib & seaborn


• Basic Plotting
• Title, Labels, Legends, Grid, colormap, xticks, yticks
• Color, linewidth
• Sub Plotting
• Scatter plot
• Histogram
• Bar Graphs
• Plotting distributions
• Plotting 3D data
• Fundamentals of Tableau

Fundamental Maths for Data Scientists

1. Essential Maths & Statistics

• Essential Linear Algebra


• Matrix Operations
• Understanding distributions
• Probability Concepts

• Calculus
• Mean, Median, Mode, Quantile
• Other statistics Concepts
• Sampling Techniques

Machine Learning

1. Linear Models for Classification & Regression

• Simple Linear Regression using Ordinary Least Squares


• Gradient Descent Algorithm
• Regularized Regression Methods - Ridge, Lasso, Elastic Net
• Logistic Regression for Classification
• Online Learning Methods - Stochastic Gradient Descent & Passive Aggressive
• Robust Regression - Dealing with outliers & Model errors
• Polynomial Regression
• Bias-Variance Tradeoff
• Application - House Price, Cancer Prediction, Insurance Prediction

2. Preprocessing for Machine Learning

• Introduction to Preprocessing
• StandardScaler
• MinMaxScaler
• RobustScaler
• Normalization
• Binarization
• Encoding Categorical (Ordinal & Nominal) Features
• Imputation
• Polynomial Features
• Custom Transformer
• Text Processing
• CountVectorizer

• TfIdf
• HashingVectorizer
• Image Processing using skimage

3. Decision Trees

• Introduction to Decision Trees


• The Decision Tree Algorithms
• Decision Tree for Classification
• Decision Tree for Regression
• Advantages & Limitations of Decision Trees
• Application - Cloth Prediction

4. Naive Bayes

• Introduction to Bayes' Theorem


• Naive Bayes Classifier
• Gaussian Naive Bayes
• Multinomial Naive Bayes
• Bernoulli Naive Bayes
• Naive Bayes for out-of-core
• Application - Text Classification, Sentiment Analysis and Spam & Non-spam classification

5. Composite Estimators using Pipelines & FeatureUnions

• Introduction to Composite Estimators


• Pipelines
• Transformed Target Regressor
• FeatureUnions
• ColumnTransformer
• GridSearch on pipeline
• Application - Author classification

6. Model Selection & Evaluation

• Cross Validation
• Hyperparameter Tuning
• Model Evaluation
• Model Persistence
• Validation Curves
• Learning Curves
7. Feature Selection & Dimensionality Reduction

• Introduction to Feature Selection


• Variance Threshold
• Chi-squared stats
• ANOVA using f_classif
• Univariate Linear Regression Tests using f_regression
• F-score vs Mutual Information
• Mutual Information for discrete values
• Mutual Information for continuous values
• SelectKBest
• SelectPercentile
• SelectFromModel
• Recursive Feature Elimination
• PCA
• SVD
• Application - Credit Risk Prediction

8. Nearest Neighbors

• Fundamentals of Nearest Neighbor Algorithm


• Unsupervised Nearest Neighbors
• Nearest Neighbors for Classification
• Nearest Neighbors for Regression
• Nearest Centroid Classifier
• Application - Nearest neighbour for face inpainting

9. Clustering Techniques

• Introduction to Unsupervised Learning


• Clustering
• Similarity or Distance Calculation
• Clustering as an Optimization Function
• Types of Clustering Methods
• Partitioning Clustering - KMeans & Meanshift
• Hierarchical Clustering - Agglomerative
• Density Based Clustering - DBSCAN
• Measuring Performance of Clusters
• Comparing all clustering methods
• Application - Grouping similar customers

10. Anomaly Detection

• What are Outliers?


• Statistical Methods for Univariate Data
• Using Gaussian Mixture Models
• Fitting an elliptic envelope
• Isolation Forest
• Local Outlier Factor
• Using clustering methods like DBSCAN
• Application - Anomaly detection for credit risk prediction

11. Support Vector Machines

• Introduction to Support Vector Machines


• Maximal Margin Classifier
• Soft Margin Classifier
• SVM Algorithm for Classification
• SVM for Regression
• Hyper-parameters in SVM
• Application - Face recognition and breast cancer classification

12. Dealing with Imbalanced Classes

• What are imbalanced classes & their impact?


• OverSampling
• UnderSampling
• Connecting Sampler to pipelines
• Making classification algorithm aware of Imbalance
• Anomaly Detection
• Application - Fraud detection
13. Ensemble Methods

• Introduction to Ensemble Methods


• RandomForest
• AdaBoost
• Gradient Boosting Tree
• VotingClassifier
• XGBoost
• Application - Malicious data detection

14. Recommendation Engine

• Understanding distance vector calculation - cosine, euclidean, manhattan


• Types of Recommendation Engines
• Recommendation based on similarity
• Application - Grouping videos based on description, user rating prediction

15. Time Series Modeling

• Simple Average & Moving Average


• Single Exponential Smoothing
• Holt’s linear trend method
• Holt-Winters seasonal method
• ARIMA
16. Packaging & Deployment

• Creating Python Package

• Deploy trained model behind REST interface
• Deploy model behind API call
• Deploy on AWS cloud (optional)

Big Data Ecosystem

1. Introduction to Big Data

• Big Data
• Understanding distributed computing
• Introduction to Hadoop
• HDFS, YARN, MapReduce
• Limitations of Hadoop
• Introduction to Spark
• Introduction to Kafka
• Hive
• Cassandra

2. Internal Details of Spark

• Driver
• Executors
• Partitions
• Jobs
• Stages
• Tasks
• Resilient Distributed Dataset (RDD)
• DataFrames as a High-Level Data Structure

3. Foundations of Spark using RDD

• Basics of Distributed Computing


• Resilient Distributed Dataset
• Simple Transformations - map, filter, groupBy

• Actions - collect, count, foreach
• Complex API - combineByKey
• Caching, Debugging
• Important Configuration

4. Data Wrangling using DataFrames

• Creating DataFrames from collections


• Creating a DataFrame from CSV, JSON, etc.
• DataFrame Row
• DataFrame Column
• Creating tables from dataframe
• SQL query
• DataFrame Grouping
• DataFrame Functions
• User Defined Functions (UDF)

5. Packaging & Deployment of Spark Applications

• The spark-submit command


• Command line parameters
• Deploying the app programmatically
• Configuring your SparkSession
• Modularizing code
• Structure of the module
• Building an egg
• User defined functions in Spark
• Submitting a job
• Monitoring execution

Mindset for Problem Solving

1. Mathematical Aptitude
• Percentages
• Profit and Loss
• Simple Interest and Compound Interest
• Work And Time
• Probability
• Permutation and Combination
• Time & Speed
• Ratios and Proportions
• Data Interpretation

2. Art of Learning Anything


• What is Intelligence
• Relation of success with intelligence
• Illusion of Learning
• Focussed Mode vs Diffused Mode
• Procrastination
• Improving Recall
• Creating Brain Links
• Visual memory & Data Memory
• Slow Thinking

3. Computational Thinking
• Thinking before Doing/Coding
• Problem Identification
• Decomposition
• Pattern Recognition
• Abstraction
• Algorithm Design
• Computational Thinking Use Case 1

• Computational Thinking Use Case 2

4. Technical Puzzles
• Why are Puzzles part of interviews?
• The Art of solving puzzles
• Approach more important than the solution
• Puzzles for Vertical Thinking
• Puzzles for Horizontal Thinking

Productivity and Decision Making

1. Art of being Super Productive


• Start with Why to make objectives clear
• Thinking Limitless
• The magic of compounding returns
• Deciding what to work on
• Time Management Skills
• Measuring what matters
• Wisely choosing habits to inculcate

2. Effective Decision Making


• Why is decision making a key skill?
• Components of Decision Making
• Understanding common biases
• Not letting emotions clutter decision making
• Difference between quick decision making & slow decision making

Professional Communication

1. Reading comprehension & Short writing


• Building vocabulary
• Extracting insights from the textual information
• Drawing inferences from multiple stories
• Writing your inferences for others to understand

2. Book Reading & Writing Reviews
• Reading 10 books during the entire course & writing book reviews
• 2 Biographies
• 2 Fictions
• 6 Non-Fictions

3. Effective Understanding & Articulation


• Watching 20 movies from our suggested list
• Writing a 1000-word essay on those movies
• Writing a summary of the movies

4. Group Discussion for decision making


• Understanding why GD is so important in personal & professional life
• The objective of GD - Collectively making the right decision
• 5 GDs on various topics

5. Writing Professional chat/E-mail


• Writing as the most common method of professional communication
• Factors to keep in mind before starting to write
• Points to consider while writing
• Activities after writing
• Difference between chat writing & email writing

6. Making Impressive Presentations
• Why making a presentation is a professional job
• The objective of the presentation
• Attributes of a good presentation
• Why research is key to the presentation
• Making a presentation interactive
• Doing 10 video/live presentations

Computer Fundamentals

1. Operating System Concepts


• Operating System Architecture
• Processes and Process Management
• Threads and Concurrency control
• Scheduling
• Memory Management
• Inter-Process Communication
• Synchronization Constructs
• I/O Management
• Resource Virtualization
• Remote Services
• Distributed Systems
• Introduction to Data Center Technologies

2. Linux Administration
• Introduction to Linux Operating Systems
• Basic Linux Commands
• File Management and Security

• The directory structure of Unix
• User Management
• Groups
• Shell types and basic commands
• Permissions
• sudo
• Systemd Services Start and Stop
• Resource Management with systemctl
• Process Management (top, ps)
• Package Management (yum, apt, rpm)
• Managing disks (lsblk, df, mount, umount, du)
• File systems

3. Data Structures and Algorithms


• Built-in Data Type
o Integers
o Boolean
o Floating Point
o Character and Strings
• Derived Data Type
o List
o Array
o Stack
o Queue
• Linked List
o Singly Linked List
o Doubly Linked List
o Circular Linked List
• Array
• Stack
• Queue
• Tree
• Basic Operations
o Traversing
o Searching
o Sorting
o Hashing

o Insertion
o Deletion
o Merging
• Searching techniques
o Binary search
o Linear search
• Recursion
• Fibonacci series
• Sorting Algorithm
o Bubble sort
o Insertion sort
o Selection sort
o Quick sort
o Merge sort
o Bucket sort

4. Database concepts
• Introduction to Databases
• Entity Relationship Model
• Relational Model
• Relational Algebra
• Normalization
• Transactions and Concurrency Control
• DBMS Architecture - 2-level, 3-level
• Data Abstraction and Data Independence
• Database Objects
• Entity-Relationship Model
• Generalization
• Specialization
• Aggregation
• Entity Relationship Diagrams
• Keys in Relational Model
• Candidate key
• Super key
• Primary key
• Alternate key

• Foreign key
• Strategies for Schema design
• Schema Integration
• Data modelling
• Star Schema in Data Warehouse modelling
• Data Warehouse Modeling

5. Basic SQL - Syntax


• Data Types
• Operators
• Expressions
• Create Database
• Drop Database
• Select Queries
• Create Table
• Drop Table
• Other Table Operations
• Insert Query
• Where Clause
• AND & OR Clauses
• Update operations
• Delete operations
• Order By clause
• Group By Clause
• Sorting operations
• SQL Constraints
• Type of Joins
• Union Clause
• NULL Values
• Indexing
• Views

6. Software Engineering

• Software Engineering Overview
• Features of Good Software:
o Operational Features
o Transitional Features
o Maintenance Features
• Software Development:
o Requirement Gathering
o Software Design
o Programming
• Software Design
o Design
o Maintenance
o Programming
• Programming:
o Coding
o Testing
o Integration
• Software Development Life Cycle
o Requirement Gathering
o System Analysis
o Software Design
o Coding
o Testing
o Integration
o Deployment
o Operation and Maintenance
• Types of SDLC
o Waterfall model
o Iterative Model
o Spiral model
o V Model
• Agile Concepts
• DevOps Concepts
• Microservices Architecture
• Features of Microservices Architecture
• Software Requirements
• Software Design Basics
• Analysis & Design Tools
o Data Flow Diagram

o Flow Chart
• Design Strategies
o Function-Oriented Design
o Object-Oriented Design
• User Interface Design
o Command Line Interface (CLI)
o Graphical User Interface (GUI)
• Design Complexity
• Software Testing Overview
o Manual Vs Automated Testing
o Testing Approaches
o Black-box testing
o White-box testing
o Unit Testing
o Integration Testing
o Functionality testing
o Acceptance Testing
o Regression Testing
• Quality Control
• Deployment Methods
o Blue-Green Deployment
o Rolling Deployment
• Software Monitoring
• Software Maintenance

7. Tools
• Git
o What is Git?
o Installing Git
o First-Time Git Setup
o Git Basics
o Getting a Git Repository
o Recording Changes to the Repository
o Viewing the Commit History
o Undoing Things
o Working with Remotes
o Tagging
o Git Branching
o Basic Branching and Merging
o Branch Management

o Branching Workflows
o Remote Branches
o Rebasing

• PuTTY
o Installation
o Types of connections
o Connecting to a remote server
o Using Auth keys
o Customizing PuTTY

• Vim
o Vim Basics
o Insert Mode
o Visual Mode
o Command Mode
o Create and Edit a file
o Search and replace in Vim
o Vim diff
o Copy operations
o .vimrc file
o Vim Commands
