What Is Data Science and Machine Learning


Complete Data Science and Machine Learning Using Python

By
Jitesh Khurkhuriya

© Jitesh Khurkhuriya
Data Growth – IDC-Seagate, November 2018

175 ZB by 2025

1 ZB = 1,000,000,000 TB
Majority of the data is unstructured
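
The unit conversion on this slide can be checked with a few lines of Python. This is a minimal sketch assuming decimal (SI) units, where 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes; the constant and function names are illustrative and not part of the original deck.

```python
# Assumption: decimal (SI) storage units, 1 TB = 10**12 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21


def zb_to_tb(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_TB


print(f"1 ZB   = {zb_to_tb(1):,} TB")
print(f"175 ZB = {zb_to_tb(175):,} TB")
```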

It’s not just about…..

Application of Data Science and Machine Learning

• Automobile: Preventive Maintenance
• Banking: Fraud/Default Prevention
• Healthcare: Predict Disease
• Media: Content Personalisation
• Telecom: Increase Sales

Heard On The Streets
• IDC FutureScape – Two-thirds of Global 2000 enterprise CEOs will
centre their corporate strategy on digital transformation, including
machine learning (ML) solutions.

• Harvard Business Review – Data Scientist: The Sexiest Job of the
21st Century

• McKinsey Report – 45 percent of work activities could potentially
be automated by currently demonstrated technologies; machine
learning can be an enabling technology for the automation of
80 percent of those activities.

• Microsoft CEO Satya Nadella – called out machine learning, and
the big data that powers it, as a key development in his memo to
Microsoft last July.

Benefits of Data Science and Machine Learning
✓ Faster decisions

✓ Develop insights that are beyond human capabilities

✓ Act at the right time and take advantage of opportunities,
converting them into closed deals.

Types of Analytics
[Figure: analytics maturity chart; axis labels: Difficulty Level, Past/Present]

From least to most difficult:
• Descriptive Analytics (Hindsight): What happened?
  e.g. Sales are up/down; customer left/leaving
• Diagnostic Analytics (Insight): Why did it happen?
  e.g. Poor customer service; cheaper alternative
• Predictive Analytics (Foresight): What will happen?
  e.g. Will this customer go?
• Prescriptive Analytics: How can we make/prevent it?
  e.g. What’s the best method to retain the customer?

Courtesy – Gartner Report and analysis
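
The four questions can be walked through on the customer-churn example from the slide. The sketch below uses made-up records and a deliberately naive "model" (the historical churn rate of a segment); the data, function names, and the 0.5 threshold are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (service_rating, churned)
customers = [
    ("poor", True), ("poor", True), ("poor", True), ("poor", False),
    ("good", False), ("good", False), ("good", True), ("good", False),
]

# Descriptive: what happened? Overall share of customers who left.
churn_rate = sum(churned for _, churned in customers) / len(customers)

# Diagnostic: why did it happen? Churn rate broken down by service rating.
by_rating = defaultdict(list)
for rating, churned in customers:
    by_rating[rating].append(churned)
rate_by_rating = {r: sum(v) / len(v) for r, v in by_rating.items()}


def predict_churn_probability(rating):
    """Predictive: use the segment's historical rate as a naive estimate."""
    return rate_by_rating[rating]


def recommend_action(rating, threshold=0.5):
    """Prescriptive: a hand-written rule applied on top of the prediction."""
    if predict_churn_probability(rating) > threshold:
        return "offer retention deal"
    return "no action"


print(churn_rate, rate_by_rating, recommend_action("poor"))
```

In practice the predictive step would be a trained model rather than a lookup table, but the division of labour between the four stages stays the same.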

What is Data Science?

Data Science sits at the intersection of:
• Mathematics, Statistics
• Programming, Data Preparation
• Machine Learning
• Domain Knowledge, Subject Matter Expertise

Data Science lifecycle (a repeating cycle):
• Business Case and Discovery
• Data Processing
• Model Planning
• Model Building and Selection
• Present Result
• Deploy

Business Case and Discovery

• What’s the end goal?
• Stakeholder discussions
• How much time and budget do we have?
• Past attempts
• What kind of data is available?

Data Processing

• Data Mapping
• Data Cleaning: Data Quality, Missing Data, Noisy Data, Outlier Treatment
• Data Transformation: Format Conversion, Data Normalization, Statistical Imputation, Feature Engineering
• Sample the Data: Data Sampling, Data Split, Data Binning
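
Several of the steps listed above can be sketched with the Python standard library alone. The example assumes a single numeric column with one missing value and one outlier; mean imputation, IQR clipping, min-max scaling, and equal-width binning are just one common choice for each step, and all function names are illustrative.

```python
import random
import statistics


def impute_missing(values):
    """Missing data: replace None entries with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Outlier treatment: clip values more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(values, n_bins):
    """Binning: assign each value in [0, 1] to one of n_bins equal-width bins."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Data split: shuffle a copy of rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical column with a gap (None) and an outlier (95).
ages = [22, 25, None, 31, 28, 24, 95, 27]
clean = clip_outliers(impute_missing(ages))
scaled = min_max_normalize(clean)
bins = bin_values(scaled, n_bins=3)
train, test = train_test_split(list(zip(scaled, bins)))
print(len(train), "training rows,", len(test), "test rows")
```

The order matters here: imputing before clipping means the outlier still influences the imputed mean, so a real pipeline might prefer the median or treat outliers first.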

Exploratory Data Analysis
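
As a rough illustration of what exploratory analysis involves, the sketch below computes summary statistics for one column and the Pearson correlation between two columns. The data and the function names are hypothetical; in practice a library such as pandas would usually do this work.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical columns: advertising spend and resulting sales.
spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]

print(summarize(spend))
print(f"correlation(spend, sales) = {pearson(spend, sales):.3f}")
```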

[Figure: algorithm cheat sheet, branching from a central Start node]

CLUSTERING
• K-Means

ANOMALY DETECTION
• One-Class SVM: >100 features
• PCA-Based Anomaly Detection: fast training

MULTI-CLASS CLASSIFICATION
• Multi-Class Logistic Regression: fast training, linear model
• Multi-Class Neural Network: accuracy, long training times
• Multi-Class Decision Forest: accuracy, fast training
• Multi-Class Decision Jungle: accuracy, small memory footprint
• One-v-All Multiclass: depends on the two-class classifier

REGRESSION
• Ordinal Regression: data in rank-ordered categories
• Poisson Regression: predicting event counts
• Fast Forest Quantile Regression: predicting a distribution
• Linear Regression: fast training, linear model
• Bayesian Linear Regression: linear model, small datasets
• Neural Network Regression: accuracy, long training time
• Decision Forest Regression: accuracy, fast training
• Boosted Decision Tree Regression: accuracy, fast training, large memory footprint

TWO-CLASS CLASSIFICATION
• Two-Class SVM: >100 features, linear model
• Two-Class Averaged Perceptron: fast training, linear model
• Two-Class Logistic Regression: fast training, linear model
• Two-Class Bayes Point Machine: fast training, linear model
• Two-Class Decision Forest: accuracy, fast training
• Two-Class Boosted Decision Tree: accuracy, fast training, large memory footprint
• Two-Class Decision Jungle: accuracy, small memory footprint
• Two-Class Locally Deep SVM: >100 features
• Two-Class Neural Network: accuracy, long training times

What to consider while choosing an algorithm?

Predicting Categories

Predicting Continuous Value

Finding Unusual Data Points

Discovering Structure
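
Each of the four goals above corresponds to one algorithm family. The lookup below is a small sketch of that first decision; the dictionary keys, the function name, and the optional class-count argument are illustrative assumptions rather than part of the deck.

```python
# Goal (as phrased on the slide) -> algorithm family.
ALGORITHM_FAMILY = {
    "predicting categories": "classification",
    "predicting continuous value": "regression",
    "finding unusual data points": "anomaly detection",
    "discovering structure": "clustering",
}


def choose_family(goal, n_classes=None):
    """Return the algorithm family for a goal.

    For classification, n_classes (if given) separates the two-class
    case from the multi-class case.
    """
    family = ALGORITHM_FAMILY[goal.lower()]
    if family == "classification" and n_classes is not None:
        prefix = "two-class" if n_classes == 2 else "multi-class"
        return f"{prefix} {family}"
    return family


print(choose_family("Predicting Categories", n_classes=2))
print(choose_family("Discovering Structure"))
```

Within a family, the final choice then depends on trade-offs such as accuracy, training time, memory footprint, and the number of features.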

Model Building and Selection

Train Model

Cross Validation

Parameter Tuning

Select Model
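
The four steps above can be sketched end to end in plain Python. The example assumes a k-nearest-neighbours classifier as the model, synthetic two-cluster data, and the neighbour count k as the only tuned parameter; none of these choices come from the deck, and a library such as scikit-learn would normally replace the hand-written helpers.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, point, k):
    """Train/predict: majority label among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds=5):
    """Cross validation: mean accuracy over n_folds held-out folds."""
    scores = []
    for fold in range(n_folds):
        val = data[fold::n_folds]  # rows whose index % n_folds == fold
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in val)
        scores.append(hits / len(val))
    return sum(scores) / n_folds


# Hypothetical data: two well-separated Gaussian clusters, labels 0 and 1.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(50)]
rng.shuffle(data)

# Parameter tuning: score each candidate k with cross validation.
results = {k: cross_val_accuracy(data, k) for k in (1, 3, 5, 7)}

# Select model: keep the candidate with the best mean accuracy.
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```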

Present the results

• Explain the process of model planning and selection

• Explain the findings: correlations, causes, variable selections

• Communicate the results

• Explain the process of operationalization

Deployment

[Figure: deployment architecture; labels recovered from the slide, grouping approximate]
• Back Office Systems: Transactional Data, Operations Data, ERP/CRM Data, DWH, MIS/Reporting, Enterprise BI and Reporting
• Data Science Space: ML Engine, Processed Data, Algorithms, Decisions
• Social Media Activities, Web and Mobile Logs, Website and Online Apps, Mobile Apps
• Marketing DWH, Marketing Campaigns
• External Systems: 3rd Party, Regulatory
• Customer Service Operations, Customer CRM
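
One simple way to picture the hand-off from the data science space to the consuming systems is to persist a trained model and load it again where decisions are made. The sketch below is an assumption-laden illustration: the logistic-regression parameters, feature names, file name, and 0.5 decision threshold are all hypothetical.

```python
import json
import math
import os
import tempfile

# Hypothetical trained parameters of a logistic-regression churn model.
model = {
    "features": ["complaints", "months_active"],
    "weights": [0.8, -0.1],
    "bias": -0.5,
}


def save_model(params, path):
    """Data science space: write the trained parameters to a file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(params, fh)


def load_model(path):
    """Consuming system: read the parameters back at start-up."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def score(params, record):
    """Return the predicted probability for one incoming record."""
    z = params["bias"] + sum(
        w * record[name] for w, name in zip(params["weights"], params["features"])
    )
    return 1.0 / (1.0 + math.exp(-z))


path = os.path.join(tempfile.mkdtemp(), "churn_model.json")
save_model(model, path)
deployed = load_model(path)

# A CRM or web application would pass live records like this one.
probability = score(deployed, {"complaints": 3, "months_active": 12})
decision = "contact customer" if probability > 0.5 else "no action"
print(f"{probability:.3f} -> {decision}")
```

Storing plain parameters as JSON keeps the example transparent; real deployments often serialize the whole model object or expose it behind a web service instead.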

Skills Required to be a Data Scientist

Soft Skills
• Domain knowledge
• Communication
• Analytical skills
• Curiosity
• Common Sense

Technical Skills
• Mathematics
• Statistics
• File handling or database
• Machine Learning
• Python or similar
• Tableau or similar visualization

Soft Skills

• Domain knowledge: Understanding of the data elements based on domain expertise.
• Communication: Discovery phase as well as presenting findings to the stakeholders.
• Analytical Skills: Analyse various relationships among data features.
• Curiosity: Asking the right questions to gain deeper understanding.
• Common Sense: Does it make sense against normal beliefs?

Technical Skills

• Mathematics: Math as the basis for algorithms. Helps for own implementations.
• Statistics: Helps in data imputation as well as validating the results of an experiment.
• Data Wrangling: Helps in dealing with the imperfections of data as well as data transformation.
• Programming Languages: Build models using Python, R, SAS or Azure ML.
• Machine Learning: Heart of Data Science. Various algorithms for predicting the outcome.
• Data Visualisation: Visual understanding of data as well as communication of findings.

Complete Data Science and Machine Learning Using Python

Thank You!