Introduction to Machine
Learning and Applications
Manjunath Sindagi
1.07.2017
Agenda
● About myself
● Data and its Characteristics
● Introduction to Machine Learning
● Applications
● Example of Linear Regression
● Career
● Q & A
About Myself
● Over a decade of Experience in the data field.
● Information Retrieval (Search), NLP, Extraction, Recommendation
Engines, Machine Learning
● Half a dozen data products.
● Worked for startups
● Interests : Defining Product Strategy, Defining data problems (Business to
technical) , Consulting, Mentoring Startups, Solving Data Problems,
Quoran, Teaching and Photography
Data &
Information
What is?
● Datum - ‘something given’
● A datum is a single fact about
a single entity
● Data (a data set) is a
collection of data points
(datums).
● Information - data that has been
processed, organized, structured or
presented in a context to make it
useful
Data
Different Sources
● Huge list of file formats
● Text - Word, PDF, Excel, PPTx,
RTF, Jar , txt, HTML
● Image - jpg, bmp, svg
● Video - .avi, .mov, .flv, .mp4
● Audio - .wav, .mp3 etc
● Present in
○ Structured
○ Semi Structured
○ Unstructured
Data Generation
Data Characteristics
Data
Characteristics
4 Vs of Data
➢ Volume
➢ Velocity
➢ Variety
➢ Veracity
Artificial
Intelligence
Artificial Intelligence is defined as
the science of making computers
do things that require intelligence
when done by humans
Making sense out of data.
AI And Related Fields
Applications
● News and Video Aggregator
● Search Engines
(Recommendations)
● Stock Recommendation
● User Profile Creations using
Public data
○ Disambiguation of Authors.
● Stack Recommendation
Real World Projects that I
worked On.
Applications
● Automation of Tweets to Users
for Song Suggestions
● Competitive Intelligence
● Automatic Generation of Table
of Contents from Documents
● Chatbots
● Q & A Engine
Real World Problems that I
worked On.
Machine Learning
Arthur Samuel (1959)
Field of study that gives computers the ability to
learn without being explicitly programmed.
- Checkers program
Tom Mitchell (1997)
A computer program is said to learn from
experience E with respect to some task T and
some performance measure P if its
performance on T, as measured by P, improves
with experience E
Example
● Classification - Spam or Not
Spam
○ Task - Classify emails as spam or not spam
○ Experience - Watching the user label emails as
spam or not spam
○ Performance - Fraction of emails correctly
classified
● Machine learning grew out
of the AI field
More Examples
● Search Engine
● Email Response
● Handwriting recognition
● Product Recommendations
● Detecting Objects from Images
● Many More examples
Machine
Learning
Block Diagram
Machine
Learning
Types
➢ Supervised Learning
➢ Unsupervised Learning
➢ Others - Reinforcement
Learning, Recommendation
Engine
Machine
Learning
Supervised Learning
➢ Housing Price Prediction
➢ Breast Cancer (Malignant or
Benign)
Supervised
Learning
Continued
SLNO Area Price
132842 2818 795000.00
134364 3032 399000.00
135141 3540 545000.00
135712 1249 909000.00
136282 1800 109900.00
136431 1603 324900.00
137036 1450 192900.00
137090 3360 215000.00
137159 1323 999000.00
137570 1750 319000.00
138053 1400 ?
Housing Prices
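As a rough illustration of the question the table poses (what price goes with an area of 1400?), the sketch below fits a straight line to the ten labelled rows with NumPy's least-squares polynomial fit. The variable and function names are my own, and a single feature on ten rows is only meant to show the mechanics, not to produce a trustworthy estimate.

```python
import numpy as np

# Area and Price columns copied from the table above
# (the last row, whose price is "?", is left out of the fit).
area = np.array([2818, 3032, 3540, 1249, 1800,
                 1603, 1450, 3360, 1323, 1750], dtype=float)
price = np.array([795000, 399000, 545000, 909000, 109900,
                  324900, 192900, 215000, 999000, 319000], dtype=float)

# Least-squares fit of price = m * area + c (a degree-1 polynomial).
# np.polyfit returns coefficients from the highest degree down: [m, c].
m, c = np.polyfit(area, price, deg=1)


def predict_price(a):
    """Price estimate from the fitted line for an area `a`."""
    return m * a + c


estimate = predict_price(1400.0)
print(f"slope m = {m:.2f}, intercept c = {c:.2f}")
print(f"estimated price for area 1400: {estimate:.2f}")
```

With so few, widely scattered rows the fitted line explains little of the variation, which is one reason real problems use many features.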
Supervised
Learning
Continued
ID Number Clump Thickness Class (0=benign, 1=malignant)
1000025 5 1
1002945 5 1
1015425 3 0
1016277 6 1
1017023 4 0
1017122 8 1
1018099 1 0
1018561 2 1
1033078 2 1
1033078 4 ?
Cancer Data
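To make the discrete-output case concrete, here is a minimal sketch that learns a single cut-off on clump thickness from the nine labelled rows. This is a one-feature threshold rule (a decision stump), chosen here for brevity; it is not necessarily the method used in the talk, and the helper names are assumptions.

```python
# Clump thickness and class columns copied from the table above
# (the last row, whose class is "?", is left out).
thickness = [5, 5, 3, 6, 4, 8, 1, 2, 2]
label = [1, 1, 0, 1, 0, 1, 0, 1, 1]


def accuracy(threshold):
    """Fraction of rows classified correctly by 'malignant if thickness >= threshold'."""
    predictions = [1 if t >= threshold else 0 for t in thickness]
    correct = sum(p == y for p, y in zip(predictions, label))
    return correct / len(label)


# Try every observed thickness as a cut-off and keep the most accurate one.
candidates = sorted(set(thickness))
best_threshold = max(candidates, key=accuracy)
best_accuracy = accuracy(best_threshold)


def classify(t):
    """Predicted class (0 = benign, 1 = malignant) for clump thickness `t`."""
    return 1 if t >= best_threshold else 0


print(f"threshold = {best_threshold}, training accuracy = {best_accuracy:.2f}")
print(f"predicted class for thickness 4: {classify(4)}")
```

Nine rows and one feature are far too few to pick a reliable cut-off; the point is only that the output is a class label rather than a number on a continuous scale.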
Apply Linear Regression?
Supervised Learning
➢ Right Answers Given
➢ Regression
○ Predict - What is Price?
○ Continuous Output
○ Housing Price
➢ Classification
○ Discrete Output - 0,1,2 etc
○ Breast Cancer
In reality, Many features are
present
Supervised
Learning
Regression Classification
Machine
Learning
● No Labels or right answers given
● Pattern or Structure needs to be
identified
● Not told what to do with data.
Unsupervised Learning
Unsupervised Learning - Clustering
Unsupervised
Learning
➢ Social Network Analysis
➢ Market Segmentation
○ Grouping Customers
➢ News
➢ Grouping Investigators from
Pubmed Articles
➢ Innumerable examples
➢ Images that contain human faces.
Clustering
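A minimal sketch of clustering with k-means (one of the algorithms commonly used for this kind of grouping) is shown below. The six two-dimensional "customer" points are made up for illustration, and the function name and parameters are assumptions rather than anything from the talk.

```python
import numpy as np


def k_means(points, k, n_iter=20, seed=0):
    """Plain k-means: returns (labels, centroids) for an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroids.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids


# Hypothetical customers described by two numeric features.
customers = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
                      [8.0, 8.0], [8.5, 7.5], [9.0, 8.2]])
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

No labels are supplied anywhere: the grouping comes only from the distances between points, which is what distinguishes this from the supervised examples.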
Machine Learning
● Decision Trees
● Naive Bayes
● Linear regression
● SVM - Support Vector
Machines
● Logistic Regression
● K-Means
● Apriori
● Nearest Neighbours
Most Commonly used
Algorithms
Unsupervised
Learning
➢ Audio refinement - Cocktail
Party Algorithm
➢ Fraud Detection
➢ Default detection
Non-Clustering
Machine
Learning
What are the Steps to Solve?
Example of Housing Price
Prediction using Linear
Regression.
Steps to Solve
● Feature selection
SlNo Length Breadth Area Price Area
1 20 40 800 25L 0
2 30 50 1500 50L 1
Steps to Solve
● Feature scaling
● Model Selection
Feature Scaling
● $X' = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}$
● $X' = \dfrac{X - \mu}{\sigma}$
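The two scaling formulas can be sketched in a few lines of NumPy. The function names are my own, the sample areas are taken from the housing table, and both functions assume the feature is not constant (otherwise the denominator is zero).

```python
import numpy as np


def min_max_scale(x):
    """Rescale to [0, 1]: (X - Xmin) / (Xmax - Xmin)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def standardize(x):
    """Rescale to zero mean and unit standard deviation: (X - mu) / sigma."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


areas = [2818, 3032, 3540, 1249, 1800]  # a few areas from the housing table
scaled = min_max_scale(areas)
standardized = standardize(areas)
print(scaled)
print(standardized)
```

Putting features on a comparable scale tends to make gradient descent converge faster, because no single feature dominates the size of the gradient.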
Model/Hypothesis
y = mx + c
Steps to Solve
● Parameter Selection
● Cost Function
● m and c - select randomly
● $\text{cost} = \dfrac{1}{2n} \sum_{i=1}^{n} (y_i - y)^2$, where $y = m x_i + c$ is the
model's prediction for example $i$
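Written as code, the cost for a given choice of m and c is just the averaged squared error, halved. This is a small sketch with made-up points that lie exactly on the line y = 2x; the function name is an assumption.

```python
def cost(m, c, xs, ys):
    """Half the mean squared error of the line y = m*x + c on the data."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(cost(2.0, 0.0, xs, ys))  # 0.0, a perfect fit
print(cost(0.0, 0.0, xs, ys))  # larger: (4 + 16 + 36) / 6
```

A poor choice of m and c gives a large cost and a perfect fit gives zero, so finding good parameters becomes a minimization problem.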
Steps to Solve
● Gradient Descent
● Find Min Cost using Gradient
Descent
● Θ stands for the parameters m and c;
W is m here
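The formula image from this slide did not survive extraction, so what follows is the standard batch gradient descent update for the line y = mx + c under the half-mean-squared-error cost, offered as a sketch rather than a reproduction of the slide. The data, learning rate, and step count are assumptions, and m and c start at zero here (instead of random values) so the run is reproducible.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.5, n_steps=1000):
    """Fit y = m*x + c by repeatedly stepping against the gradient of the cost."""
    m, c = 0.0, 0.0  # starting point; random values would also work
    n = len(x)
    for _ in range(n_steps):
        error = (m * x + c) - y          # prediction minus target, per example
        grad_m = (error * x).sum() / n   # d(cost)/dm
        grad_c = error.sum() / n         # d(cost)/dc
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Hypothetical, already-scaled feature with targets generated from y = 3x + 1.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = 3.0 * x + 1.0

m, c = gradient_descent(x, y)
print(f"m = {m:.4f}, c = {c:.4f}")
```

The learning rate matters: too large and the updates diverge, too small and convergence takes many more steps. Scaling the feature to [0, 1] first is what lets a fairly large rate work here.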
Steps to Solve
● Evaluation
● Data - Train data , Test data
● Sensitivity, Specificity
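A minimal sketch of these evaluation ideas follows: hold out part of the data for testing, then compare predictions against the true labels. Sensitivity is the fraction of actual positives that were caught, and specificity is the fraction of actual negatives that were correctly left alone. The labels and predictions are made up, and the function names are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def confusion_counts(actual, predicted):
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    return tp, tn, fp, fn


def sensitivity(actual, predicted):
    tp, _, _, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn)


def specificity(actual, predicted):
    _, tn, fp, _ = confusion_counts(actual, predicted)
    return tn / (tn + fp)


train, test = train_test_split(range(8))

# Hypothetical true labels and model predictions on test data.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
print(sensitivity(actual, predicted))  # 0.75 (3 of 4 positives found)
print(specificity(actual, predicted))  # 0.75 (3 of 4 negatives found)
```

Evaluating on held-out test data rather than the training data is what tells you whether the model generalizes.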
Steps to Solve
● Evaluation
● ROC Curve
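An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold on the model's score is swept from high to low. The sketch below computes those points and the area under the curve for six made-up scores; the function names are assumptions, and it expects at least one positive and one negative label.

```python
def roc_points(actual, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(a == 1 and s >= threshold for a, s in zip(actual, scores))
        fp = sum(a == 0 and s >= threshold for a, s in zip(actual, scores))
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true labels and model scores (higher score = more likely positive).
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_points(actual, scores)
print(points)
print(area_under_curve(points))  # 8/9 for this data
```

An area of 1.0 means the scores rank every positive above every negative, while 0.5 is what random scores would give on average.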
Steps to Solve
● Feature Selection
● Feature Scaling
● Model Selection
● Parameter Selection
● Cost Function
● Gradient Descent
● Evaluation
Summary
Data Science
as Career
Data Science as
Career
Intersection of Different fields
Data Science as a Career
Mathematics
● Linear Algebra
● Differentiation
● Probability &
Statistics
● Calculus
Machine Learning
● Basics
● Text
● Image
● Video
Programming
Language
● Python
● R
● Java
● Spark (PySpark)
Data Science as a Career
Tools
● Cloud - AWS, Google Cloud,
Azure ML.
● Solr/Elastic Search
● Spark
● NoSQL & SQL Databases
● Message Queues
● Streaming - Kafka
After that
● Neural Networks
● Deep Learning
Where do I
start?
● Coursera : Andrew Ng Machine
Learning Course https://goo.gl/fDTwSE
● YouTube : Prof. Sengupta
https://goo.gl/JGG6th
● People to follow.
○ Andrew Ng
○ Bernard Marr - AI Journalist.
○ Geoffrey Hinton
○ Roman Trusov
○ Many people :
https://www.quora.com/Who-are-some-notable-machine-learning-researchers
● Books
○ Programming Collective Intelligence
Data Science
Career - Reality
Check
Data Acquisition and
Preparation - Major time
consuming task
Data Science
Career - Reality
Check
● Cloud Native
Applications
● Engineering Problem
● Cloud Services
○ Google Cloud - Natural Language API
○ MS Azure ML - Drag and drop data.
○ CrowdFlower
■ AI Platform
● Predominantly, Engineering problem.
Data Science
Career - Reality
Check
● Industry Expectations
○ Azure Data Modelling and Data
Scientist - Mumbai - Experience – 10
years
○ New Project :
■ Food delivery company.
■ Save More Money.
■ Problem : Delivery not on time,
hence free food delivered.
■ Which machine learning
algorithm can save them?
Data Science
Career - Reality
Check
● Future Assessment
● Data Science is a Craft.
● Project Shelf Life
● Future
○ Everyone wants to be a data scientist (DS)
○ Industry is unsure what to expect
from a data scientist
○ Business value is not clear (very
few orgs realize it)
○ Competition
■ Cloud Services
■ People
Questions & Answers
● Can we use machine learning to build algorithms that learn machine
learning by themselves, apply it to domains beyond the ones they were
designed for, and eliminate the need for human machine learning developers?
● How easy or difficult is it to learn machine learning?
● How can you identify whether a business problem is suitable for ML?
Questions & Answers
● What is the difference between ML and deep learning?
● How is machine learning going to change various industries?
Questions & Answers
Any Questions?
Contact
Email : sindagimanju@gmail.com
Quora : https://www.quora.com/profile/Manjunath-Sindagi
Twitter : https://twitter.com/smanjunath
Linkedin : https://www.linkedin.com/in/smanjunath/
