Introduction to machine learning and applications (1)

Introduction to Machine
Learning and Applications
Manjunath Sindagi
1.07.2017

Agenda
● About myself
● Data and its Characteristics
● Introduction to Machine Learning
● Applications
● Example of Linear Regression
● Career
● Q & A
Topics

About Myself
● Over a decade of experience in the data field.
● Information Retrieval (Search), NLP, Extraction, Recommendation
Engine, Machine Learning
● Half a dozen data products.
● Worked for startups
● Interests: Defining Product Strategy, Defining Data Problems (Business to
Technical), Consulting, Mentoring Startups, Solving Data Problems,
Quora, Teaching and Photography

Data &
Information
What is?
● Datum - ‘something given’
● A datum is a single fact, a single
entity, a point of matter
● Data (data sets) represents a
collection of data points
(Datums).
● Information - data that has been
processed, organized, structured or
presented in a context to make it
useful

Data
Different Sources
● Huge list of file formats
● Text - Word, PDF, Excel, PPTx,
RTF, Jar, txt, HTML
● Image - jpg, bmp, svg
● Video - .avi, .mov, .flv, .mp4
● Audio - .wav, .mp3 etc
● Present in
○ Structured
○ Semi Structured
○ Unstructured

Data
Characteristics
4 Vs of Data
➢ Volume
➢ Velocity
➢ Variety
➢ Veracity

Artificial
Intelligence
Artificial Intelligence is defined as
the science of making computers
do things that require intelligence
when done by Humans
Making sense out of data.

Applications
● News and Video Aggregator
● Search Engines
(Recommendations)
● Stock Recommendation
● User Profile Creations using
Public data
○ Disambiguation of Authors.
● Stack Recommendation
Real World Projects that I
worked On.

Applications
● Automation of Tweets to Users
for Song Suggestions
● Competitive Intelligence
● Automatic Generation of Table
of Contents from Documents
● Chatbots
● Q & A Engine
Real World Problems that I
worked On.

Machine Learning
Arthur Samuel (1959)
Field of Study that gives computers the ability to
learn without being explicitly programmed.
- Checkers Program
Tom Mitchell (1997)
A Computer program is said to learn from
Experience E with respect to some task T and
some performance measure P if its
performance on T as measured by P improves
with experience E

Example
● Classification - Spam or Not
Spam
○ Task - Classifying emails as spam or not spam
○ Experience - Watching users label emails as
spam or not spam
○ Performance - Fraction of emails correctly
classified
● Machine Learning - Grown out
of AI field

More Examples
● Search Engine
● Email Response
● Handwriting recognition
● Product Recommendations
● Detecting Objects from Images
● Many More examples

Machine
Learning
Block Diagram

Machine
Learning
Types
➢ Supervised Learning
➢ Unsupervised Learning
➢ Others - Reinforcement
Learning, Recommendation
Engine

Machine
Learning
Supervised Learning
➢ Housing Price Prediction
➢ Breast Cancer (Malignant or
Benign)

Supervised
Learning
Continued
SLNO Area Price
132842 2818 795000.00
134364 3032 399000.00
135141 3540 545000.00
135712 1249 909000.00
136282 1800 109900.00
136431 1603 324900.00
137036 1450 192900.00
137090 3360 215000.00
137159 1323 999000.00
137570 1750 319000.00
138053 1400 ?
Housing Prices
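
As a rough sketch of what "predicting the ?" could look like, the snippet below fits a straight line to the ten rows of the table with the closed-form least-squares formulas and evaluates it at an area of 1400. The choice of a single-feature straight line is an assumption made for illustration; the function and variable names are hypothetical.

```python
# Area and price pairs copied from the housing table above.
areas = [2818, 3032, 3540, 1249, 1800, 1603, 1450, 3360, 1323, 1750]
prices = [795000.0, 399000.0, 545000.0, 909000.0, 109900.0,
          324900.0, 192900.0, 215000.0, 999000.0, 319000.0]


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(areas, prices)
predicted_price = m * 1400 + c  # the row marked "?" in the table
print(f"m={m:.2f}, c={c:.2f}, predicted price for area 1400: {predicted_price:.2f}")
```

With only ten noisy rows and one feature the fit is rough, which is one reason real models use many more features and examples.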

Supervised
Learning
Continued
ID Number Clump Thickness Class (0=benign, 1=malignant)
1000025 5 1
1002945 5 1
1015425 3 0
1016277 6 1
1017023 4 0
1017122 8 1
1018099 1 0
1018561 2 1
1033078 2 1
1033078 4 ?
Cancer Data
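
A minimal sketch of classification on this table, assuming the simplest possible rule: predict malignant when clump thickness reaches a threshold, and pick the threshold that gets the most training rows right. The rule and the names are illustrative assumptions, not the method used in the talk.

```python
# (clump thickness, class) pairs copied from the table above; 1 = malignant.
rows = [(5, 1), (5, 1), (3, 0), (6, 1), (4, 0), (8, 1), (1, 0), (2, 1), (2, 1)]


def predict(thickness, threshold):
    """Predict malignant (1) when thickness reaches the threshold, else benign (0)."""
    return 1 if thickness >= threshold else 0


def training_accuracy(threshold):
    """Fraction of labelled rows that the threshold rule classifies correctly."""
    hits = sum(predict(x, threshold) == y for x, y in rows)
    return hits / len(rows)


# Try every threshold from 1 to 10 and keep the most accurate one
# (ties are resolved in favour of the smallest threshold).
best_threshold = max(range(1, 11), key=training_accuracy)
best_accuracy = training_accuracy(best_threshold)
unknown_class = predict(4, best_threshold)  # the row marked "?"
print(best_threshold, best_accuracy, unknown_class)
```

No single threshold classifies every row correctly here, which illustrates why one feature is rarely enough in practice.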

Supervised Learning
➢ Right Answers Given
➢ Regression
○ Predict - What is Price?
○ Continuous Output
○ Housing Price
➢ Classification
○ Discrete Output - 0,1,2 etc
○ Breast Cancer
In reality, many features are
present.
Supervised
Learning
Regression Classification

Machine
Learning
● No Labels or right answers given
● Pattern or Structure needs to be
identified
● Not told what to do with data.
Unsupervised Learning

Unsupervised Learning - Clustering

Unsupervised
Learning
➢ Social Network Analysis
➢ Market Segmentation
○ Grouping Customers
➢ News
➢ Grouping Investigators from
Pubmed Articles
➢ Innumerable examples
➢ Images that contain human faces.
Clustering
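
To make the grouping idea concrete, here is a small sketch of k-means on one-dimensional data. The customer spend figures are invented, the starting centroids are chosen by hand, and k-means is just one possible clustering algorithm for market segmentation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend of eight customers (made-up numbers).
spend = [12, 15, 11, 14, 95, 102, 99, 98]
centroids, clusters = kmeans_1d(spend, centroids=[0, 50])
print(centroids, clusters)
```

No labels are supplied anywhere: the two customer groups emerge from the data alone.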

Machine Learning
● Decision Trees
● Naive Bayes
● Linear regression
● SVM - Support Vector
Machines
● Logistic Regression
● K-Means
● Apriori
● Nearest Neighbours
Most Commonly used
Algorithms

Unsupervised
Learning
➢ Audio refinement - Cocktail
Party Algorithm
➢ Fraud Detection
➢ Default detection
Non-Clustering

Machine
Learning
What are the Steps to Solve?
Example of Housing Price
Prediction using Linear
Regression.

Steps to Solve
● Feature selection
SlNo Length Breadth Area Price Area
1 20 40 800 25L 0
2 30 50 1500 50L 1

Steps to Solve
● Feature scaling
● Model Selection
Feature Scaling
● $X' = (X - X_{min}) / (X_{max} - X_{min})$
● $X' = (X - \mu) / \sigma$
Model/Hypothesis
y = mx + c
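
The two scaling formulas on this slide can be sketched as follows. The function names are hypothetical, and the sample list mixes the two areas from the feature-selection table with two made-up values.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """X' = (X - X_min) / (X_max - X_min); maps the values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """X' = (X - mu) / sigma; gives zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# 800 and 1500 appear in the feature-selection table; 1200 and 2000 are made up.
areas = [800, 1500, 1200, 2000]
scaled = min_max_scale(areas)
standardized = standardize(areas)
print(scaled)
print(standardized)
```

Both functions assume the values are not all identical, since that would make the denominator zero.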

Steps to Solve
● Parameter Selection
● Cost Function
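● Initialize m and c randomly
● $cost = \frac{1}{2n} \sum_{i=1}^{n} (y_i - y)^2$, where $y_i$ is the actual price of example $i$ and $y = m x_i + c$ is the model's prediction for it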
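
A minimal sketch of the cost function for the line y = mx + c, assuming the squared-error form with the 1/(2n) factor shown on this slide. The data points are made up and generated from y = 2x + 1.

```python
def cost(m, c, xs, ys):
    """Half the mean squared error of the line y = m*x + c on the data."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / (2 * n)


xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # made-up data generated from y = 2x + 1

perfect = cost(2.0, 1.0, xs, ys)  # the true parameters
poor = cost(0.0, 0.0, xs, ys)     # a bad guess for m and c
print(perfect, poor)
```

The better the choice of m and c, the lower the cost, so the remaining problem is to search for the parameters that minimise it.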

Steps to Solve
● Gradient Descent
● Find Min Cost using Gradient
Descent
● [Update-rule equation not recovered from the slide]
Θ is m and c respectively; W is m here
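
Since the slide's own equation is not available, the following is a sketch of the standard batch gradient descent update for the squared-error cost with the 1/(2n) factor, not a reconstruction of the slide. The learning rate, step count and data are assumptions chosen so that the loop converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimise cost = 1/(2n) * sum((m*x + c - y)^2) over m and c."""
    m, c = 0.0, 0.0  # fixed starting point; a random start also works
    n = len(xs)
    for _ in range(steps):
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to m and c.
        grad_m = sum(e * x for e, x in zip(errors, xs)) / n
        grad_c = sum(errors) / n
        # Step against the gradient.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data generated from y = 2x + 1
m, c = gradient_descent(xs, ys)
print(round(m, 4), round(c, 4))
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, convergence takes many more steps.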

Steps to Solve
● Find Min Cost using Gradient
Descent

Steps to Solve
● Evaluation
● Data - Train data, Test data
● Sensitivity, Specificity
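
Sensitivity and specificity apply to classifiers and are computed from the counts of true and false positives and negatives on the test data. A short sketch, with made-up labels and predictions:

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels where 1 is positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of actual positives that were found
    specificity = tn / (tn + fp)  # share of actual negatives that were found
    return sensitivity, specificity


# Hypothetical test-set labels and model predictions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```

The function assumes the test data contains at least one positive and one negative example.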

Steps to Solve
● Evaluation
● ROC Curve
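
An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold is lowered. The sketch below builds the curve points and the area under the curve for made-up scores; the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical true labels and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(area)
```

An area of 1.0 means the scores rank every positive above every negative, while 0.5 is what random scores would give on average.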

Steps to Solve
● Feature Selection
● Feature Scaling
● Model Selection
● Parameter Selection
● Cost Function
● Evaluation
Summary

Data Science as
Career
Intersection of Different fields

Data Science as a Career
Mathematics
● Linear Algebra
● Differentiation
● Probability &
Statistics
● Calculus
Machine Learning
● Basics
● Text
● Image
● Video
Programming
Language
● Python
● R
● Java
● Spark (PySpark)

Data Science as a Career
Tools
● Cloud - AWS, Google Cloud,
Azure ML.
● Solr/Elasticsearch
● Spark
● NoSQL & SQL Databases
● Message Queues
● Streaming - Kafka
After that
● Neural Networks
● Deep Learning

Where do I
start?
● Coursera: Andrew Ng Machine
Learning Course https://goo.gl/fDTwSE
● YouTube: Prof. Sengupta
https://goo.gl/JGG6th
● People to follow.
○ Andrew Ng
○ Bernard Marr - AI Journalist.
○ Geoffrey Hinton
○ Roman Trusov
○ Many people:
https://www.quora.com/Who-are-some-notable-machine-learning-researchers
● Books
○ Programming Collective Intelligence

Data Science
Career - Reality
Check
Data Acquisition and
Preparation - Major time
consuming task

Data Science
Career - Reality
Check
● Cloud Native
Applications
● Engineering Problem
● Cloud Services
○ Google Cloud - Natural Language API
○ MS Azure ML - Drag and drop data.
○ CrowdFlower
■ AI Platform
● Predominantly, Engineering problem.

Data Science
Career - Reality
Check
● Industry Expectations
○ Azure Data Modelling and Data
Scientist - Mumbai - Experience – 10
years
○ New Project :
■ Food delivery company.
■ Save More Money.
■ Problem : Delivery not on time,
hence free food delivered.
■ Which machine learning
algorithm can save them?

Data Science
Career - Reality
Check
● Future Assessment
● Data Science is a Craft.
● Project Shelf Life
● Future
○ Everyone wants to be a DS
○ Industry is unsure what to expect
of a Data Scientist
○ Business value is not clear (very
few orgs realize it)
○ Competition
■ Cloud Services
■ People

Questions & Answers
● Can we use machine learning to build algorithms that learn machine
learning by themselves, apply it to domains beyond the ones they were
built for, and eliminate the need for human machine learning developers?
● How easy or difficult is it to learn machine learning?
● How can you identify whether a business problem is suitable for ML?

Questions & Answers
● What is the difference between ML and deep learning?
● How is machine learning going to change various industries?

Questions & Answers
Any Questions?

Contact
Email : sindagimanju@gmail.com
Quora : https://www.quora.com/profile/Manjunath-Sindagi
Twitter : https://twitter.com/smanjunath
LinkedIn : https://www.linkedin.com/in/smanjunath/
