Introduction to Machine Learning
Agenda
● What is Machine Learning?
● Classification of Machine Learning
- Supervised learning
- Unsupervised learning
● Categorizing based on Required Output
- Classification
- Regression
- Clustering
● ML lifecycle
● ML applications
What is Machine Learning?
“Machine Learning is the field of study that gives computers
the ability to learn without being explicitly programmed.”
– Arthur Samuel
“Machine learning is a branch of artificial intelligence where
computers learn from data.”
– John McCarthy
Machine Learning (ML) involves feeding data into
algorithms that can then identify patterns and make
predictions on new data.
Example:
Think of it like teaching a child to recognize fruits by showing them lots of pictures of different
fruits and telling them the names. Over time, the child learns to identify new fruits on their own.
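The fruit analogy can be sketched in a few lines of Python. This is only an illustration: the two features (weight in grams and a redness score), the tiny data set, and the `predict_fruit` helper are all made up, and the "learning" is the simplest possible rule (pick the label of the most similar known example).

```python
import math

# Hypothetical labeled examples: (weight in grams, redness score 0-1) -> name
training_examples = [
    ((150.0, 0.9), "apple"),
    ((170.0, 0.8), "apple"),
    ((120.0, 0.1), "banana"),
    ((130.0, 0.2), "banana"),
]

def predict_fruit(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    closest = min(training_examples, key=lambda ex: math.dist(ex[0], features))
    return closest[1]

# A fruit the "child" has never seen before.
new_fruit = (160.0, 0.85)
prediction = predict_fruit(new_fruit)
print(prediction)
```

The more labeled examples are supplied, the more new fruits the rule can identify correctly.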
Classification of Machine Learning
How Supervised Machine Learning Works
In supervised learning, we use
labeled data, which means each
example has the correct answer with
it. The model learns from these
examples. Once the model is trained,
it is tested using a separate set of
data, known as the test data, to
evaluate its performance and
accuracy.
Supervised learning is further
divided into two types:
● Classification
● Regression
Supervised learning algorithms
● k-Nearest Neighbors
● Linear Regression
● Logistic Regression
● Naive Bayes
● Support Vector Machines (SVMs)
● Decision Trees and Random Forests
● Neural networks
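The supervised workflow described above (learn from labeled data, then evaluate on separate test data) might look like this with scikit-learn. The choice of the built-in iris data set, k-Nearest Neighbors with 5 neighbors, and a 30% test split are assumptions made for the sketch, not part of the slides.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: flower measurements (X) and the correct species (y).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the examples as test data the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train on the labeled training examples.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate on the unseen test data.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```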
How Unsupervised Machine Learning Works
In unsupervised learning, we do not
use labeled data. Instead, the model
analyzes the data to discover hidden
patterns and structures. This
approach helps systems understand
and organize the data without
predefined labels.
Unsupervised learning is further
divided into two categories:
● Clustering
● Anomaly Detection
Unsupervised learning algorithms
● K-means clustering
● Nearest-neighbor search (unsupervised; k-NN classification itself is supervised)
● Hierarchical Cluster Analysis (HCA)
● Anomaly detection
● Neural Networks
● Principal Component Analysis (PCA)
● Independent Component Analysis
● Apriori algorithm
● Singular value decomposition
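As a rough sketch of unsupervised learning, K-means can group points without being told any labels. The six 2-D points below are made up, and choosing 2 clusters is an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: six points that visibly form two groups.
points = np.array([
    [1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
    [8.0, 8.2], [8.1, 7.9], [7.9, 8.0],
])

# Ask K-means to discover 2 clusters; no correct answers are provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Cluster ids are arbitrary (0 or 1); what matters is which points share an id.
print(labels)
```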
Supervised learning vs Unsupervised learning
Categorizing based on Required Output
Classification
Classification is the process of assigning objects to one of two or more preset categories (classes).
Example: spam filter
A spam filter is trained with many example emails labeled as either spam or ham, and it learns to classify new emails into these categories.
Regression
Regression is used to determine the relationship between one or more input variables and an output variable. It is primarily used for predicting continuous values rather than discrete ones.
Example: predicting the price of a car
The model is trained using features such as mileage, age, and brand, along with the actual prices, and then it predicts the price of a new car.
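The spam filter example can be sketched as a small text classifier. The six emails are invented, and the pairing of word counts (`CountVectorizer`) with Naive Bayes (`MultinomialNB`) is one possible choice for illustration, not necessarily what a production filter uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training emails with their labels.
emails = [
    "win free money now",
    "free prize claim now",
    "cheap loans win cash",
    "meeting at noon tomorrow",
    "lunch with the team",
    "project report attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each email into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Learn which words are typical for each class.
classifier = MultinomialNB()
classifier.fit(X, labels)

# Classify emails the model has not seen before.
new_emails = ["claim your free prize", "team meeting tomorrow"]
predictions = classifier.predict(vectorizer.transform(new_emails))
print(list(predictions))
```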
Categorizing based on Required Output
Clustering
Clustering groups unlabeled objects so that objects in the same cluster are similar to each other and dissimilar to objects in other clusters.
Example: grouping customers
A customer segmentation model can analyze purchasing behavior and group customers into clusters based on their buying patterns.
Association
Association rule mining is a rule-based approach to reveal interesting relationships between data points in large datasets.
Example: recommendation engines
Using association rules, unsupervised machine learning can help explore transactional data to discover patterns or trends that can be used to drive personalized recommendations for online retailers.
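Two standard measures behind association rules are support (how often items appear together) and confidence (how often the rule holds when its left side is present). The sketch below computes both in plain Python for a made-up set of shopping baskets; the `support` and `confidence` helpers are hypothetical names for this example.

```python
# Hypothetical transactional data: each set is one customer's basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Rule under test: customers who buy bread also buy butter.
rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, so a recommender could suggest butter to customers who add bread.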
ML applications
ML lifecycle
1. Gathering Data
2. Data preparation
3. Data Wrangling
4. Analyse Data
5. Train Model
6. Test Model
7. Deployment
Data preparation
1. Reading/Loading the data
2. Filling/Dropping null values
3. Converting categorical variables into numerical values
4. Feature engineering (will talk about it later)
5. Normalization
6. Splitting the data into train and test
7. Model training
8. Testing
9. Visualization (can happen at any point in this cycle)
Loading data
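A minimal sketch of loading data with pandas. To keep it self-contained, the CSV content is an in-memory string with made-up car data; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; empty fields become missing values (NaN).
csv_text = """mileage,age,brand,price
42000,3,Toyota,15500
,5,Ford,9800
60000,,Toyota,11200
85000,8,BMW,7400
30000,2,Ford,14300
"""

df = pd.read_csv(io.StringIO(csv_text))

# First look at the data: a few rows and the number of nulls per column.
print(df.head())
print(df.isna().sum())
```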
Converting categorical to numerical
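One common way to turn a categorical column into numbers is one-hot encoding, sketched here with `pd.get_dummies`. The small table is made up, and one-hot encoding is only one option (label encoding is another).

```python
import pandas as pd

# Hypothetical data with one categorical column ("brand").
df = pd.DataFrame({
    "brand": ["Toyota", "Ford", "Toyota"],
    "age": [3, 5, 4],
})

# Replace "brand" with one 0/1 column per distinct brand.
encoded = pd.get_dummies(df, columns=["brand"], dtype=int)
print(encoded)
```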
Filling null values
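Two simple ways to handle null values are filling them in or dropping the affected rows. The sketch below shows both on made-up data; filling with the mean and the median is an assumption, and the right choice depends on the data set.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "mileage": [42000.0, None, 60000.0],
    "age": [3.0, 5.0, None],
})

# Option 1: fill each column's nulls with a summary statistic.
filled = df.fillna({
    "mileage": df["mileage"].mean(),
    "age": df["age"].median(),
})

# Option 2: drop every row that contains at least one null.
dropped = df.dropna()

print(filled)
print(dropped)
```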
Normalization
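Normalization rescales features so that they are on a comparable range. A minimal sketch of min-max scaling, which maps the smallest value to 0 and the largest to 1; the mileage values and the `min_max_scale` helper are made up for illustration, and other schemes (such as standardization) exist.

```python
import numpy as np

# Hypothetical feature column with a large numeric range.
mileage = np.array([30000.0, 60000.0, 90000.0])

def min_max_scale(values):
    """Rescale values to [0, 1]; assumes the values are not all identical."""
    lowest, highest = values.min(), values.max()
    return (values - lowest) / (highest - lowest)

scaled = min_max_scale(mileage)
print(scaled)
```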
Splitting the data (70/30)
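A 70/30 split can be sketched with scikit-learn's `train_test_split`. The arrays are placeholder data, and `random_state=42` is an arbitrary seed chosen so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples with 2 features each, plus 10 targets.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.3 keeps 70% for training and 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
```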
Linear regression
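A short sketch of linear regression on the car price idea from earlier. The data is synthetic: prices are generated from a made-up formula (20000 minus 0.1 per unit of mileage minus 500 per year of age), so the model should recover those numbers almost exactly. Real data would be noisy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features: [mileage, age in years].
X = np.array([
    [10000, 1], [20000, 3], [30000, 2],
    [40000, 5], [50000, 4], [60000, 7],
], dtype=float)

# Synthetic target built from a made-up pricing formula.
y = 20000 - 0.1 * X[:, 0] - 500 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

# Predict the price of a car the model has not seen.
new_car = np.array([[35000.0, 3.0]])
predicted_price = model.predict(new_car)[0]

print(model.coef_, model.intercept_)
print(predicted_price)
```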
Accuracy
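For a classifier, accuracy is the fraction of predictions that are correct. For a regression model such as linear regression, a score like R² is commonly used instead. The sketch below computes both with scikit-learn on made-up true values and predictions.

```python
from sklearn.metrics import accuracy_score, r2_score

# Classification: hypothetical true labels and predicted labels.
true_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 0]
accuracy = accuracy_score(true_labels, predicted_labels)  # 4 of 5 correct

# Regression: hypothetical true values and predicted values.
true_values = [3.0, 5.0, 7.0, 9.0]
predicted_values = [2.5, 5.5, 7.0, 9.5]
r2 = r2_score(true_values, predicted_values)  # 1 - SS_res / SS_tot

print(f"accuracy={accuracy:.2f}, R2={r2:.4f}")
```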
LET’S PRACTICE