Machine Learning
Types of AI
Machine Learning
Types of ML
Supervised learning: The computer is provided with labeled training data and
learns to map inputs to outputs.
Unsupervised learning: The computer is provided with unlabeled data and
learns to find underlying structures or patterns in the data.
Reinforcement learning: The computer learns to make decisions in an
environment by receiving rewards or punishments for its actions.
Deep learning: A type of machine learning that involves training artificial
neural networks with multiple layers to learn complex patterns in data.
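As a minimal illustration of the supervised paradigm (a sketch assuming scikit-learn is available; the toy data and model choice are illustrative, not from the notes):

```python
# Supervised learning: fit a model on labeled data, then predict unseen inputs
from sklearn.tree import DecisionTreeClassifier

# Toy labeled data: input (hours studied) -> output (fail = 0, pass = 1)
X = [[1], [2], [3], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]

model = DecisionTreeClassifier()
model.fit(X, y)              # learn the input -> output mapping
print(model.predict([[7]]))  # predict the label for an unseen input
```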
Supervised ML algorithms:
Linear regression
Logistic regression
Decision tree
Support vector machine (SVM)
Naive Bayes
Linear discriminant analysis
K-nearest neighbors (KNN)
Neural networks
Random forest
Gradient boosting
XGBoost
Stochastic gradient descent
Adaptive boosting (AdaBoost)
Bagging
Classification and regression trees (CART)
Conditional random fields (CRF)
Gaussian processes (GP)
Hidden Markov models (HMM)
Kalman filter
Maximum entropy (MaxEnt)
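In scikit-learn, most of these algorithms share the same fit/predict interface; a quick sketch (the iris dataset and the three model choices here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X, y)  # every algorithm uses the same interface
    print(name, "train accuracy:", round(model.score(X, y), 3))
```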
Unsupervised Learning:
K-means clustering
Hierarchical clustering
DBSCAN
GMM - Gaussian Mixture Models
PCA - Principal Component Analysis
t-SNE
Association Rule Learning (Apriori)
Autoencoders
Self-organizing maps (SOM)
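A minimal unsupervised sketch with k-means (the toy points and parameters are assumptions): the algorithm discovers the two groups without ever seeing labels.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2D points forming two well-separated groups
X = np.array([[1, 1], [1.5, 2], [1, 0],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # no labels supplied - structure is discovered
print(labels)
```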
Reinforcement Learning:
Q-learning
Deep Q-network (DQN)
Policy gradient methods
Actor-critic methods (e.g., A2C)
Proximal Policy Optimization (PPO)
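A tabular Q-learning sketch on a made-up 5-state corridor (the environment, rewards, and hyperparameters are all illustrative assumptions): the agent learns from rewards which action to take in each state.

```python
import numpy as np

# Tiny 1D corridor: states 0..4, goal at state 4; actions 0 = left, 1 = right
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.5  # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        # epsilon-greedy: explore sometimes, otherwise act greedily
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        reward = 1.0 if s_next == 4 else 0.0  # reward only at the goal
        # Q-learning update: move Q[s, a] toward reward + discounted best future value
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:4])  # best action per non-goal state (1 = move right)
```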
Transfer Learning:
Pre-trained models, fine-tuning, domain adaptation, multi-task learning, model
ensembles, one-shot learning
Deep Learning:
CNNs, RNNs, GANs, autoencoders, transformers, DBNs
Ensemble Learning:
Bagging, Boosting, Stacking, Voting
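A sketch of two of these ensemble styles in scikit-learn (dataset and model choices are illustrative): voting combines different model families, bagging trains many copies of one model on bootstrap samples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Voting: different model families vote on each prediction
voting = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=5000)),
    ('dt', DecisionTreeClassifier(random_state=0)),
    ('knn', KNeighborsClassifier()),
])
voting.fit(X, y)

# Bagging: many trees trained on bootstrap samples of the same data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=0)
bagging.fit(X, y)

print("voting  accuracy:", round(voting.score(X, y), 3))
print("bagging accuracy:", round(bagging.score(X, y), 3))
```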
Terminologies:
Overfitting: the model memorizes the exact patterns (and noise) of the training data, so it performs poorly on test data.
Underfitting: the model fails to capture the underlying pattern, so it predicts poorly even on training data.
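A quick way to see overfitting (sketch assuming scikit-learn; the dataset choice is illustrative): compare train vs. test accuracy of an unconstrained decision tree against a depth-limited one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unconstrained tree memorizes the training data (overfits)
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# A depth-limited tree captures less detail but generalizes better
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

print("deep    train:", deep.score(X_train, y_train),
      "test:", round(deep.score(X_test, y_test), 3))
print("shallow train:", round(shallow.score(X_train, y_train), 3),
      "test:", round(shallow.score(X_test, y_test), 3))
```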
Batch/offline ML: the model is trained on the entire dataset at once (e.g., on a local machine) and then deployed to a server.
Online ML: data is fed into the model incrementally, so the model learns dynamically.
Model-based ML: learns a compact model from the data (e.g., the best-fit line is the model).
Instance-based ML: stores the training data itself as the model and, at prediction time, computes distances between the test data and the stored training data - a lazy learner.
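KNN is the classic instance-based (lazy) learner described above; a minimal sketch with toy data (illustrative, not from the notes):

```python
from sklearn.neighbors import KNeighborsClassifier

# Instance-based / lazy learning: fit() essentially just stores the data
X_train = [[0], [1], [2], [10], [11], [12]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)  # no explicit model is built - the data is stored

# At prediction time, distances to the stored points decide the label
print(knn.predict([[1.5], [11.5]]))
```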
MLDLC - ML Development Life Cycle
1. Frame the problem
2. Gather the data
3. Data Preprocessing
4. Exploratory Data Analysis (EDA)
5. Feature Engineering and Feature Selection
6. Model Training, Evaluation and selection
7. Model Deployment
8. Testing
9. Optimize
1. Frame the problem:
2. Gather the data:
Loading a CSV file
# Import necessary libraries
import pandas as pd
# Load data from csv file
data = pd.read_csv('filename.csv')
# Print the first 5 rows of the dataframe
print(data.head())
Collecting data from an API
# Import necessary libraries
import requests
import json
# Define the API endpoint
url = 'https://api.example.com/data'
# Send a GET request to the API
response = requests.get(url)
# Convert the response to JSON format
data = response.json()
# Print the data
print(json.dumps(data, indent=4))
https://youtu.be/roTZJaxjnJc?feature=shared
Web Scraping:
# Import necessary libraries
from bs4 import BeautifulSoup
import requests
# Specify url
url = 'https://www.example.com'
# Send a GET request to the website
response = requests.get(url)
# Parse the html content
soup = BeautifulSoup(response.content, 'html.parser')
# Print out the parsed HTML
print(soup.prettify())
https://youtu.be/8NOdgjC1988?feature=shared
From JSON/SQL
https://youtu.be/fFwRC-fapIU?feature=shared
3. Data Preprocessing
Structural Issues
Data from different sources may not be compatible
Remove Duplicates
Handle Missing Values
Outliers
Scale - Standardization or Normalization
A few general operations:
df.shape # attribute, not a method
df.head()
df.tail()
df.sample()
df.isnull().sum()
df.duplicated().sum()
df.describe() # Summary statistics
df.info() # Column details
df.corr()
df.corr()['Age']
Here are some of the operations we perform during data preprocessing, along
with their respective Python codes:
1. Removing Duplicates:
import pandas as pd
# Assuming df is your DataFrame
df = pd.read_csv('filename.csv')
# Removing duplicates
df = df.drop_duplicates()
2. Handling Missing Values:
# Fill missing values with a constant, or the column's mean/median
df = df.fillna(value)
# Or you can drop rows with missing values
df = df.dropna()
3. Handling Outliers:
# Assuming 'column' is a column in df with outliers
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1
# Remove rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
df = df[~((df['column'] < (Q1 - 1.5 * IQR)) | (df['column'] > (Q3 + 1.5 * IQR)))]
4. Feature Scaling (Standardization):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Assuming X is your features DataFrame
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
5. Feature Scaling (Normalization):
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Assuming X is your features DataFrame
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
4. Exploratory Data Analysis (EDA):
“Study of the relationships between input and output features”
“Getting an idea about the data”
“Experimenting and extracting relationships”
Visualization
Univariate Analysis/ Bivariate Analysis
Outlier Detection
Data Imbalance
1. Visualization
The first question: is the column numerical or categorical?
Working on categorical columns:
1. Count plot
sns.countplot(x=df['survived'])
df['survived'].value_counts().plot(kind='bar')
2. Pie chart
df['survived'].value_counts().plot(kind='pie', autopct='%.2f')
working on Numerical columns:
1. Histogram
import matplotlib.pyplot as plt
plt.hist(df['Age'], bins=10) # more bins = finer-grained view
2. Distplot - PDF [Probability Density Function]
sns.distplot(df['Age']) # deprecated in newer seaborn; use sns.histplot(df['Age'], kde=True)
3. Box plot
sns.boxplot(df['Age'])
2. Bivariate and Multi variate Analysis
1. Scatter plot [Num vs Num]
bivariate
sns.scatterplot(x=tips['total_bill'], y=tips['tip'])
multivariate
sns.scatterplot(x=tips['total_bill'], y=tips['tip'], hue=tips['sex'])
sns.scatterplot(x=tips['total_bill'], y=tips['tip'], hue=tips['sex'], style=tips['smoker'], size=tips['size'])
# hue - change in color
# style - change in shape
# size - change in size
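For the "Data Imbalance" item listed under EDA, a quick first check is the class ratio (a sketch with a made-up 'survived' column):

```python
import pandas as pd

# Hypothetical target column with far more 0s than 1s
df = pd.DataFrame({'survived': [0] * 90 + [1] * 10})

counts = df['survived'].value_counts()
print(counts)
print("imbalance ratio:", counts.max() / counts.min())  # 9.0 for this toy data
```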
5. Feature Engineering and Selection:
Selecting Features
Merging columns
Minimizing the number of columns - time & cost efficient
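One common way to minimize columns is univariate feature selection; a sketch using scikit-learn's SelectKBest (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
print("before:", X.shape)  # (150, 4)

# Keep the 2 features most related to the target (ANOVA F-test)
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
print("after :", X_new.shape)  # (150, 2)
```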
Gradient Descent
https://www.youtube.com/watch?v=qg4PchTECck&list=PLqwozWPBo-FvuHWx3_aYwG2WVdbb-wC6q&index=2
import numpy as np

# Assumes X (1D NumPy feature array) and y (1D target array) are already defined
n = len(X)
learning_rate = 0.01
num_iterations = 1000
m, theta = 0.0, 0.0  # slope and intercept

# Gradient Descent
for i in range(num_iterations):
    prediction = m * X + theta
    error = prediction - y
    m = m - learning_rate * (1/n) * np.dot(X, error)
    theta = theta - learning_rate * (1/n) * error.sum()

print("Gradient descent finished at m =", m, ", theta =", theta)
Linear Regression:
Use linear regression in machine learning when you have a continuous target
variable and want to model the linear relationship between input features and the
target, making it suitable for predicting numerical outcomes.
https://www.youtube.com/watch?v=CtsRRUddV2s
# Import necessary libraries
import numpy as np
from sklearn.linear_model import LinearRegression
# Load dataset
dataset = np.loadtxt("[dataset_file_name]", delimiter=",")
X = dataset[:, 0:n_features] # X is a 2D array of feature data (n_samples, n_features)
y = dataset[:, n_features] # y is a 1D array of target data
# Train the model
regr = LinearRegression()
regr.fit(X, y)
# Make predictions
predictions = regr.predict(X)
# Evaluate model performance (R^2 score)
score = regr.score(X, y)
Logistic regression:
https://www.youtube.com/watch?v=L_xBe7MbPwk
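A logistic regression sketch to go with the video above (the dataset and parameters are illustrative): logistic regression predicts class probabilities for a categorical target.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

print("test accuracy:", round(clf.score(X_test, y_test), 3))
print("class probabilities for first test row:", clf.predict_proba(X_test[:1]))
```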
Unsupervised Learning
PCA - Principal Component Analysis
https://www.youtube.com/watch?v=FD4DeN81ODY
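A minimal PCA sketch (the dataset choice is illustrative): project 4-D data down to 2 principal components and inspect how much variance they retain.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("reduced shape:", X_2d.shape)  # (150, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
```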