
CONTENTS

MACHINE LEARNING WITH PYTHON COURSE CONTENT

Introduction to Machine Learning & Key Concepts


1. Introduction to Machine Learning
a. What is Machine Learning?
b. AI vs ML vs Deep Learning
2. How Does Machine Learning Work?
a. Overview of the ML lifecycle (Data collection, preprocessing, training,
testing, etc.)
b. Types of Machine Learning
3. Supervised Learning
a. What is Supervised Learning?
b. Supervised Learning Examples (Classification, Regression)
4. Unsupervised Learning
a. What is Unsupervised Learning?
b. Unsupervised Learning Examples (Clustering)
5. Reinforcement Learning
a. What is Reinforcement Learning?
b. Reinforcement Learning Examples (Games, Robotics)

Jupyter Notebook & Introduction to Machine Learning Algorithms


1. Setting up Jupyter Notebook
a. Installation, Setup, and Basic Commands
b. Introduction to Python Libraries for ML
2. Introduction to Machine Learning Algorithms
a. Overview of Classification Algorithms
b. Anomaly Detection Algorithms
3. Clustering Algorithms Overview
a. Introduction to Clustering
b. Types of Clustering Algorithms
4. Regression Algorithms
a. Introduction to Regression
b. Linear vs Logistic Regression
5. Demo Iris Dataset
a. Practical demo using Scikit-learn with the Iris dataset for classification

Statistics & Probability for Machine Learning


1. Categories of Data
a. Qualitative vs Quantitative Data
b. Understanding the importance of data types
2. Introduction to Statistics
a. Basic statistics terminology and concepts
3. Sampling Techniques
a. Random Sampling, Systematic Sampling, Stratified Sampling
4. Descriptive Statistics
a. Measures of Central Tendency: Mean, Median, Mode
b. Measures of Spread: Variance, Standard Deviation
5. Probability Theory
a. Basic Probability Terminology
b. Types of Events and Probability Distributions

Advanced Statistics for Machine Learning


1. Marginal Probability and Joint Probability
a. Understanding Marginal and Joint Probability
b. Conditional Probability and Bayes' Theorem
2. Inferential Statistics
a. Point Estimation, Interval Estimate, Margin of Error
3. Hypothesis Testing
a. Introduction to hypothesis testing
b. Different types of tests and how they’re used in ML
4. Information Gain & Entropy

a. Understanding Information Gain and Entropy
b. Applications in decision trees and classification
5. Confusion Matrix
a. Understanding metrics like Accuracy, Precision, Recall, F-Score
b. Confusion Matrix in real-world problems

Supervised Learning Algorithms (Part 1)


1. Linear Regression
a. Introduction and working of Linear Regression
b. Real-life applications and Titanic Data Analysis example
2. Logistic Regression
a. Understanding Logistic Regression Curve
b. Hands-on demo
3. Decision Trees
a. What is Classification?
b. Introduction to Decision Trees and Terminologies
4. Entropy & Gini Index
a. Understanding decision tree splitting criteria
b. Decision Tree on a Credit Risk Detection use case

Supervised Learning Algorithms (Part 2)


1. Random Forest
a. Introduction to Random Forest and its advantages
b. Random Forest use cases
2. KNN Algorithm
a. K-Nearest Neighbors Algorithm and Working
b. KNN Demo on a dataset
3. Naive Bayes
a. Naive Bayes working and types (Gaussian, Multinomial, Bernoulli)

b. Industrial Use of Naive Bayes (PIMA Diabetic Test)
4. Support Vector Machine (SVM)
a. SVM theory and application
b. Non-linear SVM, SVM use-case examples

Unsupervised Learning & Clustering


1. Introduction to Clustering
a. Types of Clustering Algorithms (K-Means, Hierarchical)
b. Practical use cases of clustering
2. K-Means Clustering
a. Understanding the K-Means algorithm
b. K-Means Working, Pros & Cons
c. K-Means Demo on customer segmentation
3. Hierarchical Clustering
a. Understanding Agglomerative and Divisive methods
b. Hierarchical Clustering practical example
4. Association Rule Mining
a. Introduction to Apriori Algorithm and its applications
b. Apriori Algorithm demo

Reinforcement Learning
1. Markov Decision Process (MDP)
a. Understanding MDP and its importance in RL
b. Q-Learning overview and implementation
2. The Bellman Equation
a. In-depth understanding of the Bellman Equation
b. Transitioning to Q-Learning in practical applications
3. Implementing Q-Learning
a. Implement Q-Learning in a simulated environment
b. Counter-Strike example and its use case in RL
4. Reinforcement Learning Projects

a. Hands-on practice with an RL project
b. Apply Q-Learning to a self-play game

Machine Learning Project Development (Part 1)


1. Data Preprocessing
a. Data Cleaning, Handling missing values, and encoding categorical data
2. Model Training & Testing
a. Split the dataset into training, testing, and validation
b. Model evaluation metrics
3. Model Deployment Basics
a. Deploying machine learning models to the cloud (AWS, Azure)
b. Building a basic web interface for ML models

Python Basics:
1. Introduction
a. I/O statements in Python
b. Variables, identifiers, statements, conditions
c. Programs
2. Operators
a. Data types
b. Functions / File handling
c. Comprehensions
3. OOP Concepts
a. OOP basics (classes, objects, concepts)
b. Programs

Machine Learning Project Development (Part 2)


1. Advanced ML Algorithms
a. XGBoost, LightGBM, CatBoost
b. How these algorithms differ from traditional models

2. Real World ML Project
a. Start a complete end-to-end project: data preprocessing, model training, and
evaluation
3. Optimization & Hyperparameter Tuning
a. Tuning hyperparameters using Grid Search and Random Search
b. Cross-validation in practice
4. Project Demo & Presentation
a. Prepare the project for presentation
b. Evaluate the model on the test set and validate predictions

Career Prep and Job Readiness


1. Machine Learning Engineer Role & Responsibilities
a. Overview of the ML Engineer job, skills, and career path
b. Job description, salary trends, and opportunities
2. Building an ML Engineer Resume
a. Tailoring a resume for ML job applications
b. Key skills and projects to showcase
3. ML Engineer Interview Preparation
a. Common interview questions and answers
b. Mock interviews and case study exercises
4. Building an Online Portfolio (GitHub/Kaggle)
a. Creating an online portfolio to showcase projects
b. Optimizing GitHub repositories for better visibility

Capstone Project & Review


1. Capstone Project Development
a. Choose a project involving various ML algorithms (e.g., Classification +
Clustering).
b. Develop the project end to end (Data preprocessing, Algorithm implementation,
Evaluation, and Deployment).

CHAPTER 1: EXECUTIVE SUMMARY

The internship report shall include a brief executive summary. It shall include five or more
Learning Objectives and Outcomes achieved, a brief description of the sector of business
and the intern organization, and a summary of all the activities done by the intern during the
period.

CHAPTER 2: OVERVIEW OF THE ORGANIZATION

Suggestive contents

 Introduction of the Organization: Epro Academy & Glossary Softech are committed EdTech
organizations with a strong record of high-quality training and placements. They are involved in the
training and placement of Engineering, Degree, Polytechnic, ITI, and BPO/Tech Support candidates,
with services including:

 Inbound & Outbound Support

 Data Management

 Training & Soft Skills

 Software Development

 Staffing and Solutions

 Academic projects and vocational training, with 15+ years of experience.

A. Vision, Mission, and Values of the Organization.

Epro Academy & Glossary Softech have a vision to become India's best training
organizations and to make students experts in both academics and technology. Their mission is to
provide the best-quality training at a reasonable price; like a non-profit organization, they emphasize
quality training and value-added services for students and working professionals.

B. Policy of the Organizations in relation to the intern role.

The policy of Epro Academy & Glossary Softech towards interns is to provide
best-quality training in both online and offline modes so that interns can get the best job
opportunities in this competitive world.
C. Organizational Structure.

Epro Academy and Glossary Softech maintain large labs and a complete practical
infrastructure for trainings and placements.

D. Roles and responsibilities of the employees in which the intern is placed.

At Epro Academy & Glossary Softech, employees train interns on multiple areas such as IT
skills, HR & corporate activities, job roles, personality development, and interview preparation,
and give the utmost importance to providing quality training on future trends.

E. Performance of the Organization in terms of turnover, profits, market reach and market value.

Glossary Softech puts emphasis on long-term commitment and combines global reach with
local intimacy to provide premier professional services, from consulting and system development to
business IT outsourcing. We share our customers' aspiration to realize innovation. Epro
Academy is the top training and placement organization in its region. We have played a key
role in training students in Engineering, Polytechnic, Degree, ITI, core fields, the IT sector, and the
medical sector, with excellent growth in both turnover and quality of training.

F. Future Plans of the Organization.

The major future plan of Epro Academy & Glossary Softech is to establish skill-oriented
Centers of Excellence all over India and thereby contribute to nation building.

CHAPTER 3: INTERNSHIP PART

At Epro Academy & Glossary Softech, the trainee learns about the selected domain in both theory
and practice. Live practical knowledge is imparted by industry experts, and hands-on practice is
provided. The internship includes job interview preparation and industry-ready subject knowledge.

ACTIVITY LOG FOR THE FIRST WEEK

Day     Brief description of the daily activity    Learning Outcome                                    Person In-Charge Signature

Day–1   Introduction to Machine Learning           Students learned about Machine Learning
Day–2   Explain how Machine Learning works         Students learned how Machine Learning works
Day–3   Explain Supervised Learning                Students learned about Supervised Learning
Day–4   Explain Unsupervised Learning              Students learned about Unsupervised Learning
Day–5   Explain Reinforcement Learning             Students learned about Reinforcement Learning
Day–6   Explain Reinforcement Learning             Students learned about Reinforcement Learning

WEEKLY REPORT
WEEK–1(From Dt………..…..to Dt.................. )

Objective of the Activity Done:

Detailed Report:

Explain Introduction to Machine Learning

Explain how Machine Learning works

Explain Supervised Learning

Explain Unsupervised Learning

Explain Reinforcement Learning

Introduction to Machine Learning

What is Machine Learning?

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables computers to
learn patterns from data and make decisions or predictions without explicit programming. The
core idea is to build models that improve automatically through experience.

AI vs ML vs Deep Learning

 Artificial Intelligence (AI): A broad field that involves creating intelligent machines
capable of simulating human thinking.
 Machine Learning (ML): A subset of AI where machines learn from data without
being explicitly programmed.
 Deep Learning (DL): A further subset of ML that uses neural networks with multiple
layers to analyze complex data.

How Does Machine Learning Work?

Overview of the ML Lifecycle

1. Data Collection – Gather raw data from various sources.


2. Data Preprocessing – Clean, normalize, and transform the data.
3. Feature Engineering – Select and create relevant features for training.
4. Model Selection – Choose an appropriate algorithm (e.g., Decision Trees, Neural
Networks).
5. Training – Fit the model to the training data.
6. Evaluation & Testing – Assess model performance using test data.
7. Deployment & Monitoring – Deploy the model into production and track its
performance.

Types of Machine Learning

 Supervised Learning: Uses labeled data for training (e.g., spam detection).
 Unsupervised Learning: Finds hidden patterns in unlabeled data (e.g., customer
segmentation).
 Reinforcement Learning: Trains an agent through rewards and penalties (e.g., game
playing AI).

Supervised Learning

What is Supervised Learning?

Supervised Learning is a type of ML where the model learns from labeled datasets. Each
input data point has a corresponding correct output, and the algorithm maps the input to the
output.

Supervised Learning Examples

 Classification: Categorizing data into predefined classes (e.g., email spam detection,
disease diagnosis).
 Regression: Predicting continuous values (e.g., house price prediction, stock market
forecasting).

Unsupervised Learning

What is Unsupervised Learning?

Unsupervised Learning involves training models on unlabeled data, allowing them to
discover hidden structures or patterns.

Unsupervised Learning Examples

 Clustering: Grouping similar data points together (e.g., customer segmentation in
marketing).
 Dimensionality Reduction: Reducing the number of features while retaining
information (e.g., PCA for image compression).

Reinforcement Learning

What is Reinforcement Learning?

Reinforcement Learning (RL) is a learning paradigm where an agent interacts with an
environment, learns through trial and error, and receives rewards for good actions and
penalties for bad ones.

Reinforcement Learning Examples

 Games: AI playing chess, Go, or video games (e.g., AlphaGo, OpenAI's Dota 2).
 Robotics: Training robots to perform complex tasks (e.g., self-driving cars, industrial
automation).

ACTIVITY LOG FOR THE SECOND WEEK

Day     Brief description of the daily activity               Learning Outcome                                           Person In-Charge Signature

Day–1   Explain Setting up Jupyter Notebook                   Students learned about setting up Jupyter Notebook
Day–2   Explain Introduction to Machine Learning Algorithms   Students learned about Machine Learning algorithms
Day–3   Explain Clustering Algorithms Overview                Students learned about the clustering algorithms overview
Day–4   Explain Regression Algorithms                         Students learned about regression algorithms
Day–5   Demo: Iris Dataset                                    Students learned from the Iris dataset demo
Day–6   Demo: Iris Dataset                                    Students learned from the Iris dataset demo

WEEKLY REPORT
WEEK–2 (From Dt………..…..to Dt................. )

Objective of the Activity Done:

Detailed Report:

Explain Setting up Jupyter Notebook

Explain Introduction to Machine Learning Algorithms

Explain Clustering Algorithms Overview

Explain Regression Algorithms

Explain Demo: Iris Dataset

Setting up Jupyter Notebook

Installation, Setup, and Basic Commands

Jupyter Notebook is an open-source interactive computing environment that allows users to create
and share documents containing live code, equations, visualizations, and text. It is widely used in
Machine Learning (ML) for prototyping and experimentation.

Installation Steps:

1. Using Anaconda (Recommended for Beginners)


o Download and install Anaconda from Anaconda's official website.
o Open Anaconda Navigator and launch Jupyter Notebook.
2. Using pip (For Advanced Users)
o Install Jupyter Notebook using pip:

pip install notebook

o Run Jupyter Notebook:

jupyter notebook

o The Jupyter interface will open in your web browser.

Basic Jupyter Commands:

 Create a new notebook: Click "New" → "Python 3".


 Run a cell: Press Shift + Enter.
 Stop execution: Click on the "Stop" button or press Kernel -> Interrupt Kernel.
 Save notebook: Ctrl + S or "File -> Save".
 Add a markdown cell: Change cell type from "Code" to "Markdown".

Introduction to Python Libraries for ML

1. NumPy – For numerical computations and handling arrays.

import numpy as np
a = np.array([1, 2, 3])
print(a)

2. Pandas – For data manipulation and analysis.

import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())

3. Matplotlib & Seaborn – For data visualization.

import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(df['column_name'])
plt.show()

4. Scikit-learn – For implementing ML algorithms.

from sklearn.model_selection import train_test_split

Introduction to Machine Learning Algorithms

Overview of Classification Algorithms

Classification is a Supervised Learning technique where the goal is to predict categorical labels.

Common Classification Algorithms:

1. Decision Trees – Tree-based models for classification.


2. Random Forest – An ensemble of decision trees for better accuracy.
3. Support Vector Machines (SVM) – Finds the best boundary (hyperplane) between classes.
4. K-Nearest Neighbors (KNN) – Classifies a data point based on its nearest neighbors.
5. Naïve Bayes – Based on probability and Bayes’ theorem.

Anomaly Detection Algorithms

Anomaly detection is used for identifying rare events or unusual patterns in data.

Common Anomaly Detection Techniques:

 Isolation Forest – Uses decision trees to isolate anomalies.
 Local Outlier Factor (LOF) – Measures the local density of data points.
 Autoencoders – Neural networks for detecting anomalies.
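
As a quick illustration, here is a minimal sketch of anomaly detection using Scikit-learn's IsolationForest on a tiny made-up dataset (the values below are illustrative only, not from a real dataset):

from sklearn.ensemble import IsolationForest
import numpy as np

# Mostly "normal" points around 10, plus two obvious outliers (synthetic data)
X = np.array([[10.1], [9.8], [10.3], [9.9], [10.0], [50.0], [-40.0]])

# contamination is the assumed fraction of outliers in the data
clf = IsolationForest(contamination=0.3, random_state=42)
labels = clf.fit_predict(X)  # returns 1 for inliers, -1 for anomalies
print(labels)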

Clustering Algorithms Overview

Introduction to Clustering

Clustering is an Unsupervised Learning technique where similar data points are grouped together
based on features.

Types of Clustering Algorithms

1. K-Means Clustering – Groups data into ‘K’ clusters based on the centroid approach.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3)
kmeans.fit(X)

2. Hierarchical Clustering – Creates a tree of clusters (dendrogram).
3. DBSCAN (Density-Based Spatial Clustering) – Detects clusters based on density.

Regression Algorithms

Introduction to Regression

Regression is a Supervised Learning technique used to predict continuous values.

Linear vs Logistic Regression

1. Linear Regression:
o Predicts continuous values.
o Example: Predicting house prices based on area.
o Formula: Y = mX + b

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

2. Logistic Regression:
o Used for binary classification problems.
o Example: Spam detection (Spam or Not Spam).
o Uses a sigmoid function to map outputs between 0 and 1.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

Demo – Iris Dataset (Using Scikit-learn for Classification)

Introduction to the Iris Dataset

 The Iris dataset contains 150 samples of iris flowers, categorized into three species: Setosa,
Versicolor, and Virginica.
 Features include sepal length, sepal width, petal length, and petal width.

Practical Demo: Implementing a Classification Model with Scikit-learn

Step 1: Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

Step 2: Load Dataset

iris = datasets.load_iris()
X = iris.data
y = iris.target

Step 3: Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

Step 4: Normalize Data

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Step 5: Train a K-Nearest Neighbors (KNN) Model

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

Step 6: Make Predictions and Evaluate the Model

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

Key Takeaways:

 Understanding the basic ML workflow using real-world data.


 Implementing ML algorithms using Scikit-learn.
 Evaluating model performance using accuracy and confusion matrix.

Summary:


⬛ Setting up and using Jupyter Notebook for ML development.

⬛ Introduction to classification, clustering, and regression algorithms.

⬛ Hands-on experience with Scikit-learn using the Iris dataset.

ACTIVITY LOG FOR THE THIRD WEEK

Day     Brief description of the daily activity    Learning Outcome                                     Person In-Charge Signature

Day–1   Explain about Categories of Data           Students learned about Categories of Data
Day–2   Explain about Categories of Data           Students learned about Categories of Data
Day–3   Explain about Introduction to Statistics   Students learned about Introduction to Statistics
Day–4   Explain about Sampling Techniques          Students learned about Sampling Techniques
Day–5   Explain about Descriptive Statistics       Students learned about Descriptive Statistics
Day–6   Explain about Probability Theory           Students learned about Probability Theory

WEEKLY REPORT
WEEK–3(From Dt………..…..to Dt.................. )

Objective of the Activity Done:

Detailed Report:

Explain about Categories of Data

Explain about Categories of Data

Explain about Introduction to Statistics

Explain about Sampling Techniques

Explain about Descriptive Statistics

Explain about Probability Theory

Statistics & Probability for Machine Learning
Categories of Data

Qualitative vs Quantitative Data

Data can be broadly categorized into two types:

1. Qualitative (Categorical) Data:


o Describes characteristics or labels without numerical value.
o Example: Colors (Red, Blue), Gender (Male, Female), Customer Feedback (Good,
Bad).
o Types of Qualitative Data:
 Nominal Data: Categories without a meaningful order (e.g., eye color, car
brands).
 Ordinal Data: Categories with a meaningful order but no fixed interval
(e.g., ratings like Good, Better, Best).
2. Quantitative (Numerical) Data:
o Represents numerical values that can be measured.
o Example: Age, Height, Weight, Salary.
o Types of Quantitative Data:
 Discrete Data: Countable and finite (e.g., number of students in a class).
 Continuous Data: Infinite and measurable values (e.g., temperature,
weight).

Understanding the Importance of Data Types

 Determines the choice of statistical techniques and visualization methods.


 Helps in selecting the right Machine Learning algorithms.
 Influences how data is processed and analyzed (e.g., categorical encoding for qualitative
data).

Introduction to Statistics

Basic Statistics Terminology and Concepts

1. Population vs Sample:
o Population: Entire dataset or group under study (e.g., all students in a country).
o Sample: Subset of the population used for analysis (e.g., students in a particular
city).
2. Parameter vs Statistic:
o Parameter: A value that describes a population (e.g., average height of all
students).
o Statistic: A value derived from a sample (e.g., average height of students in one
school).
3. Variables:
o Independent Variable (Predictor): The input variable influencing the outcome.
o Dependent Variable (Response): The outcome being measured.
4. Bias & Variability:
o Bias: Systematic error in data leading to inaccurate results.
o Variability: Degree of spread in data points (higher variability = less reliable
predictions).

Sampling Techniques

Sampling techniques help select representative data from a population.

Types of Sampling Techniques:

1. Random Sampling:
o Every individual has an equal chance of selection.
o Example: Picking names randomly from a list.
2. Systematic Sampling:
o Selecting every ‘k-th’ individual from a list.
o Example: Selecting every 10th customer in a survey.
3. Stratified Sampling:
o Population is divided into subgroups (strata), and random samples are taken from
each.
o Example: Sampling equal proportions from male and female students in a university.

Importance of Sampling:

 Ensures representative data collection.


 Reduces bias and improves accuracy in predictions.
 Saves time and cost compared to analyzing the entire population.
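
In ML practice, stratified sampling commonly appears when splitting datasets. A minimal sketch using Scikit-learn's train_test_split with the stratify option (the labels here are made up for illustration):

from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(20).reshape(-1, 1)       # 20 sample points (illustrative)
y = np.array([0] * 15 + [1] * 5)       # imbalanced labels: 75% class 0, 25% class 1

# stratify=y keeps the 75/25 class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(np.bincount(y_train), np.bincount(y_test))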

Descriptive Statistics

Descriptive statistics summarize and analyze data distribution.

Measures of Central Tendency:

1. Mean (Average):

Mean = (∑X) / N

o Example: The average salary of employees in a company.


2. Median (Middle Value):
o The middle value when data is arranged in ascending order.
o Less affected by outliers.
o Example: Median household income in a country.
3. Mode (Most Frequent Value):
o The most common value in a dataset.

o Example: The most frequently purchased mobile phone model.

Measures of Spread (Dispersion):

1. Variance (σ²):
o Measures how far data points are from the mean.

Variance = ∑(X − Mean)² / N

o High variance = more spread-out data.


2. Standard Deviation (σ):
o Square root of variance.
o Represents data dispersion in original units.

Standard Deviation = √Variance

Example Calculation (Python):

import numpy as np

data = [10, 20, 30, 40, 50]

mean = np.mean(data)
median = np.median(data)
mode = max(set(data), key=data.count)  # simple mode calculation
variance = np.var(data)
std_dev = np.std(data)

print(f"Mean: {mean}, Median: {median}, Mode: {mode}")
print(f"Variance: {variance}, Standard Deviation: {std_dev}")

Probability Theory

Probability is used to measure the likelihood of events occurring in data analysis and machine
learning.

Basic Probability Terminology

 Experiment: A process that leads to an outcome (e.g., rolling a die).


 Sample Space (S): The set of all possible outcomes (e.g., {1, 2, 3, 4, 5, 6}).
 Event: A subset of the sample space (e.g., rolling an even number {2, 4, 6}).
 Probability Formula:

P(A) = Number of Favorable Outcomes / Total Number of Outcomes

Types of Events in Probability

1. Independent Events:
o The occurrence of one event does not affect the probability of another.
o Example: Rolling a die and flipping a coin.
2. Dependent Events:
o The probability of one event depends on another.
o Example: Drawing cards without replacement from a deck.

3. Mutually Exclusive Events:
o Two events cannot happen at the same time.
o Example: A single dice roll cannot be both 1 and 6.
4. Conditional Probability:
o The probability of event A occurring given that event B has already occurred.

P(A|B) = P(A∩B) / P(B)

o Example: Probability of drawing a King given that a Queen was drawn earlier from
a deck.

Probability Distributions

1. Discrete Probability Distribution (e.g., Binomial Distribution):


o Deals with countable outcomes (e.g., flipping a coin multiple times).
2. Continuous Probability Distribution (e.g., Normal Distribution):
o Deals with continuous values (e.g., height, temperature).
o Normal Distribution Formula:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
o Example: Bell curve distribution of students' exam scores.

Example Probability Calculation (Python):

from scipy.stats import norm

mean = 50
std_dev = 10
prob = norm.cdf(60, mean, std_dev) # Probability of a value ≤ 60
print(f"Probability of a value ≤ 60: {prob:.4f}")

Summary:


⬛ Understanding data types and their significance in ML.

⬛ Learning sampling techniques for unbiased data collection.

⬛ Applying descriptive statistics to summarize data distributions.

⬛ Gaining foundational knowledge in probability for ML models.

ACTIVITY LOG FOR THE FOURTH WEEK

Day     Brief description of the daily activity                   Learning Outcome                                        Person In-Charge Signature

Day–1   Explain about Marginal Probability and Joint Probability  Students learned about Marginal and Joint Probability
Day–2   Explain about Inferential Statistics                      Students learned about Inferential Statistics
Day–3   Explain about Hypothesis Testing                          Students learned about Hypothesis Testing
Day–4   Explain about Information Gain & Entropy                  Students learned about Information Gain & Entropy
Day–5   Explain about Confusion Matrix                            Students learned about the Confusion Matrix
Day–6   Explain about Confusion Matrix                            Students learned about the Confusion Matrix

WEEKLY REPORT
WEEK–4 (From Dt………..…..to Dt................. )

Objective of the Activity Done:

Detailed Report:

Explain about Marginal Probability and Joint Probability

Explain about Inferential Statistics

Explain about Hypothesis Testing

Explain about Information Gain & Entropy

Explain about Confusion Matrix

Advanced Statistics for Machine Learning
This week covers advanced statistical concepts used in Machine Learning (ML) to make data-
driven decisions and evaluate models effectively.

Marginal Probability and Joint Probability


Understanding Marginal and Joint Probability

1. Marginal Probability

 Probability of a single event occurring, regardless of other variables.


 Example: The probability that a student passes an exam (P(Pass)), ignoring study hours.
 Formula: P(A) = ∑ᵦ P(A, B)

where P(A, B) is the joint probability of A and B occurring, summed over all values of B.

2. Joint Probability

 Probability of two events occurring together.


 Example: The probability that a student studies for 5+ hours and passes an exam (P(Study & Pass)).
 Formula: P(A∩B) = P(A) × P(B) if A and B are independent; in general, P(A∩B) = P(A|B) × P(B).
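
As a small numeric illustration, marginal and joint probabilities can be estimated from a frequency table; this sketch uses made-up study/pass counts:

import numpy as np

# Joint counts (rows: studied 5+ hours or not; columns: pass, fail) - made-up numbers
counts = np.array([[40, 10],   # studied: 40 passed, 10 failed
                   [20, 30]])  # did not study: 20 passed, 30 failed
joint = counts / counts.sum()      # joint probabilities, e.g. P(Study, Pass)
p_pass = joint[:, 0].sum()         # marginal P(Pass): sum over study status
p_study_and_pass = joint[0, 0]     # joint P(Study & Pass)
print(p_pass, p_study_and_pass)    # 0.6 and 0.4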

Conditional Probability and Bayes' Theorem

3. Conditional Probability

 Probability of event A occurring given that event B has already occurred.


 Example: Probability of passing given that a student studied for 5+ hours (P(Pass | Study)).
 Formula: P(A|B) = P(A∩B) / P(B)

4. Bayes' Theorem

 Used to update probabilities based on new evidence.


 Formula: P(A|B) = (P(B|A) × P(A)) / P(B)
 Example in ML: Used in Naïve Bayes Classifier to classify spam emails.
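
A tiny numeric sketch of Bayes' Theorem in the spam-filter spirit (all numbers below are invented for illustration):

# Hypothetical numbers: 30% of emails are spam; the word "offer" appears in
# 60% of spam emails and 5% of non-spam emails.
p_spam = 0.30
p_word_given_spam = 0.60
p_word_given_ham = 0.05

# Total probability of seeing the word, then Bayes' Theorem
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | word) = {p_spam_given_word:.3f}")  # ≈ 0.837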

Inferential Statistics
Inferential statistics help make predictions about a population based on a sample.

1. Point Estimation

 Definition: Estimating a population parameter using a single value (e.g., sample mean for
population mean).
 Example: Predicting the average salary of all data scientists using a small sample.

2. Interval Estimate

 Provides a range of values instead of a single number.


 Example: "The average salary of a data scientist is $120,000 ± $5,000."

3. Margin of Error (MOE)

 Indicates the uncertainty in an estimate.


 Formula: MOE = Z × (σ / √n), where:
o Z = Z-score (based on confidence level)
o σ = Population standard deviation
o n = Sample size
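
A quick sketch of the formula with assumed values (95% confidence level, so Z ≈ 1.96; σ and n are made up):

import math

z = 1.96       # Z-score for a 95% confidence level
sigma = 10     # assumed population standard deviation
n = 100        # sample size

moe = z * sigma / math.sqrt(n)   # MOE = Z * (sigma / sqrt(n))
print(f"Margin of Error: ±{moe:.2f}")  # ±1.96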

Application in ML

 Confidence intervals in A/B testing for model evaluation.


 Used in confidence estimation of ML predictions.

Hypothesis Testing
Used to validate claims or compare models.

1. Introduction to Hypothesis Testing

 Null Hypothesis (H₀): No effect or difference.


 Alternative Hypothesis (H₁): There is an effect or difference.
 Example:
o H₀: "Studying has no effect on exam performance."
o H₁: "Studying improves exam performance."

2. Types of Hypothesis Tests in ML

Test Type                       Usage Example
Z-Test                          Checking if a sample mean differs from the population mean (large sample, known variance).
T-Test                          Comparing means of two groups (e.g., average test scores before & after ML training).
Chi-Square Test                 Used for categorical data analysis (e.g., checking correlation between gender and product preference).
ANOVA (Analysis of Variance)    Used to compare more than two groups (e.g., performance of multiple ML models).

3. P-Value & Statistical Significance

 P-value < 0.05 → Reject H₀ (Significant effect).


 P-value > 0.05 → Fail to reject H₀ (No significant effect).

Example in Python:

from scipy.stats import ttest_ind

group1 = [85, 90, 88, 92, 87]
group2 = [75, 78, 80, 74, 79]

t_stat, p_value = ttest_ind(group1, group2)
print(f"P-value: {p_value}")

Information Gain & Entropy


Used in decision trees to measure the purity of a dataset.

1. Understanding Entropy

 Measures uncertainty in a dataset.


 Formula: H(S) = −∑ pᵢ log₂(pᵢ)
 Example:
o Dataset with 50% spam and 50% non-spam → High entropy (uncertain).
o Dataset with 90% spam and 10% non-spam → Low entropy (more pure).

2. Information Gain (IG)

 Measures reduction in entropy after a split.


 Formula: IG = H(parent) − ∑ (samples in child node / total samples) × H(child)
 Used in: Decision Trees (ID3, C4.5, CART).

Example in Python (Using Scikit-learn):

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion='entropy')
tree.fit(X, y)

Confusion Matrix
Used to evaluate classification models.

1. Understanding the Confusion Matrix

Actual \ Predicted    Positive (1)           Negative (0)
Positive (1)          True Positive (TP)     False Negative (FN)
Negative (0)          False Positive (FP)    True Negative (TN)

2. Key Metrics from the Confusion Matrix

1. Accuracy:

Accuracy=(TP+TN)/(TP+TN+FP+FN)

o Overall performance of the model.


2. Precision:

Precision=TP/(TP+FP)

o How many positive predictions were correct?


o Important for spam detection.
3. Recall (Sensitivity):

Recall=TP/(TP+FN)

o How many actual positives were correctly predicted?


o Important for medical diagnosis.
4. F1-Score:

F1=(2×Precision×Recall)/(Precision+Recall)

o Harmonic mean of Precision and Recall.

3. Example in Python
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
print("\nClassification Report:\n", classification_report(y_true, y_pred))

Summary

⬛ Marginal & Joint Probability, Bayes’ Theorem for probabilistic ML models.

⬛ Inferential Statistics for making data-driven predictions.

⬛ Hypothesis Testing for validating ML model improvements.

⬛ Entropy & Information Gain for decision trees.

⬛ Confusion Matrix & Performance Metrics for evaluating classification models.

ACTIVITY LOG FOR THE FIFTH WEEK

Day     Brief description of the daily activity   Learning Outcome                                   Person In-Charge Signature

Day–1   Explain about Linear Regression           Students learned about Linear Regression
Day–2   Explain about Logistic Regression         Students learned about Logistic Regression
Day–3   Explain about Decision Trees              Students learned about Decision Trees
Day–4   Explain about Decision Trees              Students learned about Decision Trees
Day–5   Explain about Entropy & Gini Index        Students learned about Entropy & the Gini Index
Day–6   Explain about Entropy & Gini Index        Students learned about Entropy & the Gini Index

WEEKLY REPORT
WEEK–5(From Dt………..…..to Dt.................. )

Objective of the Activity Done:

Detailed Report:

Explain about Linear Regression

Explain about Logistic Regression

Explain about Decision Trees

Explain about Entropy & Gini Index

Linear Regression
Introduction and Working of Linear Regression

Linear Regression is a fundamental supervised learning algorithm used for predicting continuous
values. It establishes a relationship between a dependent variable Y and one or more independent
variables X. The relationship is modeled as:

Y=mX+c+ϵ

where:

 Y = Dependent variable (target variable)


 X = Independent variable(s) (features)
 m = Slope (coefficient)
 c = Intercept (bias)
 ϵ = Error term

Types of Linear Regression

1. Simple Linear Regression: A single independent variable is used to predict the dependent
variable.
2. Multiple Linear Regression: Multiple independent variables are used for prediction.

Real-life Applications of Linear Regression

 Predicting House Prices: Relationship between square footage and price.


 Stock Market Prediction: Estimating stock trends using historical data.
 Sales Forecasting: Predicting future sales based on past sales data.

Titanic Data Analysis Example

Linear regression can be applied to analyze Titanic survival predictions by examining variables like
passenger class, age, fare, etc., to estimate the likelihood of survival.
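
A minimal Linear Regression sketch with Scikit-learn, using a tiny made-up area-vs-price dataset (the numbers are illustrative only):

from sklearn.linear_model import LinearRegression
import numpy as np

# Made-up data: house area (sq. ft.) vs price (in $1000s)
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([150, 180, 210, 260, 300])

model = LinearRegression()
model.fit(X, y)
print(model.coef_[0], model.intercept_)   # learned slope m and intercept c
print(model.predict([[1300]]))            # predicted price for a 1300 sq. ft. house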

Logistic Regression
Understanding Logistic Regression Curve

Logistic Regression is a supervised learning algorithm used for binary classification problems
(e.g., spam detection, fraud detection). Unlike linear regression, it predicts probabilities rather than
continuous values.

P(Y=1|X) = 1 / (1 + e^(−(β₀ + β₁X)))

where:

 P(Y=1∣X) is the probability of an event occurring.


 β0,β1 are the regression coefficients.
 X is the independent variable.
 The function transforms the output to a value between 0 and 1 using the sigmoid function:

σ(z) = 1 / (1 + e^(−z))

z is a linear combination of the input features: z = w₀ + w₁x₁ + w₂x₂ + ⋯ + wₙxₙ (where the wᵢ are the
weights and the xᵢ are the input features).
σ(z) maps any real number to a value between 0 and 1.
 S-shaped curve (also called the "logistic curve").
 Output is always between 0 and 1 (suitable for probabilities).
 If z=0 the probability is 0.5 (neutral decision boundary).
 As z→∞, the probability approaches 1.
 As z→−∞ the probability approaches 0.

Types of Logistic Regression

1. Binary Logistic Regression: Two possible outcomes (e.g., "Yes" or "No").


2. Multinomial Logistic Regression: More than two categories (e.g., Classifying types of
flowers).
3. Ordinal Logistic Regression: Ordered categories (e.g., customer ratings from 1 to 5).

Hands-on Demo

 Implementing Logistic Regression using Python (Scikit-Learn)


 Example: Predicting whether a passenger survived the Titanic disaster using features like
Age, Sex, and Passenger Class.

Step 1: Import Required Libraries


import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Step 2: Load the Titanic Dataset


# Load Titanic dataset from Seaborn
titanic = sns.load_dataset('titanic')

# Display first few rows


titanic.head()

Step 3: Data Preprocessing

Select Relevant Features

We will use the following features:

 Age (numerical)
 Sex (categorical: male/female)
 Pclass (categorical: 1st, 2nd, 3rd class)

# Select relevant columns
df = titanic[['survived', 'sex', 'age', 'pclass']]

# Drop rows with missing values
df = df.dropna()

# Convert categorical 'sex' into numerical (0 for female, 1 for male)
df['sex'] = df['sex'].map({'male': 1, 'female': 0})

# Split dataset into features (X) and target variable (y)
X = df[['sex', 'age', 'pclass']]
y = df['survived']

Step 4: Split Data into Training and Testing Sets

# Split data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

Step 5: Feature Scaling (Standardization)

Since Age has different scales compared to other features, we apply Standardization.

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Step 6: Train Logistic Regression Model


# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

Step 7: Make Predictions


# Predict survival on test data
y_pred = model.predict(X_test)

Step 8: Evaluate the Model


# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

Step 9: Visualize the Results


# Plot Confusion Matrix
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Results Interpretation

 Accuracy: Measures how many predictions are correct.


 Confusion Matrix: Shows TP, FP, TN, FN values.
 Classification Report: Displays Precision, Recall, and F1-score.

Decision Trees
What is Classification?

Classification is a supervised learning task where the goal is to categorize data points into
predefined classes (labels).

 Binary Classification: Two possible outcomes (e.g., "Spam" or "Not Spam").


 Multiclass Classification: More than two outcomes (e.g., classifying different species of
flowers).

Introduction to Decision Trees and Terminologies

A Decision Tree is a tree-like structure where internal nodes represent features, branches represent
decision rules, and leaf nodes represent the outcome.

Key Terminologies:

 Root Node: The topmost node representing the entire dataset.


 Splitting: Dividing nodes based on feature values.
 Decision Node: A node that splits further.
 Leaf Node: A node with a final classification outcome.
 Pruning: Removing branches to avoid overfitting.

How Decision Trees Work?
1. Start with the root node containing all training samples.
2. Choose the best feature to split the data based on a splitting criterion (e.g., Gini Index,
Entropy).
3. Recursively split data into branches until all data points belong to a specific class.
4. Prune the tree to improve generalization.

Entropy & Gini Index


Understanding Decision Tree Splitting Criteria

Decision trees use different criteria to decide the best feature to split on. Two popular methods are
Entropy and Gini Index.

Entropy (Information Gain)

Entropy measures the randomness or disorder in data. It is calculated as:

H(S) = −∑ pᵢ log₂(pᵢ)

where pᵢ is the probability of class i. A lower entropy value indicates a purer node.
Information Gain (IG) is used to determine the best feature for splitting:

IG(S, A) = H(S) − ∑ᵥ (|Sᵥ| / |S|) × H(Sᵥ)

 S = original dataset
 A = feature used for splitting

|Sᵥ| / |S| = proportion of the subset Sᵥ in the dataset

where the Sᵥ are the subsets after splitting on A. A feature with high information gain is preferred.

Gini Index

Gini Index measures impurity and is given by:

Gini = 1 − ∑ pᵢ²

A lower Gini Index indicates a better split.

Decision Tree on Credit Risk Detection Use-Case

 Problem Statement: Predict whether a loan applicant is at risk of default.


 Features: Age, Income, Credit Score, Loan Amount, etc.
 Approach:
1. Train a Decision Tree using historical loan data.
2. Use Entropy or Gini Index for feature selection.
3. Evaluate model performance using accuracy, precision, and recall.
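
A minimal sketch of this approach with a DecisionTreeClassifier; the applicant features and labels below are made up for illustration, not a real credit dataset:

from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Illustrative features: [age, income ($1000s), credit score, loan amount ($1000s)]
X = np.array([[25, 30, 600, 20], [45, 90, 750, 15], [35, 50, 640, 40],
              [50, 120, 800, 10], [23, 25, 580, 30], [40, 70, 700, 25]])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = default risk, 0 = safe (made-up labels)

# criterion='gini' uses the Gini Index; 'entropy' would use Information Gain
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
tree.fit(X, y)
print(tree.predict([[30, 40, 610, 35]]))   # risk prediction for a new applicant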

ACTIVITY LOG FOR THE SIXTH WEEK

Day     Brief description of the daily activity      Learning Outcome                                        Person In-Charge Signature

Day–1   Explain about Random Forest                  Students learned about Random Forest
Day–2   Explain about KNN Algorithm                  Students learned about the KNN Algorithm
Day–3   Explain about Naive Bayes                    Students learned about Naive Bayes
Day–4   Explain about Naive Bayes                    Students learned about Naive Bayes
Day–5   Explain about Support Vector Machine (SVM)   Students learned about Support Vector Machine (SVM)
Day–6   Explain about Support Vector Machine (SVM)   Students learned about Support Vector Machine (SVM)

WEEKLY REPORT
WEEK–6(From Dt……….…..to Dt................... )

Objective of the Activity Done:

Detailed Report:

Explain about Random Forest

Explain about KNN Algorithm

Explain about Naive Bayes

Explain about Support Vector Machine (SVM)

Random Forest
Introduction to Random Forest and Its Advantages
Random Forest is an ensemble learning method that builds multiple decision trees and combines
their outputs to improve accuracy and reduce overfitting.

How Random Forest Works:

1. Bootstrap Sampling: Randomly select subsets of training data with replacement.


2. Multiple Decision Trees: Train each tree on a different subset.
3. Voting Mechanism:
o Classification: Majority voting decides the final class label.
o Regression: Average of all tree predictions is taken.

Advantages of Random Forest:


⬛ Reduces Overfitting: Multiple trees generalize better.

⬛ Handles Missing Values: Works well even with incomplete data.

⬛ Works on Large Datasets: Efficient in handling high-dimensional data.

⬛ Handles Non-linearity: Unlike simple decision trees, it captures complex relationships.

Random Forest Use-cases:


 Medical Diagnosis: Predicting diseases like diabetes and cancer.
 Banking & Credit Scoring: Fraud detection and loan approval predictions.
 Stock Market Prediction: Forecasting stock trends based on past patterns.
 Recommendation Systems: Improving movie or e-commerce recommendations.
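
A minimal RandomForestClassifier sketch on the Iris dataset (used here simply because it ships with Scikit-learn):

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 trees; each tree is trained on a bootstrap sample of the training data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")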

K-Nearest Neighbors (KNN) Algorithm


KNN Algorithm and Working
KNN is a non-parametric, lazy learning algorithm used for classification and regression tasks.

How KNN Works:

1. Choose the number of neighbors (k).


2. Calculate the distance between the query point and all training points using:
o Euclidean Distance
o Manhattan Distance
o Minkowski Distance
3. Identify the k nearest neighbors.
4. Assign the majority class label (for classification) or take the average (for regression).

Choosing the Best k Value:

 Small k → High variance, more noise sensitivity.


 Large k → More bias, less overfitting.
 Use cross-validation to select the best k.
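
A short sketch of selecting k by cross-validation, assuming the Iris dataset as a stand-in:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Try several odd k values and keep the one with the best 5-fold CV accuracy
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")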

KNN Demo on a Dataset:


 Load a dataset (e.g., Iris Dataset or MNIST Handwritten Digits).
 Implement KNN using Scikit-Learn.
 Evaluate performance using accuracy, precision, recall, and F1-score.

Advantages & Disadvantages of KNN



⬛ Simple & Intuitive: No complex model training required.

⬛ Works Well for Small Datasets: Good for low-dimensional data.
✗ Computationally Expensive: Slow for large datasets.
✗ Sensitive to Feature Scaling: Needs normalization (e.g., Min-Max scaling).

Naive Bayes
Naive Bayes Working & Formula
Naive Bayes is a probabilistic classifier based on Bayes’ Theorem:

P(A∣B) = (P(B∣A)P(A))/P(B)

where:

 P(A∣B) → Probability of event A occurring given event B.


 P(B∣A) → Probability of event B given A.
 P(A) & P(B) → Prior probabilities.

The "Naive" assumption is that features are independent, which simplifies calculations.

Types of Naive Bayes Classifiers:


1. Gaussian Naive Bayes: Assumes features follow a normal distribution.
2. Multinomial Naive Bayes: Used for text classification (e.g., spam detection).
3. Bernoulli Naive Bayes: Deals with binary feature values (e.g., word presence in spam
emails).

Industrial Use of Naive Bayes (PIMA Diabetic Test)


 The PIMA Diabetes dataset contains medical details of patients.

 The goal is to classify whether a patient has diabetes using Naive Bayes.
 Features include glucose levels, blood pressure, insulin, age, etc.
 Why Naive Bayes? It works well even with small datasets and is computationally efficient.
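
A minimal GaussianNB sketch in the spirit of the PIMA use case; the patient values and labels below are made up for illustration, standing in for the real dataset:

from sklearn.naive_bayes import GaussianNB
import numpy as np

# Illustrative features: [glucose, blood pressure, age] - not the real PIMA data
X = np.array([[148, 72, 50], [85, 66, 31], [183, 64, 32],
              [89, 66, 21], [137, 40, 33], [116, 74, 30]])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = diabetic, 0 = non-diabetic (made-up labels)

model = GaussianNB()   # assumes each feature is normally distributed per class
model.fit(X, y)
print(model.predict([[120, 70, 35]]))         # class prediction for a new patient
print(model.predict_proba([[120, 70, 35]]))   # class probabilities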

Advantages & Disadvantages of Naive Bayes



⬛ Fast and Efficient: Works well in real-time applications.

⬛ Handles High-Dimensional Data: Ideal for text classification (e.g., spam filtering).
✗ Assumes Feature Independence: Not always realistic in real-world scenarios.
✗ Performs Poorly with Highly Correlated Features.

Support Vector Machine (SVM)


SVM Theory and Application
SVM is a powerful supervised learning algorithm used for both classification and regression.

How SVM Works?

1. Finds the Optimal Decision Boundary (Hyperplane) that separates classes.


2. Maximizes the Margin between data points closest to the hyperplane (support vectors).
3. Uses Kernel Trick to handle non-linear data.

Mathematical Representation:

For a hyperplane defined as:

wX+b=0

The decision function is:

y=sign(wX+b)

where:

 w = Weight vector
 X = Feature vector
 b = Bias

Non-Linear SVM & Kernel Functions


For non-linearly separable data, SVM uses kernel functions to transform data into higher
dimensions where a linear separator exists.

Popular Kernels:

1. Linear Kernel: For linearly separable data.
2. Polynomial Kernel: For moderately complex data.
3. Radial Basis Function (RBF) Kernel: Commonly used for highly non-linear problems.
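
A minimal sketch comparing a linear and an RBF kernel with Scikit-learn's SVC, using make_moons to generate a non-linearly separable toy dataset:

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ['linear', 'rbf']:
    clf = SVC(kernel=kernel, C=1.0)   # gamma defaults to 'scale' for the RBF kernel
    clf.fit(X_train, y_train)
    print(f"{kernel} kernel accuracy: {clf.score(X_test, y_test):.2f}")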

SVM Use-Case Examples:


 Face Detection: Differentiating between faces and non-faces.
 Cancer Detection: Classifying malignant vs. benign tumors.
 Sentiment Analysis: Classifying positive vs. negative reviews.
 Handwriting Recognition: Recognizing handwritten characters.

Advantages & Disadvantages of SVM



⬛ Works Well for High-Dimensional Data.

⬛ Effective for Non-Linear Data with Kernel Trick.
✗ Slow for Large Datasets.
✗ Sensitive to Hyperparameter Tuning (C, gamma).

Summary (Supervised Learning Algorithms - Part 2)



⬛ Random Forest → Multiple decision trees improve accuracy.

⬛ KNN → Uses nearest neighbors for classification & regression.

⬛ Naive Bayes → Probabilistic classifier used for spam filtering & medical diagnosis.

⬛ SVM → Works well for complex classification problems using hyperplanes & kernel functions

ACTIVITY LOG FOR THE SEVENTH WEEK

Day     Brief description of the daily activity    Learning Outcome                                     Person In-Charge Signature

Day–1   Explain about Introduction to Clustering   Students learned about Introduction to Clustering
Day–2   Explain about Introduction to Clustering   Students learned about Introduction to Clustering
Day–3   Explain about K-Means Clustering           Students learned about K-Means Clustering
Day–4   Explain about K-Means Clustering           Students learned about K-Means Clustering
Day–5   Explain about Hierarchical Clustering      Students learned about Hierarchical Clustering
Day–6   Explain about Association Rule Mining      Students learned about Association Rule Mining

WEEKLY REPORT
WEEK–7(From Dt………..…..to Dt.................. )

Objective of the Activity Done:

Detailed Report:

Explain about Introduction to Clustering

Explain about K-Means Clustering

Explain about Hierarchical Clustering

Explain about Association Rule Mining

Unsupervised Learning & Clustering
In this week, we explore unsupervised learning techniques, particularly clustering and
association rule mining. These methods help find hidden patterns and relationships in data
without labeled outputs.

Introduction to Clustering
What is Clustering?

Clustering is an unsupervised learning technique used to group similar data points into clusters
based on their features.

Types of Clustering Algorithms

1. K-Means Clustering
o Partition-based clustering method.
o Assigns data points to K clusters based on similarity.
2. Hierarchical Clustering
o Builds a hierarchy of clusters.
o Can be Agglomerative (bottom-up) or Divisive (top-down).
3. DBSCAN (Density-Based Clustering)
o Groups points based on density.
o Good for irregularly shaped clusters.
4. Gaussian Mixture Model (GMM)
o Uses probability distributions for clustering.
o More flexible than K-Means.

Practical Use Cases of Clustering

 Customer Segmentation: Grouping customers based on purchasing behavior.
 Anomaly Detection: Identifying fraud in banking transactions.
 Image Segmentation: Clustering pixels in image processing.
 Genetic Data Analysis: Grouping similar gene sequences.

K-Means Clustering
Understanding the K-Means Algorithm

K-Means is a centroid-based clustering algorithm that partitions data into K clusters.

How K-Means Works?


1. Select K centroids randomly.
2. Assign each data point to the nearest centroid (using Euclidean distance).
3. Update centroids by computing the mean of assigned points.
4. Repeat steps 2 & 3 until centroids don’t change (convergence).

Choosing the Optimal K

 Use Elbow Method: Plot within-cluster variance vs. K.


 Use Silhouette Score: Measures how well points fit within clusters.

Pros & Cons of K-Means


⬛ Fast and Scalable for large datasets.

⬛ Works well with clear, well-separated clusters.
✗ Sensitive to initial centroid selection.
✗ Fails on non-spherical or imbalanced clusters.

K-Means Demo: Customer Segmentation

 Dataset: Mall Customers Dataset (features: income, spending score).


 Steps:
1. Load dataset & preprocess data (feature scaling).
2. Apply K-Means with Elbow Method to find optimal K.
3. Visualize clusters using scatter plots.
4. Interpret clusters (e.g., High spenders, Budget shoppers).
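
A condensed sketch of these steps, assuming two illustrative features (annual income and spending score) in place of the real Mall Customers data; the elbow curve is printed rather than plotted for brevity:

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np

# Made-up customers: [annual income ($1000s), spending score (1-100)]
X = np.array([[15, 80], [16, 75], [60, 50], [62, 48], [90, 10], [95, 15]])
X_scaled = StandardScaler().fit_transform(X)   # feature scaling

# inertia_ (within-cluster variance) for each K gives the elbow curve
for k in range(1, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    print(f"K={k}: inertia={km.inertia_:.2f}")

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
print(labels)   # cluster assignment per customer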

Hierarchical Clustering
Types of Hierarchical Clustering

1. Agglomerative (Bottom-Up)
o Start with individual points and merge closest clusters.
2. Divisive (Top-Down)
o Start with all points in one cluster and split recursively.

Dendrogram & Linkage Methods

 Dendrogram: Tree-like diagram showing merging process.


 Linkage Methods:
o Single Linkage: Merge clusters with the shortest distance.
o Complete Linkage: Merge clusters with the farthest distance.
o Average Linkage: Merge based on average distances.

Hierarchical Clustering Practical Example

 Dataset: Customer Segmentation (similar to K-Means).

 Steps:
1. Compute distance matrix & apply Agglomerative clustering.
2. Plot the dendrogram to find optimal clusters.
3. Assign cluster labels & visualize the results.
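
A short sketch of these steps with SciPy, reusing the same made-up customer array from the K-Means sketch above (Ward linkage is one common choice):

from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
import matplotlib.pyplot as plt
import numpy as np

X = np.array([[15, 80], [16, 75], [60, 50], [62, 48], [90, 10], [95, 15]])

# 'ward' linkage merges the pair of clusters with the minimum variance increase
Z = linkage(X, method='ward')
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the tree into 3 clusters
print(labels)

dendrogram(Z)   # tree-like diagram of the merging process
plt.show()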

Pros & Cons of Hierarchical Clustering


⬛ Doesn’t require specifying K in advance.

⬛ Produces a tree-like structure for better interpretability.
✗ Computationally expensive for large datasets.
✗ Sensitive to noisy data & outliers.

Association Rule Mining


What is Association Rule Mining?

It is an unsupervised learning method to find relationships between variables in large datasets.


Example: Market Basket Analysis (e.g., “Customers who buy bread also buy butter”).

Apriori Algorithm

The Apriori Algorithm is used to find frequent item sets in transactional datasets using:

1. Support: Frequency of an item appearing in transactions.


Support(A)=Transactions containing A / Total transactions
2. Confidence: Probability that item B is bought given A is bought.
Confidence(A→B)=Support(A∪B) / Support(A)
3. Lift: Strength of association between A and B.

Lift(A→B)=Confidence(A→B) / Support(B)

o Lift > 1 → Strong association.


o Lift < 1 → Weak or negative association.

Apriori Algorithm Demo

 Dataset: Grocery Transactions (e.g., Bread, Milk, Eggs, Butter).


 Steps:
1. Load dataset & convert transactions to binary format.
2. Apply Apriori Algorithm to find frequent item sets.
3. Visualize association rules with Lift values.
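
A compact sketch of these steps using the third-party mlxtend library (assumed installed via pip install mlxtend), reusing the five supermarket transactions from the table that follows:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['Milk', 'Bread'], ['Milk', 'Diaper', 'Beer', 'Egg'],
                ['Milk', 'Bread', 'Diaper', 'Beer'], ['Bread', 'Diaper', 'Beer'],
                ['Milk', 'Bread', 'Diaper', 'Beer']]

# One-hot encode transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(df, min_support=0.6, use_colnames=True)   # frequent itemsets
rules = association_rules(frequent, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])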

Consider a database of transactions in a supermarket:

Transaction ID Items Bought

T1 Milk, Bread

T2 Milk, Diaper, Beer, Egg

T3 Milk, Bread, Diaper, Beer

T4 Bread, Diaper, Beer

T5 Milk, Bread, Diaper, Beer

Let's assume the following parameters:


 Minimum Support: 60% (i.e., at least 3 transactions out of 5)
 Minimum Confidence: 70%
Step 1: Generate Frequent Itemsets
Start by checking individual items:
 Milk appears in 4 out of 5 transactions (80% support).
 Bread appears in 4 out of 5 transactions (80% support).
 Diaper appears in 4 out of 5 transactions (80% support).
 Beer appears in 4 out of 5 transactions (80% support).
All these items meet the minimum support threshold, so they are frequent itemsets.
Step 2: Generate Candidate Itemsets of Size 2
Now, generate pairs of items:
 {Milk, Bread} appears in 3 out of 5 transactions (60% support).
 {Milk, Diaper} appears in 3 out of 5 transactions (60% support).
 {Milk, Beer} appears in 3 out of 5 transactions (60% support).
 {Bread, Diaper} appears in 3 out of 5 transactions (60% support).
 {Bread, Beer} appears in 3 out of 5 transactions (60% support).
 {Diaper, Beer} appears in 3 out of 5 transactions (60% support).
All these pairs meet the minimum support threshold and are considered frequent itemsets.
Step 3: Generate Association Rules
From the frequent itemsets, we generate association rules and calculate their confidence:
 Rule: {Milk} ⇒ {Bread}: The confidence is calculated as:
Confidence(Milk⇒Bread) = Support(Milk, Bread) / Support(Milk) = (3/5) / (4/5) = 75%
Since 75% confidence meets the minimum threshold of 70%, the rule is kept.
 Rule: {Diaper} ⇒ {Beer}: The confidence is:
Confidence(Diaper⇒Beer)=Support(Diaper,Beer)/Support(Diaper)
=(3/5)/(4/5)=75%
This rule is also kept because it meets the confidence threshold.
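These numbers are easy to verify in plain Python; the short sketch below recomputes support, confidence, and lift for {Milk} ⇒ {Bread} from the five transactions above:

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diaper", "Beer", "Egg"},
    {"Milk", "Bread", "Diaper", "Beer"},
    {"Bread", "Diaper", "Beer"},
    {"Milk", "Bread", "Diaper", "Beer"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

conf = support({"Milk", "Bread"}) / support({"Milk"})
lift = conf / support({"Bread"})
print(conf)  # 0.75, matching the hand calculation above
print(lift)  # 0.9375: lift below 1, so Milk does not actually boost Bread purchases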
Example of Apriori Results
1. Frequent Itemsets:
o Single items: {Milk}, {Bread}, {Diaper}, {Beer}
o Pairs: {Milk, Bread}, {Milk, Diaper}, {Milk, Beer}, {Bread, Diaper}, {Bread, Beer}, {Diaper, Beer}
2. Association Rules:
o {Milk} ⇒ {Bread} (75% confidence)
o {Diaper} ⇒ {Beer} (75% confidence)
Conclusion
The Apriori algorithm is widely used in market basket analysis and other applications involving
association rule mining. By identifying frequent itemsets and deriving strong association rules,
businesses can understand customer purchasing behavior and make informed decisions regarding
product placements, promotions, and recommendations.

Applications of Apriori Algorithm

 Retail & E-Commerce: Recommending frequently bought items together.
 Medical Diagnosis: Finding co-occurring symptoms in patient records.
 Web Page Analysis: Suggesting related content based on browsing patterns.

Summary:

⬛ Introduction to Clustering → Types & real-world applications.

⬛ K-Means Clustering → Working, pros/cons & customer segmentation demo.

⬛ Hierarchical Clustering → Agglomerative vs. Divisive methods & practical example.

⬛ Association Rule Mining → Apriori Algorithm for market basket analysis.

ACTIVITY LOG FOR THE EIGHTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about Markov’s Decision Process (MDP) | Students learn about Markov’s Decision Process (MDP) |
Day–2 | Explain about Markov’s Decision Process (MDP) | Students learn Markov’s Decision Process (MDP) |
Day–3 | Explain about The Bellman Equation | Students learn about The Bellman Equation |
Day–4 | Explain about The Bellman Equation | Students learn about The Bellman Equation |
Day–5 | Explain about Implementing Q-Learning | Students learn about Implementing Q-Learning |
Day–6 | Reinforcement Learning Projects | Students implement Reinforcement Learning Projects |

WEEKLY REPORT
WEEK–8 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about Markov’s Decision Process (MDP)

Explain about The Bellman Equation

Explain about Implementing Q-Learning

Explain about Reinforcement Learning Projects

Reinforcement Learning (RL)
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make
decisions by interacting with an environment and receiving rewards or penalties based on its
actions. This week covers the fundamentals of RL, including Markov Decision Processes (MDP),
Q-Learning, the Bellman Equation, and hands-on projects.

Markov Decision Process (MDP)


What is MDP and Why is it Important in RL?

A Markov Decision Process (MDP) provides a mathematical framework for modeling decision-
making in sequential environments where outcomes are partly random and partly controlled by an
agent’s actions.

MDP Components:

1. States (S): Represents different situations the agent can be in.


2. Actions (A): Possible actions the agent can take.
3. Transition Probability (P): The probability of moving from one state to another given an
action.
4. Reward (R): The immediate reward received for taking an action in a given state.
5. Policy (π): The strategy that the agent follows to decide actions.
6. Discount Factor (γ): Determines how much future rewards are valued.

Why Use MDP in RL?


⬛ Defines an environment mathematically, helping in designing RL algorithms.

⬛ Helps the agent learn optimal decision-making strategies over time.

⬛ Forms the basis for Q-Learning, Deep Q-Networks (DQN), and Policy Gradient Methods.

Q-Learning Overview and Implementation


Q-Learning is a model-free RL algorithm used to learn the best action policy for an agent through
trial and error.

Q-Learning Formula (Bellman Update Rule)

Q(s,a) ← Q(s,a) + α [ R + γ max_a′ Q(s′,a′) − Q(s,a) ]

where:

 Q(s,a) → Q-value for state s and action a.
 α → Learning rate (0 < α < 1).
 γ → Discount factor (determines importance of future rewards).
 R → Reward received after taking action a.
 max Q(s′,a′) → Highest Q-value for the next state s′.

Q-Learning Implementation (Step-by-Step)

1. Initialize Q-Table with zeros for all state-action pairs.


2. Observe the current state s.
3. Choose an action a using an exploration strategy (e.g., ε-greedy).
4. Take action a and observe reward R and next state s′.
5. Update Q-value using the Bellman equation.
6. Repeat until convergence.

The Bellman Equation


Understanding the Bellman Equation

The Bellman Equation describes the optimal value function for an MDP, helping in determining
the best possible action for an agent.

Bellman Optimality Equation

V*(s) = max_a [ R(s,a) + γ Σ_{s′} P(s′|s,a) V*(s′) ]

where:

 V∗(s) is the maximum value for state s.


 P(s′∣s,a) is the transition probability to the next state.

Why is the Bellman Equation Important in RL?


⬛ Helps compute the optimal policy for decision-making.

⬛ Forms the foundation of Q-Learning and Deep Q-Networks (DQN).

⬛ Used to evaluate expected future rewards.
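As a quick illustration, repeatedly applying the Bellman optimality backup yields value iteration. A minimal NumPy sketch, assuming a small MDP whose transition probabilities P and rewards R are known (both randomly generated here purely for illustration):

import numpy as np

# Hypothetical MDP: 3 states, 2 actions, known dynamics
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.random((n_states, n_actions))                             # R[s, a]

# Value iteration: repeatedly apply the Bellman optimality backup
V = np.zeros(n_states)
for _ in range(200):
    V = np.max(R + gamma * (P @ V), axis=1)

print("Optimal state values:", V)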

Transitioning to Q-Learning in Practical Applications

 Implementing Q-Learning for Grid-World Navigation.


 Understanding how Bellman updates improve learning over time.
 Exploration vs. Exploitation: Balancing between discovering new actions and using
known actions to maximize rewards.

Implementing Q-Learning
Implementing Q-Learning in a Simulated Environment

 Environment: OpenAI Gym (e.g., FrozenLake, Taxi-v3).
 Algorithm: Use Q-Learning to teach an agent to navigate the environment.

Step-by-Step Implementation in Python

1. Import Dependencies: Use OpenAI Gym and NumPy.


2. Create Environment: Load an environment like FrozenLake.
3. Initialize Q-Table: Set all values to zero.
4. Train the Agent: Apply Q-learning updates over multiple episodes.
5. Evaluate Performance: Observe agent behavior and Q-table updates.
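A compact sketch of these steps on FrozenLake, assuming the gymnasium package (the maintained successor of OpenAI Gym; the original gym uses a slightly different reset/step signature):

import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)
n_states = env.observation_space.n
n_actions = env.action_space.n

# 1. Initialize Q-Table with zeros
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.8, 0.95, 0.1

for episode in range(2000):
    state, _ = env.reset()                     # 2. Observe the current state
    done = False
    while not done:
        # 3. Choose an action with an epsilon-greedy strategy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        # 4. Take the action, observe reward and next state
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # 5. Bellman update of the Q-value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("Learned Q-table:\n", Q)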

Counter-Strike Example and RL Use-case


 Game AI Bots: Reinforcement Learning can be used to train AI agents to play games like
Counter-Strike by rewarding good behaviors (e.g., hitting targets) and penalizing bad ones
(e.g., getting shot).
 Q-Learning in FPS Games:
o States: Agent’s position, health, ammo.
o Actions: Move forward, shoot, reload, hide.
o Rewards: Killing an enemy (+10), getting shot (-5), running out of ammo (-3).

Reinforcement Learning Projects


Hands-on Practice with an RL Project

Project 1: Training an RL Agent to Play a Self-Play Game

 Environment: OpenAI Gym – CartPole.


 Goal: Keep the pole balanced by taking appropriate actions.
 Steps:
1. Set up the Gym environment.
2. Implement Q-Learning or Deep Q-Network (DQN).
3. Train the agent and visualize learning progress.
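For reference, a minimal CartPole interaction loop (a sketch assuming gymnasium; the hand-written balancing rule is only a placeholder for the policy that Q-Learning or a DQN would learn):

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, _ = env.reset(seed=42)
total_reward = 0.0
done = False
while not done:
    # Placeholder policy: push the cart in the direction the pole is leaning
    action = 0 if obs[2] < 0 else 1  # obs[2] is the pole angle
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    done = terminated or truncated
print("Episode reward:", total_reward)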

Project 2: Self-Driving Car Simulation

 States: Car position, speed, distance from obstacles.


 Actions: Accelerate, brake, turn left/right.
 Rewards: Staying on track (+10), collision (-50).
 Algorithm: Use Q-Learning or Deep Reinforcement Learning (DQN).

Summary:

⬛ Markov Decision Process (MDP) → Foundation of RL, Q-learning intro.

⬛ The Bellman Equation → Understanding and applying the Bellman equation.

⬛ Implementing Q-Learning → Train an RL agent, Counter-Strike example.

⬛ RL Projects → Apply Q-Learning to self-play games & real-world scenarios.

ACTIVITY LOG FOR THE NINTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about Data Preprocessing | Students learned about Data Preprocessing |
Day–2 | Explain about Data Preprocessing | Students learned about Data Preprocessing |
Day–3 | Explain about Model Training & Testing | Students learned about Model Training & Testing |
Day–4 | Explain about Model Training & Testing | Students learned about Model Training & Testing |
Day–5 | Explain about Model Deployment Basics | Students learned about Model Deployment Basics |
Day–6 | Model Deployment Basics | Students learn about Model Deployment Basics |

WEEKLY REPORT
WEEK–9 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about Data Preprocessing

Explain about Model Training & Testing

Explain about Model Deployment Basics

Machine Learning Project Development (Part 1)
This week focuses on the end-to-end process of developing a machine learning (ML) project,
from data preprocessing to model deployment. By the end of this module, you will be able to
prepare data, train & evaluate models, and deploy them to the cloud or a web interface.

Data Preprocessing
Before training a machine learning model, the data needs to be cleaned and transformed into a
format suitable for the model. This step significantly impacts the accuracy and performance of
the model.

1. Data Cleaning

Common Issues in Raw Data

 Missing Values: Some records may have empty or NaN values.


 Duplicate Entries: Duplicate rows can affect model performance.
 Inconsistent Formatting: Inconsistent capitalization, typos, and formats.
 Outliers: Extreme values that can distort predictions.

Techniques to Handle Missing Values

 Remove missing values (if very few).


 Fill missing values using:
o Mean/Median (for numerical data).
o Mode (for categorical data).
o Predictive Imputation (using regression or KNN).

Handling Outliers

 Z-Score method (removing values beyond 3 standard deviations).


 IQR Method (removing values outside 1.5x the interquartile range).
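A short sketch of the IQR method with pandas (assuming df is a DataFrame with a numeric 'Salary' column; the column name is illustrative):

# Keep only rows whose Salary lies within 1.5x the interquartile range
q1, q3 = df["Salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df_clean = df[df["Salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]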

2. Encoding Categorical Data

Why Encode Categorical Variables?
Machine learning models work with numerical values. Categorical data must be converted into
numerical format.

Types of Encoding

 Label Encoding (Converts categories into numerical values, e.g., Male → 0, Female → 1).
 One-Hot Encoding (Creates a binary column for each category).
 Ordinal Encoding (For ordered categories like Low, Medium, High → 1, 2, 3).

3. Feature Scaling & Normalization

Why Scale Features?

 Models like SVM, k-NN, and Gradient Descent-based algorithms are affected by varying
scales.
 Ensures that features contribute equally to the learning process.

Scaling Techniques

 Min-Max Scaling (scales values between 0 and 1).


 Standardization (Z-score Normalization) (scales data to have mean 0 and variance 1).

Python Implementation Example

from sklearn.preprocessing import StandardScaler, LabelEncoder


import pandas as pd

# Load dataset
df = pd.read_csv('data.csv')

# Handling missing values (numeric columns only)
df.fillna(df.mean(numeric_only=True), inplace=True)

# Encoding categorical features


encoder = LabelEncoder()
df['Category'] = encoder.fit_transform(df['Category'])

# Scaling numerical features


scaler = StandardScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

print(df.head())

Model Training & Testing


Once data is pre-processed, the next step is to train and evaluate machine learning models.

1. Splitting the Dataset

Why Split the Data?

 The dataset is split to train the model and evaluate its performance.
 Train Set (70-80%) → Used to train the model.
 Test Set (20-30%) → Used to check model performance.
 Validation Set (Optional, 10-15%) → Fine-tunes hyperparameters before testing.

Splitting Data in Python

from sklearn.model_selection import train_test_split

# Splitting dataset
X = df.drop('Target', axis=1)
y = df['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training data size:", X_train.shape)
print("Testing data size:", X_test.shape)

2. Model Training


Choosing the Right Model

 Linear Regression → Predicting continuous values (e.g., house prices).


 Logistic Regression → Binary classification (e.g., spam detection).
 Decision Trees / Random Forest → High interpretability, good for tabular data.
 SVM & k-NN → Best for smaller datasets.
 Neural Networks → Best for large datasets with complex patterns.

Training a Sample Model (Random Forest)

from sklearn.ensemble import RandomForestClassifier

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Model training complete!")

3. Model Evaluation Metrics

Classification Metrics

 Accuracy: Percentage of correctly classified instances.


 Precision & Recall: Evaluates class-specific performance.
 F1-Score: Balance between Precision & Recall.
 Confusion Matrix: Shows True Positives (TP), False Positives (FP), etc.

Regression Metrics

 Mean Squared Error (MSE): Measures average squared error.


 R² Score: Measures goodness of fit.
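A short sketch of these regression metrics with scikit-learn (the true and predicted values are hypothetical):

from sklearn.metrics import mean_squared_error, r2_score

y_true = [3.0, 2.5, 4.0, 5.1]
y_pred = [2.8, 2.7, 3.9, 5.4]

print("MSE:", mean_squared_error(y_true, y_pred))
print("R2 Score:", r2_score(y_true, y_pred))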

Evaluation Example in Python

from sklearn.metrics import accuracy_score, classification_report

# Predictions
y_pred = model.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))

Model Deployment Basics


Once the model is trained, the next step is deploying it to a real-world application.

1. Deploying Machine Learning Models to the Cloud

Why Deploy Models?

 Makes the model available for real-world applications via an API or web interface.
 Can be hosted on cloud platforms like AWS, Azure, or Google Cloud.

Deployment Platforms

 AWS SageMaker: Deploy ML models with auto-scaling.


 Google AI Platform: Best for TensorFlow models.
 Microsoft Azure ML: Easy deployment for businesses.
 Flask/Django API: Run models on web servers.

Basic Model Deployment with Flask

from flask import Flask, request, jsonify


import pickle

app = Flask(__name__)

# Load trained model


model = pickle.load(open('model.pkl', 'rb'))

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    prediction = model.predict([data['features']])
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    app.run(debug=True)

2. Building a Basic Web Interface for ML Models

Using Streamlit for UI

import streamlit as st
import pickle

# Load model
model = pickle.load(open('model.pkl', 'rb'))

# UI
st.title("ML Model Deployment")
input_data = st.text_input("Enter Features:")

if st.button("Predict"):
    prediction = model.predict([eval(input_data)])  # eval() is for demo only; parse input safely in production
    st.write("Prediction:", prediction)


Run the app:

streamlit run app.py

Summary:

⬛ Data Preprocessing → Cleaning, handling missing values, encoding & scaling.

⬛ Model Training & Testing → Splitting data, training, and evaluating models.

⬛ Model Deployment → Deploying models on the cloud & building a web interface.

ACTIVITY LOG FOR THE TENTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about Python basics introduction | Students learn about Python basics introduction |
Day–2 | Explain about variables, identifiers | Students learn about data types, variables, identifiers |
Day–3 | Explain about input and output statements in Python | Students learned about input and output statements |
Day–4 | Explain about operators | Students learn about operators |
Day–5 | Explain about conditional statements | Students learn about conditional statements |
Day–6 | Implement programs regarding Python covered topics | Students implement programs regarding Python covered topics |

WEEKLY REPORT
WEEK–10 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about python basics

Explain about variables, identifiers

Explain about operators, conditional statements

Python Basics: A Comprehensive Guide
Python is a high-level, interpreted programming language known for its simplicity and
readability. It is widely used in web development, data science, artificial intelligence, machine
learning, automation, and scripting.

1. Introduction to Python Basics



Key Features of Python:

⬛ Easy to Learn: Simple syntax similar to English.

⬛ Interpreted Language: Executes code line-by-line, making debugging easier.

⬛ Dynamically Typed: No need to declare variable types explicitly.

⬛ Portable: Runs on multiple platforms (Windows, Mac, Linux).

⬛ Extensive Libraries: NumPy, Pandas, TensorFlow, and many more.

Python Installation & Running Python Code

 Download from python.org.


 Run Python using:
o Python Interactive Shell (REPL)
o Python Scripts (Save code in .py files and execute via terminal).

Hello World in Python


print("Hello, World!")

◆ print() is used to display output on the screen.

2. Variables and Identifiers


What is a Variable?

A variable is a name assigned to a memory location that stores data.

Variable Declaration Example

x = 10
name = "Alice"
pi = 3.14

◆ Python automatically assigns the data type based on the value.

Rules for Variable Names (Identifiers)


✔ Must start with a letter (A-Z or a-z) or an underscore _.
✔ Can contain letters, digits (0-9), and underscores.
✔ Cannot be a reserved keyword (e.g., if, else, while).
✔ Case-sensitive (Age and age are different).

Valid & Invalid Variable Names

✔ my_variable, age_23, _hidden_value
✘ 23age (Cannot start with a number), my variable (Spaces not allowed)

Checking Data Type of a Variable

x = 5
print(type(x)) # Output: <class 'int'>

3. Input and Output Statements in Python


Input Statements (input())

Taking User Input

name = input("Enter your name: ")


print("Hello, " + name + "!")

◆ input() reads input as a string.


◆ Use int() or float() to convert input to a number.

Example: Taking Integer Input

age = int(input("Enter your age: "))


print("You are", age, "years old.")

Output Statements (print())

Printing Multiple Values

print("Python", "is", "fun!")

◆ Outputs: Python is fun! (By default, prints with space).

Using Format Strings (f-strings)

name = "Alice"
age = 25
print(f"My name is {name} and I am {age} years old.")

4. Operators in Python
Types of Operators

1. Arithmetic Operators (Used for mathematical operations)

Operator Description Example


+ Addition x + y
- Subtraction x - y
* Multiplication x * y
/ Division x / y
// Floor Division x // y (removes decimal)
% Modulus x % y (remainder)
** Exponentiation x ** y (power)

Example

x = 10
y = 3
print(x + y) # 13
print(x ** y) # 1000 (10^3)

2. Comparison Operators (Returns True or False)

Operator Example Description


== x == y Checks if x is equal to y
!= x != y Checks if x is not equal to y
> x > y Checks if x is greater than y
< x < y Checks if x is less than y
>= x >= y Checks if x is greater than or equal to y
<= x <= y Checks if x is less than or equal to y
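Example (these operators return Boolean values):

x = 10
y = 3
print(x == y)  # False
print(x > y)   # True
print(x != y)  # True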

3. Logical Operators (Used in conditional statements)

Operator Description Example


and Returns True if both conditions are True x > 5 and y < 10
or Returns True if at least one condition is True x > 5 or y < 10
not Reverses the condition not(x > 5)

Example

x = 5
y = 10
print(x > 2 and y < 15) # True

5. Conditional Statements
1. if Statement


Basic if Condition

age = 18
if age >= 18:
    print("You are eligible to vote!")

2. if-else Statement

Example with else Condition

num = int(input("Enter a number: "))

if num % 2 == 0:
    print("Even number")
else:
    print("Odd number")

3. if-elif-else Statement

Example with Multiple Conditions

marks = int(input("Enter marks: "))

if marks >= 90:
    print("Grade: A")
elif marks >= 75:
    print("Grade: B")
elif marks >= 60:
    print("Grade: C")
else:
    print("Fail")

6. Implementing Programs Based on Covered Topics


Program 1: Simple Calculator

a = float(input("Enter first number: "))
b = float(input("Enter second number: "))
operation = input("Choose operation (+, -, *, /): ")

if operation == '+':
    print("Result:", a + b)
elif operation == '-':
    print("Result:", a - b)
elif operation == '*':
    print("Result:", a * b)
elif operation == '/':
    print("Result:", a / b)
else:
    print("Invalid operation")

Program 2: Check Leap Year

year = int(input("Enter a year: "))

if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
    print(year, "is a leap year.")
else:
    print(year, "is not a leap year.")

Program 3: Largest of Three Numbers

a = int(input("Enter first number: "))
b = int(input("Enter second number: "))
c = int(input("Enter third number: "))

if a > b and a > c:
    print(a, "is the largest")
elif b > a and b > c:
    print(b, "is the largest")
else:
    print(c, "is the largest")

Conclusion

✔ Python is beginner-friendly and powerful.
✔ Understanding variables, operators, and conditional statements is fundamental.
✔ Practice writing small programs to reinforce learning.

ACTIVITY LOG FOR THE ELEVENTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about data types (numeric, sequence) | Students learn about data types (numeric, sequence) |
Day–2 | Explain about list, tuple, set, dict | Students learn about list, tuple, set, dict |
Day–3 | Explain about functions | Students learn about functions |
Day–4 | Explain about file handling | Students learn about file handling |
Day–5 | Explain about comprehensions | Students learn about comprehensions |
Day–6 | Implement programs on list, set, tuple, dict, functions | Students implement programs on list, set, tuple, dict, functions |

WEEKLY REPORT
WEEK–11 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about data types (list,string,set,tuple,dict)

Explain about functions and modules, file handling

Explain about comprehensions


Worked on programs regarding above topics

Python: In-depth Explanation with Examples

1. Data Types in Python


Python provides several built-in data types that help in handling different types of data.

A. Numeric Data Types

Used for storing numbers.

Data Type Description Example

int Integer (whole numbers) x = 10
float Decimal numbers y = 3.14
complex Complex numbers z = 2 + 3j

Example of Numeric Types

a = 5 # int
b = 2.5 # float
c = 3 + 4j # complex

print(type(a)) # <class 'int'>


print(type(b)) # <class 'float'>
print(type(c)) # <class 'complex'>

B. Sequence Data Types

Sequence data types allow us to store multiple values.

Data Type Description Example

str String (text data) "Hello"
list Ordered, mutable collection [1, 2, 3]
tuple Ordered, immutable collection (1, 2, 3)
range Sequence of numbers range(1, 10)

Example

s = "Python" # String
l = [1, 2, 3] # List
t = (4, 5, 6) # Tuple

print(type(s), type(l), type(t))

2. Lists, Tuples, Sets, and Dictionaries
A. Lists in Python

Lists are mutable, ordered collections of elements.

fruits = ["apple", "banana", "cherry"]


fruits.append("orange") # Adds a new element
fruits.remove("banana") # Removes an element
print(fruits) # ['apple', 'cherry', 'orange']

◆ List Operations

numbers = [1, 2, 3]
print(numbers[1]) # Accessing elements
print(len(numbers)) # Finding length
print(numbers[::-1]) # Reversing a list

B. Tuples in Python

Tuples are immutable, ordered collections.

colors = ("red", "green", "blue")


print(colors[0]) # Accessing elements

◆ Tuple Packing & Unpacking

x, y, z = colors # Unpacking tuple


print(x, y, z)

C. Sets in Python

Sets are unordered, unique collections.

numbers = {1, 2, 3, 3, 2, 1} # Duplicates are removed


numbers.add(4) # Adding an element
numbers.remove(2) # Removing an element
print(numbers) # {1, 3, 4}

◆ Set Operations

A = {1, 2, 3}
B = {3, 4, 5}
print(A | B) # Union {1, 2, 3, 4, 5}
print(A & B) # Intersection {3}
print(A - B) # Difference {1, 2}

D. Dictionaries in Python

Dictionaries store key-value pairs.

student = {"name": "John", "age": 20, "grade": "A"}


print(student["name"]) # Accessing values
student["age"] = 21 # Modifying value
student["city"] = "New York" # Adding new key-value pair
print(student)

◆ Dictionary Methods

print(student.keys()) # Get all keys


print(student.values()) # Get all values
print(student.items()) # Get key-value pairs

3. Functions in Python
A. Defining Functions

Functions help to reuse code efficiently.

def greet(name):
    return f"Hello, {name}!"

print(greet("Alice")) # Output: Hello, Alice!

B. Function with Default Arguments


def greet(name="Guest"):
    return f"Welcome, {name}!"

print(greet()) # Default value used

C. Function with Multiple Parameters


def add(a, b):
    return a + b

print(add(3, 5)) # Output: 8

4. File Handling in Python


A. Writing to a File
with open("test.txt", "w") as file:
    file.write("Hello, world!")

B. Reading from a File


with open("test.txt", "r") as file:
    content = file.read()
    print(content)

C. Appending to a File
with open("test.txt", "a") as file:
    file.write("\nThis is a new line!")

5. Comprehensions in Python
A. List Comprehension

Creating a list with a single line of code.

squares = [x**2 for x in range(5)]


print(squares) # [0, 1, 4, 9, 16]

B. Dictionary Comprehension
squares_dict = {x: x**2 for x in range(5)}
print(squares_dict) # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

C. Set Comprehension
unique_numbers = {x for x in [1, 2, 2, 3, 4, 4, 5]}
print(unique_numbers) # {1, 2, 3, 4, 5}

6. Python Programs on Lists, Sets, Tuples, Dict, Functions


A. Find Maximum in a List
numbers = [10, 20, 5, 40]
print("Maximum:", max(numbers))

B. Remove Duplicates from a List using Sets


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(set(numbers))
print(unique_numbers) # [1, 2, 3, 4, 5]

C. Convert List to Tuple


numbers = [1, 2, 3, 4]
numbers_tuple = tuple(numbers)
print(numbers_tuple) # (1, 2, 3, 4)

D. Merge Two Dictionaries


dict1 = {"a": 1, "b": 2}
dict2 = {"c": 3, "d": 4}

merged = {**dict1, **dict2}
print(merged)

E. Function to Calculate Factorial


def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

print(factorial(5)) # Output: 120

F. Count Word Frequency in a File


with open("test.txt", "r") as file:
    words = file.read().split()
    word_count = {word: words.count(word) for word in words}
    print(word_count)

Conclusion

✔ Python provides flexible data types (Lists, Tuples, Sets, and Dicts).
✔ Functions improve code reusability.
✔ File handling is essential for working with external data.
✔ Comprehensions simplify list, dict, and set creation.

ACTIVITY LOG FOR THE TWELFTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about OOPs, class, object, constructor | Students learn about OOPs concepts |
Day–2 | Explain about OOPs concepts | Students learn about OOPs concepts |
Day–3 | Explain programs regarding OOPs concepts (inheritance, polymorphism) | Students work on programs regarding OOPs concepts (inheritance, polymorphism) |
Day–4 | Explain programs regarding OOPs concepts (abstraction, encapsulation) | Students work on programs regarding OOPs concepts (abstraction, encapsulation) |
Day–5 | Explain packages | Students learn about packages |
Day–6 | Explain array concepts and programs | Students learn about arrays and programs |

WEEKLY REPORT
WEEK–12 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about oops concepts


Explain programs regarding oops concepts (inheritance, polymorphism)

Explain packages

Explain array concepts and programs

Object-Oriented Programming (OOP) in Python
Object-Oriented Programming (OOP) is a programming paradigm that uses objects and classes to
structure code. It helps in organizing code into reusable and scalable components.

1. Classes, Objects, and Constructors in Python


A. Class & Object


A class is a blueprint for creating objects.
An object is an instance of a class.

# Defining a Class
class Car:
    def __init__(self, brand, model):  # Constructor
        self.brand = brand
        self.model = model

    def show_info(self):
        return f"Car: {self.brand}, Model: {self.model}"

# Creating Objects
car1 = Car("Toyota", "Corolla")
car2 = Car("Honda", "Civic")

print(car1.show_info())  # Output: Car: Toyota, Model: Corolla
print(car2.show_info())  # Output: Car: Honda, Model: Civic

2. OOP Concepts in Python


A. Encapsulation

Encapsulation means restricting direct access to data and allowing modifications via methods.

class BankAccount:
    def __init__(self, balance):
        self.__balance = balance  # Private variable

    def deposit(self, amount):
        self.__balance += amount

    def get_balance(self):
        return self.__balance

# Creating Object
account = BankAccount(1000)
account.deposit(500)
print(account.get_balance())  # Output: 1500

◆ Here, __balance is a private variable (name-mangled), so it cannot be accessed directly.

B. Inheritance

Inheritance allows a child class to acquire properties of a parent class.

# Parent Class
class Animal:
    def sound(self):
        return "Animal makes sound"

# Child Class
class Dog(Animal):
    def sound(self):  # Method Overriding
        return "Dog barks"

dog = Dog()
print(dog.sound())  # Output: Dog barks

C. Polymorphism

Polymorphism allows methods to have different implementations.

class Cat:
    def sound(self):
        return "Meow"

class Dog:
    def sound(self):
        return "Bark"

# Using Polymorphism
for animal in (Cat(), Dog()):
    print(animal.sound())

◆ Both Cat and Dog have a sound() method, but their implementations are different.

D. Abstraction

Abstraction hides implementation details and only exposes the necessary parts.

from abc import ABC, abstractmethod

class Vehicle(ABC):
    @abstractmethod
    def start(self):
        pass

class Car(Vehicle):
    def start(self):
        return "Car is starting"

car = Car()
print(car.start())  # Output: Car is starting

◆ Abstract classes cannot be instantiated and must be inherited.

3. Programs on OOP Concepts (Inheritance, Polymorphism,


Abstraction, Encapsulation)
Program: Inheritance
class Parent:
    def show(self):
        return "This is the Parent class"

class Child(Parent):
    def display(self):
        return "This is the Child class"

obj = Child()
print(obj.show())     # Parent class method
print(obj.display())  # Child class method

Program: Polymorphism
class Shape:
    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

sq = Square(5)
print(sq.area())  # Output: 25

Program: Abstraction
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def make_sound(self):
        pass

class Dog(Animal):
    def make_sound(self):
        return "Bark"

dog = Dog()
print(dog.make_sound())  # Output: Bark

Program: Encapsulation
class Student:
    def __init__(self, name, age):
        self.__name = name  # Private attribute
        self.__age = age    # Private attribute

    def get_details(self):
        return f"Student: {self.__name}, Age: {self.__age}"

student = Student("Alice", 22)
print(student.get_details())  # Output: Student: Alice, Age: 22

4. Packages in Python
A package is a collection of Python modules.
It helps in organizing large projects.

Creating a Package

1. Create a folder named mypackage


2. Inside mypackage, create a file math_operations.py

# math_operations.py
def add(a, b):
    return a + b

3. Inside mypackage, create an empty __init__.py file


4. Now, use the package in another file:

from mypackage import math_operations

result = math_operations.add(5, 3)
print(result) # Output: 8

◆ The __init__.py file tells Python that this directory is a package.

5. Arrays in Python
Arrays store multiple values of the same type.
Python provides the array module for handling arrays.

import array

# Creating an array
arr = array.array('i', [10, 20, 30])

# Accessing elements
print(arr[0]) # Output: 10
print(arr[1]) # Output: 20

# Adding elements
arr.append(40)

# Removing elements
arr.remove(20)

print(arr) # Output: array('i', [10, 30, 40])

6. Programs on Arrays
A. Find Maximum in an Array
import array

arr = array.array('i', [10, 50, 30, 40])


print(max(arr)) # Output: 50

B. Reverse an Array
import array

arr = array.array('i', [1, 2, 3, 4])


arr.reverse()
print(arr) # Output: array('i', [4, 3, 2, 1])

C. Find Sum of Array Elements


import array

arr = array.array('i', [5, 10, 15, 20])


print(sum(arr)) # Output: 50

D. Searching an Element in an Array

import array

arr = array.array('i', [10, 20, 30, 40])

def search(arr, x):
    for i in range(len(arr)):
        if arr[i] == x:
            return i
    return -1

print(search(arr, 30)) # Output: 2

Conclusion

⬛ OOP concepts help in structuring large programs efficiently.

⬛ Encapsulation, Inheritance, Polymorphism, and Abstraction enhance code reusability and
security.

⬛ Packages allow modular programming and easy maintenance.

⬛ Arrays store multiple values efficiently and support basic operations.

ACTIVITY LOG FOR THE THIRTEENTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about NumPy for numerical computing | Students learn about NumPy for numerical computing |
Day–2 | Explain Pandas for data manipulation | Students learn about Pandas |
Day–3 | Explain about Matplotlib & Seaborn for data visualization | Students learn about Matplotlib & Seaborn for data visualization |
Day–4 | Implement small Python scripts and data analysis tasks | Students implement small Python scripts and data analysis tasks |
Day–5 | Implement small Python scripts and data analysis tasks | Students implement small Python scripts and data analysis tasks |
Day–6 | Perform EDA on a real-world dataset (e.g., Titanic dataset) | Students perform EDA on a real-world dataset (e.g., Titanic dataset) |

WEEKLY REPORT
WEEK–13 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:
Explain about NumPy for numerical computing

Explain Pandas for data manipulation

Explaining about Matplotlib & Seaborn for data visualization

Implement Small Python scripts and data analysis tasks

Python for Data Science: NumPy, Pandas, Matplotlib, and Seaborn
Python is widely used for data science and numerical computing. The key libraries for these tasks
include NumPy, Pandas, Matplotlib, and Seaborn. This guide covers these libraries in-depth,
along with practical Python scripts and Exploratory Data Analysis (EDA) on the Titanic dataset.

1. NumPy for Numerical Computing


Introduction to NumPy
NumPy (Numerical Python) is a powerful library for numerical computations in Python.
It provides multi-dimensional arrays and high-level mathematical functions.

Key Features of NumPy


⬛ Efficient handling of large datasets

⬛ Vectorized operations (faster than Python lists)

⬛ Supports mathematical and statistical operations

Installing NumPy
pip install numpy

Creating NumPy Arrays


import numpy as np

# 1D Array
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1) # Output: [1 2 3 4 5]

# 2D Array (Matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)

NumPy Operations
# Arithmetic Operations
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])
print(a + b) # Output: [11 22 33]

# Generating Random Numbers


random_array = np.random.rand(3, 3)
print(random_array)

Statistical Functions
data = np.array([1, 2, 3, 4, 5])
print(np.mean(data)) # Mean
print(np.median(data)) # Median
print(np.std(data)) # Standard Deviation

2. Pandas for Data Manipulation


Introduction to Pandas
Pandas is a Python library used for data manipulation and analysis.
It provides two main data structures:
✔ Series (1D labeled array)
✔ DataFrame (2D table similar to a database table)

Installing Pandas
pip install pandas

Creating Pandas Series


import pandas as pd

data = pd.Series([10, 20, 30, 40])


print(data)

Creating Pandas DataFrame


data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 35],
"City": ["New York", "London", "Paris"]
}

df = pd.DataFrame(data)
print(df)

Reading & Writing Data

# Reading CSV File


df = pd.read_csv("data.csv")

# Writing to CSV File


df.to_csv("output.csv", index=False)

Basic Data Operations


# Display first 5 rows
print(df.head())

# Display data types
print(df.dtypes)

# Summary Statistics
print(df.describe())

# Checking for missing values


print(df.isnull().sum())

3. Matplotlib & Seaborn for Data Visualization
Matplotlib for Basic Visualization
Matplotlib is a plotting library used for creating static visualizations.

Installing Matplotlib
pip install matplotlib

Basic Plotting
import matplotlib.pyplot as plt

# Creating a Line Plot


x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]

plt.plot(x, y, marker='o', linestyle='--', color='b')


plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Simple Line Plot")
plt.show()

Bar Chart
categories = ["A", "B", "C"]
values = [30, 50, 80]

plt.bar(categories, values, color=['red', 'blue', 'green'])


plt.title("Bar Chart Example")
plt.show()

Seaborn for Advanced Visualization


Seaborn is a visualization library based on Matplotlib.
It provides a high-level interface for attractive statistical graphics.

Installing Seaborn
pip install seaborn

Creating a Histogram
import seaborn as sns

# Sample Data
data = [10, 20, 20, 30, 30, 30, 40, 50, 60]

# Histogram
sns.histplot(data, bins=5, kde=True)
plt.title("Histogram Example")
plt.show()

Creating a Scatter Plot


import seaborn as sns

# Creating sample dataset


df = pd.DataFrame({"X": [1, 2, 3, 4, 5], "Y": [10, 20, 25, 30, 50]})

sns.scatterplot(data=df, x="X", y="Y")


plt.title("Scatter Plot Example")
plt.show()

4. Implementing Small Python Scripts & Data Analysis Tasks
Python Script to Compute Mean and Median
import numpy as np

data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))
print("Median:", np.median(data))

Python Script to Load & Analyze a CSV File


import pandas as pd

df = pd.read_csv("data.csv")

# Display first 5 rows


print(df.head())

# Check for missing values


print(df.isnull().sum())

5. Performing EDA on Titanic Dataset
Step 1: Load the Dataset
import pandas as pd

# Load Titanic dataset


df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

# Display first few rows


print(df.head())

Step 2: Check for Missing Values


print(df.isnull().sum())

Step 3: Data Visualization


import seaborn as sns
import matplotlib.pyplot as plt

# Countplot of Survival
sns.countplot(x="Survived", data=df)
plt.title("Survival Count")
plt.show()

# Survival rate by gender


sns.barplot(x="Sex", y="Survived", data=df)
plt.title("Survival Rate by Gender")
plt.show()

Step 4: Handling Missing Values


# Fill missing Age values with the median
df["Age"] = df["Age"].fillna(df["Age"].median())

# Drop Cabin column as it has too many missing values


df.drop(columns=["Cabin"], inplace=True)

Step 5: Feature Encoding


# Convert categorical columns into numerical values
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"C": 0, "Q": 1, "S": 2})
df.fillna(0, inplace=True) # Handling remaining missing values

Step 6: Train a Simple Model
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Define features and target variable


X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
print("Model Accuracy:", accuracy_score(y_test, y_pred))

Conclusion

⬛ NumPy for numerical computing

⬛ Pandas for data manipulation

⬛ Matplotlib & Seaborn for data visualization

⬛ EDA on the Titanic Dataset

ACTIVITY LOG FOR THE FOURTEENTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Explain about Advanced ML Algorithms | Students learn about Advanced ML Algorithms |
Day–2 | Implement Real-World ML Project (capstone project) | Students implement Real-World ML Project |
Day–3 | Implement Optimization & Hyperparameter Tuning | Students implement Optimization & Hyperparameter Tuning |
Day–4 | Implement Real-World ML Project (capstone project) (continued...) | Students implement Real-World ML Project |
Day–5 | Implement Real-World ML Project (capstone project) (continued...) | Students implement Real-World ML Project |
Day–6 | Implement Real-World ML Project (capstone project) (continued...) | Students implement Real-World ML Project |

WEEKLY REPORT
WEEK–14 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Explain about Advanced ML algorithms

Implement Real-World ML Project (capstone project)

Advanced Machine Learning Algorithms & Capstone Project Implementation
1. Advanced Machine Learning Algorithms
Once you understand basic supervised and unsupervised learning algorithms, you can explore more
advanced ML techniques to improve model performance and solve complex real-world problems.

1.1. Ensemble Learning


Ensemble Learning is a technique that combines multiple models to improve accuracy and robustness.
It helps to reduce bias and variance, leading to better generalization.

Types of Ensemble Learning


⬛ Bagging (Bootstrap Aggregating)

⬛ Boosting (Adaptive Learning)

⬛ Stacking (Meta-Learning)

1.1.1. Random Forest (Bagging Example)

Random Forest is an extension of Decision Trees where multiple trees are built using different subsets of data.
The final prediction is made by majority voting (for classification) or averaging (for regression).

Implementation:

from sklearn.ensemble import RandomForestClassifier


from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train Random Forest


model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred))

1.1.2. Gradient Boosting (Boosting Example)

Boosting is an iterative technique that adjusts model weights to focus on hard-to-classify cases.
Gradient Boosting & XGBoost are popular implementations.

Implementation using XGBoost:

from xgboost import XGBClassifier

# Train XGBoost Model


xgb_model = XGBClassifier(n_estimators=100, learning_rate=0.1)
xgb_model.fit(X_train, y_train)

# Predictions
y_pred = xgb_model.predict(X_test)

# Accuracy
print("XGBoost Accuracy:", accuracy_score(y_test, y_pred))

1.2. Dimensionality Reduction


High-dimensional data increases computational cost and overfitting risk.
Principal Component Analysis (PCA) is used to reduce the number of features while preserving maximum variance.

PCA Example:

from sklearn.decomposition import PCA

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print("Original Shape:", X.shape)


print("Reduced Shape:", X_pca.shape)

1.3. Clustering (Advanced Unsupervised Learning)


Clustering is used to group similar data points together.
K-Means and DBSCAN are commonly used clustering techniques.

K-Means Clustering Example:

from sklearn.cluster import KMeans

# Train K-Means Model


kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(X)

print("Cluster Labels:", clusters)

1.4. Natural Language Processing (NLP)


NLP is used to analyze and process textual data.
TF-IDF, Word Embeddings (Word2Vec), LSTMs, and Transformers (BERT, GPT) are key NLP techniques.

TF-IDF Example:

from sklearn.feature_extraction.text import TfidfVectorizer

# Sample text data


texts = ["Machine learning is amazing", "Deep learning is a subset of ML", "AI
is the future"]

# Convert text to TF-IDF features


vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(texts)

print("TF-IDF Features:", X_tfidf.toarray())

Feature Subset Selection in Machine Learning


Feature Subset Selection (also called Feature Selection) is the process of selecting a subset of
relevant features (input variables) to use in model training. The goal is to improve model
performance by removing irrelevant, redundant, or noisy features.

1. Why Feature Selection?



⬛ Reduces Overfitting → Less noise in data leads to better generalization.

⬛ Improves Accuracy → Irrelevant features can mislead the model.

⬛ Reduces Training Time → Fewer features mean faster computation.

⬛ Enhances Interpretability → Easier to understand a model with fewer features.

2. Types of Feature Selection Methods


Feature selection methods are broadly categorized into:

1. Filter Methods
2. Wrapper Methods
3. Embedded Methods
4. Hybrid Methods

3. Feature Selection Methods in Detail


1. Filter Methods (Preprocessing Step)

Filter methods use statistical techniques to evaluate feature importance before training the
model.

✔ Advantages: Fast, independent of any ML algorithm.


✘ Disadvantages: Ignores feature interactions.

◆ Common Techniques:

 Correlation Coefficient: Remove highly correlated features.


 Chi-Square Test: Measures dependency between categorical features and target.
 Mutual Information (MI): Measures how much information a feature provides about the
target.
 Variance Threshold: Removes features with low variance.
 ANOVA (Analysis of Variance): Tests differences between groups for feature selection.

Example (Using Scikit-Learn in Python)

from sklearn.feature_selection import SelectKBest, chi2


X_new = SelectKBest(chi2, k=5).fit_transform(X, y) # Selects top 5 features

2. Wrapper Methods (Use ML Models)

Wrapper methods train models iteratively using different feature subsets and select the best-
performing subset.

✔ Advantages: Finds the best feature subset for a specific model.


✘ Disadvantages: Computationally expensive.

◆ Common Techniques:

 Forward Selection: Starts with no features, adds the best feature iteratively.
 Backward Elimination: Starts with all features, removes least significant one at each step.
 Recursive Feature Elimination (RFE): Eliminates least important features recursively.

Example (Using RFE in Python)

from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
rfe = RFE(model, n_features_to_select=5)  # Selects top 5 features
X_new = rfe.fit_transform(X, y)

3. Embedded Methods (Built into ML Algorithms)

Embedded methods integrate feature selection into the model training process.

✔ Advantages: Faster than wrapper methods, considers feature interactions.


✘ Disadvantages: Specific to the ML model used.

◆ Common Techniques:

 Lasso Regression (L1 Regularization): Shrinks less important feature coefficients to zero.
 Decision Tree Feature Importance: Trees naturally rank features based on splits.

Example (Using Lasso in Python)

from sklearn.linear_model import Lasso


lasso = Lasso(alpha=0.01) # Alpha controls feature selection strength
lasso.fit(X, y)
selected_features = X.columns[lasso.coef_ != 0]

Example (Using Feature Importance in Random Forest)

from sklearn.ensemble import RandomForestClassifier


model = RandomForestClassifier()
model.fit(X, y)
importances = model.feature_importances_

4. Hybrid Methods (Combination of Above)

Hybrid methods combine filter, wrapper, and embedded methods to optimize feature selection.

✔ Advantages: More accurate than using a single method.


✘ Disadvantages: Computationally expensive.

◆ Example Workflow:

1. Use Filter Method to remove irrelevant features (e.g., low variance).


2. Use Wrapper Method (like RFE) to refine feature selection.
3. Use Embedded Method (like Lasso) for final selection.

4. Feature Selection vs Feature Extraction
Feature Selection | Feature Extraction
Selects a subset of existing features | Creates new features from existing ones
Does not alter original features | Transforms features (e.g., PCA, Autoencoders)
Example: Removing redundant columns | Example: Reducing dimensions using PCA

5. How to Choose the Right Feature Selection Method?


Scenario | Best Method
Large dataset, fast processing | Filter (Chi-Square, Correlation)
Small dataset, model-specific tuning | Wrapper (RFE, Forward/Backward Selection)
Automated selection with training | Embedded (Lasso, Decision Trees)
Best of both worlds | Hybrid (Filter + Wrapper)

6. Conclusion
 Feature selection is crucial for improving model performance.
 Different methods (Filter, Wrapper, Embedded) have trade-offs.
 Hybrid methods offer the best results but are computationally expensive.

2. Implementing a Real-World ML Project (Capstone Project)


A capstone project allows you to apply ML skills to solve real-world problems.
We will develop a House Price Prediction Model using Linear Regression.

2.1. Problem Statement

Predict house prices based on features such as area, number of bedrooms, location, etc.
Use regression techniques to develop a predictive model.

2.2. Steps to Build the Project

1. Data Collection: Get real estate data
2. Data Preprocessing: Handle missing values, encoding
3. Exploratory Data Analysis (EDA): Visualize correlations
4. Feature Selection & Engineering: Choose important variables
5. Model Training: Train regression models
6. Model Evaluation: Check performance metrics
7. Deployment: Deploy the model using Flask/Streamlit

2.3. Dataset & Data Preprocessing

Load house price dataset and clean it.

import pandas as pd

# Load dataset
df = pd.read_csv("house_prices.csv")

# Check for missing values


print(df.isnull().sum())

# Fill missing values (numeric columns only)
df.fillna(df.median(numeric_only=True), inplace=True)

# Convert categorical columns into numerical


df = pd.get_dummies(df, columns=["location"], drop_first=True)

2.4. Exploratory Data Analysis (EDA)

Visualizing correlations in data

import seaborn as sns


import matplotlib.pyplot as plt

# Correlation Heatmap
plt.figure(figsize=(10,6))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()

2.5. Model Training

Train a Linear Regression Model

from sklearn.model_selection import train_test_split


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Select features and target
X = df[["area", "bedrooms", "bathrooms", "location_New York", "location_Los Angeles"]]
y = df["price"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train Model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))

2.6. Deploying the Model

Deploy the trained model using Flask

from flask import Flask, request, jsonify
import pickle

app = Flask(__name__)

# Load saved model
model = pickle.load(open("house_price_model.pkl", "rb"))

@app.route("/predict", methods=["POST"])
def predict():
    data = request.json
    features = [[data["area"], data["bedrooms"], data["bathrooms"],
                 data["location_New York"], data["location_Los Angeles"]]]
    prediction = model.predict(features)
    return jsonify({"Predicted Price": float(prediction[0])})

if __name__ == "__main__":
    app.run(debug=True)

3. Conclusion

⬛ Covered Advanced ML Algorithms (Ensemble Learning, Clustering, NLP, etc.)

⬛ Built a Real-World House Price Prediction Model

⬛ Performed Data Preprocessing, Model Training, and Deployment

ACTIVITY LOG FOR THE FIFTEENTH WEEK

Day | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

Day–1 | Machine Learning Engineer Role & Responsibilities | Students learn about Machine Learning Engineer Role & Responsibilities |
Day–2 | Building an ML Engineer Resume | Students learn how to build an ML Engineer Resume |
Day–3 | ML Engineer Interview Preparation | Students learn common interview questions |
Day–4 | ML Engineer Interview Preparation | Students learn about interview questions |
Day–5 | Building an Online Portfolio (GitHub/Kaggle) | Students learn about Building an Online Portfolio (GitHub/Kaggle) |
Day–6 | Student-trainer communication regarding training program, doubts clarification | Students clear their doubts regarding the training program |

WEEKLY REPORT
WEEK–15 (From Dt: .................... to Dt: ....................)

Objective of the Activity Done:

Detailed Report:

Common interview questions

Building an ML engineer resume

ML engineer roles and responsibilities


Building an online portfolio (GitHub/Kaggle)

Machine Learning Engineer: Role, Responsibilities & Career Guide
1. Machine Learning Engineer Role & Responsibilities
A Machine Learning Engineer (MLE) is responsible for designing, building, and deploying
ML models to solve real-world problems. They work at the intersection of software
engineering, data science, and artificial intelligence.

1.1. Key Responsibilities


⬛ Data Collection & Preprocessing

 Gathering data from various sources (databases, APIs, web scraping)


 Cleaning and handling missing values
 Feature engineering & feature selection


⬛ Model Development & Training

 Selecting the right ML algorithm (regression, classification, clustering, NLP, etc.)


 Training models using libraries like Scikit-Learn, TensorFlow, PyTorch
 Hyperparameter tuning for model optimization


⬛ Model Evaluation & Validation

 Using performance metrics (accuracy, precision, recall, F1-score, RMSE)


 Handling overfitting and underfitting


⬛ Model Deployment & Monitoring

 Deploying ML models using Flask, FastAPI, Streamlit, or cloud platforms (AWS,


GCP, Azure)
 Monitoring model performance in production


⬛ Collaboration & Documentation

 Working with data scientists, software engineers, DevOps teams


 Writing clear documentation for ML models and APIs

2. Building a Machine Learning Engineer Resume


A strong MLE resume should highlight technical skills, projects, and relevant experience.
2.1. Key Sections in an ML Engineer Resume

1. Contact Information – Name, Email, LinkedIn, GitHub/Kaggle links
2. Summary Statement – Briefly describe your expertise in ML and software development
3. Skills & Technologies
 Programming: Python, SQL, R
 ML Frameworks: TensorFlow, PyTorch, Scikit-Learn
 Cloud Services: AWS, GCP, Azure
 Big Data: Spark, Hadoop, Kafka
4. Work Experience – Detail your role, projects, and contributions
5. Projects & Research – Highlight real-world ML projects with GitHub links
6. Education & Certifications – Include degrees and courses from Coursera, Udacity, Google ML

2.2. Sample Resume Summary

Machine Learning Engineer with 3+ years of experience in developing and deploying scalable ML models. Expertise in deep learning, NLP, and cloud-based ML solutions. Passionate about solving business problems using AI-driven technologies.

3. ML Engineer Interview Preparation


Preparing for an ML Engineer interview requires strong knowledge of ML concepts, coding
skills, and system design.

3.1. Common ML Interview Topics


⬛ Machine Learning Basics – Supervised vs. Unsupervised Learning, Overfitting, Bias-
Variance Tradeoff

⬛ Deep Learning – CNNs, RNNs, Transformers, Attention Mechanisms

⬛ Mathematics & Statistics – Probability, Linear Algebra, Gradient Descent

⬛ Feature Engineering – Handling missing values, categorical encoding, dimensionality
reduction

⬛ Model Evaluation – Precision, Recall, AUC-ROC, RMSE, Log Loss

⬛ System Design – Building ML pipelines, scalable architectures

⬛ Deployment & MLOps – Docker, Kubernetes, Model Monitoring

3.2. ML Coding Challenges (Leetcode, HackerRank, Kaggle)

Example Question: Implement a function that normalizes a given dataset using Min-Max Scaling.
import numpy as np

def min_max_scaling(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

# Example Usage
data = np.array([10, 20, 30, 40, 50])
scaled_data = min_max_scaling(data)
print(scaled_data)


⬛ Practice ML coding questions on Leetcode (ML Section), HackerRank, Kaggle Notebooks

3.3. Behavioral Interview Questions

◆ Tell me about a challenging ML project you worked on.


◆ How do you handle bias in an ML model?
◆ Explain a time when you had to optimize an ML model for production.

4. Building an Online Portfolio (GitHub/Kaggle)


A strong online portfolio helps recruiters evaluate your practical skills.

4.1. What to Include in Your Portfolio?


✓ ML Projects with GitHub Repositories

⬛ Jupyter Notebooks & Kaggle Datasets

⬛ Blogs on ML topics (Medium, Dev.to, Hashnode)

⬛ Open-source contributions to ML libraries (Scikit-Learn, TensorFlow, Hugging Face)

4.2. Example GitHub ML Project Structure


House-Price-Prediction/
├── data/              # Raw & processed datasets
├── notebooks/         # Jupyter Notebooks
├── models/            # Trained model files
├── scripts/           # Python scripts for training & inference
├── requirements.txt   # Dependencies (pandas, sklearn, flask)
└── README.md          # Project Overview & Instructions

Tip: Keep your README well-documented with model results, graphs, and deployment steps.

5. Summary & Next Steps

⬛ Role of an ML Engineer – Model building, deployment, and optimization
⬛ How to Build an ML Resume – Highlight skills, projects, and experience
⬛ ML Interview Preparation – Coding practice, ML theory, system design
⬛ Building an Online Portfolio – GitHub, Kaggle, Blogs, Open-source contributions

CHAPTER 5: OUTCOMES DESCRIPTION

Describe the work environment you have experienced

At Epro Academy and Glossary Softech, all staff members and management are supportive. As an intern, my role was to attend on time and maintain self-discipline and dedication towards the training program.

Epro Academy and Glossary Softech is a strong EdTech company with infrastructure for both online and offline training. I completed good projects with my co-interns and staff, and I went through a wonderful training experience.

Describe the real time technical skills you have acquired.

I have learned about machine learning topics with real-time examples, Python as it relates to the machine learning curriculum, and much more. I have done live hands-on practicals and implemented examples of machine learning techniques. I have also built one real-time project: a House Price Prediction model using Linear Regression.

Describe the managerial skills you have acquired.

At Epro Academy and Glossary Softech, I learned about team management and leadership skills in order to complete the project.

Coordinating with my team and assigning roles to the members was a very challenging task, but with the support of my mentor and team leader it became easy to manage my team successfully and reach our common goal.

Describe how you could improve your communication skills

At Epro Academy and Glossary Softech, I improved my oral and written communication skills by getting involved in real-time projects and maintaining coordination with my team members, which improved my confidence level. After taking suggestions from seniors at Epro, I am able to control my anxiety.

At different levels of the training, I came to understand my team's strengths and drawbacks through my leadership role.

Above all, at Epro Academy I learned about understanding others and being understood by others, extempore speech, the ability to articulate key points, closing a conversation, maintaining niceties and protocols, and greeting, thanking, and appreciating others. Hence, I have improved myself in all these aspects as an intern.

Describe how could you enhance your abilities in group discussions, participation in teams,
contribution as a team member, leading a team/activity.

As an intern, in group discussions I learned how to control my voice and tone while discussing with my team members, and how to respect others' opinions and thoughts about the topic under discussion.

I also learned how to respect other team members' ideas and how to convince them of the right path. While doing the project work, I learned from my faculty members how to lead a team.

Describe the technological developments you have observed and relevant to the subject area
of training (focus on digital technologies relevant to your job role)

While doing project work as an intern, most of the work was done through coding. In machine learning, I also learned about the latest trends in the software field and job-related knowledge. I learned about the latest code and program libraries that make it very easy to train a model.

Student Self Evaluation of the Short-Term Internship

Student Name: Registration No:

Term of Internship: From: To:

Date of Evaluation:

Organization Name & Address:

Please rate your performance in the following areas:

Rating Scale: Letter grade of CGPA calculation to be provided

1 Oral communication 1 2 3 4 5
2 Written communication 1 2 3 4 5
3 Proactiveness 1 2 3 4 5
4 Interaction ability with community 1 2 3 4 5
5 Positive Attitude 1 2 3 4 5
6 Self-confidence 1 2 3 4 5
7 Ability to learn 1 2 3 4 5
8 Work Plan and organization 1 2 3 4 5
9 Professionalism 1 2 3 4 5
10 Creativity 1 2 3 4 5
11 Quality of work done 1 2 3 4 5
12 Time Management 1 2 3 4 5
13 Understanding the Community 1 2 3 4 5
14 Achievement of Desired Outcomes 1 2 3 4 5
15 OVERALL PERFORMANCE 1 2 3 4 5

Date: Signature of the Student

Evaluation by the Supervisor of the Intern Organization

Student Name: Registration No:

Term of Internship: From: To:

Date of Evaluation:

Organization Name & Address:

Name & Address of the Supervisor


with Mobile Number

Please rate the student’s performance in the following areas:

Please note that your evaluation shall be done independently of the Student's self-evaluation.

Rating Scale: 1 is lowest and 5 is highest rank

1 Oral communication 1 2 3 4 5
2 Written communication 1 2 3 4 5
3 Proactiveness 1 2 3 4 5
4 Interaction ability with community 1 2 3 4 5
5 Positive Attitude 1 2 3 4 5
6 Self-confidence 1 2 3 4 5
7 Ability to learn 1 2 3 4 5
8 Work Plan and organization 1 2 3 4 5
9 Professionalism 1 2 3 4 5
10 Creativity 1 2 3 4 5
11 Quality of work done 1 2 3 4 5
12 Time Management 1 2 3 4 5
13 Understanding the Community 1 2 3 4 5
14 Achievement of Desired Outcomes 1 2 3 4 5
15 OVERALL PERFORMANCE 1 2 3 4 5

Date: Signature of the Supervisor

Internal & External Evaluation for Semester Internship

Objectives:
• To explore career alternatives prior to graduation.
• To assess interests and abilities in the field of study.
• To develop communication, interpersonal and other critical skills required in a future job.
• To acquire additional skills required for the world of work.
• To acquire employment contacts leading directly to a full-time job following graduation from college.

Assessment Model:
• There shall be both internal evaluation and external evaluation.
• The Faculty Guide assigned is in charge of the learning activities of the students and of their comprehensive and continuous assessment.
• The assessment is to be conducted for 200 marks: Internal Evaluation for 50 marks and External Evaluation for 150 marks.
• The number of credits assigned is 12. Later, the marks shall be converted into grades and grade points to be included finally in the SGPA and CGPA.
• The weightings for Internal Evaluation shall be:
  o Activity Log – 10 marks
  o Internship Evaluation – 30 marks
  o Oral Presentation – 10 marks
• The weightings for External Evaluation shall be:
  o Internship Evaluation – 100 marks
  o Viva-Voce – 50 marks
• The External Evaluation shall be conducted by an Evaluation Committee comprising the Principal, Faculty Guide, Internal Expert and External Expert nominated by the affiliating University. The Evaluation Committee shall also consider the grading given by the Supervisor of the Intern Organization.
• Activity Log is the record of the day-to-day activities. The Activity Log is assessed on an individual basis, thus allowing individual members within groups to be assessed this way. The assessment will take into consideration the individual student's involvement in the assigned work.
• While evaluating the student's Activity Log, the following shall be considered:
  a. The individual student's effort and commitment.
  b. The originality and quality of the work produced by the individual student.
  c. The student's integration and co-operation with the work assigned.
  d. The completeness of the Activity Log.
• The Internship Evaluation shall include the following components, based on the Weekly Reports and the Outcomes Description:
  a. Description of the Work Environment.
  b. Real-Time Technical Skills acquired.
  c. Managerial Skills acquired.
  d. Improvement of Communication Skills.
  e. Team Dynamics.
  f. Technological Developments recorded.

MARKS STATEMENT
(To be used by the Examiners)

INTERNAL ASSESSMENT STATEMENT

Name Of the Student:


Programme of Study:
Year of Study:
Group:
Register No/H.T.No:
Name of the College:
University:

Sl.No   Evaluation Criterion    Maximum Marks   Marks Awarded

1.      Activity Log            10
2.      Internship Evaluation   30
3.      Oral Presentation       10
        GRAND TOTAL             50

Date: Signature of the Faculty Guide

EXTERNAL ASSESSMENT STATEMENT

Name Of the Student:


Programme of Study:
Year of Study:
Group:
Register No/H.T.No:
Name of the College:
University:

Sl.No   Evaluation Criterion                                   Maximum Marks   Marks Awarded

1.      Internship Evaluation                                  80
2.      Grading given by the Supervisor of the Intern
        Organization                                           20
3.      Viva-Voce                                              50
        TOTAL                                                  150
        GRAND TOTAL (EXT. 150 M + INT. 50 M)                   200

Signature of the Faculty Guide

Signature of the Internal Expert

Signature of the External Expert

Signature of the Principal with Seal

