Session-11 Machine Learning
Introduction
In linear regression, the target variable is quantitative, whereas classification models deal with qualitative (categorical) data. Algorithms for solving a classification problem first predict the probability of each category of the qualitative variable and use those probabilities as the basis for the classification. Since probabilities are continuous numbers, classification via probabilities also behaves like a regression method. Logistic regression is one such classification model, used to classify the dependent variable into two or more classes or categories.
Let’s suppose you took a survey and noted each person’s response as Satisfied, Neutral or Not Satisfied. Let’s map each category to a number:
Satisfied – 2
Neutral – 1
Not Satisfied – 0
But this doesn’t mean that the gap between Not Satisfied and Neutral is the same as the gap between Neutral and Satisfied; the mapping has no mathematical significance. We could equally well map the categories as:
Satisfied – 0
Neutral – 1
Not Satisfied – 2
It’s completely fine to choose the above mapping instead. But if we apply linear regression to the two mappings, we get two different sets of predictions. We can also get prediction values like 1.2, 0.8 or 2.3, which make no sense for categorical values. So there is no natural way to convert qualitative data into quantitative data for use in linear regression.
For binary classification, however, i.e. when there are only two categorical values, the least squares method can give decent results. Suppose we have two categories, Black and White, and we map them as follows:
Black – 0
White – 1
We can then classify based on the predicted value: if Ŷ > 0.5, the observation goes to class White, and vice versa. However, some predictions will be greater than 1 or less than 0, making them hard to assign to either class. So linear regression can work decently for binary classification but not well for multi-class classification. Hence, we use dedicated classification methods for such problems.
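To make this concrete, here is a minimal sketch (synthetic data; scikit-learn assumed available) showing how least squares on a 0/1-coded target produces predictions outside the [0, 1] range:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-feature binary problem: Black = 0, White = 1
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)
print(lin.predict([[0], [5.5], [12]]))
# Roughly [-0.245, 0.5, 1.381]: values below 0 and above 1
# appear, so they cannot be read as probabilities.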
Logistic Regression
Logistic regression is a regression-based algorithm that can be used for classification problems. It calculates the probability that a given value belongs to a specific class. If the probability is more than 50%, it assigns the value to that class; otherwise, the value is assigned to the other class. Therefore, logistic regression acts as a binary classifier.
But if we used the straight-line equation from linear regression (y = β0 + β1x) to calculate this probability, we would get values less than 0 as well as greater than 1, which make no sense as probabilities. So we need an equation that always gives values between 0 and 1, as we desire when calculating a probability.
Sigmoid function
Logistic regression achieves this by passing the output of the linear equation through the sigmoid function, which has three useful properties:
1. The sigmoid function’s range is bounded between 0 and 1, so its output can be read directly as a probability.
2. Its derivative is easier to calculate than those of many other functions, which is useful during gradient descent.
3. It is a simple way of introducing non-linearity into the model.
Although other functions can also be used, the sigmoid is the most common choice for logistic regression. We will talk about the other functions in the neural network section.
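For reference, the sigmoid is σ(z) = 1 / (1 + e^(−z)), and its derivative takes the convenient form σ′(z) = σ(z)(1 − σ(z)). A minimal NumPy sketch (function names are mine, for illustration):

import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid(np.array([-10, 0, 10])))  # ≈ [0.0000454, 0.5, 0.9999546]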
The performance of a classification model is evaluated using metrics such as:
Accuracy
Recall
Precision
F1 Score
Specificity
AUC (Area Under the Curve)
ROC (Receiver Operating Characteristic)
Classification Report
Confusion Matrix
These metrics are defined in terms of four basic counts:
True Positive (TP): a result that was predicted as positive by the classification model and is actually positive.
True Negative (TN): a result that was predicted as negative by the classification model and is actually negative.
False Positive (FP): a result that was predicted as positive by the classification model but is actually negative.
False Negative (FN): a result that was predicted as negative by the classification model but is actually positive.
The credibility of the model rests on how many of its predictions are correct.
Accuracy
The mathematical formula is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Or, it is the total number of correct classifications divided by the total number of classifications. Accuracy is not a reliable metric for imbalanced data: in binary classification a model can show a high accuracy simply because it is biased towards the majority class, since the errors it makes on the minority class barely affect the overall count.
Recall or Sensitivity
Recall = TP / (TP + FN)
Or, as the name suggests, recall measures how many of the total actual positives were correctly predicted by the model. It shows how relevant the model is in terms of positive results only.
Consider a classification model for cancer detection: the model gave 50 correct positive predictions (TP) but failed to identify 200 cancer patients (FN). Recall in that case will be:
Recall = 50 / (50 + 200) = 0.2 (the model was able to recall only 20% of the cancer patients)
Precision
Precision measures how many of all the positive predictions were actually positive. Mathematically,
Precision = TP / (TP + FP)
Let’s suppose that in the previous example the model identified 50 people as cancer patients (TP) but also raised a false alarm for 100 patients (FP). Hence,
Precision = 50 / (50 + 100) = 0.33 (the model only has a precision of 33%)
But we have a problem!
As evident from the previous example, a model can have a very high accuracy (when true negatives dominate) yet perform poorly in terms of precision and recall. So accuracy is not necessarily the right metric for evaluating the model in such a case.
Imagine a scenario where the requirement is that the model recall all the defaulters who did not pay back their loan. Suppose there were 10 such defaulters, and to recall those 10 the model flagged 20 people, of whom only 10 are actual defaulters. The recall of the model is then 100%, but its precision drops to 50%.
F1 Score
From the previous examples, it is clear that we need a metric that considers both Precision
and Recall for evaluating a model. One such metric is the F1 score.
The mathematical formula is:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Specificity or True Negative Rate
This represents how specific the model is when predicting true negatives. Mathematically,
Specificity = TN / (TN + FP)
Or, it quantifies the number of negatives predicted by the model with respect to the total number of actual negative (non-favourable) outcomes.
Similarly, the False Positive Rate can be defined as 1 − Specificity, or FP / (TN + FP).
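Putting these definitions together, here is a small plain-Python sketch; the four counts are hypothetical, chosen so that true negatives dominate:

# Hypothetical counts taken from a confusion matrix (TN deliberately large)
TP, TN, FP, FN = 50, 1000, 100, 200

accuracy    = (TP + TN) / (TP + TN + FP + FN)                  # 0.778
recall      = TP / (TP + FN)                                   # 0.200 (sensitivity)
precision   = TP / (TP + FP)                                   # 0.333
f1_score    = 2 * precision * recall / (precision + recall)    # 0.250
specificity = TN / (TN + FP)                                   # 0.909
fpr         = 1 - specificity                                  # 0.091 (false positive rate)

print(accuracy, recall, precision, f1_score, specificity, fpr)

Note how the accuracy looks healthy (about 0.78) while recall and precision are poor; this is exactly the imbalance problem described under Accuracy.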
ROC (Receiver Operating Characteristic)
We know that classification algorithms work on the probability of occurrence of the possible outcomes. A probability value lies between 0 and 1: zero means there is no chance of occurrence and one means the occurrence is certain.
But when working with real-world data, we seldom get a perfect 0 or 1; instead we get decimal values lying between them. If we are not getting binary probability values, how do we actually determine the class in our classification problem?
This is where the concept of a threshold comes in. A threshold is set; any probability value below the threshold is classed as a negative outcome, and anything above it as a favourable or positive outcome. For example, if the threshold is 0.5, any probability below 0.5 means a negative or unfavourable outcome, and any value above 0.5 indicates a positive or favourable outcome.
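Sweeping the threshold from 0 to 1 and recording, at each setting, the true positive rate against the false positive rate traces out the ROC curve, and the area under it (AUC) summarises the classifier across all thresholds. A minimal sketch with scikit-learn (the labels and scores are made up for illustration):

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_proba = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_proba)
print(fpr, tpr, thresholds, sep='\n')
print('AUC:', roc_auc_score(y_true, y_proba))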
(The accompanying figure, not reproduced here, plots each person's predicted probability against horizontal threshold lines ranging from 0 to 1.)
Suppose our classification problem is to identify the obese people in the given data. In the figure, the green markers represent obese people and the red markers represent non-obese people. Our confusion matrix depends on the value of the threshold we choose. For example, if 0.25 is the threshold, then TP (actually obese) = 3 and TN (not obese) = 2, while FP (not obese but predicted obese) = 2 (the two red markers above the 0.25 line) and FN (obese but predicted not obese) = 1 (the green marker below the 0.25 line).
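What follows is the hands-on notebook for this session: binary logistic regression on a diabetes dataset (the columns match the well-known Pima Indians Diabetes data). The cell that loads the data was not captured; a minimal sketch of the assumed setup (the file name is a guess):

import pandas as pd

# Assumed input file; the columns shown below match the Pima Indians Diabetes dataset
data = pd.read_csv('diabetes.csv')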
In [6]: data
Out[6]:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  ...
0              6      148             72             35        0  33.6  ...
1              1       85             66             29        0  26.6  ...
2              8      183             64              0        0  23.3  ...
3              1       89             66             23       94  28.1  ...
...
767            1       93             70             31        0  30.4  ...
(right-hand columns truncated in the original)
In [7]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
In [10]: data.describe()
Out[10]: (summary statistics table; output truncated in the original)
EDA
In [12]: # Univariate analysis
Done! Use 'show' commands to display/save. [100%] 00:01 -> (00:00 left)
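The body of this cell was not captured. The progress message shown is the one printed by the sweetviz auto-EDA library, so the cell presumably looked something like the following (a guess, not the notebook's confirmed code):

import sweetviz as sv

# Build an automated univariate EDA report for every column of the dataframe
report = sv.analyze(data)
report.show_html('eda_report.html')   # saves and opens the interactive report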
In [15]: data.head()
Out[15]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  ...
1            1       85             66             29        0  26.6  ...
3            1       89             66             23       94  28.1  ...
(remaining rows and right-hand columns truncated in the original)
In [19]: data.isnull().sum()
Out[19]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
Although no column contains nulls, zero values in columns such as BMI and BloodPressure are physiologically impossible and act as hidden missing values:
In [20]: data.loc[data['BMI']==0]
Out[20]:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI  ...
9              8      125             96              0        0  0.0  ...
49             7      105              0              0        0  0.0  ...
60             2       84              0              0        0  0.0  ...
81             2       74              0              0        0  0.0  ...
426            0       94              0              0        0  0.0  ...
494            3       80              0              0        0  0.0  ...
(right-hand columns truncated in the original)
In [21]: data['BMI'].mean()
Out[21]: 31.992578124999998
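The imputation step between In [21] and In [23] was not captured. A common treatment, and a plausible reconstruction, is to replace the impossible zeros with the column mean computed above:

# Replace the zero BMI values (hidden missing data) with the column mean
data['BMI'] = data['BMI'].replace(0, data['BMI'].mean())

# The other zero-inflated measurement columns are often treated the same way
for col in ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin']:
    data[col] = data[col].replace(0, data[col].mean())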
In [23]: data.describe()
Out[23]: (summary statistics after imputation; output truncated in the original)
In [24]: data
Out[24]: (dataframe display; output truncated in the original)
Outliers
In [30]: data.columns
data1 = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
         'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
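The inspection cells under this heading were not captured; a typical way to look for outliers, sketched here with seaborn, is one boxplot per feature:

import matplotlib.pyplot as plt
import seaborn as sns

# Draw a boxplot for every feature column (skip the 'Outcome' label)
for col in data1[:-1]:
    sns.boxplot(x=data[col])
    plt.title(col)
    plt.show()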
Feature selection
In [41]: data
Out[41]: (dataframe display; output truncated in the original)
In [43]: X
Out[43]: (feature matrix without the Outcome column; output truncated in the original)
In [46]: X_scaled
Out[48]: LogisticRegression()
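The cells that separate the features from the target, scale them, split the data and fit the model were not captured. A minimal sketch of the assumed steps (the variable names follow the outputs shown; test_size and random_state are guesses, though 25% reproduces the 192-row test set seen below):

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Separate the features and the binary target
X = data.drop(columns='Outcome')
y = data['Outcome']

# Standardise the features so that no column dominates the optimisation
X_scaled = StandardScaler().fit_transform(X)

# 576 training rows and 192 test rows out of 768
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42)

LR = LogisticRegression()
LR.fit(X_train, y_train)

y_train_pred = LR.predict(X_train)   # predictions on the training set
y_pred = LR.predict(X_test)          # predictions on the held-out test set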
In [51]: y_train_pred
Out[51]: array([1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,
0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
1, 0, 0, 0], dtype=int64)
In [53]: y_pred
Out[53]: array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0], dtype=int64)
In [55]: y_pred.shape
Out[55]: (192,)
In [56]: data
Out[56]: (dataframe display; output truncated in the original)
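The metric-computation cells were not captured; given the outputs that follow, they plausibly resembled the sketch below (the confusion table shown in Out[68] is a pandas crosstab):

from sklearn import metrics

accuracy  = metrics.accuracy_score(y_test, y_pred)     # Out[59]
precision = metrics.precision_score(y_test, y_pred)    # Out[61]
recall    = metrics.recall_score(y_test, y_pred)       # Out[64]
f1_score  = metrics.f1_score(y_test, y_pred)           # Out[67]
confusion = pd.crosstab(y_test, y_pred)                # Out[68]
auc       = metrics.roc_auc_score(y_test, y_pred)      # Out[71]
report    = metrics.classification_report(y_test, y_pred)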
Out[59]: 0.78125
In [61]: precision
Out[61]: 0.7708333333333334
In [64]: recall
Out[64]: 0.5441176470588235
In [67]: f1_score
Out[67]: 0.6379310344827587
Out[68]:
col_0 0 1
Outcome
0 113 11
1 31 37
In [71]: auc
Out[71]: 0.7277039848197343
In [75]: print(report)
(classification report output not captured in the original)
Multiclass classification
In [80]: import pandas as pd
import seaborn as sns
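The loading cell was not captured; the seaborn import and the column names that follow suggest the iris dataset bundled with seaborn:

# Assumed loading step: seaborn ships the iris dataset used below
data = sns.load_dataset('iris')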
In [82]: data
Out[82]: (150 × 5 dataframe with columns sepal_length, sepal_width, petal_length, petal_width, species; rows truncated in the original)
In [84]: data.species.unique()
Out[84]: array(['setosa', 'versicolor', 'virginica'], dtype=object)
In [85]: data.species.value_counts()
Out[85]: species
setosa 50
versicolor 50
virginica 50
Name: count, dtype: int64
In [90]: X
Out[90]: (feature matrix with columns sepal_length, sepal_width, petal_length, petal_width; rows truncated in the original)
In [92]: y
Out[92]: 0 setosa
1 setosa
2 setosa
3 setosa
4 setosa
...
145 virginica
146 virginica
147 virginica
148 virginica
149 virginica
Name: species, Length: 150, dtype: object
In [94]: X_new
In [103]: y_train
Out[103]: 4 setosa
32 setosa
142 virginica
85 versicolor
86 versicolor
...
71 versicolor
106 virginica
14 setosa
92 versicolor
102 virginica
Name: species, Length: 112, dtype: object
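The split-and-fit cells were not captured; a plausible sketch (X_new is presumably the prepared feature matrix from In [94]; test_size and random_state are guesses, though 25% reproduces the 112/38 split seen in the outputs):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 112 training rows and 38 test rows out of 150
X_train, X_test, y_train, y_test = train_test_split(
    X_new, y, test_size=0.25, random_state=42)

# One-vs-rest: fit one binary logistic model per species
LR = LogisticRegression(multi_class='ovr')
LR.fit(X_train, y_train)

y_hat = LR.predict(X_test)   # predicted species for the test rows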
Out[106]: LogisticRegression(multi_class='ovr')
In [108]: y_hat
In [112]: LR.predict(new)
In [113]: X_test.shape
Out[113]: (38, 4)
Out[115]:
col_0 setosa versicolor virginica
species
setosa 15 0 0
versicolor 0 10 1
virginica 0 0 12
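The final metric cells were not captured; given the numbers that follow, they plausibly were (average='weighted' applied to the crosstab above reproduces Out[120] exactly):

from sklearn import metrics

print(metrics.accuracy_score(y_test, y_hat))                       # Out[118]
print(metrics.precision_score(y_test, y_hat, average='weighted'))  # Out[120]
print(metrics.classification_report(y_test, y_hat))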
Out[118]: 0.9736842105263158
Out[120]: 0.9757085020242916
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.91      0.95        11
   virginica       0.92      1.00      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38