ML Lab Manual With Statistical Formulas

The document outlines a series of experiments aimed at installing Python and essential libraries for machine learning, performing mathematical operations with NumPy, and using Pandas for CSV file handling. It includes procedures for statistical calculations and visualizations using Matplotlib. Each experiment provides code examples and expected outputs to verify successful execution.

Uploaded by

19057cme009

Experiment 1: Installing Python and Required Packages

Aim: To install Python and essential machine learning libraries: NumPy, Pandas, Matplotlib, and
Scikit-learn.

Theory: Python is widely used for data science and machine learning because of its simple syntax
and vast ecosystem of libraries. Instead of coding everything from scratch, we use prebuilt libraries
like:

• NumPy: for numerical computations

• Pandas: for data handling and manipulation

• Matplotlib: for data visualization

• Scikit-learn: for implementing machine learning models

Software Requirements:

• Anaconda Distribution (Python 3.x)

• Internet Connection

• Jupyter Notebook / VS Code / Any Python IDE

Procedure:

1. Open a browser and visit the official Anaconda website:

https://www.anaconda.com/products/distribution

2. Download the latest version of Anaconda for your OS (Windows/macOS/Linux).

3. Run the downloaded installer and follow the steps:

- Accept license agreement

- Select 'Just Me'

- Choose installation location (default or custom)

- Enable the PATH option and register Anaconda as the default Python interpreter

- Click Install

4. Once installed, open Anaconda Navigator from the Start menu.

5. Launch Jupyter Notebook.

6. Inside Jupyter, click on New > Python 3 to start a new notebook.

7. Install the required libraries (if not preinstalled) using Anaconda Prompt:

conda install numpy pandas matplotlib scikit-learn

OR using pip:

pip install numpy pandas matplotlib scikit-learn

8. In the notebook, type and run the following code to verify installation:
Code:

import numpy

import pandas

import matplotlib

import sklearn

print("All packages are installed and imported successfully.")

Expected Output:

All packages are installed and imported successfully.

Output: When you run the code, it should display the above message without any errors.

Result: Successfully installed Python and verified the working of essential ML libraries: NumPy,
Pandas, Matplotlib, and Scikit-learn.
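
If one of the imports fails, it helps to see which package is missing and which versions are present. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later is assumed). The helper name `report_versions` is hypothetical and not part of the original manual; note that the `sklearn` module is distributed under the package name `scikit-learn`.

```python
from importlib import metadata

# Distribution names as used by pip/conda.
# The "sklearn" module is installed under the name "scikit-learn".
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn"]


def report_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so a missing library is easy to spot.
for name, version in report_versions(PACKAGES).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the conda or pip command from step 7.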

Experiment 2: Mathematical Operations on Vectors and Matrices

Aim: To perform mathematical operations such as addition, subtraction, multiplication, and
transpose on vectors and matrices using NumPy.

Theory: Matrices and vectors form the backbone of machine learning models. NumPy is a powerful
library that simplifies numerical computations and allows efficient matrix manipulations.

Software Requirements:

• Python with NumPy installed

• Jupyter Notebook or Python IDE

Procedure:

1. Import the NumPy library.

2. Create arrays representing vectors and matrices.

3. Perform operations such as:

- Matrix addition

- Matrix multiplication

- Transpose

- Scalar multiplication

Code:

import numpy as np

# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Operations
print("Matrix A:\n", A)
print("Matrix B:\n", B)
print("Addition:\n", A + B)
print("Dot Product:\n", np.dot(A, B))
print("Transpose of A:\n", A.T)

Expected Output:

Matrix A:
 [[1 2]
 [3 4]]
Matrix B:
 [[5 6]
 [7 8]]
Addition:
 [[ 6  8]
 [10 12]]
Dot Product:
 [[19 22]
 [43 50]]
Transpose of A:
 [[1 3]
 [2 4]]

Output: Displays the result of matrix operations performed using NumPy.

Result: Successfully executed vector and matrix operations using NumPy.
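
The aim and procedure also mention subtraction, scalar multiplication, and vectors, which the code above does not exercise. The sketch below fills those in with the same matrices A and B; the vectors `v` and `w` are illustrative values chosen here, not taken from the manual. It also contrasts the element-wise product `A * B` with the matrix product `A @ B`, which are easy to confuse.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

subtraction = A - B       # element-wise difference
scaled = 2 * A            # scalar multiplication: every entry is doubled
elementwise = A * B       # element-wise product, NOT the matrix product
matrix_product = A @ B    # matrix product; same as np.dot(A, B) for 2-D arrays

# Vectors are represented as 1-D arrays (illustrative values).
v = np.array([1, 2])
w = np.array([3, 4])
vector_sum = v + w        # [4 6]
inner = np.dot(v, w)      # 1*3 + 2*4 = 11
A_times_v = A @ v         # [1*1 + 2*2, 3*1 + 4*2] = [5 11]

print("A - B:\n", subtraction)
print("2 * A:\n", scaled)
print("A * B (element-wise):\n", elementwise)
print("A @ B (matrix product):\n", matrix_product)
print("v + w:", vector_sum)
print("v . w:", inner)
print("A @ v:", A_times_v)
```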

Experiment 3: Creating, Loading, and Saving CSV Files

Aim: To create, load, and save datasets using CSV (Comma-Separated Values) files with
Python Pandas.

Theory: CSV is a simple file format used to store tabular data. Pandas provides powerful
tools for reading from and writing to CSV files, which are commonly used in machine
learning workflows for storing datasets.

Software Requirements:

• Python
• Pandas

Procedure:

1. Import the Pandas library.


2. Create a DataFrame from scratch.
3. Save the DataFrame to a CSV file.
4. Load the CSV file back into a DataFrame.
5. Display and manipulate the data.

Code:
import pandas as pd

# Step 1: Create data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Department': ['IT', 'HR', 'Finance']
}

# Step 2: Create DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Step 3: Save to CSV (index=False leaves the row index out of the file)
df.to_csv('employees.csv', index=False)
print("\nData saved to employees.csv")

# Step 4: Load from CSV
loaded_df = pd.read_csv('employees.csv')
print("\nLoaded DataFrame:")
print(loaded_df)

Expected Output:
Printed original and loaded DataFrames

Output: Console output showing the original data and data reloaded from the CSV file.

Result: Successfully created, saved, and loaded a CSV file using Pandas.
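
Step 5 of the procedure asks for the loaded data to be manipulated, which the code above stops short of. The following is a minimal sketch of three common operations (adding a column, filtering rows, sorting). The column name `Age_in_5_years` and the age threshold of 28 are illustrative assumptions, and the file is written to a temporary folder rather than the working directory.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Department': ['IT', 'HR', 'Finance'],
})

# Save and reload through a temporary folder so no file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'employees.csv')
    df.to_csv(path, index=False)
    loaded_df = pd.read_csv(path)

# Add a derived column (hypothetical name, for illustration only).
loaded_df['Age_in_5_years'] = loaded_df['Age'] + 5

# Keep only the rows where Age is greater than 28.
older = loaded_df[loaded_df['Age'] > 28]

# Sort by Age, oldest first.
sorted_df = loaded_df.sort_values('Age', ascending=False)

print(loaded_df)
print(older)
print(sorted_df)
```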

Experiment 4: Calculations of Mean, Median, Variance, Standard Deviation, Quartiles, and IQR

Aim: To perform basic statistical calculations such as mean, median, variance, standard deviation,
quartiles, and interquartile range using Python.

Theory: These statistical measures help in understanding the distribution and spread of data.

• Mean: Average value (sum of the values divided by their count)

• Median: Middle value in sorted data

• Variance: Average squared deviation from the mean

• Standard Deviation: Square root of variance

• Quartiles: Values that divide sorted data into four equal parts

• Interquartile Range (IQR): Difference between Q3 and Q1, shows spread of the middle 50%

Software Requirements:

• Python

• Pandas / NumPy

Procedure:

1. Import necessary libraries.

2. Create a sample dataset.

3. Calculate mean, median, variance, and standard deviation.

4. Calculate quartiles and IQR.

Code:

import pandas as pd

import numpy as np

# Sample data

data = [12, 15, 20, 21, 22, 25, 27, 30, 33, 35]
# Convert to Series

df = pd.Series(data)

# Calculations

print("Mean:", df.mean())

print("Median:", df.median())

print("Variance:", df.var())

print("Standard Deviation:", df.std())

print("Q1 (25th percentile):", df.quantile(0.25))

print("Q2 (50th percentile):", df.quantile(0.50))

print("Q3 (75th percentile):", df.quantile(0.75))

print("Interquartile Range (IQR):", df.quantile(0.75) - df.quantile(0.25))

Expected Output:

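• Printed results of all statistical metrics. For the sample data: Mean 24.0, Median 23.5, Variance ≈ 55.78, Standard Deviation ≈ 7.47, Q1 20.25, Q2 23.5, Q3 29.25, IQR 9.0.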

Output: Displays the calculated values for mean, median, variance, standard deviation, quartiles, and
IQR.

Result: Successfully calculated statistical measures using Python.
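
One detail worth checking when comparing results: pandas `var()` and `std()` divide by n - 1 by default (sample statistics, `ddof=1`), while NumPy `np.var()` and `np.std()` divide by n by default (population statistics, `ddof=0`). The sketch below shows both on the same data, and computes the quartiles with `np.percentile`, whose default linear interpolation matches `Series.quantile`. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = [12, 15, 20, 21, 22, 25, 27, 30, 33, 35]
s = pd.Series(data)

sample_var = s.var()             # pandas default ddof=1: divides by n - 1
population_var = s.var(ddof=0)   # divides by n
numpy_var = np.var(data)         # NumPy default ddof=0: divides by n

# Quartiles with the default linear interpolation between data points.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print("Sample variance (n - 1):", sample_var)
print("Population variance (n):", population_var)
print("NumPy variance (default):", numpy_var)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

For this dataset the sum of squared deviations is 502, so the sample variance is 502/9 (about 55.78) and the population variance is 502/10 = 50.2.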

Experiment 5: Basic Plots using Matplotlib for an Example Dataset

Aim: To visualize data using basic plots like line plot, bar chart, histogram, and scatter plot using
Matplotlib.

Theory: Matplotlib is a widely-used Python library for 2D plotting. Visualizing data helps in
understanding patterns, distributions, and relationships. Key types of plots include:

• Line Plot: Shows trends over time.

• Bar Chart: Compares categorical data.

• Histogram: Displays frequency distribution.

• Scatter Plot: Shows relationship between two variables.

Software Requirements:

• Python

• Matplotlib

• Pandas (optional for dataset handling)

Procedure:
1. Import required libraries.

2. Prepare or load example dataset.

3. Create basic visualizations using matplotlib.

Code:

import matplotlib.pyplot as plt

import numpy as np

# Sample data

x = [1, 2, 3, 4, 5]

y = [10, 20, 15, 25, 30]

# Line plot

plt.figure(figsize=(6,4))

plt.plot(x, y, marker='o')

plt.title("Line Plot")

plt.xlabel("X-axis")

plt.ylabel("Y-axis")

plt.grid(True)

plt.show()

# Bar chart

plt.bar(x, y)

plt.title("Bar Chart")

plt.xlabel("X")

plt.ylabel("Values")

plt.show()

# Histogram

data = [12, 15, 20, 20, 21, 22, 25, 27, 30, 33, 35, 35, 35]

plt.hist(data, bins=5, color='skyblue')

plt.title("Histogram")
plt.xlabel("Values")

plt.ylabel("Frequency")

plt.show()

# Scatter plot

x2 = np.random.rand(50)

y2 = np.random.rand(50)

plt.scatter(x2, y2, color='green')

plt.title("Scatter Plot")

plt.xlabel("X")

plt.ylabel("Y")

plt.show()

Expected Output:

• Multiple plots: line, bar, histogram, and scatter plot

Output: Visualizations appear in separate windows (when run as a script) or in output cells (in Jupyter).

Result: Successfully created various plots to visualize data using Matplotlib.
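
When the plots need to go into a lab record, it can be convenient to draw all four on one figure and save it to a file. The following is a sketch under a few assumptions: it is run as a script (the `Agg` backend line should be dropped inside Jupyter), the output file name `basic_plots.png` is hypothetical, and a fixed random seed is used so the scatter plot looks the same on every run.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 25, 30]
hist_data = [12, 15, 20, 20, 21, 22, 25, 27, 30, 33, 35, 35, 35]

rng = np.random.default_rng(0)  # fixed seed for a reproducible scatter plot
x2 = rng.random(50)
y2 = rng.random(50)

# One figure with a 2x2 grid of axes, one plot type per cell.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(x, y, marker='o')
axes[0, 0].set_title("Line Plot")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar Chart")

axes[1, 0].hist(hist_data, bins=5, color='skyblue')
axes[1, 0].set_title("Histogram")

axes[1, 1].scatter(x2, y2, color='green')
axes[1, 1].set_title("Scatter Plot")

fig.tight_layout()

# Hypothetical output location: the system temporary folder.
out_path = os.path.join(tempfile.gettempdir(), "basic_plots.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print("Saved figure to", out_path)
```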


Appendix: Basic Statistical Formulas (For Diploma Students)

1. Mean (µ):
µ = (x₁ + x₂ + ... + xₙ) / n

2. Median:
If n is odd → Middle value of sorted data
If n is even → Median = (middle1 + middle2) / 2

3. Mode:
The value that appears most frequently in the dataset.

4. Variance (σ²):
σ² = (1/n) * Σ(xᵢ - µ)²

5. Standard Deviation (σ):
σ = √((1/n) * Σ(xᵢ - µ)²)

These formulas help you understand the spread and average behavior of data.
They are useful in preprocessing and model evaluation in Machine Learning.
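
The formulas above can be written out directly in plain Python, which is a useful way to check a hand calculation. The following is a minimal sketch; the function names are illustrative, and the results are compared against the standard-library `statistics` module. Because the variance formula divides by n, it is the population variance and corresponds to `statistics.pvariance`.

```python
import math
import statistics
from collections import Counter


def mean(values):
    # Formula 1: sum of the values divided by n
    return sum(values) / len(values)


def median(values):
    # Formula 2: middle value, or the average of the two middle values
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    # Formula 3: the most frequent value (first one found if there is a tie)
    return Counter(values).most_common(1)[0][0]


def variance(values):
    # Formula 4: average squared deviation from the mean (divides by n)
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def std_dev(values):
    # Formula 5: square root of the variance
    return math.sqrt(variance(values))


data = [12, 15, 20, 21, 22, 25, 27, 30, 33, 35]
print("Mean:", mean(data))
print("Median:", median(data))
print("Mode of [20, 20, 35, 35, 35]:", mode([20, 20, 35, 35, 35]))
print("Variance:", variance(data), "| statistics.pvariance:", statistics.pvariance(data))
print("Std dev:", std_dev(data), "| statistics.pstdev:", statistics.pstdev(data))
```

For this data the mean is 24.0, the median is 23.5, and the variance is 50.2. The pandas `var()` call in Experiment 4 divides by n - 1 by default, so it reports a larger value for the same data.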
