Experiment 1: Installing Python and Required Packages
Aim: To install Python and essential machine learning libraries: NumPy, Pandas, Matplotlib, and
Scikit-learn.
Theory: Python is widely used for data science and machine learning because of its simple syntax
and vast ecosystem of libraries. Instead of coding everything from scratch, we use prebuilt libraries
like:
NumPy: for numerical computations
Pandas: for data handling and manipulation
Matplotlib: for data visualization
Scikit-learn: for implementing machine learning models
Software Requirements:
Anaconda Distribution (Python 3.x)
Internet Connection
Jupyter Notebook / VS Code / Any Python IDE
Procedure:
1. Open a browser and visit the official Anaconda website:
https://www.anaconda.com/products/distribution
2. Download the latest version of Anaconda for your OS (Windows/macOS/Linux).
3. Run the downloaded installer and follow the steps:
- Accept license agreement
- Select 'Just Me'
- Choose installation location (default or custom)
- Enable PATH and register as default Python interpreter
- Click Install
4. Once installed, open Anaconda Navigator (from the Start menu on Windows, or by running anaconda-navigator in a terminal on macOS/Linux).
5. Launch Jupyter Notebook.
6. Inside Jupyter, click on New > Python 3 to start a new notebook.
7. Install the required libraries (if not preinstalled) using Anaconda Prompt:
conda install numpy pandas matplotlib scikit-learn
OR using pip:
pip install numpy pandas matplotlib scikit-learn
8. In the notebook, type and run the following code to verify installation:
Code:
import numpy
import pandas
import matplotlib
import sklearn
print("All packages are installed and imported successfully.")
Expected Output:
All packages are installed and imported successfully.
Output: When you run the code, it should display the above message without any errors.
Result: Successfully installed Python and verified the working of essential ML libraries: NumPy,
Pandas, Matplotlib, and Scikit-learn.
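Note: A successful import shows that a package is present, but not which release was installed. The following is a minimal sketch for listing the installed versions, assuming Python 3.8 or later (for importlib.metadata). The helper name installed_versions is hypothetical and only used for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


# Distribution names as used by pip/conda (note: scikit-learn, not sklearn).
report = installed_versions(["numpy", "pandas", "matplotlib", "scikit-learn"])
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the conda or pip command from step 7.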
Experiment 2: Mathematical Operations on Vectors and Matrices
Aim: To perform mathematical operations such as addition, subtraction, multiplication, and
transpose on vectors and matrices using NumPy.
Theory: Matrices and vectors form the backbone of machine learning models. NumPy is a powerful
library that simplifies numerical computations and allows efficient matrix manipulations.
Software Requirements:
Python with NumPy installed
Jupyter Notebook or Python IDE
Procedure:
1. Import the NumPy library.
2. Create arrays representing vectors and matrices.
3. Perform operations such as:
- Matrix addition
- Matrix multiplication
- Transpose
- Scalar multiplication
Code:
import numpy as np
# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Operations
print("Matrix A:\n", A)
print("Matrix B:\n", B)
print("Addition:\n", A + B)
print("Dot Product:\n", np.dot(A, B))
print("Transpose of A:\n", A.T)
Expected Output:
Matrix A:
 [[1 2]
 [3 4]]
Matrix B:
 [[5 6]
 [7 8]]
Addition:
 [[ 6  8]
 [10 12]]
Dot Product:
 [[19 22]
 [43 50]]
Transpose of A:
 [[1 3]
 [2 4]]
Output: Displays the result of matrix operations performed using NumPy.
Result: Successfully executed vector and matrix operations using NumPy.
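Note: The aim and procedure also mention subtraction, scalar multiplication, and vectors, which the listing above does not exercise. A minimal sketch of those operations follows, reusing the same matrices A and B. The vectors u and v and the scalar 3 are illustrative values, not part of the original experiment. The sketch also contrasts element-wise multiplication (A * B) with matrix multiplication (A @ B), since the two are easy to confuse.

```python
import numpy as np

# Same matrices as in the experiment
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

difference = A - B        # element-wise subtraction
scaled = 3 * A            # scalar multiplication: every entry times 3
elementwise = A * B       # element-wise product (NOT matrix multiplication)
matrix_product = A @ B    # matrix multiplication, same result as np.dot(A, B)

# Illustrative vectors (1-D arrays)
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
vector_sum = u + v        # element-wise vector addition
inner = np.dot(u, v)      # inner (dot) product: 1*4 + 2*5 + 3*6

print("Subtraction:\n", difference)
print("Scalar multiplication:\n", scaled)
print("Element-wise product:\n", elementwise)
print("Matrix product:\n", matrix_product)
print("Vector sum:", vector_sum)
print("Inner product:", inner)
```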
Experiment 3: Creating, Loading, and Saving CSV Files
Aim: To create, load, and save datasets using CSV (Comma-Separated Values) files with
Python Pandas.
Theory: CSV is a simple file format used to store tabular data. Pandas provides powerful
tools for reading from and writing to CSV files, which are commonly used in machine
learning workflows for storing datasets.
Software Requirements:
Python
Pandas
Procedure:
1. Import the Pandas library.
2. Create a DataFrame from scratch.
3. Save the DataFrame to a CSV file.
4. Load the CSV file back into a DataFrame.
5. Display and manipulate the data.
Code:
import pandas as pd
# Step 1: Create data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Department': ['IT', 'HR', 'Finance']
}
# Step 2: Create DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Step 3: Save to CSV
df.to_csv('employees.csv', index=False)
print("\nData saved to employees.csv")
# Step 4: Load from CSV
loaded_df = pd.read_csv('employees.csv')
print("\nLoaded DataFrame:")
print(loaded_df)
Expected Output:
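The original DataFrame (three rows with the columns Name, Age, and Department), the message "Data saved to employees.csv", and the loaded DataFrame, which contains the same rows as the original.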
Output: Console output showing the original data and data reloaded from the CSV file.
Result: Successfully created, saved, and loaded a CSV file using Pandas.
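Note: Step 5 of the procedure (display and manipulate the data) is not covered by the listing above. The following is a minimal sketch of one way to do it, assuming the same employee data. The filter threshold of 28 and the column name Age_in_5_years are hypothetical choices for illustration, and a temporary directory is used here only so the sketch does not overwrite an existing employees.csv.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Department': ['IT', 'HR', 'Finance'],
})

# Round-trip through a CSV file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'employees.csv')
    df.to_csv(csv_path, index=False)   # index=False omits the row index column
    loaded_df = pd.read_csv(csv_path)

# Filter rows: employees older than 28 (illustrative threshold).
over_28 = loaded_df[loaded_df['Age'] > 28]

# Add a derived column (hypothetical name).
loaded_df['Age_in_5_years'] = loaded_df['Age'] + 5

print(loaded_df.head())   # first rows of the DataFrame
print(over_28)
```

Without index=False, to_csv also writes the row index, which then reappears as an extra unnamed column when the file is read back.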
Experiment 4: Calculations of Mean, Median, Variance, Standard Deviation, Quartiles, and IQR
Aim: To perform basic statistical calculations such as mean, median, variance, standard deviation,
quartiles, and interquartile range using Python.
Theory: These statistical measures help in understanding the distribution and spread of data.
Mean: Average value
Median: Middle value in sorted data
Variance: Measure of spread around the mean
Standard Deviation: Square root of variance
Quartiles: Divide data into four parts
Interquartile Range (IQR): Difference between Q3 and Q1, shows spread of the middle 50%
Software Requirements:
Python
Pandas / NumPy
Procedure:
1. Import necessary libraries.
2. Create a sample dataset.
3. Calculate mean, median, variance, and standard deviation.
4. Calculate quartiles and IQR.
Code:
import pandas as pd
import numpy as np
# Sample data
data = [12, 15, 20, 21, 22, 25, 27, 30, 33, 35]
# Convert to Series
df = pd.Series(data)
# Calculations
print("Mean:", df.mean())
print("Median:", df.median())
print("Variance:", df.var())
print("Standard Deviation:", df.std())
print("Q1 (25th percentile):", df.quantile(0.25))
print("Q2 (50th percentile):", df.quantile(0.50))
print("Q3 (75th percentile):", df.quantile(0.75))
print("Interquartile Range (IQR):", df.quantile(0.75) - df.quantile(0.25))
Expected Output:
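Mean 24.0, median 23.5, variance about 55.78, standard deviation about 7.47, Q1 20.25, Q2 23.5, Q3 29.25, and IQR 9.0. Pandas computes the sample variance and standard deviation, dividing by n - 1 rather than n.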
Output: Displays the calculated values for mean, median, variance, standard deviation, quartiles, and
IQR.
Result: Successfully calculated statistical measures using Python.
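Note: Pandas var() and std() divide by n - 1 by default (sample statistics, ddof=1), whereas the formulas in the appendix divide by n (population statistics), and NumPy's np.var() also divides by n by default. The two conventions give different numbers for the same data. A minimal sketch of the difference, using the same dataset as the experiment:

```python
import numpy as np
import pandas as pd

data = [12, 15, 20, 21, 22, 25, 27, 30, 33, 35]
s = pd.Series(data)

sample_var = s.var()              # divides by n - 1 (pandas default, ddof=1)
population_var = s.var(ddof=0)    # divides by n, matching the appendix formula
sample_std = s.std()              # square root of the sample variance
population_std = s.std(ddof=0)    # square root of the population variance
numpy_var = np.var(data)          # NumPy default is ddof=0 (divides by n)

print("Sample variance (n - 1):", sample_var)
print("Population variance (n):", population_var)
print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
print("np.var default:", numpy_var)
```

For this dataset the sum of squared deviations from the mean is 502, so the sample variance is 502/9 and the population variance is 502/10 = 50.2.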
Experiment 5: Basic Plots using Matplotlib for an Example Dataset
Aim: To visualize data using basic plots like line plot, bar chart, histogram, and scatter plot using
Matplotlib.
Theory: Matplotlib is a widely-used Python library for 2D plotting. Visualizing data helps in
understanding patterns, distributions, and relationships. Key types of plots include:
Line Plot: Shows trends over time.
Bar Chart: Compares categorical data.
Histogram: Displays frequency distribution.
Scatter Plot: Shows relationship between two variables.
Software Requirements:
Python
Matplotlib
Pandas (optional for dataset handling)
Procedure:
1. Import required libraries.
2. Prepare or load example dataset.
3. Create basic visualizations using matplotlib.
Code:
import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 25, 30]
# Line plot
plt.figure(figsize=(6,4))
plt.plot(x, y, marker='o')
plt.title("Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
# Bar chart
plt.bar(x, y)
plt.title("Bar Chart")
plt.xlabel("X")
plt.ylabel("Values")
plt.show()
# Histogram
data = [12, 15, 20, 20, 21, 22, 25, 27, 30, 33, 35, 35, 35]
plt.hist(data, bins=5, color='skyblue')
plt.title("Histogram")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.show()
# Scatter plot
x2 = np.random.rand(50)
y2 = np.random.rand(50)
plt.scatter(x2, y2, color='green')
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
Expected Output:
Multiple plots: line, bar, histogram, and scatter plot
Output: Visualizations appear in separate windows when run as a script, or inline below the cell when run in Jupyter.
Result: Successfully created various plots to visualize data using Matplotlib.
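Note: The four plots can also be placed in a single figure and saved to an image file, which is convenient for lab reports. The following is a minimal sketch under a few assumptions: the file name basic_plots.png is hypothetical, the Agg backend is selected only so the script runs without a display (omit that line in Jupyter), and the random generator is seeded with 0 so the scatter plot is the same on every run.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for running without a display
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 25, 30]
data = [12, 15, 20, 20, 21, 22, 25, 27, 30, 33, 35, 35, 35]

rng = np.random.default_rng(0)   # seeded generator for reproducible points
x2 = rng.random(50)
y2 = rng.random(50)

# One figure with a 2 x 2 grid of axes
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(x, y, marker='o')
axes[0, 0].set_title("Line Plot")
axes[0, 0].grid(True)

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar Chart")

axes[1, 0].hist(data, bins=5, color='skyblue')
axes[1, 0].set_title("Histogram")

axes[1, 1].scatter(x2, y2, color='green')
axes[1, 1].set_title("Scatter Plot")

fig.tight_layout()               # avoid overlapping titles and tick labels
out_path = Path("basic_plots.png")   # hypothetical output file name
fig.savefig(out_path)
print("Saved figure to", out_path)
```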
Appendix: Basic Statistical Formulas (For Diploma Students)
1. Mean (µ):
µ = (x₁ + x₂ + ... + xₙ) / n
2. Median:
If n is odd → Middle value of sorted data
If n is even → Median = (middle1 + middle2) / 2
3. Mode:
The value that appears most frequently in the dataset.
4. Variance (σ²):
σ² = (1/n) * Σ(xᵢ - µ)²
5. Standard Deviation (σ):
σ = √((1/n) * Σ(xᵢ - µ)²)
These formulas help you understand the spread and average behavior of data.
They are useful in preprocessing and model evaluation in Machine Learning.
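The formulas above can be written directly in plain Python, without any library, to see how each one works. This is a minimal sketch: the function names and the sample list are illustrative choices, and the variance divides by n exactly as in formula 4.

```python
import math
from collections import Counter


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; if several values tie, the one seen first is returned."""
    return Counter(values).most_common(1)[0][0]


def variance(values):
    """Population variance: average squared deviation from the mean (divides by n)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def std_dev(values):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data
print("Mean:", mean(sample))
print("Median:", median(sample))
print("Mode:", mode(sample))
print("Variance:", variance(sample))
print("Standard Deviation:", std_dev(sample))
```

For this sample the mean is 5, the median is 4.5, the mode is 4, the variance is 32/8 = 4, and the standard deviation is 2.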