FDSA Lab Manual 1

The document outlines the practical laboratory work for the Data Science and Analytics course at Christian College of Engineering and Technology. It includes various experiments such as working with Pandas data frames, creating plots using Matplotlib, and performing statistical tests like Z-test and T-test. Each experiment is accompanied by aims, algorithms, programs, outputs, and results, demonstrating the application of Python in data science.

CHRISTIAN COLLEGE OF ENGINEERING AND TECHNOLOGY

ODDANCHATRAM – 624 619

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

NAME :

REGISTER NO. :

YEAR/ SEM :

SUBJECT :

DURATION :
CHRISTIAN COLLEGE OF ENGINEERING AND TECHNOLOGY

ODDANCHATRAM – 624 619

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

REGISTER NO:

This is to certify that this is the bonafide record of work done by _________________________ in the AD3411 DATA SCIENCE AND ANALYTICS Laboratory of this institution, as prescribed by Anna University, Chennai, for the II-year 4th semester B.Tech Practical Examination during _______________________.

STAFF INCHARGE HEAD OF THE DEPARTMENT

Submitted for the practical examination held on

INTERNAL EXAMINER EXTERNAL EXAMINER


TABLE OF CONTENTS

S.NO   DATE   NAME OF THE EXPERIMENT                     PAGE NO.   MARKS   SIGNATURE

1.            Working with Pandas data frames
2.            Basic plots using Matplotlib
3(a)          Frequency distributions
3(b)          Averages
3(c)          Variability
4(a)          Normal curves
4(b)          Correlation and scatter plots
4(c)          Correlation coefficient
5             Regression
6             Z-test
7             T-test
8             ANOVA
9             Building and validating linear models
10            Building and validating logistic models
11            Time series analysis


EXP NO: 1 Working with Pandas data frames
Date:

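AIM: To work with Pandas data frames: create, filter, extend and group a DataFrame.

ALGORITHM:

Step 1: Start

Step 2: Import the pandas module

Step 3: Create a DataFrame from a dictionary

Step 4: Filter rows, add a derived column and compute group averages

Step 5: Print the output

Step 6: Stop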

PROGRAM:

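import pandas as pd

# Build a DataFrame from a dictionary of equal-length lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df.head())  # first five rows (here, all three)

# Boolean filtering: keep only rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

# Add a derived Boolean column
df['Senior'] = df['Age'] > 30
print(df)

# Mean age for each city
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
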
OUTPUT:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
      Name  Age     City
2  Charlie   35  Chicago
      Name  Age         City  Senior
0    Alice   25     New York   False
1      Bob   30  Los Angeles   False
2  Charlie   35      Chicago    True
City
Chicago        35.0
Los Angeles    30.0
New York       25.0
Name: Age, dtype: float64

RESULT:

Thus, the program for working with Pandas data frames was executed successfully.
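The filter and group steps in this experiment can also be written with `DataFrame.query` and `groupby(...).agg`, which computes several statistics per group in one call. The sketch below reuses the same three-row data; the choice of `count`, `mean` and `max` as statistics and the extra sort are illustrative only.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

# Same row filter as df[df['Age'] > 30], expressed as a query string
older = df.query('Age > 30')

# Several summary statistics of Age for each City in a single call
summary = df.groupby('City')['Age'].agg(['count', 'mean', 'max'])

# Sort rows by Age, oldest first
by_age = df.sort_values('Age', ascending=False)

print(older)
print(summary)
print(by_age)
```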
EXP NO: 2 Basic plots using Matplotlib
Date:

AIM:

To draw basic plots in a Python program using Matplotlib.

ALGORITHM:

Step 1: Start

Step 2: Import the matplotlib.pyplot module

Step 3: Create a customized line plot and label its axes and title

Step 4: Display the plot

Step 5: Stop

PROGRAM:

import matplotlib.pyplot as plt

# Sample data: x values and the first five prime numbers
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Dashed green line with large circular markers
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.show()

OUTPUT:

RESULT:

Thus, basic plots were drawn successfully in a Python program using Matplotlib.
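Matplotlib draws other basic chart types with the same pattern. The sketch below, which reuses the x and y lists from the program, places a bar chart, a scatter plot and a histogram side by side with `plt.subplots` and saves the figure to a file. The file name `basic_plots.png` is an arbitrary choice, and the `Agg` backend line is an assumption added only so the script runs without a display; in Jupyter, drop it and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, not a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# One row of three panels in a single figure
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].bar(x, y, color='orange')
axes[0].set_title('Bar Chart')
axes[0].set_xlabel('X-axis')
axes[0].set_ylabel('Y-axis')

axes[1].scatter(x, y, color='purple')
axes[1].set_title('Scatter Plot')
axes[1].set_xlabel('X-axis')
axes[1].set_ylabel('Y-axis')

# Histogram of the y values grouped into 5 bins
axes[2].hist(y, bins=5, color='teal', edgecolor='black')
axes[2].set_title('Histogram')
axes[2].set_xlabel('Value')
axes[2].set_ylabel('Frequency')

fig.tight_layout()
fig.savefig('basic_plots.png')
```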
EXP NO: 3A Frequency distributions
Date:

AIM:

To write a Python program to find the frequency distribution of a data set in Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the required Python library modules

Step 3: Count the frequency of each value in the data

Step 4: Print the result

Step 5: Stop the program

PROGRAM:

import pandas as pd

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)

# value_counts() counts each distinct value, ordered from most to least frequent
frequency = series.value_counts()
print(frequency)

OUTPUT:

7    4
4    3
2    2
1    1
3    1
5    1
6    1
dtype: int64

RESULT:

Thus, the Python program to find the frequency distribution was written and executed successfully in Jupyter Notebook.
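For larger or continuous data, a frequency distribution is usually tabulated over class intervals rather than individual values. The sketch below groups the same data into the intervals (0, 3], (3, 5] and (5, 7] with `pd.cut` and adds relative and cumulative frequencies; the interval edges are an illustrative assumption, not part of the original exercise.

```python
import pandas as pd

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 7]
series = pd.Series(data)

# Ungrouped frequencies in ascending order of value (instead of by frequency)
ungrouped = series.value_counts().sort_index()

# Grouped frequencies over right-closed class intervals (0, 3], (3, 5], (5, 7]
classes = pd.cut(series, bins=[0, 3, 5, 7])
counts = classes.value_counts(sort=False)  # keep the intervals in their natural order

freq_table = pd.DataFrame({
    'Frequency': counts,
    'Relative': counts / len(series),
    'Cumulative': counts.cumsum(),
})
print(ungrouped)
print(freq_table)
```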
EXP NO: 3B Averages
Date:

AIM:

To write a Python program to find the averages (mean, median and mode) of a data set in Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the required Python library modules

Step 3: Compute the mean, median and mode of the data

Step 4: Print the result

Step 5: Stop the program

PROGRAM:

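import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Arithmetic mean, computed with pandas and with NumPy
mean_pandas = pd.Series(data).mean()
print(f"Mean (Pandas): {mean_pandas}")
mean_numpy = np.mean(data)
print(f"Mean (NumPy): {mean_numpy}")

# Median: middle of the sorted data (average of the two middle values for even n)
median = pd.Series(data).median()
print(f"Median: {median}")

# Mode: most frequent value(s); every value occurs once here, so all ten are returned
mode = pd.Series(data).mode()
print(f"Mode: {mode}")
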
OUTPUT:

Mean (Pandas): 5.5
Mean (NumPy): 5.5
Median: 5.5
Mode: 0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

RESULT:

Thus, the Python program to find the averages was written and executed successfully in Jupyter Notebook.
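Because every value in the program's data occurs exactly once, `mode()` returns all ten values. The sketch below uses a small hypothetical data set with a repeated value so the mode is meaningful, and cross-checks the pandas results against Python's built-in `statistics` module.

```python
import pandas as pd
from statistics import mean, median, multimode

# Hypothetical data set in which the value 4 occurs most often
data = [1, 2, 2, 3, 4, 4, 4, 5]
s = pd.Series(data)

pandas_results = (s.mean(), s.median(), s.mode().tolist())
stdlib_results = (mean(data), median(data), multimode(data))

print(f"pandas:     mean={pandas_results[0]}, median={pandas_results[1]}, mode={pandas_results[2]}")
print(f"statistics: mean={stdlib_results[0]}, median={stdlib_results[1]}, mode={stdlib_results[2]}")
```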
EXP NO: 3C Variability
Date:

AIM:

To write a Python program to measure the variability of a data set (range, variance and standard deviation) in Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the required Python library modules (numpy and pandas)

Step 3: Compute the range, variance and standard deviation of the data

Step 4: Print the result

Step 5: Stop the program

PROGRAM:

import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Range: difference between the largest and smallest values
range_value = max(data) - min(data)
print(f"Range: {range_value}")

# pandas uses the sample formula by default (divides by n - 1, ddof=1)
variance_pandas = pd.Series(data).var()
print(f"Variance (Pandas): {variance_pandas}")
std_dev_pandas = pd.Series(data).std()
print(f"Standard Deviation (Pandas): {std_dev_pandas}")

# NumPy uses the population formula by default (divides by n, ddof=0)
variance_numpy = np.var(data)
print(f"Variance (NumPy): {variance_numpy}")
std_dev_numpy = np.std(data)
print(f"Standard Deviation (NumPy): {std_dev_numpy}")


OUTPUT:

Range: 9
Variance (Pandas): 9.166666666666666
Standard Deviation (Pandas): 3.0276503540974917
Variance (NumPy): 8.25
Standard Deviation (NumPy): 2.8722813232690143

RESULT:

Thus, the measures of variability (range, variance and standard deviation) were computed successfully.
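The two libraries print different variances because pandas divides the sum of squared deviations by n - 1 (sample variance, `ddof=1`) while NumPy divides by n (population variance, `ddof=0`). The sketch below computes both by hand for the same data, shows that passing `ddof` explicitly makes the libraries agree, and adds the interquartile range as another common measure of spread.

```python
import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

population_var = ss / n      # matches np.var(data), which defaults to ddof=0
sample_var = ss / (n - 1)    # matches pd.Series(data).var(), which defaults to ddof=1

# Setting ddof explicitly makes the two libraries agree
numpy_sample_var = np.var(data, ddof=1)
pandas_population_var = pd.Series(data).var(ddof=0)

# Interquartile range: spread of the middle 50% of the data
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"Sum of squares: {ss}")
print(f"Population variance: {population_var}, sample variance: {sample_var}")
print(f"NumPy (ddof=1): {numpy_sample_var}, pandas (ddof=0): {pandas_population_var}")
print(f"Q1: {q1}, Q3: {q3}, IQR: {iqr}")
```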


EXP NO: 4A Normal curves
Date:

AIM:

To plot a normal curve using a Python program.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and matplotlib packages

Step 3: Generate random samples from a normal distribution

Step 4: Visualize the samples as a histogram with the normal probability density curve overlaid

Step 5: Stop the program

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

# Parameters of the standard normal distribution
mu = 0
sigma = 1

# Draw 1000 random samples and plot them as a density-normalized histogram
data = np.random.normal(mu, sigma, 1000)
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Overlay the theoretical probability density function across the current x-range
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma)**2)
plt.plot(x, p, 'k', linewidth=2)

plt.xlabel('Data values')
plt.ylabel('Probability density')
plt.title('Normal Distribution Curve')
plt.show()

OUTPUT:

RESULT:

Thus, the normal curve was plotted successfully using a Python program.
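The density formula in the program is the normal probability density function. As a sanity check, the sketch below compares it with `scipy.stats.norm.pdf` and verifies the empirical (68-95-99.7) rule on simulated samples; the `default_rng` generator and its seed are assumptions added so the check is reproducible.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0, 1
rng = np.random.default_rng(42)
data = rng.normal(mu, sigma, 1000)

# Hand-written density from the program versus SciPy's implementation
x = np.linspace(-4, 4, 100)
manual_pdf = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
scipy_pdf = norm.pdf(x, loc=mu, scale=sigma)

# Fraction of samples within 1, 2 and 3 standard deviations of the mean
within = {k: np.mean(np.abs(data - mu) < k * sigma) for k in (1, 2, 3)}

print("Formulas agree:", np.allclose(manual_pdf, scipy_pdf))
for k, frac in within.items():
    print(f"Within {k} sd: {frac:.3f} (theory: {norm.cdf(k) - norm.cdf(-k):.3f})")
```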
EXP NO: 4B Correlation and scatter plots
Date:

AIM:

To write a Python program to visualize the correlation between two variables using a scatter plot.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and matplotlib packages

Step 3: Create a random variable x and a variable y that depends linearly on x plus random noise

Step 4: Plot the scatter plot of x against y

Step 5: Display the result

Step 6: Stop the process

PROGRAM:

import matplotlib.pyplot as plt
import numpy as np

# x is uniform on [0, 1); y depends linearly on x plus small Gaussian noise
x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)

plt.scatter(x, y, alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()

OUTPUT:

RESULT:

Thus, the correlation between two variables was visualized successfully using a scatter plot.
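A scatter plot is easiest to interpret when compared across different kinds of relationship. The sketch below extends the program to three panels showing a positive, a negative and no linear relationship, with Pearson's r from `np.corrcoef` in each title. The negative and independent cases, the seed, the output file name and the `Agg` backend line are illustrative assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; in Jupyter drop this and use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
cases = {
    'Positive': 2 * x + rng.normal(0, 0.1, 100),
    'Negative': -2 * x + rng.normal(0, 0.1, 100),
    'None': rng.random(100),  # generated independently of x
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
r_values = {}
for ax, (name, y) in zip(axes, cases.items()):
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    r_values[name] = r
    ax.scatter(x, y, alpha=0.7)
    ax.set_title(f'{name} correlation (r = {r:.2f})')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')

fig.tight_layout()
fig.savefig('correlation_scatter.png')
```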
EXP NO: 4C Correlation coefficient
Date:

AIM:

To write a Python program to compute the correlation coefficient.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and pandas packages

Step 3: Create two linearly related variables x and y

Step 4: Compute Pearson's correlation coefficient with NumPy and with pandas

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

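import numpy as np
import pandas as pd

# Random data without a fixed seed, so the printed coefficient varies slightly between runs
x = np.random.rand(100)
y = 2 * x + np.random.normal(0, 0.1, 100)

# np.corrcoef returns the 2x2 correlation matrix; element [0, 1] is r(x, y)
correlation_numpy = np.corrcoef(x, y)[0, 1]
print(f"Correlation Coefficient (NumPy): {correlation_numpy}")

# DataFrame.corr() computes Pearson's r between every pair of columns
df = pd.DataFrame({'x': x, 'y': y})
correlation_pandas = df.corr().loc['x', 'y']
print(f"Correlation Coefficient (Pandas): {correlation_pandas}")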


OUTPUT:

Correlation Coefficient (NumPy): 0.979876345

Correlation Coefficient (Pandas): 0.979876345

RESULT:

Thus, the correlation coefficient was computed successfully.
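NumPy and pandas both compute Pearson's correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below implements that formula directly with the `math` module, which makes each step of the calculation explicit, and checks it against `np.corrcoef` on a small hypothetical data set. The function name `pearson_r` is illustrative.

```python
import math
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return sxy / (sx * sy)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r (formula): {r:.4f}")
print(f"r (NumPy):   {np.corrcoef(x, y)[0, 1]:.4f}")
```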


EXP NO: 5 Simple Linear Regression
Date:

AIM:

To write a Python program for simple linear regression.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy, matplotlib and scikit-learn packages

Step 3: Generate synthetic data and plot it as a scatter plot

Step 4: Fit a LinearRegression model to the data

Step 5: Print the regression coefficients (slope and intercept)

Step 6: Plot the fitted regression line over the data

Step 7: Print the R-squared value and a prediction for a new input

Step 8: Stop the process

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data: Y = 2.5 X + Gaussian noise (sd 2); the seed makes the run reproducible
np.random.seed(0)
X = np.random.rand(100) * 10
Y = 2.5 * X + np.random.normal(0, 2, 100)

# Scatter plot of the raw data
plt.scatter(X, Y, color='blue', alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.reshape(-1, 1)
model = LinearRegression()
model.fit(X, Y)

slope = model.coef_[0]
intercept = model.intercept_
print(f"Slope (beta_1): {slope}")
print(f"Intercept (beta_0): {intercept}")

# Plot the fitted line over the data
Y_pred = model.predict(X)
plt.scatter(X, Y, color='blue', alpha=0.7, label='Data')
plt.plot(X, Y_pred, color='red', label='Fitted Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Simple Linear Regression: Fitted Line')
plt.legend()
plt.show()

# Coefficient of determination on the training data
r_squared = model.score(X, Y)
print(f"R-squared: {r_squared}")

# Predict Y for a new X value (15 lies outside the training range 0-10, so this extrapolates)
X_new = np.array([[15]])
Y_new = model.predict(X_new)
print(f"Predicted Y for X = 15: {Y_new[0]}")


OUTPUT:

Slope (beta_1): 2.487387004280408
Intercept (beta_0): 0.4443021548944568
R-squared: 0.928337996765404
Predicted Y for X = 15: 37.75510721910058

RESULT:

Thus the computation for Simple Linear Regression was successfully completed.
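`LinearRegression` estimates the slope and intercept by ordinary least squares. The same coefficients follow from the cross-deviation SS_xy = Σxy - n·x̄·ȳ and the deviation about x, SS_xx = Σx² - n·x̄², as b_1 = SS_xy / SS_xx and b_0 = ȳ - b_1·x̄. The sketch below implements those formulas in NumPy and checks them against `np.polyfit`; the function name `estimate_coef` and the ten-point data set are illustrative.

```python
import numpy as np

def estimate_coef(x, y):
    """Least-squares intercept b_0 and slope b_1 for y = b_0 + b_1 * x."""
    n = np.size(x)
    mean_x, mean_y = np.mean(x), np.mean(y)
    ss_xy = np.sum(x * y) - n * mean_x * mean_y  # cross-deviation
    ss_xx = np.sum(x * x) - n * mean_x * mean_x  # deviation about x
    b_1 = ss_xy / ss_xx
    b_0 = mean_y - b_1 * mean_x
    return b_0, b_1

# Illustrative data set
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b_0, b_1 = estimate_coef(x, y)
print(f"Intercept b_0: {b_0:.4f}, slope b_1: {b_1:.4f}")

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"np.polyfit:    intercept {intercept:.4f}, slope {slope:.4f}")
```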
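EXP NO: 6 Z-test
Date:

AIM:

To write a Python program for the Z-test.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and scipy.stats packages

Step 3: Define the sample means, standard deviations and sizes of the two samples

Step 4: Calculate the Z-score and the two-tailed p-value using the two-sample Z-test formula

Step 5: Print the result

Step 6: Stop the process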

PROGRAM:

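import numpy as np
import scipy.stats as stats

# Summary statistics of the two independent samples
mean_1 = 50
mean_2 = 45
std_1 = 10
std_2 = 12
size_1 = 40
size_2 = 35

# Two-sample Z statistic: difference of means divided by its standard error
z_score_two_sample = (mean_1 - mean_2) / np.sqrt((std_1**2 / size_1) + (std_2**2 / size_2))

# Two-tailed p-value from the standard normal distribution
p_value_two_sample = 2 * (1 - stats.norm.cdf(abs(z_score_two_sample)))

print(f"Z-Score: {z_score_two_sample}")
print(f"P-value: {p_value_two_sample}")
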
OUTPUT:

Z-Score: 1.9441444452997994
P-value: 0.051878034893831915

RESULT:

Thus the computation for Z-test was successfully completed.
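The calculation can be wrapped in a reusable function that also reports the decision. The sketch below (the function name `two_sample_z_test` and the `alpha` default of 0.05 are illustrative) uses `stats.norm.sf`, the survival function 1 - CDF, which stays accurate for large |z|. With the program's data it reproduces z ≈ 1.944 and p ≈ 0.0519; since p > 0.05, the difference between the two means is not significant at the 5% level.

```python
import math
import scipy.stats as stats

def two_sample_z_test(mean_1, mean_2, std_1, std_2, size_1, size_2, alpha=0.05):
    """Two-tailed Z-test for the difference of two means with known standard deviations."""
    standard_error = math.sqrt(std_1 ** 2 / size_1 + std_2 ** 2 / size_2)
    z = (mean_1 - mean_2) / standard_error
    p = 2 * stats.norm.sf(abs(z))  # sf(z) = 1 - cdf(z)
    return z, p, p < alpha

z, p, reject = two_sample_z_test(50, 45, 10, 12, 40, 35)
print(f"Z-Score: {z:.4f}, P-value: {p:.4f}")
print("Reject H0 at the 5% level" if reject else "Fail to reject H0 at the 5% level")
```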


EXP NO: 7 T-test
Date:

AIM:

To write a Python program for the T-test.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and scipy.stats packages

Step 3: Define the sample data and the hypothesized population mean

Step 4: Calculate the T-statistic and p-value using a one-sample T-test

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

import scipy.stats as stats
import numpy as np

# Sample of 30 observations
sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51, 54,
                        53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])

# Hypothesized population mean (H0: mu = 50)
population_mean = 50

# One-sample, two-sided t-test
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

OUTPUT:

T-statistic: 4.571679054413011
P-value: 8.327654458471987e-05

RESULT:

Thus the computation for T-test was successfully completed.
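`ttest_1samp` computes t = (x̄ - μ₀) / (s / √n), where s is the sample standard deviation, and takes the two-tailed p-value from a t distribution with n - 1 degrees of freedom. The sketch below reproduces that calculation step by step for the same sample and checks it against SciPy.

```python
import numpy as np
import scipy.stats as stats

sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51, 54,
                        53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])
population_mean = 50

n = sample_data.size
sample_mean = sample_data.mean()
sample_std = sample_data.std(ddof=1)  # sample standard deviation (divides by n - 1)
standard_error = sample_std / np.sqrt(n)

t_manual = (sample_mean - population_mean) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-tailed p-value

t_scipy, p_scipy = stats.ttest_1samp(sample_data, population_mean)
print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.3e}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.3e}")
```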


EXP NO: 8 ANOVA
Date:

AIM:

To write a Python program for one-way ANOVA.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy and scipy.stats packages

Step 3: Prepare the data for each group

Step 4: Perform one-way ANOVA

Step 5: Obtain the F-statistic

Step 6: Obtain the p-value

Step 7: Print the result and the decision at the 5% significance level

Step 8: Stop the process

PROGRAM:

import numpy as np
import scipy.stats as stats

# Three independent groups of 10 observations each
group_1 = np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42])
group_2 = np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43])
group_3 = np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74])

# One-way ANOVA: H0 is that all group means are equal
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision at the 5% significance level
if p_value < 0.05:
    print("There is a significant difference between the group means.")
else:
    print("There is no significant difference between the group means.")

OUTPUT:

F-statistic: 32.6259618124822
P-value: 6.255218731829188e-08
There is a significant difference between the group means.

RESULT:

Thus the computation for ANOVA was successfully completed.
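`f_oneway` compares the variability between group means with the variability within groups: F = MS_between / MS_within, where MS_between = SS_between / (k - 1) and MS_within = SS_within / (N - k) for k groups and N observations in total. The sketch below computes these sums of squares explicitly for the same three groups and checks the result against SciPy.

```python
import numpy as np
import scipy.stats as stats

groups = [
    np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42]),
    np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43]),
    np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74]),
]

k = len(groups)                  # number of groups
N = sum(g.size for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, N - k)  # upper-tail probability of the F distribution

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.4f}, p = {p_manual:.3e}")
print(f"SciPy:  F = {f_scipy:.4f}, p = {p_scipy:.3e}")
```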


EXP NO: 9 Building and validating linear models
Date:

AIM:

To write a Python program to build and validate linear models using Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the required packages

Step 3: Prepare the data and split it into training and test sets

Step 4: Build the model

Step 5: Evaluate the model on the test set

Step 6: Review the model diagnostics

Step 7: Print the result

Step 8: Stop the process

PROGRAM:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: y = 2.5 x + Gaussian noise (sd 2)
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2

# Hold out 20% of the data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# statsmodels OLS does not add an intercept automatically, so add a constant column
X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)

model = sm.OLS(y_train, X_train_sm).fit()
y_pred = model.predict(X_test_sm)

# Coefficients, fit statistics and residual diagnostics on the training data
print(model.summary())

# Validation metrics on the held-out test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

OUTPUT:

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.932
Model:                            OLS   Adj. R-squared:                  0.931
Method:                 Least Squares   F-statistic:                     1074.
Date:                Thu, 19 Dec 2024   Prob (F-statistic):           2.29e-47
Time:                        14:52:46   Log-Likelihood:                -169.42
No. Observations:                  80   AIC:                             342.8
Df Residuals:                      78   BIC:                             347.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
const          0.4127      0.417      0.990      0.325      -0.417       1.242
x1             2.4961      0.076     32.776      0.000       2.344       2.648
==============================================================================
Omnibus:                        8.580   Durbin-Watson:                   2.053
Prob(Omnibus):                  0.014   Jarque-Bera (JB):                3.170
Skew:                           0.107   Prob(JB):                        0.205
Kurtosis:                       2.048   Cond. No.                         10.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 3.6710129878857174
R-squared: 0.896480483165161

RESULT:

Thus the computation for building and validating linear models was successfully completed.
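A single 80/20 split gives one estimate of test performance that depends on which rows happen to land in the test set; k-fold cross-validation averages over several splits. The sketch below regenerates the same synthetic data, runs 5-fold cross-validation with scikit-learn's `LinearRegression` (swapped in for statsmodels OLS, since both fit ordinary least squares), and checks two basic residual properties. The fold count and shuffle seed are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Same synthetic data as the program: y = 2.5 x + Gaussian noise (sd 2)
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2

# 5-fold cross-validation: each fold serves once as the validation set
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring='r2')
print(f"R-squared per fold: {np.round(scores, 3)}")
print(f"Mean R-squared: {scores.mean():.3f} (std {scores.std():.3f})")

# Residual diagnostics on a fit to all the data
model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
# With an intercept, OLS residuals average to zero and are uncorrelated with X
print(f"Mean residual: {residuals.mean():.2e}")
print(f"Correlation of residuals with X: {np.corrcoef(X.squeeze(), residuals)[0, 1]:.2e}")
```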
EXP NO: 10 Building and validating logistic models
Date:

AIM:

To write a Python program to build and validate logistic models using Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Python libraries

Step 3: Generate synthetic data

Step 4: Split the data into training and test sets

Step 5: Build the logistic regression model

Step 6: Make predictions and evaluate the model

Step 7: Print the evaluation metrics and plot the predictions

Step 8: Stop the process

PROGRAM:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Synthetic data: two uniform features; the label is 1 when their sum exceeds 1
np.random.seed(0)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Hold out 20% of the data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics on the test set
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)

# Circles show the true labels; crosses show the predicted labels
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='k', s=100,
            label='True Labels')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, marker='x', cmap='coolwarm', s=100,
            label='Predicted Labels')
plt.title('Logistic Regression Predictions')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

OUTPUT:

Accuracy: 0.9
Confusion Matrix:
[[ 8  2]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.80      0.89        10
           1       0.83      1.00      0.91        10

    accuracy                           0.90        20
   macro avg       0.92      0.90      0.90        20
weighted avg       0.92      0.90      0.90        20

RESULT:

Thus the computation for building and validating logistic models was successfully completed.
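Accuracy uses only hard 0/1 predictions at a 0.5 probability threshold. Logistic regression also outputs class probabilities through `predict_proba`, which allows threshold-free evaluation such as the area under the ROC curve (`roc_auc_score`) and custom decision thresholds. The sketch below regenerates the same data and split; the 0.3 threshold is an arbitrary illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, recall_score
from sklearn.model_selection import train_test_split

np.random.seed(0)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)

# Column 1 of predict_proba holds P(class = 1) for each test point
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)

# A lower threshold labels more points as class 1, trading precision for recall
threshold = 0.3
y_pred_low = (proba >= threshold).astype(int)

print(f"Coefficients: {model.coef_}, intercept: {model.intercept_}")
print(f"ROC AUC: {auc:.3f}")
print(f"Recall for class 1 at threshold {threshold}: {recall_score(y_test, y_pred_low):.2f}")
```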
EXP NO: 11 Time series analysis
Date:

AIM:

To write a Python program to perform time series analysis using Jupyter Notebook.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Python libraries

Step 3: Generate time series data

Step 4: Create a DataFrame indexed by date

Step 5: Plot the time series

Step 6: Stop the process

PROGRAM:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# 100 consecutive daily timestamps starting 1 January 2020
date_range = pd.date_range(start='1/1/2020', periods=100)

# Random walk: cumulative sum of standard normal steps
data = np.random.randn(100).cumsum()
time_series_data = pd.DataFrame(data, index=date_range, columns=['Value'])

plt.figure(figsize=(12, 6))
plt.plot(time_series_data.index, time_series_data['Value'], label='Random Data', color='blue')
plt.title('Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid()
plt.show()

OUTPUT:

RESULT:

Thus the computation for time series analysis was successfully completed.
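Plotting is only the first step of time series analysis. The sketch below applies three common pandas operations to a series built the same way: a 7-day rolling mean to smooth short-term noise, first differencing to remove the random-walk trend, and weekly resampling. The window length, weekly frequency and seed are illustrative choices.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
date_range = pd.date_range(start='2020-01-01', periods=100, freq='D')
ts = pd.DataFrame({'Value': np.random.randn(100).cumsum()}, index=date_range)

# 7-day moving average: NaN for the first 6 days, while the window is not yet full
ts['Rolling_7'] = ts['Value'].rolling(window=7).mean()

# First difference: change from the previous day (recovers the random steps)
ts['Diff_1'] = ts['Value'].diff()

# Weekly averages; the 'W' frequency bins weeks ending on Sunday
weekly = ts['Value'].resample('W').mean()

print(ts.head(10))
print(weekly.head())
```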
