Uploaded by

kashyapkumar2007

DATA ANALYSIS PRACTICAL

Exercise 1: Perform Analysis on Simple Dataset I for Data Science and
Business Intelligence Applications

Aim:

To understand basic data science operations such as data import, summary statistics,
visualization, and data interpretation using a simple dataset relevant to business intelligence.

Program:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("sales_data.csv")
print(df.head())

# Basic stats
print(df.describe())
df.info()

# Check for missing values
print(df.isnull().sum())

# Data aggregation
# Total sales per region
region_sales = df.groupby("Region")["Total_Sales"].sum()
print(region_sales)

# Total quantity sold per category
category_quantity = df.groupby("Category")["Quantity"].sum()
print(category_quantity)

# Visualize the data
# Bar plot: Sales per region
region_sales.plot(kind='bar', title='Sales by Region', ylabel='Total Sales', xlabel='Region')
plt.show()

# Scatter plot: Quantity vs Total Sales
sns.scatterplot(data=df, x="Quantity", y="Total_Sales", hue="Category")
plt.title("Quantity vs Total Sales")
plt.show()

Result:

The program has been executed successfully and business sales metrics were analyzed.
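
Since sales_data.csv is not included with this practical, the aggregation step can be tried on a small in-memory dataset. The sketch below assumes the same column names as the program (Region, Category, Quantity, Total_Sales); the rows themselves are invented purely for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for sales_data.csv (values are made up)
df = pd.DataFrame({
    "Region": ["North", "South", "North", "East", "South", "East"],
    "Category": ["Toys", "Books", "Books", "Toys", "Toys", "Books"],
    "Quantity": [3, 5, 2, 4, 1, 6],
    "Total_Sales": [300.0, 250.0, 100.0, 400.0, 100.0, 300.0],
})

# Same aggregations as in the program: one total per group
region_sales = df.groupby("Region")["Total_Sales"].sum()
category_quantity = df.groupby("Category")["Quantity"].sum()

print(region_sales)
print(category_quantity)
```

With these sample rows, East totals 700.0, North 400.0, and South 350.0, which is easy to confirm by hand before moving on to the real file.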

Exercise 2: Perform Analysis on Simple Dataset II for Data Science and
Business Intelligence Applications

Aim:
To analyze an enhanced dataset with time and profit features.
The focus is on identifying monthly trends, regional profits, and category-wise margins.

Program:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset, parsing the Date column as datetime
df = pd.read_csv("retail_dataset.csv", parse_dates=['Date'])
print(df.head())

# Derived features: calendar month and row-level profit margin
df["Month"] = df["Date"].dt.month
df["Profit_Margin"] = df["Profit"] / df["Total_Sales"]

# Monthly Total Sales
monthly_sales = df.groupby("Month")["Total_Sales"].sum()

# Region-wise Profit
region_profit = df.groupby("Region")["Profit"].sum()

# Category-wise Profit Margin (mean of the row-level margins)
category_margin = df.groupby("Category")["Profit_Margin"].mean()

# Line chart: Monthly Sales
monthly_sales.plot(kind='line', marker='o', title='Monthly Sales Trend')
plt.ylabel('Total Sales')
plt.xlabel('Month')
plt.grid(True)
plt.show()

# Boxplot: Profit Distribution by Category
sns.boxplot(data=df, x="Category", y="Profit")
plt.title("Profit Distribution by Category")
plt.show()

Result:

The program has been executed successfully and monthly trends and profitability were
visualized.
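
The derived columns can be checked on a few hand-made rows. This is only a sketch: the column names follow the program, while the dates and amounts are invented. It also shows, as an aside, that the mean of row-level margins is not generally the same as total profit divided by total sales for a category.

```python
import pandas as pd

# Hypothetical rows standing in for retail_dataset.csv (values are made up)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-02-25"]),
    "Category": ["Toys", "Books", "Toys", "Books"],
    "Total_Sales": [200.0, 100.0, 400.0, 300.0],
    "Profit": [50.0, 10.0, 60.0, 90.0],
})

# Same derived features as in the program
df["Month"] = df["Date"].dt.month
df["Profit_Margin"] = df["Profit"] / df["Total_Sales"]

monthly_sales = df.groupby("Month")["Total_Sales"].sum()

# Mean of row-level margins, as in the program
mean_margin = df.groupby("Category")["Profit_Margin"].mean()

# Alternative view: total profit divided by total sales per category
totals = df.groupby("Category")[["Profit", "Total_Sales"]].sum()
overall_margin = totals["Profit"] / totals["Total_Sales"]

print(monthly_sales)
print(mean_margin)
print(overall_margin)
```

Here both categories have a mean margin of 0.20, yet the overall margin is 0.25 for Books and about 0.18 for Toys, because larger sales weigh more in the second calculation. Which one to report depends on the business question.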

Exercise 3: Collect and Understand a Simple Dataset for a Data Science
Application

Aim:
To collect or create a real-world dataset and understand its structure.
It includes loading, describing, and preparing the data for further analysis.

Program:

# Data Collection

• Use Google Forms, Excel, or create a .csv manually.
• Minimum: 5 columns, 10–20 rows.
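
One way to create the .csv manually is with Python's built-in csv module. The sketch below is only an illustration: apart from Grade, which the program uses, the column names and all ten rows are hypothetical, and the file is written to a temporary folder so nothing real is overwritten.

```python
import csv
import os
import tempfile

# Hypothetical columns and rows; replace with your own collected data
header = ["Name", "Age", "Hours_Studied", "Score", "Grade"]
rows = [
    ["Asha", 17, 5, 82, "A"],
    ["Ravi", 18, 3, 67, "B"],
    ["Meena", 17, 6, 91, "A"],
    ["Kiran", 18, 2, 54, "C"],
    ["Divya", 17, 4, 73, "B"],
    ["Arjun", 18, 1, 45, "D"],
    ["Priya", 17, 5, 88, "A"],
    ["Vikram", 18, 3, 62, "C"],
    ["Sneha", 17, 4, 76, "B"],
    ["Rahul", 18, 2, 58, "C"],
]

# Write to a temporary folder so the sketch does not touch real files
csv_path = os.path.join(tempfile.mkdtemp(), "student_performance.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm it has 10 data rows and 5 columns
with open(csv_path, newline="") as f:
    loaded = list(csv.reader(f))
print(len(loaded) - 1, "rows,", len(loaded[0]), "columns")
```

To use the file with the program, save it in the working directory instead of the temporary folder.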

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load and inspect the dataset
df = pd.read_csv("student_performance.csv")
print(df.head())
df.info()
print(df.describe())
print(df['Grade'].value_counts())

# Check for missing values
print(df.isnull().sum())

# Visualize the data
sns.countplot(x='Grade', data=df)
plt.title("Grade Distribution")
plt.show()

Result:

The program has been executed successfully and the student performance dataset was
summarized.

Exercise 4: Perform Analysis on Simple Data for Mathematical, Numerical, and
Data Engineering Processes

Aim:

To perform mathematical, numerical, and data engineering analysis on a small temperature
dataset using Python.

Introduction:

Data analysis involves several key steps:

• Mathematical analysis calculates statistical properties like mean, median, and
  standard deviation.
• Numerical processing involves computation, such as unit conversion and smoothing.
• Data engineering includes loading, cleaning, and transforming data for further use.

In this program, we analyze a week's worth of temperature data to demonstrate these
processes.

Python Code:

import pandas as pd

# Sample temperature data in Celsius (Friday's reading is missing)
data = {
    'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'Temp_C': [22.1, 23.5, 21.0, 25.3, None, 26.1, 24.8]
}

# Step 1: Load the data (Data Engineering)
df = pd.DataFrame(data)

# Step 2: Clean the data (fill missing values with the mean)
df['Temp_C'] = df['Temp_C'].fillna(df['Temp_C'].mean())

# Step 3: Mathematical Analysis
mean_temp = df['Temp_C'].mean()
median_temp = df['Temp_C'].median()
std_temp = df['Temp_C'].std()  # sample standard deviation (ddof=1)

# Step 4: Numerical Processing
df['Temp_F'] = df['Temp_C'] * 9/5 + 32  # Convert Celsius to Fahrenheit
df['Smoothed_C'] = df['Temp_C'].rolling(window=3, min_periods=1).mean()  # Rolling average

# Step 5: Output results
print("Mathematical Summary:")
print(f"Mean Temperature: {mean_temp:.2f} °C")
print(f"Median Temperature: {median_temp:.2f} °C")
print(f"Standard Deviation: {std_temp:.2f} °C\n")

print("Processed Temperature Data:\n")
print(df.round(2))  # round for a compact two-decimal display

Output:

Mathematical Summary:
Mean Temperature: 23.80 °C
Median Temperature: 23.80 °C
Standard Deviation: 1.80 °C

Processed Temperature Data:

   Day  Temp_C  Temp_F  Smoothed_C
0  Mon    22.1   71.78       22.10
1  Tue    23.5   74.30       22.80
2  Wed    21.0   69.80       22.20
3  Thu    25.3   77.54       23.27
4  Fri    23.8   74.84       23.37
5  Sat    26.1   78.98       25.07
6  Sun    24.8   76.64       24.90

Result:

• The program successfully calculated statistical metrics:
  o Mean = 23.80 °C
  o Median = 23.80 °C
  o Standard Deviation = 1.80 °C
• Missing temperature data was filled using the average.
• Data was numerically transformed by:
  o Converting °C to °F
  o Smoothing with a rolling average
• Final output shows a clean, enriched dataset ready for further analysis or
  visualization.
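
The summary statistics can be cross-checked without pandas. The sketch below uses Python's standard statistics module on the same seven readings, filling Friday's gap with the mean of the six known values as the program does. The variable names are chosen here for illustration.

```python
import statistics

# Temperatures from the exercise, without Friday's missing reading
known = [22.1, 23.5, 21.0, 25.3, 26.1, 24.8]

# Fill the gap with the mean of the known values, as the program does
fill = statistics.mean(known)
temps = known[:4] + [fill] + known[4:]

mean_temp = statistics.mean(temps)
median_temp = statistics.median(temps)
std_temp = statistics.stdev(temps)  # sample standard deviation (divides by n - 1)

print(f"Mean: {mean_temp:.2f}  Median: {median_temp:.2f}  Std: {std_temp:.2f}")
```

Because the filled value equals the mean and falls in the middle of the sorted list (23.5 < 23.8 < 24.8), the median coincides with the mean for this dataset.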
Exercise 5: Simple Python program that demonstrates the use of basic
Python functions

Aim:

To apply basic Python functions on a list of numbers using both built-in and user-defined
functions.

Introduction:

Functions in Python help to organize and reuse code. There are two types:

• Built-in functions (e.g. sum(), max(), len())
• User-defined functions created using the def keyword

This program shows how to use both types to perform operations like sum, average, finding
max/min, and checking for even numbers.

Python Code:

# Sample list of numbers
numbers = [12, 45, 78, 23, 56, 89, 10]

# 1. Built-in Functions
print("List of numbers:", numbers)
print("Total numbers:", len(numbers))
print("Sum:", sum(numbers))
print("Maximum:", max(numbers))
print("Minimum:", min(numbers))

# 2. User-defined function to find average
def find_average(nums):
    return sum(nums) / len(nums)

# 3. User-defined function to count even numbers
def count_even(nums):
    count = 0
    for n in nums:
        if n % 2 == 0:
            count += 1
    return count

# 4. User-defined function to square all elements
def square_list(nums):
    return [n**2 for n in nums]

# Applying functions
average = find_average(numbers)
even_count = count_even(numbers)
squared = square_list(numbers)

# Display results
print("Average:", average)
print("Even numbers count:", even_count)
print("Squared numbers:", squared)

Output:

List of numbers: [12, 45, 78, 23, 56, 89, 10]
Total numbers: 7
Sum: 313
Maximum: 89
Minimum: 10
Average: 44.714285714285715
Even numbers count: 4
Squared numbers: [144, 2025, 6084, 529, 3136, 7921, 100]

Result:

This program successfully demonstrates:

• Use of basic built-in functions: len(), sum(), max(), min()
• User-defined functions to:
  o Compute average
  o Count even numbers
  o Square all elements in a list
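
As a possible variation, built-in functions can be combined inside user-defined ones. The sketch below rewrites two of the functions more compactly and adds a guard for an empty list, which would otherwise cause a division by zero. The empty-list behaviour (returning None) is a design choice made here, not part of the original exercise.

```python
numbers = [12, 45, 78, 23, 56, 89, 10]

def find_average(nums):
    """Return the mean of nums, or None if the list is empty."""
    if not nums:
        return None
    return sum(nums) / len(nums)

def count_even(nums):
    """Count even values by summing 1 for each match."""
    return sum(1 for n in nums if n % 2 == 0)

print("Average:", find_average(numbers))
print("Even numbers count:", count_even(numbers))
print("Average of empty list:", find_average([]))
```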

Exercise 6: Perform Numerical Array Processing using NumPy

Aim:

To perform basic numerical array processing like creation, arithmetic operations, and
statistical analysis on arrays using Python.

Introduction:

Numerical array processing involves manipulating arrays of numbers efficiently.
Python's NumPy library offers fast operations on arrays including element-wise arithmetic,
aggregates (mean, sum), and reshaping.

This program demonstrates:

• Creating arrays
• Arithmetic operations on arrays
• Computing statistics (mean, sum, min, max)
• Reshaping arrays

Python Code:

import numpy as np

# Step 1: Create two numerical arrays
arr1 = np.array([10, 20, 30, 40, 50])
arr2 = np.array([5, 15, 25, 35, 45])

print("Array 1:", arr1)
print("Array 2:", arr2)

# Step 2: Arithmetic operations (all element-wise)
sum_arr = arr1 + arr2
diff_arr = arr1 - arr2
prod_arr = arr1 * arr2
div_arr = arr1 / arr2

print("\nSum of arrays:", sum_arr)
print("Difference of arrays:", diff_arr)
print("Product of arrays:", prod_arr)
print("Division of arrays:", div_arr)

# Step 3: Statistical operations on arr1
mean_val = np.mean(arr1)
sum_val = np.sum(arr1)
min_val = np.min(arr1)
max_val = np.max(arr1)
print(f"\nStatistics for Array 1 - Mean: {mean_val}, Sum: {sum_val}, Min: {min_val}, Max: {max_val}")

# Step 4: Reshape arr1 into 1x5 and 5x1 matrices
reshaped_1x5 = arr1.reshape(1, 5)
reshaped_5x1 = arr1.reshape(5, 1)
print("\nReshaped Array 1 (1x5):\n", reshaped_1x5)
print("Reshaped Array 1 (5x1):\n", reshaped_5x1)

Output:

Array 1: [10 20 30 40 50]
Array 2: [ 5 15 25 35 45]

Sum of arrays: [15 35 55 75 95]
Difference of arrays: [5 5 5 5 5]
Product of arrays: [  50  300  750 1400 2250]
Division of arrays: [2.         1.33333333 1.2        1.14285714 1.11111111]

Statistics for Array 1 - Mean: 30.0, Sum: 150, Min: 10, Max: 50

Reshaped Array 1 (1x5):
 [[10 20 30 40 50]]
Reshaped Array 1 (5x1):
 [[10]
 [20]
 [30]
 [40]
 [50]]

Result:

• Created numerical arrays using NumPy.
• Performed element-wise addition, subtraction, multiplication, and division.
• Calculated mean, sum, minimum, and maximum values.
• Reshaped arrays to different dimensions.
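
Reshaping becomes more useful once an array has more than one row, because statistics can then be taken along an axis. The following sketch uses a separate six-element array (not the arrays from the exercise) to show a 2x3 reshape, the -1 shorthand that lets NumPy infer a dimension, and row-wise and column-wise aggregates.

```python
import numpy as np

values = np.arange(1, 7)         # [1 2 3 4 5 6]
grid = values.reshape(2, 3)      # 2 rows, 3 columns
column = values.reshape(-1, 1)   # -1 lets NumPy infer the row count (6)

row_sums = grid.sum(axis=1)      # one sum per row
col_means = grid.mean(axis=0)    # one mean per column

print("Grid:\n", grid)
print("Column shape:", column.shape)
print("Row sums:", row_sums)
print("Column means:", col_means)
```

The total number of elements must stay the same across reshapes, so a six-element array can become 2x3, 3x2, 1x6, or 6x1, but not 4x2.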
Program 8: Perform Case Statements and Loops
Aim:
To write a Python program that demonstrates a case statement (switch-case alternative) and
loops (for and while) in Python.

Program:

# Program: Case statements and loops in Python

# ----- Case Statement Example -----
def day_of_week(choice):
    # Dictionary acting like switch-case
    days = {
        1: "Sunday",
        2: "Monday",
        3: "Tuesday",
        4: "Wednesday",
        5: "Thursday",
        6: "Friday",
        7: "Saturday"
    }
    return days.get(choice, "Invalid choice! Enter 1 to 7.")

# Taking input for case statement
choice = int(input("Enter a number (1-7) for day of week: "))
print("Today is:", day_of_week(choice))

# ----- Loop Example -----
# Using for loop to print numbers 1 to 5
print("\nUsing for loop:")
for i in range(1, 6):
    print(i, end=" ")

# Using while loop to print numbers 1 to 5
print("\n\nUsing while loop:")
count = 1
while count <= 5:
    print(count, end=" ")
    count += 1

Sample Output:

Enter a number (1-7) for day of week: 3
Today is: Tuesday

Using for loop:
1 2 3 4 5

Using while loop:
1 2 3 4 5

Result:
The program successfully demonstrates:

• A case statement alternative in Python using a dictionary mapping.
• A for loop for iterating through a sequence.
• A while loop for executing statements repeatedly until a condition becomes false.

Program 9: Perform Numerical Array Processing using NumPy
Aim:
To write a Python program that demonstrates numerical array processing using the NumPy
library for operations such as creation, addition, multiplication, and statistical calculations.

Program:

# Program: Numerical Array Processing using NumPy

import numpy as np

# Creating arrays
arr1 = np.array([10, 20, 30, 40, 50])
arr2 = np.array([5, 15, 25, 35, 45])

print("Array 1:", arr1)
print("Array 2:", arr2)

# Element-wise addition
sum_array = arr1 + arr2
print("\nSum of arrays:", sum_array)

# Element-wise multiplication
mul_array = arr1 * arr2
print("Multiplication of arrays:", mul_array)

# Scalar multiplication
scalar_mul = arr1 * 2
print("Array1 multiplied by 2:", scalar_mul)

# Statistical operations
print("\nStatistical Calculations on Array1:")
print("Sum:", np.sum(arr1))
print("Mean:", np.mean(arr1))
print("Maximum:", np.max(arr1))
print("Minimum:", np.min(arr1))
print("Standard Deviation:", np.std(arr1))

Sample Output:

Array 1: [10 20 30 40 50]
Array 2: [ 5 15 25 35 45]

Sum of arrays: [15 35 55 75 95]
Multiplication of arrays: [  50  300  750 1400 2250]
Array1 multiplied by 2: [ 20  40  60  80 100]

Statistical Calculations on Array1:
Sum: 150
Mean: 30.0
Maximum: 50
Minimum: 10
Standard Deviation: 14.142135623730951

Result:
The program successfully demonstrates numerical array processing using NumPy by
performing:

• Array creation
• Element-wise addition and multiplication
• Scalar multiplication
• Statistical calculations (sum, mean, max, min, standard deviation)
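
One detail worth knowing about the standard deviation: np.std() divides by n by default (population standard deviation, ddof=0), whereas the pandas .std() method divides by n - 1 (sample standard deviation, ddof=1). A minimal sketch on the same array shows the difference.

```python
import numpy as np

arr1 = np.array([10, 20, 30, 40, 50])

population_std = np.std(arr1)       # ddof=0: divide by n
sample_std = np.std(arr1, ddof=1)   # ddof=1: divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```

The population value is the 14.14 reported by the program; the sample value is larger (about 15.81) because the sum of squared deviations, 1000, is divided by 4 instead of 5.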
