DATA ANALYSIS PRACTICAL
Exercise 1: Perform Analysis on Simple Dataset I for Data Science and Business Intelligence Applications
Aim:
To understand basic data science operations such as data import, summary statistics, visualization, and data interpretation using a simple dataset relevant to business intelligence.
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset (columns used below: Region, Category, Quantity, Total_Sales)
df = pd.read_csv("sales_data.csv")
print(df.head())
# Basic statistics and structure
print(df.describe())
df.info()  # info() prints its own report
# Check for missing values
print(df.isnull().sum())
# Data aggregation
# Total sales per region
region_sales = df.groupby("Region")["Total_Sales"].sum()
print(region_sales)
# Total quantity sold per category
category_quantity = df.groupby("Category")["Quantity"].sum()
print(category_quantity)
# Visualize the data
# Bar plot: sales per region
region_sales.plot(kind='bar', title='Sales by Region', ylabel='Total Sales', xlabel='Region')
plt.show()
# Scatter plot: quantity vs total sales, colored by category
sns.scatterplot(data=df, x="Quantity", y="Total_Sales", hue="Category")
plt.title("Quantity vs Total Sales")
plt.show()
Result:
The program has been executed successfully and business sales metrics were analyzed.
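Because sales_data.csv itself is not included here, the sketch below builds a small in-memory DataFrame with the same column names (Region, Category, Quantity, Total_Sales) so the aggregation steps can be tried without the file. The rows are made-up sample values, not real data. It also computes each region's percentage share of total sales, one simple way to interpret the grouped result.

```python
import pandas as pd

# Hypothetical sample rows using the column names from the program above
df = pd.DataFrame({
    "Region": ["North", "South", "North", "East", "South", "East"],
    "Category": ["Electronics", "Clothing", "Clothing", "Electronics", "Electronics", "Clothing"],
    "Quantity": [3, 5, 2, 4, 1, 6],
    "Total_Sales": [900.0, 250.0, 100.0, 1200.0, 300.0, 300.0],
})

# Same aggregations as in the exercise
region_sales = df.groupby("Region")["Total_Sales"].sum()
category_quantity = df.groupby("Category")["Quantity"].sum()

# Interpretation aid: percentage share of total sales per region, largest first
region_share = (region_sales / region_sales.sum() * 100).sort_values(ascending=False)

print(region_sales)
print(category_quantity)
print(region_share.round(1))
```

With these sample rows, East accounts for roughly half of all sales, which is the kind of observation the bar plot is meant to make visible.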
Exercise 2: Perform Analysis on Simple Dataset II for Data Science and Business Intelligence Applications
Aim:
To analyze an enhanced dataset with time and profit features.
The focus is on identifying monthly trends, regional profits, and category-wise margins.
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset and parse the Date column as datetimes
df = pd.read_csv("retail_dataset.csv", parse_dates=['Date'])
print(df.head())
# Derived features
df["Month"] = df["Date"].dt.month  # month number 1-12; the same month from different years is grouped together
df["Profit_Margin"] = df["Profit"] / df["Total_Sales"]  # per-row profit margin
# Monthly total sales
monthly_sales = df.groupby("Month")["Total_Sales"].sum()
print(monthly_sales)
# Region-wise profit
region_profit = df.groupby("Region")["Profit"].sum()
print(region_profit)
# Category-wise average profit margin
category_margin = df.groupby("Category")["Profit_Margin"].mean()
print(category_margin)
# Line chart: monthly sales
monthly_sales.plot(kind='line', marker='o', title='Monthly Sales Trend')
plt.ylabel('Total Sales')
plt.xlabel('Month')
plt.grid(True)
plt.show()
# Boxplot: profit distribution by category
sns.boxplot(data=df, x="Category", y="Profit")
plt.title("Profit Distribution by Category")
plt.show()
Result:
The program has been executed successfully and monthly trends and profitability were
visualized.
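Two details in this exercise can change the numbers. First, `dt.month` merges the same month from different years, while `dt.to_period("M")` keeps year and month apart. Second, averaging per-row margins weights every transaction equally; dividing total profit by total sales gives the overall, sales-weighted margin instead. The sketch below uses a few made-up rows with the exercise's column names (Date, Category, Total_Sales, Profit) to show both effects; it is an illustration, not a change to the required program.

```python
import pandas as pd

# Hypothetical rows; column names follow retail_dataset.csv as used in the program
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-15", "2024-01-10", "2024-02-05", "2024-02-20"]),
    "Category": ["Toys", "Toys", "Books", "Books"],
    "Total_Sales": [100.0, 900.0, 200.0, 200.0],
    "Profit": [50.0, 90.0, 20.0, 60.0],
})

# Month number alone: Jan 2023 and Jan 2024 fall into the same group
by_month_number = df.groupby(df["Date"].dt.month)["Total_Sales"].sum()
# Year-month periods keep them separate
by_year_month = df.groupby(df["Date"].dt.to_period("M"))["Total_Sales"].sum()

# Mean of per-row margins (as in the exercise) vs. sales-weighted margin
df["Profit_Margin"] = df["Profit"] / df["Total_Sales"]
mean_of_margins = df.groupby("Category")["Profit_Margin"].mean()
totals = df.groupby("Category")[["Profit", "Total_Sales"]].sum()
weighted_margin = totals["Profit"] / totals["Total_Sales"]

print(by_month_number)
print(by_year_month)
print(mean_of_margins)
print(weighted_margin)
```

For Toys the two margin definitions disagree (0.30 vs 0.14) because the small, high-margin sale counts as much as the large one in the plain mean; for Books they agree because both sales are the same size.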
Exercise 3: Collect and Understand a Simple Dataset for a Data Science Application
Aim:
To collect or create a real-world dataset and understand its structure.
It includes loading, describing, and preparing the data for further analysis.
Program:
Data collection: Use Google Forms or Excel, or create a .csv file manually. The dataset should have at least 5 columns and 10–20 rows; the program below reads it from student_performance.csv.
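If no survey data is at hand, a small file can also be written from Python. The sketch below is one hypothetical way to do this: the column names (Student_ID, Gender, Study_Hours, Attendance, Marks), the grading thresholds, and all values are invented for illustration; only the file name and the Grade column come from the analysis program. It produces 6 columns and 10 rows, meeting the stated minimum.

```python
import pandas as pd

# Invented sample records: 10 students (all values are made up)
students = pd.DataFrame({
    "Student_ID": [f"S{i:02d}" for i in range(1, 11)],
    "Gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "Study_Hours": [5, 3, 6, 2, 4, 7, 5, 1, 6, 3],
    "Attendance": [92, 80, 95, 70, 88, 97, 90, 65, 93, 78],
    "Marks": [85, 68, 91, 55, 76, 94, 82, 48, 89, 64],
})

# Hypothetical grading rule: A >= 85, B >= 70, C >= 50, otherwise F
def to_grade(marks):
    if marks >= 85:
        return "A"
    if marks >= 70:
        return "B"
    if marks >= 50:
        return "C"
    return "F"

students["Grade"] = students["Marks"].apply(to_grade)

# Write without the index so the CSV contains exactly these 6 columns
students.to_csv("student_performance.csv", index=False)
print(students["Grade"].value_counts())
```

The analysis program can then load this file unchanged.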
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the collected dataset
df = pd.read_csv("student_performance.csv")
print(df.head())
df.info()  # prints column types and non-null counts
print(df.describe())
# Frequency of each grade
print(df['Grade'].value_counts())
# Check for missing values
print(df.isnull().sum())
# Visualize the data: number of students per grade
sns.countplot(x='Grade', data=df)
plt.title("Grade Distribution")
plt.show()
Result:
The program has been executed successfully and the student performance dataset was
summarized.
Exercise 4: Perform Analysis on Simple Data for Mathematical, Numerical, and Data Engineering Processes
Aim:
To perform mathematical, numerical, and data engineering analysis on a small temperature
dataset using Python.
Introduction:
Data analysis involves several key steps:
Mathematical analysis calculates statistical properties like mean, median, and
standard deviation.
Numerical processing involves computation, such as unit conversion and smoothing.
Data engineering includes loading, cleaning, and transforming data for further use.
In this program, we analyze a week's worth of temperature data to demonstrate these
processes.
Python Code:
import pandas as pd
# Sample temperature data in Celsius (None marks a missing reading)
data = {
    'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'Temp_C': [22.1, 23.5, 21.0, 25.3, None, 26.1, 24.8]
}
# Step 1: Load the data (Data Engineering)
df = pd.DataFrame(data)
# Step 2: Clean the data (fill the missing value with the mean of the known readings)
df['Temp_C'] = df['Temp_C'].fillna(df['Temp_C'].mean())
# Step 3: Mathematical Analysis
mean_temp = df['Temp_C'].mean()
median_temp = df['Temp_C'].median()
std_temp = df['Temp_C'].std()  # sample standard deviation (ddof=1)
# Step 4: Numerical Processing
df['Temp_F'] = df['Temp_C'] * 9/5 + 32  # Convert Celsius to Fahrenheit
df['Smoothed_C'] = df['Temp_C'].rolling(window=3, min_periods=1).mean()  # 3-day rolling average
# Step 5: Output results
print("Mathematical Summary:")
print(f"Mean Temperature: {mean_temp:.2f} °C")
print(f"Median Temperature: {median_temp:.2f} °C")
print(f"Standard Deviation: {std_temp:.2f} °C\n")
print("Processed Temperature Data:\n")
print(df.round(2))  # round to 2 decimals for display
Output:
Mathematical Summary:
Mean Temperature: 23.80 °C
Median Temperature: 23.80 °C
Standard Deviation: 1.80 °C
Processed Temperature Data:
   Day  Temp_C  Temp_F  Smoothed_C
0  Mon    22.1   71.78       22.10
1  Tue    23.5   74.30       22.80
2  Wed    21.0   69.80       22.20
3  Thu    25.3   77.54       23.27
4  Fri    23.8   74.84       23.37
5  Sat    26.1   78.98       25.07
6  Sun    24.8   76.64       24.90
Result:
The program successfully calculated statistical metrics:
o Mean = 23.80°C
o Median = 23.80°C
o Standard Deviation = 1.80°C
The missing temperature (Friday) was filled with the mean of the six recorded values.
Data was numerically transformed by:
o Converting °C to °F
o Smoothing with a rolling average
Final output shows a clean, enriched dataset ready for further analysis or
visualization.
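pandas `Series.std()` returns the sample standard deviation (divisor n - 1, `ddof=1`), whereas `np.std()`, used in Program 9, defaults to the population standard deviation (divisor n, `ddof=0`). The sketch below recomputes both for the cleaned temperature column, rebuilt directly from the Temp_C values in the output above, so the 1.80 °C figure can be checked and the two conventions compared.

```python
import numpy as np
import pandas as pd

# Cleaned Temp_C values from the output above (Friday filled with the mean, 23.8)
temps = pd.Series([22.1, 23.5, 21.0, 25.3, 23.8, 26.1, 24.8])

sample_std = temps.std()  # pandas default: ddof=1, divides by n - 1
population_std = np.std(temps.to_numpy())  # NumPy default: ddof=0, divides by n
same_as_pandas = np.std(temps.to_numpy(), ddof=1)  # NumPy with ddof=1 matches pandas

print(f"Sample std (ddof=1): {sample_std:.2f} °C")
print(f"Population std (ddof=0): {population_std:.2f} °C")
print(f"NumPy with ddof=1: {same_as_pandas:.2f} °C")
```

The squared deviations from 23.8 sum to 19.36, so the sample value is the square root of 19.36 / 6 (about 1.80) and the population value is the square root of 19.36 / 7 (about 1.66).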
Exercise 5: Simple Python program that demonstrates the use of basic Python functions
Aim:
To apply basic Python functions on a list of numbers using both built-in and user-defined
functions.
Introduction:
Functions in Python help to organize and reuse code.
There are two types:
Built-in functions (e.g. sum(), max(), len())
User-defined functions created using the def keyword
This program shows how to use both types to compute the sum, average, maximum, and minimum of a list, count its even numbers, and square its elements.
Python Code:
# List of numbers to process
numbers = [12, 45, 78, 23, 56, 89, 10]
# 1. Built-in functions
print("List of numbers:", numbers)
print("Total numbers:", len(numbers))
print("Sum:", sum(numbers))
print("Maximum:", max(numbers))
print("Minimum:", min(numbers))
# 2. User-defined function to find the average
def find_average(nums):
    return sum(nums) / len(nums)
# 3. User-defined function to count even numbers
def count_even(nums):
    count = 0
    for n in nums:
        if n % 2 == 0:
            count += 1
    return count
# 4. User-defined function to square all elements
def square_list(nums):
    return [n**2 for n in nums]
# Applying the functions
average = find_average(numbers)
even_count = count_even(numbers)
squared = square_list(numbers)
# Display results
print("Average:", average)
print("Even numbers count:", even_count)
print("Squared numbers:", squared)
Output:
List of numbers: [12, 45, 78, 23, 56, 89, 10]
Total numbers: 7
Sum: 313
Maximum: 89
Minimum: 10
Average: 44.714285714285715
Even numbers count: 4
Squared numbers: [144, 2025, 6084, 529, 3136, 7921, 100]
Result:
This program successfully demonstrates:
Use of basic built-in functions: len(), sum(), max(), min()
User-defined functions to:
o Compute average
o Count even numbers
o Square all elements in a list
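The same results can also be obtained by combining built-ins with short expressions, a common idiom once the explicit loops are understood. This is a sketch of equivalent alternatives, not a replacement for the exercise's functions; `statistics.mean` comes from Python's standard library, and `describe_numbers` is a hypothetical helper name showing that a user-defined function can return several values at once as a tuple.

```python
import statistics

numbers = [12, 45, 78, 23, 56, 89, 10]

# Counting evens with sum() over a generator: each even number contributes 1
even_count = sum(1 for n in numbers if n % 2 == 0)

# Squaring with the built-in map(); list() collects the results
squared = list(map(lambda n: n ** 2, numbers))

# Standard-library mean gives the same value as sum(nums) / len(nums)
average = statistics.mean(numbers)

# Hypothetical helper: a user-defined function returning a tuple of results
def describe_numbers(nums):
    return min(nums), max(nums), sum(nums) / len(nums)

lowest, highest, mean_value = describe_numbers(numbers)
print(even_count, squared, average)
print(lowest, highest, mean_value)
```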
Exercise 6: Python program to perform numerical array processing using NumPy, which is a powerful library for numerical computing.
Aim:
To perform basic numerical array processing like creation, arithmetic operations, and
statistical analysis on arrays using Python.
Introduction:
Numerical array processing involves manipulating arrays of numbers efficiently.
The NumPy library for Python offers fast operations on arrays, including element-wise arithmetic, aggregates (mean, sum), and reshaping.
This program demonstrates:
Creating arrays
Arithmetic operations on arrays
Computing statistics (mean, sum, min, max)
Reshaping arrays
Python Code:
import numpy as np
# Step 1: Create two numerical arrays
arr1 = np.array([10, 20, 30, 40, 50])
arr2 = np.array([5, 15, 25, 35, 45])
print("Array 1:", arr1)
print("Array 2:", arr2)
# Step 2: Arithmetic operations
sum_arr = arr1 + arr2
diff_arr = arr1 - arr2
prod_arr = arr1 * arr2
div_arr = arr1 / arr2
print("\nSum of arrays:", sum_arr)
print("Difference of arrays:", diff_arr)
print("Product of arrays:", prod_arr)
print("Division of arrays:", div_arr)
# Step 3: Statistical operations on arr1
mean_val = np.mean(arr1)
sum_val = np.sum(arr1)
min_val = np.min(arr1)
max_val = np.max(arr1)
print(f"\nStatistics for Array 1 - Mean: {mean_val}, Sum: {sum_val}, Min: {min_val}, Max: {max_val}")
# Step 4: Reshape arr1 into 1x5 and 5x1 matrices
reshaped_1x5 = arr1.reshape(1, 5)
reshaped_5x1 = arr1.reshape(5, 1)
print("\nReshaped Array 1 (1x5):\n", reshaped_1x5)
print("Reshaped Array 1 (5x1):\n", reshaped_5x1)
Output:
Array 1: [10 20 30 40 50]
Array 2: [ 5 15 25 35 45]
Sum of arrays: [15 35 55 75 95]
Difference of arrays: [5 5 5 5 5]
Product of arrays: [  50  300  750 1400 2250]
Division of arrays: [2.         1.33333333 1.2        1.14285714 1.11111111]
Statistics for Array 1 - Mean: 30.0, Sum: 150, Min: 10, Max: 50
Reshaped Array 1 (1x5):
 [[10 20 30 40 50]]
Reshaped Array 1 (5x1):
 [[10]
 [20]
 [30]
 [40]
 [50]]
Result:
Created numerical arrays using NumPy.
Performed element-wise addition, subtraction, multiplication, and division.
Calculated mean, sum, minimum, and maximum values.
Reshaped arrays to different dimensions.
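Reshaping becomes especially useful with NumPy broadcasting: when a (5, 1) column and a (1, 5) row are combined element-wise, NumPy stretches both to a (5, 5) result. The sketch below reuses the exercise's arr1 and arr2 to build an addition table this way, then computes statistics along each axis of the 2-D result.

```python
import numpy as np

arr1 = np.array([10, 20, 30, 40, 50])
arr2 = np.array([5, 15, 25, 35, 45])

# (5, 1) column + (1, 5) row -> broadcast to a (5, 5) table: table[i, j] = arr1[i] + arr2[j]
table = arr1.reshape(5, 1) + arr2.reshape(1, 5)

# A scalar broadcasts too: every element is multiplied by 2
doubled = arr1 * 2

# axis=0 collapses rows (one value per column); axis=1 collapses columns (one value per row)
column_means = table.mean(axis=0)
row_sums = table.sum(axis=1)

print(table)
print(doubled)
print(column_means)
print(row_sums)
```

Each column mean equals mean(arr1) + arr2[j], that is 30 + arr2[j], which is a quick way to sanity-check the broadcast.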
Program 8: Perform Case Statements and Loops
Aim:
To write a Python program that demonstrates a case statement (switch-case alternative) and
loops (for and while) in Python.
Program:
# Program: Case statements and loops in Python
# ----- Case Statement Example -----
def day_of_week(choice):
    # Dictionary acting like switch-case: each key is one "case"
    days = {
        1: "Sunday",
        2: "Monday",
        3: "Tuesday",
        4: "Wednesday",
        5: "Thursday",
        6: "Friday",
        7: "Saturday"
    }
    # .get() returns the message below when no key matches (the "default" case)
    return days.get(choice, "Invalid choice! Enter 1 to 7.")
# Taking input for the case statement
try:
    choice = int(input("Enter a number (1-7) for day of week: "))
except ValueError:
    choice = 0  # non-numeric input falls through to the default case
print("Today is:", day_of_week(choice))
# ----- Loop Example -----
# Using a for loop to print numbers 1 to 5
print("\nUsing for loop:")
for i in range(1, 6):
    print(i, end=" ")
# Using a while loop to print numbers 1 to 5
print("\n\nUsing while loop:")
count = 1
while count <= 5:
    print(count, end=" ")
    count += 1
print()  # end the line after the loop output
Sample Output:
Enter a number (1-7) for day of week: 3
Today is: Tuesday
Using for loop:
1 2 3 4 5
Using while loop:
1 2 3 4 5
Result:
The program successfully demonstrates:
A case statement alternative in Python using a dictionary mapping.
A for loop for iterating through a sequence.
A while loop for executing statements repeatedly until a condition becomes false.
Program 9: Perform Numerical Array Processing using NumPy
Aim:
To write a Python program that demonstrates numerical array processing using the NumPy
library for operations such as creation, addition, multiplication, and statistical calculations.
Program:
# Program: Numerical Array Processing using NumPy
import numpy as np
# Creating arrays
arr1 = np.array([10, 20, 30, 40, 50])
arr2 = np.array([5, 15, 25, 35, 45])
print("Array 1:", arr1)
print("Array 2:", arr2)
# Element-wise addition
sum_array = arr1 + arr2
print("\nSum of arrays:", sum_array)
# Element-wise multiplication
mul_array = arr1 * arr2
print("Multiplication of arrays:", mul_array)
# Scalar multiplication
scalar_mul = arr1 * 2
print("Array1 multiplied by 2:", scalar_mul)
# Statistical operations
print("\nStatistical Calculations on Array1:")
print("Sum:", np.sum(arr1))
print("Mean:", np.mean(arr1))
print("Maximum:", np.max(arr1))
print("Minimum:", np.min(arr1))
print("Standard Deviation:", np.std(arr1))
Sample Output:
Array 1: [10 20 30 40 50]
Array 2: [ 5 15 25 35 45]
Sum of arrays: [15 35 55 75 95]
Multiplication of arrays: [  50  300  750 1400 2250]
Array1 multiplied by 2: [ 20  40  60  80 100]
Statistical Calculations on Array1:
Sum: 150
Mean: 30.0
Maximum: 50
Minimum: 10
Standard Deviation: 14.142135623730951
Result:
The program successfully demonstrates numerical array processing using NumPy by
performing:
Array creation
Element-wise addition and multiplication
Scalar multiplication
Statistical calculations (sum, mean, max, min, standard deviation)
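The standard deviation printed above is the population value, because `np.std` divides by n unless its `ddof` argument is changed. Worked through for Array 1:

```latex
\begin{aligned}
\bar{x} &= \frac{10 + 20 + 30 + 40 + 50}{5} = 30 \\
\sigma &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
        = \sqrt{\frac{(-20)^2 + (-10)^2 + 0^2 + 10^2 + 20^2}{5}} \\
       &= \sqrt{\frac{1000}{5}} = \sqrt{200} \approx 14.1421
\end{aligned}
```

With `np.std(arr1, ddof=1)` the divisor becomes n - 1 = 4, giving the square root of 250, about 15.81.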