Practical

This document provides a comprehensive guide on setting up Python and Jupyter Notebook for data manipulation using Pandas and SQL. It covers installation, creating DataFrames, basic and advanced data manipulation techniques, and querying data with SQL in Jupyter Notebook. Additionally, it includes practice exercises to reinforce learning.

Uploaded by

eczhyena

1. Setting Up Your Environment

1. Install Python and Jupyter Notebook:

- Download Python from [python.org](https://www.python.org/).

- Install Jupyter Notebook using pip:

bash

pip install notebook

- Launch Jupyter Notebook:

bash

jupyter notebook

2. Install the required libraries (`sqlite3` is part of Python's standard library, so only pandas needs to be installed):

bash

pip install pandas
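To confirm the setup before moving on, a quick check like the following can be run in a notebook cell or a plain Python session. It is a minimal sketch that uses only the standard library; the package names checked are the ones installed above.

```python
import importlib.util
import sqlite3
import sys

# sqlite3 is part of the standard library, so it imports without any pip step.
print("Python:", sys.version.split()[0])
print("SQLite engine:", sqlite3.sqlite_version)

# find_spec returns None when a package is not installed in this environment.
for package in ("pandas", "notebook"):
    status = "installed" if importlib.util.find_spec(package) else "missing"
    print(f"{package}: {status}")
```

If either package is reported as missing, rerun the matching `pip install` command in the same environment that Jupyter uses.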

_____________________________________________________

2. Introduction to Pandas

Pandas is a powerful library for data manipulation and analysis in Python.

2.1 Importing Pandas

python

import pandas as pd

2.2 Creating a DataFrame

A DataFrame is a 2D table-like structure for storing data.

python

# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

print(df)

2.3 Reading Data from a CSV File

python

# Load a CSV file into a DataFrame
df = pd.read_csv('data.csv')

print(df.head())  # Display the first 5 rows

2.4 Basic Data Manipulation

- Selecting Columns:

python

df['Name']           # Select a single column (returns a Series)

df[['Name', 'Age']]  # Select multiple columns (returns a DataFrame)

- Filtering Rows:

python

df[df['Age'] > 30]  # Filter rows where Age > 30

- Adding a New Column:

python

df['Salary'] = [50000, 60000, 70000]

print(df)

- Descriptive Statistics:

python

df.describe()  # Summary statistics for numerical columns
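The selection and filtering steps above can be combined in a single expression. The following is a minimal sketch, assuming the three-row sample DataFrame from section 2.2 with the `Salary` column added; the names `mask` and `subset` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})
df['Salary'] = [50000, 60000, 70000]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (df['Age'] >= 30) & (df['Salary'] > 50000)

# .loc takes the row mask and the column list in one step.
subset = df.loc[mask, ['Name', 'Salary']]
print(subset)
```

Using `and` / `or` instead of `&` / `|` raises an error here, because the comparison produces a whole Series of booleans rather than a single value.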

_____________________________________________________

3. Introduction to SQL in Jupyter Notebook

You can use SQL to query data directly in Jupyter Notebook using Python's built-in `sqlite3` module or the third-party `pandasql` library.

3.1 Using SQLite with Pandas

python

import sqlite3

# Create a connection to a SQLite database (the file is created if it does not exist)
conn = sqlite3.connect('example.db')

# Load a DataFrame into a SQL table
df.to_sql('people', conn, if_exists='replace', index=False)

# Query the database using SQL
query = "SELECT * FROM people WHERE Age > 30"

result = pd.read_sql(query, conn)

print(result)
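When the filter value comes from a variable, it is safer to pass it as a query parameter than to build the SQL string by hand. The sketch below assumes the same sample data and uses an in-memory database so that no file is written; `min_age` is an illustrative name.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# ':memory:' keeps the database in RAM, so no example.db file is created.
conn = sqlite3.connect(':memory:')
df.to_sql('people', conn, if_exists='replace', index=False)

# The ? placeholder is filled from params, so the value is never pasted into the SQL text.
min_age = 28
result = pd.read_sql(
    "SELECT Name, Age FROM people WHERE Age > ? ORDER BY Age",
    conn,
    params=(min_age,),
)
print(result)

conn.close()
```

The `?` placeholder style is specific to the `sqlite3` driver; other database drivers may use a different marker.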

3.2 Using `pandasql` for SQL Queries

Install `pandasql`:

bash

pip install pandasql

Use it in Jupyter Notebook:

python

from pandasql import sqldf

# Define a query; the table name is the name of the DataFrame variable
query = "SELECT * FROM df WHERE Age > 30"

# Execute the query
result = sqldf(query)

print(result)

_____________________________________________________

4. Combining Pandas and SQL

You can use SQL to query Pandas DataFrames directly.

4.1 Querying a DataFrame with SQL

python

import pandas as pd
from pandasql import sqldf

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# SQL query
query = """
SELECT Name, Age
FROM df
WHERE City = 'New York'
"""

# Execute the query
result = sqldf(query)

print(result)

5. Practice Exercises

1. Create a DataFrame with 5 rows and 4 columns (e.g., Name, Age, City, Salary).

2. Filter rows where the Salary is greater than 50,000.

3. Add a new column called `Bonus` that is 10% of the Salary.

4. Use SQL to query the DataFrame and find people older than 30.

_________________________________________________________________________
1. Advanced Pandas Functions

1.1 Handling Missing Data

- Check for Missing Values:

python

df.isnull().sum()  # Count missing values per column

- Drop Missing Values:

python

df.dropna()        # Drop rows with missing values

df.dropna(axis=1)  # Drop columns with missing values

- Fill Missing Values:

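python

df.fillna(0)  # Fill missing values with 0

# Fill with the column mean; assigning the result back avoids the
# chained inplace=True call, which recent pandas versions warn about
df['Age'] = df['Age'].fillna(df['Age'].mean())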
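The three steps above (check, drop, fill) can be seen together on a small example. This is a minimal sketch with made-up data; the fourth row and the `'Unknown'` placeholder are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 35, 30],
    'City': ['New York', None, 'Chicago', 'Chicago'],
})

# Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# dropna and fillna return new objects; df itself is unchanged until you assign.
complete_rows = df.dropna()

# A dict lets each column get a fill value that suits its type.
cleaned = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(cleaned)
```

Whether to drop or fill depends on the data: dropping loses rows, while filling with a mean can hide how much was missing.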

1.2 Grouping and Aggregation

- Group by a Column:

python

df.groupby('City')['Age'].mean()  # Average age by city

- Aggregate Functions:

python

df.groupby('City').agg({
'Age': ['mean', 'min', 'max'],

'Salary': 'sum'

})
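Passing a list of functions in the dictionary form above produces two-level column names. When flat, self-describing column names are preferred, pandas also supports named aggregation. A minimal sketch, assuming a small made-up dataset; the output column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'Chicago', 'New York'],
    'Age': [25, 30, 35, 45],
    'Salary': [50000, 60000, 70000, 80000],
})

# Each keyword is an output column: new_name=(source_column, function).
summary = df.groupby('City').agg(
    avg_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    total_salary=('Salary', 'sum'),
).reset_index()  # turn the City index back into a regular column

print(summary)
```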

1.3 Merging and Joining DataFrames

- Merge Two DataFrames:

python

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})

df2 = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})

# Inner join: keeps only IDs present in both DataFrames (2 and 3)
merged_df = pd.merge(df1, df2, on='ID', how='inner')

print(merged_df)

- Concatenate DataFrames:

python

# Stack vertically; columns missing from one DataFrame are filled with NaN
concatenated_df = pd.concat([df1, df2], axis=0)

print(concatenated_df)
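The `how` argument of `pd.merge` decides which IDs survive the join. The sketch below reuses `df1` and `df2` from the merge example and compares three join types; `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})

# Inner: only IDs found in both frames.
inner = pd.merge(df1, df2, on='ID', how='inner')

# Left: every row of df1; Salary is NaN where df2 has no match.
left = pd.merge(df1, df2, on='ID', how='left')

# Outer: every ID from either frame, with the source recorded in _merge.
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(outer)
```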

1.4 Pivot Tables

- Create a Pivot Table:

python

pivot_table = df.pivot_table(values='Salary', index='City', columns='Name', aggfunc='mean')

print(pivot_table)
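A pivot table is easier to read when the `columns` argument is a category with repeated values, so that the aggregation function has several rows to combine. The sketch below is an illustration under that assumption: the `Department` column and the `staff` data are hypothetical and not part of the earlier examples.

```python
import pandas as pd

staff = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago', 'Chicago'],
    'Department': ['Sales', 'IT', 'Sales', 'Sales', 'IT'],
    'Salary': [50000, 70000, 40000, 60000, 80000],
})

# One row per city, one column per department, mean salary in each cell.
pivot = staff.pivot_table(
    values='Salary',
    index='City',
    columns='Department',
    aggfunc='mean',
)

print(pivot)
```

The two Chicago sales salaries are averaged into a single cell; combinations with no data would appear as NaN.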
1.5 Apply Functions

- Apply a Function to a Column:

python

df['Age'] = df['Age'].apply(lambda x: x + 1)  # Increment age by 1

- Apply a Function Row-wise:

python

df['Age_Salary_Ratio'] = df.apply(lambda row: row['Age'] / row['Salary'], axis=1)
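For simple arithmetic like the examples above, column-wise (vectorized) operations give the same values as `apply` and are usually faster on large DataFrames. A minimal sketch comparing the two on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35], 'Salary': [50000, 60000, 70000]})

# Row-wise apply: the lambda is called once per row.
row_wise = df.apply(lambda row: row['Age'] / row['Salary'], axis=1)

# Vectorized: one operation on whole columns.
vectorized = df['Age'] / df['Salary']

# Both approaches produce the same values.
same = ((row_wise - vectorized).abs() < 1e-12).all()
print(same)
```

`apply` remains useful when the per-row logic cannot be expressed with column operations, such as calling a custom function with branching.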

1.6 Sorting and Ranking

- Sort by a Column:

python

df.sort_values(by='Salary', ascending=False, inplace=True)

- Rank Data:

python

df['Rank'] = df['Salary'].rank(ascending=False)
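How `rank` treats ties depends on its `method` argument. The sketch below uses made-up salaries with one tie to compare three methods; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Salary': [70000, 60000, 70000, 50000],
})

# Default ('average'): tied rows share the mean of the ranks they span.
df['rank_average'] = df['Salary'].rank(ascending=False)

# 'min': tied rows share the lowest rank, and the next rank is skipped.
df['rank_min'] = df['Salary'].rank(ascending=False, method='min')

# 'dense': tied rows share a rank and no rank is skipped afterwards.
df['rank_dense'] = df['Salary'].rank(ascending=False, method='dense')

print(df.sort_values('Salary', ascending=False))
```

The `'min'` method matches SQL's `RANK()` and `'dense'` matches `DENSE_RANK()`.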

_____________________________________________________

2. Advanced SQL Queries

2.1 Basic SQL Queries

- Select with Conditions:

sql

SELECT * FROM people WHERE Age > 30 AND City = 'New York';
- Order By:

sql

SELECT * FROM people ORDER BY Salary DESC;

2.2 Aggregation in SQL

- Group By:

sql

SELECT City, AVG(Age) AS AvgAge FROM people GROUP BY City;

- Having Clause:

sql

SELECT City, AVG(Salary) AS AvgSalary

FROM people

GROUP BY City

HAVING AVG(Salary) > 50000;
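To try the aggregation queries above without a database file, they can be run against an in-memory SQLite table from Python. A minimal sketch, assuming a made-up `people` table with four rows:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE people (Name TEXT, City TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ('Alice', 'New York', 50000),
        ('Bob', 'Chicago', 60000),
        ('Charlie', 'Chicago', 70000),
        ('Dana', 'New York', 40000),
    ],
)

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
rows = conn.execute("""
    SELECT City, AVG(Salary) AS AvgSalary
    FROM people
    GROUP BY City
    HAVING AVG(Salary) > 50000
""").fetchall()

print(rows)
conn.close()
```

Only Chicago passes the filter, because the New York average (45000) is below the threshold.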

2.3 Joins in SQL

- Inner Join:

sql

SELECT df1.Name, df2.Salary

FROM df1

INNER JOIN df2 ON df1.ID = df2.ID;

- Left Join:

sql
SELECT df1.Name, df2.Salary

FROM df1

LEFT JOIN df2 ON df1.ID = df2.ID;
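The difference between the two joins shows up in rows that have no match. The sketch below loads the same IDs used in the Pandas merge example into two in-memory SQLite tables named `df1` and `df2`, then runs both queries; `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE df1 (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE df2 (ID INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO df1 VALUES (?, ?)",
                 [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')])
conn.executemany("INSERT INTO df2 VALUES (?, ?)",
                 [(2, 50000), (3, 60000), (4, 70000)])

# Inner join: only IDs present in both tables.
inner_rows = conn.execute("""
    SELECT df1.Name, df2.Salary
    FROM df1
    INNER JOIN df2 ON df1.ID = df2.ID
    ORDER BY df1.ID
""").fetchall()

# Left join: every row of df1; Salary is NULL (None in Python) without a match.
left_rows = conn.execute("""
    SELECT df1.Name, df2.Salary
    FROM df1
    LEFT JOIN df2 ON df1.ID = df2.ID
    ORDER BY df1.ID
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```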

2.4 Subqueries

- Subquery in WHERE Clause:

sql

SELECT Name, Salary

FROM people

WHERE Salary > (SELECT AVG(Salary) FROM people);

- Subquery in SELECT Clause:

sql

SELECT Name, (SELECT AVG(Salary) FROM people) AS AvgSalary

FROM people;
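Both subquery patterns can be tried from Python against an in-memory SQLite table. This is a minimal sketch with made-up data; the `DiffFromAvg` column is an illustrative variation on the SELECT-clause example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE people (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [('Alice', 50000), ('Bob', 60000), ('Charlie', 70000)],
)

# Subquery in WHERE: keep rows above the overall average salary.
above_avg = conn.execute("""
    SELECT Name, Salary
    FROM people
    WHERE Salary > (SELECT AVG(Salary) FROM people)
""").fetchall()

# Subquery in SELECT: compare every row against the same average.
with_diff = conn.execute("""
    SELECT Name, Salary - (SELECT AVG(Salary) FROM people) AS DiffFromAvg
    FROM people
    ORDER BY Name
""").fetchall()

print(above_avg)
print(with_diff)
conn.close()
```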

2.5 Window Functions

- Row Number:

sql

SELECT Name, Salary, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum

FROM people;

- Rank:

sql

SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS SalaryRank

FROM people;
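The window functions above differ only when rows tie. The sketch below runs them side by side on made-up data with one tie, and adds `DENSE_RANK()` for comparison. It assumes the SQLite library bundled with Python is version 3.25 or newer, which is when window functions were introduced; the alias names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE people (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [('Alice', 70000), ('Bob', 60000), ('Charlie', 70000), ('Dana', 50000)],
)

# Name is added as a tie-breaker so ROW_NUMBER gives a repeatable order.
rows = conn.execute("""
    SELECT Name, Salary,
           ROW_NUMBER() OVER (ORDER BY Salary DESC, Name) AS RowNum,
           RANK()       OVER (ORDER BY Salary DESC)       AS SalaryRank,
           DENSE_RANK() OVER (ORDER BY Salary DESC)       AS DenseRank
    FROM people
    ORDER BY Salary DESC, Name
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

`ROW_NUMBER()` always counts 1, 2, 3, 4; `RANK()` gives tied rows the same number and then skips; `DENSE_RANK()` gives tied rows the same number without skipping.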

3. Combining Pandas and SQL for Analysis

3.1 Querying DataFrames with SQL

python

import pandas as pd
from pandasql import sqldf

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# SQL query
query = """
SELECT Name, Age
FROM df
WHERE City = 'New York'
"""

# Execute the query
result = sqldf(query)

print(result)

3.2 Exporting Data to SQL

python

# Requires SQLAlchemy: pip install sqlalchemy
from sqlalchemy import create_engine

# Create a SQLite database connection
engine = create_engine('sqlite:///example.db')

# Export DataFrame to SQL table
df.to_sql('people', engine, if_exists='replace', index=False)
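The `if_exists` argument controls what happens when the target table is already there. The sketch below illustrates the three options; as an assumption made to keep it self-contained, it uses an in-memory `sqlite3` connection in place of the SQLAlchemy engine shown above.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
conn = sqlite3.connect(':memory:')

# 'replace' creates the table, or drops and recreates it if it exists.
df.to_sql('people', conn, if_exists='replace', index=False)

# 'append' adds the rows to the existing table.
df.to_sql('people', conn, if_exists='append', index=False)

# The default, 'fail', raises ValueError when the table already exists.
try:
    df.to_sql('people', conn, index=False)
    failed = False
except ValueError:
    failed = True

row_count = pd.read_sql("SELECT COUNT(*) AS n FROM people", conn)['n'].iloc[0]
print(row_count, failed)
conn.close()
```

Using `'replace'` makes a notebook cell safe to rerun, while `'append'` duplicates rows on every rerun.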

3.3 Reading Data from SQL

python

# Read data from SQL table into a DataFrame
# (reuses the engine created in section 3.2)
query = "SELECT * FROM people"

df_from_sql = pd.read_sql(query, engine)

print(df_from_sql)

4. Practice Exercises

1. Load a CSV file into a DataFrame and clean it by handling missing values.

2. Group the data by a categorical column and calculate summary statistics.

3. Perform an inner join on two DataFrames using both Pandas and SQL.

4. Use a window function in SQL to rank rows based on a numeric column.

5. Export a DataFrame to a SQL table and query it using SQL.
