EDA - Session-1 - Basic Dataframe Operations-1

The document discusses reading and manipulating CSV files using Pandas in Python. It shows how to: 1) Import necessary libraries and read a CSV file into a Pandas DataFrame. 2) Create DataFrames from lists, dictionaries, and zipped data with or without specifying column names and indexes. 3) Add, update, and drop columns and rows from DataFrames. 4) Save DataFrames to CSV and Excel files.

Uploaded by

jeeshu048

Reading the CSV file

In [1]: import pandas as pd # Dataframe operations

import numpy as np # Math operations
import matplotlib.pyplot as plt # Diagrams / plots
import seaborn as sns # Diagrams / plots

In [ ]: # data set name: visadataset

# read csv file : comma-separated values
# extension : .csv
# you can read this using pandas package

# read excel file
# extension: .xlsx

In [2]: # path
#file location+filename+extension

path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi

In [3]: pd.read_csv(path)

Out[3]:
case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns
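As a small, self-contained sketch of what read_csv does, the snippet below parses an in-memory CSV instead of the visa file. The three sample rows and four column names are copied from the output above; the sample_csv name and the use of io.StringIO are assumptions made only so the example runs without the original file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the visa file: three rows and four columns
# copied from the output shown above.
sample_csv = io.StringIO(
    "case_id,continent,education_of_employee,has_job_experience\n"
    "EZYV01,Asia,High School,N\n"
    "EZYV02,Asia,Master's,Y\n"
    "EZYV03,Asia,Bachelor's,N\n"
)

visa = pd.read_csv(sample_csv)  # read_csv accepts a path or a file-like object
print(visa.shape)               # (number of rows, number of columns)
print(visa.head(2))             # first two rows only
```

Calling shape and head() right after reading is a quick way to confirm the file was parsed as expected.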

 
In [8]: # Exercise: read the bank data set
# data set name: bank
path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\ba
pd.read_csv(path, sep=';') # this file separates fields with ';' instead of ','

Out[8]:
age job marital education default balance housing loan contact day m

0 30 unemployed married primary no 1787 no no cellular 19

1 33 services married secondary no 4789 yes yes cellular 11

2 35 management single tertiary no 1350 yes no cellular 16

3 30 management married tertiary no 1476 yes yes unknown 3

4 59 blue-collar married secondary no 0 yes no unknown 5

... ... ... ... ... ... ... ... ... ... ...

4516 33 services married secondary no -333 yes no cellular 30

4517 57 self-employed married tertiary yes -3313 yes yes unknown 9

4518 57 technician married secondary no 295 no no cellular 19

4519 28 blue-collar married secondary no 1137 no no cellular 6

4520 44 entrepreneur single tertiary no 1136 yes yes cellular 3

4521 rows × 17 columns
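The sep argument matters because read_csv assumes a comma by default. The sketch below is a hedged illustration on a two-row excerpt typed in the same ';'-separated layout as the bank file (values taken from the output above; the raw, wrong and right names are hypothetical).

```python
import io
import pandas as pd

# Hypothetical excerpt in the same ';'-separated layout as the bank file.
raw = "age;job;marital\n30;unemployed;married\n33;services;married\n"

wrong = pd.read_csv(io.StringIO(raw))           # default sep=',': each line becomes one field
right = pd.read_csv(io.StringIO(raw), sep=";")  # correct separator: three columns

print(wrong.shape)
print(right.shape)
```

If a freshly read file shows a single wide column, a wrong separator is a likely cause.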

 

Create dataframes using lists

In [10]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]
name,age

Out[10]: (['Ramesh', 'Suresh', 'Sathish'], [30, 35, 40])

Step-1

Create the dataframe
In [11]: pd.DataFrame() # makes an empty dataframe

Out[11]:

Step-2

Provide the data
In [12]: pd.DataFrame(zip(name,age))

Out[12]:
0 1

0 Ramesh 30

1 Suresh 35

2 Sathish 40
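To see what pandas receives in this step, the sketch below prints the tuples that zip produces and the default labels of the resulting dataframe. It reuses the name and age lists from the notes; pairs and df_default are illustrative names only.

```python
import pandas as pd

name = ['Ramesh', 'Suresh', 'Sathish']
age = [30, 35, 40]

pairs = list(zip(name, age))  # zip pairs the i-th name with the i-th age
print(pairs)

df_default = pd.DataFrame(pairs)  # each tuple becomes one row
print(list(df_default.columns))   # no names given, so columns are numbered from 0
print(list(df_default.index))     # rows are numbered from 0 as well
```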

Step-3

Provide the column names
In [15]: #Provide columns
data=zip(name,age)
cols=['Name','Age']
pd.DataFrame(data,columns=cols)
#pd.DataFrame(zip(name,age),columns=['Name','Age'])

Out[15]:
Name Age

0 Ramesh 30

1 Suresh 35

2 Sathish 40

Step-4

Provide the index
In [16]: data=zip(name,age)
cols=['Name','Age']
ind=['A','B','C']
pd.DataFrame(data,
columns=cols,
index=ind)

Out[16]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40
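The cells above rebuild data=zip(name,age) each time. One likely reason is that a zip object is a one-shot iterator: once a dataframe has been built from it, it is empty. The sketch below illustrates this under that assumption and shows a reusable alternative with list(zip(...)); the names first, second, rows and again are hypothetical.

```python
import pandas as pd

name = ['Ramesh', 'Suresh', 'Sathish']
age = [30, 35, 40]
cols = ['Name', 'Age']

data = zip(name, age)                      # one-shot iterator of (name, age) tuples
first = pd.DataFrame(data, columns=cols)   # consumes the iterator
second = pd.DataFrame(data, columns=cols)  # iterator is already exhausted
print(len(first), len(second))

rows = list(zip(name, age))                # a list can be reused any number of times
again = pd.DataFrame(rows, columns=cols, index=['A', 'B', 'C'])
print(again)
```

Converting to a list first avoids silently getting an empty dataframe when a cell is re-run.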

Step-5

Add a new column

In [17]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]

data=zip(name,age)
cols=['Name','Age']
ind=['A','B','C']
df=pd.DataFrame(data,columns=cols,index=ind)
df

Out[17]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40

To add a new column, assign a list of values to df['new column'].

The list must contain exactly as many elements as the dataframe has rows.
For example:
city_names=['Hyd','Blr','Chennai']
df['city']=city_names

In [19]: city_names=['Hyd','Blr','Chennai']
df['city']=city_names
df

Out[19]:
Name Age city

A Ramesh 30 Hyd

B Suresh 35 Blr

C Sathish 40 Chennai
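The length rule can be checked directly. The sketch below, which rebuilds the same three-row dataframe, shows that pandas raises a ValueError when the list is too short. The 'country' column and its value are hypothetical additions, used only to show that a single value is repeated for every row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'], 'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

try:
    df['city'] = ['Hyd', 'Blr']         # only 2 values for 3 rows
except ValueError as err:
    print("Could not add column:", err)

df['city'] = ['Hyd', 'Blr', 'Chennai']  # 3 values for 3 rows: works
df['country'] = 'India'                 # hypothetical column: one value is repeated per row
print(df)
```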

Step-6

Update an existing column

Creating a new column and updating an existing one use the same syntax.
Assigning to a column name that already exists overwrites its values.

In [22]: df['Name']=['Swamy','Asif','Sathwik']
df

Out[22]:
Name Age city

A Swamy 30 Hyd

B Asif 35 Blr

C Sathwik 40 Chennai
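As a short sketch of the same idea, the example below assigns once to an existing name and once to a new name and compares the column lists. The 'Age_next' column is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'], 'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

before = list(df.columns)
df['Age'] = df['Age'] + 1       # existing name: the values are overwritten
after_update = list(df.columns)

df['Age_next'] = df['Age'] + 1  # new (hypothetical) name: a column is appended
print(before, after_update, list(df.columns))
print(df)
```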

Step-7
Drop a column
To drop a column, use the drop method.
The call below passes three arguments:
  the label to drop (here, a column name)
  axis: axis=1 refers to columns, axis=0 refers to rows
  inplace: inplace=True overwrites the existing dataframe;
  inplace=False (the default) leaves it unchanged and returns a new dataframe

In [23]: df.drop('city', # column name

axis=1, # Column
inplace=True) # overwrite the same

In [24]: df

Out[24]:
Name Age

A Swamy 30

B Asif 35

C Sathwik 40

In [25]: name=['Ramesh','Suresh','Sathish']
age=[30,35,40]

df=pd.DataFrame(zip(name,age),
columns=['Name','Age'],
index=['A','B','C'])

city_names=['Hyd','Blr','Chennai']
df['city']=city_names

df.drop('city', # column name
axis=1, # Column
inplace=True) # overwrite the same
df

Out[25]:
Name Age

A Ramesh 30

B Suresh 35

C Sathish 40
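The cells above use inplace=True. As a hedged sketch of the other option, the example below leaves inplace at its default, so drop returns a new dataframe and the original keeps its 'city' column. It also shows the equivalent columns= keyword; smaller and same are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'],
                   'Age': [30, 35, 40],
                   'city': ['Hyd', 'Blr', 'Chennai']},
                  index=['A', 'B', 'C'])

smaller = df.drop('city', axis=1)  # inplace defaults to False: a new dataframe is returned
same = df.drop(columns='city')     # keyword form, equivalent to axis=1

print(list(df.columns))            # the original still has 'city'
print(list(smaller.columns))
```

Keeping the original untouched makes it easier to re-run a cell without first rebuilding the dataframe.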

Step-8
Drop rows
In [26]: df.drop('C', # row label (index value)
axis=0, # row
inplace=True) # overwrite the same
df

Out[26]:
Name Age

A Ramesh 30

B Suresh 35
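A few variations on dropping rows are sketched below: several labels at once through the index= keyword, and what happens when a label does not exist. The label 'Z' and the result names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh', 'Sathish'], 'Age': [30, 35, 40]},
                  index=['A', 'B', 'C'])

two_left = df.drop('C', axis=0)       # drop one row by its index label
one_left = df.drop(index=['B', 'C'])  # keyword form, several labels at once

try:
    df.drop('Z', axis=0)              # 'Z' is not in the index
except KeyError as err:
    print("KeyError:", err)

unchanged = df.drop('Z', axis=0, errors='ignore')  # missing labels are skipped
print(two_left)
print(one_left)
```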

Step-9

Save the dataframe

In [27]: df.to_csv("output.csv")
# by default, the index is saved as an extra column
df.to_excel("output.xlsx") # writing .xlsx needs an Excel engine such as openpyxl installed

In [28]: # read output csv

pd.read_csv("output.csv")

Out[28]:
Unnamed: 0 Name Age

0 A Ramesh 30

1 B Suresh 35

Step-10

Remove the index

In [29]: # To avoid the extra "Unnamed: 0" column,
# pass index=False when saving
df.to_csv("output.csv",index=False)

In [30]: pd.read_csv("output.csv")

Out[30]:
Name Age

0 Ramesh 30

1 Suresh 35
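If the labels A and B should survive the round trip, an alternative is to keep the index when saving and tell read_csv which column holds it. The sketch below assumes an in-memory buffer (io.StringIO) in place of output.csv so that it runs without touching the disk; buffer and restored are hypothetical names.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Suresh'], 'Age': [30, 35]}, index=['A', 'B'])

buffer = io.StringIO()  # stands in for output.csv
df.to_csv(buffer)       # the index is written as the first, unnamed column
buffer.seek(0)          # rewind so the text can be read back

restored = pd.read_csv(buffer, index_col=0)  # use the first column as the index again
print(restored)
```

Use index=False when the index is just a row counter, and index_col=0 when the index carries meaningful labels.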
Create dataframes using a dictionary
In [32]: d1={"NAME":['Ramesh','Suresh','Sathish'],
"AGE":[30,35,40]}

pd.DataFrame(d1)

# No zip needed
# No separate column names needed: the dictionary keys become the column names

Out[32]:
NAME AGE

0 Ramesh 30

1 Suresh 35

2 Sathish 40
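The dictionary form also accepts an index. A related layout is a list of dictionaries with one dictionary per row. The sketch below builds the same table both ways under the assumption that the labels A, B and C from the list example are wanted; by_column, records and by_row are illustrative names.

```python
import pandas as pd

# One key per column, as in the cell above, plus an explicit index.
d1 = {"NAME": ['Ramesh', 'Suresh', 'Sathish'], "AGE": [30, 35, 40]}
by_column = pd.DataFrame(d1, index=['A', 'B', 'C'])

# One dictionary per row: the keys still become the column names.
records = [{"NAME": 'Ramesh', "AGE": 30},
           {"NAME": 'Suresh', "AGE": 35},
           {"NAME": 'Sathish', "AGE": 40}]
by_row = pd.DataFrame(records, index=['A', 'B', 'C'])

print(by_column)
print(by_row)
```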
