Pandas and Python
In this tutorial, you will learn what a DataFrame is, how to create it from different sources,
how to export it to different formats, and how to manipulate its data.
Install pandas
You can install pandas in Python using pip. Run the following command in cmd:
pip install pandas
If you use Anaconda, you can install it with conda instead:
conda install pandas
<class 'pandas.core.frame.DataFrame'>
This result is called a DataFrame. It is the basic unit of pandas, and we will work with it
until the end of the tutorial.
The DataFrame is a labeled two-dimensional structure where we can store
data of different types. A DataFrame is similar to a SQL table or an Excel
spreadsheet.
import pandas
Now call the read_csv() function as follows:
pandas.read_csv('Book1.csv')
Book1.csv has the following content:
The code will generate the following DataFrame:
import pandas
pandas.read_csv('myFile.txt')
The myFile.txt has the following format:
The output of the previous code will be:
This text file is treated like a CSV file because its elements are separated
by commas. The file can also use another delimiter, such as a semicolon or a
tab.
Suppose we have a tab delimiter and the file looks like this:
If we read this file without specifying the delimiter, we will have the following result:
To define the tab character as the delimiter, pass the delimiter argument like
this:
pandas.read_csv('myFile.txt', delimiter='\t')
Now the output will be:
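As a self-contained illustration of the delimiter argument, here is a minimal sketch. The file contents are hypothetical, and io.StringIO stands in for myFile.txt so the example runs without any file on disk.

```python
import io
import pandas

# Hypothetical file contents (assumed for illustration only).
comma_text = "name,age\nJames,18\nJason,20\n"
tab_text = "name\tage\nJames\t18\nJason\t20\n"

# Comma-separated text is parsed with the default delimiter.
comma_df = pandas.read_csv(io.StringIO(comma_text))

# Without the delimiter argument, each tab-separated line ends up in one column.
wrong_df = pandas.read_csv(io.StringIO(tab_text))

# With delimiter='\t', the columns are split correctly.
tab_df = pandas.read_csv(io.StringIO(tab_text), delimiter='\t')

print(wrong_df.shape)
print(tab_df.shape)
```

Comparing the two shapes shows why the delimiter matters: the first read produces a single column, the second produces two.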
import sqlite3
import pandas
con = sqlite3.connect('mydatabase.db')
When you run the above code, the output will be as follows:
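Reading a table into a DataFrame can be sketched as follows. This is only an illustration under assumptions: it uses an in-memory SQLite database instead of mydatabase.db, and the Employee columns (name, age, job) and their rows are hypothetical.

```python
import sqlite3
import pandas

# In-memory database so the sketch does not depend on an existing file.
con = sqlite3.connect(':memory:')

# Hypothetical Employee table; column names and rows are assumptions.
con.execute("CREATE TABLE Employee (name TEXT, age INTEGER, job TEXT)")
con.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("James", 18, "Assistant"), ("Jason", 20, "Manager")],
)
con.commit()

# read_sql_query() runs the query and returns the result as a DataFrame.
employees = pandas.read_sql_query("SELECT * FROM Employee", con)
print(employees)

con.close()
```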
Select columns
Let's assume we have three columns in the Employee table like this:
To select columns from the table, we will run the following query:
The result will be the following:
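A query that selects only some columns might look like the sketch below. The Employee table is rebuilt in memory here, and its column names (name, age, job) are assumptions made for the example.

```python
import sqlite3
import pandas

con = sqlite3.connect(':memory:')

# Hypothetical three-column Employee table (assumed schema and rows).
con.execute("CREATE TABLE Employee (name TEXT, age INTEGER, job TEXT)")
con.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("James", 18, "Assistant"), ("Jason", 20, "Manager")],
)
con.commit()

# Name the wanted columns in the SELECT list instead of using *.
selected = pandas.read_sql_query("SELECT name, job FROM Employee", con)
print(selected)

con.close()
```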
import pandas
df = pandas.DataFrame(frame_data)
In this code, we create a DataFrame with three columns and three rows using the
pandas DataFrame() constructor. The result will be as follows:
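For reference, a complete version of this step could look like the following sketch. The contents of frame_data are assumed: the names and the first row match values mentioned in this tutorial, while the remaining ages and jobs are made up for illustration.

```python
import pandas

# Assumed sample data: a dictionary mapping column names to lists of values.
frame_data = {
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
}

df = pandas.DataFrame(frame_data)

print(type(df))
print(df.shape)  # (number of rows, number of columns)
```

Each dictionary key becomes a column, and each list supplies that column's values, so all lists must have the same length.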
df.loc[df['name'] == 'Jason']
df.loc[] (DataFrame.loc[]) is a label-based indexer that can be used to access rows or
columns by label or by a boolean array. In the previous code, the comparison produces a
boolean array, and df.loc[] returns the rows where the name is Jason.
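The two steps involved, building the boolean mask and passing it to df.loc[], can be seen separately in this minimal sketch. The data is assumed sample data.

```python
import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
})

# Step 1: the comparison yields one True/False value per row.
mask = df['name'] == 'Jason'

# Step 2: df.loc[] keeps only the rows where the mask is True.
jason_rows = df.loc[mask]

# A second argument restricts the result to specific columns.
jason_job = df.loc[mask, ['name', 'job']]

print(jason_rows)
```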
>>> df = pandas.DataFrame(frame_data)
We create a DataFrame. Now we are going to access a row using df.loc[]:
>>> df.loc[1]
As you can see, we retrieved a row. We can do the same using the slicing
operator in the following way:
>>> df[1:2]
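The two approaches return the same row but in different containers, which the sketch below makes visible. The data is assumed sample data.

```python
import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
})

# df.loc[1] returns the row labeled 1 as a Series.
row_as_series = df.loc[1]

# df[1:2] slices by position and returns a one-row DataFrame.
row_as_frame = df[1:2]

print(type(row_as_series))
print(type(row_as_frame))
```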
df.dtypes
The output will be:
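As a small illustration, the following sketch inspects the column types of a DataFrame built from assumed sample data.

```python
import pandas

# Assumed sample data with an integer, a float, and a text column.
df = pandas.DataFrame({
    'age': [18, 20, 22],
    'salary': [1000.5, 2000.0, 1500.25],
    'name': ['James', 'Jason', 'Rogers'],
})

# dtypes is an attribute (no parentheses) holding one type per column.
column_types = df.dtypes
print(column_types)
```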
>>> df.apply(np.sqrt)
The output will be as follows:
>>> df.apply(np.sum)
To apply the function to a specific column, you can specify the column as
follows:
>>> df['A'].apply(np.sqrt)
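The three calls above can be tried together in this minimal sketch. The column values are assumed, and np refers to NumPy imported under its usual alias.

```python
import numpy as np
import pandas

# Assumed sample data made of perfect squares so the roots are easy to check.
df = pandas.DataFrame({'A': [4, 9, 16], 'B': [1, 25, 36]})

# np.sqrt is applied to every value and returns a DataFrame of the same shape.
roots = df.apply(np.sqrt)

# np.sum reduces each column to a single value and returns a Series.
totals = df.apply(np.sum)

# Applying the function to one column only.
roots_a = df['A'].apply(np.sqrt)

print(roots)
print(totals)
```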
>>> df = pandas.DataFrame(frame_data)
Now to sort the values:
>>> df.sort_values(by=['A'])
The output will be:
The sort_values() method has a required 'by' parameter. In the previous code, the
values are sorted by column A. To sort by multiple columns, pass a list with several
column names to 'by'. To sort in descending order, set ascending to False:
>>> df.sort_values(by=['A'], ascending=False)
The output will be:
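The sorting options can be compared side by side in the sketch below. The data, including column B, is assumed for illustration.

```python
import pandas

# Assumed sample data; column A contains a tie so the second key matters.
df = pandas.DataFrame({'A': [23, 12, 23], 'B': [2, 5, 1]})

# Ascending sort on a single column.
by_a = df.sort_values(by=['A'])

# Sort by A first, then by B to order rows that share the same A.
by_a_then_b = df.sort_values(by=['A', 'B'])

# Descending sort on A.
descending = df.sort_values(by=['A'], ascending=False)

print(by_a_then_b)
```

Note that sort_values() returns a new DataFrame and leaves df unchanged.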
>>> df = pandas.DataFrame(frame_data)
Here we create a DataFrame with a duplicate row. To check for duplicate rows in
the DataFrame, use the DataFrame's duplicated() method.
>>> df.duplicated()
The result will be:
It can be seen that the last row is a duplicate. To delete this row, execute the following
line of code:
>>> df.drop_duplicates()
Now the result will be:
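Both steps can be reproduced with the following minimal sketch, which uses assumed sample data whose last row repeats the first.

```python
import pandas

# Assumed sample data; the last row is an exact copy of the first one.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'James'],
    'age': [18, 20, 18],
})

# duplicated() marks each row that repeats an earlier row.
flags = df.duplicated()

# drop_duplicates() returns a new DataFrame without the repeated rows.
deduplicated = df.drop_duplicates()

print(flags)
print(deduplicated)
```

The original df still has three rows afterwards; assign the result back to df if the change should be kept.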
>>> df = pandas.DataFrame(frame_data)
Here you can see that Jason appears twice. If you want to remove duplicates by column,
just pass the column name as follows:
>>> df.drop_duplicates(['name'])
The result will be as follows:
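Removing duplicates by column can be sketched as follows. The data is assumed: two rows share the name Jason but differ in age, so they are not duplicates as whole rows.

```python
import pandas

# Assumed sample data; Jason appears twice with different ages.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Jason'],
    'age': [18, 20, 25],
})

# Comparing whole rows finds no duplicates.
whole_row_duplicates = df.duplicated().any()

# Restricting the comparison to 'name' keeps the first row for each name.
unique_names = df.drop_duplicates(subset=['name'])

print(unique_names)
```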
Delete a column
To remove a whole column or row, we can use the drop() method of the DataFrame
specifying the name of the column or row.
>>> df = pandas.DataFrame(frame_data)
To delete the row with index 0, where the name is James, the age is 18 and the job
is assistant, use the following code:
>>> df.drop([0])
We are going to create a DataFrame where the indices are the names:
Now we can delete a row with a certain value. For example, if we want to delete a
row where the name is Rogers, then the code will be:
>>> df.drop(['Rogers'])
The output will be:
If you want to delete the last row of the DataFrame and do not know the total number of rows,
you can use negative indexing as shown below:
>>> df.drop(df.index[-1])
df.index[-1] refers to the last row, so it is the one that is deleted. Similarly, df.index[-2]
refers to the second-to-last row; to delete the last 2 rows, pass the slice df.index[-2:].
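The different ways of calling drop() can be put together in one minimal sketch. The data is assumed sample data, and set_index() is used here as one possible way to make the names the index.

```python
import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
})

# Drop a row by its integer label.
without_first = df.drop([0])

# Use the names as the index, then drop a row by name.
by_name = df.set_index('name')
without_rogers = by_name.drop(['Rogers'])

# Drop the last row, and the last two rows, without knowing the row count.
without_last = df.drop(df.index[-1])
without_last_two = df.drop(df.index[-2:])

# Drop a whole column instead of a row.
without_job = df.drop(columns=['job'])

print(without_rogers)
```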
Sum a column
You can use the sum() method of the DataFrame to sum the elements of the column.
>>> df = pandas.DataFrame(frame_data)
Now to sum the elements of column A, use the following line of code:
>>> df['A'].sum()
You can also use the apply() method of the DataFrame and pass NumPy's sum
function to sum the values.
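Both ways of summing can be compared in this minimal sketch. The values of column A follow the example used later in this tutorial (23, 12, 12); column B is assumed.

```python
import numpy as np
import pandas

# Column B is an assumption added so that apply() has two columns to reduce.
df = pandas.DataFrame({'A': [23, 12, 12], 'B': [1, 2, 3]})

# Sum of a single column.
total_a = df['A'].sum()

# apply() with np.sum returns one total per column.
column_totals = df.apply(np.sum)

print(total_a)
print(column_totals)
```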
>>> df = pandas.DataFrame(frame_data)
To count the unique values in column A:
>>> df['A'].nunique()
As you can see, column A has only 2 unique values, 23 and 12; the other 12 is a
duplicate, which is why the output is 2.
If you want to count all the non-missing values in a column, you can use the count() method as
follows:
>>> df['A'].count()
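The difference between the two counts is shown in the sketch below, using the column values described above. The Series with a missing value is an added assumption to show how count() treats it.

```python
import pandas

df = pandas.DataFrame({'A': [23, 12, 12]})

# Number of distinct values in the column.
unique_count = df['A'].nunique()

# Number of non-missing values in the column.
value_count = df['A'].count()

# Assumed extra example: a missing value is not counted.
with_missing = pandas.Series([23, None, 12])
non_missing_count = with_missing.count()

print(unique_count, value_count, non_missing_count)
```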
Select a subset of rows
To select a subset of a DataFrame, you can use brackets.
For example, we have a DataFrame that contains some integers. We can select
a subset of its rows like this:
df[start:stop]
The starting point will be included in the subset, but the stopping point is not included.
For example, to select 3 rows starting from the first row, you will write:
>>> df[0:3]
The output will be:
That code starts at the first row, which is position 0, and selects the rows up to, but not including, position 3.
To select or retrieve a subset with the last row, use negative indexing:
>>> df[-1:]
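Both slices can be checked with this minimal sketch, which uses an assumed DataFrame of five integers.

```python
import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({'A': [10, 20, 30, 40, 50]})

# Rows at positions 0, 1 and 2; position 3 is excluded.
first_three = df[0:3]

# A negative start counts from the end, so this is the last row only.
last_row = df[-1:]

print(first_three)
print(last_row)
```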
Write to Excel
To write a DataFrame to an Excel sheet, we can use the to_excel() method.
To write .xlsx files, pandas needs an Excel engine such as the openpyxl module,
so it must be installed first.
>>> df = pandas.DataFrame(frame_data)
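A possible complete export is sketched below. The data and the file name output.xlsx are assumptions, and the code first checks whether openpyxl is installed so that it still runs when the module is missing.

```python
import importlib.util
import os
import tempfile

import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
})

# Only attempt the export when the openpyxl engine is available.
have_openpyxl = importlib.util.find_spec('openpyxl') is not None

if have_openpyxl:
    # Hypothetical file name inside a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), 'output.xlsx')
    # index=False leaves the row labels out of the sheet.
    df.to_excel(path, index=False, engine='openpyxl')
    # Read the file back to confirm what was written.
    round_trip = pandas.read_excel(path, engine='openpyxl')
    print(round_trip)
else:
    print('openpyxl is not installed; skipping the Excel export')
```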
import sqlite3
import pandas
con = sqlite3.connect('mydatabase.db')
df = pandas.DataFrame(frame_data)
df.to_sql('users', con)
In this code, we create a connection to a sqlite3 database. Then we create a
DataFrame with three rows and three columns.
Finally, we use the to_sql method of our DataFrame (df) and pass the name of
the table where the data will be stored along with the connection object.
The SQL database will look like this:
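To check what was stored, the table can be read back into a DataFrame, as in this minimal sketch. It uses an in-memory database instead of mydatabase.db, and the data is assumed sample data.

```python
import sqlite3
import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({
    'name': ['James', 'Jason', 'Rogers'],
    'age': [18, 20, 22],
    'job': ['Assistant', 'Manager', 'Clerk'],
})

# In-memory database so the sketch leaves no file behind.
con = sqlite3.connect(':memory:')

# index=False stores only the three data columns, not the row labels.
df.to_sql('users', con, index=False)

# Read the table back to verify its contents.
stored = pandas.read_sql_query('SELECT * FROM users', con)
print(stored)

con.close()
```

By default to_sql() raises an error if the table already exists; the if_exists parameter ('fail', 'replace' or 'append') controls that behavior.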
Write to JSON
You can use the DataFrame's to_json() method to write to a JSON file.
df.to_json("myJson.json")
In this line of code, the name of the JSON file is passed as an argument. The
DataFrame will be stored in the JSON file. The file will contain the following content:
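The structure of the written file can be inspected with the standard json module, as in the sketch below. The data is assumed, and the file is written to a temporary directory.

```python
import json
import os
import tempfile

import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({'name': ['James', 'Jason'], 'age': [18, 20]})

path = os.path.join(tempfile.mkdtemp(), 'myJson.json')
df.to_json(path)

# Load the file back as plain Python dictionaries.
with open(path) as handle:
    data = json.load(handle)

# By default each column maps row labels (as strings) to values.
print(data)
```

The orient parameter of to_json() changes this layout; for example, orient='records' writes a list with one object per row.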
Write to an HTML file
You can use the DataFrame's to_html() method to create an HTML file with the
content of the DataFrame.
>>> df.to_html("myhtml.html")
The resulting file will have the following content:
When you open the HTML file in the browser, it will look like this:
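The export can be tried with this minimal sketch. The data is assumed, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas

# Assumed sample data for illustration.
df = pandas.DataFrame({'name': ['James', 'Jason'], 'age': [18, 20]})

path = os.path.join(tempfile.mkdtemp(), 'myhtml.html')
df.to_html(path)

# The file contains a single <table> element with the DataFrame's rows.
with open(path) as handle:
    content = handle.read()

# Called without a file name, to_html() returns the markup as a string.
markup = df.to_html()

print(content)
```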