Pandas Assignment Version-2

The document provides an introduction to the Pandas library in Python, covering installation, data structures (Series and DataFrame), and various methods for creating these structures. It explains how Series can be used like a NumPy array and a dictionary, while DataFrames can be constructed from different data sources, including lists, dictionaries, and NumPy arrays. The document includes code examples to illustrate the concepts discussed.

Introduction to Pandas: Data Manipulation in Python

1. Installing Pandas
Pandas is a powerful Python library specifically designed for handling structured data. It
simplifies tasks like data cleaning, transformation, and analysis by providing user-friendly data
structures and functions. To begin using it, you first need to install the library.

Example:

Install Pandas with pip, Python's package manager, by running this command in a terminal (not inside Python):

pip install pandas

After installation, you can import it and check the version to confirm it's ready to use.

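import pandas as pd

# Print the installed version to confirm the import works
print(f"Pandas version: {pd.__version__}")

Output (the exact version number depends on your installation):

Pandas version: 2.1.4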

2. Understanding Pandas Data Structures


The two fundamental data types in Pandas are the Series and the DataFrame. A Series is a
one-dimensional array-like object capable of holding various data types, while a DataFrame is
a two-dimensional, table-like structure. Think of a DataFrame as a spreadsheet or SQL table.

Example:

A Series can represent a single column of data.

A DataFrame is a collection of Series objects, where each Series represents a column.

import pandas as pd

# Creating a Series for a list of daily temperatures
temperatures = pd.Series([25, 27, 24, 26])
print(temperatures)

# Creating a DataFrame for student data
student_data = pd.DataFrame({
    'Student_ID': [101, 102],
    'Score': [85, 92]
})
print(student_data)

Output:

0    25
1    27
2    24
3    26
dtype: int64
   Student_ID  Score
0         101     85
1         102     92
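To check the claim that a DataFrame is built from Series, here is a minimal sketch that rebuilds the student_data frame from above and pulls out one column. Selecting a single label with bracket notation returns a Series whose index is the DataFrame's row index.

```python
import pandas as pd

student_data = pd.DataFrame({
    'Student_ID': [101, 102],
    'Score': [85, 92]
})

# Selecting one column returns a Series that shares the DataFrame's row index
score_column = student_data['Score']
print(type(score_column))
print(score_column.index.equals(student_data.index))
```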

3. Different Ways to Create a Series Object


A Pandas Series can be constructed from several different types of data sources, making it
highly versatile.

Example:

import pandas as pd
import numpy as np

# From a simple list
fruits = pd.Series(["apple", "banana", "orange"])

# From a NumPy array
np_array = np.array([10, 20, 30])
numbers = pd.Series(np_array)

# From a Python dictionary
# Keys become the index labels, and values become the data
product_prices = pd.Series({"Laptop": 1200, "Mouse": 25, "Keyboard": 75})

# From a single scalar value, repeated for a given index
# The value 50 is assigned to each index label
single_value_series = pd.Series(50, index=["item1", "item2", "item3"])

print(fruits)
print(numbers)
print(product_prices)
print(single_value_series)

Output:

0     apple
1    banana
2    orange
dtype: object
0    10
1    20
2    30
dtype: int64
Laptop      1200
Mouse         25
Keyboard      75
dtype: int64
item1    50
item2    50
item3    50
dtype: int64

4. Series as a Specialized NumPy Array


A Series can be seen as an enhanced version of a NumPy array. While it shares core features
like vectorized operations, it adds the crucial element of a labeled index, which allows for
more intuitive data access and alignment.

Example:

The .values attribute returns the Series data as a NumPy array (the pandas documentation now recommends .to_numpy() or .array instead), while the .index attribute holds the added labels.

import pandas as pd

gpa_scores = pd.Series([3.8, 3.5, 4.0], index=["A-1", "A-2", "A-3"])

# The core values (like a NumPy array)
print(f"Series values: {gpa_scores.values}")

# The labeled index (the extra feature)
print(f"Series index: {gpa_scores.index}")

Output:

Series values: [3.8 3.5 4. ]
Series index: Index(['A-1', 'A-2', 'A-3'], dtype='object')
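The paragraph above mentions vectorized operations and alignment; the following sketch, using two small made-up Series, shows both. Element-wise arithmetic works as it would on a NumPy array, but when two Series are combined, pandas matches values by index label rather than by position, and labels found in only one of them produce NaN.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["y", "z", "w"])

# Vectorized arithmetic is applied element by element, as with a NumPy array
doubled = a * 2

# Adding two Series matches values by label, not position;
# "x" and "w" appear in only one Series, so their results are NaN
total = a + b

# .to_numpy() returns the plain NumPy array behind the result
raw = total.to_numpy()
print(total)
```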

5. Series as a Specialized Dictionary


A Series acts similarly to a Python dictionary, where the index labels serve as keys and the
data values are the associated values. This allows for quick and efficient data retrieval using
familiar dictionary-style syntax.

Example:

You can access data points in a Series using their index label, just as you would use a key to
look up a value in a dictionary.

import pandas as pd

city_populations = pd.Series([1000000, 250000, 500000], index=["Tokyo", "London", "Paris"])

# Accessing the population of "London"
print(f"Population of London: {city_populations['London']}")

Output:

Population of London: 250000
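Other dictionary idioms carry over as well. A short sketch (the "Berlin" lookup is an illustrative missing label): in tests membership in the index, not in the values; .get() returns a default instead of raising a KeyError; and .items() yields (label, value) pairs.

```python
import pandas as pd

city_populations = pd.Series([1000000, 250000, 500000], index=["Tokyo", "London", "Paris"])

# "in" checks the index labels (the "keys"), not the stored values
has_paris = "Paris" in city_populations   # True
has_value = 250000 in city_populations    # False: 250000 is a value, not a label

# .get() returns the default instead of raising KeyError for a missing label
berlin = city_populations.get("Berlin", "not found")

# .items() yields (label, value) pairs, like dict.items()
for city, population in city_populations.items():
    print(city, population)
```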

6. Understanding DataFrame Objects


A Pandas DataFrame is the most widely used data structure in Pandas. It’s a two-dimensional,
mutable table of data with labeled axes (rows and columns). It’s essentially a container for
multiple Series objects that share the same index.

Example:

import pandas as pd

# Creating a DataFrame from a dictionary of lists
# Each list becomes a column in the table
employee_data = {
    'Employee_ID': [1, 2, 3],
    'Department': ['IT', 'HR', 'Finance']
}

employee_df = pd.DataFrame(employee_data)
print(employee_df)

Output:

   Employee_ID Department
0            1         IT
1            2         HR
2            3    Finance
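Because the axes are labeled and the table is mutable, you can inspect the labels and grow the table in place. A minimal sketch; the 'Years' column and its values are hypothetical:

```python
import pandas as pd

employee_df = pd.DataFrame({
    'Employee_ID': [1, 2, 3],
    'Department': ['IT', 'HR', 'Finance']
})

# Row labels live in .index, column labels in .columns
print(employee_df.index)
print(employee_df.columns)

# DataFrames are mutable: assigning to a new label adds a column
# ('Years' is a hypothetical column with made-up values)
employee_df['Years'] = [5, 2, 8]
print(employee_df.shape)  # (3, 3)
```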

7. DataFrame as a Specialized NumPy Array


Just as a Series extends a NumPy array, a DataFrame can be viewed as an extended
two-dimensional NumPy array. It not only contains a grid of data but also provides labels for
both rows and columns, making it much easier to work with.

Example:

You can create a DataFrame from a NumPy array and then add meaningful labels for the
columns and rows.

import numpy as np
import pandas as pd

# A 2x3 NumPy array
np_matrix = np.array([[10, 20, 30], [40, 50, 60]])

# Creating a DataFrame with column and row labels
df_from_array = pd.DataFrame(
    np_matrix,
    columns=["Col A", "Col B", "Col C"],
    index=["Row 1", "Row 2"],
)
print(df_from_array)

Output:

       Col A  Col B  Col C
Row 1     10     20     30
Row 2     40     50     60
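The NumPy connection also works in reverse. In this sketch, .to_numpy() strips the labels and returns the underlying 2D grid, arithmetic applies to every cell at once while keeping the labels, and .T swaps rows and columns together with their labels.

```python
import numpy as np
import pandas as pd

np_matrix = np.array([[10, 20, 30], [40, 50, 60]])
df_from_array = pd.DataFrame(
    np_matrix,
    columns=["Col A", "Col B", "Col C"],
    index=["Row 1", "Row 2"],
)

# .to_numpy() drops the labels and returns the 2D grid
grid = df_from_array.to_numpy()

# Whole-frame arithmetic is vectorized, and the labels are kept
scaled = df_from_array / 10

# .T transposes the data, swapping row and column labels too
transposed = df_from_array.T
print(transposed)
```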

8. DataFrame as a Specialized Dictionary


A DataFrame can also be understood as a dictionary where the keys are the column names
and the values are the corresponding Series objects. This means you can access a column
using dictionary-like syntax.

Example:

Accessing a specific column from a DataFrame is straightforward using bracket notation.

import pandas as pd

dataset = pd.DataFrame({
    'Product': ['Phone', 'Tablet'],
    'Price': [800, 450]
})

# Accessing the 'Price' column
prices = dataset['Price']
print(f"The prices are:\n{prices}")

Output:

The prices are:
0    800
1    450
Name: Price, dtype: int64
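Other dictionary-style operations behave the same way on columns. A sketch (the 'Price_With_Tax' column and its 10% rate are illustrative assumptions): in and .keys() work on the column labels, assigning to a new key adds a column, and del removes one.

```python
import pandas as pd

dataset = pd.DataFrame({
    'Product': ['Phone', 'Tablet'],
    'Price': [800, 450]
})

# "in" and .keys() work on column labels, like dictionary keys
print('Price' in dataset)     # True
print(list(dataset.keys()))   # ['Product', 'Price']

# Assigning to a new key adds a column derived from an existing one
# ('Price_With_Tax' and the 10% rate are illustrative assumptions)
dataset['Price_With_Tax'] = dataset['Price'] * 1.10
print(dataset)

# del removes a column, just like deleting a dictionary entry
del dataset['Price_With_Tax']
```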

9. Constructing DataFrame Objects (Multiple Methods)

DataFrames are flexible and can be created from a wide variety of data sources. Here are some of the most common methods.

(a) From a Single Series

A single Series can be directly converted into a DataFrame. The Series' index becomes the DataFrame's row index, its values become a single column, and its name (if it has one) becomes that column's label.
import pandas as pd

scores_series = pd.Series([95, 88, 72], name="Exam_Scores")
scores_df = pd.DataFrame(scores_series)
print(scores_df)

Output:

   Exam_Scores
0           95
1           88
2           72
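Series.to_frame() performs the same conversion more explicitly. The sketch below also shows what happens when the Series has no name: pandas falls back to the default column label 0, unless a name is passed to to_frame().

```python
import pandas as pd

scores_series = pd.Series([95, 88, 72], name="Exam_Scores")

# .to_frame() is an equivalent, more explicit conversion
scores_df = scores_series.to_frame()

# An unnamed Series gets the default column label 0 ...
unnamed_df = pd.DataFrame(pd.Series([95, 88, 72]))

# ... unless a name is supplied at conversion time
renamed_df = pd.Series([95, 88, 72]).to_frame(name="Exam_Scores")
print(unnamed_df)
```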

(b) From a List of Dictionaries


This is a very common method, where each dictionary in the list represents a single row, and
the dictionary keys become the column names.

import pandas as pd

project_members = [
    {"Name": "Alex", "Role": "Developer"},
    {"Name": "Ben", "Role": "Designer"},
    {"Name": "Chris", "Role": "Manager"}
]
project_df = pd.DataFrame(project_members)
print(project_df)

Output:

    Name       Role
0   Alex  Developer
1    Ben   Designer
2  Chris    Manager

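The dictionaries do not need identical keys. In this sketch with made-up records, the resulting columns are the union of all keys, and any key missing from a given dictionary becomes NaN in that row.

```python
import pandas as pd

# Hypothetical records: the second has no "Role" key but adds a "Team" key
records = [
    {"Name": "Alex", "Role": "Developer"},
    {"Name": "Ben", "Team": "UX"},
]
records_df = pd.DataFrame(records)
print(records_df)
```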
(c) From a Dictionary of Series Objects
By using a dictionary whose keys are column names and whose values are Series objects, you can build a DataFrame whose columns are aligned on the Series' index labels: each row collects the values that share the same label.

import pandas as pd

# Creating two Series with a shared index
units = pd.Series([150, 200], index=["Q1", "Q2"])
revenue = pd.Series([5000, 7500], index=["Q1", "Q2"])

sales_report = pd.DataFrame({"Units_Sold": units, "Total_Revenue": revenue})
print(sales_report)

Output:

    Units_Sold  Total_Revenue
Q1         150           5000
Q2         200           7500
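Alignment matters when the Series do not share an identical index. In this sketch (the Q3 figures are made up), the resulting row index is the union of both indexes, and each column holds NaN wherever its Series has no value for that label.

```python
import pandas as pd

units = pd.Series([150, 200], index=["Q1", "Q2"])
# This hypothetical Series lacks "Q1" and adds "Q3"
revenue = pd.Series([7500, 9000], index=["Q2", "Q3"])

# Rows are aligned by label; missing combinations become NaN
sales_report = pd.DataFrame({"Units_Sold": units, "Total_Revenue": revenue})
print(sales_report)
```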

(d) From a Two-Dimensional NumPy Array


A 2D NumPy array can be used as the foundation for a DataFrame. You can then add column
and row labels for better readability.

import numpy as np
import pandas as pd

data_array = np.array([[1, 2, 3], [4, 5, 6]])
dataset_df = pd.DataFrame(data_array, columns=["A", "B", "C"])
print(dataset_df)

Output:

   A  B  C
0  1  2  3
1  4  5  6
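A nested Python list, with one inner list per row, gives the same result as the 2D array, and row labels can be passed through index just as in section 7. A sketch; the dtype is pinned to int64 so both frames compare equal on every platform, and 'r1' and 'r2' are illustrative labels:

```python
import numpy as np
import pandas as pd

data_array = np.array([[1, 2, 3], [4, 5, 6]], dtype="int64")
from_array = pd.DataFrame(data_array, columns=["A", "B", "C"])

# A nested Python list (one inner list per row) builds the same frame
from_list = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

# Row labels are supplied the same way as column labels
labeled = pd.DataFrame(data_array, columns=["A", "B", "C"], index=["r1", "r2"])
print(labeled)
```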

(e) From a NumPy Structured Array


This method is useful when you have data with a mix of data types (e.g., numbers and strings)
that you want to organize into a DataFrame.

import numpy as np
import pandas as pd

# A structured array with a defined data type for each field
employee_info = np.array([
    (101, "John", 60000),
    (102, "Jane", 75000)
], dtype=[("ID", "i4"), ("Name", "U10"), ("Salary", "i4")])

employee_info_df = pd.DataFrame(employee_info)
print(employee_info_df)

Output:

    ID  Name  Salary
0  101  John   60000
1  102  Jane   75000
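If one field of the structured array is a natural identifier, DataFrame.from_records can use it as the row index instead of the default 0, 1, ... index. A minimal sketch using the index parameter with the ID field:

```python
import numpy as np
import pandas as pd

employee_info = np.array([
    (101, "John", 60000),
    (102, "Jane", 75000)
], dtype=[("ID", "i4"), ("Name", "U10"), ("Salary", "i4")])

# from_records promotes the "ID" field to the row index;
# the remaining fields become the columns
employee_by_id = pd.DataFrame.from_records(employee_info, index="ID")
print(employee_by_id)
```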
