Unit 3
Using Arrays with NumPy: Vectors and operations,
vector properties and characteristics,
Pandas
Python Vectors and Various
Operations Using NumPy
• A Python vector, in layman's language, is a one-dimensional
array of numbers. Arrays and vectors are both basic data
structures that hold numerous values in an organized manner.
• Python lists are dynamic arrays: both their contents and their
size can change. A NumPy array, the usual way to represent a
vector, also has mutable elements, but its size is fixed once
it is created.
• A Python vector can be represented as: v = [v1, v2, v3].
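To make the difference between a list and a NumPy vector concrete, here is a minimal sketch. The variable names are illustrative only.

```python
import numpy as np

# A plain Python list is a dynamic array: it can grow after creation.
values = [1, 5, 6]
values.append(7)

# A NumPy vector has a fixed size, but its elements can be changed in place.
v = np.array([1, 5, 6])
v[0] = 10

# "Growing" a NumPy vector actually builds a new array; v is left unchanged.
w = np.append(v, 7)

print(values)            # [1, 5, 6, 7]
print(v)                 # [10  5  6]
print(w)                 # [10  5  6  7]
print(v.ndim, v.shape)   # 1 (3,)
```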
Python Vector operations using
NumPy library:
• One-dimensional arrays can be created in Python with the
built-in array module (import array). The array module cannot
create multi-dimensional arrays.
• Various operations are performed on vectors. The basic ones
are addition, subtraction, multiplication, and division. Others
include the dot product and cross product of two vectors.
• NumPy is a library that provides all of these functions and
performs them faster on arrays and vectors. The NumPy module
is therefore imported into the Python program for the required
vector operations.
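Multiplication and division of two NumPy vectors work element by element, as the following short sketch shows. The values are chosen only for illustration.

```python
import numpy as np

x = np.array([2.0, 6.0, 12.0])
y = np.array([1.0, 3.0, 4.0])

product = x * y    # element-wise: [2*1, 6*3, 12*4]
quotient = x / y   # element-wise: [2/1, 6/3, 12/4]

print(product)     # [ 2. 18. 48.]
print(quotient)    # [2. 2. 3.]
```

Note that x * y is not the dot product; it returns a vector of the same length as its inputs.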
Scalar and Vector Mathematical Definition
• Mathematicians and scientists call a quantity which depends on
direction a vector quantity.
• A quantity which does not depend on direction is called a
scalar quantity.
• Vector quantities have two characteristics, a magnitude and a
direction.
• Scalar quantities have only a magnitude.
• Scalar example: volume
• Vector example: velocity
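The two characteristics of a vector quantity can be computed with NumPy. The sketch below assumes a hypothetical two-dimensional velocity with components (3, 4) and measures its direction as the angle from the x-axis.

```python
import numpy as np

# Hypothetical velocity components along x and y (illustrative values).
velocity = np.array([3.0, 4.0])

# Magnitude: the length of the vector, sqrt(3^2 + 4^2).
magnitude = np.linalg.norm(velocity)

# Direction: angle from the positive x-axis, in degrees.
angle_deg = np.degrees(np.arctan2(velocity[1], velocity[0]))

print(magnitude)             # 5.0
print(round(angle_deg, 2))   # 53.13
```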
Importing NumPy for Python Vector:
• NumPy is not available by default. If it has not been
installed, writing import numpy in a program under any
Python 3 version raises an error (ModuleNotFoundError).
• We need to install it manually. Python has a handy tool for
this called pip ("pip installs packages"): pip install numpy.
Vector definition in python:
# create a vector
from numpy import array
vec = array([1, 5, 6])
print(vec)
Output
[1 5 6]
Addition of vectors in Python:
# addition of vectors
from numpy import array
x = array([1, 5, 6])
print(x)
y = array([1, 5, 6])
print(y)
z = x + y
print(z)
Output
[1 5 6]
[1 5 6]
[ 2 10 12]
Subtraction of vectors in Python:
# subtraction of vectors
from numpy import array
x = array([1, 5, 6])
print(x)
y = array([1, 4, 6])
print(y)
z = x - y
print(z)
Output
[1 5 6]
[1 4 6]
[0 1 0]
Python Vector Dot Product:
# vector dot product
from numpy import array
x = array([1, 2, 3])
print(x)
y = array([1, 2, 5])
print(y)
# 1*1 + 2*2 + 3*5 = 20
z = x.dot(y)
print(z)
Output
[1 2 3]
[1 2 5]
20
Python Vector Cross Product:
# vector cross product
import numpy as np
x = np.array([1, 2, 3])
print(x)
y = np.array([1, 2, 5])
print(y)
z = np.cross(x, y)
print(z)
Output
[1 2 3]
[1 2 5]
[ 4 -2  0]
Multiplication of a Python Vector with
a scalar:
# scalar vector multiplication
from numpy import array
a = array([1, 2, 3])
print(a)
b = 2.0
print(b)
# every component of a is multiplied by b
c = b * a
print(c)
Output
[1 2 3]
2.0
[2. 4. 6.]
Unit Vector of Python Vector:
# unit vector
from numpy import array
x = array([1, 2, 3])
# divide the vector by its length (square root of the sum of squares)
x_hat = x / (x**2).sum()**0.5
print(x_hat)
Output
[0.26726124 0.53452248 0.80178373]
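The same unit vector can also be obtained with np.linalg.norm. The sketch below wraps it in a small helper, unit_vector, which is a hypothetical name used here for illustration and not a NumPy function. It also guards against the zero vector, which has no direction.

```python
import numpy as np

def unit_vector(v):
    """Return v scaled to length 1 (illustrative helper, not part of NumPy)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return v / norm

x_hat = unit_vector([1, 2, 3])
print(x_hat)
# A unit vector should have length 1 (up to floating-point rounding).
print(np.isclose(np.linalg.norm(x_hat), 1.0))   # True
```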
• A Python vector is simply a one-dimensional array. All of
these operations can be performed with lists or the array
module, but installing and importing the NumPy package
makes vector operations easier and faster.
• Vectors can be plotted as arrows by importing
matplotlib.pyplot. The function used to draw them is
matplotlib.pyplot.quiver(). In its common form,
quiver(X, Y, U, V), it takes four arguments. The first two
give the positions where the arrows start. The other two give
the x and y components of each arrow (its direction and
length), rather than the coordinates of its end point.
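A minimal sketch of drawing two vectors from the origin with quiver() follows. The vectors, colors, and axis limits are illustrative choices. The keyword arguments angles="xy", scale_units="xy" and scale=1 are used so that each arrow is drawn with its true length in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Two illustrative 2-D vectors, both starting at the origin.
vectors = np.array([[2, 1], [-1, 2]])
origins = np.zeros_like(vectors)

fig, ax = plt.subplots()
q = ax.quiver(
    origins[:, 0], origins[:, 1],    # X, Y: where each arrow starts
    vectors[:, 0], vectors[:, 1],    # U, V: x and y components of each arrow
    angles="xy", scale_units="xy", scale=1,
    color=["tab:blue", "tab:red"],
)
ax.set_xlim(-3, 3)
ax.set_ylim(-1, 3)
ax.set_aspect("equal")
```

With an interactive backend, calling plt.show() afterwards displays the figure.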
Pandas
• Pandas is an open-source library made mainly for working
with relational or labeled data easily and intuitively.
It provides data structures and operations for manipulating
numerical data and time series. The library is built on top
of NumPy and is designed for high performance and user
productivity.
History
• Pandas was initially developed by Wes McKinney in 2008 while
he was working at AQR Capital Management. He convinced
AQR to allow him to open-source the library. Another AQR
employee, Chang She, joined as the second major contributor
to the library in 2012. Many versions of pandas have been
released since. At the time these notes were written, the
latest version was 1.4.1.
Advantages
• Fast and efficient for manipulating and analyzing data.
• Data from different file objects can be loaded.
• Easy handling of missing data (represented as NaN) in floating point as
well as non-floating point data
• Size mutability: columns can be inserted and deleted from DataFrame and
higher dimensional objects
• Data set merging and joining.
• Flexible reshaping and pivoting of data sets
• Provides time-series functionality.
• Powerful group by functionality for performing split-apply-combine
operations on data sets.
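A few of these features can be seen in one short sketch. The tables below (sales and regions) are made-up data used only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sales data with one missing value.
sales = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "units": [10.0, np.nan, 7.0, 5.0],
})

# Missing data: count the NaN values, then fill them with 0.
missing = sales["units"].isna().sum()
filled = sales.fillna({"units": 0.0})

# Split-apply-combine: total units per city.
totals = filled.groupby("city")["units"].sum()

# Merging: attach a region to each city from a second table.
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
merged = filled.merge(regions, on="city")

print(missing)   # 1
print(totals)
print(merged)
```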
Getting Started
• After pandas has been installed on the system, you need
to import the library. The module is generally imported as:
import pandas as pd
Here, pd is an alias for pandas. Importing the library under
an alias is not required. It simply reduces the amount of code
written each time a method or property is called.
• Pandas provides two main data structures for manipulating
data:
• Series
• DataFrame
Series:
• A pandas Series is a one-dimensional labelled array capable of
holding data of any type (integer, string, float, Python objects,
etc.). The axis labels are collectively called the index. A
Series can be thought of as a single column in an Excel sheet.
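import pandas as pd
import numpy as np
# Creating an empty Series (dtype given explicitly, because the
# default dtype of an empty Series differs between pandas versions)
ser = pd.Series(dtype="float64")
print(ser)
# simple array
data = np.array(['g', 'o', 'o', 'd'])
ser = pd.Series(data)
print(ser)
Output
Series([], dtype: float64)
0    g
1    o
2    o
3    d
dtype: object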
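The labels of a Series do not have to be 0, 1, 2, and so on. The sketch below assumes a made-up Series of marks with subject names as its index, and shows access by label, by position, and by condition.

```python
import pandas as pd

# Made-up marks, labelled by subject instead of by position.
marks = pd.Series([82, 91, 77], index=["maths", "physics", "chemistry"])

by_label = marks["physics"]                    # access by index label
by_position = marks.iloc[0]                    # access by integer position
above_80 = marks[marks > 80].index.tolist()    # labels passing a condition

print(by_label)      # 91
print(by_position)   # 82
print(above_80)      # ['maths', 'physics']
```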
DataFrame
• A pandas DataFrame is a two-dimensional, size-mutable, potentially
heterogeneous tabular data structure with labeled axes (rows
and columns). In other words, data is aligned in a tabular
fashion in rows and columns. A DataFrame consists of three
principal components: the data, the rows, and the columns.
import pandas as pd
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)
# list of strings
lst = ['God', 'is', 'Good']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Output
Empty DataFrame
Columns: []
Index: []
      0
0   God
1    is
2  Good
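A DataFrame is more often built from a dictionary of named columns. The sketch below uses made-up data, and also shows the size mutability mentioned earlier by inserting one column and deleting another.

```python
import pandas as pd

# Made-up data: each dictionary key becomes a column name.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "age": [21, 23, 22]})

# Insert a new column, then delete an existing one.
df["city"] = ["Pune", "Delhi", "Mumbai"]
df = df.drop(columns=["age"])

print(df.shape)            # (3, 2)
print(list(df.columns))    # ['name', 'city']
print(df.loc[1, "city"])   # Delhi
```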
Why Pandas is used for Data Science
• Pandas is used in conjunction with other libraries that are
used for data science. It is built on top of the NumPy library,
which means that many NumPy structures are used or
replicated in Pandas. The data produced by Pandas is often
used as input for plotting functions in Matplotlib, statistical
analysis in SciPy, and machine learning algorithms in Scikit-learn.
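The link between Pandas and NumPy can be sketched as a round trip: a NumPy array becomes a DataFrame, and the DataFrame is converted back to an array for further numerical work. The column names x and y are illustrative.

```python
import numpy as np
import pandas as pd

# Each row is a 2-D vector; the values are illustrative.
matrix = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# NumPy array -> DataFrame with named columns.
df = pd.DataFrame(matrix, columns=["x", "y"])
col_means = df.mean()    # Series holding the mean of each column

# DataFrame -> NumPy array, ready for NumPy functions.
back = df.to_numpy()
lengths = np.linalg.norm(back, axis=1)   # length of each row vector

print(col_means["x"], col_means["y"])    # 3.0 4.0
print(back.shape)                        # (3, 2)
```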