7/18/24, 11:44 AM 3-introduction-to-numpy.ipynb - Colab
In this series of articles, we cover the basics of data analysis using Python. The lessons build up gradually toward a solid
analytical mindset for students. This lesson covers the essentials of scientific computing in Python using
NumPy.
What is NumPy?
NumPy is short for Numerical Python and, as the name indicates, it is Python's core library for scientific computing. The basic object in
NumPy is the ndarray, short for n-dimensional array, which in a mathematical context means a multi-dimensional array.
Mathematical operations such as differentiation, optimization, and solving simultaneous equations are easiest to carry out when they are
expressed in matrix form, and that was the purpose of programming languages like MATLAB.
Compared with Python's built-in sequences, the ndarray has some interesting properties that ease mathematical computation:
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a
new array and delete the original.
The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are
executed more efficiently and with less code than is possible using Python’s built-in sequences.
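The properties listed above can be checked directly. The following is a minimal sketch; the variable names (`py_list`, `arr`, `bigger`, `doubled`) are illustrative and not part of the lesson.

```python
import numpy as np

# A Python list can mix types and grow in place
py_list = [1, 2.5, "three"]
py_list.append(4)

# An ndarray holds a single data type: mixed int/float input is upcast to float
arr = np.array([1, 2, 3.5])
print(arr.dtype)                 # float64

# np.append returns a new array rather than growing arr in place
bigger = np.append(arr, 4.0)
print(arr.shape, bigger.shape)   # (3,) (4,)

# Arithmetic applies to every element without an explicit loop
doubled = arr * 2
print(doubled)
```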
Data analytics does not involve many heavy mathematical computation problems, but later on we will start working with data in
tables. You will find that any table is more or less a 2-dimensional array, which is why it is essential to know a bit about arrays before
they turn into tables of data in future lessons.
What is an array?
An array is a mathematical object that holds numbers organized in rows and columns. The structure of the array should allow
selecting (indexing) any of its inner items. Later on we will see how to do this in code.
Below is a graph for the structure of arrays.
https://colab.research.google.com/drive/1Eu1iJwqopohMA9DYh1T5AhPVwsSwgF3J#printMode=true 1/9
Creating a NumPy array
import numpy as np
# Create a 2-d array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# Print its content
print(arr2d)
# Print array type
print(type(arr2d))
[[1 2 3]
[4 5 6]]
<class 'numpy.ndarray'>
# Let's create an array with a single row (the double brackets make it 2-d, with shape (1, 3))
arr1d = np.array([[1, 2, 3]])
# print its content
print(arr1d)
# print array type
print(type(arr1d))
[[1 2 3]]
<class 'numpy.ndarray'>
Array Shape
Array shape is the most important aspect to take care of when dealing with arrays and array maths in general.
For a 2-d array it is simply: shape = (N rows, N columns)
It is essential information, especially when multiplying arrays. Let's see it in a more visual way.
Let's see how to get this info in Python using .shape
print(arr2d.shape)
print(arr1d.shape)
(2, 3)
(1, 3)
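The number of nested brackets decides how many dimensions an array has, and `.shape` reflects that. The sketch below compares a true 1-d array with a single-row 2-d array; the names `one_d` and `two_d` are illustrative only.

```python
import numpy as np

one_d = np.array([1, 2, 3])      # one pair of brackets: 1-d
two_d = np.array([[1, 2, 3]])    # nested brackets: 2-d with one row

print(one_d.ndim, one_d.shape)   # 1 (3,)
print(two_d.ndim, two_d.shape)   # 2 (1, 3)

# reshape converts between the two layouts without changing the data
as_row = one_d.reshape(1, 3)
print(as_row.shape)              # (1, 3)
```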
Special Types of Arrays
In some cases, we need to create special types of arrays, such as an array of zeros, an array of ones, or an identity array. Let's see some
examples that can be implemented using NumPy:
np.zeros() : creates an array of all zeros.
np.ones() : creates an array of all ones.
np.empty() : creates an array without initializing its entries, so it holds arbitrary leftover values (not random numbers).
np.full() : creates an array filled with the same number.
np.eye() : creates an identity array.
And much more!
zeros = np.zeros((2,2)) # Create an array of all zeros
print(zeros)
[[0. 0.]
[0. 0.]]
ones = np.ones((5,2)) # Create an array of all ones
print(ones)
[[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]]
full = np.full((5,4), -9) # Create a constant array
print(full)
[[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]]
eye = np.eye(5) # Create a 5x5 identity matrix
print(eye)
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
np.random.seed(0) # try to comment and uncomment this line
random = np.random.random((2,2)) # Create an array filled with random values
print(random)
[[0.5488135 0.71518937]
[0.60276338 0.54488318]]
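The comment on `np.random.seed(0)` invites experimenting with the seed. As a small sketch of what the seed does (the names `first`, `second`, and `third` are illustrative), resetting the seed replays the same values, while drawing again without resetting gives new ones:

```python
import numpy as np

np.random.seed(0)
first = np.random.random((2, 2))

np.random.seed(0)                      # reset to the same seed
second = np.random.random((2, 2))
print(np.array_equal(first, second))   # True

third = np.random.random((2, 2))       # no reset: the generator has moved on
print(np.array_equal(first, third))    # False
```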
Array Slicing and Indexing
Array slicing means selecting part of an array rather than the whole array. Indexing, which we touched on earlier, selects individual items.
# Create the following rank 2 array with shape (3, 4)
myarr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(myarr)
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
Suppose that we want to select the second row of myarr.
This is called slicing. It can be done using the following syntax.
arr[start_index_for_rows:end_index_for_rows, start_index_for_columns:end_index_for_columns]
Just keep in mind two things:
Python is zero-indexed, so the first column is column zero, and the same applies to rows.
The end boundary in the above syntax is exclusive, so the slice stops just before that boundary.
# Let's type the syntax for selecting the second row.
row_r2 = myarr[1:2, :]
print(row_r2)
[[5 6 7 8]]
Now, let's try selecting the second column in the same manner.
col_c2 = myarr[:, 1:2]
print(col_c2)
[[ 2]
[ 6]
[10]]
Now, let's select the slice made of the first two rows and the first two columns.
myarr_slice = myarr[:2, :2]
print(myarr_slice)
[[1 2]
[5 6]]
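Two details of slicing are easy to miss. Using a plain integer instead of a slice drops that dimension, and a basic slice is a view onto the original data rather than a copy. A minimal sketch, reusing the `myarr` values from above (the names `row_slice`, `row_int`, and `block` are illustrative):

```python
import numpy as np

myarr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_slice = myarr[1:2, :]   # slice on rows keeps two dimensions
row_int = myarr[1, :]       # integer on rows drops that dimension
print(row_slice.shape)      # (1, 4)
print(row_int.shape)        # (4,)

# A basic slice is a view: writing to it changes the original array
block = myarr[:2, :2]
block[0, 0] = 100
print(myarr[0, 0])          # 100
```

If an independent copy is needed, `myarr[:2, :2].copy()` avoids modifying the original.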
Integer Array Indexing
What if we want to select some specific elements in the array?
We select them by their mathematical (row, column) position, keeping zero-indexing in mind.
The mathematical way of indexing is as follows.
Let's see some code examples.
# Let's define a new array
newarr = np.array([[1,2], [3, 4], [5, 6]])
# Print it
print(newarr)
[[1 2]
[3 4]
[5 6]]
Now we will try to select the items at the following (row, column) indexes:
(0, 0)
(1, 1)
(2, 0)
The values will be 1, 4, and 5 respectively.
print(newarr[[0, 1, 2], [0, 1, 0]]) # Prints "[1 4 5]"
[1 4 5]
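Integer array indexing pairs up the row list and the column list position by position. The sketch below, with the illustrative names `rows`, `cols`, `picked`, and `manual`, shows that the result matches picking each element by hand, and that the same index pairs can be used to update elements in place.

```python
import numpy as np

newarr = np.array([[1, 2], [3, 4], [5, 6]])

rows = np.array([0, 1, 2])
cols = np.array([0, 1, 0])

# Element k of the result is newarr[rows[k], cols[k]]
picked = newarr[rows, cols]
manual = np.array([newarr[0, 0], newarr[1, 1], newarr[2, 0]])
print(picked, manual)        # [1 4 5] [1 4 5]

# The same index pairs can select elements to modify
newarr[rows, cols] += 10
print(newarr)
```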
Boolean Indexing
In some scenarios, the task is to select elements based on some criteria, such as the elements greater than 2, or less than or equal to -1. Luckily, NumPy
can do this type of indexing easily without the need to construct any loops. Let's see how!
# Define an array
otherNewArray = np.array([[1,2], [3, 4], [5, 6]])
# Let's print it
print(otherNewArray)
# Construct a boolean index (to check for elements greater than 2)
bool_idx = (otherNewArray > 2)
# Print the result of the boolean index
print(bool_idx)
# Now we will use such index to print all elements greater than 2
print(otherNewArray[bool_idx])
[[1 2]
[3 4]
[5 6]]
[[False False]
[ True True]
[ True True]]
[3 4 5 6]
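Conditions can also be combined. With NumPy arrays this is done with `&` (and) and `|` (or), with each condition wrapped in parentheses, and a boolean mask can be used for assignment as well as selection. A short sketch with illustrative names:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Elements strictly between 2 and 6
between = arr[(arr > 2) & (arr < 6)]
print(between)               # [3 4 5]

# Elements at either extreme
outside = arr[(arr <= 1) | (arr >= 6)]
print(outside)               # [1 6]

# Assign through a mask: cap every value above 4 at 4
capped = arr.copy()
capped[capped > 4] = 4
print(capped)
```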
Data Types in NumPy
NumPy has data types that can be referred to with a single character, like i for integers, u for unsigned integers, etc.
Below is a list of the data types in NumPy and the characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type ( void )
In general, we will not use all of them. Only the most common ones, such as integer, float, and string, are heavily used.
Now, let's see how to check the data type of a NumPy array.
x = np.array([1, 2])
print(x.dtype)
int64
Here, the data type of the inner elements, which must all share one type, is int64.
y = np.array([1.0, 2.0])
print(y.dtype)
float64
This one is float64.
While creating a NumPy array we can force a specific data type. Let's see the following example.
z = np.array([1.0, 2.0], dtype='S')
print(z.dtype)
|S3
Here, the elements of the array z are byte strings (the S3 in the output means strings of up to 3 bytes). Let's define a float array.
f = np.array([1, 2], dtype='f')
print(f.dtype)
float32
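The data type of an existing array can also be changed after creation with `astype`, which returns a converted copy. The following sketch uses illustrative names and values that are not part of the lesson:

```python
import numpy as np

floats = np.array([1.7, 2.2, -3.9])

# Converting float to integer truncates toward zero (it does not round)
ints = floats.astype(np.int64)
print(ints)                  # [ 1  2 -3]

# Numeric text can be converted to numbers
text = np.array(["1", "2", "3"])
numbers = text.astype(np.float64)
print(numbers.dtype)         # float64

# astype returns a new array; the original is unchanged
print(floats.dtype)          # float64
```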
Array Math
NumPy supports the common mathematical operations on arrays. Let's see some examples.
First, we define two arrays to which all of the following operations will be applied.
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
Elementwise sum
print(x + y)
print('='*10)
print(np.add(x, y))
print('='*10)
[[ 6. 8.]
[10. 12.]]
==========
[[ 6. 8.]
[10. 12.]]
==========
Elementwise difference
print(x - y)
print('='*10)
print(np.subtract(x, y))
print('='*10)
[[-4. -4.]
[-4. -4.]]
==========
[[-4. -4.]
[-4. -4.]]
==========
Elementwise product
print(x * y)
print('='*10)
print(np.multiply(x, y))
print('='*10)
[[ 5. 12.]
[21. 32.]]
==========
[[ 5. 12.]
[21. 32.]]
==========
Elementwise division
print(x / y)
print('='*10)
print(np.divide(x, y))
print('='*10)
[[0.2 0.33333333]
[0.42857143 0.5 ]]
==========
[[0.2 0.33333333]
[0.42857143 0.5 ]]
==========
Elementwise square root
print(np.sqrt(x))
[[1. 1.41421356]
[1.73205081 2. ]]
Transpose of a Matrix
print(x)
print('='*10)
print(x.T)
[[1. 2.]
[3. 4.]]
==========
[[1. 3.]
[2. 4.]]
Dot Product
np.dot(x, y)
array([[19., 22.],
[43., 50.]])
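For 2-d arrays, `np.dot` performs matrix multiplication, which is different from the elementwise product `x * y`. The sketch below repeats the arrays from this section, shows the equivalent `@` operator, and checks one entry by hand. The non-square arrays `a` and `b` are illustrative additions that show how shapes must line up.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Three equivalent spellings of matrix multiplication for 2-d arrays
via_dot = np.dot(x, y)
via_operator = x @ y
via_method = x.dot(y)
print(via_operator)

# Top-left entry by hand: row 0 of x times column 0 of y
print(1 * 5 + 2 * 7)         # 19

# Shapes must agree on the inner dimension: (2, 3) @ (3, 4) -> (2, 4)
a = np.ones((2, 3))
b = np.ones((3, 4))
print((a @ b).shape)         # (2, 4)

try:
    b @ a                    # (3, 4) @ (2, 3): inner sizes 4 and 2 differ
except ValueError as err:
    print("shape mismatch:", err)
```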
Broadcasting
The term "broadcasting" describes how NumPy handles arrays of different shapes during arithmetic operations. Subject to certain constraints,
the smaller array is broadcast across the bigger array so that they have compatible shapes.
Because NumPy's core is implemented in C, broadcasting offers a way to vectorize array operations so that looping happens in C rather than Python.
This results in efficient implementations that avoid making unnecessary copies of data.
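The constraint for broadcasting is that shapes are compared from the last dimension backwards, and each pair of sizes must either be equal or one of them must be 1. A minimal sketch of what does and does not broadcast, using illustrative arrays `grid`, `row`, and `col`:

```python
import numpy as np

grid = np.ones((4, 3))
row = np.array([1, 0, 1])        # shape (3,)
col = np.array([1, 2, 3, 4])     # shape (4,)

# (4, 3) with (3,): trailing sizes match, so row is added to every row
print((grid + row).shape)        # (4, 3)

# (4, 3) with (4,): trailing sizes 3 and 4 differ, so this fails
try:
    grid + col
except ValueError as err:
    print("cannot broadcast:", err)

# Reshaping col to (4, 1) makes it compatible: it is added to every column
col_as_column = col[:, np.newaxis]
print((grid + col_as_column).shape)   # (4, 3)
```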
In the following example, we need to add the elements of the vector v to each row of the array x. We will do this using two methods:
The conventional Python loop.
The broadcasting method.
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print(x)
print('='*10)
print(v)
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
==========
[1 0 1]
Let's create an empty array y with the same shape as x that will hold the result of the addition.
%%time
# This command measures the execution time of the whole cell
# Create an empty matrix with the same shape as x
y = np.empty_like(x)
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
print(y)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
CPU times: user 275 µs, sys: 37 µs, total: 312 µs
Wall time: 308 µs
Now, let's use the concept of broadcasting.
%%time
z = x + v
print(z)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
CPU times: user 692 µs, sys: 0 ns, total: 692 µs
Wall time: 672 µs
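In this particular run broadcasting was not faster: the cell took 672 µs against 308 µs for the loop. On a 4x3 array, a single %%time measurement is dominated by fixed overhead such as printing, so it says little about either method. What the example does show is that broadcasting is easier to implement, and on large arrays it is typically much faster because the looping happens in C rather than Python.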
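A single `%%time` measurement on a 4x3 array mostly captures fixed overhead, so a fairer comparison uses a larger array and repeated runs. The sketch below is one way to do that with the standard `timeit` module; the array size, the number of repetitions, and the helper names `add_with_loop` and `add_with_broadcasting` are assumptions made for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x_big = rng.random((10_000, 3))      # many more rows than the 4x3 example
v = np.array([1.0, 0.0, 1.0])

def add_with_loop(x, v):
    """Add v to each row of x with an explicit Python loop."""
    y = np.empty_like(x)
    for i in range(x.shape[0]):
        y[i, :] = x[i, :] + v
    return y

def add_with_broadcasting(x, v):
    """Add v to each row of x by letting NumPy broadcast v."""
    return x + v

# Time 10 calls of each approach
loop_time = timeit.timeit(lambda: add_with_loop(x_big, v), number=10)
bcast_time = timeit.timeit(lambda: add_with_broadcasting(x_big, v), number=10)
print(f"loop: {loop_time:.4f} s, broadcasting: {bcast_time:.4f} s")

# Both approaches produce the same result
print(np.allclose(add_with_loop(x_big, v), add_with_broadcasting(x_big, v)))
```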
This notebook is part of my Python for Data Analysis course. If you find it useful, you can upvote it! Also, you can follow me on LinkedIn and
Twitter.
Below are the contents of the whole course:
1. Introduction to Python
2. Iterative Operations & Functions in Python