Unit3 Notes
NumPy Basics:
NumPy, short for "Numerical Python," is a key library for scientific computing,
created by Travis Oliphant in 2005. Its core object, the ndarray, is a
multidimensional array whose elements all share a single data type. It is often
cited as being up to 50 times faster than standard Python lists, although the
actual speedup depends on the operation and the array size. This efficiency
makes it vital for data-intensive fields like machine learning.
Advantages of NumPy arrays:
Memory Efficiency: ndarrays are stored in a contiguous block of
memory. This allows for cache-friendly access, a concept called
locality of reference, enabling the CPU to retrieve data quickly.
Python lists, by contrast, store pointers to scattered objects.
Vectorization: NumPy operations are vectorized, meaning they
apply to the entire array at once without slow Python for loops. These
operations are implemented in optimized C and Fortran code, which lets
NumPy take advantage of modern CPU features such as SIMD instructions,
significantly boosting performance.
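As a rough illustration of the two points above, the sketch below doubles every element once with an explicit Python loop and once with a single vectorized expression, then checks that both give the same result. The array size and the variable names are arbitrary choices made for this example; no timings are shown because they vary from machine to machine.

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)

# Loop version: one Python-level multiplication per element.
loop_result = np.empty_like(values)
for i in range(values.size):
    loop_result[i] = values[i] * 2.0

# Vectorized version: a single expression evaluated in compiled code.
vectorized_result = values * 2.0

# Both approaches produce the same numbers.
same = np.array_equal(loop_result, vectorized_result)
print("Same result:", same)

# A freshly created array lives in one contiguous block of memory.
is_contiguous = values.flags["C_CONTIGUOUS"]
print("Contiguous:", is_contiguous)
```

Wrapping each version in time.perf_counter() calls is one way to compare their speed on your own machine.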
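1D Array (Vector): A one-dimensional array is a simple sequence of elements.
Python
import numpy as np
arr1 = np.array([1, 2, 3]) # example values (assumed); any three elements give this shape
print("Dimensions:", arr1.ndim)
# Output: Dimensions: 1
print("Shape:", arr1.shape)
# Output: Shape: (3,)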
2D Array (Matrix): A two-dimensional array has rows and columns. This is a
common structure for representing data tables or matrices.
Python
import numpy as np
arr2 = np.array([[1, 2, 3], [4, 5, 6]]) # example values (assumed) matching the shape below
print("Dimensions:", arr2.ndim)
# Output: Dimensions: 2
print("Shape:", arr2.shape)
# Output: Shape: (2, 3)
The shape (2, 3) indicates the array has 2 rows and 3 columns.
3D Array (Tensor): A three-dimensional array can be thought of as a
collection of matrices.
Python
import numpy as np
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3)
# Output:
# [[[ 1 2 3]
# [ 4 5 6]]
#
# [[ 7 8 9]
# [10 11 12]]]
print("Dimensions:", arr3.ndim)
# Output: Dimensions: 3
print("Shape:", arr3.shape)
# Output: Shape: (2, 2, 3)
The shape (2, 2, 3) means the array has 2 matrices, each with 2 rows and 3
columns. The number of elements is 2 × 2 × 3 = 12.
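Indexing
1D Arrays: Access an element by its position. Indices start at 0.
Python
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # Output: 10
print(arr[3]) # Output: 40
print(arr[-1]) # Output: 50 (negative indices count from the end)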
Multidimensional Arrays: To access an element in a 2D array or higher,
you provide a comma-separated tuple of indices for each dimension.
Python
import numpy as np
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
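The comma-separated indexing described above can be sketched as follows, using the same arr_2d. The variable names first, middle, and last are only illustrative.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# arr_2d[row, column]
first = arr_2d[0, 0]     # row 0, column 0
middle = arr_2d[1, 2]    # row 1, column 2
last = arr_2d[-1, -1]    # last row, last column

print(first, middle, last)  # Output: 1 6 9
```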
Slicing
Slicing is used to select a range of elements. The syntax is start:stop:step,
where stop is exclusive.
1D Array Slicing:
Python
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
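A few slices of this array, as a minimal sketch of the start:stop:step syntax. The names on the left are chosen for this example only.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

middle = arr[1:4]       # indices 1, 2, 3 (stop index 4 is excluded)
first_two = arr[:2]     # start defaults to 0
every_other = arr[::2]  # step of 2
last_two = arr[-2:]     # negative start counts from the end

print(middle)       # Output: [20 30 40]
print(first_two)    # Output: [10 20]
print(every_other)  # Output: [10 30 50]
print(last_two)     # Output: [40 50]
```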
Array Iteration
Iterating through NumPy arrays is fundamental for processing data.
Standard Iteration: You can iterate over the first dimension of an array
using a for loop. For multidimensional arrays, this will return a sub-
array at each step.
Python
import numpy as np
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
for row in arr_2d:
    print(row)
# Output:
# [1 2 3]
# [4 5 6]
Iterating with nditer: To iterate through every element of a
multidimensional array, regardless of its shape, use the np.nditer()
function.
Python
import numpy as np
arr_2d = np.array([[1, 2], [3, 4]])
for element in np.nditer(arr_2d):
    print(element)
# Output (one value per line): 1, 2, 3, 4
Enumerated Iteration: To get both the index and the value during
iteration, use np.ndenumerate().
Python
import numpy as np
arr_2d = np.array([[1, 2], [3, 4]])
for index, value in np.ndenumerate(arr_2d):
    print(f"Index: {index}, Value: {value}")
# Output:
# Index: (0, 0), Value: 1
# Index: (0, 1), Value: 2
# Index: (1, 0), Value: 3
# Index: (1, 1), Value: 4
Joining Arrays
Joining means putting two or more arrays together. The np.concatenate()
function is a primary way to do this. You specify the arrays to join and the
axis along which to join them.
Python
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
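Joining these two arrays along each axis might look like the sketch below. The result names joined_rows and joined_cols are assumptions made for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# axis=0 stacks the rows of arr2 underneath arr1.
joined_rows = np.concatenate((arr1, arr2), axis=0)
print(joined_rows.shape)  # Output: (4, 2)

# axis=1 places the columns of arr2 to the right of arr1.
joined_cols = np.concatenate((arr1, arr2), axis=1)
print(joined_cols)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]
```

The arrays must have matching sizes along every axis except the one being joined.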
Arithmetic operations:
Arithmetic operations are the basis of numerical computation, and NumPy
performs them element-wise on arrays. With NumPy we can quickly add,
subtract, multiply, divide, and raise the elements of an array to a power,
and these operations stay fast even with large amounts of data. The sections
below cover the basic arithmetic functions in NumPy and show how to use
them for simple calculations.
1. Addition of Arrays
Addition is an arithmetic operation where the corresponding elements of two
arrays are added together. In NumPy the addition of two arrays is done using
the np.add() function.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
add_ans = np.add(a, b)
print(add_ans)
Output: [ 7 77 23 130]
2. Subtraction of Arrays
Subtract two arrays element-wise using the np.subtract() function. This
function subtracts each element of the second array from the corresponding
element in the first array.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
sub_ans = np.subtract(a, b)
print(sub_ans)
Output: [ 3 67 3 70]
3. Multiplication of Arrays
Multiplication in NumPy can be done element-wise using
the np.multiply() function. This multiplies corresponding elements of two
arrays.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
mul_ans = np.multiply(a, b)
print(mul_ans)
Output: [ 10 360 130 3000]
4. Division of Arrays
Division is another important operation that is performed element-wise using
the np.divide() function. This divides each element of the first array by the
corresponding element in the second array.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
div_ans = np.divide(a, b)
print(div_ans)
Output: [ 2.5 14.4 1.3 3.33333333]
5. Exponentiation (Power)
It allows us to raise each element in an array to a specified power. In NumPy,
this can be done using the np.power() function.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
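pow_ans = np.power(a, b)
print(pow_ans)
Output: [25 1934917632 137858491849 1152921504606846976]
Note: the last value is not 100 to the power 30. The true result (10 to the
power 60) does not fit in a 64-bit integer, so it overflows and wraps around.
The output shown assumes 64-bit integers and can differ where the default
integer is 32-bit. Use a float dtype when results may exceed the integer range.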
6. Modulus Operation
It finds the remainder when one number is divided by another. In NumPy, you
can use the np.mod() function to calculate the modulus element-wise
between two arrays.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
mod_ans = np.mod(a, b)
print(mod_ans)
Output: [ 1 2 3 10]
With these basic arithmetic functions in NumPy we can efficiently perform
calculations on arrays.
Boolean Indexing:
Access array elements based on conditions instead of explicit indices.
Practice with Boolean Indexing: Dataset
import numpy as np
data = np.array([12, 43, 36, 32, 51, 18, 79, 7])
print("Data: ", data) # Data: [12 43 36 32 51 18 79 7]
Practice with Boolean Indexing: Boolean Mask
Now, suppose we want to extract elements greater than 30. We form a
Boolean array checking this condition for data:
bool_array = data > 30
print("Boolean Array: ", bool_array)
# Boolean Array: [False True True True True False True False]
Practice with Boolean Indexing: Selecting Data
To extract the elements that satisfy our condition from data, we use
the bool_array as an index:
filtered_data = data[bool_array]
print("Filtered Data: ", filtered_data) # Filtered Data: [43 36 32 51 79]
Now filtered_data only holds values from data that are greater than 30.
Complex Filter Condition
We can also use Boolean conditions to combine multiple criteria using logical
operators like & for AND and | for OR.
Consider an array of prices per unit for different products in a retail store:
prices = np.array([15, 30, 45, 10, 20, 35, 50])
print("Prices: ", prices) # Prices: [15 30 45 10 20 35 50]
Suppose we want to find prices that are greater than 20 and less than 40. We
can combine conditions using the & operator:
filtered_prices = prices[(prices > 20) & (prices < 40)]
print("Filtered Prices (20 < price < 40): ", filtered_prices)
# Filtered Prices (20 < price < 40): [30 35]
Now, let's consider finding prices that are either less than 15 or greater than
45 using the | operator:
filtered_prices_or = prices[(prices < 15) | (prices > 45)]
print("Filtered Prices (price < 15 OR price > 45): ", filtered_prices_or)
# Filtered Prices (price < 15 OR price > 45): [10 50]
Using these logical operators, you can create complex filtering conditions to
extract exactly the data you need.
Fancy Indexing
Fancy indexing is a tool for accessing multiple non-adjacent array items: you
pass an array of indices to select them. Consider an array of seven numbers:
data = np.array([11, 22, 33, 44, 55, 66, 77])
print("Data: ", data) # Data: [11 22 33 44 55 66 77]
To fetch the 1st, 3rd, and 5th elements together:
fancy_indexes = np.array([0, 2, 4])
fancy_data = data[fancy_indexes]
print("Data from Fancy Indexing: ", fancy_data)
# Data from Fancy Indexing: [11 33 55]
Practice with Fancy Indexing
Let's consider a practical use of Fancy Indexing. Given an array representing
people's ages, we want to fetch the ages of person 2, person 5, and person
7:
ages = np.array([15, 22, 27, 35, 41, 56, 63, 74, 81])
print("Initial Ages Array: ", ages)
# Initial Ages Array: [15 22 27 35 41 56 63 74 81]
indexes = np.array([1, 4, 6]) # Indices of values of interest
fetched_ages = ages[indexes]
print("Fetched Ages: ", fetched_ages) # Fetched Ages: [22 41 63]
Transposing Arrays
The numpy.transpose() function or .T attribute is used to reverse or permute
the axes of an array.
Example:
import numpy as np
# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Transpose using .T
transposed1 = arr.T
# Transpose using numpy.transpose()
transposed2 = np.transpose(arr)
print(transposed1)
# Output:
# [[1 4]
# [2 5]
# [3 6]]
For higher-dimensional arrays, you can specify the order of axes:
arr_3d = np.random.rand(2, 3, 4)
transposed = np.transpose(arr_3d, axes=(1, 0, 2)) # Rearrange axes
Swapping Axes
The numpy.swapaxes() function swaps two specified axes of an array while
leaving others unchanged.
Example:
# Create a 3D array
arr_3d = np.random.rand(2, 3, 4)
# Swap axes 0 and 1
swapped = np.swapaxes(arr_3d, 0, 1)
print(swapped.shape) # Output: (3, 2, 4)
Note:
Transpose: Rearranges all axes (default reverses them).
Swapaxes: Swaps only two specified axes.
Both are efficient and return views of the original array when possible.
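The "view" behaviour mentioned in the note can be checked with np.shares_memory. The sketch below uses a small array invented for this example and shows that writing through the transposed view also changes the original.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
transposed = arr.T
swapped = np.swapaxes(arr, 0, 1)

# Neither call copied the data: both results share memory with arr.
shares_t = np.shares_memory(arr, transposed)
shares_s = np.shares_memory(arr, swapped)
print(shares_t, shares_s)  # Output: True True

# Element [0, 1] of the transpose is element [1, 0] of the original.
transposed[0, 1] = 99
print(arr[1, 0])  # Output: 99
```

Call .copy() on the result when an independent array is needed.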
Universal Functions:
ufuncs stands for "Universal Functions"; they are NumPy functions that
operate element-wise on the ndarray object.
Why use ufuncs?
ufuncs are used to implement vectorization in NumPy which is way faster
than iterating over elements. They also provide broadcasting and additional
methods like reduce, accumulate etc. that are very helpful for computation.
ufuncs also take additional arguments, like:
where: a boolean array or condition defining where the operation should take
place.
dtype: the return type of the elements.
out: an output array into which the result is written.
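The three arguments above can be sketched with np.add as follows. The arrays a and b and the result names are made up for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# out: write the result into an existing array instead of creating a new one.
result = np.zeros(4, dtype=np.int64)
np.add(a, b, out=result)
print(result)  # Output: [11 22 33 44]

# where: compute only where the condition is True; the other positions
# keep whatever the out array already held (zeros here).
masked = np.zeros(4, dtype=np.int64)
np.add(a, b, out=masked, where=a > 2)
print(masked)  # Output: [ 0  0 33 44]

# dtype: choose the type of the returned elements.
as_float = np.add(a, b, dtype=np.float64)
print(as_float.dtype)  # Output: float64
```

When where is used without out, the skipped positions are left uninitialized, so it is safer to pass both together.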
ufunc Rounding Decimals
NumPy offers five main functions for rounding decimals: trunc, fix, around,
floor, and ceil.
1. Truncation (trunc, fix) – Removes the decimals and returns the float closest to zero.
np.trunc([-3.1666, 3.6667]) # [-3. 3.]
np.fix([-3.1666, 3.6667]) # [-3. 3.]
2. Rounding (around) – Rounds to the given number of decimal places. Exact halves round to the nearest even value, not always up.
np.around(3.1666, 2) # 3.17
3. Floor – Rounds down to nearest integer.
np.floor([-3.1666, 3.6667]) # [-4. 3.]
4. Ceil – Rounds up to nearest integer.
np.ceil([-3.1666, 3.6667]) # [-3. 4.]
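One detail of np.around that is easy to miss is how it treats values exactly halfway between two integers: NumPy rounds them to the nearest even value. A small sketch, with values picked for this example:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

rounded = np.around(halves)
print(rounded)  # Output: [0. 2. 2. 4.]

# floor and ceil, by contrast, always move in one direction.
print(np.floor(halves))  # Output: [0. 1. 2. 3.]
print(np.ceil(halves))   # Output: [1. 2. 3. 4.]
```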
ufunc Summations
NumPy Addition vs Summation
Addition (np.add) – Element-wise addition between two arrays.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])
print(np.add(arr1, arr2)) # [2 4 6]
Summation (np.sum) – Adds all elements.
print(np.sum([arr1, arr2])) # 12
Summation over axis – Adds along a specified axis.
print(np.sum([arr1, arr2], axis=1)) # [6 6]
Cumulative sum (np.cumsum) – Running total of elements.
arr = np.array([1, 2, 3])
print(np.cumsum(arr)) # [1 3 6]
ufunc Products
Product (np.prod) – Multiplies all elements.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(np.prod(arr)) # 24
Product of multiple arrays – Flattens and multiplies all elements.
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])
print(np.prod([arr1, arr2])) # 40320
Product over axis – Multiplies along a specific axis.
print(np.prod([arr1, arr2], axis=1)) # [24 1680]
Cumulative product (np.cumprod) – Running product of elements.
arr = np.array([5, 6, 7, 8])
print(np.cumprod(arr)) # [5 30 210 1680]
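The summation and product helpers above are closely related to the reduce and accumulate methods that every binary ufunc provides. The sketch below, using an array chosen for this example, shows the correspondence.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# reduce collapses the array with the ufunc: same as np.sum / np.prod.
total = np.add.reduce(arr)
product = np.multiply.reduce(arr)
print(total, product)  # Output: 10 24

# accumulate keeps every intermediate result: same as np.cumsum / np.cumprod.
running_sum = np.add.accumulate(arr)
running_product = np.multiply.accumulate(arr)
print(running_sum)      # Output: [ 1  3  6 10]
print(running_product)  # Output: [ 1  2  6 24]
```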
2. Hyperbolic Functions
NumPy provides:
sinh(), cosh(), tanh() – hyperbolic equivalents.
print(np.sinh(np.pi/2))
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])
print(np.cosh(arr))
Inverse Hyperbolic Functions
arcsinh(), arccosh(), arctanh() – the inverses of sinh(), cosh(), and tanh().
print(np.arcsinh(1.0))
print(np.arctanh([0.1, 0.2, 0.5]))
Statistical Methods
1. Descriptive Statistics:
o np.mean(): Calculates the mean.
o np.median(): Finds the median.
o np.std(), np.var(): Standard deviation and variance.
o Example: np.mean([1, 2, 3]) → 2.0
2. Summation and Products:
o np.sum(), np.prod()
o Example: np.sum([1, 2, 3]) → 6
3. Min/Max and Percentiles:
o np.min(), np.max(), np.percentile()
o Example: np.percentile([1, 2, 3, 4], 50) → 2.5
4. Correlation and Covariance:
o np.corrcoef(): Correlation coefficient.
o np.cov(): Covariance matrix.
o Example: np.corrcoef([1, 2, 3], [4, 5, 6])
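The statistical functions listed above can be tried together on a small dataset. The values of x and y below are invented for illustration, with y exactly twice x so the correlation is easy to predict.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

mean_x = np.mean(x)      # 3.0
median_x = np.median(x)  # 3.0
var_x = np.var(x)        # 2.0 (population variance, divides by n)
std_x = np.std(x)        # square root of the variance

# corrcoef and cov return 2x2 matrices; entry [0, 1] relates x to y.
correlation = np.corrcoef(x, y)[0, 1]  # 1.0 up to rounding: y is a multiple of x
covariance = np.cov(x, y)[0, 1]        # 5.0 (sample covariance, divides by n - 1)

print(mean_x, median_x, var_x, std_x)
print(correlation, covariance)
```

Note that np.var and np.std divide by n by default, while np.cov divides by n - 1. Pass ddof=1 to np.var or np.std for the sample versions.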
2. Logical Operations
AND (&):
a = np.array([True, False, True])
b = np.array([False, False, True])
result = a & b # [False, False, True]
OR (|):
result = a | b # [True, False, True]
NOT (~):
result = ~a # [False, True, False]
XOR (^):
result = a ^ b # [True, False, False]
5. Combining Conditions
Combine multiple conditions using logical operators.
arr = np.array([10, 20, 30, 40])
result = (arr > 15) & (arr < 35) # [False, True, True, False]
6. Boolean Reduction
np.any(): Checks if at least one value is True.
np.any([False, True, False]) # True
np.all(): Checks if all values are True.
np.all([True, True, False]) # False
3. Set Operations
NumPy supports mathematical set operations for arrays:
Union: Combine unique elements from two arrays.
a = np.array([1, 2, 3])
b = np.array([3, 4, 5])
union = np.union1d(a, b)
print(union) # Output: [1 2 3 4 5]
Intersection: Find common elements.
intersection = np.intersect1d(a, b)
print(intersection) # Output: [3]
Difference: Elements in a but not in b.
difference = np.setdiff1d(a, b)
print(difference) # Output: [1 2]
Symmetric Difference: Elements in either a or b but not both.
sym_diff = np.setxor1d(a, b)
print(sym_diff) # Output: [1 2 4 5]
2. Loading Arrays
numpy.load: Loads arrays from .npy or .npz files.
loaded_arr = np.load('array_file.npy')
print(loaded_arr)
For .npz files:
data = np.load('arrays_file.npz')
print(data['array1'], data['array2'])
numpy.loadtxt: Loads data from a text file into an array.
loaded_arr = np.loadtxt('array_file.txt', delimiter=',')
print(loaded_arr)
numpy.genfromtxt: Handles missing data while loading from text
files.
loaded_arr = np.genfromtxt('array_file.txt', delimiter=',',
filling_values=0)
print(loaded_arr)
Note:
Binary Format: Efficient for large arrays but not human-readable
(.npy, .npz).
Text Format: Human-readable but less efficient for storage (.txt, .csv).
Compression: Use .npz to store multiple arrays in one file. np.savez
writes them uncompressed; np.savez_compressed writes a compressed archive.
These functions make it easy to store and retrieve array data for various
applications, ensuring both efficiency and flexibility.
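A save-and-load round trip for each format might look like the sketch below. It writes to a temporary directory so nothing is left behind. The file names mirror the ones used above, and the array contents are made up for this example.

```python
import os
import tempfile

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary .npy: one array per file.
    npy_path = os.path.join(tmp_dir, "array_file.npy")
    np.save(npy_path, array1)
    loaded_single = np.load(npy_path)

    # Compressed .npz: several arrays, retrieved by keyword name.
    npz_path = os.path.join(tmp_dir, "arrays_file.npz")
    np.savez_compressed(npz_path, array1=array1, array2=array2)
    with np.load(npz_path) as data:
        loaded1 = data["array1"]
        loaded2 = data["array2"]

    # Text: human-readable, comma-separated values.
    txt_path = os.path.join(tmp_dir, "array_file.txt")
    np.savetxt(txt_path, array2, delimiter=",")
    loaded_text = np.loadtxt(txt_path, delimiter=",")

print(loaded_single)
print(loaded1, loaded2)
print(loaded_text)
```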
Linear Algebra
Linear algebra in Python can be efficiently handled using libraries like
NumPy and SciPy, which provide robust tools for matrix operations, solving
linear systems, and more. Here's a quick overview of what you can do:
1. Basic Operations
You can perform operations like addition, subtraction, multiplication, and
division on matrices and vectors.
import numpy as np
# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix addition
C = A + B
# Matrix multiplication
D = np.dot(A, B)
print("Addition:\n", C)
print("Multiplication:\n", D)
2. Matrix Properties
You can compute properties like determinant, rank, trace, and eigenvalues.
# Determinant
det = np.linalg.det(A)
# Rank
rank = np.linalg.matrix_rank(A)
# Eigenvalues
eigenvalues = np.linalg.eigvals(A)
print("Determinant:", det)
print("Rank:", rank)
print("Eigenvalues:", eigenvalues)
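3. Solving Linear Systems
# Right-hand side of Ax = b (example values, assumed for illustration)
b = np.array([5, 6])
# Solve for x
x = np.linalg.solve(A, b)
print("Solution:", x)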
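A solution returned by np.linalg.solve can be checked by substituting it back into Ax = b. The sketch below uses the matrix A from the earlier examples and a right-hand side b whose values are assumed for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])  # assumed example values

# Solve the system:  x0 + 2*x1 = 5,  3*x0 + 4*x1 = 6
x = np.linalg.solve(A, b)
print("Solution:", x)  # approximately [-4.   4.5]

# Substituting x back should reproduce b (up to floating-point error).
residual_ok = np.allclose(A @ x, b)
print("A @ x equals b:", residual_ok)
```

np.linalg.solve raises LinAlgError when A is singular, so it is only suitable for square systems with a unique solution.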
4. Matrix Inversion
Find the inverse of a matrix (if it exists).
# Inverse of a matrix
A_inv = np.linalg.inv(A)
print("Inverse:\n", A_inv)
5. Advanced Operations
Perform singular value decomposition (SVD), QR decomposition, or compute
norms.
# Singular Value Decomposition (the third result is V transposed)
U, S, Vt = np.linalg.svd(A)
# Norm of a matrix (Frobenius norm by default)
norm = np.linalg.norm(A)
print("SVD Components:\n", U, S, Vt)
print("Norm:", norm)
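To see what the SVD components mean, the sketch below rebuilds the matrix from them and relates the singular values to the default (Frobenius) norm. It reuses the 2x2 matrix A from the earlier examples; the name reconstructed is chosen for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# svd returns U, the singular values, and the transpose of V.
U, S, Vt = np.linalg.svd(A)

# Multiplying the factors back together recovers A.
reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(reconstructed, A))  # Output: True

# The Frobenius norm equals the square root of the sum of squared singular values.
norm = np.linalg.norm(A)
print(np.isclose(norm, np.sqrt(np.sum(S ** 2))))  # Output: True
```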
These tools make Python a powerful choice for linear algebra tasks, whether
you're working on data science, machine learning, or scientific computing.
Pseudorandom Number Generation
Pseudo-random number generation is the process of generating a sequence
of numbers that appears random but is actually deterministic. These
numbers are produced by an algorithm, a pseudo-random number
generator (PRNG), using a starting value called a seed. If you use the same
seed, you'll get the exact same sequence of "random" numbers. This
predictability is useful for debugging and reproducibility in simulations.
import random
position = 0
path = [position]
for _ in range(100):
    # Step left or right with equal probability
    if random.random() < 0.5:
        position -= 1
    else:
        position += 1
    path.append(position)
In this example, random.random() is the PRNG function from Python's built-in
random module. It returns a float in the range [0.0, 1.0), so each step goes
left or right with equal probability.
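The same random walk can be written with NumPy's own generator, which also makes the role of the seed easy to demonstrate. This is a sketch under assumptions: the helper name random_walk and the seed value 42 are invented for this example, and np.random.default_rng requires NumPy 1.17 or later.

```python
import numpy as np

def random_walk(seed, n_steps=100):
    """Return the positions of a 1D random walk that starts at 0."""
    rng = np.random.default_rng(seed)         # generator initialised from the seed
    steps = rng.choice([-1, 1], size=n_steps)  # each step is -1 or +1
    return np.concatenate(([0], np.cumsum(steps)))

walk_a = random_walk(seed=42)
walk_b = random_walk(seed=42)

# The same seed reproduces exactly the same path.
print(np.array_equal(walk_a, walk_b))  # Output: True
print(len(walk_a))                     # Output: 101
```

Replacing the Python loop with rng.choice and np.cumsum generates all steps in one vectorized call.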