Comprehensive NumPy Guide for Python

Uploaded by samuel.yacoubb

NumPy Python library By ChatGPT

Alright, let's dive into a detailed guide for NumPy! I’ll structure this into sections to make it thorough
and manageable. Here’s how we’ll cover it:

1. Introduction to NumPy
2. NumPy Basics
   - Creating arrays
   - Array indexing and slicing
   - Array data types
3. Array Operations
   - Basic math and statistics on arrays
   - Array broadcasting
   - Universal functions
4. Advanced Array Manipulation
   - Array reshaping
   - Stacking and splitting arrays
5. Working with Random Data
   - Random number generation
   - Setting seeds
6. Linear Algebra with NumPy
   - Matrix operations
   - Eigenvalues and eigenvectors
7. Practical Examples
   - Real-world data manipulation with NumPy

After each point, I’ll include code examples with comments, explanations, and real-world applications.
Let’s start with the Introduction to NumPy and NumPy Basics. When you're ready to move on, just
type "continue," and I'll proceed with the next sections!

1. Introduction to NumPy

What is NumPy?

NumPy (Numerical Python) is a powerful library in Python designed for numerical computations. It
provides support for:

Multi-dimensional arrays (matrices)
Mathematical functions optimized for speed
Array broadcasting and linear algebra

NumPy is the foundation for data manipulation in Python, often used alongside libraries like Pandas
and SciPy.

Why use NumPy?

Printed using Save ChatGPT as PDF, powered by PDFCrowd HTML to PDF API. 1/30
Performance: NumPy arrays are much faster than traditional Python lists.
Memory Efficiency: Efficiently stores data using contiguous memory.
Flexibility: Supports complex operations like linear algebra, Fourier transformations, and
random data generation.

2. NumPy Basics

2.1 Creating Arrays

In NumPy, arrays are the primary structure. Let’s look at the basic ways to create arrays:

1. Creating a 1D Array:

python

import numpy as np
# Creating a 1D array from a list
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)

2. Creating a 2D Array:

python

# Creating a 2D array from a nested list
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)

3. Using Built-in Functions:

Zeros Array:

python

zeros_array = np.zeros((3, 3))
print("Zeros Array:\n", zeros_array)

Ones Array:

python

ones_array = np.ones((2, 4))
print("Ones Array:\n", ones_array)

Identity Matrix:

python

identity_matrix = np.eye(3)
print("Identity Matrix:\n", identity_matrix)

4. Array of a Range of Values:

python

range_array = np.arange(1, 10, 2)  # From 1 to 9 with a step of 2
print("Range Array:", range_array)

Explanation and Comments:

`np.array()` converts lists into arrays.
Functions like `np.zeros()`, `np.ones()`, and `np.eye()` generate arrays with specific patterns.
`np.arange()` is similar to Python's `range()` but returns an array; like `range()`, it excludes the stop value.
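A quick way to confirm what these constructors produce is to inspect `.shape` and `.dtype`. The sketch uses only the functions named above. Note that `np.zeros()`, `np.ones()`, and `np.eye()` default to floating-point values, and that `np.arange()` stops before the end value:

```python
import numpy as np

zeros = np.zeros((3, 3))
ones = np.ones((2, 4))
identity = np.eye(3)
steps = np.arange(1, 10, 2)

# zeros/ones/eye default to float64
print(zeros.shape, zeros.dtype)     # (3, 3) float64
print(ones.shape, ones.dtype)       # (2, 4) float64
print(identity.diagonal())          # [1. 1. 1.]

# arange excludes the stop value, just like range()
print(steps.tolist() == list(range(1, 10, 2)))   # True
```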

2.2 Array Indexing and Slicing

Indexing and slicing are essential for accessing data in arrays.

1. Basic Indexing:

python

# Accessing elements in a 1D array
print("First element:", array_1d[0])
print("Last element:", array_1d[-1])

2. Indexing in a 2D Array:

python

# Accessing elements in a 2D array (indices start at 0)
print("Element at row 0, column 1:", array_2d[0, 1])  # 2

Absolutely, let’s expand the introduction with more depth and examples. NumPy is foundational in
data science and scientific computing because it enables fast and efficient manipulation of large
datasets. I’ll dive into each core feature here:

1. Introduction to NumPy

What is NumPy?

NumPy, short for "Numerical Python," is a high-performance library for handling large datasets and
performing mathematical and logical operations in Python. It’s widely used in fields like data science,
machine learning, and scientific computing, largely because of its speed and flexibility. At its core,
NumPy offers:

1. ndarray: A powerful n-dimensional array object, similar to lists but with optimized performance
for large amounts of data.
2. Mathematical Functions: Built-in support for mathematical, logical, and statistical operations.
3. Broadcasting: Allows operations across arrays of different shapes and sizes without the need for
explicit loops.
4. Interoperability: NumPy is foundational for many other Python libraries like Pandas,
TensorFlow, and SciPy, making it easy to integrate into data pipelines.

Key Advantages of NumPy

Memory Efficiency: NumPy arrays use less memory than Python lists, which is essential for
large datasets.
Speed: Built on low-level languages like C and Fortran, NumPy operations are much faster than
pure Python equivalents.
Convenient Syntax: Provides concise and clear syntax for complex mathematical computations.

Why Use NumPy?

Scenario Example

Imagine a dataset with millions of rows, such as stock prices recorded over time. Using Python lists for mathematical operations (e.g., summing columns, finding averages) would be slow and resource-intensive, because each element is a separate Python object that the interpreter processes one at a time. With NumPy's `ndarray` structure, these operations run in compiled code over contiguous memory and are typically much faster.

Installation of NumPy

To get started with NumPy, first, you need to install it (if you haven’t already):

bash

pip install numpy

Importing NumPy

Once installed, we usually import NumPy as follows:

python

import numpy as np

This shorthand (`np`) is a convention in the Python community, making code more concise and
readable.

Real-World Example: Working with Sensor Data

Let’s say you’re working with data from environmental sensors (temperature, humidity, and pressure
readings). These values need to be processed efficiently to detect trends, patterns, or anomalies.

With NumPy, you can:

1. Store the readings in an array, where each row represents a reading and each column
represents a different type of measurement (temperature, humidity, etc.).
2. Perform statistical analyses to find averages, standard deviations, or maximum values.
3. Use broadcasting to apply transformations (e.g., temperature conversions) across the entire
dataset with minimal code.
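The three steps above might look like this in code. This is a minimal sketch with made-up readings. The column order (temperature in °C, humidity in %, pressure in hPa) is an assumption chosen for illustration:

```python
import numpy as np

# Hypothetical readings: one row per reading,
# columns = temperature (°C), humidity (%), pressure (hPa)
readings = np.array([
    [21.5, 40.0, 1012.0],
    [22.0, 42.0, 1011.5],
    [23.5, 45.0, 1010.0],
    [22.5, 43.0, 1011.0],
])

# Statistics per measurement type: axis=0 aggregates down each column
means = readings.mean(axis=0)
stds = readings.std(axis=0)
maxima = readings.max(axis=0)

# Convert the whole temperature column to Fahrenheit in one vectorized expression
temps_f = readings[:, 0] * 9 / 5 + 32

print("Means:", means)
print("Std devs:", stds)
print("Maxima:", maxima)
print("Temperatures (°F):", temps_f)
```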

Now that we understand why NumPy is so useful, let's start with NumPy Basics, covering how to
create arrays, indexing, slicing, and more.

Ready for the next part? Just type "continue," and I’ll move forward!

Great! Let's dive into NumPy Basics to get hands-on with arrays, indexing, slicing, and data types.

2. NumPy Basics

The ndarray (N-dimensional array) is the core data structure of NumPy. It’s like a Python list but with
added functionality, optimized for numeric data.

2.1 Creating Arrays

NumPy offers multiple ways to create arrays, from converting Python lists to generating arrays with
specific values or shapes. Let's look at the most common methods:

1. From Python Lists: You can convert a list directly to a NumPy array using `np.array()`.

python

import numpy as np

# Creating a 1D array from a list
arr_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr_1d)

# Creating a 2D array from a list of lists
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)

2. Using Built-in Functions: NumPy provides functions like `np.zeros()`, `np.ones()`, and
`np.arange()` for quickly creating arrays with specific values or ranges.

python

# Array of zeros
zeros_array = np.zeros((3, 3)) # 3x3 matrix of zeros
print("Zeros Array:\n", zeros_array)

# Array of ones
ones_array = np.ones((2, 4)) # 2x4 matrix of ones
print("Ones Array:\n", ones_array)

# Array of evenly spaced values
range_array = np.arange(0, 10, 2)  # Start at 0, end before 10, step by 2
print("Range Array:", range_array)

3. Random Arrays: Random numbers are essential in data science, especially for generating
sample data or initializing model weights.

python

# Array of random values in the range [0, 1)
random_array = np.random.rand(3, 3)
print("Random Array:\n", random_array)

2.2 Array Indexing and Slicing

Indexing and slicing in NumPy work similarly to Python lists but with more functionality, allowing us to
access and modify parts of arrays efficiently.

1. Indexing: Access individual elements in an array using indices.

python

# Accessing elements in a 1D array
print("First element in arr_1d:", arr_1d[0])

# Accessing elements in a 2D array (zero-based: second row, third column)
print("Element at row 1, column 2 in arr_2d:", arr_2d[1, 2])

2. Slicing: Extract subarrays using the slice notation `start:end:step`.

python

# Slicing a 1D array
slice_1d = arr_1d[1:4] # Extract elements from index 1 to 3
print("Sliced 1D Array:", slice_1d)

# Slicing a 2D array
slice_2d = arr_2d[:, 1]  # Extract the second column from all rows
print("Sliced 2D Array (column 2):", slice_2d)
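One detail matters when you use slices to modify data. A basic slice returns a view that shares memory with the original array, so writing into the slice also changes the original. Call `.copy()` when you need an independent array. A small sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]                  # basic slice: a view onto arr's data
view[0] = 99                     # writes through to arr
print(arr)                       # [ 1 99  3  4  5]

independent = arr[1:4].copy()    # explicit copy: separate memory
independent[0] = -1
print(arr)                       # unchanged: [ 1 99  3  4  5]
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))  # True False
```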

2.3 Array Data Types

NumPy allows specifying data types for arrays. This is useful for controlling memory usage and
precision.

1. Default Data Types: NumPy will infer the data type based on the input. You can check an array’s
data type using `.dtype`.

python

print("Data type of arr_1d:", arr_1d.dtype)

2. Specifying Data Types: You can set a specific data type during array creation.

python

float_array = np.array([1, 2, 3], dtype=float)
print("Float Array:", float_array)
print("Data type:", float_array.dtype)

3. Changing Data Types: Use `.astype()` to convert an array to another data type.

python

int_array = float_array.astype(int)  # Convert float to integer (drops any fractional part)
print("Converted Array:", int_array)
print("New data type:", int_array.dtype)
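Converting floats to integers with `.astype(int)` truncates toward zero instead of rounding. This can be surprising with fractional or negative values. If you want rounding, round first:

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5])

truncated = values.astype(int)           # drops the fractional part: [ 1 -1  2]
rounded = np.round(values).astype(int)   # rounds first; halves go to the even neighbor, so 2.5 -> 2
print("Truncated:", truncated)
print("Rounded:", rounded)               # [ 2 -2  2]
```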

Real-World Example: Processing Sensor Readings

Let's say you have a list of temperature readings in Celsius and want to analyze them. Here’s how you
can use NumPy:

1. Store the data in an array:

python

temp_readings = np.array([22.4, 23.1, 21.7, 22.9, 23.3, 21.8])

2. Convert the readings to Fahrenheit:

python

temp_fahrenheit = temp_readings * 9/5 + 32
print("Temperatures in Fahrenheit:", temp_fahrenheit)

3. Calculate the average temperature:

python

average_temp = np.mean(temp_readings)
print("Average Temperature in Celsius:", average_temp)

4. Identify readings above a threshold (e.g., 23°C):

python

high_temps = temp_readings[temp_readings > 23]  # Boolean mask keeps only matching readings
print("High Temperatures:", high_temps)
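The boolean mask in step 4 is itself an array, so you can reuse it for counting or combine it with other conditions. A short sketch using the same readings. The 22°C lower bound is an arbitrary example:

```python
import numpy as np

temp_readings = np.array([22.4, 23.1, 21.7, 22.9, 23.3, 21.8])

mask = temp_readings > 23                   # one True/False per reading
print("Mask:", mask)
print("Count above 23°C:", np.count_nonzero(mask))   # 2

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = temp_readings[(temp_readings > 22) & (temp_readings <= 23)]
print("Between 22 and 23°C:", in_range)     # [22.4 22.9]
```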

These basics should give you a strong start with NumPy arrays, their creation, and manipulation. Just
type "continue" when you’re ready to move on to Array Operations!

Alright, let's move on to Array Operations! In this section, we'll cover essential operations you can
perform on arrays, including basic math, statistical functions, broadcasting, and universal functions.

3. Array Operations

NumPy makes it simple to perform various operations on arrays without writing complex loops. These
operations are optimized for performance and can be applied across entire arrays or specific axes
(e.g., rows or columns in a 2D array).

3.1 Basic Math and Statistics on Arrays

NumPy provides a wide range of mathematical and statistical functions that operate on arrays,
including addition, subtraction, finding means, sums, and more.

1. Basic Arithmetic Operations: You can perform arithmetic operations directly between arrays or
between arrays and scalars.

python

arr = np.array([1, 2, 3, 4, 5])

# Addition
add_result = arr + 2
print("Array after addition:", add_result)

# Multiplication
multiply_result = arr * 3
print("Array after multiplication:", multiply_result)

# Adding two arrays
arr2 = np.array([5, 4, 3, 2, 1])
sum_result = arr + arr2
print("Sum of two arrays:", sum_result)

2. Statistical Functions: Calculate statistical properties like mean, median, standard deviation, and
sum.

python

# Mean
mean_value = np.mean(arr)
print("Mean:", mean_value)

# Sum
sum_value = np.sum(arr)
print("Sum:", sum_value)

# Standard deviation
std_value = np.std(arr)
print("Standard Deviation:", std_value)

These functions also work with 2D arrays. You can calculate statistics along specific axes (rows or
columns) by using the `axis` parameter:

python

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Sum along columns (axis=0)
col_sum = np.sum(arr_2d, axis=0)
print("Column-wise sum:", col_sum)

# Mean along rows (axis=1)
row_mean = np.mean(arr_2d, axis=1)
print("Row-wise mean:", row_mean)
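A useful way to remember the `axis` parameter is that it names the dimension that gets collapsed. For the 2x3 array above, `axis=0` collapses the rows and leaves one value per column. `axis=1` collapses the columns and leaves one value per row:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sum = np.sum(arr_2d, axis=0)    # collapses axis 0 -> shape (3,): [5 7 9]
row_mean = np.mean(arr_2d, axis=1)  # collapses axis 1 -> shape (2,): [2. 5.]
total = np.sum(arr_2d)              # no axis: collapses everything -> 21

print(col_sum, col_sum.shape)
print(row_mean, row_mean.shape)
print(total)
```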

3.2 Array Broadcasting

Broadcasting is a powerful feature in NumPy that lets you perform operations on arrays of different shapes and sizes without explicitly looping through elements. NumPy treats the smaller array as if it were stretched to match the shape of the larger one, without actually copying its data.

1. Basic Broadcasting Example:

python

arr = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Broadcasting - adding 1D array to 2D array
broadcast_result = matrix + arr
print("Result of broadcasting:\n", broadcast_result)

2. Rules of Broadcasting:
NumPy compares the shapes starting from the trailing (rightmost) dimensions. Each pair of dimensions must be the same size, or one of them must be 1. A missing leading dimension counts as 1.
NumPy will “stretch” the array with a dimension of 1 to match the size of the other array.
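To make the rule concrete, the sketch below implements the shape check by hand and compares it with `np.broadcast_shapes`, which is available in NumPy 1.20 and later. The helper name `broadcast_shape` is hypothetical and exists only for illustration:

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape of two shapes, or None if incompatible."""
    result = []
    # Walk both shapes from the trailing (rightmost) dimension; a missing dimension counts as 1
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            return None   # sizes differ and neither is 1
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))       # (2, 3)
print(broadcast_shape((2, 1), (1, 3)))     # (2, 3)
print(broadcast_shape((2, 3), (2,)))       # None: trailing sizes 3 and 2 differ
print(np.broadcast_shapes((2, 3), (3,)))   # NumPy's own check agrees: (2, 3)
```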

3.3 Universal Functions (ufuncs)

Universal functions, or `ufuncs`, are optimized functions in NumPy that operate element-wise on
arrays. These include functions like `np.sqrt()`, `np.exp()`, `np.log()`, and trigonometric functions.

1. Square Root and Exponential Functions:

python

arr = np.array([1, 4, 9, 16])

# Square root
sqrt_result = np.sqrt(arr)
print("Square root of each element:", sqrt_result)

# Exponential
exp_result = np.exp(arr)
print("Exponential of each element:", exp_result)

2. Trigonometric Functions:

python

angles = np.array([0, np.pi / 2, np.pi])

# Sine of each angle
sin_result = np.sin(angles)
print("Sine values:", sin_result)

# Cosine of each angle
cos_result = np.cos(angles)
print("Cosine values:", cos_result)
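If you run the trigonometry example, `np.sin(np.pi)` prints a tiny number (about 1.22e-16) instead of 0. This happens because `np.pi` is only a floating-point approximation of π. When comparing floating-point results, a tolerance-based check such as `np.isclose` or `np.allclose` is safer than `==`:

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sin_result = np.sin(angles)

print(sin_result[2] == 0)                   # False: tiny rounding error
print(np.isclose(sin_result[2], 0))         # True, within the default tolerance
print(np.allclose(sin_result, [0, 1, 0]))   # True for the whole array
```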

Real-World Example: Image Processing

Imagine you’re working on an image processing project. Each pixel in an image can be represented as
a value in a 2D or 3D array. Using array operations, you can manipulate these values quickly.

1. Brightness Adjustment: Suppose you have a grayscale image array (2D array of pixel values)
and want to increase the brightness by 50 units.

python

image = np.array([[100, 150, 200], [50, 75, 100], [25, 125, 225]])

# Increase brightness by 50, clipping so pixel values stay within the valid 0-255 range
bright_image = np.clip(image + 50, 0, 255)
print("Brightened image:\n", bright_image)

2. Thresholding: Apply a threshold to turn the image into a binary (black and white) image.

python

# Apply threshold - any pixel > 150 becomes white (255), others become black (0)
binary_image = np.where(image > 150, 255, 0)
print("Binary image:\n", binary_image)

3. Normalization: Normalize pixel values to be between 0 and 1 for certain machine learning
applications.

python

normalized_image = image / 255
print("Normalized image:\n", normalized_image)
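The examples above use NumPy's default integer type, so values above 255 are representable. Real image data is often stored as 8-bit unsigned integers (`np.uint8`), and arithmetic on that type silently wraps around on overflow. A sketch of the problem and one common fix (compute in a wider type, clip, then convert back):

```python
import numpy as np

pixels = np.array([100, 200, 225], dtype=np.uint8)

# Adding 50 directly stays in uint8 and wraps modulo 256: 225 + 50 -> 19
wrapped = pixels + np.uint8(50)
print("Wrapped:", wrapped)      # [150 250  19]

# Safer: compute in a wider integer type, clip to 0-255, then convert back to uint8
safe = np.clip(pixels.astype(np.int16) + 50, 0, 255).astype(np.uint8)
print("Clipped:", safe)         # [150 250 255]
```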

That covers the basics of array operations. These functions can help you handle numerical data and
make quick adjustments to datasets. When you're ready for the next section, Advanced Array
Manipulation, just type "continue"!

Alright! Now let’s move on to Advanced Array Manipulation, where we’ll cover reshaping arrays,
stacking and splitting them, and more. These techniques are especially useful for organizing and
reformatting data in data science workflows.

4. Advanced Array Manipulation

4.1 Array Reshaping

Reshaping allows you to change the dimensions of an array without altering its data. This is useful
when working with datasets that need to be reformatted for analysis or fed into machine learning
models.

1. Reshape Example: Convert a 1D array into a 2D array.

python

arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to 2 rows and 3 columns
reshaped_arr = arr.reshape(2, 3)
print("Reshaped Array (2x3):\n", reshaped_arr)

Explanation: In the example above, we’re converting a 1D array of 6 elements into a 2x3 2D
array. The total number of elements must remain the same.

2. Using `-1` to Infer Dimensions: NumPy can infer one dimension automatically if you set it to
`-1`.

python

arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to 3 rows and auto-calculate columns
reshaped_auto = arr.reshape(3, -1)
print("Reshaped with auto-calculated columns (3x2):\n", reshaped_auto)
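Reshaping only works when the new shape holds exactly the same number of elements. A six-element array cannot be split into four equal rows, and NumPy reports this as a `ValueError`:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

print(arr.reshape(2, -1).shape)   # (2, 3): -1 is inferred as 6 / 2 = 3

reshape_failed = False
try:
    arr.reshape(4, -1)            # 6 elements cannot fill 4 equal rows
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```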

4.2 Stacking and Splitting Arrays

Sometimes you need to combine arrays or split them into smaller arrays. NumPy provides several
functions for these tasks.

1. Stacking Arrays: You can stack arrays vertically or horizontally.

python

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Vertical stack
v_stack = np.vstack((arr1, arr2))
print("Vertical Stack:\n", v_stack)

# Horizontal stack
h_stack = np.hstack((arr1, arr2))
print("Horizontal Stack:", h_stack)

Explanation:
`np.vstack` stacks arrays vertically, turning each 1D input into a row of the result, while `np.hstack` joins them end to end (side by side).
The arrays need to match in size along the dimensions they are not being stacked on (e.g., the same number of columns for vertical stacking).
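Checking the shapes shows the difference between the two functions. For 1D inputs, `np.vstack` produces a 2D array with one row per input, while `np.hstack` concatenates them into a longer 1D array. Inputs of mismatched length cannot be stacked vertically:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

v_stack = np.vstack((arr1, arr2))
h_stack = np.hstack((arr1, arr2))
print(v_stack.shape)   # (2, 3)
print(h_stack.shape)   # (6,)

# Rows of length 3 and 2 cannot form one 2D array
vstack_failed = False
try:
    np.vstack((arr1, np.array([7, 8])))
except ValueError:
    vstack_failed = True
print("vstack with mismatched lengths failed:", vstack_failed)   # True
```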

2. Splitting Arrays: You can split arrays into subarrays using `np.split`, `np.hsplit`, and
`np.vsplit`.

python

arr = np.array([1, 2, 3, 4, 5, 6])

# Split array into 3 parts
split_arr = np.split(arr, 3)
print("Split Array:", split_arr)

# For a 2D array, horizontal and vertical splits are also possible
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
h_split = np.hsplit(arr2d, 3)  # Split horizontally into 3 columns
print("Horizontal Split:\n", h_split)

Explanation:
`np.split` divides an array into equal parts along an axis (axis 0 by default), while `np.hsplit` splits along columns and `np.vsplit` along rows.
This can be particularly useful when dividing data for training and testing in machine learning.
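When you pass a number of sections, `np.split` requires the parts to be equal. When you pass a list of indices, it cuts at those positions, which is how the training/testing split later in this section works. `np.array_split` is the variant that accepts unequal sizes:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# An integer means "equal parts"; 6 elements cannot be split into 4 equal parts
split_failed = False
try:
    np.split(arr, 4)
except ValueError:
    split_failed = True

# A list of indices means "cut before these positions"
first, middle, last = np.split(arr, [2, 5])
print(first, middle, last)                   # [1 2] [3 4 5] [6]

# np.array_split allows unequal parts
uneven = np.array_split(arr, 4)
print([part.tolist() for part in uneven])    # [[1, 2], [3, 4], [5], [6]]
```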

Real-World Example: Splitting Data for Machine Learning

Suppose you have a dataset with features and labels in a 2D array, where each row is a data instance
and each column is a feature. You can split this data into input and output arrays, or into training and
testing sets.

1. Separating Features and Labels: Let’s assume the last column of your array is the label, and the
other columns are features.

python

data = np.array([[5.1, 3.5, 1.4, 0.2, 0],
                 [4.9, 3.0, 1.4, 0.2, 0],
                 [6.3, 3.3, 6.0, 2.5, 1]])

# Separate features (first 4 columns) and labels (last column)
features = data[:, :-1]
labels = data[:, -1]

print("Features:\n", features)
print("Labels:", labels)

2. Splitting into Training and Testing Sets: Use `np.split` to divide the data into a training set
(80%) and a testing set (20%).

python

# Shuffle the rows in place for randomization
np.random.shuffle(data)

# Split at 80% of the rows: with the 3 rows above, int(0.8 * 3) = 2 rows go to training and 1 to testing
train, test = np.split(data, [int(0.8 * len(data))])

print("Training Set:\n", train)
print("Testing Set:\n", test)
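`np.random.shuffle` reorders rows in place, which works when features and labels live in the same array. If they are stored separately, applying one shuffled index order to both keeps each row paired with its label. A minimal sketch using the same small dataset as step 1 and NumPy's `default_rng` generator (NumPy 1.17+). The seed value is arbitrary:

```python
import numpy as np

data = np.array([[5.1, 3.5, 1.4, 0.2, 0],
                 [4.9, 3.0, 1.4, 0.2, 0],
                 [6.3, 3.3, 6.0, 2.5, 1]])
features = data[:, :-1]
labels = data[:, -1]

rng = np.random.default_rng(seed=0)       # arbitrary seed for reproducibility
order = rng.permutation(len(features))    # one shuffled order of row indices

split_at = int(0.8 * len(features))       # 2 of the 3 rows go to training
train_idx, test_idx = order[:split_at], order[split_at:]

# Indexing both arrays with the same indices keeps features and labels aligned
X_train, y_train = features[train_idx], labels[train_idx]
X_test, y_test = features[test_idx], labels[test_idx]
print("Train rows:", train_idx, "Test rows:", test_idx)
```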

These advanced array manipulations are essential tools when reshaping data for analysis or preparing
it for machine learning models.

Next, we’ll dive into Working with Random Data and how NumPy can help with random number
generation, which is important for tasks like creating synthetic datasets and initializing model weights.
Just type "continue" to proceed!

Alright! Now let’s explore Working with Random Data in NumPy, where we’ll learn how to generate
random numbers, create random arrays, and set random seeds. This is particularly useful for data
science tasks, such as creating synthetic datasets, initializing weights in neural networks, or shuffling
data for machine learning.

5. Working with Random Data

NumPy’s `random` module provides a range of functions to generate random numbers or arrays with
specified distributions. This module is highly versatile, covering everything from random integers to
more complex distributions.

5.1 Generating Random Numbers

1. Random Float Between 0 and 1: The `rand` function generates random floating-point numbers
in the range [0, 1).

python

import numpy as np

# Generate a random float
random_float = np.random.rand()
print("Random Float between 0 and 1:", random_float)

2. Random Integers: You can specify a range for generating random integers with `randint`.

python

# Generate a random integer from 10 up to, but not including, 50
random_int = np.random.randint(10, 50)
print("Random Integer in [10, 50):", random_int)

# Generate an array of 5 random integers from 1 to 9 (the upper bound 10 is exclusive)
random_int_array = np.random.randint(1, 10, size=5)
print("Array of Random Integers:", random_int_array)
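A quick empirical check shows that the upper bound of `randint` is exclusive. Even across many draws, `np.random.randint(1, 10)` never returns 10:

```python
import numpy as np

draws = np.random.randint(1, 10, size=10_000)
print("Smallest draw:", draws.min())   # 1 (almost certainly, given 10,000 draws)
print("Largest draw:", draws.max())    # at most 9, never 10
```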

5.2 Creating Random Arrays

NumPy can create arrays with random values in specific shapes and dimensions, which is useful for
simulating data.

1. Random Array with Specified Shape: Use `rand` to create arrays of random floats in the range
[0, 1).

python

# 2x3 array with random floats
random_array = np.random.rand(2, 3)
print("Random Array:\n", random_array)

2. Normal Distribution: Generate values following a normal (Gaussian) distribution using `randn`.
This is often used in machine learning for initializing weights.

python

# 1D array of 5 elements from a standard normal distribution (mean=0, std=1)
normal_array = np.random.randn(5)
print("Normal Distribution Array:", normal_array)

# 2D array with specified mean and standard deviation
normal_array_custom = np.random.normal(loc=10, scale=2, size=(2, 3))
print("Custom Normal Distribution Array:\n", normal_array_custom)

Explanation:
`loc` is the mean, and `scale` is the standard deviation.
This lets you control the central tendency and spread of values in the array.

5.3 Setting a Random Seed

Setting a seed ensures reproducibility in random operations, which is helpful when you need
consistent results, such as in testing or debugging.

1. Setting the Seed:

python

# Set a random seed for reproducibility
np.random.seed(42)

# Generate random numbers with the seed set
print("Random Number with Seed 42:", np.random.rand())

Explanation:
Setting `np.random.seed()` before random operations ensures that each run generates the
same sequence of random numbers.
This is valuable when sharing code or ensuring consistency across tests.
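Reproducibility is easy to verify, because reseeding with the same value replays the same sequence. The sketch also shows the generator-based API (`np.random.default_rng`, NumPy 1.17+), which NumPy's documentation recommends over the legacy `np.random.*` functions for new code. Each generator carries its own seed instead of changing global state:

```python
import numpy as np

np.random.seed(42)
first_run = np.random.rand(3)
np.random.seed(42)
second_run = np.random.rand(3)
print(np.array_equal(first_run, second_run))   # True: same seed, same numbers

# Generator-based alternative: the seed is local to each generator object
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))   # True
```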

Real-World Example: Generating Synthetic Data for Testing

Let’s say you’re building a system to analyze the weights and heights of individuals. You can use
NumPy to generate synthetic data to test your system.

1. Generate Heights and Weights with a Normal Distribution:

Assume heights (in cm) have a mean of 170 and a standard deviation of 10.
Assume weights (in kg) have a mean of 65 and a standard deviation of 15.

python

np.random.seed(0)  # Ensures results are reproducible

# Generate heights and weights
heights = np.random.normal(loc=170, scale=10, size=100)
weights = np.random.normal(loc=65, scale=15, size=100)

print("Sample Heights:", heights[:5])
print("Sample Weights:", weights[:5])

2. Combine Heights and Weights into a Dataset:

Stack heights and weights into a 2D array for easy analysis.

python

# Stack heights and weights as columns in a 2D array
data = np.column_stack((heights, weights))
print("Data (Heights and Weights):\n", data[:5])  # Display first 5 rows

3. Calculate Summary Statistics:

Analyze the mean and standard deviation of the generated data to check that they are close to the specified parameters. Sample statistics computed from 100 draws will not match the parameters exactly.

python

# Calculate mean and standard deviation
mean_height = np.mean(heights)
std_height = np.std(heights)
mean_weight = np.mean(weights)
std_weight = np.std(weights)

print("Mean Height:", mean_height)
print("Height Std Dev:", std_height)
print("Mean Weight:", mean_weight)
print("Weight Std Dev:", std_weight)

This synthetic dataset could be used for testing algorithms that analyze physical data, providing a
controlled and reproducible set of random data.

Next, we’ll explore Linear Algebra with NumPy, covering matrices, matrix operations, and linear
transformations. Just type "continue" when you're ready to move forward!

Great! Let's continue with Linear Algebra with NumPy, a critical area for scientific computing and
data science. NumPy offers powerful tools for matrix operations, solving linear systems, and
performing transformations, which are essential for many applications in machine learning, physics,
and engineering.

6. Linear Algebra with NumPy

NumPy’s `linalg` module provides many functions for performing linear algebra operations, such as
matrix multiplication, finding determinants, solving linear systems, and computing eigenvalues.

6.1 Creating Matrices

In NumPy, matrices can be created with 2D arrays. Here’s how to set up some basic matrices:

1. Basic Matrix Setup:

python

import numpy as np

# Define a 2x3 matrix
matrix_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
print("2x3 Matrix:\n", matrix_2x3)

# Define a 3x3 square matrix
matrix_3x3 = np.array([[1, 2, 3], [0, 1, 4], [5, 6, 0]])
print("3x3 Matrix:\n", matrix_3x3)

6.2 Matrix Transpose

The transpose of a matrix swaps its rows with its columns (an m x n matrix becomes n x m), which is useful for aligning shapes in matrix multiplication and other vectorized computations.

1. Transpose Example:

python

# Original matrix
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Transpose
transposed_matrix = matrix.T
print("Transposed Matrix:\n", transposed_matrix)

6.3 Matrix Multiplication

Matrix multiplication is a common operation in linear algebra. NumPy provides both element-wise and
dot product (matrix) multiplications.

1. Element-wise Multiplication:

python

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[2, 0], [1, 3]])

# Element-wise multiplication
elementwise_product = matrix_a * matrix_b
print("Element-wise Product:\n", elementwise_product)

2. Dot Product (Matrix Multiplication):

python

# Matrix multiplication
matrix_product = np.dot(matrix_a, matrix_b)
print("Matrix Product (Dot Product):\n", matrix_product)

# Alternatively, you can use @ for matrix multiplication
matrix_product_alt = matrix_a @ matrix_b
print("Matrix Product using '@':\n", matrix_product_alt)

Explanation:
In element-wise multiplication, corresponding elements are multiplied together.
In matrix multiplication (dot product), the element in row i, column j of the result is the sum of the products of the elements of row i of the first matrix with the corresponding elements of column j of the second.
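The sketch below applies the rule directly. It computes the product of `matrix_a` and `matrix_b` with explicit loops and checks the result against `@`. The function `matmul_loops` is a hypothetical helper written only to illustrate the rule; in practice `@` or `np.dot` is far faster:

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[2, 0], [1, 3]])

def matmul_loops(a, b):
    """Hypothetical illustration: result[i, j] = sum over k of a[i, k] * b[k, j]."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    if inner != inner_b:
        raise ValueError("inner dimensions must match")
    result = np.zeros((rows, cols), dtype=np.result_type(a, b))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i, j] += a[i, k] * b[k, j]
    return result

print(matmul_loops(matrix_a, matrix_b).tolist())   # [[4, 6], [10, 12]]
print((matrix_a * matrix_b).tolist())              # element-wise: [[2, 0], [3, 12]]
```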

6.4 Determinant and Inverse of a Matrix

The determinant of a square matrix provides insight into the matrix's properties, and the inverse
matrix is useful for solving systems of equations.

1. Determinant:

python

square_matrix = np.array([[1, 2], [3, 4]])

# Calculate the determinant (a floating-point result, approximately -2.0)
determinant = np.linalg.det(square_matrix)
print("Determinant:", determinant)

2. Inverse Matrix:

python

# Calculate the inverse
inverse_matrix = np.linalg.inv(square_matrix)
print("Inverse Matrix:\n", inverse_matrix)

Explanation:
A matrix must be square (same number of rows and columns) and have a non-zero
determinant to have an inverse.
The inverse matrix, when multiplied with the original, gives the identity matrix.
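You can check both statements numerically. Floating-point arithmetic introduces tiny rounding errors, so the comparisons use `np.isclose` and `np.allclose` instead of exact equality. The singular matrix at the end is an example chosen for illustration:

```python
import numpy as np

square_matrix = np.array([[1, 2], [3, 4]])
determinant = np.linalg.det(square_matrix)   # approximately -2.0
inverse_matrix = np.linalg.inv(square_matrix)

print(np.isclose(determinant, -2.0))                            # True
print(np.allclose(square_matrix @ inverse_matrix, np.eye(2)))   # True
print(np.allclose(inverse_matrix @ square_matrix, np.eye(2)))   # True

# A singular matrix (determinant 0, second row = 2 x first row) has no inverse
singular = np.array([[1, 2], [2, 4]])
singular_failed = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    singular_failed = True
    print("No inverse:", err)
```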

6.5 Solving Linear Systems

For a system of linear equations represented as Ax = b, where A is a matrix and b is a vector, you can
solve for x using `np.linalg.solve`.

1. Example: Solving Ax = b:

python

# Define matrix A and vector b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve for x
x = np.linalg.solve(A, b)
print("Solution (x):", x)

Explanation:
This finds the vector x that satisfies Ax = b; for this system, x = [2, 3].
`np.linalg.solve` is faster and more numerically stable than computing the inverse matrix and multiplying by it.
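A simple sanity check is to substitute the solution back into the equation. `A @ x` should reproduce `b` up to rounding:

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(A, b)
print("x:", x)                          # approximately [2, 3]
print(np.allclose(A @ x, b))            # True

# The same answer via the inverse, shown only for comparison; solve() is preferred
x_via_inverse = np.linalg.inv(A) @ b
print(np.allclose(x, x_via_inverse))    # True
```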

6.6 Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors are foundational concepts in linear algebra, with applications in
machine learning, physics, and more.

1. Calculating Eigenvalues and Eigenvectors:

python

# Define a square matrix
matrix = np.array([[4, -2], [1, 1]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

Explanation:
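An eigenvector v of a matrix A is a direction that the transformation only scales (stretches, shrinks, or flips) without rotating it, so that Av = λv.
The eigenvalue λ is the factor by which its eigenvector is scaled.
`np.linalg.eig` returns the eigenvectors as columns: `eigenvectors[:, i]` corresponds to `eigenvalues[i]`.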
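You can check the defining property Av = λv column by column. For this matrix the eigenvalues are 2 and 3, the roots of λ² − 5λ + 6 = 0 (trace 5, determinant 6). `np.linalg.eig` does not guarantee any particular order, so the sketch sorts them before printing:

```python
import numpy as np

matrix = np.array([[4, -2], [1, 1]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]          # the i-th eigenvector is the i-th COLUMN
    lam = eigenvalues[i]
    print(lam, np.allclose(matrix @ v, lam * v))   # each check prints True

print("Sorted eigenvalues:", np.sort(eigenvalues))  # approximately [2. 3.]
```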

Real-World Example: Solving a System of Linear Equations

Let’s say you have two equations relating the prices of two items:

3x + y = 9

x + 2y = 8

where x represents the price of item A and y represents the price of item B. Using matrix notation, we can express this as Ax = b, where A holds the coefficients, b holds the right-hand sides, and the unknown vector x holds the two prices:

python

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve the system
prices = np.linalg.solve(A, b)
print("Prices of Item A and Item B:", prices)

This example shows how easily you can solve systems of equations using linear algebra in NumPy.

This concludes our overview of basic linear algebra operations. These tools are essential for a variety
of applications, from solving equations to understanding transformations in data.

Next up, we'll look at Broadcasting in Practice, where we’ll apply what we've learned about
broadcasting to some real-world tasks. Just type "continue" to proceed!

Great! Let’s dive into Broadcasting in Practice. Broadcasting is a powerful feature in NumPy that
allows operations between arrays of different shapes, which can save time and make code more
efficient. Understanding broadcasting is essential for performing vectorized operations, especially in
data processing and machine learning tasks.

7. Broadcasting in Practice

Broadcasting allows you to perform operations on arrays of different shapes without requiring explicit
reshaping. This feature is highly useful when you want to apply operations on each element of an
array or to align arrays of varying shapes.

Let’s break down how broadcasting works with some examples.

7.1 Basic Broadcasting Rules

To understand how broadcasting works, here are the basic rules:

1. If the arrays have a different number of dimensions, prepend `1`s to the shape of the smaller
array until they match in dimensions.
2. If the shapes are still different, the arrays are compatible in broadcasting if each dimension is
either:
The same in both arrays
One of the arrays has a size of `1` in that dimension

When these rules are satisfied, broadcasting can proceed; otherwise, NumPy raises a `ValueError`.
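When the rules are not satisfied, NumPy does not guess. For example, a (2, 3) array and a 1D array of length 2 have trailing dimensions 3 and 2, which differ and neither is 1:

```python
import numpy as np

array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
short = np.array([10, 20])      # shape (2,)

broadcast_failed = False
try:
    array_2x3 + short           # trailing sizes 3 and 2 are incompatible
except ValueError as err:
    broadcast_failed = True
    print("Broadcast error:", err)
```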

7.2 Examples of Broadcasting

1. Scalar and Array: When a scalar (single number) is used in an operation with an array, the scalar
is broadcast across all elements.

python

import numpy as np

array = np.array([1, 2, 3, 4])
result = array + 10  # Add 10 to each element
print("Array + 10:", result)

Explanation: Here, `10` is broadcast to every element of `array`, so the addition is applied
element-wise and the result is `[11 12 13 14]`.

2. Different Shapes with Compatible Dimensions:

python

# Define a 2x3 array
array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])

# Define a 1-D array of shape (3,); broadcasting treats it as a 1x3 row vector
row_vector = np.array([10, 20, 30])

# Broadcast row vector to each row in array_2x3
result = array_2x3 + row_vector
print("Broadcasted Addition:\n", result)

Explanation: Here, the row vector `[10, 20, 30]` is broadcast across each row of the `2x3` array.
NumPy behaves as if the row vector were repeated once per row, without actually copying it in
memory, and then adds the two arrays element-wise.

3. Broadcasting with Different Axes:

python

# Define a 2x3 array
array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])

# Define a 2x1 column vector
column_vector = np.array([[10], [20]])

# Broadcast the column vector across the columns of array_2x3
result = array_2x3 + column_vector
print("Broadcasted Addition with Column Vector:\n", result)

Explanation: Here, the column vector `[[10], [20]]` (shape `(2, 1)`) is stretched along the columns
of the `2x3` array, so `10` is added to every element of the first row and `20` to every element of
the second row. NumPy treats its shape as `(2, 3)` to match `array_2x3`.
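To see what "stretching" means in the examples above, `np.broadcast_to` can materialize the broadcast operand as a read-only view. A minimal sketch reusing the same arrays; the `[1, 2, 3]` row in the last step is an illustrative value, not taken from the examples.

```python
import numpy as np

array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
row_vector = np.array([10, 20, 30])     # shape (3,), treated as (1, 3)
column_vector = np.array([[10], [20]])  # shape (2, 1)

# np.broadcast_to shows the stretched operand: a read-only view, not a copy
stretched_row = np.broadcast_to(row_vector, array_2x3.shape)
print(stretched_row)
print("Strides:", stretched_row.strides)  # first stride is 0: both rows reuse the same memory

# The view holds the same values as an explicit copy built with np.tile
print(np.array_equal(stretched_row, np.tile(row_vector, (2, 1))))  # True

# When both operands have a size-1 axis, both are stretched: (2, 1) + (3,) -> (2, 3)
table = column_vector + np.array([1, 2, 3])
print(table)
# [[11 12 13]
#  [21 22 23]]
```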

7.3 Real-World Example: Normalizing Data by Broadcasting

Imagine you have a dataset where each row represents a data point, and each column represents a
feature. You want to normalize each feature column (subtract the mean and divide by the standard
deviation).

1. Dataset Setup:

python

# Create a 3x3 dataset
data = np.array([[10, 20, 30], [15, 25, 35], [20, 30, 40]])
print("Original Data:\n", data)

2. Calculate Mean and Standard Deviation:

python

# Compute mean and standard deviation along each column
mean = np.mean(data, axis=0)
std_dev = np.std(data, axis=0)

print("Mean:", mean)
print("Standard Deviation:", std_dev)

3. Normalize Using Broadcasting:

python

# Normalize each column
normalized_data = (data - mean) / std_dev
print("Normalized Data:\n", normalized_data)

Explanation:
Here, `mean` and `std_dev` are 1D arrays with one value per column.
Broadcasting applies the subtraction of `mean` and the division by `std_dev` to every row of
`data`, so each column is normalized with its own statistics, without explicit loops.
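`mean` and `std_dev` have shape `(3,)`, which lines up with the last axis of `data`, so column-wise normalization needs no reshaping. Normalizing each row instead needs statistics of shape `(3, 1)`, which `keepdims=True` provides. Because this dataset is square, omitting `keepdims` would not raise an error; it would silently subtract the wrong values. A sketch:

```python
import numpy as np

data = np.array([[10, 20, 30], [15, 25, 35], [20, 30, 40]])

# Column-wise: statistics of shape (3,) broadcast as (1, 3) across the rows
col_norm = (data - data.mean(axis=0)) / data.std(axis=0)
print(col_norm.mean(axis=0))  # approximately 0 for every column
print(col_norm.std(axis=0))   # approximately 1 for every column

# Row-wise: keepdims=True keeps shape (3, 1), so each row uses its own statistics
row_mean = data.mean(axis=1, keepdims=True)
row_std = data.std(axis=1, keepdims=True)
row_norm = (data - row_mean) / row_std
print(row_mean.shape)  # (3, 1)
print(row_norm)
```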

7.4 Applying Broadcasting to Image Processing

In image processing, broadcasting is often used to apply transformations to color channels. Let’s say
you have an image represented by a 3D array with dimensions `(height, width, 3)` for the RGB
channels. You want to increase the brightness of the image by adding a value to each RGB channel.

1. Setup a Sample Image:

python

# Create a 2x2 image with RGB channels (values range from 0 to 255)
image = np.array([[[100, 150, 200], [50, 100, 150]],
                  [[200, 50, 100], [150, 200, 50]]])
print("Original Image:\n", image)

2. Define Brightness Increase for Each Channel:

python

# Brightness increase for each RGB channel
brightness_increase = np.array([10, 20, 30])  # R, G, B

# Broadcast and add brightness increase to each pixel
brighter_image = image + brightness_increase
print("Brighter Image:\n", brighter_image)

Explanation:
The `brightness_increase` array is broadcast across each pixel of the image.
This effectively increases the brightness in each RGB channel for every pixel.
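One caveat for real images: pixel data is usually stored as `uint8`, so brightened values can exceed `255` and, with `uint8` arithmetic (for example adding a Python integer), wrap around to small numbers instead. A common pattern, sketched here with a hypothetical 1x2 image, is to compute in a wider type and clip with `np.clip` before converting back.

```python
import numpy as np

# Hypothetical 1x2 uint8 image; the first pixel's red and second pixel's green are near 255
image = np.array([[[250, 100, 10], [120, 240, 60]]], dtype=np.uint8)
brightness_increase = np.array([10, 20, 30])  # R, G, B

# Compute in a wider integer type, clip to the valid 0-255 range, then convert back
brighter = np.clip(image.astype(np.int16) + brightness_increase, 0, 255).astype(np.uint8)
print(brighter)
# [[[255 120  40]
#   [130 255  90]]]
```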

Summary

Broadcasting in NumPy is an efficient way to perform element-wise operations without needing to
reshape arrays explicitly. This is especially useful in data processing, normalization, and image
manipulation, where operations need to be applied across large datasets or multidimensional arrays.

Next, we’ll move on to Handling Missing Data with NumPy, where we’ll explore strategies for
dealing with `NaN` (Not a Number) values and introduce functions for detecting and handling missing
data points. Just type "continue" when you're ready!

Let's continue with Handling Missing Data with NumPy. Managing missing data is crucial in data
analysis since real-world datasets often have incomplete values. NumPy provides tools for identifying,
handling, and replacing missing values, typically represented by `NaN` (Not a Number).

8. Handling Missing Data with NumPy

Missing data can interfere with calculations and lead to inaccurate results. Here’s how to identify,
manage, and fill missing values in NumPy arrays.

8.1 Identifying Missing Data

NumPy uses `NaN` (Not a Number) as a placeholder for missing values in floating-point arrays. To
check if values are missing, use `np.isnan`.

1. Creating an Array with Missing Values:

python

import numpy as np

# Define an array with some missing values (NaN)
data_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])
print("Data with NaNs:\n", data_with_nan)

2. Detecting Missing Values:

python

# Use np.isnan to identify NaNs
is_nan = np.isnan(data_with_nan)
print("Is NaN:\n", is_nan)

Explanation:
`np.isnan` returns a Boolean array where `True` indicates missing values.
This Boolean mask can be used for filtering or replacing values.
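Comparing against `np.nan` with `==` is a common mistake, because `NaN` is not equal to anything, not even itself. The sketch below, using the same `data_with_nan` values, shows why `np.isnan` is needed and how summing the mask counts the missing entries.

```python
import numpy as np

data_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])

# NaN never compares equal to anything, including itself
print(np.nan == np.nan)                 # False
print((data_with_nan == np.nan).any())  # False: this comparison finds nothing

# np.isnan is the reliable test; summing the Boolean mask counts the missing values
print("Missing values:", np.isnan(data_with_nan).sum())  # 2
```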

8.2 Removing Missing Data

Sometimes, the simplest way to handle missing data is to remove rows or columns with `NaN` values.
NumPy does not have a direct function for this, but you can use Boolean indexing to filter out the
`NaN` values.

1. Removing NaNs:

python

# Remove NaN values from the array
data_no_nan = data_with_nan[~np.isnan(data_with_nan)]
print("Data without NaNs:\n", data_no_nan)

Explanation:
`~np.isnan(data_with_nan)` inverts the Boolean mask, selecting only non-NaN elements.
This approach works well when only a small percentage of values are missing, so removing
them won't distort the data.
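The section mentions removing rows or columns, but the example is 1-D. For a 2-D array, one approach (a sketch using the same values as `data_2d` in section 8.4) is to reduce the `NaN` mask with `.any()` along one axis and use the result for Boolean indexing.

```python
import numpy as np

data_2d = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])

# A row is kept only if none of its entries is NaN
rows_without_nan = data_2d[~np.isnan(data_2d).any(axis=1)]
print(rows_without_nan)  # [[7. 8. 9.]]

# The same idea along axis=0 keeps only the columns that contain no NaN
cols_without_nan = data_2d[:, ~np.isnan(data_2d).any(axis=0)]
print(cols_without_nan)
```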

8.3 Replacing Missing Data

In cases where missing values cannot be ignored, it’s common to replace them with a substitute value,
such as the mean or median of the data (a process known as imputation).

1. Filling NaNs with the Mean:

python

# Calculate the mean, ignoring NaNs
mean_value = np.nanmean(data_with_nan)

# Replace NaNs with the calculated mean
filled_data = np.where(np.isnan(data_with_nan), mean_value, data_with_nan)
print("Data with NaNs replaced by mean:\n", filled_data)

Explanation:
`np.nanmean(data_with_nan)` calculates the mean while ignoring `NaN` values.
`np.where` replaces `NaN` values with `mean_value`, leaving other elements unchanged.

2. Replacing NaNs with a Specific Value:

python

# Replace NaNs with a specific value, e.g., 0
data_filled_zero = np.where(np.isnan(data_with_nan), 0, data_with_nan)
print("Data with NaNs replaced by 0:\n", data_filled_zero)

Explanation:
This method lets you replace `NaN` values with any constant, such as `0`, which may be useful
in certain data preprocessing tasks.
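Two alternatives to `np.where` for constant replacement: `np.nan_to_num`, which returns a copy with `NaN` replaced (it also replaces infinities with large finite numbers, which may not be wanted), and Boolean-mask assignment on a copy. The `nan=` keyword assumes NumPy 1.17 or later; `-1.0` is an arbitrary sentinel chosen for illustration.

```python
import numpy as np

data_with_nan = np.array([1, 2, np.nan, 4, np.nan, 6])

# np.nan_to_num returns a copy with NaNs replaced by the given value
print(np.nan_to_num(data_with_nan, nan=0.0))  # [1. 2. 0. 4. 0. 6.]

# Alternatively, assign through a Boolean mask on a copy (the original stays unchanged)
filled = data_with_nan.copy()
filled[np.isnan(filled)] = -1.0
print(filled)
```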

8.4 Real-World Example: Filling Missing Data in a 2D Array

Let’s say you have a 2D array representing sensor readings, with some readings missing (`NaN`). You
want to fill each missing value with the mean of its respective column.

1. Create a 2D Array with Missing Values:

python

# Sensor readings with two missing values
data_2d = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
print("2D Array with NaNs:\n", data_2d)

2. Calculate Column Means:

python

# Calculate mean of each column, ignoring NaNs
col_means = np.nanmean(data_2d, axis=0)
print("Column Means:", col_means)

3. Fill NaNs with Column Means:

python

# Replace NaNs with corresponding column means
filled_data_2d = np.where(np.isnan(data_2d), col_means, data_2d)
print("2D Array with NaNs replaced by column means:\n", filled_data_2d)

Explanation:
`np.nanmean(data_2d, axis=0)` computes the mean for each column, excluding `NaN` values.
`np.where` then fills each missing value with the mean of its column: `col_means` has shape
`(3,)`, so it broadcasts across the rows of `data_2d`.
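`np.where` builds a full new array. If you prefer to write only the missing positions, one option is to look up each `NaN`'s column index with `np.nonzero` and assign from `col_means` with fancy indexing. A sketch with the same data, which should give the same result as the `np.where` version:

```python
import numpy as np

data_2d = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
col_means = np.nanmean(data_2d, axis=0)  # [4.  5.  7.5]

# np.nonzero returns the (row, column) positions of every NaN
rows, cols = np.nonzero(np.isnan(data_2d))

# Each missing entry takes the mean of its own column; only those positions are written
filled = data_2d.copy()
filled[rows, cols] = col_means[cols]
print(filled)
```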

8.5 Advanced: Interpolating Missing Data

In time-series data, where missing values may disrupt continuity, interpolation can estimate values
based on neighboring data. NumPy has no dedicated function for filling `NaN` gaps, but `np.interp`,
its built-in 1-D linear interpolation function, can do the job.

1. Interpolating Missing Values in a 1D Array:

python

# Create a time series with missing values
time_series = np.array([1, np.nan, 3, np.nan, 5, 6, np.nan, 8])

# Identify indices of valid (non-NaN) data
valid_indices = np.where(~np.isnan(time_series))[0]
valid_values = time_series[valid_indices]

# Identify indices of missing (NaN) data
nan_indices = np.where(np.isnan(time_series))[0]

# Interpolate missing values linearly between the neighboring valid points
interpolated_series = time_series.copy()
interpolated_series[nan_indices] = np.interp(nan_indices, valid_indices, valid_values)
print("Interpolated Time Series:\n", interpolated_series)

Explanation:
`np.interp` uses linear interpolation to estimate missing values based on the surrounding
data.
This approach is helpful for filling in gaps in time-series data, where continuity is essential.
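`np.interp` does not extrapolate: missing values before the first valid point or after the last one receive the nearest end value (adjustable with its `left` and `right` arguments). The series below is hypothetical, chosen so the gaps sit at both ends. `np.interp` also expects the known x-coordinates to be increasing, which index positions always are.

```python
import numpy as np

# Hypothetical series with gaps at both ends as well as in the middle
series = np.array([np.nan, 2.0, np.nan, 6.0, np.nan])

valid = ~np.isnan(series)
x = np.arange(series.size)

filled = series.copy()
filled[~valid] = np.interp(x[~valid], x[valid], series[valid])
print(filled)  # [2. 2. 4. 6. 6.]
```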

Summary

Handling missing data is essential for data preparation and analysis. NumPy provides tools for
detecting, removing, and replacing `NaN` values, allowing you to clean data effectively before further
processing.

Next, we’ll explore Random Sampling and Distributions with NumPy, a core aspect of data science
and machine learning. You’ll learn to generate random samples and create distributions that are
fundamental for simulations, testing, and modeling. Just type "continue" to proceed!

Let’s move on to Random Sampling and Distributions with NumPy. This feature is fundamental for
creating simulations, performing experiments, and building data models. NumPy’s `random` module
provides an array of functions for generating random values, sampling, and working with various
probability distributions.

9. Random Sampling and Distributions with NumPy

The `np.random` submodule lets you generate random numbers, sample data, and draw values from
probability distributions. These functions are widely used in tasks like testing algorithms,
initializing machine learning models, and creating randomized simulations.

9.1 Generating Random Numbers

The most basic random function in NumPy is `np.random.rand`, which generates random floating-point
numbers in the half-open interval `[0, 1)`: `0` can occur, `1` cannot.

1. Generate Random Numbers:

python

import numpy as np

# Generate 5 random numbers in [0, 1)
random_numbers = np.random.rand(5)
print("Random numbers between 0 and 1:", random_numbers)

Explanation: `np.random.rand` generates an array of random numbers uniformly distributed over
`[0, 1)`.

2. Generate Random Integers:

python

# Generate 5 random integers from 10 to 19 (the upper bound 20 is excluded)
random_integers = np.random.randint(10, 20, size=5)
print("Random integers from 10 to 19:", random_integers)

Explanation: `np.random.randint(low, high)` generates random integers from `low` (inclusive) up to
`high` (exclusive). The `size` parameter determines the number of integers to generate.
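Both `randint` and the newer `Generator.integers` exclude the upper bound by default; `integers` additionally accepts `endpoint=True` to include it. A short sketch comparing the two (seed `0` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# randint's upper bound is exclusive: values are drawn from 10..19
legacy = np.random.randint(10, 20, size=1000)
print(legacy.min(), legacy.max())

# Generator.integers excludes the bound by default; endpoint=True makes 20 possible
inclusive = rng.integers(10, 20, size=1000, endpoint=True)
print(inclusive.min(), inclusive.max())
```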

9.2 Creating Random Arrays

You can also generate multi-dimensional arrays with random values. This is useful for initializing
matrices, tensors, or simulating datasets.

1. Random Array of Shape (3, 3):

python

# Generate a 3x3 random array
random_array = np.random.rand(3, 3)
print("3x3 Random Array:\n", random_array)

Explanation: `np.random.rand(3, 3)` generates a 3x3 array of random values in `[0, 1)`. Note that
`rand` takes each dimension as a separate argument rather than a shape tuple.

9.3 Seeding the Random Number Generator

Seeding ensures reproducibility. By setting a seed value, you get the same random values every time
you run the code.

1. Set a Seed for Reproducibility:

python

# Set a seed for reproducibility
np.random.seed(42)

# Generate a random array
reproducible_array = np.random.rand(3)
print("Reproducible Array with Seed 42:", reproducible_array)

Explanation: With `np.random.seed(42)`, any subsequent random numbers generated will be the
same each time this code is run. This is important for debugging and ensuring consistent results.
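`np.random.seed` sets hidden global state shared by every `np.random.*` call in the program. NumPy's documentation recommends `Generator` objects from `np.random.default_rng` for new code; each generator carries its own seed. A minimal sketch (note that `default_rng(42)` produces different numbers than `np.random.seed(42)`, since the two use different algorithms):

```python
import numpy as np

# Two generators created with the same seed produce identical streams
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))  # True

# A generator is a local object, so seeding it does not touch np.random's global state
rng = np.random.default_rng(42)
print(rng.random(3))          # floats in [0, 1), like np.random.rand
print(rng.normal(0, 1, 3))    # distribution methods mirror np.random
print(rng.integers(1, 7, 5))  # replacement for np.random.randint
```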

9.4 Sampling from Distributions

NumPy allows sampling from various distributions. This is essential for simulations, probability, and
machine learning.

1. Uniform Distribution:

The uniform distribution makes every value between two specified bounds equally likely.

python

# Generate 5 random numbers from a uniform distribution between 0 and 10
uniform_samples = np.random.uniform(0, 10, 5)
print("Uniformly Distributed Samples:", uniform_samples)

Explanation: `np.random.uniform(low, high, size)` generates samples with equal probability across
the half-open range `[0, 10)`.

2. Normal Distribution (Gaussian):

The normal distribution is commonly used in statistics and machine learning.

python

# Generate 5 samples from a normal distribution with mean=0 and std=1
normal_samples = np.random.normal(0, 1, 5)
print("Normally Distributed Samples (mean=0, std=1):", normal_samples)

Explanation: `np.random.normal(0, 1, 5)` generates samples from a normal distribution with a
mean of `0` and standard deviation of `1`.

3. Binomial Distribution:
This distribution counts the number of successes in `n` independent yes/no trials, each of which
succeeds with probability `p`.

python

# Generate 5 samples from a binomial distribution (n=10 trials, p=0.5)
binomial_samples = np.random.binomial(n=10, p=0.5, size=5)
print("Binomially Distributed Samples:", binomial_samples)

Explanation: Each sample represents the number of successes out of `10` trials, with a `0.5`
probability of success per trial.
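With a large sample, the summary statistics should approach the theoretical values, which is a quick sanity check on the parameters. A sketch using a `Generator` from `np.random.default_rng` (seed `0` and the height parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal: loc is the mean and scale the standard deviation (hypothetical heights in cm)
heights = rng.normal(loc=170, scale=10, size=100_000)
print(heights.mean(), heights.std())  # close to 170 and 10

# Shifting and scaling standard normal draws gives the same distribution
heights_alt = 170 + 10 * rng.standard_normal(100_000)

# Binomial with n=10, p=0.5: theory gives mean n*p = 5 and variance n*p*(1-p) = 2.5
successes = rng.binomial(n=10, p=0.5, size=100_000)
print(successes.mean(), successes.var())

# With n=1 each draw is a single yes/no trial, so only 0 or 1 can appear
coin_flips = rng.binomial(n=1, p=0.5, size=10)
print(coin_flips)
```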

9.5 Real-World Example: Simulating Dice Rolls

A common example of sampling is simulating dice rolls. Let’s create a simulation of rolling a six-sided
die `1000` times and see the distribution.

1. Simulating Dice Rolls:

python

# Simulate rolling a die 1000 times (faces 1 to 6; the upper bound 7 is excluded)
dice_rolls = np.random.randint(1, 7, size=1000)
print("First 10 Dice Rolls:", dice_rolls[:10])

2. Analyzing the Results:

python

# Count occurrences of each outcome (1 to 6); minlength=7 guarantees a slot for every face
counts = np.bincount(dice_rolls, minlength=7)[1:7]
print("Counts for each face (1 to 6):", counts)

Explanation:
`np.random.randint(1, 7, size=1000)` simulates `1000` dice rolls, where each outcome is an
integer from `1` to `6`.
`np.bincount(dice_rolls, minlength=7)` counts how often each value from `0` to `6` occurs, and
`[1:7]` drops the unused `0` slot. For a fair die, each count should be close to
`1000 / 6 ≈ 167`.

3. Visualizing the Results (Optional):

python

import matplotlib.pyplot as plt

faces = np.arange(1, 7)
plt.bar(faces, counts)
plt.xlabel("Die Face")
plt.ylabel("Count")
plt.title("Distribution of Dice Rolls")
plt.show()

Explanation:
This bar chart shows the distribution of outcomes across `1000` rolls. It should approximate
a uniform distribution if the die is fair.
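Raw counts depend on the number of rolls, so dividing by the total gives relative frequencies that should each be near `1/6`. A sketch using a seeded `Generator` from `np.random.default_rng` (seed `0` is arbitrary); `minlength=7` keeps the output length fixed even if a face never appears:

```python
import numpy as np

rng = np.random.default_rng(0)
dice_rolls = rng.integers(1, 7, size=1000)  # faces 1..6

# Slots 0..6; drop slot 0, which no die face can produce
counts = np.bincount(dice_rolls, minlength=7)[1:]
frequencies = counts / counts.sum()

print("Counts:", counts)
print("Relative frequencies:", frequencies.round(3))  # each near 1/6 ≈ 0.167
```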

Summary

NumPy’s random module provides powerful functions for generating random numbers, sampling
from distributions, and creating simulations. Seeding allows reproducibility, while various distributions
support more sophisticated modeling and experimentation.

Next, we’ll discuss Advanced Array Manipulations, such as stacking, splitting, and reshaping arrays,
to give you more flexibility in handling data. When you're ready, type "continue"!
