NumPy — Deep Dive
1. What is NumPy?
NumPy (Numerical Python) is the foundational library for numerical computing in Python. It provides the
ndarray — a fast, memory-efficient N-dimensional array — plus a large collection of mathematical
functions that operate on these arrays.
2. Creating Arrays
import numpy as np
# from Python sequences
a = np.array([1, 2, 3])
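# special constructors
z = np.zeros((2, 3))         # 2x3 array of 0.0
o = np.ones(4)               # [1. 1. 1. 1.]
i = np.arange(0, 10, 2)      # [0 2 4 6 8] (stop is exclusive)
lin = np.linspace(0, 1, 5)   # [0. 0.25 0.5 0.75 1.] (stop is inclusive)
# identity
I = np.eye(3)                # 3x3 identity matrix
# from file (assumes data.csv exists and contains only numeric values)
arr = np.loadtxt('data.csv', delimiter=',')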
3. Array Attributes (important)
• ndarray.shape — tuple of dimensions
• ndarray.ndim — number of axes
• ndarray.size — total elements
• ndarray.dtype — data type
• ndarray.itemsize — bytes per element
• ndarray.nbytes — total bytes
x = np.arange(12).reshape(3, 4)
print(x.shape, x.ndim, x.size, x.dtype, x.nbytes)
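The attributes above are related to each other: nbytes is the element count times the bytes per element. A minimal sketch, assuming an explicit int32 dtype so the numbers do not depend on the platform default:

```python
import numpy as np

# dtype is set explicitly; the default integer dtype of np.arange can vary by platform
x = np.arange(12, dtype=np.int32).reshape(3, 4)

# nbytes is the element count times the bytes per element
total_bytes = x.size * x.itemsize

print(x.shape)     # (3, 4)
print(x.ndim)      # 2
print(x.size)      # 12
print(x.itemsize)  # 4
print(x.nbytes)    # 48
```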
4. Indexing, Slicing, and Views
• Basic slicing returns a view (no copy), so modifying the slice changes the original.
r = np.arange(10)
s = r[2:5] # view
s[0] = 99 # r changes too
m = np.arange(9).reshape(3,3)
col = m[:, 1] # column view
• Use .copy() to force a copy.
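One way to check whether two arrays share a buffer is np.shares_memory. The sketch below contrasts a slice with an explicit copy; the variable names are illustrative only.

```python
import numpy as np

r = np.arange(10)
view = r[2:5]          # basic slice: shares memory with r
copy = r[2:5].copy()   # independent buffer

view[0] = 99           # writes through to r[2]
copy[1] = -1           # does not touch r

print(r[:5])                       # [ 0  1 99  3  4]
print(np.shares_memory(r, view))   # True
print(np.shares_memory(r, copy))   # False
print(view.base is r)              # True: the view records the array it came from
```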
5. Fancy Indexing & Boolean Indexing
• Fancy indexing with integer arrays returns a copy.
arr = np.array([10,20,30,40,50])
idx = [0, 2, 4]
print(arr[idx]) # [10 30 50]
• Boolean masks for filtering.
mask = arr > 25
print(arr[mask]) # [30 40 50]
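The difference between reading and assigning with these index types is easy to miss. A short sketch, with illustrative names: reading through an integer list produces a copy, while assigning through a mask writes into the original.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]   # fancy indexing: a new array
picked[0] = -1            # arr is unchanged
print(arr)                # [10 20 30 40 50]

# Assigning through a boolean mask modifies arr in place
arr[arr > 25] = 0
print(arr)                # [10 20  0  0  0]

# Combine conditions with & and |, each wrapped in parentheses
vals = np.array([10, 20, 30, 40, 50])
between = vals[(vals > 15) & (vals < 45)]
print(between)            # [20 30 40]
```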
6. Broadcasting Rules (summary)
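• Shapes are compared dimension by dimension, starting from the trailing (rightmost) one; missing leading dimensions are treated as 1.
• Two dimensions are compatible when they are equal or one of them is 1; a dimension of size 1 is "stretched" to match the other.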
A = np.ones((3,4))
B = np.array([1,2,3,4])
A + B # B is broadcast across rows
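A few more shape combinations may make the rules concrete. This is a sketch with illustrative names; np.broadcast_shapes (available in the same NumPy versions that provide sliding_window_view) reports the result shape without allocating data.

```python
import numpy as np

A = np.ones((3, 4))
row = np.array([1, 2, 3, 4])          # shape (4,)
col = np.array([[10], [20], [30]])    # shape (3, 1)

print((A + row).shape)     # (3, 4): row is added to every row of A
print((A + col).shape)     # (3, 4): col is added to every column of A
print((row + col).shape)   # (3, 4): both operands are stretched

print(np.broadcast_shapes((3, 4), (4,)))   # (3, 4)

# Trailing dimensions 4 and 3 are neither equal nor 1, so this fails
incompatible = False
try:
    A + np.array([1, 2, 3])
except ValueError:
    incompatible = True
print(incompatible)        # True
```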
7. Universal Functions (ufuncs)
• Fast element-wise functions implemented in C.
• Examples: np.add, np.multiply, np.exp, np.log, np.maximum.
np.sqrt(np.array([1, 4, 9]))   # [1. 2. 3.]
np.add([1, 2], [3, 4])         # [4 6] (lists are converted to arrays)
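Beyond plain calls, ufuncs accept an out= argument and expose methods such as reduce and accumulate. A brief sketch with illustrative inputs:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

# out= writes the result into an existing array instead of allocating a new one
result = np.empty_like(a)
np.multiply(a, b, out=result)
print(result)                  # [ 2.  8. 18.]

# reduce collapses the array with the ufunc; accumulate keeps the running values
total = np.add.reduce(a)
running = np.add.accumulate(a)
print(total)                   # 14.0
print(running)                 # [ 1.  5. 14.]

# Binary ufuncs broadcast scalars like any other operand
clipped = np.maximum(a, 5.0)
print(clipped)                 # [5. 5. 9.]
```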
8. Linear Algebra & Matrix Ops
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
C = A @ B # matrix multiply
np.linalg.inv(A) # inverse
np.linalg.det(A) # determinant
np.linalg.eig(A) # eigen decomposition
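For solving a linear system A x = b, np.linalg.solve is usually preferable to computing the inverse explicitly. The sketch below uses a hypothetical right-hand side b and checks each result numerically rather than printing raw values.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])           # hypothetical right-hand side

# Solve A x = b directly
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))       # True

# The inverse times A gives the identity (up to rounding)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# eig returns eigenvalues and a matrix whose columns are eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# Column j of eigenvectors satisfies A v = eigenvalues[j] * v
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))   # True
```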
9. Statistical & Aggregation Functions
• np.mean, np.median, np.std, np.sum, np.min, np.max, np.percentile
• Use the axis parameter to aggregate along one axis: axis=0 collapses the rows (one result per column), axis=1 collapses the columns (one result per row).
arr = np.arange(12).reshape(3,4)
arr.mean(axis=0) # mean of each column
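A sketch of how the axis argument changes the result, using the same 3x4 array. keepdims=True is shown because it keeps the reduced axis as size 1, which lets the result broadcast back against the input.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

col_sums = arr.sum(axis=0)   # one value per column
row_sums = arr.sum(axis=1)   # one value per row
print(col_sums)              # [12 15 18 21]
print(row_sums)              # [ 6 22 38]

# keepdims=True leaves a size-1 axis so the result broadcasts against arr
row_means = arr.mean(axis=1, keepdims=True)
print(row_means.shape)       # (3, 1)
centered = arr - row_means   # each row now has mean 0

median = np.percentile(arr, 50)
print(median)                # 5.5
```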
10. Shape Manipulation
• reshape, ravel (view when possible, otherwise a copy), flatten (always a copy), transpose, swapaxes, expand_dims, squeeze.
x = np.arange(6)
x2 = x.reshape(2,3)
flat = x2.ravel()
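Whether ravel returns a view depends on the memory layout. A minimal sketch, with illustrative names, comparing ravel and flatten and showing expand_dims and squeeze:

```python
import numpy as np

x2 = np.arange(6).reshape(2, 3)

flat_view = x2.ravel()     # a view here, because x2 is C-contiguous
flat_copy = x2.flatten()   # always a copy
print(np.shares_memory(x2, flat_view))    # True
print(np.shares_memory(x2, flat_copy))    # False

# Flattening the transpose in row-major order requires a copy
flat_t = x2.T.ravel()
print(np.shares_memory(x2, flat_t))       # False

col = np.expand_dims(np.arange(3), axis=1)
print(col.shape)               # (3, 1)
print(np.squeeze(col).shape)   # (3,)
print(x2.T.shape)              # (3, 2)
```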
11. Memory & Performance Tips
• Use an appropriate dtype (e.g., float32 vs float64) to save memory; float32 uses half the bytes of float64 at the cost of precision.
• Prefer vectorized operations over Python loops.
• Avoid unnecessary copies; use views where possible.
• Use np.concatenate, np.stack, np.vstack, np.hstack for joining arrays.
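The tips above can be illustrated in a few lines. This is a sketch only: the array sizes and the helper loop_sum_of_squares are made up for the example, and actual speedups depend on the machine.

```python
import numpy as np

# dtype choice: float32 halves the memory of float64
n = 1_000_000
a64 = np.ones(n, dtype=np.float64)
a32 = a64.astype(np.float32)
print(a64.nbytes, a32.nbytes)   # 8000000 4000000

# Vectorized operation vs an explicit Python loop (hypothetical helper)
def loop_sum_of_squares(values):
    total = 0.0
    for v in values:
        total += v * v
    return total

small = np.arange(1000, dtype=np.float64)
vectorized = np.dot(small, small)   # same result, computed in compiled code
print(np.isclose(vectorized, loop_sum_of_squares(small)))   # True

# Joining arrays
p = np.array([1, 2, 3])
q = np.array([4, 5, 6])
print(np.concatenate([p, q]).shape)   # (6,)   joins along an existing axis
print(np.stack([p, q]).shape)         # (2, 3) creates a new axis
print(np.vstack([p, q]).shape)        # (2, 3)
print(np.hstack([p, q]).shape)        # (6,)
```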
12. Random Numbers
rng = np.random.default_rng(42) # new Generator API
rng.normal(0,1, size=(3,))
rng.integers(0, 10, size=5)
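Seeding makes the stream reproducible: two generators created with the same seed produce the same values. A sketch with illustrative names; the drawn values themselves are not shown because they depend on the generator implementation.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Same seed, same sequence
same = np.array_equal(rng_a.normal(0, 1, size=3), rng_b.normal(0, 1, size=3))
print(same)   # True

# integers() excludes the upper bound by default
ints = rng_a.integers(0, 10, size=5)
print(ints.min() >= 0 and ints.max() < 10)   # True

# shuffle works in place; choice can sample without replacement
deck = np.arange(5)
rng_a.shuffle(deck)
sample = rng_a.choice(deck, size=3, replace=False)
print(len(set(sample.tolist())))   # 3
```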
13. File I/O (binary & text)
np.savetxt('out.csv', arr, delimiter=',')
arr = np.loadtxt('out.csv', delimiter=',')
np.save('arr.npy', arr)
arr2 = np.load('arr.npy')
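A round-trip sketch that writes to a temporary directory so it can run anywhere. The file names are placeholders, and np.savez is included as one option for storing several arrays in a single file. Note that the text format does not record the dtype, while the binary .npy format does.

```python
import tempfile
from pathlib import Path

import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "out.csv"      # placeholder file names
    npy_path = Path(tmp) / "arr.npy"
    npz_path = Path(tmp) / "bundle.npz"

    # Text: human-readable, but dtype information is lost
    np.savetxt(csv_path, arr, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",")

    # Binary .npy: preserves dtype and shape exactly
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

    # .npz: several named arrays in one file
    np.savez(npz_path, first=arr, second=arr * 2)
    with np.load(npz_path) as bundle:
        second = bundle["second"]

print(from_csv.dtype)   # float64 (loadtxt defaults to float)
print(from_npy.dtype)   # int64
```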
14. Interoperability with Pandas
• Pandas Series and DataFrame are backed by NumPy arrays by default.
• Convert: df.to_numpy() or np.array(df['col']).
15. Advanced Topics (short)
• Structured arrays for mixed dtypes (named fields, like table columns).
• Masked arrays for missing data (np.ma).
• Stride tricks (np.lib.stride_tricks) for advanced windowing; use carefully.
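A brief sketch of the first two items. The field names and sample records are invented for illustration.

```python
import numpy as np

# Structured array: each record has named fields with their own dtypes
people = np.array(
    [("Ada", 36, 1.70), ("Linus", 28, 1.80)],   # hypothetical sample records
    dtype=[("name", "U10"), ("age", "i4"), ("height", "f8")],
)
print(people["age"])                        # [36 28]
print(people[people["age"] > 30]["name"])   # ['Ada']

# Masked array: invalid entries are excluded from computations
raw = np.array([1.0, np.nan, 3.0])
readings = np.ma.masked_invalid(raw)
print(readings.mean())    # 2.0
print(readings.count())   # 2 (number of unmasked values)
```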
16. Mini Examples
Standardization (z-score)
X = np.array([[1,2],[3,4],[5,6]], dtype=float)
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std
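A quick check that the result has zero mean and unit standard deviation per column. The guard against constant columns (where std is 0) is an added assumption, not part of the original snippet.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
mean = X.mean(axis=0)
std = X.std(axis=0)   # population standard deviation (ddof=0)

# Assumption: replace zero std with 1 so constant columns do not divide by zero
safe_std = np.where(std == 0, 1.0, std)
X_std = (X - mean) / safe_std

print(np.allclose(X_std.mean(axis=0), 0.0))   # True
print(np.allclose(X_std.std(axis=0), 1.0))    # True
```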
Moving window (convolution) — simple rolling mean
from numpy.lib.stride_tricks import sliding_window_view
arr = np.arange(10)
win = sliding_window_view(arr, window_shape=3)
rolling_mean = win.mean(axis=1)
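The heading mentions convolution; the same rolling mean can be obtained with np.convolve and a uniform kernel. A sketch comparing the two approaches:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(10)
window = 3

# Windowed view: each row is one window of length 3
via_windows = sliding_window_view(arr, window_shape=window).mean(axis=1)

# Convolution with a uniform kernel; mode="valid" keeps only full windows
via_convolve = np.convolve(arr, np.ones(window) / window, mode="valid")

print(via_windows)                              # [1. 2. 3. 4. 5. 6. 7. 8.]
print(np.allclose(via_windows, via_convolve))   # True
```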
17. Common Gotchas
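• Slicing gives views, so writing to a slice silently modifies the original array.
• Fancy indexing returns copies (not views).
• Broadcasting can hide bugs if shapes are unintentionally compatible, e.g. (3,) combined with (3, 1) yields (3, 3).
• Mixing Python lists and NumPy arrays triggers implicit conversion: the list becomes an array, so + means element-wise addition rather than concatenation.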
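Two of these gotchas are easy to reproduce. In this sketch the names y_true and y_pred are hypothetical; the point is the unexpected result shape and the different meaning of + and * for lists and arrays.

```python
import numpy as np

# Broadcasting surprise: (3,) minus (3, 1) gives (3, 3), not (3,)
y_true = np.array([1.0, 2.0, 3.0])         # shape (3,)
y_pred = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)

diff = y_true - y_pred
print(diff.shape)                # (3, 3)

# Flattening one operand first gives the intended element-wise result
fixed = y_true - y_pred.ravel()
print(fixed.shape)               # (3,)

# Lists vs arrays: the same operator means different things
print([1, 2] + [3, 4])             # [1, 2, 3, 4]  (concatenation)
print(np.array([1, 2]) + [3, 4])   # [4 6]         (element-wise sum)
print([1, 2] * 2)                  # [1, 2, 1, 2]  (repetition)
print(np.array([1, 2]) * 2)        # [2 4]         (element-wise product)
```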
18. Learning Path & Practice
• Solve array-manipulation problems on practice sites.
• Re-implement small parts of ML preprocessing using NumPy (scaling, PCA basics).
• Profile with %timeit (an IPython/Jupyter magic) and check ndarray.nbytes to inspect memory use for large arrays.