UNIT II
Data Handling – Part I
Understanding Data Types in Python - Basics of NumPy Arrays - Computation on NumPy
Arrays: Universal Functions - Aggregations: Min, Max, and Everything in Between -
Computation on Arrays: Broadcasting - Comparisons, Masks, and Boolean Logic - Fancy
Indexing - Sorting Arrays - Structured Data.
Understanding Data Types, NumPy Arrays, and Computation in Python
1. Understanding Data Types in Python
Basics of Data Types
Python supports several basic data types:
Integer (int): Whole numbers, e.g., 5, -10
Float (float): Numbers with a decimal point, e.g., 3.14, -0.5
String (str): Sequence of characters enclosed in quotes, e.g., "hello", 'world'
Boolean (bool): Represents truth values True or False
Example:
# Example of different data types
num_int = 10
num_float = 3.14
text_str = "Hello, world!"
is_true = True
print(type(num_int)) # Output: <class 'int'>
print(type(num_float)) # Output: <class 'float'>
print(type(text_str)) # Output: <class 'str'>
print(type(is_true)) # Output: <class 'bool'>
Type Conversion
Python allows conversion between different data types using type constructors:
int(), float(), str(), bool()
Example:
# Type conversion examples
num_str = "25"
num_int = int(num_str) # Convert string to integer
num_float = float(num_str) # Convert string to float
bool_val = bool(num_int) # Convert integer to boolean (True for non-zero, False for zero)
print(num_int) # Output: 25
print(num_float) # Output: 25.0
print(bool_val) # Output: True
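Conversions can also fail or give surprising results. The following is a minimal sketch of a few such cases, using only the constructors listed above; the variable names are illustrative and not part of the original notes.

```python
# int() truncates toward zero when given a float
truncated = int(3.99)     # 3
negative = int(-3.99)     # -3

# A string holding a decimal number cannot be passed directly to int()
try:
    int("3.7")
    direct_ok = True
except ValueError:
    direct_ok = False     # int("3.7") raises ValueError

# Convert to float first, then truncate to int
via_float = int(float("3.7"))   # 3

# bool() of a string tests whether it is empty, not what it says
empty_text = bool("")           # False
false_text = bool("False")      # True: any non-empty string is truthy

print(truncated, negative, direct_ok, via_float, empty_text, false_text)
```

The point of the sketch is that conversion follows the rules of the target type, so it is worth checking the input before converting user-supplied strings.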
2. Basics of NumPy Arrays
Introduction to NumPy Arrays
NumPy (Numerical Python) provides support for large, multi-dimensional arrays and
matrices. Key advantages over Python lists include:
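Efficient operations: arithmetic applies to whole arrays at once, without explicit Python loops
Convenience: built-in functions for manipulating and summarizing numerical data
Speed: elements share one data type and are stored compactly, so operations run in compiled code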
Example:
import numpy as np
# Creating a NumPy array from a Python list
my_list = [1, 2, 3, 4, 5]
arr = np.array(my_list)
print(arr) # Output: [1 2 3 4 5]
print(type(arr)) # Output: <class 'numpy.ndarray'>
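To make the difference from Python lists concrete, here is a small sketch comparing the same operator on a list and on an array. The variable names are illustrative.

```python
import numpy as np

my_list = [1, 2, 3]
arr = np.array(my_list)

# The * operator repeats a list but multiplies an array element-wise
repeated = my_list * 2    # [1, 2, 3, 1, 2, 3]
doubled = arr * 2         # array([2, 4, 6])

# Doing the same arithmetic on a list needs an explicit loop or comprehension
doubled_list = [x * 2 for x in my_list]

print(repeated)
print(doubled)
print(doubled_list)
```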
Creating NumPy Arrays
NumPy arrays can be created using various methods:
From Python lists: np.array()
Using built-in functions: np.zeros(), np.ones(), np.arange(), np.linspace()
Examples:
import numpy as np
# Creating arrays from lists
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1) # Output: [1 2 3 4 5]
# Creating arrays using built-in functions
zeros_arr = np.zeros(5, dtype=int) # Array of zeros
print(zeros_arr) # Output: [0 0 0 0 0]
ones_arr = np.ones((3, 2)) # 3x2 array of ones
print(ones_arr)
# Output:
# [[1. 1.]
# [1. 1.]
# [1. 1.]]
range_arr = np.arange(0, 10, 2) # Values from 0 up to (but not including) 10, step 2
print(range_arr) # Output: [0 2 4 6 8]
lin_arr = np.linspace(0, 1, 5) # 5 evenly spaced values from 0 to 1 (both included)
print(lin_arr) # Output: [0. 0.25 0.5 0.75 1. ]
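Because every element of an array shares one data type, the creation functions either infer a type or accept one through the dtype argument. The sketch below illustrates this; np.full is an additional creation function not listed above, and the variable names are illustrative.

```python
import numpy as np

# Mixing ints and floats upcasts the whole array to float
mixed = np.array([1, 2, 3.5])

# The dtype argument sets the element type explicitly
explicit = np.array([1, 2, 3], dtype=np.float32)

# A shape tuple creates a multi-dimensional array
grid = np.zeros((2, 3), dtype=int)   # 2 rows, 3 columns of integer zeros
filled = np.full((2, 2), 7)          # 2x2 array where every element is 7

print(mixed.dtype)      # float64
print(explicit.dtype)   # float32
print(grid.shape)       # (2, 3)
print(filled)
```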
Array Attributes
NumPy arrays have attributes that provide information about the array:
shape: Dimensions of the array
dtype: Data type of the array elements
size: Total number of elements in the array
ndim: Number of dimensions
itemsize: Size of each element in bytes
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3) -> 2 rows, 3 columns
print(arr.dtype) # Output: int64 (depends on system)
print(arr.size) # Output: 6 (total number of elements)
print(arr.ndim) # Output: 2 (number of dimensions)
print(arr.itemsize) # Output: 8 (bytes per element for int64; depends on dtype)
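The attributes are related: the total memory used by the elements is size multiplied by itemsize. The sketch below fixes the dtype to int32 so the numbers do not depend on the platform; that choice is an assumption made for illustration only.

```python
import numpy as np

# Fix the dtype so the byte sizes are the same on every system
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(arr.itemsize)   # 4 bytes per int32 element
print(arr.nbytes)     # 24, equal to arr.size * arr.itemsize

# reshape keeps the same elements but changes the shape
reshaped = arr.reshape(3, 2)
print(reshaped.shape)  # (3, 2)
print(reshaped.ndim)   # 2
```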
Indexing and Slicing
Accessing elements and subarrays in NumPy arrays:
Indexing: Accessing single elements
Slicing: Accessing subarrays using start:stop:step notation
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Indexing
print(arr[0]) # Output: 1 (first element)
print(arr[-1]) # Output: 5 (last element)
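arr[2] = 10 # Modify an element in place; arr is now [1 2 10 4 5]
# Slicing
print(arr[1:4]) # Output: [ 2 10 4] (elements from index 1 to 3)
print(arr[::-1]) # Output: [ 5 4 10 2 1] (reverse the array)
print(arr[0:]) # Output: [ 1 2 10 4 5] (from index 0 to the end)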
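The same notation extends to two-dimensional arrays, with one index or slice per dimension. The sketch below also shows that a slice is a view onto the original data, not a copy. The array grid and the other names are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

element = grid[1, 2]       # row 1, column 2 -> 6
last_row = grid[2, :]      # every column of row 2 -> [7 8 9]
top_left = grid[:2, :2]    # first two rows and first two columns

# A slice is a view: writing to it changes the original array
top_left[0, 0] = 100       # grid[0, 0] is now 100 as well

# Use .copy() when an independent subarray is needed
safe = grid[:2, :2].copy()
safe[0, 0] = -1            # grid is not affected

print(element, last_row, grid[0, 0], safe[0, 0])
```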
3. Computation on NumPy Arrays: Universal Functions (ufuncs)
Introduction to Universal Functions (ufuncs)
Universal functions (ufuncs) operate element-wise on arrays. The loop over the elements runs in compiled code, which makes a ufunc much faster than an equivalent Python loop.
Examples:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Arithmetic operations
print(np.add(arr, 2)) # Output: [3 4 5 6 7]
print(np.multiply(arr, 3)) # Output: [ 3 6 9 12 15]
# Trigonometric functions
print(np.sin(arr))
# Output: [ 0.84147098 0.90929743 0.14112001 -0.7568025 -0.95892427]
# Exponential and logarithmic functions
print(np.exp(arr))
# Output: [ 2.71828183 7.3890561 20.08553692 54.59815003 148.4131591 ]
print(np.log(arr))
# Output: [0. 0.69314718 1.09861229 1.38629436 1.60943791]
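The arithmetic operators on arrays call the same ufuncs, so arr + 2 and np.add(arr, 2) are equivalent. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Operators dispatch to the corresponding ufuncs
same_add = np.array_equal(arr + 2, np.add(arr, 2))      # True
same_pow = np.array_equal(arr ** 2, np.power(arr, 2))   # True

# A single ufunc call replaces an explicit Python loop
loop_result = [x * 3 for x in arr.tolist()]
ufunc_result = np.multiply(arr, 3)

# The optional out= argument writes the result into an existing array
buffer = np.empty(5, dtype=arr.dtype)
np.multiply(arr, 10, out=buffer)

print(same_add, same_pow)
print(loop_result, ufunc_result)
print(buffer)
```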
4. Aggregations: Min, Max, and Everything in Between
Summarizing Data
Aggregation functions in NumPy compute summary statistics across arrays.
Examples:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Minimum and maximum
print(np.min(arr)) # Output: 1
print(np.max(arr)) # Output: 6
# Sum, mean, median
print(np.sum(arr)) # Output: 21
print(np.mean(arr)) # Output: 3.5
print(np.median(arr)) # Output: 3.5
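A few more aggregations cover the "everything in between" part of the heading. This sketch uses the same array as above; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Method forms give the same results as the np.* functions
smallest, largest, total = arr.min(), arr.max(), arr.sum()

# Position of the extremes, counted in the flattened array
pos_min = np.argmin(arr)   # 0
pos_max = np.argmax(arr)   # 5

# Spread and percentiles
spread = np.std(arr)                  # population standard deviation
middle = np.percentile(arr, 50)       # 3.5, the same value as the median

print(smallest, largest, total)
print(pos_min, pos_max)
print(spread, middle)
```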
Aggregation along Axes
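The axis argument names the dimension that is collapsed. For a 2D array, axis=0 aggregates down the rows and gives one result per column; axis=1 aggregates across the columns and gives one result per row.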
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Sum of each column (axis=0 collapses the rows)
print(np.sum(arr, axis=0)) # Output: [5 7 9]
# Mean of each row (axis=1 collapses the columns)
print(np.mean(arr, axis=1)) # Output: [2. 5.]
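The axis argument works the same way for the other aggregations. The sketch below also uses keepdims=True, which keeps the collapsed dimension with length 1; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

col_min = arr.min(axis=0)   # smallest value in each column -> [1 2 3]
row_max = arr.max(axis=1)   # largest value in each row -> [3 6]

# keepdims=True keeps the result two-dimensional, here with shape (2, 1)
row_sum = arr.sum(axis=1, keepdims=True)

print(col_min)
print(row_max)
print(row_sum.shape)
```

Keeping the dimension is useful when the result is combined with the original array afterwards, because the shapes then line up row by row.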
5. Computation on Arrays: Broadcasting
Understanding Broadcasting
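NumPy broadcasting allows operations on arrays of different shapes by treating the smaller array as if it were stretched to match the larger one, without copying the data.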
Example:
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
# Broadcasting in action
print(arr1 + arr2)
# Output:
# [[11 22 33]
# [14 25 36]]
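A common use of broadcasting is subtracting a per-column value from every row. The following sketch centers each column on its mean and then combines a column vector with a row vector; all names are illustrative.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Shape (3,) is broadcast across both rows of the (2, 3) array
col_means = arr.mean(axis=0)   # [2.5 3.5 4.5]
centered = arr - col_means     # each column now has mean 0

# A (2, 1) column and a (3,) row broadcast to a (2, 3) result
col = np.array([[0], [10]])
row = np.array([1, 2, 3])
table = col + row

print(centered)
print(table)
```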
Broadcasting Rules
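Two shapes are compared dimension by dimension, starting from the trailing (rightmost) one. Dimensions are compatible when they are equal or when one of them is 1; a missing leading dimension is treated as 1. If any pair is incompatible, NumPy raises a ValueError.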
Example:
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20])
# Broadcasting fails: the trailing dimensions (3 and 2) differ and neither is 1
# print(arr1 + arr2)
# ValueError: operands could not be broadcast together with shapes (2,3) (2,)
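One way to make these two arrays compatible is to turn arr2 into a column of shape (2, 1), so that each value is added to one row. This is a sketch of one possible fix, assuming a per-row addition is what is wanted.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
arr2 = np.array([10, 20])                 # shape (2,)

# The direct addition fails because 3 and 2 are incompatible
try:
    arr1 + arr2
    failed = False
except ValueError:
    failed = True

# np.newaxis adds a dimension of length 1: shape (2,) becomes (2, 1)
column = arr2[:, np.newaxis]
result = arr1 + column    # (2, 3) + (2, 1) -> (2, 3)

print(failed)
print(column.shape)
print(result)
```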
6. Comparisons, Masks, and Boolean Logic
Comparison Operators
NumPy arrays support element-wise comparison operations.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Comparison operators
print(arr > 2) # Output: [False False True True True]
print(arr == 3) # Output: [False False True False False]
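Comparison results can be combined with the element-wise operators & (and), | (or), and ~ (not), and summarized with counting functions. A minimal sketch with illustrative names; note that the parentheses around each comparison are required.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

between = (arr > 1) & (arr < 5)    # [False  True  True  True False]
outside = (arr < 2) | (arr > 4)    # [ True False False False  True]
not_three = ~(arr == 3)            # [ True  True False  True  True]

count = np.count_nonzero(arr > 2)  # how many elements are greater than 2
any_big = np.any(arr > 4)          # is at least one element greater than 4?
all_pos = np.all(arr > 0)          # are all elements positive?

print(between, outside, not_three)
print(count, any_big, all_pos)
```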
Boolean Masks
Boolean arrays can be used as masks to select elements based on conditions.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Boolean masking
mask = arr > 2
print(arr[mask]) # Output: [3 4 5]
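Masks can also be used to assign values, and np.where chooses between two values based on a condition. The sketch below uses illustrative names and labels.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# The condition can be written directly inside the brackets
evens = arr[arr % 2 == 0]          # [2 4]

# Assigning through a mask changes only the selected elements
capped = arr.copy()
capped[capped > 3] = 3             # [1 2 3 3 3]

# np.where picks the first value where the condition holds, else the second
labels = np.where(arr > 2, "high", "low")

print(evens)
print(capped)
print(labels)
```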
7. Fancy Indexing
Indexing Arrays with Arrays
Fancy indexing passes an array (or list) of integer indices to select several elements at once. The result has the same shape as the index array.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Fancy indexing
indices = np.array([0, 2, 4])
print(arr[indices]) # Output: [1 3 5]
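For a two-dimensional array, one index array per dimension selects individual (row, column) pairs. The sketch below also shows that fancy indexing returns a copy, unlike slicing. The array grid and the other names are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

rows = np.array([0, 1, 2])
cols = np.array([2, 1, 3])

# Pairs up the indices: (0, 2), (1, 1), (2, 3)
picked = grid[rows, cols]      # [ 2  5 11]

# A single list of indices selects whole rows, in the given order
some_rows = grid[[2, 0]]

# Fancy indexing returns a copy, so the original is not changed
copy_check = grid[rows, cols]
copy_check[0] = -1             # grid[0, 2] is still 2

print(picked)
print(some_rows)
print(grid[0, 2])
```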
Boolean Array Indexing
A boolean array with the same length as the array can also be used as an index: elements where the mask is True are kept.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Boolean indexing
mask = np.array([True, False, True, False, True])
print(arr[mask]) # Output: [1 3 5]
8. Sorting Arrays
Sorting NumPy Arrays
np.sort() returns a sorted copy of an array and leaves the original unchanged; the arr.sort() method sorts the array in place.
Example:
import numpy as np
arr = np.array([3, 1, 5, 2, 4])
# Sorting
print(np.sort(arr)) # Output: [1 2 3 4 5]
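Related operations include np.argsort, which returns the indices that would sort the array, and sorting along an axis of a 2D array. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([3, 1, 5, 2, 4])

# Indices that would sort the array: arr[order] is sorted
order = np.argsort(arr)            # [1 3 0 4 2]

# Descending order: sort, then reverse
descending = np.sort(arr)[::-1]    # [5 4 3 2 1]

# np.sort leaves the original untouched
original_unchanged = arr.tolist() == [3, 1, 5, 2, 4]

# Sort each row of a 2D array independently
grid = np.array([[3, 1, 2],
                 [9, 7, 8]])
rows_sorted = np.sort(grid, axis=1)

print(order)
print(descending)
print(rows_sorted)
```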
Partial Sorting and Partitioning
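np.partition(arr, k) rearranges the array so that the element at index k is in its sorted position, with all smaller elements before it and all larger elements after it. The order within each side is not guaranteed.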
Example:
import numpy as np
arr = np.array([3, 1, 5, 2, 4])
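# Partial sorting: the element at index 3 (value 4) lands in its sorted position
print(np.partition(arr, 3)) # Possible output: [2 1 3 4 5] (order within each side may vary)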
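A typical use of partitioning is finding the k smallest values without sorting the whole array. The sketch below assumes k = 3 for illustration and sorts only the selected slice, since the order inside a partition is not defined.

```python
import numpy as np

arr = np.array([3, 1, 5, 2, 4])
k = 3  # illustrative choice: how many of the smallest values to keep

# After partitioning at k - 1, the first k entries are the k smallest
part = np.partition(arr, k - 1)
smallest_k = np.sort(part[:k])     # sort only the k values -> [1 2 3]

# np.argpartition returns indices instead of values
idx = np.argpartition(arr, k - 1)[:k]
smallest_by_index = np.sort(arr[idx])

print(smallest_k)
print(smallest_by_index)
```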
9. Structured Data in NumPy
Introduction to Structured Arrays
Structured arrays store records whose fields have their own names and data types, similar to the rows of a table.
Example:
import numpy as np
# Define structured data type ('S10' is a byte string of up to 10 characters,
# which is why the names print with a b prefix)
data_type = np.dtype([('name', 'S10'), ('age', int), ('score', float)])
# Create a structured array
data = np.array([('John', 25, 78.5), ('Jane', 30, 85.5)], dtype=data_type)
print(data)
# Output:
# [(b'John', 25, 78.5) (b'Jane', 30, 85.5)]
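If plain text names are preferred over byte strings, the field can be declared as a Unicode string instead. This sketch assumes 'U10' (up to 10 Unicode characters) and uses the short type codes 'i4' and 'f8' for a 4-byte integer and an 8-byte float; these choices are illustrative alternatives to the definition above.

```python
import numpy as np

# 'U10' stores text as Unicode, so no b prefix appears in the output
data_type = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
data = np.array([('John', 25, 78.5), ('Jane', 30, 85.5)], dtype=data_type)

print(data)               # [('John', 25, 78.5) ('Jane', 30, 85.5)]
print(data.dtype.names)   # ('name', 'age', 'score')
print(data['name'][0])    # John
```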
Manipulating Structured Data
Indexing a structured array by field name returns an array of that field's values; indexing by position returns a single record.
Example:
import numpy as np
# Define structured data type
data_type = np.dtype([('name', 'S10'), ('age', int), ('score', float)])
# Create a structured array
data = np.array([('John', 25, 78.5), ('Jane', 30, 85.5)], dtype=data_type)
# Accessing elements by field
print(data['name']) # Output: [b'John' b'Jane']
print(data['age']) # Output: [25 30]
# Accessing individual elements
print(data[0]['score']) # Output: 78.5
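Field access combines with masks, sorting, and assignment. The following sketch reuses the two records from the example, with the name field declared as 'U10' (an assumption made here so the names print as plain text).

```python
import numpy as np

data_type = np.dtype([('name', 'U10'), ('age', int), ('score', float)])
data = np.array([('John', 25, 78.5), ('Jane', 30, 85.5)], dtype=data_type)

# Filter records with a boolean mask built from one field
older = data[data['age'] > 26]['name']                   # ['Jane']

# Sort records by a field, then reverse for highest score first
ranking = np.sort(data, order='score')[::-1]['name']     # ['Jane' 'John']

# Aggregate over a field
mean_age = data['age'].mean()                            # 27.5

# A field is a view, so assigning to it updates the records in place
data['age'] += 1

print(older, ranking, mean_age)
print(data['age'])
```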