VEL TECH HIGH TECH
Dr. RANGARAJAN Dr. SAKUNTHALA ENGINEERING
COLLEGE
An Autonomous Institution
Approved by AICTE-New Delhi, Affiliated to Anna University, Chennai
Accredited by NBA, New Delhi & Accredited by NAAC with “A” Grade & CGPA of 3.27
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
Course Code:                Semester:
Category: OPEN ELECTIVE (OE)
Course Title: PYTHON FOR DATA SCIENCE
L T P C: 3 0 0 3
UNIT II TOWARDS DATA SCIENCE USING NUMPY
Understanding Data Types in Python - The Basics of NumPy Arrays - Computation on NumPy Arrays:
Universal Functions - Aggregations: Min, Max, and Everything in Between - Computation on Arrays:
Broadcasting - Comparisons, Masks, and Boolean Logic - Fancy Indexing - Sorting Arrays.
COURSE OBJECTIVES:
· To describe the fundamentals for exploring and managing data with Python.
· To examine the various data analytics techniques for labelled/columnar data using Python.
· To demonstrate a flexible range of data visualization techniques in Python.
· To describe the various Machine learning algorithms for data modelling with Python.
COURSE OUTCOMES:
On successful completion of this course, students will be able to:
CO No.    Course Outcome                                                           Bloom's Level
C305.2    Make use of knowledge on NumPy to write programs on array operations.    K2
Understanding Data Types in Python
In Python, data types define the kind of data a variable can hold. Understanding these is essential because different
operations are valid for different data types, and correct usage ensures efficient coding and data processing.
1. Basic Built-in Data Types
Type     Description                      Example
int      Integer numbers                  x = 10
float    Decimal numbers                  y = 3.14
bool     Boolean values (True or False)   flag = True
str      Text (string of characters)      name = "Alice"
2. Type Conversion (Casting)
You can convert between types using built-in functions:
int("10") # Converts string to integer 10
float("3.5") # Converts string to float 3.5
str(100) # Converts integer to string '100'
bool(0) # Converts to boolean False
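A few conversion edge cases are worth trying by hand. The sketch below uses only built-in functions; the variable names are illustrative and not part of the original notes.

```python
# Conversions that succeed
n = int("10")            # string -> int
f = float("3.5")         # string -> float
truncated = int(3.9)     # float -> int drops the fractional part (no rounding)

# bool() of any non-empty string is True, even when the text reads "False"
flag = bool("False")

# int() cannot parse a decimal string directly; it raises ValueError
try:
    int("3.5")
    parsed_ok = True
except ValueError:
    parsed_ok = False

print(n, f, truncated, flag, parsed_ok)  # 10 3.5 3 True False
```

When a decimal string has to become an integer, converting in two steps, as in int(float("3.5")), is one way to do it.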
3. Collection Data Types
Type     Description                               Example
list     Ordered, mutable collection               fruits = ["apple", "banana"]
tuple    Ordered, immutable collection             coords = (10, 20)
set      Unordered collection of unique elements   ids = {101, 102, 103}
dict     Key-value pairs                           student = {"name": "John", "age": 21}
4. Type Checking
Use the type() function to check the data type of a variable:
x = 25
print(type(x)) # Output: <class 'int'>
5. NumPy Data Types (For Data Science)
In scientific computing with NumPy, special data types are used for efficient computation:
NumPy Data Type           Description
np.int32, np.int64        Fixed-size integers
np.float32, np.float64    Floating-point numbers
np.bool_                  Boolean
np.str_                   Unicode string
Example:
import numpy as np
arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype) # Output: int32
Why Is Understanding Data Types Important in Data Science?
● Ensures correct operations (e.g., numeric vs. text processing)
● Saves memory and improves performance
● Helps prevent errors in data transformation and modeling
● Required when using libraries like Pandas, NumPy, and scikit-learn
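The memory point above can be checked directly. The following is a minimal sketch, assuming a small made-up list of values; it compares the storage used by two integer dtypes and shows how NumPy picks a common dtype for mixed input.

```python
import numpy as np

values = [1, 2, 3, 4]

small = np.array(values, dtype=np.int32)   # 4 bytes per element
large = np.array(values, dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)  # 4 16
print(large.itemsize, large.nbytes)  # 8 32

# astype() returns a new array converted to the requested dtype
as_float = small.astype(np.float64)
print(as_float.dtype)  # float64

# Mixing integers and floats upcasts the whole array to a float dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```

Halving the element size halves the memory used, which matters once arrays hold millions of values.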
The Basics of NumPy Arrays
What is NumPy?
NumPy (Numerical Python) is a powerful Python library used for:
● Efficient numerical computation,
● Creating and manipulating n-dimensional arrays,
● Performing mathematical operations on large datasets.
1. What is a NumPy Array?
A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers.
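● NumPy arrays store elements of a single type in one contiguous block of memory, so numerical work on them is typically faster and uses less memory than the same work on Python lists.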
● Arrays support element-wise operations and broadcasting.
import numpy as np
arr = np.array([1, 2, 3]) # 1D array
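To make the difference from Python lists concrete, the short sketch below applies the same operators to a list and to an array. The variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# + concatenates lists but adds arrays element by element
list_sum = py_list + py_list
arr_sum = np_arr + np_arr
print(list_sum)  # [1, 2, 3, 1, 2, 3]
print(arr_sum)   # [2 4 6]

# * repeats a list but multiplies every array element
list_times = py_list * 2
arr_times = np_arr * 2
print(list_times)  # [1, 2, 3, 1, 2, 3]
print(arr_times)   # [2 4 6]
```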
2. Creating NumPy Arrays
From Python lists:
a = np.array([1, 2, 3]) # 1D array
b = np.array([[1, 2], [3, 4]]) # 2D array
Using built-in functions:
np.zeros((2, 3)) # 2x3 array of zeros
np.ones((2, 2)) # 2x2 array of ones
np.arange(0, 10, 2) # [0 2 4 6 8]
np.linspace(0, 1, 5) # 5 evenly spaced numbers from 0 to 1
np.eye(3) # 3x3 identity matrix
3. Array Attributes
Attribute   Description                           Example
ndim        Number of dimensions                  arr.ndim
shape       Tuple representing array dimensions   arr.shape
size        Total number of elements              arr.size
dtype       Data type of array elements           arr.dtype
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) # 2
print(arr.shape) # (2, 3)
print(arr.size) # 6
4. Indexing and Slicing
Just like Python lists, NumPy arrays support indexing and slicing.
arr = np.array([10, 20, 30, 40])
print(arr[1]) # 20
print(arr[1:3]) # [20 30]
matrix = np.array([[1, 2], [3, 4]])
print(matrix[0, 1]) # 2
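Slicing extends to two dimensions by giving one slice per axis, separated by a comma. The sketch below uses a made-up 3x3 array named `grid` to show a few common patterns.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

first_row = grid[0]            # whole first row
second_col = grid[:, 1]        # every row, column 1
top_left = grid[:2, :2]        # first two rows and first two columns
last_item = grid[-1, -1]       # negative indices count from the end
reversed_row = grid[0, ::-1]   # a step of -1 reverses the order

print(first_row)     # [1 2 3]
print(second_col)    # [2 5 8]
print(top_left)
# [[1 2]
#  [4 5]]
print(last_item)     # 9
print(reversed_row)  # [3 2 1]
```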
5. Array Operations
NumPy supports vectorized operations (element-wise), which are faster than loops.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a * 2) # [2 4 6]
print(a ** 2) # [1 4 9]
print(np.sin(a)) # Applies sin to each element
6. Reshaping Arrays
You can change the shape of arrays without changing data.
arr = np.array([[1, 2], [3, 4], [5, 6]])
reshaped = arr.reshape((2, 3))
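The sketch below prints the result of the reshape above and adds two related cases: letting NumPy infer one dimension with -1, and what happens when the element counts do not match. The names `flat`, `column`, and `mismatch_allowed` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2), 6 elements

reshaped = arr.reshape((2, 3))
print(reshaped)
# [[1 2 3]
#  [4 5 6]]

# -1 asks NumPy to infer that dimension from the total number of elements
flat = arr.reshape(-1)        # shape (6,)
column = arr.reshape(-1, 1)   # shape (6, 1)
print(flat.shape, column.shape)  # (6,) (6, 1)

# The element count must stay the same, otherwise reshape raises ValueError
try:
    arr.reshape((4, 2))
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False
print(mismatch_allowed)  # False
```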
7. Array Copying vs. View
● arr.copy() creates a new array.
● arr.view() or slicing returns a view (reference) to the same data.
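The practical consequence of this distinction is that writing to a view changes the original array, while writing to a copy does not. A minimal sketch, with illustrative variable names:

```python
import numpy as np

original = np.array([10, 20, 30, 40])

view = original[1:3]            # slicing returns a view
view[0] = 99                    # writes through to `original`
print(original)                 # [10 99 30 40]

independent = original.copy()   # separate data buffer
independent[0] = -1
print(original)                 # [10 99 30 40]  (the copy did not affect it)

# np.shares_memory reports whether two arrays overlap in memory
shares_view = np.shares_memory(original, view)
shares_copy = np.shares_memory(original, independent)
print(shares_view, shares_copy)  # True False
```

Call copy() whenever a slice will be modified and the source array must stay intact.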
Why NumPy Arrays Are Important in Data Science:
● Memory-efficient and fast computations
● Support for vectorized operations (no explicit loops)
● Foundation for Pandas, Scikit-learn, TensorFlow, etc.
Computation on NumPy Arrays
NumPy is optimized for performing fast, element-wise operations on large arrays. It includes mathematical, statistical, and logical operations that are
both vectorized and highly efficient.
1. Arithmetic Operations
NumPy allows arithmetic operations to be applied element-wise on arrays of the same shape.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a - b) # [-3 -3 -3]
print(a * b) # [4 10 18]
print(a / b) # [0.25 0.4 0.5 ]
print(a ** 2) # [1 4 9]
2. Universal Functions (ufuncs)
NumPy provides many universal functions, which are fast, vectorized operations implemented in C.
Examples:
Function             Description
np.add(a, b)         Element-wise addition
np.subtract(a, b)    Element-wise subtraction
np.multiply(a, b)    Element-wise multiplication
np.divide(a, b)      Element-wise division
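The arithmetic operators on arrays call these same ufuncs, so the two spellings give identical results. The sketch below checks that, and also shows two standard ufunc features: the out argument and the reduce method. Variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# a + b and np.add(a, b) compute the same thing
same_add = np.array_equal(np.add(a, b), a + b)
same_div = np.array_equal(np.divide(a, b), a / b)
print(same_add, same_div)  # True True

# out= writes the result into an existing array instead of allocating a new one
result = np.empty(3, dtype=a.dtype)
np.multiply(a, b, out=result)
print(result)  # [ 4 10 18]

# reduce applies the ufunc repeatedly along an axis; for np.add this is a sum
total = np.add.reduce(a)
print(total)  # 6
```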
Trigonometric Functions:
x = np.array([0, np.pi/2, np.pi])
print(np.sin(x)) # approximately [0, 1, 0]; sin(pi) prints as about 1.22e-16 because of floating-point rounding
Exponential and Logarithmic:
np.exp([1, 2, 3]) # [e^1, e^2, e^3]
np.log([1, np.e, np.e**2]) # [0., 1., 2.]
3. Aggregation Functions
Aggregations compute summary statistics over the entire array or along an axis.
Common Aggregations:
Function          Description
np.sum()          Sum of all elements
np.mean()         Mean (average)
np.std()          Standard deviation
np.min()          Minimum value
np.max()          Maximum value
np.median()       Median
np.percentile()   Nth percentile
Example:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.sum()) # 21
print(arr.mean()) # 3.5
print(arr.sum(axis=0)) # [5 7 9] - column-wise sum
print(arr.sum(axis=1)) # [6 15] - row-wise sum
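The remaining aggregations from the table work the same way. The sketch below reuses the 2x3 array from the example above; the result variable names are illustrative. Note that std() computes the population standard deviation by default (ddof=0).

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

overall_min = arr.min()          # smallest value in the whole array
overall_max = arr.max()          # largest value in the whole array
col_min = arr.min(axis=0)        # minimum of each column
row_max = arr.max(axis=1)        # maximum of each row
median = np.median(arr)
p50 = np.percentile(arr, 50)     # the 50th percentile equals the median
spread = arr.std()

print(overall_min, overall_max)  # 1 6
print(col_min)                   # [1 2 3]
print(row_max)                   # [3 6]
print(median, p50)               # 3.5 3.5
print(round(float(spread), 3))   # 1.708
```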
4. Broadcasting in NumPy
Broadcasting allows NumPy to work with arrays of different shapes during
arithmetic operations.
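Rules:
1. If the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on its left side.
2. Two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other.
3. If the sizes in any dimension differ and neither is 1, NumPy raises an error.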
Example:
a = np.array([1, 2, 3]) # shape (3,)
b = np.array([[10], [20]]) # shape (2, 1)
print(a + b)
# Output: [[11 12 13]
# [21 22 23]]
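A common use of broadcasting is subtracting a per-column statistic from every row. The sketch below uses a made-up `data` array, and also shows a pair of shapes that cannot be broadcast. It assumes NumPy 1.20 or later for np.broadcast_shapes.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # shape (2, 3)

col_means = data.mean(axis=0)        # shape (3,), values [2.5 3.5 4.5]
centered = data - col_means          # (2, 3) - (3,) broadcasts across the rows
print(centered)
# [[-1.5 -1.5 -1.5]
#  [ 1.5  1.5  1.5]]

# np.broadcast_shapes reports the result shape without doing any arithmetic
result_shape = np.broadcast_shapes((3,), (2, 1))
print(result_shape)  # (2, 3)

# Shapes (2, 3) and (2,) are incompatible: the trailing sizes 3 and 2 differ
try:
    data + np.array([1.0, 2.0])
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```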
5. Comparison and Boolean Logic
You can use NumPy to compare elements or apply Boolean conditions.
a = np.array([1, 2, 3, 4])
print(a > 2) # [False False True True]
print(a == 3) # [False False True False]
Logical Operators:
Function           Description
np.logical_and()   Element-wise AND
np.logical_or()    Element-wise OR
np.logical_not()   Element-wise NOT
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
np.logical_and(a < 3, b < 3) # [False True False]
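A Boolean array produced by a comparison can itself be used as an index; this is usually called a mask. The sketch below is illustrative only: the `scores` data and the variable names are made up for the example.

```python
import numpy as np

scores = np.array([35, 72, 58, 90, 41])

passed = scores >= 50          # Boolean mask, one entry per element
print(passed)                  # [False  True  True  True False]
print(scores[passed])          # [72 58 90]  keeps only entries where the mask is True
print(passed.sum())            # 3  (each True counts as 1)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
mid_range = (scores >= 50) & (scores < 80)
print(scores[mid_range])       # [72 58]

# np.where chooses between two values element by element
capped = np.where(scores > 80, 80, scores)
print(capped)                  # [35 72 58 80 41]
```

The operators & and | behave like np.logical_and and np.logical_or on Boolean arrays; Python's own `and` and `or` keywords do not work element-wise.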
6. Fancy Indexing
Allows retrieving multiple elements using an array of indices.
arr = np.array([10, 20, 30, 40])
idx = [0, 3]
print(arr[idx]) # [10 40]
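Fancy indexing has a few more behaviours worth seeing: indices may repeat, the result takes the shape of the index array, and in two dimensions row and column indices are paired up. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Indices can repeat and appear in any order
picked = arr[[4, 0, 0]]
print(picked)  # [50 10 10]

# The result has the shape of the index array, not of the source array
idx2d = np.array([[0, 1], [3, 4]])
print(arr[idx2d])
# [[10 20]
#  [40 50]]

# In 2D, row and column index arrays are paired: (0, 2), (1, 1), (2, 0)
table = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
anti_diagonal = table[[0, 1, 2], [2, 1, 0]]
print(anti_diagonal)  # [3 5 7]

# Assigning through a fancy index modifies the selected positions in place
arr[[0, 1]] = 0
print(arr)  # [ 0  0 30 40 50]
```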
7. Sorting Arrays
a = np.array([3, 1, 2])
print(np.sort(a)) # [1 2 3]
For multi-dimensional arrays:
arr = np.array([[3, 1], [4, 2]])
np.sort(arr, axis=1) # Sort within each row: [[1 3], [2 4]]
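Two related tools are often used alongside np.sort: np.argsort, which returns the indices that would sort the array, and the axis argument for sorting columns instead of rows. The sketch below is illustrative; the array `grid` is made up for the example.

```python
import numpy as np

a = np.array([3, 1, 2])

# argsort returns the positions of the elements in sorted order
order = np.argsort(a)
print(order)     # [1 2 0]
print(a[order])  # [1 2 3]

# np.sort returns a sorted copy; reversing it gives descending order
descending = np.sort(a)[::-1]
print(descending)  # [3 2 1]
print(a)           # [3 1 2]  (the original is unchanged)

grid = np.array([[4, 1],
                 [3, 2]])
by_column = np.sort(grid, axis=0)   # sort each column independently
by_row = np.sort(grid, axis=1)      # sort each row independently
print(by_column)
# [[3 1]
#  [4 2]]
print(by_row)
# [[1 4]
#  [2 3]]
```

To sort an array in place instead of creating a copy, call its sort() method, as in a.sort().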
Summary Table
Topic              Examples / Functions
Arithmetic Ops     +, -, *, /, **
Universal Funcs    np.sin(), np.exp(), np.power()
Aggregations       np.sum(), np.mean(), np.max()
Broadcasting       Add arrays of different shapes
Comparisons        a > b, a == b
Boolean Logic      np.logical_and(), np.logical_not()
Fancy Indexing     arr[[0, 2]]
Sorting            np.sort()