MLDAP Module1

Uploaded by

Gagan Gagan

MODULE #1

Definition of Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on developing
algorithms that enable computers to learn patterns from data and make decisions or predictions
without being explicitly programmed for every task.

In more technical terms, machine learning involves training a model on a dataset, so that it can
learn relationships or structures within the data. Once trained, the model can be used to make
predictions or decisions based on new, unseen data.

Types of Machine Learning

1. Supervised Learning: The model is trained on labeled data (input-output pairs).
o Examples: Linear Regression, Decision Trees, Support Vector Machines (SVM),
Neural Networks.
o Applications: Spam detection, credit scoring, medical diagnosis.
2. Unsupervised Learning: The model is trained on data without labels and tries to find
hidden patterns.
o Examples: K-Means Clustering, Hierarchical Clustering, Principal Component
Analysis (PCA).
o Applications: Customer segmentation, anomaly detection.
3. Semi-supervised Learning: A combination of a small amount of labeled data with a
large amount of unlabeled data.
o Applications: Image recognition, speech analysis.
4. Reinforcement Learning: The model learns by interacting with an environment and
receiving feedback (rewards or penalties).
o Examples: Q-Learning, Deep Q Networks (DQN).
o Applications: Game playing (e.g., AlphaGo), robotics, autonomous vehicles.

Importance of Machine Learning

1. Automation of Decision-Making
o ML systems can automate complex decision-making processes in real time, which
increases efficiency and reduces the need for manual intervention.
2. Data-Driven Insights
o Machine learning can extract useful insights from large and complex datasets that
are often beyond human capabilities to analyze manually.
3. Improved Accuracy and Predictions
o With continuous learning from data, ML models can improve over time, leading
to more accurate forecasts, recommendations, and decisions.
4. Wide Range of Applications
o ML is applied in diverse fields:
 Healthcare: Disease prediction, medical imaging, personalized treatment.
 Finance: Fraud detection, risk assessment, algorithmic trading.
 Marketing: Customer segmentation, personalized recommendations.
 Transportation: Route optimization, self-driving cars.
 Manufacturing: Predictive maintenance, quality control.
5. Real-Time Applications
o Machine learning enables systems like real-time language translation, facial
recognition, and spam filtering, which are increasingly integral to daily digital
experiences.
6. Scalability
o ML models can handle vast amounts of data and scale effectively as data grows,
making them suitable for big data environments.

1. Supervised Learning

Definition

Supervised learning is a type of machine learning where the algorithm is trained on a labeled
dataset — meaning each training example is paired with an output label. The model learns to
map inputs to known outputs.

Goal: To learn a function that, given an input, provides the correct output based
on examples.

How It Works

1. Input data (features) and output labels are provided.


2. The algorithm learns the relationship between inputs and outputs.
3. Once trained, it can predict outputs for new, unseen inputs.
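The three steps above can be sketched with a tiny 1-nearest-neighbor classifier in plain Python. This is only an illustration: the dataset values and the names training_data, squared_distance, and predict are made up for this example and are not part of any library.

```python
# Toy labeled dataset: (hours_studied, hours_slept) -> label. Values are made up.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, nearest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], features)
    )
    return nearest_label

print(predict((7.0, 6.5)))  # pass
print(predict((1.5, 4.5)))  # fail
```

Here "training" is simply storing the labeled examples; most of the algorithms listed below fit parameters instead, but the input-to-label mapping is the same idea.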

Types

 Classification: Output is categorical (e.g., spam or not spam).


 Regression: Output is continuous (e.g., house prices).

Common Algorithms

 Linear Regression
 Logistic Regression
 Decision Trees
 Random Forest
 Support Vector Machines (SVM)
 Neural Networks
 k-Nearest Neighbors (k-NN)
Applications

 Email spam detection


 Fraud detection
 Medical diagnosis (e.g., cancer prediction)
 House price prediction
 Sentiment analysis

2. Unsupervised Learning

Definition

Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled
data. It tries to find hidden patterns or intrinsic structures in the input data without any
supervision.

Goal: To explore the underlying structure or distribution in the data to learn
more about it.

How It Works

 The model explores similarities or differences in data points.


 It groups or reduces data based on patterns or distributions.
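As a rough sketch of grouping by similarity, the following plain-Python code runs a simplified one-dimensional k-means. The spend values, the starting centroids, and the function name k_means_1d are assumptions chosen for illustration; real projects would normally use a library implementation.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

spend = [12, 15, 14, 80, 85, 90]  # made-up monthly spend per customer
centroids, clusters = k_means_1d(spend, centroids=[0.0, 100.0])
print(clusters)  # [[12, 15, 14], [80, 85, 90]]
```

No labels are supplied anywhere; the two groups emerge only from how close the values are to each other.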

Types

 Clustering: Grouping similar data points together.


 Dimensionality Reduction: Simplifying data while retaining its structure.

Common Algorithms

 K-Means Clustering
 Hierarchical Clustering
 DBSCAN
 Principal Component Analysis (PCA)
 Autoencoders
 t-SNE

Applications

 Customer segmentation
 Market basket analysis
 Anomaly detection
 Document/topic modeling
 Recommendation systems
3. Reinforcement Learning (RL)

Definition

Reinforcement learning is a type of machine learning where an agent learns to make decisions
by interacting with an environment, aiming to maximize cumulative rewards through trial and
error.

Goal: To learn an optimal policy or strategy that maximizes rewards over time.

How It Works

1. The agent observes the current state of the environment.


2. It chooses an action based on a policy.
3. The environment responds with a new state and a reward.
4. The agent updates its policy based on this feedback.
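The observe, act, reward, update cycle can be sketched with tabular Q-learning on a made-up five-state corridor. The environment, the hyperparameter values, and the helper names step and choose_action are assumptions for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5                            # states 0..4 in a corridor; state 4 is the goal
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters

def step(state, action):
    """Environment: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q_row):
    """Epsilon-greedy policy with random tie-breaking."""
    if random.random() < EPSILON or q_row[0] == q_row[1]:
        return 0 if random.random() < 0.5 else 1
    return 0 if q_row[0] > q_row[1] else 1

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state

for episode in range(500):
    state = 0                                      # 1. observe the state
    for _ in range(100):                           # cap on steps per episode
        action = choose_action(q[state])           # 2. choose an action
        next_state, reward = step(state, action)   # 3. environment responds
        # 4. update the estimate for (state, action) from the feedback
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
        if state == N_STATES - 1:
            break

greedy_policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(greedy_policy)
```

After training, the greedy policy should prefer moving right in every non-goal state, since that is the shortest path to the reward.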

Key Concepts

 Agent: The learner/decision maker.


 Environment: The external system the agent interacts with.
 State: A representation of the current situation.
 Action: Choices the agent can make.
 Reward: Feedback signal to evaluate the action taken.
 Policy: Strategy used to determine actions.
 Value Function: Estimate of future rewards.

Common Algorithms

 Q-Learning
 Deep Q Networks (DQN)
 SARSA
 Policy Gradient Methods
 Actor-Critic Models

Applications

 Game playing (e.g., AlphaGo, Chess AI)


 Robotics (navigation, manipulation)
 Self-driving cars
 Dynamic pricing
 Real-time decision systems

Feature        Supervised Learning                      Unsupervised Learning                    Reinforcement Learning
Labeled Data   Yes                                      No                                       No (uses rewards instead)
Goal           Predict outputs                          Find patterns/structures                 Learn optimal actions (policy)
Feedback       Direct (correct labels)                  None                                     Indirect (rewards/penalties)
Examples       Email classification, price prediction   Market segmentation, anomaly detection   Game AI, robotics, trading bots

APPLICATIONS OF MACHINE LEARNING


1. Healthcare

Applications:

 Disease Prediction & Diagnosis: ML models can predict diseases like diabetes, cancer,
and heart conditions using patient data (e.g., IBM Watson).
 Medical Imaging: ML analyzes X-rays, MRIs, and CT scans for early detection of
conditions (e.g., tumors).
 Drug Discovery: Accelerates development by predicting molecular interactions.
 Personalized Treatment: Recommends tailored treatments based on a patient’s history
and genetics.
 Remote Monitoring: Predicts health issues from wearable device data (e.g., Fitbit, Apple
Watch).

2. Finance and Banking

Applications:

 Fraud Detection: ML identifies unusual patterns in transactions (e.g., credit card fraud).
 Credit Scoring: Assesses creditworthiness of individuals or businesses.
 Algorithmic Trading: Automated trading based on market trends and patterns.
 Loan Approval: Predicts default risks and streamlines underwriting.
 Customer Service: AI chatbots for support and queries.

3. Retail and E-Commerce

Applications:
 Recommendation Systems: Suggests products based on browsing and purchase history
(e.g., Amazon, Netflix).
 Customer Segmentation: Groups customers for targeted marketing.
 Inventory Management: Predicts demand and optimizes stock.
 Price Optimization: Adjusts prices dynamically based on competition, demand, and
seasonality.
 Chatbots and Virtual Assistants: Improve customer support.

4. Transportation and Automotive

Applications:

 Autonomous Vehicles: Self-driving cars use ML for object detection, navigation, and
decision-making (e.g., Tesla Autopilot).
 Route Optimization: Finds efficient routes using traffic and weather data (e.g., Google
Maps, Uber).
 Predictive Maintenance: Prevents failures by analyzing sensor data from vehicles.
 Fleet Management: Optimizes logistics and delivery schedules.

5. Manufacturing

Applications:

 Predictive Maintenance: Detects machinery faults before they occur.


 Quality Control: Inspects products using computer vision and anomaly detection.
 Process Optimization: Enhances production efficiency.
 Supply Chain Forecasting: Improves inventory planning and logistics.

6. Education

Applications:

 Personalized Learning: Adaptive learning platforms customize content for students
(e.g., Khan Academy).
 Automated Grading: Evaluates essays and quizzes automatically.
 Early Dropout Prediction: Identifies at-risk students.
 Chatbots for Tutoring: Provide homework help or answer course-related questions.

7. Gaming

Applications:

 Game AI: NPCs (non-player characters) learn and adapt to player behavior.
 Dynamic Difficulty Adjustment: Modifies game complexity based on player skill.
 Procedural Content Generation: ML creates game levels, stories, and assets.
 Player Behavior Analysis: Detects toxic behavior or cheating.

8. Social Media and Online Platforms

Applications:

 Content Recommendation: News feeds, friend suggestions (e.g., Facebook, TikTok).


 Sentiment Analysis: Understands public opinion on topics or brands.
 Fake News Detection: Identifies misinformation using text classification.
 Ad Targeting: Personalized advertisements based on user activity.

9. Cybersecurity

Applications:

 Intrusion Detection: Identifies unusual system activity or attacks.


 Spam and Phishing Detection: Filters malicious emails and messages.
 Malware Classification: Detects harmful software using ML models.
 User Behavior Analytics: Detects insider threats or compromised accounts.

10. Agriculture

Applications:

 Crop Monitoring: Drones and ML analyze crop health via images.


 Yield Prediction: Estimates harvest based on weather, soil, and planting data.
 Pest and Disease Detection: Identifies early signs of infestation.
 Soil Analysis: Recommends fertilizer and planting practices.

11. Environmental Science

Applications:

 Weather Forecasting: ML improves short- and long-term predictions.


 Climate Modeling: Analyzes climate change trends and risks.
 Natural Disaster Prediction: Earthquake, flood, and wildfire detection.
 Air and Water Quality Monitoring: Real-time pollution detection.

12. Government and Public Services

Applications:

 Smart Cities: Traffic control, energy efficiency, waste management.


 Public Safety: Predictive policing, crime forecasting.
 Census and Survey Analysis: Automates data analysis from large-scale surveys.
 Citizen Engagement: AI-powered bots for services and inquiries.

PYTHON PROGRAMMING
Python Syntax

Definition

Python syntax refers to the set of rules that defines how Python code should be written and
interpreted.

Key Characteristics

 Python uses indentation (whitespace) to define blocks of code instead of braces {} like
other languages.
 It is case-sensitive (Variable and variable are different).
 Statements do not require a semicolon ; (though allowed).

Example
def greet(name):
    print("Hello,", name)

Indentation is critical: all lines within a function must be indented consistently.

Variables

Definition

Variables are containers for storing data values. Python is dynamically typed, meaning you
don’t need to declare the type of a variable.

Syntax
x = 5
name = "Alice"
price = 12.99

Rules for Naming Variables

 Must start with a letter or underscore (_)


 Cannot start with a number
 Can contain letters, numbers, and underscores
 Case-sensitive

Examples
my_var = 10
MyVar = "text" # different from my_var
_name = "John"
age2 = 25
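The naming rules can also be checked programmatically. The sketch below uses the built-in str.isidentifier() method and the standard keyword module; the helper name is_valid_name is hypothetical. It also covers one rule not listed above: reserved words such as class cannot be used as variable names.

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["my_var", "_name", "age2", "2age", "my-var", "class"]:
    print(f"{candidate!r:10} {is_valid_name(candidate)}")
```

The first three candidates are accepted; "2age" starts with a digit, "my-var" contains a hyphen, and "class" is a keyword, so those three are rejected.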

Comments

Definition: Comments are notes in the code that are not executed. They help explain the code.

Single-Line Comment

Use the hash symbol #:

# This is a comment
x = 10 # This is an inline comment

Multi-Line Comment (Convention)

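Python doesn't have a native multi-line comment syntax, but triple quotes are often used. A
triple-quoted string that is not assigned to anything is evaluated and discarded, so it behaves
like a comment; as the first statement of a module, class, or function it becomes the docstring: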

"""
This is a multi-line comment
used for documentation
"""

Data Types

Python has several built-in data types, which are automatically assigned when values are stored
in variables.

Basic Data Types

Data Type Description Example

int Integer 10, -5

float Floating-point number 3.14, -2.0

str String "hello", 'world'

bool Boolean True, False

Advanced Data Types

Data Type Description Example

list Ordered, mutable collection [1, 2, 3]

tuple Ordered, immutable collection (1, 2, 3)

set Unordered, unique elements {1, 2, 3}

dict Key-value pairs {'name': 'Alice'}

Type Checking
x = 5
print(type(x)) # <class 'int'>

Type Casting

Definition: Type casting means converting the data type of a value to another
type.

Common Functions

Function Description Example

int() Converts to integer int("5") → 5

float() Converts to float float("3.14") → 3.14

str() Converts to string str(100) → "100"

bool() Converts to boolean bool(0) → False

Examples
x = "123"
y = int(x) # y is now 123 (int)

a = 3.14
b = str(a) # b is "3.14" (str)

c = 0
d = bool(c) # d is False
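A few conversions behave in ways that are easy to get wrong. The sketch below shows them; the helper to_int is a hypothetical name introduced only for this example.

```python
print(int(3.9))        # 3  (truncates toward zero, does not round)
print(int(-3.9))       # -3
print(float("1e3"))    # 1000.0
print(bool("False"))   # True (any non-empty string is truthy)
print(bool(""))        # False

def to_int(text, default=None):
    """Hypothetical helper: convert text to int, or return default on failure."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("42"))          # 42
print(to_int("3.14"))        # None, because int() rejects decimal strings
print(int(float("3.14")))    # 3
```

Wrapping the conversion in try/except is a common way to handle user input that may not be a valid number.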

Operators
Arithmetic Operators: Used for mathematical calculations.
Operator Description Example Result

+ Addition 5 + 3 8

- Subtraction 5 - 3 2

* Multiplication 5 * 3 15

/ Division (float) 5 / 3 1.666...

// Floor Division 5 // 3 1

% Modulus (remainder) 5 % 3 2

** Exponentiation 5 ** 3 125

Example:
a = 7
b = 3
print(a + b) # 10
print(a / b) # 2.3333333333333335
print(a // b) # 2
print(a % b) # 1
print(a ** b) # 343
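Floor division and modulus deserve a closer look with negative operands: // rounds toward negative infinity, and the result of % takes the sign of the divisor. A short sketch:

```python
print(7 // 2, 7 % 2)      # 3 1
print(-7 // 2, -7 % 2)    # -4 1
print(7 // -2, 7 % -2)    # -4 -1

# divmod() returns both results at once; quotient * divisor + remainder
# always reconstructs the original number.
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)           # -4 1
print(quotient * 2 + remainder)      # -7
```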

Comparison (Relational) Operators

Used to compare two values; each comparison returns a Boolean (True or False).

Operator Description Example Result

== Equal to 5 == 3 False

!= Not equal to 5 != 3 True

> Greater than 5 > 3 True

< Less than 5 < 3 False

>= Greater than or equal 5 >= 5 True

<= Less than or equal 5 <= 3 False

Example:
x = 10
y = 20
print(x == y) # False
print(x < y) # True
print(x >= 10) # True

Logical Operators

Used to combine conditional statements.

Operator Description Example Result

and Logical AND (5 > 3) and (2 < 4) True

or Logical OR (5 < 3) or (2 < 4) True

not Logical NOT not(5 > 3) False

Example:
a = 5
b = 10
print(a > 3 and b < 15) # True
print(a == 5 or b == 5) # True
print(not(a == b)) # True
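Two details of and / or are worth knowing: they stop evaluating as soon as the result is decided (short-circuiting), and they return one of their operands rather than always returning True or False. The helper check below is a hypothetical function used only to record which operands were evaluated.

```python
def check(label, value):
    """Hypothetical helper that records when it is evaluated."""
    calls.append(label)
    return value

calls = []
result = check("left", False) and check("right", True)
print(result, calls)     # False ['left']  (right side never ran)

calls = []
result = check("left", True) or check("right", True)
print(result, calls)     # True ['left']

# and / or return one of their operands, not necessarily a bool
print(0 or "default")    # default
print("abc" and 42)      # 42
```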

Assignment Operators: Used to assign values to variables, often combined with arithmetic.

Operator Description Example Equivalent To

= Assign x = 5 -

+= Add and assign x += 3 x = x + 3

-= Subtract and assign x -= 2 x = x - 2

*= Multiply and assign x *= 4 x = x * 4

/= Divide and assign x /= 5 x = x / 5

%= Modulus and assign x %= 2 x = x % 2

//= Floor divide & assign x //= 2 x = x // 2

**= Exponent & assign x **= 3 x = x ** 3

Example:
x = 10
x += 5 # x = 15
x *= 2 # x = 30
print(x)

Bitwise Operators: Operate on the bits (binary digits) of integers.

Operator Description Example Result (binary)

& AND 5 & 3 1 (0101 & 0011 = 0001)

| OR 5 | 3 7 (0101 | 0011 = 0111)

^ XOR 5 ^ 3 6 (0101 ^ 0011 = 0110)

~ NOT (invert bits) ~5 -6 (~x equals -(x + 1))

<< Left Shift 5 << 1 10 (0101 << 1 = 1010)

>> Right Shift 5 >> 1 2 (0101 >> 1 = 0010)

Example:
a = 5 # binary 0101
b = 3 # binary 0011
print(a & b) # 1
print(a | b) # 7
print(a ^ b) # 6
print(~a) # -6
print(a << 1) # 10
print(a >> 1) # 2

Membership Operators

Check if a value exists in a sequence (like a list, tuple, string).

Operator Description Example

in Returns True if found 'a' in 'cat' → True

not in Returns True if not found 'b' not in 'cat' → True

Example:
my_list = [1, 2, 3, 4]
print(3 in my_list) # True
print(5 not in my_list) # True

Identity Operators

Check if two variables refer to the same object in memory.

Operator Description Example

is True if both are the same object a is b

is not True if not the same object a is not b

Example:
a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a is b) # True (both point to same list)


print(a is c) # False (different lists with same content)
print(a == c) # True (same content)

Strings in Python
What is a String?

A string in Python is a sequence of characters enclosed in quotes. It can be in:


 Single quotes: 'Hello'
 Double quotes: "Hello"
 Triple quotes for multi-line strings: '''Hello''' or """Hello"""

Strings are immutable, meaning once created, their contents cannot be changed.
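A quick sketch of what immutability means in practice: assigning to a character raises a TypeError, so a modified string has to be built as a new object.

```python
text = "Python"
try:
    text[0] = "J"          # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

changed = "J" + text[1:]   # build a new string instead
print(changed)             # Jython
print(text)                # Python (the original is unchanged)
```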

Creating Strings
s1 = 'Hello'
s2 = "World"
s3 = '''This is
a multi-line
string'''

String Operations

Concatenation (+)
greeting = s1 + " " + s2 # 'Hello World'

Repetition (*)
echo = "Ha" * 3 # 'HaHaHa'

Accessing Characters and Slicing

 Indexing starts at 0
 Supports negative indexing (from the end)

text = "Python"
print(text[0]) # 'P'
print(text[-1]) # 'n'
print(text[1:4]) # 'yth' (from index 1 to 3)
print(text[:3]) # 'Pyt' (start to 2)
print(text[3:]) # 'hon' (3 to end)
print(text[:]) # 'Python' (whole string)
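Slices also accept an optional third value, the step, written as text[start:stop:step]. A few additional cases, shown here as a sketch:

```python
text = "Python"
print(text[::2])     # 'Pto' (every second character)
print(text[::-1])    # 'nohtyP' (reversed)
print(text[-3:])     # 'hon' (last three characters)
print(text[10:])     # '' (out-of-range slices return an empty string, no error)
```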

Common String Functions and Methods

4.1. Length
len(text) # 6
4.2. Changing Case

Method Description Example

.lower() Converts to lowercase "HELLO".lower() → "hello"

.upper() Converts to uppercase "hello".upper() → "HELLO"

.capitalize() Capitalizes first letter "hello world".capitalize() → "Hello world"

.title() Capitalizes first letter of each word "hello world".title() → "Hello World"

.swapcase() Swaps case "Hello".swapcase() → "hELLO"

4.3. Searching and Counting

Method Description Example

.find(sub) Returns index of first occurrence of sub, or -1 if not found "hello".find("e") → 1

.index(sub) Like .find() but raises ValueError if not found "hello".index("l") → 2

.count(sub) Counts occurrences of substring "hello".count("l") → 2

.startswith(sub) Checks if string starts with sub "hello".startswith("he") → True

.endswith(sub) Checks if string ends with sub "hello".endswith("lo") → True

4.4. Modifying Strings

Method Description Example

.strip() Removes whitespace from both ends " hello ".strip() → "hello"

.lstrip() Removes whitespace from left " hello".lstrip() → "hello"

.rstrip() Removes whitespace from right "hello ".rstrip() → "hello"

.replace(old, new) Replaces occurrences of old with new "hello".replace("l", "p") → "heppo"

4.5. Splitting and Joining

Method Description Example

.split(sep=None) Splits string into list by separator (default whitespace) "a,b,c".split(",") → ['a', 'b', 'c']

.join(iterable) Joins list/tuple into string with separator ",".join(['a', 'b', 'c']) → "a,b,c"

4.6. Checking Content

Method Description Example

.isalpha() Checks if all characters are letters "abc".isalpha() → True

.isdigit() Checks if all characters are digits "123".isdigit() → True

.isalnum() Checks if all are letters or digits "abc123".isalnum() → True

.isspace() Checks if all are whitespace " \t\n".isspace() → True

Escape Characters

Used to insert special characters in strings.

Escape Sequence Meaning

\\ Backslash \

\' Single quote '

\" Double quote "

\n New line

\t Tab
Example:

print("Line 1\nLine 2")


# Output:
# Line 1
# Line 2

String Formatting

Old style: % operator


name = "Alice"
age = 30
print("My name is %s and I am %d years old." % (name, age))

New style: .format()


print("My name is {} and I am {} years old.".format(name, age))
print("My name is {0} and I am {1} years old.".format(name, age))
print("My name is {name} and I am {age} years old.".format(name=name,
age=age))

f-strings (Python 3.6+)


print(f"My name is {name} and I am {age} years old.")
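f-strings (and .format()) also accept a format specifier after a colon, which controls rounding, padding, and alignment. The variable values below are made up for illustration.

```python
name = "Alice"
price = 12.99
ratio = 0.4567
count = 1234567

print(f"{price:.1f}")    # 13.0 (one decimal place, rounded)
print(f"{ratio:.1%}")    # 45.7% (as a percentage)
print(f"{count:,}")      # 1,234,567 (thousands separator)
print(f"{name:>10}|")    # right-aligned in a field of width 10
print(f"{name:<10}|")    # left-aligned in a field of width 10
print(f"{42:05d}")       # 00042 (zero-padded to width 5)
```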

Example Code
text = " Hello, Python! "

print(text.lower()) # " hello, python! "


print(text.strip()) # "Hello, Python!"
print(text.replace("Python", "World")) # " Hello, World! "

words = text.strip().split(",")
print(words) # ['Hello', ' Python!']

print("Python".startswith("Py")) # True
print("Python".find("th")) # 2

print(f"Length of text: {len(text)}") # Length of text: 16

List
Definition

 Ordered, mutable (can be changed), and allows duplicate elements.


 Elements accessed by index (starting at 0).
Creating Lists
my_list = [1, 2, 3, 4]
empty_list = []
mixed_list = [1, "two", 3.0, True]

Accessing Elements
print(my_list[0]) # 1
print(my_list[-1]) # 4 (last element)
print(my_list[1:3]) # [2, 3]

Common List Methods

Method Description Example

.append(x) Add element x at end my_list.append(5)

.insert(i, x) Insert x at index i my_list.insert(2, 99)

.extend(iter) Extend list by appending elements from iterable my_list.extend([6, 7])

.remove(x) Remove first occurrence of x my_list.remove(2)

.pop([i]) Remove and return element at index i (default last) my_list.pop() or my_list.pop(1)

.index(x) Return index of first occurrence of x my_list.index(3)

.count(x) Count occurrences of x my_list.count(2)

.sort() Sort the list in place my_list.sort()

.reverse() Reverse the list in place my_list.reverse()

.clear() Remove all elements my_list.clear()

Example
lst = [3, 1, 4]
lst.append(2) # [3,1,4,2]
lst.sort() # [1,2,3,4]
lst.remove(3) # [1,2,4]
print(lst.pop()) # 4
print(lst) # [1,2]
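Because lists are mutable, two points often cause confusion: .sort() changes the list in place and returns None (the built-in sorted() returns a new list instead), and assigning a list to another name does not copy it. A short sketch with made-up values:

```python
scores = [3, 1, 2]
ordered = sorted(scores)     # returns a new sorted list
print(scores, ordered)       # [3, 1, 2] [1, 2, 3]

result = scores.sort()       # sorts in place and returns None
print(scores, result)        # [1, 2, 3] None

alias = scores               # second name for the same list
snapshot = scores.copy()     # independent shallow copy
scores.append(4)
print(alias)                 # [1, 2, 3, 4]
print(snapshot)              # [1, 2, 3]
```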
Tuple
Definition

 Ordered, immutable (cannot be changed after creation), allows duplicates.


 Like lists but read-only.

Creating Tuples
t = (1, 2, 3)
singleton = (5,) # single element tuple requires comma
empty = ()

Accessing Elements
print(t[0]) # 1
print(t[-1]) # 3
print(t[1:3]) # (2, 3)

Tuple Methods

Method Description

.count(x) Count occurrences of x

.index(x) Return index of first occurrence of x

Example
t = (1, 2, 3, 2)
print(t.count(2)) # 2
print(t.index(3)) # 2

Set
Definition

 Unordered collection of unique elements.


 Mutable (can add/remove elements).
 No indexing or slicing.

Creating Sets
s = {1, 2, 3}
empty_set = set() # must use set() to create an empty set; {} creates an empty dict

Adding/Removing Elements
s.add(4) # add element
s.remove(2) # remove element, raises KeyError if not found
s.discard(5) # remove element if present, no error if absent
s.pop() # removes and returns arbitrary element
s.clear() # remove all elements

Set Operations (Methods)

Method Description Example

.union(other) / | Union of sets

.intersection(other) / & Intersection s & {2,3}

.difference(other) / - Difference s - {2}

.symmetric_difference(other) / ^ Elements in either set but not both s ^ {1,3}

.issubset(other) Check if subset {1,2}.issubset(s)

.issuperset(other) Check if superset s.issuperset({1})

Example
a = {1, 2, 3}
b = {2, 3, 4}
print(a | b) # {1, 2, 3, 4}
print(a & b) # {2, 3}
print(a - b) # {1}

Dictionary
Definition

 Collection of key-value pairs; since Python 3.7, dictionaries preserve insertion order.
 Keys must be immutable (e.g., strings, numbers, tuples).
 Mutable.

Creating Dictionaries
d = {'name': 'Alice', 'age': 30}
empty_dict = {}
d2 = dict(a=1, b=2)

Accessing Values
print(d['name']) # 'Alice'
print(d.get('age')) # 30
print(d.get('salary', 0)) # 0 (default if key not found)

Adding/Updating Entries
d['salary'] = 5000
d.update({'age': 31, 'city': 'NY'})

Removing Entries
d.pop('age') # removes 'age' key and returns its value
d.popitem() # removes and returns the last inserted key-value pair (Python 3.7+)
del d['name'] # delete by key
d.clear() # empty dictionary

Dictionary Methods

Method Description

.keys() Returns a view of keys

.values() Returns a view of values

.items() Returns a view of (key, value) tuples

.get(key, default) Returns value for key or default

.update(other_dict) Update dictionary with another dict

.pop(key) Remove key and return value

.popitem() Remove and return (key, value) pair

.clear() Remove all items

Example
person = {'name': 'Bob', 'age': 25}
print(person.keys()) # dict_keys(['name', 'age'])
print(person.values()) # dict_values(['Bob', 25])
print(person.items()) # dict_items([('name', 'Bob'), ('age', 25)])
person['age'] = 26
person['city'] = 'LA'
age = person.pop('age')
print(age) # 26
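A common use of .get() with a default and of .items() is counting occurrences. The sentence below is made up for illustration.

```python
sentence = "the quick brown fox jumps over the lazy dog the end"

counts = {}
for word in sentence.split():
    counts[word] = counts.get(word, 0) + 1   # 0 is the default for new keys

for word, n in counts.items():
    if n > 1:
        print(word, n)         # the 3

print(list(counts)[:3])        # ['the', 'quick', 'brown'] (insertion order)
```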

Python if, elif, and else Statements in Detail

1. Purpose

 Used to execute certain blocks of code only if specific conditions are true.
 Controls program flow by making decisions.

2. Basic Syntax
if condition:
    # code to execute if condition is True
else:
    # code to execute if condition is False

3. Example
age = 18

if age >= 18:
    print("You are an adult.")
else:
    print("You are a minor.")

Output:

You are an adult.

4. elif (else if)

 Allows multiple conditions to be checked sequentially.


 The first true condition’s block runs, and the rest are skipped.

Syntax:
if condition1:
    # block 1
elif condition2:
    # block 2
elif condition3:
    # block 3
else:
    # else block

Example:
score = 85
if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
elif score >= 70:
    print("Grade: C")
else:
    print("Grade: F")

Output:

Grade: B

5. Conditions

 Conditions are expressions that evaluate to True or False.


 Common operators used in conditions:

Operator Meaning Example

== Equal to x == 5

!= Not equal to x != 10

< Less than x < 20

> Greater than x > 0

<= Less than or equal to x <= 100

>= Greater than or equal to x >= 50

6. Logical Operators

Combine multiple conditions:

Operator Meaning Example

and Both True x > 0 and x < 10

or Either True x < 0 or x > 10

not Negates condition not(x == 5)

Example:

x = 15
if x > 10 and x < 20:
    print("x is between 10 and 20")

7. Nested if Statements

 You can put if statements inside other if or else blocks.

Example:

x = 10
if x > 5:
    if x < 15:
        print("x is between 6 and 14")
    else:
        print("x is 15 or more")
else:
    print("x is 5 or less")

Output:

x is between 6 and 14

8. Ternary Conditional Operator (Inline if-else)

 Short way to write simple if-else expressions.

Syntax:

value_if_true if condition else value_if_false

Example:

age = 17
status = "Adult" if age >= 18 else "Minor"
print(status) # Output: Minor

9. Example Program Using if-elif-else


num = int(input("Enter a number: "))

if num > 0:
    print("Positive number")
elif num == 0:
    print("Zero")
else:
    print("Negative number")
Python Loops and Functions in Detail

1. For Loop

Purpose:

 Iterate over a sequence (like a list, tuple, string, or range).


 Executes a block of code repeatedly for each item in the sequence.

Syntax:
for <variable> in <sequence>:
    # code block to execute

Example:
fruits = ['apple', 'banana', 'cherry']
for fruit in fruits:
    print(fruit)

Output:

apple
banana
cherry

Using range()
for i in range(5): # 0 to 4
    print(i)

2. While Loop

Purpose:

 Repeats a block of code as long as a condition is true.


 Condition checked before each iteration.

Syntax:
while condition:
    # code block to execute

Example:
count = 0
while count < 3:
    print(count)
    count += 1

Output:

0
1
2

3. Loop Control Statements

 break: Exit the loop immediately.


 continue: Skip the rest of the loop body for the current iteration and continue with the
next iteration.
 else (optional with loops): Executes after the loop completes normally (no break).

Example with break and continue:


for i in range(10):
    if i == 5:
        break # Exit loop when i is 5
    if i % 2 == 0:
        continue # Skip even numbers
    print(i)

Output:
1
3

4. Functions

Purpose:

 Block of reusable code that performs a specific task.


 Can take inputs (parameters) and return outputs.

Defining a Function
def function_name(parameters):
    # code block
    return value # optional

Calling a Function
result = function_name(arguments)

Example:
def greet(name):
    print(f"Hello, {name}!")

greet("Alice") # Output: Hello, Alice!

5. Function Components

5.1. Parameters vs Arguments

 Parameters: Variables listed in function definition.


 Arguments: Actual values passed when calling the function.

5.2. Return Statement

 Sends back a result from the function.


 If no return, function returns None.

def add(a, b):
    return a + b

total = add(3, 4) # 7 (named total so the built-in sum() is not shadowed)

5.3. Default Parameters


def greet(name="Guest"):
    print(f"Hello, {name}!")

greet() # Hello, Guest!
greet("Bob") # Hello, Bob!

5.4. Keyword Arguments


def describe_pet(name, animal_type="dog"):
    print(f"{name} is a {animal_type}")

describe_pet(animal_type="cat", name="Whiskers")

5.5. Variable-length Arguments

 Use *args for non-keyword variable arguments.


 Use **kwargs for keyword variable arguments.

def add_numbers(*args):
    return sum(args)

print(add_numbers(1, 2, 3, 4)) # 10
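The example above covers *args only. A sketch combining both forms follows; build_profile and its arguments are hypothetical names chosen for illustration.

```python
def build_profile(name, *hobbies, **details):
    """Hypothetical function showing how extra arguments are collected."""
    print(type(hobbies).__name__, hobbies)   # tuple of extra positional args
    print(type(details).__name__, details)   # dict of extra keyword args
    return {"name": name, "hobbies": list(hobbies), **details}

profile = build_profile("Alice", "chess", "hiking", city="NY", age=30)
print(profile)
# {'name': 'Alice', 'hobbies': ['chess', 'hiking'], 'city': 'NY', 'age': 30}
```

Inside the function, *hobbies arrives as a tuple and **details as a dict, so both can be processed with the usual tuple and dictionary operations.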

6. Example Putting it All Together


def factorial(n):
    """Calculate factorial of n using a while loop."""
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

for i in range(6):
    print(f"{i}! = {factorial(i)}")

Output:

0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
NumPy in Python: Detailed Overview

1. What is NumPy?

 NumPy stands for Numerical Python.


 It is a fundamental package for scientific computing in Python.
 Provides support for:
o Large, multi-dimensional arrays and matrices.
o A large collection of high-level mathematical functions to operate on these arrays
efficiently.
 Core routines are implemented in C, so array operations are typically much faster than
equivalent loops over standard Python lists.

2. Why Use NumPy?

 Python lists are slow for numerical operations because they store references to arbitrary
Python objects (elements may differ in type) and lack vectorized operations.
 NumPy arrays (called ndarrays) are:
o Fixed-type (homogeneous).
o Stored compactly in memory.
o Support vectorized operations (operate on whole arrays element-wise without
explicit loops).
 Essential for:
o Data analysis
o Machine learning
o Image processing
o Scientific research

3. Installing NumPy

If you don’t have NumPy installed, install it using:

pip install numpy

4. Importing NumPy

By convention, import NumPy as np:

import numpy as np
5. NumPy Arrays

5.1 Creating Arrays

 From Python lists:

a = np.array([1, 2, 3, 4])
print(a) # [1 2 3 4]

 Multi-dimensional array (2D matrix):

b = np.array([[1, 2], [3, 4]])


print(b)
# [[1 2]
# [3 4]]

5.2 Array Attributes


print(a.ndim) # Number of dimensions (1)
print(b.shape) # Shape (rows, columns) (2, 2)
print(a.size) # Total number of elements (4)
print(a.dtype) # Data type (int64, float64, etc.)

6. Creating Special Arrays


np.zeros((3, 3)) # 3x3 array of zeros
np.ones((2, 4)) # 2x4 array of ones
np.eye(3) # 3x3 Identity matrix
np.arange(0, 10, 2) # Values from 0 up to (but not including) 10 with step 2: [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # 5 equally spaced values between 0 and 1

7. Array Operations

 Arithmetic operations are element-wise:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b) # [5 7 9]
print(a * b) # [ 4 10 18]
print(a ** 2) # [1 4 9]
print(np.sqrt(a)) # [1.         1.41421356 1.73205081]

 Operations also support broadcasting (automatic expansion of smaller arrays to match
larger arrays):

a = np.array([1, 2, 3])
print(a + 10) # [11 12 13]
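Broadcasting also works between arrays of different shapes, as long as the shapes are compatible when compared from the last dimension backwards (each pair of sizes must be equal, or one of them must be 1). A sketch with made-up arrays, assuming NumPy is installed:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

print(matrix + row)       # row is added to every row of matrix
# [[11 22 33]
#  [14 25 36]]
print(matrix + column)    # column is added to every column of matrix
# [[101 102 103]
#  [204 205 206]]

try:
    matrix + np.array([1, 2])         # shape (2,) does not line up with (2, 3)
except ValueError as error:
    print("ValueError:", error)
```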
8. Indexing and Slicing

 Similar to Python lists but supports multi-dimensional indexing:

a = np.array([1, 2, 3, 4, 5])
print(a[0]) # 1
print(a[1:4]) # [2 3 4]

b = np.array([[1, 2], [3, 4], [5, 6]])


print(b[1, 0]) # 3 (row 1, column 0)
print(b[:, 1]) # [2 4 6] (all rows, column 1)

9. Reshaping Arrays

Change the shape without changing data:

a = np.arange(6) # [0 1 2 3 4 5]
b = a.reshape((2, 3))
print(b)
# [[0 1 2]
# [3 4 5]]

10. Aggregation Functions

Calculate summaries:

a = np.array([1, 2, 3, 4])

print(a.sum()) # 10
print(a.mean()) # 2.5
print(a.min()) # 1
print(a.max()) # 4
print(a.std()) # Standard deviation

For multi-dimensional arrays, specify axis:

b = np.array([[1, 2], [3, 4]])


print(b.sum(axis=0)) # Sum each column: [4 6]
print(b.sum(axis=1)) # Sum each row: [3 7]

11. Boolean Indexing

Select elements based on a condition:

a = np.array([1, 2, 3, 4, 5])
print(a[a > 3]) # [4 5]
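Conditions can be combined, and a boolean mask can also be used on the left side of an assignment. The following is a small sketch with illustrative variable names (mask, clipped).

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Element-wise AND uses &, and each condition needs its own parentheses
mask = (a > 2) & (a % 2 == 0)
print(mask)      # [False False False  True False  True]
print(a[mask])   # [4 6]

# A mask can select the elements to overwrite
clipped = a.copy()
clipped[clipped > 4] = 4
print(clipped)   # [1 2 3 4 4 4]
```

Use | for element-wise OR and ~ for NOT; the Python keywords and, or, and not do not work element-wise on arrays.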
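12. Copy vs View

 b = a: No data is copied; both names refer to the same array object.
 b = a[1:3]: Basic slicing returns a view, a new array object that shares the original's data.
 b = a.copy(): Creates a new array with its own copy of the data.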
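Whether two arrays share data can be checked directly. The sketch below compares plain assignment, a slice (which NumPy returns as a view), and an explicit copy; the names same, view, and copied are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

same = a            # plain assignment: no new array is created
view = a[1:3]       # basic slicing: new array object, shared data
copied = a.copy()   # independent copy of the data

a[1] = 99           # modify the original in place

print(same is a)                  # True
print(view)                       # [99  3]
print(np.shares_memory(a, view))  # True
print(copied)                     # [1 2 3 4]
```

Only the copy is unaffected by the change to a, which is why a.copy() is the safe choice when the original must stay intact.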

13. Example: Matrix Multiplication


a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.dot(a, b)
print(c)
# [[19 22]
# [43 50]]

Or using operator:

c = a @ b

14. Why Is NumPy Essential?

 Performance: NumPy operations are implemented in C and optimized.
 Memory efficiency: Arrays store elements of a single type in one contiguous block, so they
typically use less memory than Python lists holding the same numbers.
 Convenience: Easy to manipulate arrays with a rich set of functions.
 Interoperability: Works well with libraries like Pandas, SciPy, Matplotlib, scikit-learn.
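The memory and convenience points can be made concrete with a rough comparison. This is only a sketch: the list size computed with sys.getsizeof is an approximation that depends on the Python build, and the variable names are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores references to separate int objects, so count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = arr.nbytes   # 1000 elements * 8 bytes each

print("List bytes (approx.):", list_bytes)
print("Array data bytes:", array_bytes)   # 8000

# The same arithmetic as a Python loop and as one vectorized expression
squares_loop = [x * x for x in py_list]
squares_vec = arr * arr
print(np.array_equal(squares_loop, squares_vec))   # True
```

The vectorized expression runs its loop in compiled code, which is where most of the speed advantage comes from on larger arrays.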

15. Simple Code Example


import numpy as np

# Create array of 10 elements 0 to 9


arr = np.arange(10)

# Reshape to 2x5 matrix


mat = arr.reshape(2, 5)

print("Matrix:\n", mat)

# Calculate sum of each column


col_sum = mat.sum(axis=0)
print("Column sums:", col_sum)

# Filter elements greater than 5


filtered = arr[arr > 5]
print("Elements > 5:", filtered)
Python Pandas

1. What is Pandas?

 Pandas is an open-source Python library providing high-performance, easy-to-use data
structures and data analysis tools.
 It is built on top of NumPy and designed for working with tabular data (like
spreadsheets or SQL tables).
 Provides two main data structures:
o Series (1D labeled array)
o DataFrame (2D labeled, tabular data structure)

2. Why Use Pandas?

 Handles heterogeneous data (different data types in columns).


 Supports missing data gracefully.
 Offers powerful data filtering, grouping, aggregation, and reshaping.
 Integrates well with other data science tools.
 Widely used in:
o Data cleaning
o Data exploration
o Statistical modeling
o Machine learning preprocessing

3. Installing Pandas

If not installed, run:

pip install pandas

4. Importing Pandas

Conventionally imported as:

import pandas as pd

5. Core Data Structures

5.1 Series
 A 1D labeled array capable of holding any data type.
 Has an index for labels (default integer index if not provided).

import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])


print(s)

Output:

a 10
b 20
c 30
d 40
dtype: int64

5.2 DataFrame

 2D table with rows and columns.


 Columns can have different data types.
 Can be created from dict of lists, lists of dicts, NumPy arrays, or other DataFrames.

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

Name Age City


0 Alice 25 NY
1 Bob 30 LA
2 Charlie 35 Chicago

6. Reading and Writing Data

6.1 Reading CSV


df = pd.read_csv('data.csv')

6.2 Writing CSV


df.to_csv('output.csv', index=False)

Similarly supports Excel, JSON, SQL databases, etc.
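The read and write calls can be tried without any file on disk. The sketch below assumes an in-memory text buffer (io.StringIO) in place of the 'data.csv' and 'output.csv' file names; the small DataFrame is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write to an in-memory text buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))   # True
```

Passing index=False keeps the row index out of the CSV, so reading the text back does not produce an extra unnamed column.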


7. Data Inspection
df.head() # First 5 rows
df.tail(3) # Last 3 rows
df.info() # Summary info (types, non-null counts)
df.describe() # Statistical summary of numeric columns
df.shape # Tuple (rows, columns)
df.columns # Index object holding the column names

8. Selecting Data

8.1 Select Column(s)


ages = df['Age'] # Series
subset = df[['Name', 'City']] # DataFrame

8.2 Select Rows by Index


df.iloc[0] # First row by integer position
df.loc[0] # First row by label (if index is default numeric, same as iloc)
df.loc[0:2] # Rows from 0 to 2 inclusive
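The difference between position-based and label-based selection shows up in slicing and with a non-default index. This is a minimal sketch; the relabeled DataFrame and its index values 10, 20, 30 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# iloc slices like a Python list (end excluded); loc includes the end label
print(len(df.iloc[0:2]))   # 2
print(len(df.loc[0:2]))    # 3

# With a custom index, positions and labels no longer coincide
relabeled = df.copy()
relabeled.index = [10, 20, 30]
print(relabeled.iloc[0, 0])        # Alice (first row, first column by position)
print(relabeled.loc[10, 'Name'])   # Alice (row labeled 10, column 'Name')

# relabeled.loc[0] would raise a KeyError because no row is labeled 0
```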

8.3 Conditional Selection (Filtering)


df[df['Age'] > 28]

9. Adding / Modifying Columns


df['Salary'] = [70000, 80000, 90000] # Add new column

df['AgePlusTen'] = df['Age'] + 10 # Modify/create column using vectorized operation

10. Handling Missing Data


df.isnull() # Boolean DataFrame showing missing values
df.dropna() # Returns a new DataFrame without the rows that contain missing values
df.fillna(0) # Returns a new DataFrame with missing values replaced by 0
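These methods can be tried on a small DataFrame that actually contains gaps. The data below is illustrative, and the per-column dictionary passed to fillna is one of several possible fill strategies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, np.nan, 35], 'City': ['NY', 'LA', None]})

print(df.isnull().sum())
# Age     1
# City    1
# dtype: int64

dropped = df.dropna()   # keeps only the rows with no missing values
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})   # per-column fill

print(len(df), len(dropped))    # 3 1
print(filled['Age'].tolist())   # [25.0, 30.0, 35.0]
```

Neither call changes df itself, so the result has to be assigned to a name to be kept.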

11. Grouping and Aggregation

Group data by column(s) and apply aggregation functions.

grouped = df.groupby('City')
print(grouped['Age'].mean()) # Average age per city

# Multiple aggregations
grouped['Age'].agg(['mean', 'min', 'max'])
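Grouping is easier to follow when some groups contain more than one row. The sketch below uses a made-up DataFrame named people in which two cities repeat.

```python
import pandas as pd

people = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'Chicago'],
    'Age': [25, 30, 35, 40, 50],
})

mean_age = people.groupby('City')['Age'].mean()
print(mean_age)
# City
# Chicago    50.0
# LA         35.0
# NY         30.0
# Name: Age, dtype: float64

# One row per city, one column per aggregation
summary = people.groupby('City')['Age'].agg(['mean', 'min', 'max'])
print(summary.loc['NY'].tolist())   # [30.0, 25.0, 35.0]
```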

12. Sorting
df.sort_values(by='Age') # Sort ascending by Age
df.sort_values(by='Age', ascending=False) # Sort descending

13. Merging and Joining

Combine DataFrames on common columns or indices.

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})


df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'val2': [4, 5, 6]})

# how can be 'inner', 'left', 'right', or 'outer'
merged = pd.merge(df1, df2, on='key', how='inner')
print(merged)
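The join type decides which keys survive the merge. The sketch below reuses the same two DataFrames and compares three values of how; the result names inner, left, and outer are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'val2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key', how='inner')   # keys present in both
left = pd.merge(df1, df2, on='key', how='left')     # all keys from df1
outer = pd.merge(df1, df2, on='key', how='outer')   # keys from either side

print(inner['key'].tolist())          # ['A', 'B']
print(left['key'].tolist())           # ['A', 'B', 'C']
print(sorted(outer['key']))           # ['A', 'B', 'C', 'D']
print(left['val2'].isna().tolist())   # [False, False, True]
```

Rows without a match on the other side get missing values (NaN) in the columns that come from that side.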

14. Applying Functions

Apply functions to DataFrame or Series elements.

df['AgePlusTen'] = df['Age'].apply(lambda x: x + 10)

15. Reshaping

Pivot tables and stacking/unstacking.

df.pivot(index='Name', columns='City', values='Age')
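The pivot call above requires each index/column pair to appear only once. The sketch below shows pivot_table, which aggregates duplicates, followed by stack and unstack; the sales data is made up for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA', 'NY'],
    'Year': [2023, 2024, 2023, 2024, 2024],
    'Amount': [100, 150, 80, 120, 50],
})

# pivot_table sums the two (NY, 2024) rows; plain pivot would raise an error here
table = sales.pivot_table(index='City', columns='Year', values='Amount', aggfunc='sum')
print(table)

# stack moves the column labels into the row index (wide to long)
long = table.stack()
print(long.loc[('NY', 2024)])   # 200

# unstack moves the inner index level back into columns (long to wide)
wide = long.unstack()
print(wide.shape == table.shape)   # True
```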

16. Example: Simple Data Analysis


import pandas as pd

# Create DataFrame
data = {'Name': ['Anna', 'Bob', 'Cara', 'Dave'],
        'Age': [28, 24, 35, 40],
        'Score': [85, 90, 88, 92]}

df = pd.DataFrame(data)
# Filter people older than 30
older_than_30 = df[df['Age'] > 30]
# Calculate average score
avg_score = df['Score'].mean()
print("People older than 30:\n", older_than_30)
print("Average score:", avg_score)
Python Matplotlib

1. What is Matplotlib?

 Matplotlib is a widely used Python library for creating static, interactive, and
animated visualizations.
 It provides a MATLAB-like interface for plotting, making it familiar to those from
scientific and engineering backgrounds.
 The most commonly used module is pyplot, which provides functions to create plots and
charts easily.

2. Why Use Matplotlib?

 Enables creation of a wide variety of plots: line, bar, scatter, histogram, pie, etc.
 Highly customizable plots.
 Integrates well with NumPy and Pandas for quick data visualization.
 Essential tool for data analysis, exploratory data analysis (EDA), and reporting.

3. Installing Matplotlib

If you don’t have it installed:

pip install matplotlib

4. Importing Matplotlib
import matplotlib.pyplot as plt

5. Basic Plotting

5.1 Line Plot


import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.plot(x, y) # Plot line connecting points


plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

6. Common Plot Types


Plot Type      Description                               Example Code Snippet

Line Plot      Connect points with lines                 plt.plot(x, y)
Scatter Plot   Plot individual points                    plt.scatter(x, y)
Bar Chart      Vertical bars for categorical data        plt.bar(categories, values)
Histogram      Distribution of numerical data            plt.hist(data, bins=10)
Pie Chart      Circular proportion chart                 plt.pie(sizes, labels=labels)
Box Plot       Summary statistics (median, quartiles)    plt.boxplot(data)
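The snippets in the table use placeholder names. The sketch below fills them with made-up data and draws four of the plot types in one figure; the values, the random seed, and the labels are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data matching the placeholder names in the table
categories = ['A', 'B', 'C']
values = [3, 7, 5]
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)
sizes = [40, 35, 25]
labels = ['Train', 'Validation', 'Test']

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

axs[0, 0].bar(categories, values)
axs[0, 0].set_title('Bar Chart')

# hist returns the count per bin and the bin edges
counts, bin_edges, _ = axs[0, 1].hist(data, bins=10)
axs[0, 1].set_title('Histogram')

axs[1, 0].pie(sizes, labels=labels)
axs[1, 0].set_title('Pie Chart')

axs[1, 1].boxplot(data)
axs[1, 1].set_title('Box Plot')

fig.tight_layout()
```

Calling plt.show() afterwards displays the figure in a script; in a notebook it is usually rendered automatically.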

7. Plot Customization

7.1 Adding Labels and Title


plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.title("Plot Title")

7.2 Adding Grid


plt.grid(True)

7.3 Changing Line Style, Color, Marker


plt.plot(x, y, color='red', linestyle='--', marker='o')

 Colors: 'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'


 Linestyles: '-' (solid), '--' (dashed), '-.', ':'
 Markers: 'o' (circle), '^' (triangle), 's' (square), etc.

8. Multiple Plots on Same Figure


plt.plot(x, y, label='Line 1')
plt.plot(x, [i**2 for i in x], label='Line 2')
plt.legend() # Show legend
plt.show()

9. Subplots

Create multiple plots in one figure.

fig, axs = plt.subplots(2, 1) # 2 rows, 1 column


axs[0].plot(x, y)
axs[0].set_title('First Plot')
axs[1].bar(['A', 'B', 'C'], [3, 7, 5])
axs[1].set_title('Second Plot')
plt.tight_layout() # Adjust spacing
plt.show()

10. Saving Plots

Save figure as image file:

plt.savefig('plot.png') # Save current figure as PNG

Supported formats include PNG, JPEG (JPG), PDF, and SVG, among others.
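A few savefig options are commonly useful. The sketch below saves to an in-memory buffer so that it leaves no file behind; the dpi value and the buffer are illustrative choices, and a file name such as 'plot.png' can be passed in the same position.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 3, 5, 7, 11], marker='o')
ax.set_title("Simple Line Plot")

# Save before calling plt.show(); the figure may be discarded once its window is closed
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')

png_bytes = buffer.getvalue()
print(png_bytes[:4])   # b'\x89PNG'
plt.close(fig)         # free the figure when it is no longer needed
```

Here dpi controls the resolution, and bbox_inches='tight' trims the empty margin around the plot.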

11. Example: Scatter Plot with Customization


import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)
sizes = np.random.randint(20, 200, size=50)
colors = np.random.rand(50)

plt.scatter(x, y, s=sizes, c=colors, alpha=0.5, cmap='viridis')


plt.colorbar() # Show color scale
plt.title("Random Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

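12. Interactive Plots

 Use %matplotlib inline in Jupyter notebooks to display static plots inline.
 %matplotlib notebook enables interactive zooming and panning in the classic Jupyter
Notebook. JupyterLab and newer Notebook versions generally use %matplotlib widget instead,
which requires the ipympl package.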

13. Integration with Pandas

Pandas DataFrames have built-in plot methods powered by Matplotlib.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Apples': [3, 2, 0, 1],
        'Oranges': [0, 3, 7, 2]}
df = pd.DataFrame(data)
df.plot(kind='bar') # One group of bars per row, one bar per column
plt.show()
Data Preprocessing in Python: Step-by-Step

What is Data Preprocessing?

 Data preprocessing is the process of cleaning and transforming raw data into a format
that is suitable for modeling.
 It typically involves:
o Handling missing values
o Encoding categorical variables
o Scaling/normalizing features
o Handling outliers
o Splitting data into train/test sets

Sample Dataset

We’ll create a simple dataset to demonstrate preprocessing:

import pandas as pd
import numpy as np

data = {
    'Age': [25, 30, np.nan, 22, 40, np.nan, 28],
    'Salary': [50000, 60000, 58000, 52000, np.nan, 62000, 58000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Chicago',
             'Los Angeles', np.nan],
    'Purchased': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes']
}

df = pd.DataFrame(data)
print(df)

Step 1: Handling Missing Data

Missing data can cause errors or bias your model.

Check missing values:


print(df.isnull().sum())

Fill missing numeric values with mean or median:


# Assign the result back; fillna(inplace=True) on a selected column is unreliable in recent pandas
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].median())

Fill missing categorical values with mode (most frequent):


df['City'] = df['City'].fillna(df['City'].mode()[0])

Step 2: Encoding Categorical Variables

ML models work with numbers, so convert categories to numeric.

Label Encoding (for binary categories):


df['Purchased'] = df['Purchased'].map({'Yes': 1, 'No': 0})

One-Hot Encoding (for nominal categories with multiple values):


df = pd.get_dummies(df, columns=['City'])
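The effect of both encodings is easiest to see on a small table. The sketch below uses made-up rows; dtype=int is an optional argument chosen here so that the indicator columns print as 0 and 1 rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York', 'Los Angeles', 'Chicago', 'New York'],
                   'Purchased': ['Yes', 'No', 'No', 'Yes']})

# Binary category: map each label to 0 or 1 (labels missing from the dict become NaN)
df['Purchased'] = df['Purchased'].map({'Yes': 1, 'No': 0})

# Nominal category: one indicator column per distinct city
encoded = pd.get_dummies(df, columns=['City'], dtype=int)

print(encoded.columns.tolist())
# ['Purchased', 'City_Chicago', 'City_Los Angeles', 'City_New York']
print(encoded['City_New York'].tolist())   # [1, 0, 0, 1]
```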

Step 3: Feature Scaling

Normalize or standardize features to bring them to a similar scale.

Standardization (mean=0, std=1):


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

Alternatively, Min-Max Scaling (values between 0 and 1):


from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

Step 4: Splitting Dataset into Training and Testing Sets

Separate data for training and evaluation.

from sklearn.model_selection import train_test_split

X = df.drop('Purchased', axis=1)
y = df['Purchased']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print("Training set size:", X_train.shape)
print("Testing set size:", X_test.shape)
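A common refinement, which differs from the step order used in these notes, is to split first and then fit the scaler on the training rows only, so that no information from the test set influences the scaling. The sketch below assumes a small made-up feature matrix rather than the dataset above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative features and labels (not the sample dataset)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean and std from training rows only
X_test_scaled = scaler.transform(X_test)         # reuse those statistics on the test rows

print(X_train_scaled.shape, X_test_scaled.shape)     # (8, 2) (2, 2)
print(np.allclose(X_train_scaled.mean(axis=0), 0))   # True
```

The test rows are scaled with the training mean and standard deviation, so their scaled mean is generally not exactly zero.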
Step 5: Handling Outliers (Optional)

Detect and handle outliers using methods like Z-score or IQR.

# Example: Using IQR method to remove outliers in 'Age'


Q1 = df['Age'].quantile(0.25)
Q3 = df['Age'].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
filtered_df = df[~((df['Age'] < lower) | (df['Age'] > upper))]
print(filtered_df)
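The Z-score method mentioned above can be sketched in a similar way. The ages below are made up, and the threshold of 3 standard deviations is a common rule of thumb rather than a fixed standard.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 25, 27, 28, 30, 31, 29, 26, 24, 23, 28, 27, 95], name='Age')

# Z-score: distance from the mean, measured in standard deviations
z_scores = (ages - ages.mean()) / ages.std(ddof=0)

outliers = ages[np.abs(z_scores) > 3]
cleaned = ages[np.abs(z_scores) <= 3]

print(outliers.tolist())   # [95]
print(len(cleaned))        # 12
```

In very small samples a single extreme value inflates the standard deviation, so Z-scores may never exceed the threshold; the IQR method is less sensitive to this.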

Complete Code (Putting It All Together)


import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Sample data
data = {
    'Age': [25, 30, np.nan, 22, 40, np.nan, 28],
    'Salary': [50000, 60000, 58000, 52000, np.nan, 62000, 58000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Chicago',
             'Los Angeles', np.nan],
    'Purchased': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].median())
df['City'] = df['City'].fillna(df['City'].mode()[0])

# Encode categorical variables


df['Purchased'] = df['Purchased'].map({'Yes': 1, 'No': 0})
df = pd.get_dummies(df, columns=['City'])

# Feature scaling
scaler = StandardScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

# Split data
X = df.drop('Purchased', axis=1)
y = df['Purchased']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print("Preprocessed Training Data:\n", X_train)
print("Training Labels:\n", y_train)

This is the basic flow of preprocessing. Depending on your dataset and problem, you may
want to include feature engineering, handling imbalanced data, or more advanced scaling.
