Artificial Intelligence Techniques (2025) - Week 3

The document outlines key concepts and types of machine learning, including definitions of Artificial Intelligence (AI) and Machine Learning (ML). It discusses the machine learning paradigms of supervised, unsupervised, and reinforcement learning, along with their techniques and objectives, and addresses challenges such as bias and variance in model training, as well as methods for improving model performance, such as regularization and cross-validation.

Artificial Intelligence Techniques (ITAIA2-B22)
Week 3 - Lessons 21, 22, and 23 – Machine Learning

Dr. Ayanda Madyibi
Learning Outcomes
• Describe the key concepts and types of machine learning.
Definition and Scope of AI and ML
• Artificial Intelligence (AI): A multidimensional field encompassing the creation of
intelligent systems capable of mimicking human cognitive functions. AI is a subfield
of Computer Science.

• Machine Learning (ML): A subfield of AI that empowers systems to learn and
improve from data without explicit programming.
Definition and Scope of AI and ML
[Figure: diagram illustrating the relationship between AI and ML]
Forms of Learning
• Component Enhancement: Machine learning can enhance various components of an agent
program, such as mapping conditions to actions, based on available data, feedback, and prior
knowledge.

• Learning from Observation: For instance, a self-driving car agent can learn optimal braking
conditions by observing human drivers and analyzing their responses in different scenarios.

• Recognition Skills: Agents can improve their ability to recognize and infer world properties,
like identifying buses from labeled images, through machine learning techniques.

• Understanding Consequences: Agents can also learn to understand the effects of their
actions, such as the implications of braking on wet roads, through trial and error and feedback
mechanisms.

Machine Learning Types
• A Machine Learning Type is a specific approach within ML that defines how a system learns from data.
These approaches categorize how data is presented to the learning algorithm and what the desired
outcome is. Here's a breakdown of key points:

• Machine Learning encompasses various learning paradigms, each with distinct characteristics:

• Labeled data (Supervised Learning): Data comes with pre-defined categories or target
values.

• Often used for tasks like classification (spam filtering) and regression (sales
forecasting).

• Unlabeled data (Unsupervised Learning): Data lacks predefined categories.

• Valuable for tasks like customer segmentation and anomaly detection.

• Interaction with an environment (Reinforcement Learning): The algorithm learns through trial
and error by interacting with a simulated or real environment.

• Particularly useful for robotics and game-playing AI.
Supervised Learning Example

https://www.youtube.com/watch?v=SrZ6vifbmdc
ML Type, Technique and Objective
• Here's a breakdown of common machine learning types, their techniques, and their objectives:

• Supervised Learning:

• Objective: Predict a target variable based on labeled training data. The data includes both
input features (independent variables) and the desired output (dependent variable).

• Techniques:

• Linear Regression: Predicts a continuous output value based on a linear relationship with
the input features. (Example: Predicting house prices based on size and location)

• Logistic Regression: Classifies data points into two categories (binary classification).
(Example: Spam filtering)

• Decision Trees: Learns a tree-like structure where splits in the tree are based on
features, leading to a final classification or prediction. (Example: Customer churn
prediction)
• Support Vector Machines (SVMs): Find a hyperplane that best separates data points
into distinct classes.
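
To make the supervised setting concrete, here is a minimal sketch of one of the techniques above — a logistic-regression spam filter. It assumes scikit-learn is installed, and the tiny dataset and its two features are invented purely for illustration:

```python
# A minimal supervised-learning sketch: logistic regression for binary
# spam classification. Assumes scikit-learn is installed; the tiny
# dataset and its two features are hypothetical.
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [suspicious-word count, link count]
X = [[8, 5], [1, 0], [6, 3], [0, 1], [7, 4], [2, 0]]
y = [1, 0, 1, 0, 1, 0]  # labels: 1 = spam, 0 = not spam

model = LogisticRegression()
model.fit(X, y)                  # learn from labeled examples
print(model.predict([[5, 2]]))   # classify a new, unseen email
```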
Unsupervised Learning Example

https://www.youtube.com/watch?v=yteYU_QpUxs
ML Type, Technique and Objective
• Here's a breakdown of common machine learning types, their techniques, and their objectives:

• Unsupervised Learning:

• Objective: Discover hidden patterns or structures in unlabeled data (data without predefined
categories).

• Techniques:

• K-Means Clustering: Groups data points into ‘k’ clusters based on similarity. (Example:
Segmenting customers based on purchase behavior)

• Principal Component Analysis (PCA): Reduces the dimensionality of data by identifying
the most significant features. (Example: Simplifying image data for faster processing)

• Anomaly Detection: Identifies data points that deviate significantly from the expected
pattern, potentially indicating fraud or system failures. (Example: Detecting unusual
credit card transactions)
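
As a concrete counterpart, here is a minimal K-Means segmentation sketch, assuming scikit-learn; the purchase figures are invented for illustration:

```python
# A minimal unsupervised-learning sketch: K-Means customer segmentation.
# Assumes scikit-learn; the purchase data is hypothetical.
from sklearn.cluster import KMeans

# Hypothetical features per customer: [annual spend, purchases per year]
X = [[500, 2], [520, 3], [5000, 40], [4800, 35], [100, 1], [4950, 38]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # no labels given; clusters come from similarity
print(labels)                    # two customer segments, e.g. [0 0 1 1 0 1]
```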
Reinforcement Learning Example

https://www.youtube.com/watch?v=RAUmWPpxa2M
ML Type, Technique and Objective
• Here's a breakdown of common machine learning types, their techniques, and their
objectives:

• Reinforcement Learning:

• Objective: Train an agent to interact with an environment, take actions, and learn
through trial and error to maximize rewards.

• Techniques:

• Q-Learning: The agent learns a ‘Q-value’ for each action in a state, indicating
the expected future reward. (Example: Training an AI to play a game by
learning from its successes and failures)

• Deep Reinforcement Learning: Combines reinforcement learning with deep
neural networks for complex tasks. (Example: Training a robot to navigate an
environment)
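
The Q-value idea above reduces to a one-line update rule. The following is a minimal sketch on an invented 3-state, 2-action problem; all numbers are illustrative:

```python
# A minimal sketch of the tabular Q-learning update rule.
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))   # table of Q-values
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

# One observed transition: action a in state s gave reward r and led to s2.
s, a, r, s2 = 0, 1, 5.0, 2
# Core update: nudge Q(s, a) toward r + gamma * max over next-state actions.
Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
print(Q)   # Q[0, 1] is now 0.5
```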
ML Types, Techniques and Examples
[Figure: summary table of ML types, techniques, and example applications]
ML Techniques Videos
 Supervised Learning
o Linear Regression: https://www.youtube.com/watch?v=kVOF8LPGSs8
o Logistic Regression: https://www.youtube.com/watch?v=qiXm6xK363o
o Decision Trees: https://www.youtube.com/watch?v=qiXm6xK363o
o Support Vector Machines (SVMs):
https://www.youtube.com/watch?v=vdtjFEBJKY8
 Unsupervised Learning
o K-Means Clustering: https://www.youtube.com/watch?v=z8uvCejr5Qk
o Principal Component Analysis (PCA):
https://www.youtube.com/watch?v=FyVCyjna97U
 Reinforcement Learning
o Q-Learning: https://www.youtube.com/watch?v=QRMNPCsnSHk
Bias, Variance, and Trade-off in ML
• Bias and Variance in Machine Learning: Supervised machine learning algorithms aim to estimate a
mapping function from input data to output variables, facing challenges like bias and variance errors.
Bias refers to the oversimplification of models, leading to underfitting, while variance indicates model
sensitivity to training data, leading to overfitting.

• Examples of Bias and Variance: High-bias algorithms (e.g., Linear Regression) make strong
assumptions about data form, often failing in complex scenarios. High-variance algorithms (e.g.,
Decision Trees), though excellent on training data, may not generalize well to new data due to
overfitting.

• Trade-off Dynamics: The bias-variance tradeoff is crucial for model accuracy, involving a balance
where increasing one typically decreases the other. For instance, increasing the number of neighbors
in a k-nearest neighbors algorithm increases bias but can reduce variance, affecting model
generalization. https://www.youtube.com/watch?v=HieukflXxR0
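
The k-nearest-neighbors trade-off just described can be seen directly. This is a rough sketch assuming scikit-learn, using a synthetic dataset: small k tends toward low bias / high variance, large k toward the opposite:

```python
# A rough sketch of the k-NN bias-variance trade-off on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 75):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # A large train/test gap suggests variance (overfitting);
    # low scores on both suggest bias (underfitting).
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
```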

Validation Error Curve

• Validation Error Curve: This curve illustrates the relationship between model complexity and
validation error, showing regions of underfitting (high bias) and overfitting (high variance). The
optimal model complexity lies in the middle, minimizing both bias and variance for the best
generalization on unseen data.
Overfitting vs Underfitting
• Overfitting and Underfitting: Machine learning models suffer from overfitting when they capture not
only the underlying patterns but also the noise in the training data, leading to poor generalization on
new data. Underfitting occurs when models are too simplistic, failing to capture essential patterns in
the data, resulting in low accuracy on both training and test datasets.

• Identifying Model Issues: Overfitting is indicated by high accuracy on training data but significantly
lower accuracy on test data, suggesting the model is too complex. Underfitting is evident when a
model performs poorly on both training and test sets, indicating it is too simple and has a high bias.

Overfitting vs Underfitting
[Figure: illustration comparing underfitting, a good fit, and overfitting]
Addressing Bias and Variance Issues
• To strike the right balance and address bias and variance issues, consider the following solutions:

• Regularization: Regularization techniques like L1 (Lasso) and L2 (Ridge) can help mitigate overfitting. These
methods add penalty terms to the model’s cost function, discouraging it from becoming overly complex.

• Feature Engineering: Thoughtful feature selection and engineering can reduce both bias and variance. By
including relevant features and excluding noisy ones, you can improve model performance.

• Cross-Validation: Utilize cross-validation to assess your model’s performance on different subsets of the data.
This helps you gauge how well your model generalizes across various data splits, providing valuable insights
into bias and variance.

• Ensemble Methods: Ensemble techniques such as Random Forests and Gradient Boosting combine multiple
models to achieve better performance. They can effectively reduce overfitting while improving predictive
accuracy.

• Collect More Data: If your model suffers from high variance (overfitting), acquiring more data can help it
generalize better by averaging out noise. Additional data is especially beneficial when training deep neural networks.
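
A minimal sketch combining two of the remedies above — L2 regularization (Ridge) and cross-validation — assuming scikit-learn; the regression data is synthetic:

```python
# Ridge regression evaluated with 5-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)

# alpha is the regularization strength: larger values penalize complexity more.
for alpha in (0.01, 1.0, 100.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(alpha, scores.mean())   # mean R^2 across the 5 folds
```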

Restaurant Example: Supervised Learning
• Step 1: Data Collection

• Suppose we collect data from a restaurant over several days. The dataset might look like this:

Number of Customers Ahead (x)    Wait Time (minutes) (y)
            1                              5
            2                             10
            3                             15
            4                             20
            5                             25

Restaurant Example: Supervised Learning
• Step 2 Define Hypothesis Space

• We hypothesize that the wait time increases linearly with the number of customers ahead. Thus, our
hypothesis function h(x) is:

• h(x) = θ₀ + θ₁x

• Step 3: Model Training

• We use linear regression to find the best parameters θ₀ and θ₁. For simplicity, let's assume θ₀ = 0 (no wait
time if no customers are ahead), so we only need to find θ₁. With θ₀ fixed at 0, the method of least squares
gives θ₁ = Σxᵢyᵢ / Σxᵢ².

• Given our data, the relationship is directly proportional, so θ₁ should ideally be 5 (each additional
customer increases the wait time by 5 minutes).

• Step 4: Evaluation and Bias-Variance Tradeoff

• We evaluate our model using a test set (not shown here for simplicity). If the model predicts accurately across
both training and test datasets, it suggests a good fit. If it performs poorly on the test set, it might be
overfitting; if it performs poorly on both, it might be underfitting.
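
A quick sketch of Step 3's calculation with NumPy, using the table from Step 1:

```python
# Least-squares fit with theta0 fixed at 0: theta1 = sum(x*y) / sum(x*x).
import numpy as np

x = np.array([1, 2, 3, 4, 5])       # customers ahead
y = np.array([5, 10, 15, 20, 25])   # wait time in minutes

theta1 = (x * y).sum() / (x * x).sum()   # 275 / 55 = 5.0
print(theta1)              # 5.0 minutes per extra customer
print(theta1 * 6)          # previewing Step 5: h(6) = 30.0 minutes
```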
Restaurant Example: Supervised Learning
• Step 5: Model Deployment

• Once tested and adjusted, the model can be used in real-time to predict wait times based on the number of
customers.

• Example Calculation:

• If there are 6 customers ahead, the predicted wait time using our model would be

• h(6) = 5 × 6 = 30 minutes

• This simple example demonstrates the basic steps of supervised learning, from
hypothesis formulation through training and testing, to deployment, using a
straightforward linear relationship.

Expressiveness of Decision Trees
• Decision Tree Basics: A decision tree maps attribute values to a decision through a series of tests
from the root to the leaves. Each internal node tests an attribute, branches represent attribute
values, and leaf nodes determine the output, typically used for Boolean classification where outputs
are true or false.

• Logical Representation and Limitations: Decision trees can be equated to logical statements in
disjunctive normal form, where each path through the tree represents a conjunction of conditions
leading to an outcome. While effective for many logical functions and creating interpretable models,
they struggle with complex functions like majority or parity, which require exponentially large trees,
and with real-valued attributes due to their rectangular, axis-aligned decision boundaries.

• Scalability Challenges: The representation capacity of decision trees is limited by the exponential
growth in the number of possible functions as the number of attributes increases. For example, with
20 attributes, the number of possible Boolean functions is about 10^300,000, highlighting
the impracticality of using decision trees for all possible functions.
Learning Decision Trees
• Greedy Approach: The LEARN-DECISION-TREE algorithm uses a greedy divide-and-conquer strategy to
build efficient decision trees by testing the most important attributes first.

• Four Recursive Cases: The algorithm handles four scenarios: all examples are of one class, mixed
examples require splitting on the best attribute, no examples remain (use parent's majority class), or
no attributes remain (use majority class).

• Simplification for Generalization: The resulting tree prioritizes generalization over exact replication,
often producing simpler trees by excluding unnecessary tests and detecting patterns in the data.

• Evaluation via Learning Curves: Decision tree performance is measured using learning curves, which
typically show improved accuracy with larger training sets, potentially reaching 95% or higher with
sufficient data.

Choosing Attribute Tests
• Attribute Selection Based on Importance: The decision tree algorithm selects attributes based on their
importance, measured by information gain, which is calculated using entropy—a measure of
uncertainty in information theory.

• Entropy Explained: Entropy quantifies the unpredictability of a variable's value, with higher entropy
indicating greater uncertainty. For instance, a fair coin has 1 bit of entropy, while a biased coin has
less, approaching zero as it becomes more predictable.

• Entropy in Decision Trees: In decision trees, entropy is used to determine the best attribute for
splitting the data. The attribute that most effectively reduces uncertainty (or entropy) in the dataset
is considered the most important.

• Information Gain Calculation: Information gain is computed by subtracting the entropy of the dataset
after a split (on an attribute) from the entropy before the split. The attribute with the highest
information gain is chosen for the split, guiding the construction of the tree towards clearer and more
decisive partitions.
https://www.youtube.com/watch?v=WaIK3UjYWs0
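
The entropy and information-gain calculations described above can be written directly from their standard definitions; this is a minimal sketch:

```python
# Entropy and information gain for Boolean classification.
import math

def entropy(p):
    """Entropy in bits of a Boolean variable with P(true) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(parent_p, children):
    """children: (weight, p) pairs for each split branch; weights sum to 1."""
    remainder = sum(w * entropy(p) for w, p in children)
    return entropy(parent_p) - remainder

print(entropy(0.5))    # fair coin: 1.0 bit
print(entropy(0.99))   # heavily biased coin: ~0.08 bits
# A 50/50 dataset split into two pure halves gains a full bit:
print(information_gain(0.5, [(0.5, 1.0), (0.5, 0.0)]))   # 1.0
```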
Generalization and Overfitting
• Generalization Over Fitting: Learning algorithms aim to generalize well to new data,
avoiding overfitting, which is more likely with complex models like high-degree
polynomials or large decision trees, especially when the number of attributes is high and
training examples are few.

• Pruning to Combat Overfitting: Decision tree pruning reduces overfitting by removing
nodes that contribute little to predictive accuracy, typically identified through low
information gain or lack of statistical significance, enhancing the tree's ability to handle
noisy data.

• Balancing Complexity and Noise: Pruned trees are generally more effective in noisy
environments, providing simpler, more interpretable, and efficient models. However,
overly aggressive early stopping in tree generation can prevent the discovery of
significant attribute combinations, suggesting a generate-then-prune approach might be
more effective in certain scenarios.
Applicability of Decision Trees
• Handling Missing and Complex Data: Decision trees need to manage complications like
missing data by modifying classification methods and adjusting information-gain
calculations, and they must effectively handle continuous and multivalued attributes
through methods like split point tests or using information-gain ratios.

• Adaptation for Real-World Use: To be practical, decision trees must accommodate real-
world data complexities, including continuous variables and multivalued attributes, with
commercial packages developed to enhance their applicability in diverse domains like
finance and physical sciences.

• Strengths and Limitations: While decision trees are favored for their interpretability and
versatility in handling various data types for both classification and regression, they are
challenged by issues like suboptimal accuracy and instability, which can lead to
significant changes in the tree structure from minor data variations.
Decision Tree Example
[Figure: worked decision tree example]
Python Code – Diabetes Example
• See the VS Code example.
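
The VS Code file itself isn't reproduced here; the following is a plausible minimal sketch of a diabetes decision-tree classifier, assuming scikit-learn and pandas and a CSV with a binary 'Outcome' column (as in the common Pima Indians diabetes dataset). The file name and column names are assumptions:

```python
# Hypothetical diabetes decision-tree sketch (file and columns assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("diabetes.csv")            # hypothetical file name
X = df.drop(columns="Outcome")              # feature columns
y = df["Outcome"]                           # 1 = diabetic, 0 = not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3)  # shallow tree to limit overfitting
tree.fit(X_tr, y_tr)
print(tree.score(X_te, y_te))               # accuracy on held-out data
```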

Model Selection and Optimization
• Our goal in machine learning is to select a hypothesis that best predicts future
data, assuming examples are independent and identically distributed (i.i.d.).

• To avoid bias, we use separate datasets: training for model creation, validation
for model selection, and test for final evaluation; k-fold cross-validation helps
when data is limited.

• Model selection chooses the best hypothesis space, while optimization finds the
best hypothesis within it, typically using validation error to guide the process
before final testing.

Model Selection
• Model selection involves increasing model complexity and choosing the version
with the lowest validation error, balancing underfitting and overfitting; for some
models like decision trees, this results in a U-shaped validation error curve.

• Different models handle increased complexity differently: decision trees often


overfit as they grow, while models like deep neural networks may keep
improving with more capacity.

• Model selection can compare different model types and optimize multiple
hyperparameters using advanced methods like grid search, beyond simple linear
search.
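
A minimal grid-search sketch for the hyperparameter tuning just mentioned, assuming scikit-learn; the model and parameter grid are illustrative:

```python
# Exhaustive grid search over decision-tree hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)              # tries every combination with 5-fold CV
print(search.best_params_)    # the settings with the lowest validation error
```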

Minimizing Loss in Machine Learning
• Minimizing error rate is not always sufficient; the severity of different types of
errors matters, so loss functions are used to assign higher penalties to more
serious mistakes.

• Various loss functions, like L1, L2, and 0/1 loss, help quantify prediction errors,
and the goal is to minimize the expected loss across all possible cases.

• Since the true data distribution is usually unknown, machine learning models
minimize empirical loss calculated from available examples to estimate the best
hypothesis.
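
A minimal sketch of the three loss functions named above, computed per prediction from their standard definitions:

```python
# L1, L2, and 0/1 loss for a single prediction.
def l1_loss(y_true, y_pred):
    return abs(y_true - y_pred)           # absolute error

def l2_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2         # squared error; punishes big misses more

def zero_one_loss(y_true, y_pred):
    return 0 if y_true == y_pred else 1   # right or wrong, nothing in between

print(l1_loss(30, 25), l2_loss(30, 25), zero_one_loss(1, 0))   # 5 25 1
```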

Challenges in Estimating the Best Hypothesis
• The estimated best hypothesis can differ from the true function due to factors
like unrealizability (true function not in hypothesis space), variance (different
data samples yield different models), noise (random data fluctuations), and
computational complexity (difficulty searching the entire space).

• In small-scale learning, generalization loss is mainly due to limited data and
imperfect hypothesis spaces, while in large-scale learning, computational limits
often prevent finding the optimal hypothesis even with abundant data.

Regularization
• Regularization simplifies models by penalizing complexity, aiming to minimize a
combination of empirical loss and a complexity measure, balanced by a
hyperparameter λ (written out after this list).

• Different regularization functions are used for different models; for example,
polynomials often use the sum of squared coefficients, and feature selection
removes irrelevant attributes to reduce complexity.

• The minimum description length (MDL) approach selects the hypothesis that
minimizes the total encoding length of the model and data, though its
effectiveness can depend on encoding choices, especially in small problems.
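
Written out in the standard AIMA-style notation (ĥ is the selected hypothesis and H the hypothesis space), the objective from the first bullet is:

```latex
\hat{h} = \operatorname*{argmin}_{h \in \mathcal{H}} \mathrm{Cost}(h),
\qquad
\mathrm{Cost}(h) = \mathrm{EmpLoss}(h) + \lambda\,\mathrm{Complexity}(h)
```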

Optimization
• Hyperparameter tuning finds the best model settings using methods like hand-
tuning, grid search, random search, Bayesian optimization, and population-based
training.

• Computational learning theory, including PAC learning, studies how many
examples are needed for a model to generalize well and how to avoid overfitting.

• The number of training examples required for good generalization depends on
the size of the hypothesis space and the desired accuracy.

• Restricting the hypothesis space with prior knowledge, simpler models, or
learnable subsets helps improve generalization and search efficiency.

Optimization (continued)
• PAC learning can be applied to decision lists, where limiting test size allows learning with
a reasonable number of examples.

• The DECISION-LIST-LEARNING algorithm builds consistent decision lists by finding tests
that match subsets of the training data, repeating until all examples are covered.

• Linear regression fits data with a line (or hyperplane), using methods like gradient
descent and stochastic gradient descent for optimization, and extends to multiple
features.

• Nonparametric models like k-nearest-neighbors use all data points for predictions, with
efficiency improved by structures like k-d trees and LSH, but face challenges in high
dimensions.

• Boosting combines weak models to improve accuracy, while developing machine learning
systems involves careful problem formulation, data management, and feature engineering.
Thank You
