CP4252 Machine Learning (L T P C: 3 0 2 4)
UNIT I INTRODUCTION AND MATHEMATICAL FOUNDATIONS
What is Machine Learning? Need –History – Definitions – Applications - Advantages, Disadvantages &
Challenges -Types of Machine Learning Problems – Mathematical Foundations - Linear Algebra &
Analytical Geometry -Probability and Statistics- Bayesian Conditional Probability -Vector Calculus &
Optimization - Decision Theory - Information theory
UNIT II SUPERVISED LEARNING
Introduction-Discriminative and Generative Models -Linear Regression - Least Squares -Under-fitting /
Overfitting -Cross-Validation – Lasso Regression- Classification - Logistic Regression- Gradient Linear
Models -Support Vector Machines –Kernel Methods -Instance based Methods - K-Nearest Neighbors -
Tree based Methods –Decision Trees –ID3 – CART - Ensemble Methods –Random Forest - Evaluation of
Classification Algorithms
UNIT III UNSUPERVISED LEARNING AND REINFORCEMENT LEARNING
Introduction - Clustering Algorithms -K – Means – Hierarchical Clustering - Cluster Validity - Dimensionality
Reduction –Principal Component Analysis – Recommendation Systems – EM algorithm. Reinforcement
Learning – Elements -Model based Learning – Temporal Difference Learning
UNIT IV PROBABILISTIC METHODS FOR LEARNING
Introduction -Naïve Bayes Algorithm -Maximum Likelihood -Maximum A Posteriori -Bayesian Belief Networks
-Probabilistic Modelling of Problems -Inference in Bayesian Belief Networks – Probability Density
Estimation - Sequence Models – Markov Models – Hidden Markov Models
UNIT V NEURAL NETWORKS AND DEEP LEARNING
Neural Networks – Biological Motivation- Perceptron – Multi-layer Perceptron – Feed Forward Network
– Back Propagation-Activation and Loss Functions- Limitations of Machine Learning – Deep Learning–
Convolution Neural Networks – Recurrent Neural Networks – Use cases
UNIT – I
Introduction and mathematical foundations
1.1 What is machine learning? Need – History and Definitions - Applications
✓ Machine Learning (ML) is a branch of artificial intelligence (AI) that allows computers to learn
from data and improve over time without being explicitly programmed.
✓ It focuses on the development of algorithms that enable machines to identify patterns, make
decisions, and predict outcomes based on historical data.
Need for Machine Learning
1. Traditional Programming Limitations
Traditional programming requires explicitly coding every rule and decision, which becomes impossible
for complex, dynamic, and large-scale problems.
ML Solution:
• Learns patterns and rules from data automatically.
• Eliminates the need for manual coding in complex scenarios.
Example:
Instead of coding rules to detect spam emails, an ML model can learn from past email data to recognize
spam accurately.
2. Processing Big Data Efficiently
With the rise of the internet, IoT, and digital platforms, vast amounts of data are generated every
second. Analyzing this manually is impractical.
ML Solution:
• Quickly analyzes massive datasets.
• Extracts insights and patterns in real-time.
• Enables big data applications like recommendation systems and fraud detection.
Example:
Netflix and Amazon use ML to process huge volumes of user data to deliver personalized
recommendations.
3. Dynamic Adaptation & Continuous Improvement
Hard-coded solutions are static and can’t adapt to changing environments.
ML Solution:
• Adapts to new patterns in data automatically.
• Learns and improves over time with more data.
• Supports self-learning and autonomous decision-making.
Example:
Self-driving cars continuously learn from real-world scenarios to drive safely.
4. Real-Time Decision Making
Many applications require immediate responses, such as fraud detection and autonomous driving.
ML Solution:
• Provides real-time predictions and decisions.
• Efficiently handles streaming data.
Example:
Credit card companies use ML for instant fraud detection during transactions.
5. Solving Complex and Unpredictable Problems
Some problems are too complex for rule-based programming, such as understanding natural language
or recognizing images.
ML Solution:
• Learns from examples instead of relying on fixed rules.
• Can handle unpredictable and unstructured data.
Example:
• Image recognition software can identify objects in pictures.
• Voice assistants like Siri and Alexa use NLP to understand and respond to commands.
6. Personalization and User Experience
Users expect personalized experiences, which require understanding individual preferences.
ML Solution:
• Learns user behaviors and preferences.
• Offers tailored recommendations and experiences.
Example:
Spotify suggests music based on individual listening habits.
7. Automation of Repetitive Tasks
Manual processing is inefficient for repetitive, data-intensive tasks.
ML Solution:
• Automates tasks like document processing, customer support, and anomaly detection.
Example:
Chatbots automate customer service, answering common queries instantly.
8. Making Data-Driven Decisions
Businesses need actionable insights from data to stay competitive.
ML Solution:
• Analyzes historical data for predictive insights.
• Optimizes decision-making processes.
Example:
E-commerce companies predict sales, optimize pricing, and manage inventory using ML.
9. Managing Uncertainty and Variability
Many real-world problems involve uncertainty, requiring probabilistic reasoning.
ML Solution:
• Provides probabilistic predictions for uncertain situations.
Example:
Weather forecasting uses ML to predict weather patterns despite uncertainties.
10. Enabling Emerging Technologies
ML powers the latest innovations and tech advancements.
ML Solution:
• Forms the backbone of AI applications.
• Drives advances in robotics, computer vision, NLP, and autonomous systems.
Example:
• AI-driven chatbots and virtual assistants.
• Autonomous drones and vehicles.
History of Machine Learning:
The evolution of machine learning (ML) has been a fascinating journey, transforming from a theoretical
concept to a core technology driving modern AI applications. Here's a brief overview of its key milestones:
1. Early Foundations (1950s-1970s)
• 1950s: Alan Turing proposes the idea of machines that can learn (Turing Test, 1950). Arthur Samuel
develops a self-learning checkers program, coining the term "machine learning" (1959).
• 1960s: Early neural networks like the Perceptron are developed by Frank Rosenblatt, but
limitations in handling complex patterns stall progress.
• 1970s: Interest wanes due to the "AI Winter" as limitations in hardware and algorithms prevent
practical applications.
2. Rise of Statistical Learning (1980s-1990s)
• 1980s: Decision tree algorithms and reinforcement learning are developed. Neural networks
regain interest due to backpropagation.
• 1990s: Support vector machines (SVMs), statistical learning theory, and probabilistic models
(e.g., Hidden Markov Models, Bayesian Networks) become popular for speech recognition and
computer vision.
3. Big Data & The Internet Era (2000s)
• 2000s: Increased computational power, availability of large datasets, and cloud computing lead to
significant progress. Algorithms like Random Forests, Gradient Boosting, and ensemble methods
emerge.
4. Deep Learning Revolution (2010s)
• 2010s: Neural networks with many layers (deep learning) achieve breakthroughs in image
recognition (AlexNet, 2012), speech recognition, and natural language processing (NLP).
• 2014: GANs (Generative Adversarial Networks) introduced by Ian Goodfellow.
• 2017: The Transformer model revolutionizes NLP, leading to models like BERT and GPT.
• 2018-2019: Transfer learning and pre-trained models gain popularity.
5. The AI & ML Boom (2020s-Present)
• 2020s: Large language models (LLMs) like ChatGPT, GPT-3, and GPT-4 redefine NLP. AI tools
become mainstream with applications in healthcare, finance, and entertainment.
• Ongoing: Advances in multimodal models (handling text, images, and more), reinforcement
learning, and ethical AI practices.
6. The Future of Machine Learning
• Continuous improvement in AI autonomy, interpretability, and ethical practices.
• Potential for artificial general intelligence (AGI).
• Greater integration of AI in daily life—smart assistants, autonomous vehicles, and more.
Applications of Machine Learning:
1. Healthcare:
o Disease Prediction: ML models predict diseases based on symptoms, medical history,
and genetic data (e.g., cancer detection).
o Personalized Medicine: ML helps in designing customized treatment plans based on
patient data.
2. Finance:
o Fraud Detection: Detecting fraudulent activities through anomaly detection.
o Algorithmic Trading: Using ML algorithms to make high-frequency trading decisions
based on historical data.
3. E-commerce:
o Recommendation Systems: ML algorithms suggest products based on past browsing
history or purchase behavior (e.g., Amazon, Netflix).
o Customer Segmentation: Classifying customers based on their buying patterns for
targeted marketing.
4. Self-driving Cars:
o Autonomous Vehicles: ML is used for image recognition, route optimization, and
decision-making in autonomous vehicles.
5. Natural Language Processing (NLP):
o Speech Recognition: Converting spoken language into text (e.g., Google Assistant, Siri).
o Language Translation: Automatically translating text or speech between languages
(e.g., Google Translate).
6. Robotics:
o Robot Learning: Robots use ML to perform tasks like picking up objects or navigating
through unknown environments.
7. Image and Video Analysis:
o Facial Recognition: Identifying individuals in images or videos (e.g., Facebook, security
systems).
o Object Detection: Used in various fields like security, manufacturing, and autonomous
vehicles.
Advantages of Machine Learning:
1. Automation of Decision-Making: ML allows systems to make decisions and take actions
without human intervention.
2. Efficiency and Scalability: ML systems can process large datasets quickly, identifying patterns
and insights that would be impossible for humans to detect.
3. Personalization: ML can tailor experiences, services, or products to individual preferences.
4. Improvement Over Time: ML algorithms continuously improve with more data, becoming more
accurate and effective.
5. Cost Savings: ML can automate repetitive tasks, reducing the need for human resources and
cutting operational costs.
Disadvantages of Machine Learning:
1. Data Dependency: ML models require large amounts of quality data to train, and poor data
quality can lead to inaccurate predictions.
2. Complexity: Developing and fine-tuning ML models can be resource-intensive and require
specialized knowledge.
3. Interpretability: Many advanced ML models, like deep learning, are often seen as "black
boxes," making it difficult to understand how decisions are made.
4. Bias in Data: If the training data contains biases, ML models may learn and perpetuate these
biases, leading to unfair or discriminatory outcomes.
5. Overfitting and Underfitting: ML models can overfit (become too specialized to training data)
or underfit (fail to capture the underlying patterns in the data), leading to poor generalization.
Challenges in Machine Learning:
1. Data Privacy and Security: ML models can raise concerns about the privacy of personal data,
especially in sensitive domains like healthcare and finance.
2. Data Quality and Availability: The accuracy of ML models heavily depends on the quality and
availability of large datasets.
3. Interpretability and Transparency: There’s a growing need for interpretable models that
provide transparency into how decisions are made, especially in high-stakes areas like
healthcare and law.
4. Scalability: Some ML models may struggle to scale efficiently as data grows in size and
complexity.
5. Computational Resources: Training large ML models, particularly deep learning models, can
require significant computational power and energy, making training resource-intensive.
1.2 Types of Machine Learning
1. Supervised Learning
Supervised learning is a type of machine learning where the model is trained on labeled data. In this
case, the input data comes with the correct output (label). The model learns to map inputs to outputs
and is then able to predict the output for new, unseen data.
Examples:
• Classification: Predicting categories (e.g., spam or not spam in emails).
• Regression: Predicting numbers (e.g., predicting house prices based on features like size).
2. Unsupervised Learning
Unsupervised learning is a type of machine learning where the model is trained on unlabeled data.
The algorithm tries to find hidden patterns or structures in the data without any pre-defined labels
or answers.
Examples:
• Clustering: Grouping similar things together (e.g., customer segmentation).
• Dimensionality Reduction: Reducing data to fewer features while keeping important info
(e.g., simplifying large datasets).
3. Semi-supervised Learning
Semi-supervised learning is a type of machine learning where the model is trained on a small amount
of labeled data and a large amount of unlabeled data. The model uses the labeled data to learn
patterns and applies that knowledge to unlabeled data.
Examples:
• Image classification with only a few labeled images but many unlabeled images.
4. Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent learns by interacting with an
environment. The agent takes actions and receives feedback in the form of rewards or penalties,
with the goal of maximizing cumulative rewards over time.
Examples:
• Games: Teaching an AI to play chess or Go.
• Self-driving cars: Learning to drive safely by interacting with the environment.
5. Self-supervised Learning
Self-supervised learning is a type of unsupervised learning where the model generates its own
labels from the input data and learns to predict those labels. The model learns by predicting part
of the data from other parts of the same data.
Examples:
• Text Prediction: Predicting the next word in a sentence (e.g., GPT-3).
• Image Completion: Filling in missing parts of an image.
6. Transfer Learning
Transfer learning is a machine learning technique where a model trained on one task is reused
for a related task. The model is fine-tuned for the new task, often saving time and resources.
Examples:
• Image recognition: Using a pre-trained model on general images and fine-tuning it for specific
tasks like detecting cancer in medical images.
• Natural Language Processing (NLP): Using a pre-trained model for tasks like translating
languages.
1.3 Mathematical foundations for Machine Learning:
1.3.1 Linear Algebra
• Linear Algebra is a foundational mathematical discipline for machine learning (ML).
• It provides the tools for representing and manipulating data in high-dimensional
spaces, which is essential in many ML algorithms.
1. Vectors
• A vector is a list of numbers, and it is one of the most important data structures in
machine learning.
• In ML, vectors are often used to represent data points. For instance, a feature vector
x=[x1,x2,...,xn] represents an input sample with n features.
• Operations:
o Addition: v+w=[v1+w1,v2+w2,...,vn+wn]
o Scalar multiplication: α⋅v=[α⋅v1,α⋅v2,...,α⋅vn]
o Dot Product: v⋅w=v1w1+v2w2+...+vnwn
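The three vector operations above can be tried out with NumPy. This is a minimal sketch; the vectors v and w and the scalar alpha are hypothetical values chosen only for illustration.

```python
import numpy as np

# Two hypothetical 3-dimensional vectors (values chosen only for illustration).
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
alpha = 2.0

vec_sum = v + w              # addition: component by component
scaled = alpha * v           # scalar multiplication: every component times alpha
dot = float(np.dot(v, w))    # dot product: 1*4 + 2*5 + 3*6

print(vec_sum, scaled, dot)
```

Addition and scalar multiplication return vectors of the same length, while the dot product returns a single number.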
2. Matrices
• A matrix is a 2D array of numbers. A matrix is used to represent data or transformations
in machine learning.
• In ML, matrices represent datasets, where each row corresponds to an observation and
each column represents a feature. For example, a matrix A with dimensions m×n could
represent a dataset with m samples and n features.
• Operations:
o Matrix multiplication: If A is of shape m×n and B is of shape n×p, the product AB
is a matrix of shape m×p.
o Transpose: The transpose of a matrix A (denoted A^T) is the matrix obtained by
swapping rows and columns.
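The shape rule for matrix multiplication and the effect of the transpose can be checked with a short NumPy sketch. The matrices A and B below are hypothetical and only serve to show the shapes.

```python
import numpy as np

# Hypothetical matrices: A is 2x3 and B is 3x2, so the product AB is 2x2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

AB = A @ B     # matrix multiplication; inner dimensions (3 and 3) must match
A_T = A.T      # transpose: rows become columns, shape goes from (2, 3) to (3, 2)

print(AB.shape, A_T.shape)
print(AB)
```

Trying B @ A instead gives a 3x3 matrix, which shows that matrix multiplication is not commutative in general.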
3. Eigenvectors and Eigenvalues
• For a square matrix A, an eigenvector v and eigenvalue λ satisfy the equation:
Av=λv
• In ML, eigenvectors and eigenvalues are key in dimensionality reduction, like PCA, where
they identify the directions of maximum variance in the data.
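The defining equation Av=λv can be verified numerically. The sketch below assumes a small symmetric matrix (a hypothetical example, not taken from these notes) and uses NumPy's eigh routine, which is intended for symmetric matrices.

```python
import numpy as np

# Hypothetical symmetric 2x2 matrix, chosen so the eigenvalues are easy to check by hand.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order;
# column i of `eigenvectors` belongs to eigenvalues[i].
eigenvalues, eigenvectors = np.linalg.eigh(A)

for i, lam in enumerate(eigenvalues):
    v = eigenvectors[:, i]
    # A v and lambda v should agree up to floating-point error.
    print(lam, np.allclose(A @ v, lam * v))
```

For a general (non-symmetric) square matrix, np.linalg.eig would be the usual choice, and its eigenvalues may be complex.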
4. Linear Transformations
• A linear transformation is a function that takes a vector and returns another vector, often
by applying a matrix to the vector.
• Linear transformations are used in algorithms like linear regression and neural networks.
5. Singular Value Decomposition (SVD)
• SVD is a factorization of a matrix A into three matrices U, Σ, and V^T such that:
A = UΣV^T
• In ML, SVD is used in matrix factorization methods, which are key in techniques like
recommendation systems and topic modeling (e.g., Latent Semantic Analysis).
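As a rough illustration of the factorization, the sketch below decomposes a small hypothetical matrix with NumPy, rebuilds it from the three factors, and forms a rank-1 approximation of the kind used in matrix factorization methods.

```python
import numpy as np

# Hypothetical 2x3 matrix used only for illustration.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# s holds the singular values (the diagonal of Sigma) in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
A_rebuilt = U @ np.diag(s) @ Vt

# Keeping only the largest singular value gives a rank-1 approximation of A.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])

print(np.allclose(A, A_rebuilt))
```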
6. Norms
• A norm is a function that assigns a non-negative length or size to a vector; only the zero vector has norm 0.
• Norms are often used to measure the distance between points (e.g., Euclidean distance,
L2 norm).
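A short sketch of common norms, assuming a hypothetical vector v. The L1 and maximum norms are included alongside the L2 norm for comparison; they are standard but not discussed in these notes.

```python
import numpy as np

v = np.array([3.0, -4.0])              # hypothetical vector

l2 = np.linalg.norm(v)                 # Euclidean (L2) norm: sqrt(3^2 + 4^2)
l1 = np.linalg.norm(v, 1)              # L1 norm: |3| + |-4|
l_inf = np.linalg.norm(v, np.inf)      # maximum norm: largest absolute component

print(l2, l1, l_inf)
```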
Analytic Geometry (Coordinate Geometry)
• Analytic geometry (coordinate geometry) refers to the use of geometric concepts, such as
points, vectors, lines, hyperplanes, and distances, to represent and manipulate data in machine
learning algorithms.
• It allows for the visualization and understanding of data structures in multi-dimensional spaces,
helping in tasks like classification, regression, dimensionality reduction, and clustering.
1. Coordinate Systems and Vectors
• Vectors represent data points in multi-dimensional space. Each data sample is a vector with
features as components (e.g., [x1, x2, ..., xn]).
• In ML, vectors represent datasets in high-dimensional spaces, enabling algorithms to perform
tasks like classification and regression.
2. Distance and Similarity Measures
• Euclidean Distance: Measures the straight-line distance between two points (e.g., used in
K-Nearest Neighbors (KNN) and clustering).
• Cosine Similarity: Measures how similar two vectors are, often used in text mining (e.g.,
document similarity).
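The two measures can be compared on the same pair of vectors. In this sketch the vectors a and b are hypothetical; b is a scaled copy of a, so the two are far apart in Euclidean distance yet point in exactly the same direction.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # hypothetical feature vectors
b = np.array([2.0, 4.0, 6.0])   # b = 2 * a

# Euclidean distance: length of the difference vector.
euclidean = np.linalg.norm(a - b)

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, cosine)
```

This is one reason cosine similarity is popular for text: a long document and a short one on the same topic can have very different word counts but a similar direction.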
3. Hyperplanes and Linear Separability
• Hyperplanes: Flat subspaces with one dimension fewer than the surrounding space (e.g., a line
in 2D, a plane in 3D) used to separate data points into different categories.
• Linear separability refers to data that can be perfectly divided by a hyperplane.
4. Planes and Curves
• Planes in 3D are defined by equations representing linear relationships
between features.
• In linear regression, a best-fit plane (or hyperplane) is used to model the relationship between
variables.
5. Projections and Dimensionality Reduction
• Projection: Mapping high-dimensional data onto lower-dimensional spaces while preserving key
information.
• In ML, Principal Component Analysis (PCA) projects data onto principal components to reduce
dimensionality, improving computational efficiency.
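The projection step of PCA can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the data is synthetic (three features, the third almost a copy of the first), and the principal components are taken as eigenvectors of the covariance matrix rather than computed with a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features; feature 3 is nearly a copy of feature 1.
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x1, x2, x1 + 0.01 * rng.normal(size=200)])

# 1. Center the data so each feature has mean zero.
X_centered = X - X.mean(axis=0)

# 2. Eigen-decompose the covariance matrix (eigh returns ascending eigenvalues).
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Keep the two directions with the largest variance.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

# 4. Project the data onto those directions: 3 features become 2.
X_reduced = X_centered @ components

explained = eigvals[order[:2]].sum() / eigvals.sum()
print(X_reduced.shape, explained)
```

Because one feature is redundant here, two components retain almost all of the variance; on real data the retained fraction depends on how correlated the features are.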
6. Linear and Affine Transformations
• Linear Transformations involve matrix multiplication to map data from one space to another,
fundamental in many algorithms, like Neural Networks.
• Affine Transformations include both linear transformations and translations, useful in tasks like
image transformations.
7. Angles and Orthogonality
• Orthogonality means vectors are perpendicular (dot product = 0), and is key in techniques like
PCA, where principal components are orthogonal.
1.3.2 Probability in machine learning:
• Probability is the bedrock of ML; it quantifies how likely an event is to occur.
• The value of a probability always lies between 0 and 1.
• It is a core concept and a primary prerequisite for understanding ML models
and their applications.
• When all outcomes are equally likely, the probability of an event is the number of
favorable outcomes divided by the total number of possible outcomes.
• For example, when a fair coin is tossed, the probability of getting a head is:
P(H) = Number of ways a head can occur / Total number of possible outcomes
P(H) = 1/2
P(H) = 0.5
where P(H) is the probability of getting a head when tossing a coin.
Types of Probability
1. Empirical Probability (Experimental Probability)
Empirical probability is the probability based on observed data or experimentation. It is determined by
performing experiments or gathering data and calculating the frequency of an event.
2. Joint Probability
Joint probability refers to the probability of two (or more) events occurring simultaneously. It can be
calculated for both independent and dependent events.
3. Conditional Probability
Conditional probability is the probability of an event occurring given that another event has already
occurred. It reflects how the probability of one event changes based on the knowledge of another
event.
4. Theoretical Probability
Theoretical probability is the probability of an event occurring based on mathematical reasoning or
assumptions, assuming all outcomes are equally likely.
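The four types of probability can be illustrated with two fair dice. In this sketch the events A (first die is even) and B (the sum is 8) are hypothetical choices; theoretical, joint, and conditional probabilities are computed by enumerating all 36 outcomes, and the empirical probability comes from a seeded simulation.

```python
from fractions import Fraction
from itertools import product
import random

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Theoretical probability: favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o):
    return o[0] % 2 == 0        # first die shows an even number

def B(o):
    return o[0] + o[1] == 8     # the two dice sum to 8

p_A = prob(A)                                  # theoretical probability of A
p_joint = prob(lambda o: A(o) and B(o))        # joint probability of A and B
p_A_given_B = p_joint / prob(B)                # conditional probability of A given B

# Empirical probability of A: relative frequency over simulated rolls.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(10_000)]
p_A_empirical = sum(r % 2 == 0 for r in rolls) / len(rolls)

print(p_A, p_joint, p_A_given_B, p_A_empirical)
```

The empirical value only approximates the theoretical one; the gap tends to shrink as the number of simulated rolls grows.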
Statistics in Machine Learning
• Statistics in Machine Learning refers to the application of statistical methods to analyze,
interpret, and model data.
• It helps in data preprocessing, model selection, evaluation, and prediction by using techniques
like hypothesis testing, probability theory, and regression analysis.
• Statistics provides tools to understand data distributions, assess uncertainty, and optimize
models for better performance.
Types
o Descriptive Statistics
o Inferential Statistics
Descriptive Statistics
• Descriptive statistics summarize and describe the main features of a data set using numerical
and graphical methods.
• Summarization of Data: It involves calculating measures such as mean, median, and mode to
describe the central tendency of a dataset.
• Data Visualization: It uses graphs like histograms, bar charts, and box plots to visually represent
the data distribution and patterns.
It is divided into two categories:
1. Measure of Central Tendency:
• Mean: The arithmetic average of a set of values.
• Median: The middle value when data is ordered. If the number of observations is odd, it is the
middle value; if even, it’s the average of the two middle values.
• Mode: The most frequent value in a dataset.
Mode = Term with Highest Frequency
2. Measure of Variability:
• Range: Difference between the maximum and minimum values.
Range = Maximum value – Minimum value
• Variance: Measures the spread of data, calculated as the average of the squared deviations from
the mean.
• Standard Deviation: The square root of variance, representing the spread of data points.
• Interquartile Range (IQR): The range between the first (Q1) and third quartiles (Q3),
representing the middle 50% of the data.
IQR=Q3−Q1.
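The measures above can be computed with Python's statistics module and NumPy. The dataset is hypothetical. Two assumptions are worth noting: the variance is the population variance (the average of squared deviations, as defined above), and the quartiles use NumPy's default linear interpolation, so other quartile conventions may give a slightly different IQR.

```python
import statistics
import numpy as np

data = [2, 4, 4, 5, 7, 9, 11]    # hypothetical observations

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability
data_range = max(data) - min(data)
variance = statistics.pvariance(data)    # average squared deviation from the mean
std_dev = statistics.pstdev(data)        # square root of the variance
q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
iqr = q3 - q1

print(mean, median, mode, data_range, variance, std_dev, iqr)
```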
Inferential Statistics
• Inferential statistics uses sample data to make inferences about a population.
• It helps in hypothesis testing, confidence intervals, and regression analysis to draw
conclusions.
• Making Predictions: It uses sample data to make generalizations or predictions about a larger
population.
• Assessing Significance: It uses hypothesis tests and confidence intervals to judge whether
sample results are statistically significant before drawing conclusions.
Key concepts include:
1. Hypothesis Testing: Involves evaluating the null hypothesis (H0) to determine if the sample
data significantly differs from what is expected under H0.
2. Confidence Interval: A range of values used to estimate a population parameter with a
certain level of confidence.
3. Correlation Coefficient: Measures the strength and direction of the relationship between
two variables, with Pearson's correlation coefficient (r) ranging from -1 to 1.
4. T-test and ANOVA: Used to compare means between two groups or more than two groups,
respectively.
5. Regression: Models the relationship between dependent and independent variables to
make predictions.
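Two of the concepts above, the correlation coefficient and regression, can be sketched on a small hypothetical sample. Pearson's r is computed from its definition and compared against NumPy's built-in result; the regression line is a least-squares fit of degree 1.

```python
import numpy as np

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson's r: covariance of x and y divided by the product of their spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
r_numpy = np.corrcoef(x, y)[0, 1]

# Simple linear regression y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)

print(r_manual, r_numpy, slope, intercept)
```

A positive r together with a positive slope indicates that y tends to increase with x in this sample.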
1.3.3 Conditional probability and Bayesian theorem:
1. Conditional Probability
• Conditional probability helps to calculate the likelihood of an event (Event A) occurring, given that
another event (Event B) has already happened.
• It describes how the occurrence of one event can influence the probability of another event.
Examples of Conditional Probability:
• Drawing a second ace from a deck of cards, given that the first card drawn was an ace.
• Finding the probability of having a disease, given a positive test result.
• Determining the probability of someone liking Harry Potter, given that they enjoy fiction.
Mathematical Representation of Conditional Probability:
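If Event A is the event whose probability we want, and Event B is the known condition, the
conditional probability is represented as:
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0
where P(A ∩ B) is the joint probability that both A and B occur.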
Bayes’ Theorem
• Bayes’ Theorem is a powerful tool for calculating conditional probabilities based on prior
knowledge.
• It provides a way to update the probability of an event based on new evidence.
• This is particularly useful in decision-making scenarios and classification problems.
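Components of Bayes’ Theorem
P(A | B) = P(B | A) · P(A) / P(B)
• P(A | B): Posterior, the probability of A after observing the evidence B.
• P(B | A): Likelihood, the probability of observing B if A is true.
• P(A): Prior, the probability of A before seeing the evidence.
• P(B): Evidence, the overall probability of observing B (must be greater than 0).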
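Bayes' theorem, P(A | B) = P(B | A) · P(A) / P(B), can be applied to the disease-testing example mentioned earlier. The numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and chosen only to show how the update works.

```python
# Hypothetical inputs for a diagnostic test.
p_disease = 0.01                 # prior: P(disease)
p_pos_given_disease = 0.95       # likelihood: P(positive | disease)
p_pos_given_healthy = 0.05       # false-positive rate: P(positive | no disease)

# Evidence: total probability of a positive result.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1.0 - p_disease))

# Posterior: P(disease | positive) by Bayes' theorem.
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(round(p_disease_given_pos, 3))
```

Under these assumed numbers the posterior is only about 16%, because the disease is rare and false positives from the large healthy group outnumber true positives.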
1.3.4 Vector calculus and optimization:
Vector
• A vector is a mathematical object that encodes a length and direction.
• A vector is often represented as a 1−dimensional array of numbers, referred to as
components and displayed either in column form or row form.
• Vectors are commonly used in machine learning as they lend a convenient way to
organize data.
Scalar vs. Vector:
• Scalar: A single value representing magnitude (e.g., temperature, mass).
Example: 30°C, 5 kg.
• Vector: An array of values representing magnitude and direction.
Example: A 2D vector: v = [v1, v2], for instance v = [3, 4].
• n-Dimensional Vectors: In machine learning, data points with multiple features are
often represented as n-dimensional vectors.
Example: A vector of 4 features: x = [x1, x2, x3, x4]
Example: Support vector machine (SVM)
A support vector machine (SVM) analyzes vectors in an n-dimensional space to find the
optimal hyperplane that maximizes the margin, that is, the distance between the hyperplane
and the nearest points of each class. This separation improves classification confidence
for future data points.
Multivariate Calculus in Machine Learning
• Multivariate calculus extends calculus concepts to functions with multiple variables,
essential for understanding machine learning algorithms.
Partial Derivatives
• When a function has multiple inputs, we use partial derivatives to measure how the
function changes with respect to one variable, holding the others constant.
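Given a function f(x1, x2, ..., xn), the partial derivative with respect to xi is written ∂f/∂xi.
Gradient Vector
• The gradient collects all partial derivatives into one vector:
∇f = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]
• It points in the direction in which f increases fastest.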
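Partial derivatives and the gradient vector can be approximated numerically by nudging one variable at a time while holding the others constant. The sketch below assumes a hypothetical function f(x, y) = x² + 3xy, whose exact gradient is [2x + 3y, 3x], and uses central differences.

```python
import numpy as np

def f(p):
    """Hypothetical two-variable function f(x, y) = x^2 + 3xy."""
    x, y = p
    return x**2 + 3.0 * x * y

def numerical_gradient(func, point, h=1e-5):
    """Approximate each partial derivative with a central difference."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h                       # change only variable i
        grad[i] = (func(point + step) - func(point - step)) / (2.0 * h)
    return grad

grad_at_point = numerical_gradient(f, [1.0, 2.0])
exact = np.array([2.0 * 1.0 + 3.0 * 2.0, 3.0 * 1.0])   # [2x + 3y, 3x] at (1, 2)

print(grad_at_point, exact)
```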
Optimization:
• Optimization is the process of iteratively adjusting a model's parameters to find the
minimum (or maximum) of an objective function, such as a loss function.
• It is one of the most important steps in machine learning for obtaining better results.
Two important optimization algorithms:
1. Gradient Descent
2. Stochastic Gradient Descent
Gradient Descent
• Gradient Descent (GD) is a first-order optimization algorithm.
• It minimizes a function by iteratively updating model parameters in the direction
of steepest descent, that is, along the negative gradient.
• It finds a local minimum of a differentiable function.
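Gradient Descent Working
1. Start from an initial parameter vector θ.
2. Compute the gradient ∇J(θ) of the objective function J at the current θ.
3. Update the parameters: θ ← θ − α ∇J(θ), where α is the learning rate (step size).
4. Repeat steps 2 and 3 until the updates become negligibly small.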
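Gradient descent repeatedly moves the parameter a small step against the gradient. A minimal sketch, assuming a hypothetical one-dimensional objective J(θ) = (θ − 3)² with its minimum at θ = 3, a learning rate of 0.1, and a fixed number of iterations in place of a convergence test:

```python
def objective(theta):
    """Hypothetical objective J(theta) = (theta - 3)^2."""
    return (theta - 3.0) ** 2

def gradient(theta):
    """Derivative of the objective: dJ/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)

theta = 0.0            # initial guess
learning_rate = 0.1    # step size (assumed value)

for step in range(100):
    # Move against the gradient, i.e. in the direction of steepest descent.
    theta -= learning_rate * gradient(theta)

print(theta, objective(theta))
```

With a learning rate that is too large (here, above 1.0) the iterates would overshoot and diverge, which is why the step size needs tuning in practice.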
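The disadvantage of Gradient Descent:
• Every iteration computes the gradient over all n data points, so when n is large, the
time taken for k iterations becomes very large.
• Time Complexity: O(k·n·d) for a model with d parameters whose per-sample gradient
costs O(d), such as linear regression.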
Stochastic Gradient Descent (SGD)
• SGD is a variant of gradient descent where, instead of using the entire dataset to compute
the gradient, a single randomly chosen sample or a small random subset (mini-batch) is used.
• Faster per update for large datasets.
• Updates are noisier, but the noise can help escape shallow local minima.
Steps in SGD:
1. Randomly shuffle dataset
2. Select a small batch of samples
3. Compute gradient and update parameters
4. Repeat until convergence
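The four steps above can be sketched for a simple linear model y ≈ w·x + b trained with mean squared error. Everything here is an assumption made for illustration: the synthetic data (true w = 2, b = 1), the learning rate, the batch size, and the fixed number of epochs standing in for a convergence check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from y = 2x + 1 plus a little noise.
X = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * X + 1.0 + 0.05 * rng.normal(size=200)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 16

for epoch in range(50):                          # step 4: repeat
    order = rng.permutation(len(X))              # step 1: shuffle the dataset
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]    # step 2: pick a small batch
        error = (w * X[idx] + b) - y[idx]
        grad_w = 2.0 * np.mean(error * X[idx])   # step 3: gradient of the batch MSE
        grad_b = 2.0 * np.mean(error)
        w -= learning_rate * grad_w              # step 3: update parameters
        b -= learning_rate * grad_b

print(w, b)
```

Each update looks at only 16 samples, so its cost does not grow with the dataset size; the price is that w and b jitter slightly around the best-fit values instead of settling exactly.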
MAXIMA AND MINIMA
• A maximum is the largest and a minimum is the smallest value of a function within a given
range.
• Global Maximum and Minimum: The largest and smallest values, respectively,
over the entire domain of the function.
• Local Maximum and Minimum: The largest and smallest values, respectively,
of the function within a neighborhood of a point.
• A function has at most one global minimum value and one global maximum value (each may be
attained at several points), but it can have many local minima and maxima.
1.4 Decision Theory in Machine Learning
• Decision Theory is a framework for making optimal decisions under uncertainty.
• Decision Theory combines probability theory and utility theory to help make the
best decision based on expected outcomes.
• Objective: Minimize risk or maximize expected utility.
• Often used in classification, regression, and reinforcement learning.
Key Concepts in Decision Theory
1. States (S): Possible conditions of the world.
2. Actions (A): Possible actions or predictions made by the model.
3. Outcomes (O): Results of taking action A in state S.
4. Utility (U): Measures the value or cost of an outcome.
5. Loss Function (L): Measures the penalty for incorrect decisions.
Loss Functions in Decision Theory
• The goal is to minimize expected loss or maximize expected utility.
Common loss functions in ML:
• 0-1 Loss (Classification): Loss is 0 for correct predictions and 1 for incorrect predictions.
Bayesian Decision Theory
A probabilistic approach for decision making:
• Models uncertainty using probabilities.
• Uses posterior probability to make predictions.
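Bayesian decision theory picks the action with the smallest expected loss under the posterior. The sketch below uses hypothetical numbers: a posterior over two states (healthy, disease) and two loss tables, a 0-1 loss and an asymmetric loss in which missing a disease is assumed to cost ten times more than a false alarm.

```python
import numpy as np

# Hypothetical posterior over the states [healthy, disease].
posterior = np.array([0.7, 0.3])

# loss[action, state]: rows are actions (predict healthy, predict disease).
zero_one_loss = np.array([[0.0, 1.0],
                          [1.0, 0.0]])
asymmetric_loss = np.array([[0.0, 10.0],    # predicting healthy when diseased costs 10
                            [1.0, 0.0]])    # predicting disease when healthy costs 1

def best_action(loss, posterior):
    """Return the action index with minimum expected loss, plus all expected losses."""
    expected = loss @ posterior    # expected loss of each action
    return int(np.argmin(expected)), expected

action_01, expected_01 = best_action(zero_one_loss, posterior)
action_asym, expected_asym = best_action(asymmetric_loss, posterior)

print(action_01, expected_01)
print(action_asym, expected_asym)
```

Under 0-1 loss the best action is simply the most probable state, while the asymmetric loss flips the decision to "disease" even though that state has the lower posterior probability.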
1.5 Information Theory in Machine Learning
• Information theory measures how much uncertainty a random variable contains and
how much information a message provides.
✓ Data Compression: Reducing data size (e.g., Huffman coding)
✓ Communication Theory: Efficient data transmission.
✓ Machine Learning: Model selection, regularization, and decision making.
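The uncertainty of a random variable is commonly measured by Shannon entropy, H = −Σ p·log2(p), in bits. The sketch below is a minimal illustration; the entropy function and the example distributions are assumptions introduced here, not taken from these notes.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])       # maximum uncertainty for two outcomes
biased_coin = entropy([0.9, 0.1])     # more predictable, so lower entropy
certain = entropy([1.0, 0.0])         # no uncertainty at all
four_way = entropy([0.25] * 4)        # four equally likely outcomes

print(fair_coin, biased_coin, certain, four_way)
```

A fair coin carries exactly 1 bit of uncertainty, and four equally likely outcomes carry 2 bits, which matches the number of yes/no questions needed to identify the outcome.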
Key Concepts in Information Theory