ML Unit 1
Introduction to Machine Learning: Evolution of Machine Learning, Paradigms for ML, Learning
by Rote, Learning by Induction, Reinforcement Learning, Types of Data, Matching, Stages in
Machine Learning, Data Acquisition, Feature Engineering, Data Representation, Model Selection,
Model Learning, Model Evaluation, Model Prediction, Search and Learning, Data Sets.
Theoretical Beginnings:
o Alan Turing introduced the concept of a "learning machine" in his 1950 paper
"Computing Machinery and Intelligence."
o The development of the perceptron by Frank Rosenblatt in 1958 laid the groundwork
for neural networks.
Key Innovations:
o Basic algorithms for learning and adaptation, inspired by early artificial intelligence
(AI) ideas.
o Development of foundational statistical methods like linear regression.
Symbolic AI Dominates:
o Emphasis on logic and reasoning with explicit rules.
o Systems like the General Problem Solver (1959) and expert systems (1970s-80s)
attempted to encode human expertise.
Limitations:
o Struggled with unstructured data and the complexity of real-world problems ("AI
Winter" periods followed due to lack of progress).
o Arthur Samuel coined the term "machine learning" in 1959.
Backpropagation:
o Rediscovery of backpropagation algorithms (Rumelhart, Hinton, and Williams, 1986)
enabled training of deeper networks.
Challenges:
o Limited hardware capabilities hindered the practical use of neural networks.
o Shallow models dominated due to computational constraints.
5. Big Data Era and the Emergence of Deep Learning (2000s-2010s)
Future Directions
General AI: Progress towards artificial general intelligence (AGI), where systems exhibit
human-like cognitive abilities.
Self-Supervised Learning: Leveraging unlabeled data for training scalable models.
AI Governance: Addressing ethical, societal, and regulatory challenges associated with AI
adoption.
1. Supervised Learning
In supervised learning, the algorithm learns from labeled data, where each input has a corresponding
output or target value.
Goal: Predict the output for new, unseen inputs based on past examples.
Common Algorithms:
o Linear Regression
o Logistic Regression
o Decision Trees
o Support Vector Machines (SVMs)
o Neural Networks
Applications:
o Classification: Email spam detection, image recognition.
o Regression: Predicting house prices, stock market trends.
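Supervised regression can be illustrated with a minimal sketch: fitting simple linear regression by the closed-form least-squares solution. The data and variable names below are made up for illustration.

```python
# Minimal sketch of supervised learning: fit y = w*x + b by least squares.
def fit_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

# Labeled training data: house size (sq ft) -> price, generated from y = 150x + 1000.
sizes = [1000, 1500, 2000, 2500]
prices = [151000, 226000, 301000, 376000]
w, b = fit_linear_regression(sizes, prices)
print(predict(1800, w, b))  # price predicted for an unseen 1800 sq ft house
```

The model learns from labeled (input, output) pairs and then predicts outputs for inputs it has never seen, which is exactly the supervised-learning goal stated above.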
2. Unsupervised Learning
In unsupervised learning, the algorithm works with unlabeled data, identifying patterns, structures, or
relationships within the dataset.
Goal: Discover hidden patterns or groupings in the data.
Common Algorithms:
o K-Means Clustering
o Hierarchical Clustering
o Principal Component Analysis (PCA)
o Autoencoders
Applications:
o Clustering customers for marketing.
o Reducing dimensionality of data.
o Anomaly detection (e.g., fraud detection).
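K-means clustering, listed above, can be sketched from scratch in one dimension; the data values are made up for the example.

```python
# Illustrative sketch of unsupervised learning: 1-D k-means clustering.
def kmeans_1d(points, k, iters=20):
    # Initialize centroids as the first k distinct values (simple choice).
    centroids = sorted(set(points))[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to its cluster mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups: values near 1 and values near 10 (no labels given).
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
centroids, clusters = kmeans_1d(data, k=2)
print(sorted(round(c, 2) for c in centroids))  # -> [1.0, 10.0]
```

No labels are supplied; the algorithm discovers the two groupings purely from the structure of the data.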
3. Semi-Supervised Learning
Semi-supervised learning is a hybrid approach that uses a small amount of labeled data along with a
large amount of unlabeled data.
Goal: Leverage the labeled data to improve learning performance with minimal labeling
effort.
Common Algorithms:
o Self-Training
o Label Propagation
o Graph-Based Methods
Applications:
o Speech and video analysis where labeling is expensive.
o Medical imaging.
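Self-training, listed above, can be sketched with a tiny 1-nearest-neighbour classifier: the model pseudo-labels the unlabeled points it is most confident about (here, confidence is simply distance to the nearest labeled point, an assumption of this sketch) and adds them to the training set.

```python
# Hedged sketch of semi-supervised self-training with 1-NN on 1-D data.
def nearest_label(x, labeled):
    # Label of the closest labeled example.
    return min(labeled, key=lambda pair: abs(x - pair[0]))[1]

def self_train(labeled, unlabeled, threshold=1.0):
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled point closest to any labeled point.
        x = min(remaining,
                key=lambda u: min(abs(u - lx) for lx, _ in labeled))
        dist = min(abs(x - lx) for lx, _ in labeled)
        if dist > threshold:   # not confident enough; stop pseudo-labeling
            break
        labeled.append((x, nearest_label(x, labeled)))  # pseudo-label
        remaining.remove(x)
    return labeled

# Only two labeled examples; four unlabeled points get pseudo-labels.
labeled = [(0.0, "A"), (10.0, "B")]
result = self_train(labeled, [0.5, 1.0, 9.5, 9.0])
print(sorted(result))
```

With only two labeled examples, the model ends up with labels for all six points, which is the "minimal labeling effort" goal stated above.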
5. Self-Supervised Learning
A rapidly growing paradigm where the algorithm generates pseudo-labels from unlabeled data and
uses these labels for training.
Goal: Learn robust representations from unlabeled data without requiring human annotation.
Common Algorithms:
o Contrastive Learning (e.g., SimCLR, MoCo)
o Transformer-based models (e.g., BERT, GPT)
Applications:
o Pre-training large language models.
o Visual representation learning for images.
6. Online Learning
In online learning, data arrives sequentially, and the model updates incrementally without
reprocessing the entire dataset.
Goal: Adapt to new data in real-time while maintaining performance on previously learned
tasks.
Common Algorithms:
o Online Gradient Descent
o Perceptron
Applications:
o Real-time recommendation systems.
o Predictive maintenance in IoT.
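The perceptron listed above is a classic online learner: it sees one example at a time and updates immediately, never reprocessing the whole dataset. The stream below is synthetic.

```python
# Sketch of online learning: the perceptron update applied one example at a time.
def perceptron_step(w, b, x, y, lr=1.0):
    """Update weights on a single (x, y) pair with y in {-1, +1}."""
    if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # misclassified
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b

# Stream of 2-D points: class +1 above the line x1 + x2 = 0, class -1 below.
stream = [([1, 1], 1), ([-1, -1], -1), ([2, 1], 1), ([-2, -1], -1)] * 5
w, b = [0.0, 0.0], 0.0
for x, y in stream:          # examples arrive sequentially
    w, b = perceptron_step(w, b, x, y)

predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
print(predict([3, 2]), predict([-3, -2]))
```

Each update costs constant time per example, which is why this style suits real-time systems where data never stops arriving.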
7. Transfer Learning
Transfer learning involves leveraging knowledge from a pre-trained model to solve a different but
related task.
Goal: Reuse pre-trained models to improve learning efficiency on new tasks.
Common Frameworks:
o Fine-tuning pre-trained models (e.g., ResNet, BERT).
Applications:
o Domain adaptation in NLP.
o Computer vision tasks with limited labeled data.
8. Multitask Learning
In multitask learning, a single model is trained to perform multiple related tasks simultaneously.
Goal: Improve learning efficiency by sharing knowledge across tasks.
Applications:
o Multi-label classification.
o Joint learning of NLP tasks (e.g., sentiment analysis + part-of-speech tagging).
9. Generative Learning
Generative learning focuses on modeling the underlying data distribution to generate new, synthetic
data similar to the training set.
Goal: Create new samples or infer missing data.
Common Algorithms:
o Generative Adversarial Networks (GANs)
o Variational Autoencoders (VAEs)
Applications:
o Image synthesis (e.g., Deepfake generation).
o Drug discovery.
Types of Data:
In machine learning (ML), the type of data used plays a crucial role in determining the appropriate
algorithms, preprocessing techniques, and evaluation methods. Below are the major types of data in
ML:
1. Based on Structure
a. Structured Data
Definition: Data that is organized in a well-defined format, often as rows and columns in
tables.
Examples:
o Tabular data in spreadsheets or databases.
o Numerical values (e.g., sales figures, temperature).
o Categorical data (e.g., gender, product type).
Applications:
o Predictive modeling in business analytics.
o Customer segmentation.
b. Unstructured Data
2. Based on Format
a. Numerical Data
Definition: Quantitative data expressed as numbers (e.g., prices, temperatures).
c. Text Data
Definition: Data in textual form, often requiring natural language processing (NLP).
Examples:
o Product reviews, emails, tweets.
Applications:
o Sentiment analysis.
o Language translation.
d. Time-Series Data
Definition: Data points indexed in time order (e.g., stock prices, sensor readings).
e. Image and Video Data
Definition: Visual data, including static images and sequences of frames (videos).
Applications:
o Object detection and recognition.
o Video summarization.
f. Audio Data
3. Based on Labeling
a. Labeled Data
5. Based on Relationships
a. Independent Data
Definition: Data where instances are independent of each other.
Example:
o Images in a dataset for object classification.
Use Case:
o Standard supervised or unsupervised learning tasks.
b. Dependent Data
Definition: Data with dependencies or correlations between instances.
Example:
o Sequential data (time-series, text).
Use Case:
o Tasks involving RNNs or attention-based models.
Matching:
Matching in machine learning refers to the process of identifying similar or related entities across
datasets or within a dataset. This process is crucial in various applications, from recommendation
systems to entity resolution. Below are key aspects of matching in ML:
1. Types of Matching
a. Exact Matching
Definition: Identifying records that are identical on the chosen key fields.
b. Fuzzy (Approximate) Matching
Definition: Identifying entities that are similar but not necessarily identical.
Example: Matching "Jon Smith" to "John Smyth."
Techniques:
o String similarity measures (e.g., Levenshtein distance, Jaro-Winkler).
o Fuzzy matching using token-based methods.
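The Levenshtein (edit) distance mentioned above can be sketched with the standard dynamic-programming recurrence; pure Python, no libraries.

```python
# Levenshtein distance: minimum number of single-character insertions,
# deletions, and substitutions needed to turn string a into string b.
def levenshtein(a, b):
    # prev[j] holds the edit distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))        # classic example: 3 edits
print(levenshtein("Jon Smith", "John Smyth"))  # 2 edits -> likely a match
```

A small distance relative to the string length is the usual signal that "Jon Smith" and "John Smyth" refer to the same person.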
c. Schema Matching
Definition: Matching fields or columns across different datasets with varying schema.
Example: Mapping "Customer Name" in one dataset to "Client Full Name" in another.
Applications: Data integration and ETL (Extract, Transform, Load) processes.
d. Entity Matching
Definition: Matching entities across datasets that refer to the same real-world object.
Example: Identifying that "Amazon Inc." in one dataset and "AMZN" in another refer to the
same company.
Applications: Data deduplication, master data management.
2. Matching Techniques
c. Probabilistic Matching
Approach: Computes the probability of two entities being a match using statistical models.
Example:
o Fellegi-Sunter model for record linkage.
Applications:
o Matching in noisy or incomplete datasets.
d. Clustering-Based Matching
Approach: Groups similar records into clusters so that records in the same cluster become candidate matches.
e. Deep Learning-Based Matching
Approach: Uses neural networks for complex matching tasks, especially with unstructured
data.
Techniques:
o Siamese Networks: Learn a similarity function for pairs of inputs.
o Transformers: For matching textual entities (e.g., BERT).
o Image Matching: Using CNNs for visual similarity.
Applications:
o Image-to-image matching.
o Textual paraphrase detection.
3. Similarity Measures
String-Based Measures:
o Levenshtein (Edit) Distance
o Jaro-Winkler Similarity
o Cosine Similarity (for tokenized strings)
Numerical Measures:
o Euclidean Distance
o Manhattan Distance
Semantic Measures:
o Word embeddings (e.g., Word2Vec, GloVe)
o Contextual embeddings (e.g., BERT)
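The numerical measures above, plus cosine similarity on tokenized strings, can be sketched directly on plain Python vectors (token-count vectors for the cosine example; the sentences are made up).

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Cosine similarity on tokenized strings: count vectors over the shared vocabulary.
s1, s2 = "machine learning is fun", "machine learning is hard"
vocab = sorted(set(s1.split()) | set(s2.split()))
v1 = [Counter(s1.split())[w] for w in vocab]
v2 = [Counter(s2.split())[w] for w in vocab]

print(euclidean([0, 0], [3, 4]))     # 5.0
print(manhattan([0, 0], [3, 4]))     # 7
print(round(cosine_sim(v1, v2), 2))  # 0.75: three of four tokens shared
```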
4. Applications of Matching in ML
a. Recommendation Systems
5. Challenges in Matching
Data Sets:
Datasets are the foundation of machine learning (ML), providing the data needed to train, validate,
and test models. Depending on the application and type of problem being addressed, datasets can vary
in size, structure, and purpose. Here’s an overview of datasets in ML:
1. Types of Datasets
a. Training Dataset
Definition: The dataset used to train the machine learning model by helping it learn patterns
and relationships.
Key Characteristics:
o Typically the largest portion of the data.
o Requires preprocessing to ensure quality (e.g., cleaning, normalization).
Example: Images of cats and dogs with labeled categories for training a classification model.
b. Validation Dataset
Definition: A separate dataset used during training to tune model hyperparameters and
evaluate performance.
Key Characteristics:
o Helps prevent overfitting.
o Allows performance monitoring during training.
Example: A subset of labeled customer data used to validate a recommendation model.
c. Test Dataset
Definition: The dataset used to evaluate the final model's performance after training.
Key Characteristics:
o Not used during training or validation.
o Provides an unbiased estimate of model generalization.
Example: A hold-out dataset of unseen customer transactions for fraud detection.
a. Structured Data
Definition: Data that fits into a tabular format with rows and columns.
Example: Sales data with attributes like product ID, price, and quantity.
b. Semi-Structured Data
Definition: Data that has some organizational properties but does not fit into a strict table
format.
Example: JSON, XML, or log files.
5. Dataset Preparation
a. Data Collection
Collect raw data from various sources (e.g., sensors, APIs, user logs).
b. Data Cleaning
Train/Validation/Test Split:
o 60-70% training data.
o 10-20% validation data.
o 10-20% test data.
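The split ratios above can be sketched as a simple shuffle-and-slice function; the 70/15/15 split shown is one common choice within those ranges.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once (fixed seed for reproducibility), then slice into three sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing matters: without it, any ordering in the raw data (e.g., by date or class) would leak into the split.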
1. Problem Definition
Goal: Clearly define the problem you aim to solve and understand the desired outcome.
Key Steps:
o Identify the business or technical challenge.
o Specify whether the problem is supervised, unsupervised, or reinforcement-based.
o Define success metrics (e.g., accuracy, precision, recall, or business-specific KPIs).
Example: Predict customer churn in a subscription-based business.
2. Data Collection
3. Data Preparation
Goal: Clean, preprocess, and transform the data to make it usable for modeling.
Key Steps:
o Handle missing values (e.g., imputation or deletion).
o Remove duplicates and outliers.
o Normalize, scale, or encode features as needed.
o Split data into training, validation, and test sets.
Example: Standardize numerical features, encode categorical variables, and handle class
imbalances.
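Two of the preparation steps named above, z-score standardization of a numeric column and one-hot encoding of a categorical one, can be sketched on toy data:

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Encode each category as an indicator vector over the sorted category set."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

ages = [20, 30, 40]
cities = ["City", "Suburb", "City"]

z = standardize(ages)
encoded, cats = one_hot(cities)
print([round(v, 2) for v in z])  # centered at 0 with unit variance
print(cats, encoded)
```

Crucially, the mean, standard deviation, and category set must be computed on the training split only and then reused on validation and test data, or information leaks across the split.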
5. Feature Engineering
6. Model Selection
7. Model Training
8. Model Evaluation
9. Hyperparameter Tuning
10. Deployment
Goal: Deploy the model into a production environment for real-world use.
Key Steps:
o Integrate the model into applications, APIs, or pipelines.
o Monitor resource requirements (e.g., CPU, GPU, memory).
o Ensure scalability and reliability under load.
Example: Deploy a recommendation engine as part of an e-commerce website.
Goal: Ensure the model remains effective and accurate over time.
Key Steps:
o Monitor performance metrics in production.
o Detect data drift or concept drift (changes in data or target patterns).
o Update or retrain the model as needed with new data.
Example: Retrain a fraud detection model with updated transaction logs every month.
Data Acquisition:
Data acquisition refers to the process of collecting and obtaining data required for developing
machine learning models. This stage is critical as the quality and relevance of the data significantly
impact the performance of the model. Data can be acquired from various sources, and the method
depends on the specific problem domain.
Data Representation:
Data representation refers to how raw data is transformed and structured for input into a machine
learning model. Effective representation ensures the data is interpretable by the model and preserves
critical information for accurate predictions. Choosing the right representation is crucial as it directly
impacts model performance.
Model Interpretability: Simplifies the task for algorithms to learn meaningful patterns.
Data Compatibility: Converts raw, unstructured, or diverse data into a format suitable for
ML algorithms.
Improved Performance: Better representation enhances feature extraction, leading to more
accurate predictions.
High Dimensionality: Too many features can lead to computational inefficiency and
overfitting.
Data Sparsity: Sparse data representations can hinder learning in some algorithms.
Domain-Specific Knowledge: Effective representation often requires understanding the
domain.
Scalability: Representing large datasets efficiently can be computationally expensive.
Model Selection:
Model selection refers to the process of choosing the best machine learning model from a set of
candidates to solve a specific problem. It involves evaluating models' performance based on the data
and task requirements, considering various metrics, constraints, and trade-offs.
Model Learning:
Model learning refers to the process by which a machine learning algorithm identifies patterns and
relationships in training data to make predictions or decisions. It involves optimizing a model's
parameters to minimize error and improve its performance on specific tasks.
a. Learning Algorithm
The algorithm defines how the model adjusts its parameters based on the input data.
c. Loss Function
Measures the difference between the predicted output and the actual target.
Common types:
o Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE).
o Classification: Cross-Entropy Loss, Hinge Loss.
Objective: Minimize the loss function to improve model performance.
d. Optimization
The process of finding the best parameters that minimize the loss function.
Common techniques:
o Gradient Descent: Iteratively updates parameters in the direction of the steepest
descent.
o Variants: Stochastic Gradient Descent (SGD), Adam, RMSProp.
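Gradient descent as described above can be sketched on a one-parameter loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2(w - 3) and whose minimum is at w = 3.

```python
# Gradient descent: repeatedly step in the direction of steepest descent.
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)   # parameter update rule
    return w

w = gradient_descent(grad=lambda w: 2 * (w - 3), w0=0.0)
print(round(w, 4))  # converges to 3.0
```

Each step shrinks the distance to the minimum by a constant factor (here 0.8), so the iterate converges geometrically; too large a learning rate would instead make it diverge.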
e. Model Parameters
d. Backpropagation
Compute the gradient of the loss function with respect to each parameter.
Use the chain rule to propagate gradients backward through the network.
e. Parameter Update
f. Iteration
Repeat the forward pass, loss calculation, backpropagation, and parameter update for multiple
epochs (iterations over the entire dataset).
3. Learning Strategies
a. Batch Learning
The model updates its parameters after processing the entire dataset in one pass.
Advantages: Stable, accurate gradient estimates.
Disadvantages: Slow and memory-intensive for large datasets.
b. Stochastic Learning
The model updates its parameters after processing each individual data point.
Advantages: Faster updates, better for large datasets.
Disadvantages: Noisy convergence.
c. Mini-Batch Learning
The model updates its parameters after processing a small batch of data points.
Advantages: Combines benefits of batch and stochastic learning.
Disadvantages: Requires tuning batch size for optimal performance.
4. Challenges in Model Learning
a. Overfitting
The model performs well on training data but poorly on unseen data.
Solutions:
o Use regularization (L1, L2).
o Increase training data.
o Apply dropout (for neural networks).
b. Underfitting
The model is too simple to capture the underlying patterns, performing poorly even on the training data.
c. Vanishing/Exploding Gradients
Gradients become too small or too large during backpropagation, hindering learning.
Solutions:
o Use gradient clipping.
o Apply better weight initialization techniques.
o Use activation functions like ReLU.
d. Learning Rate Issues
A learning rate that's too high may cause divergence, while one that's too low may slow
convergence.
Solutions:
o Use learning rate schedules (e.g., exponential decay).
o Apply adaptive optimizers like Adam.
e. Data Quality
Poor quality data (e.g., noise, outliers, missing values) can hinder learning.
Solutions:
o Preprocess and clean the data.
o Augment the dataset with synthetic samples.
Model Evaluation:
Model evaluation is the process of assessing how well a machine learning model performs on unseen
data. It helps determine the model's effectiveness and generalization capability, ensuring it meets the
required standards for a specific task. Proper evaluation is crucial to avoid overfitting and
underfitting, ensuring the model's robustness.
Assess Generalization: Understand how well the model performs on data it hasn't seen
during training.
Compare Models: Evaluate different models or algorithms to select the best one for the task.
Optimize Performance: Identify areas where the model can be improved (e.g., via
hyperparameter tuning or feature engineering).
Prevent Overfitting: Ensure that the model doesn't memorize the training data but
generalizes well to new, unseen data.
A technique where the dataset is split into multiple subsets (folds), and the model is trained
and tested on different folds. This provides a more reliable estimate of model performance.
K-Fold Cross-Validation: The dataset is divided into k folds, and the model is trained k
times, each time using a different fold as the validation set.
Stratified K-Fold Cross-Validation: Used for imbalanced datasets, ensuring each fold
maintains the same class distribution as the full dataset.
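Generating the k-fold splits described above reduces to index bookkeeping: each fold serves once as the validation set while the remaining folds form the training set.

```python
# Sketch of k-fold cross-validation index generation.
def k_fold_indices(n, k):
    # Distribute n indices across k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # Yield (train_idx, val_idx) pairs, one per fold.
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

splits = list(k_fold_indices(n=10, k=5))
for train, val in splits:
    print(val)  # each fold of size 2 is used exactly once for validation
```

Averaging the metric over the k validation folds gives a more reliable performance estimate than a single hold-out split, at k times the training cost.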
Split the data into training and testing sets, using a portion of the data for training and another
portion for testing. A common split ratio is 70-30 or 80-20.
b. Cross-Validation
As discussed earlier, splitting the data into multiple folds helps improve the reliability of
evaluation, particularly for smaller datasets.
c. Bootstrap Sampling
Create multiple datasets by sampling with replacement from the original data, allowing for
multiple evaluations of the model.
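Bootstrap sampling as described above draws n items with replacement from a dataset of size n, so some rows repeat and others are left out (the "out-of-bag" rows can serve as an evaluation set).

```python
import random

def bootstrap_sample(rows, seed):
    """Draw len(rows) items with replacement (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]

data = list(range(10))
sample = bootstrap_sample(data, seed=42)
out_of_bag = set(data) - set(sample)
print(len(sample), sorted(out_of_bag))  # same size as data; OOB rows vary by seed
```

Repeating this with different seeds yields multiple resampled datasets, and the spread of the metric across them estimates the variability of the model's performance.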
4. Confusion Matrix
5. Bias-Variance Trade-off
Bias refers to the error introduced by assuming a simplified model, while variance refers to
the error introduced by a model that is too complex and sensitive to small fluctuations in the
training data.
o High bias, low variance: The model is too simple (underfitting).
o Low bias, high variance: The model is too complex (overfitting).
The goal is to find the optimal balance between bias and variance that minimizes the total error.
Imbalanced Datasets: For classification problems with imbalanced classes, accuracy can be
misleading. Precision, recall, F1-score, and ROC-AUC are more informative.
Outliers: Outliers can distort model evaluation metrics, especially in regression tasks.
Preprocessing may be needed to handle outliers.
Data Leakage: Information from outside the training set accidentally used during model
training can lead to overly optimistic performance estimates.
Overfitting and Underfitting: Models that perform well on training data but poorly on test
data may be overfitting. Cross-validation helps mitigate this.
Model Prediction:
Model prediction in machine learning refers to the process where a trained machine learning model is
used to make predictions on new, unseen data. This involves inputting data into the model and
obtaining output results based on the patterns or relationships the model learned during the training
phase.
1. Model Training
Before making predictions, a model is trained using labeled data (for supervised learning). During
training:
The model learns the relationship between input features (independent variables) and the
target variable (dependent variable).
Various algorithms (e.g., linear regression, decision trees, neural networks) adjust their
parameters to minimize error on the training dataset.
Preprocessing: New data must undergo the same preprocessing steps as the training data
(e.g., scaling, normalization, encoding).
Feature Selection: Ensure the input features match the format and order used during training.
3. Making Predictions
4. Evaluation (Optional)
Regression Metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R²
score.
Classification Metrics: Accuracy, Precision, Recall, F1 score, ROC-AUC.
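The classification metrics above follow directly from the confusion-matrix counts; a sketch for a binary task (positive class = 1) on made-up predictions:

```python
# Precision, recall, and F1 from true/false positive and false negative counts.
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)            # of predicted positives, how many were right
    recall = tp / (tp + fn)               # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Here the model finds 2 of 3 positives and 1 of its 3 positive predictions is wrong, so precision and recall are both 2/3, which is why these metrics are more informative than accuracy on imbalanced data.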
Search and Learning:
Search involves exploring a space of possible solutions to find the best one based on a given objective
or criteria. This is a key component in several areas of machine learning:
Key Areas of Search
Optimization Problems:
o Algorithms like gradient descent search for the optimal parameters (weights) of a
model by minimizing (or maximizing) a cost function.
o Example: Training a neural network involves searching for weights that minimize
loss.
Hyperparameter Tuning:
o Techniques like grid search, random search, and Bayesian optimization search for the
best hyperparameter values to improve model performance.
o Example: Finding the best learning rate or number of layers for a neural network.
State-Space Search:
o Algorithms like A*, breadth-first search (BFS), and depth-first search (DFS) are used
in problems such as pathfinding, game playing, and planning.
Evolutionary Algorithms:
o Techniques like genetic algorithms and particle swarm optimization search for
solutions by simulating natural evolution.
Types of Search
Stochastic Search:
Supervised Learning:
Unsupervised Learning:
o The model learns patterns or structures in data without labeled outputs.
o Example Algorithms: K-means clustering, principal component analysis (PCA).
o Use Cases: Customer segmentation, anomaly detection.
Semi-Supervised Learning:
o Combines a small amount of labeled data with a large amount of unlabeled data.
o Use Cases: Text classification with limited labeled examples.
Reinforcement Learning:
o The model learns by interacting with an environment, receiving feedback in the form
of rewards or penalties.
o Example Algorithms: Q-learning, Deep Q-Networks (DQN).
o Use Cases: Game playing, robotics, self-driving cars.
Learning Methods
Data Acquisition:
Data acquisition refers to the process of collecting and preparing data for use in machine learning
models. It is a critical step because the quality and quantity of data directly impact the performance of
the model. This process involves identifying data sources, collecting data, and ensuring it is in a
usable format.
Model Accuracy: High-quality, relevant data ensures better learning and generalization.
Model Robustness: Diverse data reduces bias and improves model reliability.
Domain Understanding: Acquiring data helps understand the problem domain and define
the objectives.
2. Sources of Data
Primary Data Sources (Collected First-Hand):
Public Datasets:
Define the problem and identify the input features and target variables.
Determine the type of data needed: numerical, categorical, text, images, audio, etc.
Step 2: Identify Data Sources
Choose between primary (new data collection) and secondary (existing datasets) sources
based on availability and feasibility.
Step 3: Collect the Data
Organize the data into suitable formats (e.g., CSV, JSON, or database tables).
Use storage solutions such as local storage, cloud platforms, or data lakes.
o Ensuring data privacy and compliance with regulations like GDPR or CCPA.
Volume of Data:
Accessibility:
Cost:
ETL Tools: Apache NiFi, Talend, or Informatica for data extraction and transformation.
Database Query Tools: SQL for querying and extracting structured data.
Cloud Data Acquisition:
Feature Engineering:
Feature engineering is the process of transforming raw data into meaningful features that improve the
performance of machine learning models. It involves creating, selecting, and optimizing input
variables to better represent the underlying problem and facilitate model learning.
Reduces Overfitting:
Suppose we are working with a dataset of housing prices and want to predict house prices.
Dataset Snapshot:
Square_Feet  Bedrooms  Location  Year_Built  Price
2000         3         Suburb    1990        300000
1500         2         City      2010        250000
Steps:
Transform Features:
Feature Selection:
o Remove highly correlated features or irrelevant ones.
Resulting Features:
Square_Feet  Bedrooms  House_Age  Price_per_SqFt  Location_City  Location_Suburb
2000         3         33         150             0              1
1500         2         13         166.67          1              0
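The derived features in the table above can be reproduced directly; House_Age is computed here relative to the year 2023, which matches the values shown (33 and 13).

```python
# Reproduce the engineered features from the housing snapshot above.
rows = [
    {"Square_Feet": 2000, "Bedrooms": 3, "Location": "Suburb",
     "Year_Built": 1990, "Price": 300000},
    {"Square_Feet": 1500, "Bedrooms": 2, "Location": "City",
     "Year_Built": 2010, "Price": 250000},
]

CURRENT_YEAR = 2023  # assumption consistent with the House_Age values shown
for r in rows:
    r["House_Age"] = CURRENT_YEAR - r["Year_Built"]                # 33, 13
    r["Price_per_SqFt"] = round(r["Price"] / r["Square_Feet"], 2)  # 150.0, 166.67
    # One-hot encode Location into two indicator columns.
    r["Location_City"] = 1 if r["Location"] == "City" else 0
    r["Location_Suburb"] = 1 if r["Location"] == "Suburb" else 0

print([(r["House_Age"], r["Price_per_SqFt"]) for r in rows])
```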
Python Libraries:
Visualization Tools:
Learning by Rote:
Learning by rote refers to the process of memorizing data or patterns exactly as they appear without
generalization or understanding. In machine learning, this concept corresponds to models that
"memorize" the training data instead of learning underlying patterns or relationships.
While rote learning might achieve high performance on the training set, it often leads to poor
performance on unseen data due to overfitting.
Exact Memorization:
1. The model retains specific details of the training data rather than learning
generalizable patterns.
Lack of Generalization:
1. It performs well on the training data but fails to predict accurately for new, unseen
data.
Overfitting:
1. The model is overly complex and fits the noise or peculiarities in the training dataset
rather than the actual trends.
Example of Learning by Rote
Scenario:
Suppose you train a machine learning model to classify images of cats and dogs.
1. The model memorizes exact pixel values of the images in the training set.
2. When presented with a new image of a cat or dog with slight differences (e.g., angle,
lighting), the model fails to classify it correctly.
In Contrast (Proper Learning):
1. A properly trained model learns features like fur patterns, shapes, and other attributes
that distinguish cats from dogs.
2. This allows it to generalize and correctly classify unseen images.
Causes of Rote Learning
High Model Capacity:
1. Models like deep neural networks can have excessive capacity, enabling them to
memorize the data.
Insufficient Data:
1. Small datasets increase the likelihood of the model memorizing specific examples.
Lack of Regularization:
1. Without techniques like dropout, L1/L2 regularization, or early stopping, models may
overfit.
How to Avoid Rote Learning
More Training Data:
1. Use larger and more representative datasets to ensure the model sees diverse
examples.
Regularization Techniques:
Cross-Validation:
1. Use k-fold cross-validation to ensure the model generalizes across different subsets of
the data.
Early Stopping:
1. Monitor performance on a validation set and stop training when performance stops
improving.
Data Augmentation:
1. For image or text data, apply transformations (e.g., rotations, flips, synonyms) to
create diverse training samples.
Simpler Models:
1. Choose models with appropriate complexity for the data (e.g., decision trees with
depth limits).
Learning by Induction:
Learning by induction refers to the process of deriving general rules or patterns from specific
examples. It is a fundamental approach in machine learning where models generalize from the
training data to make predictions or decisions about unseen data.
Inductive learning is often contrasted with deductive learning (applying existing rules to specific
cases) and abductive learning (inferring the most likely explanation for a given observation).
Generalization:
1. Models learn patterns or relationships from training data and apply them to unseen
examples.
Uncertainty:
1. Inductive learning involves making predictions even when complete certainty is not
possible.
Empirical Approach:
Input:
1. A set of labeled training examples (x_i, y_i), where x_i represents the input features and y_i
is the corresponding output label.
Learning Process:
1. The algorithm identifies patterns or relationships in the data (e.g., finding decision
boundaries, fitting a curve, or clustering similar data points).
Output:
1. A model that can make predictions on new, unseen data by applying the learned
patterns.
Induction Process:
o Predicting house prices based on features like size, location, and number of rooms.
2. Unsupervised Learning:
Induction Process:
3. Reinforcement Learning:
Induction Process:
o Agents infer policies that maximize rewards over time by interacting with an
environment.
Example Use Case:
Collect Data:
Pre-process Data:
Choose a Model:
o Select an algorithm appropriate for the task (e.g., decision trees, neural networks,
support vector machines).
Generalization Capability:
Data-Driven:
Versatility:
Overfitting:
Bias-Variance Trade-off:
o Balancing underfitting (too simple) and overfitting (too complex).
Uncertainty:
o Predictions may be inaccurate when patterns in the data are weak or ambiguous.
Reinforcement Learning:
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make
decisions by interacting with an environment to maximize cumulative rewards over time. Unlike
supervised learning, RL does not rely on labeled data but instead learns from the consequences of its
actions.
Key Concepts in Reinforcement Learning
Agent:
1. The entity that takes actions in the environment to achieve a goal.
Environment:
1. The external system the agent interacts with, which provides feedback on the agent's
actions.
State (s):
1. A representation of the environment's current situation.
Action (a):
1. The choice made by the agent at a given state.
Reward (r):
1. A numerical value provided by the environment as feedback for the agent's action.
Policy (π):
1. A strategy or mapping from states to actions that the agent follows.
Value Function (V(s)):
1. Estimates the expected cumulative reward starting from a state s, assuming the agent
follows a specific policy.
Q-Function (Q(s,a)):
1. Estimates the expected cumulative reward starting from a state s, taking an action a,
and following a specific policy thereafter.
Agent-Environment Interaction:
1. The agent takes an action, receives a reward, and transitions to a new state.
Policy Update:
Iteration:
1. Repeat interactions and updates until the agent learns an optimal policy.
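The interaction loop above can be sketched with tabular Q-learning on a tiny made-up environment: a 5-state corridor where moving right from state 3 reaches the goal state (reward 1) and every other step gives reward 0. The environment, hyperparameters, and reward values are all assumptions of this sketch.

```python
import random

# Actions: 0 = left, 1 = right; state 4 is the goal.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: Q[state][action]
alpha, gamma, epsilon = 0.5, 0.9, 0.2
rng = random.Random(0)

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection (explore vs. exploit).
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: bootstrap from the best action in the next state.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(N_STATES)]
print(policy[:4])  # the learned policy moves right toward the goal
```

The agent is never told the correct action; it discovers the "always move right" policy purely from the reward signal, which is the defining property of reinforcement learning.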
Model-Free RL:
1. The agent learns directly from interactions without knowing the environment's
transition dynamics.
2. Examples:
1. Q-Learning
2. Deep Q-Networks (DQN)
Model-Based RL:
1. The agent learns a model of the environment and uses it to plan and optimize actions.
2. Examples:
1. AlphaZero
On-Policy RL:
1. Updates the policy based on the actions the agent actually takes.
2. Example: SARSA (State-Action-Reward-State-Action).
Off-Policy RL:
Healthcare: