Unit 1 Introduction

The document discusses the concept of intelligent machines, focusing on the goals of AI/ML to emulate human cognitive functions such as perception, memory, learning, and reasoning. It outlines the components of well-posed machine learning problems, emphasizing the importance of clearly defined tasks, experiences, and performance measures, and provides examples across various fields like healthcare, finance, and retail. Additionally, it highlights the significance of data representation and domain knowledge in effectively applying machine learning techniques.


Chapter 1: Introduction

1.1 Towards Intelligent Machines

➤ What is Intelligence in Machines?

Humans learn from experience, adapt to new situations, and make decisions.

Goal of AI/ML: Build machines that emulate human cognitive functions like:

Perception (seeing, hearing)

Memory

Learning from data

Reasoning and decision-making

➤ Biological Inspiration:

Human brain: ~100 billion neurons.

Learns by connecting sensory input (eyes, ears, etc.) with reasoning and actions.

Machine learning draws inspiration from:

Neural networks (artificially mimic brain neurons)

Fuzzy logic (handles imprecise, vague information)

➤ Transition from Conventional to Cognitive Machines:

Aspect      | Traditional Computers        | Cognitive Machines
Information | Numbers, deterministic logic | Relative/graded data (like human thinking)
Operations  | Predefined programs          | Learn from data
Goal        | Process data                 | Think and adapt
1.2 WELL-POSED MACHINE LEARNING PROBLEMS

Definition (as per Tom Mitchell):

"A computer program is said to learn from experience (E) with respect to some class of tasks
(T) and performance measure (P), if its performance at tasks in T, as measured by P,
improves with experience E."

• To be well-posed, an ML problem must clearly define three components:

Component                  | Description                                    | Example
1. Task (T)                | What is the machine trying to do?              | Predict if an email is spam or not
2. Experience (E)          | What data or observations is it learning from? | A dataset of labeled emails (spam/ham)
3. Performance Measure (P) | How to evaluate how well it is learning?       | Accuracy, Precision, Recall, F1-score

➢ Example: Email Spam Classifier

Component Value
Task (T) Classify emails as "spam" or "not spam"
Experience (E) Historical emails labeled as spam/not spam
Performance Measure (P) % of correctly classified emails (accuracy)

If the machine improves its classification accuracy as it sees more labeled emails, it is said to have
learned.
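The T/E/P breakdown above can be sketched in a few lines of code. The keyword rule and the tiny dataset below are invented purely for illustration; the point is that the task (T) is classification, the experience (E) is labeled emails, and the performance measure (P) is accuracy.

```python
# Minimal sketch of Task / Experience / Performance for a spam classifier.
# The keyword rule and the toy dataset are illustrative, not a real filter.

# Experience (E): historical emails labeled spam (1) / not spam (0)
emails = [
    ("win a free prize now", 1),
    ("meeting moved to 3pm", 0),
    ("free money, click here", 1),
    ("lunch tomorrow?", 0),
]

SPAM_WORDS = {"free", "prize", "win", "money"}

def predict(text):
    """Task (T): classify an email as spam (1) or not spam (0)."""
    return 1 if any(w in text.split() for w in SPAM_WORDS) else 0

def accuracy(dataset):
    """Performance measure (P): fraction of correctly classified emails."""
    correct = sum(predict(text) == label for text, label in dataset)
    return correct / len(dataset)

print(accuracy(emails))  # 1.0 on this toy data
```

A real learner would improve `predict` as it sees more labeled emails; here the rule is fixed so that the three components stay visible.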

➢ Ill-posed ML Problem Example

• If any one of T, E, or P is missing or unclear, the problem is not well-posed.

Example:

“I want the machine to be smart.”

This is ill-posed because:

• No clear task
• No data mentioned
• No performance measure

Analogy
➢ Just like a math problem needs:

• Clear question
• Given data
• Method to solve/check

➢ A machine learning problem must define:

• What to do
• What to learn from
• How to judge performance

➢ A well-posed machine learning problem is one that clearly specifies:

1. What task is being performed,


2. What experience the machine uses to learn,
3. How performance is evaluated and improved.

Examples of T, E, and P:

1. Medical Field – Disease Diagnosis

Task (T):

Classify whether a patient has diabetes or not.

Experience (E):

A dataset of patients' medical records including:


• Blood pressure
• Glucose level
• BMI
• Age
• Diagnosis (Diabetes: Yes/No)

Performance Measure (P):

• Accuracy of classification
• Confusion matrix
• Sensitivity (recall for positive class)

2. Finance – Credit Card Fraud Detection

Task (T):

Detect if a credit card transaction is fraudulent.

Experience (E):

Historical transaction data with:

• Amount
• Location
• Time
• Device ID
• Label: Fraud or Not

Performance Measure (P):

• Precision (minimize false positives)


• Recall (detect most frauds)
• F1-Score (balance both)
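The three measures listed above can be computed directly from predicted and true labels. The helper below is a generic sketch (not tied to any particular fraud model), with label 1 meaning "fraud":

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = fraud)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed frauds
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0]   # toy labels for illustration
y_pred = [1, 0, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```

High precision means few legitimate transactions are flagged; high recall means few frauds slip through; F1 balances the two.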

3. Robotics – Path Planning for Autonomous Robot

Task (T):

Enable a robot to navigate from point A to point B avoiding obstacles.

Experience (E):
Past data of:

• Sensor readings
• Map of environment
• Successful and failed paths

Performance Measure (P):

• Time taken to reach target


• Collision rate
• Energy efficiency

4. Stock Market – Price Prediction

Task (T):

Predict the closing price of a stock for the next day.

Experience (E):

Time series data of:

• Past stock prices


• Trading volume
• Indicators (e.g., moving averages)

Performance Measure (P):

• Mean Absolute Error (MAE)


• Mean Squared Error (MSE)

5. Retail – Product Recommendation

Task (T):

Recommend products to users based on their purchase history.

Experience (E):

User behavior data:

• Past purchases
• Ratings
• Clicks, views
Performance Measure (P):

• Click-through rate (CTR)


• Conversion rate

1.3 Applications of Machine Learning in Diverse Fields

Machine Learning (ML) can be applied wherever data exists. As data grows in quantity and
complexity, ML provides automatic learning and decision-making capabilities in many
industries.

1. Internet and Web Services

Application Description
Search Engines Google uses ML (like PageRank) to rank web pages.
Ad Recommendation Ads on YouTube or Google match your interests using ML.
Email Spam Filters Automatically classify emails as spam or not.

2. Retail and E-Commerce

Application Description
Product Recommendations Amazon suggests products using your purchase history (Recommender Systems).
Customer Behavior Analysis Walmart analyzes sales data to plan promotions and inventory.
Personalized Marketing Offers and discounts are targeted using browsing and buying patterns.

3. Healthcare and Medicine

Application Description
Medical Diagnosis ML analyzes X-rays, MRIs, ECGs for detecting diseases.
Drug Discovery Predict whether a chemical compound can be used as a drug.
Genome Sequencing Align biological sequences to find genetic similarities.

4. Banking and Finance

Application Description
Fraud Detection Recognize unusual spending patterns in credit card usage.
Customer Retention Predict which customers are likely to leave (churn).
Loan Risk Prediction Assess whether a customer is likely to repay a loan.
5. Image, Speech, and Text Processing

Application Description
Biometric Recognition Face and fingerprint recognition systems.
Speech Recognition Voice commands in Alexa, Google Assistant.
Handwriting Recognition Reading cheques and addresses from handwritten documents.
Text Mining Extract information from documents, emails, or online reviews.
Natural Language Processing (NLP) Chatbots, translators, and sentiment analysis.

6. Industrial and Manufacturing Systems

Application Description
Fault Detection Detect machine failures before they occur.
Control and Automation Intelligent machines that adapt production lines (CNC, FMS).
Predictive Maintenance Predict when a machine is likely to break down.

7. Energy and Utilities

Application Description
Load Forecasting Predict future electricity demand using time-series ML.
Smart Grids Optimize power distribution in real-time.

8. Transport and Autonomous Systems

Application Description
Self-driving Cars Identify pedestrians, traffic signs, make decisions.
Traffic Prediction Google Maps predicting congestion using ML.

9. Business Intelligence and CRM

Application Description
Customer Segmentation Group customers based on behavior.
Churn Prediction Identify users likely to stop using a service.
Forecasting Sales/Revenue Predict future growth and plan accordingly.

10. Science and Research

Application Description
Astronomy Analyze telescope data to find galaxies or black holes.
Geology Predict earthquakes or analyze seismic data.
Meteorology Forecast weather and climate patterns.
1.4 Data Representation

• In machine learning, data is the source of learning. For any learning algorithm to
work, raw data must be represented in a structured and understandable format, typically
as numerical vectors.
• Experience in the form of raw data is a source of learning in many applications. Raw data
require some preprocessing with respect to the class of tasks.
• This leads to an information system, that represents the knowledge in the raw data used for
decision making.
• The information-system data (representation-space data) may be stored in data warehouse.
• Data warehousing provides integrated, consistent and cleaned data to machine learning
algorithms. However, machine learning is not confined to analysis of data accessed online
from data warehouses. For many applications, we can assume availability of data in a flat
file, which is a simple data table.

➢ Why is Data Representation Important?

• ML algorithms work with numerical data in vector form.


• Proper representation helps in:
o Identifying patterns
o Measuring similarity between instances
o Applying mathematical models

➢ Types of Data Representation

Type              | Description                                  | Example
Structured Data   | Organized in tabular form (rows and columns) | Excel sheets, database tables
Unstructured Data | No fixed structure                           | Text documents, images, audio

Common Terms in Data Tables

Term Meaning
Instance / Pattern / Record / Sample A single row (one observation)
Feature / Attribute / Variable A single column (a property or measurement)
Label / Output / Target / Decision Attribute The desired output (e.g., classification label)
➢ Example: Simple Data Table

• Information system is a form of data table D; each row of the table represents a
measurement/ observation, and each column gives the values of an attribute of the
information system for all measurements/observations.
• Different terms have been used to name the rows depending on the context of application.
Some commonly used terms are: instances, examples, samples, measurements,
observations, records, patterns, objects, cases, events.
• Similarly, different terms have been used to name the columns; attributes and features
being the most common.
• For directed/supervised learning problems, an outcome for each observation is known a
priori. This knowledge is expressed by one distinguished attribute, called the decision
attribute. Information systems of this kind are called decision systems.
• The last column in the table represents a decision attribute with respect to the task of categorizing patients into two classes: {Flu: yes} and {Flu: no}; Flu is the decision attribute with respect to the condition attributes Headache, Muscle-pain, and Temperature.

• Here:
o Rows = Patients (instances)
o Columns = Attributes (Headache, Muscle Pain, Temperature)
o Flu = Decision attribute (label/output)

➢ Decision Systems

• If the data includes an output/label, it's called a decision system.


• For classification problems, the output is categorical.
• For regression problems, the output is numeric.
Example: Vector Representation

For patient 1:

• Headache = Yes → 1
• Muscle Pain = Yes → 1
• Temperature = High → 2

So the feature vector is x = (1, 1, 2), and the label (Flu) = Yes → 1.
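The mapping above can be written as a small encoding step. The dictionaries below simply reproduce the numeric codes chosen in this example (Yes → 1, High → 2, and so on); they are not a standard encoding.

```python
# Encode one patient record into a numeric feature vector.
# The code values (Yes -> 1, High -> 2, ...) follow the example above.
YES_NO = {"No": 0, "Yes": 1}
TEMPERATURE = {"Normal": 0, "Low": 1, "High": 2}  # assumed ordering

def encode(record):
    """Map a categorical patient record to a numeric feature vector."""
    return [
        YES_NO[record["Headache"]],
        YES_NO[record["Muscle Pain"]],
        TEMPERATURE[record["Temperature"]],
    ]

patient1 = {"Headache": "Yes", "Muscle Pain": "Yes", "Temperature": "High"}
print(encode(patient1))  # [1, 1, 2]
label = YES_NO["Yes"]    # Flu = Yes -> 1
```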

Types of Attributes (Features)

Type Description Examples


Numeric (Quantitative) Real or integer numbers Age, Temperature
Nominal (Categorical) Finite set of categories Gender (Male/Female), Color
Ordinal Categories with order Pain Level: Low, Medium, High

➢ Representation in Vector Form

• Each instance is represented as a vector:

x = (x1, x2, …, xn)

Where:

• x1, x2, …, xn are the n features
• x is the feature vector

Geometrical View

• If you have 2 features, each instance is a point in 2D space.


• For n features, the dataset lies in n-dimensional space.
• Distance between vectors tells how similar two instances are.
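Similarity between two instances is commonly measured by the Euclidean distance between their feature vectors; a minimal sketch:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

x1 = [1, 1, 2]   # e.g., an encoded patient
x2 = [0, 1, 0]
print(euclidean(x1, x2))  # sqrt(1 + 0 + 4) = sqrt(5) ≈ 2.236
```

Smaller distances mean the instances are closer together in the n-dimensional state space, i.e., more similar.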

➢ Data Table Template


• In the above table, the training experience is available in the form of N examples:
  s(i) ∈ S; i = 1, 2, …, N; where S is the set of possible instances.
• We specify an instance by a fixed number n of attributes/features xj; j = 1, 2, …, n (xj ∈ X). We can visualize each instance with n numerical features as a point in n-dimensional state space Rn.
• The pair (S, X) constitutes the information system, where S is a non-empty set of instances and X is a non-empty set of features; instances are indexed by ‘i’ and features by ‘j’:
  instances: s(i), i = 1, 2, …, N
  features: xj, j = 1, 2, …, n
  feature values: xj(i), i = 1, 2, …, N; j = 1, 2, …, n
• The tuple (S, X, Y) constitutes a decision system, where the xj ∈ X are the condition attributes and y(i) ∈ Y is the decision attribute (output).
• We can visualize Y as a one-dimensional region of the state space, i.e., Y ⊆ R.
• The observed outputs are y(i) ∈ {y(1), y(2), …, y(N)}.
➢ Time Series Representation (Brief Overview)

Used when data is sequential over time, e.g., stock prices.

• Input = Past values: y(t), y(t-1), y(t-2), ..., y(t-n)
• Output = Future value: y(t+1)

Key Concept Explanation


Data Representation Converting raw data into structured, numerical form
Feature Vector A list of features for one instance
Decision System Dataset with both input features and output labels
Structured Data Tabular format, easy to process
Importance Good representation leads to better learning and generalization

1.4.1 Time Series Forecasting – Detailed Notes

➢ Time series forecasting involves predicting future values of a variable based on its past
values.
➢ Definition: A time series is a sequence of data points indexed by time, usually recorded at
regular intervals (daily, hourly, etc.).
➢ Examples of Time Series Problems

Application What is Forecasted


Stock Market Tomorrow’s stock price
Weather Rainfall or temperature for next day
Power Industry Future electricity demand
Finance Exchange rates, inflation
Health Heart rate trend from past ECG data

➢ Key Characteristics

• Data points are not independent — each value depends on past values.
• Data is ordered over time — most recent values are more important.
• Often exhibits trends, seasonality, or noise.

➢ Mathematical Representation – NARMA Model

y(t+1) = f(y(t), y(t-1), ..., y(t-n))


Where:
- y(t) = value at time t
- y(t+1) = predicted value
- f(·) = function learned by ML
- n = number of lags

➢ Key Terms

Lag – Number of past values


Univariate – One variable over time
Multivariate – Multiple variables
k-step prediction – Predicting value k steps ahead
➢ Structure of Input-Output

(a) Univariate: x = [y(t-1), y(t-2), ..., y(t-n)] → y(t+1)


(b) Multivariate: x = [z1(t-1), ..., zm(t-Lm)] → y(t+k)

➢ Example: Stock Price Prediction

y(t+1) = f(y(t), y(t-1), y(t-2))


Input: [y(t), y(t-1), y(t-2)] → Output: y(t+1)

➢ How ML Learns in Time Series

1. Create dataset using past as input, future as label.


2. Train ML model to learn function f(·).
3. Use model to predict future values.
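Step 1 above (turning a series into input/label pairs) is a sliding window over the past values. The sketch below uses an illustrative lag count of n = 3 and made-up prices:

```python
def make_lag_dataset(series, n_lags):
    """Turn a time series into (input, label) pairs:
    x = [y(t-n), ..., y(t-1)] -> label y(t)."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # the past n values
        y.append(series[t])              # the value to predict
    return X, y

prices = [10, 11, 13, 12, 14, 15]        # toy daily closing prices
X, y = make_lag_dataset(prices, n_lags=3)
print(X)  # [[10, 11, 13], [11, 13, 12], [13, 12, 14]]
print(y)  # [12, 14, 15]
```

Any supervised learner can then be trained on (X, y) to approximate the function f(·).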

➢ Benefits of ML in Time Series

- Handles nonlinearity
- No strict statistical assumptions
- Learns from multiple variables

➢ Challenges

- Overfitting on small/noisy data


- High dimensionality
- Concept drift over time

1.5 Domain Knowledge for Productive Use of Machine Learning

• Domain Knowledge means understanding the specific field or application area where
machine learning is being used.
• It helps in making better decisions when preparing and analyzing data.

➢ Why is Domain Knowledge Important?

Machine Learning is not just about applying algorithms—it’s about solving real-world
problems.
For that:

Step in ML Process Role of Domain Knowledge


Problem formulation Understand what to predict and why
Feature selection Know which attributes matter most
Data preprocessing Detect incorrect, missing, or irrelevant data
Interpretation Evaluate if the model’s results make sense in real-life

• The design of learning systems requires knowledge of what various machine


learning algorithms do, along with a deep knowledge of the application domain.
• Knowledge of the domain is absolutely essential for success.
• Domain knowledge without knowledge of machine learning techniques is still useful, but knowledge of machine learning techniques without domain knowledge is of no productive use; it can lead to spurious results being accepted as valid conclusions, and strategic decisions based on such results can be disastrous.
• We have seen earlier that raw data when mapped to a vector space is N x n matrix
(data matrix); the N rows represent the N objects/instances/patterns, and the n
columns represent features/ attributes.
• For many practical applications, the features/attributes are numeric in nature.
Mapping of raw data to vector spaces requires appropriate processing of the
data.
• Today, raw data are no longer restricted to numerical measurements/observations
only.
• Machine intelligence is capable of dealing with multimedia data: image,
audio, text. Conversion of multimedia raw data to vector space is a tedious task
requiring in-depth knowledge of the application area.
• The problems of feature generation and feature selection must be addressed at the
outset of any machine learning system design. The key is to choose features that:
  o are computationally feasible;
  o lead to ‘good’ machine-learning success; and
  o reduce the problem data into a manageable amount of information without
    discarding valuable (or vital) information.
• Generation of features for patterns/objects in a machine learning problem is very
much application dependent.
• Domain knowledge plays a crucial role in generating the features that will
subsequently be fed to a machine learning algorithm.
• Each feature must carry valuable information with respect to the machine
learning task.
• Also, two features carrying ‘good’ information when treated separately, may be
highly mutually correlated; in that case, there is little gain in including both of
them in the feature vector.

➢ Example Scenarios

1. Healthcare Domain

• Without domain knowledge: You might use irrelevant symptoms for flu detection.
• With domain knowledge: You choose meaningful features like fever, cough, fatigue, etc.

2. Agriculture

• Predicting crop yield without knowing the effect of rainfall or soil type may fail.
• A domain expert knows which weather and soil parameters are important.

3. Manufacturing

• Predictive maintenance models require knowledge of machine behavior and failure signals.

➢ Impact on Feature Engineering

Domain experts help identify:

o Relevant features
o Transformations (e.g., BMI = weight/height² in medical analysis)
o Important correlations
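The BMI transformation mentioned above is a one-line derived feature that only a domain expert would think to compute from the raw columns:

```python
def bmi(weight_kg, height_m):
    """Derived feature from medical domain knowledge: BMI = weight / height^2."""
    return weight_kg / height_m ** 2

print(round(bmi(70, 1.75), 1))  # 22.9
```

A model given only raw weight and height columns may struggle to learn this ratio on its own; supplying it directly encodes the expert's knowledge.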
➢ Role in Preprocessing

Task Domain Expert's Contribution


Handling missing data Decide if missing values are critical or ignorable
Removing outliers Identify if an extreme value is real or error
Data transformation Convert raw features into usable formats
➢ Example: Customer Churn Prediction

• A marketing expert knows that:


o Low engagement (e.g., fewer logins) is a churn signal.
o Complaints filed could be more important than average purchase.

ML alone may not capture these without input from domain knowledge.

Key Role of Domain Knowledge Examples


Feature selection Choosing relevant symptoms in disease prediction
Data cleaning Interpreting outliers, handling missing values
Problem framing Choosing correct inputs and outputs
Interpretation Understanding if model predictions are realistic

➢ Domain knowledge bridges the gap between raw data and meaningful machine learning.
It helps ensure that the ML model not only performs well statistically but also makes sense in
the real-world context.

1.6 Diversity of Data: Structured and Unstructured

➢ What Does Data Diversity Mean?

Data in the real world comes in various forms and formats. Machine learning models must be
able to handle different types of data:

• Some data is well-organized and labeled


• Some data is free-form, complex, and messy

Understanding this diversity is essential for building effective ML systems.

• Generally, digital information can be categorized into two classes—structured and


unstructured. Studies have recently revealed that 70–80 per cent of all the data available
to corporations today is unstructured, and this figure is only increasing at a rapid rate.
• Usually, traditional data sources exist in the structured realm, which means, traditional
data follows a predefined format, which is unambiguous and is available in a
predefined order.
• For instance, for the stock trade, the first field received should be the date in a
MM/DD/YYYY format. Next could be an account number in a 12-digit data format,
followed by stock symbol—three to five digit character field, and so on. Structured data
fits into a fixed file format.
• We rarely have any control on unstructured data sources, for instance, text streams
from a social media site. It is not possible to ask users to follow specific standards of
grammar or sentence ordering or vocabulary. We will only get what we get when people
post something. This amounts to capturing everything possible, and worrying later
about what matters. This makes such data different from traditional data. The ‘big data’
problems are faced with this type of data that we get largely from unstructured data
sources.

➢ Structured Data

❖ Characteristics:

• Stored in tabular form: rows and columns (like in Excel or databases)


• Each row = one instance
• Each column = one attribute (feature)
• Easy to analyze using conventional ML tools

❖ Examples:

Customer ID Age Gender Purchase


C001 25 Male Yes
C002 30 Female No

• Rows: Individual customers


• Columns: Attributes like Age, Gender, etc.

❖ Sources:

• SQL Databases
• Spreadsheets (Excel)
• CSV files

➢ Unstructured Data

❖ Characteristics:

• No fixed format or schema


• Cannot be directly fed into traditional ML algorithms
• Needs feature extraction or preprocessing before use

❖ Examples:

Type of Data Description


Text Social media posts, emails, documents
Images Photographs, X-rays, medical scans
Audio Voice commands, recordings
Video CCTV footage, YouTube videos

These types of data are rich in information but harder to process.


• For the structured data, first-generation data mining algorithms are in effective use in
various kinds of applications wherein it becomes possible to derive numeric and categorical
features.
• Diversity of data, leading us to unstructured domain in a big way, poses a big research
challenge: mining sequence data, mining graphs and networks, mining spatiotemporal
data, mining cyber physical data, mining multimedia data, mining web data, mining data
streams, and other issues.
• In addition to complexity in data, the volume of data is too massive. Scaling to complex,
extremely large datasets—the big data—is probably the most debated current research
issue.
• The next-generation machine learning is on an evolving platform today.

➢ How ML Handles Unstructured Data?

• Requires advanced techniques such as:

• Text: Natural Language Processing (NLP)


• Images: Convolutional Neural Networks (CNNs)
• Audio: Signal Processing + Deep Learning

➢ Example:

• A model cannot understand a paragraph of text directly.


• Text must be converted into numerical form using techniques like:
o Bag-of-Words
o TF-IDF
o Word Embeddings (Word2Vec, GloVe)
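A minimal Bag-of-Words sketch, with the vocabulary built from the toy corpus itself (real systems add tokenization, stop-word removal, and usually TF-IDF weighting):

```python
def bag_of_words(docs):
    """Convert texts to word-count vectors over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = [[d.lower().split().count(w) for w in vocab] for d in docs]
    return vocab, vectors

docs = ["good movie", "bad movie", "good good plot"]
vocab, vecs = bag_of_words(docs)
print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(vecs)   # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

Each document becomes a numeric vector of the same length, so the standard ML machinery (distances, classifiers) can now be applied to text.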

➢ Structured vs Unstructured – Comparison Table

Feature Structured Data Unstructured Data


Format Tabular (rows/columns) Free-form (text, images, audio)
Ease of Use Easy to process Hard to process
Examples Excel sheets, SQL tables PDFs, tweets, X-rays
Processing Tools ML algorithms (SVM, DT, LR) Deep Learning (CNN, RNN, NLP)
Representation Numerical Needs transformation

➢ Semi-Structured Data

• Falls between structured and unstructured


• Has some organization, but not in a strict table
• Example: JSON, XML, HTML files

➢ Real-World Statistics

• Around 70–80% of all data generated today is unstructured


• Big data platforms (like Hadoop, Spark) are used to manage such data

Concept Meaning
Structured Data Organized, tabular format, easy to analyze
Unstructured Data Free-form, complex data (text, image, audio)
Semi-Structured Partially organized (like JSON)
Challenge Unstructured data must be converted into numerical features before use
Tools Traditional ML for structured, Deep Learning for unstructured

1.7 Forms of Learning in Machine Learning

• In the broadest sense, any method that incorporates information from experience in the design
of a machine, employs learning.
• A learning method depends on the type of experience from which the machine will learn (with
which the machine will be trained).
• The type of available learning experience can have significant impact on success or failure of
the learning machine.
• The field of machine learning usually distinguishes four forms of learning:
– Supervised learning
– Unsupervised learning
– Reinforcement learning and
– Learning based on natural processes—evolution, swarming, and immune systems.

1. Supervised Learning

Definition:

The algorithm is trained on a labeled dataset — each input comes with a known correct output
(called the label).
Objective:

Learn a mapping from labeled input-output examples so that the model can predict the correct output for new, unseen inputs.
Examples:

Input Features Output (Label)


Symptoms (Fever, Cough) Disease (e.g., Flu)
Email text Spam / Not Spam
House size, location Price

Use Cases:

• Classification (e.g., disease diagnosis, image recognition)


• Regression (e.g., price prediction)

• The machine is designed by exploiting the priori known information in the form of
‘direct’ training examples consisting of observed values of system states
– (input vectors): x(1), …, x(N) , and
– the response (output) to each state: y(1), …, y(N).
• The ‘supervisor’ has, thus, provided the following data:

D = {s(i), y(i)}; i = 1, …, N

where each instance is the feature vector s(i) = {x1(i), x2(i), …, xn(i)}

• The dataset D is used for inferring a model of the system. If the data D lies in the region
X of the state space Rn (X ⊆ Rn), then X must be fully representative of the situations over
which our machine will later be used.
• There are two types of tasks for which supervised/directed learning is used:
– Classification (pattern recognition) and
– Regression (numeric prediction)
• Supervised learning is used to train the machine using labeled data.
• This algorithm takes labeled inputs and maps with the known outputs.
• Supervised learning models need external supervisor to train models.
Example of Supervised learning

A person’s age and weight

• In supervised learning, there is one dependent feature and there can be any number
of independent features.
• In our example,
– Age is an independent feature and
– Weight is a dependent feature.

Age Weight
20 45
24 49
24 48
23 46
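Fitting the age-weight data above with ordinary least squares (weight = a·age + b) is a minimal supervised-regression sketch; the closed-form slope/intercept formulas are standard, and the data is the table above:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

ages = [20, 24, 24, 23]      # independent feature
weights = [45, 49, 48, 46]   # dependent feature
a, b = fit_line(ages, weights)
print(round(a * 22 + b, 2))  # predicted weight at age 22
```

Here the supervisor's labels are the weights; the learned function generalizes to ages not in the table.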
Classification

• Training data {x, y} are the input-output data;


– x is an input vector with n features xj; j = 1, …, n, as its components and
– output y is a discrete class yq ; q = 1, …, M.
• In classification tasks, the goal is to predict the output values for new inputs (i.e., deciding
which of the M classes each new vector x belongs to) based on training from examples of
each class.

Regression:

• Training data {x, y} are the input-output data; x are the regressors, and y is a continuous
numeric quantity.
• Regression task consists of fitting a function to the input-output data to predict output values
(numeric) for new inputs.
• Classification and regression tasks arise in many applications, such as, signal processing,
optimization, modeling and identification, control, and many business applications.
2. Unsupervised Learning

Definition:

➢ The algorithm is given data without any labels and must discover hidden patterns or
structure in the data.
➢ Another form of machine learning tasks is when output y(i) is not available in training data.
➢ In this type of problem, we are given a set of feature vectors x(i), and the goal is to unravel the
underlying similarities.
➢ Two different types of learning tasks frequently appear in the real-world applications of
unsupervised learning.
➢ Cluster analysis
➢ Association analysis
➢ Unsupervised learning uses unlabeled data to train the machine.
➢ Understands patterns and trends in the data and discovers the output.
➢ Unsupervised learning techniques do not need any supervision to train models.

Objective:

Group or organise data based on similarity, without predefined categories.

Examples:

• Grouping customers based on buying behaviour


• Segmenting regions in a satellite image
• Dimensionality reduction (e.g., PCA)

Use Cases:

• Clustering (K-Means, DBSCAN)


• Association Rule Mining (e.g., Market Basket Analysis)
• Feature reduction

➢ Cluster analysis

• Cluster analysis is employed to create groups or clusters of similar records on the basis of
many measurements made for these records.
• A primary issue in clustering is that of defining ‘similarity’ between feature vectors x(i) ; i
= 1, 2, …, N, representing the records.
• Another critical issue is the selection of an algorithmic scheme that will cluster (group) the
vectors based on the accepted similarity measure.
• Clustering jobs emerge in several applications. Biologists, for instance, make use of classes
and subclasses to organise species.
• A widespread application of cluster analysis in marketing is for market segmentation,
where customers are grouped based on demographic and transaction history information,
and a tailored marketing strategy is developed for each segment.
• Other application domains for cluster analysis are remote sensing, image segmentation,
image and speech coding, and many more.
• After cluster patterns have been detected, it is the responsibility of the investigator to
interpret them and decide whether they are helpful.
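The clustering idea can be sketched with a tiny k-means on one-dimensional data. The data, k = 2, and the naive "first two points" initialization are all purely illustrative (real implementations use smarter initialization such as k-means++):

```python
def kmeans_1d(points, k=2, iters=10):
    """Tiny k-means on scalar data: assign each point to the nearest
    center, then move each center to the mean of its cluster."""
    centers = points[:k]  # naive initialization, for the sketch only
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

spend = [10, 12, 11, 95, 90, 99]   # e.g., monthly customer spend
centers, clusters = kmeans_1d(spend)
print(centers)   # two segment centers: low spenders vs high spenders
```

This mirrors market segmentation: the algorithm finds the groups, and the investigator must then interpret what each segment means.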
➢ Association analysis
• Association analysis emerged with the study of customer transaction databases to establish
an association between purchases of different items/services on offer.
• This common area of application is known as market basket analysis, which studies
customers’ purchase patterns for products that are bought together (example:
Amazon.com).
• Other application domains for association analysis are medical diagnosis, scientific data
analysis, web mining, and many more.

3. Semi-Supervised Learning

Definition:

The dataset contains a small amount of labelled data and a large amount of unlabeled data.

Objective:

Improve learning performance by using both types of data.

Examples:

• Labeling few medical images and using thousands of unlabeled ones to build a better
model
• Email classification with only a few labelled messages

Use Case:

Common in areas where labelling is expensive (e.g., medical, legal data)

4. Reinforcement Learning

Definition:

An agent learns by interacting with an environment, receiving feedback in the form of


rewards or penalties.

Objective:

Learn an optimal policy to maximize cumulative reward.

Examples:

• Robot learning to walk


• Self-driving car navigating traffic
• Playing games like chess, Go (AlphaGo)
➢ Reinforcement learning is a feedback-based machine learning approach.
➢ Here, an agent learns which action to perform by looking at the environment and the
results of actions.
➢ For each correct action, the agent gets positive feedback, and for each incorrect action,
the agent gets negative feedback or a penalty.

Terminology:

• Agent: Learner or decision-maker


• Environment: What the agent interacts with
• State: Current situation of the agent
• Action: What the agent can do
• Reward: Feedback signal (+ve or −ve)

Example for Reinforcement:

Reinforcement Learning is a feedback-based learning technique where an agent learns to take
actions in an environment to maximize rewards.

Key Elements of the Example:

1. Dog (Agent):
o The dog represents the agent in reinforcement learning.
o The agent interacts with the environment and makes decisions based on its state.
2. Sitting → Walk (Action/State Transition):
o Initially, the dog is in a "Sitting" state.
o An action (here, transitioning from sitting to walking) changes the state.
o This represents the agent performing an action based on its current state.
3. Reward (Positive Feedback):
o Once the dog performs the correct action ("Walk"), it receives a reward (a treat).
o In RL, rewards are used to reinforce correct behavior.
o Over time, the agent learns which actions lead to higher rewards.

Learning Process in RL (as shown):

1. The agent (dog) starts in a state (sitting).


2. It takes an action (walk).
3. The environment provides a reward (treat) if the action is correct.
4. This reward helps the agent learn that walking leads to a positive outcome.
5. With repeated interactions, the agent learns to choose actions that maximize rewards.
Explanation of the Grid World Example:

Scenario:

• A robot agent (bottom-left corner) starts at the cell labeled "Start".


• The goal is for the agent to navigate through the grid to reach a positive outcome while
avoiding negative ones.

Key Elements of the Grid World:

1. Robot (Agent):
o Represents the learning agent that moves within the environment.
o It decides its actions (up, down, left, right) at each step.
2. Grid Cells (Environment):
o Each cell is a state the agent can occupy.
o The environment provides feedback (reward or penalty) based on the cell the
agent moves to.
3. Green Flag Cell (Goal):
o Top-right corner with checkered flags on a green background.
o This is the goal state.
o Entering this cell gives the agent a positive reward (success).
4. Red Fire Cell (Danger):
o Adjacent to the goal, with a flame icon on a red background.
o This is a penalty state.
o Entering this cell gives a negative reward (failure).
5. Gray Cell (Obstacle):
o The gray block is an obstacle or blocked path.
o The agent cannot pass through this cell, representing an invalid move.

Objective of the Agent:

• The agent must learn a policy (a strategy) to:


o Maximize cumulative rewards
o Avoid penalties
o Find the optimal path to the green goal while navigating around obstacles and
avoiding the red danger zone.

Learning Process:

1. The agent tries various paths (trial-and-error).


2. Receives feedback based on the cells it enters.
3. Adjusts its strategy over time to improve.
4. Eventually, it learns the optimal path to reach the goal efficiently.
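The trial-and-error loop above can be sketched with tabular Q-learning on a tiny one-dimensional corridor (states 0 to 4, goal at the right end). A deterministic sweep over state-action pairs stands in for exploration, and all constants are illustrative:

```python
# Tabular Q-learning sketch on a 1-D corridor: states 0..4,
# actions L(-1)/R(+1); reaching state 4 gives reward +1, else 0.
ALPHA, GAMMA = 0.5, 0.9
q = {(s, a): 0.0 for s in range(5) for a in (-1, +1)}

def step(s, a):
    s2 = max(0, min(4, s + a))
    return s2, (1.0 if s2 == 4 else 0.0)

# Sweep all state-action pairs repeatedly (deterministic "exploration").
for _ in range(50):
    for s in range(4):            # state 4 is terminal
        for a in (-1, +1):
            s2, r = step(s, a)
            best_next = 0.0 if s2 == 4 else max(q[(s2, b)] for b in (-1, +1))
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

policy = [max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)]
print(policy)   # [1, 1, 1, 1]: the learned policy moves right toward the goal
```

The Q-values decay geometrically with distance from the goal (1.0, 0.9, 0.81, ...), which is exactly the "cumulative reward" the agent maximizes.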
5. Online Learning

Definition:

Model learns incrementally from a stream of data, one observation at a time.

Advantage:

Used in real-time applications where data keeps updating.

Examples:

• Real-time stock price prediction


• Personalized news recommendation
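A minimal sketch of incremental learning: a running mean that updates from one observation at a time with O(1) memory, a stand-in for streaming estimators such as a live price average:

```python
# Incremental mean: updates one observation at a time, O(1) memory.
class RunningMean:
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n   # online update rule
        return self.mean

stream = [10.0, 12.0, 11.0, 13.0]
rm = RunningMean()
for price in stream:
    rm.update(price)
print(rm.mean)   # 11.5, same as the batch mean, without storing the stream
```

The same "adjust the estimate by a fraction of the error" pattern underlies online gradient descent, where the model is nudged after every new example instead of retraining on the whole dataset.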

6. Batch (Offline) Learning

Definition:

Model is trained on the entire dataset at once, and then deployed.

Examples:

• Train a loan approval model on 10,000 historical records


• Train an image recognition system using a fixed dataset

Forms of Learning: Comparative Table

Form | Data Type | Goal | Examples
Supervised Learning | Labeled | Predict output | Email spam detection, disease prediction
Unsupervised Learning | Unlabeled | Discover structure | Customer segmentation, clustering
Semi-Supervised Learning | Few labels + many unlabeled | Improve with less labelling effort | Medical image classification
Reinforcement Learning | Reward-based | Learn best actions | Game playing, robotic control
Online Learning | Real-time data | Adapt continuously | Stock market, news recommendation
Batch Learning | Fixed data | Train once, deploy | Loan approval, weather prediction
➢ Learning Based on Natural Processes: Evolution, Swarming, and Immune
Systems
Some learning approaches take inspiration from nature for the development of novel problem
solving techniques.
The thread that ties together learning based on evolution process, swarm intelligence, and
immune systems is that all have been applied successfully to a variety of optimization
problems.

➢ Evolutionary Computation
➢ It derives ideas from evolutionary biology to develop search and optimization methods that
help solve complicated problems.
➢ Evolutionary biology essentially states that a population of individuals possessing the ability
to reproduce and exposed to genetic variation followed by selection, gives rise to new
populations, which are fitter to their environment.
➢ Computational abstraction of these processes gave rise to the so-called evolutionary
algorithms. The primary streams of evolutionary computation are genetic algorithms,
evolution strategies, evolutionary programming, and genetic programming.
➢ Even though differences exist among these models, they all present the fundamental traits of
an evolution process.
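The reproduce-vary-select cycle can be sketched with a toy genetic algorithm on the classic OneMax problem (maximize the number of 1-bits in a string); the population size, mutation rate, and generation count below are arbitrary choices:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def fitness(bits):                 # OneMax: count of 1-bits
    return sum(bits)

def crossover(a, b):               # one-point crossover (genetic variation)
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):       # flip each bit with small probability
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(40):                # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]             # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(20)]
    pop = parents + children       # new, fitter population

print(max(fitness(ind) for ind in pop))
```

Reproduction, variation (crossover and mutation), and selection are all present, which is exactly the "fundamental traits of an evolution process" shared by the algorithm families above.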

➢ Swarm Intelligence
➢ Swarm intelligence is a property of systems of simple agents with limited individual
abilities that collectively display intelligent behavior.
➢ It includes algorithms derived from the collective behavior of social insects and other animal
societies.
➢ The primary lines of research that can be recognized within swarm intelligence are: (i) Based
on social insects (Ant Colony Optimization) (ii) Based on the ability of human societies to
process knowledge (Particle Swarm Optimization).

➢ Ant Colony Optimization (ACO):


Ants, seemingly small, simple creatures, cooperate to solve complex problems, such as
finding the most effective route to a food source, that seem well beyond the ability of individual
members of the colony.

➢ Particle Swarm Optimization (PSO):

The particle swarm algorithm is motivated, among other things, by the creation of a
simulation of human social behavior—the quality of human societies to process knowledge.
Particle swarm considers a population of individuals possessing the ability to interact with
the environment and one another. Therefore, population-level behaviors will arise from
individual interactions.
Although the approach was initially inspired by particle systems and the collective
behavior of some animal societies, the algorithm primarily emphasizes the social adaptation
of knowledge.
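A minimal PSO sketch, minimizing f(x) = x² in one dimension, shows how each particle blends its own memory (pbest) with the swarm's shared knowledge (gbest); the inertia and acceleration constants are conventional defaults, not taken from the text:

```python
import random

random.seed(1)
f = lambda x: x * x                     # function to minimize

# Initialise particle positions and velocities.
pos = [random.uniform(-10, 10) for _ in range(10)]
vel = [0.0] * 10
pbest = pos[:]                          # each particle's best-so-far
gbest = min(pos, key=f)                 # swarm's best-so-far

W, C1, C2 = 0.7, 1.5, 1.5               # inertia, cognitive, social weights
for _ in range(100):
    for i in range(10):
        r1, r2 = random.random(), random.random()
        # Velocity blends own memory (pbest) and swarm knowledge (gbest).
        vel[i] = (W * vel[i] + C1 * r1 * (pbest[i] - pos[i])
                  + C2 * r2 * (gbest - pos[i]))
        pos[i] += vel[i]
        if f(pos[i]) < f(pbest[i]):
            pbest[i] = pos[i]
        if f(pos[i]) < f(gbest):
            gbest = pos[i]

print(gbest)   # the swarm converges very close to the minimum at x = 0
```

Population-level convergence emerges purely from individual interactions: no particle knows the function's shape, yet the shared gbest term pulls the swarm toward the optimum.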

➢ Artificial Immune Systems


➢ All living beings possess the ability to resist disease-causing agents or pathogens in the form
of bacteria, viruses, parasites and fungi.
➢ The main role of the immune system is to act as a shield for the body, protecting it from
infections caused by pathogens.
➢ An Artificial Immune System (AIS) replicates certain aspects of the natural immune system
and is primarily applied to pattern-recognition and data-clustering problems.
➢ The natural immune system has an extraordinary ability to match patterns, employed to
differentiate between foreign cells making an entry into the body (referred to as antigen) and
the cells that are part of the body.
➢ As the natural immune system faces and handles antigens, it exhibits its adaptive nature: the
immune system memorizes the structure of these antigens to ensure a speedier response to the
antigens in the future.
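The self/non-self discrimination idea can be sketched with a toy negative-selection algorithm: candidate detectors that match any "self" pattern are censored, and the survivors flag antigens. The strings and the matching rule below are invented:

```python
# Negative-selection sketch: a detector "matches" a string when they
# agree in at least 3 of 4 positions (an invented matching rule).
def matches(detector, s):
    return sum(a == b for a, b in zip(detector, s)) >= 3

self_set = ["AAAA", "AAAB"]                        # normal ("self") patterns
candidates = ["AAAC", "BBBB", "ABBB", "CCCC"]

# Censoring phase: discard detectors that match any self pattern.
detectors = [d for d in candidates
             if not any(matches(d, s) for s in self_set)]

# Monitoring phase: surviving detectors flag non-self (antigen) input.
print(detectors, matches(detectors[0], "BBBB"))
# ['BBBB', 'ABBB', 'CCCC'] True
```

"AAAC" is censored because it resembles self, so the system never raises a false alarm on the body's own patterns, yet the remaining detectors still recognize the foreign string "BBBB".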

1.8 Machine Learning and Data Mining –

Data mining: The process of extracting useful information from a huge amount of data is called
data mining. It is a tool used by humans to discover new, accurate, and useful patterns in data:
meaningful, relevant information for those who need it.

Machine learning: The process of building algorithms that improve automatically through
experience derived from data is known as machine learning. Such algorithms permit the machine
to learn without human intervention; it is a tool to make machines smarter, reducing reliance on
the human element.

What is the Relationship Between Machine Learning and Data Mining?

• Both Machine Learning (ML) and Data Mining (DM) involve analyzing data.
• They often use similar techniques (e.g., classification, clustering, decision trees).
• However, their goals and usage are different.
What is Machine Learning (ML)?

➢ Machine Learning is the science of making computers learn from data and improve their
performance at a task without being explicitly programmed. The main goals are:

• To build predictive models that generalize well to unseen data.


• Focuses on learning from labeled or unlabeled data to make predictions or decisions.

➢ Example Tasks:

• Classifying images as cats or dogs


• Predicting student exam scores
• Diagnosing a disease from symptoms

What is Data Mining (DM)?

• Data Mining is the process of discovering hidden patterns, trends, and associations in
large datasets. The main goals are:
o To extract useful knowledge from data.
o To focus on exploration, summarization, and pattern discovery rather than
prediction.
• Example Tasks:

• Finding customer buying habits in a supermarket


• Discovering fraudulent transactions
• Identifying which products are often bought together

➢ Key Differences Between ML and DM

Aspect | Machine Learning | Data Mining
Main Goal | Learn a model for prediction | Discover patterns and insights
Focus | Accuracy, generalization | Knowledge discovery, understanding data
Data Requirement | Can use labeled or unlabeled data | Often uses historical data
Typical Output | Predictive model (classifier, regressor) | Patterns, clusters, rules, associations
Application | Self-driving cars, diagnosis, NLP | Business analysis, marketing, fraud detection

➢ Similarities Between ML and DM

• Use common techniques:


o Clustering (e.g., K-Means)
o Classification (e.g., Decision Trees)
o Association Rule Mining (e.g., Apriori)
• Both require:
o Data preprocessing
o Feature selection
o Model evaluation

Data Mining | Machine Learning
Extracts useful information from a large amount of data | Introduces algorithms that learn from data as well as past experience
Used to understand the data flow | Teaches the computer to learn from and understand the data flow
Works on huge databases with unstructured data | Works on existing data as well as algorithms
Models can be developed using data mining techniques | ML algorithms are used in decision trees, neural networks, and other areas of artificial intelligence
Involves considerable human intervention | Requires no human effort after design
Used in cluster analysis | Used in web search, spam filtering, fraud detection, and computer design
Abstracts knowledge from the data warehouse | Learns directly from the data fed to the machine
More of a research activity using methods such as machine learning | Self-learning; trains the system to perform intelligent tasks

Data Mining | Machine Learning
Applied in a limited area | Can be used in a vast area
Uncovers hidden patterns and insights | Makes accurate predictions or decisions based on data
Exploratory and descriptive | Predictive and prescriptive
Historical data | Historical and real-time data
Output: patterns, relationships, and trends | Output: predictions, classifications, and recommendations
Clustering, association rule mining, outlier detection | Regression, classification, clustering, deep learning
Data cleaning, transformation, and integration | Data cleaning, transformation, and feature engineering
Strong domain knowledge is often required | Domain knowledge is helpful, but not always necessary
Wide range of applications, including business, healthcare, and social science | Primarily used where prediction or decision-making is important, such as finance, manufacturing, and cybersecurity
Simple Analogy

Field | Analogy
Machine Learning | Teaching a student how to solve future problems
Data Mining | Reading a student’s old exam papers to understand their habits

Criteria | Machine Learning (ML) | Data Mining (DM)
Predictive or Descriptive | Predictive | Descriptive
Real-time or Historical | Can be real-time | Mostly historical data
Examples | Email spam filter, image recognition | Market basket analysis, fraud patterns
Output | Model (classifier/regressor) | Patterns, associations, summaries

➢ Machine Learning and Data Mining both use data, but:

• ML focuses on learning to predict


• DM focuses on understanding and extracting knowledge

They are complementary: data mining can prepare data for machine learning, and ML can
automate pattern discovery.

1.9 Basic Algebra in Machine Learning Techniques

➢ Why Do We Need Algebra in Machine Learning?

Machine Learning heavily uses linear algebra to represent data, perform computations, and
build models.

All ML algorithms work on vectors, matrices, and linear transformations.

Key Algebraic Concepts Used in ML: Scalars, Vectors, Matrices


2. Vector Notation in ML

• An input data point is represented as a column vector x = [x1, x2, ..., xd]^T, where d is the number of features.

• If there are N data points (samples), all data can be combined into an N × d matrix X:

Where:

• Each row = one input vector (sample)


• Each column = one feature
3. Matrix Operations

a) Addition/Subtraction

• Only valid if matrices are of same shape


• Element-wise operation

b) Transpose

• Flips rows and columns.

c) Matrix Multiplication

• Used in ML models like Linear Regression and Neural Networks.

• Dimensions must match for multiplication to be valid.


• Each output value is a dot product of a row of A and the vector x.

d) Dot Product (Inner Product)

• For two vectors a and b, the dot product a · b = a^T b = a1*b1 + a2*b2 + ... + ad*bd is a scalar that measures how aligned the two vectors are.

4. Identity and Inverse Matrix

• The identity matrix I has ones on the diagonal and zeros elsewhere; for a square, non-singular matrix A, the inverse A^-1 satisfies A A^-1 = A^-1 A = I.
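The matrix operations above can be checked in NumPy (a standard choice for these computations; the matrix and vectors are arbitrary examples):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
x = np.array([5., 6.])

# Transpose: flips rows and columns.
print(A.T.tolist())            # [[1.0, 3.0], [2.0, 4.0]]

# Matrix-vector product: each entry is a dot product of a row of A with x.
print((A @ x).tolist())        # [17.0, 39.0]

# Dot (inner) product of two vectors.
print(float(np.dot(x, x)))     # 61.0

# Identity and inverse: A @ A^{-1} equals the identity matrix I.
I = np.eye(2)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, I))   # True
```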


➢ Application in ML Algorithms

ML Technique | Algebra Used
Linear Regression | Dot product for prediction: y = w^T x
Gradient Descent | Vector derivatives for optimization
PCA | Eigenvalues and eigenvectors of the covariance matrix
Neural Networks | Matrix multiplications in the forward/backward pass
SVM | Inner products for similarity and classification
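The linear-regression row can be made concrete with the closed-form normal equation w = (X^T X)^(-1) X^T y. The data below is made up to lie exactly on y = 2x + 1:

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0., 1., 2., 3.])
y = 2 * x + 1

# Design matrix with a bias column of ones.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.round(w, 6).tolist())          # recovers [intercept, slope] = [1.0, 2.0]

# Prediction is a dot product: y_hat = w^T x_new (with the bias term).
print(float(np.array([1., 10.]) @ w))   # predicts 21.0 at x = 10
```

Every step here is one of the matrix operations listed above: transpose, matrix multiplication, inversion, and finally a dot product for prediction.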

➢ Geometrical View of Vectors

• Each vector = point in space.


• ML tries to find a hyperplane (linear separator) using vector algebra.
• Distance between vectors = similarity between data points.
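Both notions of similarity can be computed directly: Euclidean distance measures how far apart two points are, while cosine similarity measures direction agreement. The vectors are arbitrary:

```python
import math

def euclidean(u, v):
    # Straight-line distance between two points in space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    # Dot product normalised by vector lengths: 1 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

u, v = [3.0, 4.0], [6.0, 8.0]
print(euclidean(u, v))   # 5.0  (the points differ in magnitude)
print(cosine(u, v))      # 1.0  (same direction, maximal similarity)
```

The contrast matters in practice: v is just u scaled by two, so the vectors are far apart in Euclidean terms yet perfectly similar in direction.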

➢ Linear algebra is the language of machine learning.

• It helps represent and compute with data in a clean, mathematical way.


• Knowing basic operations like vectors, matrices, and dot products is essential for
understanding ML algorithms.
