Data Representation: Data representation refers to the techniques used to transform and present input data in a
format suitable for training and evaluating machine learning models. Effective data
representation is crucial for ensuring that models can learn meaningful patterns and
relationships from the input features. Different types of data, such as numerical, categorical, and
text data, may require specific representation methods.
Numerical Data
Numerical features often have different scales, and models might be sensitive to these
variations. Scaling methods, such as Min-Max scaling or Z-score normalization, ensure that
numerical features are on a similar scale, preventing certain features from dominating the model
training process.
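As a minimal sketch, the snippet below applies both scaling methods using scikit-learn on a small made-up feature matrix (the ages and incomes are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numerical features on very different scales:
# column 0 = age (years), column 1 = annual income (currency units)
X = np.array([[25, 40_000],
              [32, 85_000],
              [47, 120_000],
              [51, 30_000]], dtype=float)

# Min-Max scaling maps each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Z-score normalization gives each feature zero mean and unit variance.
X_zscore = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_zscore)
```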
Categorical Data
One-Hot Encoding: Categorical variables, which represent discrete categories, need to be
encoded numerically for machine learning models. One-hot encoding is a common method
where each category is transformed into a binary vector, with a 1 indicating the presence of the
category and 0 otherwise.
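A minimal sketch of one-hot encoding with pandas, using a hypothetical colour column:

```python
import pandas as pd

# Hypothetical categorical column with three discrete categories.
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One-hot encoding: each category becomes its own binary column,
# with 1 marking the presence of that category and 0 otherwise.
encoded = pd.get_dummies(df, columns=["colour"])
print(encoded)
```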
Text Data
Vectorization: Text data needs to be converted into a numerical format for machine learning
models. Techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or word
embeddings, such as Word2Vec or GloVe, are used to represent words or documents as
numerical vectors.
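A minimal TF-IDF sketch using scikit-learn's TfidfVectorizer on a tiny made-up corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus of three short documents.
corpus = [
    "machine learning models need numerical input",
    "text data must be vectorized",
    "TF-IDF weighs rare but informative words more heavily",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)   # sparse matrix: documents x vocabulary

print(X.shape)
print(vectorizer.get_feature_names_out())
```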
Time Series Data
Temporal Features: For time series data, relevant temporal features may be extracted, such as
day of the week, month, or time of day. Additionally, lag features can be created to capture
historical patterns in the data.
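A minimal sketch with pandas, assuming a hypothetical daily sales series, showing both temporal features and lag features:

```python
import pandas as pd

# Hypothetical daily sales series.
sales = pd.DataFrame(
    {"sales": [120, 135, 128, 150, 160, 155, 170]},
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

# Temporal features extracted from the timestamp.
sales["day_of_week"] = sales.index.dayofweek
sales["month"] = sales.index.month

# Lag features capture recent history (sales one and two days earlier).
sales["lag_1"] = sales["sales"].shift(1)
sales["lag_2"] = sales["sales"].shift(2)

print(sales)
```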
Image Data
Pixel Values: Images are typically represented as grids of pixel values. Deep learning models,
particularly convolutional neural networks (CNNs), directly operate on these pixel values to
extract hierarchical features.
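A minimal sketch, using a made-up 4x4 grayscale image, of how pixel values are commonly scaled and reshaped before being fed to a CNN:

```python
import numpy as np

# Hypothetical 4x4 grayscale image: a grid of pixel intensities in [0, 255].
image = np.array([[  0,  50, 100, 150],
                  [ 25,  75, 125, 175],
                  [ 50, 100, 150, 200],
                  [ 75, 125, 175, 255]], dtype=np.uint8)

# A common preprocessing step: scale intensities to [0, 1]
# and add batch and channel dimensions expected by a CNN.
pixels = image.astype(np.float32) / 255.0
batch = pixels.reshape(1, 4, 4, 1)   # (batch, height, width, channels)

print(batch.shape)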
Composite Data
Combining Representations: In some cases, datasets may consist of a combination of
numerical, categorical, and text features. Representing such composite data involves using a
combination of the methods mentioned above, creating a comprehensive and effective input
format for the model.
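A minimal sketch using scikit-learn's ColumnTransformer to combine the representations above on a hypothetical dataset with numerical, categorical, and text columns:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical composite dataset mixing numerical, categorical, and text columns.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Delhi", "Mumbai", "Delhi"],
    "review": ["great product", "poor delivery", "great service"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),      # scale the numerical feature
    ("cat", OneHotEncoder(), ["city"]),      # encode the categorical feature
    ("txt", TfidfVectorizer(), "review"),    # vectorize the text feature
])

X = preprocess.fit_transform(df)
print(X.shape)
```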
Domain Knowledge: Domain knowledge in machine learning refers to expertise and
understanding of the specific field or subject matter to which the machine learning model is
applied. While machine learning algorithms are powerful tools for analyzing data and making
predictions, they often require domain experts to ensure that the models interpret the data
correctly and make meaningful predictions.
Objective Definition: Domain experts are integral throughout the machine learning process,
from defining objectives to deploying models.
Data Collection: Collecting relevant datasets from diverse sources, aligned with
domain intricacies and data availability.
Data Preprocessing: Cleaning, transforming, and encoding data to ensure quality and
compatibility with the chosen machine learning algorithms.
Model Selection & Tuning: Selecting appropriate algorithms and fine-tuning model
parameters, guided by domain knowledge to optimize performance and interpretability.
Interpretation of Results: Domain experts interpret model outputs, validating predictions
against domain-specific knowledge and contextual understanding.
Deployment: Deploying the trained model into production environments, considering
domain constraints and scalability requirements for real-world applications.
Diversity of Data: "Diversity of data" refers to the different types of data structures that can be
used for training models, primarily categorized as "structured" data (organized in neat tables with
predefined fields) and "unstructured" data (like text, images, or audio, which lack a clear,
consistent format).
"structured data" refers to information neatly organized in a predefined format, like a table with
rows and columns, making it easy to analyze with traditional tools, while "unstructured data" is
information that doesn't fit into a structured format, like text documents, images, or audio files,
requiring specialized techniques to extract meaningful insights.
Key points about structured data:
Well-defined format:
Data is organized with clear labels and data types, usually stored in relational databases.
Examples:
Customer details with name, address, phone number, product sales data, financial records.
Analysis methods:
Easily analyzed using standard statistical methods and traditional machine learning algorithms.
Structured data applications:
Recommendation systems based on user purchase history
Fraud detection in financial transactions
Customer churn prediction based on demographic data
Key points about unstructured data:
No predefined format: Data exists in its native format, like a text document or image, without a
structured organization.
Examples: Social media posts, emails, scanned documents, videos, audio recordings.
Analysis methods: Requires advanced techniques like Natural Language Processing (NLP) for
text analysis or computer vision for image recognition to extract meaningful information.
How they are used in applied machine learning:
Unstructured data applications:
Sentiment analysis of customer reviews
Image recognition for object detection in security systems
Text summarization to extract key points from documents
Data Mining vs. Machine Learning:
1. Data Mining: Extracting useful information from large amounts of data. Machine Learning: Introduces algorithms that learn from data as well as from past experience.
2. Data Mining: Used to understand the data flow. Machine Learning: Teaches the computer to learn and understand from the data flow.
3. Data Mining: Works on huge databases, often with unstructured data. Machine Learning: Works on existing data as well as algorithms.
4. Data Mining: Models can be developed using data mining techniques. Machine Learning: Machine learning algorithms such as decision trees and neural networks are used here and in other areas of artificial intelligence.
5. Data Mining: Human interference is greater. Machine Learning: No human effort is required after the design stage.
6. Data Mining: Used in cluster analysis. Machine Learning: Used in web search, spam filtering, fraud detection, and computer design.
7. Data Mining: Abstracts information from the data warehouse. Machine Learning: Reads data from machines.
8. Data Mining: More of a research process that uses methods such as machine learning. Machine Learning: Self-learning; trains the system to perform intelligent tasks.
9. Data Mining: Applied in a limited area. Machine Learning: Can be used in a vast area.
10. Data Mining: Uncovering hidden patterns and insights. Machine Learning: Making accurate predictions or decisions based on data.
11. Data Mining: Exploratory and descriptive. Machine Learning: Predictive and prescriptive.
12. Data Mining: Historical data. Machine Learning: Historical and real-time data.
13. Data Mining: Patterns, relationships, and trends. Machine Learning: Predictions, classifications, and recommendations.
14. Data Mining: Clustering, association rule mining, outlier detection. Machine Learning: Regression, classification, clustering, deep learning.
15. Data Mining: Data cleaning, transformation, and integration. Machine Learning: Data cleaning, transformation, and feature engineering.
16. Data Mining: Strong domain knowledge is often required. Machine Learning: Domain knowledge is helpful, but not always necessary.
17. Data Mining: Can be used in a wide range of applications, including business, healthcare, and social science. Machine Learning: Primarily used in applications where prediction or decision-making is important, such as finance, manufacturing, and cybersecurity.
Linear Algebra for Machine learning
Machine learning has a strong connection with mathematics. Each machine learning algorithm is
based on mathematical concepts, and mathematics also helps in choosing the
correct algorithm by considering training time, complexity, number of features, etc. Linear
Algebra is an essential field of mathematics, which defines the study of vectors, matrices,
planes, mapping, and lines required for linear transformation.
The term Linear Algebra was initially introduced in the early 18th century to find the
unknowns in linear equations and solve them easily; hence it is an important branch of
mathematics that helps in studying data. Linear Algebra is also an important and primary
foundation for the applications of Machine Learning, and it is a
prerequisite for starting to learn Machine Learning and data science.
Linear algebra plays a vital role and forms a key foundation in machine learning, and it enables ML
algorithms to run on huge datasets.
The concepts of linear algebra are widely used in developing algorithms in machine learning.
Although it is used in almost every area of Machine Learning, it can specifically perform the
following tasks:
o Optimization of data.
o Applicable in loss functions, regularisation, covariance matrices, Singular Value
Decomposition (SVD), matrix operations, and support vector machine classification.
o Implementation of Linear Regression in Machine Learning (a minimal sketch is shown below).
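Below is a minimal sketch, using only NumPy linear algebra (the normal equation), of how linear regression can be implemented; the data is made up for illustration:

```python
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a bias column of ones.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: w = (X^T X)^(-1) X^T y, solved with a linear-algebra routine.
w = np.linalg.solve(X.T @ X, X.T @ y)

print("intercept, slope:", w)   # should be close to (1, 2)
```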
Below are some benefits of learning Linear Algebra before Machine learning:
o Better Graphic experience
o Improved Statistics
o Creating better Machine Learning algorithms
o Estimating the forecast of Machine Learning
o Easy to Learn
Better Graphics Experience:
o Linear Algebra helps provide better graphical processing in Machine Learning, such as for
image, audio, video, and edge-detection data. These are the various graphical representations
that Machine Learning projects can work with. Further, parts of the given
data set are trained based on their categories by classifiers provided by machine learning
algorithms, and these classifiers also help remove errors from the trained data.
o Moreover, Linear Algebra helps solve and compute on large and complex data sets through
matrix decomposition techniques (a minimal SVD sketch follows below).
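A minimal SVD sketch with NumPy on a small made-up matrix, showing a low-rank approximation of the kind used to compress or denoise large data sets:

```python
import numpy as np

# Hypothetical 4x3 data matrix.
A = np.array([[3., 1., 1.],
              [1., 3., 1.],
              [1., 1., 3.],
              [2., 2., 2.]])

# Singular Value Decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 approximation keeps only the two largest singular values,
# a common way to compress or denoise a large data matrix.
A_rank2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

print(np.round(A_rank2, 2))
```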
Improved Statistics:
Statistics is an important concept to organize and integrate data in Machine Learning. Also,
linear Algebra helps to understand the concept of statistics in a better manner. Advanced
statistical topics can be integrated using methods, operations, and notations of linear algebra.
Creating better Machine Learning algorithms:
Linear Algebra also helps to create better supervised as well as unsupervised Machine Learning
algorithms.
A few supervised learning algorithms that can be created using Linear Algebra are as follows:
o Logistic Regression
o Linear Regression
o Decision Trees
o Support Vector Machines (SVM)
Relevant resources for machine learning:
Useful resources include the "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" book, Google's
Machine Learning Crash Course, the Python Data Science Handbook, the TensorFlow library,
and platforms like Kaggle for datasets and practice projects, with key skills including Python
programming and data visualization techniques using libraries like Matplotlib and Seaborn.
Key points about these resources:
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow":
A widely recommended book that provides a practical guide to building machine learning
models using popular Python libraries like Scikit-Learn, Keras, and TensorFlow.
Google's Machine Learning Crash Course:
A free online course offered by Google AI Education, ideal for beginners, covering essential
machine learning concepts with interactive elements.
Python Data Science Handbook:
A valuable resource for learning core data science concepts in Python, including data
manipulation with Pandas and NumPy, which are fundamental for machine learning projects.
TensorFlow:
An open-source library developed by Google, providing extensive tools for building, training,
and deploying machine learning models.
Kaggle:
A platform with a large collection of datasets and machine learning competitions, enabling
hands-on practice with real-world data.
Other relevant aspects to consider:
Programming Language:
Python is the most commonly used language for machine learning due to its simplicity and
extensive libraries like Scikit-learn, Keras, and TensorFlow.
Data Visualization:
Understanding data through visualization tools like Matplotlib and Seaborn is crucial for
exploratory analysis and interpreting machine learning results.
Machine Learning Concepts:
Familiarize yourself with supervised learning (e.g., regression, classification), unsupervised
learning (e.g., clustering), and reinforcement learning.
Supervised Machine Learning
Supervised learning is a type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, the machines predict the output. Labelled data
means that some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machine works as the supervisor that
teaches the machine to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping
function to map the input variable(x) with the output variable(y).
In the real-world, supervised learning can be used for Risk Assessment, Image classification,
Fraud Detection, spam filtering, etc.
How Does Supervised Learning Work?
In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data. Once the training process is completed, the model is tested on test
data (held out from the training data), and then it predicts the output.
The working of supervised learning can be easily understood through the following example:
Suppose we have a dataset of different types of shapes which includes square, rectangle, triangle,
and Polygon. Now the first step is that we need to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a Hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify
the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of the number of sides and predicts the output.
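A minimal sketch of this shape example with scikit-learn; the hand-crafted features (number of sides, whether all sides are equal) and the tiny training set are illustrative assumptions:

```python
from sklearn.tree import DecisionTreeClassifier

# Hand-crafted features for each labelled shape: [number_of_sides, all_sides_equal]
X_train = [
    [4, 1],  # square
    [4, 0],  # rectangle
    [3, 0],  # triangle
    [6, 1],  # hexagon
]
y_train = ["square", "rectangle", "triangle", "hexagon"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # the labelled data acts as the "supervisor"

# A new, unseen shape with four equal sides should be classified as a square.
print(model.predict([[4, 1]]))
```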
"Learning from observation":
It refers to the process where a machine learning model analyzes labeled data points
(observations) to identify patterns and relationships between input features and the
corresponding target variable, allowing it to learn how to predict new outputs based on new input
data; essentially, the model learns by observing the known relationships within the training data
to make predictions on unseen data.
Key points about learning from observation in supervised learning:
Labeled data:
Unlike unsupervised learning, supervised learning requires labeled data where each data point
has a known output value (the "label") which the model uses to learn the correct associations
between features and target values.
Feature extraction:
The model extracts relevant features from each observation to understand the underlying
patterns that contribute to the target variable.
Pattern recognition:
By analyzing numerous observations, the model identifies recurring patterns and relationships
within the data, allowing it to make predictions on new data points with similar characteristics.
Model refinement:
As the model observes more data, it continuously adjusts its internal parameters to improve its
accuracy in predicting future outcomes.
Example:
Image classification: If a model is learning to classify images of animals, each image is an
"observation" with a label indicating the animal species. By observing thousands of labeled
images, the model learns to identify features like fur color, shape, and size that distinguish
different animal types, enabling it to classify new images accurately.
What is Bias?
Bias is the inability of a model to capture the true relationship in the data, because of which
there is some difference or error between the model’s predicted value and the actual value. These
differences between actual or expected values and the predicted values are known as bias error or
error due to bias. Bias is a systematic error that occurs due to wrong assumptions in
the machine learning process.
Let Y be the true value of a parameter, and let Ŷ be an estimator of Y based on a sample of
data. Then, the bias of the estimator Ŷ is given by:
Bias(Ŷ) = E[Ŷ] − Y
where E[Ŷ] is the expected value of the estimator Ŷ. Bias measures how well the model fits the data.
Low Bias: Low bias value means fewer assumptions are taken to build the target function.
In this case, the model will closely match the training dataset.
High Bias: High bias value means more assumptions are taken to build the target function.
In this case, the model will not match the training dataset closely.
The high-bias model will not be able to capture the dataset trend. It is considered as
the underfitting model which has a high error rate. It is due to a very simplified algorithm.
For example, a linear regression model may have a high bias if the data has a non-linear
relationship.
Variance:
In machine learning, variance is the amount by which the performance of a predictive model
changes when it is trained on different subsets of the training data. More specifically, variance
is the variability of the model: how sensitive it is to another subset of the training
dataset, i.e., how much it adjusts to a new subset of the training data.
Let Y be the actual values of the target variable, and Ŷ the predicted values of the target
variable. Then the variance of a model can be measured as the expected value of the square of
the difference between the predicted values and the expected value of the predicted values:
Variance = E[(Ŷ − E[Ŷ])²]
where E[Ŷ] is the expected value of the predicted values. Here the expected value is averaged
over all the training data.
Variance errors are classified as either low-variance or high-variance errors.
Low variance: Low variance means that the model is less sensitive to changes in the
training data and can produce consistent estimates of the target function with different
subsets of data from the same distribution. Combined with high bias, this is the case of
underfitting, where the model fails to generalize on both training and test data.
High variance: High variance means that the model is very sensitive to changes in the
training data and can result in significant changes in the estimate of the target function
when trained on different subsets of data from the same distribution. This is the case of
overfitting, where the model performs well on the training data but poorly on new, unseen
test data. It fits the training data so closely that it fails on new, unseen data.
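A minimal sketch illustrating the two regimes on made-up non-linear data: a degree-1 polynomial underfits (high bias), a degree-15 polynomial overfits (high variance), and an intermediate degree balances the two:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical non-linear data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=x.shape[0])

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

# degree 1 -> high bias (underfits), degree 15 -> high variance (overfits),
# degree 4 -> a reasonable bias-variance balance for this data.
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(x_tr))
    test_err = mean_squared_error(y_te, model.predict(x_te))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```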
Computational learning theory (CoLT) is a foundational aspect of artificial intelligence
that focuses on understanding the principles and algorithms that enable machines to learn
from data. This field combines elements of computer science, statistics, and mathematical
logic to analyze the design and performance of learning algorithms. The significance of
computational learning theory in machine learning lies in its ability to provide a formal
framework for quantifying learning tasks and assessing the efficiency of various algorithms.
Importance of Computational Learning Theory
The importance of computational learning theory in machine learning can be summarized as
follows:
1. Framework for Analysis: It provides a structured approach to analyze the capabilities and
limitations of learning algorithms.
2. Guidance for Algorithm Design: Insights from computational learning theory can inform the
development of new algorithms that are more efficient and effective.
3. Understanding Generalization: It helps in understanding how well a learning algorithm can
generalize from training data to unseen data, which is crucial for real-world applications.
Key Concepts in Computational Learning Theory
1. Probably Approximately Correct (PAC) Learning
PAC learning, introduced by Leslie Valiant in 1984, is a framework that formalizes the notion of
learning from examples. The central idea is that a learning algorithm can be considered
successful if it can produce a hypothesis that is approximately correct with high probability,
given a sufficient amount of training data.
Key Elements of PAC Learning
Hypothesis: A function that maps inputs to outputs, representing the learned model.
Error Rate: The fraction of instances where the hypothesis differs from the true function.
Confidence: The probability that the hypothesis is approximately correct.
The PAC learning framework allows researchers to derive bounds on the number of examples
needed for a learning algorithm to achieve a desired level of accuracy and confidence.
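As an illustration, the snippet below computes the standard sample-complexity bound for a finite hypothesis class and a consistent learner, m ≥ (1/ε)(ln|H| + ln(1/δ)); the hypothesis-space size, accuracy, and confidence values used here are made up:

```python
import math

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Standard PAC bound for a finite hypothesis class and a consistent learner:
    m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice so that, with
    probability at least 1 - delta, the learned hypothesis has error <= epsilon."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

# Illustrative numbers: |H| = 2^20 hypotheses, 5% error, 95% confidence.
print(pac_sample_bound(hypothesis_space_size=2**20, epsilon=0.05, delta=0.05))
```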
2. Vapnik-Chervonenkis (VC) Dimension
The VC dimension is a measure of the capacity of a statistical classification algorithm, defined as
the largest set of points that can be shattered by the algorithm. Shattering means that the
algorithm can perfectly classify all possible labelings of the points.
Importance of VC Dimension
Capacity Control: The VC dimension helps in understanding the trade-off between model
complexity and generalization ability. A model with a high VC dimension may fit the training
data well but could overfit and perform poorly on unseen data.
Generalization Bounds: It provides theoretical bounds on the generalization error, allowing
practitioners to select models that balance complexity and performance.
3. Sample Complexity
Sample complexity refers to the number of training examples required for a learning algorithm to
achieve a certain level of accuracy and confidence. Understanding sample complexity is crucial
for designing efficient learning algorithms, as it directly impacts the amount of data needed for
training.
Factors Influencing Sample Complexity
Dimensionality: The number of features in the dataset can significantly affect the sample
complexity. High-dimensional data often requires more samples to achieve reliable learning.
Noise: The presence of noise in the data can increase the sample complexity, as the algorithm
must learn to distinguish between relevant patterns and random fluctuations.
Applications of Computational Learning Theory
Computational learning theory has numerous applications across various domains, including:
Natural Language Processing (NLP): Algorithms for text classification, sentiment analysis,
and language modeling benefit from insights gained through computational learning theory.
Computer Vision: Image recognition and object detection tasks often rely on learning
algorithms that are informed by principles from computational learning theory.
Healthcare: Predictive models for disease diagnosis and treatment outcomes are developed
using learning algorithms guided by computational learning theory.
Finance: Risk assessment and fraud detection systems leverage machine learning models that are
designed with the help of computational learning theory.
Occam’s Razor in Machine Learning:
Occam's razor is commonly employed in machine learning to guide model selection and prevent
overfitting. Overfitting occurs when a model becomes overly complex and fits the training data
too closely, resulting in poor generalization to new, unseen data. Occam's razor helps address
this issue by favoring simpler models that are less likely to overfit.
In machine learning, Occam's razor can be visualized using the bias-variance trade-off. The bias
refers to the error introduced by approximating a real-world problem with a simplified model,
while variance refers to the model's sensitivity to fluctuations in the training data. The goal is to
find the optimal balance between bias and variance to achieve good generalization.
As the model complexity increases, the bias decreases since the model becomes more capable of
representing complex patterns. However, the variance tends to increase, making the model more
sensitive to the training data. The optimal trade-off point minimizes the total error, achieving a
balance between simplicity and flexibility.
Occam's razor suggests selecting a model that lies closer to the optimal trade-off point, favoring
simplicity and avoiding unnecessary complexity. This can be represented mathematically using
regularization techniques such as L1 or L2 regularization, which add penalty terms to the model's
objective function −
Regularized Objective = Loss + Regularization Term
The regularization term imposes a constraint on the model's complexity, penalizing large
parameter values. By tuning the regularization parameter, the model can strike the right balance
between simplicity and accuracy, aligning with Occam's razor.
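A minimal sketch of L2 regularization using scikit-learn's Ridge on made-up data with redundant features; the penalty shrinks the weights relative to plain least squares, favouring the simpler model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical noisy linear data with several irrelevant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
true_w = np.array([3.0, -2.0] + [0.0] * 8)            # only 2 features matter
y = X @ true_w + rng.normal(scale=0.5, size=40)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty on the weights

# The penalty term shrinks the coefficients, penalizing large parameter values.
print("unregularized weight norm:", round(float(np.linalg.norm(plain.coef_)), 3))
print("ridge weight norm:        ", round(float(np.linalg.norm(ridge.coef_)), 3))
```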
Overall, Occam's razor guides the selection of simpler models and the application of
regularization techniques in machine learning to mitigate overfitting, improve generalization, and
adhere to the principle of simplicity.
Example: Uses of Occam’s Razor in Machine Learning
One example of how Occam's razor is used in machine learning is feature selection. Feature
selection involves choosing a subset of relevant features from a larger set of available features to
improve the model's performance and interpretability. Occam's razor can guide this process by
favoring simpler models with fewer features.
When faced with a high-dimensional dataset, selecting all available features may lead to
overfitting and increased computational complexity. Occam's razor suggests that a simpler model
with a reduced set of features can often achieve comparable or even better performance.
Various techniques can be employed to implement Occam's razor in feature selection. One
common approach is called "forward selection," where features are incrementally added to the
model based on their individual contribution to its performance. Starting with an empty set of
features, the algorithm iteratively selects the most informative feature at each step, considering
its impact on the model's performance. This process continues until a stopping criterion, such as
reaching a desired level of performance or a predetermined number of features, is met.
Another approach is "backward elimination," where all features are initially included in the
model, and features are gradually eliminated based on their contribution or lack thereof. The
algorithm removes the least informative feature at each step, re-evaluates the model's
performance, and continues eliminating features until the stopping criterion is satisfied.
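A minimal sketch of both strategies using scikit-learn's SequentialFeatureSelector on the built-in diabetes dataset; the choice of estimator and the target of four features are illustrative assumptions:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
estimator = LinearRegression()

# Forward selection: start empty, add the most useful feature at each step.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="forward").fit(X, y)

# Backward elimination: start with all features, drop the least useful one each step.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="backward").fit(X, y)

print("forward keeps features: ", forward.get_support(indices=True))
print("backward keeps features:", backward.get_support(indices=True))
```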
By employing these feature selection techniques guided by Occam's razor, machine learning
models can achieve better generalization, reduce overfitting, improve interpretability, and
optimize computational efficiency. Occam's razor helps to uncover the most relevant features
that capture the essence of the problem at hand, simplifying the model without sacrificing its
predictive capabilities.
Estimating generalization errors:
In machine learning, generalization error plays a crucial role in assessing the performance of a
predictive model. This metric measures how well a model performs on unseen data, which is
vital for ensuring the model is not just memorizing the training data but rather learning the
underlying patterns. A model that generalizes well can make accurate predictions on new,
previously unseen datasets, which is the ultimate goal of machine learning. Understanding
generalization error helps developers fine-tune their models and avoid problems such as
overfitting or underfitting, which can compromise the model’s predictive capabilities.
To analyze generalization error, different approaches can be utilized, including cross-validation
and the use of training and validation datasets. Cross-validation involves partitioning the data
into various subsets, allowing the model to train on one subset while validating its performance
on another. This iterative process produces a comprehensive evaluation of the model’s ability to
generalize beyond the training data. By closely monitoring the generalization error during the
training process, practitioners can make informed decisions about model complexity, feature
selection, and other vital parameters that influence model accuracy.
How to estimate generalization error
Cross-validation
Split the data into multiple subsets and use each subset in turn for testing while training on the
remaining subsets. This provides an estimate of the generalization error across different subsets
of the data (a minimal sketch follows after this list).
Hold-out method
Split the data into a training set and a test set, train the model on the training set, and evaluate
the model on the test set.
Covariance penalty
Use a covariance penalty to estimate generalization error. In some settings this method can be
more accurate than cross-validation.
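A minimal sketch of the first two approaches using scikit-learn on the built-in breast-cancer dataset; the model choice is an illustrative assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hold-out estimate: train on one split, measure error on the held-out test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_error = 1 - model.fit(X_tr, y_tr).score(X_te, y_te)

# Cross-validation estimate: average the test error over 5 different splits.
cv_error = 1 - cross_val_score(model, X, y, cv=5).mean()

print(f"hold-out error estimate:       {holdout_error:.3f}")
print(f"5-fold cross-validation error: {cv_error:.3f}")
```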
Why estimate generalization error?
Estimating generalization error helps identify problems like overfitting or underfitting.
Estimating generalization error helps improve the model's performance.
Estimating generalization error helps ensure the model performs effectively in real-world
applications.