Unit 1
1.1 Introduction
1.2 Working, Features and Need
1.3 Applications
1.4 Life cycle
1.5 Machine Learning Required Skills
1.6 Difference between Data Science, Artificial Intelligence, Machine Learning and Deep Learning
1.7 Types of Machine Learning: Supervised, Unsupervised, and Reinforcement Learning
1.8 Key Concepts: Features, Labels, Models, Training, and Testing, Overfitting, Underfitting.
Introduction-
Machine Learning (ML) is a branch of artificial intelligence (AI) that allows
computers to learn from experience (data) and make decisions without being
explicitly programmed to perform specific tasks. Instead of relying on a fixed
set of rules to handle every possible scenario, machine learning algorithms
analyze data, detect patterns, and make predictions or decisions based on
what they've learned. Over time, as these algorithms are exposed to more
data, they can adapt and improve, becoming better at their tasks through
experience.
The term "machine learning" was first introduced by Arthur Samuel in 1959.
For Example: [figure showing different types of phones data]
Working-
Machine learning uses a step-by-step approach to make accurate predictions,
where each step is essential. Here’s a breakdown of the process in simpler
terms:
1. Data Collection: Collecting data is the first and most important step. The
quality of the data largely affects the accuracy of the model’s
predictions. We can gather data from sources like APIs, websites, social
media, or use built-in datasets for learning. It’s important to use data
responsibly, respecting privacy and fairness.
2. Data Preprocessing: Before using data in a model, we clean it up. This
means removing duplicates, handling missing values, addressing outliers,
and standardizing the format. Good preprocessing helps prevent errors
and improves the model’s accuracy.
3. Model Training: Once the data is ready, we choose an algorithm that fits
our problem and use it to build the model. We usually split the data into
training and testing sets. Common algorithms include linear regression,
logistic regression, and decision trees. To get better results, we adjust
model settings, or "hyperparameters," using techniques like grid search
or random search.
Imagine baking cookies and trying to get the perfect batch. You might
adjust the baking time, oven temperature, or amount of ingredients,
which can impact the final taste and texture. Hyperparameters are similar
for machine learning models; they’re settings we tweak to try to get the
best results.
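To make the data-preprocessing, model-training, and grid-search steps above concrete, here is a minimal sketch using pandas and scikit-learn. The CSV file name, column names, and parameter grid are hypothetical placeholders rather than part of any particular dataset.

```python
# A minimal sketch of the workflow above: preprocessing, a train/test split,
# and hyperparameter tuning with grid search (scikit-learn).
# The CSV file, column names, and parameter grid are hypothetical placeholders;
# features are assumed to be numeric.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# 1. Data collection: load a hypothetical dataset
data = pd.read_csv("customers.csv")

# 2. Data preprocessing: remove duplicates and fill missing values
data = data.drop_duplicates()
data = data.fillna(data.median(numeric_only=True))

# 3. Model training: split the data, then tune hyperparameters with grid search
X = data.drop(columns=["purchased"])   # features (hypothetical label column)
y = data["purchased"]                  # label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```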
Besides these steps, we also visualize the model’s performance and predictions
to gain insights. For example, creating feature importance plots helps identify
which features most affect predictions, aiding in feature selection and
engineering.
Features-
Adaptability: ML models improve as they’re exposed to more data,
making them highly adaptable.
Scalability: ML can process large datasets, uncovering insights and
trends from large volumes of data that would be difficult for humans to
analyze.
Efficiency: Automates repetitive or complex tasks, saving time and
effort.
Need-
Rising Demand: Machine learning is increasingly in demand due to its
ability to handle tasks too complex for direct human intervention.
Data Processing Power: Humans can’t manually process the vast
amounts of data available today, so we rely on machine learning to
manage and interpret this data.
Automatic Learning from Data: Machine learning algorithms learn from
large datasets, automatically building models and making predictions
based on patterns in the data.
Efficiency and Cost Savings: Machine learning saves time and money by
automating tasks and reducing the need for manual intervention.
Performance Measurement: We use a cost function to evaluate how well a
machine learning model is performing and adjust it as needed (a small
example appears at the end of this section).
Real-World Applications: Machine learning is used in self-driving cars,
fraud detection, face recognition, and social media recommendations.
Industry Adoption: Companies like Netflix and Amazon use machine
learning to analyze user data, understand customer preferences, and
recommend products.
Key Benefits:
Rapid Growth in Data: Machine learning helps us manage and utilize the
vast amounts of data being produced.
Solving Complex Problems: It tackles complex tasks that are challenging
for humans.
Improving Decision-Making: Machine learning supports decision-making
across various sectors, including finance and healthcare.
Uncovering Patterns: It identifies hidden patterns and extracts valuable
information from data.
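As a concrete example of a cost function, here is a minimal sketch of mean squared error, a common cost function for regression models; the prices and predictions are made-up numbers used purely for illustration.

```python
# Mean squared error: one common cost function for regression models.
# The true values and predictions below are made-up illustrative numbers.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [200_000, 150_000, 320_000]   # actual house prices
y_pred = [210_000, 140_000, 300_000]   # model predictions
print(mean_squared_error(y_true, y_pred))  # lower is better
```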
Applications-
Machine learning is one of the most talked-about technologies today, and it is
growing rapidly. We use machine learning in our daily lives, often without
realizing it, through services such as Google Maps, Google Assistant, and
Alexa. Below are some of the most prominent real-world applications of
machine learning:
1. Image Recognition:
Image recognition is one of the most common applications of machine
learning. It is used to identify objects, people, and places in digital images. A
popular use case of image recognition and face detection is the automatic
friend-tagging suggestion on social media.
2. Speech Recognition
Google's "Search by voice" option is an example of speech recognition, a
popular application of machine learning.
Speech recognition is the process of converting voice instructions into text; it
is also known as "speech to text" or "computer speech recognition." Machine
learning algorithms are widely used in speech recognition applications:
Google Assistant, Siri, Cortana, and Alexa all use speech recognition
technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows
us the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic using:
o The real-time location of vehicles, from the Google Maps app and sensors
o The average time taken on past days at the same time of day
Everyone who uses Google Maps helps make the app better: it takes
information from users and sends it back to its database to improve
performance.
4. Product recommendations:
Machine learning is widely used by e-commerce and entertainment companies
such as Amazon and Netflix to recommend products to users. Whenever we
search for a product on Amazon, we start seeing advertisements for the same
product while browsing the internet in the same browser; this happens because
of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars,
where it plays a significant role. Tesla, one of the most popular car
manufacturers, is working on self-driving cars and uses machine learning
methods to train its models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is automatically filtered as important,
normal, or spam. Important mail arrives in our inbox marked with the
important symbol, while spam emails go to the spam folder; the technology
behind this is machine learning. Below are some spam filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Machine learning algorithms such as the Multi-Layer Perceptron, decision
trees, and the Naïve Bayes classifier are used for email spam filtering and
malware detection, as in the sketch below.
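Here is a minimal sketch of a Naïve Bayes spam filter built with scikit-learn; the example messages and labels are invented for illustration, and a real filter would be trained on a large labeled email corpus.

```python
# A toy spam filter using a Naive Bayes classifier (scikit-learn).
# The example messages and labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "Win a free prize now", "Lowest price on meds, click here",
    "Meeting moved to 3pm", "Please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()            # turn text into word-count features
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)    # train the classifier

test = vectorizer.transform(["Free prize, click now"])
print(model.predict(test))                # expected output: [1] (spam)
```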
7. Virtual Personal Assistant:
Machine learning also powers virtual personal assistants such as Google
Assistant, Alexa, Cortana, and Siri. These assistants record our voice
instructions, send them to a server in the cloud, decode them using ML
algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning also helps make online transactions secure by detecting
fraudulent transactions. For each genuine transaction, the output is converted
into hash values, and these values become the input for the next round.
Genuine transactions follow a specific pattern that changes for a fraudulent
transaction; the model detects this change and makes our online transactions
more secure.
9. Stock Market trading:
Machine learning is widely used in stock market trading. Because share prices
constantly move up and down, machine learning models, particularly long
short-term memory (LSTM) neural networks, are used to predict stock market
trends, as in the sketch below.
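Here is a minimal sketch of how an LSTM network for predicting the next value of a price series might be defined with Keras; the synthetic data, window length, and layer sizes are assumptions chosen for illustration, not a trading-ready model.

```python
# A minimal LSTM model for predicting the next value of a price series (Keras).
# The synthetic data, window size, and layer sizes are illustrative assumptions.
import numpy as np
from tensorflow import keras

window = 30                                      # use the last 30 prices as input
prices = np.cumsum(np.random.randn(500)) + 100   # synthetic price series

# Build (samples, window, 1) input sequences and their next-step targets
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])[..., None]
y = prices[window:]

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(window, 1)),
    keras.layers.Dense(1),                       # predict the next price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.predict(X[-1:]).ravel())             # predicted next price
```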
Life cycle-
A machine learning project typically follows a life cycle made up of the steps below:
1. Gathering Data
The first step is to gather data from various sources, like files, databases,
or the internet.
The quantity and quality of data greatly impact the accuracy of
predictions, so more data typically means better results.
2. Data Preparation
After gathering data, we organize it and prepare it for the next steps.
This includes putting the data in order and combining it into a single set,
ready for analysis.
3. Data Wrangling
Here we clean the data and convert it into a usable format, handling
missing values, duplicates, and invalid entries.
4. Data Analysis
Next, we choose suitable analytical techniques, build the model, and
review the results it produces.
5. Model Training
We then train the model on the prepared data so that it can learn the
patterns needed for the task.
6. Model Testing
After training, we test the model using new data to see how accurate it
is.
Testing helps measure how well the model performs and whether it
meets our requirements.
7. Deployment
Once the model works well, we deploy it in the real world, where it can
help solve practical problems.
Before finalizing, we verify that the model continues to perform well
with real-time data.
These steps ensure that the machine learning model is built, tested, and ready
for practical use. Each stage is essential for creating a model that meets project
needs and delivers accurate results.
Difference between Data Science, Artificial Intelligence, Machine Learning and Deep Learning-

| Aspect | Data Science | Artificial Intelligence (AI) | Machine Learning (ML) | Deep Learning |
|---|---|---|---|---|
| Definition | Field focused on extracting insights from data through analysis, statistics, and computation | Branch of computer science aimed at creating intelligent systems that can simulate human behavior | Subset of AI focused on enabling machines to learn from data and improve with experience | Subset of ML that uses neural networks with multiple layers for complex data processing |
| Primary Goal | To analyze and interpret data to aid decision-making | To create systems that can mimic human intelligence | To allow machines to learn from data and make predictions | To enable machines to learn and make decisions on complex data, often involving images or text |
| Techniques Used | Statistics, data mining, data cleaning, visualization | Algorithms, logical reasoning, problem-solving, expert systems | Supervised, unsupervised, and reinforcement learning | Neural networks, especially deep neural networks (CNNs, RNNs) |
| Example Use Cases | Customer segmentation, fraud detection, recommendation systems | Robotics, natural language processing, perception, autonomous vehicles | Email filtering, recommendation engines, predictive maintenance | Image recognition, language translation, self-driving cars |
| Skill Requirements | Statistics, programming, data wrangling, machine learning basics | Logic, programming, understanding of AI algorithms | Statistics, probability, programming, knowledge of ML algorithms | Advanced math, deep learning frameworks (TensorFlow, PyTorch) |
| Computational Power | Moderate to high, depending on data size | Varies; can be low for simple tasks, high for complex AI | Moderate to high, depending on algorithm complexity | High, due to intensive training of deep neural networks |
| Key Tools/Libraries | Python, R, SQL, Pandas, Matplotlib | Varies; includes AI platforms like IBM Watson, Google AI | Scikit-Learn, TensorFlow, Keras, PyTorch | TensorFlow, Keras, PyTorch |
Types of Machine Learning-
3. Reinforcement Learning
Reinforcement learning is a type of machine learning in which an agent learns
to make decisions by interacting with an environment and receiving rewards
or penalties for its actions.
How It Works: The agent observes the current state of the environment
and decides on an action. Based on this action, the environment
changes, and the agent receives a reward or penalty. Through trial and
error, the agent learns the best actions to take in various situations to
maximize the total reward over time. Reinforcement learning involves
balancing between trying new actions (exploration) and sticking to known,
successful actions (exploitation).
Common Algorithms: Q-learning, Deep Q-Networks (DQN), Policy
Gradient Methods, and Actor-Critic Methods.
Example Use Cases:
o Game-playing AI: RL is commonly used to train AI systems in
games such as chess, Go, and video games, where the agent
learns strategies through self-play and competition, refining its
moves with each game.
o Autonomous Driving: Reinforcement learning helps self-driving
cars learn how to navigate roads by interacting with simulated or
real environments, receiving feedback based on safe driving and
rule adherence.
o Robotic Control: Robots learn to perform tasks, such as picking up
objects or walking, by receiving rewards or penalties based on the
success of their actions.
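To make the agent-environment loop described above concrete, here is a minimal tabular Q-learning sketch on a tiny made-up "corridor" environment; the environment, rewards, and hyperparameters are assumptions chosen purely for illustration.

```python
# Minimal tabular Q-learning on a tiny made-up "corridor" environment:
# states 0..4, actions 0 = left, 1 = right, reward +1 for reaching state 4.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2        # learning rate, discount, exploration rate

def step(state, action):
    """Environment: move left/right; reaching the last state gives reward 1."""
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore sometimes, otherwise exploit the best known action
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update rule
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

# Learned policy per state (mostly "right", i.e. action 1)
print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])
```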
Key Concepts:
1. Features
Features are the input variables (also called attributes or independent
variables) that the model uses to make its predictions. For example, in a model
predicting house prices, the features could include:
o Square footage
o Location
o Number of bedrooms
2. Labels
Labels represent the output variables the model is trying to predict. They are
also referred to as target variables or dependent variables. Labels are what the
model aims to learn to predict based on the input features.
In supervised learning, the model is trained using data where both the features
and labels are known. For example:
In the house price prediction model, the label would be the house price,
which the model tries to predict based on the features like square
footage, location, and number of bedrooms.
In a classification problem, the label could be a category such as "spam"
or "not spam" for an email classification model.
3. Models
A model is what a machine learning algorithm produces after training: a
function that maps input features to predicted labels and can then be applied
to new data.
4. Training and Testing
To build and evaluate a model, the data is typically divided into two parts:
Training Set: This is the portion of the data used to train the model. The
model learns patterns from this data by adjusting its internal parameters
(such as weights in a neural network or coefficients in a linear regression
model).
Testing Set: Once the model has been trained, it is tested on a separate
set of data that it has not seen before. This helps evaluate how well the
model can generalize to new, unseen data. The testing set gives an
indication of how the model might perform in real-world scenarios.
Train-test split: A simple random division of the data into two parts (e.g.,
80% for training, 20% for testing).
Cross-validation: A more robust method where the data is divided into
multiple folds, and the model is trained and tested multiple times, with
each fold serving as the testing set once.
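Here is a minimal sketch of both evaluation strategies described above, an 80/20 train-test split and 5-fold cross-validation, using scikit-learn; the synthetic regression dataset stands in for real data.

```python
# Train-test split (80/20) and 5-fold cross-validation with scikit-learn.
# The synthetic regression dataset stands in for real data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# Simple random 80/20 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))

# 5-fold cross-validation: each fold serves as the testing set once
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print("Cross-validation R^2 scores:", scores)
```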
5. Overfitting
Overfitting occurs when a model learns the details and noise in the training
data to such an extent that it negatively impacts its performance on new,
unseen data. It essentially memorizes the training data rather than
generalizing the underlying patterns. Overfitting often occurs when:
The model is too complex (e.g., a very deep decision tree or a neural
network with too many layers).
There is insufficient training data, which leads to the model learning
irrelevant patterns that don't generalize.
Overfitting can be detected by comparing the performance of the model on
the training set and the testing set. If the model performs well on the training
set but poorly on the testing set, overfitting is likely.
6. Underfitting
Underfitting occurs when a model is too simple to capture the underlying
patterns in the data, so it performs poorly on both the training and the testing
sets. It often happens when:
The model is too simple (e.g., a linear model for data that has a non-
linear relationship).
The training data is not enough or lacks diversity.
There is not enough time for the model to learn from the data (e.g.,
insufficient training or too little data).
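The train-versus-test comparison described above can be illustrated with a small sketch: an unlimited-depth decision tree tends to overfit (high training accuracy, lower test accuracy), while a depth-1 tree tends to underfit (low accuracy on both). The synthetic dataset and tree depths are assumptions for illustration.

```python
# Detecting overfitting and underfitting by comparing training vs. testing accuracy.
# The synthetic dataset and tree depths are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, None):   # depth 1: too simple (underfit); unlimited depth: too complex (overfit)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```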