Introduction to Machine Learning
What is Machine Learning (ML)?
•   The term Machine Learning was coined in 1959 by Arthur Samuel, an American pioneer in the field of
    computer gaming and artificial intelligence, who stated that it “gives computers the ability to learn without
    being explicitly programmed”.
•   In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition: “A computer
    program is said to learn from experience E with respect to some task T and some performance measure P, if its
    performance on T, as measured by P, improves with experience E.”
•   Machine learning (ML) is the study of computer algorithms that can improve automatically through
    experience and by the use of data. It is seen as a part of artificial intelligence.
•   Machine learning algorithms build a model based on sample data, known as training data, in order to make
    predictions or decisions without being explicitly programmed to do so.
•   Machine learning algorithms are used in a wide variety of applications, such as in medicine, email
    filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional
    algorithms to perform the needed tasks.
Machine learning approaches are traditionally divided into three broad categories, depending on the nature
of the "signal" or "feedback" available to the learning system:
1. Supervised learning: The computer is presented with example inputs and their desired outputs, given by
   a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
2. Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find
structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a
means towards an end (feature learning).
3. Reinforcement learning: A computer program interacts with a dynamic environment in which it must
perform a certain goal (such as driving a vehicle or playing a game against an opponent). As it navigates its
problem space, the program is provided feedback that's analogous to rewards, which it tries to maximize.
Supervised Learning:
Within supervised machine learning, we further divide problems into the following categories:
1. Classification
A classification problem is a problem where we use data to predict which category something
falls into. An example of a classification problem could be analyzing an image to determine whether it
contains a car or a person, or analyzing medical data to determine whether a certain person is in a high-risk
group for a certain disease. In other words, we are trying to use data to make a prediction about a
discrete set of values or categories.
Examples of algorithms used for supervised classification problems are:
•Naive Bayes Classifier
•Support Vector Machines
•Logistic Regression
•Neural Networks
2. Regression
Regression problems, on the other hand, are problems where we try to make a prediction on a continuous
scale. Examples could be predicting the stock price of a company or predicting tomorrow's temperature
based on historical data.
Examples of algorithms used for supervised regression problems are:
•Linear Regression
•Nonlinear Regression
•Bayesian Linear Regression
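To make the idea of predicting on a continuous scale concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The temperature readings and the function name `fit_line` are made up for illustration; this is one simple way to fit a line, not the only one.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: day number and temperature in degrees Celsius.
days = [1, 2, 3, 4, 5]
temperatures = [20.1, 20.9, 22.2, 22.8, 24.0]

slope, intercept = fit_line(days, temperatures)
predicted_day_6 = slope * 6 + intercept   # prediction on a continuous scale
print(f"slope={slope:.2f}, intercept={intercept:.2f}, day 6 -> {predicted_day_6:.2f}")
```

The output is a real number rather than a category, which is what separates regression from classification.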
Unsupervised Learning:
As mentioned above, unsupervised machine learning problems are problems where we have little or no idea
what the results should look like. We are basically providing the machine learning algorithm with data and
asking it to look for hidden features of the data and cluster the data in a way that makes sense based on
the data.
Examples of algorithms used for unsupervised machine learning problems are:
•K-means clustering
•Neural Networks
•Principal component analysis
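As a rough illustration of clustering without labels, the sketch below runs k-means on one-dimensional points. The data, the choice of two clusters, and the use of the first two points as starting centroids are all assumptions made for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D data: assign points, then move centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled measurements.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=points[:2])
print(centroids, clusters)
```

No labels are supplied anywhere; the two groups emerge only from the distances between the points.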
Reinforcement Learning:
•   Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to
    behave in an environment by performing actions and seeing the results of those actions. For each good
    action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a
    penalty.
•   In Reinforcement Learning, the agent learns automatically from feedback without any labeled data,
    unlike supervised learning.
•   Since there is no labeled data, the agent is bound to learn from its experience only.
•   RL solves a specific type of problem where decision making is sequential and the goal is long-term, such
    as game-playing, robotics, etc.
•   The agent interacts with the environment and explores it by itself. The primary goal of an agent in
    reinforcement learning is to improve its performance by collecting the maximum positive reward.
•   The agent learns through trial and error, and based on the experience, it learns to perform the
    task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning
    method where an intelligent agent (computer program) interacts with the environment and learns
    to act within that." How a robotic dog learns to move its limbs is an example of reinforcement
    learning.
•   It is a core part of artificial intelligence, and many AI agents work on the concept of reinforcement learning.
    Here we do not need to pre-program the agent, as it learns from its own experience without any human
    intervention.
•Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond.
The agent interacts with the environment by performing some actions, and based on those actions, the state of
the agent changes, and it also receives a reward or penalty as feedback.
•The agent continues doing these three things (take an action, change state or remain in the same state, and get
feedback), and by doing so, it learns and explores the environment.
•The agent learns which actions lead to positive feedback or rewards and which actions lead to negative
feedback or penalties. As a reward, the agent gets a positive point, and as a penalty, it gets a negative point.
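The maze example can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the notes above do not name a specific one). Everything concrete here is an assumption for illustration: the maze is reduced to a corridor of five cells with the diamond in the last cell, the reward values are +1 and -1, and the learning rate, discount factor, and exploration rate are arbitrary but typical choices.

```python
import random

N_STATES = 5                  # corridor cells 0..4; the diamond is in cell 4
ACTIONS = ["left", "right"]

def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    if action == "right":
        next_state = state + 1
        if next_state == N_STATES - 1:
            return next_state, 1.0, True       # found the diamond: reward
        return next_state, 0.0, False
    if state == 0:
        return state, -1.0, False              # bumped into the wall: penalty
    return state - 1, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:         # explore: random action
                a = rng.randrange(2)
            else:                              # exploit: best known action
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])] for s in range(N_STATES - 1)]
print(policy)
```

After training, the learned policy should prefer "right" in every cell, since that is the only direction that leads to the diamond. The agent was never told this; it was inferred from rewards and penalties alone.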
Phases of ML
1. Gathering Data
2. Data preparation
3. Data Wrangling
4. Analyse Data
5. Train the model
6. Test the model
7. Deployment
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain
all the data relevant to the problem.
In this step, we need to identify the different data sources, as data can be collected from various sources such
as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The
quantity and quality of the collected data will determine the quality of the output. In general, the more data
we have, the more accurate the prediction tends to be.
This step includes the below tasks:
•Identify various data sources
•Collect data
•Integrate the data obtained from different sources
By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further
steps.
2. Data preparation:
After collecting the data, we need to prepare it for further steps. Data preparation is a step where we put our data into a
suitable place and prepare it for use in machine learning training.
In this step, first, we put all data together, and then randomize the ordering of data.
This step can be further divided into two processes:
•Data exploration:
It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and
quality of the data.
A better understanding of the data leads to a more effective outcome. In this step, we look for correlations, general trends, and outliers.
•Data pre-processing:
The next step is to preprocess the data for analysis.
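The sketch below illustrates two of the ideas in this step: randomizing the order of the data and a first exploratory look at it. The values are invented, and flagging points that lie more than two standard deviations from the mean is just one simple rule of thumb for spotting outliers.

```python
import random
import statistics

# Hypothetical collected measurements; 90.0 looks suspicious.
values = [12.0, 15.0, 14.0, 13.0, 90.0, 16.0, 14.5]

# Randomize the ordering (a fixed seed keeps the example repeatable).
rng = random.Random(42)
shuffled = values[:]
rng.shuffle(shuffled)

# Data exploration: basic summary statistics.
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)

# Simple outlier rule (assumption): more than 2 standard deviations from the mean.
outliers = [v for v in values if abs(v - mean) > 2 * stdev]

print("shuffled:", shuffled)
print(f"mean={mean:.2f}, median={median:.2f}, stdev={stdev:.2f}")
print("possible outliers:", outliers)
```

Note how far the mean sits from the median here; a gap like that is often the first hint that outliers are present.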
3. Data Wrangling:
Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the
data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the
next step. It is one of the most important steps of the complete process. Cleaning of data is required to address quality
issues.
Not all of the data we have collected is necessarily useful to us. In real-world
applications, collected data may have various issues, including:
•Missing Values
•Duplicate data
•Invalid data
•Noise
So, we use various filtering techniques to clean the data.
It is important to detect and remove the above issues because they can negatively affect the quality of the outcome.
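A minimal sketch of such filtering is shown below. The records, the field names `id` and `age`, and the valid age range of 0 to 120 are hypothetical; real projects would choose rules that match their own data.

```python
# Hypothetical raw records with typical quality issues.
raw_rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value
    {"id": 3, "age": -5},     # invalid value
    {"id": 1, "age": 34},     # duplicate of the first row
    {"id": 4, "age": 51},
]

def clean(rows):
    """Drop rows with missing, invalid, or duplicated data."""
    seen = set()
    cleaned = []
    for row in rows:
        age = row["age"]
        if age is None:               # missing value
            continue
        if not 0 <= age <= 120:       # invalid value (assumed valid range)
            continue
        key = (row["id"], age)
        if key in seen:               # duplicate row
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

clean_rows = clean(raw_rows)
print(clean_rows)
```

Dropping rows is the simplest option; depending on the problem, missing values might instead be filled in, for example with the column mean.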
4. Data Analysis:
Now the cleaned and prepared data is passed on to the analysis step. This step involves:
•Selecting analytical techniques
•Building models
•Reviewing the results
The aim of this step is to build a machine learning model to analyze the data using various analytical
techniques and review the outcome. It starts with determining the type of problem, where we
select a machine learning technique such as Classification, Regression, Cluster analysis, Association, etc.,
then build the model using the prepared data, and evaluate the model.
Hence, in this step, we take the data and use machine learning algorithms to build the model.
5. Train Model:
The next step is to train the model. In this step, we train our model to improve its performance and get a better
outcome for the problem.
We use datasets to train the model using various machine learning algorithms. Training a model is required so
that it can understand the various patterns, rules, and features.
6. Test Model:
Once our machine learning model has been trained on a given dataset, we test the model. In this step, we
check the accuracy of our model by providing a test dataset to it.
Testing the model determines its percentage accuracy, which is compared against the requirements of the project or
problem.
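The train and test steps can be sketched together as follows. The data (hours studied, pass or fail), the split size, and the very simple "model" (a threshold halfway between the two class means) are all assumptions chosen to keep the example short.

```python
def split(rows, test_size):
    """Hold out the last `test_size` rows for testing."""
    return rows[:-test_size], rows[-test_size:]

def train_threshold(rows):
    """'Train' a one-rule model: threshold halfway between the two class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical, already shuffled data: (hours studied, passed the exam?).
data = [(2, 0), (9, 1), (4, 0), (7, 1), (1, 0), (8, 1),
        (3, 0), (6, 1), (5, 1), (10, 1)]

train_rows, test_rows = split(data, test_size=4)
threshold = train_threshold(train_rows)

predictions = [1 if x >= threshold else 0 for x, _ in test_rows]
test_accuracy = accuracy([y for _, y in test_rows], predictions)
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.0%}")
```

The key point is that accuracy is measured on rows the model never saw during training.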
7. Deployment:
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
If the model prepared above produces accurate results as per our requirements with acceptable speed, then
we deploy the model in the real system. But before deploying the project, we check whether the model is improving
its performance using the available data. The deployment phase is similar to making the final report for a
project.
Well posed learning problems:
Well Posed/defined Learning Problem: A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
To have a well-defined learning problem, three features need to be identified:
1. The class of tasks (T)
2. The measure of performance to be improved (P)
3. The source of experience (E)
Some examples of well-posed learning problems are:
1. Handwriting Recognition Problem
•Task – recognizing and classifying handwritten words within images
•Performance Measure – percent of words accurately classified
•Experience – a database of handwritten words with given classifications
2. Fruit Prediction Problem
•Task – recognizing and classifying different fruits
•Performance Measure – the number of fruit varieties predicted correctly
•Experience – training on a large dataset of fruit images
3. Face Recognition Problem
•Task – recognizing and classifying different faces
•Performance Measure – the number of face types predicted correctly
•Experience – training on a large dataset of different face images
Challenges in Machine learning:
1. Poor Quality of Data
• Data plays a significant role in the machine learning process. One of the significant issues that machine
   learning professionals face is the absence of good quality data. Unclean and noisy data can make the whole
   process extremely exhausting.
• Therefore, we need to ensure that data preprocessing, which includes removing outliers,
   filtering missing values, and removing unwanted features, is done with great care.
2. Underfitting of Training Data
• Underfitting occurs when a model is unable to establish an accurate relationship between input and output
   variables. It is like trying to fit into undersized jeans: the model is too simple to capture a
   precise relationship in the data. To overcome this issue:
     1. Increase the training time of the model
     2. Enhance the complexity of the model
     3. Add more features to the data
     4. Reduce regularization
3. Overfitting of Training Data
    Overfitting occurs when a machine learning model fits its training data too closely, including the noise in it,
which negatively affects its performance on new data.
 We can tackle this issue by:
•Analyzing the data carefully
•Using data augmentation techniques
•Removing outliers in the training set
•Selecting a model with fewer features
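The difference between fitting the training data and generalizing to new data can be illustrated with a small, contrived sketch. The data, the two noisy labels, and both "models" are assumptions: a nearest-neighbour model that memorizes every training point, and a single hand-picked threshold rule.

```python
def nearest_neighbour_predict(train, x):
    """Memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def threshold_predict(x):
    """Simple model: one rule with a hand-picked threshold (assumption)."""
    return 1 if x >= 5 else 0

def accuracy(data, predict):
    return sum(1 for x, y in data if predict(x) == y) / len(data)

# Hypothetical data; the labels of (3, 1) and (7, 0) are noise.
train = [(1, 0), (2, 0), (3, 1), (6, 1), (7, 0), (8, 1)]
test = [(1.5, 0), (3.2, 0), (6.9, 1), (8.5, 1)]

def memorizer(x):
    return nearest_neighbour_predict(train, x)

memorizer_train = accuracy(train, memorizer)
memorizer_test = accuracy(test, memorizer)
simple_train = accuracy(train, threshold_predict)
simple_test = accuracy(test, threshold_predict)

print(f"memorizer: train={memorizer_train:.2f}, test={memorizer_test:.2f}")
print(f"simple:    train={simple_train:.2f}, test={simple_test:.2f}")
```

The memorizing model scores perfectly on the training set because it also learns the noisy labels, and that is exactly what hurts it on the test set. The simpler rule makes mistakes on the noisy training points but generalizes better in this toy case.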
4. Machine Learning is a Complex Process
• The machine learning industry is young and is continuously changing. Rapid trial-and-error experiments are being carried
  out. The process is still evolving, and hence there is a high chance of error, which makes the learning complex.
• It includes analyzing the data, removing data bias, training models, applying complex mathematical calculations, and a lot
  more. Hence it is a really complicated process, which is another big challenge for machine learning professionals.
5. Lack of Training Data
• One of the most important tasks in the machine learning process is to train the model with enough data to achieve an
  accurate output.
• Too little training data will produce inaccurate or overly biased predictions.
• Let us understand this with the help of an example. Think of training a machine learning algorithm as similar to
  training a child. One day you decide to explain to a child how to distinguish between an apple and a
  watermelon. You take an apple and a watermelon and show the child the difference between the two based on
  their color, shape, and taste. In this way, the child will soon become very good at telling the two apart.
• A machine learning algorithm, on the other hand, needs a lot of data to learn the same distinction. For complex
  problems, it may even require millions of examples. Therefore we need to ensure that machine
  learning algorithms are trained with sufficient amounts of data.
6. Slow Implementation
This is one of the common issues faced by machine learning professionals. Machine learning models can be
highly effective at providing accurate results, but doing so can take a tremendous amount of time. Slow programs, data
overload, and excessive requirements all add to the time needed to produce accurate results. Further, a model requires
constant monitoring and maintenance to deliver the best output.
7. Imperfections in the Algorithm When Data Grows
• So you have found quality data, trained a model on it well, and the predictions are really precise and accurate.
  Yay, you have learned how to create a machine learning model!
• But wait, there is a twist: the model may become useless in the future as data grows. The best model of the
  present may become inaccurate in the future and require further adjustment.
• So you need regular monitoring and maintenance to keep the algorithm working. This is one of the most
  exhausting issues faced by machine learning professionals.
CLASSIFICATION:
What is Classification In Machine Learning
Classification is the process of categorizing a given set of data into classes. It can be performed on both
structured and unstructured data. The process starts with predicting the class of given data points. The classes
are often referred to as targets, labels, or categories.
Classification predictive modeling is the task of approximating the mapping function from input variables
to discrete output variables. The main goal is to identify which class/category the new data will fall into.
There are three main types of classification tasks that you may encounter; they are:
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
Binary Classification:
Binary classification is the task of classifying the elements of a set into two groups (each called a class) on the basis of
a classification rule. Typical binary classification problems include:
•Email spam detection (spam or not).
•Churn prediction (churn or not).
•Conversion prediction (buy or not).
Typically, binary classification tasks involve one class that is the normal state and another class that is the abnormal
state.
For example, “not spam” is the normal state and “spam” is the abnormal state. Another example is a task that involves a
medical test, where “cancer not detected” is the normal state and “cancer detected” is the abnormal state.
The class for the normal state is assigned the class label 0 and the class with the abnormal state is assigned the class
label 1.
Popular algorithms that can be used for binary classification include:
• Logistic Regression
• k-Nearest Neighbors
• Decision Trees
• Support Vector Machine
• Naive Bayes
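As an illustration of binary classification with labels 0 and 1, here is a minimal logistic regression trained by plain gradient descent. The single feature (a made-up "spam score" centered around zero), the learning rate, and the number of epochs are assumptions for this sketch.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x):
    """Return class label 1 (abnormal) or 0 (normal)."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical feature: a spam score; label 0 = "not spam", 1 = "spam".
scores = [-3.5, -2.5, -1.5, 1.5, 2.5, 3.5]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(scores, labels)
print([predict(w, b, x) for x in scores])
```

The model outputs a probability, and the 0.5 cut-off turns that probability into one of the two class labels.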
Multiclass Classification:
Multi-class classification refers to those classification tasks that have more than two class labels.
Examples of multi-class classification are:
•classification of news in different categories,
•classifying books according to the subject,
•classifying students according to their streams etc.
•Unlike binary classification, multi-class classification does not have the notion of normal and abnormal outcomes.
Instead, examples are classified as belonging to one among a range of known classes.
•The number of class labels may be very large on some problems. For example, a model may predict a photo as belonging
to one among thousands or tens of thousands of faces in a face recognition system.
Many algorithms used for binary classification can be used for multi-class classification.
Popular algorithms that can be used for multi-class classification include:
• k-Nearest Neighbors.
• Decision Trees.
• Naive Bayes.
• Random Forest.
• Gradient Boosting.
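The sketch below shows k-Nearest Neighbors handling three classes at once. The two numeric features and the news categories attached to them are invented for illustration, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Label a point by majority vote among its k closest training examples."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical articles described by two numeric features, with three class labels.
train = [
    ((1, 1), "sports"), ((1, 2), "sports"), ((2, 1), "sports"),
    ((6, 6), "politics"), ((6, 7), "politics"), ((7, 6), "politics"),
    ((1, 7), "science"), ((2, 7), "science"), ((1, 6), "science"),
]

for point in [(1.5, 1.5), (6.5, 6.5), (1.5, 6.5)]:
    print(point, "->", knn_predict(train, point))
```

Nothing in the algorithm is specific to two classes, which is why the same method works for both binary and multi-class problems.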
Binary vs Multiclass Classification
Decision Tree Classification Algorithm
•Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but
it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
•In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a
decision and have multiple branches, whereas leaf nodes are the output of those decisions and do not contain any further
branches.
•The decisions or tests are performed on the basis of the features of the given dataset.
•It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
•It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and
constructs a tree-like structure.
•In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
•A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
The diagram below explains the general structure of a decision tree:
How does the Decision Tree algorithm Work?
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process
  until a stage is reached where you cannot further classify the nodes; each such final node is called a leaf node.
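The five steps above can be sketched as a short recursive function. Several things here are assumptions for illustration: Gini impurity is used as the attribute selection measure (the steps do not fix a particular one), attributes are categorical and each value gets its own branch, and the job-offer records are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, attr):
    """Weighted Gini impurity after splitting rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["label"])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def build_tree(rows, attributes):
    labels = [row["label"] for row in rows]
    # Step 5 stopping rule: pure node or nothing left to split on -> leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the lowest impurity after the split.
    best = min(attributes, key=lambda a: split_impurity(rows, a))
    remaining = [a for a in attributes if a != best]
    # Steps 3 and 4: one branch per value of the best attribute, built recursively.
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining)
    return {best: branches}

def predict(tree, row):
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Hypothetical job-offer records: (salary, distance, cab, label).
records = [
    ("low", "near", "yes", "decline"), ("low", "near", "no", "decline"),
    ("low", "near", "yes", "decline"), ("low", "far", "yes", "decline"),
    ("high", "far", "yes", "decline"), ("high", "far", "yes", "decline"),
    ("high", "near", "no", "decline"),
    ("high", "near", "yes", "accept"), ("high", "near", "yes", "accept"),
]
rows = [dict(zip(("salary", "distance", "cab", "label"), r)) for r in records]

tree = build_tree(rows, ["salary", "distance", "cab"])  # Step 1: root holds all rows
print(tree)
print(predict(tree, {"salary": "high", "distance": "near", "cab": "yes"}))
```

On this data the salary attribute gives the purest split, so it becomes the root. A full CART implementation differs in details, for example by using only binary splits.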
Example:
Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not. To
solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into
the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision
node further splits into one decision node (Cab facility) and one leaf node. Finally, that decision node splits into two
leaf nodes (Accepted offer and Declined offer). Consider the diagram below: