MACHINE LEARNING
Machine Learning is a branch of artificial intelligence.
It gives computers the capability to learn without being explicitly programmed.
It provides techniques to collect data, applies various methods to learn from the
collected data, and then uses well-defined algorithms to predict future trends from
that data.
Ex: Google search, Amazon, Netflix
Arthur Samuel first used the term "machine learning" in 1959.
Features of Machine Learning:
Machine learning uses data to detect various patterns in a given dataset.
It can learn from past data and improve automatically.
It is a data-driven technology.
Machine learning is similar to data mining in that it also deals with large amounts of
data.
Categories of Machine Learning
At a broad level, machine learning can be classified as:
1) Supervised Learning
       Supervised learning is a type of machine learning in which the algorithm is trained
       on the labeled dataset.
               In supervised learning, the algorithm is provided with input features and
       corresponding output labels, and it learns to generalize from this data to make
       predictions on new, unseen data.
       There are two main types of supervised learning:
        • Regression:
       In a regression task, the algorithm learns to predict continuous values based on input
       features.
       Regression algorithms in machine learning include:
       Linear Regression, Polynomial Regression, Ridge Regression, Decision Tree
       Regression, Random Forest Regression, Support Vector Regression, etc.
        • Classification:
       In a classification task, the algorithm learns to assign input data to a specific category
       or class based on input features.
       The output labels in classification are discrete values.
       Classification problems can be binary, where the output is one of two possible
       classes, or multiclass, where the output can be one of several classes.
       Classification algorithms in machine learning include: Logistic Regression,
       Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors
       (KNN), etc.
2) Unsupervised Learning
               Unsupervised learning is a type of machine learning where the algorithm learns
       to recognize patterns in data without being explicitly trained using labeled examples.
       The goal of unsupervised learning is to discover the underlying structure or distribution
       in the data.
       There are two main types of unsupervised learning:
        • Clustering:
       Clustering algorithms group similar data points together based on their
       characteristics. Some popular clustering algorithms include K-means and hierarchical
       clustering.
        • Dimensionality Reduction:
       Dimensionality reduction algorithms reduce the number of input variables in a dataset
       while preserving as much of the original information as possible.
       This is useful for reducing the complexity of a dataset and making it easier to
       visualize and analyze.
       Some popular dimensionality reduction algorithms include Principal Component
       Analysis (PCA), t-SNE, and Autoencoders.
Ex:
Applying a classification model to new data.
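As a minimal sketch of applying a classification model to new data, the example below fits a
k-nearest-neighbors classifier on the Iris dataset bundled with scikit-learn and predicts the
species of two new flowers. The choice of classifier, the value n_neighbors=5, and the two rows
of measurements in X_new are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target          # labeled training data

# Train the classifier on labeled examples (supervised learning).
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)

# Hypothetical new, unlabeled measurements:
# sepal length, sepal width, petal length, petal width (cm).
X_new = np.array([[5.0, 3.4, 1.5, 0.2],
                  [6.7, 3.0, 5.2, 2.3]])

# Predict a class index for each new sample and map it to a species name.
predicted = clf.predict(X_new)
predicted_names = iris.target_names[predicted]
print(predicted_names)
```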
                                               ---
INTRODUCING SCIKIT-LEARN
        Scikit-learn (Sklearn) is a widely used and robust library for machine learning in
Python. It provides a selection of efficient tools for machine learning and statistical modeling,
including classification, regression, clustering, and dimensionality reduction, via a consistent
interface in Python.
Data Representation in Scikit-Learn
Data as table
        A convenient way to represent data in Scikit-learn is in the form of tables. A table
represents a 2-D grid of data where the rows represent the individual elements of the dataset and
the columns represent the quantities related to those individual elements.
Ex:
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
        In general, we will refer to the rows of the matrix as samples, the number of rows as
n_samples, columns of the matrix as features, and the number of columns as n_features.
Data as Feature Matrix
        The features matrix is the table layout in which the information can be thought of as
a 2-D matrix. It is stored in a variable named X and assumed to be two-dimensional with
shape [n_samples, n_features]. It is most often contained in a NumPy array or a Pandas
DataFrame.
        The samples (rows) always refer to the individual objects, and the features (columns)
always refer to the distinct observations that describe each sample in a quantitative manner.
Data as Target array
          Along with the features matrix, denoted by X, we also have a target array, also called
the label array and denoted by y. The target array is usually one-dimensional with length
n_samples. It may contain either continuous numerical values or discrete values.
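The features matrix and target array can be sketched as follows. This example assumes the copy
of the Iris dataset that ships with scikit-learn (load_iris) rather than the seaborn version,
so that no download is needed; the variable names n_samples and n_features follow the text.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # features matrix, shape [n_samples, n_features]
y = iris.target    # target array, length n_samples

n_samples, n_features = X.shape
print(X.shape)     # (150, 4)
print(y.shape)     # (150,)
```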
                                                ---
SCIKIT-LEARN’S ESTIMATOR API
It is one of the main APIs implemented by Scikit-learn.
It provides a consistent interface for a wide range of ML applications,
which is why all machine learning algorithms in Scikit-Learn are implemented via the Estimator
API.
The Scikit-Learn API is designed with the following guiding principles:
Consistency
       All objects share a common interface drawn from a limited set of methods, with
       consistent documentation.
Inspection
       All specified parameter values are exposed as public attributes.
Limited object hierarchy
       Only algorithms are represented by Python classes; datasets are represented in
       standard formats (NumPy arrays, Pandas DataFrames, SciPy sparse matrices) and
       parameter names use standard Python strings.
Composition
       Many ML tasks can be expressed as a sequence of more fundamental algorithms, and
       Scikit-learn makes use of these fundamental algorithms wherever possible.
Sensible defaults
       When models require user-specified parameters, the library defines an appropriate
       default value.
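A few of these principles can be seen directly in code. The sketch below is only an
illustration: the choice of LinearRegression and of a two-step pipeline is an assumption, not
something prescribed by the notes.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Sensible defaults: no arguments are required to create a model.
model = LinearRegression()

# Inspection: constructor parameters are exposed as public attributes.
print(model.fit_intercept)        # True (the default)
params = model.get_params()       # all parameters as a dictionary

# Composition: simpler steps are chained into a single estimator.
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
step_names = [name for name, _ in pipeline.steps]
print(step_names)
```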
Steps in using Estimator API
Step 1: Choose a class of model
In this first step, we need to choose a class of model. It can be done by importing the
appropriate Estimator class from Scikit-learn.
Step 2: Choose model hyperparameters
In this step, we need to choose the model hyperparameters. It can be done by instantiating
the class with the desired values.
Step 3: Arranging the data
Next, we need to arrange the data into a features matrix (X) and a target vector (y).
Step 4: Model Fitting
Now, we need to fit the model to the data. It can be done by calling the fit() method of the
model instance.
Step 5: Applying the model
After fitting the model, we can apply it to new data. For supervised learning, use the predict()
method to predict the labels for unknown data. For unsupervised learning, use predict()
or transform() to infer properties of the data.
Ex:    Supervised learning example: Simple linear regression
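The five steps can be sketched for simple linear regression as follows. The synthetic data
(points scattered around the line y = 2x - 1), the random seed, and the new inputs in X_new are
assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # Step 1: choose a class of model

# Synthetic data (an assumption): y is roughly 2x - 1 plus Gaussian noise.
rng = np.random.RandomState(42)
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

model = LinearRegression(fit_intercept=True)        # Step 2: choose hyperparameters
X = x[:, np.newaxis]                                # Step 3: features matrix [n_samples, 1]
model.fit(X, y)                                     # Step 4: fit the model to the data

X_new = np.linspace(-1, 11, 5)[:, np.newaxis]       # Step 5: apply the model to new data
y_new = model.predict(X_new)

# The learned slope and intercept should be close to 2 and -1.
print(model.coef_, model.intercept_)
```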
                                                  ---
FEATURE ENGINEERING
       Feature engineering is the process of transforming raw data into features that are
suitable for machine learning models. In other words, it is the process of selecting, extracting,
and transforming the most relevant features from the available data to build more accurate and
efficient machine learning models.
       The success of machine learning models heavily depends on the quality of the features
used to train them. Feature engineering involves a set of techniques that enable us to create
new features by combining or transforming the existing ones.
Categorical Features
       Categorical feature encoding transforms each categorical attribute into a numeric
representation. Transforming categorical data into numeric data is often called
“categorical-column encoding”.
       One-hot encoding is the simplest and most basic categorical-column encoding method.
The idea is to represent each category by a unique binary vector with one digit per category, so
the number of digits equals the number of categories. Each vector has a single digit set to 1 and
the rest set to 0, hence the name ‘one-hot.’
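A minimal sketch of one-hot encoding with scikit-learn's OneHotEncoder is shown below. The
colors column is a hypothetical example, not data from the notes.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three distinct categories.
colors = np.array([['red'], ['green'], ['blue'], ['green']])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for display.
one_hot = encoder.fit_transform(colors).toarray()

print(encoder.categories_[0])   # categories in sorted order
print(one_hot)                  # one row per sample, one column per category
```

Each row contains exactly one 1, in the column of that sample's category.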
Text Features
       Another common need in feature engineering is to convert text to a set of representative
numerical values. One of the simplest methods of encoding data is by word counts: you take
each snippet of text, count the occurrences of each word within it, and put the results in a table.
Ex:
sample = ['problem of evil', 'evil queen', 'horizon problem']
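The word-count encoding of the sample above can be sketched with scikit-learn's
CountVectorizer. Using this particular class is an assumption; the notes describe only the
counting idea.

```python
from sklearn.feature_extraction.text import CountVectorizer

sample = ['problem of evil', 'evil queen', 'horizon problem']

vec = CountVectorizer()
X = vec.fit_transform(sample)          # sparse matrix of word counts

# Column order follows the index assigned to each word in the vocabulary.
vocabulary = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
counts = X.toarray()

print(vocabulary)   # ['evil', 'horizon', 'of', 'problem', 'queen']
print(counts)       # one row per text snippet, one column per word
```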
Image Features
       Another common need is to suitably encode images for machine learning analysis.
The simplest approach is to use the pixel values.
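Using pixel values as features can be sketched with the handwritten digits dataset bundled with
scikit-learn. The choice of dataset is an assumption made for illustration.

```python
from sklearn.datasets import load_digits

digits = load_digits()
images = digits.images                 # shape (n_samples, 8, 8)

# Flatten each 8x8 image into a row of 64 pixel-value features.
X = images.reshape(len(images), -1)

print(images.shape)   # (1797, 8, 8)
print(X.shape)        # (1797, 64)
```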
Derived Features
         Derived features are new features computed from the existing input features. The goal
of creating derived features is to improve the performance of machine learning models by
providing additional information or reducing noise in the data. Derived features are
useful in many machine-learning applications, including image recognition, natural language
processing, and financial analysis.
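One common kind of derived feature is a polynomial of an existing column. The sketch below uses
scikit-learn's PolynomialFeatures; the input values and the degree of 3 are assumptions made
for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4, 5])

# Derive the columns x, x^2, and x^3 from the single input column.
poly = PolynomialFeatures(degree=3, include_bias=False)
X_derived = poly.fit_transform(x[:, np.newaxis])

print(X_derived)   # each row holds [x, x^2, x^3]
```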
                                                  ---
NAIVE BAYES CLASSIFICATION
 • The Naive Bayes algorithm is a supervised learning algorithm that is based on Bayes'
      theorem and used for solving classification problems.
 • It is mainly used in text classification, which involves high-dimensional training datasets.
 • It is one of the simplest and most effective classification algorithms and helps in
      building fast machine learning models that can make quick predictions.
 • It predicts on the basis of the probability of an object belonging to each class.
 • Some popular examples of the Naive Bayes algorithm are spam filtration and classifying
      articles.
The name Naive Bayes is made up of two words, Naive and Bayes:
Naive:
         It is called Naive because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features.
Ex:      If a fruit is identified on the basis of color, shape, and taste, then a red,
         spherical, and sweet fruit is recognized as an apple. Each feature individually
         contributes to identifying it as an apple, without depending on the others.
Bayes:
         It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
         Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional probability.
The formula for Bayes' theorem is given as:
P(A|B)=\frac{P(B|A)\,P(A)}{P(B)}
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is
true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
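A short worked example may make the four terms concrete. All of the numbers below are
hypothetical values for a spam filter, chosen only for illustration; A is "the email is spam"
and B is "the email contains the word free".

```python
# Hypothetical probabilities (assumptions for illustration only).
p_spam = 0.2                 # P(A): prior probability that an email is spam
p_word_given_spam = 0.6      # P(B|A): the word appears in a spam email
p_word_given_ham = 0.05      # P(B|not A): the word appears in a non-spam email

# P(B): marginal probability of the word, by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(A|B): posterior probability of spam given that the word appears.
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 3))   # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75.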
Types of Naïve Bayes Model:
Gaussian:
         The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values
are sampled from the Gaussian distribution.
Multinomial:
         The Multinomial Naive Bayes classifier is used when the data is multinomially
distributed. It is primarily used for document classification problems, that is, deciding which
category (such as Sports, Politics, or Education) a particular document belongs to.
Steps to implement:
         Data Pre-processing step
         Fitting Naive Bayes to the Training set
         Predicting the test result
         Test accuracy of the result(Creation of Confusion matrix)
         Visualizing the test set result.
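The steps above, except for the visualization, can be sketched as follows. The example assumes
the Iris dataset and the Gaussian model; the 25% test split and the random seed are likewise
illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Data pre-processing: load the data and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit Naive Bayes to the training set.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict the test result.
y_pred = model.predict(X_test)

# Test the accuracy of the result and create the confusion matrix.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(accuracy)
print(cm)   # rows are true classes, columns are predicted classes
```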
                                               ---
LINEAR REGRESSION
Linear regression is one of the easiest and most popular Machine Learning algorithms.
It models a linear relationship between a dependent variable and one or more
independent variables.
Types of Linear Regression
Simple Linear Regression
         This is the simplest form of linear regression, and it involves only one independent
variable and one dependent variable. The equation for simple linear regression is:
Y=\beta_{0}+\beta_{1}X
where:
         Y is the dependent variable
         X is the independent variable
         β0 is the intercept
         β1 is the slope
Ex:
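import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression
sns.set()  # use seaborn's default plot styling
# Synthetic data scattered around the line y = 2x - 5
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)
# Fit the model; scikit-learn expects a 2-D features matrix
model = LinearRegression(fit_intercept=True)
model.fit(x[:, np.newaxis], y)
# Predict along an evenly spaced grid and plot the fitted line over the data
xfit = np.linspace(0, 10, 1000)
yfit = model.predict(xfit[:, np.newaxis])
plt.scatter(x, y)
plt.plot(xfit, yfit)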
Multiple Linear Regression
         This involves more than one independent variable and one dependent variable. The
equation for multiple linear regression is:
Y=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\cdots+\beta_{n}X_{n}
where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept
β1, β2, …, βn are the slopes
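Multiple linear regression uses the same LinearRegression estimator with a features matrix of
several columns. In the sketch below, the three-column synthetic data and the true coefficients
(0.5 for the intercept and 1.5, -2.0, 1.0 for the slopes) are assumptions chosen so the result
is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
X = 10 * rng.rand(100, 3)                    # three independent variables
y = 0.5 + np.dot(X, [1.5, -2.0, 1.0])        # noise-free synthetic target

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)   # close to 0.5
print(model.coef_)        # close to [1.5, -2.0, 1.0]
```

Because the target here has no noise, the fitted intercept and slopes match the true values up
to floating-point error.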
Best Fit Line
         Our primary objective while using linear regression is to locate the best-fit line, which
implies that the error between the predicted and actual values should be kept to a minimum.
The best-fit line equation provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how much the dependent
variable changes for a unit change in the independent variable(s).
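The idea of keeping the error to a minimum can be sketched numerically. This example assumes
the error is measured as the mean squared error (ordinary least squares), which the notes do
not state explicitly; the helper function mse and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

# Closed-form least-squares estimates of the slope and intercept.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

def mse(b0, b1):
    """Mean squared error of the line y = b0 + b1 * x on the data."""
    return np.mean((y - (b0 + b1 * x)) ** 2)

best = mse(intercept, slope)
# A slightly different line gives a larger error than the best-fit line.
worse = mse(intercept + 0.5, slope - 0.1)
print(slope, intercept, best, worse)
```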
                                                ---
DECISION TREES AND RANDOM FORESTS
Decision Trees Classification
        Decision Tree is a supervised learning technique. It is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision rules, and each leaf node represents the
outcome.
        A decision tree simply asks a question and, based on the answer (Yes/No), further
splits the tree into subtrees.
        In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes
are the output of those decisions and do not contain any further branches.
Implementation of Decision Tree
       Data Pre-processing step
       Fitting a Decision-Tree algorithm to the Training set
       Predicting the test result
       Test accuracy of the result.
       Visualizing the test set result.
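These steps can be sketched with scikit-learn's DecisionTreeClassifier. The Iris dataset, the
depth limit of 3, and the text rendering of the tree (used here in place of a plot) are
assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Data pre-processing: load the data and split it into training and test sets.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

# Fit a decision tree to the training set.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Predict the test result and measure the accuracy.
y_pred = tree.predict(X_test)
accuracy = tree.score(X_test, y_test)
print(accuracy)

# Show the decision nodes (questions) and leaf nodes (classes) as text.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```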
Random Forest
       Random Forest is a popular supervised machine learning algorithm. It can be used for
both classification and regression problems in ML.
       Random Forest is an ensemble that builds a number of decision trees on various
subsets of the given dataset and combines their predictions (a majority vote for classification,
an average for regression) to improve predictive accuracy. A greater number of trees generally
leads to higher accuracy, with diminishing returns and a higher computational cost.
Implementation of Random Forest Algorithm
      Data Pre-processing step
      Fitting the Random forest algorithm to the Training set
      Predicting the test result
      Test accuracy of the result (Creation of Confusion matrix)
      Visualizing the test set result.
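A sketch of these steps with scikit-learn's RandomForestClassifier follows, again leaving out
the visualization. The wine dataset, the 100 trees, and the random seed are assumptions made
for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Data pre-processing: load the data and split it into training and test sets.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a forest of 100 decision trees to the training set.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Predict the test result, then compute accuracy and the confusion matrix.
y_pred = forest.predict(X_test)
accuracy = forest.score(X_test, y_test)
cm = confusion_matrix(y_test, y_pred)
print(accuracy)
print(cm)
```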
                                              ---
PRINCIPAL COMPONENT ANALYSIS
       Principal component analysis (PCA) is a fast and flexible unsupervised method for
dimensionality reduction in data. It is a statistical process that converts the observations of
correlated features into a set of linearly uncorrelated features. These new transformed
features are called the principal components (PCs). The number of PCs is either equal to or
less than the number of original features in the dataset.
Some properties of these principal components are given below:
 • Each principal component must be a linear combination of the original features.
 • The components are orthogonal, i.e., the correlation between any pair of components is zero.
 • The importance of each component decreases from 1 to n: the first PC has the most
    importance, and the n-th PC has the least.
Steps for PCA algorithm
      Getting the dataset
      Representing the data as a feature matrix
      Standardizing the data to obtain the matrix Z
      Calculating the covariance matrix of Z
      Calculating the eigenvalues and eigenvectors
      Sorting the eigenvectors by decreasing eigenvalue
      Calculating the new features, or principal components
      Removing the less important features from the new dataset.
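The steps above can be sketched directly in NumPy. The Iris dataset and the choice to keep
k = 2 components are assumptions made for illustration; in practice scikit-learn's PCA class
performs the equivalent computation.

```python
import numpy as np
from sklearn.datasets import load_iris

# Get the dataset and represent it as a feature matrix.
X = load_iris().data                          # shape (150, 4)

# Standardize each feature to zero mean and unit variance, giving Z.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of Z (features are columns, hence rowvar=False).
cov = np.cov(Z, rowvar=False)                 # shape (4, 4)

# Eigenvalues and eigenvectors of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort the eigenvectors by decreasing eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first k eigenvectors and drop the less important ones.
k = 2
Z_pca = Z @ eigvecs[:, :k]                    # shape (150, 2)

# Fraction of the total variance carried by each component.
explained = eigvals / eigvals.sum()
print(Z_pca.shape, explained)
```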
PCA as Noise Filtering
       When utilizing real-life data, several factors can impact the data. One significant
element is noise. Data collection often presents opportunities for human error, and unreliable
data collection tools can lead to inaccuracies, commonly referred to as noise. This
noise can present challenges in machine learning, as algorithms can misinterpret it and
generalize from it.
       If a dataset has a high volume of noise, it can severely disrupt the whole data analysis.
Data scientists often measure noise using a signal-to-noise ratio, and they
must address and manage noise in their data science algorithms.
       PCA can be used to remove noise from a signal or image while keeping its essential
features. It is a geometric and statistical technique that lowers the dimensionality of the
input signal data by projecting it along different axes. In simple terms, you can
imagine projecting a point in the XY plane onto the X-axis and thereby removing the
noisy Y-axis. This process is known as "dimensionality reduction."
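Noise filtering with PCA can be sketched by projecting noisy data onto the leading components
and mapping it back. The digits dataset, the Gaussian noise with standard deviation 4, and the
choice to keep 50% of the variance are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

clean = load_digits().data                            # shape (1797, 64)

# Add synthetic Gaussian noise to every pixel.
rng = np.random.default_rng(42)
noisy = clean + rng.normal(0, 4, size=clean.shape)

# Keep only the leading components that explain 50% of the variance.
pca = PCA(n_components=0.50).fit(noisy)

# Project onto those components and reconstruct in the original space.
filtered = pca.inverse_transform(pca.transform(noisy))

# Compare the mean squared error against the clean data before and after.
err_noisy = np.mean((noisy - clean) ** 2)
err_filtered = np.mean((filtered - clean) ** 2)
print(pca.n_components_, err_noisy, err_filtered)
```

The discarded low-variance components carry mostly noise, so the reconstruction ends up closer
to the clean data than the noisy input was.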
                                             ---
K-MEANS CLUSTERING
        K-Means Clustering is an unsupervised learning algorithm that groups an
unlabeled dataset into different clusters. Here K defines the number of pre-defined
clusters that need to be created in the process.
        The algorithm takes the unlabeled dataset as input, divides the dataset into k
clusters, and repeats the process until it finds the best clusters. The value of k
should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
 • Determines the best value for the K center points, or centroids, by an iterative process.
 • Assigns each data point to its closest k-center. The data points that are near a
    particular k-center form a cluster.
        Hence each cluster has data points with some commonalities, and it is separated from
other clusters.
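In scikit-learn, both tasks are handled by the KMeans estimator. The sketch below assumes
synthetic data from make_blobs with four groups; the parameter values are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabeled data with four groups (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# K must be chosen in advance through n_clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)          # cluster index assigned to each point
centers = kmeans.cluster_centers_       # one centroid per cluster

print(centers.shape)   # (4, 2)
```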
The working of the K-Means algorithm is explained in the below steps:
 • Step-1: Select the number K to decide the number of clusters.
 • Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
 • Step-3: Assign each data point to its closest centroid, which will form the predefined K
              clusters.
 • Step-4: Calculate the mean of the points in each cluster and place the cluster's new
              centroid there.
 • Step-5: Repeat the third step, which means reassigning each data point to the new closest
              centroid.
 • Step-6: If any reassignment occurs, then go to Step-4; otherwise go to FINISH.
 • Step-7: The model is ready.
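The steps above can be sketched from scratch in NumPy. This is a minimal illustration, not
scikit-learn's implementation: the function name kmeans, the iteration cap n_iter, and the
choice to draw the initial centroids from the data points are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means that follows the listed steps."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Steps 3 and 5: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: finish when no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Step 1: choose K = 4 for synthetic data with four groups.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
labels, centroids = kmeans(X, k=4)
print(centroids)
```

Because the starting centroids are random, different seeds can converge to different
clusterings.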
                                             ---