
Report Machine Learning

Uploaded by

sshuk8107

Introduction

Machine Learning is the science of getting computers to learn without being explicitly programmed. It is closely related to computational statistics, which focuses on making predictions using computers. In its application across business problems, machine learning is also referred to as predictive analysis. Machine Learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow the computers to learn automatically without human intervention or assistance and adjust actions accordingly.

History of Machine Learning

The name machine learning was coined in 1959 by Arthur Samuel. Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." This follows Alan Turing's proposal in his paper "Computing Machinery and Intelligence", in which the question "Can machines think?" is replaced with the question "Can machines do what we (as thinking entities) can do?". In Turing's proposal the various characteristics that could be possessed by a thinking machine and the various implications in constructing one are exposed.

Types of Machine Learning

The types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve. Broadly, Machine Learning can be categorized into four categories.

I. Supervised Learning

II. Unsupervised Learning

III. Reinforcement Learning

IV. Semi-supervised Learning

Machine learning enables analysis of massive quantities of data. While it generally delivers faster, more accurate results in order to identify profitable opportunities or dangerous risks, it may also require additional time and resources to train it properly.

Supervised Learning

Supervised Learning is a type of learning in which we are given a data set and we already know what our correct output should look like, having the idea that there is a relationship between the input and output. Basically, it is the task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.

Supervised learning problems are categorized into regression and classification problems.

Unsupervised Learning

Unsupervised Learning is a type of learning that allows us to approach problems with little or no idea what our results should look like. We can derive the structure by clustering the data based on a relationship among the variables in the data. With unsupervised learning there is no feedback based on the prediction result. Basically, it is a type of self-organized learning that helps in finding previously unknown patterns in a data set without preexisting labels.
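
As a rough illustration of clustering without labels, the sketch below groups six made-up points with scikit-learn's KMeans. The data, the choice of two clusters, and the fixed `random_state` are assumptions made only for this example; they do not come from the training material.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled points forming two visually separate groups (made-up data).
points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0],
])

# Ask for two clusters; n_init and random_state are fixed for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

print(labels)                  # cluster index (0 or 1) assigned to each point
print(model.cluster_centers_)  # the two learned group centres
```

No target values are passed to the model; the grouping comes only from the distances between the points.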

Reinforcement Learning

Reinforcement learning is a learning method that interacts with its environment by producing actions and discovers errors or rewards. Trial and error search and delayed reward are the most relevant characteristics of reinforcement learning. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize its performance. Simple reward feedback is required for the agent to learn which action is best.
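
The trial-and-error idea can be sketched with a very small agent that repeatedly picks one of three actions and only sees a reward of 1 or 0. The reward probabilities, the exploration rate `epsilon`, and the number of steps are hypothetical values chosen for illustration; this is one simple strategy (epsilon-greedy), not the only way to do reinforcement learning.

```python
import random

random.seed(0)  # fixed seed so repeated runs behave the same

# Hypothetical environment: three actions with hidden success probabilities.
reward_probability = [0.2, 0.5, 0.8]

estimates = [0.0, 0.0, 0.0]  # agent's running estimate of each action's value
counts = [0, 0, 0]           # how often each action has been tried
epsilon = 0.1                # fraction of steps spent exploring at random

for step in range(5000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore
    else:
        action = estimates.index(max(estimates))  # exploit best known action
    # The environment answers with a simple reward signal: 1 or 0.
    reward = 1 if random.random() < reward_probability[action] else 0
    counts[action] += 1
    # Incremental mean: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = estimates.index(max(estimates))
print(best_action, [round(e, 2) for e in estimates])
```

The agent is never told which action is best; it infers this from the rewards it collects.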

Semi-Supervised Learning

Semi-supervised learning falls somewhere in between supervised and unsupervised learning, since it uses both labeled and unlabeled data for training, typically a small amount of labeled data and a large amount of unlabeled data. The systems that use this method are able to considerably improve learning accuracy.

Usually, semi-supervised learning is chosen when the acquired labeled data requires skilled and relevant resources in order to train it / learn from it. Otherwise, acquiring unlabeled data generally doesn't require additional resources.
Literature Survey

Theory

A core objective of a learner is to generalize from its experience. The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common. The bias-variance decomposition is one way to quantify generalization error.

For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfit the data. If the complexity of the model is increased in response, then the training error decreases. But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer.
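
The effect of hypothesis complexity on training error can be sketched by fitting polynomials of increasing degree to noisy samples. The underlying function, the noise level, and the degrees 1, 3 and 9 are assumptions chosen for illustration. The training error can only go down as the degree grows; the error on fresh data typically stops improving or gets worse once the polynomial becomes too flexible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a smooth underlying function plus noise.
def underlying(x):
    return np.sin(3 * x)

x_train = np.linspace(-1, 1, 20)
y_train = underlying(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(-0.95, 0.95, 200)
y_test = underlying(x_test) + rng.normal(0, 0.2, x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of the polynomial given by coeffs on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_error, test_error = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_error[degree] = mse(coeffs, x_train, y_train)
    test_error[degree] = mse(coeffs, x_test, y_test)
    print(degree, round(train_error[degree], 4), round(test_error[degree], 4))
```

Comparing the two printed columns for each degree shows the gap between fitting the training set and generalizing to unseen points.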
In addition to performance bounds, learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.

The Challenges Facing Machine Learning

While there has been much progress in machine learning, there are also challenges. For example, the mainstream machine learning technologies are black-box approaches, making us concerned about their potential risks. To tackle this challenge, we may want to make machine learning more explainable and controllable. As another example, the computational complexity of machine learning algorithms is usually very high and we may want to invent lightweight algorithms or implementations. Furthermore, in many domains such as physics, chemistry, biology, and social sciences, people usually seek elegantly simple equations (e.g., the Schrödinger equation) to uncover the underlying laws behind various phenomena.

Machine learning takes much more time. You have to gather and prepare data, then train the algorithm. There are many more uncertainties. That is why, while in traditional website or application development an experienced team can estimate the time quite precisely, a machine learning project used for example to provide product recommendations can take much less or much more time than expected. Why? Because even the best machine learning engineers don't know how the deep learning networks will behave when analyzing different sets of data. It also means that the machine learning engineers and data scientists cannot guarantee that the training process of a model can be replicated.


Applications of Machine Learning
Machine learning is one of the most exciting technologies that one would have ever come across. As it is evident from the name, it gives the computer that which makes it more similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places than one would expect. We probably use a learning algorithm dozens of times without even knowing it. Applications of Machine Learning include:

Web Search Engine: One of the reasons why search engines like Google, Bing etc. work so well is because the system has learnt how to rank pages through a complex learning algorithm.

Photo tagging Applications: Be it Facebook or any other photo tagging application, the ability to tag friends makes it even more happening. It is all possible because of a face recognition algorithm that runs behind the application.

Spam Detector: Our mail agent like Gmail or Hotmail does a lot of hard work for us in classifying the mails and moving the spam mails to the spam folder. This is again achieved by a spam classifier running in the back end of the mail application.

Future Scope
The future of Machine Learning is as vast as the limits of the human mind. We can always keep learning, and teaching the computers how to learn. And at the same time, wondering how some of the most complex machine learning algorithms have been running in the back of our own mind so effortlessly all the time.

There is a bright future for machine learning. Companies like Google, Quora, and Facebook hire people with machine learning skills. There is intense research in machine learning at the top universities in the world. The global machine learning as a service market is rising expeditiously, mainly due to the Internet revolution. The process of connecting the world virtually has generated vast amounts of data, which is boosting the adoption of machine learning solutions. Considering all these applications and dramatic improvements that ML has brought us, it doesn't take a genius to realize that in the coming future we will definitely see more advanced applications of ML, applications that will stretch the capabilities of machine learning to an unimaginable level.
Organization of Training Workshop
Company Profile

DreamUny Education was created with a mission to create skilled software engineers for our country and the world. It aims to bridge the gap between the quality of skills demanded by industry and the quality of skills imparted by conventional institutes. With assessments, learning paths and courses authored by industry experts, DreamUny helps businesses and individuals benchmark expertise across roles, speed up release cycles and build reliable, secure products.

Objectives

Main objectives of training were to learn:

How to determine and measure program complexity

Python Programming

ML Libraries: Scikit, NumPy, Matplotlib, Pandas, Theano, TensorFlow

Statistical Math for the Algorithms

Learning to solve statistics and mathematical concepts

Supervised and Unsupervised Learning

Classification and Regression

ML Algorithms

Machine Learning Programming and Use Cases

Methodologies
There were several facilitation techniques used by the trainer, which included question and answer, brainstorming, group discussions, case study discussions and practical implementation of some of the topics by trainees on flip charts and paper sheets. The multitude of training methodologies was utilized in order to make sure all the participants grasp the concepts fully and practice what they learn, because what is only heard from the trainers can be forgotten, but what the trainees do by themselves they will never forget. After the post-tests were administered and the final course evaluation forms were filled in by the participants, the trainer expressed his closing remarks and reiterated the importance of the training for the trainees in their daily activities and their readiness for applying the learnt concepts in their assigned tasks. Certificates of completion were distributed among the participants at the end.

2.3 MACHINE LEARNING:

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field.

Machine learning (ML) is a category of algorithm that allows software applications to become more accurate in predicting outcomes without being explicitly programmed. The basic premise of machine learning is to build algorithms that can receive input data and use statistical analysis to predict an output while updating outputs as new data becomes available.[3]

2.3.1 How Machine Learning Works

Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised algorithms require a data scientist or data analyst with machine learning skills to provide both input and desired output, in addition to furnishing feedback about the accuracy of predictions during algorithm training. Data scientists determine which variables, or features, the model should analyze and use to develop predictions. Once training is complete, the algorithm will apply what was learned to new data.

Unsupervised algorithms do not need to be trained with desired outcome data. Instead, they use an iterative approach to review data and arrive at conclusions. Deep learning, which is built on neural networks and can be trained with or without labels, is used for more complex processing tasks, including image recognition, speech-to-text and natural language generation. These neural networks work by combing through millions of examples of training data and automatically identifying often subtle correlations between many variables. Once trained, the algorithm can use its bank of associations to interpret new data. These algorithms have only become feasible in the age of big data, as they require massive amounts of training data.

2.3.2 Advantages of Machine Learning

1. Trends and Patterns Are Identified With Ease

Machine Learning is adept at reviewing large volumes of data and identifying patterns and trends that might not be apparent to a human. For instance, a machine learning program may successfully pinpoint a causal relationship between two events. This makes the technology highly effective at data mining, particularly on a continual, ongoing basis, as would be required for an algorithm.

2. Machine Learning Improves Over Time

Machine Learning technology typically improves efficiency and accuracy over time thanks to the ever-increasing amounts of data that are processed. This gives the algorithm or program more "experience," which can, in turn, be used to make better decisions or predictions.

A great example of this improvement over time involves weather prediction models. Predictions are made by looking at past weather patterns and events; this data is then used to determine what's most likely to occur in a particular scenario. The more data you have in your data set, the greater the accuracy of a given forecast. The same basic concept is also true for algorithms that are used to make decisions or recommendations.

3. Machine Learning Lets You Adapt Without Human Intervention

Machine Learning allows for instantaneous adaptation, without the need for human intervention. An excellent example of this can be found in security and anti-virus software programs, which leverage machine learning and AI technology to implement filters and other safeguards in response to new threats.

These systems use data science to identify new threats and trends. Then, the AI technology is used to implement the appropriate measures for neutralizing or protecting against that threat. Data Science has eliminated the gap between the time when a new threat is identified and the time when a response is issued. This near-immediate response is critical in a niche where bots, viruses, worms, hackers and other cyber threats can impact thousands or even millions of people in minutes.

4. Automation

Machine Learning is a key component in technologies such as predictive analytics and artificial intelligence. The automated nature of Data Science means it can save time and money, as developers and analysts are freed up to perform high-level tasks that a computer simply cannot handle.

On the flip side, you have a computer running the show and that's something that is certain to make any developer squirm with discomfort. For now, technology is imperfect. Still, there are workarounds. For instance, if you're employing Data Science technology in order to develop an algorithm, you might program the Data Science interface so it just suggests improvements or changes that must be implemented by a human.

This workaround adds a human gatekeeper to the equation, thereby reducing the potential for problems that can arise when a computer is in charge. After all, an algorithm update that looks good on paper may not work effectively when it's put into practice.
Various Python libraries used in the project

2.4 NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object

sophisticated (broadcasting) functions

tools for integrating C/C++ and Fortran code

useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy is licensed under the BSD license, enabling reuse with few restrictions. The core functionality of NumPy is its "ndarray", for n-dimensional array, data structure. These arrays are strided views on memory. In contrast to Python's built-in list data structure (which, despite the name, is a dynamic array), these arrays are homogeneously typed: all elements of a single array must be of the same type. NumPy has built-in support for memory-mapped arrays.

Here are some functions that are defined in the NumPy library.

1. zeros(shape[, dtype, order]) - Return a new array of given shape and type, filled with zeros.

2. array(object[, dtype, copy, order, subok, ndmin]) - Create an array.

3. asarray(a[, dtype, order]) - Convert the input to an array.

4. asanyarray(a[, dtype, order]) - Convert the input to an ndarray, but pass ndarray subclasses through.

5. arange([start,] stop[, step,][, dtype]) - Return evenly spaced values within a given interval.

6. linspace(start, stop[, num, endpoint, ...]) - Return evenly spaced numbers over a specified interval.

There are many more functions which are used to perform specified operations on the given input values.
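
The functions listed above can be tried out in a few lines. The shapes and values below are arbitrary choices made for this example.

```python
import numpy as np

z = np.zeros((2, 3))                 # 2x3 array filled with 0.0
a = np.array([[1, 2], [3, 4]])       # new array built from nested lists
b = np.asarray(a)                    # input is already an ndarray, so no copy
m = np.asanyarray(np.ma.masked_array([1, 2, 3]))  # subclass passes through
r = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8 (stop is excluded)
s = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values, stop included

print(z.shape, b is a, type(m).__name__)
print(r.tolist(), s.tolist())
```

The difference between `arange` and `linspace` is worth noting: `arange` takes a step size and leaves out the stop value, while `linspace` takes the number of points and includes the stop value by default.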

2.5 Pandas

Pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python with Pandas is used in a wide range of fields including academic and commercial domains such as finance, economics, statistics, analytics, etc.

The name Pandas is derived from the term Panel Data, an econometrics term for multidimensional data.

In 2008, developer Wes McKinney started developing pandas when in need of a high-performance, flexible tool for analysis of data.

Prior to Pandas, Python was majorly used for data munging and preparation. It had very little contribution towards data analysis. Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of data: load, prepare, manipulate, model, and analyze.

Key Features of Pandas

• Fast and efficient DataFrame object with default and customized indexing
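
Default and customized indexing of a DataFrame can be sketched as follows. The names and scores are made-up data used only to show the two kinds of index.

```python
import pandas as pd

# Default indexing: rows are labelled 0, 1, 2 automatically.
scores = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "score": [82, 91, 77]})

# Customized indexing: use the (made-up) names as row labels instead.
by_name = scores.set_index("name")

print(scores.index.tolist())         # [0, 1, 2]
print(by_name.loc["Ben", "score"])   # 91
print(by_name[by_name["score"] > 80].index.tolist())  # ['Asha', 'Ben']
```

With a customized index, rows can be looked up by a meaningful label through `.loc` rather than by position.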

2.6 Matplotlib

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code. For examples, see the sample plots and thumbnail gallery.

For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
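
The two interfaces mentioned above can be compared side by side. This is a minimal sketch with made-up data; the "Agg" backend and the in-memory buffer are assumptions made so that the script also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# pyplot (MATLAB-like) interface: functions act on the "current" figure.
plt.figure()
plt.plot(x, y, "o-")
plt.title("pyplot interface")

# Object-oriented interface: explicit Figure and Axes objects.
fig, ax = plt.subplots()
ax.plot(x, y, linestyle="--", color="green", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render the second figure as PNG bytes in memory instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
print(len(buffer.getvalue()), "bytes of PNG data")
```

The object-oriented form is usually easier to manage once a script holds more than one figure, because every call names the axes it changes.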


2.7 Scikit-Learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

The scikit-learn project started as scikits.learn, a Google Summer of Code project by David Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately-developed and distributed third-party extension to SciPy. The original codebase was later rewritten by other developers. In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel, all from INRIA, took leadership of the project and made the first public release on February the 1st 2010. Of the various scikits, scikit-learn as well as scikit-image were described as "well-maintained and popular" in November 2012. As of 2018, scikit-learn is under active development.

Scikit-learn is largely written in Python, with some core algorithms written in Cython to achieve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. [10]

2.7.1 Advantages of using Scikit-Learn:

• Scikit-learn provides a clean and consistent interface to tons of different models.

• It provides you with many options for each model, but also chooses sensible defaults.

• Its documentation is exceptional, and it helps you to understand the models as well as how to use them properly.

• It is also actively being developed.
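
The consistent interface can be seen by training two quite different models with the same calls. The tiny dataset is made up for illustration, and both models are used with their default settings (apart from a fixed `random_state` for the tree).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up dataset: one feature, label is 1 when the feature is large.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

predictions = {}
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    model.fit(X, y)    # same training call for every estimator
    predictions[type(model).__name__] = model.predict([[0.5], [9.5]]).tolist()

print(predictions)
```

Swapping one model for another only changes the constructor; the `fit` and `predict` calls stay the same.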


Chapter-3

Technology Implemented
Python - The New Generation Language

Python is a widely used general-purpose, high level programming language. It was initially designed by Guido van Rossum in 1991 and developed by Python Software Foundation. It was mainly developed for an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.

Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.

Features

Interpreted

In Python there are no separate compilation and execution steps like in C/C++. It directly runs the program from the source code. Internally, Python converts the source code into an intermediate form called bytecode, which the Python virtual machine then executes.
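
In CPython, the reference implementation, the intermediate bytecode can be inspected with the standard `dis` module. The small `add` function is a made-up example, and the instruction names that are printed differ between Python versions.

```python
import dis

def add(a, b):
    return a + b

# CPython compiled the function body to bytecode when the def statement ran.
print(type(add.__code__.co_code))  # <class 'bytes'>

# dis lists the instructions the interpreter will execute; the exact
# instruction names vary between Python versions.
instruction_names = [ins.opname for ins in dis.get_instructions(add)]
print(instruction_names)
```

No separate compile command was needed: the conversion to bytecode happened as part of running the script.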

Platform Independent

Python programs can be developed and executed on multiple operating system platforms. Python can be used on Linux, Windows, Macintosh, Solaris and many more.

Multi-Paradigm

Python is a multi-paradigm programming language. Object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming.

Simple

Python is a very simple language. It is very easy to learn as it is closer to the English language. In Python, more emphasis is on the solution to the problem rather than the syntax.

Rich Library Support

The Python standard library is very vast. It can help to do various things involving regular expressions, documentation generation, unit testing, threading, databases, web browsers, CGI, email, XML, HTML, WAV files, cryptography, GUI and many more.

Free and Open Source

Firstly, Python is freely available. Secondly, it is open-source. This means that its source code is available to the public. We can download it, change it, use it, and distribute it. This is called FLOSS (Free/Libre and Open Source Software). As the Python community, we're all headed toward one goal: an ever-bettering Python.

Why Python Is a Perfect Language for Machine Learning?

1. A great library ecosystem -

A great choice of libraries is one of the main reasons Python is the most popular programming language used for AI. A library is a module or a group of modules published by different sources which include a pre-written piece of code that allows users to reach some functionality or perform different actions. Python libraries provide base level items so developers don't have to code them from the very beginning every time. ML requires continuous data processing, and Python's libraries let us access, handle and transform data. These are some of the most widespread libraries you can use for ML and AI:

o Scikit-learn for handling basic ML algorithms like clustering, linear and logistic regressions, regression, classification, and others.

o Pandas for high-level data structures and analysis. It allows merging and filtering of data, as well as gathering it from other external sources like Excel, for instance.

o Keras for deep learning. It allows fast calculations and prototyping, as it uses the GPU in addition to the CPU of the computer.

o TensorFlow for working with deep learning by setting up, training, and utilizing artificial neural networks with massive datasets.

o Matplotlib for creating 2D plots, histograms, charts, and other forms of visualization.

o NLTK for working with computational linguistics, natural language recognition, and processing.

o Scikit-image for image processing.

o PyBrain for neural networks, unsupervised and reinforcement learning.

o Caffe for deep learning that allows switching between the CPU and the GPU and processing 60+ mln images a day using a single NVIDIA K40 GPU.

o StatsModels for statistical algorithms and data exploration.

In the PyPI repository, we can discover and compare more Python libraries.

2. A low entry barrier -

Working in the ML and AI industry means dealing with a bunch of data that we need to process in the most convenient and effective way. The low entry barrier allows more data scientists to quickly pick up Python and start using it for AI development without wasting too much effort on learning the language. In addition to this, there's a lot of documentation available, and Python's community is always there to help out and give advice.


3. Flexibility -

Python for machine learning is a great choice, as this language is very flexible:

It offers an option to choose either to use OOPs or scripting.

There's also no need to recompile the source code; developers can implement any changes and quickly see the results.

Programmers can combine Python and other languages to reach their goals.

4. Good Visualization Options -

For AI developers, it's important to highlight that in artificial intelligence, deep learning, and machine learning, it's vital to be able to represent data in a human-readable format. Libraries like Matplotlib allow data scientists to build charts, histograms, and plots for better data comprehension, effective presentation, and visualization. Different application programming interfaces also simplify the visualization process and make it easier to create clear reports.

5. Community Support -

It's always very helpful when there's strong community support built around the programming language. Python is an open-source language, which means that there's a bunch of resources open for programmers, starting from beginners and ending with pros. A lot of Python documentation is available online as well as in Python communities and forums, where programmers and machine learning developers discuss errors, solve problems, and help each other out. The Python programming language is absolutely free, as is the variety of useful libraries and tools.

6. Growing Popularity -

As a result of the advantages discussed above, Python is becoming more and more popular among data scientists. According to StackOverflow, the popularity of Python is predicted to grow until 2020, at least. This means it's easier to search for developers and replace team players if required. Also, the cost of their work may be not as high as when using a less popular programming language.

Data Preprocessing, Analysis & Visualization

Machine Learning algorithms don't work so well with processing raw data. Before we can feed such data to an ML algorithm, we must preprocess it. We must apply some transformations on it. With data preprocessing, we convert raw data into a clean data set. To perform this, there are 7 techniques -

1. Rescaling Data -

For data with attributes of varying scales, we can rescale attributes to possess the same scale. We rescale attributes into the range 0 to 1 and call it normalization. We use the MinMaxScaler class from scikit-learn. This gives us values between 0 and 1.

2. Standardizing Data -

With standardizing, we can take attributes with a Gaussian distribution and different means and standard deviations and transform them into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1.

3. Normalizing Data -

In this task, we rescale each observation to a length of 1 (a unit norm). For this, we use the Normalizer class.

4. Binarizing Data -

Using a binary threshold, it is possible to transform our data by marking the values above it 1 and those equal to or below it, 0. For this purpose, we use the Binarizer class.

5. Mean Removal -

We can remove the mean from each feature to center it on zero.

6. One Hot Encoding -

When dealing with few and scattered numerical values, we may not need to store these. Then, we can perform One Hot Encoding. For k distinct values, we can transform the feature into a k-dimensional vector with one value of 1 and 0 as the rest values.

7. Label Encoding -

Some labels can be words or numbers. Usually, training data is labelled with words to make it readable. Label encoding converts word labels into numbers to let algorithms work on them.
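
The seven techniques can be sketched together with scikit-learn's preprocessing module. MinMaxScaler, Normalizer and Binarizer are the classes named in the text; StandardScaler, OneHotEncoder and LabelEncoder are assumed here as the matching scikit-learn classes for the remaining steps, and all data values and the threshold are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler, StandardScaler, Normalizer, Binarizer,
    OneHotEncoder, LabelEncoder,
)

# Made-up numeric data: 3 observations (rows) x 2 attributes (columns).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

rescaled = MinMaxScaler().fit_transform(X)             # 1. each column in [0, 1]
standardized = StandardScaler().fit_transform(X)       # 2. column mean 0, std 1
normalized = Normalizer().fit_transform(X)             # 3. each row has length 1
binarized = Binarizer(threshold=2.0).fit_transform(X)  # 4. 1 where value > 2.0
centered = StandardScaler(with_std=False).fit_transform(X)  # 5. mean removed

# Made-up categorical data for the two encoders.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
one_hot = OneHotEncoder().fit_transform(colors).toarray()  # 6. k = 3 columns
labels = LabelEncoder().fit_transform(colors.ravel())      # 7. words -> integers

print(rescaled[:, 0].tolist())   # [0.0, 0.5, 1.0]
print(binarized[:, 0].tolist())  # [0.0, 0.0, 1.0]
print(labels.tolist())           # [2, 1, 0, 1] (classes sorted: blue, green, red)
```

Note that Binarizer maps values equal to the threshold to 0, which matches the "equal to or below" wording above.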
Machine Learning Algorithms

There are many types of Machine Learning Algorithms specific to different use cases. As we work with datasets, a machine learning algorithm works in two stages. We usually split the data around 20%-80% between testing and training stages. Under supervised learning, we split a dataset into training data and test data.
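
A 20%-80% split can be sketched with scikit-learn's `train_test_split`. The ten samples are made up, and `random_state=0` is an arbitrary choice that only makes the shuffle repeatable.

```python
from sklearn.model_selection import train_test_split

# Ten made-up samples: one feature each, with a label.
X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# test_size=0.2 keeps 20% for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))  # 8 2
```

The model is fitted on the training part only, and the held-out test part is used afterwards to estimate how well it generalizes.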

1. Linear Regression-

Linear regression is one of the supervised Machine Learning algorithms in Python that observes continuous features and predicts an outcome. Depending on whether it runs on a single variable or on many features, we can call it simple linear regression or multiple linear regression.

This is one of the most popular Python ML algorithms and is often under-appreciated. It assigns optimal weights to variables to create a line ax+b to predict the output. We often use linear regression to estimate real values like the number of calls and the costs of houses based on continuous variables. The regression line is the best line that fits Y=a*X+b to denote a relationship between independent and dependent variables.
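
For simple linear regression, the best-fitting a and b in Y=a*X+b can be computed in closed form by ordinary least squares. The sketch below assumes that criterion (the report does not name one) and uses made-up data; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up points lying exactly on y = 2x + 1
a, b = fit_line(xs, ys)
prediction = a * 5.0 + b   # predicted output for a new input x = 5
```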


2. Logistic Regression

Logistic regression is a supervised classification algorithm, one of the Machine Learning algorithms in Python that finds its use in estimating discrete values like 0/1, yes/no, and true/false. This is based on a given set of independent variables. We use a logistic function to predict the probability of an event, and this gives us an output between 0 and 1. Although it says 'regression', this is actually a classification algorithm. Logistic regression fits data into a logit function and is also called logit regression.
[Figure: plot of the sigmoid function sig(t) = 1 / (1 + e^(-t)), with output between 0 and 1.]

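The logistic function and its use for a 0/1 decision can be sketched as below. The weights `a` and `b` are assumed to be already learned, and their values, like the 0.5 cut-off, are made up for illustration.

```python
import math

def sigmoid(t):
    """Logistic function 1 / (1 + e^(-t)), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_probability(x, a, b):
    """Probability of the event for input x under a one-feature model."""
    return sigmoid(a * x + b)

def predict_label(x, a, b, threshold=0.5):
    """Turn the probability into a discrete 0/1 answer."""
    return 1 if predict_probability(x, a, b) >= threshold else 0

label_high = predict_label(3.0, a=1.0, b=-2.0)  # sigmoid(1.0) is above 0.5
label_low = predict_label(1.0, a=1.0, b=-2.0)   # sigmoid(-1.0) is below 0.5
```
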
3. Decision Tree-

A decision tree falls under supervised Machine Learning Algorithms in Python and comes of use for both classification and regression, although mostly for classification. This model takes an instance, traverses the tree, and compares important features with a determined conditional statement. Whether it descends to the left child branch or the right depends on the result. Usually, more important features are closer to the root.

Decision Tree, a Machine Learning algorithm in Python, can work on both categorical and continuous dependent variables. Here, we split a population into two or more homogeneous sets. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.
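
The traversal described above can be sketched with a small hand-built classification tree. The tree, its feature names, and its thresholds are invented for illustration; the sketch shows only prediction, not how a tree is learned from data.

```python
# Hypothetical tree: internal nodes test one feature, leaves hold a class label.
tree = {
    "feature": "diameter_in", "threshold": 2.0,
    "left": {"label": "cherry"},
    "right": {
        "feature": "redness", "threshold": 0.5,
        "left": {"label": "lemon"},
        "right": {"label": "apple"},
    },
}

def predict(node, instance):
    """Walk from the root to a leaf, going left when the test value <= threshold."""
    while "label" not in node:
        if instance[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

fruit = predict(tree, {"diameter_in": 2.5, "redness": 0.9})
```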

4. Support Vector Machine (SVM)-


SVM is a supervised classification algorithm and one of the most important Machine Learning algorithms in Python. It plots a line that divides different categories of your data. In this ML algorithm, we calculate the vector to optimize the line. This is to ensure that the closest points in each group lie farthest from each other. While you will almost always find this to be a linear vector, it can be other than that. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. When data are not labeled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data to groups, and then map new data to these formed groups.
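
To illustrate the kernel trick in the smallest possible setting, the sketch below checks that a degree-2 polynomial kernel on 2-D inputs gives the same number as an ordinary dot product after an explicit mapping into 3-D. The choice of kernel and the sample points are assumptions for this example; this is not a full SVM.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel computed directly in the input space."""
    return dot(x, y) ** 2

def feature_map(x):
    """Explicit mapping of a 2-D point into the 3-D space the kernel implies."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly_kernel(x, y)                      # never builds the 3-D vectors
k_explicit = dot(feature_map(x), feature_map(y))    # builds them explicitly
```

The two values agree, which is why a kernel can stand in for the dot product in a higher-dimensional space without computing the mapping.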

5. Naive Bayes Algorithm-

Naive Bayes is a classification method which is based on Bayes' theorem. This assumes independence between predictors. A Naive Bayes classifier will assume that a feature in a class is unrelated to any other. Consider a fruit. This is an apple if it is round, red, and 2.5 inches in diameter. A Naive Bayes classifier will say these characteristics independently contribute to the probability of the fruit being an apple. This is even if features depend on each other. For very large data sets, it is easy to build a Naive Bayesian model. Not only is this model very simple, it can also perform better than many highly sophisticated classification methods.

Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by expensive iterative approximation as used for many other types of classifiers.

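$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

Here P(c | x) is the posterior probability, P(x | c) is the likelihood, P(c) is the class prior probability, and P(x) is the predictor prior probability.

With independent predictors x_1, ..., x_n, and dropping the constant factor 1 / P(X):

$$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$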
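
The product of per-feature likelihoods and the class prior can be sketched on a tiny, made-up fruit table. All names and data here are illustrative assumptions, and no smoothing is applied, so an unseen feature value drives a class score to 0.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and, per (class, feature position), the feature values."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts

def class_scores(row, class_counts, feature_counts):
    """Unnormalized posterior: P(c) times the product of P(x_i | c)."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # class prior P(c)
        for i, value in enumerate(row):
            score *= feature_counts[(c, i)][value] / n_c  # likelihood P(x_i | c)
        scores[c] = score
    return scores

rows = [("round", "red"), ("round", "red"), ("round", "green"),
        ("long", "yellow"), ("long", "yellow"), ("round", "yellow")]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]
class_counts, feature_counts = train_naive_bayes(rows, labels)
scores = class_scores(("round", "red"), class_counts, feature_counts)
best = max(scores, key=scores.get)
```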


6. kNN Algorithm-

This is a Python Machine Learning algorithm for classification and regression, mostly for classification. This is a supervised learning algorithm that keeps the labelled training points and uses a distance function, usually Euclidean, to compare a new point with them. It classifies new cases using a majority vote of k of its neighbors: the class it assigns to a case is the one most common among its k nearest neighbors. k-NN is a form of lazy learning, in which the function is only approximated locally and all computation is deferred until classification. k-NN is a special case of a variable-bandwidth, kernel density "balloon" estimator with a uniform kernel.
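
A minimal sketch of the majority vote among the k nearest neighbors, using Euclidean distance, is given below. The function name `knn_classify` and the sample points are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]  # made-up data
labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_classify(points, labels, (2, 2))
near_b = knn_classify(points, labels, (6, 5))
```

Note that nothing is computed until a query arrives, which matches the "deferred until classification" remark above.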

7. K-Means Algorithm-

k-Means is an unsupervised algorithm that solves the problem of clustering. It groups data into a chosen number of clusters. The data points inside a cluster are homogeneous, and heterogeneous to peer groups. k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

The problem is computationally difficult (NP-hard), but k-means clustering is rather easy to apply to even large data sets, particularly when using heuristics such as Lloyd's algorithm. It often is used as a preprocessing step for other algorithms, for example to find a starting configuration. k-means still finds use in signal processing. k-means clustering has also been used as a feature learning (or dictionary learning) step, in either (semi-)supervised learning or unsupervised learning.
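
Lloyd's algorithm alternates two steps: assign each point to its nearest mean, then recompute each mean. The sketch below is a bare-bones version under simplifying assumptions: the starting centroids are supplied by hand, the iteration count is fixed, and an empty cluster keeps its old centroid. The data are made up.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with a fixed number of assign/update rounds."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
```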


8. Random Forest -

A random forest is an ensemble of decision trees. To classify a new object based on its attributes, the trees vote for a class: each tree provides a classification, and the classification with the most votes wins in the forest. Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.

[Figure: random forest. Each tree is built from N features of dataset X and predicts a class; majority voting over the trees gives the final class.]
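
The final aggregation step can be sketched on its own, assuming the individual trees have already produced their predictions. The function names and the sample votes are hypothetical; building the trees themselves is not shown.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_predictions):
    """Classification: return the mode (most voted class) of the trees' outputs."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Regression: return the mean of the trees' numeric outputs."""
    return mean(tree_predictions)

final_class = forest_classify(["N", "M", "N", "N"])  # made-up votes from 4 trees
final_value = forest_regress([2.0, 3.0, 4.0])
```
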
