
Machine Learning with Python

Machine Learning

One of the fields of study within computer science that is gaining the most popularity is machine learning. Many of the services we use in our daily lives, such as Google, Gmail, Netflix, Spotify, and Amazon, rely on the tools provided by Machine Learning to offer an increasingly personalized service and thereby gain competitive advantages over their rivals.

What is Machine Learning?

But what exactly is Machine Learning? Machine Learning is the design and study of computer tools that use past experience to make future decisions; it is the study of programs that can learn from data. The fundamental objective of Machine Learning is to generalize, that is, to induce an unknown rule from examples where that rule is applied. The most typical example of Machine Learning in use is the filtering of junk or spam email: by observing thousands of emails previously marked as junk, spam filters learn to classify new messages. Machine Learning combines concepts and techniques from different areas of knowledge, such as mathematics, statistics, and computer science; for this reason, there are many ways to approach the discipline.
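The spam-filter example above can be sketched as a tiny word-frequency classifier. This is a minimal illustration, not a production filter: the messages are made up, and the scoring is a naive Bayes style log-odds with add-one smoothing.

```python
from collections import Counter
import math

# Tiny labeled corpus: the "past experience" the filter learns from.
# All messages here are invented for illustration.
spam = ["win money now", "free money offer", "win a free prize"]
ham = ["meeting at noon", "project status update", "lunch at noon"]

def word_counts(messages):
    counts = Counter()
    for m in messages:
        counts.update(m.split())
    return counts

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def spam_score(message):
    """Log-odds that the message is spam, with add-one smoothing."""
    score = math.log(len(spam) / len(ham))
    for w in message.split():
        p_spam = (spam_counts[w] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[w] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

def is_spam(message):
    return spam_score(message) > 0

print(is_spam("free money"))      # words seen mostly in spam
print(is_spam("status meeting"))  # words seen mostly in ham
```

A real filter would use thousands of messages and a richer feature set, but the principle of learning word statistics from labeled examples is the same.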

Types of Machine Learning

Machine Learning has a wide range of applications, including search engines, medical diagnosis, credit card fraud detection, stock market analysis, classification of DNA sequences, speech and handwriting recognition, games, and robotics. But in order to tackle each of these topics, it is crucial first to distinguish the different types of Machine Learning problems we may encounter.
Supervised learning

In supervised learning problems, the algorithm is taught or trained on data that already comes labeled with the correct answer. The larger the dataset, the more the algorithm can learn about the topic. Once training is complete, new data is supplied without the correct-answer labels, and the learning algorithm uses the experience acquired during the training phase to predict an outcome. This is similar to the way we learn in school: we are taught problems and the methods to solve them so that we can later apply the same methods in similar situations.
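A minimal sketch of the supervised setting, using a 1-nearest-neighbor rule. The feature values (height, weight) and the size labels are invented for illustration:

```python
import math

# Labeled training data: (features, label) pairs — the "correct answers"
# the algorithm is trained on. Each point is (height_cm, weight_kg).
training = [
    ((150, 50), "S"), ((155, 55), "S"),
    ((170, 70), "M"), ((172, 68), "M"),
    ((185, 90), "L"), ((190, 95), "L"),
]

def predict(point):
    """1-nearest-neighbor rule: answer with the label of the closest
    training example, i.e. reuse past experience on unseen data."""
    nearest = min(training, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

print(predict((152, 52)))  # near the "S" examples
print(predict((188, 92)))  # near the "L" examples
```

The new points carry no labels; the prediction comes entirely from the labeled experience gathered during training.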

Unsupervised learning

In unsupervised learning problems, the algorithm is trained on a dataset that has no labels; in this case, the algorithm is never told what the data represent. The idea is that the algorithm can discover on its own patterns that help make sense of the dataset. Unsupervised learning is similar to the way we learn to speak as babies: at first we listen to our parents talking and understand nothing, but as we hear thousands of conversations, our brain starts to form a model of how language works, and we begin to recognize patterns and to expect certain sounds.
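The idea of finding structure without labels can be sketched with a plain k-means clustering loop. The points and the starting centroids below are invented for illustration:

```python
import math

# Unlabeled data: two visible groups, but the algorithm is never told that.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),
          (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]

def kmeans(points, centroids, steps=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its points; repeat."""
    for _ in range(steps):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                     if cluster else centroid
                     for cluster, centroid in zip(clusters, centroids)]
    return centroids, clusters

# Fixed starting centroids keep the example deterministic.
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print("centroids:", centroids)
print("cluster sizes:", [len(c) for c in clusters])
```

The algorithm recovers the two groups purely from the geometry of the data, with no labels involved.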

Reinforcement learning

In reinforcement learning problems, the algorithm learns by observing the world around it. Its input is the feedback it receives from the outside world in response to its actions; the system therefore learns by trial and error. A good example of this type of learning is found in games, where we try new strategies and select and refine the ones that help us win. As we gain more practice, the cumulative effect of the reinforcement of our winning actions ends up producing a winning strategy.
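Trial-and-error learning from reward feedback can be sketched with an epsilon-greedy bandit agent. The two "slot machines" and their win probabilities are invented for illustration:

```python
import random

random.seed(0)

# Two slot machines with different (hidden) win probabilities.
# The agent only sees the reward feedback from its own actions.
true_win_prob = [0.3, 0.7]

counts = [0, 0]       # times each arm was played
values = [0.0, 0.0]   # running average reward per arm

def choose(epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best arm found so far,
    sometimes explore a random one."""
    if random.random() < epsilon:
        return random.randrange(2)
    return max(range(2), key=lambda a: values[a])

for _ in range(2000):
    arm = choose()
    reward = 1.0 if random.random() < true_win_prob[arm] else 0.0
    counts[arm] += 1
    # Incremental running-average update of the reward estimate.
    values[arm] += (reward - values[arm]) / counts[arm]

print("estimated values:", [round(v, 2) for v in values])
print("best arm found:", values.index(max(values)))
```

The agent is never told which machine is better; the cumulative reinforcement of its own winning actions steers it toward the better strategy.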
Overtraining

As we mentioned when defining Machine Learning, the fundamental idea is to find patterns that we can generalize, so that we can then apply that generalization to cases we have not yet observed and make predictions. But it can also happen that during training we only discover coincidences in the data that look like interesting patterns but do not generalize. This is what is known as overtraining or overfitting.

Overfitting is the tendency of most Machine Learning algorithms to adjust to very specific characteristics of the training data that have no causal relationship with the objective function we are trying to generalize. The most extreme example of an overfitted model is one that simply memorizes the correct answers; when used on data it has never seen before, such a model performs no better than chance, since it never managed to generalize a pattern for prediction.

How to avoid overtraining

As mentioned earlier, all Machine Learning models have a tendency toward overfitting; this is why we must learn to live with it and take preventive measures to reduce it as much as possible. The two main strategies for dealing with overfitting are holdout (data retention) and cross-validation.

In the first case, the idea is to divide our dataset into one or more training subsets and separate evaluation sets. That is, we do not give the algorithm all of our data during training; instead, we hold back part of it to evaluate the effectiveness of the model. The goal is to prevent the same data used for training from also being used for evaluation. This way we can analyze more precisely how the model behaves as we train it with more data, and detect the critical point at which it stops generalizing and begins to overfit the training data.
Cross-validation is a more sophisticated procedure than the previous one. Instead of obtaining a single simple estimate of generalization effectiveness, the idea is to perform a statistical analysis to obtain other estimated performance measures, such as the mean and variance, and thus understand how performance is expected to vary across different datasets. This variation is fundamental for assessing confidence in the performance estimate. Cross-validation also makes better use of a limited dataset: unlike the simple split of the data into one training set and one evaluation set, cross-validation computes its estimates over the entire dataset by performing multiple splits and systematically exchanging training and evaluation data.
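Both strategies can be sketched in plain Python on a synthetic dataset. The linear model, noise level, and fold count below are assumptions chosen for illustration; a holdout split comes first, then 5-fold cross-validation reporting the mean and variance of the scores:

```python
import random
import statistics

random.seed(42)

# Synthetic dataset: x with a noisy linear response y ≈ 2x + 1.
data = [(x, 2 * x + 1 + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

def fit_line(train):
    """Least-squares slope and intercept, fit on the training part only."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(model, test_set):
    """Mean squared error of the model on held-out data."""
    slope, intercept = model
    return statistics.mean((y - (slope * x + intercept)) ** 2
                           for x, y in test_set)

# Holdout: 20% of the data is retained and never shown during training.
train, test = data[:80], data[80:]
print("holdout MSE:", round(mse(fit_line(train), test), 2))

# 5-fold cross-validation: each point is evaluated exactly once,
# and we get a mean and a variance rather than a single estimate.
scores = []
for k in range(5):
    fold = data[k * 20:(k + 1) * 20]
    rest = data[:k * 20] + data[(k + 1) * 20:]
    scores.append(mse(fit_line(rest), fold))
print("CV mean:", round(statistics.mean(scores), 2),
      "CV variance:", round(statistics.variance(scores), 2))
```

The variance across folds is exactly the extra information cross-validation gives us over a single holdout estimate.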

Steps to build a machine learning model

Building a Machine Learning model is not just a matter of applying a learning algorithm or using a Machine Learning library; rather, it is a whole process that usually involves the following steps:

Collect the data. We can collect data from many sources: for example, by extracting it from a website, obtaining it through an API or from a database, using devices that collect data for us, or using publicly available data. The options for collecting data are endless! This step seems obvious, but it is one of the steps that causes the most complications and takes the most time.

Preprocess the data. Once we have the data, we need to make sure it is in the correct format to feed our learning algorithm. Performing several preprocessing tasks before using the data is practically inevitable. Even so, this step is usually much simpler than the previous one.

Explore the data. Once the data is in the correct format, we can perform a preliminary analysis to correct cases of missing values, or to try to spot at first glance some pattern that makes building the model easier. At this stage, statistical measures and 2- and 3-dimensional plots are useful for getting a visual idea of how our data behave. At this point we can detect outliers that should be discarded, or find the features that have the most influence on a prediction.

Train the algorithm. Here is where we really start to use Machine Learning techniques. At this stage, we feed the learning algorithm(s) with the data we processed in the previous stages. The idea is that the algorithms can extract useful information from the data we pass them in order to make predictions.

Evaluate the algorithm. At this stage, we put to the test the information or knowledge the algorithm obtained from the training in the previous step. We assess how accurate the algorithm is in its predictions, and if we are not satisfied with its performance, we can return to the previous stage and continue training it with different parameters until its performance is acceptable.

Use the model. In this final stage, we confront our model with the real problem. Here we can also measure its performance, which may force us to revisit all the previous steps.
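The steps above can be sketched end to end on fabricated data. The threshold classifier and all the numbers below are assumptions chosen to keep the example short:

```python
import random
import statistics

random.seed(1)

# 1. Collect: here we fabricate data; in practice it might come from a
#    file, an API, or a database. Class 1 values are shifted upward so
#    there is a real pattern to learn.
raw = [(random.uniform(0, 10) + 10 * label, label)
       for label in (0, 1)
       for _ in range(50)]

# 2./3. Preprocess and explore: shuffle, then look at per-class means.
random.shuffle(raw)
means = {c: statistics.mean(x for x, label in raw if label == c)
         for c in (0, 1)}
print("class means:", {c: round(m, 1) for c, m in means.items()})

# 4. Train: a threshold classifier halfway between the class means,
#    fit on 70% of the data.
train, test = raw[:70], raw[70:]
m0 = statistics.mean(x for x, label in train if label == 0)
m1 = statistics.mean(x for x, label in train if label == 1)
threshold = (m0 + m1) / 2

def predict(x):
    return 1 if x > threshold else 0

# 5. Evaluate on the held-out 30%.
accuracy = statistics.mean(1 if predict(x) == label else 0
                           for x, label in test)
print("test accuracy:", accuracy)

# 6. Use the model on a new, unseen value.
print("prediction for x=12.0:", predict(12.0))
```

Even in this toy form, the structure mirrors the real process: the evaluation in step 5 can send us back to earlier steps if the performance is not acceptable.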

Python libraries for machine learning

As I always like to mention, one of the great advantages Python offers over other programming languages is its large and prolific developer community, which has contributed a wide variety of first-class libraries that extend the language's functionality. In the case of Machine Learning, the main libraries we can use are:

Scikit-Learn

Scikit-learn is the main library available for working with Machine Learning and includes implementations of a large number of learning algorithms. We can use it for classification, feature extraction, regression, clustering, dimensionality reduction, model selection, and preprocessing. It has an API that is consistent across all models and integrates very well with the rest of Python's scientific packages. The library also facilitates evaluation, diagnosis, and cross-validation tasks, since it provides several factory methods that let us carry out these tasks very simply.
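A minimal example of that consistent API, assuming scikit-learn is installed (`pip install scikit-learn`). Every estimator exposes the same fit / predict / score interface, and cross-validation is a one-liner:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The same fit / score interface works for every scikit-learn estimator.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Cross-validation provided as a ready-made helper.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold scores:", scores.round(2))
```

Swapping `LogisticRegression` for any other classifier leaves the rest of the code unchanged, which is the point of the consistent API.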

Statsmodels

Statsmodels is another great library, focused on statistical models and used mainly for predictive and exploratory analysis. Like Scikit-learn, it integrates very well with Python's other scientific packages. If we want to fit linear models, perform statistical analysis, or maybe a bit of predictive modeling, then Statsmodels is the ideal library. The statistical tests it offers are quite broad and cover validation tasks for most cases.

PyMC

PyMC is a Python module that implements Bayesian statistical models, including Markov chain Monte Carlo (MCMC). PyMC offers functionality to make Bayesian analysis as simple as possible. It includes Bayesian models, statistical distributions, and model convergence diagnostic tools. If we want to perform a Bayesian analysis, this is undoubtedly the library to use.

NLTK

NLTK is the leading library for natural language processing, or NLP for short. It provides easy-to-use interfaces to more than 50 corpora and lexical resources, such as WordNet, along with a set of text processing libraries for classification, tokenization, tagging, parsing, and semantic reasoning.

Obviously, I am only listing here a few of the many libraries that exist in Python for working with Machine Learning problems; I invite you to do your own research on the topic.
