SageMaker Built-in Algorithms Cheat Sheet
Linear Learner
Learns a linear function / linear threshold
function, and maps a high-dimensional vector x to
an approximation of the numeric label y.
● Regression
● Binary / multiclass classification
Use cases
● Predict a quantitative value based on given
numeric input
○ Estimate this year’s ROI based on the last
5 years’ ROI
● Discrete binary classification problems
○ Based on past customer response, should I
mail this customer or not?
● Discrete multiclass classification problems
○ Based on past customer response, how
should I reach the customer? Email, DM or
a phone call?
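A rough sketch of what a linear learner fits: a weighted sum of the inputs. Here is plain-Python least-squares fitting for one feature (SageMaker's Linear Learner trains with SGD over many features, but the fitted model has the same shape); the toy numbers are made up:

```python
# Fit the line w*x + b that minimizes squared error, via the
# closed-form least-squares solution for a single feature.

def fit_line(xs, ys):
    """Return slope w and intercept b minimizing sum((w*x + b - y)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Toy "ROI history": the label is exactly 2*x + 1, so the fit recovers it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_line(xs, ys)  # w ≈ 2.0, b ≈ 1.0
```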
Factorization Machines
Captures interaction between features within
high dimensional sparse datasets.
● Regression
● Binary classification
Use cases
● High dimensional sparse datasets
○ Use known information about the person
viewing the page, based on click-stream
data, to predict which ad the user will
click on
● Recommendation engine
○ What to recommend based on user’s
history?
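The core of a degree-2 factorization machine is its prediction formula: a linear model plus pairwise interaction terms, where each feature gets a small latent vector so interactions cost O(n·k) parameters instead of O(n²). The sketch below implements just that formula; the weights are made up for illustration (training would learn them):

```python
# FM prediction:  y(x) = w0 + sum_i w_i*x_i + sum_{i<j} dot(v_i, v_j)*x_i*x_j

def fm_predict(x, w0, w, v):
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise

# Two sparse binary features (e.g. user=alice, item=ad_42), k=2 latents.
x = [1.0, 1.0]
w0, w = 0.1, [0.2, 0.3]
v = [[1.0, 0.0], [0.5, 0.5]]  # dot(v_0, v_1) = 0.5
score = fm_predict(x, w0, w, v)  # 0.1 + 0.2 + 0.3 + 0.5 = 1.1
```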
K-Nearest Neighbors
(KNN)
Regression: finds the K closest points to the
sample point and returns the average of their
label values as the predicted value.
Classification: queries the K points closest to
the sample point and returns the most frequent
label as the predicted label.
● Regression
● Classification
Use cases
● Credit ratings
○ Group people into credit-risk categories
based on attributes of known credit usage
they share with others
● Recommendation engine
○ Find recommendations based on similar
likes
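Both KNN modes fit in a few lines for one feature: classification takes a majority vote over the k nearest labels, regression averages them. This is only the core idea; SageMaker's KNN adds sampling and dimensionality reduction to make it scale. The data below is invented:

```python
from collections import Counter

def knn(train, query, k, mode):
    """train: list of (feature, label); mode: 'classify' or 'regress'."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [label for _, label in nearest]
    if mode == "classify":
        return Counter(labels).most_common(1)[0][0]  # majority vote
    return sum(labels) / len(labels)                 # average of labels

# Toy credit-risk style data: low feature value -> "good", high -> "bad".
train = [(1.0, "good"), (1.2, "good"), (0.8, "good"),
         (5.0, "bad"), (5.5, "bad"), (4.8, "bad")]
risk = knn(train, 1.1, k=3, mode="classify")   # "good"

scores = [(1.0, 700), (1.2, 710), (5.0, 550)]
avg = knn(scores, 1.1, k=2, mode="regress")    # (700 + 710) / 2 = 705.0
```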
XGBoost
Predicts a target variable by combining an
ensemble of estimates from a set of simpler and
weaker models.
● Regression
● Classification
● Ranking
Use cases
● Fraud detection
○ Map input transaction to the probability
that it is fraudulent based on dataset of
past transactions and information if they
were fraudulent
● Ranking
○ Return relevance scores for searched
products in an e-commerce system based on
search results, clicks, and past purchases
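Gradient boosting in miniature: each round fits a weak learner to the residuals of the ensemble so far, then adds it in. XGBoost follows this scheme with regularized trees and second-order gradients; the sketch below uses one-split "stumps" and squared loss (where the negative gradient is just the residual), on made-up data:

```python
def fit_stump(xs, residuals):
    """Best single-threshold split minimizing squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=3, lr=1.0):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)   # fit the leftover error
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

# Step-shaped data: the first stump nails the split, later rounds
# fit whatever error is left.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]
model = boost(xs, ys, rounds=2)
```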
K-Means
Finds discrete groupings within data, where
members of a group are as similar as possible to
one another and as different as possible from
members of other groups. Euclidean distance
between these points represents similarity of
observations.
● Clustering
Use cases
● Group similar objects/data together
○ Find high-, medium-, and low-spending
customers from their transaction histories
● Handwriting recognition
● Analog audio classification
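The clustering loop itself (Lloyd's algorithm) is simple: assign each point to the nearest centroid, recompute centroids as cluster means, repeat until stable. A one-dimensional sketch on invented spend amounts:

```python
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Transaction totals separate into low and high spenders.
spend = [10.0, 12.0, 11.0, 95.0, 99.0, 102.0]
centroids = kmeans(spend, centroids=[0.0, 50.0])
# centroids settle near 11 (low spenders) and ~98.7 (high spenders)
```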
Random Cut Forest
Detects anomalous data points within a data set
and associates an anomaly score with each data
point. Low score values indicate that the data
point is considered "normal." High values indicate
the presence of an anomaly in the data.
● Anomaly detection
Use cases
● Fraud detection
○ Detect suspicious financial transactions by
unusual amount / time / location and flag
them for a closer look
● Quality control
○ Analyze an audio test pattern played by a
high-end speaker system for any unusual
frequencies
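Random Cut Forest itself builds an ensemble of trees over random cuts of the data; as a far simpler stand-in for the *interface* (every point gets a score, high scores flag anomalies), here is a z-score baseline on made-up transaction amounts. This is not RCF, only the scoring idea:

```python
import math

def anomaly_scores(values):
    """Score each value by its distance from the mean, in std deviations."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [abs(v - mean) / std for v in values]

amounts = [20.0, 25.0, 22.0, 21.0, 24.0, 500.0]  # one suspicious transaction
scores = anomaly_scores(amounts)
flagged = max(range(len(scores)), key=lambda i: scores[i])  # index 5
```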
Image Classification
Takes an image as input and outputs one or more
labels assigned to that image. Uses a
convolutional neural network (CNN) that can be
trained from scratch or trained using transfer
learning when a large number of training images
are not available.
● Multi-label classification
Use cases
● Label/tag an image based on the content of
the image
○ Alert about adult content in an image
Object Detection
Takes images as input and identifies all instances
of objects within the image scene. Each object is
categorized into one of the classes in a specified
collection, with a confidence score that it belongs
to the class. Its location and scale in the image are
indicated by a rectangular bounding box.
● Object detection and classification
Use cases
● Detect people and objects in an image
○ Police review a large photo gallery for a
missing person
Semantic Segmentation
Tags every pixel in an image with a class label
from a predefined set of classes.
● Computer vision
Use cases
● Computer vision
○ Self-driving cars to identify objects in their
way
○ Robot sensing
● Medical imaging diagnostics
Latent Dirichlet Allocation
(LDA)
Describes a set of observations as a mixture of
distinct categories to discover a user-specified
number of topics shared by documents within a
text corpus. The topics are not specified up front,
and are not guaranteed to align with how a
human may naturally categorize documents.
● Topic modeling
Use cases
● Article recommendations based on similarity
○ Recommend articles on similar topics
which you read or rated in the past
● Musical influence modelling
○ Explore which musical artists were truly
innovative over time, and which were
influenced by them
Neural Topic Model
(NTM)
Organizes a corpus of documents into topics that
contain word groupings based on their statistical
distribution. The semantics of topics are usually
inferred by examining the top ranking words they
contain. Only the number of topics, not the topics
themselves, are prespecified. The topics are not
guaranteed to align with how a human might
naturally categorize documents.
● Topic modeling
Use cases
● Classify or summarize documents based on
the topics detected
○ Tag a document as belonging to a medical
category based on the terms used in the
document
● Retrieve information or recommend content
based on topic similarities
Sequence To Sequence
(seq2seq)
Converts an input sequence of tokens (for
example, text or audio) into another sequence of
tokens.
● Machine translation
● Text summarization
● Speech-to-text
Use cases
● Machine translation
○ Convert text from Spanish to English
● Text summarization
○ Summarize a long text corpus: an abstract
for a research paper
● Speech-to-text
○ Convert audio files to text: transcribe call
center conversations for further analysis
BlazingText
(word2vec)
Used for natural language processing (NLP) tasks.
Maps words to high-quality distributed vectors
and captures the semantic relationships between
words.
Use cases
● Sentiment analysis
○ Evaluate customer comments based on
positive / negative sentiment
● Named entity recognition
● Machine translation
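Training word2vec is out of scope for a cheat sheet, but this shows what the learned vectors are *for*: cosine similarity between embeddings measures semantic closeness. The tiny 3-d vectors below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1 for same direction, negative for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

embedding = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],   # near "good"
    "awful": [-0.9, 0.1, 0.2],  # far from "good"
}
pos = cosine(embedding["good"], embedding["great"])  # close to 1
neg = cosine(embedding["good"], embedding["awful"])  # negative
```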
BlazingText
(Text classification)
Useful for web searches, information retrieval,
ranking, and document classification.
Assigns a set of predefined categories to open-
ended text. Can be used to organize and
categorize almost any kind of text.
Use cases
● Document classification
○ Review a large collection of documents and
detect if they contain sensitive data like
personal information or trade secrets
○ Categorize books in a library into academic
disciplines
● Web searches
● Information retrieval
● Ranking
Object2Vec
Generalizes the Word2Vec embedding technique
beyond words. Converts high-dimensional objects
into a low-dimensional space while preserving the
semantics of the relationships between pairs in
the original embedding space.
Use cases
● Rating prediction
○ Predict movie popularity based on rating
similarity
● Document classification
○ What genre is the book, based on its
similarity to books of known genres?
○ Identify duplicate support tickets and find
the correct routing based on similarity of
text in the tickets
IP Insights
IP anomaly detection. Learns the usage patterns
of IPv4 addresses, capturing associations
between IP addresses and various entities such
as user IDs or account numbers.
Use cases
● Tiered authentication model
○ Dynamically trigger a 2-factor
authentication routine if a user logs in
from an anomalous IP
● Fraud detection / prevention
○ Permit only certain activities if the IP is
unusual
DeepAR
Forecasts scalar (one-dimensional) time series
using recurrent neural networks (RNN) trained
on multiple sets of historical data. Extrapolates
the time series into the future.
Use cases
● Forecasting new product sales
○ Predict sales on a new product based on
previous sales data from other products
● Predict labor needs for special events
○ Use labor utilization rates at another
distribution center to predict the required
level of staffing for a brand-new center
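DeepAR trains an RNN across many related series, which is far beyond a cheat-sheet snippet. As a stand-in for the *task* (given a history, extrapolate future points), here is the seasonal-naive baseline forecasters compare against, on invented sales figures. This is not DeepAR, only the forecasting interface:

```python
def seasonal_naive(history, season, horizon):
    """Forecast each future step with the value one season earlier."""
    out = []
    for h in range(horizon):
        out.append(history[len(history) - season + (h % season)])
    return out

# Two "seasons" of length 3; the forecast repeats the last season.
weekly_sales = [10, 12, 15, 11, 13, 16]
forecast = seasonal_naive(weekly_sales, season=3, horizon=3)  # [11, 13, 16]
```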
Principal Component
Analysis (PCA)
Attempts to reduce the dimensionality (number
of features) within a dataset while still retaining
as much information as possible. Finds a new set
of features called components, which are
composites of the original features that are
uncorrelated with one another.
Use cases
● Feature engineering: dimensionality
reduction
○ Replace a large set of correlated input
columns with a few uncorrelated
components: e.g. combine a car’s many size
and weight measurements when predicting
its mileage
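A minimal sketch of the idea, assuming 2-d data for simplicity: the first principal component is the dominant eigenvector of the covariance matrix, found here by power iteration (real PCA implementations use SVD). For points spread along the line y = x, the component comes out near [0.707, 0.707]:

```python
import math

def first_component(points, iters=100):
    """Top principal component of 2-d points, via power iteration."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize.
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.sqrt(w[0] ** 2 + w[1] ** 2)
        v = (w[0] / norm, w[1] / norm)
    return v

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
v = first_component(points)  # roughly (0.71, 0.70)
```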
Reinforcement Learning
An area of machine learning concerned with how
intelligent agents ought to take actions in an
environment in order to maximize the notion of
cumulative reward.
Use cases
● Autonomous vehicles
○ A model can learn through iterations of
trial and error in a simulation. Once the
model is good enough, it can be tested in a
real vehicle on a test track
● Intelligent HVAC control system
○ The model learns the impact of sunlight
and equipment efficiency to optimize
temperature control for the lowest energy
consumption
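The reward-maximization loop in miniature, with made-up dynamics: a 4-state corridor where moving right eventually earns +1. Repeated Bellman backups over a Q-table (Q-value iteration, the deterministic core that sample-based Q-learning approximates) make "right" the best action everywhere:

```python
STATES, ACTIONS = 4, ("left", "right")
GAMMA = 0.9  # discount factor for future reward

def step(s, a):
    """Deterministic environment: reward 1 for entering the last state."""
    s2 = min(s + 1, STATES - 1) if a == "right" else max(s - 1, 0)
    reward = 1.0 if s2 == STATES - 1 and s != STATES - 1 else 0.0
    return s2, reward

Q = {(s, a): 0.0 for s in range(STATES) for a in ACTIONS}
for _ in range(50):  # Bellman backups until values settle
    for s in range(STATES - 1):  # last state is terminal
        for a in ACTIONS:
            s2, r = step(s, a)
            future = 0.0 if s2 == STATES - 1 else max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] = r + GAMMA * future

# Greedy policy: pick the highest-value action in each state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(STATES - 1)}
```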