Basics of Predictive Modeling
Imagine how the world would change if every advertisement you received was about a
product you are interested in. How beautiful would it be to receive information only about
relevant products? How efficient would it be to find all the grocery items you need in the
first aisle? How much could mankind gain from being able to predict diseases by looking at
historical medical records and current symptoms?
All of this can be done by using the power of predictive analytics. Many companies are
already using it and becoming sharper with their targeting. They are able to get more
than 100% response uplift from their marketing campaigns by predicting the needs of
customers and communicating only about relevant products.
So what is Predictive Analytics and how can it help?
According to Gartner:
Predictive modeling is a commonly used statistical technique to predict future behavior.
Predictive modeling solutions are a form of data-mining technology that works by analyzing
historical and current data and generating a model to help predict future outcomes.
Simply put, predictive analytics takes past trends and applies them to the future. For
example, if a customer purchases a smartphone from an e-commerce website, he might be
interested in its accessories immediately. He might be a potential customer for a phone
battery a few years down the line. For now, the chances of him buying an accessory for a
competitor's smartphone are relatively slim.
While the example might sound simple, imagine doing this for thousands of categories you
might be selling. Within those thousands of categories, there might be multiple options
(hundreds of covers, pouches, styluses…). Further, even if you have only a thousand
visitors every day (a small number for many e-retailers), predicting the next purchase for
these customers without data-driven decision making might become impossible.
This is exactly where predictive analytics comes to your help (remember Amazon helping
you out with "You might also like…"?).
I understand how predictive analytics can help, what do I do next?
If you are a business owner who wants to harness business analytics, you need to set up an
analytics team. I'll cover the details of setting this up sometime later. This post is for
people wanting to learn the art of predictive analytics.
Following is a typical life cycle of building predictive models:
Steps to build a predictive model
The first step in any predictive model is to collate data from various sources. This can be
data you own about your customer (like pages visited in past, products purchased in past),
or data which the customer has provided (e.g. Address, Name, Age etc.).
This data needs to be cleaned and arranged in a structure so that it can be analyzed easily.
This structure needs to be in sync with the various business hypotheses. For example, if
the business hypothesis is that a particular age / gender group may have a higher likelihood
of purchasing a certain set of products, age and gender need to be attributed at customer level.
Once these data sets are ready, we use various predictive modeling techniques and
business understanding to come up with business insights (nuggets of gold). These
insights can then be used in marketing or website layout to increase efficiency.
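As a rough illustration of the collation step, the sketch below merges two made-up
sources into one row per customer, so that age and gender sit at customer level next to
purchase summaries. All field names and values are hypothetical, and a real pipeline would
likely use a database or a data-frame library rather than plain dictionaries.

```python
from collections import defaultdict

# Hypothetical source 1: data the business owns (purchase events).
purchases = [
    {"customer_id": 1, "product": "phone", "amount": 300.0},
    {"customer_id": 1, "product": "cover", "amount": 15.0},
    {"customer_id": 2, "product": "stylus", "amount": 20.0},
]

# Hypothetical source 2: data the customer has provided.
profiles = {
    1: {"age": 34, "gender": "F"},
    2: {"age": 51, "gender": "M"},
    3: {"age": 27, "gender": "F"},  # registered, but no purchases yet
}


def build_customer_table(purchases, profiles):
    """Return one row per customer: profile fields plus purchase summaries."""
    totals = defaultdict(lambda: {"n_purchases": 0, "total_spend": 0.0})
    for event in purchases:
        row = totals[event["customer_id"]]
        row["n_purchases"] += 1
        row["total_spend"] += event["amount"]

    table = []
    for customer_id, profile in sorted(profiles.items()):
        # Customers without purchases still get a row, with zeroed summaries.
        summary = totals.get(customer_id, {"n_purchases": 0, "total_spend": 0.0})
        table.append({"customer_id": customer_id, **profile, **summary})
    return table


customer_table = build_customer_table(purchases, profiles)
for row in customer_table:
    print(row)
```

With the data in this shape, a hypothesis such as "a particular age / gender group buys
more covers" can be checked directly against the customer-level rows.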
What is predictive analytics?
Predictive analytics is the use of data, statistical algorithms and machine-learning
techniques to identify the likelihood of future outcomes based on historical data.
The goal is to go beyond descriptive statistics and reporting on what has happened, and to
provide the best assessment of what will happen in the future. The end result is to
streamline decision making and produce new insights that lead to better actions.
Predictive models use known results to develop (or train) a model that can be used to
predict values for different or new data. The modeling results in predictions that represent a
probability of the target variable (for example, revenue) based on estimated significance
from a set of input variables. This is different from descriptive models that help you
understand what happened or diagnostic models that help you understand key relationships
and determine why something happened.
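The "train on known results, then predict for new data" idea can be sketched with a
one-variable least-squares line. This is only a minimal illustration, not a recommended
modeling approach: the spend and revenue figures are invented, and they are exactly
linear so the arithmetic is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical known results: marketing spend (input) and revenue (target).
spend = [10.0, 20.0, 30.0, 40.0]
revenue = [25.0, 45.0, 65.0, 85.0]

model = fit_line(spend, revenue)   # training on known results
forecast = predict(model, 50.0)    # prediction for new, unseen input
print(model, forecast)
```

Real data is noisy, so a fitted line will not pass through every point, and the quality of
the prediction has to be checked on data the model has not seen.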
More and more organizations are turning to predictive analytics to increase their bottom line
and competitive advantage. Why now?
Growing volumes and types of data and more interest in using data to produce
valuable information.
Faster, cheaper computers and easier to use software.
Tougher economic conditions and a need for competitive differentiation.
With interactive and easy-to-use software becoming more prevalent, predictive analytics is
no longer just the domain of mathematicians and statisticians. Business analysts and line-
of-business experts are using these technologies as well.
What does predictive analytics do?
A 2014 TDWI report found that organizations want to use predictive analytics to:
1. Predict trends.
2. Understand customers.
3. Improve business performance.
4. Drive strategic decision making.
5. Predict behavior.
Some of the most common uses of predictive analytics include:
Fraud detection and security – Predictive analytics can help stop losses due to fraudulent
activity before they occur. By combining multiple detection methods – business rules,
anomaly detection, predictive analytics, link analytics, etc. – you get greater accuracy and
better predictive performance. And in today’s world, cybersecurity is a growing concern.
High-performance behavioral analytics examines all actions on a network in real time to
spot abnormalities that may indicate occupational fraud, zero-day vulnerabilities and
advanced persistent threats.
Marketing – Predictive analytics can help you better understand your customers. Most
modern organizations use predictive analytics to determine customer responses or
purchases, as well as promote cross-sell opportunities. Predictive models help businesses
attract, retain and grow the most profitable customers and maximize their marketing
spending.
Operations – Predictive analytics plays an important role in operations for many
organizations, allowing them to function smoothly and efficiently. Many companies use
predictive models to forecast inventory and manage factory resources. Others use them for
more specialized needs. Airlines use predictive analytics to decide how many tickets to sell
at each price for a flight. Hotels try to predict the number of guests they can expect on any
given night to adjust price to maximize occupancy and increase revenue. Predictive
analytics are also used in human resources, asset maintenance, government and life
sciences.
Risk – One of the most well-known examples of predictive analytics is credit scoring. Credit
scores are used ubiquitously to assess a buyer’s likelihood of default for purchases ranging
from homes to cars to insurance. A credit score is a number generated by a predictive
model that incorporates all of the data relevant to a person’s credit-worthiness. Predictive
analytics has other risk-related uses, including claims and collections.
Predictive analytics use across industries – real-life examples
Any industry can use predictive analytics to optimize its operations and increase revenue.
Here are a few examples:
Credit card, banking and financial services. Detect and reduce fraud, measure
credit risk, maximize cross-sell/up-sell opportunities, retain customers and optimize
marketing campaigns. Commonwealth Bank can reliably predict the likelihood of fraud
activity for any given transaction before it is authorized -- within 40 milliseconds of the
transaction being initiated.
Governments and the public sector. Improve service and performance; detect
and prevent fraud, improper payments and the misuse of funds and taxpayer dollars; and
detect criminal activities and patterns. The Hong Kong government visualizes and analyzes
big, unstructured data to anticipate and address public complaints.
Health care providers. Predict the effectiveness of new procedures, medical tests
and medications, and improve services or outcomes by providing safe and effective patient
care. Taipei Medical University executives analyze and monitor performance across all
hospitals in its system.
Health insurers. Detect and handle insurance claims fraud, identify which patients
are most at risk of chronic diseases and know which interventions make the most medical
and financial sense. Blue Cross and Blue Shield of North Carolina built a model to more
accurately predict hospital readmissions and deploy nurse case managers to help patients
most at risk.
Insurance companies. Determine insurance premium rates, detect claims fraud,
optimize claims processes, retain customers, improve profitability and optimize marketing
campaigns. Within two hours of an earthquake striking rural New Zealand, Farmers Mutual
Group assessors were headed to affected areas. With SAS analytics they knew who their
most at-risk policy holders were and chartered a helicopter to get to them quickly.
Manufacturers. Identify factors leading to reduced quality and production failures,
and optimize parts, service resources and distribution. Lenovo detected a product issue 30
percent faster and reduced warranty costs 10 to 15 percent for previously hard-to-detect
issues.
Media and entertainment. Deepen insight into audiences by identifying influencing
attributes, trends, drivers and desires across properties, and score visitors to determine
appropriate audience segments and behavior value. How is the slot floor doing every day?
How is the gaming floor performing? How are the nonsmoking tables compared to the
smoking tables? The answers – which previously could take numerous weeks and many
dollars to find out – are now coming in minutes and at a far lower cost for Foxwoods Resort
Casino.
Oil, gas and utility companies. Predict equipment failures and future resource
needs, mitigate safety and reliability risks, and improve performance. Salt River Project is
the second-largest public power utility in the US and one of Arizona's largest water
suppliers. A sophisticated forecasting model helps them know the best time to sell excess
electricity for the best price.
Retailers. Assess the effectiveness of promotional events and campaigns, predict
which offers are most appropriate for consumers, determine which products to stock where
and how to build brand loyalty. Macy's increased its use of predictive analytics and reduced
email subscription churn by 20 percent.
Sports franchises. Sports analytics is a hot area, thanks in part to Nate Silver and
tournament predictions. The NBA’s Orlando Magic uses SAS predictive analytics to improve
revenue and determine starting lineups.
Telecommunication companies. Segment customers, reduce customer churn,
retain profitable customers and develop effective cross-sell/up-sell campaigns. T-Mobile is
looking at new ways to retain valued customers through new insights it uncovers in massive
volumes of customer data.
What do you need to get started?
The first thing you need to get started using predictive analytics is a problem to solve. What
do you want to know about the future based on the past? What do you want to understand
and predict? You’ll also want to consider what will be done with the predictions. What
decisions will be driven by the insights? What actions will be taken?
Second, you’ll need data. In today’s world, that means data from a lot of places. Your
transactional systems, data collected by sensors, third-party information, call center notes,
web logs, etc. You’ll need a data wrangler, or someone with data management experience,
to help you cleanse and get the data prepped for analysis. Preparing the data for a
predictive modeling exercise also requires someone who understands both the data and the
business problem. How you define your target is essential to how you can interpret the
outcome. (Data preparation is considered one of the most time-consuming aspects of the
analysis process. So be prepared for that.)
After that, the predictive model building begins. With increasingly easy-to-use software
becoming more available, a wider array of people can build analytical models. But you’ll still
likely need some sort of data analyst who can help you refine your models and come up
with the best performer. And then you might need someone in IT who can help deploy your
models. That means putting the models to work on your chosen data – and that’s where
you get your results.
Predictive modeling requires a team approach. You need people who understand the
business problem to be solved. Someone who knows how to prepare data for analysis.
Someone who can build and refine the models. Someone in IT to ensure that you have the
right analytics infrastructure for model building and deployment.
Types of predictive models
Predictive analytics models are not a monolith. Different models are developed for
specific functions.
Forecast models
A forecast model is one of the most common predictive analytics models. It handles metric
value prediction by estimating the values of new data based on learnings from historical
data. It is often used to generate numerical values where none are available in the
historical data. One of the greatest strengths of forecast models is their ability to take
multiple input parameters. For this reason, they are among the most widely used predictive
analytics models, across different industries and business purposes. For example, a
call centre can predict how many support calls it will get in a day, or a shoe store can
calculate the inventory it needs for the upcoming sales period using forecast analytics.
This versatility is what makes forecast models so popular.
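To make the call-centre example concrete, here is a minimal sketch that forecasts the
next day's call volume as the average of the most recent days. A moving average is just
one very simple forecasting technique chosen for illustration. It takes a single input
series, whereas the forecast models described above can take multiple parameters. The
call counts are made up.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical support calls received on each of the last seven days.
daily_calls = [120, 135, 128, 140, 150, 145, 155]

next_day_calls = moving_average_forecast(daily_calls, window=3)
print(next_day_calls)
```

A larger window smooths out day-to-day noise but reacts more slowly to a genuine change
in call volume.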
Classification models
Classification models are among the most common predictive analytics models. These
models work by categorising information based on historical data. They can be easily
retrained with new data and can provide a broad analysis for answering questions, which
explains why they are so common in industries like finance and retail compared to other
models.
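As one possible illustration of categorising new information from historical data, the
sketch below uses a nearest-centroid classifier: it averages the historical examples of
each category and assigns a new record to the closest average. The technique, the two
features (site visits and minutes on site) and the labels are assumptions made for this
example, not something prescribed above.

```python
from math import dist  # available in Python 3.8+


def train_centroids(samples):
    """samples: list of (features, label) pairs. Returns {label: centroid}."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical history: (site visits, minutes on site) and the known outcome.
history = [
    ((1.0, 2.0), "no_purchase"),
    ((2.0, 1.0), "no_purchase"),
    ((8.0, 9.0), "purchase"),
    ((9.0, 8.0), "purchase"),
]

centroids = train_centroids(history)
predicted_label = classify(centroids, (7.0, 7.5))
print(predicted_label)
```

Retraining with new data amounts to calling train_centroids again on the extended
history, which is part of what makes classifiers like this easy to keep up to date.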
Outlier models
While classification and forecast models work with historical data, the outlier model works
with anomalous data entries within a dataset. As the name implies, anomalous data is data
that deviates from the norm. The model works by identifying unusual data, either in
isolation or in relation to other categories and numbers. Outlier models are useful in
industries where identifying anomalies can save organisations millions of dollars, namely
retail and finance. Outlier models are one reason predictive analytics is so effective at
detecting fraud: since an incidence of fraud is a deviation from the norm, an outlier model
is well placed to catch it early. For example, when identifying a fraudulent transaction,
the outlier model can assess the amount of money involved, location, purchase history, time
and the nature of the purchase. Outlier models are highly valued because they work directly
with anomalous data.
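A minimal sketch of the idea, assuming a single feature (transaction amount): flag values
that sit far from the median, measured in units of the median absolute deviation (MAD).
The 0.6745 constant and the 3.5 cut-off are a commonly used convention for this
"modified z-score", not the only option. The amounts are invented, and a production fraud
model would combine many more signals, such as location and purchase history.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; this score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical card transaction amounts for one customer.
amounts = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 500.0]

suspicious = mad_outliers(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because they are
barely moved by the very anomalies the model is trying to find.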
Time series model
While classification and forecast models focus on historical data and outlier models focus
on anomalous data, the time series model focuses on data where time is the input parameter.
It works by using data points from past periods (for example, the previous year's data) to
develop a numerical metric that predicts trends within a specified period.
If organisations want to see how a particular variable changes over time, then they need a
time series predictive analytics model. For example, if a small business owner wants to
track sales over the past four quarters, then a time series model is needed. A time series
model is superior to conventional methods of calculating the progress of a variable because
it can forecast for multiple regions or projects simultaneously or focus on a single region or
project, depending on the organisation's needs. Furthermore, it can take into account
extraneous factors that could affect the variable, like seasonality.
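The sketch below shows one very simple way to combine a trend with seasonality: it takes
each quarter of the most recent year and scales it by the year-over-year growth. This
"seasonal naive with growth" rule is an assumption chosen for brevity, and the quarterly
sales figures are made up.

```python
def seasonal_forecast(series, season_length=4):
    """Forecast the next season from the last one, scaled by recent growth."""
    if len(series) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = series[-season_length:]
    previous = series[-2 * season_length:-season_length]
    growth = sum(last) / sum(previous)  # year-over-year growth factor
    # Keeping each quarter separate preserves the seasonal pattern.
    return [value * growth for value in last]


# Hypothetical quarterly sales for two consecutive years (Q1..Q4, Q1..Q4).
quarterly_sales = [100.0, 120.0, 90.0, 150.0, 110.0, 132.0, 99.0, 165.0]

next_year = seasonal_forecast(quarterly_sales)
print([round(value, 1) for value in next_year])
```

Running the same function once per region or project gives the multi-region forecasts
mentioned above, since each series is handled independently.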
Clustering model
The clustering model takes data and sorts it into different groups based on common
attributes. The ability to divide data into groups based on specific attributes is
particularly useful in certain applications, like marketing. For example, marketers can divide
a potential customer base based on common attributes. There are two types of
clustering: hard and soft. Hard clustering assigns each data point to exactly one
cluster, while soft clustering assigns each data point a probability of belonging to
each cluster.
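The difference between hard and soft clustering can be sketched for a single attribute
(customer age), assuming the cluster centres are already known. The centres, the scale
parameter and the exponential weighting are all hypothetical choices for illustration. In
practice the centres would be learned from data, for example with k-means (hard) or a
mixture model (soft).

```python
from math import exp


def hard_assign(point, centres):
    """Hard clustering: return the index of the single nearest centre."""
    distances = [abs(point - centre) for centre in centres]
    return distances.index(min(distances))


def soft_assign(point, centres, scale=20.0):
    """Soft clustering: return one membership probability per centre."""
    weights = [exp(-((point - centre) / scale) ** 2) for centre in centres]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical cluster centres for customer age: "younger" and "older".
centres = [20.0, 60.0]

cluster_index = hard_assign(35.0, centres)
memberships = soft_assign(35.0, centres)
print(cluster_index, [round(p, 3) for p in memberships])
```

A 35-year-old lands wholly in the first cluster under hard assignment, while soft
assignment keeps the information that the customer also partly resembles the second group.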
How do predictive analytics models work?
Predictive analytics models have their strengths and weaknesses and are best suited to
specific uses. One of the biggest benefits applicable to all models is that they are reusable
and can be adjusted to share common business rules. A model can be trained using
algorithms and then reused. But how do these predictive analytics models actually work?
The analytical models run one or more algorithms on the data set on which the prediction is
going to be carried out. It is a repetitive process because it involves training the model.
Sometimes, multiple models are tried on the same data set before one that suits the business
objectives is found. Predictive analytics models therefore work through an iterative
process. It starts with pre-processing, then the data is mined in light of the business
objectives, followed by data preparation. Once preparation is complete, the data is
modelled, and the model is evaluated and finally deployed. Once the cycle is completed, it
is repeated.
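Trying several models on the same data set and keeping the one that performs best can be
sketched as follows. The two candidate models (a constant mean and a straight line), the
error measure (mean absolute error) and the numbers are assumptions for illustration. The
key point is that candidates are compared on held-out data they were not trained on.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    average = sum(ys) / len(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Least-squares straight line through the training data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical data, split into a training part and a held-out test part.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 13]
test_x, test_y = [7, 8], [15, 17]

candidates = {"mean baseline": fit_mean, "linear trend": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)  # train
    predictions = [model(x) for x in test_x]
    scores[name] = mean_absolute_error(test_y, predictions)  # evaluate

best_model = min(scores, key=scores.get)
print(scores, best_model)
```

In an iterative workflow this loop is rerun whenever new data arrives or a new candidate
model is proposed, and only the winner is deployed.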
Algorithms play a huge role in this analysis because they are used in data mining and
statistical analysis to help determine trends and patterns in data. Several types of
algorithms are built into an analytics model, each to perform a specific function.
Examples of these algorithms include time-series algorithms, association algorithms,
regression algorithms, clustering algorithms, decision trees, outlier detection algorithms and
neural network algorithms. For example, outlier detection algorithms detect the anomalies
in a dataset, while regression algorithms predict continuous variables based on other
variables present in the dataset.
Creating predictive algorithm models
While developing a predictive analytics model is no simple task, we managed to break the
process down into a few essential steps.
Defining scope and scale – Determine the process that will use the predictive analytics
models and what the desired business outcomes will be.
Profile data – Predictive analytics is data-intensive. So the next step is to explore the data
needed for analysis. Organisations have to decide where it is stored, its current state, and
how accessible it will be.
Gather, cleanse and integrate data – Once the data is found, it needs to be gathered and
cleaned. This is an important step because predictive analytics models need a strong
foundation to work effectively.
Incorporate analytics into the business process – The model only delivers value once it
is integrated into the business process, where its predictions can drive the best outcomes.
Monitor models and measure the business results – The model needs to be measured
to see if it makes genuine contributions to the overall business processes.
Limitations of predictive analytics models
Despite the immense economic benefits of predictive analytics models, they are not
fool-proof or fail-safe. There are some disadvantages to predictive analytics. Predictive
models need a specific set of conditions to work; if these conditions are not met, the
models are of little value to the organisation.
The need for massive training datasets
For predictive analytics models to be successful at predicting outcomes, there needs to be a
huge sample size representative of the population. Ideally, the sample size should be in the
high thousands to a few million. If datasets are smaller than that, the predictive analytics
models will be unduly influenced by anomalies in the data, which will distort findings. The
need for massive datasets inevitably locks out a lot of small to medium-sized organisations
that may not have this much data to work with.
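A small piece of arithmetic shows why a single anomaly matters more in a small dataset.
The sketch below measures how far one anomalous value moves the average of n otherwise
identical values. The order values used (50 for a typical order, 5000 for the anomaly)
are made up for illustration.

```python
from statistics import mean


def mean_shift_from_anomaly(n, typical=50.0, anomaly=5000.0):
    """Return how far one anomalous value moves the mean of n values."""
    clean = [typical] * n
    contaminated = [typical] * (n - 1) + [anomaly]
    return mean(contaminated) - mean(clean)


# The shift equals (anomaly - typical) / n, so it shrinks as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(mean_shift_from_anomaly(n), 4))
```

With ten records the anomaly moves the average by roughly ten times the typical value,
while with a hundred thousand records its effect is negligible.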
Properly categorising data
Predictive analytics models rely on machine learning algorithms, and these algorithms can
assess data properly only if it is labelled properly. Data labelling is a particularly
demanding and meticulous process because it needs to be accurate. Incorrect classification
and labelling cause several problems, like poor performance and inaccurate findings.
Applying learnings to different cases
Data models have a problem with generalisability, which is the ability to transfer findings
from one case to another. While predictive models are effective in their findings for one
case, they often struggle to transfer those findings to a different situation. Hence, there
are some applicability issues when it comes to the findings derived from a predictive
analytics model. However, certain methods, like transfer learning, could help mitigate some
of these shortcomings.
Predictive models in the future
The future will see predictive analytics models play an integral role in business processes
because of the immense economic value they generate. While not perfect, the value they
offer organisations, both public and private, is immense. With predictive analytics,
organisations have the opportunity to take action proactively in a variety of functions. Fraud
prevention in banks, disaster prevention for governments and sublime marketing campaigns
are just some of the possibilities made tangible by predictive analytics models, which is
why they will be an invaluable asset in the future.