Data Science in Practice

The document provides an overview of five common applications of data science: recommender systems, credit scoring, dynamic pricing, customer churn prediction, and fraud detection. For each application, it defines what the application is, how it works using common techniques like collaborative filtering or linear models, and provides a real-world use case example like a company using these models/applications. The goal is to illustrate how data science is applied in practice to solve business problems and optimize processes.

Uploaded by

Ayotunde Salako

Data Science in Practice

Five common applications of data science with concrete,
real-life use cases

Presented by
Yhat
http://yhat.com/
June 2016

What is data science...actually?

How do real companies use data science to make products
and operations better?
What does the data science lifecycle look like?
In our first whitepaper, Applied Data Science, we translated the hype-y term of data science into the plain English definition of using data to make better decisions, optimize processes and improve products and services. We also described the central goal of data science: getting statistical models into production.

In this whitepaper we introduce five common applications of data science that build upon those concepts. Our goal is to debunk the impression that data science is some type of obscure black magic and give you concrete examples of how it is applied in reality. You'll learn how real companies are using data science to make their products and day-to-day operations better. Last but not least, we describe the data science life cycle and explain Yhat's role in getting models into production.

APPLICATION 1:
RECOMMENDER SYSTEMS
Recommender systems, also known as recommender engines, are one of the most well-known applications of data science. Recommender systems are a subclass of information filtering systems, systems that cut through the noise of all options and present users with just the subset of options they'll find appealing. The data being filtered can range from products on an e-commerce site to dating matches that appear as you search for "the one."

WHAT IS A RECOMMENDER SYSTEM?
A model that filters information to present users with a curated subset of options they're likely to find appealing
HOW DOES IT WORK?
Generally via a collaborative approach (considering users' previous behavior) or content-based approach (based on discrete assigned characteristics)
WHAT IS A REAL USE CASE?
Tendril uses recommendation models to match eligible customers with new or existing energy products

[Figure: Recommender Systems. Labels: Content Filtering (location, age, gender); Collaborative Filtering (previous behavior, similar users); Recommender System.]

Recommender systems offer a more intelligent approach to information filtering than a simple search algorithm by introducing users to items they might not have otherwise discovered. Recommender systems generally take either a collaborative or content-based approach to filtering. Collaborative filtering considers a user's previous behavior, as well as the behavior of similar users. Content-based filtering provides recommendations based on discrete attributes or assigned characteristics.

Data scientists at energy software company Tendril opted for a hybrid approach that combines both collaborative and content-based filtering. Tendril provides analytics and consumer solutions to energy suppliers, including which energy products consumers would most likely consider. "We use Support Vector Regression models to predict household energy consumption to provide our clients with in-depth, personalized information about their customers," explains Mark Gately, Data Analytics Manager at Tendril. "This detailed information is also used in recommendation models, which help match eligible customers with new or existing energy products."
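To make the collaborative approach described in this section more concrete, here is a minimal sketch of user-based collaborative filtering in Python. It is an illustration only and not Tendril's method: the users, the energy plan names, the ratings and the helper functions `cosine_similarity` and `recommend` are all invented for the example. Items a user has not rated are scored by the ratings of other users, weighted by how similar those users' past ratings are.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); every name and value is invented.
ratings = {
    "ana":   {"plan_a": 5, "plan_b": 3, "plan_c": 4},
    "ben":   {"plan_a": 4, "plan_b": 3, "plan_c": 5, "plan_d": 4},
    "chloe": {"plan_a": 1, "plan_b": 5, "plan_d": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in all_ratings[user]:
                continue  # only score items the user has not seen yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend("ana", ratings)
print(recommendations)
```

A content-based variant would replace the rating vectors with item attributes, and a hybrid system would combine both signals.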

APPLICATION 2:
CREDIT SCORING

If you have ever applied for a credit card or a loan, you're likely already familiar with the concept of credit scoring. What you may be less aware of is the set of decision management rules that evaluates, behind the scenes, how likely an applicant is to repay debts.

WHAT IS CREDIT SCORING?
A model that determines an applicant's creditworthiness for a mortgage, loan or credit card
HOW DOES IT WORK?
A set of decision management rules evaluates how likely an applicant is to repay debts
WHAT IS A REAL USE CASE?
Ferratum Bank uses machine learning models to reach prospective customers that may have been overlooked by traditional banking institutions

The first general purpose credit scoring algorithm, now known as the FICO score, was introduced in 1989. The FICO score is still one of the most widely used models in the United States today, though peer-to-peer and direct lending organizations have focused on developing new techniques over the past few years. These new machine learning models and algorithms capture innovative factors and relationships that traditional loan scorecards couldn't, like how applicants manage monthly cash flow or whether friends or community members would endorse the applicant.

One such company is Ferratum Bank, a pioneer in financial technology and mobile consumer lending since 2005. "We developed complex statistical and machine learning models to enable smarter lending decisions," explains Scott Donnelly, Director of Business Lending at Ferratum Bank. "By getting creative with our approach and adopting innovative technologies, we've been able to reinvent how both consumers and businesses obtain loans. This has allowed us to reach prospective customers that in the past may have been overlooked by traditional banking institutions."

APPLICATION 3:
DYNAMIC PRICING

You walk out of the store, arms full of groceries, only to realize that a torrential downpour began as you perused the produce inside. You struggle to retrieve your phone, check your favorite ride app and are dismayed to find...a 2.1x surge!? Welcome to your first lesson on dynamic pricing.

WHAT IS DYNAMIC PRICING?

Modeling price as a function of supply, demand,
competitor pricing and exogenous factors
HOW DOES IT WORK?
Generalized linear models and classification trees
are popular techniques for estimating the right
price to maximize expected revenue
WHAT IS A REAL USE CASE?
Turo uses dynamic pricing models to suggest prices
to the people who list and rent out cars
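As a rough illustration of estimating a price that maximizes expected revenue, the sketch below fits a log-linear demand curve and then searches a grid of candidate prices. It is a simplification under stated assumptions: the price and quantity observations are invented, the curve is fit by ordinary least squares on log quantity as a stand-in for a generalized linear model, and the functions `fit_log_linear_demand` and `expected_revenue` are hypothetical names. It ignores supply, competitor pricing and exogenous factors, and it is not Turo's model.

```python
from math import exp, log

# Hypothetical observations: (price charged, units sold). Values are invented.
observations = [(10, 74), (12, 60), (15, 45), (18, 33), (20, 27)]


def fit_log_linear_demand(data):
    """Fit log(quantity) = intercept + slope * price by ordinary least squares."""
    prices = [p for p, _ in data]
    log_qty = [log(q) for _, q in data]
    mean_p = sum(prices) / len(prices)
    mean_y = sum(log_qty) / len(log_qty)
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(prices, log_qty))
    sxx = sum((p - mean_p) ** 2 for p in prices)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_p
    return intercept, slope


def expected_revenue(price, intercept, slope):
    """Revenue = price times the demand predicted at that price."""
    return price * exp(intercept + slope * price)


intercept, slope = fit_log_linear_demand(observations)

# Evaluate candidate prices from 5.0 to 25.0 in steps of 0.5.
candidate_prices = [5 + 0.5 * i for i in range(41)]
best_price = max(candidate_prices, key=lambda p: expected_revenue(p, intercept, slope))

print(f"slope={slope:.4f}, revenue-maximizing price on the grid={best_price}")
```

For this particular demand form the revenue-maximizing price also has a closed form, -1/slope, which is a useful sanity check on the grid search.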

Businesses use dynamic pricing algorithms to model rates as a function of supply, demand, competitor pricing, and exogenous factors (e.g. weather or time). Many fields, from airline travel to athletics admission ticketing, employ dynamic pricing to maximize expected revenue. The nuts and bolts of dynamic pricing strategies vary widely, though generalized linear models and classification trees are popular techniques for estimating the "right" (lowest/highest) price that consumers are willing to pay for a book, a flight, or a cab.

[Figure: Dynamic Pricing. Axes: Price (e.g. fare of ride) and Quantity (e.g. # of rides). Labels: D1, P1, P2.]

Turo, a peer-to-peer car rental service operating in over 2,500 cities, uses dynamic pricing to suggest prices to the people who list and rent out their cars on the platform. "Dynamic pricing helps us to balance supply and demand and ensure that both our travelers and our hosts are getting a fair market deal," explains Jérôme Selles, Director of Data Science and Analytics. "Three years ago we started to model supply and demand dynamics, so working on dynamic pricing was an intuitive next step.

"We quickly realized that the gap between model development and model deployment, in production, was much bigger than expected. It requires a very wide spectrum of skills: from knowledge of statistical modeling to software architecture best practices. We use Yhat's platform, ScienceOps, to transform our dynamic pricing prototype into a production-ready algorithm in the languages our Data Science team prefers to work in, R and Python."

APPLICATION 4:
CUSTOMER CHURN
Churn rate describes the rate at which customers abandon a product or service. Understanding customers' likelihood to churn is particularly important for subscription-based models, everything ranging from traditional cable or gym memberships to recently popularized monthly subscription boxes.
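Churn prediction is commonly treated as a classification problem, and any such model has to trade off catching churners against raising false alarms. The sketch below makes that trade-off measurable with precision (of the customers flagged as churners, the share that actually churn) and recall (of the customers who actually churn, the share that were flagged). The scores and outcomes are invented, and `precision_recall` is a hypothetical helper, not part of any particular library.

```python
# Hypothetical model output: (predicted churn probability, actually churned?).
scored_customers = [
    (0.95, True), (0.80, True), (0.70, False), (0.65, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]

# Churn rate: the share of customers who abandoned the service.
churn_rate = sum(1 for _, churned in scored_customers if churned) / len(scored_customers)


def precision_recall(scored, threshold):
    """Flag customers at or above the threshold and compare with what happened."""
    tp = sum(1 for p, churned in scored if p >= threshold and churned)
    fp = sum(1 for p, churned in scored if p >= threshold and not churned)
    fn = sum(1 for p, churned in scored if p < threshold and churned)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# A strict threshold flags few customers; a lenient one flags many.
strict = precision_recall(scored_customers, threshold=0.75)
lenient = precision_recall(scored_customers, threshold=0.30)

print(f"churn rate: {churn_rate:.2f}")
print(f"strict  (0.75): precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"lenient (0.30): precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

With this toy data the strict threshold never mislabels a loyal customer but misses half of the churners, while the lenient threshold catches every churner at the cost of some false alarms.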
Data scientists looking to predict customer churn may consider a variety of algorithms for the job, such as support vector machines, random forest, or k-nearest-neighbors. Beyond the accuracy of a given model, data scientists must also balance the tradeoff between precision (of the customers the model flags as churners, how many actually churn) and recall (of the customers who actually churn, how many the model flags). So what's better? Classifying every churning customer but occasionally mislabeling a non-churning customer? Or identifying fewer churning customers, but not mislabeling non-churners? It's a difficult decision that requires in-depth knowledge of the business case and years of experience.

WHAT IS CUSTOMER CHURN?
Predicting which customers are going to abandon a product or service
HOW DOES IT WORK?
Data scientists may consider using support vector machines, random forest or k-nearest-neighbors algorithms
WHAT IS A REAL USE CASE?
EAB combines data from transcripts, standardized test scores, demographics and more to identify students at risk of not graduating

These are familiar questions for the data scientists at EAB, the education division of The Advisory Board Company. EAB provides data-driven applications and insights to hundreds of institutions of higher education. "A key component of our Student Success Collaborative product, used by academic advisors and other administrators, is a predictive model of student graduation," says Harlan Harris, Director of Data Science. "We combine data from transcripts, standardized test scores, demographics, and other facts about students to provide a graduation risk score. Colleges and universities use these scores to identify students at risk of not graduating more efficiently, so they can intervene and help those students graduate."

APPLICATION 5:
FRAUD DETECTION
Financial technology, or FinTech, companies offer financial services like banking, investing, and payment processing via software, rather than through traditional banking institutions.
Companies processing massive volumes of financial transactions also need a quantifiable way to detect and prevent fraudulent transactions from being processed.
WHAT IS FRAUD DETECTION?
Detecting and preventing fraudulent financial
transactions from being processed
HOW DOES IT WORK?
Fraud detection is a binary classification problem:
is this transaction legitimate or not?
WHAT IS A REAL USE CASE?
VIA SMS Group uses a combination of complex
data lookups and decision algorithms written in
R and implemented in PHP to assess whether a
loan applicant is fraudulent
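The binary framing above (legitimate or not) can be illustrated with a very simple outlier rule. The sketch below is an assumption-laden example, not VIA SMS Group's method: the transaction amounts are invented, and the rule used is the modified z-score based on the median and the median absolute deviation, with the conventional cutoff of 3.5. Real systems would typically combine many more signals and a trained classifier.

```python
from statistics import median

# Hypothetical transaction amounts; one of them is far from the rest.
transaction_amounts = [12.5, 9.9, 15.0, 11.2, 13.4, 10.8, 14.1, 980.0, 12.0, 11.7]


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero; scores are undefined")
    return [0.6745 * (v - center) / mad for v in values]


def is_suspicious(score, cutoff=3.5):
    """Binary decision: treat a transaction as suspicious if its score is extreme."""
    return abs(score) > cutoff


scores = modified_z_scores(transaction_amounts)
flagged = [
    amount
    for amount, score in zip(transaction_amounts, scores)
    if is_suspicious(score)
]

print(flagged)
```

The median-based score is used here instead of the mean and standard deviation because a single extreme transaction would otherwise inflate the very statistics used to detect it.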

Traditional fraud detection presents a fairly straightforward problem: is a transaction legitimate or not? This is otherwise called a binary classification problem. This can be trickier than it seems, especially when you have thousands (or even millions) of legitimate transactions occurring for every instance of fraud. To add insult to injury, a single occurrence of fraud can cost a company an exorbitant amount of money. To combat this, some data science teams pair supervised classification techniques with anomaly detection algorithms to identify outliers and pick out suspicious behavior.

Dmitrijs vovs is responsible for managing risk at VIA SMS Group, where over 60MM is loaned to consumers across 6 countries every year. The risk analytics team at VIA SMS Group uses advanced algorithms to assess whether an applicant is fraudulent prior to considering whether or not to underwrite the requested loan. "We write our decision algorithms in the R programming language and implement them into our web and mobile apps in the server-side language of PHP. By using R, we can leverage a combination of complex data lookups and state-of-the-art algorithms to identify fraudulent transactions," explains Dmitrijs.

THE DATA SCIENCE LIFECYCLE

As the five applications above demonstrate, data can be used to inform decisions, optimize processes, and improve products and services across a very wide range of business problems. Regardless of the specific question at hand, the data science lifecycle always culminates with selecting the winning model strategy and implementing it into an application where real business value can be realized.

For data scientists who build their models in open source programming languages like R and Python, the path to production application is not always clear. Mobile and web applications are built using platforms and frameworks like .NET, Ruby on Rails, Java, PHP or Node.js, which cannot consume models written in R or Python. As a result, many models are abandoned after months of work before they ever see the light of day. Alternatively, data scientists' advanced statistical procedures may be tossed over the fence to engineers and manually recoded into another language, a notably difficult, time-consuming and error-prone process.

[Illustration, speech bubbles: "Now what?" "So when can we go live with the new model?" "Any of you know what Gradient Boosting is?"]

YHAT'S ROLE

Yhat's data science operations system, ScienceOps, eliminates the counterproductive barrier between data scientists and engineers by making R and Python models accessible via REST API. Instead of translating models, data scientists can deploy models to ScienceOps, where automatically generated API endpoints make production integration quick & easy.

WHAT IS SCIENCEOPS?
Yhat's data science operations system that eliminates the barrier between data scientists and engineers
HOW DOES IT WORK?
ScienceOps makes R and Python models accessible via REST API and provides a platform to monitor, manage and scale data science models
WHAT IS A REAL USE CASE?
ScienceOps is used by companies around the globe, including each of those highlighted in the five applications above

Our core mission at Yhat is to allow data scientists to deploy predictive models rapidly, frequently and reliably, but we recognize that a data scientist's job does not end there. Beyond the initial step of implementing models, ScienceOps also provides the ability to monitor, manage and scale models. Companies around the globe, including each of those highlighted in the use cases above, rely on ScienceOps to take data science models from prototype to production.

To find out more about how your business can deploy models rapidly, frequently and reliably with ScienceOps, get in touch with the Yhat team or schedule a demo today.
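To give a feel for what "a model accessible via REST API" means in general, here is a small self-contained sketch using only the Python standard library. It does not show the ScienceOps API, which this whitepaper does not document. The `predict` function, its feature names, its weights and the `/predict` path are all hypothetical. The example starts a local HTTP server, sends it one JSON request as an application written in any language could, and shuts the server down.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a hand-set linear score."""
    return 0.5 * features["monthly_visits"] - 2.0 * features["support_tickets"]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any HTTP client, in any language, could make this call.
payload = json.dumps({"monthly_visits": 12, "support_tickets": 1})
connection = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
connection.request("POST", "/predict", body=payload,
                   headers={"Content-Type": "application/json"})
result = json.loads(connection.getresponse().read())
connection.close()

server.shutdown()
server.server_close()
print(result)
```

The point of the design is that the calling application only needs to speak HTTP and JSON, so the model can stay in the language it was built in.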

Works Cited
Chiang, Eric. "Predicting Customer Churn with Scikit-learn." The Yhat Blog. Yhat, 20 Mar. 2014. Web.

Huang, Cheng-Lung, Mu-Chen Chen, and Chieh-Jen Wang. "Credit Scoring with a Data Mining Approach Based on Support Vector Machines." Expert Systems with Applications 33 (2007): 847-56. Web.

Leskovec, Jure, and Jeffrey Ullman. "Recommendation Systems." Mining of Massive Datasets. Ed. Anand Rajaraman. 2.1 ed. Cambridge: Cambridge UP, 2014. 307-41. Print.

Phua, Clifton, Vincent Lee, Kate Smith, and Ross Gayler. "A Comprehensive Survey of Data Mining-based Fraud Detection Research." Web. <https://arxiv.org/pdf/1009.6119.pdf>.

Yhat. Applied Data Science: Practical Guide to Building Data-driven Products beyond Analysts' Laptops. New York: Yhat, 2014. Print.

About
Yhat (pronounced Y-hat) provides an end-to-end data science platform for developing,
deploying, and managing real-time decision APIs.
Yhat's flagship product, ScienceOps, enables data scientists to transform static insights into
production-ready decision-making APIs that integrate seamlessly with any customer- or
employee-facing app. Yhat also created Rodeo, an open source integrated development
environment (IDE) for Python.
