Business Analytics using Data Mining

Professor Vandith Pamuru

Term 6

Term 6: January 31 – March 03, 2022

Post Graduate Programme in Management
2021-22

DISCLAIMER: The academic course pack contains copyrighted materials which are only meant to be downloaded by the authorized users for their course work. Please note that the access is made available only for the duration of the course. Sharing of access with anyone (copying, forwarding, or other means) is a violation of copyright law and is strictly prohibited.

Table of Contents

S.No  Topic
1     A Predictive Analytics Primer
2     Where predictive analytics is having the biggest impact
3     Screening for Chronic Kidney Disease
4     12 predictive analytics screw-ups
      Link: https://www.predictiveanalyticsworld.com/machinelearningtimes/12-predictive-analytics-screw-ups/2049/
5     Cluster Analysis for Segmentation
6     Amazon.com recommendations: item-to-item collaborative filtering
      Link: https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf

* Reference book at LRC: Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner by Galit Shmueli, Nitin R. Patel and Peter C. Bruce.

ANALYTICS

A Predictive Analytics Primer

by Thomas H. Davenport
SEPTEMBER 02, 2014

No one has the ability to capture and analyze data from the future. However, there is a way to predict the future using data from the past. It’s called predictive analytics, and organizations do it every day.

Has your company, for example, developed a customer lifetime value (CLTV) measure? That’s using predictive analytics to determine how much a customer will buy from the company over time. Do you have a “next best offer” or product recommendation capability? That’s an analytical prediction of the product or service that your customer is most likely to buy next. Have you made a forecast of next quarter’s sales? Used digital marketing models to determine what ad to place on what publisher’s site? All of these are forms of predictive analytics.

Predictive analytics are gaining in popularity, but what do you (a manager, not an analyst) really need to know in order to interpret results and make better decisions? How do your data scientists do what they do? By understanding a few basics, you will feel more comfortable working with and communicating with others in your organization about the results and recommendations from predictive analytics. The quantitative analysis isn’t magic, but it is normally done with a lot of past data, a little statistical wizardry, and some important assumptions. Let’s talk about each of these.

The Data: Lack of good data is the most common barrier to organizations seeking to employ predictive analytics. To make predictions about what customers will buy in the future, for example, you need to have good data on who is buying (which may require a loyalty program, or at least a lot of analysis of their credit cards), what they have bought in the past, the attributes of those products (attribute-based predictions are often more accurate than the “people who buy this also buy this” type of model), and perhaps some demographic attributes of the customer (age, gender, residential location, socioeconomic status, etc.). If you have multiple channels or customer touchpoints, you need to make sure that they capture data on customer purchases in the same way your previous channels did.

All in all, it’s a fairly tough job to create a single customer data warehouse with unique customer IDs on everyone, and all past purchases customers have made through all channels. If you’ve already done that, you’ve got an incredible asset for predictive customer analytics.

The Statistics: Regression analysis in its various forms is the primary tool that organizations use for predictive analytics. It works like this in general: An analyst hypothesizes that a set of independent variables (say, gender, income, visits to a website) are statistically correlated with the purchase of a product for a sample of customers. The analyst performs a regression analysis to see just how correlated each variable is; this usually requires some iteration to find the right combination of variables and the best model. Let’s say that the analyst succeeds and finds that each variable in the model is important in explaining the product purchase, and together the variables explain a lot of variation in the product’s sales. Using that regression equation, the analyst can then use the regression coefficients (the degree to which each variable affects the purchase behavior) to create a score predicting the likelihood of the purchase.

Voila! You have created a predictive model for other customers who weren’t in the sample. All you have to do is compute their score, and offer the product to them if their score exceeds a certain level. It’s quite likely that the high-scoring customers will want to buy the product, assuming the analyst did the statistical work well and that the data were of good quality.

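The scoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients, the variable names, the logistic form of the regression, and the 0.5 cutoff are all invented here and do not come from the article.

```python
import math

# Hypothetical coefficients, as if estimated by a logistic regression on a
# sample of past customers. All values are invented for illustration.
INTERCEPT = -4.0
COEFFICIENTS = {
    "income_thousands": 0.03,  # annual income, in thousands
    "site_visits": 0.25,       # website visits in the last month
    "is_female": 0.40,         # 0/1 gender indicator
}


def purchase_score(customer):
    """Return the modeled purchase likelihood (between 0 and 1)."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * customer[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))  # logistic link


def customers_to_target(customers, threshold=0.5):
    """Return the ids of customers whose score exceeds the threshold."""
    return [c["id"] for c in customers if purchase_score(c) > threshold]


# Customers who were not in the original sample (invented data).
new_customers = [
    {"id": "A", "income_thousands": 90, "site_visits": 8, "is_female": 1},
    {"id": "B", "income_thousands": 40, "site_visits": 1, "is_female": 0},
]
targets = customers_to_target(new_customers)
```

In practice the coefficients would come from fitting the regression to historical data, and the cutoff would be chosen by weighing the cost of an offer against the value of a purchase.
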
The Assumptions: That brings us to the other key factor in any predictive model: the assumptions that underlie it. Every model has them, and it’s important to know what they are and monitor whether they are still true. The big assumption in predictive analytics is that the future will continue to be like the past. As Charles Duhigg describes in his book The Power of Habit, people establish strong patterns of behavior that they usually keep up over time. Sometimes, however, they change those behaviors, and the models that were used to predict them may no longer be valid.

What makes assumptions invalid? The most common reason is time. If your model was created several years ago, it may no longer accurately predict current behavior. The greater the elapsed time, the more likely customer behavior has changed. Some Netflix predictive models, for example, that were created on early Internet users had to be retired because later Internet users were substantially different. The pioneers were more technically focused and relatively young; later users were essentially everyone.

Another reason a predictive model’s assumptions may no longer be valid is if the analyst didn’t include a key variable in the model, and that variable has changed substantially over time. The great (and scary) example here is the financial crisis of 2008-9, caused largely by invalid models predicting how likely mortgage customers were to repay their loans. The models didn’t include the possibility that housing prices might stop rising, and even that they might fall. When they did start falling, it turned out that the models became poor predictors of mortgage repayment. In essence, the belief that housing prices would always rise was a hidden assumption in the models.

Since faulty or obsolete assumptions can clearly bring down whole banks and even (nearly!) whole economies, it’s pretty important that they be carefully examined. Managers should always ask analysts what the key assumptions are, and what would have to happen for them to no longer be valid. And both managers and analysts should continually monitor the world to see if key factors involved in assumptions might have changed over time.

With these fundamentals in mind, here are a few good questions to ask your analysts:

• Can you tell me something about the source of data you used in your analysis?
• Are you sure the sample data are representative of the population?
• Are there any outliers in your data distribution? How did they affect the results?
• What assumptions are behind your analysis?
• Are there any conditions that would make your assumptions invalid?

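To make the outlier question concrete, here is a minimal sketch of one common check, the interquartile-range (IQR) rule, and of how a single extreme value can shift a summary statistic. The order values and the 1.5 multiplier are assumptions chosen for illustration; the article does not prescribe a particular method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented order values; the last one is deliberately extreme.
order_values = [10, 12, 11, 13, 12, 11, 10, 95]

outliers = iqr_outliers(order_values)
trimmed = [v for v in order_values if v not in outliers]

mean_with = statistics.mean(order_values)  # includes the extreme order
mean_without = statistics.mean(trimmed)    # excludes flagged orders
```

Comparing `mean_with` and `mean_without` is one simple way for an analyst to answer "How did they affect the results?" before deciding whether to keep, cap, or investigate the flagged values.
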
Even with those cautions, it’s still pretty amazing that we can use analytics to predict the future. All we have to do is gather the right data, build the right type of statistical model, and be careful of our assumptions. Analytical predictions may be harder to generate than those by the late-night television soothsayer Carnac the Magnificent, but they are usually considerably more accurate.

Thomas H. Davenport is the President’s Distinguished Professor in Management and Information Technology at Babson College, and cofounder of the International Institute for Analytics. He also contributes to the MIT Initiative on the Digital Economy as a fellow, and serves as a senior advisor to Deloitte Analytics. Author of over a dozen management books, his latest is Only Humans Need Apply: Winners and Losers in the Age of Smart Machines.

COPYRIGHT © 2014 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.
Reproduced with permission from the Publisher for use only in “Business Analytics using Data Mining [Term 6_PGP]” taught by “Professor Vandith Pamuru” at Indian School of Business-Mohali scheduled on “January 31 – March 03, 2022”.

Copyright 2014 Harvard Business Publishing. All Rights Reserved. Additional restrictions may apply including the use of this content as assigned course material. Please consult your institution's librarian about any restrictions that might apply under the license with your institution. For more information and teaching resources from Harvard Business Publishing including Harvard Business School Cases, eLearning products, and business simulations please visit hbsp.harvard.edu.

ANALYTICS

Where Predictive Analytics Is Having the Biggest Impact

by Jacob LaRiviere, Preston McAfee, Justin Rao, Vijay K. Narayanan and Walter Sun
MAY 25, 2016

The big data revolution is upon us. Firms are scrambling to hire a new brand of analysts dubbed “data scientists,” and universities have responded to this demand by introducing data science courses into degrees ranging from computer science to business. Survey-based reports find that firms are currently spending an estimated $36 billion on storage and infrastructure, and that is expected to double by 2020.

Once companies are logging and storing detailed data on all their customer engagements and internal processes, what’s next? Presumably, firms are investing in big data infrastructure because they believe that it offers a positive return on investment. However, looking at the surveys and consulting reports, it is unclear what the precise use cases are that will drive this positive ROI from big data.

Our goal in this article is to offer specific, real-world case studies to show how big data has provided value for companies that have worked with Microsoft’s analytics teams. These cases reveal the circumstances in which big data predictive analytics are likely to enable novel and high-value solutions, and the situations where the gains are likely to be minimal.

Predicting demand. The first use case involves predicting demand for consumer products that are in the “long tail” of consumption. Firms value accurate demand forecasts because inventory is expensive to keep on shelves and stockouts are detrimental to both short-term revenue and long-term customer engagement. Aggregated total sales is a poor proxy because firms need to distribute inventory geographically, necessitating hyperlocal forecasts. The traditional way of solving this problem is using time-series econometrics with historical sales data. This method works well for popular products in large regions but tends to fail when data gets thin because random noise overwhelms the underlying signal.

A big data solution to this problem is to use anonymized and aggregated web search or sentiment data linked to each store’s location on top of the existing time-series data. Microsoft data scientists have employed this approach to help a forecasting firm predict auto sales. Compared to traditional time-series models, building models with web search data as one of the inputs reduces mean absolute forecast error, a standard measure of prediction accuracy, by about 40% from baseline for monthly national sales predictions of auto makes with relatively small market shares. Although the gains were smaller for the most popular models at the national level, the relative improvement increases as one drills down to the regional level.

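For readers unfamiliar with the metric, the sketch below shows how mean absolute forecast error and a relative reduction from baseline can be computed. The sales figures and both sets of forecasts are invented so that the arithmetic is easy to follow; they are not the data behind the result reported above.

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute differences between actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented monthly unit sales for one auto make.
actual_sales = [100, 120, 90, 110]

# Hypothetical forecasts from a time-series-only baseline model ...
baseline_forecast = [90, 135, 100, 95]
# ... and from a model that also uses web search data as an input.
augmented_forecast = [94, 129, 96, 101]

baseline_mae = mean_absolute_error(actual_sales, baseline_forecast)
augmented_mae = mean_absolute_error(actual_sales, augmented_forecast)

# Relative reduction in error compared with the baseline model.
relative_reduction = (baseline_mae - augmented_mae) / baseline_mae
```

With these made-up numbers the baseline error is 12.5 units, the augmented error is 7.5 units, and the relative reduction is 0.4, which is how a statement such as "about 40% from baseline" would be read.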

In this case, the big data solution leverages the previously unused data point that people do a considerable amount of social inquiry and research online before buying a car. The increased prediction accuracy, in turn, makes it possible to achieve large increases in operational efficiency: having the right inventory in the right locations.

Anonymized web search data has proven to be helpful for other forecasts as well, since online activity is often a good leading proxy for purchases and actions of the general public. Having the additional data is insufficient on its own. Processing search data and combining it with traditional sources is vital in creating a successful prediction: We found that raw search query volume is insufficient in parsing out the signals that correlate to true product demand.

Being intelligent about which signals to draw from big data requires care, and best practices can be case-specific. For example, a single query from a user might be less important than multiple queries from the same user. Although we used search data in this case study, a firm could just as easily use the location of users visiting their website or link detailed sales data to a customer’s location.

Improved pricing. Using a single price is economically inefficient because part of the demand curve that could be profitably served is priced out of the market. As a consequence, firms regularly offer targeted discounts, promotions, and segment-based pricing to target different consumers. E-commerce websites have a distinct advantage in pursuing such an approach because they log detailed information on customer browsing, not just the goods they end up purchasing, and aggressively adjust prices over time. These price adjustments are a form of experimentation and, jointly with big data, allow firms to learn more about their customers’ price responsiveness.

Offline retailers can mimic e-commerce’s nuanced pricing strategies by tracking consumers through smartphone connectivity and logging which customers enter the store, what type of goods they look at, and whether they make a purchase. Machine learning applied to this data can algorithmically generate customer segments based on price responsiveness and preferences, which generally offers a large improvement on traditional demographic-based targeting.

Our experience with pricing advertising on the Bing search engine is that using big data can produce substantial gains by better matching advertisers to consumers. The success of algorithmic targeting has been well documented and is a key driver of revenue in the online advertising market. Advances in measurement technology increasingly allow offline firms to benefit from these types of gains through more efficient pricing.

Predictive maintenance. Smoothly operating supply chains are vital for stable profits. Machine downtime imposes a cost to firms due to forgone productivity and can be particularly disruptive in both complex manufacturing supply chains and consumer products. Executives in asset-intensive industries often state that the primary operational risk to their businesses is unexpected failures of their assets. A wave of new data generated by the “internet of things” (IoT) can provide real-time telemetry on detailed aspects of production processes. Machine-learning models trained on these data allow firms to predict when different machines will fail.

Airlines are particularly interested in predicting mechanical failures in advance so that they can reduce flight delays or cancellations. Microsoft data scientists from the Cortana Intelligence Suite team are able to predict the probability of aircraft being delayed or canceled in the future based on relevant data sources, such as maintenance history and flight route information. A machine-learning solution based on historical data and applied in real time predicts the type of mechanical issue that will result in a delay or cancellation of a flight within the next 24 hours, allowing the airlines to take maintenance actions while the aircraft are being serviced, thus preventing possible delays or cancellations.

Similar predictive-maintenance solutions are also built in other industries: for example, tracking real-time telemetry data to predict the remaining useful life of an aircraft engine, using sensor data to predict the failure of an ATM cash withdrawal transaction, employing telemetry data to predict the failure of electric submersible pumps used to extract crude in the oil and gas industry, predicting the failures of circuit boards at early stages in the manufacturing process, predicting credit defaults, and forecasting energy demand in hyperlocal regions to predict the overload situations of energy grids. Machine learning will make supply chains less brittle and reduce the effects of disruptions for many goods and services.

These cases help highlight a few general principles:

• The value derived from the analytics piece can greatly exceed the cost of the infrastructure. This indicates there will be strong growth in big data consulting services and specialized roles within firms.
• Big data is less about size and more about introducing fundamentally new information to prediction and decision processes. This information matters most when existing data sources are insufficient to provide accurate or actionable predictions, for example, due to small sample sizes or coarseness of historical sales (small effective regions, niche products, new offerings, etc.).
• The new information is often buried in detailed and relatively unstructured data logs (known as a “data lake”), and techniques from computer science are needed to extract insights from it. To leverage big data, it is vital to have talented data engineers, statisticians, and behavioral scientists working in tandem. “Data scientist” is often used to refer to someone who has these three skills, but in our experience single individuals rarely have all three.

Radically new applications. The cases that we’ve discussed concern how big data can be employed to improve existing processes (e.g., more-precise demand forecasts, better price sensitivity estimates, better predictions of machine failure). But it also has the potential to be applied in ways that disrupt existing processes. For example, machine-learning models taking massive data sets as inputs, coupled with clever designs that account for patient histories, have the potential to revolutionize how certain diseases are diagnosed and treated. Another example involves matching distributed electricity generation (e.g., solar panels on roofs) to localized electricity demand, unlocking huge value by equating electricity supply and demand with more-efficient generation.

More accurate demand prediction, better pricing, and predictive maintenance are the specific use cases that easily justify large firms’ investments in big data infrastructure and data science. These uses are likely to drive value of the same order of magnitude as the investments. The value of radically new applications is challenging to understand ex ante and speculative by nature. It is reasonable to expect losses for many firms, due to uncertain and higher-risk investments, with a few firms earning spectacular profits.

Jacob LaRiviere is an economist at Microsoft Technology and Research, an adjunct professor at the University of Tennessee, and an affiliate faculty member at the University of Washington.

Preston McAfee is a corporate vice president and the chief economist at Microsoft.

Justin Rao is an economist at Microsoft Research and an affiliate faculty member at the University of Washington.

Vijay K. Narayanan leads the Algorithms and Data Science Solutions unit of the Data Group at Microsoft.

Walter Sun is the founder of Bing Predicts and a partner data scientist at Microsoft. He is an affiliate faculty member of the University of Washington and an adjunct professor at Seattle University.

COPYRIGHT © 2016 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.
Reproduced with permission from the Publisher for use only in “Business Analytics using Data Mining [Term 6_PGP]” taught by “Professor Vandith Pamuru” at Indian School of Business-Mohali scheduled on “January 31 – March 03, 2022”.

Copyright 2016 Harvard Business Publishing. All Rights Reserved. Additional restrictions may apply including the use of this content as assigned course material. Please consult your institution's librarian about any restrictions that might apply under the license with your institution. For more information and teaching resources from Harvard Business Publishing including Harvard Business School Cases, eLearning products, and business simulations please visit hbsp.harvard.edu.

UV0871

SCREENING FOR CHRONIC KIDNEY DISEASE

Chronic Kidney Disease (CKD) is a progressive condition that results in significant morbidity and mortality. Because of the important role the kidneys play in maintaining homeostasis, CKD can affect almost every body system. Early recognition and intervention are essential to slowing disease progression, maintaining quality of life, and improving outcomes. Family physicians have the opportunity to screen at-risk patients, identify affected patients, and ameliorate the impact of CKD by initiating early therapy and monitoring disease progression.1

The purpose of this case is to create an easy-to-use screening tool to identify patients at risk for CKD. Despite the wide availability and low cost of a test for CKD based on one or more blood samples, studies have shown that many in the at-risk population have not been tested. One reason for this is that awareness of CKD is low. Given the proven benefits of early detection and treatment, the need for some kind of screening tool is clear. Although there is no reason to test everyone, those patients with a high enough probability of having CKD should be tested. Specifically, the case asks whether those high-risk patients can be identified using easily obtainable patient data.

The Study Population



Since 1975, the National Center for Health Statistics of the Centers for Disease Control and Prevention has conducted nationwide surveys of U.S. adults. Using trained personnel, the center collected a wide variety of demographic and health information using direct interviews, examinations, and blood samples. The data set consists of selected information from 8,819 adults 20 years of age or older taken from the 1999–2000 and 2001–2002 surveys. The sample subjects were randomly divided into two pools: a 6,000-case training set and a 2,819-case validation sample.
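The random division into a training pool and a validation pool can be sketched as follows. This is only an illustration: the case does not say how the split was performed, and the function name, the fixed seed, and the use of consecutive integers as subject IDs are all assumptions.

```python
import random

def split_subjects(subject_ids, n_train, seed=0):
    """Randomly partition subject IDs into a training pool and a validation pool."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# 8,819 subjects in total; IDs 1..8819 are hypothetical placeholders
train_ids, valid_ids = split_subjects(range(1, 8820), n_train=6000)
print(len(train_ids), len(valid_ids))
```

A model would be fitted on the training pool only and then judged on the validation pool, which it has never seen.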

1. Catherine S. Snively, MD, and Cecilia Gutierrez, MD, “Chronic Kidney Disease: Prevention and Treatment of Common Complications,” American Family Physician, 70 (10), (November 2004).

This case was prepared by Professor Phillip E. Pfeifer and Professor Heejung Bang (Weill Cornell Medical College). It was written as a basis for class discussion rather than to illustrate effective or ineffective handling of an administrative situation. Copyright © 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved. To order copies, send an e-mail to sales@dardenbusinesspublishing.com. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the permission of the Darden School Foundation.

This document is authorized for use only in Professor Vandith Pamuru's Business Analytics using Data Mining[PGP] at Indian School of Business (ISB) from Dec 2021 to Mar 2022.

A test for CKD was administered to everyone in the study population.² The variable of interest is CKD, a 0/1 dummy variable indicating whether or not the subject had CKD. Exhibit 1 defines the 34 variables in the data set. Notice that variables in columns A through J are demographic in nature, K through V were collected during the physical exam, and W through AH are based, in part, on self-reported health histories.

The Causes of CKD³

The two main causes of chronic kidney disease are diabetes and high blood pressure, which are responsible for up to two-thirds of the cases. Diabetes happens when your blood sugar is too high, causing damage to many organs in your body, including the kidneys and heart, as well as blood vessels, nerves, and eyes. High blood pressure, or hypertension, occurs when the pressure of your blood against the walls of your blood vessels increases. If uncontrolled, or poorly controlled, high blood pressure can be a leading cause of heart attacks, strokes, and chronic kidney disease. Also, chronic kidney disease can cause high blood pressure.

Other conditions that affect the kidneys are:

• Glomerulonephritis, a group of diseases that cause inflammation and damage to the kidney’s filtering units. These disorders are the third most common type of kidney disease.
• Inherited diseases, such as polycystic kidney disease, which causes large cysts to form in the kidneys and damage the surrounding tissue.
• Malformations that occur as a baby develops in its mother’s womb. For example, a narrowing may occur that prevents normal outflow of urine and causes urine to flow back up to the kidney. This causes infections and may damage the kidneys.
• Lupus and other diseases that affect the body’s immune system.
• Obstructions caused by problems like kidney stones, tumors, or an enlarged prostate gland in men.
• Repeated urinary infections.

2. The test used a formula to estimate glomerular filtration rate based on measured serum creatinine concentration, age, gender, and race. CKD was defined as estimated filtration rate less than 60 mL/min/1.73 m². For details, see Heejung Bang, David A. Shoham, Philip J. Klemmer, Ronald J. Falk, Madhu Mazumdar, Debbie Gipson, Romulo E. Colindres, and Abhijit V. Kshirsagar, “SCreening for Occult Renal Disease (SCORED): A Simple Prediction Model for Chronic Kidney Disease,” Archives of Internal Medicine, 2007.
3. This section is excerpted from the National Kidney Foundation Web site (www.kidney.org), © 2007, National Kidney Foundation, Inc., 30 East 33rd Street, New York, NY 10016.

Who Is at Risk?⁴

While anyone at any age can develop chronic kidney disease (CKD), a number of risk factors have been identified that may lead to possible problems with your kidneys. These include:

• Diabetes. Diabetes is the leading cause of CKD. If you have diabetes, talk with your doctor about how to keep your blood glucose as close to normal as possible to ensure your diabetes is under control.
• Hypertension. Hypertension, also called high blood pressure, is the second-highest cause of CKD. Keep your blood pressure under control. A number of effective medications are available to help you with this task. Your doctor will help you to determine which medication is right for you.
• Cardiovascular disease. In addition to hypertension, other diseases of the heart and blood vessels may increase your risk for kidney disease. People who have had heart attacks or strokes, congestive heart failure, coronary artery disease, or peripheral vascular disease need to be monitored carefully for kidney problems.
• Family history of kidney disease. Some kidney diseases are genetic. People with a mother, father, brother, or sister who has had a kidney disease are more likely to develop problems with their kidneys.
• Age. People 60 years and older are at a higher risk for developing CKD.
• Race. People belonging to certain ethnic groups, such as First Nations (Canadian aboriginal peoples) and Pacific Islanders, are at a higher risk for developing this disease.

The Challenge

The list of risk factors above is a reflection of the results of several separate studies. What we want to do is figure out how to combine all the possible risk factors to measure the overall risk faced by the study subjects.

The 34 variables in the data set are all easily obtained by a family physician during routine checkups. Only the cholesterol measurements and the hemoglobin count (used to help define anemia) require blood tests. The challenge is to find a way to use the first 33 variables to predict the 34th. The idea would be to create something very simple (like the quizzes you see in popular magazines, for example) that would identify subjects at risk of having CKD. The high-risk subjects would then be encouraged to have their serum creatinine levels checked and/or undergo a complete urinalysis. The challenge here is strictly one of prediction. The variables used need not cause CKD. They need only be indicators of the presence of CKD.

4. This section was excerpted from the Web site of the government of British Columbia on 18 June 2007 (www.gov.bc.ca), © 2001, Province of British Columbia.

It is also important to note that the study population is not a random sample of U.S. adults. That means that our predictions will not apply directly to the U.S. population and should not be used for actual decision-making.

To get us started, Exhibit 2 reports summary statistics for the 6,000-subject training set for each of the numerical variables. These statistics are reported for those with and without CKD. A t-statistic to test the equality of the means for the two groups is also reported. Of the 11 numerically scaled variables, age is the most significant predictor of CKD, with the average age of those with CKD being 73 compared to 47 for those without CKD.
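As a rough check on how such a t-statistic can be computed from summary statistics alone, the sketch below uses the unequal-variance (Welch) form. The case does not state which formula was used, so this choice is an assumption; it happens to come close to the values reported in Exhibit 2 for Age and SBP. The function name is hypothetical.

```python
import math

def welch_t(mean0, sd0, n0, mean1, sd1, n1):
    """Two-sample t-statistic from group means, standard deviations, and counts,
    without assuming equal variances in the two groups."""
    standard_error = math.sqrt(sd0 ** 2 / n0 + sd1 ** 2 / n1)
    return (mean0 - mean1) / standard_error

# Inputs copied from Exhibit 2 (CKD=0 group first, then CKD=1 group)
t_age = welch_t(47.15, 17.90, 5536, 73.05, 11.71, 464)
t_sbp = welch_t(124.27, 20.14, 5352, 141.47, 25.28, 442)
print(round(t_age, 2), round(t_sbp, 2))
```

The sign is negative whenever the CKD=1 group has the larger mean, because the statistic is computed as the CKD=0 mean minus the CKD=1 mean.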

For categorical variables, a chi-squared test of association is appropriate. Exhibit 3 reports the cross tabulation counts as well as the calculated chi-squared statistics. Remember, the degrees of freedom associated with each of these chi-squares depend on the number of categories taken on by each variable. Remember also that subjects with missing values have been ignored when constructing Exhibits 2 and 3. The most significant predictor of CKD from among the categorical variables is hypertension. Of those with hypertension, 15.5% had CKD compared to 2.7% of those without hypertension. It also appears Hispanics are under-represented in the CKD population and whites are over-represented. It also appears that those who list “noplace” as where they get their health care are very unlikely to have CKD.
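The hypertension figures can be reproduced from the Exhibit 3 counts with a short calculation. The sketch below assumes a Pearson chi-square without continuity correction (the case does not say which variant was used); the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypertension row of Exhibit 3: (CKD=0, CKD=1) counts without, then with, hypertension
chi2 = chi_square_2x2(3476, 97, 2007, 367)
rate_without = 97 / (3476 + 97)    # share with CKD among those without hypertension
rate_with = 367 / (2007 + 367)     # share with CKD among those with hypertension
print(round(chi2, 1), round(100 * rate_without, 1), round(100 * rate_with, 1))
```

A 2x2 table has one degree of freedom, so a statistic of this size indicates a very strong association in the training set.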

Exhibit 1
SCREENING FOR CHRONIC KIDNEY DISEASE
Variable Definitions

Col.  Variable          Definition
A     ID                Identification number
B     Age               Age (years)
C     Female            1 if female
D     Racegrp           Self-reported race/ethnic group (white, black, Hispanic, other)
E     Educ              1 if more than high school
F     Unmarried         1 if unmarried
G     Income            1 if household income is above the median
H     CareSource        Self-reported source of medical care (Dr./HMO, clinic, noplace, other)
I     Insured           1 if covered by health insurance
J     Weight            Weight (kg)
K     Height            Height (cm)
L     BMI               Body mass index (kg/m²)
M     Obese             1 if BMI is greater than 30 kg/m²
N     Waist             Waist circumference (cm)
O     SBP               Systolic blood pressure (max)
P     DBP               Diastolic blood pressure (min)
Q     HDL               (mg/dL) the "good" cholesterol
R     LDL               (mg/dL) the "bad" cholesterol
S     Total Chol        (mg/dL) the sum of good and bad cholesterol
T     Dyslipidemia      Too high LDL or too low HDL
U     PVD               Peripheral vascular disease reflected by reduced SBP at the leg relative to the arm
V     Activity          Mostly sit (1); stand or walk a lot (2); lift light loads or climb stairs often (3); heavy work and heavy loads (4)
W     Poor Vision       Self-reported poor vision
X     Smoker            Smoked at least 100 cigarettes
Y     Hypertension      The presence of at least one of four indicators of high blood pressure
Z     Fam Hypertension  Family history of hypertension (high blood pressure)
AA    Diabetes          Self-reported physician diagnosed or lab test result
AB    Fam Diabetes      Family history of diabetes
AC    Stroke            Self-reported response to "Has a doctor ever told you that you had a stroke?"
AD    CVD               Response to "Has a doctor ever told you that you had angina pectoris, myocardial infarction, or stroke?"
AE    Fam CVD           Family history of cardiovascular disease
AF    CHF               Self-reported response to "Has a doctor ever told you that you had congestive heart failure?"
AG    Anemia            Treatment for anemia received in past 3 months or hemoglobin at exam lower than 11 g/dL
AH    CKD               Chronic kidney disease as indicated by measured serum creatinine


Exhibit 2
SCREENING FOR CHRONIC KIDNEY DISEASE
Descriptive Statistics for Numerically Scaled Variables
(training-set data broken out by CKD groups)

                     CKD=0                         CKD=1
Variable      Average  Std Dev  Count      Average  Std Dev  Count     T-stat
Age             47.15    17.90   5536        73.05    11.71    464     -43.56
Weight          79.17    19.60   5432        77.74    19.25    435       1.49
Height         167.25    10.12   5433       165.29    10.41    428       3.77
BMI             28.24     6.22   5377        28.35     5.98    417      -0.36
Waist           96.54    15.24   5365       100.10    14.44    420      -4.85
SBP            124.27    20.14   5352       141.47    25.28    442     -13.94
DBP             71.86    12.24   5318        67.73    14.28    430       5.83
HDL             51.97    15.79   5529        50.08    16.18    463       2.41
LDL            151.85    42.46   5529       157.20    44.02    463      -2.52
Total Chol     203.82    42.04   5531       207.28    44.98    463      -1.60
Activity         2.06     0.82   5530         1.69     0.67    462      11.11


Exhibit 3
SCREENING FOR CHRONIC KIDNEY DISEASE
CrossTabs for Categorical Variables
Training Set Data

                        Variable=0                Variable=1
Variable            CKD=0  CKD=1   %1s        CKD=0  CKD=1   %1s     Chi-square
Female               2655    210   7.3%*       2881    254   8.1%         1.3
Educ                 3064    308   9.1%        2458    155   5.9%        21.1
Unmarried            3335    227   6.4%        1926    211   9.9%        23.1
Income               2723    293   9.7%        2088    104   4.7%        44.5
Insured              1137     17   1.5%        4329    439   9.2%        78.2
Obese                3708    281   7.0%        1669    136   7.5%         0.4
Dyslipidemia         4951    414   7.7%         585     50   7.9%         0.0
PVD                  5379    395   6.8%         157     69  30.5%       171.1
Poor Vision          4932    355   6.7%         277     60  17.8%        57.0
Smoker               3902    273   6.5%        1634    191  10.5%        27.4
Hypertension         3476     97   2.7%        2007    367  15.5%       322.0
Fam Hypertension     4231    388   8.4%        1305     76   5.5%        12.5
Diabetes             4998    334   6.3%         537    130  19.5%       145.3
Fam Diabetes         3829    307   7.4%        1707    157   8.4%         1.8
Stroke               5403    404   7.0%         128     59  31.6%       153.7
CVD                  5249    348   6.2%         275    115  29.5%       276.7
Fam CVD              3469    306   8.1%        1824    118   6.1%         7.7
CHF                  5401    404   7.0%         113     56  33.1%       158.3
Anemia               5434    442   7.5%          99     22  18.2%        18.9

*Read: Of the subjects who were not female, 7.3% (210) had CKD. Of the females, 8.1% (254) had CKD.

Racegrp       CKD=0  CKD=1  Total        CareSource    CKD=0  CKD=1  Total
Black          1001     77   1078        Clinic         1169    100   1269
Hispanic       1688     70   1758        Dr./HMO        3156    326   3482
Other           184      6    190        Noplace         911     14    925
White          2663    311   2974        Other           298     24    322
Total          5536    464   6000        Total          5534    464   5998
Chi-square     71.7                      Chi-square     63.2


UV0745
Rev. Mar. 28, 2018

Cluster Analysis for Segmentation

Introduction

We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every offering will be right for every customer, nor will every customer be equally responsive to your marketing efforts. Segmentation is a way of organizing customers into groups with similar traits, product preferences, or expectations. Once segments are identified, marketing messages and in many cases even products can be customized for each segment. The better the segment(s) chosen for targeting by a particular organization, the more successful the organization is assumed to be in the marketplace. Since its introduction in the late 1950s, market segmentation has become a central concept of marketing practice.

Segments are constructed on the basis of customers’ (1) demographic characteristics, (2) psychographics, (3) desired benefits from products/services, and (4) past-purchase and product-use behaviors. These days, most firms possess rich information about customers’ actual purchase behavior, geodemographic, and psychographic characteristics. In cases where firms do not have access to detailed information about each customer, information from surveys of a representative sample of the customers can be used as the basis for segmentation.

An Example

Consider Geico, an auto insurance company. Suppose Geico hypothetically plans to customize its auto insurance offerings and needs to understand what its customers view as important from their insurance provider. Geico can ask its customers to rate how important the following two attributes are to them when considering the type of auto insurance they would use:

• savings on premium
• existence of a neighborhood agent

The importance of the attributes is measured using a seven-point Likert-type scale, where a rating of one represents not important and seven represents very important. Unless every respondent who is surveyed gives identical ratings, the data will contain variations that you can use to cluster or group respondents together, and such clusters are the segments. Customers are most similar to each other if they are part of the same segment and most different from each other if they are part of different segments. By inference, then, actions taken toward customers in the same segment should lead to similar responses, and actions taken toward customers in different segments should lead to different responses.

This technical note was prepared by Rajkumar Venkatesan, Associate Professor of Business Administration. Copyright © 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved. To order copies, send an email to sales@dardenbusinesspublishing.com. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the permission of the Darden School Foundation. Our goal is to publish materials of the highest quality, so please submit any errata to editorial@dardenbusinesspublishing.com.

Another way of saying this is that the aspects of auto insurance that are important to any given customer in one segment will also be important to other customers in that same segment. Furthermore, those aspects that are important to that customer will be different from what is important to a customer in a different segment. Figure 1 shows what the analysis in this example might look like:

Figure 1. Segmentation of Geico customers.

[Figure: two-dimensional map. Vertical axis: Premium Savings, from Not Important to Very Important. Horizontal axis: Agent, from Not Important to Very Important. Three segments are plotted: Segment A (49%), Segment B (36%), and Segment C (15%).]

Source: All figures created by case writer, unless otherwise noted.

The analysis shows three distinct segments. The largest group of Geico’s customers (Segment A, 49%) prefer savings on their premium, and they do not prefer having a neighborhood agent. Customers who belong to Segment B (about 36%) prefer having a neighborhood agent, and premium savings is not important to them. Some customers (Segment C, 15%) prefer both the savings on their premium and a neighborhood agent. This analysis shows that Geico can benefit by adding an offline channel (i.e., developing a network of neighborhood agents) to serve Segment B and also charge a higher premium to them for providing this convenience. Of course, the caveat is the increased competition with other insurance providers, such as Allstate and State Farm, who already provide this service.

Cluster Analysis

Cluster analysis is a class of statistical techniques that can be applied to data that exhibit natural groupings. Cluster analysis makes no distinction between dependent and independent variables. The entire set of interdependent relationships is examined. Cluster analysis sorts through the raw data on customers and groups them into clusters. A cluster is a group of relatively homogeneous customers. Customers who belong to the same cluster are similar to each other. They are also dissimilar to customers outside the cluster, particularly customers in other clusters. The primary input for cluster analysis is a measure of similarity between customers, such as correlation coefficients, distance measures, and association coefficients.

The following are the basic steps involved in cluster analysis:

1. Formulate the problem: select the variables you want to use as the basis for clustering.
2. Compute the distance between customers along the selected variables.
3. Apply the clustering procedure to the distance measures.
4. Decide on the number of clusters.
5. Map and interpret clusters and draw conclusions; illustrative techniques like perceptual maps are useful.

Distance Measures

The main input into any cluster analysis procedure is a measure of distance between individuals who are being clustered. The objective of a distance measure is to quantify the difference between two individuals on the variables you are using for the segmentation. A shorter (longer) distance between two individuals would imply they have similar (dissimilar) preferences on the segmentation variables. Distance between two individuals is obtained through a measure called Euclidean distance. If two individuals, Joe and Sam, are being clustered on the basis of n variables, then the Euclidean distance between Joe and Sam is represented as:

\[ \text{Euclidean distance} = \sqrt{(x_{\text{Joe},1} - x_{\text{Sam},1})^2 + \cdots + (x_{\text{Joe},n} - x_{\text{Sam},n})^2} \]

where:

x_{Joe,1} = the value of Joe along variable 1, and
x_{Sam,1} = the value of Sam along variable 1.

A pairwise distance matrix among individuals who are being clustered can be created using the Euclidean distance measure. Extending the preceding example, consider three individuals (Joe, Sam, and Sara) who are being clustered based on their preference for Premium Savings and a Neighborhood Agent. The importance ratings on these two attributes for Joe, Sam, and Sara are shown in Table 1.

Table 1. Sample data for cluster analysis.

                     Importance Score
Individual Name      Premium Savings    Neighborhood Agent
Joe                  4                  7
Sam                  3                  4
Sara                 5                  3


The Euclidean distance between Joe and Sam is obtained as:

\[ \text{Euclidean distance (Joe, Sam)} = \sqrt{(4 - 3)^2 + (7 - 4)^2} \approx 3.2 \]

The first term in this Euclidean distance measure is the squared difference between Joe and Sam on the importance score for Premium Savings, and the second term is the squared difference between them on the importance score for Neighborhood Agent. The Euclidean distances are then computed for each pairwise combination of the three individuals being clustered to obtain a pairwise distance matrix. The pairwise distance matrix for Joe, Sam, and Sara is shown in Table 2.

Table 2. Pairwise distance matrix.

          Joe    Sam    Sara
Joe       0      3.2    4.1
Sam              0      2.2
Sara                    0

The distance between Joe and Sam is 3.2, as shown in Table 2. This pairwise distance matrix is then provided as an input to a clustering algorithm.
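The pairwise distance matrix for the three individuals can be reproduced with a few lines of code. This is a minimal sketch using the ratings from Table 1; the variable and function names are illustrative assumptions, not part of the note.

```python
import math

# Importance ratings from Table 1: (Premium Savings, Neighborhood Agent)
ratings = {"Joe": (4, 7), "Sam": (3, 4), "Sara": (5, 3)}

def euclidean(p, q):
    """Square root of the summed squared differences across all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

names = list(ratings)
distances = {(a, b): euclidean(ratings[a], ratings[b]) for a in names for b in names}

# Print the full (symmetric) matrix, rounded to one decimal place
for a in names:
    print(a, [round(distances[(a, b)], 1) for b in names])
```

The matrix is symmetric with zeros on the diagonal, which is why only the upper triangle needs to be reported.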


K-Means Clustering Algorithm

K-means clustering belongs to the nonhierarchical class of clustering algorithms. It is one of the more popular algorithms used for clustering in practice because of its simplicity and speed. It is considered to be more robust to different types of variables, is more appropriate for large datasets that are common in marketing, and is less sensitive to some customers who are outliers (in other words, extremely different from others).

For K-means clustering, the user has to specify the number of clusters required before the clustering algorithm is started. The basic algorithm for K-means clustering is as follows:

Algorithm

1. Choose the number of clusters, k.
2. Generate k random points as cluster centroids.
3. Assign each point to the nearest cluster centroid.
4. Recompute the new cluster centroid.
5. Repeat the two previous steps until some convergence criterion is met. Usually the convergence criterion is that the assignment of customers to clusters has not changed over multiple iterations.
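The five steps above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the procedure of any particular software package: the function names are hypothetical, starting centroids are drawn from the observed points unless an explicit `init` is passed, a cluster's centroid is taken to be the per-dimension mean of its points, and the toy ratings are made up.

```python
import math
import random

def centroid(points):
    """Per-dimension arithmetic mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, k, init=None, seed=0, max_iter=100):
    # Step 2 (variant): start from explicit centroids or from k observed points
    centroids = list(init) if init else random.Random(seed).sample(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(p, centroids) for p in points]  # step 3
        if new_assignment == assignment:  # step 5: stop once no customer moves
            break
        assignment = new_assignment
        for i in range(k):  # step 4: recompute each centroid
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = centroid(members)
    return assignment, centroids

# Hypothetical ratings for six customers forming two obvious groups
toy = [(1, 1), (1, 2), (2, 1), (6, 7), (7, 6), (7, 7)]
labels, centers = k_means(toy, k=2, init=[(1, 1), (7, 7)])
print(labels, centers)

# Centroid of the three-person example cluster (Joe, Sam, Sara)
print(centroid([(4, 7), (3, 4), (5, 3)]))
```

Passing `init` keeps the example deterministic; with random starting points the algorithm can settle on different final clusters from run to run.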

A cluster centroid is simply the average of all the points in that cluster. Its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster. Consider Joe, Sam, and Sara in the previous example. Let’s represent them based on their importance ratings on Premium Savings and Neighborhood Agent as: Joe = (4, 7), Sam = (3, 4), Sara = (5, 3). If you assume that they belong to the same cluster, then the center for their cluster is obtained as:

Cluster centroid Z = (z1, z2) = ((4 + 3 + 5)/3, (7 + 4 + 3)/3) = (4, 4.67).

z1 is measured as the average of the ratings of Joe, Sam, and Sara on Premium Savings. Similarly, z2 is measured as the average of their ratings on Neighborhood Agent. Figure 2 provides a visual representation of K-means clustering.

Figure 2. Visual representation of K-means clustering.


Number of clusters
/

ta/
ta
2[a

/H

up

up
One of the main issues with K-means clustering is that it does not provide an estimate of the number of
02

clusters that exists in the data. The K-means clustering has to be repeated several times with different “Ks”
aG

aG
du

(or number of clusters) to determine the number of clusters that is appropriate for the data. A commonly
ta2

b.e

used method to determine the number of clusters is the elbow criterion.


sik

sik
up

t]is

The elbow criterion states that you should choose a number of clusters so that adding another cluster
adds little further information. The elbow is identified by plotting the ratio of the within cluster variance to
the between cluster variance against the number of clusters. The within cluster variance is an estimate of the average
of the variance in the variables used as a basis for segmentation (Importance Score ratings for Premium
Savings and Neighborhood Agent in the Geico example) among customers who belong to a particular cluster.
The between cluster variance is an estimate of the variance of the segmentation basis variables between
customers who belong to different segments. The objective of cluster analysis (as mentioned before) is to
minimize the within cluster variance and maximize the between cluster variance. Therefore, as the number of
clusters increases, the ratio of the within cluster variance to the between cluster variance keeps
decreasing.

But at some point, the marginal gain from adding an additional cluster will drop, giving an angle in the
graph (the elbow). In Figure 3, the elbow is indicated by the circle. The number of clusters chosen should
therefore be 3.
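
The elbow procedure can be sketched in Python as follows. Several things here are assumptions made for illustration and are not taken from the case: the toy data (three tight groups of five customers), the deterministic "farthest first" choice of starting seeds (used only so the example gives the same answer on every run), the use of mean squared distances as the variance measure, and the rule of picking the K where the curve bends most sharply. K = 1 is left out because with a single cluster the between cluster variance is zero and the ratio is undefined.

```python
from math import dist

def centroid(points):
    """Mean of each dimension over the points (the cluster center)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def farthest_first_seeds(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from its nearest chosen seed."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds

def k_means(points, k, max_iter=100):
    """Basic K-means iterations; returns a cluster label for every point."""
    centers = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels

def variance_ratio(points, labels):
    """Within cluster variation divided by between cluster variation."""
    n = len(points)
    grand = centroid(points)
    within = between = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = centroid(members)
        within += sum(dist(p, center) ** 2 for p in members)
        between += len(members) * dist(center, grand) ** 2
    return (within / n) / (between / n)

def elbow(ratios):
    """K with the sharpest bend: the drop before K minus the drop after K."""
    ks = sorted(ratios)
    bend = {k: (ratios[k - 1] - ratios[k]) - (ratios[k] - ratios[k + 1]) for k in ks[1:-1]}
    return max(bend, key=bend.get)

# Hypothetical ratings: three tight groups of five customers each
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
group_centers = [(2, 2), (8, 3), (5, 8)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

ratios = {k: variance_ratio(points, k_means(points, k)) for k in range(2, 7)}
best_k = elbow(ratios)
for k in sorted(ratios):
    print(k, round(ratios[k], 4))
print("elbow at K =", best_k)  # 3 for this toy data
```

In practice the elbow is usually read off the plot by eye, as in Figure 3; the numerical rule above is just one simple way to mimic that judgment.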

Figure 3. Elbow plot for determining number of clusters.

[Elbow Plot: ratio of within cluster to between cluster variance (vertical axis, 0 to 300) plotted against number of clusters (horizontal axis, 1 to 7).]

It should also be noted that the initial assignment of cluster seeds has a bearing on the final model
performance. Some common methods for ensuring the stability of the results obtained from K-means
clustering include:

• Running the algorithm multiple times with different starting values. When using random starting
  points, running the algorithm multiple times will ensure a different starting point each time.
• Splitting the data randomly into two halves and running the cluster analysis separately on each half.
  The results are robust and stable if the number of clusters and the size of different clusters are similar
  in both halves.
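
Both stability checks can be sketched in Python. Everything below is illustrative rather than taken from the case: the simulated data, the helper names (`k_means`, `best_of_restarts`, `split_half_sizes`), the choice of 20 restarts, and the rule of keeping the run with the smallest within cluster sum of squares are all assumptions.

```python
import random
from math import dist

def centroid(points):
    """Mean of each dimension over the points (the cluster center)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, k, rng, max_iter=100):
    """K-means from k randomly chosen starting points.
    Returns (labels, within cluster sum of squares)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    within_ss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, within_ss

def best_of_restarts(points, k, n_starts=20, seed=0):
    """Run K-means from several random starts and keep the tightest solution."""
    rng = random.Random(seed)
    runs = [k_means(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])

def split_half_sizes(points, k, seed=0):
    """Cluster two random halves separately; return sorted cluster sizes per half."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    sizes = []
    for part in (shuffled[:half], shuffled[half:]):
        labels, _ = best_of_restarts(part, k, seed=seed)
        sizes.append(sorted(labels.count(c) for c in range(k)))
    return sizes

# Simulated ratings (hypothetical): 30 customers around each of three centers
data_rng = random.Random(42)
true_centers = [(2, 2), (8, 3), (5, 8)]
points = [(data_rng.gauss(cx, 0.6), data_rng.gauss(cy, 0.6))
          for cx, cy in true_centers for _ in range(30)]

labels, within_ss = best_of_restarts(points, k=3)
first_half, second_half = split_half_sizes(points, k=3)
print("cluster sizes, full data:", sorted(labels.count(c) for c in range(3)))
print("cluster sizes, two halves:", first_half, second_half)
```

If the sorted cluster sizes in the two halves are close to each other (and roughly half the sizes found on the full data), the solution can be regarded as stable in the sense described above.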

Profiling Clusters

Once clusters are identified, describing them in terms of the variables used for clustering (or using
additional data such as demographics) helps to customize marketing strategy for each segment. This
process of describing the clusters is called profiling. Figure 1 is an example of such a process. A good deal of
cluster-analysis software also provides information on which cluster a customer belongs to. This information
can be used to calculate the means of the profiling variables for each cluster. In the Geico example, it is useful
to investigate whether the segments also differ with respect to demographic variables such as age and income.
In Table 3, consider the distribution of age and income for Segments A, B, and C as provided in Figure 1.

Table 3. Age and income distribution for segments.

Segment   Mean                  Range
          Age   Income ($)      Age     Income ($)
A         21    15,000          16–25   0–25,000
B         45    120,000         33–55   75,000–215,000
C         39    40,000          39–54   24,000–60,000

Mean represents the averages of age and income of customers belonging to a particular segment. Range
represents the minimum and maximum values of age and income for customers in a segment. Whereas the
mean is useful for identifying the central tendency of a segment, the range helps in evaluating whether the
segments overlap with regard to the profile variable.
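
A short Python sketch shows how such a profile could be computed and how the ranges can be checked for overlap. The individual customer records below are hypothetical; they were chosen only so that their means and ranges reproduce the Segment A and Segment B rows of Table 3. The overlap check uses the ranges exactly as printed in Table 3. The function names `profile` and `ranges_overlap` are illustrative.

```python
from itertools import combinations
from statistics import mean

def profile(customers, variables):
    """Mean and (min, max) range of each profiling variable, per segment."""
    by_segment = {}
    for row in customers:
        by_segment.setdefault(row["segment"], []).append(row)
    return {
        seg: {v: {"mean": mean(r[v] for r in rows),
                  "range": (min(r[v] for r in rows), max(r[v] for r in rows))}
              for v in variables}
        for seg, rows in by_segment.items()
    }

def ranges_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share a value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical customer records with the segment assigned by the cluster analysis
customers = [
    {"segment": "A", "age": 16, "income": 0},
    {"segment": "A", "age": 22, "income": 20_000},
    {"segment": "A", "age": 25, "income": 25_000},
    {"segment": "B", "age": 33, "income": 75_000},
    {"segment": "B", "age": 45, "income": 90_000},
    {"segment": "B", "age": 47, "income": 100_000},
    {"segment": "B", "age": 55, "income": 215_000},
]
profiles = profile(customers, ["age", "income"])
print(profiles["A"])
print(profiles["B"])

# Ranges as printed in Table 3
table3_ranges = {
    "age":    {"A": (16, 25), "B": (33, 55), "C": (39, 54)},
    "income": {"A": (0, 25_000), "B": (75_000, 215_000), "C": (24_000, 60_000)},
}
overlaps = {
    (var, s1, s2): ranges_overlap(segs[s1], segs[s2])
    for var, segs in table3_ranges.items()
    for s1, s2 in combinations(sorted(segs), 2)
}
for key, value in overlaps.items():
    print(key, value)
```

With the Table 3 ranges, Segments B and C overlap on age and Segments A and C overlap slightly on income, while Segment B's income range is separate from both of the others.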

From Table 3, you see that Segment A customers, who prefer high savings on their premium and do not
prefer having a neighborhood agent, tend to be younger and have low income. These could probably be
college students or recent graduates who are more comfortable with transacting online. Customers who
belong to Segment B, on the other hand, are older and have higher income levels. It would be interesting to
evaluate if these customers also tend to be married with kids. The security of having a neighborhood agent
who can help in case of an accident or emergency is very important to them, and they do not mind paying a
higher price for this sense of security. These customers may also not be comfortable transacting (or
providing personal information) online.

Finally, while Segment C customers are as old as Segment B customers, they tend to have lower incomes
and do not prefer to have a neighborhood agent (probably because of low disposable incomes). Identification
of the segments through these demographic characteristics enables a marketer to target as well as customize
communications to each segment. For example, if Geico decides to develop a network of neighborhood
agents, it can first focus on neighborhoods (identified through their zip codes) that match the profile of
Segment B customers.

Conclusion

Given a segmentation basis, the K-means clustering algorithm identifies clusters and the customers
that belong to each cluster. Management, however, has to carefully select the variables to use for
segmentation. Criteria frequently used for evaluating the effectiveness of a segmentation scheme include
identifiability, sustainability, accessibility, and actionability.¹ Identifiability refers to the extent to which managers can
recognize segments in the marketplace. In the Geico example, the profiling of customers allows you to
identify customer segments through their age and income information. PRIZM and ACORN are popular
databases that provide geodemographic information that can be used for segmentation as well as profiling.
The sustainability criterion is satisfied if the segments represent a large enough portion of the market to ensure
profitable customization of the marketing program. The extent to which managers can reach the identified
segments through their marketing campaigns is captured by the accessibility criterion. Finally, actionability refers
to whether customers in the segment and the marketing mix necessary to satisfy their needs are consistent
with the goals and core competencies of the firm. The success of any segmentation process therefore requires
managerial intuition and careful judgment.


1 For more details, refer to Wagner Kamakura and Michel Wedel, Market Segmentation: Conceptual and Methodological Foundations, 2nd ed. (Norwell, MA:
Kluwer Academic Publishers, 2000).

