
WORKSHOP: University of Michigan (2018)

Causal inference in online systems:


Methods, pitfalls and best
practices
From Prediction to Causation

Amit Sharma
Researcher, Microsoft Research India
amshar@microsoft.com
@amt_shrma

http://www.github.com/amit-sharma/causal-inference-tutorial
Session Objectives and Takeaways
• What is causal inference? Why should we care?
  • Most machine learning algorithms depend on correlations.
  • Correlations alone are a dangerous path to actionable insights.
• Learn how to formulate and estimate causal effects.
  • To evaluate the impact of online systems.
  • To make underlying algorithms more robust to changes in data.
• Apply causal inference methods to a practical problem.
  • Estimating the causal impact of a recommendation system.
Agenda
• 9am-11am: Introduction to causal inference
• 11:15am-12pm: Working with code
• 12-1:30pm: Lunch
• 1:30pm-2:30pm Advanced methods and a working
example
• 3-4pm: Case studies and further curiosities

3
We have lots of data!
www.tylervigen.com 4
Observed data can be confounded, or even
spurious.
www.tylervigen.com 5
y = βx + ε

Given data ⟨X, Y⟩:
Prediction: estimate ŷ.
Causation: estimate β̂.

Hofman, Sharma, and Watts (2017). Science, 355.6324
6
PART I:
Causality, what?

7
I.A What is causality?

8
A question that has attracted scholars for
centuries

Largely a philosophical pursuit for many centuries.


9
“More has been learned about causal
inference in the last few decades than the
sum total of everything that had been
learned about it in all prior recorded history”
--Gary King, Harvard University

10
A “causal revolution” is upon us.


--Judea Pearl, University of California Los Angeles

11
What are they talking about?

12
No one knows what causality means
Hume, 18th century

If you strike a match and it lights up, does the striking cause the lighting?
What if you repeat the experiment 100 times?
How do you know that striking always leads to light?
How is it different from regularity or predictability?
Does causality even exist?

13
Still, everyone aims to find causal effects
Empirical causal inference is pragmatic, best-effort.

It concerns the effect of actions that generalize to all reasonable contexts.
If I strike a match, would it light up (assuming that everything else stays the same)?

"A bag of tricks to produce knowledge!"
--Try the action multiple times
--Try controlling for the environment
--Somehow account for uncontrolled factors
14
A practical definition

Definition: X causes Y iff changing X leads to a change in Y, keeping everything else constant.

The causal effect is the magnitude by which Y is changed by a unit change in X.

This is called the "interventionist" interpretation of causality.

*Interventionist definition [http://plato.stanford.edu/entries/causation-mani/] 15
Bag of tricks => Powerful statistical frameworks

Potential Outcomes Bayesian Networks

16
I.B Why should we care?
We have increasing amounts of data and highly accurate
predictions. How is causal inference useful?

17
Predictive systems are everywhere
How do predictive systems work?
Aim: Predict future activity for a user.


We see data about their user profile and past activity.

E.g., for any user, we might see their age, gender, past
activity and their social network.
From data to prediction

[Figure: users grouped by number of friends, showing higher vs. lower activity]
Use these correlations to make a predictive model.
Future Activity -> f(number of friends, logins in past month)
From data to "actionable insights"

Number of friends can predict activity with high accuracy.
How do we increase the activity of users?
Would increasing the number of friends increase people's activity on our system?
Maybe, maybe not(!)
Different explanations are possible

How do we know
what causes what?
Decision: To increase activity, would it make sense to
launch a campaign to increase friends?
Another example: Search Ads

Search engines use ad targeting to show relevant ads.
The prediction model is based on the user's search query.
Search ads have the highest click-through rate (CTR) in online ads.
Are search ads really that effective?
Ad targeting was highly accurate.

Blake-Nosko-Tadelis (2014)
But search results point to the same website

Counterfactual question: Would I have reached Amazon.com anyway, without the ad?
Without reasoning about causality, we may overestimate the effectiveness of ads:
if x% of ads shown appear effective, fewer than x% actually are.
Okay, search ads have an explicit intent. Display ads should be fine?

Probably not.
There can be many hidden causes for an action, some of which may be hard to quantify.

Estimating the impact of ads

Toys R Us designs new ads.
Big jump in clicks on their ads compared to past campaigns. Were these ads more effective?
People buy more toys in December anyway.
It is misleading to compare ad campaigns with changing underlying demand.
So far, so good. Be mindful of hidden causes, or else we might overestimate causal effects.

(But) Ignoring hidden causes can also lead to completely wrong conclusions.
Example: Did a system change lead to better outcomes?
We have a current production algorithm and want to test if a new algorithm is better.
Say a feature that provides information or discounts for a financial product.

Algorithm A vs. Algorithm B?

Comparing old versus new algorithm
Two algorithms, A (production) and B (new), running on the system.
From system logs, collect data for 1000 sessions for each. Measure Success Rate (SR).

            Old Algorithm (A)   New Algorithm (B)
Total SR    50/1000 (5%)        54/1000 (5.4%)

Is the new algorithm better?
Change in SR by income of people
So let us look at SR separately for each group.

                    Old Algorithm (A)   New Algorithm (B)
Low-income users    10/400 (2.5%)       4/200 (2%)
High-income users   40/600 (6.6%)       50/800 (6.2%)
The Simpson's paradox

                              Old Algorithm (A)   New Algorithm (B)
CTR for Low-Activity users    10/400 (2.5%)       4/200 (2%)
CTR for High-Activity users   40/600 (6.6%)       50/800 (6.2%)
Total CTR                     50/1000 (5%)        54/1000 (5.4%)

Is Algorithm A better?
Simpson (1951)
Answer (as usual): Maybe, maybe not.

E.g., Algorithm A could have been shown at different times than B.
There could be other hidden causal variations.
Example: Simpson's paradox in Reddit

Average comment length decreases over time.
But for each yearly cohort of users, comment length increases over time.
Barbosa-Cosley-Sharma-Cesar (2016)

Making sense of such data can be too complex.
PART II:
Causal inference
basics

39
II.A In search of an intervention
and a counterfactual:
A historical tour of causal
inference

40
…And there was fire!

Action: Strike stones.


Outcome: Sparks? Fire?
Probably one of the oldest experiments in causal inference.
Intervention: Taking an action.
41
The idea of "controlling" while taking an action

Hooke discovered that pulling a spring causes it to extend in length in proportion to the force applied. [Hooke's law]

This only works if the other end is fixed, or controlled.
This is a common property of many physical experiments, leading to the notion of a "controlled" experiment.
Intervention: Taking an action while keeping other relevant factors constant.
42
What if you cannot intervene?
For centuries, controlled experiments have worked well for many physical experiments and do so even today.

However, they do not work in the messier life sciences or social sciences.

We needed another big idea for causality to branch out of physical experiments:
the idea of a "counterfactual".

43
1854: London was having a devastating
cholera outbreak 44
Enter John Snow. He found higher cholera deaths near a water pump, but this could have been merely correlational.
45
[Map: service areas of the Southwark & Vauxhall Water Company and the Lambeth Water Company]

New Idea: Two major water companies served London: one drawing water upstream and one downstream.
46
S&V (Southwark & Vauxhall) and Lambeth:
No difference in neighborhood, still an 8-fold increase in cholera with the downstream company.
47
INTERVENTION: acting in the real world.
COUNTERFACTUAL: we observe data from the real world, but no data from the counterfactual world.
48
Fisher experiment
A way to combine the ideas of an intervention and
counterfactual.
Q. : Estimate the effect of fertilizers.

Controlled experiment 49
Fisher experiment
A way to combine the ideas of an intervention and
counterfactual.
Q. : Estimate the effect of fertilizers.

Natural experiment 50
Fisher experiment
A way to combine the ideas of an intervention and
counterfactual.
Q. : Estimate the effect of fertilizers.

Randomized Controlled experiment 51


Ideal experiment

Naïve estimate: compare clicks to recommendations under the new algorithm with those of past users under the old algorithm.
Causal estimate: compare clicks for a user under the new algorithm with clicks for a cloned copy of the same user under the old algorithm.

Ideally, this requires creation of multiple worlds.
52
II.B How do we systematically
reason about and estimate the
relationship between effects
and their causes?

53
Two questions leading from Fisher's work
What if you have a complicated randomized experiment?
E.g. instead of plots, randomize at the level of farmers. But all farmers may not comply.

What if you could not randomize but had a lot of historical data?
E.g. yield and fertilizer data for thousands of plots.

54
Two questions leading from Fisher's work
1. Need a language for "complicated" questions.
Counterfactuals cannot be expressed by probability alone.
Try: "Given that the fertilizer was applied, what is the probability that the yield would have been greater had the fertilizer not been applied?"

→ Pearl's graphical model framework

2. Need an estimation framework that understands this language.
Methods for estimating a causal quantity.
→ Neyman-Rubin's potential outcome framework
56
Aside: Formulating causal inference problems

Causal inference: a principled basis for both experimental and non-experimental methods.

Such questions form the basis of almost all scientific inquiry.
E.g., they occur in medicine (drug trials, effect of a drug), social sciences (effect of a certain policy), and genetics (effect of genes on disease).

Frameworks:
• Causal graphical models [Pearl 2009]
• Potential Outcomes Framework [Imbens-Rubin 2016]
Graphical Models: Express causal relationships visually

ctr = f(alg, act, time)

Edges represent direct causes.
Directed paths represent indirect causes.

58
Graphical Models: Express causal relationships
as a Bayesian network

Markov condition: A node is independent of all other


non-descendants given its parents.
Leads to factorization of joint probability.

59
Graphical Models: Make assumptions explicit

The graph encodes all causal assumptions.


Assumptions are the nodes and edges that are
missing.
60
Example: Assumptions encoded in the graph
• The activity level of users affects which algorithm they are shown and their overall CTR.
• CTR is different at different times of day.
• Unobserved characteristics of a user determine when they visit the Store, which also affect their activity level, and in turn the algorithm they are shown.

61
Graphical Models: A language for intervention and counterfactual

Intervention on a node: acting to change the node exogenously, severing its ties to its parents.

P(CTR | do(Algorithm)), which is different from P(CTR | Algorithm).

62
Graphical Models: A language for intervention and counterfactual

Counterfactual: The recommendations were always shown. But what would have happened if they were not?

63
Graphical models: Provide a mechanistic way of identifying a causal effect

Appeals to the idea of controlling.
When we cannot control the environment, use conditioning.
64
Should we also restrict our comparison to people who come at the same times?

CTR does depend on the time of a user's visit.
But the algorithm assigned does not change based on time.
While CTR may be different at different times, any algorithm is equally likely to be shown at any point in time.

65
Tricky to find the correct variables to condition on.
Fortunately, graphical models make it precise.

"Backdoor" paths: look for (undirected) paths that point to both the treatment (Algorithm) and the outcome (CTR).

Pearl (2009) 66
Backdoor criterion: Condition on enough
variables to cover all backdoor paths

Identified Causal Effect:
P(CTR | do(Algorithm)) = Σ_activity P(CTR | Algorithm, Activity) P(Activity)

67
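For completeness, a short derivation sketch of this backdoor adjustment, written in LaTeX notation and assuming the graph used in this example (Activity is the only common cause of Algorithm and CTR). The second line uses the fact that intervening on the Algorithm does not change a user's Activity:

\begin{aligned}
P(\mathrm{CTR} \mid do(\mathrm{Algorithm}=a))
 &= \sum_{act} P(\mathrm{CTR} \mid \mathrm{Algorithm}=a, \mathrm{Activity}=act)\, P(\mathrm{Activity}=act \mid do(\mathrm{Algorithm}=a)) \\
 &= \sum_{act} P(\mathrm{CTR} \mid \mathrm{Algorithm}=a, \mathrm{Activity}=act)\, P(\mathrm{Activity}=act)
\end{aligned}

Every term on the right-hand side involves only observed quantities, which is what makes the effect estimable from data.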
Identified Causal Effect: the backdoor adjustment above.

For complicated graphs, do-calculus provides a set of rules to automate the identification process.

Wait, but how do we estimate this?

68
Potential Outcomes: Every variable has a counterfactual
Imagine each unit has two potential outcomes: one under treatment and one under control.
We need to estimate the difference between the real and the counterfactual world.

This formulation has led to some powerful methods for estimating causal effects.

It is equivalent to graphical models.

69
Potential Outcomes: Estimating an effect identified from the backdoor criterion
Causal effect: the average difference between the two potential outcomes, E[Y(1) − Y(0)].

Can estimate it by regressing the outcome on the treatment and the conditioning variables.
Valid if all effects are linear.

70
Unifying the two frameworks
Use graphical models and do-calculus for
• modeling the world
• identifying the causal effect

Use potential outcomes-based methods for
• estimating the causal effect

71
II.C Different ways to identify
and estimate causal effect

72
Running example: Estimating the effect of an algorithm

P(CTR | do(Algorithm))
73
Lookback: Need answers to "what if" questions

Counterfactual thinking*:
What would have happened if I had changed X?
E.g. What would the CTR have been had we not shifted to the new algorithm?

*Counterfactual theories of causation
http://plato.stanford.edu/entries/causation-counterfactual/
Ideal experiment

Naïve estimate: compare clicks to recommendations under the new algorithm with those of past users under the old algorithm.
Causal estimate: compare clicks for a user under the new algorithm with clicks for a cloned copy of the same user under the old algorithm.

Ideally, this requires creation of multiple worlds.
Methods for answering causal questions
(ordered from highest validity to greatest ease of use)

Randomization: A/B tests, multi-armed bandits
Natural experiments: regression discontinuity, instrumental variables
Conditioning: stratification, matching, propensity scores
1. Randomization to the rescue

Randomizing algorithm assignment: A/B test

We cannot clone users.
Next best alternative: randomly assign which users see the new algorithm's recommendations and which see the old algorithm's.
Randomization removes hidden variation

Causal estimate: compare clicks to recommendations for randomly assigned users of the new algorithm against randomly assigned users of the old algorithm.
Cost: Possibly risky, unethical

Say the new algorithm was really bad.
We can decrease the percentage of users who see the new algorithm, but how do we know this beforehand?
Such manual tweaks become even more inefficient when there are multiple algorithms to test.
Efficient randomization: Multi-armed bandits

Two goals:
1. Show the best
known algorithm to
most users.
2. Keep randomizing
to update
knowledge about
competing
algorithms.
Bandits: The right mix of explore and exploit

Most users see the current-best algorithm; the remaining users see a randomly chosen algorithm, and clicks to recommendations are recorded for both groups.
Algorithm: ɛ-greedy multi-armed bandits

Repeat:
(Explore) With low probability ɛ, choose an output item randomly.
(Exploit) Otherwise, show the current-best algorithm.

Use CTR results for the random output items to train new algorithms offline.
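A minimal Python sketch of this ɛ-greedy loop. The list of candidate algorithms and the reward function (returning 1 for a click, 0 otherwise) are placeholders for illustration, not part of the tutorial's code:

import random

def epsilon_greedy(algorithms, reward_fn, epsilon=0.1, rounds=10000):
    """Run an epsilon-greedy bandit over candidate recommendation algorithms.
    reward_fn(arm) should return 1 if the shown recommendation was clicked, else 0."""
    ctr_estimates = {a: 0.0 for a in algorithms}   # running mean reward per arm
    counts = {a: 0 for a in algorithms}            # number of times each arm was shown
    for _ in range(rounds):
        if random.random() < epsilon:
            arm = random.choice(algorithms)        # Explore: pick an arm at random
        else:
            arm = max(algorithms, key=lambda a: ctr_estimates[a])  # Exploit: current best
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the arm's estimated CTR
        ctr_estimates[arm] += (reward - ctr_estimates[arm]) / counts[arm]
    return ctr_estimates

The same loop structure underlies contextual bandits, where a current-best arm is kept per context rather than globally.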
Practical Example: Contextual bandits on Yahoo! News

Actions: different news articles to display.
A/B tests using all articles are inefficient.
Randomize the articles shown using an ɛ-greedy policy.
Better: use the context of the visit (user, browser, time, etc.) to maintain different current-best algorithms for different contexts.

Li-Chu-Langford-Schapire (2010) 84
Caveat: Not always feasible to randomize, or ensure that people fully comply

Randomization may be too expensive or involve ethical hazards.
There may not be perfect compliance with random assignment, e.g. a referral experiment for a subscription service like Netflix.

Even when feasible, randomization methods need a limited set of "good" alternatives to test.
• How do we identify a good set of algorithms or a good set of parameters?
• Common metrics like CTR will not be useful, because they might miss hidden causes.

We need causal inference "without intervention".
2. So how about naturally occurring experiments?

"Natural" experiments: exploit variation in observed data
We can exploit naturally occurring, close-to-random variation in data.
Since the data is not randomized, we need assumptions about the data-generating process.
If there is sufficient reason to believe the assumptions, we can estimate causal effects.

Dunning (2002), Rosenzweig-Wolpin (2000)
Example: Effect of Store recommendations
Suppose instead of comparing recommendation algorithms, we want to estimate the causal effect of showing any algorithmic recommendation.

This can be used to benchmark how much revenue a recommendation system brings, and to allocate resources accordingly
(and perhaps help analyze the tradeoff with users' privacy).
Exploiting arbitrary cutoffs to recommendations

Only 3 recommendations are shown to the user.
Assumption: closely-ranked not-shown apps are as relevant as shown apps.

Causal effect of being shown as a recommendation: compare the number of app installs for the 3rd ranked app (shown) with the 4th ranked app (not shown), for the same user.
Algorithm: Regression discontinuity
For any top-k recommendation list:
Using logs, identify apps that were similarly ranked but did not make it into the top-k shown apps.
Measure the difference in app installs between shown and not-shown apps for each user.

Imbens-Lemieux (2008)
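A minimal pandas sketch of this comparison. The rec_rank column matches the dataset description used later in the tutorial; the installs outcome column is illustrative, and the sketch assumes that ranks just below the cutoff are also logged:

import pandas as pd

def rd_estimate(logs: pd.DataFrame, k: int = 3) -> float:
    """Compare outcomes for apps just above and just below the top-k cutoff."""
    shown = logs[logs["rec_rank"] == k]          # last rank that was shown (e.g., 3rd app)
    not_shown = logs[logs["rec_rank"] == k + 1]  # first rank that was not shown (e.g., 4th app)
    # Difference in average outcome attributable to being shown as a recommendation
    return shown["installs"].mean() - not_shown["installs"].mean()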
Another natural experiment: Instrumental Variables
We can look at as-if random variation due to external events.
E.g. being featured on the Today show may lead to a sudden spike in installs for an app.
Such external shocks can be used to determine causal effects, such as the effect of showing recommendations.

Angrist-Pischke (2008)
Cont. example: Effect of store recommendations
How many new visits are caused by the recommender system?
Demand for App 1 is correlated with demand for App 2.
Users would most likely have visited App 2 even without recommendations.

On normal days, from traffic to App 1 and click-throughs from App 1 to App 2 alone, we cannot say much about the causal effect of recommendations shown on App 1.
External shock brings as-if random users to App 1

[Figure: a spike in visits to App 1, with a corresponding spike in click-throughs from App 1 to App 2]

If demand for App 2 remains constant, the additional views of App 2 would not have happened had these new users not visited App 1.

Sharma-Hofman-Watts (2015)
Exploiting sudden variation in traffic to App 1
To compute the causal CTR of visits to App 1 on visits to App 2:
• Compare the observed effect of the external event separately on visits to App 1 and on recommendation clicks to App 2.
• Causal click-through rate = (change in recommendation click-throughs to App 2) / (change in visits to App 1).
More generally…

Instrument (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U) affecting both X and Y.

As-if-random: Z is independent of U.
Exclusion: Z ⫫ Y | X, U (the instrument affects the outcome only through the cause).
97
Sources of such instruments: lotteries, weather, shocks, discontinuities, and other hard-to-find variations.
Examples: a change in access to digital services, a change in train stops in a city, a change in medicines at a hospital.

Dunning (2012)
Caveat: Natural experiments are hard to find
Estimates may not be generalizable to all products.
Critical assumptions may not be satisfied.

Suppose both sources of experimentation, controlled and natural, are ruled out.
Can we estimate causal effects with only observational data?
3. What can we conclude with
observed data?
Imagine a randomized experiment…

Compare clicks to recommendations for a random user shown the new algorithm against a random user shown the old algorithm.
Compare with a similar user instead of random

Causal estimate: compare clicks to recommendations for User 1 shown the new algorithm against a similar User 2 shown the old algorithm.
Continuing example: Effect of Algorithm on CTR
Does new Algorithm B increase CTR for recommendations on the Windows Store, compared to old Algorithm A?

1. Make assumptions about how the data was generated.
2. Create a graphical model representing those assumptions.
Backdoor criterion: Condition on enough
variables to cover all backdoor paths
Algorithm: Stratification
With observational data:
1. Assume a graphical model that explains how the
data was generated.
2. Choose variables to condition on using backdoor
criterion.
3. Stratify data into subsamples such that each
subsample has the same value of all conditioned
variables.
4. Evaluate the difference in outcome variable
separately within these strata.
5. (Optional) Aggregate over all data.
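A minimal pandas sketch of steps 3 to 5 above. The column names and the "A"/"B" treatment labels are illustrative, not the tutorial's exact schema:

import pandas as pd

def stratified_effect(df, treatment="algorithm", outcome="is_rec_visit", strata=("activity_level",)):
    """Steps 3-5: difference in mean outcome between the two treatment values
    within each stratum, aggregated with weights proportional to stratum size."""
    effects, weights = [], []
    for _, stratum in df.groupby(list(strata)):
        means = stratum.groupby(treatment)[outcome].mean()
        if {"A", "B"} <= set(means.index):       # both algorithms present in this stratum
            effects.append(means["B"] - means["A"])
            weights.append(len(stratum))
    # Weighted aggregate over all strata (step 5)
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)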
Stratification may be inefficient if there are multiple hidden causes
Stratification creates tiny strata when the data is high-dimensional, making it hard to obtain stable estimates.
E.g. activity data may be high-dimensional: a vector of purchases in each app category.

Key Idea: Instead of conditioning on all relevant attributes, condition on the likelihood of being assigned an algorithm.

Morgan-Winship (2014)
This was stratification…

User 1 (shown the new algorithm) is compared with a similar User 2 (shown the old algorithm) who is the same on all relevant attributes.
Instead, condition on propensity to treatment

User 1 is compared with a similar User 2 who was equally likely to be shown the new algorithm.
Continued example: Effect of Algorithm on CTR

Based on the backdoor criterion, we need to condition only on Activity.
Activity is multi-dimensional.
Estimate the likelihood of being shown the new algorithm using observed algorithm-user pairs.
Compare CTR between users with the same propensity score.
Algorithm: Propensity score matching
With observational data:
1. Assume a graphical model that explains how the data was
generated.
2. Choose variables to condition on using backdoor criterion.
3. Compute propensity score for each user based on conditioned
variables.
4. Match pairs of individuals with similar scores, where one of them
saw Old Algorithm and the other saw New Algorithm.
5. Compare the outcome variable within each such matched pair
and aggregate.

Morgan-Winship (2014)
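A minimal sketch of these steps with scikit-learn and pandas; the logistic-regression propensity model, the 1-nearest-neighbour matching scheme, and the column names are illustrative choices, not the exact procedure used for the Store data:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def psm_effect(df, covariates, treatment="is_new_algorithm", outcome="clicked"):
    """Estimate the treatment effect by matching each treated user to the
    control user with the closest propensity score."""
    X, t, y = df[covariates].values, df[treatment].values, df[outcome].values
    # Step 3: propensity score = P(treatment | conditioning variables)
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
    diffs = []
    for i in treated:
        # Step 4: nearest control unit by propensity score
        j = control[np.argmin(np.abs(ps[control] - ps[i]))]
        # Step 5: outcome difference within the matched pair
        diffs.append(y[i] - y[j])
    return float(np.mean(diffs))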
Example: Causal effect of a social news feed
Friends on a social network may Like similar items.
E.g. on Last.fm, friends of a user may like similar music to the user.
This may be due to influence, or simply due to homophily.

Causal question: Given only log data, how can we determine social influence due to the newsfeed, compared to homophily effects?
117
Example: Causal effect of a social newsfeed
Solution: Use matching based on past items liked by each user to create a control group of non-friends that are as similar to a user as her friends.

[Figure: a user's ego network of friends alongside a matched set of non-friends]

Sharma-Cosley (2015), Aral-Muchnik-Sundararajan (2009) 118


Caveat: Causal effect only if the assumed graphical model is correct
There might be unknown and unobserved causes that affect an algorithm's CTR.
E.g. early adopters, more tech-savvy users, or some other characteristic.
There might also be known but unobserved user features.
E.g. their age, or the context in which they use an online system.

At best, with only observational data, we can obtain strong hints of causality.
4. Many of these techniques can be combined

120
Remember, we are always looking for the ideal experiment with multiple worlds:
compare clicks to recommendations for a user under the new algorithm against a cloned copy of the same user under the old algorithm.

121
Example: Randomization + Instrumental Variable
Fisher example: You could not randomize over plots, but could randomize which farmers get fertilizers.

Algorithm example: You cannot remove recommendations at random, but could advertise a focal product to a random subset of people on the homepage.

122
Random assignment as an instrument

Random assignment (Z) → Cause (X) → Outcome (Y), with Unobserved Confounders (U) affecting both X and Y.
Exclusion: Z ⫫ Y | X, U.
123


Can use this variation to compute causal effect
An increase in Z can lead to a change in Y only through X.
So the change in Y is the product of the Z→X and X→Y arrows.
Compare the extent by which random assignment affects X versus Y.

Causal effect (X→Y) = (effect of Z on Y) / (effect of Z on X)

125
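A minimal Python sketch of this ratio (Wald) estimator; the variable names are illustrative:

import numpy as np

def wald_iv_estimate(z, x, y):
    """Causal effect of X on Y using instrument Z:
    (effect of Z on Y) / (effect of Z on X)."""
    z, x, y = map(np.asarray, (z, x, y))
    # Ratio of covariances: the Var(Z) terms in the two regression slopes cancel
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

The numerator is the reduced-form effect of Z on Y and the denominator is the first-stage effect of Z on X; dividing cancels everything except the X→Y arrow.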
II.D. Key takeaways
Three steps to causal inference
Model
Use a causal graphical model. Make your assumptions explicit, even if it is cumbersome.

Identify
Use the graph to identify the effect you are after, or check whether your desired identification strategy is valid (BEFORE looking at the data).

Estimate
Use any statistical method to estimate the identified effect.

127
Estimating causal effects: Best practices
Whenever possible, use randomization.
If the number of output items is low, consider using explore-exploit methods.

If randomization is not feasible, consider exploiting natural experiments.
It is better to consider multiple sources of natural experiments.

If natural experiments are hard to find, consider using conditioning methods.
Treat their results as strong hints of causality.
Causal inference is tricky
Correlations are seldom enough, and sometimes horribly misleading.

Always be skeptical of causal claims from any observational data.
More data does not automatically lead to better causal estimates.

http://tylervigen.com/spurious-correlations
III. Hands-on tutorial
Code and resources available at
http://www.github.com/amit-sharma/causal-inference-tutorial

Contact: amshar@microsoft.com, @amt_shrma

Prerequisites
Need Python, Jupyter notebook
Packages: numpy, pandas
1. Install them using:
pip install numpy pandas
pip install scikit-learn
(Optional)
pip install seaborn

2. Clone git repository

https://www.github.com/amit-sharma/causal-inference-tutorial
Example 1: Does having ice cream cause you
to swim?

133
Step 1: Model the world

Temperature

Ice Cream Swimming

134
Step 2: Identify the causal effect

Temperature

Ice Cream Swimming

P(Swimming | do(Ice Cream)) = Σ_temp P(Swimming | Ice Cream, Temperature = temp) P(Temperature = temp)  135
Step 3: Estimate the effect

import numpy as np
import pandas as pd
from sklearn import linear_model

def linear_causal_estimate(df):
    # Observed common cause(s): here the single column 'w0' (temperature)
    observed_common_causes = df[['w0']]
    # Reshape the treatment into a 2-d column vector
    treatment_2d = df["Treatment"].values.reshape(len(df["Treatment"]), -1)
    # Regress the outcome on the treatment plus the observed common causes
    features = np.concatenate((treatment_2d, observed_common_causes), axis=1)
    model = linear_model.LinearRegression()
    model.fit(features, df["Outcome"])
    coefficients = model.coef_
    # The coefficient on the treatment column is the (linear) causal estimate
    estimate = {'value': coefficients[0], 'intercept': model.intercept_}
    return estimate

estimate = linear_causal_estimate(df)
print("Causal Estimate is " + str(estimate["value"]))
136
Step 3: Estimate the effect

137
Example 2: Effect of recommendations
Study the effect of app store recommendation
system.
Using system logs,
Compare two recommendation algorithms.
Estimate the causal effect of recommendations.
I. Which of the two algorithms is better?
Situation: Two algorithms, A and B, were used to show app recommendations on the Store.

Data: System log data recording users' visits.

Causal question: Which algorithm leads to higher click-through rates?
Loading user-app visits data
Jupyter notebook

user_app_visits_A = pd.read_csv("../datasets/user_app_visits_A.csv")

user_app_visits_B = pd.read_csv("../datasets/user_app_visits_B.csv")
Dataset at a glance
Data description
user_id: Unique ID for user
activity_level: User’s activity level. Discrete (1:Lowest, 4:Highest)
product_id: Unique ID for an app
category: Category for an app (e.g. productivity, music, etc.)
is_rec_visit: Whether the app visit came through a
recommendation click-through.
rec_rank: Rank in the recommendation list (only top-3 apps
shown to user, -1 means that app was not in the recommendation
list)
What’s in the dataset?

> nrow(user_app_visits_A)
[1] 1,000,000
> length(unique(user_app_visits_A$user_id))
[1] 10,000
> length(unique(user_app_visits_A$product_id))
[1] 990
> length(unique(user_app_visits_A$category))
[1] 10
Causal assumptions
We ask the system designers/look at the source code
for the system.
Algorithm was selected based on activity level of
users.
Further, CTR depends on
• Activity level
• Time of day
• App category
Step 3: Estimate the effect of changing algorithm

Naïve estimate for comparing algorithms: Algorithm B appears to be better.

Step 1: Model the world
Step 2: Using the backdoor criterion, identify the correct variables to condition on
Step 3: Estimate the effect of changing algorithm
Stratified estimate for comparing algorithms
> stratified_by_activity_estimate(user_app_visits_A)
Source: local data frame [4 x 2]
activity_level stratified_estimate
1 1 0.1248852
2 2 0.1750483
3 3 0.2266394
4 4 0.2763522
> stratified_by_activity_estimate(user_app_visits_B)
Source: local data frame [4 x 2]
activity_level stratified_estimate
1 1 0.1253469
2 2 0.1753933
3 3 0.2257211
4 4 0.2749867
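For readers following along in Python rather than R, a rough pandas analogue of the stratified estimate above, assuming is_rec_visit is the click-through indicator as in the dataset description:

import pandas as pd

user_app_visits_A = pd.read_csv("../datasets/user_app_visits_A.csv")
# CTR within each activity-level stratum: fraction of visits that came via a recommendation click
print(user_app_visits_A.groupby("activity_level")["is_rec_visit"].mean())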
If we had conditioned on category…

> stratified_by_category_estimate(user_app_visits_A)
Source: local data frame [10 x 2]
category stratified_estimate
1 1 0.1758294
2 2 0.2276829
3 3 0.2763157
4 4 0.1239860
5 5 0.1767163
… … …
> stratified_by_category_estimate(user_app_visits_B)
Source: local data frame [10 x 2]
category stratified_estimate
1 1 0.2002127
2 2 0.2517528
3 3 0.3021371
4 4 0.1503150
5 5 0.1999519
… … …
I. Which of the two algorithms is better?
The two algorithms lead to roughly the same CTR.
Answer: Both are equally effective.

Still, the CTR estimate is likely an over-estimate of the causal effect of recommendations, as people might have visited some of the apps anyway.
How do we estimate the causal effect?
II. What is the causal effect of the recommendation system?
Situation: Two algorithms, A and B, were used to show app recommendations on the Store.

Data: System logs containing user-app visits.

Causal question: How many apps would users have visited had no recommendations been shown?

Step 1, Model: Graphical model to estimate causal effect

We observe total recommendation click-throughs.
But some of them may be due to correlated demand.
Step 2, Identify: Using regression discontinuity analysis

We know that the Store only shows the top-3 recommendations.

Comparing the number of visits to the 4th ranked app (not shown to the user) with the 3rd ranked app can be used to estimate the effect of showing a recommendation.
(Assuming the 3rd and 4th ranked apps are equally relevant to a user.)
Step 3, Estimate: Discontinuity estimate for recommendations

> naive_observational_estimate(user_app_visits_A)
naive_estimate
[1] 0.200768

> ranking_discontinuity_estimate(user_app_visits_A)
discontinuity_estimate
[1] 0.121362

About 40% of app visits coming from recommendation click-throughs are not causal: (0.200768 − 0.121362) / 0.200768 ≈ 0.40.
They could have happened even without the recommendation system.
IV. Advanced methods

155
Advanced methods

1. Inverse propensity weighting


2. Doubly robust estimation

Refutation: 4th step in causal analysis


(Model, Identify, Estimate, Refute)

156
IV.A. Inverse Propensity
Weighting

157
Weighting to balance treatment and control
Suppose the people who were shown a recommendation were already very likely to find the app themselves.

If we had a score for their likelihood of finding it themselves, we could:
downweight the people who saw recommendations and had a higher likelihood;
give higher weight to the people who had a lower likelihood of finding it anyway.

It turns out this creates a balanced dataset, assuming the likelihood is accurate. 158
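A minimal Python sketch of inverse propensity weighting; the propensity scores are assumed to come from a separate model (e.g. a logistic regression on the conditioning variables), and the variable names are illustrative:

import numpy as np

def ipw_estimate(y, t, propensity):
    """IPW estimate of E[Y(1)] - E[Y(0)].
    y: outcomes, t: binary treatment indicator, propensity: P(t = 1 | covariates)."""
    y, t, p = map(np.asarray, (y, t, propensity))
    treated_mean = np.mean(t * y / p)              # upweight rarely-treated units
    control_mean = np.mean((1 - t) * y / (1 - p))  # upweight rarely-untreated units
    return treated_mean - control_mean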
Estimation

Causal estimate: compare clicks to recommendations for users shown the new algorithm against control users shown the old algorithm, reweighted by their inverse propensity.

159
Caveat
What if the likelihood/propensity is too low?
This leads to high-variance estimates.
A single value can derail the estimate.

Can we combine the benefits of prediction and causal inference?

160
IV.B. Doubly robust estimation

161
Prediction + IPW
Use IPW as the unbiased estimate.

To reduce variance, add a term based on the output of a predictive ML model that tries to predict the outcome variable.

Combine them in a special way to obtain a key property:
the estimate is unbiased if either the propensity is correctly estimated, or the prediction model is accurate.

162
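A minimal Python sketch of a doubly robust (AIPW-style) estimator combining the two pieces; mu1 and mu0 stand for outcome predictions under treatment and control from any ML model, and are assumptions of this sketch:

import numpy as np

def doubly_robust_estimate(y, t, propensity, mu1, mu0):
    """AIPW estimate of E[Y(1)] - E[Y(0)].
    mu1, mu0: predicted outcomes under treatment / control from an ML model.
    Unbiased if either the propensity model or the outcome model is correct."""
    y, t, p, mu1, mu0 = map(np.asarray, (y, t, propensity, mu1, mu0))
    est1 = np.mean(mu1 + t * (y - mu1) / p)            # prediction plus IPW correction
    est0 = np.mean(mu0 + (1 - t) * (y - mu0) / (1 - p))
    return est1 - est0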
Simple example: Estimating average height of
students in a class

163
IV.C. Refuting the obtained
causal estimate

164
Causal estimate is a result of your assumptions
The "causal" part does not come from the data.
It comes from your assumptions that lead to identification.

The data is simply used for statistical estimation.

It is critical to verify your assumptions. How?

165
Sanity check 1: Add random variables to your
model
Can add randomly drawn common causes, or
random instruments.

Rerun your analysis.

Does the causal estimate change?

166
Sanity check 2: Replace treatment by a placebo (A/A test)
Randomize or permute the treatment.
Or match treatment and control to have the same distribution of the treatment variable.

Rerun your analysis.
Does the causal estimate change?

167
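A minimal Python sketch of this placebo check: permute the treatment column and re-run whatever estimator was used for the main analysis; the placebo estimates should hover around zero. The estimator argument is a placeholder for your own analysis function:

import numpy as np

def placebo_refute(df, estimator, treatment="algorithm", n_runs=20, seed=0):
    """Re-estimate the effect after randomly permuting the treatment column.
    If the original estimate survives this check, something is wrong."""
    rng = np.random.default_rng(seed)
    placebo_estimates = []
    for _ in range(n_runs):
        shuffled = df.copy()
        shuffled[treatment] = rng.permutation(shuffled[treatment].values)
        placebo_estimates.append(estimator(shuffled))
    return float(np.mean(placebo_estimates))  # should be approximately zero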
Sanity Check 3: Divide data into subsets
(cross-validation)
Create subsets of your data.

Rerun your analysis.

Does the causal estimate change?

168
Sensitivity Analysis: How bad would a confounder have to be for your estimate to reverse?
Use simulation to add the effect of unknown confounders.
Use domain knowledge to guide reasonable values for the simulation.
Make comparisons to other known estimates.

169
Does smoking cause lung cancer?

Demographics      Genes

Smoking → Lung Cancer

Cornfield (1959) showed that the effect of Genes would have to be 8 times that of any known confounder for the effect to go to zero.

170
Observational causal inference: Best practices
Always follow the four steps: Model, Identify, Estimate, Refute.
Refute is the most important step.

Aim for simplicity.
If your analysis is too complicated, it is most likely wrong.

Try at least two methods with different assumptions.
Higher confidence in the estimate if both methods agree.

171
V. Hands-on exercise

172
Example 3: Estimating causal effect with many
auxiliary variables

Use multiple methods


Add Step 4: Try to refute your own estimate.

173
VI. Case studies on online
social systems

174
Studies
1. Distinguishing between homophily and influence
2. Measuring peer effects
3. Is running contagious?
4. Estimating counterfactuals in an online ad system
5. Measuring effect of marketing campaigns

175
Distinguishing homophily and contagion is
impossible

176
Measuring peer effects

Use randomized experiments to validate.

179
Is running contagious?
Use weather as an instrument: Does rain in NYC cause
your friends in LA to also run less?

181
Estimating causal effects in ad systems

182
Estimating effect of marketing campaigns
Use synthetic controls.
Construct them by using Bayesian modeling.

An intuitive method when no controls are present in the data.

185
VII. Further curiosity

186
1. Dealing with social networks
Networks mess up causal inference, because individual units are no longer independent.

The key assumption of independence of counterfactuals breaks.

If I treat a person, she might spread the treatment (partially) to her friends, so the assignment to her friends is not completely randomized.
Think of a video sharing feature.

187
1. Dealing with social networks
Networks mess up causal inference, because individual units are no longer independent.

Possible solution: New Zealand.
Or better clustering of the network (Ugander et al.).
Or really complicated statistics (Eckles et al.).

188
2. Causal inference and machine learning

Causal inference → robust prediction

(Supervised) ML: predicted value under the training distribution P(X, y).
Causal inference: predicted value under the counterfactual distribution P'(X, y).
2. Causal inference and machine learning

Machine learning can use causal inference methods for robust, generalizable prediction.
Causal inference can use ML algorithms to better model the non-linear effect of confounders, or to find low-dimensional representations.

In general, be wary of methods that have not been empirically tested, especially ones that you do not understand.
3. Automating causal inference
Causal inference is tricky; it requires patience and ingenuity.

But it also has a lot of mechanical and statistical parts.

To what extent can the tasks of modeling, identification, estimation and refutation be automated like ML?
-- Finding the best method based on data
-- Mining natural experiments from data

This is a really, really nascent field.

191
Causal inference: Best practices
Always follow the four steps: Model, Identify, Estimate, Refute.
Refute is the most important step.

Aim for simplicity.
If your analysis is too complicated, it is most likely wrong.

Remember the order for validity: Randomization, Natural experiments, Conditioning.
Consider observational methods as strong hints.
Thank you!

Amit Sharma
Researcher, Microsoft Research India

amshar@microsoft.com
@amt_shrma
