Causal Data Science for Business Analytics
Hamburg University of Technology
Monday, 24. June 2024
observational data
Assumption: “Conditional Exchangeability / Unconfoundedness / Ignorability / Independence”.
Formally, \((Y(1), Y(0)) \perp\!\!\!\perp T \, | \, \mathbf{X}\).
Assumption: “Positivity / Overlap / Common Support”.
For all values of covariates \(x\) present in the population of interest (i.e. \(x\) such that \(P(X=x) > 0\)), we have \(0 < P(T=1|X=x) < 1\).
Theorem: “Identification of the ATE”:
\(\tau_{\text{ATE}} = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] = \mathbb{E}_X[\mathbb{E}[Y_i|T_i=1, X_i] - \mathbb{E}[Y_i|T_i=0, X_i]]\)
Proof:
\[ \begin{align*} \tau_{\text{ATE}} = \mathbb{E}[\tau_i] &= \mathbb{E}[Y_i(1) - Y_i(0)] \\ &= \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] \\ &\text{(linearity of expectation)} \\ &= \mathbb{E}_X [\mathbb{E}[Y_i(1) \mid X_i]] - \mathbb{E}_X [\mathbb{E}[Y_i(0) \mid X_i]] \\ &\text{(law of iterated expectations)} \\ &= \mathbb{E}_X [\mathbb{E}[Y_i(1) \mid T_i = 1, X_i]] - \mathbb{E}_X [\mathbb{E}[Y_i(0) \mid T_i = 0, X_i]] \\ &\text{(conditional ignorability and positivity)} \\ &= \mathbb{E}_X [\mathbb{E}[Y_i \mid T_i = 1, X_i]] - \mathbb{E}_X [\mathbb{E}[Y_i \mid T_i = 0, X_i]] \\ &\text{(consistency)} \end{align*} \]
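As a toy illustration of this identification result, suppose \(X\) is a single binary covariate (all numbers below are made up):
\[ \tau_{\text{ATE}} = \sum_{x \in \{0,1\}} P(X = x)\left(\mathbb{E}[Y \mid T=1, X=x] - \mathbb{E}[Y \mid T=0, X=x]\right) \]
With \(P(X=1)=0.4\), a conditional contrast of 2 for \(X=0\) and of 5 for \(X=1\), we get \(\tau_{\text{ATE}} = 0.6 \cdot 2 + 0.4 \cdot 5 = 3.2\). A naive difference of group means would instead weight the strata by the treatment-specific covariate distributions and is generally biased under confounding.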
The most direct approach is to run a linear regression of the outcome on the treatment, conditioning on covariates:
\(Y_i = \beta_0 + \beta_T T_i + \beta_{X_1} X_{i1} + \dots + \beta_{X_K} X_{iK} + \epsilon_i\)
\(Y_i = \beta_0 + \beta_T T_i + \mathbf{\beta_{X}' X_i} + \epsilon_i\)
\(\mathbb{E}[Y_i | T_i, \mathbf{X_i}] = \beta_0 + \beta_T T_i + \mathbf{\beta_{X}' X_i}\)
three-stage procedure
: Run a regression of the form \(Y_i = \beta_{Y0} + \mathbf{\beta_{Y \sim X}' X_i} + \epsilon_{Y_i \sim X_i}\) and extract the estimated residuals \(\hat\epsilon_{Y_i \sim X_i}\).
Run a regression of the form \(T_i = \beta_{T0} + \mathbf{\beta_{T \sim X}' X_i} + \epsilon_{T_i \sim X_i}\) and extract the estimated residuals \(\hat\epsilon_{T_i \sim X_i}\).
Run a residual-on-residual regression of the form \(\hat\epsilon_{Y_i \sim X_i} = \beta_T \hat\epsilon_{T_i \sim X_i} + \epsilon_i\) (no constant).
The resulting estimate \(\hat\beta_T\) is numerically identical
to the estimate we would get from the full OLS model.
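This equivalence (the Frisch–Waugh–Lovell theorem) is easy to verify numerically. Below is a small Python sketch on simulated data; the data-generating process and all variable names are made up for illustration:

```python
import numpy as np

# Simulated data (illustrative data-generating process)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))                           # two covariates
T = 0.5 * X[:, 0] + rng.normal(size=n)                # treatment depends on X
Y = 2.0 * T + X[:, 0] - X[:, 1] + rng.normal(size=n)  # true effect of T is 2

def ols(y, Z):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

ones = np.ones((n, 1))
beta_full = ols(Y, np.column_stack([ones, T, X]))[1]  # coefficient on T, full model

# Three-stage procedure: residualize Y and T on (1, X), then regress
# the Y-residuals on the T-residuals without a constant
ZX = np.column_stack([ones, X])
resY = Y - ZX @ ols(Y, ZX)
resT = T - ZX @ ols(T, ZX)
beta_fwl = (resT @ resY) / (resT @ resT)

print(beta_full, beta_fwl)  # numerically identical up to floating point
```

Both estimates agree to machine precision, whereas omitting the residualization step for \(T\) would reintroduce confounding bias.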
effect heterogeneity
: e.g. \(T_i \cdot X_{i1}\).
library(Matching) # load Matching package for the data
library(Jmisc) # load Jmisc package with demean function
library(lmtest) # load lmtest package
library(sandwich) # load sandwich package
library(modelsummary) # load modelsummary package
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
DXdemeaned = T * demean(X) # interaction of T and demeaned X
ols = lm(Y ~ T + X + DXdemeaned) # run OLS regression
modelsummary(ols, vcov = sandwich::vcovHC, estimate = "est = {estimate} (se = {std.error}, t = {statistic}){stars}", statistic = "p = {p.value}, CI = [{conf.low}, {conf.high}]", gof_map = c("r.squared"), coef_omit = "X")
|             | (1) |
|---|---|
| (Intercept) | est = 7161.163 (se = 3975.959, t = 1.801)+ |
|             | p = 0.072, CI = [-653.935, 14976.260] |
| T           | est = 1583.468 (se = 711.171, t = 2.227)* |
|             | p = 0.027, CI = [185.600, 2981.336] |
| R2          | 0.104 |
Idea:
find and match treated and nontreated observations with similar (or ideally identical) covariate values.
Regression-based correction for the bias
which comes from not finding fully comparable matches for a reference observation:
library(Matching) # load Matching package for the data
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
pairmatching = Match(Y=Y, Tr=T, X=X) # pair matching
summary(pairmatching) # matching output
Estimate... 1686.1
AI SE...... 866.4
T-stat..... 1.9461
p.val...... 0.051642
Original number of observations.............. 445
Original number of treated obs............... 185
Matched number of observations............... 185
Matched number of observations (unweighted). 267
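The logic behind this pair-matching estimate (for each treated unit, find the closest control in covariate space and average the outcome differences) can be sketched in Python; the simulated data below are illustrative, not the Lalonde sample:

```python
import numpy as np

# Simulated data (illustrative; not the Lalonde sample)
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))                         # covariates
T = (X[:, 0] + rng.normal(size=n) > 0).astype(int)  # confounded treatment
Y = 1.5 * T + X[:, 0] + rng.normal(size=n)          # true effect = 1.5

treated = np.where(T == 1)[0]
control = np.where(T == 0)[0]

def att_1nn(Y, X, treated, control):
    """ATT by 1-nearest-neighbor covariate matching with replacement."""
    diffs = []
    for i in treated:
        d = np.sum((X[control] - X[i]) ** 2, axis=1)  # squared Euclidean distances
        j = control[np.argmin(d)]                     # closest control unit
        diffs.append(Y[i] - Y[j])
    return float(np.mean(diffs))

print(att_1nn(Y, X, treated, control))  # should be close to the true effect of 1.5
```

The Matching package additionally weights distances by the inverse covariate variances and provides the Abadie–Imbens (AI) standard errors shown above; this sketch only captures the core matching idea.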
pairmatching = Match(Y=Y, Tr=T, X=X, M=3, BiasAdjust = TRUE, estimand = "ATE", caliper = NULL) # 1:3 matching with bias adjustment for the ATE
summary(pairmatching) # matching output
Estimate... 1396.3
AI SE...... 712.09
T-stat..... 1.9609
p.val...... 0.049893
Original number of observations.............. 445
Original number of treated obs............... 185
Matched number of observations............... 445
Matched number of observations (unweighted). 1525
Curse of dimensionality
as caveat of covariate matching:
Propensity Score
:
Theorem 4.1: “Propensity Score”.
Unconfoundedness given \(\mathbf{X}\) implies unconfoundedness given the propensity score \(PS(\mathbf{X})\).
Formally, \((Y(1), Y(0)) \perp\!\!\!\perp T \, | \, \mathbf{X} \implies (Y(1), Y(0)) \perp\!\!\!\perp T \, | \, PS(\mathbf{X})\).
treatment assignment mechanism
: \(P(T_i = 1 | Y_i(1), Y_i(0), PS(\mathbf{X_i})) = P(T_i = 1 | PS(\mathbf{X_i}))\)
Proof:
\[ \begin{align*} P(T_i = 1 | Y_i(1), Y_i(0), PS(\mathbf{X_i})) &= \mathbb{E}[T_i | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(T is binary: turn probability into expectation)} \\ &= \mathbb{E}[\mathbb{E}[T_i | Y_i(1), Y_i(0), PS(\mathbf{X_i}), \mathbf{X_i}] | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(law of iterated expectations: introduce X)} \\ &= \mathbb{E}[\mathbb{E}[T_i | Y_i(1), Y_i(0), \mathbf{X_i}] | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(remove PS(X) from inner expectation, because it is redundant given X)} \\ &= \mathbb{E}[\mathbb{E}[T_i | \mathbf{X_i}] | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(apply original unconfoundedness and eliminate Y_i(t))} \\ &= \mathbb{E}[P(T_i = 1 | \mathbf{X_i}) | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(T is binary: turn expectation into probability)} \\ &= \mathbb{E}[PS(\mathbf{X_i}) | Y_i(1), Y_i(0), PS(\mathbf{X_i})] \\ &\text{(conditioning on itself makes additional info from Y_i(t) superfluous)} \\ &= PS(\mathbf{X_i}) = P(T_i = 1 | \mathbf{X_i}) \end{align*} \]
library(Matching) # load Matching package for the data
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
ps = glm(T ~ X, family=binomial)$fitted # estimate the propensity score by logit
psmatching=Match(Y=Y, Tr=T, X=ps, BiasAdjust = TRUE, estimand = "ATE") # propensity score matching
summary(psmatching) # matching output
Estimate... 1590.7
AI SE...... 700.77
T-stat..... 2.2699
p.val...... 0.023211
Original number of observations.............. 445
Original number of treated obs............... 185
Matched number of observations............... 445
Matched number of observations (unweighted). 709
library(Matching) # load Matching package
library(boot) # load boot package
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T=treat # define treatment (training)
Y=re78 # define outcome
X=cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
bs=function(data, indices) { # defines function bs for bootstrapping
dat=data[indices,] # bootstrap sample according to indices
ps=glm(dat[,2]~as.matrix(dat[,3:ncol(dat)]),family=binomial)$fitted # propensity score
effect=Match(Y=dat[,1], Tr=dat[,2], X=ps, BiasAdjust = TRUE, estimand = "ATE")$est # ATE
return(effect) # returns the estimated ATE
} # closes the function bs
bootdata=data.frame(Y,T,X) # data frame for bootstrap procedure
set.seed(1) # set seed
results = boot(data=bootdata, statistic=bs, R=999) # 999 bootstrap estimations
results # displays the results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = bootdata, statistic = bs, R = 999)
Bootstrap Statistics :
original bias std. error
t1* 1590.71 41.8785 771.4947
2*pnorm(-abs(results$t0/sd(results$t))) # p-value based on the bootstrap SE
           [,1]
[1,] 0.03922158
pseudo-population
where the treatment assignment is independent of the observed covariates:
Theorem 4.2: “Inverse Propensity Score Weighting”.
Given \((Y(1), Y(0)) \perp\!\!\!\perp T \, | \, \mathbf{X}\) (conditional unconfoundedness) and given \(0 < PS(\mathbf{X_i}) < 1\) for all \(i\) (positivity), then:
\[\tau_{\text{ATE}} = \mathbb{E}[Y_i(1) - Y_i(0)] = \mathbb{E}[\frac{T_iY_i}{PS(\mathbf{X_i})}] - \mathbb{E}[\frac{(1-T_i)Y_i}{1-PS(\mathbf{X_i})}]\]
Proof
for \(\mathbb{E}[Y_i(1)]\):
\[ \begin{align*} \mathbb{E}\left[\frac{T_iY_i}{PS(\mathbf{X_i})}\right] = \mathbb{E}\left[\frac{T_iY_i(1)}{PS(\mathbf{X_i})}\right] &= \mathbb{E}\left[\mathbb{E}\left[\frac{T_iY_i(1)}{PS(\mathbf{X_i})} \bigg| \mathbf{X_i}\right]\right] \\ &\text{(law of iterated expectations: introduce X)} \\ &= \mathbb{E}\left[\frac{1}{PS(\mathbf{X_i})} \mathbb{E}[T_iY_i(1) | \mathbf{X_i}]\right] \\ &\text{(1/PS(X) is non-random given X, so it can be moved outside)} \\ &= \mathbb{E}\left[\frac{1}{PS(\mathbf{X_i})} \mathbb{E}[T_i | \mathbf{X_i}] \, \mathbb{E}[Y_i(1) | \mathbf{X_i}]\right] \\ &\text{(conditional ignorability between T and Y(1) allows separating the expectations)} \\ &= \mathbb{E}\left[\frac{1}{PS(\mathbf{X_i})} PS(\mathbf{X_i}) \mathbb{E}[Y_i(1) | \mathbf{X_i}]\right] \\ &\text{(E[T|X] = PS(X) by definition, terms cancel out)} \\ &= \mathbb{E}\left[\mathbb{E}[Y_i(1) | \mathbf{X_i}]\right] = \mathbb{E}[Y_i(1)] \\ &\text{(law of iterated expectations)} \\ \end{align*} \]
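A quick Python simulation (data-generating process made up for illustration) shows that the weighted means in Theorem 4.2 recover the ATE when the propensity score is known:

```python
import numpy as np

# Simulated data with a known propensity score (illustrative)
rng = np.random.default_rng(2)
n = 100_000
X = rng.uniform(size=n)                      # single confounder
ps = 0.2 + 0.6 * X                           # true propensity score, bounded away from 0 and 1
T = rng.binomial(1, ps)
Y = 2.0 * T + 3.0 * X + rng.normal(size=n)   # true ATE = 2

# IPW estimator from Theorem 4.2
tau_ipw = np.mean(T * Y / ps) - np.mean((1 - T) * Y / (1 - ps))
print(tau_ipw)  # should be close to the true ATE of 2
```

Note that a naive mean comparison, `Y[T == 1].mean() - Y[T == 0].mean()`, would be biased upward here because units with high \(X\) (and thus high \(Y\)) are more likely to be treated.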
Empirical Likelihood (EL)
estimator:
library(causalweight) # load causalweight package
library(Matching) # load Matching package for the data
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
set.seed(1) # set seed
ipw=treatweight(y=Y,d=T,x=X, boot=999) # run IPW with 999 bootstraps
ipw$effect # show ATE
ipw$se # show standard error
ipw$pval # show p-value
[1] 1640.468
[1] 678.4239
[1] 0.01560364
library(Matching) # load Matching package
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
library(CBPS) # load CBPS package
library(lmtest) # load lmtest package
library(sandwich) # load sandwich package
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
cbps = CBPS(T ~ X, ATT = 0) # covariate balancing for ATE estimation
results = lm(Y ~ T, weights = cbps$weights) # weighted regression
coeftest(results, vcov = vcovHC) # show results
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4582.71 353.65 12.9582 < 2e-16 ***
T 1699.74 705.11 2.4106 0.01633 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Combining conditional outcome regression with inverse probability weighting yields doubly robust methods, such as the
Augmented Inverse Propensity Score Weighting (AIPW)
estimator.
Theorem 4.3: “Doubly Robust Estimator”.
Given \((Y(1), Y(0)) \perp\!\!\!\perp T \, | \, \mathbf{X}\) (conditional unconfoundedness) and given \(0 < PS(\mathbf{X_i}) < 1\) for all \(i\) (positivity), then:
If either \(PS(\mathbf{X_i, \beta_{PS}}) = PS(\mathbf{X_i})\) or \(\mu_1(\mathbf{X_i,\beta_1}) = \mu_1(\mathbf{X_i})\), then \(\tilde{\mu}_{1}^{\text{dr}} = \mathbb{E}[Y_i(1)]\)
If either \(PS(\mathbf{X_i, \beta_{PS}}) = PS(\mathbf{X_i})\) or \(\mu_0(\mathbf{X_i,\beta_0}) = \mu_0(\mathbf{X_i})\), then \(\tilde{\mu}_{0}^{\text{dr}} = \mathbb{E}[Y_i(0)]\)
If either \(PS(\mathbf{X_i, \beta_{PS}}) = PS(\mathbf{X_i})\) or \(\{\mu_1(\mathbf{X_i,\beta_1}) = \mu_1(\mathbf{X_i}), \mu_0(\mathbf{X_i,\beta_0}) = \mu_0(\mathbf{X_i})\}\), then \(\tilde{\mu}_{1}^{\text{dr}} - \tilde{\mu}_{0}^{\text{dr}} = \tau_{\text{ATE}}^{\text{dr}}\)
Proof
for \(\tilde{\mu}_{1}^{\text{dr}} = \mathbb{E}[Y_i(1)]\):
\[ \begin{align*} \tilde{\mu}_{1}^{\text{dr}} - \mathbb{E}[Y_i(1)] &= \mathbb{E}\left[\frac{T_i(Y_i(1) - \mu_1(\mathbf{X_i}, \beta_1))}{PS(\mathbf{X_i}, \beta_{PS})} - (Y_i(1) - \mu_1(\mathbf{X_i},\beta_1))\right] \\ &\text{(by definition)} \\ &= \mathbb{E}\left[\frac{T_i - PS(\mathbf{X_i}, \beta_{PS})}{PS(\mathbf{X_i}, \beta_{PS})}(Y_i(1) - \mu_1(\mathbf{X_i},\beta_1))\right] \\ &\text{(combining terms)} \\ &= \mathbb{E}\left(\mathbb{E}\left[\frac{T_i - PS(\mathbf{X_i}, \beta_{PS})}{PS(\mathbf{X_i}, \beta_{PS})}(Y_i(1) - \mu_1(\mathbf{X_i},\beta_1)) \bigg| \mathbf{X_i}\right]\right) \\ &\text{(law of iterated expectations)} \\ &= \mathbb{E}\left(\mathbb{E}\left[\frac{T_i - PS(\mathbf{X_i}, \beta_{PS})}{PS(\mathbf{X_i}, \beta_{PS})} \bigg| \mathbf{X_i}\right] \cdot \mathbb{E}\left[Y_i(1) - \mu_1(\mathbf{X_i},\beta_1) \bigg| \mathbf{X_i}\right]\right) \\ &\text{(ignorability allows separating the expectations)} \\ &= \mathbb{E}\left(\frac{PS(\mathbf{X_i}) - PS(\mathbf{X_i}, \beta_{PS})}{PS(\mathbf{X_i}, \beta_{PS})} \cdot \left(\mu_1(\mathbf{X_i}) - \mu_1(\mathbf{X_i},\beta_1)\right)\right) \\ \end{align*} \]
This final product is zero whenever either the propensity score model or the outcome model is correctly specified, which establishes the double robustness.
Step 1
: obtain the fitted values of the propensity scores:
Step 2
: obtain the fitted values of the outcome regressions:
Step 3
: construct the doubly robust estimator:
\(\hat{\tau}_{\text{ATE}}^{\text{dr}} = \hat{\mu}_{1}^{\text{dr}} - \hat{\mu}_{0}^{\text{dr}}\) with (definition of augmented outcome model)
\(\hat{\mu}_{1}^{\text{dr}} = \frac{1}{n} \sum_{i=1}^n \left[\frac{T_i(Y_i - \mu_1(\mathbf{X_i}, \hat{\beta}_1))}{PS(\mathbf{X_i}, \hat{\beta}_{PS})} + \mu_1(\mathbf{X_i},\hat{\beta}_1)\right]\)
\(\hat{\mu}_{0}^{\text{dr}} = \frac{1}{n} \sum_{i=1}^n \left[\frac{(1-T_i)(Y_i - \mu_0(\mathbf{X_i}, \hat{\beta}_0))}{1- PS(\mathbf{X_i}, \hat{\beta}_{PS})} + \mu_0(\mathbf{X_i},\hat{\beta}_0)\right]\)
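The three steps can be sketched in Python on simulated data; for brevity, Step 1 below uses the true logistic propensity score rather than an estimated logit, and all names and the data-generating process are illustrative:

```python
import numpy as np

# Simulated data (illustrative); true ATE = 1
rng = np.random.default_rng(3)
n = 20_000
X = rng.normal(size=n)
ps_true = 1 / (1 + np.exp(-X))               # logistic propensity score
T = rng.binomial(1, ps_true)
Y = T + 2.0 * X + rng.normal(size=n)

# Step 1: propensity score (here the true logistic form; in practice a fitted logit)
ps_hat = 1 / (1 + np.exp(-X))

# Step 2: outcome regressions mu_1 and mu_0, fitted by OLS within each treatment arm
def fit_mu(y, x):
    Z = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return lambda xnew: b[0] + b[1] * xnew

mu1 = fit_mu(Y[T == 1], X[T == 1])
mu0 = fit_mu(Y[T == 0], X[T == 0])

# Step 3: augmented (doubly robust) means and their difference
mu1_dr = np.mean(T * (Y - mu1(X)) / ps_hat + mu1(X))
mu0_dr = np.mean((1 - T) * (Y - mu0(X)) / (1 - ps_hat) + mu0(X))
print(mu1_dr - mu0_dr)  # doubly robust ATE estimate, close to 1
```

Here both nuisance models happen to be correct; by Theorem 4.3, the estimate would remain consistent if either one (but not both) were misspecified.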
library(Matching) # load Matching package
library(drgee) # load drgee package
data(lalonde) # load lalonde data
attach(lalonde) # store all variables in own objects
T = treat # define treatment (training)
Y = re78 # define outcome
X = cbind(age,educ,nodegr,married,black,hisp,re74,re75,u74,u75) # covariates
dr = drgee(oformula = formula(Y ~ X), eformula = formula(T ~ X), elink="logit") # DR reg
summary(dr) # show results
Call: drgee(oformula = formula(Y ~ X), eformula = formula(T ~ X), elink = "logit")
Outcome: Y
Exposure: T
Covariates: Xage,Xeduc,Xnodegr,Xmarried,Xblack,Xhisp,Xre74,Xre75,Xu74,Xu75
Main model: Y ~ T
Outcome nuisance model: Y ~ Xage + Xeduc + Xnodegr + Xmarried + Xblack + Xhisp + Xre74 + Xre75 + Xu74 + Xu75
Outcome link function: identity
Exposure nuisance model: T ~ Xage + Xeduc + Xnodegr + Xmarried + Xblack + Xhisp + Xre74 + Xre75 + Xu74 + Xu75
Exposure link function: logit
Estimate Std. Error z value Pr(>|z|)
T 1674.1 672.4 2.49 0.0128 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Note: The estimated parameters quantify the conditional
exposure-outcome association, given the covariates
included in the nuisance models)
445 complete observations used
Thank you for your attention!
Causal Data Science: (4) Observed Confounding