Causal Data Science for Business Analytics
Hamburg University of Technology
Monday, 24. June 2024
Definition 3.1: “Covariate Balance”
We have covariate balance if the distribution of covariates \(\mathbf{X}\) is the same across treatment groups.
Formally:
\(P(\mathbf{X} | T = t) \stackrel{d}{=} P(\mathbf{X} | T = t')\) for all \(t, t'\).
\(\mathbf{X} \perp\!\!\!\perp T\).
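This definition can be made operational with a balance check: under successful randomization, covariate distributions should differ across treatment groups only by chance. A minimal base R sketch on simulated data (all names are illustrative, not from the course datasets):

```r
set.seed(42)
n <- 1000
T <- rbinom(n, 1, 0.5)            # randomized binary treatment
X <- rnorm(n, mean = 50, sd = 10) # covariate, independent of T by design

# standardized mean difference, a common balance diagnostic
smd <- (mean(X[T == 1]) - mean(X[T == 0])) /
  sqrt((var(X[T == 1]) + var(X[T == 0])) / 2)

bal <- t.test(X ~ T) # two-sample t-test as a formal balance check
round(smd, 3)        # close to 0 under successful randomization
```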
Perspective 1a: "Graphical Models: Backdoor Adjustment"
Perspective 1b: "Graphical Models: do-Operator"
Perspective 2: "Potential Outcomes" (Review & Extension)
\[ \begin{align} ATE_{as} &= \underbrace{\mathbb{E}[Y_i(1) - Y_i(0)]}_{ATE_{cs}} \\ &+ \underbrace{\mathbb{E}[Y_i(0)|T_i=1] - \mathbb{E}[Y_i(0)|T_i=0]}_{\text{Confounding Bias}} \\ &+ \underbrace{(1-\mathbb{E}[T_i])[\underbrace{\mathbb{E}[Y_i(1)|T_i=1] - \mathbb{E}[Y_i(0)|T_i=1]}_{\text{ATT}} - \underbrace{(\mathbb{E}[Y_i(1)|T_i=0] - \mathbb{E}[Y_i(0)|T_i=0])}_{\text{ATU}}]}_{\text{Heterogeneity Bias}} \end{align} \]
Perspective 2: "Potential Outcomes - Exchangeability"
Randomization implies \(\{Y(1), Y(0)\} \perp\!\!\!\perp T\).
\[ \begin{align} ATE_{as} &= \underbrace{\mathbb{E}[Y_i(1) - Y_i(0)]}_{ATE_{cs}} \\ &+ \underbrace{\mathbb{E}[Y_i(0)|T_i=1] - \mathbb{E}[Y_i(0)|T_i=0]}_{\color{#00C1D4}{\text{Confounding Bias = 0}}} \\ &+ \underbrace{(1-\mathbb{E}[T_i])[\underbrace{\mathbb{E}[Y_i(1)|T_i=1] - \mathbb{E}[Y_i(0)|T_i=1]}_{\text{ATT}} - \underbrace{(\mathbb{E}[Y_i(1)|T_i=0] - \mathbb{E}[Y_i(0)|T_i=0])}_{\text{ATU}}]}_{\color{#FF7E15}{\text{Heterogeneity Bias = 0}}} \end{align} \]
Identification:
Source: Schochet, Burghardt, and Glazerman (2001); Schochet, Burghardt, and McConnell (2008).
Conditional independence holds in a properly randomized experiment:
The Conditional Average Treatment Effect (CATE) is also identified:
Effect Heterogeneity
Unbiased: on average, equal to the true parameter values across different samples:
Consistent: converges in probability to the true parameter values as the sample size increases:
Asymptotically normally distributed: follows a normal distribution across sufficiently large samples:
Note: We skip the proofs here. They can be found in any introductory econometrics textbook, such as Wooldridge (2010): Econometric Analysis of Cross Section and Panel Data (MIT Press).
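In place of the proofs, the three properties can be illustrated by a small Monte Carlo simulation in base R. The data-generating process below is made up for illustration, with a true ATE of 2 and a constant treatment effect:

```r
set.seed(123)
ate_true <- 2
sim_ate <- function(n) {
  T <- rbinom(n, 1, 0.5)            # randomized binary treatment
  Y <- 1 + ate_true * T + rnorm(n)  # constant treatment effect of 2
  unname(coef(lm(Y ~ T))["T"])      # ATE estimate: difference in means via OLS
}
est_small <- replicate(2000, sim_ate(50))    # many small samples
est_large <- replicate(2000, sim_ate(5000))  # many large samples
mean(est_small)                # approximately 2 (unbiasedness)
sd(est_large) < sd(est_small)  # estimates concentrate as n grows (consistency)
```

A histogram of `est_small` would in addition look approximately normal, illustrating asymptotic normality.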
Standard errors (SE): measure the variability of the estimated ATE across different samples
Hypothesis tests: assess whether the estimated ATE is statistically different from zero
Confidence intervals (CI): provide a range of plausible values for the true ATE
Test statistic: \(t_1 = \frac{\hat{\beta}_1 - 0}{se(\hat{\beta}_1)}\)
p-value: probability of observing a test statistic as extreme as \(t_1\) under the null hypothesis
Range of ATE values such that the true ATE \(\beta_1\) is included with probability \(1-\alpha\), based on the estimated ATE \(\hat{\beta_1}\) and the standard error \(se(\hat{\beta_1})\) obtained in the sample
Constructed in such a way that in the (hypothetical) case that we could draw many samples and construct confidence intervals in all those samples, a share of \(1-\alpha\) confidence intervals would include the true \(\beta_1\)
Two-sided confidence interval: \(CI_{1-\alpha} = \left[\hat{\beta}_1 - z_{1-\alpha/2}\, se(\hat{\beta}_1),\; \hat{\beta}_1 + z_{1-\alpha/2}\, se(\hat{\beta}_1)\right]\)
One-sided confidence interval: \(CI_{1-\alpha} = \left[\hat{\beta}_1 - z_{1-\alpha}\, se(\hat{\beta}_1),\; \infty\right)\) (analogously with only an upper bound)
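The coverage interpretation can be checked by simulation: across repeated samples, roughly a share \(1-\alpha\) of the constructed intervals contains the true \(\beta_1\). A base R sketch with a made-up data-generating process:

```r
set.seed(1)
beta1 <- 2    # true ATE in the simulated model
alpha <- 0.05
covers <- replicate(2000, {
  T <- rbinom(200, 1, 0.5)
  Y <- 1 + beta1 * T + rnorm(200)
  ci <- confint(lm(Y ~ T), "T", level = 1 - alpha)  # two-sided 95% CI
  ci[1] <= beta1 && beta1 <= ci[2]                  # does it cover the truth?
})
mean(covers)  # close to 0.95
```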
In general: percentage of variance in the outcome \(Y\) that is explained by all variables in the model:
Specific case: just one binary treatment variable \(T\) in the model:
library(causalweight) # load causalweight package
library(sandwich) # load sandwich package
library(modelsummary) # load modelsummary package
data(JC) # load JC data
T=JC$assignment # define treatment (assignment to JC)
Y=JC$earny4 # define outcome (earnings in fourth year)
ols=lm(Y~T) # run OLS regression
# display results
modelsummary(ols, vcov = sandwich::vcovHC,
estimate = "est = {estimate} (se = {std.error}, t = {statistic}){stars}",
statistic = "p = {p.value}, CI = [{conf.low}, {conf.high}]",
gof_map = c("r.squared"))
|  | (1) |
|---|---|
| (Intercept) | est = 197.926 (se = 3.073, t = 64.416)*** |
|  | p = <0.001, CI = [191.903, 203.949] |
| T | est = 16.055 (se = 4.074, t = 3.941)*** |
|  | p = <0.001, CI = [8.069, 24.041] |
| R2 | 0.002 |
Bootstrap sampling: \(B\) randomly drawn samples of the same size as the original sample, with replacement.
Source: Huber, Martin (2023). Causal Analysis: Impact Evaluation and Causal Machine Learning with Applications in R. MIT Press.
library(causalweight) # load causalweight package
library(boot) # load boot package
data(JC) # load JC data
T=JC$assignment # define treatment (assignment to JC)
Y=JC$earny4 # define outcome (earnings in fourth year)
bootdata=data.frame(Y,T) # data frame with Y,T for bootstrap procedure
bs=function(data, indices) { # defines function bs for bootstrapping
dat=data[indices,] # creates bootstrap sample according to indices
coefficients=lm(dat)$coef # estimates coefficients in bootstrap sample
return(coefficients) # returns coefficients
} # closes the function bs
set.seed(1) # set seed
results = boot(data=bootdata, statistic=bs, R=1999) # 1999 bootstrap estimations
results # displays the results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = bootdata, statistic = bs, R = 1999)
Bootstrap Statistics :
original bias std. error
t1* 197.92584 0.02480312 3.013465
t2* 16.05513 -0.02075945 3.954810
tstat=results$t0[2]/sd(results$t[,2]) # compute the t-statistic
2*pnorm(-abs(tstat)) # compute the p-value assuming standard normal distribution
T
4.914718e-05
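Beyond the t-statistic, the bootstrap replicates can be turned into a confidence interval directly, e.g. a percentile interval via boot.ci. A self-contained sketch with simulated data (the true treatment effect here is zero; index 2 refers to the treatment coefficient):

```r
library(boot) # boot ships with standard R installations
set.seed(1)
n <- 500
bootdata <- data.frame(Y = rnorm(n, mean = 2), T = rbinom(n, 1, 0.5))
bs <- function(data, indices) lm(data[indices, ])$coef # coefficients in bootstrap sample
res <- boot(data = bootdata, statistic = bs, R = 999)  # 999 bootstrap estimations
boot.ci(res, type = "perc", index = 2) # percentile CI for the treatment coefficient
```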
treatmentinformation: Graph with information on monthly gross private sector earnings shown.
treatmentorder: Reversed order of questions about professional and personal preferences ("framing").
library(causalweight) # load causalweight package
library(sandwich) # load sandwich package
library(modelsummary) # load modelsummary package
data(wexpect) # load wexpect data
T1=wexpect$treatmentinformation # define first treatment (wage information)
T2=wexpect$treatmentorder # define second treatment (order of questions)
Y=wexpect$wexpect2 # define outcome (wage expectations)
ols=lm(Y~T1+T2) # run OLS regression
# display results
modelsummary(ols, vcov = sandwich::vcovHC,
estimate = "est = {estimate} (se = {std.error}, t = {statistic}){stars}",
statistic = "p = {p.value}, CI = [{conf.low}, {conf.high}]",
gof_map = c("r.squared"))
|  | (1) |
|---|---|
| (Intercept) | est = 9.408 (se = 0.159, t = 59.268)*** |
|  | p = <0.001, CI = [9.096, 9.719] |
| T1 | est = 0.345 (se = 0.243, t = 1.421) |
|  | p = 0.156, CI = [-0.132, 0.822] |
| T2 | est = -0.173 (se = 0.234, t = -0.741) |
|  | p = 0.459, CI = [-0.633, 0.286] |
| R2 | 0.006 |
Discretize a continuous treatment by generating binary indicators for (very small) brackets of values:
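A base R sketch of this discretization: cut() assigns each continuous treatment value to a bracket, and model.matrix() turns the brackets into binary indicators (the bracket width of 10 is an arbitrary illustrative choice):

```r
set.seed(3)
T_cont <- runif(300, 0, 100) # continuous treatment, e.g. advertising spending
# assign each value to one of ten equally wide brackets
brackets <- cut(T_cont, breaks = seq(0, 100, by = 10), include.lowest = TRUE)
D <- model.matrix(~ brackets - 1) # one binary indicator per bracket
dim(D)               # 300 observations, 10 indicators
all(rowSums(D) == 1) # each unit falls in exactly one bracket
```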
library(datarium) # load datarium package
library(np) # load np package
data(marketing) # load marketing data
T=marketing$newspaper # define treatment (newspaper advertising)
Y=marketing$sales # define outcome (sales)
results = npregbw(Y~T) # kernel regression
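Note that npregbw only selects the bandwidth; the fit itself would come from a subsequent npreg(bws = results) call. As a package-free sketch of the same idea, base R's ksmooth performs Nadaraya-Watson kernel regression of the outcome on a continuous treatment (simulated data here, with a hand-picked bandwidth instead of the data-driven choice npregbw makes):

```r
set.seed(5)
T <- runif(200, 0, 100)         # continuous treatment
Y <- 10 + 0.05 * T + rnorm(200) # outcome with a linear dose-response
# Nadaraya-Watson kernel regression of Y on T
fit <- ksmooth(T, Y, kernel = "normal", bandwidth = 10)
head(cbind(grid = fit$x, smoothed = fit$y)) # smoothed dose-response on a grid
```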
Given successful randomization, there is no need to include covariates in the estimation of the ATE.
However, including covariates can reduce variance and thus uncertainty in the estimation of the ATE.
\(Y_i = \underbrace{\hat{\beta_0} + \hat{\beta}_1 T_i + \hat{\beta}_{X_1} X_{i1} + \dots + \hat{\beta}_{X_K} X_{iK}}_{\hat{E}[Y_i | T_i, X_i]} + \hat{\epsilon}_i\)
\(R^2 = \frac{\text{Var}(\hat{E}[Y_i | T_i, X_i])}{\text{Var}(Y_i)}\) gets larger while \(\frac{\text{Var}(\hat{\epsilon}_i)}{\text{Var}(Y_i)}\) gets smaller with the inclusion of covariates.
This further reduces the standard error of the ATE estimate \(se(\hat{\beta}_1)\).
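A simulation sketch of this variance reduction (made-up data-generating process): adding a pre-treatment covariate that strongly predicts the outcome shrinks the standard error of the treatment coefficient without changing what it estimates:

```r
set.seed(2)
n <- 1000
T <- rbinom(n, 1, 0.5)
X <- rnorm(n)                       # pre-treatment covariate
Y <- 1 + 0.5 * T + 2 * X + rnorm(n) # X explains much of the outcome variance
se_plain <- summary(lm(Y ~ T))$coefficients["T", "Std. Error"]
se_adj   <- summary(lm(Y ~ T + X))$coefficients["T", "Std. Error"]
c(unadjusted = se_plain, adjusted = se_adj) # adjusted SE is markedly smaller
```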
Pre-treatment covariates
Post-treatment covariates
library(causalweight) # load causalweight package
library(sandwich) # load sandwich package
library(modelsummary) # load modelsummary package
data(coffeeleaflet) # load coffeeleaflet data
attach(coffeeleaflet) # attach data frame so variables are accessible by name
T=c(coffeeleaflet$treatment) # define treatment (leaflet)
Y=c(coffeeleaflet$awarewaste) # define outcome (aware of waste production)
X=cbind(coffeeleaflet$mumedu,coffeeleaflet$sex) # define covariates (mother's education, gender)
ols=lm(Y~T+X) # run OLS regression
modelsummary(ols, vcov = sandwich::vcovHC,
estimate = "est = {estimate} (se = {std.error}, t = {statistic}){stars}",
statistic = "p = {p.value}, CI = [{conf.low}, {conf.high}]",
gof_map = c("r.squared"))
|  | (1) |
|---|---|
| (Intercept) | est = 1.187 (se = 0.249, t = 4.770)*** |
|  | p = <0.001, CI = [0.698, 1.676] |
| T | est = 0.332 (se = 0.096, t = 3.449)*** |
|  | p = <0.001, CI = [0.143, 0.521] |
| X1 | est = 0.272 (se = 0.090, t = 3.007)** |
|  | p = 0.003, CI = [0.094, 0.450] |
| X2 | est = 0.137 (se = 0.100, t = 1.370) |
|  | p = 0.171, CI = [-0.060, 0.333] |
| R2 | 0.046 |
Thank you for your attention!
Causal Data Science: (3) Randomized Experiments