Causal Data Science for Business Analytics
Hamburg University of Technology
Monday, 24 June 2024
The unconfoundedness assumption \(\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp T_i\) (together with consistency) identifies the ATE as a difference in means: \(\tau_{\text{ATE}} = \mathbb{E}[Y_i|T_i=1] - \mathbb{E}[Y_i|T_i=0]\).
Average Treatment Effect on the Treated (ATT): \(\tau_{\text{ATT}} = \mathbb{E}[Y_i(1) - Y_i(0) | T_i=1]\)
\[ \begin{align*} \tau_{\text{ATT}} = \mathbb{E}[Y_i(1) - Y_i(0) | T_i=1] &= \mathbb{E}[Y_i(1) | T_i=1] - \mathbb{E}[Y_i(0) | T_i=1] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i(0) | T_i=1] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i(0) | T_i=0] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i | T_i=0] \end{align*} \]
In the difference-in-differences (DiD) setting, outcomes are observed before and after treatment, \(t=0,1\). Besides SUTVA (consistency & no interference), two new assumptions are needed:

Assumption (A.pt) “Parallel Trends”
\(\mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=1] = \mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=0]\).
This is unconfoundedness of the change (rather than of the potential outcomes themselves): \((Y_1(0)-Y_0(0)) \perp\!\!\!\perp T\).

Assumption (A.na) “No Anticipation”
\(\mathbb{E}[Y_{i,t=0}(1) - Y_{i,t=0}(0) | T_i=1] = 0 \quad\) or \(\quad Y_{i,t=0}(1) = Y_{i,t=0}(0)\)
Definition “Difference-in-Differences ATT”
\(\tau_{\text{DiD}} = \mathbb{E}[Y_{i,t=1}(1) - Y_{i,t=1}(0) | T_i=1] = (\mathbb{E}[Y_{i,t=1} | T_i=1] - \mathbb{E}[Y_{i,t=0} | T_i=1]) - (\mathbb{E}[Y_{i,t=1} | T_i=0] - \mathbb{E}[Y_{i,t=0} | T_i=0])\).
\[ \begin{aligned} \tau_{\text{DiD}} = \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1] &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1] - \mathbb{E}[Y_{i,1}(0) | T_i=1] \\ &= \mathbb{E}[Y_{i,1}(1) | T_i=1] {\color{#00C1D4}- \mathbb{E}[Y_{i,0}(0) | T_i=1]} - \mathbb{E}[Y_{i,1}(0) | T_i=1] {\color{#00C1D4}+ \mathbb{E}[Y_{i,0}(0) | T_i=1]} \\ & \overset{\text{(A.na)}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1] -\mathbb{E}[Y_{i,0}({\color{#00C1D4}1}) | T_i=1] - \mathbb{E}[Y_{i,1}(0) | T_i=1] + \mathbb{E}[Y_{i,0}(0) | T_i= 1] \\ & \overset{\text{(SUTVA)}}{=} {\color{#00C1D4}\mathbb{E}[Y_{i,1} | T_i=1]} - {\color{#00C1D4}\mathbb{E}[Y_{i,0} | T_i=1]} - {\color{#FF7E15}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 1])} \\ & \overset{\text{(A.pt)}}{=} \mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1] - {\color{#00C1D4}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 0])} \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1] - (\mathbb{E}[Y_{i,1}(0) | T_i=0] - \mathbb{E}[Y_{i,0}(0) | T_i=0]) \\ &\overset{\text{SUTVA}}{=} \underbrace{\mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1]}_{\text{change in the treated group}} - \underbrace{(\mathbb{E}[Y_{i,1} | T_i=0] - \mathbb{E}[Y_{i,0} | T_i=0])}_{\text{change in the control group}} \\ \end{aligned} \]
\(\tau_{\text{DiD}}\) can be estimated with two types of regressions: a two-way fixed effects (TWFE) regression and an interactive regression. Both work with

- panel data, i.e. the same observations before and after treatment, and
- repeated cross-sectional data, i.e. different observations before and after treatment.

Graphical interpretation of \(Y_{i,t} = \alpha + \beta T_i + \gamma t + \tau_{\text{DiD}} (T_i \times t) + \epsilon_{i,t}\):
- \(\beta\): selection bias, i.e. the level difference between the groups in \(t=0\), which should remain constant in \(t=1\).
- \(\gamma\): time trend of the control group, which is the same (parallel) for the treated group.
- \(\tau_{\text{DiD}}\): the remaining change of the treated group in \(t=1\) (see the simulation sketch below).
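A minimal simulation sketch of this interpretation (made-up parameter values, not from the slides): under parallel trends, the coefficient on \(T_i \times t\) recovers the treatment effect even though the groups differ in levels and both follow a common trend.

set.seed(1)
n <- 10000
T <- rbinom(n, 1, 0.5)                            # treatment group indicator
t <- rbinom(n, 1, 0.5)                            # period indicator (0 = before, 1 = after)
tau <- 2                                          # true treatment effect (assumed)
Y <- 1 + 3 * T + 1.5 * t + tau * T * t + rnorm(n) # selection bias (beta = 3) and common trend (gamma = 1.5)
coef(lm(Y ~ T * t))["T:t"]                        # close to tau = 2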
Repeated cross-sectional data: Assess house prices before and after a new garbage incinerator is built. Repeated cross-section data on 179 houses sold in 1978 and 142 houses sold in 1981 in North Andover, Massachusetts (Kiel and McClain, 1995). Not the same houses over time.

library(wooldridge) # load wooldridge package for data
library(fixest) # load fixest package for FE regression
data(kielmc) # load kielmc data
attach(kielmc) # attach data
kielmc$Y = kielmc$rprice # define outcome
kielmc$T = kielmc$nearinc # define treatment group
kielmc$t = kielmc$y81 # define period dummy
feols(Y ~ T + t + T:t, # equivalent to lm(Y ~ T*t)
data = kielmc,
vcov = vcov_cluster("cbd") # cluster std. errors by distance to the central business district (cbd)
)
OLS estimation, Dep. Var.: Y
Observations: 321
Standard-errors: Clustered (cbd)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82517.2 3221.92 25.61117 < 2.2e-16 ***
T -18824.4 7796.29 -2.41453 0.02146099 *
t 18790.3 5154.86 3.64516 0.00090981 ***
T:t -11863.9 6621.82 -1.79164 0.08236582 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 30,053.9 Adj. R2: 0.166131
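As a sanity check (a sketch assuming the kielmc data and the Y, T, t columns defined above are still in memory), the interaction coefficient can be reproduced directly from the four group means:

with(kielmc, {
  y11 <- mean(Y[T == 1 & t == 1]); y10 <- mean(Y[T == 1 & t == 0])
  y01 <- mean(Y[T == 0 & t == 1]); y00 <- mean(Y[T == 0 & t == 0])
  (y11 - y10) - (y01 - y00)   # change in treated minus change in control = T:t estimate above
})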
Panel data: Assess the impact of participating in the U.S. National Supported Work (NSW) training program, targeted at individuals with social and economic problems, on their real earnings. Experimental vs. non-experimental control group.

library(tidyverse)
library(fixest)
library(haven) # to read Stata files
# 1. Experimental data
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_exp_panel.dta")
# ---- Difference-in-means - Averages
with(data, {y11 = mean(re[year == 78 & ever_treated == 1])
y01 = mean(re[year == 78 & ever_treated == 0])
dim = y11 - y01
dim})
[1] 1794.342
# 2. Non-Experimental data
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
# ---- Difference-in-means - Averages
with(data, {y11 = mean(re[year == 78 & ever_treated == 1])
y01 = mean(re[year == 78 & ever_treated == 0])
dim = y11 - y01
dim})
[1] -8497.516
# ---- Difference-in-Differences - Averages
with(data, {y00 = mean(re[year == 75 & ever_treated == 0])
y01 = mean(re[year == 78 & ever_treated == 0])
y10 = mean(re[year == 75 & ever_treated == 1])
y11 = mean(re[year == 78 & ever_treated == 1])
did = (y11 - y10) - (y01 - y00)
did})
[1] 3621.232
data$post_treat = data$ever_treated * (data$year == 78)
# ---- Difference-in-Differences - Two-Way Fixed Effects Regression
feols(re ~ post_treat | id + year,
data = data |> filter(year %in% c(75, 78)),
vcov = vcov_cluster(c("id", "year")))
# ---- Difference-in-Differences - Interactive Regression
feols(re ~ ever_treated + I(year == 78) + post_treat,
data = data |> filter(year %in% c(75, 78)),
vcov = vcov_cluster(c("id", "year")))
OLS estimation, Dep. Var.: re
Observations: 32,354
Fixed-effects: id: 16,177, year: 2
Standard-errors: Clustered (id & year)
Estimate Std. Error t value Pr(>|t|)
post_treat 3621.23 609.84 5.93801 0.10621
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 3,859.0 Adj. R2: 0.670904
Within R2: 0.002483
OLS estimation, Dep. Var.: re
Observations: 32,354
Standard-errors: Clustered (id & year)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13650.80 51.4092 265.5324 0.0023975 **
ever_treated -12118.75 99.1118 -122.2736 0.0052064 **
I(year == 78) 1195.86 21.2944 56.1583 0.0113350 *
post_treat 3621.23 41.0534 88.2078 0.0072170 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 9,428.0 Adj. R2: 0.017819
Threats to the validity of the parallel trends assumption:

- Functional form misspecification (a sketch of a simple robustness check follows below)
- Compositional change with repeated cross-sections
- Time-varying confounding
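For example, functional form misspecification can be probed by re-estimating the 2x2 DiD on a different scale; a sketch using the kielmc data from above (assuming it is still loaded), comparing effects in dollars and in approximate percent:

library(fixest)
feols(Y ~ T * t, data = kielmc, vcov = vcov_cluster("cbd"))      # effect in dollars (levels)
feols(log(Y) ~ T * t, data = kielmc, vcov = vcov_cluster("cbd")) # effect in log points (approx. percent)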
Covariates can be time-invariant, \(X_i\), or time-varying, \(X_{i,t}\). Conditioning only on time-invariant covariates has

- Advantages: it avoids time-varying confounders that are themselves affected by treatment (mediators are “bad controls”) and helps to reduce the dimensionality of the problem.
- Disadvantage: no control for time trends in X independent of the treatment, and thus the PT violation remains.

Newer estimation approaches can handle time-varying covariates in more flexible ways (e.g. Caetano and Callaway, 2023):
Assumption (A.cpt) “Conditional Parallel Trends”
\(\mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=1, \mathbf{X_i}] = \mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=0, \mathbf{X_i}] \quad\) and
\(\mathbf{X_i}\) is not affected by \(T_i\): \(\quad \mathbf{X_i}(1) = \mathbf{X_i}(0) = \mathbf{X_i}\)
Conditional unconfoundedness of the change (rather than of the potential outcomes themselves): \((Y_1(0)-Y_0(0)) \perp\!\!\!\perp T \mid \mathbf{X_i}\).

Assumption (A.cna) “Conditionally No Anticipation”
\(\mathbb{E}[Y_{i,t=0}(1) - Y_{i,t=0}(0) | T_i=1, \mathbf{X_i}] = 0\)
Assumption (A.pos) “Positivity / Common Support / Overlap”
\(\Pr(T_i = 1 \mid \mathbf{X_i}) < 1 \space \text{and} \space \Pr(T_i = 1) > 0\).
\[ \begin{aligned} \tau_{\text{DiD}}(\mathbf{x}) &= \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1, \mathbf{X_i = x}] \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] \\ &= \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] {\color{#00C1D4}- \mathbb{E}[Y_{i,0}(0) | T_i=1, \mathbf{x}]} - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] {\color{#00C1D4}+ \mathbb{E}[Y_{i,0}(0) | T_i=1, \mathbf{x}]} \\ & \overset{\text{(A.cna)}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] -\mathbb{E}[Y_{i,0}({\color{#00C1D4}1}) | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] + \mathbb{E}[Y_{i,0}(0) | T_i= 1, \mathbf{x}] \\ & \overset{\text{(SUTVA)}}{=} {\color{#00C1D4}\mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}]} - {\color{#00C1D4}\mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}]} - {\color{#FF7E15}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 1, \mathbf{x}])} \\ & \overset{\text{(A.cpt)}}{=} \mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}] - {\color{#00C1D4}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 0, \mathbf{x}])} \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}] - (\mathbb{E}[Y_{i,1}(0) | T_i=0, \mathbf{x}] - \mathbb{E}[Y_{i,0}(0) | T_i=0, \mathbf{x}]) \\ &\overset{\text{SUTVA}}{=} \underbrace{\mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}]}_{\text{change for T=1 and X=x}} - \underbrace{(\mathbb{E}[Y_{i,1} | T_i=0, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=0, \mathbf{x}])}_{\text{change for T=0 and X=x}} \\ \end{aligned} \]
\(\tau_{\text{DiD}} = \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1] = \mathbb{E}_{\mathbf{X_i}}\left[ \underbrace{\mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1, \mathbf{X_i}]}_{\tau_{\text{DiD}}(\mathbf{X_i})} \mid T_i=1\right]\)
Four estimation procedures: (1) TWFE regression with covariates, (2) regression adjustment (outcome regression), (3) inverse probability weighting (IPW), and (4) doubly robust estimation.

For the TWFE regression with covariates, additional assumptions are needed (e.g. Caetano and Callaway, 2023):
library(tidyverse)
library(fixest)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data$post_treat = data$ever_treated * (data$year == 78)
data$post = as.integer(data$year == 78)
# ---- Difference-in-Differences - Two-Way Fixed Effects Regression
feols(
re ~ post_treat + age:post + agesq:post + agecube:post + educ:post + educsq:post + marr:post + nodegree:post + black:post + hisp:post | id + year,
data = data |> filter(year %in% c(75, 78)),
vcov = vcov_cluster(c("id", "year"))
)
OLS estimation, Dep. Var.: re
Observations: 32,354
Fixed-effects: id: 16,177, year: 2
Standard-errors: Clustered (id & year)
Estimate Std. Error t value Pr(>|t|)
post_treat 2450.964333 645.332739 3.797985 0.163900
age:post -1392.968721 188.627206 -7.384771 0.085686 .
post:agesq 32.005450 5.576397 5.739450 0.109818
post:agecube -0.254779 0.052340 -4.867790 0.128988
post:educ -132.810162 95.337273 -1.393056 0.396361
post:educsq 10.228897 3.957951 2.584392 0.235037
post:marr -578.337803 163.583509 -3.535429 0.175485
post:nodegree 417.398327 193.694598 2.154930 0.276597
post:black -281.607046 205.559277 -1.369955 0.401418
post:hisp -126.167682 234.813629 -0.537310 0.686116
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 3,737.9 Adj. R2: 0.691052
Within R2: 0.064074
Regression Adjustment
exploits the fact that under conditional parallel trends, strong overlap, and no anticipation the ATT can be written as (Heckman, Ichimura and Todd, 1997):\[ \begin{aligned} \tau_{\text{DiD}} &= \mathbb{E}_{\mathbf{X_i}}\left[\space \underbrace{\mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1, \mathbf{X_i}] - \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=0, \mathbf{X_i}]}_{\tau_{\text{DiD}}(\mathbf{X_i})} \mid T_i=1\right] \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=0, \mathbf{X_i}] \mid T_i=1\right]\\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mathbb{E}[Y_{i,1} | T_i=0, \mathbf{X_i}] - \mathbb{E}[Y_{i,0} | T_i=0, \mathbf{X_i}] \mid T_i=1\right] \\ &:= \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mu_{1}(0, \mathbf{X_i}) - \mu_{0}(0, \mathbf{X_i}) \mid T_i=1\right] \end{aligned} \]
The counterfactual outcome evolution of the treated group is imputed with a regression of the outcome change on \(\mathbf{X_i}\) estimated only on the control group.
Sample version: \(\hat{\tau}_{\text{DiD}} =\frac{1}{N_1} \underset{i:T_i=1}{\sum} (( Y_{i,1} - Y_{i,0}) - (\hat{\mu}_{1}(0, \mathbf{X_i}) - \hat{\mu}_{0}(0, \mathbf{X_i})))\)
library(tidyverse)
library(DRDID)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
# ---- Difference-in-Differences - Outcome Regression (Regression Adjustment)
ordid(
yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
data = data |> filter(year == 75 | year == 78)
)
Call:
ordid(yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~age + agesq + agecube + educ + educsq + marr +
nodegree + black + hisp + re74 + u74, data = filter(data,
year == 75 | year == 78))
------------------------------------------------------------------
Outcome-Regression DID estimator for the ATT:
ATT Std. Error t value Pr(>|t|) [95% Conf. Interval]
1769.9984 643.0996 2.7523 0.0059 509.5233 3030.4736
------------------------------------------------------------------
Estimator based on panel data.
Outcome regression est. method: OLS.
Analytical standard error.
------------------------------------------------------------------
See Sant'Anna and Zhao (2020) for details.
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |>
select(-treat, -data_id) |>
pivot_wider(names_from = year, values_from = re, names_prefix = "re_") |>
mutate(re_diff = re_78 - re_75)
data_pred <- data |> select(re_diff, ever_treated, age, agesq, agecube, educ, educsq,
marr, nodegree, black, hisp, re74, u74)
# Define prediction model for the outcome difference delta_mu
task_mu <- as_task_regr(data_pred |> select(-ever_treated), target = "re_diff")
lrnr_mu <- lrn("regr.ranger", predict_type = "response")
# learn outcome difference delta_mu among untreated
lrnr_mu$train(task_mu, row_ids = which(data$ever_treated == 0))
# predict outcome difference delta_mu among treated
delta_mu <- lrnr_mu$predict(task_mu, row_ids = which(data$ever_treated == 1))$response
Y1 <- data$re_78[data$ever_treated == 1]
Y0 <- data$re_75[data$ever_treated == 1]
mean( (Y1 - Y0) - delta_mu )
[1] 2188.271
The IPW approach proposed by Abadie (2005) re-weights observed outcome changes with the propensity score \(e(\mathbf{X_i}) = \Pr(T_i=1 \mid \mathbf{X_i})\):
\(\tau_{\text{DiD}} = \mathbb{E}\left[ \frac{Y_{i,1} - Y_{i,0}}{\Pr(T_i=1)} \cdot \frac{T_i - e(\mathbf{X_i})}{1 - e(\mathbf{X_i})} \right]\)
Sample version: \(\hat{\tau}_{\text{DiD}} = \frac{1}{N} \sum_{i=1}^{N} \frac{Y_{i,1} - Y_{i,0}}{\hat{P}} \cdot \frac{T_i - \hat{e}(\mathbf{X_i})}{1 - \hat{e}(\mathbf{X_i})}\), with \(\hat{P} = N_1/N\) and an estimated propensity score \(\hat{e}(\mathbf{X_i})\).
Intuition: what happens when \(T_i=1\) and when \(T_i=0\)? Treated units receive weight \(1/P\), while control units receive negative weights \(-e(\mathbf{X_i})/(P(1-e(\mathbf{X_i})))\), so control units that resemble treated units are weighted up.
library(tidyverse)
library(DRDID)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
# ---- Difference-in-Differences - Inverse Probability Weighting
ipwdid(
yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
data = data |> filter(year == 75 | year == 78)
)
Call:
ipwdid(yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~age + agesq + agecube + educ + educsq + marr +
nodegree + black + hisp + re74 + u74, data = filter(data,
year == 75 | year == 78))
------------------------------------------------------------------
IPW DID estimator for the ATT:
ATT Std. Error t value Pr(>|t|) [95% Conf. Interval]
2048.1972 724.1233 2.8285 0.0047 628.9156 3467.4788
------------------------------------------------------------------
Estimator based on panel data.
Hajek-type IPW estimator (weights sum up to 1).
Propensity score est. method: maximum likelihood.
Analytical standard error.
------------------------------------------------------------------
See Sant'Anna and Zhao (2020) for details.
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |>
select(-treat, -data_id) |>
pivot_wider(names_from = year, values_from = re, names_prefix = "re_")
data_pred <- data |> select(re_78, re_75, ever_treated, age, agesq, agecube, educ, educsq,
marr, nodegree, black, hisp, re74, u74)
# Define prediction model for the propensity score e
task_e <- as_task_classif(data_pred |> select(-re_78, -re_75), target = "ever_treated")
lrnr_e <- lrn("classif.ranger", predict_type = "prob")
# Learn propensity score e among all observations
lrnr_e$train(task_e)
# Predict propensity score ehat
ehat <- lrnr_e$predict(task_e)$prob[, 2]
# Calculate the ATT
T <- data$ever_treated
P <- mean(data$ever_treated)
Y1 <- data$re_78
Y0 <- data$re_75
mean( (Y1-Y0)/P * (T - ehat)/(1-ehat) )
[1] 2906.464
library(tidyverse)
library(DRDID)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
# ---- Difference-in-Differences - Double-robust
drdid(
yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
data = data |> filter(year == 75 | year == 78)
)
Call:
drdid(yname = "re", tname = "year", idname = "id", dname = "ever_treated",
xformla = ~age + agesq + agecube + educ + educsq + marr +
nodegree + black + hisp + re74 + u74, data = filter(data,
year == 75 | year == 78))
------------------------------------------------------------------
Further improved locally efficient DR DID estimator for the ATT:
ATT Std. Error t value Pr(>|t|) [95% Conf. Interval]
2032.9217 707.4779 2.8735 0.0041 646.265 3419.5784
------------------------------------------------------------------
Estimator based on panel data.
Outcome regression est. method: weighted least squares.
Propensity score est. method: inverse prob. tilting.
Analytical standard error.
------------------------------------------------------------------
See Sant'Anna and Zhao (2020) for details.
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |>
select(-treat, -data_id) |>
pivot_wider(names_from = year, values_from = re, names_prefix = "re_") |>
mutate(re_diff = re_78 - re_75)
data_pred <- data |> select(re_diff, ever_treated, age, agesq, agecube, educ, educsq,
marr, nodegree, black, hisp, re74, u74)
# Define prediction model for the propensity score e
task_e <- as_task_classif(data_pred |> select(-re_diff), target = "ever_treated")
lrnr_e <- lrn("classif.ranger", predict_type = "prob")
# Learn propensity score e among all observations
lrnr_e$train(task_e)
# Predict propensity score ehat
ehat <- lrnr_e$predict(task_e)$prob[, 2]
# Define prediction model for the outcome difference delta_mu
task_mu <- as_task_regr(data_pred |> select(-ever_treated), target = "re_diff")
lrnr_mu <- lrn("regr.ranger", predict_type = "response")
# learn outcome difference delta_mu among untreated
lrnr_mu$train(task_mu, row_ids = which(data$ever_treated == 0))
# predict outcome difference delta_mu among all observations
delta_mu <- lrnr_mu$predict(task_mu)$response
# Calculate the ATT
T <- data$ever_treated
P <- mean(data$ever_treated)
Y1 <- data$re_78
Y0 <- data$re_75
mean( ((Y1-Y0) - delta_mu) * (T - ehat)/(P*(1-ehat)) )
[1] 2283.593
What if there are multiple periods and units adopt treatment at different times?
Notation: \(G_i\) denotes the period in which unit \(i\) first receives treatment (its adoption cohort), with \(G_i = \infty\) for never-treated units; \(Y_{i,t}(g)\) is the potential outcome in period \(t\) when first treated in period \(g\), and \(Y_{i,t}(\infty)\) is the never-treated potential outcome.
Assumption (A.stpt) “Parallel Trends”
\(\mathbb{E}[Y_{i,t}(\infty) - Y_{i,t-1}(\infty) | G_i=g] = \mathbb{E}[Y_{i,t}(\infty) - Y_{i,t-1}(\infty) | G_i=g'] \quad \forall g, g', t\).
Assumption (A.stna) “No Anticipation”
\(\mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | G_i=g] = 0 \quad\) or \(\quad Y_{i,t}(g) = Y_{i,t}(\infty) \quad \forall t < g\)
If the treatment effect is constant across time and units, \(Y_{i,t}(g) - Y_{i,t}(\infty) \equiv \tau\), identification through the TWFE coefficient is possible: \(\tau = \beta\). If the treatment effect is heterogeneous, i.e. depends on time since treatment, \(Y_{i,t}(t-r) - Y_{i,t}(\infty) \equiv \tau_r\), then identification fails, because some \(\tau_{r}\)'s may get negative weights.

- Clean comparisons: DiDs between treated and not-yet-treated units.
- Forbidden comparisons: DiDs between already-treated units (who began treatment at different times).
Example: two groups, always treated (in both periods) & switchers (treated only in period 2). If treatment effects are heterogeneous, e.g. only in time since treatment or across adoption cohorts, then the TWFE estimate relies on the already-treated group as a control, i.e. a forbidden comparison.
Callaway and Sant’Anna (2021): R package ‘did’ (Focus)
Sun and Abraham (2021): R package ‘fixest’
Callaway and Sant’Anna (2021) target the group-time-specific treatment effect on the treated: \(\tau(g,t) = \mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) \mid G_i=g]\).
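Under SUTVA, (A.stpt) and (A.stna), each \(\tau(g,t)\) is identified analogously to the 2x2 case, using period \(g-1\) as the baseline period; a sketch of the identification result with the never-treated as comparison group (the ‘did’ package alternatively allows not-yet-treated comparisons):

\[ \tau(g,t) = \underbrace{\mathbb{E}[Y_{i,t} - Y_{i,g-1} \mid G_i = g]}_{\text{change in cohort } g \text{ since } g-1} - \underbrace{\mathbb{E}[Y_{i,t} - Y_{i,g-1} \mid G_i = \infty]}_{\text{change in the never-treated group}} \quad \text{for } t \ge g. \]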
The \(\tau(g,t)\)'s can be aggregated in different ways (type argument of aggte), and the estimation can be adjusted via several options:

- Simple: weighted average of all \(\tau(g,t)\)'s with \(t \ge g\).
- Dynamic: average by length of exposure to treatment (event study).
- Group: average by adoption cohort.
- Calendar: average by calendar time period.
- Anticipation: allow for a given number of anticipation periods before adoption.
- Covariates: condition on \(\mathbf{X_i}\) (xformla) to invoke conditional parallel trends.
library(did) # Load the 'did' package
library(fixest) # Load the 'fixest' package
temp_file <- tempfile(fileext = ".RData") # Define a temporary file path to save the downloaded file
# Download the file from Dropbox
download.file("https://www.dropbox.com/scl/fi/wnp1hrkz00izr72h6ua1t/job_displacement_data.RData?rlkey=nr15rrbv1ev8ra1kzfa1knux2&dl=1", temp_file, mode = "wb")
load(temp_file) # Load the RData file into the R session
rm(temp_file) # Optionally, remove the temporary file
# Check the structure of the loaded data
head(job_displacement_data)
# run TWFE
fixest::feols(income ~ i(year >= group) | id + year,
data=job_displacement_data,
cluster=c("id", "year"))
# A tibble: 6 × 7
id year group income female white occ_score
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7900002 1984 0 31130 1 1 4
2 7900002 1985 0 32200 1 1 3
3 7900002 1986 0 35520 1 1 4
4 7900002 1987 0 43600 1 1 4
5 7900002 1988 0 39900 1 1 4
6 7900002 1990 0 38200 1 1 4
OLS estimation, Dep. Var.: income
Observations: 11,682
Fixed-effects: id: 1,298, year: 9
Standard-errors: Clustered (id & year)
Estimate Std. Error t value Pr(>|t|)
year >= group::TRUE -6455.36 2041.49 -3.16208 0.013353 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 14,824.0 Adj. R2: 0.674268
Within R2: 0.002425
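To see which 2x2 comparisons a static TWFE coefficient mixes together, a Goodman-Bacon (2021) decomposition can be applied. A minimal sketch, assuming the bacondecomp package (not used elsewhere in these slides) and the job_displacement_data loaded above; it decomposes the TWFE coefficient from regressing income on a 0/1 staggered-treatment dummy with unit and year fixed effects:

library(bacondecomp)   # Goodman-Bacon (2021) decomposition
library(dplyr)
# bacon() needs a 0/1 absorbing treatment indicator and a strongly balanced panel
bacon_df <- job_displacement_data |>
  mutate(treated = as.integer(group != 0 & year >= group))
decomp <- bacon(income ~ treated, data = bacon_df, id_var = "id", time_var = "year")
head(decomp)                          # each row is a 2x2 DiD with its weight;
                                      # comparisons using already-treated units as controls are the forbidden ones
sum(decomp$estimate * decomp$weight)  # weighted average = TWFE coefficient on 'treated'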
# run command to obtain 64 (8 years * 8 groups) Tau(t,g)'s
result <- att_gt(
yname = "income",
tname = "year",
idname = "id",
gname = "group",
xformla = ~ female + white + occ_score, # ~ 1
data = job_displacement_data,
allow_unbalanced_panel = TRUE,
control_group = "nevertreated", # "notyettreated"
anticipation = 0,
clustervars = "id",
est_method = "dr", # "reg", "ipw"
)
# aggregate results over time since treatment
result_es <- aggte(result, type="dynamic")
ggdid(result_es)
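The other aggregation schemes are obtained by changing the type argument of aggte(); a sketch using the result object from the code above (the group aggregation corresponds to the output shown below):

aggte(result, type = "simple")     # weighted average of all tau(g,t) with t >= g
aggte(result, type = "calendar")   # average effect by calendar period
aggte(result, type = "group")      # average effect by adoption cohort (output shown below)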
Call:
aggte(MP = result, type = "group")
Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
Overall summary of ATT's based on group/cohort aggregation:
ATT Std. Error [ 95% Conf. Int.]
-6290.805 1664.296 -9552.766 -3028.844 *
Group Effects:
Group Estimate Std. Error [95% Simult. Conf. Band]
1985 -9152.7025 2998.059 -15874.750 -2430.655 *
1986 -1863.3794 5769.769 -14799.971 11073.212
1987 816.6743 6509.247 -13777.925 15411.274
1988 6797.5169 4136.877 -2477.913 16072.947
1990 -13962.6186 3847.891 -22590.104 -5335.134 *
1991 -7107.5120 5033.109 -18392.415 4177.391
1992 -10570.4752 8649.169 -29963.066 8822.115
1993 -24680.0159 5672.560 -37398.653 -11961.379 *
---
Signif. codes: `*' confidence band does not cover 0
Control Group: Never Treated, Anticipation Periods: 0
Estimation Method: Doubly Robust
Two-stage procedure (as implemented in the did2s package used below): first, estimate unit and time fixed effects (and time-varying covariates) using only untreated observations; second, regress the outcome net of these estimated fixed effects on the treatment indicators.

Dynamic effects can be estimated via event studies of the form \(Y_{i,t} = \alpha_i + \gamma_t + \sum_{k \neq 0} \tau^{k} D_{i,t}^{k} + \delta X_{i,t} + \epsilon_{i,t}\):
The key difference to C&S (2021) is the trade-off between efficiency and the strength of the identifying assumption:

- Plus: averaging over multiple pre-treatment periods (instead of one) can increase precision.
- Minus: parallel trends need to hold for all groups and time periods (instead of only post-treatment parallel trends).

library(did2s) # Load the 'did2s' package
library(tidyverse) # Load the 'tidyverse' package
temp_file <- tempfile(fileext = ".RData") # Define a temporary file path to save the downloaded file
# Download the file from Dropbox
download.file("https://www.dropbox.com/scl/fi/wnp1hrkz00izr72h6ua1t/job_displacement_data.RData?rlkey=nr15rrbv1ev8ra1kzfa1knux2&dl=1", temp_file, mode = "wb")
load(temp_file) # Load the RData file into the R session
rm(temp_file) # Optionally, remove the temporary file
job_displacement_data <- job_displacement_data |>
# create time-variant treatment indicator D_it
mutate(d_it = case_when(
group == 0 ~ 0,
year >= group ~ 1,
TRUE ~ 0)
) |>
# create relative event-time indicator D_it_k
mutate(d_it_k = case_when(
group == 0 ~ Inf,
TRUE ~ year - group)
)
# Static model
result <- did2s(
job_displacement_data,
yname = "income",
first_stage = ~ occ_score | id + year, # only time-variant covariates
second_stage = ~ d_it,
treatment = "d_it",
cluster_var = "id"
)
fixest::esttable(result)
# Event Study
result <- did2s(
job_displacement_data,
yname = "income",
first_stage = ~ occ_score | id + year, # only time-variant covariates
second_stage = ~ i(d_it_k, ref = c(-3, Inf)),
treatment = "d_it",
cluster_var = "id"
)
# Dynamic TWFE
twfe <- feols(income ~ i(d_it_k, ref = c(-3, Inf)) + occ_score | id + year,
data=job_displacement_data,
cluster=c("id", "year"))
fixest::iplot(list(result, twfe), main = "Event study: Staggered treatment",
xlab = "Relative time to treatment", col = c("#005e73", "#00C1D4") , ref.line = -0.5)
# Legend
legend(x=5, y=30000, col = c("#005e73", "#00C1D4"), pch = c(20, 17),
legend = c("Two-stage estimate", "Dynamic TWFE"))
OLS estimation, Dep. Var.: income
Observations: 11,448
Standard-errors: Custom
Estimate Std. Error t value Pr(>|t|)
d_it -5900.79 2151.88 -2.74215 0.0061132 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 15,051.3 Adj. R2: 0.006894
An event study plot is a common way to visualize pre-trends; it can be generated from any of the dynamic estimators above (dynamic TWFE, aggte(..., type = "dynamic"), or the two-stage event study). However, pre-trend testing has important limitations:

- Low power: even if pre-trends are non-zero, we may fail to detect them statistically. We may be unable to reject a zero pre-trend, yet also unable to reject pre-trends that, under smooth extrapolation to the post-treatment period, would produce substantial bias.
- Distortions from pre-testing: if we only analyze cases without statistically significant pre-trends, this introduces a form of selection bias (pre-test bias), which can make things worse. If we happen to draw a sample from a population where pre-trends are insignificant, the treatment effect we estimate later on may be substantially biased (upwards in the slides' graphical example).

Thank you for your attention!
Causal Data Science: (8) Difference-in-Differences