(8) Difference-in-Differences

Causal Data Science for Business Analytics

Christoph Ihl

Hamburg University of Technology

Monday, 24. June 2024

Basic Model

Focus: ATT After Treatment

  • Unconfoundedness assumption \(\{Y(0), Y(1)\} \perp\!\!\!\perp T\) helps to identify the ATE: \(\tau_{\text{ATE}} = \mathbb{E}[Y_i|T_i=1] - \mathbb{E}[Y_i|T_i=0]\).

  • Average Treatment Effect on the Treated (ATT): \(\tau_{\text{ATT}} = \mathbb{E}[Y_i(1) - Y_i(0) | T_i=1]\)

    • Weaker identification assumption suffices: \(Y(0) \perp\!\!\!\perp T|X\):

\[ \begin{align*} \tau_{\text{ATT}} = \mathbb{E}[Y_i(1) - Y_i(0) | T_i=1] &= \mathbb{E}[Y_i(1) | T_i=1] - \mathbb{E}[Y_i(0) | T_i=1] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i(0) | T_i=1] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i(0) | T_i=0] \\ &= \mathbb{E}[Y_i | T_i=1] - \mathbb{E}[Y_i | T_i=0] \end{align*} \]

  • Introducing time periods before and after treatment \(t=0,1\):
    • \(\tau_{\text{DiD}} = \mathbb{E}[Y_{i,t=1}(1) - Y_{i,t=1}(0) | T_i=1]\)

Assumptions and Definition

  • In addition to SUTVA (consistency & no interference), two new assumptions:

Assumption (A.pt) “Parallel Trends”

\(\mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=1] = \mathbb{E}[Y_{i,t=1}(0) - Y_{i,t=0}(0) | T_i=0]\).

  • Equivalent to unconfoundedness of the change (rather than potential outcomes themselves): \((Y_1(0)-Y_0(0)) \perp\!\!\!\perp T\).

Assumption (A.na) “No Anticipation”

\(\mathbb{E}[Y_{i,t=0}(1) - Y_{i,t=0}(0) | T_i=1] = 0 \quad\) or \(\quad Y_{i,t=0}(1) = Y_{i,t=0}(0)\)

  • Treatment has no effect on the treatment group before it is administered.

Definition “Difference-in-Differences ATT”

  • Given consistency, parallel trends, and no anticipation, the ATT is given by the difference between change in the treated group and change in the control group:

\(\tau_{\text{DiD}} = \mathbb{E}[Y_{i,t=1}(1) - Y_{i,t=1}(0) | T_i=1] = (\mathbb{E}[Y_{i,t=1} | T_i=1] - \mathbb{E}[Y_{i,t=0} | T_i=1]) - (\mathbb{E}[Y_{i,t=1} | T_i=0] - \mathbb{E}[Y_{i,t=0} | T_i=0])\).

Identification

  • Proof:

\[ \begin{aligned} \tau_{\text{DiD}} = \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1] &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1] - \mathbb{E}[Y_{i,1}(0) | T_i=1] \\ &= \mathbb{E}[Y_{i,1}(1) | T_i=1] {\color{#00C1D4}- \mathbb{E}[Y_{i,0}(0) | T_i=1]} - \mathbb{E}[Y_{i,1}(0) | T_i=1] {\color{#00C1D4}+ \mathbb{E}[Y_{i,0}(0) | T_i=1]} \\ & \overset{\text{(A.na)}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1] -\mathbb{E}[Y_{i,0}({\color{#00C1D4}1}) | T_i=1] - \mathbb{E}[Y_{i,1}(0) | T_i=1] + \mathbb{E}[Y_{i,0}(0) | T_i= 1] \\ & \overset{\text{(SUTVA)}}{=} {\color{#00C1D4}\mathbb{E}[Y_{i,1} | T_i=1]} - {\color{#00C1D4}\mathbb{E}[Y_{i,0} | T_i=1]} - {\color{#FF7E15}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 1])} \\ & \overset{\text{(A.pt)}}{=} \mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1] - {\color{#00C1D4}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 0])} \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1] - (\mathbb{E}[Y_{i,1}(0) | T_i=0] - \mathbb{E}[Y_{i,0}(0) | T_i=0]) \\ &\overset{\text{SUTVA}}{=} \underbrace{\mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=1]}_{\text{change in the treated group}} - \underbrace{(\mathbb{E}[Y_{i,1} | T_i=0] - \mathbb{E}[Y_{i,0} | T_i=0])}_{\text{change in the control group}} \\ \end{aligned} \]

How DiD Estimation Works

Diff-in-Diffs Regression

  • DiD estimator \(\tau_{\text{DiD}}\) can be obtained by two types of regressions:
  1. Two-way fixed effects (TWFE) regression:
    • \(Y_{i,t} = \alpha_i + \gamma_t + \tau_{\text{DiD}} (T_i \times t) + \epsilon_{i,t}\)
    • Individual fixed effects \(\alpha_i\): capture time-invariant characteristics of individuals.
    • Time fixed effects \(\gamma_t\): capture time-specific effects common to all individuals.
    • Interaction between the treatment \(T_i\) and a time dummy \(t\): \(T_i \times t\).
    • Works for panel data: i.e. same observations before and after treatment.
  1. Regressing \(Y_i\) on the treatment \(T_i\), a time dummy \(t\) and their interaction \(T_i \times t\):
    • \(Y_{i,t} = \alpha + \beta T_i + \gamma t + \tau_{\text{DiD}} (T_i \times t) + \epsilon_{i,t}\)
    • Works for both panel data and repeated cross-sectionional data: i.e. different observations before and after treatment.

Diff-in-Diffs Regression

  • Graphical interpretation of \(Y_{i,t} = \alpha + \beta T_i + \gamma t + \tau_{\text{DiD}} (T_i \times t) + \epsilon_{i,t}\):
    • \(\alpha = \mathbb{E}[Y_{i,0} | T_i=0]\) is the mean outcome of the nontreated at \(t=0\).
    • \(\beta = \mathbb{E}[Y_{i,0} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=0]\) is the mean difference in outcomes across treatment groups at \(t=0\).
      • This selection bias should remain constant in \(t=1\).
    • \(\gamma = \mathbb{E}[Y_{i,1} | T_i=0] - \mathbb{E}[Y_{i,0} | T_i=0]\) is the time trend in mean outcomes among the non-treated.
      • This trend should be the same (parallel) for the treated group.



  • Alternative interpretation: \[ \begin{aligned} \tau_{\text{DiD}} = \underbrace{\mathbb{E}[Y_{i,1} | T_i=1] - \mathbb{E}[Y_{i,1} | T_i=0]}_{\text{difference in post-treatment}} \\ - \underbrace{(\mathbb{E}[Y_{i,0} | T_i=1] - \mathbb{E}[Y_{i,0} | T_i=0])}_{\text{difference in pre-treatment}} \end{aligned} \]

Basic DiD: Example

  • Repeated cross-sectional data: Assess house prices before and after a new highway is built. Repeated cross-section data of 179 houses in 1978 and 142 houses in 1981 in Kiel, Germany. Not the same houses over time.
library(wooldridge)  # load wooldridge package for data
library(fixest) # load fixest package for FE regression
data(kielmc)  # load kielmc data
attach(kielmc)   # attach data
kielmc$Y = kielmc$rprice  # define outcome
kielmc$T = kielmc$nearinc # define treatment group
kielmc$t = kielmc$y81     # define period dummy

feols(Y ~ T + t + T:t,    # eqivalent to lm(Y ~ T*t)
      data = kielmc, 
      vcov = vcov_cluster("cbd") # cluster st.error w.r.t. distance to center
)
OLS estimation, Dep. Var.: Y
Observations: 321
Standard-errors: Clustered (cbd) 
            Estimate Std. Error  t value   Pr(>|t|)    
(Intercept)  82517.2    3221.92 25.61117  < 2.2e-16 ***
T           -18824.4    7796.29 -2.41453 0.02146099 *  
t            18790.3    5154.86  3.64516 0.00090981 ***
T:t         -11863.9    6621.82 -1.79164 0.08236582 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 30,053.9   Adj. R2: 0.166131

Basic DiD: Example 2

  • Panel data: Assess the impact of participating in the U.S. National Supported Work (NSW) training program targeted to individuals with social and economic problems on their real earnings. Experimental vs. non-experimental control group.
library(tidyverse)
library(fixest)
library(haven) # to read Stata files
# 1. Experimental data
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_exp_panel.dta")
# ---- Difference-in-means - Averages
with(data, {y11 = mean(re[year == 78 & ever_treated == 1])
              y01 = mean(re[year == 78 & ever_treated == 0])
              dim = y11 - y01
              dim})
[1] 1794.342
# 2. Non-Experimental data
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
# ---- Difference-in-means - Averages
with(data, {y11 = mean(re[year == 78 & ever_treated == 1])
              y01 = mean(re[year == 78 & ever_treated == 0])
              dim = y11 - y01
              dim})
[1] -8497.516
# ---- Difference-in-Differences - Averages
with(data, {y00 = mean(re[year == 75 & ever_treated == 0])
              y01 = mean(re[year == 78 & ever_treated == 0])
              y10 = mean(re[year == 75 & ever_treated == 1])
              y11 = mean(re[year == 78 & ever_treated == 1])
              did = (y11 - y10) - (y01 - y00)
              did})
[1] 3621.232
data$post_treat = data$ever_treated * (data$year == 78)
# ---- Difference-in-Differences - Two-Way Fixed Effects Regression
feols(re ~ post_treat | id + year, 
        data = data |> filter(year %in% c(75, 78)),
        vcov = vcov_cluster(c("id", "year")))
# ---- Difference-in-Differences - Interactive Regression
feols(re ~ ever_treated + I(year == 78) + post_treat, 
        data = data |> filter(year %in% c(75, 78)),
        vcov = vcov_cluster(c("id", "year")))
OLS estimation, Dep. Var.: re
Observations: 32,354
Fixed-effects: id: 16,177,  year: 2
Standard-errors: Clustered (id & year) 
           Estimate Std. Error t value Pr(>|t|) 
post_treat  3621.23     609.84 5.93801  0.10621 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 3,859.0     Adj. R2: 0.670904
                Within R2: 0.002483
OLS estimation, Dep. Var.: re
Observations: 32,354
Standard-errors: Clustered (id & year) 
               Estimate Std. Error   t value  Pr(>|t|)    
(Intercept)    13650.80    51.4092  265.5324 0.0023975 ** 
ever_treated  -12118.75    99.1118 -122.2736 0.0052064 ** 
I(year == 78)   1195.86    21.2944   56.1583 0.0113350 *  
post_treat      3621.23    41.0534   88.2078 0.0072170 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 9,428.0   Adj. R2: 0.017819

Types of Confounding Covariates

  • Time-invariant covariate: \(X_{i}\)
    • Does not change over time for an individual.
    • Confounder if the means of the covariate are different in the two groups and it has a time-varying effect on the outcome.
  • Time-varying covariate: \(X_{i,t}\)
    • Does change over time for an individual.
    • Confounder if the covariate means evolve differently between the two groups or the covariate means start at different levels and evolve in parallel, and the covariate has a time-varying effect on the outcome.

Handling of Confounding Covariates

  • Most estimation approaches treat all covariates as time-invariant:
    • Time-varying covariates are fixed at their pre-treatment value across all periods.
    • Advantages: avoids time-varying confounders affected by treatment (mediators are “bad controls”) and helps to reduce the dimensionality of the problem.
    • Disadvantage: no control for time trends in X independently of the treatment and thus PT violation remains.
  • Newer estimation approaches can handle time-varying covariates in more flexible ways (e.g. Caetano and Callaway, 2023):
    • All covariates values from each period in each period (highest dimensionality).
    • Covariates values from the current period and base period.
    • Changes in covariate values from period to period.
    • Average covariate values across time periods.

ATT Conditional on Covariates: Identification

  • Given the conditional parallel trends assumption, no anticipation assumption, and overlap condition, the ATT conditional on \(\mathbf{X_i = x}\), \(\tau_{\text{DiD}}(\mathbf{x})\), can be identified for all \(\mathbf{x}\) with \(\Pr(T_i = 1 \mid \mathbf{X_i = x}) > 0\) as:

\[ \begin{aligned} \tau_{\text{DiD}}(\mathbf{x}) &= \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1, \mathbf{X_i = x}] \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] \\ &= \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] {\color{#00C1D4}- \mathbb{E}[Y_{i,0}(0) | T_i=1, \mathbf{x}]} - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] {\color{#00C1D4}+ \mathbb{E}[Y_{i,0}(0) | T_i=1, \mathbf{x}]} \\ & \overset{\text{(A.cna)}}{=} \mathbb{E}[Y_{i,1}(1) | T_i=1, \mathbf{x}] -\mathbb{E}[Y_{i,0}({\color{#00C1D4}1}) | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,1}(0) | T_i=1, \mathbf{x}] + \mathbb{E}[Y_{i,0}(0) | T_i= 1, \mathbf{x}] \\ & \overset{\text{(SUTVA)}}{=} {\color{#00C1D4}\mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}]} - {\color{#00C1D4}\mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}]} - {\color{#FF7E15}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 1, \mathbf{x}])} \\ & \overset{\text{(A.cpt)}}{=} \mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}] - {\color{#00C1D4}(\mathbb{E}[Y_{i,1}(0) - Y_{i,0}(0) | T_i= 0, \mathbf{x}])} \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}] - (\mathbb{E}[Y_{i,1}(0) | T_i=0, \mathbf{x}] - \mathbb{E}[Y_{i,0}(0) | T_i=0, \mathbf{x}]) \\ &\overset{\text{SUTVA}}{=} \underbrace{\mathbb{E}[Y_{i,1} | T_i=1, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=1, \mathbf{x}]}_{\text{change for T=1 and X=x}} - \underbrace{(\mathbb{E}[Y_{i,1} | T_i=0, \mathbf{x}] - \mathbb{E}[Y_{i,0} | T_i=0, \mathbf{x}])}_{\text{change for T=0 and X=x}} \\ \end{aligned} \]

ATT Conditional on Covariates: Estimation

  • The unconditional ATT can then be identified by averaging \(\tau_{\text{DiD}}(\mathbf{x})\) over the distribution of \(X_i\) in the treated population.
  • Using the law of iterated expectations, we have:

\(\tau_{\text{DiD}} = \mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1] = \mathbb{E}_{\mathbf{X_i}}\left[ \underbrace{\mathbb{E}[Y_{i,1}(1) - Y_{i,1}(0) | T_i=1, \mathbf{X_i}]}_{\tau_{\text{DiD}}(\mathbf{X_i})} \mid T_i=1\right]\)

  • Four estimation procedures:
    1. Two-Way Fixed Effects (TWFE) Regression
    2. Outcome Regression Adjustment
    3. Inverse Probability Weighting (IPW)
    4. Doubly Robust Estimation

Two-Way Fixed Effects (TWFE) Regression

  • Augment TWFE specification with covariates (e.g. Zeldow and Hatfield, 2021):
    • Time-invariant covariate with time-variant effect on outcome:
      • \(Y_{i,t} = \alpha_i + \gamma_t + \tau_{\text{DiD}} (T_i \times t) + \delta (X_i \times t) + \epsilon_{i,t}\)
    • Time-variant covariate with time-invariant effect on outcome:
      • \(Y_{i,t} = \alpha_i + \gamma_t + \tau_{\text{DiD}} (T_i \times t) + \delta X_{it} + \epsilon_{i,t}\)
    • Time-variant covariate with time-variant effect on outcome:
      • \(Y_{i,t} = \alpha_i + \gamma_t + \tau_{\text{DiD}} (T_i \times t) + \delta (X_{it} \times t) + \epsilon_{i,t}\)
  • Rather strong additional assumptions needed (e.g. Caetano and Callaway, 2023):
    • Treatment effect is homogeneous across different values of X.
    • Outcome is linear in X.
    • Only controlling for covariate changes, not for levels.

TWFE Regression: Example

library(tidyverse)
library(fixest)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")

data$post_treat = data$ever_treated * (data$year == 78)
data$post = as.integer(data$year == 78)

# ---- Difference-in-Differences - Two-Way Fixed Effects Regression
feols(
  re ~ post_treat + age:post + agesq:post + agecube:post + educ:post + educsq:post + marr:post + nodegree:post + black:post + hisp:post | id + year, 
  data = data |> filter(year %in% c(75, 78)),
  vcov = vcov_cluster(c("id", "year"))
)
OLS estimation, Dep. Var.: re
Observations: 32,354
Fixed-effects: id: 16,177,  year: 2
Standard-errors: Clustered (id & year) 
                  Estimate Std. Error   t value Pr(>|t|)    
post_treat     2450.964333 645.332739  3.797985 0.163900    
age:post      -1392.968721 188.627206 -7.384771 0.085686 .  
post:agesq       32.005450   5.576397  5.739450 0.109818    
post:agecube     -0.254779   0.052340 -4.867790 0.128988    
post:educ      -132.810162  95.337273 -1.393056 0.396361    
post:educsq      10.228897   3.957951  2.584392 0.235037    
post:marr      -578.337803 163.583509 -3.535429 0.175485    
post:nodegree   417.398327 193.694598  2.154930 0.276597    
post:black     -281.607046 205.559277 -1.369955 0.401418    
post:hisp      -126.167682 234.813629 -0.537310 0.686116    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 3,737.9     Adj. R2: 0.691052
                Within R2: 0.064074

Outcome Regression Adjustment

  • Regression Adjustment exploits the fact that under conditional parallel trends, strong overlap, and no anticipation the ATT can be written as (Heckman, Ichimura and Todd, 1997):

\[ \begin{aligned} \tau_{\text{DiD}} &= \mathbb{E}_{\mathbf{X_i}}\left[\space \underbrace{\mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1, \mathbf{X_i}] - \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=0, \mathbf{X_i}]}_{\tau_{\text{DiD}}(\mathbf{X_i})} \mid T_i=1\right] \\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=0, \mathbf{X_i}] \mid T_i=1\right]\\ &\overset{\text{LIE}}{=} \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mathbb{E}[Y_{i,1} | T_i=0, \mathbf{X_i}] - \mathbb{E}[Y_{i,0} | T_i=0, \mathbf{X_i}] \mid T_i=1\right] \\ &:= \mathbb{E}[Y_{i,1} - Y_{i,0} | T_i=1] - \mathbb{E}_{\mathbf{X_i}}\left[\space \mu_{1}(0, \mathbf{X_i}) - \mu_{0}(0, \mathbf{X_i}) \mid T_i=1\right] \end{aligned} \]

  • Potential outcome evolution for the treatment group is imputed with a regression based only on X of the control group.

  • Sample version: \(\hat{\tau}_{\text{DiD}} =\frac{1}{N_1} \underset{i:T_i=1}{\sum} (( Y_{i,1} - Y_{i,0}) - (\hat{\mu}_{1}(0, \mathbf{X_i}) - \hat{\mu}_{0}(0, \mathbf{X_i})))\)

    1. Estimate the conditional expectation of the outcome at time \(t\), \(\hat{\mu}_{t}(T_i = 0, \mathbf{X_i})\) among untreated units.
    1. Create prediction for each treated unit using the covarite values \(X_i\) among the treated units.
    1. Calculate difference between observed and predicted difference for each treated unit and average.

Outcome Regression Adjustment: Example

library(tidyverse)
library(DRDID)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")

# ---- Difference-in-Differences - Double-robust
ordid(
  yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
  xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
  data = data |> filter(year == 75 | year == 78)
)
 Call:
ordid(yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
    xformla = ~age + agesq + agecube + educ + educsq + marr + 
        nodegree + black + hisp + re74 + u74, data = filter(data, 
        year == 75 | year == 78))
------------------------------------------------------------------
 Outcome-Regression DID estimator for the ATT:
 
   ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
1769.9984   643.0996    2.7523     0.0059    509.5233  3030.4736 
------------------------------------------------------------------
 Estimator based on panel data.
 Outcome regression est. method: OLS.
 Analytical standard error.
------------------------------------------------------------------
 See Sant'Anna and Zhao (2020) for details.

Outcome Regression with ML: Example

library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |> 
  select(-treat, -data_id) |>
  pivot_wider(names_from = year, values_from = re, names_prefix = "re_") |>
  mutate(re_diff = re_78 - re_75)
data_pred <- data |> select(re_diff, ever_treated, age, agesq, agecube, educ, educsq, 
                              marr, nodegree, black, hisp, re74, u74)

# Define prediction model for the outcome difference delta_mu
task_mu <- as_task_regr(data_pred |> select(-ever_treated), target = "re_diff")
lrnr_mu <- lrn("regr.ranger", predict_type = "response")
# learn outcome difference delta_mu among untreated
lrnr_mu$train(task_mu, row_ids = which(data$ever_treated == 0))
# predict outcome difference delta_mu among treated
delta_mu <- lrnr_mu$predict(task_mu, row_ids = which(data$ever_treated == 1))$response

Y1 <- data$re_78[data$ever_treated == 1]
Y0 <- data$re_75[data$ever_treated == 1]

mean( (Y1 - Y0) - delta_mu )
[1] 2188.271

Inverse Probability Weighting (IPW)

  • The IPW approach proposed by Abadie (2005):

    • \(\tau_{\text{DiD}} = \mathbb{E}\left( \frac{ Y_{i,1} - Y_{i,0}}{P(T_i=1)} \frac{T_i - e(\mathbf{X_i})}{1-e(\mathbf{X_i})} \right)\)
    • \(e(\mathbf{X_i}) = P[T_i = 1 | \mathbf{X_i}]\)
  • Sample version:

    • \(\hat{\tau}_{\text{DiD}} = \frac{1}{N} \underset{i}{\sum} \left( \frac{ Y_{i,1} - Y_{i,0}}{P(T_i=1)} \frac{T_i - \hat{e}(\mathbf{X_i})}{1-\hat{e}(\mathbf{X_i})} \right)\)
  • Intuition: what happens when \(T_i=1\) and \(T_i=0\)?
    • Weighting with the propensity only happens to the control group’s first differences – not the treatment groups!
    • Why? Because it’s the \(Y_1(0)\) that is missing, not the \(Y_1(1)\).

IWP: Example

library(tidyverse)
library(DRDID)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")

# ---- Difference-in-Differences - Double-robust
ipwdid(
  yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
  xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
  data = data |> filter(year == 75 | year == 78)
)
 Call:
ipwdid(yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
    xformla = ~age + agesq + agecube + educ + educsq + marr + 
        nodegree + black + hisp + re74 + u74, data = filter(data, 
        year == 75 | year == 78))
------------------------------------------------------------------
 IPW DID estimator for the ATT:
 
   ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
2048.1972   724.1233    2.8285     0.0047    628.9156  3467.4788 
------------------------------------------------------------------
 Estimator based on panel data.
 Hajek-type IPW estimator (weights sum up to 1).
 Propensity score est. method: maximum likelihood.
 Analytical standard error.
------------------------------------------------------------------
 See Sant'Anna and Zhao (2020) for details.

IWP with ML: Example

library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files
data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |> 
  select(-treat, -data_id) |>
  pivot_wider(names_from = year, values_from = re, names_prefix = "re_")
data_pred <- data |> select(re_78, re_75, ever_treated, age, agesq, agecube, educ, educsq, 
                            marr, nodegree, black, hisp, re74, u74)

# Define prediction model for the propensity score e
task_e <- as_task_classif(data_pred |> select(-re_78, -re_75), target = "ever_treated")
lrnr_e <- lrn("classif.ranger", predict_type = "prob")
# Learn propensity score e among all observations
lrnr_e$train(task_e)
# Predict propensity score ehat
ehat <- lrnr_e$predict(task_e)$prob[, 2]

# Calculate the ATT
T <- data$ever_treated
P <- mean(data$ever_treated)
Y1 <- data$re_78
Y0 <- data$re_75

mean( (Y1-Y0)/P * (T - ehat)/(1-ehat) ) 
[1] 2906.464

Doubly Robust Estimation

  • Outcome regression and IPW approaches can also be combined in the context of Diff-in-Diffs to form “doubly-robust” (DR) methods that are valid if either the outcome model or the propensity score model is correctly specified (Sant’Anna and Zhao, 2020):
    • \(\tau_{\text{DiD}} = \mathbb{E}\left( (Y_{i,1} - Y_{i,0} - (\mu_{1}(0, \mathbf{X_i}) - \mu_{0}(0, \mathbf{X_i}))) \left(\frac{T_i - e(\mathbf{X_i})}{P(T_i)(1-e(\mathbf{X_i}))}\right) \right)\)
  • Sample version:
    • \(\hat{\tau}_{\text{DiD}} = \frac{1}{N} \underset{i}{\sum} \left( (Y_{i,1} - Y_{i,0} - (\hat{\mu}_{1}(0, \mathbf{X_i}) - \hat{\mu}_{0}(0, \mathbf{X_i}))) \left(\frac{T_i - \hat{e}(\mathbf{X_i})}{P(T_i)(1-\hat{e}(\mathbf{X_i}))}\right) \right)\)
  • Double machine learning for difference-in-differences models (Chang, 2020).

Doubly Robust Estimation: Example

library(tidyverse)
library(DRDID)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")

# ---- Difference-in-Differences - Double-robust
drdid(
  yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
  xformla = ~ age + agesq + agecube + educ + educsq + marr + nodegree + black + hisp + re74 + u74,
  data = data |> filter(year == 75 | year == 78)
)
 Call:
drdid(yname = "re", tname = "year", idname = "id", dname = "ever_treated", 
    xformla = ~age + agesq + agecube + educ + educsq + marr + 
        nodegree + black + hisp + re74 + u74, data = filter(data, 
        year == 75 | year == 78))
------------------------------------------------------------------
 Further improved locally efficient DR DID estimator for the ATT:
 
   ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
2032.9217   707.4779    2.8735     0.0041    646.265   3419.5784 
------------------------------------------------------------------
 Estimator based on panel data.
 Outcome regression est. method: weighted least squares.
 Propensity score est. method: inverse prob. tilting.
 Analytical standard error.
------------------------------------------------------------------
 See Sant'Anna and Zhao (2020) for details.

Doubly Robust Estimation with ML: Example

library(tidyverse)
library(mlr3)
library(mlr3learners)
library(haven) # to read Stata files

data <- haven::read_dta("https://raw.github.com/Mixtape-Sessions/Causal-Inference-2/master/Lab/Lalonde/lalonde_nonexp_panel.dta")
data <- data |> filter(year == 75 | year == 78) |> 
  select(-treat, -data_id) |>
  pivot_wider(names_from = year, values_from = re, names_prefix = "re_") |>
  mutate(re_diff = re_78 - re_75)
data_pred <- data |> select(re_diff, ever_treated, age, agesq, agecube, educ, educsq, 
                            marr, nodegree, black, hisp, re74, u74)

# Define prediction model for the propensity score e
task_e <- as_task_classif(data_pred |> select(-re_diff), target = "ever_treated")
lrnr_e <- lrn("classif.ranger", predict_type = "prob")
# Learn propensity score e among all observations
lrnr_e$train(task_e)
# Predict propensity score ehat
ehat <- lrnr_e$predict(task_e)$prob[, 2]

# Define prediction model for the outcome difference delta_mu
task_mu <- as_task_regr(data_pred |> select(-ever_treated), target = "re_diff")
lrnr_mu <- lrn("regr.ranger", predict_type = "response")
# learn outcome difference delta_mu among untreated
lrnr_mu$train(task_mu, row_ids = which(data$ever_treated == 0))
# predict outcome difference delta_mu among all observations
delta_mu <- lrnr_mu$predict(task_mu)$response

# Calculate the ATT
T <- data$ever_treated
P <- mean(data$ever_treated)
Y1 <- data$re_78
Y0 <- data$re_75

mean( ((Y1-Y0) - delta_mu) * (T - ehat)/(P*(1-ehat)) )
[1] 2283.593

Staggered treatment Timing

Staggered Timing

  • Remember basic DiD model:
    • Two periods and a common treatment date.
    • Identification from parallel trends and no anticipation.
  • Active recent literature has focused on relaxing the first assumption:
    • What if there are multiple periods and units adopt treatment at different times?
    • Maintaining parallel trends and no anticipation assumptions.
  • Notation:
    • Panel of observations \(i\) and time periods \(t = 1 ... T_t\).
    • Units adopt a binary treatment at different dates \(G_i \in (1, ..., T_t) \cup \infty\).
      • where \(G_i = \infty\) means never-treated.
    • Potential outcomes \(Y_{it}(g)\) depend on time (\(t\)) and time you were first treated (\(g\)).

Extending the Identifying Assumptions

  • Key identifying assumptions from the canonical model are extended in a natural way:

Assumption (A.stpt) “Parallel Trends”

\(\mathbb{E}[Y_{i,t}(\infty) - Y_{i,t-1}(\infty) | G_i=g] = \mathbb{E}[Y_{i,t-1}(\infty) - Y_{i,t}(\infty) | G_i=g'] \quad \forall g, g', t\).

  • Intuitively, says that if treatment hadn’t happened, all “adoption cohorts” would have parallel average outcomes in all periods. (Note: could impose slightly weaker versions, e.g. only require PT post-treatment).

Assumption (A.stna) “No Anticipation”

\(\mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | T_i=1] = 0 \quad\) or \(\quad Y_{i,t}(g) = Y_{i,t}(\infty) \quad \forall t < g\)

  • Treatment has no effect on the treatment group before it is administered.

TWFE Regression with Staggered Timing

  • Suppose we extend Two-way fixed effects (TWFE) regression to staggered treatment timing:
    • \(Y_{i,t} = \alpha_i + \gamma_t + \beta D_{it} + \epsilon_{i,t}\)
    • where \(D_{it} = 1[t \geq G_i]\) is an indicator for whether the unit has been treated by time \(t\).
  • Given no anticipation and parallel trends across all adoption cohorts:
    • if treatment effect is constant across time and units, \(Y_{it}(g) - Y_{it}(\infty) \equiv \tau\), identification possible: \(\tau = \beta\).
    • if treatment effect is heterogeneous, i.e. depends on time since treatement, \(Y_{it}(t-r) - Y_{it}(\infty) \equiv \tau_r\), then identification fails, because some \(\tau_{r}\)’s may get negative weights.
  • Reason:
    • Clean comparisons: DiD’s between treated and not-yet-treated units.
    • Forbidden comparisons: DiD’s between already-treated units (who began treatment at different times).
      • can lead to negative weights, if treatment effects in the already treated “control group” change over time.

Forbidden Comparisons in TWFE: Intuition

    1. Suppose two period model with two groups: always treated (in both periods) & switchers (treated only in period 2).
    • With two periods, \(Y_{i,t} = \alpha_i + \gamma_t + \beta D_{it} + \epsilon_{i,t}\) is the same as \(\Delta Y_{i} = \alpha + \beta \Delta D_{i} + u_i\) (by first-differencing).
    • \(\Delta D_{i} = 1\) for switchers and \(0\) for the control group of always treated, thus:
    • \(\hat{\beta} = \left( \overline{Y}_{\text{switchers}, 2} - \overline{Y}_{\text{switchers}, 1} \right) - \left( \overline{Y}_{\text{AT}, 2} - \overline{Y}_{\text{AT}, 1} \right)\)
    • Problem: if treatment effect for always-treated grows over time, \(\hat{\beta}\) can get negative weights.
    1. Frisch-Waugh-Lovell theorem says that we can obtain \(\beta\) in \(Y_{i,t} = \alpha_i + \gamma_t + \beta D_{it} + \epsilon_{i,t}\) in two steps:
      1. Regress \(D_{i,t}\) on fixed effects (in a linear probability model - LPM): \(D_{i,t} = \tilde{\alpha}_i + \tilde{\gamma}_t + \tilde{\epsilon}_{i,t}\).
      1. Regress \(Y_{i,t}\) on \(D_{i,t} - \hat{D}_{i,t}\), thus: \(\beta = \frac{\mathbb{E}(Y_{i,t}(D_{i,t} - \hat{D}_{i,t}))}{Var(D_{i,t} - \hat{D}_{i,t})}\).
    • However, LPMs can predict \(\hat{D}_{i,t} > 1\), and \(Y_{it}\) can get negative weight.
    1. Even if weights are non-negative (i.e. individual \(\tau_i\)’s are constant - no dynamics), \(\beta\) might still be biased:
    • \(\beta = \sum_{i=1}^{N} \sum_{t=1}^{T_t} w_{it} \beta_{it}\): \(w_{it}\) is inversely proportional to the variance of \(\beta_{it}\).
      • Proportional to available information for \(i\), i.e. the number of observations pre and post treatment.
    • But are individuals in the middle of the panel also the most representative of the population?

Dynamic TWFE Regression with Staggered Timing

  • Sun and Abraham (2021) show that similar issues arise with dynamic TWFE (“event study”) specifications:
    • \(Y_{i,t} = \alpha_i + \gamma_t + \sum_{k \neq 0} \beta_k D_{it}^k + \epsilon_{i,t}\)
    • where \(D_{it}^k = 1[t - G_i = k]\) are leading and lagging “event time” dummies.
  • This dynamic specification yields a sensible causal estimand when there is heterogeneity only in time since treatment.
  • However, if there is heterogeneity in dynamic treatment effects also across adoption cohorts, then:
    • Like for static TWFE, \(\beta_k\) may put negative weight on treatment effects after k periods for some units.
    • Furthermore, \(\beta_k\) may be “contaminated” by treatment effects at different leads and lags \(k' \neq k\).
  • Thus, interpreting \(\beta_k\) as estimates …
    • of the dynamic effects of treatment (\(k>0\)) may be misleading.
    • for pre-trends tests (\(k<0\)) may also be misleading.
      • We will return to pre-trends tests later.

New DiD-Estimators for Staggered Timing

Estimator by Callaway & Sant’Anna (2021)

  • Callaway & Sant’Anna (2021) define the group-time-specific treatment effect on the treated:
    • \(\tau(g, t) = \mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | G_i=g]\), with \(t \geq g\).
    • ATT in period \(t\) for units first treated in period \(g\).
  • Under PT and No Anticipation, it can be identified as:
    • \(\tau(g, t) = \underbrace{\mathbb{E}[Y_{i,t} - Y_{i,g-1} | G_i=g]}_{\text{change for cohort g}} - \underbrace{\mathbb{E}[Y_{i,t} - Y_{i,g-1} | G_i = \infty]}_{\text{change for never-treated}}\)
    • Similar to the basic model, this is a two-group two-period comparison.
      • Similar identification proof (next).
    • Differences:
        1. Period \(g-1\) is pre-treatment period (right before cohort g becomes treated).
        1. More flexibility in terms of comparison group: (a) never-treated, (b) not-yet-treated, (c) not-yet-but-eventually-treated, (d) last-to-be-treated.
  • Sample version of \(\tau(g, t)\):
    • \(\hat{\tau}(g, t) = \frac{1}{N_{g}} \sum_{i=1}^{N_g} \left( Y_{i,t} - Y_{i,g-1} \right) 1[G_i = g] - \frac{1}{N_{\infty}} \sum_{i=1}^{N_{\infty}} \left( Y_{i,t} - Y_{i,g-1} \right) 1[G_i = \infty]\)

Callaway & Sant’Anna (2021): Proof

  • Start with identification result and work backwards:
    • \(\mathbb{E}[Y_{i,t} - Y_{i,g-1} | G_i=g] - \mathbb{E}[Y_{i,t} - Y_{i,g-1} | G_i = \infty]\)
  • Apply definition of Potential Outcomes:
    • \(\mathbb{E}[Y_{i,t}(g) - Y_{i,g-1}(g) | G_i=g] - \mathbb{E}[Y_{i,t}(\infty) - Y_{i,g-1}(\infty) | G_i = \infty]\)
  • Use No Anticipation to substitute \(Y_{i,g-1}(\infty)\) for \(Y_{i,g-1}(g)\):
    • \(\mathbb{E}[Y_{i,t}(g) - Y_{i,g-1}(\infty) | G_i=g] - \mathbb{E}[Y_{i,t}(\infty) - Y_{i,g-1}(\infty) | G_i = \infty]\)
  • Add and subtract \(\mathbb{E}[Y_{i,t}(\infty) | G_i=g]\):
    • \(\mathbb{E}[Y_{i,t}(g) - Y_{i,g-1}(\infty) | G_i=g] - \mathbb{E}[Y_{i,t}(\infty) - Y_{i,g-1}(\infty) | G_i = \infty] + \mathbb{E}[Y_{i,t}(\infty) | G_i=g] - \mathbb{E}[Y_{i,t}(\infty) | G_i=g]\)
  • Rearrange terms:
    • \(\mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | G_i=g] + \underbrace{\mathbb{E}[Y_{i,t}(\infty) - Y_{i,g-1}(\infty) | G_i=g] - \mathbb{E}[Y_{i,t}(\infty) - Y_{i,g-1}(\infty) | G_i = \infty]}_{=0}\)
  • QED:
    • \(\tau(g, t) = \mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | G_i=g]\)

Callaway & Sant’Anna (2021): Aggregation

  • If have a large number of observations and relatively few groups/periods, can report \(\hat{\tau}(g, t)\)’s directly.
  • If there are many groups/periods, the \(\hat{\tau}(g, t)\)’s may be very imprecisely estimated and/or too numerous to report.
  • In these cases, it is often desirable to report meaningful averages of the \(\hat{\tau}(g, t)\)’s.
  • Four aggregation schemes:
    • Simple:
      • Computes a single weighted average of all group-time average treatment effects with weights proportional to group size.
    • Dynamic:
      • Computes event-study parameters which average the \(\hat{\tau}(g, t)\)’s at a particular lag since (or lengths of exposure to) the treatment.
      • Can also be constructed for k < 0 to estimate “pre-trends”.
    • Group:
      • Computes group averages which average the \(\hat{\tau}(g, t)\)’s for a particular cohort treated a \(g\).
    • Calendar:
      • Computes “calendar averages” which average the \(\hat{\tau}(g, t)\)’s for a particular calendar time (year).

Callaway & Sant’Anna (2021): Further Variants

  • Anticipation:
    • In many applications, units may observe that an intervention is about to occur, so that they change their behaviors before the intervention is actually implemented.
    • Straightforward adaptation: if there is one period of anticipation, set the base period to \(g−2\) rather than \(g−1\), so that:
      • \(\tau(g, t) = \mathbb{E}[Y_{i,t} - Y_{i,g-2} | G_i=g] - \mathbb{E}[Y_{i,t} - Y_{i,g-2} | G_i = \infty]\)
  • Covariates:
    • Staggered timing estimator can also be extended to include covariates:
        1. Conditional outcome regression.
        1. Inverse Probability Weighting.
        1. Doubly Robust Estimation.

Callaway & Sant’Anna (2021): Example

  • Assess the impact of job displacement (i.e. losing job w/o own fault, e.g. mass layoff) on income of 1,298 individuals.
library(did) # Load the 'did' package
library(fixest) # Load the 'fixest' package
temp_file <- tempfile(fileext = ".RData") # Define a temporary file path to save the downloaded file
# Download the file from Dropbox
download.file("https://www.dropbox.com/scl/fi/wnp1hrkz00izr72h6ua1t/job_displacement_data.RData?rlkey=nr15rrbv1ev8ra1kzfa1knux2&dl=1", temp_file, mode = "wb")
load(temp_file) # Load the RData file into the R session
rm(temp_file) # Optionally, remove the temporary file


# Check the structure of the loaded data
head(job_displacement_data)

# run TWFE
fixest::feols(income ~ i(year >= group) | id + year,
              data=job_displacement_data,
              cluster=c("id", "year"))
# A tibble: 6 × 7
       id  year group income female white occ_score
    <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>     <dbl>
1 7900002  1984     0  31130      1     1         4
2 7900002  1985     0  32200      1     1         3
3 7900002  1986     0  35520      1     1         4
4 7900002  1987     0  43600      1     1         4
5 7900002  1988     0  39900      1     1         4
6 7900002  1990     0  38200      1     1         4
OLS estimation, Dep. Var.: income
Observations: 11,682
Fixed-effects: id: 1,298,  year: 9
Standard-errors: Clustered (id & year) 
                    Estimate Std. Error  t value Pr(>|t|)    
year >= group::TRUE -6455.36    2041.49 -3.16208 0.013353 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 14,824.0     Adj. R2: 0.674268
                 Within R2: 0.002425

Callaway & Sant’Anna (2021): Example cont’d

  • Assess the impact of job displacement (i.e. losing job w/o own fault, e.g. mass layoff) on income of 1,298 individuals.
# run command to obtain 64 (8 years * 8 groups) Tau(t,g)'s
result <- att_gt(
  yname = "income",
  tname = "year",
  idname = "id",
  gname = "group",
  xformla = ~ female + white + occ_score, #  ~ 1
  data = job_displacement_data,
  allow_unbalanced_panel = TRUE,
  control_group = "nevertreated", # "notyettreated"
  anticipation = 0,
  clustervars = "id",
  est_method = "dr", # "reg", "ipw"
  )

# aggregate results over time since treatment
result_es <- aggte(result, type="dynamic")
ggdid(result_es)

# aggregate results over groups and overall
result_overall <- aggte(result, type="group")
summary(result_overall)

Call:
aggte(MP = result, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 


Overall summary of ATT's based on group/cohort aggregation:  
       ATT    Std. Error     [ 95%  Conf. Int.]  
 -6290.805      1664.296  -9552.766   -3028.844 *


Group Effects:
 Group    Estimate Std. Error [95% Simult.  Conf. Band]  
  1985  -9152.7025   2998.059    -15874.750   -2430.655 *
  1986  -1863.3794   5769.769    -14799.971   11073.212  
  1987    816.6743   6509.247    -13777.925   15411.274  
  1988   6797.5169   4136.877     -2477.913   16072.947  
  1990 -13962.6186   3847.891    -22590.104   -5335.134 *
  1991  -7107.5120   5033.109    -18392.415    4177.391  
  1992 -10570.4752   8649.169    -29963.066    8822.115  
  1993 -24680.0159   5672.560    -37398.653  -11961.379 *
---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

Imputation-based Estimation: Procedure

  • Recall the TWFE specification (with covariates) with the (heterogeneous) \(\tau_{it}\) if PT assumption holds:
    • \(Y_{i,t} = \alpha_i + \gamma_t + \tau_{it} D_{it} + \delta X_{it} + \epsilon_{i,t}\)
    • where \(D_{it} = T_i \times t\) and \(\mathbb{E}[\epsilon_{i,t} | \{D_{it}, X_{it}\}_{t=1}^{T_t}] = 0\)
  • If we have some \(t\) where \(D_{it}= 0\) for all \(i\) (e.g. pre-treatment period observations), two-stage procedure:
      1. Using all observations with \(D_{it}= 0\), regress \(Y_{i,t}\) on the fixed effect \(\alpha_i\) and \(\gamma_t\) as well as on the covariates \(X_{it}\):
      • \(Y_{i,t}(0) = \alpha_i + \gamma_t + \delta X_{it} + \epsilon_{i,t}\).
      • Obtain \(\hat{Y}_{i,t}(0) = \hat{\alpha}_i + \hat{\gamma}_t + \hat{\delta} X_{it}\).
      1. Regress adjusted outcomes \(Y_{i,t} - \hat{Y}_{i,t}(0)\) on \(D_{it}\) to obtain \(\hat{\tau}_{it}\):
      • \((Y_{i,t} - \hat{Y}_{i,t}(0)) = \alpha_0 + \tau_{it} D_{it} + \epsilon_{i,t}\).
  • With events studies of the form \(Y_{i,t} = \alpha_i + \gamma_t + \sum_{k \neq 0} \tau_{it}^k D_{it}^k + \delta X_{it} + \epsilon_{i,t}\):
    • 1st stage remains the same.
    • 2nd stage: Regress adjusted outcomes \(Y_{i,t} - \hat{Y}_{i,t}(0)\) on event dummies \(D_{it}^k\) to obtain \(\hat{\tau}_{it}^k\):
      • \((Y_{i,t} - \hat{Y}_{i,t}(0)) = \alpha_0 + \sum_{k \neq 0} \tau_{it}^k D_{it}^k + \epsilon_{i,t}\).

Imputation-based Estimation: Comparison

  • Key difference to C&S (2021) is the trade-off between efficiency and strength of identifying assumption:
    • Plus: averaging over multiple pre-treatment periods (instead of one) can increase precision.
    • Minus: parallel trends need to hold for all groups and time periods (instead of only post-treatment parallel trends).

Imputation-based Estimation: Example

  • Assess the impact of job displacement (i.e. losing job w/o own fault, e.g. mass layoff) on income of 1,298 individuals.
library(did2s) # Load the 'did' package
library(tidyverse) # Load the 'tidyverse' package
temp_file <- tempfile(fileext = ".RData") # Define a temporary file path to save the downloaded file
# Download the file from Dropbox
download.file("https://www.dropbox.com/scl/fi/wnp1hrkz00izr72h6ua1t/job_displacement_data.RData?rlkey=nr15rrbv1ev8ra1kzfa1knux2&dl=1", temp_file, mode = "wb")
load(temp_file) # Load the RData file into the R session
rm(temp_file) # Optionally, remove the temporary file

job_displacement_data <- job_displacement_data |> 
  # create time-variant treatment indicator D_it
  mutate(d_it = case_when(
    group == 0 ~ 0,
    year >= group ~ 1,
    TRUE ~ 0)
  ) |>
  # create releative event-time indicator D_it_k
  mutate(d_it_k = case_when(
    group == 0 ~ Inf,
    TRUE ~ year - group)
  )


# Static model
result <- did2s(
  job_displacement_data,
  yname = "income",
  first_stage = ~ occ_score | id + year, # only time-variant covariates
  second_stage = ~ d_it,
  treatment = "d_it",
  cluster_var = "id"
)

fixest::esttable(result)


# Event Study
result <- did2s(
  job_displacement_data,
  yname = "income",
  first_stage = ~ occ_score | id + year, # only time-variant covariates
  second_stage = ~ i(d_it_k, ref = c(-3, Inf)),
  treatment = "d_it",
  cluster_var = "id"
)

# Dynamic TWFE
twfe <- feols(income ~ i(d_it_k, ref = c(-3, Inf)) + occ_score | id + year,
              data=job_displacement_data,
              cluster=c("id", "year"))

fixest::iplot(list(result, twfe), main = "Event study: Staggered treatment", 
              xlab = "Relative time to treatment", col = c("#005e73", "#00C1D4") , ref.line = -0.5)
# Legend
legend(x=5, y=30000, col = c("#005e73", "#00C1D4"), pch = c(20, 17), 
       legend = c("Two-stage estimate", "Dynamic TWFE"))
OLS estimation, Dep. Var.: income
Observations: 11,448
Standard-errors: Custom 
     Estimate Std. Error  t value  Pr(>|t|)    
d_it -5900.79    2151.88 -2.74215 0.0061132 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 15,051.3   Adj. R2: 0.006894

Thank you for your attention!