Causal Data Science for Business Analytics
Hamburg University of Technology
Monday, 1. July 2024
“Synthetic control approach developed by Abadie, Diamond, and Hainmueller (2010, 2014) and Abadie and Gardeazabal (2003) is is arguably the most important innovation in the policy evaluation literature in the last 15 years.” (Athey and Guido W. Imbens, 2017)
What if parallel trends really isn’t realistic?
Originally designed for comparative case studies:
synthetic control unit
.library(Synth)
library(SCtools)
data(basque)
dataprep.out <- dataprep(
foo = basque,
predictors = c("school.illit", "school.prim", "school.med", # define covariates: school degrees and investment as share of GDP
"school.high", "school.post.high", "invest"),
time.predictors.prior = 1964:1969, # define pre treatment period to average covariates over
special.predictors = list( # addtional predictors with special periods to average over
list("gdpcap", 1960:1969 ,"mean"), # pre-treatment outcomes - GDP
list("sec.agriculture", seq(1961, 1969, 2), "mean"), # production in different sectors
list("sec.energy", seq(1961, 1969, 2), "mean"),
list("sec.industry", seq(1961, 1969, 2), "mean"),
list("sec.construction", seq(1961, 1969, 2), "mean"),
list("sec.services.venta", seq(1961, 1969, 2), "mean"),
list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
list("popdens", 1969, "mean")), # population density
dependent = "gdpcap",
unit.variable = "regionno",
unit.names.variable = "regionname",
time.variable = "year",
treatment.identifier = 17, # treated unit
controls.identifier = c(2:16, 18), # control units
time.optimize.ssr = 1960:1969, # periods to optimize over
time.plot = 1955:1997)
synth.out = synth(dataprep.out)
path.plot(dataprep.res = dataprep.out, synth.res = synth.out,Xlab="Year",Ylab="GDP Per Capita")
abline(v=1970,lty=2,col="#005e73")
# average treatment effect on the treated over post-treatment period
mean(dataprep.out$Y1plot[16:43] - (dataprep.out$Y0plot[16:43,] %*% synth.out$solution.w))
[1] -0.5799225
Unit 1
is treated during periods \(T_0 + 1, \ldots, T_t\).Weights: \(\mathbf{W} = (w_2, \ldots, w_{N+1})'\), with \(w_i \geq 0\) and \(\sum_{i=2}^{N+1} w_i = 1\).
\(\mathbf{X_1}\) is a \(k \times 1\) vector of pre-intervention covariates for the treated unit.
\(\mathbf{X_0}\) is a \(k \times N\) matrix of pre-intervention covariates for the donor pool.
\(\mathbf{X_1}\) and \(\mathbf{X_0}\) can include (the same) covariates over different pre-intervention periods and also pre-intervention outcomes.
Find \(\mathbf{W}* = (w_2^*, \ldots, w_{N+1}^*)\) that minimizes \(\| \mathbf{X}_1 - \mathbf{X}_0 \mathbf{W} \|\), subject to the weight constraints.
Synthetic control estimator: \(\hat{\tau}_{1t} = Y_{1t} - \sum_{i=2}^{N+1} w_i^* Y_{it} \quad \forall \quad t > T_0\).
\[ \begin{equation} \left( \begin{array}{c|c} \begin{matrix} X_{11} & X_{12} & \cdots & X_{1T_0} \\ \hline X_{21} & X_{22} & \cdots & X_{2T_0} \\ \vdots & \vdots & \ddots & \vdots \\ X_{N1} & X_{N2} & \cdots & X_{NT_0} \end{matrix} & \begin{matrix} Y_{1T} \\ \hline Y_{2T} \\ \vdots \\ Y_{NT} \end{matrix} \end{array} \right) \equiv \left( \begin{array}{c|c} \begin{matrix} \mathbf{X}_{1} \\ \hline \mathbf{X}_{0} \\ \end{matrix} & \begin{matrix} Y_{1T} \\ \hline \mathbf{Y}_{0T} \\ \end{matrix} \end{array} \right) \end{equation} \]
Goal of the vertical
imputation & estimation approach:
Impute \(Y_{1T_t}(0)\) by a weighted-average of \(\mathbf{Y}_{0T_t}\).
Estimate weights such that \(\mathbf{X}_{0}\) reproduce \(\mathbf{X}_{1}\).
\[ \begin{aligned} \mathbf{W}^* = \arg\min_{\mathbf{W}} \| \sqrt{V}\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W} \|_2^2 &= \arg\min_{\mathbf{W}} [(\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})' \mathbf{V} (\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})] \\ &= \arg\min_{\mathbf{W}} \sum_{m=1}^{k} v_m \bigg( X_{1m} - \sum_{i=2}^{N+1} w_iX_{im}\bigg)^2 \\ \end{aligned} \]
\(v_m\) is the weight assigned to the \(m\)-th covariate should reflect:
its relative importance for the discrepancy between treated and control units.
its predictive power on the counterfactual outcome of the treated unit 1: \(Y_{1t}(0)\).
The synthetic control shaped by \(W^*(V^*)\) to reproduce the behavior of the outcome variable for the treated unit in the absence of the treatment.
Choice of \(V^*\) matrix is critical as it directly shapes the \(W^*\) matrix.
Cross-validation
to minimize out-of-sample prediction error:
placebo effects
in each iteration.p-Value
: proportion of placebo effects with a larger Post-MSPE/Pre-MSPE
-ratio than the actual treatment effect.library(Synth)
library(SCtools)
# generate placebo tests
placebo <- generate.placebos(dataprep.out = dataprep.out,synth.out = synth.out, strategy = "multisession")
# p-value: how extreme the treated unit’s ratio is in comparison with that of placebos.
# but filter out control units with extreme MSPE ratios
# resulting from poor fit in the pre-treatment period
mspe_test(placebo, discard.extreme = TRUE, mspe.limit = 5)
# plot placebo tests, but filter out control units with extreme MSPE ratios
# resulting from poor fit in the pre-treatment period
plot_placebos(placebo, discard.extreme = TRUE, mspe.limit = 5)
$p.val
[1] 0.8571429
$test
MSPE.ratios unit
1 399.28801 Andalucia
2 76.85500 Aragon
3 7767.20471 Principado De Asturias
4 55.67757 Canarias
5 2564.77781 Cantabria
6 470.57144 Castilla Y Leon
7 22.92823 Castilla-La Mancha
8 32.11487 Cataluna
9 114.23552 Comunidad Valenciana
10 90.72002 Galicia
11 130.86336 Murcia (Region de)
12 167.21979 Navarra (Comunidad Foral De)
13 156.15636 Rioja (La)
14 55.64514 Basque Country (Pais Vasco)
convex hull constraint
:
outcome model
and reweigh the original SC model with the new weights.doubly robust
:
\[ \begin{align*} \hat{Y}_{1T}^{\text{aug}}(0) &= \overbrace{\sum_{i=2}^{N+1} \hat{w}_i^{\text{sc}} Y_{iT}}^{\text{SC estimate}} + \overbrace{\left( \hat{m}_{1T} - \sum_{i=2}^{N+1} \hat{w}_i^{\text{sc}} \hat{m}_{iT} \right)}^{\text{imbalance correction}} \\ &= \underbrace{\hat{m}_{1T}}_{\text{outcome model}} + \underbrace{\sum_{i=2}^{N+1} \hat{w}_i^{\text{sc}} (Y_{iT} - \hat{m}_{iT})}_{\text{"IPW-like" re-weights to balance residuals}} \end{align*} \]
Four steps according to Chernozhukov, Wüthrich, Zhu (2021):
\[p(\tau_0) = \frac{1}{T} \sum_{t=1}^{T_0} 1 \left\{ \left| Y_{1T} - \tau_0 - \sum_{i=2}^{N+1} \hat{w}_i(\tau_0) Y_{iT} \right| \leq \left| Y_{1t} - \sum_{i=2}^{N+1} \hat{w}_i(\tau_0) Y_{it} \right| \right\} + \frac{1}{T}\]
\[\hat{C}_Y^{\text{conf}} = \left\{ y \in \mathbb{R} \left| \left| y - \sum_{i=2}^{N+1} \hat{w}_i (Y_{1T} - y) Y_{iT} \right| \leq q_{T, \alpha}^+ \left( \left| Y_{1t} - \sum_{i=2}^{N+1} \hat{w}_i (Y_{1t} - y) Y_{it} \right| \right) \right. \right\}\]
library(augsynth)
data(kansas)
attach(kansas)
# outcome ~ treatment | auxillary covariates
results <- augsynth(lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
log(revlocalcapita) + log(avgwklywagecapita) +
estabscapita + emplvlcapita,
unit = fips,
time = year_qtr,
data = kansas,
progfunc = "Ridge",# function to use to impute control outcomes
scm = T) # whether to use the SCM
summary(results) # summarize the results
plot(results) # plot the results
# percentage change from the logged treatment effect
(exp(-0.0609)-1)*100 = -5.9%
Call:
single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
t_int = t_int, data = data, progfunc = "Ridge", scm = ..2)
Average ATT Estimate (p Value for Joint Null): -0.0609 ( 0.14 )
L2 Imbalance: 0.054
Percent improvement from uniform weights: 86.6%
Covariate L2 Imbalance: 0.005
Percent improvement from uniform weights: 97.7%
Avg Estimated Bias: 0.027
Inference type: Conformal inference
Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
2012.25 -0.021 -0.044 0.002 0.058
2012.50 -0.047 -0.081 -0.019 0.039
2012.75 -0.050 -0.083 -0.012 0.031
2013.00 -0.045 -0.074 -0.022 0.034
2013.25 -0.055 -0.083 -0.022 0.025
2013.50 -0.071 -0.110 -0.033 0.025
2013.75 -0.058 -0.091 -0.025 0.024
2014.00 -0.081 -0.119 -0.037 0.027
2014.25 -0.078 -0.116 -0.024 0.013
2014.50 -0.065 -0.114 -0.006 0.040
2014.75 -0.057 -0.110 0.000 0.050
2015.00 -0.075 -0.124 -0.022 0.037
2015.25 -0.063 -0.106 -0.014 0.022
2015.50 -0.067 -0.106 -0.019 0.025
2015.75 -0.063 -0.101 -0.009 0.028
2016.00 -0.078 -0.127 -0.030 0.019
Allow multiple units and differential treatment timing, even of parallel trends assumption is violated.
Goal
: find a coherent way to manage multiple synthetic controls and aggregating them into a single parameter estimate.
Challenge
: balance the imperfect biases in the pooled and the separate unit-level estimates.
Solution
: Partially-pooled synthetic control (Ben-Michael, Feller & Rothstein, 2021b).
Separate
Synthetic Control: Estimate separate SC models for each treated unit.
Pooled
Synthetic Control: Minimize the average pre-treatment imbalance across all treated units.
Partially-pooled
Synthetic Control (Ben-Michael, Feller & Rothstein, 2021b): Minimize a weighted average of the two imbalances.
\[ \begin{aligned} \hat{\alpha}_j &= \frac{1}{L_j} \sum_{\ell=1}^{L_j} Y_{j, T_j - \ell} - \frac{1}{L_j} \sum_{i=1}^N \sum_{\ell=1}^{L_j} \hat{w}_{ij}^* Y_{j, T_j - \ell} \\ &= \bar{Y}_{j, T_j}^{\text{pre}} - \sum_i \hat{w}_{ij}^* \bar{Y}_{i, T_j}^{\text{pre}} \end{aligned} \]
Thank you for your attention! | |
Causal Data Science: (9) Synthetic Controls