(7) Unobserved Confounding and Instrumental Variables

Causal Data Science for Business Analytics

Christoph Ihl

Hamburg University of Technology

Monday, June 24, 2024

Partial Identification

Motivation

  • The Conditional Independence / Unconfoundedness assumption is not testable.
  • “The Law of Decreasing Credibility: The credibility of inference decreases with the strength of the assumptions maintained.” (Manski, 2003)

\[ \begin{align*} \tau_{\text{ATE}} &= \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] \\ &= \mathbb{E}_{\mathbf{X, U}}[\mathbb{E}[Y_i|T_i=1, \mathbf{X_i, U_i}] - \mathbb{E}[Y_i|T_i=0, \mathbf{X_i, U_i}]] \\ & \color{#FF7E15}{\stackrel{?}{\approx}} \mathbb{E}_{\mathbf{X}}[\mathbb{E}[Y_i|T_i=1, \mathbf{X_i}] - \mathbb{E}[Y_i|T_i=0, \mathbf{X_i}]] \end{align*} \]

  • The “questionable” approximate equality must hold in order to obtain a point estimate of the ATE.
  • Partial identification estimates the ATE under weaker assumptions, yielding a set estimate: an interval with an upper and a lower bound.
  • There is a trade-off between the strength of the assumptions and the width of the interval.

No Assumption (Worst-case) Bounds

  • Assume potential outcomes are bounded: \(y^{LB} \leq Y_i(t) \leq y^{UB}\), \(\forall t\).
    • Bounds of ITE: \(y^{LB} - y^{UB} \leq Y_i(1) - Y_i(0) \leq y^{UB} - y^{LB}\)
    • Bounds of ATE: \(y^{LB} - y^{UB} \leq \mathbb{E}[Y_i(1) - Y_i(0)] \leq y^{UB} - y^{LB}\)
    • Interval length: \(2(y^{UB} - y^{LB})\)
  • But the ATE interval length can actually be halved. How?
  • Let’s use the observational-counterfactual decomposition of the ATE: \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] \\ &= P(T_i=1)\color{#00C1D4}{\mathbb{E}[Y_i(1)|T_i=1]} + P(T_i=0)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - P(T_i=1)\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - P(T_i=0)\color{#00C1D4}{\mathbb{E}[Y_i(0)|T_i=0]}\\ &= P(T_i=1)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + P(T_i=0)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - P(T_i=1)\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - P(T_i=0)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &:= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ \end{align*} \]
  • Upper bound: \(\mathbb{E}[Y_i(1) - Y_i(0)] \leq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{y^{UB}} - p\color{#00C1D4}{y^{LB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\)
  • Lower bound: \(\mathbb{E}[Y_i(1) - Y_i(0)] \geq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{y^{LB}} - p\color{#00C1D4}{y^{UB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\)
  • Interval length: \((1-p)y^{UB} - py^{LB} - (1-p)y^{LB} + py^{UB} = y^{UB} - y^{LB}\)
  • Unfortunately, the interval always contains 0. We need more assumptions!
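  • As a quick numerical check, here is a minimal sketch on simulated data (all parameter values below are invented for illustration): the potential outcomes are truncated to a known range, so the true ATE is known and must fall inside the no-assumption interval of length \(y^{UB} - y^{LB}\).
set.seed(1)
n = 10000
U = rnorm(n)                               # unobserved confounder
T = rbinom(n, 1, plogis(U))                # treatment depends on U
Y1 = pmax(pmin(1 + U + rnorm(n), 5), -5)   # potential outcomes truncated to [-5, 5]
Y0 = pmax(pmin(U + rnorm(n), 5), -5)
Y = ifelse(T == 1, Y1, Y0)                 # observed outcome
p = mean(T); ylb = -5; yub = 5
UB = p*mean(Y[T==1]) + (1-p)*yub - p*ylb - (1-p)*mean(Y[T==0])
LB = p*mean(Y[T==1]) + (1-p)*ylb - p*yub - (1-p)*mean(Y[T==0])
c(LB = LB, trueATE = mean(Y1 - Y0), UB = UB) # interval of length yub - ylb = 10 covers the true ATE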

Monotone Treatment Response (MTR) Bounds

  • Assume that the treatment has a non-negative monotone effect on the outcome: \(Y_i(1) \geq Y_i(0)\), \(\forall i\).
    • (Works also with a non-positive monotone effect).
    • Lower bound of ITE: \(Y_i(1) - Y_i(0) \geq 0\).
    • Lower bound of ATE: \(\mathbb{E}[Y_i(1) - Y_i(0)] \geq 0\).
    • Why?
  • First use the assumption to derive the following two implications:
    • \(\mathbb{E}[Y_i(1)|T_i=0] \geq \mathbb{E}[Y_i(0)|T_i=0] = \mathbb{E}[Y_i|T_i=0]\).
    • \(-\mathbb{E}[Y_i(0)|T_i=1] \geq -\mathbb{E}[Y_i(1)|T_i=1] = -\mathbb{E}[Y_i|T_i=1]\).
  • Use the two implications to replace the counterfactuals in the observational-counterfactual decomposition to derive a lower bound of the ATE: \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ & \geq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]} \\ &= 0 \end{align*} \]
  • Can be combined with no-assumption upper bound to get a tighter interval, but it still always contains 0.

Monotone Treatment Selection (MTS) Bounds

  • Assume positive self-selection: those who generally have better outcomes self-select into treatment:
    • \(\mathbb{E}[Y_i(1)|T_i=1] \geq \mathbb{E}[Y_i(1)|T_i=0]\).
    • \(\mathbb{E}[Y_i(0)|T_i=1] \geq \mathbb{E}[Y_i(0)|T_i=0]\).
    • Upper bound of ATE is the associational difference: \(\mathbb{E}[Y_i(1) - Y_i(0)] \leq \mathbb{E}[Y_i|T_i=1] - \mathbb{E}[Y_i|T_i=0]\).
    • Why?
  • Let’s use again the observational-counterfactual decomposition of the ATE and replace: \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &\leq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=1]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=0]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= \color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - \color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ \end{align*} \]
  • Can be combined with MTR lower bound to get a tighter interval, but it still always contains 0.

Optimal Treatment Selection (OTS) Bounds 1

  • Assume individuals always receive the treatment that is best for them:
    • \(T_i = 0 \implies Y_i(0) > Y_i(1) \quad\) and \(\quad T_i = 1 \implies Y_i(1) \geq Y_i(0)\).
  • From the assumption, we know:
    • \(\mathbb{E}[Y_i(1)|T_i=0] \leq \mathbb{E}[Y_i(0)|T_i=0] = \mathbb{E}[Y_i|T_i=0] \quad\) and \(\quad \mathbb{E}[Y_i(0)|T_i=1] \leq \mathbb{E}[Y_i(1)|T_i=1] = \mathbb{E}[Y_i|T_i=1]\).
  • Therefore, we can derive an upper bound for the ATE (together with no-assumption lower bound): \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &\leq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]} - p\color{#00C1D4}{y^{LB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - p\color{#00C1D4}{y^{LB}}\\ \end{align*} \]
  • And a lower bound for the ATE (together with no-assumption lower bound): \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &\geq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{y^{LB}} - p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= (1-p)\color{#00C1D4}{y^{LB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ \end{align*} \]
  • Interval still always includes 0 and has length: \(p\mathbb{E}[Y_i|T_i=1] + (1 - p)\mathbb{E}[Y_i|T_i=0] - y^{LB}\).

Optimal Treatment Selection (OTS) Bounds 2

  • Assume individuals always receive the treatment that is best for them, but add the contrapositive:
    • \(T_i = 0 \implies Y_i(0) > Y_i(1) \quad\) Contrapositive: \(T_i = 1 \impliedby Y_i(0) \leq Y_i(1)\).
    • \(T_i = 1 \implies Y_i(1) \geq Y_i(0) \quad\) Contrapositive: \(T_i = 0 \impliedby Y_i(1) < Y_i(0)\).
  • From the above, we can derive two implications:
    • \(\mathbb{E}[Y_i(1)|T_i=0] = \mathbb{E}[Y_i(1)|Y_i(0) > Y_i(1)] \color{#00C1D4}{\leq} \mathbb{E}[Y_i(1)|Y_i(0) \leq Y_i(1)] = \mathbb{E}[Y_i(1)|T_i=1] = \mathbb{E}[Y_i|T_i=1]\)
    • \(\mathbb{E}[Y_i(0)|T_i=1] = \mathbb{E}[Y_i(0)|Y_i(1) \geq Y_i(0)] \color{#00C1D4}{<} \mathbb{E}[Y_i(0)|Y_i(1) < Y_i(0)] = \mathbb{E}[Y_i(0)|T_i=0] = \mathbb{E}[Y_i|T_i=0]\)
  • Therefore, we can derive an upper and lower bound for the ATE:

\[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#FF7E15}{\mathbb{E}[Y_i(1)|T_i=0]} - p\color{#FF7E15}{\mathbb{E}[Y_i(0)|T_i=1]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &\leq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - p\color{#00C1D4}{y^{LB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= \color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} - p\color{#00C1D4}{y^{LB}} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ \end{align*} \] \[ \begin{align*} \mathbb{E}[Y_i(1) - Y_i(0)] &\geq p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{y^{LB}} - p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]} - (1-p)\color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ &= p\color{#00C1D4}{\mathbb{E}[Y_i|T_i=1]} + (1-p)\color{#00C1D4}{y^{LB}} - \color{#00C1D4}{\mathbb{E}[Y_i|T_i=0]}\\ \end{align*} \]

  • The interval finally can, but does not have to, contain 0. Length: \((1-p)\mathbb{E}[Y_i|T_i=1] + p\mathbb{E}[Y_i|T_i=0] - y^{LB}\).

Partial Identification and Bounds: Example

  • Assess the effect of 401(k) program participation on net financial assets of 9,915 households in the US in 1991.
library(hdm) # for the data
library(drgee) # for doubly robust estimator
data(pension) # Get data
Y = pension$net_tfa # Outcome
T = pension$p401 # Treatment
X = cbind(pension$age,pension$db,pension$educ,pension$fsize,pension$hown,
          pension$inc,pension$male,pension$marr,pension$pira,pension$twoearn) # covariates 
dr = drgee(oformula = formula(Y ~ X), eformula = formula(T ~ X), elink="logit") # DR reg
ATE <- as.numeric(dr$coefficients) # ATE           
p = mean(T) # Propensity score
ymin = as.numeric(quantile(Y, probs = 0.05)) # outcome lower bound
ymax = as.numeric(quantile(Y, probs = 0.95)) # outcome upper bound
Y1 = mean(Y[T == 1]) # outcome mean for treated
Y0 = mean(Y[T == 0]) # outcome mean for untreated

# No assumption (worst case) bounds
UB = p*Y1 + (1-p)*ymax-p*ymin-(1-p)*Y0
LB = p*Y1 + (1-p)*ymin-p*ymax-(1-p)*Y0
L = UB - LB
cat(sprintf("LowerBound (worst) = %d, ATE = %d, UpperBound (worst) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))

# Monotone Treatment Response (MTR) Bounds
UB = p*Y1 + (1-p)*ymax-p*ymin-(1-p)*Y0
LB = 0
L = UB - LB
cat(sprintf("LowerBound (MTR) = %d, ATE = %d, UpperBound (worst) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))
# Monotone Treatment Selection (MTS) Bounds
UB = Y1 - Y0
L = UB - LB
cat(sprintf("LowerBound (MTR) = %d, ATE = %d, UpperBound (MTS) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))

# Optimal Treatment Selection 1 (OTS 1) Bounds
UB = p*Y1 - p*ymin
LB = (1-p)*ymin - (1-p)*Y0
L = UB - LB
cat(sprintf("LowerBound (OTS 1) = %d, ATE = %d, UpperBound (OTS 1) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))

# Optimal Treatment Selection 2 (OTS 2) Bounds
UB = Y1 - p*ymin - (1-p)*Y0
LB = p*Y1 + (1-p)*ymin - Y0
L = UB - LB
cat(sprintf("LowerBound (OTS 2) = %d, ATE = %d, UpperBound (OTS 2) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))

# Mix OTS1 (Upper) and OTS 2 (Lower) Bounds
UB = p*Y1 - p*ymin
LB = p*Y1 + (1-p)*ymin - Y0
L = UB - LB
cat(sprintf("LowerBound (OTS 2) = %d, ATE = %d, UpperBound (OTS 1) = %d, IntervalLength = %d", round(LB), round(ATE), round(UB), round(L)))
LowerBound (worst) = -28746, ATE = 11333, UpperBound (worst) = 72253, IntervalLength = 100999
LowerBound (MTR) = 0, ATE = 11333, UpperBound (worst) = 72253, IntervalLength = 72253
LowerBound (MTR) = 0, ATE = 11333, UpperBound (MTS) = 27372, IntervalLength = 27372
LowerBound (OTS 1) = -14687, ATE = 11333, UpperBound (OTS 1) = 12365, IntervalLength = 27052
LowerBound (OTS 2) = -7526, ATE = 11333, UpperBound (OTS 2) = 32575, IntervalLength = 40101
LowerBound (OTS 2) = -7526, ATE = 11333, UpperBound (OTS 1) = 12365, IntervalLength = 19890

Sensitivity Analysis

Linear Model & Single Confounder

  • \(Y_i\) as a linear function of \(T_i\), observed confounding variables \(\mathbf{X_i}\), and a single unobserved confounding variable \(U_i\):
    • \(Y_i = \tau T_i + \mathbf{\beta' X_i} + \gamma U_i + \epsilon_{Y_i}\) and assume that \(Cov(\epsilon_{Y_i},T_i) = 0\)
  • Since \(U_i\) is unobserved, we have to estimate this model:
    • \(Y_i = \tilde{\tau}T_i + \mathbf{\tilde{\beta}' X_i} + \tilde{\epsilon}_{Y_i}\)
  • How does the estimable treatment effect \(\tilde{\tau}\) differ from the true treatment effect \(\tau\)?
  • To find out, let’s apply the Frisch-Waugh-Lovell theorem to the above models to partial out the observed covariates \(\mathbf{X_i}\):
    • \(Y_i - \mathbb{E}(Y_i|\mathbf{X_i}) = \tau(T_i - \mathbb{E}(T_i|\mathbf{X_i})) + \gamma(U_i - \mathbb{E}(U_i|\mathbf{X_i})) + \epsilon_{Y_i}\)
    • \(Y_i - \mathbb{E}(Y_i|\mathbf{X_i}) = \tilde{\tau}(T_i - \mathbb{E}(T_i|\mathbf{X_i})) + \tilde{\epsilon}_{Y_i}\)
  • Obtain \(\tilde{\tau}\) and replace \(Y_i - \mathbb{E}(Y_i|\mathbf{X_i})\):

\[ \begin{align*} \tilde{\tau} &= \frac{Cov(Y_i - \mathbb{E}(Y_i|\mathbf{X_i}),\, T_i - \mathbb{E}(T_i|\mathbf{X_i}))}{Var(T_i - \mathbb{E}(T_i|\mathbf{X_i}))} = \frac{Cov(\tau(T_i - \mathbb{E}(T_i|\mathbf{X_i})) + \gamma(U_i - \mathbb{E}(U_i|\mathbf{X_i})) + \epsilon_{Y_i},\, T_i - \mathbb{E}(T_i|\mathbf{X_i}))}{Var(T_i - \mathbb{E}(T_i|\mathbf{X_i}))} \\ &= \tau \underbrace{\frac{Cov(T_i - \mathbb{E}(T_i|\mathbf{X_i}),\, T_i - \mathbb{E}(T_i|\mathbf{X_i}))}{Var(T_i - \mathbb{E}(T_i|\mathbf{X_i}))}}_{=1} + \gamma \underbrace{\frac{Cov(U_i - \mathbb{E}(U_i|\mathbf{X_i}),\, T_i - \mathbb{E}(T_i|\mathbf{X_i}))}{Var(T_i - \mathbb{E}(T_i|\mathbf{X_i}))}}_{:=\delta} + \underbrace{\frac{Cov(\epsilon_{Y_i},\, T_i - \mathbb{E}(T_i|\mathbf{X_i}))}{Var(T_i - \mathbb{E}(T_i|\mathbf{X_i}))}}_{=0} = \tau + \color{#FF7E15}{\underbrace{\gamma \delta}_{\text{Bias}}} \end{align*} \]
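  • The result can be verified numerically; a minimal sketch on simulated data (coefficients chosen arbitrarily): the short regression omitting \(U_i\) recovers \(\tau + \gamma\delta\), with \(\delta\) taken from the auxiliary regression of \(U_i\) on \(T_i\) and \(\mathbf{X_i}\).
set.seed(42)
n = 100000
X = rnorm(n)
U = 0.5*X + rnorm(n)                  # unobserved confounder, correlated with X
T = 0.8*X + 0.6*U + rnorm(n)          # treatment depends on X and U
Y = 2*T + 1*X + 3*U + rnorm(n)        # true tau = 2, gamma = 3
tau_tilde = coef(lm(Y ~ T + X))["T"]  # estimable effect with U omitted
delta = coef(lm(U ~ T + X))["T"]      # imbalance of U across T, controlling for X
c(tau_tilde = tau_tilde, tau_plus_bias = 2 + 3*delta) # approximately equal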

Omitted Confounder Bias - Interpretation

  • \(\gamma\) is the impact of the unobserved confounder \(U_i\) on the outcome \(Y_i\).
  • \(\delta\) is the impact of the treatment \(T_i\) on the unobserved confounder \(U_i\) while controlling for the observed confounders \(\mathbf{X_i}\).
    • \(\delta\) can be interpreted as imbalance in the unobserved confounder \(U_i\) across values of \(T_i\).
  • Overall bias results from an unobserved confounder’s impact on the outcome times its imbalance across treatment levels.
  • Question: How strong does the bias of an unobserved confounder have to be to invalidate the treatment effect estimate?
  • Answers to this question can be visualized by a contour plot of the bias \(\gamma \delta\) as a function of \(\gamma\) and \(\delta\).

Omitted Confounder Bias - Contour Plot

  • Hypothetical example: estimated treatment effect unadjusted for the unobserved confounder \(\tilde{\tau} = 25\).

  • Levels of bias (contour lines) diminish the estimated \(\tilde{\tau}\), each implying a different true \(\tau\).

  • Benchmark covariates \(X_b\) for comparison.
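  • Such a contour plot can be sketched in a few lines of base R (a minimal sketch; the grid ranges below are made up and do not reproduce the original slide figure):
gamma = seq(0.1, 10, length.out = 100)  # impact of U on Y
delta = seq(0.1, 10, length.out = 100)  # imbalance of U across T
bias  = outer(gamma, delta)             # bias = gamma * delta on the grid
contour(gamma, delta, bias, levels = c(5, 10, 25, 50),
        xlab = "gamma", ylab = "delta") # the 25-contour marks confounders strong enough to explain away tau-tilde = 25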

Recent Extensions to More General Settings

  1. Cinelli & Hazlett (2020):
  • Approach:
    • Reparameterize the bias terms with scale-free partial \(R^2\) measures:
      • \(\gamma\): \(R^2\) to assess the strength of association between \(\mathbf{U_i}\) and \(Y_i\) while controlling for \(\mathbf{X_i}\).
      • \(\delta\): \(R^2\) to assess the strength of association between \(\mathbf{U_i}\) and \(T_i\) while controlling for \(\mathbf{X_i}\).
    • Derive two new sensitivity measures:
      • Robustness Value (RV): minimum strength of association that \(\mathbf{U_i}\) must have with both \(T_i\) and \(Y_i\) to explain away the estimated \(\tilde{\tau}\).
      • \(\color{#00C1D4}{R^2_{Y \sim T|\mathbf{X}}}\): an extreme confounder \(U^E_i\) that explains 100% of the residual variance of \(Y_i\) (i.e. \(R^2_{Y \sim U^E|T,\mathbf{X}} = 1\)) must explain at least \(R^2_{T \sim U^E|\mathbf{X}} = R^2_{Y \sim T|\mathbf{X}}\) of the residual variance of \(T_i\) to fully explain away the estimated \(\tilde{\tau}\).
  • Key Advantages:

    • No assumptions on functional form of the treatment mechanism or the distribution of unobserved confounders.

    • Handles multiple confounders that may interact with the treatment and outcome in non-linear ways.

    • Benchmark the strength of confounders based on comparisons with observed covariates.

  • Implementation: R package “sensemakr”.

Recent Extensions to More General Settings

  2. Chernozhukov, Cinelli, et al. (2021-2024):
  • Approach:
    • Using Riesz representers to derive influence functions for causal parameters in non-parametric settings.
  • Key Advantages:
    • Applicable in general nonparametric models without stringent assumptions about functional forms or distributions.
    • Both treatment mechanism and outcome mechanism can be modeled with arbitrary machine learning models.
    • Extends to a broad range of causal parameters (also from AIPW, IV, DiD models).
  • Implementation: R package “dml.sensemakr”.

Sensitivity Analysis: Example

  • Assess the effect of 401(k) program participation on net financial assets of 9,915 households in the US in 1991.
library(hdm) # for the data
library(sensemakr) # load sensemakr package
data(pension) # Get data

# runs conditional outcome regression model
model <- lm(net_tfa ~ p401 + age + db + educ + fsize + hown + inc + 
            male + marr + pira + twoearn, data = pension)

# runs sensemakr for sensitivity analysis
sensitivity <- sensemakr(model = model, treatment = "p401",
                         benchmark_covariates = c("inc"), kd=1:3)

# plot 
# plot(sensitivity)

# short description of results
sensitivity
Sensitivity Analysis to Unobserved Confounding

Model Formula: net_tfa ~ p401 + age + db + educ + fsize + hown + inc + male + 
    marr + pira + twoearn

Null hypothesis: q = 1 and reduce = TRUE 

Unadjusted Estimates of ' p401 ':
  Coef. estimate: 11590.38 
  Standard Error: 1345.253 
  t-value: 8.61577 

Sensitivity Statistics:
  Partial R2 of treatment with outcome: 0.00744 
  Robustness Value, q = 1 : 0.08291 
  Robustness Value, q = 1 alpha = 0.05 : 0.06468 

For more information, check summary.

Instrumental Variables

What is an Instrumental Variable?

  • Assumptions:
    1. Relevance: \(Z\) is significantly correlated with \(T\), i.e. \(Cov(Z, T) \neq 0\).
      Path 1 must exist.
    2. Exclusion Restriction: \(Z\) affects \(Y\) only through \(T\).
      A direct path 2 must not exist.
    3. Unconfoundedness (Exogeneity, Validity): \(Z\) is independent of \(U\), i.e. \(Cov(Z, \epsilon_Y) = 0\).
      Conditioning on \(\mathbf{X}\) required in some contexts (will cover this later).
      Path 3 must not exist.
  • Even with these assumptions fulfilled, there is no nonparametric identification of the ATE.
    • The backdoor path \(T \leftarrow U \rightarrow Y\) cannot be blocked.
  • Two identification approaches:
    1. Parametric assumption (i.e. linearity):
      • Identification of homogeneous treatment effect.
    2. No parametric assumption, but a monotonicity assumption:
      • Nonparametric identification of Local Average Treatment Effect (LATE) instead of ATE.

Where do Good IVs come from?

  1. Lotteries - purely random:
    • By the researcher: randomized experiments with respect to the instrument not the treatment itself.
      • Encouragement designs: random assignment of invitations or incentives to participate in a program.
    • Sometimes also institutionalized: e.g. draft lotteries in sports or the military, school assignment, prize lotteries among customers as a marketing device.
  2. Natural Experiments - as-good-as-random:
    • Random conditional on some covariates, i.e. relying on a selection-on-observables assumption for the IV instead of the treatment.
    • Changes in policy, regulation or law applicable to defined subpopulation.
    • Variation in decision makers, evaluators, judges.
    • Economic shocks.
    • Historical events.
    • Changes in weather conditions.
    • Variation in geographical distance.

Binary Linear Setting

  • Additional Assumptions:
    • Binary \(Z_i\) and \(T_i\) and linear outcome model with exclusion restriction for \(Z_i\): \(Y_i = \beta_0 + \tau T_i + \beta_U U_i + \epsilon_i\).
  • Let us derive the Wald Estimand for the treatment effect in this IV setting in the following steps:
    1. Start by the associational difference.
    2. Use linearity of expectations and instrumental unconfoundedness assumption - rearrange.
    3. Use instrumental unconfoundedness assumption again.

\[ \begin{align*} \mathbb{E}[Y_i | Z_i = 1] &- \mathbb{E}[Y_i | Z_i = 0] = \mathbb{E}[\beta_0 + \tau T_i + \beta_U U_i + \epsilon_i | Z_i = 1] - \mathbb{E}[\beta_0 + \tau T_i + \beta_U U_i + \epsilon_i | Z_i = 0] \\ &= \beta_0 + \tau \mathbb{E}[T_i | Z_i = 1] + \beta_U \mathbb{E}[U_i | Z_i = 1] + \mathbb{E}[\epsilon_i | Z_i = 1] - \beta_0 - \tau \mathbb{E}[T_i | Z_i = 0] - \beta_U \mathbb{E}[U_i | Z_i = 0] - \mathbb{E}[\epsilon_i | Z_i = 0] \\ &= \tau (\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]) + \beta_U (\mathbb{E}[U_i | Z_i = 1] - \mathbb{E}[U_i | Z_i = 0]) \\ &= \tau (\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]) + \beta_U (\mathbb{E}[U_i] - \mathbb{E}[U_i]) = \tau (\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]) \end{align*} \]

  • Solve for \(\tau\) to get the Wald Estimand:

\[ \tau = \frac{\mathbb{E}[Y_i | Z_i = 1] - \mathbb{E}[Y_i | Z_i = 0]}{\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]} \]

  • Sample version, i.e. Wald Estimator:

\[ \hat{\tau} = \frac{\dfrac{\sum_{i=1}^n Y_i Z_i}{\sum_{i=1}^n Z_i} - \dfrac{\sum_{i=1}^n Y_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}}{\dfrac{\sum_{i=1}^n T_i Z_i}{\sum_{i=1}^n Z_i} - \dfrac{\sum_{i=1}^n T_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}} \]

  • Relevance assumption ensures that denominator is not zero.
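  • As a sanity check, a minimal sketch on simulated data (all parameters invented): with a randomly assigned binary \(Z_i\) and the linear outcome model above, the Wald estimator recovers \(\tau\) even though \(T_i\) is confounded by \(U_i\).
set.seed(7)
n = 100000
U = rnorm(n)                               # unobserved confounder
Z = rbinom(n, 1, 0.5)                      # randomly assigned binary instrument
T = rbinom(n, 1, plogis(-0.5 + 1.5*Z + U)) # treatment depends on Z and U
Y = 2*T + 1.5*U + rnorm(n)                 # true tau = 2
(mean(Y[Z==1]) - mean(Y[Z==0])) / (mean(T[Z==1]) - mean(T[Z==0])) # approx. 2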

Continuous Linear Setting

  • Additional Assumptions:
    • Continuous \(Z_i\) and \(T_i\), and a linear outcome model with exclusion restriction for \(Z_i\): \(Y_i = \beta_0 + \tau T_i + \beta_U U_i + \epsilon_i\).
  • Let us derive the Wald Estimand for the treatment effect in this IV setting in the following steps:
    1. Start with the classic covariance identity.
    2. Use linearity of expectations and instrumental unconfoundedness - rearrange.
    3. Use covariance identity and instrumental unconfoundedness assumption again.

\[ \begin{align*} Cov(Y_i, Z_i) &= \mathbb{E}[Y_i Z_i] - \mathbb{E}[Y_i] \mathbb{E}[Z_i] = \mathbb{E}[(\beta_0 + \tau T_i + \beta_U U_i + \epsilon_i) Z_i] - \mathbb{E}[\beta_0 + \tau T_i + \beta_U U_i + \epsilon_i] \mathbb{E}[Z_i] \\ &= \beta_0\mathbb{E}[Z_i] + \tau\mathbb{E}[T_iZ_i] + \beta_U \mathbb{E}[U_iZ_i] + \mathbb{E}[\epsilon_iZ_i] - \beta_0\mathbb{E}[Z_i] - \tau\mathbb{E}[T_i]\mathbb{E}[Z_i] - \beta_U\mathbb{E}[U_i]\mathbb{E}[Z_i] - \mathbb{E}[\epsilon_i]\mathbb{E}[Z_i]\\ &= \tau\mathbb{E}[T_iZ_i] + \beta_U \mathbb{E}[U_iZ_i] - \tau\mathbb{E}[T_i]\mathbb{E}[Z_i] - \beta_U\mathbb{E}[U_i]\mathbb{E}[Z_i] = \tau(\mathbb{E}[T_iZ_i] - \mathbb{E}[T_i]\mathbb{E}[Z_i]) + \beta_U (\mathbb{E}[U_iZ_i] - \mathbb{E}[U_i]\mathbb{E}[Z_i])\\ &= \tau Cov(T_i, Z_i) + \beta_U Cov(U_i, Z_i) = \tau Cov(T_i, Z_i) \end{align*} \]

  • Solve for \(\tau\) to get the Wald Estimand:

\[ \tau = \frac{Cov(Y_i, Z_i)}{Cov(T_i, Z_i)} \]

  • Sample version, i.e. Wald Estimator:

\[ \hat{\tau} = \frac{\frac{1}{n}\sum_{i=1}^n Y_i Z_i - \bar{Y} \bar{Z}}{\frac{1}{n}\sum_{i=1}^n T_i Z_i - \bar{T} \bar{Z}} \]

  • Relevance assumption ensures that denominator is not zero.
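  • Without covariates, this is just a ratio of sample covariances; on the pension data from the examples in this deck, the one-liner below reproduces the 2SLS/Wald estimate reported later (about 27763).
library(hdm) # for the data
data(pension) # Get data
cov(pension$net_tfa, pension$e401) / cov(pension$p401, pension$e401) # Wald estimand as covariance ratio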

Two-Stage Least Squares Estimator (2SLS)

  1. First stage: Linearly regress \(T_i\) on \(Z_i\) to estimate \(\mathbb{E}[T_i|Z_i]\). This gives the projection of \(T_i\) onto \(Z_i\): \(\hat{T}_i\).

  2. Second stage: Linearly regress \(Y_i\) on \(\hat{T}_i\) to estimate \(\mathbb{E}[Y_i|\hat{T}_i]\). Obtain the estimate \(\hat{\tau}\) as the fitted coefficient of \(\hat{T}_i\).

  • Also works as an estimator in the binary setting.
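  • The two stages can also be run by hand with two lm() calls; a minimal sketch on the pension data (point estimate only: the second-stage standard errors from lm() are invalid, which is why dedicated routines such as ivreg are used in practice).
library(hdm) # for the data
data(pension) # Get data
stage1 = lm(p401 ~ e401, data = pension)     # Stage 1: project T onto Z
pension$T_hat = fitted(stage1)               # fitted treatment values
stage2 = lm(net_tfa ~ T_hat, data = pension) # Stage 2: regress Y on T-hat
coef(stage2)["T_hat"]                        # matches the ivreg/Wald estimate (~27763)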

Local Average Treatment Effect

Stratification of Data

  • Define potential treatments conditional on \(Z_i\): \(T_i(0)\) if \(Z_i = 0\) and \(T_i(1)\) if \(Z_i = 1\).
  • Principal strata:
    • Compliers always take the treatment that they’re encouraged to take: \(T_i(1) = 1\) and \(T_i(0) = 0\).
    • Always-Takers always take the treatment, regardless of encouragement: \(T_i(1) = 1\) and \(T_i(0) = 1\).
    • Never-Takers never take the treatment, regardless of encouragement: \(T_i(1) = 0\) and \(T_i(0) = 0\).
    • Defiers always take the opposite treatment that they’re encouraged to take: \(T_i(1) = 0\) and \(T_i(0) = 1\).
  • Causal graph for compliers and defiers:

  • Causal graph for always-takers and never-takers:

  • But can’t identify what strata a given unit is in: e.g. \(Z_i = 0\) & \(T_i = 0\) could be compliers or never-takers; etc.

LATE: Definition & Identification

  • Instead of nonparametrically identifying the ATE, it is only possible to nonparametrically identify the LATE:

Definition: “Local Average Treatment Effect (LATE) / Complier Average Causal Effect (CACE)”

\(\mathbb{E}[Y_i(T_i=1) - Y_i(T_i=0) | T_i(Z_i=1) = 1, T_i(Z_i=0) = 0]\)

  • Instead of the linearity assumption, we need the monotonicity assumption:

Assumption: “Monotonicity”

\(\forall i, T_i(Z_i=1) \geq T_i(Z_i=0)\)

  • Monotonicity implies that there are no defiers in the population.

Theorem: “LATE Nonparametric Identification”

  • Given that \(Z_i\) is an instrument, \(Z_i\) and \(T_i\) are binary variables, and that monotonicity holds, the following is true:

\(\mathbb{E}[Y_i(1) - Y_i(0) | T_i(1) = 1, T_i(0) = 0] = \frac{\mathbb{E}[Y_i | Z_i = 1] - \mathbb{E}[Y_i | Z_i = 0]}{\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]}\)

  • Numerator is the reduced-form intention-to-treat (ITT) effect of \(Z_i\) on \(Y_i\); denominator is the first-stage effect, i.e. the complier share.
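  • A minimal sketch on simulated data (strata shares and effects invented) illustrates what “local” means: with heterogeneous effects, the Wald ratio recovers the complier effect, not the ATE.
set.seed(11)
n = 100000
stratum = sample(c("complier","always","never"), n, replace = TRUE, prob = c(.5,.25,.25))
Z = rbinom(n, 1, 0.5)                                                # random encouragement
T = ifelse(stratum == "always", 1, ifelse(stratum == "never", 0, Z)) # no defiers (monotonicity)
Y = ifelse(stratum == "complier", 2, 5) * T + rnorm(n)               # complier effect = 2, ATE = 3.5
(mean(Y[Z==1]) - mean(Y[Z==0])) / (mean(T[Z==1]) - mean(T[Z==0]))    # approx. 2, the LATE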

LATE: Proof 1

  • Start with causal effect of \(Z_i\) on \(Y_i\) and decompose it into weighted stratum-specific causal effects:

\[ \begin{align*} \mathbb{E}[Y_i(Z_i=1) - Y_i(Z_i=0)] &= \mathbb{E}[Y_i(Z_i = 1) - Y_i(Z_i = 0) \mid T_i(1) = 1, T_i(0) = 0]P(T_i(1) = 1, T_i(0) = 0) \quad \text{(compliers)} \\ &+ \mathbb{E}[Y_i(Z_i = 1) - Y_i(Z_i = 0) \mid T_i(1) = 0, T_i(0) = 1]P(T_i(1) = 0, T_i(0) = 1) \quad \text{(defiers)}\\ &+ \mathbb{E}[Y_i(Z_i = 1) - Y_i(Z_i = 0) \mid T_i(1) = 1, T_i(0) = 1]P(T_i(1) = 1, T_i(0) = 1) \quad \text{(always-takers)}\\ &+ \mathbb{E}[Y_i(Z_i = 1) - Y_i(Z_i = 0) \mid T_i(1) = 0, T_i(0) = 0]P(T_i(1) = 0, T_i(0) = 0) \quad \text{(never-takers)}\\ \end{align*} \]

  • Solve for the effect of \(Z_i\) on \(Y_i\) among compliers:

\[ \begin{align*} \mathbb{E}[Y_i(Z_i = 1) - Y_i(Z_i = 0) \mid T_i(1) = 1, T_i(0) = 0] = \frac{\mathbb{E}[Y_i(Z_i=1) - Y_i(Z_i=0)]}{P(T_i(1) = 1, T_i(0) = 0)} \end{align*} \]

  • Compliers always take the treatment they are encouraged to take, so we can replace \(Z_i\) with \(T_i\) in the potential outcomes; then apply the instrumental unconfoundedness assumption to identify the numerator:

\[ \begin{align*} \mathbb{E}[Y_i(T_i = 1) - Y_i(T_i = 0) \mid T_i(1) = 1, T_i(0) = 0] = \frac{\mathbb{E}[Y_i(Z_i=1) - Y_i(Z_i=0)]}{P(T_i(1) = 1, T_i(0) = 0)} = \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{P(T_i(1) = 1, T_i(0) = 0)} \end{align*} \]

LATE: Proof 2

  • To identify the denominator (the probability of being a complier), take everyone (probability 1) and subtract always-takers and never-takers, since there are no defiers due to monotonicity:

\[ \begin{align*} \mathbb{E}[Y_i(T_i = 1) - Y_i(T_i = 0) \mid T_i(1) = 1, T_i(0) = 0] &= \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{P(T_i(1) = 1, T_i(0) = 0)} \\ &= \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{1 - P(T_i = 0 | Z_i = 1) - P(T_i = 1 | Z_i = 0)} \\ \end{align*} \]

\[ \begin{align*} \mathbb{E}[Y_i(T_i = 1) - Y_i(T_i = 0) \mid T_i(1) = 1, T_i(0) = 0] &= \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{1 - (1 - P(T_i = 1 | Z_i = 1)) - P(T_i = 1 | Z_i = 0)} \\ &= \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{P(T_i = 1 | Z_i = 1) - P(T_i = 1 | Z_i = 0)} \\ \end{align*} \]

  • Finally, because \(T_i\) is a binary variable, we can swap probabilities of \(T_i = 1\) for expectations:

\[ \begin{align*} \mathbb{E}[Y_i(T_i = 1) - Y_i(T_i = 0) \mid T_i(1) = 1, T_i(0) = 0] = \frac{\mathbb{E}[Y_i|Z_i=1] - \mathbb{E}[Y_i|Z_i=0]}{\mathbb{E}[T_i | Z_i = 1] - \mathbb{E}[T_i | Z_i = 0]} \\ \end{align*} \]

IV Estimation: Example

  • Assess the effect of 401(k) program participation on net financial assets of 9,915 households in the US in 1991.
library(hdm) # for the data
data(pension) # Get data
Y = pension$net_tfa # Outcome
Z = pension$e401 # Instrument
T = pension$p401 # Treatment
ITT=mean(Y[Z==1])-mean(Y[Z==0])   # estimate intention-to-treat effect (ITT)
first=mean(T[Z==1])-mean(T[Z==0]) # estimate first stage effect (complier share)
LATE=ITT/first                    # compute LATE
ITT; first; LATE                  # show ITT, first stage effect, and LATE
[1] 19559.34
[1] 0.7045084
[1] 27763.11
library(hdm) # for the data
library(AER) # load AER package for ivreg (2SLS)
data(pension) # Get data
Y = pension$net_tfa # Outcome
Z = pension$e401 # Instrument
T = pension$p401 # Treatment

LATE=ivreg(Y~T|Z)                 # run two stage least squares regression
summary(LATE,vcov = vcovHC)       # results with heteroscedasticity-robust se

Call:
ivreg(formula = Y ~ T | Z)

Residuals:
    Min      1Q  Median      3Q     Max 
-513090  -15011  -10788   -1624 1498247 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10788.0      690.6   15.62   <2e-16 ***
T            27763.1     1985.4   13.98   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 62380 on 9913 degrees of freedom
Multiple R-Squared: 0.03586,    Adjusted R-squared: 0.03577 
Wald test: 195.5 on 1 and 9913 DF,  p-value: < 2.2e-16 

Instrumental Variable Estimation with Double Machine Learning

Motivation: Control for Covariates

  • In many applications, it may not be credible that IV assumptions like random assignment hold unconditionally, i.e. without controlling for observed covariates.
    • Natural experiments vs. purely randomized encouragement design.
    • Problematic if covariates are confounders between instrument and outcome.
    • E.g. geographic proximity to college as an IV when assessing the effect of education (treatment) on earnings (outcome). Pros & cons of this IV?
  • 2SLS can be extended with covariates, but the linear specification for the covariates then has to be correct.

Partially Linear IV Model

  • \(T_i\) is additively separable and we require conditional unconfoundedness of the instrument \(Z_i\): \[\begin{align}\begin{aligned}Y_i = \tau T_i + g(\mathbf{X_i}) + \epsilon_{Y_i}, & &\mathbb{E}(\epsilon_{Y_i} | Z_i,\mathbf{X_i}) = 0 \\Z_i = h(\mathbf{X_i}) + \epsilon_{Z_i}, & &\mathbb{E}(\epsilon_{Z_i} | \mathbf{X_i}) = 0\end{aligned}\end{align}\]

  • Robinson (1988)-style / partialling-out version of the Wald estimand:

    • \(\tau\) is identified by using residuals of the predicted instrument as instrument for the residual-on-residual regression:

\[\tau = \frac{\mathbb{E}[(Y_i - \mu(\mathbf{X_i})) (Z_i - h(\mathbf{X_i}))]}{\mathbb{E}[(T_i - e(\mathbf{X_i})) (Z_i - h(\mathbf{X_i}))]} = \frac{\text{Cov}[(Y_i - \mu(\mathbf{X_i})), (Z_i - h(\mathbf{X_i}))]}{\text{Cov}[(T_i - e(\mathbf{X_i})), (Z_i - h(\mathbf{X_i}))]}\]

Partially Linear IV Model

  • Nuisance parameters: \(\quad \mu(\mathbf{X_i}) = \mathbb{E}[Y_i \mid \mathbf{X_i}] \quad \quad \quad e(\mathbf{X_i}) = \mathbb{E}[T_i \mid \mathbf{X_i}] \quad \quad \quad h(\mathbf{X_i}) = \mathbb{E}[Z_i \mid \mathbf{X_i}]\)


  • DML recipe: we need a moment condition of a Neyman-orthogonal score with the estimand as solution:

\[ \begin{align} \mathbb{E} [ ( Y_i - \mu(\mathbf{X_i}) - \tau (T_i - e(\mathbf{X_i})) ) (Z_i - h(\mathbf{X_i})) ] &= 0 \\ \mathbb{E} \left[ (Y_i - \mu(\mathbf{X_i}))(Z_i - h(\mathbf{X_i})) - \tau (T_i - e(\mathbf{X_i}))(Z_i - h(\mathbf{X_i})) \right] &= 0 \\ \tau \mathbb{E} [ \underbrace{(-1)(T_i - e(\mathbf{X_i}))(Z_i - h(\mathbf{X_i}))}_{\psi_a} ] + \mathbb{E} [ \underbrace{(Y_i - \mu(\mathbf{X_i}))(Z_i - h(\mathbf{X_i}))}_{\psi_b} ] &= 0 \end{align} \]
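  • A minimal sketch of this residual-on-residual estimand on the pension data, with random forests for the three nuisance functions (for brevity it skips the cross-fitting that DML proper requires, so it is illustrative only; the DoubleML package used in the final example automates the full recipe):
library(hdm)    # for the data
library(ranger) # random forests
data(pension)
X = pension[, c("age","inc","educ","fsize","marr","twoearn","db","pira","hown")]
resid_of = function(v) v - predict(ranger(y = v, x = X), data = X)$predictions
rY = resid_of(pension$net_tfa) # Y - mu(X)
rT = resid_of(pension$p401)    # T - e(X)
rZ = resid_of(pension$e401)    # Z - h(X)
sum(rY * rZ) / sum(rT * rZ)    # partialling-out Wald estimand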

Interactive (AIPW) IV Model

  • Relaxing homogeneous treatment assumption, but we require conditional unconfoundedness of \(Z_i\): \[\begin{align}\begin{aligned}Y_i = g(T_i, \mathbf{X_i}) + \epsilon_{Y_i}, & &\mathbb{E}(\epsilon_{Y_i} | Z_i,\mathbf{X_i}) = 0 \\Z_i = h(\mathbf{X_i}) + \epsilon_{Z_i}, & &\mathbb{E}(\epsilon_{Z_i} | \mathbf{X_i}) = 0\end{aligned}\end{align}\]


  • Based on Frölich (2007), generalizing the Wald estimator of the LATE to the case with confounders.
    • ATE of \(Z_i\) on \(Y_i\) (reduced form, intention-to-treat effect) divided by ATE of \(Z_i\) on \(T_i\) (first stage / complier share).

\[\tau_{\text{LATE}} = \frac{\mathbb{E}\left[\mu(1, \mathbf{X_i}) - \mu(0, \mathbf{X_i}) + \frac{Z_i(Y_i - \mu(1, \mathbf{X_i}))}{h(\mathbf{X_i})} - \frac{(1-Z_i)(Y_i - \mu(0, \mathbf{X_i}))}{1-h(\mathbf{X_i})} \right]}{\mathbb{E}\left[e(1, \mathbf{X_i}) - e(0, \mathbf{X_i}) + \frac{Z_i(T_i - e(1, \mathbf{X_i}))}{h(\mathbf{X_i})} - \frac{(1-Z_i)(T_i - e(0, \mathbf{X_i}))}{1-h(\mathbf{X_i})} \right]}\]

Interactive (AIPW) IV Model

  • Nuisance parameters: \(\quad \mu(Z_i, \mathbf{X_i}) = \mathbb{E}[Y_i \mid Z_i, \mathbf{X_i}] \quad \quad \quad e(Z_i, \mathbf{X_i}) = P[T_i = 1 \mid Z_i, \mathbf{X_i}] \quad \quad \quad h(\mathbf{X_i}) = P[Z_i = 1 \mid \mathbf{X_i}]\)


  • DML recipe: we need a moment condition of a Neyman-orthogonal score with the estimand as solution:

\[ \begin{align} \mathbb{E}[\psi_b] + \mathbb{E}[\psi_a] \cdot \tau_{\text{LATE}} = &\mathbb{E}\bigg[\mu(1, \mathbf{X_i}) - \mu(0, \mathbf{X_i}) + \frac{Z_i(Y_i - \mu(1, \mathbf{X_i}))}{h(\mathbf{X_i})} - \frac{(1-Z_i)(Y_i - \mu(0, \mathbf{X_i}))}{1-h(\mathbf{X_i})} \bigg] \\ &+ \mathbb{E}\bigg[(-1)\left[e(1, \mathbf{X_i}) - e(0, \mathbf{X_i}) + \frac{Z_i(T_i - e(1, \mathbf{X_i}))}{h(\mathbf{X_i})} - \frac{(1-Z_i)(T_i - e(0, \mathbf{X_i}))}{1-h(\mathbf{X_i})}\right] \bigg] \cdot \tau_{\text{LATE}} = 0 \end{align} \]

IV Estimation with DML-AIPW: Example

  • Assess the effect of 401(k) program participation on net financial assets of 9,915 households in the US in 1991.
  • Participation in 401(k) program is not random, but influenced by income together with unobserved saving preferences.
  • Eligibility for 401(k) program can serve as an instrument, but it is not purely random:
    • Employers differ in leniency to offer a 401(k) program to their employees.
    • Wealthy companies are more likely to offer a 401(k) program and pay higher income.
    • Employees have chosen their employer based on their income/saving preferences.
# Load required packages
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)

# suppress messages during fitting
lgr::get_logger("mlr3")$set_threshold("warn")

# load data as a data.table
data = fetch_401k(return_type = "data.table", instrument = TRUE)

# Set up basic model: Specify variables for data-backend
features_base = c("age", "inc", "educ", "fsize","marr", "twoearn", "db", "pira", "hown")

# Initialize DoubleMLData (data-backend of DoubleML)
data_dml_base = DoubleMLData$new(data,
                                 y_col = "net_tfa", # outcome variable
                                 d_cols = "p401", # treatment variable
                                 x_cols = features_base, # covariates
                                 z_cols = "e401") # instrument

# Initialize random forest learners
randomForest = lrn("regr.ranger")
randomForest_class = lrn("classif.ranger")

# Random Forest
set.seed(123)
dml_iivm_forest = DoubleMLIIVM$new(data_dml_base,
                              ml_g = randomForest,
                              ml_m = randomForest_class,
                              ml_r = randomForest_class,
                              n_folds = 3, 
                              score = "LATE", # only choice for Interactive IV models
                              trimming_threshold = 0.01,
                              subgroups = list(always_takers = FALSE, # not in sample: no participation w/o eligibility.
                                               never_takers = TRUE))

# Set nuisance-part specific parameters
dml_iivm_forest$set_ml_nuisance_params(
    "ml_g0", "p401", 
    list(max.depth = 6, mtry = 4, min.node.size = 7))  # mu(Z=0,X) = E[Y | Z=0, X]
dml_iivm_forest$set_ml_nuisance_params(
    "ml_g1", "p401", 
    list(max.depth = 6, mtry = 3, min.node.size = 5)) # mu(Z=1,X) = E[Y | Z=1, X]
dml_iivm_forest$set_ml_nuisance_params(
    "ml_m", "p401", 
    list(max.depth = 6, mtry = 3, min.node.size = 6)) # h(X) = P(Z=1 | X)
dml_iivm_forest$set_ml_nuisance_params(
    "ml_r1", "p401", 
    list(max.depth = 4, mtry = 7, min.node.size = 6)) # e(1,X) = P(T=1 | Z=1, X)

dml_iivm_forest$fit()
dml_iivm_forest$summary()
Estimates and significance testing of the effect of target variables
     Estimate. Std. Error t value Pr(>|t|)    
p401     11694       1603   7.294    3e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thank you for your attention!