Causal Data Science for Business Analytics
Hamburg University of Technology
Monday, 24 June 2024
A directed cycle or feedback loop: \(B \to C \to D \to B\).

\(P_{joint} = P(x_1)P(x_2 \mid x_1)P(x_3 \mid x_2, x_1)P(x_4 \mid x_3)\)

\(P_{joint} = P(x_1)P(x_2)P(x_3 \mid x_1)P(x_4 \mid x_3)\)
Given a probability distribution and a corresponding DAG, we can formalize the specification of local (in)dependencies with:
Assumption 2.1: “Local Markov Assumption”
Given its parents in the DAG, a node X is independent of all its non-descendants.
It follows:
Definition 2.1: “Bayesian Network Factorization”
Given a probability distribution \(P\) and a DAG \(G\), \(P\) factorizes according to \(G\) if:
\(P(x_1, x_2, \ldots, x_n) = \prod_{i} P(x_i \mid pa_i)\)
with \(pa_i\) denoting the parents of node \(i\) in \(G\).
Then \(P\) and \(G\) are called Markov compatible.
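The factorization can be verified numerically. Below is a minimal sketch for a hypothetical binary chain \(x_1 \rightarrow x_2 \rightarrow x_3\) (all probability values are made up for illustration); multiplying the local conditionals yields a proper joint distribution:

```r
# Hypothetical binary chain x1 -> x2 -> x3; all probabilities are made-up values
p_x1 <- c(0.6, 0.4)                      # P(x1 = 0), P(x1 = 1)
p_x2_given_x1 <- rbind(c(0.8, 0.2),      # P(x2 | x1 = 0)
                       c(0.3, 0.7))      # P(x2 | x1 = 1)
p_x3_given_x2 <- rbind(c(0.9, 0.1),      # P(x3 | x2 = 0)
                       c(0.5, 0.5))      # P(x3 | x2 = 1)

# Bayesian network factorization: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x2)
joint <- array(0, dim = c(2, 2, 2))
for (i in 1:2) for (j in 1:2) for (k in 1:2) {
  joint[i, j, k] <- p_x1[i] * p_x2_given_x1[i, j] * p_x3_given_x2[j, k]
}
sum(joint)  # a valid joint distribution sums to 1
```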
Assumption 2.2: “Minimality Assumption”
The local Markov assumption holds, and adjacent nodes in the DAG are dependent (no edge can be dropped without changing the implied independencies).
We need a further assumption to go from associations to causal relationships in a DAG:
Definition 2.2: “What is a cause?”
A variable X is said to be a cause of a variable Y if Y can change in response to changes in X.
An outcome variable Y listens to X.
Assumption 2.3: “(Strict) Causal Edge Assumption”
In a directed graph, every parent is a direct cause of all its children.
This assumption is “strict” in the sense that every edge is active, just like in DAGs that satisfy minimality.
Two unconnected nodes: \(P(x_1, x_2) = P(x_1) P(x_2)\)
Two connected nodes: \(P(x_1, x_2) = P(x_1) P(x_2 \mid x_1)\)
Chain: \(x_1 \rightarrow x_2 \rightarrow x_3\)
Fork: \(x_1 \leftarrow x_2 \rightarrow x_3\)
Immorality: \(x_1 \rightarrow x_2 \leftarrow x_3\)
Chain: \(x_1\) and \(x_3\) are associated through \(x_2\).
“Local Markov Assumption”: we can block the associative path by conditioning on the parent \(x_2\), i.e. \(x_1 \perp\!\!\!\perp x_3 \mid x_2\).
“Bayesian network factorization” of chains: \(P(x_1, x_2, x_3) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_2)\)
“Bayes' rule”: \(P(x_1, x_3 \mid x_2) = \frac{P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_2)}{P(x_2)}\)
“Bayes' rule” twice more: \(P(x_1) P(x_2 \mid x_1) = P(x_1, x_2) = P(x_2) P(x_1 \mid x_2)\), so \(P(x_1, x_3 \mid x_2) = P(x_1 \mid x_2) P(x_3 \mid x_2)\).
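This conditional independence can be checked with a quick simulation (a sketch; the linear-Gaussian chain below is an assumed example DGP). Conditioning on \(x_2\) is implemented by correlating the residuals of \(x_1\) and \(x_3\) after regressing both on \(x_2\):

```r
set.seed(1)
n <- 1e5
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)   # x2 listens to x1
x3 <- x2 + rnorm(n)   # x3 listens to x2

cor(x1, x3)           # clearly positive: x1 and x3 are associated

# conditioning on x2: partial correlation via residuals
r1 <- resid(lm(x1 ~ x2))
r3 <- resid(lm(x3 ~ x2))
cor(r1, r3)           # approximately zero: x1 _||_ x3 | x2
```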
Fork: \(x_1\) and \(x_3\) are associated through \(x_2\) as a common cause or confounder.
“Local Markov Assumption”: we can block the associative path by conditioning on the parent \(x_2\).
Immorality: \(x_2\) is a “collider” that blocks the path between \(x_1\) and \(x_3\).
“Bayesian network factorization” of immoralities: \(P(x_1, x_2, x_3) = P(x_1) P(x_3) P(x_2 \mid x_1, x_3)\)
Looks and talent are independent of each other in the general population, but conditioning on the collider (e.g. having a job) induces selection bias (or Berkson's paradox).

Data generating process (DGP): \(x_1 \sim N(0, 1), \quad x_3 \sim N(0,1), \quad x_2 = x_1 + x_3\)
Covariance in the population:
\[\begin{align*} \text{Cov}(x_1, x_3) &= \mathbb{E}[(x_1 - \mathbb{E}[x_1])(x_3 - \mathbb{E}[x_3])] \\ &= \mathbb{E}[x_1x_3] \quad (\text{zero mean})\\ &= \mathbb{E}[x_1]\mathbb{E}[x_3] \quad (\text{independent}) \\ &= 0 \end{align*}\]
Covariance conditional on \(x_2\):
\[\begin{align*} \text{Cov}(x_1, x_3 \mid x_2 = x) &= \text{Cov}(x_1, x - x_1 \mid x_2 = x) \quad (\text{substituting } x_3 = x - x_1 \text{ per the DGP}) \\ &= -\text{Var}(x_1 \mid x_2 = x) \quad (x \text{ is a constant; covariance is bilinear}) \\ &= -\tfrac{1}{2} \quad (x_1 \mid x_2 = x \sim N(x/2,\, 1/2) \text{ for i.i.d. standard normals}) \\ &< 0 \end{align*}\]
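Both covariances can be approximated by simulation (a sketch using the DGP above; conditioning on \(x_2\) is approximated by restricting to draws with \(x_2\) in a narrow window):

```r
set.seed(42)
n <- 1e6
x1 <- rnorm(n)
x3 <- rnorm(n)
x2 <- x1 + x3              # collider

cov(x1, x3)                # ~ 0: independent in the population

# approximate conditioning on x2 = 0 with a narrow selection window
sel <- abs(x2) < 0.05
cov(x1[sel], x3[sel])      # negative: conditioning on the collider induces dependence
```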
library(tidyverse)
library(ggplot2)
library(ggpubr)

# simulate data
set.seed(123) # for reproducibility
looks <- rnorm(1000)
talent <- rnorm(1000)
x <- talent + looks
group <- 1 * (x > quantile(x, c(.75))) # top 25% get the job

# create a dataframe
df <- data.frame(looks, talent, group) %>%
  mutate(group = if_else(group == 1, "With Job", "Without Job")) %>%
  add_row(looks = Inf, talent = -Inf, group = "Overall") # dummy row for the legend

# plot
ggplot(df, aes(x = looks, y = talent)) +
  geom_point(aes(color = group)) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, aes(color = "Overall")) + # regression line for all data
  geom_smooth(data = subset(df, group == "With Job"), method = "lm", se = FALSE, formula = y ~ x, aes(color = "With Job")) +
  geom_smooth(data = subset(df, group == "Without Job"), method = "lm", se = FALSE, formula = y ~ x, aes(color = "Without Job")) +
  stat_regline_equation(aes(label = ..eq.label.., color = as.factor(group)), formula = y ~ x) +
  stat_regline_equation(aes(label = ..eq.label.., color = "Overall"), formula = y ~ x) +
  labs(color = "Group") +
  theme(legend.position = "bottom")
Definition 2.3: “Blocked Path”
A path \(p\) between nodes \(X\) and \(Y\) is blocked by a (potentially empty) conditioning set \(Z\) if either of the following is true:
\(p\) contains a chain of nodes \(... \rightarrow W \rightarrow ...\) or a fork \(... \leftarrow W \rightarrow ...\), and \(W\) is conditioned on, i.e. \(W \in Z\).
\(p\) contains an immorality \(... \rightarrow W \leftarrow ...\), and neither the collider \(W\) nor any of its descendants is conditioned on, i.e. \(W \notin Z\) and \(de(W) \cap Z = \emptyset\).
Definition 2.4: “d-Separation”
Two nodes \(X\) and \(Y\) are d-separated by a set of nodes \(Z\) if all of the paths between \(X\) and \(Y\) are blocked by \(Z\).
Theorem 2: “Global Markov Assumption”
Given that \(P\) is Markov compatible with respect to \(G\) (satisfies the local Markov assumption), if \(X\) and \(Y\) are d-separated
in \(G\) conditioned on \(Z\), then \(X\) and \(Y\) are independent in \(P\) conditioned on \(Z\).
Formally, \(X \perp\!\!\!\perp_{G} Y \,|\, Z \implies X \perp\!\!\!\perp_{P} Y \,|\, Z\).
library(ggdag)
library(ggplot2)

dag <- dagify(
  # relationship for each node
  W ~ Z,
  W ~ X,
  Y ~ X,
  U ~ W,
  # location of each node
  coords = list(
    x = c(Z = 0, W = 1, X = 2, Y = 3, U = 1),
    y = c(Z = 0, W = -0.5, X = 0, Y = 0, U = -1)
  )
)

dag %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_text(color = "black") +
  geom_dag_edges() +
  geom_dag_point(shape = 1) +
  theme_dag()
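d-separation can also be queried programmatically. A sketch using `dagitty::dseparated()` on the same graph structure (rebuilt here so the snippet is self-contained):

```r
library(dagitty)
# same edges as the DAG above: Z -> W <- X -> Y, plus W -> U
g <- dagitty("dag { Z -> W ; X -> W ; X -> Y ; W -> U }")

dseparated(g, "Z", "Y", list())  # TRUE: the collider W blocks the only path
dseparated(g, "Z", "Y", "W")     # FALSE: conditioning on the collider opens it
dseparated(g, "Z", "Y", "U")     # FALSE: U is a descendant of the collider
```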
Quiz: Which nodes are d-separated, conditional on which sets?
library(ggdag)
library(ggplot2)

dag <- dagify(
  Y ~ M2 + X3 + W3,
  W3 ~ W2,
  W1 ~ W2,
  T ~ W1,
  M1 ~ T,
  M2 ~ M1,
  X1 ~ T,
  X3 ~ Y,
  X2 ~ X1 + X3,
  coords = list(
    x = c(T = 0, W1 = 1, W2 = 1.5, W3 = 2, M1 = 1, M2 = 2, Y = 3, X1 = 1, X2 = 1.5, X3 = 2),
    y = c(T = -1, W1 = 0, W2 = 0.5, W3 = 0, M1 = -1, M2 = -1, Y = -1, X1 = -2, X2 = -2.5, X3 = -2)
  )
)

dag %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_text(color = "black") +
  geom_dag_edges() +
  geom_dag_point(shape = 1) +
  theme_dag()
Quiz: Which nodes are d-separated, conditional on which sets?
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
selection bias or confounding association.

Neal, Brady (2020). Introduction to Causal Inference from a Machine Learning Perspective. Course lecture notes (draft).
do-operator: \(do(T = t)\) denotes setting the treatment \(T\) to the value \(t\) by intervention.
We can express the ATE with it:
\[\text{ATE} = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] = \mathbb{E}[Y \mid do(T = 1)] - \mathbb{E}[Y \mid do(T = 0)]\]

Definition 2.5: “Backdoor Criterion”
A set of variables \(W\) satisfies the backdoor criterion relative to \(T\) and \(Y\) if the following are true:
\(W\) blocks all backdoor paths between \(T\) and \(Y\), i.e. all paths that contain an arrow into \(T\).
\(W\) does not contain any descendants of \(T\).
library(ggdag)
library(ggplot2)

dag <- dagify(
  T ~ X1 + X2,
  X6 ~ T,
  X2 ~ X3,
  X1 ~ X3 + X4,
  X5 ~ X4,
  Y ~ X1 + X5 + X6,
  exposure = "T", outcome = "Y",
  coords = list(
    x = c(T = 0, X1 = 1, X2 = 0, X3 = 0, X4 = 2, X5 = 2, X6 = 1, Y = 2),
    y = c(T = 0, X1 = 1, X2 = 1, X3 = 2, X4 = 2, X5 = 1, X6 = 0, Y = 0)
  )
)

dag %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_text(color = "black") +
  geom_dag_edges() +
  geom_dag_point(shape = 1) +
  theme_dag()
library(dagitty)

# the DAG was defined above; list all valid backdoor adjustment sets
adjustmentSets(dag)
{ X1, X5 }
{ X1, X4 }
{ X1, X3 }
{ X1, X2 }
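As a sketch of why a valid adjustment set matters, consider a hypothetical linear DGP consistent with the DAG above (all structural coefficients assumed to be 1, so the causal effect of \(T\) on \(Y\), running through \(X_6\), is 1). Adjusting for a valid set such as \(\{X_1, X_2\}\) recovers the effect, while the unadjusted regression picks up the open backdoor paths:

```r
set.seed(7)
n <- 1e5
x3 <- rnorm(n); x4 <- rnorm(n)    # exogenous roots
x2 <- x3 + rnorm(n)
x1 <- x3 + x4 + rnorm(n)
x5 <- x4 + rnorm(n)
tr <- x1 + x2 + rnorm(n)          # treatment T
x6 <- tr + rnorm(n)               # mediator on the causal path
y  <- x1 + x5 + x6 + rnorm(n)     # true effect of T on Y is 1 (via X6)

coef(lm(y ~ tr))["tr"]            # biased: backdoor paths through X1 are open
coef(lm(y ~ tr + x1 + x2))["tr"]  # close to 1: {X1, X2} blocks all backdoor paths
```

Note that the mediator \(X_6\) is deliberately left out of the regression: the backdoor criterion forbids adjusting for descendants of \(T\).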
Thank you for your attention!
Causal Data Science: (2) Graphical Causal Models