At a recent seminar dinner the conversation drifted to causal inference, and I shared my dream of someday producing a Lady Gaga parody music video called “Bad Control”.
A lively discussion of bad controls ensued, during which I offered one of my favorite examples: a good instrument is a bad control.
To summarize that earlier post: including a valid instrumental variable as a control variable can only amplify the bias on the coefficient of our endogenous regressor of interest.
When used as a control, the instrument “soaks up” the good (exogenous) variation in the endogenous regressor, leaving behind only the bad (endogenous) variation.
This is the opposite of what happens in an instrumental variables regression, where we use the instrument to extract only the good variation in the endogenous regressor.
More generally, a “bad control” is a covariate that we shouldn’t adjust for when using a selection-on-observables approach to causal inference.
Upon hearing my IV example, my colleague immediately asked “but what about the coefficient on the instrument itself?”
This is a great question, and one I hadn’t considered before.
Today I’ll give you my answer.
This post is a sequel, so you may find it helpful to read my earlier post before going any further.
At the very end of the post I’ll rely on a few basic ideas about directed acyclic graphs (DAGs).
If this material is unfamiliar, you may find my treatment effects slides helpful.
With these caveats out of the way, I’ll do my best to keep this post relatively self-contained.
Recap of Part I
Suppose that \(X\) is our endogenous regressor of interest in the linear causal model \(Y = \alpha + \beta X + U\), where \(\text{Cov}(X,U) \neq 0\) but \(\text{Cov}(Z,U) = 0\), and where \(Z\) is an instrumental variable that is correlated with \(X\).
Now consider the population linear regression of \(Y\) on both \(X\) and \(Z\), namely
\[
Y = \gamma_0 + \gamma_X X + \gamma_Z Z + \eta
\]
where the error term \(\eta\) satisfies \(\text{Cov}(X,\eta) = \text{Cov}(Z,\eta) = \mathbb{E}(\eta) = 0\) by construction.
Next define the population linear regression of \(X\) on \(Z\), namely
\[
X = \pi_0 + \pi_Z Z + V
\]
where the error term \(V\) satisfies \(\text{Cov}(Z,V) = \mathbb{E}(V) = 0\) by construction.
Finally, define the population linear regression of \(Y\) on \(X\) as
\[
Y = \delta_0 + \delta_X X + \epsilon, \quad \text{Cov}(X,\epsilon) = \mathbb{E}(\epsilon) = 0.
\]
Using this notation, the result from my earlier post can be written as
\[
\delta_X = \beta + \frac{\text{Cov}(X,U)}{\text{Var}(X)}, \quad \text{and} \quad \gamma_X = \beta + \frac{\text{Cov}(X,U)}{\text{Var}(V)}.
\]
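Before we go any further, here’s a quick numerical sanity check of these two formulas. The sketch below uses a purely illustrative data-generating process (\(\beta = 1\), \(\pi_Z = 0.8\), errors with correlation 0.5), chosen only for this check:
set.seed(42)
n <- 1e5
u <- rnorm(n)
z <- rnorm(n)
v <- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)  # Cov(u, v) = 0.5
x <- 0.5 + 0.8 * z + v                     # first stage: pi_Z = 0.8
y <- -0.3 + x + u                          # causal model: beta = 1
# delta_X vs. beta + Cov(X,U)/Var(X)
c(unname(coef(lm(y ~ x))[2]), 1 + cov(x, u) / var(x))
# gamma_X vs. beta + Cov(X,U)/Var(V), using first-stage residuals for V
c(unname(coef(lm(y ~ x + z))[2]), 1 + cov(x, u) / var(resid(lm(x ~ z))))
Both pairs should agree up to simulation error.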
To understand what this tells us, notice that, using the “first-stage” regression of \(X\) on \(Z\), we can write
\[
\text{Var}(V) \equiv \text{Var}(X - \pi_0 - \pi_Z Z) = \text{Var}(X) - \pi_Z^2 \text{Var}(Z).
\]
This shows that whenever \(Z\) is a relevant instrument (\(\pi_Z \neq 0\)), we must have \(\text{Var}(V) < \text{Var}(X)\).
It follows that \(\gamma_X\) is more biased than \(\delta_X\): adding \(Z\) as a control regressor only makes our estimate of the effect of \(X\) worse!
What about \(\gamma_Z\)?
So if \(Z\) soaks up the good variation in \(X\), what about the coefficient \(\gamma_Z\) on the instrument \(Z\)?
Perhaps this coefficient contains some useful information about the causal effect of \(X\) on \(Y\)?
To find out, we’ll use the FWL Theorem as follows:
\[
\gamma_Z = \frac{\text{Cov}(Y,\tilde{Z})}{\text{Var}(\tilde{Z})}
\]
where \(Z = \lambda_0 + \lambda_X X + \tilde{Z}\) is the population linear regression of \(Z\) on \(X\).
This is the reverse of the first-stage regression of \(X\) on \(Z\) described above.
Here the error term \(\tilde{Z}\) satisfies \(\mathbb{E}(\tilde{Z}) = \text{Cov}(\tilde{Z}, X) = 0\) by construction.
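If you haven’t seen the FWL Theorem in action before, here’s a small sketch, using an illustrative simulated dataset, confirming that this two-step residual regression reproduces the coefficient on \(Z\) from the long regression:
set.seed(1)
n <- 1e5
u <- rnorm(n)
z <- rnorm(n)
v <- 0.5 * u + sqrt(0.75) * rnorm(n)
x <- 0.5 + 0.8 * z + v
y <- -0.3 + x + u
# Residual from the reverse regression of z on x
z_tilde <- resid(lm(z ~ x))
# FWL: regressing y on z_tilde alone recovers gamma_Z from the long regression
c(fwl = unname(coef(lm(y ~ z_tilde))[2]),
  long = unname(coef(lm(y ~ x + z))["z"]))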
Substituting the causal model gives
\[
\text{Cov}(Y, \tilde{Z}) = \text{Cov}(\alpha + \beta X + U, \tilde{Z}) = \beta \text{Cov}(X,\tilde{Z}) + \text{Cov}(U,\tilde{Z}) = \text{Cov}(U, \tilde{Z})
\]
since \(\text{Cov}(X,\tilde{Z}) = 0\) by construction.
Now, substituting the definition of \(\tilde{Z}\),
\[
\text{Cov}(U, \tilde{Z}) = \text{Cov}(U, Z - \lambda_0 - \lambda_X X) = \text{Cov}(U,Z) - \lambda_X \text{Cov}(U,X) = -\lambda_X \text{Cov}(X,U)
\]
since \(\text{Cov}(U,Z) = 0\) by assumption.
We can already see that \(\gamma_Z\) will not help us learn about \(\beta\).
First, the term containing \(\beta\) has vanished; second, the term that remains is polluted by the endogeneity of \(X\), namely \(\text{Cov}(X,U)\).
Nevertheless, let’s see if we can obtain a clean expression for \(\gamma_Z\).
So far we have calculated the numerator of the FWL expression, showing that \(\text{Cov}(Y,\tilde{Z}) = -\lambda_X \text{Cov}(X,U)\).
The next step is to calculate \(\text{Var}(\tilde{Z})\):
\[
\text{Var}(\tilde{Z}) = \text{Var}(Z - \lambda_0 - \lambda_X X) = \text{Var}(Z) + \lambda_X^2 \text{Var}(X) - 2\lambda_X \text{Cov}(X,Z).
\]
Since \(\lambda_X \equiv \text{Cov}(X,Z)/\text{Var}(X)\), our expression for \(\text{Var}(\tilde{Z})\) simplifies to
\[
\text{Var}(\tilde{Z}) = \text{Var}(Z) - \lambda_X \text{Cov}(X,Z)
\]
so we have discovered that:
\[
\gamma_Z = \frac{-\lambda_X \text{Cov}(X,U)}{\text{Var}(Z) - \lambda_X \text{Cov}(X,Z)}.
\]
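Since it’s easy to lose a minus sign in this kind of derivation, here’s a quick numerical check of this intermediate expression, again with a purely illustrative simulated dataset:
set.seed(321)
n <- 1e5
u <- rnorm(n)
z <- rnorm(n)
v <- 0.5 * u + sqrt(0.75) * rnorm(n)
x <- 0.5 + 0.8 * z + v
y <- -0.3 + x + u
lambda_x <- cov(x, z) / var(x)  # reverse regression coefficient
c(gamma_z = unname(coef(lm(y ~ x + z))["z"]),
  formula = -lambda_x * cov(x, u) / (var(z) - lambda_x * cov(x, z)))
The two numbers should agree up to simulation error.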
Call me old-fashioned, but I really don’t like having \(\lambda_X\) in that expression.
I’d feel much happier if we could find a way to re-write it in terms of the more familiar IV first-stage coefficient \(\pi_Z\).
Let’s give it a try!
We’ll use my favorite trick of multiplying by one:
\[
\lambda_X \equiv \frac{\text{Cov}(X,Z)}{\text{Var}(X)} = \frac{\text{Cov}(X,Z)}{\text{Var}(X)} \cdot \frac{\text{Var}(Z)}{\text{Var}(Z)} = \pi_Z \cdot \frac{\text{Var}(Z)}{\text{Var}(X)}.
\]
Substituting for \(\lambda_X\) gives
\[
\gamma_Z = \frac{-\pi_Z \frac{\text{Var}(Z)}{\text{Var}(X)} \text{Cov}(X,U)}{\text{Var}(Z) - \pi_Z \frac{\text{Var}(Z)}{\text{Var}(X)} \text{Cov}(X,Z)} = \frac{-\pi_Z \text{Cov}(X,U)}{\text{Var}(X) - \pi_Z^2 \text{Var}(Z)}.
\]
We can simplify this even further by substituting \(\text{Var}(V) = \text{Var}(X) - \pi_Z^2 \text{Var}(Z)\) from above to obtain
\[
\gamma_Z = -\pi_Z \frac{\text{Cov}(X,U)}{\text{Var}(V)}.
\]
And now we recognize something from above: \(\text{Cov}(X,U)/\text{Var}(V)\) is precisely the bias of \(\gamma_X\) relative to the true causal effect \(\beta\)!
This means we can also write \(\gamma_Z = -\pi_Z (\gamma_X - \beta)\).
A Little Simulation
We seem to be doing an awful lot of algebra on this blog lately.
To make sure we haven’t made any silly mistakes, let’s check our work using a little simulation experiment taken from my earlier post.
Spoiler alert: everything checks out!
set.seed(1234)
n <- 1e5
# Simulate instrument (z)
z <- rnorm(n)
# Simulate error terms (u, v)
library(mvtnorm)
Rho <- matrix(c(1, 0.5,
                0.5, 1), 2, 2, byrow = TRUE)
errors <- rmvnorm(n, sigma = Rho)
# Simulate linear causal model
u <- errors[, 1]
v <- errors[, 2]
x <- 0.5 + 0.8 * z + v
y <- -0.3 + x + u
# Regression of y on x and z
gamma <- lm(y ~ x + z) |>
  coefficients()
gamma
## (Intercept) x z
## -0.5471213 1.5018705 -0.3981116
# First-stage regression of x on z
pi <- lm(x ~ z) |>
  coefficients()
pi
## (Intercept) z
## 0.5020338 0.7963889
# Compare two different expressions for gamma_Z to the estimate itself
c(gamma_z = unname(gamma[3]),
  version1 = unname(-0.8 * cov(x, u) / var(v)),
  version2 = unname(-pi[2] * (gamma[2] - 1))
)
## gamma_z version1 version2
## -0.3981116 -0.4024918 -0.3996841
Making Sense of This Result
So far all we’ve done is horrible, tedious algebra and a little simulation to check that it’s correct.
But in fact there’s some very interesting intuition for the results we’ve obtained, intuition that is deeply connected to the idea of a bad control in a directed acyclic graph (DAG).
In the model described above, \(Z\) has a causal effect on \(Y\).
This is because \(Z\) causes \(X\), which in turn causes \(Y\).
Because \(Z\) is an instrument, its only effect on \(Y\) runs through \(X\).
The unobserved confounder \(U\) is a common cause of \(X\) and \(Y\) but is unrelated to \(Z\).
Even if you’re not familiar with DAGs, you’ll probably find this diagram relatively intuitive:
library(ggdag)
library(ggplot2)
iv_dag <- dagify(
  Y ~ X + U,
  X ~ Z + U,
  coords = list(
    x = c(Z = 1, X = 3, U = 4, Y = 5),
    y = c(Z = 1, X = 1, U = 2, Y = 1)
  )
)
iv_dag |>
  ggdag() +
  theme_dag()
In the figure, an arrow from \(A\) to \(B\) indicates that \(A\) is a cause of \(B\).
A causal path is a sequence of arrows that “obeys the one-way signs” and leads from \(A\) to \(B\).
Because there is a directed path from \(Z\) to \(Y\), we say that \(Z\) is a cause of \(Y\).
To see this using our regression equations from above, substitute the IV first stage into the linear causal model to obtain
\[
\begin{align*}
Y &= \alpha + \beta X + U = \alpha + \beta (\pi_0 + \pi_Z Z + V) + U \\
&= (\alpha + \beta \pi_0) + \beta \pi_Z Z + (\beta V + U).
\end{align*}
\]
This gives us a linear equation with \(Y\) on the left-hand side and \(Z\) alone on the right-hand side.
This is called the “reduced-form” regression.
Since \(\text{Cov}(Z,U) = 0\) by assumption and \(\text{Cov}(Z,V) = 0\) by construction, the reduced form is a bona fide population linear regression.
That means that regressing \(Y\) on \(Z\) will indeed give us a slope that equals \(\pi_Z \times \beta\).
To see why the slope is a product, recall that \(\pi_Z\) is the causal effect of \(Z\) on \(X\), the \(Z \rightarrow X\) arrow in the diagram, while \(\beta\) is the causal effect of \(X\) on \(Y\), the \(X \rightarrow Y\) arrow in the diagram.
Because the only way \(Z\) can influence \(Y\) is through \(X\), it makes sense that the causal effect of \(Z\) on \(Y\) is the product of these two effects.
So now we see that the reduced-form coefficient \(\pi_Z \beta\) is indeed a causal effect.
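We can verify this using the simulated data from above, where \(\beta = 1\) and \(\pi_Z = 0.8\), so the reduced-form slope should come out close to 0.8:
# Reduced-form regression of y on z: slope should be near pi_Z * beta = 0.8
lm(y ~ z) |>
  coefficients()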
How does this relate to \(\gamma_Z\)?
Remember that \(\gamma_Z\) is the coefficient on \(Z\) in a regression of \(Y\) on \(Z\) and \(X\), in other words a regression that adjusts for \(X\).
So is adjusting for \(X\) the right call? Absolutely not!
There are no back-door paths between \(Z\) and \(Y\).
This means that we don’t need to adjust for anything to learn the causal effect of \(Z\) on \(Y\).
In fact, adjusting for \(X\) is a mistake for two distinct reasons.
First, \(X\) is a mediator on the path \(Z \rightarrow X \rightarrow Y\).
If there were no confounding, i.e. if \(\text{Cov}(X,U) = 0\) so that there is no \(U \rightarrow X\) arrow, adjusting for \(X\) would block the only causal path from \(Z\) to \(Y\).
We can see this in our equations from above.
Suppose that \(\text{Cov}(X,U) = 0\).
Then we have \(\gamma_X = \beta\) but \(\gamma_Z = 0\)!
There was a dead giveaway in our derivation: the formula for \(\gamma_Z\) doesn’t depend on \(\beta\) at all.
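To check this first point, we can re-use the simulated \(z\) from above but draw fresh, independent errors so that \(\text{Cov}(X,U) = 0\):
# No confounding: u0 and v0 are independent, so Cov(X,U) = 0
u0 <- rnorm(n)
v0 <- rnorm(n)
x0 <- 0.5 + 0.8 * z + v0
y0 <- -0.3 + x0 + u0
# Coefficient on x0 should be near beta = 1; coefficient on z near zero
lm(y0 ~ x0 + z) |>
  coefficients()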
Second, because there is confounding, adjusting for \(X\) creates a spurious association between \(Z\) and \(Y\) through the path \(Z \rightarrow X \leftarrow U \rightarrow Y\).
Because \(X\) is a collider on this path, the path starts out closed.
Adjusting for \(X\) opens it, creating a spurious association between \(Z\) and \(Y\).
To see why this is the case, suppose that \(\beta = 0\).
In this case there is no causal effect of \(X\) on \(Y\), and hence no causal effect of \(Z\) on \(Y\).
But if \(\text{Cov}(X,U) \neq 0\), then we have \(\gamma_Z \neq 0\)!
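To check this second point, we can re-use \(x\) and \(u\) from the simulation above but set \(\beta = 0\). Given the parameter values from that simulation, the coefficient on \(z\) should come out near \(-\pi_Z \text{Cov}(X,U)/\text{Var}(V) = -0.8 \times 0.5/1 = -0.4\):
# beta = 0: x has no causal effect on y, but x is still confounded by u
y_null <- -0.3 + 0 * x + u
# The coefficient on z is nonzero despite Z having no causal effect on Y
lm(y_null ~ x + z) |>
  coefficients()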
So if you want to learn the causal effect of \(Z\) on \(Y\), it’s not just that \(X\) is a bad control; it’s a doubly bad control!
Without adjusting for \(X\), everything is fine: the reduced-form regression of \(Y\) on \(Z\) gives us exactly what we’re after.
Epilogue
When I showed this post to another colleague, he asked whether there is any way to learn about \(\beta\) by combining \(\gamma_Z\) and \(\gamma_X\).
The answer is no: the regression of \(Y\) on \(X\) and \(Z\) alone doesn’t contain enough information.
Since
\[
\gamma_Z = -\pi_Z \frac{\text{Cov}(X,U)}{\text{Var}(V)} \quad \text{and} \quad \gamma_X = \beta + \frac{\text{Cov}(X,U)}{\text{Var}(V)}
\]
we can rearrange to obtain the following expression for \(\beta\):
\[
\beta = \gamma_X + \frac{\gamma_Z}{\pi_Z}
\]
which we can verify in our little simulation example as follows:
gamma[2] + gamma[3]/pi[2]
## x
## 1.001975
Thus, in order to solve for \(\beta\), we need to run the first-stage regression to learn \(\pi_Z\).