Monday, October 27, 2025

A Good Instrument is a Bad Control


Here's a puzzle for you.
What happens if we regress some outcome of interest on both an endogenous regressor and a valid instrument for that regressor?
I hadn't thought about this question until 2018, when one of my undergraduate students asked it during class.
If memory serves, my off-the-cuff answer left much to be desired.
Five years later I'm finally ready to give a fully satisfactory answer; better late than never, I suppose!

We'll start by being a bit more precise about the setup.
Suppose that \(Y\) is related to \(X\) according to the following linear causal model
\[
Y \leftarrow \alpha + \beta X + U
\]

where \(\beta\) is the causal effect of interest and \(U\) represents unobserved causes of \(Y\) that may be related to \(X\).
Now, for any observed random variable \(Z\), we can define
\[
V \equiv X - (\pi_0 + \pi_1 Z), \quad \pi_0 \equiv \mathbb{E}[X] - \pi_1 \mathbb{E}[Z], \quad \pi_1 \equiv \frac{\text{Cov}(X,Z)}{\text{Var}(Z)}.
\]

This is the population linear regression of \(X\) on \(Z\).
By construction it satisfies \(\mathbb{E}[V] = \text{Cov}(Z,V) = 0\).
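(In case the second equality isn't obvious, here's the one-step check: substituting the definition of \(V\) gives
\[
\text{Cov}(Z,V) = \text{Cov}(Z, X - \pi_0 - \pi_1 Z) = \text{Cov}(Z,X) - \pi_1 \text{Var}(Z) = 0
\]

by the definition of \(\pi_1\), and \(\mathbb{E}[V] = 0\) follows similarly from the definition of \(\pi_0\).)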
Thus we can write
\[
X = \pi_0 + \pi_1 Z + V, \quad \mathbb{E}[V] = \text{Cov}(Z,V) = 0
\]

for any random variables \(X\) and \(Z\), simply by constructing \(V\) as described above.
If \(\pi_1 \neq 0\), we say that \(Z\) is relevant.
If \(\text{Cov}(Z,U) = 0\), we say that \(Z\) is exogenous.
If \(Z\) is both relevant and exogenous, we say that it is a valid instrument for \(X\).

As we've defined it above, \(V\) is simply a regression residual.
But if \(Z\) is a valid instrument, it turns out that we can think of \(V\) as the "endogenous part" of \(X\).
To see why, expand \(\text{Cov}(X,U)\) as follows:
\[
\text{Cov}(X,U) = \text{Cov}(\pi_0 + \pi_1 Z + V,\, U) = \pi_1 \text{Cov}(Z,U) + \text{Cov}(U,V) = \text{Cov}(U,V)
\]

since we have assumed that \(\text{Cov}(Z,U) = 0\).
In words, the endogeneity of \(X\) is precisely the same thing as the covariance between \(U\) and \(V\).

Here's a helpful way of thinking about this.
If \(Z\) is exogenous, then our regression of \(X\) on \(Z\) partitions the overall variation in \(X\) into two pieces: the "good" (exogenous) variation \(\pi_1 Z\) is uncorrelated with \(U\), while the "bad" (endogenous) variation \(V\) is correlated with \(U\).
The logic of two-stage least squares is that regressing \(Y\) on the "good" variation \(\pi_1 Z\) allows us to recover \(\beta\), the causal effect of interest.
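To see why in one line, substitute the causal model into \(\text{Cov}(Z,Y)\):
\[
\text{Cov}(Z,Y) = \text{Cov}(Z, \alpha + \beta X + U) = \beta \, \text{Cov}(Z,X) + \text{Cov}(Z,U) = \beta \, \text{Cov}(Z,X)
\]

so \(\beta = \text{Cov}(Z,Y)/\text{Cov}(Z,X)\) whenever \(Z\) is relevant and exogenous. This ratio is exactly the IV estimand that we'll compute by hand in the simulation below.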

Using the model and derivations from above, let's run a little simulation.
To simulate a valid instrument \(Z\) and an endogenous regressor \(X\), we can proceed as follows.
First generate independent standard normal draws \(\{Z_i\}_{i=1}^n\).
Next, independently generate pairs of correlated standard normal draws \(\{(U_i, V_i)\}_{i=1}^n\) with \(\text{Corr}(U_i, V_i) = \rho\).
Finally, set
\[
X_i = \pi_0 + \pi_1 Z_i + V_i \quad \text{and} \quad
Y_i = \alpha + \beta X_i + U_i
\]

for each value of \(i\) between \(1\) and \(n\).
The following chunk of R code runs this simulation with \(n = 5000\), \(\rho = 0.5\), \(\pi_0 = 0.5\), \(\pi_1 = 0.8\), \(\alpha = -0.3\), and \(\beta = 1\):

set.seed(1234)
n <- 5000
z <- rnorm(n) # instrument: independent standard normal draws

library(mvtnorm)
# correlation matrix for (U, V) with Corr(U, V) = rho = 0.5
Rho <- matrix(c(1, 0.5, 
                0.5, 1), 2, 2, byrow = TRUE)
errors <- rmvnorm(n, sigma = Rho) # jointly normal draws of (U, V)

u <- errors[, 1]
v <- errors[, 2]
x <- 0.5 + 0.8 * z + v # first stage: pi_0 = 0.5, pi_1 = 0.8
y <- -0.3 + x + u      # causal model: alpha = -0.3, beta = 1

In the simulation \(Z\) is a valid instrument, \(X\) is an endogenous regressor, and the true causal effect of interest equals one.
Using our simulated data, let's test out three potential estimators:

  • \(\widehat{\beta}_\text{OLS} \equiv\) the slope coefficient from an OLS regression of \(Y\) on \(X\).
  • \(\widehat{\beta}_\text{IV} \equiv\) the slope coefficient from an IV regression of \(Y\) on \(X\) with \(Z\) as an instrument.
  • \(\widehat{\beta}_{X.Z} \equiv\) the coefficient on \(X\) in an OLS regression of \(Y\) on \(X\) and \(Z\).
c(truth = 1,
  b_OLS = cov(x, y) / var(x), 
  b_IV = cov(z, y) / cov(z, x), 
  b_x.z = unname(coef(lm(y ~ x + z))[2])) |> # unname() makes the names prettier!
  round(2)
## truth b_OLS  b_IV b_x.z 
##  1.00  1.31  1.01  1.49
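
If you prefer canned routines to hand-rolled formulas, the same three estimates can be reproduced with lm() and with ivreg() from the AER package (assuming you have AER installed); the results should match the values above up to rounding:

library(AER)
coef(lm(y ~ x))["x"]        # OLS slope; should match b_OLS
coef(ivreg(y ~ x | z))["x"] # IV slope with z as instrument; should match b_IV
coef(lm(y ~ x + z))["x"]    # OLS slope controlling for z; should match b_x.z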

As expected, OLS is far from the truth while IV pretty much nails it.
Interestingly, the regression of y on x and z gives the worst performance of all! Is this just a fluke?
Perhaps it's an artifact of the simulation parameters I chose, or just bad luck arising from some unusual simulation draws.
To find out, we'll need a bit more algebra.
But stick with me: the payoff is worth it, and there's not too much additional math required!

Regression of \(Y\) on \(X\) and \(Z\)

The coefficient on \(X\) in a population linear regression of \(Y\) on \(X\) and \(Z\) is given by
\[
\beta_{X.Z} = \frac{\text{Cov}(\tilde{X}, Y)}{\text{Var}(\tilde{X})}
\]

where \(\tilde{X}\) is defined as the residual from another population linear regression: the regression of \(X\) on \(Z\). (This is the Frisch-Waugh-Lovell theorem.)
But wait a minute: we've already seen this residual!
Above we called it \(V\) and used it to write \(X = \pi_0 + \pi_1 Z + V\).
Using this equation, along with the linear causal model relating \(Y\) to \(X\) and \(U\), we can re-express \(\beta_{X.Z}\) as
\[
\begin{aligned}
\beta_{X.Z} &= \frac{\text{Cov}(V, Y)}{\text{Var}(V)} = \frac{\text{Cov}(V, \alpha + \beta X + U)}{\text{Var}(V)} \\
&= \frac{\text{Cov}(U,V) + \beta \, \text{Cov}(V, \pi_0 + \pi_1 Z + V)}{\text{Var}(V)} \\
&= \beta + \frac{\text{Cov}(U,V)}{\text{Var}(V)}
\end{aligned}
\]

since \(\text{Cov}(Z, V) = 0\) by construction.
We have some simulated data at our disposal, so let's check this calculation.
In the simulation \(\beta = 1\) and
\[
\frac{\text{Cov}(U, V)}{\text{Var}(V)} = 0.5
\]

since \(\text{Var}(U) = \text{Var}(V) = 1\) and \(\text{Cov}(U, V) = 0.5\).
Therefore \(\beta_{X.Z} = 1.5\).
And, indeed, this is almost exactly the value of our estimate from the simulation above.
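
Because this is a simulation, we actually observe the draws of \(U\) and \(V\), so we can also plug their sample analogues directly into the formula as a quick sanity check:

# sample analogue of beta + Cov(U,V)/Var(V); should be close to 1.5
1 + cov(u, v) / var(v)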

Regression of \(Y\) on \(X\) Only

So far so good.
Now what about the "usual" OLS estimator?
A quick calculation gives
\[
\beta_{\text{OLS}} = \beta + \frac{\text{Cov}(X,U)}{\text{Var}(X)} = \beta + \frac{\text{Cov}(U,V)}{\text{Var}(X)}
\]

using the fact that \(\text{Cov}(X,U) = \text{Cov}(U,V)\), as explained above.
Again, we can check this against our simulation results.
We know that \(\text{Cov}(U,V) = 0.5\) and
\[
\text{Var}(X) = \text{Var}(\pi_0 + \pi_1 Z + V) = \pi_1^2 \text{Var}(Z) + \text{Var}(V) = (0.8)^2 + 1 = 41/25
\]

since \(Z\) and \(V\) are uncorrelated by construction, \(\text{Var}(Z) = \text{Var}(V) = 1\), and \(\pi_1 = 0.8\) in the simulation design.
Hence \(\beta_{\text{OLS}} = 1 + 0.5/(41/25) = 1 + 25/82 \approx 1.305\).
Again, this agrees almost perfectly with our simulation.
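
The same sanity check works here, with \(\text{Var}(X)\) replacing \(\text{Var}(V)\) in the denominator:

# sample analogue of beta + Cov(U,V)/Var(X); should be close to 1 + 25/82
1 + cov(u, v) / var(x)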

Comparing the Results

To summarize, we have shown that
\[
\beta_{X.Z} = \beta + \frac{\text{Cov}(U,V)}{\text{Var}(V)}, \quad \text{while} \quad
\beta_{\text{OLS}} = \beta + \frac{\text{Cov}(U,V)}{\text{Var}(X)}.
\]

There is only one difference between these two expressions: \(\beta_{X.Z}\) has \(\text{Var}(V)\) where \(\beta_{\text{OLS}}\) has \(\text{Var}(X)\).
Returning to our expression for \(\text{Var}(X)\) from above,
\[
\text{Var}(X) = \pi_1^2 \text{Var}(Z) + \text{Var}(V) > \text{Var}(V)
\]

as long as \(\pi_1 \neq 0\) and \(\text{Var}(Z) \neq 0\).
In other words, there is always more variation in \(X\) than there is in \(V\), since \(V\) is the "leftover" part of \(X\) after regressing on \(Z\).
Because the variances of \(X\) and \(V\) appear in the denominators of our expressions from above, it follows that
\[
\left| \text{Cov}(U,V)/\text{Var}(V) \right| > \left| \text{Cov}(U,V)/\text{Var}(X) \right|
\]

whenever \(\text{Cov}(U,V) \neq 0\), that is, whenever \(X\) is genuinely endogenous.
In other words, \(\beta_{X.Z}\) is always farther from the truth than \(\beta_{\text{OLS}}\), exactly as we found in our simulation.
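
One last check against the simulated draws confirms both the variance inequality and the resulting ranking of the two biases:

# Var(X) exceeds Var(V), so the bias of b_x.z exceeds the bias of b_OLS
c(var_x = var(x), var_v = var(v))
c(bias_x.z = cov(u, v) / var(v), bias_OLS = cov(u, v) / var(x))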

In our simulation, \(\widehat{\beta}_{X.Z}\) gave a worse estimate of \(\beta\) than \(\widehat{\beta}_{\text{OLS}}\).
The derivations from above show that this wasn't a fluke: adding a valid instrument \(Z\) as an extra control regressor only makes the bias in our estimated causal effect worse than it was to begin with.
This holds for any valid instrument and any endogenous regressor in a linear causal model.
I hope you found the derivations from above convincing.
Still, you may be wondering whether there's an intuitive explanation for this phenomenon.
I'm pleased to inform you that the answer is yes!

In an earlier post I described the control function approach to instrumental variables regression.
That post showed that the coefficient on \(X\) in a regression of \(Y\) on \(X\) and \(V\) gives the correct causal effect.
We don't observe \(V\), but we can estimate it by regressing \(X\) on \(Z\) and saving the residuals.
The logic of multiple regression shows that including \(V\) as a control regressor "soaks up" the portion of \(X\) that is explained by \(V\).
Because \(V\) represents the "bad" (endogenous) variation in \(X\), this solves our endogeneity problem.
In effect, \(V\) captures the unobserved "omitted variables" that play havoc with a naive regression of \(Y\) on \(X\).
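
Here's a minimal sketch of that two-step control function recipe applied to our simulated data, with the first-stage residuals standing in for the unobserved \(V\):

# control function: regress X on Z, save the residuals, add them as a control
v_hat <- resid(lm(x ~ z))
coef(lm(y ~ x + v_hat))["x"] # should be close to the true beta = 1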

Now, contrast this with a regression of \(Y\) on \(X\) and \(Z\).
In this case, we soak up the variation in \(X\) that is explained by \(Z\).
But \(Z\) represents the "good" (exogenous) variation in \(X\)!
Soaking up this variation leaves only the bad variation behind, making our endogeneity problem worse than it was to begin with.
In this example, \(Z\) is what is known as a bad control: a control regressor that makes things worse rather than better.
A common piece of advice for avoiding bad controls is to only include control regressors that are correlated with \(X\) and \(Y\) but are not themselves caused by \(X\).
The example in this post shows that this advice is wrong.
Here \(Z\) isn't caused by \(X\), and it is correlated with both \(X\) and \(Y\).
Nonetheless, it's a bad control.
In short, a valid instrument provides a powerful way to carry out causal inference from observational data, but only if you use it in the right way.
A good instrument is a bad control!
