Saturday, November 1, 2025

Three Ways of Thinking About Instrumental Variables


In this post we'll examine a very simple instrumental variables model from three different perspectives: two familiar and one a bit more exotic. While all three yield the same solution in this particular model, they lead in different directions in more complicated examples. Crucially, each gives us a different way of thinking about the problem of endogeneity and how to solve it.

The Setup

Consider a simple linear causal model of the form \(Y \leftarrow \alpha + \beta X + U\) where \(X\) is endogenous, i.e. related to the unobserved random variable \(U\). Our goal is to learn \(\beta\), the causal effect of \(X\) on \(Y\). To take a simple example, suppose that \(Y\) is wage and \(X\) is years of schooling. Then \(\beta\) is the causal effect of one additional year of schooling on a person's wage. The random variable \(U\) is a catchall, representing all unobserved causes of wage, such as ability, family background, and so on. A linear regression of \(Y\) on \(X\) will not allow us to learn \(\beta\). For example, if you're very smart, you'll probably find school easier and stay in school longer. But being smarter likely has its own effect on your wage, separate from years of education. Ability is a confounder because it causes both years of schooling and wage.

Now suppose that \(Z\) is an instrumental variable: something that is uncorrelated with \(U\) (exogenous) but correlated with \(X\) (relevant). For example, a very famous paper pointed out that quarter of birth is correlated with years of schooling in the US and argued that it is unrelated to other causes of wages. Finding an instrumental variable is very hard in practice. Indeed, I remain skeptical that quarter of birth is really unrelated to \(U\). But that's a conversation for another day. For the moment, suppose we have a bona fide exogenous and relevant instrument at our disposal. To make things even simpler, suppose that the true causal effect \(\beta\) is homogeneous, i.e. the same for everyone.

First Perspective: The IV Approach

Regress \(Y\) on \(Z\) to find the causal effect of \(Z\) on \(Y\). Rescale it to obtain the causal effect of \(X\) on \(Y\).

If \(Z\) is a valid and relevant instrument, then
\[
\beta_{\text{IV}} \equiv \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)} = \frac{\text{Cov}(Z, \alpha + \beta X + U)}{\text{Cov}(Z,X)} = \frac{\beta\,\text{Cov}(Z,X) + \text{Cov}(Z,U)}{\text{Cov}(Z,X)} = \beta
\]

which is precisely the causal effect we're after! The ratio of \(\text{Cov}(Z,Y)\) to \(\text{Cov}(Z,X)\) is known as the instrumental variables (IV) estimand, but it seems to come out of nowhere. A more intuitive way to write this quantity divides the numerator and denominator by \(\text{Var}(Z)\) to yield
\[
\beta_{\text{IV}} \equiv \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)} = \frac{\text{Cov}(Y,Z)/\text{Var}(Z)}{\text{Cov}(X,Z)/\text{Var}(Z)} \equiv \frac{\gamma}{\pi}.
\]

We see that \(\beta_{\text{IV}}\) is the ratio of two linear regression slopes: the slope \(\gamma\) from a regression of \(Y\) on \(Z\) divided by the slope \(\pi\) from a regression of \(X\) on \(Z\). This makes intuitive sense if we think about units. Because \(Z\) is unrelated to \(U\), \(\gamma\) gives the causal effect of \(Z\) on \(Y\). If \(Y\) is measured in dollars and \(Z\) is measured in miles (e.g. distance to college), then \(\gamma\) is measured in dollars per mile. If \(X\) is years of schooling, then \(\beta\) should be measured in dollars per year. To convert from dollars/mile to dollars/year, we need to multiply by miles/year, or equivalently to divide by years/mile. And indeed, \(\pi\) is measured in years/mile as required! This is yet another example of my favorite maxim: most formulas in statistics and econometrics are obvious if you keep track of the units.
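The ratio-of-slopes interpretation is easy to check numerically. Here is a small sketch using simulated data of my own choosing (the parameter values and seed below are illustrative assumptions, not part of the argument): the ratio of the two regression slopes agrees exactly with the covariance-ratio form of the IV estimand.

```r
# Sketch on my own simulated data: the IV estimand equals the reduced-form
# slope gamma divided by the first-stage slope pi.
set.seed(123)
n <- 1000
Z <- rnorm(n)
U <- rnorm(n)
X <- 0.5 * Z + U + rnorm(n)     # endogenous: X depends on U
Y <- 1 + 2 * X + U              # true causal effect beta = 2

gamma_hat <- coef(lm(Y ~ Z))[2] # slope from regressing Y on Z
pi_hat    <- coef(lm(X ~ Z))[2] # slope from regressing X on Z

# The ratio of slopes agrees exactly with the covariance-ratio form,
# because the Var(Z) in each slope's denominator cancels.
c(slope_ratio = unname(gamma_hat / pi_hat),
  cov_ratio   = cov(Y, Z) / cov(X, Z))
```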

Second Perspective: The TSLS Approach

Construct \(\tilde{X}\) by using \(Z\) to "clean out" the part of \(X\) that is correlated with \(U\). Then regress \(Y\) on \(\tilde{X}\).

Let \(\delta\) be the intercept and \(\pi\) be the slope from a population linear regression of \(X\) on \(Z\). Defining \(V \equiv X - \delta - \pi Z\), we can write
\[
X = \tilde{X} + V, \quad \tilde{X} \equiv \delta + \pi Z, \quad \pi \equiv \frac{\text{Cov}(X,Z)}{\text{Var}(Z)}, \quad
\delta \equiv \mathbb{E}(X) - \pi\mathbb{E}(Z).
\]

By definition \(\tilde{X} \equiv \delta + \pi Z\) is the best linear predictor of \(X\) based on \(Z\), in that \(\delta\) and \(\pi\) solve the optimization problem
\[
\min_{a, b}\; \mathbb{E}[(X - a - bZ)^2].
\]
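This "best linear predictor" claim can be made concrete with a quick numerical check. The sketch below uses simulated data and starting values of my own choosing: minimizing the sample analogue of the mean squared error with a generic optimizer lands on (essentially) the same intercept and slope that `lm()` computes in closed form.

```r
# Sketch: minimize the sample analogue of E[(X - a - bZ)^2] directly
# and compare with lm(). Data and starting values are my own choices.
set.seed(321)
Z <- rnorm(500)
X <- 1 + 0.3 * Z + rnorm(500)

mse <- function(par) mean((X - par[1] - par[2] * Z)^2)
opt <- optim(c(0, 0), mse)  # Nelder-Mead starting from (0, 0)

# Numerical minimizer vs. the closed-form least squares solution
rbind(optim = opt$par, lm = unname(coef(lm(X ~ Z))))
```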

What's more, \(\text{Cov}(Z,V) = 0\) by construction since:
\[
\begin{align*}
\text{Cov}(Z,V) &= \text{Cov}(Z, X - \delta - \pi Z) = \text{Cov}(Z,X) - \pi \text{Var}(Z)\\
&= \text{Cov}(Z,X) - \frac{\text{Cov}(X,Z)}{\text{Var}(Z)} \text{Var}(Z) = 0.
\end{align*}
\]

And since \(Z\) is uncorrelated with \(U\), so is \(\tilde{X}\):
\[
\text{Cov}(\tilde{X}, U) = \text{Cov}(\delta + \pi Z, U) = \pi\,\text{Cov}(Z,U) = 0.
\]

So now we have a variable \(\tilde{X}\) that is a good predictor of \(X\) but is uncorrelated with \(U\). In essence, we've used \(Z\) to "clean out" the endogeneity from \(X\), and we did this using a first stage regression of \(X\) on \(Z\). Two-stage least squares (TSLS) combines this with a second stage regression of \(Y\) on \(\tilde{X}\) to recover \(\beta\). To see why this works, substitute \(\tilde{X} + V\) for \(X\) in the causal model, yielding
\[
\begin{align*}
Y &= \alpha + \beta X + U = \alpha + \beta (\tilde{X} + V) + U\\
&= \alpha + \beta \tilde{X} + (\beta V + U)\\
&= \alpha + \beta \tilde{X} + \tilde{U}
\end{align*}
\]

where we define \(\tilde{U} \equiv \beta V + U\). Finally, since
\[
\begin{align*}
\text{Cov}(\tilde{X}, \tilde{U}) &= \text{Cov}(\tilde{X}, \beta V + U)\\
&= \beta\,\text{Cov}(\tilde{X}, V) + \text{Cov}(\tilde{X}, U)\\
&= \beta\,\text{Cov}(\delta + \pi Z, V) + 0\\
&= \beta\pi\,\text{Cov}(Z, V) = 0
\end{align*}
\]

a regression of \(Y\) on \(\tilde{X}\) recovers the causal effect \(\beta\) of \(X\) on \(Y\).

Third Perspective: The Control Function Approach

Use \(Z\) to solve for \(V\), the part of \(U\) that is correlated with \(X\). Then regress \(Y\) on \(X\) controlling for \(V\).

I'm willing to bet that you haven't seen this approach before! The so-called control function approach starts from the same place as TSLS: the first-stage regression of \(X\) on \(Z\) from above, namely
\[
X = \delta + \pi Z + V, \quad \text{Cov}(Z,V) = 0.
\]

Like the error term \(U\) from the causal model \(Y \leftarrow \alpha + \beta X + U\), the first stage regression error \(V\) is unobserved. But as strange as it sounds, imagine running a regression of \(U\) on \(V\). Then we would obtain
\[
U = \kappa + \lambda V + \epsilon,
\quad \lambda \equiv \frac{\text{Cov}(U,V)}{\text{Var}(V)},
\quad \kappa \equiv \mathbb{E}(U) - \lambda \mathbb{E}(V)
\]

where \(\text{Cov}(V, \epsilon) = 0\) by construction. Now, because the causal model for \(Y\) includes an intercept, \(\mathbb{E}(U) = 0\). And because the first-stage linear regression model that defines \(V\) likewise includes an intercept, \(\mathbb{E}(V) = 0\) as well. This means that \(\kappa = 0\), so the regression of \(U\) on \(V\) becomes
\[
U = \lambda V + \epsilon, \quad \lambda \equiv \frac{\text{Cov}(U,V)}{\text{Var}(V)},
\quad \text{Cov}(V, \epsilon) = 0.
\]

Now, substituting for \(U\) in the causal model gives
\[
Y = \alpha + \beta X + U = \alpha + \beta X + \lambda V + \epsilon.
\]

By construction \(\text{Cov}(V, \epsilon) = 0\). And since \(X = \delta + \pi Z + V\), it follows that
\[
\begin{align*}
\text{Cov}(X,\epsilon) &= \text{Cov}(\delta + \pi Z + V, \epsilon)\\
&= \pi \text{Cov}(Z,\epsilon) + \text{Cov}(V, \epsilon)\\
&= \pi \text{Cov}(Z, U - \lambda V) + 0\\
&= \pi \left[ \text{Cov}(Z,U) - \lambda \text{Cov}(Z,V)\right] = 0.
\end{align*}
\]

Therefore, if only we could observe \(V\), a regression of \(Y\) on \(X\) that controls for \(V\) would allow us to recover the causal effect of interest, namely \(\beta\). Such a regression would also give us \(\lambda\). To see why this is interesting, notice that
\[
\begin{align*}
\text{Cov}(X,U) &= \text{Cov}(\delta + \pi Z + V, U) = \pi\,\text{Cov}(Z,U) + \text{Cov}(V,U)\\
&= 0 + \text{Cov}(V, \lambda V + \epsilon)\\
&= \lambda \text{Var}(V).
\end{align*}
\]

Since \(\text{Var}(V) > 0\), \(\lambda\) tells us the direction of endogeneity in \(X\). If \(\lambda > 0\) then \(X\) is positively correlated with \(U\), if \(\lambda < 0\) then \(X\) is negatively correlated with \(U\), and if \(\lambda = 0\) then \(X\) is exogenous. If \(U\) is ability and ability has a positive effect on years of schooling, for example, then \(\lambda\) will be positive.

Now it's time to address the elephant in the room: \(V\) is unobserved! It's all fine and well to say that if \(V\) were observed our problems would be solved, but given that it isn't in fact observed, what are we supposed to do? Here's where the TSLS first stage regression comes to the rescue. Both \(X\) and \(Z\) are observed, so we can learn \(\delta\) and \(\pi\) by regressing \(X\) on \(Z\). Given these coefficients, we can simply solve for the unobserved error: \(V = X - \delta - \pi Z\). Like TSLS, the control function approach relies crucially on the first stage regression. But while TSLS uses it to construct \(\tilde{X} = \delta + \pi Z\), the control function approach uses it to construct \(V = X - \delta - \pi Z\). We don't replace \(X\) with its exogenous component \(\tilde{X}\); instead we "pull out" the component of \(U\) that is correlated with \(X\), namely \(V\). In effect we control for the "omitted variable" \(V\), hence the name control function.

Simulating the Three Approaches

Perhaps that was all a bit abstract. Let's make it concrete by simulating some data and actually calculating estimates of \(\beta\) using each of the three approaches described above. Because this exercise relies on a sample of data rather than a population, estimates will replace parameters and residuals will replace error terms.

To begin, we need to simulate \(Z\) independently of \((U,V)\). For simplicity I'll make these standard normal and set the correlation between \(U\) and \(V\) to 0.5.

set.seed(1983) # for replicability of pseudo-random draws
n <- 1000
Z <- rnorm(n)
library(mvtnorm)
cor_mat <- matrix(c(1, 0.5,
                    0.5, 1), 2, 2, byrow = TRUE)
errors <- rmvnorm(n, sigma = cor_mat)
head(errors)
##            [,1]        [,2]
## [1,]  0.1612255 -0.96692422
## [2,]  1.4020130  1.55818062
## [3,]  1.7212525 -0.01997204
## [4,] -0.6972637 -0.68551762
## [5,]  1.3471669 -0.01766333
## [6,] -1.0441467 -0.23113677
U <- errors[,1]
V <- errors[,2]
rm(errors)

Since this is a simulation, we actually can observe \(U\) and \(V\) and hence can regress the one on the other. Since I set the standard deviation of both of them equal to one, \(\lambda\) will simply equal the correlation between them, namely 0.5.

coef(lm(U ~ V - 1)) # exclude an intercept
##         V 
## 0.5047334

Excellent! Everything is working as it should. The next step is to generate \(X\) and \(Y\). Again to keep things simple, in my simulation I'll set \(\alpha = \delta = 0\).

pi <- 0.3
beta <- 1.1
X <- pi * Z + V
Y <- beta * X + U

Now we're ready to run some regressions! We'll start with an OLS regression of \(Y\) on \(X\). This substantially overestimates \(\beta\) because \(X\) is in fact positively correlated with \(U\).

OLS <- coef(lm(Y ~ X))[2]
OLS
##        X 
## 1.567642

In contrast, the IV approach works well.

IV <- cov(Y, Z) / cov(X, Z)
IV
## [1] 1.049043

For the TSLS and control function approaches we need to run the first-stage regression of \(X\) on \(Z\) and store the results.

first_stage <- lm(X ~ Z)

The TSLS approach uses the fitted values of this regression as \(\tilde{X}\).

Xtilde <- predict(first_stage)
TSLS <- coef(lm(Y ~ Xtilde))[2] # drop the intercept since we're not interested in it
TSLS
##   Xtilde 
## 1.049043

In contrast, the control function approach uses the residuals from the first stage regression. It also gives us \(\lambda\) in addition to \(\beta\).

Vhat <- residuals(first_stage) 
CF <- coef(lm(Y ~ X + Vhat))[-1] # drop the intercept since we're not interested in it
CF # The coefficient on Vhat is lambda
##         X      Vhat 
## 1.0490432 0.5558904

Notice that we obtain precisely the same estimates for \(\beta\) using each of the three approaches.

c(IV, TSLS, CF[1])
##            Xtilde        X 
## 1.049043 1.049043 1.049043

It turns out that in this simple linear model with a single endogenous regressor and a single instrument, the three approaches are numerically equivalent. In other words, they give exactly the same answer. This will not necessarily be true in more complicated models, so be careful!
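One practical caveat worth flagging (my addition, not something the equivalence above implies): the point estimates coincide, but the standard errors reported by the naive second-stage `lm()` fit are not the correct TSLS standard errors, because its residuals are computed with \(\tilde{X}\) rather than the original \(X\). A base-R sketch on simulated data of my own:

```r
# Sketch (base R, my own simulated data): same TSLS point estimate,
# but the naive second-stage standard error is not the correct one.
set.seed(42)
n <- 1000
Z <- rnorm(n)
V <- rnorm(n)
U <- 0.5 * V + rnorm(n, sd = sqrt(0.75))  # Cor(U, V) = 0.5, Var(U) = 1
X <- 0.3 * Z + V
Y <- 1.1 * X + U

first_stage  <- lm(X ~ Z)
second_stage <- lm(Y ~ fitted(first_stage))
b <- coef(second_stage)

# Naive SE reported by the second-stage fit
naive_se <- summary(second_stage)$coefficients[2, 2]

# Correct TSLS residuals: plug the ORIGINAL X into the fitted equation
e <- Y - b[1] - b[2] * X
sigma2 <- sum(e^2) / (n - 2)
# Reuse the (W'W)^{-1} "bread" from the second stage, swap in sigma2
bread <- vcov(second_stage)[2, 2] / summary(second_stage)$sigma^2
correct_se <- sqrt(sigma2 * bread)

c(naive = naive_se, correct = correct_se)
```

In practice you would let a dedicated routine handle this, e.g. `ivreg()` from the AER package, which reports the corrected standard errors automatically.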

Epilogue

It's time to admit that this post had a secret agenda: to introduce the idea of a control function in the simplest way possible! If you're interested in learning more about control functions, a canonical example that doesn't turn out to be identical to IV is the so-called Heckman Selection Model, which you can learn more about here. (Scroll down until you see the heading "Heckman Selection Model.") The basic logic is similar: to solve an endogeneity problem, use a first-stage regression to estimate an unobserved quantity that "soaks up" the part of the error term that is correlated with your endogenous regressor of interest. If those videos whet your appetite for more control function fun, Wooldridge (2015) provides a helpful overview along with many references to the econometrics literature.
