Sunday, October 19, 2025

Econometrics Puzzler #1: To Instrument or Not?


Welcome to the first installment of the Econometrics Puzzler, a new series of shorter posts meant to test and sharpen your econometric intuition. Here's the format: I'll pose a question that requires only introductory econometrics knowledge, but has an unexpected answer. The idea is for you to ponder the question before reading my solution. Many of these questions are based on common misconceptions that come up year after year in my econometrics teaching. I hope you'll find them both challenging and enlightening. Today we'll revisit everyone's favorite example: Angrist & Krueger's 1991 paper on the returns to education.

To Instrument or Not to Instrument?

Suppose I want to predict someone's wage as accurately as possible using a linear model; that is, I want my predictions to be as close as they can be to the actual wages. (Really we will predict the log of wage.) I observe a representative sample of workers that includes their log wage \(Y_i\) and years of schooling \(X_i\). I could use an OLS regression of \(Y\) on \(X\) to make my predictions, but years of schooling are the classic example of an endogenous regressor; they are correlated with myriad unobserved causes of wages, like "ability" and family background.
Fortunately, I also have a valid and relevant instrument: quarter of birth \(Z_i\) is correlated with years of schooling and (supposedly) uncorrelated with unobserved causes of wage.

So here's the question: to get the best possible predictions of wage from the information I have, should I run OLS or IV? More specifically, let's use mean squared error (MSE) as our measure of "best". To borrow a phrase from Grant Sanderson, "pause and ponder" before reading further.

Taking it to the Data

The Angrist & Krueger (1991) dataset is available from Michal Kolesár's ManyIV R package.
Here I'll restrict attention to people born in the first or fourth quarter of the year.
The instrument is a dummy variable for being born in the fourth quarter, relative to being born in the first quarter:

# remotes::install_github("kolesarm/ManyIV") # if needed

library(ManyIV) # Contains the Angrist & Krueger (1991) dataset

# For information about the dataset, see the package documentation:
# ?ManyIV::ak80

library(dplyr)

dat <- ak80 |> 
  as_tibble() |> 
  filter(qob %in% c('Q1', 'Q4')) |> 
  mutate(z = (qob == 'Q4')) |> 
  select(x = education, y = lwage, z)
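
As a quick sanity check (my own addition, not part of the original post), we can count how many people fall into each instrument group:

# Number of observations born in Q1 (z = FALSE) versus Q4 (z = TRUE)
dat |> count(z)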

To compare how well OLS and IV perform as predictors, we'll carry out a "pseudo-out-of-sample" experiment. First we'll randomly split dat into a "training" sample containing 80% of the observations and a "test" sample containing the remaining 20%:

set.seed(1693) # For reproducibility

n_total <- nrow(dat) 
n_train <- round(0.8 * n_total) 
n_test <- n_total - n_train 

train_indices <- sample(n_total, n_train, replace = FALSE) 

dat_train <- dat[train_indices, ] 
dat_test <- dat[-train_indices, ] 

Now we'll use dat_train to fit OLS and IV:

ols_fit <- lm(y ~ x, data = dat_train) 
ols_coefs <- coef(ols_fit)

library(ivreg) # install with `install.packages("ivreg")` if needed 
iv_fit <- ivreg(y ~ x | z, data = dat_train)
iv_coefs <- coef(iv_fit)

rbind(OLS = ols_coefs, IV = iv_coefs)
##     (Intercept)          x
## OLS    5.004283 0.07008633
## IV     4.749959 0.09000644

Now we're ready to make our predictive comparison! We'll "pretend" that we don't know the wages of the people in our test sample and use the OLS and IV coefficients from above to predict the "missing" wages:

dat_test <- dat_test |> 
  mutate(ols_pred = ols_coefs[1] + ols_coefs[2] * x,
         iv_pred = iv_coefs[1] + iv_coefs[2] * x) 

Of course we actually do know the wages of everyone in dat_test; this is the column y. So we can now compare our predictions against the truth. A common measure of predictive quality is mean squared error (MSE), the average squared difference between the truth and our predictions. Because it squares the difference between the truth and our prediction, MSE penalizes larger errors more heavily than smaller ones. While there are other ways to measure prediction error, MSE is a standard choice and one that will play a key role in the rest of this post. And the winner is … OLS! Because it has a lower MSE, the predictions from the OLS model are, on average, closer to the true wages than the predictions from the IV model:

dat_test |> 
  summarize(ols_mse = mean((y - ols_pred)^2),
            iv_mse = mean((y - iv_pred)^2))
## # A tibble: 1 × 2
##   ols_mse iv_mse
##     <dbl>  <dbl>
## 1   0.407  0.411
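
For reference, the quantity computed above is
\[
\text{MSE} = \frac{1}{n_{\text{test}}} \sum_{i \in \text{test}} \left( y_i - \hat{y}_i \right)^2,
\]
where \(\hat{y}_i\) is the OLS or IV prediction for person \(i\) in the test sample.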

OLS beats IV by a small but noticeable margin. (The relatively small difference in this case reflects the fact that the IV and OLS estimates are fairly similar in this example.) It turns out that this isn't a fluke. The same would be true in any example. Unless the instrument is perfectly correlated with the endogenous regressor, OLS will always have a lower predictive MSE than IV.
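
To check that this doesn't hinge on the Angrist & Krueger data, here is a minimal simulation sketch of my own (the data-generating process and coefficient values below are arbitrary assumptions, not from the original post): "ability" drives both schooling and wages, and z is a valid, relevant instrument.

# Simulated data with an endogenous regressor and a valid instrument
set.seed(1066)
n <- 100000
ability <- rnorm(n)                              # unobserved cause of wages
z <- rnorm(n)                                    # valid, relevant instrument
x <- 1 + 0.5 * z + 0.8 * ability + rnorm(n)      # endogenous regressor
y <- 2 + 0.1 * x + 0.7 * ability + rnorm(n)      # true causal effect of x is 0.1

sim <- data.frame(y, x, z)
sim_train <- sim[1:80000, ]
sim_test  <- sim[80001:n, ]

b_ols <- coef(lm(y ~ x, data = sim_train))
b_iv  <- coef(ivreg(y ~ x | z, data = sim_train))  # ivreg was loaded above

# Test-sample MSE of each set of predictions
c(ols_mse = mean((sim_test$y - (b_ols[1] + b_ols[2] * sim_test$x))^2),
  iv_mse  = mean((sim_test$y - (b_iv[1]  + b_iv[2]  * sim_test$x))^2))

Even though IV recovers the causal coefficient (0.1) far more accurately than OLS, OLS should still deliver the lower test MSE.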

What's really going on here?

I ask this question of my introductory econometrics students every year and most of them are surprised by the answer. If we have an endogenous regressor OLS is biased and inconsistent; why would we ever pass up the chance to use a valid and relevant instrument! The answer is surprisingly simple: by definition the OLS estimand gives the best linear predictor of \(Y\), the one that minimizes MSE: \(\min_{a,b} \mathbb{E}[\{Y - (a + bX)\}^2]\). This is true regardless of whether \(X\) is endogenous. Indeed, from a predictive perspective, endogeneity is a feature not a bug! The fact that years of schooling "smuggles in" information about ability and family background is exactly why it gives better predictions than IV. Remember: the whole point of IV is to remove the part of \(X\) that is related to unobserved causes of \(Y\). This is exactly what we want if our goal is to understand cause and effect, but it's the opposite of what would make sense in a prediction problem, where we'd like to use as much information as possible.
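
To spell this out with the standard population formulas (a restatement of textbook results, not text from the original post): minimizing \(\mathbb{E}[\{Y - (a + bX)\}^2]\) over \(a\) and \(b\) gives exactly the OLS estimand, while IV targets a different ratio:
\[
b_{\text{OLS}} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad
a_{\text{OLS}} = \mathbb{E}[Y] - b_{\text{OLS}}\,\mathbb{E}[X], \qquad
b_{\text{IV}} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}.
\]
However \(X\) is generated, \((a_{\text{OLS}}, b_{\text{OLS}})\) solve the prediction problem by construction; \(b_{\text{IV}}\) coincides with \(b_{\text{OLS}}\) only when \(X\) is uncorrelated with the unobserved causes of \(Y\).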

A Red Herring: The Bias-Variance Tradeoff

Students sometimes answer this question by invoking the bias-variance tradeoff, pointing out that "OLS is biased but has a lower variance than IV, so it could have a lower MSE." This is correct, but it misses the deeper point. They're thinking about bias in estimating the causal parameter. But, again, the point here is that this isn't relevant when prediction is our goal. When ML researchers discuss the bias-variance tradeoff in predictive settings, they mean something entirely different: bias of a linear predictive model relative to the true conditional mean function. OLS gives the best linear approximation to \(\mathbb{E}[Y \mid X]\), so it's what we want in this example, since I stipulated we'd be working with linear models.
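
One way to see this in the data (a quick check of my own, not from the original post): since years of schooling is discrete, a crude estimate of \(\mathbb{E}[Y \mid X]\) is the average log wage at each schooling level, which we can compare with the OLS and IV fitted values at that level.

# Average log wage at each schooling level in the training sample,
# alongside the OLS and IV fitted values at that level
dat_train |> 
  group_by(x) |> 
  summarize(mean_y = mean(y), n_obs = n()) |> 
  mutate(ols_fit = ols_coefs[1] + ols_coefs[2] * x,
         iv_fit  = iv_coefs[1] + iv_coefs[2] * x)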

Take-Home Message

Causal inference and prediction are different goals. Causality is about counterfactuals: what would happen if we intervened to change someone's years of education? Prediction answers a different question: if I observe that someone has eight years of schooling, what's my best guess of their wage? If you want to predict, use OLS; if you want to estimate a causal effect, use IV.
