Sunday, October 19, 2025

Econometrics Puzzler #2: Fitting a Regression with Fitted Values


Suppose I run a simple linear regression of an outcome variable on a predictor variable. If I save the fitted values from this regression and then run a second regression of the outcome variable on the fitted values, what will I get? For extra credit: how will the R-squared from the second regression compare to that from the first regression?

Example: Height and Handspan

Here’s a simple example: a regression of height, measured in inches, on handspan, measured in centimeters.

library(tidyverse)
library(broom)
dat <- read_csv('https://ditraglia.com/data/height-handspan.csv')

ggplot(dat, aes(y = height, x = handspan)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", color = "red") +
  labs(y = "Height (in)", x = "Handspan (cm)")

# Fit the regression
reg1 <- lm(height ~ handspan, data = dat)
tidy(reg1)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    40.9     1.67        24.5 9.19e-76
## 2 handspan        1.27    0.0775      16.3 3.37e-44

As expected, bigger people are bigger in all dimensions, on average, so we see a positive relationship between handspan and height. Now let’s save the fitted values from this regression and run a second regression of height on the fitted values:

dat <- reg1 |> 
  augment(dat)
reg2 <- lm(height ~ .fitted, data = dat)
tidy(reg2)
## # A tibble: 2 × 5
##   term         estimate std.error statistic   p.value
##   <chr>           <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) -1.76e-13    4.17   -4.23e-14  1.00e+ 0
## 2 .fitted      1.00e+ 0    0.0612  1.63e+ 1  3.37e-44

The intercept isn’t exactly zero, but it’s about as close as we can reasonably expect to get on a computer, and the slope is exactly one. Now how about the R-squared? Let’s check:

glance(reg1)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
glance(reg2)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

The R-squared values from the two regressions are identical! Surprised? Now’s your last chance to think it through on your own before I give my solution.

Solution

Suppose we wanted to choose \(\alpha_0\) and \(\alpha_1\) to minimize \(\sum_{i=1}^n (Y_i - \alpha_0 - \alpha_1 \widehat{Y}_i)^2\) where \(\widehat{Y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 X_i\). This is equivalent to minimizing
\[
\sum_{i=1}^n \left[Y_i - (\alpha_0 + \alpha_1 \widehat{\beta}_0) - (\alpha_1\widehat{\beta}_1)X_i\right]^2.
\]

By construction, \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) minimize \(\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2\), and setting \(\alpha_0 = 0\) and \(\alpha_1 = 1\) attains exactly this minimum. Since the least-squares minimizer is unique whenever \(\widehat{\beta}_1 \neq 0\), any solution other than \(\widehat{\alpha}_0 = 0\) and \(\widehat{\alpha}_1 = 1\) would give a contradiction!
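
As a quick numerical check (my addition, not part of the original argument), we can verify that the coefficients from reg2 equal \((0, 1)\) up to floating-point error:

# Sanity check: reg2's coefficients should be (0, 1) up to rounding error
all.equal(unname(coef(reg2)), c(0, 1))  # should return TRUE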

Similar reasoning explains why the R-squared values for the two regressions are the same. The R-squared of a regression equals \(1 - \text{SS}_{\text{residual}} / \text{SS}_{\text{total}}\), where
\[
\text{SS}_{\text{total}} = \sum_{i=1}^n (Y_i - \bar{Y})^2,\quad
\text{SS}_{\text{residual}} = \sum_{i=1}^n (Y_i - \widehat{Y}_i)^2.
\]

The total sum of squares is the same for both regressions because they have the same outcome variable. The residual sum of squares is the same because \(\widehat{\alpha}_0 = 0\) and \(\widehat{\alpha}_1 = 1\) together imply that both regressions have the same fitted values.
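
To make this concrete, here is a short check (my own addition, just applying the formulas above) that computes both sums of squares directly:

# SS_total depends only on the outcome, and the two regressions share
# the same fitted values, hence the same SS_residual.
ss_total  <- sum((dat$height - mean(dat$height))^2)
ss_resid1 <- sum(residuals(reg1)^2)
ss_resid2 <- sum(residuals(reg2)^2)
all.equal(ss_resid1, ss_resid2)   # should return TRUE
1 - ss_resid1 / ss_total          # should match the r.squared from glance()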

Here I focused on the case of a simple linear regression, one with a single predictor variable, but the same basic idea holds in general. A sketch with two predictors follows below.
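
For instance, here is the same exercise with a multiple regression, using R’s built-in mtcars data (my example, not from the original post):

# Multiple regression version: regress mpg on weight and horsepower,
# then regress mpg on the fitted values from that model.
multi <- lm(mpg ~ wt + hp, data = mtcars)
refit <- lm(mpg ~ fitted(multi), data = mtcars)
coef(refit)  # intercept is ~0 and slope is ~1, just as before
c(summary(multi)$r.squared, summary(refit)$r.squared)  # identical R-squared values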
