Suppose I run a simple linear regression of an outcome variable on a predictor variable. If I save the fitted values from this regression and then run a second regression of the outcome variable on the fitted values, what will I get? For extra credit: how will the R-squared from the second regression compare to that from the first regression?
Example: Height and Handspan
Here’s a simple example: a regression of height, measured in inches, on handspan, measured in centimeters.
library(tidyverse)
library(broom)
dat <- read_csv('https://ditraglia.com/data/height-handspan.csv')
ggplot(dat, aes(y = height, x = handspan)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", color = "purple") +
  labs(y = "Height (in)", x = "Handspan (cm)")
# Fit the regression
reg1 <- lm(height ~ handspan, data = dat)
tidy(reg1)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    40.9     1.67        24.5 9.19e-76
## 2 handspan        1.27    0.0775      16.3 3.37e-44
As expected, bigger people are bigger in all dimensions, on average, so we see a positive relationship between handspan and height. Now let’s save the fitted values from this regression and run a second regression of height on the fitted values:
dat <- reg1 |>
  augment(dat)
reg2 <- lm(height ~ .fitted, data = dat)
tidy(reg2)
## # A tibble: 2 × 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) -1.76e-13    4.17   -4.23e-14 1.00e+ 0
## 2 .fitted      1.00e+ 0    0.0612  1.63e+ 1 3.37e-44
The intercept isn’t quite zero, but it’s about as close as we can reasonably expect to get on a computer, and the slope is exactly one. Now how about the R-squared? Let’s check:
glance(reg1)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
glance(reg2)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.452         0.450  3.02      267. 3.37e-44     1  -822. 1650. 1661.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
The R-squared values from the two regressions are identical! Surprised? Now’s your last chance to think it through on your own before I give my solution.
Solution
Suppose we wanted to choose \(\alpha_0\) and \(\alpha_1\) to minimize \(\sum_{i=1}^n (Y_i - \alpha_0 - \alpha_1 \widehat{Y}_i)^2\) where \(\widehat{Y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 X_i\). This is equivalent to minimizing
\[
\sum_{i=1}^n \left[Y_i - (\alpha_0 + \alpha_1 \widehat{\beta}_0) - (\alpha_1\widehat{\beta}_1)X_i\right]^2.
\]
By construction \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) minimize \(\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2\), and setting \(\alpha_0 = 0\) and \(\alpha_1 = 1\) makes the intercept and slope in the expression above equal to exactly these values. So unless \(\widehat{\alpha}_0 = 0\) and \(\widehat{\alpha}_1 = 1\), the second regression would deliver an intercept and slope with a smaller sum of squared residuals than \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\): we’d have a contradiction!
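As a quick numerical sanity check, here’s a short sketch (my own addition, not part of the original exercise) that recombines the coefficients from the two regressions and confirms that the implied intercept and slope in terms of handspan reproduce those from the first regression:
# Sketch: alpha_0 + alpha_1 * beta_0 and alpha_1 * beta_1 should reproduce
# the intercept and slope from the first regression.
a <- coef(reg2)  # (alpha_0_hat, alpha_1_hat)
b <- coef(reg1)  # (beta_0_hat, beta_1_hat)
implied <- c(a[1] + a[2] * b[1], a[2] * b[2])
all.equal(unname(implied), unname(b))  # TRUE, up to numerical precision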
Similar reasoning explains why the R-squared values for the two regressions are the same. The R-squared of a regression equals \(1 - \text{SS}_{\text{residual}} / \text{SS}_{\text{total}}\) where
\[
\text{SS}_{\text{total}} = \sum_{i=1}^n (Y_i - \bar{Y})^2, \quad
\text{SS}_{\text{residual}} = \sum_{i=1}^n (Y_i - \widehat{Y}_i)^2.
\]
The total sum of squares is the same for both regressions because they have the same outcome variable. The residual sum of squares is the same because \(\widehat{\alpha}_0 = 0\) and \(\widehat{\alpha}_1 = 1\) together imply that both regressions have the same fitted values.
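To make this concrete, here’s another sketch (again my own addition) that computes both R-squared values directly from these sums of squares and checks that the fitted values coincide:
# Sketch: compute R-squared by hand from SS_total and SS_residual.
ss_total  <- sum((dat$height - mean(dat$height))^2)
ss_resid1 <- sum(residuals(reg1)^2)
ss_resid2 <- sum(residuals(reg2)^2)
c(1 - ss_resid1 / ss_total, 1 - ss_resid2 / ss_total)  # both equal 0.452
all.equal(unname(fitted(reg1)), unname(fitted(reg2)))  # identical fitted values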
Here I focused on the case of a simple linear regression, one with a single predictor variable, but the same basic idea holds in general.
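For instance, here’s a sketch using the built-in mtcars data (my own example, not the height-handspan data) with several predictors at once:
# Sketch: the same result holds for a multiple regression.
reg_multi <- lm(mpg ~ wt + hp + disp, data = mtcars)
reg_fit   <- lm(mpg ~ fitted(reg_multi), data = mtcars)
coef(reg_fit)  # intercept is (numerically) zero, slope is one
c(summary(reg_multi)$r.squared, summary(reg_fit)$r.squared)  # identical R-squared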