\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\zb}{{\bf z}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Eb}{{\bf E}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas for, and the computation of, independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in the ado-commands that I discuss in upcoming posts.
This is the fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
OLS formulas
Recall that the OLS point estimates are given by
\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i' y_i
\right)
\]
where \(\xb_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is
\[
y_i = \xb_i\betab' + \epsilon_i
\]
If the \(\epsilon_i\) are independently and identically distributed, we estimate the variance–covariance matrix of the estimator (VCE) by
\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where \(\widehat{s} = 1/(N-k)\sum_{i=1}^N e_i^2\) and \(e_i = y_i - \xb_i\widehat{\betab}\).
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
Stata matrix implementation
I use the matrix accum command to compute the sum of the products over the observations. Typing
. matrix accum zpz = z1 z2 z3
puts \(\left( \sum_{i=1}^N \zb_i'\zb_i \right)\) into the Stata matrix zpz, where \(\zb_i=( {\tt z1}_i, {\tt z2}_i, {\tt z3}_i, 1)\). The \(1\) appears because matrix accum includes the constant term by default, as do almost all estimation commands.
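If you want to exclude the constant, matrix accum accepts a noconstant option. A minimal sketch, assuming variables z1, z2, and z3 exist in memory:

```stata
* sketch: suppress the constant term in the cross-product matrix
matrix accum zpz0 = z1 z2 z3, noconstant
matrix list zpz0      // zpz0 is 3 x 3, with no _cons row or column
```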
Below, I use matrix accum to compute \(\left( \sum_{i=1}^N \zb_i'\zb_i \right)\), which contains \(\left( \sum_{i=1}^N \xb_i'\xb_i \right)\) and \(\left( \sum_{i=1}^N \xb_i'y_i \right)\).
Example 1: Using matrix accum
. sysuse auto
(1978 Automobile Data)

. matrix accum zpz = price mpg trunk
(obs=74)

. matrix list zpz

symmetric zpz[4,4]
            price        mpg      trunk      _cons
price   3.448e+09
  mpg     9132716      36008
trunk     6565725      20630      15340
_cons      456229       1576       1018         74
Now, I extract \(\left( \sum_{i=1}^N \xb_i'\xb_i \right)\) from rows 2–4 and columns 2–4 of zpz and \(\left( \sum_{i=1}^N \xb_i'y_i \right)\) from rows 2–4 and column 1 of zpz.
Example 2: Extracting submatrices

. matrix xpx = zpz[2..4, 2..4]

. matrix xpy = zpz[2..4, 1]

. matrix list xpx

symmetric xpx[3,3]
          mpg   trunk   _cons
  mpg   36008
trunk   20630   15340
_cons    1576    1018      74

. matrix list xpy

xpy[3,1]
            price
  mpg     9132716
trunk     6565725
_cons      456229
I now compute \(\widehat{\betab}\) from the matrices formed in example 2.

Example 3: Computing \(\widehat{\betab}\)

. matrix xpxi = invsym(xpx)

. matrix b = xpxi*xpy

. matrix list b

b[3,1]
             price
  mpg   -220.16488
trunk     43.55851
_cons     10254.95

. matrix b = b'

. matrix list b

b[1,3]
             mpg       trunk       _cons
price  -220.16488    43.55851    10254.95
I transposed b to make it a row vector because point estimates in Stata are stored as row vectors.
Example 3 illustrates that the Stata matrix b contains both the estimated coefficients and the names of the variables to which those coefficients belong. To clarify, our model is
\[
\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ] = {\tt mpg}*\beta_{\tt mpg}
+ {\tt trunk}*\beta_{\tt trunk} + {\tt \_cons}
\]
and b contains the information that \(-220.16\) is the estimated coefficient on mpg, that \(43.56\) is the estimated coefficient on trunk, and that \(10254.95\) is the estimated constant. We can compute the linear combination \(\xb_i\widehat{\betab}'\) over the observations using the information in b, because b contains both the value and the name of each coefficient.
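Because b carries coefficient names, its elements can also be addressed by name instead of position in matrix subscripts. A quick sketch, using the names from example 3:

```stata
* sketch: extract a coefficient by column name rather than position
matrix bmpg = b[1, "mpg"]     // 1 x 1 matrix holding the mpg coefficient
display el(bmpg, 1, 1)
```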
I use matrix score to compute this linear combination for each observation, and I use generate to spell out what this linear combination is.
Example 4: Using matrix score to compute \(\xb_i\widehat{\betab}'\)

. matrix score double xbhat1 = b

. generate double xbhat2 = mpg*(-220.16488) + trunk*(43.55851) + 10254.95

. list xbhat1 xbhat2 in 1/4
+-----------------------+
| xbhat1 xbhat2 |
|-----------------------|
1. | 5890.4661 5890.4663 |
2. | 6991.2905 6991.2907 |
3. | 5934.0246 5934.0248 |
4. | 6548.5884 6548.5886 |
+-----------------------+
I use the predictions for \(\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ]\) stored in xbhat1 to compute the residuals and the estimated VCE.
Example 5: Computing the estimated VCE

. generate double res = (price - xbhat1)

. generate double res2 = res^2
. summarize res2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6674850.504745401
                r(Var) =  168983977867533.1
                 r(sd) =  12999383.74952956
                r(min) =  11.24371634723049
                r(max) =  94250157.2111593
                r(sum) =  493938937.3511598
. local N = r(N)

. local sum = r(sum)

. local s2 = `sum'/(`N'-3)

. matrix V = (`s2')*xpxi
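The standard errors reported by regress are the square roots of the diagonal elements of the estimated VCE. As a quick check, a sketch along these lines pulls them out of the matrix V computed above:

```stata
* sketch: standard errors are the square roots of the diagonal of V
* (assumes V from example 5 is in memory; el() extracts one element)
scalar se_mpg   = sqrt(el(V, 1, 1))
scalar se_trunk = sqrt(el(V, 2, 2))
scalar se_cons  = sqrt(el(V, 3, 3))
display se_mpg "  " se_trunk "  " se_cons
```

These values can be compared against the Std. Err. column reported by regress in example 6.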
(See Programming an estimation command in Stata: Where to store your stuff for discussions of using results from r-class commands and using local macros.)
I verify that my computations for \(\widehat{\betab}\) and the VCE match those of regress.
Example 6: Comparing against regress

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------
. matrix list e(b)

e(b)[1,3]
             mpg       trunk       _cons
   y1  -220.16488    43.55851    10254.95

. matrix list b

b[1,3]
             mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

. matrix list V

symmetric V[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7
Robust standard errors
The frequently used robust estimator of the VCE is given by
\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
matrix accum with the weights \(\widehat{e}_i^2\) computes the formula for \(\Mb\). Below, I use matrix accum to compute \(\Mb\) and \(\widehat{V}_{robust}\).
Example 7: A robust VCE

. matrix accum M = mpg trunk [iweight=res2]
(obs=493938937.4)

. matrix V2 = (`N'/(`N'-3))*xpxi*M*xpxi
I now verify that my computations match those reported by regress.
Example 8: Comparing computations of the robust VCE

. regress price mpg trunk, robust

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------
. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8
Cluster-robust standard errors
The cluster-robust estimator of the VCE is frequently used when the data have a panel structure, also known as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by
\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j \]
\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
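One way to see the connection to the previous section: when every group contains a single observation (\(n_j=1\) for all \(j\)), \(\widehat{\eb}_j\widehat{\eb}_j'\) collapses to \(\widehat{e}_i^2\), and \(\Mb_c\) reduces to the robust \(\Mb\). A hedged sketch of this check, using a hypothetical singleton-group identifier idvar that is not part of the original examples:

```stata
* sketch: with one observation per group, matrix opaccum should
* reproduce the middle term computed by matrix accum with iweights
generate long idvar = _n      // hypothetical id: each observation its own group
sort idvar                    // opaccum requires the data sorted by group
matrix opaccum Mchk = mpg trunk , group(idvar) opvar(res)
matrix list Mchk              // compare with M from example 7
```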
matrix opaccum computes the formula for \(\Mb_c\). Below, I create the group variable cvar from rep78 and use matrix opaccum to compute \(\Mb_c\) and \(\widehat{V}_{cluster}\).
Example 9: A cluster-robust VCE

. generate cvar = cond( missing(rep78), 6, rep78)
. tab cvar

       cvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.70        2.70
          2 |          8       10.81       13.51
          3 |         30       40.54       54.05
          4 |         18       24.32       78.38
          5 |         11       14.86       93.24
          6 |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00
. local Nc = r(r)

. sort cvar

. matrix opaccum M2 = mpg trunk , group(cvar) opvar(res)

. matrix V2 = ((`N'-1)/(`N'-3))*(`Nc'/(`Nc'-1))*xpxi*M2*xpxi
I now verify that my computations match those reported by regress.
Example 10: Comparing computations of the cluster-robust VCE

. regress price mpg trunk, vce(cluster cvar)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                 (Std. Err. adjusted for 6 clusters in cvar)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------
. matrix list e(V)
symmetric e(V)[3,3]
mpg trunk _cons
mpg 8701.3957
trunk 4053.5381 3468.7911
_cons -223021 -124190.97 5995384.3
. matrix list V2
symmetric V2[3,3]
mpg trunk _cons
mpg 8701.3957
trunk 4053.5381 3468.7911
_cons -223021 -124190.97 5995384.3
Done and undone
I reviewed the formulas that underlie the OLS estimator and showed how to compute them using Stata matrix commands and functions. In the next two posts, I write an ado-command that implements these formulas.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison–Wesley.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, OH: South-Western.
