Tuesday, March 3, 2026

Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects


\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\zb}{{\bf z}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Eb}{{\bf E}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.

This is the fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

OLS formulation

Recall that the OLS point estimates are given by

\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i' y_i
\right)
\]

where \({\bf x}_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is

\[
y_i = \xb_i\betab' + \epsilon_i
\]

If the \(\epsilon_i\) are independently and identically distributed, we estimate the variance-covariance matrix of the estimator (VCE) by

\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where \(\widehat{s} = 1/(N-k)\sum_{i=1}^N e_i^2\) and \(e_i=y_i-{\bf x}_i\widehat{\boldsymbol{\beta}}\).

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
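These formulas are just cross products and a matrix inverse. As a cross-check outside Stata, here is a minimal numpy sketch of the same algebra on hypothetical simulated data (all names and values here are illustrative, not from the auto dataset):

```python
import numpy as np

# Hypothetical data; the last column of X is the constant, as in Stata.
rng = np.random.default_rng(0)
N, k = 50, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=N)

XpX = X.T @ X                   # sum_i x_i' x_i
Xpy = X.T @ y                   # sum_i x_i' y_i
b = np.linalg.solve(XpX, Xpy)   # OLS point estimates

e = y - X @ b                   # residuals e_i
s2 = (e @ e) / (N - k)          # s-hat = 1/(N-k) sum_i e_i^2
V = s2 * np.linalg.inv(XpX)     # iid-based VCE
```

The Stata examples below compute exactly these objects with matrix accum, invsym, and matrix multiplication.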

Stata matrix implementation

I use the matrix accum command to compute the sum of the products over the observations. Typing

.  matrix accum zpz = z1 z2 z3

puts \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\) into the Stata matrix zpz, where \({\bf z}_i=( {\tt z1}_i, {\tt z2}_i, {\tt z3}_i, 1)\). The \(1\) appears because matrix accum includes the constant term by default, like almost all estimation commands.
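In matrix terms, matrix accum appends a column of ones and forms the cross product. A numpy sketch of the same operation on hypothetical data:

```python
import numpy as np

# Hypothetical data: variables z1, z2, z3 over 4 observations.
Z = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [2.0, 1.0, 0.0]])

# Append the constant term, as matrix accum does by default.
Za = np.column_stack([Z, np.ones(len(Z))])
zpz = Za.T @ Za   # sum_i z_i' z_i, a symmetric 4 x 4 matrix
```

The last diagonal entry of zpz is \(\sum_i 1\cdot 1 = N\), which is why the \(74\) appears in the _cons corner of the zpz matrix in example 1.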

Below, I use matrix accum to compute \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\), which contains \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) and \(\left( \sum_{i=1}^N {\bf x}_i' y_i \right)\).

Example 1: Using matrix accum

. sysuse auto
(1978 Automobile Data)

. matrix accum zpz = price mpg trunk
(obs=74)

. matrix list zpz

symmetric zpz[4,4]
           price        mpg      trunk      _cons
price  3.448e+09
  mpg    9132716      36008
trunk    6565725      20630      15340
_cons     456229       1576       1018         74

Now, I extract \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) from rows 2–4 and columns 2–4 of zpz and \(\left( \sum_{i=1}^N {\bf x}_i' y_i \right)\) from rows 2–4 and column 1 of zpz.

Example 2: Extracting submatrices

. matrix xpx       = zpz[2..4, 2..4]

. matrix xpy       = zpz[2..4, 1]

. matrix list xpx

symmetric xpx[3,3]
         mpg  trunk  _cons
  mpg  36008
trunk  20630  15340
_cons   1576   1018     74

. matrix list xpy

xpy[3,1]
         price
  mpg  9132716
trunk  6565725
_cons   456229

I now compute \(\widehat{\boldsymbol{\beta}}\) from the matrices formed in example 2.

Example 3: Computing \(\widehat{\betab}\)

. matrix xpxi      = invsym(xpx)

. matrix b         = xpxi*xpy

. matrix list b

b[3,1]
            price
  mpg  -220.16488
trunk    43.55851
_cons    10254.95

. matrix b         = b'

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

I transposed b to make it a row vector because point estimates in Stata are stored as row vectors.

Example 3 illustrates that the Stata matrix b contains the estimated coefficients and the names of the variables on which these values are estimated coefficients. To clarify, our model is
\[
\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ] = {\tt mpg}*\beta_{\tt mpg}
+ {\tt trunk}*\beta_{\tt trunk} + {\tt \_cons}
\]

and b contains the information that \(-220.16\) is the estimated coefficient on mpg, that \(43.56\) is the estimated coefficient on trunk, and that \(10254.95\) is the estimated constant. We can compute the linear combination \(\xb_i\widehat{\betab}\) over the observations using the information in b, because b contains both the value and the name for each coefficient.
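Outside Stata, this name-value bookkeeping can be sketched in a few lines of Python (a hypothetical helper, not a Stata feature); the coefficient values come from example 3, and the test observation uses mpg 22 and trunk 11, the values in the first row of auto:

```python
# Coefficients stored together with their variable names, as the Stata
# matrix b does; _cons multiplies an implicit 1.
b = {"mpg": -220.16488, "trunk": 43.55851, "_cons": 10254.95}

def linear_combination(obs, coefs):
    """Compute x_i * b-hat from named values."""
    return sum(v * (1.0 if name == "_cons" else obs[name])
               for name, v in coefs.items())

# Reproduces xbhat2 for the first observation (about 5890.466).
xbhat = linear_combination({"mpg": 22, "trunk": 11}, b)
```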

I use matrix score to compute this linear combination for each observation, and I use generate to reiterate what this linear combination is.

Example 4: Using matrix score to compute \(\xb_i\widehat{\betab}'\)

. matrix score double xbhat1 = b

. generate     double xbhat2 = mpg*(-220.16488) + trunk*(43.55851) + 10254.95

. list xbhat1 xbhat2 in 1/4

     +-----------------------+
     |    xbhat1      xbhat2 |
     |-----------------------|
  1. | 5890.4661   5890.4663 |
  2. | 6991.2905   6991.2907 |
  3. | 5934.0246   5934.0248 |
  4. | 6548.5884   6548.5886 |
     +-----------------------+

I use the predictions for \(\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ]\) in xbhat1 to compute the residuals and the estimated VCE.

Example 5: Computing the estimated VCE

. generate double res       = (price - xbhat1)

. generate double res2      = res^2

. summarize res2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6674850.504745401
                r(Var) =  168983977867533.1
                 r(sd) =  12999383.74952956
                r(min) =  11.24371634723049
                r(max) =  94250157.2111593
                r(sum) =  493938937.3511598

. local N                   = r(N)

. local sum                 = r(sum)

. local s2                  = `sum'/(`N'-3)

. matrix V                  = (`s2')*xpxi

(See Programming an estimation command in Stata: Where to store your stuff for discussions of using results from r-class commands and using local macros.)

I verify that my computations for \(\widehat{\betab}\) and the VCE match those of regress.

Example 6: Comparing against regress

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

. matrix list e(b)

e(b)[1,3]
           mpg       trunk       _cons
y1  -220.16488    43.55851    10254.95

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

. matrix list V

symmetric V[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

Robust standard errors

The frequently used robust estimator of the VCE is given by

\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
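In matrix terms, weighting each observation's cross product by \(\widehat{e}_i^2\) yields \(\Mb\). A minimal numpy sketch of this sandwich computation, on hypothetical heteroskedastic data (all names are illustrative):

```python
import numpy as np

# Hypothetical data whose error variance grows with the first regressor.
rng = np.random.default_rng(1)
N, k = 40, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N) * (1 + X[:, 0]**2)

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M = sum_i e_i^2 x_i' x_i, i.e. X'X with observation weights e_i^2,
# which is what matrix accum with [iweight=res2] computes below.
M = (X * (e**2)[:, None]).T @ X
V_robust = (N / (N - k)) * XpXi @ M @ XpXi
```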

matrix accum with weights \(\widehat{e}_i^2\) computes the formula for \(\Mb\). Below, I use matrix accum to compute \(\Mb\) and \(\widehat{V}_{robust}\).

Example 7: A robust VCE

. matrix accum M    = mpg trunk [iweight=res2]
(obs=493938937.4)

. matrix V2         = (`N'/(`N'-3))*xpxi*M*xpxi

I now verify that my computations match those reported by regress.

Example 8: Comparing computations of robust VCE

. regress price mpg trunk, robust

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

Cluster-robust standard errors

The cluster-robust estimator of the VCE is frequently used when the data have a panel structure, also known as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by

\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j \]

\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
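A minimal numpy sketch of \(\Mb_c\) and \(\widehat{V}_{cluster}\), accumulating the \(\Xb_j'\widehat{\eb}_j\) terms group by group on hypothetical data (the group structure and all values are illustrative):

```python
import numpy as np

# Hypothetical grouped data: 5 groups of 6 observations each.
rng = np.random.default_rng(2)
N, k, g = 30, 3, 5
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
cvar = np.repeat(np.arange(g), N // g)   # group identifier
y = X @ np.array([0.5, 1.0, 2.0]) + rng.normal(size=N)

XpXi = np.linalg.inv(X.T @ X)
e = y - X @ (XpXi @ (X.T @ y))

# M_c = sum_j X_j' e_j e_j' X_j = sum_j (X_j'e_j)(X_j'e_j)',
# accumulated one group at a time, as matrix opaccum does.
Mc = np.zeros((k, k))
for j in range(g):
    sel = cvar == j
    xe = X[sel].T @ e[sel]               # X_j' e_j, a k-vector
    Mc += np.outer(xe, xe)

V_cluster = ((N - 1)/(N - k)) * (g/(g - 1)) * XpXi @ Mc @ XpXi
```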

matrix opaccum computes the formula for \(\Mb_c\). Below, I create the group variable cvar from rep78 and use matrix opaccum to compute \(\Mb_c\) and \(\widehat{V}_{cluster}\).

Example 9: A cluster-robust VCE

. generate cvar = cond( missing(rep78), 6, rep78)

. tab cvar

       cvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.70        2.70
          2 |          8       10.81       13.51
          3 |         30       40.54       54.05
          4 |         18       24.32       78.38
          5 |         11       14.86       93.24
          6 |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

. local Nc = r(r)

. sort cvar

. matrix opaccum M2     = mpg trunk , group(cvar) opvar(res)

. matrix V2          = ((`N'-1)/(`N'-3))*(`Nc'/(`Nc'-1))*xpxi*M2*xpxi

I now verify that my computations match those reported by regress.

Example 10: Comparing computations of cluster-robust VCE

. regress price mpg trunk, vce(cluster cvar)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                   (Std. Err. adjusted for 6 clusters in cvar)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3

Done and undone

I reviewed the formulas that underlie the OLS estimator and showed how to compute them using Stata matrix commands and functions. In the next two posts, I write an ado-command that implements these formulas.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison Wesley.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, OH: South-Western.


