Tuesday, March 3, 2026

Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects


\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\zb}{{\bf z}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Eb}{{\bf E}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.

This is the fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

OLS formulation

Recall that the OLS point estimates are given by

\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i' y_i
\right)
\]

where \({\bf x}_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is

\[
y_i = \xb_i\betab' + \epsilon_i
\]

If the \(\epsilon_i\) are independently and identically distributed, we estimate the variance-covariance matrix of the estimator (VCE) by

\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where \(\widehat{s} = 1/(N-k)\sum_{i=1}^N e_i^2\) and \(e_i=y_i-{\bf x}_i\widehat{\boldsymbol{\beta}}\).

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
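These formulas are just cross products and a matrix inverse. As a cross-check outside Stata, here is a minimal numpy sketch of the same algebra on hypothetical simulated data (all names and values here are illustrative, not from the auto dataset):

```python
import numpy as np

# Hypothetical data; the last column of X is the constant, as in Stata.
rng = np.random.default_rng(0)
N, k = 50, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=N)

XpX = X.T @ X                   # sum_i x_i' x_i
Xpy = X.T @ y                   # sum_i x_i' y_i
b = np.linalg.solve(XpX, Xpy)   # OLS point estimates

e = y - X @ b                   # residuals e_i
s2 = (e @ e) / (N - k)          # s-hat = 1/(N-k) sum_i e_i^2
V = s2 * np.linalg.inv(XpX)     # iid-based VCE
```

The Stata examples below compute exactly these objects with matrix accum, invsym, and matrix multiplication.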

Stata matrix implementation

I use the matrix accum command to compute the sum of the products over the observations. Typing

.  matrix accum zpz = z1 z2 z3

puts \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\) into the Stata matrix zpz, where \({\bf z}_i=( {\tt z1}_i, {\tt z2}_i, {\tt z3}_i, 1)\). The \(1\) appears because matrix accum includes the constant term by default, like almost all estimation commands.
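In matrix terms, matrix accum appends a column of ones and forms the cross product. A numpy sketch of the same operation on hypothetical data:

```python
import numpy as np

# Hypothetical data: variables z1, z2, z3 over 4 observations.
Z = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [2.0, 1.0, 0.0]])

# Append the constant term, as matrix accum does by default.
Za = np.column_stack([Z, np.ones(len(Z))])
zpz = Za.T @ Za   # sum_i z_i' z_i, a symmetric 4 x 4 matrix
```

The last diagonal entry of zpz is \(\sum_i 1\cdot 1 = N\), which is why the \(74\) appears in the _cons corner of the zpz matrix in example 1.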

Below, I use matrix accum to compute \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\), which contains \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) and \(\left( \sum_{i=1}^N {\bf x}_i' y_i \right)\).

Example 1: Using matrix accum

. sysuse auto
(1978 Automobile Data)

. matrix accum zpz = price mpg trunk
(obs=74)

. matrix list zpz

symmetric zpz[4,4]
           price        mpg      trunk      _cons
price  3.448e+09
  mpg    9132716      36008
trunk    6565725      20630      15340
_cons     456229       1576       1018         74

Now, I extract \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) from rows 2–4 and columns 2–4 of zpz and \(\left( \sum_{i=1}^N {\bf x}_i' y_i \right)\) from rows 2–4 and column 1 of zpz.

Example 2: Extracting submatrices

. matrix xpx       = zpz[2..4, 2..4]

. matrix xpy       = zpz[2..4, 1]

. matrix list xpx

symmetric xpx[3,3]
         mpg  trunk  _cons
  mpg  36008
trunk  20630  15340
_cons   1576   1018     74

. matrix list xpy

xpy[3,1]
         price
  mpg  9132716
trunk  6565725
_cons   456229

I now compute \(\widehat{\boldsymbol{\beta}}\) from the matrices formed in example 2.

Example 3: Computing \(\widehat{\betab}\)

. matrix xpxi      = invsym(xpx)

. matrix b         = xpxi*xpy

. matrix list b

b[3,1]
            price
  mpg  -220.16488
trunk    43.55851
_cons    10254.95

. matrix b         = b'

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

I transposed b to make it a row vector because point estimates in Stata are stored as row vectors.

Example 3 illustrates that the Stata matrix b contains the estimated coefficients and the names of the variables on which these values are estimated coefficients. To clarify, our model is
\[
\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ] = {\tt mpg}*\beta_{\tt mpg}
+ {\tt trunk}*\beta_{\tt trunk} + {\tt \_cons}
\]

and b contains the information that \(-220.16\) is the estimated coefficient on mpg, that \(43.56\) is the estimated coefficient on trunk, and that \(10254.95\) is the estimated constant. We can compute the linear combination \(\xb_i\widehat{\betab}\) over the observations using the information in b, because b contains both the value and the name for each coefficient.
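Outside Stata, this name-value bookkeeping can be sketched in a few lines of Python (a hypothetical helper, not a Stata feature); the coefficient values come from example 3, and the test observation uses mpg 22 and trunk 11, the values in the first row of auto:

```python
# Coefficients stored together with their variable names, as the Stata
# matrix b does; _cons multiplies an implicit 1.
b = {"mpg": -220.16488, "trunk": 43.55851, "_cons": 10254.95}

def linear_combination(obs, coefs):
    """Compute x_i * b-hat from named values."""
    return sum(v * (1.0 if name == "_cons" else obs[name])
               for name, v in coefs.items())

# Reproduces xbhat2 for the first observation (about 5890.466).
xbhat = linear_combination({"mpg": 22, "trunk": 11}, b)
```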

I use matrix score to compute this linear combination for each observation, and I use generate to reiterate what this linear combination is.

Example 4: Using matrix score to compute \(\xb_i\widehat{\betab}'\)

. matrix score double xbhat1 = b

. generate     double xbhat2 = mpg*(-220.16488) + trunk*(43.55851) + 10254.95

. list xbhat1 xbhat2 in 1/4

     +-----------------------+
     |    xbhat1      xbhat2 |
     |-----------------------|
  1. | 5890.4661   5890.4663 |
  2. | 6991.2905   6991.2907 |
  3. | 5934.0246   5934.0248 |
  4. | 6548.5884   6548.5886 |
     +-----------------------+

I use the predictions for \(\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ]\) in xbhat1 to compute the residuals and the estimated VCE.

Example 5: Computing the estimated VCE

. generate double res       = (price - xbhat1)

. generate double res2      = res^2

. summarize res2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6674850.504745401
                r(Var) =  168983977867533.1
                 r(sd) =  12999383.74952956
                r(min) =  11.24371634723049
                r(max) =  94250157.2111593
                r(sum) =  493938937.3511598

. local N                   = r(N)

. local sum                 = r(sum)

. local s2                  = `sum'/(`N'-3)

. matrix V                  = (`s2')*xpxi

(See Programming an estimation command in Stata: Where to store your stuff for discussions of using results from r-class commands and using local macros.)

I verify that my computations for \(\widehat{\betab}\) and the VCE match those of regress.

Example 6: Comparing against regress

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

. matrix list e(b)

e(b)[1,3]
           mpg       trunk       _cons
y1  -220.16488    43.55851    10254.95

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

. matrix list V

symmetric V[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

Robust standard errors

The frequently used robust estimator of the VCE is given by

\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
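In matrix terms, weighting each observation's cross product by \(\widehat{e}_i^2\) yields \(\Mb\). A minimal numpy sketch of this sandwich computation, on hypothetical heteroskedastic data (all names are illustrative):

```python
import numpy as np

# Hypothetical data whose error variance grows with the first regressor.
rng = np.random.default_rng(1)
N, k = 40, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N) * (1 + X[:, 0]**2)

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M = sum_i e_i^2 x_i' x_i, i.e. X'X with observation weights e_i^2,
# which is what matrix accum with [iweight=res2] computes below.
M = (X * (e**2)[:, None]).T @ X
V_robust = (N / (N - k)) * XpXi @ M @ XpXi
```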

matrix accum with weights \(\widehat{e}_i^2\) computes the formula for \(\Mb\). Below, I use matrix accum to compute \(\Mb\) and \(\widehat{V}_{robust}\).

Example 7: A robust VCE

. matrix accum M    = mpg trunk [iweight=res2]
(obs=493938937.4)

. matrix V2         = (`N'/(`N'-3))*xpxi*M*xpxi

I now verify that my computations match those reported by regress.

Example 8: Comparing computations of robust VCE

. regress price mpg trunk, robust

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

Cluster-robust standard errors

The cluster-robust estimator of the VCE is frequently used when the data have a panel structure, also known as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by

\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j \]

\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
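A minimal numpy sketch of \(\Mb_c\) and \(\widehat{V}_{cluster}\), accumulating the \(\Xb_j'\widehat{\eb}_j\) terms group by group on hypothetical data (the group structure and all values are illustrative):

```python
import numpy as np

# Hypothetical grouped data: 5 groups of 6 observations each.
rng = np.random.default_rng(2)
N, k, g = 30, 3, 5
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
cvar = np.repeat(np.arange(g), N // g)   # group identifier
y = X @ np.array([0.5, 1.0, 2.0]) + rng.normal(size=N)

XpXi = np.linalg.inv(X.T @ X)
e = y - X @ (XpXi @ (X.T @ y))

# M_c = sum_j X_j' e_j e_j' X_j = sum_j (X_j'e_j)(X_j'e_j)',
# accumulated one group at a time, as matrix opaccum does.
Mc = np.zeros((k, k))
for j in range(g):
    sel = cvar == j
    xe = X[sel].T @ e[sel]               # X_j' e_j, a k-vector
    Mc += np.outer(xe, xe)

V_cluster = ((N - 1)/(N - k)) * (g/(g - 1)) * XpXi @ Mc @ XpXi
```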

matrix opaccum computes the formula for \(\Mb_c\). Below, I create the group variable cvar from rep78 and use matrix opaccum to compute \(\Mb_c\) and \(\widehat{V}_{cluster}\).

Example 9: A cluster-robust VCE

. generate cvar = cond( missing(rep78), 6, rep78)

. tab cvar

       cvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.70        2.70
          2 |          8       10.81       13.51
          3 |         30       40.54       54.05
          4 |         18       24.32       78.38
          5 |         11       14.86       93.24
          6 |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

. local Nc = r(r)

. sort cvar

. matrix opaccum M2     = mpg trunk , group(cvar) opvar(res)

. matrix V2          = ((`N'-1)/(`N'-3))*(`Nc'/(`Nc'-1))*xpxi*M2*xpxi

I now verify that my computations match those reported by regress.

Example 10: Comparing computations of cluster-robust VCE

. regress price mpg trunk, vce(cluster cvar)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                   (Std. Err. adjusted for 6 clusters in cvar)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3

Done and undone

I reviewed the formulas that underlie the OLS estimator and showed how to compute them using Stata matrix commands and functions. In the next two posts, I write an ado-command that implements these formulas.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison Wesley.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, OH: South-Western.


