Monday, February 16, 2026

Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata-based OLS command


I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the variance-covariance of the estimator (VCE). I then discuss myregress12.ado, which performs its computations in Mata and computes VCE estimators based on independently and identically distributed (IID) observations, robust methods, or cluster-robust methods.

myregress12.ado performs ordinary least-squares (OLS) regression, and it extends myregress11.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. To get the most out of this post, you should be familiar with Programming an estimation command in Stata: Using a subroutine to parse a complex option and Programming an estimation command in Stata: Computing OLS objects in Mata.

This is the sixteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Parsing the vce() option

I used ado-subroutines to simplify the parsing of the options vce(robust) and vce(cluster cvarname) in myregress10.ado; see Programming an estimation command in Stata: Using a subroutine to parse a complex option. Part of the point was to illustrate how to write ado-subroutines and the programming tricks that I used in these subroutines.

Here I use the undocumented command _vce_parse to simplify the parsing. There are many undocumented commands designed to help Stata programmers. They are undocumented in that they are tersely documented in the system help but not documented in the manuals. In addition, the syntax or behavior of these commands may change over Stata releases, although this rarely happens.

_vce_parse helps Stata programmers parse the vce() option. To see how it works, consider the problem of parsing the syntax of myregress12.

myregress12 depvar [indepvars] [if] [in] [, vce(robust | cluster clustervar) noconstant]

where indepvars can contain factor variables or time-series variables.

I can use the syntax command to put whatever the user specifies in the option vce() into the local macro vce, but I still must (1) check that what was specified makes sense and (2) create local macros that the code can use to do the right thing. Examples 1–7 create the local macro vce, simulating what syntax would do, and then use _vce_parse to perform tasks (1) and (2).
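To see the first step in isolation, here is a minimal sketch of how syntax fills the local macro vce. The command name showvce is hypothetical and is not part of this series; it only echoes what the user typed inside vce().

```stata
* showvce is a hypothetical demo command, not part of myregress12.ado
capture program drop showvce
program define showvce
    version 14.1
    // vce(string) stores whatever was typed inside vce() in the local macro vce
    syntax varlist(numeric) [, vce(string) ]
    display "local macro vce contains: `vce'"
end

sysuse auto, clear
showvce price mpg, vce(cluster rep78)
```

Because vce(string) accepts any text, this sketch would accept an invalid request just as readily as a valid one. That is why tasks (1) and (2) still need to be done after syntax has run.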

I begin with the case in which the user specified vce(robust); here the local macro vce would contain the word robust.

Example 1: parsing vce(robust)


. clear all

. sysuse auto
(1978 Automobile Data)

. local vce "robust"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

The command

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

has two pieces. The piece before the colon (:) specifies the rules; the piece after the colon specifies what the user typed. Each piece can have a Stata object followed by some options; note the commas before optlist(Robust) and before vce(`vce'). In the case at hand, the second piece contains only what the user specified, vce(robust), and the first piece contains only the options optlist(Robust) and argoptlist(CLuster). The option optlist(Robust) specifies that the vce() option in the second piece may contain the option robust and that its minimum abbreviation is r. Note how the word Robust in optlist(Robust) mimics how syntax specifies minimum abbreviations. The option argoptlist(CLuster) specifies that the vce() option in the second piece may contain cluster clustervar, that the minimum abbreviation of cluster is cl, and that it will put the argument clustervar into a local macro.

After the command,

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

I use return list to show what _vce_parse stored in r(). Because the local macro vce contains "robust", _vce_parse

  1. puts the word robust in the macro r(robust);
  2. puts what the user typed, vce(robust), in the macro r(vceopt); and
  3. puts the type of VCE, robust, in the macro r(vce).

Examples 2 and 3 illustrate that _vce_parse stores the same values in these macros when the user specifies vce(rob) or vce(r), which are valid abbreviations for vce(robust).

Example 2: parsing vce(rob)


. local vce "rob"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Example 3: parsing vce(r)


. local vce "r"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Now, consider parsing the option vce(cluster clustervar). Because the cluster variable clustervar may contain missing values, _vce_parse may need to update a sample-identification variable before it stores the name of the cluster variable in a local macro. In example 4, I use the command

_vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

to handle the case when the user specifies vce(cluster rep78). The results from the tabulate and summarize commands illustrate that _vce_parse updates the sample-identification variable mytouse to account for the missing observations in rep78.

Example 4: parsing vce(cluster rep78)


. generate byte mytouse = 1

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         74      100.00      100.00
------------+-----------------------------------
      Total |         74      100.00

. summarize rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       rep78 |         69    3.405797    .9899323          1          5

. local vce "cluster rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5        6.76        6.76
          1 |         69       93.24      100.00
------------+-----------------------------------
      Total |         74      100.00

I use return list to show what _vce_parse stored in r(). Because the local macro vce contains cluster rep78, _vce_parse

  1. puts the word robust in the macro r(robust);
  2. puts the name of the cluster variable, rep78, in the macro r(cluster);
  3. puts what the user typed, vce(cluster rep78), in the macro r(vceopt);
  4. puts the argument to the cluster option, rep78, in the macro r(vceargs); and
  5. puts the type of VCE, cluster, in the macro r(vce).

Examples 5 and 6 illustrate that _vce_parse stores the same values in these macros when the user specifies vce(clus rep78) or vce(cl rep78), which are valid abbreviations for vce(cluster rep78).

Example 5: parsing vce(clus rep78)


. local vce "clus rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Example 6: parsing vce(cl rep78)


. local vce "cl rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Having illustrated how to make _vce_parse handle the cases when the user specifies something valid, I will show in example 7 that it also produces a standard error message when the user specifies an error condition.

Example 7: parsing vce(foolish)


. local vce "foolish"

. capture noisily _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')
vcetype 'foolish' not allowed

. return list

_vce_parse can parse other types of vce() options; to see them, type help _vce_parse.

Also, remember to type help undocumented when you are looking for a programmer's tool.

The code for myregress12

Here is the code for myregress12.ado, which uses _vce_parse. I describe how it works below.

I recommend that you click on the file name to download the code for my myregress12.ado. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: myregress12.ado


*! version 12.0.0  16Jan2016
program define myregress12, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts fv) [if] [in] [, noCONStant vce(string) ]
    marksample touse

    _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
    local vce        "`r(vce)'"
    local clustervar "`r(cluster)'"
    if "`vce'" == "robust" | "`vce'" == "cluster" {
        local vcetype "Robust"
    }
    if "`clustervar'" != "" {
        capture confirm numeric variable `clustervar'
        if _rc {
            display in red "invalid vce() option"
            display in red "cluster variable {bf:`clustervar'} is a " ///
                "string variable instead of a numeric variable"
            exit(198)
        }
        sort `clustervar'
    }

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    fvexpand `indepvars'
    local cnames `r(varlist)'

    tempname b V N rank df_r

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
       "`vce'", "`clustervar'",                                  ///
       "`b'", "`V'", "`N'", "`rank'", "`df_r'")

    if "`constant'" == "" {
        local cnames `cnames' _cons
    }

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N        = `N'
    ereturn scalar rank     = `rank'
    ereturn scalar df_r     = `df_r'
    ereturn local  vce      "`vce'"
    ereturn local  vcetype  "`vcetype'"
    ereturn local  clustvar "`clustervar'"
    ereturn local  cmd      "myregress12"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar vcetype, string scalar clustervar,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar dfrname)
{

    real vector    y, b, e, e2, cvar, ei
    real matrix    X, XpXi, M, info, xi
    real scalar    n, p, k, nc, i, dfr

    y    = st_data(., depvar, touse)
    X    = st_data(., indepvars, touse)
    n    = rows(X)

    if (constant == "") {
        X    = X,J(n,1,1)
    }

    XpXi = quadcross(X, X)
    XpXi = invsym(XpXi)
    b    = XpXi*quadcross(X, y)
    e    = y - X*b
    e2   = e:^2
    p    = cols(X)
    k    = p - diag0cnt(XpXi)
    if (vcetype == "robust") {
        M    = quadcross(X, e2, X)
        dfr  = n - k
        V    = (n/dfr)*XpXi*M*XpXi
    }
    else if (vcetype == "cluster") {
        cvar = st_data(., clustervar, touse)
        info = panelsetup(cvar, 1)
        nc   = rows(info)
        M    = J(p, p, 0)      // p x p, so M conforms with each cluster term when k < p
        dfr  = nc - 1
        for(i=1; i<=nc; i++) {
            xi = panelsubmatrix(X,i,info)
            ei = panelsubmatrix(e,i,info)
            M  = M + xi'*(ei*ei')*xi
        }
        V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi
    }
    else {                 // vcetype must be IID
        dfr  = n - k
        V    = (quadsum(e2)/dfr)*XpXi
    }

    st_matrix(bname, b')
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, k)
    st_numscalar(dfrname, dfr)

}

end

Let's break this 118-line program into familiar pieces. Lines 2-56 define the ado-command, and lines 58-118 define the Mata work function that is used by the ado-command. Despite the addition of details to handle the parsing and computation of a robust or cluster-robust VCE, the structures of the ado-command and of the Mata work function are the same as they were in myregress11.ado; see Programming an estimation command in Stata: An OLS command using Mata.

The ado-command has four parts.

  1. Lines 5-31 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
  2. Lines 33-35 call the Mata work function.
  3. Lines 37-52 post the results returned by the Mata work function to e().
  4. Line 54 displays the results.

The Mata function mywork() also has four parts.

  1. Lines 60-65 parse the arguments.
  2. Lines 68-70 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 80-108 compute the results.
  4. Lines 110-114 copy the computed results to Stata, using the names that were passed in the arguments.

Now, I address the details of the ado-code, although I do not discuss details in myregress12.ado that I already covered when describing myregress11.ado in Programming an estimation command in Stata: An OLS command using Mata. Line 5 allows the user to specify the vce() option, and line 8 uses _vce_parse to parse what the user specifies. Lines 9 and 10 put the type of VCE found by _vce_parse in the local macro vce and the name of the cluster variable, if specified, in the local macro clustervar. Lines 11-13 put Robust in the local macro vcetype if the specified vce is either robust or cluster. If there is a cluster variable, lines 14–23 check that it is numeric and use it to sort the data.

Line 34 passes the new arguments for the type of VCE and the name of the cluster variable to the Mata work function mywork().

Lines 49–51 store the type of VCE, the output label for the VCE type, and the name of the cluster variable in e(), respectively.
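As a quick check of what ends up in e(), the following sketch runs the command and displays the three macros. It assumes that myregress12.ado from code block 1 has been saved somewhere on the adopath; the choice of price, mpg, and trunk as variables is only for illustration.

```stata
* Assumes myregress12.ado (code block 1) is saved somewhere on the adopath
sysuse auto, clear
myregress12 price mpg trunk, vce(cluster rep78)

* Inspect the three macros stored by lines 49-51
display "`e(vce)'"
display "`e(vcetype)'"
display "`e(clustvar)'"

* Official command, for an informal comparison of the reported standard errors
regress price mpg trunk, vce(cluster rep78)
```

Reading the code, the three display commands should show cluster, Robust, and rep78. Comparing the output with that of regress is a reasonable sanity check, although I have not reproduced either output here.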

Now, I address the details of the Mata work function mywork(), but I discuss only what I have added to the mywork() in myregress11.ado. Line 62 declares the new arguments. The string scalar vcetype is empty, or it contains "robust", or it contains "cluster". The string scalar clustervar is either empty or contains the name of the cluster variable.

Lines 68–70 declare the local-to-the-function vectors cvar and ei and the local-to-the-function matrices M, info, and xi that are needed now but not previously.
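The matrices info and xi are used with the Mata functions panelsetup() and panelsubmatrix(). The toy session below is a sketch, separate from myregress12.ado, that shows what these two functions return; the values of cvar and X are made up for illustration.

```stata
mata:
// toy data: six observations in three clusters, already sorted by cluster id
cvar = (1 \ 1 \ 2 \ 2 \ 2 \ 3)
X    = (1::6), J(6, 1, 1)

// each row of info holds the first and last row number of one cluster
info = panelsetup(cvar, 1)
info

// rows 3 through 5 of X, that is, the observations in the second cluster
panelsubmatrix(X, 2, info)
end
```

panelsetup() expects the observations of each cluster to be stored contiguously, which is why the ado-code sorts the data by the cluster variable before calling mywork().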

Lines 87, 91–92, 104–105, and 108 specify if-else blocks to compute the correct VCE. Lines 88–90 compute a robust estimator of the VCE if vcetype contains "robust". Lines 93–103 compute a cluster-robust estimator of the VCE if vcetype contains "cluster". Lines 106–107 compute an IID-based estimator of the VCE if vcetype contains neither "robust" nor "cluster".
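For reference, the three branches can be written as formulas read directly off the Mata code. The notation is mine rather than the original post's: X is the n-by-p regressor matrix with rows x_i, e_i is the i-th residual, k is the number of columns that are not omitted, n_c is the number of clusters, and X_g and e_g hold the rows of X and e that belong to cluster g. When columns are omitted, invsym() returns a generalized inverse in place of (X'X)^{-1}.

```latex
\begin{aligned}
\widehat{V}_{\mathrm{IID}} &= \frac{\sum_{i=1}^{n} e_i^2}{n-k}\,(X'X)^{-1} \\
\widehat{V}_{\mathrm{robust}} &= \frac{n}{n-k}\,(X'X)^{-1}
   \Bigl(\sum_{i=1}^{n} e_i^2\, x_i' x_i\Bigr)(X'X)^{-1} \\
\widehat{V}_{\mathrm{cluster}} &= \frac{n-1}{n-k}\,\frac{n_c}{n_c-1}\,(X'X)^{-1}
   \Bigl(\sum_{g=1}^{n_c} X_g' e_g e_g' X_g\Bigr)(X'X)^{-1}
\end{aligned}
```

The middle term of the robust estimator is what quadcross(X, e2, X) computes, and the middle term of the cluster-robust estimator is what the loop over the rows of info accumulates in M.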

Done and undone

I introduced the undocumented command _vce_parse and discussed the code for myregress12.ado, which uses Mata to compute OLS point estimates and an IID-based VCE, a robust VCE, or a cluster-robust VCE.

The structure of the code is the same as the one that I used in myregress11.ado and in mymean8.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata and in Programming an estimation command in Stata: A first ado-command using Mata. That the structure remains the same makes it easier to handle the details that arise in more complicated problems.


