Programming an estimation command in Stata: Consolidating your code

(
newcommand{xb}{{bf x}}
newcommand{gb}{{bf g}}
newcommand{Hb}{{bf H}}
newcommand{Gb}{{bf G}}
newcommand{Eb}{{bf E}}
newcommand{betab}{boldsymbol{beta}}
)I write ado-commands that estimate the parameters of an exponential conditional imply (ECM) mannequin and a probit conditional imply (PCM) mannequin by nonlinear least squares, utilizing the strategies that I mentioned within the publish Programming an estimation command in Stata: Nonlinear least-squares estimators. These instructions will both share numerous code or repeat numerous code, as a result of they’re so related. It’s virtually at all times higher to share code than to repeat code. Shared code solely must be modified in a single place so as to add a characteristic or to repair an issue; repeated code have to be modified in every single place. I introduce Mata libraries to share Mata features throughout ado-commands, and I introduce wrapper instructions to share ado-code.

That is the twenty seventh publish within the sequence Programming an estimation command in Stata. I like to recommend that you simply begin at the start. See Programming an estimation command in Stata: A map to posted entries for a map to all of the posts on this sequence.

Ado-commands for ECM and PCM fashions

I now convert the examples of NLS for ECM and PCM fashions mentioned in Programming an estimation command in Stata: Nonlinear least-squares estimators to ado-commands. To maintain issues easy, the ado-commands mentioned right here omit many customary options comparable to issue variables, time-series variables, strong estimators of the VCE, and cluster–strong estimators of the VCE.

mynlexp1 implements an NLS estimator for the parameters of the ECM mannequin.

Code block 1: mynlexp1.ado


*! model 1.0.0  09May2016
program outline mynlexp1, eclass sortpreserve
    model 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: mywork("`depvar'", "`indeps'", "`touse'", "`fixed'", ///
       "`b'", "`V'", "`N'", "`rank'" )

    if "`fixed'" == "" {
        native indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn publish `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn native  cmd     "mynlexp"

    ereturn show

finish

mata:

void MYNLExp(actual scalar todo, actual vector b,   ///
        actual vector y, actual matrix X,           ///
        val, grad, hess)
{
        actual vector  r, f, xb
        actual matrix  df

        xb  = X*b'
        f   = exp(xb)
        r   = y - f
        val = -(r:^2)
        df  = f:*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void mywork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar fixed,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{
    actual vector y, b
    actual matrix X, V
    actual scalar n, p, ssr
    transmorphic S

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (fixed == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &MYNLExp())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)

}
finish

Traces 2–29 outline the ado-command, which makes use of the Mata work operate mywork() outlined on strains 54–89. Traces 33–52 outline the evaluator operate MYNLExp() utilized by optimize() in mywork(). This construction needs to be acquainted from the Poisson regression examples beforehand
mentioned.

mynlprobit1 implements an NLS estimator for the parameters of the PCM mannequin.

Code block 2: mynlprobit1.ado


*! model 1.0.0  09May2016
program outline mynlprobit1, eclass sortpreserve
    model 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: mywork("`depvar'", "`indeps'", "`touse'", "`fixed'", ///
       "`b'", "`V'", "`N'", "`rank'" )

    if "`fixed'" == "" {
        native indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn publish `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn native  cmd     "mynlexp"

    ereturn show

finish

mata:

void MYNLProbit(actual scalar todo, actual vector b,  ///
        actual vector y, actual matrix X,             ///
        val, grad, hess)
{
        actual vector  r, f, xb
        actual matrix  df

        xb  = X*b'
        f   = regular(xb)
        r   = y - f
        val = -(r:^2)
        df  = normalden(xb):*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void mywork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar fixed,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    actual vector y, b
    actual matrix X, V
    actual scalar n, p, ssr
    transmorphic S

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (fixed == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &MYNLProbit())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)
}
finish

The code for mynlprobit1 is almost similar to that for mynlexp1.

Duplicated code is harmful. Anytime you need to add a characteristic or repair an issue, you should do it twice. I extremely suggest that you simply keep away from duplicated code, and I illustrate how by rewriting these instructions to have a single Mata code base after which a single ado-code base.

Libraries of Mata code

The mywork() features utilized in mynlexp1 and mynlprobit1 differ solely within the evaluator operate they name; see line 76 in code blocks 1 and a pair of. I want to have one mywork() operate that is named by mynlexp1 and by mynlprobit1.

Mata features outlined on the backside of an ado-file are native to that ado-file, so I can not use this technique for outlining the mywork() operate that will likely be utilized by each mynlexp1 and mynlprobit1. What I want is a file containing compiled Mata features which are callable from another Mata operate or from inside any ado-file or do-file. Any such file is called a library.

I take advantage of mynllib.mata in code block 3 to make the library lmynllib.mlib containing the compiled Mata features MYNLWork(), MYNLProbit(), MYNLExp().

Code block 3: mynllib.mata


mata:
mata clear

void MYNLExp(actual scalar todo, actual vector b,   ///
        actual vector y, actual matrix X,           ///
        val, grad, hess)
{
        actual vector  r, f
        actual matrix  df

        f   = exp(X*b')
        r   = y - f
        val = -(r:^2)
        df  = f:*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void MYNLProbit(actual scalar todo, actual vector b,  ///
        actual vector y, actual matrix X,             ///
        val, grad, hess)
{
        actual vector  r, f, xb
        actual matrix  df

        xb  = X*b'
        f   = regular(xb)
        r   = y - f
        val = -(r:^2)
        df  = normalden(xb):*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void MYNLWork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar fixed,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar mannequin)
{

    actual vector       y, b
    actual matrix       X, V
    actual scalar       n, p, ssr
    string scalar     emsg
    pointer(operate) f
    transmorphic      S

    if (mannequin=="expm") {
        f = &MYNLExp()
    }
    else if (mannequin=="probit") {
        f = &MYNLProbit()
    }
    else {
        emsg = "{pink}mannequin " + mannequin + " invalidn"
        printf(emsg)
        exit(error(498))
    }
    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (fixed == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, f)
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)
}

mata mlib create lmynllib, substitute
mata mlib add    lmynllib MYNLWork() MYNLProbit() MYNLExp()

finish

Traces 1 and 99 open and shut the Mata session by which I outline the features, create the library, and add the features to the library. Traces 4–22 outline MYNLExp(), which I’ve already mentioned. Traces 24–43 outline MYNLProbit(), which I’ve already mentioned. Traces 45–94 outline MYNLWork(), which is the work operate that each mynlexp2.ado and mynlprobit2.ado will use. Observe that I’ve used uppercase letters within the names MYNLExp(), MYNLProbit(), MYNLWork(). Features in Mata libraries are world: they are often known as from wherever, and their names have to be distinctive within the house of names for Mata features. I attempt to keep away from utilizing operate names that different programmers may use by prefixing the names of my features with an uppercase title of the library and starting the operate title with an uppercase letter.

Line 49 specifies that the ninth argument MYNLWork() is a string scalar referred to as mannequin contained in the operate. The ado-commands will go both “expm” or “probit” on this argument.

If mannequin accommodates “expm”, line 60 shops the handle of the operate MYNLExp() in f. The variable sort that holds the handle of a operate is called a pointer to a operate. Because of this, line 56 declares f to a pointer to a operate. If mannequin accommodates “probit”, line 63 shops the handle of the operate MYNLProbit() in f. If mannequin doesn’t include “expm” or “probit”, strains 66–68 show an error message and exit.

Pointers maintain the handle of an object. All I want here’s a field that holds the handle of the evaluator operate comparable to the mannequin match by the ado-command that may name MYNLWork(). Line 56 declares f to be this kind of field, strains 60 and 63 retailer the reminiscence of the right operate in f, and line 81 places the handle saved in f into the optimize object S. Kind assist M-2 pointers to study extra about pointers.

The remaining strains of MYNLWork() are the identical because the strains within the mywork() features in mynlexp1 and mynlprobit1.

There may be nonetheless numerous duplicated code within the evaluator features MYNLExp() and MYNLProbit(). As a substitute of utilizing a pointer to the evaluator operate, I may have consolidated the evaluator features MYNLExp() and MYNLProbit() right into a single operate that used a further argument to determine which case to judge. I selected the introduced technique as a result of it’s sooner. Consolidating the evaluator features would have slowed down the operate that I most need to pace up. (The evaluator operate is named many instances by optimize().) So, on this case, I accepted the danger of duplicated code for the benefit of pace.

Line 96 creates the Mata library lmynllib.mlib within the present listing, changing any beforehand outlined model of this library. Line 97 places the compiled variations of MYNLWork(), MYNLProbit(), and MYNLExp() into lmynllib.mnlib. At this level, the file lmynllib.mlib within the present listing accommodates the compiled features MYNLWork(), MYNLProbit(), and MYNLExp().

mynllib.mata is a do-file that makes a Mata library, therefore the .mata suffix as an alternative of the .do suffix. I can execute it by typing do mynllib.mata. Instance 1 makes the library lmynllib.mlib.

Instance 1: Making a Mata library


. program drop _all

. mata: mata clear

. quietly do mynllib.mata

. mata: mata mlib index
.mlib libraries to be searched are actually
    lmatabase;lmatapss;lmataado;lmatapostest;lmatafc;lmatasem;lmatapath;
> lmatamcmc;lmatagsem;lmataopt;lmynllib;lfreduse;lpoparms;lspmat

After dropping all of the ado-commands in reminiscence, and clearing Mata, I used quietly do mynllib.mata to make the library, as a result of I don’t need to see the code once more. mata: mata mlib index updates the listing libraries identified to Mata; this step provides lmynllib.mlib to the listing of identified libraries in order that I can use the features therein outlined.

Having made lmynllib.mlib and added it to the listing of identified libraries, I can use the features therein outlined in one-line Mata calls in my ado-commands. Think about mynlexp2.

Code block 4: mynlexp2.ado


*! model 2.0.0  10May2016
program outline mynlexp2, eclass sortpreserve
    model 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`fixed'", ///
       "`b'", "`V'", "`N'", "`rank'", "expm" )

    if "`fixed'" == "" {
        native indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn publish `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn native  cmd     "mynlexp"

    ereturn show

finish

The code for mynlexp2 is sort of the identical because the ado-code for mynlexp1 in code block 1. The one variations are that line 12 calls MYNLWork() as an alternative of mywork() and that MYNLWork() accepts a ninth argument, specified on line 13 to be “expm”.

Now take into account mynlprobit2 in code block 5.

Code block 5: mynlprobit2.ado


*! model 2.0.0  10May2016
program outline mynlprobit2, eclass sortpreserve
    model 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`fixed'", ///
       "`b'", "`V'", "`N'", "`rank'", "probit" )

    if "`fixed'" == "" {
        native indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn publish `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn native  cmd     "mynlprobit"

    ereturn show

finish

The analogous modifications are made to mynlprobit2 that have been made to mynlexp2. Particularly, notice that line 13 passes “probit” within the ninth argument to MYNLWork().

Writing a piece ado-command

The Mata library allowed me to consolidate the duplicated Mata code. I nonetheless have numerous duplicated ado-code. To consolidate the duplicated ado-code, I’ve mynlexp3 and mynlprobit3 name a single ado-command that does the work as seen in code blocks 6 and seven.

Code block 6: mynlexp3.ado


*! model 3.0.0  11May2016
program outline mynlexp3
    model 14.1

    mynlwork expm `0'

finish

Code block 7: mynlprobit3.ado


*! model 3.0.0  11May2016
program outline mynlprobit3, eclass sortpreserve
    model 14.1

    mynlwork probit `0'

finish

Recall that regardless of the person specified is contained within the native macro 0. Line 5 of mynlexp3 passes regardless of the person specified prefixed with “expm” to mynlwork. Equally, line 5 of mynlprobit3 passes regardless of the person specified prefixed with “probit” to mynlwork. mynlexp3 and mynlprobit3 are referred to as wrapper instructions, as a result of they only wrap calls to mynlwork, which does the precise work.

Code block 8 accommodates the code for mynlwork.

Code block 8: mynlwork.ado


*! model 1.0.0  11May2016
program outline mynlwork, eclass sortpreserve
    model 14.1

    gettoken mannequin 0 : 0

    if "`mannequin'" == "expm" {
        native cname "mynlexp"
    }
    else if "`mannequin'" == "probit" {
        native cname "mynlprobit"
    }
    else {
        diplay "{pink}mannequin `mannequin' invalid"
        exit 498
    }

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`fixed'", ///
       "`b'", "`V'", "`N'", "`rank'", "`mannequin'" )

    if "`fixed'" == "" {
        native indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn publish `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn native  cmd     "`cname'"

    ereturn show

finish

Line 5 makes use of gettoken to place the mannequin prefixed to the person enter by mynlexp3 or mynlprobit3 within the native macro mannequin and to place regardless of the person specified within the native macro 0. Traces 7–16 put the title of the calling command within the native macro cname or exit with an error message, if the mannequin will not be acknowledged. In principle, the error case dealt with in strains 13–16 will not be essential, as a result of I ought to know tips on how to name my very own command. Expertise has taught me that dealing with these additional error instances makes altering the code sooner or later a lot simpler, so I take into account this good observe.

By line 17, the native macro mannequin and the native macro cname include all that differs between the instances dealt with by mynlexp3 and mynlprobit3. mannequin is handed to MYNLWork() on line 26, and cname is used to retailer the command title in e(cmd) on line 38.

Examples 2 and three illustrate that mynlexp3 and mynlprobit3 produce output comparable to the outcomes produced by nl in examples 2 and 4 in Programming an estimation command in Stata: Nonlinear least-squares estimators.

Instance 2: mynlexp3 output


. mynlexp3 accidents cvalue tickets
Iteration 0:   f(p) =  -2530.846
Iteration 1:   f(p) = -1116.4901
Iteration 2:   f(p) = -248.56923
Iteration 3:   f(p) = -225.91644
Iteration 4:   f(p) = -225.89573
Iteration 5:   f(p) = -225.89573
Iteration 6:   f(p) = -225.89573
----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
    cvalue |   .1759434   .0323911     5.43   0.000     .1124581    .2394287
   tickets |   1.447672   .0333599    43.40   0.000     1.382287    1.513056
     _cons |  -7.660608   .2355725   -32.52   0.000    -8.122322   -7.198894
----------------------------------------------------------------------------

Instance 3: mynlprobit3 output


. mynlprobit3 hadaccident cvalue tickets
Iteration 0:   f(p) = -132.90997
Iteration 1:   f(p) = -16.917203
Iteration 2:   f(p) = -10.995001
Iteration 3:   f(p) = -10.437501
Iteration 4:   f(p) = -10.427738
Iteration 5:   f(p) = -10.427156
Iteration 6:   f(p) = -10.427123
Iteration 7:   f(p) = -10.427121
Iteration 8:   f(p) =  -10.42712
Iteration 9:   f(p) =  -10.42712
Iteration 10:  f(p) =  -10.42712
----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
    cvalue |   .3616322   .0918214     3.94   0.000     .1816656    .5415988
   tickets |   2.177509   .1974173    11.03   0.000     1.790578     2.56444
     _cons |  -10.95166    1.05565   -10.37   0.000    -13.02069   -8.882622
----------------------------------------------------------------------------

Completed and undone

It’s virtually at all times higher to share code than to repeat code. Shared code solely must be modified in a single place so as to add a characteristic or to repair an issue; repeated code have to be modified in every single place. I launched Mata libraries to share Mata features throughout ado-commands, and I launched wrapper instructions to share ado-code.

Programming an estimation command in Stata: Consolidating your code

Related Articles

Orchestrate AI-Pushed Options with the Cisco AITECH Certification

Workday’s AI reset: Can brokers save SaaS?

Quick Native LLM Inference, {Hardware} Selections & Tuning

Latest Articles

Orchestrate AI-Pushed Options with the Cisco AITECH Certification

Workday’s AI reset: Can brokers save SaaS?

Quick Native LLM Inference, {Hardware} Selections & Tuning

Social media is a faulty product

Subbed information supply, decrease inflation estimate – FlowingData