Monday, January 12, 2026

Programming an estimation command in Stata: Preparing to write a plugin


This post is the first in a series that illustrates how to plug code written in another language (like C, C++, or Java) into Stata. This technique is known as writing a plugin or as writing a dynamic-link library (DLL) for Stata.

Plugins can be written for any task, including data management, graphical analysis, or statistical estimation. In keeping with the theme of this series, I discuss plugins for estimation commands.

In this post, I discuss the tradeoffs of writing a plugin, and I discuss a simple program whose calculations I will replace with plugins in subsequent posts.

This is the twenty-ninth post in the series Programming an estimation command in Stata. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

What’s a plugin?

A plugin is one or more functions written in another language that you can call from Stata. Technically, the other program is linked into Stata when it is called, a process known as dynamically linking the library containing the other program into Stata. For this reason, plugins are also known as DLLs.

People write plugins in other languages to solve really hard problems. The three most common reasons that you might write a plugin for Stata are the following.

  1. You have code written in another language for a method that is not available in Stata or Mata.

  2. You have an implementation in Stata/Mata that does not use built-in, fast, vectorized functions, and you can make it faster by doing some of the calculations in a lower-level language like C.

  3. You develop your methods in a low-level language like C and then plug the methods into Stata and other statistical software packages.

The problem with plugins is that they are difficult. Writing and compiling a plugin in another language is much more difficult than just using Stata and Mata. In addition, the cost of a mistake is much higher. Plugins can be dangerous; they can corrupt memory or do other things that cause Stata to exit or otherwise "crash".

Plugins in C, C++, or Fortran are hard to maintain. For these plugins, you must compile a version for every operating system on which you want it to work. For example, you might need compiled libraries for Windows, Mac, and a version of Linux or two. Furthermore, when the operating system gets a new version, you must recompile your plugin, and you might need to distribute versions for both the old and new versions of the operating system.

Java plugins are much easier to maintain, but they are usually slower than plugins written in C, C++, or Fortran. You can distribute a single Java library to run on all the supported operating systems. You only need to recompile when changes in the Java environment require it.

Despite these difficulties, there are cases in which a plugin can make a Stata implementation usable or feasible. This series of posts illustrates how to write and compile a plugin in several different languages.

A mean estimator

I begin by discussing mymean10.ado, given in code block 1, which implements the command mymean10. mymean10 calculates the sample-average estimator for the mean and the variance–covariance of the estimator (VCE) for the variables specified by the user. mymean10 allows the user to specify an if restriction and an in range, and it handles missing values in the specified variables, but it does not allow any options.
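For reference, the two quantities can be written as below, where x_i is the 1 × k row vector holding the k user-specified variables for the ith included observation and n is the number of included observations. These are what mean() and quadcross() compute in code block 1; the notation is mine, not taken from the original post.

$$
\widehat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i,
\qquad
\widehat{\mathbf{V}} = \frac{1}{n}\,\frac{1}{n-1}\sum_{i=1}^{n}
(\mathbf{x}_i-\widehat{\boldsymbol{\mu}})'(\mathbf{x}_i-\widehat{\boldsymbol{\mu}})
$$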

mymean10.ado uses Mata for its computations. Subsequent posts will replace these Mata computations with plugin computations.

Code block 1: mymean10.ado


*! version 10.0.0  13Feb2018
program define mymean10, eclass

    version 15.1

    syntax varlist(numeric) [if] [in]
    marksample touse
    tempname b V N

    mata: mymean_work("`varlist'", "`touse'", "`b'", "`V'", "`N'")

    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'
    ereturn post `b' `V', esample(`touse')
    ereturn scalar   N   = `N'
    ereturn scalar df_r  = `N'-1
    ereturn display
end

mata:
void mymean_work(string scalar vlist,          ///
    string scalar touse, string scalar bname,  ///
    string scalar vname, string scalar nname )
{
    real matrix X, E, V
    real vector b
    real scalar n

    X = st_data(., vlist, touse)
    b = mean(X)
    E = (X :- b)
    n = rows(E)
    V = (1/n)*(1/(n-1))*quadcross(E,E)

    st_matrix(bname, b)
    st_matrix(vname, V)
    st_numscalar(nname, n)
}
end

Note the structure of this program. Lines 6–7 parse what the user typed, line 10 uses the Mata work function mymean_work() to do the calculations, lines 12–18 store and display the results, and lines 21–40 define mymean_work().

Let's look at these parts.

In line 6, syntax creates the local macro varlist, which contains the variables specified by the user. syntax also creates the local macros if and in, which respectively contain any if restriction or in range that the user specified. In line 7, marksample uses the local macros varlist, if, and in to create a sample-inclusion variable. This sample-inclusion variable is one for each included observation and zero for each excluded observation. The sample-inclusion variable accounts for a user-specified if restriction or in range that explicitly excludes an observation, and it accounts for a missing value in any of the variables in varlist that implicitly excludes an observation. marksample puts the name of this sample-inclusion variable in the local macro touse. (See Programming an estimation command in Stata: Allowing for sample restrictions and factor variables for details of this process.)
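To make the two kinds of exclusion concrete, here is a minimal C sketch of the rule the sample-inclusion variable encodes. It is not how marksample is implemented: the function mark_sample() and its arguments are hypothetical, the data are assumed to be stored row by row, the if restriction is assumed to be already evaluated into a 0/1 array, and a missing value is assumed to be encoded as NaN, whereas Stata uses its own missing-value codes.

```c
#include <math.h>    /* isnan(), NAN */
#include <stddef.h>  /* size_t */

/* Hypothetical sketch of the sample-inclusion rule.
 * X      : r-by-c data, stored row by row; NaN is assumed to mean "missing"
 * ifcond : ifcond[i] is 1 if observation i satisfies the if restriction
 * in1,in2: 1-based in range
 * touse  : output, 1 for included observations and 0 otherwise
 * Returns the number of included observations. */
size_t mark_sample(const double *X, size_t r, size_t c,
                   const int *ifcond, size_t in1, size_t in2,
                   int *touse)
{
    size_t i, j, n = 0;

    for (i = 0; i < r; i++) {
        /* explicit exclusion: outside the in range or failing the if */
        int keep = (i + 1 >= in1) && (i + 1 <= in2) && ifcond[i];

        /* implicit exclusion: a missing value in any variable */
        for (j = 0; keep && j < c; j++) {
            if (isnan(X[i * c + j])) {
                keep = 0;
            }
        }
        touse[i] = keep;
        n += (size_t) keep;
    }
    return n;
}
```

An observation is kept only if it passes both the explicit restrictions and the check for missing values, which is the same logic the paragraph above describes.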

In line 8, tempname puts temporary names into the local macros b, V, and N. These names are not in use, and the objects stored under these names will be dropped when mymean10 finishes. We use temporary names to avoid overwriting global objects created by users, like Stata matrices and Stata scalars. (See Programming an estimation command in Stata: A first ado-command for an introduction to temporary names.)

Line 10 uses a one-line call to the Mata work function mymean_work() to calculate the point estimates, the VCE, and the sample size. mymean_work() puts the vector of point estimates in the Stata matrix whose name is stored in the local macro b. mymean_work() puts the VCE in the Stata matrix whose name is stored in the local macro V. And mymean_work() puts the number of included observations in the Stata scalar whose name is stored in the local macro N. (Details of this process are also discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables.)

Lines 12–14 put row and column names on the matrices that store the vector of point estimates and the VCE. Lines 15–17 store the results in e(), and line 18 produces a standard Stata output table. (See Programming an estimation command in Stata: A first ado-command using Mata for more details.)

Lines 21–40 define mymean_work().

Lines 22–24 specify that mymean_work() returns nothing to its caller and accepts five string scalar arguments. The first argument, vlist, contains the names of the variables specified by the user. The second, touse, contains the name of the sample-inclusion variable discussed above. The last three contain the names under which the results will be stored. When mymean_work() finishes, the Stata matrix named in bname contains the vector of point estimates, the Stata matrix named in vname contains the VCE, and the Stata scalar named in nname contains the number of sample observations.

Now, I discuss the body of mymean_work(). Lines 26–28 declare the variables used in the function. Line 30 puts into the matrix X a copy of the observations on the user-specified variables for which the sample-inclusion variable is one.

Lines 31–34 calculate the results. Line 31 puts the point estimates into the Mata vector b. Lines 32–34 calculate the VCE and store it in the Mata matrix V.

In line 36, st_matrix() copies the point estimates from b to the Stata matrix whose name is stored in bname. In line 37, st_matrix() copies the VCE from V to the Stata matrix whose name is stored in vname. In line 38, st_numscalar() copies the number of sample observations from the Mata scalar n to the Stata scalar whose name is stored in nname.

What will change when we write a plugin?

The overall structure and many of the specifics stay the same when we write a plugin. What changes is that we call a plugin to do the calculations instead of a Mata function.

Think about writing code in a language like C to replicate the calculations performed by mymean_work(). Three things would change. First, we would not copy the data from Stata into our plugin; the plugin environment gives us a view onto the data in Stata. Second, we would have to write a function to implement the mean computed on line 31. Third, we would have to write a function to implement the VCE calculations performed on lines 32–34.
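The difference between copying the data and viewing it can be sketched in a few lines of C. In this hypothetical example, DataView and view_get() stand in for the host's data-access routines; the real Stata plugin interface supplies its own, which later posts in the series cover. The data are assumed to be stored column by column, one column per variable.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical stand-in for host-owned data: the plugin holds only a
 * pointer and the dimensions, never its own copy of the values. */
typedef struct {
    const double *data;   /* host-owned storage, column by column */
    size_t nobs;          /* number of observations (rows) */
    size_t nvars;         /* number of variables (columns) */
} DataView;

/* Read observation i of variable j (both 0-based) through the view. */
double view_get(const DataView *v, size_t i, size_t j)
{
    return v->data[j * v->nobs + i];
}

/* Column sums computed by reading through the view, one value at a time. */
void view_colsums(const DataView *v, double *sums)
{
    size_t i, j;

    for (j = 0; j < v->nvars; j++) {
        sums[j] = 0.0;
        for (i = 0; i < v->nobs; i++) {
            sums[j] += view_get(v, i, j);
        }
    }
}
```

Because the view stores only a pointer, a change to the host's data is visible through the view immediately; nothing is duplicated in the plugin's memory.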

To ease the introduction to plugins, I made these changes in Mata, as illustrated in mymean11.ado in code block 2.

Code block 2: mymean11.ado


*! version 11.0.0  13Feb2018
program define mymean11, eclass

    version 15.1

    syntax varlist(numeric) [if] [in]
    marksample touse
    tempname b V N

    mata: mymean_work("`varlist'", "`touse'", "`b'", "`V'", "`N'")

    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar   N   = `N'
    ereturn scalar df_r  = `N'-1
    ereturn display
end

mata:
void mymean_work(string scalar vlist,          ///
    string scalar touse, string scalar bname,  ///
    string scalar vname, string scalar nname )
{
    real matrix X, E, V, samp
    real vector b
    real scalar n

    st_view(samp=., ., touse)
    st_view(X=., ., vlist)

    MyAve(X, samp, b, n)
    MyV(X, samp, b, V)

    st_matrix(bname, b)
    st_matrix(vname, V)
    st_numscalar(nname, n)
}

void MyAve(real matrix X, real vector samp, b, n)
{
    real scalar   r, c, i, j

    r = rows(X)
    c = cols(X)
    b = J(1, c, 0)
    n = 0
    for(i=1; i<=r; i++) {
        if (samp[i]==1) {
            ++n
            for(j=1; j<=c; j++) {
                b[1,j] = b[1,j] + X[i,j]
            }
        }
    }
    b = (1/n)*b
}

void MyV(real matrix X, real vector samp, real matrix b, V)
{
    real scalar r, c, i, j, j2, n

    r = rows(X)
    c = cols(X)

    V = J(c, c, 0)
    e = J(1, c, 0)
    n = 0
    if (rows(b)!=1 | cols(b)!=c) {
        printf("{err}sample-average vector and VCE are not conformable\n")
        exit(error(503))
    }
    for(i=1; i<=r; i++) {
        if (samp[i]==1) {
            ++n
            for(j=1; j<=c; j++) {
                e[1,j] = X[i,j] - b[1,j]
            }
            for(j=1; j<=c; j++) {
                for(j2=1; j2<=j; j2++) {
                    V[j, j2] = V[j, j2] + e[1,j]*e[1,j2]
                }
            }
        }
    }
    for(j=1; j<=c; j++) {
        for(j2=j+1; j2<=c; j2++) {
            V[j, j2] = V[j2, j]
        }
    }

    V = (1/n)*(1/(n-1))*V
}

end

Apart from the version line and the command name, the Stata part in lines 1–20 is unchanged, as is the call to mymean_work().

Lines 31 and 32 differ from their counterpart in mymean10.ado. Line 30 in mymean10.ado puts into the matrix X a copy of the observations on the user-specified variables for which the sample-inclusion variable is one. Line 31 in mymean11.ado gets a view named samp onto all the observations of the sample-inclusion variable. Line 32 in mymean11.ado gets a separate view named X onto all the observations of the variables specified by the user. These views on samp and X are closer to the data-access functions provided by the plugin environment.

In line 34 of mymean11.ado, we use the MyAve() function to put the sample average into b and the number of observations for which the sample-inclusion variable is one into n. The code for MyAve() on lines 42–59 is similar to the code one might write in, say, C. The biggest difference is that line 58 uses Mata operators to divide each element of b by a scalar. The plugins will contain functions to perform these operations.
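As a rough illustration of that similarity, here is a standalone C sketch of MyAve(). It is not the plugin from the next post and does not use Stata's plugin interface; the names my_ave() and vec_divide() are hypothetical, X is assumed to be stored row by row, and samp is a double array of 0/1 flags like the sample-inclusion variable.

```c
#include <stddef.h>  /* size_t */

/* Divide each of the c elements of b by the scalar s; this is the
 * function that stands in for Mata's (1/n)*b on line 58. */
void vec_divide(double *b, size_t c, double s)
{
    size_t j;

    for (j = 0; j < c; j++) {
        b[j] /= s;
    }
}

/* Sample average of the r-by-c matrix X over rows with samp[i] == 1.
 * b receives the c averages; the return value is n, the number of
 * included observations. */
size_t my_ave(const double *X, const double *samp, size_t r, size_t c,
              double *b)
{
    size_t i, j, n = 0;

    for (j = 0; j < c; j++) {
        b[j] = 0.0;
    }
    for (i = 0; i < r; i++) {
        if (samp[i] == 1.0) {
            ++n;
            for (j = 0; j < c; j++) {
                b[j] += X[i * c + j];
            }
        }
    }
    /* guard added in this sketch: C would produce inf or NaN for n == 0,
     * where Mata would produce missing values */
    if (n > 0) {
        vec_divide(b, c, (double) n);
    }
    return n;
}
```

The loops mirror lines 46–57 of mymean11.ado almost one for one; only the final scaling needs a helper function.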

In line 35, we use the MyV() function to put the VCE into V. The code for MyV() is on lines 61–95. This code is also similar to the code one might write in C, with the same caveat that line 94 would be implemented as a function.
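MyV() translates just as directly. The sketch below is hypothetical standalone C, again with X stored row by row and samp holding 0/1 flags. The caller supplies the scratch vector e because C has no J() function, and the last loop does the work of line 94. Raw C arrays carry no dimensions, so the conformability check in MyV() has no direct counterpart here; the caller must pass a b of length c.

```c
#include <stddef.h>  /* size_t */

/* VCE of the sample average, mirroring MyV().
 * X : r-by-c data, row by row;  samp : 0/1 inclusion flags
 * b : the c sample averages;    e    : scratch space of length c
 * V : output, c-by-c VCE, row by row
 * Returns 0 on success, or 1 if fewer than two observations are
 * included, because the 1/(n-1) factor would then be undefined. */
int my_v(const double *X, const double *samp, const double *b,
         size_t r, size_t c, double *e, double *V)
{
    size_t i, j, j2, n = 0;

    for (j = 0; j < c * c; j++) {
        V[j] = 0.0;
    }
    for (i = 0; i < r; i++) {
        if (samp[i] != 1.0) {
            continue;
        }
        ++n;
        for (j = 0; j < c; j++) {
            e[j] = X[i * c + j] - b[j];
        }
        /* accumulate the lower triangle only (j2 <= j) */
        for (j = 0; j < c; j++) {
            for (j2 = 0; j2 <= j; j2++) {
                V[j * c + j2] += e[j] * e[j2];
            }
        }
    }
    if (n < 2) {
        return 1;
    }
    /* copy the lower triangle into the upper triangle */
    for (j = 0; j < c; j++) {
        for (j2 = j + 1; j2 < c; j2++) {
            V[j * c + j2] = V[j2 * c + j];
        }
    }
    /* scale by 1/(n(n-1)); this loop replaces the scalar product on line 94 */
    for (j = 0; j < c * c; j++) {
        V[j] /= (double) n * (double) (n - 1);
    }
    return 0;
}
```

Accumulating only the lower triangle and then mirroring it follows the same design as MyV(), which saves roughly half of the multiplications inside the observation loop.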

Lines 37–39 in mymean11.ado are the same as their counterparts, lines 36–38, in mymean10.ado.

If I were coding an estimator in Mata, I would not use the implementation in mymean11. I could make things considerably faster by putting only the observations for which the sample-inclusion variable is one into the Mata view. The MyAve() and MyV() functions illustrate how the calculations can be performed in loops over the data. The plugins will implement versions of these functions.

Done and undone

After discussing some pros and cons of writing a plugin for Stata, I discussed a program whose calculations I can implement in a plugin. In my next post, I will discuss a C plugin for these calculations.