Saturday, April 11, 2026

Advanced Mata: Pointers – The Stata Blog


I’m still recycling my talk called “Mata, The Missing Manual” at user meetings, a talk designed to make Mata more approachable. One of the things I say late in the talk is, “Unless you already know what pointers are and know you need them, ignore them. You don’t need them.” And here I am writing about, of all things, pointers. Well, I exaggerated a little in my talk, but just a little.

Before you take my previous advice and stop reading, let me explain: Mata serves a number of purposes, and one of them is as the primary language we at StataCorp use to implement new features in Stata. I’m not referring to mock-ups, toys, and experiments; I’m talking about ready-to-ship code. Stata 12’s Structural Equation Modeling features are written in Mata, so is Multiple Imputation, so is Stata’s optimizer that is used by nearly all estimation commands, and so are most features. Mata has a side to it that is exceedingly serious and intended for use by serious developers, and every one of those features is available to users just as it is to StataCorp developers. This is one of the reasons there are so many user-written commands available for Stata. Even if you don’t use the serious features, you benefit.

So every so often I need to take time out and address the concerns of these user/developers. I knew I needed to do that now when Kit Baum emailed a question to me that ended with “I’m stumped.” Kit is the author of An Introduction to Stata Programming, which has done more to make Mata approachable to professional researchers than anything StataCorp has done, and Kit is not often stumped.

I have a certain reputation about how I answer most questions. “Why do you want to do that?” I invariably respond, or worse, “You don’t want to do that!” and then go on to give the answer to the question I wished they had asked. When Kit asks a question, however, I just answer it. Kit asked a question about pointers by setting up an artificial example and I have no idea what his real motivation was, so I’m not even going to try to motivate the question for you. The question is interesting in and of itself anyway.

Here is Kit’s artificial example:


real function x2(real scalar x) return(x^2)
 
real function x3(real scalar x) return(x^3) 

void function tryit() 
{
        pointer(real scalar function) scalar fn
        string rowvector                     func
        real scalar                          i

        func = ("x2", "x3")
        for(i=1;i<=length(func);i++) {
                fn = &(func[i])
                (*fn)(4)
        }
}

Kit is working with pointers, and not just pointers to variables, but pointers to functions. A pointer is a memory address, the address where the variable or function is stored. Real compilers translate names into memory addresses, which is one of the reasons real compilers produce code that runs fast. Mata is a real compiler. Anyway, pointers are memory addresses, such as 58, 212,770, 427,339,488, except the values are usually written in hexadecimal rather than decimal. In the example, Kit has two functions, x2(x) and x3(x). Kit wants to create a vector of the function addresses and then call each of the functions in the vector. In the artificial example, he’s calling each with an argument of 4.

The above code does not work:


: tryit()
         tryit():  3101  matrix found where function required
         <istmt>:     -  function returned error

The error message is from the Mata compiler and it’s complaining about the line


        (*fn)(4)

but the real problem is earlier in the tryit() code.

One corrected version of tryit() would read,


void function tryit()
{
        pointer(real scalar function) scalar fn
        pointer(real scalar function) vector func     // <---
        real scalar                          i

        func = (&x2(), &x3())                         // <---
        for(i=1;i<=length(func);i++) {
                fn = func[i]                          // <---
                (*fn)(4)
        }
}

If you make the three changes I marked, tryit() works:


: tryit()
  16
  64

I want to explain this code and alternative ways the code could have been fixed. It will be easier if we just work interactively, so let’s start all over again:


: real scalar x2(x) return(x^2)

: real scalar x3(x) return(x^3)

: func = (&x2(), &x3())

Let’s take a look at what is in func:


: func
                1            2
    +---------------------------+
  1 |  0x19551ef8   0x19552048  |
    +---------------------------+

These are memory addresses. When we typed &x2() and &x3() in the line


: func = (&x2(), &x3())

functions x2() and x3() were not called. &x2() and &x3() instead evaluate to the addresses of the functions named x2() and x3(). I can demonstrate this:


: &x2()
  0x19551ef8

0x19551ef8 is the memory address of where the function x2() is stored. 0x19551ef8 may not look like a number, but that is only because it is presented in base 16. 0x19551ef8 is in fact the number 425,008,888, and the compiled code for the function x2() starts at the 425,008,888th byte of memory and continues thereafter.

Let’s assign to fn the value of the address of one of the functions, say x2(). I could do that by typing


: fn = func[1]

or by typing


: fn = &x2()

and either way, when I look at fn, it contains a memory address:


: fn
  0x19551ef8

Let’s now call the function whose address we have stored in fn:


: (*fn)(2)
  4

When we call a function and want to pass 2 as an argument, we normally code f(2). In this case, we substitute (*fn) for f because we do not want to call the function named f(), we want to call the function whose address is stored in variable fn. The operator * usually means multiplication, but when * is used as a prefix, it means something different, in much the same way the minus operator can be subtract or negate. The meaning of unary * is “the contents of”. When we code *fn, we mean not the value 425,008,888 stored in fn, we mean the contents of the memory address 425,008,888, which happens to be the function x2().

We type (*fn)(2) and not *fn(2) because *fn(2) would be interpreted to mean *(fn(2)). If there were a function named fn(), that function would be called with argument 2, the result obtained, and then the star would take the contents of that memory address, assuming fn(2) returned a memory address. If it didn’t, we’d get a type mismatch error.

The syntax can be confusing until you understand the reasoning behind it. Let’s start with all new names. Consider something named X. Actually, there could be two different things named X and Mata would not be confused. There could be a variable named X and there could be a function named X(). To Mata, X and X() are different things, or said in the jargon, have different name spaces. In Mata, variables and functions can have the same names. Variables and functions having the same names is not allowed in C; C has only one name space. So in C, you can type


fn = &x2

to obtain the address of variable x2 or function x2(), but in Mata, the above means the address of the variable x2, and if there is no such variable, that’s an error. In Mata, to obtain the address of function x2(), you type


fn = &x2()

The syntax &x2() is a definitional nugget; there is no taking it apart to understand its logic. But we can take apart the logic of the programmer who defined the syntax. & means “address of” and &thing means to take the address of thing. If thing is a name, as in &name, that means to look up name in the variable space and return its address. If thing is name(), that means look up name in the function space and return its address. The way we formally write this grammar is


 &thing, where 

 thing   :=   name
              name()
              exp

There are three possibilities for thing: it’s a name, or it’s a name followed by (), or it’s an expression. The last is not much used. &2 creates a literal 2 and then tells you the address where the 2 is stored, which might be 0x195525d8. &(2+3) creates 5 and then tells you where the 5 is stored.
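As a minimal sketch of the three forms side by side, the following defines a hypothetical function sq() and a hypothetical wrapper showforms() (neither name comes from the example above), takes one address of each kind, and then dereferences each pointer with unary *. It assumes no function named sq() or showforms() already exists in the session.

```mata
mata:
// sq() and showforms() are hypothetical names used only for this sketch
real scalar sq(real scalar x) return(x^2)

void function showforms()
{
        real scalar                          v
        pointer(real scalar) scalar          pv, pe
        pointer(real scalar function) scalar pf

        v  = 7
        pv = &v         // &name:   address of the variable v
        pf = &sq()      // &name(): address of the function sq(); sq() is not called
        pe = &(2+3)     // &exp:    address of a temporary holding 5

        printf("%g\n", *pv)         // contents of v, 7
        printf("%g\n", (*pf)(3))    // calls sq(3), 9
        printf("%g\n", *pe)         // contents of the temporary, 5
}

showforms()
end
```

The parentheses in (*pf)(3) are the same ones needed in (*fn)(2): without them, Mata would look for a function named pf().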

But let’s get back to Kit’s problem. Kit coded,


func = ("x2", "x3")

and I said no, code instead


func = (&x2(), &x3())

You do not use strings to obtain pointers; you use the actual name prefixed by an ampersand.

There’s a subtle difference between what Kit was trying to code and what I did code, however. In what Kit tried to code, Kit was seeking “run-time binding”. I, however, coded “compile-time binding”. I’m about to explain the difference and show you how to achieve run-time binding, but before I do, let me tell you that

  1. You probably want compile-time binding.
  2. Compile-time binding is faster.
  3. Run-time binding is sometimes required, but when people new to pointers think they need run-time binding, they usually do not.

Let me define compile-time and run-time binding:

  1. Binding refers to establishing addresses corresponding to names and names(). The names are said to be bound to the address.
  2. In compile-time binding, the addresses are established at the time the code is compiled.

    More correctly, compile-time binding does not really occur at the time the code is compiled; it occurs when the code is brought together for execution, an act called linking, which happens automatically in Mata. This is a fine and unimportant distinction, but I do not want you to think that all the functions have to be compiled at the same time or that the order in which they are compiled matters.

    In compile-time binding, if any functions are missing when the code is brought together for execution, an error message is issued.

  3. In run-time binding, the addresses are established at the time the code is executed (run), which happens after compilation, and after linking, and is an explicit act performed by you, the programmer.

To obtain the address of a variable or function at run-time, you use the built-in function findexternal(). findexternal() takes one argument, a string scalar, containing the name of the object to be found. The function looks up that name and returns the address corresponding to it, or it returns NULL if the object cannot be found. NULL is the word used to mean invalid memory address and is in fact defined as equaling zero.

findexternal() can be used only with globals. The other variables that appear in your program might appear to have names, but those names are used solely by the compiler and, in the compiled code, these “stack-variables” or “local variables” are referred to by their addresses. The names play no other role and are not even preserved, so findexternal() cannot be used to obtain their addresses. There would be no reason you would want findexternal() to find their addresses because, in all such cases, the ampersand prefix is a perfect substitute.
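A minimal sketch of that distinction follows. It relies on the fact that a variable created interactively in Mata is a global. The names demo_global, globals_demo(), and loc are hypothetical and chosen only for illustration, and the sketch assumes the session has no global variable named loc.

```mata
mata:
demo_global = 42        // created interactively, so it is a global

void function globals_demo()
{
        real scalar                 loc
        pointer(real scalar) scalar p

        loc = 3

        // the global is found by name at run time
        p = findexternal("demo_global")
        printf("%g\n", *p)                              // 42

        // the local's name is not preserved, so the lookup returns NULL
        printf("%g\n", findexternal("loc") == NULL)     // 1

        // for a local, the ampersand prefix does the job
        printf("%g\n", *(&loc))                         // 3
}

globals_demo()
end
```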

Functions, however, are global, so we can look up functions. Watch:


: findexternal("x2()")
  0x19551ef8

Compare that with


: &x2()
  0x19551ef8

It’s the same result, but the two were produced differently. In the findexternal() case, the 0x19551ef8 result was produced after the code was compiled and assembled. The value was obtained, in fact, by execution of the findexternal() function.

In the &x2() case, the 0x19551ef8 result was obtained during the compile/assembly process. We can better understand the distinction if we look up a function that does not exist. I have no function named x4(). Let’s obtain x4()’s address:


: findexternal("x4()")
  0x0

: &x4()
         <istmt>:  3499  x4() not found

I may have no function named x4(), but that didn’t bother findexternal(). It merely returned 0x0, another way of saying NULL.

In the &x4() case, the compiler issued an error. The compiler, faced with evaluating &x4(), could not, and so complained.

Anyway, here is how we could write tryit() with run-time binding using the findexternal() function:


void function tryit() 
{
        pointer(real scalar function) scalar fn
        pointer(real scalar function) vector func
        real scalar                          i

        func = (findexternal("x2()"), findexternal("x3()"))

        for(i=1;i<=length(func);i++) {
                fn = func[i]
                (*fn)(4)
        }
}

To obtain run-time rather than compile-time bindings, all I did was change the line


        func = (&x2(), &x3())

to be


        func = (findexternal("x2()"), findexternal("x3()"))

Or we could write it this way:


void function tryit() 
{
        pointer(real scalar function) scalar fn
        string vector                        func
        real scalar                          i

        func = ("x2()", "x3()")

        for(i=1;i<=length(func);i++) {
                fn = findexternal(func[i])
                (*fn)(4)
        }
}

In this variation, I put the names in a string vector just as Kit did originally. Then I changed the line that Kit wrote,


        fn = &(func[i])

to read


        fn = findexternal(func[i])

Either way you code it, when performing run-time binding, you the programmer should deal with what is to be done if the function is not found. The loop


for(i=1;i<=length(func);i++) {
        fn = findexternal(func[i])
        (*fn)(4)
}

would better read


for(i=1;i<=length(func);i++) {
        fn = findexternal(func[i])
        if (fn!=NULL) {
                (*fn)(4)
        }
        else {
                ...
        }
}

Unlike C, if you do not include the code for the not-found case, the program will not crash if the function is not found. Mata will give you an “invalid use of NULL pointer” error message and a traceback log.
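For a complete, runnable version of the guarded loop, here is one possible sketch. The function names sq(), cube(), and tryit_safe() are hypothetical, the name "nosuch_zz()" is assumed not to exist in the session, and the choice to print a message and skip the missing function is only one way of filling in the else branch.

```mata
mata:
// sq(), cube(), and tryit_safe() are hypothetical names for this sketch
real scalar sq(real scalar x)   return(x^2)
real scalar cube(real scalar x) return(x^3)

void function tryit_safe()
{
        pointer(real scalar function) scalar fn
        string rowvector                     func
        real scalar                          i

        // "nosuch_zz()" is deliberately a function that is assumed not to exist
        func = ("sq()", "nosuch_zz()", "cube()")

        for (i=1; i<=length(func); i++) {
                fn = findexternal(func[i])
                if (fn != NULL) {
                        printf("%s returns %g\n", func[i], (*fn)(4))
                }
                else {
                        printf("%s not found, skipped\n", func[i])
                }
        }
}

tryit_safe()
end
```

Under those assumptions, the loop reports 16 for sq(), a not-found message for the missing name, and 64 for cube(), without ever dereferencing a NULL pointer.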

If you were writing a program in which the user of your program was to pass to you a function you were to use, such as a likelihood function to be maximized, you could write your program with compile-time binding by coding,


function myopt(..., pointer(real scalar function) scalar f, ...)
{
        ...
        ... (*f)(...) ...
        ...
}

and the user would call your program by coding myopt(..., &myfunc(), ...), or you could use run-time binding by coding


function myopt(..., string scalar fname, ...)
{
        pointer(real scalar function) scalar f
        ...

        f = findexternal(fname)
        if (f==NULL) {
                // fname already includes the (), e.g. "myfunc()"
                errprintf("function %s not found\n", fname)
                exit(111)
        }
        ...
        ... (*f)(...) ...
        ...
}

and the user would call your program by coding myopt(..., "myfunc()", ...).
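To make the run-time version concrete, here is a small sketch that replaces the elided parts of myopt() with the simplest thing possible: it just evaluates the named function at one point. The names evalat() and loglik_demo() are hypothetical stand-ins, not part of any Stata or Mata library.

```mata
mata:
// loglik_demo() is a hypothetical stand-in for a user's function
real scalar loglik_demo(real scalar x) return(10 - (x-2)^2)

// evalat() is a hypothetical, stripped-down stand-in for myopt()
real scalar function evalat(string scalar fname, real scalar x)
{
        pointer(real scalar function) scalar f

        f = findexternal(fname)         // run-time binding by name
        if (f == NULL) {
                errprintf("function %s not found\n", fname)
                exit(111)
        }
        return((*f)(x))
}

evalat("loglik_demo()", 3)      // displays 9
end
```

Calling evalat() with a name that does not exist takes the errprintf() branch and stops with return code 111 instead of producing a NULL-pointer traceback.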

In this case I could be convinced to prefer the run-time binding solution for professional code because, the error being tolerated by Mata, I can write code to give a better, more professional looking error message.


