%21x is a Stata show format, simply as are %f, %g, %9.2f, %td, and so forth. You could possibly put %21x on any variable in your dataset, however that’s not its objective. Moderately, %21x is to be used with Stata’s show command for these wanting to raised perceive the accuracy of the calculations they make. We use %21x often in growing Stata.
%21x produces output that appears like this:
. show %21x 1 +1.0000000000000X+000 . show %21x 2 +1.0000000000000X+001 . show %21x 10 +1.4000000000000X+003 . show %21x sqrt(2) +1.6a09e667f3bcdX+000
All proper, I admit that the result’s fairly unreadable to the uninitiated. The aim of %21x is to point out floating-point numbers precisely as the pc shops them and thinks about them. In %21x’s protection, it’s extra readable than how the pc actually data floating-point numbers, but it loses not one of the mathematical essence. Computer systems actually report floating-point numbers like this:
1 = 3ff0000000000000
2 = 4000000000000000
10 = 4024000000000000
sqrt(2) = 3ff6a09e667f3bcd
Or extra accurately, they report floating-point numbers in binary, like this:
1 = 0011111111110000000000000000000000000000000000000000000000000000
2 = 0100000000000000000000000000000000000000000000000000000000000000
10 = 0100000000100100000000000000000000000000000000000000000000000000
sqrt(2) = 0011111111110110101000001001111001100110011111110011101111001101
By comparability, %21x is a mannequin of readability.
The above numbers are 8-byte floating level, also referred to as double precision, encoded in binary64 IEEE 754-2008 little endian format. Little endian signifies that the bytes are ordered, left to proper, from least important to most important. Some computer systems retailer floating-point numbers in huge endian format — bytes ordered from most important to least important — after which numbers appear to be this:
1 = 000000000000f03f
2 = 0000000000000040
10 = 0000000000004024
sqrt(2) = cd3b7f669ea0f63f
or:
1 = 0000000000000000000000000000000000000000000000001111000000111111
2 = 0000000000000000000000000000000000000000000000000000000000000100
10 = 0000000000000000000000000000000000000000000000000100000000100100
sqrt(2) = 1100110100111011011111110110011010011110000011111111011000111111
No matter that, %21x produces the identical output:
. show %21x 1 +1.0000000000000X+000 . show %21x 2 +1.0000000000000X+001 . show %21x 10 +1.4000000000000X+003 . show %21x sqrt(2) +1.6a09e667f3bcdX+000
Binary computer systems retailer floating-point numbers as a quantity pair, (a, b); the specified quantity z is encoded
z = a * 2^b
For instance,
1 = 1.00 * 2^0 2 = 1.00 * 2^1 10 = 1.25 * 2^3
The quantity pairs are encrypted within the bit patterns, comparable to 00111111…01, above.
I’ve written the elements a and b in decimal, however for causes that may turn out to be clear, we have to protect the important binaryness of the pc’s quantity. We might write the numbers in binary, however they are going to be extra readable if we signify them in base-16:
| base-10 | base-16 floating level |
|
|---|---|---|
| 1 | = | 1.00 * 2^0 |
| 2 | = | 1.00 * 2^1 |
| 10 | = | 1.40 * 2^3 |
“1.40?”, you ask, wanting on the final row, which signifies 1.40*2^3 for decimal 10.
The interval in 1.40 isn’t a decimal level; it’s a hexadecimal level. The primary digit after the hexadecimal level is the quantity for 1/16ths, the following is for 1/(16^2)=1/256ths, and so forth. Thus, 1.40 hexadecimal equals 1 + 4*(1/16) + 0*(1/256) = 1.25 in decimal.
And that’s the way you learn %21x values +1.0000000000000X+000, +1.0000000000000X+001, and +1.4000000000000X+003. To wit,
| base-10 | base-16 floating level |
%21x | ||
|---|---|---|---|---|
| 1 | = | 1.00 * 2^0 | = | +1.0000000000000X+000 |
| 2 | = | 1.00 * 2^1 | = | +1.0000000000000X+001 |
| 10 | = | 1.40 * 2^3 | = | +1.4000000000000X+003 |
The mantissa is proven to the left of the X, and, to the correct of the X, the exponent for the two. %21x is nothing greater than a binary variation of the %e format with which we’re all acquainted, for instance, 12 = 1.20000e+01 = 1.2*10^1. It’s such an apparent generalization that one would guess it has existed for a very long time, so excuse me after I point out that we invented it at StataCorp. If I weren’t so humble, I might emphasize that this human-readable approach of representing binary floating-point numbers preserves practically each side of the IEEE floating-point quantity. Being humble, I’ll merely observe that 1.40x+003 is extra readable than 4024000000000000.
Now that you understand how to learn %21x, let me present you ways you would possibly use it. %21x is especially helpful for analyzing precision points.
For example, the dice root of 8 is 2; 2*2*2 = 8. And but, in Stata, 8^(1/3) isn’t equal to 2:
. show 8^(1/3)2 . assert 8^(1/3) == 2 assertion is fake r(9) ; . show %20.0g 8^(1/3) 1.99999999999999978
I blogged about that beforehand; see How Stata calculates powers. The error isn’t a lot:
. show 8^(1/3)-2-2.220e-16
In %21x format, nevertheless, we will see that the error is just one bit:
. show %21x 8^(1/3) +1.fffffffffffffX+000 . show %21x 2 +1.0000000000000X+001
I want the reply for 8^(1/3) had been +1.0000000000001X+000, as a result of then the one-bit error would have been apparent to you. As a substitute, reasonably than being a bit too giant, the precise reply is a bit too small — one bit too small to be actual — so we find yourself with +1.fffffffffffffX+000.
One bit off means being off by 2^(-52), which is 2.220e-16, and which is the quantity we noticed once we displayed in base-10 8^(1/3)-2. So %21x didn’t reveal something we couldn’t have found out in different methods. The character of the error, nevertheless, is extra apparent in %21x format than it’s in a base-10 format.
On Statalist, the purpose typically comes up that 0.1, 0.2, …, 0.4, 0.6, …, 0.9, 0.11, 0.12, … haven’t any actual illustration within the binary base that computer systems use. That turns into apparent with %21x format:
. show %21x 0.1 +1.999999999999aX-004 . show %21x 0.2 +1.999999999999aX-003. ...
0.5 does have a precise illustration, in fact, as do all of the unfavourable powers of two:
. show %21x 0.5 // 1/2 +1.0000000000000X-001 . show %21x 0.25 // 1/4 +1.0000000000000X-002 . show %21x 0.125 // 1/8 +1.0000000000000X-003 . show %21x 0.0625 // 1/16 +1.0000000000000X-004 . ...
Integers have actual representations, too:
. show %21x 1 +1.0000000000000X+000 . show %21x 2 +1.0000000000000X+001 . show %21x 3 +1.8000000000000X+001 . ... . show %21x 10 +1.4000000000000X+003 . ... . show %21x 10786204 +1.492b380000000X+017 . ...
%21x is an effective way of turning into accustomed to base-16 (equivalently, base-2), which is price doing in case you program base-16 (equivalently, base-2) computer systems.
Let me present you one thing helpful that may be finished with %21x.
A programmer at StataCorp has carried out a brand new statistical command. In 4 examples, this system produces the next outcomes:
41.8479499816895 6.7744922637939 0.1928647905588 1.6006311178207
With none extra info, I can inform you that this system has a bug, and that StataCorp is not going to be publishing the code till the bug is mounted!
How can I do know that this program has a bug with out even realizing what’s being calculated? Let me present you the above ends in %21x format:
+1.4ec89a0000000X+005 +1.b191480000000X+002 +1.8afcb20000000X-003 +1.99c2f60000000X+000
Do you see what I see? It’s all these zeros. In randomly drawn issues, it will be unlikely that there could be all zeros on the finish of every consequence. What is probably going is that the outcomes had been one way or the other rounded, and certainly they had been. The rounding on this case was as a consequence of utilizing float (4-byte) precision inadvertently. The programmer forgot to incorporate a double within the ado-file.
And that’s a method %21x is used.
I’m regularly harping on programmers at StataCorp that if they’ll program binary computer systems, they should assume in binary. I am going ballistic after I see a comparability that’s coded as “if (abs(x-y)<1e-8) …” in an try to take care of numerical inaccuracy. What sort of quantity is 1e-8? Nicely, it’s this sort of quantity:
. show %21x 1e-8 +1.5798ee2308c3aX-01b
Why put the pc to all that work, and precisely what number of digits are you, the programmer, making an attempt to disregard? Moderately than 1e-8, why not use the “good” numbers 7.451e-09 or 3.725e-09, which is to say, 1.0x-1b or 1.0x-1c? In case you try this, then I can see precisely what number of digits you’re ignoring. In case you code 1.0x-1b, I can see you’re ignoring 1b=27 binary digits. In case you code 1.0x-1c, I can see you’re ignoring 1c=28 binary digits. Now, what number of digits do that you must ignore? How imprecise do you actually assume your calculation is? By the way in which, Stata understands numbers comparable to 1.0x-1b and 1.0x-1c as enter, so you possibly can sort the exact quantity you need.
As one other instance of pondering in binary, a StataCorp programmer as soon as described a calculation he was making. At one level, the programmer wanted to normalize a quantity in a specific approach, and so calculated x/10^trunc(log10(x)), and held onto the ten^trunc(log10(x)) for denormalization later. Dividing by 10, 100, and many others., could also be straightforward for us people, however it’s not straightforward in binary, and it may end up in very small quantities of dreaded round-off error. And why even trouble to calculate the log, which is an costly operation? “Bear in mind,” I stated, “how floating-point numbers are recorded on a pc: z = a*2^b, the place 0 < = |a| < 2. Writing in C, it’s straightforward to extract elements. In reality, isn’t a quantity normalized to be between 0 and a pair of even higher in your functions?” Sure, it turned out it was.
Even I generally neglect to assume in binary. Simply final week I used to be engaged on an issue and Alan Riley instructed an answer. I believed some time. “Very intelligent,” I stated. “Recasting the issue in powers of two will do away with that divide that prompted half the issue. Even so, there’s nonetheless the pesky subtraction.” Alan checked out me, imitating a glance I so typically give others. “In binary,” Alan patiently defined to me, “the distinction you want is the final 19 bits of the unique quantity. Simply masks out the opposite digits.”
At this level, a lot of you could wish to cease studying and go off and play with %21x. In case you play with %21x lengthy sufficient, you’ll finally study the connection between numbers recorded as Stata floats and as Stata doubles, and you could uncover one thing you assume to be an error. I’ll talk about that subsequent week in my subsequent weblog posting.
