Sunday, April 26, 2026

Utilizing dates and occasions from different software program


Most software program shops dates and occasions numerically, as durations from some sentinel date, however they differ on the sentinel date and on the items wherein the period is saved. Stata shops dates because the variety of days since 01jan1960, and datetimes because the variety of milliseconds since 01jan1960 00:00:00.000. January 3, 2011 is saved as 18,630, and 2pm on January 3 is saved as 1,609,682,400,000. Different packages use completely different decisions for bases and items.

It typically occurs that you must course of in Stata knowledge imported from different software program and find yourself with a numerical variable recording a date or datetime within the different software program’s encoding. It’s normally potential to regulate the numeric date or datetime values to the sentinel date and items that Stata makes use of. Beneath are conversion guidelines for SAS, SPSS, R, Excel, and Open Workplace.

 
SAS

SAS shops dates because the variety of days since 01jan1960, the identical as Stata:


    . gen statadate = sasdate
    . format statadate %td

SAS shops datetimes because the variety of seconds since 01jan1960 00:00:00, assuming 86,400 seconds/day. Thus, all that’s vital is to multiply SAS datetimes by 1,000 and fix a %tc format to the end result,


    . gen double statatime = (sastime*1000)
    . format statatime %tc

It will be important that variables containing SAS datetimes, resembling sastime above, be imported as doubles into Stata.

 
SPSS

SPSS shops each dates and datetimes because the variety of seconds since 14oct1582 00:00:00, assuming 86,400 seconds/day. To transform SPSS datetimes to Stata datetimes, sort


    . gen double statatime = (spsstime*1000) + tc(14oct1582 00:00)
    . format statatime %tc

Multiplying by 1,000 converts from seconds to milliseconds. Including tc(14oct1582 00:00) accounts for the differing bases.

Perform tc() returns the desired datetime as a Stata datetime, which is to say, the variety of milliseconds between the desired datetime and 01jan1960 00:00:00.000. We have to add the distinction between SPSS’s base and Stata’s base, which is tc(14oct1582 00:00) – tc(01jan1960 00:00), however tc(01jan1960) is definitionally 0, in order that simply leaves tc(14oct1582 00:00). tc(14oct1582), on your data, is -11,903,760,000,000.

SPSS dates are the identical as SPSS datetimes, so to transform an SPSS date to a Stata date, we may sort,


    . gen double statatime = (spssdate*1000) + tc(14oct1582 00:00)
    . gen statadate        = dofc(statatime)
    . format statadate %td
    . drop statatime

Perform dofc() converts a Stata datetime to a Stata date. We will mix the above into,


    . gen statadate = dofc((spsstime*1000) + tc(14oct1582 00:00))
    . format statadate %td

 
R

R shops dates as days since 01jan1970. To transform to a Stata date,


    . gen statadate = rdate - td(01jan1970)
    . format statadate %td

Stata makes use of 01jan1960 as the bottom, R makes use of 01jan1970, so all it’s a must to do is subtract the variety of days between 01jan1970 and 01jan1960.

R shops datetimes because the variety of UTC adjusted seconds since 01jan1970 00:00:00. UTC stands for Common Time Coordinated. Somewhat than assuming 86,400 seconds/day, some UTC days have 86,401 seconds. Leap seconds are typically inserted into UTC days to maintain the clock coordinated with the Earth’s rotation. Stata’s datetime %tC format is UTC time, which is to say, it accounts for these leap seconds. Thus, to transform R datetimes to Stata, you sort


   . gen double statatime = rtime - tC(01jan1970 00:00)
   . format statatime %tC

Word using Stata’s tC() perform somewhat than tc() to acquire the variety of milliseconds between the differing bases. tc() returns the variety of seconds since 01jan1960 00:00:00 assuming 86,400 seconds/day. tC() returns the variety of seconds adjusted for leap seconds. On this case, it will not make a distinction if we mistakenly typed tc() somewhat than tC() as a result of no leap seconds had been inserted between 1960 and 1970. Had the bottom yr been 1980, nevertheless, using tC() would have been necessary. 9 additional seconds had been had been inserted between 01jan1970 and 01jan1980!

In lots of instances you might favor utilizing a time variable that ignores leap seconds. In that case, You possibly can sort


    . gen double statatime = cofC(rtime - tC(01jan1970 00:00))
    . format statatime %tc

 
Excel

Excel has used completely different date methods for various working methods. Excel for Home windows used the “1900 Date System”. Excel for Mac used the “1904 Date System”. Extra just lately, Microsoft has been standardizing on the 1900 Date System.

When you’ve got an Excel for Home windows workbook, it’s prone to be utilizing 1900.

When you’ve got an Excel for Mac workbook, it’s prone to be utilizing 1904, except it got here from a Home windows workbook initially.

Anyway, each Excels can use both encoding. See http://help.microsoft.com/kb/214330 for extra data and for instrbuctions on changing your workbook between date methods.

In any case, you’re unlikely to come across Excel numerically coded dates. When you cut-and-paste the spreadsheet into Stata’s Knowledge editor, dates and datetimes paste as strings in human-readable kind. When you use a conversion bundle, most know to transform the date for you.

 
Excel, 1900 date system

For dates on or after 01mar1900, Excel 1900 Date System shops dates as days since 30dec1899. To transform to a Stata date,


    . gen statadate = exceldate + td(30dec1899)
    . format statadate %td

Excel can retailer dates between 01jan1900 and 28feb1900, too, however the components above is not going to deal with these two months. See http://www.cpearson.com/excel/datetime.htm for extra data.

For datetimes on or after 01may1900 00:00:00, Excel 1900 Date System shops datetimes as days plus fraction of day since 30dec1899 00:00:00. To transform with a one-second decision to a Stata datetime,


    . gen double statatime = spherical((exceltime+td(30dec1899))*86400)*1000
    . format statatime %tc

 
Excel, 1904 date system

For dates on or after 01jan1904, Excel 1904 Date System shops dates as days since 01jan1904. To transform to a Stata date,

 
    . gen statadate = exceldate + td(01jan1904)
    . format statadate %td

For datetimes on or after 01jan1904 00:00:00, Excel 1904 Date System shops datetimes as days plus fraction of day since 01jan1904 00:00:00. To transform with a one-second decision to a Stata datetime,

 
    . gen double statatime = spherical((exceltime+td(01jan1904))*86400)*1000
    . format statatime %tc

 
Open Workplace

Open Workplace makes use of the Excel, 1900 Date System.

 
Why Stata has two datetime encodings

Now we have simply seen that the majority packages assume 86,400 seconds/day, however that one as a substitute makes use of UTC time, wherein days have 86,400 or 86,401 seconds, relying. Stata gives each datetime encodings, known as %tc and %tC. That turned out to be handy in translating occasions from different packages. Stata will even allow you to swap from one to the opposite utilizing the cofC() and Cofc features, so you realize you need to be asking, which ought to I exploit?

Stata’s %tc format assumes that there are 24*60*60*1,000 ms per day — 86,400 seconds per day — simply as an atomic clock does. Atomic clocks depend oscillations between the nucleus and electrons of an atom and thus present a measurement of the true passage of time.

Time of day measurements have traditionally been based mostly on astronomical commentary, which is a flowery method of claiming, based mostly on trying on the solar. The solar needs to be at its highest level at midday, proper? So nevertheless you saved monitor of time — falling grains of sand or a wound up spring — you periodically reset your clock after which went about your small business. In olden occasions it was understood that the 60 seconds per minute, 60 minutes per hour, 24 hours per day, had been theoretical objectives that no mechanical gadget may reproduce precisely. As of late, now we have have extra correct definitions for measuring time. A second is 9,192,631,770 durations of the radiation comparable to the transition between two ranges of the bottom state of caesium 133. Clearly now we have higher gear than the ancients, so drawback solved, proper? Mistaken. There are two issues. The formal definition of a second is just a bit too quick to match size of a day, and the Earth’s rotation is slowing down.

Because of this, since 1972 leap seconds have been added to atomic clocks a couple of times a yr to maintain time measurements in synchronization with the earth’s rotation. Not like leap years, nevertheless, there isn’t a components for predicting when leap seconds will happen. The Earth could also be on common slowing down, however there’s a giant random part to that. Because of this, leap seconds are decided by committee and introduced 6 months earlier than they’re inserted. Leap seconds are added, if vital, on the tip of the day on June 30 and December 31 of the yr. The inserted occasions are designated as 23:59:60.

Unadjusted atomic clocks might precisely mark the passage of actual time, however you must perceive that leap seconds are each bit as actual as each different second of the yr. As soon as a leap second is inserted, it ticks identical to every other second and actual issues can occur throughout that tick.

You might have heard of phrases resembling GMT and UTC.

GMT is the previous Greenwich Imply Time and relies on astronomical commentary. GMT has been supplanted by UTC.

UTC stands for coordinated common time and is measured by atomic clocks, sometimes corrected for leap seconds. UTC is derived from two different occasions, UT1 and TAI. UT1 is the imply photo voltaic time, with which UTC is saved in sync by the occasional addition of a leap second. TAI is the atomically measured pure time. TAI was set to GMT plus 10 seconds in 1958 and has been operating unadjusted since then. Replace 07 Jan 2010: TAI is a statistical mixture of assorted atomic chronometers and even it has not ticked uniformly over its historical past; see http://www.ucolick.org/~sla/leapsecs/timescales.html and particularly http://www.ucolick.org/~sla/leapsecs/dutc.html#TAI. (Due to Steve Allen of the UCO/Lick Observatory for correcting my understanding and for the reference.)

UNK is StataCorp’s time period for the time commonplace most individuals use. UNK stands for unknowing. UNK relies on a latest time commentary, most likely UTC, after which simply assuming that there are 86,400 seconds per day after that.

The UNK commonplace is enough for a lot of functions, and in such instances, it would be best to use %tc somewhat than the leap second-adjusted %tC encoding. In case you are utilizing computer-timestamped knowledge, nevertheless, you must discover out whether or not the timestamping system accounted for leap-second changes. Issues can come up even when you don’t care about shedding or gaining a second right here and there.

As an example, you might import timestamp values from different methods recorded within the variety of milliseconds which have handed since some agreed upon date. When you select the unsuitable encoding scheme, in case you selected tc when you must select %tC, or vice versa, more moderen occasions will likely be off by 24 seconds.

To keep away from such issues, you might determine to import and export knowledge by utilizing Human Readable Types (HRF) resembling “Fri Aug 18 14:05:36 CDT 2006”. This technique has benefits, however for %tC (UTC) encoding, occasions resembling 23:59:60 are potential. Some software program will refuse to decode such occasions.

Stata refuses to decode 23:59:60 within the %tc encoding (perform clock) and accepts it with %tC (perform Clock()). When %tC perform Clock() sees a time with a sixtieth second, Clock() verifies that the time corresponds to an official leap second. Thus, when translating from printable types, attempt assuming %tc and verify the end result for lacking values. If there are none, you may assume your use of %tc is legitimate. If there are lacking values and they’re on account of leap seconds and never another error, it’s essential to use %tC perform Clock() to translate from HRF. After that, in case you nonetheless need to work in %tc items, use perform cofC() to translate %tC values into %tc.

If precision issues, one of the best ways to course of %tC knowledge is solely to deal with them that method. The inconvenience is that you just can not assume that there are 86,400 seconds per day. To acquire the period between dates, it’s essential to subtract the 2 time values concerned. The opposite problem has to do with coping with dates sooner or later. Underneath the %tC (UTC) encoding, there isn’t a set worth for any date greater than 6 months sooner or later.

 
Recommendation

Stata gives two datetime encodings:

  1. %tC, also called UTC, which accounts for leap seconds, and
     
  2. %tc, which ignores them (it assumes 86,400 seconds/day).

Techniques differ in how they deal with time variables. My recommendation is,

  • When you get hold of knowledge from a system that accounts for leap seconds, import utilizing Stata’s %tC.
    1. When you later must export knowledge to a system that doesn’t account for leap seconds, use Stata’s cofC() perform to translate time values earlier than exporting.
       
    2. When you intend to tsset the time variable and the evaluation will likely be on the second stage or finer, simply tsset the %tC variable, specifying the suitable delta() if vital, for instance, delta(1000) for seconds.
       
    3. When you intend to tsset the time variable and the evaluation will likely be at coarser than the second stage (minute, hour, and many others.), create a %tc variable from the %tC variable (generate double tctime = cofC(tCtime<) and tsset that, specifying the suitable delta() if vital. You have to do this as a result of, in a %tC variable, there usually are not essentially 60 seconds in a minute; some minutes have 61 seconds.
  • When you get hold of knowledge from a system that ignores leap seconds, use Stata’s %tc.
    1. When you later must export knowledge to a system that does account for leap seconds, use Stata’s Cofc() perform to translate time values.
       
    2. When you intend to tsset the time variable, simply tsset it, specifying the suitable delta().

Some customers favor at all times to make use of Stata’s %tc as a result of these values are somewhat simpler to work with. You are able to do that if

  • you don’t thoughts having as much as 1 second of error and
     
  • you don’t import or export numerical values (clock ticks) from different methods which can be utilizing leap seconds, as a result of then there may very well be practically 30 seconds of collected error.

There are two issues to recollect in case you use %tC variables:

  1. The variety of seconds between two dates is a perform of when the dates occurred. 5 days from one date is just not merely a matter of including 5*24*60*60*1,000 ms. You may want so as to add one other 1,000 ms. 300 and sixty-five days from now may require including 1,000 or 2,000 ms. The longer the span, the extra you might need so as to add. The easiest way so as to add durations to %tC variables is to extract the elements, add to them, after which reconstruct from the numerical elements.
     
  2. You can’t precisely predict datetimes greater than six months into the long run. We have no idea what the %tC worth will likely be of 25dec2026 00:00:00 as a result of yearly alongside the way in which, the Worldwide Earth Rotation Reference Techniques Service (IERS) will twice announce whether or not there would be the insertion of a leap second.

You possibly can assist alleviate these inconveniences. Face west and throw rocks. The profit will likely be solely transitory if the rocks land again on Earth, so you must throw them actually arduous. I do know what you’re considering, however this doesn’t must be a coordinated effort.



Related Articles

Latest Articles