Sunday, November 30, 2025

Update to Import COVID-19 post


In my last post, I mentioned that I didn’t want to distribute my covid19.ado file because “it could be rendered useless if or when Johns Hopkins changes its data”. I wrote that on March 19, 2020, and the data changed on March 23, 2020. This will likely happen again (and again, and again …). I’ll post updates in the future as the data change, but you may need to adapt sooner than I can post. So let’s see how we can update our code to adapt to the changing data.

Let’s begin by running the code from my last blog post.


// Base URL of the daily reports in the Johns Hopkins GitHub repository
local URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
// Import each daily CSV file and save it as a Stata dataset
forvalues month = 1/12 {
   forvalues day = 1/31 {
      local month = string(`month', "%02.0f")
      local day = string(`day', "%02.0f")
      local year = "2020"
      local today = "`month'-`day'-`year'"
      local FileName = "`URL'`today'.csv"
      clear
      // capture skips dates for which no file exists
      capture import delimited "`FileName'"
      capture confirm variable ïprovincestate
      if _rc == 0 {
         rename ïprovincestate provincestate
         label variable provincestate "Province/State"
      }
      capture save "`today'", replace
   }
}
// Append the saved daily datasets into a single dataset
clear
forvalues month = 1/12 {
   forvalues day = 1/31 {
      local month = string(`month', "%02.0f")
      local day = string(`day', "%02.0f")
      local year = "2020"
      local today = "`month'-`day'-`year'"
      capture append using "`today'"
   }
}

Something seems wrong when we describe our data.

. describe

Contains data
  obs:        11,341
 vars:            17
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
provincestate   str43   %43s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %8.0g                 Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %8.0g                 Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str28   %28s                  Province_State
country_region  str32   %32s                  Country_Region
last_update     str19   %19s                  Last_Update
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.

We have variables with similar names, such as provincestate and province_state, countryregion and country_region, and so on. The variable names have changed in the newer raw files. But we must have the same variable names when we append the data.

I looked through the most recent raw data files and identified the date on which the data changed. You can do this without opening the files. You can simply describe the data from your local disk or cloud account.

The raw data from March 22, 2020, use the old variable names.

. describe using 03-22-2020.dta

Contains data
  obs:           309                          24 Mar 2020 11:48
 vars:             8
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
provincestate   str28   %28s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
------------------------------------------------------------------------
Sorted by:

The raw data from March 23, 2020, use the new variable names.

. describe using 03-23-2020.dta

Contains data
  obs:         3,415                          24 Mar 2020 11:48
 vars:            12
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str28   %28s                  Province_State
country_region  str32   %32s                  Country_Region
last_update     str19   %19s                  Last_Update
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
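
If you would rather not check the files one at a time, the same comparison can be scripted. What follows is only a minimal sketch, not part of the original code: it assumes the daily .dta files saved by the import loop are in the current working directory, and the range of days (March 20 through 24) is an arbitrary choice for illustration. It reads each file's variable list with describe using and reports whether the old name provincestate is present.

```stata
* Sketch: report which saved daily files still use the old variable names.
* Assumes files such as 03-22-2020.dta exist in the working directory.
forvalues day = 20/24 {
    local today = "03-`day'-2020"
    * Skip dates for which no file was saved
    capture confirm file "`today'.dta"
    if _rc == 0 {
        * The varlist option stores the file's variable names in r(varlist)
        quietly describe using "`today'.dta", varlist
        local vars "`r(varlist)'"
        * posof returns 0 when the name is not in the list
        local pos : list posof "provincestate" in vars
        if `pos' > 0 {
            display "`today': old names (provincestate found)"
        }
        else {
            display "`today': new names (provincestate not found)"
        }
    }
}
```

The first date reported with the new names is the date on which the file layout changed.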

We could write some clever code to distinguish between files created before and after March 23. But a simple alternative is to use capture rename to change the variable names where necessary in the raw data files.
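
To see why capture makes this safe for files of either layout, here is a toy sketch. It uses a made-up one-row dataset rather than the Johns Hopkins data, so the variables and values are purely illustrative. When the variable exists, the rename happens; when it does not, capture suppresses the error and execution continues.

```stata
* Toy sketch with a made-up one-row dataset; not part of the import code.
clear
set obs 1
generate lat = 0
* lat exists, so the rename succeeds and _rc is 0
capture rename lat latitude
display "return code after renaming lat: " _rc
* long_ does not exist here; capture suppresses the error and _rc is nonzero
capture rename long_ longitude
display "return code after renaming long_: " _rc
* Only latitude should be listed
describe, simple
```

Because a failed rename is silently ignored, the same commands can be run on every daily file, whether it uses the old or the new names.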

Let’s do this on the raw data file for March 23 before we incorporate it into the rest of our code.

. use 03-23-2020.dta

. capture rename province_state provincestate

. capture rename country_region countryregion

. capture rename last_update lastupdate

. capture rename lat latitude

. capture rename long_ longitude

. describe

Contains data from 03-23-2020.dta
  obs:         3,415
 vars:            12                          24 Mar 2020 11:48
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
provincestate   str28   %28s                  Province_State
countryregion   str32   %32s                  Country_Region
lastupdate      str19   %19s                  Last_Update
latitude        float   %9.0g                 Lat
longitude       float   %9.0g                 Long_
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.

The variable names in the new data now match the variable names in the old data. Some variables in the newer data did not appear in the old data. These new variables will be appended to the final dataset but will not contain any data for dates prior to March 23.

The updated code below will import the raw data from the Johns Hopkins GitHub repository as of March 23, 2020. The new commands are the five capture rename lines that follow the if block.


// Base URL of the daily reports in the Johns Hopkins GitHub repository
local URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
// Import each daily CSV file and save it as a Stata dataset
forvalues month = 1/12 {
   forvalues day = 1/31 {
      local month = string(`month', "%02.0f")
      local day = string(`day', "%02.0f")
      local year = "2020"
      local today = "`month'-`day'-`year'"
      local FileName = "`URL'`today'.csv"
      clear
      capture import delimited "`FileName'"
      capture confirm variable ïprovincestate
      if _rc == 0 {
         rename ïprovincestate provincestate
         label variable provincestate "Province/State"
      }
      // New: give the variables in newer files the names used in older files
      capture rename province_state provincestate
      capture rename country_region countryregion
      capture rename last_update lastupdate
      capture rename lat latitude
      capture rename long_ longitude
      capture save "`today'", replace
   }
}
// Append the saved daily datasets into a single dataset
clear
forvalues month = 1/12 {
   forvalues day = 1/31 {
      local month = string(`month', "%02.0f")
      local day = string(`day', "%02.0f")
      local year = "2020"
      local today = "`month'-`day'-`year'"
      capture append using "`today'"
   }
}

We can verify that this worked by describing the resulting data.

. describe

Contains data
  obs:        11,341
 vars:            12
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
provincestate   str43   %43s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %8.0g                 Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %8.0g                 Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
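
Because the four newer variables exist only in files dated March 23 or later, many of their values in the appended data will be missing. One way to gauge this is sketched below. It assumes the appended dataset is in memory and that it contains fips, admin2, active, and combined_key, as in the output above. Note that the counts will also include any rows from the newer files in which a field is empty.

```stata
* Sketch: count missing values in the variables that appear only in newer files.
* Assumes the appended dataset from the loop above is currently in memory.
foreach v of varlist fips admin2 active combined_key {
    * missing() handles both numeric and string variables
    quietly count if missing(`v')
    display "`v': `r(N)' of `c(N)' observations missing"
}
```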

Let’s save this dataset so we can use it later.

. save covid19_raw
file covid19_raw.dta saved

Please note that we have not checked and cleaned these data. The code above and the resulting data should be used for educational purposes only.

I’ll show you how to convert the raw data to time-series data in my next post.


