In my last post, I mentioned that I did not want to distribute my covid19.ado file because “it could be rendered useless if or when Johns Hopkins changes its data”. I wrote that on March 19, 2020, and the data changed on March 23, 2020. This will likely happen again (and again, and again …). I’ll post updates in the future as the data change, but you may need to adapt sooner than I can post. So let’s see how we can update our code to adapt to the changing data.
Let’s begin by running the code from my last blog post.
local URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
// Import each daily report and save it as a Stata dataset
forvalues month = 1/12 {
    forvalues day = 1/31 {
        local month = string(`month', "%02.0f")
        local day = string(`day', "%02.0f")
        local year = "2020"
        local today = "`month'-`day'-`year'"
        local FileName = "`URL'`today'.csv"
        clear
        // capture lets the loop continue when no file exists for a date
        capture import delimited "`FileName'"
        // In some files the first variable is imported as ïprovincestate
        capture confirm variable ïprovincestate
        if _rc == 0 {
            rename ïprovincestate provincestate
            label variable provincestate "Province/State"
        }
        capture save "`today'", replace
    }
}
// Append the saved daily datasets into one dataset
clear
forvalues month = 1/12 {
    forvalues day = 1/31 {
        local month = string(`month', "%02.0f")
        local day = string(`day', "%02.0f")
        local year = "2020"
        local today = "`month'-`day'-`year'"
        capture append using "`today'"
    }
}
Something seems wrong when we describe our data.
. describe
Contains data
  obs:        11,341
 vars:            17
------------------------------------------------------------------------
                storage display               value
variable name   type    format                label   variable label
------------------------------------------------------------------------
provincestate   str43   %43s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %8.0g                 Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %8.0g                 Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str28   %28s                  Province_State
country_region  str32   %32s                  Country_Region
last_update     str19   %19s                  Last_Update
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
We have variables with similar names, such as provincestate and province_state, countryregion and country_region, and so on. The variable names have changed in the newer raw files. But we must have the same variable names when we append the data.
I looked through the recent raw data files and identified the date on which the data changed. You can do this without opening the files. You can simply describe the data from your local disk or cloud account.
The raw data from March 22, 2020, use the old variable names.
. describe using 03-22-2020.dta
Contains data
  obs:           309                          24 Mar 2020 11:48
 vars:             8
------------------------------------------------------------------------
                storage display               value
variable name   type    format                label   variable label
------------------------------------------------------------------------
provincestate   str28   %28s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
------------------------------------------------------------------------
Sorted by:
The raw data from March 23, 2020, use the new variable names.
. describe using 03-23-2020.dta
Contains data
  obs:         3,415                          24 Mar 2020 11:48
 vars:            12
------------------------------------------------------------------------
                storage display               value
variable name   type    format                label   variable label
------------------------------------------------------------------------
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str28   %28s                  Province_State
country_region  str32   %32s                  Country_Region
last_update     str19   %19s                  Last_Update
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
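The file-by-file check can also be scripted. The following is a minimal sketch, not part of the original code. It assumes the daily .dta files saved by the earlier loop are in the current working directory. The range of days (20 through 24) and the use of province_state as the marker for the new layout are my choices for illustration.

```stata
* Sketch: report which saved daily files use the new variable names.
* Assumes files such as 03-22-2020.dta exist in the working directory.
forvalues day = 20/24 {
    local today = "03-`day'-2020"
    * The varlist option stores the file's variable names in r(varlist)
    capture describe using "`today'.dta", varlist
    if _rc == 0 {
        local vars `r(varlist)'
        * posof returns 0 when province_state is not in the list
        local pos : list posof "province_state" in vars
        if `pos' > 0 {
            display "`today': new variable names"
        }
        else {
            display "`today': old variable names"
        }
    }
}
```

Dates with no saved file are skipped silently because capture suppresses the error from describe.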
We could write some clever code to distinguish between files created before and after March 23. But a simple alternative is to use capture rename to change the variable names where necessary in the raw data files.
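To see why capture makes this safe, here is a small sketch on a toy dataset. The dataset and its single value are made up for illustration. The point is that rename fails when the old variable name does not exist, and capture suppresses that error and records it in _rc instead of stopping the do-file.

```stata
* Toy dataset with one new-style variable name (illustrative only)
clear
set obs 1
generate str5 province_state = "Hubei"

* First attempt: province_state exists, so the rename succeeds
capture rename province_state provincestate
display _rc

* Second attempt: province_state is gone, so rename fails,
* but capture suppresses the error and execution continues
capture rename province_state provincestate
display _rc
```

The first display should show 0 and the second a nonzero return code. This is why the same rename commands can be run on both old and new files.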
Let’s try this on the raw data file for March 23 before we incorporate it into the rest of our code.
. use 03-23-2020.dta
. capture rename province_state provincestate
. capture rename country_region countryregion
. capture rename last_update lastupdate
. capture rename lat latitude
. capture rename long_ longitude
. describe
Contains data from 03-23-2020.dta
  obs:         3,415
 vars:            12                          24 Mar 2020 11:48
------------------------------------------------------------------------
                storage display               value
variable name   type    format                label   variable label
------------------------------------------------------------------------
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
provincestate   str28   %28s                  Province_State
countryregion   str32   %32s                  Country_Region
lastupdate      str19   %19s                  Last_Update
latitude        float   %9.0g                 Lat
longitude       float   %9.0g                 Long_
confirmed       long    %12.0g                Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %12.0g                Recovered
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
The variable names in the new data now match the variable names in the old data. Some variables in the newer data did not appear in the old data. These new variables will be appended to the final dataset but will not contain any data for dates prior to March 23.
The updated code below will import the raw data from the Johns Hopkins GitHub repository as of March 23, 2020. The new commands are the five capture rename lines just before the save command.
local URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
// Import each daily report and save it as a Stata dataset
forvalues month = 1/12 {
    forvalues day = 1/31 {
        local month = string(`month', "%02.0f")
        local day = string(`day', "%02.0f")
        local year = "2020"
        local today = "`month'-`day'-`year'"
        local FileName = "`URL'`today'.csv"
        clear
        // capture lets the loop continue when no file exists for a date
        capture import delimited "`FileName'"
        // In some files the first variable is imported as ïprovincestate
        capture confirm variable ïprovincestate
        if _rc == 0 {
            rename ïprovincestate provincestate
            label variable provincestate "Province/State"
        }
        // New: map the names used since March 23 back to the old names
        capture rename province_state provincestate
        capture rename country_region countryregion
        capture rename last_update lastupdate
        capture rename lat latitude
        capture rename long_ longitude
        capture save "`today'", replace
    }
}
// Append the saved daily datasets into one dataset
clear
forvalues month = 1/12 {
    forvalues day = 1/31 {
        local month = string(`month', "%02.0f")
        local day = string(`day', "%02.0f")
        local year = "2020"
        local today = "`month'-`day'-`year'"
        capture append using "`today'"
    }
}
We can verify that this worked by describing the resulting data.
. describe
Contains data
  obs:        11,341
 vars:            12
------------------------------------------------------------------------
                storage display               value
variable name   type    format                label   variable label
------------------------------------------------------------------------
provincestate   str43   %43s                  Province/State
countryregion   str32   %32s                  Country/Region
lastupdate      str19   %19s                  Last Update
confirmed       long    %8.0g                 Confirmed
deaths          int     %8.0g                 Deaths
recovered       long    %8.0g                 Recovered
latitude        float   %9.0g                 Latitude
longitude       float   %9.0g                 Longitude
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
active          long    %12.0g                Active
combined_key    str44   %44s                  Combined_Key
------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
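If you prefer a check that fails loudly instead of reading the describe output, something like the following sketch could be added to a do-file. It is my own addition and assumes the appended dataset is still in memory. The two lists of names are taken from the describe output shown earlier.

```stata
* Assumes the appended dataset is in memory.
* The old-style names should all exist.
foreach v in provincestate countryregion lastupdate latitude longitude {
    confirm variable `v', exact
}
* The new-style names should be gone. The exact option keeps
* lat from matching latitude through variable abbreviation.
foreach v in province_state country_region last_update lat long_ {
    capture confirm variable `v', exact
    assert _rc != 0
}
```

If the next change to the raw files introduces another renamed variable, the describe output will show it as an extra variable, and its name can be added to the second list once a matching capture rename is in place.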
Let’s save this dataset so we can use it later.
. save covid19_raw
file covid19_raw.dta saved
Please note that we have not checked and cleaned these data. The code above and the resulting data should be used for instructional purposes only.
I’ll show you how to convert the raw data to time-series data in my next post.
