In my last post, we learned how to import the raw COVID-19 data from the Johns Hopkins GitHub repository and convert the raw data to time-series data. This post will demonstrate how to download raw data and create choropleth maps like figure 1.
Figure 1: Confirmed COVID-19 cases in the United States adjusted for population size
A choropleth map displays statistical information about geographical areas using different colors or shading intensity. Figure 1 displays the population-adjusted number of confirmed cases of COVID-19 for each county in the United States as of April 2, 2020. Each shade of blue on the map represents the range of the number of cases shown in the legend at the bottom left of the map. I used the community-contributed command grmap to create figure 1, and we need three pieces of information about each county to create this map: the geographic information for each county, the number of confirmed cases in each county, and the population of each county.
Let's begin at the end and work our way backward to learn how to assemble a dataset that contains this information. The data listed below were used to create the map in figure 1. Each observation contains information about an individual county in the United States.
. list _ID GEOID _CX _CY confirmed popestimate2019 confirmed_adj ///
in 1/5, abbrev(12)
+------------------------------------------------------------------------+
| _ID GEOID _CX _CY confirmed popesti~2019 confirmed_~j |
|------------------------------------------------------------------------|
1. | 1 21007 -89.00 37.06 0 7888 0 |
2. | 2 21017 -84.22 38.21 2 19788 10 |
3. | 3 21031 -86.68 37.21 1 12879 8 |
4. | 4 21065 -83.96 37.69 0 14106 0 |
5. | 5 21069 -83.70 38.37 0 14581 0 |
+------------------------------------------------------------------------+
The first four variables contain geographic information about each county. The variable _ID contains a unique identification number for each county that is used to link with a special file called a "shapefile". Shapefiles contain the information that is used to render the map and will be explained below. The variable GEOID is a Federal Information Processing Standards (FIPS) county code. We can use the FIPS code as a key variable to merge data from other files. The variables _CX and _CY contain geographic coordinates. We can download these data from the United States Census Bureau.
The variable confirmed contains the number of confirmed cases of COVID-19 in each county. These data were downloaded from the Johns Hopkins GitHub repository. This is not the same file that we used in my earlier posts.
The variable popestimate2019 contains the population of each county. These data were downloaded from the United States Census Bureau.
The variable confirmed_adj contains the number of confirmed cases per 100,000 people. This variable is calculated by dividing the number of cases for each county in confirmed by the total population for each county in popestimate2019. The result is multiplied by 100,000 to convert it to "cases per 100,000 population".
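As a quick sanity check on this arithmetic, consider the second observation listed above, Bourbon County, which has 2 confirmed cases and a population of 19,788:

```stata
// Cases per 100,000 = 100,000 * confirmed / population
// For confirmed = 2 and popestimate2019 = 19,788:
display 100000 * 2 / 19788
```

This displays roughly 10.1, which matches the value of 10 shown for confirmed_adj under a whole-number display format.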
We will need to download data from three different sources and merge these files into a single dataset to assemble the dataset for our map. Let's begin by downloading and processing each of these datasets.
List of topics
Download and prepare the geographic data
Download and prepare the case data
Download and prepare the population data
How to merge the files and calculate adjusted counts
How to create the choropleth map with grmap
Download and prepare the geographic data
Let's begin with the geographic data. Shapefiles contain the geographic information that grmap uses to create maps. Many shapefiles are freely available on the Internet, and you can find them using a search engine. For example, I searched for the terms "united states shapefile", and the first result took me to the United States Census Bureau. This website contains shapefiles for the United States that provide boundaries for states, congressional districts, metropolitan and micropolitan statistical areas, and many others. I wish to subdivide my map of the United States by county, so I scrolled down until I found the heading "County". I wish to use the file cb_2018_us_county_500k.zip.
We can copy the file from the website to our local drive and use unzipfile to extract the contents of the file. You can right-click on the file on the webpage, select "Copy Link Location", and paste the path and filename into your copy command.
. copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
> cb_2018_us_county_500k.zip
. unzipfile cb_2018_us_county_500k.zip
inflating: cb_2018_us_county_500k.shp.ea.iso.xml
inflating: cb_2018_us_county_500k.shp.iso.xml
inflating: cb_2018_us_county_500k.shp
inflating: cb_2018_us_county_500k.shx
inflating: cb_2018_us_county_500k.dbf
inflating: cb_2018_us_county_500k.prj
inflating: cb_2018_us_county_500k.cpg
successfully unzipped cb_2018_us_county_500k.zip to current directory
total processed: 7
skipped: 0
extracted: 7
The files cb_2018_us_county_500k.shp and cb_2018_us_county_500k.dbf contain the geographic information we need. We can use spshape2dta to process the information in these files and create two Stata datasets named usacounties_shp.dta and usacounties.dta.
. spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)
  file usacounties_shp.dta created
  file usacounties.dta created
The file usacounties_shp.dta is the shapefile that contains the information that grmap will use to render the map. We don't need to do anything to this file, but let's describe and list its contents to see what it contains.
. use usacounties_shp.dta, clear
. describe
Contains data from usacounties_shp.dta
obs: 1,047,409
vars: 5 3 Apr 2020 10:36
----------------------------------------------------------------------
storage display value
variable name type format label variable label
----------------------------------------------------------------------
_ID int %12.0g
_X double %10.0g
_Y double %10.0g
rec_header strL %9s
shape_order int %12.0g
----------------------------------------------------------------------
Sorted by: _ID
. list _ID _X _Y shape_order in 1/10, abbreviate(11)
+--------------------------------------------+
| _ID _X _Y shape_order |
|--------------------------------------------|
1. | 1 . . 1 |
2. | 1 -89.181369 37.046305 2 |
3. | 1 -89.179384 37.053012 3 |
4. | 1 -89.175725 37.062069 4 |
5. | 1 -89.171881 37.068184 5 |
|--------------------------------------------|
6. | 1 -89.168087 37.074218 6 |
7. | 1 -89.167029 37.075362 7 |
8. | 1 -89.154504 37.088907 8 |
9. | 1 -89.154311 37.089002 9 |
10. | 1 -89.151294 37.090487 10 |
+--------------------------------------------+
The shapefile, usacounties_shp.dta, contains 1,047,409 coordinates that define the boundaries of each county on our map. This file also includes the variable _ID, which is used to link these data with the data in usacounties.dta. We will need this file later.
Next, let's use and describe the contents of usacounties.dta.
. use usacounties.dta, clear
. describe
Contains data from usacounties.dta
obs: 3,233
vars: 12 3 Apr 2020 10:36
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------
_ID int %12.0g Spatial-unit ID
_CX double %10.0g x-coordinate of area centroid
_CY double %10.0g y-coordinate of area centroid
STATEFP str2 %9s STATEFP
COUNTYFP str3 %9s COUNTYFP
COUNTYNS str8 %9s COUNTYNS
AFFGEOID str14 %14s AFFGEOID
GEOID str5 %9s GEOID
NAME str21 %21s NAME
LSAD str2 %9s LSAD
ALAND double %14.0f ALAND
AWATER double %14.0f AWATER
---------------------------------------------------------------------------
Sorted by: _ID
The first three variables contain geographic information about each county. The variable _ID is the spatial-unit identifier for each county that is used to link this file with the shapefile, usacounties_shp.dta. The variables _CX and _CY are the x and y coordinates of the area centroid for each county. The variable NAME contains the name of the county for each observation. The variable GEOID is the FIPS code stored as a string. We will need a numeric FIPS code to merge this dataset with other county-level datasets. So let's generate a variable named fips that equals the numeric value of GEOID.
. generate fips = real(GEOID)
. list _ID GEOID fips NAME in 1/10, separator(0)
+-------------------------------+
| _ID GEOID fips NAME |
|-------------------------------|
1. | 1 21007 21007 Ballard |
2. | 2 21017 21017 Bourbon |
3. | 3 21031 21031 Butler |
4. | 4 21065 21065 Estill |
5. | 5 21069 21069 Fleming |
6. | 6 21093 21093 Hardin |
7. | 7 21099 21099 Hart |
8. | 8 21131 21131 Leslie |
9. | 9 21151 21151 Madison |
10. | 10 21155 21155 Marion |
+-------------------------------+
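As an aside, destring offers an equivalent route to a numeric FIPS code. This is just a sketch to illustrate the alternative; the variable name fips2 is hypothetical and is used only to avoid clobbering the fips variable we just created:

```stata
// destring converts a numeric string variable to a numeric variable;
// no force option is needed here because GEOID contains only digits
destring GEOID, generate(fips2)
assert fips2 == real(GEOID)
drop fips2
```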
Let's save our geographic data and move on to the COVID-19 data.
. save usacounties.dta, replace
file usacounties.dta saved
Download and prepare the case data
In my earlier posts, we learned how to download the raw COVID-19 data from the Johns Hopkins GitHub repository. We will use a different dataset from another branch of the GitHub repository located here.
The file time_series_covid19_confirmed_US.csv contains time-series data for the number of confirmed cases of COVID-19 for each county in the United States. Let's click on the filename to view its contents.
Each observation in this file contains the data for a county or territory in the United States. The confirmed counts for each date are stored in separate variables. Let's view the comma-delimited data by clicking on the button labeled "Raw" next to the red arrow.
To import the raw case data, copy the URL and the filename from the address bar in your web browser, and paste the web address into import delimited.
. import delimited
> https://raw.githubusercontent.com/CSSEGISandData/COVID-19/
> master/csse_covid_19_data/csse_covid_19_time_series/
> time_series_covid19_confirmed_US.csv
(83 vars, 3,253 obs)
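As an aside, if you find such a long URL awkward to type, one option (a sketch, not part of the original workflow) is to store it in a local macro and pass the macro to import delimited:

```stata
// Store the long URL in a local macro so the import command stays short
local url "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv"
import delimited "`url'", clear
```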
Note that the URL for the data file is long and wraps to a second and third line in our import delimited command. This URL must be typed on one line in your import delimited command. Let's describe this dataset.
. describe
Contains data
obs: 3,253
vars: 83
----------------------------------------------------------------------
storage display value
variable name type format label variable label
----------------------------------------------------------------------
uid long %12.0g UID
iso2 str2 %9s
iso3 str3 %9s
code3 int %8.0g
fips long %12.0g FIPS
admin2 str21 %21s Admin2
province_state str24 %24s Province_State
country_region str2 %9s Country_Region
lat float %9.0g Lat
long_ float %9.0g Long_
combined_key str44 %44s Combined_Key
v12 byte %8.0g 1/22/20
v13 byte %8.0g 1/23/20
v14 byte %8.0g 1/24/20
v15 byte %8.0g 1/25/20
(Output omitted)
v80 long %12.0g 3/30/20
v81 long %12.0g 3/31/20
v82 lengthy %12.0g 4/1/20
v83 lengthy %12.0g 4/2/20
----------------------------------------------------------------------
Sorted by:
This dataset contains 3,253 observations with information about counties in the United States. Let's list the first 10 observations for fips, combined_key, v80, v81, v82, and v83.
. list fips combined_key v80 v81 v82 v83 in 1/10
+-------------------------------------------------------------+
| fips combined_key v80 v81 v82 v83 |
|-------------------------------------------------------------|
1. | 60 American Samoa, US 0 0 0 0 |
2. | 66 Guam, US 58 69 77 82 |
3. | 69 Northern Mariana Islands, US 0 2 6 6 |
4. | 72 Puerto Rico, US 174 239 286 316 |
5. | 78 Virgin Islands, US 0 30 30 30 |
|-------------------------------------------------------------|
6. | 1001 Autauga, Alabama, US 6 7 8 10 |
7. | 1003 Baldwin, Alabama, US 18 19 20 24 |
8. | 1005 Barbour, Alabama, US 0 0 0 0 |
9. | 1007 Bibb, Alabama, US 2 3 3 4 |
10. | 1009 Blount, Alabama, US 5 5 5 6 |
+-------------------------------------------------------------+
The variable fips contains the FIPS county code that we will use to merge this dataset with the geographic information in usacounties.dta. The variable combined_key contains the name of each county and state in the United States. And the variables v80, v81, v82, and v83 contain the number of confirmed cases of COVID-19 from March 30, 2020, through April 2, 2020. The most recent case data are stored in v83, so let's change its name to confirmed.
. rename v83 confirmed
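Because Johns Hopkins appends a new column each day, the most recent counts will not stay in v83 if you download the file later. Here is a hedged sketch that instead renames whichever variable comes last in the dataset, assuming the date columns remain the final columns:

```stata
// ds with no arguments stores all variable names, in dataset order, in r(varlist);
// grab the last name in that list and rename it to confirmed
quietly ds
local last : word `: word count `r(varlist)'' of `r(varlist)'
rename `last' confirmed
```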
We will encounter problems later when we merge datasets if fips contains missing values. So let's drop any observations that are missing data for fips.
. drop if missing(fips)
(2 observations deleted)
Our COVID-19 dataset is complete. Let's save the data and move on to the population data.
. save covid19_county, replace
file covid19_county.dta saved
Download and prepare the population data
We could stop here, merge the geographic data with the number of confirmed cases, and create a choropleth map of the number of cases for each county. But this could be misleading because the populations of the counties differ. We would prefer to report the number of cases per 100,000 people, and that will require knowing the number of people in each county. Fortunately, these data are available on the United States Census Bureau website.
We can follow the same steps we used to download and import the case data. First, right-click on the filename on the website, then select "Copy Link Location", and use import delimited to import the data.
. import delimited https://www2.census.gov/programs-surveys/popest/datasets/
> 2010-2019/counties/totals/co-est2019-alldata.csv
(164 vars, 3,193 obs)
I would typically describe the dataset at this point, but there are 164 variables in this dataset. So I will describe only the relevant variables below.
. describe state county stname ctyname census2010pop popestimate2019
storage display value
variable name type format label variable label
----------------------------------------------------------------------
state byte %8.0g STATE
county int %8.0g COUNTY
stname str20 %20s STNAME
ctyname str33 %33s CTYNAME
census2010pop long %12.0g CENSUS2010POP
popestimate2019 long %12.0g POPESTIMATE2019
The variable census2010pop contains the population of each county based on the 2010 census. But that information is 10 years old. The variable popestimate2019 is an estimate of the population of each county in 2019. Let's use these data because they are more recent.
Next, let's list the data.
. list state county stname ctyname census2010pop popestimate2019 ///
in 1/5, abbreviate(14) noobs
+----------------------------------------------------------------------------+
| state county stname ctyname census2010pop popestima~2019 |
|----------------------------------------------------------------------------|
| 1 0 Alabama Alabama 4779736 4903185 |
| 1 1 Alabama Autauga County 54571 55869 |
| 1 3 Alabama Baldwin County 182265 223234 |
| 1 5 Alabama Barbour County 27457 24686 |
| 1 7 Alabama Bibb County 22915 22394 |
+----------------------------------------------------------------------------+
This dataset does not include a variable with a FIPS county code. But we can create a variable that contains the FIPS code using the variables state and county. Visual inspection of the geographic data in usacounties.dta indicates that the FIPS county code is the state code followed by the three-digit county code. So let's create our fips variable by multiplying the state code by 1,000 and then adding the county code.
. generate fips = state*1000 + county
Let's list the county data to check our work.
. list state county fips stname ctyname ///
in 1/5, abbreviate(14) noobs
+--------------------------------------------------+
| state county fips stname ctyname |
|--------------------------------------------------|
| 1 0 1000 Alabama Alabama |
| 1 1 1001 Alabama Autauga County |
| 1 3 1003 Alabama Baldwin County |
| 1 5 1005 Alabama Barbour County |
| 1 7 1007 Alabama Bibb County |
+--------------------------------------------------+
This dataset contains the estimated population of each county along with the variable fips, which we will use as a key variable to merge this data file with the other data files. Let's save our data.
. save census_popn, replace
file census_popn.dta saved
How to merge the files and calculate adjusted counts
We have created three data files that contain the information we need to create our choropleth map. The data file usacounties.dta contains the geographic information we need in the variables _ID, _CX, and _CY. Recall that these data are linked to the shapefile, usacounties_shp.dta, using the variable _ID. The data file covid19_county.dta contains the information about the number of confirmed cases of COVID-19 in the variable confirmed. And the data file census_popn.dta contains the information about the population of each county in the variable popestimate2019.
We need all of these variables in the same dataset to create our map. We can merge these files using the key variable fips.
Let's begin by using only the variables we need from usacounties.dta.
. use _ID _CX _CY GEOID fips using usacounties.dta
Next, let's merge the number of confirmed cases from covid19_county.dta. The option keepusing(province_state combined_key confirmed) specifies that we will merge only the variables province_state, combined_key, and confirmed from the data file covid19_county.dta.
. merge 1:1 fips using covid19_county ///
, keepusing(province_state combined_key confirmed)
(note: variable fips was float, now double to accommodate using data's values)
Result # of obs.
-----------------------------------------
not matched 200
from master 91 (_merge==1)
from using 109 (_merge==2)
matched 3,142 (_merge==3)
-----------------------------------------
The output tells us that 3,142 observations had matching values of fips in the two datasets. merge also created a new variable in our dataset named _merge, which equals 3 for observations with matching values of fips.
The output also tells us that 91 observations had a fips code in the geographic data but not in the case data. _merge equals 1 for these observations. Let's list some of these observations.
. list _ID GEOID fips combined_key confirmed ///
if _merge==1, abbreviate(15)
+-------------------------------------------------+
| _ID GEOID fips combined_key confirmed |
|-------------------------------------------------|
3143. | 502 60010 60010 . |
3144. | 503 60020 60020 . |
3145. | 1475 60030 60030 . |
3146. | 1476 60040 60040 . |
3147. | 1068 60050 60050 . |
|-------------------------------------------------|
3148. | 1210 66010 66010 . |
3149. | 504 69085 69085 . |
(Output omitted)
The first seven observations have geographic information but no data for confirmed. Let's count the number of observations where _merge equals 1 and the data are missing for confirmed.
. count if _merge==1 & missing(confirmed)
  91
Let's drop these observations from our dataset because they contain no data for confirmed.
. drop if _merge==1
(91 observations deleted)
The merge output also tells us that 109 observations had a fips code in the case data but not in the geographic data. _merge equals 2 for these observations. Let's list some of these observations.
. list _ID GEOID fips combined_key confirmed ///
if _merge==2, abbreviate(15)
+------------------------------------------------------------------------+
| _ID GEOID fips combined_key confirmed |
|------------------------------------------------------------------------|
3143. | . 60 American Samoa, US 0 |
3144. | . 66 Guam, US 82 |
3145. | . 69 Northern Mariana Islands, US 6 |
3146. | . 72 Puerto Rico, US 316 |
3147. | . 78 Virgin Islands, US 30 |
|------------------------------------------------------------------------|
3148. | . 80001 Out of AL, Alabama, US 0 |
3149. | . 80002 Out of AK, Alaska, US 0 |
The first seven observations have case information but no data for _ID or GEOID.
Some of the observations are from American territories that are not states. Other observations have a value of combined_key suggesting that the county information is unclear. Visual inspection of these observations suggests that most of the confirmed counts for these observations are zero. We can verify this by counting the number of observations where _merge equals 2 and confirmed equals zero.
. count if _merge==2 & confirmed==0
  78
The output indicates that 78 of the 109 observations contain no confirmed cases. We could investigate these observations further if we were using our results to make policy decisions. But these observations are a small proportion of our dataset, and our present goal is only to learn how to make choropleth maps. So let's delete these observations and the variable _merge and move on.
. drop if _merge==2
(109 observations deleted)

. drop _merge
Let's describe the dataset to verify that the merge was successful.
. describe
Contains data from usacounties.dta
obs: 3,142
vars: 8 3 Apr 2020 15:55
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------
_ID int %12.0g Spatial-unit ID
_CX double %10.0g x-coordinate of space centroid
_CY double %10.0g y-coordinate of space centroid
GEOID str5 %9s GEOID
fips double %9.0g
province_state str24 %24s Province_State
combined_key str44 %44s Combined_Key
confirmed long %12.0g 4/2/20
---------------------------------------------------------------------------
Sorted by:
Next, let's merge the variable popestimate2019 from the data file census_popn.dta.
. merge 1:1 fips using census_popn
Result # of obs.
-----------------------------------------
not matched 51
from master 0 (_merge==1)
from using 51 (_merge==2)
matched 3,142 (_merge==3)
-----------------------------------------
The output tells us that 3,142 observations had matching values of fips in the two datasets. merge again created a new variable in our dataset named _merge, which equals 3 for observations with matching values of fips.
The output also tells us that 51 observations had a fips code in the population data but not in our merged data. _merge equals 2 for these observations. Let's list some of these observations.
. list _ID GEOID fips combined_key confirmed popestimate2019 ///
if _merge==2, abbreviate(15)
+------------------------------------------------------------------+
| _ID GEOID fips combined_key confirmed popestimate2019 |
|------------------------------------------------------------------|
3143. | . 1000 . 4903185 |
3144. | . 2000 . 731545 |
3145. | . 4000 . 7278717 |
3146. | . 5000 . 3017804 |
3147. | . 6000 . 39512223 |
|------------------------------------------------------------------|
3148. | . 8000 . 5758736 |
3149. | . 9000 . 3565287 |
We have no geographic or case information for these observations, so let's drop them from our dataset and also drop the variable _merge.
. keep if _merge==3
(51 observations deleted)

. drop _merge
Now, let's generate, label, and format a new variable named confirmed_adj that contains the population-adjusted number of confirmed COVID-19 cases.
. generate confirmed_adj = 100000*(confirmed/popestimate2019)

. label var confirmed_adj "Cases per 100,000"

. format %16.0fc confirmed_adj
Let's describe our dataset to verify that it contains all the variables we will need to create our map.
. describe
Contains data from usacounties.dta
obs: 3,142
vars: 10 3 Apr 2020 14:15
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------
_ID int %12.0g Spatial-unit ID
_CX double %10.0g x-coordinate of space centroid
_CY double %10.0g y-coordinate of space centroid
GEOID str5 %9s GEOID
fips double %9.0g
combined_key str44 %44s Combined_Key
confirmed long %12.0g 4/2/20
census2010pop long %12.0g CENSUS2010POP
popestimate2019 long %12.0g POPESTIMATE2019
confirmed_adj float %16.0fc Cases per 100,000
---------------------------------------------------------------------------
Sorted by:
Our dataset is complete! Let's save the dataset and learn how to create a choropleth map.
. save covid19_adj
file covid19_adj.dta saved
How to create the choropleth map with grmap
We will use the community-contributed command grmap to create our choropleth map. You must activate grmap before you use it for the first time.
. grmap, activate
Creating a map that includes Alaska and Hawaii would require options that adjust for their large difference in size and for their not being physically adjacent to the other 48 states. I wish to keep our map as simple as possible for now, so I am going to drop the observations for Alaska and Hawaii.
. drop if inlist(province_state, "Alaska", "Hawaii")
(2 observations deleted)
Next, we must tell Stata that these are spatial data by using spset. The option modify shpfile(usacounties_shp) will link our data with the shapefile, usacounties_shp.dta. Recall that the shapefile contains the information that grmap will use to render the map.
. spset, modify shpfile(usacounties_shp)
(creating _ID spatial-unit id)
(creating _CX coordinate)
(creating _CY coordinate)
Sp dataset covid19_adj.dta
data: cross sectional
spatial-unit id: _ID
coordinates: _CX, _CY (planar)
linked shapefile: usacounties_shp.dta
Now, we can use grmap to create a choropleth map of the population-adjusted number of confirmed cases of COVID-19.
. grmap confirmed_adj, clnumber(7)
Figure 2: Choropleth map using septiles
By default, grmap divides the data into four groups based on quartiles of confirmed_adj. I used the option clnumber(7) to divide the data into 7 groups, or septiles. You can change the number of groups using the clnumber(#) option, where # is the number of color categories.
You can also specify custom cutpoints using the options clmethod(custom) and clbreaks(numlist). The map below uses custom cutpoints at 0, 5, 10, 15, 20, 25, 50, 100, and 5,000. I have also added a title and a subtitle.
. grmap confirmed_adj, ///
clnumber(8) ///
clmethod(custom) ///
clbreaks(0 5 10 15 20 25 50 100 5000) ///
title("Confirmed Cases of COVID-19 in the United States") ///
subtitle("cases per 100,000 population")
Figure 3: Choropleth map using custom cutpoints
Conclusion and collected code
We did it! We created a choropleth map of the population-adjusted number of confirmed COVID-19 cases in each county of the United States! Let's review the basic steps. First, we downloaded the geographic data from the United States Census Bureau and converted them to Stata data files using spshape2dta. Second, we downloaded, imported, and processed the COVID-19 data from the Johns Hopkins GitHub repository and saved the data to a Stata data file. Third, we downloaded, imported, and processed the population data for each county from the United States Census Bureau and saved the data to a Stata data file. Fourth, we merged the Stata data files and calculated the population-adjusted number of COVID-19 cases for each county. And fifth, we used spset to tell Stata that these are spatial data, and we used grmap to create our choropleth map. You could follow these steps to create a choropleth map for many kinds of data, for other subdivisions of the United States, or for other countries.
I have collected the code below to reproduce figures 2 and 3.
// Create the geographic datasets
clear
copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
cb_2018_us_county_500k.zip
unzipfile cb_2018_us_county_500k.zip
spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
use usacounties.dta, clear
generate fips = real(GEOID)
save usacounties.dta, replace
// Create the COVID-19 case dataset
clear
import delimited https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
rename v83 confirmed
drop if missing(fips)
save covid19_county, replace
// Create the population dataset
clear
import delimited https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
generate fips = state*1000 + county
save census_popn, replace
// Merge the datasets
clear
use _ID _CX _CY GEOID fips using usacounties.dta
merge 1:1 fips using covid19_county ///
, keepusing(province_state combined_key confirmed)
keep if _merge==3
drop _merge
merge 1:1 fips using census_popn ///
, keepusing(census2010pop popestimate2019)
keep if _merge==3
drop _merge
drop if inlist(province_state, "Alaska", "Hawaii")
generate confirmed_adj = 100000*(confirmed/popestimate2019)
label var confirmed_adj "Cases per 100,000"
format %16.0fc confirmed_adj
format %16.0fc confirmed popestimate2019
save covid19_adj, replace
// Create the maps
grmap, activate
spset, modify shpfile(usacounties_shp)
grmap confirmed_adj, clnumber(7)
grmap confirmed_adj, ///
clnumber(8) ///
clmethod(custom) ///
clbreaks(0 5 10 15 20 25 50 100 5000) ///
title("Confirmed Cases of COVID-19 in the United States") ///
subtitle("cases per 100,000 population")