Saturday, November 29, 2025

How to create choropleth maps using the COVID-19 data from Johns Hopkins University


In my last post, we learned how to import the raw COVID-19 data from the Johns Hopkins GitHub repository and convert the raw data to time-series data. This post will demonstrate how to download raw data and create choropleth maps like figure 1.

Figure 1: Confirmed COVID-19 cases in the United States adjusted for population size

A choropleth map displays statistical information about geographical areas using different colors or shading intensity. Figure 1 displays the population-adjusted number of confirmed cases of COVID-19 for each county in the United States as of April 2, 2020. Each shade of blue on the map represents the range of the number of cases shown in the legend at the bottom left of the map. I used the community-contributed command grmap to create figure 1, and we need three pieces of information about each county to create this map: the geographic information for each county, the number of confirmed cases in each county, and the population of each county.

Let's begin at the end and work our way backward to learn how to assemble a dataset that contains this information. The data listed below were used to create the map in figure 1. Each observation contains information about an individual county in the United States.

. list _ID GEOID _CX _CY confirmed popestimate2019 confirmed_adj ///
      in 1/5, abbrev(12)

     +------------------------------------------------------------------------+
     | _ID   GEOID      _CX     _CY   confirmed   popesti~2019   confirmed_~j |
     |------------------------------------------------------------------------|
  1. |   1   21007   -89.00   37.06           0           7888              0 |
  2. |   2   21017   -84.22   38.21           2          19788             10 |
  3. |   3   21031   -86.68   37.21           1          12879              8 |
  4. |   4   21065   -83.96   37.69           0          14106              0 |
  5. |   5   21069   -83.70   38.37           0          14581              0 |
     +------------------------------------------------------------------------+

The first four variables contain geographic information about each county. The variable _ID contains a unique identification number for each county that is used to link with a special file called a "shapefile". Shapefiles contain the information that is used to render the map and will be explained below. The variable GEOID is a Federal Information Processing Standards (FIPS) county code. We can use the FIPS code as a key variable to merge data from other files. The variables _CX and _CY contain geographic coordinates. We can download these data from the United States Census Bureau.

The variable confirmed contains the number of confirmed cases of COVID-19 in each county. These data were downloaded from the Johns Hopkins GitHub repository. This is not the same file that we used in my earlier posts.

The variable popestimate2019 contains the population of each county. These data were downloaded from the United States Census Bureau.

The variable confirmed_adj contains the number of confirmed cases per 100,000 population. This variable is calculated by dividing the number of cases for each county in confirmed by the total population for each county in popestimate2019. The result is multiplied by 100,000 to convert to "cases per 100,000 population".
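The adjustment is simple arithmetic; as a quick sanity check, here is the same calculation sketched in Python (the function name is mine, not from the post), applied to the Bourbon County row listed above (confirmed = 2, popestimate2019 = 19,788):

```python
# Sketch of the confirmed_adj calculation: cases per 100,000 population.
def cases_per_100k(confirmed: int, population: int) -> float:
    """Population-adjusted case count: 100000 * (confirmed / population)."""
    return 100000 * (confirmed / population)

adj = cases_per_100k(2, 19788)
print(round(adj, 2))  # 10.11, which Stata's %16.0fc format displays as 10
```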

We will need to download data from three different sources and merge those files into a single dataset to construct the dataset for our map. Let's begin by downloading and processing each of these datasets.

List of topics

Download and prepare the geographic data

Download and prepare the case data

Download and prepare the population data

How to merge the files and calculate adjusted counts

How to create the choropleth map with grmap

Back to table of contents

Download and prepare the geographic data

Let's begin with the geographic data. Shapefiles contain the geographic information that grmap uses to create maps. Many shapefiles are freely available on the Internet, and you can find them using a search engine. For example, I searched for the terms "usa shapefile", and the first result took me to the United States Census Bureau. This website contains shapefiles for the United States that provide boundaries for states, congressional districts, metropolitan and micropolitan statistical areas, and many others. I wish to subdivide my map of the United States by county, so I scrolled down until I found the heading "County". I will use the file cb_2018_us_county_500k.zip.

graph1

We can copy the file from the website to our local drive and use unzipfile to extract the contents of the file. You can right-click on the file on the webpage, select "Copy Link Location", and paste the path and filename into your copy command.

. copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
>      cb_2018_us_county_500k.zip

. unzipfile cb_2018_us_county_500k.zip
    inflating: cb_2018_us_county_500k.shp.ea.iso.xml
    inflating: cb_2018_us_county_500k.shp.iso.xml
    inflating: cb_2018_us_county_500k.shp
    inflating: cb_2018_us_county_500k.shx
    inflating: cb_2018_us_county_500k.dbf
    inflating: cb_2018_us_county_500k.prj
    inflating: cb_2018_us_county_500k.cpg

successfully unzipped cb_2018_us_county_500k.zip to current directory
total processed:  7
        skipped:  0
      extracted:  7

The files cb_2018_us_county_500k.shp and cb_2018_us_county_500k.dbf contain the geographic information we need. We can use spshape2dta to process the information in these files and create two Stata datasets named usacounties_shp.dta and usacounties.dta.

. spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file usacounties_shp.dta created
  file usacounties.dta     created

The file usacounties_shp.dta is the shapefile that contains the information that grmap will use to render the map. We don't need to do anything to this file, but let's describe and list its contents to see what it contains.

. use usacounties_shp.dta, clear

. describe

Contains data from usacounties_shp.dta
  obs:     1,047,409
 vars:             5                          3 Apr 2020 10:36
----------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------
_ID             int     %12.0g
_X              double  %10.0g
_Y              double  %10.0g
rec_header      strL    %9s
shape_order     int     %12.0g
----------------------------------------------------------------------
Sorted by: _ID

. list _ID _X _Y shape_order in 1/10, abbreviate(11)

     +--------------------------------------------+
     | _ID           _X          _Y   shape_order |
     |--------------------------------------------|
  1. |   1            .           .             1 |
  2. |   1   -89.181369   37.046305             2 |
  3. |   1   -89.179384   37.053012             3 |
  4. |   1   -89.175725   37.062069             4 |
  5. |   1   -89.171881   37.068184             5 |
     |--------------------------------------------|
  6. |   1   -89.168087   37.074218             6 |
  7. |   1   -89.167029   37.075362             7 |
  8. |   1   -89.154504   37.088907             8 |
  9. |   1   -89.154311   37.089002             9 |
 10. |   1   -89.151294   37.090487            10 |
     +--------------------------------------------+

The shapefile, usacounties_shp.dta, contains 1,047,409 coordinates that define the boundaries of each county on our map. This file also includes the variable _ID, which is used to link these data with the data in usacounties.dta. We will need this file later.

Next, let's use and describe the contents of usacounties.dta.

. use usacounties.dta, clear

. describe

Contains data from usacounties.dta
  obs:         3,233
 vars:            12                          3 Apr 2020 10:36
---------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
STATEFP         str2    %9s                   STATEFP
COUNTYFP        str3    %9s                   COUNTYFP
COUNTYNS        str8    %9s                   COUNTYNS
AFFGEOID        str14   %14s                  AFFGEOID
GEOID           str5    %9s                   GEOID
NAME            str21   %21s                  NAME
LSAD            str2    %9s                   LSAD
ALAND           double  %14.0f                ALAND
AWATER          double  %14.0f                AWATER
---------------------------------------------------------------------------
Sorted by: _ID

The first three variables contain geographic information about each county. The variable _ID is the spatial-unit identifier for each county that is used to link this file with the shapefile, usacounties_shp.dta. The variables _CX and _CY are the x and y coordinates of the area centroid for each county. The variable NAME contains the name of the county for each observation. The variable GEOID is the FIPS code stored as a string. We will need a numeric FIPS code to merge this dataset with other county-level datasets. So let's generate a variable named fips that equals the numeric value of GEOID.

. generate fips = real(GEOID)

. list _ID GEOID fips NAME in 1/10, separator(0)

     +-------------------------------+
     | _ID   GEOID    fips      NAME |
     |-------------------------------|
  1. |   1   21007   21007   Ballard |
  2. |   2   21017   21017   Bourbon |
  3. |   3   21031   21031    Butler |
  4. |   4   21065   21065    Estill |
  5. |   5   21069   21069   Fleming |
  6. |   6   21093   21093    Hardin |
  7. |   7   21099   21099      Hart |
  8. |   8   21131   21131    Leslie |
  9. |   9   21151   21151   Madison |
 10. |  10   21155   21155    Marion |
     +-------------------------------+
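Because real() converts the string to a number, any leading zeros in GEOID disappear, which is what makes the numeric key comparable across files. A quick Python sketch of the same conversion (the function name is illustrative, not from the post):

```python
# Sketch of converting a zero-padded FIPS string (like GEOID) to a number,
# as Stata's real() does: "01001" (Autauga County, Alabama) becomes 1001.
def fips_to_int(geoid: str) -> int:
    return int(geoid)

print(fips_to_int("21007"), fips_to_int("01001"))  # 21007 1001
```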

Let's save our geographic data and move on to the COVID-19 data.

. save usacounties.dta
file usacounties.dta saved

Back to table of contents

Download and prepare the case data

In my earlier posts, we learned how to download the raw COVID-19 data from the Johns Hopkins GitHub repository. We will use a different dataset from another branch of the GitHub repository located here.

graph1

The file time_series_covid19_confirmed_US.csv contains time-series data for the number of confirmed cases of COVID-19 for each county in the United States. Let's click on the filename to view its contents.

graph1

Each observation in this file contains the data for a county or territory in the United States. The confirmed counts for each date are stored in separate variables. Let's view the comma-delimited data by clicking on the button labeled "Raw" next to the red arrow.

graph1

To import the raw case data, copy the URL and the filename from the address bar in your web browser, and paste the web address into import delimited.

. import delimited 
> https://raw.githubusercontent.com/CSSEGISandData/COVID-19/
> master/csse_covid_19_data/csse_covid_19_time_series/
> time_series_covid19_confirmed_US.csv
(83 vars, 3,253 obs)

Note that the URL for the data file is long and wraps to a second and third line in our import delimited command. This URL must be on a single line in your import delimited command. Let's describe this dataset.

. describe

Contains data
  obs:         3,253
 vars:            83
----------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------
uid             long    %12.0g                UID
iso2            str2    %9s
iso3            str3    %9s
code3           int     %8.0g
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str24   %24s                  Province_State
country_region  str2    %9s                   Country_Region
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
combined_key    str44   %44s                  Combined_Key
v12             byte    %8.0g                 1/22/20
v13             byte    %8.0g                 1/23/20
v14             byte    %8.0g                 1/24/20
v15             byte    %8.0g                 1/25/20

               (Output omitted)

v80             long    %12.0g                3/30/20
v81             long    %12.0g                3/31/20
v82             long    %12.0g                4/1/20
v83             long    %12.0g                4/2/20
----------------------------------------------------------------------
Sorted by:

This dataset contains 3,253 observations with information about counties in the United States. Let's list the first 10 observations for fips, combined_key, v80, v81, v82, and v83.

. list fips combined_key v80 v81 v82 v83 in 1/10

     +-------------------------------------------------------------+
     | fips                   combined_key   v80   v81   v82   v83 |
     |-------------------------------------------------------------|
  1. |   60             American Samoa, US     0     0     0     0 |
  2. |   66                       Guam, US    58    69    77    82 |
  3. |   69   Northern Mariana Islands, US     0     2     6     6 |
  4. |   72                Puerto Rico, US   174   239   286   316 |
  5. |   78             Virgin Islands, US     0    30    30    30 |
     |-------------------------------------------------------------|
  6. | 1001           Autauga, Alabama, US     6     7     8    10 |
  7. | 1003           Baldwin, Alabama, US    18    19    20    24 |
  8. | 1005           Barbour, Alabama, US     0     0     0     0 |
  9. | 1007              Bibb, Alabama, US     2     3     3     4 |
 10. | 1009            Blount, Alabama, US     5     5     5     6 |
     +-------------------------------------------------------------+

The variable fips contains the FIPS county code that we will use to merge this dataset with the geographic information in usacounties.dta. The variable combined_key contains the name of each county and state in the United States. And the variables v80, v81, v82, and v83 contain the number of confirmed cases of COVID-19 from March 30, 2020, through April 2, 2020. The most recent case data are stored in v83, so let's rename it confirmed.

. rename v83 confirmed

We will encounter problems later when we merge datasets if fips contains missing values. So let's drop any observations that are missing data for fips.

. drop if missing(fips)
(2 observations deleted)

Our COVID-19 dataset is complete. Let's save the data and move on to the population data.

. save covid19_county, replace
file covid19_county.dta saved

Back to table of contents

Download and prepare the population data

We could stop here, merge the geographic data with the number of confirmed cases, and create a choropleth map of the number of cases for each county. But that could be misleading because the populations of the counties differ. We would prefer to report the number of cases per 100,000 people, and that will require knowing the number of people in each county. Fortunately, these data are available on the United States Census Bureau website.

graph1

We can follow the same steps we used to download and import the case data. First, right-click on the filename on the website, then select "Copy Link Location", and use import delimited to import the data.

. import delimited https://www2.census.gov/programs-surveys/popest/datasets/
> 2010-2019/counties/totals/co-est2019-alldata.csv
(164 vars, 3,193 obs)

I would typically describe the dataset at this point, but there are 164 variables in this dataset. So I will describe only the relevant variables below.

. describe state county stname ctyname census2010pop popestimate2019

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------
state           byte    %8.0g                 STATE
county          int     %8.0g                 COUNTY
stname          str20   %20s                  STNAME
ctyname         str33   %33s                  CTYNAME
census2010pop   long    %12.0g                CENSUS2010POP
popestimate2019 long    %12.0g                POPESTIMATE2019

The variable census2010pop contains the population of each county based on the 2010 census. But that information is 10 years old. The variable popestimate2019 is an estimate of the population of each county in 2019. Let's use these data because they are more recent.

Next, let's list the data.

. list state county stname ctyname census2010pop popestimate2019 ///
      in 1/5, abbreviate(14) noobs

  +----------------------------------------------------------------------------+
  | state   county    stname          ctyname   census2010pop   popestima~2019 |
  |----------------------------------------------------------------------------|
  |     1        0   Alabama          Alabama         4779736          4903185 |
  |     1        1   Alabama   Autauga County           54571            55869 |
  |     1        3   Alabama   Baldwin County          182265           223234 |
  |     1        5   Alabama   Barbour County           27457            24686 |
  |     1        7   Alabama      Bibb County           22915            22394 |
  +----------------------------------------------------------------------------+

This dataset does not include a variable with a FIPS county code. But we can create a variable that contains the FIPS code using the variables state and county. Visual inspection of the geographic data in usacounties.dta indicates that the FIPS county code is the state code followed by the three-digit county code. So let's create our FIPS code variable by multiplying the state code by 1,000 and then adding the county code.

. generate fips = state*1000 + county

Let's list the county data to check our work.

. list state county fips stname ctyname ///
      in 1/5, abbreviate(14) noobs

  +--------------------------------------------------+
  | state   county   fips    stname          ctyname |
  |--------------------------------------------------|
  |     1        0   1000   Alabama          Alabama |
  |     1        1   1001   Alabama   Autauga County |
  |     1        3   1003   Alabama   Baldwin County |
  |     1        5   1005   Alabama   Barbour County |
  |     1        7   1007   Alabama      Bibb County |
  +--------------------------------------------------+
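The construction generalizes: a numeric FIPS county code is always 1,000 times the state code plus the county code. A minimal Python sketch of that rule (the function name is mine, not from the post):

```python
# Sketch of building a numeric FIPS county code from its two parts:
# fips = state*1000 + county, as in the generate command above.
def make_fips(state: int, county: int) -> int:
    return state * 1000 + county

# Autauga County, Alabama (1, 1) and Ballard County, Kentucky (21, 7)
print(make_fips(1, 1), make_fips(21, 7))  # 1001 21007
```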

This dataset contains the estimated population of each county along with the variable fips that we will use as a key variable to merge this data file with the other data files. Let's save our data.

. save census_popn, replace
file census_popn.dta saved

Back to table of contents

How to merge the files and calculate adjusted counts

We have created three data files that contain the information we need to create our choropleth map. The data file usacounties.dta contains the geographic information we need in the variables _ID, _CX, and _CY. Recall that these data are linked to the shapefile, usacounties_shp.dta, using the variable _ID. The data file covid19_county.dta contains the information about the number of confirmed cases of COVID-19 in the variable confirmed. And the data file census_popn.dta contains the information about the population of each county in the variable popestimate2019.

We need all of these variables in the same dataset to create our map. We can merge these files using the key variable fips.

Let's begin by using only the variables we need from usacounties.dta.

. use _ID _CX _CY GEOID fips using usacounties.dta

Next, let's merge the number of confirmed cases from covid19_county.dta. The option keepusing(province_state combined_key confirmed) specifies that we will merge only the variables province_state, combined_key, and confirmed from the data file covid19_county.dta.

. merge 1:1 fips using covid19_county  ///
       , keepusing(province_state combined_key confirmed)
(note: variable fips was float, now double to accommodate using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                           200
        from master                        91  (_merge==1)
        from using                        109  (_merge==2)

    matched                             3,142  (_merge==3)
    -----------------------------------------

The output tells us that 3,142 observations had matching values of fips in the two datasets. merge also created a new variable in our dataset named _merge, which equals 3 for observations with matching values of fips.
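The three _merge categories are just set relations on the key variable. Here is a small Python sketch of the same bookkeeping, using made-up keys rather than the real data:

```python
# Sketch of merge 1:1 bookkeeping: classify keys as master-only (_merge==1),
# using-only (_merge==2), or matched in both files (_merge==3).
def classify_merge(master_keys, using_keys):
    master, using = set(master_keys), set(using_keys)
    return {
        1: sorted(master - using),  # key only in the master data
        2: sorted(using - master),  # key only in the using data
        3: sorted(master & using),  # key matched in both
    }

result = classify_merge([1001, 1003, 60010], [1001, 1003, 60])
print({k: len(v) for k, v in result.items()})  # {1: 1, 2: 1, 3: 2}
```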

The output also tells us that 91 observations had a fips code in the geographic data but not in the case data. _merge equals 1 for these observations. Let's list some of these observations.

. list _ID GEOID fips combined_key confirmed ///
      if _merge==1, abbreviate(15)

      +-------------------------------------------------+
      |  _ID   GEOID    fips   combined_key   confirmed |
      |-------------------------------------------------|
3143. |  502   60010   60010                          . |
3144. |  503   60020   60020                          . |
3145. | 1475   60030   60030                          . |
3146. | 1476   60040   60040                          . |
3147. | 1068   60050   60050                          . |
      |-------------------------------------------------|
3148. | 1210   66010   66010                          . |
3149. |  504   69085   69085                          . |
        (Output omitted)

The first seven observations have geographic information but no data for confirmed. Let's count the number of observations where _merge equals 1 and the data are missing for confirmed.

. count if _merge==1 & missing(confirmed)
  91

Let's drop these observations from our dataset because they contain no data for confirmed.

. drop if _merge==1
(91 observations deleted)

The merge output also tells us that 109 observations had a fips code in the case data but not in the geographic data. _merge equals 2 for these observations. Let's list some of these observations.

. list _ID GEOID fips combined_key confirmed ///
      if _merge==2, abbreviate(15)

      +------------------------------------------------------------------------+
      | _ID   GEOID    fips                           combined_key   confirmed |
      |------------------------------------------------------------------------|
3143. |   .              60                     American Samoa, US           0 |
3144. |   .              66                               Guam, US          82 |
3145. |   .              69           Northern Mariana Islands, US           6 |
3146. |   .              72                        Puerto Rico, US         316 |
3147. |   .              78                     Virgin Islands, US          30 |
      |------------------------------------------------------------------------|
3148. |   .           80001                 Out of AL, Alabama, US           0 |
3149. |   .           80002                  Out of AK, Alaska, US           0 |

The first seven observations have case information but no data for _ID or GEOID. Some of the observations are from American territories that are not states. Other observations have a value of combined_key suggesting that the county information is not clear. Visual inspection of these observations suggests that most of the confirmed cases for these observations are zero. We can verify this by counting the number of observations where _merge equals 2 and confirmed equals zero.

. count if _merge==2 & confirmed==0
  78

The output indicates that 78 of the 109 observations contain no confirmed cases. We could investigate these observations further if we were using our results to make policy decisions. But these observations are a small proportion of our dataset, and our present goal is only to learn how to make choropleth maps. So let's delete these observations, drop the variable _merge, and move on.

. drop if _merge==2
(109 observations deleted)

. drop _merge

Let's describe the dataset to verify that the merge was successful.

. describe

Contains data from usacounties.dta
  obs:         3,142
 vars:             8                          3 Apr 2020 15:55
---------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
GEOID           str5    %9s                   GEOID
fips            double  %9.0g
province_state  str24   %24s                  Province_State
combined_key    str44   %44s                  Combined_Key
confirmed       long    %12.0g                4/2/20
---------------------------------------------------------------------------
Sorted by:

Next, let's merge the variable popestimate2019 from the data file census_popn.dta.

. merge 1:1 fips using census_popn

    Result                           # of obs.
    -----------------------------------------
    not matched                            51
        from master                         0  (_merge==1)
        from using                         51  (_merge==2)

    matched                             3,142  (_merge==3)
    -----------------------------------------

The output tells us that 3,142 observations had matching values of fips in the two datasets. merge again created a new variable in our dataset named _merge, which equals 3 for observations with matching values of fips.

The output also tells us that 51 observations had a fips code in the population data but not in our merged data. _merge equals 2 for these observations. Let's list some of these observations.

. list _ID GEOID fips combined_key confirmed popestimate2019 ///
      if _merge==2, abbreviate(15)

      +------------------------------------------------------------------+
      | _ID   GEOID    fips   combined_key   confirmed   popestimate2019 |
      |------------------------------------------------------------------|
3143. |   .            1000                          .           4903185 |
3144. |   .            2000                          .            731545 |
3145. |   .            4000                          .           7278717 |
3146. |   .            5000                          .           3017804 |
3147. |   .            6000                          .          39512223 |
      |------------------------------------------------------------------|
3148. |   .            8000                          .           5758736 |
3149. |   .            9000                          .           3565287 |

We have no geographic or case information for these observations, so let's drop them from our dataset and also drop the variable _merge.

. keep if _merge==3
(51 observations deleted)

. drop _merge

Now, let's generate, label, and format a new variable named confirmed_adj that contains the population-adjusted number of confirmed COVID-19 cases.

. generate confirmed_adj = 100000*(confirmed/popestimate2019)

. label var confirmed_adj "Cases per 100,000"

. format %16.0fc confirmed_adj

Let's describe our dataset to verify that it contains all the variables we will need to create our map.

. describe

Contains data from usacounties.dta
  obs:         3,142
 vars:            10                          3 Apr 2020 14:15
---------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
GEOID           str5    %9s                   GEOID
fips            double  %9.0g
combined_key    str44   %44s                  Combined_Key
confirmed       long    %12.0g                4/2/20
census2010pop   long    %12.0g                CENSUS2010POP
popestimate2019 long    %12.0g                POPESTIMATE2019
confirmed_adj   float   %16.0fc               Cases per 100,000
---------------------------------------------------------------------------
Sorted by:

Our dataset is complete! Let's save the dataset and learn how to create a choropleth map.

. save covid19_adj
file covid19_adj.dta saved

Back to table of contents

How to create the choropleth map with grmap

We will use the community-contributed command grmap to create our choropleth map. You must activate grmap before you use it for the first time.

. grmap, activate

Creating a map that includes Alaska and Hawaii would require the use of options that adjust for their large difference in size and for their not being physically adjacent to the other 48 states. I wish to keep our map as simple as possible for now, so I am going to drop the observations for Alaska and Hawaii.

. drop if inlist(province_state, "Alaska", "Hawaii")
(2 observations deleted)

Next, we must tell Stata that these are spatial data by using spset. The option modify shpfile(usacounties_shp) will link our data with the shapefile, usacounties_shp.dta. Recall that the shapefile contains the information that grmap will use to render the map.

. spset, modify shpfile(usacounties_shp)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)
  Sp dataset covid19_adj.dta
                data:  cross sectional
     spatial-unit id:  _ID
         coordinates:  _CX, _CY (planar)
    linked shapefile:  usacounties_shp.dta
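
If you come back to these data later, you can verify that they are still spset by typing spset without arguments, which redisplays the settings shown above:

. spset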

Now, we can use grmap to create a choropleth map of the population-adjusted number of confirmed cases of COVID-19.

. grmap confirmed_adj, clnumber(7)

Figure 2: Choropleth map using septiles


By default, grmap divides the data into four groups based on the quartiles of confirmed_adj. I have used the option clnumber(7) to divide the data into 7 groups, or septiles. You can change the number of groups using the clnumber(#) option, where # is the number of color categories.
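For example, clnumber(5) would divide the data into five groups (quintiles). You can also change how the class breaks are chosen with the clmethod() option; clmethod(eqint) is one of grmap's class-break methods and divides the range of confirmed_adj into equal-width intervals rather than quantiles (see help grmap for the full list of methods):

. grmap confirmed_adj, clnumber(5) clmethod(eqint)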

You can also specify custom cutpoints using the options clmethod(custom) and clbreaks(numlist). The map below uses custom cutpoints at 0, 5, 10, 15, 20, 25, 50, 100, and 5000. I have also added a title and a subtitle.

. grmap confirmed_adj,                                            ///
       clnumber(8)                                               ///
       clmethod(custom)                                          ///
       clbreaks(0 5 10 15 20 25 50 100 5000)                     ///
       title("Confirmed Cases of COVID-19 in the United States") ///
       subtitle("cases per 100,000 population")

Figure 3: Choropleth map using custom cutpoints

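You can also control the colors themselves. grmap's fcolor() option accepts the name of a color scheme, and something like the command below, assuming the Blues scheme that ships with grmap, would shade the same custom classes in blues:

. grmap confirmed_adj,                                            ///
       clnumber(8)                                               ///
       clmethod(custom)                                          ///
       clbreaks(0 5 10 15 20 25 50 100 5000)                     ///
       fcolor(Blues)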

Conclusion and collected code

We did it! We created a choropleth map of the population-adjusted number of confirmed COVID-19 cases in each county of the United States! Let's review the basic steps. First, we downloaded the geographic data from the United States Census Bureau and converted them to Stata datasets using spshape2dta. Second, we downloaded, imported, and processed the COVID-19 data from the Johns Hopkins GitHub repository and saved the data to a Stata dataset. Third, we downloaded, imported, and processed the population data for each county from the United States Census Bureau and saved the data to a Stata dataset. Fourth, we merged the Stata datasets and calculated the population-adjusted number of COVID-19 cases for each county. And fifth, we used spset to tell Stata that these are spatial data, and we used grmap to create our choropleth map. You could follow these steps to create a choropleth map for many kinds of data, for other subdivisions of the United States, or for other countries.

I have collected the code below that will reproduce figures 2 and 3.


// Create the geographic datasets
clear
copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
     cb_2018_us_county_500k.zip
unzipfile cb_2018_us_county_500k.zip
spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
use usacounties.dta, clear
generate fips = real(GEOID)
save usacounties.dta, replace

// Create the COVID-19 case dataset
clear
import delimited https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
rename v83 confirmed
drop if missing(fips)
save covid19_county, replace

// Create the population dataset
clear
import delimited https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
generate fips = state*1000 + county
save census_popn, replace
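
// Optional sanity check: a county FIPS code is the 2-digit state code
// followed by the zero-padded 3-digit county code, so state*1000 + county
// should equal the two codes concatenated as strings
generate double fips_check = real(string(state) + string(county, "%03.0f"))
assert fips == fips_check
drop fips_check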

// Merge the datasets
clear
use _ID _CX _CY GEOID fips using usacounties.dta
merge 1:1 fips using covid19_county  ///
     , keepusing(province_state combined_key confirmed)
keep if _merge==3
drop _merge
merge 1:1 fips using census_popn  ///
     , keepusing(census2010pop popestimate2019)
keep if _merge==3
drop _merge
drop if inlist(province_state, "Alaska", "Hawaii")
generate confirmed_adj = 100000*(confirmed/popestimate2019)
label var confirmed_adj "Cases per 100,000"
format %16.0fc confirmed_adj
format %16.0fc confirmed popestimate2019
save covid19_adj, replace


// Create the maps
grmap, activate
spset, modify shpfile(usacounties_shp)
grmap confirmed_adj, clnumber(7)
grmap confirmed_adj,                                           ///
     clnumber(8)                                               ///
     clmethod(custom)                                          ///
     clbreaks(0 5 10 15 20 25 50 100 5000)                     ///
     title("Confirmed Cases of COVID-19 in the United States") ///
     subtitle("cases per 100,000 population")
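
// To save a map to disk, you can follow a grmap command with graph export
// (the filename here is just an example; PNG is one of several supported
// formats)
graph export covid19_map.png, replace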


