Saturday, November 29, 2025

How to create animated choropleth maps using the COVID-19 data from Johns Hopkins University


In my previous posts, I showed how to download the COVID-19 data from the Johns Hopkins GitHub repository, graph the data over time, and create choropleth maps. Now, I’m going to show you how to create animated choropleth maps to explore the distribution of COVID-19 over time and place.

The video below shows the cumulative number of COVID-19 cases per 100,000 population for each county in the United States from January 22, 2020, through April 5, 2020. The map doesn’t change much until mid-March, when the virus begins to spread faster. Then, we can see when and where people are being infected. You can click on the “Play” icon on the video to play it and click on the icon on the bottom right to view the video in full-screen mode.

In my last post, we learned how to create a choropleth map. We will need to learn two more skills to create an animated map: how to create a map for each date in the dataset and how to combine the collection of maps into a video file.

How to create a map for each date

Let’s begin by importing and describing the raw data from the Johns Hopkins GitHub repository. Note that I imported these data on April 5.

. import delimited https://raw.githubusercontent.com/CSSEGISandData/COVID-19/
> master/csse_covid_19_data/csse_covid_19_time_series/
> time_series_covid19_confirmed_US.csv
(86 vars, 3,253 obs)

. describe

Contains data
  obs:         3,253
 vars:            86
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
uid             long    %12.0g                UID
iso2            str2    %9s
iso3            str3    %9s
code3           int     %8.0g
fips            long    %12.0g                FIPS
admin2          str21   %21s                  Admin2
province_state  str24   %24s                  Province_State
country_region  str2    %9s                   Country_Region
lat             float   %9.0g                 Lat
long_           float   %9.0g                 Long_
combined_key    str44   %44s                  Combined_Key
v12             byte    %8.0g                 1/22/20
v13             byte    %8.0g                 1/23/20
v14             byte    %8.0g                 1/24/20
v15             byte    %8.0g                 1/25/20
v16             byte    %8.0g                 1/26/20

               (Output omitted)

v82             long    %12.0g                4/1/20
v83             long    %12.0g                4/2/20
v84             long    %12.0g                4/3/20
v85             long    %12.0g                4/4/20
v86             long    %12.0g                4/5/20
--------------------------------------------------------------------------------
Sorted by:

The variables v12 through v86 contain the cumulative number of confirmed cases of COVID-19 in each county for each day starting on January 22, 2020, and ending on April 5, 2020. Let’s list some observations to view the raw data.

. list combined_key v12 v13 v84 v85 v86 in 6/10

     +----------------------------------------------------+
     |         combined_key   v12   v13   v84   v85   v86 |
     |----------------------------------------------------|
  6. | Autauga, Alabama, US     0     0    12    12    12 |
  7. | Baldwin, Alabama, US     0     0    28    29    29 |
  8. | Barbour, Alabama, US     0     0     1     2     2 |
  9. |    Bibb, Alabama, US     0     0     4     4     5 |
 10. |  Blount, Alabama, US     0     0     9    10    10 |
     +----------------------------------------------------+

We will need to include v12 through v86 in our final dataset. I have copied the code block from the end of my last post and pasted it below with two small modifications. The first modification is that the code keeps and formats the case data for every date stored in v12 through v86; you can see it in the keepusing() option and the format command. Note that I refer to “v12 through v86” in the code by typing v*. The asterisk serves as a wildcard, so v* refers to any variable that begins with the letter “v”. The second modification is that we do not create a variable for our population-adjusted count at this point.

// Create the geographic dataset
clear
copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
    cb_2018_us_county_500k.zip
unzipfile cb_2018_us_county_500k.zip
spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
use usacounties.dta, clear
generate fips = real(GEOID)
save usacounties.dta, replace

// Create the COVID-19 case dataset
clear
import delimited https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
drop if missing(fips)
save covid19_county, replace

// Create the inhabitants dataset
clear
import delimited https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
generate fips = state*1000 + county
save census_popn, replace

// Merge the datasets
clear
use _ID _CX _CY GEOID fips using usacounties.dta
merge 1:1 fips using covid19_county ///
    , keepusing(province_state combined_key v*)
keep if _merge == 3
drop _merge
merge 1:1 fips using census_popn ///
    , keepusing(census2010pop popestimate2019)
keep if _merge==3
drop _merge
drop if inlist(province_state, "Alaska", "Hawaii")
format %16.0fc popestimate2019 v*
save covid19_adj, replace

Let’s describe some of the variables in our final dataset.

. describe _ID _CX _CY popestimate2019 v*

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
popestimate2019 long    %16.0fc               POPESTIMATE2019
v12             byte    %16.0fc               1/22/20
v13             byte    %16.0fc               1/23/20

               (Output omitted)

v85             long    %16.0fc               4/4/20
v86             long    %16.0fc               4/5/20

The dataset contains all the variables that we will need to create our animated map: the geographic information, the population of each county, and the number of cases for each day in v12 through v86. I’m going to explain all the steps we will need to take using only three of these variables. Then, we will put them together using all 75 variables.

We will need to create a separate map for v12, v13, v14, and so on to v86. Recall that we learned about loops in one of my previous posts. We can use forvalues to loop over the variables.

The forvalues loop below stores the numbers 12, 13, and 14, one at a time, in the local macro time. You can refer to time inside the loop using left and right single quotes. The example below describes each variable.

. forvalues time = 12/14 {
  2.     describe v`time'
  3. }

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
v12             byte    %16.0fc               1/22/20

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
v13             byte    %16.0fc               1/23/20

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
v14             byte    %16.0fc               1/24/20

We will also need to calculate the population-adjusted number of cases for each county for each day. In the example below, we begin by generating a variable named confirmed_adj that contains missing values. Inside the loop, we replace confirmed_adj with the adjusted count for the current day and summarize confirmed_adj. Then, we drop confirmed_adj when the loop is finished.

. generate confirmed_adj = .
(3,108 missing values generated)

. forvalues time = 12/14 {
  2.     replace confirmed_adj = 100000*(v`time'/popestimate2019)
  3.     summarize confirmed_adj
  4. }
(3,108 real changes made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
confirmed_~j |      3,108    .0000143    .0007962          0   .0443896
(0 real changes made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
confirmed_~j |      3,108    .0000143    .0007962          0   .0443896
(1 real change made)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
confirmed_~j |      3,108    .0000205     .000869          0   .0443896

. drop confirmed_adj

We would also like to include the date in the title of each map. Recall that Stata stores dates as the number of days since January 1, 1960. The variable v12 contains data for January 22, 2020. We can calculate the number of days since January 1, 1960, using the date() function.

. display date("January 22, 2020", "MDY")
21936

Our forvalues loop begins at time = 12, and we want the date for v12 to be 21936. So we must subtract 12 from the date before we add the value of the local macro time. The example below loops over the days from January 22, 2020, to January 24, 2020.

. forvalues time = 12/14 {
  2.     display 21936 - 12 + `time'
  3. }
21936
21937
21938

We can then use the string() function to change the display format of each date.

. forvalues time = 12/14 {
  2.     local date = 21936 - 12 + `time'
  3.     display string(`date', "%tdMonth_dd,_CCYY")
  4. }
January 22, 2020
January 23, 2020
January 24, 2020
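As a small variation, the starting date can be computed rather than typed in as 21936. The sketch below assumes, as in the dataset above, that the first date column is v12 and that it holds January 22, 2020. The macro name start is my own choice for illustration, and the code is meant to be run from a do-file.

```stata
// Compute the numeric date for the first date column (v12) instead of hard-coding it
local start = date("January 22, 2020", "MDY")

forvalues time = 12/14 {
    // Shift by 12 so that v12 maps to the start date
    local date = `start' - 12 + `time'
    display string(`date', "%tdMonth_dd,_CCYY")
}
```

This should print the same three dates as the loop above, and it keeps the arithmetic readable if the first date ever changes.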

We will also use graph export to export our graphs to Portable Network Graphics (.png) files. These files will be combined later to create our video, and the filenames must be numbered sequentially from “1” with leading zeros. The example below demonstrates how to use the string() function to create these filenames.

. forvalues time = 12/14 {
  2.     local filenum = string(`time'-11,"%03.0f")
  3.     display "graph export map_`filenum'.png"
  4. }
graph export map_001.png
graph export map_002.png
graph export map_003.png

Let’s put all the pieces together and display the basic commands that we will use to create each map.

. generate confirmed_adj = .
(3,108 missing values generated)

. forvalues time = 12/14 {
  2.     local date    = 21936 - 12 + `time'
  3.     local date    = string(`date', "%tdMonth_dd,_CCYY")
  4.     local filenum = string(`time'-11,"%03.0f")
  5.     display "replace confirmed_adj = 100000*(v`time'/popestimate2019)"
  6.     display "grmap confirmed_adj, title(Map for `date')"
  7.     display "graph export map_`filenum'.png"
  8.     display
  9. }
replace confirmed_adj = 100000*(v12/popestimate2019)
grmap confirmed_adj, title(Map for January 22, 2020)
graph export map_001.png

replace confirmed_adj = 100000*(v13/popestimate2019)
grmap confirmed_adj, title(Map for January 23, 2020)
graph export map_002.png

replace confirmed_adj = 100000*(v14/popestimate2019)
grmap confirmed_adj, title(Map for January 24, 2020)
graph export map_003.png

. drop confirmed_adj

Each iteration of the loop does three things. The first line replaces the variable confirmed_adj with the population-adjusted count for that particular day. The second line uses grmap to create the map with the date in the title. The third line exports the graph to a .png file with sequential filenames starting at 001.

The code block below does the same thing but with some options added to grmap and graph export. Most of the grmap options were discussed in my last post. I have added the option ocolor(gs8) to draw the map outlines in a light gray and the option osize(vthin) to make the map lines thinner. The options width(3840) and height(2160) after graph export specify that each map be saved at 4K resolution. This will create clear, detailed images for our video.

generate confirmed_adj = .
forvalues time = 12/86 {
    local date = 21936 - 12 + `time'
    local date = string(`date', "%tdMonth_dd,_CCYY")
    replace confirmed_adj = 100000*(v`time'/popestimate2019)
    grmap confirmed_adj,                                                    ///
        clnumber(8)                                                         ///
        clmethod(custom)                                                    ///
        clbreaks(0 5 10 15 20 25 50 100 5000)                               ///
        ocolor(gs8) osize(vthin)                                            ///
        title("Confirmed Cases of COVID-19 in the United States on `date'") ///
        subtitle("cumulative cases per 100,000 population")

    local filenum = string(`time'-11,"%03.0f")
    graph export "map_`filenum'.png", as(png) width(3840) height(2160)
}
drop confirmed_adj

The code block above will create 75 maps and save them in 75 files.

. ls
     4/06/20 19:36  .
     4/06/20 19:36  ..
1551.2k   4/06/20 10:52  map_001.png
1551.6k   4/06/20 10:52  map_002.png
1550.9k   4/06/20 10:53  map_003.png

    (Output omitted)

1589.1k   4/06/20 11:12  map_073.png
1587.2k   4/06/20 11:13  map_074.png
1585.9k   4/06/20 11:40  map_075.png
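Before building the video, it can be worth checking that no file in the sequence is missing, because a gap in the numbering would likely cut the video short. The sketch below is one way to do this from a do-file; it assumes the 75 files are in the current working directory and follow the map_001.png through map_075.png naming used above.

```stata
// Stop with an error at the first map file that cannot be found
forvalues i = 1/75 {
    local filenum = string(`i', "%03.0f")
    confirm file "map_`filenum'.png"
}
display "All 75 map files were found"
```

The confirm file command exits with an error when a file does not exist, so the final message appears only if every file is present.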

Let’s view the file map_001.png, which is the map for January 22, 2020.

Figure 1: map_001.png

Next, let’s view the file map_075.png, which is the map for April 5, 2020.

Figure 2: map_075.png

Note that the legend at the bottom left of both maps is the same. I used the clmethod(custom) option with grmap to specify my own cutpoints for values of confirmed_adj. The legend will change from day to day if you use the default clmethod(), which selects the cutpoints based on quantiles of confirmed_adj. The quantiles will change from day to day, and the legend will not be consistent across maps. I selected the cutpoints in clbreaks(0 5 10 15 20 25 50 100 5000) based on the map of the final day.
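One way to choose cutpoints like these is to look at the distribution of the adjusted count on the final day. The sketch below only illustrates that idea and is not necessarily how the cutpoints above were chosen. It assumes that the merged dataset (covid19_adj) is in memory and that v86 is the last date variable.

```stata
// Inspect the adjusted count on the final day (v86 here) to help pick cutpoints
generate confirmed_adj = 100000*(v86/popestimate2019)
summarize confirmed_adj, detail
drop confirmed_adj
```

The percentiles suggest where to place the interior cutpoints, and the maximum indicates how large the last value in clbreaks() needs to be so that every county falls in a class.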

You can update your video with future data by increasing the upper limit of the forvalues loop. For example, data imported on April 12, 2020, would run seven days past April 5, so the last variable would be v93 and I would type forvalues time = 12/93.
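Rather than counting days by hand, the upper limit can be derived from the dataset. The sketch below rests on two assumptions: every variable whose name begins with “v” is a date column, and the date columns are numbered consecutively starting at v12. Both hold for the merged dataset built above, but check them if the source file changes. The macro names datevars, nvars, and last are illustrative, and the code is meant to be run from a do-file.

```stata
// Collect every variable whose name starts with "v" (assumed to be date columns only)
unab datevars : v*
local nvars : word count `datevars'

// The date columns are assumed to be numbered consecutively starting at v12
local last = 11 + `nvars'
display "forvalues time = 12/`last'"
```

With the 75 date variables in the dataset above, last evaluates to 86, which matches the loop used in this post.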

We successfully created a map for each day and saved each map to a separate file. Now, we need to combine these files to create our video.

How to create a video from the collection of maps

I wrote a blog post in 2014 that describes how to use FFmpeg to create a video from a collection of images. FFmpeg is a free software package with versions available for Linux, Mac, and Windows. You can execute FFmpeg commands from within Stata using shell.

The example below uses FFmpeg to combine our map files into a video named covid19.mp4.

shell "ffmpeg.exe" -framerate 1/.5 -i map_%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p covid19.mp4

You may need to specify the location of ffmpeg.exe on your computer as in the example below.

shell "C:\Program Files\ffmpeg\bin\ffmpeg.exe" -framerate 1/.5 -i map_%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p covid19.mp4

The names of the image files are specified with the option -i map_%03d.png, and the name of the output video, covid19.mp4, is specified at the end of the command. The pattern %03d matches the three-digit, zero-padded numbers in our filenames. The frame rate is specified with the option -framerate 1/.5. The ratio “1/.5” specifies a frame rate of two images per second.
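On Mac or Linux, or on Windows when FFmpeg is on the system path, the command can usually be shortened. The sketch below assumes that the executable can be called as ffmpeg from your shell; it also writes the frame rate as 2, which is the same rate as 1/.5.

```stata
// Assumes ffmpeg is on the system PATH; adjust the name or path if it is not
shell ffmpeg -framerate 2 -i map_%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p covid19.mp4
```

At two images per second, the 75 maps give a video of roughly 37.5 seconds. A smaller value for -framerate slows the animation down, and a larger value speeds it up.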

FFmpeg may take several minutes to process the images and convert them to a video. Your patience will be rewarded with the video below.

Conclusion and collected code

We did it! We successfully created an animated choropleth map that shows the population-adjusted confirmed cases of COVID-19 for every county in the United States from January 22, 2020, through April 5, 2020. Our video is a powerful tool that we can use to study the distribution of COVID-19 over time and location.

I would like to remind you that I have made somewhat arbitrary decisions while cleaning the data that are used to create this video. My goal was to show you how to create your own maps and videos. My results should be used for educational purposes only, and you will need to check the data carefully if you plan to use the results to make decisions.

You can reproduce the video with the code below.

// Create the geographic dataset
clear
copy https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip ///
    cb_2018_us_county_500k.zip
unzipfile cb_2018_us_county_500k.zip
spshape2dta cb_2018_us_county_500k.shp, saving(usacounties) replace
use usacounties.dta, clear
generate fips = real(GEOID)
save usacounties.dta, replace

// Create the COVID-19 case dataset
clear
import delimited https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
drop if missing(fips)
save covid19_county, replace

// Create the inhabitants dataset
clear
import delimited https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
generate fips = state*1000 + county
save census_popn, replace

// Merge the datasets
clear
use _ID _CX _CY GEOID fips using usacounties.dta
merge 1:1 fips using covid19_county ///
    , keepusing(province_state combined_key v*)
keep if _merge == 3
drop _merge
merge 1:1 fips using census_popn ///
    , keepusing(census2010pop popestimate2019)
keep if _merge==3
drop _merge
drop if inlist(province_state, "Alaska", "Hawaii")
format %16.0fc popestimate2019 v*
save covid19_adj, replace

// Create the maps
spset, modify shpfile(usacounties_shp)
generate confirmed_adj = .
forvalues time = 12/86 {
    local date = 21936 - 12 + `time'
    local date = string(`date', "%tdMonth_dd,_CCYY")
    replace confirmed_adj = 100000*(v`time'/popestimate2019)
    grmap confirmed_adj,                                                    ///
        clnumber(8)                                                         ///
        clmethod(custom)                                                    ///
        clbreaks(0 5 10 15 20 25 50 100 5000)                               ///
        ocolor(gs8) osize(vthin)                                            ///
        title("Confirmed Cases of COVID-19 in the United States on `date'") ///
        subtitle("cumulative cases per 100,000 population")

    local filenum = string(`time'-11,"%03.0f")
    graph export "map_`filenum'.png", as(png) width(3840) height(2160)
}
drop confirmed_adj

// Create the video with FFmpeg
shell "C:\Program Files\ffmpeg\bin\ffmpeg.exe" -framerate 1/.5 -i map_%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p covid19.mp4


