The nfl2stata command no longer works due to website changes.
Football season is around the corner, and I couldn't be more excited. We have a pretty competitive StataCorp fantasy football league. I'm always looking for an edge in our league, so I challenged one of our interns, Chris Hassell, to write a command to web scrape http://www.nfl.com for data on the NFL. The new command is nfl2stata. To install the command, type
net install http://www.stata.com/users/kcrow/nfl2stata, replace
With this new command, you can easily find the running backs who had the most touchdowns last season:
. nfl2stata player "running back", season(2017) clear
177 observation(s) loaded
. gsort -touchdowns -yards
. list name team touchdowns in 1/10
+-------------------------------------+
| name team touchd~s |
|-------------------------------------|
1. | Todd Gurley LA 13 |
2. | Mark Ingram NO 12 |
3. | Le'Veon Bell PIT 9 |
4. | Jordan Howard CHI 9 |
5. | Leonard Fournette JAX 9 |
|-------------------------------------|
6. | Kareem Hunt KC 8 |
7. | Melvin Gordon LAC 8 |
8. | Carlos Hyde SF 8 |
9. | Latavius Murray MIN 8 |
10. | Alvin Kamara NO 8 |
+-------------------------------------+
You can find the top-5 field goal kickers (by field goals made) from last season:
. nfl2stata player "field goal kicker", season(2017) clear
54 observation(s) loaded
. list name team fieldgoalsmade in 1/5
+--------------------------------------+
| name team f~lsmade |
|--------------------------------------|
1. | Robbie Gould SF 39 |
2. | Greg Zuerlein LA 38 |
3. | Harrison Butker KC 38 |
4. | Stephen Gostkowski NE 37 |
5. | Ryan Succop TEN 35 |
+--------------------------------------+
You can generate a bar graph of the passing leaders (quarterbacks with 4,000 or more yards) from last year's regular season:
. nfl2stata player quarterback, season(2017) seasontype(reg) clear
71 observation(s) loaded
. graph bar (asis) yards if yards >= 4000, exclude0 ///
> over(name, sort(yards) descending label(angle(forty_five) labsize(small))) ///
> blabel(bar) title(2017 Passing Yard Leaders)
There is a lot of interesting data to pore through, especially if you're interested in fantasy football, as I am. Although nfl2stata looks like a simple command, it is not: fetching, parsing, and loading the data from http://www.nfl.com by web scraping takes considerable time.
Web scraping
You may have heard the term "web scraping". A simple definition of web scraping is extracting data from websites. Most of the time, a website's copyright prevents people from distributing data obtained by scraping the site, but you can use a personal copy of the data on your own computer. That is what the NFL's copyright states. Because of this, users must scrape the website themselves. To do this for the NFL data, you type
nfl2stata scrape, season(_all)
This command scrapes all data from 2009 to the current year and saves them as Stata datasets on your local computer, in a directory on your Stata adopath. Specifically, it saves them in your PLUS directory, where subsequent nfl2stata commands can find them. The first year of NFL data stored on http://www.nfl.com is 2009; there are no earlier data to scrape. Web scraping is an expensive and time-consuming process. Depending on several factors (computer speed, computer memory, network connection, etc.), this initial scrape can take hours to complete, so you may want to run the above command overnight. Once you have scraped the historical data, you can simply type
nfl2stata scrape
to update your locally stored datasets with the current week's data, which runs much faster than the initial scrape.
As of the writing of this blog post, the scraping command works, but if the NFL changes its HTML page layout, the command will break; if that happens, we will fix it if we can. Also, the scraped data will change over time as the NFL updates earlier data on its website, so the data you scraped a few weeks ago may not match what you see on the ESPN or NFL website. In addition, the same data can exist in more than one place and can be inconsistent when one site gets updated stats and another does not. You can rescrape the data by typing nfl2stata scrape, season(_all) replace to create new, clean datasets. These issues are what make web scraping a volatile process.
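To make that fragility concrete, here is a minimal sketch of the parse step of a scraper, written in Stata itself (Stata 14 or later, for the Unicode regular-expression functions ustrregexm() and ustrregexs()). It is not how nfl2stata works internally (nfl2stata does its scraping in Java), and the HTML row and macro names are hypothetical. The point is that the pattern hard-codes the page's markup, so if the site reorders or renames its table cells, nothing is extracted.

```stata
* Hypothetical HTML for one row of a stats table (not real nfl.com markup)
local row `"<tr><td>Player A</td><td>ABC</td><td>13</td></tr>"'

* The pattern assumes exactly three adjacent <td> cells: name, team, count
local pattern "<td>([^<]*)</td><td>([^<]*)</td><td>([0-9]+)</td>"

if ustrregexm(`"`row'"', "`pattern'") {
    * Pull each captured group out of the last successful match
    local name = ustrregexs(1)
    local team = ustrregexs(2)
    local tds  = real(ustrregexs(3))
    display "`name' (`team'): `tds' touchdowns"
}
else {
    * A layout change on the site would land here: nothing is extracted
    display as error "row did not match the expected layout"
}
```

If the site inserted, say, a position column between the name and team cells, the pattern would stop matching and the else branch would run, which is the kind of breakage described above.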
Command
The command nfl2stata scrape produces game, game summary, play-by-play, player, player profile, roster, and team Stata datasets for each year. To load these data into Stata, you can use the following commands:
- To load game-by-game data into Stata, use
nfl2stata game "position" [, game_options]
- To load game summary data into Stata, use
nfl2stata gamesummary [, game_summary_options]
- To load play-by-play data into Stata, use
nfl2stata playbyplay [, playbyplay_options]
- To load player-specific data into Stata, use
nfl2stata player "position" [, player_options]
- To load player profile data into Stata, use
nfl2stata profile [, profile_options]
- To load team roster data into Stata, use
nfl2stata roster [, roster_options]
- To load team game-by-game data into Stata, use
nfl2stata team [, team_options]
Each of these commands searches its respective dataset. Sometimes you will need Stata commands like collapse, gsort, and merge to generate statistics, sort the data, and combine two or more NFL datasets. Let's look at a few more examples.
Examples
I've found that the two Stata commands I use most frequently with these data are gsort, which sorts data in ascending or descending order, and collapse, which makes a dataset of summary statistics. collapse is especially useful when working with data from multiple games or multiple seasons. For example, to find out which wide receiver led the NFL in receiving last year, you would type
. nfl2stata game "wide receiver", season(2017) seasontype(reg) clear
2764 observation(s) loaded
. collapse (sum) receivingyards, by(name)
. gsort -receivingyards
. list in 1/5
+----------------------------+
| name receiv~s |
|----------------------------|
1. | Antonio Brown 1533 |
2. | Julio Jones 1444 |
3. | Keenan Allen 1393 |
4. | DeAndre Hopkins 1378 |
5. | Adam Thielen 1276 |
+----------------------------+
Sometimes you need to merge two or more NFL datasets to answer questions about the data. For example, to find the average weight of an NFL running back over the last 9 years, you must merge the roster data and the profile data to get the player position and player weight variables into the same dataset. Type
. nfl2stata roster, clear
18299 observation(s) loaded
. duplicates drop playerid, force
Duplicates in terms of playerid
(13,964 observations deleted)
. drop team teamname seasontype
. save temp_roster.dta, replace
file temp_roster.dta saved
. nfl2stata profile, clear
4335 observation(s) loaded
. merge 1:1 playerid using temp_roster.dta
Result # of obs.
-----------------------------------------
not matched 0
matched 4,335 (_merge==3)
-----------------------------------------
. sum weight if position == "RB"
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
weight | 384 215.9036 14.20637 173 269
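The same steps can be written as a short do-file. This is a sketch of a variant, not the code used above: it keeps the roster in a tempfile instead of writing temp_roster.dta to the working directory, uses merge's assert(match) option so the do-file stops with an error if any profile lacks a roster match, and reports weight for every position with tabulate rather than only for running backs. It assumes the scraped datasets already exist locally and that the variables are named as in the session above; the tempfile name roster is illustrative.

```stata
* One row per player from the roster data, kept in a temporary file
nfl2stata roster, clear
duplicates drop playerid, force
drop team teamname seasontype
tempfile roster
save "`roster'"

* Attach position to each player profile; error out unless every row matches
nfl2stata profile, clear
merge 1:1 playerid using "`roster'", assert(match) nogenerate

* Mean, standard deviation, and count of weight for each position
tabulate position, summarize(weight)
```

Because nogenerate suppresses the _merge variable, the assert(match) option is what documents that every observation matched.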
To find who led the NFL in receiving or rushing, you need to append the data for all offensive positions into one dataset. For example, to list the receiving leaders, type
. nfl2stata game "quarterback", season(2017) seasontype(reg) clear
1042 observation(s) loaded
. tempfile tmp
. qui save "`tmp'", replace
. nfl2stata game "running back", season(2017) seasontype(reg) clear
2018 observation(s) loaded
. qui append using "`tmp'"
. qui save "`tmp'", replace
. nfl2stata game "wide receiver", season(2017) seasontype(reg) clear
2764 observation(s) loaded
. qui append using "`tmp'"
. qui save "`tmp'", replace
. nfl2stata game "tight end", season(2017) seasontype(reg) clear
1554 observation(s) loaded
. qui append using "`tmp'"
. collapse (sum) receivingyards, by(name position)
. gsort -receivingyards
. list name position receivingyards in 1/30
+-------------------------------------------+
| name position receiv~s |
|-------------------------------------------|
1. | Antonio Brown WR 1533 |
2. | Julio Jones WR 1444 |
3. | Keenan Allen WR 1393 |
4. | DeAndre Hopkins WR 1378 |
5. | Adam Thielen WR 1276 |
|-------------------------------------------|
6. | Michael Thomas WR 1245 |
7. | Tyreek Hill WR 1183 |
8. | Larry Fitzgerald WR 1156 |
9. | Marvin Jones WR 1101 |
10. | Rob Gronkowski TE 1084 |
|-------------------------------------------|
11. | Brandin Cooks WR 1082 |
12. | A.J. Green WR 1078 |
13. | Travis Kelce TE 1038 |
14. | Golden Tate WR 1003 |
15. | Mike Evans WR 1001 |
|-------------------------------------------|
16. | Doug Baldwin WR 991 |
17. | Jarvis Landry WR 987 |
18. | T.Y. Hilton WR 966 |
19. | Marquise Goodwin WR 962 |
20. | Demaryius Thomas WR 949 |
|-------------------------------------------|
21. | Robby Anderson WR 941 |
22. | JuJu Smith-Schuster WR 917 |
23. | Davante Adams WR 885 |
24. | Cooper Kupp WR 869 |
25. | Stefon Diggs WR 849 |
|-------------------------------------------|
26. | Kenny Stills WR 847 |
27. | Devin Funchess WR 840 |
28. | Dez Bryant WR 838 |
29. | Alvin Kamara RB 826 |
30. | Zach Ertz TE 824 |
+-------------------------------------------+
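Because the four load, append, and save steps differ only in the position string, they can be written as a foreach loop. This is a minimal do-file sketch, assuming the same nfl2stata game syntax and variable names used in the session above; the tempfile name offense and the local macro first are illustrative.

```stata
tempfile offense
local first 1

* Stack 2017 regular-season game data for each offensive position
foreach pos in "quarterback" "running back" "wide receiver" "tight end" {
    nfl2stata game "`pos'", season(2017) seasontype(reg) clear
    * The tempfile does not exist until the first save, so skip append then
    if !`first' append using "`offense'"
    quietly save "`offense'", replace
    local first 0
}

* Season receiving totals per player, largest first
collapse (sum) receivingyards, by(name position)
gsort -receivingyards
list name position receivingyards in 1/30
```

foreach strips the double quotes from each list element, so `pos' holds, for example, running back, and the quotes around it in the nfl2stata call keep the two-word position together.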
Implementation
Chris used Stata's Java plugins to write most of the command. The other Java libraries he used to write the command are
There are many Java libraries available for web scraping. These are just the ones we used.
