As of November 2019, this command no longer works due to https://stats.nba.com restrictions.
Since our intern, Chris Hassell, finished nfl2stata sooner than expected, he went ahead and created another command to web scrape https://stats.nba.com for data on the NBA. The command is nba2stata. To install the command, type
net install http://www.stata.com/users/kcrow/nba2stata, replace
When Chris first wrote the command, I knew I wanted to look at how the three-point shot has changed the way the game is played. For example, I can find the best three-point shooter from last season.
. nba2stata playerstats _all, season(2017) seasontype(reg) stat(season) clear
Processing x/543 requests
.........x.........x.........x.........x.........50
.........x.........x.........x.........x.........100
.........x.........x.........x.........x.........150
.........x.........x.........x.........x.........200
.........x.........x.........x.........x.........250
.........x.........x.........x.........x.........300
.........x.........x.........x.........x.........350
.........x.........x.........x.........x.........400
.........x.........x.........x.........x.........450
.........x.........x.........x.........x.........500
.........x.........x.........x.........x...
660 observation(s) loaded
. gsort -threepointfieldgoalsmade
. list playername teamname threepointfieldgoalsmade in 1/10
     +----------------------------------------------------+
     | playername        teamname                three~de |
     |----------------------------------------------------|
  1. | James Harden      Houston Rockets              265 |
  2. | Paul George       Oklahoma City Thunder        244 |
  3. | Kyle Lowry        Toronto Raptors              238 |
  4. | Kemba Walker      Charlotte Hornets            231 |
  5. | Klay Thompson     Golden State Warriors        229 |
     |----------------------------------------------------|
  6. | Wayne Ellington   Miami Heat                   227 |
  7. | Damian Lillard    Portland Trailblazers        227 |
  8. | Eric Gordon       Houston Rockets              218 |
  9. | Stephen Curry     Golden State Warriors        212 |
 10. | Joe Ingles        Utah Jazz                    204 |
     +----------------------------------------------------+
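The progress display in the log above appears to follow a simple rule, which we infer only from the output itself: one character per request, a dot normally, an x on every 10th request, and the running count plus a line break on every 50th. The Python sketch below reproduces that pattern; it is a reconstruction from the log, not nba2stata's actual code.

```python
def progress_dots(n_requests: int) -> str:
    """Rebuild the dot/x/count progress pattern seen in the nba2stata log.

    Inferred rule (an assumption): '.' per request, 'x' on every 10th,
    and the request count followed by a line break on every 50th.
    """
    parts = []
    for i in range(1, n_requests + 1):
        if i % 50 == 0:
            parts.append(f"{i}\n")  # milestone: print the count, start a new line
        elif i % 10 == 0:
            parts.append("x")       # every 10th request
        else:
            parts.append(".")       # ordinary request
    return "".join(parts).rstrip("\n")


if __name__ == "__main__":
    print(progress_dots(543))
```

For 543 requests this prints ten full lines ending in 50 through 500, followed by a final line of 43 characters, which matches the log above.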
Or I can check a player’s regular-season three-point percentage for the last 5 years.
. nba2stata playerstats "Dirk", stat(season) seasontype(reg) clear
27 observation(s) loaded
. gsort -playerage
. list playername playerage threepointfieldgoalpercentage in 1/5
     +-------------------------------------+
     | playername      playe~ge   three~ge |
     |-------------------------------------|
  1. | Dirk Nowitzki         40       .409 |
  2. | Dirk Nowitzki         39       .378 |
  3. | Dirk Nowitzki         38       .368 |
  4. | Dirk Nowitzki         37        .38 |
  5. | Dirk Nowitzki         36       .398 |
     +-------------------------------------+
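`gsort -playerage` sorts in descending order (the minus sign), and `in 1/5` then restricts the listing to the first five rows, so together they show the five rows with the highest age. A minimal Python equivalent follows; the rows and their keys (`season`, `playerage`) are hypothetical, not real NBA data.

```python
def top_n_desc(rows, key, n):
    """Mimic `gsort -key` followed by `list ... in 1/n`.

    Python's sort is stable even with reverse=True, so rows with equal
    keys keep their original relative order.
    """
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


# Hypothetical rows for illustration only.
seasons = [
    {"season": "A", "playerage": 34},
    {"season": "B", "playerage": 36},
    {"season": "C", "playerage": 35},
    {"season": "D", "playerage": 36},
]

if __name__ == "__main__":
    for row in top_n_desc(seasons, "playerage", 3):
        print(row["season"], row["playerage"])
```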
Or I can see how three-point percentage affects your favorite team’s chances of winning.
. nba2stata teamstats "HOU", season(2017) stat(game) seasontype(reg) clear
82 observation(s) loaded
. keep if threepointfieldgoalpercentage > .35
(35 observations deleted)
. tab winloss
 Win / loss |      Freq.     Percent        Cum.
------------+-----------------------------------
          L |          4        8.51        8.51
          W |         43       91.49      100.00
------------+-----------------------------------
      Total |         47      100.00
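The `keep if` plus `tab` steps amount to filtering games by a threshold and then computing frequencies, percentages, and cumulative percentages. The Python sketch below does the same; it also returns the games at or below the threshold, the comparison group that `keep if` discards. The keys `fg3_pct` and `winloss` are hypothetical stand-ins for the nba2stata variables.

```python
from collections import Counter


def tabulate(values):
    """Return (level, freq, percent, cumulative percent) rows like Stata's `tab`."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cum = [], 0.0
    for level in sorted(counts):  # tab lists levels in sorted order
        pct = 100.0 * counts[level] / total
        cum += pct
        rows.append((level, counts[level], round(pct, 2), round(cum, 2)))
    return rows, total


def split_by_threshold(games, threshold=0.35):
    """Win/loss codes for games above, and at or below, a 3P% threshold.

    'fg3_pct' and 'winloss' are hypothetical keys standing in for
    threepointfieldgoalpercentage and winloss.
    """
    above = [g["winloss"] for g in games if g["fg3_pct"] > threshold]
    below = [g["winloss"] for g in games if g["fg3_pct"] <= threshold]
    return above, below
```

Feeding `tabulate` four L codes and 43 W codes reproduces the 8.51/91.49 split above. One Stata detail worth knowing: missing values compare as larger than any number, so `keep if ... > .35` would also keep games with a missing percentage; the Python sketch assumes no missing values.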
nba2stata is great if you are planning on doing professional basketball analysis. Although this command looks similar to nfl2stata, it is not; it works quite differently.
Web scraping JSON
In our last blog post, we talked about web scraping https://www.nfl.com and extracting the data from its HTML pages. The NBA data are different: you can access them through JSON objects from https://stats.nba.com. JSON is a lightweight data format that is easy to parse; therefore, we don’t have a scrape command for these data. We scrape and load them on the fly.
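To make "easy to parse" concrete: stats.nba.com responses were commonly laid out as a `resultSets` list, each entry holding a `name`, a `headers` list, and a `rowSet` of rows. Treat that layout as an assumption (the endpoints are no longer reachable from the command), and note that the payload below is a made-up fragment. The Python sketch turns one result set into one record per row, with each lowercased header as a variable name, which is roughly what loading the data into Stata variables amounts to.

```python
import json
from typing import Any, Dict, List, Optional


def result_set_to_records(payload: str, name: Optional[str] = None) -> List[Dict[str, Any]]:
    """Convert one result set of a stats.nba.com-style JSON payload to records.

    Assumed layout:
    {"resultSets": [{"name": ..., "headers": [...], "rowSet": [[...], ...]}]}
    """
    doc = json.loads(payload)
    for result_set in doc["resultSets"]:
        if name is None or result_set["name"] == name:
            headers = [h.lower() for h in result_set["headers"]]
            return [dict(zip(headers, row)) for row in result_set["rowSet"]]
    raise KeyError(f"no result set named {name!r}")


# Made-up fragment for illustration only; not a real API response.
SAMPLE = """
{"resultSets": [{"name": "PlayerStats",
                 "headers": ["PLAYER_NAME", "FG3M"],
                 "rowSet": [["Player One", 10], ["Player Two", 7]]}]}
"""
```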
The NBA’s copyright is similar to that of the NFL; you may use a personal copy of the data on your own personal computer. If you “use, display or publish” anything using these data, you must include “a prominent attribution to http://www.nba.com”. Another difference is that the NBA data stored on http://stats.nba.com can go as far back as the 1960s, depending on the team.
Command
There are only four subcommands to nba2stata, though we could have developed more. Chris had to go back to school.
- To scrape player statistics data into Stata, use
nba2stata playerstats name_pattern [, playerstats_options]
- To scrape player profile data into Stata, use
nba2stata playerprofile name_pattern [, playerprofile_options]
- To scrape team statistics data into Stata, use
nba2stata teamstats team_adv [, teamstats_options]
- To scrape team roster data into Stata, use
nba2stata teamroster team_adv [, teamroster_options]
Just like with nfl2stata, you will need to use Stata commands like collapse, gsort, and merge to generate statistics, sort the data, and combine two or more NBA datasets to examine the data.
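Combining, say, player profiles with player statistics is a one-to-one merge on a player identifier. The sketch below mimics what Stata's `merge 1:1` does on lists of dicts, including the `_merge` indicator (1 = master only, 2 = using only, 3 = matched) and the default rule that master values win when both datasets share a variable. The key name `playerid` and the sample rows are hypothetical; check which identifier variables nba2stata actually creates before merging.

```python
def merge_1to1(master, using, key):
    """Rough analogue of Stata's `merge 1:1 key using ...` on lists of dicts."""
    using_by_key = {}
    for row in using:
        if row[key] in using_by_key:
            raise ValueError(f"{key} does not uniquely identify the using data")
        using_by_key[row[key]] = row

    merged, seen = [], set()
    for row in master:
        k = row[key]
        if k in seen:
            raise ValueError(f"{key} does not uniquely identify the master data")
        seen.add(k)
        if k in using_by_key:
            # Master values take precedence over using values, as in Stata's default.
            merged.append({**using_by_key[k], **row, "_merge": 3})
        else:
            merged.append({**row, "_merge": 1})
    # Unmatched using rows are appended at the end in this sketch.
    for k, row in using_by_key.items():
        if k not in seen:
            merged.append({**row, "_merge": 2})
    return merged


# Hypothetical datasets for illustration only.
profiles = [{"playerid": 1, "college": "A"}, {"playerid": 2, "college": "B"}]
stats = [{"playerid": 2, "fg3m": 5, "college": "X"}, {"playerid": 3, "fg3m": 6}]
```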
Examples
One thing I am always curious about is which college teams produce the most NBA players. This is easy to find out using nba2stata, collapse, and gsort.
. nba2stata playerprofile "_all", clear
Processing x/4308 requests
.........x.........x.........x.........x.........50
.........x.........x.........x.........x.........100
.........x.........x.........x.........x.........150
.........x.........x.........x.........x.........4250
.........x.........x.........x.........x.........4300
........
4308 observation(s) loaded
. save playerprofile, replace
(note: file playerprofile.dta not found)
file playerprofile.dta saved
. drop if college == ""
(114 observations deleted)
. gen ct = 1
. collapse (count) ct, by(college)
. gsort -ct
. list in 1/10
     +---------------------+
     | college          ct |
     |---------------------|
  1. | Kentucky         97 |
  2. | UCLA             86 |
  3. | North Carolina   80 |
  4. | Duke             70 |
  5. | Kansas           69 |
     |---------------------|
  6. | Indiana          57 |
  7. | Notre Dame       55 |
  8. | Louisville       53 |
  9. | Arizona          51 |
 10. | Syracuse         50 |
     +---------------------+
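The `gen ct = 1` and `collapse (count)` pair works because `(count)` counts nonmissing values, and `ct` is never missing, so each group's count is simply its number of players; Stata's `contract` command would produce the same frequencies in a variable named `_freq`. In Python, the same drop, count, and sort steps map onto `collections.Counter`. The rows and the `college` key below are hypothetical.

```python
from collections import Counter


def count_by(rows, field, n=10):
    """Drop blank values, count rows per value, return the n largest counts.

    Mirrors: drop if blank / collapse (count) by value / gsort -count / first n rows.
    Counter.most_common orders ties by first-seen order.
    """
    counts = Counter(r[field] for r in rows if r[field] != "")
    return counts.most_common(n)


# Hypothetical player rows for illustration only.
players = [{"college": c} for c in ["A", "B", "", "A", "C", "B", "A"]]
```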
Because of the amount of data fetched, you may want to save the player profile data after fetching it; it takes a while to download. On my machine, it took about an hour. The time largely depends on the amount of data that must be fetched. In the above case, that is all of the NBA’s player profile data.
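The save-once, reuse-later habit generalizes to any slow download: check for a local copy first and fetch only when it is missing (in Stata, that means `use playerprofile` instead of scraping again). A minimal Python sketch, with a hypothetical `fetch` callable standing in for the hour-long download and JSON as the assumed on-disk format:

```python
import json
import os


def load_or_fetch(path, fetch):
    """Return cached data from `path` if it exists; otherwise call fetch() and cache it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = fetch()  # the slow part, done only once
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

The design choice is deliberately simple: there is no expiry, so delete the cache file when you want fresh data.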
Another interesting example would be to find the oldest and youngest teams in the NBA. You can use the team roster to do this.
. nba2stata teamroster _all, season(2017) clear
Processing x/30 requests
.........x.........x.........x
502 observation(s) loaded
. collapse (mean) age, by(teamname)
. sort age
. list teamname age in 1/5
+---------------------------------+
| teamname age |
|---------------------------------|
1. | Phoenix Suns 24.4706 |
2. | Portland Trailblazers 24.8125 |
3. | Chicago Bulls 24.8889 |
4. | Atlanta Hawks 25.2222 |
5. | Brooklyn Nets 25.3529 |
+---------------------------------+
. list teamname age in -5/l
+---------------------------------+
| teamname age |
|---------------------------------|
26. | Washington Wizards 27.75 |
27. | San Antonio Spurs 28.3529 |
28. | Golden State Warriors 28.6667 |
29. | Cleveland Cavaliers 29 |
30. | Houston Rockets 29.1765 |
+---------------------------------+
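Collapsing to the mean age by team leaves one row per team; sorting ascending then puts the youngest teams first, and `in -5/l` lists the last five rows, the oldest. A Python sketch of the same steps on hypothetical rows (keys `team` and `age` are assumptions); like `collapse`, it skips missing ages (`None` here) when averaging.

```python
from collections import defaultdict


def mean_by(rows, group, value):
    """Mean of `value` per `group`, sorted ascending (youngest team first).

    Missing values (None) are skipped; a group with no nonmissing values is
    omitted here, whereas collapse would keep it with a missing mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        if r[value] is None:
            continue
        sums[r[group]] += r[value]
        counts[r[group]] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return sorted(means.items(), key=lambda item: item[1])


# Hypothetical roster rows for illustration only.
roster = [
    {"team": "A", "age": 20}, {"team": "A", "age": 22},
    {"team": "B", "age": 30},
    {"team": "C", "age": 25}, {"team": "C", "age": 26}, {"team": "C", "age": None},
]
```

Slicing the result with `[:5]` and `[-5:]` corresponds to `in 1/5` and `in -5/l`.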
Implementation
Again, Chris used Stata’s Java plugins and Gson to write the majority of the command.
