Data are everywhere. Many government agencies, financial institutions, universities, and social media platforms provide access to their data through an application programming interface (API). APIs often return the requested data in a JavaScript Object Notation (JSON) file. In this post, I’ll show you how to use Python to request data with API calls and how to work with the resulting JSON data.
If you’re not familiar with Python, it may be helpful to read the first four posts in my Stata/Python Integration series before you read further.
- Setting up Stata to use Python
- Three ways to use Python in Stata
- How to install Python packages
- How to use Python packages
APIs and JSON files
An API is a software interface that can be used to request data from another computer system. There are many different kinds of APIs, and the syntax is often unique to each API. But a typical API call consists of a URL followed by query options. For example, the URL below uses the openFDA API to request data about adverse drug events from the US Food and Drug Administration.
https://api.fda.gov/drug/event.json?
We can add options to our API call to narrow our data request. For example, the URL below requests the number of adverse events in the United States that involved Fentanyl from January 1, 2018, through January 5, 2018.
https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]+AND+occurcountry:"US"+AND+patient.drug.openfda.brand_name:"Fentanyl"&count=receivedate
We can type the URL for this API call in the address bar of a web browser, and the browser will display the resulting data as a JSON file.
JSON is a popular data file format that consists of a collection of key:value pairs. A “key” is similar to a variable in a Stata dataset, and a “value” is the data. The image above includes the key:value pair time:“20180105” near the bottom. The key is time, and the value is “20180105”.
JSON files are often nested. In the image above, the time: key is nested within the 4: key, and the 4: key is nested within the results: key. There are two keys at the top of the JSON nesting structure: meta: and results:.
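A short sketch of the same nesting in Python, using only the standard json module. The JSON text below is an abridged, hand-copied version of the results this post retrieves later (the meta block is cut down to a single key), not a live API response. It also shows that results is a JSON array, so the 4: that the browser displays is a list position rather than a named key.

```python
import json

# Abridged JSON text in the shape returned by the openFDA count query used in
# this post; values are copied from the output shown later, "meta" is shortened.
raw = """
{
  "meta": {"last_updated": "2020-09-09"},
  "results": [
    {"time": "20180101", "count": 1},
    {"time": "20180102", "count": 16},
    {"time": "20180103", "count": 20},
    {"time": "20180104", "count": 25},
    {"time": "20180105", "count": 24}
  ]
}
"""

data = json.loads(raw)        # JSON objects become dicts, arrays become lists

print(sorted(data.keys()))    # the two top-level keys: meta and results
fifth = data["results"][4]    # position 4 in the results list (the fifth entry)
print(fifth["time"], fifth["count"])
```

Each step down the structure is an ordinary dictionary lookup or list index, which is all we will need when we extract the results part of the response.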
Our goal is to use the openFDA API to request data about adverse drug events and convert the nested JSON data to a Stata dataset. We will be using the requests and pandas packages, so you should check that they are installed before we begin.
Define the URL for an API call
Let’s begin by defining a string named URL within a Python code block. The string URL contains the URL to request adverse event data using the openFDA API.
python:
URL = 'https://api.fda.gov/drug/event.json'
URL
end
The statement URL in the third line of the code block above displays the contents of the string URL.
. python:
---------------------------------------- python (type end to exit) -------------
>>> URL = 'https://api.fda.gov/drug/event.json'
>>> URL
'https://api.fda.gov/drug/event.json'
>>> end
--------------------------------------------------------------------------------
We can customize our data request by adding search fields and values to our API call. You can find the syntax, a list of searchable fields, and examples on the openFDA website.
Let’s restrict our search to adverse events reported from January 1, 2018, through January 5, 2018, by adding the ?search option to our API call in the code block below.
python:
URL = 'https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]'
URL
end
The syntax works, but we have added only one search field, and the URL for the API call is already difficult to read. Let’s split the URL into two strings: API and date. The string API contains the URL for the basic API call, and the string date narrows our search to January 1, 2018, through January 5, 2018. Then, we can combine the strings by typing URL = API + date.
python:
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
URL = API + date
URL
end
Our Python code block is easier to read, and the URL for the API call is still the same.
. python:
-------------------------------------------- python (type end to exit) --------
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> URL = API + date
>>> URL
'https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]'
>>> end
-------------------------------------------------------------------------------
Let’s further restrict our query to adverse events that occurred in the United States from January 1, 2018, through January 5, 2018. In the code block below, the string country contains syntax that restricts our query to the United States. Then, we can combine the strings API, date, and country to specify the complete API call stored in the string URL. Note that we must include "+AND+" between date and country when we define URL.
python:
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
URL = API + date + "+AND+" + country
URL
end
Our code remains easy to read even as the URL for our API call becomes more complex.
. python:
-------------------------------------------- python (type end to exit) --------
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> URL = API + date + "+AND+" + country
>>> URL
'https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]
> +AND+occurcountry:"US"'
>>> end
-------------------------------------------------------------------------------
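As more conditions are added, one option (a variation on the approach in this post, not something openFDA requires) is to keep the search conditions in a list and let str.join() insert +AND+ between them. The strings below are the ones defined above; terms is a hypothetical name introduced for this sketch.

```python
# Strings from the post; each entry in terms is one field:value condition.
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'

# "+AND+".join() puts the separator only between conditions, so adding or
# removing a condition means editing the list, not the concatenation.
terms = [date, country]
URL = API + "+AND+".join(terms)
print(URL)
```

The resulting string is identical to the one built with + above; the list form simply scales better as conditions accumulate.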
We can use a similar strategy to further restrict our query to adverse events that involved the drug Fentanyl. The string drug in the code block below contains the syntax that specifies that the adverse event involved Fentanyl.
python:
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
URL = API + date + "+AND+" + country + "+AND+" + drug
URL
end
Finally, let’s specify that our results contain data for the number of adverse events reported on each day. The string data in the code block below contains the appropriate syntax and must be added to the end of the URL for the API call. Note that the string data must be preceded by & rather than +AND+, because count is a separate query parameter rather than another condition in the search expression.
python:
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
URL
end
Our code block is still easy to read even though the URL for our API call has become quite complex.
. python:
-------------------------------------------- python (type end to exit) --------
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
>>> data = 'count=receivedate'
>>> URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
>>> URL
'https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]
> +AND+occurcountry:"US"+AND+patient.drug.openfda.brand_name:"Fentanyl"
> &count=receivedate'
>>> end
-------------------------------------------------------------------------------
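To see why count is attached with & while the search conditions are joined with +AND+, we can take the finished URL apart with the standard library’s urllib.parse. parse_qs() splits the query string at & into separate parameters and decodes each + to a space. As far as we understand the openFDA query syntax, that is how the server reads the search parameter: conditions joined by the word AND.

```python
from urllib.parse import parse_qs, urlsplit

# The complete URL assembled above.
URL = (
    "https://api.fda.gov/drug/event.json?search=receivedate:[20180101+TO+20180105]"
    '+AND+occurcountry:"US"+AND+patient.drug.openfda.brand_name:"Fentanyl"'
    "&count=receivedate"
)

query = urlsplit(URL).query   # everything after the "?"
params = parse_qs(query)      # splits on "&", decodes "+" as a space

print(params["count"])        # one parameter: count
print(params["search"][0])    # the other: the whole search expression
```

The output has exactly two parameters, search and count, which is why the count option cannot simply be appended with another +AND+.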
Requesting data using an API call
Now, we’re ready to submit our API call to the openFDA data server. Let’s begin by importing the requests package. We can use the requests.get() function to submit the URL for our API call and the json() method of the response to decode the JSON data it returns. Then, we will store the resulting JSON data in a dictionary object named data. This replaces the string data we defined earlier, which we no longer need once URL has been built.
python:
import requests
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
data = requests.get(URL).json()
data
end
We can view the contents of the data object by typing data, as in the code block above. The data displayed in the output below are difficult to read because they have not been formatted for display.
. python:
-------------------------------------------- python (type end to exit) --------
>>> import requests
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
>>> data = 'count=receivedate'
>>> URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
>>> data = requests.get(URL).json()
>>> data
{'meta': {'disclaimer': 'Do not rely on openFDA to make decisions regarding
> medical care. While we make every effort to ensure that data is accurate, you
> should assume all results are unvalidated. We may limit or otherwise restrict
> your access to the API in line with our Terms of Service.', 'terms':
> 'https://open.fda.gov/terms/', 'license': 'https://open.fda.gov/license/',
> 'last_updated': '2020-09-09'}, 'results': [{'time': '20180101', 'count': 1},
> {'time': '20180102', 'count': 16}, {'time': '20180103', 'count': 20},
> {'time': '20180104', 'count': 25}, {'time': '20180105', 'count': 24}]}
>>> end
-------------------------------------------------------------------------------
We can use the json module to display the data in a more readable format. Let’s begin by importing the json module in the code block below. Then, we can use the dumps() function to convert the data dictionary to a JSON-formatted string. The indent=4 option indents each level of nesting by four spaces. The sort_keys=True option sorts the keys alphabetically within each level. And print() tells Python to display the string returned by dumps().
python:
import requests
import json
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
data = requests.get(URL).json()
print(json.dumps(data, indent=4, sort_keys=True))
end
The data in the output below are easier to read. We can now see that the data are nested within the keys meta and results. The meta key contains a disclaimer, the date the data were last updated, and URLs for the license and the terms of use. This is useful information, but I don’t want to include it in my dataset. I want to use only the data stored in the results key.
. python:
-------------------------------------------- python (type end to exit) --------
>>> import requests
>>> import json
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
>>> data = 'count=receivedate'
>>> URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
>>> data = requests.get(URL).json()
>>> print(json.dumps(data, indent=4, sort_keys=True))
{
    "meta": {
        "disclaimer": "Do not rely on openFDA to make decisions regarding
> medical care. While we make every effort to ensure that data is accurate,
> you should assume all results are unvalidated. We may limit or otherwise
> restrict your access to the API in line with our Terms of Service.",
        "last_updated": "2020-09-09",
        "license": "https://open.fda.gov/license/",
        "terms": "https://open.fda.gov/terms/"
    },
    "results": [
        {
            "count": 1,
            "time": "20180101"
        },
        {
            "count": 16,
            "time": "20180102"
        },
        {
            "count": 20,
            "time": "20180103"
        },
        {
            "count": 25,
            "time": "20180104"
        },
        {
            "count": 24,
            "time": "20180105"
        }
    ]
}
>>> end
-------------------------------------------------------------------------------
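A minimal illustration of the two dumps() options on a single entry from results. sort_keys=True reorders keys alphabetically, which is why count is printed before time above even though the decoded dictionary lists time first. indent=4 spreads the object over several lines. Without these options, dumps() returns a single line in the dictionary’s insertion order.

```python
import json

row = {"time": "20180101", "count": 1}   # one entry from the results list

# sort_keys=True puts "count" before "time"; indent=4 places each key on its
# own line, indented by four spaces.
text = json.dumps(row, indent=4, sort_keys=True)
print(text)

# Default behavior: one line, keys in insertion order.
compact = json.dumps(row)
print(compact)
```

In both cases dumps() returns a string; only print() writes it to the screen.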
Convert the JSON data to a Stata dataset
We can use the dictionary’s get() method to extract the results portion of the data object and place it in a list object named fdadata. The second argument, [], is the value get() returns if the results key is missing, so fdadata is always a list.
python:
import requests
import json
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
data = requests.get(URL).json()
fdadata = data.get('results', [])
print(json.dumps(fdadata, indent=4, sort_keys=True))
end
We can verify that we extracted the data successfully by viewing the output below.
. python:
-------------------------------------------- python (type end to exit) --------
>>> import requests
>>> import json
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
>>> data = 'count=receivedate'
>>> URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
>>> data = requests.get(URL).json()
>>> fdadata = data.get('results', [])
>>> print(json.dumps(fdadata, indent=4, sort_keys=True))
[
    {
        "count": 1,
        "time": "20180101"
    },
    {
        "count": 16,
        "time": "20180102"
    },
    {
        "count": 20,
        "time": "20180103"
    },
    {
        "count": 25,
        "time": "20180104"
    },
    {
        "count": 24,
        "time": "20180105"
    }
]
>>> end
-------------------------------------------------------------------------------
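The [] default keeps the code from failing when results is absent, but it also hides the reason. A slightly more defensive sketch follows. It assumes, based on our reading of the openFDA documentation, that an unsuccessful query (for example, one with no matching reports) returns an error object with code and message keys instead of results. extract_results() is a hypothetical helper, and the payloads are hand-written examples, not real API responses.

```python
def extract_results(payload):
    """Return the list stored under "results" in a decoded openFDA response.

    Assumption: a failed or empty query comes back with an "error" object
    instead of "results"; raising here surfaces its message rather than
    silently producing an empty dataset.
    """
    if "error" in payload:
        err = payload["error"]
        raise ValueError(f"openFDA error {err.get('code')}: {err.get('message')}")
    return payload.get("results", [])


# Hand-written payloads for illustration only.
ok = {"meta": {}, "results": [{"time": "20180101", "count": 1}]}
missing = {"error": {"code": "NOT_FOUND", "message": "No matches found!"}}

print(extract_results(ok))
try:
    extract_results(missing)
except ValueError as exc:
    print(exc)
```

With a real response, you would pass the decoded data dictionary in place of ok.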
The data in the fdadata list object are still in “key:value” format, and I would like to convert them to the “rows-and-columns” format of a pandas data frame. Let’s begin by importing the pandas module using the alias pd. Because the read_json() function reads JSON text rather than a Python list, we first use json.dumps() to convert fdadata back to a JSON string, and then use read_json() to read it into a pandas data frame named fda_df.
python:
import requests
import json
import pandas as pd
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
data = requests.get(URL).json()
fdadata = data.get('results', [])
fda_df = pd.read_json(json.dumps(fdadata))
fda_df
end
The data frame fda_df displayed in the output below contains five rows and three columns. The first column is the index for the data frame. The second column, named “time”, contains the date of each observation. And the third column, named “count”, contains the number of adverse events in the United States that involved Fentanyl on that date.
. python:
-------------------------------------------- python (type end to exit) --------
>>> import requests
>>> import json
>>> import pandas as pd
>>> API = 'https://api.fda.gov/drug/event.json?search='
>>> date = 'receivedate:[20180101+TO+20180105]'
>>> country = 'occurcountry:"US"'
>>> drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
>>> data = 'count=receivedate'
>>> URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
>>> data = requests.get(URL).json()
>>> fdadata = data.get('results', [])
>>> fda_df = pd.read_json(json.dumps(fdadata))
>>> fda_df
       time  count
0  20180101      1
1  20180102     16
2  20180103     20
3  20180104     25
4  20180105     24
>>> end
-------------------------------------------------------------------------------
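Two alternatives worth knowing about, sketched here with the five results entries shown above. Because fdadata is already a list of dictionaries, pd.DataFrame() can build the data frame directly; the time values then stay as text, so the sketch parses them into dates with pd.to_datetime(). If you prefer to keep read_json(), wrapping the JSON text in io.StringIO avoids the FutureWarning that recent pandas releases (2.1 and later, to our knowledge) issue when read_json() is given a literal JSON string.

```python
import json
from io import StringIO

import pandas as pd

# The five entries of the results list shown above.
fdadata = [
    {"time": "20180101", "count": 1},
    {"time": "20180102", "count": 16},
    {"time": "20180103", "count": 20},
    {"time": "20180104", "count": 25},
    {"time": "20180105", "count": 24},
]

# Alternative 1: build the data frame straight from the list of dictionaries.
# "time" stays a string, so parse it into a proper date column explicitly.
fda_df = pd.DataFrame(fdadata)
fda_df["date"] = pd.to_datetime(fda_df["time"], format="%Y%m%d")

# Alternative 2: keep read_json(), but hand it a file-like StringIO object
# instead of a literal JSON string.
fda_df2 = pd.read_json(StringIO(json.dumps(fdadata)))

print(fda_df)
print(fda_df2["count"].sum())
```

Either data frame can then be passed to to_stata() as in the next step.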
Now, we can use the to_stata() method to save the pandas data frame fda_df to a Stata dataset named fentanyl.dta. The version=118 option specifies that the data will be saved in Stata’s format 118, which Stata 14 and later, including Stata 16, can read.
python:
import requests
import json
import pandas as pd
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20180101+TO+20180105]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
data = requests.get(URL).json()
fdadata = data.get('results', [])
fda_df = pd.read_json(json.dumps(fdadata))
fda_df.to_stata('fentanyl.dta', version=118)
end
We can list the contents of the Stata data file fentanyl.dta to verify that the data were saved correctly.
. use fentanyl.dta, clear

. list

     +--------------------------+
     | index       time   count |
     |--------------------------|
  1. |     0   20180101       1 |
  2. |     1   20180102      16 |
  3. |     2   20180103      20 |
  4. |     3   20180104      25 |
  5. |     4   20180105      24 |
     +--------------------------+
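The index variable in the listing comes from the pandas row index, which to_stata() writes by default. A small sketch, assuming you would rather not drop it in Stata afterward: pass write_index=False, then read the file back with pd.read_stata() to confirm which variables were saved. The data frame is rebuilt by hand from the listing above, and the file goes to a temporary folder so the example does not overwrite your own fentanyl.dta.

```python
import os
import tempfile

import pandas as pd

# Rebuilt from the listing above (time as YYYYMMDD numbers, as read_json gave us).
fda_df = pd.DataFrame({
    "time": [20180101, 20180102, 20180103, 20180104, 20180105],
    "count": [1, 16, 20, 25, 24],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fentanyl.dta")
    # write_index=False leaves out the pandas row index, so the .dta file
    # has no extra "index" variable to drop later.
    fda_df.to_stata(path, version=118, write_index=False)
    check = pd.read_stata(path)   # read it back to confirm what was saved

print(check)
```

With this option, the drop index line in the do-file below would no longer be needed.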
At this point, it would be easy to expand the range of dates in our API call to January 1, 2010, through January 1, 2020, and graph the resulting data (see the code block below).
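Expanding the range only requires changing the date string. A hypothetical helper like the one below builds that string from two Python dates, so the YYYYMMDD+TO+YYYYMMDD pattern used throughout this post is written in one place. The function name is illustrative, not part of requests or openFDA.

```python
import datetime as dt


def receivedate_term(start, end):
    """Build an openFDA receivedate range condition from two dates.

    Hypothetical helper: the pattern is copied from the date string in this post.
    """
    return f"receivedate:[{start:%Y%m%d}+TO+{end:%Y%m%d}]"


print(receivedate_term(dt.date(2018, 1, 1), dt.date(2018, 1, 5)))
print(receivedate_term(dt.date(2010, 1, 1), dt.date(2020, 1, 1)))
```

The second call produces the range used in example.do.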
Conclusion
We did it! We successfully submitted an API call to openFDA, processed the resulting JSON data, and converted the JSON data to a Stata dataset. You may not be interested in adverse drug events reported to the FDA. But you can use similar steps to download and process all kinds of data that are useful to you. Just type “popular api data” in your search engine, and prepare to be amazed. Each API will have its own unique search fields and syntax, so you will need to read the documentation. But your patience and persistence will be rewarded with a world full of data.
I’ve collected the code below and added comments to remind you of the purpose of each group of Python statements.
example.do
python:
# Import packages
import requests
import json
import pandas as pd
# Construct the URL for the API call
API = 'https://api.fda.gov/drug/event.json?search='
date = 'receivedate:[20100101+TO+20200101]'
country = 'occurcountry:"US"'
drug = 'patient.drug.openfda.brand_name:"Fentanyl"'
data = 'count=receivedate'
URL = API + date + "+AND+" + country + "+AND+" + drug + "&" + data
# Submit the API data request
data = requests.get(URL).json()
# Extract the 'results' part of the JSON data
fdadata = data.get('results', [])
# Convert the JSON data to a pandas data frame
fda_df = pd.read_json(json.dumps(fdadata))
# Use pandas to write the data frame to a Stata dataset (format 118)
fda_df.to_stata('fentanyl.dta', version=118)
end
use fentanyl.dta, clear
drop index
generate date = mofd(date(string(time, "%8.0f"),"YMD"))
format date %tm
collapse (sum) count, by(date)
tsset date, monthly
twoway (line count date, lcolor(blue) lwidth(medthick)), ///
       ytitle("Adverse Events Reported to the FDA") ///
       ylabel(0(2000)8000, angle(horizontal) grid) ///
       xtitle("") ///
       title("Fentanyl Adverse Events Reported to the FDA") ///
       caption(Data Source: openFDA, size(small)) ///
       scheme(s1color)
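The generate, collapse, and tsset steps above aggregate the daily counts to months in Stata. If you would rather do that aggregation in pandas before saving the file, a comparable sketch follows. It is an alternative to the Stata code, not part of the original example.do, and it uses only the five days retrieved earlier, which all fall in January 2018.

```python
import pandas as pd

# Daily counts in the shape produced by read_json() earlier in this post.
fda_df = pd.DataFrame({
    "time": [20180101, 20180102, 20180103, 20180104, 20180105],
    "count": [1, 16, 20, 25, 24],
})

# Mirror the Stata steps: turn YYYYMMDD numbers into dates, then sum by month.
fda_df["date"] = pd.to_datetime(fda_df["time"].astype(str), format="%Y%m%d")
monthly = fda_df.groupby(fda_df["date"].dt.to_period("M"))["count"].sum()

print(monthly)
```

Converting the monthly periods to timestamps (for example, with monthly.index.to_timestamp()) would be needed before writing them with to_stata().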
