7 Web API
Contents
- Working with APIs in Julia
- Working with APIs in SAS
- Working with APIs in R
- Working with APIs in Python
Application programming interfaces, or API’s, loosely refer to inputs a user can submit to a computer program. If that sounds vague, don’t worry. The term API can cover a broad range of situations and in this section of the tutorial we are going to cover some of the basics of a more specific Web API.
For statistical purposes, a web API is a method of integrating data from a website into your code. Unlike SQL queries, web API requests are specific to the website you are attempting to download from to allow more flexibility. Web API’s are used by a variety of developers to request and retrieve data, and the tabular structure of returned by a SQL query may not be ideal. Web API’s utilize specific URL strings to locate the data that you wish to request, and typically give it to you in a format known as JSON. To address the first requirement, it’s ideal to do a quick web search and find the documentation of the API so that you understand the URL syntax. Assuming you have that correct and the website grants your request for data, the last step is to convert the JSON data to something a bit more usable for statistics.
APIs can be integrated with databases by writing data to a database once it has been obtained from an API. An example of why you may want to do this is if the API doesn’t keep all historical records. For instance, the Twitter API only allows you to search trends of the past two weeks. If you wanted to track specific trends for a whole year, you could periodically query the Twitter API and add new rows to a table in your database. By the end of the year, you would have a dataset of trends from the entire year, even though this data would no longer be available through the API. Another example of why you would want to integrate APIs with databases is if you need to do a lot of data manipulation beore the data is usable. Rather than performing alterations everytime you need to load the data, you could perform the alterations once, store the final tables to a database, and pull the usable data from there. Some APIs limit the number of requests you can make, so in these cases it can be beneficial to store data in a database that you can query as frequently as you’d like. Finally, you may want to put data from an API into a database if you don’t always have reliable internet when you’re working on your project, and your database is stored locally on your computer. While you need an internet connection to pull from an API, you dont need internet to pull from a local database.
One of the most useful parts of a web API is the Query String. The query string is not found on all apis, but it is commonly used when pulling data where a user might need to specify additional material, like what column names you’d like to query or what year of data that you’d like to pull. Concretely, the query string comes after the main url, or endpoint, and is separated by a question mark (?). Each new parameter is designated by an ampersand (&). An example could look like:
https.//fake.data.com/fakeapi?&parm1=value1&parm2=value2
If you know the query string, you could type it into your web browser, and get the raw JSON data printed. There’s generally two ways to specify query string parameters. You can specify them directly into the url string, which will always work. If the package or programming language allows it, there might be a more user friendly way to enter these parameters in, where the software appends the query string to the url itself. Like API’s in general, query strings are specific to the website, but many of them are designed to be familiar.
In this section, we’ll be using the COVID tracking API. More information on this API can be found here.
7.1 Julia
In Julia, we access APIs using the HTTP
and JSON
packages.
imports
The HTTP request will be made using the aptly named HTTP package, then the JSON data will be parsed using the JSON package. If you’re reading in data that’s not as tabular as a spreadsheet, consider using the the DataKnots package. DataKnots also integrates nicely with the JSONTables package, which could also be used to parse any JSON data where you’re more interested in the dataset as a whole. Check out using these in the last Julia example.
using HTTP, JSON, DataFrames, Query
Enter the Correct URL and make the request
= "https://covidtracking.com/api/v1/states/CA/current.json"
urlcacurrent = HTTP.get(urlcurrent) rawdata
The “get” function requests data from the api, and returns the data in JSON format.
Parse the JSON data
= rawdata.body |> String JSONdata
Now, we convert the data, stored in the body of the API object, into a string.
= JSONdata |> JSON.parse |> DataFrame CACOVID
Finally, we parse through the JSON String and convert the data into a usable DataFrame.
With A Query String using the LA City Data API
using DataKnots, HTTP, JSONTables, DataFrames
= "https://data.lacity.org/resource/63jg-8b9z.json"
url = Dict("\$query" => "SELECT * WHERE mocodes IS NOT NULL")
parms
= HTTP.get(url,query=parms).body |> String |> jsontable crimedata
Similar to above, but the jsontable function returns a Tables.jl object with structure, rather than just the values.
= DataKnot(:crime => crimedata )
base base[It.crime]
Finally, put the data into a useable DataKnot.
7.2 SAS
The SAS option requires 3 steps to work properly. First, you must define a where you will store the data from the API. Then, use proc http to make the api call
define a temporary file to read the JSON data into
filename raw_json temp;
use proc http to make the API call Use get to
proc http
url = "https://covidtracking.com/api/v1/us/current.json"
method = "GET"
out = raw_json
QUERY=("parm1" = "val1" "parm2" = "val2");
quit;
This api uses the path segment of the url to specify parameters, but if you are using an an api that utilizes a query string use
the query option as listed below, or append it directly to the url. If doing the latter, check the api documentation to see what delimiter is used between parameters. Parameters are typically specified in "parm" = "val"
form.
use the libname statement to convert the json file to a sas table
libname Calif JSON fileref = raw_json;
proc print data = Calif.alldata;
run;
You’ll note that because there was an internal object, the category
column is still in JSON format.
7.3 R
To connect to APIs in R, we utilize the httr
and jsonlite
packages.
library(httr)
library(jsonlite)
To get data from an API, we’ll use the GET
command from the httr
library. This function takes in a URL and can also take in extra parameters, depending on the API.
GET("https://covidtracking.com/api/v1/states/current.json")
res <- res
Response [https://covidtracking.com/api/v1/states/current.json]
Date: 2020-05-20 03:17
Status: 200
Content-Type: application/json
Size: 45 kB
Looking into the res variable, we see a description of our request and what was returned. To look at the data received, we convert the unicode content of the response into characters. APIs typically return data in JSON format, so we can use the fromJSON
function from the jsonlite
package to translate this into a dataframe.
res$content %>%
json_data <- rawToChar()
json_data %>%
data <- fromJSON()
head(data)
state positive positiveScore
1 AK 399 ...
2 AL 12376 ...
...
As an example of using an API with other query parameters that you can pass in, lets look at the API for jeopardy trivia questions! Information on this API can be found here.
The “random question” section of the API allows you to to pass in a “count” parameter for how many questions you would like. The GET
function in R can take these query specifications through the query
parameter as a named list.
GET(
res <-"http://jservice.io/api/random",
query = list(
count = 2
# could insert other query specifications here!
)
)
res$content %>%
questions <- rawToChar() %>%
fromJSON()
questions
id answer question value
1 71610 a centennial A celebration to mark at 10-decade anniversary 1000 ...
2 71611 Patagonia ...
...
7.4 Python
To connect with APIs in Python, we use the requests
package. We’ll also use the pandas
package later to convert the data into a DataFrame.
import requests
import pandas as pd
To get data from an API, we’ll use the get
command from the requests
library. This function takes in a URL and can also take in extra parameters, depending on the API.
requests.get("https://covidtracking.com/api/v1/states/current.json")
req = req
<Response [200]>
Looking into the res variable, we see that our request was successful (status 200). To look at the data received, can use the json
method of the request. We can convert this into a dataframe with the DataFrame
function from pandas
.
req.json()
json_data = pd.DataFrame(json_data)
data =data.head()
state ... hash
0 AK ... 22375f1c5b622e4c0aee670cc25b05584c1e3341
1 AL ... dc3754f5631d38c30317c938b33a1cbcb5a19bee
Same as in the R
section, we’ll use the API for jeopardy trivia questions to give an example of how to pass a query parameter to the API.
The “random question” section of the API allows you to to pass in a “count” parameter for how many questions you would like. The requests.get
function in Python can take these query specifications through the params
parameter as a dictionary of strings.
requests.get("http://jservice.io/api/random", params = {"count": "2"})
req = req.json()
json_data = pd.DataFrame(json_data)
data = data
id ... category
0 59388 ... {'id': 2741, 'title': 'think pink'...
1 59389 ... {'id': 7597, 'title': 'veggie-pourri'...