7 Web API

Contents

  1. Working with APIs in Julia
  2. Working with APIs in SAS
  3. Working with APIs in R
  4. Working with APIs in Python

Application programming interfaces, or API’s, loosely refer to inputs a user can submit to a computer program. If that sounds vague, don’t worry. The term API can cover a broad range of situations and in this section of the tutorial we are going to cover some of the basics of a more specific Web API.

For statistical purposes, a web API is a method of integrating data from a website into your code. Unlike SQL queries, web API requests are specific to the website you are attempting to download from to allow more flexibility. Web API’s are used by a variety of developers to request and retrieve data, and the tabular structure of returned by a SQL query may not be ideal. Web API’s utilize specific URL strings to locate the data that you wish to request, and typically give it to you in a format known as JSON. To address the first requirement, it’s ideal to do a quick web search and find the documentation of the API so that you understand the URL syntax. Assuming you have that correct and the website grants your request for data, the last step is to convert the JSON data to something a bit more usable for statistics.

APIs can be integrated with databases by writing data to a database once it has been obtained from an API. An example of why you may want to do this is if the API doesn’t keep all historical records. For instance, the Twitter API only allows you to search trends of the past two weeks. If you wanted to track specific trends for a whole year, you could periodically query the Twitter API and add new rows to a table in your database. By the end of the year, you would have a dataset of trends from the entire year, even though this data would no longer be available through the API. Another example of why you would want to integrate APIs with databases is if you need to do a lot of data manipulation beore the data is usable. Rather than performing alterations everytime you need to load the data, you could perform the alterations once, store the final tables to a database, and pull the usable data from there. Some APIs limit the number of requests you can make, so in these cases it can be beneficial to store data in a database that you can query as frequently as you’d like. Finally, you may want to put data from an API into a database if you don’t always have reliable internet when you’re working on your project, and your database is stored locally on your computer. While you need an internet connection to pull from an API, you dont need internet to pull from a local database.

One of the most useful parts of a web API is the Query String. The query string is not found on all apis, but it is commonly used when pulling data where a user might need to specify additional material, like what column names you’d like to query or what year of data that you’d like to pull. Concretely, the query string comes after the main url, or endpoint, and is separated by a question mark (?). Each new parameter is designated by an ampersand (&). An example could look like:

https.//fake.data.com/fakeapi?&parm1=value1&parm2=value2

If you know the query string, you could type it into your web browser, and get the raw JSON data printed. There’s generally two ways to specify query string parameters. You can specify them directly into the url string, which will always work. If the package or programming language allows it, there might be a more user friendly way to enter these parameters in, where the software appends the query string to the url itself. Like API’s in general, query strings are specific to the website, but many of them are designed to be familiar.

In this section, we’ll be using the COVID tracking API. More information on this API can be found here.

7.1 Julia

In Julia, we access APIs using the HTTP and JSON packages.

imports

The HTTP request will be made using the aptly named HTTP package, then the JSON data will be parsed using the JSON package. If you’re reading in data that’s not as tabular as a spreadsheet, consider using the the DataKnots package. DataKnots also integrates nicely with the JSONTables package, which could also be used to parse any JSON data where you’re more interested in the dataset as a whole. Check out using these in the last Julia example.

using HTTP, JSON, DataFrames, Query

Enter the Correct URL and make the request

urlcacurrent = "https://covidtracking.com/api/v1/states/CA/current.json"
rawdata = HTTP.get(urlcurrent)

The “get” function requests data from the api, and returns the data in JSON format.

Parse the JSON data

JSONdata = rawdata.body |> String

Now, we convert the data, stored in the body of the API object, into a string.

CACOVID = JSONdata |> JSON.parse |> DataFrame

Finally, we parse through the JSON String and convert the data into a usable DataFrame.

With A Query String using the LA City Data API

using DataKnots, HTTP, JSONTables, DataFrames

url = "https://data.lacity.org/resource/63jg-8b9z.json"
parms = Dict("\$query" => "SELECT * WHERE mocodes IS NOT NULL")

crimedata = HTTP.get(url,query=parms).body |> String |> jsontable

Similar to above, but the jsontable function returns a Tables.jl object with structure, rather than just the values.

base = DataKnot(:crime => crimedata )
base[It.crime]

Finally, put the data into a useable DataKnot.

7.2 SAS

The SAS option requires 3 steps to work properly. First, you must define a where you will store the data from the API. Then, use proc http to make the api call

define a temporary file to read the JSON data into

filename raw_json temp;

use proc http to make the API call Use get to

proc http
url = "https://covidtracking.com/api/v1/us/current.json"
method = "GET"
out = raw_json

QUERY=("parm1" = "val1" "parm2" = "val2");

quit;

This api uses the path segment of the url to specify parameters, but if you are using an an api that utilizes a query string use the query option as listed below, or append it directly to the url. If doing the latter, check the api documentation to see what delimiter is used between parameters. Parameters are typically specified in "parm" = "val" form.

use the libname statement to convert the json file to a sas table

libname Calif JSON fileref = raw_json;

proc print data = Calif.alldata;
run;

You’ll note that because there was an internal object, the category column is still in JSON format.

7.3 R

To connect to APIs in R, we utilize the httr and jsonlite packages.

library(httr)
library(jsonlite)

To get data from an API, we’ll use the GET command from the httr library. This function takes in a URL and can also take in extra parameters, depending on the API.

res <- GET("https://covidtracking.com/api/v1/states/current.json")
res
Response [https://covidtracking.com/api/v1/states/current.json]
  Date: 2020-05-20 03:17
  Status: 200
  Content-Type: application/json
  Size: 45 kB

Looking into the res variable, we see a description of our request and what was returned. To look at the data received, we convert the unicode content of the response into characters. APIs typically return data in JSON format, so we can use the fromJSON function from the jsonlite package to translate this into a dataframe.

json_data <- res$content %>% 
  rawToChar() 
  
data <- json_data %>% 
  fromJSON()
  
head(data)
state positive positiveScore
1     AK       399            ...
2     AL       12376          ...
...

As an example of using an API with other query parameters that you can pass in, lets look at the API for jeopardy trivia questions! Information on this API can be found here.

The “random question” section of the API allows you to to pass in a “count” parameter for how many questions you would like. The GET function in R can take these query specifications through the query parameter as a named list.

res <- GET(
  "http://jservice.io/api/random",
  query = list(
    count = 2
    # could insert other query specifications here!
  )
)

questions <- res$content %>% 
  rawToChar() %>% 
  fromJSON()
  
questions
     id       answer                                         question value
1 71610 a centennial   A celebration to mark at 10-decade anniversary  1000  ...
2 71611    Patagonia   ...
...

7.4 Python

To connect with APIs in Python, we use the requests package. We’ll also use the pandas package later to convert the data into a DataFrame.

import requests
import pandas as pd

To get data from an API, we’ll use the get command from the requests library. This function takes in a URL and can also take in extra parameters, depending on the API.

req = requests.get("https://covidtracking.com/api/v1/states/current.json")
req
<Response [200]>

Looking into the res variable, we see that our request was successful (status 200). To look at the data received, can use the json method of the request. We can convert this into a dataframe with the DataFrame function from pandas.

json_data = req.json()
data = pd.DataFrame(json_data)
data.head()
  state  ...                                      hash
0    AK  ...  22375f1c5b622e4c0aee670cc25b05584c1e3341
1    AL  ...  dc3754f5631d38c30317c938b33a1cbcb5a19bee

Same as in the R section, we’ll use the API for jeopardy trivia questions to give an example of how to pass a query parameter to the API.

The “random question” section of the API allows you to to pass in a “count” parameter for how many questions you would like. The requests.get function in Python can take these query specifications through the params parameter as a dictionary of strings.

req = requests.get("http://jservice.io/api/random", params = {"count": "2"})
json_data = req.json()
data = pd.DataFrame(json_data)
data
   id  ...                                   category
0  59388  ...  {'id': 2741, 'title': 'think pink'...
1  59389  ...  {'id': 7597, 'title': 'veggie-pourri'...