Extract, Transform & Load REST API data in Azure SQL Database

Introduction

In this series of blogs, I’ll show how to load JSON Documents from an API call into Azure SQL Database. I recently implemented this for a client. During development, I realised there were numerous ways this could be achieved. I’ll illustrate just one flavour of this. Undoubtedly, you’ll find alternative solutions that might be more optimal, however, I write this with SQL Server Developers in mind. Here’s an architecture overview of the approach I’ll take.

In addition to the above resources, I make use of Postman to do some API Testing. Postman is an application for testing APIs by sending requests to the web server in order to get a response back. I aim to use Postman in order to understand how the response data is structured and figure out the best way to consume it.

It’s worth mentioning that there are databases such as; MongoDB and Couchbase that store JSON Documents natively. Nevertheless, I’ll be focusing solely on traditional databases.

It’s also worth mentioning that JSON or JavaScript Object Notation is a type of semi-structured data. Semi-structured data is data that isn’t confined to a predefined schema such as that found in a relational database. In relational databases, columns need to be defined ahead of time. Instead, JSON holds data in a flexible key/value pair structure. It has been popularized by Web Services and APIs (Application Program Interface). An API is a set of definitions and protocols that allow one application to communicate with another.

Part 1 – Data Exploration

Objectives

  • Understand how the API data is structured and formatted.
  • Understand the API’s query parameters and configuration settings.
  • Run a few GET Requests based on the above understanding.

Background

In this series of blogs, I’ll be consuming data from the Land Registry API. The main reason being; the API doesn’t require the use of an authentication key, and so, I can rapidly develop a Minimum Viable Product. In this first part, of a 3-part series of blog, I’m keen to show the approach I took in understanding how the API should be consumed.

Scenario

Using Land Registry data, I’m interested in extracting “Last Paid Prices” of properties within a chosen vicinity.

Documentation

Reading the supporting documentation for the API is always a great first step in figuring out how the data should be consumed.

Examine the API documentation
Examine the API configuration

At this point I’m aiming to understand the following:

  • Structure of the data – particularly the hierarchy of the data.
  • Data Format – API responses can default to XML or JSON. If the default is XML, returning results as JSON is as easy as appending “.Json” to the endpoint (URL).
  • Query Parameters – these are your predicates, “WHERE” statement in SQL. Therefore, they must equal a given value.
  • API Configuration Settings – pagination & offset. These allow us to specify how many items are returned per page and specify the next set of results to be returned. In our instance the default items per page is 10.

Structure of the data

Price Paid Dataset (PPD) Model

Source: Landregistry

The above PPD Model is neatly presented in Star Schema, click here for a light introduction to Star Schemas. The data can be accessed programmatically using a GET request (see below).

Base URL
http://landregistry.data.gov.uk

Endpoint
/data/ppi/transaction-record

Query_Parameter
?estateType.label
?hasTransaction
?propertyAddress.county
?propertyAddress.district
?propertyAddress.locality
?propertyAddress.paon
?propertyAddress.postcode
?propertyAddress.saon
?propertyAddress.street
?propertyAddress.town
?propertyType.label
?recordStatus.label
?transactionCategory.label
?newBuild
?pricePaid
?transactionDate
?transactionId

Get Request

Base URL + Endpoint + “.Json” + Query_Parameter

Click here to view the values the Query_Parameters can take. For example, “property type” can take on values such as; “Detached, Semi-Detached, Terraced and Flat/Maisonette.” It’s worth noting that the values are case-sensitive.

API Configuration

Configuration

page

itemsPerPage

startIndex

Value

Default = 1

Default = 10

(page * itemsPerPage) + 1

Description

Use page for pagination

Items returned per query

Start position of result set

Modification

Can be modified

Cannot be modified

Cannot be modified

Now that we’ve understood how the data is structured and formatted, as well as the API’s query parameters and configuration settings, we can use Postman to run a few queries to construct our query.

Example Queries

Get request of postcode near Wimbledon Tennis Court

http://landregistry.data.gov.uk/data/ppi/transaction-record.json?propertyAddress.postcode=SW19%205JB&_page=1

For completeness, here is an illustration of more complex query.

That said, the API takes considerably longer to respond. It’s best to keep the query as simple as possible.

Get request of postcode near Twickenham Rugby Stadium

http://landregistry.data.gov.uk/data/ppi/transaction-record.json?propertyAddress.postcode=TW2%207QU&_page=1

Summary

In this blog, I’ve shown how I’ve approached the API documentation with the intention of constructing a GET request. Fortunately, the LandRegistry API is well documented, it even goes as far as providing a nice star schema model. Undoubtedly, not all APIs provide such a neat overview of their data structure.

In my next blog, I’ll begin extracting the data using a Logic App with a view to land said data in a Blob Store.

4 thoughts on “Extract, Transform & Load REST API data in Azure SQL Database

Leave a Reply

Your email address will not be published. Required fields are marked *