Data Engineering

Extract, Transform & Load REST API data in Azure SQL Database

Posted on 1st October 20191st October 2019 by Tino Zishiri

01
Oct

Introduction

In this series of blogs, I’ll show how to load JSON Documents from an API call into Azure SQL Database. I recently implemented this for a client. During development, I realised there were numerous ways this could be achieved. I’ll illustrate just one flavour of this. Undoubtedly, you’ll find alternative solutions that might be more optimal, however, I write this with SQL Server Developers in mind. Here’s an architecture overview of the approach I’ll take.

In addition to the above resources, I make use of Postman to do some API Testing. Postman is an application for testing APIs by sending requests to the web server in order to get a response back. I aim to use Postman in order to understand how the response data is structured and figure out the best way to consume it.

It’s worth mentioning that there are databases such as; MongoDB and Couchbase that store JSON Documents natively. Nevertheless, I’ll be focusing solely on traditional databases.

It’s also worth mentioning that JSON or JavaScript Object Notation is a type of semi-structured data. Semi-structured data is data that isn’t confined to a predefined schema such as that found in a relational database. In relational databases, columns need to be defined ahead of time. Instead, JSON holds data in a flexible key/value pair structure. It has been popularized by Web Services and APIs (Application Program Interface). An API is a set of definitions and protocols that allow one application to communicate with another.

Part 1 – Data Exploration

Objectives

Understand how the API data is structured and formatted.
Understand the API’s query parameters and configuration settings.
Run a few GET Requests based on the above understanding.

Background

In this series of blogs, I’ll be consuming data from the Land Registry API. The main reason being; the API doesn’t require the use of an authentication key, and so, I can rapidly develop a Minimum Viable Product. In this first part, of a 3-part series of blog, I’m keen to show the approach I took in understanding how the API should be consumed.

Scenario

Using Land Registry data, I’m interested in extracting “Last Paid Prices” of properties within a chosen vicinity.

Documentation

Reading the supporting documentation for the API is always a great first step in figuring out how the data should be consumed.

Examine the API documentation
Examine the API configuration

At this point I’m aiming to understand the following:

Structure of the data – particularly the hierarchy of the data.
Data Format – API responses can default to XML or JSON. If the default is XML, returning results as JSON is as easy as appending “.Json” to the endpoint (URL).
Query Parameters – these are your predicates, “WHERE” statement in SQL. Therefore, they must equal a given value.
API Configuration Settings – pagination & offset. These allow us to specify how many items are returned per page and specify the next set of results to be returned. In our instance the default items per page is 10.

Structure of the data

Price Paid Dataset (PPD) Model

^{Source: Landregistry}

The above PPD Model is neatly presented in Star Schema, click here for a light introduction to Star Schemas. The data can be accessed programmatically using a GET request (see below).

Base URL
^{http://landregistry.data.gov.uk}

Endpoint
^{/data/ppi/transaction-record}

Query_Parameter
^{?estateType.label}
^{?hasTransaction}
^{?propertyAddress.county}
^{?propertyAddress.district}
^{?propertyAddress.locality}
^{?propertyAddress.paon}
^{?propertyAddress.postcode}
^{?propertyAddress.saon}
^{?propertyAddress.street}
^{?propertyAddress.town}
^{?propertyType.label}
^{?recordStatus.label}
^{?transactionCategory.label}
^?newBuild
^?pricePaid
^{?transactionDate}
^{?transactionId}

Get Request

Base URL + Endpoint + “.Json” + Query_Parameter

Click here to view the values the Query_Parameters can take. For example, “property type” can take on values such as; “Detached, Semi-Detached, Terraced and Flat/Maisonette.” It’s worth noting that the values are case-sensitive.

API Configuration

Configuration

^page

^itemsPerPage

^startIndex

Value

^{Default = 1}

^{Default = 10}

^{(page * itemsPerPage) + 1}

Description

^{Use page for pagination}

^{Items returned per query}

^{Start position of result set}

Modification

^{Can be modified}

^{Cannot be modified}

Now that we’ve understood how the data is structured and formatted, as well as the API’s query parameters and configuration settings, we can use Postman to run a few queries to construct our query.

Example Queries

Get request of postcode near Wimbledon Tennis Court

http://landregistry.data.gov.uk/data/ppi/transaction-record.json?propertyAddress.postcode=SW19%205JB&_page=1

For completeness, here is an illustration of more complex query.

That said, the API takes considerably longer to respond. It’s best to keep the query as simple as possible.

Get request of postcode near Twickenham Rugby Stadium

http://landregistry.data.gov.uk/data/ppi/transaction-record.json?propertyAddress.postcode=TW2%207QU&_page=1

Summary

In this blog, I’ve shown how I’ve approached the API documentation with the intention of constructing a GET request. Fortunately, the LandRegistry API is well documented, it even goes as far as providing a nice star schema model. Undoubtedly, not all APIs provide such a neat overview of their data structure.

In my next blog, I’ll begin extracting the data using a Logic App with a view to land said data in a Blob Store.

Tino Zishiri

Data Analytics Consultant delivering & demystifying data solutions in the cloud. Passionate about all things Microsoft.

Introduction to Data Wrangler in Microsoft Fabric

What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science

25
Jul

Autogen Power BI Model in Tabular Editor

In the realm of business intelligence, Power BI has emerged as a powerful tool for

12
Jul

Microsoft Healthcare Accelerator for Fabric

Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced

09
Jul

Unlock the Power of Colour: Make Your Power BI Reports Pop

Colour is a powerful visual tool that can enhance the appeal and readability of your

09
Jul

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this

20
May

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Businesses today are generating data at an unprecedented rate. Automated processing of data is essential

08
May

Exploring DALL·E Capabilities

What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.

03
May

Using Copilot Studio to Develop a HR Policy Bot

The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot

29
Apr

5 thoughts on “Extract, Transform & Load REST API data in Azure SQL Database”

Ilyas Assainov says:
Hey Tino,

Great article, but where is the next blog post mentioned?

6th November 2019 at 11:03 am
- Tino Zishiri says:
  Hi Ilyas thanks for your feedback. You can find the next article here. I’ll get the final blog (part 3) out shortly.
  
  11th November 2019 at 4:07 pm
Deepa says:
Cool. When is the next part available?

7th November 2019 at 10:13 am
- Tino Zishiri says:
  Hi Deepa thanks for your feedback. You can find the next article here. I’ll get the final blog (part 3) out shortly.
  
  11th November 2019 at 4:08 pm
Tino Zishiri says:
Hi All, here’s Part 3: https://adatis.co.uk/extract-transform-load-rest-api-responses-in-azure-sqldb-part-3-data-transformation-loading/

2nd April 2020 at 9:52 am

Comments are closed.

Data Engineering

Extract, Transform & Load REST API data in Azure SQL Database

Introduction