In this series of blogs, I’ll show how to load JSON Documents from an API call into Azure SQL Database. I recently implemented this for a client. During development, I realised there were numerous ways this could be achieved. I’ll illustrate just one flavour of this. Undoubtedly, you’ll find alternative solutions that might be more optimal, however, I write this with SQL Server Developers in mind. Here’s an architecture overview of the approach I’ll take.
In addition to the above resources, I make use of Postman to do some API Testing. Postman is an application for testing APIs by sending requests to the web server in order to get a response back. I aim to use Postman in order to understand how the response data is structured and figure out the best way to consume it.
It’s worth mentioning that there are databases such as; MongoDB and Couchbase that store JSON Documents natively. Nevertheless, I’ll be focusing solely on traditional databases.
Part 1 – Data Exploration
- Understand how the API data is structured and formatted.
- Understand the API’s query parameters and configuration settings.
- Run a few GET Requests based on the above understanding.
In this series of blogs, I’ll be consuming data from the Land Registry API. The main reason being; the API doesn’t require the use of an authentication key, and so, I can rapidly develop a Minimum Viable Product. In this first part, of a 3-part series of blog, I’m keen to show the approach I took in understanding how the API should be consumed.
Using Land Registry data, I’m interested in extracting “Last Paid Prices” of properties within a chosen vicinity.
Reading the supporting documentation for the API is always a great first step in figuring out how the data should be consumed.
At this point I’m aiming to understand the following:
- Structure of the data – particularly the hierarchy of the data.
- Data Format – API responses can default to XML or JSON. If the default is XML, returning results as JSON is as easy as appending “.Json” to the endpoint (URL).
- Query Parameters – these are your predicates, “WHERE” statement in SQL. Therefore, they must equal a given value.
- API Configuration Settings – pagination & offset. These allow us to specify how many items are returned per page and specify the next set of results to be returned. In our instance the default items per page is 10.
Structure of the data
Price Paid Dataset (PPD) Model
The above PPD Model is neatly presented in Star Schema, click here for a light introduction to Star Schemas. The data can be accessed programmatically using a GET request (see below).
Base URL + Endpoint + “.Json” + Query_Parameter
Click here to view the values the Query_Parameters can take. For example, “property type” can take on values such as; “Detached, Semi-Detached, Terraced and Flat/Maisonette.” It’s worth noting that the values are case-sensitive.
Default = 1
Default = 10
(page * itemsPerPage) + 1
page for pagination
Items returned per query
Start position of result set
Can be modified
Cannot be modified
Cannot be modified
Now that we’ve understood how the data is structured and formatted, as well as the API’s query parameters and configuration settings, we can use Postman to run a few queries to construct our query.
Get request of postcode near Wimbledon Tennis Court
For completeness, here is an illustration of more complex query.
That said, the API takes considerably longer to respond. It’s best to keep the query as simple as possible.
Get request of postcode near Twickenham Rugby Stadium
In this blog, I’ve shown how I’ve approached the API documentation with the intention of constructing a GET request. Fortunately, the LandRegistry API is well documented, it even goes as far as providing a nice star schema model. Undoubtedly, not all APIs provide such a neat overview of their data structure.
In my next blog, I’ll begin extracting the data using a Logic App with a view to land said data in a Blob Store.