With SQL Server 2014 due for release soon, there’s a lot of talk about how little is changing for those of us more interested in Business Intelligence. We’ll be getting clustered, updatable columnstore indexes, which is very convenient, but I’ve been looking into the benefits of the other new features. The real gem for me is memory-optimised tables, or In-Memory OLTP (previously known as Project Hekaton).
I’ve had a few questions about how columnstore and memory-optimised tables differ – they’re both in-memory solutions for data storage, right?
The big difference comes in disk usage – for a columnstore index to exist, there must also be a physical table written to disk. Any updates to that table, even in SQL Server 2014, will require disk writes, and log writes too if you’re not bulk loading. Memory-optimised tables don’t have this overhead – we can push the staging data straight into memory and skip the disk entirely.
There are plenty of uses for this new type of table, but it’s the traditional ETL routine I’m interested in. We would normally pull data from our source systems, place it in a staging database while we clean and validate it, then write the cleaned data to the warehouse.
The data written to the staging database is transient – it’s only of use during the load process. It is therefore wasteful to write this data to disk on every load; I believe this is where memory-optimised tables will really have an impact.
To prove this theory, I set up a quick demonstration. I started by creating two identical tables, one physical, one memory-optimised, both schema copies of [AdventureWorks2012].[Sales].[SalesOrderDetail].
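One prerequisite worth noting: before a memory-optimised table can be created, the host database needs a MEMORY_OPTIMIZED_DATA filegroup. A quick sketch, assuming a hypothetical staging database called StagingDB and an arbitrary file path:

-- Add a memory-optimised filegroup and a container to hold its checkpoint files
ALTER DATABASE StagingDB
    ADD FILEGROUP StagingDB_MOD CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE StagingDB
    ADD FILE (NAME = 'StagingDB_MOD_Container', FILENAME = 'C:\SQLData\StagingDB_MOD')
    TO FILEGROUP StagingDB_MOD;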
The memory-optimised table was created with a script along these lines.
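A simplified sketch rather than the exact script – it uses a subset of the Sales.SalesOrderDetail columns (computed columns such as LineTotal aren’t supported on memory-optimised tables in SQL Server 2014), an illustrative table name, and an arbitrary hash bucket count:

-- Run in the staging database (StagingDB in this sketch)
CREATE TABLE dbo.Staging_SalesOrderDetail
(
    SalesOrderID          int          NOT NULL,
    SalesOrderDetailID    int          NOT NULL,
    CarrierTrackingNumber nvarchar(25) NULL,
    OrderQty              smallint     NOT NULL,
    ProductID             int          NOT NULL,
    SpecialOfferID        int          NOT NULL,
    UnitPrice             money        NOT NULL,
    UnitPriceDiscount     money        NOT NULL,
    ModifiedDate          datetime     NOT NULL,

    -- Memory-optimised tables need at least one index; a hash index suits key lookups
    CONSTRAINT PK_Staging_SalesOrderDetail
        PRIMARY KEY NONCLUSTERED HASH (SalesOrderID, SalesOrderDetailID)
        WITH (BUCKET_COUNT = 131072)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); -- data is still logged and persisted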
I then built the simplest of packages: a data flow to move data from AdventureWorks2012.Sales.SalesOrderDetail into a ‘staging’ table, then read the data back from the staging table and write it to a disk-based table representing our final warehouse.
I configured two versions of the package: one where the staging table was disk-based, and one where it was our memory-optimised table. I deployed and scheduled these packages as alternating jobs and kicked them off for a few hours to see how performance varied over time. The results are as follows:
The initial results are very promising – we see a huge reduction in execution time when using in-memory tables for our transient ETL data. We’re still performing minimal logging with this setup – you could reduce the overhead even further by switching to SCHEMA_ONLY durability – but simply by bypassing the disk writes for the staging data we’ve cut our ETL window in half.
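For reference, schema-only durability is just a different setting in the table’s WITH clause – the table definition survives a restart, but the staged rows don’t, and changes to the data generate no log records. Continuing the sketch from above:

-- Memory-optimised tables can't be ALTERed in SQL Server 2014,
-- so changing durability means dropping and recreating the table
DROP TABLE dbo.Staging_SalesOrderDetail;

CREATE TABLE dbo.Staging_SalesOrderDetail
(
    SalesOrderID          int          NOT NULL,
    SalesOrderDetailID    int          NOT NULL,
    CarrierTrackingNumber nvarchar(25) NULL,
    OrderQty              smallint     NOT NULL,
    ProductID             int          NOT NULL,
    SpecialOfferID        int          NOT NULL,
    UnitPrice             money        NOT NULL,
    UnitPriceDiscount     money        NOT NULL,
    ModifiedDate          datetime     NOT NULL,

    CONSTRAINT PK_Staging_SalesOrderDetail
        PRIMARY KEY NONCLUSTERED HASH (SalesOrderID, SalesOrderDetailID)
        WITH (BUCKET_COUNT = 131072)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY); -- staged rows are never written to disk or logged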
There are obvious hardware limits to how this can be used, and in-memory tables have their own limitations, but implemented carefully it could change the face of warehouse loading drastically.
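Memory consumption is easy enough to keep an eye on. The sketch below queries sys.dm_db_xtp_table_memory_stats (run in the staging database) to show how much memory the hypothetical staging table and its indexes are holding at any point:

-- How much memory is the staging table using right now?
SELECT OBJECT_NAME(object_id) AS table_name,
       memory_used_by_table_kb,
       memory_used_by_indexes_kb
FROM sys.dm_db_xtp_table_memory_stats
WHERE object_id = OBJECT_ID('dbo.Staging_SalesOrderDetail');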