A Guide to Azure SQL DataWarehouse

So you’ve heard the hype – the Azure SQL DW is going to solve all of your problems in one fell swoop… Right? Well… maybe. The system itself is a mix of technologies designed for low concurrency analytics across huge amounts of relational data. In short, it’s a cloud-scalable T-SQL-based MPP platform, with all the benefits and restrictions that performing everything in parallel brings. If your problem can be solved by performing lots of calculations over small of your data before aggregating the results into a whole, this is the technology for you.

However, before you jump right in, be aware that SQLDW is a very different beast to other SQL tools. There are specific concepts you need to be familiar with before building your system, otherwise you’re not going to see the promised performance gains and will likely lose faith very quickly!

I’m in the process of documenting these concepts, plus there is a wealth of information available on the Azure SQLDW site. For the next few months, I’ll be running through the blog topics below and updating the links accordingly. If there are topics you’d like me to add to the list, please get in touch!

Azure SQLDW Core Concepts:

What is it?

How Does Scaling Work?

Distributions

Polybase

Polybase Limitations

Polybase Design Patterns

CTAS

– Resource Classes

– Partitioning

Designing ETL (or ELT) In Azure SQLDW

– Row counts

– Statistics

– Surrogate Keys

Performance Tuning Azure SQLDW

– Plan Exploration in SQLDW

– Data Movement Types

– Minimising Data Movement

Managing Azure SQLDW

– Backup & Restore

– Monitoring Distributions

– System Monitoring

– Job Orchestration

– Scaling and Management

– Performance Tuning Queries

Azure SQLDW Architecture

– Presentation Layers

– Data Lake Integrations