A Roundup of MS Data Amp 2017

There was a fair amount of buzz last night as Microsoft held its first “Data Amp” online conference. It was essentially a chance to generate some marketing noise around new products, and there was plenty of hype to go with it. But what actually happened?

Well, we learned a lot about the next version of SQL Server, several Azure services moved from preview into General Availability and, amidst the noise, some cool new features were added to existing products.

So, to help cut through the hype, here are my top 3 highlights from MS Data Amp:

1. SQL Server 2017

It’s official: SQL Server vNext is coming this year and will be called SQL Server 2017. CTP 2.0 is now available and is deemed production ready, so you can grab a copy and start preparing migrations and new applications.

But should you adopt it early? Is there anything radically different with this version?

Well actually yes, there’s quite a lot in there of interest to a BI Developer, Data Scientist or anyone with analytical querying requirements.

We already knew about a few features: some small, such as the STRING_AGG() function that will make life a whole lot easier; some huge, such as on-premises Power BI hosting.
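To illustrate the small end of that list, here's a minimal sketch of STRING_AGG() (the dbo.Employees table is made up for the example). It replaces the old FOR XML PATH concatenation trick with a single call:

-- Concatenate employee names per department, comma-separated and ordered.
SELECT DepartmentId,
       STRING_AGG(EmployeeName, ', ') WITHIN GROUP (ORDER BY EmployeeName) AS Employees
FROM   dbo.Employees
GROUP BY DepartmentId;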

The major announcements last night were:

a) Python in SQL Server: Hot on the heels of 2016’s R integration, SQL Server 2017 will now support Python in a similar way. This provides a few more integration options and some benefits in terms of performance and optimisation but, mostly, it provides choice and compatibility. You no longer have to choose R if you’re going with SQL Server; you can port your existing Python code across directly.
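For a flavour of how that looks in practice (a minimal sketch, with the dbo.Sales table made up for the example), Python runs through the same sp_execute_external_script procedure that the R integration uses, just with the @language parameter switched over:

-- Requires Machine Learning Services and the 'external scripts enabled' option.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# InputDataSet arrives as a pandas DataFrame; whatever is assigned
# to OutputDataSet is returned to SQL Server as a result set.
OutputDataSet = InputDataSet.describe()
',
    @input_data_1 = N'SELECT SaleAmount FROM dbo.Sales';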

Will it change the world? Probably not – the majority of data science is currently performed on dedicated platforms, and the ease with which these can be created and torn down in Azure means there is little barrier to continuing to do so. It does, however, make it really easy to try out, and super easy to integrate into existing applications and workflows. I’d disagree with it being called “Built-in AI in SQL Server”, but it’s certainly a step in that direction.

b) SQL Graph: You can now have Graph Databases inside SQL Server. Essentially you mark tables as edges and nodes, then use specific SQL syntax to query them. If you’ve got existing Neo4J databases that you would prefer to integrate directly into SQL Server, this is for you. Your edges and nodes also act like normal tables, so you can still query them in the traditional relational way.
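To make that concrete, here's a minimal sketch (the Person and Friends tables are invented for the example). Nodes and edges are declared with AS NODE and AS EDGE, and the new MATCH predicate walks the relationships:

-- A node table and an edge table.
CREATE TABLE Person (ID INT PRIMARY KEY, Name NVARCHAR(100)) AS NODE;
CREATE TABLE Friends AS EDGE;

-- Find everyone Alice is directly connected to.
SELECT p2.Name
FROM   Person AS p1, Friends AS f, Person AS p2
WHERE  MATCH(p1-(f)->p2)
  AND  p1.Name = N'Alice';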

Personally, I’m interested in using it as a reporting layer – putting some D3 visualisations over the top and natively displaying clustering, force-directed graphs and the like without any need for transformations, all from your existing SQL Server.

Adatis’s very own Terry McCann has put together his introduction to SQL Graph.

2. Azure Analysis Services is now Generally Available

This is fantastic news for us, as it is the missing piece that connects the various components of the modern Azure warehouse. By having a Platform-as-a-Service semantic layer, we can replace any remaining SQL VMs in our architecture. We can now process data using Data Lake and SQL Data Warehouse, model it in Azure Analysis Services for highly concurrent querying, then expose it via Power BI, all without a virtual machine in sight.

As a service, it is a little costly: it’s currently priced around the same as the SQL VM it replaces, yet performs only one of its functions. But the benefits of not managing infrastructure, and the ability to pause and resume without re-processing, are going to be very useful going forward.

3. Cognitive APIs now Generally Available

They’ve been in preview for a while, but we can now use the Face API, Computer Vision API and Content Moderator service in production systems. This allows us to do things like check for the same person across a set of photos, detect similarities between images and tag images with their contents. This is pretty cool, although it can be onerous to do en masse… until now.

The BIG news for me here is that the Cognitive API technology has been pushed into Data Lake Analytics. This means you can use native functions inside U-SQL to harness these APIs.

What does this mean? Well we can now push petabytes of image files through these APIs in a hugely parallel way, all without having to provision any infrastructure – you could write the code to do this from the browser on your phone as you travel home! This is an immensely powerful, scalable tool that’s now available right out of the box! Want to auto-moderate several million image files at once? You can! Want to pull out any image files that contain something that looks like a car – without waiting hours for the result? You can!!
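To give an idea of the shape of it, here's a sketch using the cognitive extensions for U-SQL as shown at launch (the assembly and class names come from the preview announcement and may well change):

REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;

// Read every jpg under /images into rows of (FileName, ImgData).
@images =
    EXTRACT FileName string, ImgData byte[]
    FROM @"/images/{FileName}.jpg"
    USING new Cognition.Vision.ImageExtractor();

// Tag the contents of each image using the built-in cognitive functions.
@tags =
    PROCESS @images
    PRODUCE FileName, NumObjects int, Tags string
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();

OUTPUT @tags TO "/output/image_tags.csv" USING Outputters.Csv();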

Now, there were other features and announcements that came out last night, but these are far and away the three that I’m most excited about.

What about you?