As I’m sure you’ve heard, CTP3 of SQL Server Denali was released yesterday, and can be downloaded here. Denali includes Data Quality Services (DQS), Microsoft’s new data cleansing and matching component that’s based on the Zoomix acquisition that occurred a couple of years back. Data Quality Services didn’t make it into the first CTP, but is now available, so I though it would be worth a quick blog post.
Installing Data Quality Services
Data Quality Services is an option in the main install, but when you go to run the Data Quality Client, you’ll get a message stating that DQS is not installed. As far as I can tell, DQS needs to be installed manually, by running the DQSInstaller.exe, which you can find in the SQL Server Binn directory. This will create two SQL Server databases:
Data Quality Client
Once DQS is configured, you’ll be in a position to use the Data Quality Client, which is a windows application, available in 32 or 64 bit. Once you connect, you’ll get the following screen:
The knowledge base is the key to how DQS works, being utilised to determine the data quality of your source data. You can create your own knowledge base, using your own data, or even cloud-based data. For this example, I’m going to use the built in knowledge base called DQS Data.
As an example, I’ve created some data that I want to cleanse. It’s adventure works country data that I’ve put into a separate table that I’ve called dbo.Country.
My task now is is to clean this data, which I can do by creating a data quality project:
I’ve called my project Countries, and I’ve picked the built-in DQS Data knowledge base, which I noticed contains reference data for countries. The activity that I’ve selected is Cleansing, and then I’ve clicked Create.
DQS will then prompt for mapping the source data to one of the domains in the Knowledge Base. Here I’ve mapping my country name to the Country domain from the knowledge base:
After clicking the Start button on the next screen, the cleaning process starts, which gives the following results:
I put a couple of typos into my source data, such as changing Algeria to ‘Algerian’ and Albania to ‘Albana’. These were picked up by DQS, along with a few others, but a user has the opportunity to approve or reject, via the radio buttons shown above. I chose to approve the first two, and then clicked next. In the final screen, DQS allows you to output the cleaned data to a separate table, as shown below:
If you go into the table that DQS has created, you will see that there is a Name_Status column, which holds the cleaning status on a per record basis:
This can be used to update the source data, if required, and therefore address the data quality issues.
Summary
This has been a quick introduction to cleaning data with Data Quality Services. There’s plenty more to look at, and I hope to find time to do so at some point – In particular I’m keen to take a look at how DQS works with Master Data Services. I’ve noticed that there’s also a new MS DQS blog (http://blogs.msdn.com/b/dqs/) – I’m sure that will be a great resource for DQS learning.
How Artificial Intelligence and Data Add Value to Businesses
Knowledge is power. And the data that you collect in the course of your business
May
Databricks Vs Synapse Spark Pools – What, When and Where?
Databricks or Synapse seems to be the question on everyone’s lips, whether its people asking
1 Comment
May
Power BI to Power AI – Part 2
This post is the second part of a blog series on the AI features of
Apr
Geospatial Sample architecture overview
The first blog ‘Part 1 – Introduction to Geospatial data’ gave an overview into geospatial
Apr
Data Lakehouses for Dummies
When we are thinking about data platforms, there are many different services and architectures that
Apr
Enable Smart Facility Management with Azure Digital Twins
Before I started writing this blog, I went to Google and searched for the keywords
Apr
Migrating On-Prem SSIS workload to Azure
Goal of this blog There can be scenario where organization wants to migrate there existing
Mar
Send B2B data with Azure Logic Apps and Enterprise Integration Pack
After creating an integration account that has partners and agreements, we are ready to create
Mar