As data engineers, we use Azure Data Factory daily to collect datasets from sources, process them and store them in target data stores. It provides a robust environment in the Microsoft Azure stack for scheduling and managing data pipelines, and it is a powerful tool for data integration and ETL in the cloud.
As we begin to explore how Microsoft Fabric, an end-to-end data analytics platform, will change the way we perform these tasks, I wanted to look at its data integration workload, Data Factory. This short blog introduces the next generation of Azure Data Factory, outlines some of its key features and compares Azure Data Factory with Data Factory in Microsoft Fabric.
Azure Data Factory & Power Query Dataflows
Data Factory in Fabric brings the functionality of Azure Data Factory and Power Query Dataflows together in one product. Power Query Dataflows began as an add-on to Power BI, giving data analysts an easy-to-use data transformation tool in the cloud; it is now also used for data migration in Power Apps. Azure Data Factory, by contrast, is primarily used for data ingestion and is designed for data engineers to build pipelines from task-specific activities.
Before Data Factory in Fabric, the two tools served different needs. Power Query Dataflows was an easy-to-use option for low volumes of data, while Azure Data Factory handled more scalable solutions. Data Factory in Microsoft Fabric combines the simple approach to data transformation from Power Query Dataflows with the scalability and ETL pipeline building of ADF.
Features
Data Connectors
There are many options for data connectivity within Data Factory, as shown by the image above, ranging from Excel workbooks and JSON files to Azure SQL databases and Databricks. It is also possible to connect directly to the OneLake environment or upload a local file, as shown below.
The interface for selecting a data source is modern and simple, which makes connecting to our data quick and efficient.
Dataflows
Dataflows are created in the online Power Query editor, which uses the data transformation engine to retrieve data from a source, transform it and load it into a target. A data destination can be selected for the dataflow, as shown below; Azure SQL database, Lakehouse, Azure Data Explorer (Kusto) and Warehouse are currently supported.
It is worth noting that the Lakehouse and Warehouse are also available as sources, which makes it convenient to build projects that are integrated with them.
The editor also lets you write M code (the Power Query formula language) directly within the dataflow.
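An M query is typically a `let ... in` expression: a chain of named steps, each transforming the table produced by the previous step. The Python sketch below mimics that shape on a small hypothetical table with `Region` and `Amount` columns. The helpers `select_rows` and `rename_columns` are illustrative stand-ins for M's `Table.SelectRows` and `Table.RenameColumns`, not a Fabric or Power Query API. The roughly equivalent M appears in a comment.

```python
from typing import Any, Callable

# A table is modelled as a list of rows; each row maps column name -> value.
Row = dict[str, Any]
Table = list[Row]
Step = Callable[[Table], Table]


def select_rows(predicate: Callable[[Row], bool]) -> Step:
    """Keep only rows matching the predicate (loosely like M's Table.SelectRows)."""
    return lambda table: [row for row in table if predicate(row)]


def rename_columns(mapping: dict[str, str]) -> Step:
    """Rename columns per the mapping (loosely like M's Table.RenameColumns)."""
    return lambda table: [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in table
    ]


def run_query(source: Table, steps: list[Step]) -> Table:
    """Feed each step the output of the previous one, like a let ... in chain."""
    result = source
    for step in steps:
        result = step(result)
    return result


# Roughly equivalent M for the steps below:
# let
#     Source = ...,
#     Filtered = Table.SelectRows(Source, each [Amount] > 0),
#     Renamed = Table.RenameColumns(Filtered, {{"Amount", "SalesAmount"}})
# in
#     Renamed
source: Table = [
    {"Region": "North", "Amount": 120},
    {"Region": "South", "Amount": 0},
    {"Region": "East", "Amount": 75},
]
result = run_query(
    source,
    [select_rows(lambda row: row["Amount"] > 0), rename_columns({"Amount": "SalesAmount"})],
)
print(result)  # [{'Region': 'North', 'SalesAmount': 120}, {'Region': 'East', 'SalesAmount': 75}]
```

This eager loop only illustrates the step-by-step structure. Power Query itself can also push (fold) supported steps back to the source system, which this sketch ignores.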
Data Pipelines
As in Azure Data Factory, data pipelines chain executable activities together to perform specific tasks. In Data Factory, we can also run a dataflow as part of a pipeline to enhance the ETL process. Pipelines can then be scheduled to run at set times or triggered by the output of another flow.
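To make the idea of a flow of activities concrete, here is a minimal Python sketch in which each activity starts only if the activities it depends on have succeeded. This is similar in spirit to the "Succeeded" dependency condition between activities in Azure Data Factory. The `Activity` class, the `run_pipeline` function and the activity names are hypothetical and are not the Fabric pipeline API. Scheduling and triggers are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Activity:
    """One step in a hypothetical pipeline; `run` returns True on success."""
    name: str
    run: Callable[[], bool]
    depends_on: list[str] = field(default_factory=list)


def run_pipeline(activities: list[Activity]) -> dict[str, str]:
    """Run activities in list order, honouring 'on success' dependencies only.

    Assumes the list is already in dependency order (each activity appears
    after every activity it depends on).
    """
    status: dict[str, str] = {}
    for activity in activities:
        if all(status.get(dep) == "Succeeded" for dep in activity.depends_on):
            status[activity.name] = "Succeeded" if activity.run() else "Failed"
        else:
            # An upstream activity failed or was skipped, so this one never starts.
            status[activity.name] = "Skipped"
    return status


# Hypothetical pipeline: copy raw data, then run a dataflow, then notify the team.
pipeline = [
    Activity("CopyRawData", run=lambda: True),
    Activity("RunDataflow", run=lambda: True, depends_on=["CopyRawData"]),
    Activity("NotifyTeam", run=lambda: True, depends_on=["RunDataflow"]),
]
print(run_pipeline(pipeline))
# {'CopyRawData': 'Succeeded', 'RunDataflow': 'Succeeded', 'NotifyTeam': 'Succeeded'}
```

In this sketch, changing `CopyRawData` to `run=lambda: False` marks it `Failed` and both downstream activities `Skipped`, because their success condition is never met.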
There is now also a single monitoring hub for tracking scheduled runs. Combined with dataflows and data pipelines, it gives us a full view of all workloads and lets us drill down into any activity.
Other Notable Features
Office 365 Outlook activity
A new Office 365 Outlook activity lets us send customised emails containing information about our pipelines or pipeline runs.
Save As
Existing pipelines can now be saved as a copy with Save As, a convenient way to duplicate them for other development work.
Azure Data Factory vs Data Factory in Fabric
For those who use Azure Data Factory regularly, the table below from Microsoft outlines some of the key differences between the two products:
Overall, the comparison shows that Data Factory is an enhanced version of Azure Data Factory, with the added benefit of integrated dataflows. Pipelines are now integrated with the unified data platform, the monitoring hub has more advanced features, and connections have replaced linked services and are more intuitive. Microsoft has also promised that functionality such as CI/CD and self-hosted integration runtimes is in progress.
Final thoughts
Data Factory combines the best features of two established products and improves on them, giving users a powerful data integration component of the Fabric suite. It offers a wide range of data connectors, has a modernised feel and brings some great new features. As a frequent user of Azure Data Factory, I think it compares well, and I'm interested to see how it may fit into the Adatis framework in the future!
I hope this has been a useful introduction. Thanks for reading.