What are Integration Runtimes?
An Integration Runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities such as Data Flows and data movement. Depending on its type, an IR can reach resources in public networks only, or in hybrid scenarios that span public and private networks.
Integration Runtimes are specified in each Linked Service, under Connections.
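As a rough illustration, a Linked Service definition points at its Integration Runtime through a `connectVia` reference. The sketch below builds such a definition as a Python dictionary. The helper `linked_service_with_ir` and the names `OnPremSqlServer` and `MySelfHostedIR` are made up for this example, and the connection string is a placeholder.

```python
import json


def linked_service_with_ir(name, ls_type, type_properties, ir_name=None):
    """Build a Linked Service definition (hypothetical helper).

    When ir_name is None, no connectVia block is emitted, in which case
    Data Factory uses the default Azure IR.
    """
    properties = {"type": ls_type, "typeProperties": type_properties}
    if ir_name is not None:
        properties["connectVia"] = {
            "referenceName": ir_name,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


# Illustrative names only; substitute your own Linked Service and IR.
definition = linked_service_with_ir(
    name="OnPremSqlServer",
    ls_type="SqlServer",
    type_properties={"connectionString": "<placeholder>"},
    ir_name="MySelfHostedIR",
)
print(json.dumps(definition, indent=2))
```

Leaving `ir_name` out produces a definition with no `connectVia` block at all.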
There are 3 types to choose from.
Azure Integration Runtime is managed by Microsoft. All the patching, scaling and maintenance of the underlying infrastructure is taken care of. The IR can only access data stores and services in public networks.
Self-hosted Integration Runtimes use infrastructure and hardware managed by you. You’ll need to address all the patching, scaling and maintenance. The IR can access resources in both public and private networks.
Azure-SSIS Integration Runtimes are VMs running the SSIS engine, which lets you execute SSIS packages natively. They are managed by Microsoft, so all the patching, scaling and maintenance are taken care of. The IR can access resources in both public and private networks.
Integration Runtime Scenarios
- Azure automatically provisions an Integration Runtime that can connect to Azure resources (Azure SQL, Azure Synapse Analytics, Storage Accounts) without any extra setup.
- You can perform data integration securely in a private network environment, shielded from the public cloud environment. For that, you need to install a Self-hosted IR inside your virtual private network. The Self-hosted IR only makes outbound HTTP-based connections to the open internet.
- You can also perform data integration securely in an on-premises environment. For that, you need to install a Self-hosted IR behind your corporate firewall.
- You can natively execute SSIS packages by creating an Azure-SSIS Integration Runtime, which creates an Integration Services Catalog in Azure SQL Database where the packages are stored. An ADF pipeline run sends commands to the Azure-SSIS IR, which executes the SSIS packages.
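The four scenarios above boil down to two questions: do you need to run SSIS packages, and is the data in a private or on-premises network? The sketch below encodes that as a small decision helper. It is a simplification for illustration only, and the function name and parameters are hypothetical.

```python
def choose_integration_runtime(runs_ssis_packages, needs_private_network):
    """Pick an IR type for the scenarios described above (simplified sketch)."""
    if runs_ssis_packages:
        # Scenario 4: native SSIS package execution.
        return "Azure-SSIS"
    if needs_private_network:
        # Scenarios 2 and 3: private VNet or on-premises data stores.
        return "Self-hosted"
    # Scenario 1: Azure resources reachable over public networks.
    return "Azure"


print(choose_integration_runtime(runs_ssis_packages=False, needs_private_network=True))
```

Real decisions usually involve more factors (cost, latency, who maintains the machine), so treat this as a starting point.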
Are Integration Runtimes Secure?
Data Store Credentials
On-premises data store credentials can either be stored within Data Factory or be referenced by Data Factory via Azure Key Vault at runtime. Credentials stored within Data Factory are always kept encrypted on the Self-hosted IR machine.
Credentials can be stored locally with or without flowing through the Azure backend service on their way to the Self-hosted IR machine. Both options keep the credentials encrypted.
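To make the Key Vault option more concrete, here is a minimal sketch of a Linked Service whose password is a Key Vault secret reference rather than an inline value. The helper `key_vault_secret_reference` and the names `MyKeyVault`, `sql-password`, `OnPremSqlServer` and `MySelfHostedIR` are assumptions made for this example.

```python
import json


def key_vault_secret_reference(key_vault_linked_service, secret_name):
    """Build a reference to a secret held in Azure Key Vault (hypothetical helper).

    key_vault_linked_service is the name of a Key Vault Linked Service
    that is assumed to already exist in the Data Factory.
    """
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": key_vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Illustrative Linked Service: the password is resolved from Key Vault at
# runtime instead of being stored in the definition itself.
sql_linked_service = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "<connection string without the password>",
            "password": key_vault_secret_reference("MyKeyVault", "sql-password"),
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}
print(json.dumps(sql_linked_service, indent=2))
```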
Encryption in Transit
All data transfers use a secure channel (HTTPS and TLS over TCP) to prevent man-in-the-middle attacks during communication with Azure services.
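This is not how the IR itself is implemented, but as a generic illustration of why TLS defeats man-in-the-middle attacks, the sketch below creates a client-side TLS context with Python's standard `ssl` module. Certificate verification and hostname checking are what stop an attacker from impersonating the server. The TLS 1.2 minimum is an assumption chosen for the example.

```python
import ssl

# Default client context: verifies the server certificate chain and
# checks that the certificate matches the hostname being contacted.
context = ssl.create_default_context()

# Assumption for this sketch: refuse anything older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```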
You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel between your on-premises network and Azure.
Virtual Network Service Endpoint
Using Virtual Network Service Endpoints to restrict SQL DB access to only the specified Virtual Network (VNet) adds an extra layer of security. Service Endpoints enable private IP addresses in the VNet to reach the endpoint of an Azure service without needing a public IP address on the VNet.
Once you enable service endpoints in your VNet, you can add a VNet rule to secure the Azure service resources to your VNet. The rule provides improved security by fully removing public internet access to resources and allowing traffic only from your VNet.
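As a sketch of what such a VNet rule looks like, the code below assembles the subnet resource ID and a request body for a SQL server virtual network rule. The helper names and all identifiers (`my-rg`, `my-vnet`, `ssis-subnet`, the zeroed subscription ID) are placeholders invented for this example, and the body shape is my understanding of the resource rather than a verified template, so check it against the current API reference before using it.

```python
import json


def subnet_resource_id(subscription_id, resource_group, vnet_name, subnet_name):
    """Build the Azure resource ID of a subnet (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
        f"/subnets/{subnet_name}"
    )


def sql_vnet_rule_body(subnet_id):
    """Body for a SQL server VNet rule allowing traffic from one subnet."""
    return {
        "properties": {
            "virtualNetworkSubnetId": subnet_id,
            # Assumes the Microsoft.Sql service endpoint is already
            # enabled on the subnet, so do not skip that check.
            "ignoreMissingVnetServiceEndpoint": False,
        }
    }


subnet_id = subnet_resource_id(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-vnet", "ssis-subnet"
)
print(json.dumps(sql_vnet_rule_body(subnet_id), indent=2))
```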
In order for the Azure-SSIS IR to access the SQL Database, it needs to be joined to the same VNet and Subnet as illustrated by the above diagram (scenario 4). In this way, only this Subnet can access the SQL Database.
With that in place, the next step is to turn off “Allow Azure Services to Access Server”. Both the IR and the Azure SQL DB now operate within the context of a VNet and can communicate over private IP addresses, which is more secure.
In this blog we’ve looked at the 3 integration runtimes. We’ve also examined how they can be made secure. Thank you for your attention.