Use of Azure Data Explorer and Azure Stream Analytics in IoT solutions

 

It is estimated that by the end of 2022 there will be 14.4 billion connected IoT devices globally and, whilst growth slowed in 2021, it is expected to accelerate again in 2023 and beyond.

 

Azure Data Explorer (ADX) and Azure Stream Analytics (ASA) are both Microsoft services designed for analysing large volumes of streaming data, such as the data generated by IoT devices (amongst other sources). This blog gives an overview of their use in IoT scenarios.

 

Azure Data Explorer

Azure Data Explorer is a fully managed, high-performance, big data analytics platform that makes it easy to analyse high volumes of data in near real-time. Its toolbox has all you need for an end-to-end solution, from data ingestion and query to visualization and management. In fact, ADX is used as a telemetry analytics platform in many Microsoft products and services, such as GitHub, Office, Power BI, Xbox and Visual Studio.

 

Architecture

Under the hood, ADX consists of two services: the data management service and the engine service. In typical cloud architecture fashion, storage and compute resources are decoupled and, as a result, can be scaled independently. The data management service handles data ingestion, whilst the engine service takes care of processing the data, managing the hot cache and long-term storage, and executing queries.

 

ADX ingests and stores data in a proprietary format, unlike ASA, which does not store data and instead processes it in memory. ADX combines three technologies: column store, text indexing and data sharding. Together they allow it to store huge amounts of data (petabytes) in a highly compressed format, which is queried using KQL (Kusto Query Language).

 

Indexing allows entire batches of records to be skipped when querying the data. Data shards are essentially immutable, which brings significant performance benefits because the need for complex change management is eliminated. As a result, KQL queries can scan billions of records in seconds.
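To make the idea of skipping whole batches more concrete, here is a minimal Python sketch. It is not how ADX is implemented internally; it simply assumes that each immutable shard carries the minimum and maximum timestamp of its rows, so a time-range query can rule out a shard without reading it. The Shard class, the make_shard and query helpers and the sample readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a shard is never modified after creation
class Shard:
    rows: tuple   # (timestamp, device_id, temperature) records
    min_ts: int   # smallest timestamp in the shard
    max_ts: int   # largest timestamp in the shard


def make_shard(rows):
    """Seal a batch of rows into an immutable shard with min/max metadata."""
    timestamps = [ts for ts, _, _ in rows]
    return Shard(tuple(rows), min(timestamps), max(timestamps))


def query(shards, start_ts, end_ts):
    """Return matching rows and the number of shards actually scanned."""
    matches, scanned = [], 0
    for shard in shards:
        # Metadata check: skip the whole shard if its range cannot overlap.
        if shard.max_ts < start_ts or shard.min_ts > end_ts:
            continue
        scanned += 1
        matches.extend(r for r in shard.rows if start_ts <= r[0] <= end_ts)
    return matches, scanned


# Ingestion only ever appends new shards; existing ones are left untouched.
shards = [
    make_shard([(1, "dev-a", 20.5), (2, "dev-b", 21.0)]),
    make_shard([(10, "dev-a", 22.1), (12, "dev-b", 22.4)]),
    make_shard([(20, "dev-a", 25.0), (21, "dev-b", 24.8)]),
]

rows, scanned = query(shards, start_ts=10, end_ts=12)
```

In this toy example only one of the three shards is scanned. Because shards never change after they are sealed, their metadata never needs to be updated or locked, which is the simplification the immutability point above refers to.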

 

The engine service has the familiar feel of a relational database in that it consists of a collection of databases, tables and other objects. There are, however, no primary or foreign keys, nor any other constraints such as key uniqueness. This means that ingestion is fast and efficient, as there is no need to handle complicated constraints in a distributed system.

 

Due to its mostly append-only nature, ADX is not designed to be a traditional data warehousing solution. However, its features make it an excellent choice in environments where the 3Vs of big data (volume, velocity and variety) are prominent, such as IoT dataflows. It also provides built-in ML algorithms to support time series analytics such as seasonality detection, regression analysis, filtering for noise reduction, change detection, pattern matching and forecasting.

 

ADX is a natural destination for IoT data, as it provides managed ingestion from IoT Hub and supports advanced analytics and ad hoc queries on the ingested data. KQL allows ad hoc exploration of the data, much as SQL does in a traditional data warehouse.
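As a rough illustration of what such an ad hoc query might look like, the Python sketch below builds a KQL query that averages a reading per device in fixed time bins. The table name Telemetry and the columns Timestamp, DeviceId and Temperature are assumptions made for the example, not names from any real schema, and build_avg_query is a hypothetical helper.

```python
def build_avg_query(table, value_column, lookback="1h", bin_size="5m"):
    """Build a KQL query averaging a value per device per time bin.

    The table and column names are hypothetical placeholders.
    """
    return "\n".join([
        table,
        f"| where Timestamp > ago({lookback})",
        f"| summarize AvgValue = avg({value_column}) "
        f"by DeviceId, bin(Timestamp, {bin_size})",
        "| order by Timestamp asc",
    ])


query_text = build_avg_query("Telemetry", "Temperature")
print(query_text)
```

The resulting text could be pasted into the ADX query editor or submitted through a client library. The pipe-based style, in which each line refines the output of the previous one, is what makes KQL convenient for step-by-step exploration.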

One possible architecture involving ADX in an IoT environment is this:

 

 

ADX can leverage several other technologies. For example, it can query data from ADLS Gen2, allowing customers to combine historical data held in the data lake with near real-time data cached in ADX. It can also output data into ADLS Gen2 for further consumption by other services. In the architectural diagram, ASA complements ADX by consuming telemetry data and providing a hot data path, where the data can be served to end users through real-time dashboards and custom operational apps.

 

Azure Stream Analytics

Azure Stream Analytics is a fully managed stream processing engine designed to process large volumes of data with sub-millisecond latencies. IoT signals from telemetry devices usually flow into IoT Hub, which is optimised to collect data from connected devices but does not itself do anything with them. By using Azure Stream Analytics, developers can make use of the data collected from devices.

 

Unlike Azure Data Explorer, Azure Stream Analytics does not store data. When connected to IoT Hub, ASA takes the live stream and filters it, transforms it and outputs it into other systems. It also offers ML-based anomaly detection. Patterns in the stream can be identified and used to trigger actions and workflows, such as raising alerts, storing data or feeding data into a BI tool such as Power BI. Queries in Stream Analytics are expressed in a SQL-like query language.
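The kind of logic such a job expresses can be sketched in plain Python. This is not the ASA query language and it processes a small list rather than a live stream; it only illustrates the pattern of grouping events into fixed, non-overlapping time windows, aggregating each window and raising an alert when a threshold is crossed. The function names, event layout and threshold are assumptions for the example.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, device_id, value) events into fixed, non-overlapping
    windows and return {(window_start, device_id): average_value}."""
    buckets = defaultdict(list)
    for ts, device_id, value in events:
        window_start = ts - (ts % window_seconds)
        buckets[(window_start, device_id)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def alerts(averages, threshold):
    """Return the (window_start, device_id) keys whose average exceeds
    the threshold."""
    return sorted(key for key, avg in averages.items() if avg > threshold)


events = [
    (0, "dev-a", 20.0), (30, "dev-a", 22.0),   # window starting at 0
    (60, "dev-a", 30.0), (90, "dev-a", 34.0),  # window starting at 60
    (65, "dev-b", 21.0),
]

averages = tumbling_window_averages(events, window_seconds=60)
hot = alerts(averages, threshold=25.0)
```

Here only the second window for dev-a averages above the threshold, so it is the only one flagged. In a real job the same intent would be written declaratively in the SQL-like language and evaluated continuously as events arrive.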

 

Azure Stream Analytics is available on the Azure IoT Edge runtime, which allows data to be processed directly on a device. This can reduce the amount of data sent to IoT Hub, as initial signal filtering and/or aggregation logic can be implemented on the device itself. If the solution needs analytics deployed at the edge, ASA is the right choice, as ADX does not have such capabilities.
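One simple form of on-device filtering is a dead-band filter, which forwards a reading only when it differs enough from the last value that was sent. The Python sketch below is an illustration of that idea under assumed values, not ASA code; deadband_filter, the sample readings and the 1.0 threshold are all hypothetical.

```python
def deadband_filter(readings, min_change):
    """Yield only readings that differ from the last forwarded value by at
    least min_change; the first reading is always forwarded."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) >= min_change:
            last_sent = value
            yield value


readings = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0, 19.1]
forwarded = list(deadband_filter(readings, min_change=1.0))
```

With these sample values, three of the seven readings are forwarded. The trade-off is that small fluctuations never reach the cloud, so the threshold needs to be chosen with the downstream analytics in mind.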

 

Azure Stream Analytics works well for analysing data on the fly and for real-time alerting and dashboarding, where there is a need to identify and act on changes in parameters as they happen. In many cases, though, there is also a need to run analytics on the data later or to access historical data. For that, ASA needs to send the data somewhere it can be stored, and by now you might be thinking: given the awesome compression rates and performance of ADX, can ADX be an output for ASA? The answer is yes!

 

From October 2022, ADX can be used as an output from ASA. With this integration, an Azure Stream Analytics job can natively ingest data into an Azure Data Explorer table.

 

The new integration enables ADX and ASA to be used alongside each other in IoT solutions in a new way, as shown in the architecture below:

 

With this architecture, the solution benefits from low-latency alerts based on the ASA input stream, while the data ingested into ADX can be used for generating insights, forecasting, anomaly detection and so on.

To learn more about the possibilities of Azure Data Explorer and Azure Stream Analytics in IoT solutions, speak to the team.