A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. The threshold at which organisations enter the big data realm differs, depending on the capabilities of the users and their tools. For some, it can mean hundreds of gigabytes of data, while for others it means hundreds of terabytes. As tools for working with big data sets advance, so does the meaning of big data. Increasingly, the term relates to the value you can extract from your data sets through advanced analytics, rather than strictly the size of that data.

The evolving data architecture challenge

The cloud is rapidly changing the way applications are designed and how data is processed and stored. Choosing the best architecture for a particular organisation's goals is not straightforward; it requires a full understanding of the data and business requirements, as well as a thorough knowledge of emerging technologies and best practice.

Traditional vs big data solutions

Data in traditional database systems is typically relational data with a pre-defined schema and a set of constraints to maintain referential integrity. Often, data from multiple sources across the organisation is consolidated into a modern data warehouse, using an ETL (extract, transform, load) process to move and transform the source data.
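To make the ETL pattern concrete, the sketch below walks through the extract, transform, and load steps in miniature. It uses only Python's standard library, with an in-memory SQLite database standing in for a real data warehouse; the sample source data, the fact_orders table, and the column types are illustrative assumptions rather than part of any particular product.

```python
import csv
import sqlite3
from io import StringIO

# Hypothetical source extract: in practice this would come from an
# operational system such as an ERP, CRM, or web application database.
SOURCE_CSV = """order_id,customer,amount,currency
1001,Contoso,250.00,gbp
1002,Fabrikam,99.50,usd
"""

def extract(raw_csv: str) -> list[dict]:
    """Extract: read rows from the source system."""
    return list(csv.DictReader(StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: enforce types and normalise values before loading."""
    return [
        (int(r["order_id"]), r["customer"].strip(),
         float(r["amount"]), r["currency"].upper())
        for r in rows
    ]

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the conformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(transform(extract(SOURCE_CSV)), warehouse)
    print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```

In a production pipeline the same three stages are usually orchestrated by a dedicated tool and run on a schedule, but the shape of the work is the same: pull from the source, conform the data to the warehouse schema, then load it.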

By contrast, a big data architecture handles data that is too large or complex for traditional database systems, and the data may be processed in batch or in real time. Traditional and big data solutions are not mutually exclusive, and there is overlap between them.
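As an illustration of the batch versus real-time distinction (not a prescription for any specific service), the sketch below computes the same page-view counts two ways: once over a complete, bounded data set, and once incrementally as each event arrives. The event data and function names are invented for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical click events of the form (user_id, page). In a real system
# these would arrive from a log store (batch) or a message broker (real time).
EVENTS = [("u1", "home"), ("u2", "home"), ("u1", "pricing"), ("u3", "home")]

def batch_page_counts(events: Iterable[tuple[str, str]]) -> dict:
    """Batch: process the complete, bounded data set in a single pass."""
    counts = defaultdict(int)
    for _, page in events:
        counts[page] += 1
    return dict(counts)

def streaming_page_counts(events: Iterable[tuple[str, str]]) -> Iterator[dict]:
    """Real time: update the result incrementally as each event arrives."""
    counts = defaultdict(int)
    for _, page in events:
        counts[page] += 1
        yield dict(counts)  # emit the running result after every event

if __name__ == "__main__":
    print("batch result:   ", batch_page_counts(EVENTS))
    for snapshot in streaming_page_counts(EVENTS):
        print("stream snapshot:", snapshot)
```

Both approaches arrive at the same final answer; the difference is latency and completeness, which is why many architectures combine a batch layer for historical accuracy with a streaming layer for fresh results.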

The skill is in selecting the relevant Azure services and the appropriate architecture for each scenario. There is also work to be done on the technology choices for data solutions in Azure, which can include open source options.