Spark is all the rage at the moment (and has been for a while) in the Big Data and Analytics communities, seeing application across all aspects of working with data, from Streaming to Data Science. It offers a performant, scalable, multi-purpose platform backed by a very strong user community. In this series I’ll be looking at setting Spark up on a local machine for learning purposes, and working with the Jupyter notebook environment for data wrangling, munging and visualisation. We’ll also take a quick look at cloud platform offerings and some of the basics of the extensions to the core Spark platform, such as Spark Structured Streaming and Spark ML. Spark is a very large subject and I won’t be going into too much depth, just enough to give readers a taste of its capabilities and ease of use. There are some fantastic sources of information out there in the Spark community for those interested in a deeper understanding, and I’ll provide references to them along the way.
Part 1: Installing Spark on Windows
Part 3: Installing Jupyter Notebook Kernels
Part 4: Spark on Azure HDInsight
Part 5: Spark on Azure Databricks
Part 6: Spark Core
Part 7: Spark SQL
Part 8: Spark Structured Streaming
Part 9: Spark ML
Part 10: Spark GraphX