Introduction to Spark

Posted on 29th January 2018 (updated 16th December 2019) by Nigel Meakins

Spark is all the rage at the moment (and has been for a while) in the Big Data and Analytics communities, seeing application in all aspects of working with data, from streaming to data science. It offers a performant, scalable, multi-purpose platform backed by a very strong user community. In this series I’ll be looking at setting Spark up on a local machine for learning purposes, and at working with the Jupyter notebook environment for data wrangling, munging and visualisation. We’ll also take a quick look at cloud platform offerings and at the basics of some of the extensions to the core Spark platform, such as Spark Structured Streaming and Spark ML. Spark is a very large subject and I won’t be going into too much depth, just enough to give readers a taste of its capabilities and ease of use. There are some fantastic sources of information out there in the Spark community for those interested in a deeper understanding, and I’ll provide references to them along the way.

Part 1: Installing Spark on Windows

Part 2: Jupyter Notebooks

Part 3: Installing Jupyter Notebook Kernels

Part 4: Spark on Azure HDInsight

Part 5: Spark on Azure Databricks

Part 6: Spark Core

Part 7: Spark SQL

Part 8: Spark Structured Streaming

Part 9: Spark ML

Part 10: Spark GraphX