In Power BI and more specifically Power Query, there is a simple, out of the box Web connector that enables you to scrape data from a website. This blog will show you how to achieve this in both Power BI desktop and through Excel.
Power BI Desktop
1. Open a Power BI Desktop. Select ‘Get Data’.
2. In the pop up window, select ‘Other’ > ‘Web’ and click ‘Connect’.
3. Type in the website you would like to scrape. For this example, I have used the BBC Sport website – looking at the ‘Premier League’ football league table. Click ‘OK’ to continue.
4. Power Query will try and obtain the data from the website and may take a couple of minutes loading. As long as you are connected to the Internet, the query will complete and looks like the below:
5. I have ticked the second option, however, you could select all three datasets if preferred. When happy with your selections, click ‘Load’.
6. You have now imported the scraped data, but to view it in Power Query, click ‘Edit Queries’ from the top menu pane.
7. The Query Editor will open in a separate window, with the scraped data showing in its raw state. You can now modify/join/clean until your hearts content!
Excel
1. Firstly, you must have Excel 2013 or later and the Power Query add in for Excel installed to be able to scrape a website using the below method.
2. Open Excel and start a new workbook. Then, click ‘Power Query’ from the top menu pane and select the ‘From Web’ button.
3. Enter the URL (demonstrated in step 3 of Power BI Desktop example) and click ‘OK’.
4. Choose your desired data set or alternatively, pick more than one by ticking the ‘Select Multiple Items’ option. Click ‘Load’ to download the data.
5. When loaded, you can preview the data by hovering over it – found under the ‘Workbook Queries’ menu pane.
6. To open the data in Power Query, double click the workbook query. A new window will open and the data available to edit – see below:
Limitations
Whilst playing with the web scraping feature, I have found that some websites work better with Power Query than others. At time of press, I cannot find any official Microsoft documentation on what elements of a website do work and more importantly, what doesn’t. However, web data that is wrapped in an html table or around div tags seem to work better with Power Query.
What’s Next?
As I have demonstrated, scraping websites using Power Query is very easy. This feature does not require any M or coding expertise, as Power Query works its magic under the hood. This is a great alternative to C#, JavaScript, etc. for scraping a web API.
This article shows you how to bring the data into Power Query but in reality, it will need to be cleaned and transformed to be useful. Excel Power Query Web Scrape – AFL Fixtures by Matt Allington and Using Power Query M Language for Scraping Any Website Data by Chris Webb are two very useful blogs demonstrating how this can be achieved. If you are already a very capable M developer or have used Power Query in the past, you will know how powerful, yet simple this is.
Future Blogs
Look out for more blogs on M and Power Query in the future. The next installment will be on how we can write native M to scrape websites, focusing specifically on web data that cannot be accessed through the user interface.
How Artificial Intelligence and Data Add Value to Businesses
Knowledge is power. And the data that you collect in the course of your business
May
Databricks Vs Synapse Spark Pools – What, When and Where?
Databricks or Synapse seems to be the question on everyone’s lips, whether its people asking
1 Comment
May
Power BI to Power AI – Part 2
This post is the second part of a blog series on the AI features of
Apr
Geospatial Sample architecture overview
The first blog ‘Part 1 – Introduction to Geospatial data’ gave an overview into geospatial
Apr
Data Lakehouses for Dummies
When we are thinking about data platforms, there are many different services and architectures that
Apr
Enable Smart Facility Management with Azure Digital Twins
Before I started writing this blog, I went to Google and searched for the keywords
Apr
Migrating On-Prem SSIS workload to Azure
Goal of this blog There can be scenario where organization wants to migrate there existing
Mar
Send B2B data with Azure Logic Apps and Enterprise Integration Pack
After creating an integration account that has partners and agreements, we are ready to create
Mar