In Power BI and more specifically Power Query, there is a simple, out of the box Web connector that enables you to scrape data from a website. This blog will show you how to achieve this in both Power BI desktop and through Excel.
Power BI Desktop
1. Open a Power BI Desktop. Select ‘Get Data’.
2. In the pop up window, select ‘Other’ > ‘Web’ and click ‘Connect’.
3. Type in the website you would like to scrape. For this example, I have used the BBC Sport website – looking at the ‘Premier League’ football league table. Click ‘OK’ to continue.
4. Power Query will try and obtain the data from the website and may take a couple of minutes loading. As long as you are connected to the Internet, the query will complete and looks like the below:
5. I have ticked the second option, however, you could select all three datasets if preferred. When happy with your selections, click ‘Load’.
6. You have now imported the scraped data, but to view it in Power Query, click ‘Edit Queries’ from the top menu pane.
7. The Query Editor will open in a separate window, with the scraped data showing in its raw state. You can now modify/join/clean until your hearts content!
Excel
1. Firstly, you must have Excel 2013 or later and the Power Query add in for Excel installed to be able to scrape a website using the below method.
2. Open Excel and start a new workbook. Then, click ‘Power Query’ from the top menu pane and select the ‘From Web’ button.
3. Enter the URL (demonstrated in step 3 of Power BI Desktop example) and click ‘OK’.
4. Choose your desired data set or alternatively, pick more than one by ticking the ‘Select Multiple Items’ option. Click ‘Load’ to download the data.
5. When loaded, you can preview the data by hovering over it – found under the ‘Workbook Queries’ menu pane.
6. To open the data in Power Query, double click the workbook query. A new window will open and the data available to edit – see below:
Limitations
Whilst playing with the web scraping feature, I have found that some websites work better with Power Query than others. At time of press, I cannot find any official Microsoft documentation on what elements of a website do work and more importantly, what doesn’t. However, web data that is wrapped in an html table or around div tags seem to work better with Power Query.
What’s Next?
As I have demonstrated, scraping websites using Power Query is very easy. This feature does not require any M or coding expertise, as Power Query works its magic under the hood. This is a great alternative to C#, JavaScript, etc. for scraping a web API.
This article shows you how to bring the data into Power Query but in reality, it will need to be cleaned and transformed to be useful. Excel Power Query Web Scrape – AFL Fixtures by Matt Allington and Using Power Query M Language for Scraping Any Website Data by Chris Webb are two very useful blogs demonstrating how this can be achieved. If you are already a very capable M developer or have used Power Query in the past, you will know how powerful, yet simple this is.
Future Blogs
Look out for more blogs on M and Power Query in the future. The next installment will be on how we can write native M to scrape websites, focusing specifically on web data that cannot be accessed through the user interface.
Meet the Team – Jason Bonello, Senior Consultant
Meet Jason Bonello! Jason has been with us for just over two years and works
Apr
Meet the Team – Matt How, Principal Consultant
Next up in our series of meet the team blogs is Matt How. Matt has
Apr
MLFlow: Introduction to MLFlow Tracking
MLFlow is an open-source MLOps platform designed by Databricks to enable organisations to easily manage
Apr
Adatis are pleased to announce expansion plans into India
Adatis has offices in London, Surrey and Bulgaria – and has some big expansion plans
Mar
Querying and Importing Data from Excel to SSMS
Introduction There are couple of ways to import data from Excel to SSMS – via
Mar
Data Engineering for Graduates and New Starters
This blog aims to give potential graduates and other new starters in the industry some
Mar
Passing DP-900 Azure Data Fundamentals
Back in December, I took the DP-900 Azure Data Fundamentals exam which is one of
Feb
Real-time Dashboards Provide Transparency to Everyone
Real-time dashboards enable data analytics firm Adatis to be agile and transparent with its team
Feb