In Power BI and more specifically Power Query, there is a simple, out of the box Web connector that enables you to scrape data from a website. This blog will show you how to achieve this in both Power BI desktop and through Excel.
Power BI Desktop
1. Open a Power BI Desktop. Select ‘Get Data’.
2. In the pop up window, select ‘Other’ > ‘Web’ and click ‘Connect’.
3. Type in the website you would like to scrape. For this example, I have used the BBC Sport website – looking at the ‘Premier League’ football league table. Click ‘OK’ to continue.
4. Power Query will try and obtain the data from the website and may take a couple of minutes loading. As long as you are connected to the Internet, the query will complete and looks like the below:
5. I have ticked the second option, however, you could select all three datasets if preferred. When happy with your selections, click ‘Load’.
6. You have now imported the scraped data, but to view it in Power Query, click ‘Edit Queries’ from the top menu pane.
7. The Query Editor will open in a separate window, with the scraped data showing in its raw state. You can now modify/join/clean until your hearts content!
1. Firstly, you must have Excel 2013 or later and the Power Query add in for Excel installed to be able to scrape a website using the below method.
2. Open Excel and start a new workbook. Then, click ‘Power Query’ from the top menu pane and select the ‘From Web’ button.
3. Enter the URL (demonstrated in step 3 of Power BI Desktop example) and click ‘OK’.
4. Choose your desired data set or alternatively, pick more than one by ticking the ‘Select Multiple Items’ option. Click ‘Load’ to download the data.
5. When loaded, you can preview the data by hovering over it – found under the ‘Workbook Queries’ menu pane.
6. To open the data in Power Query, double click the workbook query. A new window will open and the data available to edit – see below:
Whilst playing with the web scraping feature, I have found that some websites work better with Power Query than others. At time of press, I cannot find any official Microsoft documentation on what elements of a website do work and more importantly, what doesn’t. However, web data that is wrapped in an html table or around div tags seem to work better with Power Query.
This article shows you how to bring the data into Power Query but in reality, it will need to be cleaned and transformed to be useful. Excel Power Query Web Scrape – AFL Fixtures by Matt Allington and Using Power Query M Language for Scraping Any Website Data by Chris Webb are two very useful blogs demonstrating how this can be achieved. If you are already a very capable M developer or have used Power Query in the past, you will know how powerful, yet simple this is.
Look out for more blogs on M and Power Query in the future. The next installment will be on how we can write native M to scrape websites, focusing specifically on web data that cannot be accessed through the user interface.