The out-of-the-box functionality in Power Query will enable you to import pretty much any flat file, ranging from csv to fixed width. However, there are instances when badly formatted files cannot be loaded correctly and this blog will demonstrate a workaround for this.
Unusual Flat Files
Whilst csv and txt are the most common types of flat file, I have worked on a number of BI projects where the data warehouse needs to ingest fixed width, ragged right or pipe delimited text files. You won’t be surprised to hear that Power Query recognizes all of these formats! I will demonstrate below how simple it is to import a Ragged Right file with a ‘.in’ extension.
Ragged Right in Notepad:
Power Query’s Interpretation:
Badly Formed Flat Files
You won’t normally know that a flat file is badly formed until actually importing it. In the below example, a csv file contains both missing and double commas. Csv stands for ‘comma separated values’, so Power Query expects a comma to split each column. Any developer who has imported data (through SSIS, SQL import, etc.) will have encountered a problem like this.
Two Commas in Notepad:
Two Commas in Power Query:
As expected, Power Query believes there is a blank column between the values ‘Liverpool’ and ‘Stoke’. As a result, the data is shifted to the right.
Missing Commas in Power Query:
A missing comma between columns works the opposite way. Data is shifted to the left.
Solution
As we already know that the csv file contains both two and missing commas, the next step is to import the data in a non-delimited format. To do this, you need to start a new Blank Query and type the following code into the Advanced Editor:
let
Value = Table.FromList(Lines.FromBinary(File.Contents(“D:Documentation
InternalPowerQueryFlatFilesCSVMissingOrMoreCommas.csv”)),Splitter.SplitByNothing())
in
Value
The code is turning the flat file data into a table and treating every row as an individual column – hence the ‘SplitByNothing’ syntax. This is how the query looks:
We can now replace two commas with just one comma and split the table by ‘,’. I have previously shown you how to Replace Values in an earlier blog, but will quickly demonstrate the Split Columns function.
Power Query actually auto detected all of these settings, as it can see the frequent number of commas and that they occur 10 times. If you drop down Advanced options, you can change the delimiters and output columns if they are not interpreted correctly by Power Query. Finally, we can implement the Use First Row As Headers UI function and filter the nulls in the ‘HTR’ column.
There is no easy way to write automated logic that would fix these three records because the missing commas were in different places and in our current query, different columns. There are also no string patterns that can be easily deduced. Therefore, this time around we will use Replace Values to move the data to the required columns.
There it is – a very effective way of visualising and managing both unusual and badly formed flat files in Power Query.
Next Steps
We could easily extend the code to handle other file types or delimiters. Ken Puls’ blog demonstrates how you can pass the flat file connection string in as a parameter within a function. This would be very handy for someone wanting to bring in multiple flat files through one configuration. I would be very interested to see if any other Power Query developers have their own M code that handles flat files. Feel free to comment or contact me directly if you have any suggestions on how to extend my example further.
Meet the Team – Jason Bonello, Senior Consultant
Meet Jason Bonello! Jason has been with us for just over two years and works
Apr
Meet the Team – Matt How, Principal Consultant
Next up in our series of meet the team blogs is Matt How. Matt has
Apr
MLFlow: Introduction to MLFlow Tracking
MLFlow is an open-source MLOps platform designed by Databricks to enable organisations to easily manage
Apr
Adatis are pleased to announce expansion plans into India
Adatis has offices in London, Surrey and Bulgaria – and has some big expansion plans
Mar
Querying and Importing Data from Excel to SSMS
Introduction There are couple of ways to import data from Excel to SSMS – via
Mar
Data Engineering for Graduates and New Starters
This blog aims to give potential graduates and other new starters in the industry some
Mar
Passing DP-900 Azure Data Fundamentals
Back in December, I took the DP-900 Azure Data Fundamentals exam which is one of
Feb
Real-time Dashboards Provide Transparency to Everyone
Real-time dashboards enable data analytics firm Adatis to be agile and transparent with its team
Feb