Flat Files in Power Query

The out-of-the-box functionality in Power Query will enable you to import pretty much any flat file, ranging from csv to fixed width.  However, there are instances when badly formatted files cannot be loaded correctly and this blog will demonstrate a workaround for this. 

Unusual Flat Files

Whilst csv and txt are the most common types of flat file, I have worked on a number of BI projects where the data warehouse needs to ingest fixed width, ragged right or pipe delimited text files.  You won’t be surprised to hear that Power Query recognizes all of these formats!   I will demonstrate below how simple it is to import a Ragged Right file with a ‘.in’ extension. 

Ragged Right in Notepad:

clip_image002

Power Query’s Interpretation:

clip_image003

 

Badly Formed Flat Files

You won’t normally know that a flat file is badly formed until actually importing it.  In the below example, a csv file contains both missing and double commas.  Csv stands for ‘comma separated values’, so Power Query expects a comma to split each column.  Any developer who has imported data (through SSIS, SQL import, etc.) will have encountered a problem like this.

Two Commas in Notepad:

clip_image004

Two Commas in Power Query:

As expected, Power Query believes there is a blank column between the values ‘Liverpool’ and ‘Stoke’.  As a result, the data is shifted to the right.

clip_image006

Missing Commas in Power Query:

A missing comma between columns works the opposite way.  Data is shifted to the left.

clip_image008

Solution

As we already know that the csv file contains both two and missing commas, the next step is to import the data in a non-delimited format.  To do this, you need to start a new Blank Query and type the following code into the Advanced Editor:

let

Value = Table.FromList(Lines.FromBinary(File.Contents(“D:Documentation
InternalPowerQueryFlatFilesCSVMissingOrMoreCommas.csv”)),Splitter.SplitByNothing())

in

                Value

The code is turning the flat file data into a table and treating every row as an individual column – hence the ‘SplitByNothing’ syntax.  This is how the query looks:

clip_image009

We can now replace two commas with just one comma and split the table by ‘,’.  I have previously shown you how to Replace Values in an earlier blog, but will quickly demonstrate the Split Columns function.

clip_image011

Power Query actually auto detected all of these settings, as it can see the frequent number of commas and that they occur 10 times.  If you drop down Advanced options, you can change the delimiters and output columns if they are not interpreted correctly by Power Query.   Finally, we can implement the Use First Row As Headers UI function and filter the nulls in the ‘HTR’ column.

clip_image013

There is no easy way to write automated logic that would fix these three records because the missing commas were in different places and in our current query, different columns.  There are also no string patterns that can be easily deduced.  Therefore, this time around we will use Replace Values to move the data to the required columns.

clip_image015

There it is – a very effective way of visualising and managing both unusual and badly formed flat files in Power Query. 

Next Steps

We could easily extend the code to handle other file types or delimiters. Ken Puls’ blog demonstrates how you can pass the flat file connection string in as a parameter within a function.  This would be very handy for someone wanting to bring in multiple flat files through one configuration.  I would be very interested to see if any other Power Query developers have their own M code that handles flat files.  Feel free to comment or contact me directly if you have any suggestions on how to extend my example further.