The out-of-the-box functionality in Power Query will enable you to import pretty much any flat file, ranging from csv to fixed width. However, there are instances when badly formatted files cannot be loaded correctly and this blog will demonstrate a workaround for this.
Unusual Flat Files
Whilst csv and txt are the most common types of flat file, I have worked on a number of BI projects where the data warehouse needs to ingest fixed width, ragged right or pipe delimited text files. You won’t be surprised to hear that Power Query recognizes all of these formats! I will demonstrate below how simple it is to import a Ragged Right file with a ‘.in’ extension.
Ragged Right in Notepad:
Power Query’s Interpretation:
Badly Formed Flat Files
You won’t normally know that a flat file is badly formed until actually importing it. In the below example, a csv file contains both missing and double commas. Csv stands for ‘comma separated values’, so Power Query expects a comma to split each column. Any developer who has imported data (through SSIS, SQL import, etc.) will have encountered a problem like this.
Two Commas in Notepad:
Two Commas in Power Query:
As expected, Power Query believes there is a blank column between the values ‘Liverpool’ and ‘Stoke’. As a result, the data is shifted to the right.
Missing Commas in Power Query:
A missing comma between columns works the opposite way. Data is shifted to the left.
Solution
As we already know that the csv file contains both two and missing commas, the next step is to import the data in a non-delimited format. To do this, you need to start a new Blank Query and type the following code into the Advanced Editor:
let
Value = Table.FromList(Lines.FromBinary(File.Contents(“D:Documentation
InternalPowerQueryFlatFilesCSVMissingOrMoreCommas.csv”)),Splitter.SplitByNothing())
in
Value
The code is turning the flat file data into a table and treating every row as an individual column – hence the ‘SplitByNothing’ syntax. This is how the query looks:
We can now replace two commas with just one comma and split the table by ‘,’. I have previously shown you how to Replace Values in an earlier blog, but will quickly demonstrate the Split Columns function.
Power Query actually auto detected all of these settings, as it can see the frequent number of commas and that they occur 10 times. If you drop down Advanced options, you can change the delimiters and output columns if they are not interpreted correctly by Power Query. Finally, we can implement the Use First Row As Headers UI function and filter the nulls in the ‘HTR’ column.
There is no easy way to write automated logic that would fix these three records because the missing commas were in different places and in our current query, different columns. There are also no string patterns that can be easily deduced. Therefore, this time around we will use Replace Values to move the data to the required columns.
There it is – a very effective way of visualising and managing both unusual and badly formed flat files in Power Query.
Next Steps
We could easily extend the code to handle other file types or delimiters. Ken Puls’ blog demonstrates how you can pass the flat file connection string in as a parameter within a function. This would be very handy for someone wanting to bring in multiple flat files through one configuration. I would be very interested to see if any other Power Query developers have their own M code that handles flat files. Feel free to comment or contact me directly if you have any suggestions on how to extend my example further.
Pareto Charts in Power BI and the DAX behind them
The Pareto principle, commonly referred to as the 80/20 rule, is a concept of prioritisation.
Apr
Databricks: Cluster Configuration
Databricks, a cloud-based platform for data engineering, offers several tools that can be used to
Apr
AI Assistance in Microsoft Fabric
The exponential growth of Large Language Models (LLMs) couples with Microsoft’s close partnership with OpenAI
Apr
10 reasons why it’s worth the effort to understand the value of your data
“If leaders really want to create a data driven culture, the journey starts with them!
Apr
Content Safety in Azure AI Studio
Azure AI Content Safety is a solution designed to identify harmful content, whether generated by
Apr
Model Benchmarks in Azure AI Studio
In the constantly changing field of artificial intelligence (AI) and machine learning (ML), choosing the
Apr
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb