Data Analytics

Flat Files in Power Query

Posted on 16th March 20165th April 2020 by Callum Green

16
Mar

The out-of-the-box functionality in Power Query will enable you to import pretty much any flat file, ranging from csv to fixed width. However, there are instances when badly formatted files cannot be loaded correctly and this blog will demonstrate a workaround for this.

Unusual Flat Files

Whilst csv and txt are the most common types of flat file, I have worked on a number of BI projects where the data warehouse needs to ingest fixed width, ragged right or pipe delimited text files. You won’t be surprised to hear that Power Query recognizes all of these formats! I will demonstrate below how simple it is to import a Ragged Right file with a ‘.in’ extension.

Ragged Right in Notepad:

Power Query’s Interpretation:

Badly Formed Flat Files

You won’t normally know that a flat file is badly formed until actually importing it. In the below example, a csv file contains both missing and double commas. Csv stands for ‘comma separated values’, so Power Query expects a comma to split each column. Any developer who has imported data (through SSIS, SQL import, etc.) will have encountered a problem like this.

Two Commas in Notepad:

Two Commas in Power Query:

As expected, Power Query believes there is a blank column between the values ‘Liverpool’ and ‘Stoke’. As a result, the data is shifted to the right.

Missing Commas in Power Query:

A missing comma between columns works the opposite way. Data is shifted to the left.

Solution

As we already know that the csv file contains both two and missing commas, the next step is to import the data in a non-delimited format. To do this, you need to start a new Blank Query and type the following code into the Advanced Editor:

let

Value = Table.FromList(Lines.FromBinary(File.Contents(“D:Documentation
InternalPowerQueryFlatFilesCSVMissingOrMoreCommas.csv”)),Splitter.SplitByNothing())

Value

The code is turning the flat file data into a table and treating every row as an individual column – hence the ‘SplitByNothing’ syntax. This is how the query looks:

We can now replace two commas with just one comma and split the table by ‘,’. I have previously shown you how to Replace Values in an earlier blog, but will quickly demonstrate the Split Columns function.

Power Query actually auto detected all of these settings, as it can see the frequent number of commas and that they occur 10 times. If you drop down Advanced options, you can change the delimiters and output columns if they are not interpreted correctly by Power Query. Finally, we can implement the Use First Row As Headers UI function and filter the nulls in the ‘HTR’ column.

There is no easy way to write automated logic that would fix these three records because the missing commas were in different places and in our current query, different columns. There are also no string patterns that can be easily deduced. Therefore, this time around we will use Replace Values to move the data to the required columns.

There it is – a very effective way of visualising and managing both unusual and badly formed flat files in Power Query.

Next Steps

We could easily extend the code to handle other file types or delimiters. Ken Puls’ blog demonstrates how you can pass the flat file connection string in as a parameter within a function. This would be very handy for someone wanting to bring in multiple flat files through one configuration. I would be very interested to see if any other Power Query developers have their own M code that handles flat files. Feel free to comment or contact me directly if you have any suggestions on how to extend my example further.

Callum Green

Introduction to Data Wrangler in Microsoft Fabric

What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science

25
Jul

Autogen Power BI Model in Tabular Editor

In the realm of business intelligence, Power BI has emerged as a powerful tool for

12
Jul

Microsoft Healthcare Accelerator for Fabric

Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced

09
Jul

Unlock the Power of Colour: Make Your Power BI Reports Pop

Colour is a powerful visual tool that can enhance the appeal and readability of your

09
Jul

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this

20
May

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Businesses today are generating data at an unprecedented rate. Automated processing of data is essential

08
May

Exploring DALL·E Capabilities

What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.

03
May

Using Copilot Studio to Develop a HR Policy Bot

The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot

29
Apr

Data Analytics

Flat Files in Power Query

Unusual Flat Files

Badly Formed Flat Files

Solution

Next Steps

Callum Green

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot

London

Aberdeen

Surrey

Bulgaria

India

Glasgow

Manchester

Get in touch

Our Social Channels

Unusual Flat Files

Badly Formed Flat Files

Solution

Next Steps

Related Posts

Callum Green

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot