I was recently working on a hybrid project where we download files from a lake and transform the data with SSIS. I was stunned to find that there’s no native ability to name the file you download from the lake! Even more frustrating, the downloaded file was inconsistently named as Data., rendering the SSIS File System Task useless in this case also. PowerShell to the rescue…
Using an execute process task to call the following PowerShell script, we were able to overcome this challenge.
param([string] $NewFileName, [string] $LocalFolder, [string] $FileNameFilter) $file = Get-ChildItem -Path $LocalFolder -Filter $FileNameFilter | ? { $_.LastWriteTime -gt (Get-Date).AddSeconds(-15) } | select -Last 1 Move-Item -Path $file.FullName -Destination $LocalFolder""$NewFileName -Force
The script accepts three parameters:
- $NewFileName – What you want to name the file to, including the file extension.
- $LocalFolder – The local folder in which the file resides.
- $FileNameFilter – A mask to apply for searching for the downloaded file. In this case, we used Data.* where * is a wildcard for the GUID
Get-ChildItem is used to obtain the details of the latest file written to our $LocalFolder within the last 15 seconds. This just adds an element of security, minimizing risk of the script being used outside of the SSIS process and renaming files it shouldn’t.
Move-Item is used instead of Rename-Item, as in our case we wanted to overwrite the file if it already existed.
If you have multiple packages using this script, which are called in parallel by a master package, I would highly recommend adding a completion constraint between all of the Execute Package Tasks to ensure no file accidentally renamed inappropriately by another package running at the same time. If removing parallelism isn’t an option for performance reasons, you could set up a different local folder per package.
Pareto Charts in Power BI and the DAX behind them
The Pareto principle, commonly referred to as the 80/20 rule, is a concept of prioritisation.
Apr
Databricks: Cluster Configuration
Databricks, a cloud-based platform for data engineering, offers several tools that can be used to
Apr
AI Assistance in Microsoft Fabric
The exponential growth of Large Language Models (LLMs) couples with Microsoft’s close partnership with OpenAI
Apr
10 reasons why it’s worth the effort to understand the value of your data
“If leaders really want to create a data driven culture, the journey starts with them!
Apr
Content Safety in Azure AI Studio
Azure AI Content Safety is a solution designed to identify harmful content, whether generated by
Apr
Model Benchmarks in Azure AI Studio
In the constantly changing field of artificial intelligence (AI) and machine learning (ML), choosing the
Apr
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb