Finding duplicate objects following Git merges

Git as a version control has many advantages over Azure DevOps Server (formerly Teams Foundation Server, TFS) . The ability for multiple people to work on the same project at the same time – Azure DevOp’s Server lock out policy, while understandable, became quickly unworkable across a multi-person team. The command line tools also offer a lot of flexibility

However Git is not without its issues when working on a SQL Server solution. One of the most notable is the issues that come when trying to merge changes. Anyone who has merged changes in Git within Visual Studio will be familiar with the following:

This happens when Git is unable to automatically merge the two files and you have to manually solve conflicts using Visual Studio’s merge tool. Often this is a straightforward manner of choosing whether to take something from the local or remote file (or sometimes both). However there are times where the merge tool can’t quite work out what needs to be merged and you end up with duplicate elements in your project definition file and get the “An item with the same key has already been added” error when your project is loaded:

Working on a project with many hundreds of packages per project trying to track down these duplicate elements quickly became a time sink so I wrote a Python script which will identify them for me. It traverses the project metadata file, which is an XML file, and pulls out each instance of the PackageMetaData tag. It then loads these into a Pandas dataframe and does a “group by” to get a count and then only return those with a count greater than 1. A quick way to identify where duplicate elements exist in your merged file which can then be deleted from the XML and the project metadata file reloaded in Visual Studio:

import xml.etree.ElementTree as ET
import pandas as pd

# Variable pointing to your local xml
localXml = "C:\\Users\\DPS\\Desktop\\SSIS.xml"


namespace = "{www.microsoft.com/SqlServer/SSIS}"
e = ET.parse(localXml).getroot()

# traverses the XML document to get all the PackageMetaData elements
t = e.findall("DeploymentModelSpecificContent/Manifest/{0}Project/{0}DeploymentInfo/{0}PackageInfo/{0}PackageMetaData".format(namespace))
    
# creates an empty list
l = []

# loops through each instance of PackageMetaData and pulls out the Package and the Parameter names and puts them in a list
for a in t:
    a_name = a.attrib.get('{0}Name'.format(namespace))
    for z in a.getiterator('{0}Parameter'.format(namespace)):
        z_name = z.attrib.get('{0}Name'.format(namespace))
        l.append(a_name + '-' + z_name)
        

# loads the list into a dataframe, splits on the "-" and renames the columns
df = pd.DataFrame(l, columns=["column"])
df = df["column"].str.split('-', expand=True)
df.columns = ['package', 'parameter']

# counts the number of instances of each and filter where greater than 1
df = df.groupby(['package', 'parameter']).size().to_frame('size')
df = df[df['size'] > 1]

# print to the console
print(df)

Business Intelligence

Finding duplicate objects following Git merges

David Stelfox

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot

London

Aberdeen

Surrey

Bulgaria

India

Glasgow

Manchester

Get in touch

Our Social Channels

Related Posts

David Stelfox

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot