Finding duplicate objects following Git merges

Git as a version control has many advantages over  Azure DevOps Server (formerly Teams Foundation Server, TFS) . The ability for multiple people to work on the same project at the same time – Azure DevOp’s Server lock out policy, while understandable, became quickly unworkable across a multi-person team. The command line tools also offer a lot of flexibility

However Git is not without its issues when working on a SQL Server solution. One of the most notable is the issues that come when trying to merge changes. Anyone who has merged changes in Git within Visual Studio will be familiar with the following:

Error when merging changes in Git within Visual Studio

This happens when Git is unable to automatically merge the two files and you have to manually solve conflicts using Visual Studio’s merge tool. Often this is a straightforward manner of choosing whether to take something from the local or remote file (or sometimes both). However there are times where the merge tool can’t quite work out what needs to be merged and you end up with duplicate elements in your project definition file and get the “An item with the same key has already been added” error when your project is loaded:

Microsoft Visual Studio error

Working on a project with many hundreds of packages per project trying to track down these duplicate elements quickly became a time sink so I wrote a Python script which will identify them for me. It traverses the project metadata file, which is an XML file, and pulls out each instance of the PackageMetaData tag. It then loads these into a Pandas dataframe and does a “group by” to get a count and then only return those with a count greater than 1. A quick way to identify where duplicate elements exist in your merged file which can then be deleted from the XML and the project metadata file reloaded in Visual Studio:

 

import xml.etree.ElementTree as ET
import pandas as pd

# Variable pointing to your local xml
localXml = "C:\\Users\\DPS\\Desktop\\SSIS.xml"


namespace = "{www.microsoft.com/SqlServer/SSIS}"
e = ET.parse(localXml).getroot()

# traverses the XML document to get all the PackageMetaData elements
t = e.findall("DeploymentModelSpecificContent/Manifest/{0}Project/{0}DeploymentInfo/{0}PackageInfo/{0}PackageMetaData".format(namespace))
    
# creates an empty list
l = []

# loops through each instance of PackageMetaData and pulls out the Package and the Parameter names and puts them in a list
for a in t:
    a_name = a.attrib.get('{0}Name'.format(namespace))
    for z in a.getiterator('{0}Parameter'.format(namespace)):
        z_name = z.attrib.get('{0}Name'.format(namespace))
        l.append(a_name + '-' + z_name)
        

# loads the list into a dataframe, splits on the "-" and renames the columns
df = pd.DataFrame(l, columns=["column"])
df = df["column"].str.split('-', expand=True)
df.columns = ['package', 'parameter']

# counts the number of instances of each and filter where greater than 1
df = df.groupby(['package', 'parameter']).size().to_frame('size')
df = df[df['size'] > 1]

# print to the console
print(df)

 

Leave a Reply

Your email address will not be published. Required fields are marked *