The Challenge
Write a program that generates all two-word anagrams of the string “documenting”, in SQL Server.
e.g. “Documenting” = “Document Gin”.
Introduction
This weeks’ challenge was….well, a challenge. In our first session everyone attempted it via paired programming, for this dojo we decided to attempt the problem with one larger group, consisting of 5 people. This technique encourages an open forum for developers to discuss ideas, with one person at the computer coding. The assigned coder must swap with another participant every 5 – 10 minutes.
Preparation
Before beginning the Dojo, I asked for one of the Senior Consultants to attempt to complete the problem and provide the code and concepts behind his thinking. The idea is that one challenge can be solved in a number of ways, with 5 people working collaboratively thinking in a completely different way to one individual coder. We provided a file containing 10,000 words (which would become the master list for the anagram solution). If you would like to try this yourself, the text file can be downloaded from here.
Senior Developer’s Solution
Most importantly, the senior developer DID NOT use the TDD development methodology. It took around 2 hours for the developer to implement a fully functioning stored procedure. Here is how it was achieved:
1. Stored procedure, encapsulating a long T-SQL script. The solution was focused around string manipulation and recursive querying.
2. While loop turns word into a pivoted array of individual letters, which can then be compared against the words in the word list table.
3. Recursive CTE returns words consisting only of letters in that array
4. XML PATH used to create CHECKSUM string of letters and counts.
5. All word combinations of correct length returned and checked against checksum to validate letter counts.
The solution was built to be flexible from the start – it returned anagrams with ‘n’ number of words rather than the proposed 2. It would also work with any provided starting word and special characters.
Code Dojo Solution
The first task was to brainstorm ideas on how to conquer the challenge, starting with whether it could even be achieved in SQL Server! The coding team weighed up string manipulation, recursive CTE’s, a cursor or mathematical calculations, amongst other SQL Server functions. The general consensus was to avoid recursion (where possible) to ensure faster query results and mathematical algorithms to compare against the data in the imported wordlist table.
The development team used TDD to implement the mathematical approach to solving the problem. Each code enhancement contained a new automated test script that would capture any failures and enable a simple rollback to the last working codebase. This proved to be a successful technique, as ideas were changing constantly throughout development.
Actual Dojo Solution
After around 90 minutes of SQL coding and TDD, the team managed to come up with a working solution. This is how they achieved it:
1. A SQL Function to work out a unique value for a character (letter).
a. Each letter of the word ‘Documenting’ has a case sensitive ASCII value e.g. ‘D’ is 68 and ‘d’ is 100.
b. All letters were converted to uppercase, to ensure a non-case sensitive approach.
c. Apply the POWER SQL function within a given letter, which gives it a unique number and cannot be duplicated by another letter.
2. Simple stored procedure that looks up the existing words in the full word list table, which references the function whilst comparing letter values.
a. Find all word combinations and their total POWER value.
b. Compare the total number against the hardcoded word ‘Documenting’
c. Return and records that have two words adding up to the total POWER of ‘Documenting’.
d. Nest the hard coded word into the procedure, which means any word can be compared at run time.
3. The TDD approach helped facilitate the iterative, code review process.
Conclusion
Although the Senior Developer and Dojo Team’s solutions met the minimum expectations for the challenge, there were a number of pro’s and con’s to both solutions.
The Senior Developer thought ahead and made improvements to the requirements, such as dynamically handling more than 2 word anagrams. He also demonstrated some of the under used and more powerful functionality within SQL. However, it takes around 2 minutes to execute one anagram and the code itself is not the most efficient. By not using TDD in his approach, he over complicated the solution and did not encounter performance bottlenecks until the end of the build.
On the other hand, the Dojo team fully practiced TDD. This was reflected in the much smaller codebase and, most importantly, the speed in which the anagram procedure executed was much quicker (23 Seconds). Their solution is limited in that it is very rigid and cannot handle more than two word anagrams. It also cannot handle special characters, whereas the Senior Developer solution can. However, these were not requirements of the solution – would a client accept a slower, more complicated product that handles requirements they do not need?
Overall, both solutions work adequately but it is clear that when confronting a technical challenge/problem, running a Dojo and practising TDD can bring more efficient results. Had we added further requirements to the challenge, maybe the dojo team would have found another, even more efficient solution.
References
General
Good Dojo Practices – http://codingdojo.org/
TDD – http://code.tutsplus.com/tutorials/the-newbies-guide-to-test-driven-development–net-13835
SQL Functions
POWER – https://msdn.microsoft.com/en-us/library/ms174276.aspx
ASCII – https://msdn.microsoft.com/en-us/library/ms177545.aspx
XML PATH – http://blogs.msdn.com/b/mind_talks/archive/2012/01/18/xml-path-for-sql-server.aspx
CHECKSUM – https://msdn.microsoft.com/en-us/library/ms189788.aspx
Dojo Code
For access to the Senior Developer’s and the Dojo team’s SQL solutions, please leave a comment and I will get back to you directly. It would be interesting to see if anyone else has a better technique (both in terms of code length and performance) and welcome any suggestions.
Introduction to Data Wrangler in Microsoft Fabric
What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science
Jul
Autogen Power BI Model in Tabular Editor
In the realm of business intelligence, Power BI has emerged as a powerful tool for
Jul
Microsoft Healthcare Accelerator for Fabric
Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced
Jul
Unlock the Power of Colour: Make Your Power BI Reports Pop
Colour is a powerful visual tool that can enhance the appeal and readability of your
Jul
Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2
Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this
May
GPT-4 with Vision vs Custom Vision in Anomaly Detection
Businesses today are generating data at an unprecedented rate. Automated processing of data is essential
May
Exploring DALL·E Capabilities
What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.
May
Using Copilot Studio to Develop a HR Policy Bot
The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot
Apr