Adatis Coding Dojo – Session No.2

The Challenge

Write a program that generates all two-word anagrams of the string “documenting”, in SQL Server.
e.g. “Documenting” = “Document Gin”.

Introduction

This weeks’ challenge was….well, a challenge. In our first session everyone attempted it via paired programming, for this dojo we decided to attempt the problem with one larger group, consisting of 5 people. This technique encourages an open forum for developers to discuss ideas, with one person at the computer coding. The assigned coder must swap with another participant every 5 – 10 minutes.

Preparation

Before beginning the Dojo, I asked for one of the Senior Consultants to attempt to complete the problem and provide the code and concepts behind his thinking. The idea is that one challenge can be solved in a number of ways, with 5 people working collaboratively thinking in a completely different way to one individual coder. We provided a file containing 10,000 words (which would become the master list for the anagram solution). If you would like to try this yourself, the text file can be downloaded from here.

Senior Developer’s Solution

Most importantly, the senior developer DID NOT use the TDD development methodology. It took around 2 hours for the developer to implement a fully functioning stored procedure. Here is how it was achieved:

1. Stored procedure, encapsulating a long T-SQL script. The solution was focused around string manipulation and recursive querying.

2. While loop turns word into a pivoted array of individual letters, which can then be compared against the words in the word list table.

3. Recursive CTE returns words consisting only of letters in that array

4. XML PATH used to create CHECKSUM string of letters and counts.

5. All word combinations of correct length returned and checked against checksum to validate letter counts.

The solution was built to be flexible from the start – it returned anagrams with ‘n’ number of words rather than the proposed 2. It would also work with any provided starting word and special characters.

Code Dojo Solution

The first task was to brainstorm ideas on how to conquer the challenge, starting with whether it could even be achieved in SQL Server! The coding team weighed up string manipulation, recursive CTE’s, a cursor or mathematical calculations, amongst other SQL Server functions. The general consensus was to avoid recursion (where possible) to ensure faster query results and mathematical algorithms to compare against the data in the imported wordlist table.

The development team used TDD to implement the mathematical approach to solving the problem. Each code enhancement contained a new automated test script that would capture any failures and enable a simple rollback to the last working codebase. This proved to be a successful technique, as ideas were changing constantly throughout development.

Actual Dojo Solution

After around 90 minutes of SQL coding and TDD, the team managed to come up with a working solution. This is how they achieved it:

1. A SQL Function to work out a unique value for a character (letter).

a. Each letter of the word ‘Documenting’ has a case sensitive ASCII value e.g. ‘D’ is 68 and ‘d’ is 100.

b. All letters were converted to uppercase, to ensure a non-case sensitive approach.

c. Apply the POWER SQL function within a given letter, which gives it a unique number and cannot be duplicated by another letter.

2. Simple stored procedure that looks up the existing words in the full word list table, which references the function whilst comparing letter values.

a. Find all word combinations and their total POWER value.

b. Compare the total number against the hardcoded word ‘Documenting’

c. Return and records that have two words adding up to the total POWER of ‘Documenting’.

d. Nest the hard coded word into the procedure, which means any word can be compared at run time.

3. The TDD approach helped facilitate the iterative, code review process.

Conclusion

Although the Senior Developer and Dojo Team’s solutions met the minimum expectations for the challenge, there were a number of pro’s and con’s to both solutions.

The Senior Developer thought ahead and made improvements to the requirements, such as dynamically handling more than 2 word anagrams. He also demonstrated some of the under used and more powerful functionality within SQL. However, it takes around 2 minutes to execute one anagram and the code itself is not the most efficient. By not using TDD in his approach, he over complicated the solution and did not encounter performance bottlenecks until the end of the build.

On the other hand, the Dojo team fully practiced TDD. This was reflected in the much smaller codebase and, most importantly, the speed in which the anagram procedure executed was much quicker (23 Seconds). Their solution is limited in that it is very rigid and cannot handle more than two word anagrams. It also cannot handle special characters, whereas the Senior Developer solution can. However, these were not requirements of the solution – would a client accept a slower, more complicated product that handles requirements they do not need?

Overall, both solutions work adequately but it is clear that when confronting a technical challenge/problem, running a Dojo and practising TDD can bring more efficient results. Had we added further requirements to the challenge, maybe the dojo team would have found another, even more efficient solution.

References

General

Good Dojo Practices – http://codingdojo.org/
TDD – http://code.tutsplus.com/tutorials/the-newbies-guide-to-test-driven-development–net-13835

SQL Functions

POWER – https://msdn.microsoft.com/en-us/library/ms174276.aspx
ASCII – https://msdn.microsoft.com/en-us/library/ms177545.aspx
XML PATH – http://blogs.msdn.com/b/mind_talks/archive/2012/01/18/xml-path-for-sql-server.aspx
CHECKSUM – https://msdn.microsoft.com/en-us/library/ms189788.aspx

Dojo Code

For access to the Senior Developer’s and the Dojo team’s SQL solutions, please leave a comment and I will get back to you directly.  It would be interesting to see if anyone else has a better technique (both in terms of code length and performance) and welcome any suggestions.