Progressively Improving a Ball of Mud
July 24th, 2022
I have recently been working on a project which, as usual, is a legacy ball of mud. It is impossibly hard to work with. There is no way to tell whether the change I'm making will break something unrelated, which the client may or may not discover in a few months. Every time I fix a bug, I find a dozen others that nobody was aware of for years. The application breaks all the time, and often multiple times in the same place. Everybody knows it, but what are people doing about it?
Most have already decided that the only solution is to replace it, but, years later, the legacy is still there. Replacing an application that has no documentation and no acceptance tests is a tall order. I've seen these replacements fail or postponed forever. Replacing something that has been running for decades is risky, especially since so many business processes and applications were built around it.
It's hard to start from scratch without disruptions. You need to know, for example, that customers from a certain country should have a specific disclaimer at the bottom of one of the transactional e-mails. It may not be difficult to accomplish in the new software, but you need to know that this rule exists. Now imagine thousands of little rules, and you got yourself a mammoth task. People who get excited about full rewrites end up facing similar problems, and months into the rewrite, motivation sinks as people realize that it's not at all what they imagined it would be.
Whatever people are hoping to accomplish with a replacement or a rewrite, they can do with less effort using various refactoring approaches, while mitigating many of the risks.
Initial Big Refactoring
You can make tiny refactoring like renaming a local variable, but this will not make a dent when your design is all wrong. For example, I was working on a feature where it calculated the wrong amounts for a receipt. Hunting down this bug took days, and fixing it would have required at least a week. As I said, it's a ball of mud, and this particular feature had no classes and no automated tests. All the code was technical, meaning that it's impossible to reason about it in business terms. It just speaks in integers, arrays, and anonymous data structures that don't represent anything meaningful. When code cannot be reasoned about, that is, you cannon have a mental model of it, then working with it becomes risky and extremely slow. A 10-minute fix turns into a chore of several days or even weeks.
I knew that I will need to do a big refactoring, which rougly goes like this:
- Discuss with the team
- Design new classes as a mini-module
- Implement and test classes
- Write acceptance tests
- Reuse new code
Discuss With the Team
I discussed the upcoming refactoring with my team. The objectives were to get feedback on my findings, and to ensure that nobody duplicates the effort. I also listed all the bugs we could fix and features that we could build on top of this new design, so that nobody wastes time addressing those. This, of course, only works for non-urgent work. However, if everything is always urgent, then fix the client's or manager's expectations first. Urgency with such legacy will get everyone burned out.
What I usually do once I have gathered enough understanding is create a new design. It needs to be domain-centric. I don't try to solve every problem at once, or create the perfect solution. I find the common denominator between all bugs and features, and build a foundation. I can refine my design once I start using it.
I started with the simple goal of calculating correct amounts based on raw data. This resulted in a small class for each business rule, such as having price display rules depending on the country: US and Canada show item price pre-tax, while Europe includes taxes. There were a lot more rules than I initially thought. Some were taken from the code, others were found in a slide deck where the feature was initially discussed, which gave me clues as to what to look for in the code, and which questions to ask. 90% of the work was research, which I would have needed to do regardless of the refactoring. This is why I rarely shy away from refactoring once the bulk of the work is already done, and the understanding is fresh in my mind. To postpone refactoring is to throw away this precious work. Adequately documenting findings is as tedious as just writing the code, so might as well do that.
I also designed classes for both the input and output: a Transaction object representing the raw data goes in, and a Receipt object representing the calculated data comes out.
The Transaction object was fetched from a new TranactionRepository, which loads data from 7 different tables. The Transaction was designed independently from the data source. This is important, as I don't want to be biased by the database structure, which may not be adequate to represent the Transaction. This is especially true with code that grows over time. For example, there were 3 different ways to represent taxes in the database, because the software grew from province-specific, to country-specific, and then worldwide. The taxes were controlled by 5 database fields which were added over time. They may or may not be empty, and the combination of emptiness determined how taxes were to be displayed. I translated all that nonsense into a simple array of Tax objects, so that whoever uses the Transaction object would not need to know the quirks of tax storage. The Transaction already has all of this figured out by the time it comes out of the repository.
The Receipt's goal was to be ready to go into any template without further logic. Dates were formatted according to the locale, and amounts were formatted according to the currency. Everything was calculated based on country-specific price display rules, which determined whether to included the taxes on line items. In the end, I could give the repository an ID of the Transaction, then convert that object to a Receipt according to a variety of business and localization rules. In summary, it goes like this: repository > transaction > business rules > receipt. Pretty much every feature of every information system I wrote in my entire career can be reduced to this sequence.
The resulting code was significantly shorter than the initial messy code, while also having full test coverage. It was very easy to use and nearly impossible to misuse. Making changes to well-structured code was much easier.
Once the foundation was done, leveraging it in existing code was easy. I first wrote a few acceptance tests, which passed until the point where it failed on the wrong amounts. Once the new code integrated in the exiting features, the bug should go away on its own, and it did. Calling the new classes was accomplished in 3 lines of code. The existing code that handled receipt logic became disconnected and unused, meaning that it could now be removed. It was a lot of code.
The end result was like a small oasis in the middle of a desert. When I land there, it's refreshing, and a respite from the ball of mud. Subsequent bugs (specs that people missed) were easy to debug, because I no longer needed to try and figure out thousands of lines of code. I can write an automated test and find exactly where it fails, then fix it in minutes. It went from the worst-coded feature in the application to one of the cleanest. Even if I only needed to fix the one bug, it would have still been faster to refactor than to try and brute-force the bug.
What's more, the code is self-documenting. Because it was designed in the simplest possible way, any further models will just be a repetition of the code. The only documentation I can imagine is a usage example, which is 3 lines of code. Everything else is blindly obvious.
Another advantage of this technique is that there is encapsulation, meaning that future changes would not need to touch so many classes. Developers will be able to make smaller refactoring without risking stepping on someone else's toes, so when they find new things to improve, they don't have to ask permission, as the impact of the change is much more constrained. I hope this helps you tidy up your legacy. Happy coding!