Drowning in Debt

Technical Debt

I'm not referring to financial debt but to technical debt. Technical debt is the additional cost of rework incurred by choosing the easy solution over the better approach. When a manager asks every day whether you have fixed the issue or completed Project X, it is tempting as a developer to get the code or bug fix out as quickly as possible. But at what cost? Technical debt affects every company and every IT department, and it costs money to correct.


Our Story

A little over a year ago, when I started in my new department as the full stack architect, the team was preparing to roll out upgraded software in less than a month. After deploying to a few customers, we noticed issues. The developers spent some time trying to solve them without success. Since I was new to the team, I used this opportunity to review the code and learn the features of the product.

As a developer for more than 25 years, I was horrified by the state of the code. This was brand new code that had barely been deployed, and I expected it to be cleaner and in a better state. That is not what I found. This brand new application, barely used by a few customers, was drowning in technical debt.

What I Found

  • 140K lines of code for our frontend application.
  • 78K lines of code for our backend application.
  • Duplicate code for every page of our frontend application. We had approximately 11 pages, and each page was copied from another.
  • Lots of duplicated logic. The same calculation for one field was written at least 11 times.
  • Data validation performed in the frontend application was performed again in the backend application.
  • Some fields were not validated at all.
  • Unused code. Functions that were written but never called.
  • Large nested if statements of 100 to 200 lines
  • Poorly implemented framework
  • Security vulnerabilities
  • Bi-directional flow of data

The project was failing, so we had to pause deployments and roll our customers back to the previous product while we corrected the issues in the new software. After discussions with the team leaders and our managers, we decided to clean up the code and run major test scenarios to get a better quality application out the door.

What is Technical Debt

Martin Fowler has a great article explaining cruft and technical debt. Technical debt is the consequence of software development decisions that prioritize speed over quality.

Cruft is the deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further.

https://martinfowler.com/bliki/TechnicalDebt.html

Our group had too much cruft. We had barely begun to roll out the new product to our customers and we were already drowning in technical debt.

Types of Technical Debt

There are several types of technical debt. After reading Martin Fowler's article on technical debt, I read his follow-up article on the types of debt. It classifies debt along two axes: deliberate or inadvertent, and reckless or prudent.

https://martinfowler.com/bliki/TechnicalDebtQuadrant.html

What happened with our team was probably a combination of all of these types of debt. We decided to move forward with the code cleanup, with a dedicated team focused on reducing as much technical debt as possible, while also focusing heavily on producing a quality application with minimal bugs.

What We Did

We made many changes to the department over a short period of time. We changed some of our Agile processes, worked on the application's technical debt and quality, and made security changes.

Development Standards

  • Reviewed the team's development standards and ensured they were being followed. We found that some standards were not being followed.
  • Implemented code reviews by two senior developers for all code merges.
  • Standardized our code branching strategy.
  • Reviewed existing test cases and implemented new test cases

Documentation

  • Revised the previous architecture designs of the software and infrastructure. The architecture was redesigned for the original production implementation to remove the bi-directional flow of data, which had eliminated the source of truth for work orders.
  • Created architecture diagrams and placed them on the architecture website, as none previously existed.
  • Created a documentation site in GitLab with Hugo pages for developers and the business to use.

Sonar and CI/CD

  • Reviewed Sonar scans looking for code smells that could be removed or reduced.
  • CI/CD - Automated the code builds. Continuous Delivery is still to be implemented.
  • Made it a priority that the Sonar scan results not get worse. The number of code smells and duplications could not increase.
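
The "cannot increase" rule can be enforced as a simple ratchet in the build. The sketch below is only an illustration in Python, not our pipeline and not a Sonar API: the `ScanMetrics` shape, the `ratchet_violations` function, and the numbers are all hypothetical. It assumes the baseline and current counts have already been read from a scan report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanMetrics:
    """Counts taken from a static-analysis report (hypothetical shape)."""
    code_smells: int
    duplicated_lines: int


def ratchet_violations(baseline: ScanMetrics, current: ScanMetrics) -> list[str]:
    """Return one message for every metric that is worse than the baseline."""
    violations = []
    if current.code_smells > baseline.code_smells:
        violations.append(
            f"code smells rose from {baseline.code_smells} to {current.code_smells}"
        )
    if current.duplicated_lines > baseline.duplicated_lines:
        violations.append(
            f"duplicated lines rose from {baseline.duplicated_lines} "
            f"to {current.duplicated_lines}"
        )
    return violations


# Made-up numbers, for illustration only.
baseline = ScanMetrics(code_smells=120, duplicated_lines=900)
improved = ScanMetrics(code_smells=110, duplicated_lines=900)
regressed = ScanMetrics(code_smells=125, duplicated_lines=880)

print(ratchet_violations(baseline, improved))   # empty list: the build may pass
print(ratchet_violations(baseline, regressed))  # one message: code smells rose
```

A build step could fail whenever the returned list is not empty, and the baseline could be lowered each time a cleanup lands so the numbers only move in one direction.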

Code Changes

  • Upgraded all the dependencies to the latest stable version.
  • Converted 80% of our functions to utility functions to reduce duplications. We still have some duplicated code but it is a work in progress.
  • Converted the 11 pages to share common HTML pages and common features that are executed from common functions.
  • Refactored some of the nested if statements. There were a few 100-line nested ifs that needed to be simplified.
  • Reviewed existing test cases and implemented new test cases
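
Two of the changes above, moving a duplicated calculation into a shared utility function and flattening nested if statements, can be sketched as follows. This is a minimal illustration in Python, not code from our application: the function names, fields, and thresholds are all hypothetical.

```python
def line_total(quantity: int, unit_price: float, discount_rate: float = 0.0) -> float:
    """One shared calculation that every page calls instead of copying it."""
    if quantity < 0 or unit_price < 0:
        raise ValueError("quantity and unit_price must be non-negative")
    if not 0.0 <= discount_rate <= 1.0:
        raise ValueError("discount_rate must be between 0 and 1")
    return round(quantity * unit_price * (1.0 - discount_rate), 2)


def approval_level(total: float, is_rush: bool, customer_on_hold: bool) -> str:
    """Guard clauses: each special case returns early instead of nesting deeper."""
    if customer_on_hold:
        return "blocked"
    if total <= 0:
        return "none"
    if is_rush or total > 10_000:  # hypothetical threshold
        return "manager"
    return "automatic"


# Every page reuses the same two functions.
total = line_total(quantity=4, unit_price=25.0, discount_rate=0.1)
print(total, approval_level(total, is_rush=False, customer_on_hold=False))
```

With the rule in one place, a fix or a new test case applies to every page at once, and each early return replaces one level of nesting.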

Process Changes

  • Redefined the Agile process to correctly document Problems Reported in Test (PRT) and Problems Reported in Production (PRP).
  • Redefined the Agile process for refining tasks to be more detailed. Refined tasks now include documentation, UAT and unit testing, and deployment tasks.
  • Clarified what is expected for the refinement of the tasks ensuring that the tasks could be picked up easily by any developer.
  • Reiterated that code quality and the quality of the software was of the highest importance.
  • Ensured that we had three rounds of testing before a release
    • Unit testing
    • IT Business testing
    • UAT testing

Security Changes

  • Reviewed and cleaned up database and application accounts
  • Implemented tighter security on the application and database servers

Where we are today

Our plan was to have our application code cleaned up and 95% bug free in 3 months. The cleanup of the code and major testing effort for the application took 10 months.

We had 3 IT individuals test each release for bugs, and we found 453 PRTs (Problems Reported in Test) during these 10 months. After our pilot deployment to production we found only one PRP (Problem Reported in Production), which we fixed within a day.

As of February we have migrated 9 of 30 North American customers to the new application.

Code Quality

Our code quality improved during the refactoring. We still have more code to clean up to reduce the lines of duplication.

Metric                              Before   After
Backend App Lines of Code           78K      52K
Backend App Lines of Duplication    906      335
Backend App Days of Debt            61       10
Client App Lines of Code            140K     79K
Client App Lines of Duplication     2700     769
Client App Days of Debt             207      78
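
The before and after figures are easier to compare as relative reductions. The short Python sketch below computes them from the numbers in the table. The `percent_reduction` helper is only an illustration, and the lines-of-code figures are the rounded values shown above, so those percentages are approximate.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from before to after, as a percentage with one decimal."""
    return round((before - after) / before * 100, 1)


# (before, after) pairs copied from the table above.
metrics = {
    "Backend lines of code": (78_000, 52_000),
    "Backend lines of duplication": (906, 335),
    "Backend days of debt": (61, 10),
    "Client lines of code": (140_000, 79_000),
    "Client lines of duplication": (2700, 769),
    "Client days of debt": (207, 78),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {percent_reduction(before, after)}% lower")
```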

Problems Reported

We categorized all the problems reported and prioritized them based on the criticality of the function affected. The counts highlight the major testing effort the team made to provide a higher quality product.

Environment   Blocker   Major   Minor   Trivial
Test          83        168     155     47
Prod          0         2       0       2

The two major problems that affected production were resolved within a day. The two trivial problems found by the customers were annoyances, and their fixes were implemented within a couple of weeks.

Reduced Incidents

During this refactoring we also improved our existing support model to reduce the overall number of incidents for the existing application. This required the support team and the developers to prioritize incidents, analyze the root cause of each issue, and create a problem ticket to resolve it.

Criticality            2021   2022
P1 (production down)   7      4
P2                     45     12
P3                     184    119
P4 (annoyance)         84     37
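
As a quick check on the reported figures, the Python sketch below totals the test problems by severity and the incidents by year. The variable names are only illustrative, and the numbers are copied from the tables in this article.

```python
# Problems Reported in Test, by severity.
prt_by_severity = {"Blocker": 83, "Major": 168, "Minor": 155, "Trivial": 47}

# Incidents by criticality as (2021, 2022) pairs.
incidents = {"P1": (7, 4), "P2": (45, 12), "P3": (184, 119), "P4": (84, 37)}

total_prt = sum(prt_by_severity.values())
total_2021 = sum(count_2021 for count_2021, _ in incidents.values())
total_2022 = sum(count_2022 for _, count_2022 in incidents.values())

# Multiply before dividing so the percentage is computed from whole numbers.
incident_drop_pct = 100 * (total_2021 - total_2022) / total_2021

print(f"PRTs found in test: {total_prt}")
print(f"Incidents: {total_2021} in 2021, {total_2022} in 2022")
print(f"Year-over-year drop: {incident_drop_pct}%")
```

The severity counts add up to the 453 PRTs mentioned earlier, and total incidents fell from 320 to 172.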

Retrospective

My team and I learned a lot of valuable lessons during the last year and a half.

Many of those lessons relate to the technical debt quadrant outlined by Martin Fowler and are categorized below based on our team's experience.

Deliberate + Prudent

  • Decisions to ship now and deal with consequences later
  • Decisions to implement pieces of the technology without understanding the consequences

Deliberate + Reckless

  • No formal written architecture or functional design documents
  • Immense pressure to finish software without regard to quality

Inadvertent + Reckless

  • Junior developers needed more guidance but there was no expertise or time to learn

Inadvertent + Prudent

  • Hindsight - The design was rushed because of requirements of another project

Our team did an outstanding job working many long hours and weekends to ensure the success of the application. Without their hard work and dedication we would not have the success we have today.

The team learned that following development standards and focusing on code quality reduced many of our previous deployment issues. The major testing effort by the team reduced our overall risk in delivering a quality product. The team was mentored by several senior developers and continues to be mentored by them.

Our outstanding team leaders helped drive the coordination of the refactoring and standardization of the agile process. We received technical leadership from the enterprise architects and other technical architects to review the code and design of the architecture to ensure we were implementing sustainable designs and implementing quality practices.

Upper management provided outstanding support when we faced roadblocks. They ensured that we received any assistance we needed and that roadblocks were cleared so we could succeed.