Improving Software Delivery with Unit Regression Tests

Diffblue HQ
May 20, 2020

Whether your organization is a bank that relies on applications to perform high-speed trading, a retailer that wants to provide the best online shopping experience for customers, or a manufacturer that needs to keep track of a vast inventory of parts to ensure just-in-time delivery, software is key to your success. Software has become critical to creating competitive advantages, whether it is deployed internally to increase employee productivity or externally to serve customers better.

To gain and maintain these competitive advantages, organizations need to release new features quickly.

In this post, we explore the key metrics to focus on when improving your development processes, and how unit regression tests can help you become an elite-performing organization.

Measure, then improve

The only way to stay ahead and maintain competitive advantages through software is to adopt a continuous improvement methodology for your software development life cycle (SDLC). If you are not continually trying to improve your software delivery process, you will fall behind competitors who release new and innovative features first. Let’s look at how you can implement continuous improvement in your organization to prevent this from happening.

As the management consultant Peter Drucker is often quoted as saying, “If you can’t measure it, you can’t improve it.” So, before we can think about improving, we need to measure something. But what?

Organizations have tried various metrics to measure developer effectiveness, including lines of code written. Initially, this metric seemed like a good idea: surely, the more code a developer writes, the more productive they are, right? Not necessarily. Organizations realized that this metric led programmers to write bloatware: verbose, unoptimized code that could have been written to run much more efficiently. Nor did the number of lines tell organizations whether the code was effective and free of defects. Metrics have to incentivize the behaviors that the organization finds desirable.

Extensive research by DevOps practitioners has led the industry to settle on the DORA metrics as the most appropriate measures of developer effectiveness for teams that want to improve their performance.

DORA Metrics

Over six years, the DevOps Research and Assessment (DORA) team studied many criteria to determine which software delivery capabilities contributed the most to a company’s overall performance. The researchers based the measures on real-world analysis rather than theoretical models. As a result, organizations that work towards improving these metrics within their developer teams increase the overall performance of their business and have a much higher chance of joining the group of elite performers.

The four metrics are:

  • Lead Time
  • Deployment Frequency
  • Mean Time to Restore
  • Change Failure Rate

DORA Metrics and Unit Regression Tests

In this section, we look at the four metrics in more detail, describe their impact on the business, and analyze how organizations can improve them by automating unit regression tests.

Metric #1: Lead Time (LT)

Lead time measures how long it takes from a developer committing code to the organization running that code in production without any issues. We touched on this metric and on change failure rate in a previous post, but we go into more detail here.

Long lead times of days or even weeks mean the developer has moved on to other features by the time users find any issues in production. Fixing the reported problem then takes a significant amount of time, because the developer has to relearn the logic of the code they wrote before they can start troubleshooting. Revisiting the code takes time and prevents the developer from making progress on other features.

However, if an organization can deploy a new feature within a day of the developer finishing the code, issues that are discovered by users can be fixed quickly as the code and logic are still fresh in the developer’s mind.

According to the State of DevOps Report for 2019, an organization is classed as elite if it can get from code commit to running in production in less than a day. High performers achieve this in under a week.
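To make the definition concrete, here is a minimal Python sketch of how a team might compute lead time from its own records. The timestamps, the function names, and the choice of the median as the "typical" value are assumptions made for illustration; only the two thresholds (under a day, under a week) come from the figures quoted above.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(committed_at, deployed_at):
    """Time from code commit to the change running in production."""
    if deployed_at < committed_at:
        raise ValueError("deployment cannot precede the commit")
    return deployed_at - committed_at


def classify_lead_time(value):
    """Map a lead time onto the two tiers quoted in this post."""
    if value < timedelta(days=1):
        return "elite"
    if value < timedelta(weeks=1):
        return "high"
    return "below high"  # lower tiers are not described in this post


# Hypothetical (commit, deploy) timestamps for three changes.
changes = [
    (datetime(2020, 5, 18, 9, 0), datetime(2020, 5, 18, 15, 30)),
    (datetime(2020, 5, 18, 11, 0), datetime(2020, 5, 19, 10, 0)),
    (datetime(2020, 5, 19, 14, 0), datetime(2020, 5, 22, 14, 0)),
]

lead_times = [lead_time(commit, deploy) for commit, deploy in changes]
typical = median(lead_times)  # the median is less sensitive to one slow change
print(typical, classify_lead_time(typical))
```

Using the median rather than the mean is a design choice here: a single change that waits days for release would otherwise dominate the figure.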

Unit tests increase the chance of issues being detected and fixed early in the SDLC. Fixing issues early results in end-users reporting fewer problems when companies release new features into production.

Writing unit tests manually can increase lead time significantly. Unit tests written automatically with artificial intelligence typically take about 10% of the time a developer would need to write them by hand, which shortens lead times and gets new features to users sooner.
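For readers who have not met the term, a unit regression test pins down what a unit of code does today so that a later change that alters the behavior fails the test. The sketch below is hand-written for illustration and is not output from any tool; `shipping_cost` and its pricing rule are hypothetical, and Python's standard `unittest` module stands in for whatever test framework a team actually uses.

```python
import unittest


def shipping_cost(weight_kg):
    """Hypothetical production code: a flat fee plus a per-kilogram charge."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return 4.0 + 1.5 * weight_kg


class ShippingCostRegressionTest(unittest.TestCase):
    """Records current behavior; a failure means the behavior has changed."""

    def test_known_weight(self):
        self.assertEqual(shipping_cost(2.0), 7.0)

    def test_rejects_non_positive_weight(self):
        with self.assertRaises(ValueError):
            shipping_cost(0)


# Run the suite programmatically so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShippingCostRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later changes the flat fee or removes the input check, one of these tests fails in the pipeline before the change reaches users.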

Metric #2: Deployment Frequency (DF)

Deployment frequency measures how often an organization delivers valuable features to users. This metric improves when the scope of each software release (sometimes referred to as the “batch size”) is kept small. A small batch size gets software features to users more quickly, which helps to speed up feedback, reduce risk compared to a “big bang” launch, and increase the motivation and sense of urgency of developers.

Elite organizations deploy on demand, as soon as a new feature has been implemented and has passed the automated delivery pipeline; high-performing organizations deploy between once a day and once a week.
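As a rough illustration, deployment frequency can be derived from nothing more than a list of production deployment dates. In the Python sketch below, the dates and the function name are hypothetical, and grouping by ISO week is just one reasonable way to bucket the data.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count production deployments per ISO (year, week)."""
    counts = Counter()
    for day in deploy_dates:
        iso_year, iso_week = day.isocalendar()[:2]
        counts[(iso_year, iso_week)] += 1
    return dict(counts)


# Hypothetical deployment log covering two consecutive weeks.
deploys = [
    date(2020, 5, 11), date(2020, 5, 12), date(2020, 5, 12),
    date(2020, 5, 14), date(2020, 5, 15),
    date(2020, 5, 18), date(2020, 5, 20),
]

per_week = deployments_per_week(deploys)
slowest_week = min(per_week.values())

# Caveat: a week with no deployments does not appear in per_week at all,
# so a real report would need to fill in the missing weeks with zero.
meets_weekly_cadence = slowest_week >= 1
print(per_week, meets_weekly_cadence)
```

Looking at the slowest week, rather than the average, shows whether the team sustains at least the once-a-week pace mentioned above.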

To achieve these rates of deployment frequency, organizations need a high degree of automation in their SDLC, as human intervention slows down the process. As discussed above, automating the creation of unit regression tests reduces the time it takes to test code units, which means developers can deploy more frequently.

Lead Time and Deployment Frequency both measure how fast developers can get features to users. However, equally important are metrics that measure the quality of code delivered as there is no point in shipping software quickly if it is full of bugs or is not fit for purpose. Mean Time to Restore and Change Failure Rate are metrics that encompass quality measurements.

Metric #3: Mean Time to Restore (MTTR)

In high-velocity development environments, errors are inevitable. Facebook used to have the company motto “move fast and break things.” This was how Mark Zuckerberg told developers that he considered it critical to get new features in front of customers quickly, even at the cost of introducing bugs they could fix later!

So, if we accept that failure is inevitable in a fast-moving environment, it is essential to track how quickly developer teams fix issues when users report them. The Mean Time to Restore metric measures how long it takes to get an application up and running once a user reports an outage or significant degradation. If organizations take too long to restore a business-related application, employee productivity suffers or customers become unhappy because of the reduced level of service.

Elite organizations restore a primary application within an hour and high performers within a day.
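The calculation itself is a simple average, as the Python sketch below shows. The incident timestamps and function names are hypothetical; only the two thresholds (within an hour, within a day) are taken from the figures above.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average of (restored - reported) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = (restored - reported for reported, restored in incidents)
    return sum(durations, timedelta()) / len(incidents)


def classify_mttr(mttr):
    """Map an MTTR onto the two tiers quoted in this post."""
    if mttr <= timedelta(hours=1):
        return "elite"
    if mttr <= timedelta(days=1):
        return "high"
    return "below high"  # lower tiers are not described in this post


# Hypothetical (reported, restored) timestamps for three incidents.
incidents = [
    (datetime(2020, 5, 4, 10, 0), datetime(2020, 5, 4, 10, 20)),
    (datetime(2020, 5, 11, 14, 0), datetime(2020, 5, 11, 14, 45)),
    (datetime(2020, 5, 18, 9, 0), datetime(2020, 5, 18, 11, 25)),
]

mttr = mean_time_to_restore(incidents)
print(mttr, classify_mttr(mttr))
```

In this made-up data, two of the three incidents were resolved within the hour, but the third pulls the mean to 70 minutes, which is why it is worth looking at individual incidents as well as the average.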

If new code causes an outage or significant slowdown, it needs to be changed, which involves it going through the entire CI/CD pipeline again. Automatically created unit regression tests speed up this process as described above for Lead Time.

In addition, developers who run unit regression tests help to prevent issues from reaching production, which reduces the chance of an outage and therefore the need to restore a defective service.

Metric #4: Change Failure Rate (CFR)

It is essential to track how often a software modification results in failure so that organizations can identify improvements to the overall SDLC process. Change Failure Rate measures the percentage of releases in which issues go undetected during testing and reach production.

If the CFR is high, then the cost of resolving issues is higher than if they were detected early in the development process. Reducing the CFR reduces the number of times developers have to go back into their code to troubleshoot issues found by users in production.

Both elite and high-performing organizations see between 0% and 15% of changes requiring remedial action such as a hotfix, rollback, fix forward, or patch. Companies achieve low CFRs by ensuring that as many errors as possible are caught and fixed early in the SDLC to prevent them from being rolled out to production.
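As a worked example, the Python sketch below computes CFR for a hypothetical period in which 5 of 40 production changes needed a hotfix, rollback, fix forward, or patch. The function name and the counts are made up for illustration; the 15% ceiling is the figure quoted above.

```python
def change_failure_rate(total_changes, failed_changes):
    """Percentage of production changes that needed remedial action."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= failed_changes <= total_changes:
        raise ValueError("failed_changes must be between 0 and total_changes")
    return 100.0 * failed_changes / total_changes


# Hypothetical counts for one reporting period.
rate = change_failure_rate(total_changes=40, failed_changes=5)

# 15% is the upper bound quoted for elite and high performers.
within_quoted_range = rate <= 15.0
print(rate, within_quoted_range)
```

Here the rate works out to 12.5%, which falls inside the 0% to 15% band.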

The introduction of unit regression tests into the CI/CD pipeline results in developers catching issues early in the development cycle. By adopting a test-often-test-early mindset using automatically created unit regression tests, developers can significantly reduce the CFR and can increase the quality of the software they deploy to the production environment.

Improving performance

Continuous improvement is a goal of all developer teams that adopt DevOps practices. Before making any attempts to improve, teams have to measure their processes to determine what they can improve and to monitor progress.

The four DORA metrics described in this post (Lead Time, Deployment Frequency, Mean Time to Restore, and Change Failure Rate) have industry-wide recognition as the key measures determining whether an organization is an elite or high performer. It follows that improving these metrics improves the performance of an organization.

Increasing software velocity

We have discussed how automatically creating unit regression tests with Diffblue Cover helps improve these metrics:

  • Lead Time falls because writing unit regression tests automatically takes only 10% of the time a developer would need to write them by hand
  • Deployment Frequency increases because writing and updating unit regression tests is automated
  • Mean Time to Restore falls because the AI engine updates unit regression tests more quickly than developers can. In addition, there may be fewer outages or service degradations overall, as more of the code is tested and issues are detected during early testing stages rather than in production
  • Change Failure Rate falls because greater code coverage alerts developers to issues before they are deployed to production

All of these metrics factor into the length of your release cycles and your organization’s software development velocity. A team that has higher quality code and effectively uses automation is better equipped to deploy releases more quickly and more often as part of a CI/CD pipeline.

Learn more about how Diffblue Cover creates unit regression tests.

Originally published on May 20, 2020.


