Issues working with large legacy automation test suites and how (we think) we got it right the next time

“The automation tests are passing at around 80%, so I think we’re good to release.”

We’ve heard this expression before and it doesn’t bode well. We release code into the production environment and within a few days a new defect is reported.

“How did this get missed?” asks the editor.

It’s a fair question. This defect is not an edge case; it is critical to someone’s workflow. When we ask one of the testers how they carried out their testing, we learn that they performed exploratory testing of the new feature and some regression testing of the affected areas. The defect was not in an area expected to be affected by the changes in this release. Fair enough; that seems a sensible approach.

“What about our automated tests? Shouldn’t they have picked this up?” asks the Programme Manager.

Well, the answer is ‘not necessarily’ – we cannot check everything. In this case, however, we did have an automated test that covered the exact scenario in the defect report. So what happened? Simply put, the result was lost in the noise of a test suite in bad shape. Closer analysis shows that the test case in question had been failing for some time.

“Why wasn’t the defect raised sooner?” I hear you ask. Well, a defect was raised for a failure, just not this one. The test had failed earlier because of a known, low-priority defect that had yet to be fixed. When the test failed again, it was assumed that the new failure was entirely due to that known problem.

“So why don’t we just fix the tests to prevent this kind of thing happening again?”

The product is important but no longer heavily invested in. It’s a mature product coming to the end of its life. The team looking after it is much smaller, and the automated test suite it has inherited is large (over a thousand Java/Selenium 1.0 tests), based on older technology, slow and brittle. The suite was also written by coders of varying skill, many of whom have since left the company, and with little discipline around peer review.

It’s clear that a lot of investment would be required to solve these problems.

“Are there any quick wins?” asks the Programme Manager.

“How about excluding all the currently failing tests?”

This might push your pass rate closer to 100%, but it leaves a gaping hole in your coverage and does nothing about the fragility and unreliability of the rest of the suite.

A better question to ask is ‘What value do we get from this test suite?’.

The noise generated by so many failures makes it hard to tell whether there are any new defects. The time spent nursing a flaky automation suite along and analysing its results is significant. Perhaps the information these automated tests generate is actually damaging? Might the time saved by not running the suite (and not analysing its failures) be better spent on more exploratory testing? Perhaps a much smaller, smarter set of reliable and resilient tests would prove more valuable: a suite whose failures point to real defects would deliver more value, and be used with more enthusiasm, than a much larger suite that returns dubious information.

Work on replacing the legacy product is well under way. The team building the replacement is smaller than the one looking after the legacy product, yet its automated tests are relied upon and are intrinsic to how the team works.

So why is this working so (apparently) well and what are we doing to avoid the same pitfalls?

For one, the team structure, culture and development practices are quite different. The new team has adopted Behaviour-Driven Development (BDD), with Cucumber as its automation tool (see Sarah Wells’ article Behaviour-Driven Development at the FT). BDD gives the team a good shared understanding of what we are building; we run what we call ‘speccing sessions’ to build it. For the user stories we decide to automate, the automated tests are built directly on top of the scenarios derived from that shared understanding. Straight away, every automated test has a purpose that is clear to everyone (Product Owners, Business Analysts, Developers, Testers), and the test coverage is well understood too. The legacy tests, by contrast, have no link to the system’s original requirements, and their only description is the test method name, which is not always clear. The new Cucumber tests also serve as our functional documentation.

Here is an example to illustrate:

Scenario: A simple article can be successfully published
   Given a simple article exists in Methode
     And the article has not already been published
    When I publish that article to the Content Store
    Then it should be available from the Read API
     And the id should be the same as the Methode UUID
     And the title should be the same as the Methode headline

(Methode is one of our content management systems).
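To make the link between a scenario and its automation more concrete, here is a minimal Java sketch of what step definitions for the scenario above might look like. It is not our actual code: the Article record, the in-memory contentStore map standing in for the Content Store and Read API, and every method name are hypothetical. In Cucumber-JVM each method would be bound to its step by an annotation such as @Given or @Then (from io.cucumber.java.en); the annotations are left out here so the sketch compiles and runs with the JDK alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical step definitions for the "simple article" scenario.
// In a Cucumber-JVM project each method would carry an @Given/@When/@Then
// annotation; they are omitted so this sketch needs only the JDK.
class PublishArticleSteps {

    // Hypothetical article model holding only the fields the scenario checks.
    record Article(String id, String title) {}

    // Hypothetical in-memory stand-in for the Content Store behind the Read API.
    private final Map<String, Article> contentStore = new HashMap<>();

    private String methodeUuid;
    private String methodeHeadline;
    private Article readApiResult;

    // Fails the scenario with a descriptive message when a check does not hold.
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Hypothetical Read API lookup by id.
    private Optional<Article> readApi(String id) {
        return Optional.ofNullable(contentStore.get(id));
    }

    // Given a simple article exists in Methode
    void aSimpleArticleExistsInMethode() {
        methodeUuid = UUID.randomUUID().toString();
        methodeHeadline = "A simple headline";
    }

    // And the article has not already been published
    void theArticleHasNotAlreadyBeenPublished() {
        check(readApi(methodeUuid).isEmpty(), "article is already published");
    }

    // When I publish that article to the Content Store
    void iPublishThatArticleToTheContentStore() {
        contentStore.put(methodeUuid, new Article(methodeUuid, methodeHeadline));
    }

    // Then it should be available from the Read API
    void itShouldBeAvailableFromTheReadApi() {
        readApiResult = readApi(methodeUuid)
                .orElseThrow(() -> new AssertionError("article not found in Read API"));
    }

    // And the id should be the same as the Methode UUID
    void theIdShouldMatchTheMethodeUuid() {
        check(readApiResult.id().equals(methodeUuid), "id does not match Methode UUID");
    }

    // And the title should be the same as the Methode headline
    void theTitleShouldMatchTheMethodeHeadline() {
        check(readApiResult.title().equals(methodeHeadline), "title does not match headline");
    }

    // Runs the steps in scenario order; the first failing check stops the run.
    Article runScenario() {
        aSimpleArticleExistsInMethode();
        theArticleHasNotAlreadyBeenPublished();
        iPublishThatArticleToTheContentStore();
        itShouldBeAvailableFromTheReadApi();
        theIdShouldMatchTheMethodeUuid();
        theTitleShouldMatchTheMethodeHeadline();
        return readApiResult;
    }

    public static void main(String[] args) {
        Article served = new PublishArticleSteps().runScenario();
        System.out.println("Scenario passed: " + served.title());
    }
}
```

Each step stays small and is named after the sentence it implements, which is what lets the feature file double as readable documentation while the code behind it remains ordinary, reviewable Java.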

The new team structure has shifted roles and responsibilities. Traditionally, automated tests were written by testers. This pushed ownership of the tests onto the testers and, even with the best of intentions, the tests were from then on regarded as ‘something for the testers to worry about’. In the new team, testers help define the scenarios and use their experience to help create smart test cases; it is the developers who automate these ‘tests’. (The term ‘test’ is used loosely here, since an automated script is incapable of ‘testing’, only ‘checking’.) Now all stakeholders care about the test suite: its results, code quality, extensibility, reliability and so on.

A further step to cement the importance of the automated suite was to make it a critical part of our build pipeline. We cannot commit or promote code to production unless our acceptance tests pass completely. As a result, the test suite is treated as production-quality code: issues are fixed straight away, just as they would be for the product itself. Because the tests are an intrinsic part of the build pipeline, we also have a vested interest in keeping the automation run as short as possible, partly so code moves through the pipeline faster and partly to get quicker feedback.
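The difference between the ‘around 80%’ judgement we started with and a gate that requires every acceptance check to pass can be sketched in a few lines of Java. This is an illustration, not our pipeline configuration: the CheckResult record, the check names and the 0.8 threshold are all hypothetical. With one new regression among five checks, the threshold still says ‘release’, while the strict gate blocks promotion.

```java
import java.util.List;

class AcceptanceGate {

    // Hypothetical result of one automated acceptance check.
    record CheckResult(String name, boolean passed) {}

    // Threshold-style judgement: "passing at around 80%, so we're good to release".
    static boolean passRateAtLeast(List<CheckResult> results, double threshold) {
        long passed = results.stream().filter(CheckResult::passed).count();
        return (double) passed / results.size() >= threshold;
    }

    // Strict gate: code is promoted only if every acceptance check passed.
    static boolean allPassed(List<CheckResult> results) {
        return results.stream().allMatch(CheckResult::passed);
    }

    public static void main(String[] args) {
        List<CheckResult> results = List.of(
                new CheckResult("publish simple article", true),
                new CheckResult("update headline", true),
                new CheckResult("delete article", true),
                new CheckResult("publish with images", true),
                new CheckResult("read API returns title", false)); // a new regression

        System.out.println("80% threshold passes: " + passRateAtLeast(results, 0.8)); // true
        System.out.println("strict gate passes: " + allPassed(results));              // false
    }
}
```

The strict gate gives no room for a known failure to hide a new one: any red check stops the pipeline, so it has to be fixed (or the check itself repaired) before anything moves on.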

We also require peer review for all code commits and this is managed using tools such as Stash from Atlassian, the makers of JIRA and Confluence.

[Image: What we’re used to seeing now]

By contrast, our legacy product has no hard dependency on the test results, and no-one other than the testers has any real incentive to care about them. As such, the tests are not held in the same regard as the delivered product.

“So what is the role of the tester in this new team?”

Testers are left to do what they do best: intelligent interrogation of software and systems to learn new information. Exploratory testing is used not to confirm what we think should happen, but to tell us as much as possible about what we don’t know. Sometimes that information takes the form of defects; at other times it simply gives stakeholders greater awareness of the subtleties of a complex piece of software.

A good deal of ‘testing’, or ‘meta testing’, happens during the speccing sessions, with the whole team involved. A lot of information comes out of these sessions, and the tester can process it and make assertions about quality and potential pitfalls before a single line of code is written. This often drives out problems early and, again, helps create a better-defined set of automated checks.

Testers can still write automated checks, but these are no longer their sole responsibility. Every commit to the code base is peer reviewed, and the same goes for the automated checks.

One lesson we have learnt the hard way is that neglect, underinvestment and procrastination inevitably lead, over time, to seemingly insurmountable problems. If you care about your product, treat your automation efforts with the same care and attention as the product itself.

Author: Martin Roddam

Senior Quality Analyst in Data Technology