Improvements To FT.com Content Functional Tests

Did you read Martin Roddam’s blog post? If not, it’s well worth a look before you read this blog post. He explained the automated testing on our current CMS (content management system) for FT.com. He also described the new testing strategy being utilised in the forthcoming CMS replacement. This blog post is a sequel of sorts, explaining what we did to solve some of the testing problems on the current system. We intend this to form part of a trilogy of epic testing blog entries.

The Tricky Sequel

The work we carried out on the tests could also be seen as a sequel, or follow up. It was intended to bring the tests back to being relevant again. However, sequels tend to face a challenge in matching up to the original (who could forget Bad Boys II or The Stone Roses’ The Second Coming?).

So what was the state of the tests when we began? The current CMS had a good, detailed set of automated tests from the outset, but the suite didn’t scale well as the project developed and matured. The project ended up with an automated test suite that:

  • Few people trusted
  • Had too many failures caused by invalid test assertions, which obscured real bugs
  • Took too long to run (over 12 hours)
  • Couldn’t tell us the state of our application when we needed to understand how forthcoming releases, by ourselves or other teams, would affect our stability.

The only people who cared about the tests were the testers, and the rest of the team didn’t really consider the tests relevant. Pass or fail, they were a distraction. Effectively the tests were a peripheral part of the release process rather than being a core part.

Beyond the desire to fix the problems above, other drivers to improve our tests come from the nature of our project. The FT.com website’s content publishing and rendering management systems are part of a core set of functionality. They have key integration points with other projects which require monitoring and qualifying. Some examples include changes to user journey workflows (such as signing in or registering) and changes to advertising.

So what did we do?

  • Rearrange tests into a hierarchical test suite structure
  • Run the tests on a reliable and supported platform
  • Trigger/control the tests and review results from a centralised location

Let’s look at the work in a bit more detail.

Test Suite Structure

The original test suite had grown organically through iterative development of the website. Multiple teams were maintaining the tests which resulted in a disorganised structure. This made it difficult to:

  • Find and update tests to reflect a change in site behaviour
  • Determine the best location for a new test
  • Be sure that you have not duplicated a test that already exists
  • Be sure there are no redundant tests when the product features change

A good test suite should be easy for humans to read and navigate through as well as for computers. So we found two solutions to cater for both contexts.

We designed a new structure for the tests, based on common themes between tests. The structure in the screenshot below allows for simple and logical browsing of tests by any member of a development team. The tests are stored in Git in this manner.

Rearranging the tests to make them easily readable by people does not impact the efficiency of test runs – the tests are grouped in a different way for that. Originally the tests were run in three monolithic suites. We rearranged them into seven test suites:

  • Publish tool Priority 1
  • Publish tool Priority 2
  • Publish tool Priority 3
  • Render Chrome
  • Render Firefox
  • Render IE
  • Publishing Workflows

We’re trying not to overwhelm the machines that run the tests. Making smaller test suites allows us to group more focused, related tests together. The smaller suites also take less time to run, so we get quicker feedback about each test run. Less time also means less sustained use of machine resources.

Another benefit is that we can be more selective about which tests we want to run at any given time. If we have a time constraint or we want to target the most important validation tests for our publishing tool (for a change made by another team), we can run the “Priority 1” suite and be confident we are verifying a core section of the application’s behaviour. Likewise if we determine there is a browser-specific issue/feature (or a new browser version has been released), we can run a set of render tests against Chrome, Firefox or Internet Explorer separately from each other.
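A minimal sketch of this kind of selection, assuming each suite is described by a few tags. The suite keys, the tag names and the `select_suites` helper are all hypothetical and only illustrate the idea; they are not taken from the real test framework.

```python
# Hypothetical tags for the seven suites described above.
SUITES = {
    "publish-p1": {"area": "publish", "priority": 1},
    "publish-p2": {"area": "publish", "priority": 2},
    "publish-p3": {"area": "publish", "priority": 3},
    "render-chrome": {"area": "render", "browser": "chrome"},
    "render-firefox": {"area": "render", "browser": "firefox"},
    "render-ie": {"area": "render", "browser": "ie"},
    "publishing-workflows": {"area": "workflow"},
}


def select_suites(area=None, max_priority=None, browser=None):
    """Return the names of the suites that match every filter that was given."""
    selected = []
    for name, tags in SUITES.items():
        if area is not None and tags.get("area") != area:
            continue
        priority = tags.get("priority")
        # A priority filter only matches suites that actually have a priority.
        if max_priority is not None and (priority is None or priority > max_priority):
            continue
        if browser is not None and tags.get("browser") != browser:
            continue
        selected.append(name)
    return selected


# Short on time: run only the most important publish tool checks.
quick_run = select_suites(area="publish", max_priority=1)

# A new Firefox version has been released: run only that render suite.
firefox_run = select_suites(browser="firefox")
```

Calling `select_suites()` with no filters returns all seven suites, which corresponds to a full run.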

A Better Place For Tests To Live

Throughout the lifetime of the tests they have been moved from location to location, sometimes under the cover of darkness.

The “Physical Machine Under A Desk” Paradox

Do tests exist if they sit on a machine under a desk, the location and name of which nobody knows, let alone the pass rate? We faced this problem when we tried to take the easy option of quickly setting up a box to run the tests next to our desks. The short term gains turned into long term problems:

  • Power or network cable would get disconnected accidentally
  • The machine was connected to the normal staff power/network systems which can be susceptible to power outages (scheduled and unscheduled)
  • Team members left the company and forgot to update others on where the test box sits, how it’s configured etc. (resulting in a treasure hunt for the box)
  • The box was not maintained/updated with latest security patches
  • The same person who maintained the tests, maintained the test box and nobody beyond that was aware of the details (a single point of failure)
  • The test suites all ran on the same single machine in series, one after another (increasing test run time significantly).

Virtual Insanity

The solution we came up with was to move the tests to run on a virtual machine. Good intentions don’t always result in the best outcomes. The benefits of the move were offset by significant problems:

  • Small disk space – Little forward planning went into sizing the disk. After the OS was installed, there wasn’t much room left. This impacted the test runs, as lots of artefacts (such as log files) were being generated. Test runs would frequently crash after running out of disk space. We had to wipe this data from the disks between test runs and sometimes even during test runs.
  • Low RAM and slow CPU power – this contributed to long test run times and occasional crashes due to running out of memory.
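One way to fail fast rather than crash part way through a run is to check free disk space before starting. The sketch below is only an illustration of that idea, not something the original setup did; the `has_enough_disk` helper and the 5 GB threshold are assumptions.

```python
import shutil


def has_enough_disk(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Assumed threshold: room for roughly 5 GB of logs and other artefacts.
REQUIRED_BYTES = 5 * 1024**3

if not has_enough_disk(".", REQUIRED_BYTES):
    print("Not enough free disk space, clean up old artefacts before running the tests")
```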

Virtuous Reality

Eventually we realised we had to do more. We procured and worked on a group of dedicated virtual machines (one where the tests were run from, and three VMs on which to run the tests), based in one of our managed data centres.

  • All the maintenance of the machines was taken out of our hands by the dedicated support team for VMs.
  • We ensured the VMs had large hard drives (100 GB instead of 20 GB), a good amount of RAM (4 GB) and some modern, faster CPUs.
  • The underlying hardware that the VMs ran on was significantly better in our data centre than that hosted in our office (dedicated backup power supply, etc).

This made the test runs far simpler. All we had to do was ensure the appropriate software was installed and configured on the machines.

An important point to make here is that even with the VMs in place and running our automated tests, we ensured it was still possible to run the tests on a local machine – this is important for a number of reasons:

  • Continued development/maintenance of test scripts
  • Running a specific test to investigate a defect
  • Backup to run test suites if the VMs are down or have lost connection


Jenkins: The Future Of Administering Tests

By this point, we had achieved major improvements. We had a better design for the test suites and a better platform for them to run on. But more was required: we needed to improve how the tests were run, monitored and administered.

Test runs were controlled through a batch file, which a member of the team triggered manually by logging onto the machine and clicking on it. From there they could start or terminate test runs. This was a lot of effort, with no ability to track who had run the tests or how much longer they had left to run.

Our solution was to use Jenkins, the continuous integration management tool.

  • We installed Jenkins on one VM as the control and connected three other VMs to it as slaves where the tests would be run.
  • We configured a build job for each test suite. The jobs are assigned to different slaves, so arranging the tests into smaller suites means certain test suites can be run in parallel
  • The Jenkins web client can be accessed by other teams allowing them to trigger the tests when they need
  • We can track the progress of tests and check the estimated time they should finish (as Jenkins stores metadata of previous runs)
  • We can change the configuration of the tests and the environment they run on in one place, without having to go onto each of the VMs to repeat config changes
  • Jenkins also provides a useful quick report on the results of the test runs. If further detail is needed, we examine the results in Cuanto (http://www.trackyourtests.com/) which is software that collates and allows the reading of test results through a browser
  • We (or anyone who needs to) can execute or terminate test runs at the click of a button on the Jenkins admin console web page instead of a batch file on a machine
  • We are able to see what tests are running at any instant in time
  • We can see trends of recent test runs to identify if there’s been a measurable increase or decrease in pass rate
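The same controls are also reachable over Jenkins’ remote access API, so a run can be started or inspected from a script as well as from the web page. The sketch below is illustrative only: the server URL and job name are made up, the helper names are hypothetical, and a real server will usually also need authentication (and possibly a CSRF crumb). It builds the POST request for a job’s `/build` endpoint without sending it, and estimates a finish time from the `timestamp` and `estimatedDuration` fields (both in milliseconds) that Jenkins reports for a build in its JSON API.

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def build_trigger_request(jenkins_url, job_name):
    """Build (but do not send) the POST request that asks Jenkins to start a job."""
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job_name)}/build"
    return urllib.request.Request(url, method="POST")


def estimated_finish(build_info):
    """Estimate when a build will end from its start time and estimated duration."""
    started = datetime.fromtimestamp(build_info["timestamp"] / 1000, tz=timezone.utc)
    return started + timedelta(milliseconds=build_info["estimatedDuration"])


# Hypothetical server and job name, used only to show the URL shape.
request = build_trigger_request("http://jenkins.example/", "Render Firefox")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, once credentials have been added.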

The Result: A Blockbuster Success

It’s been a major success. The tests are now a core part of our release process and are used by ourselves as well as other teams. Both developers and testers are now interested in the test runs. This is especially noteworthy as developers are now actively creating new tests and modifying existing ones to cover our releases. Our manual testing time has been reduced for each release thanks to the confidence gained in our tests. Consequently we have some more time set aside for exploratory testing, which allows us to gain more experience of our products as well as detect a good percentage of defects that would otherwise have been found by our customers (or perhaps not at all). The tests now reach genuinely high pass rates, so when the pass rate falls, we know there’s a real issue to investigate, whether it’s environmental or with our new features under test.

The Future….

This isn’t the end of the work to overhaul the tests. There is more we intend to do.

  • We are planning to extend the number of test jobs in Jenkins further by adding new jobs for health checks and test suites covering other site components
  • Test Data Generation
    • Currently, many of the tests have a setup section where they generate testing articles using our CMS solution. In the lower environments this can prove problematic if the CMS is being patched or has some other form of outage, as this prevents the tests from running. The tests are thus dependent on this connectivity being up.
    • The solution will be to utilise a component that exists between our front end web application and the “back end” CMS which means our dependence on the CMS is reduced (but not removed)
  • Move to Selenium WebDriver
    • The current framework used by the tests is Selenium Remote Control (RC). It’s a legacy product and is deprecated.
    • Selenium RC works by injecting JavaScript into the actual page being tested. Effectively you are changing the product being tested – which is something we should be avoiding as we don’t know what state the page could be in.
    • Selenium WebDriver is the current framework that is in wide use in the web testing industry. We need to move across to this, but the process may be relatively long as its API is significantly different from that of the previous product.
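To give a flavour of the WebDriver style, here is a minimal sketch of a title check written against a WebDriver-like object. The `title_contains` helper and the URL are hypothetical and are not taken from the real test suite; the only WebDriver calls used are `driver.get(url)`, the `driver.title` property and `driver.quit()`. The lines that start a real browser are left as comments because they assume the `selenium` package and a browser driver are installed.

```python
def title_contains(driver, url, expected):
    """Load `url` with a WebDriver-style `driver` and check the page title."""
    driver.get(url)
    return expected in driver.title


# With the selenium package installed, a real run might look like this:
#
#   from selenium import webdriver
#
#   driver = webdriver.Firefox()
#   try:
#       assert title_contains(driver, "http://www.ft.com/", "Financial Times")
#   finally:
#       driver.quit()
```

Because the helper only relies on `get` and `title`, it can be exercised with a simple stand-in object before a real browser is involved.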