
It ain't just reds and greens: Automated Acceptance Testing and quaternary test outcomes

Posted by johnsmart on March 11, 2013 at 11:08 PM PDT

Although they seem simple enough on the surface, test outcomes are actually quite complicated beasts. Traditional unit tests, and basic TDD tests, have just two states, passing or failing, represented by red and green in the famous "RED-GREEN-REFACTOR" mantra. In Behaviour Driven Development (BDD), on the other hand, we have the additional concept of 'pending' tests: tests that have been specified (for example, in a Cucumber or JBehave story) but not yet implemented. When we report on test results, we need to be able to distinguish these three states, as a pending test has very different semantics to a failing test. Pending means the work isn't done yet, which may well be as expected, especially towards the start of a sprint. A failing test, on the other hand, needs fixing. Now.
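As a quick illustration of the three states, here is a hedged sketch using JBehave's step annotations (the class and step texts are hypothetical): a step method marked with @Pending is reported as pending, rather than as a pass or a failure.

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Pending;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;

public class AccountSteps {

    @Given("a customer with a prepaid account")
    public void aCustomerWithAPrepaidAccount() {
        // Implemented: this step will pass or fail depending on its assertions.
    }

    @When("the customer tops up the account")
    @Pending
    public void theCustomerTopsUpTheAccount() {
        // Specified in the story but not implemented yet:
        // reported as PENDING, not as a failure.
    }

    @Then("the account balance should be updated")
    @Pending
    public void theAccountBalanceShouldBeUpdated() {
    }
}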

Most BDD tools, such as Cucumber, JBehave, Concordion, easyb and so forth, report test results in terms of these three states. However, the complexity doesn't stop here. Maintaining web tests, for example, requires ongoing effort, and can perturb the test reporting if not handled with care. For example, if a web page changes during normal development or refactoring work, the tests that use this page may break. Good software engineering practices such as the use of Page Objects can reduce the risk of this considerably, and reduce the work involved in maintaining the tests when it does happen, but it is still something that will happen regularly. And again, the semantics of a broken test are quite different to those of a failing test. A broken test needs maintenance work on the test suite. It may also mask an application error, but you will need to investigate to find out. A failing test means that the application is broken, and therefore needs urgent fixing.
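For readers unfamiliar with the pattern, here is a minimal, hypothetical Page Object sketch (the LoginPage class and element ids are made up): knowledge of the page structure lives in one class, so when the page changes only this class needs updating, not every test that drives it.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage {

    private final WebDriver driver;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    // If the login form changes, only these locators need to be updated;
    // the tests that call loginAs() stay untouched.
    public void loginAs(String username, String password) {
        driver.findElement(By.id("username")).sendKeys(username);
        driver.findElement(By.id("password")).sendKeys(password);
        driver.findElement(By.id("login")).click();
    }
}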

Thucydides is an open source library that aims to make automated acceptance testing and reporting easier and more informative. It builds on Java testing tools such as JBehave, easyb and even JUnit, and provides strong integration with Selenium 2/WebDriver for automated web tests.

In an attempt to address this limitation in conventional BDD reporting, Thucydides now distinguishes test failures (triggered by an assertion error) from test errors (triggered by any other exception).

When you run your automated acceptance tests using Thucydides, any test that throws an AssertionError (or a subclass of AssertionError) will be considered a test failure. Anything else (such as a NotFoundException, thrown when an element is not found on the page) is considered to be an error, and therefore indicative of a broken test.
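The classification rule boils down to something like the following sketch. This is not Thucydides' actual implementation, just an illustration of the principle: an AssertionError means a genuine failure, anything else means a broken test.

public class OutcomeClassifier {

    public enum OutcomeType { SUCCESS, FAILURE, ERROR }

    public static OutcomeType outcomeOf(Throwable thrown) {
        if (thrown == null) {
            return OutcomeType.SUCCESS;   // the test ran to completion
        }
        if (thrown instanceof AssertionError) {
            return OutcomeType.FAILURE;   // an assertion failed: the application misbehaved
        }
        // Any other exception (e.g. a WebDriver NotFoundException when an element
        // is missing from the page) points at a broken test rather than a broken application.
        return OutcomeType.ERROR;
    }
}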

In the future we may extend Thucydides further to make this concept more configurable: for example, so that users can specify which exceptions should be considered an error or a test failure, or even add additional outcome states (e.g. infrastructure failure, database not set up, etc.).


Thanks for a good insight!
I agree that Automated Acceptance tests are much more than just red and green, pass or fail.

I'm currently starting to develop new Automated Acceptance tests for a legacy system that doesn't have any automated tests. Just like with manual testing, providing information about the system is more important than just saying pass/fail.

My thinking goes in the same direction for automated tests as for manual tests: provide useful information rather than just pass/fail.

Some useful states/data that I have found worth reporting in a structured way are (a possible structured representation is sketched after the list):
* State of setup (e.g. connections to databases needed for the test)
* State of pre-conditions
* State of test data generation
* Input data that was actually used (might be randomized or combined in endless combinations, different for each execution)
* State of test execution
* Validation of output data
* Time measurements for different steps, etc.
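Purely as an illustration, a structured record along these lines could capture that data (the class and field names are made up; a real tool would no doubt model it differently):

import java.util.Map;

public class AcceptanceTestRecord {

    public enum StepStatus { OK, FAILED, SKIPPED }

    private StepStatus setup;                    // e.g. connections to databases needed for the test
    private StepStatus preconditions;
    private StepStatus testDataGeneration;
    private Map<String, String> inputDataUsed;   // the (possibly randomized) input data actually used
    private StepStatus execution;
    private StepStatus outputValidation;
    private Map<String, Long> stepTimingsMillis; // time measurements for the different steps

    // Getters/setters omitted for brevity.
}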

If this type of data is stored, historical test runs can be analysed to provide even more information by comparing that data.

Each test and each system has its own set of information that is important to report. And I find it challenging to do that in a good way, and to find well-documented experiences of how to do it.

I haven't found a tool that supports that mindset yet... I don't think there will ever be the one-and-only tool for that either. But just as in construction work, there will probably be better hammers, screwdrivers, saws etc. that can help along the way. Maybe packaged into a nice tool box made of platinum. :-)