I’ve said that the notion of automation should follow the TDD specification process, not lead it. We choose the “test automation framework” (TAFW) based on the nature of our specification and how we have chosen to represent it to stakeholders.
That said, once we have settled on a TAFW, we can then determine how best to use it to bind the specification to the production code, allowing for automated tests to be run in a relatively quick and efficient manner.
I say “relatively quick” because the actual speed of your tests, and how granular they are (how much they specifically tell you when they fail), depend on how the bindings are designed. The traditional view of ATDD is that it is conducted completely from the external view of the system:
Typically, the “given” step definition method would enter information into the actual GUI, the “when” would click a button there, or some similar activity, and the “then” would check the database to see if the information was properly recorded. This is often termed a “full stack” test, because all the various layers must be operating properly for the test to pass. The test is comprehensive.
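To make the shape of such a full-stack binding concrete, here is a minimal sketch. In real code the “given” and “when” steps would drive a tool like Selenium and the “then” would query an actual database; here both are replaced by tiny in-memory stand-ins so only the structure of the binding is visible (all class and method names here are illustrative assumptions, not from the series):

```python
# Stand-in for the whole application stack (GUI -> logic -> persistence).
class FakeApp:
    def __init__(self, db):
        self.db = db  # stand-in for the real database

    def save_sale(self, item, price):
        self.db.append({"item": item, "price": price})

# Stand-in for a Selenium-driven browser session.
class FakeBrowser:
    def __init__(self, app):
        self.app = app
        self.fields = {}

    def fill_field(self, name, value):
        self.fields[name] = value

    def click(self, button):
        if button == "Save":
            self.app.save_sale(self.fields["item"], float(self.fields["price"]))

db = []
browser = FakeBrowser(FakeApp(db))

# Given: enter information through the GUI
browser.fill_field("item", "widget")
browser.fill_field("price", "9.99")
# When: click the button
browser.click("Save")
# Then: check the database to see the information was recorded
assert db == [{"item": "widget", "price": 9.99}]
```

Every layer must work for the final assertion to pass, which is exactly what makes the test comprehensive, and also what makes it slow.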
In the past this would have almost surely been a manual test as we would need a human tester to interact with the GUI, move the mouse, fill out fields, and click on controls.
Tools have made this easier. Frameworks like Selenium, WinRunner, and Watir make it possible to automate the GUI interactions. The specifics are beyond the scope of this series, but suffice it to say that these tools, and guidance on using them, are easy to find online.
The problem with such a test, even if fully automated, is twofold: it is slow, because every layer (including the GUI and the database) must be exercised, and it is not granular, because a failure tells you something is wrong somewhere in the stack but not which layer is at fault.
Does this make it a bad or undesirable test? No, but it means we do not want it to be the only test we have. Part of the problem here, the lack of speed, may be due to the fact that the test has to spin up the database every time it runs, and then has to query that database to verify the behavior was correct.
If we understand the notion of test doubles, or “mocking,” we can rebind the exact same specification in a variety of different ways.
Here we have replaced the Persistence layer with a mock version. The mock version would simply record any method calls made by the logic tier, so the test could examine said recording to ensure that the right actions were taken and fail if they were not. This test would be significantly faster because we would not read the database at all. In fact, we would not even need to gain access to it.
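A minimal sketch of that recording double, assuming a logic tier that calls a `save_sale` method on the persistence layer (the names are assumptions for illustration):

```python
class RecordingPersistence:
    """Test double for the persistence layer: records calls instead of
    touching a real database."""
    def __init__(self):
        self.calls = []

    def save_sale(self, item, price):
        self.calls.append(("save_sale", item, price))

class SalesLogic:
    """Hypothetical logic-tier component; it only sees the persistence
    interface, so it cannot tell the double from the real thing."""
    def __init__(self, persistence):
        self.persistence = persistence

    def record_sale(self, item, price):
        self.persistence.save_sale(item, price)

persistence = RecordingPersistence()
logic = SalesLogic(persistence)
logic.record_sale("widget", 9.99)

# The "then" step examines the recording rather than querying a database.
assert ("save_sale", "widget", 9.99) in persistence.calls
```

The logic tier runs exactly as it would in production; only its collaborator has been swapped out.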
Note that this would only change the “then” part of the binding, in this particular case.
We could also write a version of the binding that eliminates the GUI. The GUI, if well-crafted, would contain no application logic but rather the GUI components would be written to call methods on the controller layer. Most GUI “form” tools are designed this way. There is no reason that our binding could not call those methods itself, bypassing the GUI entirely:
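A sketch of that version, assuming the Save button’s handler does nothing but forward to a controller method; the binding simply calls that same method (controller and logic names are assumptions):

```python
class SalesLogic:
    def __init__(self):
        self.sales = []

    def record_sale(self, item, price):
        self.sales.append((item, price))

class SalesController:
    """Exposes the same methods the GUI's form components would call."""
    def __init__(self, logic):
        self.logic = logic

    def on_save(self, item, price):
        self.logic.record_sale(item, price)

logic = SalesLogic()
controller = SalesController(logic)

# The "when" step calls the method the Save button would have invoked,
# so no browser automation is needed at all.
controller.on_save("widget", 9.99)
assert logic.sales == [("widget", 9.99)]
```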
Here we don’t need a mock, as the test itself is taking the place of the GUI layer. Note that each successive binding involves fewer and fewer layers, making the test faster and more granular. If we then create a mock version of the logic tier, we can test the controller alone:
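Sketched under the same assumed names as before, that narrowest controller test looks like this: the binding stands in for the GUI on one side, and a mock stands in for the logic tier on the other.

```python
class MockLogic:
    """Test double for the logic tier: records the calls made on it."""
    def __init__(self):
        self.calls = []

    def record_sale(self, item, price):
        self.calls.append(("record_sale", item, price))

class SalesController:
    def __init__(self, logic):
        self.logic = logic

    def on_save(self, item, price):
        self.logic.record_sale(item, price)

# Only the controller is real; a failure here can only mean a controller bug.
mock = MockLogic()
SalesController(mock).on_save("widget", 9.99)
assert mock.calls == [("record_sale", "widget", 9.99)]
```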
Where and how we create these narrower tests has a lot to do with the nature of the behavior and how it is designed. For example, there will likely be some components in the logic tier that can be tested on their own by simply calling methods on them and asserting against the value(s) they return:
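For example, a self-contained calculation component needs no doubles at all; the binding just calls it and asserts on the return value (the component and its tax rule here are hypothetical):

```python
class TaxCalculator:
    """Hypothetical logic-tier component with no collaborators."""
    def total_with_tax(self, subtotal, rate):
        return round(subtotal * (1 + rate), 2)

# The binding calls the method directly and asserts on the result.
calc = TaxCalculator()
assert calc.total_with_tax(100.00, 0.08) == 108.00
```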
If you’re thinking “that’s a unit test” you’re absolutely right. Remember in part 2 I said you can write a unit test or an acceptance test using any TAFW you choose. In this case the specification written in human-readable Gherkin is now driving a unit test, which is fast, granular, and can be run every few minutes, exactly what a developer would want when test-driving these logic components in the first place, or when subsequently making changes to them.
These are just examples. Some behaviors only involve one or two layers inherently. For example, sometimes a GUI has a rule that is only about its components:
Feature: Sales Calculator
Scenario: OK button activates when fields are filled out
Given a user has navigated to the sales form
When all fields in the sales form have been filled out
Then the OK button will un-grey
The binding for this behavior would only involve the GUI layer (and, again, would be created using a tool like Selenium), as the other layers have no visibility into this behavior and thus have no failure mode regarding it. Also, this specification could be written by an expert in “user experience,” who would be the stakeholder in this case, and then bound by the developers to the actual component(s). Because of this we can use terms like “fields,” “form,” and “button”: these are terms the stakeholder is comfortable with, and this specification is solely about the nature of the GUI itself. We would not use such terms in a test about a logic component, where the stakeholder might be a business analyst, and so on.
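As a sketch of what such a binding verifies: a real one would use Selenium to inspect the actual widgets, but the rule itself can be shown with a tiny in-memory form model (the `SalesForm` class and its field names are assumptions for illustration):

```python
class SalesForm:
    """Stand-in for the GUI form; a real binding would query Selenium
    for the actual widgets instead."""
    def __init__(self, field_names):
        self.fields = {name: "" for name in field_names}

    def fill(self, name, value):
        self.fields[name] = value

    def ok_button_enabled(self):
        # The rule under test: OK activates only when every field is filled.
        return all(value != "" for value in self.fields.values())

form = SalesForm(["item", "price", "quantity"])
assert not form.ok_button_enabled()   # Given: the form starts empty
form.fill("item", "widget")           # When: all fields are filled out...
form.fill("price", "9.99")
form.fill("quantity", "2")
assert form.ok_button_enabled()       # Then: the OK button un-greys
```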
Once you’ve created these various bindings, executing the specification in different ways against the system, you also have a natural way to move the project through its stages of completion.
While test-driving the logic components, for example, your developers would be running the narrowest bindings constantly, ensuring that the behaviors at that level were all correct. If a “unit level” test is failing, there is no reason to run the “full stack” version as we know it will fail. The narrowly-bound tests therefore serve as a gate; until they pass, we don’t move on to the larger tests. As each expansion of the binding passes, we can run the next layer(s) out, until we eventually run the full-stack test (which is actually a form of integration test at this point). The fact that these larger tests are slower and less granular is no longer a problem because we’ve already proven the components work by running the narrower tests first. We get the speed and granularity that the developers need, but also have the visibility and clarity we want for all stakeholders in the organization… and the specification is in a single place, maintained and understood by everyone.
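The gating idea itself can be sketched as a simple ordering rule: run the narrowest suite first, and only widen the binding once it passes (the suite names below are assumptions; in practice this ordering would come from your build pipeline or test-runner tags):

```python
def run_gated(suites):
    """Run (name, suite) pairs narrowest-first; stop at the first failure,
    since wider suites would only re-report the same breakage slowly."""
    for name, suite in suites:
        if not suite():
            return f"stopped at {name}"
    return "all passed"

# Each lambda stands in for invoking a real test suite.
result = run_gated([
    ("unit",       lambda: True),
    ("controller", lambda: True),
    ("full-stack", lambda: True),
])
assert result == "all passed"

# A unit-level failure means the full-stack run never happens.
assert run_gated([("unit", lambda: False),
                  ("full-stack", lambda: True)]) == "stopped at unit"
```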
The one thing you’ll notice pretty early when doing your testing this way is that the various bindings you create have many common elements. When we mocked out the persistence layer, for example, I pointed out that only the “then” part of that binding would be different from the full-stack test. The code in those two bindings would be mostly redundant, and that redundancy will continue to appear, to varying degrees, across the bindings. That’s what we’ll deal with in part 4.