Variety of Models

“Every test has a purpose and every concern a layer”

This is my guiding motto for how I look at quality strategy, but admittedly
it does not tell you how to figure out which tests to write and which layers
need to exist.

At this point in time, most of the industry has normed around a few different methods of creating tests:

  • Test Pyramid or Test Trophy (if JS)
  • End to End Automation
  • “We have enough money to pay manual testers so we still do that”

Unfortunately, while all of these methods have use, they still run into a problem, eloquently expressed by Justin Searls on Twitter:

“…Nearly zero teams write expressive tests that establish clear boundaries, run quickly & reliably, and only fail for useful reasons…”

So how do you resolve that within your team? The motto I expressed and the
methods presented do not actually answer this question. The methods
mentioned above focus on implementation rather than creation. My motto gets closer, but it is not intended as a guide (more of a catchy phrase I can use to identify why I do what I do).

So back to our question: how do you figure out what the heck the tests you write should be?

Is the Test Pyramid right for you? Is the Test Trophy the better option? What about the Test Hex? Can I just ignore tests completely and have manual testers keep doing everything?

I believe you should use a strategy model, a variety of models even.

I can hear you now, “But Sarah, why would you want a variety of models?”

To that I respond with: because there is a variety of situations!

In the world of science, there are a variety of models for the same thing, which are applied based on which is best for helping to answer the question at hand. In general, the more precisely accurate a model is, the more clunky it is to use. If your question does not require the precision of the more accurate models, you can limit how much information you work with under the guidance of less complex models.

For instance, the outdated Bohr model of the atom is still used under some circumstances. It is a vivid visualization, one which many people can still remember after they learn it, even if they never go back to atoms again. It has its flaws (the currently accepted electron cloud model is generally more accurate), but it is good enough to give clarity, and its orbits are still the locations of highest probability for finding an electron. That makes it a good model under the right circumstances. A model does not have to be perfect or accurate, just good enough to impart the right information under the circumstances it is used.

So, the purpose of a model is to help you answer a question by focusing on what matters. Frequently we have a lot of information at our fingertips, but not all of it is useful. Just as frequently, the information we really need is missing. Models for strategy and thinking assist in understanding these knowledge gaps.

Going back to our question of how to test, there are (at the time of this writing) several models of various levels of popularity. I will cover:

  • Swiss Cheese
  • Test Pyramid
  • Testing Trophy
  • Software Testing Spectrum
  • Microservices Testing Honeycomb
  • Automated Testing Quadrant
  • Round Earth Test Strategy
  • Heuristic Test Strategy
  • Root Cause Analysis

Then I will give an example of how and when I would leverage certain models from this list in a way which speaks to the strengths of the particular models. Skip to here if you want to jump past the individual model analyses straight to the example.

Swiss Cheese

As proposed by: Me
There is an entire section of this website dedicated to this model which I suggest checking out, but the TLDR is: “Swiss Cheese encourages a purpose-first approach, with the warning that it is easy to fall into the trap of creating too many quality layers”

Test Pyramid

As proposed by: Mike Cohn in Succeeding with Agile
Martin Fowler explanation
TLDR: The slower and more expensive a test type is, the fewer tests of that type you should have
Inputs for model:

  • Speed of test type
  • Cost of test type
  • Value of test

This model is great at guiding you towards a more sustainable test framework by encouraging questions like: How cheap can you make this test? How fast can this test be? Can this be covered by a unit test?

The pitfall of this thought process is that it does not help answer “what am I actually testing?”

Testing Trophy

As proposed by: Kent C. Dodds
Testing Javascript with Kent Dodds
The TLDR is you should focus on tests which return higher confidence, which means more tests covering components interacting with each other.

This model is great at breaking down testing within Javascript and leveraging what other Javascript developers have learned.

The model is less useful outside the Javascript world and uses definitions of unit and integration different from mine (which can result in confusion when talking to people).

Microservices Testing Honeycomb

As proposed by: André Schaffer
Testing of Microservices - Spotify
TLDR - A variation of the original test pyramid created for microservices, with a heavy focus on how services work together.

This model is a great guide for what good integration tests can look like, and this particular test type is one I prefer when building things like Data Layer Tests or testing highly coupled services.

However, I feel like this model has a similar failing to the Test Pyramid - its focus on implementation comes at the cost of identifying purpose.

Automated Testing Quadrant

As presented by John Ferguson Smart
Test Pyramid Heresy
TLDR: Tests can serve the purpose of living documentation, compliance tests, living API documentation, and/or technical checks. A project should determine which quadrants it focuses on (eg, a small microservice might focus on living API documentation while a finance tool would lean more heavily on compliance)

This model hits a lot of the same points as Swiss Cheese. The linked post is also just a really good post which I recommend everyone read if this topic, or quality strategy in general, interests you.

However, I feel like this model serves best as a describing model - it is good at helping you talk about your strategy, but (as described) it is less useful for creating the strategy in the first place.

Round Earth Test Strategy

As presented by James Bach
Round Earth Test Strategy
TLDR: Thinks of technology as concentric spheres (data included), an analogy meant to aid understanding. Looks at assumptions as the bedrock and attempts to make use of a useful analogy.

This model comes from a good place. A useful analogy can encourage conversation and a good analogy can act as a good model for conversation.

However, this model is trying to do a lot. It is a great example of how trying to do everything can result in not being good at anything. It is complicated and to those who are able to effectively use it, more power to ya.

Software Testing Spectrum

As presented by Ecky Putrady
The Software Testing Spectrum

  • This is more of a way to measure a test by looking at its impact (Confidence) compared to its cost (churn, run cost, setup cost).
  • Very very useful for measuring how good the tests themselves are, but less useful as a way to guide writing tests
  • Additionally useful when choosing between implementation methods by narrowing the focus to impact vs cost

Heuristic Test Strategy (Matt Heusser)

  • This is basically the Swiss Cheese model: you go through all the risks on a project in a systematic way, identify the risks you want to do something about, figure out the different ways you can address those risks, and weigh them against the consequences.

Root Cause Analysis

Not technically a testing model, but an important tool in the kit

  • Something which is good to use when looking at how problems make it through the strategy and what to look at changing. When partnered with something like Swiss Cheese, you can have good conversations like “which layer should have caught this? Is it reasonable? Are we missing a layer? Is that layer worth having?”

While some of the models overlap, many of them actually focus on different things and have different strengths.

So let’s go through an example of how we would leverage a few different models in a scenario.

Models in Action

For this example, the project in question is one which has been running for a while, but has a lot of production issues - the team noticed they are spending most (>60%) of their time on unplanned work such as debugging the production issues. By the Swiss Cheese model, they have an overflowing bucket situation, and they have decided to revisit their strategy.

They started by looking at the Root Cause Analysis sessions on the production issues and determined that issues with the database interactions were the primary culprit. Since the current quality strategy did not appear to take those interactions into account, they decided to run a Swiss Cheese Test Strategy workshop to redo the strategy.

Swiss Cheese Workshop Outline

During the Test Strategy workshop, the team discovered they had a missing layer around the database. No one had really thought about it before, but the database has been aggressively mocked out of all tests except the API tests, and there are no tests specifically around the interactions, despite their high complexity. As a result, the concern of “our database interactions stay consistent and stable” went unmet.

(While we could have jumped straight here from Root Cause Analysis + examining the strategy, the team wanted to ensure there were no other surprise missing layers)

As a result of this discovery, the team decided to add a test layer whose entire purpose is to verify the domain objects’ ability to manipulate the database. This is great, and also the end of the Swiss Cheese model’s contributions for this example. Now that we know what we want to verify & protect, we turn to the models which help guide implementation details.

Let’s first reach for the good old standby, the Test Pyramid.

Standard Test Pyramid

As a reminder, this model encourages us to look at speed and reliability. For this starting point, we are looking at just testing the domain objects talking to a real database - a MySQL database in this case. These tests are most likely gonna be pretty reliable with some best practices (eg, clean database between tests), but they will not be fast.
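To make the shape of this layer concrete, here is a minimal sketch of such a data-layer test. Everything in it is hypothetical (the `UserRepository` domain object and `users` table are invented for illustration), and Python’s sqlite3 stands in for the real MySQL connection so the sketch stays self-contained:

```python
import sqlite3
import unittest

# Hypothetical repository standing in for the project's real domain objects.
class UserRepository:
    def __init__(self, conn):
        self.conn = conn

    def save(self, name):
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

class UserRepositoryTest(unittest.TestCase):
    def setUp(self):
        # The team in this example would connect to a real MySQL instance
        # here; sqlite3 keeps the sketch runnable on its own.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def tearDown(self):
        # Clean database between tests, per the best practice mentioned above.
        self.conn.execute("DROP TABLE users")
        self.conn.close()

    def test_save_then_find_round_trips(self):
        repo = UserRepository(self.conn)
        user_id = repo.save("Ada")
        self.assertEqual(repo.find(user_id), "Ada")
```

The shape is what matters here: the test drives a domain object against a real database engine, and the cleanup step between tests keeps the runs reliable.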

The Test Pyramid encourages us to ask: can we speed these up without losing that reliability?

Well, the entire point of these tests is to exercise database interactions, and that is also our slowdown point…but maybe there is still a way. In-memory databases have been around a while, and while they are not perfect, their imperfections are pretty well documented. Is this the right choice?

Now we are evaluating two different potential types of tests. This is the perfect place to break out the Testing Spectrum with attributes focused on what we care about.

Blank Testing Spectrum

Filling this out for the original proposal of the domain objects talking to a normal MySQL database:

Testing Spectrum for MySQL

TLDR is:

  • High confidence
  • Low performance

Okay, but what about the in-memory database? There would be a small confidence hit, but since H2’s incompatibilities are well documented, it should be manageable. And there would be a significant performance improvement, leaving us with:

Testing Spectrum for MySQL v H2

Putting both on the spectrum gives us a direct comparison: we felt there would be a small drop in confidence, but a big uptick in speed. That looks pretty decent, and H2 is ultimately what the team went with.
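The swap the team weighed can be kept cheap by letting the test layer take its connection from a factory. This is a hypothetical sketch in Python with sqlite3 playing both engines (in a Java/MySQL stack this would be a DataSource pointed at either MySQL or H2); only the factory changes, not the test body:

```python
import sqlite3

def make_mysql_connection():
    # Placeholder: in the team's actual stack this would open a connection
    # to a real MySQL test instance. A file-backed sqlite db stands in here.
    return sqlite3.connect("data_layer_test.db")

def make_in_memory_connection():
    # The faster alternative: an in-memory engine (H2 in the article's case).
    return sqlite3.connect(":memory:")

def interactions_stay_consistent(connect):
    """The same data-layer check runs against whichever engine we pick."""
    conn = connect()
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)"
        )
        conn.execute("INSERT INTO orders (total) VALUES (9.99)")
        (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        return count >= 1
    finally:
        conn.close()

# Only the factory passed in changes; the test body does not.
assert interactions_stay_consistent(make_in_memory_connection)
```

Keeping the engine choice behind a factory also means the team can revisit the MySQL-vs-H2 trade-off later without rewriting the tests themselves.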

To conclude the example: we used the Swiss Cheese model to identify the missing layer, and leveraged both the Test Pyramid and the Testing Spectrum to pick our proposal for how to implement that layer. The three models working together achieved a better result than if we had used only one.

Note: To be clear, I’m not saying shove as many models as possible into your working day. What I am saying is: if you are unsure, use the models to find the information to focus on, and use the situation to figure out which models to apply.

Eg, Swiss Cheese and the Test Pyramid both seek to help create test strategies; however, they are not in competition. They work well together because they help you with different things. As mentioned in [], Swiss Cheese seeks to help you decide what to test, while the Test Pyramid seeks to help you determine how to implement your tests.

Once you understand when to apply models, you can start collecting them like tools in a toolbox. Being able to leverage a variety of models helps you stay flexible and work in a variety of situations.