December 13, 2016

Aggregate Failures in RSpec

Checking multiple attributes after exercising a method is a common test pattern. The problem is that it seems to break a core testing guideline: one expectation per spec (http://betterspecs.org/#single). The key point to remember, though, is that a spec should verify a single idea.

Testing two completely different expectations in the same spec is a bad idea. A spec proving that a method is called with the correct params should not also verify the response; those should be two different specs: “it is called with correct params” and “it receives success response”.

The single-expectation guideline becomes ambiguous, however, when testing multiple attributes on the same object to verify its state. A refund spec needs to ensure that the status is successful and that the amount is correct. These cases do not violate the guideline because they verify a single idea: the response object has the correct state.

A reasonable boilerplate for imminent code examples

We’re going to look at refunding a charge using the Stripe stripe-api gem. The Stripe::Refund.create method is called with the id of the charge to be refunded. Stripe then returns an API response wrapped in the Stripe::Refund object.

Stripe::Refund.create(charge: "ch_18vE1uGt3Swi8TrvRRvhDaaV")

#<Stripe::Refund id=re_19PYK0Gt3Swi8TrvrZfMcNJY 0x00000a> JSON: {
  "id": "re_19PYK0Gt3Swi8TrvrZfMcNJY",
  "object": "refund",
  "amount": 100,
  "charge": "ch_18vE1uGt3Swi8TrvRRvhDaaV",
  "created": 1481466540,
  "currency": "usd",
  "status": "succeeded"
}

The boilerplate below is assumed for each upcoming example to keep the individual specs focused. Just remember that refund is the Stripe::Refund response.

require "rspec/autorun"

describe Stripe::Refund, ".create" do
  subject(:refund) { Stripe::Refund.create(charge: "ch_18vE1uGt3Swi8TrvRRvhDaaV") }
  # specs will go here
end

To simulate failing specs for all examples, assume the refund response is returning the unexpected attributes amount: 0 and status: "failed" along with a valid id: "re_19PYK0Gt3Swi8TrvrZfMcNJY".

NOTE: There is some hand-waving. A charge needs to be created before each refund spec. You would probably also want to use something like the VCR gem to cache responses so you are not constantly hitting the API.
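
If you want to follow along without a Stripe account, one option is to swap the API call for a small stand-in. The sketch below is only an assumption for experimentation: FakeRefund and failing_refund are hypothetical names, not part of the Stripe gem, and the struct exposes just the three attributes the specs check.

```ruby
# Hypothetical stand-in for Stripe::Refund so the specs can run offline.
# It exposes only the three attributes the examples check.
FakeRefund = Struct.new(:id, :amount, :status, keyword_init: true)

# The failing response the examples assume: a valid id, but the wrong
# amount and status.
def failing_refund
  FakeRefund.new(
    id: "re_19PYK0Gt3Swi8TrvrZfMcNJY",
    amount: 0,
    status: "failed"
  )
end

refund = failing_refund
puts refund.amount
puts refund.status
```

With a stand-in like this, the boilerplate's subject could return failing_refund instead of calling Stripe::Refund.create.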

Example #1: The common way of verifying multiple attributes

Browse any Ruby codebase and you are likely to see a spec like this.

it "returns success response from stripe" do
  expect(refund.amount).to eq 100
  expect(refund.status).to eq "succeeded"
  expect(refund.id).to start_with "re_"
end

There are separate expectations for each attribute. This works, but there is a catch: as soon as one expectation fails, the spec aborts.

F

Failures:

  1) Stripe::Refund .create returns success response from stripe
     Failure/Error: expect(refund.amount).to eq 100

       expected: 100
            got: 0

       (compared using ==)
     # refund.rb:19:in `block (2 levels) in <main>'

Finished in 0.04063 seconds (files took 0.17161 seconds to load)
1 example, 1 failure

Failed examples:

rspec refund.rb:18 # Stripe::Refund .create returns success response from stripe

Notice there is no mention of the status being “failed”. RSpec never gets that far: the spec stops as soon as the amount does not match.

Example #2: The simplicity and bloat of the have_attributes matcher

RSpec provides a matcher for checking multiple attributes on an object in a single assertion: have_attributes.

it "returns success response from stripe" do
  expect(refund).to have_attributes(
    amount: 100,
    status: "succeeded",
    id: a_string_starting_with("re_")
  )
end

This clarifies what is being tested. Instead of three expectations, a single object is examined to ensure correct state. However, there is a tradeoff. While the spec is reduced to a single expectation, the RSpec failure output is much harder to read.

F

Failures:

  1) Stripe::Refund .create returns success response from stripe
     Failure/Error:
       expect(refund).to have_attributes(
         amount: 100,
         status: "succeeded",
         id: a_string_starting_with("re_")
       )

       expected #<Stripe::Refund amount=0, status="failed", id="re_19PYK0Gt3Swi8TrvrZfMcNJY">
       to have attributes {:amount => 100, :status => "succeeded", :id => (a string starting with "re_")}
       but had attributes {:amount => 0, :status => "failed", :id => "re_19PYK0Gt3Swi8TrvrZfMcNJY"}
       Diff:
       @@ -1,4 +1,4 @@
       -:amount => 100,
       -:id => (a string starting with "re_"),
       -:status => "succeeded",
       +:amount => 0,
       +:id => "re_19PYK0Gt3Swi8TrvrZfMcNJY",
       +:status => "failed",

     # refund.rb:25:in `block (2 levels) in <main>'

Finished in 0.02815 seconds (files took 0.17159 seconds to load)
1 example, 1 failure

Failed examples:

rspec refund.rb:24 # Stripe::Refund .create returns success response from stripe

Way more noise; way less helpful. There is a lot of repetition and it is not immediately apparent what went wrong.

Example #3: Testing multiple attributes using one-liners

So maybe this is where single expectation specs come to the rescue? Example #1 only showed a single failure although there were actually two. Example #2 showed all failures but made it less apparent what went wrong due to noise.

We’ll now create a spec for each attribute.

context "when successful" do
  specify { expect(refund.amount).to eq 100 }
  specify { expect(refund.status).to eq "succeeded" }
  specify { expect(refund.id).to start_with "re_" }
end

NOTE: The specify method is just an alias for it. In this scenario, specify reads better.

FF.

Failures:

  1) Stripe::Refund .create when successful should eq 100
     Failure/Error: specify { expect(refund.amount).to eq 100 }

       expected: 100
            got: 0


       (compared using ==)
     # refund.rb:33:in `block (3 levels) in <main>'

  2) Stripe::Refund .create when successful should eq "succeeded"
     Failure/Error: specify { expect(refund.status).to eq "succeeded" }

       expected: "succeeded"
            got: "failed"

       (compared using ==)
     # refund.rb:34:in `block (3 levels) in <main>'

Finished in 0.01369 seconds (files took 0.08941 seconds to load)
3 examples, 2 failures

Failed examples:

rspec refund.rb:33 # Stripe::Refund .create when successful should eq 100
rspec refund.rb:34 # Stripe::Refund .create when successful should eq "succeeded"

Running the specs prints the expected failures: 1 pass, 2 fails. Better in some ways, but there are downsides:

The error messages are misleading. They read “Stripe::Refund .create when successful should eq …”, which implies that refund returns different values in each spec.
A new refund is processed for each spec, needlessly slowing down the test suite.
It is not easy to see that all of the failures relate to the same response.

Example #4: Aggregating failure responses for surprise-free state checking

We now come back full circle to the first and most common way: writing three expectations in the same spec… with a twist.

it "returns success response from stripe" do
  aggregate_failures "stripe response" do
    expect(refund.amount).to eq 100
    expect(refund.status).to eq "succeeded"
    expect(refund.id).to start_with "re_"
  end
end

Notice the aggregate_failures method? This was added in RSpec 3.3. It prevents a spec from aborting after the first failure. All expectations inside the block are run and compiled into a useful summary.

F

Failures:

  1) Stripe::Refund .create returns success response from stripe
     Got 2 failures from failure aggregation block "stripe response".
     # refund.rb:45:in `block (2 levels) in <main>'

     1.1) Failure/Error: expect(refund.amount).to eq 100


            expected: 100
                 got: 0

            (compared using ==)
          # refund.rb:46:in `block (3 levels) in <main>'

     1.2) Failure/Error: expect(refund.status).to eq "succeeded"

            expected: "succeeded"
                 got: "failed"

            (compared using ==)
          # refund.rb:47:in `block (3 levels) in <main>'

Finished in 0.01508 seconds (files took 0.0881 seconds to load)
1 example, 1 failure

Failed examples:

rspec refund.rb:44 # Stripe::Refund .create returns success response from stripe

Aggregating failures gives us some great benefits over the previous three examples:

A single failure indicating the response was incorrect.
Refund request only made a single time; fast tests!
All expectations run; no hidden failures.
Clean expected/got breakouts; less noise; more skimmable.
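
To make the mechanics concrete, here is a plain-Ruby sketch of the idea behind aggregation: record each mismatch instead of raising on the first one, then report them together. ToyAggregator, its check method and the FakeRefund struct are hypothetical names invented for this illustration; this is not how RSpec implements aggregate_failures internally.

```ruby
# Hypothetical stand-in for the failing Stripe::Refund response.
FakeRefund = Struct.new(:id, :amount, :status, keyword_init: true)

# A toy collector, NOT RSpec's implementation: it records each mismatch
# instead of raising, so later checks still run.
class ToyAggregator
  attr_reader :failures

  def initialize(label)
    @label = label
    @failures = []
  end

  # `expected === actual` lets a Regexp or a plain value act as the matcher.
  def check(description, actual, expected)
    return if expected === actual

    @failures << "#{description}: expected #{expected.inspect}, got #{actual.inspect}"
  end

  # One summary covering every recorded mismatch.
  def summary
    return "#{@label}: all checks passed" if @failures.empty?

    (["Got #{@failures.size} failures from \"#{@label}\""] + @failures).join("\n")
  end
end

refund = FakeRefund.new(id: "re_19PYK0Gt3Swi8TrvrZfMcNJY", amount: 0, status: "failed")

aggregator = ToyAggregator.new("stripe response")
aggregator.check("amount", refund.amount, 100)
aggregator.check("status", refund.status, "succeeded")
aggregator.check("id", refund.id, /\Are_/)
puts aggregator.summary
```

Both the amount and the status mismatches end up in the summary, and the id check still runs even though the two checks before it failed.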

Aggregate failures with an RSpec tag

As an alternative to the block syntax, you can use the :aggregate_failures tag. This will aggregate all failures within the spec without changing the spec structure:

it "returns success response from stripe", :aggregate_failures do
  expect(refund.amount).to eq 100
  expect(refund.status).to eq "succeeded"
  expect(refund.id).to start_with "re_"
end