Sagas Pattern + Property Testing = ❤️

A while ago I read this article about the Sagas Pattern and its Elixir implementation, Sage. It immediately got my attention as a nice way of handling errors. I’ve explored error management (from a different perspective) in the Functor Flavoured Pipes in Elixir article, so I was curious to try a new approach.

As I could not find any complete example implemented with Sage, I decided to build my own. It soon became interesting enough to share it here.

Again, what is this Sagas Pattern?

Well, for this answer I will point back to Andrew’s article: https://medium.com/nebo-15/introducing-sage-a-sagas-pattern-implementation-in-elixir-3ad499f236f6. He does a great job of explaining the concept and the terminology. And he’s the main Sage project contributor.

In our current exercise, we will focus only on the implementation.

We will use notions such as:

  • step — a pair of transaction and compensation
  • transaction — what we want to achieve in the current step. Eg. make a payment. This can end up with success or error
  • compensation — amends the effects of a transaction. As we will see, it can deal with the failure of its own step’s transaction, e.g. a failed payment triggers an email alert. It can also deal with a transaction that fails later in the Saga, which can trigger, for example, a payment refund.
  • effect — the positive outcome of a transaction

What are we building?

We’ll build an online custom bicycle shop 😆. Well, a very basic one. It will be (surprisingly) called Bikex.

There’s a single action: order_bike. This will trigger two calls to our external suppliers of brakes and tyres. If both succeed, the payment is made through our payment provider, and a confirmation email is then sent to the customer.

This would be the happy path, and we’ll build the failure management on top of that.

Specifications:

  • there will be no web interface. For our example, we will mostly interact with our app through tests.
  • all external APIs are mocked
  • no database. It would have been interesting to explore also the persistence layer. But it would be too much for one article
  • we will use PIDs as identifiers for order numbers, payments etc. Every external provider mock will have its own process, so the PID will be enough for our simple exercise

Now that we know what we want to build, let’s start!

mix new bikex

External services

We create a Mock namespace for all services that our app is interacting with. In this namespace, we will mock those services.

Server

We build a generic server for all the external services. Its purpose is to take a list of predefined responses and keep the state of the last returned response. We want to track how the app behaves in all possible circumstances.

For example, one of the providers may have the following responses for the order action:

%{order: [{:ok, :confirmed}, {:error, :no_response}, etc.]}

For our experiment, we assume one call and 2 retries for some servers in case of {:error, :no_response}. More details on this later, in the implementation. For now, it’s important to know why we provide a list of responses and not just one.

Each external service will start an instance of this server.

It’s a GenServer holding in its state the last provided response and a list of remaining responses organised by type.

It can handle 2 actions:

  1. a request, eg. :order, :cancel_order . Each time this function is called, it will “consume” one response from the specific list
  2. return the last response

It will make more sense as soon as we start to use it.
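To make the description above concrete, here is a minimal sketch of such a server. Only Mock.Server.start_link/1 is used in this article; the request/2 and last_response/1 names, the message shapes, and the {:error, :no_response} fallback for an exhausted list are assumptions made for illustration, so the implementation in the repository may well differ.

```elixir
defmodule Mock.Server do
  use GenServer

  # Client API. request/2 and last_response/1 are hypothetical names.
  def start_link(responses) when is_map(responses) do
    GenServer.start_link(__MODULE__, responses)
  end

  def request(pid, type), do: GenServer.call(pid, {:request, type})
  def last_response(pid), do: GenServer.call(pid, :last_response)

  # State: the last returned response plus the remaining responses,
  # organised by request type (:order, :cancel_order, ...).
  @impl true
  def init(responses), do: {:ok, %{last: nil, remaining: responses}}

  @impl true
  def handle_call({:request, type}, _from, state) do
    {response, rest} =
      case Map.get(state.remaining, type, []) do
        [next | rest] -> {next, rest}
        # Assumption: an exhausted or missing list behaves like a silent provider.
        [] -> {{:error, :no_response}, []}
      end

    remaining = Map.put(state.remaining, type, rest)
    {:reply, response, %{state | last: response, remaining: remaining}}
  end

  def handle_call(:last_response, _from, state) do
    {:reply, state.last, state}
  end
end
```

Each request "consumes" the head of the list for its type, which is what lets a test script a whole sequence of provider answers up front.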

Brakes Supplier

We now need to prepare the interaction with our first supplier. For the moment it implements a single function: order/1, taking the PID of the brakes supplier server as its argument.

Now we can use it like this:

iex(1)> {:ok, brakes_pid} = Mock.Server.start_link(%{order: [{:ok, :ordered}, {:error, :out_of_stock}, {:error, :no_response}]})
{:ok, #PID<0.274.0>}
iex(2)> Mock.BrakesSupplier.order(brakes_pid)
{:ok, %Mock.BrakesSupplier{brakes_order: #PID<0.274.0>, state: :ordered}}

As said above, we will use the PIDs to keep track of the orders, payments etc.
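A supplier module producing the output above could look roughly like the sketch below. The struct fields come from the iex session; the {:request, :order} message sent to the server and the {:brakes, reason} tagging of errors are assumptions (the latter is inferred from the error shapes that show up later in this article).

```elixir
defmodule Mock.BrakesSupplier do
  # Struct shape taken from the iex output above.
  defstruct [:brakes_order, :state]

  # Assumption: the mock server answers `{:request, type}` calls with
  # `{:ok, state}` or `{:error, reason}`.
  def order(server_pid) do
    case GenServer.call(server_pid, {:request, :order}) do
      {:ok, state} ->
        # The server PID doubles as the order identifier.
        {:ok, %__MODULE__{brakes_order: server_pid, state: state}}

      {:error, reason} ->
        # Assumption: errors are tagged with the supplier name.
        {:error, {:brakes, reason}}
    end
  end
end
```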

Do we need all that setup?

Well, not really. There are a few alternatives. You could skip the implementation of the server and suppliers entirely: define a behaviour and mock the supplier calls directly inside the tests. The Mox library is a really good candidate for this kind of approach. If Bikex were a real project, that would probably be my go-to approach.

Yet, for our example, I preferred to implement the server itself. This way you can experiment with the Saga outside of the test as well.

Property Testing

Let’s stop for a moment and think about how we are going to test our scenarios. There will be many cases and combinations of successes and failures. How can we make sure we have covered all of them?

Property testing may be the right answer here. We will “delegate” all those worries to the test itself. If we miss something, the test will eventually catch the case and will fail.

As with Sage, I will not dwell on the property testing concept itself. We will use the stream_data library. You can find more details on stream_data and property testing on the official Elixir blog: https://elixir-lang.org/blog/2017/10/31/stream-data-property-based-testing-and-data-generation-for-elixir/

Implementation

Let’s start the implementation and the first property test of our app. We need the stream_data and sage packages for our example, so we add them to the mix.exs file.

Ordering Brakes

The actual Saga implementation for ordering the bike brakes:
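As a stand-in for the listing, here is a minimal sketch reconstructed from the description that follows, not the original code. It needs the sage dependency and a Mock.BrakesSupplier module to actually run order_bike/1. The [brakes_pid: pid] shape of the attrs and the retry_limit value are assumptions; the four-argument compensation follows this article and may not match the Sage release you install.

```elixir
defmodule Bikex do
  # Assumption: the provider PIDs arrive as a keyword list such as
  # [brakes_pid: pid]; the key name is hypothetical.
  def order_bike(attrs) do
    Sage.new()
    |> Sage.run(:brakes, &brakes_transaction/2, &brakes_compensation/4)
    |> Sage.execute(attrs)
  end

  # Returns {:ok, order} or {:error, error}, exactly what Sage expects.
  def brakes_transaction(_effects_so_far, attrs) do
    Mock.BrakesSupplier.order(attrs[:brakes_pid])
  end

  # No response from the supplier: ask Sage to retry the transaction.
  # The limit is meant as one call plus two retries; how Sage counts it
  # (attempts vs. retries) is an assumption worth verifying.
  def brakes_compensation(_effect, _effects_so_far, {:brakes, {:brakes, :no_response}}, _attrs) do
    {:retry, retry_limit: 3}
  end

  # Any other error (e.g. out of stock): nothing to undo yet.
  def brakes_compensation(_effect, _effects_so_far, _error, _attrs), do: :ok
end
```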

A lot is happening here. The main function, order_bike/1, takes the list of provider PIDs as its only argument. The Saga starts with Sage.new/0, runs some steps, and is executed with Sage.execute/2. The argument given to execute/2 is passed to all transactions and compensations as attrs.

Each step we run has a name. The current one is :brakes. It is used to identify the effects and errors resulting from this specific step.

Apart from the name, Sage.run(:brakes, &brakes_transaction/2, &brakes_compensation/4) has a transaction and a compensation function. Of course, those can have more meaningful names but, for our example, those are easier to follow.

Transaction functions take 2 arguments:

  • effects so far — quite self-explanatory
  • attributes — we provide in the execute/2 function

And should return:

  • {:ok, effect} — the Saga will proceed with the next transaction
  • {:error, reason} — the compensations will be executed for all the steps run up to this point, unless a compensation is instructed to retry the transaction, in which case the cycle restarts. We will come back to this below

Our above brakes_transaction/2 calls the BrakesSupplier.order/1. This returns either {:ok, _brakes_order} or {:error, _error}. It’s the same thing the transaction needs to return.

Compensation functions take 4 arguments:

  • effect to compensate — the result (effect) of the respective transaction
  • effects so far
  • error — as a tuple with the stage name that generated the error and the error itself. Eg in our case: {:brakes, {:brakes, :no_response}}
  • attributes

And should return:

  • :ok — the Saga will run “backward”, following the compensations path of the executed transactions. There are some really good diagrams in the Sage blog article that explain how this works.
  • :abort — same as above but ignores retries defined in previous compensations
  • {:retry, opts} — retries a specific transaction. In the options you can define the number of times the transaction should be retried
  • {:continue, effect} — continues the Saga with a default effect, or one generated in the compensation

In our brakes_compensation/4 above, we pattern match on the error type. If we get no response from our provider, we retry 2 times before moving on to the next compensation (if there is one). In our case it is the only compensation, so the Saga then returns the result.

The other brakes_compensation/4 catches all other errors and returns :ok. This happens for example if the brakes are out of stock. We have no effects to compensate yet, as our only transaction failed.

Time to test the brakes!

First things first! What kind of response do we expect from our BrakesSupplier to an order request? Let’s assume there are 3 cases:

{:ok, :ordered}
{:error, :out_of_stock}
{:error, :no_response}

Our property-based test must randomly return one of those responses each time BrakesSupplier.order/1 is called. But we decided that each {:error, :no_response} will be retried 2 times. In that case, we need a list containing the responses that follow the failed one. We can build it with the StreamData member_of/1 and list_of/2 functions.

We generate lists of 3 random responses. The test runs 10000 times, far more than we need, but we want to be sure we catch all combinations of responses.

We start the brakes provider process with the generated responses. Then call the order_bike/1 function, assert all possible results and the server response.

The nice thing about this property testing is that even if you miss one or more cases, the test will catch it and will error. So you will be able to add the missing case to the test suite.
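To show the shape of such a test, here is a small standalone script (it uses Mix.install, so run it outside the Mix project). The generator matches the description above; RetryModel.run/1 is a hypothetical pure stand-in for Bikex.order_bike/1, added only so the snippet runs on its own. In the project, the body of the check would start the mock server with the generated responses and call the Saga instead.

```elixir
Mix.install([:stream_data])
ExUnit.start()

defmodule RetryModel do
  # Hypothetical stand-in for the Saga: take the first response and
  # move on to the next one only after {:error, :no_response}.
  def run([{:error, :no_response}]), do: {:error, :no_response}
  def run([{:error, :no_response} | rest]), do: run(rest)
  def run([response | _rest]), do: response
end

defmodule RetryModelTest do
  use ExUnit.Case
  use ExUnitProperties

  @order_responses [{:ok, :ordered}, {:error, :out_of_stock}, {:error, :no_response}]

  property "three responses always settle on a known outcome" do
    # Lists of exactly 3 responses: one call plus two possible retries.
    check all responses <- list_of(member_of(@order_responses), length: 3),
              max_runs: 10_000 do
      assert RetryModel.run(responses) in @order_responses
    end
  end
end
```

If an outcome were missing from the assertion, stream_data would report the generated list that triggered it, which is exactly the safety net described above.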

Ordering Tyres

Ordering tyres would be very similar to ordering brakes. But we want to introduce some level of complexity:

  • as those two actions are not linked to each other, they can run asynchronously. Sage provides this functionality with the run_async/5 function.
  • if one of the orders succeeds and the other fails, we also need to cancel the successful one.

Finally, :retry will not work properly in the compensations of async steps, so we need to change our solution a bit. I see 3 options here:

  1. if we absolutely need the retry for ordering brakes and tyres, we could run the steps sequentially instead of async. The retry will then work without any issues.
  2. we could add an extra step before async-ordering brakes and tyres. This extra step should produce no effects. Its compensation should check if the error is :no_response and retry. But beware: you will end up with loads of edge cases as the Saga grows.
  3. keep the order steps async, and not handle the retry in the compensation. For our case study, we’ll use this option. (If needed, you could implement your own retry inside the compensation function.)

This is what our Saga will look like now:

The ordering tyres step will be identical, so I will omit it. But you can always check the full implementation on GitHub. We added the :tyres step and run both order steps async.

The other important thing happens in the compensation where we check if a brakes order already exists. If it does, we attempt to cancel it. Sage handles the failed transactions, but not the failed compensations. There would be many ways to handle a failed compensation. Here we log an error, indicating that manual action is required for the brakes order.
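A rough sketch of this version of the Saga follows; it is not the code from the repository. Running order_bike/1 needs the sage dependency and the Mock suppliers. The attrs keys, Mock.TyresSupplier.order/1, the explicit timeout, and the {:request, :cancel_order} message sent to the mock server are all assumptions made for illustration.

```elixir
defmodule Bikex do
  require Logger

  # Assumptions: attrs is a keyword list with :brakes_pid and :tyres_pid;
  # the 5s timeout is only there to show run_async/5's options argument.
  def order_bike(attrs) do
    Sage.new()
    |> Sage.run_async(:brakes, &brakes_transaction/2, &brakes_compensation/4, timeout: 5_000)
    |> Sage.run_async(:tyres, &tyres_transaction/2, &tyres_compensation/4, timeout: 5_000)
    |> Sage.execute(attrs)
  end

  def brakes_transaction(_effects, attrs), do: Mock.BrakesSupplier.order(attrs[:brakes_pid])
  def tyres_transaction(_effects, attrs), do: Mock.TyresSupplier.order(attrs[:tyres_pid])

  # An order exists, so this step succeeded and the Saga failed elsewhere.
  def brakes_compensation(%{brakes_order: pid}, _effects, _error, _attrs), do: cancel(:brakes, pid)
  # Nothing was ordered: no effect to compensate, and no retry here.
  def brakes_compensation(_effect, _effects, _error, _attrs), do: :ok

  def tyres_compensation(%{tyres_order: pid}, _effects, _error, _attrs), do: cancel(:tyres, pid)
  def tyres_compensation(_effect, _effects, _error, _attrs), do: :ok

  # Assumption: the mock server answers `{:request, :cancel_order}` calls.
  defp cancel(step, order_pid) do
    case GenServer.call(order_pid, {:request, :cancel_order}) do
      {:ok, _canceled} ->
        :ok

      {:error, reason} ->
        # A failed compensation is not handled by Sage, so we only flag it.
        Logger.error("#{step} order #{inspect(order_pid)} not canceled: #{inspect(reason)}. Manual action required.")
        :ok
    end
  end
end
```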

In the property test, we generate responses for the order and cancel actions, for both brakes and tyres. We start both processes and check the ordering results.

We check for either

  • the success case — when all orders were placed correctly
  • error — when something went wrong

Of course, we could assert the errors in much more detail. But for this example, we only check that, when an error occurs, all orders were canceled (or cancellation was at least attempted), by looking at the last server response.

Payment

If ordering the brakes and tyres from our suppliers was successful, it’s time to pay for your new bike. For this, we’ll use a payment provider. The responses from our payment provider can be:

  • {:ok, :paid} — we continue the Saga
  • {:error, :no_funds} — stop the Saga and run the compensations. At this point, the Saga implementation starts to pay off. The already defined compensations will take care of canceling the brakes and tyres orders
  • {:error, :no_response} — same as above, but we will retry the payment 2 times before failing

Not much changes in our test file. We generate the states for the payment, start the payment process, and add the payment result to the assertions.

Confirmation Email

Time for the last step in our Saga. What makes the bike order confirmation email special enough to include in our example? Well, it’s a bit different from the previous steps. Even if it fails, we don’t want to run the whole compensation chain and undo all the effects. We will simply alert that manual action is required and conclude the Saga. The confirmation email is not critical and can be sent manually later on if the step fails.

If the email sending fails, the saga just continues with a generated effect: %EmailProvider{ref: nil, state: :not_sent}. Again, we will not insist on the implementation of the EmailProvider.
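A compensation along these lines could produce that behaviour. This is only a sketch: the Mock.EmailProvider struct below is a stand-in with the two fields shown above, while the Bikex.Email module name and the log message are hypothetical.

```elixir
defmodule Mock.EmailProvider do
  # Stand-in for the article's struct, with the fields shown above.
  defstruct [:ref, :state]
end

defmodule Bikex.Email do
  require Logger

  # Instead of letting the Saga roll back, flag the problem and hand
  # Sage a substitute effect so that the Saga still ends successfully.
  def email_compensation(_effect, _effects_so_far, error, _attrs) do
    Logger.error("Confirmation email not sent: #{inspect(error)}. Manual action required.")
    {:continue, %Mock.EmailProvider{ref: nil, state: :not_sent}}
  end
end
```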

Whether or not the email is sent, the Saga still succeeds, so we change the test accordingly:

We check for either the email transaction effect or the one generated by the failure.

Saga’s Outcome

The bike ordering Saga is now ready. As we can see also in the test file, the result can be either success, {:ok, _last_effect, _all_effects} for example:

{:ok, %Mock.EmailProvider{ref: #PID<0.27384.1>, state: :sent},
 %{
   brakes: %Mock.BrakesSupplier{brakes_order: #PID<0.27381.1>, state: :ordered},
   email: %Mock.EmailProvider{ref: #PID<0.27384.1>, state: :sent},
   payment: %Mock.PaymentProvider{payment_order: #PID<0.27383.1>, state: :paid},
   tyres: %Mock.TyresSupplier{state: :ordered, tyres_order: #PID<0.27382.1>}
 }}

or an error, {:error, _reason}, for example: {:error, {:payment, :no_funds}}.

It’s very easy to pattern match on the results and use the effects or react to the errors.
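For instance, a caller could branch on the outcome like this. Bikex.Outcome and its messages are hypothetical; only the result shapes come from the examples above.

```elixir
defmodule Bikex.Outcome do
  # Hypothetical helper turning a Saga result into a short message.
  def describe({:ok, _last_effect, %{email: %{state: :sent}}}) do
    "bike ordered, confirmation sent"
  end

  # The email step failed, but the Saga continued with the generated effect.
  def describe({:ok, _last_effect, %{email: %{state: :not_sent}}}) do
    "bike ordered, confirmation email pending"
  end

  # Errors carry the name of the failing step and the reason.
  def describe({:error, {step, reason}}) do
    "order failed at #{step}: #{reason}"
  end
end
```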

Conclusion

Quite a long “saga” for us as well. There would be a lot more to explore about both sagas and property testing. And I’m sure there are better and smarter ways to use both libraries. Yet, I found it a fun exercise and a good combination of concepts.

Used in the right scenario, Sagas can elegantly handle cases that otherwise would require a lot of complicated code, nested “cases” and “with statements”.

I am very curious to hear your opinion on this. Do you think the Saga pattern can solve some of the pains you faced while working on your projects? Have you already integrated some property testing in your Elixir applications?

The full Bikex code is available here:

https://github.com/iacobson/blog_bikex

elixir dev | dorian.iacobescu@gmail.com | @iac0bs0n