Adventures in Engineering

Enums are schema too

Backwards compatibility is one of the most crucial factors in deciding when and how to release a new feature. Maintaining a stable API while constantly improving your service is critical – you don’t want to be breaking your API clients every second Tuesday. In this simple example I want to show how enum fields (commonly status fields) are a crucial part of your API’s stability, and why changing them is a terrible idea.

The API

We’ve all had to keep track of expenses at some point or another (and to varying degrees of accuracy). I, for example, keep track of my transactions and later pair each one up with its matching receipt.

In this hypothetical API I’m going to build on my transaction list so that I can add OCR matching of my stack of receipts.

To start with, my transaction might look a little like this:

{
  "id": "9c0e7c57-c8af-4a77-b6ca-4eaa36a51fcc",
  "value": 12.50,
  "description": "Nook Cafe",
  "status": "open",
  "date": "2017-07-14",
  "comments": "Bought the boss a caramel soy latte"
}

I keep the status as “open” (I need to look at this one), “paired” (I’ve found the receipt for this) or “submitted” (I’ve sent the receipts for this transaction to my accountant). The important thing here is that “status” isn’t just an enumeration of potential values; it’s a representation of state within a wider workflow.

When I come to add OCR to this process, I want to be able to store an “autopaired” status so that I know the pairing was automated for me. Perhaps I want to quickly sense-check the receipt to make sure it’s the right one.

So how do I add “autopaired” to the API?

Changing status

The first option is to change the possible values of the “status” field. To start with, it’s probably quite simple to add a state to the status enumeration. In doing this we’d end up with both “paired” and “autopaired”. It’s simple, clear and quite logical.

Workflows would be either open => paired => submitted, or open => autopaired => submitted.
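As a rough sketch, those two workflows can be written down as a transition table. The `TRANSITIONS` mapping and the `can_move` helper are hypothetical names used only for illustration; they are not part of any real API.

```python
# Hypothetical transition table for the two workflows described above.
# Each key is a status; each value is the set of statuses it may move to.
TRANSITIONS = {
    "open": {"paired", "autopaired"},
    "paired": {"submitted"},
    "autopaired": {"submitted"},
    "submitted": set(),
}


def can_move(current, target):
    """Return True if the workflow allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())


print(can_move("open", "autopaired"))    # True
print(can_move("autopaired", "paired"))  # False
```

Written this way, it is easy to see that an auto-paired transaction never passes through “paired” at all.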

This is, however, an utterly terrible idea! It changes the implicit meaning of “paired” and, in the case of auto-paired receipts, removes the “paired” state from my transaction altogether. For example, what if finance had a dashboard showing all un-approved reports?

Perhaps something like this?

# Count the transactions that are ready to send to the accountant.
receipts_to_submit = 0
for txn in transactions:
  if txn.status == "paired":
    receipts_to_submit += 1
print("You have %d transactions that can be sent to your accountant." % receipts_to_submit)

You may argue this is fragile code. I’d wholeheartedly agree. Now let’s back slowly away from semantics and get back to building a great API.

If we change “paired” to “autopaired” then this client code breaks. This is arguably true for all enumerated values. State transitions are as core to your API as the data that represents them.
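To make the breakage concrete, here is a small sketch that assumes transactions arrive as plain dictionaries. `count_ready_to_submit` is a hypothetical stand-in for the dashboard client, and the sample data is made up for illustration.

```python
def count_ready_to_submit(transactions):
    """Count transactions the way the existing client does: status == 'paired'."""
    return sum(1 for txn in transactions if txn["status"] == "paired")


# Before the change: both receipts were paired by hand.
before = [{"status": "paired"}, {"status": "paired"}, {"status": "open"}]

# After the change: the OCR process paired one of them.
after = [{"status": "paired"}, {"status": "autopaired"}, {"status": "open"}]

print(count_ready_to_submit(before))  # 2
print(count_ready_to_submit(after))   # 1
```

The client raises no error here; it simply under-reports, which makes the break easy to miss.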

State transition history

Ok, so we can’t change the status field because this breaks the field definition. Perhaps we could think of this as an attachment and store a history of the behavior. An API resource might then look like this:

{
  "id": "9c0e7c57-c8af-4a77-b6ca-4eaa36a51fcc",
  "value": 12.50,
  "description": "Nook Cafe",
  "status": "paired",
  "date": "2017-07-14",
  "comments": "Bought the boss a caramel soy latte",
  "history": [
    {
      "time": "2017-07-14 10:05:01+1000",
      "action": "ocr-bot autopaired receipt based on date and value"
    }
  ]
}

Awesome! In this model we keep the status the same but we also share details about changes to the transaction over time. Not only this, but an API client can deduce a lot more about the action by parsing its contents.

Simply re.match() on the action field and …

Regex…

Right, so we’re condensing structured data (about the action) into a string field. An action may be better described as:

{
  ...
  "action": {
    "actor": "ocr-bot",
    "behavior": "autopair",
    "reasons": [
      "date",
      "value"
    ]
  }  
  ...
}
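As a minimal sketch of what this means for a client, compare pulling the actor and reasons out of the free-text action with reading them from the structured form. The regular expression is an assumption about how the string happens to be worded, which is exactly the weakness of the string approach.

```python
import re

# Free-text action, as in the history example.
text_action = "ocr-bot autopaired receipt based on date and value"

# Structured action, as in the fragment above.
structured_action = {
    "actor": "ocr-bot",
    "behavior": "autopair",
    "reasons": ["date", "value"],
}

# Hypothetical pattern: it only works while the wording never changes.
pattern = r"(?P<actor>\S+) autopaired receipt based on (?P<reasons>.+)"
match = re.match(pattern, text_action)
actor_from_text = match.group("actor") if match else None
reasons_from_text = match.group("reasons").split(" and ") if match else []

# Structured data needs no parsing at all.
actor = structured_action["actor"]
reasons = structured_action["reasons"]

print(actor_from_text == actor)      # True
print(reasons_from_text == reasons)  # True
```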

This is a great solution: it’s structured and a logical representation of the behavior behind the scenes. Except we’re overcomplicating things. A full history of every action and data change to a resource feels more like an audit log. That’s great to have, but it feels like we’re taking on a lot more work just to add our OCR magic to the mix.

Multiple states

Ok, so we can’t redefine the enum and we don’t want to dive down the rabbit hole of building a full history of the transaction and who changed what. So what do we do?

Let’s briefly revisit the workflow involved: 1) I create a transaction recording that I spent some money; 2) later, something pairs that transaction up with some paperwork I have; 3) I send the paired transaction off to my accountant.

In step 2), however, I can either auto-pair or manually pair up the receipt, and these two actions aren’t mutually exclusive. An OCR-paired receipt can be replaced and re-paired by me and, potentially, the other way around too.

We’re confusing the workflow state with the action state.

So let’s add a field instead:

{
  "id": "9c0e7c57-c8af-4a77-b6ca-4eaa36a51fcc",
  "value": 12.50,
  "description": "Nook Cafe",
  "status": "open",
  "paired": null,
  "date": "2017-07-14",
  "comments": "Bought the boss a caramel soy latte"
}

This is perfect and quite flexible for a few reasons.

The status of the transaction in my overall workflow is maintained. The process of pairing a transaction to a receipt is stored and represented clearly (true, “autopaired” or perhaps a URI to the stored receipts – whatever we like).

We’re free to add extra workflows around this transaction as need be.
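Here is a sketch of how this plays out for clients, again assuming plain dictionaries. It also assumes that “status” still moves to “paired” whichever way the receipt was paired, and that “paired” holds one of the example values mentioned above. The helper names are hypothetical.

```python
def count_ready_to_submit(transactions):
    """The existing client logic: unchanged, and still correct."""
    return sum(1 for txn in transactions if txn["status"] == "paired")


def needs_sense_check(txn):
    """New, optional logic: flag receipts that the OCR process paired."""
    return txn.get("paired") == "autopaired"


transactions = [
    {"status": "open", "paired": None},
    {"status": "paired", "paired": True},          # paired by hand
    {"status": "paired", "paired": "autopaired"},  # paired by OCR
]

print(count_ready_to_submit(transactions))               # 2
print([needs_sense_check(txn) for txn in transactions])  # [False, False, True]
```

Older clients that have never heard of the “paired” field keep counting correctly, while newer clients can opt in to the extra detail.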

Flattening state transitions

In this example we’ve taken what would have been a new ‘status’ in a workflow and flattened it out into a new field to achieve backwards compatibility. Should we change all enumerations for statuses to fields? Definitely not!

In this example we’re adding a new way to achieve a status, but our initial approach confused this with adding a new status. By adding a new field we’re maintaining a clear definition of what the status field represents while also providing data about the new feature we’re launching. Further down the track, we might add a receipts field that links to the receipt documents themselves. We could add an approvals process, or perhaps a way to track reimbursements for this transaction. Those might be workflows, but they’re not changing the transaction.

There comes a point at which the overall workflow and state representation for this resource (a transaction) needs to be rethought from first principles (perhaps when we add approvals and reimbursements). At that point there’s an opportunity for a new version, which would allow us to make breaking changes to the API and perhaps enumerate all workflows into a super global status field. But to get this feature out the door and into the hands of clients, today is not that day.

Coda

This post is an example of reducing scope creep, avoiding breaking changes and representing new processes on an existing resource. Please use it as an example – your own state transformations are no doubt much more elaborate than this. Enumerations ARE schema, and the format and values of your data matter just as much as the schema you use to represent them.

developerjack