It’s often expedient to discuss the “happy path”, the ideal or simplest flow of logic through a system. While it’s a great tool for conversation and for identifying requirements, I’ve found it increasingly problematic when thinking through actual implementations, especially when distributed systems are involved. It’s often better to plan and design for all paths from the start.
You may have read the Fallacies of Distributed Computing before, but it’s hard to truly appreciate them until you’ve lived them, in particular the first two: “the network is reliable” and “latency is zero”. I’ve found these to be so untrue that it seems negligent not to account for them.
The other thing I’ve realized is that you cannot address these issues with entirely technical solutions. You have to account for these realities in the design of your product. This is where we get into trouble by focusing on the “happy path”. Any discussion of a happy path must immediately be followed by a discussion of how the product will behave in the face of poor network conditions.
Let’s take a very common feature of a lot of websites: purchasing.
A typical flow would be:
- User selects products to purchase
- User provides billing information
- System charges user
- System shows user a confirmation
You’d also have an obvious “exceptional flow” where their card is declined:
- System charges user, but charge is declined
- System shows user an error message
- User updates billing information and tries again
Both of these flows are the “happy path” and can be naively implemented using straightforward, synchronous code:
class PurchasesController < ApplicationController
  # Naive version: assumes MegaPayments always answers, and answers quickly.
  def create
    customer = current_user
    order    = customer.orders.find(params[:order_id])
    result   = MegaPayments.create_txn(id: customer.id,
                                       order_number: order.id,
                                       amount: order.total.to_s)
    if result.declined?
      # flash.now, because we render in this same request instead of redirecting
      flash.now[:error] = "There was a problem charging your card, please update your billing information"
      render :new
    else
      redirect_to order_path(order)
    end
  end
end
Unfortunately, we’ve fallen victim to the fallacies of distributed computing: we haven’t accounted for all possible outcomes, and so we have a sub-optimal design for our product. For example, what if we time out talking to the payment processor? In that case we have no idea whether the customer was actually charged. A time-out is a degenerate form of slowness, so another outcome could be that the payment processor is really slow and the user reloads or retries after waiting too long. What happens then?
Because of our naive product design and matching implementation, we create a situation where our customers can overcharge themselves: every reload or retry is another charge for the same order.
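To make that failure concrete, here is a minimal sketch of the retry-after-timeout problem. Everything in it is an assumption for illustration: `FlakyProcessor` is a hypothetical stand-in for the payment processor that records the charge and then behaves as if its response never arrived, and `naive_purchase` mirrors the shape of the controller above without any Rails.

```ruby
require "timeout"

# Hypothetical stand-in for a payment processor. It records the charge and
# then raises, as if the charge went through but the response was lost.
class FlakyProcessor
  attr_reader :charges

  def initialize
    @charges = []
  end

  def create_txn(order_number:, amount:)
    @charges << { order_number: order_number, amount: amount }
    raise Timeout::Error, "no response from processor"
  end
end

# Same shape as the naive controller: call the processor, report what we saw.
def naive_purchase(processor, order_number:, amount:)
  processor.create_txn(order_number: order_number, amount: amount)
  :charged
rescue Timeout::Error
  :unknown # from here we cannot tell whether the customer was charged
end

processor = FlakyProcessor.new
naive_purchase(processor, order_number: 42, amount: "19.99") # first attempt
naive_purchase(processor, order_number: 42, amount: "19.99") # customer retries
puts processor.charges.size
```

Both attempts report `:unknown` to our code, yet the processor has two charges on record for the same order.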
Before writing a line of code, we needed to ask: “What if the payment processor is slow or unresponsive? What should the user experience be in that case?”
This isn’t some rare “edge case” or an exception we can ignore until it’s a problem. This will happen, and the more customers we have, the more often it’s going to happen. The thing is: it must be handled in the design phase, not as some bugfix later on. And how it’s handled will have a fundamental effect on your implementation. In any case, your code will never be as simple as the naive solution above. Working with distributed systems means you will always have to manage the complexities related to the fallacies of distributed computing.
Suppose the product designer wants the customer to be shown the results of their charge, no matter what. That means you need to add a lot of code to ensure a customer has only one charge in-flight at a time, and you need a user interface that can account for that, as well as an interaction model that can wait for completion, check for in-flight charges, and so on.
Suppose instead the product designer wants the customer to see the results if they are available quickly, but to see a generic message if it’s taking too long. This changes your implementation. You now have to keep track of charges where a user hasn’t seen the result, and you need a way to handle a declined charge that the customer never saw.
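As a rough sketch of the “only one charge in-flight at a time” rule, assuming an in-memory registry in place of whatever your database would really enforce: `ChargeLedger`, `begin_charge`, and `finish` are hypothetical names, and the policy that only a declined charge may be retried is my assumption, not a given.

```ruby
# Hypothetical in-memory ledger allowing at most one live charge per order.
# A real application would more likely lean on a unique database constraint.
class ChargeLedger
  Charge = Struct.new(:order_id, :status)

  def initialize
    @charges = {}
    @lock = Mutex.new
  end

  # Returns [charge, created]. `created` is false when a pending or
  # succeeded charge already exists, so the caller must not charge again.
  def begin_charge(order_id)
    @lock.synchronize do
      existing = @charges[order_id]
      if existing && existing.status != :declined
        [existing, false]
      else
        charge = Charge.new(order_id, :pending)
        @charges[order_id] = charge
        [charge, true]
      end
    end
  end

  # Records the final outcome once the processor has answered.
  def finish(order_id, status)
    @lock.synchronize { @charges.fetch(order_id).status = status }
  end
end

ledger = ChargeLedger.new
_charge, created = ledger.begin_charge(42)
_same, created_again = ledger.begin_charge(42) # a reload while still pending
puts [created, created_again].inspect
```

The second call gets the existing charge back instead of a new one, which is exactly the thing the user interface now has to be able to display.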
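The “show the result if it arrives quickly” variant could be sketched like this. It is only an illustration under assumptions: `charge_with_deadline` is a hypothetical helper, and a plain thread stands in for what would more realistically be a background job in a Rails application.

```ruby
# Runs the charge in a background thread and waits at most `wait` seconds.
# `charge_with_deadline` is a hypothetical helper for illustration only.
def charge_with_deadline(wait:, &charge)
  worker = Thread.new(&charge)
  if worker.join(wait) # Thread#join returns nil if the time limit expires
    { shown: :result, status: worker.value }
  else
    # The charge keeps running. Someone has to follow up on its outcome.
    { shown: :generic_message, pending: worker }
  end
end

fast = charge_with_deadline(wait: 1) { :approved }
slow = charge_with_deadline(wait: 0.05) { sleep 0.3; :declined }

puts fast[:shown]         # the customer sees the real result
puts slow[:shown]         # the customer sees the generic message
puts slow[:pending].value # the decline arrives after the customer has moved on
```

The last line is the interesting one: the charge was declined, but the customer was already told something generic, so the product needs an answer for what happens next.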
These implementations may lead to further questions. Suppose we decide on a design where we background the communication to Mega Payments. We write some AJAX code on our front-end to check whether the payment has completed.
We now have a new issue to design around: what if the front-end can’t reach the back-end to check up on the payment? What if the user navigates away from the page? What should happen when they return?
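The real polling code would live in the browser, but the decision it has to make can be sketched in Ruby to stay in the language of the controller above. `poll_payment` and its parameters are hypothetical: `fetch_status` is any callable that returns `:pending`, `:succeeded`, or `:declined`, and raises `IOError` when the back-end can’t be reached.

```ruby
# Hypothetical poller. Gives up after `max_attempts` and returns :unknown,
# which is the state the product design has to have an answer for.
def poll_payment(fetch_status, max_attempts: 5, pause: 0.5)
  max_attempts.times do |attempt|
    begin
      status = fetch_status.call
      return status unless status == :pending
    rescue IOError
      # Back-end unreachable: we learned nothing, so keep trying.
    end
    sleep(pause) if attempt < max_attempts - 1
  end
  :unknown
end

answers = [:pending, :pending, :succeeded]
puts poll_payment(-> { answers.shift }, pause: 0)
puts poll_payment(-> { raise IOError, "offline" }, max_attempts: 3, pause: 0)
```

Nothing here can decide what `:unknown` should look like to the customer, or what they should see when they come back to the page later. That part is a design question.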
Again, these are product design issues, not implementation details. As a developer, you have to think through these possibilities and present them to the designers so they can sort out what needs to happen. Since the designer isn’t naturally going to understand how the implementation affects the overall feature in this way, you, the developer, are the only one who can make sure this thinking happens.
For the next thing you work on, what are the edge cases? How will the product design handle them? And what edge cases will those decisions reveal? Do this enough and you’ll stop seeing the happy path and edge cases, because they aren’t there.