Another case of unfortunate wording that causes confusion and plenty of headaches for developers and maintainers alike
We have a broken term. It’s microservices.
Ever since the term came out in 2014, the most widespread and accepted definition of a microservice has been, and I paraphrase, “a fatty function autonomously deployable with a thin, sometimes implicit communication layer around”.
A whole ecosystem, along with tons of literature, has been developed to support the underlying concept.
The stress was on the incredible advantage you get by deploying your tiny piece of code autonomously, decoupled from everything else. Suddenly, every team can work independently and deliver with ease, protected by their microservice boundaries.
But is this the truth? After all, this is the promise that has been sold along with the package.
“A microservice is a fatty function autonomously deployable with a thin, sometimes implicit communication layer around” — The average internet
Unfortunately, it turns out that things are more complicated than that. Let’s look at an example.
Suppose you have a user email registration flow that accepts a request via an HTTP API and stores the submitted data in a database table.
The common microservice way to approach this scenario is to split the responsibilities between two, sometimes three microservices.
- The HTTP API microservice, responsible for accepting, formally validating, and forwarding a registration request.
- The user email registration microservice, responsible for guarding business rules such as email uniqueness.
- The user email persistence microservice, responsible for saving the newly created user email to the database.
I’ll stop here for the sake of simplicity, but we already have three microservices for our scenario.
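To make the split concrete, here is a minimal sketch with each microservice reduced to a single function. All names and shapes are my own illustration, not code from a real project:

```typescript
// A toy sketch of the three-way split; names and shapes are illustrative.
type Registration = { email: string };

// #1 (HTTP API): formal validation and forwarding.
function httpApi(body: unknown): Registration {
  const { email } = body as { email?: string };
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("400: invalid email");
  }
  return { email }; // forwards only the fields it knows about
}

// #2 (registration): guards business rules such as email uniqueness.
const registered = new Set<string>();
function register(cmd: Registration): Registration {
  if (registered.has(cmd.email)) {
    throw new Error("409: email already registered");
  }
  registered.add(cmd.email);
  return cmd;
}

// #3 (persistence): saves the user email (database stubbed with a log).
function persist(user: Registration): void {
  console.log(`INSERT INTO user_emails (email) VALUES ('${user.email}')`);
}

// The happy path chains the three independently deployed units:
persist(register(httpApi({ email: "ada@example.com" })));
```

In reality, each of these functions would sit behind its own process and network boundary, which is exactly what makes what comes next painful.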
Now my question:
What happens when we have new requirements such as “for a registration to be successful, a request must contain both a valid email and a username”?
Suddenly, we find ourselves orchestrating the update and the deployment of our three microservices altogether.
Let’s understand why.
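Before walking through the cases, here is the shape mismatch in a nutshell. Again, the types and names are illustrative assumptions, not the actual contracts:

```typescript
// v1: what every service speaks today. v2: what the new requirement demands.
type V1 = { email: string };
type V2 = { email: string; username: string }; // new mandatory field

// An un-updated service forwards only the fields it knows about:
function forwardV1(body: V1 & Partial<V2>): V1 {
  return { email: body.email }; // a username, if present, is silently dropped
}

// A v2-updated service rejects anything without a username:
function acceptV2(cmd: Partial<V2>): V2 {
  if (!cmd.email || !cmd.username) {
    throw new Error("422: username required");
  }
  return cmd as V2;
}

// The failure mode of every case below: one service speaks v2 while
// its neighbor still speaks v1.
acceptV2(forwardV1({ email: "ada@example.com", username: "ada" })); // throws
```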
CASE 1. We update the user email registration microservice (#2) with the new requirement first.
Our registration flow immediately stops working: depending on the implementation, #1 may or may not break because of the unexpected field in the request, but it definitely won’t forward the new user name field to #2, which will consequently reject every incoming request.
CASE 2. We update the HTTP API microservice (#1) with the new requirement first.
#1 starts sending both the user email and name to #2. Depending on the implementation and the communication protocol, #2 may or may not reject the unknown request format. In the best case, only the email is considered and forwarded to #3, and the service is not delivering what the API now promises.
CASE 3. We update the storage microservice (#3) with the new requirement first.
As in case 1, #1 may or may not break because of the unexpected field in the request, but it definitely won’t forward the user name to #2. Depending on the implementation, #3 may or may not save the user data. The worst case happens when we enforce a non-empty constraint on the user name in the data representation while deploying the new version, in which case the storage engine will reject every incoming request.
How do we solve the problem? As said before, we need to orchestrate multiple deployments. This can be done in multiple ways, one of them being the six-step sequence below (sketched in code right after the list):
- update #1 so that it accepts an optional user name along with the email, but does not forward it to #2 yet
- update #2 so that it accepts an optional user name along with the email, but does not forward it to #3 yet
- update #3 so that it accepts and stores an optional user name along with the email
- update #1 so that it only accepts requests containing both user email and name, and forwards both to #2
- update #2 so that it only accepts requests containing both user email and name, and forwards both to #3
- update #3 so that it only accepts and stores requests containing both user email and name
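This dance is essentially what is often called the expand-and-contract (or parallel change) pattern: widen every contract first, and tighten it only once all neighbors tolerate the new field. A minimal sketch of the two phases, with assumed names as before:

```typescript
// Phase 1, "expand" (steps 1-3): the new field is tolerated, never required.
type Migrating = { email: string; username?: string };

function acceptDuringMigration(body: Migrating): Migrating {
  if (!body.email) throw new Error("422: email required");
  return body; // username is passed through when present, never demanded
}

// Phase 2, "contract" (steps 4-6): every neighbor now tolerates the field,
// so the contract can finally be tightened.
type Final = { email: string; username: string };

function acceptFinal(body: Migrating): Final {
  if (!body.email || !body.username) {
    throw new Error("422: email and username required");
  }
  return { email: body.email, username: body.username };
}
```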
Wow, six updates, two on each microservice. Isn’t this defeating the promise of autonomous deployability? It looks like it.
The example was trivial, and yet we found out how complicated it is to roll out a simple requirement like that.
Do you smell what I smell? Being unable to easily update your software, everything tangled with everything else, software breaking even though you formally haven’t done anything wrong… yes! It’s a monolith. And now it’s a distributed one.
This tweet by Kelsey Hightower says it all:
2020 prediction: Monolithic applications will be back in style after people discover the drawbacks of distributed monolithic applications.
— Kelsey Hightower (@kelseyhightower) December 11, 2017
A microservice architecture is not an excuse to shrink the boundaries of your deployable units as much as you can. Doing so will only create a tremendous overhead in orchestrating the development and the deployment of your project. There are even further drawbacks around end-to-end testing an application built on a microservice architecture, but covering them would take far too long for the purpose of this post.
What should one do, then?
The solution to this problem is simple, but not easy at all: one should find the right boundaries for their m̷i̷c̷r̷o̷ services.
Why is it not an easy task? Because boundaries are a moving target, and they are not visible at all when you start looking for them. A huge part of the speculation around Domain-Driven Design is precisely about how to find the boundaries of your systems.
But fear not: some heuristics and tricks apply here, and they are definitely helpful in the quest for the perfect service “size”.
- if two components need to directly talk to each other (in a request-response fashion), keep them together in the same service;
- if you identify an upstream-downstream relation between two components, you can separate them into two different services;
- upstream services can announce what happened by publishing events into a pub-sub system (see the sketch after this list);
- downstream services will listen to the published events they are interested in and react accordingly;
- published events should contain only details relevant to what happened, but enough data to be useful to receivers; you decide the balance here;
- a team should not depend on another one in order to deliver a feature end to end; if you find yourself in that situation, modify your service boundaries accordingly (see Conway’s Law and its implications);
- software responsibilities are different from team responsibilities; a team can be responsible for both the backend and the frontend parts of a service, but these two components can be deployed independently; the team is responsible for deciding the internal architecture of the service;
- if you are starting greenfield, go for a monolith / single-repository architecture and simply be careful to decouple your internal modules from one another. It’s way easier to decompose a monolith later than to aggregate multiple microservices (or at least developers are more experienced in doing the former);
- last but not least, a general suggestion: be agile and run experiments. If something works for you and doesn’t hinder you, go for it. Otherwise, be brave enough to acknowledge the failure, learn from it, and change.
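To make the pub-sub heuristics above concrete, here is a toy upstream/downstream sketch. The event name, its fields, and the in-memory broker are all assumptions standing in for a real system (Kafka, RabbitMQ, and so on):

```typescript
// The event says what happened and carries just enough data to be useful.
interface UserRegistered {
  type: "UserRegistered";
  email: string;
  username: string;
  occurredAt: string; // ISO-8601 timestamp
}

type Handler = (event: UserRegistered) => void;
const subscribers: Handler[] = [];

// Downstream services subscribe to the events they are interested in.
function subscribe(handler: Handler): void {
  subscribers.push(handler);
}

// The upstream service announces what happened and moves on; it neither
// knows nor waits for its consumers.
function publish(event: UserRegistered): void {
  subscribers.forEach((handle) => handle(event));
}

// e.g. a downstream "welcome email" service reacting to the event:
subscribe((e) => console.log(`sending welcome email to ${e.email}`));

publish({
  type: "UserRegistered",
  email: "ada@example.com",
  username: "ada",
  occurredAt: new Date().toISOString(),
});
```

Notice that the upstream service can be deployed and evolved without coordinating with its consumers, which is precisely the autonomy the request-response chain above failed to deliver.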
Distributed architecture rule of thumb: keep together what requires request-response. Separate what can follow a pub-sub model.
— 🇺🇦 Nicolò Pignatelli (@nicolopigna) December 7, 2017