Promoting an environment where employees feel safe to fail

NOTE: the original version of this post can be found here.

A couple of weeks ago, while I was at the office, I received an email from a fellow developer. He sent it to the whole team.

“Recently I screwed up […] and would like to explain what happened.

TL;DR: if it ain’t broke don’t fix it.

This is completely my fault and I worked almost solely on this.

[…] The conclusion is: if you are not completely sure how something is working and moreover how complex and uncertain this something is, do not touch it at all. (Or if you have to, triple check all the possible cases by four eyes, and try to cover it with the tests first).”

The content of this email made me think about how failures can be a very bad experience for an employee, especially if a company doesn’t promote a culture where everyone can feel safe to run experiments and fail without being blamed for that.

But at Rocket Labs, luckily, we are firmly convinced of the exact opposite: every failure is another occasion to learn and improve, and no one should feel solely responsible or be blamed for what happened. Teams are there for a reason, and when you work in a team, it is the whole team that succeeds, fails and recovers. Not single individuals, but everyone together.

The following is my reply to the email. I’m happy to share it and I’m looking forward to reading your comments about how you deal with similar situations in your company.

TL;DR: if it ain’t broke, but it sucks and we gain value by improving it, improve it. Always.

Hello […], thanks for your email and, above all, for your good intention to reduce technical debt where that has certainly been needed for a long time. Personally, I really appreciate anyone who cares about our project and wants to keep it in good shape.

I think that the consequences of your development effort derive only from the lack of a careful strategy and from only partial knowledge of the feature’s complexity. It’s not your personal fault if some problems popped up in production. True, looking back at what we did, we can always be more careful, but in some cases we just can’t foresee what will happen. Why? For the same reason that made you decide to refactor: technical debt that leads to unpredictability. The only thing that we can do in that case is to get together and get rid of the problems, as a team (not only yours, I’m talking about every one of us).

I myself once screwed up something very badly in our project, but I still firmly think that it was the most valuable lesson I have learnt since joining our company. And yet you will never hear me say “this piece of code is awful and uncontrollable, but let’s not refactor it because it works”. This way of thinking is as dangerous as the technical debt itself. I’m not saying that we are allowed to be reckless, but doing nothing to improve our code won’t get us anywhere; it will just make us slower, more vulnerable to bugs, and less capable of developing features. In a nutshell, it makes us lose time and money.

When a team decides to do some refactoring (because this is a team decision, mind you, taken in agreement with the PO and the other devs during planning), they just need to be extra careful about the steps they take to accomplish it. That can mean talking with whoever originally developed the feature, using a feature switch so that they can immediately go back to the legacy logic flow, deploying the changes to a staging environment to check that everything works there (e.g. migrations), having the whole team test the refactoring and review the code, delivering small commits to limit possible damage, writing large acceptance tests that will be removed once the refactoring is done in favor of smaller unit tests, or a combination of all of these.
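
As an illustration of the feature switch mentioned above, here is a minimal Python sketch. Every name in it (`USE_REFACTORED_TOTAL`, `legacy_total`, `refactored_total`, `compute_total`) is hypothetical and not taken from the project discussed here, and reading the switch from an environment variable is only one assumption about where such a flag could live.

```python
import os


def legacy_total(prices):
    """Legacy logic flow: the behaviour production already relies on."""
    total = 0
    for price in prices:
        total += price
    return total


def refactored_total(prices):
    """Refactored logic flow: meant to return the same result as the legacy one."""
    return sum(prices)


def is_enabled(flag_name, env=None):
    """Return True only when the flag is explicitly set to "on" (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get(flag_name, "off").lower() == "on"


def compute_total(prices, env=None):
    """Route to the refactored code only when the (hypothetical) switch is on."""
    if is_enabled("USE_REFACTORED_TOTAL", env):
        return refactored_total(prices)
    # Default path: the switch is off or missing, so the legacy flow runs.
    return legacy_total(prices)
```

Because the switch defaults to off, the legacy flow stays in charge until someone turns the flag on, and turning it off again is the immediate way back if the refactored path misbehaves.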

Moreover, always prefer talking to people over just reading documentation. As you have already learned, documentation can easily be outdated.

Having said this, my conclusion (which strongly disagrees with yours): if you are not completely sure how something is working and moreover how complex and uncertain this something is….

…be smart enough to take the necessary steps to avoid any damage, or to limit it as much as possible, and then improve it. I’m 100% sure that every one of us can do that, supported by her/his team and using the right techniques.

I’m looking forward to the next refactoring. And if the author(s) request it, I will be very happy to help.