Towards a Self-Healing Organization

by Abby Fichtner

“Failures are a natural byproduct of venturing into the unknown.” – Stephen P. Robbins

In 1923, when Babe Ruth led the league in home runs, he also led it in strikeouts. Thing is, nobody cares about the latter. He’s remembered for his successes, not the failures he incurred on his way there.

There’s been a trend in technology to stop treating failures as critical, emergency exceptions to be avoided at all costs, and to start treating them as a natural and expected part of operations.

While the approach feels backwards (it may even seem unethical to release products to customers knowing they can fail), it’s had a hugely positive impact on the technology available to us. It is, for example, a key principle behind how cloud technology works and how today’s top internet apps are built.

It turns out that designing systems to automatically detect and handle failures is significantly less expensive than going to extraordinary lengths to try to prevent any errors from happening in the first place. It means the time that would have been spent on prevention can now be spent creating more value. It means happier, more productive employees who can spend their time creating new features instead of firefighting problems in old ones.

And the result, gloriously, is a vastly improved experience as customers are rarely exposed to issues in a system that’s able to automatically and seamlessly handle them “under the hood” without their notice.
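As a rough illustration of what handling failures “under the hood” can look like, here is a minimal Python sketch, assuming a simple retry-then-fall-back policy. The names `call_with_recovery` and `make_flaky`, the retry count, and the delays are all hypothetical choices made for this example; real systems layer on much more, such as health checks and redundancy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_recovery(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` up to `retries` times, then use `fallback` instead."""
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            # Failure is treated as expected: wait a little longer each
            # time (exponential backoff), then try again.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed, so degrade gracefully rather than show an error.
    return fallback()


def make_flaky(failures: int) -> Callable[[], str]:
    """Hypothetical service that fails `failures` times before succeeding."""
    state = {"calls": 0}

    def service() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("simulated outage")
        return "live data"

    return service


if __name__ == "__main__":
    no_wait = lambda _seconds: None  # skip real sleeping in the demo
    # Two failures are absorbed by the retries.
    print(call_with_recovery(make_flaky(2), lambda: "cached data", sleep=no_wait))
    # A longer outage ends in the fallback answer.
    print(call_with_recovery(make_flaky(5), lambda: "cached data", sleep=no_wait))
```

In the first call the caller gets live data and never learns that two attempts failed. In the second the outage outlasts the retries, so the caller gets the cached answer instead of an error. Either way, the failure is handled without the customer noticing.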

I see a parallel in how our organizations handle the “failure” of things to go according to plan. We spend more energy ensuring that we’re on track with our plans than we do monitoring our environment for changes.

We run our organizations with the assumption that everything will go as expected. That our plans will work out as anticipated. We’ll deliver what we planned to deliver. Sell what we planned to sell. That our markets, customers, partners, competitors, employees will continue operating as we expect.

We deliver our new product, three years in the making… and nobody wants it because the world has changed while we were busy creating it.

We develop an update for our existing product, but it doesn’t match changing customer expectations and we lose market share.

A competitor comes out with a new version of their product and suddenly our customers are demanding equivalent features that we hadn’t even considered.

A company from a different industry blindsides us by entering our market with a product that effectively renders ours obsolete.

New players create a tighter market for talent causing us to lose key employees and miss our profit estimates due to salary and benefit increases to bring our other employees up to the new industry norms.

Companies in our supply chain go out of business or raise their prices. Financial, societal, and political shifts change our customers’ priorities, needs, and interests… The assumption that all will go according to plan makes as much sense as assuming that technology will never fail.

What would our organizations look like if we flipped this mindset on its head? If we designed our organizations with the understanding that change and unexpected occurrences were the norm, rather than the exception? And, with this understanding, engineered them to be self-healing rather than flawless?