
Plane Crashes, Software Failures, and other Human Errors

Want to know if your team is effective? Listen to them.

We can learn a lot about team effectiveness through research that’s been done on teams whose work can mean the difference between life and death. Namely, operating room teams and airline cockpit crews.

The airline industry, which gets such a bad rap, actually has a phenomenal safety record. An airline like United only loses a plane from an accident about once in every four million flights.

Hospitals, on the other hand, which we tend to view as safe, are the cause of 44,000 – 98,000 preventable deaths in the US each year. Even at the low end, this tops the death rate from things we actually know to fear, like breast cancer (~40,000), car accidents (~40,000) or, of course, airplane fatalities (~120).

So, clearly, the aviation industry is doing better in this respect, and the healthcare industry is now actively working to adopt some of its training, known as Crew Resource Management, or CRM for short.

About 80% of all (airline) crashes are caused by human, rather than technical (mechanical), errors. And, in fact, many of these are the result of errors in communication. Okay, this makes sense for software errors, right? We all know how horrible everyone is at communicating with one another. But it’s one thing when the risk of miscommunication is frustrated users. If we put ourselves in a situation where failing to communicate could result in “oh my God, we’re all going to die!!” then it seems we’d be able to do just a little better, right?

Let’s take the 1982 Air Florida crash in Washington DC that killed 74 of its 79 occupants, plus 4 motorists on the 14th Street Bridge that it struck. The first officer tried several times to tell the captain that the plane had a dangerous amount of ice on its wings, enough to cause all of these deaths. And yet, listen to how he reports the problem:

Try #1: “Look how the ice is just hanging on his, ah, back, back there, see that?”

Try #2: “See all those icicles on the back there and everything?”

Try #3: “Boy, this is a, this is a losing battle here on trying to de-ice those things, it [gives] you a false feeling of security, that’s all it does.”

I don’t know about you, but when I picture myself in a situation where my life, and the lives of 78 people around me, are at risk, I’m picturing a very direct and crystal-clear warning and, if that doesn’t work, screaming, with potentially some jumping up and down and arms waving around. Not “ahhh, see all those icicles back there and everything?”

The captain clearly didn’t take him seriously either, until, at the end, the first officer finally says, “Larry, we’re going down, Larry,” and the captain responds, “I know it.” Crash. Boom. Bam. 78 dead.

But even more amazingly, what researchers found was that this exact problem was actually common in airline cockpit crews. A famous crash in aviation is the 1990 Colombian Avianca 052 crash that slammed into tennis champion John McEnroe’s father’s estate in New York, killing 73 of the people on board. The reason? The plane ran out of fuel. Ran out of fuel because it had to fly farther than expected? No, it ran out of fuel while politely waiting for air traffic control at Kennedy airport to give it permission to land. Why didn’t the co-pilot just tell air traffic control they were about to run out of fuel? They were trying to be polite rather than pushy. Peace, love and happiness. Have a flower. 73 dead.

Korean Air Flight 801, almost exactly the same situation as Air Florida. In trying to warn the captain of severe weather problems that would eventually lead to the deaths of 228 of the 254 people on board, the first officer says, “Don’t you think it rains more? In this area, here?” and “Captain, the weather radar has helped us a lot.”

“Captain, the weather radar has helped us a lot”?! What are these people doing? They’re hinting at the impending problem, in hopes that the guy who’s a little busy with the whole “flying an airplane” or “trying to bring 99 planes circling Kennedy airport in for a landing” thing is going to catch on, read their mind, and solve the problem for them. All without losing face. All without getting upset at the hinter for being too pushy or impolite. Wouldn’t want any hurt feelings, now would we?

The official term for this is “mitigated speech,” and Malcolm Gladwell provides a fascinating account of how it has affected the airline industry in his book Outliers. He defines it as “any attempt to downplay or sugarcoat the meaning of what is being said” and explains that “we mitigate when we’re being polite, or when we’re ashamed or embarrassed, or when we’re being deferential to authority.”

Amazingly, this mitigated speech actually explains a major source of airline crashes in the past, and since the industry implemented CRM training to combat it, preventable accidents in aviation have decreased by 50%. And so now the healthcare profession is looking to benefit from it too, because, wow, 44,000 – 98,000 preventable deaths per year!

The crazy common point in both operating rooms and cockpit crews, the thing that makes this all so damning, is that research into both fields has shown that errors occur most often when a senior, experienced person is performing. In other words, even in these highly trained professions, being skilled or knowledgeable is not enough – in fact, the more skilled you are, the more it can hurt you!

» In surgery, errors occur most often with experienced surgeons, typically resulting from a breakdown in communication, such as a resident failing to effectively communicate vital information (ahh, see all those icicles?)

» In commercial airlines, planes are flown by two pilots, the captain and a first officer, who split the flying duties equally. And yet, crashes occur most often when the captain is flying. Why? “Planes are safer when the least experienced pilot is flying, because it means the second pilot isn’t going to be afraid to speak up.” (Gladwell)

“Pay no attention to the man behind the curtain.” Wizard of Oz (1939)

And so the question we have to ask ourselves is: can this also be true in software development? Is being skilled or knowledgeable enough? Or do our team members need to be able to stop the line when they see things going wrong? Might we have senior team members and managers who are intimidating or difficult to raise issues with? Do we find ourselves hinting to save face in the light of possible problems? Or, perhaps, just remaining silent on issues? (Pay no attention to the man behind the curtain.)

Part of the research behind CRM found that, even among highly skilled professionals, human errors are still going to happen. However, we can compensate for this and drastically reduce the consequences of those errors if we have a team that is effectively communicating, a team whose members, regardless of position, are able to speak up and clearly communicate when they notice a problem. And thus, we don’t have to be infallible if we can work with others we can rely on to have our back and catch issues that might otherwise escape our attention.

I like this quote from the Nebraska Medical Center, one of the medical centers adopting CRM for their operating rooms:

The use of CRM creates an atmosphere of mutual responsibility not only for making sure everyone does his or her job, but also for making sure everyone else on the team is informed.

Nebraska Medical Center

Comments

24 responses to “Plane Crashes, Software Failures, and other Human Errors”

  1. Yes, communication problems galore in software development. I’ve learned over the years that the phrases “I’m done” and “yes, I tested that” have multiple meanings. But specifically to your point, I have been on teams with many different personalities on the continuum from outgoing to shy, and I’ve been in post-mortems where people have said, “yeah, I thought that was odd.” Luckily, nobody died – yet. And despite the use of the word team, most teams have someone in charge, and the dynamic around that person has a big effect. Ego can come into play. And if the team leader is considered to be “all knowing,” sometimes junior members of the team doubt their own observations or might assume that the leader has everything under control, even if they see something that raises a flag.

    With regard to problems being caused by the more experienced people, it reminds me a bit of the Emotional Intelligence theories (http://en.wikipedia.org/wiki/Emotional_intelligence). Some people that we might traditionally consider to be super stars might not rank so high on an emotional intelligence scale. Check it out.

  2. Abby Fichtner

    Mike,

    Thanks for your insightful comments, once again! I think you’re right that part of this is definitely how receptive experienced people are to feedback.

    Another very interesting point made in Outliers is that different cultures have different norms in terms of how they defer to people in authority. So, for example, Koreans are highly respectful of those in charge and the result is that for a while Korean Air had one of the worst accident rates in the airline industry because they just couldn’t get past this to come right out and say that the senior person might be doing something wrong. It was only when they recognized that they had to find ways to bypass that culture in the cockpit that they were able to correct this.

    I think it’s interesting that you mention how we talk about “teams” but there’s still someone in charge. Maybe by pushing autonomous teams in agile (where there is no one in charge), it helps us do for our teams the same thing Korean Air did for their cockpit crews…

  3. One thing that keeps crossing my mind when reading this post and the Beautiful Team post is that I don’t think you can have just anyone be a member of a beautiful team or an autonomous team. I think the idea of an autonomous team is great, but I guess I have a hard time believing that any group of people could be reformed (not sure if I like that word) to successfully participate in such a group. It seems like you need a good combination of people. I still haven’t read Beautiful Teams yet, so maybe it’s time to do that.

  4. I wonder if the big difference between the airline and healthcare industries is that each individual error event has smaller numerical consequences. That is, if I make a mistake in surgery, _one_ person is going to be hurt or die. If I make a mistake while flying a plane, a plane-load of people (including me) are at risk, not to mention people on the ground. Every life is precious of course, but from the viewpoint of the practitioner’s risk perception, the pilot is going to see a bigger result.

    So from a risk _perception_ vantage point, I wonder if the software issue isn’t often a misdiagnosis of the importance of speaking up on any specific instance. Or maybe related is a sort of naive belief in my own ability to recover from any small personal deviations from delivery. So many teams I’ve worked on are just so grudgingly optimistic — they hate course-correction, but they’re easily swayed by calls to win another one for the Gipper.

    My first reaction, though, was to point the finger at lack of trust. I’ve been on teams where nobody would complain to authority because they felt it would not have any positive effect. This is particularly true of schedule risk, as in when the manager is making estimates and commitments without checking with the team. Luckily, burndown charts provide a great tool to help overcome the shyness in this area.

    And I think this is another argument for why _everyone_ has to participate in planning poker and other story estimation exercises. It provides an opportunity before crisis for lower-ranking, less experienced participants to ask questions and drive the team toward review. (And of course, it reinforces every participant’s responsibility for shepherding to completion.)

  5. It takes a bit of reflection because I’m so used to it. This happens all the time in software. Part of the problem is a focus on specialists: that is, for a team of 5 people, there are five distinct skills needed. There may be some cross-over in talent, but when push comes to shove the specialist gets their way, because no one else has credible knowledge in the specific field (they aren’t the expert!) to question the specialist.

    I also see this coming in a slightly different way from managers. It isn’t that they have a particularly superior depth of knowledge, but they have the position of authority to question anything and everything — until (right or wrong) you do it their way. You get to a point where you don’t offer your opinion any more, because you know you’ll be told to “reconsider” until your opinion matches.

  6. Abby Fichtner

    @Mike – hah, I dunno, I wonder if people just need the experience of being part of a great team to really grok its value…

    @Abbot – so agree on having everyone just participating right up front. Now that I’ve got this notion bouncing around in my head, I keep noticing how many times on my current project that I keep my mouth shut (I know, that’s really something coming from me!) because if you make the mistake of pointing out an issue you suddenly wind up owning it – which is a huge incentive to just keep your head down and mouth shut.

    @Peter – thanks for returning! And that’s a really good point about specialists. I know agile tries to have us all the same, everyone can do everything – and maybe it’s trying to combat this problem. But I think it’s just the way of the world that some of us are going to be better at some things than others – otherwise, well, there wouldn’t be much advantage in teams in the first place. Hmm.

    “You get to a point where you don’t offer your opinion any more, because you know you’ll be told to ‘reconsider’ until your opinion matches.” The beatings will continue until morale improves.

  7. Anonymous

    I think that this is an unfortunately timed post, considering the recent crash and loss of life.

  8. fintubi

    I’d use the Tenerife crash as a better example than Palm 90, since the authoritarian posture of the captain was so clear and the mitigated speech from the FE was so chilling: “Is he not clear of the runway, that Pan American?”

    By contrast, the captain and FO of Palm 90 seemed to have more of a co-enabling relationship, each reinforcing the other’s cavalier views on icing – and both missing engine anti-ice on the takeoff checklist. It’s still a great poster case for CRM, though, because all the screwing around trying to game the system wrt airframe ice distracted them from the actual cause of the crash. If they had engine anti-ice on, the pressure ratio probe would have given them an accurate reading, and the resulting correct thrust level would have had them airborne despite the modest airframe ice they were carrying.

  9. I am a scientist who runs a group of about a dozen other scientists, and I explicitly beg the people who work under me to FUCKING TELL ME when I am full of shit.

  10. Anonymous

    I think that making mistakes is part of everyone’s lives. When a junior developer fails to build or test some software, it is usually not that bad, because junior programmers are assigned easier, shorter tasks. But when an experienced senior developer screws up, wow! He or she screws up big time, because senior members are often assigned more complex or important tasks. Software development has become far more complex than in years past: you can face multiple unexpected technological challenges, and teams are getting bigger, mixing different personalities into a room to work together, which increases the risk of failure even if you apply the most advanced development techniques and management models.

  11. Anonymous

    I think the Norwegian work culture, where I manage IT-dev teams, is quite the opposite of the Korean/Asian one. People barely respect authority, and speaking up to someone senior is quite normal. This often has the consequence that the plane (project) never gets off the ground, because the captain has to get consensus from both the crew and the passengers on how and where to fly the plane.

    I don’t think it is possible to generalize a way of getting a team to work well together. It all depends too much on people, culture, the type of project etc. Communication is important, but so is knowing when to shut up.

    The examples with plane crashes are the obvious cases, where the danger is real and clear. Usually it is not so obvious that there is a problem or what the correct solution might be. This is where I see the difference between the senior and the junior: the junior picks up the easy-to-spot issues, which often have already been solved or have little or no consequence, while the senior focuses on the issues that might not be clearly visible but have the biggest consequences.

  12. Many people defer to the machine. They don’t consider that it is possible for the machine, such as an altimeter or a computer, ever to fail. In essence, the computer has become the unquestionable superior in the situation, and the result can be disaster. I would expect this to be the case in authoritarian environments, such as the military, though senior, experienced, and knowledgeable people who are made to feel inferior by the machine may develop a wariness of automation, even going too far and constantly trying to overrule the machines, with similarly disastrous results. Trusting machinery takes a balanced approach, one adjusted to the nature and history of the machine in question.

    I also want to extend the ideas presented above, where it is pointed out that communication difficulties cause problems. It would seem, from the evidence presented, that there is a confusion between chatter and real-time, functional communication. Some people might say that the solution is to ban chatter from the cockpit, but this would create undue social stress on the pilots and be counterproductive. Instead, I suggest that IATA create a specific set of simple English phrases to communicate problems and immediate concerns.

    There also needs to be a command protocol under which the junior person can raise questions or concerns that require investigation and response by senior people, not out-of-hand dismissal.

    Running out of fuel while in a holding pattern is crazy. But perhaps advance warning to the control tower would be appropriate, which might include more automation: transponders that relay the amount of fuel left and the expected consumption rate to each airspace control (airport tower), plus alerts that come more frequently as the fuel becomes critically low (a rough sketch of this escalating-alert idea follows at the end of this comment). One only hopes that air traffic computers can handle this, but when the situation becomes critical, the pilots need to inform air traffic control.

    But should such a system be put in place? Before adding a feature to any system, and to critical systems most of all, there must be an analysis of the complexity it adds to the whole, since that complexity creates more points of failure. Keeping systems simple means that they are easier to test and, even once they have passed those tests, less likely to fail.

    If I can go off on a tangent here, I am starting to think that currently popular operating systems are not appropriate for critical applications, because the kernel is too complex. The monolithic kernel of Linux keeps growing by leaps and bounds, much of the time through kernel modules, yet any one of these modules can fail and crash the whole system. The microkernel architecture, as offered by Minix, GNU HURD, and some other operating systems, is probably more appropriate for critical applications. But microkernels are slow, which is the trade-off that has kept them from becoming popular. For a critical system, though, it’s worth the extra money that has to be spent on faster hardware.
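
    To make the escalating-alert idea above a little more concrete, here is a minimal sketch in Python. The function name, units, and thresholds are all hypothetical illustrations, not taken from any real avionics spec:

```python
# Hypothetical sketch of the escalating fuel alert proposed in the comment above.
# All names, units, and thresholds are invented for illustration only.

def alert_interval_seconds(fuel_remaining_kg: float, burn_rate_kg_per_s: float) -> float:
    """Return how often to relay a fuel report to ATC; alert more often as endurance shrinks."""
    endurance_s = fuel_remaining_kg / burn_rate_kg_per_s  # time until the tanks run dry
    if endurance_s > 45 * 60:   # plenty of fuel: routine report every 15 minutes
        return 15 * 60
    if endurance_s > 30 * 60:   # getting low: report every 5 minutes
        return 5 * 60
    if endurance_s > 15 * 60:   # minimum fuel: report every minute
        return 60
    return 10                   # emergency fuel: near-continuous alerts

if __name__ == "__main__":
    # e.g. 2,000 kg remaining at 1.2 kg/s is roughly 28 minutes of endurance
    print(alert_interval_seconds(2000.0, 1.2))  # prints 60: report every minute
```

    The point of the sketch is only that the alert cadence escalates automatically, so nobody has to decide, in the heat of the moment, whether it is polite to speak up.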

  13. Abby Fichtner

    Hi, Rajiv,

    Thank you so much! And so very true about there being no laws of physics to govern us. At least our tools are getting better, so we can automate checks of things like static analysis metrics, code coverage, and automated test results to help guide us. Sad that so many projects don’t do things like this, preferring instead to fly blind. And to just, as you say, feed the devs more coffee to work more hours, as if that were a substitute for doing things smarter.
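
    For the curious, here is one minimal sketch of what automating that kind of check might look like on a Python project, assuming pytest and coverage.py are installed; the script itself and its 80% threshold are illustrative assumptions, not anything prescribed in the post:

```python
# Minimal sketch of an automated quality gate: run the tests under coverage
# measurement and fail loudly if coverage drops below an (arbitrary) threshold.
# Assumes pytest and coverage.py are installed; the 80% bar is illustrative.
import subprocess
import sys

def run(cmd: list[str]) -> int:
    """Run a command, echoing it first, and return its exit code."""
    print("+", " ".join(cmd))
    return subprocess.call(cmd)

def main() -> int:
    # Run the test suite with coverage measurement turned on.
    if run(["coverage", "run", "-m", "pytest"]) != 0:
        print("FAIL: tests failed. Speak up now, not at the post-mortem.")
        return 1
    # coverage.py exits non-zero when total coverage is below --fail-under.
    if run(["coverage", "report", "--fail-under=80"]) != 0:
        print("FAIL: coverage is below the threshold.")
        return 1
    print("OK: quality gate passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

    Wire something like this into the build and the tools do the speaking up, no mitigated speech required.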
