Joe's Jottings

Jottings Number 74, by Joe Podolsky:

From: joe_podolsky@hp.com

Date: June 1997

Subject: Are We Determined to Get Better?

There were two thought-jogging articles in _The New York Times Magazine_ on June 15, 1997.

The first is "A Bug by Any Other Name" by technology journalist James Gleick. In it, Gleick slams "Microspeak," the process of painting rosy word pictures that Microsoft has raised to a high art. In Microspeak, says Gleick, "bugs" are really "issues." "This could be a known issue or an intermittent issue. Then again, it could be a design side effect, undocumented behavior, or, perhaps, a technical glitch."

Ho hum. This is old news. Who among us who has written more than ten lines of code hasn't described the unintended behavior of our brainchild as an "unplanned feature"? Gleick wants the Microsoft people (and by implication, the rest of us) to simply confess. He wants us to say, "We messed up. Here's what the problem is. We're sorry. We're going to try to fix it."

Gleick compares Microspeak to governmental "spin." But he holds us computer-folks to a higher standard. "Perhaps," he says, "Microspeak seems all the more garish because this great institution operates not in the sphere of politics or war, where we've grown accustomed to 'pacification' and 'collateral damage' and 'plausible deniability,' but in the technical realm, where words usually mean what they say."

Maybe that goes to show what I've long believed: that information systems are as much political artifacts as they are technical ones.

Gleick's article is light. He makes his point with a poke of irony. The second article is very different: very serious and, in fact, heartbreaking. Written by Lisa Belkin, the article's title asks, "How Can We Save the Next Victim?" It's about medical mistakes, "bugs" that aren't annoying or funny or expensive, but fatal.

It is the story of Jose Eric Martinez, who died on August 2, 1996, in Hermann Hospital in Houston because of a series of mistakes that resulted in the two-month-old infant getting a fatal overdose of medication. The documentation of the events is chilling, because the death was caused by intelligent people, carefully doing their jobs, within a system that failed. And, while the article revolves around the Martinez tragedy, Belkin brings in vignettes of other equally horrible incidents.

"For decades," writes Belkin, "the American Medical Association's approach to error has been to describe it as an aberration in a system that is basically safe." Both the AMA and the malpractice lawyers focused on _who_ made the error, not how to prevent it in the future.

About two years ago, however, several organizations, sponsored by the AMA and others, were set up to focus on prevention. "...All these separate efforts," Belkin says, "add up to a growing recognition that the health of health care may lie in its ability to _admit_ and prevent its mistakes (my emphasis)."

Belkin quotes University of Texas human factors expert Robert L. Helmreich: "Admitting to imperfection is a first step for medicine, because many in the profession seem to believe they _can_ be perfect (again, my emphasis)."

"The central problem with the belief in perfection," Belkin continues, "is that hospital systems are designed around it. They rely on concentration - the nurse, for instance, will connect the nutrition bag to the nutrition line and not to the dialysis line. But things should be designed, human factors experts would argue, so that the connective ports on the nutrition bag fits _only_ on the connective port on the nutrition line (again, my emphasis)."

A major problem with all this, of course, is the tort system. As Belkin points out, "... the very methods used to root out error - admitting it, measuring it, discussing it - have the side effect of providing evidence of error, evidence that plaintiffs' lawyers are eager to see."

In worlds where errors are fatal, such as medicine and aviation, there are no easy answers. We expect both the process and the people to be perfect, and we try to accomplish this by using the legal system (which is definitely not held to the same expectation of perfection) as a blunt weapon to create publicity, to inflict retribution, and to instill fear. It's not a tool that creates a virtuous cycle toward improvement.

But we in the Information Technology world may yet have a chance to do things better. In general, information systems are made up of hardware, software, processes, and people. We are already intolerant of hardware problems. Remember Intel's problem with the Pentium's arithmetic? Intel tried to "spin" it away in classic Microspeak fashion but was smashed by the righteous uprising.

As Gleick points out, we hold no such high expectations about software packages themselves. But we still expect the overall system to produce business results. Since we tolerate "variations" in software performance, we drive people and processes to compensate for the software flaws.

And we aggravate the situation by changing things so quickly that we often get only a single chance to learn something before it's replaced by the next version. Rather than asking people to focus on learning and continuous improvement, we ask them instead to become experts at managing change. Managing continuous change may resemble continuous improvement, but it's not nearly as productive in terms of business results.

Notice again that we must measure our success in terms of business _results_, not necessarily in terms of the processes that produce those results. Jeffrey Voas is co-founder and chief technology officer of Reliable Software Technologies. He wrote an article in the July/August 1997 issue of _IEEE Software_ entitled "Can Clean Pipes Produce Dirty Water?" His answer: "Of course they can. Clean pipes can break, they can be attached to the wrong source, or the original water source may infuse dirty water into the pipeline." In his analogy, of course, pipelines are the software process and the water is the resulting code.

Voas also asks "the complementary question: 'Can dirty pipes produce clean water?' Once again, the answer is yes, but this result is much less likely."

Voas suggests testing procedures at key places in the process pipeline. In order to capture the dynamic behavior of software-driven processes, he suggests using "fault injection" methodologies that allow us to see how our systems handle at least some anomalous situations.
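
For readers who have not seen fault injection in practice, here is a minimal sketch of the general idea in Python. It is my own illustration, not Voas's method or tooling, and every name in it (FlakyStore, lookup_price, run_experiment) is hypothetical. A wrapper deliberately makes a dependency fail some fraction of the time, and we count how often the calling code falls back gracefully instead of crashing.

```python
import random


class FlakyStore:
    """Hypothetical dependency wrapper that injects read failures on purpose."""

    def __init__(self, data, failure_rate, seed=0):
        self._data = data
        self._failure_rate = failure_rate
        # A seeded generator keeps the injected faults reproducible.
        self._rng = random.Random(seed)

    def read(self, key):
        # Injected fault: fail with probability failure_rate.
        if self._rng.random() < self._failure_rate:
            raise IOError("injected fault")
        return self._data[key]


def lookup_price(store, item, default=None):
    """Code under test: it should degrade gracefully when the store fails."""
    try:
        return store.read(item)
    except IOError:
        return default


def run_experiment(failure_rate, trials=1000):
    """Return (number of fallbacks, number of trials) for a given fault rate."""
    store = FlakyStore({"widget": 9.99}, failure_rate)
    results = [lookup_price(store, "widget") for _ in range(trials)]
    fallbacks = sum(1 for r in results if r is None)
    return fallbacks, trials


if __name__ == "__main__":
    for rate in (0.0, 0.3, 1.0):
        fallbacks, trials = run_experiment(rate)
        print(f"failure_rate={rate}: {fallbacks} of {trials} calls used the fallback")
```

The point of the exercise is the one Voas makes: we learn how the system behaves under anomalous conditions by creating those conditions ourselves, rather than waiting for them to show up in production.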

Various forms of testing are important to do, and certainly cleaning known dirt from our pipelines has to help. But none of these solutions can be applied until we take a few steps.

Both Gleick and Belkin tell us what the first step has to be: admit that we have a problem. Most of us already do that, but we stop there. The issues are so pervasive and institutionalized that we don't really expect things to change. Our pain becomes fodder for Dilbert and Cathy cartoons.

So, we have to take the next step: decide to get better, as measured by business results, from metrics specified by business managers. Then and only then can we turn inward and use our technology for sustaining those results and for leading our business colleagues toward the results we feel are possible.

What are the linkages between our key processes and business results? What are the trends? What are the problems that we are seeing? Are we publicly discussing the problems in clear terms, not Microspeak, without worrying about blame? And, most important, rather than just living with the situations, are we determined to get better?

Best regards,

Joe
