William M. Fleischman
Between June 1985 and January 1987, a series of accidents involving the Therac-25 medical linear accelerator caused severe injuries to six cancer patients. Three of these patients died as a result of the massive radiation overdoses they received. The accidents were found to have been caused by the failure of software that controlled safety-critical operations of the Therac-25. A thorough retrospective analysis of these accidents by Nancy Leveson and Clark Turner revealed that, from an engineering standpoint, the Therac-25 was a poorly and carelessly designed system. More generally, their analysis points to failures at a higher level of abstraction in the systems of medical treatment in which the Therac-25 was used, as well as failures in the regulatory regimes meant to protect the public through prior approval and oversight of such medical devices.
The Therac accidents are widely studied in courses or modules devoted to the ethical responsibilities of professionals in the computing field. It is difficult to imagine that they did not influence the authors and the content of the various professional codes of ethics – for example, the 1992 revision of the Code of Ethics of the ACM and the Software Engineering Code of Ethics promulgated in 1999.
Since the introduction of electronic voting systems following the passage of the Help America Vote Act (HAVA) in 2002, numerous studies (we cite, among others, investigations by teams at Johns Hopkins University, Princeton University, the University of California at Berkeley, and the Center for Election Integrity at Cleveland State University) have disclosed serious and unsettling flaws in virtually all of the electronic voting devices marketed and in use in the United States. In addition, experience with electronic voting devices in recent elections has confirmed their fallibility. Ballots have been inexplicably lost from or added to vote totals; direct recording electronic devices (DREs) have provided incorrect ballots; machines have failed to operate at the start of voting and have broken down during the course of an election; and memory cards and smart card encoders have failed during elections. Since HAVA was intended to prevent problems like those encountered in the contested 2000 Presidential election, these shortcomings have created an unsatisfactory situation in which the purported remedy for problems in the conduct of fair elections has in fact intensified public doubts about the electoral process. By analogy with the case of the Therac-25, the software controlling these electronic voting devices can be considered “safety-critical” in the sense that it safeguards the integrity of elections, on which public trust in the legitimacy of elected governments rests.
Carefully considered, these studies of the deficiencies of electronic voting systems reveal numerous striking analogies with the engineering and system failures diagnosed in the case of the Therac-25. The analogies begin at the level of operation of the devices themselves, in particular the presence in each instance of chronic “minor” malfunctions which must somehow be ignored or explained away in order not to undermine belief in the trustworthiness of the devices. At a higher level, investigations uniformly disclose the absence of defensive design, overconfidence in the infallibility of software, inadequate software engineering practices relating to safety and security, conflation of user-friendly interface design with safe interface design, and, most pointedly, inadequate or nonexistent documentation.
In comparing the medical treatment systems in which the Therac-25 was utilized with the systems of state and local election boards which form the “customer base” for electronic voting devices, we find the same pattern of articulation failure within organizations, failures of due diligence, complacency involving unwarranted trust of the vendor and tolerance for fault-prone devices, and inadequate training of personnel.
At the level of the vendor or manufacturer, the analogies that begin with the poor engineering practices already cited extend further to a predilection for rushing the product to market while overselling its reliability, the absence of documentation and audit trails concerning adverse incidents, inadequate response to such incidents, and evidence of a willingness to bear the cost of penalties rather than undertake necessary engineering revisions.
Finally, the situation at the level of regulatory regimes seems even less satisfactory in the cases relating to present-day electronic voting devices than it did in the 1980s in connection with the radiation accidents associated with the Therac-25. The same problem of the absence of regulatory personnel with the technological competence to evaluate system shortcomings appears to plague the current case as it did at the time of the approval and oversight of the operation of the Therac-25. At the same time, there appears to be a widespread belief at present that the regulatory system will somehow fix everything.
In this paper, we will explore the analogies between the deficiencies of the Therac-25 and those of present-day electronic voting systems by laying out in some detail the elements of similarity described above, paying particular attention to system interactions at several pertinent hierarchical levels. We will try to come to grips with two questions: what has been learned from past experience in relevant circumstances, especially in regard to responsible practices of software engineering, and what makes it so difficult to avoid the vexatious repetition of certain unsatisfactory patterns of behavior, even in the presence of admonitory precedent. Finally, we will discuss some possibilities for incorporating the insights offered by the comparison of these two cases in courses on ethics and professional responsibility, especially in the education of those aspiring to careers in software engineering.