History's Worst Software Error
Between June 1985 and January 1987, a state-of-the-art radiation therapy device, the THERAC-25, delivered massive radiation overdoses to six patients, killing or seriously injuring them. The accidents sent shockwaves through the medical community and raised lasting questions about the importance of proper training, testing, and oversight.
The THERAC-25: A Breakthrough in Radiation Therapy
The THERAC-25 was a breakthrough in radiation therapy technology, designed to treat cancer patients with high-energy radiation beams. It was developed by the Canadian company Atomic Energy of Canada Limited (AECL) and was marketed as a faster and more efficient alternative to earlier radiation therapy machines.
The THERAC-25 was a computer-controlled machine that could deliver precise doses of radiation to specific parts of the body, reducing the risk of damage to healthy tissue.
However, the THERAC-25 was not without its flaws. Unlike its predecessors, the Therac-6 and Therac-20, which backed up their software with independent hardware interlocks, the THERAC-25 relied on software alone for many safety functions. Moreover, the software used in the THERAC-25 had never been independently reviewed, and the manufacturer did not adequately test the device before it was put into use.
The Tragic Events of 1985
Beginning in 1985, THERAC-25 machines delivered massive radiation overdoses to patients. Under certain conditions, a software flaw left the machine configured inconsistently, so that the high-current electron beam intended for X-ray production reached the patient without the X-ray target and beam flattener in place. The resulting doses were roughly 100 times higher than intended. Patients who received the overdose suffered from severe radiation burns, nausea, and vomiting.
The first known accident occurred in June 1985, when a 61-year-old woman received a massive overdose of radiation that left her in excruciating pain and with lasting injuries. Five more accidents followed through January 1987, and several of those patients died from their injuries. Hospital staff and the manufacturer repeatedly failed to recognize the machine as the cause, in part because an overdose was assumed to be impossible.
A Cautionary Tale of Inadequate Validation and Documentation
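Here's a list of the software and process issues that were found in the THERAC-25:
- A race condition between the operator's data entry and the machine's setup routine: edits made within the roughly eight seconds needed to position the magnets could go unnoticed, leaving the beam and the hardware configured for different modes.
- Poor user interface design made it difficult for operators to detect and respond to errors.
- Hardware safety interlocks used on the earlier Therac-6 and Therac-20 were removed, leaving software as the only safeguard, and operators could override many fault pauses with a single keystroke.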
- Poor error handling made it difficult for operators to diagnose and fix problems.
- Inadequate testing and validation of the software made it difficult to identify and correct errors.
- Lack of logging or tracking of machine usage made it difficult to identify patterns of malfunctions.
- Insufficient documentation of the software made it difficult to understand and maintain the code.
- Failure to follow established software development best practices, such as code reviews, version control, and software testing.
These code issues contributed to the deadly legacy of the THERAC-25 and highlighted the importance of proper software design, testing, and validation in medical devices. The THERAC-25 incident led to significant changes in the way medical devices are regulated and designed, with a greater emphasis on safety, usability, and transparency.
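To make the logging item in the list above concrete, here is a minimal sketch in C of an in-memory event log that lets staff see when the same malfunction code keeps recurring. Everything in it (`EventLog`, `log_event`, `count_code`, the capacity of 4) is hypothetical and chosen for illustration. It is not based on the THERAC-25's actual software, and a real device would also persist the log to non-volatile storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 4 /* deliberately tiny, for illustration only */

typedef struct {
    uint32_t timestamp_s; /* seconds since power-on (hypothetical clock) */
    uint16_t code;        /* malfunction code as shown to the operator */
} LogEntry;

typedef struct {
    LogEntry entries[LOG_CAPACITY];
    size_t next;  /* slot that the next event will be written to */
    size_t count; /* number of valid entries, at most LOG_CAPACITY */
} EventLog;

/* Appends an event, overwriting the oldest entry once the log is full. */
static void log_event(EventLog *evlog, uint32_t timestamp_s, uint16_t code) {
    assert(evlog != NULL);
    evlog->entries[evlog->next].timestamp_s = timestamp_s;
    evlog->entries[evlog->next].code = code;
    evlog->next = (evlog->next + 1) % LOG_CAPACITY;
    if (evlog->count < LOG_CAPACITY) {
        evlog->count++;
    }
}

/* Counts retained entries with the given code, so repeated faults stand out. */
static size_t count_code(const EventLog *evlog, uint16_t code) {
    assert(evlog != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < evlog->count; i++) {
        if (evlog->entries[i].code == code) {
            hits++;
        }
    }
    return hits;
}
```

A service engineer reviewing such a log could call `count_code` for a given malfunction number and treat a rising count as a reason to take the machine out of service, rather than relying on operators' memories of earlier pauses.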
What Were The Code Issues With The THERAC-25?
The best-known programming error in the THERAC-25 software was a race condition, which occurs when the outcome of a program depends on the relative timing of concurrent operations on shared state. In the THERAC-25, the race was between the operator's data entry and the routine that set up the machine for treatment. Specifically, if the operator entered a prescription and then corrected it within the roughly eight seconds the machine needed to position its magnets, the software did not register the change, and the machine could be left configured for one mode while the beam fired at the settings for another. This error was difficult to detect because it appeared only when operators typed quickly, and because no independent hardware interlock stood behind the software. Here's a hypothetical example of what such a flaw might look like:
/* Hypothetical sketch of the flaw; not the actual THERAC-25 code. */
if (data_entry_complete) {
    mode = operator_input.mode;   /* prescription is read once, here */
    set_beam_mode(mode);          /* hardware setup takes several seconds */
    /* BUG: an edit made during setup changes operator_input.mode,
       but nothing re-reads it before the beam fires */
    activate_beam();
}
In this code example, the prescription is read once, when data entry is first marked complete, and the beam is activated after a lengthy setup step. If the operator edits the prescription during that step, the change is never re-read, so the machine fires with stale settings. This programming error highlights the importance of proper software testing and safety checks on medical devices.
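A race of the kind reported in the THERAC-25 investigations, where an operator's edit arrives while the machine is still setting up, can be simulated deterministically in a few lines of C. The sketch below is a single-threaded model, not the machine's real code (which was written in PDP-11 assembly), and every name in it (`Console`, `Snapshot`, `begin_setup`, `fire_unsafe`, `fire_checked`) is an assumption made for illustration. It also shows one possible mitigation: count operator edits and refuse to fire if the count changed after setup began.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODE_ELECTRON, MODE_XRAY } BeamMode;

/* Shared state written by the operator's console (hypothetical). */
typedef struct {
    BeamMode mode;       /* prescription as currently shown on screen */
    unsigned edit_count; /* incremented on every operator edit */
} Console;

/* What the setup routine captured when it started. */
typedef struct {
    BeamMode mode;
    unsigned edit_count;
} Snapshot;

static void operator_edit(Console *c, BeamMode m) {
    assert(c != NULL);
    c->mode = m;
    c->edit_count++;
}

/* Setup copies the prescription once, then spends seconds moving hardware. */
static Snapshot begin_setup(const Console *c) {
    assert(c != NULL);
    Snapshot s = { c->mode, c->edit_count };
    return s;
}

/* Flawed: fires with whatever was captured, even if the screen has changed. */
static bool fire_unsafe(const Snapshot *s, BeamMode *fired) {
    *fired = s->mode;
    return true;
}

/* Mitigation: refuse to fire if any edit arrived after setup began. */
static bool fire_checked(const Console *c, const Snapshot *s, BeamMode *fired) {
    if (c->edit_count != s->edit_count) {
        return false; /* stale setup: the caller must run setup again */
    }
    *fired = s->mode;
    return true;
}
```

If `operator_edit` is called between `begin_setup` and firing, `fire_unsafe` still fires in the mode that was captured earlier, while `fire_checked` returns `false` until setup is repeated. A version counter is only one way to detect the stale state. Real systems would typically combine it with proper synchronization and an independent hardware check.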
One of the significant issues with the THERAC-25 was related to its user interface design. The machine's interface was poorly designed, making it difficult for operators to detect and respond to errors. For example, when an error occurred, the machine displayed only a terse code such as "MALFUNCTION 54", with no explanation on the screen or in the operator's manual. This made it difficult for operators to identify the problem and take corrective action. Additionally, operators could clear many of these pauses and proceed with a single keystroke, and the machine lacked the independent hardware interlocks that are intended to prevent radiation delivery when it is unsafe. Here's an example of how the safety interlocks might have been implemented in the THERAC-25 software:
/* Illustrative only: fire the beam when every interlock condition holds. */
if (beam_on) {
    if (door_closed && water_flow && vacuum_on) {
        activate_beam();
    } else {
        display_error_message("Safety interlock failure!");
    }
}
In this code example, the safety interlocks are designed to ensure that the beam is only activated if the door is closed, water is flowing, and the vacuum is on. If these conditions are not met, the machine displays an error message rather than activating the beam. Unfortunately, the THERAC-25 lacked the independent hardware interlocks of its predecessors, so when its software checks failed, nothing else stood between the beam and the patient.
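The interlock check above can be extended so that it also compares what the hardware reports with what the operator requested. The C sketch below is one way to do that under stated assumptions: the `Sensors` fields, the `TurntablePos` values, and the function names are all hypothetical, and the turntable reading is assumed to come from a position sensor rather than from the software's own record of where it moved the turntable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODE_ELECTRON, MODE_XRAY } BeamMode;
typedef enum { TURNTABLE_ELECTRON, TURNTABLE_XRAY, TURNTABLE_UNKNOWN } TurntablePos;

/* Sensor readings; all field names are hypothetical. */
typedef struct {
    bool door_closed;
    bool water_flow;
    bool vacuum_on;
    TurntablePos turntable; /* read back from the hardware itself */
} Sensors;

/* Returns NULL when it is safe to fire, otherwise a description of the
   first interlock that failed. */
static const char *interlock_fault(const Sensors *s, BeamMode requested) {
    assert(s != NULL);
    if (!s->door_closed) return "treatment room door is open";
    if (!s->water_flow)  return "cooling water is not flowing";
    if (!s->vacuum_on)   return "vacuum is not established";
    if (requested == MODE_XRAY && s->turntable != TURNTABLE_XRAY)
        return "X-ray target is not in the beam path";
    if (requested == MODE_ELECTRON && s->turntable != TURNTABLE_ELECTRON)
        return "turntable is not in the electron position";
    return NULL;
}

static bool may_activate_beam(const Sensors *s, BeamMode requested) {
    return interlock_fault(s, requested) == NULL;
}
```

Returning a description instead of a bare boolean means the caller can tell the operator which condition failed. A software check like this complements a hardware interlock and does not replace one, since it is only as reliable as the code path that calls it.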
Another issue with the THERAC-25 software was related to how the machine's errors were handled. The device would display an error message and halt or pause treatment when an error occurred. However, the messages gave operators little to go on, which made it difficult to diagnose the problem and take corrective action. Sometimes, the machine would display multiple error messages simultaneously, adding to the confusion. Here's an example of how the error handling might have been implemented in the THERAC-25 software:
/* Illustrative only: each check reports a bare message and stops the beam. */
if (beam_on) {
    if (door_closed && water_flow && vacuum_on) {
        activate_beam();
    } else {
        display_error_message("Safety interlock failure!");
        stop_beam();
    }
}
/* Note: the checks below run after the beam may already be active. */
if (beam_intensity > MAX_INTENSITY) {
    display_error_message("Beam intensity too high!");
    stop_beam();
}
if (beam_angle < MIN_ANGLE || beam_angle > MAX_ANGLE) {
    display_error_message("Invalid beam angle!");
    stop_beam();
}
In this code example, the machine is designed to display an error message and stop the beam if any of the safety interlocks fail, the beam intensity is too high, or the beam angle is invalid. However, the machine does not guide operators on how to diagnose and fix these problems. This lack of guidance made it difficult for operators to effectively troubleshoot the machine, contributing to the THERAC-25's deadly legacy.
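One way to give operators the missing guidance is to keep each fault's explanation, recommended action, and "may resume" policy in a single table. The C sketch below assumes this design. The fault codes, wording, and policy flags are invented for illustration and do not come from the THERAC-25 or from any clinical procedure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAULT_INTERLOCK, FAULT_INTENSITY, FAULT_ANGLE, FAULT_COUNT } FaultCode;

typedef struct {
    FaultCode code;
    const char *what;   /* what went wrong, in plain language */
    const char *action; /* what the operator should do next */
    bool may_resume;    /* false: treatment must not be resumed from the console */
} FaultInfo;

/* Hypothetical entries; real wording would come from clinical and safety staff. */
static const FaultInfo FAULT_TABLE[] = {
    { FAULT_INTERLOCK, "Safety interlock failure",
      "Check the door, water flow, and vacuum, then restart setup.", true },
    { FAULT_INTENSITY, "Beam intensity too high",
      "Do not resume. Take the machine out of service and call physics staff.", false },
    { FAULT_ANGLE, "Invalid beam angle",
      "Re-enter an angle within the permitted range.", true },
};

/* Compile-time check that no fault code is left without an entry. */
static_assert(sizeof FAULT_TABLE / sizeof FAULT_TABLE[0] == FAULT_COUNT,
              "every fault code needs a table entry");

static const FaultInfo *lookup_fault(FaultCode code) {
    for (size_t i = 0; i < sizeof FAULT_TABLE / sizeof FAULT_TABLE[0]; i++) {
        if (FAULT_TABLE[i].code == code) {
            return &FAULT_TABLE[i];
        }
    }
    return NULL;
}

/* The console offers "resume" only for faults explicitly marked resumable. */
static bool operator_may_resume(FaultCode code) {
    const FaultInfo *f = lookup_fault(code);
    return f != NULL && f->may_resume;
}
```

Because unknown codes are treated as non-resumable, a fault that nobody anticipated defaults to the safe behavior instead of to a one-key override.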
The Aftermath and Lessons Learned
The THERAC-25 tragedy highlights the importance of robust software development practices, particularly in critical systems such as medical devices. Software development has evolved significantly in the decades since the THERAC-25 incident, with the adoption of continuous integration and continuous delivery (CI/CD) practices and an increased emphasis on quality assurance (QA).
CI/CD practices involve frequent, automated testing and deployment of software, allowing for faster and more reliable delivery of updates and reducing the risk of errors or vulnerabilities. Modern software development typically involves rigorous QA processes, including code reviews, unit testing, and acceptance testing. These practices help ensure that software is thoroughly tested and validated before deployment, reducing the risk of errors or defects that could have catastrophic consequences.
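As a small illustration of the unit testing mentioned above, the C sketch below separates a validation rule from the hardware so that a build server can exercise it on every commit. The limits are placeholders, not clinical values, and the names `prescription_is_valid` and `test_prescription_limits` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder limits chosen for illustration; not clinical values. */
#define MAX_INTENSITY 200
#define MIN_ANGLE 0
#define MAX_ANGLE 359

/* Pure function: no hardware access, so it can run on any build machine. */
static bool prescription_is_valid(int beam_intensity, int beam_angle) {
    if (beam_intensity < 0 || beam_intensity > MAX_INTENSITY) {
        return false;
    }
    if (beam_angle < MIN_ANGLE || beam_angle > MAX_ANGLE) {
        return false;
    }
    return true;
}

/* A unit test that an automated pipeline could run on every commit. */
static void test_prescription_limits(void) {
    assert(prescription_is_valid(MAX_INTENSITY, MIN_ANGLE));      /* boundary accepted */
    assert(!prescription_is_valid(MAX_INTENSITY + 1, MIN_ANGLE)); /* just over the limit */
    assert(!prescription_is_valid(-1, MIN_ANGLE));                /* negative intensity */
    assert(!prescription_is_valid(0, MAX_ANGLE + 1));             /* angle out of range */
}
```

Boundary tests like these catch many ordinary mistakes, but they would not by themselves expose a timing-dependent fault such as a race condition, which is why safety-critical projects pair them with design reviews, hazard analysis, and independent safeguards.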
In the context of medical devices, these modern software development practices are critical for ensuring patient safety. Medical devices must meet rigorous safety and quality standards before they can be approved for use. Ongoing monitoring and testing are necessary to ensure they continue functioning safely and effectively. By adopting modern software development practices, medical device manufacturers can help ensure that their products meet these standards and reduce the risk of errors or malfunctions that could harm patients.
In conclusion, the lessons of the THERAC-25 tragedy have shaped software development, particularly for critical systems such as medical devices. Practices such as CI/CD and rigorous QA help ensure that software is thoroughly tested and validated before it reaches patients, reducing the risk of errors or vulnerabilities that could have catastrophic consequences.
Sources
- https://en.wikipedia.org/wiki/Therac-25
- https://ethicsunwrapped.utexas.edu/case-study/therac-25
- https://www.computer.org/csdl/magazine/co/2017/11/mco2017110008/13rRUxAStVR
- https://ieeexplore.ieee.org/abstract/document/274940
- https://www.cs.nmt.edu/~cse382/reading/therac-25.pdf
- https://www.sciencedirect.com/science/article/abs/pii/S1549374104300821