> software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.
If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type systems, unit testing, and defensive coding can solve all software problems. They can definitely help a lot, but the real failure in the story of the Therac-25, as I understand it, is that it took far too long for incidents to be reported, investigated, and fixed.
There was a great Cautionary Tales podcast about the device recently [0]. One thing mentioned was that, even aside from the catastrophic accidents, Therac-25 machines routinely showed unexplained errors to their operators, but these issues never made it to the desk of someone who might fix them.
[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...
This is true, but there also need to be good developers. It can't just be great process and low-quality developer practices. There needs to be: 1/ high-quality individual processes (development being one of them), 2/ high-quality delivery mechanisms, 3/ feedback loops to improve that quality, 4/ out-of-band mechanisms to inspect and improve the quality.
I would argue that a good process always has a good self-correction mechanism built in. This way, the work done by a "low quality" software developer (which includes almost all of us at some point in time) is always taken into account by the process.
I strongly believe that we will see an incident akin to Therac-25 in the near future. With as many people running YOLO mode on their agents as there are, Claude or Gemini is going to be hooked up to some real hardware that will end up killing someone.
Personally, I've found even the latest batch of agents fairly poor at embedded systems, and I shudder at the thought of giving them the keys to the kingdom to say... a radiation machine.
The 737 MAX MCAS debacle was one such failure, albeit involving a wider system failure and not purely software.
Agreed on the future but I think we were headed there regardless.
> Personally, I've found even the latest batch of agents fairly poor at embedded systems
I mean, even in simple CRUD web apps where the data models are more complex and the same data has multiple structures, the LLMs get confused after the second data transformation (at most).
E.g. You take in data with field created_at, store it as created_on, and send it out to another system as last_modified.
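A toy sketch of the renaming chain described above (the function names are made up for illustration); the timestamp is one and the same value throughout, it just travels under three different names, which is exactly the kind of mapping LLMs tend to lose track of:

```python
# Hypothetical pipeline: the same timestamp is called created_at at
# ingestion, created_on in storage, and last_modified on the way out.

def ingest(payload: dict) -> dict:
    """API boundary: accepts 'created_at', stores it as 'created_on'."""
    return {"id": payload["id"], "created_on": payload["created_at"]}

def export(record: dict) -> dict:
    """Outbound boundary: the stored 'created_on' leaves as 'last_modified'."""
    return {"id": record["id"], "last_modified": record["created_on"]}

record = ingest({"id": 1, "created_at": "2024-01-01T00:00:00Z"})
outbound = export(record)
print(outbound["last_modified"])  # 2024-01-01T00:00:00Z
```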
The first commenter on this site introduces himself as "a physician who did a computer science degree before medical school." He is now president of the Ray Helfer Society [1], "an honorary society of physicians seeking to provide medical leadership regarding the prevention, diagnosis, treatment and research concerning child abuse and neglect."
While the cause is noble, the medical detection of child abuse faces serious issues with undetected and unacknowledged false positives [2], since ground truth is almost never knowable. The prevailing idea is that certain medical findings are considered proof beyond reasonable doubt of violent abuse, even without witnesses or confessions (denials are extremely common). These beliefs rest on decades of medical literature regarded by many as low quality because of methodological flaws, especially circular reasoning (patients are classified as abuse victims because they show certain medical findings, and then the same findings are found in nearly all those patients—which hardly proves anything [3]).
I raise this point because, while not exactly software bugs, we are now seeing black-box AIs claiming to detect child abuse with supposedly very high accuracy, trained on decades of this flawed data [4, 5]. Flawed data can only produce flawed predictions (garbage in, garbage out). I am deeply concerned that misplaced confidence in medical software will reinforce wrongful determinations of child abuse, including both false positives (unjust allegations potentially leading to termination of parental rights, foster care placements, imprisonment of parents and caretakers) and false negatives (children who remain unprotected from ongoing abuse).
[1] https://hs.memberclicks.net/executive-committee
[2] https://news.ycombinator.com/item?id=37650402
[3] https://pubmed.ncbi.nlm.nih.gov/30146789/
[4] https://rdcu.be/eCE3l
[5] https://www.sciencedirect.com/science/article/pii/S002234682...
I'd be interested in knowing how many of y'all are being taught about this sort of thing in college ethics/safety/reliability classes.
I was taught about this in engineering school, as part of a general engineering course also covering things like bathtub reliability curves and how to calculate the number of redundant cooling pumps a nuclear power plant needs. But it's a long time since I was in college.
Is this sort of thing still taught to engineers and developers in college these days?
It was taught in a first-year software ethics class on my Computer Science programme, back in 2010. I wonder if they still do it.
This was part of our Systems Engineering class, something like this: https://web.mit.edu/6.033/2014/wwwdocs/assignments/therac25....
I was curious too, so I made a poll. I for sure wasn't taught it in computer science at uni; I only heard about it vaguely online.
https://strawpoll.com/NMnQNX9aAg6
I was taught about it in university as a computer science undergrad, and I've thought about it often since, as I ended up working in medtech.
My (tragically) favorite part is, from wikipedia:
> A commission attributed the primary cause to generally poor software design and development practices, rather than singling out specific coding errors.
Which to me reads as "this entire codebase was so awful that it was bound to fail in some or other way".
This reminds me of the Belgium 2003 election that was impossibly skewed by a supernova light years away sending charged particles which managed to get through our atmosphere (allegedly) and flip a bit. Not the only case where it's happened.
On the bright side, wow, those computers are really sturdy: takes a whole supernova to just flip a bit :)
Well the thing is, millions of stars go supernova in the observable universe every single day. Throw in the daily gamma ray burst as well, and you've got bit flips all over the place.
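For a sense of scale on the Belgian incident mentioned upthread: the suspect candidate's total was off by exactly 4096 votes, which is 2^12, i.e. consistent with a single flipped bit in the stored count. A quick illustration (the starting count here is invented):

```python
# One candidate's total jumped by exactly 4096 = 2**12 votes,
# consistent with a single bit flip at bit position 12.
true_count = 2563                    # illustrative value, not the real tally
corrupted = true_count ^ (1 << 12)   # flip bit 12
print(corrupted - true_count)        # 4096
```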
TIL TheDailyWTF is still active. I'd thought it had settled to greatest hits only some years ago.
I was taught this incident in university many years ago. It's undeniably an important lesson that shouldn't be forgotten
Well There's Your Problem podcast, Episode 121: Therac-25
https://www.youtube.com/watch?v=7EQT1gVsE6I
The most deadly bug in history. If you know any other deadly bug, please share! I love these stories!
Several people killed themselves over this: https://www.wikipedia.org/wiki/British_Post_Office_scandal
More heads should have rolled over this, in my opinion. It's absolutely despicable that they cheerfully threw innocent people in prison rather than admit their software was a heap of crap. It makes me so angry that this injustice was allowed to prevail for so long: nobody cared about the people being mistreated and tarred as thieves as long as they were 'little people' of no consequence, while senior management gleefully covered themselves in criminality to cover for their own uselessness.
It's an archetypal example of 'one law for the connected, another law for the proles'. The CEO had her CBE stripped for 'bringing the honours system into disrepute' but she brought the entire justice system and the country itself into disrepute. At the bare minimum the Post Office should have been stripped of its right to bring its own prosecutions given it's clearly breathtakingly unfit as an organisation to wield such power. In my opinion they belong in prison themselves.
Probably many bugs rather than a single one, but the botched London Ambulance dispatch software from the 90s is probably one of the most deadly software issues of all time, although I don't know of any estimates that try to quantify the number of lives lost as a result.
http://www0.cs.ucl.ac.uk/staff/a.finkelstein/papers/lascase....
The MCAS related bugs @ Boeing led to 300+ deaths, so it's probably a contender.
Was that a bug or a failure to inform pilots about a new system?
In the same vein, one could argue that Therac-25 was not actually a software bug but a hardware problem. Interlocks that could have prevented the accidents, and that were present in earlier Therac models, were missing. The software was written with those interlocks in mind. Greedy management/hardware engineers skipped them for the -25 version.
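A hedged sketch of what "written with those interlocks in mind" means in practice; the class and method names here are invented for illustration and are not from the actual Therac code. The point is that the hardware check is consulted independently of whatever the software believes:

```python
# Defense in depth: software decides the mode, but an independent
# hardware interlock (simulated here as a callable polling a microswitch)
# must also agree before a high-power shot. Earlier Therac models had
# such interlocks; the Therac-25 relied on software state alone.

class Beam:
    def __init__(self, target_in_place):
        # target_in_place: callable returning the (simulated) switch state
        self.target_in_place = target_in_place

    def fire_high_power(self, software_says_xray_mode):
        if not software_says_xray_mode:
            return "refused: software mode mismatch"
        # The hardware interlock is checked regardless of software state.
        if not self.target_in_place():
            return "refused: hardware interlock open"
        return "fired"

beam = Beam(target_in_place=lambda: False)  # target NOT in place
print(beam.fire_high_power(software_says_xray_mode=True))
# refused: hardware interlock open -- even though the software said go
```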
It's almost never just software. It's almost never just one cause.
Just to point it out even clearer - there's almost never a root cause.
Both - and really MCAS was fine, but the issue was the sensing (a single angle-of-attack sensor) and the handling of conflicting data. That part of the puzzle was definitely a bug in the logic/software.
Remember the Airbus that crashed in the middle of the Atlantic because one of the pilots kept pulling back on his sidestick, and the computer averaged his input with the normal input from the other pilot?
Conflict resolution in redundant systems seems to be one of the weakest spots in modern aircraft software.
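A minimal sketch of the voting point (the numbers are invented): with three redundant channels, taking the middle value tolerates one wildly failed channel, while a plain average gets dragged off by it:

```python
# Why mid-value selection degrades more gracefully than averaging
# when one redundant channel fails.

def average(readings):
    return sum(readings) / len(readings)

def mid_value(readings):
    # With 3 channels, the middle of the sorted values outvotes one outlier.
    return sorted(readings)[len(readings) // 2]

# True value ~5.0 degrees; the third channel has failed high.
readings = [5.1, 4.9, 74.0]

print(average(readings))    # 28.0 -- badly skewed by the failed channel
print(mid_value(readings))  # 5.1  -- the outlier is outvoted
```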
The 737 Max MCAS is arguably a bug. That killed 346 people.
Not a "bug" per se, but texting while driving kills ~400 people per year in the US. It's a bug at some level of granularity.
To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.
We're more likely to get a similar incident like this very quickly if we continue with the cult of 'vibe-coding' and throw basic software engineering principles out of the window, as I said before. [0]
Take this post-mortem [1] as a great warning; it also highlights exactly what can go horribly wrong when the LLM misreads comments.
What's even scarier is that each time I stumble across a freshly minted project on GitHub with a considerable amount of attention, not only is it 99% vibe-coded (very easy to detect), but it completely lacks any tests.
It makes me question whether the person prompting the code in the first place even understands how to write robust, battle-tested software.
[0] https://news.ycombinator.com/item?id=44764689
[1] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
Wondering if that "one developer" is here on HN.