Famous Bugs courtesy of Dave Curry (ecn.davy@purdue).

Originally From John Shore (shore@nrl-css)

Some time ago, I sent out a request for documented reports on
"famous bugs", promising to summarize the results for all who contributed.
This is the promised summary.

I judge the effort as a failure. People encounter bugs and fix bugs
and talk about bugs, but they rarely document bugs. Those responding
to my request were well meaning and helpful, but I was left feeling
a little like my hair was being done by a gossip-laden ex-hacker.

I am, of course, no different. I didn't have documentation for
most of the famous bugs I knew about, and I don't have sufficient time,
energy, and interest to follow all the leads. I should have known better.

One strong conclusion that I've reached is that many
computer system horror stories become known as bugs when in fact
they are not -- they're system problems that result from hardware failures,
operator mistakes, and the like. Let me mention a few examples. In his
book Software Reliability, Glenford Myers mentioned a number of classical
software errors. For example, he mentioned that a software error in the
onboard computer of the Apollo 8 spacecraft erased part of the
computer's memory. I happen to have a copy of a memo by Margaret
Hamilton that summarized the conclusions of a detailed study of
every Apollo bug (see below), so I looked under Apollo 8. The best
I could find was this: "16. Illegal P01. P01 was selected during
midcourse, illegally by the astronaut. This action destroyed W-matrix
(several erasables in AGC)." The software did erase the memory, but it
did so because the astronaut did something illegal, and not because the
programmer goofed. This example is characteristic of the Apollo errors
(see below), most of which show the need for better exception handling as
part of the software specification. But weak specifications are not
the same thing as bugs.

Here's another example, also from Apollo. It starts with a
note to me from Kaehler.pa at Xerox (via Butler Lampson):

I heard about the MIT summer student at NASA whose Apollo program filled
up memory (with logging information) and flashed the red abort light a
few seconds before the first moon landing. The student was in the
control room, and after a little thought, said, "Go ahead, I know what
it is, I will take responsibility". Luckily, he was right. He was
awarded all sorts of medals for being willing to take the
responsibility.

You should get this story from the horse's mouth before distributing
it. I heard it from Neil Jacobstein (212-454-0212, in New York). He
heard it from his advisor at the Johnson Space Center in Houston a few
years ago. It might be interesting to trace this back to someone who
really knows about it.

I called Jacobstein, and after some discussion he decided that the "bug"
was probably the famous "1201 descent alarm" that I mention below.
Again, this was caused by an astronaut error (a radar was turned on
when it should have been off).

Lots of people mentioned to me various NORAD "bugs" that
caused alerts. I got a copy of a Senate Armed Services Committee Report of
9 October, 1980, "Recent False Alerts From the Nation's Missile
Attack Warning System." It deals primarily with the June 1980 alerts,
but it contains the following summary:

Oct. 3, 1979 -- An SLBM radar (Mt. Hebro) picked up a low orbit rocket
body that was close to decay and generated a false launch and impact
report.

November 9, 1979 -- False indications of a mass raid caused by inadvertent
introduction of simulated data into the NORAD Computer System.

March 15, 1980 -- Four SS-N-6 SLBMs were launched from the Kuril
Islands as part of Soviet troop training. One of the launches generated
an unusual threat fan.

June 3, 1980 -- False indications caused by a bad chip in a communications
processor computer.

According to Borning@Washington (who by the way is studying computer
problems in missile warning systems), the cause of the Nov. 1979 problem
was as follows:

To test the warning system, false attack data was intermixed with data
from actual satellite observations, put on tape, and run through the
system. On November 9, a test tape of this sort was accidentally
left mounted on a secondary backup computer. This machine was left
connected to the system in use. When the primary computer failed, a
backup computer was activated, which also failed. Then the secondary
computer came into play, causing the alert.

All of these missile alerts were caused by real flying objects, hardware
failures, or human error. I'm not saying that bugs didn't cause any
missile alerts, just that the ones that are reputed to have been
caused by bugs in fact were not.

Perhaps computer software -- as opposed to the combination of
hardware, software, and people -- is more reliable than folklore
has it. I would be interested in hearing your comments on this
proposition.

Despite the foregoing problems, the assembly of responses makes
interesting reading. In the following, I'll mention a few
well-documented bugs and then append various extracts
from what I received. Thanks to all who responded. In most cases, I
eliminated duplicates. Special thanks to Peter Neumann (SRI),
who seems to be keeping better track of these problems than anyone else.
Many of these don't qualify as bugs by anyone's definition, but
they're interesting stories so I've included some of them anyway.

-------------

Space-Shuttle Bug

This may be the most famous of all -- the one that delayed at the last
minute the launch of the first space shuttle. The cause was
a bug that interfered with the communication between concurrent
processes -- an area of programming that is among the least well in hand.
John R. Garman wrote about the problem in detail in ACM SIGSOFT Software
Engineering Notes (SEN), vol. 6, No. 5, pp 3-10.

--------------

The First Bug

Worth mentioning for the trivia folks, it was a moth that
was beaten to death by a relay in the Mark II. It was discovered by the
Mark II staff, which included Grace Hopper. (Reports on this
are a bit confusing. Many attribute the bug to her; in her own
published account she refers to "we". I called and asked her. She
said the machine operator actually pulled the moth out, and that it
was found by the combined efforts of the staff.) Lots of people mentioned
this bug to me; my favorite report attributed the bug to

"some little old lady who works for the Navy. She's the
one who came up with Cobol, I think."

In fact, this is one of the better-documented bugs. You can even
see its picture in the Annals of the History of Computing (vol. 3,
July 1981, page 285). It's ironic that "modern" bugs have
practically nothing in common with the first one (an exception being
Dijkstra's well-known remark about testing).

--------------

ARPANET Gridlock

In October 1980 the net was unusable for a period of several
hours. It turned out that the routing processes in all the IMPs were
consuming practically all resources as the result of processing three
inconsistent routing updates. The inconsistency
arose from dropped bits in a single IMP. Whether you choose
to call this a bug or not, clearly it demonstrated a design failure. The
details are reported well by Eric Rosen in SEN, January 1981.

---------------------

APOLLO flight experiences

When Margaret Hamilton was working at the Charles Stark Draper
Laboratory in the early 1970s, she documented and analyzed in some
detail the various "software anomalies" that occurred during several
APOLLO flights. Apparently she did this at the request of Marty Shooman.
I don't think that she ever published the results, but
some years back she gave us a copy of "Shuttle Management Note #14"
(23 October 1972), which summarized her analysis. It makes interesting
reading.

One of her strongest conclusions was that 73% of the problems were
caused by "real-time human error". Translated roughly into 1983
computer-speak, this means that the APOLLO software wasn't user friendly.
(I guess they didn't have icons or something.) Apparently, there
was much debate about this during the design, but the software types
were told that astronauts have the right stuff or something so there
was no need to make the software robust.

One example is quite widely known, as it occurred during the APOLLO
11 landing on the moon. In what was referred to as "1201-1202 Descent
Alarms", the software kept restarting as the result of overloading. Turned
out the radar switch was in the wrong position and used up 13% more
computer time than had been anticipated.

Hamilton states that "pure software errors" were not a problem on
APOLLO flights. I guess she means that the software met its specifications,
which is quite an accomplishment. But the specifications apparently did
not say much about error detection and recovery. Hamilton states that
"all potentially catastrophic problems would have been prevented by
a better and/or known philosophy of providing error detection and
recovery via some mechanism."

________________________

Nuclear Reactor Design Program

I don't know when the bug was first introduced, but
it came to light in 1979. From Jim Horning (Horning.pa@parc-maxc):

A belatedly-discovered bug in a stress analysis program (converting a
vector into a magnitude by summing components--rather than summing
absolute values; module written by a summer student) caused a number of
nuclear reactors to be closed down for checks and reinforcement about
three years ago (not long after TMI). This was fairly widely discussed
in the press at the time, but I never did see how the lawsuits came out
(if, indeed, they have been completed).

From br10@cmu-10a came the following newswire stories:

a023 0026 16 Mar 79
PM-Topic, Advisory,
Managing Editors:
Wire Editors:
It all started in a tiny part of a computer program used by an
engineering firm designing nuclear reactors. It ended with the
shutdown of five nuclear power plants at a time when President Carter
is pushing oil conservation and the world oil market is in turmoil.
The computer miscalculated some safety precautions required by law.
The power from the closed plants now will have to be replaced by
electricity generated with oil or coal. This may cost utility customers
money and throw a curve at Carter's conservation program.
In Today's Topic: The Little Computer and the Big Problem, AP writer
Evans Witt traces this glitch in the system, from the obscure
computer to its possible effect on the nation's energy problems.
The story, illustrated by Laserphoto NY7, is upcoming next.
The AP

ap-ny-03-16 0328EST
***************

a024 0044 16 Mar 79
PM-Topic-Glitch, Bjt,950
TODAY'S TOPIC: The Little Computer and the Big Problem
Laserphoto NY7
By EVANS WITT
Associated Press Writer
WASHINGTON (AP) - Something just didn't add up.
And the result is: five nuclear power plants are shut down;
millions of Americans may pay higher utility bills; and a sizable blow
may have been struck to President Carter's efforts to reduce the use of
imported oil and to control inflation.
The immediate source of all this is part of the federal bureaucracy
- the Nuclear Regulatory Commission, which ordered the shutdowns.
But in one sense, the ultimate culprit was ''Shock II,'' a tiny
part of a computer program used by a private firm to design the power
plants' reactors.
Shock II was wrong and that means parts of the five reactors might
not survive a massive earthquake. Shock II was the weak link that
could have allowed the chain to snap.
In between Shock II and the shutdowns were a public utility, a
private engineering firm and the NRC staff. It was really the
judgments of the dozens of scientists and engineers, not elected or
appointed officials, that led to the shutdowns.
Perhaps as a result, the decision's impact on the nation's energy
situation was not even considered until the very last moment - when
the commission itself was faced with the final decision.
And at that point, the NRC said, it had no choice. It said the law
was clear: serious questions about the reactors had been raised and
the reactors had to be turned off until answers were found.
The specific questions are arcane engineering issues, but the
explanation is straightforward: Will some of the systems designed to
protect the reactor survive an earthquake - or will they fail, and
possibly allow radioactive death to spew into the air?
The regulations say the reactors must be able to withstand a quake
equal to the strongest ever recorded in their area. The regulations
don't allow any consideration of the likelihood of a major quake. All
four states where the reactors are located - New York, Pennsylvania,
Maine and Virginia - have had minor quakes in this decade and
damaging quakes at least once in this century.
The only way to test them - short of having a massive earthquake -
is to test a model of the reactor. The ''model'' is actually a set of
mathematical formulas in a computer that reflect how the reactor and
its parts will behave in a quake.
The model used for the five reactors came from Stone and Webster,
the large Boston engineering and architectural firm that designed the
plants. The Stone and Webster model indicated how strong and well
supported pipes had to be and how strong valves had to be.
The problem apparently cropped up after Stone and Webster suggested
within the last few months more pipe supports in the secondary
cooling system of the reactor at Shippingport, Pa., operated by
Duquesne Light Co. in Pittsburgh.
But why were the supports needed? ''This was not clear to us,
looking at the calculations done by the models,'' said Gilbert W.
Moore, Duquesne's general superintendent of power stations.
So Duquesne - and Stone and Webster - sent the computer models
through their paces again, having them calculate and recalculate what
would happen to the pipes in an earthquake.
''We came out with some numbers which were not in the range we
would like,'' Moore said.
That made the problem clear - the model now said the pipes might
break in an earthquake. The previous analysis indicated an adequate
safety margin in the pipes, and Stone and Webster's explanation was:
''One subroutine may not give uniformly conservative results.''
The problem was in a ''subroutine,'' a small part of the computer
model, called ''Shock II,'' said Victor Stello, director of NRC's
division of reactor operations.
''The facts were that the computer code they were using was in
error,'' said Stello. ''Some of the computer runs were showing things
are okay. In some cases, the piping systems were not okay.
''We didn't know the magnitude of the error or how many plants
might be affected,'' he said.
It was on March 1 that Duquesne told the NRC of the problem by
telephone and asked for a meeting to discuss it. The same day, Energy
Secretary James R. Schlesinger was telling Congress that unleaded gas
might cost $1 a gallon within a year and service stations might be
ordered shut down on Sundays because of oil shortages.
The meeting took place on Thursday, March 8, in Washington with NRC
staff, Stone and Webster engineers and Duquesne Light people on hand.
Through the weekend, Stello said, engineers from NRC, Duquesne and
Stone and Webster worked at the private firm's Boston office,
analyzing the severity of the problem.
''By the middle of Sunday (March 10) we began to get a pretty good
idea of what it meant for the systems,'' Stello said. ''Monday, we got
the latest information from our people at the Stone and Webster
offices. It became clear that there would be a number of the safety
systems that would have stresses in excess of allowable limits. The
magnitude of the excess was considerable.''
Tuesday, members of the NRC were briefed by their staff of
engineers and scientists. They asked for an analysis of the economic
impact of the decision, and then ordered the plants closed within 48
hours.
And the five reactors shut down: Duquesne Light Co.'s Beaver Valley
plant at Shippingport, Pa.; Maine Yankee in Wiscasset, Maine; the
Power Authority of New York's James Fitzpatrick plant at Scriba, N.Y.;
and two Virginia Electric and Power Co. reactors at Surry, Va.
It may take months to finish the analysis of the potential problems
and even longer to make changes to take care of the situation.
Until the reactors start generating again, the utilities will have
to turn to plants using oil or coal. This may cost more, and that cost
may be borne by the millions of utility customers.
To replace the power from these nuclear plants could require
100,000 barrels of oil a day or more. And this at a time when
President Carter has promised to cut U.S. oil consumption by 5 percent
- about 1 million barrels a day - and when the world's oil markets are
in turmoil because of recent upheavals in Iran.

-------------------------------

Summary of various problems from NEUMANN@SRI-AI

Review of Computer Problems -- Catastrophes and Otherwise

As a warmup for an appearance on a SOFTFAIR panel on computers and
human safety (28 July 1983, Crystal City, VA), and for a new editorial
on the need for high-quality systems, I decided to look back over
previous issues of the ACM SIGSOFT SOFTWARE ENGINEERING NOTES [SEN]
and itemize some of the most interesting computer problems recorded.
The list of what I found, plus a few others off the top of my head,
may be of interest to many of you. Except for the Garman and Rosen
articles, most of the references to SEN [given in the form (SEN Vol
No)] are to my editorials.

SYSTEM --
SF Bay Area Rapid Transit (BART) disaster [Oct 72]
Three Mile Island (SEN 4 2)
SAC: 50 false alerts in 1979 (SEN 5 3);
  simulated attack triggered a live scramble [9 Nov 79] (SEN 5 3);
  WWMCCS false alarms triggered scrambles [3-6 Jun 80] (SEN 5 3)
Microwave therapy killed arthritic patient by racing pacemaker (SEN 5 1)
Credit/debit card copying despite encryption (Metro, BART, etc.)
Remote (portable) phones (lots of free calls)

SOFTWARE --
First Space Shuttle launch: backup computer synchronization (SEN 6 5 [Garman])
Second Space Shuttle operational simulation: tight loop on cancellation
  of early abort required manual intervention (SEN 7 1)
F16 simulation: plane flipped over crossing equator (SEN 5 2)
Mariner 18: abort due to missing NOT (SEN 5 2)
F18: crash due to missing exception condition (SEN 6 2)
El Dorado: brake computer bug causing recall (SEN 4 4)
Nuclear reactor design: bug in Shock II model/program (SEN 4 2)
Various system intrusions ...

HARDWARE/SOFTWARE --
ARPAnet: collapse [27 Oct 1980] (SEN 6 5 [Rosen], 6 1)
FAA Air Traffic Control: many outages (e.g., SEN 5 3)
SF Muni Metro: Ghost Train (SEN 8 3)

COMPUTER AS CATALYST --
Air New Zealand: crash; pilots not told of new course data (SEN 6 3 & 6 5)
Human frailties:
  Embezzlements, e.g., Muhammed Ali swindle [$23.2 Million],
  Security Pacific [$10.2 Million],
  City National, Beverly Hills CA [$1.1 Million, 23 Mar 1979]
  Wizards altering software or
  critical data (various cases)

SEE ALSO A COLLECTION OF COMPUTER ANECDOTES SUBMITTED FOR the 7th SOSP
(SEN 5 1 and SEN 7 1) for some of your favorite operating system
and other problems...

[Muni Metro Ghosts]

The San Francisco Muni Metro under Market Street has been plagued with
problems since its inauguration. From a software engineering point of
view, the most interesting is the Ghost Train problem, in which the
signalling system insisted that there was a train outside the
Embarcadero Station that was blocking a switch. Although in reality
there was obviously no such train, operations had to be carried on
manually, resulting in increasing delays; finally passengers were
advised to stay above ground. This situation lasted for almost two
hours during morning rush hour on 23 May 1983, at which point the
nonexistent train vanished as mysteriously as it had appeared in the
first place. (The usual collection of mechanical problems also has
arisen, including brakes locking, sundry coupling problems, and
sticky switches. There is also one particular switch that chronically
causes troubles, and it unfortunately is a weakest-link single point
of failure that prevents crossover at the end of the line.)

---------------------

Problems mentioned in the book Software Reliability, Glenford Myers

Myers mentions a variety of problems. One famous one (lots of
people seem to have heard about it) is the behavior of an early
version of the ballistic missile early warning system in identifying
the rising moon as an incoming missile. Myers points out that, by
many definitions, this isn't a software error -- a problem I discussed
a bit at the beginning of this message.

Other problems mentioned by Myers include various Apollo errors I've
already mentioned, a 1963 NORAD exercise that was incapacitated because
"a software error caused the incorrect routing of radar information",
the loss of the first American Venus probe (mentioned below in more
detail). Mentioned with citations were an Air Force command system that
was averaging one software failure per day after 12 years in operation,
deaths due to errors in medical software, and a crash-causing error
in an aircraft design program. I was not able to follow up on any of
the citations.

__________________________________
__________________________________

The rest of this message contains excerpts from things I received from
all over, in most cases presented without comments.

__________________________________
__________________________________
faulk@nrl-css (who got it elsewhere)

Today, I heard a good story that is a perfect example of the problems
that can arise when the assumptions that one module makes about another
are not properly documented. (The story is from a system engineer here
whose father got it from the Smithsonian Space people.)

Apparently, the Jupiter(?) probe to Mars could have programs beamed to
it which it would load in internal memory. The system engineers used this
property to make mission changes and/or corrections. After the probe had
been on Mars for a while, memory started getting tight. One of the
engineers had the realization that they no longer needed the module that
controlled the landing, so the space could be used for something else. The
probe was sent a new program that overwrote the landing module. As soon as
this was accomplished, all contact with the probe was lost.

Looking back into the code to find what had gone wrong, the programmers
discovered that because the landing module had to have information on
celestial navigation, some or all of the celestial navigation functions
were included in the landing module. Unfortunately, the antenna pointing
module also required celestial navigation information to keep the antenna
pointed at Earth. To do this, it used the navigation functions in the
landing module. Overlaying the module left the antenna pointing in some
unknown direction, and all contact with the craft was lost forever.

Fortunately, all of the mission requirements had been fulfilled so it
was no great loss. It can live on as a great example of bad design.

-------------------------

mo@LBL-CSAM

The folklore tells of a bug discovered during the fateful
flight of Apollo 13. It seems that the orbital mechanics
trajectory calculation program had a path which had never been
exercised because of the smooth, gentle orbital changes
characteristic of a nominal Apollo flight. However, when
the flight dynamics team was trying ways to get them home with
the aid of much more creative maneuvers, the program promptly
crashed with a dump (running on IBM equipment, I believe).
The story goes that the fix was simple - something on the
order of a missing decimal, or a zero-oh reversal,
(Divide by zero!!!!!)
but there was much consternation and tearing of hair when this
critical program bought the farm in the heat of the moment.

This was related to me by an ex-NASA employee, but I have heard
it through other paths too. I guess the NASA flight investigation
summary would be one place to try and verify the details.

----------------------------

jr@bbncd

One cute one was when the Multics swapper-out process swapped out the
swapper-in process. (Recall that all of the Multics OS was swappable.)

------------------------------------

dan@BBN-UNIX

Here in Massachusetts we've recently begun testing cars for emissions. All car
inspections are carried out at gas stations, which in order to participate in
the program had to buy a spiffy new emissions analyzer which not only told you
what your emissions were, but passed judgement on you as well, and kept a
record on mag tape which was sent to the Registry of Motor Vehicles so that
they could monitor compliance.

Well, on June 1 the owners of the cheaper ($8K) of the two acceptable analyzers
discovered that their machines could not be used; they didn't like the month of
June! The company which built them, Hamilton, had to apply a quick fix which
told the machines that it was actually December (!?). Lots of people were
inconvenienced.

Unfortunately all I know about this at the moment is what the Boston Globe had
to say, so I don't know what the actual problem was. The article said that the
quick cure involved replacing the "June" chip with the "December" chip; I don't
know what that means, if anything. Electronic News or Computerworld ought to
have more accurate information.

Don't forget about the rocket launch early in the space program which had to be
aborted because the Fortran program controlling it believed that the number of
seconds in a day was 86400 (rather than the sidereal time figure).

The recent issue of Science News with the cover story on when computers make
mistakes mentioned a story about a graduate student who almost didn't get his
thesis due to inaccuracies in the university computer's floating point
software. Not really earthshaking, except to him, I suppose.

-----------------------------

STERNLIGHT@USC-ECL

I don't have the data, but there were at least two "almost launches"
of missiles. The rising moon was only one. You
might try contacting Gus Weiss at the National Security Council--
he will be able to tell you quite a bit. Mention my name if you
like.

[I called Weiss, who didn't have much to say. He kept repeating
that the problems were fixed--js]

-------

mark@umcp-cs

The following is probably not famous except with me, but deserves to be.

True story:
Once upon a time I managed to write a check so that my
bank balance went exactly to zero. This was not so unusual an
occurrence, as my checking account had an automatic loan feature
in case of overdraft, and I used this feature occasionally.
Negative and positive balances were therefore well known in this
account. Not so zero balances.
Soon after writing this check I attempted to withdraw
some funds using my money machine card. Unsuccessful. I
attempted to DEPOSIT money via the machine. Unsuccessful.
I talked to a person: they had no record of my account ever
having existed.
After several trips to the bank, each time up one more
level in the management hierarchy, the bank managers and I
discovered the following: The bank's computer had been programmed
so that the way to delete an account was to set the balance to
zero. When I wrote my fatal zeroing check the computer promptly
forgot all about me. Only my passion for paper records, and the
bank's paper redundancy, enabled the true story to emerge and
my account to be restored.
Interestingly, no funds were ever in danger, since the
account was closed with NO MONEY in it. Nonetheless, the
inconvenience was considerable. Once the situation became
straightened out I immediately transferred my account to another
bank, writing a letter to the first bank explaining my reasons
for doing so.

--------------------------------
craig@umcp-cs

The most famous bug I've ever heard of was in the program
which calculated the orbit for an early Mariner flight
to Venus. Someone changed a + to a - in a Fortran
program, and the spacecraft went so wildly off course
that it had to be destroyed.

--------------------------------

fred@umcp-cs

Some examples of bugs I've heard about but for which I
don't have documentation: (a) bug forced a Mercury astronaut
to fly a manual re-entry; . . .

There was something about this on the Unix-Wizards mailing list a
while back. The way I understand it, a programmer forgot that the
duration of the Mercury Capsule's orbit had been calculated in
sidereal time, and left out the appropriate conversion to take into
account the rotation of the Earth beneath the capsule. By the end
of the mission the Earth had moved several hundred miles from where
it ``should'' have been according to the program in question. Sorry
I can't give you any definite references to this.

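For what it's worth, the arithmetic behind that story is easy to check. A
minimal sketch in FORTRAN (hedged: the sidereal figure and the 0.29 mi/s
equatorial surface speed are my round numbers, not the original poster's):

C     A SIDEREAL DAY (ONE ROTATION RELATIVE TO THE STARS) IS ABOUT
C     86164 SECONDS, VS. THE 86400-SECOND MEAN SOLAR DAY.  TIMING
C     ORBITS WITH THE WRONG FIGURE SLIPS ROUGHLY 236 SECONDS OF
C     EARTH ROTATION PER DAY; AT ABOUT 0.29 MILES/SECOND OF
C     EQUATORIAL SURFACE SPEED, THAT IS ROUGHLY 68 MILES OF
C     GROUND-TRACK ERROR PER DAY, COMPOUNDING OVER A MISSION.
      PROGRAM SLIP
      REAL SOLAR, SIDRL, SPEED
      SOLAR = 86400.0
      SIDRL = 86164.1
      SPEED = 0.2887
      WRITE (6, *) 'MILES PER DAY: ', (SOLAR - SIDRL) * SPEED
      END
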
---------------------
KROVETZ@NLM-MCS

I've heard of two bugs that I think are relatively famous:

1. A bug in a FORTRAN program that controlled one of the inner planet fly-bys
(I think it was a fly-by of Mercury). The bug was caused because the
programmer inadvertently said DO 10 I=1.5 instead of DO 10 I=1,5. FORTRAN
interprets the former as "assign a value of 1.5 to the variable DO10I". I
heard that as a result the spacecraft went off course and never did the fly-by!
A good case for using variable declarations.

2. I'm not sure where this error cropped up, but in one of the earlier versions
of FORTRAN a programmer passed a number as an actual argument (e.g.
CALL MYPROC(2)) and within the procedure changed the formal argument.
Since FORTRAN passes arguments by reference this had the result of changing
the constant "2" to something else! Later versions of FORTRAN included a
check for changing an argument when the actual is an expression.

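As a hedged illustration of item 2 (MYPROC and the literal 2 come from the
example above, but the bodies are my invention; on the early compilers
described, behavior like this was possible, while modern Fortrans reject or
guard against modifying a constant argument):

C     UNDER CALL-BY-REFERENCE, MYPROC RECEIVES THE ADDRESS OF THE
C     LITERAL 2.  ON THE EARLY COMPILERS DESCRIBED ABOVE, STORING
C     THROUGH THAT ADDRESS CORRUPTED THE POOLED CONSTANT, SO LATER
C     USES OF "2" COULD QUIETLY BEHAVE AS 7.
      PROGRAM DEMO
      CALL MYPROC(2)
C     ON AN AFFECTED COMPILER THIS COULD PRINT 7, NOT 2.
      WRITE (6, *) 2
      END
      SUBROUTINE MYPROC(N)
      INTEGER N
      N = 7
      RETURN
      END
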
-------------------------------------
uvicctr!dparnas
From jhart Thu Jun 9 13:30:48 1983
To: parnas

San Francisco Bay Area Rapid Transit, reported in Spectrum about two
years ago. "Ghost trains", trains switched to nonexistent lines,
and best of all the rainy day syndrome.

-------------------------------------

uvicctr!uw-beaver!allegra!watmath!watarts!geo

One of my pals told me this story. One morning, when they booted,
years ago, the operators on the Math faculty's time-sharing system
set the date at December 7th, 1941 (i.e. Pearl Harbor). Well the
spouse of the director of the MFCF (i.e. Math Faculty Computing Facility)
signed on, was annoyed by this, and changed the date to the actual
date. Everyone who was signed on while this was done was charged
for thirty-something years of connect time.
I wouldn't know how to document this story.

Oh yes, didn't Donn Parker, self-proclaimed computer sleuth, call
the fuss made over UNIX and intelligent terminals some outrageous
phrase, like 'The bug of the century'? I am referring to the fuss
made over the fact that some of the terminals that Berkeley had bought
were sufficiently intelligent that they would do things on the command
of the central system. The danger was that if someone was signed
on to one of these terminals as root, an interloper could write
something to this terminal causing the terminal to silently transmit
a string back to UNIX. Potentially, this string could contain a
command line giving the interloper permissions to which they were
not entitled.

Cordially, Geo Swan, Integrated Studies, University of Waterloo
allegra!watmath!watarts!geo

----------------------------
smith@umcp-cs

John, another bug for your file. Unfortunately it is a rumor that
I haven't tried to verify. Recall that FORTRAN was developed on the
IBM 704. One of the 704's unusual features was that core storage used
signed magnitude, the arithmetic unit used 2's complement, and the
index registers used 1's complement. When FORTRAN was implemented
on the IBM product that replaced the 704, 7094 etc. series, the
3-way branching IF went to the wrong place when testing negative zero.
(It branched negative, as opposed to branching to zero.) I heard this
rumor from Pat Eberlein (eberlein@buffalo). Supposedly, the bug wasn't
fixed (or discovered) for two years.

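The three-way branch in question is FORTRAN's arithmetic IF. A hedged
reconstruction of the failing case (my code, not anything from the 7094
library):

C     IF (X) N1, N2, N3 BRANCHES TO N1 IF X IS NEGATIVE, N2 IF
C     ZERO, N3 IF POSITIVE.  ON A SIGN-MAGNITUDE MACHINE A WORD
C     CAN HOLD "MINUS ZERO"; A CORRECT IMPLEMENTATION MUST STILL
C     TAKE THE ZERO BRANCH FOR IT, BUT THE RUMORED BUG TOOK THE
C     NEGATIVE BRANCH INSTEAD.
      PROGRAM MZERO
      REAL X
      X = -0.0
      IF (X) 10, 20, 30
   10 WRITE (6, *) 'NEGATIVE'
      GO TO 40
   20 WRITE (6, *) 'ZERO'
      GO TO 40
   30 WRITE (6, *) 'POSITIVE'
   40 CONTINUE
      END
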
----------------------------
VES@KESTREL

1. In the mid 70's in a lumber processing plant in Oregon a program was
controlling the cutting of logs into boards and beams. The program included
an algorithm for deciding the most efficient way to cut the log (in terms of
utilizing most of the wood), but also controlled the speed with which the log
was advancing.

On one occasion the speed of a log increased to dangerous levels. All personnel
were scattered and chased out of the building; the log jumped off the track;
fortunately there were no casualties. This was caused by a software bug.
A reference to the event would be the former director of the Computer Center
at Oregon State University (prior to 1976), who at the time I heard the story
(Spring 1977) was President of the company which developed the software.

2. Another rather amusing incident dates back to the mid 60's. It
was not caused by a software bug but is indicative of the vulnerability of
software systems, particularly in those early days. It involved the Denver
office of Arizona airlines. Their reservation system was periodically getting
garbage input. Software experts were dispatched but failed to identify the
cause. Finally the software manager of the company which developed the system
went to study the problem on the spot.

After spending a week at the site he managed to identify a pattern
in the generation of garbage input: it was happening only during the shifts
of a particular operator and only when coffee was served to her. Shortly
afterwards the cause was pinpointed. The operator was a voluminous lady
with a large belly. The coffee pot was placed behind the terminal and
when she would reach for it her belly would rest on the keyboard.
Unfortunately, I don't have more exact references to that event.

----------------------------

RWK at SCRC-TENEX

There's the famous phase-of-the-moon bug which struck (I believe)
Gerry Sussman and Guy Steele, then both of MIT. It turned out to
be due to code which wrote a comment into a file of LISP forms
that included the phase of the moon as part of the text. At certain
times of the month, it would fail, due to the comment line being
longer than the "page width"; they had failed to turn off automatic
newlines that were being generated by Maclisp when the page width
was exceeded. Thus the last part of the line would be broken onto
a new line, not preceded with a ";" (the comment character). When
reading the file back in, an error would result.

--------------------------------
gwyn@brl-vld

An early U.S. Venus probe (Mariner?) missed its target immensely
due to a Fortran coding error of the following type:

DO 10 I=1.100

which should have been

DO 10 I=1,100

The first form is completely legal; it default-allocates a REAL
variable DO10I and assigns 1.1 to it!

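Since FORTRAN ignores blanks, the comma really is the only thing the
compiler can see. A hedged restatement (my example program, runnable as
written):

C     WITH BLANKS REMOVED, "DO 10 I=1.100" TOKENIZES AS THE
C     ASSIGNMENT BELOW: AN IMPLICITLY DECLARED REAL VARIABLE
C     NAMED DO10I RECEIVES 1.1.  ONLY THE COMMA FORM,
C     "DO 10 I=1,100", IS A LOOP HEADER.
      PROGRAM BLANKS
      DO10I = 1.100
      WRITE (6, *) DO10I
      END
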
-------------------------------------
Horning.pa@PARC-MAXC

I have heard from various sources (but never seen in print) the story
that the problem with the wings of the Lockheed Electras (that caused
several fatal crashes) slipped past the stress analysis program because
of an undetected overflow. This one would probably be next to impossible
to document.

One of my favorite bugs isn't all that famous, but is instructive. In
about 1961, one of my classmates (Milton Barber) discovered that the
standard integer binary-to-decimal routine provided by Bendix for the
G-15D computer wasn't always exact, due to accumulated error from short
multiplications. This only affected about one number in 26,000, but
integer output OUGHT to be exact. The trick was to fix the problem
without using any additional drum locations or drum revolutions. This
occupied him for some time, but he finally accomplished it. His new
routine was strictly smaller and faster. But was it accurate? Milton
convinced himself by numerical analysis that it would provide the
correct answer for any number of up to seven digits (one word of BCD).
Just to be safe, he decided to test it exhaustively. So he wrote a loop
that counted in binary and in BCD, converted the binary to BCD, and
compared the results. On the G-15D this ran at something like 10 numbers
per second. For several weeks, Milton took any otherwise idle time on
the college machine, until his loop had gone from 0 to 10**7-1 without
failure. Then he proudly submitted his routine to Bendix, which duly
distributed it to all users. Soon thereafter, he got a telephone call:
"Are you aware that your binary to decimal conversion routine drops the
sign on negative numbers?" This is the most exhaustive program test that
I've ever seen, yet the program failed on half its range!

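The flaw in that test deserves restating, since it is so easy to repeat. A
hedged sketch (ICONV and IREF are my stand-in names for the new routine and
a trusted reference conversion; they are not from Horning's account):

C     AN "EXHAUSTIVE" TEST THAT NEVER LEAVES THE NON-NEGATIVE
C     RANGE: EVERY INPUT FROM 0 TO 10**7-1 IS CHECKED, BUT NO
C     NEGATIVE NUMBER EVER IS, EXACTLY WHERE THE ROUTINE FAILED.
      PROGRAM EXTEST
      INTEGER I, ICONV, IREF
      DO 10 I = 0, 9999999
         IF (ICONV(I) .NE. IREF(I)) WRITE (6, *) 'MISMATCH AT ', I
   10 CONTINUE
      END
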
-----------
Horning.pa

[Excerpts from a trip report by Dr. T. Anderson of the University of
Newcastle upon Tyne.]

The purpose of my trip was to attend a subworking group meeting on the
production of reliable software, sponsored by NASA, chaired by John Knight
(University of Virginia), and organized and hosted by the Research Triangle
Institute [Nov. 3-4, 1981]. Essentially, NASA would like to know how on earth
software can be produced which will conform to the FAA reliability standards
of 10^-9 failures/hour. Sadly, no one knew.

FRANK DONAGHE (IBM FEDERAL SYSTEMS DIVISION): PRODUCING
RELIABLE SOFTWARE FOR THE SPACE SHUTTLE

Software for the Space Shuttle consists of about 1/2 million lines of code,
produced by a team which at its largest had about 400 members. Costs were
high at about $400 per line. . . . Between the first and second flight 80%
of the modules were changed, such that about 20% of the code was replaced.
Three weeks prior to the second flight a bug in the flight software was
detected which tied up the four primary computers in a tight (two
instruction) loop. . . .

-----------

Laws@SRI-AI

Don't forget the bug that sank the Sheffield in the Argentine
war. The shipboard computer had been programmed to ignore
Exocet missiles as "friendly." I might be able to dig up
an IEEE Spectrum reference, but it is not a particularly
good source to cite. The bug has been widely reported in
the news media, and I assume that Time and Newsweek must
have mentioned it.

I'll try to find the reference, but I must issue a disclaimer:
some of my SRI associates who monitor such things more closely
than I (but still without inside information) are very suspicious
of the "bug" explanation. "Computer error" is a very easy way
to take the heat off, and to cover what may have been a tactical
error (e.g., turning off the radar to permit some other communication
device to function) or a more serious flaw in the ship's defensive
capability.

-------
PARK@SRI-AI

From International Defense Review and New Scientist after the
Falklands war ...

The radar system on the Sheffield that didn't report the incoming Exocet
missile because it wasn't on the list of missiles that it expected a Russian
ship to use.

The missiles fired over the heads of British troops on the Falklands beaches
at the Argentinians, which could have gone off if they had detected enough
metal below them (probably not really software).

The missile that was guided by a person watching a TV picture of the missile
from a shipboard camera. A flare on the tail of the missile intended to
make the missile more visible to the camera tended to obscure the target.
A hasty software mod made the missile fly 20 feet higher (lower?) so that
the operator could see the target.

A more general consideration is that in the event of an electromagnetic
pulse's deactivating large numbers of electronic systems, one would prefer
that systems like missiles in the air fail safe.

--------------------------------
Laws@SRI-AI

I have scanned Spectrum's letters column since the original
Exocet mention in Oct. '82, but have not found any mention
of the bug. Perhaps I read the item on the AP newswire or saw
a newspaper column posted on a bulletin board here at SRI.
Possibly Garvey@SRI-AI could give you a reference. Sorry.

--------------------
Garvey@SRI-AI
Subject: Re: Falkland Islands Bug

The original source (as far as I know) was an article in
New Scientist (a British magazine) on 10 Feb 83. It suggested that
the Exocet was detected by the Sheffield's ESM gear, but catalogued as
a friendly (!) missile, so no action was taken. I have two strictly
personal (i.e., totally unsubstantiated by any facts or information
whatsoever) thoughts about this:
1) I suspect the article is a bit of disinformation to
cover up other failings in the overall system; from bits and pieces
and rumors, I would give top billing to the possibility of poor
spectrum management as the culprit;
2) I wouldn't care if the missile had the Union Jack on
the nose and were humming Hail Britannia; if it were headed for me, I
would classify it as hostile!

______________

PALMER@SCRC-TENEX

I was working for a major large computer manufacturer {not the one
that employs me today}. One of the projects I handled was an RPG
compiler that was targeted to supporting customers that were used to
IBM System III systems. There had been complaints about speed
problems from the field on RPG programs that used a table lookup
instruction. The computer supporting the compiler had excellent
microcode features: we decided to take advantage of the feature by
microcoding the capability into the basic machine.

The feature was pretty obvious: it would search for things in ordered
or unordered tables and do various things depending on whether the key
was in the table.

We made what we considered to be the "obvious" optimization in the
case of ordered tables - we performed a binary search. Nothing could
be faster given the way the tables were organized. We wrote the code,
tested it on our own test cases and some field examples and got
performance improvements exceeding sales requirements. It was an
important fix...

Unfortunately, it was wrong. It isn't clear what "it" was in this case
- us or IBM. People loaded their tables in the machine with each run
of an RPG program and they often wouldn't bother to keep their ordered
tables ordered. IBM didn't care - it ignored the specification {most
of the time}. Our code would break when people gave us bad data in
ways that IBM's wouldn't. We had to fix ours.

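The precondition they tripped over is easy to state: binary search is only
correct when the table really is in order. A hedged sketch of the search
itself (my code, obviously not the microcode):

C     BINARY SEARCH OVER T(1)..T(N), ASSUMED SORTED ASCENDING;
C     RETURNS THE INDEX OF KEY, OR 0 IF ABSENT.  ON A TABLE THAT
C     IS DECLARED ORDERED BUT IS NOT, THE HALVING STEP CAN THROW
C     AWAY THE HALF THAT ACTUALLY HOLDS THE KEY, WHERE THE LINEAR
C     SCAN CUSTOMERS WERE USED TO WOULD STILL FIND IT.
      INTEGER FUNCTION LOOKUP(T, N, KEY)
      INTEGER N, T(N), KEY, LO, HI, MID
      LO = 1
      HI = N
   10 IF (LO .GT. HI) GO TO 30
      MID = (LO + HI) / 2
      IF (T(MID) .EQ. KEY) GO TO 20
      IF (T(MID) .LT. KEY) LO = MID + 1
      IF (T(MID) .GT. KEY) HI = MID - 1
      GO TO 10
   20 LOOKUP = MID
      RETURN
   30 LOOKUP = 0
      RETURN
      END
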
---------------------------------
Olmstread.PA@PARC-MAXC

I can't supply any documentation, but I was told when I was in school with
Ron Lachman (you might want to check with him at LAI -- laidback!ron, I think)
that IBM had a bug in its program IEFBR14.
This program's sole job was to return (via BR 14, a branch
through register 14, the return register); it was used by JCL (shudder!)
procedures which allocated file space and needed a program, any program, to
run. It was a one-line program with a bug: it failed to clear the return
code register (R 15, I think). I submit you won't find any programs with
a higher bugs-per-instruction percentage.

----------------------
hoey@NRL-AIC

I got this at MIT....

From: ihnp4!zehntel!zinfandel!berry@ucb-vax

In the April 1980 issue of ACM SIGSOFT Software Engineering Notes,
editor Peter G. Neumann (NEUMANN@SRI-KL at that time) relays information that
Earl Boebert got from Mark Groves (OSD R&E) regarding bugs in
the software of the F-16 fighter. Apparently a problem in the navigation
software inverted the aircraft whenever it crossed the equator. Luckily it
was caught early in simulation testing and promptly fixed.
In the July issue, J.N. Frisina at Singer-Kearfott wrote to Mr. Neumann,
"concerned that readers might have mistakenly believed there was a bug in the
flight software, which was of course not the case." [At least they fixed THAT
one. Wasn't it Hoare who said that acceptance testing is just an unsuccessful
attempt to find bugs?] Mr. Frisina wrote:
"In the current search for reliable software, the F16 Navigation
software is an example of the high degree of reliability and quality
that can be obtained with the application of proper design verification
and testing methodologies. All primary mission functions were software
correct."
In the April '81 issue it is revealed that the control-travel
limits imposed by the F18 software are based on assumptions about the
inability of the aircraft to get into certain attitudes. Well, some of these
'forbidden' attitudes are in fact attainable. Apparently so much effort had
gone into design and testing of the software that it is now preferable to
modify the aircraft to fit the software, rather than vice versa!

-------
Cantone@nrl-aic

I've heard from Prof. Martin Davis, a logician at NYU, that Turing's
Ph.D. thesis was just filled with bugs. His thesis was a theoretical
description of his Turing machine that included sample computer
programs for it. It was these programs that were filled with bugs.
Without computers there was no way to check them.
(Those programs could have worked with only minor fixes).

[NOTE: I called Davis, who gave as a reference a paper by Emil Post on
recursive unsolvability that appeared in 1947-8 in the Journal of
Symbolic Logic -- js]

----------------------
David.Smith@CMU-CS-IUS

In simulation tests between the first and second Shuttle flights, a bug
was found in the onboard computer software, which could have resulted in
the premature jettison of ONE of the SRB's. That would have been the
world's most expensive pinwheel!

I read this in Aviation Week, but that leaves a lot of issues to scan.

------------------------------
butkiewi@nrl-css

In response to your request for info on bugs, here's one.
We work with some collection system software that was initially
deployed in 1974. Part of the system calculated the times of
upcoming events. In 1976, on February 29th, some parts of the system
thought it was a different Julian day and it basically broke the
whole system. Several subroutines needed leap year fixes. One
software engineer was called in from home and worked all night on
that one. How many sound Software Engineering Principles were
violated?

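Worth noting: 1976 is divisible by 4, so the code evidently mishandled the
leap day itself rather than the familiar divisible-by-4 test. For the
record, the full Gregorian rule is short enough to get right (a hedged
sketch, my code rather than the 1974 system's):

C     A YEAR IS A LEAP YEAR IF IT IS DIVISIBLE BY 4, EXCEPT THAT
C     CENTURY YEARS MUST ALSO BE DIVISIBLE BY 400 (1900 IS NOT A
C     LEAP YEAR; 2000 IS).  DAY-NUMBER ARITHMETIC THAT GETS ANY
C     PART OF THIS WRONG DRIFTS BY A DAY EVERY FEBRUARY 29.
      LOGICAL FUNCTION LEAP(IY)
      INTEGER IY
      LEAP = (MOD(IY,4) .EQ. 0 .AND. MOD(IY,100) .NE. 0)
     &       .OR. MOD(IY,400) .EQ. 0
      RETURN
      END
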
----------------------------

Stachour.CSCswtec@HI-MULTICS

While I cannot cite any published documentation, and this could hardly
qualify as a famous bug, a mail-system I once worked on (which ran with
privilege to write into anyone's mailboxes) was discovered to have an
incorrect check for the length of a message coming from a file.
The result was that a 'specially prepared msg' could arrange to overlay
the text of the mail-system, and specifically to restore the system
'change-access' program on top of the mail-system, which then gave the
caller power to change access controls on any file of the system.
This was for a university-built mail-system for a Honeywell GCOS3,
circa 1976.

--------------------
Sibert@MIT-MULTICS

I imagine you've heard about this already, and, if not, I can't provide
any documentation, but anyway: it is said that the first Mariner space
probe, Mariner 1, ended up in the Atlantic instead of around Venus
because someone omitted a comma in a guidance program.

--------------------------
Kyle.wbst@PARC-MAXC

TRW made a satellite in the late '50's or early '60's with the feature
that it could power down into a stand-by mode to conserve electrical
consumption. On the first pass over the cape (after successful orbital
check out of all systems), the ground crew transmitted the command to
power down. On the next pass, they transmitted the command to power up
and nothing happened because the software/hardware system on board the
satellite shut EVERYTHING down (including the ground command radio
receiver).

----------------------------
Hoffman.es@PARC-MAXC

From the AP story carried on Page 1 of today's Los Angeles Times:

"Jet Engine Failure Tied to Computer: It's Too Efficient

The efficiency of a computer aboard a United Airlines 767 jet may have
led to the failure of both of the plane's engines, forcing the aircraft
into a four-minute powerless glide on its approach to Denver, federal
officials said Tuesday.
. . .
[The National Transportation Safety Board's] investigation has disclosed
that the overheating problem stemmed from the accumulation of ice on the
engines.
. . .
[I]t is believed that the ice built up because the onboard computer had
the aircraft operating so efficiently during the gradual descent that
the engines were not running fast enough to keep the ice from forming.
. . .
The incident raised questions among aviation safety experts about the
operation of the highly computerized new generation of jetliners that
are extremely fuel-efficient because of their design and computerized
systems.
"The question is at what point should you override the computer," one
source close to the inquiry said.
. . .
[T]he engines normally would have been running fast enough to keep the
ice from forming. In the case of Flight 310, investigators believe, the
computer slowed the engine to a rate that conserved the maximum amount
of fuel but was too slow to prevent icing.
A key question, one source said, is whether the computer-controlled
descent might have kept the flight crew from recognizing the potential
icing problem. Airline pilots for some time have complained that the
highly computerized cockpits on the new jets -- such as the 767,
Boeing's 757 and the Airbus 310 -- may make pilots less attentive.
. . .

__________________

Kaehler.pa@parc-maxc
Info-Kermit@COLUMBIA-20

On Wednesday, August 24, at 11:53:51-EDT, KERMIT-20 stopped working on
many TOPS-20 systems. The symptom was that after a certain number of
seconds (KERMIT-20's timeout interval), the retry count would start
climbing rapidly, and then the transfer would hang. The problem turns
out to be a "time bomb" in TOPS-20. Under certain conditions (i.e. on
certain dates, provided the system has been up more than a certain
number of hours), the timer interrupt facility stops working properly.
If KERMIT-20 has stopped working on your system, just reload the
system and the problem will go away. Meanwhile, the systems people at
Columbia have developed a fix for the offending code in the monitor
and have distributed it to the TOPS-20 managers on the ARPAnet.

The problem is also apparent in any other TOPS-20 facility that uses
timers, such as the Exec autologout code.

The time bomb next goes off on October 27, 1985, at 19:34:06-EST.

-----------
-----------

Craig.Everhart@CMU-CS-A has put together a long file of stories
that were gathered as the result of a request for interesting,
humorous, and socially relevant system problems. They make fun
reading, but most of them aren't really bugs, let alone well-documented
ones, so I decided not to include them. The file, however, can be
obtained from Craig.

(end-of-message)

-Bob (Krovetz@NLM-MCS)