****** HACKING HORROR STORIES ******
These are the responses to my ARPANET post of 27 October 82. If reading
any of the stories below inspires you to send your own, please do! I will
continue to update this file as long as macabre tales of men and their
machines continue to come in.
- Brad
----Message 11 (1564 chrs) is----
Mail-From: local user X400BC70 at 27-Oct-82 19:26:55-EDT
Date: 27 October 1982 1917-EDT
From: Bob Colwell at CMU-10A
Subject: Re: Hacking horror stories
To: Brad.Allen@CMU-10A
Brad:
When I was finishing my Master's here at CMU, we were using a
PDP-11/45 that was showing incipient senility. One week before the final
demo, the RT-11 monitor stopped powering up properly and instead took to
halting the machine at some incredibly non-obvious spot.
This was not acceptable performance, so we scratched our heads
faster and faster for about two days trying to fix it. Finally, in
desperation, we single-stepped the RT-11 boot sequence, and found that it
was doing a memory check that it believed was failing. It then tried to
jump to a "memory check failed" diagnostic that it expected to find in
memory, which of course was not there. What was there, however, was a
random collection of bits that just happened to look like a jump to the
original totally bogus location that we could see on the lights of the
front panel. (Incidentally, we could read and write the supposedly bad
memory location using the front panel). The solution? We powered up the
machine with the halt switch asserted. Then we loaded in a "Return from
Interrupt" instruction where the random bit collection was. Presto.
By the way, until this problem occurred, we were competing for use of the
11/45 with two other groups of students. Since they all gave up when
this difficulty hit, we had sole use of the machine until it got officially
fixed.
Bob
----Message 12 (993 chrs) is----
Mail-From: ARPANET host USC-ISIB received by CMU-10A at 27-Oct-82 20:08:14-EDT
Date: 27 Oct 1982 1708-PDT
From: Dave Dyer <DDYER at USC-ISIB>
Subject: horrors
To: allen at CMU-10A
On a tops-10 system I was responsible for, I made a typo installing
a bug fix to the monitor's file system code. The result was that for
several days (until the file system began seriously degrading) a randomly
selected physical block of the disk was written with a copy of the
retrieval information for the system's accounting files.
Another: we had installed a new memory box, which unknown to us
was responding with the wrong word once in 10^8 or so operations.
We ran with this flake for about a month before the bit decay was
tracked down to the culprit. At that point, EVERYTHING that had
been done during the bad time was "possibly" damaged, and quite a few
were in fact damaged. It took about a year before the last artifacts
of that episode were filtered out.
-------
----Message 13 (857 chrs) is----
Mail-From: ARPANET host MIT-ML received by CMU-10A at 27-Oct-82 20:37:13-EDT
Date: 27 October 1982 20:40-EDT
From: Peter Szolovits <PSZ at MIT-ML>
Subject: Hacking horror stories
To: Brad.Allen at CMU-10A
cc: PSZ at MIT-ML
My first paying programming job was to convert some FORTRAN programs
from the 7094 to an IBM 360 in 1966 at UCLA. Some of these were
unbelievably hairy (doing memory management within Fortran, character
manipulation before there were characters in Fortran, etc.) and obscure
(some of the code was in fact Fortran II code that first needed
conversion to Fortran IV). The real horror was that my predecessor had
been taken away by the men in the white coats, and lived in a mental
hospital; so there really was no way to get any additional info on much
of this code, and I had a graphic example of where my job led.
----Message 14 (2082 chrs) is----
Mail-From: CMUFTP host CMU-CS-VLSI received by CMU-10A at 27-Oct-82 20:44:03-EDT
Date: 27 Oct 1982 20:30-EDT
From: James.Gosling at CMU-CS-VLSI at CMU-10A
Subject: Re: Hacking horror stories
To: Brad.Allen at CMU-10A
Message-Id: <82/10/27 2030.262@CMU-CS-VLSI>
Several years ago I was doing some development work on a compiler for a
language like Pascal. And like most Pascal implementations, the
compiler was written in the same language and was used to compile
itself. It was broken into many modules. To make a change to the
compiler I would just recompile the affected module and link it back in
with the rest of the modules. At some point, I took one of these test
versions of the compiler and replaced the production compiler with it
-- it seemed to be just fine. In fact, it was fine for quite a while.
So long that this new version got onto the backups and all of the
backups of the production compiler were lost. There was also the
problem that the old production compiler couldn't have compiled the new
compiler anyway, since the language had changed quite a lot. Well...
In one of the modules that had never been through the new compiler was
a piece of code that tickled a bug in the code generator. The bug was
a cooperative one between one of the new pieces of code and one of the
old one. What I ended up with was a compiler which I couldn't
recompile because fixing the bug involved compiling a module that
tickled the bug. Because of the circularity in the compiler (that it
compiled itself) I was up the proverbial creek without a paddle. There
was no way that I could recompile or shuffle anything to fix the beast.
All backups were either of the broken compiler or had been overwritten.
The solution was incredibly messy: I spent a long time doing intensive
octal surgery on the object modules that I had. This was made very
difficult because there was essentially no information left around to
correlate program text to compiled code and because the bug caused bad
code to be generated in many places.
James.
----Message 15 (1169 chrs) is----
Mail-From: ARPANET host MIT-XX received by CMU-10A at 27-Oct-82 22:29:30-EDT
Date: 27 Oct 1982 2231-EDT
From: Larry Seiler <SEILER at MIT-XX>
Subject: Bug fix horror story
To: Allen at CMU-10A
cc: Seiler at MIT-XX
Maybe this is not quite what you have in mind, but in case it is...
My most painful bug was a simple uninitialized variable (I had moved
the initialization statement to a position after the first reference).
This variable was a pointer, and its position in the call stack just
happened to contain an address in code space. So running the program
caused certain instructions in a different procedure to be changed
into noops, with bizarre results. Loading the debugger caused the
program to work correctly, by transferring the target of the modification
into an unused part of the debugger (I think). Even after I discarded my
innocent assumption that the code I wrote was the code that was being
executed, I still had to guess what routine was writing to code space
(and by what mechanism). Total time required to fix the bug: 8 hours.
How embarrassing. Why am I telling you this? Well, why not?
Larry Seiler
-------
----Message 17 (1759 chrs) is----
Mail-From: ARPANET host Utah-20 received by CMU-10A at 28-Oct-82 02:11:44-EDT
Date: 28 Oct 1982 0012-MDT
From: JW-Peterson at UTAH-20 (John W. Peterson)
Subject: Re: Hacking horror stories
To: Brad.Allen at CMU-10A
cc: JW-Peterson at UTAH-20
In-Reply-To: Your message of 27-Oct-82 1516-MDT
In trying to learn the graphics/animation biz, I've run into a few. In
making some films this summer I wound up working strictly at night, to
help prevent any light from entering the room. The filming had to be
completed entirely over the weekend, so it wouldn't interfere with normal
business activity (like turning the lights on...). Worse yet, the
old Bolex I was using had no way for the computer to trip its shutter,
so I had to manually press the cable release every time the computer rang
the terminal bell, for several hours at a stretch.
Some other animation stories: Before color graphics CRT's & framebuffers
were invented, the poor filmmaker had to sleep next to the camera. When
the bell rang, he would wake up, change the color filter wheel to the
next primary color, backwind the film all the way, and go back to sleep...
Perhaps best of all is Jim Blinn's "Korean Janitor" movie. During the
creation of the DNA sequences for "Cosmos", they decided to let the
camera run over night, with the computer tripping it every several
seconds. So they locked up the room and put a big "Filming in process:
Do Not Enter" sign on the door. Unfortunately, the Korean janitor could
not read the English sign but DID have a pass key. The resulting film
shows a DNA molecule twisting in space, a flood of light, and then a
jerky sequence of the janitor cleaning the room at 200mph, seen as a
reflection in the screen.
jp
-------
----Message 19 (1595 chrs) is----
Mail-From: ARPANET host MIT-XX received by CMU-10A at 28-Oct-82 10:50:59-EDT
Date: 28 Oct 1982 1054-EDT
From: Geoffrey H. Cooper <GEOF at MIT-XX>
Subject: Re: Hacking horror stories
To: Brad.Allen at CMU-10A
cc: geof at MIT-XX
In-Reply-To: Your message of 27-Oct-82 1716-EDT
This is our favorite "what happens when people are taught higher level
models before the lower level ones" story. I get this second hand,
so some of the details might be a little off. It may not be of the sort
you had in mind, but it's amusing enough to bear repeating anyway.
Around here, we teach a course in software engineering in which the
students are taught and write programs in CLU (a language which lets user
defined abstractions work the same way that the language defined ones do).
One common final project for the course involved writing an assembler in
CLU. The problem statement required that numbers be input and output in
octal, rather than decimal.
Most of the students, I am told, defined an OCTAL abstraction, with all the
normal integer arithmetic operations, and with Parse and Unparse operations
that converted strings into OCTAL's and back again.
This was implemented by representing an OCTAL as an array of integers, each
of which represented an octal digit. The arithmetic operations simulated
octal arithmetic on this representation. None of the students was
apparently aware that the normal integer data abstraction that they had
been using was really just stored as bits, which were more easily converted
to octal than decimal.
-Geof Cooper
-------
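[A minimal sketch of the point of the story above, in Python rather than CLU.
Because an ordinary integer is already stored as bits, each octal digit is just
a group of three bits, so Parse and Unparse need only shifts and masks, and all
the arithmetic can be left to the built-in integers. The function names
parse_octal and unparse_octal are hypothetical stand-ins for the Parse and
Unparse operations mentioned in the message; this is not the course's code.]

```python
def parse_octal(text: str) -> int:
    """Convert a string of octal digits into an ordinary integer."""
    value = 0
    for ch in text:
        if ch not in "01234567":
            raise ValueError(f"not an octal digit: {ch!r}")
        # Each octal digit contributes exactly three bits.
        value = (value << 3) | (ord(ch) - ord("0"))
    return value


def unparse_octal(value: int) -> str:
    """Convert a non-negative integer into its octal digit string."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    if value == 0:
        return "0"
    digits = []
    while value:
        digits.append(chr(ord("0") + (value & 0o7)))  # low three bits
        value >>= 3
    return "".join(reversed(digits))


# Arithmetic happens on plain integers; only input and output are octal.
total = parse_octal("777") + parse_octal("1")
print(unparse_octal(total))
```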
----Message 20 (1069 chrs) is----
Mail-From: ARPANET host CMU-20C received by CMU-10A at 28-Oct-82 10:57:26-EDT
Date: Thursday, 28 October 1982 10:57-EDT
From: Jon Webb <Webb at Cmu-20c>
To: Brad.Allen at CMU-10A
Cc: webb at CMU-20C
Subject: Hacking horror stories
Well, here it is: I was working as an undergraduate programmer at my
undergraduate university, and I basically had the run of the
time-sharing user interface (it was TSO, on an IBM 360/65). I decided
it would be nice if you could edit lines you'd typed, like the facility
in the C-shell on unix except more primitive. Well, it was a pretty
trivial change to allow this, but unfortunately to be effective the
change had to be installed in the system, I couldn't test it in advance.
So I installed it one night, and TSO wouldn't work anymore. Very
embarrassing, especially as the backup method I thought would work
didn't. In fact one of the systems programmers had to be called in to
fix the system, in the middle of the night. I gave up on editing in
TSO. This is an argument for personal computers.
Jon
----Message 21 (910 chrs) is----
Mail-From: ARPANET host UCB-C70 received by CMU-10A at 28-Oct-82 11:57:13-EDT
Date: 28 Oct 1982 08:55:57-PDT
From: CSVAX.bitar@Berkeley
To: Brad.Allen@CMU-10A
Subject: Hacking horror story
I was working late one night developing a file under the Unix operating
system. I was in a hurry at one point, and wanting to rename the file,
I executed the unix move cmd.
A moment later Unix complained of indigestion,
and I noticed that instead of typing 'mv oldname newname', which
is Unix's way of renaming a file, I had typed 'rm oldname newname'.
So Unix had executed 'rm oldname', then run into newname and vomited.
I nearly did the same.
Fortunately I did have a backup copy of the file, which I subsequently
re-edited, bringing it up to date.
After that incident, though, I was very careful about slight cognitive
mistakes, such as thinking 'move' (mv) and typing 'rm' (remove) instead.
----Message 22 (1801 chrs) is----
Mail-From: local user C410RF60 at 28-Oct-82 12:06:03-EDT
Date: 28 October 1982 1155-EDT
From: Robert Frederking at CMU-10A
Subject: Re: Hacking horror stories
To: Brad.Allen@CMU-10A
Yourdon's book on software engineering has a few of these. Most of
my really horrible experiences happened due to politics or manufacturer's
screw-ups.
(Example of first): CWRU was building a network, and had to pick between
DEC and Harris computers (Harris won because one of their VPs was a trustee
at CWRU - they were clearly inferior machines). Besides teaching their staff
how to program, we had to constantly show them that feature X was broken, and
how to fix it. The project finally collapsed due to their crufty machines.
The operating system was *not* virtual memory (altho user space was), and while
adding networking software to their OS, they ran out of room. "Sorry".
(Example of second): in trying to microprogram Intel's hack-of-a-
bit-slice-machine, you had to fit your instructions into a 2-dimensional address
space! Some instructions could only branch in rows, others only in columns,
yet others only to specific clusters of locations. It was clearly a hack
to cover running out of instruction bits. They even had to sell a program
designed to find a fit for your microcode to the available space (I think
the problem is NP-complete - 2d bin packing).
The best example is the interrupt disable instruction on the 6800.
If the least significant bit of the *preceding* instruction is 1, the whole
processor hangs when you try to disable the interrupt. Also, some of the
illegal opcodes (which aren't masked out) will cause the processor to hang
so badly, it can't be reset. You have to turn it off, and wait for the
dynamic RAM register to fade out!
Bob
----Message 24 (1536 chrs) is----
Mail-From: CMUFTP host CMU-CS-Speech received by CMU-10A at 28-Oct-82 14:51:46-EDT
Date: 28 Oct 1982 14:47:27-EDT
From: David.Cunnius at CMU-CS-SPEECH at CMU-10A
To: Allen@CMU-10A
Subject: Hacking horrors
The old 15-311, Software Engineering Methods, will probably be one of
the more fertile sources of horror stories. The semester I took this course,
Spring '80, one of the tasks was a database implementation for a science-
fiction wargame. Looking back now, I think our project group was doomed from
the start. Of the original five-man team, one dropped the course before anyone
else even met him, one had to take some time off to deal with a family crisis
around mid-term, and one simply disappeared for a period of three weeks, coming
back without even a memory of where he'd been. Despite all that, we did get
something together for the final demo. We were using a modular design and had
divided the task into thirteen subtasks. At the demo, four of the thirteen
modules worked properly, two that had tested out perfectly the previous day
didn't work at all at the demo, and most of the other seven hadn't even been
coded yet. Of the four modules that worked, the most impressive one was the
display package; unfortunately, that was also the only module which was
optional in the original specification. Two of the members of the group
somehow managed to pull 'D's as our final grade; to this day I haven't had the
nerve to ask the other two what their grades were.
Dave Cunnius (dac@CMU-CS-Speech)
----Message 25 (2873 chrs) is----
Mail-From: ARPANET host Washington received by CMU-10A at 28-Oct-82 16:18:31-EDT
Date: 28 Oct 1982 1318-PDT
From: Bob Bandes <JUGGLE at WASHINGTON>
Subject: Re: Hacking horror stories
To: Brad.Allen at CMU-10A
In-Reply-To: Your message of 27-Oct-82 1416-PDT
As a senior project when I was going to school at UC Santa Cruz,
I put together a real-time voice controlled operating system.
The entire thing was written in assembly language on a PDP-11/32
running RT11. Since this was a single user system with a fixed
disk, it was necessary to make a tape backup at the end of every
session.
Well, after one particularly furious day of hacking, I decided to
write my backup tape and go home for the day. My normal procedure
was to mount my backup tape and use ROLLIN to copy an entire
disk-image to the tape. Unbeknownst to me, the procedure that
I used had the effect of first initializing the tape before
making the backup.
This had always worked just fine. But on this particular day, I
had been working on my disk I/O routines and apparently had somehow
managed to write garbage on some unknown portion of the disk.
I had no idea that anything was wrong as I went to make my backup
tape. As usual, first the tape was initialized, then, as ROLLIN
began to write the disk image, the program hung! There I was
with no backup tape and having major problems making a backup.
My next move was to panic. After settling down somewhat, I tried
rebooting the operating system and making the backup again.
Still the same problem. Then I remembered about the DECtape
drive on the machine. If I could only find a DECtape and manage
to individually transfer the files that I needed I would be home
free.
I ran over to the cabinets and began frantically looking for DECtapes.
AHA! I found one! As I ran back over to the computer, I took a
bounding step and landed on the side of my ankle. I proceeded
to lie on the floor writhing and screaming in agony for the next
fifteen minutes. "This just isn't my day," I was saying to myself.
When the pain began to subside I tried to get up. I couldn't walk
on the ankle since it hurt so much. So I hopped over to the DECtape
drive and mounted the DECtape. Then I hopped over to console and
sat down.
At least something went right that day, as the machine allowed me
(without hanging) to individually transfer all my files to DECtape.
I then read a clean version of the operating system onto the disk
and proceeded to transfer all of my files from DECtape back onto
the disk. This time all went normally with the magtape backup and
the world was safe again for future hacking.
Fortunately my ankle wasn't broken. It was only severely sprained.
For the next few weeks I was forced to do my hacking with an
ace-bandage wrapped around my ankle.
--Bob Bandes
-------
----Message 29 (721 chrs) is----
Mail-From: ARPANET host UCB-C70 received by CMU-10A at 28-Oct-82 23:30:51-EDT
Date: 28 Oct 1982 20:26:51-PDT
From: Kim.norvig@Berkeley
To: brad.allen@cmu-10a
Subject: Re: Hacking horror stories
Lucky for me, most of the stories I remember are happy ones, not horror stories.
My favorite story about someone else is when Jim Meehan was writing TALESPIN,
his AI program that generated stories, mostly about birds and bears
running around the forest. One story started off fine, then started to
slow down, and finally ended with the line
Joe Bear thinks that FREE STORAGE IS EXHAUSTED
Oh well, @b(I) thought it was cute.
Can I be put on the mailing list to see your collection of anecdotes?
----Message 33 (1413 chrs) is----
Mail-From: ARPANET host MIT-MC received by CMU-10A at 30-Oct-82 16:38:45-EDT
Date: 30 Oct 1982 1635-EDT
From: RG.JMTURN at MIT-OZ at MIT-MC
Subject: Re: Hacking horror stories
To: Brad.Allen at CMU-10A
In-Reply-To: Your message of 27-Oct-82 1832-EDT
The experience that still makes my skin crawl is the time I was debugging
some Lisp Machine board at the MIT AI lab. I had spent several hours trying
to isolate a noisy signal which seemed to be tied to another one, but I could
not find a common wire and I had replaced all the common chips. In desperation,
I pulled out the board and yanked the extender, about to give up hope.
As I stared down at the extender, I muttered some curse to the designers
of the machine...and noticed a solder splash on the extender shorting two
lines! For ghu's sake, if you can't trust your tools, what can you trust.
On the other hand, for an example of the other extreme, this week, I was
in Montreal doing an installation for Lisp Machine, Inc. A crufty Bus
Interface seemed to be making the machine go 1/2 speed, and sometimes
fail entirely. The person I was working with and I decided to call it a day
around 5, and go to our hotel. When we came back the next morning, the machine
worked perfectly. The best we can figure it, the machine wanted us to be
able to have a night in Montreal, and the afternoon the next day...
JAmes
-------
----Message 38 (1003 chrs) is----
Mail-From: ARPANET host UCB-C70 received by CMU-10A at 1-Nov-82 23:02:54-EST
Date: 30 Oct 1982 03:44:28-PDT
From: CSVAX.fishkin@Berkeley
To: Allen@CMU-10A
Subject: painful hacks
Hi there,
My name is Ken Fishkin, and I'm a grad at Berkeley. My most painful hack
occurred while hacking a 6K line C database program at the University
of Wisconsin-Madison as an undergrad. My program worked perfectly, with
all debug prints on. When I set my 'const' debug to false, however,
the program would crash! To make things even more fun, if I deleted
1 debug print the program would still run correctly, but if I deleted
another instead it wouldn't! I wound up doing a sort of tree traversal,
deleting some 200(!) debug prints individually, finding the
proper sequence of delete-compile-delete that would keep my program
intact. To this day, I still have no idea what was wrong with the
program.
If possible, could you mail me your final collection of
horrible hacks?
Ken
----Message 40 (1981 chrs) is----
Mail-From: ARPANET host CMU-20C received by CMU-10A at 2-Nov-82 11:29:15-EST
Date: 2 Nov 1982 1128-EST
From: MASON at CMU-20C
Subject: horror stories
To: brad.allen at CMU-10A
Many roboticists have reported the following demo problem: when
filming or demonstrating, we often raise venetian blinds, turn on
the lights, or bring in floods. The increase in ambient light
may cause optical-interrupt type sensors on the robot to stop
functioning, and the heat from floods may affect other components
of the system. Thus a system which has functioned flawlessly for
months begins to malfunction the very minute the generals arrive.
Real-time programming has its special frustrations, but the most
difficult bugs arise from difficulties in the timing of process
interactions. Most of these are too complicated to make good stories.
One of the most confusing PDP11 bugs I had may be worth telling.
When a byte is pushed onto the stack, the stack pointer is first
incremented to keep the pointer at word boundaries. Hence the
odd byte is garbage, left over from no-longer-active stack frames.
I had a program which pushed a byte, but popped a word, thus accessing
this garbage. Even careful inspection of the code didn't turn up
this violation of stack discipline. The worst part is that the
manifestation of the bug would vary depending on which process last
used the stack. In particular, the bug became invisible when
single-stepping with our symbolic debugger---the debugger (im)providentially
cleared the relevant byte in the act of saving some registers.
This reminds me of another PDP11 bug. Our 11/40 had a micro-code
error. The SOB instruction (subtract one and branch, used for simple loops)
didn't test the TRAP bit, which is used by debuggers for single-stepping.
Hence, when single-stepping, the programmer was not shown the instruction
following the SOB. It was executed "in secret", with very confusing results.
-------
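[A rough sketch of the push-a-byte, pop-a-word bug described above, simulated
in Python. It assumes a downward-growing stack and low-byte-first word layout,
as on the PDP-11, and a stack pointer that always moves by a full word. The
class TinyStack and the function demo are hypothetical illustrations, not the
original program. The value popped depends on whatever the previous user of
the stack left in the odd byte, which is why the symptom kept changing.]

```python
class TinyStack:
    """A toy word-aligned stack held in a small byte array."""

    def __init__(self, size: int = 16) -> None:
        self.mem = bytearray(size)
        self.sp = size  # the stack grows downward from the top of memory

    def push_word(self, value: int) -> None:
        self.sp -= 2
        self.mem[self.sp] = value & 0xFF             # low byte
        self.mem[self.sp + 1] = (value >> 8) & 0xFF  # high byte

    def push_byte(self, value: int) -> None:
        self.sp -= 2                      # pointer still moves a full word
        self.mem[self.sp] = value & 0xFF  # the odd byte is left untouched

    def pop_word(self) -> int:
        value = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2
        return value


def demo(stale_word: int) -> int:
    """Push a byte and pop a word after an earlier user left stale_word."""
    stack = TinyStack()
    stack.push_word(stale_word)  # some earlier, no-longer-active frame
    stack.pop_word()
    stack.push_byte(0x41)        # the program pushes one byte...
    return stack.pop_word()      # ...but pops a whole word


print(hex(demo(0xBEEF)))  # stale high byte leaks into the result
print(hex(demo(0x0000)))  # a cleared byte hides the bug
```

The second call mirrors the debugger case in the message: once something has
cleared the odd byte, the mismatched pop returns the expected value and the
bug disappears from view.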
----Message 32 (621 chrs) is----
Mail-From: ARPANET host MIT-XX received by CMU-10A at 3-Nov-82 15:20:23-EST
Date: 2 Nov 1982 17:19:35-EST
From: jfw at mit-vax at mit-xx
To: allen@cmu-10a
Subject: Programming horror stories
Two summers ago, while I was working on an improvement to our UNIX at LL-ASG,
I fired up a test version a little too fast, and watched with puzzlement as
the filesystem check program started printing out random things. I wound up
killing a 100Mb filesystem full of useful things. After 2 weeks of poring over
the code I wrote which did that, I found the bug: " = " instead of " |= ".
One character did all that...
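[A minimal sketch of how much damage that one character can do, in Python.
The message does not show the code involved, so the flag names and helper
functions here are hypothetical. With "=" the new bit replaces every bit
that was already set; with "|=" it is merged into them.]

```python
# Hypothetical flag bits, for illustration only.
FLAG_IN_USE = 0b001
FLAG_DIRTY = 0b010
FLAG_LOCKED = 0b100


def mark_dirty_buggy(flags: int) -> int:
    flags = FLAG_DIRTY   # " = " throws away every other flag
    return flags


def mark_dirty_correct(flags: int) -> int:
    flags |= FLAG_DIRTY  # " |= " keeps the existing flags
    return flags


before = FLAG_IN_USE | FLAG_LOCKED
print(bin(mark_dirty_buggy(before)))
print(bin(mark_dirty_correct(before)))
```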
----Message 37 (1934 chrs) is----
Mail-From: local user C410MS40 at 4-Nov-82 00:37:41-EST
Date: 4 November 1982 0036-EST (Thursday)
From: Mark.Sherman at CMU-10A
To: Brad.Allen at CMU-10A
Subject: Re: Hacking Horror Stories
Message-Id: <04Nov82 003626 MS40@CMU-10A>
As an undergrad I worked as a systems staff on a time sharing system
that resembled Multics (called DSL/TSS - think of it as Unix on HP21
series machines). On such systems, the login program is like any other
program; when a user sits down he "calls" this program from a
predefined file system path to gain access to the system. For some
unrememberable reason, I had to make some modifications to this
program, did so, and installed the new version. The only real way to
try this program out was to log out and then log back in. Having logged
out, I tried to log back in. To my chagrin, I had accidentally set the
protection on the new login program to read instead of its normal
read-execute. Thus the system refused to run the login program. By
S.O.P., this would not be a problem - when doing such a drastic change,
we always made sure that at least one other systems programmer was
logged in so that he could patch anything that was necessary, like
changing access control on the login program. Before my attempt to
change the login program, there were two other systems programmers
logged in. After my mistake, I walked over to the two other staff
people only to find that they had both logged out - after all each knew
that the other was logged in and so saw no reason to stay on as the
"protection". Thus there was no way to log into the system and no way
to patch it while it ran. We had to move the system to a spare disk,
boot a backup system, bring up the extra disk with the file system
containing the bogus protection as a "raw" disk and use a special disk
utility to set the one necessary bit giving execute access to the login
program.
Mark
----Message 38 (3657 chrs) is----
Mail-From: ARPANET host CMU-20C received by CMU-10A at 4-Nov-82 01:40:45-EST
Date: Thursday, 4 November 1982 01:39-EST
From: Skef Wholey <Wholey at CMU-20C>
To: Brad.Allen at CMU-10A
Subject: Horrorful horrors
CMU's 15-311 is indeed a source of horrors, and I experienced a rather horrible one
in that class last year. There were five of us in our group, which we called
"SPAM", each of us competent hackers. Our project was a 68000 simulator and
debugger, which would run 68000 machine code and let you look at registers and
memory and so forth. Our work progressed on schedule (with the aid of many
all-nighters), and we were able to run simple assembly language programs just
about a week before the demo.
Being a rather noisy bunch, wanting our demo to be as slick as possible, we
decided that we'd run a backgammon program written in C compiled with cc68. We
had used small programs compiled with cc68 to test the simulator. The programs
were small enough to compile and assemble on a Vax, print the hex object code,
and type it into a file which we would load into our simulator. The backgammon
program was too large for this, obviously, so the object code was FTP'ed to
another machine, put on tape, and brought to the Computation Center, where we
pulled it off of tape and loaded it into our simulator. The program didn't
work. It didn't work the day before the demo.
We found a few bugs in our simulator, but worse yet we found bugs in the cc68
compiler, now N machines away. Fixing these we found bugs in the game playing
program itself. Compiling the program on the Vax and transporting the object
code was out of the question at this point -- too little time left before the
demo (we had all announced that we'd appear in coat and tie). So we ever so
carefully patched the hex files, and voila! The program ran beautifully.
That year Comp Center gave each undergrad who needed a computer account an
account on each undergrad machine (TOPS-D and TOPS-E). These machines were on
Comp Center's DECnet: not a reliable network at that time. We had the current
version of our system and the patched hex files on TOPS-D, because the load was
lower there that night, but were scheduled to demo on TOPS-E terminals. DECnet
was, of course, down for quite a while, but finally came up. We quickly
transferred the current system to the E and ran back to our rooms or homes to
shower and dress.
We marched triumphantly into the terminal room and sat at our terminals while
our SPAMmascots fed cookies to the waiting crowd and our professor. The system
came up fine, and we demonstrated how to deposit into and read from memory and
registers before moving onto the demo programs. We loaded the hex files, set
breakpoints at our test locations, and lo! IT DIDN'T WORK. We were all
somewhat bummed and embarrassed, and managed to muddle through at the mercy of
this mysterious adversary that had destroyed a system that worked an hour
before. The professor suggested that we get our act a little more together and
have a somewhat less flashy demo in his office a few days hence.
The problem: we had neglected to copy the patched hex files from the D to the
E. We were demoing buggy 68000 code. The second demo went a bit better. We
now laugh about the first. Comp Center no longer gives out accounts to one
student on more than one machine. Good idea.
--Skef
[What be your motive for knowin' this stuff, eh? Doo ye like to feed on
stories o' suffrin'? Are ye writin' a book? I enjoyed reading those sent to
you so far and enjoyed sending you this one. Good topic.]
----Message 39 (1236 chrs) is----
Mail-From: CMUFTP host CMU-CS-VLSI received by CMU-10A at 4-Nov-82 09:40:16-EST
Date: 4 Nov 1982 8:36-EST
From: Ed.Frank at CMU-CS-VLSI at CMU-10A
Subject: Hacking horror stories
To: Brad.Allen@cmua
Message-Id: <82/11/04 0836.841@CMU-CS-VLSI>
While working on the software for a Graphics terminal we built at
Stanford, I ran into the following problem. The software was written in
assembly language, and was burnt into EPROMS. For a long time the
software easily fit in four 2708 (1K x 8) EPROMS. Well, one week after adding
the graphics support code to the terminal, I simply could not get it to
work. I spent literally dozens of hours going over at most 500 assembly
language statements, to no avail. Things were so bad in fact that I
seriously began to question my abilities as a programmer. One evening
while I was checking the output of the assembler (for at this point I
was convinced it was an assembler bug) I noticed that one of the
target addresses of a jump was greater than FFF (hex). I didn't think
anything of it, until a few seconds later when it occurred to me that
addresses > 4K required 5 proms. I quickly went back to
work, burned the extra eprom, and the program worked perfectly!
Ed
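[The arithmetic behind the story above, as a small Python sketch. A 2708 holds
1K bytes, so four of them cover addresses 0 through FFF (hex) and any address
past that falls in a fifth part. The sketch assumes the code starts at address
0 and fills the EPROMs in order; the function name eproms_needed is
hypothetical.]

```python
EPROM_BYTES = 1024  # one 2708 is 1K x 8


def eproms_needed(highest_address: int) -> int:
    """Number of 1K EPROMs needed to hold code up to highest_address."""
    if highest_address < 0:
        raise ValueError("address must be non-negative")
    return highest_address // EPROM_BYTES + 1


print(eproms_needed(0xFFF))   # last address that still fits in four parts
print(eproms_needed(0x1000))  # first address that needs a fifth
```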
----Message 40 (731 chrs) is----
Mail-From: local user C410RK40 at 4-Nov-82 09:58:20-EST
Date: 4 November 1982 0955-EST (Thursday)
From: Richard.Korf at CMU-10A (C410RK40)
To: Brad.Allen at CMU-10A
Subject: hacking horror story
Message-Id: <04Nov82 095535 RK40@CMU-10A>
Brad,
My favorite bug of all time concerned an ASR35 Teletype. I was trying to format
some output and found that directly after printing a long line, the second line
was indented by one space. Naturally, the bug went away when I ran the debugger.
It finally turned out that the printing head was physically bouncing off the
left hand stop. If it didn't have to print again immediately, it would have a
chance to settle back to the beginning of the line.
-rich
----Message 41 (1799 chrs) is----
Mail-From: local user C410SS40 at 4-Nov-82 11:42:32-EST
Date: 4 November 1982 1134-EST (Thursday)
From: Steven.Shafer at CMU-10A (C410SS40)
To: brad.allen at CMU-10A
Subject: Horrors!
Message-Id: <04Nov82 113429 SS40@CMU-10A>
Brad --
I had a nasty experience with an old PDP-11/40E running UNIX.
I had written a program which juggled several processes, one of which was
the largest core-image of any program in existence on the machine (<64K, of
course). One day, it died a sudden death.
I started tracking it down with print statements. At first, the problem
looked like something being set to 0; then, as I added more debugging code,
the 0's jumped around. I never knew which routines they would crop up in,
or whether global data structures were affected, or even if code itself was
being overwritten. Sometimes, the program would die even though the
debugging code showed nothing extraordinary.
I eventually gave up and rewrote the program from scratch, using smaller
processes and succeeding. Several months later, a paging bug was fixed: it
was responsible for writing 0's on pages when the core-image of a process
was beyond a certain length.
What makes this a horror story is a UNIX vagary tickled by the bug: within
the code being executed, there was a statement to close a file. The file,
like all UNIX files, was indexed by a small integer. When the zeroes struck
this variable, the effect was to close file 0, i.e. disconnect the keyboard!
So, not only did the program die, but it refused to talk to me long before
the actual moment of death, leaving me to watch helplessly as it writhed
in agony, unable to talk to it, unable to interrupt it, and never knowing
where the Flying Fickle Finger of Fate would strike next!
-- Steve
----Message 43 (390 chrs) is----
Mail-From: local user C410BL50 at 4-Nov-82 12:30:02-EST
Date: 4 November 1982 1214-EST (Thursday)
From: Bruce.Lucas at CMU-10A (C410BL50)
To: brad.allen at CMU-10A
Subject: horrors
Message-Id: <04Nov82 121457 BL50@CMU-10A>
On Unix, I once meant to type "rm *.BAK" but instead typed "rm * .BAK".
Fortunately, I hadn't made too many changes since the last backup to tape.
Bruce
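Why one stray space matters can be shown with a rough model of what the shell hands to rm. This is a sketch, not a shell: it uses Python's shlex and fnmatch modules to approximate word splitting and wildcard matching, and the directory listing is invented for the example.

```python
import fnmatch
import shlex

def files_removed(command: str, directory_listing: list[str]) -> list[str]:
    """Approximate which names in directory_listing an rm command would target."""
    words = shlex.split(command)[1:]  # drop the "rm" itself
    targets = []
    for pattern in words:
        for name in directory_listing:
            # Like the shell, a leading "*" does not match dot files
            if name.startswith(".") and not pattern.startswith("."):
                continue
            if fnmatch.fnmatchcase(name, pattern) and name not in targets:
                targets.append(name)
    return targets

listing = ["main.c", "main.BAK", "notes.txt", "notes.BAK"]  # hypothetical files
print(files_removed("rm *.BAK", listing))   # ['main.BAK', 'notes.BAK']
print(files_removed("rm * .BAK", listing))  # every file in the listing
```

With the space, "*" becomes its own argument and matches everything, while ".BAK" is a second argument naming a file that most likely does not exist.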
----Message 46 (1054 chrs) is----
Mail-From: local user C410EL80 at 4-Nov-82 14:26:58-EST
Date: 4 November 1982 1411-EST
From: Ellen Lowenfeld at CMU-10A
Subject: Re: Hacking Horror Stories
To: Brad Allen
This one's kind of embarrassing, looking back on it... When I was
a sophomore at Brown, I took a course which had a big project, I guess
like 311 here, except that the groups were pairs. So that I and my partner
could test pre-compiled code separately (IBM 370, batch mode) we each
had a dummy main routine. Mine printed its name, and then called whatever
routine(s) I wanted to test. Unfortunately, I left out the quotes around
its name, and sent it into infinite recursion. IBM's great error message
once I found it after looking in 3 manuals, and poring over pages of
IEFH01X (or something like that), was "user error". Not until I had
spent most of a day looking for a wizard did I go back and just look
at the code I had written. Was my face red when all the people I had
talked to while trying to find out the problem asked what it turned out
to be!
----Message 47 (1310 chrs) is----
Mail-From: CMUFTP host CMU-RI-FAS received by CMU-10A at 4-Nov-82 14:38:21-EST
Date: 4 Nov 1982 13:09:55-EST
From: Neil.Swartz at CMU-RI-FAS at CMU-10A
To: ba0c@cmua
Subject: Horror stories
Several stories come to mind. At Princeton, they had WATFIV on a 360/91.
You got 2 seconds of computer time and 600 lines of output. One job came
out in WATFIV that printed a line of characters and then overstruck the
characters again and again. The computer counted this as one line so it
would do this forever. The print heads tore through the paper, the ribbon
and started in on the carriage. The system was down for more than 12 hours.
Another good one which I have heard about- (If anybody knows more about this
I would like to hear about it) The Phantom Teletype Program. The way it
worked was this: At a random time interval the program would start up and
pick a teletype on the system. It would print "The Phantom Teletype Strikes
Again!!" and then it would copy itself somewhere else on disk, set up the
parameters for its re-execution, and delete the old copy. System
programmers could find out where it had been, but not where it was
currently. Because it was too difficult to track, they left it on the
system.
There are lots of good(bad) stories running around.
Neil
----Message 49 (2598 chrs) is----
Mail-From: ARPANET host UTexas-20 received by CMU-10A at 4-Nov-82 16:41:21-EST
Date: 4 Nov 1982 1538-CST
From: CMP.LSMITH at UTEXAS-20
Subject: some horror stories
To: brad.allen at CMU-10A
My first hacking horror story goes back to my very first
programming course. My program kept exceeding its time limit and
aborting. I checked my code carefully and decided it was correct,
but only needed a little more time to finish. So I confidently
upped my limit from 7 seconds to a CPU minute of CDC 6600 time. I
was really horrified when it timed out again, blowing my entire
semester's allotment. A sharp consultant found my bug. I made the
FORTRAN equivalent of "FOR X = 1.0 BY 0.1 TO 10.0," with my final
test an equal. Since 0.1 is a repeating fraction in binary, it
never equaled 10, so it went past and on to infinity.
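The same trap is easy to reproduce today. The sketch below is a scaled-down analogue in Python's IEEE double arithmetic, not CDC 6600 FORTRAN, and it counts from 0.0 to 1.0 rather than 1.0 to 10.0; the function names and the iteration limit are assumptions added so the example terminates.

```python
def steps_until_equal(start: float, step: float, stop: float, limit: int = 1000):
    """Count additions of step until x == stop exactly.

    Returns None if the limit is reached first, which stands in for the
    loop that would otherwise run until the time limit expired.
    """
    x = start
    for count in range(1, limit + 1):
        x += step
        if x == stop:
            return count
    return None

def counted_values(start: float, step: float, count: int) -> list[float]:
    """Safer pattern: drive the loop with an integer and derive each value."""
    return [start + i * step for i in range(count + 1)]

print(steps_until_equal(0.0, 0.25, 1.0))  # 4: 0.25 is exact in binary
print(steps_until_equal(0.0, 0.1, 1.0))   # None: the running sum steps past 1.0
print(len(counted_values(1.0, 0.1, 90)))  # 91 values, the last one near 10.0
```

Testing with "greater than or equal" instead of "equal", or counting iterations with an integer, avoids depending on the accumulated rounding error.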
Years later I was working on a PDP11/45 Unix system. The system
began crashing some time after we retrieved something from the
backup tapes, using Unix's raw mode access to the tape. In cooked
mode, things worked right, so we knew it couldn't be a hardware
problem. After some months of trying to debug the problem, we
modified the tape device handler so that it spun and monitored
its registers until the transfer completed. One of the high bits
in the address register was sticking off. In cooked mode, Unix
read into its system buffers in low core and everything worked
because that bit stayed off anyway. In raw mode, it read into
user space directly. Whenever the address register was
incremented past that bit boundary, the DMA transfer would drop
down and wipe out some random locations and the system would
slowly collapse.
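A minimal sketch of the failure, assuming the fault behaves like a single address bit forced to 0. The bit position and the example addresses are hypothetical; the story only says it was one of the high bits.

```python
def dma_address(intended: int, stuck_off_bit: int) -> int:
    """Address actually driven when one bit of the address register is stuck at 0."""
    return intended & ~(1 << stuck_off_bit)

STUCK_BIT = 15  # hypothetical choice of "high bit"

# Low-core system buffer: the bit is already 0, so the transfer is unaffected.
print(oct(dma_address(0o040000, STUCK_BIT)))  # 0o40000

# User-space buffer above the bit boundary: the transfer lands 32K lower,
# on top of whatever happens to live there.
print(oct(dma_address(0o100100, STUCK_BIT)))  # 0o100
```

This is why cooked mode looked healthy: every address it used already had the faulty bit clear.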
The worst horror stories are when you spend days hacking at a
program, only to discover that you've invoked a compiler bug. We
are extremely fortunate to have the ELISP system. I had a problem
with a lengthy computation sometimes returning NIL from compiled
code. Between the (RETURN RESULT) in the called function and
(SETQ X (CALLED ...)) in the caller, the value was being lost.
Interpreted, it worked. If I traced the function, it worked. If I
traced any function in a chain below it, it worked. It turns out
that if you have a chain of calls about 10 deep, then a MAPCAR
over a list of at least 3 values, then about three more calls
down, and all the functions are compiled, then the time bomb NIL
is stuck up on the stack. If any function in the chain is
interpreted, for example by tracing it, then the behavior goes
away. As far as I know, this bug still hasn't been found.
-------
----Message 50 (1130 chrs) is----
Mail-From: CMUFTP host CMU-CS-IUS received by CMU-10A at 4-Nov-82 21:16:47-EST
Date: 4 Nov 1982 20:08-EST
From: Victor.Milenkovic at CMU-CS-IUS at CMU-10A
Subject: Re: Hacking Horror Stories
To: Brad.Allen at CMU-10A
Message-Id: <82/11/04 2008.913@CMU-CS-IUS>
One version of the PL/I debugger at Yorktown had no provision for
displaying the hex values of pointer variables. However, it would, on
request, display the hex address of any other type of variable, as well
as its value. And so, in my program, I would create records,
containing a single float variable, based at the pointer I wanted to
see, and recompile. By requesting the address of these records, I
could determine the value of the pointer.
In PL/I one can allocate an area of memory and declare offset variables
into it. One can freely assign offset variables into pointer variables
and back again -- or so I thought. If a pointer to offset assignment
results in a negative offset, nothing complains (although it should),
but if one assigns the offset back to the pointer, it gets garbage.
This peculiarity caused a very tenacious bug.
----Message 51 (304 chrs) is----
Mail-From: local user C410BL03 at 4-Nov-82 21:52:38-EST
Date: 4 November 1982 2151-EST (Thursday)
From: Bruce.Leverett at CMU-10A
To: Brad.Allen at CMU-10A
Subject: Re: hacking horror stories
In-Reply-To: <04Nov82 210911 BA0C@CMU-10A>
Message-Id: <04Nov82 215100 BL03@CMU-10A>
Don't remember.
----Message 52 (2968 chrs) is----
Mail-From: local user C425EC0F at 4-Nov-82 22:12:20-EST
Date: 4 November 1982 2210-EST
From: eddie caplan at CMU-10A
To: brad allen at CMU-10A
Subject: hacking horror stories
i was doing research in the computer music lab. i was trying to
generate emotional responses in subjects by producing sympathetic
vibrations from the 64 loudspeakers surrounding the listening room.
normally, we would add sub- and ultrasonic frequencies to classical
"standards", and then play them to the subjects.
now, usually we just use frequency modulation to synthesize the
instruments of the classic orchestras. but one day as i was
making an undergraduate volunteer retch to beethoven's seventh
symphony, a thought struck me. if i changed to additive synthesis
for the instruments, i could elicit REALLY BIG responses! i mean,
i had been having pretty good results up 'til then, and i wasn't
complaining. but, with FM there was lots of data lost. additive
synthesis would make the music itself generate an emotional response.
full fidelity beethoven combined with me could convert hasidic jews to
catholicism!
so, i spent the next week redoing the beethoven. i finished at
2:30am, and the only other person around was my officemate, dana.
i asked her if she had heard beethoven's seventh recently. i told
her that i had a recording of boston symphony conducted by klaus
tennstedt. i still remember her eyes lighting up at the prospect. i
hated to lie to her, but she couldn't be told the truth or the data
would be tainted. i had to expose her to it without her suspecting.
i put dana into the listening room and turned on the music with
my sub- and ultrasonic frequencies added.
i watched through the soundproof glass from the observation room.
during the first movement, dana cried uncontrollably. she curled
up in the chair and whimpered. dana laughed insanely, and had what
appeared to be several orgasms.
"i've done it!", i cried.
but then, the second movement began. i shudder still when i think
of it. i looked in at dana. she was sitting upright in the chair,
staring straight ahead, her hands gripping her knees. there was
blood starting to drip from her fingernails. she was becoming
catatonic and starting to shake. i had to halt the processor before
permanent damage was done. but before i was able to stand, dana
let out an excruciating scream. she shook violently and fell to the
floor. then, dana began to float into the air. i pulled open
the door and rushed into the listening room. dana was screaming far
above my head. beethoven was screaming from the 64 speakers.
then, i called her name. it was too much. dana dissolved.
i think that the added sound of me yelling to her exceeded the
threshold. i know now that i am to blame for her dissolving, and
that i'm responsible for bringing her back. perhaps it can be done
with bartok. dana always liked bartok.
eddie
----Message 53 (2694 chrs) is----
Mail-From: CMUFTP host CMU-CS-Spice received by CMU-10A at 4-Nov-82 22:58:54-EST
Date: 4 Nov 1982 22:08-EST
From: Rob.MacLachlan at CMU-CS-SPICE at CMU-10A
Subject: Hacking Horrors
To: Brad.Allen@cmua
Message-Id: <82/11/04 2208.881@CMU-CS-SPICE>
I ran into my most obscure bug last summer when I was working on a boot image
builder for Accent to run under Accent. What I had to do was convert the
original program, which had POS filesystem calls that read and wrote random
things scattered throughout it to use the Accent primitives, which are read
and write an entire file. After factoring this code out into a separate
module I found that the program died the same way about one time out of
five. Since the debugger was virtually nonexistent I proceeded to put in
debugging code. First I put in a check where it was dying for the fatal
condition, which would print various information. I found that when the
error occurred the cause was that the Pascal Get intrinsic was returning a
random value instead of the correct one, but no particular pattern was
observable. I then put in code to dump the contents of the pascal file
object after every value read from the file to see if it was getting
clobbered; with this code in place the program died with an illegal memory
reference inside the system print routine inside of one of the debugging
WriteLn's. At this point it was obvious that something earlier in the
program was damaging the environment somehow, so I tried successively
commenting out earlier parts of the program to find the offending code, and
I found that if I did not read an earlier file, then the problem did not
occur. This caused me to suspect my file handling module, so I put
debugging code in it to check that all of the pointers it was returning were
valid. When this debugging code was inserted the program then died earlier
in the program, but this time consistently during the reading of the third
microcode file. Insertion of debugging code at this point revealed that up to
a point the buffer contained the correct data, but the rest was zero. At
this point I felt reasonably sure that I had found a bug in Accent, so I
called in the wizards, who looked at the address of the buffer and said: 'Oh
that crosses a 64k boundary'. Evidently it was a "Known" bug that a pascal
object could not cross a 64k boundary, because the address calculations wrap
around, and the ReadFile routine I was calling read the file into a place in
memory such that it crossed a 64k boundary. The execution of the debugging
code I put in caused storage to be allocated, thus causing the heap to cross
a 64k boundary earlier in the program.
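The wraparound can be modeled roughly as follows. This is a sketch under an assumption the message does not spell out: that only the low 16 bits of an address take part in the offset addition, so a carry into the upper bits is lost. The function names and addresses are invented for the example.

```python
SEGMENT = 0x10000  # 64K

def crosses_64k(base: int, length: int) -> bool:
    """True if the bytes [base, base + length) span a 64K boundary."""
    return length > 0 and base // SEGMENT != (base + length - 1) // SEGMENT

def wrapped_address(base: int, offset: int) -> int:
    """Address computed when the carry out of the low 16 bits is dropped."""
    return (base & ~0xFFFF) | ((base + offset) & 0xFFFF)

base = 0x1FF00  # hypothetical buffer starting 256 bytes below a boundary
print(crosses_64k(base, 0x400))            # True
print(hex(base + 0x200))                   # 0x20100, the intended address
print(hex(wrapped_address(base, 0x200)))   # 0x10100, 64K too low
```

Under this model the first 256 bytes of the buffer are addressed correctly and everything after them resolves to the wrong place, which matches a buffer that is correct up to a point and wrong beyond it.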
----Message 54 (1784 chrs) is----
Mail-From: local user C410TL19 at 5-Nov-82 01:22:19-EST
Date: 5 November 1982 0122-EST (Friday)
From: Tom.Lane at CMU-10A
To: Brad.Allen at CMU-10A
Subject: Re: Hacking Horror Stories
Message-Id: <05Nov82 012212 TL19@CMU-10A>
Well, after reading your accumulated file I felt like I should
contribute one of my own.
I have spent too many years of my life hacking systems which tried to
enlarge a processor's address space by using software-controlled bank
switching (C.mmp/Hydra & Cm* locally, Hewlett-Packard 9845 out in the
real world; personal computing CP/M systems seem to be going down the
same garden path). These machines extend a processor with (say) a 64K
address space to handle megabytes, by dividing the processor address
space into two to 16 blocks. Each block is mapped to a block of physical
memory by means of an associated processor register. Accessing a
particular memory location requires loading up one of the map registers
with the block number of the location, then accessing the processor-
visible address "register number * block size + location's offset
within block".
This scheme is a LOSER. The majority of bugs found in each system
I have worked with have been directly related to bank switching;
it's just too easy to forget to load or restore a map register.
This leads to reading or clobbering semi-random locations in blocks
other than the one wanted. Worse, the bugs are often very difficult
to duplicate, since they only show up when two data structures being
manipulated at once happen to reside in different physical blocks.
HP's testing records showed that 75% of the bugs discovered during
system testing were of this ilk; many of them required an unreasonable
amount of effort to track down.
tom lane
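The mapping scheme and its characteristic bug can be sketched in a small model. Everything concrete here is an assumption for illustration: a 64K processor space split into 16 blocks of 4K, the class and method names, and the physical block numbers. It does not describe any of the machines named above.

```python
BLOCK_SIZE = 0x1000   # hypothetical: 4K blocks
NUM_MAP_REGS = 16     # 16 blocks fill a 64K processor address space

class BankedMemory:
    """Toy model of software-controlled bank switching."""

    def __init__(self, physical_blocks: int):
        self.physical = bytearray(physical_blocks * BLOCK_SIZE)
        self.map_regs = [0] * NUM_MAP_REGS  # physical block seen through each slot

    def processor_address(self, reg: int, offset: int) -> int:
        """Register number * block size + offset within block."""
        return reg * BLOCK_SIZE + offset

    def _physical_address(self, cpu_addr: int) -> int:
        reg, offset = divmod(cpu_addr, BLOCK_SIZE)
        return self.map_regs[reg] * BLOCK_SIZE + offset

    def read(self, cpu_addr: int) -> int:
        return self.physical[self._physical_address(cpu_addr)]

    def write(self, cpu_addr: int, value: int) -> None:
        self.physical[self._physical_address(cpu_addr)] = value

mem = BankedMemory(physical_blocks=256)  # 1 MB of physical memory
addr = mem.processor_address(2, 0x10)

mem.map_regs[2] = 40      # map structure A's block and store into it
mem.write(addr, 0xAA)

mem.map_regs[2] = 41      # remap the same slot for structure B...
print(hex(mem.read(addr)))  # 0x0: forgot to restore, so this reads block 41

mem.map_regs[2] = 40      # restoring the register makes the data reappear
print(hex(mem.read(addr)))  # 0xaa
```

Note that the bug vanishes whenever both structures happen to sit in the same physical block, which is one way such errors stay hidden until the data moves.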