|
Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!spool.mu.edu!agate!library.ucla.edu!news.mic.ucla.edu!magnesium.club.cc.cmu.edu!news.sei.cmu.edu!cert.org!netnews.upenn.edu!dsinc!gvls1!boojum!esr
|
||
|
From: esr@snark.thyrsus.com (Eric S. Raymond)
|
||
|
Newsgroups: comp.unix.sys5.r4,comp.unix.pc-clone.32bit,comp.bugs.sys5,news.answers
|
||
|
Subject: Known Bugs in the USL UNIX distribution
|
||
|
Message-ID: <1mMD8p#M7bHGP74mn36O7fG8Zm0smCcO=esr@boojum.thyrsus.com>
|
||
|
Date: 5 Aug 93 16:27:16 GMT
|
||
|
Expires: 4 Aug 93 23:30:00 GMT
|
||
|
Sender: esr@boojum.thyrsus.com (Eric S. Raymond)
|
||
|
Followup-To: comp.unix.pc-clone.32bit
|
||
|
Lines: 2102
|
||
|
Approved: news-answers-request@MIT.Edu
|
||
|
Xref: senator-bedfellow.mit.edu comp.unix.sys5.r4:4553 comp.unix.pc-clone.32bit:5794 comp.bugs.sys5:1881 news.answers:11155
|
||
|
|
||
|
Archive-name: usl-bugs
|
||
|
Last-update: 05 Aug 1993
|
||
|
Supersedes: <unknown>
|
||
|
Version: 17.0
|
||
|
|
||
|
Many FAQs, including this one, are available via FTP on the archive site
|
||
|
rtfm.mit.edu (aka pit-manager.mit.edu or 18.172.1.27) in the directory
|
||
|
pub/usenet/news.answers. The name under which this FAQ is archived appears in
|
||
|
the Archive-name line above. This FAQ is updated monthly; if you want the
|
||
|
latest version, please query the archive rather than emailing the overworked
|
||
|
maintainer.
|
||
|
|
||
|
What's new in this issue:
|
||
|
* New bug info (see below)
|
||
|
* Instructions for fixing the FUBYTE problem under Dell 2.2.
|
||
|
|
||
|
*** NEWS FLASH *** NEWS FLASH *** NEWS FLASH *** NEWS FLASH *** NEWS FLASH ***
|
||
|
|
||
|
May's new bug (II.43) is still *really serious*. Get after your
|
||
|
vendor to fix it ASAP!
|
||
|
|
||
|
*** NEWS FLASH *** NEWS FLASH *** NEWS FLASH *** NEWS FLASH *** NEWS FLASH ***
|
||
|
|
||
|
(In the table below, bugs new this issue are marked with a ** at the
|
||
|
left margin; old bugs for which information has been added are marked
|
||
|
with *)
|
||
|
|
||
|
0. Table of Contents
|
||
|
I. Introduction
|
||
|
II. General Bugs
|
||
|
1. UNIX kernel must lie below the 1024-cylinder mark
|
||
|
2. Suid programs dump core when signalled
|
||
|
3. DMAs on large ISA machines may fail
|
||
|
4. There is a cylinder limit on disk size
|
||
|
5. more(1) doesn't handle SIGWINCH
|
||
|
6. X performance problem
|
||
|
7. C shell background process termination logs you out
|
||
|
8. A security hole in login
|
||
|
9. COFF problems with long filenames
|
||
|
10. Flakeouts in the Wangtek device driver
|
||
|
11. A kernel declaration bug
|
||
|
12. Reading tar archives with cpio foos up on multiply-linked files
|
||
|
13. Process accounting is broken
|
||
|
14. tar(1) foos up in the presence of symbolic links
|
||
|
15. Symbolic links can interfere with shellscript execution
|
||
|
16. Piping a csh builtin causes the shell to hang.
|
||
|
17. tar(1) fails to restore adjacent symbolic links properly
|
||
|
18. COFF binaries linked with curses(3) and shared libc hang
|
||
|
19. shl hangs, sxt devices bad
|
||
|
20. num-lock prevents mouse from working properly
|
||
|
21. adjtime() doesn't work
|
||
|
23. cron mail doesn't go through aliasing
|
||
|
24. fragility in xterm
|
||
|
25. csh lossage due to bad optimization
|
||
|
26. Bug in cp(1)
|
||
|
27. tbl -me doesn't work
|
||
|
* 28. who -r fragility leads to boot-time problems
|
||
|
29. at(1) breaks here-documents in shell scripts
|
||
|
30. UHC mouse driver ignores the middle button.
|
||
|
31. mmap access doesn't update file mod times
|
||
|
32. AT&T select(2) is incompatible with BSD select(2)
|
||
|
33. (4.2) The login program requires its PPID to be 1
|
||
|
34. (4.2) Bad MAXMINOR values can make the system unbootable
|
||
|
35. Incompatible change in TZ interpretation
|
||
|
36. Nulls in pixmaps can crash X
|
||
|
37. Potential security hole in SVr4s using sendmail
|
||
|
38. Reporting bug in df on non-root filesystems
|
||
|
39. tar writes -v output to stdout, not stderr
|
||
|
40. SIGPIPE is delayed and not reliable
|
||
|
41. /usr/lib/acct/fwtmp doesn't work
|
||
|
42. whatis database is full of garbage.
|
||
|
43. mmap is seriously broken
|
||
|
** 44. a bug in xterm
|
||
|
** 45. DrawText16() bug in XWIN
|
||
|
** 46. output redirection with exec fails in sh
|
||
|
** 47. rm fails to reject . or .. arguments
|
||
|
III. Serial-port and tty administration problems
|
||
|
1. Dropout problems with tty devices
|
||
|
2. Quick port setup option in sysadm is broken
|
||
|
3. ttymon drops DTR when it shouldn't
|
||
|
4. ttymon doesn't drop DTR when it should
|
||
|
5. (4.2) Terminating cu to a direct line locks up the port
|
||
|
* 6. Hardware flow control bug breaks streaming data transfers
|
||
|
7. Bad interaction between ttymon and networking
|
||
|
IV. Networking and File-Sharing Bugs
|
||
|
1. NFS locking is unusably slow
|
||
|
2. UFS file system problems
|
||
|
3. Byte-order problem with NFS when accessing Sun disks
|
||
|
4. Under weird circumstances, lseek on UFS may cause corruption
|
||
|
5. FTP problems
|
||
|
6. A bug in the WD80x3 support
|
||
|
7. Security hole near fingerd
|
||
|
8. Fatal bug in priority-band message handling.
|
||
|
9. SVr4.0.4 TCP/IP routing is broken
|
||
|
10. df(1) on NFS volumes returns bad data
|
||
|
11. rsh hogs the processor
|
||
|
** 12. MTU for remote networks ignored
|
||
|
** 13. Bug in remote printing.
|
||
|
V. SCSI Support Problems
|
||
|
1. sar is confused by SCSI
|
||
|
2. A configuration problem
|
||
|
3. Synchronous SCSI hang problem
|
||
|
4. ps chokes on commands that do SCSI I/O
|
||
|
5. Transfer speed problems with Adaptec 1542B on 486s
|
||
|
6. df gives inaccurate values for large SCSI partitions
|
||
|
VI. Development Tools Problems
|
||
|
1. General UCB library brokenness
|
||
|
2. USL emulation of BSD signals doesn't work
|
||
|
3. Possible string library problems
|
||
|
4. USL's ndbm support is broken.
|
||
|
5. An include file is missing
|
||
|
6. sscanf(3) has a potential bug
|
||
|
7. shmat(2) vs. vfork(2)
|
||
|
8. FIONREAD fails on regular files
|
||
|
9. fread(3) does the wrong thing on pipes and FIFOs
|
||
|
10. putw appears to be broken
|
||
|
11. Compiler problems
|
||
|
12. getlogin() doesn't work
|
||
|
13. syslog routines don't work
|
||
|
14. Bogus `r' in xt driver configuration flags
|
||
|
15. ioctl for kernel symbol fetches fails (4.2)
|
||
|
** 16. Bug in cc optimizer (4.2.1)
|
||
|
** 17. /usr/ucb/install uses missing group "staff"
|
||
|
VII. The FUBYTE Problem *
|
||
|
VIII. Destiny and Dell
|
||
|
|
||
|
I. Introduction
|
||
|
|
||
|
This posting lists known bugs in System V Release 4 implementations, and known
|
||
|
fixes applied by various porting houses (there are also random bits of
|
||
|
information about SCO UNIX here and there). It was formerly part of the
|
||
|
386-buyers-faq issues 1.0 through 4.0, and is still best read in conjunction
|
||
|
with the pc-unix/software FAQ descended from that posting.
|
||
|
|
||
|
This document is maintained and periodically updated as a service to the net by
|
||
|
Eric S. Raymond <esr@snark.thyrsus.com>, who began it for the very best
|
||
|
self-interested reason that he was in the market and didn't believe in plonking
|
||
|
down several grand without doing his homework first (no, I don't get paid for
|
||
|
this, though I have had a bunch of free software and hardware dumped on me as a
|
||
|
result of it!). Corrections, updates, and all pertinent information are
|
||
|
welcomed at that address.
|
||
|
|
||
|
This posting is periodically broadcast to the USENET group comp.unix.sysv386
|
||
|
and to a list of vendor addresses. If you are a vendor representative, please
|
||
|
check to make sure the information on your company is current and correct. If
|
||
|
it is not, please email me a correction ASAP. If you are a knowledgeable user
|
||
|
of any of these products, please send me a precis of your experiences for the
|
||
|
improvement of future issues.
|
||
|
|
||
|
The bug descriptions often include indications of fixes by the various porting
|
||
|
houses to their current releases. These are:
|
||
|
|
||
|
Consensys UNIX Version 1.3 abbreviated as "Cons" below
|
||
|
Dell UNIX Issue 2.2 abbreviated as "Dell" below
|
||
|
Esix Revision A abbreviated as "Esix" below
|
||
|
Micro Station Technology SVr4 UNIX abbreviated as "MST" below
|
||
|
Microport System V Release 4.0 version 4 abbreviated as "uPort" below
|
||
|
UHC Version 3.6 abbreviated as "UHC" below
|
||
|
SCO Open DeskTop 1.1 abbreviated as "SCO" below
|
||
|
|
||
|
II. General Bugs
|
||
|
|
||
|
1. UNIX kernel must lie below the 1024-cylinder mark
|
||
|
Bela Lubkin says "SCO's boot filesystem must lie below 1024 cylinder mark;
|
||
|
anything else can be anywhere. This is more-or-less a limitation of the BIOS
|
||
|
interface that the bootstrap loader must use. Could be circumvented by going
|
||
|
directly to controller hardware in the bootstrap loader, but that would be
|
||
|
horrendously complex with all the controllers & host adapters to be supported."
|
||
|
Actually this is not quite right. It's the *kernel* that must lie below
|
||
|
the 1K-cylinder mark; the rest of the root partition could extend above it.
|
||
|
But since partition endpoints are the only way to control where physical
|
||
|
blocks get allocated, it comes to the same thing.
|
||
|
Roger Knopf <rogerk@sco.COM> adds: "The 1024 cylinder limit applies
|
||
|
not only to the kernel but also to /boot. Both are read in while we
|
||
|
are using the BIOS to talk to the hard disk. There are 10 bits set
|
||
|
aside in the register for cylinders in the INT 13 call, hence 1024
|
||
|
cylinders. There are a few controllers that allocate 2 more bits (they
|
||
|
are taken away from the space allocated for head bits, I recall). It
|
||
|
is trivial to modify all the relevant boot code to use these bits IF
|
||
|
YOU KNOW THAT THE CONTROLLER WILL USE THEM but I know of no way to
|
||
|
reliably determine that this is the case. Once the kernel is loaded
|
||
|
we use 16 bits everywhere to hold the cylinder number."
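The 10-bit limit Roger describes shows up in how a cylinder/sector pair is
packed into the CX register for a BIOS INT 13h read. What follows is only a
sketch: the function name int13_pack_cx is made up for this example, and it
models just the standard layout (CH holds the low 8 cylinder bits, the top two
bits of CL hold cylinder bits 8 and 9, the low six bits of CL hold the 1-based
sector number). It does not model the nonstandard controllers that borrow
extra bits from the head field.

```c
#include <stdint.h>

/* Hypothetical helper: pack a cylinder and sector into the CX value
   used by BIOS INT 13h (AH=02h).  Returns 0 on success, or -1 if the
   request cannot be expressed in the standard register layout. */
int
int13_pack_cx (unsigned cylinder, unsigned sector, uint16_t *cx)
{
  uint8_t ch, cl;

  if (cylinder > 1023)            /* only 10 cylinder bits available */
    return -1;
  if (sector < 1 || sector > 63)  /* sectors are 1-based, 6 bits */
    return -1;

  ch = (uint8_t) (cylinder & 0xFF);
  cl = (uint8_t) (((cylinder >> 2) & 0xC0) | sector);
  *cx = (uint16_t) ((ch << 8) | cl);
  return 0;
}
```

Any block the boot code must fetch through this interface has to pass the
first check, which is why the kernel and /boot need to sit below cylinder 1024.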
|
||
|
|
||
|
2. Suid programs dump core when signalled
|
||
|
Mark Snitily of SGCS says that under many SVr4s, signalling a
|
||
|
process that is running suid root will cause it to core-dump. He says
|
||
|
Dell and MST have fixed this, and SCO doesn't suffer from this.
|
||
|
|
||
|
3. DMAs on large ISA machines may fail
|
||
|
On ISA machines with more than 16MB of RAM, SVr4 may try to do DMA
|
||
|
from outside the bus's address space, causing serious problems. UNIX ought
|
||
|
to do an in-memory copy to within the low 16MB but the USL base code doesn't.
|
||
|
Dell says they've fixed this, and that's been confirmed by a user.
|
||
|
UHC says they've fixed this; they add that the special buffer-allocation
|
||
|
logic to handle the problem can be turned off with a tunable kernel parameter
|
||
|
if you've got less than 16M.
|
||
|
Microport says they've fixed this in their new 4.1 release, shipping early
|
||
|
March.
|
||
|
Esix offers a patch to correct this problem.
|
||
|
SCO used to have a similar bug but fixed it long ago.
|
||
|
John Sully <jms@mport.com> writes: "This was due to a bug in pre version 4
|
||
|
dma code. The USL code has always at least attempted to do a copy from low
|
||
|
memory to high memory on systems with more than 16Mb of RAM. By the way UHC is
|
||
|
wrong; the buffer allocation code only comes into play if you have more than
|
||
|
16Mb of memory. You can turn it off if you have a machine (ie. an EISA bus)
|
||
|
which will allow you to do DMA above 16Mb. You *must* have this tunable
|
||
|
(MAXDMAPAGE) turned on if you are using *ISA* bus masters in a system with more
|
||
|
than 16Mb of ram. Unfortunately doing this will affect all drivers which do
|
||
|
dma as there is no good way to do this on a per-driver basis."
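The test a driver has to make before starting an ISA transfer can be sketched
as below. This is not USL's code; the names isa_dma_reachable and
ISA_DMA_LIMIT are invented for illustration. The only fact it relies on is
that the ISA bus carries 24 address bits, so a bus master cannot address
memory at or above 16MB.

```c
#define ISA_DMA_LIMIT 0x1000000UL   /* 16MB: ISA carries 24 address bits */

/* Hypothetical check: return 1 if the physical range [addr, addr+len)
   lies entirely below 16MB and can be handed to an ISA DMA device
   directly; return 0 if the data must first be staged through a
   buffer in low memory. */
int
isa_dma_reachable (unsigned long addr, unsigned long len)
{
  if (addr >= ISA_DMA_LIMIT)
    return 0;
  return len <= ISA_DMA_LIMIT - addr;
}
```

When the check fails, the kernel has to copy through a buffer below 16MB,
which is presumably the work the MAXDMAPAGE tunable switches on and off.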
|
||
|
|
||
|
4. There is a cylinder limit on disk size
|
||
|
Stock USL code is limited to 1,024 cylinders per Winchester, which
|
||
|
might cause problems with some disk drives.
|
||
|
Microport, Dell, Esix, MST, and UHC have fixed this.
|
||
|
|
||
|
5. more(1) doesn't handle SIGWINCH
|
||
|
It doesn't get its window size from the stty/termio structures, so it
|
||
|
doesn't cope with SIGWINCH properly.
|
||
|
|
||
|
6. X performance problem
|
||
|
Stock X11R4 and R5 (at least prior to 1.2E) are said to hog the
|
||
|
processor if you use the LOCALCONNECT option. Jan Brittenson
|
||
|
<bson@gnu.ai.mit.edu> posted the following workaround:
|
||
|
|
||
|
I don't know what causes the standard X server to hog the CPU, but
|
||
|
it can be avoided. Use the following program instead of xinit. Compile
|
||
|
it with `$CC -O -o xserv xserv.c -lX11' where CC is either
|
||
|
/usr/ccs/bin/cc or gcc. Set DISPLAY and XINITRC and run `xserv' from
|
||
|
your home directory. This is just a q&d hack, and not really a
|
||
|
substitute for xinit -- but it works.
|
||
|
|
||
|
/* xserv.c -- start X server

   Start X server. Similar to xinit, but intended to
   circumvent the X386 CPU Hog Mode

   Jan Brittenson, June 2 1992 05:15 am
   with corrections by Adam Donnison <adam@shinto.saki.com.au> Tue, 2 Mar 1993
*/

#include <stdio.h>
#include <sys/types.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>
#include <libgen.h>

#include <X11/Xlib.h>
#include <X11/Xos.h>
#include <X11/Xmu/SysUtil.h>


extern int errno;

/* This may need to be "/usr/X386/bin/X386" */
#define DEFAULT_XPATH "/usr/bin/X11/X"

/* Start X server. Fork-exec server, passing the DISPLAY environment
   variable. Wait for server to get up and running (at which point it
   passes back a SIGUSR1), at which point the user xinitrc file is run. */

#define XINITRC ".xinitrc"
#define DEFAULT_XCOMMAND "xterm -g +1+1 -n login -display :0"

extern void *malloc (), free ();
extern char *basename (), *getenv (), *strcpy ();

/* X stuff */
Display *top_display;


/* This is supposed to be in libgen.a... */
static char
*basename (s0)
     char *s0;
{
  register char *s1;

  for (s1 = s0 + strlen (s0) - 1;
       s1 > s0 && *s1 != '/'; s1--);

  if (*s1 == '/')
    return s1+1;

  return s1;
}

jmp_buf sigusr1_frame;

static void
caught_sigusr1 (int dummy) { longjmp (sigusr1_frame, !0); }


static char
*dispname (s0)
     char *s0;
{
  register char *s1;

  for (s1 = s0 + strlen (s0) - 1;
       s1 > s0 && *s1 != ':'; s1--);

  return s1;
}


/* No arguments */
int
main (argc, argv)
     int argc;
     char **argv;
{
  char *xserver_file, *xinitrc_file, *home_path, *display, *display_X_arg;
  int xserver_pid, orgmask;


  /* Not that it really matters, just to avoid being used as a direct
     replacement for xinit. */

  if (argc != 1)
    {
      fprintf (stderr, "usage: %s\n", basename (*argv));
      exit (1);
    }


  /* Resolve xinitrc path. This is done before the server is
     started. */

  if (!(home_path = getenv ("HOME")))
    home_path = "/etc";

  if (!(xinitrc_file = getenv ("XINITRC")))
    {
      xinitrc_file = malloc (strlen (home_path) + 1 + strlen (XINITRC) + 1);
      sprintf (xinitrc_file, "%s/%s", home_path, XINITRC);
    }
  else
    xinitrc_file = strdup (xinitrc_file);


  /* Resolve display */
  if (!(display = getenv ("DISPLAY")))
    display = display_X_arg = ":0.0";
  else
    display_X_arg = dispname (display);


  /* Tell server to notify us when up and running */
  signal (SIGUSR1, SIG_IGN);
  orgmask = sigblock (sigmask (SIGUSR1));

  /* Start server */
  if (!(xserver_pid = vfork ()))
    {
      xserver_file = DEFAULT_XPATH;

      execl (xserver_file, xserver_file, display_X_arg, NULL);

      fprintf (stderr, "%s: can't exec %s (errno = %d) -- start-up aborted\n",
               basename (*argv), xserver_file, errno);
      exit (1);
    }

  if (xserver_pid < 0)
    {
      fprintf (stderr, "%s: can't fork (errno = %d) -- start-up aborted\n",
               basename (*argv), errno);

      exit (1);
    }

  /* Await signal from server */
#if 0
  /* Why the #@$*! doesn't this work?! */
  sigsetmask (orgmask);
  alarm (20);
  sigpause (sigmask (SIGUSR1) | sigmask (SIGALRM));
#else
  sleep (5);
#endif

  /* Open display */
  if (!(top_display = XOpenDisplay (display)))
    {
      fprintf (stderr, "%s: unable to open display '%s' -- start-up aborted\n",
               basename (*argv), display);
      exit (1);
    }

  /* Execute xinitrc file */
  if (system (xinitrc_file) < 0)
    system (DEFAULT_XCOMMAND);

  /* Close display */
  XCloseDisplay (top_display);

  /* Terminate server */
  kill (xserver_pid, SIGTERM);

  /* Finished */
  free (xinitrc_file);
}
|
||
|
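The disabled "#if 0" branch in xserv.c probably fails for two reasons, though
neither has been verified on an SVr4 system. First, the parent sets SIGUSR1 to
SIG_IGN, and an ignored signal is discarded rather than held pending. Second,
BSD sigpause() installs its argument as the blocked set, so passing
sigmask (SIGUSR1) keeps out the very signal being waited for. The X server
notifies its parent only when the server itself inherited SIGUSR1 as SIG_IGN,
so the ignore belongs in the child between fork and exec, while the parent
installs a handler. Below is a sketch of the parent side using POSIX signal
calls instead of the BSD ones. The names ready_wait_prepare and ready_wait are
invented for this example, and it assumes fork() rather than vfork().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t server_ready;
static volatile sig_atomic_t timed_out;

static void on_sigusr1 (int sig) { (void) sig; server_ready = 1; }
static void on_sigalrm (int sig) { (void) sig; timed_out = 1; }

/* Install handlers and block both signals so that neither can be
   lost before ready_wait() runs.  Call this before forking; the child
   should then do signal (SIGUSR1, SIG_IGN) just before exec. */
int
ready_wait_prepare (sigset_t *oldmask)
{
  struct sigaction sa;
  sigset_t block;

  server_ready = 0;
  timed_out = 0;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = on_sigusr1;
  if (sigaction (SIGUSR1, &sa, NULL) < 0)
    return -1;
  sa.sa_handler = on_sigalrm;
  if (sigaction (SIGALRM, &sa, NULL) < 0)
    return -1;

  sigemptyset (&block);
  sigaddset (&block, SIGUSR1);
  sigaddset (&block, SIGALRM);
  return sigprocmask (SIG_BLOCK, &block, oldmask);
}

/* Sleep until SIGUSR1 arrives or `seconds' elapse.  Returns 1 if the
   server signalled, 0 on timeout.  Restores the caller's signal mask. */
int
ready_wait (const sigset_t *oldmask, unsigned seconds)
{
  sigset_t waitmask = *oldmask;

  sigdelset (&waitmask, SIGUSR1);
  sigdelset (&waitmask, SIGALRM);
  alarm (seconds);
  while (!server_ready && !timed_out)
    sigsuspend (&waitmask);     /* atomically unblock and wait */
  alarm (0);
  sigprocmask (SIG_SETMASK, oldmask, NULL);
  return server_ready ? 1 : 0;
}
```

In xserv.c terms, ready_wait_prepare() would replace the signal/sigblock pair
and ready_wait (&old, 20) would replace the sleep (5).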
|
||
|
7. C shell background process termination logs you out
|
||
|
In C shell, unless "ignoreeof" is set, termination of a background
|
||
|
process will log you out. With "ignoreeof" set, just the message
|
||
|
"Use logout to exit" will be printed.
|
||
|
|
||
|
8. A security hole in login
|
||
|
David Wexelblat <dwex@mtgzfs3.att.com> reports: "There is a HUGE security
|
||
|
hole in /bin/login in all USL derived SVR4s before 4.0.4. Refer to CERT
|
||
|
advisory CA-91:08, dated 5/23/91. This is known to be present in AT&T SVR4
|
||
|
2.1, and Microport SVR4 3.1. ESIX claims to have fixed it, Microport reports
|
||
|
that it is fixed in 4.1. I won't give any more details unless necessary.
|
||
|
Suffice to say that this bug allows any non-privileged user on an SVR4 system
|
||
|
to get read-write access to any file on the system."
|
||
|
|
||
|
9. COFF problems with long filenames
|
||
|
A source at Dell urges: "Our SVR4v2 did some stuff that USL didn't get
|
||
|
around to until SVR4v4. Try Dell UNIX 2.1 with a COFF program on a large UFS
|
||
|
filesystem in a directory with long names. Runs on Dell UNIX. Breaks on
|
||
|
others." I don't have more definite info yet.
|
||
|
|
||
|
10. Flakeouts in the Wangtek device driver
|
||
|
Dell reports that USL's Wangtek device driver is seriously flaky. "How'd
|
||
|
you like a multi volume backup where the second and subsequent volumes don't
|
||
|
follow on from the previous volumes?" UHC confirms this and is actively
|
||
|
working on the problem.
|
||
|
An anonymous SCOer says "The QIC02 tape controller `standard' is seriously
|
||
|
flaky. Our driver's in pretty good shape but nobody will ever have a truly
|
||
|
solid driver that supports every QIC02 controller you can find."
|
||
|
Gordon Ross <gwr@mc.com> reports: "Actually, the SCSI tape target driver
|
||
|
`st01' has a similar problem at version 4.0.3 which I corrected while I worked
|
||
|
on the SVR4 code. The correction was provided to the support group at USL.
|
||
|
The actual problem was that the SCSI tape would return a `check status'
|
||
|
completion code which was just trying to inform the driver of the arrival
|
||
|
of the `logical end of media' indication but the driver was treating it
|
||
|
as an error. The tape drive had in fact written the data, but the driver
|
||
|
incorrectly assumed that the "check status" return meant that it failed.
|
||
|
The result of this is that when you write into the end of the tape, you
|
||
|
can read back one more "chunk" than you wrote. Of course, cpio does not
|
||
|
like this at all when doing multi-volume backups..."
|
||
|
|
||
|
11. A kernel declaration bug
|
||
|
A botch in USL's /etc/conf/pack.d/kernel/space.c (which is present in
|
||
|
Consensys 1.3, Dell 2.1, Esix 4.0.3A, Microport 4.0.3 and 4.0.4 and may also be
|
||
|
present in other SVr4s) can step on the linesw[] table. The problem is that
|
||
|
the domain name array initialization is wrong and too short; thus, when it's
|
||
|
set, data past the end of the array can be stomped. To fix this, find the
|
||
|
following near line 247:
|
||
|
|
||
|
char srpc_domain[] = SRPC_DOMAIN;
|
||
|
|
||
|
and change it to
|
||
|
|
||
|
char srpc_domain[SYS_NMLN] = SRPC_DOMAIN;
|
||
|
|
||
|
then rebuild the kernel.
|
||
|
Microport officially knows about this bug and plans to fix it in a
|
||
|
near-future update release. It has been fixed in Dell 2.2.
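Why the one-line change matters: an array declared without a size takes its
size from its initializer, so a short default leaves no room for a longer
domain name set later. The sketch below only illustrates that C rule. The
values given for SYS_NMLN and SRPC_DOMAIN are assumptions made for the
example, not figures taken from any vendor's space.c.

```c
#include <stddef.h>

#define SYS_NMLN    257   /* assumed value; check your <sys/utsname.h> */
#define SRPC_DOMAIN ""    /* hypothetical placeholder default */

char unsized_domain[] = SRPC_DOMAIN;        /* as shipped */
char sized_domain[SYS_NMLN] = SRPC_DOMAIN;  /* with the fix */

/* Bytes actually reserved for each variant. */
size_t unsized_capacity (void) { return sizeof unsized_domain; }
size_t sized_capacity (void) { return sizeof sized_domain; }
```

With these assumed values the unsized array reserves a single byte, so any
real domain name written into it lands on whatever the linker placed next.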
|
||
|
|
||
|
12. Reading tar archives with cpio foos up on multiply-linked files
|
||
|
Paul De Bra <debra@info.win.tue.nl> reports the following:
|
||
|
In theory, cpio(1) is supposed to be able to read tar(1) archives. In
|
||
|
practice...don't try it. Multiply-linked files will be extracted from the
|
||
|
archive, whether or not they match the current pattern and whether or not
|
||
|
you have selected 'u'. This happens even if you use the `t' option, so
|
||
|
it's not even safe to list the archive files!
|
||
|
|
||
|
13. Process accounting is broken
|
||
|
In 4.0.3, process accounting doesn't work. From examining the accounting
|
||
|
scripts, it appears that /usr/lib/acct/accton is supposed to set a return code
|
||
|
depending on whether accounting was switched on already or not. However, it
|
||
|
always returns the same result - accounting switched off. This means that the
|
||
|
/usr/lib/acct/ckpacct script, which is run every hour to keep the process
|
||
|
accounting log in check, instead turns off accounting the first time it is run
|
||
|
after booting. The same happens with the nightly /usr/lib/acct/monacct
|
||
|
program.
|
||
|
I don't yet know whether this bug is present in 4.0.4. It is definitely
|
||
|
un-fixed in Dell 2.1 and Consensys 1.3. In Dell 2.2 the return bug is fixed,
|
||
|
but accounting isn't automatically enabled at boot time.
|
||
|
|
||
|
14. tar(1) foos up in the presence of symbolic links
|
||
|
Tar can get the names of symbolic links wrong when creating an archive.
|
||
|
This bug can be demonstrated by doing the following:
|
||
|
|
||
|
mkdir t
cd t
touch a 1234567890
ln -s 1234567890 b
ln -s a c
tar vcf ../t.tar .
|
||
|
|
||
|
The output generated by tar is:
|
||
|
|
||
|
a ./ 0 tape blocks
a ./a 0 tape blocks
a ./1234567890 0 tape blocks
a ./b symbolic link to 1234567890
a ./c symbolic link to a234567890
|
||
|
|
||
|
(Note the above commands should be done in the order shown and in a new
|
||
|
directory) This bug is nasty. Recommended solution: use GNU tar.
|
||
|
This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
|
||
|
other SVr4s as well. It has been fixed in Dell 2.2.
|
||
|
|
||
|
15. Symbolic links can interfere with shellscript execution
|
||
|
There is a problem running #! scripts when symbolic links are involved.
|
||
|
Typing in the following from a command shell demonstrates the problem:
|
||
|
|
||
|
mkdir a b
ln -s a c
cd a
cat > script <<!
#!/bin/sh
echo Hello
!
chmod 755 script
cd ../b
ln -s ../c/script .
./script
|
||
|
|
||
|
The message generated from the last line is:
|
||
|
|
||
|
a/script: a/script: cannot open
|
||
|
|
||
|
This is reported from Esix 4.0.3, Consensys 1.3, and Dell 2.2, but
|
||
|
probably exists on other SVr4s as well.
|
||
|
|
||
|
16. Piping a csh builtin causes the shell to hang.
|
||
|
While running csh, this can be demonstrated by some of the following:
|
||
|
|
||
|
echo Hello | cat
|
||
|
history | more
|
||
|
|
||
|
(A solution to this one is to use tcsh-6.02.)
|
||
|
This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
|
||
|
other SVr4s as well. It is reported fixed in Dell 2.2.
|
||
|
|
||
|
17. tar(1) fails to restore adjacent symbolic links properly
|
||
|
Arthur Krewatt <...!rutgers!mcdhup!kilowatt!krewat> reports:
|
||
|
SVR4 tar has another strange bug. Seems if when restoring files, you
|
||
|
restore one file that is a link, say "a ->/a/b/c/d/e" and there is another
|
||
|
link just after it called "b ->/a/b/c" tar will restore it as "b ->/a/b/c/d/e"
|
||
|
This just seems to be a lack of the NULL at the end of the string, like
|
||
|
someone did a memmove or memcpy(dest,src,strlen(src)); where it should be
|
||
|
strlen(src)+1 to include the NULL.
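The pattern Arthur suspects is easy to reproduce in isolation. This is a
guess at the shape of the bug, not tar's actual source; the function names
are invented for the example.

```c
#include <string.h>

/* Suspected bug pattern: copies the characters but not the
   terminating NUL, so leftovers from a longer earlier name survive. */
void
copy_link_buggy (char *dst, const char *src)
{
  memcpy (dst, src, strlen (src));
}

/* Corrected form: the +1 copies the terminator as well. */
void
copy_link_fixed (char *dst, const char *src)
{
  memcpy (dst, src, strlen (src) + 1);
}
```

Reusing one zero-filled buffer, copy_link_buggy with "/a/b/c/d/e" and then
with "/a/b/c" leaves "/a/b/c/d/e" behind, matching the report. The same
pattern with "1234567890" followed by "a" yields "a234567890", which is the
symptom shown under bug 14.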
|
||
|
|
||
|
18. COFF binaries linked with curses(3) and shared libc hang
|
||
|
...eating the CPU. Cause unknown.
|
||
|
|
||
|
19. shl hangs, sxt devices bad
|
||
|
shl(1) does not work. Try creating a layer and doing an 'ls'. Your session
|
||
|
hangs. Bruce Momjian <root%candle.uucp@bts.com>, who reported this bug, says
|
||
|
he believes it is the sxt devices which are broken. It definitely exists in
|
||
|
Consensys 1.3.
|
||
|
|
||
|
20. num-lock prevents mouse from working properly
|
||
|
When using the Motif window manager, if your num lock is on, your mouse
|
||
|
clicks are not recognized by the window manager. The mouse still works in
|
||
|
xterm(1). This is allegedly fixed in Destiny (4.2).
|
||
|
Under Dell 2.2 if num lock is on there's no problem, but if scroll lock
|
||
|
is on then mouse clicks aren't recognised.
|
||
|
|
||
|
21. adjtime() doesn't work
|
||
|
Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 adjtime() doesn't.
|
||
|
Calling `date -a' works to adjust the time slowly.
|
||
|
|
||
|
23. cron mail doesn't go through aliasing
|
||
|
Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 cron mail to adm
|
||
|
doesn't get redirected by the aliases file.
|
||
|
|
||
|
24. fragility in xterm
|
||
|
Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, doing ~! from
|
||
|
a cu in xterm kills xterm. This has been fixed in Dell 2.2.
|
||
|
|
||
|
25. csh lossage due to bad optimization
|
||
|
If a csh user sources a non-existent file in their .cshrc (eg, source .alias,
|
||
|
where .alias doesn't exist), then the system will hang for a couple of minutes.
|
||
|
Eventually the user gets an "Out of memory" error and the console logs "NOTICE:
|
||
|
out of swap space - Insufficient memory to allocate 2 pages - system call
|
||
|
failed".
|
||
|
This appears to be due to over-optimization of code surrounding a longjmp
|
||
|
call.
|
||
|
(There are numerous other reports of memory leak bugs in csh).
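One well-known way an optimizer can break code around longjmp: C leaves the
value of a non-volatile local variable indeterminate after a longjmp if the
variable was changed after the setjmp call, because the compiler may keep it
in a register that the jump restores. Whether this is what bites csh is not
known. The sketch below, with invented names, just shows the safe form.

```c
#include <setjmp.h>

static jmp_buf retry_frame;

/* Jump back to the setjmp point until `attempt' reaches `limit'. */
static void
fail_until (int attempt, int limit)
{
  if (attempt < limit)
    longjmp (retry_frame, 1);
}

/* Returns how many passes were needed.  `attempt' must be volatile:
   it is modified between setjmp and longjmp, and without the
   qualifier its value after the jump would be indeterminate. */
int
attempts_needed (int limit)
{
  volatile int attempt = 0;

  setjmp (retry_frame);         /* control returns here on each longjmp */
  attempt++;
  fail_until (attempt, limit);
  return attempt;
}
```

Without the volatile qualifier, an optimizing compiler could legally make the
counter never advance, and the loop would spin until something else gave out.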
|
||
|
|
||
|
26. Bug in cp(1)
|
||
|
If ``copy'' encounters a directory before a file, it dumps core ...
|
||
|
|
||
|
--- cut ---
cd /tmp
mkdir copybug jnk
cd jnk
mkdir directory
>file
cp -r * /tmp/copybug
--- cut ---
|
||
|
|
||
|
This was reported from Consensys 4.0.3 but is probably a generic SVr4 bug.
|
||
|
It appears to have been fixed in ESIX SVR4.0.3A and Dell 2.2.
|
||
|
|
||
|
27. tbl -me doesn't work
|
||
|
Wolfgang Denk reports that trying to use "tbl -me" for any input file causes
|
||
|
tbl to quit. The problem is that newer tbl versions don't accept [nt]roff
|
||
|
control lines (".rm @W") after .TS.
|
||
|
|
||
|
28. who -r fragility leads to boot-time problems
|
||
|
It coredumps if the name of the timezone (TZ) is longer than three characters
|
||
|
and the length is a multiple of four. This can be a real problem for European
|
||
|
sites... and is potentially more hazardous than immediately apparent as _a
|
||
|
lot_ of the initialization scripts (rc1.d, rc2.d) use ``who -r'' to see if the
|
||
|
machine is in single- or multi-user mode. And when ``who'' bombs out, the
|
||
|
``set'' command is given an empty command-line and can't do much else than print
|
||
|
the shell variables, $1-$9 remain empty ... meaning that more or less all the
|
||
|
scripts fail in various ways and the system has an exceptionally hard time
|
||
|
coming up.
|
||
|
Peter Wemm <peter@DIALix.oz.au> reports that this bug was present in Dell
|
||
|
2.0, fixed in Dell 2.1, but reappeared in Dell 2.2. Dell says it's a generic
|
||
|
USL bug.
|
||
|
There is an easy workaround; make sure /etc/inittab is an odd number of
|
||
|
characters long. The bug is caused by an off-by-one in a buffer malloc.
|
||
|
|
||
|
29. at(1) breaks here-documents in shell scripts
|
||
|
at adds gratuitous empty lines to the job submitted by the user.
|
||
|
This prevents shell here-documents from working.
|
||
|
|
||
|
30. UHC mouse driver ignores the middle button
|
||
|
This may be a generic USL problem, but Dell (at least) has fixed it. UHC
|
||
|
says they have a patch for it, but I haven't seen the patch.
|
||
|
|
||
|
31. mmap access doesn't update file mod times
|
||
|
Peter Wemm <peter@DIALix.oz.au> reports that under SVr4, if one mmap()'s a
|
||
|
file, and writes to it via the mapped memory, when the disk is updated, the
|
||
|
modification time does not update.
|
||
|
|
||
|
32. AT&T select(2) is incompatible with BSD select(2)
|
||
|
|
||
|
Paul Eggert <eggert@twinsun.com>, as quoted by James Buster <bitbug@lynx.com>
|
||
|
reports:
|
||
|
|
||
|
The select() system call waits for read, write, or exception activity
|
||
|
on a set of file descriptors, and yields an integer telling you how
|
||
|
much activity it found.
|
||
|
|
||
|
BSD's select(N,&R,&W,&E,&T) can yield up to 3*N, because BSD's select()
|
||
|
counts the number of bits that it turns on in the R, W, and E
|
||
|
arguments, and R, W, and E each contain one bit per file descriptor.
|
||
|
However, System V Release 4 v2.1's select(N,&R,&W,&E,&T) yields at most N,
|
||
|
because SVR4's select() just counts the number of active file
|
||
|
descriptors, regardless of how many bits it turns on.
|
||
|
|
||
|
For example, the following code checks file descriptor 0. In BSD, this
|
||
|
code can set n to 2 if file descriptor 0 is ready for both reading and
|
||
|
writing. However, in SVR4, this code sets n to at most 1, because only
|
||
|
file descriptor 0 is active.
|
||
|
|
||
|
int n;
fd_set r, w;
FD_ZERO(&r); FD_SET(0, &r);
FD_ZERO(&w); FD_SET(0, &w);
n = select(1, &r, &w, (fd_set*)0, (struct timeval*)0);
|
||
|
|
||
|
At least one widely used piece of software depends on the BSD
|
||
|
behavior, namely X11R5 (see Xt/NextEvent.c). In this application, the
|
||
|
bug's symptoms are subtle and are rarely encountered, but they do
|
||
|
exist.
|
||
|
|
||
|
Most of X11R5's calls to select() don't care about this difference,
|
||
|
but the following files in the X11R5 distribution contain calls to
|
||
|
select() that may be affected by this bug:
|
||
|
|
||
|
contrib/lib/i18nXView2/lib/libxview/notify/ndetselect.c
|
||
|
contrib/lib/xview3/lib/libxview/notify/ndetselect.c
|
||
|
mit/fonts/server/os/waitfor.c
|
||
|
mit/lib/Xt/NextEvent.c
|
||
|
mit/server/os/WaitFor.c
|
||
|
|
||
|
(Note: this is a very old bug. Paul Eggert tells me that William Kucharski reported this bug to AT&T in 1989 when he ported X11R3!)
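Code that has to run under both conventions can avoid trusting select()'s
return value beyond "greater than zero" and count the bits itself. A minimal
sketch follows; the helper name count_ready_bits is made up for this example,
and <sys/select.h> is the modern POSIX header (older systems declare fd_set
through <sys/types.h> and <sys/time.h>).

```c
#include <stddef.h>
#include <sys/select.h>

/* Count the descriptor bits left set in up to three fd_sets, which is
   the number BSD's select() would have returned.  Any set may be NULL. */
int
count_ready_bits (int nfds, fd_set *r, fd_set *w, fd_set *e)
{
  int fd, count = 0;

  for (fd = 0; fd < nfds; fd++)
    {
      if (r && FD_ISSET (fd, r)) count++;
      if (w && FD_ISSET (fd, w)) count++;
      if (e && FD_ISSET (fd, e)) count++;
    }
  return count;
}
```

Called on the sets after select() returns a positive value, this gives the
BSD-style total whichever way the underlying system counts.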
|
||
|
|
||
|
33. (4.2) The login program requires its PPID to be 1
|
||
|
Rick Richardson reports: "The "/bin/login" program has been changed to be
|
||
|
hardwired to require its PPID to be "1". In all other versions of UNIX, it is
|
||
|
sufficient that there be an /etc/utmp entry. This bug was reported to USL, and
|
||
|
I did get a fixed "login" program from them, but the fix did not make it into
|
||
|
the release. I don't know how mere mortals get the fix at this point."
|
||
|
|
||
|
34. (4.2) Bad MAXMINOR values can make the system unbootable
|
||
|
Rick Richardson reports: "If MAXMINOR is stune'ed to the maximum value,
|
||
|
0x3ffff (18 bits), then the kernel will refuse to boot, cycling up to driver
|
||
|
initialization and then doing a processor reset. Interestingly, this bug was
|
||
|
not in the beta release, but was in the final release."
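
   A note on the numbers in this report: 0x3fff is the largest value that fits
in 14 bits, while the largest 18-bit value is 0x3ffff, so one of the two
figures quoted is presumably a typo. The arithmetic, as a small C helper (the
name max_for_bits is hypothetical):

```c
/*
 * Largest value representable in an unsigned field that is `bits'
 * bits wide.  `bits' must be smaller than the width of unsigned long.
 */
unsigned long
max_for_bits (unsigned bits)
{
    return (1UL << bits) - 1UL;
}
```

max_for_bits(14) is 0x3fff and max_for_bits(18) is 0x3ffff.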
|
||
|
|
||
|
35. (4.2) Incompatible change in TZ interpretation
|
||
|
Rick Richardson reports: "While not really a bug, this is a surprise. In
|
||
|
4.2, the TZ variable was given a new meaning. Rather than the traditional
|
||
|
CST6CDT type of value, it now looks like ":US/Central". This causes 3.2 and
|
||
|
4.0 binaries which use the date/time routines to report GMT time. I have no
|
||
|
idea why another variable name was not chosen. I've taken to aliasing the
|
||
|
binaries, e.g. "TZ=CST6CDT svr4binary"."
|
||
|
Mike "Ford" Ditto <ford@omnicron.com> corrects this. "This change
|
||
|
was made in 4.0, not 4.2, and 4.0 binaries should have no problem with
|
||
|
the new format. Some 4.0 systems use the new format by default. The
|
||
|
old format should be avoided unless SVR3 binaries are in use, since
|
||
|
the new features of the time conversion libraries are only available
|
||
|
if the new format is used."
|
||
|
Christoph Badura points out that the time functions still read the old
|
||
|
TZ format, so you can set TZ=CST6CDT or whatever and only the new features
|
||
|
will be disabled.
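
   A quick way to see what the C library makes of a given TZ value is to set
it and ask for the local hour of a known instant. The sketch below assumes a
POSIX-style setenv(); a system of this vintage may only offer putenv(). The
function name local_hour_in_zone is invented for the example:

```c
#include <stdlib.h>
#include <time.h>

/*
 * Return the local hour (0-23) the C library reports for time t when
 * TZ is set to tz, or -1 on failure.  Note that this changes TZ for
 * the whole process.
 */
int
local_hour_in_zone (const char *tz, time_t t)
{
    struct tm *tm;

    if (setenv ("TZ", tz, 1) != 0)
        return -1;
    tzset ();                   /* make the library re-read TZ */
    tm = localtime (&t);
    return tm ? tm->tm_hour : -1;
}
```

A binary whose library cannot parse the TZ value it is given will typically
report the same hour as GMT, which is the symptom described above.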
|
||
|
|
||
|
36. Nulls in pixmaps can crash X
|
||
|
Rick Richardson reports: "Displaying XPM2 pixmaps which have NULLS in them
|
||
|
will crash the X server. Admittedly, this is not much of a bug, since these
|
||
|
are ill-formed or corrupted pixmaps. But the server should stay up, even in
|
||
|
these conditions. A little error checking needed."
|
||
|
|
||
|
37. Potential security hole in SVr4s using sendmail
|
||
|
Christoph Badura writes: "/usr/ucblib/aliases contains an alias for
|
||
|
decode that feeds straight into uudecode. I don't know under what uid
|
||
|
uudecode gets invoked, but if it's root anyone can overwrite any file
|
||
|
on a SVR4 system running the stock sendmail. [Under Dell UNIX] it
|
||
|
appears that the files get created with a user-ID of "daemon". Not
|
||
|
nice but better than root."
|
||
|
|
||
|
38. Reporting bug in df on non-root filesystems
|
||
|
Paul Debra <debra@win.tue.nl> discovered that if df(1) is run on a
|
||
|
filesystem other than root with an argument of `.', the file system
|
||
|
name is always reported as '/'. This does *not* happen if you give
|
||
|
it $PWD as argument.
|
||
|
This bug is present in Dell 2.2.
|
||
|
|
||
|
39. tar writes -v output to stdout, not stderr
|
||
|
This is an incompatible, undocumented change from earlier UNIXes and
|
||
|
royally screws up invocations like /bin/tar cvf - foo | /bin/tar tf - that
|
||
|
previously worked.
|
||
|
Observed in ESIX 4.0.3A and 4.0.4, Dell 2.2; probably generic. It
|
||
|
also existed in SCO ODT and Xenix before 2.0 and 3.2v4, but has been fixed in
|
||
|
these most recent versions.
|
||
|
|
||
|
40. SIGPIPE is delayed and not reliable
|
||
|
Wolfgang Denk reports a kernel bug in src/uts/i386/fs/fifofs/fifovnops.c
|
||
|
that results in SIGPIPE not getting raised immediately by failed writes.
|
||
|
You can reproduce this with the following program:
|
||
|
|
||
|
1 #include <stdio.h>
|
||
|
2 #include <signal.h>
|
||
|
3
|
||
|
4 extern int errno;
|
||
|
5
|
||
|
6 int sp();
|
||
|
7
|
||
|
8 int eop = 0;
|
||
|
9
|
||
|
10 char *line = "This is garbage.\n";
|
||
|
11
|
||
|
12 main () {
|
||
|
13 int i;
|
||
|
14 int l = strlen (line);
|
||
|
15
|
||
|
16 signal (SIGPIPE, sp);
|
||
|
17 for (;;) {
|
||
|
18 /*
|
||
|
19 for (i=0; i<10000; ++i) ;
|
||
|
20 */
|
||
|
21 if (write(1, line, l) != l) {
|
||
|
22 fprintf (stderr, "write error, errno=%d, eop=%d\n",
|
||
|
23 errno, eop);
|
||
|
24 fflush (stderr);
|
||
|
25 exit (errno);
|
||
|
26 }
|
||
|
27 }
|
||
|
28 }
|
||
|
29
|
||
|
30 int sp()
|
||
|
31 {
|
||
|
32 fprintf (stderr, "SIGPIPE\n");
|
||
|
33 fflush (stderr);
|
||
|
34 eop = 1;
|
||
|
35 }
|
||
|
|
||
|
   To test this, pipe its output to ls.
|
||
|
|
||
|
He writes: "That is, you can't be sure that SIGPIPE will be raised when a pipe
|
||
|
breaks. Adding a short delay (for instance by uncommenting the for loop around
|
||
|
line 19) gives _always_ SIGPIPE -- but usually you don't want to have
|
||
|
additional delays in your program :-("
|
||
|
|
||
|
Bernard Fouche <bernard@cpio1.fr.mugnet.org> observes that this is
|
||
|
not necessarily a bug. He writes: "Compile your example with the
|
||
|
following change :
|
||
|
|
||
|
- do not include your delay loop.
|
||
|
- add a line between line 24 and 25. This line will be :
|
||
|
sleep(60);
|
||
|
This change will make a.out stay alive for 1 minute before
|
||
|
exiting.
|
||
|
- recompile, run with 'a.out|ls'.
|
||
|
- do 'ps -le |grep a.out'.
|
||
|
|
||
|
What you'll see is that a.out is now running in the background and its
|
||
|
father is init(1)! So the return value of write(2) (EIO) can now be
|
||
|
understood.
|
||
|
|
||
|
The only thing that I can tell is that pipes, that are now based on
|
||
|
streams in SVR4, have a more complex behavior than in SVR3.2 but I
|
||
|
would not call problem #40 a 'bug'. It can be related to the shell
|
||
|
that ran the command and/or the scheduler and/or the stream subsystem."
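
   Whichever way one reads this, a program that cannot count on prompt SIGPIPE
delivery can ignore the signal and rely on the return value of write(2)
instead. A minimal sketch using only standard POSIX calls; the names
ignore_sigpipe and write_all are made up here. Since the report above saw EIO
rather than EPIPE, any -1 return should be treated as "the reader has gone
away" rather than testing for one particular errno:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ask for an error return from write(2) instead of a SIGPIPE signal. */
void
ignore_sigpipe (void)
{
    (void) signal (SIGPIPE, SIG_IGN);
}

/*
 * Write all len bytes of buf to fd, retrying after short writes and
 * EINTR.  Returns 0 on success, or -1 with errno set by write(2).
 */
int
write_all (int fd, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t n = write (fd, buf, len);

        if (n == -1)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Call ignore_sigpipe() once at startup, then check every write_all() result.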
|
||
|
|
||
|
41. /usr/lib/acct/fwtmp doesn't work
|
||
|
John F. Haugh reports that under Dell UNIX the /usr/lib/acct/fwtmp command
|
||
|
does not work as described in the man page; the output contains no line
|
||
|
feeds and appears to be garbage. I have verified this.
|
||
|
This is probably a generic SVr4 bug.
|
||
|
|
||
|
42. whatis database is full of garbage.
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> reports: "Both under ESIX
|
||
|
4.0.3 and 4.0.4, whatis database contains an awful lot of garbage, such as
|
||
|
nroff macros. In addition, quite a lot of man pages mentioned are missing, and
|
||
|
several available man pages are not mentioned. Since makewhatis is broken (at
|
||
|
least under 4.0.3A), this cannot be repaired easily. ESIX blamed USL for
|
||
|
this."
|
||
|
|
||
|
43. mmap is seriously broken
|
||
|
|
||
|
(thanks to Peter Wemm <peter@zeus.dialix.oz.au> for a detailed report.)
|
||
|
|
||
|
ALL SVR4.0s have/had a nasty kernel bug that causes seemingly random executable
|
||
|
and shared library corruption, and also unleashes a SERIOUS security bug. The
|
||
|
"Copy-on-Write" mechanism within the kernel has bugs. It is sufficient to say
|
||
|
that the security related bug allows any user with shell and compiler access to
|
||
|
WRITE to any file that they can read.
|
||
|
|
||
|
SVR4.2 has been fixed for some time. ICL apparently fixed it in their sparc
|
||
|
reference port (and x86 port), which means that Solaris 2.x does not have the
|
||
|
bugs.
|
||
|
|
||
|
The most common symptom of shared library corruption is that programs
|
||
|
simply core dump when you attempt to access a non existing file.
|
||
|
|
||
|
$ more /notexisting
|
||
|
Segmentation Fault (core dumped).
|
||
|
|
||
|
To recover from this, restore /usr/lib/libc.so.1 from the distribution media.
|
||
|
|
||
|
The security bugs have no known workaround, other than crippling the mmap()
|
||
|
function in the kernel.
|
||
|
|
||
|
Dell has produced a fix for their release 2.2 systems. The patch is
|
||
|
available from dell1.dell.com:/support2.2/CoW.t
|
||
|
|
||
|
   Although it has not been tested, it is very unlikely that Dell's patch will
|
||
|
work on any other SVR4/386, as it replaces two kernel modules, and Dell's
|
||
|
kernel has autoconfiguration extensions that are not present in other systems.
|
||
|
|
||
|
Dell 2.2 has got a STREAMS optimizer function enabled in the system that joins
|
||
|
together small adjacent streams messages. There were bugs in the early USL
|
||
|
versions of this, but for 2.2, Dell enabled it after applying a fix from USL.
|
||
|
It seems that in some rare circumstances, some machines are quite unstable with
|
||
|
this enabled as default. support2.2/CoW.t also disables the optimization to
|
||
|
improve stability. This brings Dell 2.2 into line with the other SVR4.0.4
|
||
|
systems.
|
||
|
|
||
|
44. a bug in xterm
|
||
|
|
||
|
   Nickolay Saukh <nms@ussr.eu.net> reports "xterm strips off the eighth bit of
the first character in a line. This bug was present in x11r5 but fixed by some
patch. I have no exact info under my thumb."
   (Can anyone else confirm this bug?)

45. DrawText16() bug in XWIN
   Nickolay Saukh <nms@ussr.eu.net> reports "XWIN bug for DrawText16(). If one
tries to output a text line with more than one font, then the text segment with
the second font (and subsequent segments) is displayed shifted to the left. This
bug was also fixed by some patch to x11r5."
   (Can anyone else confirm this bug?)

46. output redirection with exec fails in sh
   Andreas Luik <luik@isa.de> reports: "In Bourne shell scripts, the output of
all following commands may be redirected using the "exec" builtin with an
output redirection, e.g.

        exec > LOG

If such a construct is used in a for loop with a variable filename for the
redirection, e.g. exec > $f, only the first output redirection is executed in
the SVR4 /bin/sh. It works correctly in /bin/ksh as well as in the HPUX, SunOS
4.1 and AIX Bourne shells."

47. rm fails to reject . or .. arguments
   Andreas Luik <luik@isa.de> reports: "rm does not check for `.' and `..'
arguments. The rm program should check for the arguments `.' and `..' (at
least if called with the -r option) and ignore these arguments with the message
"rm: cannot remove `.' or `..'". All implementations I'm aware of perform this
check. As far as I know, this check is also in the SVR4 sources but implemented
incorrectly. This bug should be fixed for security reasons."

III. Serial-port and tty administration problems
|
||
|
|
||
|
1. Dropout problems with tty devices
|
||
|
The most serious problem anyone has reported is that the USL asy driver is
|
||
|
flaky and occasionally drops characters at speeds above 4800 baud.
|
||
|
Microport, Dell, Esix, and UHC say that they believe they've fixed this.
|
||
|
However, Dell, at least, was mistaken when they first made this claim; a more
|
||
|
detailed description of the problem is given below. I have been assured that
|
||
|
this is on the fix list for the next Dell release.
|
||
|
Bela Lubkin at SCO comments "386 interrupt latency vs. unbuffered UARTs.
|
||
|
This is a tough problem. Nobody's driver should drop characters with a
|
||
|
turned-on 16550. It's not so easy with a 16450. Anyone with 16450s or lower
|
||
|
should be able to solve their problems by dropping in a 16550."
|
||
|
|
||
|
2. Quick port setup option in sysadm is broken
|
||
|
In 4.0.3 sysadm, the quick port setup option, which is used to add and
|
||
|
delete terminal ports, is seriously broken. The script modifies /etc/conf/*
|
||
|
files, and has incorrect minor numbers, sets the 5th field of sdevice.d to Y
|
||
|
when it should be N, and is missing columns for node.d. See
|
||
|
/usr/sadm/sysadm/bin/q-add. This bug is present in USL 4.2 as well
|
||
|
(certainly in Consensys V.4.2).
|
||
|
|
||
|
3. ttymon drops DTR when it shouldn't
|
||
|
Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 the ttymon(1)
|
||
|
utility for HDB uucp drops DTR every few weeks. The workaround is to disable
|
||
|
and re-enable it.
|
||
|
The SVr4.2 ttymon is even more broken; it *never* raises DTR after the
|
||
|
first outgoing call. Jeremy Chatfield at IF has confirmed that this is a
|
||
|
real bug in the USL sources and is on his urgent-fix list.
|
||
|
|
||
|
In the May 10, 1993 issue of Open Systems Today, page 70, Jason Levitt
|
||
|
describes some of his ttymon problems. He has a file posted on ftp.uu.net
|
||
|
under /published/open-systems-today/other/svr42uucp.tar. This tar file
|
||
|
contains a fixed ttymon program along with a text file describing setting
|
||
|
up ttymon and uucp so that it works pretty well.
|
||
|
|
||
|
4. ttymon doesn't drop DTR when it should
|
||
|
Stephen Hebditch <steveh@orbital.demon.co.uk> reports from a Dell
|
||
|
2.2 system:
|
||
|
"When a user logs out, ttymon does not appear to lower the DTR line for
|
||
|
a sufficiently long enough time to always cause the modem to drop
|
||
|
carrier. The WorldBlazer modem here is set to its default of 50ms DTR
|
||
|
detection time - the minimum time allowable - but around 2 times out of
|
||
|
10, when a user logs out it will not drop carrier although the DTR
|
||
|
light on its front panel can be seen to blink momentarily.
|
||
|
Disabling service for a particular device (e.g. using 'pmadm -d -p
|
||
|
ttymon3 -s 00') will only work if ttymon hasn't spawned a child process
|
||
|
for that port.
|
||
|
According to the manual "ttymon should exit if no one types anything in
|
||
|
<timeout> seconds after the prompt is sent". Occasionally when hanging
|
||
|
up an outgoing connection, spurious characters can trigger ttymon into
|
||
|
thinking that there is a new user wanting to log in. Because it has
|
||
|
seen these characters, ttymon will then not time-out, locking up that
|
||
|
port until the controlling ttymon child process is killed."
|
||
|
See the fix note attached to III.3.
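
   For reference, POSIX termios defines output speed B0 as "hang up": the
modem control lines are no longer asserted while it is in effect. A local
hangup helper that holds the line down for a chosen interval, longer than the
modem's DTR detection time, might look like the sketch below. This is not
ttymon's code and not the fix mentioned above; the name drop_dtr and its
parameters are invented for illustration, and whether the modem actually drops
carrier still depends on the serial driver:

```c
#include <termios.h>
#include <time.h>

/*
 * Hang up the line on fd by selecting output speed B0, hold it there
 * for hold_ms milliseconds, then restore the original settings.
 * Returns 0 on success, -1 (errno set) if fd is not a usable tty.
 */
int
drop_dtr (int fd, long hold_ms)
{
    struct termios saved, hup;
    struct timespec ts;

    if (tcgetattr (fd, &saved) == -1)
        return -1;
    hup = saved;
    if (cfsetospeed (&hup, B0) == -1
        || tcsetattr (fd, TCSANOW, &hup) == -1)
        return -1;

    ts.tv_sec = hold_ms / 1000;
    ts.tv_nsec = (hold_ms % 1000) * 1000000L;
    (void) nanosleep (&ts, NULL);   /* a signal may cut the hold short */

    return tcsetattr (fd, TCSANOW, &saved);
}
```

With a 50ms detection time on the modem, a hold of a few hundred milliseconds
leaves a comfortable margin.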
|
||
|
|
||
|
5. (4.2) Terminating cu to a direct line locks up the port
|
||
|
The problem is the C2 security mechanisms. Terminating cu with ~.
|
||
|
doesn't tear them down correctly. Subsequently, another cu(1) will be
|
||
|
able to get at the port, but utilities which try to get at it directly (i.e.,
|
||
|
cat or stty) won't be.
|
||
|
Rick Richardson <rick@digibd.com> adds: "The "cu" problem where ports
|
||
|
can't be used by stty, seyon, or other programs once "cu" has had its way
|
||
|
with them: This problem apparently affects any program (cu, uucp) that uses
|
||
|
the DIAL(3) routines. Those routines have been modified to use the "cs"
|
||
|
connection server daemon to open the port and/or dial a phone number on behalf
|
||
|
of the client (though you'd hardly realize this from reading the manual page).
|
||
|
The "cs" daemon does *something*, where *something* is not known yet, which
|
||
|
causes all subsequent termio type ioctl's to fail. This bug has been reported
|
||
|
to USL and Univel, but no fix has been forthcoming."
|
||
|
He continues: "I had our streams device driver guy put in a version of one
|
||
|
of our serial port drivers with debugging turned on, and he said that it looked
|
||
|
like the driver "close" routine was never getting called - possibly because the
|
||
|
device close call only happens on the last close of a device, and the
|
||
|
connection server has still got the port open. This theory would seem to
|
||
|
indicate that "cu" and "uucp" are fine, but that the connection server is
|
||
|
broken. We don't really know, though -- it's just a theory."
|
||
|
See the fix note attached to III.3.
|
||
|
|
||
|
6. Hardware flow control bug breaks streaming data transfers
|
||
|
Stephen Hebditch <steveh@orbital.demon.co.uk> reports from a Dell
|
||
|
2.2 system:
|
||
|
"There is a definite problem with hardware flow control. If
|
||
|
characters are being continually sent to the modem with no break, then
|
||
|
after around 40K or so the asy driver will ignore the fact that the
|
||
|
modem has lowered the CTS line and will keep on sending. Up to that
|
||
|
point it will correctly stall when the CTS line is lowered. If there
|
||
|
is a break in sending, then flow control will work correctly once
|
||
|
more. This means that streaming protocols such as Z-Modem will break
|
||
|
but simpler protocols like UUCP g which don't fill up the modem buffer
|
||
|
will work correctly."
|
||
|
Your editor has seen this one himself while attempting to use rz
|
||
|
for uploads to his friendly Internet site, as was his wont under SVr3.
|
||
|
I now get around this by using ymodem protocol for uploads.
|
||
|
This is probably a generic bug in 4.0.4 serial handling.
|
||
|
|
||
|
7. Bad interaction between ttymon and networking
|
||
|
Stephen Hebditch <steveh@orbital.demon.co.uk> reports from a Dell
|
||
|
2.2 system:
|
||
|
"A problem with ttymon, in.telnetd and in.rlogind. When a user logs out,
|
||
|
wrong entries are written to utmp and wtmp. This results in utmp and
|
||
|
wtmp containing a new record for that user for a session starting at
|
||
|
the time that they logged out. This results in some programs (finger
|
||
|
for example) showing that users are logged in when they are not and
|
||
|
means that login accounting is not possible."
|
||
|
See the fix note attached to III.3.
|
||
|
|
||
|
IV. Networking and File Sharing Bugs
|
||
|
|
||
|
1. NFS locking is unusably slow
|
||
|
Randy Terbush <randy@dsndata.dsndata.com> has posted code which
|
||
|
demonstrates a serious bug in the SVr4 NFS locking daemon.
|
||
|
In his own words: "The symptoms are ~30% cpu usage by 'lockd' and
|
||
|
severe slowing of the machines on the network. This program
|
||
|
demonstrates that it takes ~20 seconds to obtain locks from an ailing
|
||
|
'lockd'. We have verified that this bug does not exist in HPUX 8.0x."
|
||
|
Randy's code is too large to be included here. He is, quite
|
||
|
rightly, exercised at USL's exceedingly slow response to this problem.
|
||
|
The comment in his makefile reads, in part:
|
||
|
|
||
|
# USL has admitted to the existence of this bug in version 4.0, 4.1,
|
||
|
# and 4.2 of their distributed and yet to be released sources. This is
|
||
|
# a network crippling problem that they have refused to fix until
|
||
|
# release 4.3, which will be OVER 1 YEAR from today. (29 Oct 1992)
|
||
|
# If your version of 'lockd' exhibits this same problem, I would
|
||
|
# strongly urge you to contact your vendor and ask them to put some
|
||
|
# pressure on USL to fix this problem. SVR4 is virtually useless in a
|
||
|
# network of shared resources while this problem exists.
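
   For readers who just want a rough measurement on their own systems, the
fragment below times a single fcntl() lock and unlock. It is NOT Randy's
program, only a minimal sketch with a made-up function name (time_lock). On a
file that lives on an NFS mount, fcntl() locks are serviced through lockd, so
the elapsed time gives some idea of how healthy the daemon is:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Time how long it takes to obtain and release an exclusive fcntl()
 * lock on the whole of the file open (for writing) on fd.  Returns
 * the elapsed seconds, or -1.0 if any call fails.
 */
double
time_lock (int fd)
{
    struct flock fl;
    struct timeval t0, t1;

    memset (&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    if (gettimeofday (&t0, NULL) == -1
        || fcntl (fd, F_SETLKW, &fl) == -1)
        return -1.0;

    fl.l_type = F_UNLCK;
    if (fcntl (fd, F_SETLK, &fl) == -1
        || gettimeofday (&t1, NULL) == -1)
        return -1.0;

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}
```

Open a scratch file on the NFS-mounted filesystem with O_RDWR and pass the
descriptor in; on a local filesystem the result should be a small fraction of
a second.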
|
||
|
|
||
|
2. UFS file system problems
|
||
|
In stock USL 4.0.3, you can't use a UFS file system as the root; the system
|
||
|
hangs if you try. Consensys, Dell, Esix, Microport, MST, and UHC all
|
||
|
appear to have fixed this.
|
||
|
David Aitken, the UNIX product manager at UHC, writes "The ufs as root file
|
||
|
system [problem] was not really a bug, just a little oversight on USL's part -
|
||
|
we have fixed it completely by adding one line to the /stand/boot script:
|
||
|
rootfstype=ufs!" He adds that they've been using ufs on their lab machines for
|
||
|
over 10 months with no trouble, and the latest UHC release defaults to ufs if
|
||
|
you have more than 120MB of disk.
|
||
|
|
||
|
3. Byte-order problem with NFS when accessing Sun disks
|
||
|
Christoph Badura <bad@generics.ka.sub.org> notes that the stock USL resolver
|
||
|
library suffers from serious confusion about the byte order in the
|
||
|
sockaddr_in structure. This bug is acknowledged by USL for the 4.0.4
|
||
|
release. A symptom of this bug is that Sun disks will not mount correctly over
|
||
|
NFS. As a workaround, try removing the references to /usr/lib/resolv.so from
|
||
|
/etc/netconfig and rebooting your system. Unfortunately, this will mean
|
||
|
you can't use nameservers.
|
||
|
Alan Batie <batie@agora.rain.com> writes: "Actually, you don't have to
|
||
|
remove resolv.so, just put tcpip.so first and have a hosts file with the names
|
||
|
of hosts you want to do NFS mounts from. This way you can use nameservers for
|
||
|
most things."
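
   For reference, both the port and the address in a sockaddr_in are supposed
to be stored in network (big-endian) byte order, whatever the host's native
order. A minimal sketch of filling one in correctly follows. The function name
fill_sockaddr_in is hypothetical, and inet_pton() is the modern POSIX call;
code of this vintage would more likely have used inet_addr():

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill *sa for the dotted-quad address `dotted' and the host-order
 * port number `port'.  Both sin_port and sin_addr end up in network
 * byte order.  Returns 0 on success, -1 if `dotted' does not parse.
 */
int
fill_sockaddr_in (struct sockaddr_in *sa, const char *dotted,
                  unsigned short port)
{
    memset (sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons (port);        /* host to network order */
    return inet_pton (AF_INET, dotted, &sa->sin_addr) == 1 ? 0 : -1;
}
```

A library that skips the htons() step, or applies it twice, produces
addresses that only look right on a big-endian machine such as a Sun.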
|
||
|
|
||
|
4. Under weird circumstances, lseek on UFS may cause corruption
|
||
|
Christoph Badura <bad@generics.ka.sub.org> reports that a UFS lseek() to an
|
||
|
offset which is a multiple of 4096 but not a multiple of 8192, followed by a
|
||
|
write(), may corrupt the file being written. The bug shows up only if the
|
||
|
file has no pages in the page pool associated with it at the seek offset and at
|
||
|
4k before the seek offset. He has sent USL a kernel fix for this, which was
|
||
|
included in 4.0.4.
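
   The offset condition can be written down as a one-line test, which may help
anyone auditing code for seeks that could trigger it on an unpatched kernel.
The function name is_risky_ufs_offset is made up for this sketch, and it
checks only the offset, not the page-pool condition:

```c
/*
 * Nonzero if `offset' matches the trigger described above: a multiple
 * of 4096 that is not also a multiple of 8192.
 */
int
is_risky_ufs_offset (long offset)
{
    return offset >= 0 && offset % 4096 == 0 && offset % 8192 != 0;
}
```

Offsets 4096 and 12288 match; 0, 8192 and 16384 do not.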
|
||
|
|
||
|
5. FTP problems
|
||
|
The in.ftpd on SVR4.0.3 does not support all the commands listed in RFC 959.
|
||
|
When recent SCO UNIX/ODT versions ftp to SVR4.0.3, the SVR4 side will refuse,
|
||
|
drop the connection, and core dump after you authenticate. This is because the
|
||
|
SCO end sends the 'SYST' command ala RFC 959, and the SVR4.0.3 end doesn't
|
||
|
recognise it. Some ports have fixed this.
|
||
|
   Christoph Badura adds: "The bug is due to a longjmp(3) on a sigjmpbuf obtained
|
||
|
by sigsetjmp(3). ARGH. Testing led to a bug in the original BSD sources, which
|
||
|
is still present in the NET/2 ftpd. "
|
||
|
|
||
|
6. A bug in the WD80x3 support
|
||
|
MST reports a serious bug in the SVr4 kernel support for this card. Here's
|
||
|
how to reproduce it:
|
||
|
|
||
|
server: init 3 and share (export) /usr for example.
|
||
|
|
||
|
client: mount -F nfs server:/usr /mnt
|
||
|
cd /mnt
|
||
|
find . -print | cpio -ocBuv > /dev/null
|
||
|
|
||
|
what happens:
|
||
|
server and client will "hang" together.
|
||
|
|
||
|
"cue":
|
||
|
hit keys on server and/or client, hang will go away
|
||
|
for 10-20 seconds temporarily. Yank BNC connectors
|
||
|
do the same trick.
|
||
|
|
||
|
They say they've heard from customers that this happens on Dell, UHC as well
|
||
|
as USL 4.0.4. PCNFS/BWNFS network xcopy suffers this as well. Client can be a
|
||
|
Sun Sparc for that matter.
|
||
|
|
||
|
7. Security hole near fingerd
|
||
|
Jerry Whelan <guru@stasi.bradley.edu> reports:
|
||
|
We encountered a cute security hole in AT&T SVR4 2.1 (which I believe
|
||
|
translates to USL 4.0.2). It apparently was fixed in AT&T SVR4 3.0. The
|
||
|
hole related to the finger daemon. If a user set his .plan to a symbolic
|
||
|
link pointing to a protected file (such as /etc/shadow, or somebody's
|
||
|
mail file) then fingering the user would cause the finger daemon to read
|
||
|
that file and display it.
|
||
|
I don't know if the bug exists in any other vendor's versions of 4.0.2.
|
||
|
We replaced our fingerd with gnu finger, only to find the same problem.
|
||
|
I sent the changes back to the gnu finger developer, but I don't think a
|
||
|
newer fixed version has been officially released yet.
|
||
|
Steve Peltz <peltz@cerl.uiuc.edu> writes: "The fix to the fingerd problem
|
||
|
(pointing a .plan file to a protected file and thus getting read access to it)
|
||
|
can be fixed by changing inetd.conf to not give root privileges to the fingerd
|
||
|
process. It seems like overkill to have fingerd set to the user id of the
|
||
|
person you're fingering to see if you should have access to the file."
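
   Another possible line of defence for a finger daemon is to refuse to
display a .plan that is a symbolic link at all. A minimal sketch using
lstat(), which reports on the link itself rather than on its target, follows.
The name is_plain_file is invented here and this is not code from any fingerd.
Note that checking first and opening afterwards leaves a small race window, so
dropping root privileges as Steve describes remains the sturdier fix:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if path names a regular file and is not itself a symbolic
 * link, 0 otherwise (including when lstat() fails).
 */
int
is_plain_file (const char *path)
{
    struct stat st;

    if (lstat (path, &st) == -1)
        return 0;
    return S_ISREG (st.st_mode) ? 1 : 0;
}
```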
|
||
|
|
||
|
8. Fatal bug in priority-band message handling.
|
||
|
   Douglas C. Schmidt <schmidt@liege.ICS.UCI.EDU> reports:
|
||
|
There is a bug with handling priority-band messages that causes several
|
||
|
System V Release 4 versions (particularly Solaris 2.1) to crash. The following
|
||
|
code replicates the problem. Sun has been notified and claims they will fix
|
||
|
this problem in the next release (2.2?).
|
||
|
|
||
|
/* This program causes System V Release 4 to crash! */
|
||
|
#include <sys/types.h>
|
||
|
#include <sys/fcntl.h>
|
||
|
#include <stdio.h>
|
||
|
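#include <stropts.h>
#include <stdlib.h>             /* srand, rand, exit */
#include <string.h>             /* strlen */
#include <time.h>               /* time */
#include <unistd.h>             /* pipe, fork, close, unlink */
#include <sys/stat.h>           /* mkfifo */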
|
||
|
|
||
|
#define FIFO "/tmp/foo"
|
||
|
#define BIGFILE "/usr/dict/words"
|
||
|
|
||
|
static int
|
||
|
do_child (int fifo_fd)
|
||
|
{
|
||
|
struct strbuf msg;
|
||
|
char buf[BUFSIZ];
|
||
|
|
||
|
msg.maxlen = sizeof buf;
|
||
|
msg.buf = buf;
|
||
|
|
||
|
do
|
||
|
{
|
||
|
int flags = 0;
|
||
|
|
||
|
if (getmsg (fifo_fd, 0, &msg, &flags) != -1)
|
||
|
(void) printf ("(%2d) (%2d): %s",
|
||
|
                (int) (msg.len - sizeof (int)), *(int *) msg.buf, msg.buf + sizeof (int));
|
||
|
else
|
||
|
return -1;
|
||
|
}
|
||
|
while (msg.len != 0);
|
||
|
|
||
|
return 0;
|
||
|
}
|
||
|
|
||
|
static int
|
||
|
do_parent (int fifo_fd)
|
||
|
{
|
||
|
FILE *fp;
|
||
|
char buf[BUFSIZ];
|
||
|
|
||
|
(void) srand ((unsigned) time (0));
|
||
|
|
||
|
if ((fp = fopen (BIGFILE, "r")) == 0)
|
||
|
return -1;
|
||
|
|
||
|
  while (fgets (buf + sizeof (int), sizeof buf - sizeof (int), fp) != 0)
|
||
|
{
|
||
|
struct strbuf msg;
|
||
|
int band = rand () % 11;
|
||
|
|
||
|
msg.buf = buf;
|
||
|
msg.len = strlen (buf + sizeof (int)) + 1 + sizeof (int);
|
||
|
*(int *) buf = band;
|
||
|
|
||
|
if (putpmsg (fifo_fd, 0, &msg, band, MSG_BAND) == -1)
|
||
|
return -1;
|
||
|
}
|
||
|
return 0;
|
||
|
}
|
||
|
|
||
|
int
|
||
|
main (void)
|
||
|
{
|
||
|
int fd;
|
||
|
|
||
|
#if defined (TEST_FIFO)
|
||
|
(void) unlink (FIFO);
|
||
|
|
||
|
if (mkfifo (FIFO, 0666) == -1)
|
||
|
perror ("mkfifo"), exit (1);
|
||
|
#else
|
||
|
int pipe_fds[2];
|
||
|
|
||
|
if (pipe (pipe_fds) == -1)
|
||
|
perror ("pipe"), exit (1);
|
||
|
#endif
|
||
|
|
||
|
switch (fork ())
|
||
|
{
|
||
|
case -1:
|
||
|
perror ("fork"), exit (1);
|
||
|
/* NOTREACHED */
|
||
|
case 0:
|
||
|
#if defined (TEST_FIFO)
|
||
|
if ((fd = open (FIFO, O_RDONLY)) == -1)
|
||
|
return -1;
|
||
|
#else
|
||
|
fd = pipe_fds[0];
|
||
|
close (pipe_fds[1]);
|
||
|
#endif
|
||
|
if (do_child (fd) == -1)
|
||
|
perror ("do_child"), exit (1);
|
||
|
|
||
|
break;
|
||
|
default:
|
||
|
#if defined (TEST_FIFO)
|
||
|
if ((fd = open (FIFO, O_WRONLY)) == -1)
|
||
|
return -1;
|
||
|
#else
|
||
|
fd = pipe_fds[1];
|
||
|
close (pipe_fds[0]);
|
||
|
#endif
|
||
|
if (do_parent (fd) == -1)
|
||
|
perror("do_parent"), exit (1);
|
||
|
break;
|
||
|
}
|
||
|
return 0;
|
||
|
}
|
||
|
|
||
|
9. SVr4.0.4 TCP/IP routing is broken
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> reports:
|
||
|
"I found a problem with ESIX 4.0.4 TCP/IP routing. I'm not sure if it's also
|
||
|
present in other SVR4 flavors. The problem is that once a system has received
|
||
|
an ICMP route redirect message, it is supposed to store the new route in its
|
||
|
routing tables. This does not work properly, which is revealed by ping(1)ing
|
||
|
to a host through a gateway in a more complex network configuration. Almost
|
||
|
every packet is sent to another gateway than the one which corresponds with the
|
||
|
network of the destination. This in turn leads to an enormous amount of ICMP
|
||
|
messages, which leads to bad network throughput. We also had some mysterious
|
||
|
crashes until we decided to change the network configuration to circumvent this
|
||
|
problem."
|
||
|
(This seems very likely to be a generic SVr4 problem).
|
||
|
|
||
|
10. df(1) on NFS volumes returns bad data
|
||
|
Raymond Nijssen reports from Esix 4.0.3A and 4.0.4: " Diskspace figures of
|
||
|
NFS mounted filesystems reported by both /bin/df and /usr/ucb/df are 4 times
|
||
|
too big."
|
||
|
|
||
|
11. rsh hogs the processor
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> reports from Esix 4.0.3A and
|
||
|
4.0.4: "The rsh command hogs the CPU. On an empty system, `rsh foo -n bar'
|
||
|
takes 1 second kernel-mode CPU per second elapsed."
|
||
|
|
||
|
12. MTU for remote networks ignored
|
||
|
Nathan D. Lane <nathan@seldon.foundation.tricon.com> reports: "Esix 4.0.4
|
||
|
ignores the MTU for remote networks. I have PPP setup on my RS/6000 and the
|
||
|
Esix box connects via ethernet to the RS/6000. Packets are always sent out
|
||
|
"full size" by the Esix machine, no matter where their destination. It is my
|
||
|
understanding that, when routing to a remote network where the MTU is a)
|
||
|
unknown or b) set to something lower than 1536, the originating machine should
|
||
|
make the packets smaller. Instead, when the Esix box blasts out its packets
|
||
|
across the PPP link, it sends them full size, making the other end do *a lot*
|
||
|
of packet reassembly."
|
||
|
This has not been confirmed on other ports, but seems likely to be
|
||
|
a generic SVR4 problem.
|
||
|
|
||
|
13. Bug in remote printing.
|
||
|
A couple of USENETters have reported that the remote-printing support for
|
||
|
lp (the System V print spooler) is broken in SVr4.0. Printing is done
|
||
|
correctly, but the job is not then removed from the print queue on either
|
||
|
system.
|
||
|
|
||
|
V. SCSI Support Problems
|
||
|
|
||
|
1. sar is confused by SCSI
|
||
|
Sar -d doesn't work on SCSI drives. Dell fixed this in 2.1 and it's
|
||
|
reported to work OK in Esix 4.0.3A; no report of any other SVr4 having fixed
|
||
|
this yet. SCO fixed it in 3.2.4. Appears to be fixed in USL 4.2.
|
||
|
|
||
|
2. A configuration problem
|
||
|
Stock USL 4.0 requires you to jumper your SCSI devices to fixed IDs
|
||
|
during installation (it can be changed to any other ID after). Specifically,
|
||
|
the tape must be ID 6.
|
||
|
Dell says they've fixed this. The requirement is definitely still present
|
||
|
in Esix and Consensys 1.3. UHC thinks they've fixed this, but their 4.0.3.6
|
||
|
release still seems to demand ID 1 to install.
|
||
|
I've seen an email report that USL 4.2 still has this problem. But after
|
||
|
publishing this, I got a request for more info from Mike Drangula
|
||
|
<miked@usl.com> at USL. He wrote:
|
||
|
|
||
|
> As far as
|
||
|
> I know ( and I wrote the SCSI configuration tools for 4.2 ), there is only
|
||
|
> one case where a device is required to be at a particular SCSI ID, unless
|
||
|
> you count the requirement that the HBA be at ID 7.
|
||
|
>
|
||
|
> The only requirement for a given SCSI id is that, on a SCSI-based MCA
|
||
|
> machine that uses IBM's SCSI Host Adapters, the boot disk must be at ID 6
|
||
|
> if there is more than one disk installed on the HBA.
|
||
|
>
|
||
|
> The old requirement that the tape be set to SCSI ID 6 is no longer in effect.
|
||
|
> If your HBA will support booting from it, there is not even a requirement
|
||
|
> that the boot SCSI disk be at SCSI ID 0. The only requirement for disks is
|
||
|
> that the boot disk must have the lowest SCSI ID of any DISKS on the system
|
||
|
> ( except in the already noted case of MCA SCSI )
|
||
|
|
||
|
Give Mike a hand for actually reading this bug list.
|
||
|
|
||
|
3. Synchronous SCSI hang problem
|
||
|
David Wexelblat <dwex@mtgzfs3.att.com> reports: "Stock SVR4.0.3 will hang
|
||
|
the SCSI bus with a 1542 in synchronous mode. Dell fixed this, and this has
|
||
|
been given to Microport [ed note: Microport 4.0.4 and Consensys 4.0.3 have
|
||
|
fixed the problem; MST UNIX and Esix 4.0.3 still have this problem; I have not
|
||
|
yet been able to determine if ESIX 4.0.4 does]. In the file /sbin/bcheckrc,
|
||
|
change the line:
|
||
|
|
||
|
echo MARK > /dev/rswap
|
||
|
|
||
|
to
|
||
|
|
||
|
echo MARK | dd of=/dev/rswap bs=512 conv=sync > /dev/null 2>&1
|
||
|
|
||
|
The magic is apparently the conv=sync, which forces a 512 byte block
|
||
|
to be written. The original echo writes 4 bytes, which apparently causes
|
||
|
synchronous SCSI to go out to lunch.
|
||
|
|
||
|
Now, you ask, how can I fix this, since the system won't boot? There are
|
||
|
a couple of methods. First, if possible, disable synchronous negotiation
|
||
|
(1542 jumper J5-1 removed, plus whatever you may need to do to your drive).
|
||
|
Then boot up, edit /sbin/bcheckrc, then shutdown, restrap for synchronous,
|
||
|
then reboot. Everything should be OK.
|
||
|
|
||
|
That's the easy way. Unfortunately, some hard drives will only work
|
||
|
in synchronous mode. Well, you can still recover from this phenomenon.
|
||
|
Here's how:
|
||
|
|
||
|
1) Install on your hard drive
|
||
|
2) Boot from the first boot floppy. When it tells you to, insert
|
||
|
the second boot floppy. At the first prompt, hit <DEL> to
|
||
|
break out to a shell.
|
||
|
3) Mount your hard drive under /mnt with the following command
|
||
|
(replace FS-TYPE with s5, s52, or ufs, whichever you used for
|
||
|
            your root partition):
|
||
|
|
||
|
/etc/fs/FS-TYPE/mount /dev/dsk/c0t0d0s1 /mnt
|
||
|
|
||
|
4) Now edit /mnt/sbin/bcheckrc:
|
||
|
|
||
|
ed /mnt/sbin/bcheckrc
|
||
|
|
||
|
You may want the 'ed' man page handy (I barely remember how to
|
||
|
use 'ed' :->). For simplicity, you can delete/comment out
|
||
|
the offending line, then replace it with the correct line later.
|
||
|
5) Unmount the hard drive:
|
||
|
|
||
|
umount /mnt
|
||
|
|
||
|
6) Reboot from the hard drive. Everything should come up OK, and
|
||
|
you can finish editing /sbin/bcheckrc, if necessary.
|
||
|
|
||
|
Note that you perform these actions at your own risk. The first version was
|
||
|
performed by me on Microport SVR4, and the second was performed by someone
|
||
|
else (on my suggestion) on ESIX SVR4."
|
||
|
This problem appears to be fixed on Consensys 1.3 and Dell 2.1; also
|
||
|
(pace David's remark) in ESIX 4.0.4, which has
|
||
|
|
||
|
echo MARK | /sbin/dd.arch conv=sync > /dev/rswap 2> /dev/null
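
For readers who would rather see the padding spelled out, here is a minimal C sketch of what conv=sync does to a short record: it is copied into a full block and the remainder is filled with NUL bytes, so the device always sees one 512-byte write. The names pad_to_block and BLOCK_SIZE are made up for illustration; nothing like this ships with any of the ports discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512  /* matches bs=512 in the dd command above */

/*
 * Copy a short record into a full block and pad the remainder with NUL
 * bytes, which is what dd does for conv=sync.  Returns the number of
 * bytes to write (always BLOCK_SIZE), or 0 if the record does not fit.
 */
size_t pad_to_block(const char *rec, size_t len,
                    unsigned char block[BLOCK_SIZE])
{
    assert(rec != NULL && block != NULL);
    if (len > BLOCK_SIZE)
        return 0;
    memset(block, 0, BLOCK_SIZE);   /* NUL padding */
    memcpy(block, rec, len);        /* the short record goes first */
    return BLOCK_SIZE;
}
```

A caller would hand the padded block to a single write(2) on the raw device, which is the whole point of the workaround: no short writes.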
|
||
|
|
||
|
4. ps chokes on commands that do SCSI I/O
|
||
|
Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, ps
|
||
|
doesn't work when a SCSI command is in progress. It stops printing at the
|
||
|
process executing the SCSI command.
|
||
|
This is still broken in Dell 2.2 and ESIX 4.0.3.
|
||
|
|
||
|
5. Transfer speed problems with Adaptec 1542B on 486s
|
||
|
If a system mount or install fails, try setting the DMA speed to 5MB/s,
|
||
|
rather than the default 5.7MB/s. This is accomplished by removing the jumper
|
||
|
shorting the 12th pin pair of jumper block 5.
|
||
|
|
||
|
6. df gives inaccurate values for large SCSI partitions
|
||
|
Derek Terveer <derek.terveer@stpaul.gov> reports "I was on a Esix 4.0.4
|
||
|
system recently with a >1024 cylinder (i.e., ~1.05 GB disk) and the df command
|
||
|
was giving wildly inaccurate values. I presume that this has something to do
|
||
|
with the size of the partitions, because it works just fine on a system with
|
||
|
smaller drives and partitions."
|
||
|
|
||
|
VI. Development Tools Problems
|
||
|
|
||
|
1. General UCB library brokenness
|
||
|
The BSD compatibility libraries were badly broken in USL code. A Dell
|
||
|
source adds "That meant that almost all the apps derived from them were broken
|
||
|
too. Most stuff like automount will die when you send a SIGHUP, instead of
|
||
|
rereading the map file. You can get a system into very strange states when
|
||
|
that happens."
|
||
|
John Sully <jms@mport> of Microport opines: "This is a bug in automount
|
||
|
itself rather than BSD compatibility, since the automount which comes with SVR4
|
||
|
is not compiled with the BSD libraries. (isn't this comforting?? :-()."
|
||
|
|
||
|
Peter Wemm <peter@DIALix.oz.au> reports "There is a very simple and reliable
|
||
|
cure for this sort of thing: Using your favourite hex editor, change all
|
||
|
instances of "signal" in the binary file to "sigset". Most BSD code assumes
|
||
|
that signal() auto-rearms after handling a signal. On SVR4, signal() does not,
|
||
|
but sigset() is argument compatible, and has BSD semantics."
|
||
|
|
||
|
Esix and UHC's BSD libraries are USL stock. I don't yet know
|
||
|
the status of other ports. Microport has run into things they think may be
|
||
|
symptoms of this but have no fix yet.
|
||
|
|
||
|
John Sully <jms@mport> of Microport counters with: "One common thread I find
|
||
|
on reading of these problems is that the BSD compatibility libraries are
|
||
|
*misused*. [...] The problem is that BSD and SYSV have similarly named .h files
|
||
|
which sometimes contain different definitions for objects with the same name.
|
||
|
This has been known to cause all sorts of problems because the SYSV headers are
|
||
|
picked up and then the calls are satisfied from the BSD library rather than the
|
||
|
shared object library. I have found that if you use /usr/ucb/cc that the BSD
|
||
|
compatibility is much less broken than it would seem at first because it
|
||
|
ensures that the correct headers are picked up."
|
||
|
|
||
|
However, note that there is at least one *real* bug known --- as of 4.0.4
|
||
|
the signal emulation cannot explicitly set a handler to SIG_DFL or SIG_IGN.
|
||
|
|
||
|
Developers should be very careful that if they use -L/usr/lib/ucb -lucb
|
||
|
the cc used is also the Berkeley cc.
|
||
|
|
||
|
2. USL emulation of BSD signals doesn't work
|
||
|
A different source reports that the USL implementation of BSD signals
|
||
|
is broken in both 4.0.3 and 4.0.4; in particular, the sigvec() family doesn't
|
||
|
work properly. It is possible to make minor tweaks to source to make such apps
|
||
|
work properly with the native USL signals implementation.
|
||
|
|
||
|
Here's more on the signals problem, thanks to Richard <rc@siesoft.co.uk>:
|
||
|
------------------------------------------------------------------------------
|
||
|
The problem is to do with the signal() function that is within the BSD
|
||
|
compatibility libc.
|
||
|
|
||
|
To reproduce the problem do the following:
|
||
|
|
||
|
#include <stdio.h>
|
||
|
#include <sys/types.h>
|
||
|
#include <signal.h>
|
||
|
#include <sys/siginfo.h>
|
||
|
|
||
|
main()
|
||
|
{
|
||
|
signal(SIGPIPE,SIG_IGN);
|
||
|
pause();
|
||
|
}
|
||
|
|
||
|
and compile it with cc xx.c -o xx /usr/ucblib/libucb.a
|
||
|
|
||
|
(John Sully observes that this is definitely wrong; /usr/ucb/cc should have
|
||
|
been used rather than "cc ... -L/usr/ucblib -lucb" or the equivalent "cc ...
|
||
|
/usr/ucblib/libucb.a".)
|
||
|
|
||
|
If you run the program and then signal it with a SIGPIPE, the program
|
||
|
will die, even though you've told it to ignore SIGPIPE.
|
||
|
|
||
|
The fix is difficult unless you've got source because there's a missing 'else'
|
||
|
clause from the signal() code. This is the only signal fault I've found in
|
||
|
the BSD signal functions, details of the rumoured sigvec problem would be
|
||
|
useful?
|
||
|
|
||
|
If you're trying to compile an application you could change the application
|
||
|
code to do the following; this does work:
|
||
|
|
||
|
void
|
||
|
catch(s)
|
||
|
int s;
|
||
|
{
|
||
|
/* DO NOTHING */
|
||
|
;
|
||
|
}
|
||
|
|
||
|
main()
|
||
|
{
|
||
|
signal(SIGPIPE,catch);
|
||
|
pause();
|
||
|
}
|
||
|
|
||
|
SUMMARY
|
||
|
You can only change a signal handler to a function handler, any number of
|
||
|
times. Any attempt to set the handler to SIG_DFL, or SIG_IGN will fail.
|
||
|
|
||
|
This bug has given some people working with X11R5 aggro, causing the X server
|
||
|
to die when you close a client.
|
||
|
|
||
|
Christoph Badura <bad@flatlin.ka.sub.org> confirms this bug.
|
||
|
He has sent USL a source fix. It appears already to have been fixed in Dell
|
||
|
2.2.
|
||
|
------------------------------------------------------------------------------
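
A self-checking variant of Richard's test may be handier than signalling the process by hand. The sketch below forks a child that asks for the signal to be ignored and then raises it; the parent reports whether the child survived. The function name sig_ign_works is made up here. Built against the ordinary C library this should report success; to exercise the bug it would have to be linked against the BSD compatibility library the way the report above describes.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if signal(signo, SIG_IGN) really causes signo to be ignored,
 * 0 if the child was killed by the signal anyway (the bug described
 * above), and -1 if the check itself could not be run.
 */
int sig_ign_works(int signo)
{
    int status;
    pid_t pid;

    assert(signo > 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(signo, SIG_IGN);
        raise(signo);       /* kills the child if SIG_IGN did not take */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 0;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

Running the check in a child keeps a positive result from taking the test program down with it.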
|
||
|
|
||
|
3. Possible string library problems
|
||
|
There are also persistent rumors of problems in the BSD-emulation string
|
||
|
libraries. I have not been able to pin down specifics on this.
|
||
|
|
||
|
4. USL's ndbm support is broken.
|
||
|
Christoph Badura <bad@generics.ka.sub.org> reports "The ndbm functions in
|
||
|
the ucb library are broken [apparently due to a compiler or optimizer bug in cc
|
||
|
-- ed.]. Try making the whatis data base for /usr/share/man with Tom
|
||
|
Christiansen's perl rewrite of man.
|
||
|
The easiest way to fix this is to compile GNU's replacement ndbm.c with gcc
|
||
|
-fpcc-struct-return -traditional (gcc1.40 or 2.2 will do nicely) and install it
|
||
|
in your C library. Source is available for FTP from prep.ai.mit.edu.
|
||
|
|
||
|
5. An include file is missing
|
||
|
Both 4.0.3 and 4.0.4 USL versions are missing the documented dial.h
|
||
|
file from their /usr/include directory. Dell 2.[12] has it.
|
||
|
|
||
|
6. sscanf(3) has a potential bug
|
||
|
Anthony Shipman <als@bohra.cpg.oz.au> reports: " I found the following bug
|
||
|
in SCO Unix 3.2.* and I think it may be common to many AT&T derived Unixes.
|
||
|
|
||
|
sscanf() calls _doscan() to read from a pretend file. The file
|
||
|
uses the string as a buffer and a fake file descriptor of 60 (=_NFILE).
|
||
|
Since _NFILE (for SCO UNIX) is 60 it assumes that fd 60 can never be open.
|
||
|
|
||
|
Then when fscanf() hits the end of the string it calls _filbuf() to read
|
||
|
into the buffer (which is the string) from fd 60. This should fail with
|
||
|
an errno=9 and then _filbuf() sets EOF and it all terminates.
|
||
|
|
||
|
However in SCO Unix you can reconfigure the kernel to increase the number
|
||
|
of files per process to a recommended maximum of 150. If you do this then
|
||
|
your program might have fd 60 open one day. Then sscanf() will read from this
|
||
|
file overwriting your string. The byte count to the read() in _filbuf()
|
||
|
is some undefined but large value so a lot of memory will be overwritten. In
|
||
|
my case the string was on the stack so my stack was wiped.
|
||
|
|
||
|
In short if you configure your kernel to have NOFILES > _NFILE ie more than
|
||
|
the default then sscanf() is a time bomb in your code."
|
||
|
|
||
|
This is alleged to have been fixed in SVr4, but I haven't been able to
|
||
|
confirm the fix. Bob Tinsmamn of SCO support writes: "We're fixing it
|
||
|
too, in a maintenance supplement to the Development System that will
|
||
|
come out at the end of this year or the beginning of 1993, known as
|
||
|
Development System Maintenance Supplement 4.2 or MSD 4.2."
|
||
|
|
||
|
7. shmat(2) vs. vfork(2)
|
||
|
The shmat(2) call is known to interact badly with vfork(2). Specifically,
|
||
|
if you attach a shared-memory segment, vfork(), and then the child releases
|
||
|
the segment, the parent loses it too! Workaround: use fork(2).
|
||
|
UHC and Microport both suspect that they still have this bug and opine that
|
||
|
anyone who uses vfork deserves to lose. Dell has no plans to fix it.
|
||
|
|
||
|
John Sully <jms@mport.com> writes: "This is not a bug. It is completely
|
||
|
consistent with the semantics of a change to the address space of the child.
|
||
|
Think about it: any change to the address space of a child process created by
|
||
|
vfork(2) is reflected in the parent since the child is actually executing in
|
||
|
the parent's address space. Therefore if the child changes the address space
|
||
|
(in this case by releasing the shared memory segment) what should happen?
|
||
|
Right, the parent should have the same change happen. And what does happen?
|
||
|
The segment is released in the parent. One can argue about the braindead
|
||
|
semantics of vfork(2) all day, but the fact remains that this is exactly what
|
||
|
one would expect to happen. To quote from the manual page:
|
||
|
|
||
|
[...] vfork differs from fork in
|
||
|
that the child borrows the parent's *memory* and thread of
|
||
|
control until a call to execve or an exit (either by a call
|
||
|
to exit or abnormally.) [ emphasis added ]
|
||
|
|
||
|
and later:
|
||
|
|
||
|
It does not work, however, to return while
|
||
|
running in the child's context from the procedure which
|
||
|
called vfork since the eventual return from vfork would then
|
||
|
return to a no longer existent stack frame.
|
||
|
|
||
|
Please note that the entire address space of the parent is used by
|
||
|
the child created by vfork(2). The manual page also points out
|
||
|
several other caveats involved in doing anything to the parent's
|
||
|
address space except successfully calling an exec family function or
|
||
|
_exit (note it specifically says *not* to call exit(2)). I do not believe
|
||
|
that having a shared memory segment disappear from the parent's address
|
||
|
space is out of line after reading the man page for vfork(2).
|
||
|
|
||
|
It is interesting to note that Sun after implementing its new VM system in
|
||
|
SunOS 4.0 initially had no plans to support vfork, since they felt that the COW
|
||
|
semantics of the new fork would provide the necessary efficiency gain. Indeed
|
||
|
they found that most programs which used vfork worked just fine by doing
|
||
|
-Dvfork=fork. All that is, except for a certain popular command interpreter
|
||
|
[ed: can you say C shell?]. So we are stuck with the legacy of this braindead
|
||
|
system call.
|
||
|
|
||
|
BTW, Microport has no plans to fix this :-)."
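
To make the fork(2) workaround concrete, here is a small sketch, assuming ordinary System V shared memory (shmget/shmat/shmdt) is available. It attaches a private segment, forks, lets the child detach, and then checks that the parent's attachment is still usable. The function name parent_keeps_segment is illustrative, and the 4096-byte size is arbitrary.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Attach a private shared-memory segment, fork(), let the child detach
 * it, and confirm the parent can still use its own attachment.
 * Returns 1 if the parent's mapping survived, -1 on setup failure.
 */
int parent_keeps_segment(void)
{
    int id, status, ok = -1;
    char *seg;
    pid_t pid;

    id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    seg = shmat(id, NULL, 0);
    if (seg == (char *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        shmdt(seg);     /* with fork() this touches only the child */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        strcpy(seg, "still attached");   /* would fault if detached */
        ok = (strcmp(seg, "still attached") == 0) ? 1 : -1;
    }
    shmdt(seg);
    shmctl(id, IPC_RMID, NULL);          /* clean up the segment */
    return ok;
}
```

Replacing fork() with vfork() in this sketch is exactly the case John describes: the child's shmdt() would then act on the parent's address space.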
|
||
|
|
||
|
8. FIONREAD fails on regular files
|
||
|
Christoph Badura <bad@generics.ka.sub.org> reports that the FIONREAD ioctl()
|
||
|
fails on regular (disk) files. He has sent USL a one-line kernel fix.
|
||
|
|
||
|
9. fread(3) does the wrong thing on pipes and FIFOs
|
||
|
Ed Hall <edhall@rand.org> writes: "Unlike the raw read() system call,
|
||
|
fread() is supposed to be able to make several partial reads to satisfy the
|
||
|
data requested by its arguments. The exceptions are an EOF or an error on the
|
||
|
stream. This characteristic is quite useful when moving data through pipes or
|
||
|
over network connections, since partial reads are quite common in these cases.
|
||
|
Well, the version of fread() in ESIX 4.0.3 (and likely other Sys5R4's) only
|
||
|
does a single physical read, and if it only satisfies part of the requested
|
||
|
number of bytes, that's all you get. This can sting you even if you carefully
|
||
|
check the value returned by fread(), since the value returned is rounded down
|
||
|
to the number of complete "nitems" read, although your position in the stream
|
||
|
can be up to size-1 bytes beyond that point. Neither ferror() nor feof()
|
||
|
indicate anything is wrong when this happens."
|
||
|
This bug (which is also present in 4.0.4) is serious and nasty and should
|
||
|
be high on every porting house's list to fix. It appears to be peculiar to
|
||
|
USL 4.0.3 and 4.0.4; 4.0.2 does *not* have it, nor does SCO.
|
||
|
A USL source claims it has been fixed in 4.1.
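
Until a fixed library is available, code that moves data through pipes or sockets can sidestep fread() and loop on read(2) itself. The following is a minimal sketch of such a loop, not anything supplied by USL or the porting houses; the name read_full is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly `want` bytes from fd unless EOF or an error intervenes.
 * Returns the number of bytes actually stored in buf (short only at
 * EOF), or -1 on a read error.
 */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);

        if (n == 0)
            break;              /* EOF: return what we have */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return -1;
        }
        got += (size_t)n;       /* partial read: keep going */
    }
    return (ssize_t)got;
}
```

Because the count is kept in bytes rather than in "nitems", the caller always knows exactly where it is in the stream, which is the information the broken fread() loses.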
|
||
|
|
||
|
10. putw appears to be broken
|
||
|
There is a bug in the ESIX SVR4.0.3A putw() routine in the C shared
|
||
|
library which is probably USL's. The following program demonstrates
|
||
|
it:
|
||
|
|
||
|
/* compile with: cc -o file file.c */
|
||
|
#include <stdio.h>
|
||
|
main()
|
||
|
{
|
||
|
int i;
|
||
|
for (i=0; i<1022; ++i) {
|
||
|
putchar('1');
|
||
|
}
|
||
|
putw(-11, stdout);
|
||
|
for (i=0; i<1022; ++i) {
|
||
|
putchar('1');
|
||
|
}
|
||
|
}
|
||
|
|
||
|
The putw() routine does not output 4 bytes, as it should. It may be
|
||
|
there is some interaction with buffer flushing that is causing the
|
||
|
problem. Also, note that if you change the sign of the first argument
|
||
|
to putw(), the program works fine.
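
One way around a suspect putw() is to write the word with fwrite() instead. This is only a sketch, and it assumes fwrite() is not hit by the same problem, which nobody has confirmed here. The names put_word and get_word are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one int to fp in native byte order, as putw() is meant to.
 * Returns 0 on success and EOF on failure.
 */
int put_word(int w, FILE *fp)
{
    assert(fp != NULL);
    return fwrite(&w, sizeof w, 1, fp) == 1 ? 0 : EOF;
}

/* Matching reader: stores the word in *w, returns 0 or EOF. */
int get_word(int *w, FILE *fp)
{
    assert(w != NULL && fp != NULL);
    return fread(w, sizeof *w, 1, fp) == 1 ? 0 : EOF;
}
```

Returning a separate status also avoids the classic putw()/getw() ambiguity where a legitimate word value cannot be told apart from EOF.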
|
||
|
|
||
|
11. Compiler problems
|
||
|
Ronald Guilmette <rfg@ncd.com> also reports the following:
|
||
|
|
||
|
------------------------------------------------------------------------------
|
||
|
/* Here is a bug in the original SVR4 C compiler (aka C Issue 5) which
|
||
|
effectively prevents you from making good use of the `const' and
|
||
|
`volatile' qualifiers defined by ANSI C in conjunction with pointer
|
||
|
types and typedef statements. Compile this code and you will get:
|
||
|
|
||
|
"qualifiers.c", line 23: left operand must be modifiable lvalue: op "="
|
||
|
|
||
|
...if your copy of the svr4 C compiler still has the bug. Note that
|
||
|
given these declarations, the ANSI C standard says that the thing pointed
|
||
|
to by the variable `pci' should be considered to be constant... not the
|
||
|
variable `pci' itself. (The GCC compiler, either version 1.x or version
|
||
|
2.x, correctly compiles this example without complaint.)
|
||
|
*/
|
||
|
|
||
|
typedef const int *ptr_to_const_int;
|
||
|
|
||
|
ptr_to_const_int pci;
|
||
|
|
||
|
int i;
|
||
|
|
||
|
void main ()
|
||
|
{
|
||
|
pci = &i;
|
||
|
}
|
||
|
------------------------------------------------------------------------------
|
||
|
/* Here is a subtle bug in the original SVR4 C compiler (aka C Issue 5)
|
||
|
which prevents you from first declaring a tagged type (i.e. a struct
|
||
|
type or a union type) in a parameter list, and then defining that tagged
|
||
|
type later on within the same scope. (Note that according to the ANSI C
|
||
|
standard, the scope in which parameters get declared and the outermost
|
||
|
block of a function body are one and the same scope. Thus, this really
|
||
|
is legal ANSI C code!)
|
||
|
|
||
|
Try compiling this with your C compiler on SVR4. If your compiler still
|
||
|
has the bug, you will get:
|
||
|
|
||
|
"tagged_type.c", line 24: warning: dubious tag declaration: struct S
|
||
|
"tagged_type.c", line 28: warning: improper member use: i
|
||
|
"tagged_type.c", line 28: warning: improper member use: i
|
||
|
"tagged_type.c", line 31: warning: dubious tag declaration: struct S
|
||
|
"tagged_type.c", line 35: warning: improper member use: i
|
||
|
"tagged_type.c", line 35: warning: improper member use: i
|
||
|
|
||
|
(The GCC compiler also had this bug in version 1.x, but it has been fixed
|
||
|
in version 2.x.)
|
||
|
*/
|
||
|
|
||
|
void foobar1 (arg) /* use old-style without prototypes */
|
||
|
struct S *arg;
|
||
|
{
|
||
|
struct S { int i; }; /* define the type `struct S' */
|
||
|
|
||
|
arg->i = arg->i; /* legal according to ANSI C rules! */
|
||
|
}
|
||
|
|
||
|
void foobar2 (struct S *arg) /* use new-style with prototypes */
|
||
|
{
|
||
|
struct S { int i; }; /* define the type `struct S' */
|
||
|
|
||
|
arg->i = arg->i; /* legal according to ANSI C rules! */
|
||
|
}
|
||
|
------------------------------------------------------------------------------
|
||
|
/* Here is a serious bug in the original SVR4 `dump' program which dumps
|
||
|
out parts of object files in either plain hex form or symbolically.
|
||
|
|
||
|
To see the `dump' program get a segfault and die, save this code under
|
||
|
the name `dump-bug.c' and then do:
|
||
|
|
||
|
cc -g -c dump-bug.c
|
||
|
dump -v -D dump-bug.o
|
||
|
|
||
|
The bug arises whenever `dump' tries to read Dwarf debugging information
|
||
|
for an array of pointers to any "user defined" type (e.g. `struct S' in
|
||
|
this example). Past that point, `dump' is totally confused, so further
|
||
|
Dwarf debugging information finally causes it to go belly-up.
|
||
|
*/
|
||
|
|
||
|
struct S { int i; };
|
||
|
struct S *array[10];
|
||
|
int j;
|
||
|
------------------------------------------------------------------------------
|
||
|
It appears that the svr4 C compiler (for x86 machines) doesn't conform real
|
||
|
well to either the letter or the spirit of the IEEE 754 floating-point
|
||
|
standard. In particular, "unordered comparisons" and other operations on
|
||
|
NaNs don't always produce the result that the IEEE 754 standard calls
|
||
|
for.
|
||
|
|
||
|
An AT&T source comments: "This is documented in the SVID as a future direction.
|
||
|
We do not support NaNs in -Xa and -Xt modes, only in -Xc. Try
|
||
|
isnan(sqrt(-1.0)) to determine which modes support it."
|
||
|
------------------------------------------------------------------------------
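
A quick way to see how a given compiler mode treats NaNs is to test the comparisons directly. The sketch below builds the NaN as 0.0/0.0 at run time rather than with sqrt(-1.0) as suggested above, purely so that it does not depend on the math library; that substitution and the names make_nan and behaves_like_nan are assumptions of this example.

```c
/*
 * Build a quiet NaN at run time as 0.0/0.0.  The volatile keeps the
 * compiler from folding the division away at compile time.
 */
double make_nan(void)
{
    volatile double zero = 0.0;

    return zero / zero;
}

/*
 * Returns 1 if every ordered comparison of x with itself is false and
 * x != x is true, which is what IEEE 754 requires of a NaN.
 */
int behaves_like_nan(double x)
{
    return !(x < x) && !(x > x) && !(x <= x) && !(x >= x)
        && !(x == x) && (x != x);
}
```

Compiling this under each of -Xa, -Xt and -Xc and calling behaves_like_nan(make_nan()) should show which modes honor unordered comparisons.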
|
||
|
|
||
|
The compiler fails to issue diagnostics in cases where a typedef name is
|
||
|
reused to declare a formal parameter, as in:
|
||
|
|
||
|
-----------------------------------------------------------------------
|
||
|
typedef int FOO;
|
||
|
void bar (FOO)
|
||
|
int FOO;
|
||
|
{
|
||
|
}
|
||
|
-----------------------------------------------------------------------
|
||
|
|
||
|
The compiler crashes on the following invalid input:
|
||
|
|
||
|
-----------------------------------------------------------------------
|
||
|
int i;
|
||
|
volatile void *pvv;
|
||
|
|
||
|
void pvv_test ()
|
||
|
{
|
||
|
(i ? *pvv : *pvv); /* ERROR */
|
||
|
}
|
||
|
-----------------------------------------------------------------------
|
||
|
|
||
|
The compiler fails to issue diagnostics for cases where an attempt is
|
||
|
made to "forward declare" an enum type (without also defining it), as
|
||
|
in:
|
||
|
|
||
|
-----------------------------------------------------------------------
|
||
|
enum enum0 *ep; /* ERROR */
|
||
|
-----------------------------------------------------------------------
|
||
|
|
||
|
The compiler rejects the following code with an error, although there
|
||
|
seems to be no good reason why it should (because no object is being
|
||
|
declared).
|
||
|
|
||
|
-----------------------------------------------------------------------
|
||
|
#include <limits.h>
|
||
|
|
||
|
typedef char array_type[ULONG_MAX];
|
||
|
-----------------------------------------------------------------------
|
||
|
|
||
|
12. getlogin() doesn't work
|
||
|
Robert Withrow <witr@rwwa.com> reports "The posix function
|
||
|
getlogin() doesn't work on most svr4s (at least up to SVR4.0.3.0)...
|
||
|
cuserid() *does* work, but it makes porting a pain. Try it some time
|
||
|
and perhaps add it to your list."
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> confirms this and
|
||
|
adds that this bug (due to utmp and wtmp file corruptions [possibly
|
||
|
caused by ttymon bugs described above --- ed.]) breaks executables such
|
||
|
as talk(1).
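
Portable code can guard against a failing getlogin() by falling back to other sources. The sketch below is one possible arrangement, not something taken from the reports above. Note that the password entry for the real uid is not always the same thing as the login name (think su), and the final "unknown" string is just a placeholder chosen for this example.

```c
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Best-effort login name lookup.  Tries getlogin() first, then the
 * password entry for the real uid, then $LOGNAME.  Returns a pointer to
 * static or environment storage, or the placeholder "unknown".
 */
const char *login_name(void)
{
    const char *name = getlogin();
    struct passwd *pw;

    if (name != NULL && *name != '\0')
        return name;
    pw = getpwuid(getuid());
    if (pw != NULL && pw->pw_name != NULL && *pw->pw_name != '\0')
        return pw->pw_name;
    name = getenv("LOGNAME");
    if (name != NULL && *name != '\0')
        return name;
    return "unknown";   /* hypothetical placeholder, not from the FAQ */
}
```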
|
||
|
|
||
|
13. syslog routines don't work
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> reports: "Under ESIX 4.0.3,
|
||
|
syslog routines are unusable. They are slightly better under 4.0.4, but still
|
||
|
severely broken."
|
||
|
"In addition, replacing the syslogd executable that comes with Esix with the
|
||
|
one provided by Marc Boucher (marc@cam.org) shows that the syslog() call itself
|
||
|
is sane. It's available from ftp.cam.org."
|
||
|
|
||
|
14. Bogus `r' in xt driver configuration flags
|
||
|
Raymond Nijssen <raymond@woensel.es.ele.tue.nl> reports: "Both under ESIX
|
||
|
4.0.3 and 4.0.4, the `r' flag is present in the third column of
|
||
|
/etc/conf/cf.d/mdevice for the [n][s]xt drivers, suggesting that these drivers
|
||
|
would be required for relinking the kernel. This is not the case. I saw at
|
||
|
least one release of Dell SVR4 in which this was ok." (Making this change
|
||
|
reduces the kernel's size somewhat.)
|
||
|
|
||
|
15. ioctl for kernel symbol fetches fails
|
||
|
Trying to obtain kernel values of certain symbols fails. The
|
||
|
two symbols from the kernel that are quite useful are "avenrun" and
|
||
|
"total" which as far as I can tell are defined in the "mm" driver.
|
||
|
This bug manifests itself in applications like "top", "u386mon" ...
|
||
|
One used to use the nlist() function call, but according to the man page
|
||
|
for nlist() it should not be used due to the dynamic loading and unloading
|
||
|
of drivers that can happen at any time in the "life" of a V.4.2 kernel.
|
||
|
|
||
|
Try the sample hack below to see if your system has the same problem.
|
||
|
|
||
|
#include <sys/types.h>
|
||
|
#include <sys/stat.h>
|
||
|
#include <fcntl.h>
|
||
|
#include <sys/ksym.h>
|
||
|
|
||
|
main()
|
||
|
{
|
||
|
int fd=0;
|
||
|
long ar[3];
|
||
|
struct mioc_rksym k;
|
||
|
|
||
|
fd = open("/dev/kmem", O_RDONLY);
|
||
|
k.mirk_buflen = sizeof(ar);
|
||
|
k.mirk_buf = (void *)&ar;
|
||
|
k.mirk_symname = "avenrun";
|
||
|
if((ioctl(fd, MIOC_READKSYM, &k))==-1) {
|
||
|
perror("ioctl");
|
||
|
exit(1);
|
||
|
}
|
||
|
printf("%d %d %d\n",ar[0],ar[1],ar[2]);
|
||
|
close(fd);
|
||
|
}
|
||
|
|
||
|
Thanks to David P. Cutter <dpc@shady.grail.com> for reporting this.
|
||
|
|
||
|
16. Bug in cc optimizer (4.2.1)
|
||
|
|
||
|
Nickolay Saukh <nms@ussr.eu.net> reports a bug in
|
||
|
cc, the Optimizing C Compilation System (CCS) 2.0 07/24/92
|
||
|
|
||
|
If you have global (external) structure/union with name 'tr'
|
||
|
commands to access very first member (with zero offset) are
|
||
|
garbled. Simple text to reproduce the bug:
|
||
|
|
||
|
struct _tr {
|
||
|
int aa;
|
||
|
int bb;
|
||
|
} tr;
|
||
|
|
||
|
void
|
||
|
foo(int zz) {
|
||
|
tr.aa = zz;
|
||
|
}
|
||
|
|
||
|
Here is the result of cc -O -S foo.c
|
||
|
|
||
|
.file "ccbug.c"
|
||
|
.version "01.01"
|
||
|
.type foo,@function
|
||
|
.text
|
||
|
.globl foo
|
||
|
.align 4
|
||
|
|
||
|
.nopsets "cc"
|
||
|
.align 16
|
||
|
foo:
|
||
|
movl 4(%esp),%eax
|
||
|
movl %eax,&r
|
||
|
^------------- <<<< THE BUG
|
||
|
ret
|
||
|
.align 16,7,4
|
||
|
.size foo,.-foo
|
||
|
.ident "acomp: (CCS) 2.0 07/24/92 "
|
||
|
.data
|
||
|
.comm tr,8,4
|
||
|
.text
|
||
|
.ident "optim: (CCS) 2.0 07/24/92 "
|
||
|
|
||
|
** 17. /usr/ucb/install uses missing group "staff"
|
||
|
/usr/ucb/install uses the group name "staff" as the default group to install
|
||
|
programs. As this group does not exist in /etc/group, the installation will
|
||
|
fail. I would suggest changing the /etc/group file like in Solaris as follows:
|
||
|
|
||
|
nuucp::9:root,nuucp
|
||
|
staff::10:
|
||
|
|
||
|
VII. The FUBYTE Problem
|
||
|
|
||
|
(Thanks to Christoph Badura <bad@flatlin.ka.sub.org> for this info)
|
||
|
|
||
|
The kernel function fubyte() is documented to return a positive value when
|
||
|
given a valid user space address and -1 otherwise. In the latter case u.u_error
|
||
|
is set to EFAULT. USL SysV R4.0.3 has a sign extension bug in the
|
||
|
implementation of fubyte() for local file descriptors (i.e. not opened via
|
||
|
RFS), which causes fubyte() to return negative values if the byte fetched has
|
||
|
its high bit set. This bug doesn't affect STREAMS drivers, as they don't call
|
||
|
(and in fact are normally unable to call) fubyte(). Thus writing a byte with
|
||
|
the high bit set to certain character device drivers returns with -1 and errno
|
||
|
set to EFAULT.
|
||
|
|
||
|
The bug may affect any character device driver that calls fubyte(). It's not
|
||
|
limited to serial card drivers. The bug is noticed most often with serial card
|
||
|
drivers, since uucp uses byte values > 127 very early during g-protocol setup
|
||
|
and drivers for serial cards tend to use fubyte() quite often.
|
||
|
|
||
|
Note also that the bug's effect is different if the driver checks for a -1
|
||
|
return value of fubyte() or just a negative one. In the former case it is
|
||
|
possible to pass bytes with the 8th bit set through fubyte(), except for 0xff
|
||
|
which is -1 in two's complement. That makes the bug more obscure.
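
The difference between the two kinds of driver check is easy to see in a user-space model. The sketch below is not kernel code: buggy_fubyte() merely imitates the sign-extending fetch, and count_rejected() counts how many of the 256 byte values each style of check would wrongly treat as a fault. Both names are invented for this illustration.

```c
/*
 * Model of the buggy fetch: the byte is read through a signed char and
 * sign-extended, so values 0x80..0xff come back negative.
 */
int buggy_fubyte(const unsigned char *addr)
{
    return (int)(signed char)*addr;
}

/*
 * Count the byte values a driver would wrongly treat as a fault.
 * strict_minus_one != 0 models a driver testing "== -1";
 * strict_minus_one == 0 models a driver testing "< 0".
 */
int count_rejected(int strict_minus_one)
{
    int v, rejected = 0;

    for (v = 0; v < 256; v++) {
        unsigned char byte = (unsigned char)v;
        int r = buggy_fubyte(&byte);

        if (strict_minus_one ? (r == -1) : (r < 0))
            rejected++;
    }
    return rejected;
}
```

A driver that tests for -1 loses only 0xff, while one that tests for any negative value loses every byte with the high bit set, which is why the symptoms vary so much from driver to driver.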
|
||
|
|
||
|
The fix is easy. First, make a backup copy of the kernel object file
|
||
|
/etc/conf/pack.d/kernel/vm.o! A disassembly of vm.o(lfubyte) should reveal
|
||
|
*exactly* one mov[s]bl (move byte to long w/sign extend). That one needs to be
|
||
|
patched into a movzbl (zero extend). The difference is one bit in the second
|
||
|
byte of the opcode.
|
||
|
|
||
|
The movsbl has the bit pattern 00001111 1011111w mod/rm-byte.
|
||
|
The movzbl has the bit pattern 00001111 1011011w mod/rm-byte.
|
||
|
|
||
|
The 'w' bit is 0 for the instruction in question. So the opcodes are 0f be and
|
||
|
0f b6. Here is the diff -c from dis -F lfubyte showing the patch applied to
|
||
|
the Dell 2.1 kernel:
|
||
|
|
||
|
*** vm.o Mon Mar 9 00:31:38 1992
|
||
|
--- vm.o.org Mon Mar 9 00:32:40 1992
|
||
|
***************
|
||
|
*** 22,28 ****
|
||
|
11c90: 85 c0 testl %eax,%eax
|
||
|
11c92: 75 09 jne 0x9 <11c9d>
|
||
|
11c94: 8b 45 08 movl 8(%ebp),%eax
|
||
|
! 11c97: 0f b6 00 movzbl (%eax),%eax
|
||
|
11c9a: 89 45 fc movl %eax,-4(%ebp)
|
||
|
11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
|
||
|
11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
|
||
|
--- 22,28 ----
|
||
|
11c90: 85 c0 testl %eax,%eax
|
||
|
11c92: 75 09 jne 0x9 <11c9d>
|
||
|
11c94: 8b 45 08 movl 8(%ebp),%eax
|
||
|
! 11c97: 0f be 00 movsbl (%eax),%eax
|
||
|
11c9a: 89 45 fc movl %eax,-4(%ebp)
|
||
|
11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
|
||
|
11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
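
The patch itself amounts to clearing one bit (0x08) in the second opcode byte, turning 0xbe into 0xb6. As a sketch of that operation on an in-memory copy of the code, with a check that the bytes really are a movsbl before anything is touched (the function name patch_movsbl is made up, and finding the right offset is still up to you):

```c
#include <stddef.h>

/*
 * Turn a movsbl (0f be) at code[off] into a movzbl (0f b6) by clearing
 * the single differing bit, 0x08, in the second opcode byte.
 * Returns 1 if patched, 0 if the bytes at off are not 0f be.
 */
int patch_movsbl(unsigned char *code, size_t len, size_t off)
{
    if (len < 2 || off > len - 2)   /* need two bytes at off */
        return 0;
    if (code[off] != 0x0f || code[off + 1] != 0xbe)
        return 0;                   /* not the instruction we expect */
    code[off + 1] &= (unsigned char)~0x08;   /* 0xbe -> 0xb6 */
    return 1;
}
```

Refusing to patch unless the expected bytes are present is the same safety net as checking the checksum of vm.o before editing it.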
|
||
|
|
||
|
Of course there is a workaround at the driver level. Canonically, one would do
|
||
|
this by checking for fubyte() returning -1 *and* u.u_error being set to EFAULT
|
||
|
(u.u_error is cleared upon entering a system call). However, in R4.0.3
|
||
|
fubyte() does NOT set u.u_error. It *does* set u.u_fault_catch.fc_errno.
|
||
|
|
||
|
Christoph reports that Dell 2.1 can be object-patched successfully to fix this.
|
||
|
I'm told that the offending 11c97 is at exactly the same address in the
|
||
|
Consensys 1.3 kernel.
|
||
|
|
||
|
At vm.o:fa7d in Dell 2.2 there's a movzbl (%edx),%edx; same instruction,
|
||
|
different target register. Here's the relevant diff output:
|
||
|
|
||
|
*** vm.o-old Wed Jul 7 03:13:11 1993
|
||
|
--- vm.o Wed Jul 7 03:13:00 1993
|
||
|
***************
|
||
|
*** 25,31 ****
|
||
|
fa76: 85 c0 testl %eax,%eax
|
||
|
fa78: 75 09 jne 0x9 <fa83>
|
||
|
fa7a: 8b 55 08 movl 8(%ebp),%edx
|
||
|
! fa7d: 0f b6 12 movzbl (%edx),%edx
|
||
|
fa80: 89 55 fc movl %edx,-4(%ebp)
|
||
|
fa83: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
|
||
|
fa8d: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
|
||
|
--- 25,31 ----
|
||
|
fa76: 85 c0 testl %eax,%eax
|
||
|
fa78: 75 09 jne 0x9 <fa83>
|
||
|
fa7a: 8b 55 08 movl 8(%ebp),%edx
|
||
|
! fa7d: 0f be 12 movsbl (%edx),%edx
|
||
|
fa80: 89 55 fc movl %edx,-4(%ebp)
|
||
|
fa83: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
|
||
|
fa8d: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
|
||
|
|
||
|
Applying this patch produces a working kernel.
|
||
|
|
||
|
I do not know the status of the other ports.
|
||
|
|
||
|
Another poster (Marc Boucher <marc@cam.org>) adds:
|
||
|
|
||
|
On ESIX SVR4.0.3 Rev. A, the instruction movsbl in question can be changed to
|
||
|
movzbl (as described above) with a binary-editor on file
|
||
|
/etc/conf/pack.d/kernel/vm.o. At offset 0x11eb0, change 0xbe to 0xb6.
|
||
|
|
||
|
Before patching, verify that your /etc/conf/pack.d/kernel/vm.o is the same as
|
||
|
mine! On my system, the /bin/sum generated checksum of vm.o was "4440 222".
|
||
|
|
||
|
The problem results from a sign-extension bug. The function lfubyte(), which
|
||
|
is called by fubyte(), is declared as
|
||
|
|
||
|
int lfubyte(char *addr); /* actually caddr_t */
|
||
|
|
||
|
The byte is fetched with
|
||
|
|
||
|
val = *addr;
|
||
|
|
||
|
which triggers sign extension. Casting addr to an unsigned char * or declaring
|
||
|
it as such solves the problem.
|
||
|
|
||
|
This bug is still present in stock USL 4.0.4. However, it has been fixed in
|
||
|
Dell 2.2.
|
||
|
|
||
|
Raymond Nijssen contributes the following:
|
||
|
|
||
|
---- README --------------------------------------------------------------->8--
|
||
|
This shell script was written to help out people who are less experienced in
|
||
|
patching kernel binaries.
|
||
|
This version can be used to fix the fubyte bug in following SVR4 flavors:
|
||
|
|
||
|
ESIX 4.0.3A
|
||
|
ESIX 4.0.4
|
||
|
Dell 2.1
|
||
|
Consensys 1.3
|
||
|
|
||
|
You need sdb and your system has to be able to rebuild the kernel.
|
||
|
|
||
|
After the patch is applied, you have to rebuild the kernel by running
|
||
|
/etc/conf/bin/idbuild and /etc/conf/bin/idreboot for the patch to take effect.
|
||
|
|
||
|
You have to be root to do all this.
|
||
|
The program will ask for your confirmation before it changes anything.
|
||
|
|
||
|
Please do make a backup first, and remember that you can select the old kernel
|
||
|
(/stand/unix.old) at boot time by pressing the space bar at the 'Booting the
|
||
|
ESIX system....' prompt, in case the system fails to boot from the patched
|
||
|
kernel, though this is highly unlikely.
|
||
|
|
||
|
Systems to which this patch was applied have been running flawlessly
|
||
|
for several months, in case you have doubts...
|
||
|
|
||
|
Happy patching!
|
||
|
--------------------------------------------------------------------------->8--
|
||
|
|
----- fbfix --------------------------------------------------------------->8--
#!/bin/sh
#
# Copyright (c) 1993 Raymond X.T. Nijssen (raymond@woensel.es.ele.tue.nl)
# All Rights Reserved
#

# the bug...
#
b=fubyte

# offsets according to flakey USL sdb. gdb and dis say something different
esix403_o=0x11eb0
esix404_o=0x11683
dell21_o=0x11c98 #dell 2.1
cons13_o=$dell21_o #consensys 1.3

# data
v=0x458900be #old
r=0x458900b6 #new

# file
f=/etc/conf/pack.d/kernel/vm.o

# progs
s=/usr/ccs/bin/sdb
i=/etc/conf/bin/idbuild

c='\c';t='\t';n='\n';N=/dev/null

# aux
# pe: print and clear the pending error message in $e
pe() if [ -n "$e" ];then echo ${n}ERROR: $e $n;e="";fi
# yn: ask yes/no question $1 with default answer $2; returns 0 for yes, 1 for no
yn() { while :;do echo $n$1 [$2] $c;read a;if [ -z "$a" ];then a=$2;fi
case "$a" in y*)return 0;;n*)return 1;;*)echo Answer 'y' or 'n';;esac;done;}
# cr: succeed only when run as root, otherwise set $e
cr() if id|grep "^uid=0">$N;then return 0
else e="Only root may patch the kernel";return 1;fi
# ab: print $e as a fatal error and exit
ab() { echo ${n}FATAL: $e$n;exit 1;}
# ac: report any pending error, then ask whether to continue
ac() { pe;yn "Continue ?" "y";return;}
# qu: prompt for the value described by $2 (optional default $1); result in $R
qu() { R="";if [ -n "$1" ];then d="[$1] :";else d=":";fi
while [ -z "$R" ];do echo ${n}Enter the $2 $d $c;read a
if [ "$a" ];then R=$a;elif [ -n "$1" ];then R=$1;
else e="No $2 entered";ac||exit 0;fi;done;}


# main
if [ ! -t 0 ];then e="This program must not be piped into a shell";ab;fi
if [ ! -f $s ];then e="$s not found";ab;fi
if [ ! -f $f ];then e="$f not found";ab;fi
if [ ! -f $i ];then e="$i not found";ab;fi

echo $n$n${t}YOU are responsible for running this program.$n$n${t}Clauses 9 and 10 of the GNU GENERAL PUBLIC LICENSE$n${t}apply to this program.$n$n${t}If you continue, you thereby agree that its author, $n${t}nor his employer, nor anybody else except yourself, has any $n${t}liability for any loss, damage etc. etc.$n

ac||exit 1

echo $n$n${t}Fixable versions with the $b bug$n$n$t$t[1]$t ESIX 4.0.3A$n$t$t[2]$t ESIX 4.0.4$n$t$t[3]$t DELL 2.1$n$t$t[4]$t Consensys 1.3$n
R=1;qu "$R" "SVR4 flavor this system is running"
case $R in 1)o=$esix403_o;; 2)o=$esix404_o;;3)o=$dell21_o;; 4)o=$cons13_o;;
*)e="Invalid answer";ab;;esac

# have sdb display the longword at the offset; proceed only if it is the old value
echo $n${t}Looking for replacement target ... $c
if echo $o:?lx|$s -e $f 2>$N|grep $o/$v>$N;then echo found
if yn "Do you want to patch the kernel now?" "n";then
cr||ab
# back up the object file, refusing to overwrite an existing backup
qu "$f.orig" "name of backup file"
if [ -f $R ];then e="File $R already exists";ab;fi
if cp $f $R;then echo $n${t}Copied $f to $R;else e="Failed to write $R";ab;fi
# write the new longword with sdb; on failure, restore the backup
if echo $o!$r|$s -e -w $f>$N 2>&1;then
echo ${n}Fixed $b bug, you may now run $i and reboot$n;else e="$s failed";pe
if cp $R $f;then echo $n${t}Copied $R to $f;else e="Restore $f failed";pe;fi
e="Patch failed!!";ab;fi
fi
else echo not found;e="Replacement target not found at expected offset";ab;fi
--------------------------------------------------------------------------->8--

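The check-then-patch logic of fbfix can be sketched in C as follows. This is
an illustration, not a replacement for the script: patch_word and its helpers
are hypothetical names, the offset is treated as a raw byte offset into the
file (the script's offsets are sdb addresses, which by its own comment differ
from what gdb and dis report, so they are not used here), and no backup copy
is made. The old and new longwords differ only in their low byte (be versus
b6), which is consistent with changing an i386 sign-extending byte load
(MOVSX, opcode 0F BE) into a zero-extending one (MOVZX, opcode 0F B6),
assuming the byte just before the longword is 0F.

```c
#include <stdio.h>
#include <stdint.h>

/* Read a 32-bit little-endian word at raw file offset off.
 * Returns 0 on success, -1 on seek or read failure. */
static int read_le32(FILE *fp, long off, uint32_t *out)
{
    unsigned char b[4];

    if (fseek(fp, off, SEEK_SET) != 0 || fread(b, 1, 4, fp) != 4)
        return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

/* Write a 32-bit little-endian word at raw file offset off.
 * Returns 0 on success, -1 on seek, write or flush failure. */
static int write_le32(FILE *fp, long off, uint32_t v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v & 0xff);
    b[1] = (unsigned char)((v >> 8) & 0xff);
    b[2] = (unsigned char)((v >> 16) & 0xff);
    b[3] = (unsigned char)((v >> 24) & 0xff);
    if (fseek(fp, off, SEEK_SET) != 0 || fwrite(b, 1, 4, fp) != 4)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Replace the word at off with repl, but only if it currently equals
 * expect. Returns 0 if patched, 1 if the expected word was not found
 * (file left untouched), -1 on I/O error. */
int patch_word(const char *path, long off, uint32_t expect, uint32_t repl)
{
    FILE *fp = fopen(path, "r+b");
    uint32_t cur;
    int rc;

    if (fp == NULL)
        return -1;
    if (read_le32(fp, off, &cur) != 0)
        rc = -1;
    else if (cur != expect)
        rc = 1;
    else
        rc = write_le32(fp, off, repl);
    if (fclose(fp) != 0 && rc == 0)
        rc = -1;
    return rc;
}
```

A caller would copy the file aside first, as fbfix does, and treat a return
value of 1 as "target not found, nothing changed" rather than retrying at a
guessed offset.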
VIII. Destiny and Dell

A source at UNIX System Labs Europe claims that `Destiny' (the new Release
4.2) incorporates all of Dell UNIX's fixes to 4.0.3; thus, any bug for which a
Dell fix is indicated above should be gone in Destiny.
--
Send your feedback to: Eric Raymond = esr@snark.thyrsus.com