309 lines
12 KiB
Plaintext
309 lines
12 KiB
Plaintext
|
GNU coding standards. last updated 10 Feb 89
|
|||
|
|
|||
|
Reference standards:
|
|||
|
|
|||
|
Don't in any circumstances refer to Unix source code for or during
|
|||
|
your work on GNU! (Or to any other proprietary programs.)
|
|||
|
|
|||
|
If you have a vague recollection of the internals of a Unix program,
|
|||
|
this does not absolutely mean you can't write an imitation of it, but
|
|||
|
do try to organize the imitation along different lines, because this
|
|||
|
is likely to make the details of the Unix version irrelevant and
|
|||
|
dissimilar to your results.
|
|||
|
|
|||
|
For example, Unix utilities were generally optimized to minimize
|
|||
|
memory use; if you go for speed instead, your program will be very
|
|||
|
different. You could keep the entire input file in core and scan it
|
|||
|
there instead of using stdio. Use a smarter algorithm discovered more
|
|||
|
recently than the Unix program. Eliminate use of temporary files. Do
|
|||
|
it in one pass instead of two (we did this in the assembler).
|
|||
|
|
|||
|
Or, on the contrary, emphasize simplicity instead of speed. For some
|
|||
|
applications, the speed of today's computers makes simpler algorithms
|
|||
|
adequate.
|
|||
|
|
|||
|
Or go for generality. For example, Unix programs often have static
|
|||
|
tables or fixed-size strings, which make for arbitrary limits; use
|
|||
|
dynamic allocation instead. Make sure your program handles nulls and
|
|||
|
other funny characters in the input files. Add a programming language
|
|||
|
for extensibility and write part of the program in that language.
|
|||
|
|
|||
|
Or turn some parts of the program into independently usable libraries.
|
|||
|
Or use a simple garbage collector instead of tracking precisely when
|
|||
|
to free memory, or use a new GNU facility such as obstacks.
|
|||
|
|
|||
|
|
|||
|
Compatibility standards:
|
|||
|
|
|||
|
Utility programs and libraries for GNU should be totally upward
|
|||
|
compatible with those in Berkeley Unix, with certain exceptions.
|
|||
|
When a feature is used only by users (not by programs or command files),
|
|||
|
and it is done poorly in Berkeley Unix, it is good to replace
|
|||
|
it completely with something totally different and better.
|
|||
|
(For example, vi is replaced with Emacs.) But it is nice to
|
|||
|
offer a compatible feature as well. (There is a free vi-clone,
|
|||
|
so we will offer it.)
|
|||
|
|
|||
|
Additional useful features not in Berkeley Unix are welcome.
|
|||
|
Additional programs with no counterpart in Unix may be useful,
|
|||
|
but our first priority is usually to duplicate what Unix already
|
|||
|
has.
|
|||
|
|
|||
|
|
|||
|
Formatting standards:
|
|||
|
|
|||
|
It is important put the open-brace that starts the body of a C function
|
|||
|
in column zero, and avoid putting any other open-brace or open-parenthesis
|
|||
|
or open-bracket in column zero. Several tools look for
|
|||
|
open-braces in column zero to find the beginnings of C functions.
|
|||
|
These tools will not work on code not formatted that way.
|
|||
|
|
|||
|
It is also important for function definitions to start the name
|
|||
|
of the function in column zero. `ctags' or `etags' cannot recognize
|
|||
|
them otherwise. Thus, the proper format is this:
|
|||
|
|
|||
|
static char *
|
|||
|
concat (s1, s2) /* Name starts in column zero here */
|
|||
|
char *s1, *s2;
|
|||
|
{ /* Open brace in column zero here */
|
|||
|
...
|
|||
|
}
|
|||
|
|
|||
|
Aside from this, I prefer code formatted like this:
|
|||
|
|
|||
|
if (x < foo (y, z))
|
|||
|
haha = bar[4] + 5;
|
|||
|
else
|
|||
|
{
|
|||
|
while (z)
|
|||
|
{
|
|||
|
haha += foo (z, z);
|
|||
|
z--;
|
|||
|
}
|
|||
|
return ++x + bar ();
|
|||
|
}
|
|||
|
|
|||
|
I find it easier to read a program when it has spaces before the
|
|||
|
open-parentheses and after the commas. Especially after the commas.
|
|||
|
|
|||
|
Please use formfeed characters to divide the program into pages at
|
|||
|
logical places (but not within a function). It does not matter just
|
|||
|
how long the pages are, since they do not have to fit on a printed
|
|||
|
page.
|
|||
|
|
|||
|
When you split an expression into multiple lines, split it
|
|||
|
before an operator, not after one. For example:
|
|||
|
|
|||
|
if (foo_this_is_long && bar > win (x, y, z)
|
|||
|
&& remaining_condition)
|
|||
|
|
|||
|
|
|||
|
Commenting Standards:
|
|||
|
|
|||
|
Every program should start with a comment saying briefly
|
|||
|
what it is for. Example: "fmt -- filter for simple filling of text".
|
|||
|
|
|||
|
Please put a comment on each function saying what the function does,
|
|||
|
what sorts of arguments it gets, and what the possible values of
|
|||
|
arguments mean and are used for. It is not necessary to duplicate in
|
|||
|
words the meaning of the C argument declarations, if a C type is being
|
|||
|
used in its customary fashion. If there is anything nonstandard about
|
|||
|
its use (such as an argument of type `char *' which is really the
|
|||
|
address of the second character of a string, not the first), or any
|
|||
|
possible values that would not work the way one would expect (such as,
|
|||
|
that strings containing newlines are not guaranteed to work), be sure
|
|||
|
to say so.
|
|||
|
|
|||
|
Please put two spaces after the end of a sentence in your comments, so
|
|||
|
that the Emacs sentence commands will work. Also, please write
|
|||
|
complete sentences and capitalize the first word. Avoid putting
|
|||
|
a case-sensitive identifier at the beginning of a sentence, because
|
|||
|
such identifiers look strange if lower case and yet are erroneous if
|
|||
|
capitalized.
|
|||
|
|
|||
|
The comment on a function is much clearer if you use the argument
|
|||
|
names to speak about the argument values. Thus, "the inode number
|
|||
|
NODE_NUM" rather than "an inode". Our convention is to put the
|
|||
|
argument name in all caps when speaking about its value in this way.
|
|||
|
|
|||
|
There is usually no purpose in restating the name of the function in
|
|||
|
the comment before it, because the reader can see that for himself.
|
|||
|
There might be an exception when the comment is very long.
|
|||
|
|
|||
|
There should be a comment on each static variable as well, like this:
|
|||
|
|
|||
|
/* Nonzero means truncate lines in the display;
|
|||
|
zero means continue them. */
|
|||
|
|
|||
|
int truncate_lines;
|
|||
|
|
|||
|
|
|||
|
Syntactic Standards:
|
|||
|
|
|||
|
Please explicitly declare all arguments to functions.
|
|||
|
Don't omit them just because they are ints.
|
|||
|
|
|||
|
Declarations of external functions and functions to appear later
|
|||
|
in the source file should all go in one place near the beginning of
|
|||
|
the file (somewhere before the first function definition in the file),
|
|||
|
or else should go in a header file. Don't put extern declarations
|
|||
|
inside functions.
|
|||
|
|
|||
|
Don't declare multiple variables in one declaration that spans lines.
|
|||
|
Start a new declaration on each line, instead. For example, instead
|
|||
|
of this:
|
|||
|
|
|||
|
int foo,
|
|||
|
bar;
|
|||
|
|
|||
|
write either this:
|
|||
|
|
|||
|
int foo, bar;
|
|||
|
|
|||
|
or this:
|
|||
|
|
|||
|
int foo;
|
|||
|
int bar;
|
|||
|
|
|||
|
(If they are global variables, each should have a comment
|
|||
|
preceding it anyway.)
|
|||
|
|
|||
|
When you have an if-else statement nested in another if statement,
|
|||
|
always put braces around the if-else. Thus, never write like this:
|
|||
|
|
|||
|
if (foo)
|
|||
|
if (bar)
|
|||
|
win ();
|
|||
|
else
|
|||
|
lose ();
|
|||
|
|
|||
|
always like this:
|
|||
|
|
|||
|
if (foo)
|
|||
|
{
|
|||
|
if (bar)
|
|||
|
win ();
|
|||
|
else
|
|||
|
lose ();
|
|||
|
}
|
|||
|
|
|||
|
Don't declare both a structure tag and variables or typedefs in the
|
|||
|
same declaration. Instead, declare the structure tag separately
|
|||
|
and then use it to declare the variables or typedefs.
|
|||
|
|
|||
|
Try to avoid assignments inside if-conditions. For example, don't
|
|||
|
write this:
|
|||
|
|
|||
|
if ((foo = (char *) malloc (sizeof *foo)) == 0)
|
|||
|
fatal ("out of memory");
|
|||
|
|
|||
|
instead, write this:
|
|||
|
|
|||
|
foo = (char *) malloc (sizeof *foo);
|
|||
|
if (foo == 0)
|
|||
|
fatal ("out of memory");
|
|||
|
|
|||
|
Naming Standards:
|
|||
|
|
|||
|
Please use underscores to separate words in a name,
|
|||
|
so that the Emacs word commands can be useful within them.
|
|||
|
Stick to lower case; reserve upper case for macros and enum
|
|||
|
constants, and for name-prefixes that follow a uniform convention.
|
|||
|
|
|||
|
For example, use names like `ignore_space_change_flag';
|
|||
|
don't use names like `iCantReadThis'.
|
|||
|
|
|||
|
Variables that indicate whether command-line options have been
|
|||
|
specified should be named after the meaning of the option, not after
|
|||
|
the option-letter. A comment should state both the exact meaning of
|
|||
|
the option and its letter. For example,
|
|||
|
|
|||
|
/* Ignore changes in horizontal whitespace (-b). */
|
|||
|
int ignore_space_change_flag;
|
|||
|
|
|||
|
|
|||
|
Semantic Standards:
|
|||
|
|
|||
|
Avoid arbitrary limits on the length or number of ANY data structure,
|
|||
|
including filenames, lines, files, and symbols, by allocating all
|
|||
|
data structures dynamically. In most Unix utilities, "long lines
|
|||
|
are silently truncated". This is not acceptable in a GNU utility.
|
|||
|
|
|||
|
Utilities reading files should not drop null characters, or any other
|
|||
|
nonprinting characters including those with codes above 0177, except
|
|||
|
perhaps utilities specifically intended for interface to printers.
|
|||
|
|
|||
|
Check every system call for an error return, unless you know you
|
|||
|
wish to ignore errors. Include the system error text (from perror
|
|||
|
or equivalent) in *every* error message resulting from a failing
|
|||
|
system call, as well as the name of the file if any and the
|
|||
|
name of the utility. Just "cannot open foo.c" or "stat failed"
|
|||
|
is not sufficient.
|
|||
|
|
|||
|
Check every call to `malloc' or `realloc' to see if it returned zero.
|
|||
|
Check `realloc' even if you are making the block smaller; in a power-of-2
|
|||
|
system, this can require allocating new space!
|
|||
|
|
|||
|
In Unix, `realloc' can destroy the storage block if it returns zero.
|
|||
|
GNU `realloc' does not have this bug, so your utilities can assume
|
|||
|
that `realloc' works conveniently (does not destroy the original block
|
|||
|
when it returns zero.) If you wish to run them on Unix, and wish to
|
|||
|
avoid lossage in this case, you can use the GNU `malloc'.
|
|||
|
|
|||
|
Do not assume anything about the contents of a block of memory
|
|||
|
after it has been passed to `free'.
|
|||
|
|
|||
|
When static storage is to be written in during program execution,
|
|||
|
use explicit C code to initialize it. Reserve C initialized
|
|||
|
declarations for data that will not be changed.
|
|||
|
|
|||
|
Try to avoid low-level interfaces to obscure Unix data structures
|
|||
|
(such as file directories, utmp, or the layout of kernel memory),
|
|||
|
since these are less likely to work compatibly. If you need to
|
|||
|
find all the files in a directory, use `readdir' or some other
|
|||
|
high-level interface. These will be supported compatibly by GNU.
|
|||
|
|
|||
|
GNU signal handling will probably be like that in BSD, rather than
|
|||
|
that in system V, because BSD's is more powerful and easier to use.
|
|||
|
|
|||
|
|
|||
|
Portability standards:
|
|||
|
|
|||
|
Much of what is called "portability" in the Unix world refers to
|
|||
|
porting to different Unix versions. This is not relevant to GNU
|
|||
|
software, because its purpose is to run on top of one and only
|
|||
|
one kernel, the GNU kernel, compiled with one and only one C
|
|||
|
compiler, the GNU C compiler. The amount and kinds of variation
|
|||
|
among GNU systems on different cpu's will be like the variation
|
|||
|
among Berkeley 4.3 systems on different cpu's.
|
|||
|
|
|||
|
It is difficult to be sure exactly what facilities the GNU kernel will
|
|||
|
provide, since it isn't finished yet. Therefore, assume you can use
|
|||
|
anything in 4.3; just avoid using the format of semi-internal data
|
|||
|
bases (utmp, directories, kmem) when there is a higher-level
|
|||
|
alternative.
|
|||
|
|
|||
|
You can freely assume any reasonably standard facilities in the C
|
|||
|
language, libraries or kernel, because any such facility is part
|
|||
|
of GNU's job to support. The fact that there may exist kernels or
|
|||
|
C compilers that lack those facilities is irrelevant as long
|
|||
|
as the GNU kernel and C compiler support them.
|
|||
|
|
|||
|
It remains necessary to worry about differences among cpu types, such
|
|||
|
as the difference in byte ordering and alignment restrictions.
|
|||
|
However, I don't expect 16-bit machines ever to be supported by GNU,
|
|||
|
so there is no point in spending any time to consider the possibility
|
|||
|
that an int will be less than 32 bits.
|
|||
|
|
|||
|
You can assume that it is reasonable to use a meg of memory. Don't
|
|||
|
strain to reduce memory usage unless it can get to that level. If
|
|||
|
your program creates complicated data structures, just make them in
|
|||
|
core and give a fatal error if malloc returns zero.
|
|||
|
|
|||
|
If a program works by lines and could be applied to arbitrary user-
|
|||
|
supplied input files, it should keep only a line in memory, because
|
|||
|
this is not very hard and users will want to be able to operate
|
|||
|
on input files that are bigger than will fit in core all at once.
|
|||
|
|
|||
|
|
|||
|
|