From: Eliot Lear <lear@NET.BIO.NET>

The following was written by Dr. Charles Hedrick of Rutgers University
sometime in 1985. Please read it with the understanding that rule
numbers are nothing more than function names. For further reference,
I suggest the Sun Tutorial on Sendmail in their manuals.
-eliot

Command: followup
Newsgroups: net.unix-wizards,net.mail
To: steve@jplgodo.UUCP
Subject: a brief tutorial on sendmail rules
Distribution:
References: <902@rlgvax.UUCP> <545@jplgodo.UUCP>

A previous message suggested using "sendmail -bt" to see how sendmail
is going to process an address. This is indeed a handy command for
testing how an address will be processed. However the instructions
given were not quite right. To see how sendmail is going to deliver
mail to a given address, a reasonable thing to type is
    sendmail -bt
    0,4 address
Even this isn't quite right, but with "normal" rule sets it should work.

Because there is so much confusion about sendmail rules, the rest of
this message contains a brief tutorial. My own opinion of sendmail is
that it is quite a good piece of work. Many people have complained
about the difficulty of understanding sendmail rule sets. However I
have also worked with mailers that code address processing directly
into the program. I much prefer sendmail. The real problem is not
with sendmail, but with the rules. The rules normally shipped from
Berkeley have lots of code that does strange Berkeley-specific things,
and they are not commented. Also, typical complex rule sets are
trying to handle lots of things, forwarding mail among several
different mail systems with incompatible addressing conventions. A
rule set to handle just old-style (non-domain) UUCP mail would be very
simple and easy to understand. But real rule sets are not doing
simple things, so they are not simple.

For those not familiar with sendmail, -bt invokes the rule tester. It
lets you type a set of rule numbers and an address, and then shows you
what the rules will do to that address. In addition, rule test mode
automatically applies rule 3 before whatever rule you ask it to apply.
As we will see shortly, this is a reasonable thing to do.

Before describing the rule sets, let me define two terms: "header" and
"envelope". Header refers to the lines at the beginning of the
message, starting with "from:", "to:", "subject:", etc. Sendmail does
process these lines. E.g. with uucp mail it will add its own host
name at the beginning of the from line, so that the final recipient
stands some chance of replying to the message. However sendmail
normally does not depend upon the from and to lines to perform its
actual delivery. It has more direct knowledge, passed on to it from
the program that generated the mail, or if it came from another site,
the mailer at that site. This information is referred to as the
"envelope", since it is like the addresses on the outside of an
envelope. For Arpanet mail, the envelope is passed to the next site
by the MAIL FROM: and RCPT TO: commands. For UUCP mail, it is passed
on as arguments to the remote rmail command. To see why there have to
be separate addresses "on the envelope", consider what happens when
you send mail to "john@vax, mary@sun". Two copies of the message will
be dispatched, one to vax and the other to sun. The "to: " line in
the headers will show both addresses. However the envelope will show
only the address that we want this copy to go to. The copy sent
to vax will show "john@vax" and the copy sent to sun will show
"mary@sun". If sendmail had to look at the "to: " line, it would
never know which of the addresses shown there it was responsible for
handling.

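To make the header/envelope split concrete, here is a minimal Python
sketch of the john@vax, mary@sun example. It is not sendmail code; the
helper name split_by_host and the dictionary layout are made up for
illustration, and it assumes the destination host is whatever follows
the last "@".

```python
from collections import defaultdict

def split_by_host(recipients):
    """Group envelope recipients by the host after the last '@'."""
    by_host = defaultdict(list)
    for rcpt in recipients:
        host = rcpt.rsplit("@", 1)[1]
        by_host[host].append(rcpt)
    return dict(by_host)

# The header is part of the message text and is identical in every copy.
header_to = "john@vax, mary@sun"
recipients = [r.strip() for r in header_to.split(",")]

# One copy per destination host; each copy carries only its own envelope.
copies = [
    {"envelope_to": rcpts, "header_to": header_to}
    for host, rcpts in split_by_host(recipients).items()
]

for copy in copies:
    print(copy["envelope_to"], "| To:", copy["header_to"])
```

Each copy keeps the full "to: " line, but its envelope names only the
recipient that copy is responsible for.
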
Anyway, here is what the rules do:

3: always done first. This turns addresses from their normal textual
form into a form that the rest of the rules understand. In most
cases, all it does is put < > around the name of the host that is next
in line. Thus foo@bar turns into foo<@bar>. However it also does a
few transformations. E.g. it turns foo!bar!user into
bar!user<@foo.UUCP>. Since sendmail accepts either ! syntax or
@....UUCP syntax, rule 3 standardizes on @ syntax. It also does a few
other minor things. But you won't be far off if you just think of it
as adding < > around the host name.

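As a rough model of what rule 3 does to the examples in this message,
the following Python sketch may help. It is a simplification and not a
translation of any real rule set 3: the function name canonicalize is
made up, and the choice to look for "!" before "@" is an assumption
taken from the topaz!user@red example discussed later.

```python
def canonicalize(addr):
    """Simplified stand-in for rule set 3: mark the next-hop host with < >."""
    if "<" in addr:                      # already in internal form
        return addr
    if "!" in addr:                      # UUCP path: the first hop is the host
        host, rest = addr.split("!", 1)
        return f"{rest}<@{host}.UUCP>"
    if "@" in addr:                      # user@host: host follows the last @
        user, host = addr.rsplit("@", 1)
        return f"{user}<@{host}>"
    return addr                          # bare local name, nothing to mark

for a in ("foo@bar", "foo!bar!user", "topaz!user@red"):
    print(a, "->", canonicalize(a))
```
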
4: always done last. This turns addresses from internal form back
into external form. It removes the < > around the host name, and
turns foo@bar.UUCP back into bar!foo. Again, there are one or two
other minor things, but you won't be too far off if you think of 4 as
just removing the < > around the host name.

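The reverse direction can be sketched the same way. Again this is only
a model of the behavior described above, not real rule set 4; the name
externalize is invented, and the regular expression works on characters
rather than on sendmail tokens.

```python
import re

def externalize(addr):
    """Simplified stand-in for rule set 4: internal form back to external."""
    plain = addr.replace("<", "").replace(">", "")       # drop the < > marks
    m = re.fullmatch(r"(.+)@([^@]+)\.UUCP", plain, re.IGNORECASE)
    if m:
        return f"{m.group(2)}!{m.group(1)}"              # u@host.UUCP => host!u
    return plain

for a in ("foo<@bar>", "foo<@bar.UUCP>", "bar!user<@foo.UUCP>"):
    print(a, "->", externalize(a))
```
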
0: This is the rule that handles the destination address on the
envelope. It is in some sense the primary rule. It returns a triple:
protocol, host, user. The protocol is usually one of local, TCP, or
UUCP. At the moment, it figures this out syntactically. In our rule
set, hosts ending in .UUCP are handled by UUCP, the current host is
local, and everything else is TCP. As domains are integrated into
UUCP, obviously this rule is going to change. This rule does very
little other than simply look at the format of the host name, though
as usual a few other details are involved (e.g. it removes the local
host, so myhost!foo!bar will be sent directly to foo).

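Here is a small Python sketch of the purely syntactic decision that
rule 0 makes, as described above. Everything in it is an assumption for
illustration: the function names, the LOCAL_HOSTS set, and the
simplified canonicalize helper standing in for rule 3. Real rule sets
handle more cases.

```python
import re

LOCAL_HOSTS = {"myhost"}   # hypothetical: names this machine answers to

def canonicalize(addr):
    """Simplified rule set 3: put < > around the next-hop host."""
    if "!" in addr:
        host, rest = addr.split("!", 1)
        return f"{rest}<@{host}.UUCP>"
    if "@" in addr:
        user, host = addr.rsplit("@", 1)
        return f"{user}<@{host}>"
    return addr

def resolve(focused):
    """Simplified rule set 0: return the triple (protocol, host, user)."""
    m = re.fullmatch(r"(.+)<@([^<>]+)>", focused)
    if m is None:                                  # no host part at all
        return ("local", None, focused)
    user, host = m.group(1), m.group(2)
    is_uucp = host.lower().endswith(".uucp")
    name = host[:-5] if is_uucp else host
    if name.lower() in LOCAL_HOSTS:
        # Remove our own name and look at what is left.
        return resolve(canonicalize(user))
    return ("UUCP" if is_uucp else "TCP", name, user)

for a in ("myhost!foo!bar", "john@vax", "mary@myhost"):
    print(a, "->", resolve(canonicalize(a)))
```
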
1 and 2 are protocol-independent transformations used for sender and
recipient lines in the header (i.e. from: and to: lines). In our
rule sets, they don't do anything.

Each protocol has its own rules to use for sender and recipient lines
in the header. E.g. UUCP rules might add the local host name to the
beginning of the from line and remove it from the to line. In our
rule set, the complexities in these rules are primarily caused by
forwarding between UUCP and TCP. The line that defines the mailer for
a protocol lists the rule to use for sender and recipient, in the S=
and R= fields.

Finally, here is the exact sequence in which these rules are used.
For example, the first line means that the destination specified in
the envelope is processed first by rule 3, then rule 0, then rule 4.

    envelope recipient: 3,0,4    [actually rule 4 is applied only to the
                                  user name portion of what rule 0 returns]
    envelope sender:    3,1,4
    header recipient:   3,2,xx,4 [xx is the rule number specified in R=]
    header sender:      3,1,xx,4 [xx is the rule number specified in S=]

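The same table can be written as a small lookup, shown here as a Python
sketch. The function name rule_sequence and its mailer_rule parameter
are invented for illustration; the value 13 in the usage line assumes a
mailer whose S= field names rule set 13.

```python
def rule_sequence(kind, mailer_rule=None):
    """Return the rule sets applied to an address, in order."""
    table = {
        "envelope recipient": [3, 0, 4],
        "envelope sender":    [3, 1, 4],
        "header recipient":   [3, 2, mailer_rule, 4],   # mailer_rule from R=
        "header sender":      [3, 1, mailer_rule, 4],   # mailer_rule from S=
    }
    seq = table[kind]
    if None in seq:
        raise ValueError(f"{kind} needs the mailer's S= or R= rule number")
    return seq

print(rule_sequence("envelope recipient"))
print(rule_sequence("header sender", mailer_rule=13))
```
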
I have the impression that the sender from the envelope (the
return-path) may actually get processed twice, once by 3,1,4 and the
second time by 3,1,xx,4. However I'm not sure about that.

Now for the format of the rules themselves. I'm just going to show
some examples, since sendmail comes with a reference manual, which you
can refer to. However these examples are probably enough to let you
understand any set of rules that makes sense in the first place (which
the normal rules do not). This example is from our UUCP definition.
It is a simplified version of the set of rules used to process the sender
specification. As such, the major thing it has to do is to add our
host name to the beginning, so that the guy at the end will know that
the mail went through us.

S13
R$+<@$-.UUCP>       $2!$1           u@host.UUCP => host!u
R$=U!$+             $2              strip local name
R$+                 $:$U!$1         stick on our host name

Briefly, the first rule turns the address from the form foo<@bar.UUCP>
back into bar!foo. The second rule removes our local host name, if
it happens to be there already, so we don't get it twice. The third
rule adds our host name to the beginning.

S13 says that this is the beginning of a new rule set, number 13.

R$+<@$-.UUCP>       $2!$1           u@host.UUCP => host!u

R says that this is a rule. The thing immediately after it,
$+<@$-.UUCP> is a pattern. If this pattern matches the address, then
the rule "triggers". If the rule triggers, the address is replaced
with the "right hand side", i.e. what is after the tab(s). In this
rule, the right hand side is $2!$1. The thing after the next tab(s) is
a comment. This rule is used in processing UUCP addresses. As noted
above, by the time we get to it, rule 3 has already been applied. So
if we had a UUCP address of the form host1!host2!user, it would now be
in the form host2!user<@host1.UUCP>. This does match the pattern:

    $+        <@$-   .UUCP>
    host2!user<@host1.UUCP>

$+ and $- are "wildcards" that match anything. $- will match exactly
one word, while $+ will match one or more. (By the way, with the
increasing use of domains, this production should probably use
$+.UUCP, not $-.UUCP.) Since the pattern matches, we replace this
with the "right hand side" of the rule, $2!$1. $ followed by a digit
means the Nth thing matched by a wildcard. In this case there were
two wildcards, so
    $1 = host2!user
    $2 = host1
The final result is
    host1!host2!user
As you can see, we have simply turned UUCP addresses from the format
produced by rule 3 back into normal ! format.

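For readers who think better in code, here is a toy matcher in Python
that mimics this one rule. It is a sketch under stated assumptions, not
sendmail's actual algorithm: the operator character list in OPS, the
function names, and the shortest-match-first search order are all
choices made for this illustration, and only $*, $+ and $- are handled.

```python
import re

OPS = r".:%@!^=/\[\]<>"          # assumed set of operator characters

def tokenize(text):
    """Split into tokens; wildcards and operator characters stand alone."""
    return re.findall(rf"\$[*+-]|[{OPS}]|[^{OPS}$\s]+", text)

def match(pattern, tokens):
    """Return the token list bound to each wildcard, or None if no match."""
    def go(p, t):
        if p == len(pattern):
            return [] if t == len(tokens) else None
        item = pattern[p]
        if item in ("$*", "$+", "$-"):
            low = 0 if item == "$*" else 1
            high = 1 if item == "$-" else len(tokens) - t
            for n in range(low, high + 1):       # try the shortest match first
                if t + n > len(tokens):
                    break
                rest = go(p + 1, t + n)
                if rest is not None:
                    return [tokens[t:t + n]] + rest
            return None
        # Literal token; compared without regard to case in this sketch.
        if t < len(tokens) and tokens[t].lower() == item.lower():
            return go(p + 1, t + 1)
        return None
    return go(0, 0)

def rewrite(rhs, groups):
    """Substitute $1, $2, ... with the text each wildcard matched."""
    return re.sub(r"\$(\d)", lambda m: "".join(groups[int(m.group(1)) - 1]), rhs)

groups = match(tokenize("$+<@$-.UUCP>"), tokenize("host2!user<@host1.UUCP>"))
print(["".join(g) for g in groups])    # ['host2!user', 'host1']
print(rewrite("$2!$1", groups))        # host1!host2!user
```

Because $- binds exactly one token, a dotted host such as a.b.UUCP does
not match this pattern in the sketch, which is the point of the remark
about $+.UUCP above.
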
The second rule is

R$=U!$+             $2              strip local name

This is needed because there are situations in which our host name
ends up on the beginning of the recipient address. Since we are
about to add our host name, we don't want it to be there twice.
So if it was there before, we remove it. $= is used to see if
something is a member of a specified "class". U happens to be a list
of our UUCP host name and any nicknames. So $=U!$+ matches
any address that begins with our host name or nickname, then !, then
anything else. Suppose we had topaz!host1!host2!user. The
match would be

    $=U  !$+
    topaz!host1!host2!user

The result of the match is that

    $1 = topaz
    $2 = host1!host2!user

Since the right hand side of this rule is simply "$2", the result is

    host1!host2!user

I.e. we have removed the topaz from the beginning. By the way, the
class U used by the rule would have been defined earlier in the file
by the statement

CUtopaz ru-topaz

C defines a class. U is the name of the class. The rest of the
line is the list of things that will be in the class.

Finally we have the rule

R$+                 $:$U!$1         stick on our host name

The $+ matches anything. In this case the name is host1!host2!user, so the
result of the match is

    $1 = host1!host2!user

The right hand side looks slightly obscure. $: is a tag that says to do
this only once. The problem is that this rule always applies, since the
pattern matches anything. Normally, rules are applied over and
over, as long as they apply. In this case, the result would be
an infinite loop. Putting $: at the beginning says to do it only
once. $U says to use the value of the macro U. Earlier in the
file we defined U as our UUCP host name, with a definition

DUtopaz

Note that there can be a class and a macro with the same name.
$=U tests whether something is in the class U. $U is replaced
by the value of the macro U.

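The class/macro distinction can be pictured as two separate
namespaces. The following Python sketch reads the two definition lines
quoted above; the function name load_definitions is made up, and
letting repeated C lines add to the same class is an assumption of this
sketch.

```python
def load_definitions(lines):
    """Collect C (class) and D (macro) lines into two separate namespaces."""
    classes, macros = {}, {}
    for line in lines:
        kind, name, rest = line[0], line[1], line[2:]
        if kind == "C":
            classes.setdefault(name, set()).update(rest.split())
        elif kind == "D":
            macros[name] = rest
    return classes, macros

classes, macros = load_definitions(["CUtopaz ru-topaz", "DUtopaz"])

print("ru-topaz" in classes["U"])   # the kind of test $=U performs
print(macros["U"] + "!user")        # the kind of substitution $U performs
```

The same letter U indexes both dictionaries without conflict, which is
why $=U and $U can coexist.
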
So the final value of this rule, $:$U!$1, is

    topaz!host1!host2!user

So this rule has managed to add our host name to the beginning, as it
was supposed to. Since there are no further rules in the set (the
next line is the end of file or the beginning of a new rule set),
this value is returned.

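Putting the three rules together, rule set 13 can be approximated by
the Python sketch below. It is a hand translation, so treat it as an
illustration only: the regular expression works on characters rather
than sendmail tokens, [^.!@<>]+ is my stand-in for the single-token
wildcard $-, and the names CLASS_U, MACRO_U and ruleset_13 are invented
here. The while loops model "apply a rule as long as it matches" and
the final if models the once-only $: tag.

```python
import re

CLASS_U = {"topaz", "ru-topaz"}   # from the CU line
MACRO_U = "topaz"                 # from the DU line

def ruleset_13(addr):
    # R$+<@$-.UUCP>   $2!$1     u@host.UUCP => host!u
    while True:
        m = re.fullmatch(r"(.+)<@([^.!@<>]+)\.UUCP>", addr, re.IGNORECASE)
        if m is None:
            break
        addr = f"{m.group(2)}!{m.group(1)}"
    # R$=U!$+         $2        strip local name
    while True:
        head, sep, rest = addr.partition("!")
        if not (sep and rest and head.lower() in CLASS_U):
            break
        addr = rest
    # R$+             $:$U!$1   stick on our host name (applied once)
    if addr:
        addr = f"{MACRO_U}!{addr}"
    return addr

print(ruleset_13("host2!user<@host1.UUCP>"))        # topaz!host1!host2!user
print(ruleset_13("host1!host2!user<@topaz.UUCP>"))  # topaz!host1!host2!user
```
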
There are several more magic things that can appear in a pattern.
The most important are:

$* - this is another wild card. It is similar to $+, but $+ matches
anything, whereas $* matches both anything and nothing. I.e. $+
matches 1 or more tokens and $* matches 0 or more tokens. So here
is a list of the wildcards I have mentioned:

    $*      0 or more
    $+      1 or more
    $-      exactly 1
    $=x     any member of class x

A typical example of $* is a production where we aren't sure whether
the user name is before or after the host name:

R$*<@$+.UUCP>$*     $@$1<@$2.UUCP>$3

This production would test for the host name ending in .UUCP, and
return immediately. $@ is a flag you haven't seen yet. It is simply
a return statement. It causes the right hand side of this rule to be
returned as the final value of this rule set.

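A quick way to see what $* buys you is the following Python sketch of
this production. It is a character-level regular expression written by
hand, not sendmail's token matcher, and the names UUCP_FOCUSED and
first_rule are invented. The address with the user after the host is a
hypothetical input chosen only to exercise the empty $1 case. The
boolean in the return value models the $@ return.

```python
import re

# Hand translation of the pattern $*<@$+.UUCP>$* into a regex.
UUCP_FOCUSED = re.compile(r"(.*)<@(.+)\.UUCP>(.*)", re.IGNORECASE)

def first_rule(addr):
    """Return (address, done); done=True models the $@ return."""
    m = UUCP_FOCUSED.fullmatch(addr)
    if m:
        # Right hand side $@$1<@$2.UUCP>$3 rebuilds the same address.
        return f"{m.group(1)}<@{m.group(2)}.UUCP>{m.group(3)}", True
    return addr, False      # rule did not trigger; later rules would run

for addr in ("user<@host.UUCP>", "<@host.UUCP>:user", "user<@red>"):
    print(addr, "->", first_rule(addr))
```

With $+ in place of the leading $*, the second address would not match,
because nothing precedes the < there.
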
The other magic thing I will mention is $>. This is a subroutine
call. Here is an example taken from rule set 24, which is used to
process recipients in TCP mail. Its purpose is to handle the
situation where we might have an address like topaz!user@red. (Our
host name is topaz. Red is a local host that we talk to via TCP.)
I.e. someone is asking us to relay mail to red. Rule 3 will have
turned this into user@red<@topaz.UUCP>. What we want to do is
get rid of the topaz.UUCP and treat red as the host. (Rule set 0
would do this for the recipient on the envelope. This rule is
used for the to: field in the header.) Here is the rule.

R$+<@$=U.UUCP>      $@$>9$1         in case local!a@b

The pattern matches our example, as follows:

    $+      <@$=U  .UUCP>
    user@red<@topaz.UUCP>

Recall that $+ matches anything and $=U tests whether something is our
UUCP host name or one of our nicknames. The result of the match is

    $1 = user@red
    $2 = topaz

The right hand side is $@$>9$1. The $@ is the tag saying to stop the
rule set here and return this value. $>9 is a subroutine call. It
says to take the right hand side, pass it to rule set 9, and then
use the value of rule set 9. The actual right hand side is simply
$1, which in this case is user@red. Here is rule set 9:

S9
R$*<$*>$*           $1$2$3          defocus
R$+                 $:$>3$1         make canonical
R$+                 $@$>24$1        and do 24 again

The first rule simply removes < >. It is sort of a quick and dirty
version of rule 4. In fact we have no < > left, since we have removed
the <@topaz.UUCP>. So this rule does not trigger. (Now that I think
about it, I suspect it is probably never going to trigger, and so is
not needed.)

The next rule is a simple subroutine call. It matches anything ($+
matches any 1 or more tokens). The right hand side is $:$>3$1. The $:
says to do it only once. Since the rule matches anything, you need
this, or you will have an infinite loop. The $>3 says to call rule 3
as a subroutine. The $1 is the actual right hand side. Since the
left hand side matched the whole address, what this rule does is
simply call rule set 3 on the whole address. Recall that rule set 3
basically locates the host name and puts < > around it. So in this
case the result is user<@red>. As you can see, it was not enough to
remove <@topaz.UUCP>. That leaves us with no host name. We have to
call rule 3 to find the current host name and put < > around it.

The last rule is really just a goto statement. The pattern is $+,
which matches anything, so it always triggers. The right hand side is
$@$>24$1. The $@ is the return tag. It says to stop this rule set
and return that value. $>24 says to call rule set 24. The actual
right hand side is $1, so we call rule set 24 with the whole address.
If you recall, this ruleset (9) was called from the middle of 24 when
we found user@red<@topaz.UUCP>. So what we have done is to change
this into user<@red> and say to start rule set 24 over again.

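The call pattern between rule sets 24 and 9 is easier to follow as
ordinary function calls. The Python sketch below models only the one
rule from set 24 quoted above plus rule set 9; the rest of set 24 is
not shown in this message, so the sketch simply returns the address
unchanged at that point. The function names, the CLASS_U set and the
simplified ruleset_3 are assumptions made for illustration.

```python
import re

CLASS_U = {"topaz", "ru-topaz"}   # our UUCP host name and nicknames

def ruleset_3(addr):
    """Simplified rule set 3: put < > around the next-hop host."""
    if "!" in addr:
        host, rest = addr.split("!", 1)
        return f"{rest}<@{host}.UUCP>"
    if "@" in addr:
        user, host = addr.rsplit("@", 1)
        return f"{user}<@{host}>"
    return addr

def ruleset_9(addr):
    addr = re.sub(r"[<>]", "", addr)     # R$*<$*>$*  $1$2$3    defocus
    addr = ruleset_3(addr)               # R$+        $:$>3$1   make canonical
    return ruleset_24(addr)              # R$+        $@$>24$1  and do 24 again

def ruleset_24(addr):
    # R$+<@$=U.UUCP>   $@$>9$1   in case local!a@b
    m = re.fullmatch(r"(.+)<@([^.!@<>]+)\.UUCP>", addr, re.IGNORECASE)
    if m and m.group(2).lower() in CLASS_U:
        return ruleset_9(m.group(1))     # $@ means return whatever 9 gives back
    return addr     # the remaining rules of set 24 are not modeled here

print(ruleset_24(ruleset_3("topaz!user@red")))   # user<@red>
```
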
I hope you have found this exposition useful. As a final convenience,
here is a "reference card" for reading rule sets. Note that this
contains only operators used by the rules. There are plenty of
other facilities used in the configuration section which I am
not documenting here. (I'd love to see someone produce a complete
reference card.)

wildcards:
    $*      0 or more tokens
    $+      1 or more tokens
    $-      exactly one token
    $=x     member of class x (x must be a letter, lower/upper case distinct)
    $~x     not a member of class x

macro values (usable in pattern or on right hand side)
    $x      value of macro x (x must be a letter, lower/upper case distinct)
            At least on the Pyramid, $x is replaced by the macro's value
            when the sendmail.cf file is being read in.

on the right hand side:
    $n      string matched by the Nth wildcard
    $>n     call rule set N as a subroutine
    $@      return
    $:      only do this rule once

in rule 0, defining the return value
    $#      protocol
    $@      host
    $:      user

Rutgers extensions, usable only on right hand side
    $%n     take the string matched by the Nth wildcard, look it up in
            /etc/hosts, and if found use the primary host name
    $&x     use the current value of macro x. x must be a letter.
            upper and lower case are treated as distinct.