Exim 3 as a mail hub for Microsoft Exchange Server 2000

Purpose

I was asked at my company to provide a spam filter for email. We run Microsoft Exchange Server 2000 as our mail..uh..."system", and I discovered very quickly that spam filters for Exchange are expensive, as in > $0. So I decided to take an old workstation nobody was using anymore and install a spam filter on Linux. The final, working-quite-well solution uses the Debian stable distribution, Exim 3.x, and SpamAssassin.

Requirements

The basic requirement for a mail hub in front of MS Exchange is that it pass desirable incoming mail to Exchange. In our case, it does not need to handle outgoing mail--Exchange can handle that as normal. Others may decide to isolate Exchange completely, which shouldn't be more difficult to implement than the incoming side was for me. Internal mail will not go through the hub--it will be handled entirely by Exchange. Therefore, the mail hub MTA must be able to: recognize valid local addresses, check the mail sent to them for spam, and deliver it to the back end Exchange server. In addition, we will implement tools to help train the spam filter.

Setup

Install Exim. Install SpamAssassin. If you were running Debian, you could be home by now. We will first configure Exim with hub functionality. Then, we will "turn it on" by rerouting incoming mail and test. Finally, we will add the SpamAssassin package. Test again. Accept accolades.

Configuration

Exim general settings

Exim is a free (speech and beer) MTA. It is configured via a flat configuration file. Certain global settings must be made before incoming mail will even be acknowledged. Exim divides its .conf file into several sections, each terminated by a line containing only the word "end". The first section holds the main configuration settings:

######################################################################
#                    MAIN CONFIGURATION SETTINGS                     #
######################################################################

# Add this domain to all addresses that arrive without one (unqualified)
qualify_domain = amor.org

# local domains
local_domains = /etc/exim/domains
local_domains_include_host = true
local_domains_include_host_literals = true

never_users = root
trusted_users = mail

smtp_verify = true
smtp_accept_queue_per_connection = 100

freeze_tell_mailmaster = true

received_header_text = "Received: \
         ${if def:sender_rcvhost {from ${sender_rcvhost}\n\t}\
         {${if def:sender_ident {from ${sender_ident} }}\
         ${if def:sender_helo_name {(helo=${sender_helo_name})\n\t}}}}\
         by ${primary_hostname} \
         ${if def:received_protocol {with ${received_protocol}}} \
         (Exim ${version_number} #${compile_number} (Debian))\n\t\
         id ${message_id}\
         ${if def:received_for {\n\tfor <$received_for>}}"

receiver_try_verify = true

EXEMPDIR = /etc/exim/exemployees/

end

The /etc/exim/domains file contains a list of domains for which we wish to receive mail. Each domain for which we receive mail is listed twice, first as a "glob" with wildcards, then as a single key. The globs allow us, in some uses of this file, to accept mail at mail.amor.org, for example. It looks like this:

*.amorhq.net
*.amor.org
*.amorministries.org
*.amorministries.com
*.comebuildhope.org
*.comebuildhope.com
amorhq.net
amor.org
amorministries.org
amorministries.com
comebuildhope.org
comebuildhope.com
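
The "globs first, then bare keys" layout is easy to get wrong by hand when a domain is added. A minimal sketch of generating the file from a list of bare domain names follows; gen_domains is a hypothetical helper name, not something shipped with Exim or Debian.

```shell
#!/bin/sh
# Sketch: print a domains file in the "globs first, then bare keys" layout.
# gen_domains is a hypothetical helper, not part of Exim.
gen_domains() {
  # First pass: one wildcard entry per domain (matches mail.amor.org etc.)
  for d in "$@"; do
    printf '*.%s\n' "$d"
  done
  # Second pass: the bare domain, usable as a single lookup key
  for d in "$@"; do
    printf '%s\n' "$d"
  done
}

# Example (writes nothing unless you redirect it yourself):
#   gen_domains amorhq.net amor.org comebuildhope.org > /etc/exim/domains
```

Keeping the list of bare domains in one place and regenerating the file means the glob and key halves cannot drift apart.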

The remainder of the configuration settings are trivial. Notice, however, that we do not allow relaying of any kind (since we do not handle outgoing mail), and we set up a macro, called EXEMPDIR, which points to a local filesystem directory which we will use shortly.

Hub Directors and Transport

In Exim, local addresses are verified and then handled by directors, and remote addresses by routers, both of which then use transports for the actual delivery. Therefore, we need to set up directors to recognize valid local addresses, and a transport to deliver mail to the Exchange server. In the exim.conf file, transports are listed before directors, even though the directors are processed first. So, in the TRANSPORTS section, we add a single transport (the remote_smtp transport comes standard in the Debian install; we might as well keep it):

######################################################################
#                      TRANSPORTS CONFIGURATION                      #
######################################################################

# This transport performs the central task of routing
# incoming mail to the Exchange server.
local_smtp:
  driver = smtp
  hosts = 192.168.0.38
  hosts_override

remote_smtp:
  driver = smtp

end

Pretty straightforward. We added a transport named "local_smtp", which uses SMTP to deliver mail to our Exchange server, sitting at 192.168.0.38 on our LAN. The hosts_override line simply instructs Exim to always send to the IP specified with hosts. If you had multiple Exchange servers, you could specify multiple hosts here. We don't.

Now, we create three directors:

######################################################################
#                      DIRECTORS CONFIGURATION                       #
#             Specifies how local addresses are handled              #
######################################################################

system_aliases:
  driver = aliasfile
  file = /etc/aliases/hubaliases
  search_type = lsearch

hubbed_mail:
  driver = smartuser 
  transport = local_smtp
  condition = ${lookup {$local_part} lsearch {/etc/aliases/hubusers}{yes}{no}}

fall_through:
  driver = smartuser
  transport = local_smtp
  domains = /etc/exim/domains

end

Let's start at the bottom and work our way up. The fall_through director uses the "smartuser" driver, which means it will handle all mail that reaches it. Directors are processed in order from top to bottom, so this is our "last resort" director. It uses the local_smtp transport, so all mail will be delivered to our Exchange server at 192.168.0.38. Well, not all mail. The domains line specifies it should only match mail bound for a domain listed in our file (so not, for example, fumanchu@192.168.0.6). So we get to use the same file twice. Which is nice.

Next up is the hubbed_mail director. This is the meat of the hub, and looks up canonical local addresses. Microsoft Exchange Server allows multiple email addresses for each mailbox; however, it expects one of those addresses to be the primary address. This will be the address to which we deliver mail which is processed on our hub. Again, we use the "smartuser" driver, and the same transport, but we check for the existence of the local_part in a file which looks like this (ours is actually about 4 times as long):

acha
bill
barney
dperkio
erin
fernandos
fumanchu
howardmajor
jen
lydia
missionservices
sheri
stever
wjohnson
yld

Each line is a primary address for a particular mailbox on the Exchange server. So why do we have this director when we already have the fall_through director? It seems like a good idea. I can examine the log (grep fall_through /var/log/exim/mainlog) to see addresses which are being sent through without explicit local counterparts. In our org, such mail will be bounced to a mail administrator by Exchange, and I want to stay on their good side. It also makes other configurations (in my future or your present) easier; you can change the behavior of fall_through to either fail, or even blackhole, invalid addresses. Another option might be to use winbind (part of the Samba suite) or LDAP to match valid local parts.

Later on, I got tired of that same administrator forwarding fall_through email to me all the time, requesting that yet another address be blackholed or rerouted. So I dropped the following into /etc/cron.daily/exim:

# Send an email report of addresses which fall through the hub.
if [ -f /var/log/exim/mainlog ]; then
  grep fall_through /var/log/exim/mainlog | mail -s "$(hostname): fall_through report" postmaster
fi

This should go before the scriptlet that cycles the logs, or you'll be grepping the wrong log file. I chose to receive this report whether or not it was empty (no matches), as a kind of "heartbeat". It's up to you to filter out the empty ones if you don't want that.
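
If you would rather not receive the empty "heartbeat" reports, one possible variant only sends mail when grep actually finds something. This is a sketch under assumptions: fall_through_lines is a hypothetical helper name, and the mail invocation in the comment assumes a mailx-style mail command that takes -s before the recipient.

```shell
#!/bin/sh
# Sketch of a quieter variant of the daily report.
# fall_through_lines is a hypothetical helper name.
# It prints the matching log lines and returns non-zero when there are
# none, or when the log file does not exist.
fall_through_lines() {
  [ -f "$1" ] && grep fall_through "$1"
}

# In /etc/cron.daily/exim this could be used as:
#   if report=$(fall_through_lines /var/log/exim/mainlog); then
#     printf '%s\n' "$report" | mail -s "$(hostname): fall_through report" postmaster
#   fi
```

The trade-off is that silence no longer tells you the cron job is alive, so keep the original form if you rely on the heartbeat.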

Finally, we have the system_aliases director, which uses the common "aliasfile" driver. The file is searched linearly (hence "lsearch"); each row in the file consists of a key: value pair. If the local part of the recipient address matches a key, the mail will be reprocessed using the corresponding value instead. Combined with address rewriting (discussed below), I find this much more powerful than Exchange tools for multiple addresses per user.

Notice that this alias file applies the same rules regardless of the incoming domain (as long as it's listed in local_domains). In our organization, we don't care if you send mail to fumanchu@amor.org, fumanchu@amorministries.com, or fumanchu@comebuildhope.com. It all goes to the same person. If you have other requirements, you can use multiple alias files, for example: file = /etc/aliases/$domain. The $domain part will be expanded to the incoming domain name.

In addition, we use the aliases file to utterly deny mail for certain local parts by using the :blackhole: command, which Exim interprets as you would expect. Does anyone know how to reject these on Exchange without having them tied to a mailbox? We had a single mailbox named "devnull" with all these aliases we didn't care to see in the first place, but our mail admin still got bounce messages for them. Dumb design, IMO. This is much cleaner. Our hubaliases file looks like this:

blair: bill
service: thecount
jessicaw: jessica

abuse: fumanchu
bofh: fumanchu
postmaster: fumanchu
root: fumanchu
rbre: fumanchu
mailadmin: fumanchu

elninokit: barney
draw-er: barney
thedraw-er: barney
spas: barney
pray: barney

sheril: sheri

internships: dperkio
volunteers: erin

mtsc: missionservices
mtcs: missionservices
amormtsc: missionservices

adminmgr: wjohnson
bajamgr: yld
fldmgr: yld

corinnap: :blackhole:
devnull: :blackhole:
LoriLenz: :blackhole:
mt5c: :blackhole:
proverbios: :blackhole:
rinnas: :blackhole:
ryanmorrison: :blackhole:
ship: :blackhole:
sunshine: :blackhole:

Again, this is an example. Ours is longer, and if you find performance to be an issue, you can use cdb files, or even SQL queries to something like Postgres. Notice that each valid address above aliases to an entry in our hubusers file.
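
Since every non-blackhole alias is supposed to land on an entry in hubusers, it can be worth checking that mechanically after each edit. The following is a minimal sketch, not part of the original setup: check_alias_targets is a hypothetical helper, and it assumes one "key: value" pair per line with a space after the colon and a single target, as in the example above.

```shell
#!/bin/sh
# Sketch: list alias targets that are missing from the hubusers file.
# check_alias_targets is a hypothetical helper name.
# Usage: check_alias_targets /etc/aliases/hubaliases /etc/aliases/hubusers
check_alias_targets() {
  aliases=$1
  users=$2
  awk '
    # First file (hubusers): remember every listed local part.
    FILENAME == ARGV[1] { if ($1 != "") known[$1] = 1; next }
    # Second file (hubaliases): skip comments and blank lines.
    /^[ \t]*(#|$)/ { next }
    {
      key = $1
      sub(/:$/, "", key)
      target = $2
      # Special targets such as :blackhole: are not mailboxes.
      if (target ~ /^:.*:$/) next
      if (!(target in known)) print key " -> " target
    }
  ' "$users" "$aliases"
}
```

An empty result means every alias points at a known primary address; any line printed is an alias that would end up at fall_through.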

Routers, etc.

We include the standard routers by default:

######################################################################
#                      ROUTERS CONFIGURATION                         #
#            Specifies how remote addresses are handled              #
######################################################################

lookuphost:
  driver = lookuphost
  transport = remote_smtp

literal:
  driver = ipliteral
  transport = remote_smtp

end

Test Stage 1

At this point, you should test what you have so far. Start exim if it hasn't been started already. Try testing a few local addresses and aliases:

tuxville:/# exim -bt fumanchu@amor.org
fumanchu@amor.org
  deliver to fumanchu in domain amor.org
  director = hubbed_mail, transport = local_smtp

tuxville:/# exim -bt abuse@comebuildhope.com
fumanchu@amor.org
    <-- abuse@comebuildhope.com
  deliver to fumanchu in domain amor.org
  director = hubbed_mail, transport = local_smtp

You should receive a listing of operations on the address, and the final director and transport used for delivery.
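
Checking addresses one at a time gets tedious, so the same -bt test can be wrapped in a small helper and kept as a regression suite. This is a sketch under assumptions: expect_director is a hypothetical name, and it relies only on the "director = ..., transport = ..." line shown in the output above.

```shell
#!/bin/sh
# Sketch of a tiny routing test.
# EXIM can be overridden; it defaults to whatever "exim" is on the PATH.
EXIM=${EXIM:-exim}

# expect_director ADDRESS DIRECTOR   (hypothetical helper)
# Succeeds when "exim -bt ADDRESS" reports DIRECTOR as the final director.
expect_director() {
  if $EXIM -bt "$1" 2>&1 | grep -q "director = $2,"; then
    echo "ok   $1 -> $2"
  else
    echo "FAIL $1 (expected director $2)"
    return 1
  fi
}

# Example:
#   expect_director fumanchu@amor.org hubbed_mail
#   expect_director abuse@comebuildhope.com hubbed_mail
```

Run as root (or the Exim user) on the hub, a list of such lines gives a quick pass/fail view after every change to exim.conf or the alias files.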

Rewriting

Not only do we want fumanchu@multiple.domains.tld to all reach fumanchu@amor.org, but we want fudogg@amor.org to go to fumanchu@amor.org. We can accomplish this with a line in the REWRITE section:

######################################################################
#                      REWRITE CONFIGURATION                         #
######################################################################

^.*fu.*@.*$ ${lookup{$domain}lsearch{/etc/exim/domains}{fumanchu@amor.org}fail}

end

The first part is a regex against which the incoming recipient address will be evaluated. If it matches, the address will be rewritten according to the second part. In this case (and all our cases), we include a further condition in the second part which makes sure the incoming domain is listed in our domains file. If it is, the address will be rewritten as fumanchu@amor.org; otherwise only the rewrite fails, and the address carries on through normal processing unchanged.

Most importantly, perhaps, we are now able to do something we could only dream about in Exchange: receive multiple addresses at multiple domains with a single rule. At our org, we want intern, interns, internship, and internships to go to the same address. Multiply that by six domains, and you end up with a lot of addresses to write one line at a time in an Exchange policy. Using Exim rewriting, we can deliver all addresses whose local part contains "intern" to internships@amor.org, regardless of domain. We can write a single line:

^.*intern.*@.*$ ${lookup{$domain}lsearch{/etc/exim/domains}{internships@amor.org}fail}

Ditto for volunteers, which are also called "E2s" here:

^.*vol.*@.*$ ${lookup{$domain}lsearch{/etc/exim/domains}{volunteers@amor.org}fail}
^.*e2.*@.*$ ${lookup{$domain}lsearch{/etc/exim/domains}{volunteers@amor.org}fail}
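
Because these patterns match anywhere before the @ sign, they can also catch addresses you did not have in mind, so it may be worth trying a pattern against a list of real local parts before deploying it. A minimal sketch: rule_matches is a hypothetical helper, the sample addresses are made up, and grep -E is assumed to agree with Exim's regex engine for patterns this simple (case handling aside).

```shell
#!/bin/sh
# Sketch: try a rewrite pattern against sample addresses before deploying it.
# rule_matches PATTERN ADDRESS   (hypothetical helper)
rule_matches() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

# Made-up sample addresses; substitute your own hubusers entries.
for addr in fudogg@amor.org useful@amor.org revolution@amor.org bill@amor.org; do
  for pat in '^.*fu.*@.*$' '^.*vol.*@.*$'; do
    if rule_matches "$pat" "$addr"; then
      echo "$addr matches $pat"
    fi
  done
done
```

Here "useful" and "revolution" match as well as "fudogg", which is harmless only as long as no such mailbox exists. Anchoring the pattern more tightly (for example on the start of the local part) is one way to narrow it.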

Extra Goodies

We are not limited to simply shuttling mail through the hub. Now that we have a more powerful MTA like Exim, we can implement some other niceties. First, we can do something more meaningful with mail addressed to ex-employees than simply bouncing it. Add the following to the TRANSPORTS section of exim.conf:

# An auto-reply transport for ex-employees
ex_employees:
  driver = autoreply
  file = ${lookup {$local_part} lsearch {EXEMPDIRindex}{EXEMPDIR$value}} 
  from = postmaster@amor.org
  subject = Recipient not found
  to = $sender_address
  user = mail
  group = mail

And a corresponding director (in the DIRECTORS section, of course):

exemployees_director:
  driver = smartuser
  condition = ${lookup {$local_part} lsearch {EXEMPDIRindex}{yes}{no}} 
  transport = ex_employees
  domains = /etc/exim/domains

These take advantage of the macro we set up long ago, called "EXEMPDIR". Exim will directly substitute the value wherever the name is found, so the condition line:

  condition = ${lookup {$local_part} lsearch {EXEMPDIRindex}{yes}{no}} 

is interpreted by Exim as:

  condition = ${lookup {$local_part} lsearch {/etc/exim/exemployees/index}{yes}{no}} 

This director/transport pair processes any incoming local parts which are listed in /etc/exim/exemployees/index, which is another key: value file. It looks like this:

jo-anne: interns
jlaw: interns

The director only checks to see if the local part of the recipient address is listed in the file; it does nothing with the "interns" value, in our example. The transport, however, does use it, and sends an autoreply to the sender with the contents of /etc/exim/exemployees/interns as the body of the message. The Subject, From, and other headers can be set as you see in the transport. I recommend cc'ing yourself while you test this.
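
To see which reply body a given ex-employee address would receive, the lookup can be mimicked outside Exim. This is only a sketch: reply_file_for is a hypothetical helper, it assumes the "key: value" layout shown above, and the EXEMPDIR shell variable merely mirrors the Exim macro (note the trailing slash, which the plain string substitution depends on).

```shell
#!/bin/sh
# Sketch: mimic the transport's file lookup for one local part.
# EXEMPDIR mirrors the Exim macro and must end in "/".
EXEMPDIR=${EXEMPDIR:-/etc/exim/exemployees/}

# reply_file_for LOCAL_PART   (hypothetical helper)
# Prints the path of the autoreply body, or fails if the local part
# is not listed in the index.
reply_file_for() {
  awk -v want="$1" -v dir="$EXEMPDIR" '
    { key = $1; sub(/:$/, "", key) }
    key == want { print dir $2; found = 1; exit }
    END { exit !found }
  ' "${EXEMPDIR}index"
}

# Example:
#   reply_file_for jo-anne && echo "listed" || echo "not an ex-employee"
```

Checking that the printed file exists and is readable by the mail user is a cheap way to catch a typo in the index before a sender does.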

Test Stage 2

Test the rewriting rules with the -brw switch:

tuxville:/# exim -brw fudogg@amor.org
  sender: fumanchu@amor.org
    from: fumanchu@amor.org
      to: fumanchu@amor.org
      cc: fumanchu@amor.org
     bcc: fumanchu@amor.org
reply-to: fumanchu@amor.org
env-from: fumanchu@amor.org
  env-to: fumanchu@amor.org

Test the exemployees director:

tuxville:/# exim -bt jo-anne@amor.org
jo-anne@amor.org
  deliver to jo-anne in domain amor.org
  director = exemployees_director, transport = ex_employees

Finally, if you haven't already, send some mail to yourself, at myself@eximbox.mydomain.org, using your favorite mail client (mine happens to be telnet for this sort of thing :). This assumes your Exim box is registered with nameservers on your LAN (and it should be). Add a static entry to DDNS, if you use that. Make sure the existing Exchange server can ping the Exim box by name, and that the Exim box can ping the Exchange server by IP.

Turning It On

All of our domains have MX records pointing to mail.amor.org. My Exchange server sits behind two layers of firewalls already, so "turning it on" consisted of telling my Exchange server that it was no longer "mail.amor.org", and then telling our firewalls to route incoming port 25 to the Exim box. Total downtime consisted of a couple of seconds to hit "Enter" twice.

Serious Testing Time

Mostly consisting of waiting for the screams... No screams? Good for you. Send test mail. Check user mailboxes. Check the logfiles on Exim:

grep hubbed_mail /var/log/exim/mainlog
grep fall_through /var/log/exim/mainlog
tail /var/log/exim/rejectlog
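
For a quick overview rather than raw log lines, deliveries can be counted per director. A minimal sketch, assuming that delivery lines in mainlog carry the director as a "D=name" field; director_counts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count delivery log lines per director, busiest first.
# director_counts LOGFILE   (hypothetical helper)
director_counts() {
  awk '
    {
      # Look for a D=<director> field anywhere on the line.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^D=/) count[substr($i, 3)]++
    }
    END { for (d in count) print count[d], d }
  ' "$1" | sort -rn
}

# Example:
#   director_counts /var/log/exim/mainlog
```

A fall_through count that keeps growing relative to hubbed_mail is a hint that the hubusers or hubaliases file needs attention.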

SpamAssassin

SpamAssassin is a free (speech and beer) standalone spam filter. It uses header analysis, text analysis, blacklists, collaborative spam-tracking databases like Vipul's Razor, and even a Bayesian filter to assign a "spam score" to each message processed. If the spam score exceeds a certain limit, the message is not deleted by default (you can arrange that with the right exim.conf if you want), but is tagged as spam in a way that is readily apparent to the end-user.

Configure SA

You really shouldn't mess with the scoring bits of SA until you've gotten some feedback from using the process for a while. You should read the READMEs, man pages, etcetera, and learn how to run spamd on startup. In particular, you should edit /etc/default/spamassassin:

ENABLED=1
OPTIONS="-F 0"

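This starts spamd at boot and, with -F 0, tells it not to add an mbox-style "From " line at the start of its output (that is the envelope-style line, not the From: header). Thanks to dman.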

Configure Exim

SA is not embedded or compiled into Exim in any way. We have multiple options for linking the two, but the basic approach is to create (as you might guess) a transport and a director. First, the transport:

# This transport does the actual checking via SpamAssassin.
spamcheck:
  driver = pipe
  command = /usr/sbin/exim -oMr spam-scanned -bS
  transport_filter = /usr/bin/spamc
  bsmtp = all
  home_directory = "/tmp"
  current_directory = "/tmp"
  user = mail
  group = mail
  return_path_add = false
  log_output = true
  return_fail_output = true
  prefix = 
  suffix = 

This is simpler than it looks. The transport_filter line is the real workhorse. It does nothing more than pipe our incoming message through spamc, which is a client for the spamd server. This allows for much faster processing than single calls to spamassassin. Multiple recipients are provided for by using batched SMTP, or "bsmtp". The command line uses BSMTP (-bS) to hand scanned mail back to Exim; it is handed back with an arbitrary protocol name of "spam-scanned" (-oMr), which allows us to write our director to avoid scanning mail endlessly. Here's the director:

spamcheck_director:
  no_verify
  condition = "${if and {{!def:h_X-Spam-Flag:} {!eq{$received_protocol}{spam-scanned}} {!eq {$received_protocol}{local}}} {1}{0}}"
  driver = smartuser
  transport = spamcheck

This should go into your .conf file after system_aliases and before hubbed_mail. The no_verify directive instructs Exim to skip this director while determining if a given address is locally valid. We use a "smartuser" director again to catch all mail. The condition line is another string expansion, and checks that we have not already scanned this mail. If the mail came in with a protocol named "spam-scanned" (which only local processes can set), then we assume the mail came from SpamAssassin, skip this director, and process the mail normally. Mail that already carries an X-Spam-Flag header, or that was submitted locally on the hub (protocol "local"), also skips the scan.
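
The nested condition is easier to reason about when written out as three separate tests. The following sketch restates it in shell purely as an illustration; should_scan is a hypothetical helper and is not something Exim runs.

```shell
#!/bin/sh
# Sketch of the decision the condition line encodes.
# should_scan FLAG_HEADER PROTOCOL   (hypothetical helper)
#   FLAG_HEADER: value of the X-Spam-Flag header ("" when absent)
#   PROTOCOL:    value of $received_protocol
should_scan() {
  [ -z "$1" ] && [ "$2" != "spam-scanned" ] && [ "$2" != "local" ]
}

# Fresh mail from outside is scanned.
should_scan "" smtp && echo "smtp: scan"
# Mail handed back by the spamcheck transport is not scanned again.
should_scan "" spam-scanned || echo "spam-scanned: skip"
```

Read this way, any message that arrives already carrying an X-Spam-Flag header is never scanned, whoever added that header. If that matters to you, dropping the header test and relying on the protocol name alone is one option to evaluate.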

Test Stage 3

Look at incoming mail. All incoming mail should now be scanned with SA. Mail that does not meet or exceed the minimum spam score will still have X-Spam headers (such as X-Spam-Status) added, showing you the score and which tests were matched by the message.
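
If you can save a delivered message to a file, the headers SA added can be pulled out without opening a mail client. A minimal sketch: spam_headers is a hypothetical helper, and it prints only the first line of each header, so a folded X-Spam-Status line will appear truncated.

```shell
#!/bin/sh
# Sketch: print the X-Spam-* header lines of a saved message.
# spam_headers MESSAGE_FILE   (hypothetical helper)
spam_headers() {
  awk '
    /^$/ { exit }                         # headers end at the first blank line
    tolower($0) ~ /^x-spam-/ { print }    # header names are case-insensitive
  ' "$1"
}

# Example:
#   spam_headers /tmp/saved-message.txt
```

No output at all for a message that came in from outside suggests the spamcheck director was skipped, which is worth checking against the mainlog.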

Extra Goodies for SA

SA now includes a Bayesian filter, which is a type of learning filter. You feed it spam and ham, and it learns the difference statistically. I added a way for end-users to report false positives and false negatives for SA, by simply forwarding the original mail to either spam@mail.amor.org or ham@mail.amor.org. I had to use the full machine.domain name, since Exchange still handles internal mail. spam@mail.amor.org should go straight to the Exim hub. The received mail for these two addresses is added into the SA-learn database for the Bayes filter. You will need to do periodic rebuilds of the SA-learn db on your own; we explicitly do not do that for each email, due to the performance hit.

First, I had to add the following kludge to get around the machine.domain kludge. There's a better way I haven't found yet. Let me know if you know how. For now, I just rewrite name@mail.amor.org to name@amor.org:

^.*@.*\.amor\.org$ ${local_part}@amor.org

Then we add a new transport:

# This transport is used for users to forward their mail
# to a place where SpamAssassin can then learn from it, Bayes-style. 
learn_spam:
  driver = pipe
  command = /usr/bin/sa-learn --${local_part} --no-rebuild --single
  user = mail
  group = mail

...which uses the command line to send our mail to sa-learn. The ${local_part} will be either "ham" or "spam". Then we have a corresponding director:

learn_spam_director:
  driver = smartuser 
  local_parts = spam : ham
  transport = learn_spam
  condition = ${lookup {$sender_address_domain} lsearch {/etc/exim/domains}{yes}{no}}

...which only processes the local parts "spam" and "ham", and only when the sender's domain is one of ours. This goes first in the DIRECTORS section, btw.
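
The periodic rebuild mentioned earlier could be run from cron. This is a sketch under stated assumptions: rebuild_bayes is a hypothetical helper, the installed sa-learn is assumed to accept --rebuild as the counterpart of --no-rebuild (later SpamAssassin releases call this --sync), and the job is assumed to run as the same user as the learn_spam transport ("mail") so that it touches the same Bayes database.

```shell
#!/bin/sh
# Sketch of a periodic Bayes rebuild job.
# SA_LEARN can be overridden; the default matches the transport above.
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

# rebuild_bayes   (hypothetical helper)
# Runs the rebuild and reports the outcome; returns non-zero on failure.
rebuild_bayes() {
  if $SA_LEARN --rebuild; then
    echo "bayes rebuild ok"
  else
    echo "bayes rebuild failed" >&2
    return 1
  fi
}

# Example, from a cron job running as the mail user:
#   rebuild_bayes
```

How often to run it depends on how much mail your users forward to spam@ and ham@; nightly is a reasonable starting point to adjust from.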

Concluding Remarks

  1. Test, test, test. If you want to look really cool, write a full test suite while you implement this. I did. That will certainly help you maintain policy if you change things in the future.
  2. Help me make this better if you know how.
  3. There are times above where I'm sure I'm wasting cycles checking things I don't need to. But I haven't had time to test their removal.
  4. I eventually want a means, similar to the learn_spam_director, to allow users to add good and bad senders to SA's white- and black-lists. Watch this space.
  5. Let me know if this document helped you. fumanchu@amor.org.
  6. The possibilities with Exim are far greater than with Exchange. Take this as a base and run with it.

Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org

"Why a hub" revisited

There was a recent thread about this on the exim-users mailing list, asking, "What is the purpose of [using exim as a relay]?" Some of the responses: