Building an Anti-Spam solution on Sun

SpamAssassin, Amavisd-New, and Sendmail on Solaris.

Jason B Consorti

The scourge of Spam can be tackled with the ultimate tool in today's arsenal: SpamAssassin. Here is how to build a single or dual server mail infrastructure with Solaris. You can easily adapt this to your favorite flavor of Unix.

Theory of Operation

Mail hits Sendmail. Sendmail talks to its Milter library, by sending content, header and envelope information. Sendmail expects back a thumbs up, thumbs down, or a shoulder shrug (Yes, No, Not Enough Info). Milter talks to Amavis-Milter (included in Amavisd-new), to get this information. Amavis-Milter is basically just a facilitating pipe to the Amavisd-new daemon. The Amavisd daemon then fires up SpamAssassin.

SpamAssassin does several things to create a score. This score is then compared to a threshold. If the score is below the threshold, it marks the message as legitimate to Amavisd, which makes sure that Sendmail sends the message along. If the score is at or above the threshold, SpamAssassin marks the message as Spam and tells Amavisd, which in turn tells Sendmail. SpamAssassin creates the score by:
  • Filtering the message and looking for key attributes that are the mark of Spam. It looks for things like the words "Money Back Guarantee", or applies more sophisticated tests like checking for bad HTML tags. Each of these "hits" adds to the overall score (or spamminess) of a message.
  • Doing DCC and Razor checks. These checks are simple: an MD5 hash of a message is created and compared to offsite databases maintained by trustworthy administrators. If the check comes back positive, this adds to the score.
  • Applying Bayesian algorithms to the messages. Basically this means that SpamAssassin keeps track of what are the most common characteristics of Spam for your site and if it sees these characteristics are in a message, it adds to the score.
  • Checking its whitelist and weighing it. If someone keeps sending good email to the server, SpamAssassin remembers this. If they just happen to send an email that looks really spammy, SpamAssassin may just let it go through. Conversely, if someone keeps sending spammy messages, SpamAssassin will always be suspicious of email from that source. This weight transforms the final score and is configurable.
Currently, that threshold score is 6.3. Don't ask why, it just is 6.3. You can configure this score higher or lower, but I don't suggest touching it. I've seen friendly mail messages be marked as low as -5, and spam mail get marked as high as 90. I have yet to come across a false positive (in other words, a friendly email scored at or above 6.3).
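
The scoring flow described above can be sketched in a few lines of Python. This is only an illustration and not SpamAssassin's actual code: the rule names and per-rule scores are made up, the whitelist weight is simplified to a negative score, and only the 6.3 threshold comes from the text.

```python
# Illustrative sketch only: rule names and scores are hypothetical,
# not SpamAssassin's real rule set.
THRESHOLD = 6.3  # the threshold quoted in the text


def total_score(hits):
    """Sum the scores of every rule that fired for one message."""
    return sum(hits.values())


def is_spam(hits, threshold=THRESHOLD):
    """A message counts as spam when its score is at or above the threshold."""
    return total_score(hits) >= threshold


# Hypothetical messages: each maps a rule that fired to its score.
spammy = {"MONEY_BACK": 2.5, "BAD_HTML": 1.8, "CHECKSUM_LISTED": 3.0}
friendly = {"BAD_HTML": 1.8, "KNOWN_GOOD_SENDER": -4.0}

print(is_spam(spammy), is_spam(friendly))
```

The point of the sketch is that no single test decides the outcome; a friendly message can trip one rule and still stay well under the threshold.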

After SpamAssassin tells Amavisd about a spam message, Amavisd will, if configured correctly, send an email to the source of the suspicious mail. This message basically explains why the message was thought to be spam and, if it is not spam, provides instructions on how to make sure future emails are let through. The theory here is that spammers do not usually take incoming mail but real people do. I've seen some emails to spammers get taken. Very few are not refused.

Mise En Place

First, I recommend that you have the appropriate access to the Internet. You will need ftp, DCC (UDP 6277), Razor (TCP 2703), smtp, dns (UDP and TCP 53), and ping access to the outside world. You may want to have ident access out as well, though it is not needed. Putting this server behind a firewall and allowing smtp access in to it would be wise. You can close the ftp hole when you are done installing.
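
Before installing anything, it can save time to confirm that the outbound TCP ports really are open through your firewall. A minimal Python sketch, assuming you supply the host names yourself (this article does not name any servers). It only covers TCP, so the UDP paths (DCC on 6277, DNS on 53) need a separate check.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

For example, `tcp_port_open(razor_host, 2703)` would test the Razor port, where `razor_host` is whatever server name your Razor configuration lists.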

Second, I recommend the latest version of Solaris. As of this writing, Solaris 9 is the freshest available. The latest version of Solaris has the advantage that it already runs a rather recent version of Sendmail, so you have less prep work to do before putting the latest version on. Also, patch it with the latest Recommended Patch Cluster from Sun. I also recommend running the latest JASS from Sun. I tend to modify JASS's "hardening.driver" before I run "jass-execute", just to make sure everything I want secured is secured and what I don't want touched (i.e. pam, syslog.conf) is left alone.

These are the software packages you will need. You can collect some from Sun Freeware or any of its mirror sites. I prefer Secsup.
  • Latest m4
  • Latest gnu make (I always use this over Sun's make)
  • Latest gnu patch (Sun's patch is not the flavor everyone else uses)
  • Latest Sendmail (I use Sendmail and not Postfix or Qmail, simply because more Solaris admins know Sendmail).
  • A good version of GCC (I am using v3.3 currently)
  • Latest PERL (I am using v5.8.0 currently)
  • Latest SpamAssassin PERL package (I prefer NOT to use the CPAN module).
  • Latest DCC
  • Latest Razor
  • Latest Amavisd-New
  • Razor patch for Amavisd Config (just in case)
  • Razor patch for Amavisd Core (just in case)
  • Enough CPAN modules to choke a camel:
    MD5 LWP Mail::Internet Archive::Tar Archive::Zip IO::Wrap IO::Stringy MIME::Words MIME::Head MIME::Body MIME::Entity MIME::Parser Net::SMTP Net::DNS Net::Ping Net::Server Net::Server::PreForkSimple Convert::TNEF Convert::UUlib MIME::Decoder::Base64 MIME::Decoder::Binary MIME::Decoder::Gzip64 MIME::Decoder::NBit MIME::Decoder::QuotedPrint MIME::Decoder::UU Time::HiRes Digest::SHA1 Digest::Nilsimsa Getopt::Long File::Copy Bit::Vector Date::Calc Unix::Syslog (needs to be forced to work on Solaris) Net::DNS::Resolver::Recurse (needs to be forced if you answer yes for tests.)

Don't worry about downloading the CPAN modules. I made a quick perl script to suck down all but the last two. The other two we will grab later by hand. It is important to get everything else into your home directory, unzipped and untarr'd and ready to go. If you are doing this as user root, make a temporary directory and put everything in there.

Optionally you can get these as well (you might have to Google search these):
  • lharc
  • zoo
  • arc

Step By Step Instructions

I will first talk about putting everything on one server, but keep in mind that it is possible to separate the Mail engine from the Spam engine. All steps assume you are running with root privileges. If you are using sudo, please do all your work in your home directory; you can clean up later. If your real id is 0, then do this in some temp directory (assuming you put all the pieces into that temp directory). Be advised, then, that / will have some files put into it if your real id is 0, but we'll move them aside as we come across them. Assume the commands that I give are run sudo'd to root with your own real id intact.

First, install perl, m4, gnu's make, gnu's gcc, and gnu's patch. This means downloading, gunzip'ing, and pkgadd'ing. For example:

# cd ~
# gunzip m4-1.4-sol9-sparc-local.gz
# pkgadd -d ./m4-1.4-sol9-sparc-local

Make sure that your PATH includes "/usr/local/bin" first from here on in. You must remove Sun's softlink "/usr/bin/perl" and put your own to point to "/usr/local/bin/perl".

# rm /usr/bin/perl
# ln -s /usr/local/bin/perl /usr/bin/perl

Compile the latest Sendmail. I will not go into detail on how to compile and install Sendmail. The important thing is that you build Sendmail's milter:

# cd ~/sendmail-8.12.9/devtools/Site
# cat >> site.config.m4 << 'END'
> APPENDDEF(`conf_sendmail_ENVDEF', `-DMILTER')
> END

# cd ../../libmilter
# ./Build
# ./Build install

You must configure Sendmail according to your local needs. If you intend to retrieve mail via IMAP on the mail server, then you will configure differently than, say, if you intend to forward mail to a Windows Lotus Notes server. It's up to your needs and I can't cover all iterations here. I do recommend configuring Sendmail to use rbls (Realtime Blackhole Lists). Commercially, I recommend MAPS.

After you are done building and installing Sendmail according to its instructions, get those CPAN modules. Use the perl script I provided to get most of the modules. For the last two, you can fetch and install by hand:

# cd ~
# ./
# perl -MCPAN -e shell
cpan> force install Unix::Syslog
cpan> force install Net::DNS::Resolver::Recurse

If you never ran the CPAN module before, it will ask you a bunch of questions. Just take the defaults, EXCEPT when it asks you to choose between "follow, ask, or ignore". Choose "follow". It will make your life a lot easier. "follow" just means it will automatically retrieve any dependent modules and install them.
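
With this many modules it is easy to miss one. A small Python sketch for checking which modules Perl can actually load, using the common `perl -MModule -e 1` idiom (Perl exits non-zero when the module cannot be loaded). The function names are hypothetical helpers, not part of CPAN or of the script mentioned above.

```python
import subprocess


def perl_module_command(module):
    """Build the command that asks perl to load one module and exit."""
    return ["perl", "-M" + module, "-e", "1"]


def missing_modules(modules, run=subprocess.run):
    """Return the modules that perl fails to load, in the order given."""
    missing = []
    for module in modules:
        result = run(
            perl_module_command(module),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_modules(["Net::DNS", "Unix::Syslog", "Time::HiRes"])` on the server should return an empty list once everything is installed. It uses whichever perl is first in your PATH, so make sure that is "/usr/local/bin/perl".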

Then, if you want, build and install zoo, lha, and arc. Basically, these are used to open message content created with those tools. You don't find many messages done with those anymore, so don't sweat it if you can't find them.

Next up, you need to make a group called "amavis". I picked the number 60003 out of thin air; you can choose your own. Then make a user "amavis" with its home directory in /var/amavis. Leave its password locked:

# groupadd -g 60003 amavis
# useradd -g amavis -d /var/amavis -m -c Amavis -s /bin/sh amavis

Configure, gmake and install dcc:

# cd ~/dcc-dccd-1.1.36
# ./configure
# gmake
# gmake install

Test that cdcc works with:

# /usr/local/bin/cdcc 'info'

It should come back with things like:
# 08/17/03 11:47:52 EDT /var/dcc/map
# Will re-resolve names after 12:15:17
# 212.59 ms chosen delay 12 total addresses 11 working
IPv6 off,- RTT+0 ms anon
# *,- Servercave server-ID 1183
# 100% of 32 requests ok 215.18+0 ms RTT 203 ms queue wait
#,- neonova server-ID 1127
# 84% of 32 requests ok 650.18+0 ms RTT 493 ms queue wait
#,- WEiAPG server-ID 1072
# 56% of 9 requests ok 2838.82+0 ms RTT 2032 ms queue wait
#,- SPAMCHECK.NET server-ID 168
# 75% of 32 requests ok 1533.95+0 ms RTT 193 ms queue wait
#,- NIET server-ID 1080
# 88% of 32 requests ok 650.59+0 ms RTT 103 ms queue wait
#,- Misty server-ID 1170
# 81% of 32 requests ok 653.77+0 ms RTT 238 ms queue wait
#,- MessageCare server-ID 110
# 88% of 32 requests ok 809.00+0 ms RTT 170 ms queue wait
#,- servers server-ID 1049
# 84% of 32 requests ok 949.58+0 ms RTT 155 ms queue wait
#,- SdV server-ID 1179
# 91% of 32 requests ok 701.27+0 ms RTT 110 ms queue wait
#,- server-ID 1181
# 84% of 32 requests ok 739.80+0 ms RTT 106 ms queue wait
#,- SINECTIS server-ID 1114
# 81% of 32 requests ok 945.85+0 ms RTT 104 ms queue wait

localhost,- RTT-1000 ms 32768
#,- localhost
# not answering

Now make, install, patch, and register razor.

# cd ~/razor-agents-2.22
# perl Makefile.PL
# gmake
# gmake install
# cd /usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris/Razor2/Client
# gpatch < ~/Razor2.patch
# gpatch < ~/Razor2.patch2
# razor-client
# razor-admin -create

You may need to rehash your PATH several times if you are running tcsh, since the last several commands place files in "/usr/local/bin".

You will need to register with Razor before they will let you use their database. You will need to use a unique email address for each instance of Razor you deploy. It's annoying when you have a mail cluster, but what can you do?

# razor-admin -register -user

Then copy over the .razor directory to the amavis home:

# cp -r ~/.razor /var/amavis
# chown -R amavis:amavis /var/amavis/.razor

Next up is SpamAssassin. Configure, gmake and install it:

# cd ~/Mail-SpamAssassin-2.55
# ./configure
# gmake
# gmake install

There are a couple more things to do for SpamAssassin:

# mkdir /var/amavis/.spamassassin
# chown amavis:amavis /var/amavis/.spamassassin
# touch /var/amavis/.spamassassin/user_prefs
# chown amavis:amavis /var/amavis/.spamassassin/user_prefs

You will need to make the config file in /etc/mail/spamassassin. Mine has some explanations in it for some of the variables. Edit it for your own site.

# cp ~/ /etc/mail/spamassassin
# vi /etc/mail/spamassassin

Now for amavisd-new. It is a PERL script so installation is easy. The milter helper must be built.

# touch /var/amavis/amavis.log
# cd ~/amavisd-new-20030314
# cp amavisd /usr/local/sbin
# cp amavisd.conf /usr/local/etc
# cd helper-progs
# ./configure
# gmake
# cp amavis-milter /usr/local/sbin
# mkdir /var/amavis/tmp
# chown amavis:amavis /var/amavis/tmp

Edit "/usr/local/etc/amavisd.conf". It is A LOT to work on, but don't be discouraged; it has great comments to help you along. Here are some important variables to set:
  • $daemon_user and $daemon_group are set to "amavis"
  • $notify_method = 'pipe:flags=q argv=/usr/lib/sendmail -Ac -i -odd -f ${sender} -- ${recipient}';
  • @bypass_virus_checks_acl = qw( . );
  • @local_domains_acl gets a full list of all the domains you receive email for
  • $unix_socketname = "$MYHOME/amavisd.sock";
  • $notify_spam_sender_templ = read_text('/var/amavis/notify_spam_sender.txt');
  • $final_spam_destiny = D_BOUNCE;
  • read_hash(\%whitelist_sender, '/var/amavis/whitelist');
  • read_hash(\%blacklist_sender, '/var/amavis/blacklist');
  • read_hash(\%spam_lovers, '/var/amavis/spam_lovers');
  • @bypass_spam_checks_acl = ( "!.$mydomain", "." );
If you host multiple domains, be sure to add them BEFORE the "." in the "bypass_spam_checks_acl". For instance:

@bypass_spam_checks_acl = ( "!.$mydomain", "!", "!", "." );

This variable ensures that only incoming mail is scanned, not outgoing mail. Unfortunately, the fact that this "." must be at the end is not documented.
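
Why the "." has to come last is easier to see with a small model of the lookup. The Python sketch below is a simplified reading of how such an ACL appears to be evaluated (first match wins, "!" negates, a leading "." covers a domain and its subdomains, a bare "." matches anything). It is not amavisd-new's code, it ignores entries containing "@", and "example.com" stands in for your own domain.

```python
def acl_lookup(address, acl):
    """Return the verdict of the first matching ACL entry, else False.

    Simplified model: '!' negates an entry, a leading '.' matches the
    domain and its subdomains, and a bare '.' matches any domain.
    """
    domain = address.rsplit("@", 1)[-1].lower()
    for entry in acl:
        result = not entry.startswith("!")
        pattern = entry.lstrip("!").lower()
        if pattern == ".":
            return result
        if pattern.startswith("."):
            if domain == pattern[1:] or domain.endswith(pattern):
                return result
        elif domain == pattern:
            return result
    return False


# True means "bypass spam checks" for that recipient.
good_order = ["!.example.com", "."]
bad_order = [".", "!.example.com"]
print(acl_lookup("user@example.com", good_order))  # local recipient
print(acl_lookup("user@example.com", bad_order))   # "." matched first
```

With the "." first, every recipient matches it immediately and the check is bypassed for everyone, so the entries for your own domains are never reached.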

You must create a file called notify_spam_sender.txt. You must also create, in "/var/amavis", the files blacklist, whitelist, and spam_lovers. In spam_lovers, you will want to put the address that $mailfrom_notify_spamadmin and $hdrfrom_notify_sender are set to in the amavisd.conf file. If you have any obvious email addresses that would be useful to universally whitelist, add those addresses or subnets to the whitelist file. Be sure to chown everything to amavis:amavis.

# cd ~
# cp notify_spam_sender.txt /var/amavis
# chown amavis:amavis /var/amavis/notify_spam_sender.txt
# touch /var/amavis/blacklist
# chown amavis:amavis /var/amavis/blacklist
# touch /var/amavis/whitelist
# chown amavis:amavis /var/amavis/whitelist
# cat >> /var/amavis/spam_lovers << END
# chown amavis:amavis /var/amavis/spam_lovers

Now you need a proper amavisd.startup file. Stick that in "/etc/init.d/" and link it from rc2.d and have it start after sendmail. This startup script does a poor job of killing amavis-milter, so you may find that you have to kill it by hand.

# cp ~/amavisd.startup /etc/init.d/
# ln -s /etc/init.d/amavisd.startup /etc/rc2.d/S89amavisd

To get Sendmail to recognize the amavis milter, you need to add these to your sendmail m4 file:

define(`_FFR_MILTER', `1')dnl
INPUT_MAIL_FILTER(`milter-amavis', `S=local:/var/amavis/amavis-milter.sock, T=S:10m;R:10m;E:10m')dnl

Rebuild your cf file and deploy it to "/etc/mail".

That's it. Start up amavis from the startup script first. Then restart sendmail. Follow your log file (hopefully "/var/log/syslog") for debugging.

# /etc/init.d/amavisd.startup start
# /etc/init.d/sendmail stop
# /etc/init.d/sendmail start

Bayesian Learning

You should note that the most powerful aspect of SpamAssassin is its Bayesian engine. It learns what is and isn't spam, thanks to help from its own prepackaged scoring scheme. It takes a thousand or so examples before SpamAssassin uses the Bayesian database it builds. You can speed this up by making some adjustments in the "/etc/mail/spamassassin/" file.

# Any mail that scores below this threshold
# is used in the Bayesian learning process
# as an example of non-spam (ham)
auto_learn_threshold_nonspam 1.00
# Any mail that scores above this threshold
# is used in the Bayesian learning process
# as an example of spam
auto_learn_threshold_spam 8.00

You can comment out these changes after you start seeing "BAYES_" in your syslog.
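
The effect of the two thresholds can be sketched as a simple three-way decision. This Python fragment only mirrors the comments in the config above. How a score exactly equal to a threshold is treated is an assumption, and SpamAssassin itself may apply further conditions that are not modeled here.

```python
def auto_learn_class(score, nonspam_threshold=1.00, spam_threshold=8.00):
    """Classify a scored message for Bayesian training.

    Returns "ham" below the non-spam threshold, "spam" above the spam
    threshold, and None for the uncertain band in between.
    """
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None


for score in (-5.0, 4.0, 90.0):
    print(score, auto_learn_class(score))
```

Moving the two thresholds closer together lets more mail train the database and speeds up learning, at the cost of training on less clear-cut examples.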

Separate Email and Spam engines

You would want to do this for a couple of reasons. First, content analysis has some CPU cost, so it is wise to do this on a swift server. If you have a fair amount of outgoing mail, this will ensure that that mail goes out without much delay. You can have multiple Mail engines use a single Spam engine.

Second, you can play with Sendmail's queue timeouts on the Spam engine. The bounced messages (the ones created from the "notify_spam_sender.txt" file) can sit for 5 days on a default configured Sendmail queue. If you have just an engine for Spam, you can drop it to, say, one day, and not interfere with mail that is delayed for good reason. It is possible to do this on one server, but that involves running two different Sendmails listening on different ports, yadda, yadda, yadda, and it's a pain in the ass.

Third, for the paranoid, you can lock down the Mail engine for just smtp (in/out), ident (out), dns (out), and out to the Spam engine to the milter port you decide on. Then you can lock down the Spam engine to allow dcc (out), razor (out), and milter (in). If your Spam server is down, Sendmail won't choke. It will simply move the mail along as if it got an ok from the dead server. Also, on the Spam engine, you can eliminate the mail server and just keep the client queue processor (that means running only one instance with "-Ac" and not the "-bd" option).

Now, to do this, you will need to build both servers with Sendmail. The Spam engine will need all the good stuff, like perl, perl modules, gcc, amavis, spamassassin, dcc, etc. Then, on the Mail engine, edit the m4 with this entry rather than the one above:

define(`_FFR_MILTER', `1')dnl
INPUT_MAIL_FILTER(`milter-amavis', `S=inet:10025@spam1, T=S:10m;R:10m;E:10m')dnl

Where "spam1" is the name of your Spam engine. The "10025" is a port you can pull out of thin air as long as it is not being used. Rebuild your cf file and deploy it to "/etc/mail". On your Spam engine, edit the amavis startup file and change the milter invocation to the one used in amavisd_startup.remote:

/usr/local/sbin/amavis-milter -p inet:10025@ >/dev/null 2>&1 &

And that's it!

Inspiration and Useful Links

Special thank you to Harker who taught me not to fear Sendmail.

Updated 15 September 2003

Notes Since 15 September 2003

  • I've noted 3 false positives in about 250,000 messages on my corporate site.
  • A redundant, but better, note about the Bayesian process: I would make Amavisd-new pass on all email until you build a good Bayesian database for SpamAssassin. I can't stress enough how important the Bayesian process is to your fight against Spam. You should pay special attention to getting this process up and running before implementing your Anti-Spam solution. You can do this by first configuring Amavisd-new's "amavisd.conf" file to pass instead of bounce spammy mail:

    $final_spam_destiny = D_PASS;

    And then you should tweak these two variables in SpamAssassin's configuration file to use a lot of mail in making its Bayesian database:

    auto_learn_threshold_nonspam 0.00
    auto_learn_threshold_spam 8.00

    These settings tell SpamAssassin to use email with a spammy score of 0.00 or below as an example of non-spam, and email with a spammy score of 8.00 or higher as spam. The defaults are -2 and 15 respectively.

    You can check on the process with a tool supplied with SpamAssassin's source (under the "tools" subdirectory in the source) called "check_bayes_db". You can copy this file to "/usr/local/bin", then run it AS user amavis. You must "su - amavis" to see the database:

    # su - amavis
    $ /usr/local/bin/check_bayes_db | head
    0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
    0.000 0 4725 0 non-token data: nspam
    0.000 0 544 0 non-token data: nham
    0.000 0 104604 0 non-token data: ntokens
    0.000 0 1739 0 non-token data: oldest age
    0.000 0 13060 0 non-token data: current scan-count
    0.000 0 12506 0 non-token data: last expiry scan-count
    0.803 1 0 10122 H*r:200.216.218
    0.803 1 0 10438 neuropathology
    0.803 1 0 10482 24046

    You can see that, here, SpamAssassin has 4725 examples of spam and 544 examples of ham (regular non-spam email). I recommend tweaking the "auto_learn_threshold" numbers until you see ham outnumber spam by 50%.

    After a while, you will start to see "BAYES" show up in the log file. This means that SpamAssassin has enough information to start applying its Bayesian algorithms to add to the spammy score of an email. At this point, you should start bouncing email by changing "amavisd.conf":

    $final_spam_destiny = D_BOUNCE;

    And it would be prudent to comment out the two "auto_learn_threshold" settings in SpamAssassin's configuration file.
    And always remember to "kill -HUP" the amavisd daemons (not the milter) to let all these changes take effect. You even need to "kill -HUP" amavisd if you only touch SpamAssassin's configuration file, since this is loaded once per life cycle of the amavisd parent daemon.

    # ps -ef | grep amavis
     amavis 18274 1 0 Sep 05 ? 4:03 /usr/local/sbin/amavis-milter -p inet:10025@
    amavis 18992 16575 1 11:05:26 ? 0:01 /usr/local/bin/perl -T /usr/local/sbin/amavisd -c /usr/local/etc/amavisd.conf
    amavis 16575 1 0 07:32:08 ? 0:04 /usr/local/bin/perl -T /usr/local/sbin/amavisd -c /usr/local/etc/amavisd.conf
    amavis 18991 16575 1 11:04:54 ? 0:03 /usr/local/bin/perl -T /usr/local/sbin/amavisd -c /usr/local/etc/amavisd.conf
    # kill -HUP 16575

  • I'm beginning to see a lot of spammers automatically reply to my "notify_spam_sender.txt" bounce email. I would be very careful about whom you add to your "/var/amavis/whitelist" file. I've even put a line into the "notify_spam_sender.txt" file asking people to write a brief note so I can know they are human and not an automatic reply from a spammer.

    The whitelist, blacklist, and spam_lovers files are flat files, with one entry per line. The entries can be full email addresses, domains, or sub-domains. Whenever you change these files, amavisd will need to be restarted or HUP'd. You will find yourself adding people from time to time to the whitelist.

  • (December 9th, 2003) I've also noticed that razor's sites sometimes go stale. I have a cron job in user amavis' crontab:

    30 3 * * * /usr/local/bin/razor-admin -discover

    This will update the list of razor sites available to us.

  • (December 9th, 2003) I have checked the log files to determine which rules the false negatives (spam that didn't get caught) seem to share. This should be picked up by the Bayesian filters, but sometimes the filters need to be kicked. I run a perl script I wrote that hunts down mail that Milter sent along and finds which rules were triggered. I pipe this output to sort, look for the most frequently hit rules, and tweak them in /etc/mail/spamassassin/

    # ./ /var/log/syslog | sort -n -k 3 -

  • (December 12th 2003) One thing I regret not doing in my deployment: asking the users which mailing lists they subscribe to and want to ensure they keep receiving mail from. My firm's employees receive a lot of industry reports and newsletters (we're a financial firm) daily and it is embarrassing to the IT department whenever some trader or portfolio manager complains that they failed to receive critical information because of our anti-spam filters. The Bayesian algorithms may not have blocked these mails right away, so the users complained that they mysteriously stopped getting the mail.
  • (April 2006) Migrated page to Google Apps. Some links may be broken. It's been a couple of years since I've left where I had installed this setup, so it may be a bit out of date.
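
The "check_bayes_db" output shown in the notes above can be summarized with a few lines of Python, which is handy if you want to watch the ham and spam counts grow over time. This is a sketch that assumes the layout of the sample output (the counter in the third column, the label after "non-token data:"). Other SpamAssassin versions may print something different.

```python
def parse_bayes_counts(output):
    """Extract the 'non-token data' counters from check_bayes_db output."""
    marker = "non-token data: "
    counts = {}
    for line in output.splitlines():
        if marker not in line:
            continue  # ordinary token line, not a counter
        fields = line.split()
        label = line.split(marker, 1)[1].strip()
        counts[label] = int(fields[2])  # assumption: counter is column three
    return counts


# Lines copied from the sample output in the notes above.
sample_output = """\
0.000 0 4725 0 non-token data: nspam
0.000 0 544 0 non-token data: nham
0.000 0 104604 0 non-token data: ntokens
0.803 1 0 10438 neuropathology
"""

counts = parse_bayes_counts(sample_output)
print(counts["nspam"], counts["nham"])
```

Run as user amavis, the real output of "check_bayes_db | head" could be fed to `parse_bayes_counts` in place of `sample_output`.
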

Copyright (c) 2003 Jason B. Consorti

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found at