Let me explain some of the things I do. My business is
junkemailfilter.com and, from what I can tell, it is likely the most
accurate and most efficient spam filter on the planet (efficient in the
amount of computing power it takes to process mail). What is my secret?
It's not a secret: I use every trick that exists, plus several I
invented myself.
My service is front-end spam filtering. Email from the world comes in,
I process it, and I send the good email on to the customer's existing
server. Users set 3 MX records as follows:
mx.junkemailfilter.com - 10
mx.junkemailfilter.net - 20
mx.junkemailfilter.org - 30
The lowest MX record points to one main server, and I have a second
server that runs SpamAssassin. The second MX points to 2-3 IP addresses
(I'm adding more) in a remote location, so that if the main server fails
the other servers will continue to process and forward email until the
main server comes back up. The highest MX just returns 4xx to anything
that connects to it. It's just a spam bait MX for spammers who try to go
in the back door.
There are a number of things I do differently from most spam filters.
I try to catch spam based on the spammer's behavior first and look at
the message content last. I also actively look for non-spam as well as
spam, using a MySQL reputation database, so that both can bypass the
full spam filtering. Using these tricks I only have to feed SpamAssassin
about 2% of the messages I process, which makes my system very efficient.
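A minimal Python sketch of that ordering, assuming each cheap check either returns a verdict or passes the message along, and only undecided messages reach the content scanner. The stage names, the dictionary standing in for the MySQL reputation table, and the sample rules are all hypothetical, not the actual checks; SpamAssassin is represented by a placeholder function.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Message:
    client_ip: str
    sender: str
    body: str

# A cheap stage returns "spam", "ham", or None ("no decision, keep going").
Stage = Callable[[Message], Optional[str]]

# Hypothetical stand-in for the MySQL reputation database: sender -> verdict.
REPUTATION = {"billing@bank.example": "ham", "promo@spammy.example": "spam"}

def behavior_stage(msg: Message) -> Optional[str]:
    # Placeholder behavioral rule; a real filter would have many of these.
    return "spam" if msg.client_ip.startswith("203.0.113.") else None

def reputation_stage(msg: Message) -> Optional[str]:
    # Known good or known bad senders skip the content scan entirely.
    return REPUTATION.get(msg.sender)

scanned: list[Message] = []  # records which messages reached the content scan

def content_stage(msg: Message) -> str:
    # Placeholder for handing the message to SpamAssassin (the expensive step).
    scanned.append(msg)
    return "spam" if "act now" in msg.body.lower() else "ham"

def classify(msg: Message, cheap: list[Stage],
             expensive: Callable[[Message], str]) -> str:
    # Behavior first, reputation next, content last: stop at the first verdict.
    for stage in cheap:
        verdict = stage(msg)
        if verdict is not None:
            return verdict
    return expensive(msg)
```

For a message from `billing@bank.example` sent from an address outside the sample `203.0.113.` range, `classify(msg, [behavior_stage, reputation_stage], content_stage)` returns `"ham"` without ever calling `content_stage`. The more messages the cheap stages decide, the smaller the share that pays for a full content scan.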
I'm using Exim to do the tricky stuff. It is very versatile and can do
just about anything I want. So if you're wondering how I do that with
qmail, I don't. I'm just here investigating why qmail does unusual MX
processing. See my other thread for details.
Spam is different from regular email in several ways. Spammers are out
to get your money, so they are always trying to get you to DO
something, and that is something you can often trap for.
Spammers also work from lists mined from web sites and the like, so the
spammer has no relationship to the recipient, and there are often tells
that give the spammer away.
Spammers are usually on the move. Since most places don't allow spamming,
they have to set up, spam, and then move on. Or they use Windows zombies
on home computers.
A spammer's goal is to spam as many people as possible, so they go for
those easiest to spam. If they encounter resistance they move on; they
don't usually keep retrying until the message is delivered. Real email
will keep trying because getting that specific message delivered matters.
Spammers also use tricks to get past spam filters that only spammers
use. Once you identify a trick, it becomes a tell that gives away every
spammer using it with 100% accuracy.
So, getting to my point: my 3 MX levels behave completely differently
with respect to 4xx errors. As I said, my highest MX always returns 4xx
on everything, just to send away spammers who go in the back door. And,
like I said, this gets rid of about 350,000 spams a day with zero
processing power. In fact, if any of you want to cut about 15% of your
spam, all you have to do is create a new highest MX and point it to a
dead IP address.
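The suggestion above is simply a dead IP address, which costs nothing at all. For a live bait host like the highest MX described earlier, a minimal Python asyncio sketch could greet every connection with a temporary failure and hang up. The post doesn't say which 4xx code is sent; `421` ("service not available, closing transmission channel") is an assumption, as are the names `handle` and `serve`.

```python
import asyncio

# Sent to every client. 421 is a temporary failure, so real MTAs will
# try another MX or retry later; the exact code here is an assumption.
BANNER = b"421 Service not available, try again later\r\n"

async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    # Refuse immediately: no command parsing, no message processing.
    writer.write(BANNER)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def serve(host: str = "0.0.0.0", port: int = 25) -> asyncio.AbstractServer:
    # Start listening; each connection is handled by handle().
    return await asyncio.start_server(handle, host, port)
```

To keep it running, an `async` main would `await server.serve_forever()` on the returned server. Binding port 25 normally requires root or an equivalent capability.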
On the lowest MX, the main server, there are classes of what I call
suspicious hosts. This is a category where 99% of the email comes from
spammers, but you can't just block it because the 1% of good email
would bounce. These include hosts in dynamic IP ranges, hosts listed on
blacklists like SpamCop that aren't quite reliable enough to bounce on,
and hosts with bad reverse DNS. The idea is that if I force them to
retry on the second MX by returning 4xx errors, the hit-and-run spammers
go away, and the ones who retry will usually be the 1% of good email I
need to pass. This leads to a significant reduction in spam: over 90%,
actually.
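One way to sketch that decision in Python, assuming the reverse-DNS name and the blacklist result have already been looked up elsewhere (so the example does no network I/O). The dynamic range, the generic-hostname keywords, and the `451` reply text are hypothetical placeholders, not the real lists. The second MX presumably skips this deferral, so hosts that come back there get processed.

```python
import ipaddress
from typing import Optional

# Hypothetical "dynamic" range; in practice this would come from a
# dynamic-IP blocklist or ISP data rather than being hard-coded.
DYNAMIC_RANGES = [ipaddress.ip_network("198.51.100.0/24")]

# Hypothetical keywords that suggest a generic (consumer) PTR name.
GENERIC_PTR_HINTS = ("dynamic", "dhcp", "pool", "dsl")

def is_suspicious(ip: str, ptr_name: Optional[str], blacklisted: bool) -> bool:
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in DYNAMIC_RANGES):
        return True  # dynamic IP range
    if blacklisted:
        return True  # e.g. a SpamCop listing: not reliable enough to reject outright
    if ptr_name is None:
        return True  # no reverse DNS at all
    # Generic-looking reverse DNS also counts as suspicious.
    return any(hint in ptr_name.lower() for hint in GENERIC_PTR_HINTS)

def primary_mx_reply(ip: str, ptr_name: Optional[str], blacklisted: bool) -> str:
    # Defer suspicious hosts; only senders that retry elsewhere get through.
    if is_suspicious(ip, ptr_name, blacklisted):
        return "451 4.7.1 Temporary local problem, please try again later"
    return "250 OK"
```

A 4xx rather than a 5xx is the point of the design: a hit-and-run sender gives up, while a real MTA moves on to the next MX.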
Many virus-infected spam zombies are not that sophisticated and are
only smart enough to send to the lowest MX. Real MTAs are expected to be
smarter programs and will try every MX. That is one of the tricks I use
to separate real MTAs from virus-infected zombies.
Every trick I use removes some spam, and I have thousands of tricks. The
front-end levels take out the bulk of it, and as the message progresses
through the system more and more spam is sheared off. I also have a
fast-track whitelisting system that shears off the good email up front
and passes it on quickly, which also reduces system load and false
positives.
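A fast-track whitelist can be sketched on top of a reputation table. This Python example uses the standard `sqlite3` module as a stand-in for the MySQL database mentioned earlier; the table layout, the per-domain counters, and the thresholds (`min_ham=100`, `max_spam_ratio=0.01`) are all hypothetical.

```python
import sqlite3

def open_reputation_db(path: str = ":memory:") -> sqlite3.Connection:
    # SQLite stands in for the MySQL reputation database; schema is hypothetical.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS reputation ("
        " sender_domain TEXT PRIMARY KEY,"
        " ham_count INTEGER NOT NULL DEFAULT 0,"
        " spam_count INTEGER NOT NULL DEFAULT 0)"
    )
    return db

def record_verdict(db: sqlite3.Connection, domain: str, is_spam: bool) -> None:
    # Column name comes from a fixed pair, never from user input.
    col = "spam_count" if is_spam else "ham_count"
    db.execute(
        f"INSERT INTO reputation (sender_domain, {col}) VALUES (?, 1) "
        f"ON CONFLICT(sender_domain) DO UPDATE SET {col} = {col} + 1",
        (domain,),
    )
    db.commit()

def fast_track(db: sqlite3.Connection, domain: str,
               min_ham: int = 100, max_spam_ratio: float = 0.01) -> bool:
    # Whitelist a domain only after plenty of good mail and almost no spam.
    row = db.execute(
        "SELECT ham_count, spam_count FROM reputation WHERE sender_domain = ?",
        (domain,),
    ).fetchone()
    if row is None:
        return False  # unknown sender: no fast track
    ham, spam = row
    return ham >= min_ham and spam <= ham * max_spam_ratio
```

Mail from a fast-tracked domain could be passed straight through, skipping every later stage; everything else continues down the normal path.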
Additionally, the second-tier servers are also backups: should something
bad happen to the main server, the backups kick in and process email
without the end users experiencing any downtime.
I also use it for load balancing. If for some reason the load level goes
above 80 on the main server, I start returning 4xx errors to incoming
connections as a way of limiting the problem, letting the server process
the jobs causing the load and taking new mail again once the load drops.
The expectation is that at peak load the second-tier servers get the
traffic instead, passing the email through them and on to the
destination. It works very well for mail sent by every MTA except qmail.
The qmail senders have to wait for the main server to accept the mail
because of the way qmail works.
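A minimal Python sketch of the load check, assuming the decision is made when the SMTP greeting is sent and that "load level" means the 1-minute Unix load average. The function name, the reply texts, the placeholder hostname, and the choice of `421` are assumptions; the post only says 4xx.

```python
import os
from typing import Optional

LOAD_LIMIT = 80.0  # the threshold mentioned above

def greeting_for_load(load: Optional[float] = None,
                      limit: float = LOAD_LIMIT) -> str:
    # Default to the 1-minute load average (os.getloadavg is Unix-only).
    if load is None:
        load = os.getloadavg()[0]
    if load > limit:
        # Temporary refusal: most senders then try the next MX
        # (per the post, qmail senders wait for this host instead).
        return "421 4.3.2 Server busy, try again later"
    return "220 mx.example.com ESMTP ready"
```

Passing `load` explicitly keeps the check testable; in production it would read the live load average on each new connection.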
So, my point is that a 4xx error only means the server you are connected
to is not ready for some reason; it doesn't mean that any other server
on my network will also return a 4xx. The idea that a 4xx from one
server speaks for all servers is one I had never heard until yesterday
in this forum, and it only exists in the qmail world.
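A sketch of the sender-side behavior expected here, in Python: walk the MX list in preference order and treat a 4xx as a statement about that one host only. `try_host` is a hypothetical callback standing in for a real SMTP conversation; ordering MX hosts by preference is covered in RFC 5321 section 5.1. This models the behavior the post describes for most MTAs, not qmail's.

```python
from typing import Callable

# (preference, hostname) pairs, as an MX lookup would return them.
MXRecord = tuple[int, str]

def deliver(mx_records: list[MXRecord], try_host: Callable[[str], int]) -> str:
    # try_host is a hypothetical transport returning the SMTP reply code.
    # Lowest preference value first; hosts with equal preference should
    # really be shuffled, which this sketch skips for determinism.
    for _pref, host in sorted(mx_records):
        code = try_host(host)
        if 200 <= code < 300:
            return f"delivered via {host}"
        if 500 <= code < 600:
            return f"bounced by {host}"  # permanent failure: stop here
        # A 4xx only speaks for this host, so move on to the next MX.
    return "deferred: every MX replied 4xx; queue and retry later"
```

With the three records listed at the top of this post, a `451` from `mx.junkemailfilter.com` and a `250` from `mx.junkemailfilter.net` yields `"delivered via mx.junkemailfilter.net"`.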
So I have to say that I'm confused and surprised by this, and I'm still
trying to comprehend why qmail behaves differently from every other MTA
in this regard.