Advanced tricks I use to get rid of spam using MX 4xx

To: qmail@list.cr.yp.to
Subject: Advanced tricks I use to get rid of spam using MX 4xx
From: Marc Perkel <marc@perkel.com>
Date: Sat, 25 Nov 2006 08:20:57 -0800
Delivered-to: sp-com-lists@consult.net
Delivered-to: gmail-qmail@securepoint.com
Delivered-to: sp.com.list@gmail.com
Delivered-to: mailing list qmail@list.cr.yp.to
In-reply-to: <20061125.101809.193764004.hanche@math.ntnu.no>
Mailing-list: contact qmail-help@list.cr.yp.to; run by ezmlm
References: <45676987.2050808@perkel.com> <456789E5.3000508@gatworks.com> <20061125.101809.193764004.hanche@math.ntnu.no>
User-agent: Thunderbird 1.5.0.8 (Windows/20061025)
Let me explain some of the things I do. My business is junkemailfilter.com, and from what I can tell it is likely the most accurate and most efficient spam filter on the planet (efficient in the amount of computing power it takes to process mail). What is my secret? It's not a secret, but I use every trick that exists and several I invented myself.

My service is front-end spam filtering. Email from the world comes in, I process it, and I send the good email on to the customer's existing server. Users set 3 MX records as follows:

mx.junkemailfilter.com - 10
mx.junkemailfilter.net - 20
mx.junkemailfilter.org - 30

The lowest MX record points to one main server, and I have a second server that runs SpamAssassin. The second MX points to 2-3 IP addresses (I'm adding more) in a remote location, so that if the main server fails, the other servers will continue to process and forward email until the main server comes back up. The highest MX just returns 4xx to anything that connects to it. It's just a spam-bait MX for spammers that try to go in the back door.
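
For readers who mix up "lowest" and "highest" MX, here is a minimal Python sketch of the three tiers and the order a standards-following sender is expected to try them in (lowest preference number first). The role descriptions are paraphrased from the text above, the function name is made up for illustration, and nothing here queries real DNS.

```python
# Illustrative model of the three MX tiers described above.
# Each entry is (preference, host name, role); roles are paraphrased.
MX_RECORDS = [
    (30, "mx.junkemailfilter.org", "bait: answers 4xx to everything"),
    (10, "mx.junkemailfilter.com", "main server"),
    (20, "mx.junkemailfilter.net", "remote backup servers"),
]


def delivery_order(records):
    """Return host names sorted by MX preference, lowest number first."""
    return [host for _pref, host, _role in sorted(records)]


print(delivery_order(MX_RECORDS))
```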

There are a number of things I do that are different than most spam filters. I try to catch spam based on the spammer's behavior first and look at the message content last. I also actively look for non-spam as well as spam, so that mail can bypass spam filtering based on a MySQL reputation database. Using these tricks I only have to feed SpamAssassin about 2% of the messages I process, which makes my system very efficient.

I'm using Exim to do the tricky stuff. It is very versatile and can do just about anything I want. So if you're wondering how I do that with Qmail - I don't. I'm just here investigating why Qmail does unusual MX processing. See my other thread for details.

Spam is different than regular email in several ways. Spammers are out to get your money. So spammers are always trying to get you to DO something. And that is something that you can often trap for.

Spammers also work from lists mined from web sites and the like, so the spammer has no relationship with the recipient, and there are often signs of that which give the spammer away.

Spammers are usually on the move. Since most places don't allow spamming, they have to set up, spam, and then move on. Or they are using Windows zombies on home computers.

A spammer's goal is to spam as many people as possible, so they go for those easiest to spam. If they encounter resistance they move on. They don't usually keep retrying until the message is delivered. Real email servers will keep trying because getting the specific message delivered is important.

Spammers also do tricky things to get past spam filters, things that only spammers do. Once you identify the trick, it becomes a tell that gives away all spammers using that trick with 100% accuracy.

So - getting to my point. My 3 MX levels have completely different behavior in their use of 4xx errors. As I said, my highest MX always returns 4xx on everything, just to send away spammers who are going in the back door. This gets rid of about 350,000 spams a day with 0 processing power. In fact - if any of you want to cut about 15% of your spam, all you have to do is create a new highest MX and point it at a dead IP address, and you'll get a little less spam.
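
As a rough illustration of the "always 4xx" variant (the dead-IP variant needs no software at all), here is a minimal Python sketch of a listener that answers every connection with a temporary failure and hangs up. The specific reply code 421, the reply text, the port number, and all names are assumptions made for the example; the text above only says "4xx".

```python
import socketserver

# Assumed reply: 421 is one standard SMTP temporary-failure code.
BAIT_REPLY = b"421 Service not available, try again later\r\n"


class BaitHandler(socketserver.BaseRequestHandler):
    """Send a temporary failure to whoever connects, then close."""

    def handle(self):
        self.request.sendall(BAIT_REPLY)


def make_server(host="127.0.0.1", port=2525):
    """Build (but do not start) the bait listener on a test port."""
    return socketserver.TCPServer((host, port), BaitHandler)

# To run it by hand:
#     with make_server() as server:
#         server.serve_forever()
```

No message is ever accepted or parsed, which is why this tier costs almost nothing to run.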

On the lowest MX, the main server, there are classifications of what I call suspicious hosts. This is a category where 99% of the email comes from spammers, but you can't just block it because the 1% of good email would bounce. These include hosts in dynamic IP ranges, hosts on blacklists like SpamCop that are not quite good enough to bounce on, and hosts with bad reverse DNS. So the idea is that if I force them to retry at the second MX by returning 4xx errors, the hit-and-run spammers go away, and the ones who retry will usually be the 1% of good email that I need to pass. This leads to a significant reduction in spam. Over 90%, actually.
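
The decision on the main server can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the Exim configuration actually in use; the class, field names, and reply strings are invented for the example, and how each flag gets set (DNS lookups, blacklist queries) is left out.

```python
from dataclasses import dataclass


@dataclass
class ConnectingHost:
    """Hypothetical summary of what is known about the connecting host."""
    dynamic_ip: bool = False         # address falls in a dynamic IP range
    on_soft_blacklist: bool = False  # listed, but the list is too weak to bounce on
    bad_reverse_dns: bool = False    # reverse DNS is missing or inconsistent


def is_suspicious(host):
    """True if the host falls in any of the three suspicious categories."""
    return host.dynamic_ip or host.on_soft_blacklist or host.bad_reverse_dns


def primary_mx_reply(host):
    """Defer suspicious hosts with a 4xx so they must retry at the next MX."""
    if is_suspicious(host):
        return "451 Temporarily deferred, please try another MX"
    return "250 OK"


print(primary_mx_reply(ConnectingHost(bad_reverse_dns=True)))
print(primary_mx_reply(ConnectingHost()))
```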

Many virus infected spam zombies are not that sophisticated and they are only smart enough to send to the lowest MX. Real MTAs are expected to be smarter programs and will retry them all. That is one of the tricks I use to separate real MTAs from virus infected zombies.
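
A toy Python simulation shows why this separates the two kinds of sender. It models the replies a suspicious host would see at each tier and compares a sender that falls back through the MX list with one that only tries the first entry. The tier names, reply codes, and both sender functions are simplified assumptions; a real MTA also queues and retries over time, which is not modeled here.

```python
# (preference, tier name, reply code seen by a suspicious host)
TIERS = [
    (10, "main", 451),    # main server defers suspicious hosts
    (20, "backup", 250),  # second tier accepts the retry
    (30, "bait", 451),    # bait MX defers everyone
]


def accepted(code):
    """True for 2xx replies."""
    return 200 <= code < 300


def fallback_sender(tiers):
    """Try each MX in preference order and stop at the first acceptance."""
    for _pref, name, code in sorted(tiers):
        if accepted(code):
            return name
    return None  # everything deferred; a real MTA would queue the message


def zombie_sender(tiers):
    """Try only the lowest-numbered MX and give up on anything else."""
    _pref, name, code = min(tiers)
    return name if accepted(code) else None


print(fallback_sender(TIERS))  # backup
print(zombie_sender(TIERS))    # None
```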

Every trick I use reduces some spam, and I have thousands of tricks. The front-end levels take out the bulk of it, and as the message progresses through the system more and more spam is sheared off. I also have a fast-track whitelisting system to shear off the good email up front and pass it on quickly, which also reduces system load and false positives.
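
The layering can be sketched as a short Python pipeline: cheap checks run first, and the expensive content scanner only sees what is left. Everything here is a stand-in chosen for the example: a plain set replaces the reputation database, `helo_is_localhost` is a made-up behavior check, and `fake_content_scan` is a placeholder for a real content scanner.

```python
def classify(message, known_good, behaviour_checks, content_scan):
    """Run the cheap stages first; call the costly scanner only as a last resort."""
    if message["sender_ip"] in known_good:    # fast-track whitelist
        return "deliver"
    for looks_like_spam in behaviour_checks:  # cheap behavior-based tests
        if looks_like_spam(message):
            return "reject"
    return "reject" if content_scan(message) else "deliver"


scans = []  # records which senders reached the costly stage


def fake_content_scan(message):
    """Placeholder scanner that flags one fixed phrase."""
    scans.append(message["sender_ip"])
    return "buy now" in message["body"].lower()


def helo_is_localhost(message):
    """Made-up example of a behavior check."""
    return message["helo"] == "localhost"


known_good = {"192.0.2.10"}
checks = [helo_is_localhost]
messages = [
    {"sender_ip": "192.0.2.10", "helo": "mail.example.org", "body": "Invoice attached"},
    {"sender_ip": "198.51.100.7", "helo": "localhost", "body": "Buy now!"},
    {"sender_ip": "203.0.113.5", "helo": "mx.example.net", "body": "Lunch on Friday?"},
]
for m in messages:
    print(m["sender_ip"], classify(m, known_good, checks, fake_content_scan))
print("messages that reached the content scanner:", len(scans))
```

In this run only the third message reaches the scanner; the first is passed by the whitelist and the second is rejected on behavior alone.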

Additionally, the second-tier servers also act as backups: should bad things happen to the main server, the backups kick in and process email without the end users experiencing any downtime.

I also use it for load balancing. If for some reason the load level goes above 80 on the main server, I start returning 4xx errors to incoming connections as a way of limiting the problem, letting the server process the jobs causing the load and take new mail after the load drops again. The expectation is that at peak load the second-tier servers will get the traffic instead, allowing the email to pass through them and on to the destination. And it works very well for those sending mail with every MTA other than qmail. The qmail senders have to wait for the main server to accept the mail, due to the way qmail works.
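
A minimal Python sketch of the load check, assuming a Unix host where `os.getloadavg()` is available. The threshold of 80 comes from the text above; the use of the one-minute average, the reply strings, and the function names are assumptions for the example.

```python
import os

LOAD_LIMIT = 80.0  # threshold quoted in the text above


def greeting(load_1min, limit=LOAD_LIMIT):
    """Return a 4xx reply while the load is above the limit, else a normal banner."""
    if load_1min > limit:
        return "421 Too busy, try again later"
    return "220 Service ready"


def current_greeting():
    """Check the live one-minute load average (Unix only)."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    return greeting(load_1min)


print(greeting(95.0))
print(greeting(3.5))
```

Keeping the decision in `greeting()` separate from the system call makes the rule easy to test without loading a machine.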

So - my point is, a 4xx error only means that the server you are connected to is not ready for some reason. It doesn't mean that any other server on my network will also return a 4xx. The idea that one server returning 4xx speaks for all servers is one I had never heard until yesterday in this forum, and it only exists in the qmail world.

So - I have to say that I'm confused and surprised by this, and I'm still trying to comprehend why qmail behaves differently than every other MTA in this regard.



