Hello All,
I've been using Qmail on OpenBSD for the past ~7 years, and it's been a
rock-solid system with no problems the whole time. Now that I've run
into my first major problem, I'm at a real loss as to how to diagnose it.
Here's the issue: over the past several months, I've had occasional
random instances where the box in question will have slowed to a crawl,
and when I log in, I see literally dozens of qmail-smtpd processes, each
eating as much CPU as they can (since they're all running at the same
priority, this is anywhere from 2-3% each up to 99% if there's only one
rogue process). When this has happened, I've stopped qmail with
"qmailctl stop", and then run a quick bash loop (for x in `ps aux | grep
-i "/bin/qmail-smtpd" | cut -c 10-15`; do sudo kill -9 $x; done) that's
the equivalent of a killall -9 /var/qmail/bin/qmail-smtpd to get rid of
all of these spinning qmail-smtpd processes, which don't go down
gracefully. On some occasions, I've had to reboot the system to get
Qmail to behave normally again. In each case, though, it's been a
temporary thing, and since I've not had a lot of time on my hands
lately, I've not followed up too carefully.
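For anyone who wants to reproduce the cleanup step, here is a minimal sketch of the same kill loop that picks PIDs by field rather than by character column. It assumes the usual "ps aux" layout (user in field 1, PID in field 2, command starting in field 11), which may not hold if the STARTED column contains a space. The function name smtpd_pids is just illustrative.

```shell
#!/bin/sh
# Read "ps aux"-style lines on stdin and print the PIDs of processes
# owned by qmaild whose command is exactly /var/qmail/bin/qmail-smtpd.
# The tcpserver and supervise lines are skipped because their command
# field (field 11) is a different program.
smtpd_pids() {
    awk '$1 == "qmaild" && $11 == "/var/qmail/bin/qmail-smtpd" { print $2 }'
}

# Usage, after "qmailctl stop" (left commented out because it is destructive):
#   for pid in $(ps aux | smtpd_pids); do sudo kill -9 "$pid"; done
```

Matching on the command field also keeps the loop from picking up the tcpserver line, which contains the qmail-smtpd path only as an argument.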
Unfortunately, as of this morning, the problem is back -- and it's not
going away. I've tried killing the processes I don't know how many
times, restarted Qmail several times, and rebooted at least 3-4 times.
Clearly, I need to figure out what's going on, and fix the root problem.
I've gone through all of the mailing list archives and Googling I can
think of, though, and I'm really not seeing anything that looks close to
a solution, which is why I've turned up here.
Here's as much info as I think is relevant without flooding you guys
with irrelevancies:
* Celeron 1.8GHz, 512MB RAM, IDE RAID-5 (LSI MegaRAID i4 card; I've had
no problems with it)
* OpenBSD 3.8 (yes, I know I need to upgrade the OS)
* Netqmail-1.05 with the validrcptto patch, and the patch that lets me
route all mail across a relay (forget the name of the patch; relaying
through outbound.mailhop.org, a DynDNS.com service)
* Running under the standard tcpserver config outlined in Life With Qmail
* Also running Courier-IMAP with authdaemond, SpamAssassin, and Horde/IMP
* Found these interesting pieces in /var/log/qmail/smtpd/current:
@40000000464db272146d1294 tcpserver: end 27580 status 256
@40000000464db27218ab8764 tcpserver: fatal: unable to bind: address already used
@40000000464db2732a4c4cec tcpserver: status: 35/50
@40000000464db27928807244 qmail-smtpd: not in validrcptto: Contact@schnarff.com at 71.16.199.73
@40000000464db27b1ef76fd4 tcpserver: fatal: unable to bind: address already used
@40000000464db27c0d635c4c tcpserver: end 15227 status 0
@40000000464db27c304254fc tcpserver: status: 34/50
@40000000464db27e2ec1527c tcpserver: end 11521 status 0
@40000000464db27f1f91f284 tcpserver: status: 33/50
@40000000464db2863079ef5c tcpserver: fatal: unable to bind: address already used
@40000000464db291015d20f4 tcpserver: fatal: unable to bind: address already used
@40000000464db29a125496bc tcpserver: fatal: unable to bind: address already used
@40000000464db2a102ec311c tcpserver: fatal: unable to bind: address already used
@40000000464db2a506afccb4 tcpserver: fatal: unable to bind: address already used
@40000000464db2a521545b34 tcpserver: end 16989 status 256
@40000000464db2a614ecd984 tcpserver: status: 32/50
@40000000464db2aa16aa6fd4 tcpserver: fatal: unable to bind: address already used
@40000000464db2b21e132284 tcpserver: fatal: unable to bind: address already used
@40000000464db2b4398be154 tcpserver: status: 33/50
@40000000464db2b51910b5bc tcpserver: pid 18767 from 61.7.160.54
@40000000464db2b51f6796c4 tcpserver: status: 34/50
...
* Samples of the output of "ps aux | grep -i qmail-smtpd" (I don't have
any showing dozens of qmail-smtpd processes right now, since for the
moment I'm manually killing them before they swamp the server):
schnarff.com:/var/log/qmail/smtpd$ ps aux | grep -i qmail-smtpd
qmaild 20862 35.5 0.3 484 1392 ?? R 10:34AM 1:58.48 /var/qmail/bin/qmail-smtpd
qmaild 23417 29.7 0.3 460 1400 ?? R 10:36AM 0:15.72 /var/qmail/bin/qmail-smtpd
qmaild 28459 27.5 0.3 476 1432 ?? R 10:36AM 0:03.64 /var/qmail/bin/qmail-smtpd
qmaild 26008 25.6 0.3 604 1416 ?? R 10:36AM 0:11.86 /var/qmail/bin/qmail-smtpd
root 13548 0.0 0.1 260 336 ?? I 10:07AM 0:00.02 supervise qmail-smtpd
qmaild 21751 0.0 0.1 464 496 ?? S 10:33AM 0:00.01 /usr/local/bin/tcpserver -v -R -l schnarff.com -x /etc/tcp.smtp.cdb -c 50 -u 1012 -g 1005 0 smtp /var/qmail/bin/qmail-smtpd
qmaild 26311 0.0 0.1 264 692 ?? S 10:36AM 0:00.01 /var/qmail/bin/qmail-smtpd

schnarff.com:/var/log/qmail/smtpd$ ps aux | grep -i qmail-smtpd
qmaild 6570 95.5 0.3 376 1416 ?? R 10:32AM 0:31.39 /var/qmail/bin/qmail-smtpd
root 13548 0.0 0.1 260 336 ?? I 10:07AM 0:00.02 supervise qmail-smtpd
qmaild 5794 0.0 0.1 316 500 ?? S 10:31AM 0:00.01 /usr/local/bin/tcpserver -v -R -l schnarff.com -x /etc/tcp.smtp.cdb -c 50 -u 1012 -g 1005 0 smtp /var/qmail/bin/qmail-smtpd
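To put rough numbers on the tcpserver log excerpt quoted earlier, a small filter along these lines could count the "unable to bind" failures and report the highest concurrency seen. This is only a sketch: it assumes each log entry is a single line in the real file and that status lines end in a field shaped like "35/50". The function name smtpd_log_summary is made up for illustration.

```shell
#!/bin/sh
# Summarize a tcpserver log read from stdin: count bind failures and
# report the highest "status: N/M" connection count seen.
smtpd_log_summary() {
    awk '
        /tcpserver: fatal: unable to bind/ { bind_failures++ }
        /tcpserver: status: / {
            split($NF, parts, "/")      # last field looks like "35/50"
            if (parts[1] + 0 > peak) peak = parts[1] + 0
        }
        END {
            printf "bind_failures=%d peak_connections=%d\n", bind_failures, peak
        }
    '
}

# Usage:
#   smtpd_log_summary < /var/log/qmail/smtpd/current
```

The "end ... status 256" lines are deliberately not counted as status lines, since they report a child's exit status rather than the connection count.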
I'd appreciate any information at all on what might be causing this, and
will be more than happy to supply additional info as necessary to help
diagnose.
Thanks,
Alex Kirk