On Monday, May 21 at 11:34 PM, quoth Samuel Murez:
Has *anything* about the box changed? Has anything about its
environment changed?
The only thing that happens that is new is that it hangs from time to time,
about once a month, and needs to be rebooted, whereas it used to hang maybe
once every year a while ago.
That would have been nice to know. It sounds like you have something
far more serious going on with your machine. Either the kernel is
corrupt, or you have some faulty hardware or something. Unless someone
else has a great suggestion... When a machine stops being reliable (as
in, needs a reboot every now and then, because it crashes for no good
reason), *something* about it is broken in a very serious way, and
it's time to stop trusting its behavior.
I have no idea why this happens, and it doesn't seem to affect any other
functions of the server.
Not to be overly crass here, but when the machine goes down, it
affects ALL functions of the server. This just happens to be the first
one that has begun seeing other smaller effects as well. I think
digging to the bottom of an apparent software failure on a machine
with more fundamental problems is a losing game. We can't trust that
when we tell the computer to do something that it will actually do it,
DESPITE the fact that it *usually* does it, most of the time, for
other services, except that once a month when it decides not to do
anything for anyone.
[root@yoruban root]# cat /home/baronsam/shared/test_tbird_email.txt
| /var/qmail/bin/qmail-remote murez.com baronsam@samuelmurez.com
sam@murez.com
ZSorry, I wasn't able to establish an SMTP connection. (#4.4.1)
[root@yoruban root]# dig murez.com
A connection failure has nothing (causal) to do with a DNS failure
(unless the DNS information is *wrong*, and you're attempting to
connect to the wrong machine).
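
To make that distinction concrete, here is a minimal sketch that tests
the two stages separately and reports which one failed. The name
`probe` and its parameters are made up for illustration. It also
assumes a plain address lookup is a fair stand-in, which it is only
partly: qmail-remote consults MX records first, and this does not.

```python
import socket

def probe(host, port=25, timeout=10.0):
    """Return 'dns', 'connect', or 'ok' depending on which stage fails."""
    # Stage 1: name resolution (address records only; no MX lookup here).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    # Stage 2: try a TCP connection to each address the lookup returned.
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(addr)
                return "ok"
        except OSError:
            continue  # refused, timed out, unreachable: try the next address
    return "connect"
```

A result of "connect" means the name resolved and the connection still
failed, which is what the (#4.4.1) message above describes.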
Doesn't this make you think that there's something wrong with the
way my server's trying to send mail, rather than with the network?
No.
Looking at our symptoms we have:
1. The computer sometimes crashes for no reason at all
   (not qmail-related)
2. DNS requests sometimes fail for no reason at all
   (not qmail-related)
3. qmail-remote sometimes has trouble making DNS requests
   (surprise!)
4. qmail-remote sometimes has trouble making network connections
If your only symptom was #4, that would suggest your qmail-remote
configuration was incorrect, or perhaps some patch had been applied
incorrectly and qmail-remote was misbehaving.
If your only symptoms were #3 and #4, that would suggest that
qmail-remote had become corrupted, and that either strace-ing that
binary, or simply recompiling it from a fresh copy of the unmodified
source code would help.
If your only symptoms were #2 and #3, then that would suggest you are
having DNS issues, and you need to investigate your DNS setup.
If your only symptoms were #2, #3, and #4, that would suggest that
there's a problem with the network (or your networking libraries),
because multiple unrelated programs are all having difficulty
communicating over the network (difficulty that is manifesting itself
in several different ways).
But #1 suggests that there's a more fundamental problem. And the fact
that #1 started happening at the same time that the other symptoms
started happening strongly suggests that they are related. This says
to me that the other symptoms (#2, #3, and #4) are merely the growing
consequences of a slowly failing computer. Whether it's the hardware
that's going bad or the software (kernel, libraries, etc.) that's been
corrupted somehow, it doesn't sound like this is some little bug in
qmail or some little configuration tweak or something else simple. It
sounds more like you have a serious problem.
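
The reasoning above amounts to a small lookup table, roughly sketched
below. The function name `likely_cause` and the wording of the returned
strings are mine, invented for illustration; the symptom numbers are
the ones from the list. Treat it as a summary of the argument, not as a
diagnostic tool.

```python
def likely_cause(symptoms):
    """Map a set of symptom numbers (1-4, as listed above) to a rough diagnosis."""
    observed = frozenset(symptoms)
    # Symptom 1 (unexplained crashes) overrides everything else: once the
    # machine itself is unreliable, the other symptoms are just consequences.
    if 1 in observed:
        return "failing machine (hardware, kernel, or libraries)"
    table = {
        frozenset({4}): "qmail-remote configuration or a badly applied patch",
        frozenset({3, 4}): "corrupted qmail-remote binary",
        frozenset({2, 3}): "DNS setup",
        frozenset({2, 3, 4}): "network or networking libraries",
    }
    # Combinations the message does not discuss are left undecided.
    return table.get(observed, "unclear")
```

With all four symptoms present, `likely_cause({1, 2, 3, 4})` lands on
the failing-machine answer, which is the point of the argument.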
Do you have any ideas of other troubleshooting activities I could undertake?
Unfortunately, at this point, nothing that isn't rather drastic.
~Kyle
--
Carpe per diem---seize the check.
-- Robin Williams