by Tykling
02. oct 2016 12:22 UTC
I spotted a listen queue overflow error message in /var/log/messages on one of my jailhosts today.
I have no idea what could be causing this. This jailhost is busy so it could be a lot of things. Seems to happen about every hour, perhaps some scheduled job?
Most results when you search for the error mention TCP services that can't keep up with the connection rate to the TCP port. But since I didn't know which TCP port I had some detective work to do.
The error from messages looked like this:
Oct 1 01:17:10 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 02:14:10 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 03:14:11 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 04:13:11 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 05:11:11 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 06:09:11 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 07:09:11 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 08:10:12 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 09:10:12 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 10:09:12 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 11:09:12 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences) Oct 1 12:10:13 bong kernel: sonewconn: pcb 0xfffff800647962d0: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences)
The errors all have the same pcb 0xfffff800647962d0 so I need to find the matching socket. I can use netstat(1) for this:
$ netstat -Aan | grep fffff800647962d0 fffff800647962d0 stream 0 0 fffff8021ab66938 0 0 0 public/pickup
Okay, I recognize this immediately - this isn't even a TCP socket, it is a unix socket opened by Postfix. Unfortunately the socket has a relative path, and netstat doesn't output the pid that opened the socket, so I need to find out which Postfix instance (in which jail) is not processing connections. There is quite a few of them since a lot of the jails are mailservers:
$ sockstat -l | grep public/pickup postfix pickup 94418 6 stream public/pickup postfix pickup 93468 6 stream public/pickup postfix pickup 93444 6 stream public/pickup postfix pickup 93137 6 stream public/pickup postfix pickup 92970 6 stream public/pickup postfix pickup 92777 6 stream public/pickup postfix pickup 92219 6 stream public/pickup postfix pickup 92209 6 stream public/pickup postfix pickup 92199 6 stream public/pickup postfix pickup 92174 6 stream public/pickup postfix pickup 92048 6 stream public/pickup postfix pickup 91887 6 stream public/pickup postfix pickup 91619 6 stream public/pickup postfix pickup 91580 6 stream public/pickup postfix pickup 91579 6 stream public/pickup postfix pickup 91576 6 stream public/pickup postfix pickup 91571 6 stream public/pickup postfix pickup 91549 6 stream public/pickup postfix pickup 91534 6 stream public/pickup postfix pickup 91511 6 stream public/pickup postfix pickup 91506 6 stream public/pickup postfix pickup 91456 6 stream public/pickup postfix pickup 91154 6 stream public/pickup postfix pickup 90225 6 stream public/pickup postfix pickup 89419 6 stream public/pickup postfix pickup 88126 6 stream public/pickup postfix pickup 46251 6 stream public/pickup postfix pickup 44779 6 stream public/pickup postfix pickup 44654 6 stream public/pickup postfix pickup 40736 6 stream public/pickup postfix pickup 40649 6 stream public/pickup postfix pickup 40604 6 stream public/pickup postfix pickup 37761 6 stream public/pickup root master 9692 18 stream public/pickup root master 37916 18 stream public/pickup root master 41241 18 stream public/pickup root master 7695 18 stream public/pickup root master 10668 18 stream public/pickup root master 47108 18 stream public/pickup root master 93509 18 stream public/pickup root master 52179 18 stream public/pickup root master 20236 18 stream public/pickup root master 5484 17 stream public/pickup root master 5163 17 stream public/pickup root master 65010 18 stream public/pickup root master 28457 18 stream public/pickup root master 19717 18 stream public/pickup root master 12271 18 stream public/pickup root master 93695 18 stream public/pickup root master 10337 18 stream public/pickup root master 7350 18 stream public/pickup root master 7080 18 stream public/pickup root master 6789 18 stream public/pickup root master 6525 18 stream public/pickup root master 6206 18 stream public/pickup root master 5799 18 stream public/pickup root master 5543 18 stream public/pickup root master 5281 18 stream public/pickup root master 5006 18 stream public/pickup root master 4720 30 stream public/pickup root master 4410 18 stream public/pickup root master 4131 18 stream public/pickup root master 3867 18 stream public/pickup root master 3550 18 stream public/pickup root master 3254 18 stream public/pickup root master 2568 17 stream public/pickup root master 1945 17 stream public/pickup root master 1079 17 stream public/pickup
$ netstat -Lan | grep 151 unix 151/0/100 public/pickupThe numbers are described in netstat(1):
If -L is also present, show the size of the various listen queues. The first count shows the number of unaccepted connections, the second count shows the amount of unaccepted incomplete connections, and the third count is the maximum number of queued connections.The "Listen queue overflow" error happens when a queue is more than 150% full. The queue size is 100 meaning that a queue of 151 pending triggers the error.
for jail in $(jls -h name | tail +2); do 
    sudo jexec $jail netstat -Aan 2>/dev/null | grep -q fffff800647962d0;
    if [ $? -eq 0 ]; then
        echo "found in jail $jail";
    fi;
done
This output the jail name and I was able to go fix Postfix in the jail and the errors went away. The problem was a missing main.cf config making the postfix pickup process unable to.. pickup. The connections were just queued up on the socket, until it reached the limit.