FreeBSD, Jails and SYSV IPC

by Tykling


23. mar 2018 09:45 UTC


I have never been a big fan of software using SYSV IPC shared memory and semaphores. I love PostgreSQL but over the years its use of SYSV IPC has caused various issues for me. I use Zabbix as well, which makes heavy use of shared memory.

I run all my stuff in FreeBSD jails, which used to make things even more complicated because SYSV IPC stuff wasn't namespaced, so two jails with allow.sysvipc=1 could see and modify each other's shared memory and semaphores (not ideal). In FreeBSD 11 and beyond this is no longer an issue, as I will demonstrate later in this post.

Tuning the amount of shared memory and semaphores available to the system is not trivial. Some of the limits are /boot/loader.conf tunables which can only be changed at reboot. Applications don't always document very well how much they will use. If you have more than one jail using shared memory and/or semaphores on a jailhost, the tuning becomes even more tricky. People tend to just throw larger and larger numbers in /boot/loader.conf until it works. That is not how I like to work with systems. I do. not. blindly copy/paste stuff from websites without clearly understanding what it is.

Until now, SYSV IPC has been the exception to this rule. I have tried a bunch of times to understand what it is and how it works (from a sysadmin perspective), and failed miserably each time. This means that when something does go wrong it isn't exactly clear what I am supposed to do about it. This blogpost is an attempt to demystify the thing for future reference.

Background

Shared Memory, Semaphores and Message Queues are collectively known as SYSV IPC. Shared Memory is used when software wants to share a chunk of memory between processes. Semaphores are counters that processes use to synchronise with each other, often to control access to shared resources such as shared memory. Message Queues will not be covered in this blogpost.

Shared Memory

Applications' shared memory usage is limited by the kernel, so finding and understanding those limits seems like a good place to start. After looking into the limits I will look at the current resource consumption and figure out how to calculate how high the limits actually need to be.

Shared Memory Limits

ipcs -M can be used to show the currently active limits for shared memory:

[tsr@sorthat /usr/src]$ ipcs -M
shminfo:
        shmmax:    536870912    (max shared memory segment size)
        shmmin:            1    (min shared memory segment size)
        shmmni:          192    (max number of shared memory identifiers)
        shmseg:          128    (max shared memory segments per process)
        shmall:      4097152    (max amount of shared memory in pages)

The above values map directly to the following sysctls:

kern.ipc.shmall: 4097152
kern.ipc.shmseg: 128
kern.ipc.shmmni: 192
kern.ipc.shmmin: 1
kern.ipc.shmmax: 536870912

Some of these are tunable only in /boot/loader.conf, others can be set with sysctl:

[tsr@svaneke ~]$ sudo sysctl kern.ipc.shmseg=129
sysctl: oid 'kern.ipc.shmseg' is a read only tunable
sysctl: Tunable values are set in /boot/loader.conf
[tsr@svaneke ~]$ sudo sysctl kern.ipc.shmall=131072
kern.ipc.shmall: 131073 -> 131072
[tsr@svaneke ~]$

Nowadays the defaults for shared memory are usually more than enough for a PostgreSQL jail, because a modern PostgreSQL only uses a very small amount of SYSV shared memory (just a single segment of 48 bytes per server). Zabbix is a different story: it uses shared memory heavily, so kern.ipc.shmall will need tuning.

kern.ipc.shmall

kern.ipc.shmall limits the total amount of shared memory, in pages. The pagesize command shows the page size of the running system. Multiplying the two shows how much memory, in bytes, the system allows for shared memory:

[tsr@sorthat ~]$ echo "$(sysctl -n kern.ipc.shmall) * $(pagesize)"
4097152 * 4096
[tsr@sorthat ~]$ !! | bc
echo "$(sysctl -n kern.ipc.shmall) * $(pagesize)" | bc
16781934592
[tsr@sorthat ~]$ 

So, 16781934592 bytes, or almost 17GB. The default value of kern.ipc.shmall is 131072 pages, which means a default FreeBSD 11 system with 4096-byte pages has 536870912 bytes (512MB) available for shared memory. That is not enough for Zabbix, which is why the system above has kern.ipc.shmall=4097152 in /etc/sysctl.conf.

Exactly how much Zabbix will need is not very well documented. The documentation just says to increase kern.ipc.shmall to 2097152 pages and kern.ipc.shmmax to 134217728 bytes (128MB). The default kern.ipc.shmmax is already 536870912 bytes (512MB), so there is no need for the latter; I guess it is old advice.
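Going the other way, from a byte budget to a kern.ipc.shmall value, is a division by the page size, rounded up. Here is a minimal POSIX sh sketch; both function names are hypothetical helpers, and it assumes a shell with 64-bit arithmetic, which most modern sh implementations provide.

```shell
# Convert a shared memory budget in bytes into a kern.ipc.shmall value
# (pages), rounding up so the budget is never undershot.
# bytes_to_shmall_pages is a hypothetical helper, not a system tool.
bytes_to_shmall_pages() {
    bytes=$1
    pagesize=$2
    echo $(( (bytes + pagesize - 1) / pagesize ))
}

# The reverse direction: a kern.ipc.shmall value (pages) to bytes.
# Also a hypothetical helper.
shmall_pages_to_bytes() {
    echo $(( $1 * $2 ))
}
```

For example, bytes_to_shmall_pages 8589934592 4096 gives 2097152, so the 2097152 pages suggested by the Zabbix documentation amount to 8GiB on a system with 4096-byte pages. On a real system the page size argument would come from "$(pagesize)".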

kern.ipc.shmseg and kern.ipc.shmmax

kern.ipc.shmseg limits how many shared memory segments a single process can have attached; the default is 128. (The number of segments on the whole system is limited by kern.ipc.shmmni, which is 192 on this system.) kern.ipc.shmmax limits the maximum size of each segment; the default is 536870912 bytes (512MB). Even though this system has a higher kern.ipc.shmall than the default, kern.ipc.shmseg * kern.ipc.shmmax is still larger than kern.ipc.shmall converted to bytes, so it follows that the segments can not all be allocated at their max size due to the limit imposed by kern.ipc.shmall:

[tsr@sorthat ~]$ echo "$(sysctl -n kern.ipc.shmmax) * $(sysctl -n kern.ipc.shmseg)" | bc
68719476736
[tsr@sorthat ~]$ 

So, 68719476736 bytes, or almost 69GB (64GiB), which is much higher than the total of 16781934592 bytes (almost 17GB) allowed by kern.ipc.shmall. This is not going to be an issue; it simply means that _every_ segment cannot be the _maximum_ size at the _same_ time.

Shared Memory Usage

Now that I know the limits imposed by the kernel it is time to look at the current shared memory usage. The ipcs -ma command can show details:

[tsr@sorthat ~]$ ipcs -ma
Shared Memory:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP         NATTCH        SEGSZ         CPID         LPID ATIME    DTIME    CTIME   
m       458752            0 --rw------- 770      770      770      770               190           48        21599        21599  8:14:33 14:07:38  8:14:33
m       458753            0 --rw------- 122      122      122      122               144     16777216        22110        22110  8:15:28 14:07:34  8:15:28
m       458754            0 --rw------- 122      122      122      122               144      8388608        22110        22110  8:15:28 14:07:34  8:15:28
m       458755            0 --rw------- 122      122      122      122               144     16777216        22110        22110  8:15:28 14:07:34  8:15:28
m       458756            0 --rw------- 122      122      122      122               144    456340276        22110        22110  8:15:28 14:07:34  8:15:28
m       786437            0 --rw------- 122      122      122      122               144     80530636        22110        22110  8:15:28 14:07:34  8:15:28
m       786438            0 --rw------- 122      122      122      122               144        36252        22110        22110  8:15:28 14:07:34  8:15:28
m       524295            0 --rw------- 122      122      122      122               144    536870912        22110        22110  8:15:28 14:07:34  8:15:28
m       524296            0 --rw------- 122      122      122      122                97     16777216        22711        22711  8:15:37 no-entry  8:15:37
m       524297            0 --rw------- 122      122      122      122                97      4194304        22711        22711  8:15:37 no-entry  8:15:37
m       524298            0 --rw------- 122      122      122      122                97     57042535        22711        22711  8:15:37 no-entry  8:15:37
m       524299            0 --rw------- 122      122      122      122                97     10066329        22711        22711  8:15:37 no-entry  8:15:37
m       589836            0 --rw------- 122      122      122      122                97        24408        22711        22711  8:15:37 no-entry  8:15:37

[tsr@sorthat ~]$ 

Using standard unix tools we can sum up all the values in the SEGSZ column to get the total number of bytes of shared memory currently in use on the system:

[tsr@sorthat ~]$ echo $(ipcs -ma | cut -w -f 10 | grep -v SEGSZ | grep -v "^$" | tr "\n" "+" | sed "s/+$//") | bc
1203825956
[tsr@sorthat ~]$ 

So 1203825956 bytes, or around 1.2GB. Nowhere near the limits above.
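The cut -w used above is not available in every cut implementation (GNU cut, for one, lacks it). A roughly equivalent sketch using only awk, which also skips the title, header and blank lines without grep. It assumes the ipcs -ma column layout shown above, with SEGSZ as the 10th field; shm_bytes_in_use is a hypothetical name.

```shell
# Sum the SEGSZ column (field 10) of `ipcs -ma` output read from stdin.
# Only rows whose first field is "m" (a shared memory segment) count,
# so the title line, the column header and blank lines are ignored.
# printf "%.0f" prints large totals as plain integers.
shm_bytes_in_use() {
    awk '$1 == "m" { total += $10 } END { printf "%.0f\n", total }'
}
```

On the system above, ipcs -ma | shm_bytes_in_use should print 1203825956, which can then be compared with the budget from echo "$(sysctl -n kern.ipc.shmall) * $(pagesize)" | bc.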

The number of allocated segments is also comfortably below the system-wide limit of 192 set by kern.ipc.shmmni (the 128 from kern.ipc.shmseg applies per process):

[tsr@sorthat ~]$ ipcs -ma | wc -l
      16
[tsr@sorthat ~]$ 

(subtract 3 from the number to get the precise count without the title, header and empty lines)
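Instead of counting lines and subtracting 3, awk can count only the rows whose first column (T) matches the object type. A small sketch; count_ipc_objects is a hypothetical helper and assumes the ipcs output format shown in this post.

```shell
# Count IPC objects of one type in `ipcs` output read from stdin:
# "m" for shared memory segments, "s" for semaphore sets and
# "q" for message queues. Title, header and blank lines never match.
count_ipc_objects() {
    awk -v type="$1" '$1 == type { n++ } END { print n + 0 }'
}
```

Here, ipcs -ma | count_ipc_objects m should give 13, and ipcs -as | count_ipc_objects s gives the 16 semaphore sets counted further down.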

Most of the shared memory segments shown above belong to UID 122, except for the 48 byte one, which belongs to UID 770. Since I am running ipcs on the jailhost (as opposed to inside a jail), the UIDs and GIDs cannot be resolved, because the host's /etc/passwd does not know about them. It is easy to find out who they might be, though:

[tsr@sorthat ~]$ grep 770 /usr/jails/*/etc/passwd
/usr/jails/postgres4.sorthat.servers.bornfiber.dk/etc/passwd:postgres:*:770:770:PostgreSQL Daemon:/var/db/postgres:/bin/sh
[tsr@sorthat ~]$ grep 122 /usr/jails/*/etc/passwd
/usr/jails/zabbix2.servers.bornfiber.dk/etc/passwd:zabbix:*:122:122:Zabbix NMS:/nonexistent:/bin/sh
/usr/jails/zabbixproxy1.servers.bornfiber.dk/etc/passwd:zabbix:*:122:122:Zabbix NMS:/nonexistent:/bin/sh
[tsr@sorthat ~]$ 

So UID 770 is PostgreSQL and UID 122 is Zabbix. The CPID column contains the process ID of the process which created the shared memory segment, and the LPID column contains the process ID of the process which last did an operation on the segment.
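To tie the usage back to the owners, the same awk approach can group SEGSZ by the OWNER column (field 5). Another hypothetical helper, assuming the ipcs -ma layout above; when run on the jailhost the owners show up as numeric UIDs.

```shell
# Shared memory per owner from `ipcs -ma` output on stdin.
# OWNER is field 5 and SEGSZ is field 10 in that layout.
# Prints "owner segments bytes", one line per owner, sorted by owner.
shm_by_owner() {
    awk '$1 == "m" { segs[$5]++; bytes[$5] += $10 }
         END { for (o in segs) printf "%s %d %.0f\n", o, segs[o], bytes[o] }' |
        sort
}
```

On the output above this should report 1 segment of 48 bytes for UID 770 and the remaining 12 segments for UID 122.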

So now I know:

  • How much total shared memory is currently in use
  • How much shared memory I can use in total across the whole server
  • How many shared memory allocations I currently have on the server
  • How many shared memory allocations I can make in total
  • Which userids and processes in which jails are using the shared memory

Time to look at Semaphores!

Semaphore Limits

ipcs -S can be used to show the current limits for Semaphores:

[tsr@sorthat ~]$ ipcs -S
seminfo:
        semmni:           50    (# of semaphore identifiers)
        semmns:          340    (# of semaphores in system)
        semmnu:          150    (# of undo structures in system)
        semmsl:          340    (max # of semaphores per id)
        semopm:          100    (max # of operations per semop call)
        semume:           50    (max # of undo entries per process)
        semusz:          632    (size in bytes of undo structure)
        semvmx:        32767    (semaphore maximum value)
        semaem:        16384    (adjust on exit max value)

[tsr@sorthat ~]$ 

The values shown above are the defaults for FreeBSD 11. They map to the following sysctls:

[tsr@sorthat ~]$ sysctl kern.ipc | grep sem
kern.ipc.semaem: 16384
kern.ipc.semvmx: 32767
kern.ipc.semusz: 632
kern.ipc.semume: 50
kern.ipc.semopm: 100
kern.ipc.semmsl: 340
kern.ipc.semmnu: 150
kern.ipc.semmns: 340
kern.ipc.semmni: 50
[tsr@sorthat ~]$ 

The important ones for PostgreSQL are kern.ipc.semmni (maximum number of semaphore sets) and kern.ipc.semmns (maximum number of semaphores). The PostgreSQL documentation also sets kern.ipc.semmnu=256 in its FreeBSD example, but it also says "Various other settings related to 'semaphore undo', such as SEMMNU and SEMUME, do not affect PostgreSQL", so I am not setting kern.ipc.semmnu. These are /boot/loader.conf tunables; they are read-only when using sysctl.

kern.ipc.semmni

This setting limits the maximum number of semaphore sets for the system.

Calculating this for PostgreSQL can be done using the following formula from the docs: ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16). On this server max_connections is 200, autovacuum_max_workers is at the default of 3, and max_worker_processes is at the default of 8. This gives (200 + 3 + 8 + 5) / 16 = 13.5, which we round up to 14. The default setting of 50 on FreeBSD should be plenty as long as only PostgreSQL uses semaphores.
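The same formula in shell arithmetic, as a sketch: pg_semmni is a hypothetical name, and the ceiling uses the usual integer trick of adding 15 before dividing by 16.

```shell
# Minimum number of semaphore sets (SEMMNI) PostgreSQL needs, per the
# formula quoted from the PostgreSQL docs:
#   ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16)
# Arguments: max_connections autovacuum_max_workers max_worker_processes
pg_semmni() {
    n=$(( $1 + $2 + $3 + 5 ))
    echo $(( (n + 15) / 16 ))   # integer ceiling of n / 16
}
```

With the settings above, pg_semmni 200 3 8 prints 14.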

Zabbix appears to use 1 semaphore set per server, and does not mention semaphores in the documentation.

kern.ipc.semmns

This setting limits the maximum number of semaphores on the system.

Calculating how many semaphores PostgreSQL needs can be done using the following formula from the docs: ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) * 17. Note that the rounding happens before multiplying, so with the values above we end up with ceil((200 + 3 + 8 + 5) / 16) * 17 = 14 * 17 = 238. Again, the default setting of 340 on FreeBSD should be plenty as long as only PostgreSQL uses semaphores.
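Because the ceiling applies to the number of sets before multiplying by 17, a sketch of the calculation rounds first. With the values above this gives 14 * 17 = 238, which matches the 14 PostgreSQL semaphore sets of 17 semaphores each in the ipcs -as output below. pg_semmns is a hypothetical name.

```shell
# Semaphores (SEMMNS) PostgreSQL needs: round the set count up first,
# then multiply by 17 semaphores per set.
# Arguments: max_connections autovacuum_max_workers max_worker_processes
pg_semmns() {
    n=$(( $1 + $2 + $3 + 5 ))
    sets=$(( (n + 15) / 16 ))   # ceil(n / 16)
    echo $(( sets * 17 ))
}
```

pg_semmns 200 3 8 prints 238.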

Zabbix appears to use 14 semaphores per server, and does not mention semaphores in the documentation.

Semaphore Usage

The familiar ipcs command can show the current semaphore usage:

[tsr@sorthat ~]$ ipcs -as
Semaphores:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP          NSEMS OTIME    CTIME   
s       458752            0 --rw------- 770      770      770      770                17 17:26:50  8:14:33
s       524289            0 --rw------- 770      770      770      770                17 17:26:16  8:14:33
s       589826            0 --rw------- 770      770      770      770                17 17:22:25  8:14:33
s       589827            0 --rw------- 770      770      770      770                17 17:21:58  8:14:33
s       589828            0 --rw------- 770      770      770      770                17 17:26:05  8:14:33
s       589829            0 --rw------- 770      770      770      770                17 17:26:51  8:14:33
s       589830            0 --rw------- 770      770      770      770                17 17:26:23  8:14:33
s       589831            0 --rw------- 770      770      770      770                17 17:26:11  8:14:33
s       589832            0 --rw------- 770      770      770      770                17 17:25:59  8:14:33
s       589833            0 --rw------- 770      770      770      770                17 17:26:26  8:14:33
s       589834            0 --rw------- 770      770      770      770                17 17:21:00  8:14:33
s       589835            0 --rw------- 770      770      770      770                17 17:20:19  8:14:33
s       589836            0 --rw------- 770      770      770      770                17 17:25:44  8:14:33
s       589837            0 --rw------- 770      770      770      770                17 17:26:44  8:14:33
s       589838            0 --rw------- 122      122      122      122                14 17:26:54  8:15:28
s       589839            0 --rw------- 122      122      122      122                14 17:26:54  8:15:37

[tsr@sorthat ~]$ 

Each line represents a semaphore set:

[tsr@sorthat ~]$ ipcs -as | wc -l
      19
[tsr@sorthat ~]$ 

(subtract 3 from the number to get the precise count without the title, header and empty lines)

So I am currently using 16 out of the permitted 50 semaphore sets (kern.ipc.semmni).

By adding up the numbers in the NSEMS column we can see the number of semaphores currently in use:

[tsr@sorthat ~]$ echo "$(ipcs -as | cut -w -f 9 | egrep -v "(^$|Semaphores|NSEMS)" | tr "\n" "+" | sed "s/+$//")" | bc
266
[tsr@sorthat ~]$ 

And I am currently using 266 out of the permitted 340 semaphores (kern.ipc.semmns).
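Grouping NSEMS (field 9) by OWNER (field 5) shows who holds which share of those semaphores. A sketch, assuming the ipcs -as layout above; sem_by_owner is a hypothetical name.

```shell
# Semaphore sets and semaphores per owner from `ipcs -as` output on stdin.
# OWNER is field 5 and NSEMS is field 9 in that layout.
# Prints "owner sets semaphores", one line per owner, sorted by owner.
sem_by_owner() {
    awk '$1 == "s" { sets[$5]++; sems[$5] += $9 }
         END { for (o in sets) printf "%s %d %d\n", o, sets[o], sems[o] }' |
        sort
}
```

On the output above this should print 122 2 28 and 770 14 238: PostgreSQL's 238 semaphores plus 28 for the two Zabbix sets make up the 266 in use.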

So now I know:

  • How many semaphore sets I can use on this system
  • How many semaphore sets I am currently using, and the UID that created them
  • How many semaphores I can use on the system
  • How many semaphores are currently in use, and the UID that created them

It would make sense to add these metrics to some monitoring, but that is an exercise for a future blogpost.
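As a possible starting point, a hedged sketch of a threshold check. ipc_usage_check is hypothetical, and the OK/WARNING wording and the non-zero exit status are made-up conventions to adapt to whatever your monitoring system expects.

```shell
# Report how much of a limit is in use, as a whole percentage
# (integer division, so it rounds down), and flag WARNING with exit
# status 1 at or above a threshold percentage.
# Arguments: name used limit threshold
ipc_usage_check() {
    name=$1 used=$2 limit=$3 threshold=$4
    pct=$(( used * 100 / limit ))
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING $name ${pct}%"
        return 1
    fi
    echo "OK $name ${pct}%"
}
```

With the numbers above, ipc_usage_check semmns 266 340 75 prints WARNING semmns 78%, while the 16 of 50 semaphore sets give OK semmni 32%. The "used" value could come from ipcs -as | awk '$1 == "s" { n += $9 } END { print n + 0 }' and the limit from sysctl -n kern.ipc.semmns.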

SYSV IPC in FreeBSD Jails

FreeBSD jails all share the same kernel. When something in a jail needs SYSV IPC, the jail has to be given permission to use it.

Before FreeBSD 11, SYSV IPC resources were not namespaced, and you could only enable everything with allow.sysvipc=1 or enable nothing at all. The primary problem with this is that you use jails to separate services, in case one of them gets compromised. But imagine a jailhost with two separate jails, A and B, which both use SYSV IPC. Jail A gets owned, and the intruder is now able to read and modify the SYSV IPC resources of jail B. Clearly not ideal.

The old advice was to run the services in the jails with different UIDs, but that advice only helps as long as your intruder doesn't get root. See below for a view from inside a jail, which can also see the SYSV IPC resources from another jail on the same jailhost. This is from inside a jail with allow.sysvipc=1:

[tsr@postgres4 ~]$ ipcs
Message Queues:
T           ID          KEY MODE        OWNER    GROUP   

Shared Memory:
T           ID          KEY MODE        OWNER    GROUP   
m        65536            0 --rw------- 122      122     
m        65537            0 --rw------- 122      122     
m        65538            0 --rw------- 122      122     
m        65539            0 --rw------- 122      122     
m        65540            0 --rw------- 122      122     
m        65541            0 --rw------- 122      122     
m        65542            0 --rw------- 122      122     
m        65543            0 --rw------- 122      122     
m        65544            0 --rw------- 122      122     
m        65545            0 --rw------- 122      122     
m        65546            0 --rw------- 122      122     
m        65547            0 --rw------- 122      122     
m       131084      5432001 --rw------- postgres postgres

Semaphores:
T           ID          KEY MODE        OWNER    GROUP   
s        65536            0 --rw------- 302      302     
s        65537            0 --rw------- 122      122     
s        65538            0 --rw------- 122      122     
s       131075      5432001 --rw------- postgres postgres
s       131076      5432002 --rw------- postgres postgres
s       131077      5432003 --rw------- postgres postgres
s       131078      5432004 --rw------- postgres postgres
s       131079      5432005 --rw------- postgres postgres
s       131080      5432006 --rw------- postgres postgres
s       131081      5432007 --rw------- postgres postgres
s       131082      5432008 --rw------- postgres postgres
s       131083      5432009 --rw------- postgres postgres
s       131084      5432010 --rw------- postgres postgres
s       131085      5432011 --rw------- postgres postgres
s       131086      5432012 --rw------- postgres postgres
s       131087      5432013 --rw------- postgres postgres
s       131088      5432014 --rw------- postgres postgres

[tsr@postgres4 ~]$ 

The semaphores and shared memory shown with a numeric UID are the ones that do not belong to this jail. The root user in this jail is able to modify or delete these, even though they belong to another jail.

FreeBSD 11 solves this in an elegant way:

In FreeBSD 11, allow.sysvipc=1 is no longer recommended; instead, three new jail parameters have been introduced:

  • sysvshm: Controls access to shared memory
  • sysvsem: Controls access to SYSV semaphores
  • sysvmsg: Controls access to SYSV message queues

Each of these can have three values:

  • disable: Disables access to this type of resource (default)
  • inherit: Makes the jail inherit the global SYSV namespace (the old behaviour, same as allow.sysvipc=1)
  • new: Creates a new separate SYSV namespace for this jail. This is what you want.

So for the PostgreSQL jail above, which needs shared memory and semaphores, I set sysvshm=new and sysvsem=new instead of allow.sysvipc=1 in FreeBSD 11 and beyond. Seen from inside the jail it looks the same, except that no entries from other jails are visible:

[tsr@postgres4 ~]$ ipcs
Message Queues:
T           ID          KEY MODE        OWNER    GROUP   

Shared Memory:
T           ID          KEY MODE        OWNER    GROUP   
m       131084      5432001 --rw------- postgres postgres

Semaphores:
T           ID          KEY MODE        OWNER    GROUP   
s       131075      5432001 --rw------- postgres postgres
s       131076      5432002 --rw------- postgres postgres
s       131077      5432003 --rw------- postgres postgres
s       131078      5432004 --rw------- postgres postgres
s       131079      5432005 --rw------- postgres postgres
s       131080      5432006 --rw------- postgres postgres
s       131081      5432007 --rw------- postgres postgres
s       131082      5432008 --rw------- postgres postgres
s       131083      5432009 --rw------- postgres postgres
s       131084      5432010 --rw------- postgres postgres
s       131085      5432011 --rw------- postgres postgres
s       131086      5432012 --rw------- postgres postgres
s       131087      5432013 --rw------- postgres postgres
s       131088      5432014 --rw------- postgres postgres

[tsr@postgres4 ~]$ 

This is very, very nice (and about time). Going back to my early beginnings with FreeBSD jails I have been wondering when this would get fixed properly. Yay!

Troubleshooting

Today (March 2018) I was called in on an issue where a PostgreSQL server was unable to start after a crash, supposedly because of disk space issues. I was greeted with the familiar message:

[tsr@postgres4 /usr/home/tsr]$ sudo service postgresql start
Password:
pg_ctl: another server might be running; trying to start server anyway
FATAL:  could not create semaphores: No space left on device
DETAIL:  Failed system call was semget(5432005, 17, 03600).
HINT:  This error does *not* mean that you have run out of disk space.  It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded.  You need to raise the respective kernel parameter.  Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
        The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.
[tsr@postgres4 /usr/home/tsr]$

This has nothing to do with disk space, of course. As the message says, it has to do with semaphore limits. So I checked the current SYSV IPC resource usage with ipcs -a:

[tsr@sorthat ~]$ ipcs -a
Message Queues:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP                 CBYTES                 QNUM               QBYTES        LSPID        LRPID STIME    RTIME    CTIME   

Shared Memory:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP         NATTCH        SEGSZ         CPID         LPID ATIME    DTIME    CTIME   
m       458752            0 --rw------- 770      770      770      770               190           48        21599        21599  8:14:33 13:47:36  8:14:33
m       458753            0 --rw------- 122      122      122      122               144     16777216        22110        22110  8:15:28 13:46:59  8:15:28
m       458754            0 --rw------- 122      122      122      122               144      8388608        22110        22110  8:15:28 13:46:59  8:15:28
m       458755            0 --rw------- 122      122      122      122               144     16777216        22110        22110  8:15:28 13:46:59  8:15:28
m       458756            0 --rw------- 122      122      122      122               144    456340276        22110        22110  8:15:28 13:46:59  8:15:28
m       786437            0 --rw------- 122      122      122      122               144     80530636        22110        22110  8:15:28 13:46:59  8:15:28
m       786438            0 --rw------- 122      122      122      122               144        36252        22110        22110  8:15:28 13:46:59  8:15:28
m       524295            0 --rw------- 122      122      122      122               144    536870912        22110        22110  8:15:28 13:46:59  8:15:28
m       524296            0 --rw------- 122      122      122      122                97     16777216        22711        22711  8:15:37 no-entry  8:15:37
m       524297            0 --rw------- 122      122      122      122                97      4194304        22711        22711  8:15:37 no-entry  8:15:37
m       524298            0 --rw------- 122      122      122      122                97     57042535        22711        22711  8:15:37 no-entry  8:15:37
m       524299            0 --rw------- 122      122      122      122                97     10066329        22711        22711  8:15:37 no-entry  8:15:37
m       589836            0 --rw------- 122      122      122      122                97        24408        22711        22711  8:15:37 no-entry  8:15:37

Semaphores:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP          NSEMS OTIME    CTIME   
s       458752            0 --rw------- 770      770      770      770                17 13:47:25  8:14:33
s       524289            0 --rw------- 770      770      770      770                17 13:39:02  8:14:33
s       589826            0 --rw------- 770      770      770      770                17 13:37:52  8:14:33
s       589827            0 --rw------- 770      770      770      770                17 13:45:18  8:14:33
s       589828            0 --rw------- 770      770      770      770                17 13:47:01  8:14:33
s       589829            0 --rw------- 770      770      770      770                17 13:47:42  8:14:33
s       589830            0 --rw------- 770      770      770      770                17 13:46:47  8:14:33
s       589831            0 --rw------- 770      770      770      770                17 13:47:11  8:14:33
s       589832            0 --rw------- 770      770      770      770                17 13:46:51  8:14:33
s       589833            0 --rw------- 770      770      770      770                17 13:45:57  8:14:33
s       589834            0 --rw------- 770      770      770      770                17 13:46:33  8:14:33
s       589835            0 --rw------- 770      770      770      770                17 13:34:15  8:14:33
s       589836            0 --rw------- 770      770      770      770                17 13:47:33  8:14:33
s       589837            0 --rw------- 770      770      770      770                17 13:47:33  8:14:33
s       589838            0 --rw------- 122      122      122      122                14 13:47:44  8:15:28
s       589839            0 --rw------- 122      122      122      122                14 13:47:44  8:15:37

[tsr@sorthat ~]$ 

Obviously PostgreSQL isn't running (since it refused to start), and I had already shut down the Zabbix jails earlier in a frenzy to try to get PostgreSQL to start up. So nothing should be using SYSV IPC resources at all. Yet there they were, plain as day. Somehow they had not been cleaned up properly, and the lingering semaphores were now preventing PostgreSQL from starting.
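The arithmetic behind the error: with the lingering 16 sets and 266 semaphores from the output above still allocated, only 340 - 266 = 74 semaphores were free. That is room for 4 sets of 17, while PostgreSQL needs 14 such sets. A sketch of that check, where sem_sets_that_fit is a hypothetical helper and the per-set limit kern.ipc.semmsl is deliberately ignored:

```shell
# How many more semaphore sets of a given size can still be created,
# given current usage and the kernel limits. Each new set needs one
# free identifier (SEMMNI) and nsems free semaphores (SEMMNS).
# Arguments: used_sets used_sems semmni semmns nsems
sem_sets_that_fit() {
    used_sets=$1 used_sems=$2 semmni=$3 semmns=$4 nsems=$5
    by_ids=$(( semmni - used_sets ))
    by_sems=$(( (semmns - used_sems) / nsems ))
    if [ "$by_ids" -lt "$by_sems" ]; then
        echo "$by_ids"
    else
        echo "$by_sems"
    fi
}
```

sem_sets_that_fit 16 266 50 340 17 prints 4, far short of the 14 sets PostgreSQL asks for, so semget failing with "No space left on device" is expected.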

Since no running jails were using shared memory or semaphores, I could use ipcrm -W to clean up everything:

[tsr@sorthat ~]$ sudo ipcrm -W
[tsr@sorthat ~]$ ipcs -t
Message Queues:
T           ID          KEY MODE        OWNER    GROUP    STIME    RTIME    CTIME

Shared Memory:
T           ID          KEY MODE        OWNER    GROUP    ATIME    DTIME    CTIME

Semaphores:
T           ID          KEY MODE        OWNER    GROUP    OTIME    CTIME
[tsr@sorthat ~]$

This command should be used with care; be very sure you know what you are doing. It should only be used if you are certain nothing else is running which needs the shared memory or semaphores. ipcrm also has switches to delete individual semaphore sets (-s) or shared memory segments (-m) by the ID shown in the ipcs output, in cases where that is needed.
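Where ipcrm -W is too blunt, a sketch that turns an ipcs -a listing into individual ipcrm -s and ipcrm -m commands for a single owner, printing them for review rather than running them. ipcrm_commands_for_owner is hypothetical; it assumes the ipcs -a layout above, where the ID is field 2 and OWNER is field 5.

```shell
# Print (but do not run) ipcrm commands for every semaphore set ("s")
# and shared memory segment ("m") owned by one owner, from `ipcs -a`
# output on stdin. Commands come out in the order the objects are listed.
# Review them first: removing objects a live process still uses breaks it.
ipcrm_commands_for_owner() {
    awk -v owner="$1" '
        $5 == owner && $1 == "s" { print "ipcrm -s " $2 }
        $5 == owner && $1 == "m" { print "ipcrm -m " $2 }'
}
```

For example, ipcs -a | ipcrm_commands_for_owner 770 would list only the PostgreSQL objects from the output above. Pipe the result to sh only once you are sure nothing running still uses them.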

After cleaning up the old semaphores PostgreSQL started up without any problems. After that I started the Zabbix jails again, and then I started writing this blogpost so I never have to go through this again.

Conclusion

I have no idea why PostgreSQL crashed in the first place. I also have no idea why it was unable to clean up the lingering semaphores after the crash. But at least I know how to find and remove any lingering semaphores in case it happens again. I will also increase the semaphore limit kern.ipc.semmns to a large enough value that it can handle at least twice what PostgreSQL needs, so if this happens again it should still be able to start.
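A sketch of that sizing, assuming PostgreSQL's 238 semaphores on this server (14 sets of 17, as seen in the ipcs -as output) plus 28 for the other users seen in this post (the two Zabbix sets). suggest_semmns is hypothetical, and the result belongs in /boot/loader.conf since kern.ipc.semmns is a boot-time tunable.

```shell
# Suggest a kern.ipc.semmns value with room for PostgreSQL's semaphores
# twice over, plus whatever other software on the host needs.
# Arguments: postgresql_semaphores other_semaphores (measure both yourself)
suggest_semmns() {
    pg=$1 other=$2
    echo "kern.ipc.semmns=$(( 2 * pg + other ))"
}
```

suggest_semmns 238 28 prints kern.ipc.semmns=504. Twice PostgreSQL's 14 sets plus the 2 Zabbix sets is 30 sets, which still fits within the default kern.ipc.semmni of 50.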

I kind of feel like the FreeBSD rc.d script for PostgreSQL should run ipcrm to clean up any lingering resources before starting it, but people on #postgres on Freenode seemed to disagree.

PostgreSQL 10 uses POSIX semaphores instead of SYSV IPC semaphores on FreeBSD, which should make this semaphore problem go away entirely for PostgreSQL.
