by Tykling
15. apr 2025 14:05 UTC
Hello pirates!
Yesterday I stumbled upon an issue that was sticky enough that it felt appropriate to document the journey here. It is the story of the PostgreSQL client libpq
acting weird when running in an environment where $HOME
is set to an existing but unreadable path (which, admittedly, is a weird thing to have in an environment). Whether or not this is a bug is a matter of interpretation, but after getting to the bottom of it I wanted to take some notes.
I was minding my own business and checking some PostgreSQL server logs when I noticed log messages like this one from some Django servers, for example 2a09:94c4:55d1:76a0::5
:
Apr 15 11:48:15 postgres1 postgres[3988]: [7-1] 2025-04-15 11:48:15.129 UTC [3988] [unknown]@2a09:94c4:55d1:76a0::5(54996)@[unknown]/[unknown] LOG: could not accept SSL connection: EOF detected
The error tells me that the client with IP 2a09:94c4:55d1:76a0::5
, a Django server, is unable to establish TLS connections to Postgres. My PostgreSQL servers all have real hostnames and certificates from LetsEncrypt, so there is no reason why the TLS connection should fail here. The websites all work despite of this error because of two things:
sslmode
in libpq
which is prefer, which means first try an SSL connection; if that fails, try a non-SSL connectionpg_hba.conf
allowed both plaintext and encrypted connections (since this is all internal between FreeBSD jails on the same host). This means they were allowed to fallback to plaintext connections if TLS fails.This means that the setting sslmode=prefer
is in effect suppressing the TLS connection error, because Django just falls back to plaintext (and Postgres lets it). So the first thing I needed to do was tickle out an error message from the TLS connection attempt. The mission was clear:
hostssl
in pg_hba.conf
.
Since step 1 would likely break the website (forcing TLS when TLS doesn't work tends to do that) the plan was to start out playing with a staging site where downtime doesn't matter, before touching any of the production sites.
To force always TLS for Djangos DB connection I went to Djangos settings.py
file in the DATABASES
setting and added these two options:
'OPTIONS': {'sslmode': 'verify-full', 'sslrootcert': 'system'}
To see it in context see this commit. The two options mean:
/etc/ssl/certs
). Failing to add this option results in an error like failed: root certificate file "/var/db/postgres/.postgresql/root.crt" does not exist
Either provide the file, use the system's trusted roots with sslrootcert=system, or change sslmode to disable server certificate verification.
Adding these settings did break the websites database connection, but finally I got a python backtrace showing the error. The error message was somewhat confusing though:
could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied
This error message was confusing for a number of reasons. I don't run PostgreSQL or uwsgi as root, I don't have any certificates in /root
, the server had no references to /root/.postgresql/postgresql.crt
anywhere, and what the hell was going on. Where did this path come from, what is trying to read from it, and why? Something something Cotton Eyed Joe.
To add to the mystery the Django manage.py
commands shell
and dbshell
still both worked great. They both use the configuration from Djangos settings.py
proving that the sslmode
and sslrootcert
additions I made were valid and functional.
So to sum up, after adding 'OPTIONS': {'sslmode': 'verify-full', 'sslrootcert': 'system'}
to force Django to use TLS for database connections the uwsgi
website broke but manage.py shell
still works well. Mystery? mystery!
The website runs under uwsgi
as user www
, and uwsgi
is run by the process supervisor supervisord
which runs as root
. Since the TLS error only happens when using Django through the website, and not when using it through the shell, the error is likely somewhere in uwsgi
or supervisord
.
While playing with verify-full
using psql
on the command-line I saw an example of what the error looks like when libpq
can't find the root CA certificate to verify the connection.
Observe the difference with and without sslrootcert=system
(in this case testing locally on the postgres server):
[postgres@postgres1 ~]$ psql "host=postgres1.servers.bornhack.org sslmode=verify-full" psql: error: connection to server at "postgres1.servers.bornhack.org" (2a09:94c4:55d1:76a0::4), port 5432 failed: root certificate file "/var/db/postgres/.postgresql/root.crt" does not exist Either provide the file, use the system's trusted roots with sslrootcert=system, or change sslmode to disable server certificate verification. [postgres@postgres1 ~]$ psql "host=postgres1.servers.bornhack.org sslmode=verify-full sslrootcert=system" psql (16.8) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. postgres=# \q [postgres@postgres1 ~]$
Note that the error message root certificate file "/var/db/postgres/.postgresql/root.crt" does not exist
has different wording from the one I got , which was could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied
. It is looking in a different homedir of course, but it is also looking for root.crt
instead of postgresql.crt
. This would later turn out to be significant.
I did a lot of digging around, verifying along the way that the connection parameters did make it all the way to psycopg2
. After a LOT of print()
-based debugging I slowly narrowed the differences between the working dbshell
invocation and the non-functional uwsgi
invocation of the same Django code and config.
At some point I started digging into the environment Django runs under when calling dbshell
versus under uwsgi
and figured out that if I set HOME
in the supervisord
environment for uwsgi
the error went away! It didn't seem to matter what value HOME
was set to, it didn't even have to be a valid path, any value (even an empty string!) seemed to fix it.
With a functional workaround in hand I could have stopped there, but I wanted to get to the bottom of it. Something about having HOME=/root
in the environment was causing something to error out instead of just ignoring HOME
and running as if there was no HOME
(which was clearly possible since HOME=
also worked).
After a long evening of narrowing it further I eventually had a way to reproduce it outside uwsgi
and supervisord
using psql
directly:
[postgres@postgres1 ~]$ echo $HOME /var/db/postgres [postgres@postgres1 ~]$ psql "host=postgres1.servers.bornhack.org sslmode=verify-full sslrootcert=system" psql (16.8) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. postgres=# \q [postgres@postgres1 ~]$ HOME= psql "host=postgres1.servers.bornhack.org sslmode=verify-full sslrootcert=system" psql (16.8) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. postgres=# \q [postgres@postgres1 ~]$ HOME=/nonexistant psql "host=postgres1.servers.bornhack.org sslmode=verify-full sslrootcert=system" psql (16.8) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. postgres=# \q could not save history to file "/nonexistant/.psql_history": No such file or directory [postgres@postgres1 ~]$ HOME=/root psql "host=postgres1.servers.bornhack.org sslmode=verify-full sslrootcert=system" psql: error: connection to server at "postgres1.servers.bornhack.org" (2a09:94c4:55d1:76a0::4), port 5432 failed: could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied [postgres@postgres1 ~]$
In other words: Setting HOME
in the environment to any path which exists but is not readable fails with the above error.
I went digging in the libpq
code and eventually found the codepath being taken which leads to this error.
initialise_SSL()
in this section which determines whether or not to use a home directory, setting the bool have_homedir
.
pqGetHomeDirectory()
which is defined here which simply checks if HOME
is defined in the environment. I would argue that this code should also check if the contents of HOME
is a valid and readable path before returning true
.have_homedir=true
will use a default path for client certificate if one has not been defined in the connection settings, the default client certificate path is defined here and here resulting in the path /root/.postgresql/postgresql.crt
being accessed, triggering the error.
This finally explained why I was seeing this issue only in uwsgi
under supervisord
. TLS connections never worked in this setup, this error has always been there. I just never noticed because plaintext fallback was enabled.
The error message was never about reading the CA certificate (even though I initially thought so, since that was the setting I changed!). The error message was about trying to read a client certificate which libpq
will always attempt to do if run in an env with HOME
set to a non-empty string. If the error returned from attempting to read from this path is anything other than ENOENT
or ENOTDIR
an error is raised. In other words, if the path exists but is not readable. This finally explains why my workaround of setting HOME
worked:
supervisord
inherit the environment which supervisord
itself runs under
supervisord
runs as root
everything it runs inherits an environment with HOME=/root
- regardless of the user the process runs as.
HOME
in the supervisord
config for uwsgi
effectively overrides the HOME=/root
so uwsgi
no longer inherits it.
manage.py shell
or manage.py dbshell
worked fine, since those are run with HOME
set to the homedir of the user running the command
Once I understood the issue the fix was trivial. This commit to my Ansible roles makes HOME=
the new default for uwsgi
stuff run under my supervisord
instances. Arguably this is how the supervisord
default should always have been. Or maybe processes should inherit HOME
for the user the process runs as (www
in this case). The last remaining jobs were:
DATABASES
settings to include 'OPTIONS': {'sslmode': 'verify-full', 'sslrootcert': 'system'}
so TLS is always used
pg_hba.conf
files to use hostssl
instead of host
so plaintext connections are disallowed.
This was quite a tricky error to get to the bottom of. A couple of simple changes to libpq
would have made this much less painful:
pqGetHomeDirectory()
should IMO return false
if HOME
is not readable. This would have avoided the issue entirely, and arguably an unreadable HOME
should be treated the same as a non-existing HOME
client
somewhere so it is clear what it is attempting to read: The message could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied
would have been more clear if it read could not open client certificate file "/root/.postgresql/postgresql.crt": Permission denied
I hope this blogpost can help other people stuck with this issue. I wanted to send a patch to fix the two small things mentioned above, but since Postgres do not accept contributions via pull requests on Github and their contributing guidelines is TL;DR I will sadly leave it up to someone else to fix this.
I've somewhat recently signed up for Github Sponsors meaning it is now easy to sponsor me and my work. If this post or some of my other writing, software or services have helped you then you can consider becoming a sponsor.