Ansible and Unix File Permissions

by Tykling


19. nov 2016 13:35 UTC


Background

Earlier this week I was pretty surprised to see some weird permissions on some nginx config files on my servers. The servers are managed by Ansible, so I suspected some changes I had made to my Ansible roles a few days prior. I had only made syntax changes, so I didn't expect anything to change. But sometimes the rabbit hole goes deeper than you imagined :)

So I looked at the Ansible task that creates and maintains these files:

- name: Create nginx extra include configs (acls and more)
  copy:
    owner: root
    group: wheel
    mode: 600
    content: "{{ item.value.content }}"
    dest: "/usr/local/etc/nginx/{{ item.value.filename }}"
  with_dict: "{{ nginx_extra_configs | default({}) }}"
  when: nginx_proxy | default(False)

This task is used together with a host_vars file containing a section something like this:

nginx_extra_configs:
  bfipacl:
    filename: bornfiber-ip-acl.conf
    content: |
      allow 100.20.45.100/32; # management
      allow 10.20.64.0/24; # something else
  anotheracl:
    filename: another-acl.conf
    content: |
      allow 192.168.0.0/24; # management

And the task just loops through the nginx_extra_configs dict creating or updating each file as needed.

All the Ansible modules that have to do with file management (the copy, template and file modules spring to mind, but many more exist) have the same mode property, which is used to control the permissions of the file. Ansible's documentation has this advice about modes being octal (this text is taken from the file module documentation):

For those used to /usr/bin/chmod remember that modes are actually octal numbers (like 0644). Leaving off the leading zero will likely have unexpected results. As of version 1.8, the mode may be specified as a symbolic mode (for example, u+rwx or u=rw,g=r,o=r).

While I understand the intent of this advice, the text as it stands is misleading at best. The advice will help in many cases - but only because setuid, setgid and the sticky bit are rarely used. And it completely fails to address subtle differences in parsing behaviour, which I had to learn the hard way.

Now, Ansible tasks are defined in YAML, and when defining properties for a task you have a couple of choices for syntax. The change I made earlier this week - which resulted in the wrong file permissions - was to switch from the key=value syntax to the structured key: value format in my tasks. Basically I changed all the tasks like so:

 - name: Install git
   become: yes
   pkgng:
-    name=git
-    state=present
+    name: git
+    state: present

It just feels more natural to me to use the same key: value syntax in the task arguments as I do in the rest of the task files, so I standardized on this syntax (thinking it would not change any actual functionality).

Problem?

After the syntax change I ran my playbooks and discovered that some of my config files now had wrong permissions. I expected these files to be chmod 600 (so rw for owner, and no permissions for group and others) but what I got was this:

[tsr@webproxy ~]$ ls -l /usr/local/etc/nginx/bornfiber-ip-acl.conf
---x-wx--T  1 root  wheel   594 Nov  2 10:53 bornfiber-ip-acl.conf

I didn't actually change what any task does, just its syntax, so this was baffling. I basically changed mode=600 to mode: 600, and now the permissions were completely different.

Numeric unix file permissions are specified as octal (base 8) as three or four digits, where the three rightmost digits represent the permissions for owner, group and others, respectively. The optional fourth and leftmost digit represents the setuid, setgid and sticky bit. This is assumed to be 0 if it is left out, which is why chmod 600 file yields the same result as chmod 0600 file.
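As a small illustration of the digit layout described above, Python's standard stat module can render a numeric mode the way ls would. This is only a sketch: describe is a hypothetical helper name, and the mode is combined with S_IFREG so it is treated as a regular file.

```python
import stat

def describe(mode):
    """Return the ls-style permission string for a regular file with this mode."""
    return stat.filemode(stat.S_IFREG | mode)

# Three digits and four digits with a leading 0 describe the same permissions.
print(describe(0o600))   # -rw-------
print(describe(0o0600))  # -rw-------

# A non-zero fourth digit sets special bits; 1 is the sticky bit.
print(describe(0o1600))  # -rw------T
```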

I suspected that my 600 was being interpreted as decimal instead of octal after my syntax change. To test the theory I simply converted decimal 600 to octal, fed it to chmod, and checked the resulting permissions:

user@privat:~$ echo "obase=8; 600" | bc
1130
user@privat:~$ touch test
user@privat:~$ chmod 1130 test
user@privat:~$ ls -l test 
---x-wx--T 1 user user 0 Nov 19 15:09 test
user@privat:~$ 

Great! I've now confirmed that the problem is the mode being interpreted as decimal instead of octal after my syntax change. Having read the above snippet from the Ansible documentation about octal numbers, I changed my tasks to use 0600 instead of 600 and thought that was the end of that. Until I started thinking a bit more about it. The reason prefixing my permissions with a 0 worked is that the number is now interpreted as an octal number instead of a decimal number by PyYAML, which Ansible uses to parse the configuration files. But what if my permissions don't start with a 0?
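The effect of the leading zero can be sketched in a few lines of Python. This is a simplified emulation, not PyYAML's code: yaml11_plain_scalar is a hypothetical helper, and it assumes YAML 1.1 integer rules while covering only the plain decimal and leading-zero octal forms (no signs, underscores, hex, binary or sexagesimal values).

```python
import re

# Assumption: a reduced subset of the YAML 1.1 integer patterns.
_OCTAL = re.compile(r"0[0-7]+")
_DECIMAL = re.compile(r"0|[1-9][0-9]*")

def yaml11_plain_scalar(text):
    """Hypothetical helper: resolve an unquoted scalar roughly as a YAML 1.1 loader would."""
    if _OCTAL.fullmatch(text):
        return int(text, 8)    # leading zero: read as octal
    if _DECIMAL.fullmatch(text):
        return int(text, 10)   # no leading zero: read as decimal
    return text                # anything else stays a string

for raw in ("600", "0600", "1600"):
    value = yaml11_plain_scalar(raw)
    print(raw, "->", value, "->", oct(value))
# 600 -> 600 -> 0o1130
# 0600 -> 384 -> 0o600
# 1600 -> 1600 -> 0o3100
```

Only the spelling with the leading zero ends up as the permission bits that were intended; a four-digit mode such as 1600 has no leading zero and is read as decimal.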

Further Demystification

Ansible is based on Python 2. Python 2 has two valid octal notations: 0600 and 0o600 both mean octal 600, which equals decimal 384 (decimal 600, on the other hand, is octal 1130). Note that Python 3 only supports the 0o600 notation (to avoid this kind of stuff, I suspect).
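For reference, here is the same arithmetic in Python 3, where only the 0o prefix is accepted for octal literals. Nothing here is Ansible-specific; it uses only built-in functions.

```python
# An octal literal: 6*64 + 0*8 + 0 = 384 in decimal.
print(0o600)            # 384

# Parsing strings as base 8, with or without the 0o prefix.
print(int("600", 8))    # 384
print(int("0o600", 8))  # 384

# Going the other way: decimal 600 is a different number entirely.
print(oct(600))         # 0o1130
print(oct(384))         # 0o600
```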

Anyway, with the key=value syntax it seems the permissions number is interpreted as octal no matter what. But with the key: value syntax that I switched to, the Python notation for octal comes into play: an unquoted number is now considered decimal - unless it begins with a 0! So my 0600 is interpreted as octal now, my file gets the proper permissions, and all is well, it seems.

But what if I wanted to give my file the sticky bit, or setuid/setgid? This would make the first digit of the permissions a non-0 instead of a 0, and I am back with wrong permissions. Observe the difference between mode: 1600 and mode: "1600" and mode: 0o1600 below:

[tsr@webproxy ~]$ ls -l /usr/local/etc/nginx/bornfiber-ip-acl.conf # mode: 1600
---x--S--T  1 root  wheel  594 Nov  2 10:53 /usr/local/etc/nginx/bornfiber-ip-acl.conf
[tsr@webproxy ~]$ ls -l /usr/local/etc/nginx/bornfiber-ip-acl.conf # mode: "1600"
-rw------T  1 root  wheel  594 Nov  2 10:53 /usr/local/etc/nginx/bornfiber-ip-acl.conf
[tsr@webproxy ~]$ ls -l /usr/local/etc/nginx/bornfiber-ip-acl.conf # mode: 0o1600
-rw------T  1 root  wheel  594 Nov  2 10:53 /usr/local/etc/nginx/bornfiber-ip-acl.conf
[tsr@webproxy ~]$

I've since changed all my tasks to use quoted octal values when using numeric permissions. Sticking to either quoted or 0o prefixed values should ensure I don't run into these problems again.
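The three listings can be reproduced with a short Python sketch. It rests on assumptions consistent with the output above rather than on Ansible's actual code: an integer mode is used as-is, a string mode is parsed as octal, and the unquoted 0o1600 is assumed to reach the module as the string "0o1600". Both to_mode_bits and ls_string are hypothetical helper names.

```python
import stat

def to_mode_bits(value):
    """Hypothetical helper: integers are used as-is, strings are read as octal."""
    if isinstance(value, int):
        return value
    return int(value, 8)

def ls_string(value):
    """Render the resulting permission bits the way ls shows a regular file."""
    return stat.filemode(stat.S_IFREG | to_mode_bits(value))

print(ls_string(1600))      # ---x--S--T  (decimal 1600 is octal 3100)
print(ls_string("1600"))    # -rw------T
print(ls_string("0o1600"))  # -rw------T
```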

Conclusion

The advice in the Ansible documentation is wrong, or misleading at best. The advice should be something like: "Note that unix file permissions are octal, and should either be quoted or prefixed with 0o to ensure correct interpretation. To set mode 644 use mode: "644" or mode: 0o644." This would have saved me some work, but on the other hand, it is always nice to refresh basic concepts like unix permissions and non-base10 numbers :).

After working on this I wanted to open an issue on GitHub, but I found this, which discusses the problem and will probably find a solution sooner or later.
