diff --git a/Read-Only-FS/README.md b/Read-Only-FS/README.md new file mode 100644 index 0000000..12cad1a --- /dev/null +++ b/Read-Only-FS/README.md @@ -0,0 +1,249 @@ +# Using a read-only file system in OpenShift containers + +## Context + +The [CIS Security Best Practices](https://www.cisecurity.org/benchmark/docker/) +mandate the use of a read-only file system in containers. + +This guide explains how to use a read-only root file system in OpenShift. + +## Default configuration + +By default, when a container is created in OpenShift, the root filesystem is +mounted as read-write. + +On this root filesystem, OpenShift applies additional security restrictions: + +- Linux file system [DAC](https://en.wikipedia.org/wiki/Discretionary_access_control) (Unix permissions) +- SELinux [MAC](https://en.wikipedia.org/wiki/Mandatory_access_control) +- Non-privileged, random UID for the running process + +You can easily verify that the root file system is mounted read-write using the +following procedure. + +First, create a dummy container based on the RHEL 7.5 image: + +```sh +oc new-app --name rootfs registry.access.redhat.com/rhel:7.5 +oc patch dc rootfs --type=json -p '[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/bin/sh", "-c", "while :; do sleep 1; done" ]}]' +``` + +Watch the container being created: + +```sh +oc get pods -w -l app=rootfs +``` + +Once created, check the container root filesystem mount: + +```sh +oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount |head -n 1 +``` + +You should get something like this: + +```raw +overlay on / type overlay 
(rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c4,c27",lowerdir=/var/lib/docker/overlay2/l/DOKXVDUEKEI37AXQ7HKYX54UGF:/var/lib/docker/overlay2/l/F6L6WHTZAHKPX722FPFCSPJR7Z:/var/lib/docker/overlay2/l/AZIFQJPO3T2VMKKXOLDVL4Y7RI,upperdir=/var/lib/docker/overlay2/2b1a55df9f0b3d935d2c92ea324d79ccfac956a1be469f82662f8305419c615a/diff,workdir=/var/lib/docker/overlay2/2b1a55df9f0b3d935d2c92ea324d79ccfac956a1be469f82662f8305419c615a/work) +``` + +The root file system is mounted as read-write. + +By default, OpenShift uses the `restricted` Security Context Constraints (SCC): + +```raw +$ oc describe scc restricted +Name: restricted +Priority: +Access: + Users: + Groups: system:authenticated +Settings: + Allow Privileged: false + Default Add Capabilities: + Required Drop Capabilities: KILL,MKNOD,SETUID,SETGID + Allowed Capabilities: + Allowed Seccomp Profiles: + Allowed Volume Types: configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret + Allowed Flexvolumes: + Allow Host Network: false + Allow Host Ports: false + Allow Host PID: false + Allow Host IPC: false + Read Only Root Filesystem: false + Run As User Strategy: MustRunAsRange + UID: + UID Range Min: + UID Range Max: + SELinux Context Strategy: MustRunAs + User: + Role: + Type: + Level: + FSGroup Strategy: MustRunAs + Ranges: + Supplemental Groups Strategy: RunAsAny + Ranges: +``` + +As you can see, the `Read Only Root Filesystem` option is **NOT enabled** in +this SCC. + +This means the user can write only where the Unix permissions allow it. 
+ +This can easily be verified by getting a terminal on the running container: + +```sh +oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) +``` + +First, try to find a place on the filesystem that is writable by the current user: + +```sh +find / -xdev -writable -ls +``` + +You should get a similar result: + +```raw +286074170 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/systemd-logind.service -> /dev/null +286074171 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/getty.target -> /dev/null +286074172 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/console-getty.service -> /dev/null +286074173 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/sys-fs-fuse-connections.mount -> /dev/null +320708631 0 drwxrwxrwt 2 root root 6 Jul 14 14:24 /var/tmp +278398210 0 drwxrwxrwt 7 root root 132 Jul 14 14:24 /tmp +303803069 0 lrwxrwxrwx 1 root root 10 Jul 14 14:23 /usr/tmp -> ../var/tmp +``` + +So, the only writable files and directories on a RHEL7 image are: + +- some files in `/etc/systemd/system/` **because they are symlinks to `/dev/null`** +- `/tmp` and `/var/tmp` which are needed by most applications to store their temporary files +- `/usr/tmp` which is a symlink to `/var/tmp` + +As you can see, the default RHEL 7.5 image comes with a sensible set of Unix permissions +and does not require a read-only root file system. 
+ +You can confirm this by creating a file in `/tmp`: + +```sh +touch /tmp/foo +``` + +And by checking that you cannot create a file elsewhere: + +```sh +$ touch /bar +touch: cannot touch '/bar': Permission denied +``` + +## Mounting the Root FS read-only + +At this point, if you still want to mount the root filesystem as read-only, you would need to: + +- create a dedicated [Security Context Constraint (SCC)](https://docs.openshift.com/container-platform/3.9/admin_guide/manage_scc.html) +- create a [Service Account](https://docs.openshift.com/container-platform/3.9/dev_guide/service_accounts.html) +- [assign the SCC to the Service Account](https://blog.openshift.com/understanding-service-accounts-sccs/) +- [assign this Service Account to your Deployment](https://blog.openshift.com/understanding-service-accounts-sccs/) + +Create an SCC named [`readonly-fs`](read-only-scc.yaml) that mounts the root file system as read-only: + +```sh +oc create -f read-only-scc.yaml +``` + +Create a service account: + +```sh +oc create sa readonly +``` + +Assign the `readonly-fs` SCC to the `readonly` service account: + +```sh +oc adm policy add-scc-to-user readonly-fs -z readonly +``` + +Assign the `readonly` service account to the `rootfs` deployment: + +```sh +oc patch dc/rootfs --patch '{"spec":{"template":{"spec":{"serviceAccountName": "readonly"}}}}' +``` + +Verify that the root file system is mounted read-only: + +```sh +$ oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount |head -n 1 +overlay on / type overlay (ro,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c4,c27",lowerdir=/var/lib/docker/overlay2/l/6HXYZ6ASQAXKMULESF4PBCMOVC:/var/lib/docker/overlay2/l/F6L6WHTZAHKPX722FPFCSPJR7Z:/var/lib/docker/overlay2/l/AZIFQJPO3T2VMKKXOLDVL4Y7RI,upperdir=/var/lib/docker/overlay2/0ceff5b5dae1a00ee14086e6bd0ef5db1600f5f1f2de192255917ceb09ebd31d/diff,workdir=/var/lib/docker/overlay2/0ceff5b5dae1a00ee14086e6bd0ef5db1600f5f1f2de192255917ceb09ebd31d/work) +``` + +If you 
re-run the `find / -xdev -writable -ls` command, you should get a different result: + +- the files in `/etc/systemd/system/` are still symlinked to `/dev/null` +- but `/tmp` and `/var/tmp` are not writable anymore + +If you try to create a file in `/tmp`, you should get an explicit error message: + +```raw +$ touch /tmp/foo +touch: cannot touch '/tmp/foo': Read-only file system +``` + +But since most applications require `/tmp` and `/var/tmp` to be writable, +you would need to mount a writable `emptyDir` volume at those locations: + +```sh +oc volume dc/rootfs --add --overwrite --name tmp --mount-path /tmp --type emptyDir +oc volume dc/rootfs --add --overwrite --name vartmp --mount-path /var/tmp --type emptyDir +``` + +If you re-run the `touch /tmp/foo` command, it should now succeed while the +rest of the root file system is still read-only. + +## Why is the root file system not mounted read-only by default? + +Even if mounting the root filesystem as read-only can be seen as a good practice, +there are also good reasons not to do so. + +Several reasons are tied to the current state of container images, namely those found on +Docker Hub: + +- most Docker images found on Docker Hub cannot be run with a read-only root file system +- most Docker images found on Docker Hub run as root, so a read-only root file system is a + way to mitigate the fact that root can write anywhere in the container. But since + OpenShift runs all containers by default with a randomized, non-privileged UID, this + mitigation is not needed anymore. + +There are also other reasons related to maintenance and ease of use: + +- If you plan to mount the root file system as read-only, the container cannot be + handled anymore as a black box. You need to understand the requirements of the + application and mount writable volumes at the required locations. 
+- When the application is shipped with a sample data set (a pre-provisioned SQLite + database for instance), you will need to define an init container to provision + this sample data set, which is another component to craft, maintain, support, etc. +- Also, when the software vendor or the development team changes the layout of the + application, with a read-only root file system you would need to re-engineer the + deployment, whereas with the default OpenShift configuration, the + software vendor or development team would just have to update the Unix permissions + of the container image and the deployment of the new version could be triggered + automatically. + +## Conclusion + +In conclusion, it is definitely possible to use read-only root filesystems in +OpenShift. For very specific environments where the risks are high, you might consider +this option. + +The rationale around the read-only root file system from the [CIS Security Best Practices](https://www.cisecurity.org/benchmark/docker/) is: + +- This leads to an immutable infrastructure +- Since the container instance cannot be written to, there is no need to audit instance divergence +- Reduced security attack vectors since the instance cannot be tampered with or written to +- Ability to use a purely volume based backup without backing up anything from the instance + +While I definitely agree with the rationale, I also think the read-only root file system +has an impact on the way containers are managed and the perceived security gain must be weighed +against the cost required to implement, maintain and support this configuration. + +Also, as you can see in this example, the default OpenShift configuration provides +other mechanisms to reach the same goals. 
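The two `mount | head -n 1` checks used in this guide can be folded into a small helper that reports only the mount mode of `/`. This is a minimal sketch, not part of `oc` or OpenShift: `rootfs_mode` is a hypothetical name, and the function assumes `mount` output in the `<source> on <target> type <fstype> (<options>)` form shown above, with no spaces inside the option list.

```sh
# rootfs_mode: hypothetical helper, reads `mount` output on stdin and
# prints "ro" or "rw" for the entry mounted on "/".
rootfs_mode() {
  awk '$2 == "on" && $3 == "/" {
    # $6 looks like "(rw,relatime,...)": drop the parentheses, split on commas
    n = split(substr($6, 2, length($6) - 2), opt, ",")
    for (i = 1; i <= n; i++)
      if (opt[i] == "ro" || opt[i] == "rw") { print opt[i]; exit }
  }'
}
```

For example, `oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount | rootfs_mode` should print `rw` with the default `restricted` SCC and `ro` once the `readonly` service account is in use.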
diff --git a/Read-Only-FS/read-only-scc.yaml b/Read-Only-FS/read-only-scc.yaml new file mode 100644 index 0000000..0bc6567 --- /dev/null +++ b/Read-Only-FS/read-only-scc.yaml @@ -0,0 +1,40 @@ +apiVersion: security.openshift.io/v1 +kind: SecurityContextConstraints +metadata: + annotations: + kubernetes.io/description: restricted SCC + read-only FS + name: readonly-fs +allowHostDirVolumePlugin: false +allowHostIPC: false +allowHostNetwork: false +allowHostPID: false +allowHostPorts: false +allowPrivilegedContainer: false +allowedCapabilities: null +allowedFlexVolumes: null +defaultAddCapabilities: null +fsGroup: + type: MustRunAs +groups: +- system:authenticated +priority: null +readOnlyRootFilesystem: true +requiredDropCapabilities: +- KILL +- MKNOD +- SETUID +- SETGID +runAsUser: + type: MustRunAsRange +seLinuxContext: + type: MustRunAs +supplementalGroups: + type: RunAsAny +users: [] +volumes: +- configMap +- downwardAPI +- emptyDir +- persistentVolumeClaim +- projected +- secret
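Before loading this manifest with `oc create -f`, it can be useful to confirm that it really enables the one setting that distinguishes it from `restricted`. The sketch below is an assumption-laden illustration: `scc_sets_readonly_rootfs` is a hypothetical helper name, and it relies on `readOnlyRootFilesystem` being a top-level key written on a single line, as it is in the file above.

```sh
# scc_sets_readonly_rootfs: hypothetical helper, succeeds (exit 0) when the
# given manifest contains a top-level "readOnlyRootFilesystem: true" line.
scc_sets_readonly_rootfs() {
  grep -Eq '^readOnlyRootFilesystem:[[:space:]]*true[[:space:]]*$' "$1"
}
```

A typical use would be `scc_sets_readonly_rootfs read-only-scc.yaml && oc create -f read-only-scc.yaml`. This is a plain text match, not a YAML parser, so it will not recognize other valid spellings of the same setting.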