2 changed files with 289 additions and 0 deletions
@ -0,0 +1,249 @@ |
|||||
|
# Using read-only File System in OpenShift containers |
||||
|
|
||||
|
## Context |
||||
|
|
||||
|
The [CIS Security Best Practices](https://www.cisecurity.org/benchmark/docker/)
mandate the use of a read-only file system in containers.
||||
|
|
||||
|
This guide explains how to use a read-only file system in OpenShift.
||||
|
|
||||
|
## Default configuration |
||||
|
|
||||
|
By default, when a container is created in OpenShift, the root filesystem is |
||||
|
mounted as read-write. |
||||
|
|
||||
|
On this root filesystem, OpenShift applies additional security restrictions: |
||||
|
|
||||
|
- Linux file system [DAC](https://en.wikipedia.org/wiki/Discretionary_access_control) (Unix permissions)
||||
|
- SELinux [MAC](https://en.wikipedia.org/wiki/Mandatory_access_control) |
||||
|
- Non-privileged, random UID for the running process |
||||
|
|
||||
|
You can verify that the root file system is mounted read-write using the
following procedure.
||||
|
|
||||
|
First, create a dummy container based on the RHEL 7.5 image: |
||||
|
|
||||
|
```sh |
||||
|
oc new-app --name rootfs registry.access.redhat.com/rhel:7.5 |
||||
|
oc patch dc rootfs --type=json -p '[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/bin/sh", "-c", "while :; do sleep 1; done" ]}]' |
||||
|
``` |
||||
|
|
||||
|
Watch the container being created: |
||||
|
|
||||
|
```sh |
||||
|
oc get pods -w -l app=rootfs |
||||
|
``` |
||||
|
|
||||
|
Once created, check the container root filesystem mount: |
||||
|
|
||||
|
```sh |
||||
|
oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount |head -n 1 |
||||
|
``` |
||||
|
|
||||
|
You should get something like this: |
||||
|
|
||||
|
```raw |
||||
|
overlay on / type overlay (rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c4,c27",lowerdir=/var/lib/docker/overlay2/l/DOKXVDUEKEI37AXQ7HKYX54UGF:/var/lib/docker/overlay2/l/F6L6WHTZAHKPX722FPFCSPJR7Z:/var/lib/docker/overlay2/l/AZIFQJPO3T2VMKKXOLDVL4Y7RI,upperdir=/var/lib/docker/overlay2/2b1a55df9f0b3d935d2c92ea324d79ccfac956a1be469f82662f8305419c615a/diff,workdir=/var/lib/docker/overlay2/2b1a55df9f0b3d935d2c92ea324d79ccfac956a1be469f82662f8305419c615a/work) |
||||
|
``` |
||||
|
|
||||
|
The root file system is mounted as read-write. |
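If this check is repeated often, the mount flag can be pulled out of that first line. The following is a minimal sketch, not part of the original procedure: `mount_mode` is a hypothetical helper name, and it assumes the `mount` output format shown above, where the options list in parentheses starts with `rw` or `ro`.

```sh
# Hypothetical helper: print "rw" or "ro" for the first mount line on stdin.
# Assumes the parenthesised options list starts with the rw/ro flag,
# as in the overlay line shown above.
mount_mode() {
  head -n 1 | sed -n 's/^.*(\(r[ow]\)[,)].*$/\1/p'
}
```

It can be fed from the same command as before, for example `oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount | mount_mode`.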
||||
|
|
||||
|
By default, OpenShift uses the `restricted` Security Context Constraints (SCC):
||||
|
|
||||
|
```raw
$ oc describe scc restricted
Name:                           restricted
Priority:                       <none>
Access:
  Users:                        <none>
  Groups:                       system:authenticated
Settings:
  Allow Privileged:             false
  Default Add Capabilities:     <none>
  Required Drop Capabilities:   KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:         <none>
  Allowed Seccomp Profiles:     <none>
  Allowed Volume Types:         configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:          <all>
  Allow Host Network:           false
  Allow Host Ports:             false
  Allow Host PID:               false
  Allow Host IPC:               false
  Read Only Root Filesystem:    false
  Run As User Strategy: MustRunAsRange
    UID:                        <none>
    UID Range Min:              <none>
    UID Range Max:              <none>
  SELinux Context Strategy: MustRunAs
    User:                       <none>
    Role:                       <none>
    Type:                       <none>
    Level:                      <none>
  FSGroup Strategy: MustRunAs
    Ranges:                     <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                     <none>
```
||||
|
|
||||
|
As you can see, the `Read Only Root Filesystem` option is **NOT enabled** in |
||||
|
this SCC. |
||||
|
|
||||
|
This means the user can write only where the Unix permissions allow it.
||||
|
|
||||
|
This can easily be verified by getting a terminal on the running container: |
||||
|
|
||||
|
```sh |
||||
|
oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) |
||||
|
``` |
||||
|
|
||||
|
First, try to find a place on the filesystem that is writeable by the current user: |
||||
|
|
||||
|
```sh |
||||
|
find / -xdev -writable -ls |
||||
|
``` |
||||
|
|
||||
|
You should get a similar result: |
||||
|
|
||||
|
```raw |
||||
|
286074170 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/systemd-logind.service -> /dev/null |
||||
|
286074171 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/getty.target -> /dev/null |
||||
|
286074172 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/console-getty.service -> /dev/null |
||||
|
286074173 0 lrwxrwxrwx 1 root root 9 Jul 14 14:24 /etc/systemd/system/sys-fs-fuse-connections.mount -> /dev/null |
||||
|
320708631 0 drwxrwxrwt 2 root root 6 Jul 14 14:24 /var/tmp |
||||
|
278398210 0 drwxrwxrwt 7 root root 132 Jul 14 14:24 /tmp |
||||
|
303803069 0 lrwxrwxrwx 1 root root 10 Jul 14 14:23 /usr/tmp -> ../var/tmp |
||||
|
``` |
||||
|
|
||||
|
So, the only writeable files and directories on a RHEL7 image are: |
||||
|
|
||||
|
- some files in `/etc/systemd/system/` **because they are symlinks to `/dev/null`**
||||
|
- `/tmp` and `/var/tmp` which are needed by most applications to store their temporary files |
||||
|
- `/usr/tmp` which is a symlink to `/var/tmp` |
||||
|
|
||||
|
As you can see, the default RHEL 7.5 image comes with a sensible set of Unix permissions
and does not require a read-only root file system.
||||
|
|
||||
|
You can confirm this by creating a file in `/tmp`:
||||
|
|
||||
|
```sh |
||||
|
touch /tmp/foo |
||||
|
``` |
||||
|
|
||||
|
Creating a file anywhere else is denied:
||||
|
|
||||
|
```sh |
||||
|
$ touch /bar |
||||
|
touch: cannot touch '/bar': Permission denied |
||||
|
``` |
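The two manual checks above can be combined into one probe that is run from the container's terminal. This is a minimal sketch under stated assumptions: `probe_writable` is a hypothetical helper name, and it only tests whether the current user can create a file, which is what `touch` did in the steps above.

```sh
# Hypothetical helper: for each directory given, report whether the
# current user can create a file in it. The probe file is removed again.
probe_writable() {
  for dir in "$@"; do
    probe="$dir/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
      rm -f "$probe"
      echo "$dir: writable"
    else
      echo "$dir: not writable"
    fi
  done
}
```

With the default configuration described above, `probe_writable /tmp /var/tmp /` should report the first two directories as writable and `/` as not writable.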
||||
|
|
||||
|
## Mounting the Root FS read-only |
||||
|
|
||||
|
At this point, if you still want to mount the root filesystem as read-only, you would need to: |
||||
|
|
||||
|
- create a dedicated [Security Context Constraint (SCC)](https://docs.openshift.com/container-platform/3.9/admin_guide/manage_scc.html) |
||||
|
- create a [Service Account](https://docs.openshift.com/container-platform/3.9/dev_guide/service_accounts.html) |
||||
|
- [assign the SCC to the Service Account](https://blog.openshift.com/understanding-service-accounts-sccs/)
- [assign this Service Account to your Deployment](https://blog.openshift.com/understanding-service-accounts-sccs/)
||||
|
|
||||
|
Create an SCC named [`readonly-fs`](read-only-scc.yaml) that requires the root file system to be mounted read-only:
||||
|
|
||||
|
```sh |
||||
|
oc create -f read-only-scc.yaml |
||||
|
``` |
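Before going further, it can be worth confirming that the SCC was created with the expected setting. The sketch below is an illustration rather than part of the original procedure: `scc_readonly_flag` is a hypothetical helper name, and it reads the top-level `readOnlyRootFilesystem` field of the SCC through a JSONPath query.

```sh
# Hypothetical helper: print the readOnlyRootFilesystem field of the
# SCC whose name is given as the first argument.
scc_readonly_flag() {
  oc get scc "$1" -o jsonpath='{.readOnlyRootFilesystem}'
}
```

For the SCC created above, `scc_readonly_flag readonly-fs` should print `true`.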
||||
|
|
||||
|
Create a service account: |
||||
|
|
||||
|
```sh |
||||
|
oc create sa readonly |
||||
|
``` |
||||
|
|
||||
|
Assign the `readonly-fs` SCC to the `readonly` service account:
||||
|
|
||||
|
```sh |
||||
|
oc adm policy add-scc-to-user readonly-fs -z readonly |
||||
|
``` |
||||
|
|
||||
|
Assign the `readonly` service account to the `rootfs` deployment:
||||
|
|
||||
|
```sh |
||||
|
oc patch dc/rootfs --patch '{"spec":{"template":{"spec":{"serviceAccountName": "readonly"}}}}' |
||||
|
``` |
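Once the new pod is running, you can check which SCC admitted it: OpenShift records the SCC name in the pod's `openshift.io/scc` annotation. The helper below is a minimal sketch, with `pod_scc` being a hypothetical name. It expects the pod in `pod/<name>` form, as printed by `oc get pods -o name`.

```sh
# Hypothetical helper: print the SCC recorded on a pod at admission time
# (the openshift.io/scc annotation). Argument: the pod in "pod/<name>" form.
pod_scc() {
  oc get "$1" -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}
```

For example, `pod_scc "$(oc get pods -l app=rootfs -o name|tail -n 1)"` should print `readonly-fs` if the service account assignment took effect. If it still prints `restricted`, the pod was probably created before the patch was applied.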
||||
|
|
||||
|
Verify that the root file system is mounted read-only: |
||||
|
|
||||
|
```sh |
||||
|
$ oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) mount |head -n 1 |
||||
|
overlay on / type overlay (ro,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c4,c27",lowerdir=/var/lib/docker/overlay2/l/6HXYZ6ASQAXKMULESF4PBCMOVC:/var/lib/docker/overlay2/l/F6L6WHTZAHKPX722FPFCSPJR7Z:/var/lib/docker/overlay2/l/AZIFQJPO3T2VMKKXOLDVL4Y7RI,upperdir=/var/lib/docker/overlay2/0ceff5b5dae1a00ee14086e6bd0ef5db1600f5f1f2de192255917ceb09ebd31d/diff,workdir=/var/lib/docker/overlay2/0ceff5b5dae1a00ee14086e6bd0ef5db1600f5f1f2de192255917ceb09ebd31d/work) |
||||
|
``` |
||||
|
|
||||
|
If you re-run the `find / -xdev -writable -ls` command, you should get a different result: |
||||
|
|
||||
|
- the files in `/etc/systemd/system/` are still symlinked to `/dev/null` |
||||
|
- but `/tmp` and `/var/tmp` are no longer writable
||||
|
|
||||
|
If you try to create a file in `/tmp`, you should get an explicit error message: |
||||
|
|
||||
|
```raw |
||||
|
$ touch /tmp/foo |
||||
|
touch: cannot touch '/tmp/foo': Read-only file system |
||||
|
``` |
||||
|
|
||||
|
But since most applications require `/tmp` and `/var/tmp` to be writable,
you would need to mount a writable `emptyDir` volume at each of those locations
(an `emptyDir` is backed by node storage unless its medium is set to `Memory`, in which case it is a `tmpfs`):
||||
|
|
||||
|
```sh |
||||
|
oc volume dc/rootfs --add --overwrite --name tmp --mount-path /tmp --type emptyDir |
||||
|
oc volume dc/rootfs --add --overwrite --name vartmp --mount-path /var/tmp --type emptyDir |
||||
|
``` |
||||
|
|
||||
|
If you re-run the `touch /tmp/foo` command, it should now succeed while the |
||||
|
rest of the root file system is still read-only. |
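To see at a glance which mount points remain writable after adding the volumes, the mount table can be filtered on its options column. This is a sketch only: `rw_mount_points` is a hypothetical helper name, and it assumes input in the `/proc/mounts` format (device, mount point, type, options, then two numeric fields).

```sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# the mount points whose option list contains the "rw" flag.
rw_mount_points() {
  awk '$4 ~ /(^|,)rw(,|$)/ { print $2 }'
}
```

Running `oc rsh $(oc get pods -l app=rootfs -o name|tail -n 1) cat /proc/mounts | rw_mount_points` should list `/tmp` and `/var/tmp` but not `/`. Other mount points set up by the container runtime may also appear in the list.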
||||
|
|
||||
|
## Why is the root file system not mounted read-only by default?
||||
|
|
||||
|
Even though mounting the root filesystem read-only can be seen as a good practice,
there are also good reasons not to do so.
||||
|
|
||||
|
Several reasons are tied to the current state of container images, namely those found on |
||||
|
Docker Hub: |
||||
|
|
||||
|
- most Docker images found on Docker Hub cannot run with a read-only root file system
- most Docker images found on Docker Hub run as root, so a read-only root file system is a
  way to mitigate the fact that root can write anywhere in the container. But since
  OpenShift by default runs all containers under a randomized, non-privileged user ID, this
  mitigation is no longer needed.
||||
|
|
||||
|
There are also other reasons related to maintenance and ease of use: |
||||
|
|
||||
|
- If you plan to mount the root file system as read-only, the container can no longer be
  handled as a black box. You need to understand the requirements of the
  application and mount writable volumes at the required locations.
||||
|
- When the application is shipped with a sample data set (a pre-provisioned SQLite |
||||
|
database for instance), you will need to define an init container to provision |
||||
|
this sample data set, which is another component to craft, maintain, support, etc. |
||||
|
- Also, when the software vendor or the development team changes the layout of the
  application, a read-only root file system forces you to re-engineer the
  deployment, whereas with the default OpenShift configuration the
  software vendor or development team only has to update the Unix permissions
  of the container image, and the deployment of the new version can be triggered
  automatically.
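To give an idea of what the init container mentioned above has to do, here is a minimal sketch of a seeding step. Everything in it is an assumption made for illustration: the `seed_data` name, the `.seeded` marker file, and the example paths are hypothetical and do not come from any particular image.

```sh
# Hypothetical seeding step for an init container: copy the sample data
# shipped in the image into a writable volume, but only on first start.
# A marker file records that the copy already happened.
seed_data() {
  src=$1
  dst=$2
  if [ ! -e "$dst/.seeded" ]; then
    cp -R "$src/." "$dst/" && touch "$dst/.seeded"
  fi
}
```

An init container could call it as `seed_data /opt/app/sample-data /data`, where `/data` is a writable volume shared with the application container. The marker file keeps later restarts from overwriting data the application has modified.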
||||
|
|
||||
|
## Conclusion |
||||
|
|
||||
|
In conclusion, it is definitely possible to use read-only root filesystems in
OpenShift. For very specific environments where the risks are high, you might consider
this option.
||||
|
|
||||
|
The rationale around the read-only root file system from the [CIS Security Best Practices](https://www.cisecurity.org/benchmark/docker/) is: |
||||
|
|
||||
|
- This leads to an immutable infrastructure |
||||
|
- Since the container instance cannot be written to, there is no need to audit instance divergence |
||||
|
- Reduced security attack vectors since the instance cannot be tampered with or written to |
||||
|
- Ability to use a purely volume based backup without backing up anything from the instance |
||||
|
|
||||
|
While I definitely agree with the rationale, I also think the read-only root file system
has an impact on the way containers are managed, and the perceived security gain must be weighed
against the cost of implementing, maintaining and supporting this configuration.
||||
|
|
||||
|
Also, as you can see in this example, the default OpenShift configuration provides |
||||
|
other mechanisms to reach the same goals. |
||||
@ -0,0 +1,40 @@ |
|||||
|
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: restricted SCC + read-only FS
  name: readonly-fs
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
allowedCapabilities: null
allowedFlexVolumes: null
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
priority: null
readOnlyRootFilesystem: true
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
||||