I was looking at a Kubernetes issue the other day and it led me down a kind of interesting rabbit hole, so I thought it’d be worth sharing as I learned a couple of things.
Background
The issue is to do with the interaction of `allowPrivilegeEscalation` and added capabilities in a Kubernetes workload specification. The reporter noted that if you add `CAP_SYS_ADMIN` to a manifest while setting `allowPrivilegeEscalation: false`, the deployment is blocked, but adding other capabilities with the same setting is not.
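To make that concrete, here's a minimal sketch of the kind of manifest the issue describes (pod and container names are my own, not from the issue):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cap-sys-admin-test   # hypothetical name
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - CAP_SYS_ADMIN   # per the issue, this combination is blocked
```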
`allowPrivilegeEscalation` is kind of an interesting flag, as it doesn't really do what the name suggests. In reality, what it does is set a specific Linux kernel flag, no_new_privs, designed to stop a process from gaining more privileges than it started with, whereas the name implies a much wider-ranging set of blocks. My colleague Christophe has a detailed post looking at this misunderstanding.
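If you want to see that kernel flag directly, one way (a quick sketch; the pod name is mine) is a pod that reads its own process status:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nnp-check   # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: check
    image: busybox
    # /proc/self/status reports the no_new_privs bit; expect "NoNewPrivs: 1"
    # when allowPrivilegeEscalation is false
    command: ["grep", "NoNewPrivs", "/proc/self/status"]
    securityContext:
      allowPrivilegeEscalation: false
```

`kubectl logs nnp-check` should then show `NoNewPrivs: 1`.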
However, what was specifically interesting to me was that when I tried out a quick manifest to re-create the problem, I wasn't able to: the pod I created was admitted ok. After a bit of looking I realised that when adding the capability I'd used the name `SYS_ADMIN` instead of `CAP_SYS_ADMIN`, and it had worked fine. Weird!
Exploring what’s going on
I decided to put together a few quick test cases to understand what's happening (manifests are here, and a sketch of one of them follows the list below).
- `capsysadminpod.yaml` - This pod adds `CAP_SYS_ADMIN` to the capabilities list
- `sysadminpod.yaml` - This pod adds `SYS_ADMIN` to the capabilities list
- `dontallowprivesccapsysadminpod.yaml` - This has `allowPrivilegeEscalation: false` set and adds `CAP_SYS_ADMIN` to the capabilities list
- `dontallowprivescsysadminpod.yaml` - This has `allowPrivilegeEscalation: false` set and adds `SYS_ADMIN` to the capabilities list
- `invalidcap.yaml` - This pod has an invalid capability (`LOREM`) set.
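As a rough sketch, reconstructed from the descriptions above rather than copied from the exact files (so the pod and container names are my guesses), `dontallowprivescsysadminpod.yaml` looks something like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dontallowprivescsysadmin   # hypothetical name
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - SYS_ADMIN   # note: no CAP_ prefix this time
```

The other manifests differ only in the capability spelling and whether `allowPrivilegeEscalation: false` is present.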
Trying these manifests out in a kind cluster (using containerd as the CRI) showed a couple of things:
- Adding `CAP_SYS_ADMIN` worked, but no capability was added.
- Adding `SYS_ADMIN` worked, and the capability was added.
- Setting `allowPrivilegeEscalation: false` and adding `CAP_SYS_ADMIN` was blocked.
- Setting `allowPrivilegeEscalation: false` and adding `SYS_ADMIN` was allowed, and the capability was added.
- Setting an invalid capability worked ok, but no capability was added.
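If you want to verify these results yourself, one approach (a sketch, assuming a busybox image; other tools like capsh inside the container would also work) is a pod that prints its own capability bitmasks:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capcheck   # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: check
    image: busybox
    # Prints the CapInh/CapPrm/CapEff/CapBnd/CapAmb bitmasks; compare runs
    # with and without the added capability to see whether it took effect
    command: ["grep", "Cap", "/proc/self/status"]
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
```

You can decode the hex masks from `kubectl logs capcheck` with `capsh --decode=<mask>` on a Linux host.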
So, a couple of lessons from that. Kubernetes does not validate the capabilities you add, and no error is generated if you add an invalid one; it just does nothing. Also, there's a redundant block in Kubernetes at the moment, where a combination that does nothing (`allowPrivilegeEscalation: false` plus `CAP_SYS_ADMIN`) is blocked, while a combination that actually adds the capability (`allowPrivilegeEscalation: false` plus `SYS_ADMIN`) is allowed ok…
Doing some more searching on GitHub turned up more history on this. Back in 2021, there was a PR to try and fix this which didn't get merged, and there's another issue from 2023 on it as well.
From that, one thing that caught my eye was that apparently CRI-O handles this differently from containerd, which I thought was interesting.
Comparing CRI-O - with iximiuz labs
I wanted to test out this difference in behaviour, but unfortunately I don't have a CRI-O backed cluster available in my test lab. Fortunately, iximiuz labs has an awesome Kubernetes playground where you can specify various combinations of CRI and CNI to test out different scenarios, which is nice!
Testing out a cluster there with CRI-O confirmed that things are handled rather differently:
- Adding `CAP_SYS_ADMIN` worked, and the capability was added.
- Adding `SYS_ADMIN` worked, and the capability was added.
- Setting `allowPrivilegeEscalation: false` and adding `CAP_SYS_ADMIN` was blocked.
- Setting `allowPrivilegeEscalation: false` and adding `SYS_ADMIN` was allowed, and the capability was added.
- Setting an invalid capability resulted in an error on container creation (CRI-O prepended `CAP_` to the capability name and then threw an error stopping pod creation, as the result was invalid).
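For completeness, the invalid-capability manifest is just a pod with a made-up name in the add list; roughly (again a reconstruction, not the exact file):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: invalidcap   # hypothetical name
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        add:
        - LOREM   # not a real Linux capability
```

On the containerd-backed cluster this pod ran with no extra capability; on CRI-O, container creation failed.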
So we can see that CRI-O handles things a bit differently, allowing both `SYS_ADMIN` and `CAP_SYS_ADMIN` to work and erroring out on invalid capabilities!
Conclusion
It's easy to assume that Kubernetes clusters will all work the same way, so that we can freely move workloads from one to another regardless of distribution. This case illustrates one way that assumption might not hold up, with some surprising results!