Manifest Test Assessment

Models tested: Claude Opus 4.8, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.5, GPT 5.4, Gemini 3 Flash, MiniMax M2.5, MiniMax M2.7, DeepSeek V3.2, Qwen 3.6 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.6, Qwen3.6-35b-a3b (Local, abbreviated Qwen-35b below), Gemma 4 31B (Local)

Original date: 2026-03-09 | Claude Opus 4.6 added: 2026-03-25 | MiniMax M2.7 added: 2026-03-28 | Claude Opus 4.7 added: 2026-04-20 | Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24 | GPT 5.5 added: 2026-04-25 | Kimi K2.6 added: 2026-04-26 | Qwen3.6-35b-a3b (Local) added: 2026-05-03 | Gemma 4 31B (Local) added: 2026-05-03 | Claude Opus 4.8 added: 2026-05-31

Cluster: Kind (local) for deployability testing

Scoring Criteria

Per manifest_tests/Scoring_Criteria.md:

  1. Usability — Do the Deployment and Service objects actually deploy and work? (Malformed YAML = serious fault; settings that prevent functioning = fault)
  2. Security — How well do the manifests implement Pod Security Standards while remaining functional?

Only Deployment and Service objects are scored. Extra objects (HPA, PDB, NetworkPolicy, etc.) are noted but not scored.


Deployability Results

Each model’s Deployment + Service were extracted and applied to a Kind cluster. Results:

Scenario | Claude Opus 4.8 | Claude Opus 4.7 | Claude Opus 4.6 | Claude Sonnet 4.6 | GPT 5.5 | GPT 5.4 | Gemini 3 Flash | MiniMax M2.5 | MiniMax M2.7 | DeepSeek V3.2 | Qwen 3.6 Plus | DeepSeek V4 Pro | DeepSeek V4 Flash | Kimi K2.6 | Qwen-35b (Local) | Gemma 4 31B (Local)
Basic | PASS | PASS | PASS | FAIL | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS | FAIL | PASS | PASS
Production | PASS | PASS | PASS | FAIL | PASS | FAIL | PASS | FAIL | FAIL | PASS | FAIL | FAIL | FAIL | PASS | FAIL | PASS
Hardened | PASS | PASS | PASS | PASS | PASS | FAIL | PASS | PASS | PASS | PASS | PASS | PASS | PASS | FAIL | PASS | FAIL
Pass Rate | 3/3 | 3/3 | 3/3 | 1/3 | 3/3 | 1/3 | 3/3 | 2/3 | 2/3 | 3/3 | 2/3 | 2/3 | 2/3 | 1/3 | 2/3 | 2/3
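The PASS/FAIL check above can be approximated with a small script. The following is a minimal Python sketch, not the harness actually used for these results: it assumes `kubectl` is on the PATH and pointed at the Kind cluster (for example one created with `kind create cluster`), and the names `deploys` and `run_command` are hypothetical. A scenario counts as PASS only if `kubectl apply` succeeds and `kubectl rollout status` sees the Deployment become available before the timeout, which covers CrashLoopBackOff, failing probes, and containers the kubelet refuses to start.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command, letting its output go to the terminal; return the exit code."""
    return subprocess.run(list(argv), check=False).returncode


def deploys(manifest_path: str, deployment: str, namespace: str = "default",
            timeout: str = "120s", run: Runner = run_command) -> bool:
    """Return True (PASS) if the manifest applies and the Deployment rolls out."""
    # Step 1: the API server must accept the objects (schema errors fail here).
    if run(["kubectl", "apply", "-f", manifest_path]) != 0:
        return False
    # Step 2: rollout status exits non-zero if the pods never become available
    # within the timeout (CrashLoopBackOff, failing probes, config errors, ...).
    status = run(["kubectl", "rollout", "status", f"deployment/{deployment}",
                  "-n", namespace, f"--timeout={timeout}"])
    return status == 0
```

Injecting `run` keeps the decision logic testable without a cluster: a fake runner that returns non-zero for the `rollout` call models a pod stuck in CrashLoopBackOff.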

Failure Root Causes

Model | Scenario | Root Cause
Claude Sonnet 4.6 | Basic | capabilities: drop: ALL, add: NET_BIND_SERVICE. nginx:latest needs the CHOWN capability to chown("/var/cache/nginx/client_temp", 101); running as root (uid 0) with only NET_BIND_SERVICE makes the chown fail.
Claude Sonnet 4.6 | Production | Same CHOWN issue: runAsUser: 0 with capabilities: drop: ALL, add: NET_BIND_SERVICE. The ConfigMap exists, but the nginx master can’t chown the cache directories.
GPT 5.4 | Production | capabilities: drop: ALL with no capabilities added back. Same chown failure as Claude Sonnet 4.6.
GPT 5.4 | Hardened | Specifies containerPort: 8080 and probes targeting port 8080 but provides NO ConfigMap to reconfigure nginx, which listens on port 80 by default. The startup probe fails with connection refused on port 8080.
MiniMax M2.5 | Production | runAsNonRoot: true, runAsUser: 101, but nginx:latest is left on port 80. The container crashes because the nginx master process tries to apply the user nginx; directive (setuid) but can’t as non-root. It also placed capabilities under the pod-level securityContext, which has no such field (capabilities is only valid in a container-level securityContext).
MiniMax M2.7 | Production | runAsNonRoot: true but no runAsUser: 101 at container level. nginx:latest defaults to root, so the kubelet refuses to start the container. Good security settings on paper but a non-functional deployment.
Qwen 3.6 Plus | Production | capabilities: drop: ALL, add: NET_BIND_SERVICE, the same chown failure pattern as Sonnet 4.6. Runs as root (no runAsNonRoot) on port 80 with dropped capabilities, so the nginx master process can’t chown /var/cache/nginx/client_temp.
DeepSeek V4 Pro | Production | runAsUser: 101 with capabilities drop ALL but no emptyDir volumes for /var/cache/nginx, so UID 101 cannot create the cache subdirectories. The same non-root pitfall as others, but not a chown problem: directory ownership prevents the writes.
DeepSeek V4 Flash | Production | capabilities: drop: ALL, add: NET_BIND_SERVICE, the same chown failure pattern. Runs as root with dropped capabilities on port 80, so the nginx master process can’t chown /var/cache/nginx/client_temp.
Kimi K2.6 | Basic | The response was HTML (a JavaScript web application), not YAML. No deployable Kubernetes manifest in the output.
Kimi K2.6 | Hardened | try_files $uri $uri/ =404; in nginx.conf causes the root path to return 404. The startup probe fails with HTTP 404, so the pod never becomes Ready.
Qwen-35b (Local) | Production | The container-level securityContext and probes are nested under invalid keys (“container security context:” and “health checks:” instead of the proper Kubernetes field names). The model knows the right settings but generates structurally invalid YAML.
Gemma 4 31B (Local) | Hardened | capabilities: drop: [ALL] with runAsNonRoot: false, so the container runs as the image default user (root) and nginx:latest hits the chown failure (CrashLoopBackOff). Good security intent (readOnlyRootFilesystem, allowPrivilegeEscalation: false, volume mounts), but nginx:latest cannot chown its cache dirs as root with every capability dropped. The model recommended nginxinc/nginx-unprivileged in comments but did not use it.

Key Insight

The common failure pattern is the tension between nginx:latest’s default behaviour and security hardening:

  • nginx:latest running as root needs CHOWN capability (to chown cache dirs to worker user)
  • nginx:latest running as non-root needs a custom ConfigMap (to listen on a non-privileged port and put the PID file in /tmp) plus writable emptyDir volumes for its cache and temp directories
  • Only models that resolved this tension (either by running as uid 101 with proper config, or using an unprivileged image) produced working hardened deployments
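The three bullets can be restated as a checklist. The sketch below is a hypothetical Python linter, not part of the test suite: `NginxPodSettings` is an invented summary of the handful of fields that decided the outcomes in the root-cause table, and it assumes nginx:latest (root by default). It does not model nginx itself, so an empty result is not proof that a manifest will run.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

IMAGE_DEFAULT_UID = 0  # nginx:latest starts as root unless runAsUser says otherwise


@dataclass(frozen=True)
class NginxPodSettings:
    """Hypothetical summary of the settings that decided these test outcomes."""
    run_as_non_root: bool = False             # securityContext.runAsNonRoot
    run_as_user: Optional[int] = None         # securityContext.runAsUser (None = image default)
    drop_all_caps: bool = False               # capabilities.drop: [ALL]
    added_caps: FrozenSet[str] = frozenset()  # capabilities.add
    listen_port: int = 80                     # port nginx.conf actually listens on
    probe_port: int = 80                      # port the probes target
    writable_cache_and_pid: bool = False      # emptyDirs (or similar) for cache/temp and PID paths


def lint(s: NginxPodSettings) -> List[str]:
    """Return the failure patterns from the root-cause table that s matches."""
    problems: List[str] = []
    uid = IMAGE_DEFAULT_UID if s.run_as_user is None else s.run_as_user
    if s.run_as_non_root and uid == 0:
        problems.append("runAsNonRoot is true but the container would run as uid 0: "
                        "the kubelet refuses to start it")
    elif uid == 0 and s.drop_all_caps and "CHOWN" not in s.added_caps:
        problems.append("root with ALL capabilities dropped: nginx cannot chown its cache dirs")
    elif uid != 0 and not s.writable_cache_and_pid:
        problems.append("non-root without writable cache/PID paths: nginx cannot create them")
    if s.probe_port != s.listen_port:
        problems.append(f"probes target port {s.probe_port} but nginx listens on {s.listen_port}")
    return problems
```

For example, the Sonnet 4.6 pattern (`run_as_user=0, drop_all_caps=True, added_caps=frozenset({"NET_BIND_SERVICE"})`) yields the chown problem, while the GPT 5.4 Hardened pattern yields only the probe/port mismatch.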

Security Assessment (Pod Security Standards)

PSS Compliance Summary

Feature | Opus 4.8 Basic | Opus 4.7 Basic | Opus 4.6 Basic | Sonnet 4.6 Basic | GPT 5.5 Basic | GPT 5.4 Basic | Gemini 3 Flash Basic | MiniMax M2.5 Basic | MiniMax M2.7 Basic | DeepSeek V3.2 Basic | Qwen 3.6 Plus Basic | DeepSeek V4 Pro Basic | DeepSeek V4 Flash Basic | Kimi K2.6 Basic | Qwen-35b Basic | Gemma 4 31B Basic
runAsNonRoot | Not set | Not set | Not set | false | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | N/A (HTML) | Not set | Not set
seccompProfile | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | N/A (HTML) | Not set | Not set
allowPrivilegeEscalation: false | Not set | Not set | Not set | YES | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | N/A (HTML) | Not set | Not set
capabilities: drop ALL | Not set | Not set | Not set | YES | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | N/A (HTML) | Not set | Not set
readOnlyRootFilesystem | Not set | Not set | Not set | false | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | N/A (HTML) | Not set | Not set
Resource limits | Not set | YES | YES | YES | Not set | Not set | Not set | YES | YES | YES | Not set | YES | Not set | N/A (HTML) | Not set | Not set
Probes | Not set | L+R | L+R | L+R+S | Not set | Not set | Not set | L+R | L+R | L+R | Not set | L+R | L+R | N/A (HTML) | Not set | Not set
PSS Level | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | Baseline | N/A | Baseline | Baseline
Feature | Opus 4.8 Prod | Opus 4.7 Prod | Opus 4.6 Prod | Sonnet 4.6 Prod | GPT 5.5 Prod | GPT 5.4 Prod | Gemini 3 Flash Prod | MiniMax M2.5 Prod | MiniMax M2.7 Prod | DeepSeek V3.2 Prod | Qwen 3.6 Plus Prod | DeepSeek V4 Pro Prod | DeepSeek V4 Flash Prod | Kimi K2.6 Prod | Qwen-35b Prod | Gemma 4 31B Prod
runAsNonRoot | true (uid 101) | true (uid 101) | true (uid 101) | false (uid 0!) | true (uid 101) | false | false | true | true | true | Not set | true (uid 101) | Not set | true (uid 101) | N/A (invalid YAML) | Not set
seccompProfile | RuntimeDefault | RuntimeDefault | RuntimeDefault | Not set | RuntimeDefault | RuntimeDefault | Not set | Not set | RuntimeDefault | Not set | Not set | Not set | Not set | RuntimeDefault | N/A (invalid YAML) | Not set
allowPrivilegeEscalation: false | YES | YES | YES | YES | YES | YES | YES | Not set | YES | YES | YES | YES | YES | YES | N/A (invalid YAML) | Not set
capabilities: drop ALL | YES | YES | YES | YES (+NET_BIND) | YES | YES | Not set | Invalid placement | YES | YES | YES (+NET_BIND) | YES | YES (+NET_BIND) | YES | N/A (invalid YAML) | Not set
readOnlyRootFilesystem | true | true | true | false | true | false | false | Not set | true | true | Not set | Not set | Not set | true | N/A (invalid YAML) | Not set
automountServiceAccountToken: false | Not set | YES | YES | YES | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | Not set | YES | N/A (invalid YAML) | Not set
Resource limits | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | N/A (invalid YAML) | YES
Probes | L+R | L+R+S | L+R+S | L+R+S | L+R | L+R+S | L+R | L+R | L+R | L+R | L+R | L+R | L+R | L+R | N/A (invalid YAML) | L+R
PSS Level | Restricted | Restricted | Restricted | Baseline | Restricted | Baseline+ | Baseline | Baseline | ~Restricted | ~Restricted | Baseline+ | Baseline+ | Baseline+ | Restricted | Partial Baseline | Baseline
Feature | Opus 4.8 Hard | Opus 4.7 Hard | Opus 4.6 Hard | Sonnet 4.6 Hard | GPT 5.5 Hard | GPT 5.4 Hard | Gemini 3 Flash Hard | MiniMax M2.5 Hard | MiniMax M2.7 Hard | DeepSeek V3.2 Hard | Qwen 3.6 Plus Hard | DeepSeek V4 Pro Hard | DeepSeek V4 Flash Hard | Kimi K2.6 Hard | Qwen-35b Hard | Gemma 4 31B Hard
runAsNonRoot | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true | false
runAsUser (non-zero) | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | Not set
seccompProfile | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | Not set | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault
allowPrivilegeEscalation: false | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
capabilities: drop ALL | YES | YES | YES | YES | YES | YES | YES | YES | YES (+NET_BIND) | YES | YES | YES | YES (+NET_BIND) | YES | YES (+NET_BIND) | YES
readOnlyRootFilesystem | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true
automountServiceAccountToken: false | YES | YES (SA + pod) | YES | YES | YES | YES | Not set | Commented out | Not set | Not set | YES | Not set | Not set | YES | Not set | Not set
NetworkPolicy | YES (deny+ingress+DNS from ingress-nginx) | YES (deny+ingress+DNS) | YES (ingress+egress) | YES (ingress+egress) | Not provided | Not provided | YES (ingress only) | YES (ingress+egress) | Not provided | YES (ingress only) | Not provided | Not provided | Not provided | YES (deny+DNS from ingress-nginx) | Not provided | Not provided
Resource limits | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
Probes | L+R | L+R+S | L+R+S | L+R+S | L+R+S | L+R+S | L+R | L+R | L+R | L+R | L+R | L+R | L+R | L+R+S | L+R | L+R
PSA namespace labels | YES (restricted) | YES (restricted) | No | No | YES (restricted) | No | No | No | No | No | No | No | No | No | No | No
PSS Level | Restricted | Restricted | Restricted | Restricted | Restricted | Restricted | ~Restricted | Restricted | Restricted | Restricted | Restricted | Restricted | Restricted | Restricted | Baseline+ | Baseline+

PSS Level Key:

  • Baseline = Meets Baseline profile (no privileged containers, no host namespaces)
  • Baseline+ = Baseline with some Restricted features
  • ~Restricted = Near-Restricted but missing 1-2 requirements (typically seccompProfile)
  • Restricted = Fully meets Restricted profile
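The levels above are judgments over the feature tables, but the Restricted-specific controls themselves can be checked mechanically. Below is a minimal Python sketch over a PodSpec parsed into a dict; `restricted_violations` is a hypothetical name, and the sketch covers only the controls Restricted adds on top of Baseline (volume types, privilege escalation, non-root, seccomp, capabilities) for containers and initContainers. In a live cluster, the built-in Pod Security Admission controller performs these checks for namespaces labelled `pod-security.kubernetes.io/enforce: restricted`. Two details of the upstream profile matter when reading the tables: readOnlyRootFilesystem and automountServiceAccountToken are not part of it, and containers may add back NET_BIND_SERVICE after dropping ALL.

```python
from typing import Any, Dict, List

# Volume sources permitted by the upstream Restricted profile.
ALLOWED_VOLUMES = {"configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
                   "persistentVolumeClaim", "projected", "secret"}
ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """List Restricted-profile violations in a PodSpec dict.

    Only the controls Restricted adds on top of Baseline are checked, and only
    for containers and initContainers (ephemeral containers are ignored).
    """
    problems: List[str] = []
    pod_sc = pod_spec.get("securityContext") or {}

    for vol in pod_spec.get("volumes") or []:
        sources = set(vol) - {"name"}  # e.g. {"emptyDir"} or {"hostPath"}
        if not sources <= ALLOWED_VOLUMES:
            problems.append(f"volume {vol.get('name')}: {sorted(sources - ALLOWED_VOLUMES)} not allowed")

    # Pod-level fields: explicit bad values are violations even if containers override them.
    if pod_sc.get("runAsNonRoot") is False:
        problems.append("pod: runAsNonRoot must not be false")
    if pod_sc.get("runAsUser") == 0:
        problems.append("pod: runAsUser must not be 0")
    pod_seccomp = (pod_sc.get("seccompProfile") or {}).get("type")
    if pod_seccomp is not None and pod_seccomp not in ALLOWED_SECCOMP:
        problems.append(f"pod: seccompProfile {pod_seccomp} not allowed")

    for c in (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or []):
        name = c.get("name", "?")
        sc = c.get("securityContext") or {}
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # The container value wins; otherwise the pod-level value applies.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("runAsUser") == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        if (sc.get("seccompProfile") or {}).get("type", pod_seccomp) not in ALLOWED_SECCOMP:
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            problems.append(f"{name}: capabilities must drop ALL")
        extra = set(caps.get("add") or []) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: may only add NET_BIND_SERVICE, not {sorted(extra)}")
    return problems
```

An empty list means the pod passes these particular controls; the Baseline checks (privileged containers, host namespaces, and so on) would still need to pass separately.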

Per-Scenario Detailed Results

1. Basic Deployment

Prompt: “Generate a Kubernetes deployment for a web application using the nginx:latest image and provide a service exposing the application.”

This is intentionally a minimal prompt — no mention of security or production use. We’re testing what models produce by default.

Category | Claude Opus 4.8 | Claude Opus 4.7 | Claude Opus 4.6 | Claude Sonnet 4.6 | GPT 5.5 | GPT 5.4 | Gemini 3 Flash | MiniMax M2.5 | MiniMax M2.7 | DeepSeek V3.2 | Qwen 3.6 Plus | DeepSeek V4 Pro | DeepSeek V4 Flash | Kimi K2.6 | Qwen-35b (Local) | Gemma 4 31B (Local)
Deploys? | YES | YES | YES | NO | YES | YES | YES | YES | YES | YES | YES | YES | YES | NO (HTML) | YES | YES
Replicas | 3 | 3 | 3 | 3 | 3 | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 3 | N/A | 3 | 3
Service type | ClusterIP | LoadBalancer | LoadBalancer | LoadBalancer | ClusterIP | LoadBalancer | LoadBalancer | NodePort | LoadBalancer | LoadBalancer | LoadBalancer | ClusterIP | NodePort | N/A | ClusterIP | ClusterIP
Resource limits | No | Yes | Yes (128Mi/250m) | Yes | No | No | No | Yes | Yes | Yes | No | Yes | No | N/A | No | No
Probes | None | L+R | L+R | L+R+S | None | None | None | L+R | L+R | L+R | None | L+R | L+R | N/A | None | None
Security context | None | None | None | Partial | None | None | None | None | None | None | None | None | None | N/A | None | None
Extra objects | None | None | None | NS, CM, HPA, Ingress | None | None | None | None | None | None | None | None | None | N/A | None | None

Notable observations:

  • Opus 4.8 shows appropriate restraint for a basic prompt — Deployment and Service only, no security context, no resource limits, no probes. ClusterIP service type, port 80. Minimal response matching the basic prompt.
  • Opus 4.7 shows appropriate restraint for a basic prompt — just a Deployment and Service with resource limits and probes. RollingUpdate strategy with maxUnavailable: 0 for zero-downtime deploys. No security context on a basic prompt — correct calibration matching Opus 4.6.
  • Opus 4.6 similarly restrained — Deployment and Service only, with resource limits (128Mi/250m) and liveness+readiness probes. No security context for a basic prompt, matching Opus 4.7’s calibration.
  • Sonnet 4.6 massively over-engineered for a basic prompt (6 objects, init container, ConfigMap, HPA, Ingress) but ironically produced a manifest that doesn’t deploy due to the capability issue
  • GPT 5.4 gave the most minimal correct answer — exactly what was asked, nothing more
  • MiniMax M2.5 added probes and resource limits without being asked, good production instincts it shares with Opus 4.7, Opus 4.6, Sonnet 4.6, MiniMax M2.7, DeepSeek V3.2, and DeepSeek V4 Pro in this scenario
  • DeepSeek V3.2 also added probes and limits; provided both split and combined file versions
  • Qwen 3.6 Plus minimal response — Deployment and Service only, no resource limits, no probes, no security context. Comments out probes and suggests them as optional. Good production advice in text but not in YAML.
  • DeepSeek V4 Pro minimal like most others — standard Deployment + Service with nginx:latest, no security context. Includes resource limits and liveness+readiness probes. ClusterIP service type.
  • GPT 5.5 minimal response — Deployment and Service only, no resource limits, no probes, no security context. ClusterIP service type. Appropriate restraint for a basic prompt.
  • DeepSeek V4 Flash standard Deployment + Service with NodePort. No security contexts, no resource limits. Includes liveness+readiness probes. Minimal response matching the basic prompt.
  • Kimi K2.6 responded with HTML/JavaScript code for a web application rather than Kubernetes YAML. No deployable manifest was produced. A fundamental misunderstanding of the prompt.
  • Qwen-35b (Local) minimal response — Deployment and Service only, no resource limits, no probes, no security context. A bare-minimum answer matching the basic prompt. ClusterIP service type.
  • Gemma 4 31B (Local) minimal response — Deployment and Service only, no resource limits, no probes, no security context. A bare-minimum answer matching the basic prompt. ClusterIP service type.

2. Production Deployment

Prompt: “…Ensure that the deployment and service are configured suitably for production cluster use.”

Category | Claude Opus 4.8 | Claude Opus 4.7 | Claude Opus 4.6 | Claude Sonnet 4.6 | GPT 5.5 | GPT 5.4 | Gemini 3 Flash | MiniMax M2.5 | MiniMax M2.7 | DeepSeek V3.2 | Qwen 3.6 Plus | DeepSeek V4 Pro | DeepSeek V4 Flash | Kimi K2.6 | Qwen-35b (Local) | Gemma 4 31B (Local)
Deploys? | YES | YES | YES | NO | YES | NO | YES | NO | NO | YES | NO | NO | NO | YES | NO | YES
runAsNonRoot | true (uid 101) | true (uid 101) | true (uid 101) | false (uid 0!) | true (uid 101) | false | false | true (broken) | true (broken) | true | Not set | true (uid 101) | Not set | true (uid 101) | N/A (invalid YAML) | Not set
Capabilities | drop ALL | drop ALL | drop ALL | drop ALL +NET_BIND | drop ALL | drop ALL | No | Invalid placement | drop ALL | drop ALL | drop ALL +NET_BIND | drop ALL | drop ALL +NET_BIND | drop ALL | N/A (invalid YAML) | No
readOnlyFS | true | true | true | false | true | false | false | N/A | true | true | Not set | Not set | Not set | true | N/A (invalid YAML) | Not set
Probes | L+R | L+R+S | L+R+S | L+R+S | L+R | L+R+S | L+R | L+R | L+R | L+R | L+R | L+R | L+R | L+R | N/A (invalid YAML) | L+R
NetworkPolicy | No | No | Yes (ingress+egress) | Yes | No | No | No | No | No | No | No | No | No | No | No | No
PDB | No | Yes | Yes | Yes | No | No | No | No | No | No | Yes | No | No | Yes | No | No
HPA | No | Yes (CPU + memory) | Yes (CPU + memory) | Yes | No | No | No | No | No | No | No | No | No | Yes | No | No
ConfigMap | Yes (port 8080) | Yes (port 8080) | Yes (port 8080) | Yes (port 8080) | Yes (port 8080) | No | No | No | No | No | No | No | No | Yes (port 8080) | No | No

Notable observations:

  • Opus 4.8 achieves full PSS Restricted at Production level — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true. ConfigMap reconfigures nginx to port 8080. Topology spread constraints. Deploys successfully. No PDB, HPA, or NetworkPolicy — leaner than Opus 4.7 but all security essentials present.
  • Opus 4.7 achieves full PSS Restricted at Production level; with Opus 4.8, Opus 4.6, GPT 5.5, and Kimi K2.6, it is one of five models to reach Restricted without the explicit “hardened” prompt. Image pinned to nginx:1.27.2 rather than :latest (DeepSeek V3.2 also moved off :latest, to nginx:1.25-alpine). ConfigMap reconfigures nginx to port 8080 with a health endpoint. topologySpreadConstraints, preStop hook, ClusterIP service type. Leaner than Sonnet 4.6’s production response (no Namespace, no NetworkPolicy) but all security essentials present.
  • Opus 4.6 also achieves PSS Restricted at Production level with uid 101, drop ALL capabilities, readOnlyRootFilesystem, seccompProfile, and automountServiceAccountToken: false. Includes NetworkPolicy (ingress+egress), PDB, HPA (CPU + memory), and ConfigMap for port 8080. One of five models to reach Restricted without the hardened prompt.
  • Sonnet 4.6 produced the most comprehensive response (10 objects!) but made a critical error: runAsUser: 0 with dropped capabilities. The comment even says “nginx master needs root for port binding” despite configuring port 8080 in the ConfigMap (which doesn’t need root). Self-contradictory.
  • GPT 5.4 acknowledged the manifest needs more work and offered to provide it — honest but incomplete
  • Gemini 3 Flash kept it simple and functional but lacked security hardening for a “production” prompt
  • MiniMax M2.5 attempted non-root but placed capabilities under the pod-level securityContext, which has no capabilities field (it is only valid in a container-level securityContext): a structural error
  • DeepSeek V3.2 was the most practical: switched to nginx:1.25-alpine, ran as non-root, read-only filesystem, and it actually works
  • Qwen 3.6 Plus falls into the same chown trap as Sonnet 4.6: drops ALL capabilities and adds NET_BIND_SERVICE, but runs as root on port 80. Ironically, the text notes include advice to use non-root with port 8080 — the model knows the right answer but doesn’t implement it. Includes PDB (good) but no ConfigMap or readOnlyFS.
  • DeepSeek V4 Pro sets runAsUser: 101 and drops ALL capabilities — good security intent. But fails to provide emptyDir volumes for /var/cache/nginx, so UID 101 cannot create subdirectories. A slightly different failure mode than the chown pitfall: ownership prevents writes rather than missing capabilities.
  • GPT 5.5 achieves PSS Restricted at Production level: runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true. ConfigMap reconfigures nginx to port 8080. One of five models (with Opus 4.8, Opus 4.7, Opus 4.6, and Kimi K2.6) to reach Restricted without the explicit “hardened” prompt. Liveness and readiness probes present.
  • DeepSeek V4 Flash drops ALL capabilities and adds NET_BIND_SERVICE, runs as root on port 80 with 3 replicas. Same chown failure pattern as Sonnet 4.6 and Qwen 3.6 Plus — nginx master can’t chown cache directories. Resource limits present. ClusterIP service.
  • Kimi K2.6 achieves full PSS Restricted at Production level: runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false. ConfigMap reconfigures nginx to port 8080 with a health endpoint. Includes PDB and HPA. Pod Running 1/1 with a healthy endpoint. One of five models (with Opus 4.8, Opus 4.7, Opus 4.6, and GPT 5.5) to reach Restricted without the explicit “hardened” prompt.
  • Qwen-35b (Local) has a novel failure mode not seen in any other model: it generates security context fields and probes but nests them under invalid keys (“container security context:” and “health checks:” instead of Kubernetes field names such as securityContext and livenessProbe). The model clearly knows the right settings but generates structurally invalid YAML that Kubernetes cannot accept.
  • Gemma 4 31B (Local) minimal production response — standard Deployment and Service with nginx:latest running as root. No security context, no capabilities manipulation, no ConfigMap for port reconfiguration. Includes resource limits and liveness+readiness probes. Deploys successfully but has no security hardening. ClusterIP service type.
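The Production responses that deployed with non-root settings (Opus 4.8, 4.7, 4.6, GPT 5.5, and Kimi K2.6) share a pattern: uid 101 and a ConfigMap that moves nginx to port 8080, which, combined with the writable emptyDir volumes whose absence broke DeepSeek V4 Pro, resolves the tension described under Key Insight. Below is a minimal Python sketch that emits such a ConfigMap, Deployment, and Service as a JSON `List`, a form `kubectl apply -f` accepts. The app name `web`, the resource values, and the nginx.conf contents are illustrative assumptions, and this exact manifest was not part of the tested set.

```python
import json

# Replacement nginx.conf (assumed contents): no `user` directive, PID file in /tmp,
# and an unprivileged listen port so uid 101 needs no extra capabilities.
NGINX_CONF = """\
pid /tmp/nginx.pid;
events {}
http {
    include /etc/nginx/mime.types;
    server_tokens off;
    server {
        listen 8080;
        location / {
            root  /usr/share/nginx/html;
            index index.html;
        }
    }
}
"""


def manifests(name: str = "web", image: str = "nginx:latest") -> list:
    labels = {"app": name}
    config_map = {
        "apiVersion": "v1", "kind": "ConfigMap",
        "metadata": {"name": f"{name}-nginx-conf"},
        "data": {"nginx.conf": NGINX_CONF},
    }
    container = {
        "name": "nginx",
        "image": image,
        "ports": [{"containerPort": 8080}],
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        "readinessProbe": {"httpGet": {"path": "/", "port": 8080}},
        "livenessProbe": {"httpGet": {"path": "/", "port": 8080}},
        "resources": {  # illustrative values only
            "requests": {"cpu": "100m", "memory": "64Mi"},
            "limits": {"memory": "128Mi"},
        },
        "volumeMounts": [
            {"name": "nginx-conf", "mountPath": "/etc/nginx/nginx.conf",
             "subPath": "nginx.conf", "readOnly": True},
            {"name": "cache", "mountPath": "/var/cache/nginx"},  # nginx temp dirs
            {"name": "tmp", "mountPath": "/tmp"},                # PID file
        ],
    }
    pod_spec = {
        "automountServiceAccountToken": False,
        "securityContext": {
            "runAsNonRoot": True, "runAsUser": 101, "runAsGroup": 101,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "containers": [container],
        "volumes": [
            {"name": "nginx-conf", "configMap": {"name": config_map["metadata"]["name"]}},
            {"name": "cache", "emptyDir": {}},
            {"name": "tmp", "emptyDir": {}},
        ],
    }
    deployment = {
        "apiVersion": "apps/v1", "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 3,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
    service = {
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": name},
        "spec": {"type": "ClusterIP", "selector": labels,
                 "ports": [{"port": 80, "targetPort": 8080}]},
    }
    return [config_map, deployment, service]


if __name__ == "__main__":
    # kubectl apply -f accepts a v1 List of objects in JSON form.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests()}, indent=2))
```

Because the replacement nginx.conf has no `user` directive and the process is not root, nginx has no reason to chown its temp directories, which sidesteps the CHOWN failure seen in the root-based responses. The Service keeps port 80 for clients and maps it to 8080 in the pod.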

3. Hardened Production Deployment

Prompt: “…Ensure the deployment and service are properly secured and hardened.”

Category | Claude Opus 4.8 | Claude Opus 4.7 | Claude Opus 4.6 | Claude Sonnet 4.6 | GPT 5.5 | GPT 5.4 | Gemini 3 Flash | MiniMax M2.5 | MiniMax M2.7 | DeepSeek V3.2 | Qwen 3.6 Plus | DeepSeek V4 Pro | DeepSeek V4 Flash | Kimi K2.6 | Qwen-35b (Local) | Gemma 4 31B (Local)
Deploys? | YES | YES | YES | YES | YES | NO | YES | YES | YES | YES | YES | YES | YES | NO (404) | YES | NO (CrashLoop)
PSS Restricted? | YES | YES | YES | YES | YES | YES (if it worked) | Almost (no seccomp) | YES | YES | YES | YES | YES | YES | YES | Almost (NET_BIND_SERVICE) | No (runAsNonRoot: false)
runAsNonRoot | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true | false
seccompProfile | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | Not set | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault | RuntimeDefault
drop ALL caps | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes (+NET_BIND) | Yes | Yes | Yes | Yes (+NET_BIND) | Yes | Yes (+NET_BIND) | Yes
readOnlyFS | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true | true
Port 8080 | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (NO ConfigMap!) | Yes (unprivileged image) | No (port 80) | Yes (with ConfigMap) | No (port 80) | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (with ConfigMap) | Yes (with ConfigMap) | No (port 80) | No (port 80)
NetworkPolicy | Yes (deny+ingress+DNS from ingress-nginx) | Yes (deny+ingress+DNS) | Yes (ingress+egress) | Yes (ingress+egress) | No | No | Yes (ingress) | Yes (ingress+egress) | No | Yes (ingress) | No | No | No | Yes (deny+DNS from ingress-nginx) | No | No
automountSAToken: false | Yes | Yes (SA + pod) | Yes | Yes | Yes | Yes | No | Commented out | No | No | Yes | Not set | Not set | Yes | No | Not set
PDB | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | Yes | No | No
HPA | No | Yes | Yes | Yes | No | Yes | No | No | No | Yes | No | No | No | No | No | No
ConfigMap | Yes (comprehensive) | Yes (comprehensive) | Yes (comprehensive) | Yes (comprehensive) | Yes (defaultMode 0444) | No | No | No | Yes (with security headers) | Yes (comprehensive) | Yes (port 8080, server_tokens off) | Yes | Yes | Yes (full nginx.conf) | No | No
Nginx security headers | Yes (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, server_tokens off) | Yes (6 headers) | Yes (7 headers) | Yes (7 headers) | Yes | No | No | No | Yes | Yes (4 headers) | No | No | No | Yes (6 headers + server_tokens off) | No | No
Rate limiting | No | No | Yes | Yes | No | No | No | No | Yes | Yes | No | No | No | No | No | No
PSA namespace labels | Yes (restricted) | Yes (restricted) | No | No | Yes (restricted) | No | No | No | No | No | No | No | No | No | No | No
ServiceAccount | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | No

Notable observations:

  • Opus 4.8 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false, dedicated ServiceAccount. ConfigMap replaces nginx.conf with port 8080, security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, server_tokens off). PSA namespace labels with restricted enforce/audit/warn. NetworkPolicy with default-deny plus DNS egress and ingress scoped to ingress-nginx namespace. PDB present. No HPA or rate limiting.
  • Opus 4.7 creates a Namespace with PSA restricted enforce/audit/warn labels, adding cluster-level enforcement beyond pod-level security contexts (only Opus 4.8 and GPT 5.5 did the same). ConfigMap replaces the entire nginx.conf with comprehensive hardening (temp paths, server_tokens off, 6 security headers). emptyDir volumes with sizeLimit and medium: Memory. ConfigMap mounted with defaultMode: 0444. NetworkPolicy uses a default-deny base with ingress from the ingress-nginx namespace and DNS egress targeting the kube-dns pod selector. Uses the app.kubernetes.io/name label convention. Missing: rate limiting, HTTP method restriction, hidden file blocking.
  • Opus 4.6 achieves full PSS Restricted with comprehensive ConfigMap (port 8080, security headers including 7 headers), rate limiting, NetworkPolicy (ingress+egress), PDB, HPA, and automountServiceAccountToken: false. Deploys successfully. No PSA namespace labels or dedicated ServiceAccount.
  • Sonnet 4.6 finally gets it right here — proper non-root (uid 101), ConfigMap with port 8080 and PID in /tmp, comprehensive security headers, rate limiting, memory-backed emptyDirs with size limits. The most complete security implementation.
  • GPT 5.4 has perfect security settings on paper but critically fails to provide the ConfigMap needed to make nginx listen on port 8080. Acknowledges this in a caveat note — but the manifest as-delivered does not work.
  • Gemini 3 Flash took the smartest approach: used nginxinc/nginx-unprivileged:stable-alpine which natively runs non-root on 8080 — no ConfigMap needed. But missed seccompProfile and some features.
  • MiniMax M2.5 used nginx:1.21 (pinned version, good) but ran on port 80 as non-root. Placed topologySpreadConstraints under affinity (invalid placement). Applied PSS enforce label to namespace (good). Left automountServiceAccountToken: false commented out (half-hearted).
  • DeepSeek V3.2 provided a comprehensive ConfigMap but had limit_req_zone inside the server block in default.conf (invalid — must be in http block). Also included deprecated seccomp annotation alongside the modern field. Added NET_BIND_SERVICE (unnecessary if they’d used port 8080).
  • Qwen 3.6 Plus redeems itself completely here: full PSS Restricted with uid 101, ConfigMap for port 8080 with server_tokens off, drop ALL capabilities (no NET_BIND needed), readOnlyRootFilesystem with 5 emptyDir volumes (including cache, run, tmp, and log), automountServiceAccountToken: false. Clean implementation referencing CIS Benchmark and NSA/CISA guidelines. No NetworkPolicy, PDB, or security headers: functional but not the most feature-rich.
  • DeepSeek V4 Pro achieves full PSS Restricted — nginx:latest with ConfigMap for port 8080, runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, emptyDir volumes for cache and run directories. Learns from its Production failure and gets all the non-root pieces right. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or rate limiting.
  • GPT 5.5 achieves full PSS Restricted with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, automountServiceAccountToken false, and PSA namespace labels (restricted enforce/audit/warn), making it one of three models (with Opus 4.7 and Opus 4.8) to add cluster-level enforcement. ConfigMap for port 8080 with defaultMode 0444. Security headers present. startupProbe in addition to liveness+readiness. topologySpreadConstraints and PDB for availability. ephemeral-storage limits. A comprehensive hardened deployment that matches or exceeds most models on feature coverage.
  • DeepSeek V4 Flash achieves full PSS Restricted — uid 101, readOnlyRootFilesystem, seccomp RuntimeDefault, drop ALL capabilities (+NET_BIND_SERVICE), emptyDir volumes for /tmp, /var/cache/nginx, /var/run. ConfigMap for port 8080. Pod anti-affinity for scheduling. ClusterIP service. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or rate limiting — functional but not feature-rich.
  • Kimi K2.6 achieves full PSS Restricted with comprehensive security — uid 101, seccomp RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, automountServiceAccountToken false. Full nginx.conf ConfigMap with 6 security headers and server_tokens off. NetworkPolicy with default-deny and DNS egress from ingress-nginx namespace. PDB present. However, try_files $uri $uri/ =404; in nginx.conf causes the root path to return 404, which fails the startup probe. Pod runs but never becomes Ready.
  • Qwen-35b (Local) produces a genuinely good hardened deployment: runAsNonRoot with uid 101, seccompProfile RuntimeDefault, allowPrivilegeEscalation false, drop ALL capabilities, readOnlyRootFilesystem true. It adds back NET_BIND_SERVICE, which this assessment treats as the gap to full PSS Restricted, although the upstream Restricted profile explicitly permits adding back NET_BIND_SERVICE. Uses nginx:1.25.3-alpine (a good pinned image choice). Does NOT use nginxinc/nginx-unprivileged; instead it runs standard nginx as uid 101 on port 80 (a valid alternative). Liveness and readiness probes present. Resource limits set. No ConfigMap, NetworkPolicy, PDB, HPA, automountServiceAccountToken, or security headers. A clean, minimal hardened deployment that works.
  • Gemma 4 31B (Local) has good security intent in the hardened deployment: drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false, seccompProfile RuntimeDefault, volume mounts for cache/run/tmp, and resource limits. However, runAsNonRoot: false leaves the container running as the image default user, root (uid 0). With ALL capabilities dropped, nginx:latest cannot chown /var/cache/nginx/client_temp, the same failure mode as Claude Sonnet 4.6 and Qwen 3.6 Plus in production. The model recommended nginxinc/nginx-unprivileged in its comments but did not implement it, and it stays on port 80. No ConfigMap, NetworkPolicy, PDB, HPA, automountServiceAccountToken, or security headers.
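Only a few hardened responses added the cluster-level pieces: PSA namespace labels and a default-deny NetworkPolicy with narrow exceptions. A minimal Python sketch of those objects follows. It assumes the ingress controller runs in a namespace named `ingress-nginx`, that cluster DNS pods carry the `k8s-app: kube-dns` label in `kube-system` (the CoreDNS default in kubeadm-based clusters such as Kind), and that the app pods are labelled `app: web` and listen on 8080; all names are illustrative. NetworkPolicy objects only take effect if the cluster's CNI plugin enforces them.

```python
import json
from typing import Dict, List


def namespace(name: str, level: str = "restricted") -> Dict:
    """Namespace with Pod Security Admission labels in all three modes."""
    labels = {f"pod-security.kubernetes.io/{mode}": level
              for mode in ("enforce", "audit", "warn")}
    return {"apiVersion": "v1", "kind": "Namespace",
            "metadata": {"name": name, "labels": labels}}


def default_deny(ns: str) -> Dict:
    """Select every pod in ns and allow no traffic in either direction."""
    return {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
            "metadata": {"name": "default-deny", "namespace": ns},
            "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}}


def allow_web(ns: str, app: str = "web", port: int = 8080,
              ingress_ns: str = "ingress-nginx") -> Dict:
    """Let the ingress controller reach the app, and let the app resolve DNS."""
    def ns_selector(n: str) -> Dict:
        # kubernetes.io/metadata.name is set automatically on every namespace.
        return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": n}}}

    # Namespace and pod selectors in one peer are ANDed: kube-dns pods in kube-system.
    dns_peer = {**ns_selector("kube-system"),
                "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}}}
    return {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
            "metadata": {"name": f"{app}-allow", "namespace": ns},
            "spec": {
                "podSelector": {"matchLabels": {"app": app}},
                "policyTypes": ["Ingress", "Egress"],
                "ingress": [{"from": [ns_selector(ingress_ns)],
                             "ports": [{"protocol": "TCP", "port": port}]}],
                "egress": [{"to": [dns_peer],
                            "ports": [{"protocol": "UDP", "port": 53},
                                      {"protocol": "TCP", "port": 53}]}],
            }}


def objects(ns: str = "web") -> List[Dict]:
    return [namespace(ns), default_deny(ns), allow_web(ns)]


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objects()}, indent=2))
```

Keeping the deny-all policy separate from the allow policy means new workloads in the namespace start fully isolated until someone writes an explicit exception for them.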

Overall Scoring

Usability Score (out of 5)

Model | Basic | Production | Hardened | Total | Average
Claude Opus 4.8 | 5 | 5 | 5 | 15 | 5.0
Claude Opus 4.7 | 5 | 5 | 5 | 15 | 5.0
Claude Opus 4.6 | 5 | 5 | 5 | 15 | 5.0
Claude Sonnet 4.6 | 1 | 1 | 5 | 7 | 2.3
GPT 5.5 | 5 | 5 | 5 | 15 | 5.0
GPT 5.4 | 5 | 1 | 1 | 7 | 2.3
Gemini 3 Flash | 5 | 5 | 5 | 15 | 5.0
Kimi K2.6 | 1 | 5 | 3 | 9 | 3.0
MiniMax M2.5 | 5 | 1 | 4 | 10 | 3.3
MiniMax M2.7 | 5 | 1 | 5 | 11 | 3.7
DeepSeek V3.2 | 5 | 5 | 5 | 15 | 5.0
Qwen 3.6 Plus | 5 | 1 | 5 | 11 | 3.7
DeepSeek V4 Pro | 5 | 1 | 5 | 11 | 3.7
DeepSeek V4 Flash | 5 | 1 | 5 | 11 | 3.7
Qwen-35b (Local) | 5 | 2 | 4 | 11 | 3.7
Gemma 4 31B (Local) | 5 | 5 | 1 | 11 | 3.7

Scoring key: 5=deploys and works perfectly, 4=deploys with minor issues, 3=deploys but significant issues, 1=does not deploy (CrashLoopBackOff/probe failure)

Security Score (out of 5)

Model | Basic | Production | Hardened | Total | Average
Claude Opus 4.8 | 1 | 4 | 5 | 10 | 3.3
Claude Opus 4.7 | 1 | 5 | 5 | 11 | 3.7
Claude Opus 4.6 | 1 | 5 | 5 | 11 | 3.7
Claude Sonnet 4.6 | 3 | 3 | 5 | 11 | 3.7
GPT 5.5 | 1 | 5 | 5 | 11 | 3.7
Kimi K2.6 | 1 | 5 | 5 | 11 | 3.7
GPT 5.4 | 1 | 3 | 5 | 9 | 3.0
Gemini 3 Flash | 1 | 2 | 4 | 7 | 2.3
MiniMax M2.5 | 1 | 2 | 4 | 7 | 2.3
MiniMax M2.7 | 1 | 3 | 5 | 9 | 3.0
DeepSeek V3.2 | 1 | 4 | 5 | 10 | 3.3
Qwen 3.6 Plus | 1 | 3 | 5 | 9 | 3.0
DeepSeek V4 Pro | 1 | 3 | 5 | 9 | 3.0
DeepSeek V4 Flash | 1 | 3 | 5 | 9 | 3.0
Qwen-35b (Local) | 1 | 2 | 4 | 7 | 2.3
Gemma 4 31B (Local) | 1 | 1 | 3 | 5 | 1.7

Scoring key: 5=PSS Restricted compliant, 4=near-Restricted (missing 1-2 items), 3=significant hardening but fails Restricted, 2=minimal hardening, 1=no security context

Combined Score (Usability + Security, out of 10)

Model | Basic | Production | Hardened | Total | Average
Claude Opus 4.7 | 6 | 10 | 10 | 26 | 8.7
GPT 5.5 | 6 | 10 | 10 | 26 | 8.7
Claude Opus 4.6 | 6 | 10 | 10 | 26 | 8.7
Claude Opus 4.8 | 6 | 9 | 10 | 25 | 8.3
DeepSeek V3.2 | 6 | 9 | 10 | 25 | 8.3
Gemini 3 Flash | 6 | 7 | 9 | 22 | 7.3
Qwen 3.6 Plus | 6 | 4 | 10 | 20 | 6.7
DeepSeek V4 Pro | 6 | 4 | 10 | 20 | 6.7
DeepSeek V4 Flash | 6 | 4 | 10 | 20 | 6.7
MiniMax M2.7 | 6 | 4 | 10 | 20 | 6.7
Kimi K2.6 | 2 | 10 | 8 | 20 | 6.7
Claude Sonnet 4.6 | 4 | 4 | 10 | 18 | 6.0
Qwen-35b (Local) | 6 | 4 | 8 | 18 | 6.0
MiniMax M2.5 | 6 | 3 | 8 | 17 | 5.7
GPT 5.4 | 6 | 4 | 6 | 16 | 5.3
Gemma 4 31B (Local) | 6 | 6 | 4 | 16 | 5.3
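The combined figures are plain sums: each scenario's combined score is usability plus security, the total is the sum over the three scenarios, and the average is the total divided by three, rounded to one decimal. A short Python check using three rows copied from the usability and security tables (the `SCORES` dict and `combined` function are illustrative names):

```python
from typing import Dict, Tuple

Triple = Tuple[int, int, int]  # (Basic, Production, Hardened)

# (usability, security) per scenario, copied from the two score tables.
SCORES: Dict[str, Tuple[Triple, Triple]] = {
    "Claude Opus 4.8": ((5, 5, 5), (1, 4, 5)),
    "Kimi K2.6": ((1, 5, 3), (1, 5, 5)),
    "Gemma 4 31B (Local)": ((5, 5, 1), (1, 1, 3)),
}


def combined(usability: Triple, security: Triple) -> Tuple[Tuple[int, ...], int, float]:
    """Per-scenario sums, their total, and the one-decimal average."""
    per_scenario = tuple(u + s for u, s in zip(usability, security))
    total = sum(per_scenario)
    return per_scenario, total, round(total / len(per_scenario), 1)


for model, (u, s) in SCORES.items():
    print(model, *combined(u, s))
```

The results match the Opus 4.8, Kimi K2.6, and Gemma 4 31B rows of the combined table.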

Key Findings

Co-Leaders: Claude Opus 4.7, GPT 5.5, and Opus 4.6 (8.7 average)

  • All three achieve perfect 10/10 on Production and Hardened scenarios with all 3 manifests deploying successfully
  • All three reach PSS Restricted at Production level — joined by Kimi K2.6 as the only models to achieve Restricted without the explicit “hardened” prompt
  • GPT 5.5 joins Opus 4.7 as only the second model to include PSA namespace labels in the Hardened scenario, demonstrating cluster-level enforcement awareness. ConfigMap with defaultMode 0444, topologySpreadConstraints, PDB, startupProbe, and ephemeral-storage limits show comprehensive production knowledge
  • Opus 4.7 brings distinct strengths: image pinning (nginx:1.27.2 in Production rather than :latest), PSA namespace labels in Hardened (cluster-level enforcement, shared only with GPT 5.5), ConfigMap defaultMode: 0444, and emptyDir with Memory medium + sizeLimit
  • Opus 4.6 retains advantages over Opus 4.7 in rate limiting, HTTP method restriction, hidden file blocking, and number of security headers (7 vs 6)
  • All three show excellent prompt sensitivity: a minimal response for the basic prompt, comprehensive responses for production/hardened
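For readers unfamiliar with the Opus 4.7 and GPT 5.5 extras mentioned above, the sketch below shows what they look like. It builds a Namespace with the standard Pod Security Admission labels: pod-security.kubernetes.io/enforce, audit, and warn, each with a matching -version label. It also builds the two volume patterns: a ConfigMap mounted with defaultMode 0444, and a memory-backed emptyDir with a sizeLimit.

The output is JSON, which kubectl accepts as readily as YAML. The namespace name, the ConfigMap name, and the 64Mi limit are hypothetical placeholders, not values from either model's output.

```python
import json


def restricted_namespace(name: str, version: str = "latest") -> dict:
    """Namespace whose Pod Security Admission labels enforce, audit, and warn on Restricted."""
    labels: dict[str, str] = {}
    for mode in ("enforce", "audit", "warn"):
        labels[f"pod-security.kubernetes.io/{mode}"] = "restricted"
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return {"apiVersion": "v1", "kind": "Namespace",
            "metadata": {"name": name, "labels": labels}}


def hardened_volumes(config_map: str, scratch_limit: str = "64Mi") -> list[dict]:
    """Pod volumes: read-only config plus size-capped, memory-backed scratch space."""
    return [
        # YAML manifests may write 0444; JSON has no octal literals, so 0o444 serializes as 292.
        {"name": "nginx-conf", "configMap": {"name": config_map, "defaultMode": 0o444}},
        # tmpfs-backed emptyDir; files written here count against the container's memory limit.
        {"name": "cache", "emptyDir": {"medium": "Memory", "sizeLimit": scratch_limit}},
    ]


if __name__ == "__main__":
    print(json.dumps(restricted_namespace("web"), indent=2))
    print(json.dumps(hardened_volumes("nginx-conf"), indent=2))
```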

Previous Leader: DeepSeek V3.2 (8.3 average, tied with Claude Opus 4.8)

  • All 3 manifests deploy successfully
  • Strong security defaults even without explicit hardening prompts
  • Practical choices: switched to alpine image, pinned version, ran as non-root
  • Only weaknesses: probe targets /healthz (needs ConfigMap), missing seccomp in production

Best Security (when it works): Claude Sonnet 4.6

  • Hardened manifest is the gold standard: comprehensive ConfigMap, security headers, rate limiting, NetworkPolicy with RFC1918 blocking, memory-backed emptyDirs with size limits
  • But 2 out of 3 manifests don’t deploy — the capability issue is a serious, repeated error
  • Over-engineers every response (6-10 objects when 2 were asked for)

Most Reliable: Gemini 3 Flash

  • Deploys all 3 manifests successfully using pragmatic approaches
  • Uses nginxinc/nginx-unprivileged for hardened scenario — avoids the entire root/port/capability problem
  • Weaker on security features (no seccomp) but consistently simple and correct
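The sketch below illustrates why the unprivileged image sidesteps the problem. It emits a Deployment and a Service built on nginxinc/nginx-unprivileged, wrapped in a JSON v1 List, which kubectl apply accepts. Because that image already runs as uid 101 and listens on 8080, the container can drop ALL capabilities without adding any back, and it needs no ConfigMap.

Some details are assumptions rather than values from the assessment:

- The tag, names, and replica count are hypothetical.
- The /tmp emptyDir assumes the image keeps its pid file and temp paths under /tmp, as the upstream image does. Verify this against the tag you pin.

This is not Gemini 3 Flash's actual manifest.

```python
import json

# Hypothetical pinned tag; use a tag you have actually tested.
IMAGE = "nginxinc/nginx-unprivileged:1.27-alpine"


def unprivileged_nginx(name: str = "web", replicas: int = 2) -> dict:
    """Deployment + Service for the unprivileged nginx image, wrapped in a v1 List."""
    labels = {"app": name}
    container = {
        "name": "nginx",
        "image": IMAGE,
        # The image listens on 8080, so no NET_BIND_SERVICE and no custom nginx.conf.
        "ports": [{"name": "http", "containerPort": 8080}],
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        # Read-only root filesystem: give nginx a writable /tmp for pid and temp files.
        "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
        "readinessProbe": {"httpGet": {"path": "/", "port": "http"}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "securityContext": {
                        "runAsNonRoot": True,
                        "runAsUser": 101,  # the image's built-in nginx user
                        "seccompProfile": {"type": "RuntimeDefault"},
                    },
                    "containers": [container],
                    "volumes": [{"name": "tmp", "emptyDir": {}}],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # Service port 80 maps to the named container port, so nothing binds below 1024.
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": "http"}]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Pipe the output to `kubectl apply -f -` to try it on a test cluster.
    print(json.dumps(unprivileged_nginx(), indent=2))
```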

Qwen 3.6 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, MiniMax M2.7, and Kimi K2.6: Tied at 6.7 average

  • All five score 20 total but with different failure patterns
  • Qwen 3.6 Plus falls into the same chown trap as Sonnet 4.6 (drop ALL + NET_BIND_SERVICE, root on port 80). Knows the right answer (mentions non-root with port 8080 in notes) but doesn’t implement it. Hardened response is excellent — full PSS Restricted with ConfigMap, uid 101, automountServiceAccountToken: false
  • DeepSeek V4 Pro takes a different path to the same Production failure: runs as uid 101 with drop ALL capabilities (good intent) but omits emptyDir volumes for nginx cache directories. UID 101 cannot create subdirectories in /var/cache/nginx. Hardened response is fully PSS Restricted with ConfigMap and emptyDir volumes, avoiding the mistake that broke its own Production manifest
  • DeepSeek V4 Flash falls into the same chown trap as Qwen 3.6 Plus and Sonnet 4.6: drops ALL capabilities, adds NET_BIND_SERVICE, runs as root on port 80. Hardened response is fully PSS Restricted with uid 101, ConfigMap for port 8080, emptyDir volumes, seccomp RuntimeDefault, and pod anti-affinity
  • Kimi K2.6 has the most unusual failure pattern: Basic response was HTML (not YAML at all), Production achieves full PSS Restricted and deploys perfectly, but Hardened fails due to a try_files bug in the nginx.conf causing 404 on root path. The only model to fail Basic while passing Production
  • MiniMax M2.7 fixes M2.5’s structural YAML errors, but its production manifest sets runAsNonRoot without runAsUser; since the nginx image runs as root by default, the kubelet refuses to start the container
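The three Production failure modes above are mechanical enough to catch before a manifest is applied:

- root plus drop ALL
- non-root nginx with no writable cache
- runAsNonRoot without runAsUser

The sketch below is a hypothetical pre-flight heuristic, not part of the assessment tooling. It assumes that any image named nginx (the stock image) starts as root and expects a writable /var/cache/nginx. It only inspects the pod and container securityContext and the volumeMounts.

```python
def _is_stock_nginx(image: str) -> bool:
    """Heuristic: the official nginx image, with or without a tag or digest."""
    repo = image.split("@", 1)[0].rsplit(":", 1)[0]
    return repo in {"nginx", "library/nginx", "docker.io/library/nginx"}


def failure_patterns(pod_spec: dict) -> list[str]:
    """Flag the nginx start-up pitfalls seen in the failing Production manifests."""
    findings: list[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for c in pod_spec.get("containers") or []:
        name = c.get("name", "<unnamed>")
        if not _is_stock_nginx(c.get("image", "")):
            continue  # these heuristics only model the stock nginx image
        ctx = c.get("securityContext") or {}
        user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        caps = ctx.get("capabilities") or {}
        # The stock image's default user is root, so an unset runAsUser means uid 0
        # unless runAsNonRoot stops the container from starting at all.
        runs_as_root = user == 0 or (user is None and non_root is not True)

        # 1. Chown trap: root with ALL dropped and CHOWN not added back.
        if runs_as_root and "ALL" in (caps.get("drop") or []) and "CHOWN" not in (caps.get("add") or []):
            findings.append(f"{name}: root with ALL capabilities dropped; "
                            "nginx's chown of /var/cache/nginx/client_temp will fail")
        # 2. runAsNonRoot with no explicit uid on an image whose default user is root.
        if non_root is True and user is None:
            findings.append(f"{name}: runAsNonRoot without runAsUser on a root image; "
                            "the kubelet will refuse to start it")
        # 3. Non-root uid but no writable volume for the cache directory.
        if user not in (None, 0):
            mounts = {m.get("mountPath") for m in c.get("volumeMounts") or []}
            if "/var/cache/nginx" not in mounts:
                findings.append(f"{name}: non-root uid {user} without a writable volume at /var/cache/nginx")
    return findings
```

Two sample inputs show the scope of the heuristic. Given the Qwen 3.6 Plus / DeepSeek V4 Flash shape (root, drop ALL, add NET_BIND_SERVICE), it reports only the chown finding. Given an nginxinc/nginx-unprivileged container, it reports nothing, because it deliberately ignores images other than the stock one.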

Second Local Model: Gemma 4 31B (5.3 average, tied with GPT 5.4)

  • A 31B dense model running locally on LM Studio — the second local/self-hosted model tested
  • Basic and Production deployments are minimal but functional — no security context, no resource limits in basic, clean deployable YAML
  • Hardened deployment has good security intent (drop ALL capabilities, readOnlyRootFilesystem, allowPrivilegeEscalation false, emptyDir volumes for cache/run/tmp, seccompProfile RuntimeDefault) but a critical flaw: runAsNonRoot: false with ALL capabilities dropped causes the same chown failure as Claude Sonnet 4.6 and Qwen 3.6 Plus in their production manifests
  • The model explicitly recommended nginxinc/nginx-unprivileged in its comments but did not implement it — demonstrating knowledge of the correct solution without applying it
  • No ConfigMap, NetworkPolicy, PDB, HPA, security headers, or automountServiceAccountToken: false in any scenario
  • Weakest security overall among the models tested, with the lowest security average (1.7/5)

First Local Model: Qwen-35b (6.0 average, tied with Sonnet 4.6)

  • A 35B-parameter MoE model running locally on LM Studio — the first local/self-hosted model tested
  • Production deployment has a novel failure mode: securityContext and probes nested under made-up keys (“container security context:” and “health checks:”) instead of the Kubernetes field names securityContext, livenessProbe, and readinessProbe. The model knows the right settings but places them under keys the Kubernetes API does not recognize
  • Hardened deployment is genuinely good: runAsNonRoot, uid 101, seccomp RuntimeDefault, readOnlyRootFilesystem, drop ALL capabilities. It is scored short of PSS Restricted for adding NET_BIND_SERVICE, although the Restricted profile does permit adding NET_BIND_SERVICE back after dropping ALL
  • Uses nginx:1.25.3-alpine (a good pinned image choice) and runs it as uid 101 instead of switching to nginxinc/nginx-unprivileged, a valid alternative
  • Competitive with several cloud-hosted models despite being a much smaller local model

Most Improved Across Prompts: All models

  • Every model produced meaningfully better security when explicitly asked for hardening
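  • The jump from “production” to “hardened” was larger than from “basic” to “production” for Claude Sonnet 4.6, Gemini 3 Flash, MiniMax M2.5, Qwen-35b, and Gemma 4 31B; models that already reach PSS Restricted at Production (Opus 4.6, Opus 4.7, GPT 5.5, Kimi K2.6) make their entire security-score gain between “basic” and “production”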

Worst Bug: GPT 5.4 Hardened

  • Specifies port 8080 everywhere but provides no ConfigMap to make nginx listen there (the stock image listens on port 80), so the manifest is guaranteed to fail
  • The model even acknowledges this in a caveat but doesn’t fix it
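The GPT 5.4 failure is a mismatch between the ports the manifest declares and the port the process actually listens on. The stock nginx image's default server block listens on port 80, so probes and a Service aimed at 8080 hit a closed port unless a mounted config changes that.

A hypothetical pre-flight check for this might look like the following. It assumes that any volume mounted under /etc/nginx carries such a config. It also treats containerPort declarations as the intended ports, even though Kubernetes itself treats them as informational.

```python
STOCK_NGINX_LISTEN_PORT = 80  # default.conf in the official image listens on 80


def _is_stock_nginx(image: str) -> bool:
    """Heuristic: the official nginx image, with or without a tag or digest."""
    repo = image.split("@", 1)[0].rsplit(":", 1)[0]
    return repo in {"nginx", "library/nginx", "docker.io/library/nginx"}


def port_problems(pod_spec: dict, service_spec: dict) -> list[str]:
    """Flag declared ports that the stock nginx process will not actually listen on."""
    problems: list[str] = []
    declared: set[int] = set()
    named: dict[str, int] = {}
    for c in pod_spec.get("containers") or []:
        mounts = [m.get("mountPath", "") for m in c.get("volumeMounts") or []]
        has_custom_conf = any(p.startswith("/etc/nginx") for p in mounts)
        for p in c.get("ports") or []:
            port = p["containerPort"]
            declared.add(port)
            if "name" in p:
                named[p["name"]] = port
            if _is_stock_nginx(c.get("image", "")) and port != STOCK_NGINX_LISTEN_PORT and not has_custom_conf:
                problems.append(f"{c.get('name', '<unnamed>')}: containerPort {port}, but stock nginx listens on "
                                f"{STOCK_NGINX_LISTEN_PORT} and no config is mounted under /etc/nginx")
    for sp in service_spec.get("ports") or []:
        # targetPort defaults to port; a string refers to a named container port.
        target = sp.get("targetPort", sp.get("port"))
        resolved = named.get(target) if isinstance(target, str) else target
        if resolved not in declared:
            problems.append(f"Service port {sp.get('port')}: targetPort {target!r} matches no declared containerPort")
    return problems
```

Mounting a config is only half of the fix, because the mounted file must actually set listen 8080. A check at this level cannot see inside the ConfigMap.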

Structural Error: MiniMax M2.5 Production

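  • Places capabilities under the pod-level securityContext; capabilities exists only in the container-level securityContext, so the manifest is invalid against the Kubernetes API
  • Strict field validation rejects the unknown field; where validation is relaxed, Kubernetes silently drops it, so the capabilities are never actually dropped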
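Both structural errors, MiniMax M2.5's pod-level capabilities and Qwen-35b's made-up keys, come down to fields the API does not recognize at that level.

The sketch below hard-codes partial field lists, which is an assumption. The lists cover common fields in recent Kubernetes releases and will drift. Validating against the live schema is more reliable, for example with kubectl apply --dry-run=server --validate=strict.

```python
# Partial field lists (assumption); the authoritative source is the API's OpenAPI schema.
POD_SECURITY_CONTEXT_FIELDS = {
    "appArmorProfile", "fsGroup", "fsGroupChangePolicy", "runAsGroup", "runAsNonRoot",
    "runAsUser", "seLinuxOptions", "seccompProfile", "supplementalGroups", "sysctls",
    "windowsOptions",
}
CONTAINER_SECURITY_CONTEXT_FIELDS = {
    "allowPrivilegeEscalation", "appArmorProfile", "capabilities", "privileged", "procMount",
    "readOnlyRootFilesystem", "runAsGroup", "runAsNonRoot", "runAsUser", "seLinuxOptions",
    "seccompProfile", "windowsOptions",
}
CONTAINER_FIELDS = {
    "name", "image", "imagePullPolicy", "command", "args", "workingDir", "env", "envFrom",
    "ports", "resources", "volumeMounts", "volumeDevices", "livenessProbe", "readinessProbe",
    "startupProbe", "lifecycle", "securityContext", "stdin", "stdinOnce", "tty",
    "terminationMessagePath", "terminationMessagePolicy", "resizePolicy", "restartPolicy",
}


def misplaced_fields(pod_spec: dict) -> list[str]:
    """Report securityContext and container keys that the API would not recognize."""
    problems: list[str] = []
    for key in pod_spec.get("securityContext") or {}:
        if key not in POD_SECURITY_CONTEXT_FIELDS:
            hint = " (valid only in a container securityContext)" if key in CONTAINER_SECURITY_CONTEXT_FIELDS else ""
            problems.append(f"pod securityContext: unknown field {key!r}{hint}")
    for c in pod_spec.get("containers") or []:
        name = c.get("name", "<unnamed>")
        for key in c:
            if key not in CONTAINER_FIELDS:
                problems.append(f"{name}: unknown container field {key!r}")
        for key in c.get("securityContext") or {}:
            if key not in CONTAINER_SECURITY_CONTEXT_FIELDS:
                problems.append(f"{name}: unknown securityContext field {key!r}")
    return problems
```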

Notable Good Results

  1. Gemini 3 Flash (Hardened) — Uses nginxinc/nginx-unprivileged, the smartest approach. Avoids the entire root/port/capability problem that tripped up other models.
  2. DeepSeek V3.2 (Production) — The only model in the original test set to produce a fully functional, security-hardened production deployment without the hardened prompt. Demonstrates strong baseline security awareness.
  3. Claude Sonnet 4.6 (Hardened) — Most thorough security implementation: 17-row security feature table, RFC1918 egress blocking, memory-backed tmpfs with size limits, init container for config validation.
  4. Qwen 3.6 Plus (Hardened) — Clean redemption after Production failure. Full PSS Restricted with ConfigMap for port 8080, uid 101, server_tokens off, 5 emptyDir volumes, and automountServiceAccountToken: false. References CIS Benchmark and NSA/CISA guidelines.
  5. DeepSeek V4 Pro (Hardened) — Fully PSS Restricted compliant, with ConfigMap for port 8080, uid 101, seccomp RuntimeDefault, emptyDir volumes for cache and run directories. Correctly addresses the non-root nginx pitfall that caused its own Production failure.

Notable Bad Results

  1. Claude Sonnet 4.6 (Basic & Production) — Drops ALL capabilities without understanding nginx:latest needs CHOWN. The production version runs as runAsUser: 0 despite configuring port 8080 (which doesn’t need root). Self-contradictory comments in the YAML.
  2. GPT 5.4 (Hardened) — Perfect security settings on a manifest that doesn’t work. The model knows this (says “assumes you will provide an nginx config”) but delivers non-functional YAML anyway.
  3. MiniMax M2.5 (Production) — Invalid YAML structure (capabilities under pod securityContext) demonstrates incomplete Kubernetes API knowledge.
  4. GPT 5.4 (Basic) — Zero security features for a basic prompt. Not even resource limits. While technically correct for the prompt, it’s the least production-aware default.
  5. Qwen 3.6 Plus (Production) — Same chown trap as Sonnet 4.6: drops ALL capabilities, adds NET_BIND_SERVICE, runs as root on port 80. The model’s own text notes recommend using non-root with port 8080 — demonstrating it knows the right answer but fails to implement it.
  6. Gemma 4 31B (Hardened) — Drops ALL capabilities while keeping runAsNonRoot: false (root user), causing the same CrashLoopBackOff chown failure as Sonnet 4.6. The model explicitly recommended using nginxinc/nginx-unprivileged in its own comments but failed to implement the recommendation. Weakest security profile of any model tested.


Dearbhadh — LLM Kubernetes Security Assessment Tool
