Manifest Test Assessment

Models tested: Claude Fable 5, Claude Opus 4.8, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.5, GPT 5.4, Gemini 3 Flash, MiniMax M2.5, MiniMax M2.7, MiniMax M3, DeepSeek V3.2, Qwen 3.6 Plus, Qwen 3.7 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.6, Kimi K2.7 Code, GLM-5.2, Qwen3.6-35b-a3b (Local), Gemma 4 31B (Local), Mistral Medium 3.5, Claude Sonnet 5, Tencent HY3, GPT 5.6 Terra, GPT 5.6 Sol, Kimi K3, Xiaomi MiMo v2.5
Original date: 2026-03-09 | Claude Opus 4.6 added: 2026-03-25 | MiniMax M2.7 added: 2026-03-28 | Claude Opus 4.7 added: 2026-04-20 | Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24 | GPT 5.5 added: 2026-04-25 | Kimi K2.6 added: 2026-04-26 | Qwen3.6-35b-a3b (Local) added: 2026-05-03 | Gemma 4 31B (Local) added: 2026-05-03 | Claude Opus 4.8 added: 2026-05-31 | Qwen 3.7 Plus added: 2026-06-05 | MiniMax M3 added: 2026-06-08 | Claude Fable 5 added: 2026-06-10 | Kimi K2.7 Code added: 2026-06-16 | GLM-5.2 added: 2026-06-17 | Mistral Medium 3.5 added: 2026-06-18 | Claude Sonnet 5 added: 2026-07-01 | Tencent HY3 added: 2026-07-10 | GPT 5.6 Terra added: 2026-07-10 | GPT 5.6 Sol added: 2026-07-14 | Kimi K3 added: 2026-07-16 | Xiaomi MiMo v2.5 added: 2026-07-21
Cluster: Kind (local) for deployability testing

Scoring Criteria

Per manifest_tests/Scoring_Criteria.md:

Usability — Do the Deployment and Service objects actually deploy and work? (Malformed YAML = serious fault; settings that prevent functioning = fault)
Security — How well do the manifests implement Pod Security Standards while remaining functional?

Only Deployment and Service objects are scored. Extra objects (HPA, PDB, NetworkPolicy, etc.) are noted but not scored.
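
The scoring rule above (score only Deployment and Service, note everything else) can be sketched as a small filter over a model's combined YAML output. This is a minimal illustration using only the standard library. The function names are hypothetical, the regex-based parsing assumes unquoted or simply quoted top-level `kind:` values, and it is not the tooling used for this assessment, which is not shown in the document.

```python
import re

# Kinds that the scoring criteria cover; everything else is noted but not scored.
SCORED_KINDS = {"Deployment", "Service"}


def split_documents(manifest_text):
    """Split a multi-document YAML string on '---' separator lines."""
    parts = re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE)
    return [part.strip("\n") for part in parts if part.strip()]


def top_level_kind(document):
    """Return the unindented `kind:` value of one YAML document, or None."""
    match = re.search(r"^kind:[ \t]*(\S+)", document, flags=re.MULTILINE)
    return match.group(1).strip("'\"") if match else None


def select_scored(manifest_text):
    """Return (scored_documents, extra_kinds) for a model's combined output."""
    scored, extra_kinds = [], []
    for document in split_documents(manifest_text):
        kind = top_level_kind(document)
        if kind in SCORED_KINDS:
            scored.append(document)
        else:
            extra_kinds.append(kind)
    return scored, extra_kinds
```

Only unindented `kind:` lines are matched, so a nested reference such as an HPA's `scaleTargetRef.kind: Deployment` does not cause the HPA to be scored. A response with no recognisable manifest yields an empty scored list.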

Deployability Results

Each model’s Deployment + Service were extracted and applied to a Kind cluster. Results:

Scenario	Claude Fable 5	Claude Opus 4.8	Claude Opus 4.7	Claude Opus 4.6	Claude Sonnet 4.6	Claude Sonnet 5	GPT 5.5	GPT 5.4	GPT 5.6 Terra	GPT 5.6 Sol	Gemini 3 Flash	MiniMax M2.5	MiniMax M2.7	MiniMax M3	DeepSeek V3.2	Qwen 3.6 Plus	Qwen 3.7 Plus	DeepSeek V4 Pro	DeepSeek V4 Flash	Kimi K2.6	Kimi K2.7 Code	GLM-5.2	Qwen-35b (Local)	Gemma 4 31B (Local)	Mistral M3.5	Tencent HY3	Kimi K3	MiMo v2.5
Basic	PASS	PASS	PASS	PASS	FAIL	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	FAIL	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS
Production	PASS	PASS	PASS	PASS	FAIL	PASS	PASS	FAIL	PASS	PASS	PASS	FAIL	FAIL	PASS	PASS	FAIL	FAIL	FAIL	FAIL	PASS	PASS	PASS	FAIL	PASS	FAIL	FAIL	FAIL	PASS
Hardened	PASS	PASS	PASS	PASS	PASS	PASS	PASS	FAIL	PASS	PASS	PASS	PASS	PASS	PASS	PASS	PASS	FAIL	PASS	PASS	FAIL	PASS	PASS	PASS	FAIL	FAIL	PASS	PASS	FAIL
Pass Rate	3/3	3/3	3/3	3/3	1/3	3/3	3/3	1/3	3/3	3/3	3/3	2/3	2/3	3/3	3/3	2/3	1/3	2/3	2/3	1/3	3/3	3/3	2/3	2/3	1/3	2/3	2/3	2/3
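
The Pass Rate row is a count of PASS results across the three scenarios. A minimal sketch of that arithmetic follows, using four columns copied from the table above. The function name and the dictionary layout are illustrative assumptions, not part of the assessment tooling.

```python
def pass_rates(results):
    """Map each model to 'passed/total' given {model: {scenario: 'PASS' | 'FAIL'}}."""
    rates = {}
    for model, scenarios in results.items():
        passed = sum(1 for outcome in scenarios.values() if outcome == "PASS")
        rates[model] = f"{passed}/{len(scenarios)}"
    return rates


# Four columns transcribed from the deployability table.
results = {
    "Claude Opus 4.7": {"Basic": "PASS", "Production": "PASS", "Hardened": "PASS"},
    "Claude Sonnet 4.6": {"Basic": "FAIL", "Production": "FAIL", "Hardened": "PASS"},
    "GPT 5.4": {"Basic": "PASS", "Production": "FAIL", "Hardened": "FAIL"},
    "Kimi K2.6": {"Basic": "FAIL", "Production": "PASS", "Hardened": "FAIL"},
}

for model, rate in pass_rates(results).items():
    print(f"{model}: {rate}")
```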

Failure Root Causes

Model	Scenario	Root Cause
Claude Sonnet 4.6	Basic	`capabilities: drop: ALL, add: NET_BIND_SERVICE` — nginx:latest needs `CHOWN` capability to `chown("/var/cache/nginx/client_temp", 101)`. Running as root (uid 0) with only NET_BIND_SERVICE causes chown failure.
Claude Sonnet 4.6	Production	Same CHOWN issue. `runAsUser: 0` with `capabilities: drop: ALL, add: NET_BIND_SERVICE`. ConfigMap exists but nginx master can’t chown cache directories.
GPT 5.4	Production	`capabilities: drop: ALL` with no added capabilities. Same chown failure as Claude Sonnet 4.6.
GPT 5.4	Hardened	Specifies `containerPort: 8080` and probes target port 8080, but provides NO ConfigMap to reconfigure nginx. nginx:latest listens on port 80 by default. Startup probe fails: `connection refused` on port 8080.
MiniMax M2.5	Production	`runAsNonRoot: true, runAsUser: 101` but nginx:latest on port 80. Container crashes because nginx master process tries to execute `user nginx;` directive (setuid) but can’t as non-root. Also placed `capabilities` under pod-level `securityContext`, where it is not a valid field (`capabilities` is only accepted in the container-level `securityContext`).
MiniMax M2.7	Production	`runAsNonRoot: true` but no `runAsUser: 101` at container level. nginx:latest defaults to root, so the kubelet refuses to start the container (the image would run as root, which violates `runAsNonRoot`). Good security settings on paper but non-functional deployment.
Qwen 3.6 Plus	Production	`capabilities: drop: ALL, add: NET_BIND_SERVICE` — same chown failure pattern as Sonnet 4.6. Running as root (no runAsNonRoot) on port 80 with dropped capabilities. nginx master process can’t chown `/var/cache/nginx/client_temp`.
DeepSeek V4 Pro	Production	runAsUser: 101 with capabilities drop ALL but no emptyDir volumes for /var/cache/nginx. UID 101 cannot create subdirectories. Same non-root pitfall as others but without the chown issue — directory ownership prevents writes.
DeepSeek V4 Flash	Production	`capabilities: drop: ALL, add: NET_BIND_SERVICE` — same chown failure pattern. Running as root with dropped capabilities on port 80. nginx master process can’t chown `/var/cache/nginx/client_temp`.
Kimi K2.6	Basic	Response was HTML (JavaScript web application), not YAML. No deployable Kubernetes manifest in the output.
Kimi K2.6	Hardened	`try_files $uri $uri/ =404;` in nginx.conf causes root path to return 404. Startup probe fails with HTTP 404, pod never becomes Ready.
Qwen-35b (Local)	Production	Container-level securityContext and probes nested under invalid YAML keys (“container security context:” and “health checks:” instead of proper Kubernetes field names). The model knows the right settings but generates structurally invalid YAML.
Qwen 3.7 Plus	Production	runAsNonRoot: true, runAsUser: 101, drop ALL capabilities — nginx:latest chown failure. Good security intent but nginx master process fails without CHOWN capability.
Qwen 3.7 Plus	Hardened	YAML malformed — ConfigMap nginx.conf truncated/corrupted mid-stream, Deployment resource definition missing apiVersion/kind/metadata. Manifest cannot be parsed.
Gemma 4 31B (Local)	Hardened	`capabilities: drop: [ALL]` with `runAsNonRoot: false` (root user) causes nginx:latest `chown` failure — CrashLoopBackOff. Good security intent (readOnlyRootFilesystem, allowPrivilegeEscalation: false, volume mounts) but nginx:latest cannot chown cache dirs when capabilities are fully dropped while running as root. Model recommended nginxinc/nginx-unprivileged in comments but did not use it.
Mistral Medium 3.5	Production	`runAsNonRoot: true, runAsUser: 101` but missing writable volumes for /var/cache/nginx/client_temp. nginx cache directory permission denied.
Mistral Medium 3.5	Hardened	`readOnlyRootFilesystem: true` but only /tmp mounted as emptyDir. nginx cache directory cannot be created on read-only filesystem.
Tencent HY3	Production	runAsNonRoot: true but missing emptyDir volumes for /var/cache/nginx. nginx cache directory permission denied.
Kimi K3	Production	`capabilities: drop: ALL` with NET_BIND_SERVICE added but no runAsNonRoot. chown fails without CHOWN cap — CrashLoopBackOff. Response notes recommend unprivileged image but doesn’t implement it.
Xiaomi MiMo v2.5	Hardened	Two independent bugs. (1) The `init-chown-data` init container runs `chown -R 1000:1000 /usr/share/nginx/html` while running as non-root (uid 1000) with ALL capabilities dropped; without CAP_CHOWN this returns `chown: Operation not permitted`, so the init container errors and the pod never starts (Init:Error). (2) The supplied nginx.conf contains an invalid directive `rate_limit burst=20 nodelay;` (the correct directive is `limit_req`), which fails the nginx config test — so even with the init container fixed the container would CrashLoopBackOff.

Key Insight

The common failure pattern is the tension between nginx:latest’s default behaviour and security hardening:

nginx:latest running as root needs CHOWN capability (to chown cache dirs to worker user)
nginx:latest running as non-root needs a custom ConfigMap (to listen on non-privileged port and use /tmp for PID)
Only models that resolved this tension (either by running as uid 101 with proper config, or using an unprivileged image) produced working hardened deployments
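
The failure patterns described above can be written down as a rough pre-deployment check. The sketch below is a heuristic that mirrors this write-up's observations for nginx:latest, not a guarantee of runtime behaviour. The input is a simplified, hypothetical dictionary (the keys `run_as_user`, `drop_all`, `added_caps`, `listen_port`, and `writable_paths` are invented for illustration and are not Kubernetes API fields).

```python
def nginx_startup_risks(container):
    """Return likely startup problems for nginx:latest under a given hardening setup.

    `container` is a simplified, hypothetical description:
      run_as_user    int, 0 means root (default 0)
      drop_all       bool, True if capabilities drop ALL is set
      added_caps     iterable of capability names added back
      listen_port    port nginx is configured to listen on (default 80)
      writable_paths iterable of paths backed by writable volumes such as emptyDir
    """
    risks = []
    added_caps = set(container.get("added_caps", ()))
    writable = set(container.get("writable_paths", ()))

    if container.get("run_as_user", 0) == 0:
        # Root case: nginx chowns its cache dirs to the worker user, which needs CHOWN.
        if container.get("drop_all", False) and "CHOWN" not in added_caps:
            risks.append("root with CHOWN dropped: chown of /var/cache/nginx fails")
    else:
        # Non-root case: needs an unprivileged port and writable cache directories.
        if container.get("listen_port", 80) < 1024:
            risks.append("non-root on a privileged port: reconfigure to a port >= 1024")
        if "/var/cache/nginx" not in writable:
            risks.append("non-root without a writable /var/cache/nginx")
    return risks
```

For example, the Claude Sonnet 4.6 Basic configuration (root, drop ALL, only NET_BIND_SERVICE added) is flagged for the chown problem, while a uid 101 container listening on 8080 with an emptyDir at /var/cache/nginx returns no risks.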

Security Assessment (Pod Security Standards)

PSS Compliance Summary

Feature	Fable 5 Basic	Opus 4.8 Basic	Opus 4.7 Basic	Opus 4.6 Basic	Sonnet 4.6 Basic	Sonnet 5 Basic	GPT 5.5 Basic	GPT 5.4 Basic	GPT 5.6 Terra Basic	GPT 5.6 Sol Basic	Gemini 3 Flash Basic	MiniMax M2.5 Basic	MiniMax M2.7 Basic	MiniMax M3 Basic	DeepSeek V3.2 Basic	Qwen 3.6 Plus Basic	Qwen 3.7+ Basic	V4 Pro Basic	V4 Flash Basic	Kimi K2.6 Basic	K2.7 Code Basic	GLM-5.2 Basic	Qwen-35b Basic	Gemma 4 31B Basic	Mistral M3.5 Basic	HY3 Basic	Kimi K3 Basic	MiMo v2.5 Basic
runAsNonRoot	Not set	Not set	Not set	Not set	false	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set
seccompProfile	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set
allowPrivilegeEscalation: false	Not set	Not set	Not set	Not set	YES	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set
capabilities: drop ALL	Not set	Not set	Not set	Not set	YES	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set
readOnlyRootFilesystem	Not set	Not set	Not set	Not set	false	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set
Resource limits	YES	Not set	YES	YES	YES	YES	Not set	Not set	Not set	YES (500m/256Mi)	Not set	YES	YES	Not set	YES	Not set	Not set	YES	Not set	N/A (HTML)	Not set	Not set	Not set	Not set	YES	Not set	YES	YES
Probes	L+R	Not set	L+R	L+R	L+R+S	L+R	Not set	Not set	Not set	Not set	Not set	L+R	L+R	Not set	L+R	Not set	Not set	L+R	L+R	N/A (HTML)	Not set	Not set	Not set	Not set	Commented out	Not set	L+R	L+R
PSS Level	Privileged	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	Privileged	Privileged	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	Baseline	N/A	Baseline	Baseline	None	None	Privileged	Baseline	Privileged	Privileged

Feature	Fable 5 Prod	Opus 4.8 Prod	Opus 4.7 Prod	Opus 4.6 Prod	Sonnet 4.6 Prod	Sonnet 5 Prod	GPT 5.5 Prod	GPT 5.4 Prod	GPT 5.6 Terra Prod	GPT 5.6 Sol Prod	Gemini 3 Flash Prod	MiniMax M2.5 Prod	MiniMax M2.7 Prod	MiniMax M3 Prod	DeepSeek V3.2 Prod	Qwen 3.6 Plus Prod	Qwen 3.7+ Prod	V4 Pro Prod	V4 Flash Prod	Kimi K2.6 Prod	K2.7 Code Prod	GLM-5.2 Prod	Qwen-35b Prod	Gemma 4 31B Prod	Mistral M3.5 Prod	HY3 Prod	Kimi K3 Prod	MiMo v2.5 Prod
runAsNonRoot	true (uid 101)	true (uid 101)	true (uid 101)	true (uid 101)	false (uid 0!)	true (uid 101)	true (uid 101)	false	true (uid 101)	true (uid 101)	false	true	true	true (uid 101)	true	Not set	true (uid 101)	true (uid 101)	Not set	true (uid 101)	false	true (uid 101)	N/A (invalid YAML)	Not set	true (uid 101, pod-level)	true (uid 101)	Not set	true (uid 1000)
seccompProfile	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	Not set	RuntimeDefault	Not set	Not set	Not set	Not set	Not set	Not set	RuntimeDefault	RuntimeDefault	Not set	N/A (invalid YAML)	Not set	Not set	Not set	RuntimeDefault	RuntimeDefault
allowPrivilegeEscalation: false	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	Not set	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	N/A (invalid YAML)	Not set	Not set	YES	YES	YES
capabilities: drop ALL	YES	YES	YES	YES	YES (+NET_BIND)	YES	drop ALL	YES	drop ALL	drop ALL	Not set	Invalid placement	YES	YES	YES	YES (+NET_BIND)	drop ALL	drop ALL	drop ALL +NET_BIND	drop ALL	drop ALL	drop ALL	N/A (invalid YAML)	Not set	Not set	drop ALL	YES (+NET_BIND)	YES
readOnlyRootFilesystem	Not set	true	true	true	false	true	true	false	true	true	false	Not set	true	true	true	Not set	Not set	Not set	Not set	true	true	Not set	N/A (invalid YAML)	Not set	Not set	Not set	true	false
automountServiceAccountToken: false	Not set	Not set	YES	YES	YES	Not set	Not set	Not set	Not set	YES	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	Not set	YES	YES	Not set	N/A (invalid YAML)	Not set	Not set	Not set	Not set	YES
Resource limits	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	N/A (invalid YAML)	YES	YES	YES	YES	YES
Probes	L+R	L+R	L+R+S	L+R+S	L+R+S	L+R	L+R	L+R+S	L+R	S+L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	N/A (invalid YAML)	L+R	L+R	L+R	L+R	S+L+R
PSS Level	~Restricted	Restricted	Restricted	Restricted	Baseline	~Restricted	Restricted	Baseline+	Restricted	Restricted	Baseline	Baseline	~Restricted	~Restricted	~Restricted	Baseline+	~Restricted	Baseline+	Baseline+	Restricted	Baseline	~Restricted	Partial Baseline	Baseline	~Baseline	~Restricted	~Baseline	Restricted

Feature	Fable 5 Hard	Opus 4.8 Hard	Opus 4.7 Hard	Opus 4.6 Hard	Sonnet 4.6 Hard	Sonnet 5 Hard	GPT 5.5 Hard	GPT 5.4 Hard	GPT 5.6 Terra Hard	GPT 5.6 Sol Hard	Gemini 3 Flash Hard	MiniMax M2.5 Hard	MiniMax M2.7 Hard	MiniMax M3 Hard	DeepSeek V3.2 Hard	Qwen 3.6 Plus Hard	Qwen 3.7+ Hard	V4 Pro Hard	V4 Flash Hard	Kimi K2.6 Hard	K2.7 Code Hard	GLM-5.2 Hard	Qwen-35b Hard	Gemma 4 31B Hard	Mistral M3.5 Hard	HY3 Hard	Kimi K3 Hard	MiMo v2.5 Hard
runAsNonRoot	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	N/A (invalid YAML)	true	true	true	true	true	true	false	true	true	true	true
runAsUser (non-zero)	101	101	101	101	101	101	101	101	101	101	101	101	101	101	101	101	N/A (invalid YAML)	101	101	101	101	101	101	Not set	101	101	101	1000
seccompProfile	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	N/A (invalid YAML)	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault
allowPrivilegeEscalation: false	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	N/A (invalid YAML)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
capabilities: drop ALL	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES (+NET_BIND)	YES	YES	YES	N/A (invalid YAML)	drop ALL +NET_BIND	YES	YES	YES	YES	drop ALL +NET_BIND	YES	YES	YES	YES (no adds)	Yes (+NET_BIND)
readOnlyRootFilesystem	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	N/A (invalid YAML)	true	true	true	true	true	true	true	true	true	true	true
automountServiceAccountToken: false	YES	YES	YES (SA + pod)	YES	YES	YES	YES	YES	Not set	YES (SA + pod)	Not set	Commented out	Not set	YES	Not set	YES	N/A (invalid YAML)	Not set	Not set	YES	YES	Not set	Not set	Not set	Not set	Not set	YES (SA + pod)	YES
NetworkPolicy	YES (zero-trust)	YES (deny+ingress+DNS from ingress-nginx)	YES (deny+ingress+DNS)	YES (ingress+egress)	YES (ingress+egress)	YES (ingress+egress)	Not provided	Not provided	No	YES (ingress+deny egress)	YES (ingress only)	YES (ingress+egress)	Not set	YES	YES (ingress only)	Not provided	N/A (invalid YAML)	No	No	YES (deny+DNS from ingress-nginx)	YES	Not set	Not provided	Not provided	Not provided	No	YES	YES (zero-trust ingress+egress)
Resource limits	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	N/A (invalid YAML)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
Probes	L+R	L+R	L+R+S	L+R+S	L+R+S	L+R	L+R+S	L+R+S	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	N/A (invalid YAML)	L+R	L+R	L+R+S	L+R	L+R	L+R	L+R	L+R	L+R	L+R	S+L+R
PSA namespace labels	YES (restricted)	YES (restricted)	YES (restricted)	No	No	No	YES (restricted)	No	No	No	No	No	No	YES (restricted)	No	No	N/A (invalid YAML)	No	No	No	No	No	No	No	No	No	YES (restricted)	No
Image pinned	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	No (nginx:latest!)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	No (nginx:latest!)
PSS Level	Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	~Restricted	Restricted	Restricted	Restricted	Restricted	Restricted	N/A (invalid)	Restricted	Restricted	Restricted	Restricted	~Restricted	Baseline+	Baseline+	Restricted	Restricted	Restricted	Restricted

PSS Level Key:

Baseline = Meets Baseline profile (no privileged containers, no host namespaces)
Baseline+ = Baseline with some Restricted features
~Restricted = Near-Restricted but missing 1-2 requirements (typically seccompProfile)
Restricted = Fully meets Restricted profile
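
One way to turn this key into a rule is sketched below. It checks the four container-level settings tracked in the tables that the Restricted profile requires (runAsNonRoot, seccompProfile, allowPrivilegeEscalation: false, drop ALL capabilities) and assumes the pod already meets Baseline. The thresholds between Baseline+ and ~Restricted are an assumption made for illustration. The labels in the tables above may reflect additional judgment, so this sketch is not guaranteed to reproduce every cell.

```python
# Hypothetical feature names for the four Restricted-profile settings in the tables.
RESTRICTED_CHECKS = (
    "run_as_non_root",
    "seccomp_runtime_default",
    "no_privilege_escalation",
    "drop_all_capabilities",
)


def pss_label(features):
    """Label a Baseline-compliant pod by how many Restricted settings are missing."""
    missing = [name for name in RESTRICTED_CHECKS if not features.get(name, False)]
    if not missing:
        return "Restricted"
    if len(missing) <= 2:
        # Assumed threshold: the key describes ~Restricted as missing 1-2 requirements.
        return "~Restricted"
    if len(missing) < len(RESTRICTED_CHECKS):
        return "Baseline+"
    return "Baseline"


# Fable 5 Production per the summary table: everything except seccompProfile.
fable_prod = {
    "run_as_non_root": True,
    "seccomp_runtime_default": False,
    "no_privilege_escalation": True,
    "drop_all_capabilities": True,
}
print(pss_label(fable_prod))
```

readOnlyRootFilesystem and automountServiceAccountToken are left out on purpose: they are good hardening practice but are not required by the Restricted profile.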

Per-Scenario Detailed Results

1. Basic Deployment

Prompt: “Generate a Kubernetes deployment for a web application using the nginx:latest image and provide a service exposing the application.”

This is intentionally a minimal prompt — no mention of security or production use. We’re testing what models produce by default.

Category	Claude Fable 5	Claude Opus 4.8	Claude Opus 4.7	Claude Opus 4.6	Claude Sonnet 4.6	Claude Sonnet 5	GPT 5.5	GPT 5.4	GPT 5.6 Terra	GPT 5.6 Sol	Gemini 3 Flash	MiniMax M2.5	MiniMax M2.7	MiniMax M3	DeepSeek V3.2	Qwen 3.6 Plus	Qwen 3.7 Plus	DeepSeek V4 Pro	DeepSeek V4 Flash	Kimi K2.6	Kimi K2.7 Code	GLM-5.2	Qwen-35b (Local)	Gemma 4 31B (Local)	Mistral Medium 3.5	Tencent HY3	Kimi K3	MiMo v2.5
Deploys?	YES	YES	YES	YES	NO	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO (HTML)	YES	YES	YES	YES	YES (with adjustment)	YES	YES	YES
Replicas	3	3	3	3	3	3	3	2	3	3	3	2	3	3	3	3	3	3	3	N/A	3	3	3	3	3	3	3	3
Service type	LoadBalancer	ClusterIP	LoadBalancer	LoadBalancer	LoadBalancer	LoadBalancer	ClusterIP	LoadBalancer	ClusterIP	LoadBalancer	LoadBalancer	NodePort	LoadBalancer	ClusterIP	LoadBalancer	LoadBalancer	ClusterIP	ClusterIP	NodePort	N/A	ClusterIP	ClusterIP	ClusterIP	ClusterIP	ClusterIP	ClusterIP	ClusterIP	LoadBalancer
Resource limits	Yes	No	Yes	Yes (128Mi/250m)	Yes	Yes	No	No	No	Yes (500m/256Mi)	No	Yes	Yes	No	Yes	No	No	Yes	No	N/A	No	No	No	No	Yes	No	Yes	Yes
Probes	L+R	None	L+R	L+R	L+R+S	L+R	None	None	None	None	None	L+R	L+R	None	L+R	None	None	L+R	L+R	N/A	None	None	None	None	Commented out	None	L+R	L+R
Security context	None	None	None	None	Partial	None	None	None	None	None	None	None	None	None	None	None	None	None	None	N/A	None	None	None	None	None	None	None	None
Extra objects	None	None	None	None	NS, CM, HPA, Ingress	None	None	None	None	None	None	None	None	None	None	None	None	None	None	N/A	None	None	None	None	None	None	None	CM, HPA, NS

Notable observations:

Fable 5 produces a basic deployment with no security context (Privileged PSS level), but includes resource limits and liveness+readiness probes — slightly more opinionated than most models on a basic prompt. LoadBalancer service type. No extra objects.
Opus 4.8 shows appropriate restraint for a basic prompt — Deployment and Service only, no security context, no resource limits, no probes. ClusterIP service type, port 80. Minimal response matching the basic prompt.
Opus 4.7 shows appropriate restraint for a basic prompt — just a Deployment and Service with resource limits and probes. RollingUpdate strategy with maxUnavailable: 0 for zero-downtime deploys. No security context on a basic prompt — correct calibration matching Opus 4.6.
Opus 4.6 similarly restrained — Deployment and Service only, with resource limits (128Mi/250m) and liveness+readiness probes. No security context for a basic prompt, matching Opus 4.7’s calibration.
Sonnet 4.6 massively over-engineered for a basic prompt (6 objects, init container, ConfigMap, HPA, Ingress) but ironically produced a manifest that doesn’t deploy due to the capability issue
GPT 5.4 gave the most minimal correct answer — exactly what was asked, nothing more
MiniMax M2.5 added probes and resource limits without being asked, showing good production instincts (per the table above, several other models did the same)
DeepSeek V3.2 also added probes and limits; provided both split and combined file versions
Qwen 3.6 Plus minimal response — Deployment and Service only, no resource limits, no probes, no security context. Comments out probes and suggests them as optional. Good production advice in text but not in YAML.
DeepSeek V4 Pro minimal like most others — standard Deployment + Service with nginx:latest, no security context. Includes resource limits and liveness+readiness probes. ClusterIP service type.
GPT 5.5 minimal response — Deployment and Service only, no resource limits, no probes, no security context. ClusterIP service type. Appropriate restraint for a basic prompt.
DeepSeek V4 Flash standard Deployment + Service with NodePort. No security contexts, no resource limits. Includes liveness+readiness probes. Minimal response matching the basic prompt.
Kimi K2.6 responded with HTML/JavaScript code for a web application rather than Kubernetes YAML. No deployable manifest was produced. A fundamental misunderstanding of the prompt.
Qwen-35b (Local) minimal response — Deployment and Service only, no resource limits, no probes, no security context. A bare-minimum answer matching the basic prompt. ClusterIP service type.
Qwen 3.7 Plus minimal response — Deployment and Service only, no resource limits, no probes, no security context. A bare-minimum answer matching the basic prompt. ClusterIP service type.
Gemma 4 31B (Local) minimal response — Deployment and Service only, no resource limits, no probes, no security context. A bare-minimum answer matching the basic prompt. ClusterIP service type.
MiniMax M3 minimal response — standard Deployment and Service with nginx:latest, no security context, no resource limits, no probes. ClusterIP service type. Appropriate restraint for a basic prompt.
Kimi K2.7 Code minimal response — standard Deployment and Service with nginx:latest, no security context, no resource limits, no probes. ClusterIP service type, 3 replicas. Appropriate restraint for a basic prompt.
GLM-5.2 minimal response — standard Deployment and Service with nginx:latest, no security context, no resource limits, no probes. ClusterIP service type. Appropriate restraint for a basic prompt.
Mistral Medium 3.5 minimal response — standard Deployment and Service with nginx:latest, no security context, resource limits present but probes commented out. ClusterIP service type. Appropriate restraint for a basic prompt.
Sonnet 5 minimal response — standard Deployment and Service with nginx:latest, no security context but includes resource limits and liveness+readiness probes. LoadBalancer service type. Appropriate for a basic prompt.
Tencent HY3 minimal response — standard Deployment and Service with nginx:latest, no security context, no resource limits, no probes. ClusterIP service type. Appropriate restraint for a basic prompt.
GPT 5.6 Terra minimal response — standard Deployment and Service with nginx:latest, no security context, no resource limits, no probes. ClusterIP service type. Appropriate restraint for a basic prompt. However, PSS level is Privileged rather than Baseline — suggesting default container settings that don’t meet even Baseline requirements.
GPT 5.6 Sol minimal response — standard Deployment and Service with nginx:latest, 3 replicas, LoadBalancer service type. Includes resource limits (500m/256Mi) but no probes and no security context. PSS level is Privileged. Slightly more opinionated than Terra with resource limits and LoadBalancer service type.
Kimi K3 minimal response — standard Deployment and Service with nginx:latest, no security context but includes resource limits and liveness+readiness probes. Deploys fine but no security context means Privileged PSS level.
Xiaomi MiMo v2.5 deploys cleanly (LoadBalancer switched to ClusterIP for Kind) with resource limits and liveness+readiness probes but no security context (Privileged PSS level). A ConfigMap supplies an nginx default.conf; also ships an HPA and Namespace. 3 replicas.

2. Production Deployment

Prompt: “…Ensure that the deployment and service are configured suitably for production cluster use.”

Category	Claude Fable 5	Claude Opus 4.8	Claude Opus 4.7	Claude Opus 4.6	Claude Sonnet 4.6	Claude Sonnet 5	GPT 5.5	GPT 5.4	GPT 5.6 Terra	GPT 5.6 Sol	Gemini 3 Flash	MiniMax M2.5	MiniMax M2.7	MiniMax M3	DeepSeek V3.2	Qwen 3.6 Plus	Qwen 3.7 Plus	DeepSeek V4 Pro	DeepSeek V4 Flash	Kimi K2.6	Kimi K2.7 Code	GLM-5.2	Qwen-35b (Local)	Gemma 4 31B (Local)	Mistral Medium 3.5	Tencent HY3	Kimi K3	MiMo v2.5
Deploys?	YES	YES	YES	YES	NO	YES	YES	NO	YES	YES	YES	NO	NO	YES	YES	NO	NO	NO	NO	YES	YES	YES	NO	YES	NO	NO	NO	YES
runAsNonRoot	true (uid 101)	true (uid 101)	true (uid 101)	true (uid 101)	false (uid 0!)	true (uid 101)	true (uid 101)	false	true (uid 101)	true (uid 101)	false	true (broken)	true (broken)	true (uid 101)	true	Not set	true (uid 101)	true (uid 101)	Not set	true (uid 101)	false	true (uid 101)	N/A (invalid YAML)	Not set	true (uid 101, pod-level)	true (uid 101)	Not set	true (uid 1000)
Capabilities	drop ALL	drop ALL	drop ALL	drop ALL	drop ALL +NET_BIND	drop ALL	drop ALL	drop ALL	drop ALL	drop ALL	No	Invalid placement	drop ALL	drop ALL	drop ALL	drop ALL +NET_BIND	drop ALL	drop ALL	drop ALL +NET_BIND	drop ALL	drop ALL	drop ALL	N/A (invalid YAML)	No	Not set	drop ALL	drop ALL (+NET_BIND)	drop ALL
readOnlyFS	Not set	true	true	true	false	true	true	false	true	true	false	N/A	true	true	true	Not set	Not set	Not set	Not set	true	true	Not set	N/A (invalid YAML)	Not set	Not set	Not set	true	false
Probes	L+R	L+R	L+R+S	L+R+S	L+R+S	L+R	L+R	L+R+S	L+R	S+L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	L+R	N/A (invalid YAML)	L+R	L+R	L+R	L+R	S+L+R
NetworkPolicy	No	No	No	Yes (ingress+egress)	Yes	Yes	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No
PDB	No	No	Yes	Yes	Yes	Yes	No	No	No	Yes	No	No	No	No	No	Yes	No	No	No	Yes	No	No	No	No	No	No	Yes	No
HPA	No	No	Yes (CPU + memory)	Yes (CPU + memory)	Yes	Yes	No	No	No	No	No	No	No	No	No	No	No	No	No	Yes	No	No	No	No	No	No	No	Yes
ConfigMap	Yes (nginx)	Yes (port 8080)	Yes (port 8080)	Yes (port 8080)	Yes (port 8080)	Yes (port 8080)	Yes (port 8080)	No	Yes (port 8080)	Yes (port 8080)	No	No	No	Yes (port 8080)	No	No	No	No	No	Yes (port 8080)	No	Yes (port 8080)	No	No	No	No	No	Yes (port 80)

Notable observations:

Fable 5 achieves near-Restricted PSS at Production level — runAsNonRoot with uid 101, drop ALL capabilities, allowPrivilegeEscalation false. ConfigMap for nginx configuration, pinned image, ClusterIP service. Deploys successfully. Missing automountServiceAccountToken false, seccompProfile, and readOnlyRootFilesystem — close to Restricted but not quite there.
Opus 4.8 achieves full PSS Restricted at Production level — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true. ConfigMap reconfigures nginx to port 8080. Topology spread constraints. Deploys successfully. No PDB, HPA, or NetworkPolicy — leaner than Opus 4.7 but all security essentials present.
Opus 4.7 achieves full PSS Restricted at Production level, one of eight models rated Restricted at Production level in the PSS Compliance Summary without the explicit “hardened” prompt. Image pinned to nginx:1.27.2, one of the few responses to avoid :latest. ConfigMap reconfigures nginx to port 8080 with health endpoint. topologySpreadConstraints, preStop hook, ClusterIP service type. Leaner than Sonnet 4.6’s production (no Namespace, no NetworkPolicy) but all security essentials present.
Opus 4.6 also achieves PSS Restricted at Production level with uid 101, drop ALL capabilities, readOnlyRootFilesystem, seccompProfile, and automountServiceAccountToken: false. Includes NetworkPolicy (ingress+egress), PDB, HPA (CPU + memory), and ConfigMap for port 8080. One of eight models rated Restricted at Production level in the PSS Compliance Summary without the hardened prompt.
Sonnet 4.6 produced the most comprehensive response (10 objects!) but made a critical error: runAsUser: 0 with dropped capabilities. The comment even says “nginx master needs root for port binding” despite configuring port 8080 in the ConfigMap (which doesn’t need root). Self-contradictory.
GPT 5.4 acknowledged the manifest needs more work and offered to provide it — honest but incomplete
Gemini 3 Flash kept it simple and functional but lacked security hardening for a “production” prompt
MiniMax M2.5 attempted non-root but placed capabilities under the pod-level securityContext, where it is not a valid field (capabilities is only accepted at container level): a structural error
DeepSeek V3.2 was the most practical: switched to nginx:1.25-alpine, ran as non-root, read-only filesystem, and it actually works
Qwen 3.6 Plus falls into the same chown trap as Sonnet 4.6: drops ALL capabilities and adds NET_BIND_SERVICE, but runs as root on port 80. Ironically, the text notes include advice to use non-root with port 8080 — the model knows the right answer but doesn’t implement it. Includes PDB (good) but no ConfigMap or readOnlyFS.
DeepSeek V4 Pro sets runAsUser: 101 and drops ALL capabilities — good security intent. But fails to provide emptyDir volumes for /var/cache/nginx, so UID 101 cannot create subdirectories. A slightly different failure mode than the chown pitfall: ownership prevents writes rather than missing capabilities.
GPT 5.5 achieves PSS Restricted at Production level: runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true. ConfigMap reconfigures nginx to port 8080. One of eight models rated Restricted at Production level in the PSS Compliance Summary without the explicit “hardened” prompt. Liveness and readiness probes present.
DeepSeek V4 Flash drops ALL capabilities and adds NET_BIND_SERVICE, runs as root on port 80 with 3 replicas. Same chown failure pattern as Sonnet 4.6 and Qwen 3.6 Plus — nginx master can’t chown cache directories. Resource limits present. ClusterIP service.
Kimi K2.6 achieves full PSS Restricted at Production level: runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false. ConfigMap reconfigures nginx to port 8080 with health endpoint. Includes PDB and HPA. Pod Running 1/1 with healthy endpoint. One of eight models rated Restricted at Production level in the PSS Compliance Summary without the explicit “hardened” prompt.
Qwen-35b (Local) has a novel failure mode: the model generates security context fields and probes but nests them under invalid YAML keys (“container security context:” and “health checks:” instead of proper Kubernetes field names like securityContext and livenessProbe). The model clearly knows the right settings but generates structurally invalid YAML that Kubernetes cannot parse. No other model in this assessment shows this failure pattern.
Qwen 3.7 Plus sets runAsNonRoot: true with runAsUser: 101 and drops ALL capabilities. The security intent is good, but the deployment falls into the chown failure pattern seen in multiple other models: there is no ConfigMap to reconfigure nginx to port 8080, and the nginx master process cannot chown cache directories as uid 101 without the CHOWN capability. Near-Restricted PSS (missing seccompProfile) but non-functional deployment.
Gemma 4 31B (Local) minimal production response — standard Deployment and Service with nginx:latest running as root. No security context, no capabilities manipulation, no ConfigMap for port reconfiguration. Includes resource limits and liveness+readiness probes. Deploys successfully but has no security hardening. ClusterIP service type.
MiniMax M3 achieves near-Restricted PSS at Production level — runAsNonRoot with uid 101, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. Uses nginx:1.25.3 (pinned version). ConfigMap reconfigures nginx to port 8080. Full security context with emptyDir volumes. Deploys successfully with pods Ready. A significant improvement over M2.7’s broken production deployment.
Kimi K2.7 Code takes an honest approach to the root/port 80 tension — explicitly sets runAsNonRoot: false, acknowledging that nginx:latest needs root for port 80. Achieves Baseline PSS with seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false, and automountServiceAccountToken false. Deploys successfully. No ConfigMap for port reconfiguration, no NetworkPolicy, PDB, or HPA.
GLM-5.2 achieves near-Restricted PSS at Production level — runAsNonRoot with uid 101, drop ALL capabilities, allowPrivilegeEscalation false. ConfigMap for port 8080 non-root configuration. Deploys successfully. Missing seccompProfile, readOnlyRootFilesystem, and automountServiceAccountToken false — close to Restricted but not quite there.
Mistral Medium 3.5 sets runAsNonRoot: true with runAsUser: 101 at pod level — good security intent. But missing writable volumes for /var/cache/nginx/client_temp. CrashLoopBackOff from nginx cache directory permission denied. No capabilities manipulation, no readOnlyRootFilesystem, no ConfigMap for port reconfiguration. Near-Baseline PSS (~Baseline) but non-functional deployment.
Sonnet 5 achieves near-Restricted PSS at Production level — runAsNonRoot with uid 101, drop ALL capabilities, allowPrivilegeEscalation false, readOnlyRootFilesystem true. emptyDir volumes for nginx writable directories. Includes HPA, PDB, and NetworkPolicy extras. Missing seccompProfile RuntimeDefault and automountServiceAccountToken false. ConfigMap for port 8080. Still uses nginx:latest. Deploys successfully.
Tencent HY3 sets runAsNonRoot: true with runAsUser: 101 and drops ALL capabilities — good security intent. But missing writable volumes for /var/cache/nginx/client_temp. CrashLoopBackOff from nginx cache directory permission denied. Near-Restricted PSS (~Restricted) but non-functional deployment. Same failure pattern as Mistral Medium 3.5.
GPT 5.6 Terra achieves full PSS Restricted at Production level — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. ConfigMap reconfigures nginx to port 8080. Resource limits and liveness+readiness probes present. No NetworkPolicy, PDB, HPA, or automountServiceAccountToken false. Deploys successfully.
GPT 5.6 Sol achieves full PSS Restricted at Production level — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false. ConfigMap reconfigures nginx to port 8080. Startup, liveness, and readiness probes present. PDB and topologySpread constraints for availability. preStop hook for graceful shutdown. emptyDir for /tmp. Deploys successfully.
Kimi K3 drops ALL capabilities and adds NET_BIND_SERVICE with no runAsNonRoot — CrashLoopBackOff from chown failure without the CHOWN capability. Has seccompProfile RuntimeDefault, readOnlyRootFilesystem, allowPrivilegeEscalation false, and PDB. The response notes recommend an unprivileged image, but the manifest doesn’t implement one. Near-Baseline PSS (~Baseline) but non-functional deployment.
Xiaomi MiMo v2.5 bucks the field’s usual pattern — its production manifest runs nginx as a non-root user (uid 1000) with drop-ALL capabilities, seccompProfile RuntimeDefault, and allowPrivilegeEscalation false, meeting PSS Restricted while deploying and serving HTTP 200. Pinned image (nginx:1.27-alpine), dedicated ServiceAccount with automountServiceAccountToken false, HPA, and Ingress. The one blemish: it binds port 80 as a non-root user with no NET_BIND_SERVICE capability — this works on the Kind node (net.ipv4.ip_unprivileged_port_start=0) but would CrashLoopBackOff on a default-configured cluster. This latent portability fault is why the usability sub-score is docked one point.
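The MiMo v2.5 portability caveat above comes down to one kernel rule: binding a port below net.ipv4.ip_unprivileged_port_start requires the NET_BIND_SERVICE capability. The sketch below models that rule in Python. The function name and parameters are hypothetical, and the 1024 default is the stock Linux value, not something measured in this assessment.

```python
# Minimal model of the Linux rule for binding low ports.
# can_bind_port and its parameters are hypothetical names for illustration.

DEFAULT_UNPRIVILEGED_PORT_START = 1024  # stock Linux default for the sysctl


def can_bind_port(port: int, has_net_bind_service: bool,
                  unprivileged_port_start: int = DEFAULT_UNPRIVILEGED_PORT_START) -> bool:
    """Return True if a process may bind `port`.

    Ports at or above net.ipv4.ip_unprivileged_port_start are open to any
    process; lower ports need CAP_NET_BIND_SERVICE in the effective set.
    """
    if port >= unprivileged_port_start:
        return True
    return has_net_bind_service


# Node with the sysctl set to 0, as reported for the Kind node
kind_node = can_bind_port(80, has_net_bind_service=False, unprivileged_port_start=0)
# Node left at the kernel default
default_node = can_bind_port(80, has_net_bind_service=False)
# Port 8080 needs no capability under either setting
high_port = can_bind_port(8080, has_net_bind_service=False)

print(kind_node, default_node, high_port)
```

Under this model the same manifest binds port 80 on the test node and fails on a node with the default sysctl, which is why moving nginx to port 8080 sidesteps the question entirely.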

3. Hardened Production Deployment

Prompt: “…Ensure the deployment and service are properly secured and hardened.”

Category	Claude Fable 5	Claude Opus 4.8	Claude Opus 4.7	Claude Opus 4.6	Claude Sonnet 4.6	Claude Sonnet 5	GPT 5.5	GPT 5.4	GPT 5.6 Terra	GPT 5.6 Sol	Gemini 3 Flash	MiniMax M2.5	MiniMax M2.7	MiniMax M3	DeepSeek V3.2	Qwen 3.6 Plus	Qwen 3.7 Plus	DeepSeek V4 Pro	DeepSeek V4 Flash	Kimi K2.6	Kimi K2.7 Code	GLM-5.2	Qwen-35b (Local)	Gemma 4 31B (Local)	Mistral Medium 3.5	Tencent HY3	Kimi K3	MiMo v2.5
Deploys?	YES (1/3 replicas)	YES	YES	YES	YES	YES	YES	NO	YES	YES	YES	YES	YES	YES	YES	YES	NO (invalid YAML)	YES	YES	NO (404)	YES	YES	YES	NO (CrashLoop)	NO (CrashLoop)	YES	YES	NO (Init:Error)
PSS Restricted?	YES	YES	YES	YES	YES	YES	YES	YES (if it worked)	YES	YES	Almost (no seccomp)	YES	YES	YES	YES	YES	N/A (invalid)	YES	YES	YES	YES	Almost (no seccomp)	Almost (NET_BIND_SERVICE)	No (runAsNonRoot: false)	YES	YES	YES	YES
runAsNonRoot	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	N/A	true	true	true	true	true	true	false	true	true	true	true
seccompProfile	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	N/A	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	Not set	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault	RuntimeDefault
drop ALL caps	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes (+NET_BIND)	Yes	Yes	Yes	Yes	N/A	Yes	Yes (+NET_BIND)	Yes	Yes	Yes	Yes (+NET_BIND)	Yes	Yes	Yes	Yes (no adds)	Yes (+NET_BIND)
readOnlyFS	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	true	N/A	true	true	true	true	true	true	true	true	true	true	true
Port 8080	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (NO ConfigMap!)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (unprivileged image)	No (port 80)	Yes (with ConfigMap)	No (port 80, nginx:latest!)	No (port 80)	Yes (with ConfigMap)	N/A	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)	No (port 80)	No (port 80)	No (port 80)	Yes (with ConfigMap)	Yes (with ConfigMap)	Yes (with ConfigMap)
NetworkPolicy	Yes (zero-trust)	Yes (deny+ingress+DNS from ingress-nginx)	Yes (deny+ingress+DNS)	Yes (ingress+egress)	Yes (ingress+egress)	Yes (ingress+egress)	No	No	No	Yes (ingress+deny egress)	Yes (ingress)	Yes (ingress+egress)	No	Yes	Yes (ingress)	No	N/A	No	No	Yes (deny+DNS from ingress-nginx)	Yes	No	No	No	No	No	Yes	Yes
automountSAToken: false	Yes	Yes	Yes (SA + pod)	Yes	Yes	Yes	Yes	Yes	Not set	Yes (SA + pod)	No	Commented out	No	Yes	No	Yes	N/A	Not set	Not set	Yes	Yes	Not set	No	Not set	Not set	Not set	Yes (SA + pod)	Yes
PDB	No	Yes	Yes	Yes	Yes	No	Yes	Yes	No	No	No	Yes	No	No	Yes	No	N/A	No	No	Yes	Yes	No	No	No	No	No	Yes	Yes
HPA	No	No	Yes	Yes	Yes	No	No	Yes	No	No	No	No	No	No	Yes	No	N/A	No	No	No	Yes	No	No	No	No	No	No	Yes
ConfigMap	Yes (comprehensive)	Yes (comprehensive)	Yes (comprehensive)	Yes (comprehensive)	Yes (comprehensive)	Yes (comprehensive)	Yes (defaultMode 0444)	No	Yes	Yes (comprehensive, defaultMode 0444)	No	No	Yes (with security headers)	No	Yes (comprehensive)	Yes (port 8080, server_tokens off)	N/A (truncated)	Yes	Yes	Yes (full nginx.conf)	Yes	Yes (port 8080)	No	No	No	Yes	Yes (comprehensive)	Yes (comprehensive)
Nginx security headers	No	Yes (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, server_tokens off)	Yes (6 headers)	Yes (7 headers)	Yes (7 headers)	No	Yes	No	No	Yes (X-Content-Type-Options, X-Frame-Options, Referrer-Policy)	No	No	Yes	No	Yes (4 headers)	No	N/A	No	No	Yes (6 headers + server_tokens off)	No	No	No	No	No	No	No	No
Rate limiting	No	No	No	Yes	Yes	No	No	No	No	No	No	No	Yes	No	Yes	No	N/A	No	No	No	No	No	No	No	No	No	No	Yes (invalid directive)
PSA namespace labels	Yes (restricted)	Yes (restricted)	Yes (restricted)	No	No	No	Yes (restricted)	No	No	No	No	No	No	Yes (restricted)	No	No	N/A	No	No	No	No	No	No	No	No	No	Yes (restricted)	No
ServiceAccount	Yes	Yes	Yes	No	No	Yes	No	No	No	Yes	No	No	No	No	No	No	N/A	No	No	No	Yes	No	No	No	No	No	Yes	Yes
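As a rough illustration of how the security-context rows above map onto the Restricted profile, the following Python sketch checks a pod-level and a container-level securityContext, represented as plain dicts. It is a partial check under stated assumptions: only the controls shown are modelled (no volume-type, host-namespace, or Baseline rules), and the function name is hypothetical. readOnlyRootFilesystem is left out because the upstream profile does not list it, and NET_BIND_SERVICE is treated as the one capability the profile lets a container add back.

```python
# Partial check of the Restricted profile's security-context controls.
# Assumption: only the controls below are modelled; names are hypothetical.

ALLOWED_ADDED_CAPS = {"NET_BIND_SERVICE"}  # the only capability Restricted lets a container add back


def restricted_gaps(pod_sc: dict, container_sc: dict) -> list:
    """Return the names of the modelled controls that this container fails."""

    def effective(key):
        # A container-level value overrides the pod-level one when both are set.
        return container_sc.get(key, pod_sc.get(key))

    gaps = []
    if effective("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if effective("runAsUser") == 0:
        gaps.append("runAsUser")
    seccomp = effective("seccompProfile") or {}
    if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
        gaps.append("seccompProfile")
    if container_sc.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation")
    capabilities = container_sc.get("capabilities") or {}
    if "ALL" not in capabilities.get("drop", []):
        gaps.append("capabilities.drop")
    if set(capabilities.get("add", [])) - ALLOWED_ADDED_CAPS:
        gaps.append("capabilities.add")
    return gaps


pod = {"runAsNonRoot": True, "runAsUser": 101,
       "seccompProfile": {"type": "RuntimeDefault"}}
container = {"allowPrivilegeEscalation": False,
             "capabilities": {"drop": ["ALL"]}}

full = restricted_gaps(pod, container)
no_seccomp = restricted_gaps({"runAsNonRoot": True, "runAsUser": 101}, container)
root_user = restricted_gaps({**pod, "runAsNonRoot": False, "runAsUser": 0}, container)

print(full, no_seccomp, root_user)
```

The three sample inputs correspond to the "YES", "Almost (no seccomp)", and "No (runAsNonRoot: false)" cells in the table.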

Notable observations:

Fable 5 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false, dedicated ServiceAccount. ConfigMap with port 8080. PSA namespace labels with restricted enforce/audit/warn. NetworkPolicy with zero-trust default-deny. Deploys but only 1/3 replicas schedule on Kind due to hard anti-affinity rules — a usability issue on single-node clusters. No PDB, HPA, security headers, or rate limiting.
Opus 4.8 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false, dedicated ServiceAccount. ConfigMap replaces nginx.conf with port 8080, security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, server_tokens off). PSA namespace labels with restricted enforce/audit/warn. NetworkPolicy with default-deny plus DNS egress and ingress scoped to ingress-nginx namespace. PDB present. No HPA or rate limiting.
Opus 4.7 creates a Namespace with PSA restricted enforce/audit/warn labels — the first model to add cluster-level enforcement beyond pod-level security contexts. ConfigMap replaces the entire nginx.conf with comprehensive hardening (temp paths, server_tokens off, 6 security headers). emptyDir volumes with sizeLimit and medium: Memory. ConfigMap mounted with defaultMode: 0444. NetworkPolicy uses default-deny base with ingress from ingress-nginx namespace and DNS egress targeting kube-dns pod selector. Uses app.kubernetes.io/name label convention. Missing: rate limiting, HTTP method restriction, hidden file blocking.
Opus 4.6 achieves full PSS Restricted with comprehensive ConfigMap (port 8080, security headers including 7 headers), rate limiting, NetworkPolicy (ingress+egress), PDB, HPA, and automountServiceAccountToken: false. Deploys successfully. No PSA namespace labels or dedicated ServiceAccount.
Sonnet 4.6 finally gets it right here — proper non-root (uid 101), ConfigMap with port 8080 and PID in /tmp, comprehensive security headers, rate limiting, memory-backed emptyDirs with size limits. The most complete security implementation.
GPT 5.4 has perfect security settings on paper but critically fails to provide the ConfigMap needed to make nginx listen on port 8080. Acknowledges this in a caveat note — but the manifest as-delivered does not work.
Gemini 3 Flash took the smartest approach: used nginxinc/nginx-unprivileged:stable-alpine which natively runs non-root on 8080 — no ConfigMap needed. But missed seccompProfile and some features.
MiniMax M2.5 used nginx:1.21 (pinned version, good) but ran on port 80 as non-root. Placed topologySpreadConstraints under affinity (invalid placement). Applied PSS enforce label to namespace (good). Left automountServiceAccountToken: false commented out (half-hearted).
DeepSeek V3.2 provided a comprehensive ConfigMap but had limit_req_zone inside the server block in default.conf (invalid — must be in http block). Also included deprecated seccomp annotation alongside the modern field. Added NET_BIND_SERVICE (unnecessary if they’d used port 8080).
Qwen 3.6 Plus redeems itself completely here — full PSS Restricted with uid 101, ConfigMap for port 8080 with server_tokens off, drop ALL capabilities (no NET_BIND needed), readOnlyRootFilesystem with 5 emptyDir volumes (cache, run, tmp, log), automountServiceAccountToken: false. Clean implementation referencing CIS Benchmark and NSA/CISA guidelines. No NetworkPolicy, PDB, or security headers — functional but not the most feature-rich.
DeepSeek V4 Pro achieves full PSS Restricted — nginx:latest with ConfigMap for port 8080, runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, emptyDir volumes for cache and run directories. Learns from its Production failure and gets all the non-root pieces right. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or rate limiting.
GPT 5.5 achieves full PSS Restricted with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, automountServiceAccountToken false, and PSA namespace labels (restricted enforce/audit/warn) — the second model (after Opus 4.7) to add cluster-level enforcement. ConfigMap for port 8080 with defaultMode 0444. Security headers present. startupProbe in addition to liveness+readiness. topologySpreadConstraints and PDB for availability. ephemeral-storage limits. A comprehensive hardened deployment that matches or exceeds most models on feature coverage.
DeepSeek V4 Flash achieves full PSS Restricted — uid 101, readOnlyRootFilesystem, seccomp RuntimeDefault, drop ALL capabilities (+NET_BIND_SERVICE), emptyDir volumes for /tmp, /var/cache/nginx, /var/run. ConfigMap for port 8080. Pod anti-affinity for scheduling. ClusterIP service. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or rate limiting — functional but not feature-rich.
Kimi K2.6 achieves full PSS Restricted with comprehensive security — uid 101, seccomp RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, automountServiceAccountToken false. Full nginx.conf ConfigMap with 6 security headers and server_tokens off. NetworkPolicy with default-deny and DNS egress from ingress-nginx namespace. PDB present. However, try_files $uri $uri/ =404; in nginx.conf causes the root path to return 404, which fails the startup probe. Pod runs but never becomes Ready.
Qwen-35b (Local) produces a genuinely good hardened deployment — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, allowPrivilegeEscalation false, drop ALL capabilities (adds NET_BIND_SERVICE, which this assessment counts as the only thing preventing full PSS Restricted, although the upstream Restricted profile permits adding back that one capability), readOnlyRootFilesystem true. Uses nginx:1.25.3-alpine (a good pinned image choice). Does NOT use nginxinc/nginx-unprivileged — instead runs standard nginx as uid 101 on port 80 (valid alternative). Liveness and readiness probes present. Resource limits set. No ConfigMap, NetworkPolicy, PDB, HPA, automountServiceAccountToken, or security headers. A clean, minimal hardened deployment that works.
Qwen 3.7 Plus intent appears to be full PSS Restricted (ConfigMap for port 8080, uid 101, seccomp RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem), but the YAML is malformed — ConfigMap nginx.conf is truncated/corrupted mid-stream, and the Deployment resource definition is missing apiVersion/kind/metadata. The manifest cannot be parsed by kubectl, so no deployment is possible. This is a unique failure mode: not a configuration error but a YAML generation/output truncation issue.
Gemma 4 31B (Local) has good security intent in the hardened deployment — drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false, seccompProfile RuntimeDefault, volume mounts for cache/run/tmp, and resource limits. However, runAsNonRoot: false leaves the container running as the image's default user, root (uid 0). With ALL capabilities dropped, nginx:latest cannot chown /var/cache/nginx/client_temp — the same failure mode as Claude Sonnet 4.6 and Qwen 3.6 Plus in production. The model recommended using nginxinc/nginx-unprivileged in its comments but did not implement it. Stays on port 80 with no ConfigMap, and has no NetworkPolicy, PDB, HPA, automountServiceAccountToken, or security headers.
MiniMax M3 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false, automountServiceAccountToken false. PSA namespace labels with restricted enforce/audit/warn. NetworkPolicy present. However, uses nginx:latest (not pinned!) on port 80 without a ConfigMap — despite the comprehensive security context, this is a surprising gap. Deploys successfully with pods Ready despite the port mismatch, suggesting the pod runs but may have accessibility issues.
Kimi K2.7 Code achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false, dedicated ServiceAccount. ConfigMap for port 8080. NetworkPolicy, PDB, and HPA present. Deploys successfully. No security headers, rate limiting, or PSA namespace labels.
GLM-5.2 achieves near-Restricted PSS with uid 101, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. ConfigMap for port 8080 non-root configuration. Deploys successfully. The key gap is missing seccompProfile — without it, the deployment reaches ~Restricted but not full Restricted. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or PSA namespace labels.
Mistral Medium 3.5 achieves full PSS Restricted — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. However, only /tmp mounted as emptyDir — missing writable volumes for /var/cache/nginx, /var/run. nginx cannot create cache directories on the read-only filesystem. CrashLoopBackOff. No ConfigMap for port reconfiguration (runs on port 80), no NetworkPolicy, PDB, HPA, security headers, or automountServiceAccountToken.
Sonnet 5 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, automountServiceAccountToken false, dedicated ServiceAccount. ConfigMap reconfigures nginx to non-privileged port 8080. Pinned image tag (nginx:1.27.2). NetworkPolicy with restrictive ingress and DNS-only egress. Deploys successfully. No security headers, rate limiting, PDB, HPA, or PSA namespace labels.
Tencent HY3 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. ConfigMap reconfigures nginx to port 8080. emptyDir volumes for writable directories. Deploys successfully. No NetworkPolicy, automountServiceAccountToken, PDB, HPA, security headers, or PSA namespace labels — functional but not feature-rich.
GPT 5.6 Terra achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem true, allowPrivilegeEscalation false. ConfigMap reconfigures nginx to port 8080. Resource limits and liveness+readiness probes present. Deploys successfully. No NetworkPolicy, automountServiceAccountToken false, PDB, HPA, security headers, rate limiting, or PSA namespace labels.
GPT 5.6 Sol achieves full PSS Restricted with the most comprehensive hardening of any GPT model — runAsNonRoot with uid 101, seccompProfile RuntimeDefault, drop ALL capabilities (explicit privileged: false), readOnlyRootFilesystem true, automountServiceAccountToken false at both ServiceAccount and pod level, dedicated ServiceAccount. ConfigMap reconfigures nginx to port 8080 with security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy) and defaultMode 0444. NetworkPolicy with ingress restricted to same namespace on port 8080 and deny-all egress. podAntiAffinity and dual topologySpreadConstraints (zone+hostname). fsGroupChangePolicy OnRootMismatch, command to bypass entrypoint, subPath ConfigMap mount. Deploys successfully. No PDB, HPA, rate limiting, or PSA namespace labels.
Kimi K3 achieves full PSS Restricted with comprehensive hardening — runAsNonRoot with uid 101 (runAsGroup 101, fsGroup 101), seccompProfile RuntimeDefault, drop ALL capabilities (no adds), readOnlyRootFilesystem true, automountServiceAccountToken false at both ServiceAccount and pod level. ConfigMap reconfigures nginx to port 8080. PSA namespace labels with restricted enforce/audit/warn. Dedicated ServiceAccount. NetworkPolicy and PDB present. Deploys successfully.
Xiaomi MiMo v2.5 intends full PSS Restricted with comprehensive extras (Namespace, ResourceQuota, LimitRange, NetworkPolicy, Secrets, dedicated ServiceAccount, HPA, PDB) and a ConfigMap moving nginx to port 8080 with NET_BIND_SERVICE — but fails to deploy for two independent reasons: (1) an init-chown-data init container runs chown after dropping CAP_CHOWN while non-root (uid 1000), producing Operation not permitted → Init:Error; and (2) the nginx.conf uses an invalid rate_limit burst=20 nodelay; directive (should be limit_req) that fails the config test. Uses nginx:latest (unpinned) in the hardened scenario. The security intent is excellent but the manifest is not deployable as written.
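Several of the failures above (Mistral Medium 3.5 here, and the Production failures attributed to missing emptyDir volumes) share one cause: nginx needs writable directories that the manifest never mounts. A minimal Python sketch of that check follows. The path list is taken from the directories named in these observations and is an assumption, not an exhaustive list for every nginx image; the function name is hypothetical.

```python
# Directories the observations above name as needing writable mounts for nginx.
# Assumption: this list is illustrative and not exhaustive for every image.
NGINX_WRITABLE_PATHS = ("/var/cache/nginx", "/var/run", "/tmp")


def missing_writable_paths(mount_paths):
    """Return the nginx paths that no mount in `mount_paths` covers."""
    missing = []
    for needed in NGINX_WRITABLE_PATHS:
        covered = any(
            needed == mount.rstrip("/") or needed.startswith(mount.rstrip("/") + "/")
            for mount in mount_paths
        )
        if not covered:
            missing.append(needed)
    return missing


# Only /tmp mounted, as described for the Mistral Medium 3.5 hardened manifest
only_tmp = missing_writable_paths(["/tmp"])
# The three mounts listed for DeepSeek V4 Flash
all_three = missing_writable_paths(["/tmp", "/var/cache/nginx", "/var/run"])

print(only_tmp)
print(all_three)
```

A non-empty result flags a manifest that is likely to crash at startup when the root filesystem is read-only or the container runs as uid 101.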

Overall Scoring

Usability Score (out of 5)

Model	Basic	Production	Hardened	Total	Average
Claude Opus 4.8	5	5	5	15	5.0
Claude Opus 4.7	5	5	5	15	5.0
Claude Opus 4.6	5	5	5	15	5.0
Claude Sonnet 4.6	1	1	5	7	2.3
Claude Sonnet 5	5	5	5	15	5.0
Claude Fable 5	4	4	3	11	3.7
GPT 5.5	5	5	5	15	5.0
GPT 5.4	5	1	1	7	2.3
Gemini 3 Flash	5	5	5	15	5.0
Kimi K2.6	1	5	3	9	3.0
Kimi K3	5	1	5	11	3.7
MiniMax M2.5	5	1	4	10	3.3
MiniMax M2.7	5	1	5	11	3.7
MiniMax M3	5	4	4	13	4.3
DeepSeek V3.2	5	5	5	15	5.0
Qwen 3.6 Plus	5	1	5	11	3.7
Qwen 3.7 Plus	5	3	2	10	3.3
DeepSeek V4 Pro	5	1	5	11	3.7
DeepSeek V4 Flash	5	1	5	11	3.7
Qwen-35b (Local)	5	2	4	11	3.7
Kimi K2.7 Code	5	5	5	15	5.0
GLM-5.2	4	4	4	12	4.0
Gemma 4 31B (Local)	5	5	1	11	3.7
Mistral Medium 3.5	5	1	1	7	2.3
Tencent HY3	5	2	5	12	4.0
GPT 5.6 Terra	5	5	5	15	5.0
GPT 5.6 Sol	5	5	5	15	5.0
Xiaomi MiMo v2.5	5	4	1	10	3.3

Scoring key: 5=deploys and works perfectly, 4=deploys with minor issues, 3=deploys but significant issues, 1=does not deploy (CrashLoopBackOff/probe failure)

Security Score (out of 5)

Model	Basic	Production	Hardened	Total	Average
Claude Opus 4.8	1	4	5	10	3.3
Claude Opus 4.7	1	5	5	11	3.7
Claude Opus 4.6	1	5	5	11	3.7
Claude Sonnet 4.6	3	3	5	11	3.7
Claude Sonnet 5	2	4	5	11	3.7
Claude Fable 5	1	5	5	11	3.7
GPT 5.5	1	5	5	11	3.7
Kimi K2.6	1	5	5	11	3.7
Kimi K3	1	4	5	10	3.3
GPT 5.4	1	3	5	9	3.0
Gemini 3 Flash	1	2	4	7	2.3
MiniMax M2.5	1	2	4	7	2.3
MiniMax M2.7	1	3	5	9	3.0
MiniMax M3	1	4	5	10	3.3
DeepSeek V3.2	1	4	5	10	3.3
Qwen 3.6 Plus	1	3	5	9	3.0
Qwen 3.7 Plus	1	4	4	9	3.0
DeepSeek V4 Pro	1	3	5	9	3.0
DeepSeek V4 Flash	1	3	5	9	3.0
Qwen-35b (Local)	1	2	4	7	2.3
Kimi K2.7 Code	1	3	5	9	3.0
GLM-5.2	1	3	4	8	2.7
Gemma 4 31B (Local)	1	1	3	5	1.7
Mistral Medium 3.5	1	3	5	9	3.0
Tencent HY3	1	3	5	9	3.0
GPT 5.6 Terra	1	5	5	11	3.7
GPT 5.6 Sol	1	5	5	11	3.7
Xiaomi MiMo v2.5	1	5	5	11	3.7

Scoring key: 5=PSS Restricted compliant, 4=near-Restricted (missing 1-2 items), 3=significant hardening but fails Restricted, 2=minimal hardening, 1=no security context

Combined Score (Usability + Security, out of 10)

Model	Basic	Production	Hardened	Total	Average
GPT 5.6 Sol	6	10	10	26	8.7
GPT 5.6 Terra	6	10	10	26	8.7
Claude Opus 4.7	6	10	10	26	8.7
GPT 5.5	6	10	10	26	8.7
Claude Opus 4.6	6	10	10	26	8.7
Claude Sonnet 5	7	9	10	26	8.7
Claude Opus 4.8	6	9	10	25	8.3
DeepSeek V3.2	6	9	10	25	8.3
Kimi K2.7 Code	6	8	10	24	8.0
MiniMax M3	6	8	9	23	7.7
Gemini 3 Flash	6	7	9	22	7.3
Claude Fable 5	5	9	8	22	7.3
Tencent HY3	6	5	10	21	7.0
Kimi K3	6	5	10	21	7.0
Xiaomi MiMo v2.5	6	9	6	21	7.0
Qwen 3.6 Plus	6	4	10	20	6.7
DeepSeek V4 Pro	6	4	10	20	6.7
DeepSeek V4 Flash	6	4	10	20	6.7
MiniMax M2.7	6	4	10	20	6.7
Kimi K2.6	2	10	8	20	6.7
GLM-5.2	5	7	8	20	6.7
Qwen 3.7 Plus	6	7	6	19	6.3
Claude Sonnet 4.6	4	4	10	18	6.0
Qwen-35b (Local)	6	4	8	18	6.0
MiniMax M2.5	6	3	8	17	5.7
GPT 5.4	6	4	6	16	5.3
Gemma 4 31B (Local)	6	6	4	16	5.3
Mistral Medium 3.5	6	4	6	16	5.3
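The combined table is derived arithmetic: each scenario cell is the usability score plus the security score, the total is out of 30, and the average is the total divided by three, rounded to one decimal. The Python sketch below reproduces three rows copied from the tables above; the helper name is hypothetical.

```python
# Rows copied from the Usability and Security tables:
# model -> (Basic, Production, Hardened)
usability = {
    "Claude Sonnet 5": (5, 5, 5),
    "Claude Fable 5": (4, 4, 3),
    "MiniMax M2.5": (5, 1, 4),
}
security = {
    "Claude Sonnet 5": (2, 4, 5),
    "Claude Fable 5": (1, 5, 5),
    "MiniMax M2.5": (1, 2, 4),
}


def combined_row(model):
    """Per-scenario sums, their total (out of 30), and the mean rounded to one decimal."""
    per_scenario = tuple(u + s for u, s in zip(usability[model], security[model]))
    total = sum(per_scenario)
    return per_scenario, total, round(total / len(per_scenario), 1)


for name in usability:
    print(name, *combined_row(name))
```

Running the same function over every row is a quick way to catch transcription slips when a new model is added to the tables.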

Key Findings

Co-Leaders: GPT 5.6 Sol, GPT 5.6 Terra, Claude Opus 4.7, Claude Sonnet 5, GPT 5.5, and Claude Opus 4.6 (8.7 average)

All six achieve 26/30 combined with all 3 manifests deploying successfully. GPT 5.6 Sol, GPT 5.6 Terra, Opus 4.7, GPT 5.5, and Opus 4.6 score perfect 10/10 on Production and Hardened; Sonnet 5 scores 9/10 on Production (near-Restricted) and 10/10 on Hardened
GPT 5.6 Sol joins the co-leaders with the most comprehensive hardened deployment of any GPT model — full PSS Restricted with dedicated ServiceAccount, automountServiceAccountToken false at both SA and pod level, NetworkPolicy (ingress+deny egress), security headers, dual topologySpreadConstraints (zone+hostname), podAntiAffinity, fsGroupChangePolicy OnRootMismatch, ConfigMap defaultMode 0444, and explicit privileged: false. Production deployment also achieves full PSS Restricted with PDB, topologySpread, and preStop hook
GPT 5.6 Terra joins the co-leaders with perfect deployability (3/3), PSS Restricted at both Production and Hardened levels, and comprehensive security settings. ConfigMap for port 8080 at both Production and Hardened levels. Basic deployment is minimal (Privileged PSS) but deploys cleanly
Opus 4.7, GPT 5.5, GPT 5.6 Terra, GPT 5.6 Sol, Opus 4.6, Kimi K2.6, and Xiaomi MiMo v2.5 reach PSS Restricted at Production level — the only models to achieve Restricted without the explicit “hardened” prompt. Sonnet 5 reaches near-Restricted at Production (missing seccompProfile)
GPT 5.5 joins Opus 4.7 as only the second model to include PSA namespace labels in the Hardened scenario, demonstrating cluster-level enforcement awareness. ConfigMap with defaultMode 0444, topologySpreadConstraints, PDB, startupProbe, and ephemeral-storage limits show comprehensive production knowledge
Sonnet 5 matches the co-leaders with perfect Production and Hardened deployability plus near-Restricted Production security. Hardened deployment achieves full PSS Restricted with pinned image tag (nginx:1.27.2), ConfigMap for port 8080, dedicated ServiceAccount, NetworkPolicy, and automountServiceAccountToken false
Opus 4.7 brings distinctive strengths: image pinning (nginx:1.27.2 in Production, one of the few models to avoid :latest), PSA namespace labels in Hardened (the first model with cluster-level enforcement), ConfigMap defaultMode: 0444, emptyDir with Memory medium + sizeLimit
Opus 4.6 retains advantages in: rate limiting, HTTP method restriction, hidden file blocking, more security headers (7 vs 6)
Opus 4.7 and Opus 4.6 both show excellent prompt sensitivity: minimal response for basic prompt, comprehensive for production/hardened
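Place labels depend on how ties are handled. The Python sketch below applies standard competition ranking (tied models share a place and the following place is skipped) to the combined totals copied from the Combined Score table. The choice of convention is an assumption, since the assessment does not name its method, and the function name is hypothetical.

```python
# Combined totals (out of 30) copied from the Combined Score table.
totals = {
    "GPT 5.6 Sol": 26, "GPT 5.6 Terra": 26, "Claude Opus 4.7": 26,
    "GPT 5.5": 26, "Claude Opus 4.6": 26, "Claude Sonnet 5": 26,
    "Claude Opus 4.8": 25, "DeepSeek V3.2": 25, "Kimi K2.7 Code": 24,
    "MiniMax M3": 23, "Gemini 3 Flash": 22, "Claude Fable 5": 22,
    "Tencent HY3": 21, "Kimi K3": 21, "Xiaomi MiMo v2.5": 21,
    "Qwen 3.6 Plus": 20, "DeepSeek V4 Pro": 20, "DeepSeek V4 Flash": 20,
    "MiniMax M2.7": 20, "Kimi K2.6": 20, "GLM-5.2": 20,
    "Qwen 3.7 Plus": 19, "Claude Sonnet 4.6": 18, "Qwen-35b (Local)": 18,
    "MiniMax M2.5": 17, "GPT 5.4": 16, "Gemma 4 31B (Local)": 16,
    "Mistral Medium 3.5": 16,
}


def competition_ranks(scores):
    """Standard competition ranking: a model's place is 1 + the number of strictly higher scores."""
    ordered = sorted(scores.values(), reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        # Keep only the first (best) position at which each score appears.
        first_position.setdefault(score, position)
    return {model: first_position[score] for model, score in scores.items()}


ranks = competition_ranks(totals)
leaders = sorted(model for model, rank in ranks.items() if rank == 1)

print(leaders)
print(ranks["Kimi K2.7 Code"], ranks["MiniMax M3"], ranks["GLM-5.2"])
```

Recomputing places from the totals each time a model is added keeps the per-model headings from drifting out of step with the table.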

Claude Fable 5: Tied 11th Place (7.3 average, tied with Gemini 3 Flash)

All 3 manifests deploy successfully — the 8th model to achieve a 3/3 pass rate
Basic deployment has no security context (Privileged PSS level) but includes resource limits and liveness+readiness probes — LoadBalancer service type
Production deployment achieves near-Restricted PSS with uid 101, drop ALL capabilities, allowPrivilegeEscalation false, pinned image, ConfigMap for nginx. Missing seccompProfile, readOnlyRootFilesystem, and automountServiceAccountToken false
Hardened deployment achieves full PSS Restricted with comprehensive features: PSA namespace labels (restricted enforce/audit/warn), NetworkPolicy with zero-trust default-deny, dedicated ServiceAccount, automountServiceAccountToken false. However, hard anti-affinity rules mean only 1/3 replicas schedule on Kind’s single-node cluster
Main gap: the anti-affinity usability issue on the hardened deployment and missing security headers, rate limiting, and PDB

Kimi K2.7 Code: 9th Place (8.0 average)

All 3 manifests deploy successfully — a strong 3/3 pass rate
Basic deployment is minimal and appropriate — no security context, ClusterIP service, 3 replicas
Production deployment takes an honest approach to root/port 80: explicitly sets runAsNonRoot: false rather than pretending non-root works with nginx:latest on port 80. Full security context with seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, automountServiceAccountToken false. Baseline PSS.
Hardened deployment achieves full PSS Restricted with ConfigMap for port 8080, uid 101, dedicated ServiceAccount, NetworkPolicy, PDB, HPA. Comprehensive hardening without security headers or PSA namespace labels
A significant improvement over Kimi K2.6 (6.7 average): fixes the HTML-instead-of-YAML basic response and the try_files 404 bug in hardened

GLM-5.2: Tied 16th Place (6.7 average, tied with Qwen 3.6 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, MiniMax M2.7, and Kimi K2.6)

All 3 manifests deploy successfully — a 3/3 pass rate
Basic deployment is minimal and appropriate — no security context, ClusterIP service, 3 replicas
Production deployment achieves near-Restricted PSS with uid 101, drop ALL capabilities, ConfigMap for port 8080. Missing seccompProfile, readOnlyRootFilesystem, and automountServiceAccountToken false
Hardened deployment reaches ~Restricted — all settings correct except seccompProfile. ConfigMap for port 8080, readOnlyRootFilesystem, drop ALL capabilities. The missing seccompProfile is the only gap preventing full Restricted
Consistent 4/5 usability across all scenarios, but security never reaches top tier due to the persistent seccompProfile gap

Tencent HY3: Tied 13th Place (7.0 average, tied with Kimi K3 and Xiaomi MiMo v2.5)

2/3 manifests deploy successfully — Basic and Hardened pass, Production fails
Basic deployment is minimal with no security context — appropriate for a basic prompt
Production deployment sets runAsNonRoot with uid 101 and drops ALL capabilities but fails to provide writable volumes for nginx cache directories — CrashLoopBackOff. Same failure pattern as Mistral Medium 3.5
Hardened deployment achieves full PSS Restricted with ConfigMap for port 8080, uid 101, seccompProfile RuntimeDefault, readOnlyRootFilesystem true, emptyDir volumes. Deploys successfully
The model clearly knows how to implement security correctly (hardened proves it) but is inconsistent at the production level — the missing emptyDir volumes are the only gap

Kimi K3: Tied 13th Place (7.0 average, tied with Tencent HY3 and Xiaomi MiMo v2.5)

2/3 manifests deploy successfully — Basic and Hardened pass, Production fails
Basic deployment is minimal with no security context but includes resource limits and probes — Privileged PSS level
Production deployment drops ALL capabilities and adds NET_BIND_SERVICE but no runAsNonRoot — CrashLoopBackOff from chown failure. Has seccompProfile RuntimeDefault, readOnlyRootFilesystem, PDB. The notes recommend an unprivileged image, but the manifest doesn’t implement one
Hardened deployment achieves full PSS Restricted with ConfigMap for port 8080, uid 101 (runAsGroup/fsGroup 101), seccompProfile RuntimeDefault, readOnlyRootFilesystem, automountServiceAccountToken false at both SA and pod level, dedicated ServiceAccount, PSA namespace labels (restricted), NetworkPolicy, PDB
Main gap: Production deployment fails from the familiar capability/chown tension, and no security headers or rate limiting in Hardened

Xiaomi MiMo v2.5: Tied 13th Place (7.0 average, tied with Tencent HY3 and Kimi K3)

2/3 manifests deploy — but with the reverse of the usual pattern: Basic and Production pass, Hardened fails
Basic deployment is minimal with no security context (Privileged PSS) — appropriate for the prompt
Production deployment is a genuine strength: runs as a non-root user (uid 1000) with drop-ALL capabilities, seccompProfile RuntimeDefault, and allowPrivilegeEscalation false — full PSS Restricted, and it deploys and serves HTTP 200. Docked one usability point for binding port 80 without NET_BIND_SERVICE (works on the test node because net.ipv4.ip_unprivileged_port_start is 0, but is a portability risk on a default cluster)
Hardened deployment fails to deploy from two self-inflicted bugs: an init container that runs chown after dropping CAP_CHOWN (Operation not permitted → Init:Error), and an invalid rate_limit nginx directive that fails the config test. The security design is full Restricted, but the manifest is not deployable as written
The knowledge is clearly there (non-root operation, capability management, the full Restricted control set); what is missing is the final validation step on both the production and hardened manifests

MiniMax M3: 10th Place (7.7 average)

The strongest MiniMax result yet, showing a clear improvement trajectory: M2.5 (17) -> M2.7 (20) -> M3 (23)
All 3 manifests deploy successfully — the first MiniMax model to achieve a 3/3 pass rate
Production deployment finally works: runAsNonRoot with uid 101, pinned nginx:1.25.3, ConfigMap for port 8080, readOnlyRootFilesystem, drop ALL capabilities. Fixes M2.7’s broken production deployment
Hardened manifest achieves full PSS Restricted with PSA namespace labels (only the 4th model to include these) and automountServiceAccountToken: false
Main gap: uses nginx:latest (unpinned) in the hardened scenario on port 80 without a ConfigMap — a surprising regression from the pinned image in production. No security headers, rate limiting, or PDB in hardened
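
The PSA namespace labels mentioned above are the Pod Security Admission labels that make the cluster enforce a Pod Security Standard for every pod in a namespace. A minimal sketch, assuming a hypothetical namespace name (the tested manifests may have used different names and label combinations):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: web-hardened              # hypothetical namespace name
  labels:
    # Reject pods that violate the Restricted standard.
    pod-security.kubernetes.io/enforce: restricted
    # Also record audit annotations and show client warnings for violations.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

With enforce set to restricted, a Deployment whose pod template falls short of Restricted is still accepted, but its pods are rejected at creation time.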

Previous Leader: DeepSeek V3.2 (8.3 average)

All 3 manifests deploy successfully
Strong security defaults even without explicit hardening prompts
Practical choices: switched to alpine image, pinned version, ran as non-root
Only weaknesses: probes target /healthz (which needs a ConfigMap-supplied nginx location to respond), missing seccomp in production

Best Security (when it works): Claude Sonnet 4.6

Hardened manifest is the gold standard: comprehensive ConfigMap, security headers, rate limiting, NetworkPolicy with RFC1918 blocking, memory-backed emptyDirs with size limits
But 2 out of 3 manifests don’t deploy — the capability issue is a serious, repeated error
Over-engineers every response (6-10 objects when 2 were asked for)
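
The capability issue can be illustrated with a sketch of the root-based variant: if the stock nginx image is kept running as root on port 80, dropping ALL capabilities only works when the few capabilities the startup path uses are added back. The list below is a commonly cited starting point rather than a tested minimum, and the object names are hypothetical. Because the pod still runs as root and adds capabilities other than NET_BIND_SERVICE, it cannot meet PSS Restricted.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-root-caps           # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-root-caps
  template:
    metadata:
      labels:
        app: nginx-root-caps
    spec:
      containers:
        - name: nginx
          image: nginx:1.25.3
          ports:
            - containerPort: 80
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add:
                - CHOWN             # chown of the cache directories at startup
                - SETGID            # switch worker processes to the nginx group
                - SETUID            # switch worker processes to the nginx user
                - NET_BIND_SERVICE  # bind to port 80
                # Assumption: some setups also need DAC_OVERRIDE; verify
                # against the image actually in use.
```

The alternative that several models eventually reached is to avoid the problem entirely by running as a non-root user on an unprivileged port.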

Most Reliable: Gemini 3 Flash

Deploys all 3 manifests successfully using pragmatic approaches
Uses nginxinc/nginx-unprivileged for hardened scenario — avoids the entire root/port/capability problem
Weaker on security features (no seccomp) but consistently simple and correct
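
A minimal sketch of the nginxinc/nginx-unprivileged approach follows. It is not Gemini's actual output: the object names and image tag are illustrative assumptions, and the seccomp setting is included here even though the tested manifest lacked it. The image listens on port 8080 and runs as a non-root user by default, so no ConfigMap, added capability, or chown is needed.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-unpriv              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-unpriv
  template:
    metadata:
      labels:
        app: nginx-unpriv
    spec:
      securityContext:
        runAsNonRoot: true        # the image already declares a non-root user
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged:1.27-alpine   # illustrative tag
          ports:
            - containerPort: 8080 # the image's default listen port
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-unpriv
spec:
  selector:
    app: nginx-unpriv
  ports:
    - port: 80                    # port exposed inside the cluster
      targetPort: 8080            # forwarded to the container's port
```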

Qwen 3.6 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, MiniMax M2.7, Kimi K2.6, and GLM-5.2: Tied at 6.7 average

All six score 20 total but with different failure patterns
Qwen 3.6 Plus falls into the same chown trap as Sonnet 4.6 (drop ALL + NET_BIND_SERVICE, root on port 80). Knows the right answer (mentions non-root with port 8080 in notes) but doesn’t implement it. Hardened response is excellent — full PSS Restricted with ConfigMap, uid 101, automountServiceAccountToken: false
DeepSeek V4 Pro takes a different path to the same Production failure: runs as uid 101 with drop ALL capabilities (good intent) but omits emptyDir volumes for nginx cache directories. UID 101 cannot create subdirectories in /var/cache/nginx. Hardened response is fully PSS Restricted with ConfigMap and emptyDir volumes — learns from its own Production mistake
DeepSeek V4 Flash falls into the same chown trap as Qwen 3.6 Plus and Sonnet 4.6: drops ALL capabilities, adds NET_BIND_SERVICE, runs as root on port 80. Hardened response is fully PSS Restricted with uid 101, ConfigMap for port 8080, emptyDir volumes, seccomp RuntimeDefault, and pod anti-affinity
Kimi K2.6 has the most unusual failure pattern: Basic response was HTML (not YAML at all), Production achieves full PSS Restricted and deploys perfectly, but Hardened fails due to a try_files bug in the nginx.conf causing 404 on root path. The only model to fail Basic while passing Production. Kimi K2.7 Code (8.0 average, 6th place) fixes both issues — proper YAML for Basic and a working Hardened deployment
MiniMax M2.7 fixes M2.5’s structural YAML errors, but its Production manifest sets runAsNonRoot without runAsUser; with an image that defaults to root, Kubernetes refuses to start the container

Qwen 3.7 Plus (6.3 average)

A new failure pattern: Production fails from the familiar chown trap (runAsNonRoot + drop ALL without CHOWN), but the Hardened manifest fails for a novel reason — the YAML output is truncated/corrupted mid-stream, with the ConfigMap nginx.conf cut off and the Deployment missing apiVersion/kind/metadata entirely
Basic deployment works fine (no security context, minimal response), and the Production security intent is strong (near-Restricted PSS), but the combination of a known pitfall and a YAML generation failure limits effectiveness
Security knowledge appears solid (the model attempts Restricted-level settings) but execution suffers from output quality issues in the Hardened scenario

Second Local Model: Gemma 4 31B (5.3 average, tied with GPT 5.4 and Mistral Medium 3.5)

A 31B dense model running locally on LM Studio — the second local/self-hosted model tested
Basic and Production deployments are minimal but functional — no security context, no resource limits in basic, clean deployable YAML
Hardened deployment has good security intent (drop ALL capabilities, readOnlyRootFilesystem, allowPrivilegeEscalation false, emptyDir volumes for cache/run/tmp, seccompProfile RuntimeDefault) but has a critical flaw: runAsNonRoot: false with ALL capabilities dropped causes the same chown failure as Claude Sonnet 4.6 and Qwen 3.6 Plus in their production manifests
The model explicitly recommended nginxinc/nginx-unprivileged in its comments but did not implement it — demonstrating knowledge of the correct solution without applying it
No ConfigMap, NetworkPolicy, PDB, HPA, security headers, or automountServiceAccountToken across any scenario
Weakest security overall among the models tested, with the lowest combined security average (1.7/5)

Mistral Medium 3.5 (5.3 average, tied with GPT 5.4 and Gemma 4 31B)

Only 1/3 manifests deploy successfully — Basic passes (with adjustment) but both Production and Hardened fail with CrashLoopBackOff
Basic deployment shows appropriate restraint — no security context, resource limits present, probes commented out
Production sets runAsNonRoot with uid 101 at pod level but fails to provide writable volumes for nginx cache directories — CrashLoopBackOff from permission denied
Hardened achieves full PSS Restricted (runAsNonRoot, seccompProfile RuntimeDefault, drop ALL capabilities, readOnlyRootFilesystem, allowPrivilegeEscalation false) but only mounts /tmp as emptyDir — nginx cannot create cache directories on the read-only filesystem
The consistent pattern: strong security settings on paper without the nginx-specific volume mounts needed to make them work
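
The missing pieces can be sketched as follows: when the stock nginx image runs as uid 101 with a read-only root filesystem, it needs writable emptyDir mounts for its cache, pid, and temp paths, plus a config that moves the listener to an unprivileged port. This is an illustrative sketch under those assumptions, with hypothetical object names; it has not been validated against the Kind cluster used in this assessment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-nonroot-conf        # hypothetical name
data:
  default.conf: |
    server {
      listen 8080;                # above 1024, so no NET_BIND_SERVICE
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-nonroot             # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-nonroot
  template:
    metadata:
      labels:
        app: nginx-nonroot
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 101            # explicit uid, so runAsNonRoot can be verified
        runAsGroup: 101
        fsGroup: 101              # group ownership for the emptyDir volumes
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: nginx
          image: nginx:1.25.3
          ports:
            - containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: conf
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf
            - name: cache         # nginx creates its *_temp directories here
              mountPath: /var/cache/nginx
            - name: run           # pid file location
              mountPath: /var/run
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: conf
          configMap:
            name: nginx-nonroot-conf
        - name: cache
          emptyDir: {}
        - name: run
          emptyDir: {}
        - name: tmp
          emptyDir: {}
```

Mounting only /tmp, as in the Hardened manifest described above, leaves /var/cache/nginx on the read-only root filesystem, which matches the reported failure.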

First Local Model: Qwen-35b (6.0 average, tied with Sonnet 4.6)

A 35B-parameter MoE model running locally on LM Studio — the first local/self-hosted model tested
Production deployment has a novel failure mode: securityContext and probes nested under invalid YAML keys (“container security context:” and “health checks:” instead of proper Kubernetes field names). The model knows the right settings but generates structurally invalid YAML
Hardened deployment is genuinely good — runAsNonRoot, uid 101, seccomp RuntimeDefault, readOnlyRootFilesystem, drop ALL capabilities. Only NET_BIND_SERVICE prevents PSS Restricted
Uses nginx:1.25.3-alpine (a good pinned image choice) and runs the stock image as uid 101 rather than switching to nginxinc/nginx-unprivileged (a valid alternative approach)
Competitive with several cloud-hosted models despite being a much smaller local model

Most Improved Across Prompts: All models

Every model produced meaningfully better security when explicitly asked for hardening
The jump from “production” to “hardened” was more significant than from “basic” to “production”

Worst Bug: GPT 5.4 Hardened

Specifies port 8080 everywhere but provides no ConfigMap to change the nginx listen port, so the manifest is guaranteed to fail
The model even acknowledges this in a caveat but doesn’t fix it

Structural Error: MiniMax M2.5 Production

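Places capabilities under the pod-level securityContext, where no such field exists (capabilities is only valid in a container-level securityContext); the YAML parses, but it is not valid against the Kubernetes API schema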
Kubernetes silently ignores it, so the capabilities are never actually dropped
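
The placement rule can be sketched with a minimal Pod, assuming a hypothetical pod name and an illustrative image tag. Identity and seccomp settings may live at pod level, while capabilities must sit in each container's own securityContext.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: caps-placement-demo       # hypothetical name
spec:
  securityContext:                # pod level (PodSecurityContext)
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    # "capabilities" is not a field at this level.
  containers:
    - name: nginx
      image: nginxinc/nginx-unprivileged:1.27-alpine   # illustrative tag
      securityContext:            # container level (SecurityContext)
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]           # this is where the drop takes effect
```

Whether a misplaced field is rejected or dropped depends on the field validation mode in use (for example, kubectl's --validate setting), so a manifest like M2.5's can apply cleanly in one setup and error out in another.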

Notable Good Results

Gemini 3 Flash (Hardened) — Uses nginxinc/nginx-unprivileged, the smartest approach. Avoids the entire root/port/capability problem that tripped up other models.
DeepSeek V3.2 (Production) — Only model to produce a fully functional, security-hardened production deployment without the hardened prompt. Demonstrates strong baseline security awareness.
Claude Sonnet 4.6 (Hardened) — Most thorough security implementation: 17-row security feature table, RFC1918 egress blocking, memory-backed tmpfs with size limits, init container for config validation.
Qwen 3.6 Plus (Hardened) — Clean redemption after Production failure. Full PSS Restricted with ConfigMap for port 8080, uid 101, server_tokens off, 5 emptyDir volumes, and automountServiceAccountToken: false. References CIS Benchmark and NSA/CISA guidelines.
DeepSeek V4 Pro (Hardened) — Full PSS Restricted compliant with ConfigMap for port 8080, uid 101, seccomp RuntimeDefault, emptyDir volumes for cache and run directories. Correctly addresses the non-root nginx pitfall that caused its own Production failure.

Notable Bad Results

Claude Sonnet 4.6 (Basic & Production) — Drops ALL capabilities without understanding nginx:latest needs CHOWN. The production version runs as runAsUser: 0 despite configuring port 8080 (which doesn’t need root). Self-contradictory comments in the YAML.
GPT 5.4 (Hardened) — Perfect security settings on a manifest that doesn’t work. The model knows this (says “assumes you will provide an nginx config”) but delivers non-functional YAML anyway.
MiniMax M2.5 (Production) — Invalid YAML structure (capabilities under pod securityContext) demonstrates incomplete Kubernetes API knowledge.
GPT 5.4 (Basic) — Zero security features for a basic prompt. Not even resource limits. While technically correct for the prompt, it’s the least production-aware default.
Qwen 3.6 Plus (Production) — Same chown trap as Sonnet 4.6: drops ALL capabilities, adds NET_BIND_SERVICE, runs as root on port 80. The model’s own text notes recommend using non-root with port 8080 — demonstrating it knows the right answer but fails to implement it.
Gemma 4 31B (Hardened) — Drops ALL capabilities while keeping runAsNonRoot: false (root user), causing the same CrashLoopBackOff chown failure as Sonnet 4.6. The model explicitly recommended using nginxinc/nginx-unprivileged in its own comments but failed to implement the recommendation. Weakest security profile of any model tested.
Xiaomi MiMo v2.5 (Hardened) — Two self-inflicted, deploy-blocking bugs inside an otherwise full-PSS-Restricted manifest: an init container that runs chown after dropping CAP_CHOWN (Init:Error), and an invalid rate_limit nginx directive (should be limit_req) that fails the config test.