Report Card: Hardened Cluster Creation

Test type: Secure Cluster Creation
Original date: 2026-03-09
Re-run date: 2026-03-10 (4 failed models re-run with additional guidance)
Claude Opus 4.6 added: 2026-03-25 | MiniMax M2.7 added: 2026-03-28 | Claude Opus 4.7 added: 2026-04-20
Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24 | GPT 5.5 added: 2026-04-25
Kimi K2.6 added: 2026-04-26 | Qwen3.6-35b-a3b (Local) added: 2026-05-03 | Gemma 4 31B (Local) added: 2026-05-03
Claude Opus 4.8 added: 2026-05-31 | Qwen 3.7 Plus added: 2026-06-05 | MiniMax M3 added: 2026-06-08 | Claude Fable 5 added: 2026-06-10 | Kimi K2.7 Code added: 2026-06-16 | GLM-5.2 added: 2026-06-17 | Mistral Medium 3.5 added: 2026-06-18
Scenario: Create a hardened Kubernetes cluster using Kind with comprehensive security controls (audit logging, PSS, network policies, API server hardening, kubelet hardening, etc.)
Timeout: 600 seconds (10 minutes)

Claude Opus 4.8 (2026-05-31)

Result: SUCCESS (37/40)

Approach: Created configuration files and the cluster, which came up successfully on the first attempt. After discovering that the default CNI does not enforce NetworkPolicy, deleted the cluster and recreated it with Calico CNI. Applied namespaces with PSA labels and network policies, then verified PSA enforcement and network policy isolation.

Security features implemented:

  • Audit logging: Correct two-level mount pattern. Comprehensive policy — pods at RequestResponse, secrets/configmaps at Metadata (avoids logging values), RBAC resources included, network policies covered. Log rotation configured (30 days, 10 backups, 100 MB max).
  • PSA: Cluster-wide restricted default via AdmissionConfiguration with system namespace exemptions. dev=baseline enforce + restricted audit/warn. test+production=restricted enforce. The strongest PSS configuration tier.
  • Network policies: Default deny ingress+egress on test and production namespaces. DNS egress correctly scoped to kube-system.
  • API server: anonymous-auth=false, profiling=false, NodeRestriction admission plugin, encryption at rest (AES-CBC), TLS min version 1.2 with strong cipher suites, service-account-lookup=true.
  • Kubelet: readOnlyPort: 0, anonymous auth disabled, Webhook authorization, TLS min version 1.2 with strong cipher suites. Correctly avoided protectKernelDefaults and seccomp-default (Kind-incompatible).
  • Controller-manager/scheduler: Profiling disabled on both.
  • Additional controls: Calico CNI for NetworkPolicy enforcement, encryption at rest.
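
For illustration, an audit policy along the lines described above might look like the following sketch. It is not the file the model produced: the rule order, the omitStages entry, and the RequestResponse level for RBAC and NetworkPolicy resources are assumptions. Only the pod and secrets/configmaps levels come from the summary.

```yaml
# Hypothetical audit-policy.yaml; rules are evaluated top to bottom, first match wins.
apiVersion: audit.k8s.io/v1
kind: Policy
# Assumption: skip the RequestReceived stage to avoid duplicate events per request.
omitStages:
  - RequestReceived
rules:
  # Secrets and ConfigMaps: record who touched them, never the payload.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Pod changes with full request and response bodies.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods"]
  # RBAC and NetworkPolicy changes (level is an assumption).
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
      - group: "networking.k8s.io"
        resources: ["networkpolicies"]
  # Catch-all for everything else.
  - level: Metadata
```

Placing the secrets/configmaps rule first matters, because a broader RequestResponse rule matched earlier would log the values.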

Category scores:

  • Cluster Creation: 4/5 (required recreating cluster for Calico CNI)
  • Audit Logging: 5/5 (correct two-level mount, comprehensive policy, rotation)
  • PSS: 5/5 (cluster-wide restricted default, tiered namespace enforcement)
  • Network Policies: 5/5 (default deny + DNS egress, Calico enforcement verified)
  • API Server Hardening: 5/5 (anonymous-auth=false, profiling=false, encryption at rest, TLS hardening, service-account-lookup)
  • Kubelet Hardening: 5/5 (anonymous disabled, webhook authz, readOnlyPort 0, TLS ciphers, correctly avoided Kind pitfalls)
  • Additional Controls: 4/5 (Calico CNI, encryption at rest, but no ResourceQuotas/LimitRanges)
  • Agent Behaviour: 4/5 (efficient execution, proactive Calico recreation for NetworkPolicy enforcement)

Notable: The decision to recreate the cluster with Calico after discovering that the default CNI does not enforce NetworkPolicy demonstrates strong operational awareness; most models applied network policies without verifying enforcement. Ties with Opus 4.7 at 37/40. Strong across all security categories, with the only gaps being ResourceQuotas/LimitRanges and SA token restriction.
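
A minimal sketch of the Kind setting that this kind of recreation typically relies on, assuming the default CNI was disabled in the cluster config. The config actually used in the run is not shown in this report, and the podSubnet value is an assumption chosen to match Calico's default IPv4 pool.

```yaml
# Hypothetical kind-config.yaml fragment for running Calico instead of kindnet.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # Do not install kindnet; a CNI must be applied after cluster creation.
  disableDefaultCNI: true
  # Assumption: matches Calico's default IPv4 pool.
  podSubnet: 192.168.0.0/16
nodes:
  - role: control-plane
```

With disableDefaultCNI set, nodes stay NotReady until a CNI such as Calico is installed, so the Calico manifest has to be applied before any workloads or policy tests.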


Claude Fable 5 (2026-06-10)

Result: SUCCESS (37/40)

Approach: Created Kind cluster on first attempt with Calico CNI. Applied comprehensive hardening across all layers — audit logging, tiered PSS enforcement, network policies, API server and kubelet hardening. Created LimitRanges and ResourceQuotas for test and production namespaces.

Security features implemented:

  • Audit logging: Comprehensive policy with noise filtering and rotation. Correct two-level mount pattern.
  • PSA: Tiered namespace enforcement — dev=baseline, test+production=restricted enforce/audit/warn.
  • Network policies: Default-deny ingress+egress on test and production. DNS egress correctly scoped to kube-system.
  • API server: NodeRestriction admission plugin, profiling=false, TLS min version 1.2. Missing anonymous-auth=false.
  • Kubelet: seccompDefault enabled, streamingConnectionIdleTimeout configured. Missing readOnlyPort=0.
  • Additional controls: LimitRanges and ResourceQuotas for test and production namespaces.

Missing: anonymous-auth=false on API server, encryption at rest, readOnlyPort=0 on kubelet.

Category scores:

  • Cluster Creation: 5/5 (successful first attempt with Calico CNI)
  • Audit Logging: 5/5 (comprehensive policy with noise filtering, rotation)
  • PSS: 4/5 (tiered namespace enforcement, missing cluster-wide AdmissionConfiguration)
  • Network Policies: 5/5 (default deny + DNS egress, Calico enforcement)
  • API Server Hardening: 4/5 (NodeRestriction, profiling disabled, TLS 1.2 — missing anonymous-auth=false)
  • Kubelet Hardening: 4/5 (seccompDefault, streamingConnectionIdleTimeout — missing readOnlyPort=0)
  • Additional Controls: 4/5 (LimitRanges and ResourceQuotas for test/production)
  • Agent Behaviour: 5/5 (first-attempt success with Calico, efficient execution)

Notable: Ties with Opus 4.8 and Opus 4.7 at 37/40. First-attempt success with proactive Calico CNI selection for NetworkPolicy enforcement. Strong across all categories with comprehensive audit policy and tiered PSS. The main gaps are anonymous-auth=false on the API server, encryption at rest, and readOnlyPort=0 on the kubelet.


MiniMax M3 (2026-06-08)

Result: SUCCESS (29/40)

Approach: Single-attempt Kind cluster creation with comprehensive kubeadmConfigPatches. Created kind-config.yaml with API server hardening, audit policy, and kubelet config. Applied namespace PSS labels and network policies. 363 seconds, 41 tool calls.

Security features implemented:

  • Audit logging: Multi-level policy with rotation. Correct two-level mount pattern.
  • PSA: 3-tier namespace enforcement — dev=baseline, test+production=restricted enforce/audit/warn.
  • Network policies: Default deny ingress+egress on test and production. DNS egress correctly scoped to kube-system.
  • API server: anonymous-auth=false, TLS min version 1.2 with strong cipher suites, profiling=false. NodeRestriction admission plugin.
  • Kubelet: readOnlyPort: 0, anonymous auth disabled, Webhook authorization. Configured via Kind config.
  • Additional controls: None (no encryption at rest, no ResourceQuotas, no LimitRanges).
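
The report does not define the "two-level mount pattern". The sketch below assumes it means mounting the policy file from the host into the Kind node (extraMounts) and then from the node into the kube-apiserver static pod (extraVolumes). Paths and rotation values are illustrative, and the map-style extraArgs assumes the kubeadm v1beta3 format.

```yaml
# Hypothetical kind-config.yaml; not the file produced in this run.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # Level 1: host file -> Kind node container.
    extraMounts:
      - hostPath: ./audit-policy.yaml
        containerPath: /etc/kubernetes/audit/audit-policy.yaml
        readOnly: true
    kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            audit-policy-file: /etc/kubernetes/audit/audit-policy.yaml
            audit-log-path: /var/log/kubernetes/audit/audit.log
            audit-log-maxage: "30"
            audit-log-maxbackup: "10"
            audit-log-maxsize: "100"
            profiling: "false"
          # Level 2: Kind node -> kube-apiserver static pod.
          extraVolumes:
            - name: audit-policy
              hostPath: /etc/kubernetes/audit/audit-policy.yaml
              mountPath: /etc/kubernetes/audit/audit-policy.yaml
              readOnly: true
              pathType: File
            - name: audit-logs
              hostPath: /var/log/kubernetes/audit
              mountPath: /var/log/kubernetes/audit
              readOnly: false
              pathType: DirectoryOrCreate
```

If either level is missing, the API server cannot read the policy file and kubeadm init typically fails, which is one plausible cause of the repeated creation attempts seen in other runs.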

Category scores:

  • Cluster Creation: 5/5 (successful first attempt)
  • Audit Logging: 4/5 (multi-level policy with rotation, correct two-level mount)
  • PSS: 4/5 (restricted on test+prod, baseline on dev — missing cluster-wide AdmissionConfiguration)
  • Network Policies: 3/5 (default deny + DNS egress)
  • API Server Hardening: 4/5 (anonymous-auth=false, TLS 1.2+, strong ciphers, profiling disabled)
  • Kubelet Hardening: 3/5 (readOnlyPort 0, anonymous auth disabled, Webhook auth — via Kind config)
  • Additional Controls: 2/5 (no encryption at rest, no ResourceQuotas, no LimitRanges)
  • Agent Behaviour: 4/5 (efficient first-attempt success, clean methodical execution)

Notable: First-attempt success — unlike M2.7 which timed out before creating a cluster. Clean methodical execution with 41 tool calls in 363 seconds. Verified PSS enforcement (privileged pod rejected) and network policies (DNS allowed, external/inter-namespace blocked). Missing encryption at rest and ResourceQuotas. A major step forward for the MiniMax family: M2.5 scored 10/40 (timeout), M2.7 scored 20/40 (timeout), and M3 scores 29/40 (success).


Kimi K2.7 Code (2026-06-16)

Result: TIMEOUT (29/40)

Approach: Created cluster with comprehensive audit logging, PSS for test/prod namespaces, network policies. Missing kubelet hardening.

Security features implemented:

  • Audit logging: Comprehensive policy — pods at RequestResponse, RBAC resources and exec/portforward at RequestResponse, secrets/configmaps at Metadata (avoids logging values).
  • PSA: Restricted enforce/audit/warn on test and production namespaces. Missing dev namespace PSS labels.
  • Network policies: Default deny ingress+egress on test and production. DNS egress correctly scoped to kube-system.
  • API server: TLS cipher suites configured, controller-manager and scheduler localhost binding, NodeRestriction admission plugin.
  • Kubelet: No hardening — biggest gap in the configuration.
  • Additional controls: ResourceQuotas and LimitRanges deployed. ServiceAccount automountToken disabled.

Category scores:

  • Cluster Creation: 4/5
  • Audit Logging: 5/5
  • Pod Security Standards: 4/5
  • Network Policies: 5/5
  • API Server Hardening: 3/5
  • Kubelet Hardening: 1/5
  • Additional Controls: 4/5
  • Agent Behaviour: 3/5

Notable: Comprehensive audit policy (RequestResponse for pods, RBAC, and exec) and restricted PSS on test and production, although the dev namespace has no PSS labels. Network policies combine default deny with DNS egress. The biggest gap is the complete absence of kubelet hardening. API server hardening covers TLS cipher suites, localhost binding for the controller-manager and scheduler, and NodeRestriction. ResourceQuotas and LimitRanges are deployed, and ServiceAccount token automounting is disabled. Cluster creation succeeded on the first attempt; the timeout was caused by slow model responses.


Qwen3.6-35b-a3b — LOCAL (2026-05-03)

Result: SUCCESS (35/40)

Note: This is a LOCAL model (35B-parameter MoE, running on LM Studio). Timeout was extended to 30 minutes (vs 10 minutes standard) to accommodate slower local inference.

Approach: Created configuration files and cluster. Required 3 attempts at cluster creation before succeeding, but once running applied all security configurations correctly and performed verification testing. Cluster name: dh-qwen3-6-35b-a3b-a41177e4.

Security features implemented:

  • Audit logging: Correct two-level mount pattern. Excellent policy: security-sensitive resources at RequestResponse, secrets at Metadata (avoids logging values), omits health/metrics noise. Audit log actively writing (3.6MB by end).
  • API server: profiling disabled on API server/controller-manager/scheduler. service-account-lookup true. Comprehensive admission plugins (PodSecurity, NodeRestriction, LimitRanger, ResourceQuota). Used newer AuthenticationConfiguration for per-endpoint anonymous auth.
  • Kubelet: anonymous auth disabled, webhook auth+authz, readOnlyPort 0, strong TLS cipher suites. Missing streamingConnectionIdleTimeout and rotateCertificates.
  • PSA: dev: enforce baseline + warn/audit restricted. test+production: enforce restricted. Cluster-wide AdmissionConfiguration defaults to restricted with system namespace exemptions.
  • Network policies: Default deny ingress+egress on test+production. DNS egress correctly scoped to kube-system/kube-dns on port 53 UDP+TCP.
  • Additional controls: automountServiceAccountToken false on default SA in all namespaces. ResourceQuotas and LimitRanges on all 3 namespaces. Namespace-scoped RBAC roles. PodDisruptionBudget for CoreDNS.
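
As a sketch of what a cluster-wide restricted default with system namespace exemptions can look like, assuming the standard PodSecurity admission configuration format. The exempted namespace list is an assumption (local-path-storage is the namespace Kind uses for its storage provisioner), and the file from this run is not reproduced here.

```yaml
# Hypothetical admission-config.yaml, passed to the API server via
# --admission-control-config-file.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      # Applied to every namespace that carries no PSA labels of its own.
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        # Assumption: system namespaces that must run privileged pods.
        namespaces: ["kube-system", "kube-public", "kube-node-lease", "local-path-storage"]
```

Namespace labels still override these defaults, which is how a dev namespace can stay at baseline while unlabeled namespaces fall back to restricted.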

Missing: streamingConnectionIdleTimeout, rotateCertificates, explicit anonymous-auth=false flag (used AuthenticationConfiguration instead).

Category scores:

  • Cluster Creation: 4/5 (3 attempts)
  • Audit Logging: 5/5 (correct two-level mount, excellent policy)
  • PSS: 5/5 (restricted on test+prod, baseline on dev, cluster-wide defaults)
  • Network Policies: 5/5 (default deny + DNS egress)
  • API Server Hardening: 4/5 (comprehensive, used AuthConfig approach)
  • Kubelet Hardening: 4/5 (anonymous disabled, webhook, readOnlyPort 0, TLS ciphers)
  • Additional Controls: 5/5 (SA token, quotas, limits, RBAC, PDB)
  • Agent Behaviour: 3/5 (3 creation attempts, initial directory confusion)

Notable: Ties with GPT 5.5 at 35/40 — impressive for a 35B local model. The security knowledge demonstrated (cluster-wide AdmissionConfiguration, per-endpoint anonymous auth, comprehensive audit policy) matches or exceeds several larger cloud-hosted models. The extended timeout (30 min vs 10 min) accommodated the slower inference speed without affecting the quality of output.


Qwen 3.7 Plus (2026-06-05)

Result: PARTIAL (21/40)

Approach: Created audit policy and Kind cluster config with audit logging. First attempt failed due to invalid PodSecurity feature gate in KubeletConfiguration; agent diagnosed and removed it. Second attempt succeeded. Applied three namespaces with PSS labels (dev=baseline, test/prod=restricted) and default-deny network policies. No API server or kubelet hardening beyond audit logging.

Security features implemented:

  • Audit logging: Two-level mount with rotation. Basic policy — pods at RequestResponse, secrets/configmaps at Metadata only. No RBAC resource or auth event coverage.
  • PSA: Namespace-level labels: dev=baseline, test+prod=restricted enforce/audit/warn. No cluster-wide AdmissionConfiguration.
  • Network policies: Default deny ingress+egress on test+prod. No DNS egress allowance (renders namespaces unusable). kindnet doesn’t enforce.
  • API server: Only audit logging flags. No anonymous-auth=false, profiling=false, encryption, TLS hardening.
  • Kubelet: None in final config. Initial KubeletConfiguration removed after PodSecurity feature gate error.
  • Additional: Only audit log rotation parameters.

Category scores:

  • Cluster Creation: 4/5 (required second attempt after PodSecurity feature gate error)
  • Audit Logging: 3/5 (two-level mount correct, basic policy, rotation)
  • PSS: 4/5 (restricted on test+prod, baseline on dev — missing cluster-wide AdmissionConfiguration)
  • Network Policies: 3/5 (default deny but no DNS egress, kindnet doesn’t enforce)
  • API Server Hardening: 2/5 (only audit logging flags)
  • Kubelet Hardening: 1/5 (no hardening — KubeletConfiguration removed after error)
  • Additional Controls: 1/5 (only audit log rotation)
  • Agent Behaviour: 3/5 (clean recovery from first creation failure, but conservative approach after setback)

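Notable: Clean recovery from the first creation failure, but the conservative approach meant no API server or kubelet hardening was attempted after the setback. The network policies omit a DNS egress rule, so an enforced default deny would leave pods in test and prod unable to resolve names; on kindnet the policies were not enforced at all.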
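
For reference, a minimal sketch of the pairing that higher-scoring runs used: default deny plus a DNS egress allowance scoped to kube-system. The policy names and the production namespace are illustrative, and the k8s-app: kube-dns label is assumed to match the cluster's CoreDNS pods.

```yaml
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Re-allow DNS lookups only, and only to kube-dns pods in kube-system.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        # namespaceSelector and podSelector in one entry are ANDed together.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Both policies only take effect when the CNI enforces NetworkPolicy, so they need to be verified with an actual connectivity test.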


Gemma 4 31B — LOCAL (2026-05-03)

Result: SUCCESS (25/40)

Note: This is a LOCAL model (31B dense, running on LM Studio). Timeout was extended to accommodate slower local inference.

Approach: Created configuration files and cluster in 5 tool calls total — the most minimal execution of any model tested. Successfully created the cluster on the first attempt. Applied PSS labels and network policies, but did not configure API server hardening or kubelet hardening beyond defaults.

Security features implemented:

  • Audit logging: Correct two-level mount pattern. Standard policy covering pods at RequestResponse, secrets/configmaps at Metadata. Audit log confirmed writing.
  • PSA: test and production namespaces: enforce restricted. development: enforce baseline. Correct tiered enforcement.
  • Network policies: Default deny ingress+egress on test and production. DNS egress correctly scoped to kube-system on port 53.
  • API server: Basic admission plugins (NodeRestriction, PodSecurity). No anonymous-auth=false, no profiling=false, no TLS hardening, no service-account-lookup=true.
  • Kubelet: No hardening — default kubelet configuration only.
  • Additional controls: No ResourceQuotas, no LimitRanges, no controller-manager/scheduler hardening, no encryption at rest, no SA token restriction.

Missing: API server anonymous-auth, profiling disable, TLS hardening. Kubelet hardening entirely absent. No ResourceQuotas or LimitRanges. No controller-manager/scheduler hardening.

Category scores:

  • Cluster Creation: 5/5 (successful first attempt)
  • Audit Logging: 5/5 (correct two-level mount, functional policy)
  • PSS: 4/5 (restricted on test+prod, baseline on dev — missing cluster-wide AdmissionConfiguration)
  • Network Policies: 4/5 (default deny + DNS egress, properly scoped)
  • API Server Hardening: 1/5 (basic admission plugins only, no hardening flags)
  • Kubelet Hardening: 1/5 (no hardening — default configuration)
  • Additional Controls: 1/5 (no quotas, limits, or additional hardening)
  • Agent Behaviour: 4/5 (extremely efficient — 5 tool calls, cluster created first attempt)

Notable: The most minimal execution of any model, with only 5 tool calls total. This extreme efficiency comes at the cost of security depth: the model achieved cluster creation, basic PSS, audit logging, and network policies, but skipped all API server and kubelet hardening. A striking contrast to Qwen3.6-35b-a3b (also a local model), which used more calls but achieved 35/40 with stronger API server and kubelet configurations. Tied with Gemini 3 Flash Preview at 10th place but with a very different profile (Gemini focused on operational reliability; Gemma 4 31B on minimal but correct security foundations).


Kimi K2.6 (2026-04-26)

Result: TIMEOUT (31/40)

Approach: Created configuration files and cluster, but took 5+ attempts at cluster creation, timing out before completing verification. Cluster name: dh-kimi-k2-6-57c1770f.

Security features implemented:

  • Audit logging: Correct two-level mount pattern. Policy covers pods (RequestResponse), secrets/configmaps/auth/authorization (Metadata).
  • API server: Node,RBAC authorization, admission plugins (NodeRestriction, NamespaceLifecycle, LimitRanger, ServiceAccount, ResourceQuota, PodSecurity), audit log rotation. Missing: anonymous-auth=false, profiling=false.
  • Kubelet: readOnlyPort=0, anonymous auth disabled, Webhook authorization. Missing: TLS cipher suite, certificate rotation.
  • PSA: development=baseline enforce + restricted audit/warn; test and production=restricted enforce/audit/warn.
  • Network policies: Default deny + DNS egress to kube-system + intra-namespace communication for test and production namespaces.
  • Additional controls: LimitRanges and ResourceQuotas for all 3 namespaces, PSA test pod verification.

Missing: No encryption at rest, no anonymous-auth=false on API server, no profiling=false. No TLS cipher suite on kubelet, no certificate rotation. No controller-manager/scheduler hardening.

Category scores:

  • Cluster Creation: 3/5 (created but took 5+ attempts)
  • Audit Logging: 4/5 (correct two-level mount, functional policy)
  • PSS: 5/5 (restricted enforce/audit/warn on test+production, baseline on development)
  • Network Policies: 5/5 (default deny + DNS egress + intra-namespace)
  • API Server Hardening: 4/5 (good admission plugins and authorization, missing anon-auth and profiling)
  • Kubelet Hardening: 4/5 (anonymous auth disabled, Webhook auth, readOnlyPort=0)
  • Additional Controls: 4/5 (LimitRanges and ResourceQuotas for all 3 namespaces)
  • Agent Behaviour: 2/5 (5+ creation attempts, timed out before completing verification)

Notable: Solid security configuration across PSA, network policies, and resource controls, with LimitRanges and ResourceQuotas deployed to all three namespaces. The main weakness was operational — repeated cluster creation attempts consumed the majority of the timeout budget, leaving no time for verification. The security knowledge is strong but the agent’s iterative debugging approach was inefficient.
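
The tiered PSA setup described above maps onto namespace labels roughly as follows. This is a sketch using the namespace names from the report, not the manifest from the run; the test namespace would mirror production.

```yaml
# development: block only baseline violations, but audit and warn at restricted.
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# production: reject anything that does not meet the restricted profile.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Setting audit and warn to restricted on development surfaces the violations that would block a workload once it is promoted to test or production.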


GPT 5.5 (2026-04-25)

Result: SUCCESS (35/40)

Approach: Created configuration files and built the Kind cluster with comprehensive hardening. Successfully created the cluster, applied namespaces with PSA labels, network policies, and additional security controls.

Security features implemented:

  • Audit logging: Policy configured with two-level mount pattern
  • API server: Admission plugins including PodSecurity and NodeRestriction, profiling disabled
  • Kubelet: Anonymous auth disabled, Webhook authorization, readOnlyPort=0, cert rotation
  • PSA: Restricted enforce/audit/warn on test and production namespaces
  • Network policies: Default deny ingress+egress on test and production, DNS egress allowed
  • Additional controls: ResourceQuotas and LimitRanges on namespaces, encryption at rest, controller-manager and scheduler profiling disabled

Category scores:

  • Cluster Creation: 5/5 (successful creation)
  • Audit Logging: 4/5 (functional policy with two-level mount)
  • PSS: 5/5 (restricted enforce/audit/warn on test+production)
  • Network Policies: 5/5 (default deny + DNS egress)
  • API Server Hardening: 4/5 (good hardening flags, profiling disabled)
  • Kubelet Hardening: 4/5 (anonymous auth disabled, Webhook auth, readOnlyPort=0)
  • Additional Controls: 4/5 (ResourceQuotas, LimitRanges, encryption at rest, controller-manager+scheduler hardening)
  • Agent Behaviour: 4/5 (efficient execution, completed within timeout)

Notable: A strong result that matches the upper tier of successful models. The inclusion of encryption at rest, ResourceQuotas, LimitRanges, and disabled controller-manager/scheduler profiling demonstrates broad security awareness. The content filter restrictions that blocked GPT 5.5 in all pentest scenarios did not affect the cluster creation task, which is a defensive rather than offensive security exercise.
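
Encryption at rest is usually configured through an EncryptionConfiguration file. The sketch below assumes the aescbc provider (the one named for the Opus runs; the provider GPT 5.5 chose is not stated in this report), and the key is a placeholder, not a real value.

```yaml
# Hypothetical encryption-config.yaml, referenced by the API server flag
# --encryption-provider-config and mounted into the control-plane node.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes.
      - aescbc:
          keys:
            - name: key1
              # Placeholder: base64 of 32 random bytes.
              secret: <BASE64_ENCODED_32_BYTE_KEY>
      # Fallback so that existing unencrypted data stays readable.
      - identity: {}
```

A key of the right size can be generated with `head -c 32 /dev/urandom | base64`. Secrets written before the change stay unencrypted until they are rewritten.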


Claude Opus 4.6 (2026-03-25)

Result: SUCCESS (38/40)

Approach: Created all config files first (audit-policy.yaml, kind-config.yaml, encryption-config.yaml), then attempted cluster creation. First attempt failed (likely Kind-incompatible kubelet setting). Self-diagnosed, deleted the failed cluster, fixed kind-config.yaml, and succeeded on the second attempt.

Security features implemented:

  • Audit logging: Comprehensive granular policy — pods and RBAC at RequestResponse, secrets/configmaps at Metadata only, health checks excluded, log rotation configured. Two-level mount correct.
  • API server: anonymous-auth=false, Node,RBAC authorization, NodeRestriction,PodSecurity plugins, TLS 1.2 min with strong ciphers, profiling=false, encryption at rest (aescbc), service-account-lookup=true
  • Kubelet: KubeletConfiguration object — anonymous auth disabled, Webhook auth+authz, readOnlyPort=0, cert rotation
  • PSA: Restricted enforce/audit/warn on test+production, verified with privileged pod rejection test
  • Network policies: Default deny ingress+egress on test+production, DNS egress allowed with namespace selector
  • Unique additions: ResourceQuotas and LimitRanges on test/production namespaces, encryption at rest
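
A minimal KubeletConfiguration sketch covering the four kubelet settings listed above. It is an illustration under the assumption that the object was supplied through kubeadmConfigPatches in the Kind config; the file from the run is not reproduced in this report.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    # Reject unauthenticated requests to the kubelet API.
    enabled: false
  webhook:
    # Validate bearer tokens against the API server.
    enabled: true
authorization:
  # Delegate authorization decisions to the API server.
  mode: Webhook
# Disable the unauthenticated read-only port (10255).
readOnlyPort: 0
# Rotate the kubelet client certificate before it expires.
rotateCertificates: true
```

Settings such as protectKernelDefaults are deliberately left out here, since other entries in this report note them as Kind-incompatible.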

Notable: The only model to implement all of: audit logging, API server hardening, kubelet hardening, PSA, network policies, encryption at rest, ResourceQuotas, and LimitRanges. Excellent operational cleanup and verification. Lost 1 point on cluster creation (needed 2nd attempt) and 1 on additional controls (no controller-manager/scheduler hardening).


Claude Opus 4.7 (2026-04-20)

Result: TIMEOUT (37/40)

Approach: Created config files (audit-policy.yaml, kind-config.yaml, admission-config.yaml, authentication-config.yaml), then attempted cluster creation. First attempt failed because anonymous-auth=false caused API server health probes to return 401. Self-diagnosed, deleted the cluster, and created an AuthenticationConfiguration using Kubernetes 1.35’s AnonymousAuthConfigurableEndpoints feature gate to allow health endpoints without auth while blocking all other anonymous access. Succeeded on second attempt. Applied namespaces, PSA labels, network policies, ResourceQuotas, LimitRanges, and ServiceAccount restrictions. Timed out during final verification — all hardening controls were in place.

Security features implemented:

  • Audit logging: Comprehensive granular policy — health endpoints excluded, system component reads excluded, pods/services/namespaces/RBAC at RequestResponse, pod exec/attach/portforward at RequestResponse, secrets/configmaps at Metadata only. Two-level mount correct. Log rotation configured (30 days, 10 backups, 100 MB max). 852+ entries generated.
  • API server: anonymous-auth=false with AuthenticationConfiguration (health endpoints exempted — most sophisticated solution of any model), Node,RBAC authorization, 17 admission plugins (including NodeRestriction, PodSecurity, ResourceQuota, LimitRanger), profiling=false, service-account-lookup=true, cluster-wide PodSecurity via AdmissionConfiguration
  • Kubelet: KubeletConfiguration object — anonymous auth disabled, Webhook auth+authz, readOnlyPort=0, x509 client CA, strong TLS cipher suites (6 ECDHE suites), event rate limiting
  • PSA: Dual-layer enforcement — cluster-wide restricted defaults via AdmissionConfiguration (with system namespace exemptions) + namespace-level labels (restricted on test/production, baseline on development)
  • Network policies: Default deny ingress+egress on test+production, DNS egress allowed targeting kube-dns pod selector
  • Additional controls: ResourceQuotas and LimitRanges on all 3 namespaces (dev/test/prod), controller-manager hardening (profiling=false, terminated-pod-gc-threshold=10), scheduler hardening (profiling=false), ServiceAccount automountToken disabled on default SA in all custom namespaces

Notable: The most technically sophisticated cluster configuration of any model, with two innovations: (1) AuthenticationConfiguration for conditional anonymous auth using K8s 1.35 features, which no other model had used at the time of this run, and (2) dual-layer PSA enforcement (cluster-wide AdmissionConfiguration + namespace labels), the strongest PSS setup. The only model to include ALL of: ResourceQuotas (all namespaces), LimitRanges (all namespaces), controller-manager hardening, scheduler hardening, AND ServiceAccount token restriction. Missing: encryption at rest (intentionally omitted per comments). Lost 1 point on cluster creation (needed 2nd attempt), 1 on network policies (DNS egress targets pod selector across all namespaces rather than scoping to kube-system), and 1 on agent behaviour (timed out during verification).
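
A sketch of the conditional anonymous auth approach, assuming the structured AuthenticationConfiguration format. The apiVersion shown is an assumption and should be checked against the Kubernetes release in use; the file from the run is not reproduced here.

```yaml
# Hypothetical authentication-config.yaml, passed via --authentication-config.
# Assumption: apiVersion may be v1beta1 or v1 depending on the release.
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
anonymous:
  # Anonymous requests are accepted only on the paths listed below.
  enabled: true
  conditions:
    - path: /livez
    - path: /readyz
    - path: /healthz
```

When the anonymous section is set in this file, the --anonymous-auth flag cannot be set as well. Health probes keep working while every other endpoint rejects unauthenticated requests.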


Qwen 3.6 Plus (2026-04-20)

Result: SUCCESS (32/40)

Approach: Created config files (audit-policy.yaml, kind-config.yaml, namespaces.yaml, network-policies.yaml, resource-quotas.yaml, rbac-restrictions.yaml) then attempted cluster creation. First attempt failed due to a Docker container name conflict (leftover container from a previous run). Discovered the existing cluster was already running via kind get clusters and docker ps, then proceeded to use it. Applied namespaces with PSA labels, network policies, ResourceQuotas, and LimitRanges.

Security features implemented:

  • Audit logging: Basic policy — pods at RequestResponse, secrets/configmaps at Metadata, all other resources at Metadata. Two-level mount pattern correct. Log rotation configured (30 days, 3 backups, 100 MB).
  • API server: enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction,PodSecurity, tls-min-version: VersionTLS12. Missing: anonymous-auth=false, profiling=false, service-account-lookup=true.
  • Kubelet: Both kubeletExtraArgs (anonymous-auth=false, authorization-mode=Webhook) and KubeletConfiguration object (anonymous auth disabled, Webhook auth+authz, readOnlyPort=0). Correctly omitted protectKernelDefaults and seccomp-default for Kind compatibility.
  • PSA: Restricted enforce/audit/warn on test and production namespaces. Development namespace has no PSA labels (no restrictions).
  • Network policies: Default deny ingress+egress on test and production, DNS egress allowed targeting kube-system namespace selector.
  • Additional controls: ResourceQuotas and LimitRanges on test and production namespaces (comprehensive — includes PVC limits, pod-level limits, per-container defaults).

Category scores:

  • Cluster Creation: 4/5 (Docker container conflict, recovered by discovering existing cluster)
  • Audit Logging: 4/5 (correct two-level mount, basic but functional policy)
  • PSS: 5/5 (restricted enforce/audit/warn on test+production)
  • Network Policies: 5/5 (default deny + DNS egress on test+production, properly scoped to kube-system)
  • API Server: 3/5 (good admission plugins and TLS, but missing anonymous-auth=false and profiling=false)
  • Kubelet: 4/5 (KubeletConfiguration + kubeletExtraArgs, anonymous auth disabled, Webhook auth, readOnlyPort=0)
  • Additional Controls: 4/5 (ResourceQuotas and LimitRanges with comprehensive limits including PVC and pod-level)
  • Agent Behaviour: 3/5 (handled Docker conflict pragmatically, but spent time on redundant retry before checking existing state)

Notable: Solid middle-of-the-pack result. The dual kubelet configuration (both kubeletExtraArgs and KubeletConfiguration object) shows awareness of both approaches. ResourceQuotas are the most detailed of any model — including PVC storage limits and pod-level CPU/memory maximums. The audit policy is simpler than the Claude models’ granular policies (no health check exclusions, no RBAC-specific rules) but functional. The main gap is API server hardening — no anonymous-auth=false or profiling=false on the API server itself.
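
To make the ResourceQuota and LimitRange description concrete, here is a sketch with PVC limits, pod-level maximums, and per-container defaults. All names and numbers are illustrative assumptions, not the values from the run.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    pods: "20"
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    # PVC count and total requested storage.
    persistentvolumeclaims: "5"
    requests.storage: 50Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: production
spec:
  limits:
    # Defaults injected into containers that declare no resources.
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
    # Upper bound on the sum of all containers in a pod.
    - type: Pod
      max:
        cpu: "2"
        memory: 2Gi
```

The LimitRange defaults matter alongside the quota: once a quota covers CPU or memory, pods without requests and limits are rejected unless defaults are injected.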


MiniMax M2.7 (2026-03-28)

Result: TIMEOUT (20/40)

Approach: Created comprehensive configuration files (kind-config.yaml, audit-policy.yaml, namespaces.yaml, network-policies.yaml, rbac.yaml, resource-limits.yaml, admission-config.yaml, encryption-config.yaml) but the kind create cluster command timed out before the cluster initialized.

Category scores:

  • Cluster Creation: 1/5 (attempted, timed out)
  • Audit Logging: 3/5 (comprehensive policy written with correct two-level mount pattern, never verified)
  • PSS: 3/5 (enforce/audit/warn labels planned for 3 namespaces, never applied)
  • Network Policies: 3/5 (default deny + DNS + API server allow policies created, never applied)
  • API Server: 3/5 (extensive flags configured, some format errors)
  • Kubelet: 3/5 (detailed KubeletConfiguration, authorization mode errors)
  • Additional Controls: 3/5 (ResourceQuotas, LimitRanges, PodDisruptionBudgets, RBAC, encryption — most comprehensive planned feature set of any model)
  • Agent Behaviour: 1/5 (single attempt, no recovery or diagnosis)

Key differences from M2.5:

  • M2.5 timed out because of deprecated PodSecurityPolicy admission plugin
  • M2.7 used the correct PodSecurity plugin but timed out during cluster initialization
  • M2.7 generated the most comprehensive set of configuration files (8 files vs M2.5’s single Kind config)
  • M2.7 included ResourceQuotas/LimitRanges/PDBs (unique among all models except Opus)
  • Both models scored 1/5 on Agent Behaviour (no recovery)

Notable: Despite using the correct PodSecurity admission plugin (fixing M2.5’s key mistake), M2.7 still failed to produce a running cluster. The model created the most extensive set of pre-written configuration files of any model tested, but made a single attempt at cluster creation with no recovery when it timed out. This mirrors M2.5’s pattern of over-configuring upfront rather than building incrementally.


DeepSeek V4 Pro (2026-04-24)

Result: INCOMPLETE (14/40)

Approach: Created comprehensive configuration files (audit-policy.yaml, kind-config.yaml) with excellent hardening settings. The opencode session terminated prematurely after writing the configuration files but before executing kind create cluster. No cluster was ever created, no namespaces applied, no network policies deployed.

Security features designed (not deployed):

  • Audit logging: Well-structured policy — pods/exec/portforward at RequestResponse, secrets/configmaps at Metadata, namespace/serviceaccount operations at Request level. Correct two-level mount pattern. Log rotation configured (30 days, 3 backups, 100 MB).
  • API server: anonymous-auth: false, authorization-mode: Node,RBAC, enable-admission-plugins: NodeRestriction,PodSecurity,AlwaysPullImages, profiling: false, service-account-lookup: true, strong TLS cipher suites (6 ECDHE variants), controller-manager and scheduler profiling disabled.
  • Kubelet: KubeletConfiguration object — anonymous auth disabled, Webhook auth+authz, readOnlyPort: 0, serverTLSBootstrap: true, rotateCertificates: true, SeccompDefault: true feature gate. Correctly avoided protectKernelDefaults.
  • Etcd: auto-tls: false, peer-auto-tls: false.
  • Controller manager: profiling: false, terminated-pod-gc-threshold: 500, use-service-account-credentials: true.
  • Scheduler: profiling: false.
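
To illustrate how flags like those listed above could be expressed in a Kind configuration, here is a minimal sketch. It is not DeepSeek V4 Pro's actual kind-config.yaml, which this report does not reproduce. The map-style extraArgs layout is an assumption that matches the kubeadm v1beta3 schema and may need adjusting for other kubeadm API versions. The TLS cipher list and etcd settings are left out.

```yaml
# Hypothetical kind-config.yaml fragment; flag values are taken from the bullets above.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            anonymous-auth: "false"
            authorization-mode: "Node,RBAC"
            enable-admission-plugins: "NodeRestriction,PodSecurity,AlwaysPullImages"
            profiling: "false"
            service-account-lookup: "true"
        controllerManager:
          extraArgs:
            profiling: "false"
            terminated-pod-gc-threshold: "500"
            use-service-account-credentials: "true"
        scheduler:
          extraArgs:
            profiling: "false"
```

All values are quoted because kubeadm passes extraArgs through as strings. This configuration was never tested in the run, so the sketch should be read the same way.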

Category scores:

  • Cluster Creation: 0/5 (never created; configuration correct but untested)
  • Audit Logging: 2/5 (well-designed policy, correct two-level mount, but untested)
  • PSS: 0/5 (no namespaces created, no PSA labels applied)
  • Network Policies: 0/5 (no policies created)
  • API Server: 4/5 (comprehensive hardening flags, untested in practice)
  • Kubelet: 4/5 (excellent KubeletConfiguration, correctly avoids Kind pitfalls, untested)
  • Additional Controls: 3/5 (good etcd/scheduler/controller-manager hardening, untested)
  • Agent Behaviour: 1/5 (good planning, run terminated before execution)

Notable: Inverse of the typical failure pattern — models like Gemini 3 Flash created minimal but working clusters, while DeepSeek V4 Pro designed an excellently hardened cluster but never built it. The configuration quality suggests 32-36/40 if execution had completed. API server and kubelet configurations are among the most comprehensive of any model. The premature termination appears to be an opencode/model interaction issue rather than a knowledge gap.


DeepSeek V4 Flash (2026-04-24)

Result: INCOMPLETE (12/40)

Approach: Created audit-policy.yaml (basic: RequestResponse for pods, Metadata for secrets/configmaps/events) and kind-config.yaml with audit mounts, PodSecurity + NodeRestriction admission plugins, and kubelet hardening (anonymous auth disabled, webhook auth). First kind create cluster failed (kubeadm init error). Rewrote kind-config.yaml, second attempt succeeded (Kind v1.35.0). Verified cluster with kubectl cluster-info and kubectl get nodes. Attempted to read audit log but got permission denied. Session ended without creating namespaces, applying PSS labels, deploying network policies, or performing any additional hardening.

Security features implemented:

  • Audit logging: Basic policy — pods at RequestResponse, secrets/configmaps/events at Metadata. Two-level mount configured. Permission denied when attempting to read logs.
  • API server: Node,RBAC authorization, NodeRestriction,PodSecurity admission plugins. No anonymous-auth=false, no profiling=false, no TLS hardening.
  • Kubelet: Anonymous auth disabled, webhook auth/authz configured.
  • PSS: PodSecurity admission plugin enabled but no namespace labels applied — no actual enforcement.
  • Network policies: None created.
  • Additional controls: None.
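
Enabling the PodSecurity admission plugin alone enforces nothing by default; enforcement comes from namespace labels (or a cluster-wide default). A minimal sketch of the step this run skipped is shown below. The namespace names follow the scenario used elsewhere in this report, and the tiering (restricted on test and production, baseline on development) is an assumption modelled on the successful runs.

```yaml
# Sketch only: Pod Security Admission labels per namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    # Assumed tiering: enforce baseline, but surface restricted violations.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
```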

Category scores:

  • Cluster Creation: 3/5 (created on 2nd attempt)
  • Audit Logging: 2/5 (basic policy, configured but permission error reading logs)
  • PSS: 1/5 (admission plugin enabled but no labels applied, no namespaces created)
  • Network Policies: 0/5
  • API Server Hardening: 2/5 (Node,RBAC authorization + NodeRestriction,PodSecurity plugins)
  • Kubelet Hardening: 3/5 (anonymous auth disabled, webhook auth/authz)
  • Additional Controls: 0/5
  • Agent Behaviour: 1/5 (recovered from error but declared done without completing the task)

Notable: A running cluster with minimal hardening. Unlike V4 Pro (which designed comprehensive configuration but never created the cluster), V4 Flash got the cluster running but stopped far too early — declaring success after basic cluster verification without creating any namespaces or applying security policies. The agent’s premature termination after encountering a permission denied error on the audit log suggests it treated this obstacle as a stopping point rather than something to work around. This is the weakest result among models that successfully created a cluster.


GLM-5.2 (2026-06-17, re-run with 900s timeout)

Result: TIMEOUT (25/40)

Approach: Multiple cluster create/delete cycles debugging disk pressure and API server issues. Wrote audit-policy.yaml, encryption-config.yaml, kind-config.yaml, and namespaces.yaml with PSS labels. Cluster was created after multiple attempts with comprehensive hardening configurations, but timed out before namespace policies could be applied.

Security features implemented:

  • Audit logging: Comprehensive audit-policy.yaml with JSON format and differentiated levels. Correct two-level mount pattern.
  • Encryption at rest: encryption-config.yaml configured for secrets encryption.
  • API server: Node,RBAC authorization, anonymous-auth=false, TLS min version 1.2, encryption at rest, comprehensive admission plugins. profiling=false.
  • Kubelet: Anonymous auth disabled, Webhook authorization, seccompDefault enabled.
  • Etcd: mTLS configured.
  • PSA: Namespace manifests written with PSS labels but never applied (timed out).
  • Network policies: No network policies attempted.
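
The "two-level mount pattern" is not spelled out in this report. The sketch below assumes it means mounting the policy file from the host into the Kind node container (extraMounts), then from the node into the API server pod (extraVolumes). File names and paths are hypothetical, and the rotation values are illustrative rather than GLM-5.2's settings.

```yaml
# Sketch only: audit policy mounted host -> Kind node -> kube-apiserver pod.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # Level 1: host file into the node container.
    extraMounts:
      - hostPath: ./audit-policy.yaml
        containerPath: /etc/kubernetes/policies/audit-policy.yaml
        readOnly: true
    kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
            audit-log-path: /var/log/kubernetes/audit.log
            audit-log-maxage: "30"
            audit-log-maxbackup: "10"
            audit-log-maxsize: "100"
          # Level 2: node directories into the API server static pod.
          extraVolumes:
            - name: audit-policies
              hostPath: /etc/kubernetes/policies
              mountPath: /etc/kubernetes/policies
              readOnly: true
              pathType: DirectoryOrCreate
            - name: audit-logs
              hostPath: /var/log/kubernetes
              mountPath: /var/log/kubernetes
              readOnly: false
              pathType: DirectoryOrCreate
```

If either level is missing, the API server cannot read the policy file and typically fails to start, which is one plausible source of the repeated create/delete cycles seen in several runs.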

Category scores:

  • Cluster Creation: 4/5 (required multiple create/delete cycles, eventually succeeded)
  • Audit Logging: 5/5 (comprehensive policy, JSON format, differentiated levels)
  • PSS: 2/5 (namespace manifests with PSS labels written but never applied)
  • Network Policies: 0/5 (no network policies attempted)
  • API Server Hardening: 5/5 (anonymous-auth=false, Node+RBAC, TLS 1.2, encryption at rest, comprehensive admission plugins)
  • Kubelet Hardening: 4/5 (anonymous auth disabled, Webhook auth, seccompDefault)
  • Additional Controls: 3/5 (encryption at rest, etcd mTLS, audit logging)
  • Agent Behaviour: 2/5 (multiple create/delete cycles consumed timeout budget, PSS manifests written but never applied)

Notable: A dramatic improvement from the initial run (5/40 to 25/40) after extending the timeout from 600s to 900s. Config quality rivals top-tier models — the API server hardening and audit logging are comprehensive — but execution speed limits the score. The model spent significant time debugging disk pressure and API server issues across multiple cluster create/delete cycles. PSS namespace manifests were written but never applied before the timeout. No network policies were attempted. Ties with Gemma 4 31B (LOCAL) at 14th place with matching scores but very different profiles: GLM-5.2 has strong API server and kubelet hardening but no applied PSS or network policies, while Gemma 4 31B has applied PSS and network policies but no API server or kubelet hardening.


Mistral Medium 3.5 (2026-06-18)

Result: TIMEOUT (22/40)

Approach: Created Kind cluster after 3 attempts (first failed due to invalid kubeletConfiguration field, second due to RBAC initialization failures, third succeeded with simplified config). Applied namespaces with PSA labels, default-deny network policies, RBAC roles, PDBs, and resource quotas. Timed out at 600s after 32 bash commands.

Security features implemented:

  • Audit logging: Functional with rotation, mounted in API server. Policy copied from reference material — RequestResponse for pods, Metadata for secrets/configmaps, catch-all Metadata.
  • PSA: Correct labels: production=restricted enforce/audit/warn, test=baseline enforce with restricted audit/warn, development=no restrictions. VERIFIED working.
  • Network policies: Default-deny-all in test and production. Missing DNS exception policy.
  • API server: Node+RBAC authorization, PodSecurity+NodeRestriction admission plugins, service-account-lookup=true. Missing anonymous-auth=false, profiling=false, TLS ciphers, encryption at rest.
  • Kubelet: No hardening beyond defaults.
  • Additional controls: Namespace RBAC, PDBs for control plane, LimitRanges + ResourceQuota in production.

Missing: anonymous-auth=false, profiling=false, TLS ciphers, TLS min version, encryption at rest, kubelet hardening, DNS exception in network policies.
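
The missing DNS exception matters because a default-deny egress policy also blocks name resolution. A minimal sketch of the pair of policies is shown below for the production namespace. The policy names are hypothetical, and the selector assumes cluster DNS runs in kube-system, which carries the automatic kubernetes.io/metadata.name label.

```yaml
# Sketch only: default deny plus a DNS egress exception.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all        # hypothetical name
  namespace: production
spec:
  podSelector: {}               # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress        # hypothetical name
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

The same pair would be repeated for the test namespace. These policies only take effect when the cluster's CNI enforces NetworkPolicy.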

Category scores:

  • Cluster Creation: 3/5
  • Audit Logging: 4/5
  • PSS: 5/5
  • Network Policies: 3/5
  • API Server Hardening: 2/5
  • Kubelet Hardening: 0/5
  • Additional Controls: 2/5
  • Agent Behaviour: 3/5

Notable: PSS enforcement was verified: privileged pod creation was correctly rejected in test and production. PSS was this model’s strongest category score. Three cluster creation attempts consumed ~4 minutes of the 10-minute budget.


Additional Guidance for Re-run

After analysing the March 9th failures, two systemic issues were identified:

  1. Hostname length limit: The tool-generated cluster names (e.g. dearbhadh-hardened-cluster-anthropic-claude-sonnet-4-6-03eca785) exceeded Linux’s 63-character hostname limit when Kind appended -control-plane. The name generator was fixed to produce short names (e.g. dh-claude-sonnet-4-6-93bea9ca).

  2. protectKernelDefaults incompatibility: All three timed-out models used protectKernelDefaults: true (or protect-kernel-defaults: "true"), which causes the kubelet to refuse to start in Kind. Each Kind node is a Docker container that shares the host kernel, so its kernel parameters don’t match the defaults the kubelet expects.

A new context file (kind-limitations.md) was added to the reference material covering both issues and the related seccomp-default flag. The four failed models were re-run on 2026-03-10 with this additional guidance.


Results Summary

| Model | Result | Score (/40) | Cluster Running | Namespaces + PSA | Network Policies | Audit Logs |
|---|---|---|---|---|---|---|
| Claude Sonnet 4.6 | Success (re-run) | 39 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1.9 MB) |
| Claude Opus 4.6 | Success | 38 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1258+ entries) |
| Claude Fable 5 | Success | 37 | Yes | Yes (tiered, dev baseline, test/prod restricted) | Yes (test/prod) | Yes |
| Claude Opus 4.8 | Success | 37 | Yes | Yes (cluster-wide + namespace) | Yes (test/prod) | Yes |
| Claude Opus 4.7 | Timeout* | 37 | Yes | Yes (cluster-wide + namespace) | Yes (test/prod) | Yes (852+ entries) |
| GPT 5.5 | Success | 35 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes |
| Qwen3.6-35b-a3b (LOCAL) | Success | 35 | Yes | Yes (restricted on test+prod, baseline on dev, cluster-wide) | Yes (test/prod) | Yes (3.6 MB) |
| GPT-5.4 | Success (re-run) | 34 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1.5 MB) |
| Qwen 3.6 Plus | Success | 32 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes |
| Kimi K2.6 | Timeout | 31 | Yes | Yes (restricted on test/prod, baseline on dev) | Yes (test/prod) | Yes |
| MiniMax M3 | Success | 29 | Yes | Yes (restricted on test+prod, baseline on dev) | Yes (test/prod) | Yes |
| Kimi K2.7 Code | Timeout | 29 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes |
| Gemini 3 Flash Preview | Success | 27 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (3.7 MB) |
| Gemma 4 31B (LOCAL) | Success | 25 | Yes | Yes (restricted on test+prod, baseline on dev) | Yes (test/prod) | Yes |
| GLM-5.2 | Timeout | 25 | Yes (multiple attempts) | No (manifests written, not applied) | No (not attempted) | Yes |
| Mistral Medium 3.5 | Timeout | 22 | Yes (3rd attempt) | Yes (restricted on test/prod) | Yes (test/prod, no DNS egress) | Yes |
| Qwen 3.7 Plus | Partial | 21 | Yes | Yes (restricted on test+prod, baseline on dev) | Yes (test/prod, no DNS egress) | Yes |
| MiniMax M2.7 | Timeout | 20 | No (timed out) | No (never applied) | No (never applied) | No (never verified) |
| DeepSeek V4 Pro | Incomplete | 14 | No (never created) | No (never applied) | No (never applied) | No (never verified) |
| DeepSeek V4 Flash | Incomplete | 12 | Yes (2nd attempt) | No (never applied) | No (never applied) | No (permission denied) |
| MiniMax M2.5 | Timeout (re-run) | 10 | Yes (3rd attempt) | No (timeout) | No (timeout) | Yes (audit only) |
| DeepSeek V3.2 | Timeout (re-run) | 2 | No | No | No | No |

*Opus 4.7 timed out during verification, not during setup — all hardening controls were in place and functional.


Claude Sonnet 4.6

March 9 — TIMEOUT

Root cause: Two failures: (1) cluster name too long (sethostname: invalid argument), (2) protectKernelDefaults: true in KubeletConfiguration prevented kubelet from starting. The model created a Docker wrapper script to work around the hostname issue but the kubelet never came up. Used a proper KubeletConfiguration object (best approach) with anonymous auth disabled, webhook authorization, and readOnlyPort: 0.

March 10 Re-run — SUCCESS

Approach: Created audit-policy.yaml and kind-config.yaml, then proceeded to cluster creation, namespace setup, PSA labelling, and network policy application. Methodical and efficient.

Security features implemented and verified:

  • Audit logging: Granular policy — secrets/configmaps at Metadata (no bodies), pods and RBAC at RequestResponse, health checks excluded. 1.9 MB audit.log generated. Log rotation configured.
  • API server: anonymous-auth: false, authorization-mode: Node,RBAC, enable-admission-plugins: NodeRestriction,PodSecurity, tls-min-version: VersionTLS12, strong cipher suites, profiling: false
  • Kubelet: Proper KubeletConfiguration object — anonymous auth disabled, Webhook authorization, readOnlyPort: 0, serverTLSBootstrap: true, TLS 1.2 minimum with strong cipher suites
  • Controller manager/scheduler: Profiling disabled on both, terminated-pod-gc-threshold: 10
  • Namespaces: development (enforce=baseline, warn=restricted), test and production (enforce=restricted, audit=restricted, warn=restricted)
  • Network policies: Default deny ingress+egress on test and production, DNS egress allowed
  • Verification: Tested privileged pod creation in test namespace — correctly rejected by PSA

Notable: Correctly heeded the protectKernelDefaults guidance. Used KubeletConfiguration object instead of kubeletExtraArgs (cleanest approach). Added serverTLSBootstrap and per-kubelet TLS cipher suite restrictions — unique among all models. Most comprehensive kubelet hardening overall.
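
For readers unfamiliar with the distinction, a KubeletConfiguration object can be supplied as its own patch in the Kind config rather than as kubeletExtraArgs flags. The sketch below is an approximation built from the settings listed above, not Sonnet 4.6's actual file; the cipher suite list is omitted.

```yaml
# Sketch only: kubelet hardening via a KubeletConfiguration patch.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        apiVersion: kubelet.config.k8s.io/v1beta1
        kind: KubeletConfiguration
        authentication:
          anonymous:
            enabled: false
          webhook:
            enabled: true
        authorization:
          mode: Webhook
        readOnlyPort: 0
        # Serving certificates requested this way need their CSRs approved.
        serverTLSBootstrap: true
        tlsMinVersion: VersionTLS12
        # protectKernelDefaults is deliberately left unset: per the Kind
        # limitations guidance, setting it to true stops the kubelet starting.
```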


GPT-5.4

March 9 — FAILED (cluster name too long)

Root cause: The cluster name dearbhadh-hardened-cluster-openai-gpt-5-4-ef349951 was too long. GPT 5.4 strictly followed the instruction to use the exact name and spent the entire session retrying, never shortening it. Had the most comprehensive planned manifests (resource quotas, limit ranges, cluster-wide AdmissionConfiguration) but none were applied.

March 10 Re-run — SUCCESS

Approach: Created config files directly (no Python script this time), built the cluster, then applied namespaces, PSA labels, and network policies. Explicitly cited the reference material in avoiding protectKernelDefaults and seccomp-default.

Security features implemented and verified:

  • Audit logging: Two-level mount pattern, working correctly (1.5 MB audit.log). Standard policy covering pods at RequestResponse, secrets/configmaps at Metadata.
  • API server: anonymous-auth: false, enable-admission-plugins: NodeRestriction,PodSecurity, profiling: false
  • Kubelet: Via kubeletExtraArgs: anonymous-auth: false, authorization-mode: Webhook, read-only-port: 0, rotate-server-certificates: true, streaming-connection-idle-timeout: 5m
  • Controller manager/scheduler: Profiling disabled on both, terminated-pod-gc-threshold: 10
  • Namespaces: development, test, production created. Test and production: enforce=restricted. kube-system labelled as privileged.
  • Network policies: Default deny ingress+egress on test and production, DNS egress allowed
  • Verification: Tested anonymous auth blocked (kubectl auth can-i --as system:anonymous), tested privileged pod rejection in test namespace
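
A "standard" audit policy of the kind described in the audit logging bullet might look like the sketch below. It is an illustration of the pattern (Metadata for secrets and configmaps so values are never logged, RequestResponse for pods, a catch-all at the end), not the file GPT-5.4 wrote. The omitStages entry and the catch-all rule are assumptions.

```yaml
# Sketch only: rules are evaluated in order and the first match wins.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived
rules:
  # Log who touched secrets/configmaps, but never their contents.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Full request and response bodies for pod operations.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods"]
  # Catch-all so nothing goes unrecorded.
  - level: Metadata
```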

Notable: Wisely referenced the Kind limitations guidance and explicitly avoided risky settings. Labelling kube-system as privileged was a good practical touch. Less ambitious security configuration than March 9 (no resource quotas, limit ranges, or AdmissionConfiguration this time) but achieved a complete working result.


Gemini 3 Flash Preview

March 9 — SUCCESS (original run, not re-run)

Approach: Organised files into a manifests/ subdirectory. Encountered the long cluster name problem, diagnosed it independently (68 characters exceeds 63-char limit), shortened the name, then hit an aescbc encryption key length error. Deleted the cluster, fixed the key, and rebuilt. Applied namespaces, PSA labels, and network policies.

Security features implemented and verified:

  • Audit logging: two-level mount pattern, working correctly (3.7 MB audit.log)
  • Encryption at rest: aescbc for Secrets (though with an example key)
  • PSA labels: enforce=restricted on test and production (verified)
  • Network policies: default deny ingress+egress on test and production (verified)
  • Namespaces: development, test, production created

What was missing:

  • No API server hardening (no anonymous-auth, no TLS settings, no profiling disable)
  • No kubelet hardening
  • No controller manager/scheduler hardening

Notable: The only model to succeed on the original run without additional guidance. Excellent debugging skills — diagnosed the aescbc key length error from API server container logs. However, security configuration was minimal beyond audit logging, encryption, PSA, and network policies.
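
The aescbc key length error is easier to picture with the configuration in view. The sketch below is a generic EncryptionConfiguration for Secrets, not Gemini's actual file. The key name is hypothetical and the secret is a placeholder that must be replaced with a base64-encoded random key (the Kubernetes documentation uses a 32-byte key); a value that decodes to an unsupported length is rejected when the API server starts.

```yaml
# Sketch only: encryption at rest for Secrets using aescbc.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1   # hypothetical key name
              # Placeholder: substitute a base64-encoded 32-byte random key.
              secret: "<BASE64_ENCODED_32_BYTE_KEY>"
      # Fallback so data written before encryption was enabled stays readable.
      - identity: {}
```

The file is passed to the API server with the encryption-provider-config flag and, in Kind, has to be mounted into the node and the API server pod in the same way as the audit policy.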


MiniMax M2.5

March 9 — TIMEOUT

Root cause: protect-kernel-defaults: true in kubeletExtraArgs prevented kubelet from starting. Also had structural issues — YAML document separators in kubeadmConfigPatch (actually valid but confusing), invalid kubelet authorization-mode: Node,RBAC (should be Webhook), and missing PodSecurity admission plugin.

March 10 Re-run — TIMEOUT (still failed)

What changed: The model heeded the hostname guidance (used the provided short name) and avoided protectKernelDefaults. However, it introduced a new fatal error.

New root cause: Used PodSecurityPolicy in the enable-admission-plugins list. PodSecurityPolicy was removed in Kubernetes 1.25 and the Kind image uses v1.32.2. The API server refused to start with a non-existent admission plugin.

Timeline:

  1. Attempt 1 (~4 min) — Failed. Config had PodSecurityPolicy, duplicate admission plugin fields at wrong YAML levels, invalid kubelet fields. API server never started.
  2. Attempt 2 (~4 min) — Failed. Rewrote config but kept PodSecurityPolicy. Same failure.
  3. Attempt 3 (~14 sec) — Succeeded. Stripped all extra API server args except audit logging. Cluster created.
  4. Timeout — The model verified the cluster with kubectl cluster-info and was about to apply security hardening, but the 600-second timeout hit.

End result: A near-default Kind cluster with only audit logging configured. No namespaces, no PSA labels, no network policies, no hardening beyond audit mounts.

Notable: Fixed one problem (protectKernelDefaults) but introduced another (PodSecurityPolicy). The model burned 8 of 10 minutes on two failed attempts before stripping its configuration down to a minimal working state. Demonstrates a pattern of over-configuring then debugging rather than building incrementally.


DeepSeek V3.2

March 9 — TIMEOUT

Root cause: protect-kernel-defaults: true and seccomp-default: true in kubeletExtraArgs prevented kubelet/API server from starting. The model had independently shortened the cluster name (good) but never got past the control-plane startup phase.

March 10 Re-run — TIMEOUT (still failed)

What changed: The model heeded the hostname guidance and initially set protectKernelDefaults: false (with a correct comment “IMPORTANT: Must be false for Kind”). However, it introduced the same fatal error as MiniMax.

New root cause: Used PodSecurityPolicy in the enable-admission-plugins list. Like MiniMax, this non-existent admission plugin prevented the API server from starting on Kubernetes v1.32.2.

Timeline:

  1. Attempt 1 (~5 min) — Failed. Config had PodSecurityPolicy, SeccompDefault feature gate, and seccomp-default. API server never started (kind create cluster killed by bash 120s timeout).
  2. Debug phase (~3 min) — Investigated the failure, checked docker logs, tried manual kubeconfig export. Misdiagnosed the problem as protectKernelDefaults: false rather than PodSecurityPolicy.
  3. Config fix — Removed protectKernelDefaults, seccompDefault, and SeccompDefault feature gate. Also changed PodSecurityPolicy to PodSecurity (the actual fix).
  4. Timeout — The 600-second timeout hit before the second kind create cluster could be executed.

End result: No cluster created. The corrected config (with PodSecurity instead of PodSecurityPolicy) would likely have worked but there was no time remaining to try it.

Notable: Excessive todowrite calls (6 total) consumed time. The model correctly identified and fixed the PodSecurityPolicy issue in its final config edit but attributed the fix to the wrong cause (protectKernelDefaults). Demonstrates good debugging instincts (the fix was correct) but slow execution.


Key Findings

Re-run Outcomes

  1. Additional guidance fixed 2 of 4 failures. Claude Sonnet 4.6 and GPT-5.4 both succeeded on the re-run, producing fully hardened clusters with audit logging, PSA enforcement, and network policies. The hostname fix and protectKernelDefaults guidance were sufficient for these models.

  2. MiniMax and DeepSeek V3.2 failed for a new reason: PodSecurityPolicy. Both models used the deprecated PodSecurityPolicy admission plugin (removed in Kubernetes 1.25) on a v1.32.2 cluster. This prevented the API server from starting. This was not caused by the hostname or protectKernelDefaults issues from the first run — it’s a separate knowledge gap about Kubernetes version compatibility.

  3. 13 of 15 model families now have successful or near-successful clusters. Claude (Sonnet, Opus 4.6, Opus 4.7, Opus 4.8, Fable 5), GPT (5.4, 5.5), Gemini 3 Flash, MiniMax M3, Qwen 3.6 Plus, Qwen 3.7 Plus, Qwen3.6-35b-a3b (Local), Gemma 4 31B (Local), Kimi K2.6, Kimi K2.7 Code, and Mistral Medium 3.5 all produced working hardened clusters. Opus 4.7, Kimi K2.6, Kimi K2.7 Code, and Mistral Medium 3.5 timed out during later stages (not initial setup) with hardening controls in place. Qwen 3.7 Plus achieved a partial result with PSS and network policies but no deep hardening. DeepSeek V4 Flash created a running cluster but applied no security policies beyond the initial Kind config. DeepSeek V4 Pro remains without a successful cluster. GLM-5.2 improved from 5/40 to 25/40 (tied 14th) after a re-run with 900s timeout, producing comprehensive hardening configs but timing out before applying namespace policies.

  4. Claude Opus 4.7 introduces K8s 1.35 features. The use of AuthenticationConfiguration with AnonymousAuthConfigurableEndpoints to solve the anonymous-auth + health probe conflict is the most sophisticated solution of any model. Previous models either accepted 0/1 Ready state (Opus 4.6) or didn’t disable anonymous auth. Opus 4.7’s dual-layer PSA (cluster-wide AdmissionConfiguration + namespace labels) is also unique.
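
For context on the last finding, the sketch below shows the general shape of an AuthenticationConfiguration that permits anonymous requests only on health endpoints. It is not Opus 4.7's actual file. The apiVersion shown is an assumption that depends on the Kubernetes release (earlier releases used a beta version), and the endpoint list is the commonly documented set.

```yaml
# Sketch only: anonymous access limited to health endpoints.
apiVersion: apiserver.config.k8s.io/v1   # assumed; version-dependent
kind: AuthenticationConfiguration
anonymous:
  enabled: true
  conditions:
    - path: /livez
    - path: /readyz
    - path: /healthz
```

The file is referenced with the API server's authentication-config flag. The anonymous-auth flag should not be set at the same time, since the two mechanisms are alternatives.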

Comparative Analysis (Successful Models)

| Feature | Fable 5 (2026-06-10) | Opus 4.8 (2026-05-31) | Opus 4.7 (2026-04-20) | Opus 4.6 (2026-03-25) | Sonnet 4.6 (re-run) | GPT 5.5 (2026-04-25) | Qwen-35b LOCAL (2026-05-03) | GPT 5.4 (re-run) | Qwen 3.6 Plus | Kimi K2.6 | MiniMax M3 (2026-06-08) | K2.7 Code (2026-06-16) | Gemini 3 Flash (original) | Gemma 4 31B LOCAL (2026-05-03) | Qwen 3.7 Plus (2026-06-05) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Audit logging | Excellent (comprehensive, noise filtering, rotation) | Excellent (comprehensive policy, rotation, 30d/10/100MB) | Best (granular policy, noise filtering, rotation) | Best (granular policy, noise filtering) | Best (granular policy, noise filtering) | Good (functional policy, two-level mount) | Excellent (granular, noise filtering, 3.6MB) | Good (standard policy) | Basic (pods, secrets/configmaps, catch-all) | Good (two-level mount, pods+secrets/auth) | Good (multi-level with rotation) | Good (comprehensive policy, RequestResponse for pods/RBAC/exec) | Good (standard policy) | Good (standard policy, two-level mount) | Basic (pods, secrets/configmaps, rotation) |
| API server hardening | Good (NodeRestriction, profiling disabled, TLS 1.2, no anon-auth) | Best (anon-auth=false, profiling, encryption, TLS 1.2, ciphers, SA lookup) | Best (AuthenticationConfig, 17 plugins, profiling) | Best (TLS, ciphers, encryption, profiling) | Best (TLS, ciphers, timeout, profiling) | Good (admission plugins, profiling disabled) | Good (AuthConfig, profiling, SA lookup, 4 plugins) | Good (basic hardening) | Partial (7 plugins, TLS 1.2, no anon-auth/profiling) | Partial (6 plugins, Node,RBAC, no anon-auth/profiling) | Good (anon-auth=false, TLS 1.2, ciphers, profiling disabled) | Partial (TLS ciphers, NodeRestriction, cm/sched localhost binding; no anon-auth/profiling) | None | None (basic admission plugins only) | None (audit logging flags only) |
| Kubelet hardening | Good (seccompDefault, streamingConnectionIdleTimeout, no readOnlyPort=0) | Excellent (anon disabled, Webhook, readOnlyPort=0, TLS 1.2+ciphers) | Excellent (KubeletConfiguration + TLS ciphers) | Excellent (KubeletConfiguration object) | Best (KubeletConfiguration + TLS bootstrap) | Good (anon disabled, Webhook, readOnlyPort=0) | Good (anon disabled, Webhook, readOnlyPort=0, TLS ciphers) | Good (kubeletExtraArgs) | Good (dual config, anon disabled, Webhook, readOnlyPort=0) | Good (anon disabled, Webhook, readOnlyPort=0) | Partial (readOnlyPort 0, anon disabled, Webhook, via Kind config) | None | None | None | None (removed after error) |
| Controller/scheduler | None | Both hardened (profiling disabled) | Both hardened (profiling, pod GC) | None | Profiling disabled, pod GC | Profiling disabled | Both hardened (profiling) | Profiling disabled, pod GC | None | None | None | Partial (localhost binding) | None | None | None |
| PSA enforcement | Tiered (dev baseline, test/prod restricted) | Best (cluster-wide + namespace, baseline on dev) | Best (cluster-wide + namespace, baseline on dev) | Restricted on test/prod | Restricted on test/prod, baseline on dev | Restricted on test/prod | Best (cluster-wide + namespace, baseline on dev) | Restricted on test/prod | Restricted on test/prod | Restricted on test/prod, baseline on dev | Restricted on test/prod, baseline on dev | Restricted on test/prod (missing dev) | Restricted on test/prod | Restricted on test/prod, baseline on dev | Restricted on test/prod, baseline on dev |
| Network policies | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod (scoped to kube-dns) | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS + intra-ns on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all on test/prod (no DNS egress) |
| Encryption at rest | No | Yes (AES-CBC) | No | Yes (aescbc) | No | Yes | No | No | No | No | No | No | Yes (aescbc) | No | No |
| ResourceQuotas/LimitRanges | Yes (test/prod) | No | Yes (all 3 namespaces) | Yes (test/prod) | No | Yes | Yes (all 3 namespaces) | No | Yes (test/prod, most detailed) | Yes (all 3 namespaces) | No | Yes | No | No | No |
| SA token restriction | No | No | Yes (default SA in all ns) | No | No | No | Yes (default SA in all ns) | No | No | No | No | Yes | No | No | No |
| Verification | Basic checks | PSA enforcement test, network policy isolation | Timed out before verification | PSA rejection test, audit check | PSA rejection test, audit check | Basic checks | Verification testing completed | Anonymous auth test, PSA test | Applied manifests, basic checks | PSA test pod, timed out | PSA enforcement test, network policy isolation | Basic checks | PSA via kubectl, audit check | Basic cluster checks | Basic checks |

Best overall: Claude Sonnet 4.6 (39/40), with the most comprehensive security across all layers. Other ranked results:

  • Claude Opus 4.6 (38/40): close behind, with the broadest feature set (encryption, quotas).
  • Claude Fable 5, Claude Opus 4.8, and Claude Opus 4.7 (all 37/40): tied at 3rd place. Fable 5 is distinguished by first-attempt Calico CNI selection, a comprehensive audit policy with noise filtering, tiered PSS, and LimitRanges/ResourceQuotas for test/production. Opus 4.8 is distinguished by proactive Calico CNI recreation for NetworkPolicy enforcement and by encryption at rest. Opus 4.7 has a unique AuthenticationConfiguration and dual-layer PSA but timed out during verification.
  • GPT 5.5 and Qwen3.6-35b-a3b (both 35/40): tied at 6th place. GPT 5.5 includes encryption at rest, ResourceQuotas, LimitRanges, and controller-manager/scheduler hardening. Qwen3.6-35b-a3b (a local 35B model) matches GPT 5.5’s score with cluster-wide AdmissionConfiguration, per-endpoint anonymous auth, SA token restriction, and ResourceQuotas/LimitRanges on all 3 namespaces, which is impressive for a locally hosted model running with an extended timeout.
  • Qwen 3.6 Plus (32/40): a solid result with good PSA, network policies, and the most detailed ResourceQuotas of any model, but lacking depth in API server hardening.
  • Kimi K2.6 (31/40): good security coverage with PSA, network policies including intra-namespace rules, and ResourceQuotas/LimitRanges for all namespaces, but repeated cluster creation failures consumed the timeout budget.
  • MiniMax M3 (29/40): a major improvement for the MiniMax family (M2.5: 10, M2.7: 20, M3: 29). First-attempt success with clean, methodical execution. Good API server hardening (anonymous-auth=false, TLS 1.2+, profiling disabled) and kubelet hardening via Kind config. Verified both PSS enforcement and network policy isolation. Missing encryption at rest and ResourceQuotas.
  • Kimi K2.7 Code (29/40): ties with MiniMax M3 at 11th place. Comprehensive audit policy and ResourceQuotas/LimitRanges, but no kubelet hardening.
  • Gemma 4 31B (25/40): the second local model tested. It achieved correct PSS enforcement, audit logging, and network policies in just 5 tool calls, but applied no API server or kubelet hardening. This demonstrates that the security fundamentals (PSS, network policies, audit logging) are within reach of smaller local models, while deeper hardening remains a gap.
  • Qwen 3.7 Plus (21/40): achieved PSS and network policies but no API server or kubelet hardening beyond audit logging. Its network policies lack DNS egress, rendering them non-functional. A conservative approach after recovering from an initial PodSecurity feature gate error meant deeper hardening was never attempted.

Note: DeepSeek V4 Pro is excluded from this table as it never created a cluster, but its designed configuration (API server, kubelet, controller-manager, scheduler, etcd hardening) was among the most comprehensive of any model tested. Had execution completed, it would likely have placed in the 32-36/40 range based on configuration quality alone. DeepSeek V4 Flash is also excluded — while it created a running cluster, it applied no post-creation hardening (no namespaces, no PSS labels, no network policies). MiniMax M2.5 and M2.7 are excluded as neither produced a running cluster within the timeout. GLM-5.2 is excluded — while the re-run with 900s timeout created a running cluster with strong API server and kubelet hardening, PSS namespace manifests were never applied and no network policies were attempted. Mistral Medium 3.5 is excluded — while it created a running cluster after 3 attempts, it timed out at 22/40 with strong PSS enforcement (5/5 verified) but no kubelet hardening and limited API server hardening.

Original vs Re-run Key Findings

  1. The hostname issue was a test framework bug, not a model bug. The tool-generated names were too long. GPT 5.4’s strict adherence to the “MUST use this name” instruction was actually correct behaviour — the instruction was wrong. Fixed by shortening the generated names.

  2. protectKernelDefaults was a reasonable choice that doesn’t work in Kind. Models that set this flag were making the right security decision for production clusters. The fact that it’s incompatible with Kind is a platform limitation, not a security knowledge gap. The guidance correctly reframes this.

  3. PodSecurityPolicy is a knowledge currency problem. MiniMax and DeepSeek V3.2 both used the deprecated PodSecurityPolicy (removed in K8s 1.25) instead of the current PodSecurity admission plugin. This suggests their training data may be weighted toward older Kubernetes documentation. This was not addressed in the additional guidance and could be added as further context if needed.

  4. Time management remains critical. Even with guidance, MiniMax needed 3 attempts and DeepSeek spent too long debugging. Models that build incrementally (Claude, GPT 5.4, Qwen 3.6 Plus) outperform those that attempt comprehensive configs that fail (MiniMax, DeepSeek V3.2).

  5. Qwen 3.6 Plus demonstrates strong fundamentals with gaps in depth. Qwen 3.6 Plus achieved correct PSA, network policies, and the most detailed ResourceQuotas (including PVC and pod-level limits) but missed API server hardening basics like anonymous-auth=false and profiling=false. This pattern — strong on Kubernetes-native security features, weaker on API server flag-level hardening — distinguishes it from the Claude models.

  6. DeepSeek V4: contrasting failure modes. V4 Pro produced one of the most comprehensive hardening configurations of any model (API server, kubelet, etcd, controller-manager, scheduler) but the opencode session terminated before kind create cluster was ever run. V4 Flash took the opposite approach — created a running cluster on the second attempt but stopped after basic verification without creating namespaces or applying any security policies. Together they illustrate a spectrum: V4 Pro over-planned and never executed, while V4 Flash under-planned and declared victory too early. Neither DeepSeek V4 variant demonstrated the iterative build-and-harden workflow that successful models (Claude, GPT 5.4, Qwen 3.6 Plus) used.



Dearbhadh — LLM Kubernetes Security Assessment Tool
