Report Card: Hardened Cluster Creation
Test type: Secure Cluster Creation
Original date: 2026-03-09
Re-run date: 2026-03-10 (4 failed models re-run with additional guidance)
Claude Opus 4.6 added: 2026-03-25 | MiniMax M2.7 added: 2026-03-28 | Claude Opus 4.7 added: 2026-04-20
Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24 | GPT 5.5 added: 2026-04-25
Kimi K2.6 added: 2026-04-26 | Qwen3.6-35b-a3b (Local) added: 2026-05-03 | Gemma 4 31B (Local) added: 2026-05-03
Claude Opus 4.8 added: 2026-05-31
Scenario: Create a hardened Kubernetes cluster using Kind with comprehensive security controls (audit logging, PSS, network policies, API server hardening, kubelet hardening, etc.)
Timeout: 600 seconds (10 minutes)
Claude Opus 4.8 (2026-05-31)
Result: SUCCESS (37/40)
Approach: Created the configuration files and the cluster, which came up on the first attempt. After discovering that the default CNI does not enforce NetworkPolicy, deleted the cluster and recreated it with Calico as the CNI. Applied namespaces with PSA labels and network policies, then verified PSA enforcement and network policy isolation.
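The recreate step amounts to a Kind config that switches off the built-in CNI so Calico can be installed in its place. The sketch below is an illustration, not the model's actual file: it assumes Kind's networking.disableDefaultCNI and networking.podSubnet fields and the 192.168.0.0/16 pod CIDR that Calico's quick-start manifests commonly expect, and the helper name is hypothetical. It prints JSON, which should be accepted wherever Kind expects YAML.

```python
import json


def kind_config_for_calico(pod_subnet: str = "192.168.0.0/16") -> dict:
    """Hypothetical helper: a Kind cluster config with the default CNI disabled.

    Nodes stay NotReady until a CNI is applied; installing Calico afterwards
    gives the cluster a CNI that enforces NetworkPolicy.
    """
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "networking": {
            "disableDefaultCNI": True,  # skip Kind's bundled CNI
            "podSubnet": pod_subnet,    # assumed to match the Calico IP pool
        },
        "nodes": [{"role": "control-plane"}],
    }


if __name__ == "__main__":
    # Save the output to a file and pass it to: kind create cluster --config <file>
    print(json.dumps(kind_config_for_calico(), indent=2))
```

Once the cluster exists, the Calico manifest is applied with kubectl. Only after that will a default-deny policy actually block traffic. The Calico manifest URL and version are deliberately left out here.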
Security features implemented:
- Audit logging: Correct two-level mount pattern. Comprehensive policy — pods at RequestResponse, secrets/configmaps at Metadata (avoids logging values), RBAC resources included, network policies covered. Log rotation configured (30 days, 10 backups, 100 MB max).
- PSA: Cluster-wide restricted default via AdmissionConfiguration with system namespace exemptions. dev=baseline enforce + restricted audit/warn. test+production=restricted enforce. The strongest PSS configuration tier.
- Network policies: Default deny ingress+egress on test and production namespaces. DNS egress correctly scoped to kube-system.
- API server: anonymous-auth=false, profiling=false, NodeRestriction admission plugin, encryption at rest (AES-CBC), TLS min version 1.2 with strong cipher suites, service-account-lookup=true.
- Kubelet: readOnlyPort: 0, anonymous auth disabled, Webhook authorization, TLS min version 1.2 with strong cipher suites. Correctly avoided protectKernelDefaults and seccomp-default (Kind-incompatible).
- Controller-manager/scheduler: Profiling disabled on both.
- Additional controls: Calico CNI for NetworkPolicy enforcement, encryption at rest.
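Encryption at rest is driven by an EncryptionConfiguration file, which is passed to the API server with --encryption-provider-config. Below is a minimal sketch that generates one for AES-CBC. The key name is arbitrary. In a Kind cluster the file would still need to reach the API server pod through the same two-level mount used for the audit policy. Treat this as illustrative, not as the file any model produced.

```python
import base64
import os


def aescbc_encryption_config(key_name: str = "key1") -> dict:
    """Sketch of an EncryptionConfiguration that encrypts Secrets with AES-CBC."""
    # 32 random bytes give an AES-256 key, base64-encoded as the API server expects.
    secret = base64.b64encode(os.urandom(32)).decode("ascii")
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # The first provider is used to encrypt new writes.
                    {"aescbc": {"keys": [{"name": key_name, "secret": secret}]}},
                    # identity last, so Secrets written before encryption stay readable.
                    {"identity": {}},
                ],
            }
        ],
    }
```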
Category scores:
- Cluster Creation: 4/5 (required recreating cluster for Calico CNI)
- Audit Logging: 5/5 (correct two-level mount, comprehensive policy, rotation)
- PSS: 5/5 (cluster-wide restricted default, tiered namespace enforcement)
- Network Policies: 5/5 (default deny + DNS egress, Calico enforcement verified)
- API Server Hardening: 5/5 (anonymous-auth=false, profiling=false, encryption at rest, TLS hardening, service-account-lookup)
- Kubelet Hardening: 5/5 (anonymous disabled, webhook authz, readOnlyPort 0, TLS ciphers, correctly avoided Kind pitfalls)
- Additional Controls: 4/5 (Calico CNI, encryption at rest, but no ResourceQuotas/LimitRanges)
- Agent Behaviour: 4/5 (efficient execution, proactive Calico recreation for NetworkPolicy enforcement)
Notable: The decision to recreate the cluster with Calico after discovering that the default CNI does not enforce NetworkPolicy demonstrates strong operational awareness; most models applied network policies without verifying enforcement. Ties with Opus 4.7 at 37/40. Strong across all security categories, with the only gaps being ResourceQuotas/LimitRanges and ServiceAccount token restriction.
Qwen3.6-35b-a3b — LOCAL (2026-05-03)
Result: SUCCESS (35/40)
Note: This is a LOCAL model (35B-parameter MoE, running on LM Studio). Timeout was extended to 30 minutes (vs 10 minutes standard) to accommodate slower local inference.
Approach: Created configuration files and cluster. Required 3 attempts at cluster creation before succeeding, but once running applied all security configurations correctly and performed verification testing. Cluster name: dh-qwen3-6-35b-a3b-a41177e4.
Security features implemented:
- Audit logging: Correct two-level mount pattern. Excellent policy: security-sensitive resources at RequestResponse, secrets at Metadata (avoids logging values), omits health/metrics noise. Audit log actively writing (3.6MB by end).
- API server: profiling disabled on API server/controller-manager/scheduler. service-account-lookup true. Comprehensive admission plugins (PodSecurity, NodeRestriction, LimitRanger, ResourceQuota). Used newer AuthenticationConfiguration for per-endpoint anonymous auth.
- Kubelet: anonymous auth disabled, webhook auth+authz, readOnlyPort 0, strong TLS cipher suites. Missing streamingConnectionIdleTimeout and rotateCertificates.
- PSA: dev: enforce baseline + warn/audit restricted. test+production: enforce restricted. Cluster-wide AdmissionConfiguration defaults to restricted with system namespace exemptions.
- Network policies: Default deny ingress+egress on test+production. DNS egress correctly scoped to kube-system/kube-dns on port 53 UDP+TCP.
- Additional controls: automountServiceAccountToken false on default SA in all namespaces. ResourceQuotas and LimitRanges on all 3 namespaces. Namespace-scoped RBAC roles. PodDisruptionBudget for CoreDNS.
Missing: streamingConnectionIdleTimeout, rotateCertificates, explicit anonymous-auth=false flag (used AuthenticationConfiguration instead).
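The namespace-level controls credited here can be sketched as plain manifests: automount disabled on the default ServiceAccount, plus a ResourceQuota and a LimitRange per namespace. The object names, namespace names and resource figures below are illustrative assumptions, not the model's values. The quota and LimitRange are paired on purpose. Once a quota constrains CPU and memory, pods that omit requests or limits are rejected unless a LimitRange supplies defaults.

```python
NAMESPACES = ["development", "test", "production"]  # assumed namespace names


def namespace_guardrails(ns: str) -> list:
    """Return illustrative guardrail manifests for one namespace."""
    default_sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "default", "namespace": ns},
        # Pods using the default SA no longer get an API token mounted automatically.
        "automountServiceAccountToken": False,
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": ns},
        "spec": {"hard": {"requests.cpu": "4", "requests.memory": "8Gi",
                          "limits.cpu": "8", "limits.memory": "16Gi", "pods": "20"}},
    }
    limits = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": ns},
        "spec": {"limits": [{
            "type": "Container",
            "default": {"cpu": "500m", "memory": "512Mi"},        # default limits
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # default requests
        }]},
    }
    return [default_sa, quota, limits]
```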
Category scores:
- Cluster Creation: 4/5 (3 attempts)
- Audit Logging: 5/5 (correct two-level mount, excellent policy)
- PSS: 5/5 (restricted on test+prod, baseline on dev, cluster-wide defaults)
- Network Policies: 5/5 (default deny + DNS egress)
- API Server Hardening: 4/5 (comprehensive, used AuthConfig approach)
- Kubelet Hardening: 4/5 (anonymous disabled, webhook, readOnlyPort 0, TLS ciphers)
- Additional Controls: 5/5 (SA token, quotas, limits, RBAC, PDB)
- Agent Behaviour: 3/5 (3 creation attempts, initial directory confusion)
Notable: Ties with GPT 5.5 at 35/40 — impressive for a 35B local model. The security knowledge demonstrated (cluster-wide AdmissionConfiguration, per-endpoint anonymous auth, comprehensive audit policy) matches or exceeds several larger cloud-hosted models. The extended timeout (30 min vs 10 min) accommodated the slower inference speed without affecting the quality of output.
Gemma 4 31B — LOCAL (2026-05-03)
Result: SUCCESS (25/40)
Note: This is a LOCAL model (31B dense, running on LM Studio). Timeout was extended to accommodate slower local inference.
Approach: Created configuration files and cluster in 5 tool calls total — the most minimal execution of any model tested. Successfully created the cluster on the first attempt. Applied PSS labels and network policies, but did not configure API server hardening or kubelet hardening beyond defaults.
Security features implemented:
- Audit logging: Correct two-level mount pattern. Standard policy covering pods at RequestResponse, secrets/configmaps at Metadata. Audit log confirmed writing.
- PSA: test and production namespaces: enforce restricted. development: enforce baseline. Correct tiered enforcement.
- Network policies: Default deny ingress+egress on test and production. DNS egress correctly scoped to kube-system on port 53.
- API server: Basic admission plugins (NodeRestriction, PodSecurity). No anonymous-auth=false, no profiling=false, no TLS hardening, no service-account-lookup=true.
- Kubelet: No hardening; default kubelet configuration only.
- Additional controls: No ResourceQuotas, no LimitRanges, no controller-manager/scheduler hardening, no encryption at rest, no SA token restriction.
Missing: API server anonymous-auth, profiling disable, TLS hardening. Kubelet hardening entirely absent. No ResourceQuotas or LimitRanges. No controller-manager/scheduler hardening.
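The gap described here can be made concrete with a small lint over an API server's extraArgs. This is a hypothetical check written for illustration, not the harness used to score the models. It requires each flag to be set explicitly, mirroring how the report credits them, even where an upstream default would already match (service-account-lookup, for instance, already defaults to true).

```python
# Explicit settings this report credits as API server hardening (illustrative).
EXPECTED_APISERVER_ARGS = {
    "anonymous-auth": {"false"},
    "profiling": {"false"},
    "service-account-lookup": {"true"},
    "tls-min-version": {"VersionTLS12", "VersionTLS13"},
}
REQUIRED_ADMISSION_PLUGINS = {"NodeRestriction", "PodSecurity"}


def apiserver_findings(extra_args: dict) -> list:
    """Return human-readable gaps in a kube-apiserver extraArgs map."""
    findings = []
    for flag, accepted in EXPECTED_APISERVER_ARGS.items():
        got = extra_args.get(flag)
        if got not in accepted:
            findings.append(f"{flag}: expected one of {sorted(accepted)}, got {got!r}")
    enabled = {p.strip() for p in extra_args.get("enable-admission-plugins", "").split(",")
               if p.strip()}
    for plugin in sorted(REQUIRED_ADMISSION_PLUGINS - enabled):
        findings.append(f"enable-admission-plugins: missing {plugin}")
    return findings
```

Run against a configuration like the one described above, which sets only the two admission plugins, the check reports four missing flags.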
Category scores:
- Cluster Creation: 5/5 (successful first attempt)
- Audit Logging: 5/5 (correct two-level mount, functional policy)
- PSS: 4/5 (restricted on test+prod, baseline on dev — missing cluster-wide AdmissionConfiguration)
- Network Policies: 4/5 (default deny + DNS egress, properly scoped)
- API Server Hardening: 1/5 (basic admission plugins only, no hardening flags)
- Kubelet Hardening: 1/5 (no hardening — default configuration)
- Additional Controls: 1/5 (no quotas, limits, or additional hardening)
- Agent Behaviour: 4/5 (extremely efficient — 5 tool calls, cluster created first attempt)
Notable: The most minimal execution of any model, with only 5 tool calls in total. This extreme efficiency comes at the cost of security depth: the model achieved cluster creation, basic PSS, audit logging, and network policies, but skipped all API server and kubelet hardening. A striking contrast to Qwen3.6-35b-a3b (also a local model), which used more calls but achieved 35/40 with stronger API server and kubelet configurations. Ranks just below Gemini 3 Flash Preview (27/40), with a very different profile (Gemini focused on operational reliability; Gemma 4 31B on minimal but correct security foundations).
Kimi K2.6 (2026-04-26)
Result: TIMEOUT (31/40)
Approach: Created configuration files and cluster, but took 5+ attempts at cluster creation, timing out before completing verification. Cluster name: dh-kimi-k2-6-57c1770f.
Security features implemented:
- Audit logging: Correct two-level mount pattern. Policy covers pods (RequestResponse), secrets/configmaps/auth/authorization (Metadata).
- API server: Node,RBAC authorization, admission plugins (NodeRestriction, NamespaceLifecycle, LimitRanger, ServiceAccount, ResourceQuota, PodSecurity), audit log rotation. Missing: anonymous-auth=false, profiling=false.
- Kubelet: readOnlyPort=0, anonymous auth disabled, Webhook authorization. Missing: TLS cipher suite, certificate rotation.
- PSA: development=baseline enforce + restricted audit/warn; test and production=restricted enforce/audit/warn.
- Network policies: Default deny + DNS egress to kube-system + intra-namespace communication for test and production namespaces.
- Additional controls: LimitRanges and ResourceQuotas for all 3 namespaces, PSA test pod verification.
Missing: No encryption at rest, no anonymous-auth=false on API server, no profiling=false. No TLS cipher suite on kubelet, no certificate rotation. No controller-manager/scheduler hardening.
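The policy set described for the test and production namespaces might look like the sketch below: default deny, DNS egress to kube-system, and intra-namespace traffic. Policy names are hypothetical. The selectors rely on two labels: kubernetes.io/metadata.name, which Kubernetes sets on every namespace, and k8s-app: kube-dns, which kubeadm-based clusters such as Kind put on CoreDNS. The helper at the end shows the scoping distinction this report scores. A namespaceSelector and a podSelector in the same peer are ANDed, whereas a bare podSelector matches pods in the policy's own namespace only.

```python
def _policy(name: str, ns: str, spec: dict) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": ns},
        "spec": spec,
    }


def baseline_network_policies(ns: str) -> list:
    # Empty podSelector selects every pod; both types with no rules = deny all.
    deny_all = _policy("default-deny-all", ns,
                       {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]})
    allow_dns = _policy("allow-dns-egress", ns, {
        "podSelector": {},
        "policyTypes": ["Egress"],
        "egress": [{
            # Both selectors in ONE peer: only kube-dns pods inside kube-system.
            "to": [{
                "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
                "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
            }],
            "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
        }],
    })
    # A bare podSelector peer matches pods in the policy's own namespace only.
    allow_same_ns = _policy("allow-same-namespace", ns, {
        "podSelector": {},
        "policyTypes": ["Ingress", "Egress"],
        "ingress": [{"from": [{"podSelector": {}}]}],
        "egress": [{"to": [{"podSelector": {}}]}],
    })
    return [deny_all, allow_dns, allow_same_ns]


def peer_scoped_to_kube_system(peer: dict) -> bool:
    """True only if the peer's namespaceSelector pins the kube-system namespace."""
    labels = peer.get("namespaceSelector", {}).get("matchLabels", {})
    return labels.get("kubernetes.io/metadata.name") == "kube-system"
```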
Category scores:
- Cluster Creation: 3/5 (created but took 5+ attempts)
- Audit Logging: 4/5 (correct two-level mount, functional policy)
- PSS: 5/5 (restricted enforce/audit/warn on test+production, baseline on development)
- Network Policies: 5/5 (default deny + DNS egress + intra-namespace)
- API Server Hardening: 4/5 (good admission plugins and authorization, missing anon-auth and profiling)
- Kubelet Hardening: 4/5 (anonymous auth disabled, Webhook auth, readOnlyPort=0)
- Additional Controls: 4/5 (LimitRanges and ResourceQuotas for all 3 namespaces)
- Agent Behaviour: 2/5 (5+ creation attempts, timed out before completing verification)
Notable: Solid security configuration across PSA, network policies, and resource controls, with LimitRanges and ResourceQuotas deployed to all three namespaces. The main weakness was operational — repeated cluster creation attempts consumed the majority of the timeout budget, leaving no time for verification. The security knowledge is strong but the agent’s iterative debugging approach was inefficient.
GPT 5.5 (2026-04-25)
Result: SUCCESS (35/40)
Approach: Created configuration files and built the Kind cluster with comprehensive hardening. Successfully created the cluster, applied namespaces with PSA labels, network policies, and additional security controls.
Security features implemented:
- Audit logging: Policy configured with two-level mount pattern
- API server: Admission plugins including PodSecurity and NodeRestriction, profiling disabled
- Kubelet: Anonymous auth disabled, Webhook authorization, readOnlyPort=0, cert rotation
- PSA: Restricted enforce/audit/warn on test and production namespaces
- Network policies: Default deny ingress+egress on test and production, DNS egress allowed
- Additional controls: ResourceQuotas and LimitRanges on namespaces, encryption at rest, controller-manager and scheduler profiling disabled
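Controller-manager and scheduler flags such as profiling=false typically reach a Kind cluster through a kubeadm ClusterConfiguration patch. Check one detail against the Kind release in use: kubeadm's v1beta3 API takes extraArgs as a map, while v1beta4 takes a list of name/value pairs. The sketch below builds either shape. The function names are hypothetical, and the flag values are taken from configurations described in this report.

```python
def to_v1beta4_args(args: dict) -> list:
    """Convert v1beta3-style map extraArgs into v1beta4's name/value list."""
    return [{"name": name, "value": value} for name, value in sorted(args.items())]


def control_plane_patch(api_version: str = "kubeadm.k8s.io/v1beta4") -> dict:
    """ClusterConfiguration fragment disabling profiling on controller-manager and scheduler."""
    cm_args = {"profiling": "false", "terminated-pod-gc-threshold": "10"}
    sched_args = {"profiling": "false"}
    if api_version.endswith("/v1beta4"):
        cm_args, sched_args = to_v1beta4_args(cm_args), to_v1beta4_args(sched_args)
    return {
        "apiVersion": api_version,
        "kind": "ClusterConfiguration",
        "controllerManager": {"extraArgs": cm_args},
        "scheduler": {"extraArgs": sched_args},
    }
```

In a Kind config, the serialized fragment would be embedded as a string entry under the control-plane node's kubeadmConfigPatches.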
Category scores:
- Cluster Creation: 5/5 (successful creation)
- Audit Logging: 4/5 (functional policy with two-level mount)
- PSS: 5/5 (restricted enforce/audit/warn on test+production)
- Network Policies: 5/5 (default deny + DNS egress)
- API Server Hardening: 4/5 (good hardening flags, profiling disabled)
- Kubelet Hardening: 4/5 (anonymous auth disabled, Webhook auth, readOnlyPort=0)
- Additional Controls: 4/5 (ResourceQuotas, LimitRanges, encryption at rest, controller-manager+scheduler hardening)
- Agent Behaviour: 4/5 (efficient execution, completed within timeout)
Notable: A strong result that matches the upper tier of successful models. The inclusion of encryption at rest, ResourceQuotas, LimitRanges, and disabled controller-manager/scheduler profiling demonstrates broad security awareness. The content filter restrictions that blocked GPT 5.5 in all pentest scenarios did not affect the cluster creation task, which is a defensive rather than offensive security exercise.
Claude Opus 4.6 (2026-03-25)
Result: SUCCESS (38/40)
Approach: Created all config files first (audit-policy.yaml, kind-config.yaml, encryption-config.yaml), then attempted cluster creation. First attempt failed (likely Kind-incompatible kubelet setting). Self-diagnosed, deleted the failed cluster, fixed kind-config.yaml, and succeeded on the second attempt.
Security features implemented:
- Audit logging: Comprehensive granular policy — pods and RBAC at RequestResponse, secrets/configmaps at Metadata only, health checks excluded, log rotation configured. Two-level mount correct.
- API server: anonymous-auth=false, Node,RBAC authorization, NodeRestriction,PodSecurity plugins, TLS 1.2 min with strong ciphers, profiling=false, encryption at rest (aescbc), service-account-lookup=true
- Kubelet: KubeletConfiguration object: anonymous auth disabled, Webhook auth+authz, readOnlyPort=0, cert rotation
- PSA: Restricted enforce/audit/warn on test+production, verified with privileged pod rejection test
- Network policies: Default deny ingress+egress on test+production, DNS egress allowed with namespace selector
- Unique additions: ResourceQuotas and LimitRanges on test/production namespaces, encryption at rest
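The privileged-pod rejection test works because the restricted profile forbids privileged containers, among other things. The sketch below checks a pod manifest against a subset of the restricted profile's container rules: privilege, privilege escalation, dropped capabilities, non-root, and seccomp. It is an offline approximation written for illustration. It omits the baseline checks on host namespaces, volumes and ports, so the API server remains the authority. On a live cluster, kubectl apply --dry-run=server -f pod.yaml -n test runs admission, including PodSecurity, without persisting the pod.

```python
def restricted_violations(pod: dict) -> list:
    """Check a pod manifest against a SUBSET of the PSS 'restricted' container rules."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    problems = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged containers are not allowed")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        # Container-level settings override the pod-level securityContext.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {})).get("type")
        if seccomp not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```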
Notable: The first model to implement all of: audit logging, API server hardening, kubelet hardening, PSA, network policies, encryption at rest, ResourceQuotas, and LimitRanges (GPT 5.5 later matched this set). Excellent operational cleanup and verification. Lost 1 point on cluster creation (needed a 2nd attempt) and 1 on additional controls (no controller-manager/scheduler hardening).
Claude Opus 4.7 (2026-04-20)
Result: TIMEOUT (37/40)
Approach: Created config files (audit-policy.yaml, kind-config.yaml, admission-config.yaml, authentication-config.yaml), then attempted cluster creation. First attempt failed because anonymous-auth=false caused API server health probes to return 401. Self-diagnosed, deleted the cluster, and created an AuthenticationConfiguration using Kubernetes 1.35’s AnonymousAuthConfigurableEndpoints feature gate to allow health endpoints without auth while blocking all other anonymous access. Succeeded on second attempt. Applied namespaces, PSA labels, network policies, ResourceQuotas, LimitRanges, and ServiceAccount restrictions. Timed out during final verification — all hardening controls were in place.
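The fix described here can be sketched as follows. The dict mirrors the upstream AuthenticationConfiguration anonymous block, which is loaded with --authentication-config. The apiVersion to use depends on the cluster version, so v1beta1 below is an assumption. The anonymous_allowed helper is a hypothetical, simplified model that assumes exact path matching. It shows why the kubelet's unauthenticated /livez and /readyz probes get an anonymous identity while other unauthenticated requests do not. Authorization still applies to whatever passes.

```python
HEALTH_PATHS = ("/livez", "/readyz", "/healthz")


def anonymous_auth_config(paths=HEALTH_PATHS,
                          api_version: str = "apiserver.config.k8s.io/v1beta1") -> dict:
    """AuthenticationConfiguration allowing anonymous access only on the given paths."""
    return {
        "apiVersion": api_version,
        "kind": "AuthenticationConfiguration",
        "anonymous": {
            "enabled": True,
            "conditions": [{"path": p} for p in paths],
        },
    }


def anonymous_allowed(cfg: dict, request_path: str) -> bool:
    """Simplified model: may an unauthenticated request proceed as system:anonymous?"""
    anon = cfg.get("anonymous", {})
    if not anon.get("enabled", False):
        return False
    conditions = anon.get("conditions", [])
    # Assumption: with no conditions, anonymous access applies to every path.
    if not conditions:
        return True
    return any(c.get("path") == request_path for c in conditions)
```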
Security features implemented:
- Audit logging: Comprehensive granular policy — health endpoints excluded, system component reads excluded, pods/services/namespaces/RBAC at RequestResponse, pod exec/attach/portforward at RequestResponse, secrets/configmaps at Metadata only. Two-level mount correct. Log rotation configured (30 days, 10 backups, 100 MB max). 852+ entries generated.
- API server: anonymous-auth=false with AuthenticationConfiguration (health endpoints exempted; the most sophisticated solution of any model), Node,RBAC authorization, 17 admission plugins (including NodeRestriction, PodSecurity, ResourceQuota, LimitRanger), profiling=false, service-account-lookup=true, cluster-wide PodSecurity via AdmissionConfiguration
- Kubelet: KubeletConfiguration object: anonymous auth disabled, Webhook auth+authz, readOnlyPort=0, x509 client CA, strong TLS cipher suites (6 ECDHE suites), event rate limiting
- PSA: Dual-layer enforcement: cluster-wide restricted defaults via AdmissionConfiguration (with system namespace exemptions) + namespace-level labels (restricted on test/production, baseline on development)
- Network policies: Default deny ingress+egress on test+production, DNS egress allowed targeting kube-dns pod selector
- Additional controls: ResourceQuotas and LimitRanges on all 3 namespaces (dev/test/prod), controller-manager hardening (profiling=false, terminated-pod-gc-threshold=10), scheduler hardening (profiling=false), ServiceAccount automountToken disabled on default SA in all custom namespaces
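The dual-layer arrangement is easiest to reason about as an override chain. A namespace's pod-security.kubernetes.io/<mode> labels win. Unlabelled modes fall back to the AdmissionConfiguration defaults (passed via --admission-control-config-file). Exempt namespaces are skipped entirely. With no AdmissionConfiguration, the fallback is privileged, which is why enabling the PodSecurity plugin alone enforces nothing. The resolver below is a simplified, hypothetical model of that logic, and the exemption list is an assumption.

```python
PSA_PREFIX = "pod-security.kubernetes.io/"
MODES = ("enforce", "audit", "warn")

ADMISSION_CONFIG = {
    "apiVersion": "apiserver.config.k8s.io/v1",
    "kind": "AdmissionConfiguration",
    "plugins": [{
        "name": "PodSecurity",
        "configuration": {
            "apiVersion": "pod-security.admission.config.k8s.io/v1",
            "kind": "PodSecurityConfiguration",
            "defaults": {"enforce": "restricted", "enforce-version": "latest",
                         "audit": "restricted", "audit-version": "latest",
                         "warn": "restricted", "warn-version": "latest"},
            # Assumed exemption list for system components.
            "exemptions": {"usernames": [], "runtimeClasses": [],
                           "namespaces": ["kube-system"]},
        },
    }],
}


def effective_psa(namespace: str, labels: dict, admission_config=None) -> dict:
    """Simplified model: namespace labels > cluster defaults > 'privileged'."""
    defaults, exempt = {}, set()
    if admission_config:
        psc = next(p["configuration"] for p in admission_config["plugins"]
                   if p["name"] == "PodSecurity")
        defaults = psc.get("defaults", {})
        exempt = set(psc.get("exemptions", {}).get("namespaces", []))
    if namespace in exempt:
        return {mode: "exempt" for mode in MODES}
    return {mode: labels.get(PSA_PREFIX + mode, defaults.get(mode, "privileged"))
            for mode in MODES}
```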
Notable: The most technically sophisticated cluster configuration at the time of testing, with two innovations. (1) AuthenticationConfiguration for conditional anonymous auth using K8s 1.35 features, which no earlier model had used (Qwen3.6-35b-a3b later took the same approach). (2) Dual-layer PSA enforcement (cluster-wide AdmissionConfiguration + namespace labels), the strongest PSS setup, later also adopted by Claude Opus 4.8 and Qwen3.6-35b-a3b. The first model to include all of: ResourceQuotas (all namespaces), LimitRanges (all namespaces), controller-manager hardening, scheduler hardening, and ServiceAccount token restriction (Qwen3.6-35b-a3b later matched this set). Missing: encryption at rest (intentionally omitted per comments). Lost 1 point on cluster creation (needed a 2nd attempt), 1 on network policies (DNS egress targets a pod selector across all namespaces rather than scoping to kube-system), and 1 on agent behaviour (timed out during verification).
Qwen 3.6 Plus (2026-04-20)
Result: SUCCESS (32/40)
Approach: Created config files (audit-policy.yaml, kind-config.yaml, namespaces.yaml, network-policies.yaml, resource-quotas.yaml, rbac-restrictions.yaml) then attempted cluster creation. First attempt failed due to a Docker container name conflict (leftover container from a previous run). Discovered the existing cluster was already running via kind get clusters and docker ps, then proceeded to use it. Applied namespaces with PSA labels, network policies, ResourceQuotas, and LimitRanges.
Security features implemented:
- Audit logging: Basic policy — pods at RequestResponse, secrets/configmaps at Metadata, all other resources at Metadata. Two-level mount pattern correct. Log rotation configured (30 days, 3 backups, 100 MB).
- API server: enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction,PodSecurity; tls-min-version: VersionTLS12. Missing: anonymous-auth=false, profiling=false, service-account-lookup=true.
- Kubelet: Both kubeletExtraArgs (anonymous-auth=false, authorization-mode=Webhook) and a KubeletConfiguration object (anonymous auth disabled, Webhook auth+authz, readOnlyPort=0). Correctly omitted protectKernelDefaults and seccomp-default for Kind compatibility.
- PSA: Restricted enforce/audit/warn on test and production namespaces. Development namespace has no PSA labels (no restrictions).
- Network policies: Default deny ingress+egress on test and production, DNS egress allowed targeting kube-system namespace selector.
- Additional controls: ResourceQuotas and LimitRanges on test and production namespaces (comprehensive — includes PVC limits, pod-level limits, per-container defaults).
Category scores:
- Cluster Creation: 4/5 (Docker container conflict, recovered by discovering existing cluster)
- Audit Logging: 4/5 (correct two-level mount, basic but functional policy)
- PSS: 5/5 (restricted enforce/audit/warn on test+production)
- Network Policies: 5/5 (default deny + DNS egress on test+production, properly scoped to kube-system)
- API Server: 3/5 (good admission plugins and TLS, but missing anonymous-auth=false and profiling=false)
- Kubelet: 4/5 (KubeletConfiguration + kubeletExtraArgs, anonymous auth disabled, Webhook auth, readOnlyPort=0)
- Additional Controls: 4/5 (ResourceQuotas and LimitRanges with comprehensive limits including PVC and pod-level)
- Agent Behaviour: 3/5 (handled Docker conflict pragmatically, but spent time on redundant retry before checking existing state)
Notable: Solid middle-of-the-pack result. The dual kubelet configuration (both kubeletExtraArgs and KubeletConfiguration object) shows awareness of both approaches. ResourceQuotas are the most detailed of any model — including PVC storage limits and pod-level CPU/memory maximums. The audit policy is simpler than the Claude models’ granular policies (no health check exclusions, no RBAC-specific rules) but functional. The main gap is API server hardening — no anonymous-auth=false or profiling=false on the API server itself.
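The difference between a basic and a granular policy is mostly rule order. The API server applies the first rule that matches a request. A Metadata rule for secrets therefore has to come before any broader RequestResponse rule, and health-check exclusions (level None) have to come first. The sketch below pairs an illustrative policy with a simplified evaluator. The evaluator is a hypothetical teaching aid: it only looks at API groups, resources and non-resource URLs, ignoring the users, verbs and namespaces that real rules can also match on.

```python
AUDIT_POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],
    "rules": [
        # 1. Drop health-check noise entirely.
        {"level": "None", "nonResourceURLs": ["/healthz*", "/livez*", "/readyz*"]},
        # 2. Secrets and ConfigMaps: metadata only, never bodies. Must precede rule 3.
        {"level": "Metadata",
         "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
        # 3. Pods and RBAC objects: full request and response bodies.
        {"level": "RequestResponse",
         "resources": [{"group": "", "resources": ["pods", "pods/exec"]},
                       {"group": "rbac.authorization.k8s.io",
                        "resources": ["roles", "rolebindings",
                                      "clusterroles", "clusterrolebindings"]}]},
        # 4. Catch-all.
        {"level": "Metadata"},
    ],
}


def _url_matches(path: str, pattern: str) -> bool:
    # Audit policies only support '*' as a trailing wildcard.
    return path.startswith(pattern[:-1]) if pattern.endswith("*") else path == pattern


def _rule_matches(rule: dict, group: str, resource, path) -> bool:
    if path is not None:  # non-resource request, e.g. /readyz or /metrics
        if "resources" in rule:
            return False
        urls = rule.get("nonResourceURLs")
        return urls is None or any(_url_matches(path, u) for u in urls)
    if "nonResourceURLs" in rule:
        return False
    selectors = rule.get("resources")
    if selectors is None:
        return True
    return any(
        sel.get("group", "") == group
        and (not sel.get("resources") or resource in sel["resources"])
        for sel in selectors
    )


def audit_level(policy: dict, group: str = "", resource=None, path=None) -> str:
    """Level of the FIRST matching rule; unmatched requests are not logged."""
    for rule in policy["rules"]:
        if _rule_matches(rule, group, resource, path):
            return rule["level"]
    return "None"
```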
MiniMax M2.7
March 28 — TIMEOUT (20/40)
Approach: Created comprehensive configuration files (kind-config.yaml, audit-policy.yaml, namespaces.yaml, network-policies.yaml, rbac.yaml, resource-limits.yaml, admission-config.yaml, encryption-config.yaml) but the kind create cluster command timed out before the cluster initialized.
Category scores:
- Cluster Creation: 1/5 (attempted, timed out)
- Audit Logging: 3/5 (comprehensive policy written with correct two-level mount pattern, never verified)
- PSS: 3/5 (enforce/audit/warn labels planned for 3 namespaces, never applied)
- Network Policies: 3/5 (default deny + DNS + API server allow policies created, never applied)
- API Server: 3/5 (extensive flags configured, some format errors)
- Kubelet: 3/5 (detailed KubeletConfiguration, authorization mode errors)
- Additional Controls: 3/5 (ResourceQuotas, LimitRanges, PodDisruptionBudgets, RBAC, encryption — most comprehensive planned feature set of any model)
- Agent Behaviour: 1/5 (single attempt, no recovery or diagnosis)
Key differences from M2.5:
- M2.5 timed out because of deprecated PodSecurityPolicy admission plugin
- M2.7 used the correct PodSecurity plugin but timed out during cluster initialization
- M2.7 generated the most comprehensive set of configuration files (8 files vs M2.5’s single Kind config)
- M2.7 included ResourceQuotas/LimitRanges/PDBs in its planned configuration (at the time of its run, only Claude Opus 4.6 had included comparable resource controls)
- Both models scored 1/5 on Agent Behaviour (no recovery)
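The M2.5 failure mode is cheap to catch before cluster creation. PodSecurityPolicy was removed in Kubernetes 1.25. An API server asked to enable an admission plugin it does not know typically fails to start, which in Kind shows up as kubeadm waiting on a control plane that never becomes healthy. A hypothetical pre-flight check might look like this:

```python
# Admission plugins that no longer exist, mapped to their replacement.
REMOVED_PLUGINS = {"PodSecurityPolicy": "PodSecurity"}  # removed in Kubernetes 1.25


def check_admission_plugins(flag_value: str) -> list:
    """Flag removed plugins in a comma-separated --enable-admission-plugins value."""
    plugins = [p.strip() for p in flag_value.split(",") if p.strip()]
    return [
        f"{plugin} was removed; use {REMOVED_PLUGINS[plugin]} instead"
        for plugin in plugins
        if plugin in REMOVED_PLUGINS
    ]
```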
Notable: Despite using the correct PodSecurity admission plugin (fixing M2.5’s key mistake), M2.7 still failed to produce a running cluster. The model created the most extensive set of pre-written configuration files of any model tested, but made a single attempt at cluster creation with no recovery when it timed out. This mirrors M2.5’s pattern of over-configuring upfront rather than building incrementally.
DeepSeek V4 Pro (2026-04-24)
Result: INCOMPLETE (14/40)
Approach: Created comprehensive configuration files (audit-policy.yaml, kind-config.yaml) with excellent hardening settings. The opencode session terminated prematurely after writing the configuration files but before executing kind create cluster. No cluster was ever created, no namespaces applied, no network policies deployed.
Security features designed (not deployed):
- Audit logging: Well-structured policy — pods/exec/portforward at RequestResponse, secrets/configmaps at Metadata, namespace/serviceaccount operations at Request level. Correct two-level mount pattern. Log rotation configured (30 days, 3 backups, 100 MB).
- API server: anonymous-auth: false, authorization-mode: Node,RBAC, enable-admission-plugins: NodeRestriction,PodSecurity,AlwaysPullImages, profiling: false, service-account-lookup: true, strong TLS cipher suites (6 ECDHE variants), controller-manager and scheduler profiling disabled.
- Kubelet: KubeletConfiguration object: anonymous auth disabled, Webhook auth+authz, readOnlyPort: 0, serverTLSBootstrap: true, rotateCertificates: true, SeccompDefault: true feature gate. Correctly avoided protectKernelDefaults.
- Etcd: auto-tls: false, peer-auto-tls: false.
- Controller manager: profiling: false, terminated-pod-gc-threshold: 500, use-service-account-credentials: true.
- Scheduler: profiling: false.
Category scores:
- Cluster Creation: 0/5 (never created; configuration correct but untested)
- Audit Logging: 2/5 (well-designed policy, correct two-level mount, but untested)
- PSS: 0/5 (no namespaces created, no PSA labels applied)
- Network Policies: 0/5 (no policies created)
- API Server: 4/5 (comprehensive hardening flags, untested in practice)
- Kubelet: 4/5 (excellent KubeletConfiguration, correctly avoids Kind pitfalls, untested)
- Additional Controls: 3/5 (good etcd/scheduler/controller-manager hardening, untested)
- Agent Behaviour: 1/5 (good planning, run terminated before execution)
Notable: Inverse of the typical failure pattern — models like Gemini 3 Flash created minimal but working clusters, while DeepSeek V4 Pro designed an excellently hardened cluster but never built it. The configuration quality suggests 32-36/40 if execution had completed. API server and kubelet configurations are among the most comprehensive of any model. The premature termination appears to be an opencode/model interaction issue rather than a knowledge gap.
DeepSeek V4 Flash (2026-04-24)
Result: INCOMPLETE (12/40)
Approach: Created audit-policy.yaml (basic: RequestResponse for pods, Metadata for secrets/configmaps/events) and kind-config.yaml with audit mounts, PodSecurity + NodeRestriction admission plugins, and kubelet hardening (anonymous auth disabled, webhook auth). The first kind create cluster failed (kubeadm init error). Rewrote kind-config.yaml, and the second attempt succeeded (Kubernetes v1.35.0). Verified the cluster with kubectl cluster-info and kubectl get nodes. Attempted to read the audit log but got permission denied. The session ended without creating namespaces, applying PSS labels, deploying network policies, or performing any additional hardening.
Security features implemented:
- Audit logging: Basic policy — pods at RequestResponse, secrets/configmaps/events at Metadata. Two-level mount configured. Permission denied when attempting to read logs.
- API server: Node,RBAC authorization, NodeRestriction,PodSecurity admission plugins. No anonymous-auth=false, no profiling=false, no TLS hardening.
- Kubelet: Anonymous auth disabled, webhook auth/authz configured.
- PSS: PodSecurity admission plugin enabled but no namespace labels applied — no actual enforcement.
- Network policies: None created.
- Additional controls: None.
Category scores:
- Cluster Creation: 3/5 (created on 2nd attempt)
- Audit Logging: 2/5 (basic policy, configured but permission error reading logs)
- PSS: 1/5 (admission plugin enabled but no labels applied, no namespaces created)
- Network Policies: 0/5
- API Server Hardening: 2/5 (Node,RBAC authorization + NodeRestriction,PodSecurity plugins)
- Kubelet Hardening: 3/5 (anonymous auth disabled, webhook auth/authz)
- Additional Controls: 0/5
- Agent Behaviour: 1/5 (recovered from error but declared done without completing the task)
Notable: A running cluster with minimal hardening. Unlike V4 Pro (which designed comprehensive configuration but never created the cluster), V4 Flash got the cluster running but stopped far too early — declaring success after basic cluster verification without creating any namespaces or applying security policies. The agent’s premature termination after encountering a permission denied error on the audit log suggests it treated this obstacle as a stopping point rather than something to work around. This is the weakest result among models that successfully created a cluster.
Additional Guidance for Re-run
After analysing the March 9th failures, two systemic issues were identified:
- Hostname length limit: The tool-generated cluster names (e.g. dearbhadh-hardened-cluster-anthropic-claude-sonnet-4-6-03eca785) exceeded Linux’s 63-character hostname limit when Kind appended -control-plane. The name generator was fixed to produce short names (e.g. dh-claude-sonnet-4-6-93bea9ca).
- protectKernelDefaults incompatibility: All three timed-out models used protectKernelDefaults: true (or protect-kernel-defaults: "true"), which causes the kubelet to refuse to start in Kind’s Docker-in-Docker environment because the container’s kernel parameters don’t match expected defaults.
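The name-generator fix can be sketched as follows. The dh- prefix and the 8-character suffix follow the example short name above. Hashing a run identifier for the suffix, and the function names, are assumptions for illustration. The check uses the 63-character limit cited in this report and reserves room for the -control-plane suffix that Kind adds to the node container's hostname.

```python
import hashlib
import re

HOSTNAME_LIMIT = 63             # limit cited in this report
NODE_SUFFIX = "-control-plane"  # appended by Kind to form the node's hostname


def fits_hostname_limit(cluster_name: str) -> bool:
    return len(cluster_name + NODE_SUFFIX) <= HOSTNAME_LIMIT


def short_cluster_name(model_id: str, run_id: str, prefix: str = "dh") -> str:
    """Hypothetical generator producing '<prefix>-<model slug>-<8 hex chars>'."""
    slug = re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")
    suffix = hashlib.sha256(run_id.encode("utf-8")).hexdigest()[:8]
    # Room left for the slug once prefix, suffix, two hyphens and NODE_SUFFIX fit.
    budget = HOSTNAME_LIMIT - len(NODE_SUFFIX) - len(prefix) - len(suffix) - 2
    slug = slug[:budget].rstrip("-")
    return f"{prefix}-{slug}-{suffix}"
```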
A new context file (kind-limitations.md) was added to the reference material covering both issues and the related seccomp-default flag. The four failed models were re-run on 2026-03-10 with this additional guidance.
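The guidance in kind-limitations.md lends itself to a pre-flight lint. The sketch below pairs an example hardened KubeletConfiguration, built from settings credited elsewhere in this report, with a hypothetical check. The check flags the two settings this report identifies as Kind-incompatible, in either their KubeletConfiguration field form or their kubeletExtraArgs flag form.

```python
from typing import Optional

# Fields and flags that, per this report's Kind guidance, keep the kubelet
# from starting inside Kind's node containers.
KIND_INCOMPATIBLE_FIELDS = {"protectKernelDefaults": True, "seccompDefault": True}
KIND_INCOMPATIBLE_FLAGS = {"protect-kernel-defaults": "true", "seccomp-default": "true"}

HARDENED_KUBELET = {
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "kind": "KubeletConfiguration",
    "authentication": {"anonymous": {"enabled": False}, "webhook": {"enabled": True}},
    "authorization": {"mode": "Webhook"},
    "readOnlyPort": 0,
    "tlsMinVersion": "VersionTLS12",
    "rotateCertificates": True,
    "streamingConnectionIdleTimeout": "5m",
}


def kind_kubelet_problems(cfg: Optional[dict] = None,
                          extra_args: Optional[dict] = None) -> list:
    """List Kind-incompatible settings in a KubeletConfiguration and/or kubeletExtraArgs."""
    problems = []
    for field, bad in KIND_INCOMPATIBLE_FIELDS.items():
        value = (cfg or {}).get(field)
        if value == bad:
            problems.append(f"KubeletConfiguration.{field}={value} is not Kind-compatible")
    for flag, bad in KIND_INCOMPATIBLE_FLAGS.items():
        value = (extra_args or {}).get(flag)
        if value is not None and str(value).lower() == bad:
            problems.append(f"kubeletExtraArgs {flag}={value} is not Kind-compatible")
    return problems
```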
Results Summary
| Model | Result | Score (/40) | Cluster Running | Namespaces + PSA | Network Policies | Audit Logs |
|---|---|---|---|---|---|---|
| Claude Sonnet 4.6 | Success (re-run) | 39 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1.9 MB) |
| Claude Opus 4.6 | Success | 38 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1258+ entries) |
| Claude Opus 4.8 | Success | 37 | Yes | Yes (cluster-wide + namespace) | Yes (test/prod) | Yes |
| Claude Opus 4.7 | Timeout* | 37 | Yes | Yes (cluster-wide + namespace) | Yes (test/prod) | Yes (852+ entries) |
| GPT 5.5 | Success | 35 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes |
| Qwen3.6-35b-a3b (LOCAL) | Success | 35 | Yes | Yes (restricted on test+prod, baseline on dev, cluster-wide) | Yes (test/prod) | Yes (3.6 MB) |
| GPT-5.4 | Success (re-run) | 34 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (1.5 MB) |
| Qwen 3.6 Plus | Success | 32 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes |
| Kimi K2.6 | Timeout | 31 | Yes | Yes (restricted on test/prod, baseline on dev) | Yes (test/prod) | Yes |
| Gemini 3 Flash Preview | Success | 27 | Yes | Yes (restricted on test/prod) | Yes (test/prod) | Yes (3.7 MB) |
| Gemma 4 31B (LOCAL) | Success | 25 | Yes | Yes (restricted on test+prod, baseline on dev) | Yes (test/prod) | Yes |
| MiniMax M2.7 | Timeout | 20 | No (timed out) | No (never applied) | No (never applied) | No (never verified) |
| DeepSeek V4 Pro | Incomplete | 14 | No (never created) | No (never applied) | No (never applied) | No (never verified) |
| DeepSeek V4 Flash | Incomplete | 12 | Yes (2nd attempt) | No (never applied) | No (never applied) | No (permission denied) |
| MiniMax M2.5 | Timeout (re-run) | 10 | Yes (3rd attempt) | No (timeout) | No (timeout) | Yes (audit only) |
| DeepSeek V3.2 | Timeout (re-run) | 2 | No | No | No | No |
*Opus 4.7 timed out during verification, not during setup — all hardening controls were in place and functional.
Claude Sonnet 4.6
March 9 — TIMEOUT
Root cause: Two failures: (1) cluster name too long (sethostname: invalid argument), (2) protectKernelDefaults: true in KubeletConfiguration prevented kubelet from starting. The model created a Docker wrapper script to work around the hostname issue but the kubelet never came up. Used a proper KubeletConfiguration object (best approach) with anonymous auth disabled, webhook authorization, and readOnlyPort: 0.
March 10 Re-run — SUCCESS
Approach: Created audit-policy.yaml and kind-config.yaml, then proceeded to cluster creation, namespace setup, PSA labelling, and network policy application. Methodical and efficient.
Security features implemented and verified:
- Audit logging: Granular policy — secrets/configmaps at Metadata (no bodies), pods and RBAC at RequestResponse, health checks excluded. 1.9 MB audit.log generated. Log rotation configured.
- API server: anonymous-auth: false, authorization-mode: Node,RBAC, enable-admission-plugins: NodeRestriction,PodSecurity, tls-min-version: VersionTLS12, strong cipher suites, profiling: false
- Kubelet: Proper KubeletConfiguration object: anonymous auth disabled, Webhook authorization, readOnlyPort: 0, serverTLSBootstrap: true, TLS 1.2 minimum with strong cipher suites
- Controller manager/scheduler: Profiling disabled on both, terminated-pod-gc-threshold: 10
- Namespaces: development (enforce=baseline, warn=restricted), test and production (enforce=restricted, audit=restricted, warn=restricted)
- Network policies: Default deny ingress+egress on test and production, DNS egress allowed
- Verification: Tested privileged pod creation in test namespace — correctly rejected by PSA
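The granular audit policy described above can be sketched as data. The API server applies the first rule that matches a request, so narrow rules must come before broad ones. For example, placing a RequestResponse catch-all first would log Secret values. This is an illustrative policy, not Sonnet's actual file. The audit_level function is a hypothetical, simplified matcher that ignores verbs, users, and namespaces. JSON is a subset of YAML, so the printed output could be saved as audit-policy.yaml.

```python
import json

# Illustrative audit policy in the shape of an audit.k8s.io/v1 Policy.
# Rules are evaluated in order; the first match sets the audit level.
AUDIT_POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],
    "rules": [
        # Drop health-check noise entirely.
        {"level": "None", "nonResourceURLs": ["/healthz*", "/livez*", "/readyz*"]},
        # Record who touched secrets/configmaps, but never their contents.
        {"level": "Metadata",
         "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
        # Full request and response bodies for pods and RBAC objects.
        {"level": "RequestResponse",
         "resources": [
             {"group": "", "resources": ["pods"]},
             {"group": "rbac.authorization.k8s.io",
              "resources": ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]},
         ]},
        # Catch-all for everything else.
        {"level": "Metadata"},
    ],
}


def audit_level(group: str = "", resource: str = "", url: str = "") -> str:
    """Hypothetical first-match lookup over AUDIT_POLICY (ignores verbs, users, namespaces)."""
    for rule in AUDIT_POLICY["rules"]:
        if "nonResourceURLs" in rule:
            # Only non-resource requests (such as /livez) can match these rules.
            if url and any(url.startswith(p.rstrip("*")) for p in rule["nonResourceURLs"]):
                return rule["level"]
            continue
        if "resources" in rule:
            if any(group == r["group"] and resource in r["resources"] for r in rule["resources"]):
                return rule["level"]
            continue
        return rule["level"]  # a rule with no selectors matches everything
    return "None"


if __name__ == "__main__":
    print(json.dumps(AUDIT_POLICY, indent=2))
```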
Notable: Correctly heeded the protectKernelDefaults guidance. Used KubeletConfiguration object instead of kubeletExtraArgs (cleanest approach). Added serverTLSBootstrap together with per-kubelet TLS cipher suite restrictions, a combination no other model matched. Most comprehensive kubelet hardening overall.
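Sonnet's approach of patching a KubeletConfiguration object into the Kind config, rather than passing kubeletExtraArgs, can be sketched as follows. The field names come from the kubelet.config.k8s.io/v1beta1 KubeletConfiguration type. The cipher list is an example, not the model's exact list. We also assume that kind matches a patch carrying only its kind field to the KubeletConfiguration document it generates. Each kubeadmConfigPatches entry is a YAML string, and JSON is valid YAML, so json.dumps is used to produce it.

```python
import json

# Hardened kubelet settings (KubeletConfiguration field names, v1beta1).
KUBELET_PATCH = {
    "kind": "KubeletConfiguration",
    "authentication": {
        "anonymous": {"enabled": False},
        "webhook": {"enabled": True},
    },
    "authorization": {"mode": "Webhook"},
    "readOnlyPort": 0,
    "serverTLSBootstrap": True,
    "tlsMinVersion": "VersionTLS12",
    "tlsCipherSuites": [
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    ],
    # Kept false on purpose: true stops the kubelet from starting in Kind.
    "protectKernelDefaults": False,
}

KIND_CONFIG = {
    "kind": "Cluster",
    "apiVersion": "kind.x-k8s.io/v1alpha4",
    "nodes": [{"role": "control-plane"}],
    # Each patch is a YAML document held in a string; JSON is valid YAML.
    "kubeadmConfigPatches": [json.dumps(KUBELET_PATCH)],
}

if __name__ == "__main__":
    print(json.dumps(KIND_CONFIG, indent=2))
```

One caveat applies to this design. With serverTLSBootstrap: true, the kubelet requests its serving certificate through a CSR that is not approved automatically. Until it is approved (for example with kubectl certificate approve), commands that reach the kubelet, such as kubectl logs, can fail.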
GPT-5.4
March 9 — FAILED (cluster name too long)
Root cause: The cluster name dearbhadh-hardened-cluster-openai-gpt-5-4-ef349951 was too long. GPT 5.4 strictly followed the instruction to use the exact name and spent the entire session retrying, never shortening it. Had the most comprehensive planned manifests (resource quotas, limit ranges, cluster-wide AdmissionConfiguration) but none were applied.
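A pre-flight length check would have exposed the naming bug before any cluster creation was attempted. This sketch assumes kind derives node hostnames by appending -control-plane (and -worker, -worker2, ...) to the cluster name. It checks those hostnames against the 63-character limit cited in the Gemini section of this report. The helper names are hypothetical.

```python
# Assumption: kind names node containers "<cluster>-control-plane",
# "<cluster>-worker", "<cluster>-worker2", ... and uses that as the hostname.
HOSTNAME_LIMIT = 63  # the limit cited in this report


def node_hostnames(cluster: str, workers: int = 0) -> list[str]:
    """Hypothetical helper: derive the node hostnames for a cluster name."""
    names = [f"{cluster}-control-plane"]
    for i in range(1, workers + 1):
        names.append(f"{cluster}-worker" if i == 1 else f"{cluster}-worker{i}")
    return names


def too_long(cluster: str, workers: int = 0) -> list[str]:
    """Return the derived hostnames that exceed HOSTNAME_LIMIT."""
    return [n for n in node_hostnames(cluster, workers) if len(n) > HOSTNAME_LIMIT]


if __name__ == "__main__":
    name = "dearbhadh-hardened-cluster-openai-gpt-5-4-ef349951"
    for host in too_long(name):
        print(f"{host} is {len(host)} characters")
```

For the 50-character GPT 5.4 name, the control-plane hostname comes to 64 characters, one over that limit.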
March 10 Re-run — SUCCESS
Approach: Created config files directly (no Python script this time), built the cluster, then applied namespaces, PSA labels, and network policies. Explicitly cited the reference material in avoiding protectKernelDefaults and seccomp-default.
Security features implemented and verified:
- Audit logging: Two-level mount pattern, working correctly (1.5 MB audit.log). Standard policy covering pods at RequestResponse, secrets/configmaps at Metadata.
- API server: anonymous-auth: false; enable-admission-plugins: NodeRestriction,PodSecurity; profiling: false
- Kubelet: via kubeletExtraArgs (anonymous-auth: false; authorization-mode: Webhook; read-only-port: 0; rotate-server-certificates: true; streaming-connection-idle-timeout: 5m)
- Controller manager/scheduler: Profiling disabled on both, terminated-pod-gc-threshold: 10
- Namespaces: development, test, production created. Test and production: enforce=restricted. kube-system labelled as privileged.
- Network policies: Default deny ingress+egress on test and production, DNS egress allowed
- Verification: Tested that anonymous auth is blocked (kubectl auth can-i --as system:anonymous), tested privileged pod rejection in test namespace
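The default-deny-plus-DNS pattern that GPT 5.4 and the other successful models applied to test and production can be expressed as two NetworkPolicy objects. The sketch assumes DNS runs in kube-system. It selects that namespace through the kubernetes.io/metadata.name label, which Kubernetes sets on every namespace. The policy names are hypothetical. As the Opus 4.8 run showed, these objects only take effect if the cluster's CNI enforces NetworkPolicy.

```python
import json

# Label Kubernetes sets automatically on every namespace.
NS_NAME_LABEL = "kubernetes.io/metadata.name"


def default_deny(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_dns(namespace: str) -> dict:
    """Re-open egress on port 53 (UDP and TCP), but only towards kube-system."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{"namespaceSelector": {"matchLabels": {NS_NAME_LABEL: "kube-system"}}}],
                "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
            }],
        },
    }


if __name__ == "__main__":
    for ns in ("test", "production"):
        for policy in (default_deny(ns), allow_dns(ns)):
            print(json.dumps(policy))
```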
Notable: Wisely referenced the Kind limitations guidance and explicitly avoided risky settings. Labelling kube-system as privileged was a good practical touch. Less ambitious security configuration than March 9 (no resource quotas, limit ranges, or AdmissionConfiguration this time) but achieved a complete working result.
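GPT 5.4's kubeletExtraArgs and Sonnet's KubeletConfiguration object express largely the same settings in two syntaxes. Upstream marks each of these kubelet flags as deprecated in favour of the config file. The sketch below converts GPT 5.4's flags into KubeletConfiguration fields. The FLAG_TO_FIELD table is our own mapping and covers only these five flags. The flags_to_config helper is hypothetical.

```python
from __future__ import annotations

# Our mapping from the kubelet flags GPT 5.4 used to KubeletConfiguration
# fields; dotted paths denote nested keys.
FLAG_TO_FIELD = {
    "anonymous-auth": "authentication.anonymous.enabled",
    "authorization-mode": "authorization.mode",
    "read-only-port": "readOnlyPort",
    "rotate-server-certificates": "serverTLSBootstrap",
    "streaming-connection-idle-timeout": "streamingConnectionIdleTimeout",
}


def parse_value(raw: str):
    """Convert flag strings into the JSON types the config file expects."""
    if raw in ("true", "false"):
        return raw == "true"
    if raw.isdigit():
        return int(raw)
    return raw  # durations such as "5m" stay as strings


def flags_to_config(flags: dict[str, str]) -> dict:
    """Hypothetical helper: build a KubeletConfiguration dict from flag values."""
    config: dict = {"kind": "KubeletConfiguration"}
    for flag, raw in flags.items():
        path = FLAG_TO_FIELD[flag].split(".")  # KeyError signals an unmapped flag
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = parse_value(raw)
    return config


GPT_54_FLAGS = {
    "anonymous-auth": "false",
    "authorization-mode": "Webhook",
    "read-only-port": "0",
    "rotate-server-certificates": "true",
    "streaming-connection-idle-timeout": "5m",
}
```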
Gemini 3 Flash Preview
March 9 — SUCCESS (original run, not re-run)
Approach: Organised files into a manifests/ subdirectory. Encountered the long cluster name problem, diagnosed it independently (68 characters exceeds 63-char limit), shortened the name, then hit an aescbc encryption key length error. Deleted the cluster, fixed the key, and rebuilt. Applied namespaces, PSA labels, and network policies.
Security features implemented and verified:
- Audit logging: two-level mount pattern, working correctly (3.7 MB audit.log)
- Encryption at rest: aescbc for Secrets (though with an example key)
- PSA labels: enforce=restricted on test and production (verified)
- Network policies: default deny ingress+egress on test and production (verified)
- Namespaces: development, test, production created
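The Pod Security Admission labels used across these runs follow the pod-security.kubernetes.io/<mode> convention. The modes are enforce, audit, and warn, and the levels are privileged, baseline, and restricted. The sketch below builds the namespace manifests, using Sonnet's layout as the example. The namespace() helper is hypothetical.

```python
from __future__ import annotations

import json

PSA_PREFIX = "pod-security.kubernetes.io"
LEVELS = {"privileged", "baseline", "restricted"}


def namespace(name: str, enforce: str, audit: str | None = None, warn: str | None = None) -> dict:
    """Hypothetical helper: a Namespace manifest carrying Pod Security Admission labels."""
    labels = {}
    for mode, level in (("enforce", enforce), ("audit", audit), ("warn", warn)):
        if level is None:
            continue  # mode not set for this namespace
        if level not in LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
        labels[f"{PSA_PREFIX}/{mode}"] = level
    return {"apiVersion": "v1", "kind": "Namespace",
            "metadata": {"name": name, "labels": labels}}


# Sonnet's layout: baseline enforcement on development, restricted on test/production.
NAMESPACES = [
    namespace("development", enforce="baseline", warn="restricted"),
    namespace("test", enforce="restricted", audit="restricted", warn="restricted"),
    namespace("production", enforce="restricted", audit="restricted", warn="restricted"),
]

if __name__ == "__main__":
    for ns in NAMESPACES:
        print(json.dumps(ns))
```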
What was missing:
- No API server hardening (no anonymous-auth, no TLS settings, no profiling disable)
- No kubelet hardening
- No controller manager/scheduler hardening
Notable: The only model to succeed on the original run without additional guidance. Excellent debugging skills — diagnosed the aescbc key length error from API server container logs. However, security configuration was minimal beyond audit logging, encryption, PSA, and network policies.
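Gemini's encryption setup used an example key and first hit a key length error. The sketch below generates a fresh random 32-byte aescbc key instead, which is a length AES accepts. The report does not give Gemini's original key, so this is illustrative only, and the key name is hypothetical. The resulting file would be passed to the API server with --encryption-provider-config.

```python
import base64
import json
import os


def encryption_config(key_name: str = "key1") -> dict:
    """EncryptionConfiguration that encrypts Secrets with a freshly generated aescbc key."""
    secret = base64.b64encode(os.urandom(32)).decode("ascii")  # 32 random bytes, base64-encoded
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used to encrypt new writes.
                {"aescbc": {"keys": [{"name": key_name, "secret": secret}]}},
                # identity last: Secrets stored before encryption stay readable.
                {"identity": {}},
            ],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(encryption_config(), indent=2))
```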
MiniMax M2.5
March 9 — TIMEOUT
Root cause: protect-kernel-defaults: true in kubeletExtraArgs prevented kubelet from starting. Also had structural issues — YAML document separators in kubeadmConfigPatch (actually valid but confusing), invalid kubelet authorization-mode: Node,RBAC (should be Webhook), and missing PodSecurity admission plugin.
March 10 Re-run — TIMEOUT (still failed)
What changed: The model heeded the hostname guidance (used the provided short name) and avoided protectKernelDefaults. However, it introduced a new fatal error.
New root cause: Used PodSecurityPolicy in the enable-admission-plugins list. PodSecurityPolicy was removed in Kubernetes 1.25 and the Kind image uses v1.32.2. The API server refused to start with a non-existent admission plugin.
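A check like this, run before kind create cluster, would have caught the PodSecurityPolicy error in seconds rather than after a four-minute failed attempt. The removal table in the sketch lists only PodSecurityPolicy (removed in Kubernetes 1.25). The removed_plugins helper is hypothetical.

```python
from __future__ import annotations

# Admission plugins known to have been removed, keyed to the Kubernetes minor
# version that dropped them. Only PodSecurityPolicy is listed here; extend the
# table from the Kubernetes release notes as needed.
REMOVED_PLUGINS = {"PodSecurityPolicy": 25}


def removed_plugins(enable_admission_plugins: str, minor: int) -> list[str]:
    """Return plugins in the comma-separated flag value that no longer exist in 1.<minor>."""
    plugins = [p.strip() for p in enable_admission_plugins.split(",") if p.strip()]
    return [p for p in plugins if p in REMOVED_PLUGINS and minor >= REMOVED_PLUGINS[p]]


if __name__ == "__main__":
    # The Kind image in these runs was v1.32.2.
    print(removed_plugins("NodeRestriction,PodSecurityPolicy", 32))
```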
Timeline:
- Attempt 1 (~4 min) — Failed. Config had PodSecurityPolicy, duplicate admission plugin fields at wrong YAML levels, invalid kubelet fields. API server never started.
- Attempt 2 (~4 min) — Failed. Rewrote config but kept PodSecurityPolicy. Same failure.
- Attempt 3 (~14 sec) — Succeeded. Stripped all extra API server args except audit logging. Cluster created.
- Timeout — The model verified the cluster with kubectl cluster-info and was about to apply security hardening, but the 600-second timeout hit.
End result: A near-default Kind cluster with only audit logging configured. No namespaces, no PSA labels, no network policies, no hardening beyond audit mounts.
Notable: Fixed one problem (protectKernelDefaults) but introduced another (PodSecurityPolicy). The model burned 8 of 10 minutes on two failed attempts before stripping its configuration down to a minimal working state. Demonstrates a pattern of over-configuring then debugging rather than building incrementally.
DeepSeek V3.2
March 9 — TIMEOUT
Root cause: protect-kernel-defaults: true and seccomp-default: true in kubeletExtraArgs prevented kubelet/API server from starting. The model had independently shortened the cluster name (good) but never got past the control-plane startup phase.
March 10 Re-run — TIMEOUT (still failed)
What changed: The model heeded the hostname guidance and initially set protectKernelDefaults: false (with a correct comment “IMPORTANT: Must be false for Kind”). However, it introduced the same fatal error as MiniMax.
New root cause: Used PodSecurityPolicy in the enable-admission-plugins list. Like MiniMax, this non-existent admission plugin prevented the API server from starting on Kubernetes v1.32.2.
Timeline:
- Attempt 1 (~5 min) — Failed. Config had PodSecurityPolicy, the SeccompDefault feature gate, and seccomp-default. API server never started (kind create cluster killed by a 120s bash timeout).
- Debug phase (~3 min) — Investigated the failure, checked docker logs, tried manual kubeconfig export. Misdiagnosed the problem as protectKernelDefaults: false rather than PodSecurityPolicy.
- Config fix — Removed protectKernelDefaults, seccompDefault, and the SeccompDefault feature gate. Also changed PodSecurityPolicy to PodSecurity (the actual fix).
- Timeout — The 600-second timeout hit before the second kind create cluster could be executed.
End result: No cluster created. The corrected config (with PodSecurity instead of PodSecurityPolicy) would likely have worked but there was no time remaining to try it.
Notable: Excessive todowrite calls (6 total) consumed time. The model correctly identified and fixed the PodSecurityPolicy issue in its final config edit but attributed the fix to the wrong cause (protectKernelDefaults). Demonstrates good debugging instincts (the fix was correct) but slow execution.
Key Findings
Re-run Outcomes
- Additional guidance fixed 2 of 4 failures. Claude Sonnet 4.6 and GPT-5.4 both succeeded on the re-run, producing fully hardened clusters with audit logging, PSA enforcement, and network policies. The hostname fix and protectKernelDefaults guidance were sufficient for these models.
- MiniMax and DeepSeek V3.2 failed for a new reason: PodSecurityPolicy. Both models enabled the PodSecurityPolicy admission plugin, which was removed in Kubernetes 1.25, on a v1.32.2 cluster, and this prevented the API server from starting. The failure was unrelated to the hostname and protectKernelDefaults issues from the first run; it is a separate knowledge gap about Kubernetes version compatibility.
- 10 of 12 model families now have successful or near-successful clusters. Claude (Sonnet, Opus 4.6, Opus 4.7, Opus 4.8), GPT (5.4, 5.5), Gemini 3 Flash, Qwen 3.6 Plus, Qwen3.6-35b-a3b (Local), Gemma 4 31B (Local), and Kimi K2.6 all produced working hardened clusters. Opus 4.7 and Kimi K2.6 timed out during later stages (not initial setup) with hardening controls in place. DeepSeek V4 Flash created a running cluster but applied no security policies beyond the initial Kind config. DeepSeek V4 Pro and MiniMax remain without successful clusters.
- Claude Opus 4.7 introduces K8s 1.35 features. Its use of AuthenticationConfiguration with AnonymousAuthConfigurableEndpoints to solve the anonymous-auth + health probe conflict was the most sophisticated solution of any model at the time it was tested (Qwen3.6-35b-a3b later used the same per-endpoint approach). Previous models either accepted a 0/1 Ready state (Opus 4.6) or didn't disable anonymous auth. Opus 4.7's dual-layer PSA (cluster-wide AdmissionConfiguration + namespace labels) was also unique when added, and was later matched by Opus 4.8 and Qwen3.6-35b-a3b.
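The AuthenticationConfiguration approach credited to Opus 4.7 can be sketched as follows. Turning anonymous access off globally also blocks unauthenticated health probes. This approach instead enables anonymous requests only for listed paths. The sketch assumes the apiserver.config.k8s.io/v1beta1 schema with the anonymous.conditions field from the AnonymousAuthConfigurableEndpoints feature. The file would be passed with --authentication-config. The anonymous_allowed helper is hypothetical and only shows the intended effect.

```python
import json

# Sketch of a structured authentication config that keeps anonymous access
# only for health endpoints. Assumption: the v1beta1 schema with the
# anonymous.conditions field; check which apiVersion your API server accepts.
AUTHN_CONFIG = {
    "apiVersion": "apiserver.config.k8s.io/v1beta1",
    "kind": "AuthenticationConfiguration",
    "anonymous": {
        "enabled": True,
        "conditions": [{"path": "/livez"}, {"path": "/readyz"}, {"path": "/healthz"}],
    },
}


def anonymous_allowed(path: str) -> bool:
    """Hypothetical helper: models the conditions as exact path matches."""
    anon = AUTHN_CONFIG["anonymous"]
    return bool(anon["enabled"]) and any(c["path"] == path for c in anon["conditions"])


if __name__ == "__main__":
    print(json.dumps(AUTHN_CONFIG, indent=2))
```

As we understand the Kubernetes documentation, the anonymous section replaces the --anonymous-auth flag rather than combining with it. A setup using this file should therefore not also set --anonymous-auth.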
Comparative Analysis (Successful Models)
| Feature | Opus 4.8 (2026-05-31) | Opus 4.7 (2026-04-20) | Opus 4.6 (2026-03-25) | Sonnet 4.6 (re-run) | GPT 5.5 (2026-04-25) | Qwen-35b LOCAL (2026-05-03) | GPT 5.4 (re-run) | Qwen 3.6 Plus | Kimi K2.6 | Gemini 3 Flash (original) | Gemma 4 31B LOCAL (2026-05-03) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Audit logging | Excellent (comprehensive policy, rotation, 30d/10/100MB) | Best (granular policy, noise filtering, rotation) | Best (granular policy, noise filtering) | Best (granular policy, noise filtering) | Good (functional policy, two-level mount) | Excellent (granular, noise filtering, 3.6MB) | Good (standard policy) | Basic (pods, secrets/configmaps, catch-all) | Good (two-level mount, pods+secrets/auth) | Good (standard policy) | Good (standard policy, two-level mount) |
| API server hardening | Best (anon-auth=false, profiling, encryption, TLS 1.2, ciphers, SA lookup) | Best (AuthenticationConfig, 17 plugins, profiling) | Best (TLS, ciphers, encryption, profiling) | Best (TLS, ciphers, timeout, profiling) | Good (admission plugins, profiling disabled) | Good (AuthConfig, profiling, SA lookup, 4 plugins) | Good (basic hardening) | Partial (7 plugins, TLS 1.2, no anon-auth/profiling) | Partial (6 plugins, Node,RBAC, no anon-auth/profiling) | None | None (basic admission plugins only) |
| Kubelet hardening | Excellent (anon disabled, Webhook, readOnlyPort=0, TLS 1.2+ciphers) | Excellent (KubeletConfiguration + TLS ciphers) | Excellent (KubeletConfiguration object) | Best (KubeletConfiguration + TLS bootstrap) | Good (anon disabled, Webhook, readOnlyPort=0) | Good (anon disabled, Webhook, readOnlyPort=0, TLS ciphers) | Good (kubeletExtraArgs) | Good (dual config, anon disabled, Webhook, readOnlyPort=0) | Good (anon disabled, Webhook, readOnlyPort=0) | None | None |
| Controller/scheduler | Both hardened (profiling disabled) | Both hardened (profiling, pod GC) | None | Profiling disabled, pod GC | Profiling disabled | Both hardened (profiling) | Profiling disabled, pod GC | None | None | None | None |
| PSA enforcement | Best (cluster-wide + namespace, baseline on dev) | Best (cluster-wide + namespace, baseline on dev) | Restricted on test/prod | Restricted on test/prod, baseline on dev | Restricted on test/prod | Best (cluster-wide + namespace, baseline on dev) | Restricted on test/prod | Restricted on test/prod | Restricted on test/prod, baseline on dev | Restricted on test/prod | Restricted on test/prod, baseline on dev |
| Network policies | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod (scoped to kube-dns) | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS + intra-ns on test/prod | Deny-all + DNS on test/prod | Deny-all + DNS on test/prod |
| Encryption at rest | Yes (AES-CBC) | No | Yes (aescbc) | No | Yes | No | No | No | No | Yes (aescbc) | No |
| ResourceQuotas/LimitRanges | No | Yes (all 3 namespaces) | Yes (test/prod) | No | Yes | Yes (all 3 namespaces) | No | Yes (test/prod, most detailed) | Yes (all 3 namespaces) | No | No |
| SA token restriction | No | Yes (default SA in all ns) | No | No | No | Yes (default SA in all ns) | No | No | No | No | No |
| Verification | PSA enforcement test, network policy isolation | Timed out before verification | PSA rejection test, audit check | PSA rejection test, audit check | Basic checks | Verification testing completed | Anonymous auth test, PSA test | Applied manifests, basic checks | PSA test pod, timed out | PSA via kubectl, audit check | Basic cluster checks |
Best overall:
- Claude Sonnet 4.6 (39/40): most comprehensive security across all layers.
- Claude Opus 4.6 (38/40): close behind, with the broadest feature set (encryption, quotas).
- Claude Opus 4.8 and Opus 4.7 (both 37/40): tied for 3rd place. Opus 4.8 is distinguished by proactively recreating the cluster with Calico CNI to get NetworkPolicy enforcement, and by encryption at rest. Opus 4.7 has a unique AuthenticationConfiguration and dual-layer PSA but timed out during verification.
- GPT 5.5 and Qwen3.6-35b-a3b (both 35/40): tied for 5th place. GPT 5.5 includes encryption at rest, ResourceQuotas, LimitRanges, and controller-manager/scheduler hardening. Qwen3.6-35b-a3b (a local 35B model) matches GPT 5.5's score with cluster-wide AdmissionConfiguration, per-endpoint anonymous auth, SA token restriction, and ResourceQuotas/LimitRanges on all 3 namespaces. This is impressive for a locally hosted model running with an extended timeout.
- Qwen 3.6 Plus (32/40): a solid result with good PSA, network policies, and the most detailed ResourceQuotas of any model, but it lacked API server hardening depth.
- Kimi K2.6 (31/40): good security coverage with PSA, network policies including intra-namespace rules, and ResourceQuotas/LimitRanges for all namespaces, but repeated cluster creation failures consumed the timeout budget.
- Gemma 4 31B (25/40): the second local model tested. It achieved correct PSS enforcement, audit logging, and network policies in just 5 tool calls, but applied no API server or kubelet hardening. This demonstrates that the security fundamentals (PSS, network policies, audit logging) are within reach of smaller local models, while deeper hardening remains a gap.
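The shared placings above follow standard competition ranking. Tied scores share a place and the following place is skipped, which is why the two 35/40 models are 5th rather than 4th. The short Python check below uses only the scores reported in this section. The competition_ranks helper is hypothetical.

```python
# Scores as reported in this report card.
SCORES = {
    "Claude Sonnet 4.6": 39,
    "Claude Opus 4.6": 38,
    "Claude Opus 4.8": 37,
    "Claude Opus 4.7": 37,
    "GPT 5.5": 35,
    "Qwen3.6-35b-a3b": 35,
    "Qwen 3.6 Plus": 32,
    "Kimi K2.6": 31,
    "Gemma 4 31B": 25,
}


def competition_ranks(scores: dict[str, int]) -> dict[str, int]:
    """Standard competition ranking: ties share a place and the next place is skipped."""
    ordered = sorted(scores.values(), reverse=True)
    # The first index of a score in the sorted list is the number of strictly higher scores.
    return {model: ordered.index(score) + 1 for model, score in scores.items()}


if __name__ == "__main__":
    for model, rank in sorted(competition_ranks(SCORES).items(), key=lambda kv: kv[1]):
        print(rank, model)
```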
Note: DeepSeek V4 Pro is excluded from this table as it never created a cluster, but its designed configuration (API server, kubelet, controller-manager, scheduler, etcd hardening) was among the most comprehensive of any model tested. Had execution completed, it would likely have placed in the 32-36/40 range based on configuration quality alone. DeepSeek V4 Flash is also excluded — while it created a running cluster, it applied no post-creation hardening (no namespaces, no PSS labels, no network policies).
Original vs Re-run Key Findings
- The hostname issue was a test framework bug, not a model bug. The tool-generated names were too long. GPT 5.4’s strict adherence to the “MUST use this name” instruction was actually correct behaviour; the instruction was wrong. Fixed by shortening the generated names.
- protectKernelDefaults was a reasonable choice that doesn’t work in Kind. Models that set this flag were making the right security decision for production clusters. The fact that it’s incompatible with Kind is a platform limitation, not a security knowledge gap. The guidance correctly reframes this.
- PodSecurityPolicy is a knowledge currency problem. MiniMax and DeepSeek V3.2 both used the deprecated PodSecurityPolicy (removed in K8s 1.25) instead of the current PodSecurity admission plugin. This suggests their training data may be weighted toward older Kubernetes documentation. This was not addressed in the additional guidance and could be added as further context if needed.
- Time management remains critical. Even with guidance, MiniMax needed 3 attempts and DeepSeek spent too long debugging. Models that build incrementally (Claude, GPT 5.4, Qwen 3.6 Plus) outperform those that attempt comprehensive configs that fail (MiniMax, DeepSeek V3.2).
- Qwen 3.6 Plus demonstrates strong fundamentals with gaps in depth. Qwen 3.6 Plus achieved correct PSA, network policies, and the most detailed ResourceQuotas (including PVC and pod-level limits) but missed API server hardening basics like anonymous-auth=false and profiling=false. This pattern (strong on Kubernetes-native security features, weaker on API server flag-level hardening) distinguishes it from the Claude models.
- DeepSeek V4: contrasting failure modes. V4 Pro produced one of the most comprehensive hardening configurations of any model (API server, kubelet, etcd, controller-manager, scheduler) but the opencode session terminated before kind create cluster was ever run. V4 Flash took the opposite approach: it created a running cluster on the second attempt but stopped after basic verification without creating namespaces or applying any security policies. Together they illustrate a spectrum: V4 Pro over-planned and never executed, while V4 Flash under-planned and declared victory too early. Neither DeepSeek V4 variant demonstrated the iterative build-and-harden workflow that successful models (Claude, GPT 5.4, Qwen 3.6 Plus) used.