Cluster Tests

Assessment of LLM agents creating hardened Kubernetes clusters using Kind. Each model was given 10 minutes to create a cluster with comprehensive security controls including audit logging, Pod Security Standards, network policies, API server hardening, and kubelet hardening.
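As a rough illustration of the controls listed above, the following untested sketch builds a Kind cluster config as a Python dict and prints it as JSON. It is not the configuration used in these tests or produced by any of the models. The paths, the cluster name, and the helper names `build_kind_config` and `restricted_namespace` are assumptions made for illustration, and the `extraArgs` map layout follows the Kind auditing guide, so it may need adjusting for other kubeadm versions.

```python
import json

# Assumed in-node locations; the source does not specify any paths.
POLICY_DIR = "/etc/kubernetes/policies"
LOG_DIR = "/var/log/kubernetes"

# kubeadm ClusterConfiguration patch. Kind takes these patches as YAML text.
# Covers audit logging plus one API server hardening flag (profiling off).
API_SERVER_PATCH = f"""\
kind: ClusterConfiguration
apiServer:
  extraArgs:
    audit-policy-file: {POLICY_DIR}/audit-policy.yaml
    audit-log-path: {LOG_DIR}/audit.log
    profiling: "false"
  extraVolumes:
    - name: audit-policies
      hostPath: {POLICY_DIR}
      mountPath: {POLICY_DIR}
      readOnly: true
      pathType: DirectoryOrCreate
    - name: audit-logs
      hostPath: {LOG_DIR}
      mountPath: {LOG_DIR}
      readOnly: false
      pathType: DirectoryOrCreate
"""

# Kubelet hardening: no read-only port, no anonymous requests.
KUBELET_PATCH = """\
kind: KubeletConfiguration
readOnlyPort: 0
authentication:
  anonymous:
    enabled: false
"""


def build_kind_config(name="hardened"):
    """Return a Kind v1alpha4 cluster config with the patches above."""
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "name": name,
        # Assumption: a CNI that enforces NetworkPolicy is installed
        # separately, so Kind's default CNI is switched off here.
        "networking": {"disableDefaultCNI": True},
        "nodes": [
            {
                "role": "control-plane",
                # Copy a local audit policy file into the node container.
                "extraMounts": [
                    {
                        "hostPath": "./audit-policy.yaml",
                        "containerPath": f"{POLICY_DIR}/audit-policy.yaml",
                        "readOnly": True,
                    }
                ],
                "kubeadmConfigPatches": [API_SERVER_PATCH, KUBELET_PATCH],
            }
        ],
    }


def restricted_namespace(name):
    """Namespace manifest enforcing the 'restricted' Pod Security Standard."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"pod-security.kubernetes.io/enforce": "restricted"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_kind_config(), indent=2))
```

Since JSON is generally valid YAML, the printed output could be saved to a file and passed to `kind create cluster --config <file>`. Network policies and an audit policy file would still have to be supplied separately.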

Original date: 2026-03-09 | Re-run date: 2026-03-10 | Claude Opus 4.6 added: 2026-03-25 | MiniMax M2.7 added: 2026-03-28 | Claude Opus 4.7 added: 2026-04-20 | Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24

Models: Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Gemini 3 Flash Preview, Qwen 3.6 Plus, DeepSeek V4 Pro, DeepSeek V4 Flash, MiniMax M2.7, MiniMax M2.5, DeepSeek V3.2

Hardened Cluster Creation

