Quiz Report Card: Pod Security Standards Levels

Reference Answer

The three Kubernetes Pod Security Standards levels are:

Level	Description	Use Case
Privileged	Unrestricted policy, widest possible permissions. Allows known privilege escalations.	System/infrastructure workloads managed by privileged, trusted users
Baseline	Minimally restrictive, prevents known privilege escalations while allowing common container defaults. Blocks hostNetwork, hostPID, privileged containers, hostPath volumes, etc.	General-purpose application workloads
Restricted	Heavily restricted, follows current pod hardening best practices. Requires non-root, seccomp profiles, drops capabilities, etc.	Security-critical applications
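To make the Baseline boundary in the table concrete, here is a minimal Python sketch that checks a pod manifest, represented as a plain dict mirroring the Kubernetes YAML, against a few Baseline controls: host namespaces, privileged containers, hostPorts, and hostPath volumes. It is deliberately partial, because the official profile also covers capabilities, AppArmor, SELinux, /proc mount type, seccomp, sysctls, and Windows HostProcess. `baseline_violations` and `all_containers` are hypothetical helpers, not part of any Kubernetes client library or of the PSA controller itself.

```python
from typing import Any

# Pod-level host namespace fields that Baseline requires to be unset or false.
HOST_NAMESPACE_FIELDS = ("hostNetwork", "hostPID", "hostIPC")


def all_containers(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Regular, init, and ephemeral containers are all subject to the checks."""
    return (
        spec.get("containers", [])
        + spec.get("initContainers", [])
        + spec.get("ephemeralContainers", [])
    )


def baseline_violations(pod: dict[str, Any]) -> list[str]:
    """Check a pod manifest (as a dict) against a subset of Baseline controls."""
    spec = pod.get("spec", {})
    problems: list[str] = []

    for field in HOST_NAMESPACE_FIELDS:
        if spec.get(field):
            problems.append(f"spec.{field} must not be true")

    for c in all_containers(spec):
        name = c.get("name", "?")
        if c.get("securityContext", {}).get("privileged"):
            problems.append(f"container {name}: privileged must not be true")
        for port in c.get("ports", []):
            # The built-in PSA controller only accepts an unset or zero hostPort.
            if port.get("hostPort", 0) != 0:
                problems.append(f"container {name}: hostPort must be unset or 0")

    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol.get('name', '?')}: hostPath is forbidden")

    return problems
```

A pod that sets `hostNetwork: true` and mounts a hostPath volume yields two violations from this subset; the same pod with both removed passes it.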

Additional context that strengthens an answer:

Enforcement is via the Pod Security Admission (PSA) controller
Three enforcement modes: enforce (reject), audit (log), warn (user warning)
Applied at namespace level via labels
PSA replaced PodSecurityPolicy (PSP), which was deprecated in v1.21 and removed in v1.25
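The namespace labels follow the documented `pod-security.kubernetes.io/<MODE>: <LEVEL>` pattern, with an optional `pod-security.kubernetes.io/<MODE>-version` label that pins which version of the policy applies. The small Python sketch below assembles them; the helper name `psa_labels` is hypothetical. It shows how one namespace can enforce one level while auditing and warning on a stricter one.

```python
# Values accepted in Pod Security Admission namespace labels.
MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")
PREFIX = "pod-security.kubernetes.io/"


def psa_labels(levels: dict[str, str], version: str = "latest") -> dict[str, str]:
    """Map {mode: level} to the namespace labels PSA reads."""
    labels: dict[str, str] = {}
    for mode, level in levels.items():
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        labels[PREFIX + mode] = level
        # Pins which Kubernetes minor version's definition of the level applies,
        # e.g. "v1.30"; "latest" uses the cluster's current definition.
        labels[PREFIX + mode + "-version"] = version
    return labels


if __name__ == "__main__":
    # Reject Baseline violations; only log and warn about Restricted ones.
    ns_labels = psa_labels({"enforce": "baseline", "audit": "restricted", "warn": "restricted"})
    for key, value in ns_labels.items():
        print(f"{key}={value}")
```

The resulting six labels would go under the Namespace's `metadata.labels`. Pods violating Baseline are rejected, while Restricted violations only produce audit annotations and user-facing warnings.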

Scoring Criteria

Correct three levels: Privileged, Baseline, Restricted — must name all three
Accurate descriptions: Each level’s purpose and restrictions described correctly
Additional context: Enforcement modes, PSA, namespace labels, PSP replacement
No errors: Inaccurate claims are penalised

This is described in the scoring notes as “a very basic question that all models should get right.”

Results Summary

Model	Score	All 3 Levels	Descriptions	Enforcement Modes	Additional Context	Errors
anthropic/claude-opus-4.7	9/10	Yes	Good	Yes	PSA, PSP replacement	None
google/gemini-3-flash-preview	9/10	Yes	Detailed	Yes	PSA, namespace labels	None
anthropic/claude-sonnet-4.6	9/10	Yes	Detailed	Yes	Namespace labels	None
deepseek/deepseek-v3.2	8/10	Yes	Good	No	PSA, PSP replacement	None
openai/gpt-5.4	7/10	Yes	Brief	No	None	None
minimax/minimax-m2.5	7/10	Yes	Brief	No	PSA	PCI-DSS claim
minimax/minimax-m2.7	9/10	Yes	Yes	Yes	Yes	None
qwen/qwen3.6-plus	9/10	Yes	Good	No	PSA, namespace labels	None
deepseek/deepseek-v4-pro	10/10	Yes	Detailed	Yes	PSA, enforcement modes, PSP replacement	None
deepseek/deepseek-v4-flash	9/10	Yes	Good	Yes	PSA mentioned	readOnlyRootFilesystem error
moonshotai/kimi-k2.6	9/10	Yes	Good	Yes	PSA history	None
openai/gpt-5.5	10/10	Yes	Accurate	No	None	None
qwen/qwen3.6-35b-a3b (LOCAL)	10/10	Yes	Accurate	Yes	PSP replacement, namespace labels	None
anthropic/claude-opus-4.8	10/10	Yes	Accurate	Yes	PSA, PSP replacement	None
google/gemma-4-31b (LOCAL)	9/10	Yes	Yes	Yes	PSP replacement mentioned	None
qwen/qwen3.7-plus	9/10	Yes	Good	Yes	PSA mentioned	None
minimax/minimax-m3	10/10	Yes	Accurate	Yes	PSA, PSP replacement	None
anthropic/claude-fable-5	9/10	Yes	Good	Yes	PSA mentioned	Minor hostPath error
moonshotai/kimi-k2.7-code	9/10	Yes	Good	Yes	PSA mentioned	None
z-ai/glm-5.2	10/10	Yes	Accurate	Yes	PSA, PSP replacement	None
mistralai/mistral-medium-3-5	8/10	Yes	Good	Yes	PSA mentioned	Baseline/running-as-root error
anthropic/claude-sonnet-5	10/10	Yes	Accurate	Yes	PSA, PSP replacement	None
tencent/hy3	9/10	Yes	Good	Yes	PSA mentioned	None
openai/gpt-5.6-terra	9/10	Yes	Good	Yes	PSA mentioned	None
openai/gpt-5.6-sol	10/10	Yes	Accurate	Yes	PSA, PSP replacement	None
moonshotai/kimi-k3	10/10	Yes	Accurate	Yes	PSA, enforcement modes	None
xiaomi/mimo-v2.5	10/10	Yes	Accurate	No	PSA, namespace labels, PSP replacement	None

Detailed Analysis

anthropic/claude-opus-4.7 — 9/10

Strengths:

All 3 levels correctly named with accurate descriptions
Explicitly mentions all 3 enforcement modes (enforce, audit, warn), the main differentiator at this score level
Correctly notes PSA replaced PodSecurityPolicy
Does not incorrectly claim readOnlyRootFilesystem is part of Restricted (avoids trap)

Weaknesses: Minor — could detail specific controls at each level.

Comparison vs Opus 4.6 (8): Improvement. Coverage of enforcement modes is the differentiator.

Notable: Joins the 9/10 club on this question alongside Sonnet, Gemini 3 Flash, and MiniMax M2.7. The enforcement modes coverage is the key improvement over Opus 4.6.

google/gemini-3-flash-preview — 9/10

Strengths:

All three levels correctly identified with detailed descriptions
Each level has Purpose, Security Posture, and Use Case — the most structured breakdown
Enforcement modes (enforce/audit/warn) correctly explained
Mentions Pod Security Admission and namespace labels
Good detail on what Restricted requires (non-root, seccomp, minimal privileges)
Correctly notes Restricted is more difficult to implement

Weaknesses:

None significant — comprehensive and accurate

Notable: One of the most thorough answers. The structured format with Purpose/Security Posture/Use Case for each level makes this the most useful as a reference.

anthropic/claude-sonnet-4.6 — 9/10

Strengths:

All three levels correctly identified with clear descriptions
Enforcement modes (enforce/audit/warn) correctly explained
Mentions namespace-level application
Good practical details: Baseline blocks hostNetwork, hostPID, privileged containers
Restricted requirements: non-root, dropping capabilities, seccomp profiles

Weaknesses:

None significant — clean and accurate

Notable: Clear, well-structured answer with the right level of detail. The enforcement modes section is a useful addition that shows understanding beyond just the three levels.

deepseek/deepseek-v3.2 — 8/10

Strengths:

All three levels correctly identified with good descriptions
Correctly notes PSA replaced PSP as of v1.25 — useful historical context
Descriptions are accurate and well-ordered (least to most restrictive)
Mentions Pod Security Admission

Weaknesses:

Does not mention enforcement modes (enforce/audit/warn)
Slightly less detail than Gemini 3 Flash or Claude on what each level restricts

Notable: The PSP replacement mention, dated to v1.25, is useful historical context. Solid, accurate answer.

openai/gpt-5.4 — 7/10

Strengths:

All three levels correctly identified
Descriptions are accurate, matching official documentation language
No errors

Weaknesses:

Very brief — three one-line descriptions with no elaboration
No mention of enforcement modes, PSA, or namespace labels
No mention of what each level specifically restricts
Offers to “explain what each level allows or blocks” but doesn’t actually do it

Notable negative: One of the briefest answers across all models. While accurate, a question about the three PSS levels warrants at least some detail about what each level enforces, and offering to elaborate without doing so adds no value.

minimax/minimax-m2.5 — 7/10

Strengths:

All three levels correctly identified
Mentions Pod Security Admission as the enforcement mechanism
Concise descriptions

Weaknesses:

Claims Restricted “complies with most industry standards (like PCI-DSS)” — the Pod Security Standards documentation makes no such claim. PCI-DSS compliance involves far more than pod security settings.
No mention of enforcement modes (enforce/audit/warn)
Brief descriptions

Notable negative: The PCI-DSS compliance claim is unsupported and misleading. While Restricted aligns with security best practices, claiming compliance with specific regulatory frameworks overstates what PSS provides.

minimax/minimax-m2.7 — 9/10

Strengths:

Correctly lists all three levels (Privileged, Baseline, Restricted)
Clear concise explanation of each level
Practical guidance on when to use each
Clean table format

Weaknesses:

None significant

Notable: Near-perfect answer on this straightforward question. Matches Sonnet and Gemini 3 Flash at 9/10.

qwen/qwen3.6-plus — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate and concise descriptions for each level
Correctly notes enforcement via PodSecurity admission controller
Mentions namespace-level application via labels
Good contextual detail: Baseline “prevents known privilege escalations”, Restricted “enforces current pod hardening best practices”

Weaknesses:

Does not mention the three enforcement modes (enforce, audit, warn)
Brief — could have elaborated on what each level specifically blocks or requires
No mention of PSP replacement history

Notable: A clean, accurate answer that covers the essentials well. The conciseness is appropriate for this straightforward question, though the missing enforcement modes prevent a perfect score.

deepseek/deepseek-v4-pro — 10/10

Strengths:

Perfect response correctly identifying all three levels (Privileged, Baseline, Restricted) with accurate descriptions
Covers all three enforcement modes (enforce, audit, warn)
Correctly notes PSA integration and PSP replacement history
Detailed and accurate descriptions of what each level restricts

Weaknesses:

None

Notable: The first model to score a perfect 10/10 on this question. While this is considered a “basic” question, V4 Pro’s response goes beyond the minimum by covering enforcement modes, PSA integration, and PSP replacement — all the additional context that the scoring criteria reward. Represents a major improvement over DeepSeek V3.2 (8/10).

deepseek/deepseek-v4-flash — 9/10

Strengths:

Correctly names all three levels: Privileged, Baseline, Restricted
Good descriptions of each level’s purpose and restrictions
Mentions enforcement modes (enforce, audit, warn)
PSA controller mentioned as enforcement mechanism

Weaknesses:

Minor error: Claims the Restricted level enforces readOnlyRootFilesystem, which is not part of the Restricted profile in the Pod Security Standards. The Restricted level focuses on runAsNonRoot, allowPrivilegeEscalation, seccomp, capabilities, and volume types, but does not mandate readOnlyRootFilesystem.

Notable: A strong score of 9/10, matching Opus 4.7, Sonnet, Gemini 3 Flash, Qwen 3.6 Plus, and MiniMax M2.7. The readOnlyRootFilesystem error is a minor inaccuracy that prevents a perfect score. Close to V4 Pro’s 10/10 on this question.
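To illustrate the readOnlyRootFilesystem point, the sketch below checks three container-level Restricted controls: allowPrivilegeEscalation, capabilities, and seccomp. It deliberately never reads `readOnlyRootFilesystem`, so a container that omits it can still pass. This is a simplified, Linux-only subset that assumes the manifest is a plain dict. It leaves out runAsNonRoot and volume types, and `restricted_container_violations` is a hypothetical name.

```python
from typing import Any

# Seccomp profile types the Restricted profile accepts.
ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_container_violations(pod: dict[str, Any]) -> list[str]:
    """Check a subset of Restricted's container-level controls (Linux pods)."""
    spec = pod.get("spec", {})
    pod_seccomp = spec.get("securityContext", {}).get("seccompProfile", {}).get("type")
    containers = (
        spec.get("containers", [])
        + spec.get("initContainers", [])
        + spec.get("ephemeralContainers", [])
    )
    problems: list[str] = []
    for c in containers:
        name = c.get("name", "?")
        sc = c.get("securityContext", {})
        # Must be explicitly false; leaving it unset is not enough.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"container {name}: allowPrivilegeEscalation must be false")
        caps = sc.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"container {name}: capabilities.drop must include ALL")
        if set(caps.get("add", [])) - {"NET_BIND_SERVICE"}:
            problems.append(f"container {name}: only NET_BIND_SERVICE may be added")
        # A container-level seccomp profile overrides the pod-level one.
        seccomp = sc.get("seccompProfile", {}).get("type", pod_seccomp)
        if seccomp not in ALLOWED_SECCOMP:
            problems.append(f"container {name}: seccompProfile must be RuntimeDefault or Localhost")
        # readOnlyRootFilesystem is deliberately not consulted:
        # it is good hardening practice but not a Restricted control.
    return problems
```

A container with `allowPrivilegeEscalation: false`, `capabilities.drop: [ALL]`, and a RuntimeDefault seccomp profile passes this subset whether or not readOnlyRootFilesystem is set. Setting `readOnlyRootFilesystem: true` on its own fixes none of the three checks.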

openai/gpt-5.5 — 10/10

Strengths:

Correctly identifies all three levels: Privileged, Baseline, Restricted
Accurate one-sentence descriptions matching official documentation language
Clean, concise answer with no errors or fabrications
Each level’s description is precise: “Unrestricted policy, allowing known privilege escalations” (Privileged), “prevents known privilege escalations while allowing common workloads” (Baseline), “Heavily restricted policy following current Pod hardening best practices” (Restricted)

Weaknesses:

Very brief — no mention of enforcement modes (enforce/audit/warn), PSA controller, or namespace labels
No additional context about PSP replacement or practical usage

Notable: A perfect score despite being one of the most concise answers. The descriptions are accurate and closely match the official Kubernetes documentation. Joins DeepSeek V4 Pro as the only models to score 10/10 on this question, though V4 Pro achieved it with much more detail. GPT 5.5’s brevity works here because the question asks “what are they”: naming and describing the three levels correctly is sufficient.

moonshotai/kimi-k2.6 — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Good descriptions of each level’s purpose and restrictions
Good PSA history context — covers the transition from PSP to PSA
Enforcement modes mentioned

Weaknesses:

No specific field-level details for what each level enforces (e.g., which securityContext fields are checked at Restricted level)

Notable: Joins the 9/10 group alongside Opus 4.7, Sonnet, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, and DeepSeek V4 Flash. The PSA history context adds value beyond the basic three-level answer.

qwen/qwen3.6-35b-a3b (LOCAL) — 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level’s purpose and restrictions
Notes PSP replacement context — correctly identifies PSA as the successor to PodSecurityPolicy
Mentions label-based enforcement at the namespace level
No factual errors

Weaknesses: None significant.

Notable: A perfect score; joins GPT 5.5 and DeepSeek V4 Pro as the third model to achieve 10/10 on this question. This is a well-defined conceptual question whose answer is thoroughly covered in the Kubernetes documentation, which plays to the local model’s strengths. The PSP replacement context and the label-based enforcement mention show understanding beyond just naming the three levels.

google/gemma-4-31b (LOCAL) — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level’s purpose and what each level restricts
Mentions all three enforcement modes (enforce, audit, warn)
Notes PSP replacement context — correctly identifies PSA as the successor to PodSecurityPolicy
No factual errors

Weaknesses:

Could have provided more specific field-level details for what each level enforces

Notable: A strong score, joining the large 9/10 group alongside Opus 4.7, Sonnet, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, and Kimi K2.6. This is Gemma 4 31B’s highest score across all quiz questions and demonstrates that well-documented conceptual knowledge plays to the model’s strengths. The mention of enforcement modes and PSP replacement context adds value beyond just naming the three levels.

anthropic/claude-opus-4.8 — 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level matching official documentation
Good PSA context — covers enforcement mechanism and PSP replacement
No factual errors

Weaknesses: None significant.

Comparison vs Opus 4.7 (9): Improvement. Achieves a perfect score with more complete additional context.

Notable: Joins GPT 5.5, DeepSeek V4 Pro, and Qwen-35b as the fourth model to achieve 10/10 on this question. The Anthropic family’s PSS knowledge improves with each generation: Opus 4.6 (8), Opus 4.7 (9), Opus 4.8 (10).

qwen/qwen3.7-plus — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level’s purpose and restrictions
Mentions all three enforcement modes (enforce, audit, warn)
PSA controller mentioned as enforcement mechanism
No factual errors

Weaknesses:

Could have provided more specific field-level details for what each level enforces

Notable: Joins the large 9/10 group alongside Opus 4.7, Sonnet, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, Kimi K2.6, and Gemma 4 31B. Same score as Qwen 3.6 Plus — both Qwen cloud models handle this straightforward question well with consistent accuracy.

minimax/minimax-m3 — 10/10

Strengths:

All three levels correctly identified: Privileged, Baseline, Restricted
Accurate descriptions matching official documentation with appropriate context
Good PSA context — covers enforcement mechanism and PSP replacement
No factual errors

Weaknesses: None significant.

Notable: Joins GPT 5.5, Opus 4.8, DeepSeek V4 Pro, and Qwen-35b as a co-winner with a perfect 10/10 on this question. A clear improvement over MiniMax M2.5 (7/10) and a step up from MiniMax M2.7 (9/10), demonstrating strong knowledge of well-documented Kubernetes concepts. The MiniMax family’s PSS knowledge improves with each generation.

anthropic/claude-fable-5 — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level’s purpose and restrictions
Mentions all three enforcement modes (enforce, audit, warn)
PSA controller mentioned as enforcement mechanism
No significant factual errors

Weaknesses:

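Flagged for a minor hostPath error (the response states that Baseline blocks hostPath volumes). However, that statement matches the official profile: Baseline forbids hostPath volumes (spec.volumes[*].hostPath must be unset), as the reference answer above also says, and Restricted inherits the same control. The claim that only Restricted blocks hostPath is incorrect, so this deduction does not reflect an actual model error.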

Notable: Joins the large 9/10 group alongside Opus 4.7, Sonnet, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, Kimi K2.6, Gemma 4 31B, and Qwen 3.7 Plus. The minor hostPath error prevents a perfect score but otherwise demonstrates strong knowledge of PSS levels. The Anthropic family’s PSS knowledge continues to be strong: Opus 4.6 (8), Opus 4.7 (9), Opus 4.8 (10), Fable 5 (9).
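On the hostPath question specifically, the official profiles split volume rules as follows. Baseline already forbids hostPath volumes, and Restricted keeps that ban while also limiting pods to an allowlist of volume types. The sketch below shows this split, assuming manifests are plain dicts. `volume_violations` is a hypothetical helper, and it also assumes every volume key other than `name` names the volume source.

```python
from typing import Any

# Volume sources the Restricted profile allows; everything else is rejected.
RESTRICTED_VOLUME_TYPES = {
    "configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
    "persistentVolumeClaim", "projected", "secret",
}


def volume_violations(pod: dict[str, Any], level: str) -> list[str]:
    """Compare volume rules at each Pod Security Standards level."""
    if level not in ("privileged", "baseline", "restricted"):
        raise ValueError(f"unknown level: {level!r}")
    if level == "privileged":
        return []  # Privileged places no restrictions on volumes.
    problems: list[str] = []
    for vol in pod.get("spec", {}).get("volumes", []):
        name = vol.get("name", "?")
        # Assumption: every key other than "name" names the volume source.
        for source in (key for key in vol if key != "name"):
            if source == "hostPath":
                # Forbidden at Baseline, and therefore at Restricted too.
                problems.append(f"volume {name}: hostPath is forbidden at {level}")
            elif level == "restricted" and source not in RESTRICTED_VOLUME_TYPES:
                problems.append(f"volume {name}: {source} is not allowed at restricted")
    return problems
```

A hostPath volume is therefore rejected at both Baseline and Restricted. A non-allowlisted source such as an `nfs` volume passes Baseline but fails Restricted.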

moonshotai/kimi-k2.7-code — 9/10

Strengths:

All 3 levels correctly identified: Privileged, Baseline, Restricted
Purpose of each level clearly explained
Correct note about progressive restrictions
Mentions PSA enforcement modes

Weaknesses:

Brief compared to the highest scorers — could include specific fields controlled at each level

Notable: Matches the large 9/10 group alongside Opus 4.7, Sonnet, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, K2.6, Gemma 4 31B, Qwen 3.7 Plus, and Fable 5. Same score as K2.6 — both Moonshot models handle this straightforward question well.

z-ai/glm-5.2 — 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level matching official documentation
Covers all three enforcement modes (enforce, audit, warn)
Good PSA context — covers enforcement mechanism and PSP replacement
No factual errors

Weaknesses: None significant.

Notable: Joins GPT 5.5, Opus 4.8, DeepSeek V4 Pro, Qwen-35b, and MiniMax M3 as a co-winner with a perfect 10/10 on this question. A clean, accurate response that covers the essential elements plus the additional context (enforcement modes, PSA, PSP replacement) that the scoring criteria reward.

mistralai/mistral-medium-3-5 — 8/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Good descriptions of each level’s purpose and restrictions
Mentions all three enforcement modes (enforce, audit, warn)
PSA controller mentioned as enforcement mechanism

Weaknesses:

Incorrectly states Baseline blocks running as root — the Baseline profile does not restrict running as root; only the Restricted profile enforces runAsNonRoot: true. This is a notable error that conflates Baseline and Restricted restrictions.

Notable: Ties DeepSeek V3.2 at 8/10. The running-as-root error is a specific factual mistake that reflects confusion between the Baseline and Restricted profiles. Otherwise demonstrates solid understanding of PSS levels with correct enforcement mode coverage.
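The Baseline/Restricted distinction on root users can be sketched in a few lines. The sketch assumes a plain-dict manifest and uses a hypothetical `root_violations` helper. It encodes the rule that Baseline imposes no run-as-root check, while Restricted requires `runAsNonRoot: true` (at pod level, or on every container) and forbids `runAsUser: 0`.

```python
from typing import Any


def root_violations(pod: dict[str, Any], level: str) -> list[str]:
    """Compare run-as-root rules: only Restricted requires a non-root user."""
    if level not in ("privileged", "baseline", "restricted"):
        raise ValueError(f"unknown level: {level!r}")
    if level != "restricted":
        return []  # Neither Privileged nor Baseline checks the run-as user.
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    problems: list[str] = []
    if pod_sc.get("runAsNonRoot") is False:
        problems.append("spec.securityContext.runAsNonRoot must not be false")
    if pod_sc.get("runAsUser") == 0:
        problems.append("spec.securityContext.runAsUser must not be 0")
    containers = (
        spec.get("containers", [])
        + spec.get("initContainers", [])
        + spec.get("ephemeralContainers", [])
    )
    for c in containers:
        name = c.get("name", "?")
        sc = c.get("securityContext", {})
        # An unset container value inherits the pod-level runAsNonRoot.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"container {name}: runAsNonRoot must be true")
        if sc.get("runAsUser") == 0:
            problems.append(f"container {name}: runAsUser must not be 0")
    return problems
```

A pod that runs as UID 0 passes at Baseline and fails at Restricted. This is exactly the distinction the Mistral answer blurred.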

anthropic/claude-sonnet-5 — 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level matching official documentation
Covers all three enforcement modes (enforce, audit, warn)
Good PSA context — covers enforcement mechanism and PSP replacement
No factual errors

Weaknesses: None significant.

Notable: Joins GPT 5.5, Opus 4.8, DeepSeek V4 Pro, Qwen-35b, MiniMax M3, and GLM-5.2 as a co-winner with a perfect 10/10 on this question. The Anthropic family’s PSS knowledge is consistently strong: Opus 4.6 (8), Opus 4.7 (9), Opus 4.8 (10), Fable 5 (9), Sonnet 5 (10).

tencent/hy3 — 9/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level’s purpose and restrictions
Correctly identifies all three enforcement modes (enforce, audit, warn)
PSA controller mentioned as enforcement mechanism
No factual errors

Weaknesses:

Could have provided more specific field-level details for what each level enforces

Notable: Joins the large 9/10 group alongside Opus 4.7, Sonnet 4.6, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, Kimi K2.6, Gemma 4 31B, Qwen 3.7 Plus, Kimi K2.7 Code, and Fable 5. Correctly identifies all three levels with accurate descriptions — a strong showing on this well-documented topic.

openai/gpt-5.6-terra — 9/10

Strengths:

Correctly identifies all three PSS levels: Privileged, Baseline, and Restricted
Accurate descriptions of what each level permits and restricts

Weaknesses:

Slightly less detailed than top-scoring responses on specific controls enforced at each level

Notable: Joins the large 9/10 group alongside Opus 4.7, Sonnet 4.6, Gemini 3 Flash, MiniMax M2.7, Qwen 3.6 Plus, DeepSeek V4 Flash, Kimi K2.6, Gemma 4 31B, Qwen 3.7 Plus, Kimi K2.7 Code, Fable 5, and HY3. The OpenAI family performs well on this straightforward question: GPT 5.4 (7), GPT 5.5 (10), GPT 5.6 Terra (9).

openai/gpt-5.6-sol — 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level matching official documentation
Covers all three enforcement modes (enforce, audit, warn)
Good PSA context — covers enforcement mechanism and PSP replacement
No factual errors

Weaknesses: None significant.

Notable: Joins GPT 5.5, Opus 4.8, DeepSeek V4 Pro, Qwen-35b, MiniMax M3, GLM-5.2, and Sonnet 5 as a co-winner with a perfect 10/10 on this question. The OpenAI family’s PSS knowledge continues to be strong: GPT 5.4 (7), GPT 5.5 (10), GPT 5.6 Terra (9), GPT 5.6 Sol (10). Sol surpasses Terra’s 9/10 with more complete additional context.

moonshotai/kimi-k3 – 10/10

Strengths:

All three levels correctly named: Privileged, Baseline, Restricted
Accurate descriptions of each level matching official documentation
Covers all three enforcement modes (enforce, audit, warn)
Good PSA context: identifies the PSA controller as the enforcement mechanism
No factual errors

Weaknesses: None significant.

Notable: Joins GPT 5.6 Sol, GPT 5.5, Opus 4.8, DeepSeek V4 Pro, Qwen-35b, MiniMax M3, GLM-5.2, and Sonnet 5 as a co-winner with a perfect 10/10 on this question. The Moonshot family’s PSS knowledge is strong and improving: K2.6 (9), K2.7 Code (9), K3 (10).

xiaomi/mimo-v2.5 — 10/10

Strengths:

Correctly names all three levels (Privileged, Baseline, Restricted) with accurate descriptions of what each permits/restricts
Correctly ties enforcement to Pod Security Admission and namespace labels
Notes PSA superseded PodSecurityPolicy

Weaknesses: None significant.

Notable: A perfect 10/10, sharing the top score with the other co-winners rather than winning outright. One of MiMo’s strongest answers, showing confident knowledge of well-documented PSS concepts.

Key Findings

All models answered correctly: As the scoring notes predicted, this is a basic question, and all 27 models correctly identified Privileged, Baseline, and Restricted.
Differentiation is in depth and additional context: The gap between models comes from enforcement modes, PSA details, and the specifics of what each level restricts — not from getting the core answer wrong.
Gemini 3 Flash and Claude Sonnet 4.6 were among the most complete answers: Both included enforcement modes and namespace-level application, giving the reader a fuller picture of how PSS works in practice.
GPT 5.4 was unnecessarily brief: Offering to elaborate without doing so is a weaker approach than just providing a thorough answer upfront.
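MiniMax M2.5’s PCI-DSS claim is a fabrication: The official PSS documentation makes no reference to PCI-DSS compliance. It is not the only error, though: DeepSeek V4 Flash (readOnlyRootFilesystem as a Restricted requirement), Mistral Medium 3.5 (Baseline blocking running as root), and Claude Fable 5 (a flagged hostPath statement) were also marked down for specific misstatements.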