Quiz Report Card: Privileged and Allow Privilege Escalation

Date: 2026-03-09 | Qwen 3.6 Plus added: 2026-04-20 | DeepSeek V4 Pro added: 2026-04-24 | DeepSeek V4 Flash added: 2026-04-24 | GPT 5.5 added: 2026-04-25 | Kimi K2.6 added: 2026-04-26 | Qwen3.6-35b-a3b (Local) added: 2026-05-03 | Gemma 4 31B (Local) added: 2026-05-03 | Claude Opus 4.8 added: 2026-05-31

Question: In the context of a Kubernetes pod manifest, Explain the difference between the privileged setting and the allowprivilegeescalation setting, mentioning whether each setting presents a serious security concern.

Reference Answer

These two settings are very different despite having similar names:

privileged: Serious security concern

  • Removes most significant security and isolation protection
  • A privileged container can break out to the underlying host
  • Grants all Linux capabilities, access to host devices, bypasses namespace/cgroup isolation
  • Essentially makes the container equivalent to a root process on the host

allowPrivilegeEscalation: Rarely a serious security concern

  • Controls whether a process can gain more privileges than its parent process
  • Directly controls the no_new_privs kernel flag
  • Affects setuid/setgid binary behaviour within the container
  • All other parts of container isolation still apply regardless of this setting
  • While it’s a reasonable hardening step, its security impact is limited compared to privileged
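
One way to observe the no_new_privs point from inside a running container is to read the NoNewPrivs field that recent Linux kernels expose in /proc/<pid>/status. The following is a minimal sketch, not part of the assessed answers: the helper name parse_no_new_privs is hypothetical, and it assumes a kernel new enough to report the field.

```python
from typing import Optional


def parse_no_new_privs(status_text: str) -> Optional[bool]:
    """Return True/False from a /proc/<pid>/status dump, or None if the field is absent.

    Hypothetical helper for illustration; older kernels do not report NoNewPrivs.
    """
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            # The value is "1" when no_new_privs is set for the process, "0" otherwise.
            return line.split(":", 1)[1].strip() == "1"
    return None


def read_own_no_new_privs() -> Optional[bool]:
    """Read the flag for the current process (Linux only; returns None elsewhere)."""
    try:
        with open("/proc/self/status") as fh:
            return parse_no_new_privs(fh.read())
    except OSError:
        return None
```

In a container started with allowPrivilegeEscalation: false, read_own_no_new_privs() would be expected to return True, which is why setuid/setgid binaries can no longer raise the process's privileges there.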

The key discriminator is whether models correctly convey the severity gap: privileged is critical (host breakout), while allowPrivilegeEscalation is rarely serious (container isolation still intact).
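
The severity gap described above can be sketched as a small check over a container's securityContext, written here as a Python dict. This is an illustrative sketch only: the function name assess_security_context and the finding strings are hypothetical, and the defaults (privileged false, allowPrivilegeEscalation true when unset) are the ones the assessed answers are credited with stating correctly.

```python
def assess_security_context(security_context: dict) -> dict:
    """Map the two settings to the reference answer's severity view (illustrative only)."""
    # Defaults when the fields are omitted: privileged=false, allowPrivilegeEscalation=true.
    privileged = security_context.get("privileged", False)
    ape = security_context.get("allowPrivilegeEscalation", True)

    findings = {}
    if privileged:
        # Isolation is largely removed; this dominates anything else in the context.
        findings["privileged"] = "serious: container isolation removed, host breakout possible"
    elif ape:
        # Isolation still applies; only no_new_privs hardening is missing.
        findings["allowPrivilegeEscalation"] = "hardening gap: no_new_privs not set, isolation intact"
    return findings


# Example securityContext fragments, as they might appear under a container spec.
risky = {"privileged": True}
default = {}
hardened = {"allowPrivilegeEscalation": False}
```

Calling the function on risky reports only the privileged finding, on default reports only the hardening gap, and on hardened reports nothing, mirroring the calibration the scoring looks for.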

Scoring Criteria

  1. Correct description of privileged: Removes container isolation, enables host access/breakout
  2. Correct description of allowPrivilegeEscalation: Controls no_new_privs flag, affects setuid binaries
  3. Severity calibration: privileged = critical; allowPrivilegeEscalation (APE) = rarely serious. Models that overstate APE’s risk miss the key point.
  4. Technical accuracy: Correct mechanism descriptions, no factual errors
  5. Clear differentiation: The answer should make the gap between the two settings obvious

Results Summary

| Model | Score | Privileged | APE | Severity Gap | no_new_privs | Errors |
|---|---|---|---|---|---|---|
| anthropic/claude-opus-4.7 | 6/10 | Correct | Correct | Overstated | Yes | APE severity inflated |
| minimax/minimax-m2.5 | 8/10 | Correct | Correct | Well calibrated | Yes | Minor (runAsNonRoot default) |
| anthropic/claude-sonnet-4.6 | 7/10 | Correct | Correct | Overstated | Yes | APE severity inflated |
| deepseek/deepseek-v3.2 | 7/10 | Correct | Correct | Reasonable | Yes | None |
| google/gemini-3-flash-preview | 7/10 | Correct | Correct | Overstated | Yes | “Major foothold” language |
| openai/gpt-5.4 | 6/10 | Correct | Correct | Good language | No | Missing no_new_privs |
| minimax/minimax-m2.7 | 7/10 | Correct | Correct | Partial | Not mentioned | Overstates APE |
| qwen/qwen3.6-plus | 6/10 | Correct | Correct | Overstated | Yes | APE “Significant” overstatement |
| deepseek/deepseek-v4-pro | 9/10 | Correct | Correct | Well calibrated | Yes | None |
| deepseek/deepseek-v4-flash | 6/10 | Correct | Correct | Overstated | Yes | APE severity inflated |
| moonshotai/kimi-k2.6 | 7/10 | Correct | Correct | Overstated | Yes | APE severity inflated |
| openai/gpt-5.5 | 9/10 | Correct | Correct | Well calibrated | Yes | None |
| qwen/qwen3.6-35b-a3b (LOCAL) | 8/10 | Correct | Correct | Slight overclaim | Yes (good understanding) | |
| anthropic/claude-opus-4.8 | 9/10 | Correct | Correct | Well calibrated | Yes | None |
| google/gemma-4-31b (LOCAL) | 6/10 | Correct | Correct | Overstated | Yes | APE severity inflated |

Detailed Analysis

anthropic/claude-opus-4.7 — 6/10

Strengths:

  • Correctly identifies privileged as extremely serious — removes container isolation
  • Correctly explains APE controls no_new_privs flag
  • Correct defaults (privileged: false, APE: true)

Weaknesses:

  • Severity calibration wrong — rates APE as “Moderate” severity. Should be “rarely a serious security concern.” Better than Opus 4.6’s “SERIOUS” but still overstated.
  • Framing of APE as “hardening weakness” and “aids in-container privilege escalation” overstates practical impact

Comparison vs Opus 4.6 (6): Same score. “Moderate” is directionally better than “SERIOUS” but still doesn’t capture the correct calibration.

Notable: The Anthropic family continues to overstate APE severity. Opus 4.7’s “Moderate” is closer to correct than Opus 4.6’s “SERIOUS” or Sonnet’s “Moderate-to-serious”, but still misses the “rarely serious” mark that MiniMax M2.5 nailed.


minimax/minimax-m2.5 — 8/10

Strengths:

  • Best severity calibration: “Very high” for privileged, “Moderate” for APE — closest to the scoring notes’ position
  • Correctly states APE “does not by itself give the container host-wide powers” — the key insight
  • Excellent interaction scenarios table showing all three combinations (privileged, non-privileged+escalation, non-privileged+no-escalation)
  • Correctly identifies no_new_privs kernel flag
  • Notes Baseline PSS allows APE while Restricted forces it to false
  • Practical guidance section with defence-in-depth example

Weaknesses:

  • Claims runAsNonRoot: true affects the default of allowPrivilegeEscalation — this is not accurate; the default is true regardless
  • Verbose — could be more concise

Notable: The best answer for this question until later additions scored 9/10. The interaction scenarios table is the clearest way to show how the two settings relate. The severity calibration is the most accurate across all models.
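
The Pod Security Standards point credited above (Baseline allows APE, Restricted requires it to be false) can be sketched as a check over the same two fields. This is a simplified illustration, not the real admission logic: the function name profile_violations and its messages are hypothetical, and the actual profiles check many more fields than these two.

```python
from typing import List


def profile_violations(security_context: dict, profile: str) -> List[str]:
    """Check only `privileged` and `allowPrivilegeEscalation` against a PSS profile.

    Simplified sketch: real Baseline/Restricted profiles cover many other fields.
    """
    if profile not in ("baseline", "restricted"):
        raise ValueError("profile must be 'baseline' or 'restricted'")

    violations = []
    # Both profiles disallow privileged containers.
    if security_context.get("privileged", False):
        violations.append("privileged must not be true")
    # Only Restricted requires allowPrivilegeEscalation to be set explicitly to false.
    if profile == "restricted" and security_context.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be explicitly false")
    return violations
```

Under this sketch an empty securityContext passes Baseline but fails Restricted, which matches the report's framing of APE as a hardening step rather than a baseline requirement.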


anthropic/claude-sonnet-4.6 — 7/10

Strengths:

  • Excellent technical depth on both settings
  • Correctly identifies no_new_privs flag and setuid/setgid mechanism
  • Good comparison table with scope, default, primary risk, and mechanism
  • Useful relationship note: privileged overrides allowPrivilegeEscalation
  • Practical recommended baseline YAML

Weaknesses:

  • Rates APE as “Moderate-to-serious security risk” — overstates the severity
  • Describes an attack chain where APE leads to “may then attempt container escape” — this is misleading. APE controls setuid within the container; gaining root inside a non-privileged container does not provide a direct path to container escape
  • Calls APE a “meaningful hardening step”; while true, this overstates its importance relative to the scoring notes’ “rarely a serious security concern”

Notable negative: The “may then attempt container escape” chain conflates two separate issues. Gaining root inside a non-privileged container via setuid is very different from escaping the container. All container isolation still applies.
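
The relationship note credited above (privileged overrides allowPrivilegeEscalation) can be sketched as a function that computes the effective value. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the rule it encodes is the one given in the Kubernetes SecurityContext field description, namely that allowPrivilegeEscalation is always true when the container is privileged or has CAP_SYS_ADMIN.

```python
def effective_allow_privilege_escalation(security_context: dict) -> bool:
    """Return the value that actually applies, per the documented override rule (sketch)."""
    # A privileged container always has privilege escalation allowed.
    if security_context.get("privileged", False):
        return True
    # So does a container that adds CAP_SYS_ADMIN; both spellings are accepted here
    # for illustration (manifests normally omit the CAP_ prefix).
    added = security_context.get("capabilities", {}).get("add", [])
    if "SYS_ADMIN" in added or "CAP_SYS_ADMIN" in added:
        return True
    # Otherwise the explicit setting applies, defaulting to true when unset.
    return security_context.get("allowPrivilegeEscalation", True)
```

This is why setting allowPrivilegeEscalation: false offers no benefit on a privileged container, while on a non-privileged container it only changes setuid/setgid behaviour inside an otherwise intact isolation boundary.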


deepseek/deepseek-v3.2 — 7/10

Strengths:

  • Good severity calibration: “HIGH RISK” for privileged, “MEDIUM RISK” for APE
  • Correctly describes both mechanisms
  • Notes that privileged overrides APE setting
  • Clean comparison table
  • Practical recommendations

Weaknesses:

  • APE section says “Can bypass certain restrictions even without privileged: true” — vague and could be misleading
  • Less detail than some other responses on the specific mechanisms
  • Doesn’t explicitly state that container isolation remains intact with APE

Notable: Solid, accurate answer with reasonable severity calibration. No significant errors.


google/gemini-3-flash-preview — 7/10

Strengths:

  • Good technical detail on both settings
  • Correctly identifies no_new_privs flag and default values
  • Good comparison table with analogy (building keys vs. picking a lock)
  • Correctly notes privileged forces APE to true
  • Mentions using granular capabilities instead of privileged

Weaknesses:

  • Rates APE as “Moderate to High” and calls it “a major foothold for Privilege Escalation attacks” — this significantly overstates the risk
  • The framing implies APE is a serious vulnerability by default, when in practice the container isolation boundaries still constrain what an attacker can do

Notable negative: Calling APE a “major foothold” is the strongest overstatement across all models. While disabling APE is good practice, the scoring notes are clear that it’s “rarely a serious security concern” because isolation remains intact.


openai/gpt-5.4 — 6/10

Strengths:

  • Best severity language: “less severe than privileged: true. Its impact depends on what binaries, capabilities, and users exist inside the container” — most nuanced acknowledgement that context matters
  • Correctly states APE “does not by itself make the container fully privileged”
  • Clear, accurate summary

Weaknesses:

  • Does not mention the no_new_privs kernel flag — this is the key technical mechanism and a significant omission
  • Very brief — three bullet points per setting with minimal elaboration
  • No YAML examples, no comparison table, no practical recommendations
  • Doesn’t explain the relationship between the two settings

Notable: The severity calibration language is among the best, but the omission of no_new_privs and the extreme brevity limit the score. Understanding what APE actually controls at the kernel level is essential for this question.


minimax/minimax-m2.7 — 7/10

Strengths:

  • Correctly explains fundamental difference (privileged = host-level, APE = process-level)
  • Identifies privileged as critical
  • Good YAML example
  • Clear comparison table

Weaknesses:

  • Overstates allowPrivilegeEscalation severity — calls it “High security risk” when it’s rarely a serious concern (all other container isolation still applies)
  • Doesn’t mention no_new_privs

Notable: Similar to Sonnet and DeepSeek V3.2 (both 7/10). MiniMax M2.5 scored 8/10 on this question with the best severity calibration — MiniMax M2.7 regressed to the majority position.


qwen/qwen3.6-plus — 6/10

Strengths:

  • Correctly identifies privileged as “Extremely Serious (Critical)” — removes container-host isolation
  • Correctly identifies no_new_privs kernel flag and explains its mechanism (setuid/setgid, capability inheritance)
  • Good comparison table covering scope, mechanism, default values, and impact
  • Correct defaults: privileged=false, APE=true
  • Good PSS context: Baseline allows APE, Restricted requires APE=false
  • Correctly notes privileged overrides APE benefit

Weaknesses:

  • APE severity overstated: Rates it as “Significant (but narrower scope)” — the scoring notes say “rarely a serious security concern.” While the “narrower scope” qualifier is correct, calling it “Significant” is still too strong.
  • States APE “could potentially escalate privileges within the container” and describes it as a “critical defense-in-depth control” — this framing overstates its practical importance
  • The attack chain linking APE to “other vulnerabilities (e.g., container breakout, misconfigured mounts)” conflates container-internal privilege escalation with container escape, which are separate issues

Notable: Follows the same pattern as the Anthropic-family and other frontier models: correctly explains both mechanisms but overstates APE severity. The “Significant” rating is better than Gemini 3 Flash’s “major foothold” or Sonnet’s “Moderate-to-serious” but still doesn’t reach MiniMax M2.5’s best-calibrated answer.


deepseek/deepseek-v4-pro — 9/10

Strengths:

  • Excellent and detailed explanation of both settings
  • Clear on privileged removing isolation as the critical security concern
  • Correct explanation of the no_new_privs flag for APE
  • Best severity calibration after MiniMax M2.5 — correctly rates APE as moderate/important hardening rather than overstating it

Weaknesses:

  • None significant

Notable: Takes the top score on this question, beating MiniMax M2.5’s previous best of 8/10. The severity gap between privileged (critical) and APE (moderate hardening) is correctly conveyed without the overstatement that most other models exhibit. A remarkable improvement over DeepSeek V3.2 (7/10).


deepseek/deepseek-v4-flash — 6/10

Strengths:

  • Correctly describes privileged as removing container isolation and enabling host breakout
  • Correctly explains allowPrivilegeEscalation controls the no_new_privs kernel flag
  • Good technical understanding of both mechanisms

Weaknesses:

  • Overstates allowPrivilegeEscalation severity — treats APE as a significant security concern when the correct calibration is “rarely a serious security concern” since all container isolation remains intact
  • Framing implies APE is a critical setting when in practice its impact is limited compared to privileged

Notable: Scores the same as Opus 4.7 and Qwen 3.6 Plus (6/10) — all three overstate APE severity. A significant regression from V4 Pro (9/10, which had the second-best severity calibration after MiniMax M2.5). The DeepSeek V4 models diverge sharply on this question: Pro understands the severity gap correctly, Flash does not.


openai/gpt-5.5 — 9/10

Strengths:

  • Excellent distinction between the two settings with clear, well-structured explanation
  • Correctly explains privileged removes security and isolation: “broad access to the host,” “most or all Linux capabilities,” “access to host devices,” “reduced isolation”
  • Accurate explanation of allowPrivilegeEscalation controlling the no_new_privs flag and affecting setuid/setgid binaries
  • Well-calibrated severity assessment: “Very serious” for privileged, “Security concern, but narrower” for APE — correctly conveys the gap without overstating APE
  • Good summary table clearly showing the severity difference
  • Correctly notes that privileged: true overrides allowPrivilegeEscalation: false
  • Practical YAML examples throughout

Weaknesses:

  • Could note more explicitly that APE is “rarely a serious concern” per the scoring criteria — “narrower concern” is close but slightly softer than the ideal calibration
  • Does not mention the Pod Security Standards context (Baseline allows APE, Restricted requires false)

Notable: A significant improvement over GPT 5.4 (6/10), which had good severity language but missed the no_new_privs mechanism entirely. GPT 5.5 covers both the technical mechanism and the severity gap correctly. Joins DeepSeek V4 Pro at 9/10, making these two the best-calibrated models on this question after MiniMax M2.5 (8/10).


moonshotai/kimi-k2.6 — 7/10

Strengths:

  • Correctly describes privileged as removing container isolation and enabling host breakout
  • Good explanation of allowPrivilegeEscalation controlling the no_new_privs kernel flag
  • Correct defaults for both settings

Weaknesses:

  • Overstates APE severity — treats allowPrivilegeEscalation as a more significant security concern than it is. The correct calibration is “rarely a serious security concern” since all container isolation remains intact regardless of APE setting.

Notable: Matches Sonnet, DeepSeek V3.2, Gemini 3 Flash, and MiniMax M2.7 at 7/10. Follows the common pattern of correctly explaining both mechanisms but overstating APE severity. Only DeepSeek V4 Pro (9/10), GPT 5.5 (9/10), and MiniMax M2.5 (8/10) correctly calibrated the severity gap.


qwen/qwen3.6-35b-a3b (LOCAL) — 8/10

Strengths:

  • Correctly explains privileged as critical severity — all capabilities, host escape, disables seccomp/AppArmor
  • Correctly identifies the no_new_privs flag mechanism for allowPrivilegeEscalation
  • Good understanding of the fundamental difference between the two settings
  • Clear distinction between host-level access (privileged) and process-level control (APE)

Weaknesses:

  • Slight overclaim on APE severity — rates it as “Moderate/Important” when the correct calibration is “rarely a serious concern.” Container isolation remains intact regardless of APE setting.

Notable: Matches MiniMax M2.5 at 8/10 with good severity calibration — closer to correct than most models that rate APE as “High” or “Major.” The no_new_privs identification is strong. Only DeepSeek V4 Pro (9/10) and GPT 5.5 (9/10) scored higher with better severity gap calibration.


google/gemma-4-31b (LOCAL) — 6/10

Strengths:

  • Correctly explains privileged as a critical security concern — grants all capabilities, enables host breakout
  • Correctly identifies the no_new_privs flag mechanism for allowPrivilegeEscalation
  • Good basic description of both settings’ mechanisms

Weaknesses:

  • Overstates allowPrivilegeEscalation severity — rates APE as a significant security concern when the correct calibration is “rarely a serious concern” since all container isolation remains intact regardless of the APE setting
  • Framing implies APE is a high-impact setting comparable to privileged, missing the important severity gap

Notable: Falls into the same majority pattern as Opus 4.7, Sonnet, Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Flash (all at 6/10 or 7/10): correct mechanism identification but incorrect severity calibration. The no_new_privs identification is the only standout element.


anthropic/claude-opus-4.8 — 9/10

Strengths:

  • Excellent differentiation between the two settings
  • Correctly identifies privileged as critical — host compromise, removes all isolation
  • Correctly explains APE controls the no_new_privs kernel flag
  • Well-calibrated severity: privileged = critical, APE = moderate hardening via no_new_privs (correctly conveys that APE is not a serious concern since container isolation remains intact)
  • Correctly notes that privileged overrides APE

Weaknesses:

  • Minor overstatement of APE in some phrasing, but the overall severity gap is correctly conveyed

Comparison vs Opus 4.7 (6): Dramatic improvement. Opus 4.7 rated APE as “Moderate” which was still too high. Opus 4.8 correctly conveys that APE is rarely a serious concern, joining GPT 5.5 and V4 Pro at 9/10.

Notable: Breaks the Anthropic family pattern of overstating APE severity. Opus 4.6 (6/10, “SERIOUS”), Opus 4.7 (6/10, “Moderate”), Opus 4.8 (9/10, well-calibrated) — a clear trajectory of improvement. Joins GPT 5.5 and DeepSeek V4 Pro at the top of the leaderboard for this question.


Key Findings

  1. Severity calibration is the key discriminator: The scoring notes emphasise that allowPrivilegeEscalation is “rarely a serious security concern” because container isolation remains intact. Most models overstated APE’s severity — MiniMax M2.5 came closest to the correct calibration.

  2. MiniMax M2.5’s strongest quiz performance: After consistently lower scores on other quizzes, MiniMax M2.5 produced the best answer here until later additions scored 9/10: well-calibrated severity, correct technical details, and an excellent interaction scenarios table.

  3. Claude Sonnet 4.6 and Gemini 3 Flash overstated APE severity: Both described APE as leading to further attacks (container escape in Sonnet’s case, “major foothold” in Gemini 3 Flash’s). This misses the scoring notes’ point that all container isolation remains intact regardless of APE.

  4. GPT 5.4 had the best severity language but the weakest technical depth: Saying “its impact depends on what binaries, capabilities, and users exist” is the most accurate framing, but not mentioning no_new_privs at all is a notable gap.

  5. All models correctly described privileged: The host-breakout risk of privileged containers is well understood across all models. The differentiation comes entirely from how they handle allowPrivilegeEscalation.



Dearbhadh — LLM Kubernetes Security Assessment Tool
