AI.42 — Prompt Injection Surface Mapping

Structural Problem

Language models process user instructions and system policies through the same input channel — the prompt. The structural problem is that the boundary between instruction space (what the user wants the model to do) and policy space (what the model is constrained not to do) is not architecturally enforced but relies on the model's learned representation. Prompt injection attacks exploit the structural weakness of this boundary, crafting inputs that cause the model to treat policy-violating instructions as legitimate user requests.

This is a structural coupling problem: because instructions and policies share the same input representation, there exist coupling paths between them that attackers can exploit to project policy-violating content into the model's instruction-processing space.

System Context

This application addresses LLM security at the prompt processing layer, spanning user-facing applications, system prompts, safety filters, and the model's internal processing of instructions versus constraints. The relevant system boundary includes the prompt construction pipeline, the model's instruction-following mechanism, the policy enforcement layer, and the attack surface between them.

Diagnostic Capability

Injection surface mapping identifying the structural boundaries between instruction and policy processing where attacks are feasible
Vulnerability surface characterization describing the geometry of the jailbreak attack surface for specific model and prompt configurations
Policy enforcement robustness assessment evaluating how well the current prompt architecture resists injection attacks
Hardening guidance providing structural recommendations for reducing the injection surface area

Typical Failure Modes

Direct injection where crafted input text overrides system-level policy constraints through structural boundary crossing
Indirect injection where content from external sources (retrieved documents, tool outputs) contains instructions that the model processes as legitimate
Gradual erosion where a sequence of individually benign inputs progressively shifts the model away from policy compliance

Example Use Cases

Security assessment: Structural mapping of injection surfaces for deployed LLM applications before production launch
Prompt architecture design: Structural guidance for building prompt pipelines that minimize the instruction-policy boundary vulnerability
Red team preparation: Providing structural maps of the attack surface to guide red team evaluation efforts

Strategic Relevance

Prompt injection is the dominant security vulnerability in LLM-based applications. Structural mapping of the injection surface provides a systematic foundation for security hardening that goes beyond pattern-matching defenses, enabling architecturally grounded protection of AI applications.

SORT Structural Lens

The SORT framework addresses this application through four structural dimensions, each providing a distinct analytical layer.

V1 — Observed Phenomenon

Prompt injections bypass policy constraints.

V2 — Structural Cause

Structural boundary between instruction and policy space.

V3 — SORT Effect Space

Mapping of jailbreak attack surface.

V4 — Decision Space

Prompt design, policy enforcement, security hardening.

← Back to Application Catalog