XML Injection (aka Blind XPath Injection)

Status: Draft
Abstraction: Base
Structure: Simple
Description

XML Injection occurs when an application fails to properly validate or escape user-controlled input before including it in XML documents or queries. This allows attackers to inject malicious XML elements or syntax, potentially altering the document's structure, extracting sensitive data, or disrupting processing logic.
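
To make the description concrete, here is a minimal sketch of how unescaped input can alter document structure. The record layout, the function name `build_user_record_unsafe`, and the payload are hypothetical and chosen only for illustration; they are not taken from any particular application.

```python
import xml.etree.ElementTree as ET


def build_user_record_unsafe(username: str) -> str:
    # VULNERABLE (illustrative): user input is concatenated into markup
    # without any escaping or validation.
    return f"<user><name>{username}</name><role>guest</role></user>"


# Hypothetical attacker input: closes <name>, injects a <role>, reopens <name>.
payload = "alice</name><role>admin</role><name>x"

doc = build_user_record_unsafe(payload)
root = ET.fromstring(doc)

# The parsed tree now contains elements the application never intended.
roles = [r.text for r in root.findall("role")]
names = [n.text for n in root.findall("name")]

# Code that reads only the first <role> sees the attacker's value.
first_role = root.find("role").text
```

In this sketch the document still parses as well-formed XML, which is why the injection can go unnoticed: the parser has no way to tell the injected `<role>` element from the one the application wrote.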

Extended Description

XML documents use special characters like <, >, &, and " to define elements and attributes. If user input containing these characters is inserted without neutralization, an attacker can break out of intended data fields and inject new XML tags, modify queries (like XPath), or even reference external entities. This can lead to data theft, logic bypasses, or denial of service. Preventing XML injection requires strict input validation, context-aware output encoding, and the use of parameterized XPath interfaces.
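
As a rough illustration of context-aware output encoding, the sketch below neutralizes special characters in element text in two ways using the Python standard library: escaping explicitly with `xml.sax.saxutils.escape`, and letting `xml.etree.ElementTree` serialize the text. The record layout, function names, and payload are assumptions made for this example. It covers element text only; attribute values need attribute-aware quoting (for example `xml.sax.saxutils.quoteattr`), and XPath queries are a separate concern not shown here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_user_record_escaped(username: str) -> str:
    # escape() replaces &, <, and > with entity references, so the input
    # can no longer close or open elements.
    return f"<user><name>{escape(username)}</name><role>guest</role></user>"


def build_user_record_tree(username: str) -> str:
    # Building a tree and serializing it lets the library encode the text,
    # so no manual escaping is needed.
    user = ET.Element("user")
    ET.SubElement(user, "name").text = username
    ET.SubElement(user, "role").text = "guest"
    return ET.tostring(user, encoding="unicode")


# Hypothetical attacker input that tries to inject a <role> element.
payload = "alice</name><role>admin</role><name>x"

results = []
for build in (build_user_record_escaped, build_user_record_tree):
    root = ET.fromstring(build(payload))
    # Collect the parsed name and all role values for inspection.
    results.append((root.find("name").text, [r.text for r in root.findall("role")]))
```

With either approach the payload should come back from the parser as plain text inside `<name>`, leaving a single `<role>` element with the value the application set.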

Common Consequences
Scope: Confidentiality, Integrity, Availability

Impact: Execute Unauthorized Code or Commands; Read Application Data; Modify Application Data

Detection Methods
Automated Static Analysis
Effectiveness: High
Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without having to execute it. Typically, this is done by building a model of data flow and control flow, then searching for potentially-vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.)
Potential Mitigations
Phase: Implementation

Strategy: Input Validation

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
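
The "accept known good" strategy above might look like the following minimal sketch. The allowed color set mirrors the "red"/"blue" example in the text; the username pattern, the length limit, and the function names are assumptions chosen for illustration, and a real specification would define its own rules.

```python
import re

# Business rule from the text: only known colors are acceptable.
ALLOWED_COLORS = frozenset({"red", "blue"})

# Assumed syntax rule: 1 to 32 ASCII letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_color(value: str) -> str:
    # "boat" is alphanumeric but fails here because it is not a known color.
    if value not in ALLOWED_COLORS:
        raise ValueError(f"unexpected color: {value!r}")
    return value


def validate_username(value: str) -> str:
    # fullmatch() requires the whole string to conform, not just a prefix.
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("username does not match the expected format")
    return value
```

Because the checks describe what is allowed rather than what is forbidden, input containing `<`, `>`, `&`, or quotes is rejected without the validator needing to know anything about XML.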
References
[REF-882] Amit Klein. "Blind XPath Injection". 2004-05-19.
[REF-62] Mark Dowd, John McDonald, and Justin Schuh. "The Art of Software Security Assessment". Addison Wesley. 2006.
Applicable Platforms
Languages: Not Language-Specific (Prevalence: Undetermined)
Modes of Introduction
Implementation
Taxonomy Mapping
  • PLOVER
  • OWASP Top Ten 2007
  • OWASP Top Ten 2004
  • WASC
  • Software Fault Patterns
Notes
Maintenance: The description for this entry is generally applicable to XML, but the name includes "blind XPath injection" which is more closely associated with Improper Neutralization of Data within XPath Expressions ('XPath Injection'). Therefore this entry might need to be deprecated or converted to a general category - although injection into raw XML is not covered by Improper Neutralization of Data within XPath Expressions ('XPath Injection') or Improper Neutralization of Data within XQuery Expressions ('XQuery Injection').
Theoretical: In vulnerability theory terms, this is a representation-specific case of a Data/Directive Boundary Error.
Research Gap: Under-reported. This is likely found regularly by third-party code auditors, but there are very few publicly reported examples.