Improper Neutralization of Invalid Characters in Identifiers in Web Pages

Draft Variant
Structure: Simple
Description

This vulnerability occurs when an application fails to properly filter or escape invalid characters within web identifiers like HTML tag names or URI schemes, allowing malicious sequences to pass through.

Extended Description

Modern web browsers often interpret malformed identifiers in unexpected ways. For instance, a browser might silently strip out null bytes or other invalid characters, reconstructing a dangerous payload that the developer's filters missed. This creates a gap where your security checks see a harmless string, but the browser executes it as active code. Attackers exploit this by embedding these invalid sequences—like null bytes or alternative encodings—within identifiers such as URI schemes. A common bypass involves encoding "javascript:" as something like `java%00script:`, which might bypass a simple blacklist filter but still render as executable JavaScript in the browser, leading to cross-site scripting (XSS) or other client-side attacks.

Common Consequences 1
Scope: ConfidentialityIntegrityAvailability

Impact: Read Application DataExecute Unauthorized Code or Commands

Detection Methods 1
Automated Static AnalysisHigh
Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without having to execute it. Typically, this is done by building a model of data flow and control flow, then searching for potentially-vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.)
Potential Mitigations 2
Phase: Implementation

Strategy: Output Encoding

Use and specify an output encoding that can be handled by the downstream component that is reading the output. Common encodings include ISO-8859-1, UTF-7, and UTF-8. When an encoding is not specified, a downstream component may choose a different encoding, either by assuming a default encoding or automatically inferring which encoding is being used, which can be erroneous. When the encodings are inconsistent, the downstream component might treat some character or byte sequences as special, even if they are not special in the original encoding. Attackers might then be able to exploit this discrepancy and conduct injection attacks; they even might be able to bypass protection mechanisms that assume the original encoding is also being used by the downstream component. The problem of inconsistent output encodings often arises in web pages. If an encoding is not specified in an HTTP header, web browsers often guess about which encoding is being used. This can open up the browser to subtle XSS attacks.
Phase: Implementation

Strategy: Attack Surface Reduction

To help mitigate XSS attacks against the user's session cookie, set the session cookie to be HttpOnly. In browsers that support the HttpOnly feature (such as more recent versions of Internet Explorer and Firefox), this attribute can prevent the user's session cookie from being accessible to malicious client-side scripts that use document.cookie. This is not a complete solution, since HttpOnly is not supported by all browsers. More importantly, XMLHTTPRequest and other powerful browser technologies provide read access to HTTP headers, including the Set-Cookie header in which the HttpOnly flag is set.

Effectiveness: Defense in Depth

Observed Examples 1
CVE-2004-0595XSS filter doesn't filter null characters before looking for dangerous tags, which are ignored by web browsers. Multiple Interpretation Error (MIE) and validate-before-cleanse.
Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Modes of Introduction
Implementation
Taxonomy Mapping
  • PLOVER
  • Software Fault Patterns