Inappropriate Encoding for Output Context

Status: Incomplete
Abstraction: Base
Structure: Simple
Description

This vulnerability occurs when a system uses one type of encoding for its output, but the component receiving that data expects a different encoding. The mismatch causes the downstream component to interpret the data incorrectly.

Extended Description

When the wrong encoding is applied, even one that closely resembles the correct one, the receiving component may decode characters into unexpected control commands or special elements. This breaks the intended separation between data and executable instructions and can let injection attacks bypass protections such as input validation. The issue is common in web security (for example, HTML entity encoding applied to data that lands in a JavaScript context, where it is ineffective), but it can affect any system in which data passes between components that use different encoding rules. The core problem is not a lack of encoding; it is encoding that does not match the context in which the data will be interpreted.
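The JavaScript case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `greet(...)` handler are hypothetical, and `html.unescape` stands in for the HTML parser, which decodes entities in an attribute value before the script engine sees it.

```python
import html
import json


def handler_html_only(name: str) -> str:
    """HTML-entity encoding alone: wrong for the nested JavaScript string."""
    return "greet('" + html.escape(name) + "')"


def handler_js_then_html(name: str) -> str:
    """Encode for the innermost context (JS string) first, then for HTML."""
    return html.escape("greet(" + json.dumps(name) + ")")


def browser_decodes_attribute(attr_value: str) -> str:
    """Stand-in for the HTML parser decoding an onclick="..." attribute value."""
    return html.unescape(attr_value)


payload = "'); alert(1); //"

# The entity for the quote is decoded back into a real quote, so the
# payload closes the string and adds a second statement.
print(browser_decodes_attribute(handler_html_only(payload)))
# greet(''); alert(1); //')

# With both layers encoded, the payload stays inside one string literal.
print(browser_decodes_attribute(handler_js_then_html(payload)))
# greet("'); alert(1); //")
```

The first handler does apply an encoding, and the encoded text looks harmless in the page source; it fails only because the value is interpreted a second time by a component with different rules.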

Common Consequences

Scope: Integrity, Confidentiality, Availability

Impact: Modify Application Data; Execute Unauthorized Code or Commands

An attacker could modify the structure of the message or data being sent to the downstream component, possibly injecting commands.

Detection Methods

Automated Static Analysis
Effectiveness: High

Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without executing it. Typically, this is done by building a model of data flow and control flow, then searching for potentially vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.).
Potential Mitigations
Phase: Implementation

Strategy: Output Encoding

Use context-aware encoding: determine which encoding the downstream component expects and make sure the output uses it. If the encoding can be specified explicitly, specify it rather than assuming that your default matches the default assumed by the downstream component.
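A minimal Python sketch of stating the encoding explicitly, followed by what can happen when the receiver has to guess. The helper name `build_html_response` and the sample strings are hypothetical, and the UTF-7 guess models legacy consumers that sniffed that charset; it is not a claim about any particular product.

```python
import html


def build_html_response(body: str) -> tuple[dict[str, str], bytes]:
    """Encode the body as UTF-8 and declare that choice in the header."""
    charset = "utf-8"
    headers = {"Content-Type": f"text/html; charset={charset}"}
    return headers, body.encode(charset)


headers, body_bytes = build_html_response(
    "<p>" + html.escape("caf\u00e9 & bar") + "</p>"
)
print(headers["Content-Type"])  # text/html; charset=utf-8

# Without a declared charset the receiver may guess. This input contains
# nothing that HTML entity encoding changes ...
ambiguous = html.escape("+ADw-script+AD4-")

# ... but a consumer that assumes UTF-7 reads the same bytes as markup.
misread = ambiguous.encode("ascii").decode("utf-7")
print(misread)  # <script>
```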
Phase: Architecture and Design

Strategy: Output Encoding

Where possible, use communications protocols or data formats that provide strict boundaries between control and data. If this is not feasible, ensure that the protocols or formats allow the communicating components to explicitly state which encoding/decoding method is being used. Some template frameworks provide built-in support.
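One familiar instance of a strict boundary between control and data is a parameterized database query, sketched here with Python's built-in `sqlite3` module. The table, column, and function names are made up for the illustration; the point is only that the value travels separately from the statement text, so no encoding decision has to be made at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))


def find_user(connection: sqlite3.Connection, name: str) -> list:
    # The ? placeholder binds `name` as a value; it is never parsed as SQL.
    cursor = connection.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


print(find_user(conn, "alice"))             # [('alice',)]
print(find_user(conn, "alice' OR '1'='1"))  # []
```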
Phase: Architecture and Design

Strategy: Libraries or Frameworks

Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, consider using the ESAPI Encoding control [REF-45] or a similar tool, library, or framework. These will help the programmer encode outputs in a manner less prone to error. Note that some template mechanisms provide built-in support for the appropriate encoding.
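As a rough illustration of the "similar tool" idea, the sketch below routes every output through a single helper that picks a vetted standard-library encoder by output context. The helper `encode_for` and its context names are hypothetical and are not modeled on any specific library's API.

```python
import html
import json
from urllib.parse import quote

# Map each supported output context to an existing, well-tested encoder.
_ENCODERS = {
    "html": html.escape,                          # element text and quoted attributes
    "url_component": lambda v: quote(v, safe=""),  # one path or query component
    "json_string": json.dumps,                     # quoted JSON string literal
}


def encode_for(context: str, value: str) -> str:
    """Encode `value` for the named context; refuse unknown contexts."""
    try:
        encoder = _ENCODERS[context]
    except KeyError:
        raise ValueError(f"no encoder registered for context {context!r}")
    return encoder(value)


value = 'a&b "c" <d>'
for ctx in ("html", "url_component", "json_string"):
    print(ctx, "->", encode_for(ctx, value))
```

Failing loudly on an unknown context is a deliberate choice in this sketch: silently falling back to a default encoder would reintroduce the mismatch the mitigation is meant to prevent.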
Demonstrative Examples
This code dynamically builds an HTML page using POST data:

Code Example (Bad, PHP):

...
The programmer attempts to avoid XSS exploits (Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')) by encoding the POST values so that they will not be interpreted as valid HTML. However, the htmlentities() encoding is not appropriate when the data is used as an HTML attribute value, so an attacker can inject additional attributes.
For example, an attacker can set picAltText to:

Code Example (Attack):
This will result in the generated HTML image tag:

Code Example (Result, HTML):
The attacker can inject arbitrary JavaScript into the tag because the encoding does not match the attribute context.
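The same class of mistake can be reproduced in Python. This is an analogue written for illustration, not a reconstruction of the PHP sample: the function names, the image path, and the payload are hypothetical. An encoder that only handles element text (`quote=False`) is applied to a single-quoted attribute, and the standard-library HTML parser is used to show which attributes result.

```python
import html
from html.parser import HTMLParser


def img_tag_text_encoding(src: str, alt: str) -> str:
    # quote=False escapes only &, <, >: enough for element text,
    # but not for values placed inside quoted attributes.
    return ("<img src='" + html.escape(src, quote=False)
            + "' alt='" + html.escape(alt, quote=False) + "' />")


def img_tag_attr_encoding(src: str, alt: str) -> str:
    # The default (quote=True) also escapes " and ', so the value
    # cannot terminate the attribute it is placed in.
    return ("<img src='" + html.escape(src)
            + "' alt='" + html.escape(alt) + "' />")


class AttrCollector(HTMLParser):
    """Collect the attribute names of every start tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _ in attrs)


def attribute_names(markup: str) -> list[str]:
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.names


payload = "x' onload='alert(1)"
print(attribute_names(img_tag_text_encoding("pic.jpg", payload)))
# ['src', 'alt', 'onload']
print(attribute_names(img_tag_attr_encoding("pic.jpg", payload)))
# ['src', 'alt']
```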
Observed Examples
CVE-2009-2814: Server does not properly handle requests that do not contain UTF-8 data; browser assumes UTF-8, allowing XSS.
References
[REF-786] Jim Manico. "Injection-safe templating languages". 2010-06-30.
[REF-787] Dinis Cruz. "Can we please stop saying that XSS is boring and easy to fix!". 2010-09-25.
[REF-788] Ivan Ristic. "Canoe: XSS prevention via context-aware output encoding". 2010-09-24.
[REF-789] Jim Manico. "What is the Future of Automated XSS Defense Tools?". 2011-03-08.
[REF-709] Jeremiah Grossman, Robert "RSnake" Hansen, Petko "pdp" D. Petkov, Anton Rager, and Seth Fogie. "XSS Attacks". Syngress. 2007.
[REF-725] OWASP. "DOM based XSS Prevention Cheat Sheet".
[REF-45] OWASP. "OWASP Enterprise Security API (ESAPI) Project".
Applicable Platforms
Languages: Not Language-Specific (Prevalence: Undetermined)
Related Attack Patterns
Taxonomy Mapping
  • The CERT Oracle Secure Coding Standard for Java (2011)