Improper Validation of Syntactic Correctness of Input

Status: Incomplete
Abstraction: Base
Structure: Simple
Description

This vulnerability occurs when software expects input in a specific, well-structured format but fails to properly check that the incoming data actually follows those rules.

Extended Description

Modern applications often rely on structured data formats such as JSON, XML, or YAML, and sometimes on code snippets. These formats have strict grammatical rules (syntax) that parsers depend on to interpret the data. If you do not verify that untrusted input adheres to the expected syntax, an attacker decides what your parser has to handle. They can send malformed data designed to crash the parser, trigger obscure error messages that leak system information, or exploit latent bugs in the parsing logic itself.

Input validation is the first line of defense. Do not assume data is well-formed: verify its syntactic correctness before any further processing begins. Use established, security-hardened parsers with strict mode enabled, and define a precise schema for every expected input. This keeps attackers from manipulating the parsing stage to cause denial of service, disclose information, or open the way to more severe injection attacks.
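As a minimal sketch of the advice above, assuming the input is XML and the JDK's built-in JAXP parser is in use, the following gate parses untrusted text with hardened settings and reports every syntax failure with one generic message instead of the parser's own diagnostics. The names StrictXmlGate and parseOrReject are hypothetical. The disallow-doctype-decl feature URI is the one recognized by the Xerces-based parser bundled with the JDK; other parser implementations may not support it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks well-formedness before any other processing.
final class StrictXmlGate {

    /** Returns the parsed document, or throws IllegalArgumentException with a generic message. */
    static Document parseOrReject(String untrustedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Apply the implementation's processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Assumption: JDK built-in parser. Rejects any DOCTYPE declaration.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);

            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn every reported error into an exception instead of printing it.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(
                    new ByteArrayInputStream(untrustedXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            // Generic message: parser details are deliberately not echoed to the caller.
            throw new IllegalArgumentException("Input is not well-formed XML");
        }
    }
}
```

Calling StrictXmlGate.parseOrReject("<a><b/></a>") returns a document, while an unbalanced input such as "<a><b></a>" is rejected with the generic exception. Detailed parser errors can still be written to a server-side log if they are needed for diagnosis.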

Common Consequences 1
Scope: Other

Impact: Varies by Context

Potential Mitigations 1
Phase: Implementation

Strategy: Input Validation

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.

When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue."

Do not rely exclusively on looking for malicious or malformed inputs (a denylist). A denylist is likely to miss at least one undesirable input, especially if the code's environment changes, and that gap can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
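The "boat" versus "red" distinction above can be sketched as two separate checks: one for syntax (length and character set) and one for the business rule (membership in a list of known-good values). This is an illustrative sketch only; the class name ColorInput, the 16-character limit, and the allowed set are assumptions chosen to match the example in the text.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical validator for a field that must contain a color name.
final class ColorInput {

    // Syntax rule (assumed): 1 to 16 ASCII alphanumeric characters.
    private static final Pattern SYNTAX = Pattern.compile("[A-Za-z0-9]{1,16}");

    // Business rule (assumed): only these values are acceptable.
    private static final Set<String> ALLOWED = Set.of("red", "blue");

    /** True if the input has the expected shape, regardless of its meaning. */
    static boolean isSyntacticallyValid(String input) {
        return input != null && SYNTAX.matcher(input).matches();
    }

    /** True only if the input is well-formed AND is one of the known-good values. */
    static boolean isAcceptable(String input) {
        return isSyntacticallyValid(input) && ALLOWED.contains(input);
    }
}
```

With these rules, "boat" passes isSyntacticallyValid but fails isAcceptable, "red" passes both, and "red;" fails the syntax check before the allowlist is consulted.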

Effectiveness: High

Demonstrative Examples 1
The following code loads and parses an XML file.

Code Example:

Bad
Java

```java
// Read DOM
try {
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating( false );
    ....
    c_dom = factory.newDocumentBuilder().parse( xmlFile );
} catch(Exception ex) {
    ...
}
```

The XML file is loaded without validating it against a known XML Schema or DTD.
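One possible way to address this, sketched here under assumptions rather than taken from the original entry, is to validate the document against a known XML Schema with the standard javax.xml.validation API before building the DOM. The schema, the class name SchemaCheckedXml, and the element names order and quantity are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical helper: accepts XML only if it conforms to a fixed schema.
final class SchemaCheckedXml {

    // Assumed schema: <order> containing exactly one positive-integer <quantity>.
    private static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "  <xs:element name=\"order\">"
        + "    <xs:complexType>"
        + "      <xs:sequence>"
        + "        <xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
        + "      </xs:sequence>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    /** True if the XML is well-formed and valid against the schema above. */
    static boolean conforms(String untrustedXml) {
        try {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Do not fetch external DTDs or schemas while compiling the schema.
            schemaFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            schemaFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

            Validator validator = schema.newValidator();
            // Same restriction while validating the untrusted instance document.
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

            // With no ErrorHandler set, validation errors are thrown as exceptions.
            validator.validate(new StreamSource(new StringReader(untrustedXml)));
            return true;
        } catch (Exception ex) {
            return false;
        }
    }
}
```

Only input for which conforms returns true would then be handed to the DOM parser. In real code the schema would normally be loaded once from a trusted resource and the compiled Schema object reused, since it is thread-safe while Validator instances are not.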
Observed Examples 2
CVE-2016-4029: Chain: incorrect validation of intended decimal-based IP address format (Improper Validation of Syntactic Correctness of Input) enables parsing of octal or hexadecimal formats (Incorrect Parsing of Numbers with Different Radices), allowing bypass of an SSRF protection mechanism (Server-Side Request Forgery (SSRF)).
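To illustrate the class of check whose absence this chain describes, the following sketch accepts only dotted-decimal IPv4 addresses and rejects forms that some resolvers would interpret as octal or hexadecimal. It is not the fix applied to the affected product; the class name StrictIpv4 and the decision to reject every leading zero are assumptions made for this example.

```java
// Hypothetical validator: accepts only four decimal octets separated by dots.
final class StrictIpv4 {

    static boolean isStrictDecimalIpv4(String input) {
        if (input == null) {
            return false;
        }
        // Limit -1 keeps trailing empty parts, so "1.2.3." yields four parts, the last empty.
        String[] parts = input.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                // ASCII digits only: rejects "0x7f", signs, spaces, and non-ASCII digits.
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            // A leading zero (e.g. "0177") could be read as octal elsewhere, so reject it.
            if (part.length() > 1 && part.charAt(0) == '0') {
                return false;
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }
}
```

Under these rules "127.0.0.1" is accepted, while "0177.0.0.1", "0x7f.0.0.1", and the single-integer form "2130706433" are all rejected before any allow or deny decision about the destination is made.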
CVE-2007-5893: HTTP request with missing protocol version number leads to crash
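A crash of this kind can often be avoided by checking the request line against its expected grammar before any field is extracted. The sketch below is a deliberately simplified illustration, not the affected product's code: the class name RequestLine, the length limits, and the restriction to uppercase methods and single-digit version numbers are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical pre-parse check for an HTTP/1.x request line.
final class RequestLine {

    // Simplified grammar (assumed): METHOD SP request-target SP "HTTP/" DIGIT "." DIGIT
    //   METHOD         : 1 to 16 uppercase ASCII letters
    //   request-target : 1 to 2048 visible ASCII characters (no spaces)
    private static final Pattern GRAMMAR =
            Pattern.compile("[A-Z]{1,16} [!-~]{1,2048} HTTP/[0-9]\\.[0-9]");

    /** True only if all three parts are present and correctly delimited. */
    static boolean isWellFormed(String line) {
        return line != null && GRAMMAR.matcher(line).matches();
    }
}
```

Here "GET /index.html HTTP/1.1" is accepted, whereas "GET /index.html" and "GET /index.html HTTP/" are rejected up front, so later code never has to handle a missing version field.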
Applicable Platforms
Languages:
Not Language-Specific: Often
Modes of Introduction
Implementation
Related Weaknesses
Notes
Maintenance: This entry is still under development and will continue to see updates and content improvements.