This vulnerability occurs when software incorrectly transforms data between different formats, leading to corrupted or misinterpreted information that can break functionality or create security gaps.
Encoding errors happen when a system fails to convert data correctly from one character set or format to another, such as between UTF-8, ASCII, or URL encoding. They typically surface when handling user input, transferring data across systems, or preparing information for display. The result is 'mojibake' (garbled text), lost data, or unexpected characters that can crash applications, corrupt stored data, or bypass validation checks, for example when a filter inspects the encoded form of a value that is decoded only later. For developers, the core mistake is assuming data will always arrive in a single, expected format. To prevent this, explicitly declare and validate character encodings at every system boundary: when reading input, sending output, and storing data. Use standardized, well-tested libraries for all encoding and decoding operations instead of custom logic, decode input to its canonical form before validating it, and apply the same transformations consistently across the entire application stack to maintain data integrity.
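The following is a minimal Python sketch of the boundary handling described above, not a definitive implementation. It assumes the input arrives as raw bytes that are supposed to be UTF-8 and that the value will be used as a single path component. The helper names `decode_utf8_strict` and `decode_path_component` are hypothetical and chosen only for illustration; the standard-library calls used are `bytes.decode` and `urllib.parse.unquote`.

```python
from urllib.parse import unquote


def decode_utf8_strict(raw: bytes) -> str:
    """Decode bytes as UTF-8 and reject malformed input instead of guessing."""
    try:
        # errors="strict" raises on invalid byte sequences rather than
        # silently replacing or dropping them.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid UTF-8: {exc}") from exc


def decode_path_component(raw: bytes) -> str:
    """Hypothetical helper: decode fully first, then validate the decoded form."""
    text = decode_utf8_strict(raw)
    try:
        # Percent-decode with the same explicit encoding and strict errors.
        decoded = unquote(text, encoding="utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"percent-encoded bytes are not valid UTF-8: {exc}") from exc

    # Validation runs on the decoded value, so "%2e%2e%2f" cannot slip past
    # a check that only looks for a literal "../".
    if (
        "/" in decoded
        or "\\" in decoded
        or "\x00" in decoded
        or decoded in {".", ".."}
    ):
        raise ValueError("decoded value is not a safe path component")
    return decoded


# Mojibake in miniature: UTF-8 bytes misread as Latin-1.
garbled = "café".encode("utf-8").decode("latin-1")  # "cafÃ©"
```

Under these assumptions, `decode_path_component(b"report%20final")` returns `"report final"`, while `decode_path_component(b"%2e%2e%2fetc")` and `decode_utf8_strict(b"caf\xe9")` both raise `ValueError`. The point of the ordering is that validation sees the same string the rest of the application will use.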
Impact: Unexpected State
Strategy: Input Validation
Strategy: Output Encoding