CWE-120 Base Incompleto High likelihood

Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')

This vulnerability occurs when a program copies data from one memory location to another without first verifying that the source data will fit within the destination buffer's allocated space.

Definición

What is CWE-120?

This vulnerability occurs when a program copies data from one memory location to another without first verifying that the source data will fit within the destination buffer's allocated space.
A classic buffer overflow happens when software blindly trusts input data size. When more data is copied into a fixed-size buffer than it can hold, the excess spills over into adjacent memory. This can corrupt other variables, crash the program, or, most critically, allow an attacker to overwrite critical control data like function return addresses, potentially redirecting the program's execution to malicious code. To prevent this, developers must always validate input sizes before copying operations. Use safe functions that specify buffer limits (like `strncpy` instead of `strcpy` in C), or better yet, employ modern languages with built-in bounds checking. The core principle is never to trust external input and to enforce strict boundaries for all memory operations.
Vulnerability Diagram CWE-120
Classic Stack Buffer Overflow char buf[8] saved EBP return address stack ↓ strcpy(buf, attacker_input_64_bytes) → overflow → Oversized input rewrites saved registers and the return address.
Impacto en el mundo real

Real-world CVEs caused by CWE-120

  • buffer overflow using command with long argument

  • buffer overflow in local program using long environment variable

  • buffer overflow in comment characters, when product increments a counter for a ">" but does not decrement for "<"

  • By replacing a valid cookie value with an extremely long string of characters, an attacker may overflow the application's buffers.

  • By replacing a valid cookie value with an extremely long string of characters, an attacker may overflow the application's buffers.

Cómo lo explotan los atacantes

Ruta del atacante paso a paso

  1. 1

    The following code asks the user to enter their last name and then attempts to store the value entered in the last_name array.

  2. 2

    The problem with the code above is that it does not restrict or limit the size of the name entered by the user. If the user enters "Very_very_long_last_name" which is 24 characters long, then a buffer overflow will occur since the array can only hold 20 characters total.

  3. 3

    The following code attempts to create a local copy of a buffer to perform some manipulations to the data.

  4. 4

    However, the programmer does not ensure that the size of the data pointed to by string will fit in the local buffer and copies the data with the potentially dangerous strcpy() function. This may result in a buffer overflow condition if an attacker can influence the contents of the string parameter.

  5. 5

    The code below calls the gets() function to read in data from the command line.

Ejemplo de código vulnerable

Vulnerable C

The following code asks the user to enter their last name and then attempts to store the value entered in the last_name array.

Vulnerable C
char last_name[20];
  printf ("Enter your last name: ");
  scanf ("%s", last_name);
Ejemplo de código seguro

Secure pseudo

Seguro pseudo
// Validate, sanitize, or use a safe API before reaching the sink.
function handleRequest(input) {
  const safe = validateAndEscape(input);
  return executeWithGuards(safe);
}
What changed: the unsafe sink is replaced (or the input is validated/escaped) so the same payload no longer triggers the weakness.
Lista de prevención

How to prevent CWE-120

  • Requirements Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
  • Architecture and Design Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.
  • Operation / Build and Compilation Use automatic buffer overflow detection mechanisms that are offered by certain compilers or compiler extensions. Examples include: the Microsoft Visual Studio /GS flag, Fedora/Red Hat FORTIFY_SOURCE GCC flag, StackGuard, and ProPolice, which provide various mechanisms including canary-based detection and range/index checking. D3-SFCV (Stack Frame Canary Validation) from D3FEND [REF-1334] discusses canary-based detection in detail.
  • Implementation Consider adhering to the following rules when allocating and managing an application's memory: - Double check that your buffer is as large as you specify. - When using functions that accept a number of bytes to copy, such as strncpy(), be aware that if the destination buffer size is equal to the source buffer size, it may not NULL-terminate the string. - Check buffer boundaries if accessing the buffer in a loop and make sure there is no danger of writing past the allocated space. - If necessary, truncate all input strings to a reasonable length before passing them to the copy and concatenation functions.
  • Implementation Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
  • Architecture and Design For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server.
  • Operation / Build and Compilation Run or compile the software using features or extensions that randomly arrange the positions of a program's executable and libraries in memory. Because this makes the addresses unpredictable, it can prevent an attacker from reliably jumping to exploitable code. Examples include Address Space Layout Randomization (ASLR) [REF-58] [REF-60] and Position-Independent Executables (PIE) [REF-64]. Imported modules may be similarly realigned if their default memory addresses conflict with other modules, in a process known as "rebasing" (for Windows) and "prelinking" (for Linux) [REF-1332] using randomly generated addresses. ASLR for libraries cannot be used in conjunction with prelink since it would require relocating the libraries at run-time, defeating the whole purpose of prelinking. For more information on these techniques see D3-SAOR (Segment Address Offset Randomization) from D3FEND [REF-1335].
  • Operation Use a CPU and operating system that offers Data Execution Protection (using hardware NX or XD bits) or the equivalent techniques that simulate this feature in software, such as PaX [REF-60] [REF-61]. These techniques ensure that any instruction executed is exclusively at a memory address that is part of the code segment. For more information on these techniques see D3-PSEP (Process Segment Execution Prevention) from D3FEND [REF-1336].
Señales de detección

How to detect CWE-120

Automated Static Analysis High

This weakness can often be detected using automated static analysis tools. Many modern tools use data flow analysis or constraint-based techniques to minimize the number of false positives. Automated static analysis generally does not account for environmental considerations when reporting out-of-bounds memory operations. This can make it difficult for users to determine which warnings should be investigated first. For example, an analysis tool might report buffer overflows that originate from command line arguments in a program that is not expected to run with setuid or other special privileges.

Automated Dynamic Analysis

This weakness can be detected using dynamic tools and techniques that interact with the software using large test suites with many diverse inputs, such as fuzz testing (fuzzing), robustness testing, and fault injection. The software's operation may slow down, but it should not become unstable, crash, or generate incorrect results.

Manual Analysis

Manual analysis can be useful for finding this weakness, but it might not achieve desired code coverage within limited time constraints. This becomes difficult for weaknesses that must be considered for all inputs, since the attack surface can be too large.

Automated Static Analysis - Binary or Bytecode High

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Highly cost effective: ``` Bytecode Weakness Analysis - including disassembler + source code weakness analysis Binary Weakness Analysis - including disassembler + source code weakness analysis

Manual Static Analysis - Binary or Bytecode SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Cost effective for partial coverage: ``` Binary / Bytecode disassembler - then use manual analysis for vulnerabilities & anomalies

Dynamic Analysis with Automated Results Interpretation SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Cost effective for partial coverage: ``` Web Application Scanner Web Services Scanner Database Scanners

Auto-corrección de Plexicus

Plexicus detecta automáticamente CWE-120 y abre un PR de corrección en menos de 60 segundos.

Codex Remedium escanea cada commit, identifica esta debilidad concreta y entrega un pull request listo para revisión con el parche. Sin tickets. Sin traspasos.

Preguntas frecuentes

Frequently asked questions

¿Qué es CWE-120?

This vulnerability occurs when a program copies data from one memory location to another without first verifying that the source data will fit within the destination buffer's allocated space.

¿Qué gravedad tiene CWE-120?

MITRE califica la probabilidad de explotación como Alta — esta debilidad se explota activamente en la práctica y debe priorizarse para su remediación.

¿Qué lenguajes o plataformas se ven afectados por CWE-120?

MITRE lists the following affected platforms: C, C++, Assembly.

¿Cómo puedo prevenir CWE-120?

Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is…

¿Cómo detecta y corrige Plexicus CWE-120?

El motor SAST de Plexicus detecta la firma de flujo de datos para CWE-120 en cada commit. Cuando hay coincidencia, nuestro agente Codex Remedium abre un PR de corrección con el código corregido, las pruebas y un resumen de una línea para el revisor.

¿Dónde puedo aprender más sobre CWE-120?

MITRE publica la definición canónica en https://cwe.mitre.org/data/definitions/120.html. También puedes consultar la documentación de OWASP y NIST para guías relacionadas.

Listo cuando tú lo estés

Deja de pagar por desarrollador.
Empieza a cerrar el bucle.

Plexicus es el ASPM nativo de IA que escanea, filtra, corrige, pentestea y explica — de forma autónoma. Desarrolladores ilimitados, repos ilimitados, acciones de IA de uso justo. Nivel gratuito real, €269/mo anual cuando estés listo.