Inefficient Regular Expression Complexity

Draft Base
Structure: Simple
Description

This vulnerability occurs when an application uses a poorly constructed regular expression that can trigger catastrophic backtracking, leading to extreme CPU consumption and potential denial-of-service.

Extended Description

The root cause lies in how many regex engines handle failed matches through a process called backtracking. When a pattern doesn't match, the engine tries different paths by rewinding to earlier decision points. A poorly designed regex—often involving nested quantifiers (like (a+)+) or ambiguous patterns—can create an exponential number of these backtracking paths relative to the input length. Attackers exploit this by providing carefully crafted, non-matching input that forces the engine to evaluate all possible backtracking paths. This causes CPU usage to spike dramatically, potentially freezing the application or server. The risk is highest when processing user-controlled strings without complexity limits, making regex-based input validation a common attack vector for denial-of-service.

Common Consequences 1
Scope: Availability

Impact: DoS: Resource Consumption (CPU)

Potential Mitigations 4
Phase: Architecture and Design
Use regular expressions that do not support backtracking, e.g. by removing nested quantifiers.

Effectiveness: High

Phase: System Configuration
Set backtracking limits in the configuration of the regular expression implementation, such as PHP's pcre.backtrack_limit. Also consider limits on execution time for the process.

Effectiveness: Moderate

Phase: Implementation
Do not use regular expressions with untrusted input. If regular expressions must be used, avoid using backtracking in the expression.

Effectiveness: High

Phase: Implementation
Limit the length of the input that the regular expression will process.

Effectiveness: Moderate

Demonstrative Examples 2

ID : DX-158

This example attempts to check if an input string is a "sentence" [REF-1164].

Code Example:

Bad
JavaScript

var test_string = "Bad characters: $@#"; var bad_pattern = /^(\w+\s?)*$/i; var result = test_string.search(bad_pattern);

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases. To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

Code Example:

Good
JavaScript

var test_string = "Bad characters: $@#"; var good_pattern = /^((?=(\w+))\2\s?)*$/i; var result = test_string.search(good_pattern);

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.
This example attempts to check if an input string is a "sentence" and is modified for Perl [REF-1164].

Code Example:

Bad
Perl

my $test_string = "Bad characters: $@#"; my $bdrslt = $test_string; $bdrslt =~ /^(\w+\s?)*$/i;

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases. To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

Code Example:

Good
Perl

my $test_string = "Bad characters: $@#"; my $gdrslt = $test_string; $gdrslt =~ /^((?=(\w+))\2\s?)*$/i;

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.
Observed Examples 8
CVE-2020-5243server allows ReDOS with crafted User-Agent strings, due to overlapping capture groups that cause excessive backtracking.
CVE-2021-21317npm package for user-agent parser prone to ReDoS due to overlapping capture groups
CVE-2019-16215Markdown parser uses inefficient regex when processing a message, allowing users to cause CPU consumption and delay preventing processing of other messages.
CVE-2019-6785Long string in a version control product allows DoS due to an inefficient regex.
CVE-2019-12041Javascript code allows ReDoS via a long string due to excessive backtracking.
CVE-2015-8315ReDoS when parsing time.
CVE-2015-8854ReDoS when parsing documents.
CVE-2017-16021ReDoS when validating URL.
References 7
Runaway Regular Expressions: Catastrophic Backtracking
Jan Goyvaerts
22-12-2019
ID: REF-1162
Regular expression Denial of Service - ReDoS
Adar Weidman
ID: REF-1163
Catastrophic backtracking
Ilya Kantor
13-12-2020
ID: REF-1164
Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers
Cristian-Alexandru Staicu and Michael Pradel
USENIX Security Symposium
11-07-2018
ID: REF-1165
The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale
James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee
01-08-2018
ID: REF-1166
The Regular Expression Denial of Service (ReDoS) cheat-sheet
James Davis
23-05-2020
ID: REF-1167
Likelihood of Exploit

High

Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Modes of Introduction
Implementation
Alternate Terms

ReDoS

ReDoS is an abbreviation of "Regular expression Denial of Service".

Regular Expression Denial of Service

While this term is attack-focused, this is commonly used to describe the weakness.

Catastrophic backtracking

This term is used to describe the behavior of the regular expression as a negative technical impact.