How to use dynamic variables in JavaScript regex safely?

Construct dynamic regular expressions from variables, and avoid the security risks by following the suggestions below.


Regular expressions are a popular and powerful feature of many programming languages.

Regular expression patterns match character combinations in strings. In JavaScript, regular expressions are also objects.

You can create a JavaScript RegExp object using either the literal syntax or the RegExp constructor. The following code creates the same regular expression both ways:

const regExpA = /Hello World/m;
const regExpB = new RegExp("Hello World", "m");
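One practical difference between the two forms shows up once the pattern contains a backslash: in the constructor form the pattern is an ordinary string, so the backslash itself has to be escaped. The snippet below is a small illustration with made-up variable names, not part of the original example:

```javascript
// Literal syntax: the backslash is written once.
const literal = /\d+ items/i;

// Constructor syntax: the pattern is a string, so "\\d" is needed to get \d.
const constructed = new RegExp("\\d+ items", "i");

// Both objects end up with the same pattern source and flags.
console.info(literal.source === constructed.source); // true
console.info(literal.flags === constructed.flags);   // true
console.info(constructed.test("12 Items"));          // true
```

This matters for dynamic patterns, which are always built from strings.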

A regex pattern may consist of literal characters (an ordinary sequence of characters, such as "Hello World" in the example above) and special characters that modify how the pattern matches:

  • Character classes (e.g. [xyz], [^xyz], \d, \D)
  • Assertions (e.g. ^, $, \b, \B, x(?=y))
  • Quantifiers (e.g. x*, x+, x?, x{n}, x{n,}, x{n,m})
  • Groups and backreferences (e.g. (x), (?<Name>x), (?:x))

const regExp = /a+/; // Matches one or more "a".
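As a rough sketch of the four categories above in action, each entry below uses one kind of special character. The sample strings and property names are invented for illustration:

```javascript
const samples = {
  // Character class: one of x, y, z followed by a digit.
  characterClass: /[xyz]\d/.test("y7"),
  // Assertion: \b matches at word boundaries without consuming characters.
  assertion: /\bcat\b/.test("a cat sat"),
  // Quantifier: two or more consecutive "a" characters.
  quantifier: "caaandy".match(/a{2,}/)[0],
  // Named group: the captured text is available under .groups.
  namedGroup: "2024-05".match(/(?<year>\d{4})-(?<month>\d{2})/).groups.year,
};

console.info(samples);
// { characterClass: true, assertion: true, quantifier: 'aaa', namedGroup: '2024' }
```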

Dynamic Regular Expressions

You can build a regular expression dynamically by passing a string to the RegExp constructor. The string can be assembled with string concatenation or, better yet, with template literals, as in the following examples:

String Concatenation

function getRegExp(name) {
  return new RegExp("Hello " + name);
}

const regExp = getRegExp("World");
console.info("Hello World!".match(regExp));

Template Literals

The following function produces the same Regular Expression:

function getRegExp(name) {
  return new RegExp(`Hello ${name}`);
}
const regExp = getRegExp("World");
console.info("Hello World!".match(regExp));

Security

A hidden danger awaits us when using dynamic regular expressions: the ReDoS attack.

Wikipedia defines ReDoS as:

A regular expression denial of service (ReDoS)[1] is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression and an input that takes a long time to evaluate. ... An attacker can thus cause a program to spend substantial time by providing a specially crafted regular expression and input. The program will then slow down or become unresponsive.

In short, some regular expressions can be CPU intensive, and an attacker can use them to bring your server to its knees. You can read the details here. Be careful when constructing regular expressions, even non-dynamic ones. Most of the time, a risky regular expression can be rewritten in a non-risky way. For example, (.*a)+ can be rewritten as ([^a]*a)+.
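To see the difference between the two patterns, the sketch below anchors both with ^ and $ (an addition for this demonstration, so that a failed match forces the engine to try every alternative) and times them on a short input that cannot match. The helper name timeTest and the input are assumptions made for illustration, and the timings depend on the engine and machine:

```javascript
const risky = /^(.*a)+$/;     // ".*" can also consume "a", so the split is ambiguous
const safer = /^([^a]*a)+$/;  // "[^a]*" can never consume "a", so there is one way to split

// Runs one test and reports whether it matched and roughly how long it took.
function timeTest(regExp, input) {
  const start = Date.now();
  const matched = regExp.test(input);
  return { matched, ms: Date.now() - start };
}

// Twenty "a" characters followed by a character that makes the match fail.
const input = "a".repeat(20) + "!";

console.info("risky:", timeTest(risky, input));
console.info("safer:", timeTest(safer, input));
```

Both patterns reject this input, but in a backtracking engine the risky one tries every way of dividing the run of "a" characters between iterations, so each additional "a" roughly doubles the work. Keep the repeat count small when experimenting.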

If a regular expression depends on a value supplied by the end user, the risk of ReDoS increases significantly, because specific patterns can stall your server. Here is another good article explaining the attack and how to avoid it in simple terms.

If you are sure of what you are doing and still want to use variables in a regular expression, npm comes to the rescue. There are several modules related to ReDoS attacks, but none of them is fail-safe, because it is not easy to inspect an entire dynamic pattern and decide whether it is safe. However, you can eliminate much of the risk by escaping special characters such as quantifiers with escape-string-regexp, so that user input is matched literally.
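The idea of escaping can be sketched without any dependency. In the snippet below, escapeRegExp is a hand-written stand-in that plays the role of escape-string-regexp so the example runs on its own; it is not that package's code, and compiles and getSafeRegExp are likewise hypothetical helper names:

```javascript
// Stand-in escaper (assumption: covers the usual regex metacharacters).
// Each special character is prefixed with a backslash so it matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same shape as getRegExp above, but the variable is escaped first.
function getSafeRegExp(name) {
  return new RegExp(`Hello ${escapeRegExp(name)}`);
}

// Reports whether a pattern source is a valid regular expression.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch {
    return false;
  }
}

// Unescaped input changes the meaning of the pattern or breaks it outright.
console.info(new RegExp("Hello " + ".*").test("Hello anything")); // true
console.info(compiles("Hello " + "C++"));                         // false

// Escaped input is treated as plain text.
console.info(getSafeRegExp(".*").test("Hello anything"));  // false
console.info(getSafeRegExp(".*").test("Hello .*"));        // true
console.info(getSafeRegExp("C++").test("Hello C++"));      // true
```

Escaping only guarantees that the variable contributes literal characters. The fixed part of the pattern around it still has to be safe on its own.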

Conclusion

Regular expressions are powerful tools that can solve many everyday coding problems. However, you should use them sparingly and look for other possible solutions first. I do the following to keep my regular expressions safe:

  • Avoid regular expressions where possible. Look for other ways to solve the problem at hand first.
  • Avoid dynamic regular expressions if possible.
  • Sanitize user input with escape-string-regexp when a regular expression has to include a JavaScript variable.
  • Use RE2 instead of RegExp.
  • When using Perl, use reliable libraries of reusable patterns from CPAN. These libraries contain patterns and the best way to search, match and replace standard strings such as "zip code", "digit year", "full name", etc.
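As an illustration of the first suggestion, the greeting check used earlier does not need a regular expression at all. The function name containsGreeting is made up for this sketch; it relies only on the built-in String.prototype.includes:

```javascript
// Plain substring search: the variable is always treated as literal text,
// so there is no pattern to escape and nothing to backtrack over.
function containsGreeting(text, name) {
  return text.includes(`Hello ${name}`);
}

console.info(containsGreeting("Hello World!", "World")); // true
console.info(containsGreeting("Hello World!", ".*"));    // false
```

Methods such as startsWith, endsWith and indexOf cover many similar cases.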

Please leave a comment if you have other suggestions.