Generate random sequences matching a given regular expression


Generate random sequences matching a given regular expression

Having an arbitrary regular expression how do i generate random text sequences matching (or not matching) a given regular expression? I’m interested in both theory and practical usage (need this in context of .NET to improve my tests) so links to articles or existing libraries would be appreciated.


Solution 1:

From a purely theoretical standpoint, regular expressions are equivalent to rational languages, which are constructed on the following basis:

  • {} (a language with no words) is rational.
  • {a} (a language with a single one-word letter) is rational.
  • If L and M are two rational languages, then their union (the words either in L or in M) is rational.
  • If L and M are two rational languages, then their concatenation LM (words constructed by prepending a word from L to a word from M) is also rational.
  • If L is a rational language, then L* (words constructed by concatenating any number of words from language L) is also rational.

This constructive approach complements the regular expression recognition/matching approach, and helps construct recursively words that match the expression:

  • make({}) = ""
  • make({a}) = "a"
  • make(A|B) = flip-coin ? make(A) : make(B)
  • make(AB) = make(A) + make(B)
  • make(A*) = flip-coin ? "" : make(A) + make(A*)

Solution 2:

One way to look at it is that every regular expression can (I believe) be represented as a finite state machine in which each transition between states consists of consuming one or more characters (or maybe zero characters is also valid? Long time since I studied this).

So, one way to go about this would be to think of or convert the regex to a finite state machine and move from state to state, selecting transitions at random until you hit a valid end state.

Of course, I have no idea how feasible this method actually is in code, or whether any such libraries exist.

Solution 3:

In .NET, you may also want to look at project Fare. It does exactly what you describe.

Here is how to use it:

var xeger = new Xeger(pattern);
var match = xeger.Generate();

Regex.IsMatch(match, pattern);
// Prints -> true

Since you mentioned unit testing, you may also consider using AutoFixture which once you decorate a Property (or Field) with the [RegularExpression] attribute it will assign a value that matches the regular expression.

Solution 4:

rxrdg – Regular Expression Random Data Generator. Written in C#.

Solution 5:

Although not .NET but can be interesting. Found this implementation in Haskell. The author claims it to be more powerful than CPAN Genex.