Improving Regular Expression Performance
Improving Regular Expression Performance
Submitted by Corey Goldberg on Tue, 06/02/2007 - 16:13.Alex from Dojo just
linked to a fascinating article about by Russ Cox about
Regular Expressions: Regular
Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python,
Ruby, ...)
from the article:
The following graph plots time required to check whether a?^na^n matches a^n:
wow... so awk and grep use the Thomson NFA implementation of regexes, while most programming languages don't.
... and here I thought Perl was the regex king.
from the article:
"This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations."so.. there is a 40 year old technique that improves performance of regexes dramatically?
The following graph plots time required to check whether a?^na^n matches a^n:
wow... so awk and grep use the Thomson NFA implementation of regexes, while most programming languages don't.
... and here I thought Perl was the regex king.
