A Tale of Two Regexps · 25 July, 02:06 PM
It seems that in .NET 3.5, the static methods on the Regex class use cached versions of compiled regular expressions. According to the .NET 3.5 documentation for the System.Text.RegularExpressions.Regex class:
“The Regex class contains several static (or Shared in Visual Basic) methods that allow you to use a regular expression without explicitly creating a Regex object. In the .NET Framework version 2.0, regular expressions compiled from static method calls are cached, whereas regular expressions compiled from instance method calls are not cached. By default, the regular expression engine caches the 15 most recently used static regular expressions. As a result, in applications that rely extensively on a fixed set of regular expressions to extract, modify, or validate text, you may prefer to call these static methods rather than their corresponding instance methods.”
I immediately flagged this tidbit, not because it had anything to do with the problem I was trying to solve, but because, in fine Regex tradition, it seems that using a Regex might create two problems: now I have to worry about caching. It looks like a solution to a problem that I may not even have. On top of that, the cache is shared, so I have to worry about calls to the static Regex methods from code other than mine: code written by other team members, code in other libraries, code in the .NET Framework itself. With a default of 15 entries, the cache is small enough to help only the smallest applications, and to tune it you need to do the same thing you’d have to do to fix performance issues in the first place: profile and audit the code to find the problem. My inclination is to avoid the static methods altogether, which is probably not the intended effect.
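To make the trade-off concrete, here is a minimal sketch of both styles. The `Patterns` class and the date pattern are hypothetical, made up for illustration; `Regex.IsMatch`, `RegexOptions.Compiled`, and the static `Regex.CacheSize` property are the real framework members. Holding an instance in a static field is just one way to sidestep the shared cache, not something the documentation prescribes.

```csharp
using System;
using System.Text.RegularExpressions;

static class Patterns
{
    // Hypothetical pattern, for illustration only. The instance is built once
    // and reused, so it never depends on the shared static cache.
    public static readonly Regex IsoDate =
        new Regex(@"^\d{4}-\d{2}-\d{2}$", RegexOptions.Compiled);
}

static class Program
{
    static void Main()
    {
        // Instance call: uses the Regex object we own.
        Console.WriteLine(Patterns.IsoDate.IsMatch("2008-07-25"));

        // Static call: goes through the cache that every caller of the
        // static Regex methods shares.
        Console.WriteLine(Regex.IsMatch("2008-07-25", @"^\d{4}-\d{2}-\d{2}$"));

        // The cache size is a single setting for all of those callers.
        // The value 50 is arbitrary; choosing a good one means profiling.
        Console.WriteLine(Regex.CacheSize);
        Regex.CacheSize = 50;
    }
}
```

The instance approach costs a field per pattern, but its behavior does not change when some other library starts making static Regex calls of its own.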
This kind of thing crops up in a few other places in the .NET Framework, most notably ServicePointManager. You simply can’t create connections through the framework’s higher-level networking classes, such as HttpWebRequest, without going through ServicePointManager, but it introduces its own complications and behaviors, and in some instances you need to go to extreme lengths to code around them, since all these classes invariably involve some combination of private, internal, or sealed classes.

I was reading the documentation on Edi Weitz’ Regex Coach, which states: “It might be worthwhile to note that due to the dynamic nature of Lisp The Regex Coach could be written without changing a single line of code in the CL-PPCRE engine itself although the application has to track information and query the engine while the regular expressions is parsed and the scanners are built. All this could be done ‘after the fact’ by using facilities like defadvice and :around methods.” Edi’s comparing with Perl, but the situation’s the same in .NET: if you don’t like the way the default libraries work, well, that’s tough.
— Gordon Weakliem