Validating with regular expressions
If you actually check the Google query I linked above, people have been writing (or trying to write) RFC-compliant regular expressions to parse email addresses for years.
But what if I told you there were a way to determine whether or not an email is valid without resorting to regular expressions at all? The activation email is a practice that’s been in use for years, but it’s often paired with complex validations that the email is formatted correctly.
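To make the activation-email idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the in-memory `pending` store and the `start_activation` and `confirm_activation` helpers are hypothetical names, and actually sending the email is left out.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store mapping token -> email address.
# A real application would persist this and expire old tokens.
pending: Dict[str, str] = {}


def start_activation(email: str) -> str:
    """Create an unguessable token for this address and remember it.

    The caller is expected to email a link containing the token to `email`.
    """
    token = secrets.token_urlsafe(32)
    pending[token] = email
    return token


def confirm_activation(token: str) -> Optional[str]:
    """Return the confirmed address, or None if the token is unknown.

    The token is removed, so it can only be used once.
    """
    return pending.pop(token, None)
```

If the link is ever clicked, the address demonstrably exists and is read by someone, which is more than any format check can establish.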
But before we benchmark: let’s look at the solution candidates first.
To be able to compare results, we decided on a baseline for our benchmark.
And it was: a regular expression match was to blame!
Actually, not the regular expression itself: the property was obviously being set, and setting it validated the incoming value using a regular expression.
No backtracking, no capturing, nothing else the regular expression engine provides us. Just a boolean giving us a clue on whether the value is valid or not.
BenchmarkDotNet makes it incredibly easy to write benchmarks and get results in a structured way. Go check their getting started with BenchmarkDotNet page: it takes a couple of attributes and a method call to run a reliable benchmark.
They can get ridiculously convoluted as in the case above and, according to the specification, are often too strict anyway.
Always fascinated by the inner workings of things, I decided to open and decompile this generated assembly using dotPeek.
Three classes seem to be generated: a factory (used internally by the Regex engine), a … That’s… But nevertheless: it’s all compiled, so performance may just be awesome!
Not a super big deal in itself, except that during mapping this validation was executed for a few thousand objects, resulting in around one second (sometimes more) to map these objects. Here’s the code that had the bottleneck:
… supposed to be super fast?
And especially in this case, where we’re only validating that the string consists of a set of allowed characters, and making sure the string length is between 1 and 254 characters?
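The check described here (only allowed characters, length between 1 and 254) can be written both as a regular expression and as a plain loop. The following is a rough sketch in Python, not the original code: the `ALLOWED` character set and the function names are assumptions made up for illustration.

```python
import re
import string

# Assumption: an illustrative set of allowed characters, not the original rule.
ALLOWED = frozenset(string.ascii_letters + string.digits + "._-@+")
MAX_LENGTH = 254

# The same rule as a regular expression, compiled once up front.
PATTERN = re.compile(r"[A-Za-z0-9._\-@+]{1,254}")


def is_valid_regex(value: str) -> bool:
    """Validate using the regex engine; fullmatch requires the whole string to match."""
    return PATTERN.fullmatch(value) is not None


def is_valid_manual(value: str) -> bool:
    """Validate with a length check and a character loop: just a boolean result."""
    if not 1 <= len(value) <= MAX_LENGTH:
        return False
    return all(ch in ALLOWED for ch in value)
```

Both functions should agree on every input. Which one is faster depends on the runtime, so it is worth measuring (for example with `timeit`) rather than assuming.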