A bit on the performance of lexers in Python
I was expecting the runtime [of the hand written lexer] to be much closer to the single-regex version; in fact I was expecting it to be a bit slower (because most of the regex engine work is done at a lower level). But it turned out to be much faster, more than 2.5x.
In the comments, Caleb Spare pointed to Rob Pike's "Regular expressions in lexing and parsing", which reiterates the arguments for simple lexers that don't use regular expressions. Despite all of this, regular-expression-based lexers are extremely common in the Python world.
Good lexing and even parsing algorithms are both extremely efficient and very well known (the problem has been studied almost since the start of computer science). A good high-performance lexer generally looks at each input character only once and runs a relatively small amount of focused code per character to tokenize the input stream. A good regular expression engine can avoid backtracking but is almost invariably going to run more complex code (and often use more memory) to examine each character. As covered in Russ Cox's series on regular expressions, the garden-variety regular expression engines in Python, Perl, and several other languages aren't even that efficient and do backtrack (sometimes extensively).
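To make the contrast concrete, here is a minimal sketch of the two styles for a made-up toy language (ASCII integers, identifiers, and a few single-character operators). The token set and the names TOKEN_RE, lex_regex, and lex_by_hand are my own assumptions for illustration; this is not the code from the benchmark quoted above. The hand-written version does a small amount of work per character and never backs up, although the character that ends a token does get looked at a second time when the main loop dispatches on it.

```python
import re
import string

# Hypothetical toy token set, chosen only for illustration.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>[0-9]+)|(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OP>[-+*/()=])|(?P<WS>[ \t\n\r\f\v]+)"
)

DIGITS = frozenset(string.digits)
IDENT_START = frozenset(string.ascii_letters + "_")
IDENT_REST = IDENT_START | DIGITS
OPERATORS = frozenset("-+*/()=")
WHITESPACE = frozenset(" \t\n\r\f\v")


def lex_regex(text):
    """Tokenize with one combined regex, one match call per token."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def lex_by_hand(text):
    """Tokenize by dispatching on the current character and scanning forward."""
    pos = 0
    n = len(text)
    tokens = []
    while pos < n:
        c = text[pos]
        if c in WHITESPACE:
            pos += 1
        elif c in DIGITS:
            start = pos
            while pos < n and text[pos] in DIGITS:
                pos += 1
            tokens.append(("NUMBER", text[start:pos]))
        elif c in IDENT_START:
            start = pos
            while pos < n and text[pos] in IDENT_REST:
                pos += 1
            tokens.append(("IDENT", text[start:pos]))
        elif c in OPERATORS:
            tokens.append(("OP", c))
            pos += 1
        else:
            raise ValueError(f"unexpected character {c!r} at {pos}")
    return tokens
```

Both functions should produce the same token list for the same input, so they can be timed against each other with the standard timeit module on a large input. I make no claim here about which one wins on any particular interpreter; that depends on the token set, the input, and the Python implementation.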
(Given this I wouldn't be surprised if a hand-written Python lexer that was run under PyPy was quite fast, either competitive with or even faster than a Python regex-based one. Assembling a test case and doing the benchmarking work is left as an exercise.)