Regular expressions are powerful, useful, and — in my opinion — lots of fun! Thanks to the prevalence of Twitter, every web developer will be exposed to regex sooner or later: before outputting tweets in HTML, Twitter names and hyperlinks must be wrapped in anchor tags.
Here's the gist: a match will begin with "@" and the at sign must be followed by one or more word (letter / number / underscore) characters. The @name must either appear at the beginning of the tweet or be preceded by a space. This prevents the regular expression from matching "@example" in "email@example.com".
tweet.replace(/(^|\s)(@\w+)/gm, '$1<a href="http://twitter.com/$2">$2</a>');
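It would of course be nicer to write the following, but it relies on a lookbehind, which JavaScript engines did not support before ES2018: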
tweet.replace(/(?<=(?:^|\s))(@\w+)/gm, '<a href="http://twitter.com/$1">$1</a>');
The PHP equivalent:
preg_replace('/(^|\s)(@\w+)/m', '$1<a href="http://twitter.com/$2">$2</a>', $tweet);
Python does support lookbehinds, but only fixed-width lookbehinds, so it rejects (?<=^|\s). No matter.
import re
re.sub(r'(?m)(^|\s)(@\w+)', lambda m: m.group(1) + '<a href="http://twitter.com/' + m.group(2) + '">' + m.group(2) + '</a>', tweet)
For once, Python's syntax is the least elegant!
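For what it's worth, a lookbehind is still possible in Python if the condition is inverted. The sketch below is a variation, not one of the snippets above. It first confirms that `re` refuses to compile `(?<=^|\s)`, then uses the negative lookbehind `(?<!\S)`, meaning "not preceded by a non-space character". That assertion is fixed-width and holds both at the start of the tweet and after whitespace. The helper name `link_mentions` is hypothetical.

```python
import re

# The variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=^|\s)(@\w+)')
    lookbehind_accepted = True
except re.error:
    lookbehind_accepted = False

# (?<!\S) is a fixed-width negative lookbehind: it succeeds at the start
# of the string and directly after any whitespace character.
MENTION = re.compile(r'(?<!\S)(@\w+)')

def link_mentions(tweet):
    """Wrap each @name in an anchor tag (hypothetical helper)."""
    return MENTION.sub(r'<a href="http://twitter.com/\1">\1</a>', tweet)

print(lookbehind_accepted)
print(link_mentions('@alice mail bob@example.com and @carol'))
```

Because nothing is captured before the name, the replacement no longer needs to put the leading space back, and "@example" in "bob@example.com" is still left alone.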
Interestingly, while testing these snippets I found I did not need to specify multi-line mode. Perhaps multi-line mode is assumed? I'd like to know the answer.
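A likely explanation is that `\s` matches a newline as well as a space. An @name at the start of any line after the first is therefore already preceded by whitespace, and `^` without the flag still matches at the very start of the tweet. The sketch below compares the pattern with and without multi-line mode on a two-line sample. The sample text and the `link` helper are made up for illustration.

```python
import re

REPLACEMENT = r'\1<a href="http://twitter.com/\2">\2</a>'

def link(tweet, flags=0):
    """Link @names using the capture-group pattern from the snippets above."""
    return re.sub(r'(^|\s)(@\w+)', REPLACEMENT, tweet, flags=flags)

sample = '@alice hello\n@bob hi'

with_flag = link(sample, re.MULTILINE)
without_flag = link(sample)

# The newline before "@bob" is consumed by \s, so the flag changes nothing.
print(with_flag == without_flag)
```

On this reading, multi-line mode is not assumed. It is simply redundant for this pattern, since every line start other than the first is preceded by a newline character.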
The regular expression involved in matching hyperlinks is more complex. I'll point you to John Gruber's liberal regex for matching URLs as he's clearly put a great deal of thought into what is essentially a single line of code!