Linkify tweets with regex

Regular expressions are powerful, useful, and — in my opinion — lots of fun! Thanks to the prevalence of Twitter, every web developer will be exposed to regex sooner or later: before outputting tweets in HTML, Twitter names and hyperlinks must be wrapped in anchor tags.
Matching @names
Here's the gist: a match will begin with "@" and the at sign must be followed by one or more word (letter / number / underscore) characters. The @name must either appear at the beginning of the tweet or be preceded by a space. This prevents the regular expression from matching "@example" in "me@example.com".
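As a quick check of the rule just described, here is a minimal Python sketch. The helper name find_mentions and the sample strings are made up for illustration; only the pattern comes from the description above.

```python
import re

# "@" followed by one or more word characters, either at the start of the
# text or immediately after a whitespace character.
MENTION = re.compile(r'(^|\s)(@\w+)')

def find_mentions(tweet):
    """Return the @names found in tweet (hypothetical helper, for illustration only)."""
    return [m.group(2) for m in MENTION.finditer(tweet)]

print(find_mentions('@alice thanks, cc @bob_99'))  # ['@alice', '@bob_99']
print(find_mentions('mail me@example.com'))        # []
```

The second call finds nothing because the "@" in the email address is preceded by a letter rather than by whitespace or the start of the string.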
JavaScript implementation
tweet.replace(/(^|\s)(@\w+)/gm, '$1<a href="http://twitter.com/$2">$2</a>');
It would of course be nicer to write:
tweet.replace(/(?<=(?:^|\s))(@\w+)/gm, '<a href="http://twitter.com/$1">$1</a>');
Unfortunately, JavaScript did not support lookbehinds in regular expressions when this was written (ES2018 has since added them), so one is forced to capture the preceding whitespace character (if in fact there is one) and spit it out in the replacement string.
PHP implementation
preg_replace('/(^|\s)(@\w+)/m', '$1<a href="http://twitter.com/$2">$2</a>', $tweet);
Python implementation
Python does support lookbehinds, but only fixed-width lookbehinds, so it won't allow (?<=^|\s). No matter.
import re
re.sub(r'(?m)(^|\s)(@\w+)', lambda m: m.group(1) + '<a href="http://twitter.com/' + m.group(2) + '">' + m.group(2) + '</a>', tweet)
For once, Python's syntax is the least elegant!
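One possible way around the fixed-width restriction is a negative lookbehind, (?<!\S), which is always one character wide. It succeeds at the start of the string or when the previous character is whitespace, which is the same condition as (^|\s). The sketch below is an alternative, not the snippet above rewritten; the function name linkify_mentions is made up for illustration.

```python
import re

def linkify_mentions(tweet):
    """Wrap @names in anchor tags (hypothetical helper, for illustration only)."""
    # (?<!\S): the previous character must not be non-whitespace, so the match
    # may start at the beginning of the string or right after whitespace.
    # Nothing before the @name is consumed, so only one group is needed.
    return re.sub(r'(?<!\S)(@\w+)',
                  r'<a href="http://twitter.com/\1">\1</a>',
                  tweet)

print(linkify_mentions('hi @bob, mail me@example.com'))
```

Because the lookbehind consumes no characters, the replacement can be a plain template string with a backreference instead of a lambda.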
Interestingly, while testing these snippets I found I did not need to specify multi-line mode. Perhaps multi-line mode is assumed? I'd like to know the answer.
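The likely explanation is that multi-line mode is not assumed; it simply makes no difference for this pattern. A tweet on a single line has only one line start, which ^ matches in either mode. In a tweet that does contain a newline, the newline is itself whitespace, so the \s alternative picks up an @name at the start of a later line. A small Python check, using a made-up two-line input:

```python
import re

PATTERN = r'(^|\s)(@\w+)'
REPLACEMENT = r'\1<a href="http://twitter.com/\2">\2</a>'

# Made-up input with an @name at the start of the second line.
tweet = '@alice hello\n@bob hi'

without_flag = re.sub(PATTERN, REPLACEMENT, tweet)
with_flag = re.sub(PATTERN, REPLACEMENT, tweet, flags=re.MULTILINE)

# Both calls link @alice (via ^) and @bob (via \s matching the newline).
print(without_flag == with_flag)  # True
```

The same reasoning should carry over to the JavaScript and PHP versions, since ^ and \s behave the same way there.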
Matching hyperlinks
The regular expression involved in matching hyperlinks is more complex. I'll point you to John Gruber's liberal regex for matching URLs as he's clearly put a great deal of thought into what is essentially a single line of code!
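For comparison, here is a deliberately simple sketch that covers only common cases. It is not John Gruber's pattern. It assumes URLs start with http:// or https://, and that a trailing comma, full stop, or similar punctuation mark belongs to the sentence rather than the URL. The name linkify_urls is made up for illustration.

```python
import html
import re

# An http(s) URL is a run of characters other than whitespace, angle brackets
# and quotes, and its last character must not be common trailing punctuation.
URL = re.compile(r'''https?://[^\s<>"']*[^\s<>"'.,;:!?)]''')

def linkify_urls(text):
    """Wrap simple http(s) URLs in anchor tags (hypothetical helper)."""
    def to_anchor(match):
        url = html.escape(match.group(0))  # escape "&" for use in HTML
        return '<a href="{0}">{0}</a>'.format(url)
    return URL.sub(to_anchor, text)

print(linkify_urls('I like this URL http://t.co/awiefj, and it likes me.'))
```

This will still get some inputs wrong, for example a URL that genuinely ends in a closing parenthesis.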
Comments
John Gruber's regex is too liberal for URLs in tweets, because sometimes people will do the following: "I like this URL http://t.co/awiefj, and it likes me."
His regex will capture the final comma, but it should not be captured. Things get trickier still if the URL is adjacent to a " character.
Yep, matching URLs in text is something that's impossible to do with 100% accuracy, no matter how many hours you spend fiddling with your regex. I think the best approach is to write something simple which handles the common cases, and not worry about the inevitable failures. I agree that matching the comma in your example is bad; that's a common case I'd like to handle "correctly".