Escaping HTML in JavaScript

I recently came across an interesting article at wonko.com on HTML escaping, which provoked me to rewrite Bitbucket's escape function (invoked from within Underscore templates):

function makeSafe(text) {
  return text.replace(/[&<>"'`]/g, function (chr) {
    return '&#' + chr.charCodeAt(0) + ';';
  });
};

This ensures that inserted content cannot escape the confines of a quoted attribute value. Unquoted attributes are more problematic:

Unquoted attribute values are one of the single biggest XSS vectors there is. If you don’t quote your attribute values, you’re essentially leaving the door wide open for naughty people to inject naughty things into your HTML. Very few escaper implementations cover all the edge cases necessary to prevent unquoted attribute values from becoming XSS vectors.

To accommodate unquoted attribute values, the following function could be used instead:

function makeSafe(text) {
  return text.replace(/\W/g, function (chr) {
    return '&#' + chr.charCodeAt(0) + ';';
  });
};

I created a jsPerf test case which confirms that there's a performance hit associated with using this more liberal regular expression. Keep in mind, though, that if “A” takes 1ms to execute and “B” takes ten times as long, “B” still only takes 10ms. Quite often a significant comparative speed difference is insignificant in absolute terms; I'd argue that this is the case here.

Respond