Tech Note: A Regular Expression to Convert Hashtags and Twitter Names to Links
Once in a while I come up with a bit of code I think might be useful to others, and here’s the latest example: a regular expression that parses a string of text looking for Twitter hashtags (preceded by ‘#’) or usernames (preceded by ‘@’), and replaces them with HTML links to those pages at twitter.com. Every bit of text posted at LGF in a comment, page or article gets passed through this code.
It’s written in PHP, and uses preg_replace_callback() to process the matches with an anonymous function.
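For anyone who hasn’t used preg_replace_callback() before, here’s a minimal sketch of the call shape: a pattern, an anonymous function that receives each match, and the subject string. The toy pattern, the group name "word" and the sample sentence are made up for illustration and have nothing to do with the Twitter code.

```php
<?php
// Toy example (hypothetical pattern): upper-case every word that is
// immediately followed by an exclamation mark.
$shouted = preg_replace_callback(
    '~\b(?<word>[a-z]+)!~',
    function ($matches) {
        // $matches[0] holds the whole match; named groups are also
        // available under their names.
        return strtoupper($matches['word']) . '!';
    },
    'hello! this is php!'
);

echo $shouted, "\n";
```

Whatever the anonymous function returns replaces the matched text, so returning $matches[0] leaves a match untouched.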
The regular expression is fairly sophisticated. Named capturing groups are used to make it easier to reference the groups (instead of counting up parentheses), and a positive lookbehind is combined with a lookaround conditional so that patterns that look like hashtags or usernames are not converted if they’re already contained within an HTML link.
Don’t panic, I won’t go into any more depth on the regular expression; I’ll leave that as an exercise for those who may be so inclined.
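$text = preg_replace_callback(
    '~
        # Only match at the start of the text, or after a character that
        # cannot be part of a word, e-mail address or HTML entity.
        (?<=
            ^
            |
            (?<=
                [^a-zA-Z0-9-.&]
            )
        )
        # If an existing HTML link starts here, consume it whole so that
        # nothing inside it gets converted.
        (?(?=
            <a[^>]*>.+?</a>
        )
            (?:
                <a[^>]*>.+?</a>
            )
            |
            # Otherwise capture the prefix and the name.
            # The group name "type" is an assumption.
            (?<type>
                [@#]{1}
            )
            (?<name>
                [A-Za-z_]+[A-Za-z0-9_]+
            )
        )
    ~xsi',
    function ($matches) {
        // "name" is only present when the hashtag/username branch matched.
        if (!empty($matches['name'])) {
            // The anchor markup and the hashtag/ URL path are assumptions;
            // the /with_replies path is from the original code.
            return (
                '<a href="https://twitter.com/' .
                ($matches['type'] === '#'
                    ? 'hashtag/' . $matches['name'] . '">#'
                    : $matches['name'] . '/with_replies">@'
                ) . $matches['name'] . '</a>'
            );
        } else {
            // An existing link matched; return it unchanged.
            return $matches[0];
        }
    },
    $text
);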
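If you want to try this outside of LGF, here’s a rough sketch that wraps a condensed, single-line form of the pattern in a function and runs it on a few sample strings. The function name linkify_twitter(), the sample strings and the exact link markup (including the twitter.com/hashtag/ path) are assumptions for illustration, not necessarily what the site uses.

```php
<?php
// Hypothetical wrapper; the pattern is a condensed form of the one above.
function linkify_twitter(string $text): string
{
    $pattern =
        '~(?<=^|(?<=[^a-zA-Z0-9-.&]))'   // start of text, or after a non-word character
        . '(?(?=<a[^>]*>.+?</a>)'        // if an HTML link starts here...
        . '(?:<a[^>]*>.+?</a>)'          // ...consume the whole link
        . '|(?<type>[@#])'               // otherwise capture the prefix
        . '(?<name>[A-Za-z_]+[A-Za-z0-9_]+))~si'; // and the name (two characters minimum)

    $result = preg_replace_callback(
        $pattern,
        function ($matches) {
            // No "name" group means an existing link matched: keep it as is.
            if (empty($matches['name'])) {
                return $matches[0];
            }
            // Assumed URL shapes: hashtag/NAME and NAME/with_replies.
            $path = $matches['type'] === '#'
                ? 'hashtag/' . $matches['name']
                : $matches['name'] . '/with_replies';
            return '<a href="https://twitter.com/' . $path . '">'
                . $matches['type'] . $matches['name'] . '</a>';
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

$plain  = 'Follow @lgf and #php';
$linked = 'See <a href="https://example.com">@nolink</a>';
$email  = 'Mail bob@example.com';

echo linkify_twitter($plain), "\n";
echo linkify_twitter($linked), "\n";
echo linkify_twitter($email), "\n";
```

The username and hashtag in the first string become links, while the text inside the existing link and the e-mail address should come back unchanged, since the ‘@’ in the address follows a letter.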