Whatterz


Parsing Twitter Usernames, Hashtags and URLs with ColdFusion

by Simon. Average Reading Time: about 2 minutes.

Some time ago, well almost a year ago actually, I posted an article called Parsing Twitter Usernames, Hashtags and URLs with JavaScript. From that article, it became immediately apparent that this was an issue many people were confronting and one that required an answer. Now, belatedly, it is the turn of ColdFusion to get the Twitter love.

Compared to JavaScript, it is far easier to parse the URLs, usernames and hashtags in a tweet using ColdFusion: REReplace does the work, and the regular expressions from the JavaScript code need only minor amendments.

Below is an example tweet that I’ll use for this post.

<cfset myTweet = "Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu ##holux ##gpslogger" />

NB. For the purpose of this test, I need to double-hash the hashtags. ColdFusion treats a single # as the start of an expression, so an unescaped hashtag would throw an error.
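As a minimal sketch of that escaping rule, the snippet below builds the same hashtag two ways. The variable names (tagName, literalHash, builtHash) are made up for illustration and are not part of the original example.

```cfml
<!--- Hypothetical variable holding a tag name without the hash --->
<cfset tagName = "holux" />

<!--- "##" inside a string yields one literal # character --->
<cfset literalHash = "##holux" />

<!--- "##" gives the literal #, then #tagName# is evaluated as an expression --->
<cfset builtHash = "###tagName#" />

<!--- Both variables should now hold the same six-character string --->
<cfoutput>#literalHash# #builtHash#</cfoutput>
```

The doubling is only needed where ColdFusion evaluates the string, such as inside cfset attributes or cfoutput blocks. A tweet read from a feed or a database already contains single hashes and needs no escaping.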

Parsing URLs as Links to the resource

We can simply demonstrate the parsing of the link with the following code in the body of the page:

<cfset myTweet = REReplace(myTweet,'([A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+)','<a href="\1">\1</a>','ALL') />

NB. The \1 is a backreference to part of the regular expression match. A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. This means you can reuse it inside the regular expression itself, or afterwards in the replacement string, as I am doing in each of these examples.
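The examples in this post only use \1 in the replacement string. To illustrate the other use, here is a small sketch with \1 inside the pattern itself, where it collapses accidentally doubled words. The sample sentence and the variable names are assumptions for illustration only.

```cfml
<!--- Hypothetical sample text containing two doubled words --->
<cfset sentence = "There is is coffee in the the kitchen" />

<!--- ([A-Za-z]+) captures a word; \1 in the pattern requires the same word again.
      \1 in the replacement then writes the word back once. --->
<cfset cleaned = REReplace(sentence, '([A-Za-z]+)[ ]+\1', '\1', 'ALL') />

<cfoutput>#cleaned#</cfoutput>
```

With this input the result should read "There is coffee in the kitchen". The pattern is deliberately naive: it does not check word boundaries, so it is only meant to show the mechanics of a backreference.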

The resultant HTML generated is the following:

Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. <a href="http://bit.ly/2RsAu">http://bit.ly/2RsAu</a> #holux #gpslogger

Parsing Usernames as Links to Twitter

Following on from the URL example above, we can apply a similar methodology to Twitter usernames, since each one can also be linked to its associated Twitter page.

We can simply demonstrate this with the following code:

<cfset myTweet = REReplace(myTweet,'[@]+([A-Za-z0-9-_]+)','<a href="http://twitter.com/\1" rel="nofollow">@\1</a>','ALL') />

The regular expression in this case finds all instances of @username. The Twitter URL is then applied to the username.

The resultant HTML generated is the following:

Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call <a href="http://twitter.com/fordie" rel="nofollow">@fordie</a>. http://bit.ly/2RsAu #holux #gpslogger
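One caveat with this pattern is that it also matches the @domain part of an email address. The sketch below is one possible way around that, assuming it is enough to require that the @ sits at the start of the string or after a character that cannot be part of a name. The sample text and variable names are hypothetical, and the character class is a simplification rather than Twitter's actual username rules.

```cfml
<!--- Hypothetical sample containing both an email address and a username --->
<cfset sample = "Mail me at email@domain.com or tweet @fordie" />

<!--- \1 captures whatever precedes the @ (nothing, or one non-name character)
      so it can be written back; \2 captures the username itself. --->
<cfset linked = REReplace(sample,
    '(^|[^A-Za-z0-9_])@([A-Za-z0-9_]+)',
    '\1<a href="http://twitter.com/\2" rel="nofollow">@\2</a>',
    'ALL') />

<cfoutput>#linked#</cfoutput>
```

Here email@domain.com should be left untouched because the @ follows a letter, while @fordie is linked because it follows a space. This will not cover every edge case, but it avoids the most common false match.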

Parsing Hashtags as Links to Twitter’s Search

Finally, Twitter also allows users to create hashtags within their posts. Hashtags are a community-driven convention for adding additional context and metadata to your tweets. Like regular URLs and usernames, hashtags can be parsed as a URL to an online resource, in this case, Twitter’s search.

We can simply demonstrate this with the following code:

<cfset myTweet = REReplace(myTweet,'[##]+([A-Za-z0-9-_]+)','<a href="http://search.twitter.com/search?q=%23\1" rel="nofollow">##\1</a>','ALL') />

The regular expression in this case finds all instances of #hashtag. The Twitter Search URL is then applied to the hashtag.

The resultant HTML generated is the following:

Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu <a href="http://search.twitter.com/search?q=%23holux" rel="nofollow">#holux</a> <a href="http://search.twitter.com/search?q=%23gpslogger" rel="nofollow">#gpslogger</a>

All in one

So, putting all the regular expressions together, you would end up with the following:

Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call <a href="http://twitter.com/fordie" rel="nofollow">@fordie</a>. <a href="http://bit.ly/2RsAu">http://bit.ly/2RsAu</a> <a href="http://search.twitter.com/search?q=%23holux" rel="nofollow">#holux</a> <a href="http://search.twitter.com/search?q=%23gpslogger" rel="nofollow">#gpslogger</a>

Which translates as the more useful tweet:

Woot! I’ve just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu #holux #gpslogger

Where to take it next

Wrapping these code snippets up into a simple twitterise function could be a good starter for ten. Following that, we could also create a simple Twitter feed reader, but I’ll leave that up to you to develop.
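As a starting point, a twitterise function might look something like the sketch below. It simply chains the three REReplace calls from this post. The argument name (tweet) and the local variable (result) are assumptions for illustration, and the order of the calls is a deliberate choice rather than the only possible one.

```cfml
<cffunction name="twitterise" returntype="string" output="false">
    <cfargument name="tweet" type="string" required="true" />
    <cfset var result = arguments.tweet />

    <!--- URLs first: the later steps insert their own http:// links,
          which the URL pattern would otherwise match a second time --->
    <cfset result = REReplace(result, '([A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+)', '<a href="\1">\1</a>', 'ALL') />

    <!--- Usernames link to the matching Twitter page --->
    <cfset result = REReplace(result, '[@]+([A-Za-z0-9-_]+)', '<a href="http://twitter.com/\1" rel="nofollow">@\1</a>', 'ALL') />

    <!--- Hashtags link to Twitter search (## is an escaped # in ColdFusion) --->
    <cfset result = REReplace(result, '[##]+([A-Za-z0-9-_]+)', '<a href="http://search.twitter.com/search?q=%23\1" rel="nofollow">##\1</a>', 'ALL') />

    <cfreturn result />
</cffunction>

<!--- Usage with the example tweet from this post --->
<cfset myTweet = "Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu ##holux ##gpslogger" />
<cfoutput>#twitterise(myTweet)#</cfoutput>
```

Running the URL step first matters: if it ran last, it would find the http://twitter.com/ and http://search.twitter.com/ addresses inside the href attributes added by the other two steps and wrap them in a second link.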

  • Devin

    Thanks a ton for this post, and the JS equivalent. You just saved me a lot of headache in figuring out how to turn hashtags into links. I’m not good with regex, so it would have taken me a long time to figure out…

  • http://blog.peterfisher.me.uk PFWD

    Thanks,
    Helped a lot

  • http://www.joshfraser.com Josh Fraser

    Thanks for sharing this. One tweak is to check that you don’t match email addresses when parsing twitter handles. For example, email@domain.com shouldn’t link on @domain.

  • jmcdanielx

    Thanks for the great post, here is my PHP adaptation for preg_replace():

    // replace urls with protocols first
    $tweet = preg_replace('/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/', '<a href="$0">$0</a>', $tweet);

    // replace urls without protocols next
    $tweet = preg_replace('/[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+\.[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/', '<a href="http://$0">$0</a>', $tweet);

    // apply links to username
    $tweet = preg_replace('/[@]+([A-Za-z0-9-_]+)/', '<a href="http://twitter.com/$1" rel="nofollow">@$1</a>', $tweet);

    // finally replace search strings
    $tweet = preg_replace('/[#]+([A-Za-z0-9-_]+)/', '<a href="http://search.twitter.com/search?q=%23$1" rel="nofollow">#$1</a>', $tweet);

  • http://twitter.com/SimianE Gary Stanton

    Thanks for this, worked a treat.

  • http://www.inframes.com Jon Ewing

    Really useful bit of a script – saved me a load of time, thanks