Whatterz


Parsing Twitter Usernames, Hashtags and URLs with JavaScript

by Simon.

Updated 10/05/2011

As part of an AIR project that I have been working on with my good friend Rob, we came across the need to parse a number of URLs within the text of a Twitter post. This may not sound too easy at first, but thanks to the prototype property available on JavaScript objects, our task was a relatively simple one.

The prototype object of JavaScript is a pre-built object that simplifies the process of adding custom properties or methods to all instances of an object. For example, older JavaScript engines (those predating ECMAScript 5) have no trim() method on the String class; through the wizardry of regular expressions and the prototype property, I can add one.

You simply need to specify String.prototype before your method definition. e.g.:

String.prototype.trim = function() {
	return this.replace(/^\s+|\s+$/g,"");
}
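
Current engines ship a native trim(), so a defensive variant only adds the method when it is missing. A minimal sketch, assuming nothing beyond standard JavaScript:

```javascript
// Only define trim() when the engine does not already provide one,
// so a native implementation is never overwritten.
if (typeof String.prototype.trim !== "function") {
	String.prototype.trim = function() {
		return this.replace(/^\s+|\s+$/g, "");
	};
}

var padded = "   hello world  ";
console.log("[" + padded.trim() + "]"); // prints "[hello world]"
```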

With this in mind, we can add methods to our String class, at runtime, that will allow us to manipulate the text string that is passed back in a Twitter JSON packet.

The Goal

To auto-magically parse different types of links within a text string. We will look at standard URL links, links applied to Twitter usernames and those applied to Hashtags.

Demo

The demonstration simply takes a test string and outputs it to the screen using JavaScript.

See the demo in action.

Parsing URLs as Links to the resource

First we create a custom method of the String.prototype property called parseURL. When invoked on a string, the regular expression finds any instance of a URL and will wrap the URL with an HTML anchor, with the correct href attribute and value applied.

String.prototype.parseURL = function() {
	return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&~\?\/.=]+/g, function(url) {
		return url.link(url);
	});
};

Demo 1.

We can simply demonstrate the parsing of the link with the following code in the body of the page:

<script type="text/javascript">
var test = "Simon Whatley's online musings can be found at: http://www.simonwhatley.co.uk";
document.write(test.parseURL());
</script>

In the above example, a simple string variable is created called test, which contains a URL. The text does not contain any HTML at this stage. We then write out the test variable applying the parseURL() method to it.

The resultant HTML generated is the following:

Simon Whatley's online musings can be found at: <a href="http://www.simonwhatley.co.uk">http://www.simonwhatley.co.uk</a>

When rendered in a browser, the code becomes a hyper-link.
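
The link() method only sets the href attribute. If the anchors need something more, such as a CSS class for styling or target="_blank" to open in a new window, one option is to build the tag by hand. A minimal sketch: the method name parseURLWithAttributes and the class name tweet-url are made up for illustration and are not part of the original code.

```javascript
// Hypothetical variant of parseURL that builds the anchor by hand so that
// extra attributes can be set; "tweet-url" is a made-up class name.
String.prototype.parseURLWithAttributes = function() {
	return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&~\?\/.=]+/g, function(url) {
		return '<a href="' + url + '" class="tweet-url" target="_blank">' + url + '</a>';
	});
};

var test = "My musings are at http://www.simonwhatley.co.uk";
console.log(test.parseURLWithAttributes());
```

The same approach would work for usernames and hashtags, each with its own class name.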

Parsing Usernames as Links to Twitter

Following on from the URL example above, we can apply a similar methodology to Twitter usernames since they can also be URLs to their associated Twitter page.

Again we create a custom method of the String.prototype property; this time we’ll call it parseUsername. The regular expression in this case finds all instances of @username. We then strip the @, as this is not part of the actual username, and apply the Twitter URL to the username.

String.prototype.parseUsername = function() {
	return this.replace(/[@]+[A-Za-z0-9-_]+/g, function(u) {
		var username = u.replace("@","");
		return u.link("http://twitter.com/"+username);
	});
};

Demo 2.

We can simply demonstrate this with the following code:

<script type="text/javascript">
var test = "@whatterz is writing a post about JavaScript.";
document.writeln(test.parseUsername());
</script>

The resultant HTML generated is the following:

<a href="http://twitter.com/whatterz">@whatterz</a> is writing a post about JavaScript.

Parsing Hashtags as Links to Twitter’s Search

Finally, Twitter also allows users to create hashtags within their posts. Hashtags are a community-driven convention for adding additional context and metadata to your tweets. Like regular URLs and usernames, hashtags can be parsed as a URL to an online resource, in this case, Twitter’s search.

Again we create a custom method of the String.prototype property; this time we’ll call it parseHashtag. The regular expression in this case finds all instances of #hashtag. The Twitter Search URL is then applied to the hashtag.

String.prototype.parseHashtag = function() {
	return this.replace(/[#]+[A-Za-z0-9-_]+/g, function(t) {
		var tag = t.replace("#","%23");
		return t.link("http://search.twitter.com/search?q="+tag);
	});
};

Demo 3.

We can simply demonstrate this with the following code:

<script type="text/javascript">
var test = "Simon is writing a post about #twitter and parsing hashtags as URLs";
document.writeln(test.parseHashtag());
</script>

The resultant HTML generated is the following:

Simon is writing a post about <a href="http://search.twitter.com/search?q=%23twitter">#twitter</a> and parsing hashtags as URLs

NB. Twitter’s search was originally provided by Summize. Summize was bought by Twitter in July 2008, and the search can now be found at http://search.twitter.com.

Where to take it next

Using the above code, we can now create a simple Twitter feed reader. Using a library such as jQuery to get and parse the Twitter JSON packet, we can then apply the prototype methods to the text entries.

It is also worth noting that it is possible to cascade the methods, so we can do the following:

<script type="text/javascript">
var test = "@whatterz is writing a blog post about #twitter, which can be found at http://www.simonwhatley.co.uk";
document.writeln(test.parseURL().parseUsername().parseHashtag());
</script>
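
The order of the calls matters. A sketch that repeats the three methods so it runs on its own, then compares two orderings; the variable names good and bad are only for illustration:

```javascript
// The three methods from this article, repeated so the snippet is self-contained.
String.prototype.parseURL = function() {
	return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&~\?\/.=]+/g, function(url) {
		return url.link(url);
	});
};
String.prototype.parseUsername = function() {
	return this.replace(/[@]+[A-Za-z0-9-_]+/g, function(u) {
		var username = u.replace("@", "");
		return u.link("http://twitter.com/" + username);
	});
};
String.prototype.parseHashtag = function() {
	return this.replace(/[#]+[A-Za-z0-9-_]+/g, function(t) {
		var tag = t.replace("#", "%23");
		return t.link("http://search.twitter.com/search?q=" + tag);
	});
};

var test = "@whatterz is writing about #twitter at http://www.simonwhatley.co.uk";

// URLs first: the later passes only wrap the @ and # tokens.
var good = test.parseURL().parseUsername().parseHashtag();

// Usernames first: parseURL then re-matches the twitter.com address that
// parseUsername placed inside its href, nesting an anchor in an attribute.
var bad = test.parseUsername().parseURL();

console.log(good);
console.log(bad);
```

Running parseURL() first avoids the problem because the other two methods never match text inside the anchors it generates for this input.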

Download the code

The example code can be downloaded from the demo page.

  • Catherine Mortali

    Great info!

    I’m using the ParseURL function on a block of text that I’m passing using a CF variable. It’s only grabbing the first URL and creating the hyperlink, not the whole text.

    Here’s my code snippet:

    var #toScript(thisString, "test")#;

    document.write(test.parseURL());

    Any ideas why it won’t work on the entire text?

    Thanks,
    Catherine

  • @Catherine, the post relates to pure JavaScript. If you need to use a ColdFusion variable, you need to make it available in the DOM for JavaScript to ‘read’ and then parse.

  • Amanda

    How would I use the parseURL function with a form submission? I want to be able to have a user enter some information in a text area, and if they included any urls, I need to use the parseURL function on them.

    My code:

    Tips:

    tips ?>

    I also tried using onSubmit and using just tips.parseURL…
    I am not very good with Javascript yet, so maybe you could explain what I’m supposed to do to get this to work properly?

    Thanks a lot!

  • Amanda

    oh, sorry lol the html was used as html…here it is:

    Tips:
    tips ?>
    

    hopefull this’ll show up…sorry about that!

  • Amanda

    and it didn’t….I don’t know how to show you the code then…

    I was using a textarea with id="tips" and was trying to call the parseURL method with onClick on the submit button .. onClick="document.write(tips.parseURL())" and also tried onSubmit instead of onClick and just tips.parseURL() as well

  • Steve

    Howdy! Thanks for the article.

    So how do I assign the contents of the JSON packet to a variable? Right now I’m using the little patch of Javascript that Twitter provided to display a few Twitter items, but I’m not sure how to actually execute these functions on that data. Is there a way to assign the content to a variable, so I can run these functions on it?

    Thanks for your help…

    Steve

  • @amanda I would suggest using server-side code to parse the URLs on form submission rather than JavaScript ‘on the fly’. However, you could use jQuery (or another JavaScript framework) and the above String methods to manipulate the contents of the textarea on an event such as onblur.

  • @steve it’s a simple case of declaring a variable and interrogating the webservice with, for example jQuery. e.g.:

    var json = $.getJSON( [url], [data], [callback] );

    You need to replace the bracketed (i.e. [url] etc) variables with your own parameters.

    N.B. The $ is jQuery, so you will also need to include the jQuery library.
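
With $.getJSON the parsed data is passed to the callback rather than returned, so the String methods are applied inside that callback. A minimal sketch, with a hard-coded sample standing in for the network response so that it runs on its own. The function name renderTweets and the sample data are made up, and each entry is assumed to carry its message in a text property.

```javascript
// parseHashtag from the article, repeated so the snippet is self-contained.
String.prototype.parseHashtag = function() {
	return this.replace(/[#]+[A-Za-z0-9-_]+/g, function(t) {
		var tag = t.replace("#", "%23");
		return t.link("http://search.twitter.com/search?q=" + tag);
	});
};

// Hypothetical callback: `tweets` is assumed to be an array of objects,
// each with a `text` property holding the message.
function renderTweets(tweets) {
	var html = [];
	for (var i = 0; i < tweets.length; i++) {
		html.push("<p>" + tweets[i].text.parseHashtag() + "</p>");
	}
	return html.join("\n");
}

// With jQuery, renderTweets would be called from the $.getJSON callback.
// Here a made-up sample stands in for the response.
var sample = JSON.parse('[{"text":"Learning #javascript"},{"text":"No tags here"}]');
console.log(renderTweets(sample));
```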

  • Hi Simon. Just wanted to say that this is exactly the answer to my dilemma. I’ve taken a moment to share this with the CMS Made Simple community and have a post on the forum regarding this. You can view it here http://forum.cmsmadesimple.org/index.php/topic,29083.new/spam,true.html

    Thanks for this!

  • using all of them in one go, as shown above, you might sometimes get errors if using large object references directly

    e.g. twitt = { "text" : "some twitter text http://somewhere.com", source : "web" }

    cascading … twitt.text.parseURL().parseUsername().parseHashtag() will give an error in FF, IE

    You could better still just use

    document.writeln( ( ( twitt.text.parseURL() ).parseUsername() ).parseHashtag() );

  • Luis

    Wow! Turn this into a WordPress plugin which can parse newly posted AND existing posts and i would be on cloud 9!

  • Works like a charm. Better than needing to code this myself.

    Many thanks.

  • I found this post while doing a quick search for a PHP function to do the thing. Mostly I was just glad someone had saved me the trouble of putting together the regex for each of the components.

    Anyhow, I’ll share my PHP version here in case any one else stumbling upon this post is looking for it.

    function twitterize($raw_text) {
    	$output = $raw_text;
     
    	// parse urls
    	$pattern = '/([A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+)/i';
    	$replacement = '<a href="$1" rel="nofollow">$1</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	// parse usernames
    	$pattern = '/[@]+([A-Za-z0-9-_]+)/';
    	$replacement = '@<a href="http://twitter.com/$1" rel="nofollow">$1</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	// parse hashtags
    	$pattern = '/[#]+([A-Za-z0-9-_]+)/';
    	$replacement = '#<a href="http://search.twitter.com/search?q=%23$1" rel="nofollow">$1</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	return $output;
    }

    Thanks again for saving me some work!

  • Looks like my code fell victim to an overzealous HTML parser. The $replacement variables above should be hyperlink tags, with the $1 as the URL and the value, e.g.

    a href="http://twitter.com/$1"

    Sorry that didn’t work out so well.

  • I don’t see that anyone else has mentioned this, but it looks like the javascript doesn’t handle multiple hashtags (I haven’t tested out multiple @usernames yet).
    For example: http://twitter.com/theadb/statuses/1429168203
    Any thoughts on how that might be handled?

  • I hacked Jim’s code from above and was successful in using it, so I thought I would share it for the PHP folks out there. Use ${1} instead of $1 to be able to add surrounding text, such as the <a> tags in this case.

    See “Example #1 Using backreferences followed by numeric literals” at http://ca.php.net/preg_replace

    function twitterize($raw_text) {
    	$output = $raw_text;
     
    	// parse urls
    	$pattern = '/([A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+)/i';
    	$replacement = '<a href="${1}" rel="nofollow">${1}</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	// parse usernames
    	$pattern = '/[@]+([A-Za-z0-9-_]+)/';
    	$replacement = '<a href="http://twitter.com/${1}" rel="nofollow">@${1}</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	// parse hashtags
    	$pattern = '/[#]+([A-Za-z0-9-_]+)/';
    	$replacement = '<a href="http://search.twitter.com/search?q=%23${1}" rel="nofollow">#${1}</a>';
    	$output = preg_replace($pattern, $replacement, $output);
     
    	return $output;
    }
  • @Jonathan regarding multiple hashtags, you’ll want to add the “global” modifier to the regular expression, ie:

    /[#]+[A-Za-z0-9-_]+/g

    … the “/g” instead of just “/” will match all occurrences.
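
The difference is easy to see with a plain replacement string. A small sketch; the sample tweet and the [tag] placeholder are made up for illustration:

```javascript
var tweet = "Reading about #javascript and #regex today";

// Without the g flag only the first hashtag is replaced.
var first = tweet.replace(/[#]+[A-Za-z0-9-_]+/, "[tag]");

// With the g flag every hashtag is replaced.
var all = tweet.replace(/[#]+[A-Za-z0-9-_]+/g, "[tag]");

console.log(first); // prints "Reading about [tag] and #regex today"
console.log(all);   // prints "Reading about [tag] and [tag] today"
```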

  • Oscar Rottink

    Thanks Simon, this was very very useful! I’m not a real programmer but sometimes I have an idea and like to try to realize it. This helped me so much.
    @Mark Quenzada, great! I had the same problem with multiple hashes and urls, this works like a charm. Happy person here :)

  • jim jim

    Thank you so much for this script. It is so helpful and saved me many hours and headaches.

  • anyone able to build this into a wordpress plugin??!

    that would be killer!

  • Super helpful, thanks! Made use of it on our website.

    Cheers mate.

    ` C

  • @simon,

    Thanks for this. Will save me a job! Hope things are going well with you at the moment.

    Cheers

  • i spent a couple hours writing regex before i found this… thanks for your work :) i should know by now almost everything has been done before

  • Aaaah this is good. I’m going to start doing this :)

  • Pingback: How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0 » jesal gadhia()

  • Just wanted to say thanks for this. You saved me a ton of time! :)

  • NobodyReally

    Very very handy, thanks for this!

  • Thanks – I just used this code for a quick mashup.

    Quick hint – the regexes need to end in /g so that they do global replace, not just replacing the first match. Eg.:

    /[#]+[A-Za-z0-9-_]+/g not /[#]+[A-Za-z0-9-_]+/

    Thanks!

    Stef

  • When using the routine for matching names (@niczak for example) how do you avoid matching email addresses as well? I think a \W@ should be in place to avoid this, other suggestions?

  • Conor

    Thanks very much for this code. I’ve used it in a Google Maps & Twitter Mashup. Very handy. Cheers

  • The regex to parse hashtags is getting only the first hashtag. If a tweet has two hashtags, for example, it will parse only the first one.

    To correct this, just put a “g” in the end of the regex. Like this:

    String.prototype.parseHashtag = function() {
    	return this.replace(/[#]+[A-Za-z0-9-_]+/g, function(t) {
    		var tag = t.replace("#","%23");
    		return t.link("http://search.twitter.com/search?q="+tag);
    	});
    };

    Thank you for the post!

  • Eduardo Cancino

    If you have an email address it recognizes it as twitter username, even if it is on a link

  • Eduardo Cancino

    If you have an email address, the username parser gets it wrong. To correct this:

    Just added ^ at the beginning of pattern, hope it helps

    String.prototype.parseUsername = function() {
    	return this.replace(/^[@]+[A-Za-z0-9-_]+/g, function(u) {
    		var username = u.replace("@","");
    		return u.link("http://twitter.com/"+username);
    	});
    };

  • All of the patterns used in here will match things that look like twitter elements but are not, for example # that’s part of a URL, @ that’s part of an email address. They also match misformatted items like @@ and ##. Eduardo Cancino suggested a fix that won’t really work as it will only match at the start of a string (which will catch most replies); you need to enforce that they are not preceded by a word boundary. Twitter user names can’t contain '-', so that should not be included in the class (which is wrong anyway as '-' has a special meaning unless it’s the last character in a range). Conveniently, the chars allowed in a twitter id are exactly those included in the \w class. Twitter IDs are limited to 20 characters, so anything that’s over 20 chars is not a twitter ID, so the pattern should assert that too.

    Hashtags are not limited to ASCII, so a better pattern for that should allow anything but whitespace (and by experimentation with twitter’s own client, some punctuation, extend as you like).

    So, revised regexes (should work in both PHP and JS) are:

    Twitter ID:

    /(?!\b)@[\w]{1,20}(?![\w])/

    Hashtag:

    /(?!\b)#[^\s.'"]+/

    So given this tweet:

    #hello @xyz @hello #hashtag nobody@example.com #blah @iamnotatwitteridatall http://www.examplecom/index.php#fragment #<3 #bababa

    It should extract @hello and @xyz twitter IDs (but NOT @example or @iamnotatwitteridatall) and #hello, #hashtag, #blah, #<3 and #bababa (but NOT #fragment).
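
A quick way to check patterns like these is to run them over the sample tweet. A sketch in JavaScript, writing the patterns with escaped \b, \w and \s and adding the g flag so that every occurrence is returned; the variable names are only for illustration:

```javascript
var tweet = "#hello @xyz @hello #hashtag nobody@example.com #blah " +
	"@iamnotatwitteridatall http://www.examplecom/index.php#fragment #<3 #bababa";

// (?!\b) rejects an @ or # that directly follows a word character,
// which rules out email addresses and URL fragments.
var userPattern = /(?!\b)@\w{1,20}(?!\w)/g;
var hashtagPattern = /(?!\b)#[^\s.'"]+/g;

console.log(tweet.match(userPattern));    // [ '@xyz', '@hello' ]
console.log(tweet.match(hashtagPattern)); // [ '#hello', '#hashtag', '#blah', '#<3', '#bababa' ]
```

Note that #blah is matched as well, since it is an ordinary hashtag preceded by a space.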

  • A83

    If I would like to add an class to each of these links so I could style hashtags, urls and usernames differently how would I do that?

    Thanks!

  • Life saver, great code =] thanks

  • thx!

  • Eric

    Hi,

    Great post, this solved my exact problem.  However, I still can’t figure out how to make the links open up in a new window/tab.  How can we modify the parseURL() function to set the target attribute of the anchor tag to _blank?

    Thanks,

    -Eric

  • Eric Atallah

    Hello,

    Is there anyway to add:  target=”_blank”  to the hyperlink so it opens in a new window/tab?

    Thanks!

  • Pingback: Parsing tweet for Hashtags, Usernames and URLs in Java « Intelligrape Groovy & Grails Blogs()

  • Thank you for your wonderful work!

  • roma8989

    great post… this works wonders…..

    is it possible to give these links different classes (you know, a diff class for hashtag urls than for user urls, for example), in case you want to have users & hashtags in diff font colors?

  • roma8989

    the url for the hashtag doesn’t work.. get this:

    The Twitter REST API v1 is no longer active. Please migrate to API v1.1.
    https://dev.twitter.com/docs/api/1.1/overview

    what is correct url for the hashtag with new 1.1 API, please…. went to Twitter, from their hashtag urls tried this

    var tag = t.replace("#","");
    return t.link('http://search.twitter.com/search?q=%23' + tag + '&src=hash');

    but get exact same message about migrating to API v1.1

    thank you…..

  • roma8989

    I don’t know why my comment about problem with parsing hashtag has been held for moderation for hours now, but I found solution:

    in function to parse hashtag, instead of this

    var tag = t.replace("#","%23");
    return t.link('http://search.twitter.com/search?q=' + tag);

    use this

    var tag = t.replace("#","");
    return t.link('https://twitter.com/search?q=%23' + tag + '&src=hash');

  • Tobias Feistmantl

    Great article! Thanks!