<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Simon Whatley &#187; Microsoft</title>
	<atom:link href="http://www.simonwhatley.co.uk/tag/microsoft/feed" rel="self" type="application/rss+xml" />
	<link>http://www.simonwhatley.co.uk</link>
	<description>The opposite of every great idea is another great idea</description>
	<lastBuildDate>Wed, 02 Nov 2011 09:28:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Mark Pilgrim &#8211; A Gentle Introduction to Video Encoding: Lossy Video Codecs</title>
		<link>http://www.simonwhatley.co.uk/mark-pilgrim-a-gentle-introduction-to-video-encoding-lossy-video-codecs</link>
		<comments>http://www.simonwhatley.co.uk/mark-pilgrim-a-gentle-introduction-to-video-encoding-lossy-video-codecs#comments</comments>
		<pubDate>Tue, 11 Oct 2011 09:00:51 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Adobe]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[AVI]]></category>
		<category><![CDATA[bbc]]></category>
		<category><![CDATA[Codec]]></category>
		<category><![CDATA[Container formats]]></category>
		<category><![CDATA[Dirac]]></category>
		<category><![CDATA[DivX]]></category>
		<category><![CDATA[DivX-certified]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[H.264]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[iTunes Store]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[Mark Pilgrim]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Microsoft Corporation]]></category>
		<category><![CDATA[mobile devices]]></category>
		<category><![CDATA[MPEG]]></category>
		<category><![CDATA[MPEG-1]]></category>
		<category><![CDATA[MPEG-2]]></category>
		<category><![CDATA[MPEG-4]]></category>
		<category><![CDATA[Ogg]]></category>
		<category><![CDATA[open source decoder software]]></category>
		<category><![CDATA[Theora]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[Video codecs]]></category>
		<category><![CDATA[WMV]]></category>
		<category><![CDATA[Xiph.org Foundation]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=4805</guid>
		<description><![CDATA[The most important consideration in video encoding is choosing a video codec. A future article will talk about how to pick the one that’s right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)]]></description>
			<content:encoded><![CDATA[<p><strong>This article was first published on 19th December 2008, on Mark Pilgrim&#8217;s website. That website no longer exists so this article serves as an historical record. I have preserved all emphasis and links as per the original article.</strong></p>
<p>The most important consideration in video encoding is choosing a video codec. A future article will talk about how to pick the one that&#8217;s right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)</p>
<p>When you talk about &#8220;watching a video,&#8221; you&#8217;re probably talking about a combination of one video stream, one audio stream, and possibly some subtitles or captions. But you probably don&#8217;t have two different files; you just have &#8220;the video.&#8221; Maybe it&#8217;s an AVI file, or an MP4 file. These are just container formats, like a ZIP file that contains multiple kinds of files within it. The container format defines how to store the video and audio streams in a single file (and subtitles too, if any).</p>
<p>When you &#8220;watch a video,&#8221; your video player is doing several things at once:</p>
<ol>
<li>Interpreting the container format to find out which video and audio tracks are available, and how they are stored within the file so that it can find the data it needs to decode next</li>
<li>Decoding the video stream and displaying a series of images on the screen</li>
<li>Decoding the audio stream and sending the sound to your speakers</li>
<li>Possibly decoding the subtitle stream as well, and showing and hiding phrases at the appropriate times while playing the video</li>
</ol>
<p>A <em>video codec</em> is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2 above. Your video player <em>decodes</em> the video stream according to the <em>video codec</em>, then displays a series of images, or &#8220;frames,&#8221; on the screen. Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. For example, instead of storing each individual frame (like a screenshot), they will only store the differences between frames. Most videos don&#8217;t actually change all that much from one frame to the next, so this allows for high compression rates, which results in smaller file sizes. (There are many, many other complicated tricks too, which I&#8217;ll dive into in a future article.)</p>
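<p>As a toy illustration of the store-only-the-differences idea, here is a minimal Python sketch. It assumes a frame is nothing more than a flat list of pixel values, and the names <code>encode_deltas</code> and <code>decode_deltas</code> are hypothetical. Real codecs use far more elaborate techniques than plain per-pixel subtraction, and this particular scheme is lossless, so treat it only as a picture of why mostly-unchanged frames compress well.</p>

```python
def encode_deltas(frames):
    """Keep the first frame whole, then store only per-pixel differences."""
    if not frames:
        return []
    encoded = [list(frames[0])]
    for previous, current in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(previous, current)])
    return encoded


def decode_deltas(encoded):
    """Rebuild each frame by adding its delta to the previously decoded frame."""
    frames = []
    for index, data in enumerate(encoded):
        if index == 0:
            frames.append(list(data))
        else:
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames


# Three tiny hypothetical frames of four pixels each; little changes between them.
frames = [
    [10, 10, 10, 10],
    [10, 10, 12, 10],
    [10, 10, 12, 11],
]
encoded = encode_deltas(frames)
print(encoded)
print(decode_deltas(encoded) == frames)
```

<p>After the first frame, the encoded data is almost entirely zeros, which is exactly the kind of data a general-purpose compressor shrinks well.</p>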
<p>There are <em>lossy</em> and <em>lossless</em> video codecs; today&#8217;s article will only deal with lossy codecs. A <em>lossy video codec</em> means that information is being irretrievably lost during encoding. Like copying an audio cassette tape, you&#8217;re losing information about the source video, and degrading the quality, every time you encode. Instead of the &#8220;hiss&#8221; of an audio cassette, a re-re-re-encoded video may look blocky, especially during scenes with a lot of motion. (Actually, this can happen even if you encode straight from the original source, if you choose a poor video codec or pass it the wrong set of parameters.) On the bright side, lossy video codecs can offer amazing compression rates, and many offer ways to &#8220;cheat&#8221; and smooth over that blockiness during playback, to make the loss less noticeable to the human eye.</p>
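<p>To make the loss concrete, the sketch below uses quantization (rounding values to a coarser grid), which is one common source of loss in lossy codecs. The sample values, the step sizes, and the function names are all assumptions chosen for illustration; no real codec works on five numbers like this.</p>

```python
def quantize(values, step):
    """Round every value to the nearest multiple of step (integer arithmetic only)."""
    return [(v + step // 2) // step * step for v in values]


def total_error(original, decoded):
    """Sum of absolute differences between the source and a decoded copy."""
    return sum(abs(a - b) for a, b in zip(original, decoded))


source = [3, 14, 27, 41, 58]          # hypothetical pixel values
first_pass = quantize(source, 10)     # encode once with a step of 10
second_pass = quantize(first_pass, 7) # re-encode the result with different settings

print(first_pass, total_error(source, first_pass))
print(second_pass, total_error(source, second_pass))
```

<p>Nothing in <code>first_pass</code> says whether a 10 started life as a 14 or a 12, so the source cannot be recovered. In this example, re-encoding with a different step moves the result further from the source rather than closer to it.</p>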
<p>There are <a href="http://samples.mplayerhq.hu/V-codecs/" title="Video codecs" target="_blank" rel="nofollow"><em>tons</em> of video codecs</a>. Today I&#8217;ll discuss five modern lossy video codecs: MPEG-4 ASP, H.264, VC-1, Theora, and Dirac.</p>
<h3>MPEG-4 ASP</h3>
<p>a.k.a. &#8220;MPEG-4 Advanced Simple Profile.&#8221; <a href="http://en.wikipedia.org/wiki/MPEG-4_Part_2" title="MPEG-4 ASP" target="_blank" rel="nofollow">MPEG-4 ASP</a> was developed by <a href="http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group" title="The MPEG Group" target="_blank" rel="nofollow">the MPEG group</a> and standardized in 2001. You may have heard of <a href="http://en.wikipedia.org/wiki/DivX" title="Wikipedia: DivX" target="_blank" rel="nofollow">DivX</a>, <a href="http://en.wikipedia.org/wiki/Xvid" title="Wikipedia: Xvid" target="_blank" rel="nofollow">Xvid</a>, or <a href="http://en.wikipedia.org/wiki/3ivx" title="Wikipedia: 3ivx" target="_blank" rel="nofollow">3ivx</a>; these are all competing implementations of the MPEG-4 ASP standard. <a href="http://www.xvid.org/" title="Xvid" target="_blank" rel="nofollow">Xvid is open source</a>; DivX and 3ivx are closed source. The company behind DivX has had some mainstream success in branding &#8220;DivX&#8221; as synonymous with &#8220;MPEG-4 ASP.&#8221; For example, this <a href="http://www.amazon.com/Philips-DVP642-DivX-Certified-Progressive-Scan-Player/dp/B000204SWE" title="Amazon: DivX certified DVD Player" target="_blank" rel="nofollow">&#8220;DivX-certified&#8221; DVD player</a> can actually play <a href="http://www.jarnot.com/twiki/bin/view/Public/DVP642LisaBsAVIGuide" title="MPEG-4 ASP videos" target="_blank" rel="nofollow">most MPEG-4 ASP videos</a> in an AVI container, even if they were created with a competing encoder. (To confuse things even further, the company behind <a href="http://en.wikipedia.org/wiki/DivX#DivX_Media_Format_.28DMF.29" title="DivX has now created their own container format" target="_blank" rel="nofollow">DivX has now created their own container format</a>.)</p>
<p><strong>MPEG-4 ASP is patent-encumbered</strong>; licensing is brokered through the <a href="http://www.mpegla.com/" title="MPEG LA Group" target="_blank" rel="nofollow">MPEG LA group</a>. MPEG-4 ASP video can be embedded in most popular container formats, including AVI, MP4, and MKV.</p>
<h3>H.264</h3>
<p>a.k.a. &#8220;MPEG-4 part 10,&#8221; a.k.a. &#8220;MPEG-4 AVC,&#8221; a.k.a. &#8220;MPEG-4 Advanced Video Coding.&#8221; <a href="http://en.wikipedia.org/wiki/H.264" title="Wikipedia: H.264" target="_blank" rel="nofollow">H.264</a> was also developed by the <a href="http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group" title="Wikipedia: Moving Picture Experts Group (MPEG)" target="_blank" rel="nofollow">MPEG group</a> and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. To accomplish this, the H.264 standard is split into &#8220;<a href="http://en.wikipedia.org/wiki/H.264#Profiles" title="Wikipedia: H.264 Profiles" target="_blank" rel="nofollow">profiles</a>,&#8221; which each define a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real-time.</p>
<p>To give you a rough idea of the range of profiles, <a href="http://www.apple.com/iphone/specs.html" title="Apple's iPhone supports Baseline profile" target="_blank" rel="nofollow">Apple&#8217;s iPhone supports Baseline profile</a>, the <a href="http://www.apple.com/appletv/specs.html" title="AppleTV supports Baseline and Main profiles" target="_blank" rel="nofollow">AppleTV set-top box supports Baseline and Main profiles</a>, and <a href="http://www.kaourantin.net/2007/08/what-just-happened-to-video-on-web_20.html" title="Adobe Flash supports Baseline, Main and High profiles" target="_blank" rel="nofollow">Adobe Flash on a desktop PC supports Baseline, Main, and High profiles</a>. YouTube (owned by Google, my employer) now uses H.264 to encode <a href="http://blog.wired.com/business/2008/12/youtube-adds-hd.html" title="high-definition videos" target="_blank" rel="nofollow">high-definition videos</a>, playable through Adobe Flash; YouTube also provides H.264-encoded video to mobile devices, including Apple&#8217;s iPhone and phones running Google&#8217;s <a href="http://code.google.com/android/" title="Android mobile operating system" target="_blank" rel="nofollow">Android mobile operating system</a>. Also, H.264 is one of the video codecs mandated by the Blu-Ray specification; Blu-Ray discs that use it generally use the High profile.</p>
<p>Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players) actually do the decoding on a dedicated chip, since their main CPUs are nowhere near powerful enough to decode the video in real-time. Recent high-end desktop graphics cards also support decoding H.264 in hardware. There are a number of <a href="http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_2007_en.html" title="Competing H.264 encoders" target="_blank" rel="nofollow">competing H.264 encoders</a>, including the <a href="http://www.videolan.org/developers/x264.html" title="Open source x264 library" target="_blank" rel="nofollow">open source x264 library</a>. The <strong>H.264 standard is patent-encumbered</strong>; licensing is brokered through the <a href="http://www.mpegla.com/" title="MPEG LA Group" target="_blank" rel="nofollow">MPEG LA group</a>. H.264 video can be embedded in most popular container formats, including MP4 (used primarily by <a href="http://www.apple.com/itunes/whatson/movies.html" title="Apple iTunes Store" target="_blank" rel="nofollow">Apple&#8217;s iTunes Store</a>) and MKV (used primarily by video pirates).</p>
<h3>VC-1</h3>
<p><a href="http://en.wikipedia.org/wiki/VC-1" title="Wikipedia: VC-1" target="_blank" rel="nofollow">VC-1</a> evolved from Microsoft&#8217;s WMV9 codec and was <a href="http://www.betanews.com/article/Microsoft_VC1_Codec_Now_a_Standard/1144097224" title="Codec standardised in 2006" target="_blank" rel="nofollow">standardized in 2006</a>. It is primarily used and promoted by Microsoft for high-definition video, although, like H.264, it has <a href="http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx#OverviewofVC1" title="VC-1 profiles" target="_blank" rel="nofollow">a range of profiles</a> to trade complexity for file size. Also like H.264, it is mandated by the Blu-Ray specification, and all Blu-Ray players are required to be able to decode it. <strong>The VC-1 codec is patent-encumbered</strong>, with licensing brokered through the <a href="http://www.mpegla.com/" title="MPEG LA Group" target="_blank" rel="nofollow">MPEG LA group</a>.</p>
<p>Wikipedia has a brief <a href="http://en.wikipedia.org/wiki/Comparison_of_H.264_and_VC-1" title="Wikipedia: Technical comparison of VC-1 and H.264" target="_blank" rel="nofollow">technical comparison of VC-1 and H.264</a>; <a href="http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx#VC1ComparedtoOtherCodecs" title="Microsoft comparison of VC-1 with other codecs" target="_blank" rel="nofollow">Microsoft has their own comparison</a>; Multimedia.cx has a <a href="http://wiki.multimedia.cx/index.php?title=H.264" title="Venn diagram outlining similarities and differences between codecs" target="_blank" rel="nofollow">pretty Venn diagram outlining the similarities</a> and differences. Multimedia.cx also discusses the <a href="http://wiki.multimedia.cx/index.php?title=VC-1" title="Technical features of VC-1" target="_blank" rel="nofollow">technical features of VC-1</a>. I also found this <a href="http://www.avsforum.com/avs-vb/showthread.php?p=9931723#post9931723" title="History of VC-1 and H.264" target="_blank" rel="nofollow">history of VC-1 and H.264</a> to be interesting (as well as <a href="http://archive2.avsforum.com/avs-vb/showthread.php?p=6594314#post6594314" title="Rebuttal" target="_blank" rel="nofollow">this rebuttal</a>).</p>
<p>VC-1 is designed to be container-independent, although it is most often embedded in an ASF container. An open source decoder for VC-1 video was a <a href="http://code.google.com/soc/2006/ffmpeg/appinfo.html?csaid=5AA777DB19E2BB24" title="2006 Google Summer of Code project" target="_blank" rel="nofollow">2006 Google Summer of Code project</a>, and the resulting code was added to the multi-faceted <a href="http://ffmpeg.mplayerhq.hu/" title="ffmpeg library" target="_blank" rel="nofollow">ffmpeg library</a>.</p>
<h3>Theora</h3>
<p><a href="http://en.wikipedia.org/wiki/Theora" title="Wikipedia: Theora" target="_blank" rel="nofollow">Theora</a> evolved from the VP3 codec and has subsequently been developed by the <a href="http://xiph.org/" title="Xiph Foundation" target="_blank" rel="nofollow">Xiph.org Foundation</a>. <strong>Theora is a royalty-free codec and is not encumbered by any known patents</strong> other than the original VP3 patents, which have been irrevocably licensed royalty-free. Although the standard has been &#8220;frozen&#8221; since 2004, the Theora project (which includes an open source reference encoder and decoder) <a href="http://lists.xiph.org/pipermail/theora-dev/2008-November/003736.html" title="Version 1.0 November 2008" target="_blank" rel="nofollow">only hit 1.0 in November 2008</a>.</p>
<p>Theora video can be embedded in any container format, although it is most often seen in an Ogg container. All major Linux distributions support Theora out-of-the-box, and <a href="https://developer.mozilla.org/web-tech/2008/10/14/firefox-31-beta-1-an-overview-of-features-for-web-developers/" title="Mozilla Firefox 3.1 includes native support for Theora video" target="_blank" rel="nofollow">Mozilla Firefox 3.1 will include native support for Theora video in an Ogg container</a>. And by &#8220;native&#8221;, I mean &#8220;available on all platforms without platform-specific plugins.&#8221; You can also play Theora video <a href="http://www.xiph.org/dshow/" title="Theora video on Windows" target="_blank" rel="nofollow">on Windows</a> or <a href="http://xiph.org/quicktime/" title="Theora video on Mac OS X" target="_blank" rel="nofollow">on Mac OS X</a> after installing Xiph.org&#8217;s open source decoder software.</p>
<p>The reference encoder included in Theora 1.0 is widely criticized for being slow and poor quality, but Theora 1.1 will include a new encoder that takes better advantage of Theora&#8217;s features, while staying backward-compatible with current decoders. (Info: <a href="http://web.mit.edu/xiphmont/Public/theora/demo.html" title="Demo 1" target="_blank" rel="nofollow">1</a>, <a href="http://web.mit.edu/xiphmont/Public/theora/demo2.html" title="Demo 2" target="_blank" rel="nofollow">2</a>, <a href="http://web.mit.edu/xiphmont/Public/theora/demo3.html" title="Demo 3" target="_blank" rel="nofollow">3</a>, <a href="http://web.mit.edu/xiphmont/Public/theora/demo4.html" title="Demo 4" target="_blank" rel="nofollow">4</a>, <a href="http://web.mit.edu/xiphmont/Public/theora/demo5.html" title="Demo 5" target="_blank" rel="nofollow">5</a>, <a href="http://svn.xiph.org/branches/theora-thusnelda/" title="source code" target="_blank" rel="nofollow">source code</a>.)</p>
<h3>Dirac</h3>
<p>Dirac was <a href="http://www.bbc.co.uk/rd/projects/dirac/" title="Dirac, developed by the BBC" target="_blank" rel="nofollow">developed by the BBC</a> to provide a royalty-free alternative to H.264 and VC-1 that the BBC could use to stream high-definition television content in Great Britain. Like H.264, Dirac aims to provide a single codec for the full spectrum of very low- and very high-bandwidth streaming. <strong>Dirac is not encumbered by any known patents</strong>, and there are two open source implementations, <a href="http://diracvideo.org/download/dirac-research/" title="Dirac research" target="_blank" rel="nofollow">dirac-research</a> (the BBC&#8217;s reference implementation) and <a href="http://www.diracvideo.org/download/schroedinger/" title="Schroedinger" target="_blank" rel="nofollow">Schroedinger</a> (optimized for speed).</p>
<p>The Dirac standard was only finalized in 2008, so there is very little mainstream use yet, although the <a href="http://www.ibc.org/cgi-bin/ibc_dailynews_cms.cgi?story_no=25368&#038;issue=4" title="Dirac used internally during the 2008 Olympics" target="_blank" rel="nofollow">BBC did use it internally during the 2008 Olympics</a>. Dirac-encoded video tracks can be embedded in several popular container formats, including <a href="http://www.diracvideo.org/wiki/index.php/DiracInISOM" title="MP4 format" target="_blank" rel="nofollow">MP4</a>, <a href="http://www.diracvideo.org/wiki/index.php/DiracInOgg" title="Ogg format" target="_blank" rel="nofollow">Ogg</a>, <a href="http://www.diracvideo.org/wiki/index.php/DiracInMatroska" title="MKV format" target="_blank" rel="nofollow">MKV</a>, and <a href="http://www.diracvideo.org/wiki/index.php/DiracInAVI" title="AVI format" target="_blank" rel="nofollow">AVI</a>. <a href="http://www.videolan.org/vlc/" title="VLC" target="_blank" rel="nofollow">VLC</a> 0.9.2 (<a href="http://www.diracvideo.org/node/19" title="VLC 0.9.2 released in September 2008" target="_blank" rel="nofollow">released in September 2008</a>) can play Dirac-encoded video within an Ogg or MP4 container.</p>
<p><strong>And on and on&#8230;</strong><br />
Of course, this is only scratching the surface of all the available video codecs. Video encoding goes way back, but my focus in this series is on the present and near-future, not the past. If you like, you can read about <a href="http://en.wikipedia.org/wiki/MPEG-2" title="Wikipedia: MPEG-2" target="_blank" rel="nofollow">MPEG-2</a> (used in DVDs), <a href="http://en.wikipedia.org/wiki/MPEG-1" title="Wikipedia: MPEG-1" target="_blank" rel="nofollow">MPEG-1</a> (used in Video CDs), older versions of Microsoft&#8217;s <a href="http://en.wikipedia.org/wiki/Windows_Media_Video#Windows_Media_Video" title="Wikipedia: Windows Media Video (WMV)" target="_blank" rel="nofollow">WMV</a> family, <a href="http://en.wikipedia.org/wiki/Sorenson_codec" title="Wikipedia: Sorenson codec" target="_blank" rel="nofollow">Sorenson</a>, <a href="http://en.wikipedia.org/wiki/Indeo" title="Wikipedia: Indeo" target="_blank" rel="nofollow">Indeo</a>, and <a href="http://en.wikipedia.org/wiki/Cinepak" title="Wikipedia: Cinepak" target="_blank" rel="nofollow">Cinepak</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/mark-pilgrim-a-gentle-introduction-to-video-encoding-lossy-video-codecs/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimise Your URLs for Web Crawlers and Indexing</title>
		<link>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing</link>
		<comments>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing#comments</comments>
		<pubDate>Thu, 08 Oct 2009 11:15:05 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Canonical]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Robots exclusion standard]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL redirection]]></category>
		<category><![CDATA[Web archiving]]></category>
		<category><![CDATA[web crawlers]]></category>
		<category><![CDATA[Web search engine]]></category>
		<category><![CDATA[webmaster]]></category>
		<category><![CDATA[world wide web]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=2011</guid>
		<description><![CDATA[Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: How easy is it for search engines to crawl your site?]]></description>
			<content:encoded><![CDATA[<p>Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: How easy is it for search engines to crawl your site?</p>
<p>The Internet is not simply a big place; it is a huge place, and new content is being created all the time. Google, Yahoo and Microsoft each have finite resources, so when faced with the nearly infinite quantity of content that&#8217;s available online, their various crawlers are only able to find and crawl a percentage of that content. Then, of all the content they&#8217;ve crawled, they&#8217;re only able to index a portion. Of course, as storage becomes ever cheaper, the search engines are able to index more and more content each day, but not at the pace the Web is growing.</p>
<p><abbr title="Uniform Resource Locator">URL</abbr>s are like the bridges between your website and a search engine&#8217;s crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your <abbr title="Uniform Resource Locator">URL</abbr>s) in order to get to your site&#8217;s content. If your <abbr title="Uniform Resource Locator">URL</abbr>s are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your <abbr title="Uniform Resource Locator">URL</abbr>s are organised and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different <abbr title="Uniform Resource Locator">URL</abbr>s.</p>
<p>So, what can you do as a website developer or owner to reduce that labyrinth of <abbr title="Uniform Resource Locator">URL</abbr>s and help crawlers find more of your content faster? Below are a few ideas:</p>
<ul>
<li><strong>Remove unnecessary query string details from the URL.</strong><br />
Parameters in the <abbr title="Uniform Resource Locator">URL</abbr> that don&#8217;t change the content of the page&#8211;like session <abbr title="Identifier">ID</abbr>s or list sort orders&#8211;can be removed from the <abbr title="Uniform Resource Locator">URL</abbr> and put into a cookie. By putting this information in a cookie and <a href="http://en.wikipedia.org/wiki/URL_redirection#HTTP_status_codes_3xx" title="Wikipedia: URL Redirection">301 redirecting</a> to a <q>clean</q> <abbr title="Uniform Resource Locator">URL</abbr>, you retain the information and reduce the number of <abbr title="Uniform Resource Locator">URL</abbr>s pointing to that same content.
</li>
<li><strong>Stop infinite pagination in, for example, lists and calendars.</strong><br />
If you have a calendar with infinite past and future dates, or a list with infinite pagination, you have what is described as an <q>infinite crawl space</q>, which is a huge burden on crawlers. To resolve the calendar issue, you can add <code>rel=&quot;nofollow&quot;</code> attributes to links to dynamically created future calendar pages. When creating pagination links, disable the previous and next links when the first and last pages are reached, and redirect users to an appropriate page if the query string in the <abbr title="Uniform Resource Locator">URL</abbr> is <q>hacked</q> (this may be a static <q>page not found</q> page).
</li>
<li><strong>Utilise the robots.txt file to prevent actions the web crawlers can&#8217;t or shouldn&#8217;t perform.</strong><br />
Using a <a href="http://www.robotstxt.org" title="Robots.txt" target="_blank" rel="nofollow">robots.txt</a> file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can&#8217;t and shouldn&#8217;t perform. This lets crawlers spend more of their time crawling content that they can actually do something with.
</li>
<li><strong>Prevent duplicate content.</strong><br />
An ideal scenario for crawlers is a one-to-one link between content and a <abbr title="Uniform Resource Locator">URL</abbr>. Each <abbr title="Uniform Resource Locator">URL</abbr> leads to a unique bit of content and each piece of content can be accessed by a unique <abbr title="Uniform Resource Locator">URL</abbr>. The closer your site can get to this scenario, the more streamlined your site will be for crawling and indexing. If your CMS makes this difficult to achieve, you can use the <a href="/canonical-urls-what-are-they-all-about">canonical tag</a> to indicate a preferred <abbr title="Uniform Resource Locator">URL</abbr> for duplicate content.
</li>
</ul>
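<p>The first two ideas in the list above can be sketched in a few lines of Python. This is only an outline under stated assumptions: the parameter names <code>sessionid</code> and <code>sort</code>, and the function names, are hypothetical, and actually issuing the 301 redirect or setting the cookie is left to whatever web framework you use.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute whatever your own application emits.
NON_CONTENT_PARAMS = {"sessionid", "sort"}


def clean_url(url):
    """Return the URL without query parameters that do not change the content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def pagination_links(page, last_page):
    """Return previous/next page numbers, with None where a link is disabled."""
    if page not in range(1, last_page + 1):
        # Out-of-range request: the caller should redirect or show "page not found".
        return None
    return {
        "previous": page - 1 if page != 1 else None,
        "next": page + 1 if page != last_page else None,
    }


print(clean_url("http://www.site.com/category/product.html?sessionid=abc123&sort=price&colour=red"))
print(pagination_links(1, 5))
print(pagination_links(9, 5))
```

<p>A request whose cleaned <abbr title="Uniform Resource Locator">URL</abbr> differs from the requested one is a candidate for a 301 redirect to the cleaned form.</p>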
<p>More information on this topic can be found on the <a href="http://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking#duplicate-content" title="Google Webmaster Central Blog" target="_blank" rel="nofollow">Google Webmaster Central Blog</a>.</p>
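<p>For the robots.txt suggestion in the list above, one way to check your rules before publishing them is with the parser in the Python standard library. The rules, the paths and the <code>ExampleBot</code> user agent below are hypothetical and not taken from any real site.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules covering pages a crawler cannot usefully act on.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /contact
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, user_agent="ExampleBot"):
    """Report whether the rules above let the given user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(crawl_allowed("http://www.site.com/cart"))
print(crawl_allowed("http://www.site.com/category/product.html"))
```

<p>Rules match by path prefix, so <code>/cart</code> also covers anything beneath it, such as a checkout step under <code>/cart/</code>.</p>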
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/optimise-your-urls-for-web-crawlers-and-indexing/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Canonical URLs &#8211; What Are They All About?</title>
		<link>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about</link>
		<comments>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about#comments</comments>
		<pubDate>Wed, 07 Oct 2009 09:34:23 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Canonical]]></category>
		<category><![CDATA[Duplicate content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[Search engine optimization]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[search results]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Uniform Resource Identifier]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[web application]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=2043</guid>
		<description><![CDATA[Carpe diem on any duplicate content worries: Google, Yahoo and Microsoft now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that’s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.]]></description>
			<content:encoded><![CDATA[<p>Google announced as long ago as February, in their official <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" title="" target="_blank" rel="nofollow ">Webmaster Central Blog</a> a new canonical <abbr title="Universal Resource Locator">URL</abbr> tag:</p>
<blockquote><p>Carpe diem on any duplicate content worries: we now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that&#8217;s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.</p></blockquote>
<p>But what do they mean by <q>canonical</q>? One of the definitions of <q>canonical</q> is <q>reduced to the simplest and most significant form possible without loss of generality.</q></p>
<p>What this means is that if you have a page&#8211;let&#8217;s take an e-commerce product page&#8211;and the simplest <abbr title="Uniform Resource Locator">URL</abbr> that you want it to be accessible by is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/category/product.html</pre></div></div>

<p>you can add the canonical tag to that product page. Google, Yahoo and Microsoft use the tag to determine which <abbr title="Uniform Resource Locator">URL</abbr> their search engines should record for the current page.</p>
<p>Now, let&#8217;s say that the particular software you use <strong>also</strong> allows you to access the same product using:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/company/product.html</pre></div></div>

<p>and</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.site.com/different_category/product.html</pre></div></div>

<p>Perhaps this one product is in multiple categories. With the tag in place, whenever any of the alternate pages is loaded, it tells the search engine that this is really the same product as the page defined in the canonical tag. So you can still make the content available however it is needed (by categories, tags, or some other organisation system) while avoiding having it treated as duplicate content and penalised.</p>
<p>To implement the canonical <abbr title="Uniform Resource Locator">URL</abbr> tag in your web application, add the following inside the <code>&lt;head&gt;</code> section of each duplicate page:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;link rel=&quot;canonical&quot; href=&quot;http://www.site.com/category/product.html&quot; /&gt;</pre></div></div>

<p>As Google mention, this tag is a hint that they <q>honour strongly</q>. Google will take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.</p>
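<p>As a rough illustration of the mapping described above, the following Python sketch works out the preferred address for each alternate product page. The <code>canonical_url</code> function and the idea that the first path segment is always the category are my own assumptions for the example, not part of the canonical tag format itself.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, canonical_category="category"):
    """Return url with its first path segment swapped for the preferred category."""
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    # Assumption: the first segment is the category, the rest identifies the product.
    segments[0] = canonical_category
    path = "/" + "/".join(segments)
    # Drop any query string or fragment so every variant maps to one address.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for duplicate in (
    "http://www.site.com/company/product.html",
    "http://www.site.com/different_category/product.html",
):
    # Both alternates resolve to http://www.site.com/category/product.html
    print(canonical_url(duplicate))
```

<p>The value returned is what you would place in the <code>href</code> attribute of the canonical tag on each alternate page.</p>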
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/canonical-urls-what-are-they-all-about/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Work Philosophy</title>
		<link>http://www.simonwhatley.co.uk/my-work-philosophy</link>
		<comments>http://www.simonwhatley.co.uk/my-work-philosophy#comments</comments>
		<pubDate>Thu, 05 Mar 2009 15:29:19 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Strategy]]></category>
		<category><![CDATA[Adobe]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[Asides]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Dev Opera]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Freelancing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Code]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[philosophy]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[web community]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[zen]]></category>
		<category><![CDATA[Zoho]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1833</guid>
		<description><![CDATA[Okay, so many of the points below aren’t purely my philosophy, but ideas and principles I have picked up along the way throughout my [development] career. Some relate to the UNIX philosophy, or even the Zen of Python, but wherever they’re from, they can be applied to many other domains.]]></description>
			<content:encoded><![CDATA[<p>Okay, so many of the points below aren&#8217;t purely my philosophy, but ideas and principles I have picked up along the way throughout my [development] career. Some relate to the <a href="http://en.wikipedia.org/wiki/Unix_philosophy" title="Wikipedia: UNIX Philosophy" target="_blank" rel="nofollow">UNIX philosophy</a>, or even the <a href="http://www.python.org/dev/peps/pep-0020/" title="Zen of Python" target="_blank" rel="nofollow">Zen of Python</a>, but wherever they&#8217;re from, they can be applied to many other domains.</p>
<ul>
<li><strong>Don&#8217;t reinvent the wheel unless you really have to</strong>. Borrow code and ideas from elsewhere whenever it makes sense. The web community is great at sharing: just look at the various JavaScript libraries, the huge quantities of <abbr title="Application Programming Interface">API</abbr>s or indeed the major players&#8217; developer areas: <a href="http://code.google.com" title="Google Code" target="_blank" rel="nofollow">Google Code</a>, <a href="http://developer.yahoo.com" title="Yahoo! Developer Network" target="_blank" rel="nofollow">Yahoo! Developer Network</a>, <a href="https://developer.mozilla.org" title="Mozilla Developer Center" target="_blank" rel="nofollow">Mozilla Developer Center</a>, <a href="http://www.adobe.com/devnet/" title="Adobe Developer Connection" target="_blank" rel="nofollow">Adobe Developer Connection</a> and <a href="http://dev.opera.com" title="Dev Opera" target="_blank" rel="nofollow">Dev Opera</a> to name five I regularly refer to.</li>
<li><q><strong>Things should be as simple as possible, but no simpler</strong></q> (Einstein). This idea is really born out of and emphasised by <a href="http://gettingreal.37signals.com/" title="37Signals' Getting Real" target="_blank" rel="nofollow">37Signals&#8217; Getting Real book</a>. Commonly, 90% of people using an application only use 10% of its functionality. The key therefore is to find what people use most often and build only that functionality. If there is a requirement to add more, then so be it. This can also apply at the code level, the essence being a balance between over- and under-engineering something.</li>
<li><strong>Do one thing well</strong> (The <q>UNIX philosophy</q>). It is better to do one thing well than several things badly. This could be at the code level (think encapsulation, coupling and cohesion) or indeed at the application level: you&#8217;re never going to beat Microsoft Word, but Google and Zoho have developed compelling alternatives with far fewer features.</li>
<li><strong>Don&#8217;t fret too much about performance</strong> &#8212; understand how to write efficient code and plan to optimise later if or when needed.</li>
<li><strong>Don&#8217;t try for perfection</strong> because <q>good enough</q> is often just that. This of course is a matter of judgement. If I were working on a personal project, I may be more stringent on perfection than, say, for a client&#8217;s application. This doesn&#8217;t mean to say the client&#8217;s application would be any worse, but rather it is a question of dotting the i&#8217;s and crossing the t&#8217;s. It also depends on your perspective and what gains can be made by aiming for <q>perfection</q>.</li>
<li>(Hence) <strong>it&#8217;s okay to cut corners sometimes</strong>, but only if you can do it right later. I rarely adhere to this! It makes sense to do it right the first time, since <q>bodge-jobs</q> often come back to haunt you and result in double the effort!</li>
<li><strong>Don&#8217;t fight it; go with the flow</strong>. This is somewhat clich&eacute;d, but the essence behind it is to try to avoid getting stressed out. This isn&#8217;t always easy to achieve, but taking a step back from a situation and avoiding politics is important.</li>
</ul>
<p>I often strive for perfection, which isn&#8217;t an entirely clever pursuit since it is almost impossible to achieve. However, in a realm of imperfection, the principles above have helped me to achieve a modicum of decent code throughout the years. They may also resonate and provide inspiration for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/my-work-philosophy/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enabling Search Engine Safe URLs with Apache and htaccess</title>
		<link>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess</link>
		<comments>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess#comments</comments>
		<pubDate>Mon, 08 Dec 2008 15:57:15 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[.htaccess]]></category>
		<category><![CDATA[All]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[ColdBox]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[Fusebox]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[httpd.conf]]></category>
		<category><![CDATA[ISAPI]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mod_rewrite]]></category>
		<category><![CDATA[New Brunswick]]></category>
		<category><![CDATA[None]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search engine optimisation]]></category>
		<category><![CDATA[search engine robots]]></category>
		<category><![CDATA[search engine safe]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[URL rewriting]]></category>
		<category><![CDATA[USD]]></category>
		<category><![CDATA[web applications]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1635</guid>
		<description><![CDATA[An increasingly popular technique among websites and in particular, blogs, is the idea of making URLs search engine friendly, or safe, on the premise that doing so will help search engine optimisation. Removing the obscure query string element of a URL and replacing it with keyword-rich alternatives makes it more readable not only for a human being, but also for the venerable robots that allow our page content to be found in the first place.]]></description>
			<content:encoded><![CDATA[<p>An increasingly popular technique among websites and in particular, blogs, is the idea of making <abbr title="Uniform Resource Locator">URL</abbr>s search engine friendly, or safe, on the premise that doing so will help search engine optimisation. Removing the obscure query string element of a <abbr title="Uniform Resource Locator">URL</abbr> and replacing it with keyword-rich alternatives makes it more readable not only for a human being, but also for the venerable robots that allow our page content to be found in the first place.</p>
<p>For example, the following is WordPress&#8217; default URL configuration for a post:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/?p=1635</pre></div></div>

<p>However, by using the URL rewriting available in the Apache webserver, we can achieve a far better result, such as the following:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">http://www.domain.com/search-engine-safe-urls</pre></div></div>

<p>NB. It is also possible to achieve a similar result with an <abbr title="Internet Server Application Programming Interface">ISAPI</abbr> rewrite for Microsoft&#8217;s <abbr title="Internet Information Services">IIS</abbr> webserver, but that topic is not covered in this post.</p>
<p>To get your website working with <abbr title="search engine safe">SES</abbr> <abbr title="Uniform Resource Locator">URL</abbr>s you need to enable both the <code>mod_rewrite</code> module and the <code>AllowOverride</code> directive in the Apache configuration file.</p>
<p>Uncomment the following line (remove the leading #) to load the rewrite module:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">LoadModule rewrite_module modules/mod_rewrite.so</pre></div></div>

<p>Change the <code>AllowOverride</code> directive from <q>None</q> to <q>All</q>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;directory /&gt;
    Options FollowSymLinks
    AllowOverride all
    Order deny,allow
    Deny from all
&lt;/directory&gt;
&nbsp;
&lt;directory &quot;C:/WebRoot&quot;&gt;
    # Possible values for the Options directive are &quot;None&quot;, &quot;All&quot;,
    # or any combination of:
    #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
    #
    # Note that &quot;MultiViews&quot; must be named *explicitly* --- &quot;Options All&quot;
    # doesn't give it to you.
    #
    # The Options directive is both complicated and important.  Please see
    # http://httpd.apache.org/docs/2.2/mod/core.html#options
    # for more information.
    #
    Options Indexes FollowSymLinks
&nbsp;
    #
    # AllowOverride controls what directives may be placed in .htaccess files.
    # It can be &quot;All&quot;, &quot;None&quot;, or any combination of the keywords:
    #   Options FileInfo AuthConfig Limit
    #
    AllowOverride All
&nbsp;
    #
    # Controls who can get stuff from this server.
    #
    Order allow,deny
    Allow from all
&lt;/directory&gt;</pre></div></div>

<p>On Apache webservers, <code>.htaccess</code> (hypertext access) is the default name of directory-level configuration files. An <code>.htaccess</code> file is placed in a particular directory, and the directives in the <code>.htaccess</code> file apply to that directory and all its subdirectories. It provides the ability to customise configuration for requests to that particular directory; in our case, to enable search engine safe (<abbr title="search engine safe">SES</abbr>) <abbr title="Uniform Resource Locator">URL</abbr>s.</p>
<p>Setting the <code>AllowOverride</code> directive to <q>All</q> in effect defers configuration settings to the <code>.htaccess</code> file.</p>
<p>An example <code>.htaccess</code> file could include the following code to rewrite the URLs:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L,QSA]</pre></div></div>

<p>Search engine friendly <abbr title="Uniform Resource Locator">URL</abbr>s are implemented with a rewrite engine. The rewrite engine modifies the <abbr title="Uniform Resource Locator">URL</abbr> based upon a number of rewrite conditions and rules.</p>
<p>The <code>RewriteBase</code> directive explicitly sets the base <abbr title="Uniform Resource Locator">URL</abbr> for per-directory rewrites. Each <code>RewriteCond</code> directive defines a condition for the rule that follows; here, the two conditions restrict the rule to requests that do not match an existing file (<code>!-f</code>) or directory (<code>!-d</code>). Finally, the <code>RewriteRule</code> directive is the real rewriting workhorse. In this example, a regular expression captures everything in the <abbr title="Uniform Resource Identifier">URI</abbr> (i.e. not including the protocol, HTTP/S, and the domain name). This is then appended to the default file reference, index.php, as a <a href="http://www.regular-expressions.info/brackets.html" title="Regular Expression: back references" target="_blank" rel="nofollow">back reference</a>. The <code>[L,QSA]</code> flags mark the rule as the last one to be processed and append any query string parameters to the rewritten URL. It is important to note that this is all done on the server side; the user will never see the website address changing in the browser&#8217;s address bar. Furthermore, simply replacing the index.php filename with your default file name (e.g. index.cfm, default.aspx) will have the same result. Indeed, the above rewrite rules are becoming a de facto standard for web applications.</p>
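<p>To make the flow of those rules a little more concrete, here is a minimal Python sketch that mimics them. It is only a model of the behaviour, not how Apache implements it: the <code>rewrite</code> function, its <code>existing_paths</code> set (standing in for the file system checks) and the sample paths are all assumptions made for the example.</p>

```python
import re


def rewrite(path, existing_paths, query=""):
    """Mimic the .htaccess rules: route anything that is not a real file to index.php."""
    # RewriteCond !-f and !-d: requests for real files and directories pass through untouched.
    if path in existing_paths:
        target = path
    else:
        # RewriteRule ^(.*)$ index.php/$1 : the captured group plays the role of $1.
        match = re.match(r"^(.*)$", path)
        target = "index.php/" + match.group(1)
    # QSA: keep any query string that came with the original request.
    if query:
        target = target + "?" + query
    return target


# Hypothetical document root contents, used in place of real file system checks.
real_files = {"index.php", "images/logo.png"}

print(rewrite("search-engine-safe-urls", real_files))
print(rewrite("images/logo.png", real_files))
print(rewrite("archive", real_files, query="page=2"))
```

<p>The path is given without its leading slash because, in a per-directory context such as <code>.htaccess</code>, that is how the rule sees it.</p>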
<p>To fully understand the <code>mod_rewrite</code> rules above, look at the <a href="http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html" title="Apache mod_rewrite documentation" target="_blank" rel="nofollow">Apache mod_rewrite documentation</a>.</p>
<p>Once you have your <abbr title="Search Engine Safe">SES</abbr> functionality in place on the webserver, it is then the responsibility of your application framework to understand the <abbr title="Uniform Resource Locator">URL</abbr> construction and handle it accordingly. Fortunately, frameworks such as <a href="http://www.coldboxframework.com" title="ColdBox Framework" target="_blank" rel="nofollow">ColdBox</a> and <a href="http://www.fusebox.org" title="Fusebox Framework" target="_blank" rel="nofollow">Fusebox</a> for ColdFusion, and <a href="http://framework.zend.com" title="Zend PHP framework" target="_blank" rel="nofollow">Zend</a> and <a href="http://www.symfony-project.com" title="Symfony PHP framework" target="_blank" rel="nofollow">Symfony</a> for <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, all contain functionality to do this, but that is the subject of an entirely different post.</p>
<p>Users of web applications prefer short, neat <abbr title="Uniform Resource Locator">URL</abbr>s to raw query string parameters. A concise <abbr title="Uniform Resource Locator">URL</abbr> is easy to remember, and less time-consuming to type in. If the <abbr title="Uniform Resource Locator">URL</abbr> can be made to relate clearly to the content of the page, then not only are errors less likely to happen, but our good friends the search engine robots are able to draw a stronger assumption of the page&#8217;s relevance and content.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/enabling-search-engine-safe-urls-with-apache-and-htaccess/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google, Yahoo and Microsoft Webmaster Tools</title>
		<link>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools</link>
		<comments>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools#comments</comments>
		<pubDate>Fri, 10 Oct 2008 11:25:29 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[sitemaps]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[webmasters]]></category>
		<category><![CDATA[website]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1312</guid>
		<description><![CDATA[The first step to increasing your site’s visibility on the top search engines such as Google, Yahoo! and MSN is to help their respective robots crawl and index your site. To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file. Conversely and importantly, webmasters can also notify the search engines about the existence and importance of pages with a sitemap.xml file]]></description>
			<content:encoded><![CDATA[<p>The first step to increasing your site&#8217;s visibility on the top search engines such as <a href="http://www.google.com" title="Google Search" target="_blank" rel="nofollow">Google</a>, <a href="http://www.yahoo.com" title="Yahoo! Search" target="_blank" rel="nofollow">Yahoo!</a> and <a href="http://www.msn.com" title="Microsoft Search" target="_blank" rel="nofollow">MSN</a> is to help their respective robots crawl and index your site.</p>
<p>To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file. Conversely and importantly, webmasters can also notify the search engines about the existence and importance of pages with a sitemap.xml file. (Both files are placed in the root directory of the domain.)</p>
<p>Fortunately for the webmaster, the major search engines provide various tools to help manage both Sitemap and Robot files.</p>
<p>To gain an understanding of both &#8216;protocols&#8217;, I&#8217;ll discuss them briefly below.</p>
<h3>Sitemaps (Inclusion Protocol)</h3>
<p>The Sitemaps protocol allows a webmaster to inform search engines about <abbr title="Uniform Resource Locator">URL</abbr>s on a website that are available for crawling. A Sitemap is an <abbr title="eXtensible Markup Language">XML</abbr> file that lists the <abbr title="Uniform Resource Locator">URL</abbr>s for a site. It allows webmasters to include additional information about each <abbr title="Uniform Resource Locator">URL</abbr>: when it was last updated, how often it changes, and how important it is in relation to other <abbr title="Uniform Resource Locator">URL</abbr>s in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a <abbr title="Uniform Resource Locator">URL</abbr> inclusion protocol and complement robots.txt, a <abbr title="Uniform Resource Locator">URL</abbr> exclusion protocol.</p>
<p>The webmaster can generate a Sitemap containing all accessible <abbr title="Uniform Resource Locator">URL</abbr>s on the site and submit it to search engines. Since Google, MSN, Yahoo! and Ask now all use the same protocol, a single Sitemap keeps the biggest search engines informed about your updated pages.</p>
<p>Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover <abbr title="Uniform Resource Locator">URL</abbr>s. By submitting Sitemaps to a search engine, a webmaster is only helping that engine&#8217;s crawlers to do a better job of crawling their site(s). Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results.</p>
<p>The following is a cut-down version of the sitemap.xml for this website. WordPress, via a plugin, automatically updates this file each time a new post or page is written.</p>

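<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;urlset xmlns=&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xsi:schemaLocation=&quot;http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd&quot;&gt;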
&lt;url&gt;
&lt;loc&gt;http://www.simonwhatley.co.uk/&lt;/loc&gt;
&lt;lastmod&gt;2008-10-08T14:50:16+00:00&lt;/lastmod&gt;
&lt;changefreq&gt;daily&lt;/changefreq&gt;
&lt;priority&gt;1.0&lt;/priority&gt;
&lt;/url&gt;
&lt;url&gt;
&lt;loc&gt;http://www.simonwhatley.co.uk/big-city-little-people&lt;/loc&gt;
&lt;lastmod&gt;2008-10-08T14:50:16+00:00&lt;/lastmod&gt;
&lt;changefreq&gt;monthly&lt;/changefreq&gt;
&lt;priority&gt;0.1&lt;/priority&gt;
&lt;/url&gt;
&lt;/urlset&gt;</pre></div></div>
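
<p>If you are not using a plugin, a file like this is easy enough to generate yourself. The following is a minimal Python sketch using the standard library; the <code>build_sitemap</code> function and the shape of its input tuples are my own assumptions for illustration, and it leaves out the optional schema location attribute.</p>

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap string from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        # One child element per piece of information about the URL.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("http://www.simonwhatley.co.uk/", "2008-10-08T14:50:16+00:00", "daily", "1.0"),
    ("http://www.simonwhatley.co.uk/big-city-little-people",
     "2008-10-08T14:50:16+00:00", "monthly", "0.1"),
]))
```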

<p>More information about sitemaps can be found on the <a href="http://www.sitemaps.org" title="Sitemaps.org website" target="_blank" rel="nofollow">Sitemaps.org website</a>.</p>
<h3>Robots (Exclusion Protocol)</h3>
<p>The robot exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is otherwise publicly viewable. Robots are often used by search engines to categorise and archive web sites. The standard complements Sitemaps, a robot inclusion standard for websites.</p>
<p>A robots.txt file on a website will function as a request that specified robots ignore specified files or directories in their search. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorisation of the site as a whole.</p>
<p>The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so that marking an area of a site out of bounds with robots.txt does not guarantee privacy. Some web site administrators have tried to use the robots file to make private parts of a website invisible to the rest of the world, but the file is necessarily publicly available and its content is easily checked by anyone with a web browser.</p>
<p>For example, the following tells all crawlers not to enter four directories of a website:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/</pre></div></div>
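
<p>To check how a cooperating robot would read those rules, you can feed them to the robots.txt parser in Python&#8217;s standard library. This is only a sketch: the <code>is_allowed</code> helper and the example.com addresses are assumptions made for the example.</p>

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

# Parse the rules directly from text rather than fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="*"):
    """Return True if a well-behaved crawler may fetch the given URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://www.example.com/index.html"))
print(is_allowed("http://www.example.com/private/notes.html"))
```

<p>Remember that this only models a robot that chooses to obey the file; nothing stops a badly behaved one from ignoring it.</p>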

<p>Exclusion can also be achieved on a page-level basis using a meta tag, placed in the <abbr title="HyperText Markup Language">HTML</abbr> head of a web page. The <code>robots</code> meta tag controls whether search engine spiders are allowed to index a page, and whether they should follow links from it.</p>
<p>A common example could be as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot;
	&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; dir=&quot;ltr&quot; lang=&quot;en-GB&quot; xml:lang=&quot;en&quot;&gt;
 &lt;head profile=&quot;http://gmpg.org/xfn/11&quot;&gt;
	&lt;title&gt;Simon Whatley&lt;/title&gt;
	&lt;meta name=&quot;robots&quot; content=&quot;index,follow&quot; /&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></div></div>

<p>A word of caution, though: meta tags are not the best option for preventing search engines from indexing content on your website, since a crawler has to fetch the page before it can read the tag.</p>
<p>More information about Robots.txt files can be found on the <a href="http://www.robotstxt.org/" title="Robotstxt.org website" target="_blank" rel="nofollow">Robotstxt.org website</a>.</p>
<h3>Webmaster Tools</h3>
<p>The top 3 search providers all have their own webmaster tools admin interface. The Google offering is the most advanced, but it&#8217;s good practice to use and submit information to all three.</p>
<p>Links to their services are provided below:</p>
<ul>
<li><a href="http://www.google.com/webmasters/" title="Google Webmasters" target="_blank" rel="nofollow">Google Webmasters</a></li>
<li><a href="http://siteexplorer.search.yahoo.com" title="Yahoo! Site Explorer" target="_blank" rel="nofollow">Yahoo! Site Explorer</a></li>
<li><a href="http://webmaster.live.com" title="Live Search Webmaster Centre" target="_blank" rel="nofollow">Live Search Webmaster Centre</a></li>
</ul>
<p>Ask doesn&#8217;t have an interface. However, you can still ping their Submission Service using the <abbr title="Uniform Resource Locator">URL</abbr> <code>http://submissions.ask.com/ping?sitemap=</code> in conjunction with your sitemap <abbr title="Uniform Resource Locator">URL</abbr>.</p>
<h3>Further Information</h3>
<ul>
<li><a href="http://sourceforge.net/project/showfiles.php?group_id=137793&#038;package_id=153422" title="SourceForge Project: Sitemap Generator" target="_blank" rel="nofollow">Sitemap Generator</a></li>
<li><a href="http://code.google.com/sm_thirdparty.html" title="Google Code: Sitemap 3rd Party programs and websites" target="_blank" rel="nofollow">Sitemap 3rd Party programs and websites</a></li>
<li><a href="http://www.webmaster-toolkit.com/" title="Webmaster Toolkit" target="_blank" rel="nofollow">Webmaster Toolkit</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL User-Defined Function: ReplaceChars</title>
		<link>http://www.simonwhatley.co.uk/sql-user-defined-function-replacechars</link>
		<comments>http://www.simonwhatley.co.uk/sql-user-defined-function-replacechars#comments</comments>
		<pubDate>Fri, 19 Sep 2008 14:46:13 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[database server]]></category>
		<category><![CDATA[extend]]></category>
		<category><![CDATA[fairly straight forward]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[sub-routine]]></category>
		<category><![CDATA[subroutine]]></category>
		<category><![CDATA[t-sql]]></category>
		<category><![CDATA[UDF]]></category>
		<category><![CDATA[user defined function]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=425</guid>
		<description><![CDATA[The SQL Replace function enables us to look for a certain character phrase in a string and replace it with another character phrase. The updated string is then returned by the function.]]></description>
			<content:encoded><![CDATA[<p>The <abbr title="Structured Query Language">SQL</abbr> <code>REPLACE</code> function enables us to look for a certain character phrase in a string and replace it with another character phrase. The updated string is then returned by the function.</p>
<p>The syntax for this string function is the same for <abbr title="Structured Query Language">SQL</abbr> Server, Oracle and Microsoft Access. The syntax is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span>stringToLookIn, stringToMatch, replacementsString<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>The syntax is fairly straightforward: the <em>stringToLookIn</em> parameter is the string to search, the <em>stringToMatch</em> parameter is the character phrase that we want to replace, and the <em>replacementsString</em> is the character phrase that will replace any occurrence of the stringToMatch parameter. If the stringToMatch phrase occurs more than once in the string, then all instances of the phrase will be replaced with the replacement string. If no matches are found then the string is returned unaltered.</p>
<p>If we want to match multiple items, we need to nest the <code>REPLACE</code> function:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span>stringToLookIn, stringToMatch, replacementsString<span style="color: #66cc66;">&#41;</span>, stringToMatch, replacementsString<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>or assign the result to a variable and apply <code>REPLACE</code> to that variable repeatedly:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">stringReturned <span style="color: #0000ff;">=</span> <span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span>stringToLookIn, stringToMatch, replacementsString<span style="color: #66cc66;">&#41;</span>
stringReturned <span style="color: #0000ff;">=</span> <span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span>stringReturned, stringToMatch, replacementsString<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>This is far from ideal, and it becomes harder to read and maintain as the number of strings to match grows. This is where <a href="http://en.wikipedia.org/wiki/User-defined_function" title="Wikipedia: User-Defined Functions" target="_blank" rel="nofollow">User-Defined Functions</a> (<abbr title="User-Defined Functions">UDF</abbr>s) can provide the answer.</p>
<p>A User-Defined Function is a function provided by the user of a program or environment. In <abbr title="Structured Query Language">SQL</abbr> databases, a user-defined function provides a mechanism for extending the functionality of the database server by adding a function that can be evaluated in <abbr title="Structured Query Language">SQL</abbr> statements.</p>
<h3>The Function Code</h3>
<p>Below is the complete function definition:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">CREATE FUNCTION dbo.udf_ReplaceChars
<span style="color: #66cc66;">&#40;</span>
@ReplaceList		VARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>,
@String			VARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>
RETURNS VARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span>
AS
BEGIN
	DECLARE	@CHAR		CHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>,
		@Loop		INT
&nbsp;
	-- @Loop is incremented before it is used, so start at 0 and
	-- stop once it has reached the last character of the list
	SET @Loop  <span style="color: #0000ff;">=</span> <span style="color: #cc66cc;">0</span>
	WHILE @Loop <span style="color: #66cc66;">&lt;</span> <span style="color: #ff00ff;">LEN</span><span style="color: #66cc66;">&#40;</span>@ReplaceList<span style="color: #66cc66;">&#41;</span>
	BEGIN
		SET	@Loop <span style="color: #0000ff;">=</span> @Loop <span style="color: #0000ff;">+</span> <span style="color: #cc66cc;">1</span>
		-- Take the next character from the list and strip it from the string
		SET	@CHAR <span style="color: #0000ff;">=</span> <span style="color: #ff00ff;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@ReplaceList, @Loop, <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>
		SET	@String <span style="color: #0000ff;">=</span> <span style="color: #ff00ff;">REPLACE</span><span style="color: #66cc66;">&#40;</span>@String, @CHAR, <span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span>
	END
&nbsp;
	RETURN		@String
&nbsp;
END
GO</pre></div></div>

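<p>The function loops over the replace list one character at a time and removes every occurrence of that character from the string by replacing it with an empty string. The cleaned string is then returned from the function. Note that every character in the list is treated as a character to remove, so any separators in the list, such as commas, are stripped from the string as well.</p>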
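<p>For readers who find the loop easier to follow outside <abbr title="Structured Query Language">SQL</abbr>, here is a minimal Python sketch of the same idea. It is an illustration only, not a translation of the T-SQL: the name <code>replace_chars</code> is hypothetical, and Python strings do not share the padding and length limits of <code>CHAR</code> and <code>VARCHAR</code>.</p>

```python
def replace_chars(replace_list, string):
    """Remove from `string` every character that appears in `replace_list`."""
    for char in replace_list:
        # Replacing with an empty string deletes every occurrence of char.
        string = string.replace(char, "")
    return string


# The comma in the list is treated like any other character to remove.
cleaned = replace_chars("=,/", "a=b, c/d")
print(cleaned)
```

<p>The example list contains a comma on purpose: it shows that a separator in the list is itself removed from the input.</p>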
<h3>The Function In Use</h3>
<p>A very simple use of the replace function could be as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">SELECT dbo.udf_ReplaceChars<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'=,/,&lt;,&gt;,@,~,#'</span>, columnName<span style="color: #66cc66;">&#41;</span> AS newColumn, columnName
FROM tableName</pre></div></div>

<p>The function is not restricted to <code>SELECT</code> statements. Below is an example of an <code>UPDATE</code> statement utilising a variable:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">UPDATE tableName
SET columnName <span style="color: #0000ff;">=</span> dbo.udf_ReplaceChars<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'=,/,&lt;,&gt;,@,~,#'</span>, @variableName<span style="color: #66cc66;">&#41;</span>
WHERE idName <span style="color: #0000ff;">=</span> @myId</pre></div></div>

<h3>Download the Code</h3>
<p><a href="/examples/sql/functions/udf_ReplaceChars.txt" title="Download the code">Download the code</a>, rename the file so that it has a .sql extension and run it against your database instance. You will then be able to reference the function in your stored procedures.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/sql-user-defined-function-replacechars/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#039;s In Google Chrome&#039;s User-Agent String</title>
		<link>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string</link>
		<comments>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string#comments</comments>
		<pubDate>Fri, 12 Sep 2008 12:10:43 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Browsers]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Chrome's address bar]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Chrome]]></category>
		<category><![CDATA[Google Inc.]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[HyperText Transfer Protocol]]></category>
		<category><![CDATA[Internet Explorer]]></category>
		<category><![CDATA[Internet users]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Microsoft Vista]]></category>
		<category><![CDATA[Microsoft Windows]]></category>
		<category><![CDATA[mobile phones]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Official Build Google Inc.]]></category>
		<category><![CDATA[Opera]]></category>
		<category><![CDATA[operating system]]></category>
		<category><![CDATA[Safari]]></category>
		<category><![CDATA[United States]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[User Agent]]></category>
		<category><![CDATA[web crawlers]]></category>
		<category><![CDATA[Web Standards era]]></category>
		<category><![CDATA[webmaster]]></category>
		<category><![CDATA[windowing system]]></category>
		<category><![CDATA[Windows NT]]></category>
		<category><![CDATA[X11]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1123</guid>
		<description><![CDATA[With the advent of Google Chrome there has been a lot of media coverage regarding the browser’s uptake and how it will compete with Internet Explorer, Firefox and Safari. This is where the User Agent becomes most valuable.]]></description>
			<content:encoded><![CDATA[<p>With the advent of <a href="http://www.google.com/chrome/" title="" target="_blank" rel="nofollow">Google Chrome</a> there has been a lot of media coverage regarding the browser&#8217;s uptake and how it will compete with Internet Explorer, Firefox and Safari. This is where the User Agent becomes most valuable. It can be used in analytics software to determine the browser share and consequently aid the development of the website.</p>
<p>But what is a User Agent? A User Agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those which access the Web. Web user agents range from web browsers and e-mail clients to search engine crawlers (<q>spiders</q>), as well as mobile phones, screen readers and braille browsers used by people with disabilities. When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This string is sent as a header in the <abbr title="HyperText Transfer Protocol">HTTP</abbr> request, prefixed with <strong>user-agent:</strong>, and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a <abbr title="Uniform Resource Locator">URL</abbr> and/or e-mail address so that the webmaster can contact the operator of the bot.</p>
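<p>To make the header concrete, the following Python sketch pulls the user agent out of a raw <abbr title="HyperText Transfer Protocol">HTTP</abbr> request. The request text, the host name <code>www.example.com</code> and the helper name <code>get_header</code> are all made up for illustration; a real server or framework would normally do this parsing for you.</p>

```python
# A hand-written request for illustration; the header values are assumptions.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) "
    "AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13\r\n"
    "Accept-Language: en-US\r\n"
    "\r\n"
)


def get_header(raw_request, name):
    """Return the value of the named header, or None if it is absent."""
    header_lines = raw_request.split("\r\n")[1:]  # skip the request line
    for line in header_lines:
        if not line:
            break  # a blank line ends the header section
        field, _, value = line.partition(":")
        # Header field names are compared without regard to case.
        if field.strip().lower() == name.lower():
            return value.strip()
    return None


user_agent = get_header(RAW_REQUEST, "user-agent")
print(user_agent)
```

<p>Because the comparison ignores case, looking up <code>user-agent</code>, <code>User-Agent</code> or <code>USER-AGENT</code> gives the same result.</p>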
<p>By simply typing <strong>about:version</strong> into Chrome&#8217;s address bar you will be presented with the following information:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Google Chrome
0.2.149.29 (1798)
Official Build
Google Inc.
Copyright © 2006-2008 Google Inc. All Rights Reserved.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13</pre></div></div>

<p>As you can see Chrome&#8217;s version information provides limited detail about the browser. The last line is the important one. It is the <abbr title="HyperText Transfer Protocol">HTTP</abbr> <em>User-Agent</em> header:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13</pre></div></div>

<p>If you know the <a href="http://tools.ietf.org/html/rfc2616" title="RFC 2616 Hypertext Transfer Protocol - HTTP/1.1" target="_blank" rel="nofollow">RFC 2616</a> specification on the HyperText Transfer Protocol &#8212; which incidentally, I gladly don&#8217;t &#8212; you would know that the User Agent, or more formally, product token, should be short and to the point:</p>
<blockquote><p>
Product tokens SHOULD be short and to the point. They MUST NOT be used for advertising or other non-essential information. Although any token character MAY appear in a product-version, this token SHOULD only be used for a version identifier (i.e., successive versions of the same product SHOULD only differ in the product-version portion of  the product value).
</p></blockquote>
<p>Clearly this isn&#8217;t the case! One of Google&#8217;s reasons behind creating the Chrome browser was to start afresh. It would therefore have been truly amazing if they had made the string simply <em>Chrome/0.2.149.29</em>.</p>
<p>Unfortunately, <a href="http://en.wikipedia.org/wiki/Browser_sniffing" title="Wikipedia: Browser Sniffing" target="_blank" rel="nofollow">browser sniffing</a> makes an ever-growing <abbr title="User-Agent">UA</abbr> string the path of least resistance for browser vendors.</p>
<p>So, what does Chrome&#8217;s User Agent string actually mean?</p>
<ul>
<li><strong>Mozilla/</strong> &#8211; This means that the browser has the kind of capabilities that Netscape 1.1 had compared to <a href="http://en.wikipedia.org/wiki/Mosaic_(web_browser)" title="Wikipedia: Mosaic Web Browser" target="_blank" rel="nofollow">Mosaic</a> and <a href="http://en.wikipedia.org/wiki/Lynx_(web_browser)" title="Wikipedia: Lynx Web Browser" target="_blank" rel="nofollow">Lynx</a>.</li>
<li><strong>5.0</strong> &#8211; This means that the browser engine is from the post-Browser War Web Standards era as opposed to being from the Browser War era.</li>
<li><strong>(Windows;</strong> &#8211; This means that the general windowing system flavor the browser runs on is Windows (as opposed to, for example, Macintosh or X11).</li>
<li><strong>U;</strong> &#8211; This means that the browser has at least the level of <a href="http://en.wikipedia.org/wiki/User_agent#Encryption_strength_.22U.22_.2F_.22I.22_.2F_.22N.22" title="Wikipedia: Encryption Strength" target="_blank" rel="nofollow">cryptographic capability / encryption strength</a> that U.S. versions of browsers had in the late 1990s.</li>
<li><strong>Windows NT 6.0;</strong> &#8211; This indicates the operating system the browser is running on. In this instance, the browser is running on Windows Vista.</li>
<li><strong>en-US)</strong> &#8211; This indicates the user interface language of the browser (U.S. English in this case). This may be used to choose between different <em>content</em> languages even though <abbr title="HyperText Transfer Protocol">HTTP</abbr> has a different header for that purpose.</li>
<li><strong>AppleWebKit/</strong> &#8211; This indicates that the engine of the browser is <a href="http://webkit.org/" title="Webkit opensource project" target="_blank" rel="nofollow">WebKit</a> as opposed to being <a href="http://developer.mozilla.org/en/Gecko" title="Mozilla: Gecko Layout Engine" target="_blank" rel="nofollow">Gecko</a>. Developers should not do user agent sniffing as a rule, but if they still do, this is what they should be sniffing.</li>
<li><strong>525.13</strong> &#8211; This is the WebKit version from which Chrome branched its copy. Site admins could use this to detect old versions with known bugs.</li>
<li><strong>(KHTML, like Gecko)</strong> &#8211; This introduces the substring <q>Gecko</q> into the <abbr title="User-Agent">UA</abbr> string while pointing out to human readers that WebKit was forked from <a href="http://en.wikipedia.org/wiki/KHTML" title="Wikipedia: KHTML" target="_blank" rel="nofollow">KHTML</a>. Without this substring, Chrome might be put in the same category as <abbr title="Internet Explorer">IE</abbr> and Netscape 4.</li>
<li><strong>Chrome/</strong> &#8211; This string identifies the browser as actually Google Chrome.</li>
<li><strong>0.2.149.29</strong> &#8211; This is the Chrome version. This could be used to detect old versions with known bugs.</li>
<li><strong>Safari/</strong> &#8211; This means that the browser is like Safari as opposed to being like Firefox.</li>
<li><strong>525.13</strong> &#8211; This just repeats the WebKit version in order to have <em>some</em> version but not the irrelevant Safari.app version.</li>
</ul>
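<p>The breakdown above can be reproduced mechanically. The following Python sketch splits the string into its <em>Name/Version</em> product tokens and its parenthesised comments. It is a rough illustration rather than a complete <abbr title="User-Agent">UA</abbr> parser: the regular expression and the function name <code>parse_user_agent</code> are assumptions made for this example, and real-world strings contain many variations that this does not handle.</p>

```python
import re

UA = ("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) "
      "AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13")

# A product token looks like Name/Version; a comment is text in parentheses.
TOKEN_OR_COMMENT = re.compile(r"([^\s/()]+)/([^\s()]+)|\(([^)]*)\)")


def parse_user_agent(ua):
    """Split a User-Agent string into product versions and comments."""
    products, comments = {}, []
    for name, version, comment in TOKEN_OR_COMMENT.findall(ua):
        if name:
            products[name] = version
        else:
            # Comment fields are separated by semicolons.
            comments.append([part.strip() for part in comment.split(";")])
    return products, comments


products, comments = parse_user_agent(UA)
print(products)
print(comments)
```

<p>With the tokens separated, a check such as <code>"AppleWebKit" in products</code> identifies the engine without relying on the browser name, which is in keeping with the advice in the list above.</p>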
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/whats-in-google-chromes-user-agent-string/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft&#039;s Latest Browser Packaged with &#039;Porn Mode&#039;</title>
		<link>http://www.simonwhatley.co.uk/microsofts-latest-browser-packaged-with-porn-mode</link>
		<comments>http://www.simonwhatley.co.uk/microsofts-latest-browser-packaged-with-porn-mode#comments</comments>
		<pubDate>Mon, 01 Sep 2008 17:58:39 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[advertisers]]></category>
		<category><![CDATA[adverts]]></category>
		<category><![CDATA[Browsers]]></category>
		<category><![CDATA[Computer Weekly]]></category>
		<category><![CDATA[content network]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[IE8]]></category>
		<category><![CDATA[Internet Explorer]]></category>
		<category><![CDATA[Internet Explorer 8]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[online advertisers]]></category>
		<category><![CDATA[revenue]]></category>
		<category><![CDATA[web browsing]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1021</guid>
		<description><![CDATA[This week's Computer Weekly magazine's Downtime section has an interesting story: In what is likely to be better news for men than women, Microsoft's latest browser, Internet Explorer 8, boasts a feature that allows users to hide the trail of their web browsing.]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s <a href="http://www.computerweekly.com/" title="Computer Weekly" target="_blank" rel="nofollow">Computer Weekly</a> magazine&#8217;s Downtime section has an interesting story:</p>
<blockquote><p>In what is likely to be better news for men than women, Microsoft&#8217;s latest browser, Internet Explorer 8, boasts a feature that allows users to hide the trail of their web browsing.</p></blockquote>
<p>The feature, predictably nicknamed &#8220;porn mode&#8221;, stops casual users and, crucially, online advertisers from seeing a browser&#8217;s audit trail. This means that advertisers cannot easily target adverts based upon a user&#8217;s viewing habits. Conversely, advert providers such as Google&#8217;s AdSense cannot easily reimburse members of their content network, since they do not know where the user clicked an advert.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/microsofts-latest-browser-packaged-with-porn-mode/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Launching Yourself as a Freelancer &#8211; Publicity</title>
		<link>http://www.simonwhatley.co.uk/launching-yourself-as-a-freelancer-publicity</link>
		<comments>http://www.simonwhatley.co.uk/launching-yourself-as-a-freelancer-publicity#comments</comments>
		<pubDate>Wed, 06 Aug 2008 11:44:05 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Strategy]]></category>
		<category><![CDATA[Adobe]]></category>
		<category><![CDATA[aggregators]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[blog owner]]></category>
		<category><![CDATA[brand]]></category>
		<category><![CDATA[branding]]></category>
		<category><![CDATA[brightkite]]></category>
		<category><![CDATA[ColdFusion]]></category>
		<category><![CDATA[contractor]]></category>
		<category><![CDATA[demo example applications]]></category>
		<category><![CDATA[europe]]></category>
		<category><![CDATA[freelance]]></category>
		<category><![CDATA[freelancer]]></category>
		<category><![CDATA[Freelancing]]></category>
		<category><![CDATA[FriendFeed]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[micro-blogging]]></category>
		<category><![CDATA[micro-blogging services]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[moveabletype expressionweb]]></category>
		<category><![CDATA[online presence]]></category>
		<category><![CDATA[online world]]></category>
		<category><![CDATA[plurk]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[socialthing]]></category>
		<category><![CDATA[temporary]]></category>
		<category><![CDATA[tumblr]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[typepad]]></category>
		<category><![CDATA[united kingdom]]></category>
		<category><![CDATA[United States]]></category>
		<category><![CDATA[wordpress]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=752</guid>
		<description><![CDATA[In the first part of this series I talked about setting yourself up in business. The next step is to publicise yourself and your skills. At this point, it is helpful to know exactly what line of work you want to be focusing on, since you will need to target your efforts.]]></description>
			<content:encoded><![CDATA[<p>In the <a href="/launching-yourself-as-a-freelancer" title="Launching Yourself as a Freelancer">first part of this series</a> I talked about setting yourself up in business. The next step is to publicise yourself and your skills. At this point, it is helpful to know exactly what line of work you want to be focusing on, since you will need to target your efforts.</p>
<p>In the dim and distant past, the job of publicising yourself was extremely difficult. Can you imagine life without the Internet, mobile telephones and email? How did people ever do business? With the advent of the World Wide Web and in particular search engines and blogging, this all changed and a wealth of opportunity has become available, especially to the freelancer.</p>
<p>But where do you start?</p>
<h3>Create a Brand</h3>
<p>Creating a brand is a great way to market yourself. This does not have to be the same as your company, and through time you may set up different brands for different sectors or ideas you may have. Brands serve to create associations and therefore, expectations of products you create, so a good brand name is a great way to get recognised in your community.</p>
<p>You can <a href="http://www.ipo.gov.uk" title="UK Intellectual Property Office" target="_blank" rel="nofollow">register the brand</a> in the <acronym title="United Kingdom">UK</acronym>, Europe and the <acronym title="United States">US</acronym>, although the latter requires a <acronym title="United States">US</acronym> address. It is also not a given that your brand registration will be successful, making it a costly exercise. Careful consideration is what is needed here.</p>
<h3>Create an Avatar</h3>
<p>Avatars are images or icons that represent you in the online world. They are an extension of your brand. For example, the header of my website is also my <a href="http://en.wikipedia.org/wiki/Favicon" title="Wikipedia: Favicon" target="_blank" rel="nofollow">favicon</a> and <a href="http://en.wikipedia.org/wiki/Avatar_(computing)" title="Wikipedia: Avatar" target="_blank" rel="nofollow">avatar</a> on various online services. It is a great way for people to draw an association between your online presence and you.</p>
<h3>Create a Blog</h3>
<p>Blogs are a great way to get yourself known and therefore heard amongst your peer group. Your blog should really be an extension of your brand and is a great avenue to showcase your skills, demo example applications, code and designs, or simply give your opinion on a subject.</p>
<p>I use the excellent <a href="http://wordpress.org" title="WordPress" target="_blank" rel="nofollow">WordPress</a> blogging application, in a self-hosted environment. You don&#8217;t need to do this since there is a hosted version at <a href="http://www.wordpress.com" title="WordPress.com" target="_blank" rel="nofollow">WordPress.com</a>, or you could use <a href="http://www.blogger.com" title="Blogger" target="_blank" rel="nofollow">Blogger</a>, another popular blogging platform, provided by Google.</p>
<p>The key to blogging is to talk about what you enjoy; don&#8217;t just keep it technical. Blogs should be an extension of you, not an avenue for pretentious comment; you&#8217;ll soon be found out!</p>
<p>If you go the self-hosted route, you&#8217;ll need a domain name, hosting provider and obviously a blog application. I have listed a few below that can get you started.</p>
<p>Domain Names:</p>
<ul>
<li>
<a href="http://www.nominet.org.uk/" title="Nominet" target="_blank" rel="nofollow">Nominet</a></li>
<li>
<a href="http://www.easily.co.uk" title="Easily" target="_blank" rel="nofollow">Easily</a></li>
<li>
<a href="http://www.eurodns.com" title="EuroDNS" target="_blank" rel="nofollow">EuroDNS</a></li>
</ul>
<p>Hosting Providers:</p>
<ul>
<li>
<a href="http://www.hostmysite.com" title="HostMySite" target="_blank" rel="nofollow">HostMySite</a></li>
<li>
<a href="http://www.titanhosts.net" title="Titan Internet" target="_blank" rel="nofollow">Titan Internet</a></li>
<li>
<a href="http://www.flinthosts.co.uk" title="Flint Hosts" target="_blank" rel="nofollow">Flint Hosts</a></li>
<li>
<a href="http://www.ukhost4u.co.uk" title="UKHost4U" target="_blank" rel="nofollow">UKHost4U</a></li>
<li>
<a href="http://www.1and1.co.uk" title="1and1" target="_blank" rel="nofollow">1and1</a></li>
</ul>
<p>Blog Applications:</p>
<ul>
<li>
<a href="http://wordpress.org" title="WordPress.org" target="_blank" rel="nofollow">WordPress</a> (free)</li>
<li>
<a href="http://www.movabletype.org" title="Movable Type" target="_blank" rel="nofollow">Movable Type</a> (free)</li>
<li>
<a href="http://expressionengine.com" title="ExpressionEngine" target="_blank" rel="nofollow">ExpressionEngine</a> (free)</li>
<li>
<a href="http://www.typepad.com" title="TypePad" target="_blank" rel="nofollow">TypePad</a></li>
</ul>
<p>If going the self-hosted route is all too complicated for you, or you simply don&#8217;t want the hassle that is associated with self-hosting, all is not lost. WordPress.com and Blogger are for you.</p>
<p>Blog Hosting Providers:</p>
<ul>
<li>
<a href="http://www.wordpress.com" title="WordPress.com" target="_blank" rel="nofollow">WordPress.com</a></li>
<li>
<a href="http://www.blogger.com" title="Blogger" target="_blank" rel="nofollow">Blogger</a></li>
</ul>
<p>Both services take the onus away from the user when it comes to management (backups, plugins etc). At the simplest level, all you need to do is create and publish the content.</p>
<h3>Join Feed Aggregators</h3>
<p>To get noticed in the blogosphere, you can&#8217;t simply rely on the Google, Yahoo! and Microsoft search engines ranking your site. You will need to alert your peers to the fact that you&#8217;ve created some content that is worth reading. You can achieve this with feed aggregators.</p>
<p>Below I list a few that I use:</p>
<ul>
<li>
<a href="http://feeds.adobe.com" title="Adobe Feeds" target="_blank" rel="nofollow">Adobe</a></li>
<li>
<a href="http://www.fullasagoog.com" title="Full as a Goog" target="_blank" rel="nofollow">Full-as-a-Goog</a></li>
<li>
<a href="http://coldfusionbloggers.org" title="ColdFusion Bloggers" target="_blank" rel="nofollow">ColdFusionBloggers</a></li>
<li>
<a href="http://www.feed-squirrel.com" title="Feed Squirrel" target="_blank" rel="nofollow">Feed Squirrel</a></li>
<li>
<a href="http://londonbloggers.iamcal.com" title="London Bloggers" target="_blank" rel="nofollow">London Bloggers</a></li>
</ul>
<p>If you use WordPress, then you&#8217;re in luck. WordPress has a service called <a href="http://pingomatic.com" title="Ping-o-matic!" target="_blank" rel="nofollow">Ping-o-matic</a>, which updates different search engines when your blog has been updated. You can also add your own services to ping and therefore notify the service of new content.</p>
<h3>Comment on Blogs</h3>
<p>Commenting on blogs is another great way of getting yourself known as well as offering an opinion. Since comments allow you to include a link back to your website, try and comment as your brand.</p>
<p>One tip: try not to be defamatory towards the blog owner or others unless you have a strong justification for doing so. It&#8217;s all about the karma!</p>
<h3>Join Micro-Blogging Services</h3>
<p>If blogging is not your thing or you don&#8217;t have time to write articles, there are a number of blogging and, more importantly, <a href="http://en.wikipedia.org/wiki/Micro-blogging" title="Wikipedia: Micro-Blogging" target="_blank" rel="nofollow">micro-blogging</a> services available to you that allow you to get your thoughts out into the wide world.</p>
<p>Such services include the not-always-venerable <a href="http://twitter.com" title="Twitter" target="_blank" rel="nofollow">Twitter</a>, the feature-rich <a href="http://pownce.com" title="Pownce" target="_blank" rel="nofollow">Pownce</a>, the new kid on the block <a href="http://www.plurk.com" title="Plurk" target="_blank" rel="nofollow">Plurk</a> and the blogging service <a href="http://www.tumblr.com" title="Tumblr" target="_blank" rel="nofollow">Tumblr</a>.</p>
<p>Building a following will allow you to announce to your followers important events and ask questions of them.</p>
<h3>What&#8217;s Next</h3>
<p>In the next part of this series, I&#8217;ll talk about networking, a natural extension to publicising yourself on the web.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/launching-yourself-as-a-freelancer-publicity/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

