<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Simon Whatley &#187; webmasters</title>
	<atom:link href="http://www.simonwhatley.co.uk/tag/webmasters/feed" rel="self" type="application/rss+xml" />
	<link>http://www.simonwhatley.co.uk</link>
	<description>The opposite of every great idea is another great idea</description>
	<lastBuildDate>Wed, 02 Nov 2011 09:28:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Google, Yahoo and Microsoft Webmaster Tools</title>
		<link>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools</link>
		<comments>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools#comments</comments>
		<pubDate>Fri, 10 Oct 2008 11:25:29 +0000</pubDate>
		<dc:creator>Simon</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[sitemaps]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[webmasters]]></category>
		<category><![CDATA[website]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.simonwhatley.co.uk/?p=1312</guid>
		<description><![CDATA[The first step to increasing your site’s visibility on the top search engines such as Google, Yahoo! and MSN is to help their respective robots crawl and index your site. To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file. Conversely and importantly, webmasters can also notify the search engines about the existence and importance of pages with a sitemap.xml file.]]></description>
			<content:encoded><![CDATA[<p>The first step to increasing your site&#8217;s visibility on the top search engines such as <a href="http://www.google.com" title="Google Search" target="_blank" rel="nofollow">Google</a>, <a href="http://www.yahoo.com" title="Yahoo! Search" target="_blank" rel="nofollow">Yahoo!</a> and <a href="http://www.msn.com" title="Microsoft Search" target="_blank" rel="nofollow">MSN</a> is to help their respective robots crawl and index your site.</p>
<p>To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file. Conversely, and just as importantly, webmasters can also notify the search engines about the existence and importance of pages with a sitemap.xml file. (Both files are usually placed in the root directory of the domain; for robots.txt this is required.)</p>
<p>Fortunately for the webmaster, the major search engines provide various tools to help manage both Sitemap and robots.txt files.</p>
<p>To explain both &#8216;protocols&#8217;, I&#8217;ll discuss them briefly below.</p>
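<p>Because crawlers only ever request robots.txt from the root of a host, the file&#8217;s location can be worked out from any page address. The short Python sketch below uses the standard <code>urllib.parse</code> module; <code>root_file_url</code> is a hypothetical helper of my own, and treating the root as the home of sitemap.xml is a convention rather than a requirement (a Sitemap can live elsewhere and be submitted directly).</p>

```python
from urllib.parse import urlsplit, urlunsplit


def root_file_url(page_url, filename):
    """Return the URL of `filename` at the root of page_url's host.

    Hypothetical helper: robots.txt is only honoured at the root of a
    host; sitemap.xml is conventionally (not necessarily) placed there too.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/" + filename, "", ""))


page = "http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools"
print(root_file_url(page, "robots.txt"))   # http://www.simonwhatley.co.uk/robots.txt
print(root_file_url(page, "sitemap.xml"))  # http://www.simonwhatley.co.uk/sitemap.xml
```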
<h3>Sitemaps (Inclusion Protocol)</h3>
<p>The Sitemaps protocol allows a webmaster to inform search engines about <abbr title="Uniform Resource Locator">URL</abbr>s on a website that are available for crawling. A Sitemap is an <abbr title="eXtensible Markup Language">XML</abbr> file that lists the <abbr title="Uniform Resource Locator">URL</abbr>s for a site. It allows webmasters to include additional information about each <abbr title="Uniform Resource Locator">URL</abbr>: when it was last updated, how often it changes, and how important it is relative to other <abbr title="Uniform Resource Locator">URL</abbr>s on the site. This allows search engines to crawl the site more intelligently. Sitemaps are a <abbr title="Uniform Resource Locator">URL</abbr> inclusion protocol and complement robots.txt, a <abbr title="Uniform Resource Locator">URL</abbr> exclusion protocol.</p>
<p>The webmaster can generate a Sitemap containing all accessible <abbr title="Uniform Resource Locator">URL</abbr>s on the site and submit it to the search engines. Because Google, MSN, Yahoo! and Ask now all support the same protocol, a single Sitemap gives the biggest search engines up-to-date information about the site&#8217;s pages.</p>
<p>Sitemaps supplement, and do not replace, the existing crawl-based mechanisms that search engines already use to discover <abbr title="Uniform Resource Locator">URL</abbr>s. By submitting Sitemaps to a search engine, a webmaster is simply helping that engine&#8217;s crawlers do a better job of crawling their site(s). Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results.</p>
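<p>The pieces of information described above map naturally onto a small record type. The Python sketch below is hypothetical (<code>SitemapEntry</code> and <code>CHANGEFREQ_VALUES</code> are my own names, not part of any library); it checks the constraints the Sitemaps.org protocol places on each field: <code>loc</code> must be a full URL including the protocol, <code>changefreq</code> must be one of seven keywords, and <code>priority</code> must lie between 0.0 and 1.0 (the protocol&#8217;s default is 0.5).</p>

```python
from dataclasses import dataclass
from typing import Optional

# The seven changefreq keywords allowed by the Sitemaps protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}


@dataclass
class SitemapEntry:
    """One url entry of a Sitemap (hypothetical helper, not a library API)."""
    loc: str                          # required: the full page URL
    lastmod: Optional[str] = None     # W3C Datetime, e.g. 2008-10-08T14:50:16+00:00
    changefreq: Optional[str] = None  # a hint to crawlers, not a command
    priority: Optional[float] = None  # 0.0 to 1.0; the protocol default is 0.5

    def __post_init__(self) -> None:
        # The protocol requires loc to include the protocol (http or https).
        if not self.loc.startswith(("http://", "https://")):
            raise ValueError(f"loc must be an absolute URL: {self.loc!r}")
        if self.changefreq is not None and self.changefreq not in CHANGEFREQ_VALUES:
            raise ValueError(f"invalid changefreq: {self.changefreq!r}")
        if self.priority is not None and not 0.0 <= self.priority <= 1.0:
            raise ValueError(f"priority out of range: {self.priority!r}")


# The home page entry from this site's own Sitemap.
home = SitemapEntry("http://www.simonwhatley.co.uk/",
                    lastmod="2008-10-08T14:50:16+00:00",
                    changefreq="daily", priority=1.0)
```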
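<p>Generating the file is straightforward. Here is a minimal sketch of what a Sitemap generator (such as the WordPress plugin mentioned below) might do, written in Python with the standard <code>xml.etree.ElementTree</code> module. The <code>build_sitemap</code> function and its tuple input format are assumptions for illustration, and the <code>xml_declaration</code> argument requires Python 3.8 or later.</p>

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML for (loc, lastmod, changefreq, priority) tuples."""
    # Writing xmlns as a plain attribute is enough when we only serialise.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    # Serialise to UTF-8 bytes with an XML declaration, then decode to str.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True).decode("utf-8")


print(build_sitemap([
    ("http://www.simonwhatley.co.uk/", "2008-10-08T14:50:16+00:00", "daily", 1.0),
    ("http://www.simonwhatley.co.uk/big-city-little-people",
     "2008-10-08T14:50:16+00:00", "monthly", 0.1),
]))
```

<p>A side benefit of building the document with ElementTree rather than by string concatenation is that characters such as &amp; in URLs are escaped automatically, which the protocol requires.</p>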
<p>The following is a cut-down version of the sitemap.xml for this website. WordPress, via a plugin, automatically updates this file each time a new post or page is written.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;urlset xmlns=&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot;
	xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
	xsi:schemaLocation=&quot;http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd&quot;&gt;
	&lt;url&gt;
		&lt;loc&gt;http://www.simonwhatley.co.uk/&lt;/loc&gt;
		&lt;lastmod&gt;2008-10-08T14:50:16+00:00&lt;/lastmod&gt;
		&lt;changefreq&gt;daily&lt;/changefreq&gt;
		&lt;priority&gt;1.0&lt;/priority&gt;
	&lt;/url&gt;
	&lt;url&gt;
		&lt;loc&gt;http://www.simonwhatley.co.uk/big-city-little-people&lt;/loc&gt;
		&lt;lastmod&gt;2008-10-08T14:50:16+00:00&lt;/lastmod&gt;
		&lt;changefreq&gt;monthly&lt;/changefreq&gt;
		&lt;priority&gt;0.1&lt;/priority&gt;
	&lt;/url&gt;
&lt;/urlset&gt;</pre></div></div>
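<p>Search engines read the file from the other direction. As a rough sketch of the consumer side, the hypothetical <code>read_sitemap</code> function below uses Python&#8217;s standard <code>xml.etree.ElementTree</code> to pull each <code>loc</code> and <code>priority</code> out of a Sitemap, falling back to the protocol&#8217;s default priority of 0.5 when none is given. Because Sitemap elements live in the <code>http://www.sitemaps.org/schemas/sitemap/0.9</code> namespace, the lookups have to be namespace-qualified.</p>

```python
import xml.etree.ElementTree as ET

# Sitemap elements live in this namespace, so lookups must be qualified.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_text):
    """Return (loc, priority) pairs from a Sitemap document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        # strip() tolerates whitespace or line breaks around the URL.
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        # Fall back to the protocol's default priority when none is given.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        entries.append((loc, priority))
    return entries
```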

<p>More information about sitemaps can be found on the <a href="http://www.sitemaps.org" title="Sitemaps.org website" target="_blank" rel="nofollow">Sitemaps.org website</a>.</p>
<h3>Robots (Exclusion Protocol)</h3>
<p>The robot exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorise and archive websites. The standard complements Sitemaps, a robot inclusion standard for websites.</p>
<p>A robots.txt file on a website acts as a request that specified robots ignore specified files or directories when crawling the site. A webmaster might do this, for example, to keep certain pages out of search engine results, or because the content of those directories might be misleading or irrelevant to the categorisation of the site as a whole.</p>
<p>The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site out of bounds with robots.txt does not guarantee privacy. Some website administrators have tried to use the robots file to make private parts of a website invisible to the rest of the world, but the file is necessarily publicly available, and anyone with a web browser can read its contents.</p>
<p>For example, the following tells all crawlers not to enter four directories of a website:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/</pre></div></div>
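<p>Python&#8217;s standard library includes a parser for this format, <code>urllib.robotparser</code>, which makes it easy to check how a cooperating crawler should interpret the rules above. This sketch feeds the example rules in directly rather than downloading a live robots.txt; <code>AnyBot</code> is an arbitrary, made-up user-agent name.</p>

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(RULES.splitlines())

for path in ("/index.html", "/images/logo.png", "/private/notes.html"):
    # can_fetch() answers the question a *cooperating* crawler would ask.
    print(path, parser.can_fetch("AnyBot", path))
# /index.html True
# /images/logo.png False
# /private/notes.html False
```

<p>Nothing stops a badly behaved robot from skipping this check entirely, which is exactly why the protocol cannot be relied on for privacy.</p>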

<p>Exclusion can also be achieved on a page-by-page basis using a meta tag placed in the <abbr title="HyperText Markup Language">HTML</abbr> head of a web page. A <code>meta</code> element with <code>name="robots"</code> tells search engine spiders whether they may index the page and whether they should follow the links on it.</p>
<p>A common example could be as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot;
	&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; dir=&quot;ltr&quot; lang=&quot;en-GB&quot; xml:lang=&quot;en-GB&quot;&gt;
&lt;head profile=&quot;http://gmpg.org/xfn/11&quot;&gt;
	&lt;title&gt;Simon Whatley&lt;/title&gt;
	&lt;meta name=&quot;robots&quot; content=&quot;index,follow&quot; /&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></div></div>
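<p>On the crawler&#8217;s side, honouring the tag means finding it in the page and splitting its <code>content</code> attribute into individual directives. The following is a minimal sketch using Python&#8217;s standard <code>html.parser</code> module; <code>RobotsMetaParser</code> and <code>robots_directives</code> are hypothetical names of my own, and a real crawler would need to handle more cases than this.</p>

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from meta elements whose name is "robots"."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values;
        # attrs is a list of (name, value) pairs, and a value may be None.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())
        # Self-closing XHTML tags also end up here, because HTMLParser's
        # default handle_startendtag calls handle_starttag.


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

<p>Fed a page containing <code>&lt;meta name="robots" content="noindex,follow" /&gt;</code>, this returns the set <code>{"noindex", "follow"}</code>; an empty set means the page has no robots meta tag, in which case crawlers treat it as indexable and followable.</p>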

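<p>A word of caution, though: meta tags are not the most reliable way to keep content out of search engines. Like robots.txt, they are only honoured by cooperating crawlers, so they should not be relied upon to keep content private.</p>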
<p>More information about Robots.txt files can be found on the <a href="http://www.robotstxt.org/" title="Robotstxt.org website" target="_blank" rel="nofollow">Robotstxt.org website</a>.</p>
<h3>Webmaster Tools</h3>
<p>Each of the top three search providers has its own webmaster tools interface. Google&#8217;s offering is the most advanced, but it&#8217;s good practice to use all three and submit your information to each.</p>
<p>Links to their services are provided below:</p>
<ul>
<li><a href="http://www.google.com/webmasters/" title="Google Webmasters" target="_blank" rel="nofollow">Google Webmasters</a></li>
<li><a href="http://siteexplorer.search.yahoo.com" title="Yahoo! Site Explorer" target="_blank" rel="nofollow">Yahoo! Site Explorer</a></li>
<li><a href="http://webmaster.live.com" title="Live Search Webmaster Centre" target="_blank" rel="nofollow">Live Search Webmaster Centre</a></li>
</ul>
<p>Ask doesn&#8217;t have a webmaster interface. However, you can still ping its Submission Service by appending your sitemap <abbr title="Uniform Resource Locator">URL</abbr> to <code>http://submissions.ask.com/ping?sitemap=</code>.</p>
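<p>Because the sitemap address travels as a query-string value, it is safest to URL-encode it first. Below is a minimal Python sketch using the standard <code>urllib.parse.quote</code> function; <code>ask_ping_url</code> is a hypothetical helper and the sitemap location is an assumed example. It only builds the ping URL; actually requesting it (for example with <code>urllib.request.urlopen</code>) is left out.</p>

```python
from urllib.parse import quote

# Ask's ping endpoint, as given in the text above.
ASK_PING = "http://submissions.ask.com/ping?sitemap="


def ask_ping_url(sitemap_url):
    # Percent-encode the whole sitemap URL (including ':' and '/') so that
    # it is carried safely as a single query-string value.
    return ASK_PING + quote(sitemap_url, safe="")


print(ask_ping_url("http://www.simonwhatley.co.uk/sitemap.xml"))
# http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.simonwhatley.co.uk%2Fsitemap.xml
```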
<h3>Further Information</h3>
<ul>
<li><a href="http://sourceforge.net/project/showfiles.php?group_id=137793&#038;package_id=153422" title="SourceForge Project: Sitemap Generator" target="_blank" rel="nofollow">Sitemap Generator</a></li>
<li><a href="http://code.google.com/sm_thirdparty.html" title="Google Code: Sitemap 3rd Party programs and websites" target="_blank" rel="nofollow">Sitemap 3rd Party programs and websites</a></li>
<li><a href="http://www.webmaster-toolkit.com/" title="Webmaster Toolkit" target="_blank" rel="nofollow">Webmaster Toolkit</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.simonwhatley.co.uk/google-yahoo-and-microsoft-webmaster-tools/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

