Introducing Sitemaps for Google News

Good news for webmasters of English-language news sites: If your site is currently included in Google News, you can now create News Sitemaps that tell us exactly which articles to crawl for inclusion in Google News. In addition, you can access crawl errors, which tell you if there were any problems crawling the articles in your News Sitemaps, or, for that matter, any articles on your site that Google News reaches through its normal crawl.

Freshness is important for news, so we recrawl all News Sitemaps frequently. The News Sitemaps XML definition lets you specify a publication date and time for each article to help us process fresh articles in a timely fashion. You can also specify keywords for each article to help us place your articles in the appropriate sections of Google News.
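
Here's a minimal sketch of what a News Sitemap entry using these fields might look like. The URL, date, and keywords are placeholders, and you should consult the News Sitemaps documentation for the authoritative tag definitions:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.com/business/article123.html</loc>
    <news:news>
      <!-- Publication date and time in W3C Datetime format -->
      <news:publication_date>2006-11-28T09:30:00+00:00</news:publication_date>
      <!-- Comma-separated keywords to help with section placement -->
      <news:keywords>business, mergers, acquisitions</news:keywords>
    </news:news>
  </url>
</urlset>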

If your English-language news site is currently included in Google News, the news features are automatically enabled in webmaster tools; just add the site to your account.

Once you've added the site, a News crawl link on the site's summary page indicates that the news features are enabled. A few things to note:
  • You will only have the news features enabled if your site is currently included in Google News. If it's not, you can request inclusion.

  • In most cases, you should add the site for the hostname under which you publish your articles. For example, if you publish your articles at URLs such as http://www.example.com/business/article123.html, you should add the site http://www.example.com/. Exception: If your site lives within a larger hosting site, you should add the site for your homepage, e.g., http://members.tripod.com/mynewssite/. If you publish articles under multiple hostnames, you should add a site for each of them.

  • You must verify your site to enable the news features.

We'll be working to make the news features available to publishers in more languages as soon as possible.

Joint support for the Sitemap Protocol

We're thrilled to tell you that Yahoo! and Microsoft are joining us in supporting the Sitemap protocol.

As part of this development, we're moving the protocol to a new namespace, http://www.sitemaps.org/schemas/sitemap/0.9, and raising the version number to 0.9. The sponsoring companies will continue to collaborate on the protocol and publish enhancements on the jointly-maintained site sitemaps.org.
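
For example, a minimal Sitemap under the new namespace looks like this (the URL and values here are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-11-16</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>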

If you've already submitted a Sitemap to Google using the previous namespace and version number, we'll continue to accept it. If you haven't submitted a Sitemap before, check out the documentation on www.sitemaps.org for information on creating one. You can submit your Sitemap file to Google using Google webmaster tools. See the documentation that Yahoo! and Microsoft provide for information about submitting to them.

If any website owners, tool writers, or webserver developers haven't gotten around to implementing Sitemaps yet because they figured this was just a crazy Google experiment, we hope this joint announcement shows that the industry is heading in this direction. As Sitemaps grow to cover more of the web, we can revolutionize the way web crawlers interact with websites. In our view, the experiment is still underway.

New third-party Sitemaps tools

Hello, webmasters! I'm Maile, and I recently joined the team here at Google Webmaster Central. I already have good news to report: we've updated our information on third-party programs and websites. These third-party tools provide lots of options for easily generating a Sitemap -- from plugins for content management systems to online generators.

Many thanks to this community for continuing to innovate and improve the Sitemap tools. Since most of my work focuses on the Sitemaps protocol, I hope to meet you on our Sitemaps protocol discussion group.

Learn more about Googlebot's crawl of your site and more!

We've added a few new features to webmaster tools and invite you to check them out.

Googlebot activity reports
Check out these cool charts! We show you the number of pages Googlebot's crawled from your site per day, the number of kilobytes of data Googlebot's downloaded per day, and the average time it took Googlebot to download pages. Webmaster tools show each of these for the last 90 days. Stay tuned for more information about this data and how you can use it to pinpoint issues with your site.

Crawl rate control
Googlebot uses sophisticated algorithms that determine how much to crawl each site. Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server's bandwidth.

We've been conducting a limited test of a new feature that enables you to provide us with information about how we crawl your site. Today, we're making this tool available to everyone. You can access it from the Diagnostic tab. If you'd like Googlebot to slow down its crawl of your site, simply choose the Slower option.

If we think your server can handle the additional bandwidth and we're able to crawl more of your site, we'll let you know and offer the option of a faster crawl.

A requested change to the crawl rate lasts for 90 days. If you'd like to keep the new rate after that, simply return to webmaster tools and make the change again.


Enhanced image search
You can now opt into enhanced image search for the images on your site, which enables tools such as Google Image Labeler to associate the images on your site with labels, improving the indexing and search quality of those images. After you've opted in, you can opt out at any time.

Number of URLs submitted
Recently at SES San Jose, a webmaster asked me if we could show the number of URLs we find in a Sitemap. He said that he generates his Sitemaps automatically and he'd like confirmation that the number he thinks he generated is the same number we received. We thought this was a great idea. Simply access the Sitemaps tab to see the number of URLs we found in each Sitemap you've submitted.

As always, we hope you find these updates useful and look forward to hearing what you think.

Multiple Sitemaps in the same directory

We've gotten a few questions about whether you can put multiple Sitemaps in the same directory. Yes, you can!

You might want to have multiple Sitemap files in a single directory for a number of reasons. For instance, if you have an auction site, you might want to have a daily Sitemap with new auction offers and a weekly Sitemap with less time-sensitive URLs. Or you could generate a new Sitemap every day with new offers, so that the list of Sitemaps grows over time. Either of these solutions works just fine.
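
As a sketch, the daily Sitemap in the auction example might look like this, sitting alongside the weekly one in the same directory (the URLs and dates are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/auctions/offer456.html</loc>
    <lastmod>2006-10-02</lastmod>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>http://example.com/auctions/offer457.html</loc>
    <lastmod>2006-10-02</lastmod>
  </url>
</urlset>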

Or, here's another sample scenario: Suppose you're a provider that supports multiple web shops, and they share a similar URL structure differentiated by a parameter. For example:

http://example.com/stores/home?id=1
http://example.com/stores/home?id=2
http://example.com/stores/home?id=3

Since they're all in the same directory, it's fine by our rules to put the URLs for all of the stores into a single Sitemap, under http://example.com/ or http://example.com/stores/. However, some webmasters may prefer to have separate Sitemaps for each store, such as:

http://example.com/stores/store1_sitemap.xml
http://example.com/stores/store2_sitemap.xml
http://example.com/stores/store3_sitemap.xml

As long as all URLs listed in a Sitemap are at the same location as the Sitemap or in a subdirectory (in the above example, http://example.com/stores/ or perhaps http://example.com/stores/catalog), it's fine for multiple Sitemaps to live in the same directory (as many as you want!). The important thing is that Sitemaps not contain URLs from parent directories or completely different directories -- if that happens, we can't be sure that the submitter controls the URL's directory, so we can't trust the metadata.

The above Sitemaps could also be collected into a single Sitemap index file and easily be submitted via Google webmaster tools. For example, you could create http://example.com/stores/sitemap_index.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/stores/store1_sitemap.xml</loc>
    <lastmod>2006-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/stores/store2_sitemap.xml</loc>
    <lastmod>2006-10-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/stores/store3_sitemap.xml</loc>
    <lastmod>2006-10-05</lastmod>
  </sitemap>
</sitemapindex>

Then simply add the index file to your account, and you'll be able to see any errors for each of the child Sitemaps.

If each store includes more than 50,000 URLs (the maximum number for a single Sitemap), you would need to have multiple Sitemaps for each store. In that case, you may want to create a Sitemap index file for each store that lists the Sitemaps for that store. For instance:

http://example.com/stores/store1_sitemapindex.xml
http://example.com/stores/store2_sitemapindex.xml
http://example.com/stores/store3_sitemapindex.xml
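
As a sketch, store1_sitemapindex.xml might then look like this, with each part containing at most 50,000 URLs (the part file names are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/stores/store1_sitemap_part1.xml</loc>
    <lastmod>2006-10-05</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/stores/store1_sitemap_part2.xml</loc>
    <lastmod>2006-10-05</lastmod>
  </sitemap>
</sitemapindex>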

Since Sitemap index files can't contain other index files, you would need to submit each Sitemap index file to your account separately.

Whether you list all URLs in a single Sitemap or in multiple Sitemaps (in the same directory or in different directories) is simply based on what's easiest for you to maintain. We treat the URLs equally for each of these methods of organization.