Giving you fresher, more recent search results

Search results, like warm cookies right out of the oven or cool refreshing fruit on a hot summer’s day, are best when they’re fresh. Even if you don’t specify it in your search, you probably want search results that are relevant and recent.

If I search for [olympics], I probably want information about next summer’s upcoming Olympics, not the 1900 Summer Olympics (the only time my favorite sport, cricket, was played). Google Search uses a freshness algorithm, designed to give you the most up-to-date results, so even when I just type [olympics] without specifying 2012, I still find what I’m looking for.

Given the incredibly fast pace at which information moves in today’s world, the most recent information can be from the last week, day or even minute. Depending on the search terms, the algorithm needs to figure out whether a result from a week ago about a TV show is recent, or whether a result from a week ago about breaking news is too old.

We completed our Caffeine web indexing system last year, which allows us to crawl and index the web for fresh content quickly on an enormous scale. Building upon the momentum from Caffeine, today we’re making a significant improvement to our ranking algorithm that impacts roughly 35 percent of searches and better determines when to give you more up-to-date relevant results for these varying degrees of freshness:
  • Recent events or hot topics. For recent events or hot topics that begin trending on the web, you want to find the latest information immediately. Now when you search for current events like [occupy oakland protest], or for the latest news about the [nba lockout], you’ll see more high-quality pages that might only be minutes old. 
  • Regularly recurring events. Some events take place on a regularly recurring basis, such as annual conferences like [ICALP] or an event like the [presidential election]. Without specifying with your keywords, it’s implied that you expect to see the most recent event, and not one from 50 years ago. There are also things that recur more frequently, so now when you’re searching for the latest [NFL scores], [dancing with the stars] results or [exxon earnings], you’ll see the latest information. 
  • Frequent updates. There are also searches for information that changes often, but isn’t really a hot topic or a recurring event. For example, if you’re researching the [best slr cameras], or you’re in the market for a new car and want [subaru impreza reviews], you probably want the most up-to-date information.
There are plenty of cases where results that are a few years old might still be useful for you. [fast tomato sauce recipe] certainly saved me after a call from my wife reminded me I had volunteered to make dinner! On the other hand, when I search for the [49ers score], a result that is a week old might be too old.

Different searches have different freshness needs. This algorithmic improvement is designed to better understand how to differentiate between these kinds of searches and the level of freshness you need, and make sure you get the most up-to-the-minute answers.
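To make the idea of "different freshness needs" concrete, here is a minimal sketch in Python. It is not our ranking algorithm: the category names, the age limits and the `is_fresh_enough` helper are all assumptions invented for illustration, loosely following the three kinds of searches listed above plus an "evergreen" bucket for queries like the tomato sauce recipe.

```python
from datetime import timedelta

# Hypothetical age limits per query category. The numbers are made up
# for illustration and do not reflect real ranking thresholds.
MAX_AGE = {
    "hot_topic": timedelta(hours=1),        # e.g. breaking news
    "recurring_event": timedelta(days=3),   # e.g. weekly sports scores
    "frequent_update": timedelta(days=90),  # e.g. product reviews
    "evergreen": None,                      # e.g. recipes: no age limit
}

def is_fresh_enough(category, age):
    """Return True if a result of the given age is acceptable for the category."""
    limit = MAX_AGE[category]
    return limit is None or age <= limit

# A week-old result is fine for a recipe but too old for a game score.
print(is_fresh_enough("evergreen", timedelta(days=7)))
print(is_fresh_enough("recurring_event", timedelta(days=7)))
```

A real system would presumably blend freshness with many other signals rather than apply a hard cutoff; the cutoff is only the simplest way to show that the same age can be acceptable for one query and stale for another.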

Update 11/7/11: To clarify, when we say this algorithm impacted 35% of searches, we mean at least one result on the page was affected, as opposed to when we've said noticeably impacted in the past, which means changes that are significant enough that an average user would notice. Using that same scale, this change noticeably impacts 6 to 10% of searches, depending on the language and domain you're searching on.

(Cross-posted on the Inside Search blog)

Another look under the hood of search

(Cross-posted on the Inside Search blog and the Public Policy blog)

Over the past few years, we’ve released a series of blog posts to share the methodology and process behind our search ranking, evaluation and algorithmic changes. Just last month, Ben Gomes, Matt Cutts and I participated in a Churchill Club event where we discussed how search works and where we believe it’s headed in the future.

Beyond our talk and various blog posts, we wanted to give people an even deeper look inside search, so we put together a short video that gives you a sense of the work that goes into the changes and improvements we make to Google almost every day. While an improvement to the algorithm may start with a creative idea, it always goes through a process of rigorous scientific testing. Simply put: if the data from our experiments doesn’t show that we’re helping users, we won’t launch the change.

In the world of search, we’re always striving to deliver the answers you’re looking for. After all, we know you have a choice of a search engine every time you open a browser. As the Internet becomes bigger, richer and more interactive, we have to work that much harder to ensure we’re unearthing and displaying the best results for you.

Inside Google's search office

(Cross-posted on the Inside Search Blog)

I’ve been working with Matt Cutts and Ben Gomes in the same office for over 10 years. We work on search every day, and earlier this week, we took our office talk to the stage at an event hosted by the Churchill Club. Search Engine Land’s Danny Sullivan moderated our in-depth discussion on search, how it works, and what’s ahead for us in the future. We also reminisced about first joining Google, the time my car ran out of gas as Ben and I discussed a change to the algorithm, and other great memories over the years.

Come sit inside our office for a chat about Google Search:

  • To hear more about the principles that drive changes to the algorithm and how these changes are tested and implemented, go to 15:40
  • To hear the discussion on why we don’t hand-pick results, start watching at 41:04
  • For more on my vision for the future of search, jump to 1:12:28
  • Guess who Danny thinks is the brains, looks, and brawn of this operation at 1:08 (hint: I’m the brains).

Google Commerce Search 3.0: You won’t believe it’s online shopping

When we first introduced Google Commerce Search—our search solution for e-commerce websites—our focus was on improving search quality and speed to help online shoppers find what they’re looking for. Retailers such as Woodcraft Supply, BabyAge and HealthWarehouse implemented Google Commerce Search on their respective websites; Woodcraft increased search revenues 34 percent, BabyAge increased site searches 64 percent and HealthWarehouse saw online conversions increase 19 percent—and all have reported an increase in customer satisfaction.

Today we’re building on the capabilities that have proved useful to our retail partners with the third-generation Google Commerce Search (GCS). With this new version, we hope to help create an even more interactive and engaging experience for shoppers and retailers.

Here are some of the cool new features in GCS 3.0:
  • Search as You Type provides instant gratification to shoppers, returning product results with every keystroke, right from the search bar
  • Local Product Availability helps retailers bridge online and offline sales by showing shoppers when a product is also available in a store nearby—in-line with the search results
  • Enhanced Merchandising tools allow retailers to create product promotions that display in banners alongside related search queries, and to easily set query-based landing pages (for example, when a visitor types [shoes], they’re directed to a “shoe” page)
  • Product Recommendations (Labs) helps shoppers make purchase decisions by showing them what others viewed and ultimately bought


With this release we're also welcoming three new retail partners: Forever 21, General Nutrition Company (GNC) and L’Occitane. GNC implemented Google Commerce Search in less than a week on their mobile website, while Forever 21 and L’Occitane are currently working to implement various new features of GCS, such as Search as You Type and Local Product Availability. Here’s what Christine Burke, VP of International E-Commerce at cosmetics staple L’Occitane, had to say about GCS 3.0:
“L’Occitane is unique in that our beauty products center around ingredients—such as lavender, shea butter and verbena. As our customers visit our re-designed website to shop and research our products, we’re excited about the speed and accuracy of on-site search results that will be provided to us through Google Commerce Search. We’re also very excited about the possibility of the new local inventory feature, which can help us connect our customers with their favorite products in one of our 170 U.S. boutiques.”
For more information, visit

Hide sites to find more of what you want

Over the years we’ve experimented with a number of ways to help you personalize the results you find on Google, from SearchWiki to stars in search to location settings. Now there’s yet another way to find more of what you want on Google by blocking the sites you don’t want to see.

You’ve probably had the experience where you’ve clicked a result and it wasn’t quite what you were looking for. Many times you’ll head right back to Google. Perhaps the result just wasn’t quite right, but sometimes you may dislike the site in general, whether it’s offensive, pornographic or of generally low quality. For times like these, you’ll start seeing a new option to block particular domains from your future search results. Now when you click a result and then return to Google, you’ll find a new link next to “Cached” that reads “Block all results.”

As always, Matt’s been gracious enough to let us use him as an example. His site is awesome, though, and we doubt many people will want to block it!

Once you click the link to “Block all results” you’ll get a confirmation message, as well as the option to undo your choice. You’ll see the link whether or not you’re signed in, but the domains you block are connected with your Google Account, so you’ll need to sign in before you can confirm a block.

Once you’ve blocked a domain, you won’t see it in your future search results. (Side note: Sometimes you may have to search on a new term, rather than simply refreshing your browser, before you'll notice the domain has been successfully removed.) The next time you’re searching and a blocked page would have appeared, you’ll see a message telling you results have been blocked, making it easy to manage your personal list of blocked sites. This message will appear at the top or bottom of the results page depending on the relevance of the blocked pages.

You can see a list of your blocked sites in a new settings page, which you can access by visiting your Search Settings or clicking on the “Manage blocked sites” link that appears when you block a domain. On the settings page you can find details about the sites you’ve blocked, block new sites, or unblock sites if you’ve changed your mind.
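For readers who like to see the idea in code, here is a rough Python sketch of what filtering by blocked domain could look like. It is an illustration only, not how the feature is implemented: the helper names `domain_of` and `filter_blocked`, the example URLs, and the simple rule of dropping a leading "www." are all assumptions.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Extract a comparable host name from a URL (drops a leading 'www.')."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def filter_blocked(results, blocked_domains):
    """Split results into those shown and a count of those hidden by the block list."""
    shown = [url for url in results if domain_of(url) not in blocked_domains]
    hidden = len(results) - len(shown)
    return shown, hidden

# Hypothetical result URLs and block list.
results = ["http://www.example.com/page", "http://example.org/article"]
shown, hidden = filter_blocked(results, {"example.com"})
print(shown)   # ['http://example.org/article']
print(hidden)  # 1
```

The `hidden` count plays the role of the message described above: it lets the page tell you that some results were blocked even though they are no longer displayed.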

We’re adding this feature because we believe giving you control over the results you find will provide an even more personalized and enjoyable experience on Google. In addition, while we’re not currently using the domains people block as a signal in ranking, we’ll look at the data and see whether it would be useful as we continue to evaluate and improve our search results in the future. The new feature is rolling out today and tomorrow in English for people using Chrome 9+, IE8+ and Firefox 3.5+, and we’ll be expanding to new regions, languages and browsers soon. We hope you find it useful, and we’ll be listening closely to your suggestions.

Finding more high-quality sites in search

Our goal is simple: to give people the most relevant answers to their queries as quickly as possible. This requires constant tuning of our algorithms, as new content—both good and bad—comes online all the time.

Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking—a change that noticeably impacts 11.8% of our queries—and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites—sites that add little value for users, copy content from other websites or are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

It’s worth noting that this update does not rely on the feedback we’ve received from the Personal Blocklist Chrome extension, which we launched last week. However, we did compare the Blocklist data we gathered with the sites identified by our algorithm, and we were very pleased that the preferences our users expressed by using the extension are well represented. If you take the several dozen most-blocked domains from the Chrome extension, then this algorithmic change addresses 84% of them, which is strong independent confirmation of the user benefits.

So, we’re very excited about this new ranking improvement because we believe it’s a big step in the right direction of helping people find ever higher quality in our results. We’ve been tackling these issues for more than a year, and working on this specific change for the past few months. And we’re working on many more updates that we believe will substantially improve the quality of the pages in our results.

To start with, we’re launching this change in the U.S. only; we plan to roll it out elsewhere over time. We’ll keep you posted as we roll this and other changes out, and as always please keep giving us feedback about the quality of our results because it really helps us to improve Google Search.

Update April 11: We’ve rolled out this algorithmic change globally to all English-language Google users and incorporated new signals as we iterate and improve. We’ll continue testing and refining the change before expanding to additional languages. You can learn more on our Webmaster Central Blog.

New Chrome extension: block sites from Google’s web search results

(Cross-posted on the Google Chrome Blog)

We’ve been exploring different algorithms to detect content farms, which are sites with shallow or low-quality content. One of the signals we're exploring is explicit feedback from users. To that end, today we’re launching an early, experimental Chrome extension so people can block sites from their web search results. If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results.

You can download the extension and start blocking sites now. It looks like this:

When you block a site with the extension, you won’t see results from that domain again in your Google search results. You can always revoke a blocked site at the bottom of the search results, so it's easy to undo blocks:

You can also edit your list of blocked sites by clicking on the extension's icon in the top right of the Chrome window.

This is an early test, but the extension is available in English, French, German, Italian, Portuguese, Russian, Spanish and Turkish. We hope this extension improves your search experience, and thanks in advance for participating in this experiment. If you’re a tech-savvy Chrome user, please download and try the Personal Blocklist extension today.

Microsoft’s Bing uses Google search results—and denies it

By now, you may have read Danny Sullivan’s recent post: “Google: Bing is Cheating, Copying Our Search Results” and heard Microsoft’s response, “We do not copy Google's results.” However you define copying, the bottom line is, these Bing results came directly from Google.

I’d like to give you some background and details of our experiments that led us to understand just how Bing is using Google web search results.

It all started with tarsorrhaphy. Really. As it happens, tarsorrhaphy is a rare surgical procedure on eyelids. And in the summer of 2010, we were looking at the search results for an unusual misspelled query [torsorophy]. Google returned the correct spelling—tarsorrhaphy—along with results for the corrected query. At that time, Bing had no results for the misspelling. Later in the summer, Bing started returning our first result to their users without offering the spell correction (see screenshots below). This was very strange. How could they return our first result to their users without the correct spelling? Had they known the correct spelling, they could have returned several more relevant results for the corrected query.

This example opened our eyes, and over the next few months we noticed that URLs from Google search results would later appear in Bing with increasing frequency for all kinds of queries: popular queries, rare or unusual queries and misspelled queries. Even search results that we would consider mistakes of our algorithms started showing up on Bing.

We couldn’t shake the feeling that something was going on, and our suspicions became much stronger in late October 2010 when we noticed a significant increase in how often Google’s top search result appeared at the top of Bing’s ranking for a variety of queries. This statistical pattern was too striking to ignore. To test our hypothesis, we needed an experiment to determine whether Microsoft was really using Google’s search results in Bing’s ranking.

We created about 100 “synthetic queries”—queries that you would never expect a user to type, such as [hiybbprqag]. As a one-time experiment, for each synthetic query we inserted as Google’s top result a unique (real) webpage which had nothing to do with the query. Below is an example:

To be clear, the synthetic query had no relationship with the inserted result we chose—the query didn’t appear on the webpage, and there were no links to the webpage with that query phrase. In other words, there was absolutely no reason for any search engine to return that webpage for that synthetic query. You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank.
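The logic of the "marked bills" test can be sketched in a few lines of Python. This is a simplified illustration rather than the tooling we used: the function names, the random-letter generator and the example URL are assumptions made up for this sketch.

```python
import random
import string

def make_synthetic_query(rng, length=10):
    """Build a random lowercase string that no real user would plausibly type."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def plant_results(queries, unrelated_pages):
    """Pair each synthetic query with one unrelated page (the 'marked bill')."""
    return dict(zip(queries, unrelated_pages))

def leaked_queries(planted, other_engine_results):
    """Return the synthetic queries whose planted page shows up in another engine."""
    return [
        query
        for query, page in planted.items()
        if page in other_engine_results.get(query, [])
    ]

# Hypothetical data: one planted pair, later observed in another engine's results.
planted = plant_results(["hiybbprqag"], ["http://example.com/theater-seating"])
observed = {"hiybbprqag": ["http://example.com/theater-seating"]}
print(leaked_queries(planted, observed))  # ['hiybbprqag']
```

Because the query and the page share no text and no links, any non-empty output from `leaked_queries` points to the pairing itself having been observed somewhere, which is the whole point of marking the bills.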

We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple of weeks of starting this experiment, our inserted results started appearing in Bing. Below is an example: a search for [hiybbprqag] on Bing returned a page about seating at a theater in Los Angeles. As far as we know, the only connection between the query and result is Google’s result page (shown above).

We saw this happen for multiple queries. For the query [delhipublicschool40 chdjob] we inserted a search result for a credit union:

The same credit union soon showed up on Bing for that query:

For the query [juegosdeben1ogrande] we inserted a page of hip hop bling jewelry:

And the same hip hop bling page showed up in Bing:

As we see it, this experiment confirms our suspicion that Bing is using some combination of the “Suggested Sites” feature of Internet Explorer 8 and the Bing Toolbar, or possibly some other means, to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.

At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we'd like for this practice to stop.

Google search and search engine spam

January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English. However, we have seen a slight uptick of spam in recent months, and while we’ve already made progress, we have new efforts underway to continue to improve our search quality.

Just as a reminder, webspam is junk you see in search results when websites try to cheat their way into higher positions in search results or otherwise violate search engine quality guidelines. A decade ago, the spam situation was so bad that search engines would regularly return off-topic webspam for many different searches. For the most part, Google has successfully beaten back that type of “pure webspam”—even while some spammers resort to sneakier or even illegal tactics such as hacking websites.

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.

As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.

One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
  • Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
  • Displaying Google ads does not help a site’s rankings in Google; and
  • Buying Google ads does not increase a site’s rankings in Google’s search results.
These principles have always applied, but it’s important to affirm they still hold true.

People care enough about Google to tell us—sometimes passionately—what they want to see improved. We deeply appreciate this feedback. Combined with our own scientific evaluations, user feedback allows us to explore every opportunity for possible improvements. Please tell us how we can do a better job, and we’ll continue to work towards a better Google.

A recent improvement for Arabic searches

This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.

We've learned that when performing a search on Google, people sometimes forget to separate words with spaces. Moreover, people often mistakenly repeat a letter within a single word. For instance, when writing the query [amazingly beautiful poem], you might write it as [amazingly beautiifullpoem].

These types of errors are much more common in languages like Arabic, where most of the letters are cursive. That means that the shapes of the letters change, based on the position of the letter in the word (initial, middle, final or isolated). Moreover, some Arabic letters are considered word breaks, meaning that the following letter must be in an "initial" shape. In other words, if the last letter of one word is a word break, the following word may not be separated with a space.

For example, the queries [وزارةالتعليم] and [وزارة التعليم] have an identical meaning (Ministry of Education) and they're both written in a common form for Arabic documents. But they have different, albeit correct, formats — the first query is written as a single word, while the second is written as two. Google needs to understand that while they're written differently, they mean the same thing and should yield the exact same search results. In this example, both queries were written correctly, just in different formats. But sometimes people just make errors — like repeating the same letter twice. For example, you might write [راائعة الجماال], repeating the letter "ا" twice in both query words. In this case the correct spelling should be [رائعة الجمال]. It's important that Google search recognizes your query — despite spelling errors.

To address issues like this, we recently developed a search ranking improvement that targets certain Arabic queries. Our algorithm employs rules of Arabic spelling and grammar along with signals from historical search data to decide when to leave out spaces between words or when to remove unnecessarily repeated letters. Now, when you type a query leaving out spaces or repeating a letter, we'll return better results based not only on what you typed, but also on what our algorithm understands is the "correct" query. For example, here's what happens when you type [قصيدة راائعةالجماال] ([amazingly beautiful poem] in Arabic) with repeated letters and dropped spaces between words.

As you can see, the Google results contain the corrected query, the terms قصيدة رائعة الجمال, in bold.
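To give a feel for the two corrections involved (removing repeated letters and restoring dropped spaces), here is a deliberately naive Python sketch. It is not the algorithm described above, which also relies on Arabic spelling and grammar rules and on signals from historical search data. The function names and the tiny hand-made vocabulary are assumptions for illustration, and collapsing every repeated letter would wrongly alter words that legitimately contain a doubled letter.

```python
import re

def collapse_repeats(word):
    """Collapse any run of the same character into a single character (naive)."""
    return re.sub(r"(.)\1+", r"\1", word)

def segment(text, vocabulary):
    """Split text into known words via dynamic programming; None if impossible."""
    # best[i] holds one list of words that exactly covers text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and text[start:end] in vocabulary:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]

def normalize_query(query, vocabulary):
    """Fix repeated letters, then re-insert spaces where known words were merged."""
    words = []
    for token in query.split():
        token = collapse_repeats(token)
        # Keep the token unchanged if it cannot be split into known words.
        words.extend(segment(token, vocabulary) or [token])
    return " ".join(words)

# English version of the example query, with a hypothetical three-word vocabulary.
vocab = {"amazingly", "beautiful", "poem"}
print(normalize_query("amazingly beautiifullpoem", vocab))  # amazingly beautiful poem
```

The same functions operate on Arabic text unchanged, since Python strings are Unicode; given a vocabulary containing the three words of the corrected query, the misspelled Arabic example above would go through the same two steps.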

For most people, this might seem like a small enhancement. But for us, it’s a big change. Our tests show we've improved search for 10% of Arabic language queries. Which, when you think about it, is a lot of people.