How to avoid getting hooked



This post is one of a series devoted to online security. - Ed.


Millions of people have gotten "urgent" emails asking them to take immediate action to prevent some impending disaster. "Our bank has a new security system. Update your information now or you won't be able to access your account," or "We couldn't verify your information; click here to update your account." Sometimes the email claims that something awful will happen to the sender (or a third party), as in "The sum of $30,000,000 is going to go to the Government unless you help me transfer it to your bank account."

People who click on the links in these emails may see a web page that looks like a legitimate site they've visited before. Because the page looks familiar, these people enter their username, password, or other private information on the site. What they've actually done is given an unknown third party all the information needed to hijack their account, steal their money, or open up new lines of credit in their name. They just fell for a phishing attack.

The concept behind such an attack is pretty simple: Someone masquerades as someone else in an effort to fool you into sharing personal or other sensitive information with them. Phishers can masquerade as just about anyone, including banks, email and application providers, online merchants, online payment services, and even governments. And while some of these attacks are crude and easy to spot, many of them are sophisticated and well constructed. That fake email from "your bank" can look very real; the bogus "login page" you're redirected to can seem completely legitimate.

The good news is there are things you can do to steer clear of phishing attacks:
  • Be careful about responding to emails that ask you for sensitive information. Be wary of clicking on links in, or replying to, emails that ask for things like account numbers, user names and passwords, or other personal information such as Social Security numbers. Most legitimate businesses will never ask for this information via email. Google doesn't.
  • Go to the site yourself, rather than clicking on links in suspicious emails. If you receive a communication asking for sensitive information but think it could be legitimate, open a new browser window and go to the organization's website as you normally would (for instance, by using a bookmark or by typing out the address of the organization's website). This will improve the chances that you're dealing with the organization's website rather than with a phisher's website, and if there's actually something you need to do, there will usually be a notification on the site. Also, if you're not sure about a request you've received, don't be afraid to contact the organization directly to ask. It takes just a few minutes to go to the organization's website, find an email address or phone number for customer support, and reach out to confirm whether the request is legitimate.
  • If you're on a site that's asking you to enter sensitive information, check for signs of anything suspicious. No matter how you got there, look for signs that it's really the official website for the organization. For example, check the URL to make sure the page is actually part of the organization's website, and not a fraudulent page on a different domain (such as mybankk.com or g00gle.com). If you're on a page that should be secured (like one asking you to enter your credit card information), look for "https" at the beginning of the URL and the padlock icon in the browser. (In Firefox and Internet Explorer 6, the padlock appears in the bottom right-hand corner, while in Internet Explorer 7 the padlock appears on the right-hand side of the address bar.) These signs aren't infallible, but they're a good place to start.
  • Be wary of the "fabulous offers" and "fantastic prizes" that you'll sometimes come across on the web. If something seems too good to be true, it probably is, and it could be a phisher trying to steal your information. Whenever you come across an offer online that requires you to share personal or other sensitive information to take advantage of it, be sure to ask lots of questions and check the site asking for your information for signs of anything suspicious.
  • Use a browser that has a phishing filter. The latest versions of most browsers -- including Firefox, Internet Explorer, and Opera -- include phishing filters that can help you spot potential phishing attacks.
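To make the URL check in the third tip concrete, here is a minimal Python sketch. It assumes you already know the organization's real domain (mybank.com below is a made-up example) and flags a page whose host is neither that domain nor one of its subdomains, as well as a page that isn't served over https. It only helps if you know the correct domain in advance, and https on its own doesn't prove a site is honest, so treat it as an illustration of the idea rather than a replacement for a browser's phishing filter.

```python
from urllib.parse import urlsplit


def check_login_url(url: str, official_domain: str) -> list[str]:
    """Return warning signs for a page asking for sensitive information.

    An empty list means no obvious red flags, not that the page is safe.
    """
    warnings = []
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    official = official_domain.lower()

    # The host must be the official domain itself or a subdomain of it.
    # The leading dot stops "notmybank.com" from passing, and an exact
    # suffix test rejects "mybank.com.evil.example" (a substring test would not).
    if host != official and not host.endswith("." + official):
        warnings.append(f"host {host!r} is not part of {official}")

    # Pages that collect passwords or card numbers should use https.
    if parts.scheme != "https":
        warnings.append("page is not served over https")

    return warnings
```

For example, `check_login_url("http://www.mybankk.com/login", "mybank.com")` returns two warnings, while `check_login_url("https://secure.mybank.com/login", "mybank.com")` returns an empty list.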
All fairly simple, right? What it all comes down to is this: if someone asks you to share personal or other sensitive information online, take a moment to think the request through carefully. Doing so will help you stay safe online, and help us all put phishers out of business.

Working together to fight malware


We recently began a series of posts related to online security that focus on how we secure information (with posts like these) and how you can protect yourself online. Here's the latest in the series. - Ed.

As part of this ongoing security series, we'd like to talk a little about malware. The term malware, derived from "malicious software," refers to any software specifically designed to harm your computer or the software it's running.

Malware can be added to your computer, with or without your knowledge, in a number of ways -- usually when you visit a website containing malware or when you download seemingly innocent software. It can then slow down your system, send fake emails from your email account, steal sensitive information like credit card numbers or passwords from your computer, and more.

The conventional wisdom was that you could avoid malware by learning to spot sites that were created with the sole purpose of spreading it, and by staying away from other sites that might be risky. But recent research from Google suggests that an increasing number of malware attacks are taking place on sites you'd normally regard as safe or legitimate but that have actually been compromised.

Google works closely with the security community to identify malware on the web and then share that information more broadly. We've set up a number of automated systems to scour our index for potentially dangerous sites, and we add a label to those that appear to be a vehicle for malware. If you're searching on Google and click on a link that we've flagged, a warning page will appear before you move forward.
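The flag-then-warn step can be illustrated with a small Python sketch. Everything in it is hypothetical: the flag list, its format, and the rule that a flag on a host or on a path prefix covers every URL beneath it. It is not how Google's systems are built; it simply reflects the point above that a mostly legitimate site may have only one compromised section.

```python
from urllib.parse import urlsplit

# Hypothetical flag list produced by an automated scan. Entries are either a
# whole host or a host plus a path prefix ending in "/".
FLAGGED = {
    "compromised.example.com",
    "blog.example.org/uploads/",
}


def candidate_keys(url: str) -> list[str]:
    """Return the host followed by each path prefix, broadest first."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    keys = [host]
    prefix = host + "/"
    for segment in parts.path.strip("/").split("/"):
        if segment:
            prefix += segment + "/"
            keys.append(prefix)
    return keys


def should_show_warning(url: str) -> bool:
    """True if the URL, or any host/path prefix above it, has been flagged."""
    return any(key in FLAGGED for key in candidate_keys(url))
```

With this list, `https://blog.example.org/uploads/file.exe` would trigger a warning page, while `https://blog.example.org/posts/1` on the same site would not.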

We also notify webmasters if we discover that a site is no longer secure, and we provide a way for webmasters who have cleaned up their sites to request a review. And starting soon, we'll be providing more detail on sites that appear to be spreading malware, so users have a better sense of why we have flagged a given site and webmasters can more easily identify and correct issues on their sites.

All this stems directly from our security philosophy: We believe that if we all work together to identify threats and stamp them out, we can make the web a safer place for everyone. Of course, we can't catch everything, so our users play a crucial part in this effort too. Below are a few tips that can help you reduce your chances of being affected by malware:
  • Use anti-virus software. Most anti-virus software is specifically designed to find and remove harmful software on your computer. Be sure you have anti-virus software installed on your computer (you can get a free trial through Google Pack if you don't), keep it current, and use it to run frequent full-system checks.
  • Make sure your operating system and browser are up to date. Attackers typically target vulnerabilities in your operating system (OS) and your browser to install malware on your computer. OS and browser providers frequently release updates to close those vulnerabilities. Enable automatic updates for both your browser and your OS, and check for alerts to ensure you have the latest and greatest protection.
  • Be careful about what you download. While Google and everyone else in the online community are working hard to identify harmful sites, new sources of malware are emerging all the time. Whenever you're prompted to download an email attachment, install a plug-in, or download an unfamiliar piece of software, take a moment to think it through. You won't always be able to identify a risky download, but if you practice some reasonable caution, you'll be able to reduce that risk.
If you come across a potentially dangerous site that hasn't already been flagged, please report it. To learn more about malware and how to protect yourself, check out StopBadware.org's help page.

Making search better in Catalonia, Estonia, and everywhere else



We recently began a series of posts on how we harness the power of data. Earlier we told you how data has been critical to the advancement of search, and how we use data to make our products safer and to prevent fraud. This post is the newest in the series. -Ed.

One of the most important uses of data at Google is building language models. By analyzing how people use language, we build models that enable us to interpret searches better, offer spelling corrections, understand when alternative forms of words are needed, offer language translation, and even suggest when searching in another language is appropriate.

One place we use these models is to find alternatives for words used in searches. For example, for both English and French users, "GM" often means the company "General Motors," but our language model understands that in French searches like seconde GM, it means "Guerre Mondiale" (World War), whereas in STI GM it means "Génie Mécanique" (Mechanical Engineering). Another meaning in English is "genetically modified," which our language model understands in GM corn. We've learned this based on the documents we've seen on the web and by observing that users will use both "genetically modified" and "GM" in the same set of searches.

We use similar techniques in all languages. For example, if a Catalan user searches for resultat elecció barris BCN (searching for the result of a neighborhood election in Barcelona), Google will also find pages that use the words "resultats" or "eleccions" or that talk about "Barcelona" instead of "BCN." And our language models also tell us that the Estonian user looking for Tartu juuksur, a barber in Tartu, might also be interested in a "juuksurisalong," or "barber shop."
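The idea that the same abbreviation expands differently depending on the words around it can be sketched in a few lines of Python. The table below is hand-written purely for illustration and uses the examples from the two paragraphs above; the post describes learning such relationships from web documents and search behavior, not from a fixed table, and the function name and selection rule are hypothetical.

```python
# Hypothetical, hand-written stand-in for a learned model: each term maps to
# candidate alternatives, each with context words that favor it. An empty
# context set marks the default reading.
ALTERNATIVES = {
    "gm": [
        ("guerre mondiale", {"seconde", "première"}),
        ("génie mécanique", {"sti"}),
        ("genetically modified", {"corn", "food", "crops"}),
        ("general motors", set()),
    ],
    "bcn": [("barcelona", set())],
    "resultat": [("resultats", set())],
    "elecció": [("eleccions", set())],
}


def expand(query: str) -> list[tuple[str, str]]:
    """Return (term, alternative) pairs chosen using the other words in the query."""
    words = query.lower().split()
    context = set(words)
    pairs = []
    for word in words:
        candidates = ALTERNATIVES.get(word)
        if not candidates:
            continue
        # Prefer an alternative whose context words appear in the query...
        chosen = next((alt for alt, ctx in candidates if ctx & context), None)
        # ...otherwise fall back to the default reading, if there is one.
        if chosen is None:
            chosen = next((alt for alt, ctx in candidates if not ctx), None)
        if chosen is not None:
            pairs.append((word, chosen))
    return pairs
```

Here `expand("seconde GM")` picks "guerre mondiale", `expand("GM corn")` picks "genetically modified", and `expand("resultat elecció barris BCN")` adds "resultats", "eleccions", and "barcelona" as alternatives.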

In the past, language models were built from dictionaries by hand. But such systems are incomplete and don't reflect how people actually use language. Because our language models are based on users' interactions with Google, they are more precise and comprehensive -- for example, they incorporate names, idioms, colloquial usage, and newly coined words not often found in dictionaries.

When building our models, we use billions of web documents and as much historical search data as we can, in order to have the most comprehensive understanding of language possible. We analyze how our users searched and how they revised their searches. By looking across the aggregated searches of many users, we can infer the relationships of words to each other.

Queries are not made in isolation -- analyzing a single search in the context of the searches before and after it helps us understand a searcher's intent and make inferences. Also, by analyzing how users modify their searches, we've learned related words, variant grammatical forms, spelling corrections, and the concepts behind users' information needs. (We're able to make these connections between searches using cookie IDs -- small pieces of data stored in visitors' browsers that allow us to distinguish different users. To understand how cookies work, watch this video.)
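Learning from how users revise their searches can be sketched as counting, in aggregate, which query tends to follow which within the same browser's searches. The Python below is a minimal illustration under stated assumptions: the log schema, the 10-minute gap, and the minimum count are all hypothetical, and real systems would use far more signals than consecutive pairs.

```python
from collections import Counter, defaultdict


def refinement_pairs(log, max_gap_s=600, min_count=2):
    """Count how often one query is followed by a different query from the
    same cookie ID within max_gap_s seconds.

    log: iterable of (cookie_id, timestamp_s, query) tuples (hypothetical schema).
    Only pairs seen at least min_count times are kept, so one-off rewrites drop out.
    """
    sessions = defaultdict(list)
    for cookie_id, ts, query in log:
        sessions[cookie_id].append((ts, query.strip().lower()))

    counts = Counter()
    for events in sessions.values():
        events.sort()  # chronological order within one browser's searches
        for (t1, q1), (t2, q2) in zip(events, events[1:]):
            if q1 != q2 and t2 - t1 <= max_gap_s:
                counts[(q1, q2)] += 1

    return Counter({pair: n for pair, n in counts.items() if n >= min_count})
```

If several browsers search for "gm corn" and shortly afterwards for "genetically modified corn", that pair accumulates a count, hinting that the two phrasings are related; a pair seen only once is discarded as noise.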

To provide more relevant search results, Google is constantly developing new techniques for language modeling and building better models. One element in building better language models is using more data collected over longer periods of time. In languages with many documents and users, such as English, our language models allow us to improve results deep into the "long tail" of searches, learning about rare usages. However, for languages with fewer users and fewer documents on the web, building language models can be a challenge. For those languages we need to work with longer periods of data to build our models. For example, it takes more than a year of searches in Catalan to provide a comparable amount of data as a single day of searching in English; for Estonian, more than two and a half years' worth of searching is needed to match a day of English. Having longer periods of data enables us to improve search for these less commonly used languages.
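The ratios implied by these figures can be made explicit. Writing $\bar V$ for average daily search volume in each language (notation introduced here), and assuming volume is roughly steady over time, with "a year" taken as 365 days:

```latex
365\,\bar V_{\mathrm{ca}} < \bar V_{\mathrm{en}}
  \;\Rightarrow\; \frac{\bar V_{\mathrm{ca}}}{\bar V_{\mathrm{en}}} < \frac{1}{365} \approx 0.27\%,
\qquad
912.5\,\bar V_{\mathrm{et}} < \bar V_{\mathrm{en}}
  \;\Rightarrow\; \frac{\bar V_{\mathrm{et}}}{\bar V_{\mathrm{en}}} < \frac{1}{912.5} \approx 0.11\%
```

By the same arithmetic, matching just one week of English data would take more than seven years of Catalan searches, and more than seventeen and a half years of Estonian searches.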

At Google, we want to ensure that we can help users everywhere find the things they're looking for; providing accurate, relevant results for searches in all languages worldwide is core to Google's mission. Building extensive models of historical usage in every language we can, especially when there are few users, is an essential piece of making search work for everyone, everywhere.

A common sense approach to Internet safety



Over the years, we've built tools and offered resources to help kids and families stay safe online. Our SafeSearch feature, for example, helps filter explicit content from search results.

We've also been involved in a variety of local initiatives to educate families about how to stay safe while surfing the web. Here are a few highlights:
  • Google India initiated "Be NetSmart," an Internet safety campaign created in cooperation with local law enforcement authorities that aims to educate students, parents, and teachers across the country about the great value the Internet can bring to their lives, while also teaching best practices for safe surfing.
  • Google Germany recently worked with the national government, industry representatives, and a number of local organizations to launch a search engine for children.
As part of these ongoing efforts to provide online safety resources for parents and kids, we've created Tips for Online Safety, a site designed to help families find quick links to safety tools like SafeSearch, as well as new resources, like a video offering online safety pointers that we've developed in partnership with Common Sense Media. In the video, Anne Zehren, president of Common Sense, offers easy-to-implement tips, like how to set privacy and sharing controls on social networking sites and the importance of having reasonable rules for Internet use at home with appropriate levels of supervision.

Users can also download our new Online Family Safety Guide (PDF), which includes useful Internet safety pointers for parents, or check out a quick tutorial on SafeSearch created by one of our partner organizations, GetNetWise.

We all have roles to play in keeping kids safe online. Parents need to be involved with their kids' online lives and teach them how to make smart decisions. And Internet companies like Google need to continue to empower parents and kids with tools and resources that help put them in control of their online experiences and make web surfing safer.

Using data to help prevent fraud



We recently began a series of posts on how we harness the power of data. Earlier we told you how data has been critical to the advancement of search technology. Then we shared how we use log data to help make Google products safer for users. This post is the newest in the series. -Ed.

Protecting our advertisers against click fraud is a lot like solving a crime: the more clues we have, the better we can determine which clicks to mark as invalid, so advertisers are not charged for them.

As we've mentioned before, our Ad Traffic Quality team built, and is constantly adding to, our three-stage system for detecting invalid clicks. The three stages are: (1) proactive real-time filters, (2) proactive offline analysis, and (3) reactive investigations.

So how do we use log information for click fraud detection? Our logs are where we get the clues for the detective work. They provide the repository of data we use to detect patterns, anomalous behavior, and other signals indicative of click fraud.

Millions of users click on AdWords ads every day. Every single one of those clicks -- and the even more numerous impressions associated with them -- is analyzed by our filters (stage 1), which operate in real time. This stage certainly uses our log data, but stages 2 and 3 rely even more heavily on deeper analysis of the data in our logs. For example, in stage 2, our team pores over millions of impressions and clicks -- as well as conversions -- over a longer time period. In combing through all this information, our team looks for unusual behavior across hundreds of different data points.
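The post doesn't describe what its real-time filters check. As a purely illustrative sketch, here is one simple rule a stage-1 filter could apply: a repeat click from the same IP address on the same ad within a short window is treated as invalid, so the advertiser isn't charged for it. The rule, the class name, and the 60-second window are assumptions, not Google's actual filters.

```python
class DuplicateClickFilter:
    """Hypothetical real-time filter: a click is invalid if the same
    (ip, ad_id) pair had an accepted click less than window_s seconds earlier."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        # (ip, ad_id) -> timestamp of the last click that was accepted.
        # A production filter would also expire old keys to bound memory.
        self.last_valid = {}

    def is_valid(self, ip: str, ad_id: str, ts: float) -> bool:
        key = (ip, ad_id)
        last = self.last_valid.get(key)
        if last is not None and ts - last < self.window_s:
            return False  # too soon after an accepted click: don't charge
        self.last_valid[key] = ts
        return True
```

Measuring the window from the last accepted click (rather than from any click) means a steady stream of rapid repeat clicks still lets through at most one charged click per window.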

IP addresses of computers clicking on ads are very useful data points. A simple use of IP addresses is determining where traffic comes from. That is, for a given publisher or advertiser, where are their clicks coming from? Are they all coming from one country or city? Is that normal for an ad of this type? We don't use this information to identify individuals; instead, we look at it in aggregate and study patterns. The information is imperfect, but analyzed in large volumes it is very helpful in preventing fraud. For example, an IP address usually tells us which ISP the person clicking is using. It is easy for people on most home Internet connections to get a new IP address by simply rebooting their DSL or cable modem. However, that new IP address will still be registered to their ISP, so additional ad clicks from that machine will still have something in common. Seeing an abnormally high number of clicks on a single publisher from the same ISP isn't necessarily proof of fraud, but it does look suspicious and raises a flag for us to investigate. Other information contained in our logs, such as the browser type and operating system of machines associated with ad clicks, is analyzed in similar ways.
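The ISP example can be sketched as an aggregate comparison: for each publisher, compare each ISP's share of that publisher's clicks with the ISP's share of all clicks. The Python below is a minimal illustration; the input schema, the 3x ratio, and the minimum-traffic cutoff are hypothetical, and it assumes IP addresses have already been mapped to ISPs. As the paragraph above says, a result is a flag for investigation, not proof of fraud.

```python
from collections import Counter


def suspicious_isp_concentration(clicks, ratio=3.0, min_clicks=50):
    """clicks: list of (publisher_id, isp) pairs (hypothetical schema).

    Returns (publisher, isp, share, baseline) tuples where the ISP's share of
    the publisher's clicks exceeds `ratio` times its share of all clicks.
    """
    overall = Counter(isp for _, isp in clicks)
    total = len(clicks)

    per_publisher = {}
    for publisher, isp in clicks:
        per_publisher.setdefault(publisher, Counter())[isp] += 1

    flags = []
    for publisher, isps in per_publisher.items():
        n = sum(isps.values())
        if n < min_clicks:
            continue  # too little traffic to judge either way
        for isp, k in isps.items():
            share = k / n
            baseline = overall[isp] / total
            if share > ratio * baseline:
                flags.append((publisher, isp, share, baseline))
    return flags
```

For instance, if 90 of a publisher's 100 clicks come from one ISP that accounts for only about a quarter of clicks overall, that publisher/ISP pair is flagged for a closer look.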

These data points are just a few examples of hundreds of different factors we take into account in click fraud detection. Without this information, and enough of it to identify fraud attempted over a longer time period, it would be extremely difficult to detect invalid clicks with a high degree of confidence, and proactively create filters that help optimize advertiser ROI. Of course, we don't need this information forever; last year we started anonymizing server logs after 18 months. As always, our goal is to balance the utility of this information (as we try to improve Google’s services for you) with the best privacy practices for our users.
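The post doesn't say how server logs are anonymized. A minimal Python sketch, assuming the approach is to truncate the IP address (zeroing the last octet of an IPv4 address, or everything past the first 48 bits of an IPv6 address) and drop the cookie ID once an entry is older than about 18 months. The field names and the 548-day cutoff are illustrative assumptions.

```python
import ipaddress
from datetime import datetime, timedelta

RETENTION = timedelta(days=548)  # roughly 18 months; the exact cutoff is assumed


def truncate_ip(ip: str) -> str:
    """Keep only the network part of an address: /24 for IPv4, /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_if_old(entry: dict, now: datetime) -> dict:
    """Return the entry unchanged if recent, else a copy with identifiers reduced."""
    if now - entry["time"] < RETENTION:
        return entry
    anonymized = dict(entry)
    anonymized["ip"] = truncate_ip(entry["ip"])
    anonymized.pop("cookie_id", None)  # hypothetical field name
    return anonymized
```

Truncating rather than deleting keeps coarse information (roughly which network a click came from) available for long-term aggregate analysis while removing the part of the address that points at a single machine.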

If you want to learn more about how we collect information to better detect click fraud, visit our Ad Traffic Quality Resource Center.

Using log data to help keep you safe



We recently began two new series of posts. The first, which explains how we harness data for our users, started with this post. The second, focusing on how we secure information and how users can protect themselves online, began here. This post is the second installment in both series. - Ed.

We sometimes get questions about what Google does with server log data, which records how users interact with our services. We take great care in protecting this data, and while we've talked previously about some of the ways it can be useful, we haven't yet covered the ways it can help us make Google products safer for our users.

While the Internet on the whole is a safe place, and most of us will never fall victim to an attack, there are more than a few threats out there, and we do everything we can to help you stay a step ahead of them. Any information we can gather on how attacks are launched and propagated helps us do so.

That's where server log data comes in. We analyze logs for anomalies or other clues that might suggest malware or phishing attacks in our search results, attacks on our products and services, and other threats to our users. And because we have a reasonably significant data sample, with logs stretching back several months, we're able to perform aggregate, long-term analyses that can uncover new security threats, provide greater understanding of how previous threats impacted our users, and help us ensure that our threat detection and prevention measures are properly tuned.
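As a generic illustration of long-term log analysis (not a description of Google's actual methods, which the post deliberately keeps private), here is a Python sketch that flags days on which some daily signal, such as the number of queries matching a suspicious pattern, jumps far above its recent history. The trailing-window z-score is a swapped-in, textbook technique; the 28-day window and threshold of 4 are assumptions.

```python
from statistics import mean, pstdev


def anomalous_days(daily_counts, window=28, threshold=4.0):
    """Return indices of days whose count exceeds the mean of the previous
    `window` days by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history) or 1.0  # avoid dividing by zero on a flat history
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

A longer history gives a steadier baseline, which is one reason the paragraph above stresses having several months of logs: a short window would make ordinary fluctuations look like attacks, or hide slow-building ones.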

We can't share too much detail (we need to be careful not to provide too many clues on what we look for), but we can use historical examples to give you a better idea of how this kind of data can be useful. One good example is the Santy search worm (PDF), which first appeared in late 2004. Santy used combinations of search terms on Google to identify and then infect vulnerable web servers. Once a web server was infected, it became part of a botnet and started searching Google for more vulnerable servers. Spreading in this way, Santy quickly infected thousands and thousands of web servers across the Internet.

As soon as Google recognized the attack, we began developing a series of tools to automatically generate "regular expressions" that could identify potential Santy queries and then block them from accessing Google.com or flag them for further attention. But because regular expressions like these can sometimes snag legitimate user queries too, we designed the tools so they'd test new expressions in our server log databases first, in order to determine how each one would affect actual user queries. If it turned out that a regular expression affected too many legitimate user queries, the tools would automatically adjust the expression, analyze its performance against the log data again, and then repeat the process as many times as necessary.
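The test-against-the-logs loop described above can be sketched in Python. The post's tools generated and adjusted expressions automatically; this simplified sketch substitutes a fixed list of candidate patterns, ordered from broadest to narrowest, and picks the first one that still catches every known attack query while matching only a tiny fraction of logged legitimate queries. The query strings, pattern list, and 0.1% budget are made up for illustration and are not Santy's real queries.

```python
import re


def choose_pattern(candidates, attack_queries, legit_queries, max_fp_rate=0.001):
    """Return (pattern, false_positive_rate) for the first candidate regex that
    matches every known attack query and at most max_fp_rate of the logged
    legitimate queries, or None if no candidate qualifies."""
    for pattern in candidates:
        rx = re.compile(pattern, re.IGNORECASE)
        if not all(rx.search(q) for q in attack_queries):
            continue  # too narrow: would let part of the attack through
        false_positives = sum(1 for q in legit_queries if rx.search(q))
        rate = false_positives / max(len(legit_queries), 1)
        if rate <= max_fp_rate:
            return pattern, rate
        # Too broad: snags real users' queries, so try a narrower candidate.
    return None
```

For example, with hypothetical attack queries like `inurl:vulnapp.php id=42`, the broad pattern `vulnapp\.php` would be rejected because it also matches a user searching "how to fix vulnapp.php errors", while `inurl:vulnapp\.php` would be accepted.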

In this instance, having access to a good sample of log data meant we were able to refine one of our automated security processes, and the result was a more effective resolution of the problem. In other instances, the data has proven useful in minimizing certain security threats, or in preventing others completely. In the end, what this means is that whenever you use Google search, or Google Apps, or any of our other services, your interactions with those products help us learn more about security threats that could impact your online experience. And the better the data we have, the more effectively we can protect all our users.

How Google keeps your information secure



As many of you know, we spend a lot of time around here thinking about new products to help you run your life more efficiently, whether that’s organizing email in a better way, sharing pictures with friends, or collaborating in real time on documents. What you may not know is that we also spend a lot of time thinking about the security that goes into those products, and more specifically the ways we can protect you and your private information.

While chances are you'll never have a security problem, we take security very seriously, which is why we have some of the best engineers in the world working here to secure information. Much of their work is confidential, but we do want to share some of the ways we're protecting your data. There are a few things you should know about how we handle confidential information:
  • Philosophy: First is our philosophy. At Google, security is a continuous process. We don't just "check" a product for security before we launch it -- we are thinking about security before the product is even created, and we are building it in throughout the product's development. Also critical is our belief in layered protection. It's much like securing your house. You put your most private information in a safe. You secure the safe in your house, which is protected with locks and possibly an alarm system. And then you have the neighborhood watch program or the local police monitoring your neighborhood. It's very similar at Google. Our most sensitive information is difficult to find or access (the safe). Our network and facilities (the house) are protected in both high- and low-tech ways: encryption, alarms, and other technology for our systems, and strong physical security at our facilities. And finally, we've learned that security is done best as a community effort (the neighborhood); we encourage everyone to help us identify potential problems and solutions. Researchers who work at security and technology companies all over the world are constantly looking for security problems on the Internet, and we work closely with that community to find and fix potential problems.
  • Technology: These layers of protection are built on the best security technology in the world. While we employ products developed by others in the security community, we build a lot of our security technology ourselves. Some of the most innovative components of our security architecture focus on automation and scale. These are important to us because we're handling searches, emails, and other activities for millions of users every day. To keep our security processes a step ahead, we automate the way we test our software for possible security vulnerabilities and the way we monitor for possible security attacks. We're also constantly seeking more ways to use encryption and other technical measures to protect your data, while still maintaining a great user experience.
  • Process: In addition to technology, we have a set of processes that dictate how we secure confidential information at Google and who can access it. We carefully manage access to confidential information of any sort, and very few Googlers have access to what we consider very sensitive data. This is in no small part because there's very little reason for us to provide that access -- most of our processes are automated, and don't require much human intervention. Of course, the limited number of people who are granted access to sensitive data must have special approval. And while we hold ourselves to a very high standard, we also work to ensure that our processes meet (and in many cases exceed) industry standards. These include audits for Sarbanes-Oxley, SAS 70, PCI (payment card industry) compliance, and more. By working with independent auditors, who evaluate compliance with standards that hold hundreds of different companies to very rigorous requirements, we add another layer of checks and balances to our security processes.
  • People: The most important part of our approach to security is our people. Google employs some of the best and brightest security engineers in the world. Many of our engineers came from very high-profile security environments, such as banks, credit card companies, and high-volume retail organizations, and a large number of them hold PhDs and patents in security and software engineering. As you can imagine, our engineers are smart and curious and are on the lookout for security anomalies and best practices in the industry. Our engineers have published hundreds of academic papers on technically detailed topics such as drive-by downloads that install malware (PDF file) or hostile virtualized environments. (You can find some of these papers here.) What's more, we cultivate a collaborative approach to security among all of our engineers, requiring everyone to pass a coding style review (which enables us to control the type of code used here and how it's used in order to prevent software problems) and ensuring that all code at Google is reviewed by multiple engineers so that it meets our software and security standards.
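The combination of layered protection and tightly limited access to sensitive data can be illustrated with a toy access check in Python. The layers, class names, and fields below are hypothetical and do not describe Google's actual controls; the point is only that each layer must pass independently, and that sensitive data additionally requires an explicit, per-person approval.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    authenticated: bool
    on_protected_network: bool
    sensitivity: str  # "normal" or "sensitive" (hypothetical labels)


@dataclass
class AccessPolicy:
    # People granted special approval for sensitive data (hypothetical).
    approved_for_sensitive: set = field(default_factory=set)

    def allow(self, req: Request) -> bool:
        # Layer 1: identity must be established.
        if not req.authenticated:
            return False
        # Layer 2: the request must come from a protected network.
        if not req.on_protected_network:
            return False
        # Layer 3: sensitive data needs an explicit, per-person approval.
        if req.sensitivity == "sensitive":
            return req.user in self.approved_for_sensitive
        return True
```

Because every layer is checked on its own, a failure in one (say, a stolen password that gets past layer 1) still leaves the others standing, which is the same reasoning as the safe inside the locked house.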
And throughout the company, we use our own products. That means we protect your information with the same security that we use to protect our own company emails and documents. And while we continue to innovate with our products, we'll also continue to innovate in the world of security. For more on our approach to security, visit our Security and Product Safety page.