Contributing To Open Source Software Security

Monday, May 5, 2008 11:38 AM



From operating systems to web browsers, open source software plays a critical role in the operation of the Internet. The security of open source software is therefore quite important, as it often interacts with personal information -- ranging from credit card numbers to medical records -- that needs to be kept safe. There has been a long-lived discussion on whether open source software is inherently more secure than closed source software. While popular opinion has begun to tilt in favor of openness, there are still arguments for both sides. Instead of diving into those treacherous waters (or giving weight to the idea of "inherent security"), I'd like to focus on the fruits of this extensive discussion. In particular, David A. Wheeler laid out a "bottom line" in his Secure Programming for Linux and Unix HOWTO which applies to both open and closed source software. It predicates real security in software on three actions:

  1. people need to actually review the code

  2. developers/reviewers need to know how to write secure code

  3. once found, security problems need to be fixed quickly, and their fixes distributed quickly


While distilling anything down to three steps makes it seem easy, this isn't necessarily the case. Given how important open source software is to Google, we've attempted to contribute to this bottom line. As Chris said before, our engineers are encouraged to contribute both software and time to open source efforts. We regularly submit the results of our automated and manual security analysis of open source software back to the community, including related software engineering time. In addition, our engineering teams frequently release software under open source licenses. This software was written either with security in mind, such as with security testing tools, or by engineers well-versed in the security challenges of their project.

These efforts leave one area completely unaddressed -- getting security problems fixed quickly, and then getting those fixes distributed quickly. It has been unclear how to best resolve this issue. There is no centralized security authority for open source projects, and operating system distribution publishers are the best bet for getting updates to the highest number of users. Even if users can get updates in this manner, how should a security researcher contact a particular project's author? If there's a potential, security-related issue, who can help evaluate the risk for a project? What resources are there for projects that have been compromised, but have no operational security background?

I'm proud to announce that Google has sponsored participation in oCERT, the open source computer emergency response team. oCERT is a volunteer workforce of security professionals from the open source community with the goal of providing security vulnerability mediation and incident response services to open source projects. It will strive to contact software authors with all security reports and aid in debugging and patching, especially in cases where the author, or the reporter, doesn't have a background in security. Reliable contacts for projects, publishers, and vendors will be maintained where possible and used for notification when issues arise and fixes are available for mediated issues. Additionally, oCERT will aid projects of any size with responses to security incidents, such as server compromises.

It is my hope that this initiative will not only aid in remediating security issues in a timely fashion, but also provide a means for additional security contributions to the open source community.

All Your iFrame Are Point to Us

Monday, February 11, 2008 1:57 PM



It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed. Our research paper is currently under peer review, but we are making a technical report [PDF] available now. Although our technical report contains a lot more detail, we present some high-level findings here:

Search Results Containing a URL Labeled as Harmful


The above graph shows the percentage of daily queries that contain at least one search result labeled as harmful. In the past few months, more than 1% of all search results contained at least one result that we believe to point to malicious content and the trend seems to be increasing.

Browsing Habits

Good computer hygiene, such as running automatic updates for the operating system and third-party applications, as well as installing anti-virus products goes a long way in protecting your home computer. However, we have been wondering if users' browsing habits impact the likelihood of encountering malicious web pages. To study this aspect, we took a sample of ~7 million URLs and mapped them to DMOZ categories. Although we found that adult web pages may increase the risk of exploitation, each DMOZ category was affected.

Malicious Content Injection

To understand if malicious content on a web server is due to poor web server security, we analyzed the version numbers reported by web servers on which we found malicious pages. Specifically, we looked at the Apache and the PHP versions exported as part of a server's response. We found that over 38% of both Apache and PHP versions were outdated increasing the risk of remote content injection to these servers.

Our "Ghost In the Browser [PDF]" paper highlighted third-party content as one potential vector of malicious content. Today, a lot of third-party content is due to advertising. To assess the extent to which advertising contributes to drive-by downloads, we analyze the distribution chain of malware, i.e. all the intermediary URLs a browser downloads before reaching a malware payload. We inspected each distribution chain for membership in about 2,000 known advertising networks. If any URL in the distribution chain corresponds to a known advertising network, we count the whole page as being infectious due to Ads. In our analysis, we found that on average 2% of malicious web sites were delivering malware via advertising. The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected, any advertising networks practicing syndication needs to carefully study this problem. Our technical report [PDF] contains more detail including an analysis based on the popularity of web sites.

Structural Properties of Malware Distribution


Finally, we also investigated the structural properties of malware distribution sites. Some malware distribution sites had as many as 21,000 regular web sites pointing to them. We also found that the majority of malware was hosted on web servers located in China. Interestingly, Chinese malware distribution sites are mostly pointed to by Chinese web servers.

We hope that an analysis such as this will help us to better understand the malware problem in the future and allow us to protect users all over the Internet from malicious web sites as best as we can. One thing is clear - we have a lot of work ahead of us.

Help us fill in the gaps!

Thursday, November 29, 2007 2:28 PM



We've been targeting malware for over a year and a half, and these efforts are paying off. We are now able to display warnings in search results when a site is known to be malicious, which can help you avoid drive-by downloads and other computer compromises. We are already distributing this data through the Safe Browsing API, and we are working on bringing this protection to more users by integrating with more Google products. While these are great steps, we need your help going forward!

Currently, we know of hundreds of thousands of websites that attempt to infect people's computers with malware. Unfortunately, we also know that there are more malware sites out there. This is where we need your help in filling in the gaps. If you come across a site that is hosting malware, we now have an easy way for you to let us know about it. If you come across a site that is hosting malware, please fill out this short form. Help us keep the internet safe, and report sites that distribute malware.

Auditing open source software

Monday, October 8, 2007 4:13 PM



Google encourages its employees to contribute back to the open source community, and there is no exception in Google's Security Team. Let's look at some interesting open source vulnerabilities that were located and fixed by members of Google's Security team. It is interesting to classify and aggregate the code flaws leading to the vulnerabilities, to see if any particular type of flaw is more prevalent.
  1. JDK. In May 2007, I released details on an interesting bug in the ICC profile parser in Sun's JDK. The bug is particularly interesting because it could be exploited by an evil image. Most previous JDK bugs involve a user having to run a whole evil applet. The key parts of code which demonstrate the bug are as follows:

    TagOffset = SpGetUInt32 (&Ptr);
    if (ProfileSize < TagOffset)
      return SpStatBadProfileDir;
    ...
    TagSize = SpGetUInt32 (&Ptr);
    if (ProfileSize < TagOffset + TagSize)
      return SpStatBadProfileDir;
    ...
    Ptr = (KpInt32_t *) malloc ((unsigned int)numBytes+HEADER);

    Both TagSize and TagOffset are untrusted unsigned 32-bit values pulled out of images being parsed. They are added together, causing a classic integer overflow condition and the bypass of the size check. A subsequent additional integer overflow in the allocation of a buffer leads to a heap-based buffer overflow.

  2. gunzip. In September 2006, my colleague Tavis Ormandy reported some interesting vulnerabilities in the gunzip decompressor. They were triggered when an evil compressed archive is decompressed. A lot of programs will automatically pass compressed data through gunzip, making it an interesting attack. The key parts of the code which demonstrate one of the bugs are as follows:

    ush count[17], weight[17], start[18], *p;
    ...
    for (i = 0; i < (unsigned)nchar; i++) count[bitlen[i]]++;

    Here, the stack-based array "count" is indexed by values in the "bitlen" array. These values are under the control of data in the incoming untrusted compressed data, and were not checked for being within the bounds of the "count" array. This led to corruption of data on the stack.


  3. libtiff. In August 2006, Tavis reported a range of security vulnerabilities in the libtiff image parsing library. A lot of image manipulation programs and services will be using libtiff if they handle TIFF format files. So, an evil TIFF file could compromise a lot of desktops or even servers. The key parts of the code which demonstrate one of the bugs are as follows:

    if (sp->cinfo.d.image_width != segment_width ||
        sp->cinfo.d.image_height != segment_height) {
      TIFFWarningExt(tif->tif_clientdata, module,
        "Improper JPEG strip/tile size, expected %dx%d, got %dx%d",

    Here, a TIFF file containing a JPEG image is being processed. In this case, both the TIFF header and the embedded JPEG image contain their own copies of the width and height of the image in pixels. This check above notices when these values differ, issues a warning, and continues. The destination buffer for the pixels is allocated based on the TIFF header values, and it is filled based on the JPEG values. This leads to a buffer overflow if a malicious image file contains a JPEG with larger dimensions than those in the TIFF header. Presumably the intent here was to support broken files where the embedded JPEG had smaller dimensions than those in the TIFF header. However, the consequences of larger dimensions that those in the TIFF header had not been considered.

We can draw some interesting conclusions from these bugs. The specific vulnerabilities are integer overflows, out-of-bounds array accesses and buffer overflows. However, the general theme is using an integer from an untrusted source without adequately sanity checking it. Integer abuse issues are still very common in code, particular code which is decoding untrusted binary data or protocols. We recommend being careful using any such code until it has been vetted for security (by extensive code auditing, fuzz testing, or preferably both). It is also important to watch for security updates for any decoding software you use, and keep patching up to date.

Information flow tracing and software testing

Monday, September 17, 2007 9:32 AM



Security testing of applications is regularly performed using fuzz testing. As previously discussed on this blog, Srinath's Lemon uses a form of smart fuzzing. Lemon is aware of classes of web application threats and the input families which trigger them, but not all fuzz testing frameworks have to be this complicated. Fuzz testing originally relied on purely random data, ignorant of specific threats and known dangerous input. Today, this approach is often overlooked in favor of more complicated techniques. Early sanity checks in applications looking for something as a simple as a version number may render testing with completely random input ineffective. However, the newer, more complicated fuzz testers require a considerable initial investment in the form of complete input format specifications or the selection of a large corpus of initial input samples.

At WOOT'07,I presented a paper on Flayer, a tool we developed internally to augment our security testing efforts. In particular, it allows for a fuzz testing technique that compromises between the original idea and the most complicated. Flayer makes it possible to remove input sanity checks at execution time. With the small investment of identifying these checks, Flayer allows for completely random testing to be performed with much higher efficacy. Already, we've uncovered multiple vulnerabilities in Internet-critical software using this approach.

The way that Flayer allows for sanity checks to be identified is perhaps the more interesting point. Flayer uses a dynamic analysis framework to analyze the target application at execution time. Flayer marks, or taints, input to the program and traces that data throughout its lifespan. Considerable research has been done in the past regarding information flow tracing using dynamic analysis. Primarily, this work has been aimed at malware and exploit detection and defense. However, none of the resulting software has been made publicly available.

While Flayer is still in its early stages, it is available for download under the GNU Public License. External contributions and feedback are encouraged!

Automating web application security testing

Monday, July 16, 2007 11:40 AM



Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:

    Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future, such as through improperly sanitized blog comments and guestbook entries, which facilitates stored XSS.

    Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim visits the tampered link, the page is loaded along with the injected script that is executed in the context of the victim's session.

The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.

The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker is marked in red.

  • Injection in regular HTML body - angled brackets not filtered or escaped

    <b>Your query '<script>evil_script()</script>' returned xxx results</b>

  • Injection inside tag attributes - double quote not filtered or escaped

    <form ...
      <input name="q" value="blah"><script>evil_script()</script>">
    </form>

  • Injection inside URL attributes - non-http(s) URL

    <img src="javascript:evil_script()">...</img>

  • In JavaScript context - single quote not filtered or escaped

    <script>
      var msg = 'blah'; evil_script(); //';
      // do something with msg variable
    </script>


In the cases where XSS arises from meta characters being inserted from untrusted sources into an HTML document, the issue can be avoided either by filtering/disallowing the meta characters, or by escaping them appropriately for the given HTML context. For example, the HTML meta characters <, >, &, " and ' must be replaced with their corresponding HTML entity references &lt;, &gt;, &amp;, &quot; and &#39 respectively. In a JavaScript-literal context, inserting a backslash in front of \, ', " and converting the carriage returns, line-feeds and tabs into \r, \n and \t respectively should avoid untrusted meta characters being interpreted as code.

How about an automated tool for finding XSS problems in web applications? Our security team has been developing a black box fuzzing tool called Lemon (deriving from the commonly-recognized name for a defective product). Fuzz testing (also referred to as fault-injection testing) is an automated testing approach based on supplying inputs that are designed to trigger and expose flaws in the application. Our vulnerability testing tool enumerates a web application's URLs and corresponding input parameters. It then iteratively supplies fault strings designed to expose XSS and other vulnerabilities to each input, and analyzes the resulting responses for evidence of such vulnerabilities. Although it started out as an experimental tool, it has proved to be quite effective in finding XSS problems. Besides XSS, it finds other security problems such as response splitting attacks, cookie poisoning problems, stacktrace leaks, encoding issues and charset bugs. Since the tool is homegrown it is easy to integrate into our automated test environment and to extend based on specific needs. We are constantly in the process of adding new attack vectors to improve the tool against known security problems.

Update:
I wanted to respond to a few questions that seem to be common among readers. I've listed them below. Thanks for the feedback. Please keep the questions and comments coming.

Q. Does Google plan to market it at some point?
A. Lemon is highly customized for Google apps and we have no plans of releasing it in near future.

Q. Did Google's security team check out any commercially available fuzzers? Is the ability to keep improving the fuzzer the main draw of a homegrown tool?
A. We did evaluate commercially available fuzzers but felt that our specialized needs could be served best by developing our own tools.

The reason behind the "We're sorry..." message

Monday, July 9, 2007 11:54 AM




Some of you might have seen this message while searching on Google, and wondered what the reason behind it might be. Instead of search results, Google displays the "We're sorry" message when we detect anomalous queries from your network. As a regular user, it is possible to answer a CAPTCHA - a reverse Turing test meant to establish that we are talking to a human user - and to continue searching. However, automated processes such as worms would have a much harder time solving the CAPTCHA. Several things can trigger the sorry message. Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on.  This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.

At ACM WORM 2006, we published a paper on Search Worms [PDF] that takes a much closer look at this phenomenon. Santy, one of the search worms we analyzed, looks for remote-execution vulnerabilities in the popular phpBB2 web application. In addition to exhibiting worm like propagation patterns, Santy also installs a botnet client as a payload that connects the compromised web server to an IRC channel. Adversaries can then remotely control the compromised web servers and use them for DDoS attacks, spam or phishing. Over time, the adversaries have realized that even though a botnet consisting of web servers provides a lot of aggregate bandwidth, they can increase leverage by changing the content on the compromised web servers to infect visitors and in turn join the computers of compromised visitors into much larger botnets. This fundamental change from remote attack to client based download of malware formed the basis of the research presented in our first post. In retrospect, it is interesting to see how two seemingly unrelated problems are tightly connected.

Phishers and Malware authors beware!

Monday, June 18, 2007 2:59 PM



OK, so it might be a little early to declare victory, but we're excited about the Safe Browsing API we launched today. It provides a simple mechanism for downloading Google's lists of suspected phishing and malware URLs, so now any developer can access the blacklists used in products such as Firefox and Google Desktop.

The API is still experimental, but we hope it will be useful to ISPs, web-hosting companies, and anyone building a site or an application that publishes or transmits user-generated links. Sign up for a key and let us know how we can make the API better. We fully expect to iterate on the design and improve the data behind the API, and we'll be paying close attention to your feedback as we do that. We look forward to hearing your thoughts.

Thwarting a large-scale phishing attack

Monday, June 11, 2007 11:35 AM



In addition to targeting malware, we're interested in combating phishing, a social engineering attack where criminals attempt to lure unsuspecting web surfers into logging into a fake website that looks like a real website, such as eBay, E-gold or an online bank. Following a successful attack, phishers can steal money out of the victims' accounts or take their identities. To protect our users against phishing, we publish a blacklist of known phishing sites. This blacklist is the basis for the anti-phishing features in the latest versions of Firefox and Google Desktop. Although blacklists are necessarily a step behind as phishers move their phishing pages around, blacklists have proved to be reasonably effective.

Not all phishing attacks target sites with obvious financial value. Beginning in mid-March, we detected a five-fold increase in overall phishing page views. It turned out that the phishing pages generating 95% of the new phishing traffic targeted MySpace, the popular social networking site. While a MySpace account does not have any intrinsic monetary value, phishers had come up with ways to monetize this attack. We observed hijacked accounts being used to spread bulletin board spam for some advertising revenue. According to this interview with a phisher, phishers also logged in to the email accounts of the profile owners to harvest financial account information. In any case, phishing MySpace became profitable enough (more than phishing more traditional targets) that many of the active phishers began targeting it.

Interestingly, the attack vector for this new attack appeared to be MySpace itself, rather than the usual email spam. To observe the phishers' actions, we fed them the login information for a dummy MySpace account. We saw that when phishers compromised a MySpace account, they added links to their phishing page on the stolen profile, which would in turn result in additional users getting compromised. Using a quirk of the CSS supported in MySpace profiles, the phishers injected these links invisibly as see-through images covering compromised profiles. Clicking anywhere on an infected profile, including on links that appeared normal, redirected the user to a phishing page. Here's a sample of some CSS code injected into the "About Me" section of an affected profile:


<a style="text-decoration:none;position:
absolute;top:1px;left:1px;" href="http://myspacev.net"><img
style="border-width:0px;width:1200px; height:650px;"
src="http://x.myspace.com/images/clear.gif"></a></style>


In addition to contributing to the viral growth of the phishing attack, linking directly off of real MySpace content added to the appearance of legitimacy of these phishing pages. In fact, we received thousands of complaints from confused users along the lines of "Why won't it let any of my friends look at my pictures?" regarding our warnings on these phishing pages, suggesting that even an explicit warning was not enough to protect many users. The effectiveness of the attack and the increasing sophistication of the phishing pages, some of which were hosted on botnets and were near perfect duplications of MySpace's login page, meant that we needed to switch tactics to combat this new threat.

In late March, we reached out to MySpace to see what we could do to help. We provided lists of the top phishing sites and our anti-phishing blacklist to MySpace so that they could disable compromised accounts with links to those sites. Unfortunately, many of the blocked users did not remove the phishing links when they reactivated their accounts, so the attacks continued to spread. On April 19, MySpace updated their server software so that they could disable bad links in users' profiles without requiring any user action or altering any other profile content. Overnight, overall phishing traffic dropped by a factor of five back to the levels observed in early March. While MySpace phishing continues at much lower volumes, phishers are beginning to move on to new targets.

Things you can do to help end phishing and Internet fraud
  • Learn to recognize and avoid phishing. The Anti-Phishing Working Group has a good list of recommendations.

  • Update your software regularly and run an anti-virus program. If a cyber-criminal gains control of your computer through a virus or a software security flaw, he doesn't need to resort to phishing to steal your information.

  • Use different passwords on different sites and change them periodically. Phishers routinely try to log in to high-value targets, like online banking sites, with the passwords they steal for lower-value sites, like webmail and social networking services.

Web Server Software and Malware

Tuesday, June 5, 2007 9:30 AM

Posted by Nagendra Modadugu, Anti-Malware Team

In this post, we investigate the distribution of web server software to provide insight into how server software is correlated to servers hosting malware binaries or engaging in drive-by-downloads.

We determine server operating system by examining the 'Server:' HTTP header reported by most web servers. A survey of servers running roughly 80 million domain names reveals the web server software distribution shown below. Note that these figures may have some margin of error as it is not unusual to find hundreds of domains served by a single IP address.

Web server software across the Internet.



Web server software distribution across the Internet.



Our numbers report a slightly larger fraction of Apache servers compared to the Netcraft web server survey. Our analysis is based on crawl information and only root URLs were examined, therefore hosts that did not present a root URL (e.g. /index.htm) were not included in the statistics. This may have contributed to the disparity with the Netcraft numbers.

Amongst Apache servers, about 35% did not report any version information. Presumably the lack of version information is considered to be a defense against version specific attacks and worms. We observed a long tail of Apache server versions; the top three detected were 1.3.37 (15%), 1.3.33 (7.91%), and 2.0.54 (6.25%).

Amongst Microsoft servers, IIS 6.0 is by far the most popular version, making up about 80% of all IIS servers. IIS 5.0 made up most of the remainder.

Web server software across servers distributing malware.

We examined about 70,000 domains that over the past month have been either distributing malware or have been responsible for hosting browser exploits leading to drive-by-downloads. The breakdown by server software is depicted below. It is important to note that while many servers serve malware as a result of a server compromise (by remote exploits, password theft via keyloggers, etc.), some servers are configured to serve up exploits by their administrators.



Web server software distribution across malicious servers.


Compared to our sample of servers across the Internet, Microsoft IIS features twice as often (49% vs. 23%) as a malware distributing server. Amongst Microsoft IIS servers, the share of IIS 6.0 and IIS 5.0 remained the same at 80% and 20% respectively.

The distribution of top featured Apache server versions was different this time: 1.3.37 (50%), 1.3.34 (12%) and 1.3.33 (5%). 21% of the Apache servers did not report any version information. Incidentally, version 1.3.37 is the latest Apache server release in the 1.3 series, and it is hence somewhat of a surprise that this version features so prominently. One other factor we observe is a vast collection of Apache modules in use.

Distribution of web server software by country.





Web server distribution by country



Malicious web server distribution by country




The figure on the left shows the distribution of all Apache, IIS, and nginx webservers by country. Apache has the largest share, even though there is noticeable variation between countries. The figure on the right shows the distribution, by country, of webserver software of servers either distributing malware or hosting browser exploits. It is very interesting to see that in China and South Korea, a malicious server is much more likely to be running IIS than Apache.

We suspect that the causes for IIS featuring more prominently in these countries could be due to a combination of factors: first, automatic updates have not been enabled due to software piracy (piracy statistics from NationMaster, and BSA), and second, some security patches are not available for pirated copies of Microsoft operating systems. For instance the patch for a commonly seen ADODB.Stream exploit is not available to pirated copies of Windows operating systems.

Overall, we see a mix of results. In Germany, for instance, Apache is more likely to be serving malware than Microsoft IIS, compared to the overall distributions of these servers. In Asia, we see the reverse, which is part of the cause of Microsoft IIS having a disproportionately high representation at 49% of malware servers. In summary, our analysis demonstrates how important it is to keep web servers patched to the latest patch level.