Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:

    Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future, such as through improperly sanitized blog comments and guestbook entries, which facilitates stored XSS.

    Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim visits the tampered link, the page is loaded along with the injected script that is executed in the context of the victim's session.

The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.

The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker is marked in red.

  • Injection in regular HTML body - angled brackets not filtered or escaped

    <b>Your query '<script>evil_script()</script>' returned xxx results</b>

  • Injection inside tag attributes - double quote not filtered or escaped

    <form ...
      <input name="q" value="blah"><script>evil_script()</script>">

  • Injection inside URL attributes - non-http(s) URL

    <img src="javascript:evil_script()">...</img>

  • In JavaScript context - single quote not filtered or escaped

      var msg = 'blah'; evil_script(); //';
      // do something with msg variable

In the cases where XSS arises from meta characters being inserted from untrusted sources into an HTML document, the issue can be avoided either by filtering/disallowing the meta characters, or by escaping them appropriately for the given HTML context. For example, the HTML meta characters <, >, &, " and ' must be replaced with their corresponding HTML entity references &lt;, &gt;, &amp;, &quot; and &#39 respectively. In a JavaScript-literal context, inserting a backslash in front of \, ', " and converting the carriage returns, line-feeds and tabs into \r, \n and \t respectively should avoid untrusted meta characters being interpreted as code.

How about an automated tool for finding XSS problems in web applications? Our security team has been developing a black box fuzzing tool called Lemon (deriving from the commonly-recognized name for a defective product). Fuzz testing (also referred to as fault-injection testing) is an automated testing approach based on supplying inputs that are designed to trigger and expose flaws in the application. Our vulnerability testing tool enumerates a web application's URLs and corresponding input parameters. It then iteratively supplies fault strings designed to expose XSS and other vulnerabilities to each input, and analyzes the resulting responses for evidence of such vulnerabilities. Although it started out as an experimental tool, it has proved to be quite effective in finding XSS problems. Besides XSS, it finds other security problems such as response splitting attacks, cookie poisoning problems, stacktrace leaks, encoding issues and charset bugs. Since the tool is homegrown it is easy to integrate into our automated test environment and to extend based on specific needs. We are constantly in the process of adding new attack vectors to improve the tool against known security problems.

I wanted to respond to a few questions that seem to be common among readers. I've listed them below. Thanks for the feedback. Please keep the questions and comments coming.

Q. Does Google plan to market it at some point?
A. Lemon is highly customized for Google apps and we have no plans of releasing it in near future.

Q. Did Google's security team check out any commercially available fuzzers? Is the ability to keep improving the fuzzer the main draw of a homegrown tool?
A. We did evaluate commercially available fuzzers but felt that our specialized needs could be served best by developing our own tools.