Automating web application security testing

Monday, July 16, 2007 11:40 AM

Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:

    Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are stored by the application and later executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future. Improperly sanitized blog comments and guestbook entries, for example, facilitate stored XSS.

    Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim follows the tampered link, the page is loaded along with the injected script, which is executed in the context of the victim's session.
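To make the reflected case concrete, here is a minimal sketch in Python of a hypothetical search page that echoes its 'q' parameter. The host example.com and all function names are illustrative assumptions. The safe variant relies on the standard library's html.escape, which replaces &, <, >, " and ' with entity references.

```python
import html
from urllib.parse import parse_qs, urlencode, urlparse


def craft_url(payload: str) -> str:
    # The attacker builds a link to the (hypothetical) vulnerable search page.
    return "https://example.com/search?" + urlencode({"q": payload})


def query_from_url(url: str) -> str:
    # The server decodes the 'q' parameter exactly as the attacker encoded it.
    return parse_qs(urlparse(url).query)["q"][0]


def render_results_unsafe(q: str) -> str:
    # Vulnerable: q is echoed verbatim, so any markup in it becomes live HTML.
    return f"<b>Your query '{q}' returned 0 results</b>"


def render_results_safe(q: str) -> str:
    # Escaped: markup in q is displayed as text instead of being parsed.
    return f"<b>Your query '{html.escape(q)}' returned 0 results</b>"


if __name__ == "__main__":
    link = craft_url("<script>evil_script()</script>")
    q = query_from_url(link)
    print(render_results_unsafe(q))  # the script tag survives intact
    print(render_results_safe(q))    # the script tag is rendered as text
```

The URL round trip matters: percent-encoding in the link does not protect the victim, because the server decodes the parameter back to the original payload before echoing it.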

The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.

The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker was marked in red in the original post; in each snippet below, it is the value supplied as 'q'.

  • Injection in regular HTML body - angle brackets not filtered or escaped

    <b>Your query '<script>evil_script()</script>' returned xxx results</b>

  • Injection inside tag attributes - double quote not filtered or escaped

    <form ...
      <input name="q" value="blah"><script>evil_script()</script>">

  • Injection inside URL attributes - non-http(s) URL

    <img src="javascript:evil_script()">...</img>

  • In JavaScript context - single quote not filtered or escaped

      var msg = 'blah'; evil_script(); //';
      // do something with msg variable
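Escaping HTML meta characters alone does not neutralize the URL-attribute case, since javascript:evil_script() contains none of them. One common approach, sketched here under the assumption that only absolute http and https URLs are acceptable (as the "non-http(s) URL" heading suggests), is to check the scheme before emitting the value. The function name safe_url_attr and the "#" fallback are hypothetical choices, and this sketch deliberately rejects relative URLs as well.

```python
import html
from urllib.parse import urlsplit

# Assumption: only absolute http(s) URLs may appear in URL attributes.
ALLOWED_SCHEMES = {"http", "https"}


def safe_url_attr(url: str) -> str:
    """Return a value that is safe inside a double-quoted URL attribute such as src=."""
    candidate = url.strip()  # drop leading/trailing whitespace before parsing
    try:
        scheme = urlsplit(candidate).scheme.lower()
    except ValueError:       # e.g. a malformed IPv6 host
        return "#"
    if scheme not in ALLOWED_SCHEMES:
        # Rejects javascript:, data:, vbscript: and scheme-less (relative) input.
        return "#"           # hypothetical fallback value
    # Entity-escape as well, so a quote cannot terminate the attribute early.
    return html.escape(candidate, quote=True)
```

An allowlist of schemes is used rather than a blocklist of dangerous ones, so that schemes nobody thought of are refused by default.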

In cases where XSS arises from meta characters from untrusted sources being inserted into an HTML document, the issue can be avoided either by filtering/disallowing the meta characters or by escaping them appropriately for the given HTML context. For example, the HTML meta characters <, >, &, " and ' must be replaced with their corresponding HTML entity references &lt;, &gt;, &amp;, &quot; and &#39; respectively. In a JavaScript-literal context, inserting a backslash in front of \, ' and " and converting carriage returns, line feeds and tabs into \r, \n and \t respectively should prevent untrusted meta characters from being interpreted as code.
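The two escaping rules in the previous paragraph translate almost directly into code. This is a minimal Python sketch of exactly those character mappings; the function names escape_html and escape_js_string are hypothetical, and in real code a well-tested library or templating engine with context-aware autoescaping is usually preferable.

```python
# Entity references for an HTML body or quoted-attribute context.
_HTML_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})

# Backslash escapes for a JavaScript string-literal context.
_JS_TABLE = str.maketrans({
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\r": "\\r",
    "\n": "\\n",
    "\t": "\\t",
})


def escape_html(s: str) -> str:
    # translate() maps each character once, so "&" is never double-escaped.
    return s.translate(_HTML_TABLE)


def escape_js_string(s: str) -> str:
    return s.translate(_JS_TABLE)


if __name__ == "__main__":
    # The attribute and JavaScript payloads from the examples above.
    print('<input name="q" value="%s">' % escape_html('blah"><script>evil_script()</script>'))
    print("var msg = '%s';" % escape_js_string("blah'; evil_script(); //"))
```

One caveat: the JavaScript escaping alone does not stop a value containing </script> from closing an inline script element, because the HTML parser sees that sequence before the JavaScript parser runs. Also escaping < inside the literal (for example as \x3c) is a common extra precaution.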

How about an automated tool for finding XSS problems in web applications? Our security team has been developing a black box fuzzing tool called Lemon (derived from the commonly recognized name for a defective product). Fuzz testing (also referred to as fault-injection testing) is an automated testing approach based on supplying inputs that are designed to trigger and expose flaws in the application. Our vulnerability testing tool enumerates a web application's URLs and corresponding input parameters. It then iteratively supplies fault strings designed to expose XSS and other vulnerabilities to each input, and analyzes the resulting responses for evidence of such vulnerabilities. Although it started out as an experimental tool, it has proved to be quite effective in finding XSS problems. Besides XSS, it finds other security problems such as response splitting attacks, cookie poisoning problems, stack trace leaks, encoding issues and charset bugs. Since the tool is homegrown, it is easy to integrate into our automated test environment and to extend based on specific needs. We are constantly adding new attack vectors to improve the tool's coverage of known security problems.
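The enumerate/inject/analyze loop described above can be illustrated with a deliberately simplified sketch. It is not Lemon's implementation, which the post does not describe. To stay self-contained it models each page as a local Python function instead of issuing HTTP requests, and the fault strings, page handlers and the "verbatim reflection means possibly vulnerable" check are all assumptions. A real tool would also need to crawl to enumerate URLs, weed out false positives, and look for other bug classes such as response splitting.

```python
import html
from typing import Callable, Dict, List, Tuple

# A page is modeled as a function from query parameters to an HTML response body.
Handler = Callable[[Dict[str, str]], str]

# Hypothetical fault strings; a real fuzzer would use a far larger, evolving set.
FAULT_STRINGS = [
    "<script>evil_script()</script>",
    '"><script>evil_script()</script>',
    "'; evil_script(); //",
]


def fuzz(app: Dict[str, Tuple[Handler, List[str]]]) -> List[Tuple[str, str, str]]:
    """Return (url, parameter, fault string) triples that were reflected unescaped."""
    findings = []
    for url, (handler, params) in app.items():  # each enumerated URL ...
        for param in params:                    # ... and each of its inputs
            for fault in FAULT_STRINGS:
                body = handler({param: fault})  # supply one fault string
                if fault in body:               # naive evidence: verbatim echo
                    findings.append((url, param, fault))
    return findings


# Two hypothetical pages: one echoes its input raw, one escapes it.
def search_page(params: Dict[str, str]) -> str:
    return f"<b>Your query '{params.get('q', '')}' returned 0 results</b>"


def profile_page(params: Dict[str, str]) -> str:
    return f"<p>Hello, {html.escape(params.get('name', ''))}</p>"


APP = {"/search": (search_page, ["q"]), "/profile": (profile_page, ["name"])}

if __name__ == "__main__":
    for url, param, fault in fuzz(APP):
        print(f"possible XSS: {url}?{param}= reflects {fault!r}")
```

Run against this toy app, only /search is reported, once per fault string, since /profile escapes every payload before echoing it.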

I wanted to respond to a few questions that seem to be common among readers. I've listed them below. Thanks for the feedback. Please keep the questions and comments coming.

Q. Does Google plan to market it at some point?
A. Lemon is highly customized for Google apps and we have no plans to release it in the near future.

Q. Did Google's security team check out any commercially available fuzzers? Is the ability to keep improving the fuzzer the main draw of a homegrown tool?
A. We did evaluate commercially available fuzzers but felt that our specialized needs could be served best by developing our own tools.


Philipp Lenssen said...

Thanks for the explanations.

In a future post, can you explain how you limit the damage an XSS exploit cookie stealer on * can do? E.g. if an XSS hole is found at (these things have been found in the past), how do you ensure it can't easily spread to -- if that's even possible if you want to keep a single sign on via Google Account?

Alex said...

Is Lemon going to be available to the public?

Akuma said...

If you talk about XSS you really should have mentioned the "XSS Cheat Sheet". You can pull it up with a Google search for that term, since I did not want to post the URL here ;-)
It shows a lot more attack vectors.

.mario said...

Automated scanning will never be able to replace manual testing, but it's a good and fast approach to catch the "low-hanging fruit".

If you are looking for more advanced vectors I recommend the xssDB hosted on GNUCITIZEN. The vectors from the XSS Cheat Sheet will be added in the next few days, as well as some XSS injection vectors of mine.

Also we will add SQL injection vectors later.

I am very interested in contributing to the Lemon project so if you need some manpower just drop me a line.


pdp said...

What about DOM-based XSS? This type of vector is quite common and extremely hard to detect. I don't think that there is a tool that can handle it at the moment.

The XSSDB can be used in many different ways. Since it is community driven, I guess you might be interested in consuming the feed in your Lemon tool to provide finer results.

Javier Mendoza said...

Just a question: after reading this I understand that Lemon is a testing box, with some scripts, programs and so on to test the pages' weakness to XSS attacks. I think this is great, but what about, apart from this, using a layer 7 firewall in front of the servers?

At least here in Spain, these kinds of boxes are not very common, although they filter HTTP requests pretty well...

Just my 2 cents, and sorry for my English!

Pogo said...

I'd like to know if Lemon is ever going to be released to the public.

Any chance of this happening?

nEUrOO said...

> What about DOM-based XSS. This type
> of vector is quite common and
> extremely hard to detect. I don't
> think that there is a tool that can
> handle it at the moment.

Unless they do static/functional analysis of the JavaScript, the tool will not be able to test for this.
And I don't think it will come by tomorrow.. :/

Andres Riancho said...

Shameless plug!

I have been working on a web application attack and audit framework for some time; maybe you guys would like to see it. Many things make w3af a great project: it's GPL, coded in Python, extensible using plugins and much, much more! The site is:

Hong said...

I don't think your suggestion for preventing XSS can avoid the "Injection inside URL attributes - non-http(s) URL" XSS vuln.


Mark said...

While I recognize you have to start somewhere, there is little to no support from Google when your account has been compromised. You fill out a plain and simple Google form with no record or confirmation number issued. No promise of a response in 5 business days, etc., just "as soon as possible". With the billions you make, maybe next Google should try to buy a company that knows how to offer good customer service, and offer a timely response to customers who don't know what to do or where to turn and are anxious because thousands have been taken out of their accounts. Maybe they will get it back, but really it's rather disappointing that you can't even send a confirmation email to let us know you really did get the email and that it's not "crawling" around somewhere in cyberspace.

Mastishka said...

I own a website and had a Google AdSense account. In the early days, when I was getting information about earning money via my website, I came to know about Google AdSense. As an analyst I am always curious about what is happening behind the scenes, so I went through the AdSense ad generator code, which can easily be downloaded from Google's server and which they used to generate ads.

To learn more about the PPC model of advertising, I went through a number of articles/reports on the Pay Per Click mechanism, including the report of Dr. Tuzhilin (Professor of Information Systems at the Stern School of Business at New York University), who evaluated Google’s invalid click detection efforts (Find PDF Report [Source:]).

After going through all those articles and analyzing Google’s code, I found a way to simulate human behavior in click generation and page impressions in a proper (acceptable) ratio from different geographic locations (IP addresses), and was able to credit thousands of dollars to my AdSense account (without a single human-generated click).

So, do you really think they have things under control???

Contact me at if you would like to know more...

Mahender said...

I am posting this message on behalf of Ishita Gureck, who has a profile on Orkut but is facing a problem because some intruders or mischief-makers have created two more profiles with her name and details, and have joined illegal communities from these fake profiles.
Her original profile has more than 100 scraps.

But the two fake profiles have 50 and 3 scraps respectively; you can find them by searching for ISHITA GURECK. There may be some political reasons behind this. We have verified the details thoroughly. I request that the fake profiles be deleted to avoid any further infiltration. Please help urgently. Is there any way to prevent anyone else from committing such an illegal act? It's a major concern.

Arian said...

@mario -- re: Human analysis -- it's interesting you say this. As we refine tests at WhiteHat, I get to measure percentages of vulnerability.

e.g.- many classes of our XSS detection have a 99.9something% accuracy rate, and require almost no human validation.

Others can drop as low as 40% accuracy, but as we learn over time we can streamline what to look for, catalog the variances, and document them, and the net result is that finding locations weak to XSS in over 600 hosts better than any "scanner" isn't that hard.

@pdp -- DOM-based XSS: Um, we do this fairly well with WhiteHat Sentinel. We, like all the scanners, have a "DOM-based parser" as well as static analysis, but we have some tricks in automation, with humans added, that allow us to find this.

@hong -- Attribute-based XSS. Done. Solved. Sentinel scanner, above.

Took a while though.

@GOS blog -- You realize the whole XSS problem isn't just a "javascript injection problem"? There are many other ways to find this, not to mention you have LiveScript, ActionScript, Mocha, VBScript, and good old HTML that present issues.

Take the image source attribute -- the majority of modern browsers will not execute JS directly in the src= attribute, nor via the JS embedded-as-an-image trick. Older IE, Opera, and some Mozilla versions will, though.

Here's a handful of simple examples:

I'm curious if Google has a "protect our top browsers" or "protect all browsers" stance?

Arian said...

@GOS blog -- re: recommendations, also one more BIG thing:

Converting CRLF to \r\n can be really dangerous depending on when, where, and how it's done.

In many, many applications folks do this wrong, and you wind up with exploitable applications because \r\n lands somewhere in the headers: most commonly in URI data, like a name-value pair passed in the Location header on a 302 redirect, but sometimes in a cookie with a user-supplied value as well.

This can give you full control of the HTTP Response, in addition to opening up some dangerous cache-poisoning attacks that are very hard to detect and measure.


crazy.frog said...

What about URL-encoded attacks? You have not covered that :-p

Haitham said...

Excuse me......
but am I the only one who is annoyed by this eye test every time I want to search for something on Google??
Anyway, if it stays like this, I'm sure not only I but a lot of Google users will switch to Yahoo or others, because it is annoying.

Computer Guy said...

To run automated tests for web site vulnerabilities I use CyD NET Utilities. The latest version has a new testing engine. Sometimes the program makes mistakes, but it works fine and fast.