Fuzzing at scale

Friday, August 12, 2011 2:59 PM

One of the exciting things about working on security at Google is that you have a lot of compute horsepower available if you need it. This is very useful if you’re looking to fuzz something, and especially if you’re going to use modern fuzzing techniques.

Using these techniques and large amounts of compute power, we’ve found hundreds of bugs in our own code, including Chrome components such as WebKit and the PDF viewer. We recently decided to apply the same techniques to fuzz Adobe’s Flash Player, which we include with Chrome in partnership with Adobe.

A good overview of some modern techniques can be found in this presentation. For the purposes of fuzzing Flash, we mainly relied on “corpus distillation”. This is a technique whereby you locate a large number of sample files for the format at hand (SWF in this case). You then see which areas of code are reached by each of the sample files. Finally, you run an algorithm to generate a minimal set of sample files that achieves the code coverage of the full set. This calculated set of files is a great basis for fuzzing: a manageable number of files that exercise lots of unusual code paths.
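The minimal-set step described above is essentially a set-cover problem, usually approximated greedily. Below is a minimal sketch, assuming per-file coverage has already been collected by some external tool; the file names and coverage sets are invented for illustration:

```python
def distill(coverage):
    """Greedily pick files until their combined coverage matches the
    whole corpus. `coverage` maps file name -> set of covered blocks."""
    remaining = set().union(*coverage.values())
    picked = []
    while remaining:
        # Take the file that covers the most not-yet-covered blocks.
        best = max(coverage, key=lambda f: len(coverage[f] & remaining))
        picked.append(best)
        remaining -= coverage[best]
    return picked

# Toy corpus: four "SWF" files and the code blocks each one reaches.
corpus = {
    "a.swf": {1, 2, 3},
    "b.swf": {2, 3, 4, 5},
    "c.swf": {5, 6},
    "d.swf": {1, 6},
}
minimal = distill(corpus)  # two files cover everything the four did
```

Greedy selection is not guaranteed to be optimal, but for set cover it is within a logarithmic factor of the optimum, which is good enough in practice at this scale.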

What does corpus distillation look like at Google scale? Turns out we have a large index of the web, so we cranked through 20 terabytes of SWF file downloads followed by 1 week of run time on 2,000 CPU cores to calculate the minimal set of about 20,000 files. Finally, those same 2,000 cores plus 3 more weeks of runtime were put to good work mutating the files in the minimal set (bitflipping, etc.) and generating crash cases. These crash cases included an interesting range of vulnerability categories, including buffer overflows, integer overflows, use-after-frees and object type confusions.
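The post mentions bitflipping as one of the mutation strategies. Here is a minimal, hypothetical sketch of that step; the function name, flip count, and fake seed file are all invented for illustration and are not a description of the actual fuzzer:

```python
import random

def bitflip(data, flips=8, seed=None):
    """Return a copy of `data` with `flips` distinct bits flipped
    at random positions."""
    rng = random.Random(seed)
    buf = bytearray(data)
    for bitpos in rng.sample(range(len(buf) * 8), flips):
        buf[bitpos // 8] ^= 1 << (bitpos % 8)  # flip one bit in place
    return bytes(buf)

# A fake 64-byte "SWF" stand-in; a real fuzzer would mutate files from
# the distilled corpus and feed each mutant to the target.
seed_file = b"FWS\x05" + b"\x00" * 60
mutated = bitflip(seed_file, flips=4, seed=1)
```

Each mutant is then rendered by the target (Flash Player here) under a crash monitor, and inputs that crash are kept as test cases.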

The initial run of this ongoing effort produced about 400 unique crash signatures, which were logged as 106 individual security bugs following Adobe's initial triage. As these bugs were resolved, many were identified as duplicates that weren't caught during the initial triage: a unique crash signature does not always indicate a unique bug. Since Adobe has access to symbols and source, they were able to group similar crashes and perform root cause analysis, reducing the actual number of changes to the code. No analysis was performed to determine how many of the identified crashes were actually exploitable; however, each crash was treated as though it were potentially exploitable and was addressed by Adobe. In the final analysis, the Flash Player update Adobe shipped earlier this week contained about 80 code changes to fix these bugs.
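The duplicate problem can be illustrated with a toy grouping scheme: if a signature is derived from only part of the crash context, two crashes with the same root cause can still look distinct, while grouping on the top few symbolized stack frames merges them. This is a hedged sketch; the frame names and the depth-based heuristic are invented for illustration and are not Adobe's actual triage method:

```python
from collections import defaultdict

def signature(stack, depth=3):
    """Bucket key: the top `depth` frames of a symbolized call stack."""
    return tuple(stack[:depth])

# Three hypothetical crashes; the first two share a root cause but
# diverge in the leaf frame, so naive signatures would count them twice.
crashes = [
    ["parse_tag", "read_u16", "memcpy"],
    ["parse_tag", "read_u16", "memmove"],
    ["decode_jpeg", "huff_fill", "memcpy"],
]

buckets = defaultdict(list)
for stack in crashes:
    buckets[signature(stack, depth=2)].append(stack)
# Grouping at depth 2 merges the first two crashes into one bucket.
```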

Commandeering massive resources to improve security is rewarding on its own, but the real highlight of this exercise has been Adobe’s response. The Flash patch earlier this week fixes these bugs and also incorporates UIPI protections for the Flash Player sandbox in Chrome, which Justin Schuh helped develop. Fixing so many issues in such a short time frame shows a real commitment to security from Adobe, for which we are grateful.


Posix said...

Good job guys!

Matt said...

Were any of these issues unique to the Mac or Windows versions of Flash?

jjhare said...

If only Adobe would show so much dedication to their paying customers and would stop using their ridiculous activation schemes that only impact those paying customers. That wouldn't take much effort at all.

Shantanu Joshi said...

Interesting information, thanks for the post!
BTW - who paid for the whole fuzzing exercise?

eisbaw said...

Please do this once every 6 months and the world will be a better place.

Chris Weber said...

Very cool guys - so that's 20 TB of just the SWF binaries?

Rajat Swarup said...

Adobe should pay you guys for improving their security. I hope you enroll in the No More Free Bugs scheme.

KoalaBear said...

Nice work.

Too bad Adobe tries to vaporize it and does not credit Google...

Quitch said...

I think Adobe Reader could do with such a pass too, especially as figures show it's a more popular target than Flash. Why couldn't you have packaged Reader with Chrome too? ;)

Nick Tulett said...

Could you give some real heavy extra emphasis to "20TB", "2,000 cores" and "4 weeks", just in case my manager skims this and asks why I can't do the same thing?

Robert Johnson said...

So let me get this straight. Google grabs millions of swf files, ignoring the original content creators license/wishes. Flips bits on their huge compute farm under the guise of computer science. Sends Adobe 400+ files that "crash", giving 0 guidance or even caring if they are exploitable, and thus harmful to internet users. And then complains they weren't given proper credit for 400+ vulnerabilities?? Give me a break...

TestFuzzer said...

This is completely unfair competition and unfair practice vis-à-vis other security researchers (or fuzzer enthusiasts).

1. Monopolistic use of access to SWF file links. If I were to search and grab links for a similar set (say even 1 TB), your search engine would flag me. And you don't provide any mechanism to do it a legitimate way. You shut down the API.

2. Would you in any form allow fuzzing on your infrastructure even if I were to pay for it? For example, cycles on GAE?

You guys killed a couple of my bugs.

bartek5186 said...

I see credits here:

Craig S Wright said...

I find it funny that I had just started writing a series of blog posts about the difficulty of teaching good coding practices with the textbooks we have, when Adobe shows the results once again.


Andrew said...

Fuzz IE 10 next please!

Spudd86 said...

@Robert Johnson: Google is not giving the SWF files to anyone, and since Google's crawler found them, the authors pretty clearly intended people to be able to download them. So the creators' copyright is not infringed; nothing wrong there.

Second, even if the bugs are not exploitable, Flash still crashes when it hits them, and avoiding that can only be good for internet users, since Flash will not crash as often now. So what exactly are you complaining about? I really want to know what is bad about fixing 106 bugs. Nothing is broken here; Flash files that worked before still work. Seriously, WTF is your problem?

Code Raptor said...

So you are basically saying:
(1) Fuzzer improves security
(2) Grid-based-fuzzer improves security radically
(3) Those who quickly fix security issues show a real "commitment" to security.

Unfortunately, you are wrong on all counts. Fuzzing tells you nothing about the security of the application, absolutely nothing. It is like having infinite monkeys typing on infinite keyboards and hoping they produce a work of art. It is less time consuming to do a careful failure analysis than to attempt random fuzzing.

@Adobe: Please go back to the drawing board, and don't indulge in security theater like this. Oh, I forgot - you are now in cahoots with Google, pitched against Apple, so no wonder Google-monkeys are trying to paint your face white.

Google and Adobe - No matter how hard you try, you can't ever match Apple. Period.

So post something about real security, and stop blabbering marketing garbage.

Dr Craig S Wright GSE said...

@CR, you have stated "Fuzzing tells you nothing about the security of the application".

You could not be more wrong. Fuzzing has dramatically reduced the number of code errors in many major software suites and can cover 80-90% of execution paths.

This is not 100%, but nothing ever is, not even formal verification.

Now, all software has an unknown but fixed number of vulnerabilities at a point in development. This number will change as patches and updates occur, but for each release there are a fixed number of existing vulnerabilities that start as unknown vulnerabilities and are then discovered.

We modelled this in the paper below:

"A Quantitative Analysis into the Economics of Correcting Software Bugs"
Craig S. Wright and Tanveer A. Zia

Basically, the more bugs you find early, the lower the cost of mitigating them. This leaves fewer holes to be exploited and increases the costs of exploiting them.

Security has no absolutes; the notion of an absolute is false in anything we can think of, and security is no different: it is only relative. This means it is a risk function. It comes down to economics and cost. Increasing the cost to exploit software, and reducing a vendor's cost of discovering and patching holes, thus increases security.

@CR "It is less time consuming to do a careful failure analysis, than attempt random fuzzing."

I assume that you do not know what automation is?

Time costs money when people are involved; automation means we can fuzz faster than we can code review.

Also, there are many errors and omissions in any manual code review.

As for @Adobe, they suck as they do not do this, @Google had to do it for them.

As for Apple, they have just as many errors as Microsoft per SLOC, just fewer users.

Code Raptor said...

@DrCSW-GSE: Fuzzing may have "dramatically" reduced vulnerabilities, but it still tells you nothing about the security of the application. Yes, you fuzz, fix, fuzz, fix, fuzz, fix, but you still do not know if all bugs have been squashed, which is why you repeat the fuzz-fix cycle.

I agree with you about fixing bugs early. Sadly, most fuzzing happens *after* the application is deployed in production environment. So this fuzzing has increased costs significantly (as per your analysis, and I agree with it as well).

If all developers in the world are taught "careful failure analysis", almost every single bug will be caught in the earlier phases. Sadly, the attitude is to churn out code, and ship. And then run fuzzer grids to find bugs. This automation is even more costly then (again, as per your analysis, and I agree with it as well).

So I stand by my statement that careful failure-case analysis is the need of the hour, not fuzzing. Time and resources should be devoted to perfecting failure analysis techniques rather than to some stupid random fuzzing that tells you essentially nothing about the application.

Dr Craig S Wright GSE said...

@CR I agree with you in principle.

A combination of static and dynamic code analysis early in the development cycle will make code with far fewer vulnerabilities.

More, early detection and fixing will help create software with a lower overall cost. What we need is for companies to see that early detection will save them money in the long term.

More, it will make their clients suffer less.

Yes, the fuzz-fix cycle is ongoing, but you can estimate the number of remaining vulnerabilities. This does not, of course, mean that you have found them all, but you can know roughly how many more issues to expect.

Adobe is crazy: the loss of goodwill, as well as the costs they impose on themselves and their users, is phenomenal, so it is incomprehensible that they do not improve their practices.

Google of course does not have source code to test for Adobe, so the fuzz cycle is as good as an external third party can get. I still say kudos to Google for doing this. They have removed many issues that Adobe had been sitting on, many of which had been used as zero-days in attacks.

You may note that one of my things is picking on other academics who teach coding without security: basically, poor coding that the student is expected to unlearn later.

So we agree here.

Now, you will be happy to know that the main aspect of my research is perfecting failure analysis techniques. So, yes, this is good. The issue is that fuzzing also helps. When Adobe and others do not do their job, fuzzing allows third parties to show just how much the development processes in companies like Adobe suck.