This is a tale of how one small-time, hobby-level, mash-up widget developer (that would be me) caused Google to appear to DOS attack Yahoo, and how Yahoo protected itself by blocking Google from accessing its content. This is a tale that almost has the pieces you'd want in a good story... It's got a hint of mystery... It almost has suspense... It's chock full of buzz words... And it even has two valuable lessons for developers of mash-up frameworks and Web Service APIs.
It all happened, unintentionally, through the seemingly innocent combination of Yahoo’s del.icio.us social bookmarking API, the iGoogle mash-up homepage, Google’s proxy server and a widget I created to view my bookmarks.
Extremely High Level Summary
Uber Geeks Only
For a slightly less geeky explanation, some background info, and pictures skip to "The Full Details" section below.
I created a module for iGoogle (module home page) that displays del.icio.us bookmarks. Every time a user that had installed my del.icio.us widget reloaded their iGoogle page the widget made a lot of requests to the del.icio.us web API (owned by Yahoo). Apparently the volume of requests from each page reload combined with a number of my friends using the widget was enough to be considered on par with a small un-sustained DOS attack.
Because all requests from an iGoogle widget need to go through the Google proxy servers to avoid cross domain scripting security in the web browser, Yahoo saw the requests from my widget as coming from Google. Yahoo determined that Google was launching a DOS attack (they called it a swarm) and blocked some of their IP addresses from being able to access the del.icio.us web API.
The effect seems to be localized to IP addresses from a Google data center near me, because if I proxy my requests to Google through another part of the world the problem goes away. I emailed Yahoo (thread here) and they basically said they would keep the IP addresses blocked unless someone from Google contacted them and promised to throttle all future traffic.
Keep reading to find out how I got the widget working while Yahoo and Google remain in a standoff.
The Full Details (With Pictures!)
A little History
Del.icio.us is a social bookmarking site that I use to keep track of bookmarks without having to lug a computer around with me everywhere. (Here's a 3 minute video description of social bookmarking if you're not familiar with the concept.) iGoogle is a mash-up that allows you to create a personalized home page made up of publicly available modules.
There are a lot of modules out there for putting del.icio.us bookmarks on your iGoogle page, but none of them did exactly what I wanted.
Basically I wanted to see a list of my del.icio.us tags, and then when I clicked on a tag, the bookmarks with that tag should be immediately listed. All the modules I found brought me to the del.icio.us page for that tag, but that was way too much page reloading for me. So I created a module that has these features:
- Displays your del.icio.us tags in a tag cloud
- When you click on a tag, the bookmarks with that tag appear below the tag cloud. Click on that tag again to clear the list of bookmarks from the screen
- Clicking on the title brings you to your del.icio.us home page
- Supports up to 100 tags with 100 bookmarks per tag
How it works: Mash-ups & this Situation
Cross Site Scripting and mash-up Proxies
Now there is a catch with dynamically fetching content in a web browser: cross-site scripting security, which browsers enforce through the same-origin policy. Basically this means that if you get a page from google.com, the only place you’re allowed to make an AJAX request to, while on that page, is google.com. This is for your protection and is a good thing because it eliminates one more way people can steal your bank account PIN. Like most things for your protection, the world has found a way around it, and we are better off because the workaround has enabled mash-ups like iGoogle.
The solution (shown in the picture above) for getting around cross domain scripting security in a mash-up is for the owner of the original page (in this case google.com) to host a proxy server. All AJAX requests destined for other domains (like del.icio.us) are sent to that proxy at google.com and the proxy marshals the traffic between external sites and the web browser. The browser still protects against cross site scripting, but since all the traffic is going through the google.com proxy, the browser is none the wiser to the fact that data is coming from del.icio.us.
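The routing decision described above can be sketched in a few lines. This is only an illustration written in Python (the real widget runs as JavaScript in the browser); the proxy address `PROXY_BASE`, the `url` query parameter, and the function names are all hypothetical, since the actual iGoogle proxy interface is not described in this post.

```python
from urllib.parse import urlsplit, urlencode

# Hypothetical proxy endpoint; the real iGoogle proxy URL is an assumption here.
PROXY_BASE = "https://www.google.com/proxy"


def origin(url):
    """Return the (scheme, host, port) triple that the browser compares."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def route_request(page_url, target_url, proxy_base=PROXY_BASE):
    """Return the URL the widget should actually request."""
    if origin(page_url) == origin(target_url):
        # Same origin: the browser allows a direct request.
        return target_url
    # Different origin: ask the page's own proxy to fetch the target instead.
    return proxy_base + "?" + urlencode({"url": target_url})
```

A request for a del.icio.us URL made from a google.com page comes back rewritten to point at the proxy, which is why del.icio.us only ever sees Google's IP addresses.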
What Went Wrong
I made a bad assumption. I assumed that del.icio.us had a big data center and couldn't possibly care if I made 50 rapid requests of their web API. I actually remember asking myself the question "Should I throttle these requests?" It was a trade off between rapid UI response time for the widget users and load on the del.icio.us servers. I decided that since del.icio.us was owned by Yahoo and Yahoo serves up billions of pages a day, that they couldn't possibly care about a flutter of requests from my widget. I was wrong.
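For the curious, here is roughly what the throttling I decided against could have looked like. It is a minimal sketch in Python rather than the widget's own JavaScript, and the `Throttle` class and its parameters are made up for illustration: it simply forces a minimum gap between consecutive requests.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.min_interval
```

Calling `throttle.wait()` before each API request turns a burst of 50 rapid requests into a steady trickle, at the cost of a slower-loading widget. That is exactly the trade-off I was weighing.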
After almost 9 months of the widget growing in popularity, Yahoo noticed and started returning the following page when the widget fetched bookmarks through the Google proxy:
I originally figured this was just a del.icio.us API glitch and would go away. A week later through my emails with the del.icio.us support team I learned otherwise. The short version of the email thread is that Yahoo was actively blocking Google's IP addresses as a result of the traffic from my widget.
In an act of belated responsible development, I changed my module to only request the tags at start-up and to make an individual bookmark request only when the user clicks on a specific tag. Since the widget is dynamically served each time someone logs into iGoogle it was easy to replace the version everyone was using.
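The shape of that change looks something like the sketch below. It is Python for illustration only (the real module is JavaScript), `fetch_tags` and `fetch_bookmarks` are hypothetical stand-ins for the API calls, and the per-tag cache is an extra assumption of this sketch rather than something the module is known to do.

```python
class BookmarkWidget:
    """Fetch tags once at start-up; fetch bookmarks only when a tag is clicked."""

    def __init__(self, fetch_tags, fetch_bookmarks):
        self.fetch_bookmarks = fetch_bookmarks
        self.tags = fetch_tags()   # the only request made at start-up
        self.cache = {}            # assumption: remember bookmarks already fetched

    def on_tag_click(self, tag):
        """Return the bookmarks for `tag`, requesting them at most once."""
        if tag not in self.cache:
            self.cache[tag] = self.fetch_bookmarks(tag)
        return self.cache[tag]
```

With this structure a page reload costs one request instead of one request per tag, and a user who never clicks a tag never triggers a bookmark request at all.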
Yahoo continued the email discussion off the forum because as they stated "it's probably not as interesting to everyone at this point". I'd argue that it was never interesting to everyone and there are only a handful of people in the world that would care. But to me, the value of a web forum is to be able to find those obscure bits of information about a specific topic. So why take things off a forum? But I digress...
Through those private emails I learned that the "swarms" subsided after my widget change but then picked back up a little. They were a bit vague on the numbers. Because del.icio.us doesn't have a way to track the application making the requests at a more granular level than IP address, they are basically blind as to the real cause of swarms.
I did some testing and discovered that the issue appears to be localized to my geographic region (the Northeast US). This makes sense since I probably hit the same Google data center every time I use iGoogle and many of my friends using the widget are in this area. So the problem is limited to that small community of iGoogle / del.icio.us users, living near me, that want to access their bookmarks using their iGoogle home page through the delicious JSON API.
I have no definitive proof that it was only my module causing the problem, but circumstantial evidence suggests that the module was at least part of the problem.
In the end Yahoo refused to unblock the Google IP addresses and my posts to the iGoogle Google Group asking them to contact Yahoo went unanswered. I can't say I'm surprised; this is hardly a major issue in either of their world views. And rightly so...
This experience has taught me two things.
- If you are a mash-up framework providing an AJAX proxy, you should monitor the HTTP response codes coming back to make sure that a widget developer has not damaged your relationship with an external service API.
- Whenever you provide developers an API for accessing your web content, be sure to provide a developer API Key and require its use with every request. This allows you to filter offending behavior by the developer or application instead of IP address. If del.icio.us had done this they could have simply blocked my widget and left their relationship with Google intact.
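Both lessons fit in a short sketch. Everything below is hypothetical (class names, thresholds, the idea of a per-widget ID on the proxy side); it is not how Google or Yahoo actually work, just one way the two ideas could look in Python. Resetting the counters at the end of each time window is left out to keep it short.

```python
from collections import Counter, defaultdict


class ProxyMonitor:
    """Lesson 1: a proxy counts upstream error responses per widget."""

    def __init__(self, error_limit):
        self.error_limit = error_limit
        self.errors = defaultdict(Counter)   # widget_id -> {status: count}

    def record(self, widget_id, status):
        if status >= 400:                    # only 4xx/5xx responses count
            self.errors[widget_id][status] += 1

    def flagged(self):
        """Widgets whose error count has reached the limit."""
        return sorted(w for w, c in self.errors.items()
                      if sum(c.values()) >= self.error_limit)


class KeyGate:
    """Lesson 2: an API blocks by developer key instead of by IP address."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.counts = Counter()
        self.blocked = set()

    def allow(self, api_key):
        if not api_key or api_key in self.blocked:
            return False                     # a key is required on every request
        self.counts[api_key] += 1
        if self.counts[api_key] > self.max_requests:
            self.blocked.add(api_key)        # block this key only, not its IP
            return False
        return True
```

With a gate like `KeyGate`, a misbehaving widget gets cut off by its own key while every other application behind the same proxy IP keeps working.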
Epilogue - The End Run
One night it occurred to me that I could circumvent the Yahoo IP block by putting a second proxy server between Google and Yahoo. Five lines of CGI and 10 minutes later, my second proxy was working swimmingly. Then I changed my module to tell Google to make a request to my proxy server, which in turn forwards the request to del.icio.us, and once again del.icio.us is none the wiser as to the source of the traffic.
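A forwarding proxy of this kind really is tiny. The following is not my original CGI script, just a sketch of the same idea in Python; the `url` query parameter, the `handle` function, and the `ALLOWED_PREFIX` check (added so the script is not an open proxy) are assumptions of this sketch. Note that it has no throttling and no API key, so it ignores both lessons above.

```python
#!/usr/bin/env python3
import os
import sys
from urllib.parse import parse_qs
from urllib.request import urlopen

# Assumption: only forward requests aimed at del.icio.us.
ALLOWED_PREFIX = "http://del.icio.us/"


def handle(query_string, fetch=lambda url: urlopen(url).read().decode("utf-8")):
    """Build the full CGI response for a request like ?url=<encoded target>."""
    target = parse_qs(query_string).get("url", [""])[0]
    if not target.startswith(ALLOWED_PREFIX):
        return "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\nbad url"
    # Forward the request and relay the body back to the caller.
    return "Content-Type: application/json\r\n\r\n" + fetch(target)


if __name__ == "__main__":
    # A CGI server passes the query string in the QUERY_STRING variable.
    sys.stdout.write(handle(os.environ.get("QUERY_STRING", "")))
```

The `fetch` argument exists only so the function can be exercised without touching the network.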
I decided not to publish the updated version using my proxy server because I didn't implement either of the lessons I learned when I wrote it.