Tuesday, February 9, 2010

By Ed Ceaser (@asdf) and Nick Kallen (@nk)

Sometimes it's really hard to figure out what's causing problems in a web site like Twitter. But over time we have learned some techniques that help us to solve the variety of problems that occur in our complex web site.

A few weeks ago, we noticed something unusual: over 100 visitors to Twitter per second saw what is popularly known as "the fail whale". Normally these whales are rare; 100 per second was cause for alarm. Although even 100 per second is a very small fraction of our overall traffic, it still means that a lot of users had a bad experience when visiting the site. So we mobilized a team to find out the cause of the problem.