August 21st, 2008
Jason Sobel has an interesting post, “Scaling Out”, on Facebook’s BCP work and the move to being multi-colo.
Interesting to me was noting that:
- they just got around to this 8 months ago, and they’re fscking Facebook (which means you can wait)
- they’re still doing all writes to a single datacenter
- they’re hacking an object-level mark/sweep into the MySQL replication stream, which suggests a certain parable of a hammer and nails.
July 29th, 2007
Via the HA blog (an obviously unserved niche in retrospect), a very interesting 30-minute presentation on the Google Talk architecture.
The number that matters for presence traffic:
ConnectedUsers * BuddyListSize * OnlineStateChanges
Interestingly, people keep independently re-discovering that maintaining presence is the hard part of scaling these systems.
It’s something that really came home hard in my talks with Twitter while helping with their scaling challenges (so much so that we took a slide out of our “Social Software for Robots” talk to talk about it, and Blaine mentioned it again in his “Scaling Twitter” talk).
So by way of a PSA:
Presence isn’t easy.
Growth in social systems is non-linear. Ignore the network effect at your peril.
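To put rough numbers on the ConnectedUsers * BuddyListSize * OnlineStateChanges product, here is a minimal back-of-the-envelope sketch. The function name and every figure in it are made up for illustration, not taken from the talk, and it assumes buddy lists grow in step with the user base (the network effect).

```python
def presence_messages(connected_users: int, buddy_list_size: int, state_changes: int) -> int:
    """Presence notifications fanned out over some period (hypothetical helper)."""
    # Each state change by each connected user is delivered to every buddy.
    return connected_users * buddy_list_size * state_changes


# Illustrative numbers only: 1,000 users, 10 buddies each, 4 state changes per user.
small = presence_messages(1_000, 10, 4)

# Double the users and assume the average buddy list doubles along with them.
large = presence_messages(2_000, 20, 4)

print(small, large, large / small)  # 40000 160000 4.0
```

Twice the users, four times the messages, and that is before anyone starts changing state more often.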
Kick the Tires
Also interesting was “Real Life Load Tests”. The GTalk team deployed to Orkut and Gmail weeks before actually turning on the UI for the features, so they could monitor the load first. These are the practices that make Bill’s recent observation on HA systems possible:
An interesting takeaway is that it’s clearly possible to re-architect data storage on super-busy production systems seemingly no matter where you start from.
For the rest of the bullets, see the HA blog post.