Google Talk Architecture, and High Availability (HA)
Via the HA blog (an obviously underserved niche, in retrospect) comes a very interesting 30-minute presentation on the Google Talk architecture.
The core of the problem is presence, whose load scales with:
ConnectedUsers * BuddyListSize * OnlineStateChanges
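To see where that product comes from, here is a minimal sketch that counts presence notifications by brute force. It assumes that every online state change by a user is pushed once to each entry on that user's buddy list. The names `User`, `make_network` and `fan_out`, the ring-shaped buddy graph and the sizes used are all hypothetical illustrations, not anything from the GTalk talk.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    buddies: list = field(default_factory=list)


def make_network(n_users: int, buddy_list_size: int) -> dict:
    """Hypothetical network: each user's buddies are the next
    buddy_list_size users around a ring (requires buddy_list_size < n_users)."""
    names = [f"user{i}" for i in range(n_users)]
    return {
        name: User(name, [names[(i + k) % n_users] for k in range(1, buddy_list_size + 1)])
        for i, name in enumerate(names)
    }


def fan_out(users: dict, state_changes_per_user: int) -> int:
    """Count notifications: every state change is sent to every buddy."""
    messages = 0
    for user in users.values():
        for _ in range(state_changes_per_user):
            for _buddy in user.buddies:
                messages += 1  # one presence notification per buddy per change
    return messages


users = make_network(1_000, 20)
print(fan_out(users, state_changes_per_user=5))  # 100000 == 1000 * 20 * 5
```

The loop nesting mirrors the three factors of the formula, which is why the count comes out as their product.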
Interestingly, people keep independently rediscovering that maintaining presence is the hard part of scaling these systems.
It's something that came home hard while I was helping Twitter with their scaling challenges (so much so that we devoted a slide of our “Social Software for Robots” talk to it, and Blaine mentioned it again in his “Scaling Twitter” talk).
So, by way of a PSA:
Presence isn’t easy.
Growth in social systems is non-linear. Ignore the network effect at your peril.
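A quick way to see the non-linearity is to plug numbers into the presence formula. This sketch assumes, purely for illustration, a baseline of 1,000 users with 50 buddies and 10 state changes each, and compares doubling the user base with buddy lists held fixed against the network-effect case where buddy lists double too. All the numbers are hypothetical.

```python
def presence_load(connected_users: int, buddy_list_size: int, online_state_changes: int) -> int:
    """ConnectedUsers * BuddyListSize * OnlineStateChanges."""
    return connected_users * buddy_list_size * online_state_changes


base = presence_load(1_000, 50, 10)

# Twice the users, same buddy list size: load merely doubles.
linear = presence_load(2_000, 50, 10)

# Assumed network effect: as the network doubles, buddy lists double too.
network = presence_load(2_000, 100, 10)

print(linear / base, network / base)  # 2.0 4.0
```

Twice the users yields four times the presence traffic once buddy lists grow along with the network, which is the trap the PSA warns about.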
Kick the Tires
Also interesting was “Real Life Load Tests”: the GTalk team deployed to Orkut and Gmail weeks before actually turning on the UI for the features, so they could monitor the real load. Practices like this are what make Bill’s recent observation on HA systems possible:
An interesting takeaway is that it’s clearly possible to re-architect data storage on super-busy production systems seemingly no matter where you start from.
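The “Real Life Load Tests” practice is often called a dark launch: the backend gets real production traffic while the user-facing feature stays hidden. Below is a minimal sketch of that idea as two independent switches. The `DarkLaunch` class, its flags and `connect_to_backend` are hypothetical stand-ins, not how Google actually wired Orkut or Gmail.

```python
from dataclasses import dataclass


@dataclass
class DarkLaunch:
    """Hypothetical feature switch: the backend sees real traffic
    while the user-facing UI stays hidden until ui_enabled is flipped."""
    backend_enabled: bool = True
    ui_enabled: bool = False
    backend_calls: int = 0

    def connect_to_backend(self, user_id: str) -> None:
        # Stand-in for the real work (e.g. signing the user in to the
        # chat service so its presence load shows up in monitoring).
        self.backend_calls += 1

    def render_page(self, user_id: str) -> dict:
        if self.backend_enabled:
            self.connect_to_backend(user_id)
        # The feature only becomes visible once the UI switch is on.
        return {"user": user_id, "chat_ui": self.ui_enabled}


launch = DarkLaunch()
pages = [launch.render_page(f"user{i}") for i in range(1_000)]
print(launch.backend_calls, any(p["chat_ui"] for p in pages))  # 1000 False

launch.ui_enabled = True  # weeks later, after watching the load
print(launch.render_page("user0")["chat_ui"])  # True
```

Keeping the backend and UI switches separate is the point: load problems surface while no user can see the feature yet, and turning on the UI is a cheap, reversible step.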
For the rest of the bullets, see the HA blog post.