Blog posts tagged "twitter"

Random Notes on Twitter Culture

December 4th, 2008

I tried to fit this all into 140 characters. I really did. I couldn’t do it, not even with disemvoweling.

#motrinmom

When I was chatting with a friend who does information architecture for pharmaceutical advertising, she was shocked I hadn’t heard about the “Motrin Mom” twitter-in-a-teapot. I had no idea what she was talking about.

Apparently “Twittering Critics Brought Down [the] Motrin Mom Campaign”. And the entire advertising industry, at least here in New York, is having a fear-of-a-twitter planet moment, complete with righteous anger about the “irrationality of Twitter”. (Um, hello folks, but didn’t you build one of the largest global businesses by cynically manipulating people’s “irrationality”?)

But the part that really caught me off guard is that this didn’t blip my radar at all. Maybe I was just offline for it, but as far as I can tell the twittering classes I follow didn’t peep about it. I thought Twitter was all about us? (Also, Summize, you are already awesome and everything, but if you add “search within people you’re following” and “search within people who follow you”, I promise to love you forever.)
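Summize’s actual search API isn’t described here, so this is only a minimal sketch of what “search within people you’re following” would mean, assuming an in-memory list of tweets and a set of followed handles (the `Tweet` type, `search_within`, and the sample data are all hypothetical). It is an ordinary keyword search with one extra filter on the author.

```python
from typing import Iterable, NamedTuple


class Tweet(NamedTuple):
    author: str
    text: str


def search_within(tweets: Iterable[Tweet], query: str, authors: set[str]) -> list[Tweet]:
    """Case-insensitive substring search over tweets, restricted to `authors`."""
    q = query.lower()
    return [t for t in tweets if t.author in authors and q in t.text.lower()]


# Hypothetical data: the handles I follow, and a tiny public timeline.
following = {"blaine", "leonard"}
timeline = [
    Tweet("blaine", "Motrin ad is getting hammered"),
    Tweet("stranger", "#motrinmom outrage!"),
    Tweet("leonard", "queues all the way down"),
]

print([t.author for t in search_within(timeline, "motrin", following)])  # ['blaine']
```

“Search within people who follow you” would be the same call with the follower set passed in place of `following`.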

@flickr

Only tangentially related, I’m sure Tyler Hawkins aka @flickr has a very busy @replies tab.

What I can’t figure out is whether all these folks responding to @flickr are genuinely confused about whether Hawkins is a Flickr representative (he isn’t, and doesn’t in any way suggest he might be), or just believe so strongly that twits addressed to “@flickr” will arrive in Flickr’s inbox that reality is irrelevant.

I’m torn on whether the assumption that when you speak you will be heard is the ultimate arrogance (and one particularly prevalent on Twitter), or whether this instead proves that we’ve historically worried too much about URIs, and that culture has no problem evolving them ad hoc.

Now if only I had a thesis, rather than a rambling collection of half thoughts. Which is why I wanted to fit this all into 140 characters. Alas.

Ahab Failed

July 29th, 2008

A Couple of Caveats on Queuing

July 7th, 2008

Les’ “Delight Everyone” post is the latest, greatest addition to the “17th letter of the alphabet for savior” conversation.

And believe me, I’m a huge fan, and I’m busy carving out a night sometime this week to play with the RabbitMQ/XMPP bridge (/waves hi Alexis).

But… there are a couple of caveats:

1) Some writes need to be real time.

Les notes this as well, but I wanted to emphasize it because, really, they do.

If you can’t see your changes take effect in a system, your understanding of cause and effect breaks down. It doesn’t matter that your understanding is wrong; you still need one to function, ideally with a physical analogy too. There are no real-world effects that get queued for later application. Violate the principle of (falsely) seeming to respect real-world cause and effect, and your users will remain forever confused.

del.icio.us showing you the wrong state when you use the inline editing tool, and Flickr taking a handful of seconds to index a newly tagged photo are both good examples of subtly broken interfaces that can really throw people.

My data, now real time. Everyone else can wait (how long depends on how social your users are).
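A minimal sketch of the “my data now, everyone else later” split, assuming a single-process toy store (the `Timelines` class and its methods are hypothetical, not Twitter’s or Les’ design): the author’s write is applied synchronously, and only the fan-out to followers goes through the queue.

```python
from collections import defaultdict, deque
from typing import Optional


class Timelines:
    """Toy in-memory timelines: the author's own write lands immediately,
    while delivery to followers is queued for a background worker."""

    def __init__(self, followers: dict[str, list[str]]) -> None:
        self.followers = followers
        self.timelines: defaultdict[str, list[str]] = defaultdict(list)
        self.fanout_queue: deque[tuple[str, str]] = deque()

    def post(self, author: str, message: str) -> None:
        self.timelines[author].append(message)       # real time: my data, now
        self.fanout_queue.append((author, message))  # everyone else can wait

    def drain(self, max_items: Optional[int] = None) -> int:
        """Worker step: apply queued fan-out, return how many posts were delivered."""
        done = 0
        while self.fanout_queue and (max_items is None or done < max_items):
            author, message = self.fanout_queue.popleft()
            for follower in self.followers.get(author, []):
                self.timelines[follower].append(message)
            done += 1
        return done


store = Timelines({"kellan": ["blaine", "leonard"]})
store.post("kellan", "queues are great, mostly")
print(store.timelines["kellan"])  # ['queues are great, mostly']
print(store.timelines["blaine"])  # [] (not fanned out yet)
store.drain()
print(store.timelines["blaine"])  # ['queues are great, mostly']
```

In a real deployment `drain` would run in separate consumer processes; the point of the sketch is only that the author never waits on the queue to see their own change.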

2) You’ve got to process that queue eventually.

Ideally you can add processing boxes in parallel forever, but if your dequeuing rate falls below your enqueuing rate, you are, in technical terms, screwed.

Think about it: if you’re processing 1,000,000 events a second but adding 1,000,001, you’re falling behind by 1 event per second, and at the end of the day you’re 86,400 events in debt and counting. It’s like losing money on individual sales but trying to make it up in volume.
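The arithmetic above as a tiny helper, under the (unrealistic) assumption of constant rates; `backlog_after` is a hypothetical name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def backlog_after(seconds: float, enqueue_per_s: float, dequeue_per_s: float,
                  start: float = 0.0) -> float:
    """Queue depth after `seconds` at constant rates (depth never goes below zero)."""
    return max(0.0, start + (enqueue_per_s - dequeue_per_s) * seconds)


print(backlog_after(SECONDS_PER_DAY, 1_000_001, 1_000_000))  # 86400.0
# The debt carries over: day two starts where day one ended.
print(backlog_after(SECONDS_PER_DAY, 1_000_001, 1_000_000, start=86_400.0))  # 172800.0
```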

Good news: Traffic is spiky and most sites see daily cycles with quiet times.

Bad news: Many highly tuned systems slow down as their backlogs increase. Like credit card debt, processing debt can grow exponentially unmanageable.

In practice this means that most of the time your queue consumers should be sitting around bored. (See Allspaw’s Capacity Planning slides for more on that theme.)
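A toy hour-by-hour simulation of the good news and the bad news together. Everything here is an assumption for illustration: the daily curve is invented, and the `slowdown` term is a made-up stand-in for systems that degrade as their backlog grows, not a measured model.

```python
def simulate(hourly_arrivals: list[float], capacity: float,
             slowdown: float = 0.0, days: int = 1) -> list[float]:
    """Hour-by-hour queue depth over `days` repeats of one day's arrival curve.

    `capacity` is items processed per hour with an empty queue. `slowdown` is a
    made-up penalty for systems that degrade as the backlog grows:
    effective capacity = capacity / (1 + slowdown * backlog / capacity).
    """
    backlog = 0.0
    depths = []
    for _ in range(days):
        for incoming in hourly_arrivals:
            effective = capacity / (1 + slowdown * backlog / capacity)
            backlog = max(0.0, backlog + incoming - effective)
            depths.append(backlog)
    return depths


# Hypothetical daily curve: quiet night, busy day, spiky evening (mean ~97/hour).
DAY = [40.0] * 8 + [100.0] * 8 + [150.0] * 8

bored = simulate(DAY, capacity=200, slowdown=1.0, days=3)
tight = simulate(DAY, capacity=100, days=2)
tight_and_degrading = simulate(DAY, capacity=100, slowdown=1.0, days=2)

print(max(bored))            # 0.0: the peak never exceeds capacity
print(tight[23], tight[47])  # 400.0 400.0: the evening backlog drains overnight
print(tight_and_degrading[47] > tight_and_degrading[23])  # True: it never catches up
```

With identical traffic, the mostly bored pool never builds a backlog, and the pool sized near the average copes only as long as it doesn’t slow down under its own backlog.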

If you can’t guarantee those real-time writes for thems that cares, and mostly bored queue consumers the rest of the time, then your queues might not delight you after all.

See also: Twitter, or Architecture Will Not Save You

Twitter, or Architecture Will Not Save You

May 28th, 2008


(circa 2006 Twitter maintenance cat)

Along with a whole slew of smart folks, I’ve been playing the current thought game du jour, “How would you re-architect Twitter?”. Unlike most, I’ve been having this conversation off and on for a couple of years, mostly with Blaine, in my unofficial “Friend of Twitter” capacity. (The same capacity in which I wrote the first Twitter bot, and have on rare occasions logged into their boxes to play “spot the runaway performance issue.”)

For my money, Leonard’s Brought to You By the 17th Letter of the Alphabet is probably the best proposed architecture I’ve seen, or at least it matches my own biases from when I sat down last month to sketch out how to build a Twitter-like thing. But when Leonard and I were chatting last week about this stuff, I was struck by what was missing from the larger Blogosphere’s conversation: the issues Twitter is actually facing.

Folks both within Twitter and without have framed the conversation as an architectural challenge. Meanwhile the nattering classes have struck on the fundamental challenge of all social software (namely the network effects) and are reporting that they’ve gotten confirmation from “an individual who is familiar with the technical problems at Twitter” that indeed Twitter is a social software site!

Living and Dying By the Network

All social software has to deal with the network effect. At scale it’s hard. And all large social software has had to solve it. If you’re looking for the roots of Twitter’s special challenges, you’re going to have to look a bit farther afield.

You can, though, hedge your bets by making less explicit promises than Twitter does (“everything from my friends, in a timely fashion” is a pretty hard promise to keep). Flickr mitigates some of this impact by making promises about recent contacts, not recent photos (there are fewer people than photos), while Facebook can hide a slew of sins behind the fact that its newsfeeds are “editorialized”, with no claims of completeness anywhere in sight. (There is a figure floating around that, at least at one point, Facebook was dropping 80% of its updates on the floor.)
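How Flickr actually implements this isn’t described here; as a hypothetical sketch, a “recent contacts” view only needs one entry per contact, so its cost is bounded by the number of people rather than the number of photos. `Update` and `recent_contacts` are illustrative names.

```python
from typing import Iterable, NamedTuple


class Update(NamedTuple):
    contact: str
    ts: int    # e.g. Unix time of the upload
    item: str  # e.g. a photo id


def recent_contacts(updates: Iterable[Update], limit: int) -> list[Update]:
    """Latest update per contact, for the `limit` most recently active contacts.

    Only one entry per contact is kept, so state grows with the number of
    people, not the number of photos.
    """
    latest: dict[str, Update] = {}
    for u in updates:
        current = latest.get(u.contact)
        if current is None or u.ts > current.ts:
            latest[u.contact] = u
    return sorted(latest.values(), key=lambda u: u.ts, reverse=True)[:limit]


feed = [
    Update("alice", 100, "photo-1"),
    Update("alice", 105, "photo-2"),
    Update("bob", 103, "photo-3"),
    Update("carol", 90, "photo-4"),
]
print([(u.contact, u.item) for u in recent_contacts(feed, limit=2)])
# [('alice', 'photo-2'), ('bob', 'photo-3')]
```

A stream promising every photo from every contact would instead have to store and deliver every item, which is exactly the harder promise.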

So while architectures that strip Twitter down to queues and logs could be a huge win, and while thinking about new architectures is the sexy, hard problem we all want to fix, Twitter’s problems are really of a more pedestrian kind of hard: plumbing and ditch digging. Which is less fun, but reality.

Growth

Their first problem is growth. Honest-to-god hockey-stick growth is so weird, and wild, and hard, that it’s hard to imagine and cope with if you haven’t been through it at least once. To quote Leonard again (this from a few weeks ago, back when TC thought they’d figured out that Twitter’s problems were Blaine):

“Even if you’re architecturally sound, you’re dealing with development with extremely tight timelines/pressures, so you have to make decisions to pick things that will work but will probably need to eventually be replaced (e.g. DRb for Twitter) — usually you won’t know when and what component will be the limiting factor since you don’t know what the uses cases will be to begin with. Development from prototype on is a series of compromises against the limited resources of man-hours and equipment. In a perfect world, you’d have perfect capacity planning and infinite resources, but if you’ve ever experienced real-world hockey-stick growth on a startup shoestring, you know that’s not the case. If you have, you understand that scaling is the brick that hits you when you’ve gone far beyond your capacity limits and when your machines hit double or triple digit loads. Architecture doesn’t help you one bit there.”

Growth is hard. Dealing with growth is rarely sexy. When your growth goes non-linear, you’re tempted to think you’ve stumbled into a whole class of new problems that need wild new thinking. Resist. New ideas should be applied judiciously, because mostly it’s plumbing: tuning your databases, getting your thread buffer sizes right, managing the community and the abuse.

Intelligence and Monitoring

Growth compounds the other hard problem that Twitter (and almost every site I’ve seen) has: they’re running black boxes. Social software is hard to heartbeat, socially or technically. It’s one of the places where our jobs are actually harder than building real-time trading systems and other five-nines-style hard computing systems.

And it’s a problem Twitter is still struggling to solve. (Really, you never stop solving it; your next SPOF will always come find you, and then you have something new to monitor.) Twitter came late in life to Ganglia, and hasn’t had the time to really burnish it. And Ganglia doesn’t ship by default with a graph for what to do when your site needs its memcache servers hot to run. And what do you do when Ganglia starts telling you your recent framework upgrade is causing a 10x increase in data returned from your DBs for the same QPS? Or that your URL shortening service is starting to slow down sporadically, adding an extra 30ms burn to message handling? (How do you even graph that?)
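Ganglia’s own configuration isn’t shown here. As a hedged sketch of one way to make the “10x more data for the same QPS” case visible: graph a ratio rather than raw counters, and flag intervals that jump well above a recent baseline. The sample numbers, `window`, and `factor` are arbitrary assumptions, and both function names are hypothetical.

```python
from statistics import median


def bytes_per_query(samples: list[tuple[float, float]]) -> list[float]:
    """Turn (bytes_returned, queries) per interval into bytes per query."""
    return [b / q for b, q in samples if q > 0]


def flag_jumps(series: list[float], window: int = 5, factor: float = 3.0) -> list[int]:
    """Indexes where a value is more than `factor` times the median of the
    preceding `window` values."""
    flagged = []
    for i in range(window, len(series)):
        baseline = median(series[i - window:i])
        if baseline > 0 and series[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical samples: every interval sees 1,000 queries, but after a
# framework upgrade each query returns 10x the data.
samples = [(2.0e6, 1000.0)] * 6 + [(2.0e7, 1000.0)] * 2
ratio = bytes_per_query(samples)
print(flag_jumps(ratio))  # [6, 7]
```

The same trick could apply to the URL shortener: track that dependency’s latency as its own series, so a sporadic 30ms stands out against its own baseline instead of vanishing into total message-handling time.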

Beyond LAMP Needs Better Intelligence

Monitoring and intelligence get even harder as you start to embrace these new architectures. Partly because the systems are more complex, but largely because we don’t know what monitoring and resourcing look like for Web-scale queues of data and distributed hash tables. We don’t yet have the scars from living through the failure scenarios. And since it’s early days, we’re rolling our own solutions, without the battle-hardened tweaks and flags of an Apache or MySQL.
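The post doesn’t prescribe particular queue metrics; as an assumption, a reasonable starting set is depth, enqueue rate, dequeue rate, and an estimated time to drain, all derivable from two cumulative counters sampled at different times. `QueueSample` and `queue_health` are hypothetical names, not part of any queueing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSample:
    t: float        # seconds since some epoch
    enqueued: int   # cumulative items ever enqueued
    dequeued: int   # cumulative items ever dequeued


def queue_health(prev: QueueSample, cur: QueueSample) -> dict[str, Optional[float]]:
    """Rates, depth, and estimated seconds-to-drain from two counter samples."""
    dt = cur.t - prev.t
    in_rate = (cur.enqueued - prev.enqueued) / dt
    out_rate = (cur.dequeued - prev.dequeued) / dt
    depth = float(cur.enqueued - cur.dequeued)
    net = out_rate - in_rate
    # None means the backlog is flat or growing: it will never drain at these rates.
    drain_seconds = depth / net if net > 0 else None
    return {"in_rate": in_rate, "out_rate": out_rate, "depth": depth,
            "drain_seconds": drain_seconds}


print(queue_health(QueueSample(0, 10_000, 9_000), QueueSample(60, 16_000, 15_600)))
# {'in_rate': 100.0, 'out_rate': 110.0, 'depth': 400.0, 'drain_seconds': 40.0}
```

Graphing `drain_seconds` (and alerting when it turns to `None`) says more about a queue’s health than depth alone, since a deep queue that is draining fast can be fine.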

We all know that Jabber has different performance characteristics than the Web (that’s rather the point), but we don’t have the data to quantify what it looks like at network-effect-impacted scale. (The big IM installs, particularly LJ and Google, have talked a bit in public, but their usage patterns tend to be pretty different from stream-style APIs. Btw, I’ll be talking about this a bit in Portland at OSCON in a few months!)

Recommendations

So I’d add to Leonard’s architecture (and I know Leonard is thinking about this), and to the various other emerging cloud architectures, that to make them work you need to build monitoring and resourcing in from the ground up, or your distributed in-the-cloud queues are going to fail.

And solve the growth issues with solutions appropriate to growth, which rarely means architectural solutions.

Quiet Saturday Thoughts

April 5th, 2008

Thinking again about distributed log-oriented writes as a better architecture for a whole class of persistent data we need to deal with. Atomic appends are actually one of the least appreciated features in GFS, and certainly the most critical feature HDFS is missing. Right now I’m not even sure I’m supposed to be worrying; my back-of-the-napkin numbers say maybe 10-20 million daily appends across 3-4 million queues, which is just like running a big mail install, right? (Remind me to look at Maildir again.)
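Maildir’s trick is a useful model for atomic appends: write each entry under `tmp/`, then `rename()` it into `new/`, relying on rename being atomic within a single POSIX filesystem. A minimal sketch, assuming one directory per queue on one local filesystem (nothing like GFS’s record append across machines); `maildir_append` is a hypothetical helper. For scale, the napkin numbers above work out to roughly 116 to 231 appends per second on average (10-20 million a day divided by 86,400 seconds), before peaks.

```python
import os
import tempfile
import time
import uuid


def maildir_append(queue_dir: str, payload: bytes) -> str:
    """Append one entry to a Maildir-style queue directory and return its path.

    Readers only ever look in new/, and rename() within one filesystem is
    atomic on POSIX, so they never observe a half-written entry.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(queue_dir, sub), exist_ok=True)
    # Unique, roughly time-ordered name (real Maildir also mixes in host and pid).
    name = f"{time.time_ns()}.{uuid.uuid4().hex}"
    tmp_path = os.path.join(queue_dir, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make the bytes durable before publishing
    final_path = os.path.join(queue_dir, "new", name)
    os.rename(tmp_path, final_path)
    return final_path


demo_dir = tempfile.mkdtemp()
delivered = maildir_append(os.path.join(demo_dir, "user-1234"), b"hello")
print(os.path.basename(os.path.dirname(delivered)))  # new
```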

Also, contrary to TC’s breathy article, BigTable is not much like SimpleDB (other than both being ways of storing and retrieving data that aren’t MySQL): it doesn’t give you querying, just limited range scans on rows, and it seems to be really, really expensive to add new columns (at least whenever I talk to Gengineers, they seem to flinch at the concept).
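To make the “range scans, not queries” distinction concrete, here is a toy in-memory model of a table sorted by row key. It is not BigTable’s API (the `SortedRows` class is hypothetical); it only shows that the cheap operations are point gets and contiguous key ranges, and that anything else means scanning or maintaining your own index.

```python
import bisect
from typing import Optional


class SortedRows:
    """Toy table kept sorted by row key: point gets and [start, end) scans only.

    There is deliberately no 'WHERE column = value'; finding rows by anything
    other than the key means a full scan or an index you maintain yourself.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._rows: dict[str, dict[str, str]] = {}

    def put(self, key: str, columns: dict[str, str]) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key: str) -> Optional[dict[str, str]]:
        return self._rows.get(key)

    def scan(self, start: str, end: str) -> list[tuple[str, dict[str, str]]]:
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedRows()
table.put("kellan/2008-04-01", {"text": "quiet saturday"})
table.put("kellan/2008-04-05", {"text": "atomic appends"})
table.put("leonard/2008-04-02", {"text": "17th letter"})
# All of one user's rows: scan from "kellan/" up to "kellan0" ('0' sorts right after '/').
print([key for key, _ in table.scan("kellan/", "kellan0")])
# ['kellan/2008-04-01', 'kellan/2008-04-05']
```

The main lever a store like this gives you is key design: choose keys so the rows you want together sort together.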

Meanwhile I’m still waiting on DevPay for SimpleDB before I get into it in a big, big way.

2007 Was Not the Year of the Addressbook

February 28th, 2008

from __future__ import the_cloud

the_cloud.twitter.me.unfollow.everybody
the_cloud.me.addressbook.known_twitterers.each do |identity|
  nearby = the_cloud.geolocator(the_cloud.dopplr, the_cloud.fireeagle).nearby?(identity)
  the_cloud.twitter.follow(identity) if nearby
end
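The `the_cloud` calls above are a joke, not a real API, and Dopplr and Fire Eagle lookups aren’t reproduced here. Assuming you already have a last-known (lat, lon) for each contact (the data, `AUSTIN`, and both function names are hypothetical), the `nearby?` part reduces to a great-circle distance check:

```python
import math


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))  # 6371 km: mean Earth radius


def conference_follow_list(me: tuple[float, float],
                           known_twitterers: dict[str, tuple[float, float]],
                           radius_km: float = 25.0) -> list[str]:
    """Handles whose last known location is within `radius_km` of me, sorted."""
    return sorted(handle for handle, where in known_twitterers.items()
                  if haversine_km(me, where) <= radius_km)


AUSTIN = (30.2672, -97.7431)
friends = {
    "blaine": (30.2669, -97.7428),    # downtown Austin
    "les": (30.3000, -97.7000),       # a few km north
    "leonard": (37.7749, -122.4194),  # still in San Francisco
}
print(conference_follow_list(AUSTIN, friends))  # ['blaine', 'les']
```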

Last year I wrote a SxSW Twitter killbot, but what I really wanted was the above. I almost wrote it, but there were one or two annoying problems, and I figured someone else would write it for me.

It’s one year later, and I’m starting to realize that I’m about to go into conference mode again, which, on top of a sleep-deprived delirium and a certain disconnect from external data sources, is also the only time I have Twits come to my phone. And I still can’t do the above! What have you people been working on all year! Don’t make me come back there and start a startup.

Other questions I’ve asked my addressbook lately, without getting a response:

Please partition my social graph into a Dijkstra Nikon/Canon split.

Does Bob like cilantro? And is Alice lactose intolerant?

Do any friends.known_vegetarians.have_yelp_reviews(Austin)?

Lots of others, all unanswered.

OAuth in PHP (for Twitter)

October 16th, 2007

Mike released HTTP_Request_OAuth today, so I spent a little while this evening coding up Service_Twitter as a helper class for making OAuth-authorized requests against the Twitter API.

Both are early enough in the dev cycle to be called proofs of concept.

Mostly I wrote it because I had always envisioned wrapper libraries around the low-level OAuth implementations, wrapping the calls and constants, and since Mike graciously went out and wrote a low-level library, I felt compelled to write a wrapper.

Also twittclient, an interactive client for getting an authed access token, essential to bootstrapping development.

And nota bene, HRO currently only supports the MD5 signing algorithm, which is undefined in the core spec, and subject to change. (Just in case you didn’t believe me about the early state of things.)
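For contrast with HRO’s MD5 caveat: the OAuth Core 1.0 spec does define HMAC-SHA1 signing. Below is a minimal Python sketch of it (not the PHP library’s API), assuming the request URL is already normalized (lowercase scheme and host, no default port, no query string) and that parameter names are unique; the spec also handles repeated names, which a dict can’t represent. The parameter values are sample data.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(s: str) -> str:
    """OAuth percent-encoding: escape everything except RFC 3986 unreserved chars."""
    return quote(s, safe="~")


def signature_base_string(method: str, url: str, params: dict[str, str]) -> str:
    # Encode names and values, sort, join as name=value pairs, then encode again.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def hmac_sha1_signature(base: str, consumer_secret: str, token_secret: str = "") -> str:
    # The key is both secrets, each encoded, joined by '&' (token secret may be empty).
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


params = {
    "file": "vacation.jpg", "size": "original",
    "oauth_consumer_key": "dpf43f3p2l4k3l03", "oauth_token": "nnch734d00sl2jdk",
    "oauth_signature_method": "HMAC-SHA1", "oauth_timestamp": "1191242096",
    "oauth_nonce": "kllo9940pd9333jh", "oauth_version": "1.0",
}
base = signature_base_string("GET", "http://photos.example.net/photos", params)
print(hmac_sha1_signature(base, "kd94hf93k423kf44", "pfkkdhi9sl3r4s00"))
```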

update 2008/4/18

This code no longer works because Twitter has taken down their (slightly non-compliant) OAuth endpoint. When they add OAuth support back in, I’ll link to it.

Union Sq. Ventures

July 30th, 2007

The writing was on the wall.

Jack Dorsey: Taking the subway to Union Sq. The NYC one. (July 23rd)

(Actually I missed that one, but Twitters from Jack regarding the White Stripes were a dead giveaway!)

Bit late, but congrats to both Twitter and WeSabe on closing funding with Union Sq. Ventures.

Like Tony said, “All my friends go with Union Sq.”. I’ve been a fan (a fan of a VC!?!?) since shortly before their del.icio.us investment, and they continue to fund my favorite startups.

We walked by their office today, but were too busy to stop in and say, “Hi”.

Google Talk Architecture, and High Availability (HA)

July 29th, 2007

Via the HA blog (an obviously unserved niche, in retrospect), a very interesting 30-minute presentation on the Google Talk architecture.

ConnectedUsers * BuddyListSize * OnlineStateChanges

Interestingly, people keep independently rediscovering that maintaining presence is the hard part of scaling these systems.
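The `ConnectedUsers * BuddyListSize * OnlineStateChanges` formula as arithmetic. A hedged sketch with made-up inputs (a million connected users, 100 buddies each, one state change per user every ten minutes); `presence_updates_per_sec` is a hypothetical name, and `OnlineStateChanges` is treated here as a per-user rate.

```python
def presence_updates_per_sec(connected_users: int, buddy_list_size: float,
                             changes_per_user_per_sec: float) -> float:
    """ConnectedUsers * BuddyListSize * OnlineStateChanges: each state change
    is fanned out to every buddy."""
    return connected_users * buddy_list_size * changes_per_user_per_sec


rate = presence_updates_per_sec(1_000_000, 100, 1 / 600)
print(f"{rate:,.0f}")  # 166,667
# Doubling both users and average buddy-list size quadruples the load.
print(presence_updates_per_sec(2_000_000, 200, 1 / 600) / rate)  # 4.0
```

Because two of the three factors grow together as a social network grows, the load grows faster than the user count, which is the non-linearity the PSA below warns about.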

It’s something that really came home hard in my conversations with Twitter while helping with their scaling challenges (so much so that we took a slide out of our “Social Software for Robots” talk to discuss it, and Blaine mentioned it again in his “Scaling Twitter” talk).

So by way of a PSA:

Presence isn’t easy.

Growth in social systems is non-linear. Ignore the network effect at your peril.

Kick the Tires

Also interesting was “Real Life Load Tests”. The GTalk team deployed to Orkut and GMail weeks before actually turning on the UI for the features to be able to monitor the load. These are the practices that make Bill’s recent observation on HA systems possible:

An interesting takeaway is that it’s clearly possible to re-architect data storage on super-busy production systems seemingly no matter where you start from.
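The talk summary doesn’t say how the GTalk team wired this up. As a hypothetical sketch of the general “dark launch” idea: the new backend does its real work for a sampled fraction of requests (so it shows up in your load graphs), while its results stay hidden until the UI flag flips. `make_handler` and `presence_backend` are illustrative names.

```python
import random
from typing import Callable, Optional


def make_handler(backend: Callable[[str], str], ui_enabled: bool,
                 sample_rate: float, rng: random.Random) -> Callable[[str], Optional[str]]:
    """Wrap a new backend call for a real-life load test."""
    def handle(user: str) -> Optional[str]:
        if ui_enabled:
            return backend(user)  # launched: do the work and show it
        if rng.random() < sample_rate:
            backend(user)         # dark: do the work, throw the result away
        return None
    return handle


calls: list[str] = []


def presence_backend(user: str) -> str:
    calls.append(user)            # stands in for real load on the new system
    return f"{user} is online"


handler = make_handler(presence_backend, ui_enabled=False, sample_rate=1.0,
                       rng=random.Random(0))
print(handler("alice"), len(calls))  # None 1
```

Ramping `sample_rate` up from a small fraction lets you watch the new system’s load curve before any user depends on it.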

For the rest of the bullets, see the HA blog post.

Slides: Social Software for Robots

May 18th, 2007

Blaine’s and my slides from XTech07. (Oh, and SlideShare needs co-presenter features!)

More XTech ‘07 slides.