Blog posts tagged "twitter"

Fred’s wrong (or quoted out of context)

July 22nd, 2010

[Twitter breaks] because “it wasn’t built right — Twitter was built kind of as a hack and they didn’t really architect it to scale and they’ve never been able to catch up.” – Fred Wilson

This is wrong.

Twitter wasn’t built as a hack, it was just built. The way you or I might build something new, in a couple of weeks, with some databases, and a couple of cron jobs, and a daemon or three. If they had built it [portentous voice]TO BE TWITTER[/portentous voice] they would have failed.

Scaling is always a catch-up game. Only way it’s ever worked. If you never catch up then something isn’t working, but it isn’t original sin.

May 4th, 2010

Here at FlickrHQ we’re debating releasing a feature to allow embedding of photos *on* *the* *WEB*. We think it’s going to be huge.


When I say “FUD” …

October 22nd, 2009

"Flicker upcoming"? WTF? :)

… I mean Flickr/Upcoming/Delicious. In particular, I mean that brief moment of optimism in the Spring of ’06, on the roof of the Iron Cactus, at the Spread the FUD party, when it looked like Yahoo! had a wedge and the will to solve the social search problem, and magically, I might even get to be a part of that. I said in my cover letter (in silly, flowery cover-letter speak):

“The next round of innovation will be about building connections. The explosion of voices, information and ideas is currently outpacing our techniques for coping with them. We need to be helping people and communities find new ways to connect, interact, and work together to make sense of this accelerating decentralization. Innovation has been blossoming at the edges of the Net since the beginning, but innovation is also moving back to the connecting nodes, like Yahoo.”

Which is much on my mind after hearing about Marissa demoing social search yesterday.

And I’m deeply puzzled (and not a little disappointed) that anyone would care if Bing or Google can search the public status timelines, if it doesn’t come with social context.

Now the question is whether Goog can shake their history of failure at all things social.

Photo from Jan Brašna

Twitter lists, creators vs curators, and who owns the meta-data?

October 16th, 2009

Flickr is a creators’ community. This informs a number of the decisions we make, including the question of “who owns the meta-data?” (where “own” is defined as who can operate on it).

On Flickr a photo’s tag can be removed by the photographer, whoever added it. And a tag only has a single instance. This is profoundly different from del.icio.us, which is a curators’ community. On del.icio.us I can make any statement I want about an object in the world, and all the curators’ voices can be conglomerated towards consensus. Flickr privileges the creator, del.icio.us the consensus.

Even when we launch curatorial features, like the recent galleries launch, the content creator has final say about how their work is described, including membership in a gallery. Not only can you remove your photo from a gallery, it can’t be re-added once you’ve done it, and you can block that curator from operating on your photos again.
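To make those biases concrete, here’s a toy sketch of the creator-privileging rules described above. All of the names and structure are mine, invented for illustration; none of this is Flickr’s actual data model or code.

```python
# A toy model of "the creator owns the meta-data": the photographer can
# always remove a tag (whoever added it), can pull their photo from a
# gallery so it can't be re-added, and can block a curator outright.
# Invented for illustration; this is not Flickr's code or schema.

class Gallery:
    def __init__(self, curator):
        self.curator = curator
        self.photos = set()

class Photo:
    def __init__(self, owner):
        self.owner = owner
        self.tags = set()               # a tag only has a single instance
        self.banned_galleries = set()   # galleries this photo can never rejoin
        self.blocked_curators = set()

    def remove_tag(self, actor, tag):
        if actor == self.owner:         # the creator always wins
            self.tags.discard(tag)

    def add_to_gallery(self, curator, gallery):
        if curator in self.blocked_curators or gallery in self.banned_galleries:
            return False                # the curator's reach ends here
        gallery.photos.add(self)
        return True

    def remove_from_gallery(self, actor, gallery, block_curator=False):
        if actor != self.owner:
            return
        gallery.photos.discard(self)
        self.banned_galleries.add(gallery)   # once removed, never re-added
        if block_curator:
            self.blocked_curators.add(gallery.curator)
```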

This has all been a fairly rudimentary discussion by way of explaining my biases.

I’m excited by the Twitter lists feature; it’s a great example of enabling powerful interactions by offering stripped-down, bare-minimum organizational tools. (In fact it’s almost identical to galleries in that respect.)

But interestingly, and frankly surprisingly to me (possibly given my biases), Twitter is positioning itself as a curatorial community, not a creator community. This might actually make sense in the sphere of social media experts, and their endless re-tweetings, but it’s a fundamental mismatch with my expectations as a very early member, and as someone who isn’t trying to shill a product (beyond perhaps a slice of my own routine life).

Thankfully I was saved from having to make the effort (via buzz and meowrey)

From the Twitter lists beta


Lessons of the Elders

August 27th, 2009


Walking into work this morning, thinking about the vibrancy of the Twitter API developer ecosystem. Twitter has embraced (not necessarily intentionally) what I see as open source’s two key aphorisms on successful community engagement (lightly paraphrased):

“Get people laid” – JWZ
“Leave lots of low hanging fruit” – Karl Fogel

People don’t go back often enough to the well of hard-fought wisdom on community and online collaboration that the open source community developed. Believe me, you do not want to spend the blood, sweat, and tears to re-learn those lessons the hard way; they were fighting for a noble cause they believed in and it still sucked. Your micro-poking app ain’t all that.

Photo by curlyjazz

Flickr, Twitter, OAuth: A Secret History

July 1st, 2009

I remember it as a dark and stormy night; that seems unlikely, but I’m sure it was late and chilly and damp.

I remember being tired from a long day in the salt mines; that was during a period when I was always tired after work.

I remember there being whiskey, and, knowing @maureen, that seems likely.

I’d just won some internal battles regarding delegated auth, and implemented Google AuthSub for the new Blogger Beta, as well as Amazon auth for a side project. So when I wanted to share photos from Flickr to Twitter, I knew it wasn’t going to be over HTTP Basic Auth.

A few weeks earlier @blaine and @factoryjoe had pulled me into a project called OpenAuth that they’d been talking about for a couple of months — an alternative to yet another auth standard, and a solution for authenticating sites using OpenID.

So one late, damp night along Laguna St. with whiskey, we did a pattern extraction, identifying the minimal possible set of features to offer compatibility against existing best practice API authorization protocols. And wrote down the half pager that became the very first draft of the OAuth spec.

That spec wasn’t the final draft. That came later, after an open community standardization process allowing experts from the security, web, and usability community to weigh in and iterate on the design. But many of those decisions (and some of the mistakes) from that night made it into the final version.

Yesterday, a little over two years later, we finally shipped Flickr2Twitter.

So it was nice yesterday when people commented on the integration:

“Uses OAuth!” “Doesn’t ask for your Twitter password” “Great use of OAuth”.

And I thought to myself, “It better be, this is what OAuth was invented for — literally”.

Streams, affordances, Facebook, and rounding errors

March 18th, 2009

I’m not really a Facebook user, but it is impossible to be a serious practitioner of the rough craft of building social software without being at least somewhat a Facebook watcher. So indulge me a bit, as I add my own thoughts to the cacophony of folks writing about the Facebook re-design.

I’ve always thought their status updates design was brilliant. Not because it was usable or attractive; I’ve always thought it was terrible. But because their design didn’t make promises they couldn’t keep.

Think briefly about the platonic ideal of an activity stream, the increasingly common social pattern that makes your traditional CRUD-fronted MySQL install cry at anything remotely resembling scale. All the updates from your social network, quickly listed for your viewing pleasure, in reverse date order. No two users of your service will share an activity stream view (unless your service tends towards 100% social graph overlap, in which case why bother?), writes are high volume and need to be committed quickly to preserve ordering, and shared caching is right out.
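For concreteness, here’s roughly what that platonic stream looks like in its naive, fan-out-on-read form. The tables, columns, and the db helper are all made up for the sketch; the point is only where the cost lands.

```python
# Naive fan-out-on-read: merge every friend's updates at request time,
# newest first. `db` stands in for whatever query helper you use; the
# tables and columns are invented for the sketch.
def activity_stream(db, user_id, limit=20):
    friend_ids = db.fetch_column(
        "SELECT friend_id FROM follows WHERE user_id = %s", (user_id,))
    if not friend_ids:
        return []
    # Every request re-does this sort/merge across the whole friend set,
    # and since no two users share the same friend set, no two users share
    # this result, which is why shared caching is right out.
    return db.fetch_rows(
        "SELECT author_id, body, created_at FROM updates "
        "WHERE author_id IN %s ORDER BY created_at DESC LIMIT %s",
        (tuple(friend_ids), limit))
```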

So you go queue-ish, you de-normalize. And now you’re pushing messages around between services, transactional commits are gone, and you’re dealing with the inevitable skew of distributed systems. But even in queue systems, 100% guaranteed, in-order delivery is more fantasy than reality (though you can get close).
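And a minimal sketch of the queue-ish, de-normalized alternative, where writes get fanned out asynchronously into per-follower inboxes and reads become a cheap, owner-scoped lookup. Again, every name here is mine; it’s an illustration of the pattern, not anyone’s production system.

```python
import collections
import queue
import threading
import time

# Toy fan-out-on-write: the write enqueues once and returns; a background
# worker copies the update into each follower's de-normalized inbox.
# Reads then just page an inbox, scoped to a single owner.
followers = collections.defaultdict(set)   # author_id -> follower ids
inboxes = collections.defaultdict(list)    # user_id -> their materialized stream
fanout_q = queue.Queue()

def post_update(author_id, text):
    fanout_q.put({"author": author_id, "text": text, "ts": time.time()})

def fanout_worker():
    while True:
        update = fanout_q.get()
        for follower_id in followers[update["author"]]:
            inboxes[follower_id].append(update)
        fanout_q.task_done()

threading.Thread(target=fanout_worker, daemon=True).start()

# Note what you traded away: there's no transaction spanning all those inbox
# writes, and nothing above guarantees cross-author ordering, which is
# exactly the skew and delivery weirdness described above.
```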

But Facebook was smarter than that. They specifically designed a page that was lossy. They said, “You don’t want to see everything, here is a subset of the things your friends do that we think you’ll be interested in.” And so you knew that you weren’t seeing everything; it wasn’t that they were failing their contract with you, but that they had decided not to show you something for editorial reasons. And you knew that if you wanted to see everything you had to dig, because that was the contract. And that digging was scoped to a user, your wall or a friend’s wall, data scoped by data owner — super cheap lookup.

Contrast this to Twitter.

Twitter is infamous for its bad period of downtime as growth went asymptotic. But less well remembered is the teeth gnashing and hair pulling of the bad period right before, where update loss, delivery failure, and out-of-order delivery were the bugaboos of the day. Twitter promised you would see all your friends’ updates, always, neatly collated. The promise is implicit in the design, the language, the APIs, the very DNA of the service. (In fact Twitter used to make more audacious claims; I still mourn the death of the “With Friends” feature, which allowed you to see anyone’s public updates in the context of their friend network, not just your own.)

One of the best, unattributable quotes from Social Foo last year was the data point that Facebook was at one point losing up to 80% of messages across their update bus. As someone whose expectations are shaped by the five-nines-style promises of Twitter, it’s a loss at a scale I can’t possibly fathom. And it wasn’t even an issue in the Facebook community. And when they expire updates out of hot storage to less accessible stores, you don’t notice, because they never offered you the option to page back forever. Contrast again to Twitter, whose design (if not content) encourages you to page back forever until you smack up against an arbitrary and surprising limit (whose exact location has changed over the years).

That is designing with affordances. Don’t let your design make promises you can’t keep.

A much smaller and possibly less well known example is the Flickr activity page, where you can monitor activity on your photos, or photos you’ve expressed interest in. For years this page was framed in the language of “which of these limited time periods are you interested in seeing events in?”; that was the question the page tried to answer. Not “what has ever happened on my stuff?”, because that was a much harder, and more expensive, question to answer. As part of the Toto launch (new homepage) on Flickr last Fall we explicitly changed our contract with our users. Great photography has a 150-year tradition, and we felt that we could at least try to expose 5 years’ worth of conversations. (And Flickr usage by our members evolves and changes as their lives evolve and change, something all good social software should design for, rather than living in the ever-present now.) Our activity streams go all the way back to the beginning now, but it wasn’t a change undertaken without a lot of thinking, architecture, and engineering.

Simon Willison asked this week about best practice for architecting activity streams. And the answer is, “It depends.” Depends on the scope, scale, access patterns, and affordances you’re building — your contract with your users.

Which is a long way of saying think hard about the promises you make to your users, implicitly or explicitly.

And, Facebook, my friend, what the HELL are you thinking? You managed to negotiate the best deal in the business, talk about a racket, and you threw it away for a piece of Twitter’s pain? Are you stupid? Well, best of luck with that.

Random Notes on Twitter Culture

December 4th, 2008

I tried to fit this all into 140 characters. I really did. I couldn’t do it, not even with disemvoweling.

#motrinmom

I was chatting with a friend who does information architecture for pharmaceutical advertising, and she was shocked I hadn’t heard about the “Motrin Mom” twitter-in-a-teapot. I had no idea what she was talking about.

Apparently “Twittering Critics Brought Down [the] Motrin Mom Campaign”. And the entire advertising industry, at least here in New York, is having a fear-of-a-twitter-planet moment. Complete with righteous anger about the “irrationality of Twitter”. (Um, hello folks, but didn’t you build one of the largest global businesses by cynically manipulating people’s “irrationality”?)

But the part that really caught me off guard is that this didn’t blip my radar at all. Maybe I was just offline for it, but as far as I can tell the twittering classes I follow didn’t peep about this. I thought Twitter was all about us? (Also, Summize, you are already awesome and everything, but if you add “search within people you’re following” and “search within people who follow you” I promise to love you forever.)

@flickr

Only tangentially related, I’m sure Tyler Hawkins aka @flickr has a very busy @replies tab.

What I can’t figure out is whether all these folks responding to @flickr are really confused about whether Hawkins is a Flickr representative (he isn’t, and doesn’t in any way suggest he might be) or just believe so strongly that tweets addressed to “@flickr” will arrive in Flickr’s inbox that reality is irrelevant.

I’m torn on whether the assumption that when you speak you will be heard is the ultimate arrogance (and one particularly prevalent on Twitter), or whether this rather proves that we’ve historically worried too much about URIs and that culture has no problem evolving them ad hoc.

Now if only I had a thesis, rather than a rambling collection of half thoughts. Which is why I wanted to fit this all into 140 characters. Alas.

Ahab Failed

July 29th, 2008

A Couple of Caveats on Queuing

July 7th, 2008

Les’ “Delight Everyone” post is the latest, greatest addition to the “17th letter of the alphabet as savior” conversation.

And believe me I’m a huge fan, and am busy carving out a night sometime this week to play with the RabbitMQ/XMPP bridge (/waves hi Alexis).

But … there are a couple of caveats:

1) Some writes need to be real time.

Les notes this as well, but I just wanted to emphasize because really, they do.

If you can’t see your changes take effect in a system, your understanding of cause and effect breaks down. It doesn’t matter that your understanding is wrong; you still need one to function. Ideally a physical analogy, too. There are no real-world effects that get queued for later application. Violate the principle of (falsely) seeming to respect real-world cause and effect and your users will remain forever confused.

del.icio.us showing you the wrong state when you use the inline editing tool, and Flickr taking a handful of seconds to index a newly tagged photo are both good examples of subtly broken interfaces that can really throw people.

My data, now real time. Everyone else can wait (how long depends on how social your users are).
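A minimal sketch of that split, using a hypothetical bookmarking write: the actor’s own view is updated synchronously so cause and effect hold, and everybody else’s view is fed through the queue. All the names are mine.

```python
# "My data, now; everyone else can wait." Apply the write to the actor's own
# view synchronously, then hand the fan-out (followers, search, counts) to
# the queue. All names here are hypothetical.
def save_bookmark(user_id, bookmark, own_view, fanout_queue):
    own_view[user_id].insert(0, bookmark)                    # author sees it immediately
    fanout_queue.put(("bookmark_saved", user_id, bookmark))  # everyone else, eventually
```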

2) You’ve got to process that queue eventually.

Ideally you can add processing boxes in parallel forever, but if your dequeuing rate falls below your queuing rate you are, in technical terms, screwed.

Think about it: if you’re falling behind 1 event per second (processing 1,000,000 events a second but adding 1,000,001, for example), at the end of the day you’re 86,400 events in debt and counting. It’s like losing money on individual sales, but trying to make it up in volume.
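Spelled out, with the same numbers (nothing here beyond the arithmetic):

```python
# Falling behind by one event per second, with the numbers from above.
enqueue_rate = 1_000_001   # events/sec arriving
dequeue_rate = 1_000_000   # events/sec you can process
deficit = enqueue_rate - dequeue_rate       # 1 event/sec
print(deficit * 86_400)    # 86,400 events in debt after a day, and counting
```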

Good news: Traffic is spiky and most sites see daily cycles with quiet times.

Bad news: Many highly tuned systems slow down as their backlogs increase. Like a credit card, processing debt can get exponentially unmanageable.
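A toy simulation of that credit-card effect, with entirely made-up numbers: nominal capacity beats the arrival rate, but if per-event cost sags as the backlog grows, one bad spike can leave you with a debt you never pay down.

```python
# Nominal capacity beats the arrival rate, but effective throughput sags as
# the backlog grows (cold caches, swapping, contention). All numbers invented.
arrival_rate = 1_000_000       # events/sec coming in
healthy_capacity = 1_100_000   # events/sec you can clear with an empty queue
backlog = 50_000_000           # left over from a spike

for hour in range(6):
    capacity = healthy_capacity / (1 + backlog / 100_000_000)
    backlog = max(0, backlog + (arrival_rate - capacity) * 3600)
    print(f"hour {hour}: backlog ~{int(backlog):,}")

# With these made-up numbers a backlog under ~10 million events drains on
# its own; the 50 million event spike never recovers.
```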

In practice this means that most of the time your queue consumers should be sitting around bored. (see Allspaw’s Capacity Planning slides for more on that theme.)

If you can’t guarantee those real-time writes for thems that cares, and mostly bored queue consumers the rest of the time, then your queues might not delight you after all.

See also: Twitter, or Architecture Will Not Save You

Twitter, or Architecture Will Not Save You

May 28th, 2008


(circa 2006 Twitter maintenance cat)

Along with a whole slew of smart folks, I’ve been playing the current think game du jour, “How would you re-architect Twitter?”. Unlike most, I’ve been having this conversation off and on for a couple of years, mostly with Blaine, in my unofficial “Friend of Twitter” capacity. (The same capacity in which I wrote the first Twitter bot, and have on rare occasion logged into their boxes to play “spot the runaway performance issue.”)

For my money Leonard’s Brought to You By the 17th Letter of the Alphabet is probably the best proposed architecture I’ve seen — or at least it matches my own biases when I sat down last month to sketch out how I’d build a Twitter-like thing. But when Leonard and I were chatting last week about this stuff, I was struck by what was missing from the larger Blogosphere’s conversation: the issues Twitter is actually facing.

Folks both within Twitter and without have framed the conversation as an architectural challenge. Meanwhile the nattering classes have struck on the fundamental challenge of all social software (namely the network effects) and are reporting that they’ve gotten confirmation from “an individual who is familiar with the technical problems at Twitter” that indeed Twitter is a social software site!

Living and Dying By the Network

All social software has to deal with the network effect. At scale it’s hard. And all large social software has had to solve it. If you’re looking for the roots of Twitter’s special challenges, you’re going to have to look a bit farther afield.

Though you can hedge your bets with this stuff by making less explicit promises than Twitter does (everything from my friends in a timely fashion is a pretty hard promise to keep). Flickr mitigates some of this impact by making promises about recent contacts, not recent photos (there are fewer people than photos), meanwhile Facebook can hide a slew of sins behind the fact that their newsfeeds are “editorialized”, with no claims of completeness anywhere in sight. (There is a figure floating around that at least at one point Facebook was dropping 80% of their updates on the floor.)

So while architectures that strip Twitter down to queues and logs could be a huge win, and while thinking about new architectures is the sexy, hard problem we all want to fix, Twitter’s problems are really a more pedestrian kind of hard, of the plumbing and ditch digging variety. Which is less fun, but reality.

Growth

Their first problem is growth. Honest-to-god hockey-stick growth is so weird, and wild, and hard, that it’s hard to imagine and cope with if you haven’t been through it at least once. To quote Leonard again (this from a few weeks ago, back when TC thought they’d figured out that Twitter’s problem was Blaine):

“Even if you’re architecturally sound, you’re dealing with development with extremely tight timelines/pressures, so you have to make decisions to pick things that will work but will probably need to eventually be replaced (e.g. DRb for Twitter) — usually you won’t know when and what component will be the limiting factor since you don’t know what the use cases will be to begin with. Development from prototype on is a series of compromises against the limited resources of man-hours and equipment. In a perfect world, you’d have perfect capacity planning and infinite resources, but if you’ve ever experienced real-world hockey-stick growth on a startup shoestring, you know that’s not the case. If you have, you understand that scaling is the brick that hits you when you’ve gone far beyond your capacity limits and when your machines hit double or triple digit loads. Architecture doesn’t help you one bit there.”

Growth is hard. Dealing with growth is rarely sexy. When your growth goes non-linear you’re tempted to think you’ve stumbled into a whole class of new problems that need wild new thinking. Resist. New ideas should be applied judiciously. Because mostly it’s plumbing. Tuning your databases, getting your thread buffer sizes right, managing the community, and the abuse.

Intelligence and Monitoring

Growth compounds the other hard problem that Twitter (and almost every site I’ve seen) has: they’re running black boxes. Social software is hard to heartbeat, socially or technically. It’s one of the places where our jobs are actually harder than those real-time trading systems, and other five-nines-style hard computing systems.

And it’s a problem Twitter is still struggling to solve. (Really, you never stop solving it; your next SPOF will always come find you, and then you have something new to monitor.) Twitter came late in life to Ganglia, and hasn’t had the time to really burnish it. And Ganglia doesn’t ship by default with a graph for what to do when your site needs its memcache servers hot to run. And what do you do when Ganglia starts telling you your recent framework upgrade is causing a 10x increase in data returned from your DBs for the same QPS? Or that your URL shortening service is starting to slow down sporadically, adding an extra 30ms burn to message handling? (How do you even graph that?)
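You can’t graph what you never measured, so the unglamorous first step is timing every dependency call yourself and rolling the samples up into something your graphing stack can plot. A generic sketch follows; none of it is Ganglia’s (or any other tool’s) API, and the shortener call is hypothetical.

```python
import collections
import functools
import time

# Record per-dependency latencies so a sporadic extra 30ms has a chance of
# showing up on a graph. Generic sketch; not any particular tool's API.
samples = collections.defaultdict(list)    # dependency name -> latencies (ms)

def timed(dependency):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                samples[dependency].append((time.monotonic() - start) * 1000)
        return wrapper
    return decorator

@timed("url_shortener")
def shorten(url):
    ...   # the actual call out to the (hypothetical) shortening service

def rollup(dependency):
    # Periodically turn raw samples into percentiles, ship them to whatever
    # graphing system you run, and clear the window.
    window = sorted(samples.pop(dependency, []))
    if not window:
        return None
    return {"p50": window[len(window) // 2],
            "p95": window[min(len(window) - 1, int(len(window) * 0.95))]}
```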

Beyond LAMP Needs Better Intelligence

Monitoring and intelligence get even harder as you start to embrace these new architectures. Partly because the systems are more complex, but largely because we don’t know what monitoring and resourcing look like for Web-scale queues of data and distributed hash tables. And we don’t yet have the scars from living through the failure scenarios. And because it’s early days we’re rolling our own solutions, without the battle-hardened tweaks and flags of an Apache or MySQL.

We all know that Jabber has different performance characteristics than the Web (that’s rather the point), but we don’t have the data to quantify what it looks like at network-effect-impacted scale. (The big IM installs, particularly LJ and Google, have talked a bit in public, but their usage patterns tend to be pretty different from stream-style APIs. Btw, I’ll be talking about this a bit in Portland at OSCON in a few months!)

Recommendations

So I’d add, to Leonard’s architecture (and I know Leonard is thinking about this) and the various other cloud architectures emerging, that to make it work you need to build monitoring and resourcing in from the ground up, or your distributed, in-the-cloud queues are going to fail.

And solve the growth issues with appropriate solutions for growth, which rarely involve architecture.

Quiet Saturday Thoughts

April 5th, 2008

Thinking again about distributed, log-oriented writes as a better architecture for a whole class of persistent data we need to deal with. Atomic appends are actually one of the least appreciated features in GFS, and certainly the most critical feature HDFS is missing. Right now I’m not even sure I’m supposed to be worrying; my back-of-the-napkin numbers are saying maybe 10-20 million daily appends across 3-4 million queues, which is just like running a big mail install, right? (Remind me to look at Maildir again.)
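That back-of-the-napkin, spelled out with the upper end of those numbers:

```python
# The upper end of the napkin: 20 million appends a day over 4 million queues.
daily_appends = 20_000_000
queues = 4_000_000
seconds_per_day = 86_400

print(daily_appends / seconds_per_day)   # ~231 appends/sec on average
print(daily_appends / queues)            # ~5 appends per queue per day
# Modest aggregate write rates spread thinly across millions of mailbox-like
# queues, which is why it smells like a big mail (Maildir-ish) install.
```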

Also, contrary to TC’s breathy article, BigTable is not much like SimpleDB (other than that they’re both ways of storing and retrieving data which aren’t MySQL), in that it doesn’t give you querying, just limited range scans on rows, and it seems to be really, really expensive to add new columns (at least whenever I talk to Gengineers, they seem to flinch at the concept).

Meanwhile I’m still waiting on DevPay for SimpleDB, before I get into it in a big big way.