oldtweets

July 10th, 2012

oldtweets is a search engine for the first year of Twitter.

A bunch of folks asked about the how. The Twitter API provides a method for fetching a tweet by ID. So to build an index of the first year of Twitter you need to call the API for each ID in the range 1-20,000,000. That’s 20 million API calls at a rate of 150 calls per hour, or roughly 15 years of elapsed API time to index year one.
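
For anyone checking the math, here is a quick sketch of that estimate. The two constants come straight from the paragraph above; the 365-day year is a simplifying assumption.

```python
# Back-of-the-envelope estimate for a naive crawl of every ID.
TOTAL_IDS = 20_000_000   # ID range 1-20,000,000 (from the post)
CALLS_PER_HOUR = 150     # API rate limit (from the post)

hours = TOTAL_IDS / CALLS_PER_HOUR
years = hours / (24 * 365)  # assumes a 365-day year, crawling non-stop

print(f"{hours:,.0f} hours, about {years:.1f} years")
```

That works out to a bit over 15 years if every single ID has to be requested.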

It also helps to know that Twitter is, and has always been, a MySQL shop, and that in the early days there was a theory about scaling databases by using large auto-increment offsets. (I don’t remember what the logic of that was.) That started about 6 months in, was turned off for a while, and periodically drifted. So, good news: the 20 million ID space is very sparse, which significantly cuts down on the elapsed API time. You just need to send tracers into the space to map it.
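
The post doesn’t spell out how the tracers worked, so the following is only one plausible sketch, not the actual implementation. The idea is to scan a small window of IDs exhaustively at a few points in the space, work out the most common gap between IDs that exist, and then step through that region by that gap instead of by 1. The names `estimate_stride` and `fake_exists` are hypothetical, and `fake_exists` stands in for a real API lookup with a made-up layout (dense below 1000, every 10th ID after that).

```python
from collections import Counter

def estimate_stride(exists, start, window):
    """Probe every ID in [start, start + window) and return the most
    common gap between consecutive live IDs, or None if fewer than
    two live IDs were found."""
    live = [i for i in range(start, start + window) if exists(i)]
    gaps = Counter(b - a for a, b in zip(live, live[1:]))
    return gaps.most_common(1)[0][0] if gaps else None

def fake_exists(tweet_id):
    """Stand-in for an API call (assumption): IDs below 1000 are dense,
    later IDs exist only on multiples of 10."""
    if tweet_id < 1000:
        return True
    return tweet_id % 10 == 0

# Send a tracer into a few regions and record the stride seen in each.
stride_map = {start: estimate_stride(fake_exists, start, 100)
              for start in (1, 2000, 4000)}
print(stride_map)
```

Each tracer here costs 100 lookups. Since the real offsets were switched on and off and drifted over time, a crawler built this way would need to re-probe periodically rather than trust one stride for the whole range.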

From there it’s just a question of patience.

The whole thing runs on a very small EC2 instance, and it’s on this week’s todo list to get the index running under Upstart, but it hasn’t happened yet. So if it goes away….

Why

I think our history is what makes us human, and the push to ephemerality and disposability “as a feature” is misguided. And a key piece of our personal histories is becoming “the story we want to remember”, aka what we’ve shared. I just wanted my old tweets; as a side effect I got all of them.

Providing an interface to the whole corpus was motivated by the desire that folks would investigate where the social norms arose, exactly like Rabble’s @-reply investigation.

Year 1

I thought year one was a meaningful symbol. It maps to the time when we were figuring out how to use Twitter, and maps to the time when I felt like the service was working best for me and mine as an “ambient intimacy” service.

Additionally, after SxSW 2007 the rate of tweeting increased significantly, making the brute force approach even slower.

16 Responses to “oldtweets”

  1. whitneymcn says:

    I was relatively late to Twitter (though turns out I did squeak under the oldtwitter wire with a cursory Mar 28, 10:46 2007 tweet), but when I really started using it in late 2007 or early 2008, the realization that this was different and fascinating in ways I hadn’t really thought about hit me hard:

    http://smr.absono.us/2008/02/that-ambient-intimacy-thing/

  2. harryh says:

    I really regret deleting my very first tweet which was something like this:

    Oh look, it’s like dodgeball but with fewer features!

    Which was factually correct but (with the benefit of hindsight) entirely missing the point.

  3. jeremy says:

    Kellan,

    We had to use prime offsets because tweet ID’s were mysql auto increment, and we were experimenting with multiple write masters (something which never actually worked even with the help of Percona and others), and that was MySQL AB’s recommended procedure for avoiding auto increment row collisions.

    Jeremy

  4. Kellan says:

    @jeremy: thanks for the reminder.

  5. Tim says:

    it’s not working for all users, i’m 59 user and nothing shows up for my user name @ tim535353

  6. Kellan says:

    @tim: it’s case sensitive, try http://kellan.io/oldtweets?q=userid%3ATim535353

  7. Jason says:

    This is awesome, but I want the raw data. Where is it?? ;-)

  8. John Furrier says:

    nice work..those sequential integers was cool..i’d love to know who was the last person under that format …that is when twitter switched off of it..

  9. Ruby says:

    Love this. Thanks, Kellan! I’m surprised at how similar my tweets are now to 6 years ago. There weren’t any RT’s yet, but I found my first @reply to someone in March 2007.

  10. vanderwal says:

    I’m guessing it didn’t let you pull accounts that are currently private, even though Twitter privacy didn’t arrive until much later?

    I would have unprivated my account for a few days had I known you were going to do this. :-)

  11. rabble says:

    Privacy was part of twitter during this time. I know because i found a tweet where i set my account to private. Once you go BACK to being public, all the old private tweets become non-private.

    http://twitter.com/rabble/status/7614651/

  12. vanderwal says:

    Rabble, yep. Privacy is a binary wrapper for all one’s own tweets. I lifted the privacy to get an archive tool to grab about 17k of my tweets in 2008 or so. It was open for about 24 hours then I put it back in private.

    I really would love to get a full archive of my first 40k Tweets. There is so much in there of trips and kid’s funny sayings.

  13. [...] increased over this period. I’ve tried to figure out why and this is the closest I could get. Kellan wrote a blog post about the 1st year of tweets, in which he said it worked best in the first year because of “ambient intimacy”. There [...]

  14. Love this oldtweets project. I enjoyed doing a keyword search for “coffee” and seeing so many innocent coffee-making updates by @biz and @ev.

  15. Tom Duke says:

    I used to use twapperkeeper, thought it would be around for ever and then one day it got scooped up by HootSuite. HootSuite claims that the functionality is now within their product, but it doesn’t work the same.

    I was excited to find oldtweets, but it’s not working for me: @Tom_Duke

    Any assistance would be very much appreciated, I tried:

    userid:Tom_Duke and it finds 0 results.

  16. [...] week, Kellan Elliott-McCrea launched a new Twitter search engine called “oldtweets” to a bit of buzz. As the name [...]