Minimal Competence: Data Access, Data Ownership, and Sharecropping.

May 18th, 2010

A friend (from Google) recently trolled me, asking, “What’s up with the data lock-in at Flickr?”

Got me thinking about standards. I wrote back a rant to a mailing list of fellow senior hacker and coder types. Below I’ve included that rant, largely verbatim. I’d been meaning to turn it into a more reasoned blog post, maybe something suitable for posting on a more official outlet, but life is short, and Rod’s post about Quora reminded me to get on it.

As software engineers, as social software engineers, it’s important to have standards. You can debate how much of what we do can be called engineering, even charitably, but the code we write determines the rules that govern the spaces more and more people spend time in, and while “First, do no harm” might be reaching, a few standards that you should be embarrassed to not meet seem appropriate.

One of those is around data access, data ownership, and sharecropping. This is something Flickr takes very seriously.

The Minimum

With Flickr you can get out, via the API, every single piece of information you put into the system.

Every photo, in every size, plus the completely untouched original (which we store for you indefinitely, whether or not you pay us). Every tag, every comment, every note, every people tag, every fave. Also your stats, view counts, and referrers.

Not the most recent N, not a subset of the data. All of it.

It’s your data, and you’ve granted us a limited license to use it.

Additionally, we provide a moderately competently built API that allows you to access your data at rates roughly 500x faster than the rate that will get you banned from Twitter.

Asking people to accept anything else is sharecropping. It’s a bad deal. Flickr helped pioneer “Web 2.0”, and personal data ownership is a key piece of that vision. Just because the wider public hasn’t caught on yet to all the nuances around data access, data privacy, data ownership, and data fidelity, doesn’t mean you shouldn’t be embarrassed to be failing to deliver a quality product.

The ability to get out the data you put in is the bare minimum. All of it, at high fidelity, in a reasonable amount of time.

The bare minimum that you should be building, the bare minimum that you should be using, and absolutely the bare minimum you should be looking for in tools you allow and encourage people who aren’t builders to use.
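
To make “all of it” concrete, here is a minimal sketch of what a complete export loop looks like from the client side. It is not Flickr’s actual API: the page shape (`page`, `pages`, `items`) and the `fetch_page` and `make_fake_service` names are assumptions made up for illustration, and the “service” is an in-memory fake so the example runs without a network. The point is only that the loop keeps going until the last page, rather than stopping at the most recent N.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical page shape, assumed for this sketch only:
# {"page": int, "pages": int, "items": [dict, ...]}
Page = Dict[str, Any]


def export_all(fetch_page: Callable[[int], Page]) -> Iterator[dict]:
    """Yield every item from a paginated listing, from the first page to the last."""
    page_number = 1
    while True:
        page = fetch_page(page_number)
        yield from page["items"]
        # Stop only once the service says there are no more pages.
        if page_number >= page["pages"]:
            break
        page_number += 1


def make_fake_service(items: List[dict], per_page: int) -> Callable[[int], Page]:
    """Build an in-memory stand-in for a paginated API (illustrative, not a real client)."""
    total_pages = max(1, -(-len(items) // per_page))  # ceiling division, at least one page

    def fetch_page(page_number: int) -> Page:
        start = (page_number - 1) * per_page
        return {
            "page": page_number,
            "pages": total_pages,
            "items": items[start:start + per_page],
        }

    return fetch_page


if __name__ == "__main__":
    photos = [{"id": i, "title": f"photo {i}"} for i in range(1, 8)]
    fetch = make_fake_service(photos, per_page=3)
    exported = list(export_all(fetch))
    # Everything that went in comes back out, in order.
    assert exported == photos
```

Against a real service, `fetch_page` would wrap an authenticated HTTP call, and you would run the same loop once per kind of data (photos, comments, tags, and so on).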

A Reasonable Exchange of Value

Flickr actually goes a bit farther: not only can you get your data out, but it gets enriched as it passes through the system.

If you use the geotagging feature, you don’t just get the lat/long out you put in, but your photo comes back with a whole hierarchy of geographic descriptors that are pointers into a publicly available gazetteer (Y! GeoPlanet). It would be good if there were pointers into other publicly available gazetteers (if, for example, Google ever released one), but there isn’t a good concordance service yet (though it’s being worked on).

You get structured access to all the metadata that people have added to your photos, with proper attribution available. (Of course there is a working privacy model, so your “friends” aren’t getting data they aren’t supposed to, like your friend requests and chat logs.)

If you used our machine tags vocab, you get extra information pulled in from 3rd party APIs like OpenStreetMap, Open Library, Last.fm, various transit administrations, and Foursquare.

You also have access to the data that was created in aggregate using the data you shared with us, like tag clusters and the Creative Commons licensed neighborhood shape boundaries.

This isn’t the exhaustive list, just a few of the things Flickr does to respect, and collaborate with, the people who share their time and data with us.

I’d certainly love to get a fraction of this data back from other services I use. Imagine getting access to all the data Google has about you, and everything they’ve learned partially based on observing you. I’ve gotten used to being disappointed by most of my fellow practitioners, but I still dream about using good tools that treat me with respect and want to collaborate.

Thanks go to Jesse Vincent, for the useful sharecropping metaphor.

(And I’ll state the obvious: this is my personal blog. Nothing I post here should be taken as official Flickr or Yahoo communication or policy unless otherwise noted; that isn’t what they pay me to do.)

19 Responses to “Minimal Competence: Data Access, Data Ownership, and Sharecropping.”

  1. Tom Carden says:

    “The ability to get out the data you put in is the bare minimum.”

    Amen. I hate to bring up the inevitable, but right from the beginning the thing that creeped me out about F*bk was that they asked me for information about myself and my contacts that subsequently never appeared anywhere on the site. It was only for profiling me, not for improving my experience and once entered it was no longer mine – I couldn’t take it back. That’s not acceptable.

  2. Tom Carden says:

    I tried to censor my reference to the world’s most popular social network site, above, because this post is about much more than that. I got the formatting wrong, sorry :)

  3. george says:

    FB is ridiculous. I left 2 weeks ago, and I’m looking to replace Google too. Even if I have to run my own server at home.

  4. Mr. Gunn says:

    “Just because the wider public hasn’t caught on yet to all the nuances around data access, data privacy, data ownership, and data fidelity, doesn’t mean you shouldn’t be embarrassed to be failing to deliver a quality product.”

    Love it! I’ve been trying to explain to business types that the number of users canceling their accounts likewise has no bearing on whether what you’re doing is the right thing to do. Bait-and-switch is wrong, even if no one notices.

  5. Florian says:

    I wish there was an application that would let me mirror my Flickr content on my own server or hard disk. Comments, tags, everything. As far as I know such a thing does not exist yet.

  6. Pete Ashton says:

    “via the API” is the catch here. I’ve often thought a Pro account (or maybe an Uber-Pro account) should have some advanced export functions. Let me download a zip of the originals from a set, give me a full export of 2007’s photos in three sizes, etc.

    It’s nice that I can do this through the API but I’m not a programmer – I’m a photographer. Imagine if I lost all my photos and could only get them back by downloading each one from the All Sizes page. Ouch.

  7. Micah Alpern says:

    “With Flickr you can get out, via the API, every single piece of information you put into the system.”

    Love the API and the sentiment, but I think we can go one step further, which is to say, non-technical users should be able to get their information out of a system.

    In the case of Flickr it would be nice if there was a page a photographer could go to to easily download all their data.

    If a Microsoft Word user had to learn the VB Script API to “export” their data it wouldn’t really be a complete solution.

    That said, love Flickr. You guys rock. :)

  8. memo says:

    I guess Flickr is doing quite a good job in data sharing. Not like Facebook at all…

  9. Kellan says:

    Florian, Micah, and Pete: I hear you guys. My point is about creating the space of possibility. If you think about the code (and standards) as defining the physics of a service, then whether or not flying is possible in this physics model is almost more important than whether anyone is yet flying. If it’s impossible it will never happen; if it’s possible it will inevitably happen. Architecture vs product design.

    That said, there are tons of tools for doing backups; check in the App Garden or the help forums for one that fits your exact needs. Additionally, there are 3rd party services built on the API, like Backupify, that do something similar.

  10. Julien says:

    Actually, even better than the custom API that you guys have: I’d love it if you supported, say, PubSubHubbub! It would be a breeze for me to build a small subscribing app (or wait for Backupify to subscribe to the feed) to get a perfect permanent sync as soon as I update a photo!

  11. Aaron Stone says:

    I really appreciate that Flickr takes its APIs seriously. Having used the Twitter APIs (all four completely different APIs, is it now?), Flickr’s great to come back to.

    And I hate to say this to everyone commenting, looking for a packaged Flickr backup solution, but here: http://lmgtfy.com/?q=flickr+backup

  12. Florian says:

    I am aware that there are several solutions that let you back up your photos. I checked in the App Garden and even on Google before posting my comment (gasp).

    Too bad that there is no online Flickr backup solution that lets you mirror your complete photostream (including all comments etc.) locally or on a server.

  13. Pete Ashton says:

    @Kellan Belated thanks for the reply. I hadn’t realised the App Garden had progressed so much. The last time I tried an API-driven backup it was a nightmare, but that was years ago. My bad!

  14. [...] that Facebook’s competitors adhere to them.  Flickr’s public API lets users remove every bit of information they make public, and Twitter offers similar functionality.  (Google Buzz failed [...]

  15. John Cowan says:

    As an ex-Googler who formerly worked on the Google Data APIs, I’ll point out that they provide “everything in, everything out” as well, although some Google apps unfortunately don’t have APIs.

  16. Josh says:

    All right. So I don’t know who you are. I found this post on an autosubscribed RSS feed through NNW, called ongong. Don’t know if that’s you, always, or not. I have a couple of things to say, in my typical longwinded way.

    First, tell me who you are. This post hits right on the target that I have been deeply concerned with the last few months. There are so many companies out there currently whose sole purpose for existing is to mine as much information about me as possible, for the purpose of selling it to the highest bidder. The consumer of their services, me, has little or no effective control of what that data is, and how it’s used. This is unacceptable.

    There are three companies that I am concerned with currently. Facebook, Google, and Global Rainmakers. The reasons vary, but are all privacy and liberty based.

    Facebook’s inability to build and maintain an acceptable privacy policy, and their requirement for data that is not needed for the purpose of providing the service they offer, is well known at this point. My concerns are shared by many. My biggest concern is the feeling that they don’t care what I think about the matter.

    Google’s data trove on everyone and everything is scary enough, but add in their recent rethinking of the “Don’t be evil” policy, and the inability of anyone to affect the data that is held about them, and it is doubly so. The CEO’s recent comments regarding the expected necessity for teens to change their names on reaching their majority, in order to avoid association with parts of that data trove, are a huge red flag for me.

    And GRI. This dude is just plain evil. They are deliberately attempting to bring the Minority Report world to life, minus the precogs. And their CDO’s “warning” that if you are a person who would opt out of their system, you will be red flagged as if you were the worst of criminal offenders is both offensive and indicative of why these kinds of decisions should not be left to the corporate world. As one previous poster already stated, bait and switch tactics are just wrong.

    One other major concern I have with all of these companies is that they are aware that they are targeting minors, and that the information gathered will cause those kids trouble later. And they seem to think that is not their problem.

    So, again, I didn’t know who you were, prior to reading this article. But I do now. And I’ll be paying attention to your ideas. I like the way you think, in this case. And I’m frankly a lot more comfortable using Flickr, at this point.

    Josh

  17. Quora says:

    Why do companies like Foursquare/Plancast/Yelp and other consumer-data companies have API’s?…

     Would Foursquare still post check-ins to this generic database of  > structured data? We wouldn’t do this automatically because check-in data is private and we wouldn’t post it to a public location without a user asking us to do it.  It would ce…

  18. Bertil Hatt says:

    Sorry to ask, but: what data does Google have on you that you can’t get back? Have you checked Google Dashboard?

    Plus: PubSubHubbub is a good idea too. It should protect you from the risk of an “API bank run” (servers crashing because all your users are getting their information back at once, fearing that the servers might crash because of the announcement of a run).

  19. [...] I first started sketching out the “Minimal competence” blog post in my head, I imagined it as part of a series. The series had three real [...]