Blog posts tagged "data"

  • April 17, 2009

    #2 Every Building with a Shoebox in its Basement.

    “Buildings could offer WiFi photo uploading service, in return for keeping the photos taken of them… what if Cloudgate were built with servers and wireless inside, right from the start, offering to consume the photos taken of it. You take a shot with a wireless enabled camera and it could store a copy for you. It’s building up a library of itself, in all seasons, in all weather. Meanwhile you have a backup, findable by time and browsing, stored safely in the Cloud!”


Is a Firehose of Snowflakes a Nor’easter?

March 4th, 2009

I tried explaining the title of this blog post to Jasmine this morning. Suffice to say my explanation needed a bit of practice. And more than 140 characters. Or it might just be I’m a bit stir crazy from Winter returning with a vengeance in these here parts. But I wanted to call out a couple of points that might have gotten overshadowed in the good Reverend’s recent post on the Flickr Panda APIs.

NewsWire API


The NY Times at their great Times Open event announced their Newswire API, which is a real time stream of their content. Stories, and blog posts, and what not. More interesting was their discussion about how they’ve built a backend “pinging service” that makes it easy for them to add new types of data to their stream. I’m enough of a dork that a Grey Lady firehose sounds pretty awesome.

But they got some flak for it being a snowflake API. From where I sit snowflake APIs look like opening up your data as fast as possible, by any means necessary, and trying not to pre-judge how people will use it, but I’m thankful for the metaphor, as it allowed me to spend the morning envisioning fire hoses of snowflakes.

Still, I spent 2007 and 2008 talking about how XMPP was going to be a key piece of building firehoses, standardizing and enabling the real time Web, so it’s a criticism I’m sensitive to. (And I’ve already been skipping conferences in 2009 in the hopes of actually having some time to build it, though thankfully minor details like time haven’t stopped my colleagues at Fire Eagle from launching theirs.)

Pandas

Flickr Panda!

Which is all apropos of saying, we launched our own “snowflake” realtime API yesterday. (Though actually it’s just a slight modification of our standard photo response format.) And it’s Panda-shaped. And it is awesome.

Near Real-Time, Every Minute, up to 120 Events

But because the documentation is quirky, I think people missed the significance. These are Flickr real time data APIs.

We’re building streams of photos in real time. Examining the huge stream of data events that happen on Flickr, the social activity, the searching, the meta-data creation, and fishing from that stream to build 3 real time streams. We’re then exposing those streams via a near real time polling based API.

The API pattern is specifically structured around making it easy to call from client side scripting, and the data streams are structured around discovery rather than guided search, but we’re pushing up to 120 discovered photos down these streams each minute, every minute. Two streams of real-time interestingness, and 1 of lightly interestingnessed geotagged photos.

And they’re named after famous pandas. Really what more do you want?
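For the client-side curious, here is a rough sketch of what consuming a near real time polling stream like this might look like. It is not Flickr’s code: `poll_stream` and the `fetch` callable are names I made up, and the only things taken from the post are the once-a-minute cadence and the idea that each poll returns a batch of photos. It asks once per interval and skips anything it has already seen.

```python
import time
from typing import Callable, Dict, Iterator, List

Photo = Dict[str, str]


def poll_stream(
    fetch: Callable[[], List[Photo]],
    rounds: int,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Photo]:
    """Yield each photo once, calling `fetch` once per interval.

    `fetch` is a hypothetical stand-in for whatever actually calls the API;
    it only has to return a list of dicts that carry an "id" key.
    """
    seen = set()
    for i in range(rounds):
        for photo in fetch():
            if photo["id"] not in seen:  # consecutive polls may overlap
                seen.add(photo["id"])
                yield photo
        if i < rounds - 1:
            sleep(interval_seconds)  # no point asking faster than the stream refreshes


# Canned batches instead of a network call, and a no-op sleep so this runs instantly.
batches = iter([[{"id": "1"}, {"id": "2"}], [{"id": "2"}, {"id": "3"}]])
for photo in poll_stream(lambda: next(batches), rounds=2, sleep=lambda _: None):
    print(photo["id"])
```

Passing `fetch` and `sleep` in as arguments keeps the loop testable without touching the network or waiting a real minute.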

Whither XMPP

So what’s up with the blossoming real time data APIs? And where is our promised standardization? They’re coming. There has always been a tricky chicken and egg problem. There is so little data out there that is appropriate to expose in a real time fashion, that there is little demand to consume it, so the tools fail to evolve. But I’m seeing tons of work: great toolkits like Fire Hydrant from Fire Eagle and Babylon from notifixio.us, and Google’s decision to make XMPP a standard part of their AppEngine toolkit, are just the things I’ve been most excited about recently.

NYTimes Article Search API

February 7th, 2009

NYTimes: Sex & Scandal since 1981

I don’t have much to add that the New York Times hasn’t already said about their Article Search API. It’s an amazing corpus to have searchable, both in breadth and scope, and for the sheer richness of the classification. I can’t think of a remotely comparable dataset with such a rich API.

Couple of things I noticed that I wanted to call out.

Get info about an article/Search by URL

Positioned as a search API, it also doubles as a “getInfo”-style API, as article URL is one of the searchable fields.

?query=url:$article_url

Just make sure to remove the various query string bits that the Times appends, as these aren’t indexed. Should make a “find the history of this topic being discussed” Greasemonkey script a snap.
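A minimal sketch of that cleanup step, assuming the bits to drop are simply everything after the `?` (plus any `#fragment`). The helper names and the example URL with its `ref` and `emc` parameters are made up for illustration, and percent-encoding the value is my assumption about what the API wants, not something the docs told me.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def strip_query(article_url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(article_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def url_lookup_query(article_url: str) -> str:
    """Build the `?query=url:...` string for a cleaned-up article URL."""
    return "?" + urlencode({"query": "url:" + strip_query(article_url)})


# Hypothetical article URL carrying tracking-style parameters.
messy = "http://www.example.com/2009/02/07/story.html?ref=technology&emc=rss"
print(strip_query(messy))
print(url_lookup_query(messy))
```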

Expert’s attention information

One of my less comprehensible requests to the NYTimes developer team at OSCON last year was to make sure their APIs exposed the “attention information of [their] editors.” Age of amateur, citizen journalism, and radical decentralization are all awesome, but the NYTimes editors’ job is to think about what is important and interesting full time; and that’s information worth mining.

And they did!

The page_facet, and nytd_section_facet both allow you to gauge some degree of relative weight given to a story. (section_page_facet seems like it ought to do the same thing, but I couldn’t get it to work)

?query=flickr nytd_section_facet:[Front Page]

Gives you articles mentioning “flickr” featured on the NYTimes front page. (of which it only finds 3, alas)
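If you want to generate these by hand, a tiny sketch follows. `facet_query` is a made-up helper, the `facet:[value]` shape is copied from the example above, and the URL-encoding is my assumption about what a client should send.

```python
from urllib.parse import urlencode


def facet_query(keywords: str, **facets: str) -> str:
    """Combine free-text keywords with `facet_name:[value]` filters."""
    terms = [keywords] + [f"{name}:[{value}]" for name, value in facets.items()]
    return "?" + urlencode({"query": " ".join(terms)})


# Same search as above, with spaces, colon and brackets percent-encoded.
print(facet_query("flickr", nytd_section_facet="Front Page"))
```

Swapping in `page_facet` should only mean changing the keyword argument, though I haven’t checked which values it accepts.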

API Design

Good stuff:

  • Clean hackable URLs, you can play with it in your browser and see what you’re going to get.
  • The getList + extras (called fields in the NYTimes API) is the house wisdom at Flickr, and I’m glad to see it elsewhere
  • The parsed tokens block is neat, and I can see it being incredibly useful for working with such a large, varied corpus
  • The sheer amount of searchable/indexable metadata and the granularity is really unprecedented, great to see them go out with such a rich, “here’s the data do something great” approach.

Visualizations

The graphic at the top of this blog post is a “visualization of the frequency of occurrence of the words ’sex’ and ’scandal’ in the New York Times, since 1981.”, part of a set of visualizations by blprnt_van built with the article search API, and Processing.


Fire Eagle: Interesting Choices

March 5th, 2008

Fire Eagle

Other folks are talking about and writing about the long germinating, launched in beta, location broker from Yahoo’s Brickhouse, Fire Eagle.

I wanted to call out just a couple of the cool, and non-intuitive decisions they made.

Is NOT a consumer brand

Fire Eagle is a service for building and sharing location data. It’s the applications built on top of it that you’ll interact with, unless you’re building stuff.

Fire Eagle does NOT manage the social graph

It’s a service for sharing your data with friends (or services, or your toaster), but it doesn’t know who your friends are. The social graph has been outsourced. Best example of a small piece loosely joined I’ve seen in a long time.

Cares about privacy and ease of use

Ninja privacy is built in. But you don’t have to care. The TOS requires developers to discuss how the data is used. And privacy levels are front and center. And from day one data is delete-able, and in fact data is flushed on a regular basis.

Built on OAuth

Yay!

Notes from Social Graph Foo

February 4th, 2008

Here is my quick dump of the notebook, probably useful to no one but me. Names mostly removed to protect the guilty.

I think “Social Graph” is kind of a dumb phrase to apply to the back question of relationships. I promptly re-dubbed the event “Social Foo” and thereby found interesting things to talk about. Kevin Marks proposed “social cloud”, clouds hide details. (operations people get hives when you talk about clouds)

XMPP, OpenID, OAuth are all going to be huge in 2008; DiSo, DataPortability, and Social Graph API aren’t as clear winners to me.

“Bowling Alone misses the point. There has been a transformative change from groups to networks. Groups are just a funny form of network.”

“Differentiated role networks”. Differentiated roles, and the failure of monolithic identity and friending, were among the things I went to Sebastopol to talk about this weekend. The people who got it got it, and everyone else wasn’t interested in the hard squishy details of real community. I think this might be the side effect of running social software for social software’s sake vs social software as bath for social media object sharing.

“Relationships can be broken down into 5 types: emotional aid, sociality, major help, minor help, and $$$”

Note to self: try block modeling interactions in high profile/high turn Flickr groups. (central, utata, etc)

No one really understands user expectations. Privacy expectation is currently, “unstable”.

Huge conceptual issues with the difference between public information hand aggregated, and public information computer aggregated. Cognitive dissonance ensues.

Rules, games, and rulesets. Modeling of social software as games. Tension of implicit vs. explicit rules. Ma.gnolia’s altruism game derived from the cracks board (witnessing altruistic acts is a public good, way to update the Mag rules of game to support this?), Satisfaction’s status update game. Hoping Teresa can bring the quality gaming to BoingBoing’s anemic community. Social games + advertising.

Parody/pastiche as lit analysis. Investigate for web.

Social networks need NPCs. e.g. the Instructables Robot.

Standards work should be done in small groups, with a clear need, that selectively grow the list of participants. No hierarchy of early/late joiners (aka OAuth did it right)

“Everything public” bores me.

Beyond LAMP.

Find a feed for Nathan Eagle’s research.

“locations rights management”

“trusts are largely not transitive”

Language communities are “small world networks”, partitions communities by language. 2-5 hops vs 8 in analyzed network.

The Plaxo way: “We gets ze data Lebowski”

“Twitter is my early warning system. My blood pressure has gone down over the last 18 months”

Identity and sharing can make everyone warm and fuzzy, but also came face to face with sobering consequences that kept me up at night with a bottle of tequila. Re-thinking proposed Flickr features.

Flickr: A Place of Our Own

December 10th, 2007

You might have seen the post on the Flickr blog announcing Places, or maybe the Good Reverend’s write up, but if you haven’t:

Places is a new Flickr feature that mines our corpus of geotagged photos, identifies characteristic features on a per location basis, and then goes back into the data looking for “iconic” beautiful photos. (btw try reloading that /places page, the feature places are random. As to a certain degree are the photos on the individual Places pages themselves)

It is also where a good chunk of my creative energy went for the last few months, which is why the blog has been so quiet. And it’s a hell of a lot of fun, not to mention a privilege and pleasure to deep dive into our database and be reminded just how much fabulous photography there is on Flickr, and maybe just barely fumble around the edges of surfacing the diverse communities’ shared vision. Eyes of the world indeed.

A Place for GeoRSS feeds

Dan roped me in on Places months ago. We had geoFeeds working for semi-arbitrary places, and we needed a page to hang them off of. That page looked a lot like a search result. You never saw it because the Flickr project management process (a blog post of its own) left that particular prototype a bloody, heaving wreck. Don’t worry, the current version is much much much better. (of course you also never saw Dan’s brilliant prototype of the current version, which was too cool to release on an unsuspecting public) And voila, many months later, the feeds are there. (though I’d still like to bring back that SRP view to allow rich searching within a location)

Increased Surface Area

We brought a bunch of different design goals to Places, but one of my obsessions that I think we nailed was the idea of “increasing the surface area” of Flickr. (also known as providing new ways to level up in the Game of Flickr[tm]). Only a few people, and a limited range of styles will ever be featured on the Flickr Explore pages. Which is fine, most people don’t care. But Places provides another way to recognize the contributions of Flickr members, by highlighting their geotagging and their photography skills. I’m looking forward to adding a couple more similar features to Places, recognizing other Flickr Games one can level up in, and other contributions back to the commons you can make.

Mo’ Betta

A bunch of stuff didn’t make our initial launch. Some of that has come in since then. More will be coming. I’m particularly excited about adding some new data sources to improve the page. (e.g. the Groups are right now a bit weak, and we don’t have reliable neighborhoods in cities, both of which are in the process of being fixed)

That’s kH8dLOubBZRvX_YZ to You

Turns out there are a lot of San Franciscos in the world, and we personally struggle to keep track of which one is which. So we’ve been experimenting with giving them unique place_ids. If you look really close you’ll start to see these popping up around Flickr, in photos.getInfo, photos.search, and as microformats on the Places pages. It’s all very experimental, this unique identifiers thing, but we think it might work.
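Once you have a place_id in hand, the obvious thing to try is feeding it back into a search. A hedged sketch of building such a request: the endpoint and the `place_id`, `format` and `nojsoncallback` arguments are how I understand the public Flickr API to work, not something spelled out in this post, and `places_search_url` and `YOUR_API_KEY` are placeholders of mine.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for the public Flickr API.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def places_search_url(place_id: str, api_key: str) -> str:
    """Build a photos.search request restricted to one unambiguous place."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,           # placeholder, supply your own key
        "place_id": place_id,         # the opaque identifier, not a place name
        "format": "json",
        "nojsoncallback": "1",
    }
    return REST_ENDPOINT + "?" + urlencode(params)


# The id from the heading above; this only builds the URL, it does not call the API.
print(places_search_url("kH8dLOubBZRvX_YZ", "YOUR_API_KEY"))
```

The point of the opaque id is that the request never has to say “San Francisco” and hope the right one comes back.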

Arm Chair Travel

And because I love you, I’m going to let you in on a secret. Have a great trip.

Just beyond the door

Personal Data Stores and the Network

October 31st, 2007

Thinking about what “personal data stores” are going to look like, how this interacts with decentralized models for community services, (I swear I’ve written something more recent than 2005 on that topic, but can’t find it), mulling models for updating clouds, wondering if projects like G’s OpenSocial, and Portable Social Networks are a step forward or back, speculating that digital curation is a viable near future business model, and that individual curations would work well as shareable social media objects.

Nothing necessarily novel. Just where my head is at.

*Lots* of New Semi-Structured Data

June 21st, 2006

Showed up at Microformats party last night and promptly fell asleep in my drink (long day), but this is the real action. Andy and Y!Local have rolled out hCard, hCal, and hReview to all of Yahoo Local.

And Gordon has whipped up the first microformat -> to Y! bridge with Greasemonkey.

Nice.


Usability and Stockholm Syndrome

February 17th, 2006

Going home, and working with my grandmother on her computer is always an eye opening experience. I think it’s the only time I get any real insight into how computers should actually work, or how much time I spend working for my computer versus my computer working for me. Trite I’m sure, but I’m floored every time I do it, and floored again when I think of the energy I spend justifying (largely to myself) how computers work.

This morning I realized how arbitrary the distinction between photos you’ve downloaded from your camera, and photos you’ve been emailed by friends is. (and by extension photos out in the ether) Why can’t you find them all in iPhoto?


Pandora and the Vector of Personalization

January 7th, 2006

I’m getting good stuff out of Pandora mixing Frontier Psychiatrist with Feel Good, Inc. What are you mixing?

Recommendation cocktails are the way to go.

Most recommendations are either one dimensional (if you liked X you’ll like Y), or, more often, assume that all our little quirks added up describe our one true nature (e.g. Amazon). In fact we’re more complex than that, described by a multitude of often unrelated vectors. Pandora lets you experiment with the dot products.
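To make the vector talk concrete, here is a toy sketch of a “cocktail”: average two seed vectors and rank a catalog by dot product against the blend. This is not how Pandora actually works. The feature vectors, names and helpers are all invented for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def blend(seeds: Sequence[Vector]) -> List[float]:
    """Average the seed vectors component by component."""
    return [sum(column) / len(seeds) for column in zip(*seeds)]


def rank(seeds: Sequence[Vector], catalog: Dict[str, Vector]) -> List[str]:
    """Order catalog entries by their dot product with the blended seeds."""
    cocktail = blend(seeds)
    return sorted(catalog, key=lambda name: dot(cocktail, catalog[name]), reverse=True)


# Two seeds that each care about a different (made-up) feature.
seeds = [[1.0, 0.0], [0.0, 1.0]]
catalog = {"x": [1.0, 0.0], "y": [1.0, 1.0], "z": [0.0, 0.0]}
print(rank(seeds, catalog))
```

The track that scores on both features beats the one that only matches a single seed, which is roughly the appeal of mixing stations.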

Thanks Rob for pointing out this feature, I had missed it the first time through, and had written off Pandora.


Knowledge Management, Blogging, and the CIA

August 14th, 2002

Knowledge management and blogging have always gone hand in hand, especially before personal publishing was a legitimate buzzword, and there briefly flourished (and died, I hope) the term “k-log”.

Jon Udell recently had nice things to say about Traction Software’s knowledge management solution, “best described as an enterprise Weblog system”.

He failed to mention it’s funded by the CIA.

And here I thought the $5000 price tag was the only problem.