Blog posts tagged "nytimes"

NYTimes Article Search API

February 7th, 2009

NYTimes: Sex & Scandal since 1981

I don’t have much to add that the New York Times hasn’t already said about their Article Search API. It’s an amazing corpus to have searchable, in breadth, in scope, and in the sheer richness of its classification. I can’t think of a remotely comparable dataset with such a rich API.

A couple of things I noticed that I wanted to call out.

Get info about an article/Search by URL

Positioned as a search API, it also doubles as a “getInfo”-style API, as article URL is one of the searchable fields.

?query=url:$article_url

Just make sure to remove the various query string bits that the Times appends, as these aren’t indexed. Should make a “find the history of this topic being discussed” Greasemonkey script a snap.
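A rough sketch of that lookup in Python, in case it helps. The endpoint below is a stand-in I made up (the real base URL and API key parameter are in the Times docs), and `strip_query` and `url_lookup` are just illustrative names. The only parts taken from the API itself are the `query` parameter and the `url:` field.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Placeholder endpoint: an assumption for illustration, not the real base URL.
SEARCH_ENDPOINT = "https://api.example.invalid/article-search"


def strip_query(article_url: str) -> str:
    """Drop the query string and fragment, since those bits aren't indexed."""
    parts = urlsplit(article_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def url_lookup(article_url: str) -> str:
    """Build a request URL whose query is url:<cleaned article URL>."""
    query = "url:" + strip_query(article_url)
    # urlencode percent-escapes the colon and slashes inside the query value.
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Made-up article URL with the kind of tracking bits that get appended.
    print(url_lookup(
        "http://www.nytimes.com/2009/02/07/example.html?ref=technology&pagewanted=all"
    ))
```

Stripping the fragment as well as the query string is my own choice here; the point is just to get back to the bare article URL before searching.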

Expert’s attention information

One of my less comprehensible requests to the NYTimes developer team at OSCON last year was to make sure their APIs exposed the “attention information of [their] editors.” The age of the amateur, citizen journalism, and radical decentralization are all awesome, but the NYTimes editors’ job is to think full time about what is important and interesting, and that’s information worth mining.

And they did!

The page_facet and nytd_section_facet both allow you to gauge some degree of the relative weight given to a story. (section_page_facet seems like it ought to do the same thing, but I couldn’t get it to work.)

?query=flickr nytd_section_facet:[Front Page]

Gives you articles mentioning “flickr” featured on the NYTimes front page (of which it only finds 3, alas).
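If you’d rather build that query from code than by hand, something like this should do it. Same caveats as any quick sketch: the endpoint is a placeholder of my own, and `facet_query` is a hypothetical helper. The `facet:[value]` syntax and the facet name come straight from the example above.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real base URL.
SEARCH_ENDPOINT = "https://api.example.invalid/article-search"


def facet_query(terms: str, facet: str, value: str) -> str:
    """Combine free-text terms with one facet filter, e.g. facet:[value]."""
    query = f"{terms} {facet}:[{value}]"
    # urlencode escapes the spaces, colon, and square brackets.
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(facet_query("flickr", "nytd_section_facet", "Front Page"))
```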

API Design

Good stuff:

  • Clean hackable URLs, you can play with it in your browser and see what you’re going to get.
  • The getList + extras (called fields in the NYTimes API) is the house wisdom at Flickr, and I’m glad to see it elsewhere
  • The parsed tokens block is neat, and I can see it being incredibly useful for working with such a large, varied corpus
  • The sheer amount of searchable/indexable metadata, and its granularity, is really unprecedented. Great to see them go out with such a rich, “here’s the data, do something great” approach.

Visualizations

The graphic at the top of this blog post is a “visualization of the frequency of occurrence of the words ‘sex’ and ‘scandal’ in the New York Times, since 1981,” part of a set of visualizations by blprnt_van built with the Article Search API and Processing.

NYTimes on Friendster

October 15th, 2006

Reading the NYTimes Friendster post mortem: it’s 4 pages long, but the following paragraph jumped out at me as the most important.

Many people working at Friendster sneered at MySpace. The holy grail at Friendster — and the cause of most of its technical problems — was its closed system: users at Friendster could view only the profiles of those on a relatively short chain of acquaintances. By contrast, MySpace was open, and therefore much simpler from a technological standpoint; anybody could look at anyone else’s profile.

In less public spaces than the NYTimes business pages you hear a lot of gossip about Friendster, mostly the personality clashes and rock star egos, but I think it really was that simple.

A failure to do the simplest possible thing that will work, and a failure to be public by default (though going to war with your users is never a winner). It hurts an engineer’s soul, but worse really is better (thank goodness I’m not an engineer, just a failed lit major).

Tagged: Uncategorized