Blog posts tagged "news"

Fire at Beijing CCTV tower complex

February 9th, 2009
asc: it seems to me that we need to set up a magic email address for “things on fire”

Fire at Beijing CCTV tower complex

From an amazing set by Ai de ke.

NYTimes Article Search API

February 7th, 2009

NYTimes: Sex & Scandal since 1981

I don’t have much to add that the New York Times hasn’t already said about their Article Search API. It’s an amazing corpus to be searchable, in breadth, in scope, and in the sheer richness of the classification. I can’t think of a remotely comparable dataset with such a rich API.

Couple of things I noticed that I wanted to call out.

Get info about an article/Search by URL

Positioned as a search API, it also doubles as a “getInfo”-style API, as article URL is one of the searchable fields.


Just make sure to remove the various query string bits that the Times appends, as these aren’t indexed. Should make a “find the history of this topic being discussed” Greasemonkey script a snap.
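A minimal sketch of that clean-up step, assuming nothing about the API beyond what is said above: drop the query string (and any fragment) before using the article URL as a search term. The example URL and the helper name strip_query are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    # Hypothetical article URL with tracking-style parameters appended.
    example = "http://www.nytimes.com/2009/02/07/example/article.html?ref=technology&pagewanted=all"
    print(strip_query(example))
```

The cleaned URL is what you would then hand to the search as the article URL value.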

Expert’s attention information

One of my less comprehensible requests to the NYTimes developer team at OSCON last year was to make sure their APIs exposed the “attention information of [their] editors.” Age of amateur, citizen journalism, and radical decentralization are all awesome, but the NYTimes editors’ job is to think about what is important and interesting full time, and that’s information worth mining.

And they did!

The page_facet and nytd_section_facet both allow you to gauge some degree of relative weight given to a story. (section_page_facet seems like it ought to do the same thing, but I couldn’t get it to work.)

?query=flickr nytd_section_facet:[Front Page]

Gives you articles mentioning “flickr” featured on the NYTimes front page. (of which it only finds 3, alas)
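For playing with queries like this outside the browser, here is a rough sketch of building the URL-encoded request. Only the query parameter and the facet:[value] syntax come from the example above; BASE_URL is a placeholder, and the api-key parameter name and the build_search_url helper are assumptions for illustration, so check the API documentation for the real values.

```python
from urllib.parse import quote, urlencode

# Placeholder endpoint: an assumption, not the real API address.
BASE_URL = "https://api.example.invalid/article-search"


def build_search_url(keywords, facets=None, api_key="YOUR-API-KEY"):
    """Build a search URL from keywords plus optional facet filters."""
    terms = [keywords]
    for name, value in (facets or {}).items():
        # Facet filters use the name:[value] form shown in the post.
        terms.append(f"{name}:[{value}]")
    params = {"query": " ".join(terms), "api-key": api_key}  # "api-key" is assumed
    # quote_via=quote encodes spaces as %20 instead of "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("flickr", {"nytd_section_facet": "Front Page"}))
```

Keeping the facets in a dict makes it easy to swap nytd_section_facet for page_facet and compare the results.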

API Design

Good stuff:

  • Clean hackable URLs, you can play with it in your browser and see what you’re going to get.
  • The getList + extras (called fields in the NYTimes API) is the house wisdom at Flickr, and I’m glad to see it elsewhere
  • The parsed tokens block is neat, and I can see it being incredibly useful for working with such a large, varied corpus
  • The sheer amount of searchable/indexable metadata and the granularity is really unprecedented; great to see them go out with such a rich, “here’s the data, do something great” approach.


The graphic at the top of this blog post is a “visualization of the frequency of occurrence of the words ‘sex’ and ‘scandal’ in the New York Times, since 1981,” part of a set of visualizations by blprnt_van built with the Article Search API and Processing.


December 8th, 2008

public anger

A griot (pronounced /gɹi.əʊ/ in English or [ɡʁi.o] in French, with a silent t) or jeli (djeli or djéli in French spelling) is a West African poet, praise singer, and wandering musician, considered a repository of oral tradition. (Wikipedia)

Also an emerging tag for describing the ongoing protests in Athens over a 16-year-old being shot to death at point-blank range by Athens policemen.

Being used on Flickr, blogs, and Twitter and the meta Not being used by the corporate media (aside: the trailing ‘s’ is lexically significant, classic stemming does not work on tags)

Does anyone know how and where this tag emerged?

Clearly the next evolution in participatory media (and the only type with a future) is figuring out what the tools are to discover, distribute, and broadcast these meta-media collaborative objects. Who is thinking and writing about this?

Photo by murplejane

Rich Skrenta: What do you do when your success … sucks?

April 3rd, 2007

Stone Cottage pointed to a great post by Rich Skrenta, CEO of Topix (and mad mind behind NewHoo/DMOZ for those who can remember back that far) on the Topix re-launch.

Lots of really interesting stuff about identifying a brand’s core value and putting it into practice. But also a phenomenal laundry list for a problem that has stumped a lot of us: how to make a local news site succeed. Including:

  • Anthropomorphize our existing technology into the roboblogger. This was a brilliant idea from one of our lead engineers. It simultaneously solves three problems:

    1. Booting up a new city — you need posting activity to draw the first editors. The roboblogger would give us that. But he is shy and gets out of the way if humans show up and take over a page.
    2. If the community editors go on vacation, the roboblogger can step back in and take over while they’re gone.
    3. People know when a robot is editing the page vs. a human. His profile icon is a picture of a little tin-can robot. His handle is ‘roboblogger’.
    No more confusion.

A project has to already have value to draw a valuable volunteer base; this is the classic and yet fundamentally hard problem with bootstrapping all local sites. But as soon as you have volunteers, your contract with them is to rain attention and love down upon their contributions. roboblogger is a really neat hack to handle the delicate balance of a site’s lifecycle, and to mix community and data mining techniques in social software. Looking forward to watching it play out.
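The hand-off described in the quoted list can be sketched as a single decision: the robot posts when a page has no human editors yet, or when they have gone quiet, and stays out of the way otherwise. This is purely an illustration and not Topix’s implementation; the function name and the three-day idle threshold are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: how long humans may be silent before the robot steps back in.
HUMAN_IDLE_LIMIT = timedelta(days=3)


def robot_should_post(
    last_human_post: Optional[datetime],
    now: datetime,
    idle_limit: timedelta = HUMAN_IDLE_LIMIT,
) -> bool:
    """Decide whether the robot editor should post to a city page."""
    if last_human_post is None:
        # Brand new city: no human has posted yet, so the robot boots the page.
        return True
    # Humans exist: only step back in if they have been away longer than the limit.
    return now - last_human_post > idle_limit


if __name__ == "__main__":
    now = datetime(2007, 4, 3)
    print(robot_should_post(None, now))                   # new city
    print(robot_should_post(datetime(2007, 4, 2), now))   # active human editors
    print(robot_should_post(datetime(2007, 3, 1), now))   # editors on vacation
```

The third point in the list, making it obvious that a robot is editing, is a presentation matter (the icon and handle) and is not modeled here.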

Yahoo and A bit of speculation

December 12th, 2005

So the Yahoo acquisition of hit every tech blog on the planet this weekend, and hardly needs more rehashing. But a couple of ideas I haven’t seen elsewhere from one of my mailing lists.

It was pointed out that

[Yahoo] recently hired all the IBM people that worked at the WebFountain project.

And that the database of tagged websites would be an awfully juicy source of data to start analyzing. Yahoo is the obvious player to build post-search interfaces, browsable and discoverable like Yahoo of old, but this time built to Web-scale.

Meanwhile, is anyone watching Flock’s future? What with its APIs to Yahoo’s Flickr, Yahoo’s, and integrated editor for all those new MT blogs. Just a thought.

Gmail: Web Clippings

December 9th, 2005

Gmail’s new web clipping feature is nice. Decent if sparse selection of feeds (doesn’t feel like a straight pay for placement deal), good interface for adding, selecting.

Minor detail: adding new feeds to the system seems to be broken. (Google is really cutting themselves slack these days about rolling out half-finished features, aren’t they?)

Best possible feature? No text to unbold! My days are already spent unbolding text, or feeling guilty about the ridiculously large bold (now turning red, now blinking) number which signifies my news folders’ backlog.

Much nicer. Obviously not a replacement, but as an addition to my existing aggregator, it’s great.

On a related note, what are people subscribed to for their news these days?

update: Well, that was short-lived. Should have seen it coming, but would it be too much to add just one feature not targeted at selling more ads? I’m pretty sure “Catch More Bass” wasn’t listed in any of my feeds. Siiiigh.

Tagged: Uncategorized

CommonTimes: Social Bookmarking as Open Editing

September 26th, 2005

is perhaps the beating heart of my web these days, not because I find bookmarks so useful, but because it’s useful to have a generic service for streaming links. But generic only gets you so far; as an engine for discovery it can be painful, flipping through pages and pages of chronologically sorted results. It’s comparable to the difference between Google’s search, with its largely generic listing of pages, and Google News, which uses its domain knowledge to chunk, categorize, and summarize the day’s news.


This is what CommonTimes is about. A project Jeff, Brian, and others launched a few months ago; it iterates on the successful model of to provide news-centric “open editing” for the web. A vertical social bookmarking site, with a light-touch editorial process to keep the site on topic.

The Web Needs Editors!

CT provides most of what you’d expect, tags, groups, bookmarklets, heat maps, RESTful APIs and some nice touches like an “Add from Bloglines” Greasemonkey script, and an adapted version of the del AJAX browser.

Perhaps more importantly, CT points forward to a strategy (among many) for dealing with the ever-expanding problem of information overload: “smarter clients.” (Do I sound like a Microsoftie?) One approach is the AI-inspired, strong editor approach of a tech.memeorandum.

But personally my bets are on the “many editors makes categorization easy” technique that has got to be the year’s surprise success story, combined with tools which take advantage of available metadata, either through inference or explicit scoping.

Now that the idea is out there I’m surprised that there aren’t dozens of these vertical bookmarking sites.

Scaling Down, Scaling Up

Of course, social sites do rely on having a community, and therein perhaps lies the key challenge to building a site like CommonTimes. Thankfully there are solutions. Like breaking out of our silos, becoming a consumer as well as a producer of web services. I want to tell CT about how to fish in my link stream (e.g. subscribe to, and then remix with its own services).

Finally a link to get you started: Ten Ways to use CommonTimes

Custom News Clipping with Magpie

February 27th, 2003

Laze talks about using gnews2rss to extract a custom RSS feed from Google News, and then parsing and re-displaying it with Magpie. A home-rolled news clipping service, like this one on progeria.
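For a feel of the parse-and-re-display half of that pipeline, here is a small sketch. It is not Magpie (which is a PHP library) and not gnews2rss; it uses Python’s standard library as a stand-in, and the sample feed, function names, and example.com links are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up RSS 2.0 feed standing in for the output of a news-search-to-RSS tool.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample news search</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def render_html(items):
    """Re-display the items as a simple HTML list of links."""
    rows = [
        f'<li><a href="{escape(link, quote=True)}">{escape(title)}</a></li>'
        for title, link in items
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(render_html(parse_items(SAMPLE_RSS)))
```

In a real clipping page the feed text would be fetched from the generated RSS URL rather than embedded in the script.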

Tagged: Uncategorized

BBC Miffed by Google News

September 24th, 2002

Some experts have questioned the value of the [Google News] service, arguing it fails to rank news reports on the basis of quality.

Google News seems to be an overview of the daily news much like Yahoo News, but it works totally differently. Rather than a hand-selected list of appropriate news stories, Google draws from a wide list of news sources (much wider than would normally be acceptable to a media company), and uses its indexing technologies both to list the hottest, most relevant breaking news and to find related background material and opposing views.

This has the BBC notably alarmed. It could simply be that they don’t rank high enough in the results, but I think it’s more than that. This service breaks down the monopoly on viewpoint. It allows a wide range of interested parties (like SF IMC) to have a voice, and by its very nature undermines the conscious and unconscious censorship which lies at the very heart of the media-making, consensus-manufacturing machine.

Amusingly, the BBC provides an example of exactly why we need this service: their article is full of innuendo and unnamed experts, an attack piece masquerading as unbiased journalism.

“some experts have questioned…some critics have been less…senior journalists point out….”
Odd how only the Google spokesperson is named? Who are these critics?
“Furthermore, it ranks stories according to the most recent, rather than the best, report.”
Perhaps it would be better if Google allowed the BBC editorial staff to decide which stories to feature?

Tagged: Uncategorized