Blog posts tagged "news"
I don’t have much to add that the New York Times hasn’t already said about their Article Search API. It’s an amazing corpus to have searchable, in breadth, in scope, and in the sheer richness of its classification. I can’t think of a remotely comparable dataset with such a rich API.
Couple of things I noticed that I wanted to call out.
Get info about an article/Search by URL
Positioned as a search API, it also doubles as a “getInfo”-style API, as article URL is one of the searchable fields.
Just make sure to remove the various query string bits that the Times appends, as these aren’t indexed. Should make a “find the history of this topic being discussed” Greasemonkey script a snap.
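A minimal sketch of that cleanup step, assuming all you need is to drop the query string (and any fragment) before handing the URL to the search. The example URL is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical article URL with tracking bits appended.
example = "http://www.nytimes.com/2009/02/05/example.html?ref=technology&partner=rss"
print(strip_query(example))
```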
Expert’s attention information
One of my less comprehensible requests to the NYTimes developer team at OSCON last year was to make sure their APIs exposed the “attention information of [their] editors.” The age of the amateur, citizen journalism, and radical decentralization are all awesome, but the NYTimes editors’ job is to think about what is important and interesting full time, and that’s information worth mining.
And they did!
nytd_section_facet both allow you to gauge some degree of relative weight given to a story. (section_page_facet seems like it ought to do the same thing, but I couldn’t get it to work.)
?query=flickr nytd_section_facet:[Front Page]
Gives you articles mentioning “flickr” featured on the NYTimes front page. (of which it only finds 3, alas)
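If you are building that query from code rather than typing it into the browser, the spaces, colon, and brackets need URL-encoding. A rough sketch, assuming only the `query` parameter shown above; `build_query` is a hypothetical helper, and the endpoint and API key are deliberately left out.

```python
from urllib.parse import urlencode


def build_query(term: str, facet: str, value: str) -> str:
    """Encode a search term plus one facet filter as a query string."""
    # Same shape as the hand-written example: term facet:[value]
    search = f"{term} {facet}:[{value}]"
    return urlencode({"query": search})


print(build_query("flickr", "nytd_section_facet", "Front Page"))
```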
- Clean hackable URLs, you can play with it in your browser and see what you’re going to get.
- The getList + extras (called fields in the NYTimes API) is the house wisdom at Flickr, and I’m glad to see it elsewhere
- The parsed tokens block is neat, and I can see it being incredibly useful for working with such a large, varied corpus
- The sheer amount of searchable/indexable metadata, and its granularity, is really unprecedented. Great to see them go out with such a rich, “here’s the data, do something great” approach.
The graphic at the top of this blog post is a “visualization of the frequency of occurrence of the words ‘sex’ and ‘scandal’ in the New York Times, since 1981,” part of a set of visualizations by blprnt_van built with the Article Search API and Processing.
A griot (pronounced /ɡɹi.oʊ/ in English or [ɡʁi.o] in French, with a silent t) or jeli (djeli or djéli in French spelling) is a West African poet, praise singer, and wandering musician, considered a repository of oral tradition. – Wikipedia
Also an emerging tag for describing the ongoing protest in Athens over a 16-year-old being shot to death at point-blank range by Athens policemen.
Does anyone know how and where this tag emerged?
Clearly the next evolution in participatory media (and the only type with a future) is figuring out what the tools are to discover, distribute, and broadcast these meta-media collaborative objects. Who is thinking and writing about this?
Photo by murplejane
May 2, 2008⇒ All salmon fishing banned on West Coast.
“So few salmon returned last fall that the fishery council was required under its management plan to halt fishing throughout the salmon habitat, which is all along the California and Oregon coasts.” (Aside environment, environmental collapse, fishing, food, news, salmon, vegetarian)
October 28, 2007⇒ Der West is set to launch tonight.
The “much-hyped, much-anticipated and much-delayed, very ‘Web 2.0’ regional newspaper portal.” Looks like some of the bigger ideas might have gone overboard in the attempt to launch, but from what I can piece together, not speaking German, it still looks like it could be very interesting. (Aside, Uncategorized corporate media, news)
Lots of really interesting stuff about identifying a brand’s core value and putting it into practice. But also a phenomenal laundry list for a problem that has stumped a lot of us: how to make a local news site succeed. Including:
Anthropomorphize our existing technology into the roboblogger. This was a brilliant idea from one of our lead engineers. It simultaneously solves three problems:
- Booting up a new city — you need posting activity to draw the first editors. The roboblogger would give us that. But he is shy and gets out of the way if humans show up and take over a page.
- If the community editors go on vacation, the roboblogger can step back in and take over while they’re gone.
- No more confusion. People know when a robot is editing the page vs. a human. His profile icon is a picture of a little tin-can robot. His handle is ‘roboblogger’.
A project has to already have value to draw a valuable volunteer base; this is the classic and yet fundamentally hard problem with bootstrapping all local sites. But as soon as you have volunteers, your contract with them is to rain attention and love down upon their contributions.
roboblogger is a really neat hack to handle the delicate balance of a site’s lifecycle and mix community and data mining techniques in social software. Looking forward to watching it play out.
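Purely as a thought experiment, the “shy robot” rule could be as small as the sketch below. Everything in it is my assumption, not how their system works: the function name, the idea of tracking the last human post per page, and the seven-day idle window are all invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def robot_should_post(
    last_human_post: Optional[datetime],
    now: datetime,
    idle_after: timedelta = timedelta(days=7),  # assumed threshold
) -> bool:
    """Decide whether the robot fills in for a page."""
    if last_human_post is None:
        # New city, no human editors yet: the robot seeds the page.
        return True
    # Humans have posted before: step in only if they have gone quiet.
    return now - last_human_post > idle_after


now = datetime(2008, 1, 15)
print(robot_should_post(None, now))
print(robot_should_post(datetime(2008, 1, 14), now))
print(robot_should_post(datetime(2007, 12, 1), now))
```

The first case covers booting up a new city, the second is the robot getting out of the way, and the third is the editors-on-vacation case.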
October 9, 2006⇒ twitters: Googlers are surprised.
Now that’s what we call p2p reporting. (Aside google, news, twitter)
September 28, 2006⇒ Bill Clinton: “All you need is ubuntu”.
But he isn’t talking about Linux. Huh? And what’s with the thong? (Aside africa, bbc, clinton, news, ubuntu)
January 30, 2006⇒ BBC: Iraqis ‘confirm’ bird flu death.
This is the route that H5N1 is going to take to enter the US. (Aside h5n1, iraq, news)
So the Yahoo acquisition of del.icio.us hit every tech blog on the planet this weekend, and hardly needs more rehashing. But a couple of ideas I haven’t seen elsewhere from one of my mailing lists.
It was pointed out that
And that the del.icio.us database of tagged websites would be an awfully juicy source of data to start analyzing. Yahoo is the obvious player to build post-search interfaces, browsable and discoverable like Yahoo of old, but this time built to Web-scale.
Gmail’s new web clipping feature is nice. Decent if sparse selection of feeds (doesn’t feel like a straight pay for placement deal), good interface for adding, selecting.
Minor detail: adding new feeds to the system seems to be broken. (Google is really cutting themselves slack these days about rolling out half-finished features, aren’t they?)
Best possible feature? No text to unbold! My days are already spent unbolding text, or feeling guilty about the ridiculously large bold (now turning red, now blinking) number which signifies my news folders’ backlog.
Much nicer. Obviously not a replacement, but as an addition to my existing aggregator, it’s great.
On a related note, what are people subscribed to for their news these days?
Update: Well, that was short-lived. Should have seen it coming, but would it be too much to add just one feature not targeted at selling more ads? I’m pretty sure “Catch More Bass” wasn’t listed in any of my feeds. Siiiigh.
October 12, 2005⇒ More Bikes Than Cars Sold in US This Year.
Nice! (and interesting Boston area blog). (Aside bikes, blog, boston, news, oil, st)
October 5, 2005⇒ CommonTimes Integrates with del.icio.us and Technorati.
Fishing progressive news out of the daily torrent of the del.icio.us link stream. (Aside collaboration, commontimes, del.icio.us, media, news)
del.icio.us is perhaps the beating heart of my web these days, not because I find bookmarks so useful, but because it’s useful to have a generic service for streaming links. But generic only gets you so far. As an engine for discovery, del.icio.us can be painful: flipping through pages and pages of chronologically sorted results. It’s comparable to the difference between Google’s search, with its largely generic listing of pages, and Google News, which uses its domain knowledge to chunk, categorize, and summarize the day’s news.
This is what CommonTimes is about. A project Jeff, Brian, and others launched a few months ago; it iterates on the successful model of del.icio.us to provide news centric “open editing” for the web. A vertical social bookmarking site, with a light touch editorial process to keep the site on topic.
The Web Needs Editors!
CT provides most of what you’d expect, tags, groups, bookmarklets, heat maps, RESTful APIs and some nice touches like an “Add from Bloglines” Greasemonkey script, and an adapted version of the del AJAX browser.
Perhaps more importantly, CT points forward to a strategy (among many) for dealing with the ever-expanding problem of information overload: “smarter clients.” (Do I sound like a Microsoftie?) One approach is the AI-inspired, strong editor approach of a tech.memeorandum.
But personally my bets are on the “many editors make categorization easy” technique that has got to be the year’s surprise success story, combined with tools which take advantage of available metadata, either through inference or explicit scoping.
Now that the idea is out there I’m surprised that there aren’t dozens of these vertical bookmarking sites.
Scaling Down, Scaling Up
Of course, social sites do rely on having a community, and therein perhaps lies the key challenge to building a site like CommonTimes. Thankfully there are solutions. Like breaking out of our silos, becoming a consumer as well as a producer of webservices. I want to tell CT about how to fish in my link stream (e.g. subscribe to http://del.icio.us/rss/kellan/news), and then remix with its own services.
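To make “fishing in a link stream” concrete, here is a rough sketch of the consuming side: pull the items carrying a given tag out of a feed. The sample feed is invented for illustration (a bare RSS 2.0 shape with `category` elements) and is not the real del.icio.us feed format; `links_tagged` is a hypothetical helper, and a real consumer would fetch the feed over HTTP first.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, standing in for a fetched link stream.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>sample links</title>
<item><title>Story one</title><link>http://example.com/one</link><category>news</category></item>
<item><title>Photo set</title><link>http://example.com/two</link><category>photos</category></item>
</channel></rss>"""


def links_tagged(feed_xml: str, tag: str) -> list:
    """Return (title, link) pairs for feed items carrying the given tag."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        categories = [c.text for c in item.findall("category")]
        if tag in categories:
            found.append((item.findtext("title"), item.findtext("link")))
    return found


print(links_tagged(SAMPLE_FEED, "news"))
```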
Finally a link to get you started: Ten Ways to use CommonTimes
“Some experts have questioned the value of the [Google News] service, arguing it fails to rank news reports on the basis of quality.”
Google News seems to be an overview of the daily news much like Yahoo News, but it works totally differently. Rather than a hand-selected list of appropriate news stories, Google draws from a wide list of news sources (much wider than would normally be acceptable to a media company), and uses its indexing technologies to list both the hottest, most relevant breaking news, and related background material and opposing views.
This has the BBC notably alarmed. It could simply be that they don’t rank high enough in the results, but I think it’s more than that. This service breaks down the monopoly on viewpoint. It allows a wide range of interests (like SF IMC) to have a voice, and by its very nature undermines the conscious and unconscious censorship which lies at the very heart of the media-making, consensus-manufacturing machine.
Amusingly, the BBC provides an example of exactly why we need this service: their article is full of innuendo and unnamed experts, an attack piece masquerading as unbiased journalism.
“some experts have questioned…some critics have been less…senior journalists point out….”
Odd how only the Google spokesperson is named. Who are these critics?
“Furthermore, it ranks stories according to the most recent, rather than the best, report.”
Perhaps it would be better if Google allowed the BBC editorial staff to decide which stories to feature?