Blog posts tagged "aggregation"

FriendFeed is too much info

May 2nd, 2008


One of the key topics (I think) in my Casual Privacy talk last week was the importance of “context” in privacy and sharing. That some people have trouble understanding how fundamental context is to all social interactions was my primary takeaway from SG Foo, and I’ve been preaching it quietly where I can.

All by way of saying: I made one of my rare visits to FriendFeed this evening, and I was reminded that I consistently regret it. Breaking down those contextual walls means I consistently like the people I find there less than I did when I was able to interact with them in isolation; firewalling the aesthetic from the technical from the political from the personal.

We need routing, not aggregation.

Gmail: Web Clippings

December 9th, 2005

Gmail’s new web clipping feature is nice. Decent, if sparse, selection of feeds (doesn’t feel like a straight pay-for-placement deal), and a good interface for adding and selecting them.

Minor detail: adding new feeds to the system seems to be broken. (Google is really cutting themselves slack these days about rolling out half-finished features, aren’t they?)

Best possible feature? No text to unbold! My days are already spent unbolding text, or feeling guilty about the ridiculously large bold (now turning red, now blinking) number which signifies my news folders’ backlog.

Much nicer. Obviously not a replacement, but as an addition to my existing aggregator, it’s great.

On a related note, what are people subscribed to for their news these days?

update: Well, that was short-lived. Should have seen it coming, but would it be too much to add just one feature not targeted at selling more ads? I’m pretty sure “Catch More Bass” wasn’t listed in any of my feeds. Siiiigh.

Tagged: Uncategorized

Automatic Unsubscribe Considered Harmful

November 1st, 2005

I’ve seen a couple of tools recently adding automatic unsubscribe features: options to unsubscribe from a feed which has gone silent for too many days or weeks.

This seems 100% wrong to me. Almost a betrayal of the bright and shiny promise of RSS.

As a Blogger

Part of what makes blogging a sustainable medium for personal publishing is that I don’t have to publish every day, every week, or every month. I’m secure in the knowledge that when I do publish, my audience will still be there.

A TV station can’t do this, a newspaper can’t do this, and so they’re forced into a professionalization of media creation which is by and large unsustainable. (Hence the poor quality of the evening news; wouldn’t it be nice if they only put out a report when they actually had some news to report on?)

As a Subscriber

I subscribe to a number of feeds that are only updated when something goes wrong: my server goes down, my page stops validating, there is an emergency weather alert. I need the confidence to subscribe, and then forget about these feeds, secure in the knowledge that they’ll still be there when needed. (Otherwise I’ll have nagging doubts, and might as well just check a website daily. This is what GTD is all about, as I understand it: the confidence to forget.)

As a Developer

I don’t get the motivation. A dormant feed is nearly zero cost. It isn’t changing, so conditional GETs reduce the cost to both the aggregator and the provider. It isn’t updating, so there is no cognitive cost to the reader.
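For reference, the wire-level mechanics, with the host, date, and ETag all invented for illustration: a client that already holds a copy revalidates with If-Modified-Since/If-None-Match, and an unchanged feed costs the publisher only a headers-sized 304.

```
GET /index.rdf HTTP/1.1
Host: example.com
If-Modified-Since: Tue, 01 Nov 2005 08:00:00 GMT
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
```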

Please: if a feed goes long-term 404, 410, 500, etc., sure, unsubscribe rather than pounding it forever. But a feed that has simply gone quiet? That would be a shame.

Wrong Problem

The real problem is finding some way to automatically detect feeds which are no longer interesting. And even then I usually hold on, against the day they’ll swerve back to what I started reading them for. (Usually I enjoy the detours, but sometimes…) One of the beauties of is that it explicitly allows people to be multi-faceted, and I think our aggregator tools need to start being more aware of this.

Caveat: I haven’t actually used FeedDemon’s feature (not being a Windows user); it merely reminded me of this worrying, dare I call it wrong-headed, trend.

Shell Scripting and Agg Stats

September 21st, 2005

Been a while since I dug into my aggregator stats (intrigued by FeedBurner mentioning their tracking 2000 aggregators), and while I’ve still got my Perl script, I was alarmed to realize that I had forgotten the shell for doing the equivalent.

So, killing time waiting for J, I re-created it. It assumes you’re using Apache’s “full” log format (and that your feed is “index.rdf”):

sudo grep '/index.rdf' access.log | cut --delimiter=\" -f6,1 --output-delimiter="=" | 
sed 's/ - - \[[^=]*//' | sort | uniq | cut --delimiter="=" -f2 | sort | uniq -c | sort -n

Returns a count of unique IPs per User-Agent. Tack on a little awk to get aggregate counts.

| awk '{sum += $1} END {print sum}'
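To sanity-check the pipeline, here it is run against an invented four-line sample of Apache’s combined log (IPs and agents made up); the two NetNewsWire hits from the same IP should collapse to one:

```shell
# Invented sample of Apache combined ("full") format log lines.
cat > access.log.sample <<'EOF'
1.2.3.4 - - [21/Sep/2005:10:00:00 -0700] "GET /index.rdf HTTP/1.1" 200 1234 "-" "NetNewsWire/2.0"
1.2.3.4 - - [21/Sep/2005:11:00:00 -0700] "GET /index.rdf HTTP/1.1" 304 0 "-" "NetNewsWire/2.0"
5.6.7.8 - - [21/Sep/2005:10:05:00 -0700] "GET /index.rdf HTTP/1.1" 200 1234 "-" "NetNewsWire/2.0"
9.9.9.9 - - [21/Sep/2005:10:10:00 -0700] "GET /index.rdf HTTP/1.1" 200 1234 "-" "Bloglines/3.1"
EOF

# Same pipeline as above (no sudo needed for our own sample file):
# unique IPs per User-Agent.
counts=$(grep '/index.rdf' access.log.sample |
  cut --delimiter=\" -f1,6 --output-delimiter="=" |
  sed 's/ - - \[[^=]*//' | sort | uniq |
  cut --delimiter="=" -f2 | sort | uniq -c | sort -n)
echo "$counts"
```

Which prints a count of 1 for Bloglines/3.1 and 2 for NetNewsWire/2.0, since the repeat visit from 1.2.3.4 is deduplicated.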

Of course, folks like Bloglines, Rojo, Yahoo FeedSeeker, Feedster, and FeedLounge (among others, I’m sure) are rolling up the user counts. Meanwhile, FeedLounge and FeedSeeker get counted multiple times, as they add time-sensitive info to their User-Agent (that has got to be against some best practice!), and Bloglines comes from a couple of different IPs.

Interestingly, Google Desktop shows up as generating not only the highest number of hits, but also the highest number of 200s.

Tagged: Uncategorized

Managing Attention in Aggregators

May 29th, 2005

Scoble has a really lousy feature request for RSS aggregators:

Feature request for RSS News Aggregators: I want to be able to “clean up” my feed subscription list. I want to remove any RSS feed that hasn’t published in the past XX days (default to 30).

Part of what makes personal publishing work, and what makes RSS such a key piece of it, is that you don’t have to publish every day, you don’t have to publish every week, and you don’t have to publish on a schedule.

You can publish when you have time and something to say, confident in the knowledge that your audience is still there, still waiting. In a well-designed interface, feeds without traffic should be mentally zero-cost to all parties involved.

Matt Webb has a much better suggestion.

Here’s a feature I want from my RSS feeder. Every so often it should silently hide one of the feeds. If I notice, and if I remember what it was that’s been hidden, I should be able to say: Hey, you forgot feed X, give it back!, and the application would say: Okay then, you got me banged to rights, here it is. If I don’t notice or can’t remember, the feed is deleted permanently.

But how do you tell when the aggregator has hidden the feed to see if you miss it, and when a feed just hasn’t been updated recently?

Tagged: Uncategorized


March 17th, 2005

One of the keys to any decent social software project is making it useful enough to the individual that they generate the data that makes the network useful. FeedTagger seems poised to do this.

The idea is simple. FeedTagger is a web-based aggregator that allows you to browse not only by feeds and folders, like existing aggregators, but also by tags: tags you assign on a per-feed or per-item basis.

Right now the emergent social knowledge community is just a flicker of potential, but in the meantime it’s a very interesting re-thinking of the aggregator.

And Chris is blogging about his experience building it, including using Magpie, and running smack into PHP5’s XML parsing bugs (fixed in the PHP5 nightlies).

update: this bug

Tagged: Uncategorized

Same Web, Different Front Ends

February 23rd, 2005

Ted says RSS aggregators are the killer app, killer in the sense that they are

the app that does so much that it consumes all available CPU, memory, network, and disk.

He goes on to say

It’s a reflection of the way that my relationship to the web has changed. I hardly use a standalone browser anymore — mostly for searching or printing. I don’t have time to go…

Funny, I’m following this trend to the opposite end of the curve. I’ve been noticing lately that I spend less and less time using anything other than my standalone browser, Firefox. Once I’ve got a decent coding environment built in, pretty much the only external apps left will be ssh and chat. Functionally, Firefox is very close to becoming my operating system. (Though in practice I still prefer running it on platforms other than Windows.)

While the news has been full of the problems with network-based and centralized services (see Preshrunk or Schneier), their nativeness suggests that perhaps some of the problems Ted is seeing are an impedance mismatch.

And a quick, semi-related tip: if you’re setting up several people on FoF, consider setting up a shared cache directory to get some economies of scale.

Tagged: Uncategorized

RSS Aggregator Popularity

March 20th, 2004

9 months ago I ran a report on which of the RSS aggregators hitting LaughingMeme were most popular by total requests. Using some Perl, some awk, some cool shell-fu Steve sent me last time, and Haiko Hebig’s elegant CSS, I whipped up some new graphs: most popular RSS aggregators by unique IP. These stats are for the last month of traffic on LaughingMeme, for each of the RSS feeds associated with this blog.

The first graph is the 20 most popular RSS aggregators regardless of version. The second graph treats each version of the aggregators as distinct.

Couple of things to note.

1. NetNewsWire still dominates, after all this time, not only in popularity but also in diversity, with a total of 18 distinct versions (counting Lite and paid as distinct) observed.

2. In the first graph Bloglines is listed twice. The first number treats all the subscribers for each of the 4 feeds I surveyed as unique; the second assumes that the subscribers to the less popular feeds are a subset of the people who subscribe to the main feed. The real number lies somewhere in between (as I think a large number of people subscribe only to MLPs, or to LM).

Most popular RSS aggregators (all versions) by unique IP

RSS aggregators by unique IP
Tagged: Uncategorized

Magpie, The Very Model of A Modern RSS Aggregator

November 16th, 2003

I don’t really think of MagpieRSS as being an aggregator; it was, after all, designed to be an RSS parsing library. However, in the pursuit of a simple API that could be dropped into a dynamic page, it seems to have sprouted a number of the key features of a modern aggregator.

Over time people have developed a few metrics for measuring such things:

  • Supports GZIP encoding for reduced bandwidth usage: check.
  • Supports conditional GETs/ETags for intelligent fetching based on last-modified dates: check. (Anyone know of a good list of ETag-compliant aggregators?)
  • Supports secure RSS feeds, including feeds protected by HTTP Auth and SSL: check.

Looks like Danny is starting to pull this sort of information together on a page, but it doesn’t look done yet.

Tagged: Uncategorized

Collating RSS Items by Publish/Modified Date

October 3rd, 2003

Jarno, who says we shouldn’t use the timestamp in RSS feeds to sort the entries, also invites us to disagree. And I disagree. People keep forgetting that RSS pre-dates blogging, and is used for other tasks than pushing headlines to a desktop aggregator. I wish people would stop trying to limit us all because they lose sight of the bigger picture (as was done with Atom, in my opinion) or are working with a poorly thought-out spec (Userland RSS). Just for the record, I’ve included below the algorithm I used to sort items into a collated list in one of my aggregators.

  • Check for a dc:date or pubDate per item, and assign the item that as its pub/mod date. (Unfortunately for me, the handful of feeds published in Userland RSS I was aggregating were confused by the lastBuildDate tag, using it to keep track of the last build date for an item or a channel instead of the item’s modification date, making it largely useless.)
  • Otherwise, attempt to ascertain the most recent publish time of the feed, by looking at either the channel’s dc:date or the channel’s pubDate, and use that as the pub/mod time for all newly seen items.
  • This has some nice features. Depending on how smart your aggregator is, it could do the right thing with the upcoming RSS feeds, even though they don’t use mod_event. (Note to self: send Andy a note about that.)
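A minimal shell sketch of that fallback order, run against a contrived feed flattened to one element per line so sed can cope (a real aggregator would use a proper XML parser; the feed contents here are made up):

```shell
# Contrived sample: one item per line, so line-oriented tools work.
cat > feed.xml <<'EOF'
<channel><pubDate>Fri, 03 Oct 2003 12:00:00 GMT</pubDate>
<item><title>a</title><dc:date>2003-10-01T09:00:00Z</dc:date></item>
<item><title>b</title></item>
</channel>
EOF

# Channel-level date: the fallback for items carrying no date of their own.
channel_date=$(sed -n 's/.*<pubDate>\([^<]*\)<\/pubDate>.*/\1/p' feed.xml | head -1)

# Per item: prefer dc:date, then an item-level pubDate, then the channel date.
collated=$(grep '<item>' feed.xml | while read -r item; do
  title=$(printf '%s' "$item" | sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p')
  date=$(printf '%s' "$item" | sed -n 's/.*<dc:date>\([^<]*\)<\/dc:date>.*/\1/p')
  if [ -z "$date" ]; then
    date=$(printf '%s' "$item" | sed -n 's/.*<pubDate>\([^<]*\)<\/pubDate>.*/\1/p')
  fi
  if [ -z "$date" ]; then date=$channel_date; fi
  printf '%s: %s\n' "$title" "$date"
done)
echo "$collated"
```

Item a gets its own dc:date; item b, having no date at all, inherits the channel’s pubDate.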
Tagged: Uncategorized

Aggregators and Prior Art

July 22nd, 2003

Mark reminds us aggregators are HTTP clients, and there’s a lot of prior art on how HTTP clients are supposed to work.

I struggle with how much of this Magpie should be aware of. It’s not really an aggregator, but people use it as such. The response code (and, in CVS, the full headers) are made available to clients, but for the people using it as a simple drop-in on their website, Magpie moves from being a library to being the client.

A difficult balance, too complicated for my exhausted brain.

Tagged: Uncategorized

Aggregator Traffic Stats

July 10th, 2003

Inspired by hebig’s post on the subject, I ran the user agent numbers for my RSS feeds (last 30 days of data). I also stole the nifty CSS bar graphs from the same location.

RSS Aggregators by percent of total requests:

  • NetNewsWire Lite/1.0.2
  • Hep Messaging Library/0.0.2
  • Radio UserLand/8.0.8
  • Feedster Harvester/1.0;
  • rssSearch Harvester/1.0;
  • [not listed: 149 browsers]

Tagged: Uncategorized

Tracking RSS Users

May 30th, 2003

Tbray has an article on counting RSS subscribers as part of making RSS commerce-ready. (Though I’m sure we can find some better use than that.)

Predictably, he stumbled onto the coals of the old referer flamewars, which, while gone dark and black, apparently had enough heat to set him straight quickly.

Personally, I never understood the problem. I think the practice of passing extra info in the referer logs got tarred by bad implementations. Yes, it’s annoying when Syndirella or Amphetadesk put a raw URL into the referer field (the 2 clients who seem to still do this in my logs), and this confuses analog, but I liked Straw’s implementation, which sent you a referer URL in the (approximate) form of . I miss this feature, which was dropped from Straw about the time aggregator writers were being drawn and quartered for “referer spam”. And Tim’s suggestion of using email hashes, while adding back the countability, doesn’t reopen the channels of communication. (Well, you could take every email address you know, run it against the hashes, and see if anything turns up, but that has some obvious limits.)
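As I read Tim’s suggestion, it might look something like the sketch below (the address and the User-Agent format are entirely made up; the point is that a publisher can tally distinct hashes without being able to reverse one back into an address):

```shell
# Hypothetical: identify a subscriber by an MD5 of their email address.
hash=$(printf '%s' 'reader@example.com' | md5sum | cut -d' ' -f1)
echo "User-Agent: HypotheticalAggregator/1.0 (subscriber=$hash)"
```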

Tagged: Uncategorized