Thinking again about distributed, log-oriented writes as a better architecture for a whole class of persistent data we need to deal with. Atomic appends are one of the least appreciated features in GFS, and certainly the most critical feature HDFS is missing. Right now I’m not even sure I’m supposed to be worrying: my back-of-the-napkin numbers say maybe 10-20 million daily appends across 3-4 million queues, which is just like running a big mail install, right? (Remind me to look at Maildir again.)
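To put rough numbers on that napkin, here is a minimal sketch that turns the daily figures into average rates. The helper name `napkin` is made up for illustration, and it assumes load is spread evenly over the day; real traffic will have peaks this ignores.

```python
# Back-of-the-napkin rates from the figures in the post.
# Assumption: appends are spread evenly across 24 hours (no peak factor).

SECONDS_PER_DAY = 24 * 60 * 60


def napkin(daily_appends, queues):
    """Return (average appends per second, average appends per queue per day)."""
    per_second = daily_appends / SECONDS_PER_DAY
    per_queue_per_day = daily_appends / queues
    return per_second, per_queue_per_day


# Low end: fewest appends spread over the most queues.
low = napkin(10_000_000, 4_000_000)
# High end: most appends spread over the fewest queues.
high = napkin(20_000_000, 3_000_000)

print(f"appends/sec:       {low[0]:.0f} to {high[0]:.0f}")
print(f"appends/queue/day: {low[1]:.1f} to {high[1]:.1f}")
```

So on average that is a couple of hundred appends a second overall and a handful per queue per day, which is why it smells like a mail workload: lots of small, mostly idle queues.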

Also, contrary to TC’s breathy article, BigTable is not much like SimpleDB (other than that they’re both ways of storing and retrieving data which aren’t MySQL): it doesn’t give you querying, just limited range scans on rows, and it seems to be really, really expensive to add new columns (at least whenever I talk to Gengineers, they flinch at the concept).
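To make the "range scans, not queries" distinction concrete, here is a toy sketch of a store that keeps rows sorted by key and can only hand back a contiguous slice of them. This is not BigTable's API; `ToySortedTable`, `put`, and `scan` are hypothetical names invented for the illustration, and the row keys are made-up examples.

```python
import bisect


class ToySortedTable:
    """Toy model: rows sorted by key, readable only via key-range scans."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, row_key, columns):
        # Insert the key in sorted position the first time we see it.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = dict(columns)

    def scan(self, start_key, end_key, limit=None):
        """Return rows with start_key <= key < end_key, in key order."""
        i = bisect.bisect_left(self._keys, start_key)
        out = []
        while i < len(self._keys) and self._keys[i] < end_key:
            if limit is not None and len(out) >= limit:
                break
            key = self._keys[i]
            out.append((key, self._rows[key]))
            i += 1
        return out


table = ToySortedTable()
table.put("queue:0002#0001", {"body": "b"})
table.put("queue:0001#0002", {"body": "a2"})
table.put("queue:0001#0001", {"body": "a1"})

# Cheap: everything for one queue is a contiguous key range.
one_queue = table.scan("queue:0001#", "queue:0001$")
print([key for key, _ in one_queue])
```

Asking "which rows have body == 'b'?" has no equivalent here short of scanning every key, which is the kind of question SimpleDB's query interface is meant to answer.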

Meanwhile I’m still waiting on DevPay for SimpleDB before I get into it in a big, big way.