I have Google Takeout set up to periodically export to GDrive as a series of 10GB .zip files. I’ve been meaning to get these over to S3 for a while, partly for backup purposes and partly because I wanted to futz with the photos.

I’m quite confident I’m not the first person to want to do something like this. But we’ve reached an awkward and unfortunate moment in the Web’s journey where search just doesn’t work very well anymore. Except that searching the latent information space of an LLM trained on the entire Web works unreasonably well for this sort of thing. Meaning it is easier to write your own than to find an existing solution. That’s going to have all sorts of weird consequences, I’m sure.

So here’s my script, gdrive-to-s3.py. It uses the MediaIoBaseDownload API to fetch files from GDrive in byte ranges, combined with multipart S3 uploads. Together these remove the need to read any of these 10GB files fully into memory or store them locally (either of which would rather rule out running the script on a t2.micro), and instead let you stream the data from one storage provider to the other in manageable chunks.
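The core of the approach looks roughly like this. This is a simplified sketch rather than the full script: the Drive service, boto3 client, file ID, and bucket/key names are stand-ins for whatever your own setup uses. Each chunk MediaIoBaseDownload pulls from Drive lands in an in-memory buffer and goes straight back out as one part of an S3 multipart upload.

```python
import io
import boto3
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

# Every S3 multipart part except the last must be at least 5MB,
# so the Drive download chunk size needs to be at least that big.
CHUNK_SIZE = 100 * 1024 * 1024  # 100MB

def stream_drive_file_to_s3(drive, s3, file_id, bucket, key):
    """Stream a Drive file into S3 one chunk at a time, never holding the whole file."""
    request = drive.files().get_media(fileId=file_id)
    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, request, chunksize=CHUNK_SIZE)

    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts, part_number, done = [], 1, False
    try:
        while not done:
            # Fetch the next byte range from Drive into the buffer.
            status, done = downloader.next_chunk()
            buffer.seek(0)
            resp = s3.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload["UploadId"], Body=buffer.read(),
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
            # Discard the chunk we just uploaded before downloading the next one.
            buffer.seek(0)
            buffer.truncate()
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload["UploadId"],
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abandoned multipart uploads keep accruing storage charges, so clean up.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload["UploadId"])
        raise

# Assumed setup: Drive credentials already obtained, default AWS credentials configured.
# drive = build("drive", "v3", credentials=creds)
# s3 = boto3.client("s3")
# stream_drive_file_to_s3(drive, s3, "FILE_ID", "my-backup-bucket", "takeout/takeout-001.zip")
```

Peak memory usage is roughly one chunk, so the chunk size is the knob to turn if the box is small: anything between the 5MB S3 minimum and whatever RAM you can spare works.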

I used to leave long-running jobs hanging around in load-bearing screen sessions all the time. Kind of nice to return to my roots.