Today I had an epiphany while staring at a very long URL.
I thought: “URL minifiers are really nice: they store all of this data for free”.
And then it struck me… one can actually store 4KB of arbitrary data
in a URL minifier and share it for free.
Storing files as a programming puzzle
So here is a programming puzzle. You are given a service that has two operations:
PUT: accepts a payload of at most 4KB, stores it, and returns
a unique key of exactly 16 bytes
GET: given a key, fetches the original payload
Given this abstraction, how would you store files of arbitrary length,
such that random seeks and concurrent downloads are possible?
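To make the abstraction concrete, here is a minimal in-memory mock of such a service (the class and names below are mine, not any real minifier's API):

    import uuid

    # In-memory stand-in for the minifier: PUT takes at most 4KB and hands back
    # a 16-byte key, GET maps a key back to the original payload.
    class MockMinifier:
        def __init__(self):
            self._store = {}

        def put(self, payload: bytes) -> bytes:
            assert len(payload) <= 4096
            key = uuid.uuid4().bytes  # exactly 16 bytes
            self._store[key] = payload
            return key

        def get(self, key: bytes) -> bytes:
            return self._store[key]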
A solution
We first split the file into 4KB pages and store each of them.
We are then left with a long list of keys pointing to these pages.
Of course sharing the list of keys would be lame. We want to store this list in
the URL minifier service as well.
Since each key is 16 bytes, one 4KB page can hold 256 of them. We build packs
of 256 keys and store them just as we did for the data pages.
We can recursively apply the same trick until we are left with a single root key.
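In code, the upload side could look roughly like this, assuming a put function that wraps the service (the constants and names are mine):

    PAGE_SIZE = 4096                       # maximum payload accepted by PUT
    KEY_SIZE = 16                          # size of the keys returned by PUT
    KEYS_PER_PAGE = PAGE_SIZE // KEY_SIZE  # 256 keys fit in one page

    def store_file(data: bytes, put) -> bytes:
        """Upload a (non-empty) file page by page and return the single root key."""
        # Level 0: split the file into 4KB pages and store each of them.
        keys = [put(data[i:i + PAGE_SIZE]) for i in range(0, len(data), PAGE_SIZE)]
        # Upper levels: pack 256 keys per page until a single root key remains.
        while len(keys) > 1:
            keys = [put(b"".join(keys[i:i + KEYS_PER_PAGE]))
                    for i in range(0, len(keys), KEYS_PER_PAGE)]
        return keys[0]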
Anyone who knows this root key can now download our entire file.
Even better, the keys form a tree structure that allows for efficient
random access in the middle of the file.
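Reading a given page back only requires walking one branch of the tree. A sketch, assuming the tree depth is known (it can be shared alongside the root key):

    PAGE_SIZE = 4096
    KEY_SIZE = 16
    KEYS_PER_PAGE = PAGE_SIZE // KEY_SIZE

    def fetch_page(root_key: bytes, depth: int, page_index: int, get) -> bytes:
        """Fetch the page_index-th 4KB data page by walking one branch of the tree.
        depth is the number of key-pack levels above the data pages, and get
        maps a 16-byte key back to its stored payload."""
        key = root_key
        for level in range(depth, 0, -1):
            pack = get(key)                        # one page full of child keys
            fanout = KEYS_PER_PAGE ** (level - 1)  # data pages below each child
            child = (page_index // fanout) % KEYS_PER_PAGE
            key = pack[child * KEY_SIZE:(child + 1) * KEY_SIZE]
        return get(key)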
Actual implementation
I actually experimented with this idea on a famous URL minifying service.
The first version was single-threaded and downloaded at a bit less than 20KB/s.
But when I downloaded 30 pages concurrently, I reached a very decent download
speed of 400KB/s, fetching my 3MB file in a little more than 7s.
I did not hit any rate-limiting at any time.
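The concurrent part is nothing fancy: a thread pool mapping a get function over page keys is enough. A sketch against any get function (such as the mock above), not my actual script:

    from concurrent.futures import ThreadPoolExecutor

    def fetch_pages(keys, get, workers=30):
        """Fetch many 4KB pages in parallel; get maps a key to its payload."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(get, keys))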
Since I don’t want to cause any trouble to any service, I will not share my script
as is… Instead, here is a proof-of-concept version of my script, without
multithreading and with a mock in place of the URL minifier implementation.