September 16, 2018
When I wrote about ideas and competition, I said: “you can beat large companies by spotting accidental complexity and avoiding it.”
Pick a problem and focus. That’s what every startup book says. And they are right. Scope creep has killed lots of products and companies.
But that’s not enough. These days you have large, agile competitors like Amazon that consist of small, startup-like projects, exposed to the world as if they were independent companies.
When they notice your Super Solution taking off, they just spin up an internal project that does the same thing, but better.
So it’s important to apply this principle everywhere, beyond just product design. Identify and eliminate the complexity that existed before you even started.
Let’s start with something simple. Like files. Yes, those files.
Everybody has used them for decades, so they must be bulletproof, right? But for some reason, Apple tried to get rid of them in iOS. Let’s look closer.
Why do you even need files? To store photos, docs, apps, you might say.
Oh, I hear “in Unix, everything is a file!” Great! So we also have filesystems listing processes, devices, and so on. Next to directories, drives, repositories, remote filesystems… Things are getting complicated.
Unfortunately, files mix two completely unrelated concepts: data storage and API (a way for apps to talk to each other).
You’ve probably never thought of files as an API. But let’s take a look:
Mostly, filesystems just work. Except when their functions interact in unexpected ways.
I want this movie in “Best of 2015”, “Fiction” and “Already Watched” folders at the same time, how do I do that? Oh, shortcuts!
Oops, you sent me a shortcut, not the file itself.
“Here are my edits to your proposal.” “I have edited it already. What exactly did you change?”
Multiple billion-dollar companies exist that answer just this one question (Dropbox, GitHub, Google Docs, to name a few).
When Apple designed the iPhone, they tried to make computing accessible to everybody, not just geeks and trained engineers. That includes users on the go, with the battery running out, without any manuals, and without a keyboard to type a search query.
And while the problems listed above are rare, with millions of users, hundreds of people will run into them every minute. No wonder Apple tried to steer clear of this mess.
It’s not that filesystem designers are stupid. They’re brilliant people. But you can’t solve a problem until you understand that it is a problem.
The complexity accumulated slowly:
…
Oops.
At this stage, when you’re about to add another feature to the system, it’s not enough to think about the feature itself.
If you’re lucky, you need to think about every other function, too.
If not - you need to think about every combination of features out there.
Nobody has the time or imagination to do this, of course. That’s why we end up with all of these half-baked tools, dozens of similar services and thousands of bugs.
As I said, these people were smart. They added features one by one, because at that time, in their situation, it was the easiest and the smartest thing to do.
But let’s see what happens if we start with all these requirements together.
Good, now we have to worry only about the contents! Let’s remove the names altogether and refer to pieces by their hash (a cryptographic thumbprint of the data).
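To make "refer to pieces by their hash" concrete, here is a minimal sketch. It is not the code of my assistant: the PieceStore class, its method names and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class PieceStore:
    """Toy in-memory store: a piece's only identifier is the hash of its bytes."""

    def __init__(self) -> None:
        self._pieces: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption; any cryptographic hash would do.
        piece_id = hashlib.sha256(data).hexdigest()
        # Identical content always produces the same id, so it is stored once.
        self._pieces[piece_id] = data
        return piece_id

    def get(self, piece_id: str) -> bytes:
        data = self._pieces[piece_id]
        # The reader can check that it received exactly what it asked for.
        if hashlib.sha256(data).hexdigest() != piece_id:
            raise ValueError("piece does not match its hash")
        return data

    def __len__(self) -> int:
        return len(self._pieces)


store = PieceStore()
first = store.put(b"a photo of a cat")
second = store.put(b"a photo of a cat")  # same bytes, same id, no second copy
```

There is no rename, no overwrite and no "which copy is the right one" here. Two pieces are the same piece exactly when their bytes are the same.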
Modifications
First, we often have to store previous versions anyway. Second, there are multiple ways to modify a particular piece (“save a copy,” “modify this copy,” “replace all copies,” etc.), so I’d argue this is up to the app to handle.
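A rough sketch of what "up to the app" could look like. The names table and the three functions are hypothetical. The only point is that pieces never change, while the app decides which labels move to the new piece.

```python
import hashlib

pieces: dict[str, bytes] = {}  # immutable storage: hash -> bytes
names: dict[str, str] = {}     # app-level labels: label -> hash (hypothetical)


def put(data: bytes) -> str:
    piece_id = hashlib.sha256(data).hexdigest()
    pieces[piece_id] = data
    return piece_id


def save_copy(new_name: str, data: bytes) -> None:
    """'Save a copy': a new label, the old labels are untouched."""
    names[new_name] = put(data)


def modify(name: str, data: bytes) -> str:
    """'Modify this copy': move one label, return the previous version's hash."""
    previous = names[name]
    names[name] = put(data)
    return previous  # the old piece is still in storage


def replace_all(old_id: str, data: bytes) -> None:
    """'Replace all copies': move every label that pointed at the old piece."""
    new_id = put(data)
    for name, piece_id in list(names.items()):
        if piece_id == old_id:
            names[name] = new_id


names["draft"] = put(b"v1")
names["shared"] = names["draft"]
old = modify("draft", b"v2")   # "shared" still points at v1
replace_all(old, b"v3")        # now "shared" points at v3
```

The previous versions come for free, because nothing is ever overwritten.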
Large files
We split large files into multiple pieces, organized like a tree with the root referring to leaves like “Country -> City -> Street -> House 3A -> Apartment 15 -> Photo of Apartment 15”. Databases are internally structured this way already, so there’s no harm in making it explicit, too. And you can refer to pieces from multiple places, like “Friends -> A -> Andrew -> Photo of Apartment 15”, without changing a single thing.
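Here is a simplified sketch of that tree, with only two levels (a root piece listing its leaf chunks) and a "folder" that is just another piece. The JSON layout, the tiny chunk size and the function names are assumptions for the example, not a description of any real format.

```python
import hashlib
import json

CHUNK_SIZE = 4  # tiny on purpose, so even a short example gets split

pieces: dict[str, bytes] = {}


def put(data: bytes) -> str:
    piece_id = hashlib.sha256(data).hexdigest()
    pieces[piece_id] = data
    return piece_id


def put_large(data: bytes) -> str:
    """Store data as leaf chunks plus one root piece listing the leaf hashes."""
    leaves = [put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]
    return put(json.dumps({"leaves": leaves}).encode())


def get_large(root_id: str) -> bytes:
    root = json.loads(pieces[root_id])
    return b"".join(pieces[leaf] for leaf in root["leaves"])


def put_folder(entries: dict[str, str]) -> str:
    """A folder is just another piece: a mapping from label to piece hash."""
    return put(json.dumps(entries, sort_keys=True).encode())


photo = put_large(b"photo of apartment 15")
by_address = put_folder({"Apartment 15": photo})
by_friend = put_folder({"Andrew": photo})  # second path to the same photo
```

Both folders hold the same hash, so there is no shortcut to break and no second copy to keep in sync.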
What about access rights?
For reading, you can encrypt the piece, store the key together with the “name” and “document type” and share that. For writing, it’s even simpler. You either accept the message “make picture GreatCat refer to piece A555”, or you don’t.
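The write side can be sketched in a few lines. How you decide whether to accept a message is left open above. In this example I assume a shared secret and an HMAC tag, which is only one possible choice. NameRegistry and sign_update are made-up names, and the read-side encryption is left out.

```python
import hashlib
import hmac


def sign_update(write_key: bytes, name: str, piece_id: str) -> str:
    """Tag for the message 'make <name> refer to <piece_id>' (assumed format)."""
    message = f"{name}\n{piece_id}".encode()
    return hmac.new(write_key, message, hashlib.sha256).hexdigest()


class NameRegistry:
    """Keeps name -> piece id, and accepts only properly tagged updates."""

    def __init__(self, write_key: bytes) -> None:
        self._write_key = write_key
        self.names: dict[str, str] = {}

    def apply(self, name: str, piece_id: str, tag: str) -> bool:
        expected = sign_update(self._write_key, name, piece_id)
        if not hmac.compare_digest(expected, tag):
            return False  # we simply don't accept the message
        self.names[name] = piece_id
        return True


key = b"shared write secret"
registry = NameRegistry(key)
accepted = registry.apply("GreatCat", "A555", sign_update(key, "GreatCat", "A555"))
rejected = registry.apply("GreatCat", "B666", sign_update(b"wrong key", "GreatCat", "B666"))
```

Whatever the rule is, a write is one small message that either passes or doesn't. There are no partial writes and no permission bits on the data itself.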
Backup and synchronization systems are greatly simplified, too. They just need to store these blobs once and for good. And if you share only the pieces, without the encryption keys - they never ever get to see what’s stored there.
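Since a piece never changes once it has a hash, a backup run reduces to "copy what the other side doesn't have yet". A minimal sketch, with plain dictionaries standing in for the local store and the backup server:

```python
import hashlib


def put(store: dict[str, bytes], data: bytes) -> str:
    piece_id = hashlib.sha256(data).hexdigest()
    store[piece_id] = data
    return piece_id


def sync(source: dict[str, bytes], backup: dict[str, bytes]) -> int:
    """Copy pieces the backup is missing and return how many were sent."""
    missing = source.keys() - backup.keys()
    for piece_id in missing:
        # The bytes may be ciphertext; the backup never needs to read them.
        backup[piece_id] = source[piece_id]
    return len(missing)


local: dict[str, bytes] = {}
remote: dict[str, bytes] = {}
put(local, b"piece one")
put(local, b"piece two")

sent_first = sync(local, remote)   # everything is new to the backup
sent_again = sync(local, remote)   # nothing changed, nothing to send
put(local, b"piece three")
sent_later = sync(local, remote)   # only the new piece travels
```

There is no conflict resolution in this sketch because there is nothing to conflict: an existing piece is never modified.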
It’s easy to bash on large and faceless companies with arguments pulled out of thin air.
However, the approach I described is exactly how my knowledge assistant stores all of its data. It’s encrypted, backed up across multiple servers, and the entire source code for the data storage, including a lot of “conventional” file system functions, is still less than 5000 lines long, with comments and blank lines.
For comparison, the source code for the Unix “ls” command (list a directory) is already at 4700 lines, and that’s just one feature.
I spent about a week writing this part. Maybe the “ls” command was written faster. But over time it adds up. The less code you have, the fewer bugs you have, the faster you can make changes.
This is how software should be built. Not to the point where there’s nothing more to add, but to the point where there’s nothing to take away.
Ideas, experiments and projects by Oleksandr.