
Fear of over-engineering has killed engineering altogether

This is the second article about building fika. In this one, I'm going to review the napkin math behind it.

Jul 26, 2024

In the last 20 years, we have seen engineering go out of fashion. Developers must ship, ship, and ship, but, sir, please don’t bother them; let them cook!

Many arguments go something like “programming is too complex; it is not a science but an art form, so let’s YOLO it.” And while there is a certain truth to that, I would like to make a counterargument.

Where does it come from?

If you’ve been in tech for long enough, you’ve seen the pendulum swing from one end to the other. The more you've seen, the more you realize that the interesting things happen in the middle. In the land of nuance and “it depends.”

Before the 2000s, academics ruled computer science. They tried to pin down what "engineering" meant for programming, borrowing practices from other fields: separating architects from practitioners, and managing waterfall projects with rigorous upfront planning.

It was bad. Very bad. Projects were always late, too complex, and the engineers were not motivated by their work.

This gave way to the Agile Manifesto, which disrupted not only how we program, but also how we do business in tech altogether. Both the Lean Startup movement and later the famous Y Combinator insisted on moving fast, iterating, and not worrying about planning.

To make things worse, engineers took Donald Knuth’s quote “Premature optimization is the root of all evil” and conveniently reinterpreted it as “just ship whatever and fix it later… or not."

I do think the pendulum has gone too far, and there is a sweet spot of engineering practices that are actually very useful. This is the realm of Napkin Math and Fermi Problems.

The right amount of engineering

To predict where a planet will be on a given date, it's faster to model and calculate than to wait for the planet to move. This is because humans have a shortcut. It's called mathematics, and it can solve most linear problems.

But if we want to predict a complex system such as a cellular automaton, we have no shortcut. There is no mathematics – and perhaps there never will be – to help us bypass the whole computation. We can only run the program and wait for it to reach the answer.

Most programs are complex systems; hence the resistance to engineering at all. Just run it and we shall see. Yet they all contain simple linear problems that, if solved, are shortcuts to understanding whether they make sense to build at all. A very small cost that can save you months of work.

Those linear problems are usually one of these three:

  • Time: Will it run in 10 ms or 10 days? This is the realm of algorithmic complexity, the speed of light, and latencies.

  • Space: How much memory or disk will it need? This is the land of encodings, compression, and data structures.

  • Money: And ultimately, can I afford it? Welcome to the swamp of optimizations, cloud abuse, and why the fuck I’ve ended up living under a bridge.

And of course, the three are related. You can usually trade off time and space interchangeably, and money can usually buy you both.
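
To make the Time bucket concrete, here is a minimal sketch of the kind of calculation I mean. Every constant in it (row size, scan throughput) is a rule-of-thumb assumption, not a measurement:

```typescript
// Napkin math for the Time bucket: can we full-scan 1e9 rows interactively?
const rows = 1e9;
const bytesPerRow = 100; // assumed average row size
const totalBytes = rows * bytesPerRow; // 100 GB

// Rule of thumb (assumed, not measured): ~5 GB/s sequential read on one core.
const throughputBytesPerSec = 5e9;
const seconds = totalBytes / throughputBytesPerSec;

console.log(`~${seconds.toFixed(0)} s per full scan`); // ~20 s
// 20 s is neither 10 ms nor 10 days: too slow for a web request,
// fine for a nightly batch. Knowing the bucket is the whole game.
```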

Fermi problems and Napkin Math

One of the things that bothers most people is “not knowing the numbers.” Predicting the future without past data can be very stressful. But once you get used to making things up, you will see that what matters is not the numbers, but the boundaries.

One of the most well-known Fermi problems, infamously used in some job interviews, is to guess how many piano tuners there are in New York. You won’t get the exact number, but you can be certain it falls between 10 and 10,000. You can get closer if you know how many people live in New York (8M). If 1% of them own a piano (80,000) and one tuner can serve 100 customers a year, then the upper bound is about 800, or 1,000 to round it up. Despite those guessed boundaries being orders of magnitude apart, they may be enough to convince you not to build a 1 euro/month app for that niche.
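
The same estimate, written out (every constant is a guess, which is exactly the point):

```typescript
// Fermi estimate: piano tuners in New York. All inputs are guesses.
const population = 8_000_000; // NYC, roughly
const pianoOwnership = 0.01; // assume 1% of people own a piano
const pianos = population * pianoOwnership; // 80,000

const tuningsPerPianoPerYear = 1; // assume one tuning a year
const pianosPerTunerPerYear = 100; // assume a tuner serves ~100 customers

const tuners = (pianos * tuningsPerPianoPerYear) / pianosPerTunerPerYear;
console.log(tuners); // 800 -> round up to ~1,000 as an upper bound
```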

You start by writing down extremely pessimistic assumptions, things that likely fall in the p99. For instance, if you want to calculate how much storage you need to store a book's content, assume a book has 5,000 pages. Most books will have far fewer, so if the numbers still work out under that assumption, you are more than good to go.
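
For instance, sizing one book's raw text under those loudly pessimistic assumptions (the bytes-per-page figure is also a guess of mine):

```typescript
// Worst-case storage for one book's raw text. Both constants are
// deliberately pessimistic (p99-ish) assumptions, not averages.
const pagesPerBook = 5_000; // most books have far fewer
const bytesPerPage = 3_000; // ~3 KB of UTF-8 text per page, assumed

const mbPerBook = (pagesPerBook * bytesPerPage) / 1e6;
console.log(`${mbPerBook} MB per book, worst case`); // 15 MB
// If the plan survives 15 MB/book, a realistic ~1 MB/book is a rounding error.
```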

Calculations boil down to simple math: adding and multiplying your assumptions. Nothing fancy. But for other calculations, you will need to know benchmarks or details of algorithms and data structures. For instance, if you are trying to estimate how much money you need to train an LLM, knowing how transformers work helps, since it lets you calculate the memory needed. I recommend bookmarking some cheat sheets and keeping them around.
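
Here is a sketch of that kind of estimate. The constants are standard rules of thumb (~16 bytes of optimizer state per parameter for mixed-precision Adam, ~6 FLOPs per parameter per token); the model size, token count, and GPU prices are assumptions for illustration:

```typescript
// Napkin math for training an LLM. Rules of thumb, not exact figures:
// - mixed-precision Adam holds ~16 bytes of state per parameter
//   (fp16 weights + grads, fp32 master weights + two optimizer moments)
// - training costs ~6 FLOPs per parameter per token
const params = 7e9; // assume a 7B-parameter model
const tokens = 1e12; // assume a 1T-token training run

const optimizerGB = (params * 16) / 1e9; // ~112 GB, before activations
const flops = 6 * params * tokens; // 4.2e22 FLOPs

const effectiveFlopsPerSec = 3e14 * 0.4; // ~300 TFLOP/s GPU at 40% utilization
const gpuHours = flops / effectiveFlopsPerSec / 3600;
const dollars = gpuHours * 2; // assume ~$2 per GPU-hour

console.log(`~${optimizerGB} GB of optimizer state`); // 112 GB
console.log(`~${Math.round(gpuHours)} GPU-hours, ~$${Math.round(dollars)}`); // ~97k hours, ~$194k
```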

I usually only do a worst-case scenario calculation, but if you want to do interval calculations, you can use tools like Guesstimate. Keep the calculations around, since once you start getting real data, you will want to verify the assumptions and update your priors.

Example of Napkin Math at fika

As an example, I will walk through the calculations I did while building fika to determine what was possible, what was not, and how the numbers have held up.

The main assumption is that a p99 user (which I modeled after myself) would have around 5,000 bookmarks in total and would generate 100 new ones a month.

According to the HTTP Archive, the p90 website weighs ~9MB, which is very unfortunate because it makes storage (R2) costs too high. But if you look deeper into the data, most of that weight goes into images, CSS, JavaScript, and fonts. I could get clever and strip most of that content with Readability, compress the images, and finally gzip it all. This is a requirement I hadn't thought of before starting the project, but it became obvious once the numbers were on the table.
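
The sketch behind that conclusion, assuming R2's published ~$0.015/GB-month storage price and my own guess for the cleaned-up bookmark size:

```typescript
// Storage cost per p99 user on R2 (~$0.015 per GB-month at the time).
const bookmarks = 5_000; // p99 user
const r2DollarsPerGBMonth = 0.015;

const rawMBPerBookmark = 9; // p90 page weight per the HTTP Archive
const cleanMBPerBookmark = 0.36; // assumed, after Readability + WebP + gzip

const rawCost = ((bookmarks * rawMBPerBookmark) / 1000) * r2DollarsPerGBMonth;
const cleanCost = ((bookmarks * cleanMBPerBookmark) / 1000) * r2DollarsPerGBMonth;

console.log(`raw: $${rawCost.toFixed(2)}/month`); // ~$0.68 per user
console.log(`clean: $${cleanCost.toFixed(2)}/month`); // ~$0.03 per user
// $0.68 of a $2 seat is untenable; $0.03 is noise. Hence the cleanup step.
```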

I also calculated whether I could afford to use Inngest. The per-user price was too high until I discovered that batching most events could reduce the cost to a manageable amount.

I also evaluated two more fantastic vendors: Microlink, which fetches the bookmark's metadata, and Browsercat, which provides a hosted Playwright solution. Unfortunately, their pricing models wouldn't fit my use case. I wanted to price the seat at $2, and these two providers would eat up all the margin.

Later I explored implementing hybrid search. OpenAI's pricing at the time was $0.10 per million tokens, which meant $0.60 per user per month. It was too pricey, but some months later they released a new model for only $0.02. Even though that price made semantic search viable, I had already migrated the search to the client.
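
For reference, the arithmetic (the ~6M tokens per user per month is back-solved from the two figures above, not separately measured):

```typescript
// Embedding cost per user per month. The token volume is back-solved
// from the prices above: $0.60 at $0.10 per 1M tokens => ~6M tokens.
const tokensPerUserPerMonth = 6e6;

const costAt = (dollarsPerMTok: number) =>
  (tokensPerUserPerMonth / 1e6) * dollarsPerMTok;

console.log(costAt(0.1).toFixed(2)); // 0.60 -> too pricey for a $2 seat
console.log(costAt(0.02).toFixed(2)); // 0.12 -> suddenly viable
```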

With the release of snowflake-arctic-embed-xs, I wanted to see if I could embed all the bookmarks in memory. This was needed since implementing a disk-based vector database was not in scope. I calculated that it would need ~350MB, which is not great. But this space is moving fast, and small models are becoming more attractive, so I will wait a bit to see how it develops.
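
A sketch of that memory estimate: snowflake-arctic-embed-xs produces 384-dimensional vectors, and the chunks-per-bookmark count is my assumption, back-solved to land near the ~350MB figure:

```typescript
// In-memory vector storage for all bookmarks, fp32, ignoring index overhead.
const bookmarks = 5_000;
const chunksPerBookmark = 45; // assumed: long pages get split into chunks
const dims = 384; // snowflake-arctic-embed-xs output dimensionality
const bytesPerFloat = 4; // fp32

const bytes = bookmarks * chunksPerBookmark * dims * bytesPerFloat;
console.log(`~${(bytes / 1e6).toFixed(0)} MB`); // ~346 MB
// Quantizing to int8 or fp16 would cut this 2-4x, which is why smaller
// models and quantization make this approach more attractive over time.
```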

Lastly, one of my biggest fears about building a local-first app was ensuring that users would be able to hold all the data on the client side. I focused only on bookmarks, since that is where most of the weight is likely to go.

  • Origin Private File System (OPFS): To make bookmarks readable offline, I want to save a copy on the user's device. Saving them all, in the worst-case scenario, is ~3GB. This means that with Chromium's 80% storage quota, I could support any device with more than 4GB of free storage. Nice.

  • In-memory full-text search: I wanted to know whether I could keep all the bookmark bodies in memory and run a BM25-based search with orama. I don't know much about the inverted indexes and other data structures of full-text search databases. But if we assume there is no overhead (unlikely), having all the bookmarks' text in memory would take around ~350MB. I haven't discarded this approach, but it's definitely something I need to look into more deeply. I'm currently exploring whether SQLite + FTS5 would allow keeping those indexes on disk instead of in memory. The sketch after this list puts rough numbers on both bullets.
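
The per-bookmark sizes here are assumptions back-solved from the totals above, and the quota check uses the standard Storage API so the worst case can be verified per device:

```typescript
// Worst-case client-side footprint, plus a runtime quota check.
// Both per-bookmark sizes are assumptions back-solved from the
// ~350 MB and ~3 GB figures above.
const bookmarks = 5_000;
const textKBPerBookmark = 70; // 5,000 x 70 KB ~= 350 MB held in memory
const offlineMBPerBookmark = 0.6; // 5,000 x 0.6 MB ~= 3 GB saved to OPFS

const opfsBytes = bookmarks * offlineMBPerBookmark * 1e6;

// navigator.storage.estimate() reports the origin's storage quota and usage,
// so the napkin assumption can be checked against each real device.
async function worstCaseFits(): Promise<boolean> {
  const { quota = 0 } = await navigator.storage.estimate();
  return quota > opfsBytes;
}

worstCaseFits().then((ok) =>
  console.log(ok ? "worst case fits in OPFS" : "need eviction or partial sync"),
);
```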

Current results

I've received the first 200 signups, proving that some assumptions were too pessimistic.

  • ~1000 → ~200 stories a month/user: It’s still early, but obviously most users still have few bookmarks and subscribe to very few feeds. This number alone makes the cost per user go down to $0.02, which unlocks removing the paywall altogether without going bankrupt.

  • 0.36MB → 0.13MB per bookmark: It turns out that bookmarks can be compressed more than I initially thought. Using WebP, limiting the size, and getting rid of all CSS/JS/fonts is making bookmarks very lightweight.

  • 5% → 3% overlap: My intuition says that this number will go higher. As more people join the platform, the likelihood that you will bookmark a story that someone else has already bookmarked should increase. But at the moment, users are more unique than I thought.

  • 20% → 108% feeds per bookmark: This means that bookmarking one story finds, on average, 1.08 feeds. How can this be? Is RSS so popular that websites include more than one feed? Nope. This is one of the biggest surprises, and it was a complete miscalculation on my end. It turns out the system has feedback loops: a bookmark recommends a feed, the feed contains stories, and those stories discover different feeds. For instance, subscribing to Hacker News is a very fast way to discover many other feeds. The toy model below shows how that compounding can push 20% past 100%.
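
This is a toy model fitted after the fact, not something measured: if a bookmark directly finds f feeds, and each discovered feed's stories surface further feeds at some rate r, the total is the geometric series f(1 + r + r² + ...) = f/(1−r):

```typescript
// Toy model of the feed-discovery feedback loop (post-hoc fit, not measured).
// If a bookmark directly finds f feeds, and each feed's stories surface new
// feeds at rate r, the total converges to f / (1 - r) for r < 1.
const f = 0.2; // the original 20% assumption
const total = 1.08; // the observed 108%

const r = 1 - f / total; // back-solved loop gain
console.log(r.toFixed(2)); // ~0.81: each discovered feed leads to ~0.8 more
```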

It’s too early to judge the usefulness of the numbers per se, but doing the exercise was a very important step in driving the architecture. It took me only one hour to put an Excel sheet together. A very low cost compared to all the hours I’ve spent implementing fika.

Coda

I hope this post encourages you to check the fridge before cooking. Don't be afraid to do some basic calculations; doing so will not make others see you as a lesser alpha. It’s not over-engineering; it’s not premature optimization. It’s a very basic form of hedging, with a ridiculously low cost and a potentially bonkers return. Because the best code is always the one that is never written.

Cheers,
Pao
