Pao Ramen

Local-first search

I hear a lot of people debating whether to adopt a local-first architecture. This approach is often marketed as a near-magical solution that gives users the best of both worlds: cloud-like collaboration with zero latency and offline capability. Advocates even claim it can improve developer experience (DX) by simplifying state handling and reducing server costs. After two years of building applications in this paradigm, however, I've found the reality is more nuanced. Local-first applications do have major benefits for users, but I think the DX claims fall off a cliff as your application grows in complexity or volume of data. In other words, the more state your app manages, the harder your life as a developer becomes.

[Figure: DX differences between local-first and cloud-based applications]

One area I struggled with the most was implementing full-text search for Fika, my local-first app that I used to write this very blog post. Now that I'm finally happy with the solution, I want to share the journey with y'all to illustrate how local-first ideals can be at odds with practical constraints.

Search at Fika

Fika is a local-first web application built with Replicache (for syncing), Postgres (as the authoritative database), and Cloudflare Workers (as the server runtime). It's a platform for content curators, and it has three types of entities that need to be searchable: stories, feeds, and blog posts. A power user can easily have ~10k of those entities, and each entity can contain up to ~10k characters of text. In other words, we're dealing with on the order of 100 million characters of text that might need to be searched locally.

To deliver a good search experience for Fika's users, I had a few specific requirements in mind:

Good results: This sounds obvious, but many search solutions don't actually deliver relevant results. In information retrieval terms, I wanted to maximize Recall@k: roughly, the fraction of all relevant documents that appear in the top k results.

Fuzziness: We don't always remember the exact word we're looking for. Was it "index" or "indexing"? Does décérébré really have 5 accents? The search needs to tolerate small differences in spelling and word forms. Techniques like stemming, lemmatization, or more generally typo-tolerance (a form of fuzzy search) help ensure that minor mismatches don't result in zero hits.

Highlighting: A good search UI should make it obvious why a result matched the query. Showing the matching keywords in context (highlighted in the result snippet) helps users understand why a given item is in the results.

Hybrid search: This is a fancy term for combining traditional keyword search with vector-based semantic search. In a hybrid approach, the search engine can return results that either literally match the query terms or are semantically related via embeddings. The goal is to get the precision of keyword search (sparse) plus the recall of semantic search (dense). (A rough sketch of the fusion idea follows at the end of this section.)

Local-first: Search is one of the main features of Fika, and it needs to work reliably and fast under any network condition. Other excellent bookmark managers like Raindrop struggle to support offline use, since they are built on a cloud-based architecture. This is a major differentiation point for Fika.

With these goals in mind, I iteratively tried several implementations. The journey took me from a purely server-based approach to a fully local solution. Let's dive in!
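Before diving into the attempts, here's that hybrid-search fusion sketch: a minimal reciprocal rank fusion (RRF) implementation. It's purely illustrative, not what Fika or any particular engine necessarily does.

```ts
// Reciprocal Rank Fusion (RRF): merge two ranked result lists into one.
// Illustrative sketch only; `k` dampens the influence of lower-ranked hits.
type Ranked = { id: string }[];

function reciprocalRankFusion(keyword: Ranked, semantic: Ranked, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [keyword, semantic]) {
    list.forEach((hit, rank) => {
      // Each list contributes 1 / (k + rank) to the document's fused score.
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```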
First attempt: Postgres is awesome ok

My first attempt was decidedly not local-first: I just wanted a baseline server-side implementation. I drank the Kool-Aid of "Postgres can do everything out of the box" and implemented hybrid search directly in Postgres. The idea was seductive: if I could make Postgres handle full-text indexing and vector similarity, I wouldn't need a separate search service or pipeline. Fewer moving parts, right?

[Figure: First architecture: Pure Postgres]

Well, reality hit hard. I ran into all sorts of quirks. For example, the built-in unaccent function (to strip accents for diacritic-insensitive search) isn't marked as immutable in Postgres, which means you can't use it in index expressions without jumping through hoops. Many other rough edges cropped up, turning what I hoped would be an "out-of-the-box" solution into wasted time reading the Postgres docs and StackOverflow. In the end, I did get a Postgres-based search working, but the relevance of the results just wasn't great. Perhaps one reason is that vanilla Postgres full-text search doesn't use modern ranking algorithms like BM25 for relevance scoring; its default ts_rank is a more primitive approach. This was 2 years ago, so it would be worth trying again, since ParadeDB's new pg_search looks very promising!

Second attempt: Typesense

Having been humbled by Postgres, my next move was to try Typesense, a modern open-source search engine that brands itself as a simpler alternative to heavyweight systems like Elasticsearch. Within a couple of hours of playing with Typesense, I had an index up and running. It pretty much worked out of the box. All the features I wanted were supported with straightforward configuration, and the developer experience was night-and-day compared to wrestling with Postgres.

[Figure: Second architecture with Typesense]

To integrate Typesense, I did have to modify the pipeline so that whenever a story/feed/post was created or updated in Postgres, it would also be upserted into the Typesense index (and deleted from the index if removed from the DB). But this was pretty manageable, far easier than dealing with Postgres. Sometimes we obsess over simple architectures that end up being harder to deal with. Simple ≠ easy.

At this point, I had a solid server-side search solution. However, it wasn't local. Searches still had to hit my server. This meant no offline search, and even though Typesense is fast, it couldn't match the latency of having the data on device. For Fika's use case (quickly pulling up an article on a phone while offline or on a spotty connection) I wanted to push further. It was time to bring the search engine into the browser. <suspense music>

Third attempt: Local-first with Orama

If you've ever looked into client-side search libraries, you might have noticed the landscape hasn't changed much in the last decade. We have classics like Lunr.js or Elasticlunr, but many are unmaintained or not designed for the volumes of data I was dealing with. Then I came across a relatively new, shiny project called Orama. Orama is an in-memory search engine written in TypeScript that runs entirely in the browser. The project supports full-text search, vector search, and even hybrid search. It sounded almost too good to be true, and despite the landing page not being ugly (always a bad sign), I decided to give it a shot.

[Figure: Third architecture: Local-first Orama]

To my surprise, Orama delivered on a lot of its promises. Setting it up was straightforward, and it indeed supported everything I needed.
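To give a flavour of the API, the setup looked roughly like this. This is a simplified sketch with made-up fields, written against the Orama API as I remember it; check the official docs for the current signatures.

```ts
import { create, insert, search } from '@orama/orama';

// A simplified sketch of a local index for Fika-like entities.
// Field names here are illustrative, not Fika's actual schema.
const db = await create({
  schema: {
    title: 'string',
    body: 'string',
    kind: 'string', // 'story' | 'feed' | 'post'
  },
});

await insert(db, {
  title: 'Sourdough for beginners',
  body: 'A long article about bread…',
  kind: 'story',
});

// Keyword (full-text) search, with a bit of typo tolerance.
const results = await search(db, {
  term: 'bread',
  tolerance: 1,
});
```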
I was able to index my content and run keyword searches locally. This was pretty mind-blowing: I had Elasticsearch-like capabilities running in the browser, on my own data, with no server round-trip. Awesome. But (and they say everything that comes before a "but" is bullshit) there were new challenges.

The first issue was data sync and storage. To get all my ~10k entities into the Orama index, I needed to feed their text content to the browser. Fika uses Replicache to sync data, but originally I wasn't syncing the full bodies of articles to the client (just titles and metadata). It turns out that storing tons of large text blobs in Replicache's IndexedDB store can slow things down. Replicache is optimized for lots of small key/value pairs, and shoving entire 10k-character documents into it pushes it beyond its sweet spot (the Replicache docs suggest keeping key/value sizes under 1 MB). To work around this, I adopted a bit of a hack: I spun up a second Replicache instance dedicated to syncing just the indexable content (the text of each story/feed/post). This ran in parallel with the main Replicache instance (which handled metadata, etc.) on a Web Worker. With this separation, I could keep most of the app snappy, and only the search-specific data sync would churn through large text blobs.

It worked… sort of. The app's performance improved, but having two replication streams increased the chances of transaction conflicts. Replicache's sync, because of its stateful nature, demands a strong DB isolation level, so more concurrent data meant more retries on the push/pull endpoints. In other words, I achieved local search at the expense of a more complex and somewhat more brittle syncing setup. I told you the DX gets more complicated at the limit! And we are just getting started.

I also planned to add vector search on the frontend to complement Orama's keyword search. First I tried using transformers to run embedding models in the browser and do all semantic indexing on-device. But an issue with Vite would force the user to re-download the ML model on every page load, which made that approach infeasible. (That issue has since been solved, so it remains an experiment to try again.) Later I tried generating the embeddings on the server and syncing them to the client, but the napkin math was not napkin mathing (there's a code version of the calculation below):

With a context length of 512 tokens, most documents (~10k chars each) would need at least 5 chunks to cover their content (10,000 chars / ~4 chars per token ≈ 2,500 tokens, which is ~5 × 512-token chunks). That means ~5 embedding vectors per document. For ~10k documents, that's on the order of 50k vectors. If each embedding has 768 dimensions (a common size for BERT-like models), that's 38.4 million floats in total. 38.4M floats, at 4 bytes each (32-bit float), would be ~150 MB of raw numeric data to store and transmit. And because we use JSON for transport, those floats would be encoded as text. The actual payload would balloon to somewhere between ~384–500 MB of JSON 😱 (each float turned into a string, plus quotes/commas overhead). That's a lot to sync, store, and keep in memory.

Could we quantize the vectors or use a smaller embedding model? Sure, a bit: for example, Snowflake's Arctic Embed XS model has 384-dimensional vectors, which would halve the size. But we'd still be talking hundreds of MB of data. And the more we optimized for size (bigger context lengths, aggressive quantization, etc.), the more the semantic search quality would degrade.
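Here's the same back-of-the-envelope calculation in code form; every constant is an assumption from the reasoning above, not a measurement.

```ts
// Back-of-the-envelope sizing for syncing embeddings to the client.
// All constants are assumptions from the text above, not measurements.
const docs = 10_000;            // entities per power user
const charsPerDoc = 10_000;     // upper bound per entity
const charsPerToken = 4;        // rough average for English text
const contextLength = 512;      // tokens per embedding chunk
const dims = 768;               // BERT-like embedding size
const bytesPerFloat = 4;        // float32

const tokensPerDoc = charsPerDoc / charsPerToken;             // 2,500
const chunksPerDoc = Math.ceil(tokensPerDoc / contextLength); // 5
const vectors = docs * chunksPerDoc;                          // 50,000
const floats = vectors * dims;                                // 38.4 million
const rawMB = (floats * bytesPerFloat) / 1e6;                 // ~154 MB
// JSON roughly triples that: each float becomes ~10+ characters of text.
const jsonMB = (floats * 10) / 1e6;                           // ~384 MB and up

console.log({ chunksPerDoc, vectors, floats, rawMB, jsonMB });
```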
So, I decided to yank off the wax strip and toss the hairy dream of local semantic search straight into the trash. In fact, to really convince myself, I ran two versions of Fika for a while: one version used my earlier server-side hybrid search (Typesense with both keywords and vectors), and the other used local-only keyword search (Orama). After a couple of weeks of using both, I came to a counterintuitive conclusion: the purely keyword-based local search was actually more useful. The hybrid semantic search was, in theory, finding related content via embeddings, but in practice it often led to noisier results. As I was tuning the system, I found myself giving more and more weight to the keyword matches. Perhaps it's just how I search: I tend to remember pointers to the things I look for and rarely search for abstract terms. I could have bolted on a re-ranker to refine the results, but that would be a hassle to implement locally. And more importantly, the hybrid results sometimes failed the "why on earth did this result show up?" test. For example: looking for a recipe, I typed "bread" and got an AI paper with nothing highlighted… oh wait, I see... there's a paragraph in this AI paper that mentions croissants. Are croissants semantically related to bread?

Opaque semantic matches can sabotage user trust in search results. With keyword search, I either get results that contain the words I typed or I don't. It's clear. With semantic search, you sometimes get results that are "kind of" related to your query, but without any obvious indication why. That can be frustrating. So I ultimately dropped the hybrid approach and went all-in on Orama for keyword search. Immediately, the results felt more focused and relevant to what I was actually looking for. Bonus: the solution was cheaper (no need to generate embeddings) and simpler to operate in production.

However, I haven't addressed the elephant in the room yet: Orama is an in-memory engine. Remember that ~100 MB of text? To index it, Orama builds up its own internal data structures in memory, which for full-text search can easily be 2–3× the size of the raw text. I was allocating ~300 MB of RAM in the browser just for the search index. That's ~300 MB of the user's memory just in case they perform a search during that session, and in many sessions they won't. What a waste. That's far from good engineering.

But we are not done with the bad news. On the server, you can buy bigger machines with predictable resources that match your workload. But with local-first, you have to deal with whatever device your customer has. For low-end devices (phones), the worst part was not the memory overhead but the index build time. Every time the app loaded, I had to take all those documents out of IndexedDB and feed them into Orama to (re)construct the search index in memory. I offloaded this work to a Web Worker thread so it didn't block the UI, but on a better-than-average mobile phone (my Google Pixel 6), this indexing process was taking on the order of 9 seconds. Think about that: if you open the app fresh on your phone and want to search for something right away, you'd be waiting ~9 seconds before the search could return anything. That's worse than a cloud-based approach, and not an acceptable trade-off.

I tried to mitigate this by using Orama's data persistence plugin, which lets you save a prebuilt index to disk and load it back later.
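Usage is roughly as follows; this is a sketch, the persist/restore signatures are from memory, and where you store the resulting blob (IndexedDB, OPFS, …) is up to you.

```ts
import { create } from '@orama/orama';
import { persist, restore } from '@orama/plugin-data-persistence';

// Sketch: serialize the whole in-memory index, stash it somewhere,
// and restore it on the next load instead of re-indexing from scratch.
const db = await create({ schema: { title: 'string', body: 'string' } });

const serialized = await persist(db, 'json'); // one big JSON blob
// …save `serialized` to IndexedDB or similar…

const restored = await restore('json', serialized); // full deserialize on load
```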
Unfortunately, this plugin uses a pretty naive approach (essentially serializing the entire in-memory index to a JSON blob). Restoring that still took on the order of seconds, and it also created a huge file on disk. I realized that what I really needed was a disk-based index, where searches could be served by reading just the relevant portions into memory on demand (kind of like how SQLite or Lucene operate under the hood).

Last and final solution: FlexSearch

As if someone had been listening to my frustrations, around March 2025 an update to the FlexSearch library was released that added support for persistent indexes backed by IndexedDB. In theory, this gave me exactly what I wanted: the index lives on disk (so app reloads don't require full re-indexing), and memory is used only as needed to perform a search (in an efficient, paging-friendly way). I jumped on this immediately. The library's documentation was thorough, and the API was fairly straightforward to integrate. I basically replaced the Orama indexing code with FlexSearch, configuring it to use IndexedDB for persistence. The difference was night and day:

On a cold start (a new device with no index yet), the indexing process incrementally builds the on-disk index. This initial ingestion is still a bit heavy (it might take a few minutes to pull all the data and index it), but it's a one-time cost per device.

On subsequent app loads, the search index is already on disk and doesn't need to be rebuilt in memory. A search query lazily loads the necessary portions of the index from IndexedDB. The perceived search latency is now effectively zero once that initial indexing is done. Even on low-end mobile devices, searches are near-instant because all the heavy lifting was done ahead of time.

Memory usage is drastically lower. Instead of holding hundreds of MB in RAM for an index that might not even be used, the index now stays in IndexedDB until it's needed. Typical searches only touch a small fraction of the data, so the runtime memory overhead is minimal. (If a user never searches in a session, the index stays on disk and doesn't bloat memory at all.)

At this point, I was also able to simplify my syncing strategy. I no longer needed that second Replicache instance continuously syncing full document contents in the background. Since the FlexSearch index persists between sessions, I could handle updates incrementally: I set up a lightweight diffing mechanism using Replicache's experimentalWatch API. Essentially, whenever Replicache applies new mutations from the server, I get a list of changed document IDs (created/updated/deleted). I compare those IDs to what's already indexed in FlexSearch. The difference tells me which documents I need to add to the index, which to update, and which to remove. Then, for any new or changed documents, I fetch just those documents' content (lazily, via an API call to get the full text) and feed it into the FlexSearch index. This acts as an incremental ingestion pipeline in the browser (sketched below). On a brand new device, it detects that no documents are indexed and starts pulling content in batches until the index is fully built. After that, updates are very small and fast.

This approach turned out to be surprisingly robust and easy to implement. By removing the second Replicache instance and doing on-demand content fetches, I reduced a lot of the database serialization conflicts. And because the index persists, even if the app crashes mid-indexing, we can resume where we left off next time.
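In sketch form, the pipeline looks something like this. The field names and the content endpoint are illustrative rather than Fika's actual code, and the FlexSearch persistent-index calls are written from memory against the 0.8 API, so double-check them against the docs.

```ts
import { Document, IndexedDB } from 'flexsearch';
import { Replicache } from 'replicache';

// Sketch of an incremental ingestion pipeline. Identifiers and the
// /api/content endpoint are illustrative, not Fika's actual code.
const index = new Document({
  document: { id: 'id', index: ['title', 'body'] },
});
await index.mount(new IndexedDB('search-index')); // persist the index on disk

declare const rep: Replicache; // the app's existing Replicache instance

rep.experimentalWatch(
  async (diffs) => {
    for (const diff of diffs) {
      const id = diff.key;
      if (diff.op === 'del') {
        await index.remove(id);
      } else {
        // Lazily fetch the full text only for created/updated documents.
        const { title, body } = await fetch(`/api/content/${id}`).then((r) => r.json());
        await index.add({ id, title, body });
      }
    }
    // Depending on configuration, persistent indexes may need an explicit commit here.
  },
  { initialValuesInFirstDiff: true }, // replays existing entries on a fresh device
);
```

The real pipeline also compares the incoming IDs against what's already persisted in the index, so reloading the app doesn't re-fetch documents that were indexed in a previous session.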
The end result is that search on Fika is now truly local-first: it works offline, it has essentially zero latency, and it returns very relevant results. The cost is the initial ingestion time on a new device and some added complexity in keeping the index in sync. But I'm comfortable with that trade-off.

Conclusion

As I mentioned at the beginning, the developer experience of building local-first software becomes more challenging as you push more complex, data-heavy features to the client side. Most of the time, you can't YOLO it. You have to do the math and account for the bytes and processing time on users' devices (which you don't control). In a cloud-based architecture you should also care about efficiency, but traditionally many web apps get away with sending fairly small amounts of data for each screen, so developers are less forced to think about, say, how many bytes a float32 takes up in JSON.

In the end, I would dismiss the DX claims and only recommend building local-first software if the benefits align with what you need:

It resonates with your values. You care about data ownership and the idea that an app should keep working even if its company goes on an "incredible journey". Your business model revolves around "backing up your data" instead of "lending your data".

The performance characteristics match your use case. Local-first often means paying a cost up front (initial data sync/download, indexing, etc.) in exchange for zero-latency interactions thereafter. Apps with short sessions need to optimize for fast initial load times, but apps with long sessions and lots of interactions benefit from local-first performance characteristics.

You need the out-of-the-box features it enables. Real-time collaborative editing, offline availability, and seamless sync are basically "free" with the right local-first frameworks. Retrofitting those features onto a cloud-first app is extremely difficult.

If none of the above are particularly important for your project, I'd say you can safely stick to a more traditional cloud-centric architecture. Stateless will always be easier than stateful. Local-first is not a silver bullet for all apps, and as I hope my story illustrates, it comes with its own very real trade-offs and complexities. But in cases where it fits, it's incredibly rewarding to see an app that feels as fast as a native app, works with or without internet, and keeps user data under the user's control. Just get your calculators ready, and maybe a good supply of coffee for those long debugging sessions. Good luck, and happy crafting!

Jul 31
