Why is everybody talking about sync engines?
And how game developers solved the web's biggest problem in the 90s
Oct 16, 2024
It's a sunny day in the countryside. The developers have gathered around the table and are awkwardly engaging in small talk, which they hate. Meeting for the first time in real life, they subtly glance at each other's legs and bums, surprised by the unexpected difference in height. Especially Ethan, who has to bend to get out of the bus.
Although all were pro-remote, they chose to spend a week together to tackle a shared problem: their product sucks. It's a task manager app called “ChoreCommander” that was disruptive 10 years ago when web apps were not a thing, but nowadays, they find themselves being replaced by more modern and faster products. The circle of life.
David, the CTO, is willing to give up on his ideals. He has been a hardcore Ruby on Rails developer for more than a decade and he even has a tattoo of Matz on his left calf. “But perhaps”, he thinks for the first time, “it's time to embrace change.”
“Rendering the whole world on each request, despite caching and JavaScript shenanigans, is holding us back,” he told the team in a sober tone.
“…and we neglected offline and real-time collaboration. Nowadays, customers expect products to work like Gmail or WhatsApp, good on spotty connections, and without having to refresh to get updates.”
Maria has been waiting for this moment for a long time. She has been a React advocate since 2004 despite the library not being that old, but she claims the ideas were all there in her head. She will not let this opportunity pass, and will finally get that juicy “Principal” title that she deserves.
“I really think we should adopt React + Redux. You know, I’m a bit of a functional programmer myself, so I think immutability would really help towards our goals.”
Dan is skeptical. “State shouldn't be global! Globals are evil!” he yells. “Didn’t we learn anything from Object Oriented Programming? State must be local to the components or it will be hard to reason about.”
Tania sits in the corner, nodding her head as she begins to speak. “Doesn't this mean that we will have N+1 fetching problems on the frontend? Why not adopt react-query?” She takes a drag from her vape, holds it for five long seconds—an eternity—then exhales slowly before adding, “Everyone’s doing it.”
Ryan is the oldest. Caressing his beard, he is not sure what to think of any of this. He is worried that components, being structured as a tree, will lead to loading cascades. He has recently seen a talk by some “React guy” that claims that “data fetching needs to happen at the route level,” which makes sense to him. After all, that's how it has always been done in the backend. But he is shy and does not dare to contribute.
Maria feels like everyone else’s opinions carry more weight, and her initial contribution was rather bland. Server Components come to mind, especially since Vercel—who scooped up all the React brains—seems to be heavily promoting them. “Have you guys heard about Server Components, our lord and sav…” Bam! Tadum! Crash. David just flipped the table, his face flushed with rage.
He is desperate. As a CTO, he can't let this slide. He was ChoreCommander’s first engineer 10 years ago, and has vested all his shares. On paper, he is a millionaire, but if the MRR keeps going down, he will be left with nothing. He is the Silicon Valley equivalent of Schrödinger's cat, both rich and poor at the same time.
“Stop yapping! You’re all just bickering over what everyone else in the industry is doing, and you are not even talking about our users' actual concerns. What about offline functionality? What about real-time collaboration? Why is it so hard to make a f***ing to-do app? Every developer on the planet has made one!” He is throwing things, he is losing it. Only Ethan can save them.
Ethan stands up and bangs his head against the ceiling lamp. Oh Ethan, why are you so tall? He is not even 20 years old, but he talks with the confidence of a developer who started his career punching cards in the 60s.
“You have no clue. All you web developers are a bunch of spoiled, entitled twats who’ve never solved a real engineering problem in your entire lives. I can't understand why you make everything so complicated. You are just converting database rows into HTML.”
He’s got everyone’s attention now. They’ve all been thoroughly humiliated, but deep down, each of them secretly wishes they could just be like Ethan.
“I come from a background in game programming, and we tackled this problem back in the ’90s.” He wasn’t even born then, but he always uses the Royal We. “What we need is a syncing engine, like in Duke Nukem 3D.”
What follows is the talk that Ethan should have given to the team. Instead, he pedaled away on a stolen bicycle, developed a game that flopped commercially, and now roams the streets of San Francisco, high on fentanyl. Meanwhile, David and the rest of the team sold the company for peanuts but landed comfortable positions at the acquiring firm, earning twice as much while working half as hard. So, in the end, it probably didn’t matter. But if you’re interested in sync engines, read on.
To learn how sync engines work, we are going to represent a scenario where two clients (Alice and Bob) do the following:
Alice and Bob read a counter from the server concurrently.
Alice increments the counter, and then Bob also does it.
Alice refreshes the page.
Let’s start by understanding how traditional web apps (nowadays called Multi Page Applications) work.
We use diagrams to represent the evolution of state and UI over time, where dotted lines are ephemeral state, and normal lines persistent state.
To read data, we request a “snapshot” of the server state over the network. We keep this state temporarily, meaning a refresh will fetch a newer snapshot. Writes are handled pessimistically: they are sent over the network and only displayed to the user once the client receives server confirmation. This approach has worked exceptionally well for many years, offering simplicity and scalability thanks to its stateless design. However, it does come with several issues that might be unacceptable for your product:
Reads always occur over the network, meaning offline reading isn't possible and every read pays the network latency.
The same goes for writes, which makes every interaction feel laggy and rules out offline support.
The stateless model can’t offer real-time collaboration, since the server doesn’t proactively push updates to the client.
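To make the baseline concrete, here is a minimal sketch of the Alice and Bob scenario in the traditional model. Everything in it is made up for illustration: `CounterServer` is an in-memory stand-in for a real server behind HTTP, and `TraditionalPage` is a hypothetical page whose only state is the last snapshot it received.

```typescript
// Hypothetical in-memory stand-in for a server; a real one sits behind HTTP.
class CounterServer {
  private counter = 0;

  // A read returns a snapshot of the current state.
  read(): number {
    return this.counter;
  }

  // A write is applied on the server, which replies with the confirmed value.
  increment(): number {
    this.counter += 1;
    return this.counter;
  }
}

// A page in a traditional app. Its state is ephemeral: it is whatever
// snapshot the last request returned.
class TraditionalPage {
  shown: number;
  private server: CounterServer;

  constructor(server: CounterServer) {
    this.server = server;
    this.shown = server.read(); // initial page load
  }

  // Pessimistic write: the UI changes only after the server confirms.
  clickIncrement(): void {
    this.shown = this.server.increment();
  }

  refresh(): void {
    this.shown = this.server.read();
  }
}

const server = new CounterServer();
const alice = new TraditionalPage(server);
const bob = new TraditionalPage(server);

alice.clickIncrement(); // Alice's page shows 1
bob.clickIncrement(); // Bob's page shows 2
const aliceBeforeRefresh = alice.shown; // still 1: nobody told Alice
alice.refresh();
const aliceAfterRefresh = alice.shown; // 2
```

Note that every method on the page needs the server to answer before anything changes on screen, which is exactly where the three issues above come from.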
Let's try to solve these three issues incrementally. To do so, we are going to start by introducing a sync engine.
A sync engine manages interactions with the network and maintains persistent storage for the local state. Both are stateful, nasty operations that we usually implement within our components. By extracting them into a separate layer, we free our components from them.
Now that we have persistent storage, let’s cache reads. This technique is often called stale-while-revalidate.
This minor change magically solves the first issue, dropping read latencies to zero (for stale data), and getting offline reads too.
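A rough sketch of cached reads could look like the following. The names (`CounterServer`, `SyncEngine`, `revalidate`) are invented for this example, the cache is a plain field standing in for real persistent storage, and the network call is synchronous to keep the sketch short. In a real engine the revalidation would be asynchronous and typically kicked off by the read itself.

```typescript
// Hypothetical in-memory server; the public field stands in for a network read.
class CounterServer {
  counter = 0;
}

class SyncEngine {
  // Stand-in for persistent local storage (assumption: one cached value).
  private cache: number | undefined = undefined;
  private server: CounterServer;
  online = true;

  constructor(server: CounterServer) {
    this.server = server;
  }

  // Reads never touch the network: return whatever is cached, even if stale.
  read(): number | undefined {
    return this.cache;
  }

  // Refresh the cache from the server when the network allows it.
  revalidate(): void {
    if (!this.online) return; // offline: keep serving the stale value
    this.cache = this.server.counter;
  }
}

const server = new CounterServer();
const engine = new SyncEngine(server);

engine.revalidate(); // first load fills the cache with 0
server.counter = 5; // someone else changes the counter
const staleRead = engine.read(); // 0, served instantly from the cache

engine.online = false;
engine.revalidate(); // no-op while offline
const offlineRead = engine.read(); // still 0, but at least we can read

engine.online = true;
engine.revalidate();
const freshRead = engine.read(); // 5
```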
Let's move forward and allow the client to mutate the state locally. This is often called an “optimistic update”, since the changes appear instantly but still need to be validated by the server.
This change addresses the second issue by enabling users to write offline, and making writes feel instantaneous. This is the reason why video games feel so responsive: when you play, you expect to see the effects of your actions with zero latency, while in the background, everything syncs to the server for validation and propagation to other users.
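Here is one way optimistic writes might be sketched for our counter. It assumes the only mutation is "increment", so the pending work can be tracked as a simple count. The class and method names are hypothetical, and a real engine would keep a queue of named mutations in persistent storage rather than a number in memory.

```typescript
// Hypothetical in-memory server.
class CounterServer {
  counter = 0;

  // Validate and apply a batch of increments; reply with the confirmed value.
  applyIncrements(count: number): number {
    this.counter += count;
    return this.counter;
  }
}

class SyncEngine {
  private confirmed: number; // last value confirmed by the server
  private pending = 0; // local increments not yet acknowledged
  private server: CounterServer;
  online = true;

  constructor(server: CounterServer) {
    this.server = server;
    this.confirmed = server.counter;
  }

  // What the UI renders: confirmed state plus pending local mutations.
  read(): number {
    return this.confirmed + this.pending;
  }

  // Optimistic write: visible immediately, synced when possible.
  increment(): void {
    this.pending += 1;
    this.sync();
  }

  // Send pending mutations to the server, if we can reach it.
  sync(): void {
    if (!this.online || this.pending === 0) return;
    this.confirmed = this.server.applyIncrements(this.pending);
    this.pending = 0;
  }
}

const server = new CounterServer();
const engine = new SyncEngine(server);

engine.online = false;
engine.increment();
engine.increment();
const shownOffline = engine.read(); // 2, with zero latency
const serverWhileOffline = server.counter; // 0, the server knows nothing yet

engine.online = true;
engine.sync();
const shownAfterSync = engine.read(); // 2, now confirmed by the server
```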
To sync local mutations to the server, there are two broad families of strategies:
Serializable: The server only accepts operations from clients that match its current state. If a client attempts to sync a local mutation with stale data, the server will reject it, requiring the client to refresh its data and try again. This is the simplest model and is found in many databases.
Commutative: Mutations must be constructed or applied in a way that exhibits the commutative property, meaning the order of operations doesn’t matter. For example, instead of saying “set the counter to X,” you can use “increment the counter,” which is a commutative operation. Both CRDTs (Conflict-free Replicated Data Types) and OT (Operational Transformation) fall into this category.
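The two families can be contrasted with a small sketch. Both classes are hypothetical: `VersionedServer` uses a version number as one possible way to detect stale clients, and `CommutativeServer` only accepts increments. Neither is meant as how any particular database, CRDT, or OT library does it.

```typescript
// Serializable: a write is accepted only if it was based on the current version.
class VersionedServer {
  value = 0;
  version = 0;

  set(newValue: number, baseVersion: number): boolean {
    if (baseVersion !== this.version) return false; // stale client: refetch and retry
    this.value = newValue;
    this.version += 1;
    return true;
  }
}

// Commutative: operations can be applied in any order with the same result.
class CommutativeServer {
  value = 0;

  increment(by: number): void {
    this.value += by;
  }
}

// Alice and Bob both read value 0 at version 0, then both try to write 1.
const versioned = new VersionedServer();
const aliceAccepted = versioned.set(1, 0); // true
const bobAccepted = versioned.set(1, 0); // false: the version is now 1
// Bob refetches (value 1, version 1) and retries on fresh data.
const bobRetryAccepted = versioned.set(versioned.value + 1, versioned.version); // true

// The same two increments, arriving in opposite orders.
const orderA = new CommutativeServer();
orderA.increment(1);
orderA.increment(2);
const orderB = new CommutativeServer();
orderB.increment(2);
orderB.increment(1);
// Both servers end at 3.
```

The serializable version pushes the retry work onto the client, while the commutative version pushes the work onto whoever designs the operations.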
Let's keep improving our simple sync engine and add real-time collaboration. We just need to allow the server to ping changes to the client:
With that, we have a fully working sync engine. To be more specific, we've roughly described how Replicache works, which is one of the many sync engine solutions out there.
We can stress test it by adding offline periods and seeing how it behaves. As expected, it supports offline reading and offline writing, and eventually both clients converge to the same state.
It's a powerful architecture that liberates you from the gnarliest parts of building web applications. Sync engines feel as transformative as React was. I highly recommend that everybody try building an app with this pattern. It’s a lot of fun.
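Putting the pieces together, this toy version lets us run the offline stress test on the counter. It is a sketch under heavy assumptions: all names are invented, the "network" is a synchronous in-memory callback, pushes that arrive while a client is offline are simply dropped and recovered with a read on reconnect, and increments are treated as commutative. It is not how Replicache is implemented.

```typescript
type PushListener = (value: number) => void;

// Hypothetical in-memory server that pushes every change to its subscribers.
class CounterServer {
  private counter = 0;
  private listeners: PushListener[] = [];

  read(): number {
    return this.counter;
  }

  subscribe(listener: PushListener): void {
    this.listeners.push(listener);
  }

  // Apply a batch of increments, then push the new value to every client.
  applyIncrements(count: number): void {
    this.counter += count;
    for (const listener of this.listeners) listener(this.counter);
  }
}

class SyncClient {
  private confirmed: number; // last value received from the server
  private pending = 0; // local increments not yet sent
  private online = true;
  private server: CounterServer;

  constructor(server: CounterServer) {
    this.server = server;
    this.confirmed = server.read();
    // Pushes that arrive while offline are dropped in this sketch.
    server.subscribe((value) => {
      if (this.online) this.confirmed = value;
    });
  }

  read(): number {
    return this.confirmed + this.pending;
  }

  increment(): void {
    this.pending += 1;
    this.flush();
  }

  goOffline(): void {
    this.online = false;
  }

  goOnline(): void {
    this.online = true;
    this.confirmed = this.server.read(); // catch up on missed pushes
    this.flush();
  }

  private flush(): void {
    if (!this.online || this.pending === 0) return;
    const count = this.pending;
    this.pending = 0; // the push triggered below already includes these
    this.server.applyIncrements(count);
  }
}

const server = new CounterServer();
const alice = new SyncClient(server);
const bob = new SyncClient(server);

bob.goOffline();
alice.increment(); // server is at 1, Alice shows 1, Bob misses the push
bob.increment();
bob.increment();
const bobWhileOffline = bob.read(); // 2: 0 confirmed + 2 pending
bob.goOnline(); // Bob catches up to 1, then syncs his 2 increments
const aliceFinal = alice.read(); // 3, pushed by the server without a refresh
const bobFinal = bob.read(); // 3
```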
I’ve certainly glossed over many details regarding the actual implementation of a sync engine. We haven't discussed the specifics of mutation operations or how servers and clients negotiate which data needs to be fetched. This is merely a high-level introduction to the topic.
If you want to learn more, I highly recommend Matthew Weidner's exceptional article, which goes deeper into the topic, and of course the canonical Ink and Switch article on Local-first.
I hope some parts of this article brought a smile to your face or even made you laugh. I like to sprinkle in hints to show that my articles are not generated by AI. So far, sense of humor seems to be working fine as a modern Turing test.
Yours truly, Ethan.