Engineering · April 12, 2026 · 8 min read

Full File Snapshots vs Diffs: Why We Chose Snapshots

Binary diffs sound efficient in theory. In practice, they're a nightmare for game assets. Here's our technical deep-dive on why full file snapshots are the right choice.

By Victor Chanet

The elegant-looking wrong answer

When you're designing a source control system, there's an idea that sounds too elegant to resist: instead of storing a full copy of every file at every version, just store the diffs. A 500 MB texture that changed a few tiles only needs a tiny delta from the last version. Storage costs plummet. It's beautiful.

It's also wrong. Not a little wrong — fundamentally wrong for game assets. This post is about why we threw out the elegant answer and chose the boring one, and how that decision ripples through every other system we built.

Why binary diffs break on game assets

Binary diffs work well when the file format has local stability — when a small semantic change in the source produces a small contiguous change in the bytes on disk. Text files usually satisfy this. XML, JSON, even YAML produce predictable diffs where a modified line touches contiguous bytes.

Game assets rarely satisfy it. Consider three common cases:

  • Compressed files. Textures, audio, and many 3D formats are internally compressed. A single-pixel change in the source image, re-compressed, produces different bytes across the entire file — because that's what compression does. Your "small edit" becomes a near-complete rewrite at the byte level.
  • Serialized object graphs. Unreal's .uasset format, Unity's binary scene files, and similar serialized formats interleave many objects with offset tables. A change to one object shifts every subsequent offset, which means the "diff" is scattered across the whole file.
  • Re-imports and re-exports. Artists rarely just edit bytes. They open the file in a DCC tool, make changes, and export. The export path frequently reorders internal structures — even changing nothing produces a different byte layout on the next save.

In all three cases, a "diff" ends up being almost as large as the file itself. You've paid for the complexity of a diff system and gotten none of the savings.
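The compression case is easy to see with a toy experiment. Here is a minimal sketch using Python's standard zlib module (a stand-in for real asset compression, not any particular game format): compress two inputs that differ in exactly one byte and measure how far into the compressed streams the bytes stay identical.

```python
import zlib

# Two inputs that differ in exactly one byte, near the start.
original = b"lorem ipsum dolor sit amet " * 200
edited = b"Lorem ipsum dolor sit amet " + original[27:]

ca = zlib.compress(original)
cb = zlib.compress(edited)

# Length of the common prefix of the two compressed streams.
prefix = 0
while prefix < min(len(ca), len(cb)) and ca[prefix] == cb[prefix]:
    prefix += 1

print(f"compressed size: {len(ca)} bytes, identical prefix: {prefix} bytes")
```

The compressed outputs diverge almost immediately after the stream header, even though the source inputs are 99.98% identical. A byte-level diff of the compressed files would be nearly as large as the files themselves.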

The reliability cost is worse than the storage cost

Storage is cheap. We can afford to store a few hundred versions of a texture. What we can't afford is an unreliable restore path.

Diff-based systems have a fundamental property: to restore version N, you need version 0 plus every diff up to N. If any diff in that chain is corrupted, lost, or misapplied, every subsequent version is unrecoverable. Many systems mitigate this with periodic full snapshots, but the failure mode — "this specific old version is just gone" — still lurks.

Full snapshots are aggressively boring. Every version is a complete, self-contained file. Restoring version N is a single read. There's no chain. A corrupted old version affects only that version. You can delete arbitrary old versions to free space without worrying about breaking anything downstream.
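The contrast is easy to make concrete with a toy model (the names and patch format here are illustrative, not USourceControl's actual code): a diff chain must apply every link in order, while a snapshot store answers any version with one keyed read.

```python
def restore_from_chain(base: bytes, diffs: list[tuple[int, bytes]]) -> bytes:
    """Version N needs the base plus every diff up to N, applied in order.
    Lose or corrupt one link and every later version is unrecoverable."""
    data = bytearray(base)
    for offset, patch in diffs:
        data[offset:offset + len(patch)] = patch
    return bytes(data)


def restore_snapshot(store: dict[str, bytes], version_id: str) -> bytes:
    """Every version is self-contained: one read, no chain to walk."""
    return store[version_id]
```

Deleting any entry from `store` affects only that version; deleting one element of `diffs` silently corrupts every version after it.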

For a system whose entire job is "never lose work, ever," this is the right trade. We'll pay for the storage.

Content-addressable storage recovers most of the savings

Here's where the story gets interesting. If you store every file by the SHA-256 hash of its content, identical files are deduplicated automatically. Commit the same texture twice and you store the bytes once — regardless of whether it came from the same user, the same project, or a different project entirely.

This matters more than you'd expect for game assets. A texture that hasn't changed across 500 commits takes up storage for one copy, not 500. A prefab copy-pasted across three projects is one copy of bytes. A packaged build that's slightly different every night still shares most of its internal textures with the previous build.

What you're left paying for is genuinely different content — which is the thing you actually want versioned. For our workloads this recovers almost all the storage savings a diff system would have provided, without any of the reliability cost. It's one of those rare engineering decisions where the boring answer turns out to be the efficient one too.
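A minimal sketch of the idea, assuming an in-memory blob store (the real storage layer will differ, but the dedup property is the same): blobs are keyed by the SHA-256 of their bytes, so committing identical content twice stores it once.

```python
import hashlib


class BlobStore:
    """Content-addressable store: the key is the SHA-256 of the bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Identical content hashes to the same key: deduplication is free.
        self._blobs.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def blob_count(self) -> int:
        return len(self._blobs)


store = BlobStore()
texture = b"\x89PNG fake texture bytes"
h1 = store.put(texture)  # committed in project A
h2 = store.put(texture)  # same texture committed again in project B
print(h1 == h2, store.blob_count())  # same hash, one stored blob
```

Nothing in `put` knows or cares about users, projects, or version history; dedup falls out of the addressing scheme itself.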

The sync protocol becomes simpler too

Snapshot-per-version plus content-addressable storage makes the sync protocol nearly trivial. To sync, the client sends the server a list of hashes it has. The server replies with the list of hashes the client is missing. The client downloads those hashes. Done.

No diff application, no chain walking, no special cases for "what if my ancestor is corrupted." Resumable transfers are free — if a download stops halfway, the client just re-requests the missing hash on next sync. The protocol doesn't need to know anything about version history; it just moves content-addressed blobs.
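As a sketch of the exchange (a hypothetical in-memory stand-in for the real client/server code): the entire negotiation is one set difference, and resuming an interrupted sync is just running the same loop again.

```python
def hashes_missing_on_client(client_has: set[str], server_has: set[str]) -> set[str]:
    """The server's reply: everything it has that the client doesn't."""
    return server_has - client_has


def sync(client_blobs: dict[str, bytes], server_blobs: dict[str, bytes]) -> None:
    """One round trip: report what we have, download what we're missing.
    If the loop is interrupted, re-running it picks up where it left off,
    because already-downloaded hashes simply drop out of the missing set."""
    missing = hashes_missing_on_client(set(client_blobs), set(server_blobs))
    for digest in missing:
        client_blobs[digest] = server_blobs[digest]  # plain blob download
```

Running `sync` a second time is a no-op: the missing set is empty, which is exactly the resumability property described above.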

Simplicity here pays off in reliability. Less code means fewer places for bugs to hide. When someone reports a sync issue, we can diagnose it in an afternoon, not a week.

What we give up

No design is free. Full snapshots have real downsides we live with:

  • Higher absolute storage. Even with deduplication, frequently-changing binary files cost more to store than they would under a diff system. We pass some of that along in pricing — storage caps scale with plan tier — but it's a real cost.
  • Less sophisticated diff views. For text files we show line-level diffs by computing them at read time. For binary files, "this version vs that version" is just two files side by side — there's no cleverness to surface what changed. For binary formats, cleverness would have been lies anyway, so this feels honest.

Neither is a blocker for shipping games. Both are trades we make consciously.

The boring answer is usually the right answer

There's a pattern we've noticed across many system design decisions at USourceControl: the clever answer introduces failure modes, the boring answer doesn't. Full snapshots over diffs. Server-authoritative locks over optimistic locks. SHA-256 integrity checks over checksums. Opinionated defaults over configuration.

None of these are novel ideas — they're well-worn choices that other systems have made. What's novel is actually making them in a product aimed at game teams, where the pressure to look impressive in a benchmark is strong, but the job that matters is "don't lose work."

Boring wins. Ship games.
