Engineering · April 12, 2026 · 8 min

Full File Snapshots vs Diffs: Why We Chose Snapshots

Binary diffs sound efficient in theory. In practice, they're a nightmare for game assets. Here's our technical deep-dive on why full file snapshots are the right choice.

By Victor Chanet

The elegant-looking wrong answer

When you're designing a source control system, there's an idea that sounds too elegant to resist: instead of storing a full copy of every file at every version, just store the diffs. A 500 MB texture that changed a few tiles only needs a tiny delta from the last version. Storage costs plummet. It's beautiful.

It's also wrong. Not a little wrong. Fundamentally wrong for game assets. This post is about why we threw out the elegant answer and chose the boring one, and how that decision ripples through every other system we built.

Why binary diffs break on game assets

Binary diffs work well when the file format has local stability, where a small semantic change in the source produces a small contiguous change in the bytes on disk. Text files usually satisfy this. XML, JSON, even YAML produce predictable diffs where a modified line touches contiguous bytes.

Game assets rarely satisfy it. Consider three common cases:

  • Compressed files. Textures, audio, and many 3D formats are internally compressed. A single-pixel change in the source image, once re-compressed, can produce different bytes across much of the file, because in most compressed formats each output byte depends on the data encoded before it. Your "small edit" becomes a near-complete rewrite at the byte level.
  • Serialized object graphs. Unreal's .uasset format, Unity's binary scene files, and similar serialized formats interleave many objects with offset tables. A change to one object shifts every subsequent offset, which means the "diff" is scattered across the whole file.
  • Re-imports and re-exports. Artists rarely just edit bytes. They open the file in a DCC tool, make changes, and export. The export path frequently reorders internal structures. Even a save with no changes can produce a different byte layout.

In all three cases, a "diff" ends up being almost as large as the file itself. You've paid for the complexity of a diff system and gotten none of the savings.
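The offset-table case is easy to reproduce with a toy container. The sketch below uses a hypothetical format (not .uasset or any real engine format): an object count, a table of absolute offsets, then the payloads. The names `pack_objects`, `read_offsets`, and `changed_offsets` are illustrative only.

```python
import struct


def pack_objects(objects: list[bytes]) -> bytes:
    """Toy container: object count, absolute offset table, then payloads."""
    header_size = 4 + 4 * len(objects)
    offsets = []
    cursor = header_size
    for obj in objects:
        offsets.append(cursor)
        cursor += len(obj)
    header = struct.pack("<I", len(objects))
    header += struct.pack(f"<{len(objects)}I", *offsets)
    return header + b"".join(objects)


def read_offsets(blob: bytes) -> list[int]:
    """Read the offset table back out of a packed container."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}I", blob, 4))


def changed_offsets(before: bytes, after: bytes) -> int:
    """Count offset-table entries that differ between two containers."""
    return sum(a != b for a, b in zip(read_offsets(before), read_offsets(after)))


# 200 objects of 64 bytes each.
objects = [bytes([i]) * 64 for i in range(200)]
before = pack_objects(objects)

# Grow only the first object by a single byte.
objects[0] += b"\xff"
after = pack_objects(objects)

print(len(after) - len(before), "byte added")
print(changed_offsets(before, after), "of 200 offset entries rewritten")
```

A shift-aware delta encoder can often cope with the moved payload, but in this toy case one extra byte still rewrites 199 of the 200 table entries, and each of those is its own small edit for the diff to record.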

The reliability cost is worse than the storage cost

Storage is cheap. We can afford to store a few hundred versions of a texture. What we can't afford is an unreliable restore path.

Diff-based systems have a fundamental property: to restore version N, you need version 0 plus every diff up to N. If any diff in that chain is corrupted, lost, or misapplied, every subsequent version is unrecoverable. Many systems mitigate this with periodic full snapshots, but the failure mode still lurks: "this specific old version is just gone."

Full snapshots are aggressively boring. Every version is a complete, self-contained file. Restoring version N is a single read. There's no chain. A corrupted old version affects only that version. You can delete arbitrary old versions to free space without worrying about breaking anything downstream.
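A minimal sketch of the two restore paths makes the difference concrete. Everything here is an assumption for illustration: the delta format (a 4-byte offset followed by overwrite bytes), the in-memory stores, and the function names. It is not how any particular diff-based system encodes its deltas.

```python
import hashlib


class CorruptVersion(Exception):
    """Raised when stored bytes no longer match their recorded hash."""


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def apply_patch(data: bytes, patch: bytes) -> bytes:
    # Toy delta (an assumption): 4-byte big-endian offset, then bytes to overwrite.
    offset = int.from_bytes(patch[:4], "big")
    body = patch[4:]
    return data[:offset] + body + data[offset + len(body):]


def restore_from_chain(base, patches, digests, version):
    # Version N needs the base plus every delta 1..N, each one intact.
    data = base
    for i in range(version):
        if sha256(patches[i]) != digests[i]:
            raise CorruptVersion(f"delta {i + 1} is damaged")
        data = apply_patch(data, patches[i])
    return data


def restore_snapshot(snapshots, digests, version):
    # Version N needs exactly one stored object.
    data = snapshots[version]
    if sha256(data) != digests[version]:
        raise CorruptVersion(f"snapshot {version} is damaged")
    return data


def recoverable(restore, *args) -> bool:
    try:
        restore(*args)
        return True
    except CorruptVersion:
        return False


base = b"AAAAAAAA"
patches = [(i).to_bytes(4, "big") + letter for i, letter in enumerate([b"B", b"C", b"D"])]
patch_digests = [sha256(p) for p in patches]

# Build the equivalent snapshot store from the clean chain.
snapshots = {0: base}
for v in range(1, 4):
    snapshots[v] = apply_patch(snapshots[v - 1], patches[v - 1])
snapshot_digests = {v: sha256(d) for v, d in snapshots.items()}

# Damage version 2 in both stores.
patches[1] = patches[1][:-1] + b"X"
snapshots[2] = b"garbage!"

chain_ok = [recoverable(restore_from_chain, base, patches, patch_digests, v) for v in range(4)]
snap_ok = [recoverable(restore_snapshot, snapshots, snapshot_digests, v) for v in range(4)]
print("chain:    ", chain_ok)
print("snapshots:", snap_ok)
```

With the same damage applied to version 2, the chain loses versions 2 and 3, while the snapshot store loses only version 2.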

For a system whose entire job is "never lose work, ever," this is the right trade. We'll pay for the storage.

Why per-version snapshots stay affordable

The obvious objection to snapshots is cost. Surely storing a full copy of every version is wasteful? For the files game teams actually version, it isn't the problem it looks like. And the cost that does matter stays low for a different reason.

Start with the storage itself. Binary game assets don't diff meaningfully in the first place. As the previous section showed, a re-compressed texture or a re-exported .uasset is a near-complete byte rewrite, so a diff system would have stored almost the whole file anyway. The theoretical savings a diff model promises mostly evaporate on real game data. Given that, a plain per-version snapshot is the right model: it's simple, and every version is a complete, self-contained, independently recoverable file.

The cost that actually bites a distributed team isn't storage. It's bandwidth. And that stays low because the client only transfers files that genuinely changed. A texture that hasn't changed across 500 commits is uploaded and downloaded once, not 500 times; the desktop app hashes each file and skips anything it already has. So you keep the reliability of full snapshots and the boring simplicity of version-scoped storage, while the day-to-day expense of moving bytes over the wire is paid only for the content that actually changed.
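To put rough numbers on that, here is a small simulation of upload volume across 500 commits. It assumes the skip decision is keyed on a SHA-256 of the file content; the file names, sizes, and the `bytes_uploaded` helper are made up for the example.

```python
import hashlib


def bytes_uploaded(commits: list[dict[str, bytes]]) -> int:
    """Total bytes sent if content the server already has is skipped."""
    seen: set[str] = set()
    total = 0
    for files in commits:
        for content in files.values():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in seen:
                seen.add(digest)
                total += len(content)
    return total


def bytes_uploaded_naive(commits: list[dict[str, bytes]]) -> int:
    """Total bytes sent if every file is re-sent on every commit."""
    return sum(len(content) for files in commits for content in files.values())


# A 100 KB stand-in for a texture that never changes, plus a small file that does.
texture = b"T" * 100_000
commits = [
    {"rock.png": texture, "notes.txt": f"build {i}".encode()}
    for i in range(500)
]

print("hash-skipping:", bytes_uploaded(commits), "bytes")
print("re-send all:  ", bytes_uploaded_naive(commits), "bytes")
```

The unchanged texture contributes its size once to the first total and 500 times to the second.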

The sync protocol becomes simpler too

Snapshot-per-version makes the sync protocol nearly trivial. Each file version has a SHA-256 hash of its content. To sync, the client compares hashes against what it already has locally and transfers only the files that differ. Unchanged files are skipped entirely. Done.

No diff application, no chain walking, no special cases for "what if my ancestor is corrupted." Resuming an interrupted sync comes for free: if a download stops halfway, the client just re-requests the missing file on the next sync. The protocol doesn't need to know anything about version history; it just compares hashes and moves the bytes that changed.
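The comparison step can be sketched in a few lines. This is an illustration under assumptions, not the actual client: the manifest shape (path mapped to a hex SHA-256) and the `plan_sync` name are hypothetical, and local files are modeled as in-memory bytes.

```python
import hashlib


def plan_sync(remote_manifest: dict[str, str], local_files: dict[str, bytes]) -> list[str]:
    """Return paths whose local content is missing or does not match the remote hash."""
    to_fetch = []
    for path, remote_hash in sorted(remote_manifest.items()):
        local = local_files.get(path)
        if local is None or hashlib.sha256(local).hexdigest() != remote_hash:
            to_fetch.append(path)
    return to_fetch


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


remote = {
    "hero.fbx": digest(b"mesh-v7"),
    "rock.png": digest(b"texture-v1"),
    "level.umap": digest(b"level-v3"),
}

local = {
    "hero.fbx": b"mesh-v6",      # stale: an older version
    "rock.png": b"texture-v1",   # unchanged: skipped
    "level.umap": b"level-",     # truncated by an interrupted download
}

print(plan_sync(remote, local))
```

In this sketch a half-written file simply hashes to something other than the expected value, so it lands in the fetch list on the next pass without any dedicated resume logic.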

Simplicity here pays off in reliability. Less code means fewer places for bugs to hide. When someone reports a sync issue, we can diagnose it in an afternoon, not a week.

What we give up

No design is free. Full snapshots have real downsides we live with:

  • Higher absolute storage. Keeping a full copy of every version means frequently changing binary files use more storage than a diff system would in theory, although, as we covered, diffs of binary assets rarely shrink much anyway. We pass some of that along in pricing, with storage caps that scale with plan tier, but it's a real cost.
  • Less sophisticated diff views. For text files we show line-level diffs by computing them at read time. For binary files, "this version vs that version" is just two files side by side. There's no cleverness to surface what changed. For binary formats, cleverness would have been lies anyway, so this feels honest.

Neither is a blocker for shipping games. Both are trades we make consciously.
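Computing a text diff at read time needs nothing more than the two stored snapshots. The sketch below uses Python's standard `difflib` as a stand-in; the post doesn't say which diff algorithm the product uses, and `line_diff` is a hypothetical helper.

```python
import difflib


def line_diff(old_text: str, new_text: str,
              old_label: str = "old", new_label: str = "new") -> list[str]:
    """Compute a unified line diff on demand from two full snapshots."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    ))


v1 = "speed = 10\njump = 4\n"
v2 = "speed = 12\njump = 4\n"

for line in line_diff(v1, v2, "config.ini@v1", "config.ini@v2"):
    print(line)
```

Because the diff is derived rather than stored, a bug in the diff view can never damage a version; it can only render badly.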

The boring answer is usually the right answer

There's a pattern we've noticed across many system design decisions at USourceControl: the clever answer introduces failure modes, the boring answer doesn't. Full snapshots over diffs. Server-authoritative locks over optimistic locks. SHA-256 integrity checks over simple checksums. Opinionated defaults over configuration.

None of these are novel ideas. They're well-worn choices that other systems have made. What's novel is actually making them in a product aimed at game teams, where the pressure to look impressive in a benchmark is strong, but the job that matters is "don't lose work."

Boring wins. Ship games.
