I beat FFV1 by 41% on real video — with zero external compression deps
So I'm building Recursor, a structure-aware lossless compression engine. The whole pitch is "understand the structure of the
data first, then compress." For video, that means a custom pipeline: spatial prediction, range coding, motion compensation — the works.
I had a constraint I'd been slowly fighting against: parts of the codebase still leaned on zstd and flate2 as escape hatches when our own algorithms didn't have enough teeth. Today I cut both deps out entirely and then took a swing at FFV1,
the gold standard for honest lossless video coding (the kind that ships in ffmpeg and gets used in archival workflows).
Result: RARE-V5 beats FFV1 by 41% on real natural video. From scratch. Zero external compression libraries.
The setup
The morning's work was dependency removal. Recursor had ~80 zstd call sites spanning video planes, image rows, CSV columns, and container metadata. I replaced them with a thin shim around our range coder:
pub mod ours {
/// Drop-in replacement for zstd::encode_all(data, _level).
pub fn encode_all<R: Read>(reader: R, _level: i32) -> Result<Vec<u8>>;
pub fn decode_all<R: Read>(reader: R) -> Result<Vec<u8>>;
}
The implementation is an order-1 context model (256 bit-trees keyed on the previous byte) plus explicit run-length encoding for redundant data. Not the fastest thing in the world, but it always roundtrips and gets ~10× on highly-redundant input.
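The run-length layer is simple enough to sketch. Below is an illustrative byte-level RLE codec, not Recursor's actual wire format: every run is emitted as a (length, byte) pair, with runs capped at 255.

```rust
/// Illustrative RLE: each run of identical bytes becomes a (length, byte)
/// pair. Worst case doubles the data; highly redundant input collapses hard.
fn rle_encode(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1usize;
        // Extend the run while the byte repeats, capped at 255 per pair.
        while i + run < data.len() && data[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

fn rle_decode(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

fn main() {
    let input = vec![0u8; 4096]; // maximally redundant input
    let packed = rle_encode(&input);
    assert_eq!(rle_decode(&packed), input); // must always roundtrip
    println!("{} -> {} bytes", input.len(), packed.len()); // prints "4096 -> 34 bytes"
}
```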
That alone got us to "tied with FFV1" on screen content and slightly behind on natural video. Acceptable, not exciting.
The win: block-based motion compensation
The honest weakness was inter-frame prediction. Recursor's existing inter-frame mode just diffed the current frame against the previous one — fine for static content (screen recordings), useless for anything with movement. FFV1 doesn't have explicit motion compensation either, but its adaptive context model is strong enough that the omission costs it less.
I added a new motion module:
pub const BLOCK_SIZE: usize = 16;
pub const SEARCH_RANGE: i32 = 8;
pub fn estimate_motion_field(cur: &[u8], reference: &[u8],
width: usize, height: usize)
-> Vec<MotionVector>;
For each 16×16 block in the current Y plane, exhaustively search a 17×17 window in the previous frame for the lowest sum of absolute differences. The Y motion field is reused (scaled) for the U/V planes. Motion vectors are predicted from the left neighbor and arithmetic-coded via the range coder. Skip-mode early-out for blocks where SAD at MV=(0,0) is already ≈ 0.
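The inner loop is easy to sketch. This is a simplified single-block version of the search; the function names and frame setup are illustrative, and the real estimate_motion_field returns a full motion field and entropy-codes the vectors:

```rust
const BLOCK_SIZE: usize = 16;
const SEARCH_RANGE: i32 = 8;

/// Sum of absolute differences between the BLOCK_SIZE x BLOCK_SIZE block
/// at (bx, by) in `cur` and the block at (bx+dx, by+dy) in `reference`.
/// Caller must ensure both blocks lie fully inside the frame.
fn sad(cur: &[u8], reference: &[u8], width: usize,
       bx: usize, by: usize, dx: i32, dy: i32) -> u32 {
    let mut total = 0u32;
    for y in 0..BLOCK_SIZE {
        for x in 0..BLOCK_SIZE {
            let c = cur[(by + y) * width + bx + x] as i32;
            let rx = (bx + x) as i32 + dx;
            let ry = (by + y) as i32 + dy;
            let r = reference[ry as usize * width + rx as usize] as i32;
            total += (c - r).unsigned_abs();
        }
    }
    total
}

/// Exhaustive search of the (2*SEARCH_RANGE+1)^2 window around (bx, by),
/// clamped to the frame. Returns the best (dx, dy) and its SAD.
fn best_mv(cur: &[u8], reference: &[u8], width: usize, height: usize,
           bx: usize, by: usize) -> ((i32, i32), u32) {
    // Skip-mode early-out: co-located block already matches exactly.
    let zero = sad(cur, reference, width, bx, by, 0, 0);
    if zero == 0 {
        return ((0, 0), 0);
    }
    let mut best = ((0, 0), zero);
    for dy in -SEARCH_RANGE..=SEARCH_RANGE {
        for dx in -SEARCH_RANGE..=SEARCH_RANGE {
            // Candidate block must lie fully inside the reference frame.
            let x0 = bx as i32 + dx;
            let y0 = by as i32 + dy;
            if x0 < 0 || y0 < 0
                || x0 + BLOCK_SIZE as i32 > width as i32
                || y0 + BLOCK_SIZE as i32 > height as i32 {
                continue;
            }
            let cost = sad(cur, reference, width, bx, by, dx, dy);
            if cost < best.1 {
                best = ((dx, dy), cost);
            }
        }
    }
    best
}

fn main() {
    // Spatially structured (non-periodic) test frame, per the lesson below.
    let (w, h) = (32usize, 32usize);
    let reference: Vec<u8> = (0..w * h)
        .map(|i| {
            let (x, y) = (i % w, i / w);
            ((x * x + 3 * y * y) % 251) as u8
        })
        .collect();
    // Current frame = reference shifted left by 3 pixels and up by 2.
    let cur: Vec<u8> = (0..w * h)
        .map(|i| {
            let (x, y) = (i % w, i / w);
            if x + 3 < w && y + 2 < h { reference[(y + 2) * w + x + 3] } else { 0 }
        })
        .collect();
    let ((dx, dy), cost) = best_mv(&cur, &reference, w, h, 8, 8);
    assert_eq!((dx, dy, cost), (3, 2, 0)); // search recovers the true shift
}
```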
Two false starts here:

- Three-step search (TSS) failed my unit test on a noisy image — for pure-noise content the SAD landscape is a delta function (spike at the right MV, uniform noise everywhere else), so any greedy search gets stuck in the noise floor. Real video has spatial correlation, so TSS works there, but I switched to exhaustive search for correctness. ~1 second per frame in release build.
- My first synthetic test frame was (x + 3y) mod 256. Multiple motion vectors give SAD=0 on that pattern (along the gradient line), so even brute force picks "wrong" answers that are still correct. Lesson: test motion search on something with spatial structure, not periodic patterns.
The encoder now tries both the existing zoom-comp path AND the MC path per frame, keeping the smaller bitstream. Decoder dispatches on a 0xFF sentinel in byte 0.
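A sketch of that selection and dispatch, with illustrative names (the 0xFF sentinel is the real one from the post; a production version also has to guarantee the zoom-comp path never emits 0xFF as its first byte):

```rust
/// Illustrative encoder-side selection: try both inter paths, keep the
/// smaller payload, and tag the MC path with a 0xFF sentinel in byte 0.
/// Assumes the zoom-comp bitstream never starts with 0xFF.
fn pick_inter_mode(zoom_payload: Vec<u8>, mc_payload: Vec<u8>) -> Vec<u8> {
    // The +1 accounts for the sentinel byte the MC path must carry.
    if mc_payload.len() + 1 < zoom_payload.len() {
        let mut out = vec![0xFFu8];
        out.extend_from_slice(&mc_payload);
        out
    } else {
        zoom_payload
    }
}

/// Decoder side: byte 0 tells us which path produced the stream.
fn is_motion_compensated(frame: &[u8]) -> bool {
    frame.first() == Some(&0xFF)
}

fn main() {
    let chosen = pick_inter_mode(vec![1, 2, 3, 4, 5], vec![9, 9]);
    assert!(is_motion_compensated(&chosen)); // MC won: smaller even with tag
}
```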
The numbers
Tested on identical Y4M input across all codecs. 30 frames of a YouTube clip (640×360, H.264 source decoded to raw):
UTVideo          4.73x
FFV1             8.97x
RECURSOR-V5     12.61x   ← +41% over FFV1
x265-lossless   24.18x
x264-lossless   31.30x
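For clarity, the headline figure is a ratio of compression ratios, not a size delta:

```rust
fn main() {
    // Compression ratios from the table above.
    let (recursor, ffv1) = (12.61_f64, 8.97_f64);
    let gain_pct = (recursor / ffv1 - 1.0) * 100.0;
    println!("+{:.0}% over FFV1", gain_pct); // prints "+41% over FFV1"
}
```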
Smaller synthetic clips (5 frames each):
mandelbrot screen_text testsrc2
---------- ----------- --------
HuffYUV 1.98x 2.89x 2.42x
UTVideo 3.05x 7.46x 5.61x
FFV1 5.27x 102.39x 20.96x
RECURSOR-V5 5.46x 128.46x 19.52x
x264-lossless 12.13x 230.87x 25.54x
x265-lossless 12.25x 166.21x 21.03x
We beat FFV1 on three of four test inputs. The one we lose on (testsrc2) is ffmpeg's synthetic test pattern with sweeping color bars — content that doesn't have clean translational motion, so motion compensation has nothing to chew on.
What we still don't have
The remaining gap to x264/x265 is real and I want to be honest about it:
- Sub-pixel motion: x264 searches at quarter-pixel precision with 6-tap
  interpolation filters; Recursor searches only integer offsets, which
  accounts for part of the gap.
- Intra prediction modes: x264 has 8 directional modes plus DC and planar;
  Recursor uses a single spatial predictor per block (as of V4.6).
- Transform coding: x264 applies 4×4 integer DCT to residuals. We
  entropy-code the spatial residuals directly, with no transform.
Each of those is a real chunk of work. They're on the list.
What I think this means
I'm not claiming Recursor will replace x264 tomorrow. The interesting result isn't "we beat the best" — it's that 41% over FFV1 on real video is achievable by one person, building everything in-house from a range coder up. No FFmpeg dependency, no zstd, no DCT library. Just exhaustive motion search and an order-1 entropy coder feeding a 32-bit LZMA-style range coder.
The lossless video compression world has been pretty stable for a long time. FFV1 has been the practical default for archival workflows since 2003. It turns out you can do meaningfully better with techniques that have been in the textbooks since the late 90s — you just have to actually implement them end-to-end and test them on real content.
Roundtrip is bit-exact. All 116 unit tests pass.