Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed

TLDR YaFF is Yandex’s open-source zero-copy wire format for Protobuf — Apache 2.0, currently C++, v0.1.0. The .proto file stays the source of truth; only the physical memory layout changes. On Yandex’s benchmarks, the Flat Layout reads hot data ~3.8× faster than FlatBuffers, within 1.2× of a raw C++ struct. Four layouts — Fixed, Flat, Sparse, Dynamic — trade read speed for schema flexibility; Dynamic is the default. YaFF runs in its advertising recommendation system,

Make your prediction

Will Yandex's YaFF repository reach 300 GitHub stars by July 5, 2026?

Resolves by Jul 5, 2026

Your prediction

50% · 50/50 coin flip

NOYES

Get smart on it

YaFF is an open-source serialization library that provides a faster way to read data formatted as Protobuf messages without requiring a parsing step. It matters because parsing Protobuf can consume significant CPU resources in high-load systems, and YaFF offers zero-copy reads that approach the speed of raw data structures while maintaining Protobuf compatibility. Unlike alternative solutions, YaFF allows gradual adoption by converting data at module boundaries, meaning teams can introduce it into performance-critical code paths while leaving the rest of their system unchanged. The tool ships with multiple layout options that trade read speed for schema flexibility, with benchmarks showing the Flat Layout reads approximately 3.8 times faster than comparable alternatives.

Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Today, Mistral AI released OCR 4, its latest document-understanding model. This new release adds bounding boxes, block classification, and inline confidence scores alongside extracted text. It supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. OCR 4 also serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines. TL;DR OCR 4 returns bounding boxes, typed-block labels, and per-word c

Models & ReleasesOpen story →

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that matches. The model reads PDFs and images directly, then decodes against your schema. This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra, marker, and surya. lift extends that work into schema-driven field extraction. lift scores 90.2% field accuracy on Datalab’s 225-documen

Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed

Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Introducing Mistral OCR 4

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Nous Research Updates Hermes Agent With a Blank Slate Mode That Pins Toolsets via platform_toolsets.cli and disabled_toolsets