Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Today, Mistral AI released OCR 4, its latest document-understanding model. This new release adds bounding boxes, block classification, and inline confidence scores alongside extracted text. It supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. OCR 4 also serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines. TL;DR OCR 4 returns bounding boxes, typed-block labels, and per-word c

Make your prediction

Will Mistral OCR 4 support self-hosted deployment in a single container by July 2026?

Resolves by Jul 15, 2026

Your prediction

50% · 50/50 coin flip

NOYES

Get smart on it

Mistral AI released OCR 4, a document-understanding model that extracts text from documents while also providing additional structured information such as bounding boxes showing where text appears, classifications of what type of content each section is, and confidence scores indicating how certain the model is about its output. The model supports 170 languages and can be deployed on a single server for self-hosted use. This structured output matters because it enables downstream applications like citation generation, human verification workflows, and retrieval systems to understand not just what a document contains but where elements are located and how reliable the extraction is, making it suitable for enterprise search, document processing, and agent-based workflows.

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that matches. The model reads PDFs and images directly, then decodes against your schema. This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra, marker, and surya. lift extends that work into schema-driven field extraction. lift scores 90.2% field accuracy on Datalab’s 225-documen

Models & ReleasesOpen story →

Introducing Mistral OCR 4

Mistral OCR 4 delivers enterprise document AI with 170-language support, bounding boxes, and self-hosted deployment.

Models & ReleasesOpen story →

Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Introducing Mistral OCR 4

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Nous Research Updates Hermes Agent With a Blank Slate Mode That Pins Toolsets via platform_toolsets.cli and disabled_toolsets

Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed