
Foundation models keep getting stronger, yet they still stall on the same thing: context. A model can write code or analyze a dataset, but only with the right internal knowledge. That knowledge includes table schemas, metric definitions, runbooks, join paths and it lives scattered across catalogs, wikis, and a few senior engineers’ heads. Google Cloud introduced the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable fo
The lesson for this story is being written.

Today, Mistral AI released OCR 4, its latest document-understanding model. This new release adds bounding boxes, block classification, and inline confidence scores alongside extracted text. It supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. OCR 4 also serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines. TL;DR OCR 4 returns bounding boxes, typed-block labels, and per-word c

Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that matches. The model reads PDFs and images directly, then decodes against your schema. This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra, marker, and surya. lift extends that work into schema-driven field extraction. lift scores 90.2% field accuracy on Datalab’s 225-documen

Mistral OCR 4 delivers enterprise document AI with 170-language support, bounding boxes, and self-hosted deployment.
Want to go deeper than the news? Explore live, cohort-based AI courses taught by practitioners.
Browse AI courses on Maven