
Most enterprise data still sits inside PDFs, scans, and slide decks. Large language models and agents cannot use that data until it becomes structured JSON. Open-source document extraction has become the standard way to do that conversion on your own hardware. Two different problems hide under the phrase ‘PDF to JSON.’ The first is schema-driven extraction: you define fields, and a model fills them with values. The second is document parsing: a model reconstructs the page into st
Most enterprise data remains trapped in PDFs and other documents that AI language models cannot use directly; it must first be converted to structured JSON. Two distinct problems sit under the phrase "PDF to JSON": schema-driven extraction, where fields are predefined and a model fills them with values, and document parsing, where a model reconstructs the entire page into structured formats. Open-source models offer advantages over proprietary APIs: they reduce costs and allow local processing, so documents never have to leave the premises. Several open-source models and toolkits are available for each category, ranging from smaller vision-language models to document parsing systems that handle multiple file formats.
Junyang Lin was the technical lead of Alibaba’s Qwen project. He announced he was stepping down on March 3, 2026. He now lists himself as an independent researcher on his personal site. In a talk titled ‘Qwen: Towards a Generalist Model / Agent,’ he walks through the Qwen family. The talk ends on a single line: “Training models -> training agents.” He later expanded that line into a detailed post as an independent researcher. This article reads the talk and the detai

Two hundred and fifty years after the signing of the Declaration of Independence, a new commercial asks: What if the Founding Fathers had access to Google Workspace?

This week, Anthropic released Claude Science. It is an app for scientists, available in beta. It runs on Anthropic’s existing Claude models, not a new model. The app targets researchers who juggle databases, notebooks, and cluster terminals. It runs multi-step research and records how each result was made. The beta is available for Pro, Max, Team, and Enterprise plans. Claude Science builds on Anthropic’s life sciences work from last fall. That earlier work connected Claude to th