
Machine Intelligence
Large language models on phones typically generate text one token at a time, which leaves processing power underused and drains the battery because each step is limited by memory bandwidth. Google has developed a method that retrofits Multi-Token Prediction onto existing frozen models by attaching a lightweight component that predicts several tokens at once, which the main model then verifies in parallel. The approach uses a zero-copy architecture that reuses the main model's memory cache instead of duplicating it, reducing memory usage and eliminating redundant processing steps. The result is a speedup of fifty percent or more on mobile devices for tasks like notification summaries and text proofreading, with lower energy consumption and no changes to the base model's safety or capabilities.
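For readers who want to see the draft-then-verify idea in miniature, here is a toy sketch of generic greedy speculative decoding. It is not Google's implementation: the two "models" are made-up arithmetic rules, every name (`target_next`, `draft_next`, `speculative_decode`, the parameter `k`) is hypothetical, and the shared-cache, zero-copy part of the design is not modeled at all.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the frozen base model (an arbitrary deterministic rule)."""
    return (context[-1] * 3 + 1) % 17


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for the lightweight drafter.

    It agrees with the target unless the last token is a multiple of 5,
    so some drafted tokens get rejected during verification.
    """
    guess = target_next(context)
    return guess if context[-1] % 5 else (guess + 1) % 17


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one token per model call."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Draft k tokens, keep the prefix the target agrees with.

    Returns the generated sequence and the number of verification rounds.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens on its own.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks each drafted position. A real system scores
        #    all positions in one batched forward pass; this loop simulates it.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft matched, so the target's next token comes for free.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal], rounds
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding with the target alone. The saving comes from needing fewer verification rounds than tokens generated, which is consistent with the claim that the base model's behavior is unchanged.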

DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy produc

New models are launching in Asia that promise Mythos-like capabilities without fear of an export ban. U.S. AI labs may never recover this enormous market.

OpenAI has begun a limited preview of GPT-5.6, its next-generation model series. The lineup splits into three named tiers: Sol, Terra, and Luna. Sol is the flagship. Terra targets everyday production work. Luna is the fast, low-cost option. OpenAI is starting with a small group of trusted partners through the API and Codex. According to OpenAI's post, the company shared the models and plans with the U.S. government first. Broader access in ChatGPT, Codex, and the API is planned in the coming weeks.