Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction

Large language models on phones typically generate text one token at a time. Each step must stream the model's weights from memory to produce a single token, so the processor spends much of its time waiting on memory bandwidth, which wastes compute and drains the battery. Google has developed a method that retrofits Multi-Token Prediction (MTP) onto existing frozen models: a lightweight component attached to the model predicts several tokens at once, and the frozen model then verifies them in parallel. The approach uses a zero-copy architecture that reuses the main model's memory cache instead of duplicating it, which reduces memory usage and eliminates redundant processing steps. It achieves speedups of fifty percent or more on mobile devices for tasks such as notification summaries and text proofreading, with lower energy consumption and without changing the base model's safety or capabilities.
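The draft-then-verify loop described above can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not Google's implementation: `draft` stands in for the lightweight MTP component and `target` for the frozen model, both modeled as hypothetical functions that return the greedy next token for a prefix. The names `draft_tokens`, `verify`, and `generate` and the draft length `k` are illustrative assumptions, and the sketch does not model the zero-copy cache sharing.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def draft_tokens(draft: NextToken, prefix: List[int], k: int) -> List[int]:
    """Propose k tokens autoregressively with the cheap draft component."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    return proposal


def verify(target: NextToken, prefix: List[int], proposal: List[int]) -> List[int]:
    """Keep the longest prefix of `proposal` that the frozen target agrees with.

    A real system scores prefix + proposal in ONE batched target pass; this
    loop only emulates that pass token by token. The result always ends with
    one token chosen by the target: its correction at the first mismatch, or
    a bonus token when every drafted token is accepted.
    """
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # target's choice replaces the rejected draft
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # bonus token after full acceptance
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new: int, k: int = 3) -> List[int]:
    """Greedy speculative decoding: draft k tokens, verify, repeat."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Each round emits at least one token, so the loop always terminates.
        out += verify(target, out, draft_tokens(draft, out, k))
    return out[: len(prompt) + max_new]  # drop tokens beyond the budget
```

Because every emitted token is either confirmed or produced by the target, the output matches plain greedy decoding with the target alone; the speedup comes from checking several drafted tokens per target pass instead of producing one token per pass. Sampling-based acceptance rules are omitted for brevity.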

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production…
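Why a dedicated draft module can outperform MTP-1 is easier to see with a back-of-the-envelope model. This sketch uses the standard expected-length formula for chain-style speculative decoding under a simplifying assumption that each drafted token is accepted independently with probability `alpha`; treating MTP-1 as a one-token draft (`k = 1`) is also a simplification. It is not DeepSeek's method, and the function names and `alpha` values are hypothetical.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target verification pass.

    Simplifying assumption: each of the k drafted tokens is accepted
    independently with probability alpha, and acceptance stops at the
    first rejection. Every pass also yields one target token, giving
    1 + alpha + ... + alpha**k = (1 - alpha**(k + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def gain_over_one_token_draft(alpha: float, k: int) -> float:
    """Relative increase in tokens per pass versus a one-token (MTP-1-like) draft."""
    return expected_tokens_per_pass(alpha, k) / expected_tokens_per_pass(alpha, 1) - 1.0
```

Under these assumptions, the gain from a longer draft grows with the acceptance rate. Real per-user speedups, including the 60–85% reported for DSpark, also depend on draft cost, verification overhead, and batching, none of which this model captures.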

Asian AI startups launch Mythos-like models as Anthropic’s export ban drags on

New models are launching in Asia that promise Mythos-like capabilities without fear of an export ban. U.S. AI labs may never recover this enormous market.

OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access

OpenAI Has New AI Models. Here’s Why You Can’t Use Them

OpenAI unveils GPT-5.6 amid US AI regulatory drama

Previewing GPT-5.6 Sol: a next-generation model