
OpenAI engineers used large-scale core dump analysis to debug rare infrastructure crashes, uncovering both a hardware fault and a long-standing software bug.
Software services written in C++ can crash when bugs cause writes to incorrect memory addresses. When crashes occurred in a data infrastructure service, initial investigation of individual crash snapshots seemed impossible to solve because the symptoms pointed to multiple contradictory causes. The team switched to analyzing patterns across the entire population of crashes, similar to epidemiological methods, which revealed two separate unrelated bugs: silent hardware corruption on one host and a race condition in a widely-used open source library that had gone unnoticed for eighteen years.

As utilities struggle to keep pace with AI-driven demand, a new industry coalition aims to create a common playbook for powering next-generation data centers.

Nvidia AI chip competitor Etched says it has already booked $1 billion under contract for the inference systems powered by its chip.

The deal reflects the operator’s strategy to scale its footprint amid surging AI-driven demand for data center infrastructure, while potentially signaling Blackstone’s shift toward higher-return investments.
Want to go deeper than the news? Explore live, cohort-based AI courses taught by practitioners.
Browse AI courses on Maven