
The MoonMath.ai team has released a bf16 forward attention kernel for AMD’s MI300X GPU. It is written in HIP rather than hand-written assembly, and the code is open source under the MIT license. The team reports that it beats AITER v3, AMD’s own optimized kernel, on every tested shape. Bare-metal access came from HotAisle, an AMD cloud provider. Attention is the fused softmax(QKᵀ/√d)·V operation inside every transformer. The MI300X is AMD’s CDNA3 data-center GPU, with the ISA targe
A team has released open-source code that performs attention, a core AI computation, more efficiently on AMD's MI300X graphics processor. The code is written in the HIP programming language rather than lower-level assembly, and it outperforms AMD's own optimized version across all tested configurations and rounding modes. The speedup comes primarily from careful memory placement: certain data structures are kept in specific cache levels and registers to minimize data movement between fast and slow storage. The kernel has already been used in a real application, where it sped up video diffusion generation on MI300X hardware by a factor of 1.23 with no quality loss.
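For readers who want to see what the operation computes, here is a minimal pure-Python sketch of softmax(QKᵀ/√d)·V. It is not the MoonMath.ai kernel and says nothing about its memory layout. The function names `attention_reference` and `attention_streaming` are hypothetical. The second function uses the general "online softmax" technique, which is assumed here only as an illustration of how a fused kernel can avoid materializing the full score matrix.

```python
import math


def attention_reference(Q, K, V):
    """Unfused softmax(Q K^T / sqrt(d)) V on lists of rows."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        # Full row of scaled scores for this query.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))
        ])
    return out


def attention_streaming(Q, K, V):
    """Same result, visiting one key/value pair at a time.

    Keeps only a running max, a running sum, and a running output
    accumulator per query (online softmax). Illustrative assumption,
    not the released kernel's algorithm.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        m = -math.inf
        total = 0.0
        acc = [0.0] * len(V[0])
        for k, v in zip(K, V):
            s = scale * sum(a * b for a, b in zip(q, k))
            m_new = max(m, s)
            correction = math.exp(m - m_new)  # rescales earlier partial sums
            p = math.exp(s - m_new)
            total = total * correction + p
            acc = [a * correction + p * vj for a, vj in zip(acc, v)]
            m = m_new
        out.append([a / total for a in acc])
    return out
```

Both functions should agree up to floating-point rounding. The streaming form only ever holds one score at a time, which is the property that lets a GPU kernel keep its working set in registers and on-chip memory.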

Virginia’s new electricity tax on data centers, including self-generated power, is projected to generate $600M annually.

Orbital data centers promise relief from terrestrial power challenges, but their future may hinge on a harder question: repair infrastructure or replace fleets.

Microsoft's West Texas power agreement with Chevron shows how AI developers are securing generation capacity alongside compute.