
AMD buys Mext to tame its AI-driven RAM crunch with flash memory tiering

The 2 to 4x flash expansion claim turns memory tiering into an AI-powered “cold storage” play for enterprise workloads.

By Yousef Al-Zahrani, Technology Correspondent, The Executives Brief
Executive summary

AMD’s compute and enterprise AI business acquired predictive memory startup Mext, founded in 2023, for an undisclosed sum. Mext uses machine learning to offload “cold” data from RAM to flash and restore it before it is needed again, and claims 2 to 4x effective memory expansion.

AMD just made a bet that the AI memory crunch it helped cause can be softened with a software-defined “RAM to flash” strategy. This week, AMD’s “House of Zen” acquired predictive memory startup Mext for an undisclosed sum. Mext, founded in 2023, is trying to make systems behave as if they had more fast memory than they actually do, using machine learning to decide which data can be pushed out to flash and when it should be pulled back into RAM.

The core claim is blunt: Mext says it can expand the effective memory of a system by 2 to 4x using flash. Flash gets you closer to main memory in aggregate bandwidth, but not in latency, so the engineering challenge is avoiding the worst case, where you constantly “swap to disk” and pay a big performance tax. Mext’s pitch is that it does not just shove data around. It predicts which memory is likely to go cold and then restores it before it is needed again, using access patterns as the signal.
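A rough back-of-the-envelope model shows why the prediction quality matters so much. The sketch below computes average access latency as a weighted mix of RAM and flash latency. The 100 ns and 100 µs figures are illustrative order-of-magnitude assumptions, not Mext or AMD measurements, and `effective_latency_ns` is a hypothetical helper written for this example.

```python
def effective_latency_ns(hit_rate, ram_ns=100.0, flash_ns=100_000.0):
    """Average access latency when `hit_rate` of accesses are served from RAM.

    ram_ns and flash_ns are assumed, illustrative latencies (not vendor data).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average: hits pay the RAM cost, misses pay the flash cost.
    return hit_rate * ram_ns + (1.0 - hit_rate) * flash_ns


for rate in (0.90, 0.99, 0.999):
    slowdown = effective_latency_ns(rate) / 100.0
    print(f"hit rate {rate:.1%}: about {slowdown:.1f}x DRAM latency")
```

Under these assumed numbers, even a 99 percent hit rate averages roughly 11 times DRAM latency, which is why restoring data before it is requested matters more than simply having the flash capacity available.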

To see why this matters, you have to zoom out to how memory constraints shape AI and enterprise economics. RAM is expensive, and AI workloads tend to be especially hungry because they pull in huge datasets, keep intermediate states around, and now increasingly rely on architectures like mixture of experts (MoE), where different sub-models may be used for different tokens. When you run out of fast memory, the usual options are to buy more DRAM or to redesign the software stack. Both are costly, slow, and often constrained by supply and procurement cycles.

Mext’s mechanism tries to sidestep that treadmill. It expands effective memory by exposing flash to the operating system as if it were regular memory, simply by running the Mextd daemon. That matters because it suggests a path where applications can keep thinking in familiar memory terms, while the platform handles tiering under the hood. Memory tiering itself is not new. There have been multiple incarnations over the years, including software-based approaches and hardware-backed schemes such as Intel Optane persistent memory, which used 3D XPoint memory technology co-developed by Intel and Micron.

Where Mext differentiates is the “predictive” part and how it decides what to migrate. The platform uses machine learning algorithms and learned heuristics to proactively offload “cold” memory to flash storage. Based on data access patterns, it aims to restore that data before it is needed again. Instead of a single model doing all the work, Mext uses a combination of heuristics, long short-term memory (LSTM) networks, and modern transformer architectures, selecting whichever blend gives the best results. The resemblance to how a branch predictor tries to guess which instruction path will matter next is not just a metaphor either. It is the same basic goal: reduce stalls by being early, not late.
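To make the “early, not late” idea concrete, here is a toy two-tier memory simulation. It is a minimal sketch, not Mext’s method: the LRU eviction policy and the naive next-page predictor are stand-ins chosen for illustration, in place of the heuristics, LSTM, and transformer models described above, and every name in it is hypothetical.

```python
from collections import OrderedDict


def simulate(trace, ram_pages, prefetch=False):
    """Count demand faults for a page-access trace on a toy two-tier memory.

    RAM holds `ram_pages` pages and evicts the least recently used page to
    "flash". With prefetch=True, a naive predictor restores page p + 1
    whenever page p is touched (an assumed stand-in for a learned model).
    """
    ram = OrderedDict()  # keys are resident pages, ordered coldest to hottest
    faults = 0

    def load(page):
        # Make a page resident, offloading the coldest page if RAM is full.
        if page in ram:
            ram.move_to_end(page)
            return
        if len(ram) >= ram_pages:
            ram.popitem(last=False)
        ram[page] = None

    for page in trace:
        if page not in ram:
            faults += 1  # demand fault: this access stalls on flash
        load(page)
        if prefetch:
            load(page + 1)  # restore the predicted next page ahead of time
    return faults


# Three sequential sweeps over 8 pages with room for only 4 in RAM.
trace = list(range(8)) * 3
print("faults without prediction:", simulate(trace, ram_pages=4))
print("faults with prediction:   ", simulate(trace, ram_pages=4, prefetch=True))
```

On this deliberately friendly trace, plain LRU stalls on all 24 accesses, while the predictor stalls only 3 times, once at the start of each sweep. Real workloads are far less regular, which is presumably where learned models earn their keep over a fixed rule like this one.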

AMD has the track record for this kind of prediction-driven memory management. In a blog post this week, Dan McNamara, SVP of AMD’s compute and enterprise AI business, wrote that the approach has the potential to reduce infrastructure costs, improve resource utilization, and help customers more effectively scale general-purpose and AI workloads. That is a standard trio in tech justifications, but the details are the interesting part: if the prediction is good enough, you can pay flash prices while getting closer to DRAM-like behavior most of the time, at least on the access patterns that matter for real workloads.

The second-order question for decision-makers is where that “good enough” threshold lands, especially as AI workloads evolve. The source notes that beyond enterprise applications, the technology could have implications for AI serving. In MoE models, a different selection of experts may be used for each predicted token. In practice, an LLM may use some experts more frequently than others, meaning some experts are effectively “colder” during a given serving session. The source speculates that AMD could use Mext’s prediction algorithms to offload infrequently utilized experts from HBM to slower system memory, enabling enterprises to take advantage of larger, more capable models with fewer resources. That part is speculation, and AMD has not publicly confirmed a specific MoE use case here, but it fits the basic logic: if you can predict what will not be touched soon, you can tier it.
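The speculated MoE case can be sketched in a few lines. This is an illustration of the general idea only, since AMD has not described any such implementation: experts are ranked by how often a router picked them, the most used ones stay in HBM, and the rest are marked for slower memory. The routing log, the slot budget, and both function names are hypothetical.

```python
from collections import Counter


def plan_expert_tiers(routing_log, hbm_slots):
    """Split experts into a hot set (kept in HBM) and a cold set (offloaded).

    routing_log: expert ids, one per routed token (hypothetical data).
    hbm_slots: how many experts fit in fast memory (hypothetical budget).
    """
    usage = Counter(routing_log)
    # Most frequently used experts first; ties broken by expert id.
    ranked = sorted(usage, key=lambda expert: (-usage[expert], expert))
    hot = set(ranked[:hbm_slots])
    cold = set(ranked[hbm_slots:])
    return hot, cold


def hbm_hit_rate(routing_log, hot):
    """Fraction of routed tokens whose expert was already resident in HBM."""
    log = list(routing_log)
    return sum(expert in hot for expert in log) / len(log)


log = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]  # made-up routing decisions
hot, cold = plan_expert_tiers(log, hbm_slots=2)
print("keep in HBM:", sorted(hot))
print("offload:    ", sorted(cold))
print("hit rate:   ", hbm_hit_rate(log, hot))
```

In this made-up log, keeping two of four experts in HBM serves 7 of 10 tokens from fast memory. Scoring the plan on the same log it was built from is optimistic; a real system would need to predict future routing, not just count past routing.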

Of course, nobody should treat flash tiering as magic. If predictions miss, latency penalties return and performance can wobble. Still, the strategic pressure is real. AI is already reshaping infrastructure purchasing, and memory shortages have become a bottleneck that touches everything from cloud pricing to the feasibility of larger models. By acquiring Mext, AMD is signaling that it wants to win in the memory layer, not just the compute layer. For peers in similar roles, the stakes are straightforward: if AMD can make “more memory” feel like it scales with less DRAM, it changes how enterprises plan capex, how vendors price capacity, and how quickly competitors can ship larger AI workloads without hitting the same memory wall.
