AI/ML workloads have a memory problem. Model memory requirements are growing exponentially, but GPUs just can't keep up.
No worries; CXL makes gobs of memory available. RAM can be disaggregated and placed in racks outside the system, reachable via a network link. PJ says CXL.mem is typically pitched as a memory expansion technology (treating it like a "far" NUMA node), but this framing understates a critical issue: a CXL link does not just add latency, its bandwidth can be orders of magnitude lower than native DDR. As AI workloads push toward distributed memory pools, this bandwidth gap fundamentally changes the cost model for scaling large models across disaggregated memory. PJ has some suggestions for upcoming CXL 2.0 to move beyond simple expansion toward true memory-coherent cluster execution, calling for coordinated work across switching infrastructure, firmware, and the Linux kernel networking and memory subsystems.
Come listen and give PJ your feedback!
cheers, jamal