Another talk, this one on the intersection of AI and memory requirements
and on how networking is a fundamental piece of the equation.
AI workloads need large, high-bandwidth, low-latency memory.
Composable CXL memory fabrics address this by moving compute to data,
running services such as search, caching, and quantization on smart
Type-2 devices with local DDR. The trade-off? Device-side services need
IP endpoints, while CXL devices expose a memory window, not a NIC.
In this talk, Vijay Inavolu and Gaurav Agarwal present a Linux
virtual-interface path built from stock kernel pieces. Host and device
daemons open /dev/net/tun, configure a virtual L3 interface, and mmap
a shared CXL HDM-H window as a packet ring, enabling local apps to
access memory with no kernel changes. The shared-ring design, CXL
ordering rules, and a new host-pod bridge pattern give device-side
Linux a cluster-facing service identity, so services are discovered
and reached by service IP while clients stay unaware of the memory
window.
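
To make the mechanism concrete, here is a rough sketch (not the authors'
code) of the two pieces described above: attaching to a TUN interface
through /dev/net/tun, and treating a shared byte window as a
single-producer/single-consumer packet ring. Everything named here is an
assumption for illustration: the interface name "cxl0", the slot size,
the ring depth, and the header layout are hypothetical, and an anonymous
mmap stands in for the CXL window. A real implementation would also need
the cache flushes and memory barriers that CXL ordering rules require.

```python
import fcntl
import mmap
import os
import struct

# Linux TUN/TAP constants from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001      # L3 (IP) device, no Ethernet header
IFF_NO_PI = 0x1000    # no extra packet-info prefix on each read/write

SLOT_SIZE = 2048      # hypothetical: one slot holds one L3 packet
NUM_SLOTS = 8         # hypothetical ring depth
HDR = struct.Struct("<II")   # head (producer index), tail (consumer index)
LEN = struct.Struct("<H")    # per-slot packet length prefix
WINDOW_SIZE = HDR.size + NUM_SLOTS * SLOT_SIZE


def open_tun(name: bytes = b"cxl0") -> int:
    """Attach to a TUN interface; needs CAP_NET_ADMIN. Name is hypothetical."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    # struct ifreq: 16-byte name, 2-byte flags, padded to 40 bytes.
    ifr = struct.pack("16sH22x", name, IFF_TUN | IFF_NO_PI)
    fcntl.ioctl(fd, TUNSETIFF, ifr)
    return fd


class PacketRing:
    """Single-producer/single-consumer packet ring over a shared window."""

    def __init__(self, window):
        self.win = window

    def push(self, packet: bytes) -> bool:
        """Producer side: copy one packet in; return False if the ring is full."""
        if len(packet) > SLOT_SIZE - LEN.size:
            raise ValueError("packet larger than slot")
        head, tail = HDR.unpack_from(self.win, 0)
        if (head + 1) % NUM_SLOTS == tail:
            return False
        off = HDR.size + head * SLOT_SIZE
        LEN.pack_into(self.win, off, len(packet))
        start = off + LEN.size
        self.win[start:start + len(packet)] = packet
        # Publish the slot only after the payload is written. On real
        # hardware a barrier or flush would have to precede this store.
        struct.pack_into("<I", self.win, 0, (head + 1) % NUM_SLOTS)
        return True

    def pop(self):
        """Consumer side: return the next packet, or None if the ring is empty."""
        head, tail = HDR.unpack_from(self.win, 0)
        if head == tail:
            return None
        off = HDR.size + tail * SLOT_SIZE
        (length,) = LEN.unpack_from(self.win, off)
        start = off + LEN.size
        packet = bytes(self.win[start:start + length])
        struct.pack_into("<I", self.win, 4, (tail + 1) % NUM_SLOTS)
        return packet


# Stand-in for the shared window: anonymous, zero-filled memory.
ring = PacketRing(mmap.mmap(-1, WINDOW_SIZE))
ring.push(b"ping")
print(ring.pop())
```

In a daemon built along these lines, packets read from the TUN file
descriptor would be pushed into one ring and packets popped from the
opposite ring would be written back to it, so local applications keep
using ordinary sockets.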
Measurements show 60x less host-link traffic than host-side compute
and 2.56x FAISS throughput scaling across four cards.
Vijay and Gaurav would like to find partners in this crime to build
composable AI memory fabrics on top of Linux networking.
Come, listen, provide feedback and, yes, collaborate!
cheers,
jamal