Another talk on the intersection of AI and memory requirements and how networking is a fundamental piece of the equation.
AI workloads need large, high-bandwidth, low-latency memory. Composable CXL memory fabrics address this by moving compute to data, running services such as search, caching, and quantization on smart Type-2 devices with local DDR. The trade-off? The device side needs IP endpoints, but CXL devices expose a memory window, not a NIC.
In this talk, Vijay Inavolu and Gaurav Agarwal present a Linux virtual-interface path built from stock kernel pieces. Host and device daemons open /dev/net/tun, configure a virtual L3 interface, and mmap a shared CXL HDM-H window as a packet ring, enabling local apps to access memory with no kernel changes. The shared-ring design, CXL ordering rules, and a new host-pod bridge pattern give device-side Linux a cluster-facing service identity, so services are discovered and reached by service IP while clients stay unaware of the memory window.
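To make the shared-ring idea concrete, here is a rough sketch of a single-producer, single-consumer packet ring laid out in a memory-mapped buffer. It is not the authors' implementation: the header layout, slot size, and the names PacketRing and make_ring are made up for illustration, and an anonymous mmap stands in for the CXL window. It also ignores the CXL ordering rules (flushes and fences between writing a slot and publishing the index), which the talk covers and which Python cannot express.

```python
import mmap
import struct

# Hypothetical layout: an 8-byte header followed by fixed-size slots.
HDR = struct.Struct("<II")   # head (consumer index), tail (producer index)
IDX = struct.Struct("<I")    # a single 32-bit index
LEN = struct.Struct("<H")    # per-slot payload length prefix
SLOT_SIZE = 2048             # assumed: large enough for one MTU-sized packet
NUM_SLOTS = 8                # power of two, so index wraparound stays consistent


class PacketRing:
    """Single-producer, single-consumer ring over a shared byte buffer."""

    def __init__(self, buf, num_slots=NUM_SLOTS, slot_size=SLOT_SIZE):
        self.buf = buf
        self.num_slots = num_slots
        self.slot_size = slot_size

    def _slot_offset(self, index):
        return HDR.size + (index % self.num_slots) * self.slot_size

    def push(self, packet):
        """Producer side: copy one packet in, then publish the new tail."""
        if len(packet) > self.slot_size - LEN.size:
            raise ValueError("packet larger than slot")
        head, tail = HDR.unpack_from(self.buf, 0)
        if (tail - head) & 0xFFFFFFFF == self.num_slots:
            return False  # ring full, caller may retry or drop
        off = self._slot_offset(tail)
        LEN.pack_into(self.buf, off, len(packet))
        start = off + LEN.size
        self.buf[start:start + len(packet)] = packet
        # On real hardware a write barrier or flush would go here,
        # before the tail update makes the slot visible to the peer.
        IDX.pack_into(self.buf, 4, (tail + 1) & 0xFFFFFFFF)
        return True

    def pop(self):
        """Consumer side: copy one packet out, then release the slot."""
        head, tail = HDR.unpack_from(self.buf, 0)
        if head == tail:
            return None  # ring empty
        off = self._slot_offset(head)
        (length,) = LEN.unpack_from(self.buf, off)
        start = off + LEN.size
        packet = bytes(self.buf[start:start + length])
        IDX.pack_into(self.buf, 0, (head + 1) & 0xFFFFFFFF)
        return packet


def make_ring():
    """Create a ring in anonymous memory (a stand-in for the CXL window)."""
    size = HDR.size + NUM_SLOTS * SLOT_SIZE
    return PacketRing(mmap.mmap(-1, size))  # zero-filled, so head == tail == 0
```

In a setup like the one described, each daemon would presumably read packets from its /dev/net/tun file descriptor and push them into one ring, while popping from a second ring in the opposite direction and writing those packets back to its own TUN interface.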
Measurements show 60x less host-link traffic than host-side compute and 2.56x FAISS throughput scaling across four cards.
Vijay and Gaurav would like to find partners in this crime to build composable AI memory fabrics on top of Linux networking.
Come, listen, provide feedback and, yes, collaborate!
cheers, jamal