David Ahern and Leon Romanovsky will chair a workshop on AI Networks with a focus on RoCEv2 and the role of "netdev". AI training requires high-bandwidth, low-latency networks and networking stacks, since training times are highly dependent on tail latency. At 800 Gbps per Ethernet port, with multiple ports per system, the inefficiencies of socket-based interfaces are abundantly exposed. For that reason, any solution that uses these traditional network interfaces is incapable of meeting the desired high-bandwidth and low-latency requirements.
In this workshop, David and Leon will discuss the RoCEv2 protocol for AI training networks and the role of "netdev", and contrast that effort with proposals to shunt device QPs to TCP[1], as well as devmem[2] and io_uring[3].
More details here: https://netdevconf.info/0x19/sessions/workshop/ai-networks-rocev2-and-the-ro...
cheers, jamal
[1] https://netdevconf.info/0x16/sessions/talk/merging-the-networking-worlds.htm...
[2] https://netdevconf.info/0x17/sessions/talk/device-memory-tcp.html
[3] https://netdevconf.info/0x17/sessions/talk/fast-zc-rx-data-plane-using-io-ur...