- people - netdevconf.info

0x1A: Talk, io_uring ZCRX: Progress and Next Steps
by Jamal Hadi Salim 25 Jun '26

Pavel Begunkov will update us on the latest and greatest in the io_uring ZCRX universe. To make ZCRX reliable for broader workloads, key questions must be addressed: handling refill queue exhaustion when buffers can't be recycled to the kernel, sharing NIC queues across multiple processes, and detecting memory pressure before allocation failures degrade performance. In this talk Pavel covers the latest ZCRX developments tackling these challenges, including recent API additions and their motivations, new features, and performance improvements. He concludes with future directions for zero-copy receive in Linux.

More details: https://netdevconf.info/0x1A/sessions/talk/io_uring-zcrx-progress-and-next-…

cheers,
jamal

0x1A: Talk, Kernel shared memory socket transport
by Jamal Hadi Salim 24 Jun '26

High-performance IPC between two local apps uses zero copy over shared memory. There's a plethora of APIs, each requiring its own library support, and both the sender and receiver MUST use the same IPC library to communicate. Sigh. In this talk, Davide Wei proposes adding kernel support for zero-copy shared memory IPC (both sender and receiver) built on AF_UNIX SOCK_SEQPACKET + io_uring, enabling any userspace library to interoperate without sharing a common IPC library. An RFC prototype is ready for community feedback on uAPI and design.

More details: https://netdevconf.info/0x1A/sessions/talk/kernel-shared-memory-socket-tran…

cheers,
jamal

0x1A: Talk, Line-Rate Cybersecurity: Modern DPI and Encrypted Traffic Fingerprinting at 100 Gbps
by Jamal Hadi Salim 23 Jun '26

Luca Deri and Alfredo Cardigliano present recent advancements in nDPI, an open-source Deep Packet Inspection toolkit, addressing challenges posed by encryption and evasion protocols that limit legacy firewalls. The talk covers using cryptographic fingerprints to identify malicious actors despite encryption, and exposes structural flaws in JA3/JA4 fingerprinting methods when handling ephemeral TLS extensions. Finally, Luca and Alfredo will detail integrating nDPI with the Linux kernel firewall for real-time traffic optimization, plus architectural blueprints using PF_RING and SmartNIC flow managers to achieve deterministic 100 Gbps monitoring and hardware-accelerated enforcement. Come and interact with Luca and Alfredo!

More details: https://netdevconf.info/0x1A/sessions/talk/line-rate-cybersecurity-modern-d…

cheers,
jamal

0x1A: Talk, Thrice the charm: an skb extension for BPF metadata
by Jamal Hadi Salim 22 Jun '26

Jakub Sitnicki revisits storing custom BPF data tied to an skb's lifetime. In this talk Jakub will recap what he has already established about the XDP/skb metadata feature (scope, accessible hooks, wipe conditions, and driver contract) and then discuss two approaches for paths where XDP metadata isn't supported: a BPF map side-stash available today, and an RFC for embedding a BPF metadata buffer in an skb extension chunk. Jakub will compare both from the BPF-hook interface and overhead perspectives, and explain why alternatives (extending metadata lifetime, BPF local storage) failed. For the skb extension approach, he will delve into skb extension internals: compiling out unneeded extensions and avoiding hot-path allocations. The talk closes with open user-API questions: whether a new socket option or cmsg type is needed, how users exchange metadata today and its limits, and how metadata maps to L7 abstractions like streams and requests. Come and engage Jakub!

cheers,
jamal

0x1A: Two talks on offloading security to DPUs
by Jamal Hadi Salim 22 Jun '26

In the first talk Balakrishna Bhamidipati and Vijay Ram Inavolu describe total offload of TLS from the host onto a DPU. They run a reverse proxy on the NIC to handle MCP traffic. This moves TLS termination, OAuth2/JWT validation, and session-aware L7 routing off the host and onto the NIC. The authors will describe the packet path, kTLS activation and fallback, session lifecycle, backend affinity, and SSE relay, showing how the design can be reproduced on commodity DPUs without any kernel modifications.

https://netdevconf.info/0x1A/sessions/talk/macsec-protected-rdma-on-dpus-fr…

In the second talk Alkama Hasan and Vijay Ram Inavolu describe how they tackle the hard problem of security for zero-copy mechanisms like RDMA. In such approaches, the payload bypasses the network stack altogether, leaving the app outside the security fence that typically protects ordinary host networking. Large-scale AI and HPC jobs depend on RDMA to move data between nodes. MACsec is a natural fit to resolve this security challenge at layer 2. So what does such a DPU approach offer beyond the NICs that support MACsec offload today? Well, merely offloading MACsec provides no way to interact with a multi-tenant cluster orchestrator, whereas a DPU-based approach does. IOW, an orchestrator (Kubernetes in this case) has no visibility into which nodes actually offer MACsec-protected RDMA egress, so a workload can't request placement on such a node. The authors address this by making MACsec a schedulable property, exposed in the node ResourceSlice via DRANet-Sec, which allows workloads to land on the right nodes automatically.

https://netdevconf.info/0x1A/sessions/talk/macsec-protected-rdma-on-dpus-fr…

cheers,
jamal

0x1A: Talk, Can Homa and TCP Get Along?
by Jamal Hadi Salim 20 Jun '26

John Ousterhout examines TCP and Homa coexistence. Initial measurements show that when running TCP and Homa concurrently, TCP performs slightly better than Homa because Homa reduces its buffer utilization. However, over time Homa degrades badly because TCP selfishly fails to reciprocate and overloads buffers. John introduces homa_qdisc, a queuing discipline that paces both protocols' traffic, implements SRPT for Homa and limited SRPT for TCP, and balances output during congestion. Results? With homa_qdisc, both protocols improve. When running concurrently, Homa suffers only slight degradation and TCP latency improves. Even better: TCP on its own reaps the benefits of homa_qdisc, with tail latency for short messages nearly halving versus fq_codel. Come and engage with John!

cheers,
jamal
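For readers unfamiliar with SRPT (shortest remaining processing time), here is a minimal Python sketch of the general idea: always transmit from the message with the fewest bytes left. It is only an illustration of the scheduling policy, not homa_qdisc's implementation; the names `Message` and `srpt_order` and the chunk size are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    remaining: int                      # bytes still to send; the only sort key
    name: str = field(compare=False)    # label, ignored when ordering


def srpt_order(messages, chunk):
    """Return the sequence of message names chosen for each transmitted chunk.

    messages: iterable of (name, size_in_bytes) pairs (hypothetical input shape)
    chunk:    bytes sent per scheduling decision (hypothetical, e.g. one packet)
    """
    heap = [Message(size, name) for name, size in messages]
    heapq.heapify(heap)
    sent = []
    while heap:
        msg = heapq.heappop(heap)       # message with the least remaining bytes
        sent.append(msg.name)
        msg.remaining -= min(chunk, msg.remaining)
        if msg.remaining > 0:
            heapq.heappush(heap, msg)   # re-queue with its new, smaller key
    return sent
```

With one large and one small message queued, the small one is picked first, which is why SRPT tends to favor short-message latency.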

0x1A: Talk, AF_XDP copy mode needs more love
by Jamal Hadi Salim 19 Jun '26

In the spirit of the legend Bob Marley (playing while I am sipping coffee and typing this!): AF_XDP: Could you be loved? Jason Xing is showing a lot of love to you. Most drivers can't do zero-copy, so copy mode is the universal path. Jason will talk about how recent correctness fixes (multi-buffer TX leaks, continuation descriptors, TOCTOU metadata) cleared the bugs, while fine-grained profiling unlocked close to 2× throughput.

Questions for you AF_XDP (yes, you!): Could you be locked - with finer grain? Could you be batched - without the pain? No single silver bullet! Just cycle-level perf, contention tracing, cache-line heatmaps.

And the optimization love for you:
- Send without budget blues
- Hot structure reshuffles (cache-line dance)
- Batch xmit: one ring to rule them
- Doorbell coalescing (don't ring it twice)
- Skb allocation acceleration
- IRQ pairs trimmed short

Don't let them fool ya, spinlock's not the ruler. Methodical, mon, methodical, one cache line at a time, that's how we get there!

Ok, enough of my spin, read more details on the talk here: https://netdevconf.info/0x1A/sessions/talk/af_xdp-copy-mode-needs-more-love…

cheers,
jamal

0x1A: Talk, Networking Headless CXL Devices for AI Memory Services
by Jamal Hadi Salim 19 Jun '26

Another talk on the intersection of AI and memory requirements and how networking is a fundamental piece of the equation. AI workloads need large, high-bandwidth, low-latency memory. Composable CXL memory fabrics address this by moving compute to data, running services such as search, caching, and quantization on smart Type-2 devices with local DDR. The trade-off? The device side needs IP endpoints, while CXL devices expose a memory window, not a NIC. In this talk, Vijay Inavolu and Gaurav Agarwal present a Linux virtual-interface path built from stock kernel pieces. Host and device daemons open /dev/net/tun, configure a virtual L3 interface, and mmap a shared CXL HDM-H window as a packet ring, enabling local apps to access memory with no kernel changes. The shared-ring design, CXL ordering rules, and a new host-pod bridge pattern give device-side Linux a cluster-facing service identity, so services are discovered and reached by service IP while clients stay unaware of the memory window. Measurements show 60x less host-link traffic than host-side compute and 2.56x FAISS throughput scaling across four cards. Vijay and Gaurav would like to find partners in this crime to build composable AI memory fabrics on top of Linux networking. Come, listen, provide feedback and yes, collaborate!

cheers,
jamal
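To make the "mmap a shared window as a packet ring" idea concrete, here is a rough Python sketch of a single-producer, single-consumer ring laid out in a memory-mapped buffer. Everything in it is an assumption for illustration: an anonymous mmap stands in for the CXL window, and the layout (`HDR`, `SLOT_SIZE`, `NUM_SLOTS`) and the `PacketRing` class are hypothetical, not the authors' design. It also ignores the CXL ordering rules and memory barriers a real implementation would need.

```python
import mmap
import struct

SLOT_SIZE = 2048             # hypothetical: 2-byte length prefix plus payload
NUM_SLOTS = 8                # hypothetical ring depth (power of two)
HDR = struct.Struct("<II")   # head (consumer index), tail (producer index)
MASK = 0xFFFFFFFF            # indices are free-running 32-bit counters


class PacketRing:
    """Single-producer/single-consumer packet ring over a writable buffer."""

    def __init__(self, buf):
        self.buf = buf

    def push(self, pkt: bytes) -> bool:
        head, tail = HDR.unpack_from(self.buf, 0)
        full = ((tail - head) & MASK) == NUM_SLOTS
        if full or len(pkt) > SLOT_SIZE - 2:
            return False
        off = HDR.size + (tail % NUM_SLOTS) * SLOT_SIZE
        struct.pack_into("<H", self.buf, off, len(pkt))
        self.buf[off + 2:off + 2 + len(pkt)] = pkt
        # Publish the new tail only after the payload is written.
        struct.pack_into("<I", self.buf, 4, (tail + 1) & MASK)
        return True

    def pop(self):
        head, tail = HDR.unpack_from(self.buf, 0)
        if head == tail:
            return None                      # ring is empty
        off = HDR.size + (head % NUM_SLOTS) * SLOT_SIZE
        (length,) = struct.unpack_from("<H", self.buf, off)
        pkt = bytes(self.buf[off + 2:off + 2 + length])
        # Release the slot back to the producer.
        struct.pack_into("<I", self.buf, 0, (head + 1) & MASK)
        return pkt


# Anonymous mapping as a stand-in for the shared memory window.
window = mmap.mmap(-1, HDR.size + NUM_SLOTS * SLOT_SIZE)
ring = PacketRing(window)
```

In a setup like the one described, one side would presumably read packets from its TUN file descriptor and `push` them, while the other side would `pop` and write them to its own TUN device.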

0x1A: Talk, The CXL Fabric End-Game: Bandwidth Realities and Networked Memory for AI Scale
by Jamal Hadi Salim 19 Jun '26

AI/ML workloads have a memory problem. The model memory requirements are growing exponentially but the GPUs just can't keep up. No worries; CXL makes gobs of memory available. The RAM can be disaggregated and placed in racks outside the system, reachable via a network link. PJ says CXL.mem is typically pitched as a memory expansion technology (treating it like a "far" NUMA node), but this framing understates a critical issue: beyond adding latency, a CXL link can offer orders of magnitude less bandwidth than native DDR. As AI workloads push toward distributed memory pools, this bandwidth gap fundamentally changes the cost model for scaling large models across disaggregated memory. PJ has some suggestions for upcoming CXL 2.0 to move beyond simple expansion toward true memory-coherent cluster execution, calling for coordinated work in switching infrastructure, firmware, and Linux kernel networking/memory subsystems. Come, listen and provide feedback to PJ!

cheers,
jamal

0x1A: Talk, Linux QUIC: Bringing a Modern Secure Transport into the Kernel
by Jamal Hadi Salim 18 Jun '26

Xin Long proposes migrating QUIC from user space into the kernel. He introduces QUIC as a native transport via a new IPPROTO_QUIC socket type, enabling direct use by subsystems such as SMB and NFS. The architecture handles transport logic in the kernel while offloading TLS handshakes to user space, and supports POSIX-style APIs. The implementation has been validated through large interoperability tests and benchmarking. Come learn about this shiny new subsystem!

More details: https://netdevconf.info/0x1A/sessions/talk/linux-quic-bringing-a-modern-sec…

cheers,
jamal

PS: Today is the last day for early bird registration, i.e. 50% off!
