I'll spend a bit more time summarizing this talk because it will no
doubt generate passions in (some) people and curiosity in others.
If you thought that eBPF is the only approach to extend the kernel,
think again. In this talk, Lourival Vieira Neto et al describe a
framework, Lunatik, which facilitates dynamically injecting Lua scripts
into the kernel to extend kernel mechanisms.
Some context:
What is Lua?
1) It is a scripting language that is very widely deployed as an
embeddable _extension language_. Game programming in particular
predominantly uses it for extensions. OpenWrt uses it as a config
language and there are many others listed here:
https://en.wikipedia.org/wiki/List_of_applications_using_Lua.
Wide deployment means it has seen exposure in many environments.
2) It is a tiny language (the whole language is about 200KB) that is
designed to be easy to embed - in particular in C
3) It is considered relatively secure. Despite the wide deployment
over many years there have been very few CVEs reported against Lua in
more than a decade. See:
https://www.cvedetails.com/product/28436/LUA-LUA.html?vendor_id=13641
Why scripting?
Extending kernels with scripts is nothing new. But why do it?
The simple answer is to compare a compiler-driven approach (think C)
with a scripting language (think Bash or Python).
The former requires a complex development environment (think eBPF
needing the latest clang, gcc, the correct libbpf, etc.) while the latter
provides faster turnaround in development and deployment (you need
a kernel with the Lua VM but not much after that in terms of tooling)
and, of course, a much simpler and more stable ABI. Script-based
execution is often not as performant but is often more usable,
relatively speaking.
And back to the talk...
The Lunatik framework has rich coverage across different kernel hooks
and has been used to script different Linux subsystems such as CPUfreq,
Sockets, RCU, ULP, Netfilter, and now XDP. Lunatik has been around for a
while, just not upstreamed - and to give a little sample space of its
deployment: NFLua is currently deployed _in production in over 20
million home routers_!
You don't want a rogue script to kill your kernel and system.
Lourival et al will discuss the challenges and approach taken in
allowing script injection into the kernel while still maintaining
correctness, isolation, and liveness.
They will then describe NFLua and introduce XDPLua, which allows
users to extend XDP with Lua. XDPLua aims to replace NFLua.
The authors will detail how Lua can be used within XDP:
either standalone, directly invoking the currently exposed eBPF helpers,
or alternatively by having eBPF programs invoke Lua scripts.
And last but not least: Lourival et al will provide comparative
performance data for packet filtering between iptables, eBPF,
NFLua and XDPLua.
Sounds exciting? Come to the conference, listen, learn, and engage
the speakers!
More info:
https://netdevconf.info/0x14/session.html?talk-linux-network-scripting-with…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
There has been an ongoing effort in the community to get Zero-copy
working over the last few years - with much success.
While both MSG_ZEROCOPY (zero copy transmit) and socket mmap (zero copy
receive) have been out in the wild for some time now, there are no known
high scale open source application consumers of these interfaces.
In this talk, Or Gerlitz will describe how he makes use of these
interfaces to integrate into SPDK (https://spdk.io/) - an open source
storage framework which uses sockets for NVMe-over-TCP in a smart NIC
environment. The goal is to use kernel uAPIs while achieving high scale
performance.
Or will delve into MSG_ZEROCOPY and the challenges that he had to
deal with. He will describe what an app author needs to understand and
how best to tie the app state machine into the interfaces, so that it
handles both transaction responses from the peer app and zero-copy
notifications from the local socket provider. Should the app use ZC with
an all-or-nothing approach, or sometimes yes and other times no?
Come to the talk to get the answer and advice on how to effectively use
these interfaces.
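To picture the state-machine tie-in: a send buffer can only be reused
once both the peer's transaction response and the kernel's zero-copy
completion have arrived. The sketch below models just that bookkeeping
in plain Python, with no real sockets. The class and method names are
hypothetical and this is not Or's implementation. The only kernel
behaviour assumed is the documented MSG_ZEROCOPY convention that each
zero-copy send gets a per-socket sequence number starting at 0, and that
one completion notification covers an inclusive range of those numbers.

```python
class ZeroCopyTracker:
    """Hypothetical bookkeeping for buffers sent with MSG_ZEROCOPY."""

    def __init__(self):
        self.next_seq = 0   # mirrors the kernel's per-socket send counter
        self.pending = {}   # seq -> {"buf", "acked", "completed"}
        self.released = []  # buffers that are safe to reuse, in release order

    def on_send(self, buf):
        """Record a zero-copy send and return the sequence number it got."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = {"buf": buf, "acked": False, "completed": False}
        return seq

    def on_peer_response(self, seq):
        """The peer application answered the transaction carried by seq."""
        self.pending[seq]["acked"] = True
        self._maybe_release(seq)

    def on_zerocopy_completion(self, lo, hi):
        """The error queue reported completion for sends lo..hi (inclusive)."""
        for seq in range(lo, hi + 1):
            if seq in self.pending:
                self.pending[seq]["completed"] = True
                self._maybe_release(seq)

    def _maybe_release(self, seq):
        """Release the buffer only when both events have been seen."""
        entry = self.pending[seq]
        if entry["acked"] and entry["completed"]:
            self.released.append(entry["buf"])
            del self.pending[seq]


tracker = ZeroCopyTracker()
first = tracker.on_send(b"request-A")
second = tracker.on_send(b"request-B")
tracker.on_peer_response(first)        # a response alone is not enough
tracker.on_zerocopy_completion(0, 1)   # one notification may cover several sends
```

After these calls only request-A has been released; request-B has its
completion but is still waiting for the peer's response.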
In addition Or will spend time going into the performance analysis
details, correlating I/O performance vis-a-vis CPU cost (which is often
tricky to get right with traditional tools like profilers/flame graphs).
More info:
https://netdevconf.info/0x14/session.html?talk-storage-application-performa…
cheers,
jamal
MRP (Media Redundancy Protocol) is an open standard for ring topologies
in industrial Ethernet networks, defined in IEC 62439-2. In an
MRP-enabled network each Ethernet switch is connected to two other
switches, forming a ring. An MRP-enabled ring can overcome a single
point of link failure with a worst-case recovery time of 30ms - which
is faster than STP.
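Why a ring tolerates any single link failure can be checked with a few
lines of graph code. This is only a connectivity sketch with
hypothetical helper names: it assumes a ring of at least three
switches and does not model MRP frames, port blocking or the recovery
timers.

```python
from collections import deque


def ring_links(n):
    """Links of a ring of n >= 3 switches: switch i connects to (i + 1) % n."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def is_connected(n, links):
    """Breadth-first search from switch 0 over the given links."""
    neighbours = {i: set() for i in range(n)}
    for link in links:
        a, b = tuple(link)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, queue = {0}, deque([0])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == n


def survives_single_failure(n):
    """True if every switch stays reachable whichever one link is cut."""
    links = ring_links(n)
    return all(is_connected(n, links - {failed}) for failed in links)
```

Cutting any one link leaves a line that still reaches every switch;
cutting two links that do not share a switch splits the ring.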
In this talk Horatiu Vultur will describe the MRP protocol in
some detail. They will then proceed to discuss the effort to add support
to the kernel: the different implementation approaches considered and
the eventual implementation path taken after receiving feedback on the
mailing list. And last but not least, future work will be discussed
including hardware offload of MRP as well as preliminary results
comparing hardware-offloaded MRP vs non-offloaded version.
More info:
https://netdevconf.info/0x14/session.html?talk-adding-MRP-to-the-linux-kern…
cheers,
jamal
RPL is an IPv6 Routing Protocol for Low-Power and Lossy Networks defined
in RFC 6550. RFC 6550 defines a mode that is known as "storing" mode.
RFC 6554 defines a "non-storing" mode of route propagation.
While there are several other RPL open source implementations, they
all implement RPL using the "storing" mode which propagates routes
via ICMPv6. There are no open source "non-storing" mode implementations.
In this talk, Alexander Aring et al will discuss an implementation
that uses "non-storing" mode (RFC 6554).
In the non-storing mode (per RFC 6554), route propagation is done via
the IPv6 Routing Header.
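In non-storing mode only the root keeps routing state, and it spells
out the downward path hop by hop in the routing header. A minimal
sketch of that path computation follows, assuming the root has already
learned each node's parent. The `parents` table, the node names and the
`source_route` helper are all hypothetical, and no packet format is
modelled here.

```python
def source_route(parents, root, target):
    """Return the hop list root -> target by walking parent pointers upward."""
    path = [target]
    while path[-1] != root:
        node = path[-1]
        if node not in parents:
            raise ValueError(f"no known parent for {node!r}")
        parent = parents[node]
        if parent in path:
            raise ValueError("loop in parent table")
        path.append(parent)
    path.reverse()
    return path


# Hypothetical topology: root - a - b - c, with d also hanging off a.
parents = {"a": "root", "b": "a", "c": "b", "d": "a"}
route = source_route(parents, "root", "c")
```

Here `route` lists every hop from the root down to "c"; the
intermediate nodes need no routing table of their own.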
Aring et al discuss the architecture, interface approach, challenges
that they overcame and outstanding future work.
More info:
https://netdevconf.info/0x14/session.html?talk-extend-segment-routing-for-R…
cheers,
jamal
Operations, Administration, and Maintenance (OAM)
refers to a set of techniques and mechanisms for performing
fault detection, isolation and performance measurements.
Classical approaches such as traceroute, ping, etc. can now be
improved by collecting more granular and precise per-packet telemetry.
The IETF is currently in the process of standardizing In-situ OAM (IOAM)
to allow collecting operational information along a path.
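The idea of collecting information "along a path" can be sketched as a
packet that carries a fixed number of telemetry slots, with each node
it crosses filling in the next free one. This is a simplified model,
not the IOAM wire format and not the implementation from the talk. The
field names (`node_id`, `queue_depth`) and the `transit` helper are
assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TracedPacket:
    payload: bytes
    max_hops: int                       # slots reserved for telemetry records
    trace: list = field(default_factory=list)
    overflow: bool = False              # set when a node found no free slot


def transit(packet, node_id, queue_depth):
    """Called at each hop: record this node's data if a slot is left."""
    if len(packet.trace) < packet.max_hops:
        packet.trace.append({"node_id": node_id, "queue_depth": queue_depth})
    else:
        packet.overflow = True
    return packet


pkt = TracedPacket(payload=b"data", max_hops=2)
for node_id, depth in [(1, 0), (2, 7), (3, 1)]:
    transit(pkt, node_id, depth)
```

With two slots and three hops, the first two nodes are recorded and the
third only marks the packet as overflowed.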
In this talk Justin Iurman et al discuss an implementation of IOAM
for the Linux kernel with IPv6 as the encapsulation protocol.
They will discuss the details of their approach and demonstrate
evaluation results.
More info:
https://netdevconf.info/0x14/session.html?talk-implementation-of-IPv6-IOAM-…
cheers,
jamal
The PC has accepted a new workshop session.
Donald Sharp and David Lamparter will chair the FRRouting
(https://frrouting.org/) workshop.
Current agenda for the workshop includes:
* Where to put kernel boundary conventions:
* Current Status on netconf/yang models
* Status of using kernel nexthop groups in FRR
* FRR feature rundown:
Recent past and future work
* Installing FRR for kernel developers
For more details:
https://netdevconf.info/0x14/session.html?workshop-FRR
cheers,
jamal
Hierarchical bandwidth management is a very important packet
service for a lot of networking use cases (ranging from large data
centres to service provider use cases, etc).
Over the last decade, the TC Hierarchical Token Bucket (HTB) qdisc
has emerged as the most popular non-work conserving queueing discipline
for enabling this service in Linux.
HTB is quite flexible and versatile, but at large scale
(think thousands to millions of flows) it comes at a cost:
1) CPU cycles, predominantly due to stalls caused by shared
queue lock contention
2) extensive memory costs when adding many flows.
At 0x14 we have two sessions that are addressing this issue in
different ways.
Our first talk is from Yosef Kuperman, Rony Efraim and Maxim
Mikityanskiy and focuses on offloading HTB to the NIC hardware
(Mellanox cnx5).
Flow classification takes place in the TC egress clsact to avoid
any sort of (queue) locking. Packets are tagged and the offloaded
HTB uses these tags as flow/classids to select the correct queue in
the hierarchy.
Kuperman et al will go over the challenges they overcame, show
performance numbers and solicit feedback.
More Info:
https://netdevconf.info/0x14/session.html?talk-hierarchical-QoS-hardware-of…
Our second talk is from the Google folks Stanislav Fomichev, Eric
Dumazet, Willem de Bruijn, Vlad Dumitrescu, Bill Sommerfeld and
Peter Oskolkov.
Google has for many years utilized HTB and consequently faced scaling
challenges.
With the recent introduction of the Early Departure Time (EDT) model
(see Van Jacobson's keynote on EDT at netdev 0x12), an opportunity has
opened up to achieve the same packet service in a more efficient way.
In this talk, Stan et al describe how they moved away from HTB
altogether.
The packet service is created using a composition of BPF, FQ and
EDT. The authors will provide performance numbers, discuss some of the
outstanding challenges and solicit feedback from the community.
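The core of the EDT idea fits in a few lines: rather than queueing
packets in a hierarchy of token buckets, each packet is stamped with
the earliest time it may leave, and a scheduler releases it no earlier
than that. The sketch below shows only the per-flow timestamp
arithmetic in Python. The class name, the rate and the packet sizes are
hypothetical, and this is not the authors' code. In the setup the talk
describes, stamping of this kind would be done in BPF and the release
by FQ.

```python
class EdtPacer:
    """Per-flow pacing state: when may the next packet leave?"""

    def __init__(self, rate_bytes_per_sec):
        self.rate = rate_bytes_per_sec
        self.next_departure_ns = 0

    def stamp(self, now_ns, packet_len):
        """Return this packet's departure time and advance the flow clock."""
        departure = max(now_ns, self.next_departure_ns)
        # Time this packet occupies at the configured rate, in nanoseconds.
        delay_ns = packet_len * 1_000_000_000 // self.rate
        self.next_departure_ns = departure + delay_ns
        return departure


pacer = EdtPacer(rate_bytes_per_sec=1_000_000)
# Three 1000-byte packets arriving back to back at time 0.
stamps = [pacer.stamp(now_ns=0, packet_len=1000) for _ in range(3)]
```

At 1 MB/s a 1000-byte packet occupies 1 ms, so the three stamps are
spaced 1 ms apart even though the packets arrived together. A flow that
has been idle is stamped with the current time and is not delayed.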
More info:
https://www.netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-a…
cheers,
jamal
We are pleased to announce Bronze sponsorship from Meter[meter.com]!
Thank you for your support, Meter.
Meter is the easiest way to get the best Internet and WiFi for offices.
Meter takes care of everything, from ISP selection and installation to
ongoing support and network management. Meter combines powerful
software, custom hardware, and dedicated experts to provide
dramatically better Internet speed, security, and reliability.
More info:
https://netdevconf.info/0x14/news.html?bronze-sponsor-meter-com
cheers,
jamal
In this talk Markuze Alex et al describe how they improved,
by orders of magnitude, client download times of a
global overlay network across public clouds.
The overlay network, known as the Pathway project
(operated by VMware Research), interconnects
geographically spread public clouds and their vast
compute and networking infrastructure.
The secret sauce? KTCP, a Kernel module running on a
modified Linux Kernel which implements novel TCP
splicing.
Markuze and co. will discuss why their approach is
different relative to the many approaches already
out in the wild that implement TCP proxying.
They will present numbers against classical approaches
which demonstrate that KTCP is able to considerably
increase the link utilization by TCP connections and
reduce the connection latency close to its theoretical
minimum.
More info:
https://netdevconf.info/0x14/session.html?talk-kernels-of-splitting-TCP-in-…
cheers,
jamal
Tom Herbert loves moonshots and three-letter acronyms.
First it was XDP and now it is BP4.
In this talk, Tom will introduce BP4 - a Domain Specific Language
for programmable dataplanes based on unifying the best features of
eBPF and P4. The goal of BP4 is “write once, run anywhere, run well!”
BP4 is intended to run in _both software and hardware_ execution
environments.
Central to a BP4 program is a dynamically programmable parser that
supports a wide variety of protocols and permits support for new
protocols to be added on the fly. The BP4 parser semantics include
native support for parsing Variable Length Headers (VLH) that contain
TLVs, flag-fields, or variable length arrays.
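For readers unfamiliar with TLVs, here is what parsing a variable
length header made of them involves. This is a generic Python sketch,
not BP4 syntax. It assumes a hypothetical layout of a 1-byte type and a
1-byte length followed by that many value bytes, and `parse_tlvs` is an
illustrative name.

```python
def parse_tlvs(data):
    """Split data into (type, value) pairs; each TLV is type(1) len(1) value."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type = data[offset]
        length = data[offset + 1]
        start = offset + 2
        end = start + length
        if end > len(data):
            raise ValueError("TLV value runs past end of header")
        tlvs.append((tlv_type, bytes(data[start:end])))
        offset = end
    return tlvs


# Three TLVs: type 1 with two value bytes, type 5 with none, type 7 with one.
header = bytes([0x01, 0x02, 0xAA, 0xBB, 0x05, 0x00, 0x07, 0x01, 0xFF])
options = parse_tlvs(header)
```

The bounds checks are the point: because the length field comes from
the packet itself, a parser must validate it before reading the value.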
Tom will describe the first PoC for BP4 which leverages the eBPF
infrastructure. The PoC implements a flow dissector as a BP4 program by
essentially replicating the functionality of the current Linux kernel
flow_dissector with extra functionality to handle TLV and flag-fields.
The programmable flow dissector will then be used as the basis for a
dynamic tc-flower classification (which will allow protocols to be
programmed and dynamically added for tc-flower processing).
More info:
https://netdevconf.info/0x14/session.html?talk-BP4-byte-code-for-programmab…
cheers,
jamal