It is turtles^wbatching all the way down folks!
Several parts of the kernel network stack use batching
for performance enhancement.
Maciej Fijałkowski, Bjorn Topel and Krzysztof Kazimierczak
feel we could do a little more. They borrow ideas from other high
performance packet processing solutions, such as DPDK,
and tweak those ideas to an "XDP first" design - meaning drivers
that are optimized for the case when all packets are processed by XDP.
In this talk, they describe how they enabled batching at both the driver
and XDP level to improve performance for two sample drivers, Intel i40e
and ice.
Maciej et al will share the performance improvements gained and
propose some ideas on how these batching techniques can equally be
applied at other BPF hooks (socket send/recv, traffic control, etc.).
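The core idea is generic: amortize a fixed per-packet cost (a doorbell
write, an indirect call) over a whole batch. A toy Python sketch of that
idea - conceptual only, not the authors' driver code:

```python
# Conceptual sketch, not the authors' code: batching pays a fixed
# per-call cost (e.g. a doorbell / tail-pointer update) once per batch
# instead of once per packet.

def process_batched(packets, handle, flush, batch_size=64):
    """Handle packets in batches, paying the fixed 'flush' cost per batch."""
    for i in range(0, len(packets), batch_size):
        for pkt in packets[i:i + batch_size]:
            handle(pkt)
        flush()  # one doorbell for the whole batch

handled = []
flushes = []
process_batched(range(256), handled.append, lambda: flushes.append(1))
print(len(handled), len(flushes))  # 256 packets handled, only 4 flushes
```

With per-packet processing the flush cost would be paid 256 times; the
driver-level batching in the talk targets exactly this kind of
amortization.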
More info:
https://netdevconf.info/0x14-staging/session.html?talk-it-is-batching-all-t…
cheers,
jamal
Existing performance enhancing mechanisms such as TCP auto-tuning
or programmatic async (epoll, etc.) events such as "ready to send"
help applications sustain high throughput even under high
bandwidth-delay product scenarios.
But: the folks at the Tor project have found that when you have
thousands of active TCP sockets transmitting high volumes of data,
many of them simultaneously (as is the case in the Tor anonymity
network), these mechanisms are insufficient.
Local buffer bloat becomes a hindrance.
In this talk, David Goulet and Rob Jansen introduce a new
async event that helps applications overcome these issues.
The new event supplements and extends the current write
"ready to send" event that triggers when a socket buffer
has free space.
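For reference, the existing write-readiness notification that the new
event extends can be exercised from userspace with the stdlib selectors
module; a minimal sketch:

```python
# Minimal sketch of today's "ready to send" notification: wait for
# write readiness on a socket using the stdlib selectors module.
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)

# Register interest in write readiness; with an empty send buffer this
# fires immediately, i.e. "the socket buffer has free space".
sel.register(a, selectors.EVENT_WRITE)
events = sel.select(timeout=1)
for key, mask in events:
    if mask & selectors.EVENT_WRITE:
        sent = key.fileobj.send(b"hello")
        print("writable, sent", sent, "bytes")

sel.unregister(a)
a.close()
b.close()
```

The Tor observation is that "buffer has free space" alone is too eager
when thousands of such sockets are active at once; the talk's new event
aims to fire on a more useful condition.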
David and Rob will present data, give a more detailed
description of the problem, and then show the effect that
such a change could have on performance through a small-scale
simulation.
More info:
https://netdevconf.info/0x14-staging/session.html?talk-reducing-kernel-queu…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
There are times when you want to recreate a network test
that would typically take a while to complete. In addition,
when you run the tests, you want the results to be consistent
in different runs.
In this talk, Johannes Berg and Richard Weinberger describe
their solution to this requirement. They introduce a
mechanism to do "time travel" in User Mode Linux (UML) with a
virtual clock.
The time travel mode allows for reproducible testing at significantly
faster test execution times; as an example, on a relatively slow
laptop:
two simulated machines in such a setup can simulate 61 pings,
at a default 1 second interval, in about 1.6 seconds real time
(as opposed to 61 seconds).
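The virtual-clock trick can be illustrated with a toy discrete-event
loop (this is not UML's actual implementation): instead of sleeping
between events, the simulation jumps its clock straight to the next
event's timestamp:

```python
# Toy illustration of the virtual-clock idea: jump simulated time to
# the next event instead of actually waiting for it.
import heapq
import time

def run_simulation(events):
    """events: list of (due_time, name) tuples. Returns (end_clock, log)."""
    heapq.heapify(events)
    clock = 0.0
    log = []
    while events:
        due, name = heapq.heappop(events)
        clock = due                   # "time travel": no real sleeping
        log.append((clock, name))
    return clock, log

start = time.monotonic()
# 61 pings at a 1-second virtual interval, as in the talk's example.
end_clock, log = run_simulation([(float(i), f"ping-{i}") for i in range(61)])
real = time.monotonic() - start
print(f"simulated {end_clock:.0f}s of pings in {real:.3f}s real time")
```

A whole minute of simulated pings completes in a few milliseconds of
real time, and every run produces the identical event order - the two
properties the talk is after.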
More info:
https://netdevconf.info/0x14/session.html?talk-time-travel-linux-network-si…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
The iNet Wireless Daemon, aka iwd, was created to be an
alternative to wpa_supplicant. Iwd provides a more complete
solution set for WiFi than wpa_supplicant - with a
much longer list of features.
Marcel Holtmann will provide insights into iwd architecture and
how it takes advantage of kernel features to provide good user
experience. In addition, Marcel will go over lessons learned
in the 5 years since iwd's inception.
More info:
https://netdevconf.info/0x14/session.html?talk-5-years-of-iwd-lessons-learn…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
New virtualization deployments call for high-density virtual
functions with more frequent lifetime recycling. Unfortunately,
PCIe SR-IOV has a limited function count and large instantiation
overhead. A solution to these challenges is to take a PCIe device
and "split" it into multiple subdevices. Each subdevice gets its
own virtual port(s), queues, and named resources; combined
with TC and switchdev offloads, this approach overcomes the SR-IOV
limitations.
In this talk Parav Pandit introduces devlink enhancements
to manage such sub functions.
Parav first discusses how devlink is used to lifecycle,
configure and deploy accelerated sub-functions with eswitch
offload support. He then discusses the plumbing done using
virtbus to achieve persistent naming of netdevices and
rdma devices. Parav will also cover how such a model
addresses the SmartNIC use case, where a sub-function NIC is
hot-plugged into the host system in a secure manner.
More info:
https://netdevconf.info/0x14/session.html?talk-devlink-enhancements-for-sub…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
Sorry to disappoint folks, but we are _not_ going to have any
talks on blockchain at 0x14! If this statement infuriates you
then please demonstrate your outrage by submitting a proposal
when the CFS for 0x15 opens up!
Now that we got that out of the way...
What is Machine Learning?
Machine learning (ML) is a subset of artificial intelligence (AI)
that lets computer systems solve a specific task without
explicit instructions, relying instead on patterns and
inference.
But How Does ML Apply To Networking?
Machine learning can be used to observe patterns in network
traffic or configuration and use the resulting data for a
variety of purposes; a sample of the space:
- dynamic congestion control (goodbye named congestion control algos!),
see for example applicability of:
https://netdevconf.info/0x12/session.html?restructuring-endpoint-congestion…
- improve datapath performance
- path optimization
- anomaly detection from a baseline expectation and using the
resulting data either for security or optimization end goals
- etc.
At 0x14 we have two moonshot talks that look at using ML for
networking on Linux. These talks will be part of the ML
workshop, which is debuting at 0x14. We hope to be able to
solicit discussions and feedback on the subject and hopefully
make this workshop a fixture in future netdev confs.
In the first moonshot talk Marta Plantykow, Piotr Raczynski,
Maciej Machnikowski and Pawel Szymanski will discuss an approach
to optimize networking performance alongside CPU utilization
with ML.
Marta et al propose an approach which will use ML to study RSS
patterns and the CPU spread and then react dynamically to
modify RSS hash parameters to improve CPU spread.
The authors will go over the challenges they overcame, show some
performance numbers and solicit feedback.
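The feedback loop can be roughly pictured as: measure how flows spread
over RX queues, and if the spread is poor, change the hash parameters
and re-measure. A toy sketch of that loop, with a brute-force key
search standing in for the ML component and plain SHA-256 standing in
for the NIC's RSS hash:

```python
# Toy feedback loop: pick the RSS "key" that best balances flows over
# queues. A brute-force search stands in for the ML component; SHA-256
# stands in for the NIC's actual RSS hash.
import hashlib
from collections import Counter

NUM_QUEUES = 4
FLOWS = [("10.0.0.%d" % i, 80 + i % 3) for i in range(64)]

def queue_for(key, flow):
    """Map a flow to an RX queue under a given hash key."""
    digest = hashlib.sha256(key + repr(flow).encode()).digest()
    return digest[0] % NUM_QUEUES

def imbalance(key):
    """Spread metric: max minus min per-queue flow count (0 is perfect)."""
    load = Counter(queue_for(key, f) for f in FLOWS)
    counts = [load.get(q, 0) for q in range(NUM_QUEUES)]
    return max(counts) - min(counts)

# "Learning" stand-in: try candidate keys, keep the best spread.
best_key = min((bytes([k]) for k in range(32)), key=imbalance)
print("imbalance went from", imbalance(b"\x00"), "to", imbalance(best_key))
```

The talk's approach replaces the brute-force search with a model that
learns RSS patterns and CPU spread and reacts dynamically.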
More info:
https://netdevconf.info/0x14/session.html?talk-performance-optimization-usi…
Our second talk is from Maciej Paczkowski, Aleksandra Jereczek, and
Patrycja Kochmanska. In this talk Maciej et al integrate ML into FRR
to understand how best to optimize path selection in an environment
with multiple simultaneous link faults and incessant link flaps.
Could routing decisions be better helped by ML hooks in
the kernel/datapath? Could we make use of offloading some of
the algos to AI hardware?
The authors will go over the challenges they overcame,
and solicit feedback.
More info:
https://netdevconf.info/0x14/session.html?talk-machine-learning-in-packet-r…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
In this moonshot talk Roni Efraim, Bar Yanai and Yossi Kuperman
come up with a clever way to do efficient forwarding by introducing
an offloaded TC hash action.
An example use case of this feature is closely emulating what Equal
Cost Multipath (ECMP) based forwarding does.
ECMP-like behavior can be achieved by enabling a policy which will:
- first compute a hash on a classical 5-tuple in hardware
- have kernel s/ware receive the hash ID as metadata
- use the resulting hash ID as a tc filter chain ID and
  jump to that filter chain
- within the destination chain, look up a much smaller set
  of tc flower rules (in the equivalent 5-tuple space) and
  execute the associated leaf action(s), such as setting the
  next-hop MAC address
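The lookup structure those steps describe can be sketched in software
(the offloaded hash action itself is the talk's proposal; a plain
SHA-256 stands in for the hardware hash here):

```python
# Software sketch of hash-to-chain dispatch: hash the 5-tuple, use the
# hash ID to pick a chain, then search only that chain's (small) rule set.
import hashlib

NUM_CHAINS = 16

def hash_chain_id(five_tuple):
    """Map a 5-tuple (src, dst, sport, dport, proto) to a chain ID."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CHAINS

# chain_id -> list of (five_tuple, action); each chain holds only the
# rules whose keys hash to it, so each lookup scans a small set.
chains = {i: [] for i in range(NUM_CHAINS)}

def add_rule(five_tuple, action):
    chains[hash_chain_id(five_tuple)].append((five_tuple, action))

def classify(five_tuple):
    for key, action in chains[hash_chain_id(five_tuple)]:
        if key == five_tuple:
            return action
    return "miss"

flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
add_rule(flow, "set-nexthop-mac-aa:bb:cc:dd:ee:ff")
print(classify(flow))  # hits the chain's leaf action
print(classify(("10.0.0.9", "10.0.0.2", 1, 2, "udp")))  # miss
```

With the hash computed in hardware and chains mapped to tc filter
chains, the kernel only ever searches the small per-chain rule set.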
Roni et al will suggest the TC extension semantics
to define the hash offload and solicit feedback from
the community. They will further discuss the challenges
involved.
More info:
https://netdevconf.info/0x14/session.html?talk-extending-tc-with-5-tuple-ha…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
Let's start with some context.
The Problem: TCP SYN DDOS attacks.
TCP SYN attacks flood a targeted server with SYN requests.
Each SYN request received by the server is responded to with
a SYN ACK which results in a connection state being created
and put in a half-open (SYN RCVD) TCP state awaiting an ACK to
come back. *The ACK response never comes* and the SYN
requests keep coming in resulting in more half-open state
creation in the backlog...
At some point during this attack, there will be so much
TCP half-open state that the targeted server loses the ability
to respond to new SYN requests, because all the available
backlog resources have been exhausted...
How does one defend against a SYN flood attack?
A popular defense against SYN attacks uses what is known as
SYN cookies.
So, how do SYN cookies work?
When the server sees the TCP SYN, it constructs the SYN ACK
using a sequence # created from a cryptographic hash of
some of the flow attributes. The computed sequence # has
enough flow attributes encoded in it that it can later
(on a response to this sequence #) be used to reconstruct the
original SYN request, should the original sender respond.
Two possibilities for responses:
1) If this was an attack, then the server will never
receive a response.
This is no big deal, since we are not storing any half-open info
in the backlog; therefore no server resources are wasted in
anticipation of that response.
2) This is a legit client request - in which case there will be
a response coming back. When the ACK is received, we look at
the ACK sequence # and reconstruct the SYN queue entry using the
information that was originally encoded in the SYN ACK.
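The stateless round trip described above can be modeled in a few lines.
This is a toy model - real TCP SYN cookies pack the MSS and a timestamp
into specific bit fields of the sequence number - but it shows the core
trick: the server stores nothing and verifies by recomputing.

```python
# Toy SYN cookie model: derive the server's sequence number from flow
# attributes keyed by a secret; verify the final ACK by recomputation.
import hashlib
import hmac
import os

SECRET = os.urandom(16)

def make_cookie(saddr, daddr, sport, dport, client_isn):
    """Derive the server's SYN ACK sequence number from flow attributes."""
    msg = f"{saddr}|{daddr}|{sport}|{dport}|{client_isn}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def check_ack(saddr, daddr, sport, dport, client_isn, ack_seq):
    """On the final ACK, recompute the cookie; no per-connection state."""
    return ack_seq == (make_cookie(saddr, daddr, sport, dport, client_isn) + 1) % 2**32

cookie = make_cookie("192.0.2.1", "198.51.100.7", 40000, 443, 1000)
# A legit client acknowledges cookie + 1; an attacker never answers.
print(check_ack("192.0.2.1", "198.51.100.7", 40000, 443, 1000,
                (cookie + 1) % 2**32))  # True
print(check_ack("192.0.2.1", "198.51.100.7", 40000, 443, 1000,
                cookie))                # False: wrong ack sequence #
```

The per-SYN cost is exactly one keyed hash, which is why where in the
stack that hash runs (SynGate's motivation, below) matters so much.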
SYN cookies are effective, but:
the operation requires that packets traverse the TCP/IP layers
all the way up. For a busy server, the extra code path and the
hash computation constrain how fast you can issue SYN cookies back.
In this talk, Petar Penkov, Eric Dumazet and Stanislav Fomichev
discuss SynGate, an XDP-based approach to handling SYN cookies.
By moving the response lower in the stack, this approach
increases the rate at which a host can issue SYN cookies
and therefore improves its resilience to SYN flood attacks.
Petar et al will detail the design of this solution, the advantages
eBPF provided them, and the challenges faced during development
of SynGate; finally they will discuss areas they are considering
for improvement.
More info:
https://netdevconf.info/0x14/session.html?talk-issuing-SYN-cookies-in-XDP
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
SmartNICs are introducing a new paradigm in the hardware
offload/acceleration world - that of adding additional
general purpose processors on the NIC. Try to visualize
the NIC as almost its own machine connected to your host,
meaning you can, for example, move your control plane to
the NIC.
In this talk, Or Gerlitz and Andy Gospodarek start
by describing a high-level view of this class of NICs.
They will then explore common deployment methods
and models and help us better understand how this new
class of NICs operates by exploring:
- what is out there in terms of networking open
source code/features
- automation recipes and scripts to onboard
software infra
- common kernel offload techniques in use
- upstream kernel and distro SoC support
Or and Andy will walk through a few working examples
that are available for use across devices from
multiple, open SmartNIC vendors. They will also
discuss approaches of integrating into known
cloud orchestration systems like OpenStack
and Kubernetes.
More info:
https://netdevconf.info/0x14/session.html?talk-taking-control-of-your-Smart…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
ADQ (Application Device Queues) support has been in the
kernel for a while now. ADQ enables network application
data to be isolated to specific (symmetric rx/tx) hardware
netdevice queue pair(s).
Application-specific data is steered towards these dedicated
queues on ingress and rate controlled in the egress direction -
all using policy definitions. ADQ uses standard Linux interfaces
to achieve its goals.
In this talk, Amritha Nambiar, Kiran Patil and Sridhar Samudrala
will:
- dig into the details of ADQ architecture and operations.
- illustrate collected data on how ADQ helps to improve
predictability by reducing jitter, lowering latency and improving
throughput.
- show, via an example application, how developers can take
advantage of ADQ.
Amritha et al will also discuss future plans for ADQ such
as enabling busy polling on AF_XDP sockets by associating a
NAPI_ID to an AF_XDP socket at bind time.
More info:
https://netdevconf.info/0x14/session.html?talk-ADQ-for-system-level-network…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal