Kubernetes (k8s) Container Network Interface (CNI) is a specification for
managing network resources on a Kubernetes cluster. CNI enables
plugin-based networking solutions for containers, ranging in functionality
from IP address management to access control policy management, QoS,
etc. Operators can pick and choose from the many packaged and open
CNI implementations that exist, or create custom CNIs to serve
their needs.
In this talk Rony Efraim and Liel Shoshan describe their approach
to enabling hardware network offload in k8s.
Rony and Liel will illustrate offloading with a number of CNIs that
can be used in conjunction with OVS. They will also
describe how to offload other K8s use cases, like pod-to-pod
intra-cluster networking and enhancing ingress service load balancing via
dp_hash. And last but not least, the speakers will describe the
challenges faced and future work.
More info:
https://netdevconf.info/0x14/session.html?talk-hardware-offload-for-k8s-con…
Reminder, registration is now open and early bird is still in effect.
https://netdevconf.info/0x14/registration.html
cheers,
jamal
As the commercials say - 5G (Fifth Generation) networking is here.
Why is that a big deal?
One appealing aspect: How do you like getting multi-gigabit
bandwidth to your mobile device(s)?
How does that work?
For the first time in commercial mobile networking history,
a vast untapped spectrum has been made available for mass user
consumption in the millimeter wavelength range; in the 5G world
it is referred to as "mmWave".
The spectrum available on mmWave links is what makes multi-Gbps data
rates possible on 5G cellular networks.
The problem with mmWave links is that they are, like most high-frequency
signals, susceptible to blockage. Basic obstructions like trees, snow,
rain, buildings, etc. interfere with the signal. A technique called
beamforming helps but doesn't solve the problem entirely.
To put this in context:
Think of having a link that is very high speed but is constantly
fluctuating in capacity. Then the question is: "How does TCP congestion
control work in this kind of setup?"
In this talk Feng Li, Jae Won Chung and Jamal Hadi Salim[1]
will present results of a study evaluating how various Linux TCP
congestion control algorithm implementations fare over mmWave links.
The authors claim this is _the first ever_ such study on a real
commercial 5G network! The study was carried out on the Verizon 5G
deployment network.
Feng et al will present data on comparing popular TCP congestion
control algorithms, including NewReno, Cubic, BBR and BBRv2
(prepatch) etc. The results show that the performance of TCP on
mmWave links is still highly dependent on the combination of TCP
algorithm and socket buffer sizes.
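To see why socket buffer sizes matter so much here, consider a toy model (my illustration, not the authors' methodology): a sender limited by a fixed send buffer can keep at most that many bytes in flight, so its throughput is capped at buffer/RTT regardless of how much capacity the mmWave link momentarily offers. The numbers below are hypothetical.

```python
# Toy model: on a high bandwidth-delay product link, a buffer-limited
# sender's throughput is capped at min(link capacity, buffer / RTT).

def max_throughput_gbps(link_gbps, sndbuf_bytes, rtt_s):
    """Upper bound on achievable throughput for a buffer-limited sender."""
    window_limit_gbps = (sndbuf_bytes * 8) / rtt_s / 1e9
    return min(link_gbps, window_limit_gbps)

# Hypothetical numbers: a 2 Gbps mmWave link with a 30 ms RTT.
# With a 4 MiB buffer, the buffer (not the link) is the bottleneck:
print(max_throughput_gbps(2.0, 4 * 2**20, 0.030))   # ~1.12 Gbps
# A 16 MiB buffer lets the sender fill the link:
print(max_throughput_gbps(2.0, 16 * 2**20, 0.030))  # 2.0 Gbps
```

When the link capacity itself fluctuates (as it does under mmWave blockage), the congestion control algorithm's window decisions interact with this cap, which is the combination the study measures.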
Without a doubt mmWave links impose new challenges on future transport
layer design and the authors hope this talk will incentivize
more discussions in the community.
[1] Referred to in the third person.
More info:
https://netdevconf.info/0x14/session.html?talk-preliminary-evaluation-of-TC…
cheers,
jamal
It is turtles^wbatching all the way down folks!
Several parts of the kernel network stack use batching
for performance enhancement.
Maciej Fijałkowski, Bjorn Topel and Krzysztof Kazimierczak
feel we could do a little more. They borrow ideas from other high
performance packet processing solutions, such as DPDK,
and adapt those ideas to an "XDP first" design - meaning drivers
that are optimized for the case where all packets are processed by XDP.
In this talk, they describe how they enabled batching at both the driver
and XDP level to improve performance for two sample Intel drivers, i40e
and ice.
Maciej et al. will share the performance improvements gained and
propose some ideas on how these batching techniques can equally be applied
at other BPF hooks (socket send/recv, traffic control, etc).
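The core idea can be sketched in a few lines (a conceptual Python stand-in, not the actual driver code): batching amortizes fixed per-invocation costs, which in the real drivers are function calls, indirect calls, and cache misses, across many packets.

```python
# Conceptual sketch of batching: invoke the handler once per batch
# instead of once per packet, amortizing the fixed per-call overhead.

def process_one_at_a_time(packets, handler):
    calls = 0
    for pkt in packets:
        handler([pkt])   # one handler invocation per packet
        calls += 1
    return calls

def process_batched(packets, handler, batch_size=64):
    calls = 0
    for i in range(0, len(packets), batch_size):
        handler(packets[i:i + batch_size])  # one invocation per batch
        calls += 1
    return calls

pkts = list(range(256))
handled = []
print(process_one_at_a_time(pkts, handled.extend))  # 256 invocations
print(process_batched(pkts, handled.extend))        # 4 invocations
```

In a driver, each "invocation" also carries pipeline stalls from indirect calls into the XDP program, so cutting 256 calls to 4 has an outsized effect.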
More info:
https://netdevconf.info/0x14-staging/session.html?talk-it-is-batching-all-t…
cheers,
jamal
Existing performance enhancing mechanisms such as TCP auto-tuning
or programmatic async events (epoll, etc) such as "ready to send"
help applications sustain high throughput even under high
bandwidth-delay product scenarios.
But the folks at the Tor project have found that when you have
thousands of active TCP sockets transmitting high volumes of data,
many of them simultaneously (as is the case in the Tor anonymity
network), these mechanisms are insufficient:
local bufferbloat becomes a hindrance.
In this talk, David Goulet and Rob Jansen introduce a new
async event that helps applications overcome these issues.
The new event supplements and extends the current write
"ready to send" event that triggers when a socket buffer
has free space.
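For reference, here is a minimal sketch of today's "ready to send" notification that the proposed event would supplement (using Python's portable `selectors` module, which maps to epoll's EPOLLOUT on Linux): the event fires whenever the send buffer has free space, saying nothing about how much unsent data is still queued below the application.

```python
import selectors
import socket

# EVENT_WRITE corresponds to epoll's EPOLLOUT "ready to send" event.
sel = selectors.DefaultSelector()
left, right = socket.socketpair()
left.setblocking(False)

sel.register(left, selectors.EVENT_WRITE)
# A fresh socket has an empty send buffer, so it reports writable
# immediately -- even though, for sockets under load, the kernel may
# still hold plenty of queued data. That gap is what the new event
# is meant to address.
events = sel.select(timeout=1)
writable = any(key.fileobj is left for key, mask in events)
print(writable)  # True

sel.close(); left.close(); right.close()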
David and Rob will present data, give a more detailed
description of the problem, and then show the effect that
such a change could have on performance through a small
scale simulation.
More info:
https://netdevconf.info/0x14-staging/session.html?talk-reducing-kernel-queu…
cheers,
jamal
There are times when you want to recreate a network test
that would typically take a while to complete. In addition,
when you run the tests, you want the results to be consistent
across different runs.
In this talk, Johannes Berg and Richard Weinberger describe
their solution to this requirement. They introduce a
mechanism to do "time travel" in User Mode Linux (UML) with a
virtual clock.
The time travel mode allows for reproducible testing at significantly
faster test execution times; as an example, on a relatively slow
laptop:
two simulated machines in such a setup can simulate 61 pings,
at a default 1 second interval, in about 1.6 seconds real time
(as opposed to 61 seconds).
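The trick behind those numbers can be illustrated with a toy discrete-event loop (my sketch, not UML's implementation): instead of sleeping between pings, the simulator jumps its virtual clock straight to the next scheduled event, so 60 seconds of simulated time elapse in near-zero real time.

```python
import heapq
import time

def simulate_pings(count, interval=1.0):
    """Run `count` pings at `interval` seconds apart on a virtual clock."""
    clock = 0.0                      # virtual time, in seconds
    events = [(i * interval, "ping") for i in range(count)]
    heapq.heapify(events)
    pings = 0
    while events:
        clock, _ = heapq.heappop(events)  # "time travel" to the next event
        pings += 1
    return pings, clock

start = time.monotonic()
pings, virtual_elapsed = simulate_pings(61)
real_elapsed = time.monotonic() - start
print(pings, virtual_elapsed)   # 61 pings span 60.0 virtual seconds
print(real_elapsed < 1.0)       # True: near-zero real time
```

UML's version additionally keeps two simulated machines' virtual clocks in sync, which is why results stay reproducible across runs.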
More info:
https://netdevconf.info/0x14/session.html?talk-time-travel-linux-network-si…
cheers,
jamal
The iNet Wireless Daemon, aka iwd, was created as an
alternative to wpa_supplicant. iwd provides a more complete
solution set for WiFi compared to wpa_supplicant - with a
much longer list of features.
Marcel Holtmann will provide insights into the iwd architecture and
how it takes advantage of kernel features to provide a good user
experience. In addition, Marcel will go over lessons learned in the
5 years since iwd's inception.
More info:
https://netdevconf.info/0x14/session.html?talk-5-years-of-iwd-lessons-learn…
cheers,
jamal
New virtualization deployments call for high-density virtual
functions with more frequent lifetime recycling. Unfortunately,
PCIe SR-IOV has a limited function count and large instantiation
overhead. A solution to these challenges is to take a PCIe device
and "split" it into multiple subdevices. Each subdevice gets its
own virtual port(s) and queues, as well as named resources; combined
with TC and switchdev offloads, this approach overcomes the SR-IOV
limitations.
In this talk Parav Pandit introduces devlink enhancements
to manage such sub functions.
Parav first discusses how devlink is used to lifecycle,
configure and deploy accelerated sub functions with eswitch offload
support. He then discusses the plumbing done using
virtbus to achieve persistent naming of netdevices and
rdma devices. Parav will also cover how this model
addresses the smartnic use case where a sub-function NIC is
hot-plugged into the host system in a secure manner.
More info:
https://netdevconf.info/0x14/session.html?talk-devlink-enhancements-for-sub…
cheers,
jamal
Sorry to disappoint folks, but we are _not_ going to have any
talks on blockchain in 0x14! If this statement infuriates you
then please demonstrate your outrage by submitting a proposal
when the CFS for 0x15 opens up!
Now that we got that out of the way...
What is Machine Learning?
Machine learning (ML) is a subset of artificial intelligence (AI)
that lets computer systems solve a specific task without
explicit instructions, relying on patterns and inference instead
of human intervention.
But How Does ML Apply To Networking?
Machine learning can be used to observe patterns in network
traffic or configuration and use the resulting data for a
variety of things; a sample of the space:
- dynamic congestion control (goodbye named congestion control algos!),
see for example applicability of:
https://netdevconf.info/0x12/session.html?restructuring-endpoint-congestion…
- improve datapath performance
- path optimization
- anomaly detection from a baseline expectation and using the
resulting data either for security or optimization end goals
- etc.
At 0x14 we have two moonshot talks that look at using ML for
networking on Linux. These talks will be part of the ML
workshop, which is debuting in 0x14. We hope to be able to
solicit discussions and feedback on the subject and hopefully
make this workshop a fixture in future netdev confs.
In the first moonshot talk Marta Plantykow, Piotr Raczynski,
Maciej Machnikowski and Pawel Szymanski will discuss an approach
to optimize networking performance alongside CPU utilization
with ML.
Marta et al propose an approach that uses ML to study RSS
patterns and the resulting CPU spread, and then react dynamically,
modifying RSS hash parameters to improve the CPU spread.
The authors will go over the challenges they overcame, show some
performance numbers and solicit feedback.
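To make the problem space concrete, here is a toy sketch (a hypothetical hash function, not the real Toeplitz RSS hash or the authors' ML model): flows are hashed to RX queues pinned to CPUs, and different hash parameters can yield very different CPU spreads for the same traffic mix, which is the knob their approach tunes.

```python
from collections import Counter
import hashlib

def rss_queue(flow, seed, n_queues):
    """Toy stand-in for an RSS hash: map a flow to an RX queue."""
    data = f"{seed}:{flow}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % n_queues

def spread(flows, seed, n_queues=4):
    """Per-queue flow counts -- an uneven list means an uneven CPU spread."""
    counts = Counter(rss_queue(f, seed, n_queues) for f in flows)
    return [counts.get(q, 0) for q in range(n_queues)]

# 32 hypothetical flows (src, dst, sport, dport).
flows = [("10.0.0.%d" % i, "10.0.1.1", 40000 + i, 80) for i in range(32)]
print(spread(flows, seed=0))  # per-queue flow counts under one seed
print(spread(flows, seed=1))  # different parameters, different spread
```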
More info:
https://netdevconf.info/0x14/session.html?talk-performance-optimization-usi…
Our second talk is from Maciej Paczkowski, Aleksandra Jereczek, and
Patrycja Kochmanska. In this talk Maciej et al integrate with FRR
to understand how to best optimize path selection in an environment
with multiple simultaneous link faults and incessant link flaps.
Could routing decisions be helped by ML hooks in
the kernel/datapath? Could we make use of offloading some of
the algos to AI hardware?
The authors will go over the challenges they overcame,
and solicit feedback.
More info:
https://netdevconf.info/0x14/session.html?talk-machine-learning-in-packet-r…
cheers,
jamal
In this moonshot talk Rony Efraim, Roni Bar Yanai and Yossi Kuperman
come up with a clever way to do efficient forwarding by introducing
an offloaded TC hash action.
An example use case of this feature is closely emulating what Equal
Cost Multipath (ECMP) based forwarding does.
ECMP-like behavior can be achieved by enabling a policy which will:
- First create a hash on a classical 5-tuple in hardware
- The kernel software receives the hash id as metadata
- Use the resulting hash id as a tc filter chain id and
  jump to that filter chain
- Within the destination chain, look up a much smaller set
  of tc flower rules (in the equivalent 5-tuple space) and
  execute the associated leaf action(s), such as setting the
  next hop mac address, etc.
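The steps above can be sketched in software (toy data structures, not the tc API; in the talk the hash step runs in hardware): hash the 5-tuple, use the hash id to pick a filter chain, then match against that chain's much smaller rule table.

```python
import hashlib

N_CHAINS = 16  # hypothetical chain count

def hash_chain_id(five_tuple):
    """Hash the classical 5-tuple and map it to a tc-like chain id."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_CHAINS

# Per-chain rule tables; each holds only the flows that hash there.
chains = {cid: {} for cid in range(N_CHAINS)}

def add_flow(five_tuple, action):
    chains[hash_chain_id(five_tuple)][five_tuple] = action

def forward(five_tuple):
    # Jump to the destination chain and look up the (small) rule table.
    return chains[hash_chain_id(five_tuple)].get(five_tuple, "miss")

flow = ("10.0.0.1", "10.0.0.2", 12345, 80, "tcp")
add_flow(flow, "set-nexthop-mac aa:bb:cc:dd:ee:ff")
print(forward(flow))  # the leaf action installed for this flow
print(forward(("10.0.0.9", "10.0.0.2", 1, 80, "tcp")))  # "miss"
```

The win is that each lookup only searches one chain's rules instead of the full rule set.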
Rony et al will propose the TC extension semantics
to define the hash offload and solicit feedback from
the community. They will further discuss the challenges
involved.
More info:
https://netdevconf.info/0x14/session.html?talk-extending-tc-with-5-tuple-ha…
cheers,
jamal
Let's start with some context.
The problem: the TCP SYN DDOS attack.
TCP SYN attacks flood a targeted server with SYN requests.
Each SYN request received by the server is responded to with
a SYN ACK, which results in connection state being created
and put in a half-open (SYN_RCVD) TCP state awaiting an ACK to
come back. *The ACK response never comes*, and the SYN
requests keep coming in, resulting in more half-open state
creation in the backlog...
At some point during this attack, there will be so much
TCP half-open state that the targeted server's ability
to respond to new SYN requests is lost, because all the available
backlog resources have been exhausted...
How does one defend against SYN flood attack?
A popular defense against SYN attacks uses what is known as
SYN cookies.
So, how do SYN cookies work?
When the server sees the TCP SYN, it constructs the SYN ACK
using a sequence # created from a cryptographic hash of
some of the flow attributes. The computed sequence # has
enough of the flow attributes encoded in it that it can later
(on a response to this sequence #) be used to reconstruct the
original SYN request, should the original sender respond.
Two possibilities on responses:
1) If this was an attack, then the server will never
receive a response.
That is no big deal, since we are not storing any half-open info
in the backlog; therefore no server resources are wasted in
anticipation of that response.
2) This is a legit client request - in which case there will be
a response coming back. When the ACK is received, we look at
the ACK sequence # and reconstruct the SYN queue entry using the
information that was originally encoded in the SYN ACK.
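The encode/verify cycle can be sketched as follows (a simplified illustration with a toy key and bit layout, not Linux's exact SYN-cookie format): the SYN-ACK sequence number is an HMAC over the 4-tuple plus a coarse timestamp, with a few low bits reserved to encode negotiated options such as an index into an MSS table.

```python
import hmac, hashlib, time

SECRET = b"per-boot-secret"          # hypothetical key
MSS_TABLE = [536, 1300, 1440, 1460]  # candidate MSS values; 2 bits to encode

def make_cookie(saddr, daddr, sport, dport, mss_idx, now=None):
    t = int(now if now is not None else time.time()) >> 6  # 64 s granularity
    msg = f"{saddr}:{daddr}:{sport}:{dport}:{t}".encode()
    mac = int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")
    # Clear the low 2 bits of the MAC and stash the MSS index there.
    return ((mac & ~0x3) | mss_idx) & 0xFFFFFFFF

def check_cookie(cookie, saddr, daddr, sport, dport, now=None):
    """On ACK receipt: recompute the cookie; on match, recover the MSS."""
    mss_idx = cookie & 0x3
    expected = make_cookie(saddr, daddr, sport, dport, mss_idx, now)
    return MSS_TABLE[mss_idx] if expected == cookie else None

c = make_cookie("1.2.3.4", "5.6.7.8", 40000, 443, mss_idx=3, now=1000000)
print(check_cookie(c, "1.2.3.4", "5.6.7.8", 40000, 443, now=1000000))  # 1460
print(check_cookie(c, "9.9.9.9", "5.6.7.8", 40000, 443, now=1000000))  # None
```

Nothing is stored between the two steps: the cookie itself carries everything needed to rebuild the connection state.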
SYN cookies are effective, but:
the operation requires that packets traverse the TCP/IP layers
all the way up. For a busy server, the extra code path and the
hash computation constrain how fast you can issue SYN cookies back.
In this talk, Petar Penkov, Eric Dumazet and Stanislav Fomichev
discuss SynGate, an XDP-based approach to handling SYN cookies.
By moving the response lower in the stack, it increases
the rate at which a host can issue SYN cookies,
thereby improving its resilience to SYN flood attacks.
Petar et al will detail the design of this solution, the advantages
eBPF provided them, and the challenges faced during development
of SynGate; finally, they will discuss areas they are considering
for improvement.
More info:
https://netdevconf.info/0x14/session.html?talk-issuing-SYN-cookies-in-XDP
cheers,
jamal