
Updated results

Nico Schottelius 3 years ago


@ -16,26 +16,22 @@ P4 software implementation.
\section{\label{results:p4}P4 based implementations}
We successfully implemented P4 code to realise
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
ARP), supports EAMT as defined by RFC 7757~\cite{rfc7757} and is
feature equivalent to the two compared software solutions
tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
Due to limitations in the P4 environment of the
NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
is more feature rich. Table \ref{tab:benchmark} summarises the
achieved bandwidths of the NAT64 solutions.
NetFPGA environment, the BMV2 implementation
is more feature rich.
All planned features could be realised with P4 and a controller.
For this thesis the parsing capabilities of P4 were adequate.
However, at the time of writing, P4 cannot parse ICMP6 options in
general: the upper level protocol does not specify the number
of options that follow, so an undefined number of 64 bit blocks
would have to be parsed, which P4 does not support.
The language has some limitations on where the placement of
The language has some limitations on the placement of
conditional statements (\texttt{if/switch}).\footnote{In general,
if and switch statements in actions lead to errors,
but not all combinations are forbidden.}
@ -51,7 +47,7 @@ checksum errors, the effective length of the packet was incorrect.
The tooling around P4 is somewhat fragile. We encountered small
language bugs during the development~\cite{schottelius:github1675},
(compare section \ref{appendix:netfpgalogs:compilelogs})
or found missing features~\cite{schottelius:github745},
~\cite{theojepsen:_get}: at the moment it is impossible to retrieve
the matching key from a table or the name of the action called. Thus
@ -75,7 +71,7 @@ The supporting scripts in the P4 toolchain are usually written in
Python 2. However, Python 2 ``is
legacy''~\cite{various:_shoul_i_python_python}. During development,
errors with Unicode string handling in Python 2 caused
changes to IPv6 addresses.\footnote{Compare section ~\ref{appendix:p4:python2unicode}.}
changes to IPv6 addresses.
% ok
% ----------------------------------------------------------------------
@ -239,7 +235,7 @@ unsupported\footnote{To support creating payload checksums, either an
Two different NetFPGA cards were used during the development of the
thesis. The first card had consistent ioctl errors (compare section
\ref{netpfgaioctlerror}) when writing table entries. The available
\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The available
hardware tests (compare figures \ref{fig:hwtestnico} and
\ref{fig:hwtesthendrik}) showed failures on both cards; however, the
first card reported an additional ``10G\_Loopback'' failure. Due to
@ -392,64 +388,15 @@ reason our implementation uses \texttt{\#define} statements instead of functions
% ----------------------------------------------------------------------
\section{\label{results:softwarenat64}Software based NAT64}
Both solutions, Tayga and Jool, worked flawlessly. However, as expected,
both solutions have a bottleneck that is CPU bound. Under high load
both solutions are CPU bound. Under high load
scenarios both solutions utilise one core fully. Neither Tayga as a
user space program nor Jool as a kernel module implements multi
% ----------------------------------------------------------------------
\section{\label{results:benchmark}NAT64 Benchmarks}
In this section we summarise the benchmarking results, in the
sub sections we discuss the benchmark design and the individual results.
FIXME: summary here
MTU set to 1500, as the NetFPGA doesn't support jumbo frames
iperf3, iperf 3.0.11
50 parallel = 2x 100\% CPU usage
40 parallel = 100\%, 70\% CPU usage
30 parallel = 70\%-100\%, 70\% CPU usage
Turning back on checksum offloading (see below)
30 parallel = 70\%, 30\% CPU usage
Tayga running at 100\% CPU load,
v4->v6 TCP
P1: 3.36 Gbit/s
P20: 3.30 Gbit/s
P50: 3.11 Gbit/s
v6->v4 TCP
P1: 3.02 Gbit/s
P20: 3.28 Gbit/s
P50: 2.85 Gbit/s
UDP load generator hitting 100\% CPU at P20.
TCP confirmed.
Over bandwidth results
Feature comparison
speed - sessions - eamt
can act as host
lpm tables
ping6 support
controller support
netpfga consistent
In this section we give an overview of the benchmark design
and summarise the benchmarking results.
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:design}Benchmark Design}
@ -481,11 +428,61 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:summary}Benchmark Summary}
Overall, \textbf{tayga} has proven to be the slowest translator with an achieved
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool}, which translates at
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
translation speed of about \textbf{9 Gbit/s}.
The TCP based benchmarks show realistic numbers, while iperf reports
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link)
for UDP based benchmarks. For this reason we
have summarised the UDP based benchmarks with their average loss
instead of listing the bandwidth details. The ``adjusted bandwidth''
in the UDP benchmarks incorporates the packet loss (compare tables
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv4v6udp}).
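As a sketch of how such an adjustment can be computed, assume (this is
our illustration, not a definition taken from the tables) that the
adjusted bandwidth scales the bandwidth reported by iperf with the share
of packets that actually arrived. With the hypothetical symbols
$B_{\mathrm{reported}}$ for the reported bandwidth and
$p_{\mathrm{loss}}$ for the loss fraction between 0 and 1:
\[
  B_{\mathrm{adjusted}} = B_{\mathrm{reported}} \cdot (1 - p_{\mathrm{loss}})
\]
Under this assumption, a hypothetical reported bandwidth of 10 Gbit/s
at 50\% loss would correspond to an adjusted bandwidth of 5 Gbit/s.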
Both software solutions showed significant packet loss in the UDP
based benchmarks (tayga: up to 91\%, Jool: up to 71\%), while the
P4/NetFPGA solution showed a maximum of 0.01\% packet loss. Packet loss is only
recorded by iperf for UDP based benchmarks, as TCP packets are acknowledged and
resent if necessary.
Tayga has the highest variation of results, which might be due to
being fully CPU bound even in the simplest benchmark. Jool has less
variation and in general the P4/NetFPGA solution behaves almost
identically in different benchmark runs.
The CPU load for TCP based benchmarks with Jool was almost negligible,
however for UDP based benchmarks one core was almost 100\%
utilised. In all benchmarks with tayga, one CPU was fully
utilised. As the translation for P4/NetFPGA happens within the
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
We suspect that this might be due to the slightly increasing packet sizes that
occur in this direction of translation. Not only does this change
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation,
which slows down the process.
During the benchmarks with 1 and 10 parallel connections, no
significant CPU load was registered on the load generator. However,
with 20 parallel connections, each of the two iperf
processes\footnote{One for sending, one for receiving.} occasionally
spiked to 100\% CPU usage, while with 50 parallel connections the CPU
load of each process often hit 100\%.
While tayga's performance seems to decrease with the growing number of
parallel connections, both Jool and our P4/NetFPGA implementation
vary only slightly.
Overall, the performance of tayga, a Linux user space program, is as
expected. We were surprised by the good performance of Jool, which,
while slower, is almost on par with our P4/NetFPGA solution.
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
Benchmark Results}
some text
\begin{tabular}{| c | c | c | c | c |}
@ -509,9 +506,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
During the benchmarks the client -- CPU usage
\begin{tabular}{| c | c | c | c | c |}
@ -540,7 +537,6 @@ Parallel connections & 1 & 10 & 20 & 50 \\
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
Benchmark Results}
other text
\begin{tabular}{| c | c | c | c | c |}
@ -562,13 +558,12 @@ Parallel connections & 1 & 10 & 20 & 50 \\
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
last text
\begin{tabular}{| c | c | c | c | c |}
@ -590,6 +585,7 @@ Parallel connections & 1 & 10 & 20 & 50 \\
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}


Binary file not shown.