Updated results
This commit is contained in:
parent
93da952c80
commit
acbb0836f3
2 changed files with 68 additions and 72 deletions
140
doc/Results.tex
|
@ -16,26 +16,22 @@ P4 software implementation.
|
|||
\section{\label{results:p4}P4 based implementations}
|
||||
We successfully implemented P4 code to realise
|
||||
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
|
||||
for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
|
||||
arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
|
||||
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
|
||||
ARP), supports EAMT as defined by RFC7757~\cite{rfc7757} and is
|
||||
feature equivalent to the two compared software solutions
|
||||
tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
|
||||
jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
|
||||
Due to limitations in the P4 environment of the
|
||||
NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
|
||||
is more feature rich. Table \ref{tab:benchmark} summarises the
|
||||
achieved bandwidths of the NAT64 solutions.
|
||||
NetFPGA, the BMV2 implementation
|
||||
is more feature rich.
|
||||
|
||||
BEFORE OR AFTER MARKER - FIXME
|
||||
|
||||
All planned features could be realised with P4 and a controller.
|
||||
For this thesis the parsing capabilities of P4 were adequate.
|
||||
However P4, at the time of writing, cannot parse ICMP6 options in
|
||||
general, as the upper level protocol does not specify the number
|
||||
of options that follow and parsing of an undefined number
|
||||
of 64 bit blocks is required, which P4 does not support.
|
||||
|
||||
The language has some limitations on where the placement of
|
||||
The language has some limitations on the placement of
|
||||
conditional statements (\texttt{if/switch}).\footnote{In general,
|
||||
if and switch statements in actions lead to errors,
|
||||
but not all constellations are forbidden.}
|
||||
|
@ -51,7 +47,7 @@ checksum errors, the effective length of the packet was incorrect.
|
|||
|
||||
The tooling around P4 is somewhat fragile. We encountered small
|
||||
language bugs during the development~\cite{schottelius:github1675},
|
||||
\ref{appendix:expressionbug}
|
||||
(compare section \ref{appendix:netfpgalogs:compilelogs})
|
||||
or found missing features~\cite{schottelius:github745},
|
||||
~\cite{theojepsen:_get}: it is at the moment impossible to retrieve
|
||||
the matching key from a table or the name of the action called. Thus
|
||||
|
@ -75,7 +71,7 @@ The supporting scripts in the P4 toolchain are usually written in
|
|||
python2. However python2 ``is
|
||||
legacy''~\cite{various:_shoul_i_python_python}. During development
|
||||
errors with unicode string handling in python2 caused
|
||||
changes to IPv6 addresses.\footnote{Compare section ~\ref{appendix:p4:python2unicode}.}
|
||||
changes to IPv6 addresses.
|
||||
% ok
|
||||
% ----------------------------------------------------------------------
|
||||
\section{\label{results:bmv2}P4/BMV2}
|
||||
|
@ -239,7 +235,7 @@ unsupported\footnote{To support creating payload checksums, either an
|
|||
\subsection{\label{results:netpfga:stability}Stability}
|
||||
Two different NetFPGA cards were used during the development of the
|
||||
thesis. The first card had consistent ioctl errors (compare section
|
||||
\ref{netpfgaioctlerror}) when writing table entries. The available
|
||||
\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The available
|
||||
hardware tests (compare figures \ref{fig:hwtestnico} and
|
||||
\ref{fig:hwtesthendrik}) showed failures in both cards, however the
|
||||
first card reported an additional ``10G\_Loopback'' failure. Due to
|
||||
|
@ -392,64 +388,15 @@ reason our implementation uses \texttt{\#define} statements instead of functions
|
|||
% ----------------------------------------------------------------------
|
||||
\section{\label{results:softwarenat64}Software based NAT64}
|
||||
Both solutions Tayga and Jool worked flawlessly. However as expected,
|
||||
both solutions have a bottleneck that is CPU bound. Under high load
|
||||
both solutions are CPU bound. Under high load
|
||||
scenarios both solutions utilise one core fully. Neither Tayga as a
|
||||
user space program nor Jool as a kernel module implements
multithreading.
|
||||
%ok
|
||||
% ----------------------------------------------------------------------
|
||||
\section{\label{results:benchmark}NAT64 Benchmarks}
|
||||
In this section we summarise the benchmarking results, in the
|
||||
sub sections we discuss the benchmark design and the individual results.
|
||||
|
||||
FIXME: summary here
|
||||
|
||||
MTU setting to 1500, as netpfga doesn't support jumbo frames
|
||||
|
||||
|
||||
iperf3, iperf 3.0.11
|
||||
|
||||
50 parallel = 2x 100% cpu usage
|
||||
40 parallel = 100%, 70% cpu usage
|
||||
30 parallel = 70%-100, 70% cpu usage
|
||||
|
||||
Turning back on checksum offloading (see below)
|
||||
|
||||
30 parallel = 70%, 30% cpu usage
|
||||
|
||||
|
||||
\subsection{\label{benchmark:tayga:tcp}Tayga/TCP}
|
||||
|
||||
Tayga running at 100% cpu load,
|
||||
|
||||
v4->v6 tcp
|
||||
delivering
|
||||
3.36 gbit/s at P1
|
||||
3.30 Gbit/s at P20
|
||||
3.11 gbit/s at P50
|
||||
|
||||
v6->v4 tcp
|
||||
P1: 3.02 Gbit/s
|
||||
P20: 3.28 gbit/s
|
||||
P50: 2.85 gbit/s
|
||||
|
||||
Commands:
|
||||
|
||||
|
||||
UDP load generator hitting 100\% cpu at P20.
|
||||
TCP confirmed.
|
||||
Over bandwidth results
|
||||
|
||||
Feature comparison
|
||||
speed - sessions - eamt
|
||||
can act as host
|
||||
lpm tables
|
||||
ping
|
||||
ping6 support
|
||||
ndp
|
||||
controller support
|
||||
|
||||
netpfga consistent
|
||||
In this section we give an overview of the benchmark design
|
||||
and summarise the benchmarking results.
|
||||
% ----------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:design}Benchmark Design}
|
||||
\begin{figure}[h]
|
||||
|
@ -481,11 +428,61 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
|
|||
\end{figure}
|
||||
% ok
|
||||
% ----------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:summary}Benchmark Summary}
|
||||
Overall, \textbf{Tayga} has proven to be the slowest translator with an achieved
|
||||
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at
|
||||
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
|
||||
translation speed of about \textbf{9 Gbit/s}.
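
To put these figures in relation to the link capacity, the following
sketch expresses them as a fraction of line rate. It assumes the
10 Gbit/s link mentioned for the UDP benchmarks and uses the rounded
bandwidths from the paragraph above; the symbol $u$ (utilisation) is
introduced here for illustration only.
\[
  u_{\mathrm{Tayga}} \approx \frac{3}{10} = 30\,\%, \qquad
  u_{\mathrm{Jool}} \approx \frac{8}{10} = 80\,\%, \qquad
  u_{\mathrm{P4/NetFPGA}} \approx \frac{9}{10} = 90\,\%
\]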
|
||||
|
||||
The TCP based benchmarks show realistic numbers, while iperf reports
|
||||
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link)
|
||||
for UDP based benchmarks. For this reason we
|
||||
have summarised the UDP based benchmarks with their average loss
|
||||
instead of listing the bandwidth details. The ``adjusted bandwidth''
|
||||
in the UDP benchmarks incorporates the packet loss (compare tables
|
||||
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}).
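
One plausible reading of the ``adjusted bandwidth'' is the bandwidth
reported by iperf scaled by the share of packets that actually
arrived. The symbols $B_{\mathrm{reported}}$, $B_{\mathrm{adjusted}}$
and $\ell$ (the loss ratio) are notation introduced here for
illustration, and the exact formula used for the tables is an
assumption:
\[
  B_{\mathrm{adjusted}} = B_{\mathrm{reported}} \cdot (1 - \ell),
  \qquad 0 \le \ell \le 1
\]
As a purely hypothetical example, a reported bandwidth of 20 Gbit/s
with a loss ratio of $\ell = 0.6$ would yield
$20 \cdot (1 - 0.6) = 8$ Gbit/s.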
|
||||
|
||||
Both software solutions showed significant loss of packets in the UDP
|
||||
based benchmarks (Tayga: up to 91\%, Jool: up to 71\%), while the
|
||||
P4/NetFPGA solution showed a maximum of 0.01\% packet loss. Packet loss is only
|
||||
recorded by iperf for UDP based benchmarks, as TCP segments are acknowledged and
|
||||
resent if necessary.
|
||||
|
||||
Tayga has the highest variation of results, which might be due to
|
||||
being fully CPU bound even in the simplest benchmark. Jool has less
|
||||
variation and in general the P4/NetFPGA solution behaves almost
|
||||
identically across different benchmark runs.
|
||||
|
||||
The CPU load for TCP based benchmarks with Jool was almost negligible,
|
||||
however for UDP based benchmarks one core was almost 100\%
|
||||
utilised. In all benchmarks with Tayga, one CPU core was fully
|
||||
utilised. As the translation for P4/NetFPGA happens within the
|
||||
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
|
||||
|
||||
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
|
||||
We suspect that this might be due to slightly increasing packet sizes that
|
||||
occur during this direction of translation. Not only does this vary
|
||||
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
|
||||
that slows down the process.
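
The size increase can be estimated under the assumption of minimal
headers: an IPv4 header without options is 20 bytes long, while an
IPv6 header without extension headers is 40 bytes long. The symbol
$\Delta$ and the MTU of 1500 bytes are assumptions made for this
sketch:
\[
  \Delta = 40 - 20 = 20\ \mathrm{bytes}, \qquad
  \frac{\Delta}{\mathrm{MTU}} = \frac{20}{1500} \approx 1.3\,\%
\]
Under these assumptions, any IPv4 packet larger than
$1500 - 20 = 1480$ bytes would exceed the MTU after translation to
IPv6.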
|
||||
|
||||
During the benchmarks with 1 and 10 parallel connections, no
|
||||
significant CPU load was registered on the load generator. However
|
||||
with 20 parallel connections, each of the two iperf
|
||||
processes\footnote{One for sending, one for receiving.} partially
|
||||
spiked to 100\% CPU usage, while with 50 parallel connections the CPU
|
||||
load of each process often hit 100\%.
|
||||
|
||||
While Tayga's performance seems to decrease with a growing number of
|
||||
parallel connections, both Jool and our P4/NetFPGA implementations
|
||||
vary only slightly.
|
||||
|
||||
Overall, the performance of Tayga, a Linux user space program, is as
|
||||
expected. We were surprised by the good performance of Jool, which,
|
||||
while slower, is almost on par with our P4/NetFPGA solution.
|
||||
% ----------------------------------------------------------------------
|
||||
\newpage
|
||||
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
|
||||
Benchmark Results}
|
||||
some text
|
||||
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
|
@ -509,9 +506,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
|||
\label{tab:benchmarkv6}
|
||||
\end{center}
|
||||
\end{table}
|
||||
%ok
|
||||
% ---------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
|
||||
During the benchmarks the client -- CPU usage
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
|
@ -540,7 +537,6 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
|||
\newpage
|
||||
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
|
||||
Benchmark Results}
|
||||
other text
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
|
@ -562,13 +558,12 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
|||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv4}
|
||||
\label{tab:benchmarkv6v4udp}
|
||||
\end{center}
|
||||
\end{table}
|
||||
|
||||
%ok
|
||||
% ---------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
|
||||
last text
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
|
@ -590,6 +585,7 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
|||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv4}
|
||||
\label{tab:benchmarkv4v6udp}
|
||||
\end{center}
|
||||
\end{table}
|
||||
%ok
|
||||
|
|
BIN
doc/Thesis.pdf
Binary file not shown.