Updated results
This commit is contained in:
parent
93da952c80
commit
acbb0836f3
2 changed files with 68 additions and 72 deletions
140
doc/Results.tex
|
@ -16,26 +16,22 @@ P4 software implementation.
|
||||||
\section{\label{results:p4}P4 based implementations}
|
\section{\label{results:p4}P4 based implementations}
|
||||||
We successfully implemented P4 code to realise
|
We successfully implemented P4 code to realise
|
||||||
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
|
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
|
||||||
for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
|
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
|
||||||
arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
|
ARP), supports EAMT as defined by RFC 7757~\cite{rfc7757} and is
|
||||||
feature equivalent to the two compared software solutions
|
feature equivalent to the two compared software solutions
|
||||||
tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
|
tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
|
||||||
jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
|
jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
|
||||||
Due to limitations in the P4 environment of the
|
Due to limitations in the P4 environment of the
|
||||||
NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
|
NetFPGA environment, the BMV2 implementation
|
||||||
is more feature rich. Table \ref{tab:benchmark} summarises the
|
is more feature rich.
|
||||||
achieved bandwidths of the NAT64 solutions.
|
|
||||||
|
|
||||||
BEFORE OR AFTER MARKER - FIXME
|
|
||||||
|
|
||||||
All planned features could be realised with P4 and a controller.
|
|
||||||
For this thesis the parsing capabilities of P4 were adequate.
|
For this thesis the parsing capabilities of P4 were adequate.
|
||||||
However P4, at the time of writing, cannot parse ICMP6 options in
|
However P4, at the time of writing, cannot parse ICMP6 options in
|
||||||
general, as the upper level protocol does not specify the number
|
general, as the upper level protocol does not specify the number
|
||||||
of options that follow and parsing of an undefined number
|
of options that follow and parsing of an undefined number
|
||||||
of 64 bit blocks is required, which P4 does not support.
|
of 64 bit blocks is required, which P4 does not support.
|
||||||
|
|
||||||
The language has some limitations on where the placement of
|
The language has some limitations on the placement of
|
||||||
conditional statements (\texttt{if/switch}).\footnote{In general,
|
conditional statements (\texttt{if/switch}).\footnote{In general,
|
||||||
if and switch statements in actions lead to errors,
|
if and switch statements in actions lead to errors,
|
||||||
but not all constellations are forbidden.}
|
but not all constellations are forbidden.}
|
||||||
|
@ -51,7 +47,7 @@ checksum errors, the effective length of the packet was incorrect.
|
||||||
|
|
||||||
The tooling around P4 is somewhat fragile. We encountered small
|
The tooling around P4 is somewhat fragile. We encountered small
|
||||||
language bugs during the development~\cite{schottelius:github1675},
|
language bugs during the development~\cite{schottelius:github1675},
|
||||||
\ref{appendix:expressionbug}
|
(compare section \ref{appendix:netfpgalogs:compilelogs})
|
||||||
or found missing features~\cite{schottelius:github745},
|
or found missing features~\cite{schottelius:github745},
|
||||||
~\cite{theojepsen:_get}: it is at the moment impossible to retrieve
|
~\cite{theojepsen:_get}: it is at the moment impossible to retrieve
|
||||||
the matching key from table or the name of the action called. Thus
|
the matching key from table or the name of the action called. Thus
|
||||||
|
@ -75,7 +71,7 @@ The supporting scripts in the P4 toolchain are usually written in
|
||||||
python2. However python2 ``is
|
python2. However python2 ``is
|
||||||
legacy''~\cite{various:_shoul_i_python_python}. During development
|
legacy''~\cite{various:_shoul_i_python_python}. During development
|
||||||
errors with unicode string handling in python2 caused
|
errors with unicode string handling in python2 caused
|
||||||
changes to IPv6 addresses.\footnote{Compare section ~\ref{appendix:p4:python2unicode}.}
|
changes to IPv6 addresses.
|
||||||
% ok
|
% ok
|
||||||
% ----------------------------------------------------------------------
|
% ----------------------------------------------------------------------
|
||||||
\section{\label{results:bmv2}P4/BMV2}
|
\section{\label{results:bmv2}P4/BMV2}
|
||||||
|
@ -239,7 +235,7 @@ unsupported\footnote{To support creating payload checksums, either an
|
||||||
\subsection{\label{results:netpfga:stability}Stability}
|
\subsection{\label{results:netpfga:stability}Stability}
|
||||||
Two different NetPFGA cards were used during the development of the
|
Two different NetPFGA cards were used during the development of the
|
||||||
thesis. The first card had consistent ioctl errors (compare section
|
thesis. The first card had consistent ioctl errors (compare section
|
||||||
\ref{netpfgaioctlerror}) when writing table entries. The available
|
\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The available
|
||||||
hardware tests (compare figures \ref{fig:hwtestnico} and
|
hardware tests (compare figures \ref{fig:hwtestnico} and
|
||||||
\ref{fig:hwtesthendrik}) showed failures in both cards, however the
|
\ref{fig:hwtesthendrik}) showed failures in both cards, however the
|
||||||
first card reported an additional ``10G\_Loopback'' failure. Due to
|
first card reported an additional ``10G\_Loopback'' failure. Due to
|
||||||
|
@ -392,64 +388,15 @@ reason our implementation uses \texttt{\#define} statements instead of functions
|
||||||
% ----------------------------------------------------------------------
|
% ----------------------------------------------------------------------
|
||||||
\section{\label{results:softwarenat64}Software based NAT64}
|
\section{\label{results:softwarenat64}Software based NAT64}
|
||||||
Both solutions Tayga and Jool worked flawlessly. However as expected,
|
Both solutions Tayga and Jool worked flawlessly. However as expected,
|
||||||
both solutions have a bottleneck that is CPU bound. Under high load
|
both solutions are CPU bound. Under high load
|
||||||
scenarios both solutions utilise one core fully. Neither Tayga as a
|
scenarios both solutions utilise one core fully. Neither Tayga as a
|
||||||
user space program nor Jool as a kernel module implement multi
|
user space program nor Jool as a kernel module implement multi
|
||||||
threading.
|
threading.
|
||||||
%ok
|
%ok
|
||||||
% ----------------------------------------------------------------------
|
% ----------------------------------------------------------------------
|
||||||
\section{\label{results:benchmark}NAT64 Benchmarks}
|
\section{\label{results:benchmark}NAT64 Benchmarks}
|
||||||
In this section we summarise the benchmarking results, in the
|
In this section we give an overview of the benchmark design
|
||||||
sub sections we discuss the benchmark design and the individual results.
|
and summarise the benchmarking results.
|
||||||
|
|
||||||
FIXME: summary here
|
|
||||||
|
|
||||||
MTU setting to 1500, as netpfga doesn't support jumbo frames
|
|
||||||
|
|
||||||
|
|
||||||
iperf3, iperf 3.0.11
|
|
||||||
|
|
||||||
50 parallel = 2x 100% cpu usage
|
|
||||||
40 parallel = 100%, 70% cpu usage
|
|
||||||
30 parallel = 70%-100, 70% cpu usage
|
|
||||||
|
|
||||||
Turning back on checksum offloading (see below)
|
|
||||||
|
|
||||||
30 parallel = 70%, 30% cpu usage
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{\label{benchmark:tayga:tcp}Tayga/TCP}
|
|
||||||
|
|
||||||
Tayga running at 100% cpu load,
|
|
||||||
|
|
||||||
v4->v6 tcp
|
|
||||||
delivering
|
|
||||||
3.36 gbit/s at P1
|
|
||||||
3.30 Gbit/s at P20
|
|
||||||
3.11 gbit/s at P50
|
|
||||||
|
|
||||||
v6->v4 tcp
|
|
||||||
P1: 3.02 Gbit/s
|
|
||||||
P20: 3.28 gbit/s
|
|
||||||
P50: 2.85 gbit/s
|
|
||||||
|
|
||||||
Commands:
|
|
||||||
|
|
||||||
|
|
||||||
UDP load generator hitting 100\% cpu at P20.
|
|
||||||
TCP confirmed.
|
|
||||||
Over bandwidth results
|
|
||||||
|
|
||||||
Feature comparison
|
|
||||||
speed - sessions - eamt
|
|
||||||
can act as host
|
|
||||||
lpm tables
|
|
||||||
ping
|
|
||||||
ping6 support
|
|
||||||
ndp
|
|
||||||
controller support
|
|
||||||
|
|
||||||
netpfga consistent
|
|
||||||
% ----------------------------------------------------------------------
|
% ----------------------------------------------------------------------
|
||||||
\subsection{\label{results:benchmark:design}Benchmark Design}
|
\subsection{\label{results:benchmark:design}Benchmark Design}
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
|
@ -481,11 +428,61 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
% ok
|
% ok
|
||||||
% ----------------------------------------------------------------------
|
% ----------------------------------------------------------------------
|
||||||
|
\subsection{\label{results:benchmark:summary}Benchmark Summary}
|
||||||
|
Overall, \textbf{tayga} proved to be the slowest translator with an achieved
|
||||||
|
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at
|
||||||
|
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
|
||||||
|
translation speed of about \textbf{9 Gbit/s}.
|
||||||
|
|
||||||
|
The TCP based benchmarks show realistic numbers, while iperf reports
|
||||||
|
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link)
|
||||||
|
for UDP based benchmarks. For this reason we
|
||||||
|
have summarised the UDP based benchmarks with their average loss
|
||||||
|
instead of listing the bandwidth details. The ``adjusted bandwidth''
|
||||||
|
in the UDP benchmarks incorporates the packet loss (compare tables
|
||||||
|
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}).
|
||||||
|
|
||||||
|
Both software solutions showed significant loss of packets in the UDP
|
||||||
|
based benchmarks (tayga: up to 91\%, Jool: up to 71\%), while the
|
||||||
|
P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only
|
||||||
|
recorded by iperf for UDP based benchmarks, as TCP packets are confirmed and
|
||||||
|
resent if necessary.
|
||||||
|
|
||||||
|
Tayga has the highest variation of results, which might be due to
|
||||||
|
being fully CPU bound even in the simplest benchmark. Jool has less
|
||||||
|
variation and in general the P4/NetFPGA solution behaves almost
|
||||||
|
identically in different benchmark runs.
|
||||||
|
|
||||||
|
The CPU load for TCP based benchmarks with Jool was almost negligible,
|
||||||
|
however for UDP based benchmarks one core was almost 100\%
|
||||||
|
utilised. In all benchmarks with tayga, one CPU was fully
|
||||||
|
utilised. And as the translation for P4/NetFPGA happens within the
|
||||||
|
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
|
||||||
|
|
||||||
|
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
|
||||||
|
We suspect that this might be due to slightly increasing packet sizes that
|
||||||
|
occur during this direction of translation. Not only does this vary
|
||||||
|
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
|
||||||
|
to slow down the process.
|
||||||
|
|
||||||
|
During the benchmark with 1 and 10 parallel connections, no
|
||||||
|
significant CPU load was registered on the load generator. However
|
||||||
|
with 20 parallel connections, each of the two iperf
|
||||||
|
processes\footnote{One for sending, one for receiving.} partially
|
||||||
|
spiked to 100\% CPU usage, while with 50 parallel connections the CPU
|
||||||
|
load of each process often hit 100\%.
|
||||||
|
|
||||||
|
While tayga's performance seems to reduce with the growing number of
|
||||||
|
parallel connections, both Jool and our P4/NetFPGA implementations
|
||||||
|
vary only slightly.
|
||||||
|
|
||||||
|
Overall the performance of tayga, a Linux user space program, is as
|
||||||
|
expected. We were surprised about the good performance of Jool, which,
|
||||||
|
while slower than the P4/NetFPGA solution, is almost on par with our solution.
|
||||||
|
% ----------------------------------------------------------------------
|
||||||
\newpage
|
\newpage
|
||||||
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
|
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
|
||||||
Benchmark Results}
|
Benchmark Results}
|
||||||
some text
|
|
||||||
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\begin{center}\begin{minipage}{\textwidth}
|
\begin{center}\begin{minipage}{\textwidth}
|
||||||
\begin{tabular}{| c | c | c | c | c |}
|
\begin{tabular}{| c | c | c | c | c |}
|
||||||
|
@ -509,9 +506,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
||||||
\label{tab:benchmarkv6}
|
\label{tab:benchmarkv6}
|
||||||
\end{center}
|
\end{center}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
%ok
|
||||||
% ---------------------------------------------------------------------
|
% ---------------------------------------------------------------------
|
||||||
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
|
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
|
||||||
During the benchmarks the client -- CPU usage
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\begin{center}\begin{minipage}{\textwidth}
|
\begin{center}\begin{minipage}{\textwidth}
|
||||||
\begin{tabular}{| c | c | c | c | c |}
|
\begin{tabular}{| c | c | c | c | c |}
|
||||||
|
@ -540,7 +537,6 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
||||||
\newpage
|
\newpage
|
||||||
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
|
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
|
||||||
Benchmark Results}
|
Benchmark Results}
|
||||||
other text
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\begin{center}\begin{minipage}{\textwidth}
|
\begin{center}\begin{minipage}{\textwidth}
|
||||||
\begin{tabular}{| c | c | c | c | c |}
|
\begin{tabular}{| c | c | c | c | c |}
|
||||||
|
@ -562,13 +558,12 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{minipage}
|
\end{minipage}
|
||||||
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
|
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
|
||||||
\label{tab:benchmarkv4}
|
\label{tab:benchmarkv6v4udp}
|
||||||
\end{center}
|
\end{center}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
%ok
|
||||||
% ---------------------------------------------------------------------
|
% ---------------------------------------------------------------------
|
||||||
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
|
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
|
||||||
last text
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\begin{center}\begin{minipage}{\textwidth}
|
\begin{center}\begin{minipage}{\textwidth}
|
||||||
\begin{tabular}{| c | c | c | c | c |}
|
\begin{tabular}{| c | c | c | c | c |}
|
||||||
|
@ -590,6 +585,7 @@ Parallel connections & 1 & 10 & 20 & 50 \\
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{minipage}
|
\end{minipage}
|
||||||
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
|
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
|
||||||
\label{tab:benchmarkv4}
|
\label{tab:benchmarkv4v6udp}
|
||||||
\end{center}
|
\end{center}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
%ok
|
||||||
|
|
BIN
doc/Thesis.pdf
Binary file not shown.