\chapter{\label{results}Results}
%** Results.tex: What were the results achieved including an evaluation
%
This chapter describes the achieved results and compares the P4-based
implementation with real-world software solutions.

We distinguish between the software implementation of P4 (BMV2) and the
hardware implementation (NetFPGA) due to significant differences in
deployment and development. We present benchmarks for the existing
software solutions as well as for our hardware implementation. As the
objective of this thesis is to demonstrate the high speed
capabilities of NAT64 in hardware, no benchmarks were performed on the
P4 software implementation.
% ok
% ----------------------------------------------------------------------
\section{\label{results:p4}P4 Based Implementations}
We successfully implemented P4 code to realise
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
ARP), supports EAMT as defined by RFC~7757~\cite{rfc7757}, and is
feature equivalent to the two compared software solutions
Tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
Jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
Due to limitations in the P4 environment of the
NetFPGA, the BMV2 implementation
is more feature rich.

For this thesis the parsing capabilities of P4 were adequate.
However, at the time of writing, P4 cannot parse ICMP6 options in
general: the upper level protocol does not specify the number
of options that follow, so a parser would have to consume an undefined
number of 64 bit blocks, which P4 does not support.

The language has some limitations on the placement of
conditional statements (\texttt{if/switch}).\footnote{In general,
if and switch statements in actions lead to errors,
but not all constellations are forbidden.}
Furthermore, P4/BMV2 does not support multiple LPM keys in a table.
It does, however, support multiple keys with ternary matching, which is a
superset of LPM matching.

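The relationship between LPM and ternary matching can be illustrated
with a minimal Python sketch. It is not part of our implementation, and
the helper names \texttt{lpm\_to\_ternary}, \texttt{ternary\_match} and
\texttt{lookup} are hypothetical. An LPM prefix corresponds to a ternary
value/mask pair in which the prefix bits of the mask are set. A ternary
table does not prefer longer prefixes by itself, so the sketch assumes
that the prefix length is used as the priority of the entry and that all
prefixes belong to the same address family.

```python
import ipaddress
from typing import List, Optional, Tuple


def lpm_to_ternary(prefix: str) -> Tuple[int, int, int]:
    """Return (value, mask, priority) for an LPM prefix such as '10.0.0.0/8'."""
    network = ipaddress.ip_network(prefix)
    # Assumption: the prefix length doubles as the priority of the ternary entry.
    return int(network.network_address), int(network.netmask), network.prefixlen


def ternary_match(address: str, value: int, mask: int) -> bool:
    """A ternary entry matches if the masked address equals the value."""
    return (int(ipaddress.ip_address(address)) & mask) == value


def lookup(address: str, prefixes: List[str]) -> Optional[str]:
    """Emulate LPM on ternary entries: the longest matching prefix wins."""
    best = None
    for prefix in prefixes:
        value, mask, priority = lpm_to_ternary(prefix)
        if ternary_match(address, value, mask):
            if best is None or priority > best[0]:
                best = (priority, prefix)
    return best[1] if best else None
```

With several such value/mask pairs as keys of one table, each key can
carry its own prefix, which is what a table with multiple LPM keys would
otherwise provide.
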
When developing P4 programs, the incorrect behaviour we
observed was caused by checksum problems. In retrospect this is expected,
as the main task of our implementation is modifying headers on which
the checksums depend. In all cases in which we saw Ethernet frame
checksum errors, the effective length of the packet was incorrect.

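Because header rewriting changes only a few checksummed fields, a
checksum can be updated from the difference between the old and the new
fields instead of being recomputed over the whole packet. The following
Python sketch illustrates this general idea using the incremental update
described in RFC~1624. It is an illustration and not a transcription of
our P4 code; the function names are hypothetical.

```python
from typing import Iterable


def fold(total: int) -> int:
    """Fold carries back into 16 bits (one's complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words: Iterable[int]) -> int:
    """Full Internet checksum over a sequence of 16 bit words."""
    return ~fold(sum(words)) & 0xFFFF


def delta_checksum(old_checksum: int,
                   removed: Iterable[int],
                   added: Iterable[int]) -> int:
    """Incremental update following RFC 1624: HC' = ~(~HC + ~m + m')."""
    total = ~old_checksum & 0xFFFF
    total += sum(~word & 0xFFFF for word in removed)  # take out the old fields
    total += sum(added)                               # put in the new fields
    return ~fold(total) & 0xFFFF
```

The removed and added fields do not need to have the same length, which
matters for NAT64, where 32 bit IPv4 addresses are replaced by 128 bit
IPv6 addresses in the pseudo header and vice versa.
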
The tooling around P4 is somewhat fragile. We encountered small
language bugs during development~\cite{schottelius:github1675}
(compare section \ref{appendix:netfpgalogs:compilelogs})
and found missing features~\cite{schottelius:github745,theojepsen:_get}:
at the moment it is impossible to retrieve
the matching key from a table or the name of the action called. Thus,
if different table entries call the same action, it is impossible
to distinguish on which match the action was triggered, neither within
the action nor, if the packet is forwarded to the controller, within the
controller. This limitation is consistent throughout P4: not even the
name of the matching table can be retrieved. While this information can be
added manually as additional fields in the table entries, we would
expect a language to support reading and forwarding this kind of meta
information.

While the P4 code and the related controller are tightly
coupled, their data definitions are not. Thus the packet format
definition that is used between the P4 switch and the controller has
to be duplicated. Our experience in software development indicates
that this duplication is a likely source of errors in bigger software
projects.

The supporting scripts in the P4 toolchain are usually written in
Python~2. However, Python~2 ``is
legacy''~\cite{various:_shoul_i_python_python}. During development,
errors in the Unicode string handling of Python~2 caused
changes to IPv6 addresses.
% ok
% ----------------------------------------------------------------------
\section{\label{results:bmv2}P4/BMV2}
The software implementation of P4 has the most features, which is
mostly due to its capability of creating checksums over the payload.
This enables the switch to act as a ``proper'' participant in NDP, as
NDP requires the host to calculate checksums over the payload.
Table~\ref{tab:p4bmv2features} lists all implemented features.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to
controller & fully implemented\footnote{Source code: \texttt{actions\_egress.p4}}\\
\hline
Controller to Switch & Controller can setup table entries &
fully implemented\footnote{Source code: \texttt{controller.py}}\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) &
fully implemented\footnote{Source code:
\texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ARP & Switch can answer ARP request (without controller) & fully
implemented\footnote{Source code: \texttt{actions\_arp.p4}}\\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ICMP & Switch responds to ICMP echo request (without controller) &
fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
fully implemented\footnote{Source code:
\texttt{actions\_nat64\_session.p4}, \texttt{controller.py}} \\
\hline
Delta Checksum & Switch can calculate checksum without payload
inspection &
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection &
fully implemented\footnote{Source code: \texttt{checksum\_bmv2.p4}}\\
\hline
\end{tabular}
\end{minipage}
\caption{P4/BMV2 feature list}
\label{tab:p4bmv2features}
\end{center}
\end{table}
The switch responds to ICMP echo requests and ICMP6 echo requests, and
answers NDP and ARP requests. Overall, P4/BMV2 is very easy to use:
even without a controller, a fully functional network host can be
implemented.

This P4/BMV2 implementation supports translating ICMP/ICMP6
echo request and echo reply messages, but does not support
all ICMP/ICMP6 translations that are defined in
RFC~6145~\cite{rfc6145}.
% ----------------------------------------------------------------------
\section{\label{results:netpfga}P4/NetFPGA}
In the following section we describe the achieved feature set of
P4/NetFPGA in detail and analyse the differences to the BMV2 based
implementation.
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:features}Features}
While the NetFPGA target supports P4, we only implemented a reduced
feature set on P4/NetFPGA compared to P4/BMV2. The first
reason for this is that the NetFPGA P4 compiler lacks
support for inspecting the payload and for computing
checksums over the payload, which can only partially be compensated
for using delta checksums. The second reason is the compile time of
2 to 6 hours, which contributed to
a significantly slower development cycle compared to BMV2.
Lastly, the focus of this thesis is to implement high speed NAT64 on
P4, which only requires a subset of the features that we realised on
BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented
features and gives the reasons for their implementation status.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to
controller & portable\footnote{While the NetFPGA P4 implementation
does not have the clone3() extern that the BMV2 implementation offers,
communication to the controller can easily be realised by using one of
the additional ports of the NetFPGA and connecting a physical network
card to it.}\\
\hline
Controller to Switch & Controller can setup table entries &
portable\footnote{The p4utils suite offers easy access to the
switch tables. While the P4-NetFPGA support repository also offers
python scripts to modify the switch tables, the code is less
sophisticated and more fragile.}\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) &
portable\footnote{NetFPGA/P4 does not support calculating the checksum
over the payload. However, delta checksumming can be used to create
the required checksum for replying.} \\
\hline
ARP & Switch can answer ARP request (without controller) &
portable\footnote{As ARP does not use checksums, integrating the
source code \texttt{actions\_arp.p4} into the netpfga code base is
enough to enable ARP support in the NetFPGA.} \\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
portable\footnote{Same reasoning as NDP.} \\
\hline
ICMP & Switch responds to ICMP echo request (without controller) &
portable\footnote{Same reasoning as NDP.} \\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
portable\footnote{ICMP/ICMP6 translations only require enabling the
icmp/icmp6 code in the netpfga code base.} \\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
portable\footnote{Same reasoning as ``Controller to switch''.} \\
\hline
Delta Checksum & Switch can calculate checksum without payload
inspection &
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection &
unsupported\footnote{To support creating payload checksums, either an
HDL module needs to be created or the generated
PX program needs to be modified~\cite{schottelius:_exter_p4_netpf}.} \\
\hline
\end{tabular}
\end{minipage}
\caption{P4/NetFPGA feature list}
\label{tab:p4netpfgafeatures}
\end{center}
\end{table}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:stability}Stability}
Two different NetFPGA cards were used during the development of this
thesis. The first card had consistent ioctl errors (compare section
\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The available
hardware tests (compare figures \ref{fig:hwtestnico} and
\ref{fig:hwtesthendrik}) showed failures in both cards; however, the
first card reported an additional ``10G\_Loopback'' failure. Because
table entries could not be set, no benchmarking was performed
on the first NetFPGA card.
\begin{figure}[h]
\includegraphics[scale=1.4]{hwtestnico}
\centering
\caption{Hardware Test NetFPGA card 1}
\label{fig:hwtestnico}
\end{figure}
\begin{figure}[h]
\includegraphics[scale=0.2]{hwtesthendrik}
\centering
\caption{Hardware Test NetFPGA card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}}
\label{fig:hwtesthendrik}
\end{figure}
During development and benchmarking, the second NetFPGA card stopped
functioning properly multiple times. In these cases the card would not
forward packets anymore. Rebooting the host (up to 3 times)
and reflashing the bitstream to the NetFPGA multiple times usually
restored the intended behaviour. However, due to these ``crashes'', it
was impossible for us to run a benchmark for more than one hour.
Similarly, flashing the bitstream to the NetFPGA would sometimes fail.
The host containing the NetFPGA card had to be rebooted
up to 3 times to enable successful flashing.\footnote{Typical
output of the flashing process would be: ``fpga configuration
failed. DONE PIN is not HIGH''}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:performance}Performance}
The NetFPGA card performs at near line speed and offers
NAT64 translations at 9.28 Gbit/s (see section \ref{results:benchmark}
for details).
Single and multiple streams
performed almost identically, and the results were consistent across
multiple iterations of the benchmarks.
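
The measured 9.28 Gbit/s can be put into perspective with a
back-of-the-envelope estimate. The following Python sketch is an
illustration and was not part of the benchmarks. It assumes standard
Ethernet framing overhead (preamble, header, frame check sequence and
inter-frame gap), an MTU of 1500 bytes, an IPv6 header without
extension headers and a TCP header that carries the 12 byte timestamp
option. The function name \texttt{tcp\_goodput\_gbits} is hypothetical.

```python
# Per-frame overhead on the wire in bytes (assumption: untagged Ethernet):
# preamble + start of frame delimiter (8), header (14),
# frame check sequence (4), inter-frame gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12


def tcp_goodput_gbits(link_gbits: float, mtu: int,
                      ip_header: int, tcp_header: int) -> float:
    """Upper bound for the TCP payload rate on a link of the given speed."""
    payload = mtu - ip_header - tcp_header
    bytes_on_wire = mtu + ETHERNET_OVERHEAD
    return link_gbits * payload / bytes_on_wire


# IPv6 (40 byte header) with TCP timestamps (20 + 12 byte header)
ipv6_bound = tcp_goodput_gbits(10, 1500, ip_header=40, tcp_header=32)
```

Under these assumptions the upper bound for TCP payload carried over
IPv6 on a 10 Gbit/s link is about 9.28 Gbit/s, which is in line with the
measured value.
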
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:usability}Usability}
The handling and usability of the NetFPGA card is rather difficult. In
this section we describe our findings and experiences with the card
and its toolchain.

To use the NetFPGA, the tools Vivado and SDNET provided by Xilinx need to be
installed. However, a bug in the installer triggers an infinite loop
if a certain shared library\footnote{The required shared library
is libncurses5.} is missing on the target operating system. The
installation program appears to be progressing, but never
finishes.

While the NetFPGA card supports P4, the toolchains and supporting
scripts are in an immature state. The compilation process consists of
at least 9 different steps, which are interdependent.\footnote{See
source code \texttt{bin/do-all-steps.sh}.} Some of the steps generate
shell scripts and python scripts that in turn generate JSON
data.\footnote{One compilation step calls the script
``config\_writes.py''. This script failed with a syntax error, as it
contained incomplete python code. The scripts config\_writes.py
and config\_writes.sh are generated by gen\_config\_writes.py.
The output of the script gen\_config\_writes.py depends on the content
of config\_writes.txt. That file is generated by the simulation
``xsim''. The file ``SimpleSumeSwitch\_tb.sv'' contains code that is
responsible for writing config\_writes.txt and uses a function
named axi4\_lite\_master\_write\_request\_control for generating the
output. This in turn depends on the output of a script named
gen\_testdata.py.}

However, incorrect parsing in these steps results in syntactically
incorrect scripts or in scripts that generate incorrect output. The toolchain
provided by the NetFPGA-P4 repository contains more than 80000 lines
of code. The supporting scripts for setting table entries require
setting the parameters for all possible actions, not only for the
selected action. Supplying only the required parameters results in a
crash of the supporting script.

The documentation for using the NetFPGA-P4 repository is scattered
across many places and does not contain a reference on how to use the
tools. For example, the mapping of egress ports to their metadata field
can only be found in a
python script that is used for generating test data.

The compile process can take up to 6 hours. Because the different
steps are interdependent, errors in an earlier stage were, in our
experience, detected hours after they happened. The resulting log
files of the compilation process can be up to 5 MB in size. Within
these log files various commands output references to other log files;
however, the referenced log files do not exist before or after the
compile process.

During the compile process various informational, warning and error
messages are printed. However, some informational messages indicate
critical errors, while messages labelled as critical errors or syntax
errors are often not
critical.\footnote{E.g. ``CRITICAL WARNING: [BD 41-737] Cannot set the
parameter TRANSLATION\_MODE on /axi\_interconnect\_0. It is
read-only.'' is a non-critical warning.}
Contradictory
output is generated as well.\footnote{While using version 2018.2, the following
message was printed: ``WARNING: command 'get\_user\_parameter' will be removed in the 2015.3
release, use 'get\_user\_parameters' instead''.}

Programs or scripts that are called during the compile process do not
necessarily exit non-zero if they encounter a critical error. Thus
finding the source of an error can be difficult, because the compile
process continues after critical errors have occurred. Not only do programs
that hit critical errors exit ``successfully'', but python
scripts that run into critical error paths also do not raise an exception:
they print an error message to stdout and exit without an error status.

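One way to work around steps that report success despite errors is to
wrap every step and to inspect its output in addition to its exit
status. The following Python sketch illustrates the idea and is not part
of the NetFPGA toolchain; the function \texttt{run\_step} and the marker
pattern are hypothetical. As the severity labels of the tools are not
reliable, the pattern would have to be tuned per step.

```python
import re
import subprocess
from typing import List

# Hypothetical marker: lines containing the word ERROR or FATAL.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def run_step(command: List[str]) -> str:
    """Run one build step; fail on a non-zero exit OR on error markers."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # scan both output streams together
        text=True,
    )
    suspicious = [
        line for line in result.stdout.splitlines()
        if ERROR_PATTERN.search(line)
    ]
    if result.returncode != 0 or suspicious:
        raise RuntimeError(
            "step {!r} failed (exit {}): {}".format(
                command, result.returncode, suspicious[:3]
            )
        )
    return result.stdout
```

Aborting at the first failing step avoids running the remaining,
time-consuming steps on broken intermediate results.
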
The most frequently encountered critical compile error is
``Run 'impl\_1' has not been launched. Unable to open''. This error
indicates that something in the previous compile steps failed. Its
causes range from incorrectly generated test data to unsupported LPM tables.

The NetFPGA kernel module provides access to virtual Linux
devices (nf0 \ldots{} nf3). However, tcpdump does not see any packets that are
emitted from the switch on these devices. The only possibility to capture packets
that are emitted from the switch is to connect a physical cable to
the port and to capture on the other side.

Jumbo frames\footnote{Frames with a payload of more than 1500 bytes.} are
commonly used in 10 Gbit/s networks. According to
\cite{wikipedia:_jumbo}, even many gigabit network interface cards
support jumbo frames. However, according to emails on the private
NetFPGA mailing list, the NetFPGA only supports 1500 byte frames at
the moment and additional work is required to implement support for
bigger frames.

Our P4 source code has to contain Xilinx
annotations\footnote{E.g. ``@Xilinx\_MaxPacketRegion(1024)''} that define
the maximum packet size in bits. If an incoming packet exceeds the
maximum packet size, we observed two different errors in the output packet:
\begin{itemize}
\item The output packet is longer than the original packet.
\item The output packet is corrupted.
\end{itemize}

While most of the P4 language is supported on the NetFPGA, some key
techniques are currently missing or not supported:
\begin{itemize}
\item Analysing / accessing payload is not supported
\item Checksum computation over payload is not supported
\item Using LPM tables can lead to compilation errors
\item Depending on the match type, only certain table sizes are allowed
\end{itemize}
Renaming variables in the declaration of the parser or deparser leads
to compilation errors. The P4 function syntax is not supported. For this
reason our implementation uses \texttt{\#define} statements instead of functions.
%ok
% ----------------------------------------------------------------------
\section{\label{results:softwarenat64}Software Based NAT64}
Both solutions, Tayga and Jool, worked flawlessly. However, as expected,
both solutions are CPU bound. Under high load
scenarios both solutions fully utilise one core. Neither Tayga as a
user space program nor Jool as a kernel module implements
multithreading.
%ok
% ----------------------------------------------------------------------
\section{\label{results:benchmark}NAT64 Benchmarks}
In this section we give an overview of the benchmark design
and summarise the benchmarking results.
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:design}Benchmark Design}
\begin{figure}[h]
\includegraphics[scale=0.6]{softwarenat64design}
\centering
\caption{Benchmark design for NAT64 in software implementations}
\label{fig:softwarenat64design}
\end{figure}
We use two hosts for performing benchmarks: a load generator and a
NAT64 translator. Both hosts are equipped with a dual port
Intel X520 10 Gbit/s network card. The hosts are connected directly
using DAC cables
without any equipment in between. TCP offloading is enabled in the
X520 cards. Figure \ref{fig:softwarenat64design}
shows the network setup.
When testing the NetFPGA/P4 performance, the X520 cards in the NAT64
translator are disconnected and instead the NetFPGA ports are
connected, as shown in figure \ref{fig:netpfgadesign}. The load
generator is equipped with a quad core CPU (Intel(R) Core(TM) i7-6700
CPU @ 3.40GHz) with hyperthreading enabled and 16 GB RAM. The NAT64
translator is also equipped with a quad core CPU (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz) and 16 GB RAM.
The first 10 seconds of the benchmark are excluded to avoid the TCP
warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
\begin{figure}[h]
\includegraphics[scale=0.5]{netpfgadesign}
\centering
\caption{NAT64 with NetFPGA benchmark}
\label{fig:netpfgadesign}
\end{figure}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:summary}Benchmark Summary}
Overall, \textbf{Tayga} proved to be the slowest translator with an achieved
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool}, which translates at
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
translation speed of about \textbf{9 Gbit/s}.

The TCP based benchmarks show realistic numbers, while iperf reports
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link)
for UDP based benchmarks. For this reason we
have summarised the UDP based benchmarks with their average loss
instead of listing the bandwidth details. The ``adjusted bandwidth''
in the UDP benchmarks incorporates the packet loss (compare the tables
in sections \ref{results:benchmark:v6v4udp} and
\ref{results:benchmark:v4v6udp}).

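The exact formula for the adjusted bandwidth is not spelled out above.
The following Python sketch assumes that it is the reported bandwidth
reduced by the fraction of lost packets; the function name
\texttt{adjusted\_bandwidth} is hypothetical. As the loss values in the
tables are rounded to full percent, results derived this way can
deviate slightly from the listed values.

```python
def adjusted_bandwidth(reported_gbits: float, loss_percent: float) -> float:
    """Bandwidth that actually arrived, assuming reported * (1 - loss)."""
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    return reported_gbits * (1 - loss_percent / 100)


# Tayga, IPv6 to IPv4, 10 parallel connections: 9.39 Gbit/s at 79% loss
tayga_10 = adjusted_bandwidth(9.39, 79)
```

For example, a reported bandwidth of 9.39 Gbit/s at 79\% loss yields
about 1.97 Gbit/s under this assumption, which matches the
corresponding Tayga entry.
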
Both software solutions showed significant loss of packets in the UDP
based benchmarks (Tayga: up to 91\%, Jool: up to 71\%), while the
P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only
recorded by iperf for UDP based benchmarks, as TCP packets are acknowledged and
resent if necessary.

Tayga has the highest variation of results, which might be due to
being fully CPU bound, even in the non-parallel benchmark. Jool has less
variation and in general the P4/NetFPGA solution behaves almost
identically in different benchmark runs.

The CPU load for TCP based benchmarks with Jool was almost negligible;
however, for UDP based benchmarks one core was almost 100\%
utilised. In all benchmarks with Tayga, one CPU core was fully
utilised. As the translation for P4/NetFPGA happens within the
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.

We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
We suspect that this might be due to the slightly increasing packet sizes that
occur in this direction of translation. Not only does this change
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
that slows down the transfer.

During the benchmarks with up to 10 parallel connections, no
significant CPU load was registered on the load generator. However,
with 20 parallel connections, each of the two iperf
processes\footnote{The client process for sending, the server process for receiving.} occasionally
spiked to 100\% CPU usage. With 50 parallel connections the CPU
load of each process hit 100\% often. For this reason we argue that
the results of the benchmarks with 20 or more parallel
connections might be affected by the limits of the load generator. While
there is no visible evidence in our results, this problem might become
more significant with higher speed links.

While Tayga's performance decreases with a growing number of
parallel connections, both Jool and our P4/NetFPGA implementation
vary only slightly.

Overall, the performance of Tayga, a Linux user space program, is as
expected. We were surprised by the good performance of Jool, which,
while slower than the P4/NetFPGA solution, is almost on par with it.
% ----------------------------------------------------------------------
\newpage
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
\hline
Tayga & 2.79 / 3.20 / 3.43 & 3.34 / 3.36 / 3.38 & 2.57 / 3.02 / 3.27 &
2.35 / 2.91 / 3.20 \\
\hline
Jool & 8.22 / 8.22 / 8.22 & 8.21 / 8.21 / 8.22 & 8.21 / 8.23 / 8.25
& 8.21 / 8.23 / 8.25\\
\hline
P4 / NetFPGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28
/ 9.29 & 9.28 / 9.28 / 9.29\\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 TCP NAT64 Benchmark}
\label{tab:benchmarkv6}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
\hline
Tayga & 2.90 / 3.15 / 3.34 & 2.87 / 3.01 / 3.22 &
2.68 / 2.85 / 3.09 & 2.60 / 2.78 / 2.88 \\
\hline
Jool & 7.18 / 7.56 / 8.24 & 7.97 / 8.05 / 8.09 &
8.05 / 8.08 / 8.10 & 8.10 / 8.12 / 8.13 \\
\hline
P4 / NetFPGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 /
9.29 & 9.28 / 9.28 / 9.29 \\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 TCP NAT64 Benchmark}
\label{tab:benchmarkv4}
\end{center}
\end{table}

% ---------------------------------------------------------------------
\newpage
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth in Gbit/s} \\
\hline
Tayga & 8.02 / 70\% / 2.43 & 9.39 / 79\% / 1.97 & 15.43 / 86\% / 2.11
& 19.27 / 91\% / 1.73 \\
\hline
Jool & 6.44 / 0\% / 6.41 & 6.37 / 2\% / 6.25 &
16.13 / 64\% / 5.75 & 20.83 / 71\% / 6.04 \\
\hline
P4 / NetFPGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
16.15 / 0\% / 16.15 & 15.8 / 0\% / 15.8 \\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
\label{tab:benchmarkv6v4udp}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth in Gbit/s} \\
\hline
Tayga & 6.78 / 84\% / 1.06 & 9.58 / 90\% / 0.96 &
15.67 / 91\% / 1.41 & 20.77 / 95\% / 1.04 \\
\hline
Jool & 4.53 / 0\% / 4.53 & 4.49 / 0\% / 4.49 & 13.26 / 0\% / 13.26 &
22.57 / 0\% / 22.57\\
\hline
P4 / NetFPGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
9.78 / 0\% / 9.78 & 14.37 / 0\% / 14.37\\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
\label{tab:benchmarkv4v6udp}
\end{center}
\end{table}
%ok