\chapter{\label{results}Results}
%** Results.tex: What were the results achieved including an evaluation
%
This chapter describes the achieved results and compares the P4-based
implementation with real-world software solutions.
We distinguish between the software implementation of P4 (BMV2) and the
hardware implementation (NetFPGA) due to significant differences in
deployment and development. We present benchmarks for the existing
software solutions as well as for our hardware implementation. As the
objective of this thesis is to demonstrate the high speed
capabilities of NAT64 in hardware, no benchmarks were performed on the
P4 software implementation.
% ok
% ----------------------------------------------------------------------
\section{\label{results:p4}P4 Based Implementations}
We successfully implemented P4 code that realises
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
ARP), supports EAMT as defined by RFC 7757~\cite{rfc7757}, and is
feature equivalent to the two compared software solutions,
Tayga~\cite{lutchansky:_Tayga_simpl_nat64_linux} and
Jool~\cite{mexico:_Jool_open_sourc_siit_nat64_linux}.
Due to limitations of the P4 environment on the
NetFPGA, the BMV2 implementation is more feature rich.
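The core of any NAT64 translator is mapping addresses between the two
protocol families. As an illustration only (this sketch is not taken
from our P4 code, and it assumes the RFC 6052 well-known prefix
\texttt{64:ff9b::/96}; deployments may instead use a network-specific
prefix or EAMT entries), prefix-based embedding can be expressed as:

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; an assumption for illustration.
PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def ipv4_to_nat64(v4: str) -> ipaddress.IPv6Address:
    """Embed the 32 IPv4 bits into the low 32 bits of the /96 prefix."""
    return ipaddress.IPv6Address(
        int(PREFIX.network_address) | int(ipaddress.IPv4Address(v4)))

def nat64_to_ipv4(v6: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from the low 32 bits."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)
```

In the data plane this corresponds to simple bit operations on the
address fields, which both P4 targets support.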
For this thesis the parsing capabilities of P4 were adequate.
However, at the time of writing, P4 cannot parse ICMP6 options in
general: the upper layer protocol does not specify the number
of options that follow, so parsing an undefined number
of 64-bit blocks is required, which P4 does not support.
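The construct P4 parsers lack here is a loop whose iteration count
depends on the data itself: each NDP option carries its own length in
units of 8 octets. A minimal Python sketch of such a data-dependent
parse loop (illustrative, not part of our implementation):

```python
def parse_ndp_options(data: bytes):
    """Parse NDP options: each option is type (1 byte), length in
    8-octet units including the type/length bytes (1 byte), then value."""
    options = []
    offset = 0
    while offset < len(data):
        opt_type = data[offset]
        opt_len = data[offset + 1]  # in units of 8 octets
        if opt_len == 0:
            raise ValueError("zero-length NDP option")
        options.append((opt_type, data[offset + 2 : offset + 8 * opt_len]))
        offset += 8 * opt_len
    return options
```

The `while` loop over a length field read from the packet is exactly
what a P4 parser state machine cannot express for an unbounded option
count.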
The language has some limitations on the placement of
conditional statements (\texttt{if/switch}).\footnote{In general,
if and switch statements in actions lead to errors,
but not all constellations are forbidden.}
Furthermore, P4/BMV2 does not support multiple LPM keys in a table;
however, it supports multiple keys with ternary matching, which is a
superset of LPM matching.
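To see why ternary matching subsumes LPM (an illustration, not code
from our implementation): a /n prefix is equivalent to a ternary entry
whose mask has n leading one bits; emulating LPM additionally requires
prioritising entries by prefix length.

```python
def lpm_to_ternary(prefix: int, prefix_len: int, width: int = 32):
    """Express an LPM entry as an equivalent ternary (value, mask) pair:
    the mask has prefix_len leading one bits, the rest are wildcards."""
    mask = ((1 << prefix_len) - 1) << (width - prefix_len)
    return (prefix & mask, mask)
```
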
When developing P4 programs, the most frequent cause of incorrect
behaviour we observed were checksum problems. In retrospect this is
expected, as the main task of our implementation is modifying headers
on which the checksums depend. In all cases in which we saw Ethernet
frame checksum errors, the effective length of the packet was
incorrect.

The tooling around P4 is somewhat fragile. We encountered small
language bugs during development~\cite{schottelius:github1675}
(compare section \ref{appendix:netfpgalogs:compilelogs})
and found missing features~\cite{schottelius:github745},
\cite{theojepsen:_get}: it is at the moment impossible to retrieve
the matching key from a table or the name of the action called. Thus,
if different table entries call the same action, it is impossible to
distinguish, within the action or, if the packet is forwarded to the
controller, within the controller, on which match the action was
triggered. This problem is very consistent within P4: not even the
name of the matching table can be retrieved. While this information
can be added manually as additional fields in the table entries, we
would expect a language to support reading and forwarding this kind
of meta information.
While in P4 the P4 code and the related controller are tightly
coupled, their data definitions are not. Thus the packet format
definition that is used between the P4 switch and the controller has
to be duplicated. Our experience in software development indicates
that this duplication is a likely source of errors in bigger software
projects.

The supporting scripts in the P4 toolchain are usually written in
python2. However, python2 ``is
legacy''~\cite{various:_shoul_i_python_python}. During development,
errors with unicode string handling in python2 caused
changes to IPv6 addresses.
% ok
% ----------------------------------------------------------------------
\section{\label{results:bmv2}P4/BMV2}
The software implementation of P4 has the most features, which is
mostly due to the capability of creating checksums over the payload.
It enables the switch to act as a ``proper'' participant in NDP, as
this requires the host to calculate checksums over the payload.
Table~\ref{tab:p4bmv2features} lists all implemented features.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to
controller & fully implemented\footnote{Source code: \texttt{actions\_egress.p4}}\\
\hline
Controller to Switch & Controller can setup table entries &
fully implemented\footnote{Source code: \texttt{controller.py}}\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) &
fully implemented\footnote{Source code:
\texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ARP & Switch can answer ARP request (without controller) & fully
implemented\footnote{Source code: \texttt{actions\_arp.p4}}\\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ICMP & Switch responds to ICMP echo request (without controller) &
fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
fully implemented\footnote{Source code:
\texttt{actions\_nat64\_session.p4}, \texttt{controller.py}} \\
\hline
Delta Checksum & Switch can calculate checksum without payload
inspection &
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection &
fully implemented\footnote{Source code: \texttt{checksum\_bmv2.p4}}\\
\hline
\end{tabular}
\end{minipage}
\caption{P4/BMV2 Feature List}
\label{tab:p4bmv2features}
\end{center}
\end{table}
The switch responds to ICMP and ICMP6 echo requests and
answers NDP and ARP requests. Overall, P4/BMV2 is very easy to use;
even without a controller a fully functional network host can be
implemented.

This P4/BMV2 implementation supports translating ICMP/ICMP6
echo request and echo reply messages, but does not support
all ICMP/ICMP6 translations that are defined in
RFC 6145~\cite{rfc6145}.
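The NAT64 session feature listed in table~\ref{tab:p4bmv2features}
maps many IPv6 flows onto one IPv4 address by allocating distinct
source ports. The following Python sketch is a hypothetical model of
this 1:n allocation; the class name, port range and data layout are
our own illustration, not the actual \texttt{controller.py} logic:

```python
import itertools

class Nat64Sessions:
    """Hypothetical sketch: many IPv6 (address, port) flows share one
    public IPv4 address, distinguished by translated source ports."""

    def __init__(self, public_v4: str, first_port: int = 10000):
        self.public_v4 = public_v4
        self._ports = itertools.count(first_port)
        self.v6_to_v4 = {}  # (v6 address, v6 port) -> (v4 address, v4 port)
        self.v4_to_v6 = {}  # reverse direction, for translating replies

    def lookup_or_create(self, v6_addr: str, v6_port: int):
        """Return the existing mapping for a flow, allocating a new
        public port on first sight of the flow."""
        key = (v6_addr, v6_port)
        if key not in self.v6_to_v4:
            mapping = (self.public_v4, next(self._ports))
            self.v6_to_v4[key] = mapping
            self.v4_to_v6[mapping] = key
        return self.v6_to_v4[key]
```

In our architecture the switch forwards the first packet of an unknown
flow to the controller, which installs the resulting mapping as table
entries for both directions.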
% ----------------------------------------------------------------------
\section{\label{results:netpfga}P4/NetFPGA}
In the following section we describe the achieved feature set of
P4/NetFPGA in detail and analyse the differences to the BMV2-based
implementation.
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:features}Features}
While the NetFPGA target supports P4, we only implemented a reduced
feature set on P4/NetFPGA compared to P4/BMV2. The first
reason for this is the missing
support in the NetFPGA P4 compiler for inspecting the payload and
computing checksums over the payload. While this can (partially) be
compensated using delta checksums, the compile time of 2 to 6 hours
contributed to a significantly slower development cycle compared to BMV2.
Lastly, the focus of this thesis is to implement high speed NAT64 in
P4, which only requires a subset of the features that we realised on
BMV2. In table \ref{tab:p4netpfgafeatures} we summarise the implemented
features and reason about their portability afterwards:
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to
controller & portable\\
\hline
Controller to Switch & Controller can setup table entries &
portable\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) &
portable\\
\hline
ARP & Switch can answer ARP request (without controller) &
portable \\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
portable\\
\hline
ICMP & Switch responds to ICMP echo request (without controller) &
portable\\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 &
fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
portable\\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
portable\\
\hline
Delta Checksum & Switch can calculate checksum without payload
inspection &
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection &
unsupported \\
\hline
\end{tabular}
\end{minipage}
\caption{P4/NetFPGA Feature List}
\label{tab:p4netpfgafeatures}
\end{center}
\end{table}
The switch to controller communication differs,
because the P4/NetFPGA implementation does not have the \texttt{clone3()}
extern that the BMV2 implementation offers. However, communication to the
controller can easily be realised by using one of
the additional ports of the NetFPGA and connecting a physical network
card to it.

Communicating from the controller towards the switch also differs, as
the p4utils suite supporting BMV2 offers easy access to the switch
tables. While the P4-NetFPGA support repository also offers python
scripts to modify the switch tables, the code is less sophisticated
and more fragile. While porting the existing code is possible, it
might be advantageous to rewrite parts of P4-NetFPGA first.

The NAT64 session support is based on the P4 switch communicating with
the controller and vice versa. As we consider both directions of
communication to be portable, we also consider the NAT64 session
feature to be portable.

P4/NetFPGA does not offer calculating the checksum over the payload,
and thus calculating the checksum over the payload to create
a reply to a neighbor solicitation packet is not possible. However,
as the payload stays the same as in the request, our delta based
checksum approach can be reused in this situation. With the same
reasoning we consider our ICMP6 and ICMP code, which also requires
creating payload based checksums, to be portable.

ARP replies do not contain a checksum over the payload, thus the
existing ARP code can be directly integrated into P4/NetFPGA without
any changes.

While the P4/NetFPGA target currently does not support accessing the
payload or creating checksums over it, there are two possibilities to
extend the platform: either by creating an HDL module or by
modifying the generated PX
program~\cite{schottelius:_exter_p4_netpf}.
Due to the existing code complexity of the P4/NetFPGA platform, the
HDL module based approach is likely to be more sustainable.
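The delta based checksum approach mentioned above can be sketched as
an incremental update of the Internet checksum (RFC 1624): when a
16-bit header word $m$ changes to $m'$, the new checksum follows from
the old one without touching the payload. The following Python sketch
illustrates the arithmetic; it is a model of the technique, not the
exact \texttt{actions\_delta\_checksum.p4} code:

```python
def fold(x: int) -> int:
    """Fold carries back into the low 16 bits (one's complement add)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x

def internet_checksum(words) -> int:
    """Full Internet checksum over 16-bit words (for comparison only)."""
    return ~fold(sum(words)) & 0xFFFF

def checksum_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    s = fold((~old_csum & 0xFFFF) + (~old_word & 0xFFFF))
    s = fold(s + (new_word & 0xFFFF))
    return ~s & 0xFFFF
```

Because only the changed header words enter the computation, this is
feasible on targets that cannot iterate over the payload.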
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:stability}Stability}
Two different NetFPGA cards were used during the development of this
thesis. The first card consistently produced ioctl errors (compare section
\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The
available hardware tests (compare figures \ref{fig:hwtestnico} and
\ref{fig:hwtesthendrik}) showed failures for both cards, however the
first card reported an additional ``10G\_Loopback'' failure. Due to
the inability to set table entries, no benchmarking was performed
on the first NetFPGA card.
\begin{figure}[htbp]
\includegraphics[scale=1.4]{hwtestnico}
\centering
\caption{Hardware Test NetFPGA Card 1}
\label{fig:hwtestnico}
\end{figure}

\begin{figure}[htbp]
\includegraphics[scale=0.2]{hwtesthendrik}
\centering
\caption{Hardware Test NetFPGA Card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}}
\label{fig:hwtesthendrik}
\end{figure}
During development and benchmarking, the second NetFPGA card stopped
functioning properly multiple times. In these cases the card would no
longer forward packets. Multiple reboots (up to 3)
and multiple times reflashing the bitstream to the NetFPGA usually
restored the intended behaviour. However, due to these ``crashes'', it
was impossible for us to run a benchmark for more than one hour.

Similarly, sometimes flashing the bitstream to the NetFPGA would fail.
It was required to reboot the host containing the
NetFPGA card up to 3 times to enable successful flashing.\footnote{Typical
output of the flashing process would be: ``fpga configuration
failed. DONE PIN is not HIGH''}
% ok
% ----------------------------------------------------------------------
\subsubsection{\label{results:netpfga:performance}Performance}
The NetFPGA card performed at near line speed and offers
NAT64 translations at 9.28 Gbit/s (see section \ref{results:benchmark}
for details). Single and multiple streams
performed almost identically and were consistent over
multiple iterations of the benchmarks.
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:usability}Usability}
The handling and usability of the NetFPGA card are rather difficult. In
this section we describe our findings and experiences with the card
and its toolchain.
To use the NetFPGA, the tools Vivado and SDNet provided by Xilinx need to be
installed. However, a bug in the installer triggers an infinite loop
if a certain shared library\footnote{The required shared library
is libncurses5.} is missing on the target operating system. The
installation program seems to still be progressing, but never
finishes.
While the NetFPGA card supports P4, the toolchains and supporting
scripts are in an immature state. The compilation process consists of
at least 9 different steps, which are interdependent.\footnote{See
source code \texttt{bin/do-all-steps.sh}.} Some of the steps generate
shell scripts and python scripts that in turn generate JSON
data.\footnote{One compilation step calls the script
``config\_writes.py''. This script failed with a syntax error, as it
contained incomplete python code. The scripts config\_writes.py
and config\_writes.sh are generated by gen\_config\_writes.py.
The output of the script gen\_config\_writes.py depends on the content
of config\_writes.txt. That file is generated by the simulation
``xsim''. The file ``SimpleSumeSwitch\_tb.sv'' contains code that is
responsible for writing config\_writes.txt and uses a function
named axi4\_lite\_master\_write\_request\_control for generating the
output. This in turn is dependent on the output of a script named
gen\_testdata.py.}
However, incorrect parsing generates syntactically incorrect
scripts, or scripts that generate incorrect output. The toolchain
provided by the NetFPGA-P4 repository contains more than 80000 lines
of code. The supporting scripts for setting table entries require
setting the parameters for all possible actions, not only for the
selected action. Supplying only the required parameters results in a
crash of the supporting script.

The documentation for using the NetFPGA-P4 repository is very
distributed and does not contain a reference on how to use the
tools. The mapping of egress ports and their metadata fields is found
in a python script that is used for generating test data.
The compile process can take up to 6 hours, and because the different
steps are interdependent, errors in a previous stage were, in our
experience, detected hours after they happened. The resulting log
files of the compilation process can be up to 5 MB in size. Within
this log file various commands output references to other logfiles,
however the referenced logfiles do not exist before or after the
compile process.

During the compile process various informational, warning and error
messages are printed. However, some informational messages constitute
critical errors, while on the other hand messages labelled as critical
or syntax errors often do not constitute a critical
error.\footnote{F.i. ``CRITICAL WARNING: [BD 41-737] Cannot set the
parameter TRANSLATION\_MODE on /axi\_interconnect\_0. It is
read-only.'' is a non critical warning.}
Contradicting output is also generated.\footnote{While using version
2018.2, the following
message was printed: ``WARNING: command 'get\_user\_parameter' will be removed in the 2015.3
release, use 'get\_user\_parameters' instead''.}
Programs or scripts that are called during the compile process do not
necessarily exit non-zero if they encounter a critical error. Thus
finding the source of an error can be difficult, because the compile
process continues after critical errors have occurred. Not only do
programs with critical errors exit ``successfully'', but python
scripts that encounter critical paths do not abort with raise();
they print an error message to stdout and do not signal an error.

The most often encountered critical compile error is
``Run 'impl\_1' has not been launched. Unable to open''. This error
indicates that something in the previous compile steps failed and can
refer to anything from incorrectly generated testdata to unsupported
LPM tables.
The NetFPGA kernel module provides access to virtual Linux
devices (nf0...nf3). However, tcpdump does not see any packets that are
emitted by the switch. The only possibility to capture packets
emitted by the switch is to connect a physical cable to
the port and capture on the other side.
Jumbo frames\footnote{Frames with an MTU greater than 1500 bytes.} are
commonly used in 10 Gbit/s networks. According to
\cite{wikipedia:_jumbo}, even many gigabit network interface cards
support jumbo frames. However, according to emails on the private
NetFPGA mailing list, the NetFPGA only supports 1500 byte frames at
the moment and additional work is required to implement support for
bigger frames.
Our P4 source code is required to contain Xilinx
annotations\footnote{F.i. ``@Xilinx\_MaxPacketRegion(1024)''} that define
the maximum packet size in bits. We observed two different errors on
the output packet if an incoming packet exceeds the maximum packet size:

\begin{itemize}
\item The output packet is longer than the original packet.
\item The output packet is corrupted.
\end{itemize}
While most of the P4 language is supported on the NetFPGA, some key
techniques are currently missing or not supported:

\begin{itemize}
\item Analysing / accessing the payload is not supported.
\item Checksum computation over the payload is not supported.
\item Using LPM tables can lead to compilation errors.
\item Depending on the match type, only certain table sizes are allowed.
\end{itemize}

Renaming variables in the declaration of the parser or deparser leads
to compilation errors. The P4 function syntax is not supported. For this
reason our implementation uses \texttt{\#define} statements instead of functions.
%ok
% ----------------------------------------------------------------------
\section{\label{results:softwarenat64}Software Based NAT64}
Both software solutions, Tayga and Jool, worked flawlessly. However,
as expected, both solutions are CPU bound: under high load
both solutions fully utilise one core. Neither Tayga as a
user space program nor Jool as a kernel module implements
multi-threading.
%ok
% ----------------------------------------------------------------------
\section{\label{results:benchmark}NAT64 Benchmarks}
In this section we give an overview of the benchmark design
and summarise the benchmarking results.
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:design}Benchmark Design}
\begin{figure}[htbp]
\includegraphics[scale=0.6]{softwarenat64design}
\centering
\caption{Benchmark Design for NAT64 in Software Implementations}
\label{fig:softwarenat64design}
\end{figure}
We use two hosts for performing benchmarks: a load generator and a
NAT64 translator. Both hosts are equipped with a dual port
Intel X520 10 Gbit/s network card and are connected with DAC cables
without any equipment in between. TCP offloading is enabled in the
X520 cards. Figure \ref{fig:softwarenat64design}
shows the network setup.

When testing the NetFPGA/P4 performance, the X520 cards in the NAT64
translator are disconnected and the NetFPGA ports are
connected instead, as shown in figure \ref{fig:netpfgadesign}. The load
generator is equipped with a quad core CPU (Intel(R) Core(TM) i7-6700
CPU @ 3.40GHz) with hyperthreading enabled and 16 GB RAM. The NAT64
translator is also equipped with a quad core CPU (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz) and 16 GB RAM.

The first 10 seconds of the benchmark are excluded to avoid the TCP
warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
\begin{figure}[h]
\includegraphics[scale=0.5]{netpfgadesign}
\centering
\caption{NAT64 with NetFPGA Benchmark}
\label{fig:netpfgadesign}
\end{figure}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:summary}Benchmark Summary}
Overall, \textbf{Tayga} proved to be the slowest translator with an
achieved bandwidth of \textbf{about 3 Gbit/s}, followed by
\textbf{Jool}, which translates at about \textbf{8 Gbit/s}. \textbf{Our
solution} is the fastest with an almost line rate translation speed
of about \textbf{9 Gbit/s} (compare tables \ref{tab:benchmarkv6} and
\ref{tab:benchmarkv4}).
The TCP based benchmarks show realistic numbers, while iperf reports
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) for UDP
based benchmarks. For this reason we have summarised the UDP based
benchmarks with their average loss instead of listing the bandwidth
details. The ``adjusted bandwidth'' in the UDP benchmarks incorporates
the packet loss (compare tables \ref{tab:benchmarkv6v4udp} and
\ref{tab:benchmarkv4v6udp}).
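Assuming the adjusted bandwidth is the reported sender rate scaled by
the measured datagram loss (the table values are consistent with this
up to rounding of the loss percentage), the adjustment can be
computed as:

```python
def adjusted_bandwidth(reported_gbit_s: float, loss: float) -> float:
    """Bandwidth actually delivered: iperf's reported sender rate
    times the fraction of datagrams that arrived.
    `loss` is a fraction in [0, 1]."""
    return reported_gbit_s * (1.0 - loss)
```
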
Both software solutions showed significant loss of packets in the UDP
based benchmarks (Tayga: up to 91\%, Jool: up to 71\%), while
P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only
recorded by iperf for UDP based benchmarks, as TCP packets are
acknowledged and resent if necessary.
Tayga has the highest variation of results, which might be due to it
being fully CPU bound, even in the non-parallel benchmark. Jool has
less variation, and in general the P4/NetFPGA solution behaves almost
identically in different benchmark runs.

The CPU load for TCP based benchmarks with Jool was almost negligible,
however for UDP based benchmarks one core was almost 100\%
utilised. In all benchmarks with Tayga, one CPU was fully
utilised. As the translation for P4/NetFPGA happens within the
NetFPGA card, no CPU utilisation was visible on the NAT64 host.
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
We suspect that this might be due to the slightly increasing packet
sizes that occur in this direction of translation. Not only does this
vary the IPv4 versus IPv6 bandwidth, but it might also cause
fragmentation that slows down the translation.
During the benchmarks with up to 10 parallel connections, no
significant CPU load was registered on the load generator. However,
with 20 parallel connections, each of the two iperf
processes\footnote{The client process for sending, the server process
for receiving.} partially
spiked to 100\% CPU usage. With 50 parallel connections the CPU
load of each process often hit 100\%. For this reason we argue that
the results of the benchmarks with 20 or more parallel
connections might be affected by the load generator limits. While
there is no visible evidence in our results, this problem might become
more significant with higher speed links.
While Tayga's performance is reduced with a growing number of
parallel connections, both Jool and our P4/NetFPGA implementation
vary only slightly.

Overall the performance of Tayga, a Linux user space program, is as
expected. We were surprised by the good performance of Jool, which,
while slower than the P4/NetFPGA solution, is almost on par with our solution.
% ----------------------------------------------------------------------
\begin{table}[htbp]
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
\hline
Tayga & 2.79 / 3.20 / 3.43 & 3.34 / 3.36 / 3.38 & 2.57 / 3.02 / 3.27 &
2.35 / 2.91 / 3.20 \\
\hline
Jool & 8.22 / 8.22 / 8.22 & 8.21 / 8.21 / 8.22 & 8.21 / 8.23 / 8.25
& 8.21 / 8.23 / 8.25\\
\hline
P4 / NetFPGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28
/ 9.29 & 9.28 / 9.28 / 9.29\\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\caption{IPv6 to IPv4 TCP NAT64 Benchmark}
\label{tab:benchmarkv6}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\begin{table}[htbp]
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
\hline
Tayga & 2.90 / 3.15 / 3.34 & 2.87 / 3.01 / 3.22 &
2.68 / 2.85 / 3.09 & 2.60 / 2.78 / 2.88 \\
\hline
Jool & 7.18 / 7.56 / 8.24 & 7.97 / 8.05 / 8.09 &
8.05 / 8.08 / 8.10 & 8.10 / 8.12 / 8.13 \\
\hline
P4 / NetFPGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 /
9.29 & 9.28 / 9.28 / 9.29 \\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\caption{IPv4 to IPv6 TCP NAT64 Benchmark}
\label{tab:benchmarkv4}
\end{center}
\end{table}
% ---------------------------------------------------------------------
\begin{table}[htbp]
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth} \\
\hline
Tayga & 8.02 / 70\% / 2.43 & 9.39 / 79\% / 1.97 & 15.43 / 86\% / 2.11
& 19.27 / 91\% / 1.73 \\
\hline
Jool & 6.44 / 0\% / 6.41 & 6.37 / 2\% / 6.25 &
16.13 / 64\% / 5.75 & 20.83 / 71\% / 6.04 \\
\hline
P4 / NetFPGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
16.15 / 0\% / 16.15 & 15.8 / 0\% / 15.8 \\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
\label{tab:benchmarkv6v4udp}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\begin{table}[htbp]
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth} \\
\hline
Tayga & 6.78 / 84\% / 1.06 & 9.58 / 90\% / 0.96 &
15.67 / 91\% / 1.41 & 20.77 / 95\% / 1.04 \\
\hline
Jool & 4.53 / 0\% / 4.53 & 4.49 / 0\% / 4.49 & 13.26 / 0\% / 13.26 &
22.57 / 0\% / 22.57\\
\hline
P4 / NetFPGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
9.78 / 0\% / 9.78 & 14.37 / 0\% / 14.37\\
\hline
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
\label{tab:benchmarkv4v6udp}
\end{center}
\end{table}
%ok