Cleanup appendix

Nico Schottelius 2019-08-20 01:29:26 +02:00
parent d99a65ed27
commit 27d4d449aa
4 changed files with 591 additions and 1492 deletions


@@ -66,3 +66,6 @@ Long term supporting python3 would be helpful. P4OS.
 - react on FIN/RST (?) -- could be an addition
+P4os - reusable code
+Future work: session handling


@@ -14,12 +14,26 @@ P4 software implementation.
 % ok
 % ----------------------------------------------------------------------
 \section{\label{results:p4}P4 based implementations}
+We successfully implemented P4 code to realise
+NAT64~\cite{schottelius:thesisrepo}. It contains parsers
+for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
+ARP), supports EAMT as defined by RFC7757~\cite{rfc7757} and is
+feature equivalent to the two compared software solutions
+Tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
+Jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
+Due to limitations in the P4 environment of the
+NetFPGA~\cite{conclusion:netfpga}, the BMV2 implementation
+is more feature rich. Table~\ref{tab:benchmark} summarises the
+achieved bandwidths of the NAT64 solutions.
+BEFORE OR AFTER MARKER - FIXME
 All planned features could be realised with P4 and a controller.
 For this thesis the parsing capabilities of P4 were adequate.
 However P4, at the time of writing, cannot parse ICMP6 options in
 general, as the upper level protocol does not specify the number
 of options that follow and parsing of an undefined number
-of 64 bit blocks is required.
+of 64 bit blocks is required, which P4 does not support.
 The language has some limitations on where the placement of
 conditional statements (\texttt{if/switch}).\footnote{In general,
@@ -61,34 +75,15 @@ The supporting scripts in the P4 toolchain are usually written in
 python2. However python2 ``is
 legacy''~\cite{various:_shoul_i_python_python}. During development
 errors with unicode string handling in python2 caused
-changes to IPv6 addresses.~\ref{appendix:p4:python2unicode}
+changes to IPv6 addresses.\footnote{Compare section~\ref{appendix:p4:python2unicode}.}
+% ok
-P4os - reusable code
-% idomatic problem: Security issue: not checking checksums before
-****** TODO IPv6 udp -> IPv4
-- Got 4-5 tuple ([proto], src ip, src port, dst ip, dst port)
-- Does not / never signal end
-- Needs timeout for cleaning up
-P4/BMV2 thus
-allows us to closest resemble any other translation implementation.
-Only supporting /96, not other embeddings as described in
-section \ref{background:transition:prefixnat}.
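The /96 embedding mentioned in the note above can be illustrated with a short Python sketch. It is not taken from the thesis code: the well-known prefix \texttt{64:ff9b::/96} from RFC 6052 and the helper names \texttt{embed\_ipv4} and \texttt{extract\_ipv4} are assumptions chosen for illustration only.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix; the thesis setup may use another /96.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(v4, prefix=NAT64_PREFIX):
    """Map an IPv4 address into a /96 prefix (IPv4 occupies the low 32 bits)."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4_addr = ipaddress.IPv4Address(v4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4_addr))


def extract_ipv4(v6, prefix=NAT64_PREFIX):
    """Recover the embedded IPv4 address from an address inside the /96 prefix."""
    v6_addr = ipaddress.IPv6Address(v6)
    if v6_addr not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(v6_addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_ipv4("192.0.2.33")
    print(mapped, extract_ipv4(mapped))
```

Other prefix lengths defined in RFC 6052 place the IPv4 bits at different offsets and are deliberately not handled here.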
 % ----------------------------------------------------------------------
-\subsection{\label{Results:BMV2}BMV2}
+\section{\label{results:bmv2}P4/BMV2}
 The software implementation of P4 has most features, which is
-mostly due to the capability of checksumming the payload: Acting
-as a ``proper'' participant in NDP, requires the host to calculate
-checksums over the payload.
-List of features BMV2 ~\cite{tab:p4bmv2features}
+mostly due to the capability of creating checksums over the payload.
+It enables the switch to act as a ``proper'' participant in NDP, as
+this requires the host to calculate checksums over the payload.
+Table~\ref{tab:p4bmv2features} references all implemented features.
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c |}
@@ -144,38 +139,34 @@ fully implemented\footnote{Source code: \texttt{checksum\_bmv2.p4}}\\
 \label{tab:p4bmv2features}
 \end{center}
 \end{table}
+The switch responds to ICMP and ICMP6 echo requests and
+answers NDP and ARP requests. Overall P4/BMV2 is very easy to use:
+even without a controller a fully functional network host can be
+implemented.
-Responds to icmp, icmp6
-ndp ~\cite{rfc4861}
-arp
-very easy to use
-Fully functional host
-Can compute checksums on its own.
-focus on typical use cases of icmp, icmp6, the software implementation
-supports translating echo request and echo reply messages, but does
-not support all ICMP/ICMP6 translations that are defined in
+This P4/BMV2 implementation supports translating ICMP/ICMP6
+echo request and echo reply messages, but does not support
+all ICMP/ICMP6 translations that are defined in
 RFC6145~\cite{rfc6145}.
-Stateful : no automatic removal
-Session management not benchmarked, as it is only a matter of creating
-table entries.
-Jool and tayga are supported by
 % ----------------------------------------------------------------------
-\subsection{\label{results:netpfga}NetFPGA - FIXME: writing}
-The reduced feature set of the NetPFGA implementation is due to two
-factors: compile time. Between 2 to 6 hours per compile run. No
-payload checksum
-overview - general translation - not advanced features
+\section{\label{results:netpfga}P4/NetFPGA}
+In the following section we describe the achieved feature set of
+P4/NetFPGA in detail and analyse differences to the BMV2 based
+implementation.
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:features}Features}
+\subsection{\label{results:netpfga:features}Features}
+While the NetFPGA target supports P4, compared to P4/BMV2
+we only implemented a reduced feature set on P4/NetFPGA. The first
+reason for this is the missing
+support of the NetFPGA P4 compiler to inspect payload and to compute
+checksums over payload. While this can (partially) be compensated
+using delta checksums, the compile time of 2 to 6 hours contributed to
+a significantly slower development cycle compared to BMV2.
+Lastly, the focus of this thesis was to implement high speed NAT64 on
+P4, which only requires a subset of the features that we realised on
+BMV2. Table~\ref{tab:p4netpfgafeatures} summarises the implemented
+features and reasons about their implementation status.
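The delta checksum idea mentioned above can be sketched in Python. This is a generic illustration of the incremental update from RFC 1624 (equation 3), not the P4 code of the thesis; the function names and the sample header words are assumptions chosen for the example.

```python
def ones_complement_sum(words):
    """Add 16-bit words with end-around carry (one's complement sum)."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Full checksum: complement of the one's complement sum of all words."""
    return ~ones_complement_sum(words) & 0xFFFF


def delta_update(old_checksum, old_word, new_word):
    """Update a checksum after one 16-bit word changed, without the payload.

    RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')
    """
    total = ones_complement_sum(
        [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    )
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample 16-bit words (hypothetical header content, checksum field omitted).
    words = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
             0xC0A8, 0x0001, 0xC0A8, 0x00C7]
    old = internet_checksum(words)
    changed = list(words)
    changed[4] = 0x3F11  # one header word is rewritten
    print(hex(delta_update(old, words[4], changed[4])),
          hex(internet_checksum(changed)))
```

The update only needs the old checksum and the changed words, which is why it remains usable on a target that cannot read the payload.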
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c |}
@@ -243,8 +234,9 @@ unsupported\footnote{To support creating payload checksums, either an
 \label{tab:p4netpfgafeatures}
 \end{center}
 \end{table}
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:stability}Stability}
+\subsection{\label{results:netpfga:stability}Stability}
 Two different NetFPGA cards were used during the development of the
 thesis. The first card had consistent ioctl errors (compare section
 \ref{netpfgaioctlerror}) when writing table entries. The available
@@ -266,25 +258,33 @@ on the first NetFPGA card.
 \label{fig:hwtesthendrik}
 \end{figure}
 During the development and benchmarking, the second NetFPGA card stopped to
-function properly multiple times. In both cases the card would not
-forward packets anymore. Multiple reboots (3 were usually enough)
+function properly multiple times. In these cases the card would not
+forward packets anymore. Multiple reboots (up to 3)
 and multiple times reflashing the bitstream to the NetFPGA usually
 restored the intended behaviour. However due to these ``crashes'', it
-was impossible to complete a full benchmark run that would last for
-more than one hour.
-Sometimes it was also required to reboot the host containing the
-NetFPGA card 3 times to enable successful flashing.\footnote{Typical
-output of the flashing process would be: ``fpga configuration
-failed. DONE PIN is not HIGH''}
+was impossible for us to run a benchmark for more than one hour.
+Similarly, sometimes flashing the bitstream to the NetFPGA would fail.
+It was required to reboot the host containing the
+NetFPGA card up to 3 times to enable successful flashing.\footnote{Typical
+output of the flashing process would be: ``fpga configuration
+failed. DONE PIN is not HIGH''}
+% ok
 % ----------------------------------------------------------------------
 \subsubsection{\label{results:netpfga:performance}Performance}
-As expected, the NetFGPA card performed at near line speed and offers
-NAT64 translations at 9.28 Gbit/s. Single and multiple streams
+The NetFPGA card performed at near line speed and offers
+NAT64 translations at 9.28 Gbit/s (see section~\ref{results:benchmark}
+for details).
+Single and multiple streams
 performed almost exactly identical and have been consistent through
 multiple iterations of the benchmarks.
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:usability}Usability}
-To use the NetFGPA, Vivado and SDNET provided by Xilinx need to be
+\subsection{\label{results:netpfga:usability}Usability}
+The handling and usability of the NetFPGA card is rather difficult. In
+this section we describe our findings and experiences with the card
+and its toolchain.
+To use the NetFPGA, the tools Vivado and SDNET provided by Xilinx need to be
 installed. However a bug in the installer triggers an infinite loop,
 if a certain shared library\footnote{The required shared library
 is libncurses5.} is missing on the target operating system. The
@@ -388,36 +388,68 @@ techniques are missing or not supported.
 Renaming variables in the declaration of the parser or deparser lead
 to compilation errors. Function syntax is not supported. For this
 reason our implementation uses \texttt{\#define} statements instead of functions.
+%ok
-FIXME:
-General result: limited NAT64 is working, however
-No Payload ; checksumming - requires controller
-Hash function in progress ; No NDP, no ARP - focused on key factors of NAT64 translation,
-other features can be supported by controller
-Needed to debug internal parsing errors
-debugging generated tcl code to debug impl1 error
 % ----------------------------------------------------------------------
 \section{\label{results:softwarenat64}Software based NAT64}
-with Tayga and
-Jool
-Both cpu bound.
-During the benchmark cpu bound, single thread
-tayga: Single threaded
-easy to use
-Jool kernel module
-100\% cpu usage on 1 core for udp
-0\% visible cpu usage for tcp, might be tcp offloading
-Integration with iptables
-Requires routing
+Both solutions Tayga and Jool worked flawlessly. However as expected,
+both solutions have a bottleneck that is CPU bound. Under high load
+scenarios both solutions utilise one core fully. Neither Tayga as a
+user space program nor Jool as a kernel module implements
+multithreading.
+%ok
 % ----------------------------------------------------------------------
-\section{\label{results:benchmark}NAT64 Benchmarks - FIXME: explain
-numbers}
+\section{\label{results:benchmark}NAT64 Benchmarks}
+In this section we summarise the benchmarking results; in the
+subsections we discuss the benchmark design and the individual results.
+FIXME: summary here
+MTU setting to 1500, as NetFPGA doesn't support jumbo frames
+iperf3, iperf 3.0.11
+50 parallel = 2x 100\% cpu usage
+40 parallel = 100\%, 70\% cpu usage
+30 parallel = 70\%-100, 70\% cpu usage
+Turning back on checksum offloading (see below)
+30 parallel = 70\%, 30\% cpu usage
+\subsection{\label{benchmark:tayga:tcp}Tayga/TCP}
+Tayga running at 100\% cpu load,
+v4->v6 tcp
+delivering
+3.36 Gbit/s at P1
+3.30 Gbit/s at P20
+3.11 Gbit/s at P50
+v6->v4 tcp
+P1: 3.02 Gbit/s
+P20: 3.28 Gbit/s
+P50: 2.85 Gbit/s
+Commands:
+UDP load generator hitting 100\% cpu at P20.
+TCP confirmed.
+Over bandwidth results
+Feature comparison
+speed - sessions - eamt
+can act as host
+lpm tables
+ping
+ping6 support
+ndp
+controller support
+netpfga consistent
 % ----------------------------------------------------------------------
 \subsection{\label{results:benchmark:design}Benchmark Design}
 \begin{figure}[h]
@@ -449,20 +481,10 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
 \end{figure}
 % ok
 % ----------------------------------------------------------------------
-We successfully implemented P4 code to realise
-NAT64~\cite{schottelius:thesisrepo}. It contains parsers
-for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
-arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
-feature equivalent to the two compared software solutions
-tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
-jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
-Due to limitations in the P4 environment of the
-NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
-is more feature rich. Table \ref{tab:benchmark} summarises the
-achieved bandwidths of the NAT64 solutions.
+\newpage
+\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
+Benchmark Results}
+some text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
@@ -487,8 +509,8 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \label{tab:benchmarkv6}
 \end{center}
 \end{table}
+% ---------------------------------------------------------------------
+\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
 During the benchmarks the client -- CPU usage
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
@@ -514,7 +536,11 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{center}
 \end{table}
+% ---------------------------------------------------------------------
+\newpage
+\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
+Benchmark Results}
+other text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -540,7 +566,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{center}
 \end{table}
+% ---------------------------------------------------------------------
+\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
+last text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -565,18 +593,3 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \label{tab:benchmarkv4}
 \end{center}
 \end{table}
-UDP load generator hitting 100\% cpu at P20.
-TCP confirmed.
-Over bandwidth results
-Feature comparison
-speed - sessions - eamt
-can act as host
-lpm tables
-ping
-ping6 support
-ndp
-controller support
-netpfga consistent

Binary file not shown.

File diff suppressed because it is too large