Cleanup appendix

parent d99a65ed27
commit 27d4d449aa

4 changed files with 591 additions and 1492 deletions

@@ -66,3 +66,6 @@ Long term supporting python3 would be helpful. P4OS.
 
 
 - react on FIN/RST (?) -- could be an addition
+P4os - reusable code
+
+Future work: session handling
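
The notes above, together with the removed TODO in doc/Results.tex ("Does not / never signal end", "Needs timeout for cleaning up"), describe why stateful NAT64 sessions need cleanup: UDP flows never announce their end and need an idle timeout, while TCP flows could additionally react on FIN/RST. The following is a minimal Python sketch of such a session table, for example kept by a controller. The class, field and method names are hypothetical, the timeouts are borrowed from the RFC 6146 defaults (5 minutes for UDP, 2 hours 4 minutes for established TCP), and dropping a TCP session immediately on FIN or RST is a simplification of the transitory state a complete NAT64 keeps.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical key layout: (protocol, src ip, src port, dst ip, dst port)
FiveTuple = Tuple[str, str, int, str, int]


@dataclass
class Session:
    ipv4_port: int    # port allocated on the IPv4 side (hypothetical field)
    last_seen: float  # time of the last packet, in seconds


@dataclass
class SessionTable:
    # Idle timeouts in seconds, borrowed from the RFC 6146 defaults (assumption).
    udp_timeout: float = 300.0
    tcp_timeout: float = 7440.0
    sessions: Dict[FiveTuple, Session] = field(default_factory=dict)

    def touch(self, key: FiveTuple, now: float, ipv4_port: int) -> Session:
        """Create a session for a new flow or refresh the timestamp of a known one."""
        session = self.sessions.get(key)
        if session is None:
            session = Session(ipv4_port=ipv4_port, last_seen=now)
            self.sessions[key] = session
        else:
            session.last_seen = now
        return session

    def on_tcp_flags(self, key: FiveTuple, fin: bool, rst: bool) -> None:
        """TCP can signal its end, so drop the session on FIN or RST (simplified)."""
        if fin or rst:
            self.sessions.pop(key, None)

    def expire(self, now: float) -> int:
        """UDP never signals its end: remove sessions that were idle for too long."""
        def timeout(key: FiveTuple) -> float:
            return self.tcp_timeout if key[0] == "tcp" else self.udp_timeout

        expired = [key for key, s in self.sessions.items()
                   if now - s.last_seen > timeout(key)]
        for key in expired:
            del self.sessions[key]
        return len(expired)
```

A periodic job would call `expire()`; on a P4 target each returned key would then correspond to one table entry to delete, which matches the remark in the thesis that session management is "only a matter of creating table entries".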

doc/Results.tex | 269
@@ -14,12 +14,26 @@ P4 software implementation.
 % ok
 % ----------------------------------------------------------------------
 \section{\label{results:p4}P4 based implementations}
+We successfully implemented P4 code to realise
+NAT64~\cite{schottelius:thesisrepo}. It contains parsers
+for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
+arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
+feature equivalent to the two compared software solutions
+tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
+jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
+Due to limitations in the P4 environment of the
+NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
+is more feature rich. Table \ref{tab:benchmark} summarises the
+achieved bandwidths of the NAT64 solutions.
+
+BEFORE OR AFTER MARKER - FIXME
+
 All planned features could be realised with P4 and a controller.
 For this thesis the parsing capabilities of P4 were adequate.
 However P4, at the time of writing, cannot parse ICMP6 options in
 general, as the upper level protocol does not specify the number
 of options that follow and parsing of an undefined number
-of 64 bit blocks is required.
+of 64 bit blocks is required, which P4 does not support.
 
 The language has some limitations on where the placement of
 conditional statements (\texttt{if/switch}).\footnote{In general,
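
The ICMP6 limitation described in this hunk stems from the NDP option format (RFC 4861, section 4.6): every option is a type/length/value record whose length byte counts the whole option in units of 8 octets, and the message never states how many options follow. A software parser therefore needs a loop whose iteration count depends on the packet contents, as in this minimal Python sketch. The function name is hypothetical; the sketch only splits the option area and does not interpret the options.

```python
import struct
from typing import List, Tuple


# Hypothetical helper, not part of the thesis code.
def parse_ndp_options(data: bytes) -> List[Tuple[int, bytes]]:
    """Split the option area of an NDP message into (type, value) pairs.

    Each option starts with a 1-byte type and a 1-byte length that counts the
    whole option, including these two bytes, in units of 8 octets
    (RFC 4861, section 4.6). Nothing announces how many options follow, so
    the loop can only stop at the end of the data.
    """
    options = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated option header")
        opt_type, opt_len = struct.unpack_from("!BB", data, offset)
        if opt_len == 0:
            # Length 0 would never advance the offset; RFC 4861 says to drop the packet.
            raise ValueError("option with length 0")
        end = offset + opt_len * 8
        if end > len(data):
            raise ValueError("option exceeds the packet")
        options.append((opt_type, data[offset + 2:end]))
        offset = end
    return options


# Source and target link-layer address options (types 1 and 2), 8 octets each.
mac = bytes.fromhex("020000000001")
print(parse_ndp_options(bytes([1, 1]) + mac + bytes([2, 1]) + mac))
# [(1, b'\x02\x00\x00\x00\x00\x01'), (2, b'\x02\x00\x00\x00\x00\x01')]
```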
@@ -61,34 +75,15 @@ The supporting scripts in the P4 toolchain are usually written in
 python2. However python2 ``is
 legacy''~\cite{various:_shoul_i_python_python}. During development
 errors with unicode string handling in python2 caused
-changes to IPv6 addresses.~\ref{appendix:p4:python2unicode}
-P4os - reusable code
-
-% idomatic problem: Security issue: not checking checksums before
-
-
-****** TODO IPv6 udp -> IPv4
-- Got 4-5 tuple ([proto], src ip, src port, dst ip, dst port)
-- Does not / never signal end
-- Needs timeout for cleaning up
-
-P4/BMV2 thus
-allows us to closest resemble any other translation implementation.
-
-Only supporting /96, not other embeddings as described in
-section \ref{background:transition:prefixnat}.
-
+changes to IPv6 addresses.\footnote{Compare section ~\ref{appendix:p4:python2unicode}.}
+% ok
 % ----------------------------------------------------------------------
-\subsection{\label{Results:BMV2}BMV2}
+\section{\label{results:bmv2}P4/BMV2}
 The software implementation of P4 has most features, which is
-mostly due to the capability of checksumming the payload: Acting
-as a ``proper'' participant in NDP, requires the host to calculate
-checksums over the payload.
-
-List of features BMV2 ~\cite{tab:p4bmv2features}
-
+mostly due to the capability of creating checksums over the payload.
+It enables the switch to act as a ``proper'' participant in NDP, as
+this requires the host to calculate checksums over the payload.
+Table~\ref{tab:p4bmv2features} references all implemented features.
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c |}
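
The new BMV2 paragraph states that acting as an NDP participant requires checksums over the payload. For ICMPv6 the checksum covers an IPv6 pseudo-header (source, destination, upper-layer length, next header 58) followed by the complete ICMPv6 message (RFC 4443, section 2.3). The Python sketch below builds a Neighbor Advertisement and fills in that checksum. It only illustrates the arithmetic and is not the P4 code of the thesis; the helper names, addresses and MAC address are illustration values.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's complement sum (RFC 1071), not yet inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


# Example helpers with hypothetical names.
def icmpv6_checksum(src: str, dst: str, message: bytes) -> int:
    """Checksum over the IPv6 pseudo-header plus the ICMPv6 message.

    With the checksum field zeroed this yields the value to insert; over a
    complete message it yields 0 if the transmitted checksum is correct.
    """
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", len(message), 58)  # length, 3 zero bytes, next header
    )
    return ~ones_complement_sum(pseudo + message) & 0xFFFF


def neighbor_advertisement(target: str, mac: bytes, src: str, dst: str) -> bytes:
    """Build a Neighbor Advertisement (type 136) with a target link-layer option."""
    flags = 0x60000000  # Solicited and Override bits set
    body = struct.pack("!BBHI", 136, 0, 0, flags) + ipaddress.IPv6Address(target).packed
    body += struct.pack("!BB", 2, 1) + mac  # option type 2, length 1 (8 octets)
    checksum = icmpv6_checksum(src, dst, body)
    return body[:2] + struct.pack("!H", checksum) + body[4:]
```

A receiver verifies the packet with the same function: computed over the pseudo-header and the message including the transmitted checksum, the result is 0.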
@@ -144,38 +139,34 @@ fully implemented\footnote{Source code: \texttt{checksum\_bmv2.p4}}\\
 \label{tab:p4bmv2features}
 \end{center}
 \end{table}
+The switch responds to ICMP echo requests, ICMP6 echo requests,
+answers NDP and ARP requests. Overall P4/BMV is very easy to use
+even without a controller a fully functional network host can be
+implemented.
 
-Responds to icmp, icmp6
-ndp ~\cite{rfc4861}
-arp
-
-very easy to use
-
-Fully functional host
-Can compute checksums on its own.
-
-focus on typical use cases of icmp, icmp6, the software implementation
-supports translating echo request and echo reply messages, but does
-not support all ICMP/ICMP6 translations that are defined in
+This P4/BMV implementation supports translating ICMP/ICMP6
+echo request and echo reply messages, but does not support
+all ICMP/ICMP6 translations that are defined in
 RFC6145~\cite{rfc6145}.
-
-Stateful : no automatic removal
-
-Session management not benchmarked, as it is only a matter of creating
-table entries.
-
-Jool and tayga are supported by
-
-
 % ----------------------------------------------------------------------
-\subsection{\label{results:netpfga}NetFPGA - FIXME: writing}
-The reduced feature set of the NetPFGA implementation is due to two
-factors: compile time. Between 2 to 6 hours per compile run. No
-payload checksum
-overview - general translation - not advanced features
+\section{\label{results:netpfga}P4/NetFPGA}
+In the following section we describe the achieved feature set of
+P4/NetFPGA in detail and analyse differences to the BMV2 based
+implementation.
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:features}Features}
+\subsection{\label{results:netpfga:features}Features}
+While the NetFPGA target supports P4, compared to P4/BMV2
+we only implemented a reduced features set on P4/NetPFGA. The first
+reason for this is missing
+support of the NetFPGA P4 compiler to inspect payload and to compute
+checksums over payload. While this can (partially) be compensated
+using delta checksums, the compile time of 2 to 6 hours contributed to
+a significant slower development cycle compared to BMV2.
+Lastly, the focus of this thesis was to implement high speed NAT64 on
+P4, which only requires a subset of the features that we realised on
+BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented
+features and reasons about their implementation status.
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c |}
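
The new Features paragraph names delta checksums as a partial workaround for the missing payload checksum support on NetFPGA. When a TCP segment is translated from IPv6 to IPv4, only the pseudo-header changes, so the transport checksum can be updated from its old value by subtracting the old pseudo-header words and adding the new ones (RFC 1624, equation 3) without reading the payload. A minimal Python sketch, assuming TCP (the UDP zero-checksum case is ignored) and example addresses; all function names are hypothetical.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's complement sum (RFC 1071), not yet inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_v6(src: str, dst: str, length: int, next_header: int) -> bytes:
    return (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, next_header))


def pseudo_header_v4(src: str, dst: str, length: int, protocol: int) -> bytes:
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!xBH", protocol, length))


def full_checksum(pseudo: bytes, segment: bytes) -> int:
    """Recompute from scratch: needs the whole segment, payload included."""
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def delta_checksum(old_checksum: int, old_pseudo: bytes, new_pseudo: bytes) -> int:
    """RFC 1624 eq. 3, HC' = ~(~HC + ~m + m'), over all pseudo-header words.

    Sketch for TCP only (assumption); only header fields are read.
    """
    inverted_old = bytes(b ^ 0xFF for b in old_pseudo)  # ~m for every 16-bit word
    total = ((~old_checksum & 0xFFFF)
             + ones_complement_sum(inverted_old)
             + ones_complement_sum(new_pseudo))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

The delta path touches only the 40-byte IPv6 and 12-byte IPv4 pseudo-headers, which is why it suits a target that cannot access the payload; ICMP to ICMPv6 translation additionally changes type fields and the pseudo-header rules, so it is only "partially" covered, as the paragraph says.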
@@ -243,8 +234,9 @@ unsupported\footnote{To support creating payload checksums, either an
 \label{tab:p4netpfgafeatures}
 \end{center}
 \end{table}
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:stability}Stability}
+\subsection{\label{results:netpfga:stability}Stability}
 Two different NetPFGA cards were used during the development of the
 thesis. The first card had consistent ioctl errors (compare section
 \ref{netpfgaioctlerror}) when writing table entries. The available
@@ -266,25 +258,33 @@ on the first NetFPGA card.
 \label{fig:hwtesthendrik}
 \end{figure}
 During the development and benchmarking, the second NetFPGA card stopped to
-function properly multiple times. In both cases the card would not
-forward packets anymore. Multiple reboots (3 were usually enough)
+function properly multiple times. In theses cases the card would not
+forward packets anymore. Multiple reboots (up to 3)
 and multiple times reflashing the bitstream to the NetFPGA usually
 restored the intended behaviour. However due to this ``crashes'', it
-was impossible to complete a full benchmark run that would last for
-more than one hour.
-Sometimes it was also required to reboot the host containing the
-NetFPGA card 3 times to enable successful flashing.\footnote{Typical
-output of the flashing process would be: ``fpga configuration failed. DONE PIN is not HIGH''}
+was impossible for us run a benchmark for more than one hour.
+Similariy, sometimes flashing the bitstream to the NetFPGA would fail.
+It was required to reboot the host containing the
+NetFPGA card up to 3 times to enable successful flashing.\footnote{Typical
+output of the flashing process would be: ``fpga configuration
+failed. DONE PIN is not HIGH''}
+% ok
 % ----------------------------------------------------------------------
 \subsubsection{\label{results:netpfga:performance}Performance}
-As expected, the NetFGPA card performed at near line speed and offers
-NAT64 translations at 9.28 Gbit/s. Single and multiple streams
+The NetFGPA card performed at near line speed and offers
+NAT64 translations at 9.28 Gbit/s (see section \ref{results:benchmark}
+for details).
+Single and multiple streams
 performed almost exactly identical and have been consistent through
 multiple iterations of the benchmarks.
+% ok
 % ----------------------------------------------------------------------
-\subsubsection{\label{results:netpfga:usability}Usability}
-To use the NetFGPA, Vivado and SDNET provided by Xilinx need to be
+\subsection{\label{results:netpfga:usability}Usability}
+The handling and usability of the NetFPGA card is rather difficult. In
+this section we describe our findings and experiences with the card
+and its toolchain.
 
+To use the NetFGPA, the tools Vivado and SDNET provided by Xilinx need to be
 installed. However a bug in the installer triggers an infinite loop,
 if a certain shared library\footnote{The required shared library
 is libncurses5.} is missing on the target operating system. The
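
To put "near line speed" into perspective: with the 1500-byte MTU used for the benchmarks (the NetFPGA does not support jumbo frames), every frame carries a fixed amount of Ethernet overhead on the wire, which caps the TCP goodput that iperf can report below the nominal link rate. The sketch below computes that cap. It assumes 10 Gbit/s ports, no VLAN tag and the TCP timestamp option enabled (12 option bytes); none of these is stated in the text.

```python
LINE_RATE_BPS = 10e9  # assumed 10 Gbit/s Ethernet ports
MTU = 1500            # benchmark MTU, no jumbo frames

# Per-frame bytes on the wire outside the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

TCP_HEADER = 20 + 12  # TCP header plus timestamp option (assumption)
IP_HEADER = {"IPv4": 20, "IPv6": 40}


def max_tcp_goodput(ip_version: str) -> float:
    """Upper bound for TCP payload throughput in bit/s with full-size frames."""
    payload = MTU - IP_HEADER[ip_version] - TCP_HEADER
    wire_bytes = MTU + ETHERNET_OVERHEAD
    return LINE_RATE_BPS * payload / wire_bytes


for version in ("IPv4", "IPv6"):
    print(f"{version}: {max_tcp_goodput(version) / 1e9:.2f} Gbit/s")
# IPv4: 9.41 Gbit/s
# IPv6: 9.28 Gbit/s
```

Both bounds lie within about 1.5 % of the measured 9.28 Gbit/s, which is consistent with the card forwarding close to line rate; without the timestamp option each bound rises by roughly 0.08 Gbit/s.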
@@ -388,36 +388,68 @@ techniques are missing or not supported.
 Renaming variables in the declaration of the parser or deparser lead
 to compilation errors. Function syntax is not supported. For this
 reason our implementation uses \texttt{\#define} statements instead of functions.
-FIXME:
-
-General result: limited NAT64 is working, however
-No Payload ; checksumming - requires controller
-Hash function in progress ; No NDP, no ARP - focused on key factors of NAT64 translation,
-other features can be supported by controller
-Needed to debug internal parsing errors
-debugging generated tcl code to debug impl1 error
-
+%ok
 % ----------------------------------------------------------------------
 \section{\label{results:softwarenat64}Software based NAT64}
-with Tayga and
-Jool
-Both cpu bound.
-During the benchmark cpu bound, single thread
-tayga: Single threaded
-easy to use
-
-Jool kernel module
-100\% cpu usage on 1 core for udp
-0\% visible cpu usage for tcp, might be tcp offloading
-Integration with iptables
-Requires routing
-
-
+Both solutions Tayga and Jool worked flawlessly. However as expected,
+both solutions have a bottleneck that is CPU bound. Under high load
+scenarios both solutions utilise one core fully. Neither Tayga as a
+user space program nor Jool as a kernel module implement multi
+threading.
+%ok
 % ----------------------------------------------------------------------
-\section{\label{results:benchmark}NAT64 Benchmarks - FIXME: explain
-numbers}
+\section{\label{results:benchmark}NAT64 Benchmarks}
+In this section we summarise the benchmarking results, in the
+sub sections we discuss the benchmark design and the individual results.
+
+FIXME: summary here
+
+MTU setting to 1500, as netpfga doesn't support jumbo frames
+
+
+iperf3, iperf 3.0.11
+
+50 parallel = 2x 100% cpu usage
+40 parallel = 100%, 70% cpu usage
+30 parallel = 70%-100, 70% cpu usage
+
+Turning back on checksum offloading (see below)
+
+30 parallel = 70%, 30% cpu usage
+
+
+\subsection{\label{benchmark:tayga:tcp}Tayga/TCP}
+
+Tayga running at 100% cpu load,
+
+v4->v6 tcp
+delivering
+3.36 gbit/s at P1
+3.30 Gbit/s at P20
+3.11 gbit/s at P50
+
+v6->v4 tcp
+P1: 3.02 Gbit/s
+P20: 3.28 gbit/s
+P50: 2.85 gbit/s
+
+Commands:
+
+
+UDP load generator hitting 100\% cpu at P20.
+TCP confirmed.
+Over bandwidth results
+
+Feature comparison
+speed - sessions - eamt
+can act as host
+lpm tables
+ping
+ping6 support
+ndp
+controller support
+
+netpfga consistent
 % ----------------------------------------------------------------------
 \subsection{\label{results:benchmark:design}Benchmark Design}
 \begin{figure}[h]
@@ -449,20 +481,10 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
 \end{figure}
 % ok
 % ----------------------------------------------------------------------
-We successfully implemented P4 code to realise
-NAT64~\cite{schottelius:thesisrepo}. It contains parsers
-for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
-arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
-feature equivalent to the two compared software solutions
-tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
-jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
-Due to limitations in the P4 environment of the
-NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
-is more feature rich. Table \ref{tab:benchmark} summarises the
-achieved bandwidths of the NAT64 solutions.
-
+\newpage
+\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
+Benchmark Results}
+some text
 
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
@@ -487,8 +509,8 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \label{tab:benchmarkv6}
 \end{center}
 \end{table}
+% ---------------------------------------------------------------------
+\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
 During the benchmarks the client -- CPU usage
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
@@ -514,7 +536,11 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{center}
 \end{table}
 
+% ---------------------------------------------------------------------
+\newpage
+\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
+Benchmark Results}
+other text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -540,7 +566,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{center}
 \end{table}
 
+% ---------------------------------------------------------------------
+\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
+last text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -565,18 +593,3 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \label{tab:benchmarkv4}
 \end{center}
 \end{table}
-
-UDP load generator hitting 100\% cpu at P20.
-TCP confirmed.
-Over bandwidth results
-
-Feature comparison
-speed - sessions - eamt
-can act as host
-lpm tables
-ping
-ping6 support
-ndp
-controller support
-
-netpfga consistent
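
The feature notes in this diff list "eamt" and "lpm tables", and the removed sentence in the second Results.tex hunk states that only the /96 embedding is supported. The following Python sketch combines both lookups in the precedence RFC 7757 describes: an EAMT entry found by longest prefix match wins, otherwise the IPv4 address is embedded into the RFC 6052 well-known prefix 64:ff9b::/96. The class and method names are hypothetical and this is not the P4 table layout of the thesis. As a simplification, only EAMT entries whose IPv4 and IPv6 prefixes leave the same number of host bits are accepted; RFC 7757 also covers other combinations, which this sketch rejects.

```python
import ipaddress
from typing import List, Tuple

IPv4Net = ipaddress.IPv4Network
IPv6Net = ipaddress.IPv6Network

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")  # RFC 6052 well-known prefix


class Eamt:
    """Hypothetical Explicit Address Mapping Table with longest prefix match."""

    def __init__(self) -> None:
        self.entries: List[Tuple[IPv4Net, IPv6Net]] = []

    def add(self, v4: str, v6: str) -> None:
        n4, n6 = ipaddress.IPv4Network(v4), ipaddress.IPv6Network(v6)
        # Simplification: both sides must leave the same number of host bits.
        if n4.max_prefixlen - n4.prefixlen != n6.max_prefixlen - n6.prefixlen:
            raise ValueError("suffix lengths differ")
        self.entries.append((n4, n6))

    def v4_to_v6(self, addr: str) -> ipaddress.IPv6Address:
        a = ipaddress.IPv4Address(addr)
        matches = [(n4, n6) for n4, n6 in self.entries if a in n4]
        if matches:
            n4, n6 = max(matches, key=lambda e: e[0].prefixlen)  # longest prefix wins
            return n6.network_address + (int(a) - int(n4.network_address))
        # No EAMT entry: embed the IPv4 address in the last 32 bits of the /96.
        return NAT64_PREFIX.network_address + int(a)

    def v6_to_v4(self, addr: str) -> ipaddress.IPv4Address:
        a = ipaddress.IPv6Address(addr)
        matches = [(n4, n6) for n4, n6 in self.entries if a in n6]
        if matches:
            n4, n6 = max(matches, key=lambda e: e[1].prefixlen)
            return n4.network_address + (int(a) - int(n6.network_address))
        if a in NAT64_PREFIX:
            return ipaddress.IPv4Address(int(a) & 0xFFFFFFFF)
        raise ValueError(f"{a} matches neither the EAMT nor the /96 prefix")
```

Supporting the other RFC 6052 prefix lengths (/32, /40, /48, /56, /64) would only change the fallback branch, since those embeddings place the IPv4 bits around the reserved octet 64 to 71 instead of at the end.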

doc/Thesis.pdf | BIN
Binary file not shown.

doc/appendix.tex | 1807
File diff suppressed because it is too large