objective of this thesis was to demonstrate the high speed
capabilities of NAT64 in hardware, no benchmarks were performed on the
P4 software implementation.
% ----------------------------------------------------------------------
\section{\label{results:p4:implementation} P4 based implementations}
TODO: IPv6 UDP $\rightarrow$ IPv4. The 4- or 5-tuple ([protocol], source IP,
source port, destination IP, destination port) is obtained.
Only the /96 prefix is supported, not the other address embeddings
described in section~\ref{background:transition:prefixnat}.
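With a /96 prefix, the IPv4 address occupies the lowest 32 bits of the
IPv6 address. The following Python sketch illustrates this mapping with
the standard \texttt{ipaddress} module; it uses the well-known prefix
\texttt{64:ff9b::/96} from RFC~6052 as an example, and the helper names
\texttt{embed\_ipv4} and \texttt{extract\_ipv4} are hypothetical, not
taken from the P4 code of this thesis.

```python
import ipaddress


def embed_ipv4(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address into a /96 NAT64 prefix (RFC 6052, /96 case only)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    # The IPv4 address fills the lowest 32 bits of the IPv6 address.
    return ipaddress.IPv6Address(int(net.network_address) | int(ipaddress.IPv4Address(v4)))


def extract_ipv4(prefix: str, v6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from an IPv6 address inside a /96 prefix."""
    net = ipaddress.IPv6Network(prefix)
    addr = ipaddress.IPv6Address(v6)
    if net.prefixlen != 96 or addr not in net:
        raise ValueError("address is not inside a /96 NAT64 prefix")
    # Keep only the lowest 32 bits.
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(embed_ipv4("64:ff9b::/96", "192.0.2.33"))           # 64:ff9b::c000:221
print(extract_ipv4("64:ff9b::/96", "64:ff9b::c000:221"))  # 192.0.2.33
```

The other prefix lengths of RFC~6052 (/32, /40, /48, /56 and /64) place
the IPv4 address at different offsets and skip bits 64 to 71 of the IPv6
address, which must be zero; for /40, /48 and /56 the IPv4 address is
therefore split into two parts.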
% ----------------------------------------------------------------------
% ----------------------------------------------------------------------
\subsection{\label{Results:BMV2} BMV2}
The software implementation of P4 supports the most features, mostly
because it can compute checksums over the payload: acting as a
``proper'' participant in NDP requires the host to calculate
Jool and Tayga are supported by
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga} NetFPGA - FIXME: writing}
The reduced feature set of the NetFPGA implementation is due to two
factors: the long compile time of 2 to 6 hours per compile run, and the
lack of support for payload checksums.
Overall, it covers the general translation, but not the advanced
features.
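For TCP and UDP, translating between IPv6 and IPv4 only changes the
pseudo-header that enters the checksum, so the checksum does not
necessarily have to be recomputed over the payload: RFC~1624 describes an
incremental update $HC' = \lnot(\lnot HC + \lnot m + m')$, where $m$ and
$m'$ are the one's complement sums of the old and the new pseudo-header.
The following Python sketch demonstrates this for a UDP datagram, assuming
that the upper-layer length does not change. It only illustrates the
general technique and is not taken from the NetFPGA implementation; all
function names, ports and addresses are hypothetical examples.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_v6(src: str, dst: str, length: int, proto: int) -> bytes:
    # IPv6 pseudo-header: source, destination, 32-bit length, 3 zero bytes, next header
    return (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, proto))


def pseudo_header_v4(src: str, dst: str, length: int, proto: int) -> bytes:
    # IPv4 pseudo-header: source, destination, zero byte, protocol, 16-bit length
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!xBH", proto, length))


def full_checksum(pseudo: bytes, segment: bytes) -> int:
    """Checksum over pseudo-header and segment (checksum field must be zero)."""
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def incremental_update(old_cksum: int, old_pseudo: bytes, new_pseudo: bytes) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), m/m' = old/new pseudo-header sums."""
    m_old = ones_complement_sum(old_pseudo)
    m_new = ones_complement_sum(new_pseudo)
    total = (~old_cksum & 0xFFFF) + (~m_old & 0xFFFF) + m_new
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example UDP datagram (ports, addresses and payload are arbitrary examples).
payload = b"hello"
udp_len = 8 + len(payload)
segment = struct.pack("!HHHH", 12345, 53, udp_len, 0) + payload
v6 = pseudo_header_v6("2001:db8::1", "64:ff9b::c000:221", udp_len, 17)
v4 = pseudo_header_v4("198.51.100.1", "192.0.2.33", udp_len, 17)

old = full_checksum(v6, segment)        # checksum as seen on the IPv6 side
new = incremental_update(old, v6, v4)   # adjusted without summing the payload
print(new == full_checksum(v4, segment))  # True
```

For UDP over IPv4, a computed checksum of zero has to be transmitted as
\texttt{0xFFFF}, since zero means ``no checksum''; the sketch ignores this
corner case.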
% ----------------------------------------------------------------------
\subsubsection{\label{results:netpfga:features} Features}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{|c|c|c|}
unsupported\footnote{To support creating payload checksums, either an
\end{center}
\end{table}
% ----------------------------------------------------------------------
\subsubsection{\label{results:netpfga:stability} Stability}
Two different NetFPGA cards were used during the development of the
thesis. The first card consistently produced ioctl errors (compare
section~\ref{netpfgaioctlerror}) when writing table entries. The available
Sometimes it was also necessary to reboot the host containing the
NetFPGA card three times before flashing succeeded.\footnote{Typical
output of the flashing process was: ``fpga configuration failed. DONE PIN is not HIGH''}
% ----------------------------------------------------------------------
\subsubsection{\label{results:netpfga:performance} Performance}
As expected, the NetFPGA card performed at near line speed, offering
NAT64 translation at 9.28 Gbit/s. Single and multiple streams
performed almost identically, and the results were consistent across
multiple iterations of the benchmarks.
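The measured ceiling is consistent with a simple goodput estimate.
Assuming a 10~Gbit/s link, an MTU of 1500 bytes, TCP timestamps (12
option bytes) and 38 bytes of per-frame overhead on the wire (preamble
and start frame delimiter, Ethernet header, frame check sequence and
inter-frame gap), the maximum TCP goodput on the IPv6 side is
\[
  R_{\mathrm{IPv6}} = 10\,\mathrm{Gbit/s} \cdot \frac{1500 - 40 - 20 - 12}{1500 + 38}
  = 10\,\mathrm{Gbit/s} \cdot \frac{1428}{1538} \approx 9.28\,\mathrm{Gbit/s}.
\]
With the 20-byte IPv4 header instead of the 40-byte IPv6 header, the same
estimate gives $10\,\mathrm{Gbit/s} \cdot 1448/1538 \approx 9.41\,\mathrm{Gbit/s}$,
so under these assumptions the IPv6 side would be the limiting factor.
These figures are an estimate based on the stated assumptions, not a
measurement.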
% ----------------------------------------------------------------------
\subsubsection{\label{results:netpfga:usability} Usability}
To use the NetFPGA, Vivado and SDNET, both provided by Xilinx, need to be
installed. However, a bug in the installer triggers an infinite loop
if a certain shared library\footnote{The required shared library
Needed to debug internal parsing errors.
Debugging the generated Tcl code was required to track down the impl1 error.
% ----------------------------------------------------------------------
\section{\label{results:softwarenat64} Software based NAT64 with Tayga and Jool}
Both solutions are CPU bound: during the benchmark, they were limited by
the CPU on a single thread.
Integration with iptables
Requires routing
% ----------------------------------------------------------------------
\section{\label{results:p4} P4}
All planned features could be realised with P4 and a controller.
The language has some limitations on where if and switch statements can
be used.\footnote{In general, if and switch statements in actions lead to
errors, but not all combinations are forbidden.}
For this thesis, the parsing capabilities of P4 were adequate. However,
at the time of writing P4 cannot parse ICMPv6 options, because the
upper-layer protocol does not specify how many options follow and the
options have to be parsed in blocks of 64 bits.
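The difficulty can be illustrated with the layout of NDP options
(RFC~4861): each option consists of a type byte, a length byte counting
units of 8 octets, and the option data, and options follow each other
until the end of the packet. A software parser handles this with a simple
loop, as in the following Python sketch; the function
\texttt{parse\_ndp\_options} and the example option bytes are hypothetical
and not part of the P4 code.

```python
import struct


def parse_ndp_options(data: bytes) -> list:
    """Split NDP options into (type, value) pairs.

    Each option starts with a 1-byte type and a 1-byte length that counts
    units of 8 octets, including the type and length bytes themselves.
    """
    options = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated option header")
        opt_type, opt_len = struct.unpack_from("!BB", data, offset)
        if opt_len == 0:
            # RFC 4861: packets with a zero-length option must be discarded
            raise ValueError("option with length 0")
        end = offset + 8 * opt_len
        if end > len(data):
            raise ValueError("option exceeds the packet")
        options.append((opt_type, data[offset + 2:end]))
        offset = end  # the next option starts right after this one
    return options


# Source link-layer address option (type 1) followed by an MTU option (type 5).
opts = bytes([1, 1, 0x02, 0x00, 0x00, 0x00, 0x00, 0x01]) + bytes([5, 1, 0, 0]) + struct.pack("!I", 1500)
for opt_type, value in parse_ndp_options(opts):
    print(opt_type, value.hex())
# 1 020000000001
# 5 0000000005dc
```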
P4/BMV2 does not support multiple LPM keys in one table; however, it
supports multiple keys with ternary matching.
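A longest-prefix match on one key can be emulated with ternary matching:
the mask covers the first prefix-length bits, and among several matching
entries the one with the longest prefix has to win. The following Python
sketch models this with the standard \texttt{ipaddress} module. The
helpers \texttt{lpm\_to\_ternary} and \texttt{ternary\_lookup} are
hypothetical, using the prefix length as priority is an assumption for
illustration, and the sketch does not model the priority semantics of
BMV2 itself.

```python
import ipaddress


def lpm_to_ternary(prefix: str) -> tuple:
    """Convert an IPv6 prefix into (value, mask, priority) for a ternary table.

    Assumption for this sketch: the priority is the prefix length, so that
    longer prefixes win when several entries match.
    """
    net = ipaddress.IPv6Network(prefix)
    return int(net.network_address), int(net.netmask), net.prefixlen


def ternary_lookup(entries: list, key: int):
    """Return the action of the highest-priority entry whose masked key matches."""
    best = None
    for value, mask, priority, action in entries:
        if (key & mask) == value and (best is None or priority > best[0]):
            best = (priority, action)
    return None if best is None else best[1]


table = [
    (*lpm_to_ternary("2001:db8::/32"), "wide"),
    (*lpm_to_ternary("2001:db8:1::/48"), "narrow"),
]
for addr in ("2001:db8:1::5", "2001:db8:2::5", "2001:db9::1"):
    print(addr, ternary_lookup(table, int(ipaddress.IPv6Address(addr))))
# 2001:db8:1::5 narrow
# 2001:db8:2::5 wide
# 2001:db9::1 None
```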
When developing P4 programs, the cause of incorrect behaviour was
most often a checksum problem. If tcpdump displayed checksum errors,
usually the effective length of the packet was incorrect.
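The link between checksum errors and an incorrect length follows from the
IPv6 pseudo-header (RFC~8200, section~8.1): the upper-layer length is part
of the checksummed data. The following Python sketch computes a UDP
checksum over IPv6 once with the correct length and once with a length
that is 20 bytes too large, as could hypothetically happen if the header
size difference between IPv4 and IPv6 is not accounted for. The helper
\texttt{udp6\_checksum}, the ports and the addresses are illustrative
assumptions.

```python
import ipaddress
import struct


def udp6_checksum(src: str, dst: str, udp: bytes, length: int) -> int:
    """UDP checksum over the IPv6 pseudo-header (RFC 8200, section 8.1).

    `length` is the upper-layer length placed in the pseudo-header; the
    checksum field inside `udp` is expected to be zero.
    """
    pseudo = (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", length, 17))  # 17 = UDP
    data = pseudo + udp
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = b"ping"
udp = struct.pack("!HHHH", 40000, 7, 8 + len(payload), 0) + payload
good = udp6_checksum("2001:db8::1", "2001:db8::2", udp, len(udp))
bad = udp6_checksum("2001:db8::1", "2001:db8::2", udp, len(udp) + 20)
print(hex(good), hex(bad), good == bad)  # the two checksums differ
```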
% ----------------------------------------------------------------------
\section{\label{results:benchmark} NAT64 Benchmarks - FIXME: explain numbers}
We successfully implemented P4 code to realise
NAT64~\cite{schottelius:thesisrepo}. It contains parsers
for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMPv6, NDP,
ARP), supports EAMT as defined by RFC~7757~\cite{rfc7757}, and is
feature equivalent to the two compared software solutions
Tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
Jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
Due to limitations of the P4 environment on the
NetFPGA~\cite{conclusion:netfpga}, the BMV2 implementation
is more feature rich. Tables~\ref{tab:benchmarkv6}
and~\ref{tab:benchmarkv4} summarise the achieved TCP bandwidths of the
NAT64 solutions; the UDP results are given in two further tables.
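The Explicit Address Mapping Table (EAMT) maps IPv4 prefixes to IPv6
prefixes. The following Python sketch shows the IPv4 to IPv6 direction for
a hypothetical table, in which the longest matching IPv4 prefix decides
which entry is used. It assumes that both prefixes of an entry have the
same number of suffix bits, and it is not the table layout used in the P4
code; \texttt{EAMT} and \texttt{eamt\_v4\_to\_v6} are illustrative names.

```python
import ipaddress

# Hypothetical EAMT entries: (IPv4 prefix, IPv6 prefix) with equal suffix lengths.
EAMT = [
    (ipaddress.IPv4Network("198.51.100.0/24"), ipaddress.IPv6Network("2001:db8:100::/120")),
    (ipaddress.IPv4Network("198.51.100.8/29"), ipaddress.IPv6Network("2001:db8:200::/125")),
]


def eamt_v4_to_v6(addr: str):
    """Translate an IPv4 address via the longest matching EAMT entry, else None."""
    v4 = ipaddress.IPv4Address(addr)
    matches = [(n4, n6) for n4, n6 in EAMT if v4 in n4]
    if not matches:
        return None  # would fall back to the /96 prefix translation (not shown)
    n4, n6 = max(matches, key=lambda entry: entry[0].prefixlen)
    # Copy the suffix (host) bits of the IPv4 address into the IPv6 prefix.
    suffix = int(v4) - int(n4.network_address)
    return n6.network_address + suffix


print(eamt_v4_to_v6("198.51.100.5"))   # 2001:db8:100::5
print(eamt_v4_to_v6("198.51.100.10"))  # 2001:db8:200::2
print(eamt_v4_to_v6("192.0.2.1"))      # None
```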
FIXME: IPv6 NDP is not easy to parse, as the number of following fields
is unknown.

The tooling around P4 is still fragile: during development we encountered
many bugs~\cite{schottelius:github1675} as well as missing
features~\cite{schottelius:github745,theojepsen:_get}.
We also hit an expression bug (FIXME: source).
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{|c|c|c|c|c|}
\hline
Implementation & \multicolumn{4}{c|}{min / avg / max in Gbit/s} \\
\cline{2-5}
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
Tayga & 2.79 / 3.20 / 3.43 & 3.34 / 3.36 / 3.38 & 2.57 / 3.02 / 3.27 &
2.35 / 2.91 / 3.20 \\
\hline
Jool & 8.22 / 8.22 / 8.22 & 8.21 / 8.21 / 8.22 & 8.21 / 8.23 / 8.25 &
8.21 / 8.23 / 8.25 \\
\hline
P4 / NetFPGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 &
9.28 / 9.28 / 9.29 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 TCP NAT64 Benchmark}
\label{tab:benchmarkv6}
\end{center}
\end{table}
During the benchmarks the client -- CPU usage
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{|c|c|c|c|c|}
\hline
Implementation & \multicolumn{4}{c|}{min / avg / max in Gbit/s} \\
\cline{2-5}
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
Tayga & 2.90 / 3.15 / 3.34 & 2.87 / 3.01 / 3.22 &
2.68 / 2.85 / 3.09 & 2.60 / 2.78 / 2.88 \\
\hline
Jool & 7.18 / 7.56 / 8.24 & 7.97 / 8.05 / 8.09 &
8.05 / 8.08 / 8.10 & 8.10 / 8.12 / 8.13 \\
\hline
P4 / NetFPGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 &
9.29 / 9.29 / 9.29 & 9.28 / 9.28 / 9.29 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 TCP NAT64 Benchmark}
\label{tab:benchmarkv4}
\end{center}
\end{table}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{|c|c|c|c|c|}
\hline
Implementation & \multicolumn{4}{c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth in Gbit/s} \\
\cline{2-5}
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
Tayga & 8.02 / 70\% / 2.43 & 9.39 / 79\% / 1.97 &
15.43 / 86\% / 2.11 & 19.27 / 91\% / 1.73 \\
\hline
Jool & 6.44 / 0\% / 6.41 & 6.37 / 2\% / 6.25 &
16.13 / 64\% / 5.75 & 20.83 / 71\% / 6.04 \\
\hline
P4 / NetFPGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
16.15 / 0\% / 16.15 & 15.8 / 0\% / 15.8 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
\label{tab:benchmarkv6udp}
\end{center}
\end{table}
Further observations on P4:
\begin{enumerate}
\item It is impossible to retrieve the key from a table. For LPM tables
  the key consists of an address and a mask, which consequently might
  have to be handled in the controller.
\item When retrieving information from tables, no meta information is
  available; it is not known which table matched.
\item Type definitions are separate, which prevents code sharing between
  the controller and the switch.
\item There are no switch statements and no conditional execution in
  actions.
\end{enumerate}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{|c|c|c|c|c|}
\hline
Implementation & \multicolumn{4}{c|}{avg bandwidth in Gbit/s / avg loss /
adjusted bandwidth in Gbit/s} \\
\cline{2-5}
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
Tayga & 6.78 / 84\% / 1.06 & 9.58 / 90\% / 0.96 &
15.67 / 91\% / 1.41 & 20.77 / 95\% / 1.04 \\
\hline
Jool & 4.53 / 0\% / 4.53 & 4.49 / 0\% / 4.49 &
13.26 / 0\% / 13.26 & 22.57 / 0\% / 22.57 \\
\hline
P4 / NetFPGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
9.78 / 0\% / 9.78 & 14.37 / 0\% / 14.37 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
\label{tab:benchmarkv4udp}
\end{center}
\end{table}
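The adjusted bandwidth in the UDP tables appears to be the average
bandwidth reduced by the average loss $L$; up to the rounding of the loss
percentages, the tabulated values agree with
\[
  B_{\mathrm{adj}} = B_{\mathrm{avg}} \cdot (1 - L),
\]
for example $9.39\,\mathrm{Gbit/s} \cdot (1 - 0.79) \approx 1.97\,\mathrm{Gbit/s}$
for Tayga with 10 parallel connections in the IPv6 to IPv4 UDP benchmark.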
Not directly related to P4, but relevant in practice: the supporting
scripts are usually written in Python~2, which handles unicode strings
differently, so that effects like an IPv6 address ``changing''
occur~\cite{appendix:p4:python2unicode}.
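The ``changing'' address can be reproduced in Python~3 by passing bytes
instead of text to the standard \texttt{ipaddress} module, where a 16-byte
value is interpreted as a packed binary IPv6 address. We assume that the
Python~2 effect has the same cause, since a plain Python~2 string is a
byte string; the example address is arbitrary and merely chosen to be
exactly 16 characters long.

```python
import ipaddress

text = "2001:db8:1::abcd"  # arbitrary example, exactly 16 characters long

correct = ipaddress.ip_address(text)          # parsed as textual notation
wrong = ipaddress.ip_address(text.encode())   # 16 bytes: parsed as a packed address

print(correct)  # 2001:db8:1::abcd
print(wrong)    # 3230:3031:3a64:6238:3a31:3a3a:6162:6364
```

Decoding the input to text before parsing avoids the problem.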
The UDP load generator reached 100\% CPU usage with 20 parallel
connections (P20). TCP confirmed.
Some results exceed the available bandwidth.

P4os: reusable code.

Feature comparison:
\begin{itemize}
\item speed, sessions, EAMT
\item can act as host
\item LPM tables
\item ping
\item ping6 support
\item NDP
\item controller support
\end{itemize}

Idiomatic problem (security issue): not checking checksums before

NetFPGA results are consistent.