diff --git a/doc/Results.tex b/doc/Results.tex
index da61410..a32ca74 100644
--- a/doc/Results.tex
+++ b/doc/Results.tex
@@ -16,26 +16,22 @@ P4 software implementation.
 \section{\label{results:p4}P4 based implementations}
 We successfully implemented P4 code to realise
 NAT64~\cite{schottelius:thesisrepo}. It contains parsers
-for all related protocols (ipv6, ipv4, udp, tcp, icmp, icmp6, ndp,
-arp), supports EAMT as defined by RFC7757 ~\cite{rfc7757} and is
+for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP,
+ARP), supports EAMT as defined by RFC7757~\cite{rfc7757} and is
 feature equivalent to the two compared software solutions
 tayga~\cite{lutchansky:_tayga_simpl_nat64_linux} and
 jool~\cite{mexico:_jool_open_sourc_siit_nat64_linux}.
 Due to limitations in the P4 environment of the
-NetFPGA~\cite{conclusion:netfpga} environment, the BMV2 implementation
-is more feature rich. Table \ref{tab:benchmark} summarises the
-achieved bandwidths of the NAT64 solutions.
+NetFPGA environment, the BMV2 implementation
+is more feature-rich.
 
-BEFORE OR AFTER MARKER - FIXME
-
-All planned features could be realised with P4 and a controller.
 For this thesis the parsing capabilities of P4 were adequate.
 However P4, at the time of writing, cannot parse ICMP6 options in
 general, as the upper level protocol does not specify the number of
 options that follow and parsing of an undefined number of 64 bit
 blocks is required, which P4 does not support.
-The language has some limitations on where the placement of
+The language has some limitations on the placement of
 conditional statements (\texttt{if/switch}).\footnote{In general, if
 and switch statements in actions lead to errors, but not all
 constellations are forbidden.}
@@ -51,7 +47,7 @@ checksum errors, the effective length of the packet was incorrect.
 The tooling around P4 is somewhat fragile.
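Editor's aside on the EAMT support mentioned in the hunk above: RFC 7757 explicit address mappings carry the host bits of an IPv4 prefix over into an IPv6 prefix. The following is a minimal Python sketch of that mapping, illustrative only — it is not the thesis implementation, and the prefix values in the example are invented:

```python
import ipaddress

def eamt_translate_v4_to_v6(addr, v4_prefix, v6_prefix):
    """Map an IPv4 address to IPv6 via one EAMT entry (RFC 7757 style):
    the host bits relative to the IPv4 prefix are carried over into
    the IPv6 prefix."""
    addr = ipaddress.IPv4Address(addr)
    p4 = ipaddress.ip_network(v4_prefix)
    p6 = ipaddress.ip_network(v6_prefix)
    if addr not in p4:
        raise ValueError("address not covered by this EAMT entry")
    suffix = int(addr) - int(p4.network_address)
    return ipaddress.IPv6Address(int(p6.network_address) + suffix)

# hypothetical EAMT entry: 192.0.2.0/24 <-> 2001:db8:aaaa::/120
print(eamt_translate_v4_to_v6("192.0.2.7", "192.0.2.0/24", "2001:db8:aaaa::/120"))
# -> 2001:db8:aaaa::7
```

The reverse direction works symmetrically by swapping the roles of the two prefixes.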
 We encountered small language bugs during the
 development~\cite{schottelius:github1675},
-\ref{appendix:expressionbug}
+(compare section \ref{appendix:netfpgalogs:compilelogs})
 or found missing features~\cite{schottelius:github745},
 ~\cite{theojepsen:_get}: it is at the moment impossible to retrieve
 the matching key from table or the name of the action called. Thus
@@ -75,7 +71,7 @@ The supporting scripts in the P4 toolchain are usually written in
 python2. However python2 ``is legacy''~\cite{various:_shoul_i_python_python}.
 During development errors with unicode string handling in python2 caused
-changes to IPv6 addresses.\footnote{Compare section ~\ref{appendix:p4:python2unicode}.}
+changes to IPv6 addresses.
 % ok
 % ----------------------------------------------------------------------
 \section{\label{results:bmv2}P4/BMV2}
@@ -239,7 +235,7 @@ unsupported\footnote{To support creating payload checksums, either an
 \subsection{\label{results:netpfga:stability}Stability}
 Two different NetPFGA cards were used during the development of the
 thesis. The first card had consistent ioctl errors (compare section
-\ref{netpfgaioctlerror}) when writing table entries. The available
+\ref{appendix:netfpgalogs:compilelogs}) when writing table entries. The available
 hardware tests (compare figures \ref{fig:hwtestnico} and
 \ref{fig:hwtesthendrik}) showed failures in both cards, however the
 first card reported an additional ``10G\_Loopback'' failure. Due to
@@ -392,64 +388,15 @@ reason our implementation uses \texttt{\#define} statements instead of functions
 % ----------------------------------------------------------------------
 \section{\label{results:softwarenat64}Software based NAT64}
 Both solutions Tayga and Jool worked flawlessly. However as expected,
-both solutions have a bottleneck that is CPU bound. Under high load
+both solutions are CPU bound. Under high load
 scenarios both solutions utilise one core fully.
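Editor's aside on the python2 unicode issue noted above: handling addresses as parsed objects rather than raw strings sidesteps the implicit str/unicode coercion that can silently alter an IPv6 address. A hedged Python 3 sketch of that style of handling (not the toolchain's actual code):

```python
import ipaddress

# Parsing and emitting addresses via the ipaddress module keeps the
# address as an integer internally, so no text-level coercion can
# corrupt it between parsing and table-entry generation.
addr = ipaddress.IPv6Address("2001:db8::1")
packed = addr.packed                    # 16 raw bytes, byte-exact
restored = ipaddress.IPv6Address(packed)
assert restored == addr                 # round-trip is lossless
print(restored)                         # -> 2001:db8::1
```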
 
 Neither Tayga as a user space program nor Jool as a kernel module
 implement multi threading. %ok
 % ----------------------------------------------------------------------
 \section{\label{results:benchmark}NAT64 Benchmarks}
-In this section we summarise the benchmarking results, in the
-sub sections we discuss the benchmark design and the individual results.
-
-FIXME: summary here
-
-MTU setting to 1500, as netpfga doesn't support jumbo frames
-
-
-iperf3, iperf 3.0.11
-
-50 parallel = 2x 100% cpu usage
-40 parallel = 100%, 70% cpu usage
-30 parallel = 70%-100, 70% cpu usage
-
-Turning back on checksum offloading (see below)
-
-30 parallel = 70%, 30% cpu usage
-
-
-\subsection{\label{benchmark:tayga:tcp}Tayga/TCP}
-
-Tayga running at 100% cpu load,
-
-v4->v6 tcp
-delivering
-3.36 gbit/s at P1
-3.30 Gbit/s at P20
-3.11 gbit/s at P50
-
-v6->v4 tcp
-P1: 3.02 Gbit/s
-P20: 3.28 gbit/s
-P50: 2.85 gbit/s
-
-Commands:
-
-
-UDP load generator hitting 100\% cpu at P20.
-TCP confirmed.
-Over bandwidth results
-
-Feature comparison
-speed - sessions - eamt
-can act as host
-lpm tables
-ping
-ping6 support
-ndp
-controller support
-
-netpfga consistent
+In this section we give an overview of the benchmark design
+and summarise the benchmarking results.
 % ----------------------------------------------------------------------
 \subsection{\label{results:benchmark:design}Benchmark Design}
 \begin{figure}[h]
@@ -481,11 +428,61 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
 \end{figure}
 % ok
 % ----------------------------------------------------------------------
+\subsection{\label{results:benchmark:summary}Benchmark Summary}
+Overall \textbf{tayga} has proven to be the slowest translator with an achieved
+bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool}, which translates at
+about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest, with an almost line-rate
+translation speed of about \textbf{9 Gbit/s}.
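Editor's aside on the ``almost line rate'' claim in the added summary: a back-of-envelope goodput bound for a 10 Gbit/s link at the 1500-byte MTU used in these benchmarks. The framing overheads assumed here (standard Ethernet preamble, header, FCS and inter-frame gap; no TCP options) are our assumptions, not figures from the thesis:

```python
# Rough upper bound for TCP goodput on 10GbE at MTU 1500.
# Per-packet overhead on the wire: preamble 8 + Ethernet header 14
# + FCS 4 + inter-frame gap 12 = 38 bytes beyond the IP packet.
LINK_BITS_PER_S = 10e9
WIRE_BYTES = 1500 + 38

for name, l3_l4_overhead in [("IPv4+TCP", 20 + 20), ("IPv6+TCP", 40 + 20)]:
    payload = 1500 - l3_l4_overhead
    goodput = LINK_BITS_PER_S * payload / WIRE_BYTES / 1e9
    print(f"{name}: {goodput:.2f} Gbit/s")
# -> IPv4+TCP: 9.49 Gbit/s
# -> IPv6+TCP: 9.36 Gbit/s
```

Under these assumptions a measured ~9 Gbit/s is indeed close to the achievable maximum, and the extra 20 bytes of IPv6 header alone cost roughly 0.13 Gbit/s.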
+
+The TCP-based benchmarks show realistic numbers, while iperf reports
+speeds above line rate (up to 22 Gbit/s on a 10 Gbit/s link)
+for UDP-based benchmarks. For this reason we
+have summarised the UDP-based benchmarks with their average loss
+instead of listing the bandwidth details. The ``adjusted bandwidth''
+in the UDP benchmarks incorporates the packet loss (compare tables
+\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv4v6udp}).
+
+Both software solutions showed significant loss of packets in the UDP-based
+benchmarks (tayga: up to 91\%, jool: up to 71\%), while the
+P4/NetFPGA solution showed a maximum of 0.01\% packet loss. Packet loss is only
+recorded by iperf for UDP-based benchmarks, as TCP packets are acknowledged and
+resent if necessary.
+
+Tayga has the highest variation of results, which might be due to
+being fully CPU bound even in the simplest benchmark. Jool has less
+variation, and in general the P4/NetFPGA solution behaves almost
+identically in different benchmark runs.
+
+The CPU load for TCP-based benchmarks with Jool was almost negligible;
+for UDP-based benchmarks, however, one core was almost 100\%
+utilised. In all benchmarks with tayga, one CPU was fully
+utilised. As the translation for P4/NetFPGA happens within the
+NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
+
+We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
+We suspect that this might be due to the slightly increasing packet sizes
+that occur in this direction of translation. Not only does this affect
+the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
+that slows down the process.
+
+During the benchmark with 1 and 10 parallel connections, no
+significant CPU load was registered on the load generator.
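Editor's aside: the ``adjusted bandwidth'' above can be read as the sender-side rate reported by iperf, discounted by the measured datagram loss. Whether the thesis tables use exactly this simple discount is an assumption on our part; the numbers below are invented for illustration:

```python
def adjusted_bandwidth(reported_gbits, loss_fraction):
    """Discount an iperf UDP sender-side bandwidth report by the
    measured datagram loss to approximate received goodput."""
    return round(reported_gbits * (1.0 - loss_fraction), 2)

# hypothetical numbers, not taken from the benchmark tables:
# 22 Gbit/s reported by the sender, 91% datagram loss
print(adjusted_bandwidth(22.0, 0.91))  # -> 1.98
```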
+However, with 20 parallel connections, each of the two iperf
+processes\footnote{One for sending, one for receiving.} partially
+spiked to 100\% CPU usage, while with 50 parallel connections the CPU
+load of each process often hit 100\%.
+
+While tayga's performance seems to decrease with a growing number of
+parallel connections, both Jool and our P4/NetFPGA implementations
+vary only slightly.
+
+Overall the performance of tayga, a Linux user space program, is as
+expected. We were surprised by the good performance of Jool, which,
+while slower than the P4/NetFPGA solution, is almost on par with it.
+% ----------------------------------------------------------------------
 \newpage
 \subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP Benchmark Results}
-some text
-
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -509,9 +506,9 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \label{tab:benchmarkv6}
 \end{center}
 \end{table}
+%ok
 % ---------------------------------------------------------------------
 \subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
-During the benchmarks the client -- CPU usage
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -540,7 +537,6 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \newpage
 \subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP Benchmark Results}
-other text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -562,13 +558,12 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{tabular}
 \end{minipage}
 \caption{IPv6 to IPv4 UDP NAT64 Benchmark}
-\label{tab:benchmarkv4}
+\label{tab:benchmarkv6v4udp}
 \end{center}
 \end{table}
-
+%ok
 % ---------------------------------------------------------------------
 \subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
-last text
 \begin{table}[htbp]
 \begin{center}\begin{minipage}{\textwidth}
 \begin{tabular}{| c | c | c | c | c |}
@@ -590,6 +585,7 @@ Parallel connections & 1 & 10 & 20 & 50 \\
 \end{tabular}
 \end{minipage}
 \caption{IPv4 to IPv6 UDP NAT64 Benchmark}
-\label{tab:benchmarkv4}
+\label{tab:benchmarkv4v6udp}
 \end{center}
 \end{table}
+%ok
diff --git a/doc/Thesis.pdf b/doc/Thesis.pdf
index 63909d7..3c978d7 100644
Binary files a/doc/Thesis.pdf and b/doc/Thesis.pdf differ