\chapter{\label{results}Results}
%** Results.tex: What were the results achieved including an evaluation
%
This section describes the achieved results and compares the P4-based implementation with real-world software solutions. We distinguish between the software implementation of P4 (BMV2) and the hardware implementation (NetFPGA) due to significant differences in deployment and development. We present benchmarks for the existing software solutions as well as for our hardware implementation. As the objective of this thesis was to demonstrate the high-speed capabilities of NAT64 in hardware, no benchmarks were performed on the P4 software implementation.

% ----------------------------------------------------------------------
\section{\label{results:p4}NAT64 Overview} % FIXME: verify numbers
We successfully implemented P4 code to realise NAT64\cite{schottelius:thesisrepo}. It contains parsers for all related protocols (IPv6, IPv4, UDP, TCP, ICMP, ICMP6, NDP, ARP), supports EAMT as defined by RFC 7757\cite{rfc7757} and is feature equivalent to the two compared software solutions Tayga\cite{lutchansky:_tayga_simpl_nat64_linux} and Jool\cite{mexico:_jool_open_sourc_siit_nat64_linux}. Due to limitations of the P4 environment of the NetFPGA\cite{conclusion:netfpga}, the BMV2 implementation is more feature rich. Tables \ref{tab:benchmarkv6} and \ref{tab:benchmarkv4} summarise the achieved bandwidths of the NAT64 solutions.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c |}
\hline
Solution & \multicolumn{3}{|c|}{Parallel connections} \\
& 1 & 20 & 3 \\
\hline
Tayga & 3.02 & 3.28 & 2.85\\
\hline
Jool & 6.67 & 16.8 % FIXME: value unverified
& 20.5 \\ % FIXME: value unverified, possibly measured with UDP
\hline
P4 / NetFPGA & 9.28 & 9.29 & 9.29\\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 benchmark (client: IPv6, server: IPv4), all results in Gbit/s}
\label{tab:benchmarkv6}
\end{center}
\end{table}

% TODO: describe the CPU usage of the client during the benchmarks

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c | c |}
\hline
Solution & \multicolumn{3}{|c|}{Parallel connections} \\
& 1 & 20 & 3 \\
\hline
Tayga & 3.36 & 3.29 & 3.11 \\
\hline
Jool & 8.24 & 8.26 & 8.29\\
\hline
P4 / NetFPGA & 8.43 & 9.29 & 9.29\\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 benchmark (client: IPv4, server: IPv6), all results in Gbit/s}
\label{tab:benchmarkv4}
\end{center}
\end{table}

% TODO: feature comparison covering
% - speed
% - sessions
% - EAMT
% - can act as host
% - LPM tables
% - ping/ping6 support
% - NDP
% - controller support

% ----------------------------------------------------------------------
\section{\label{Results:BMV2}BMV2}
The software implementation of P4 (BMV2) is the most feature-rich of our implementations, which is mainly due to its capability of computing checksums over the payload: acting as a ``proper'' participant in NDP requires the host to calculate checksums over the payload.
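
To illustrate why NDP requires payload checksumming, the following Python sketch computes the ICMPv6 checksum as specified in RFC 4443 (section 2.3) with the pseudo-header of RFC 8200 (section 8.1): the 16-bit ones' complement sum of RFC 1071 is taken over the pseudo-header \emph{and the complete ICMPv6 message}. Producing a valid neighbor advertisement therefore requires summing over the message body, which contains the target address and the link-layer address option. The sketch is not part of the P4 code base; the helper \texttt{build\_neighbor\_advertisement}, the MAC address and the addresses from the documentation prefix \texttt{2001:db8::/32} are hypothetical.

\begin{verbatim}
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum as in RFC 1071; odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def icmp6_checksum(src: str, dst: str, message: bytes) -> int:
    """ICMPv6 checksum over the IPv6 pseudo-header plus the whole ICMPv6 message."""
    pseudo_header = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", len(message), 58)  # length, 3 zero bytes, next header 58
    )
    return ~ones_complement_sum(pseudo_header + message) & 0xFFFF


def build_neighbor_advertisement(src: str, dst: str, target: str, mac: bytes) -> bytes:
    """Hypothetical helper: neighbor advertisement (type 136), S and O flags set."""
    flags = 0x60000000                          # Solicited + Override, Router cleared
    option = struct.pack("!BB", 2, 1) + mac     # target link-layer address, 8 bytes
    message = (
        struct.pack("!BBHI", 136, 0, 0, flags)  # checksum field is zero while summing
        + ipaddress.IPv6Address(target).packed
        + option
    )
    checksum = icmp6_checksum(src, dst, message)
    return message[:2] + struct.pack("!H", checksum) + message[4:]


if __name__ == "__main__":
    na = build_neighbor_advertisement(
        "2001:db8::1", "2001:db8::2", "2001:db8::1", bytes.fromhex("020000000001")
    )
    print(na.hex())
    # A receiver recomputes the sum over the received message; 0 means intact.
    print(icmp6_checksum("2001:db8::1", "2001:db8::2", na))
\end{verbatim}

Every byte of the reply enters the sum, which is why a target without payload checksumming, such as the NetFPGA, has to fall back to deriving the checksum incrementally.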
Table \ref{tab:p4bmv2features} lists the features of the BMV2 implementation.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to controller & fully implemented\footnote{Source code: \texttt{actions\_egress.p4}}\\
\hline
Controller to Switch & Controller can set up table entries & fully implemented\footnote{Source code: \texttt{controller.py}}\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) & fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ARP & Switch can answer ARP requests (without controller) & fully implemented\footnote{Source code: \texttt{actions\_arp.p4}}\\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) & fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
ICMP & Switch responds to ICMP echo request (without controller) & fully implemented\footnote{Source code: \texttt{actions\_icmp6\_ndp\_icmp.p4}} \\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 & fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 & fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming & fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings & fully implemented\footnote{Source code: \texttt{actions\_nat64\_session.p4}, \texttt{controller.py}} \\
\hline
Delta Checksum & Switch can calculate checksum without payload inspection & fully implemented\footnote{Source code:
\texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection & fully implemented\footnote{Source code: \texttt{checksum\_bmv2.p4}}\\
\hline
\end{tabular}
\end{minipage}
\caption{P4 / BMV2 feature list}
\label{tab:p4bmv2features}
\end{center}
\end{table}

The BMV2 implementation responds to ICMP and ICMP6 echo requests, NDP\cite{rfc4861} neighbor solicitations and ARP requests. As it can compute checksums on its own, it acts as a fully functional host. Focusing on the typical use cases of ICMP and ICMP6, the software implementation supports translating echo request and echo reply messages, but does not support all ICMP/ICMP6 translations that are defined in RFC 6145\cite{rfc6145}.

The stateful NAT64 implementation does not remove sessions automatically. Session management was not benchmarked, as it only consists of creating table entries.
% TODO: Jool and Tayga are supported by

% ----------------------------------------------------------------------
\section{\label{Results:NetPFGA}NetFPGA}
The reduced feature set of the NetFPGA implementation is due to two factors: the long compile time of 2 to 6 hours per compile run, and the missing support for computing checksums over the payload.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to controller & portable\footnote{While the NetFPGA P4 implementation does not have the \texttt{clone3()} extern that the BMV2 implementation offers, communication to the controller can easily be realised by using one of the additional ports of the NetFPGA and connecting a physical network card to it.}\\
\hline
Controller to Switch & Controller can set up table entries & portable\footnote{The p4utils suite offers easy access to the switch tables.
While the P4-NetFPGA support repository also offers Python scripts to modify the switch tables, their code is less sophisticated and more fragile.}\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) & portable\footnote{The NetFPGA P4 environment does not support calculating checksums over the payload. However, delta checksumming can be used to create the checksum required for the reply.} \\
\hline
ARP & Switch can answer ARP requests (without controller) & portable\footnote{As ARP does not use checksums, integrating the source code \texttt{actions\_arp.p4} into the NetFPGA code base is enough to enable ARP support on the NetFPGA.} \\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) & portable\footnote{Same reasoning as NDP.} \\
\hline
ICMP & Switch responds to ICMP echo request (without controller) & portable\footnote{Same reasoning as NDP.} \\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 & fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: UDP & Switch translates UDP with checksumming & \\
& from/to IPv6 to/from IPv4 & fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4}} \\
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming & portable\footnote{ICMP/ICMP6 translations only require enabling the ICMP/ICMP6 code in the NetFPGA code base.} \\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings & portable\footnote{Same reasoning as ``Controller to switch''.} \\
\hline
Delta Checksum & Switch can calculate checksum without payload inspection & fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection & unsupported\footnote{To support creating payload checksums, either an HDL module needs to be created or the generated PX program needs to be modified\cite{schottelius:_exter_p4_netpf}.} \\
\hline
\end{tabular}
\end{minipage}
\caption{P4 / NetFPGA feature list}
\label{tab:p4netpfgafeatures}
\end{center}
\end{table}

% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:stability}Stability}
Two different NetFPGA cards were used during the development of the thesis. The first card had consistent ioctl errors (compare section \ref{netpfgaioctlerror}) when writing table entries. The available hardware tests (compare figures \ref{fig:hwtestnico} and \ref{fig:hwtesthendrik}) showed failures on both cards; however, the first card reported an additional ``10G\_Loopback'' failure. Due to the inability to set table entries, no benchmarks were performed on the first NetFPGA card.
\begin{figure}[h]
\centering
\includegraphics[scale=1.4]{hwtestnico}
\caption{Hardware test of NetFPGA card 1}
\label{fig:hwtestnico}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[scale=0.2]{hwtesthendrik}
\caption{Hardware test of NetFPGA card 2, \cite{hendrik:_p4_progr_fpga_semes_thesis_sa}}
\label{fig:hwtesthendrik}
\end{figure}
During the development and the benchmarking, the second NetFPGA card stopped functioning properly multiple times. In each case, the card stopped forwarding packets. Multiple reboots (usually three) and reflashing the bitstream to the NetFPGA several times restored the intended behaviour.

% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:performance}Performance}
As expected, the NetFPGA card performed at near line speed and offered NAT64 translations at 9.28 Gbit/s.
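
The delta checksum used on the NetFPGA avoids inspecting the payload: when NAT64 replaces the IPv6 pseudo-header of a TCP segment by an IPv4 one, only the contribution of the changed fields has to be exchanged. The following Python sketch illustrates this principle with equation 3 of RFC 1624, $HC' = \lnot(\lnot HC + \lnot m + m')$, where $\lnot$ denotes the ones' complement and $+$ ones' complement addition, and checks the result against a full recomputation. It is not the P4 code of \texttt{actions\_delta\_checksum.p4}, which may compute the difference differently; all function names, the TCP header and the addresses (documentation ranges and the well-known prefix \texttt{64:ff9b::/96}) are example values.

\begin{verbatim}
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum as in RFC 1071; odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def add16(a: int, b: int) -> int:
    """Ones' complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def pseudo_header_v6(src: str, dst: str, length: int, proto: int) -> bytes:
    return (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, proto))


def pseudo_header_v4(src: str, dst: str, length: int, proto: int) -> bytes:
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!xBH", proto, length))


def full_checksum(pseudo_header: bytes, segment: bytes) -> int:
    """Reference: sum over pseudo-header and complete segment (payload inspection)."""
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


def delta_checksum(old_checksum: int, old_fields: bytes, new_fields: bytes) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'); the payload is never read."""
    s = add16(~old_checksum & 0xFFFF, ~ones_complement_sum(old_fields) & 0xFFFF)
    s = add16(s, ones_complement_sum(new_fields))
    return ~s & 0xFFFF


if __name__ == "__main__":
    # Example TCP header (checksum field zero) followed by a 5-byte payload.
    segment = struct.pack("!HHIIBBHHH", 40000, 80, 1, 1, 5 << 4, 0x18,
                          65535, 0, 0) + b"hello"
    v6 = pseudo_header_v6("2001:db8::1", "64:ff9b::c000:201", len(segment), 6)
    v4 = pseudo_header_v4("198.51.100.1", "192.0.2.1", len(segment), 6)
    old = full_checksum(v6, segment)
    print(delta_checksum(old, v6, v4) == full_checksum(v4, segment))  # prints True
\end{verbatim}

Since the length and protocol contribute the same value to both pseudo-headers, in practice only the address words change, which keeps the update cheap enough for a hardware pipeline.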
The delta-based checksum computation and the NAT64 translations on the NetFPGA were verified with the following test logs and trace files:
\begin{verbatim}
create mode 100644 pcap/tcp-udp-delta-2019-07-17-1555-h1.pcap
create mode 100644 pcap/tcp-udp-delta-2019-07-17-1555-h3.pcap
create mode 100644 pcap/tcp-udp-delta-2019-07-17-1557-h1.pcap
create mode 100644 pcap/tcp-udp-delta-2019-07-17-1558-h3.pcap
\end{verbatim}
\begin{verbatim}
*** DONE 2019-07-21: Proof of v6->v4 working delta based
CLOSED: [2019-07-21 Sun 12:30]
#+BEGIN_CENTER
pcap/tcp-udp-delta-from-v6-2019-07-21-0853-h1.pcap | Bin 0 -> 4252 bytes
pcap/tcp-udp-delta-from-v6-2019-07-21-0853-h3.pcap | Bin 0 -> 2544 bytes
#+END_CENTER
\end{verbatim}
\begin{verbatim}
**** DONE Testing v4->v6 tcp: ok (version 10.0)
CLOSED: [2019-08-04 Sun 09:15]
#+BEGIN_CENTER
nico@ESPRIMO-P956:~/master-thesis/bin$ ./socat-connect-tcp-v4
+ echo from-v4-ok
+ socat - TCP:10.0.0.66:2345
TCPv6-ok
nico@ESPRIMO-P956:~/master-thesis/bin$ ./socat-listen-tcp-v6
from-v4-ok
#+END_CENTER
trace:
netfpga-nat64-2019-08-04-0907-enp2s0f0.pcap
netfpga-nat64-2019-08-04-0907-enp2s0f1.pcap
**** DONE Testing v4->v6 udp: ok (version 10.1)
trace:
create mode 100644 pcap/netfpga-nat64-udp-2019-08-04-0913-enp2s0f0.pcap
create mode 100644 pcap/netfpga-nat64-udp-2019-08-04-0913-enp2s0f1.pcap
\end{verbatim}
\begin{verbatim}
*** DONE 2019-08-04: version 10.1/10.2: new maxpacketregion: v4->v6 works
CLOSED: [2019-08-04 Sun 19:42]
#+BEGIN_CENTER
nico@ESPRIMO-P956:~/master-thesis/bin$ ./init_ipv4_esprimo.sh
nico@ESPRIMO-P956:~/master-thesis/bin$ ./set_ipv4_neighbor.sh
#+END_CENTER
Test 20 first:
- Doesn't work -> missed to add table entries
- Does work after setting table entries
- 300 works
- 1450 works
- 1500 does not work
Proof:
create mode 100644 pcap/netfpga-10.2-maxpacket-2019-08-04-1931-enp2s0f0.pcap
create mode 100644 pcap/netfpga-10.2-maxpacket-2019-08-04-1931-enp2s0f1.pcap
\end{verbatim}
\begin{verbatim}
*** DONE 2019-08-04: test v6 -> v4: works for 1420
CLOSED: [2019-08-04 Sun 20:30]
Proof:
#+BEGIN_CENTER
create mode 100644 pcap/netfpga-10.2-fromv6tov4-2019-08-04-1943-enp2s0f0.pcap
create mode 100644 pcap/netfpga-10.2-fromv6tov4-2019-08-04-1943-enp2s0f1.pcap
\end{verbatim}
In summary, a limited NAT64 translation works on the NetFPGA. However, the NetFPGA implementation does not compute payload checksums; this requires the controller, and a hash function is work in progress. NDP and ARP are not supported, as the implementation focuses on the key factors of NAT64 translation; other features can be supported by the controller.

% ----------------------------------------------------------------------
\section{\label{results:tayga}Tayga}
Tayga is single threaded and was CPU bound during the benchmarks.

% ----------------------------------------------------------------------
\section{\label{results:jool}Jool}
Jool runs as a kernel module and integrates with iptables. During the benchmarks, it showed high CPU usage for UDP connections.