diff --git a/doc/Design.tex b/doc/Design.tex index c6f3b87..03e4ff1 100644 --- a/doc/Design.tex +++ b/doc/Design.tex @@ -377,8 +377,8 @@ this section we describe the IPv6 and IPv4 configurations as a basis for the discussion. All IPv6 addresses are from the documentation block -\textit{2001:DB8::/32}~\cite{rfc3849}. In particular the following sub -networks and IPv6 addresses are used: +\textit{2001:DB8::/32}~\cite{rfc3849}. In particular, we use the sub +networks and IPv6 addresses shown in table \ref{tab:ipv6address}. \begin{table}[htbp] \begin{center}\begin{minipage}{\textwidth} \begin{tabular}{| c | c |} @@ -407,7 +407,7 @@ networks and IPv6 addresses are used: \end{table} We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918} -from the 10.0.0.0/8 range as follows: +from the 10.0.0.0/8 range as shown in table \ref{tab:ipv4address}. \begin{table}[htbp] \begin{center}\begin{minipage}{\textwidth} diff --git a/doc/Results.tex b/doc/Results.tex index 058022d..e9a59c4 100644 --- a/doc/Results.tex +++ b/doc/Results.tex @@ -161,8 +161,8 @@ using delta checksums, the compile time of 2 to 6 hours contributed to a significant slower development cycle compared to BMV2. Lastly, the focus of this thesis is to implement high speed NAT64 on P4, which only requires a subset of the features that we realised on -BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented -features and reasons about their implementation status. +BMV2. In table \ref{tab:p4netpfgafeatures} we summarise the implemented +features and afterwards reason about their portability. \begin{table}[htbp] \begin{center}\begin{minipage}{\textwidth} \begin{tabular}{| c | c | c |} @@ -170,34 +170,23 @@ features and reasons about their implementation status.
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\ \hline Switch to controller & Switch forwards unhandled packets to -controller & portable\footnote{While the NetFPGA P4 implementation - does not have the clone3() extern that the BMV2 implementation offers, -communication to the controller can easily be realised by using one of -the additional ports of the NetFPGA and connect a physical network -card to it.}\\ +controller & portable\\ \hline Controller to Switch & Controller can setup table entries & -portable\footnote{The p4utils suite offers an easy access to the - switch tables. While the P4-NetFPGA support repository also offers - python scripts to modify the switch tables, the code is less - sophisticated and more fragile.}\\ +portable\\ \hline NDP & Switch responds to ICMP6 neighbor & \\ & solicitation request (without controller) & -portable\footnote{NetFPGA/P4 does not offer calculating the checksum - over the payload. However delta checksumming can be used to create - the required checksum for replying.} \\ +portable\\ \hline ARP & Switch can answer ARP request (without controller) & -portable\footnote{As ARP does not use checksums, integrating the - source code \texttt{actions\_arp.p4} into the netpfga code base is - enough to enable ARP support in the NetPFGA.} \\ +portable \\ \hline ICMP6 & Switch responds to ICMP6 echo request (without controller) & -portable\footnote{Same reasoning as NDP.} \\ +portable\\ \hline ICMP & Switch responds to ICMP echo request (without controller) & -portable\footnote{Same reasoning as NDP.} \\ +portable\\ \hline NAT64: TCP & Switch translates TCP with checksumming & \\ & from/to IPv6 to/from IPv4 & @@ -209,20 +198,17 @@ fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4 \hline NAT64: & Switch translates echo request/reply & \\ ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming & -portable\footnote{ICMP/ICMP6 translations only require enabling the - icmp/icmp6 code in the 
netpfga code base.} \\ +portable\\ \hline NAT64: Sessions & Switch and controller create 1:n sessions/mappings & -portable\footnote{Same reasoning as ``Controller to switch''.} \\ +portable\\ \hline Delta Checksum & Switch can calculate checksum without payload inspection & fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\ \hline Payload Checksum & Switch can calculate checksum with payload inspection & -unsupported\footnote{To support creating payload checksums, either an - HDL module needs to be created or to modify the generated - the PX program.~\cite{schottelius:_exter_p4_netpf}} \\ +unsupported \\ \hline \end{tabular} \end{minipage} @@ -230,6 +216,44 @@ unsupported\footnote{To support creating payload checksums, either an \label{tab:p4netpfgafeatures} \end{center} \end{table} +The switch to controller communication differs, +because the P4/NetFPGA implementation does not have the clone3() extern +that the BMV2 implementation offers. However, communication to the +controller can easily be realised by using one of +the additional ports of the NetFPGA and connecting a physical network +card to it. + +Communicating from the controller towards the switch also differs, as +the p4utils suite supporting BMV2 offers easy access to the switch +tables. While the P4-NetFPGA support repository also offers Python +scripts to modify the switch tables, the code is less sophisticated +and more fragile. While porting the existing code is possible, it +might be advantageous to rewrite parts of P4-NetFPGA beforehand. + +The NAT64 session support is based on the P4 switch communicating with +the controller and vice versa. As we consider both features to be +portable, we also consider the NAT64 session feature to be portable. + +P4/NetFPGA does not offer calculating the checksum over the payload, +and thus computing the checksum required to create +a reply to a neighbor solicitation packet is not possible. 
However, +as the payload stays the same as in the request, our delta-based +checksum approach can be reused in this situation. By the same +reasoning, we consider our ICMP6 and ICMP code, which also requires +creating payload-based checksums, to be portable. + +ARP replies do not contain a checksum over the payload; thus the +existing ARP code can be directly integrated into P4/NetFPGA without +any changes. + +While the P4/NetFPGA target currently does not support accessing the +payload or creating checksums over it, there are two possibilities to +extend the platform: either by creating an HDL module or by +modifying the generated PX +program.~\cite{schottelius:_exter_p4_netpf} +Due to the existing code complexity of the P4/NetFPGA platform, using +the HDL-module-based approach is likely to be more sustainable. + % ok % ---------------------------------------------------------------------- \subsection{\label{results:netpfga:stability}Stability} @@ -241,13 +265,13 @@ hardware tests (compare figures \ref{fig:hwtestnico} and first card reported an additional ``10G\_Loopback'' failure. Due to the inability of setting table entries, no benchmarking was performed on the first NetFPGA card. -\begin{figure}[h] +\begin{figure}[htbp] \includegraphics[scale=1.4]{hwtestnico} \centering \caption{Hardware Test NetPFGA Card 1} \label{fig:hwtestnico} \end{figure} -\begin{figure}[h] +\begin{figure}[htbp] \includegraphics[scale=0.2]{hwtesthendrik} \centering \caption{Hardware Test NetPFGA Card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}} @@ -399,7 +423,7 @@ In this section we give an overview of the benchmark design and summarise the benchmarking results. 
% ---------------------------------------------------------------------- \subsection{\label{results:benchmark:design}Benchmark Design} -\begin{figure}[h] +\begin{figure}[htbp] \includegraphics[scale=0.6]{softwarenat64design} \centering \caption{Benchmark Design for NAT64 in Software Implementations} @@ -429,28 +453,30 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.} % ok % ---------------------------------------------------------------------- \subsection{\label{results:benchmark:summary}Benchmark Summary} -Overall \textbf{Tayga} has shown to be the slowest translator with an achieved -bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at -about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate -translation speed of about \textbf{9 Gbit/s}. +Overall, \textbf{Tayga} has proven to be the slowest translator with an +achieved bandwidth of \textbf{about 3 Gbit/s}, followed by +\textbf{Jool}, which translates at about \textbf{8 Gbit/s}. \textbf{Our + solution} is the fastest with an almost line rate translation speed +of about \textbf{9 Gbit/s} (compare tables \ref{tab:benchmarkv6} and +\ref{tab:benchmarkv4}). The TCP based benchmarks show realistic numbers, while iperf reports -above line rate speeds (up to 22 gbit/s on a 10gbit/s link) -for UDP based benchmarks. For this reason we -have summarised the UDP based benchmarks with their average loss -instead of listing the bandwidth details. The ``adjusted bandwidth'' -in the UDP benchmarks incorporates the packets loss (compare tables -\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}). +above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) for UDP +based benchmarks. For this reason we have summarised the UDP based +benchmarks with their average loss instead of listing the bandwidth +details. 
The ``adjusted bandwidth'' in the UDP benchmarks incorporates +the packet loss (compare tables \ref{tab:benchmarkv6v4udp} and +\ref{tab:benchmarkv4v6udp}). Both software solutions showed significant loss of packets in the UDP based benchmarks (Tayga: up to 91\%, Jool up to 71\%), while the P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only -recorded by iperf for UDP based benchmarks, as TCP packets are confirmed and -resent if necessary. +recorded by iperf for UDP based benchmarks, as TCP packets are +confirmed and resent if necessary. Tayga has the highest variation of results, which might be due to -being fully CPU bound, even in the non-parallel benchmark. Jool has less -variation and in general the P4/NetFPGA solution behaves almost +being fully CPU bound, even in the non-parallel benchmark. Jool has +less variation and in general the P4/NetFPGA solution behaves almost identical in different benchmark runs. The CPU load for TCP based benchmarks with Jool was almost negligible, @@ -460,10 +486,10 @@ utilised. When the translation for P4/NetFPGA happens within the NetFPGA card, there was no CPU utilisation visible on the NAT64 host. We see lower bandwidth for translating IPv4 to IPv6 in all solutions. -We suspect that this might be due to slighty increasing packet sizes that -occur during this direction of translation. Not only does this vary -the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation -that slows down. +We suspect that this might be due to slightly increasing packet sizes +that occur during this direction of translation. Not only does this +vary the IPv4 versus IPv6 bandwidth, but it might also cause +fragmentation that slows down the translation. During the benchmarks with up to 10 parallel connections, no significant CPU load was registered on the load generator. However @@ -484,11 +510,8 @@ Overall the performance of Tayga, a Linux user space program, is as expected. 
We were surprised about the good performance of Jool, which, while slower than the P4/NetFPGA solution, is almost on par with our solution. % ---------------------------------------------------------------------- -\newpage -\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP - Benchmark Results} \begin{table}[htbp] -\begin{center}\begin{minipage}{\textwidth} +\begin{center} \begin{tabular}{| c | c | c | c | c |} \hline Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\ @@ -505,16 +528,14 @@ P4 / NetPFGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 Parallel connections & 1 & 10 & 20 & 50 \\ \hline \end{tabular} -\end{minipage} \caption{IPv6 to IPv4 TCP NAT64 Benchmark} \label{tab:benchmarkv6} \end{center} \end{table} %ok % --------------------------------------------------------------------- -\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results} \begin{table}[htbp] -\begin{center}\begin{minipage}{\textwidth} +\begin{center} \begin{tabular}{| c | c | c | c | c |} \hline Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\ @@ -531,18 +552,13 @@ P4 / NetPFGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 / Parallel connections & 1 & 10 & 20 & 50 \\ \hline \end{tabular} -\end{minipage} \caption{IPv4 to IPv6 TCP NAT64 Benchmark} \label{tab:benchmarkv4} \end{center} \end{table} - % --------------------------------------------------------------------- -\newpage -\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP - Benchmark Results} \begin{table}[htbp] -\begin{center}\begin{minipage}{\textwidth} +\begin{center} \begin{tabular}{| c | c | c | c | c |} \hline Implementation & \multicolumn{4}{|c|}{avg bandwidth in gbit/s / avg loss / @@ -560,16 +576,14 @@ P4 / NetPFGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 & Parallel connections & 1 & 10 & 20 & 50 \\ \hline \end{tabular} -\end{minipage} \caption{IPv6 to IPv4 UDP NAT64 Benchmark} \label{tab:benchmarkv6v4udp} \end{center} \end{table} %ok 
% --------------------------------------------------------------------- -\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results} \begin{table}[htbp] -\begin{center}\begin{minipage}{\textwidth} +\begin{center} \begin{tabular}{| c | c | c | c | c |} \hline Implementation & \multicolumn{4}{|c|}{avg bandwidth in gbit/s / avg loss / @@ -587,9 +601,8 @@ P4 / NetPFGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 & Parallel connections & 1 & 10 & 20 & 50 \\ \hline \end{tabular} -\end{minipage} \caption{IPv4 to IPv6 UDP NAT64 Benchmark} -\label{tab:benchmarkv6v4udp} +\label{tab:benchmarkv4v6udp} \end{center} \end{table} %ok diff --git a/doc/Thesis.pdf b/doc/Thesis.pdf index a3b38be..04c69cf 100644 Binary files a/doc/Thesis.pdf and b/doc/Thesis.pdf differ diff --git a/doc/graphviz/p4switch-stateful.dot b/doc/graphviz/p4switch-stateful.dot index c17e988..2469f37 100644 --- a/doc/graphviz/p4switch-stateful.dot +++ b/doc/graphviz/p4switch-stateful.dot @@ -16,7 +16,7 @@ digraph G { tableentry [ label="Create Table Entry" ]; tablematch [ label="Table Match" ]; - reinject [ label="Reinject packet" ]; + reinject [ label="Reinject Packet" ]; controller [ label="Controller Reads Packet" ] deparser [ label="Deparser"];
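The delta checksum technique that the portability discussion above relies on can be illustrated outside of P4. The following is a minimal Python sketch of the incremental Internet-checksum update of RFC 1624 (HC' = ~(~HC + ~m + m')); it is an illustration of the general idea only, not the thesis's `actions_delta_checksum.p4` code, and the function names are ours.

```python
# Illustrative sketch of delta (incremental) checksumming, RFC 1624:
# when a 16-bit field changes, the existing checksum can be patched
# without re-reading the payload -- the property that makes the
# approach attractive on a target that cannot inspect the payload.

def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def internet_checksum(words) -> int:
    """Full checksum over 16-bit words (for comparison only)."""
    total = 0
    for w in words:
        total = ones_complement_add(total, w)
    return ~total & 0xFFFF

def delta_update(old_checksum: int, old_field: int, new_field: int) -> int:
    """Update old_checksum after a 16-bit field changed from
    old_field to new_field (RFC 1624, equation 3)."""
    hc = ~old_checksum & 0xFFFF
    hc = ones_complement_add(hc, ~old_field & 0xFFFF)
    hc = ones_complement_add(hc, new_field)
    return ~hc & 0xFFFF
```

Patching a changed header word into an existing checksum with `delta_update` yields the same value as recomputing the checksum over the modified words from scratch.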