|
|
|
@ -161,8 +161,8 @@ using delta checksums, the compile time of 2 to 6 hours contributed to
|
|
|
|
|
a significantly slower development cycle compared to BMV2. |
|
|
|
|
Lastly, the focus of this thesis is to implement high speed NAT64 on |
|
|
|
|
P4, which only requires a subset of the features that we realised on |
|
|
|
|
BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented |
|
|
|
|
features and reasons about their implementation status. |
|
|
|
|
BMV2. In table \ref{tab:p4netpfgafeatures} we summarise the implemented |
|
|
|
|
features and reason about their portability afterwards: |
|
|
|
|
\begin{table}[htbp] |
|
|
|
|
\begin{center}\begin{minipage}{\textwidth} |
|
|
|
|
\begin{tabular}{| c | c | c |} |
|
|
|
@ -170,34 +170,23 @@ features and reasons about their implementation status.
|
|
|
|
|
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\ |
|
|
|
|
\hline |
|
|
|
|
Switch to controller & Switch forwards unhandled packets to |
|
|
|
|
controller & portable\footnote{While the NetFPGA P4 implementation |
|
|
|
|
does not have the clone3() extern that the BMV2 implementation offers, |
|
|
|
|
communication to the controller can easily be realised by using one of |
|
|
|
|
the additional ports of the NetFPGA and connecting a physical network |
|
|
|
|
card to it.}\\ |
|
|
|
|
controller & portable\\ |
|
|
|
|
\hline |
|
|
|
|
Controller to Switch & Controller can setup table entries & |
|
|
|
|
portable\footnote{The p4utils suite offers easy access to the |
|
|
|
|
switch tables. While the P4-NetFPGA support repository also offers |
|
|
|
|
python scripts to modify the switch tables, the code is less |
|
|
|
|
sophisticated and more fragile.}\\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
NDP & Switch responds to ICMP6 neighbor & \\ |
|
|
|
|
& solicitation request (without controller) & |
|
|
|
|
portable\footnote{NetFPGA/P4 does not offer calculating the checksum |
|
|
|
|
over the payload. However, delta checksumming can be used to create |
|
|
|
|
the required checksum for replying.} \\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
ARP & Switch can answer ARP request (without controller) & |
|
|
|
|
portable\footnote{As ARP does not use checksums, integrating the |
|
|
|
|
source code \texttt{actions\_arp.p4} into the netpfga code base is |
|
|
|
|
enough to enable ARP support in the NetFPGA.} \\ |
|
|
|
|
portable \\ |
|
|
|
|
\hline |
|
|
|
|
ICMP6 & Switch responds to ICMP6 echo request (without controller) & |
|
|
|
|
portable\footnote{Same reasoning as NDP.} \\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
ICMP & Switch responds to ICMP echo request (without controller) & |
|
|
|
|
portable\footnote{Same reasoning as NDP.} \\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
NAT64: TCP & Switch translates TCP with checksumming & \\ |
|
|
|
|
& from/to IPv6 to/from IPv4 & |
|
|
|
@ -209,20 +198,17 @@ fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4
|
|
|
|
|
\hline |
|
|
|
|
NAT64: & Switch translates echo request/reply & \\ |
|
|
|
|
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming & |
|
|
|
|
portable\footnote{ICMP/ICMP6 translations only require enabling the |
|
|
|
|
icmp/icmp6 code in the netpfga code base.} \\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
NAT64: Sessions & Switch and controller create 1:n sessions/mappings & |
|
|
|
|
portable\footnote{Same reasoning as ``Controller to switch''.} \\ |
|
|
|
|
portable\\ |
|
|
|
|
\hline |
|
|
|
|
Delta Checksum & Switch can calculate checksum without payload |
|
|
|
|
inspection & |
|
|
|
|
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\ |
|
|
|
|
\hline |
|
|
|
|
Payload Checksum & Switch can calculate checksum with payload inspection & |
|
|
|
|
unsupported\footnote{To support creating payload checksums, either an |
|
|
|
|
HDL module needs to be created or the generated |
|
|
|
|
PX program needs to be modified.~\cite{schottelius:_exter_p4_netpf}} \\ |
|
|
|
|
unsupported \\ |
|
|
|
|
\hline |
|
|
|
|
\end{tabular} |
|
|
|
|
\end{minipage} |
|
|
|
@ -230,6 +216,44 @@ unsupported\footnote{To support creating payload checksums, either an
|
|
|
|
|
\label{tab:p4netpfgafeatures} |
|
|
|
|
\end{center} |
|
|
|
|
\end{table} |
|
|
|
|
The switch to controller communication differs, |
|
|
|
|
because the P4/NetFPGA implementation does not have the clone3() extern |
|
|
|
|
that the BMV2 implementation offers. However, communication to the |
|
|
|
|
controller can easily be realised by using one of |
|
|
|
|
the additional ports of the NetFPGA and connecting a physical network |
|
|
|
|
card to it. |
|
|
|
|
|
|
|
|
|
Communicating from the controller towards the switch also differs, as |
|
|
|
|
the p4utils suite supporting BMV2 offers easy access to the switch |
|
|
|
|
tables. While the P4-NetFPGA support repository also offers python |
|
|
|
|
scripts to modify the switch tables, the code is less sophisticated |
|
|
|
|
and more fragile. While porting the existing code is possible, it |
|
|
|
|
might be advantageous to rewrite parts of the P4-NetFPGA code first. |
|
|
|
|
|
|
|
|
|
The NAT64 session support is based on the P4 switch communicating with |
|
|
|
|
the controller and vice versa. As we consider both features to be |
|
|
|
|
portable, we also consider the NAT64 session feature to be portable. |
|
|
|
|
|
|
|
|
|
P4/NetFPGA does not offer calculating the checksum over the payload |
|
|
|
|
and thus calculating the checksum over the payload to create |
|
|
|
|
a reply for a neighbor solicitation packet is not possible. However, |
|
|
|
|
as the payload stays the same as in the request, our delta based |
|
|
|
|
checksum approach can be reused in this situation. With the same |
|
|
|
|
reasoning we consider our ICMP6 and ICMP code, which also requires |
|
|
|
|
creating payload based checksums, to be portable. |
|
|
|
|
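The delta based checksum idea described above can be illustrated with a minimal Python sketch. It is not the P4 code of this thesis: it assumes the incremental update rule for the Internet checksum (one's complement arithmetic over 16 bit words), and the function names as well as the example header words are hypothetical. The sketch only shows why the payload does not need to be read when it stays unchanged.

```python
def ones_complement_sum(words):
    """Add 16 bit words with end-around carry (one's complement addition)."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """Full checksum over all bytes; this is what needs payload access."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16 bit words
    words = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return ~ones_complement_sum(words) & 0xFFFF


def delta_checksum(old_checksum: int, old_words, new_words) -> int:
    """Update a checksum from the changed words only.

    Assumed rule: new = ~(~old + sum(~removed words) + sum(added words)),
    evaluated in one's complement arithmetic. The payload is never touched.
    """
    terms = [~old_checksum & 0xFFFF]
    terms += [~word & 0xFFFF for word in old_words]
    terms += list(new_words)
    return ~ones_complement_sum(terms) & 0xFFFF
```

A possible use: compute the checksum of a request once, replace some header words, and derive the reply checksum from the difference. As long as the payload is identical in request and reply, the result should equal a full recomputation over the new header and the payload.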
|
|
|
|
|
ARP replies do not contain a checksum over the payload, thus the |
|
|
|
|
existing ARP code can be directly integrated into P4/NetFPGA without |
|
|
|
|
any changes. |
|
|
|
|
|
|
|
|
|
While the P4/NetFPGA target currently does not support accessing the |
|
|
|
|
payload or creating checksums over it, there are two possibilities to |
|
|
|
|
extend the platform: either by creating an HDL module or by |
|
|
|
|
modifying the generated PX |
|
|
|
|
program.~\cite{schottelius:_exter_p4_netpf} |
|
|
|
|
Due to the existing code complexity of the P4/NetFPGA platform, using |
|
|
|
|
the HDL module based approach is likely to be more sustainable. |
|
|
|
|
|
|
|
|
|
% ok |
|
|
|
|
% ---------------------------------------------------------------------- |
|
|
|
|
\subsection{\label{results:netpfga:stability}Stability} |
|
|
|
@ -241,13 +265,13 @@ hardware tests (compare figures \ref{fig:hwtestnico} and
|
|
|
|
|
first card reported an additional ``10G\_Loopback'' failure. Due to |
|
|
|
|
the inability of setting table entries, no benchmarking was performed |
|
|
|
|
on the first NetFPGA card. |
|
|
|
|
\begin{figure}[h] |
|
|
|
|
\begin{figure}[htbp] |
|
|
|
|
\includegraphics[scale=1.4]{hwtestnico} |
|
|
|
|
\centering |
|
|
|
|
\caption{Hardware Test NetFPGA Card 1} |
|
|
|
|
\label{fig:hwtestnico} |
|
|
|
|
\end{figure} |
|
|
|
|
\begin{figure}[h] |
|
|
|
|
\begin{figure}[htbp] |
|
|
|
|
\includegraphics[scale=0.2]{hwtesthendrik} |
|
|
|
|
\centering |
|
|
|
|
\caption{Hardware Test NetFPGA Card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}} |
|
|
|
@ -399,7 +423,7 @@ In this section we give an overview of the benchmark design
|
|
|
|
|
and summarise the benchmarking results. |
|
|
|
|
% ---------------------------------------------------------------------- |
|
|
|
|
\subsection{\label{results:benchmark:design}Benchmark Design} |
|
|
|
|
\begin{figure}[h] |
|
|
|
|
\begin{figure}[htbp] |
|
|
|
|
\includegraphics[scale=0.6]{softwarenat64design} |
|
|
|
|
\centering |
|
|
|
|
\caption{Benchmark Design for NAT64 in Software Implementations} |
|
|
|
@ -429,28 +453,30 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
|
|
|
|
|
% ok |
|
|
|
|
% ---------------------------------------------------------------------- |
|
|
|
|
\subsection{\label{results:benchmark:summary}Benchmark Summary} |
|
|
|
|
Overall \textbf{Tayga} has proven to be the slowest translator with an achieved |
|
|
|
|
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at |
|
|
|
|
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate |
|
|
|
|
translation speed of about \textbf{9 Gbit/s}. |
|
|
|
|
Overall \textbf{Tayga} has proven to be the slowest translator with an |
|
|
|
|
achieved bandwidth of \textbf{about 3 Gbit/s}, followed by |
|
|
|
|
\textbf{Jool} that translates at about \textbf{8 Gbit/s}. \textbf{Our |
|
|
|
|
solution} is the fastest with an almost line rate translation speed |
|
|
|
|
of about \textbf{9 Gbit/s} (compare tables \ref{tab:benchmarkv6} and |
|
|
|
|
\ref{tab:benchmarkv4}). |
|
|
|
|
|
|
|
|
|
The TCP based benchmarks show realistic numbers, while iperf reports |
|
|
|
|
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) |
|
|
|
|
for UDP based benchmarks. For this reason we |
|
|
|
|
have summarised the UDP based benchmarks with their average loss |
|
|
|
|
instead of listing the bandwidth details. The ``adjusted bandwidth'' |
|
|
|
|
in the UDP benchmarks incorporates the packet loss (compare tables |
|
|
|
|
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}). |
|
|
|
|
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) for UDP |
|
|
|
|
based benchmarks. For this reason we have summarised the UDP based |
|
|
|
|
benchmarks with their average loss instead of listing the bandwidth |
|
|
|
|
details. The ``adjusted bandwidth'' in the UDP benchmarks incorporates |
|
|
|
|
the packet loss (compare tables \ref{tab:benchmarkv6v4udp} and |
|
|
|
|
\ref{tab:benchmarkv4v6udp}). |
|
|
|
|
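One plausible reading of the ``adjusted bandwidth'' can be written as a formula. This is an assumption and not stated explicitly in the benchmark description: the symbols $B_{\mathrm{reported}}$ (bandwidth reported by iperf), $\ell$ (average packet loss as a fraction between 0 and 1) and $B_{\mathrm{adj}}$ (adjusted bandwidth) are introduced here for illustration only.

\[
  B_{\mathrm{adj}} = B_{\mathrm{reported}} \cdot (1 - \ell)
\]

Under this assumption a run without packet loss ($\ell = 0$) yields $B_{\mathrm{adj}} = B_{\mathrm{reported}}$, which is consistent with the P4/NetFPGA rows that list identical values for both bandwidths at 0\% loss.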
|
|
|
|
|
Both software solutions showed significant loss of packets in the UDP |
|
|
|
|
based benchmarks (Tayga: up to 91\%, Jool: up to 71\%), while the |
|
|
|
|
P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only |
|
|
|
|
recorded by iperf for UDP based benchmarks, as TCP packets are confirmed and |
|
|
|
|
resent if necessary. |
|
|
|
|
recorded by iperf for UDP based benchmarks, as TCP packets are |
|
|
|
|
confirmed and resent if necessary. |
|
|
|
|
|
|
|
|
|
Tayga has the highest variation of results, which might be due to |
|
|
|
|
being fully CPU bound, even in the non-parallel benchmark. Jool has less |
|
|
|
|
variation and in general the P4/NetFPGA solution behaves almost |
|
|
|
|
being fully CPU bound, even in the non-parallel benchmark. Jool has |
|
|
|
|
less variation and in general the P4/NetFPGA solution behaves almost |
|
|
|
|
identical in different benchmark runs. |
|
|
|
|
|
|
|
|
|
The CPU load for TCP based benchmarks with Jool was almost negligible, |
|
|
|
@ -460,10 +486,10 @@ utilised. When the translation for P4/NetFPGA happens within the
|
|
|
|
|
NetFPGA card, there was no CPU utilisation visible on the NAT64 host. |
|
|
|
|
|
|
|
|
|
We see lower bandwidth for translating IPv4 to IPv6 in all solutions. |
|
|
|
|
We suspect that this might be due to slightly increasing packet sizes that |
|
|
|
|
occur during this direction of translation. Not only does this vary |
|
|
|
|
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation |
|
|
|
|
that slows down the transfer. |
|
|
|
|
We suspect that this might be due to slightly increasing packet sizes |
|
|
|
|
that occur during this direction of translation. Not only does this |
|
|
|
|
vary the IPv4 versus IPv6 bandwidth, but it might also cause |
|
|
|
|
fragmentation that slows down the transfer. |
|
|
|
|
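The size increase can be made concrete with a small Python sketch. It relies on two facts about the protocols: the IPv6 header has a fixed size of 40 bytes, while an IPv4 header has IHL $\times$ 4 bytes (20 bytes without options). The MTU of 1500 bytes and the function names are assumptions chosen for illustration; the benchmark setup may use different values.

```python
IPV6_HEADER_LEN = 40  # fixed size of the IPv6 header in bytes


def ipv4_to_ipv6_length(ipv4_total_length: int, ihl: int = 5) -> int:
    """Size of a packet after IPv4 to IPv6 translation.

    ihl is the IPv4 header length in 32 bit words (5 means no options).
    Only the IP header is exchanged, the payload size stays the same.
    """
    ipv4_header_len = ihl * 4
    payload_len = ipv4_total_length - ipv4_header_len
    return IPV6_HEADER_LEN + payload_len


def exceeds_mtu(ipv4_total_length: int, mtu: int = 1500, ihl: int = 5) -> bool:
    """True if the translated packet no longer fits into the assumed MTU."""
    return ipv4_to_ipv6_length(ipv4_total_length, ihl) > mtu
```

With these assumptions, an IPv4 packet that fills the MTU grows by 20 bytes during translation and would no longer fit into a link with the same MTU, which is one way the suspected fragmentation could arise.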
|
|
|
|
|
During the benchmarks with up to 10 parallel connections, no |
|
|
|
|
significant CPU load was registered on the load generator. However |
|
|
|
@ -484,11 +510,8 @@ Overall the performance of Tayga, a Linux user space program, is as
|
|
|
|
|
expected. We were surprised by the good performance of Jool, which, |
|
|
|
|
while slower than the P4/NetFPGA solution, is almost on par with it. |
|
|
|
|
% ---------------------------------------------------------------------- |
|
|
|
|
\newpage |
|
|
|
|
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP |
|
|
|
|
Benchmark Results} |
|
|
|
|
\begin{table}[htbp] |
|
|
|
|
\begin{center}\begin{minipage}{\textwidth} |
|
|
|
|
\begin{center} |
|
|
|
|
\begin{tabular}{| c | c | c | c | c |} |
|
|
|
|
\hline |
|
|
|
|
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\ |
|
|
|
@ -505,16 +528,14 @@ P4 / NetPFGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28
|
|
|
|
|
Parallel connections & 1 & 10 & 20 & 50 \\ |
|
|
|
|
\hline |
|
|
|
|
\end{tabular} |
|
|
|
|
\end{minipage} |
|
|
|
|
\caption{IPv6 to IPv4 TCP NAT64 Benchmark} |
|
|
|
|
\label{tab:benchmarkv6} |
|
|
|
|
\end{center} |
|
|
|
|
\end{table} |
|
|
|
|
%ok |
|
|
|
|
% --------------------------------------------------------------------- |
|
|
|
|
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results} |
|
|
|
|
\begin{table}[htbp] |
|
|
|
|
\begin{center}\begin{minipage}{\textwidth} |
|
|
|
|
\begin{center} |
|
|
|
|
\begin{tabular}{| c | c | c | c | c |} |
|
|
|
|
\hline |
|
|
|
|
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\ |
|
|
|
@ -531,18 +552,13 @@ P4 / NetPFGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 /
|
|
|
|
|
Parallel connections & 1 & 10 & 20 & 50 \\ |
|
|
|
|
\hline |
|
|
|
|
\end{tabular} |
|
|
|
|
\end{minipage} |
|
|
|
|
\caption{IPv4 to IPv6 TCP NAT64 Benchmark} |
|
|
|
|
\label{tab:benchmarkv4} |
|
|
|
|
\end{center} |
|
|
|
|
\end{table} |
|
|
|
|
|
|
|
|
|
% --------------------------------------------------------------------- |
|
|
|
|
\newpage |
|
|
|
|
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP |
|
|
|
|
Benchmark Results} |
|
|
|
|
\begin{table}[htbp] |
|
|
|
|
\begin{center}\begin{minipage}{\textwidth} |
|
|
|
|
\begin{center} |
|
|
|
|
\begin{tabular}{| c | c | c | c | c |} |
|
|
|
|
\hline |
|
|
|
|
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss / |
|
|
|
@ -560,16 +576,14 @@ P4 / NetPFGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
|
|
|
|
|
Parallel connections & 1 & 10 & 20 & 50 \\ |
|
|
|
|
\hline |
|
|
|
|
\end{tabular} |
|
|
|
|
\end{minipage} |
|
|
|
|
\caption{IPv6 to IPv4 UDP NAT64 Benchmark} |
|
|
|
|
\label{tab:benchmarkv6v4udp} |
|
|
|
|
\end{center} |
|
|
|
|
\end{table} |
|
|
|
|
%ok |
|
|
|
|
% --------------------------------------------------------------------- |
|
|
|
|
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results} |
|
|
|
|
\begin{table}[htbp] |
|
|
|
|
\begin{center}\begin{minipage}{\textwidth} |
|
|
|
|
\begin{center} |
|
|
|
|
\begin{tabular}{| c | c | c | c | c |} |
|
|
|
|
\hline |
|
|
|
|
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss / |
|
|
|
@ -587,9 +601,8 @@ P4 / NetPFGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
|
|
|
|
|
Parallel connections & 1 & 10 & 20 & 50 \\ |
|
|
|
|
\hline |
|
|
|
|
\end{tabular} |
|
|
|
|
\end{minipage} |
|
|
|
|
\caption{IPv4 to IPv6 UDP NAT64 Benchmark} |
|
|
|
|
\label{tab:benchmarkv6v4udp} |
|
|
|
|
\label{tab:benchmarkv4v6udp} |
|
|
|
|
\end{center} |
|
|
|
|
\end{table} |
|
|
|
|
%ok |
|
|
|
|