
Adjust results section

master
Nico Schottelius 3 years ago
parent
commit
a0cf251917
  1. doc/Design.tex (6)
  2. doc/Results.tex (141)
  3. doc/Thesis.pdf (BIN)
  4. doc/graphviz/p4switch-stateful.dot (2)

doc/Design.tex (6)

@@ -377,8 +377,8 @@ this section we describe the IPv6 and IPv4 configurations as a basis
for the discussion.
All IPv6 addresses are from the documentation block
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular the following sub
networks and IPv6 addresses are used:
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular we use the sub
networks and IPv6 addresses shown in table \ref{tab:ipv6address}.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
@@ -407,7 +407,7 @@ networks and IPv6 addresses are used:
\end{table}
We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918}
from the 10.0.0.0/8 range as follows:
from the 10.0.0.0/8 range as shown in table \ref{tab:ipv4address}.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}

doc/Results.tex (141)

@@ -161,8 +161,8 @@ using delta checksums, the compile time of 2 to 6 hours contributed to
a significantly slower development cycle compared to BMV2.
Lastly, the focus of this thesis is to implement high speed NAT64 on
P4, which only requires a subset of the features that we realised on
BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented
features and reasons about their implementation status.
BMV2. In table \ref{tab:p4netpfgafeatures} we summarise the implemented
features and discuss their portability after the table.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
@@ -170,34 +170,23 @@ features and reasons about their implementation status.
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
\hline
Switch to controller & Switch forwards unhandled packets to
controller & portable\footnote{While the NetFPGA P4 implementation
does not have the clone3() extern that the BMV2 implementation offers,
communication to the controller can easily be realised by using one of
the additional ports of the NetFPGA and connect a physical network
card to it.}\\
controller & portable\\
\hline
Controller to Switch & Controller can setup table entries &
portable\footnote{The p4utils suite offers an easy access to the
switch tables. While the P4-NetFPGA support repository also offers
python scripts to modify the switch tables, the code is less
sophisticated and more fragile.}\\
portable\\
\hline
NDP & Switch responds to ICMP6 neighbor & \\
& solicitation request (without controller) &
portable\footnote{NetFPGA/P4 does not offer calculating the checksum
over the payload. However delta checksumming can be used to create
the required checksum for replying.} \\
portable\\
\hline
ARP & Switch can answer ARP request (without controller) &
portable\footnote{As ARP does not use checksums, integrating the
source code \texttt{actions\_arp.p4} into the netpfga code base is
enough to enable ARP support in the NetPFGA.} \\
portable \\
\hline
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
portable\footnote{Same reasoning as NDP.} \\
portable\\
\hline
ICMP & Switch responds to ICMP echo request (without controller) &
portable\footnote{Same reasoning as NDP.} \\
portable\\
\hline
NAT64: TCP & Switch translates TCP with checksumming & \\
& from/to IPv6 to/from IPv4 &
@@ -209,20 +198,17 @@ fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4
\hline
NAT64: & Switch translates echo request/reply & \\
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
portable\footnote{ICMP/ICMP6 translations only require enabling the
icmp/icmp6 code in the netpfga code base.} \\
portable\\
\hline
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
portable\footnote{Same reasoning as ``Controller to switch''.} \\
portable\\
\hline
Delta Checksum & Switch can calculate checksum without payload
inspection &
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
\hline
Payload Checksum & Switch can calculate checksum with payload inspection &
unsupported\footnote{To support creating payload checksums, either an
HDL module needs to be created or to modify the generated
the PX program.~\cite{schottelius:_exter_p4_netpf}} \\
unsupported \\
\hline
\end{tabular}
\end{minipage}
@@ -230,6 +216,44 @@ unsupported\footnote{To support creating payload checksums, either an
\label{tab:p4netpfgafeatures}
\end{center}
\end{table}
The switch to controller communication differs, because the
P4/NetFPGA implementation does not have the clone3() extern that the
BMV2 implementation offers. However, communication to the controller
can easily be realised by using one of the additional ports of the
NetFPGA and connecting a physical network card to it.
Communicating from the controller towards the switch also differs, as
the p4utils suite supporting BMV2 offers easy access to the switch
tables. While the P4-NetFPGA support repository also offers python
scripts to modify the switch tables, the code is less sophisticated
and more fragile. Porting the existing code is possible, but it might
be advantageous to rewrite parts of P4-NetFPGA first.
The NAT64 session support is based on the P4 switch communicating with
the controller and vice versa. As we consider both features to be
portable, we also consider the NAT64 session feature to be portable.
P4/NetFPGA does not offer calculating a checksum over the payload,
so computing the payload checksum required to reply to a neighbor
solicitation packet is not directly possible. However, as the payload
stays the same as in the request, our delta based checksum approach
can be reused in this situation. With the same reasoning we consider
our ICMP6 and ICMP code, which also requires payload based checksums,
to be portable.
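The delta based checksum idea can be illustrated as an incremental
Internet checksum update in the style of RFC 1624: when a 16-bit word
$m$ of the packet changes to $m'$, the new checksum follows from the
old one without reading the payload. A minimal sketch in Python rather
than P4, with helper names of our own choosing, not taken from the
thesis source code:

```python
def ones_complement_sum(a: int, b: int) -> int:
    # 16-bit one's complement addition with end-around carry
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def internet_checksum(words):
    # full checksum: complement of the one's complement sum of all words
    s = 0
    for w in words:
        s = ones_complement_sum(s, w)
    return (~s) & 0xFFFF

def delta_checksum(old_cksum: int, old_field: int, new_field: int) -> int:
    # incremental update HC' = ~(~HC + ~m + m') (RFC 1624, eqn. 3):
    # only the changed 16-bit field is needed, not the payload
    hc = (~old_cksum) & 0xFFFF
    hc = ones_complement_sum(hc, (~old_field) & 0xFFFF)
    hc = ones_complement_sum(hc, new_field & 0xFFFF)
    return (~hc) & 0xFFFF

# replacing one header word gives the same checksum as a full recompute
words = [0x4500, 0x0034, 0x1C46, 0x4000]
full = internet_checksum([0x4500, 0x0034, 0xBEEF, 0x4000])
incremental = delta_checksum(internet_checksum(words), 0x1C46, 0xBEEF)
assert full == incremental
```

Because an NDP reply carries the same payload as the request, only the
changed header words have to enter such an update, which is why the
approach fits targets that cannot checksum the payload.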
ARP replies do not contain a checksum over the payload, thus the
existing ARP code can be directly integrated into P4/NetFPGA without
any changes.
While the P4/NetFPGA target currently does not support accessing the
payload or creating checksums over it, there are two possibilities to
extend the platform: either creating an HDL module or modifying the
generated PX program.~\cite{schottelius:_exter_p4_netpf}
Due to the existing code complexity of the P4/NetFPGA platform, using
the HDL module based approach is likely to be more sustainable.
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:stability}Stability}
@@ -241,13 +265,13 @@ hardware tests (compare figures \ref{fig:hwtestnico} and
first card reported an additional ``10G\_Loopback'' failure. Due to
the inability of setting table entries, no benchmarking was performed
on the first NetFPGA card.
\begin{figure}[h]
\begin{figure}[htbp]
\includegraphics[scale=1.4]{hwtestnico}
\centering
\caption{Hardware Test NetPFGA Card 1}
\label{fig:hwtestnico}
\end{figure}
\begin{figure}[h]
\begin{figure}[htbp]
\includegraphics[scale=0.2]{hwtesthendrik}
\centering
\caption{Hardware Test NetPFGA Card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}}
@@ -399,7 +423,7 @@ In this section we give an overview of the benchmark design
and summarise the benchmarking results.
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:design}Benchmark Design}
\begin{figure}[h]
\begin{figure}[htbp]
\includegraphics[scale=0.6]{softwarenat64design}
\centering
\caption{Benchmark Design for NAT64 in Software Implementations}
@@ -429,28 +453,30 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
% ok
% ----------------------------------------------------------------------
\subsection{\label{results:benchmark:summary}Benchmark Summary}
Overall \textbf{Tayga} has shown to be the slowest translator with an achieved
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
translation speed of about \textbf{9 Gbit/s}.
Overall \textbf{Tayga} has shown to be the slowest translator with an
achieved bandwidth of \textbf{about 3 Gbit/s}, followed by
\textbf{Jool} that translates at about \textbf{8 Gbit/s}. \textbf{Our
solution} is the fastest with an almost line rate translation speed
of about \textbf{9 Gbit/s} (compare tables \ref{tab:benchmarkv6} and
\ref{tab:benchmarkv4}).
The TCP based benchmarks show realistic numbers, while iperf reports
above line rate speeds (up to 22 gbit/s on a 10gbit/s link)
for UDP based benchmarks. For this reason we
have summarised the UDP based benchmarks with their average loss
instead of listing the bandwidth details. The ``adjusted bandwidth''
in the UDP benchmarks incorporates the packets loss (compare tables
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}).
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) for UDP
based benchmarks. For this reason we have summarised the UDP based
benchmarks with their average loss instead of listing the bandwidth
details. The ``adjusted bandwidth'' in the UDP benchmarks incorporates
the packet loss (compare tables \ref{tab:benchmarkv6v4udp} and
\ref{tab:benchmarkv4v6udp}).
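The text does not spell out how the adjusted bandwidth is derived; a
minimal sketch, assuming it is simply the iperf-reported sender-side
bandwidth scaled by the fraction of datagrams that actually arrived
(the combination of 22 Gbit/s and 91\% loss below is illustrative,
not a measured pair):

```python
def adjusted_bandwidth(reported_gbit_s: float, loss: float) -> float:
    # scale the sender-side bandwidth by the delivered fraction
    # (assumed formula: adjusted = reported * (1 - loss))
    return reported_gbit_s * (1.0 - loss)

# an above-line-rate 22 Gbit/s report with 91% loss leaves
# roughly 2 Gbit/s of goodput
print(round(adjusted_bandwidth(22.0, 0.91), 2))  # 1.98
```

This also explains why a seemingly impressive UDP number can hide a
translator that drops most of the traffic.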
Both software solutions showed significant loss of packets in the UDP
based benchmarks (Tayga: up to 91\%, Jool: up to 71\%), while the
P4/NetFPGA solution showed a maximum of 0.01\% packet loss. Packet loss is only
recorded by iperf for UDP based benchmarks, as TCP packets are confirmed and
resent if necessary.
recorded by iperf for UDP based benchmarks, as TCP packets are
confirmed and resent if necessary.
Tayga has the highest variation of results, which might be due to
being fully CPU bound, even in the non-parallel benchmark. Jool has less
variation and in general the P4/NetFPGA solution behaves almost
being fully CPU bound, even in the non-parallel benchmark. Jool has
less variation and in general the P4/NetFPGA solution behaves almost
identically in different benchmark runs.
The CPU load for TCP based benchmarks with Jool was almost negligible,
@@ -460,10 +486,10 @@ utilised. When the translation for P4/NetFPGA happens within the
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
We suspect that this might be due to slighty increasing packet sizes that
occur during this direction of translation. Not only does this vary
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
that slows down.
We suspect that this might be due to slightly increasing packet sizes
that occur during this direction of translation. Not only does this
vary the IPv4 versus IPv6 bandwidth, but it might also cause
fragmentation that slows down the transfer.
During the benchmarks with up to 10 parallel connections, no
significant CPU load was registered on the load generator. However
@@ -484,11 +510,8 @@ Overall the performance of Tayga, a Linux user space program, is as
expected. We were surprised by the good performance of Jool, which,
while slower, is almost on par with our P4/NetFPGA solution.
% ----------------------------------------------------------------------
\newpage
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
@@ -505,16 +528,14 @@ P4 / NetPFGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 TCP NAT64 Benchmark}
\label{tab:benchmarkv6}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
@@ -531,18 +552,13 @@ P4 / NetPFGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 /
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 TCP NAT64 Benchmark}
\label{tab:benchmarkv4}
\end{center}
\end{table}
% ---------------------------------------------------------------------
\newpage
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in gbit/s / avg loss /
@@ -560,16 +576,14 @@ P4 / NetPFGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
\label{tab:benchmarkv6v4udp}
\end{center}
\end{table}
%ok
% ---------------------------------------------------------------------
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{center}
\begin{tabular}{| c | c | c | c | c |}
\hline
Implementation & \multicolumn{4}{|c|}{avg bandwidth in gbit/s / avg loss /
@@ -587,9 +601,8 @@ P4 / NetPFGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
Parallel connections & 1 & 10 & 20 & 50 \\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
\label{tab:benchmarkv6v4udp}
\label{tab:benchmarkv4v6udp}
\end{center}
\end{table}
%ok

doc/Thesis.pdf (BIN)

Binary file not shown.

doc/graphviz/p4switch-stateful.dot (2)

@@ -16,7 +16,7 @@ digraph G {
tableentry [ label="Create Table Entry" ];
tablematch [ label="Table Match" ];
reinject [ label="Reinject packet" ];
reinject [ label="Reinject Packet" ];
controller [ label="Controller Reads Packet" ]
deparser [ label="Deparser"];
