Adjust results section
parent
33963bc681
commit
a0cf251917
4 changed files with 81 additions and 68 deletions
|
@ -377,8 +377,8 @@ this section we describe the IPv6 and IPv4 configurations as a basis
|
|||
for the discussion.
|
||||
|
||||
All IPv6 addresses are from the documentation block
|
||||
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular the following sub
|
||||
networks and IPv6 addresses are used:
|
||||
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular, we use the sub
|
||||
networks and IPv6 addresses shown in table \ref{tab:ipv6address}.
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c |}
|
||||
|
@ -407,7 +407,7 @@ networks and IPv6 addresses are used:
|
|||
\end{table}
|
||||
|
||||
We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918}
|
||||
from the 10.0.0.0/8 range as follows:
|
||||
from the 10.0.0.0/8 range as shown in table \ref{tab:ipv4address}.
|
||||
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
|
|
141
doc/Results.tex
|
@ -161,8 +161,8 @@ using delta checksums, the compile time of 2 to 6 hours contributed to
|
|||
a significantly slower development cycle compared to BMV2.
|
||||
Lastly, the focus of this thesis is to implement high speed NAT64 on
|
||||
P4, which only requires a subset of the features that we realised on
|
||||
BMV2. Table \ref{tab:p4netpfgafeatures} summarises the implemented
|
||||
features and reasons about their implementation status.
|
||||
BMV2. In table \ref{tab:p4netpfgafeatures} we summarise the implemented
|
||||
features and reason about their portability afterwards:
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{tabular}{| c | c | c |}
|
||||
|
@ -170,34 +170,23 @@ features and reasons about their implementation status.
|
|||
\textbf{Feature} & \textbf{Description} & \textbf{Status} \\
|
||||
\hline
|
||||
Switch to controller & Switch forwards unhandled packets to
|
||||
controller & portable\footnote{While the NetFPGA P4 implementation
|
||||
does not have the clone3() extern that the BMV2 implementation offers,
|
||||
communication to the controller can easily be realised by using one of
|
||||
the additional ports of the NetFPGA and connect a physical network
|
||||
card to it.}\\
|
||||
controller & portable\\
|
||||
\hline
|
||||
Controller to Switch & Controller can set up table entries &
|
||||
portable\footnote{The p4utils suite offers an easy access to the
|
||||
switch tables. While the P4-NetFPGA support repository also offers
|
||||
python scripts to modify the switch tables, the code is less
|
||||
sophisticated and more fragile.}\\
|
||||
portable\\
|
||||
\hline
|
||||
NDP & Switch responds to ICMP6 neighbor & \\
|
||||
& solicitation request (without controller) &
|
||||
portable\footnote{NetFPGA/P4 does not offer calculating the checksum
|
||||
over the payload. However delta checksumming can be used to create
|
||||
the required checksum for replying.} \\
|
||||
portable\\
|
||||
\hline
|
||||
ARP & Switch can answer ARP request (without controller) &
|
||||
portable\footnote{As ARP does not use checksums, integrating the
|
||||
source code \texttt{actions\_arp.p4} into the netpfga code base is
|
||||
enough to enable ARP support in the NetPFGA.} \\
|
||||
portable \\
|
||||
\hline
|
||||
ICMP6 & Switch responds to ICMP6 echo request (without controller) &
|
||||
portable\footnote{Same reasoning as NDP.} \\
|
||||
portable\\
|
||||
\hline
|
||||
ICMP & Switch responds to ICMP echo request (without controller) &
|
||||
portable\footnote{Same reasoning as NDP.} \\
|
||||
portable\\
|
||||
\hline
|
||||
NAT64: TCP & Switch translates TCP with checksumming & \\
|
||||
& from/to IPv6 to/from IPv4 &
|
||||
|
@ -209,20 +198,17 @@ fully implemented\footnote{Source code: \texttt{actions\_nat64\_generic\_icmp.p4
|
|||
\hline
|
||||
NAT64: & Switch translates echo request/reply & \\
|
||||
ICMP/ICMP6 & from/to ICMP6 to/from ICMP with checksumming &
|
||||
portable\footnote{ICMP/ICMP6 translations only require enabling the
|
||||
icmp/icmp6 code in the netpfga code base.} \\
|
||||
portable\\
|
||||
\hline
|
||||
NAT64: Sessions & Switch and controller create 1:n sessions/mappings &
|
||||
portable\footnote{Same reasoning as ``Controller to switch''.} \\
|
||||
portable\\
|
||||
\hline
|
||||
Delta Checksum & Switch can calculate checksum without payload
|
||||
inspection &
|
||||
fully implemented\footnote{Source code: \texttt{actions\_delta\_checksum.p4}}\\
|
||||
\hline
|
||||
Payload Checksum & Switch can calculate checksum with payload inspection &
|
||||
unsupported\footnote{To support creating payload checksums, either an
|
||||
HDL module needs to be created or to modify the generated
|
||||
the PX program.~\cite{schottelius:_exter_p4_netpf}} \\
|
||||
unsupported \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{minipage}
|
||||
|
@ -230,6 +216,44 @@ unsupported\footnote{To support creating payload checksums, either an
|
|||
\label{tab:p4netpfgafeatures}
|
||||
\end{center}
|
||||
\end{table}
|
||||
The switch to controller communication differs,
|
||||
because the P4/NetFPGA implementation does not have the clone3() extern
|
||||
that the BMV2 implementation offers. However, communication to the
|
||||
controller can easily be realised by using one of
|
||||
the additional ports of the NetFPGA and connecting a physical network
|
||||
card to it.
|
||||
|
||||
Communicating from the controller towards the switch also differs, as
|
||||
the p4utils suite supporting BMV2 offers easy access to the switch
|
||||
tables. While the P4-NetFPGA support repository also offers python
|
||||
scripts to modify the switch tables, the code is less sophisticated
|
||||
and more fragile. While porting the existing code is possible, it
|
||||
might be advantageous to rewrite parts of the P4-NetFPGA scripts first.
|
||||
|
||||
The NAT64 session support is based on the P4 switch communicating with
|
||||
the controller and vice versa. As we consider both features to be
|
||||
portable, we also consider the NAT64 session feature to be portable.
|
||||
|
||||
P4/NetFPGA does not offer calculating the checksum over the payload
|
||||
and thus calculating the checksum over the payload to create
|
||||
a reply for a neighbor solicitation packet is not possible. However,
|
||||
as the payload stays the same as in the request, our delta based
|
||||
checksum approach can be reused in this situation. With the same
|
||||
reasoning we consider our ICMP6 and ICMP code, which also needs to
|
||||
create payload based checksums, to be portable.
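The delta based checksum reuse described above can be illustrated outside of P4. The following Python sketch is not the thesis code from \texttt{actions\_delta\_checksum.p4}; it assumes the incremental update rule of RFC 1624, and all helper names and example words are hypothetical. It shows that a checksum can be updated from the changed header words alone, so the payload never has to be read.

```python
def fold16(total):
    """Fold carries back in until the value fits into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(words):
    """Internet checksum over all 16-bit words (header and payload)."""
    return ~fold16(sum(words)) & 0xFFFF


def delta_checksum(old_checksum, old_words, new_words):
    """Update a checksum from the changed words only (RFC 1624, eqn. 3)."""
    total = ~old_checksum & 0xFFFF
    total += sum(~w & 0xFFFF for w in old_words)  # take out the old words
    total += sum(new_words)                       # add the new words
    return ~fold16(total) & 0xFFFF


# Hypothetical example values: header words that change, payload that does not.
old_header = [0x2001, 0x0DB8, 0x0000, 0x0001]
new_header = [0x0A00, 0x0001]
payload = [0x1234, 0xABCD, 0x0042]

before = full_checksum(old_header + payload)
after = delta_checksum(before, old_header, new_header)
```

The number of old and new words does not have to match, which fits a translation between IPv6 and IPv4 headers of different length.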
|
||||
|
||||
ARP replies do not contain a checksum over the payload, thus the
|
||||
existing ARP code can be directly integrated into P4/NetFPGA without
|
||||
any changes.
|
||||
|
||||
While the P4/NetFPGA target currently does not support accessing the
|
||||
payload or creating checksums over it, there are two possibilities to
|
||||
extend the platform: either by creating an HDL module or by
|
||||
modifying the generated PX
|
||||
program.~\cite{schottelius:_exter_p4_netpf}
|
||||
Due to the existing code complexity of the P4/NetFPGA platform, using
|
||||
the HDL module based approach is likely to be more sustainable.
|
||||
|
||||
% ok
|
||||
% ----------------------------------------------------------------------
|
||||
\subsection{\label{results:netpfga:stability}Stability}
|
||||
|
@ -241,13 +265,13 @@ hardware tests (compare figures \ref{fig:hwtestnico} and
|
|||
first card reported an additional ``10G\_Loopback'' failure. Due to
|
||||
the inability of setting table entries, no benchmarking was performed
|
||||
on the first NetFPGA card.
|
||||
\begin{figure}[h]
|
||||
\begin{figure}[htbp]
|
||||
\includegraphics[scale=1.4]{hwtestnico}
|
||||
\centering
|
||||
\caption{Hardware Test NetFPGA Card 1}
|
||||
\label{fig:hwtestnico}
|
||||
\end{figure}
|
||||
\begin{figure}[h]
|
||||
\begin{figure}[htbp]
|
||||
\includegraphics[scale=0.2]{hwtesthendrik}
|
||||
\centering
|
||||
\caption{Hardware Test NetFPGA Card 2~\cite{hendrik:_p4_progr_fpga_semes_thesis_sa}}
|
||||
|
@ -399,7 +423,7 @@ In this section we give an overview of the benchmark design
|
|||
and summarise the benchmarking results.
|
||||
% ----------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:design}Benchmark Design}
|
||||
\begin{figure}[h]
|
||||
\begin{figure}[htbp]
|
||||
\includegraphics[scale=0.6]{softwarenat64design}
|
||||
\centering
|
||||
\caption{Benchmark Design for NAT64 in Software Implementations}
|
||||
|
@ -429,28 +453,30 @@ warm up phase.\footnote{iperf -O 10 parameter, see section \ref{design:tests}.}
|
|||
% ok
|
||||
% ----------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:summary}Benchmark Summary}
|
||||
Overall \textbf{Tayga} has shown to be the slowest translator with an achieved
|
||||
bandwidth of \textbf{about 3 Gbit/s}, followed by \textbf{Jool} that translates at
|
||||
about \textbf{8 Gbit/s}. \textbf{Our solution} is the fastest with an almost line rate
|
||||
translation speed of about \textbf{9 Gbit/s}.
|
||||
Overall, \textbf{Tayga} proved to be the slowest translator with an
|
||||
achieved bandwidth of \textbf{about 3 Gbit/s}, followed by
|
||||
\textbf{Jool} that translates at about \textbf{8 Gbit/s}. \textbf{Our
|
||||
solution} is the fastest with an almost line rate translation speed
|
||||
of about \textbf{9 Gbit/s} (compare tables \ref{tab:benchmarkv6} and
|
||||
\ref{tab:benchmarkv4}).
|
||||
|
||||
The TCP based benchmarks show realistic numbers, while iperf reports
|
||||
above line rate speeds (up to 22 gbit/s on a 10gbit/s link)
|
||||
for UDP based benchmarks. For this reason we
|
||||
have summarised the UDP based benchmarks with their average loss
|
||||
instead of listing the bandwidth details. The ``adjusted bandwidth''
|
||||
in the UDP benchmarks incorporates the packets loss (compare tables
|
||||
\ref{tab:benchmarkv6v4udp} and \ref{tab:benchmarkv6v4udp}).
|
||||
above line rate speeds (up to 22 Gbit/s on a 10 Gbit/s link) for UDP
|
||||
based benchmarks. For this reason we have summarised the UDP based
|
||||
benchmarks with their average loss instead of listing the bandwidth
|
||||
details. The ``adjusted bandwidth'' in the UDP benchmarks incorporates
|
||||
the packet loss (compare tables \ref{tab:benchmarkv6v4udp} and
|
||||
\ref{tab:benchmarkv4v6udp}).
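The text only states that the adjusted bandwidth incorporates the packet loss. One plausible reading, given here as an assumption and not as the formula used for the tables, is to scale the bandwidth reported by iperf by the share of packets that arrived. The function name and parameters are hypothetical.

```python
def adjusted_bandwidth(reported_gbit_s, loss_percent):
    """Scale the reported bandwidth by the fraction of packets delivered.

    Assumed formula: adjusted = reported * (1 - loss / 100).
    """
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    return reported_gbit_s * (1 - loss_percent / 100)
```

With 0\% loss the adjusted value equals the reported value, which matches the P4/NetFPGA rows of the UDP tables.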
|
||||
|
||||
Both software solutions showed significant loss of packets in the UDP
|
||||
based benchmarks (Tayga: up to 91\%, Jool up to 71\%), while the
|
||||
P4/NetFPGA showed a maximum of 0.01\% packet loss. Packet loss is only
|
||||
recorded by iperf for UDP based benchmarks, as TCP packets are confirmed and
|
||||
resent if necessary.
|
||||
recorded by iperf for UDP based benchmarks, as TCP packets are
|
||||
confirmed and resent if necessary.
|
||||
|
||||
Tayga has the highest variation of results, which might be due to
|
||||
being fully CPU bound, even in the non-parallel benchmark. Jool has less
|
||||
variation and in general the P4/NetFPGA solution behaves almost
|
||||
being fully CPU bound, even in the non-parallel benchmark. Jool has
|
||||
less variation and in general the P4/NetFPGA solution behaves almost
|
||||
identically across different benchmark runs.
|
||||
|
||||
The CPU load for TCP based benchmarks with Jool was almost negligible,
|
||||
|
@ -460,10 +486,10 @@ utilised. When the translation for P4/NetFPGA happens within the
|
|||
NetFPGA card, there was no CPU utilisation visible on the NAT64 host.
|
||||
|
||||
We see lower bandwidth for translating IPv4 to IPv6 in all solutions.
|
||||
We suspect that this might be due to slighty increasing packet sizes that
|
||||
occur during this direction of translation. Not only does this vary
|
||||
the IPv4 versus IPv6 bandwidth, but it might also cause fragmentation
|
||||
that slows down.
|
||||
We suspect that this might be due to slightly increasing packet sizes
|
||||
that occur during this direction of translation. Not only does this
|
||||
vary the IPv4 versus IPv6 bandwidth, but it might also cause
|
||||
fragmentation that slows down the transfer.
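The size increase can be made concrete with a small sketch. It assumes an IPv4 header without options (20 bytes), the fixed 40 byte IPv6 header and a link MTU of 1500 bytes; the MTU and the function names are illustrative assumptions, not values taken from the benchmark setup.

```python
IPV4_HEADER_MIN = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40      # bytes, fixed IPv6 header


def translated_size(ipv4_packet_len, ipv4_header_len=IPV4_HEADER_MIN):
    """Size of a packet after its IPv4 header is replaced by an IPv6 header."""
    return ipv4_packet_len - ipv4_header_len + IPV6_HEADER


def exceeds_mtu(ipv4_packet_len, mtu=1500):
    """True if the translated packet no longer fits into the assumed MTU."""
    return translated_size(ipv4_packet_len) > mtu
```

Under these assumptions every translated packet grows by 20 bytes, so an IPv4 packet that already fills the MTU cannot be forwarded unchanged in size after translation.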
|
||||
|
||||
During the benchmarks with up to 10 parallel connections, no
|
||||
significant CPU load was registered on the load generator. However
|
||||
|
@ -484,11 +510,8 @@ Overall the performance of Tayga, a Linux user space program, is as
|
|||
expected. We were surprised about the good performance of Jool, which,
|
||||
while slower than the P4/NetFPGA solution, is almost on par with our solution.
|
||||
% ----------------------------------------------------------------------
|
||||
\newpage
|
||||
\subsection{\label{results:benchmark:v6v4tcp}IPv6 to IPv4 TCP
|
||||
Benchmark Results}
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{center}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
\hline
|
||||
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
|
||||
|
@ -505,16 +528,14 @@ P4 / NetPFGA & 9.28 / 9.28 / 9.29 & 9.28 / 9.28 / 9.29 & 9.28 / 9.28
|
|||
Parallel connections & 1 & 10 & 20 & 50 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv6 to IPv4 TCP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv6}
|
||||
\end{center}
|
||||
\end{table}
|
||||
%ok
|
||||
% ---------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:v4v6tcp}IPv4 to IPv6 TCP Benchmark Results}
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{center}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
\hline
|
||||
Implementation & \multicolumn{4}{|c|}{min/avg/max in Gbit/s} \\
|
||||
|
@ -531,18 +552,13 @@ P4 / NetPFGA & 8.51 / 8.53 / 8.55 & 9.28 / 9.28 / 9.29 & 9.29 / 9.29 /
|
|||
Parallel connections & 1 & 10 & 20 & 50 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv4 to IPv6 TCP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv4}
|
||||
\end{center}
|
||||
\end{table}
|
||||
|
||||
% ---------------------------------------------------------------------
|
||||
\newpage
|
||||
\subsection{\label{results:benchmark:v6v4udp}IPv6 to IPv4 UDP
|
||||
Benchmark Results}
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{center}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
\hline
|
||||
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
|
||||
|
@ -560,16 +576,14 @@ P4 / NetPFGA & 8.28 / 0\% / 8.28 & 9.26 / 0\% / 9.26 &
|
|||
Parallel connections & 1 & 10 & 20 & 50 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv6 to IPv4 UDP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv6v4udp}
|
||||
\end{center}
|
||||
\end{table}
|
||||
%ok
|
||||
% ---------------------------------------------------------------------
|
||||
\subsection{\label{results:benchmark:v4v6udp}IPv4 to IPv6 UDP Benchmark Results}
|
||||
\begin{table}[htbp]
|
||||
\begin{center}\begin{minipage}{\textwidth}
|
||||
\begin{center}
|
||||
\begin{tabular}{| c | c | c | c | c |}
|
||||
\hline
|
||||
Implementation & \multicolumn{4}{|c|}{avg bandwidth in Gbit/s / avg loss /
|
||||
|
@ -587,9 +601,8 @@ P4 / NetPFGA & 7.04 / 0\% / 7.04 & 9.58 / 0\% / 9.58 &
|
|||
Parallel connections & 1 & 10 & 20 & 50 \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{minipage}
|
||||
\caption{IPv4 to IPv6 UDP NAT64 Benchmark}
|
||||
\label{tab:benchmarkv6v4udp}
|
||||
\label{tab:benchmarkv4v6udp}
|
||||
\end{center}
|
||||
\end{table}
|
||||
%ok
|
||||
|
|
BIN
doc/Thesis.pdf
Binary file not shown.
|
@ -16,7 +16,7 @@ digraph G {
|
|||
tableentry [ label="Create Table Entry" ];
|
||||
tablematch [ label="Table Match" ];
|
||||
|
||||
reinject [ label="Reinject packet" ];
|
||||
reinject [ label="Reinject Packet" ];
|
||||
controller [ label="Controller Reads Packet" ]
|
||||
|
||||
deparser [ label="Deparser"];
|
||||
|
|