\chapter{\label{design}Design}
%** Design.tex: How was the problem attacked, what was the design
% the architecture
In this chapter we describe the architecture of our solution and our
design choices.

% ----------------------------------------------------------------------
\section{\label{design:configuration}IPv6 and IPv4 configuration}
The following sections refer to host and network configurations. In
this section we describe the IPv6 and IPv4 configurations as a basis
for the discussion.

All IPv6 addresses are from the documentation block
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular, the following
subnetworks and IPv6 addresses are used:
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
2001:db8:42::/64 & IPv6 host network \\
\hline
2001:db8:23::/96 & IPv6 mapping to the IPv4 Internet \\
\hline
2001:db8:42::42 & IPv6 host address \\
\hline
2001:db8:42::77 & IPv6 router address \\
\hline
2001:db8:42::a00:2a & In-network IPv6 address mapped to 10.0.0.42 (p4)\\
\hline
2001:db8:23::a00:2a & IPv6 address mapped to 10.0.0.42 (tayga) \\
\hline
2001:db8:23::2a & IPv6 address mapped to 10.0.0.42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 address and network overview}
\label{tab:ipv6address}
\end{center}
\end{table}

We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918}
from the 10.0.0.0/8 range as follows:

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
10.0.0.0/24 & IPv4 host network \\
\hline
10.0.1.0/24 & IPv4 network mapping to IPv6\\
\hline
10.0.0.77 & IPv4 router address\\
\hline
10.0.0.66 & In-network IPv4 address mapped to 2001:db8:42::42 (p4)\\
\hline
10.0.1.42 & IPv4 address mapped to 2001:db8:42::42 (tayga)\\
\hline
10.0.1.66 & IPv4 address mapped to 2001:db8:42::42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 address and network overview}
\label{tab:ipv4address}
\end{center}
\end{table}
% ok
% ----------------------------------------------------------------------
\section{\label{design:tests}NAT64 Verification}
We use socat~\cite{rieger:_multip} to verify basic operation of the
NAT64 gateway and iperf~\cite{dugan:_tcp_udp_sctp} to test the stability
of the implementation and to measure bandwidth.
In particular, we use
the commands listed in table~\ref{tab:nat64verification}. The socat
commands allow interactive testing of TCP and UDP connections, while
the iperf commands fully utilise the available bandwidth with test
data.
The socat and iperf commands are used to verify all three NAT64
implementations (p4, tayga, jool).
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Command} & \textbf{Example} & \textbf{Description} \\
\hline
\texttt{socat - TCP6:HOST:PORT} & socat -
TCP6:[2001:db8:42::a00:2a]:2345 & Connect via IPv6/TCP\\
& & to IPv4 host\\
%\hline
\texttt{socat - UDP6:HOST:PORT} & socat -
UDP6:[2001:db8:42::a00:2a]:2345 & Connect via IPv6/UDP \\ & & to IPv4 host\\
%\hline
\texttt{socat - TCP:HOST:PORT} & socat -
TCP:10.0.1.42:2345 & Connect via IPv4/TCP \\ & & to IPv6 host \\
%\hline
\texttt{socat - UDP:HOST:PORT} & socat -
UDP:10.0.1.42:2345 & Connect via IPv4/UDP \\ & & to IPv6 host \\
\hline
\texttt{socat - UDP6-LISTEN:PORT} & socat -
UDP6-LISTEN:2345 & Listen on IPv6/UDP \\
%\hline
\texttt{socat - TCP6-LISTEN:PORT} & socat -
TCP6-LISTEN:2345 & Listen on IPv6/TCP \\
%\hline
\texttt{socat - UDP-LISTEN:PORT} & socat -
UDP-LISTEN:2345 & Listen on IPv4/UDP \\
%\hline
\texttt{socat - TCP-LISTEN:PORT} & socat -
TCP-LISTEN:2345 & Listen on IPv4/TCP \\
\hline
\texttt{iperf3 -PROTO -p PORT} & iperf3 -4 -p 2345 & IPv4 iperf server\\
\texttt{-B IP -s} & -B 10.0.0.42 -s &\\
& iperf3 -6 -p 2345 & IPv6 iperf server\\
& -B 2001:db8:42::42 -s & \\
\hline
\texttt{iperf3 -PROTO -p PORT } & iperf3 -6 -p 2345& Connect to iperf server\\
\texttt{-O IGNORETIME -t RUNTIME} & -O 10 -t 190 &
Run for 190 seconds, \\
& & skip first 10 seconds\\
\texttt{-P PARALLEL -c IP} & -P20 -c 2001:db8:23::2a &
with 20 sessions\\
& & connecting to\\
& & 2001:db8:23::2a\\
\texttt{iperf3 -PROTO -p PORT} & & Same as above,\\
\texttt{-O IGNORETIME -t RUNTIME} & & but connect via UDP\\
\texttt{-P PARALLEL -c IP} & & \\
\texttt{-u -b0} & & \\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 verification commands}
\label{tab:nat64verification}
\end{center}
\end{table}
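
The iperf3 client invocations in table~\ref{tab:nat64verification}
differ only in a few parameters. As an illustration, the following
Python sketch assembles the argument list for one benchmark run. The
function name \texttt{iperf3\_client} and its defaults are assumptions
made for this example and are not part of the thesis tooling; only the
iperf3 options already listed in the table are used.

\begin{verbatim}
```python
def iperf3_client(server, port=2345, ipv6=True, omit=10, runtime=190,
                  parallel=20, udp=False):
    """Return an iperf3 client command line as an argument list.

    Hypothetical helper: the defaults mirror the example column of the
    verification table (port 2345, 190 s runtime, first 10 s omitted,
    20 parallel sessions).
    """
    args = ["iperf3", "-6" if ipv6 else "-4", "-p", str(port),
            "-O", str(omit), "-t", str(runtime),
            "-P", str(parallel), "-c", server]
    if udp:
        # UDP mode with unlimited target bandwidth
        args += ["-u", "-b0"]
    return args


# TCP run from the IPv6 side towards the mapped IPv4 host
print(" ".join(iperf3_client("2001:db8:23::2a")))
```
\end{verbatim}

The returned list can be passed unchanged to
\texttt{subprocess.run}, which avoids quoting problems with IPv6
addresses.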
% ----------------------------------------------------------------------
\section{\label{design:nat64}NAT64 with P4}
\begin{figure}[h]
\includegraphics[scale=0.4]{switchdesign}
\centering
\caption{P4 Switch Architecture}
\label{fig:switchdesign}
\end{figure}
In section~\ref{background:transition} we discussed different
translation mechanisms for IPv6 and IPv4. In this thesis we focus on
the translation mechanisms stateless and stateful NAT64. While higher
layer protocol dependent translations are more flexible, this topic
has already been addressed in
\cite{nico18:_implem_layer_ipv4_ipv6_rever_proxy}, and the focus of
this thesis is on the practicability of high speed NAT64 with P4.
The high level design can be seen in figure~\ref{fig:switchdesign}: a
P4 capable switch runs our code to provide NAT64
functionality. A P4 switch cannot manage its tables on its own and
needs support from a controller for this. The controller also
handles unknown packets and can modify the runtime
configuration of the switch. This is especially useful in the case of
stateful NAT64.
If only static table entries
are required, they can usually be added when the P4 switch starts
and the controller can be omitted. However, stateful
NAT64 requires a controller to create session entries in the
switch tables.
The P4 switch can use any protocol to communicate with the controller, as
the connection to the controller is implemented as a separate Ethernet
port.
\begin{figure}[h]
\includegraphics[scale=0.4]{v6-v4-standard}
\centering
\caption{Standard NAT64 translation}
\label{fig:v6v4standard}
\end{figure}

Software NAT64 solutions typically require routing to be applied to
transport the packet to the NAT64 translator, as shown in
figure~\ref{fig:v6v4standard}.

Our design differs here: while routing could be used as described
above, NAT64 with P4 does not require any routing to be set up.
Figure~\ref{fig:v6v4mixed} shows the network design that we realise using
P4. This design has two advantages: first, it reduces the number
of devices a packet has to pass and thus directly reduces the RTT;
second, it allows translation of IP addresses within the same logical
network segment.
\begin{figure}[h]
\includegraphics[scale=0.4]{v6-v4-mixed}
\centering
\caption{In-network NAT64 translation}
\label{fig:v6v4mixed}
\end{figure}

% ----------------------------------------------------------------------
\section{\label{design:statelessnat64}Stateless NAT64}
As seen in section~\ref{background:transition:stateless}, stateless
NAT64 can be implemented by matching on various factors. Our design for
stateless NAT64 depends on the capabilities of the environment and is
summarised in table~\ref{tab:statelessnat64factors}.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Implementation} & \textbf{NAT64 match}\\
\hline
P4/BMV2 & LPM (both directions)\\
& and individual entries (both directions)\\
\hline
P4/NetFPGA & Individual entries\\
\hline
Tayga & LPM (IPv6 to IPv4) and individual entries (IPv4 to IPv6)\\
\hline
Jool & LPM (both directions)\\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 match factors}
\label{tab:statelessnat64factors}
\end{center}
\end{table}
When using LPM for translating from IPv6 to IPv4, a /96 IPv6 network
is configured to cover the whole IPv4 Internet and the individual
IPv4 address is appended to the prefix (compare section
\ref{design:configuration}). We also use LPM to match on an IPv4
subnetwork that translates to an IPv6 subnetwork. Individual
entries are configured differently depending on the implementation:
limitations in the P4/NetFPGA environment require the use of individual
table entries. Jool supports individual entries as a special case of LPM,
with a network mask matching only one IP address. Tayga
supports LPM for translation from IPv6 to IPv4, but requires individual
entries for translating from IPv4 to IPv6. Our P4/BMV2 implementation
offers the highest degree of flexibility, as it supports both individual
table entries and LPM table entries.
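
The prefix based mapping described above can be illustrated with a
short Python sketch. It uses only the standard \texttt{ipaddress}
module; the function names \texttt{ipv4\_to\_ipv6} and
\texttt{ipv6\_to\_ipv4} are chosen for this example and do not refer
to code of any of the implementations. With the prefix
2001:db8:23::/96 and the host 10.0.0.42, the sketch yields the Tayga
entry of table~\ref{tab:ipv6address}.

\begin{verbatim}
```python
import ipaddress


def ipv4_to_ipv6(prefix, ipv4):
    """Append the 32 bit IPv4 address to a /96 IPv6 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("expected a /96 prefix")
    low = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(net.network_address) | low)


def ipv6_to_ipv4(prefix, ipv6):
    """Recover the IPv4 address from the lowest 32 bits."""
    net = ipaddress.IPv6Network(prefix)
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in net:
        raise ValueError("address is outside the NAT64 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


mapped = ipv4_to_ipv6("2001:db8:23::/96", "10.0.0.42")
print(mapped)                                          # IPv6 view
print(ipv6_to_ipv4("2001:db8:23::/96", str(mapped)))   # back to IPv4
```
\end{verbatim}

Because the IPv4 address occupies exactly the host part of the /96
network, a single LPM entry is sufficient for the IPv6 to IPv4
direction.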
% ----------------------------------------------------------------------
\section{\label{design:statefulnat64}Stateful NAT64}
Similar to stateless NAT64, the design of stateful NAT64 depends on
the features of the individual implementation. As pointed out in section
\ref{background:transition:statefulnat64}, stateful NAT64 is very
similar to stateless NAT64, with the main difference being an
additional stateful table that helps to create 1:n mappings.
We use different approaches within the implementations
to solve this problem:
\begin{itemize}
\item For P4/BMV2 and P4/NetFPGA, a Python controller handles packets
that do not have a table entry, sets the table entry in the P4 switch
and afterwards inserts the original packet back into the switch.
\item With Tayga we rely on the Linux kernel NAT44 capabilities.
\item Jool implements its own stateful mechanism based on port
ranges.
\end{itemize}
All methods, though, operate in a very similar fashion: a ``controller''
inspects the IPv6 packet and, depending on the source address,
destination address, protocol (TCP, UDP,
ICMP, ICMP6, etc.) and the protocol ID (source / destination TCP/UDP
port, ICMP identifier), selects an outgoing IPv4 address and a source
port or ICMP identifier.
In the case of Jool and Tayga this decision is based on a session table
inside the Linux kernel; in the case of P4 it is based on a
session table inside the Python controller. While Jool and Tayga
both support cleaning up old session entries,
our P4 based solution does not support this feature at the moment.
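
The following Python sketch illustrates the idea of such a session
table. It is not the controller used in this thesis: the class name
\texttt{SessionTable}, the outgoing address 10.0.0.77 and the port
range are assumptions made for the example. For simplicity it shares
one identifier pool between all protocols and, like our P4 based
solution, never removes old sessions.

\begin{verbatim}
```python
class SessionTable:
    """Minimal 1:n mapping of IPv6 flows to one outgoing IPv4 address."""

    def __init__(self, ipv4_addr, ports=range(1024, 65536)):
        self.ipv4_addr = ipv4_addr
        self.free_ports = iter(ports)   # next unused port / ICMP identifier
        self.sessions = {}              # IPv6 flow -> (IPv4 address, id)
        self.reverse = {}               # (protocol, id) -> IPv6 flow

    def lookup_or_create(self, src6, dst6, proto, src_id):
        """Return the IPv4 source (address, id) for an IPv6 flow.

        src_id is the source port for TCP/UDP or the ICMP identifier.
        Raises StopIteration once the identifier pool is exhausted.
        """
        key = (src6, dst6, proto, src_id)
        if key not in self.sessions:
            out_id = next(self.free_ports)
            self.sessions[key] = (self.ipv4_addr, out_id)
            self.reverse[(proto, out_id)] = key
        return self.sessions[key]


table = SessionTable("10.0.0.77")
print(table.lookup_or_create("2001:db8:42::42", "2001:db8:23::a00:2a",
                             "tcp", 40000))
```
\end{verbatim}

The \texttt{reverse} dictionary is what allows packets arriving from
the IPv4 side to be translated back to the original IPv6 flow.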
% ----------------------------------------------------------------------
\section{\label{design:bmv2}P4/BMV2}
\begin{figure}[h]
\begin{verbatim}
/* checksumming for icmp6_na_ns_option */
update_checksum_with_payload(meta.chk_icmp6_na_ns == 1,
    {
        hdr.ipv6.src_addr, /* 128 */
        hdr.ipv6.dst_addr, /* 128 */
        meta.cast_length,  /* 32 */
        24w0,              /* 24 0's */
        PROTO_ICMP6,       /* 8 */
        hdr.icmp6.type,    /* 8 */
        hdr.icmp6.code,    /* 8 */

        hdr.icmp6_na_ns.router,
        hdr.icmp6_na_ns.solicitated,
        hdr.icmp6_na_ns.override,
        hdr.icmp6_na_ns.reserved,
        hdr.icmp6_na_ns.target_addr,

        hdr.icmp6_option_link_layer_addr.type,
        hdr.icmp6_option_link_layer_addr.ll_length,
        hdr.icmp6_option_link_layer_addr.mac_addr
    },
    hdr.icmp6.checksum,
    HashAlgorithm.csum16
);
\end{verbatim}
\centering
\caption{P4/BMV2 checksumming}
\label{fig:bmv2checksum}
\end{figure}
The software emulated switch, which is implemented using
Open vSwitch~\cite{openvswitch} and the
behavioral model~\cite{_implem_your_switc_target_with_bmv2},
offers the fastest and easiest way of P4 development. All NAT64
features are tested first on P4/BMV2 and in a second step ported to
P4/NetFPGA and modified where necessary.
The development closely follows the general design shown in section
\ref{design:nat64}.
As outlined in section~\ref{background:checksums}, checksums inside
higher level protocols need to be adjusted after translation.
Within the software emulation, checksums can be
computed with two different methods:
\begin{itemize}
\item Recalculating the checksum by inspecting headers and payload
\item Calculating the difference between the translated headers
\end{itemize}
The BMV2 model is sophisticated and provides direct support
for calculating the checksum over the payload. This allows the BMV2
model to operate as a full featured host, including advanced features
like responding to ICMP6 Neighbor Discovery requests~\cite{rfc4861}
that include payload checksums. Sample code that calculates the
required checksum for answering NDP queries is shown in figure
\ref{fig:bmv2checksum}. The code shows how the field
\texttt{hdr.icmp6.checksum} is updated with the \texttt{csum16} method
depending on the IPv6 and ICMP6 headers as well as the payload. The
second option of using the differences is described in section
\ref{design:netpfga}.
% ok
% ----------------------------------------------------------------------
\section{\label{design:netpfga}P4/NetFPGA}
While the P4-NetFPGA project~\cite{netfpga:_p4_netpf_public_github}
allows compiling P4 to the NetFPGA, the design varies slightly.
In particular, the NetFPGA P4 compiler does not support reading
the payload. For this reason it also does not support
creating the checksum based on the payload.
To support checksum modifications in NAT64 on the NetFPGA, the
checksum is calculated on the NetFPGA using differences between
the IPv6 and IPv4 headers. Figure~\ref{fig:checksumbydiff} shows an
excerpt of the code used for calculating checksums on the NetFPGA.
\begin{figure}[h]
\begin{verbatim}
action v4sum() {
    bit<16> tmp = 0;

    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[15:0];  // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[31:16]; // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[15:0];  // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[31:16]; // 16 bit

    tmp = tmp + (bit<16>) hdr.ipv4.totalLen -20;    // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.protocol;        // 8 bit

    meta.v4sum = ~tmp;
}

/* analogue code for v6sum skipped */

action delta_tcp_from_v6_to_v4()
{
    v6sum();
    v4sum();

    bit<17> tmp = (bit<17>) hdr.tcp.checksum + (bit<17>) meta.v4sum;
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }
    tmp = tmp + (bit<17>) (0xffff - meta.v6sum);
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }

    hdr.tcp.checksum = (bit<16>) tmp;
}
\end{verbatim}
\centering
\caption{Calculating checksum based on header differences}
\label{fig:checksumbydiff}
\end{figure}
The checksums for IPv4, TCP, UDP and ICMP6 are all based on the
``Internet Checksum''~\cite{rfc791,rfc1071}.
Its calculation can be summarised as follows:
\begin{quote}
The checksum field is the 16-bit one's complement of the one's
complement sum of all 16-bit words in the header. For purposes of
computing the checksum, the value of the checksum field
is zero.\footnote{Quote from Wikipedia~\cite{wikipedia:_ipv4}.}
\end{quote}
As the calculation mainly depends on (one's complement) sums, the
checksums after translating the protocol can be corrected by
subtracting the differences of the relevant fields. It is notable that
not the full headers are used, but the pseudo headers (compare figures
\ref{fig:ipv6pseudoheader} and \ref{fig:ipv4pseudoheader}).
To compensate for the carry bit, our code uses 17 bit integers and
adds the carry back into the lower 16 bits.
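
The equivalence of the two methods can be checked with a small Python
sketch. It is a model of the arithmetic only and not the code running
on the NetFPGA; all function names are chosen for this example. The
incremental update follows the form $HC' = \neg(\neg HC + \neg m +
m')$ known from RFC 1624, where $m$ is the sum over the old (IPv6)
pseudo header and $m'$ the sum over the new (IPv4) pseudo header.

\begin{verbatim}
```python
import ipaddress
import struct


def fold(total):
    """Add the carries back in until the value fits into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_words(data):
    """One's complement sum of all 16 bit words in data."""
    if len(data) % 2:
        data += b"\x00"
    return fold(sum(struct.unpack(f"!{len(data) // 2}H", data)))


def pseudo6(src, dst, length, proto):
    """IPv6 pseudo header: addresses, 32 bit length, 24 zero bits, protocol."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, proto))


def pseudo4(src, dst, length, proto):
    """IPv4 pseudo header: addresses, zero byte, protocol, 16 bit length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, proto, length))


def checksum(pseudo, segment):
    """Full recalculation over pseudo header and segment."""
    return ~csum_words(pseudo + segment) & 0xFFFF


def checksum_by_diff(old, old_pseudo, new_pseudo):
    """Correct an existing checksum using only the pseudo headers."""
    total = ((~old & 0xFFFF)
             + (~csum_words(old_pseudo) & 0xFFFF)
             + csum_words(new_pseudo))
    return ~fold(total) & 0xFFFF


# TCP header with zeroed checksum field plus a short payload
segment = struct.pack("!HHIIBBHHH", 40000, 2345, 1, 0, 5 << 4, 0x02,
                      65535, 0, 0) + b"hello"
p6 = pseudo6("2001:db8:42::42", "2001:db8:23::a00:2a", len(segment), 6)
p4 = pseudo4("10.0.1.42", "10.0.0.42", len(segment), 6)
v6_checksum = checksum(p6, segment)
print(hex(checksum(p4, segment)), hex(checksum_by_diff(v6_checksum, p6, p4)))
```
\end{verbatim}

Both printed values agree, although \texttt{checksum\_by\_diff} never
looks at the TCP segment itself. This is the property that makes the
method usable on a target that cannot read the payload.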
% FIXME: add note to python script / checksum diffing


% ----------------------------------------------------------------------
\section{\label{design:benchmarks}Benchmarks}
The benchmarks were performed on two hosts, a load generator and a
NAT64 translator. Both hosts were equipped with a dual port
Intel X520 10 Gbit/s network card. The hosts were connected using DAC
cables without any equipment in between. TCP offloading was enabled in
the X520 cards. Figure~\ref{fig:softwarenat64design}
shows the network setup.
\begin{figure}[h]
\includegraphics[scale=0.5]{softwarenat64design}
\centering
\caption{NAT64 in software benchmark}
\label{fig:softwarenat64design}
\end{figure}
When testing the NetFPGA/P4 performance, the X520 cards in the NAT64
translator were disconnected and instead the NetFPGA ports were
connected, as shown in figure~\ref{fig:netpfgadesign}. The load
generator is equipped with a quad core CPU (Intel(R) Core(TM) i7-6700
CPU @ 3.40GHz) with hyperthreading enabled and 16 GB RAM. The NAT64
translator is also equipped with a quad core CPU (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz) and 16 GB RAM.

The first 10 seconds of the benchmark were excluded to avoid the TCP
warm up phase.\footnote{iperf -O 10 parameter}
\begin{figure}[h]
\includegraphics[scale=0.5]{netpfgadesign}
\centering
\caption{NAT64 with NetFPGA benchmark}
\label{fig:netpfgadesign}
\end{figure}
% ok