\chapter{\label{design}Design}
%** Design.tex: How was the problem attacked, what was the design
% the architecture
In this chapter we describe the architecture of our solution.

% ----------------------------------------------------------------------
\section{\label{design:configuration}IPv6 and IPv4 configuration}
The following sections refer to host and network configurations. In
this section we describe the IPv6 and IPv4 configurations as a basis
for the discussion.

All IPv6 addresses are from the documentation block
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular, the following
subnetworks and IPv6 addresses are used:

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
2001:db8:42::/64 & IPv6 host network \\
\hline
2001:db8:23::/96 & IPv6 mapping to the IPv4 Internet \\
\hline
2001:db8:42::42 & IPv6 host address \\
\hline
2001:db8:42::77 & IPv6 router address \\
\hline
2001:db8:42::a00:2a & In-network IPv6 address mapped to 10.0.0.42 (p4)\\
\hline
2001:db8:23::a00:2a & IPv6 address mapped to 10.0.0.42 (tayga) \\
\hline
2001:db8:23::2a & IPv6 address mapped to 10.0.0.42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 address and network overview}
\label{tab:ipv6address}
\end{center}
\end{table}

We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918}
from the 10.0.0.0/8 range as follows:

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
10.0.0.0/24 & IPv4 host network \\
\hline
10.0.1.0/24 & IPv4 network mapping to IPv6\\
\hline
10.0.0.77 & IPv4 router address\\
\hline
10.0.0.66 & In-network IPv4 address mapped to 2001:db8:42::42 (p4)\\
\hline
10.0.1.42 & IPv4 address mapped to 2001:db8:42::42 (tayga)\\
\hline
10.0.1.66 & IPv4 address mapped to 2001:db8:42::42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 address and network overview}
\label{tab:ipv4address}
\end{center}
\end{table}


% ----------------------------------------------------------------------
\section{\label{design:nat64}NAT64 with P4} % FIXME: elaborate
\begin{figure}[h]
\includegraphics[scale=0.5]{switchdesign}
\centering
\caption{P4 Switch Architecture}
\label{fig:switchdesign}
\end{figure}
In section \ref{background:transition} we discussed different
translation mechanisms for IPv6 and IPv4. In this thesis we focus on
the translation mechanisms stateless and stateful NAT64. While higher
layer protocol dependent translations are more flexible, this topic
has already been addressed in
\cite{nico18:_implem_layer_ipv4_ipv6_rever_proxy} and the focus in
this thesis is on the practicability of high speed NAT64.
The high level design can be seen in figure \ref{fig:switchdesign}: a
P4 capable switch is running our code to provide NAT64
functionality. A P4 switch cannot manage its tables on its own and
needs support for this from a controller. The controller also
handles unknown packets and can modify the runtime
configuration of the switch. This is especially useful in the case of
stateful NAT64.
If only static table entries
are required, they can usually be added at the start of a P4 switch
and the controller can be omitted. However, stateful
NAT64 requires the use of a controller to create session entries in the
switch tables.
The P4 switch can use any protocol to communicate with the controller, as
the connection to the controller is implemented as a separate Ethernet
port.
\begin{figure}[h]
\includegraphics[scale=0.4]{v6-v4-standard}
\centering
\caption{Standard NAT64 translation}
\label{fig:v6v4standard}
\end{figure}

Software NAT64 solutions typically require routing to be applied to
transport the packet to the NAT64 translator as shown in
figure \ref{fig:v6v4standard}.

Our design differs here: while routing could be used as described
above, NAT64 with P4 does not require any routing to be set up. Figure
\ref{fig:v6v4mixed} shows a network design that can be realised using
P4. This design has two advantages: first, it reduces the number
of devices a packet has to pass and thus directly reduces the RTT.
Second, it allows translation of IP addresses within the same logical
network segment.

\begin{figure}[h]
\includegraphics[scale=0.4]{v6-v4-mixed}
\centering
\caption{In-network NAT64 translation}
\label{fig:v6v4mixed}
\end{figure}

The design allows our solution to be used as a standard NAT64
translation method or as an in-network NAT64 translation (compare
figures \ref{fig:v6v4standard} and \ref{fig:v6v4mixed}). The
controller is implemented in Python, the NAT64 solution is implemented
in P4. The network design is shown in figure \ref{fig:networkdesign}.
\begin{figure}[h]
\includegraphics[scale=0.5]{networkdesignnat64}
\centering
\caption{Network design}
\label{fig:networkdesign}
\end{figure}

% from intro:

Figure \ref{fig:v6v4standard} shows the standard NAT64
approach and figure \ref{fig:v6v4mixed} shows our solution.
%% \begin{figure}[h]
%% \includegraphics[scale=0.6]{v6-v4-innetwork}
%% \centering
%% \caption{In Network NAT64 translation}
%% \label{fig:v6v4innetwork}
%% \end{figure}

Describe network layouts
\begin{verbatim}
- IPv6 subnet 2001:db8::/32
- IPv6 hosts are in 2001:db8:6::/64
- IPv6 default router (::/0) is 2001:db8:6::42/64
- IPv4 mapped Internet "NAT64 prefix" 2001:db8:4444::/96 (should
  go into a table)
- IPv4 hosts are in 10.0.4.0/24
- IPv6 in IPv4 mapped hosts are in 10.0.6.0/24
- IPv4 default router = 10.0.0.42

\end{verbatim}

Describe testing methods
\begin{verbatim}
def test_v4_udp_to_v6(self):
    print('mx h3 "echo V4-OK | socat - UDP:10.1.1.1:2342"')
    print('mx h1 "echo V6-OK | socat - UDP-LISTEN:2342"')

    return

p4@ubuntu:~$ mx h1 "echo V6-OK | socat - UDP6-LISTEN:2342"
p4@ubuntu:~/master-thesis/bin$ mx h3 "echo V4-OK | socat - UDP:10.1.1.1:2342"

while true; do mx h3 "echo V4-OK | socat - TCP-LISTEN:2343"; sleep 2;
done

while true; do mx h1 "echo V6-OK | socat - \
    TCP6:[2001:db8:1::a00:1]:2343"; sleep 2; done

mx h1 "echo V6-OK | socat - TCP6:[2001:db8:1::a00:1]:2343"

\end{verbatim}
% ----------------------------------------------------------------------
% ----------------------------------------------------------------------
\section{\label{design:statelessnat64}Stateless NAT64} % FIXME: write
The stateless NAT64 design only uses /96 IPv6 prefixes. An IPv4
address is mapped to an IPv6 address by adding its 32 bit value to the
/96 prefix.
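
As an illustration of mapping by addition, the following Python sketch
reproduces two entries of tables \ref{tab:ipv6address} and
\ref{tab:ipv4address}. It is not part of the P4 implementation: the
function names are hypothetical, and treating 2001:db8:42::/96 as the
in-network prefix is an assumption derived from the table entries.

```python
import ipaddress


def map_v4_to_v6(v6_prefix: str, v4_address: str) -> str:
    """Add the 32 bit IPv4 address to the base of a /96 IPv6 prefix."""
    network = ipaddress.IPv6Network(v6_prefix)
    if network.prefixlen != 96:
        raise ValueError("only /96 prefixes are supported")
    mapped = int(network.network_address) + int(ipaddress.IPv4Address(v4_address))
    return str(ipaddress.IPv6Address(mapped))


def map_v6_to_v4(v6_prefix: str, v4_network: str, v6_address: str) -> str:
    """Add the offset of the IPv6 address within its prefix to the IPv4 base.

    Assumption for this example: the offset has to fit into the IPv4 network.
    """
    v6_net = ipaddress.IPv6Network(v6_prefix)
    v4_net = ipaddress.IPv4Network(v4_network)
    offset = int(ipaddress.IPv6Address(v6_address)) - int(v6_net.network_address)
    if not 0 <= offset < v4_net.num_addresses:
        raise ValueError("IPv6 address does not fit into the IPv4 network")
    return str(v4_net.network_address + offset)


if __name__ == "__main__":
    # 10.0.0.42 is 0x0a00002a, so it appears as a00:2a in the IPv6 address.
    print(map_v4_to_v6("2001:db8:23::/96", "10.0.0.42"))
    # The offset 0x42 (66 decimal) is added to 10.0.0.0.
    print(map_v6_to_v4("2001:db8:42::/96", "10.0.0.0/24", "2001:db8:42::42"))
```

Restricting the design to /96 prefixes keeps the mapping a single
128 bit addition, because the IPv4 address exactly fills the remaining
32 bits.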
% ----------------------------------------------------------------------
\section{\label{design:statefulnat64}Stateful NAT64} % FIXME: write
\begin{itemize}
\item The controller selects the ``outgoing'' IPv4 address range, which
  is the base for the sessions.
\item The IPv4 addresses can be ``random'' (in our test case), but need
  to be unique.
\item The switch does not need to know about the range, only about
  the sessions.
\item On session creation, the controller selects a ``random'' IP
  address (ring?).
\item On session creation, the controller selects a ``random'' port
  (next in range?).
\item On session creation, the controller adds its choice to two
  tables: incoming and outgoing.
\end{itemize}

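
The session handling outlined above could be sketched in Python, the
language of the controller, as follows. This is a minimal sketch and
not the controller code: the class and method names are hypothetical,
sessions are keyed only by IPv6 source address and port, session expiry
is not handled, and the ``next in range'' strategy is used for
selecting addresses and ports.

```python
import itertools


class SessionAllocator:
    """Hypothetical controller-side bookkeeping for stateful NAT64."""

    def __init__(self, v4_addresses, ports):
        # Every (address, port) pair is produced exactly once, which
        # guarantees that the chosen mappings are unique.
        self._free = itertools.product(v4_addresses, ports)
        self.outgoing = {}  # (IPv6 address, port) -> (IPv4 address, port)
        self.incoming = {}  # (IPv4 address, port) -> (IPv6 address, port)

    def create_session(self, v6_address, v6_port):
        """Return the IPv4 mapping of a session, creating it if needed."""
        key = (v6_address, v6_port)
        if key in self.outgoing:
            return self.outgoing[key]
        try:
            mapping = next(self._free)
        except StopIteration:
            raise RuntimeError("no free IPv4 address/port pair left") from None
        # The choice is recorded in both directions. A real controller
        # would install these entries in the switch tables at this point.
        self.outgoing[key] = mapping
        self.incoming[mapping] = key
        return mapping


if __name__ == "__main__":
    # Example values only; they are not taken from the test setup.
    allocator = SessionAllocator(["10.0.1.1", "10.0.1.2"], range(1024, 65536))
    print(allocator.create_session("2001:db8:42::42", 40000))
```

Keeping the address range only in the controller matches the notes
above: the switch only ever sees the resulting session entries.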
% ----------------------------------------------------------------------
\section{\label{Design:BMV2}BMV2}
Development of the thesis took place on a software emulated switch
that is implemented using Open vSwitch~\cite{openvswitch}
and the behavioral model~\cite{_implem_your_switc_target_with_bmv2}.
The development closely followed the general design shown in section
\ref{design:nat64}. Within the software emulation checksums can be
computed with two different methods:
\begin{itemize}
\item Recalculating the checksum by inspecting headers and payload
\item Calculating the difference between the translated headers
\end{itemize}
The BMV2 model is rather sophisticated and provides many standard
features including checksumming over payload. This allows the BMV2
model to operate as a full featured host, including advanced features
like responding to ICMP6 neighbor discovery requests~\cite{rfc4861}
that include payload checksums.
Typical code for creating the checksum is shown in figure
\ref{fig:checksum}.
\begin{figure}[h]
\begin{verbatim}
    /* checksumming for icmp6_na_ns_option */
    update_checksum_with_payload(meta.chk_icmp6_na_ns == 1,
        {
            hdr.ipv6.src_addr, /* 128 */
            hdr.ipv6.dst_addr, /* 128 */
            meta.cast_length,  /* 32 */
            24w0,              /* 24 0's */
            PROTO_ICMP6,       /* 8 */
            hdr.icmp6.type,    /* 8 */
            hdr.icmp6.code,    /* 8 */

            hdr.icmp6_na_ns.router,
            hdr.icmp6_na_ns.solicitated,
            hdr.icmp6_na_ns.override,
            hdr.icmp6_na_ns.reserved,
            hdr.icmp6_na_ns.target_addr,

            hdr.icmp6_option_link_layer_addr.type,
            hdr.icmp6_option_link_layer_addr.ll_length,
            hdr.icmp6_option_link_layer_addr.mac_addr
        },
        hdr.icmp6.checksum,
        HashAlgorithm.csum16
    );
\end{verbatim}
\centering
\caption{Calculating the ICMP6 checksum including the payload}
\label{fig:checksum}
\end{figure}

% ----------------------------------------------------------------------
\section{\label{Design:NetPFGA}NetFPGA} % FIXME: relate things
While the P4-NetFPGA project~\cite{netfpga:_p4_netpf_public_github}
allows compiling P4 to the NetFPGA, the design varies slightly.
In particular, the NetFPGA P4 compiler does not support reading
the payload. For this reason it also does not support
creating the checksum based on the payload.
To support checksum modifications in NAT64 on the NetFPGA, the
checksum was calculated on the NetFPGA using the differences between
the IPv6 and IPv4 headers. Figure \ref{fig:checksumbydiff} shows an
excerpt of the code used for calculating checksums on the NetFPGA.
\begin{figure}[h]
\begin{verbatim}
    action v4sum() {
        bit<16> tmp = 0;

        tmp = tmp + (bit<16>) hdr.ipv4.src_addr[15:0];  // 16 bit
        tmp = tmp + (bit<16>) hdr.ipv4.src_addr[31:16]; // 16 bit
        tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[15:0];  // 16 bit
        tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[31:16]; // 16 bit

        tmp = tmp + (bit<16>) hdr.ipv4.totalLen -20;    // 16 bit
        tmp = tmp + (bit<16>) hdr.ipv4.protocol;        // 8 bit

        meta.v4sum = ~tmp;
    }

    /* analogous code for v6sum skipped */

    action delta_tcp_from_v6_to_v4()
    {
        v6sum();
        v4sum();

        bit<17> tmp = (bit<17>) hdr.tcp.checksum + (bit<17>) meta.v4sum;
        if (tmp[16:16] == 1) {
            tmp = tmp + 1;
            tmp[16:16] = 0;
        }
        tmp = tmp + (bit<17>) (0xffff - meta.v6sum);
        if (tmp[16:16] == 1) {
            tmp = tmp + 1;
            tmp[16:16] = 0;
        }

        hdr.tcp.checksum = (bit<16>) tmp;
    }

\end{verbatim}
\centering
\caption{Calculating checksum based on header differences}
\label{fig:checksumbydiff}
\end{figure}
The checksums for IPv4, TCP, UDP and ICMP6 are all based on the
``Internet Checksum''~\cite{rfc791,rfc1071}.
Its calculation can be summarised as follows:
\begin{quote}
  The checksum field is the 16-bit one's complement of the one's
  complement sum of all 16-bit words in the header. For purposes of
  computing the checksum, the value of the checksum field
  is zero.\footnote{Quote from Wikipedia~\cite{wikipedia:_ipv4}.}
\end{quote}
As the calculation mainly depends on (one's complement) sums, the
checksums after translating the protocol can be corrected by
subtracting the differences of the relevant fields. Note that the
pseudo headers are used for this, not the full headers (compare figures
\ref{fig:ipv6pseudoheader} and \ref{fig:ipv4pseudoheader}).
To handle the carry bit, our code uses 17 bit integers: if bit 16 is
set after an addition, it is cleared and 1 is added to the lower 16
bits.
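
The correction by differences can be illustrated with a short Python
sketch. It is not the code running on the NetFPGA: the function names
are chosen for this example, and the update follows the incremental
formula $HC' = \sim(\sim HC + \sim m + m')$ from RFC 1624. The sketch
assumes that the upper layer length and protocol number contribute the
same sum to both pseudo headers, so only the address words are
exchanged.

```python
def ones_complement_sum(words):
    """Sum 16 bit words, adding each carry back into the lower 16 bits."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def checksum(words):
    """Internet checksum: one's complement of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def adjust_checksum(old_checksum, old_words, new_words):
    """Correct a checksum after old_words were replaced by new_words.

    Adding the complement of a word subtracts it in one's complement
    arithmetic, so the remaining data does not have to be read.
    """
    terms = [~old_checksum & 0xFFFF]
    terms += [~word & 0xFFFF for word in old_words]
    terms += list(new_words)
    return ~ones_complement_sum(terms) & 0xFFFF


if __name__ == "__main__":
    # Address words of the pseudo headers, taken from the address tables:
    # 2001:db8:42::42 -> 2001:db8:23::a00:2a becomes 10.0.1.42 -> 10.0.0.42.
    v6_words = [0x2001, 0x0DB8, 0x0042, 0, 0, 0, 0, 0x0042,
                0x2001, 0x0DB8, 0x0023, 0, 0, 0, 0x0A00, 0x002A]
    v4_words = [0x0A00, 0x012A, 0x0A00, 0x002A]
    # Example words standing in for the unchanged rest of the packet.
    rest = [0x0006, 0x0014, 0x1234, 0x0050, 0xABCD]

    old = checksum(v6_words + rest)
    print(hex(adjust_checksum(old, v6_words, v4_words)))
    print(hex(checksum(v4_words + rest)))
```

Both printed values are expected to agree, because the corrected
checksum and the recalculated checksum cover the same words. Only the
header fields are read for the correction, which is what makes this
method usable when the payload is not accessible.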
% FIXME: add note to python script / checksum diffing


% ----------------------------------------------------------------------
\section{\label{design:benchmarks}Benchmarks}
The benchmarks were performed on two hosts, a load generator and a
NAT64 translator. Both hosts were equipped with a dual port
Intel X520 10 Gbit/s network card. The hosts were connected directly
using DAC without any equipment in between. TCP offloading was enabled
in the X520 cards. Figure \ref{fig:softwarenat64design}
shows the network setup.
\begin{figure}[h]
\includegraphics[scale=0.5]{softwarenat64design}
\centering
\caption{NAT64 in software benchmark}
\label{fig:softwarenat64design}
\end{figure}
When testing the NetFPGA/P4 performance, the X520 cards in the NAT64
translator were disconnected and instead the NetFPGA ports were
connected, as shown in figure \ref{fig:netpfgadesign}. The load
generator is equipped with a quad core CPU (Intel(R) Core(TM) i7-6700
CPU @ 3.40GHz) with hyperthreading enabled and 16 GB RAM. The NAT64
translator is also equipped with a quad core CPU (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz) and 16 GB RAM.

The first 10 seconds of the benchmark were excluded to avoid the TCP
warm up phase.\footnote{iperf -O 10 parameter}
\begin{figure}[h]
\includegraphics[scale=0.5]{netpfgadesign}
\centering
\caption{NAT64 with NetFPGA benchmark}
\label{fig:netpfgadesign}
\end{figure}
% ok