
\chapter{\label{design}Design}
%** Design.tex: How was the problem attacked, what was the design
%               the architecture
In this chapter we describe the architecture of our solution.

% ----------------------------------------------------------------------
\section{\label{design:configuration}IPv6 and IPv4 configuration}
The following sections refer to host and network configurations. In
this section we describe the IPv6 and IPv4 configurations as a basis
for the discussion.

All IPv6 addresses are from the documentation block
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular, the following
subnetworks and IPv6 addresses are used:

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
2001:db8:42::/64 & IPv6 host network \\
\hline
2001:db8:23::/96 & IPv6 mapping to the IPv4 Internet \\
\hline
2001:db8:42::42 & IPv6 host address \\
\hline
2001:db8:42::77 & IPv6 router address \\
\hline
2001:db8:42::a00:2a & In-network IPv6 address mapped to 10.0.0.42 (p4)\\
\hline
2001:db8:23::a00:2a & IPv6 address mapped to 10.0.0.42 (tayga) \\
\hline
2001:db8:23::2a & IPv6 address mapped to 10.0.0.42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 address and network overview}
\label{tab:ipv6address}
\end{center}
\end{table}

We use private IPv4 addresses as specified by RFC1918~\cite{rfc1918}
from the 10.0.0.0/8 range as follows:

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
10.0.0.0/24 & IPv4 host network \\
\hline
10.0.1.0/24 & IPv4 network mapping to IPv6\\
\hline
10.0.0.77 & IPv4 router address\\
\hline
10.0.0.66 & In-network IPv4 address mapped to 2001:db8:42::42 (p4)\\
\hline
10.0.1.42 & IPv4 address mapped to 2001:db8:42::42 (tayga)\\
\hline
10.0.1.66 & IPv4 address mapped to 2001:db8:42::42 (jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 address and network overview}
\label{tab:ipv4address}
\end{center}
\end{table}


% ----------------------------------------------------------------------
\section{\label{design:nat64}NAT64 with P4 - FIXME: elaborate}
\begin{figure}[h]
  \includegraphics[scale=0.5]{switchdesign}
  \centering
  \caption{P4 Switch Architecture}
  \label{fig:switchdesign}
\end{figure}
In section \ref{background:transition} we discussed different
translation mechanisms for IPv6 and IPv4. In this thesis we focus on
the translation mechanisms stateless and stateful NAT64. While higher
layer protocol dependent translations are more flexible, this topic
has already been addressed in
\cite{nico18:_implem_layer_ipv4_ipv6_rever_proxy}, and the focus of
this thesis is on the practicability of high speed NAT64.
The high level design can be seen in figure \ref{fig:switchdesign}: a
P4 capable switch runs our code to provide NAT64
functionality. A P4 switch cannot manage its tables on its own and
needs a controller for this. The controller also
handles unknown packets and can modify the runtime
configuration of the switch. This is especially useful in the case of
stateful NAT64.
If only static table entries
are required, they can usually be added when the P4 switch starts,
and the controller can be omitted. However, stateful
NAT64 requires a controller to create session entries in the
switch tables.
The P4 switch can use any protocol to communicate with the controller, as
the connection to the controller is implemented as a separate Ethernet
port.
\begin{figure}[h]
  \includegraphics[scale=0.4]{v6-v4-standard}
  \centering
  \caption{Standard NAT64 translation}
  \label{fig:v6v4standard}
\end{figure}

Software NAT64 solutions typically require routing to be applied to
transport the packet to the NAT64 translator, as shown in figure
\ref{fig:v6v4standard}.

Our design differs here: while routing could be used as described
above, NAT64 with P4 does not require any routing to be set up. Figure
\ref{fig:v6v4mixed} shows a network design that can be realised using
P4. This design has multiple advantages: first, it reduces the number
of devices to pass and thus directly reduces the RTT. Second, it
allows translation of IP addresses within the same logical network
segment.

\begin{figure}[h]
  \includegraphics[scale=0.4]{v6-v4-mixed}
  \centering
  \caption{In-network NAT64 translation}
  \label{fig:v6v4mixed}
\end{figure}


This allows our solution to be used as a standard NAT64
translation method or as an in-network NAT64 translation (compare
figures \ref{fig:v6v4mixed} and \ref{fig:v6v4standard}). The
controller is implemented in Python, the NAT64 solution is implemented
in P4. The network design is shown in figure
\ref{fig:networkdesignnat64}.
\begin{figure}[h]
  \includegraphics[scale=0.5]{networkdesignnat64}
  \centering
  \caption{Network design}
  \label{fig:networkdesignnat64}
\end{figure}


% from intro:

Figure \ref{fig:v6v4standard} shows the standard NAT64
approach and figure \ref{fig:v6v4mixed} shows our solution.
%% \begin{figure}[h]
%%   \includegraphics[scale=0.6]{v6-v4-innetwork}
%%   \centering
%%   \caption{In Network NAT64 translation}
%%   \label{fig:v6v4innetwork}
%% \end{figure}


Describe network layouts
\begin{verbatim}
    - IPv6 subnet 2001:db8::/32
    - IPv6 hosts are in 2001:db8:6::/64
    - IPv6 default router (::/0) is 2001:db8:6::42/64
    - IPv4 mapped Internet "NAT64 prefix" 2001:db8:4444::/96 (should
      go into a table)
    - IPv4 hosts are in 10.0.4.0/24
    - IPv6 in IPv4 mapped hosts are in 10.0.6.0/24
    - IPv4 default router = 10.0.0.42

\end{verbatim}

Describe testing methods
\begin{verbatim}
    def test_v4_udp_to_v6(self):
        print('mx h3 "echo V4-OK | socat - UDP:10.1.1.1:2342"')
        print('mx h1 "echo V6-OK | socat - UDP6-LISTEN:2342"')

        return

p4@ubuntu:~$ mx h1 "echo V6-OK | socat - UDP6-LISTEN:2342"
p4@ubuntu:~/master-thesis/bin$ mx h3 "echo V4-OK | socat - UDP:10.1.1.1:2342"

while true; do mx h3 "echo V4-OK | socat - TCP-LISTEN:2343"; sleep 2;
done

while true; do mx h1 "echo V6-OK | socat -
TCP6:[2001:db8:1::a00:1]:2343"; sleep 2; done

 mx h1 "echo V6-OK | socat - TCP6:[2001:db8:1::a00:1]:2343"

\end{verbatim}
% ----------------------------------------------------------------------
% ----------------------------------------------------------------------
\section{\label{design:statelessnat64}Stateless NAT64 - FIXME: write}
Only using /96. Using addition.
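
The note above (a /96 prefix combined with addition) can be
illustrated with a minimal Python sketch. It is not the P4
implementation of this thesis. The prefix is taken from table
\ref{tab:ipv6address}, and the function names \texttt{embed\_ipv4}
and \texttt{extract\_ipv4} are hypothetical and chosen only for
illustration. With a /96 prefix the lower 32 bits of the prefix are
zero, so adding the IPv4 address as an integer places it into these
bits, and subtracting the prefix recovers it.

```python
import ipaddress

# Assumed example prefix, taken from the IPv6 address table of this chapter.
NAT64_PREFIX = ipaddress.IPv6Network("2001:db8:23::/96")


def embed_ipv4(ipv4, prefix=NAT64_PREFIX):
    """Map an IPv4 address into a /96 prefix by adding it to the prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only supports /96 prefixes")
    value = int(prefix.network_address) + int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(value)


def extract_ipv4(ipv6, prefix=NAT64_PREFIX):
    """Recover the IPv4 address by subtracting the prefix again."""
    address = ipaddress.IPv6Address(ipv6)
    if address not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(address) - int(prefix.network_address))


if __name__ == "__main__":
    print(embed_ipv4("10.0.0.42"))
    print(extract_ipv4("2001:db8:23::a00:2a"))
```

For the example addresses of this chapter, 10.0.0.42 corresponds to
2001:db8:23::a00:2a, as listed in table \ref{tab:ipv6address}.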
% ----------------------------------------------------------------------
\section{\label{design:statefulnat64}Stateful NAT64 - FIXME: write}
\begin{itemize}
\item The controller selects the ``outgoing'' IPv4 address range, which
  is the base for sessions.
\item IPv4 addresses can be ``random'' (in our test case), but need
  to be unique.
\item The switch does not need to know about the ``range'', only about
  sessions.
\item On session create, the controller selects a ``random'' IP (ring?).
\item On session create, the controller selects a ``random'' port (next
  in range?).
\item On session create, the controller adds its choice into two tables:
  incoming and outgoing.
\end{itemize}
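
The session handling outlined above could look roughly like the
following Python sketch. It is an illustration under assumptions, not
the controller of this thesis: the class \texttt{SessionAllocator},
its method names and the two dictionaries standing in for the switch
tables are hypothetical. Instead of a random choice, it walks through
the address range and the port range in order, which is one simple way
to keep the selected address and port pairs unique.

```python
import ipaddress


class SessionAllocator:
    """Hypothetical controller-side session bookkeeping for stateful NAT64."""

    def __init__(self, ipv4_range, first_port=1024, last_port=65535):
        # The switch never sees this range, only the resulting sessions.
        self._hosts = list(ipaddress.IPv4Network(ipv4_range).hosts())
        self._first_port = first_port
        self._ports = last_port - first_port + 1
        self._capacity = len(self._hosts) * self._ports
        self._cursor = 0
        # Stand-ins for the two switch tables.
        self.outgoing = {}  # (IPv6 source, source port, protocol) -> (IPv4, port)
        self.incoming = {}  # (IPv4, port, protocol) -> (IPv6 source, source port)

    def _next_candidate(self):
        """Return the next (address, port) pair, wrapping around like a ring."""
        index = self._cursor
        self._cursor = (self._cursor + 1) % self._capacity
        address = self._hosts[index // self._ports]
        port = self._first_port + index % self._ports
        return address, port

    def create_session(self, v6_src, v6_port, proto):
        """Select an unused IPv4 address and port and record both directions."""
        key = (ipaddress.IPv6Address(v6_src), v6_port, proto)
        if key in self.outgoing:
            return self.outgoing[key]
        for _ in range(self._capacity):
            address, port = self._next_candidate()
            if (address, port, proto) not in self.incoming:
                self.outgoing[key] = (address, port)
                self.incoming[(address, port, proto)] = key[:2]
                return address, port
        raise RuntimeError("no free IPv4 address/port pair left")


if __name__ == "__main__":
    allocator = SessionAllocator("10.0.1.0/24")
    print(allocator.create_session("2001:db8:42::42", 40000, "tcp"))
```

In a real controller the two dictionary updates would be replaced by
table entries written to the switch. Session expiry is left out of
this sketch.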

% ----------------------------------------------------------------------
\section{\label{Design:BMV2}BMV2}
Development of the thesis took place on a software emulated switch
that is implemented using Open vSwitch~\cite{openvswitch}
and the behavioral model~\cite{_implem_your_switc_target_with_bmv2}.
The development closely followed the general design shown in section
\ref{design:nat64}. Within the software emulation, checksums can be
computed with two different methods:
\begin{itemize}
\item Recalculating the checksum by inspecting headers and payload
\item Calculating the difference between the translated headers
\end{itemize}
The BMV2 model is rather sophisticated and provides many standard
features, including checksumming over the payload. This allows the BMV2
model to operate as a full-featured host, including advanced features
like responding to ICMP6 Neighbor Discovery requests~\cite{rfc4861}
that include payload checksums.
Typical code for creating the checksum is shown in figure
\ref{fig:checksum}.
\begin{figure}[h]
\begin{verbatim}
/* checksumming for icmp6_na_ns_option */
update_checksum_with_payload(meta.chk_icmp6_na_ns == 1,
	{
        hdr.ipv6.src_addr,         /* 128 */
        hdr.ipv6.dst_addr,         /* 128 */
        meta.cast_length,          /* 32 */
        24w0,                      /* 24 0's */
        PROTO_ICMP6,               /* 8 */
        hdr.icmp6.type,            /* 8 */
        hdr.icmp6.code,            /* 8 */

        hdr.icmp6_na_ns.router,
        hdr.icmp6_na_ns.solicitated,
        hdr.icmp6_na_ns.override,
        hdr.icmp6_na_ns.reserved,
        hdr.icmp6_na_ns.target_addr,

        hdr.icmp6_option_link_layer_addr.type,
        hdr.icmp6_option_link_layer_addr.ll_length,
        hdr.icmp6_option_link_layer_addr.mac_addr
    },
    hdr.icmp6.checksum,
    HashAlgorithm.csum16
);
\end{verbatim}
  \centering
  \caption{Calculating the ICMP6 checksum over pseudo header and payload}
  \label{fig:checksum}
\end{figure}

% ----------------------------------------------------------------------
\section{\label{Design:NetPFGA}NetFPGA - FIXME: relate things}
While the P4-NetFPGA project~\cite{netfpga:_p4_netpf_public_github}
allows compiling P4 to the NetFPGA, the design varies slightly.
In particular, the NetFPGA P4 compiler does not support reading
the payload. For this reason it also does not support
creating the checksum based on the payload.
To support checksum modifications in NAT64 on the NetFPGA, the
checksum is calculated on the NetFPGA using the differences between
the IPv6 and IPv4 headers. Figure \ref{fig:checksumbydiff} shows an
excerpt of the code used for calculating checksums on the NetFPGA.
\begin{figure}[h]
\begin{verbatim}
action v4sum() {
    bit<16> tmp = 0;

    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[15:0];              // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[31:16];             // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[15:0];              // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[31:16];             // 16 bit

    tmp = tmp + (bit<16>) hdr.ipv4.totalLen -20;                // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.protocol;                    // 8 bit

    meta.v4sum = ~tmp;
}

/* analogue code for v6sum skipped */

action delta_tcp_from_v6_to_v4()
{
    v6sum();
    v4sum();

    bit<17> tmp = (bit<17>) hdr.tcp.checksum + (bit<17>) meta.v4sum;
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }
    tmp = tmp + (bit<17>) (0xffff - meta.v6sum);
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }

    hdr.tcp.checksum = (bit<16>) tmp;
}

\end{verbatim}
  \centering
  \caption{Calculating checksum based on header differences}
  \label{fig:checksumbydiff}
\end{figure}
The checksums for IPv4, TCP, UDP and ICMP6 are all based on the
``Internet Checksum''~\cite{rfc791},~\cite{rfc1071}.
Its calculation can be summarised as follows:
\begin{quote}
    The checksum field is the 16-bit one's complement of the one's
    complement sum of all 16-bit words in the header. For purposes of
    computing the checksum, the value of the checksum field
    is zero.\footnote{Quote from Wikipedia~\cite{wikipedia:_ipv4}.}
\end{quote}
As the calculation mainly depends on (one's complement) sums, the
checksums after translating the protocol can be corrected by
subtracting the differences of the relevant fields. Note that
the pseudo headers are used, not the full headers (compare figures
\ref{fig:ipv6pseudoheader} and \ref{fig:ipv4pseudoheader}).
To handle the carry bit, our code uses 17-bit integers and adds
an overflow into bit 16 back onto the lower 16 bits.
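
The correction by differences can be checked with a short Python
sketch. It is an illustration under assumptions and not the code that
runs on the NetFPGA: all function names are hypothetical, and the
update rule follows the general one's complement identity
$C' = \sim(\sim C + \sim m + m')$, where $m$ and $m'$ are the one's
complement sums of the old and the new pseudo header. The sketch
computes the corrected checksum without looking at the payload and can
be compared against a full recalculation.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """One's complement sum of all 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(data):
    """Internet checksum: complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def pseudo_header_v6(src, dst, length, proto):
    """IPv6 pseudo header: addresses, 32-bit length, 24 zero bits, protocol."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, proto))


def pseudo_header_v4(src, dst, length, proto):
    """IPv4 pseudo header: addresses, 8 zero bits, protocol, 16-bit length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, proto, length))


def translate_checksum(old_checksum, old_pseudo, new_pseudo):
    """Correct a checksum after replacing the pseudo header, without payload."""
    total = ((~old_checksum & 0xFFFF)
             + (~ones_complement_sum(old_pseudo) & 0xFFFF)
             + ones_complement_sum(new_pseudo))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    segment = bytes(range(40))  # stand-in for a segment with zeroed checksum
    v6 = pseudo_header_v6("2001:db8:42::42", "2001:db8:23::a00:2a", len(segment), 6)
    v4 = pseudo_header_v4("10.0.1.42", "10.0.0.42", len(segment), 6)
    corrected = translate_checksum(checksum(v6 + segment), v6, v4)
    print(corrected == checksum(v4 + segment))
```

The folding loop in this sketch plays the same role as the 17-bit
integers in figure \ref{fig:checksumbydiff}: a carry out of the lower
16 bits is added back to the sum.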
% FIXME: add note to python script / checksum diffing


% ----------------------------------------------------------------------
\section{\label{design:benchmarks}Benchmarks}
The benchmarks were performed on two hosts, a load generator and a
NAT64 translator. Both hosts were equipped with a dual port
Intel X520 10 Gbit/s network card. The hosts were connected directly
using DAC, without any equipment in between. TCP offloading was enabled
on the X520 cards. Figure \ref{fig:softwarenat64design}
shows the network setup.
\begin{figure}[h]
  \includegraphics[scale=0.5]{softwarenat64design}
  \centering
  \caption{NAT64 in software benchmark}
  \label{fig:softwarenat64design}
\end{figure}
When testing the NetFPGA/P4 performance, the X520 cards in the NAT64
translator were disconnected and the NetFPGA ports were
connected instead, as shown in figure \ref{fig:netpfgadesign}. The load
generator is equipped with a quad core CPU (Intel(R) Core(TM) i7-6700
CPU @ 3.40GHz) with hyperthreading enabled and 16 GB RAM. The NAT64
translator is also equipped with a quad core CPU (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz) and 16 GB RAM.

The first 10 seconds of the benchmark were excluded to avoid the TCP
warm-up phase.\footnote{Using the iperf parameter \texttt{-O 10}.}
\begin{figure}[h]
  \includegraphics[scale=0.5]{netpfgadesign}
  \centering
  \caption{NAT64 with NetFPGA benchmark}
  \label{fig:netpfgadesign}
\end{figure}
% ok