\chapter{\label{design}Design}
%** Design.tex: How was the problem attacked, what was the design
% the architecture
In this chapter we describe the architecture of our solution and our
design choices. We first introduce the general design of NAT64 in the
P4 architecture. We then describe the design differences
between the BMV2 and NetFPGA P4 architectures, followed by the
design of stateless and stateful NAT64 in relation to P4 as well as
two existing software NAT64 solutions.

Lastly we discuss how we verify NAT64 functionality and
present the network configurations that we use.
% ----------------------------------------------------------------------
\section{\label{design:nat64}P4/NAT64}
\begin{figure}[htbp]
\includegraphics[scale=0.5]{switchdesign}
\centering
\caption{P4 Switch Architecture}
\label{fig:switchdesign}
\end{figure}
In section \ref{background:transition} we discussed different
translation mechanisms for IPv6 and IPv4. In this thesis we focus on
the translation mechanisms ``stateless'' and ``stateful'' NAT64. While
translations that depend on higher layer protocols are more flexible,
this topic has already been addressed in
\cite{nico18:_implem_layer_ipv4_ipv6_rever_proxy} and the focus of
this thesis is on the practicability of high speed NAT64 with P4.
The high level design can be seen in figure \ref{fig:switchdesign}: a
P4 capable switch runs our code to provide NAT64
functionality. A P4 switch cannot manage its tables on its own and
needs a controller to do so. The controller is also
responsible for handling unknown packets and can modify the runtime
configuration of the switch. This is especially useful in the case of
stateful NAT64.
If only static table entries
are required, they can usually be added when the P4 switch starts
and the controller can be omitted. However, stateful
NAT64 requires a controller to create session entries in the
switch tables.
The P4 switch can use any protocol to communicate with the controller, as
the connection to the controller is implemented as a separate Ethernet
port.
\begin{figure}[htbp]
\includegraphics[scale=0.4]{v6-v4-standard}
\centering
\caption{Standard NAT64 Translation}
\label{fig:v6v4standard}
\end{figure}

Software NAT64 solutions typically require routing to be applied to
transport the packet to the NAT64 translator as shown in figure
\ref{fig:v6v4standard}.

Our design differs here:
while routing could be used as described
above, NAT64 with P4 does not require any routing to be set up. Figure
\ref{fig:v6v4mixed} shows the network design that we realise using
P4. This design has two advantages: first, it reduces the number
of devices a packet has to pass and thus directly reduces the RTT;
second, it allows translation of IP addresses within the same logical
network segment.
\begin{figure}[htbp]
\includegraphics[scale=0.4]{v6-v4-mixed}
\centering
\caption{In-network NAT64 Translation}
\label{fig:v6v4mixed}
\end{figure}
P4 switches in general look very similar to regular switches, but
additionally support executing logic while a packet passes through the
switch. Figure \ref{fig:p4switch} illustrates how our solution is
implemented and translates packets.
\begin{figure}[h]
\includegraphics[scale=0.5]{p4switch}
\centering
\caption{Our P4 Switch Architecture}
\label{fig:p4switch}
\end{figure}
% ----------------------------------------------------------------------
\section{\label{design:bmv2}P4/BMV2}
\begin{figure}[htbp]
\begin{verbatim}
/* checksumming for icmp6_na_ns_option */
update_checksum_with_payload(meta.chk_icmp6_na_ns == 1,
    {
        hdr.ipv6.src_addr,                  /* 128 */
        hdr.ipv6.dst_addr,                  /* 128 */
        meta.cast_length,                   /* 32 */
        24w0,                               /* 24 0's */
        PROTO_ICMP6,                        /* 8 */
        hdr.icmp6.type,                     /* 8 */
        hdr.icmp6.code,                     /* 8 */

        hdr.icmp6_na_ns.router,
        hdr.icmp6_na_ns.solicitated,
        hdr.icmp6_na_ns.override,
        hdr.icmp6_na_ns.reserved,
        hdr.icmp6_na_ns.target_addr,

        hdr.icmp6_option_link_layer_addr.type,
        hdr.icmp6_option_link_layer_addr.ll_length,
        hdr.icmp6_option_link_layer_addr.mac_addr
    },
    hdr.icmp6.checksum,
    HashAlgorithm.csum16
);
\end{verbatim}
\centering
\caption{P4/BMV2 Checksumming}
\label{fig:bmv2checksum}
\end{figure}
The software emulated switch that is implemented using
Open vSwitch~\cite{openvswitch} and the
behavioral model~\cite{_implem_your_switc_target_with_bmv2}
offers the fastest and easiest way of P4 development. All NAT64
features are first tested on P4/BMV2 and, in a second step, ported to
P4/NetFPGA and modified where necessary.
The development closely follows the general design shown in section
\ref{design:nat64}.
As outlined in section \ref{background:checksums}, checksums inside
higher level protocols need to be adjusted after translation.
Within the software emulation checksums can be
computed with two different methods:
\begin{itemize}
\item Recalculating the checksum by inspecting headers and payload
\item Calculating the difference between the translated headers
\end{itemize}
The BMV2 model is sophisticated and provides direct support
for calculating the checksum over the payload. This allows the BMV2
model to operate as a full featured host, including advanced features
like responding to ICMP6 Neighbor Discovery requests~\cite{rfc4861}
that include payload checksums. Sample code that calculates the
required checksum for answering NDP queries is shown in figure
\ref{fig:bmv2checksum}. The code shows how the field
\texttt{hdr.icmp6.checksum} is updated with the \texttt{csum16} method
depending on the IPv6 and ICMP6 headers as well as the payload. The
second option of using the differences is described in section
\ref{design:netpfga}.
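
To illustrate the first method outside of P4, the following Python
sketch recalculates an ICMP6 checksum over the IPv6 pseudo header and
the complete ICMP6 message. It is a minimal illustration and not part
of our implementation: the function names
\texttt{ones\_complement\_sum} and \texttt{icmp6\_checksum} are
hypothetical, and we assume that the checksum field of the message is
zero when the function is called.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the folded 16 bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # add the carries back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def icmp6_checksum(src: str, dst: str, icmp6_message: bytes) -> int:
    """Checksum over the IPv6 pseudo header and the ICMP6 message.

    Assumption: the checksum field inside icmp6_message is zero.
    """
    pseudo_header = (
        ipaddress.IPv6Address(src).packed            # 128 bit source
        + ipaddress.IPv6Address(dst).packed          # 128 bit destination
        + struct.pack("!I", len(icmp6_message))      # 32 bit length
        + b"\x00\x00\x00"                            # 24 zero bits
        + bytes([58])                                # 8 bit next header: ICMP6
    )
    return ~ones_complement_sum(pseudo_header + icmp6_message) & 0xFFFF
```

A receiver can verify such a checksum by running the same calculation
over the message including the checksum field; for a valid message the
result is zero.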
% ok
% ----------------------------------------------------------------------
\section{\label{design:netpfga}P4/NetFPGA}
\begin{figure}[htbp]
\begin{verbatim}
action v4sum() {
    bit<16> tmp = 0;

    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[15:0];  // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.src_addr[31:16]; // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[15:0];  // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.dst_addr[31:16]; // 16 bit

    tmp = tmp + (bit<16>) hdr.ipv4.totalLen -20;    // 16 bit
    tmp = tmp + (bit<16>) hdr.ipv4.protocol;        // 8 bit

    meta.v4sum = ~tmp;
}

/* analogue code for v6sum skipped */

action delta_tcp_from_v6_to_v4()
{
    v6sum();
    v4sum();

    bit<17> tmp = (bit<17>) hdr.tcp.checksum + (bit<17>) meta.v4sum;
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }
    tmp = tmp + (bit<17>) (0xffff - meta.v6sum);
    if (tmp[16:16] == 1) {
        tmp = tmp + 1;
        tmp[16:16] = 0;
    }

    hdr.tcp.checksum = (bit<16>) tmp;
}
\end{verbatim}
\centering
\caption{Calculating Checksum based on Header Differences}
\label{fig:checksumbydiff}
\end{figure}
While the P4-NetFPGA project~\cite{netfpga:_p4_netpf_public_github}
allows compiling P4 to the NetFPGA, the design varies slightly due to
limitations in the available toolchain.
In particular, the NetFPGA P4 compiler does not support reading
the payload.\footnote{This feature could be implemented in theory, but
  isn't available at the moment, see~\cite{schottelius:_exter_p4_netpf}.}
For this reason it also does not support
creating the checksum based on the payload.
To support checksum modifications in NAT64 on the NetFPGA, the
checksum is calculated using differences between
the IPv6 and IPv4 headers.

As the checksum calculation only depends on the one's complement sums of
the headers and the payload (compare section \ref{background:checksums})
and only headers are modified during NAT64 translations, the higher
level protocol checksums can be corrected based on the
difference between the sums of both headers. Thus our P4/NetFPGA
implementation first
calculates the sum of the relevant IPv4 header fields (\texttt{v4sum()}) and
the sum of the relevant IPv6 header fields (\texttt{v6sum()}),
then calculates the difference including a
possible carry bit and adjusts the higher level protocol checksum by this
difference (\texttt{delta\_tcp\_from\_v6\_to\_v4()}).
Figure \ref{fig:checksumbydiff} shows an
excerpt of the code used for adjusting the checksum when translating TCP
from IPv6 to IPv4.
Note that the full headers are not used, but only a ``pseudo header''
(compare figures
\ref{fig:ipv6pseudoheader} and \ref{fig:ipv4pseudoheader}).
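
The reasoning above can be checked with a small Python sketch that
compares a checksum adjusted by the pseudo header difference with a
checksum recalculated from scratch. The sketch is only an illustration
under stated assumptions: all function names are hypothetical, and it
uses the incremental update formula $HC' = \lnot(\lnot HC + \lnot m + m')$
from RFC 1624, which is arithmetically related to, but not a
transcription of, the P4 excerpt in figure \ref{fig:checksumbydiff}.

```python
import ipaddress
import struct


def fold(total: int) -> int:
    """Fold carries back in until the value fits into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def word_sum(data: bytes) -> int:
    """Folded one's complement sum of all 16 bit words in data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    return fold(sum(struct.unpack(f"!{len(data) // 2}H", data)))


def pseudo_header_sum(src, dst, length: int, protocol: int) -> int:
    """Sum of a pseudo header; src and dst are ipaddress objects
    (either both IPv4 or both IPv6)."""
    return fold(word_sum(src.packed) + word_sum(dst.packed) + length + protocol)


def full_checksum(pseudo_sum: int, segment: bytes) -> int:
    """Recalculate from scratch; the checksum field in segment is zero."""
    return ~fold(pseudo_sum + word_sum(segment)) & 0xFFFF


def adjust_checksum(checksum: int, old_sum: int, new_sum: int) -> int:
    """Incremental update: HC' = ~(~HC + ~m + m') (RFC 1624)."""
    total = (~checksum & 0xFFFF) + (~old_sum & 0xFFFF) + new_sum
    return ~fold(total) & 0xFFFF
```

Given the IPv6 and IPv4 pseudo header sums of a translated TCP segment,
\texttt{adjust\_checksum(old, v6sum, v4sum)} yields the same value as
recalculating the checksum over the IPv4 pseudo header and the
payload, without ever reading the payload.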
% ok

% ----------------------------------------------------------------------
\section{\label{design:statelessnat64}Stateless NAT64}
As seen in section \ref{background:transition:stateless}, stateless
NAT64 can be implemented using various match factors. Our design for
stateless NAT64 depends on the capabilities of the environment and is
summarised in table \ref{tab:statelessnat64factors}.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Implementation} & \textbf{NAT64 match}\\
\hline
P4/BMV2 & LPM (both directions)\\
& and individual entries (both directions)\\
\hline
P4/NetFPGA & Individual entries\\
\hline
Tayga & LPM (IPv6 to IPv4) and individual entries (IPv4 to IPv6)\\
\hline
Jool & LPM (both directions)\\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 Match Factors}
\label{tab:statelessnat64factors}
\end{center}
\end{table}
When using LPM for translating from IPv6 to IPv4, a /96 IPv6 network
is configured to cover the whole IPv4 Internet and the individual
IPv4 address is appended to the prefix (compare section
\ref{design:configuration}). We also use LPM to match on an IPv4 sub
network that translates to an IPv6 sub network. Individual
entries are configured differently depending on the implementation:
limitations in the P4/NetFPGA environment require the use of
individual table
entries. Jool supports individual entries as a special case of LPM,
with a network mask matching only one IP address. Tayga
supports LPM to translate from IPv6 to IPv4, but requires individual
entries for translating from IPv4 to IPv6. Our P4/BMV2 implementation
offers the
highest degree of flexibility, as it supports both individual
table entries and LPM table entries.
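
The /96 mapping can be sketched in a few lines of Python using the
standard \texttt{ipaddress} module. The prefix is the one listed in
section \ref{design:configuration}; the function names
\texttt{ipv4\_to\_ipv6} and \texttt{ipv6\_to\_ipv4} are hypothetical
and only serve as an illustration of how the IPv4 address is appended
to and extracted from the prefix.

```python
import ipaddress

# /96 prefix that covers the whole IPv4 Internet in our configuration
NAT64_PREFIX = ipaddress.IPv6Network("2001:db8:23::/96")


def ipv4_to_ipv6(ipv4: str, prefix=NAT64_PREFIX) -> ipaddress.IPv6Address:
    """Append the 32 bit IPv4 address to the /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    suffix = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(prefix.network_address) | suffix)


def ipv6_to_ipv4(ipv6: str, prefix=NAT64_PREFIX) -> ipaddress.IPv4Address:
    """Extract the IPv4 address from the lowest 32 bits."""
    address = ipaddress.IPv6Address(ipv6)
    if address not in prefix:
        raise ValueError(f"{address} is not inside {prefix}")
    return ipaddress.IPv4Address(int(address) & 0xFFFFFFFF)
```

For example, \texttt{ipv4\_to\_ipv6("10.0.0.42")} returns
2001:db8:23::a00:2a, because 10.0.0.42 corresponds to the hexadecimal
value 0a00:002a.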
% ----------------------------------------------------------------------
\section{\label{design:statefulnat64}Stateful NAT64}
\begin{figure}[htbp]
\includegraphics[scale=0.5]{p4switch-stateful}
\centering
\caption{Stateful NAT64 with P4}
\label{fig:p4switchstateful}
\end{figure}
Similar to stateless NAT64, the design of stateful NAT64 depends on
the features of the individual implementation. As pointed out in section
\ref{background:transition:statefulnat64}, stateful NAT64 is very
similar to stateless NAT64, with the main difference being an
additional stateful table that helps to create 1:n mappings.
The implementations use different approaches
to create these mappings:
\begin{itemize}
\item For P4/BMV2 and P4/NetFPGA a Python controller handles packets
  that don't have a table entry, sets the table entry in the P4 switch
  and afterwards inserts the original packet back into the
  switch.
\item With Tayga we rely on the NAT44 capabilities of the Linux kernel.
\item Jool implements its own stateful mechanism based on port
  ranges.
\end{itemize}
All methods, however, operate in a very similar fashion: a ``controller''
inspects the IPv6 packet and, depending on the source address,
destination address, protocol (TCP, UDP,
ICMP, ICMP6, etc.) and the protocol ID (source / destination TCP/UDP
port, ICMP identifier), selects an outgoing IPv4 address and a source
port or ICMP identifier.
In the case of Jool and Tayga this decision is based on a session table
inside the Linux kernel; in the case of P4 it is based on a
session table inside the Python controller. While Jool and Tayga
both support cleaning up old session entries,
our P4 based solution does not support this feature at the moment.

In figure \ref{fig:p4switchstateful} we show the flow of a packet for
stateful translation in a P4 switch in detail. An IPv6 only
host emits a packet that should be translated to IPv4. For a new
connection there is no matching table entry in the P4 switch.
This table miss causes the P4 switch to forward the
packet to the controller. The controller then inspects the packet,
creates a table entry for the session and reinjects the packet into
the P4 switch. The P4 switch then processes the packet again, however
this time it finds a matching table entry. This entry causes the
packet to be translated to a specific IPv4 address, including higher
level protocol changes. After processing, the IPv6 packet is output as
a translated IPv4 packet. A second packet of the same session will
directly take the second path via the table match, as the session ID
stays the same.\footnote{We use the quintuple (source address,
  destination address, source port, destination port, protocol) to
  generate a unique ID.} This is an important feature, because if the
controller were involved in processing every packet, the
controller would become the bottleneck.
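
The session handling described above can be sketched as follows. This
is a simplified illustration and not the code of our controller: the
names \texttt{FlowKey}, \texttt{Mapping} and \texttt{SessionTable} as
well as the sequential port allocation are assumptions made for this
example, and installing the table entry in the switch and reinjecting
the packet are left out.

```python
from typing import Dict, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """Quintuple that identifies a session."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int


class Mapping(NamedTuple):
    """Outgoing IPv4 source address and port chosen for a session."""
    ipv4_src: str
    ipv4_src_port: int


class SessionTable:
    """Hypothetical controller-side session table (no entry cleanup)."""

    def __init__(self, ipv4_src: str, first_port: int = 1024,
                 last_port: int = 65535) -> None:
        self.ipv4_src = ipv4_src
        self.next_port = first_port
        self.last_port = last_port
        self.sessions: Dict[FlowKey, Mapping] = {}

    def lookup_or_create(self, key: FlowKey) -> Tuple[Mapping, bool]:
        """Return the mapping and True if it was created (table miss)."""
        if key in self.sessions:
            return self.sessions[key], False
        if self.next_port > self.last_port:
            raise RuntimeError("no free source port left")
        mapping = Mapping(self.ipv4_src, self.next_port)
        self.next_port += 1
        self.sessions[key] = mapping
        # A real controller would now add the entry to the switch
        # table and reinject the packet into the switch.
        return mapping, True
```

The returned flag marks the table miss: only in this case would the
controller have to touch the switch, while all later packets of the
session are matched by the switch itself.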
% ----------------------------------------------------------------------
\section{\label{design:tests}NAT64 Verification}
We use socat~\cite{rieger:_multip} to verify basic operation of the
NAT64 gateway and iperf~\cite{dugan:_tcp_udp_sctp} to test the stability
of the implementation and to measure bandwidth.
In particular we use
the commands listed in table \ref{tab:nat64verification}. The socat
commands allow interactive testing of TCP and UDP connections, while
the iperf commands fully utilise the available bandwidth with test
data.
The socat and iperf commands are used to verify all three NAT64
implementations (P4, Tayga, Jool).
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
\hline
\textbf{Command} & \textbf{Example} & \textbf{Description} \\
\hline
\texttt{socat - TCP6:HOST:PORT} & socat - TCP6:[2001:db8:42::a00:2a]:2345 & Connect via IPv6/TCP\\
 & & to IPv4 host\\
%\hline
\texttt{socat - UDP6:HOST:PORT} & socat - UDP6:[2001:db8:42::a00:2a]:2345 & Connect via IPv6/UDP \\
 & & to IPv4 host\\
%\hline
\texttt{socat - TCP:HOST:PORT} & socat - TCP:10.0.1.42:2345 & Connect via IPv4/TCP \\
 & & to IPv6 host \\
%\hline
\texttt{socat - UDP:HOST:PORT} & socat - UDP:10.0.1.42:2345 & Connect via IPv4/UDP \\
 & & to IPv6 host \\
\hline
\texttt{socat - UDP6-LISTEN:PORT} & socat - UDP6-LISTEN:2345 & Listen on IPv6/UDP \\
%\hline
\texttt{socat - TCP6-LISTEN:PORT} & socat - TCP6-LISTEN:2345 & Listen on IPv6/TCP \\
%\hline
\texttt{socat - UDP-LISTEN:PORT} & socat - UDP-LISTEN:2345 & Listen on IPv4/UDP \\
%\hline
\texttt{socat - TCP-LISTEN:PORT} & socat - TCP-LISTEN:2345 & Listen on IPv4/TCP \\
\hline
\texttt{iperf3 -PROTO -p PORT} & iperf3 -4 -p 2345 & IPv4 iperf server\\
\texttt{-B IP -s} & -B 10.0.0.42 -s &\\
 & iperf3 -6 -p 2345 & IPv6 iperf server\\
 & -B 2001:db8:42::42 -s & \\
\hline
\texttt{iperf3 -PROTO -p PORT} & iperf3 -6 -p 2345 & Connect to iperf server\\
\texttt{-O IGNORETIME -t RUNTIME} & -O 10 -t 190 & Run for 190 seconds, \\
 & & skip first 10 seconds\\
\texttt{-P PARALLEL -c IP} & -P20 -c 2001:db8:23::2a & with 20 sessions\\
 & & connecting to\\
 & & 2001:db8:23::2a\\
\texttt{iperf3 -PROTO -p PORT} & & Same as above,\\
\texttt{-O IGNORETIME -t RUNTIME} & & but connect via UDP\\
\texttt{-P PARALLEL -c IP} & & \\
\texttt{-u -b0} & & \\
\hline
\end{tabular}
\end{minipage}
\caption{NAT64 Verification Commands}
\label{tab:nat64verification}
\end{center}
\end{table}
% ----------------------------------------------------------------------
\section{\label{design:configuration}IPv6 and IPv4 Configuration}
The following sections refer to host and network configurations. In
this section we describe the IPv6 and IPv4 configurations as a basis
for the discussion.

All IPv6 addresses are from the documentation block
\textit{2001:DB8::/32}~\cite{rfc3849}. In particular we use the sub
networks and IPv6 addresses shown in table \ref{tab:ipv6address}.
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
2001:db8:42::/64 & IPv6 host network \\
\hline
2001:db8:23::/96 & IPv6 mapping to the IPv4 Internet \\
\hline
2001:db8:42::42 & IPv6 host address \\
\hline
2001:db8:42::77 & IPv6 router address \\
\hline
2001:db8:42::a00:2a & In-network IPv6 address mapped to 10.0.0.42 (P4)\\
\hline
2001:db8:23::a00:2a & IPv6 address mapped to 10.0.0.42 (Tayga) \\
\hline
2001:db8:23::2a & IPv6 address mapped to 10.0.0.42 (Jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv6 Address and Network Overview}
\label{tab:ipv6address}
\end{center}
\end{table}

We use private IPv4 addresses as specified by RFC 1918~\cite{rfc1918}
from the 10.0.0.0/8 range as shown in table \ref{tab:ipv4address}.

\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c |}
\hline
\textbf{Address} & \textbf{Description} \\
\hline
10.0.0.0/24 & IPv4 host network \\
\hline
10.0.1.0/24 & IPv4 network mapping to IPv6\\
\hline
10.0.0.77 & IPv4 router address\\
\hline
10.0.0.66 & In-network IPv4 address mapped to 2001:db8:42::42 (P4)\\
\hline
10.0.1.42 & IPv4 address mapped to 2001:db8:42::42 (Tayga)\\
\hline
10.0.1.66 & IPv4 address mapped to 2001:db8:42::42 (Jool)\\
\hline
\end{tabular}
\end{minipage}
\caption{IPv4 Address and Network Overview}
\label{tab:ipv4address}
\end{center}
\end{table}
% ok