++netpfga results

Nico Schottelius 2019-08-15 16:45:56 +02:00
parent e1949d2ac3
commit bf22fdcdb3
5 changed files with 110 additions and 177 deletions

@ -174,191 +174,24 @@ idomatic problem: Security issue: not checking checksums before
% ----------------------------------------------------------------------
\section{\label{conclusion:netpfga}NetFPGA - all HERE}
\begin{itemize}
\item Personal note here
\item The card stopped working
\item A reboot was not enough
\item The card does not respond to any packet
\item Tested various kernels for table debugging
\item MTU limitation: 1500 bytes according to a private mail from Salvator Galea (Cambridge, UK)
\item Long compile process
\item Error-prone compile process
\item Many dependencies
\item LPM not supported!
\item NetFPGA-Live, Vivado, SDNET
\item xx k lines of supporting code
\item Vivado installation: silent errors, infinite loop, missing libncurses5
\item 82k lines of code that are interdependent
\item Many non-critical error messages along the way
\item Fatal errors that still exit with status zero
\item Missing / scattered documentation
\item tcpdump on the local nfX devices does not work, so debugging is only possible on the other endpoint
\item First card: writing tables fails
\item Hardware debug shows some errors, but hardware debug on the correctly working card also shows some errors
\item Debug: ioctl errors when writing table entries
\item Output to all ports: the port mapping is documented only in a test data script
\item hwtest: execution fails due to missing djtgcfg
\item No payload access
\item Many workarounds
\item Table size 63 vs. table size 64
\item Table entries require arguments for all possible actions, not only for the used one
\item Compile time of hours
\item Silent errors
\item Unclear errors: broken board
\item Due to the very fragile nature of the build framework from the NetFPGA-Live repository,
\item Renaming variables in the definition of
\item Reproducibility:
\item Hours spent finding the right output ports
\item Packet size / annotation
\item Needed to debug internal parsing errors
\item Rebooting 3 times to get the card working with the bitstream
\item Variable renaming breaks the compile process
\end{itemize}
\begin{verbatim}
It seems I have been really mistaken for the last weeks.
If I am not totally mistaken, the following is happening with the NetFPGA:
I was testing sending and receiving packets on the same computer; so I sent a packet on nfX and expected an answer on nf0, which is how I wanted to verify that the card works.
So I ran tcpdump on nf0 and sent a packet with ping6 and scapy on nf{0,1,2,3}.
I have never seen the switch emitting ANY packet back with tcpdump.
Now, with the card connected to another host and sending neighbor solicitations, I see duplicated packets on the other host. So it seems that it might have worked all the time, just that tcpdump on nfX on the host that contains the card does not show the packets.
\end{verbatim}
Debugging the generated Tcl code to find the cause of the impl\_1 error
Cable problems:
\begin{verbatim}
[ 488.265148] ixgbe 0000:02:00.0: failed to initialize because an unsupported SFP+ module type was detected.
[ 488.265157] ixgbe 0000:02:00.0: Reload the driver after installing a supported module.
[ 488.265605] ixgbe 0000:02:00.0: removed PHC on enp2s0f0
\end{verbatim}
Function syntax is not supported, using defines instead.
4-6 MB of log files per compile process.
Confusing messages:
\begin{verbatim}
WARNING: command 'get_user_parameter' will be removed in the 2015.3
release, use 'get_user_parameters' instead
\end{verbatim}
Non-critical ``critical'' errors:
\begin{verbatim}
CRITICAL WARNING: [BD 41-737] Cannot set the parameter TRANSLATION_MODE on /axi_interconnect_0. It is read-only.
\end{verbatim}
\begin{verbatim}
- step9 (sume simulation, the longest step) in the process calls
"config_writes.py"
- config_writes.py fails with a syntax error, as it is incomplete
python code
- config_writes.py and config_writes.sh are generated by
gen_config_writes.py
- gen_config_writes.py reads config_writes.txt
- config_writes.txt is created in step 5 (sdnet simulation)
- step 5 consists of running xsc, xelab and xsim
- xsim (re-)generates config_writes.txt according to a watch ls -l
on the file: ${XILINX_VIVADO}/bin/xsim --runall
SimpleSumeSwitch_tb#work.glbl
- it seems (by grep -r) that ./Testbench/SimpleSumeSwitch_tb.sv is
responsible for writing config_writes.txt
- It seems that the "task" "SV_write_control" inside that file is
responsible for writing the content, which in turn uses
axi4_lite_master_write_request_control
\end{verbatim}
\begin{verbatim}
- Cannot easily run P4 on a notebook - changes to the system are very
invasive
- Various compiler bugs/limitations
- Very deep rabbit-hole problems
- Hanging/sleeping issue: unclear whether it does something or
not
- Open impl_1 error with unclear reason
- Log files are referenced that don't exist:
Run output will be captured here: /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log
nico@nsg-System:~/master-thesis/netpfga/log$ ls -alh /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log
ls: cannot access '/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log': No such file or directory
- even "short" compile runs taking 30m+
control_sub_m02_data_fifo_0_synth_1: /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log
nico@nsg-System:~/master-thesis/netpfga/minip4/testdata$ less /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log
/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log: No such file or directory
- Wrong warnings: using 2018.2, getting warnings about things
removed in 2015.3
WARNING: command 'get_user_parameter' will be removed in the 2015.3
release, use 'get_user_parameters' instead
- A script/makefile generates a python script that generates a shell
script and later another python script. If there is a mistake in
generating the first python script (syntax ok, but content is
not correct) then a much later stage of the compile process will
fail due to a syntax error in the third generated
script. However that syntax error is not fatal in the build
process and thus can only be seen with careful analysis of the
logfile, which is around 700 KiB or 10k lines per compile
process and contains 328 lines matching "error" and
"warning".
Most of the error and warning messages seem to be non-critical
(even if they claim to be critical). Then there is a variety of INFO
messages that actually constitute ERROR messages, but are neither
flagged as such nor cause the build process to abort.
\end{verbatim}
\begin{itemize}
\item LPM tables don't work
\item Match type exact: the table must have a size of at least 64
\item Multiple reboots are sometimes required for flashing
\item Damaged, enlarged packets
\end{itemize}
\begin{verbatim}
@ -545,6 +378,14 @@ the learnings of the different layers were very much appreciated / liked
It was a
% ----------------------------------------------------------------------
\section{\label{conclusion:netpfga2}NetFPGA2 - conclusion here}
Development was very time-intensive due to usability problems and
uncertainty about the functionality (compare Sections
\ref{results:netpfga:usability} and \ref{results:netpfga:stability}).
\section{todo - FIXME: remove}
\begin{verbatim}
***** Summary rather short

@ -161,13 +161,14 @@ table entries.
Jool and tayga are supported by
% ----------------------------------------------------------------------
\section{\label{results:netpfga}NetFPGA}
The reduced feature set of the NetFPGA implementation is due to two
factors: the long compile time of 2 to 6 hours per compile run, and
the missing support for payload checksums. Therefore the
implementation covers the general translation, but not the advanced
features.
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:features}Features}
\begin{table}[htbp]
\begin{center}\begin{minipage}{\textwidth}
\begin{tabular}{| c | c | c |}
@ -235,7 +236,6 @@ unsupported\footnote{To support creating payload checksums, either an
\label{tab:p4netpfgafeatures}
\end{center}
\end{table}
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:stability}Stability}
Two different NetFPGA cards were used during the development of the
@ -262,15 +262,99 @@ During the development and benchmarking, the second NetFPGA card stopped to
function properly multiple times. In both cases the card no longer
forwarded packets. Multiple reboots (3 were usually enough) and
reflashing the bitstream to the NetFPGA multiple times usually
restored the intended behaviour. However, due to these ``crashes'', it
was impossible to complete a full benchmark run lasting more than one
hour.
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:performance}Performance}
As expected, the NetFPGA card performed at near line speed and offers
NAT64 translation at 9.28 Gbit/s. Single and multiple streams
performed almost identically, and the results were consistent across
multiple iterations of the benchmarks.
% ----------------------------------------------------------------------
\subsection{\label{results:netpfga:usability}Usability}
To use the NetFPGA, Vivado and SDNET provided by Xilinx need to be
installed. However, a bug in the installer triggers an infinite loop
if a certain shared library\footnote{The required shared library
is libncurses5.} is missing on the target operating system. The
installation program appears to be progressing, but never
finishes.
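A cheap pre-flight check before starting the installer can avoid
running into this infinite loop. The following Python sketch merely
asks the dynamic loader to load the library; the soname
\texttt{libncurses.so.5} is an assumption about how libncurses5 is
named on the target distribution, and the helper
\texttt{has\_shared\_library} is hypothetical and not part of the
Xilinx tools.
\begin{verbatim}
import ctypes


def has_shared_library(soname):
    """Return True if the dynamic loader can load the given soname."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # The loader could not find or load the library.
        return False
    return True


if __name__ == "__main__":
    # Assumption: libncurses5 is installed as libncurses.so.5.
    if has_shared_library("libncurses.so.5"):
        print("libncurses.so.5 found")
    else:
        print("libncurses.so.5 missing: install libncurses5 first")
\end{verbatim}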
While the NetFPGA card supports P4, the toolchains and supporting
scripts are in an immature state. The compilation process consists of
at least 9 different steps, which are interdependent.\footnote{See
source code \texttt{bin/do-all-steps.sh}.} Some of the steps generate
shell scripts and Python scripts that in turn generate JSON
data.\footnote{One compilation step calls the script
``config\_writes.py''. This script failed with a syntax error, as it
contained incomplete Python code. The scripts config\_writes.py
and config\_writes.sh are generated by gen\_config\_writes.py.
The output of the script gen\_config\_writes.py depends on the content
of config\_writes.txt. That file is generated by the simulation
``xsim''. The file ``SimpleSumeSwitch\_tb.sv'' contains code that is
responsible for writing config\_writes.txt and uses a function
named axi4\_lite\_master\_write\_request\_control for generating the
output. This in turn depends on the output of a script named
gen\_testdata.py.}
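Since a generated script such as config\_writes.py with a syntax error
is only noticed much later in the build, it can pay off to check
generated Python scripts right after they were produced. The following
sketch does this with the standard \texttt{ast} module, which parses
the code without executing it; the helper name and where to hook it
into the build steps are assumptions.
\begin{verbatim}
import ast
import sys


def check_generated_python(path):
    """Raise SyntaxError early if a generated Python script is broken."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    # ast.parse only parses the code; nothing in the script is executed.
    ast.parse(source, filename=path)


if __name__ == "__main__":
    # Hypothetical hook: python check_generated.py config_writes.py
    for script in sys.argv[1:]:
        try:
            check_generated_python(script)
        except SyntaxError as err:
            print(f"{script}: {err}", file=sys.stderr)
            sys.exit(1)
\end{verbatim}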
% Checksum computation
However, an incorrect parse results in syntactically incorrect
scripts or in scripts that generate incorrect output. The toolchain
provided by the NetFPGA-P4 repository contains more than 80000 lines
of code. The supporting scripts for setting table entries require
the parameters of all possible actions, not only those of the
selected action. Supplying only the required parameters crashes the
supporting script.
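A possible workaround is to generate the full argument list
automatically and to fill the parameters of unused actions with a
placeholder. The sketch below only builds such a list in Python; the
helper \texttt{full\_action\_args}, the example action and parameter
names, and the assumption that zero is an acceptable placeholder are
hypothetical and not taken from the NetFPGA-P4 scripts. It also
assumes that the arguments are expected in the order in which the
actions and their parameters are declared.
\begin{verbatim}
def full_action_args(action_params, selected, values):
    """Return arguments for all actions, as the table scripts expect.

    action_params: dict mapping each action name to its parameter names.
    selected: the action that is actually used by the table entry.
    values: dict with the parameter values of the selected action.
    """
    if selected not in action_params:
        raise KeyError(f"unknown action: {selected}")
    missing = set(action_params[selected]) - set(values)
    if missing:
        raise ValueError(f"missing parameters for {selected}: "
                         f"{sorted(missing)}")
    args = []
    for action, params in action_params.items():
        for name in params:
            # Unused actions still need a value; 0 is assumed as placeholder.
            args.append(values[name] if action == selected else 0)
    return args


if __name__ == "__main__":
    # Hypothetical actions of a translation table.
    actions = {"set_egress": ["port"],
               "nat64_rewrite": ["v4_addr", "port"]}
    print(full_action_args(actions, "set_egress", {"port": 4}))  # [4, 0, 0]
\end{verbatim}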
The documentation for the NetFPGA-P4 repository is scattered across
many places and does not contain a reference on how to use the
tools. The mapping of egress ports to their metadata field values is
only found in a Python script that is used for generating test data.
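The egress port encoding itself is small enough to be written down
explicitly. The sketch below assumes the one-hot encoding of the
\texttt{dst\_port} metadata field used by the P4-NetFPGA example test
data scripts, where the physical port nf$i$ corresponds to bit $2i$
and the DMA ports use the odd bits; this should be verified against
the test data script of the repository version in use. Sending a
packet out of all physical ports then means setting all four even
bits.
\begin{verbatim}
# Assumption: one-hot encoding of the 8-bit dst_port field, with the
# physical ports nf0..nf3 on the even bits and DMA ports on the odd bits.
NF_PORTS = {f"nf{i}": 1 << (2 * i) for i in range(4)}
DMA_PORTS = {f"dma{i}": 1 << (2 * i + 1) for i in range(4)}


def dst_port_mask(*ports):
    """Combine port names into a dst_port bitmask."""
    mask = 0
    for port in ports:
        if port in NF_PORTS:
            mask |= NF_PORTS[port]
        else:
            mask |= DMA_PORTS[port]  # KeyError for unknown port names
    return mask


# Output on all physical ports: nf0 | nf1 | nf2 | nf3.
ALL_NF_PORTS = dst_port_mask(*NF_PORTS)

if __name__ == "__main__":
    print(f"nf1 -> {dst_port_mask('nf1'):#010b}")
    print(f"all nf ports -> {ALL_NF_PORTS:#010b}")
\end{verbatim}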
The compile process can take up to 6 hours, and because the different
steps are interdependent, in our experience errors in an earlier stage
were detected only hours after they happened. The resulting log
files of the compilation process can be up to 5 MB in size. Within
these log files, various commands reference other log files;
however, the referenced log files exist neither before nor after the
compile process.
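Such dangling references can at least be detected automatically after
a run. The following sketch scans a log for absolute paths ending in
\texttt{.log}, for instance in the ``Run output will be captured
here'' messages, and reports the ones that do not exist on disk; it is
a heuristic, and the helper name is hypothetical.
\begin{verbatim}
import os
import re
import sys

# An absolute path ending in ".log" that starts at the beginning of a
# line or after whitespace, e.g. ".../synth/runme.log".
LOG_PATH = re.compile(r"(?:^|(?<=\s))(/\S+?\.log)(?=\s|$)")


def dangling_log_refs(log_text):
    """Return referenced log files that do not exist on disk."""
    referenced = []
    for line in log_text.splitlines():
        referenced.extend(LOG_PATH.findall(line))
    # dict.fromkeys removes duplicates while keeping the order.
    return [p for p in dict.fromkeys(referenced) if not os.path.exists(p)]


if __name__ == "__main__":
    # Usage: python dangling_logs.py build.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for missing in dangling_log_refs(f.read()):
                print("referenced but missing:", missing)
\end{verbatim}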
During the compile process, various informational, warning and error
messages are printed. However, some informational messages constitute
critical errors, while on the other hand messages flagged as critical
errors and even syntax errors are often not treated as fatal by the
build process.\footnote{For instance, ``CRITICAL WARNING: [BD 41-737] Cannot set the
parameter TRANSLATION\_MODE on /axi\_interconnect\_0. It is
read-only.'' is a non-critical warning.}
The tools also produce contradictory
output.\footnote{While using version 2018.2, the following
message was printed: ``WARNING: command 'get\_user\_parameter' will be removed in the 2015.3
release, use 'get\_user\_parameters' instead''.}
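Because the printed severity does not reliably indicate whether a
message matters, counting messages per severity at least gives a quick
overview of a large log file. The sketch assumes the Vivado-style
prefixes \texttt{INFO:}, \texttt{WARNING:}, \texttt{CRITICAL WARNING:}
and \texttt{ERROR:} at the beginning of a line; it only counts
messages and cannot decide which of them are actually fatal.
\begin{verbatim}
import collections
import re
import sys

# Severity prefix at the very beginning of a log line.
SEVERITY = re.compile(r"^(INFO|CRITICAL WARNING|WARNING|ERROR):")


def count_severities(lines):
    """Count log lines per severity prefix."""
    counts = collections.Counter()
    for line in lines:
        match = SEVERITY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_severities.py build.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for severity, n in sorted(count_severities(f).items()):
                print(f"{severity:16} {n}")
\end{verbatim}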
The NetFPGA kernel module provides virtual Linux network devices
(nf0 to nf3). However, tcpdump on these devices does not see any
packets that are emitted by the switch. The only way to capture
packets emitted by the switch is to connect a physical cable to the
port and to capture on the other end.
Jumbo frames\footnote{Ethernet frames with more than 1500 bytes of
payload.} are commonly used in 10 Gbit/s networks. According to
\cite{wikipedia:_jumbo}, even many gigabit network interface cards
support jumbo frames. However, according to emails on the private
NetFPGA mailing list, the NetFPGA currently supports only an MTU of
1500 bytes, and additional work is required to implement support for
bigger frames.
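For NAT64 this limit matters directly: the IPv6 header is 40 bytes
long, while an IPv4 header without options is 20 bytes long, so
translating such an IPv4 packet to IPv6 grows it by 20 bytes. The
following sketch computes the size of the translated packet, assuming
that IPv4 options are dropped during translation and that no IPv6
fragment header is added, and checks it against the 1500-byte limit.
\begin{verbatim}
IPV6_HEADER_LEN = 40
NETFPGA_MTU = 1500  # limit reported for the NetFPGA


def ipv6_len_after_nat64(ipv4_total_len, ihl=5):
    """Size of the IPv6 packet produced from an IPv4 packet.

    ihl is the IPv4 header length in 32-bit words (5 = no options).
    Assumes IPv4 options are dropped and no fragment header is added.
    """
    payload_len = ipv4_total_len - ihl * 4
    return IPV6_HEADER_LEN + payload_len


def fits_mtu(ipv4_total_len, ihl=5, mtu=NETFPGA_MTU):
    """True if the translated IPv6 packet fits into the given MTU."""
    return ipv6_len_after_nat64(ipv4_total_len, ihl) <= mtu


if __name__ == "__main__":
    print(ipv6_len_after_nat64(1500))  # 1520: exceeds a 1500-byte MTU
    print(fits_mtu(1480))              # True: 1480 - 20 + 40 = 1500
\end{verbatim}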
While most of the P4 language is supported on the NetFPGA, some key
features are missing.
\begin{itemize}
\item Analysing or accessing the payload is not supported
\item Checksum computation over the payload is not supported
\item Using LPM tables can lead to compilation errors
\item Depending on the match type, only certain table sizes are allowed
\end{itemize}
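With compile runs of several hours, it can be worthwhile to check such
restrictions before starting a compilation. The sketch below encodes
only the restrictions observed in this work, namely that exact-match
tables need at least 64 entries and that LPM tables may fail to
compile; the \texttt{TableSpec} description is hypothetical, and other
match types are not checked.
\begin{verbatim}
from dataclasses import dataclass

MIN_EXACT_TABLE_SIZE = 64  # observed minimum for exact-match tables


@dataclass
class TableSpec:
    """Hypothetical description of a P4 table for a pre-compile check."""
    name: str
    match_type: str  # "exact", "lpm", "ternary", ...
    size: int


def check_tables(tables):
    """Return a list of problems found before starting a long compile."""
    problems = []
    for t in tables:
        if t.match_type == "exact" and t.size < MIN_EXACT_TABLE_SIZE:
            problems.append(f"{t.name}: exact match tables need size >= "
                            f"{MIN_EXACT_TABLE_SIZE}, got {t.size}")
        if t.match_type == "lpm":
            problems.append(f"{t.name}: LPM tables may fail to compile")
    return problems


if __name__ == "__main__":
    tables = [TableSpec("v6_networks", "exact", 63),
              TableSpec("v4_routes", "lpm", 64)]
    for problem in check_tables(tables):
        print(problem)
\end{verbatim}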
Renaming variables in the declaration of the parser or deparser leads
to compilation errors. Function syntax is not supported either. For
this reason, our implementation uses \texttt{\#define} statements
instead of functions.
Trace files
\begin{verbatim}

Binary file not shown.

@ -508,7 +508,6 @@ nf3: ERROR while getting interface flags: No such device
nico@nsg-System:~/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/bitfiles$
\end{verbatim}
% ----------------------------------------------------------------------
\section{\label{chapterB:netpfga-kernelmodule}NetFPGA Kernel module}
After a successful flash, loading the kernel module makes the nf
devices appear in the operating system.
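Whether the devices actually appeared can be checked without
additional tools by looking at the network interfaces listed in sysfs.
The sketch assumes the interface names nf0 to nf3 and the standard
Linux directory \texttt{/sys/class/net}; it does not check whether the
interfaces are up.
\begin{verbatim}
import os

NF_INTERFACES = ("nf0", "nf1", "nf2", "nf3")


def missing_interfaces(names=NF_INTERFACES, sysfs="/sys/class/net"):
    """Return the interfaces from names that are not present in sysfs."""
    return [n for n in names if not os.path.exists(os.path.join(sysfs, n))]


if __name__ == "__main__":
    missing = missing_interfaces()
    if missing:
        print("missing interfaces (is the kernel module loaded?):",
              ", ".join(missing))
    else:
        print("all nf interfaces present")
\end{verbatim}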
@ -580,13 +579,15 @@ nico@nsg-System:~$
\end{verbatim}
% ----------------------------------------------------------------------
\section{\label{chapterB:netpfga-nftraffic}NetFPGA misses packets on nf*}
While the nf devices appear in the operating system, packets emitted
by the NetFPGA cannot be sniffed on the nf interfaces
directly. Instead, one has to sniff packets on a physical network card
that is connected to the specific output port.
% ----------------------------------------------------------------------
%---------------------------------------------------------------------------------------------------------
\chapter{\label{benchmark}Benchmark Logs}
% ----------------------------------------------------------------------

@ -135,3 +135,10 @@
author = {Hendrik Züllig, Supervisor; Prof. Dr. Laurent Vanbever; Tutor: Tobias Bühler},
title = {P4-Programming on an FPGA, Semester Thesis SA-2019-02},
howpublished = {\url{https://gitlab.ethz.ch/nsg/student-projects/sa-2019-02_p4_programming_sume_netfpga/blob/master/SA-2019-02.pdf}}}
@Misc{wikipedia:_jumbo,
author = {Wikipedia},
title = {Jumbo frame},
howpublished = {\url{https://en.wikipedia.org/wiki/Jumbo_frame}},
note = {Requested on 2019-08-15}}