\chapter{\label{summary}Conclusion}
%** Summary.tex: What have you achieved, what have you presented in this
% document. What are the highlights of your work.
% It should conclude by a conclusion. Sum up what you have done and recapitulate your key findings.

%\section{\label{conclusion:overall}Overall}

\section{\label{conclusion:softwarenat64}Software based NAT64}

\section{\label{conclusion:general}General}
Many misleading

\section{\label{conclusion:bmv2}BMV2}

\section{\label{conclusion:P4}P4}
NDP parsing problem
checksumming: a frequent problem and helper
Many possibilities
Protocol independent
Easy architecture
Limitations: no \texttt{if} in actions
python2 only - unicode errors
IPv6 NDP: not easy to parse, as it has an unknown number of following fields
No support for multiple LPM keys in a table; this can be solved with ternary matching.
\texttt{switch} cannot be used in actions
If things don't work, it is often a checksum problem.
If it is the frame checksum, then the length of the packet is broken.

\begin{verbatim}
p4c --target bmv2 --arch v1model --std p4-16 "../p4src/static-mapping.p4" -o "/home/p4/master-thesis/p4src"
../p4src/static-mapping.p4(366): error: Program is not supported by this target, because table MyIngress.v6_networks has multiple successors
table v6_networks {
      ^^^^^^^^^^^
\end{verbatim}

\begin{verbatim}
ipaddress.ip_network("2001:db8:61::/64")
IPv6Network(u'3230:3031:3a64:6238:3a36:313a:3a2f:3634/128')
Fix: from __future__ import unicode_literals
\end{verbatim}

The tooling around P4 is still fragile: we encountered many bugs
during development~\cite{schottelius:github1675} as well as missing features
(\cite{schottelius:github745}, \cite{theojepsen:_get}).

Hitting expression bug

retrieving information from tables
\begin{verbatim}
Key and mask for matching destination is in table. We need this
information in the action. However this information is not exposed, so
we need to specify another parameter with the same information as in
the key(s).

Log from slack: (2019-03-14)

nico [1:55 PM]
If I use LPM for matching, can I easily get the network address from
P4 or do I have to use a bitmask myself? In the latter case it is not
exactly clear how to get the mask from the table

Nate Foster [1:58 PM]
You want to retrieve the address in the packet? In a table? And do you
want to do the retrieving from the data plane or the control plane?

nico [2:00 PM]
If I have a match in a table that matches on LPM, it can be any IP
address in a network
For calculating the NAT64/NAT46 translation, I will need the base
address, i.e. network address to do subtractions/additions
So it is fully data plane, what I would like to do
I'll commit sample code to show the use case more clearly
https://gitlab.ethz.ch/nicosc/master-thesis/blob/master/p4src/static-mapping.p4#L73
So the action nat64_static() is used in the table v6_networks.
In v6_networks I use a match on `hdr.ipv6.dst_addr: lpm;`
What I would like to be able is to get the network address ; I can do
that manually, if I have the mask
I can also re-inject this parameter by another action argument, but
I'd assume that I can somewhere read this out from the table / match

Nate Foster [2:15 PM]
To make sure I understand, in the data plane, you want to retrieve the
address in the lpm pattern?

nico [2:16 PM]
I want to retrieve the key

Nate Foster [2:16 PM]
Wait. The value `hdr.ipv6.dst_addr` is the thing used in the match. So
you have that. What you don’t have is the IPv6 address and mask put
into the table by the control plane. I assume you want the latter,
right?

nico [2:17 PM]
For example, if my matching key is 2001:db8::/32 and the real address
is 2001:db8::f00, then I would like to retrieve 2001:db8:: and 32 from
the table
exactly
I can "fix" this by adding another argument, but it feels somewhat
wrong to do that
Because the table already knows this information

Nate Foster [2:26 PM]
I can’t think of a way other than the action parameter hack.

nico [2:26 PM]
Oh, ok
Is it because the information is "lost in hardware"?

Nate Foster [2:31 PM]
No you’re right that most implementations have the value in memory.
And one can imagine a different table API that allowed one to retrieve
it in the data plane. But unless I am missing something obvious, P4
hides it…
\end{verbatim}

no meta information
\begin{verbatim}
Is there any meta information for "from which table was the action
called" available? My use case is having a debug action that sends
packets to the controller and I use it as a default_action in various
tables; however now I don't know anymore from which table the action
was called. Is there any kind of meta information which table called
me available?

I could work around this by using if(! .. .hit) { my_action(table_id) },
but it would not work with using default_action = ...
\end{verbatim}

type definitions separate
Code sharing (controller, switch)
\begin{verbatim}
*** DONE Synchronisation with the controller
- Double data type definition -> might differ
- TYPE_CPU for ethernet
- Port ingress offset (9 vs. 16 bit)
\end{verbatim}

No switch in actions,
No conditional execution in actions

P4os - reusable code
\begin{verbatim}
Not addressed so far: how to create re-usable code fragments that can
be plugged in easily. There could be a hypothetical "P4OS" that
manages code fragments. This might include, but not limited to
downloading (signed?) source code, managing dependencies similar to
Linux package management, handling updates, etc.
\end{verbatim}

idiomatic problem:
Security issue: not checking checksums before

% ----------------------------------------------------------------------
\section{\label{conclusion:netpfga}NetFPGA - all HERE}
personal note here
tested various kernels for table debugging
MTU limitations: 1500 according to a private mail from Salvator Galea (Cambridge, UK)
long compile process
error prone compile process
many dependencies
LPM not supported!
NetFPGA-Live, Vivado, SDNet: xx k lines of supporting code
Vivado installation: silent errors, infinite loop, missing libncurses5
82k lines of code that are interdependent
Many non-critical error messages on the way
Fatal errors with a zero exit status
missing / scattered documentation
tcpdump on local nfX doesn't work -> can only debug on other endpoint
First card: writing tables fails
hardware debug shows some errors, but hardware debug on the correct card also shows some errors
Debug: ioctl errors when writing table entries
Output all ports -> port mapping documented only in a testdata script
hwtest: execution fails due to missing djtgcfg
no payload access
Many workarounds
Table size 63, table size 64
Table entries require arguments of all possible actions, not only the used one.
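
The notes above mention fatal errors that still end with a zero exit status and many messages whose severity label does not match their real impact. One way to cope with this is to classify the log lines directly instead of trusting the exit code. The following is only a minimal sketch in Python 3, not part of the NetFPGA tooling: the function names \texttt{summarize\_log} and \texttt{has\_errors} are hypothetical, and the sketch assumes that messages start with one of the prefixes \texttt{ERROR:}, \texttt{CRITICAL WARNING:}, \texttt{WARNING:} or \texttt{INFO:}, optionally followed by a message id in square brackets, as in the log excerpts quoted in this chapter.

```python
import re
from collections import Counter

# Assumption: severity prefix at the start of the line, optionally followed
# by a message id in square brackets, e.g. "ERROR: [VRFC 10-1491] ...".
SEVERITY = re.compile(r"^(ERROR|CRITICAL WARNING|WARNING|INFO):\s*(?:\[([^\]]+)\])?")


def summarize_log(lines):
    """Count messages per (severity, message id); the id is None if absent."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def has_errors(counts):
    """Return True if at least one line was flagged as ERROR."""
    return any(severity == "ERROR" for severity, _ in counts)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 logscan.py build.log
    with open(sys.argv[1], errors="replace") as logfile:
        summary = summarize_log(logfile)
    for (severity, msg_id), count in sorted(summary.items(), key=str):
        print(count, severity, msg_id)
    # Fail based on the log content, independent of the tool's exit status.
    sys.exit(1 if has_errors(summary) else 0)
```

Grouping by message id makes it easier to spot which of the repeated messages are new in a given run. It does not decide which messages are actually harmless; that still requires manual analysis.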
Compile time: hours
Silent errors
Unclear errors: broken board
Due to the very fragile nature of the build framework from the NetFPGA-Live repository,
Renaming VARIABLES in the definition of
Reproducibility: hours for finding right output ports
packet size / annotation
Needed to debug internal parsing errors
3x rebooting to get card working with bitstream
Variable renaming breaks the compile process
\begin{verbatim}
It seems I was really mistaken for the last weeks
If I am not totally mistaken, the following is happening with the
NetFPGA:
I was testing sending and receiving packets on the same computer; so I
sent a packet on nfX and expected an answer on nf0, which is how I
wanted to verify that the card works
So I ran tcpdump on nf0, sent a packet with ping6 and scapy on
nf{0,1,2,3}
I have never seen the switch emitting ANY packet back with tcpdump
Now with the card connected to another host, sending neighbor
solicitation, I see duplicated packets on the other host - so it seems
that it might have worked all the time, just that tcpdump on nfX on
the host which contains the card does not show the packets
\end{verbatim}

debugging generated tcl code to debug impl1 error

Cable problems:
\begin{verbatim}
[  488.265148] ixgbe 0000:02:00.0: failed to initialize because an unsupported SFP+ module type was detected.
[  488.265157] ixgbe 0000:02:00.0: Reload the driver after installing a supported module.
[  488.265605] ixgbe 0000:02:00.0: removed PHC on enp2s0f0
\end{verbatim}

function syntax not supported, using defines instead
4-6 MB logfiles for a compile process.

confusing messages
\begin{verbatim}
WARNING: command 'get_user_parameter' will be removed in the 2015.3 release, use 'get_user_parameters' instead
\end{verbatim}

critical non-critical errors
\begin{verbatim}
CRITICAL WARNING: [BD 41-737] Cannot set the parameter TRANSLATION_MODE on /axi_interconnect_0. It is read-only.
\end{verbatim}

\begin{verbatim}
- step9 (sume simulation, the longest step) in the process calls "config_writes.py"
- config_writes.py fails with a syntax error, as it is incomplete python code
- config_writes.py and config_writes.sh are generated by gen_config_writes.py
- gen_config_writes.py reads config_writes.txt
- config_writes.txt is created in step 5 (sdnet simulation)
- step 5 consists of running xsc, xelab and xsim
- xsim (re-)generates config_writes.txt according to a watch ls -l on the file:
  ${XILINX_VIVADO}/bin/xsim --runall SimpleSumeSwitch_tb#work.glbl
- it seems (by grep -r) that ./Testbench/SimpleSumeSwitch_tb.sv is
  responsible for writing config_writes.txt
- It seems that the "task" "SV_write_control" inside that file is
  responsible for writing the content, which in turn uses
  axi4_lite_master_write_request_control
\end{verbatim}

\begin{verbatim}
- Cannot easily run P4 on notebook
- changes to the system very invasive
- Various compiler bugs/limitations
- Very very deep rabbithole problems
- Hanging/sleeping issue -- unclear whether it does something or not
- Open impl_1 error with unclear reason
- logfiles referenced that don't exist

Run output will be captured here: /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log
nico@nsg-System:~/master-thesis/netpfga/log$ ls -alh /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log
ls: cannot access '/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/synth/runme.log': No such file or directory

- even "short" compile runs taking 30m+

control_sub_m02_data_fifo_0_synth_1:
/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log
nico@nsg-System:~/master-thesis/netpfga/minip4/testdata$ less /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log
/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/hw/project/simple_sume_switch.runs/control_sub_m02_data_fifo_0_synth_1/runme.log: No such file or directory

- Wrong warnings: using 2018.2, getting warnings about things removed in 2015.3
  WARNING: command 'get_user_parameter' will be removed in the 2015.3 release, use 'get_user_parameters' instead

- A script/makefile generates a python script that generates a shell
  script and later then a python script. If there is a mistake in
  generating the first python script (syntax ok, but content is not
  correct) then a much later stage of the compile process will fail
  due to a syntax error in the third generated script. However that
  syntax error is not fatal in the build process and thus can only be
  seen with careful analysis of the logfile, which is around 700 KiB
  or 10k lines per compile process and contains 328 lines matching
  "error" and "warning". Most of the error and warning messages seem
  to be non-critical (even if saying they are). Then there are a
  variety of INFO messages that actually constitute ERROR messages,
  but are not flagged as such nor do they cause the build process to
  abort.
\end{verbatim}

LPM tables don't work
match type exact - table must be at least 64 in size
multiple reboots sometimes required for flashing
Damaged, enlarged packets
\begin{verbatim}
** The NetFPGA saga
Problems encountered:
- The logfile for a compile run is 10k+ lines
- Many logged errors can actually be ignored (?)
  like:
ERROR: [VRFC 10-1491] unexpected EOF [/home/nico/master-thesis/netpfga/minip4/nf_sume_sdnet_ip/SimpleSumeSwitch/S_CONTROLLERs.HDL/S_CONTROLLER_SimpleSumeSwitch.vp:37]
ERROR: [VRFC 10-426] cannot find port tuple_out_sume_metadata_DATA on this module [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/wrapper/nf_sume_sdnet.v:219]
ERROR: [VRFC 10-426] cannot find port tuple_out_sume_metadata_VALID on this module [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/wrapper/nf_sume_sdnet.v:218]
ERROR: [VRFC 10-426] cannot find port tuple_in_sume_metadata_DATA on this module [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/wrapper/nf_sume_sdnet.v:185]
ERROR: [VRFC 10-426] cannot find port tuple_in_sume_metadata_VALID on this module [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/wrapper/nf_sume_sdnet.v:184]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:332]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:343]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:354]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:436]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:474]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:502]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:533]
ERROR: [VRFC 10-2063] Module not found while processing module instance [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/nf_sume_sdnet_ip/nf_sume_sdnet_ip/SimpleSumeSwitch/SimpleSumeSwitch.v:561]
# launch_simulation -simset sim_1 -mode behavioral
INFO: [Vivado 12-5698] Checking validity of IPs in the design for the 'XSim' simulator...
CRITICAL WARNING: [BD 41-1356] Address block is not mapped into . Please use Address Editor to either map or exclude it.
CRITICAL WARNING: [BD 41-1356] Address block is not mapped into . Please use Address Editor to either map or exclude it.
WARNING: [VRFC 10-756] identifier state is used before its declaration [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/axis_sim_record_ip0/hdl/axis_sim_record.v:93]
WARNING: [VRFC 10-756] identifier ready_count is used before its declaration [/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.srcs/sources_1/ip/axis_sim_record_ip0/hdl/axis_sim_record.v:94]
INFO: [#UNDEF] Sorry, too many errors..
ERROR: [XSIM 43-3322] Static elaboration of top level Verilog design unit(s) in library work failed.
INFO: [USF-XSim-69] 'elaborate' step finished in '1' seconds
INFO: [USF-XSim-99] Step results log file:'/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.sim/sim_1/behav/xsim/elaborate.log'
ERROR: [USF-XSim-62] 'elaborate' step failed with error(s). Please check the Tcl console output or '/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.sim/sim_1/behav/xsim/elaborate.log' file for more information.

nico@nsg-System:~/master-thesis$ find . -name elaborate.log
nico@nsg-System:~/master-thesis$ find ~ -name elaborate.log
nico@nsg-System:~/master-thesis$

- Scripts that "fail" (generate wrong data) do exit 0
  -> There is no easy / reliable error detection
- Writing tables resulted in ioctl errors
- Hardware test: unclear if first board was/is broken or not,
  BUT: second board in different computer allows writing tables
- Many scripts depend on each other in later stages, without clear
  dependencies
- There is basically no documentation for someone who "just wants to
  compile from P4 to netpfga" or A LOT of documentation (if vivado,
  vhdl, sdnet documentation is counted)
- Very high complexity in toolchain, scripts that are generated

+ cd /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/test/sim_switch_default
+ make
rm -f config_writes.py*
rm -f *.pyc

nico@nsg-System:~$ cat /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/testdata/config_writes.py
from NFTest import *

NUM_WRITES = 4

def config_tables():
    nftest_regwrite(0x44020050, 0x22222208)
    nftest_regwrite(0x44020054, 0x00000822)
    nftest_regwrite(0x44020080, 0x00000201)
    nftest_regwrite(0x44020040, 0x00000001)
nico@nsg-System:~$ cat /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/testdata/config_writes.sh
#!/bin/bash

${SUME_SDNET}/sw/sume/rwaxi -a 0x44020050 -w 0x22222208
${SUME_SDNET}/sw/sume/rwaxi -a 0x44020054 -w 0x00000822
${SUME_SDNET}/sw/sume/rwaxi -a 0x44020080 -w 0x00000201
${SUME_SDNET}/sw/sume/rwaxi -a 0x44020040 -w 0x00000001
nico@nsg-System:~$

- Misleading errors like
ERROR: [USF-XSim-62] 'elaborate' step failed with error(s). Please check the Tcl console output or '/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.sim/sim_1/behav/xsim/elaborate.log' file for more information.
nico@nsg-System:~/master-thesis/netpfga$ ls /home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.sim/sim_1/behav/xsim/elaborate.log
ls: cannot access '/home/nico/master-thesis/netpfga/minip4/simple_sume_switch/hw/project/simple_sume_switch.sim/sim_1/behav/xsim/elaborate.log': No such file or directory

- not using raise() and hiding source of errors (_hexify)
- sometimes flashing fails:

#+BEGIN_CENTER
nico@nsg-System:~/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/bitfiles$ sudo bash -c ". $HOME/master-thesis/netpfga/bashinit && $(pwd -P)/program_switch.sh"
++ which vivado
+ xilinx_tool_path=/opt/Xilinx/Vivado/2018.2/bin/vivado
+ bitimage=minip4.bit
+ configWrites=config_writes.sh
+ '[' -z minip4.bit ']'
+ '[' -z config_writes.sh ']'
+ '[' /opt/Xilinx/Vivado/2018.2/bin/vivado == '' ']'
+ rmmod sume_riffa
+ xsct /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/tools/run_xsct.tcl -tclargs minip4.bit
rlwrap: warning: your $TERM is 'screen' but rlwrap couldn't find it in the terminfo database. Expect some problems.
RUN loading image file.
minip4.bit
100% 19MB 1.7MB/s 00:11
fpga configuration failed.
DONE PIN is not HIGH
    invoked from within
"::tcf::eval -progress ::xsdb::print_progress {::tcf::cache_enter tcfchan#0 {tcf_cache_eval {process_tcf_actions_cache_client ::tcfclient#0::arg}}}"
    (procedure "::tcf::cache_eval_with_progress" line 2)
    invoked from within
"::tcf::cache_eval_with_progress [dict get $arg chan] [list process_tcf_actions_cache_client $argvar] $progress"
    (procedure "process_tcf_actions" line 1)
    invoked from within
"process_tcf_actions $arg ::xsdb::print_progress"
    (procedure "fpga" line 430)
    invoked from within
"fpga -f $bitimage"
    (file "/home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/tools/run_xsct.tcl" line 33)
+ bash /home/nico/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/tools/pci_rescan_run.sh
Check programming FPGA or Reboot machine !
+ rmmod sume_riffa
rmmod: ERROR: Module sume_riffa is not currently loaded
+ modprobe sume_riffa
+ ifconfig nf0 up
nf0: ERROR while getting interface flags: No such device
+ ifconfig nf1 up
nf1: ERROR while getting interface flags: No such device
+ ifconfig nf2 up
nf2: ERROR while getting interface flags: No such device
+ ifconfig nf3 up
nf3: ERROR while getting interface flags: No such device
+ bash config_writes.sh
nico@nsg-System:~/projects/P4-NetFPGA/contrib-projects/sume-sdnet-switch/projects/minip4/simple_sume_switch/bitfiles$
#+END_CENTER
\end{verbatim}

\section{\label{conclusion:realworld}Real world applications}
Can be deployed using the NetFPGA, or using Barefoot or Arista.

\section{\label{conclusion:outlook}Outlook}
%** Outlook.tex: What needs to be done further, what is planned
% What are the consequences of your work for future work?
Different HW
Speed only limited to line speed. Could be running at 100 Gbit/s without modifications.
PMTU handling
error cases

Our algorithm uses the IPv4-Compatible IPv6 Address~\cite{rfc4291} to
embed IPv4 addresses. However, RFC 6052~\cite{rfc6052} defines
different embeddings depending on the prefix size.
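
The prefix-dependent embedding can be illustrated with a short example. The following is a minimal Python 3 sketch and not part of the implementation presented in this thesis; the function name \texttt{embed\_rfc6052} is hypothetical. It assumes the layout of RFC 6052, section 2.2: the allowed prefix lengths are 32, 40, 48, 56, 64 and 96 bits, the IPv4 address follows the prefix directly, and bits 64 to 71 stay zero, so that the IPv4 address is split around them for prefixes shorter than 96 bits.

```python
import ipaddress

# Prefix lengths permitted by RFC 6052, section 2.2.
ALLOWED_PREFIX_LENGTHS = (32, 40, 48, 56, 64, 96)


def embed_rfc6052(prefix, ipv4):
    """Embed an IPv4 address into an IPv6 prefix (hypothetical helper)."""
    network = ipaddress.ip_network(prefix)
    if network.version != 6 or network.prefixlen not in ALLOWED_PREFIX_LENGTHS:
        raise ValueError("not a valid RFC 6052 prefix: %s" % prefix)

    result = bytearray(network.network_address.packed)
    position = network.prefixlen // 8  # first byte after the prefix

    for octet in ipaddress.IPv4Address(ipv4).packed:
        if position == 8:
            # Byte 8 (bits 64 to 71) must remain zero, so skip it.
            position += 1
        result[position] = octet
        position += 1

    return ipaddress.IPv6Address(bytes(result))


if __name__ == "__main__":
    # Example from RFC 6052, section 2.4.
    print(embed_rfc6052("2001:db8::/32", "192.0.2.33"))  # 2001:db8:c000:221::
```

With a /96 prefix the loop never reaches byte 8 and the IPv4 address simply occupies the last 32 bits. The shorter prefixes are the cases in which the address is split.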
A future version should support these schemes to be compatible with
other implementations.

No fragmentation
No address / MAC learning

**** No DNS64
has already been solved in a different domain - could even do
transparent / in-network modification

**** Incomplete NDP
Very limited option support
No resolution of hardware addresses

\section{\label{conclusion:closing}Closing words (NAME?)}
While the port to NetFPGA was significantly more effort than expected,
the learnings about the different layers were very much appreciated.
It was a

\section{todo - FIXME: remove}
\begin{verbatim}
***** Summary rather short
***** Outlook as subsection!
\end{verbatim}