NVIDIA Data Processing Units

From Beam Line Controls
Latest revision as of 18:50, 17 May 2023

Introduction

NVIDIA DPUs are expansion cards that offload certain network-traffic-related tasks from the host CPU. Each card comprises an ARM CPU, memory, and a high-speed ConnectX NIC on the same board. Network packets to and from the host can be manipulated by the ARM CPU using programs written with the DPDK or DOCA SDKs. The DPU ARM system runs its own OS; as shipped by NVIDIA, this is currently Ubuntu Linux.

DPU host software setup

BlueField DPU Administrator Quick Start Guide (NVIDIA)

On a RHEL8 machine, first install the RPM package which contains the DOCA and DPU-related packages. This package both provides a local copy of the necessary RPMs and enables a YUM repo for updates.

$ wget https://www.mellanox.com/downloads/DOCA/DOCA_v1.5.1/doca-host-repo-rhel86-1.5.1-0.1.8.1.5.1007.1.el8.5.8.1.1.2.1.x86_64.rpm
# yum install ./doca-host-repo-rhel86-1.5.1-0.1.8.1.5.1007.1.el8.5.8.1.1.2.1.x86_64.rpm

We find that the NVIDIA repos tend to time out when accessed from the APS, so add the following to the end of /etc/yum.conf:

minrate=10 
timeout=300
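
If the host is set up more than once, it may help to add these two options only when they are not already present. The following is a minimal sketch, not part of yum or DOCA: set_conf_option is a hypothetical helper, and the demonstration runs on a scratch file rather than the real /etc/yum.conf.

```shell
#!/bin/sh
# Append a "key=value" line to a config file only if that key is not already set.
# Hypothetical helper; not part of yum or DOCA.
set_conf_option() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        echo "${key} already set in ${file}; leaving it unchanged"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstrate on a scratch copy rather than the real /etc/yum.conf.
conf=$(mktemp)
printf '[main]\ngpgcheck=1\n' > "$conf"
set_conf_option "$conf" minrate 10
set_conf_option "$conf" timeout 300
set_conf_option "$conf" timeout 300   # second call is a no-op
cat "$conf"
```

To apply it for real, run the two set_conf_option calls as root against /etc/yum.conf. This assumes [main] is the last section in that file, as it is by default.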

Then install the necessary RPMs, allowing for downgrades and package removals:

# yum makecache 
# yum install --allowerasing --nobest doca-runtime doca-tools pv

rshim is a userspace tool which allows for configuration of NVIDIA Mellanox cards. Check that it is running with systemctl status rshim (look for "loaded", "enabled", and "active (running)").
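
For use in a setup script, the same check can be made without reading the status output by eye. This is a sketch only: unit_ready is a hypothetical helper name, while systemctl is-active and systemctl is-enabled are standard systemd subcommands.

```shell
#!/bin/sh
# Return success only when the unit is both running now and enabled at boot.
# Arguments are the strings printed by "systemctl is-active" and "systemctl is-enabled".
unit_ready() {
    [ "$1" = "active" ] && [ "$2" = "enabled" ]
}

if command -v systemctl >/dev/null 2>&1; then
    active=$(systemctl is-active rshim 2>/dev/null) || true
    enabled=$(systemctl is-enabled rshim 2>/dev/null) || true
    if unit_ready "$active" "$enabled"; then
        echo "rshim: running and enabled"
    else
        echo "rshim: is-active=${active:-unknown} is-enabled=${enabled:-unknown}"
        echo "try: systemctl enable --now rshim"
    fi
fi
```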

mst, or Mellanox Software Tools, is a userspace program which creates a device tree used for configuration.

# mst start 
Starting MST (Mellanox Software Tools) driver set 
Loading MST PCI module - Success 
Loading MST PCI configuration module - Success 
Create devices 
Unloading MST PCI module (unused) - Success 
# mst status -v 
MST modules: 
------------ 
    MST PCI module is not loaded 
    MST PCI configuration module loaded 
PCI devices: 
------------ 
DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA   
BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     ca:00.0   mlx5_0          net-ib0                   1
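
The /dev/mst/... path in the MST column is the device name used by the later mlxconfig commands. As a sketch, it can be pulled out of the status output rather than copied by hand; mst_devices is a hypothetical helper, and the sample line is the one shown above.

```shell
#!/bin/sh
# Print the MST device path(s) found in "mst status -v" output read from stdin.
mst_devices() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\/dev\/mst\//) print $i }'
}

# Sample line copied from the output above; on a real host use:
#   mst status -v | mst_devices
sample='BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     ca:00.0   mlx5_0          net-ib0                   1'
dev=$(printf '%s\n' "$sample" | mst_devices)
echo "$dev"
```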

Get a new BlueField OS system image:

$ wget https://content.mellanox.com/BlueField/BFBs/Ubuntu20.04/DOCA_1.5.1_BSP_3.9.3_Ubuntu_20.04-4.2211-LTS.signed.bfb

and install it:

# bfb-install --bfb DOCA_1.5.1_BSP_3.9.3_Ubuntu_20.04-4.2211-LTS.signed.bfb --rshim rshim0
Collecting BlueField booting status. Press Ctrl+C to stop… 
INFO[BL2]: start 
INFO[BL2]: DDR POST passed 
INFO[BL2]: UEFI loaded 
INFO[BL31]: start 
INFO[BL31]: runtime 
INFO[UEFI]: UPVS valid 
INFO[UEFI]: eMMC init 
INFO[UEFI]: eMMC probed 
INFO[UEFI]: PMI: updates started 
INFO[UEFI]: PMI: boot image update 
INFO[UEFI]: PMI: updates completed, status 0 
INFO[UEFI]: PCIe enum start 
INFO[UEFI]: PCIe enum end 
INFO[MISC]: Ubuntu installation started 
INFO[MISC]: Installing OS image 
INFO[MISC]: Installation finished
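
When the install is scripted, one way to confirm that it completed is to save the output and look for the final status line. This is a sketch under the assumption that the log ends with the "Installation finished" line shown above; the wording may differ between BFB versions, and bfb_install_finished is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if a saved bfb-install log contains the final status line shown above.
bfb_install_finished() {
    grep -q 'INFO\[MISC\]: Installation finished' "$1"
}

# Scratch log holding the last lines of the sample output; a real log could be
# captured with: bfb-install ... 2>&1 | tee bfb-install.log
log=$(mktemp)
cat > "$log" <<'EOF'
INFO[MISC]: Ubuntu installation started
INFO[MISC]: Installing OS image
INFO[MISC]: Installation finished
EOF

if bfb_install_finished "$log"; then
    result="finished"
else
    result="incomplete"
fi
echo "bfb-install: $result"
```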

If the DPU needs remote access (LAN/internet), enable IP forwarding on the host by adding the following to /etc/sysctl.d/50-dpu.conf:

net.ipv4.conf.all.forwarding = 1 
net.ipv6.conf.all.forwarding = 1
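
Files in /etc/sysctl.d are read at boot; running sysctl --system as root applies them immediately. The sketch below reads the kernel's current forwarding flags to confirm the result. forwarding_state is a hypothetical helper, and the /proc paths are the standard locations for these two sysctl keys.

```shell
#!/bin/sh
# Report whether forwarding is on, given the path of a kernel forwarding flag file.
forwarding_state() {
    if [ "$(cat "$1" 2>/dev/null)" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Check both address families configured in 50-dpu.conf.
for f in /proc/sys/net/ipv4/conf/all/forwarding /proc/sys/net/ipv6/conf/all/forwarding; do
    echo "$f: $(forwarding_state "$f")"
done
```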

and set up IPv4 masquerading via nftables:

# nft add table nat 
# nft -- add chain nat prerouting { type nat hook prerouting priority -100 \; } 
# nft -- add chain nat postrouting { type nat hook postrouting priority 100 \; } 
# nft add rule nat postrouting oifname "ens6f0" snat to HOST_IP    # replace HOST_IP with the host's IPv4 address
# nft list ruleset > /etc/nftables/dpu_nat.nft 
# echo 'include "/etc/nftables/dpu_nat.nft"' >> /etc/sysconfig/nftables.conf
# systemctl enable nftables.service
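
Note that nft list ruleset saves every table currently loaded, not just the NAT table. As an alternative sketch, the file can be written directly so that it contains only the rules from the commands above. gen_dpu_nat is a hypothetical helper, and 192.0.2.10 is a documentation placeholder address, not a real host IP.

```shell
#!/bin/sh
# Print an nftables file equivalent to the interactive commands above.
# Arguments: outgoing interface name, host IPv4 address to SNAT to.
gen_dpu_nat() {
    cat <<EOF
table ip nat {
	chain prerouting {
		type nat hook prerouting priority -100; policy accept;
	}
	chain postrouting {
		type nat hook postrouting priority 100; policy accept;
		oifname "$1" snat to $2
	}
}
EOF
}

# 192.0.2.10 is a placeholder; substitute the host's own address.
rules=$(gen_dpu_nat ens6f0 192.0.2.10)
printf '%s\n' "$rules"
```

Redirect the output to /etc/nftables/dpu_nat.nft, then check the syntax with nft -c -f /etc/nftables/dpu_nat.nft before enabling the service.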

DPU login and configuration

First set up an appropriate IP configuration for the tmfifo_net0 interface on the host. The DPU is factory-configured at 192.168.100.2.

# nmcli conn add type tun mode tap con-name tmfifo_net0 ifname tmfifo_net0 autoconnect yes ip4 192.168.100.1/24 ipv4.never-default true 
# nmcli conn up tmfifo_net0

Then log in via ssh:

$ ssh ubuntu@192.168.100.2

You should receive the Ubuntu OS login prompt, and will be prompted to update the user password.

You can now update the DPU firmware:

# sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl

and configure it:

# sudo mst start
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -y reset   # Reset all settings
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 s LINK_TYPE_P1=2    # Set port 1 to Ethernet mode (not Infiniband)

Reboot the host and ensure settings persist.
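
One way to confirm persistence after the reboot is to query the setting again on the DPU and check the reported value. This is a sketch: port1_is_ethernet is a hypothetical helper, and the sample line assumes mlxconfig prints the value as "ETH(2)", which may vary with the MFT version.

```shell
#!/bin/sh
# Succeed if mlxconfig query output on stdin reports port 1 in Ethernet mode.
# Assumes the value is printed as "ETH(2)"; adjust the pattern if your MFT version differs.
port1_is_ethernet() {
    grep -Eq 'LINK_TYPE_P1[[:space:]]+ETH\(2\)'
}

# Hypothetical sample line; on the DPU use:
#   sudo mlxconfig -d /dev/mst/mt41686_pciconf0 q LINK_TYPE_P1 | port1_is_ethernet
sample='         LINK_TYPE_P1                                ETH(2)'
if printf '%s\n' "$sample" | port1_is_ethernet; then
    link="ethernet"
else
    link="other"
fi
echo "port 1 link type: $link"
```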

References