TrueNAS Scale - Infiniband¶
Well, I have been wanting to get InfiniBand working for a while. I decided today was as good a time as any. So, I made it happen.
Info
This post was originally published in 2022, and has been adapted for this static site from WordPress.
Summary¶
- Was able to get InfiniBand working.
- IPoIB performance is utter garbage.
- I didn't want to hack up TrueNAS any further to support RDMA.
- Gave up, went back to my Chelsio 40GbE adapter.
Step 1. Install a few packages.¶
By default, TrueNAS Scale has apt "disabled". Please see THIS POST on how to re-enable it.
Once you have re-enabled the ability to install packages AND added the Debian repos, we need to install a few packages to assist.
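The exact package list did not survive the migration of this post. As a rough sketch, the check below reports which of the tools used later in this post (ibstat, ibhosts, mstconfig) are not yet on the PATH. The package names in the hint (infiniband-diags and mstflint) are an assumption based on Debian packaging, not the original list, and missing_tools is a hypothetical helper name.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used later in this post.
missing=$(missing_tools ibstat ibhosts mstconfig)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
    # ASSUMPTION: Debian package names, not taken from the original post.
    echo "Hint: apt-get install infiniband-diags mstflint"
else
    echo "All InfiniBand tools found."
fi
```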
At this point, you should be able to run ibstat and get a status. If you don't get one, make sure your card is installed. I am using a ConnectX-3 (non-Pro).
To note, I have one port set as InfiniBand, and the other port set as Ethernet.
root@truenas:~# ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0x506b4b03007bfc50
System image GUID: 0x506b4b03007bfc53
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02514868
Port GUID: 0x506b4b03007bfc51
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x526b4bfffe7bfc52
Link layer: Ethernet
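For a quicker read of that output, here is a small sketch that boils ibstat down to one line per port. It assumes the field layout shown above (a "Port N:" line, then "State:", with "Link layer:" last), and ib_port_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize ibstat output (read from stdin) as "port N: STATE (LINK LAYER)".
# "Physical state:" and "Port GUID:" lines are skipped because the patterns
# are anchored to the start of the line.
ib_port_summary() {
    awk '
        /^[ \t]*Port [0-9]+:/   { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/         { state[port] = $2 }
        /^[ \t]*Link layer:/    { printf "port %s: %s (%s)\n", port, state[port], $3 }
    '
}

# Only runs where ibstat is actually installed.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | ib_port_summary
fi
```

For the output above, this would print "port 1: Initializing (InfiniBand)" and "port 2: Down (Ethernet)".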
How to change configuration parameters¶
# First- get the PCI address of your card.
root@truenas:~# lspci | grep Mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# In this case- 03:00.0 is my address.
# Show current configuration
root@truenas:~# mstconfig -d 03:00.0 query
Device #1:
----------
Device type: ConnectX3
Device: 03:00.0
Configurations: Next Boot
SRIOV_EN True(1)
NUM_OF_VFS 8
LINK_TYPE_P1 IB(1)
LINK_TYPE_P2 ETH(2)
LOG_BAR_SIZE 3
BOOT_PKEY_P1 0
BOOT_PKEY_P2 0
BOOT_OPTION_ROM_EN_P1 False(0)
BOOT_VLAN_EN_P1 False(0)
BOOT_RETRY_CNT_P1 0
LEGACY_BOOT_PROTOCOL_P1 PXE(1)
BOOT_VLAN_P1 1
BOOT_OPTION_ROM_EN_P2 False(0)
BOOT_VLAN_EN_P2 False(0)
BOOT_RETRY_CNT_P2 0
LEGACY_BOOT_PROTOCOL_P2 PXE(1)
BOOT_VLAN_P2 1
IP_VER_P1 IPv4(0)
IP_VER_P2 IPv4(0)
CQ_TIMESTAMP True(1)
# Set port mode to infiniband
mstconfig -d 03:00.0 set LINK_TYPE_P1=1
# Set port mode to ethernet
mstconfig -d 03:00.0 set LINK_TYPE_P1=2
# You can also set multiple properties in the same command
mstconfig -d 03:00.0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1
# Disable boot rom, if you desire. I don't have plans on using it.
mstconfig -d 03:00.0 set LINK_TYPE_P1=1 BOOT_OPTION_ROM_EN_P1=0 BOOT_OPTION_ROM_EN_P2=0
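Since the numeric values are easy to mix up, here is a hedged sketch that builds the mstconfig command from names instead. It only prints the command and does not run it. The mapping (IB = 1, ETH = 2) comes from the query output above; link_type_value and mstconfig_cmd are hypothetical helper names. Note that the query output lists these as "Next Boot" configurations, so a reboot is likely needed before a change shows up.

```shell
#!/bin/sh
# Map a friendly name to the LINK_TYPE value shown in the query output.
link_type_value() {
    case "$1" in
        ib|IB)   echo 1 ;;
        eth|ETH) echo 2 ;;
        *) echo "unknown link type: $1" >&2; return 1 ;;
    esac
}

# Build (but do not run) the mstconfig command for a two-port card.
#   $1 = PCI address, $2 = port 1 type, $3 = port 2 type
mstconfig_cmd() {
    dev=$1
    p1=$(link_type_value "$2") || return 1
    p2=$(link_type_value "$3") || return 1
    echo "mstconfig -d $dev set LINK_TYPE_P1=$p1 LINK_TYPE_P2=$p2"
}

# Port 1 as InfiniBand, port 2 as Ethernet (the layout used in this post).
mstconfig_cmd 03:00.0 ib eth
```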
Step 2. Ping across the network¶
This guide ASSUMES you already have a working physical link established between two InfiniBand cards.
In my case, I have a Windows client, and a TrueNAS server.
Next, let's list the connected hosts.
root@truenas:~# ibhosts
ibwarn: [2618545] get_abi_version: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
ibwarn: [2618545] mad_rpc_open_port: can't open UMAD port ((null):0)
../libibnetdisc/ibnetdisc.c:802; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
Seems a kernel module is missing. Let's add it, and try again.
root@truenas:~# modprobe ib_umad
# I also went ahead and loaded the IP over InfiniBand (IPoIB) module.
root@truenas:~# modprobe ib_ipoib
root@truenas:~# ibhosts
Ca : 0x0002c90300a29060 ports 2 "XTREMEOWNAGE-PC"
Ca : 0x506b4b03007bfc50 ports 2 "MT25408 ConnectX Mellanox Technologies"
Well, look at that. The two PCs can see each other over infiniband.
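To avoid finding out about a missing module from an ibwarn message, a quick check like the one below could be run first. It is a sketch that assumes the modules are built as loadable modules, since anything compiled into the kernel will not appear in /proc/modules. missing_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Print each module from the argument list that is not present in a
# /proc/modules-style listing (the module name is the first field per line).
#   $1 = path to the listing, remaining args = module names
missing_modules() {
    list=$1
    shift
    for mod in "$@"; do
        grep -q "^$mod " "$list" 2>/dev/null || printf '%s\n' "$mod"
    done
}

# Report which of the two modules loaded above are still missing.
if [ -r /proc/modules ]; then
    missing_modules /proc/modules ib_umad ib_ipoib
fi
```

Anything it prints can then be loaded with modprobe, as shown above.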
The next step was to allocate an IP address for the InfiniBand adapter.
ibp3s0: flags=4098<BROADCAST,MULTICAST> mtu 4092
unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
As we can see, it doesn't have an address. I provisioned the IP address through the TrueNAS WebGUI.
After saving and applying the changes, I was able to ping the local IP, but not the remote IP.
root@truenas:~# ping 10.100.255.1
PING 10.100.255.1 (10.100.255.1) 56(84) bytes of data.
64 bytes from 10.100.255.1: icmp_seq=1 ttl=64 time=0.052 ms
^C
--- 10.100.255.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.052/0.052/0.052/0.000 ms
root@truenas:~# ping 10.100.255.2
PING 10.100.255.2 (10.100.255.2) 56(84) bytes of data.
^C
Doing a bit of troubleshooting, I ran ibstat on both ends, and determined the link was stuck in the "Initializing" state.
After a bit of research, I determined that a subnet manager needs to exist, even for a point-to-point link.
So...
root@truenas:~# apt-get install opensm
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
opensm
...
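The symptom is easy to spot once you know what to look for: in the first ibstat output, port 1 was physically up but showed "State: Initializing" with "SM lid: 0". As a sketch, the filter below lists ports in exactly that condition. It assumes the ibstat field order shown earlier in this post, and ports_waiting_for_sm is a hypothetical helper name.

```shell
#!/bin/sh
# Read ibstat output on stdin and print the number of every port that is
# stuck in "Initializing" with no subnet manager assigned (SM lid 0).
ports_waiting_for_sm() {
    awk '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/       { state = $2 }
        /^[ \t]*SM lid:/      { if (state == "Initializing" && $3 == 0) print port }
    '
}

# Only runs where ibstat is actually installed.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | ports_waiting_for_sm
fi
```

An empty result means no port is waiting on a subnet manager.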
At this point, I decided to do a reboot, to ensure all of the modules were being properly loaded at boot.
Or... to see if all of the above work had been reset.
After rebooting, all of the configuration and packages were still in place. However, the kernel modules were not loaded back in. So, I reloaded the modules manually, and added a config to load them at boot.
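# -a is needed to load several modules in one call. Without it, the extra names are passed as parameters to the first module.
root@truenas:~# modprobe -a ib_ipoib ib_umad ib_uverbs rdma_ucm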
root@truenas:~# nano /etc/modules-load.d/infiniband.conf
# File Contents:
ib_umad
ib_ipoib
ib_uverbs
rdma_ucm
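The same file can be written without opening an editor. This is a minimal sketch: write_modules_conf is a hypothetical helper name, and the demo writes to a scratch directory so it is safe to try anywhere. On the NAS, the target would be /etc/modules-load.d/infiniband.conf, written as root.

```shell
#!/bin/sh
# Write one module name per line to the given file.
#   $1 = target file, remaining args = module names
write_modules_conf() {
    conf=$1
    shift
    printf '%s\n' "$@" > "$conf"
}

# Demo against a scratch directory instead of /etc/modules-load.d.
demo_dir=$(mktemp -d)
write_modules_conf "$demo_dir/infiniband.conf" ib_umad ib_ipoib ib_uverbs rdma_ucm
cat "$demo_dir/infiniband.conf"
```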
The next step was to get opensm up and running.
root@truenas[/etc/modules-load.d]# systemctl status opensm
● opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
Loaded: loaded (/lib/systemd/system/opensm.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Sat 2022-03-26 13:26:55 CDT; 8min ago
Docs: man:opensm(8)
Mar 26 13:26:55 truenas.local.xtremeownage.com systemd[1]: Condition check resulted in Starts the OpenSM InfiniBand fabric Subnet Managers being skipped.
## I assume it failed to start due to the missing kernel modules. Let's start it.
root@truenas[/etc/modules-load.d]# systemctl start opensm
root@truenas[/etc/modules-load.d]# systemctl status opensm
● opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
Loaded: loaded (/lib/systemd/system/opensm.service; enabled; vendor preset: enabled)
Active: active (exited) since Sat 2022-03-26 13:35:38 CDT; 1s ago
Docs: man:opensm(8)
Process: 296934 ExecCondition=/bin/sh -c if test "$PORTS" = NONE; then echo "opensm is disabled via PORTS=NONE."; exit 1; fi (code=exited, status=0/SUCCESS)
Process: 296935 ExecStart=/bin/sh -c if test "$PORTS" = ALL; then PORTS=$(/usr/sbin/ibstat -p); if test -z "$PORTS"; then echo "No InfiniBand ports found."; exi>
Main PID: 296935 (code=exited, status=0/SUCCESS)
Mar 26 13:35:38 truenas.local.xtremeownage.com systemd[1]: Starting Starts the OpenSM InfiniBand fabric Subnet Managers...
Mar 26 13:35:38 truenas.local.xtremeownage.com sh[296935]: Starting opensm on following ports: 0x506b4b03007bfc51
Mar 26 13:35:38 truenas.local.xtremeownage.com sh[296935]: 0x526b4bfffe7bfc52
Mar 26 13:35:38 truenas.local.xtremeownage.com systemd[1]: Finished Starts the OpenSM InfiniBand fabric Subnet Managers.
## Much better.
At this point, I checked the status of the InfiniBand NICs.
root@truenas[/etc/modules-load.d]# ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0x506b4b03007bfc50
System image GUID: 0x506b4b03007bfc53
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0251486a
Port GUID: 0x506b4b03007bfc51
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x526b4bfffe7bfc52
Link layer: Ethernet
## Boom, good to go on the truenas side.
## On the Windows side:
> ibstat
CA 'ibv_device0'
CA type:
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 0x0
Node GUID: 0x0002c90300a29060
System image GUID: 0x0002c90300a29063
Port 1:
State: Down
Physical state: Disabled
Rate: 1
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x80500000
Port GUID: 0x0202c9fffea29061
Link layer: Ethernet
Transport: RoCE v1.25
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Real rate: 32.00 (QDR)
Base lid: 2
LMC: 0
SM lid: 1
Capability mask: 0x90580000
Port GUID: 0x0002c90300a29062
Link layer: IB
Transport: IB
Excellent, the link is active on both sides!
Since the module was not loaded when the machine was rebooted, the interface was down. I used ifconfig to bring it up.
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0 up
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0
ibp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
inet 10.100.255.1 netmask 255.255.255.252 broadcast 10.100.255.3
inet6 fe80::526b:4b03:7b:fc51 prefixlen 64 scopeid 0x20<link>
unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 2 bytes 212 (212.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 37 bytes 11759 (11.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@truenas[/etc/modules-load.d]# ping 10.100.255.1
PING 10.100.255.1 (10.100.255.1) 56(84) bytes of data.
64 bytes from 10.100.255.1: icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from 10.100.255.1: icmp_seq=2 ttl=64 time=0.049 ms
^C
--- 10.100.255.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms
rtt min/avg/max/mdev = 0.049/0.052/0.056/0.003 ms
We can now ping over IP. Let's do a speed test.
## TrueNAS doesn't ship with iperf2. It does include iperf3, however, getting consistent speed results above 10G isn't easy with iperf3, due to its single-threaded nature.
## So, I prefer iperf2 for testing over 10G.
root@truenas[/etc/modules-load.d]# apt-get install iperf
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
iperf
0 upgraded, 1 newly installed, 0 to remove and 105 not upgraded.
...
root@truenas[/etc/modules-load.d]# iperf -v
iperf version 2.0.14a (2 October 2020) pthreads
root@truenas[/etc/modules-load.d]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
## On the Windows side:
> iperf.exe -c 10.100.255.1
------------------------------------------------------------
Client connecting to 10.100.255.1, TCP port 5001
TCP window size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 10.100.255.2 port 23277 connected with 10.100.255.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 7.69 GBytes 6.60 Gbits/sec
> iperf.exe -c 10.100.255.1 -P 10
------------------------------------------------------------
Client connecting to 10.100.255.1, TCP port 5001
TCP window size: 208 KByte (default)
------------------------------------------------------------
[ 12] local 10.100.255.2 port 23297 connected with 10.100.255.1 port 5001
[ 10] local 10.100.255.2 port 23295 connected with 10.100.255.1 port 5001
[ 11] local 10.100.255.2 port 23296 connected with 10.100.255.1 port 5001
[ 9] local 10.100.255.2 port 23294 connected with 10.100.255.1 port 5001
[ 7] local 10.100.255.2 port 23292 connected with 10.100.255.1 port 5001
[ 6] local 10.100.255.2 port 23291 connected with 10.100.255.1 port 5001
[ 8] local 10.100.255.2 port 23293 connected with 10.100.255.1 port 5001
[ 5] local 10.100.255.2 port 23290 connected with 10.100.255.1 port 5001
[ 4] local 10.100.255.2 port 23289 connected with 10.100.255.1 port 5001
[ 3] local 10.100.255.2 port 23288 connected with 10.100.255.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 12] 0.0-10.0 sec 1.07 GBytes 916 Mbits/sec
[ 10] 0.0-10.0 sec 899 MBytes 753 Mbits/sec
[ 6] 0.0-10.0 sec 1.01 GBytes 870 Mbits/sec
[ 8] 0.0-10.0 sec 220 MBytes 185 Mbits/sec
[ 5] 0.0-10.0 sec 561 MBytes 470 Mbits/sec
[ 9] 0.0-10.1 sec 695 MBytes 578 Mbits/sec
[ 11] 0.0-10.1 sec 674 MBytes 557 Mbits/sec
[ 7] 0.0-10.2 sec 346 MBytes 285 Mbits/sec
[ 4] 0.0-10.2 sec 253 MBytes 208 Mbits/sec
[ 3] 0.0-10.2 sec 681 MBytes 560 Mbits/sec
[SUM] 0.0-10.2 sec 6.31 GBytes 5.31 Gbits/sec
Well, that was extremely disappointing!
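To put a number on the disappointment, here is the arithmetic behind the [SUM] line as a small sketch. iperf2 reports GBytes in binary units (2^30 bytes) and Gbits/sec in decimal units (10^9 bits), which is why 6.31 GBytes in 10.2 seconds comes out at 5.31 Gbits/sec. The 32 Gbit/s figure is the QDR "Real rate" reported by ibstat on the Windows side. iperf_gbits and link_utilization are hypothetical helper names.

```shell
#!/bin/sh
# Convert an iperf2 transfer (GBytes, binary) over an interval (seconds)
# into Gbits/sec (decimal), rounded to two places.
iperf_gbits() {
    awk -v gbytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", gbytes * 1073741824 * 8 / secs / 1e9 }'
}

# Share of the link rate actually used, as a percentage.
#   $1 = measured Gbit/s, $2 = link data rate in Gbit/s
link_utilization() {
    awk -v got="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", got / rate * 100 }'
}

iperf_gbits 6.31 10.2       # the [SUM] line: prints 5.31
link_utilization 5.31 32    # against the 32 Gbit/s QDR data rate: prints 16.6
```

In other words, ten parallel streams used roughly a sixth of what the link can carry.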
Troubleshooting poor performance¶
The first step is always to check MTU. I found the server was configured for 8,000, however, my client only supported a maximum of 4092.
Just to confirm this wasn't an iperf issue, I decided to do a quick iSCSI performance test.
Only 5 Gbit/s, far short of the expected 40/56 Gbit/s.
Well... let's double-check MTU.
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0 mtu 4092
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0
ibp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
inet 10.100.255.1 netmask 255.255.255.252 broadcast 10.100.255.3
inet6 fe80::526b:4b03:7b:fc51 prefixlen 64 scopeid 0x20<link>
unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 23373377 bytes 47020734796 (43.7 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6652816 bytes 6411049427 (5.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
## It appears the card on my server only supports an MTU of 2044... this might explain the issues.
So, I set the MTU on both ends to 2044.
Sadly, this did not help performance.
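One possible explanation for the 2044 ceiling, offered as an assumption rather than something verified on this system: in IPoIB datagram mode, the IP MTU is the InfiniBand link MTU minus a 4-byte IPoIB header, so a 2048-byte link gives 2044 and a 4096-byte link gives 4092. Connected mode, where the driver supports it, allows up to 65520. The sketch below just encodes that arithmetic. ipoib_max_mtu is a hypothetical helper name, and whether switching modes would have helped throughput here is untested.

```shell
#!/bin/sh
# Largest IP MTU an IPoIB interface can use.
#   $1 = IPoIB mode: "datagram" or "connected"
#   $2 = InfiniBand link MTU in bytes (only used for datagram mode)
ipoib_max_mtu() {
    case "$1" in
        datagram)  echo $(( $2 - 4 )) ;;   # 4-byte IPoIB encapsulation header
        connected) echo 65520 ;;
        *) echo "unknown IPoIB mode: $1" >&2; return 1 ;;
    esac
}

ipoib_max_mtu datagram 2048   # prints 2044, the value seen on the server
ipoib_max_mtu datagram 4096   # prints 4092, the client's maximum
ipoib_max_mtu connected       # prints 65520

# On the NAS, the current mode could be read with:
#   cat /sys/class/net/ibp3s0/mode
```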
Conclusion.¶
I DID successfully manage to get InfiniBand up and running.
I DID learn that IPoIB (IP over InfiniBand) is actually quite slow. To realize huge performance gains over using the NIC in ETH mode, you would need to leverage RDMA/RoCE.
Since I had hacked up my TrueNAS install enough for the day, I decided to switch everything back to Ethernet mode, and call it a day. I can easily get 40G out of Ethernet mode.
How to switch back to ethernet mode.¶
If you recall, I left one interface in ETH mode earlier. I am just going to switch the physical adapter, and move the IP address over, so I don't need to reboot my PC.
ifconfig ibp3s0 0.0.0.0 0.0.0.0
ifconfig enp3s0d1 10.100.255.1/30 mtu 9000 up
root@truenas[/etc/modules-load.d]# ifconfig enp3s0d1
enp3s0d1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.100.255.1 netmask 255.255.255.252 broadcast 10.100.255.3
inet6 fe80::526b:4bff:fe7b:fc52 prefixlen 64 scopeid 0x20<link>
## On the Windows side, I also physically swapped the adapters over.
root@truenas[/etc/modules-load.d]# ping 10.100.255.2
PING 10.100.255.2 (10.100.255.2) 56(84) bytes of data.
64 bytes from 10.100.255.2: icmp_seq=1 ttl=128 time=0.482 ms
64 bytes from 10.100.255.2: icmp_seq=2 ttl=128 time=0.162 ms
## Connection is good too. Let's test performance.
[SUM] 0.0000-10.0208 sec 10.7 GBytes 9.15 Gbits/sec
## Appears my client is only connected at 10G... interesting.
root@truenas[/etc/modules-load.d]# ethtool enp3s0d1
Settings for enp3s0d1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseKX4/Full
40000baseCR4/Full
40000baseSR4/Full
56000baseCR4/Full
56000baseSR4/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10000baseKX4/Full
40000baseCR4/Full
40000baseSR4/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 0
Transceiver: internal
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000014 (20)
link ifdown
Link detected: yes
## Wonder what happens if I, uh, FORCE it to 40G.
root@truenas[/etc/modules-load.d]# ethtool -s enp3s0d1 speed 40000 duplex full autoneg off
## Turns out, the connection on the Windows side dropped. I am starting to think this particular ConnectX-3 does not support 40G.
## I know the adapter on the server side supports 40G, because it's the exact same adapter I used to benchmark 40G in a previous post!
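Before forcing a speed, it can help to check what the link actually negotiated. Here is a small sketch that pulls the "Speed:" field out of ethtool output and compares it to what was expected. It assumes the "Speed: 10000Mb/s" format shown above. negotiated_speed_mbps and check_speed are hypothetical helper names.

```shell
#!/bin/sh
# Extract the negotiated speed (in Mb/s) from ethtool output on stdin.
negotiated_speed_mbps() {
    awk '/^[ \t]*Speed:/ { sub(/Mb\/s$/, "", $2); print $2 }'
}

# Compare the negotiated speed against an expected value.
#   $1 = expected speed in Mb/s; ethtool output on stdin
check_speed() {
    actual=$(negotiated_speed_mbps)
    if [ "$actual" = "$1" ]; then
        echo "OK: link at ${actual}Mb/s"
    else
        echo "MISMATCH: link at ${actual}Mb/s, expected ${1}Mb/s"
    fi
}

# Only runs where ethtool and the interface from this post exist.
if command -v ethtool >/dev/null 2>&1 && [ -e /sys/class/net/enp3s0d1 ]; then
    ethtool enp3s0d1 | check_speed 40000
fi
```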
I found documentation on my particular NIC here: https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_pedge_srvr_ethnt_nic/mellanox-adapters_users-guide_en-us.pdf
Based on the specifications, it "should" support 40G Ethernet mode...
Edit: Turns out, it supported 40G InfiniBand, but only 10GbE in Ethernet mode.
Yup. Oh well. Putting the Chelsio NIC back in my PC. It had no problems doing 40G.