Proxmox: Using ZFS over iSCSI with Unraid¶
So, recently I acquired a new Lenovo P520 to run Unraid. Previously, my Unraid server ran as a VM on my R730XD.
Now, it runs bare metal on the P520, with an MD1200 disk shelf for its 3.5" HDDs.
One of my goals was to transition some of my VMs to ZFS storage hosted on this box.
This post walks through the steps of configuring Proxmox to use ZFS over iSCSI, hosted on Unraid.
Testing Setup¶
Here are the servers I will be testing and benchmarking with.
- Server: Unraid
    - Model: Lenovo P520
    - CPU: Intel® Xeon® W-2135 @ 3.70 GHz
    - Memory: 128 GiB DDR4 single-bit ECC
    - Network: ConnectX-4 dual-port 100 GbE NIC
    - OS: Unraid
    - ZFS pool "cache": 3× mirrored NVMe vdevs:
        - Mirror-1: 2× Samsung 970 EVO Plus 1 TB
        - Mirror-2: 2× Samsung PM963 (MZ1LW960HMJP-00003) 960 GB NVMe M.2 22110
        - Mirror-3: 2× Samsung PM963 (same model as above)
- Client: Proxmox PVE
    - Model: Dell OptiPlex 7060 SFF
    - Memory: 64 GiB DDR4
    - Network: ConnectX-4 dual-port 25 GbE NIC (bonded)
    - OS: Proxmox
Steps¶
For anyone wanting to do this, I strongly recommend reading the Proxmox: ZFS over iSCSI documentation.
Configure PVE Hosts¶
These commands must be run on each PVE host.
They install the required packages and update the initiator name to be something less random.
# Set Proxmox IQN for this host
HOST=$(hostname -s)
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:${HOST}"
# Install / enable targetcli-fb
apt install targetcli-fb -y
systemctl enable --now targetclid.service
# Write the new initiator name and restart the iSCSI service
echo "InitiatorName=${IQN_TARGET}" > /etc/iscsi/initiatorname.iscsi
systemctl restart open-iscsi
echo "IQN for this host has been set to: ${IQN_TARGET}"
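If you want to sanity-check the generated name before writing it out, an IQN follows the `iqn.YYYY-MM.reversed.domain:identifier` shape. A minimal sketch (the hostname is hard-coded here for illustration; the real script uses `$(hostname -s)`, and the grep pattern is mine):

```shell
# Build the IQN the same way the script above does.
HOST="kube01"
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:${HOST}"

# Rough IQN shape check: iqn.YYYY-MM.reversed.domain:identifier
if echo "$IQN_TARGET" | grep -Eq '^iqn\.[0-9]{4}-[0-9]{2}\.[a-z0-9.-]+:[a-z0-9.-]+$'; then
  echo "IQN looks valid: $IQN_TARGET"
else
  echo "Malformed IQN: $IQN_TARGET" >&2
fi
```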
Configure PVE Cluster¶
This script only needs to be executed ONCE for the cluster; it does not matter which node it runs on.
It creates the SSH keys used to access the target host.
PORTAL_IP="10.100.4.24"
## Run ONCE on PVE Host - Create Target.
mkdir -p /etc/pve/priv/zfs
ssh-keygen -f /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa
ssh-copy-id -i /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa.pub root@${PORTAL_IP}
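Note, the key path is not arbitrary: the ZFS over iSCSI plugin looks for `/etc/pve/priv/zfs/<portal-ip>_id_rsa`, derived from the portal address. A trivial sketch of the naming convention:

```shell
# Proxmox derives the SSH key path from the portal IP.
PORTAL_IP="10.100.4.24"
KEY_PATH="/etc/pve/priv/zfs/${PORTAL_IP}_id_rsa"
echo "Proxmox will look for: $KEY_PATH"
```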
Next, we need to define the storage. Add the following entry to /etc/pve/storage.cfg (this file is shared across the cluster):
zfs: Unraid-ZFS
blocksize 1m
iscsiprovider LIO
pool cache/iscsi
portal 10.100.4.24
target iqn.2025-08.com.xtremeownage.svr:tower-iscsi
content rootdir,images
lio_tpg tpg1
nowritecache 1
sparse 1
zfs-base-path /dev/zvol
Configure Unraid¶
Configure targetcli¶
# Server Target
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
TARGET="10.100.4.24"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:tower-iscsi"
# Proxmox hosts
PROXMOX_HOSTS=("kube01" "kube04" "kube05" "kube06")
PROXMOX_IQN_DATE="2025-08"
PROXMOX_IQN_DOMAIN="com.xtremeownage.svr"
# Create Target (skip if exists)
targetcli /iscsi create ${IQN_TARGET} 2>/dev/null || echo "Target already exists"
# Create Portal (skip if exists)
targetcli /iscsi/${IQN_TARGET}/tpg1/portals create ${TARGET} 2>/dev/null || echo "Portal already exists"
# Add ACLs for each Proxmox host
for HOST in "${PROXMOX_HOSTS[@]}"; do
CLIENT_IQN="iqn.${PROXMOX_IQN_DATE}.${PROXMOX_IQN_DOMAIN}:${HOST}"
echo "Adding ACL for ${HOST} with IQN ${CLIENT_IQN}"
targetcli /iscsi/${IQN_TARGET}/tpg1/acls create ${CLIENT_IQN} 2>/dev/null || echo "ACL for ${HOST} already exists"
done
targetcli saveconfig
Expected output from targetcli
Afterwards, you should have this:
/iscsi> ls
o- iscsi .............................................................................................................. [Targets: 1]
o- iqn.2025-08.com.xtremeownage.svr:tower-iscsi ........................................................................ [TPGs: 1]
o- tpg1 ................................................................................................. [no-gen-acls, no-auth]
o- acls ............................................................................................................ [ACLs: 4]
  | o- iqn.2025-08.com.xtremeownage.svr:kube01 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube04 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube05 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube06 ................................................................ [Mapped LUNs: 0]
o- luns ............................................................................................................ [LUNs: 0]
o- portals ...................................................................................................... [Portals: 1]
o- 0.0.0.0:3260 ....................................................................................................... [OK]
/iscsi>
Optimize NIC settings¶
During testing, I noticed high CPU usage while transmitting only 1 GB/s of traffic.
Given this host has a 100G NIC, I expect to hit at least 5+ GB/s of throughput before running into issues.
After digging, I discovered that few of the NIC offloads were enabled for my ConnectX-4 NIC. To fix this, I added a startup script.
This can be added to /boot/config/go, and will be executed during system startup.
# Unraid- Add to /boot/config/go
# This will optimize NIC settings.
# --- Variables ---
NIC=eth0 # adjust if different
RSS_QUEUES=32 # match physical cores (adjust if needed)
TCP_RMEM_MAX=67108864
TCP_WMEM_MAX=67108864
NETDEV_MAX_BACKLOG=250000
# --- Enable NIC offloads ---
ethtool -K $NIC tso on gso on gro on rx on tx on
ethtool -K $NIC rxvlan on txvlan on
# --- Maximize RSS queues ---
ethtool -L $NIC combined $RSS_QUEUES
# --- Kernel TCP tuning ---
sysctl -w net.core.rmem_max=$TCP_RMEM_MAX
sysctl -w net.core.wmem_max=$TCP_WMEM_MAX
sysctl -w net.core.netdev_max_backlog=$NETDEV_MAX_BACKLOG
sysctl -w net.ipv4.tcp_rmem="4096 87380 $TCP_RMEM_MAX"
sysctl -w net.ipv4.tcp_wmem="4096 87380 $TCP_WMEM_MAX"
# --- Confirmation ---
echo "NIC $NIC offloads enabled, RSS set to $RSS_QUEUES, TCP buffers tuned."
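For reference, the buffer maximums above are just 64 MiB expressed in bytes, and the three numbers in the `tcp_rmem`/`tcp_wmem` sysctls are the per-socket minimum, default, and maximum buffer sizes; only the maximum is being raised here. Quick arithmetic check:

```shell
# TCP_RMEM_MAX / TCP_WMEM_MAX above are 64 MiB in bytes.
MAX_BYTES=$((64 * 1024 * 1024))
echo "64 MiB = ${MAX_BYTES} bytes"
# tcp_rmem/tcp_wmem triple = "min default max" per socket;
# the min (4096) and default (87380) values are left at stock.
```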
Benchmarking¶
To benchmark performance, I spun up a new VM on my Proxmox cluster.
I disabled the disk cache, and enabled iothread.
To set up fio: sudo apt install fio -y
For benchmarking, here are the commands I used.
Info
Note, these settings are not very optimized. This is addressed later.
fio --name=seqwrite --filename=testfile --size=1G --bs=1M --rw=write --direct=1 --numjobs=1 --time_based=0
fio --name=seqread --filename=testfile --size=1G --bs=1M --rw=read --direct=1 --numjobs=1 --time_based=0
fio --name=randreadwrite --filename=testfile --size=1G --bs=4k --rw=randrw --rwmixread=70 --direct=1 --numjobs=4 --time_based=0
Initial Benchmark¶
Sequential & Random I/O¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 168 | 169 MiB/s (177 MB/s) | 5.93 ms | 2.2 ms | 43.2 ms | 5.4 ms | 7.3 ms | 1 job, 1 GiB file, psync |
Seq Read | 200 | 213 MiB/s (224 MB/s) | 4.68 ms | 1.6 ms | 27.2 ms | 4.8 ms | 7.0 ms | 1 job, 1 GiB file, psync |
Rand RW (4 jobs) | 2,744 | 10.7 MiB/s | 0.25 ms | 83 µs | 19.9 ms | 0.22 ms | 0.44 ms | 70/30 R/W, 4 KB blocks |
Rand RW (1 job) | 2,744 | 10.7 MiB/s | 0.26 ms | 88 µs | 20.2 ms | 0.22 ms | 0.44 ms | Single-threaded random mix |
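As a sanity check, the bandwidth figures line up with IOPS × block size. A quick sketch, using numbers from the table above (integer math, so results round down):

```shell
# Sequential: ~168 IOPS at 1 MiB per I/O -> ~168 MiB/s
SEQ_MIB_S=$((168 * 1))
# Random: ~2744 IOPS at 4 KiB per I/O -> 10976 KiB/s (~10.7 MiB/s)
RAND_KIB_S=$((2744 * 4))
echo "seq: ${SEQ_MIB_S} MiB/s, rand: ${RAND_KIB_S} KiB/s"
```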
First issue: I had left the block size at the default of 4k, and the ZFS cache was disabled. I corrected both issues, tore down the VM, re-cloned it, and benchmarked again.
Adjust Block Size, Enable ZFS Cache¶
For this benchmark, I increased the block size from 4k to 1M. This hurts random I/O, but greatly boosts sequential throughput.
I also enabled the ZFS cache.
The results: performance for large sequential transfers basically doubled, but was still extremely slow.
For random I/O, we took a minor penalty in performance.
Sequential I/O (1 MiB blocks)¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 313 | 314 MiB/s (329 MB/s) | 3.19 ms | 1.43 ms | 75.0 ms | 2.84 ms | 4.55 ms | 1 job, 1 GiB file, psync |
Seq Read | 438 | 441 MiB/s (462 MB/s) | 2.26 ms | 1.46 ms | 7.35 ms | 2.11 ms | 3.13 ms | 1 job, 1 GiB file, psync |
Random I/O (4 KB blocks, multiple jobs)¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Rand RW (4 jobs) | 2,400 | 9.6 MiB/s | 0.28 ms | 84 µs | 66.9 ms | 0.24 ms | 1.04 ms | 70/30 R/W, 4 KB blocks |
Rand RW (1 job) | 2,396 | 9.6 MiB/s | 0.28 ms | 81 µs | 52.1 ms | 0.24 ms | 1.06 ms | Single-threaded random mix |
Run time-based sequential read¶
Next, I decided to run a time-based large sequential read, to assist with identifying bottlenecks.
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=1 \
--time_based \
--runtime=300s \
--group_reporting
Off the bat, it averaged around 1 GiB/s.
From the Unraid side, the CPU did not appear to be overly saturated.
Looking at the PVE host, I did notice a large spike in memory pressure stall.
Looking at the VM... it appears to need more memory. But CPU utilization and network utilization were fine.
So, at this point, I decided to go check the MTUs / jumbo frames.
And... as it turns out, I did not have jumbo frames enabled for my Unraid host.
Time-based sequential 2 - Fixed MTU, Fixed Benchmark VM Memory¶
I adjusted the MTU for the Unraid host and enabled jumbo frames. I also doubled the RAM of the benchmark VM to 16 GiB.
Then, re-ran the benchmark.
And.......... No change.
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Read (300 s) | 447 | 446 MiB/s (468 MB/s) | 2.24 ms | 0.84 ms | 15.9 ms | 2.18 ms | 3.03 ms | Single job, 10 GiB file, psync |
Next, I decided to look at the switch.
Everything looked perfectly fine there. So then I looked at the fio command I was using, and realized it could be optimized a bit more.
Time-based sequential read - Optimized¶
This command increases the iodepth and the number of jobs, and switches to asynchronous I/O (libaio).
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--time_based \
--runtime=300s \
--group_reporting \
--ioengine=libaio
This yielded around a 500% increase in performance.
At this point, I found the next bottleneck: the 25G NICs installed in my PVE hosts.
No other issues were noticed. CPU usage on Unraid and the PVE host was well within acceptable limits, with no signs of memory pressure, etc.
BUT... we are saturating this 25G NIC.
There are two bonded 25G ports; however, bonding does not work perfectly.
Ignore the comments. I never updated them after swapping the 100G NICs for 25G NICs.
So, I decided to get iSCSI multipathing working.
# I deleted the default portal of 0.0.0.0, to allow me to create portals for each IP address.
/iscsi/iqn.20.../tpg1/portals> delete 0.0.0.0 3260
Deleted network portal 0.0.0.0:3260
/iscsi/iqn.20.../tpg1/portals> create 10.100.4.24 3260
Using default IP port 3260
Created network portal 10.100.4.24:3260.
/iscsi/iqn.20.../tpg1/portals> create 10.100.5.2 3260
Next, on the PVE side, I went down a rabbit hole, and eventually discovered that multipathing is not available for ZFS over iSCSI.
So, that's a shame. But hey, that's not a huge problem. Let me clone this VM to one of the other hosts...
Time-based sequential 2 - Two Hosts¶
Since we know there is a multipathing limitation with QEMU / ZFS over iSCSI, this time I ran the benchmark from two hosts at the same time.
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--time_based \
--runtime=300s \
--group_reporting \
--ioengine=libaio
Off the bat: 50 Gbit/s.
Looking at the Unraid host, no bottlenecks were identified. (Do note, the metric here is GiB/s, not Gb/s.)
So, if I want to see the full capabilities of this ZFS iSCSI SAN, I either need to add more hosts with 25G+ NICs, or toss the 100G NICs back into my SFFs.
BUT, I really don't want to do either of those options right now. Instead, I will do a final benchmark.
Summary / Full Benchmark - Sequential read, write & random I/O¶
sudo apt install fio -y
fio --name=seqwrite-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=write \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=randreadwrite-time \
--filename=testfile \
--size=10G \
--bs=4k \
--rw=randrw \
--rwmixread=70 \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 2,134 | 2,134 MiB/s (2,238 MB/s) | 29.99 ms | 6 ms | 335 ms | 24 ms | 68 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
Seq Read | 2,772 | 2,772 MiB/s (2,906 MB/s) | 23.08 ms | 5 ms | 167 ms | 23 ms | 26 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
Rand RW 70/30 | 1,732 / 749 | 6.9 MiB/s / 3 MiB/s | 18.3 ms | 0.097 ms | 383 ms | 7.5 ms | 94.9 ms | 4 jobs, 16 QD, 4 KiB blocks, libaio |
For sequential reads, we are fully saturating the network connection. The low 4k IOPS was bothering me, so I disabled sync and ran another benchmark.
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Rand RW Read | 5,039 | 19.7 MiB/s (20.6 MB/s) | 6.65 ms | 93 µs | 86.6 ms | 5.15 ms | 16.9 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |
Rand RW Write | 2,165 | 8.46 MiB/s (8.86 MB/s) | 14.1 ms | 283 µs | 86.5 ms | 12.9 ms | 26.3 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |
This drastically boosted the random IOPS. As such, it looks like I may be adding a SLOG in the future.
But, that's all for now.
This performance is perfectly acceptable for now. And, honestly, quite a bit faster than Ceph was managing, while yielding around 20% more usable space.
(Ceph with 3 replicas = ~33% usable space; ZFS striped mirrors = 50% usable space.)
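The space-efficiency comparison is simple arithmetic. A trivial sketch with an illustrative raw capacity (the 6 TB figure is mine, not the actual pool size):

```shell
RAW_TB=6                         # e.g. six 1 TB drives of raw capacity
CEPH_USABLE_TB=$((RAW_TB / 3))   # 3 replicas -> 1/3 of raw is usable
ZFS_USABLE_TB=$((RAW_TB / 2))    # striped mirrors -> 1/2 of raw is usable
echo "ceph: ${CEPH_USABLE_TB} TB usable, zfs mirrors: ${ZFS_USABLE_TB} TB usable"
```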
Bonus - 16k Block size¶
Before publishing this post, I was looking over the OpenZFS documentation for zvol block size, and decided to go back and run another benchmark using the recommended volblocksize of 16k.
After cloning and configuring my cloud-init template again, I re-ran the benchmarks with a 16k block size.
I also benchmarked both sync and async (on the ZFS side).
FIO Commands:
sudo apt install fio -y
fio --name=seqwrite-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=write \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=randreadwrite-time \
--filename=testfile \
--size=10G \
--bs=4k \
--rw=randrw \
--rwmixread=70 \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
echo "done"
Test | IOPS Avg | IOPS Max | BW Avg (MiB/s) | BW Max (MiB/s) | Latency Avg (ms) | Latency Max (ms) | Sync Writes | Link % |
---|---|---|---|---|---|---|---|---|
Seq Write | 1,960 | 2,484 | 1,960 | 2,484 | 32.65 | 222.00 | Enabled | 62.7% |
Seq Read | 2,577 | 2,748 | 2,575 | 2,748 | 24.85 | 173.00 | Enabled | 82.4% |
Random Read 4k | 42,000 | 53,810 | 164 | 215 | 1.05 | 99.56 | Enabled | 5.2% |
Random Write 4k | 18,000 | 23,214 | 70.5 | 92.9 | 1.09 | 99.62 | Enabled | 2.3% |
Seq Write | 1,815 | 2,648 | 1,815 | 2,648 | 35.24 | 393.00 | Disabled | 58.1% |
Seq Read | 2,557 | 2,762 | 2,556 | 2,762 | 25.04 | 165.00 | Disabled | 81.8% |
Random Read 4k | 41,700 | 54,918 | 163 | 220 | 1.06 | 74.17 | Disabled | 5.2% |
Random Write 4k | 17,900 | 23,576 | 70.0 | 94.3 | 1.10 | 74.28 | Disabled | 2.2% |
40k IOPS? That is pretty respectable. Not fantastic, but good enough... for now.
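As a closing note, the reason the 16k volblocksize helps random I/O so much can be sketched with simple write-amplification math: a 4 KiB guest write to a zvol forces a read-modify-write of an entire volume block, so a smaller volblocksize means far less wasted work. (My arithmetic, not from the OpenZFS docs:)

```shell
# Amplification factor for a 4 KiB guest write, per volblocksize.
AMP_1M=$(( (1024 * 1024) / 4096 ))   # 1 MiB volblocksize -> 256x
AMP_16K=$(( (16 * 1024) / 4096 ))    # 16 KiB volblocksize -> 4x
echo "1M volblocksize: ${AMP_1M}x, 16k volblocksize: ${AMP_16K}x"
```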