
Proxmox: Using ZFS over iSCSI with Unraid

So, recently I decided to acquire a new Lenovo P520, which will be dedicated to running Unraid. Previously, my Unraid ran as a VM on my R730XD.

Now, it is bare metal on the P520, with an MD1200 disk shelf for its 3.5" HDDs.

One of my goals was to transition some of my VMs to ZFS storage hosted on this box.

This post goes through the steps of configuring Proxmox to leverage ZFS over iSCSI, hosted on Unraid.

Testing Setup

Here are the servers I will be testing & benchmarking with.

  1. Server: Unraid

    • Model: Lenovo P520
    • CPU: Intel® Xeon® W-2135 @ 3.70 GHz
    • Memory: 128 GiB DDR4 Single-bit ECC
    • Network: ConnectX-4 Dual-Port 100 GbE NIC
    • OS: Unraid
    • ZFS Pool “cache”: 3× mirrored NVMe vdevs:
      • Mirror-1: 2 × Samsung 970 EVO Plus 1 TB
      • Mirror-2: 2 × Samsung PM963 (MZ1LW960HMJP-00003) 960 GB NVMe M.2 22110
      • Mirror-3: 2 × Samsung PM963 (same model as above)
  2. Client: Proxmox PVE

    • Model: Dell Optiplex 7060 SFF
    • Memory: 64 GiB DDR4
    • Network: ConnectX-4 Dual-Port 25 GbE NIC (bonded)
    • OS: Proxmox

Steps

For anyone wanting to do this, I strongly recommend looking at the Proxmox ZFS over iSCSI documentation.

Configure PVE Hosts

These commands must be run on each PVE host.

This will install the needed packages, and update the initiator name to something less random.

# Set Proxmox IQNs
HOST=$(hostname -s)
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:${HOST}"
PORTAL_IP="10.100.4.24"

# Install / Enable targetcli-fb
apt install targetcli-fb -y
systemctl enable --now targetclid.service

# Set the iSCSI initiator name for this host, and restart open-iscsi to pick it up
echo "InitiatorName=${IQN_TARGET}" > /etc/iscsi/initiatorname.iscsi
systemctl restart open-iscsi
echo "IQN for this host has been set to: ${IQN_TARGET}"

Configure PVE Cluster

This script only needs to be executed ONCE for the cluster. It does not matter which node it runs on.

This creates the keys used to access the target host.

PORTAL_IP="10.100.4.24"
## Run ONCE on any PVE host - create the SSH key PVE uses to reach the target.
## Use an empty passphrase; PVE needs non-interactive SSH access to the target host.
mkdir -p /etc/pve/priv/zfs
ssh-keygen -f /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa
ssh-copy-id -i /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa.pub root@${PORTAL_IP}
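
To confirm the key works before pointing Proxmox at it, a quick manual test (this is the same key path the ZFS over iSCSI plugin uses for its SSH calls):

# Verify passwordless SSH to the target using the new key
ssh -i /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa root@${PORTAL_IP} "zfs list"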

We will need to define the storage now. This block goes into /etc/pve/storage.cfg (the file is cluster-wide, so it only needs to be added once):

zfs: Unraid-ZFS
        blocksize 1m
        iscsiprovider LIO
        pool cache/iscsi
        portal 10.100.4.24
        target iqn.2025-08.com.xtremeownage.svr:tower-iscsi
        content rootdir,images
        lio_tpg tpg1
        nowritecache 1
        sparse 1
        zfs-base-path /dev/zvol
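
Alternatively, the same storage can likely be added from the CLI with pvesm; a hedged sketch, assuming the pvesm options mirror the storage.cfg keys above (verify before relying on it):

# Roughly equivalent to the storage.cfg block above (option names assumed to match the config keys)
pvesm add zfs Unraid-ZFS \
    --iscsiprovider LIO \
    --portal 10.100.4.24 \
    --target iqn.2025-08.com.xtremeownage.svr:tower-iscsi \
    --pool cache/iscsi \
    --lio_tpg tpg1 \
    --blocksize 1m \
    --sparse 1 \
    --nowritecache 1 \
    --content rootdir,images \
    --zfs-base-path /dev/zvol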

Configure Unraid
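
Before touching targetcli, make sure the dataset that backs the storage exists. The storage definition above points at cache/iscsi, so that parent dataset needs to exist on the Unraid side (a minimal sketch, assuming the "cache" pool is already imported):

# Create the parent dataset Proxmox will create its zvols under
zfs create cache/iscsi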

Configure targetcli

# Server Target
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
TARGET="10.100.4.24"     # Portal IP on the Unraid host
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:tower-iscsi"

# Proxmox hosts
PROXMOX_HOSTS=("kube01" "kube04" "kube05" "kube06")
PROXMOX_IQN_DATE="2025-08"
PROXMOX_IQN_DOMAIN="com.xtremeownage.svr"

# Create Target (skip if exists)
targetcli /iscsi create ${IQN_TARGET} 2>/dev/null || echo "Target already exists"
# Create Portal (skip if exists)
targetcli /iscsi/${IQN_TARGET}/tpg1/portals create ${TARGET} 2>/dev/null || echo "Portal already exists"

# Add ACLs for each Proxmox host
for HOST in "${PROXMOX_HOSTS[@]}"; do
    CLIENT_IQN="iqn.${PROXMOX_IQN_DATE}.${PROXMOX_IQN_DOMAIN}:${HOST}"
    echo "Adding ACL for ${HOST} with IQN ${CLIENT_IQN}"
    targetcli /iscsi/${IQN_TARGET}/tpg1/acls create ${CLIENT_IQN} 2>/dev/null || echo "ACL for ${HOST} already exists"
done

targetcli saveconfig

Expected Output from targetcli

Afterwards, you should see something like this:
/iscsi> ls
o- iscsi .............................................................................................................. [Targets: 1]
  o- iqn.2025-08.com.xtremeownage.svr:tower-iscsi ........................................................................ [TPGs: 1]
    o- tpg1 ................................................................................................. [no-gen-acls, no-auth]
      o- acls ............................................................................................................ [ACLs: 4]
      | o- iqn.2025-08.com.xtremeownage.kube:kube01 ............................................................... [Mapped LUNs: 0]
      | o- iqn.2025-08.com.xtremeownage.kube:kube04 ............................................................... [Mapped LUNs: 0]
      | o- iqn.2025-08.com.xtremeownage.kube:kube05 ............................................................... [Mapped LUNs: 0]
      | o- iqn.2025-08.com.xtremeownage.kube:kube06 ............................................................... [Mapped LUNs: 0]
      o- luns ............................................................................................................ [LUNs: 0]
      o- portals ...................................................................................................... [Portals: 1]
        o- 0.0.0.0:3260 ....................................................................................................... [OK]
/iscsi>

Optimize NIC settings

During testing, I noticed high CPU usage while only transmitting 1 GB/s of traffic.

Given that I have a 100 GbE NIC in this host, I expect to hit at least 5+ GB/s of throughput before running into issues.

After digging, I discovered that several of the NIC offloads were not enabled on my ConnectX-4 NIC. To fix this, I added a startup script.

This can be added to /boot/config/go, and will be executed during system startup.

# Unraid- Add to /boot/config/go
# This will optimize NIC settings.

# --- Variables ---
NIC=eth0          # adjust if different
RSS_QUEUES=32     # match physical cores (adjust if needed)
TCP_RMEM_MAX=67108864
TCP_WMEM_MAX=67108864
NETDEV_MAX_BACKLOG=250000

# --- Enable NIC offloads ---
ethtool -K $NIC tso on gso on gro on rx on tx on
ethtool -K $NIC rxvlan on txvlan on

# --- Maximize RSS queues ---
ethtool -L $NIC combined $RSS_QUEUES

# --- Kernel TCP tuning ---
sysctl -w net.core.rmem_max=$TCP_RMEM_MAX
sysctl -w net.core.wmem_max=$TCP_WMEM_MAX
sysctl -w net.core.netdev_max_backlog=$NETDEV_MAX_BACKLOG
sysctl -w net.ipv4.tcp_rmem="4096 87380 $TCP_RMEM_MAX"
sysctl -w net.ipv4.tcp_wmem="4096 87380 $TCP_WMEM_MAX"

# --- Confirmation ---
echo "NIC $NIC offloads enabled, RSS set to $RSS_QUEUES, TCP buffers tuned."

Benchmarking

To benchmark performance, I spun up a new VM from my Proxmox cluster.

[screenshot]

For the disk, I disabled cache and enabled iothread.

[screenshot]

To set up fio...

sudo apt install fio -y

For benchmarking, here are the commands I used.

Info

Note, these settings are not very optimized. This is addressed later.

fio --name=seqwrite --filename=testfile --size=1G --bs=1M --rw=write --direct=1 --numjobs=1 --time_based=0
fio --name=seqread --filename=testfile --size=1G --bs=1M --rw=read --direct=1 --numjobs=1 --time_based=0
fio --name=randreadwrite --filename=testfile --size=1G --bs=4k --rw=randrw --rwmixread=70 --direct=1 --numjobs=4 --time_based=0

Initial Benchmark

Sequential I/O (1 MiB blocks)

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Seq Write | 168 | 169 MiB/s (177 MB/s) | 5.93 ms | 2.2 ms | 43.2 ms | 5.4 ms | 7.3 ms | 1 job, 1 GiB file, psync |
| Seq Read | 200 | 213 MiB/s (224 MB/s) | 4.68 ms | 1.6 ms | 27.2 ms | 4.8 ms | 7.0 ms | 1 job, 1 GiB file, psync |

Random I/O (4 KB blocks)

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Rand RW (4 jobs) | 2,744 | 10.7 MiB/s | 0.25 ms | 83 µs | 19.9 ms | 0.22 ms | 0.44 ms | 70/30 R/W, 4 KB blocks |
| Rand RW (1 job) | 2,744 | 10.7 MiB/s | 0.26 ms | 88 µs | 20.2 ms | 0.22 ms | 0.44 ms | Single-threaded random mix |

First issue: I had left the blocksize at the default of 4k, and the ZFS cache was disabled. I corrected those issues, tore down the VM, re-cloned it, and benchmarked again.

Adjust Block Size, Enable ZFS Cache

For this benchmark, I increased the blocksize from 4k to 1M. This will hurt random I/O, but will greatly boost sequential throughput.
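
One thing worth calling out: the blocksize option only affects newly created zvols (volblocksize is fixed at creation time), which is why the VM had to be torn down and re-cloned rather than just re-tested. The effective value can be checked from the Unraid side (the zvol name below is only an example):

# Check the volblocksize of an existing VM disk (example zvol name)
zfs get volblocksize cache/iscsi/vm-100-disk-0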

I also enabled the ZFS cache.

The results: performance for large sequential I/O basically doubled, but was still extremely slow.

For random I/O, we took a minor penalty in performance.

Sequential I/O (1 MiB blocks)

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Seq Write | 313 | 314 MiB/s (329 MB/s) | 3.19 ms | 1.43 ms | 75.0 ms | 2.84 ms | 4.55 ms | 1 job, 1 GiB file, psync |
| Seq Read | 438 | 441 MiB/s (462 MB/s) | 2.26 ms | 1.46 ms | 7.35 ms | 2.11 ms | 3.13 ms | 1 job, 1 GiB file, psync |

Random I/O (4 KB blocks, multiple jobs)

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Rand RW (4 jobs) | 2,400 | 9.6 MiB/s | 0.28 ms | 84 µs | 66.9 ms | 0.24 ms | 1.04 ms | 70/30 R/W, 4 KB blocks |
| Rand RW (1 job) | 2,396 | 9.6 MiB/s | 0.28 ms | 81 µs | 52.1 ms | 0.24 ms | 1.06 ms | Single-threaded random mix |

Run time-based sequential read

Next, I decided to run a time-based large sequential read, to assist with identifying bottlenecks...

fio --name=seqread-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=read \
    --direct=1 \
    --numjobs=1 \
    --time_based \
    --runtime=300s \
    --group_reporting

Off the bat, it averaged around 1 GiB/s.

From the Unraid side, CPU did not appear to be overly saturated.

[screenshot]

Looking at the PVE host, I did notice a large spike in memory pressure stall.

[screenshot]

Looking at the VM.....

[screenshot]

It... appears to need more memory....

But CPU utilization and network utilization were fine.

So, at this point, I decided to go check the MTUs / jumbo frames.

And.... as it turns out, I did not have jumbo frames enabled on my Unraid host.
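
For reference, the MTU is easy to spot-check from a shell on either end (the interface name is just an example; the persistent setting on Unraid lives in the network settings UI):

# Check the current MTU on the interface
ip link show eth0 | grep -o 'mtu [0-9]*'
# Bump it to 9000 for the current boot
ip link set eth0 mtu 9000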

[screenshot]

Time-based sequential read - Fixed MTU, Fixed Benchmark VM Memory

I adjusted the MTU for the Unraid host and enabled jumbo frames. I also doubled the RAM of the benchmark VM to 16 GiB.

Then, re-ran the benchmark.

And.......... No change.

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Seq Read (300 s) | 447 | 446 MiB/s (468 MB/s) | 2.24 ms | 0.84 ms | 15.9 ms | 2.18 ms | 3.03 ms | Single job, 10 GiB file, psync |

I decided to look at the switch.

[screenshot]

Everything looks perfectly fine here. So.... then I looked at the fio command I was using, and realized it could be optimized a bit more.

Time-based sequential read - Optimized

This command increases the iodepth and the number of jobs, and switches to asynchronous I/O (libaio).

fio --name=seqread-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=read \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --time_based \
    --runtime=300s \
    --group_reporting \
    --ioengine=libaio

This yielded around a 500% increase in performance.

[screenshot]

At this point, I found the next bottleneck: the 25 GbE NICs installed in my PVE hosts.

[screenshot]

No other issues were noticed. CPU usage on both Unraid and the PVE host was well within acceptable limits, and there were no issues with memory pressure, etc.

BUT... we are saturating this 25 GbE NIC.

There are two bonded 25 GbE ports; however, bonding does not split a single connection across both links.

Ignore the comments in the screenshot; I never updated them after swapping the 100 GbE NICs for 25 GbE NICs.

[screenshot]

So, I decided to try to get iSCSI multipathing working.

# I deleted the default portal of 0.0.0.0, to allow me to create portals for each IP address.
Deleted network portal 0.0.0.0:3260
/iscsi/iqn.20.../tpg1/portals> create 10.100.4.24 3260
Using default IP port 3260
Created network portal 10.100.4.24:3260.
/iscsi/iqn.20.../tpg1/portals> create 10.100.5.2 3260

Next, on the PVE side, I went down a rabbit hole, and essentially discovered that multipathing is not available for ZFS over iSCSI.

So, that's a shame. But hey, that's not a huge problem. Let me clone this VM to one of the other hosts.....

Time-based sequential read - Two Hosts

Since we know there is a multipathing limitation with QEMU / ZFS over iSCSI, this time I am going to run the benchmark from two hosts at the same time.

fio --name=seqread-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=read \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --time_based \
    --runtime=300s \
    --group_reporting \
    --ioengine=libaio

Off the bat, 50 Gbit/s.

[screenshot]

Looking at the Unraid host, no bottlenecks were identified. (Do note, the unit here is GiB/s, not Gb/s.)

[screenshot]

So, I suppose if I want to see the full capabilities of this ZFS iSCSI SAN, I either need to add more hosts with 25 GbE+, or toss the 100 GbE NICs back into my SFFs.

BUT, I really don't want to do either of those options right now. Instead, I will do a final benchmark.

Summary / Full Benchmark - Sequential read, write & random I/O

sudo apt install fio -y

fio --name=seqwrite-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=write \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

fio --name=seqread-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=read \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

fio --name=randreadwrite-time \
    --filename=testfile \
    --size=10G \
    --bs=4k \
    --rw=randrw \
    --rwmixread=70 \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Seq Write | 2,134 | 2,134 MiB/s (2,238 MB/s) | 29.99 ms | 6 ms | 335 ms | 24 ms | 68 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
| Seq Read | 2,772 | 2,772 MiB/s (2,906 MB/s) | 23.08 ms | 5 ms | 167 ms | 23 ms | 26 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
| Rand RW 70/30 | 1,732 / 749 | 6.9 MiB/s / 3 MiB/s | 18.3 ms | 0.097 ms | 383 ms | 7.5 ms | 94.9 ms | 4 jobs, 16 QD, 4 KiB blocks, libaio |

For sequential reads, we are fully saturating the network connection. The low 4k IOPS was bothering me, so I disabled sync and ran another benchmark.
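
Disabling sync was done on the ZFS side; a minimal sketch, assuming it is set on the dataset backing the iSCSI zvols:

# Disable synchronous writes on the backing dataset (revert with sync=standard)
zfs set sync=disabled cache/iscsi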

| Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
|---|---|---|---|---|---|---|---|---|
| Rand RW Read | 5,039 | 19.7 MiB/s (20.6 MB/s) | 6.65 ms | 93 µs | 86.6 ms | 5.15 ms | 16.9 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |
| Rand RW Write | 2,165 | 8.46 MiB/s (8.86 MB/s) | 14.1 ms | 283 µs | 86.5 ms | 12.9 ms | 26.3 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |

This drastically boosted the random IOPS. As such, it looks like I may be adding a SLOG in the future.
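
For reference, adding a SLOG later would just mean attaching a fast, power-loss-protected device as a log vdev (the device path here is purely hypothetical):

# Add a dedicated log device (SLOG) to the pool - hypothetical device path
zpool add cache log /dev/disk/by-id/nvme-FAST-SLOG-DEVICE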

But, that's all for now.

This performance is perfectly acceptable for now. And, honestly, it is quite a bit faster than Ceph was managing, while yielding 20% more usable space.

(Ceph = 3 replicas = 30% usable space, ZFS Striped mirrors configuration = 50% usable space)

Bonus - 16k Block size

Before publishing this post, I was looking over the OpenZFS documentation for zvol volblocksize, and decided to go back and run another benchmark using the recommended size of 16k.

After cloning & configuring my cloud-init template again, I ran the benchmarks with a 16k blocksize.

I also benchmarked both sync and async (on the ZFS side).
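
The only changes compared to the previous round were the blocksize in the storage definition and the sync property on the backing dataset; roughly:

# On the PVE side: change "blocksize 1m" to "blocksize 16k" in /etc/pve/storage.cfg,
# then re-clone the VM so its zvol is created with a 16k volblocksize.

# On the Unraid side, toggle sync between benchmark runs:
zfs set sync=standard cache/iscsi    # sync-enabled runs
zfs set sync=disabled cache/iscsi    # sync-disabled runs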

FIO Commands:

sudo apt install fio -y

fio --name=seqwrite-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=write \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

fio --name=seqread-time \
    --filename=testfile \
    --size=10G \
    --bs=1M \
    --rw=read \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

fio --name=randreadwrite-time \
    --filename=testfile \
    --size=10G \
    --bs=4k \
    --rw=randrw \
    --rwmixread=70 \
    --direct=1 \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --time_based \
    --runtime=300s \
    --group_reporting

echo "done"

| Test | IOPS Avg | IOPS Max | BW Avg (MiB/s) | BW Max (MiB/s) | Latency Avg (ms) | Latency Max (ms) | Sync Writes | Link % |
|---|---|---|---|---|---|---|---|---|
| Seq Write | 1,960 | 2,484 | 1,960 | 2,484 | 32.65 | 222.00 | Enabled | 62.7% |
| Seq Read | 2,577 | 2,748 | 2,575 | 2,748 | 24.85 | 173.00 | Enabled | 82.4% |
| Random Read 4k | 42,000 | 53,810 | 164 | 215 | 1.05 | 99.56 | Enabled | 5.2% |
| Random Write 4k | 18,000 | 23,214 | 70.5 | 92.9 | 1.09 | 99.62 | Enabled | 2.3% |
| Seq Write | 1,815 | 2,648 | 1,815 | 2,648 | 35.24 | 393.00 | Disabled | 58.1% |
| Seq Read | 2,557 | 2,762 | 2,556 | 2,762 | 25.04 | 165.00 | Disabled | 81.8% |
| Random Read 4k | 41,700 | 54,918 | 163 | 220 | 1.06 | 74.17 | Disabled | 5.2% |
| Random Write 4k | 17,900 | 23,576 | 70.0 | 94.3 | 1.10 | 74.28 | Disabled | 2.2% |

40k IOPS, that is pretty respectable. Not fantastic. But, good enough... for now.