Proxmox: Using ZFS over iSCSI with Unraid¶
So, recently I acquired a new Lenovo P520 to run Unraid. Previously, my Unraid server ran as a VM on my R730XD.
Now, it runs bare metal on the P520, with an MD1200 disk shelf for its 3.5" HDDs.
One of my goals was to transition some of my VMs to ZFS storage hosted on this box.
This post walks through the steps of configuring Proxmox to use ZFS over iSCSI, hosted on Unraid.
Testing Setup¶
Here are the servers I will be testing and benchmarking with.
- Server: Unraid
    - Model: Lenovo P520
    - CPU: Intel® Xeon® W-2135 @ 3.70 GHz
    - Memory: 128 GiB DDR4 single-bit ECC
    - Network: ConnectX-4 dual-port 100 GbE NIC
    - OS: Unraid
    - ZFS pool "cache": 3× mirrored NVMe vdevs:
        - Mirror-1: 2× Samsung 970 EVO Plus 1 TB
        - Mirror-2: 2× Samsung PM963 (MZ1LW960HMJP-00003) 960 GB NVMe M.2 22110
        - Mirror-3: 2× Samsung PM963 (same model as above)
- Client: Proxmox PVE
    - Model: Dell OptiPlex 7060 SFF
    - Memory: 64 GiB DDR4
    - Network: ConnectX-4 dual-port 25 GbE NIC (bonded)
    - OS: Proxmox
Steps¶
For anyone wanting to do this, I strongly recommend reading the Proxmox: ZFS over iSCSI documentation.
Configure PVE Hosts¶
These commands must be run on each PVE host.
They install the required packages and update the initiator name to be something less random.
# Set Proxmox IQN for this host
HOST=$(hostname -s)
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:${HOST}"
# Install / enable targetcli-fb
apt install targetcli-fb -y
systemctl enable --now targetclid.service
# Write the new initiator name and restart the iSCSI service
echo "InitiatorName=${IQN_TARGET}" > /etc/iscsi/initiatorname.iscsi
systemctl restart open-iscsi
echo "IQN for this host has been set to: ${IQN_TARGET}"
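If you want to sanity-check the generated name before writing it out, an IQN follows the `iqn.YYYY-MM.reversed.domain:identifier` shape. A minimal sketch (the hostname is hard-coded here for illustration; the real script uses `$(hostname -s)`, and the grep pattern is mine):

```shell
# Build the IQN the same way the script above does.
HOST="kube01"
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:${HOST}"

# Rough IQN shape check: iqn.YYYY-MM.reversed.domain:identifier
if echo "$IQN_TARGET" | grep -Eq '^iqn\.[0-9]{4}-[0-9]{2}\.[a-z0-9.-]+:[a-z0-9.-]+$'; then
  echo "IQN looks valid: $IQN_TARGET"
else
  echo "Malformed IQN: $IQN_TARGET" >&2
fi
```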
Configure PVE Cluster¶
This script only needs to be executed ONCE for the cluster; it does not matter which node it runs on.
It creates the SSH keys used to access the target host.
PORTAL_IP="10.100.4.24"
## Run ONCE on PVE Host - Create Target.
mkdir -p /etc/pve/priv/zfs
ssh-keygen -f /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa
ssh-copy-id -i /etc/pve/priv/zfs/${PORTAL_IP}_id_rsa.pub root@${PORTAL_IP}
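Note, the key path is not arbitrary: the ZFS over iSCSI plugin looks for `/etc/pve/priv/zfs/<portal-ip>_id_rsa`, derived from the portal address. A trivial sketch of the naming convention:

```shell
# Proxmox derives the SSH key path from the portal IP.
PORTAL_IP="10.100.4.24"
KEY_PATH="/etc/pve/priv/zfs/${PORTAL_IP}_id_rsa"
echo "Proxmox will look for: $KEY_PATH"
```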
Next, we need to define the storage. Add the following entry to /etc/pve/storage.cfg (this file is shared across the cluster):
zfs: Unraid-ZFS
blocksize 1m
iscsiprovider LIO
pool cache/iscsi
portal 10.100.4.24
target iqn.2025-08.com.xtremeownage.svr:tower-iscsi
content rootdir,images
lio_tpg tpg1
nowritecache 1
sparse 1
zfs-base-path /dev/zvol
Configure Unraid¶
Configure targetcli¶
# Server Target
IQN_DATE="2025-08"
IQN_DOMAIN="com.xtremeownage.svr"
TARGET="10.100.4.24"
IQN_TARGET="iqn.${IQN_DATE}.${IQN_DOMAIN}:tower-iscsi"
# Proxmox hosts
PROXMOX_HOSTS=("kube01" "kube04" "kube05" "kube06")
PROXMOX_IQN_DATE="2025-08"
PROXMOX_IQN_DOMAIN="com.xtremeownage.svr"
# Create Target (skip if exists)
targetcli /iscsi create ${IQN_TARGET} 2>/dev/null || echo "Target already exists"
# Create Portal (skip if exists)
targetcli /iscsi/${IQN_TARGET}/tpg1/portals create ${TARGET} 2>/dev/null || echo "Portal already exists"
# Add ACLs for each Proxmox host
for HOST in "${PROXMOX_HOSTS[@]}"; do
CLIENT_IQN="iqn.${PROXMOX_IQN_DATE}.${PROXMOX_IQN_DOMAIN}:${HOST}"
echo "Adding ACL for ${HOST} with IQN ${CLIENT_IQN}"
targetcli /iscsi/${IQN_TARGET}/tpg1/acls create ${CLIENT_IQN} 2>/dev/null || echo "ACL for ${HOST} already exists"
done
targetcli saveconfig
Expected output from targetcli
Afterwards, you should have this:
/iscsi> ls
o- iscsi .............................................................................................................. [Targets: 1]
o- iqn.2025-08.com.xtremeownage.svr:tower-iscsi ........................................................................ [TPGs: 1]
o- tpg1 ................................................................................................. [no-gen-acls, no-auth]
o- acls ............................................................................................................ [ACLs: 4]
  | o- iqn.2025-08.com.xtremeownage.svr:kube01 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube04 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube05 ................................................................ [Mapped LUNs: 0]
  | o- iqn.2025-08.com.xtremeownage.svr:kube06 ................................................................ [Mapped LUNs: 0]
o- luns ............................................................................................................ [LUNs: 0]
o- portals ...................................................................................................... [Portals: 1]
o- 0.0.0.0:3260 ....................................................................................................... [OK]
/iscsi>
Optimize NIC settings¶
During testing, I noticed high CPU usage while transmitting only 1 GB/s of traffic.
Given this host has a 100G NIC, I expect to hit at least 5+ GB/s of throughput before running into issues.
After digging, I discovered that few of the NIC offloads were enabled for my ConnectX-4 NIC. To fix this, I added a startup script.
This can be added to /boot/config/go, and will be executed during system startup.
# Unraid- Add to /boot/config/go
# This will optimize NIC settings.
# --- Variables ---
NIC=eth0 # adjust if different
RSS_QUEUES=32 # match physical cores (adjust if needed)
TCP_RMEM_MAX=67108864
TCP_WMEM_MAX=67108864
NETDEV_MAX_BACKLOG=250000
# --- Enable NIC offloads ---
ethtool -K $NIC tso on gso on gro on rx on tx on
ethtool -K $NIC rxvlan on txvlan on
# --- Maximize RSS queues ---
ethtool -L $NIC combined $RSS_QUEUES
# --- Kernel TCP tuning ---
sysctl -w net.core.rmem_max=$TCP_RMEM_MAX
sysctl -w net.core.wmem_max=$TCP_WMEM_MAX
sysctl -w net.core.netdev_max_backlog=$NETDEV_MAX_BACKLOG
sysctl -w net.ipv4.tcp_rmem="4096 87380 $TCP_RMEM_MAX"
sysctl -w net.ipv4.tcp_wmem="4096 87380 $TCP_WMEM_MAX"
# --- Confirmation ---
echo "NIC $NIC offloads enabled, RSS set to $RSS_QUEUES, TCP buffers tuned."
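For reference, the buffer maximums above are just 64 MiB expressed in bytes, and the three numbers in the `tcp_rmem`/`tcp_wmem` sysctls are the per-socket minimum, default, and maximum buffer sizes; only the maximum is being raised here. Quick arithmetic check:

```shell
# TCP_RMEM_MAX / TCP_WMEM_MAX above are 64 MiB in bytes.
MAX_BYTES=$((64 * 1024 * 1024))
echo "64 MiB = ${MAX_BYTES} bytes"
# tcp_rmem/tcp_wmem triple = "min default max" per socket;
# the min (4096) and default (87380) values are left at stock.
```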
Benchmarking¶
To benchmark performance, I spun up a new VM on my Proxmox cluster.
I disabled the disk cache, and enabled iothread.
To set up fio: sudo apt install fio -y
For benchmarking, here are the commands I used.
Info
Note, these settings are not very optimized. This is addressed later.
fio --name=seqwrite --filename=testfile --size=1G --bs=1M --rw=write --direct=1 --numjobs=1 --time_based=0
fio --name=seqread --filename=testfile --size=1G --bs=1M --rw=read --direct=1 --numjobs=1 --time_based=0
fio --name=randreadwrite --filename=testfile --size=1G --bs=4k --rw=randrw --rwmixread=70 --direct=1 --numjobs=4 --time_based=0
Initial Benchmark¶
Sequential & Random I/O¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 168 | 169 MiB/s (177 MB/s) | 5.93 ms | 2.2 ms | 43.2 ms | 5.4 ms | 7.3 ms | 1 job, 1 GiB file, psync |
Seq Read | 200 | 213 MiB/s (224 MB/s) | 4.68 ms | 1.6 ms | 27.2 ms | 4.8 ms | 7.0 ms | 1 job, 1 GiB file, psync |
Rand RW (4 jobs) | 2,744 | 10.7 MiB/s | 0.25 ms | 83 µs | 19.9 ms | 0.22 ms | 0.44 ms | 70/30 R/W, 4 KB blocks |
Rand RW (1 job) | 2,744 | 10.7 MiB/s | 0.26 ms | 88 µs | 20.2 ms | 0.22 ms | 0.44 ms | Single-threaded random mix |
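As a sanity check, the bandwidth figures line up with IOPS × block size. A quick sketch, using numbers from the table above (integer math, so results round down):

```shell
# Sequential: ~168 IOPS at 1 MiB per I/O -> ~168 MiB/s
SEQ_MIB_S=$((168 * 1))
# Random: ~2744 IOPS at 4 KiB per I/O -> 10976 KiB/s (~10.7 MiB/s)
RAND_KIB_S=$((2744 * 4))
echo "seq: ${SEQ_MIB_S} MiB/s, rand: ${RAND_KIB_S} KiB/s"
```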
First issue: I had left the block size at the default of 4k, and the ZFS cache was disabled. I corrected both issues, tore down the VM, re-cloned it, and benchmarked again.
Adjust Block Size, Enable ZFS Cache¶
For this benchmark, I increased the block size from 4k to 1M. This hurts random I/O, but greatly boosts sequential throughput.
I also enabled the ZFS cache.
The results: performance for large sequential transfers basically doubled, but was still extremely slow.
For random I/O, we took a minor penalty in performance.
Sequential I/O (1 MiB blocks)¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 313 | 314 MiB/s (329 MB/s) | 3.19 ms | 1.43 ms | 75.0 ms | 2.84 ms | 4.55 ms | 1 job, 1 GiB file, psync |
Seq Read | 438 | 441 MiB/s (462 MB/s) | 2.26 ms | 1.46 ms | 7.35 ms | 2.11 ms | 3.13 ms | 1 job, 1 GiB file, psync |
Random I/O (4 KB blocks, multiple jobs)¶
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Rand RW (4 jobs) | 2,400 | 9.6 MiB/s | 0.28 ms | 84 µs | 66.9 ms | 0.24 ms | 1.04 ms | 70/30 R/W, 4 KB blocks |
Rand RW (1 job) | 2,396 | 9.6 MiB/s | 0.28 ms | 81 µs | 52.1 ms | 0.24 ms | 1.06 ms | Single-threaded random mix |
Run time-based sequential read¶
Next, I decided to run a time-based large sequential read, to assist with identifying bottlenecks.
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=1 \
--time_based \
--runtime=300s \
--group_reporting
Off the bat, it averaged around 1 GiB/s.
From the Unraid side, the CPU did not appear to be overly saturated.
Looking at the PVE host, I did notice a large spike in memory pressure stall.
Looking at the VM... it appears to need more memory. But CPU utilization and network utilization were fine.
So, at this point, I decided to go check the MTUs / jumbo frames.
And... as it turns out, I did not have jumbo frames enabled for my Unraid host.
Time-based sequential 2 - Fixed MTU, Fixed Benchmark VM Memory¶
I adjusted the MTU for the Unraid host and enabled jumbo frames. I also doubled the RAM of the benchmark VM to 16 GiB.
Then, re-ran the benchmark.
And.......... No change.
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Read (300 s) | 447 | 446 MiB/s (468 MB/s) | 2.24 ms | 0.84 ms | 15.9 ms | 2.18 ms | 3.03 ms | Single job, 10 GiB file, psync |
Next, I decided to look at the switch.
Everything looked perfectly fine there. So then I looked at the fio command I was using, and realized it could be optimized a bit more.
Time-based sequential read - Optimized¶
This command increases the iodepth and the number of jobs, and switches to asynchronous I/O (libaio).
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--time_based \
--runtime=300s \
--group_reporting \
--ioengine=libaio
This yielded around a 500% increase in performance.
At this point, I found the next bottleneck: the 25G NICs installed in my PVE hosts.
No other issues were noticed. CPU usage on Unraid and the PVE host was well within acceptable limits, with no signs of memory pressure, etc.
BUT... we are saturating this 25G NIC.
There are two bonded 25G ports; however, bonding does not work perfectly.
Ignore the comments. I never updated them after swapping the 100G NICs for 25G NICs.
So, I decided to get iSCSI multipathing working.
# I deleted the default portal of 0.0.0.0, to allow me to create portals for each IP address.
/iscsi/iqn.20.../tpg1/portals> delete 0.0.0.0 3260
Deleted network portal 0.0.0.0:3260
/iscsi/iqn.20.../tpg1/portals> create 10.100.4.24 3260
Using default IP port 3260
Created network portal 10.100.4.24:3260.
/iscsi/iqn.20.../tpg1/portals> create 10.100.5.2 3260
Next, on the PVE side, I went down a rabbit hole, and eventually discovered that multipathing is not available for ZFS over iSCSI.
So, that's a shame. But hey, that's not a huge problem. Let me clone this VM to one of the other hosts...
Time-based sequential 2 - Two Hosts¶
Since we know there is a multipathing limitation with QEMU / ZFS over iSCSI, this time I ran the benchmark from two hosts at the same time.
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--time_based \
--runtime=300s \
--group_reporting \
--ioengine=libaio
Off the bat: 50 Gbit/s.
Looking at the Unraid host, no bottlenecks were identified. (Do note, the metric here is GiB/s, not Gb/s.)
So, if I want to see the full capabilities of this ZFS iSCSI SAN, I either need to add more hosts with 25G+ NICs, or toss the 100G NICs back into my SFFs.
BUT, I really don't want to do either of those options right now. Instead, I will do a final benchmark.
Summary / Full Benchmark - Sequential read, write & random I/O¶
sudo apt install fio -y
fio --name=seqwrite-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=write \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=randreadwrite-time \
--filename=testfile \
--size=10G \
--bs=4k \
--rw=randrw \
--rwmixread=70 \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Seq Write | 2,134 | 2,134 MiB/s (2,238 MB/s) | 29.99 ms | 6 ms | 335 ms | 24 ms | 68 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
Seq Read | 2,772 | 2,772 MiB/s (2,906 MB/s) | 23.08 ms | 5 ms | 167 ms | 23 ms | 26 ms | 4 jobs, 16 QD, 1 MiB blocks, libaio |
Rand RW 70/30 | 1,732 / 749 | 6.9 MiB/s / 3 MiB/s | 18.3 ms | 0.097 ms | 383 ms | 7.5 ms | 94.9 ms | 4 jobs, 16 QD, 4 KiB blocks, libaio |
For sequential reads, we are fully saturating the network connection. The low 4k IOPS was bothering me, so I disabled sync and ran another benchmark.
Test | IOPS (avg) | BW (avg) | Latency avg | Latency min | Latency max | 50th pct | 95th pct | Notes |
---|---|---|---|---|---|---|---|---|
Rand RW Read | 5,039 | 19.7 MiB/s (20.6 MB/s) | 6.65 ms | 93 µs | 86.6 ms | 5.15 ms | 16.9 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |
Rand RW Write | 2,165 | 8.46 MiB/s (8.86 MB/s) | 14.1 ms | 283 µs | 86.5 ms | 12.9 ms | 26.3 ms | 4 jobs, 4 KB blocks, 70/30 R/W mix |
This drastically boosted the random IOPS. As such, it looks like I may be adding a SLOG in the future.
But, that's all for now.
This performance is perfectly acceptable for now. And, honestly, quite a bit faster than Ceph was managing, while yielding around 20% more usable space.
(Ceph with 3 replicas = ~33% usable space; ZFS striped mirrors = 50% usable space.)
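The space-efficiency comparison is simple arithmetic. A trivial sketch with an illustrative raw capacity (the 6 TB figure is mine, not the actual pool size):

```shell
RAW_TB=6                         # e.g. six 1 TB drives of raw capacity
CEPH_USABLE_TB=$((RAW_TB / 3))   # 3 replicas -> 1/3 of raw is usable
ZFS_USABLE_TB=$((RAW_TB / 2))    # striped mirrors -> 1/2 of raw is usable
echo "ceph: ${CEPH_USABLE_TB} TB usable, zfs mirrors: ${ZFS_USABLE_TB} TB usable"
```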
Bonus - 16k Block size¶
Before publishing this post, I was looking over the OpenZFS documentation for zvol block size, and decided to go back and run another benchmark using the recommended volblocksize of 16k.
After cloning and configuring my cloud-init template again, I re-ran the benchmarks with a 16k block size.
I also benchmarked both sync and async (on the ZFS side).
FIO Commands:
sudo apt install fio -y
fio --name=seqwrite-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=write \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=seqread-time \
--filename=testfile \
--size=10G \
--bs=1M \
--rw=read \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
fio --name=randreadwrite-time \
--filename=testfile \
--size=10G \
--bs=4k \
--rw=randrw \
--rwmixread=70 \
--direct=1 \
--numjobs=4 \
--iodepth=16 \
--ioengine=libaio \
--time_based \
--runtime=300s \
--group_reporting
echo "done"
Test | IOPS Avg | IOPS Max | BW Avg (MiB/s) | BW Max (MiB/s) | Latency Avg (ms) | Latency Max (ms) | Sync Writes | Link % |
---|---|---|---|---|---|---|---|---|
Seq Write | 1,960 | 2,484 | 1,960 | 2,484 | 32.65 | 222.00 | Enabled | 62.7% |
Seq Read | 2,577 | 2,748 | 2,575 | 2,748 | 24.85 | 173.00 | Enabled | 82.4% |
Random Read 4k | 42,000 | 53,810 | 164 | 215 | 1.05 | 99.56 | Enabled | 5.2% |
Random Write 4k | 18,000 | 23,214 | 70.5 | 92.9 | 1.09 | 99.62 | Enabled | 2.3% |
Seq Write | 1,815 | 2,648 | 1,815 | 2,648 | 35.24 | 393.00 | Disabled | 58.1% |
Seq Read | 2,557 | 2,762 | 2,556 | 2,762 | 25.04 | 165.00 | Disabled | 81.8% |
Random Read 4k | 41,700 | 54,918 | 163 | 220 | 1.06 | 74.17 | Disabled | 5.2% |
Random Write 4k | 17,900 | 23,576 | 70.0 | 94.3 | 1.10 | 74.28 | Disabled | 2.2% |
40k IOPS? That is pretty respectable. Not fantastic, but good enough... for now.
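As a closing note, the reason the 16k volblocksize helps random I/O so much can be sketched with simple write-amplification math: a 4 KiB guest write to a zvol forces a read-modify-write of an entire volume block, so a smaller volblocksize means far less wasted work. (My arithmetic, not from the OpenZFS docs:)

```shell
# Amplification factor for a 4 KiB guest write, per volblocksize.
AMP_1M=$(( (1024 * 1024) / 4096 ))   # 1 MiB volblocksize -> 256x
AMP_16K=$(( (16 * 1024) / 4096 ))    # 16 KiB volblocksize -> 4x
echo "1M volblocksize: ${AMP_1M}x, 16k volblocksize: ${AMP_16K}x"
```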