📣 How to replace a failed disk in a ZFS pool

(Written November 2025)

Abstract

This guide demonstrates how to replace a failed disk in a ZFS pool on a Proxmox cluster.


1. Replace the faulty drive

In the Proxmox ZFS overview, you can see that one of the ZFS pools is degraded.

Start by replacing the failed physical drive.
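
If the failing drive is still installed and responsive, you can optionally take it offline in ZFS before pulling it, so the pool stops trying to use it. This step is not required; the pool and device names below are the ones from this example, so substitute your own:

zpool offline HDD_2Tb wwn-0x5000c5007b12fd48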


2. Identify the replacement drive

Go to the shell and check the ZFS pool status:

zpool status

The output will look something like this:

root@falador:~# zpool status
  pool: HDD_2Tb
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 1004K in 00:00:01 with 0 errors on Sat Nov  8 21:09:34 2025
config:

        NAME                                   STATE     READ WRITE CKSUM
        HDD_2Tb                                DEGRADED     0     0     0
          raidz3-0                             DEGRADED     0     0     0
            ata-TOSHIBA_MG03ACA200_Z63EKCUTF   ONLINE       0     0     0
            ata-SEAGATE_ST2000NM0033_Z1X4LXJT  ONLINE       0     0     0
            ata-TOSHIBA_MG03ACA200_Z63EKCUWF   ONLINE       0     0     0
            ata-SEAGATE_ST2000NM0033_Z1X4LW0D  ONLINE       0     0     0
            wwn-0x500003977c3800c3             ONLINE       0     0     0
            wwn-0x5000c5007b12fdb0             ONLINE       0     0     0
            wwn-0x500003977bd8025c             ONLINE       0     0     0
            wwn-0x500003977bd80267             ONLINE       0     0     0
            7734569190555848310                UNAVAIL      0     0     0  was /dev/disk/by-id/wwn-0x5000c5007b12fd48-part1
            wwn-0x5000c5007b12fc4c             ONLINE       0     0     0

errors: No known data errors

Copy the ID of the failed drive; we will need it later. In this case, the ID of the failed drive is "7734569190555848310".
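
If the host runs more than one pool, you can limit the output to the affected pool by naming it explicitly:

zpool status HDD_2Tb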

The following command shows all disks and their partitions. Identify the disk that has no partitions; that will be the new drive:

lsblk

In the output below, you can see that "sdg" has no partitions and is the new drive.

root@falador:~# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda         8:0    0  93.2G  0 disk 
├─sda1      8:1    0  1007K  0 part 
├─sda2      8:2    0   512M  0 part 
└─sda3      8:3    0  92.7G  0 part 
sdb         8:16   0 894.3G  0 disk 
├─sdb1      8:17   0 894.2G  0 part 
└─sdb9      8:25   0     8M  0 part 
sdc         8:32   0 894.3G  0 disk 
├─sdc1      8:33   0 894.2G  0 part 
└─sdc9      8:41   0     8M  0 part 
sdd         8:48   0 894.3G  0 disk 
├─sdd1      8:49   0 894.2G  0 part 
└─sdd9      8:57   0     8M  0 part 
sde         8:64   0 894.3G  0 disk 
├─sde1      8:65   0 894.2G  0 part 
└─sde9      8:73   0     8M  0 part 
sdf         8:80   0   1.8T  0 disk 
├─sdf1      8:81   0   1.8T  0 part 
└─sdf9      8:89   0     8M  0 part 
sdg         8:96   0   1.8T  0 disk 
sdh         8:112  0   1.8T  0 disk 
├─sdh1      8:113  0   1.8T  0 part 
└─sdh9      8:121  0     8M  0 part
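
Before continuing, you can double-check that "sdg" really is the newly installed drive by comparing its model and serial number with the label on the disk. This assumes the smartmontools package is available (Proxmox typically ships it for its disk health view):

smartctl -i /dev/sdg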

Now we want to know the persistent ID of that disk. You can find it with the following command (replace "sdg" with your disk name):

ls -l /dev/disk/by-id/ | grep sdg

root@falador:~# ls -l /dev/disk/by-id/ | grep sdg
lrwxrwxrwx 1 root root  9 Nov 10 18:38 wwn-0x5000cca224d31cce -> ../../sdg

In this scenario, the disk ID is "wwn-0x5000cca224d31cce".
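
To confirm that this by-id name points at the device you identified with lsblk, you can resolve the symlink; it should print /dev/sdg:

readlink -f /dev/disk/by-id/wwn-0x5000cca224d31cce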

Now we can tell ZFS which disk should replace the failed one:

zpool replace <Pool-name> <ID-failed-drive> /dev/disk/by-id/<ID-new-disk>

Example:

zpool replace HDD_2Tb 7734569190555848310 /dev/disk/by-id/wwn-0x5000cca224d31cce
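
If the replacement disk was previously used in another ZFS pool, zpool may refuse it because it still carries an old label. In that case the replacement can be forced, which overwrites whatever is on the new disk, so be sure you picked the right device:

zpool replace -f HDD_2Tb 7734569190555848310 /dev/disk/by-id/wwn-0x5000cca224d31cce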

The resilvering process now starts, rebuilding the pool's data onto the new drive. You can check its progress with:

zpool status

Example output:

root@falador:~# zpool status
  pool: HDD_2Tb
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Nov 12 23:54:56 2025
        1.05T / 3.25T scanned at 3.38G/s, 83.8G / 3.04T issued at 271M/s
        10.5G resilvered, 2.69% done, 03:11:05 to go
config:

        NAME                                   STATE     READ WRITE CKSUM
        HDD_2Tb                                DEGRADED     0     0     0
          raidz3-0                             DEGRADED     0     0     0
            ata-TOSHIBA_MG03ACA200_Z63EKCUTF   ONLINE       0     0     0
            ata-SEAGATE_ST2000NM0033_Z1X4LXJT  ONLINE       0     0     0
            ata-TOSHIBA_MG03ACA200_Z63EKCUWF   ONLINE       0     0     0
            ata-SEAGATE_ST2000NM0033_Z1X4LW0D  ONLINE       0     0     0
            wwn-0x500003977c3800c3             ONLINE       0     0     0
            wwn-0x5000c5007b12fdb0             ONLINE       0     0     0
            wwn-0x500003977bd8025c             ONLINE       0     0     0
            wwn-0x500003977bd80267             ONLINE       0     0     0
            replacing-8                        DEGRADED     0     0     0
              7734569190555848310              UNAVAIL      0     0     0  was /dev/disk/by-id/wwn-0x5000c5007b12fd48-part1
              wwn-0x5000cca224d31cce           ONLINE       0     0     0  (resilvering)
            wwn-0x5000c5007b12fc4c             ONLINE       0     0     0

errors: No known data errors
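
Rather than re-running zpool status by hand, you can let it refresh automatically every few seconds (watch is part of the standard Debian tooling that Proxmox is built on):

watch -n 10 zpool status HDD_2Tb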

Now simply wait for the resilvering process to complete; once it finishes, your ZFS pool should be back in the ONLINE state.
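
When it finishes, the pool's state should read ONLINE again, and the short health summary should report that all pools are healthy:

zpool status -x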