General FAQs - Solaris

VHCI: devices not recognised as multi-path candidates for Solaris/OmniOS and derivatives

With the Solaris family of operating systems, the virtual host controller interconnect (VHCI) driver enables a device with multiple paths to be represented as a single device instance rather than as one instance per physical path. Devices under VHCI control appear in format listings with a device path starting /scsi_vhci, as in the following example:


# format
Searching for drives...done

AVAILABLE DISK SELECTIONS:
       0. c0t5000C500237B2E53d0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500237b2e53
       1. c0t5000C5002385CE4Fd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c5002385ce4f
       2. c0t5000C500238478ABd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500238478ab
       3. c1t5000C50013047C55d0 <HP-DG0300BALVP-HPD3-279.40GB>
          /pci@71,0/pci8086,2f04@2/pci1028,1f4f@0/iport@1/disk@w5000c50013047c55,0
       4. c2t5000C5000F81EDB1d0 <HP-DG0300BALVP-HPD4-279.40GB>
          /pci@71,0/pci8086,2f08@3/pci1028,1f4f@0/iport@20/disk@w5000c5000f81edb1,0

However, two devices in the above example are not under the control of the VHCI driver, as shown by their device paths starting /pci rather than /scsi_vhci. To resolve this, the VHCI driver must be made aware that these drives can be multipathed. This is accomplished by adding entries to the VHCI configuration file /kernel/drv/scsi_vhci.conf: for each distinct vendor/model combination, a failover module must be assigned to the candidate SCSI target device through the scsi-vhci-failover-override property.

The device vendor and model can be identified from the output of the format command. Taking the entry <HP-DG0300BALVP-HPD4-279.40GB> from the above example, the first field identifies the manufacturer, HP, and the next field identifies the model number, DG0300BALVP. These identifiers can then be added to the VHCI configuration file /kernel/drv/scsi_vhci.conf as follows (the syntax for more than one entry is shown here for reference):


scsi-vhci-failover-override =
    "HP      DG0300BALVP", "f_sym",
    "HP      DG0300FARVV", "f_sym";
#END: FAILOVER_MODULE_BLOCK (DO NOT MOVE OR DELETE)

Please note that spacing is significant in the vendor declaration: the vendor must be padded with spaces to eight characters and immediately followed by the model number (which does not require any padding). Once the entries have been added, the host machine must be rebooted for them to take effect. In the example above, once the configuration has been updated and the host rebooted, the output of format returns:


AVAILABLE DISK SELECTIONS:
       0. c0t5000C500237B2E53d0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500237b2e53
       1. c0t5000C5002385CE4Fd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c5002385ce4f
       2. c0t5000C500238478ABd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500238478ab
       3. c1t5000C50013047C55d0 <HP-DG0300BALVP-HPD3-279.40GB>
          /scsi_vhci/disk@g5000c50013047c57
       4. c2t5000C5000F81EDB1d0 <HP-DG0300BALVP-HPD4-279.40GB>
          /scsi_vhci/disk@g5000c500238478ab

The drives have now been successfully configured for multi-pathing via the VHCI driver.
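
The vendor padding rule described above is easy to get wrong by hand, so the following is a minimal sketch of how an entry could be generated from a format label. The function name vhci_entry is hypothetical, and the sketch assumes the label has the form <VENDOR-MODEL-REVISION-SIZE> with no hyphens inside the vendor or model fields. The f_sym module is simply the one used in the example above; other devices may need a different failover module.

```shell
#!/bin/sh
# Hypothetical helper: turn a format label such as
# <HP-DG0300BALVP-HPD4-279.40GB> into a scsi-vhci-failover-override pair.
# Assumes neither the vendor nor the model contains a '-'.
vhci_entry() {
    label=${1#"<"}          # strip the leading '<'
    label=${label%">"}      # strip the trailing '>'
    vendor=${label%%-*}     # text before the first '-'
    rest=${label#*-}        # text after the first '-'
    model=${rest%%-*}       # text up to the next '-'
    # %-8s left-justifies the vendor and pads it with spaces to eight characters.
    printf '"%-8s%s", "f_sym"\n' "$vendor" "$model"
}

vhci_entry '<HP-DG0300BALVP-HPD4-279.40GB>'
# prints: "HP      DG0300BALVP", "f_sym"
```

The printed pair still has to be placed inside the scsi-vhci-failover-override property by hand, with commas between entries and a semicolon after the last one.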

Reservation drives are getting 'Failed to power up' errors

When a ZFS service is running on a node in the cluster, that node will hold SCSI reservations on some of the zpool disks to prevent the other node from being able to access those disks. With some disk models, when the passive node reboots, it will no longer be able to access those reservation disks and will get the message:

Device <path-to-device> failed to power up

Because the disks fail to power up, that node will then always encounter I/O errors from those disks.

To resolve this issue, add an entry to /kernel/drv/sd.conf to disable the bootup power check for a specific disk model. The entry should be similar to:


sd-config-list= "SEAGATE ST2000NM0001","power-condition:false";

or, if there are multiple disk models showing this behaviour:

sd-config-list= "SEAGATE ST2000NM0001","power-condition:false",
                "SEAGATE ST32000644NS","power-condition:false";

After sd.conf has been modified on both nodes, there should be no 'failed to power up' error on the next bootup and the passive node should be able to access the disks as expected (although it will still get 'reservation conflict' because the disks are still reserved).
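
When several disk models are affected, the list syntax shown above (commas between entries, a semicolon after the last) can be generated rather than typed. The following is a minimal sketch only: the function name sd_power_list is hypothetical, it expects its arguments as vendor/product pairs, and it assumes the vendor is padded to eight characters, which is consistent with the SEAGATE entries above but is not stated explicitly in this FAQ.

```shell
#!/bin/sh
# Hypothetical helper: print an sd-config-list property that sets
# power-condition:false for each VENDOR PRODUCT pair given as arguments.
# Assumes an even number of arguments (vendor, product, vendor, product, ...).
sd_power_list() {
    prefix='sd-config-list= '
    while [ "$#" -ge 2 ]; do
        # Entries are comma-separated; the final one closes the property with ';'.
        if [ "$#" -gt 2 ]; then end=','; else end=';'; fi
        # Assumption: the vendor is left-justified and padded to eight characters.
        printf '%s"%-8s%s","power-condition:false"%s\n' "$prefix" "$1" "$2" "$end"
        prefix='                '   # align continuation lines under the first entry
        shift 2
    done
}

sd_power_list SEAGATE ST2000NM0001 SEAGATE ST32000644NS
```

The output can be compared against the existing sd.conf before the property is added on both nodes.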

RSF-1 Services not starting due to missing libc.so.1

When installing from scratch (clean OS install), RSF-1 services may fail to start with the following error:


Starting RSF-1 REST Service...
ld.so.1: python3.9: fatal: libc.so.1: version 'ILLUMOS_0.39' not found (required by file /opt/HAC/Python/bin/python3.9)
ld.so.1: python3.9: fatal: libc.so.1: open failed: No such file or directory
[ Jun 14 08:40:50 Method "start" exited with status 0. ]
[ Jun 14 08:40:50 Stopping because all processes in service exited. ]
[ Jun 14 08:40:50 Executing stop method ("/lib/svc/method/svc-rsf-rest stop"). ]
[ Jun 14 08:40:50 Method "stop" exited with status 0. ]
[ Jun 14 08:40:50 Restarting too quickly, changing state to maintenance. ]

This occurs when libc.so.1 is out of date. To resolve it, run pkg update to install the up-to-date libraries, then reboot.
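
To confirm that a service log shows this particular failure, the required libc version can be pulled out of the ld.so.1 message. The following is a minimal sketch; the function name missing_libc_version is hypothetical, and it assumes the error text matches the format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the libc version
# named in the first "version '...' not found" error from ld.so.1.
missing_libc_version() {
    sed -n "s/.*libc\.so\.1: version '\([^']*\)' not found.*/\1/p" | head -n 1
}

printf '%s\n' \
    "ld.so.1: python3.9: fatal: libc.so.1: version 'ILLUMOS_0.39' not found (required by file /opt/HAC/Python/bin/python3.9)" \
    | missing_libc_version
# prints: ILLUMOS_0.39
```

On illumos-based systems, pvs -d /lib/libc.so.1 should list the versions the installed library defines, which can be compared with the name printed here before and after the update.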