Sharing minidisks in Read-Write mode

A lot of folks are confused about multiwrite (MW) minidisks on z/VM. Here is my effort to give a concise (but not TOO concise) explanation.

It is generally the case that a particular piece of storage (ECKD, FBA, SCSI) is made available to only one host at a time, since most file systems cannot tolerate two hosts updating them at the same time, or even one host reading while another is writing.

In the physical world we control this with physical (cabling) or logical (zoning, IOCDS) forms of separation. In z/VM this is controlled by the link mode of a minidisk. The link mode determines which set of rules the Control Program (CP) will use to determine whether a guest gets the disk R/O, R/W, or not at all.

CMS users typically have a few R/W disks for private use (e.g. 191), and a lot of R/O disks shared with other virtual machines (e.g. 190). CMS can actually tolerate ALL disks being R/O. You can't save anything to disk, but you can run programs and see the results in your virtual printer or on the console. So, CMS users most often have link modes RR and MR.

  • RR - Always give the virtual machine a read-only link. If someone else has an exclusive link to the disk (EW or ER), the LINK command will fail.
  • MR - Gives the virtual machine a R/W link unless someone else has it linked in R/W or stable mode, in which case they will get a read-only link.

Traditional IBM System z operating systems (z/OS, z/VM, z/TPF, z/VSE) were all invented in a time when dasd volumes had physical R/O-R/W switches on them and so they react fairly well to a disk being in R/O mode.

Linux, on the other hand, isn't so forgiving. Unless you specifically mount a filesystem R/O (-o ro), Linux expects the disk to be R/W. It will generate errors if a disk is R/O and the filesystem is mounted R/W. For this reason, a Linux virtual machine will have its disks linked RR for filesystems mounted R/O, and M for filesystems mounted R/W. (Yes, M, not MR.)

  • M - Gives the virtual machine R/W access unless someone else has an R/W or exclusive link. In that event, the LINK fails. This ensures that the guest gets a more precise error ("device does not exist") rather than I/O errors caused by the disk being read-only.
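
To make the three link modes above concrete, here is a minimal Python sketch of the decision as I've described it. It is a toy model, not CP's actual logic. The names Held and resolve_link are made up for illustration, and only RR, MR and M are covered. Two cases are my own assumptions by analogy rather than rules stated above: that an exclusive link also blocks MR, and that a stable link also blocks M.

```python
from enum import Enum


class Held(Enum):
    """Kinds of links other virtual machines may already hold (simplified)."""
    READ = "read"
    WRITE = "write"
    STABLE = "stable"        # e.g. SR, SW, SM
    EXCLUSIVE = "exclusive"  # e.g. ER, EW


def resolve_link(mode, others):
    """Return 'R/O', 'R/W', or None (the LINK fails) for a requested mode.

    `others` is the set of Held values for links held by other guests.
    """
    exclusive = Held.EXCLUSIVE in others
    write_blocked = Held.WRITE in others or Held.STABLE in others

    if mode == "RR":
        # Always read-only; fails only against an exclusive link.
        return None if exclusive else "R/O"
    if mode == "MR":
        # Assumption: an exclusive link blocks MR as well.
        if exclusive:
            return None
        # Falls back to read-only when write access is not available.
        return "R/O" if write_blocked else "R/W"
    if mode == "M":
        # No fallback: either R/W or the LINK fails.
        # Assumption: a stable link blocks M just as an R/W link does.
        return None if (write_blocked or exclusive) else "R/W"
    raise ValueError(f"link mode not modelled: {mode}")


if __name__ == "__main__":
    for mode in ("RR", "MR", "M"):
        print(mode, resolve_link(mode, {Held.WRITE}))
```

The difference between MR and M shows up in the last two branches: with another writer present, MR quietly degrades to R/O while M returns nothing at all, which is the behavior a Linux guest wants.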

From a RACF perspective, RR requires READ access, MR and M require CONTROL access.

If you need to let two or more virtual machines have concurrent read-write access to a minidisk, you have to enable those virtual machines to exercise the locking controls present on the disk.

RESERVE/RELEASE (R/R) is a reference to the device commands that are used by operating systems to control whether or not the OS has exclusive access to a DASD volume. Other OSes are locked out, being unable to read from or write to the volume. It is possible to "break" a Reserve, but it's only done in an emergency where a human confirms that it's ok.

If you're familiar with tape, RESERVE/RELEASE performs a function that's similar to ASSIGN/UNASSIGN.

In this case multiple virtual machines will have R/W access to the minidisk. This is denoted by the use of link mode MW (multiwrite). In RACF this is enabled by giving the guest ALTER authority to the minidisk.
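
The RACF side of this can be summarized as a small lookup. The Python sketch below only encodes the link-mode-to-authority pairs mentioned in this post, plus the standard ordering of RACF access levels. The names LEVELS, REQUIRED and may_link are hypothetical, and this is not how RACF itself is queried.

```python
# RACF access levels in increasing order of authority.
LEVELS = ["NONE", "READ", "UPDATE", "CONTROL", "ALTER"]

# Minimum authority per link mode, as stated in this post.
REQUIRED = {
    "RR": "READ",
    "MR": "CONTROL",
    "M": "CONTROL",
    "MW": "ALTER",
}


def may_link(mode, granted):
    """Return True if the `granted` access level is enough for `mode`."""
    return LEVELS.index(granted) >= LEVELS.index(REQUIRED[mode])


if __name__ == "__main__":
    for mode in REQUIRED:
        print(mode, "with CONTROL:", may_link(mode, "CONTROL"))
```

Note that a guest with CONTROL can link M or MR but not MW, which is the point: multiwrite has to be granted deliberately.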

If an MDISK statement has MWV on it, CP will allow the guest to issue RESERVE/RELEASE commands to the minidisk. The reason the "V" is optional is to allow simulation of the lack of the multichannel switch feature available for control units of yesteryear. Since the introduction of the IBM 3990 storage controller, however, the multichannel switch has been a standard part of the configuration, with the attendant RESERVE/RELEASE function.

If the "V" is not specified and the guest issues RESERVE or RELEASE, an I/O error (unit check) will be generated that indicates the command was rejected.

In my opinion, all MDISKs defined with MW should have MWV. Like many things in life, it's better to have it and not need it, than need it and not have it. But such a default would create problems for shared minidisks in a Single System Image (SSI) cluster. Because virtual R/R support doesn't span members of the cluster, there is no way for CP to assure serialization of the minidisks across the cluster. Consequently V cannot be the default, else even RR sharing would become impossible. (If you turn it on, you'll discover that only the first member to create the minidisk will succeed.)

Remember that the "V" goes on the MDISK statement, not the LINK command. So, anyone with read/write access to the minidisk can use RESERVE/RELEASE.

If you're sharing a DASD volume across LPARs, the RESERVE/RELEASE is only sent to the physical dasd volume if (a) the MDISK is a fullpack, AND (b) the real volume is defined as SHARED, whether via the RDEVICE statement in SYSTEM CONFIG or the CP SET SHARED ON command. Otherwise CP enforces reserves only at the minidisk level within the z/VM system.

EDEVICEs emulate an IBM 9336 dasd device. The 9336 did not support RESERVE/RELEASE and so, even if you define an MDISK on an EDEVICE with MWV, attempts to use RESERVE/RELEASE will be rejected by CP. Implementation of RESERVE/RELEASE would require that CP convert them to one or more SCSI PERSISTENT RESERVE functions.
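
Putting the last few paragraphs together, what happens to a guest's RESERVE depends on four yes/no facts. The Python sketch below is just my summary of those rules as a decision function. The name reserve_outcome, its parameter names and the result strings are invented for illustration and do not correspond to any CP interface.

```python
def reserve_outcome(mwv, edevice=False, fullpack=False, real_shared=False):
    """Describe what happens when a guest issues RESERVE to a minidisk.

    mwv         -- the MDISK statement specifies MWV (the "V" is present)
    edevice     -- the minidisk lives on an EDEVICE (emulated 9336)
    fullpack    -- the minidisk is a fullpack minidisk
    real_shared -- the real volume is defined as SHARED
    """
    if edevice:
        # The emulated 9336 has no RESERVE/RELEASE, even with MWV.
        return "rejected"
    if not mwv:
        # Without the "V" the guest gets a command reject (unit check).
        return "rejected"
    if fullpack and real_shared:
        # Only in this case does the reserve reach the physical volume.
        return "sent to real volume"
    # Otherwise CP serializes guests on this z/VM system only.
    return "enforced within this z/VM system"


if __name__ == "__main__":
    print(reserve_outcome(mwv=True))
    print(reserve_outcome(mwv=True, fullpack=True, real_shared=True))
    print(reserve_outcome(mwv=False))
```

The middle case is the only one that protects you from another LPAR. Everything else is either local to one z/VM system or an error.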

All that said, not all disk sharing mechanisms depend on such exclusive access to the storage device. A lock manager may be used instead. In fact, that's how Linux manages concurrent access to the same storage device for the filesystems that support it. In z/OS, RESERVE/RELEASE is issued to a volume only if GRS is configured to do so, such as when a RACF database is shared between z/VM and z/OS. This is why RACF databases on z/VM need to be on fullpack minidisks all by themselves on separate volumes. You don't want RESERVEs held by another system to interfere with access to non-RACF data.

CMS does not support any form of filesharing on minidisks. It doesn't issue any RESERVEs and it doesn't use a lock manager. All it takes to destroy a CMS minidisk is for two people to have R/W access, for one of them to change the disk, and for the other to issue the CMS RELEASE command. CMS doesn't detect disk corruption until it starts to read from the disk. The type of error you get is unpredictable. In the worst case, the directory has valid pointers in it, but they point to the wrong things. So you can open a file and read it, but it can contain garbage or some other file's contents. You never get any filesystem errors, but your application is reading garbage.

If you should ever find that you have accidentally linked a disk R/W from two CMS users, very carefully issue #CP DETACH on both users. Do NOT issue the RELEASE command. Then run the MDCHECK utility (see README SAMPCMS on MAINTxxx 493).


The information provided, and views expressed on this site are my own and do not represent the IBM Corporation.