IBM: Minidisk Cache

Note: As of z/VM 6.4, GA Nov 11 2016, z/VM cannot use XSTORE for any purpose. If you are running z/VM 6.4 or later you should ignore this article's XSTORE-related information.

Minidisk cache may provide performance and administrative benefits to z/VM systems. The amount of data that exists is much larger than the amount of data that is frequently used. This tends to be true for systems as well as individual virtual machines. The concept of caching builds off this behavior by keeping the frequently referenced data where it can be efficiently accessed. For minidisk cache, CP uses real or expanded storage or both as a cache for data from virtual I/O. By default, both real and expanded storage are used. Accessing electronic storage is much more efficient than accessing DASD.

The rest of this section is broken down into subsections. The first five provide background information on how minidisk cache works. The last subsection provides recommendations and guidelines for using minidisk cache more effectively.

Requirements

Eligibility for minidisk cache depends on the type of I/O and on the type of data or minidisk. The following restrictions apply:

Data must be referenced via Diagnose X'18', Diagnose X'A4', Diagnose X'A8', Diagnose X'250', *BLOCKIO, or SSCH.
Minidisk must be on 3380, 3390, or FBA DASD; or hardware that emulates those architectures.
Dedicated devices are not eligible.
FBA minidisks must be defined on page boundaries, and the size must be a multiple of 8 pages (64 512-byte blocks).
Minidisks created with Diagnose X'E4' are not eligible.
Minidisk caching is not supported for minidisks on shared DASD. There is no support for handshaking between z/VM systems for minidisk cache.
Minidisks greater than 32767 cylinders are not eligible for MDC.
When you SET SHARED ON for a device, minidisks on that device become ineligible for minidisk cache.
Minidisks that overlap CP space allocated for page, spool, directory, or temporary disk are not eligible. In the temporary disk case, this refers to a minidisk defined in the directory or with the DEFINE MDISK command. It does not refer to minidisks created with the DEFINE TEMPDISK command.
I/Os can be aborted from MDC processing for various reasons. These include, but are not limited to, the following:
- The record size is greater than 32767.
- The channel program includes a backward TIC that does not point to a search command.
- The channel program includes a format write (x'03') command.
- The channel program includes a sense CCW (x'04').
- The channel program includes a read sector CCW (x'22').
- The channel path mask for the real device and the channel path mask for the virtual device do not match.
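Two of the restrictions above are purely arithmetic and can be sketched as simple checks. The following is an illustration only, not CP's eligibility logic: the function names are hypothetical, and it assumes that "page boundaries" means the starting block number is a multiple of 8 (one 4096-byte page is 8 blocks of 512 bytes, consistent with 8 pages equalling 64 blocks).

```python
PAGE_BLOCKS = 8          # 512-byte FBA blocks per page (8 pages = 64 blocks)
FBA_SIZE_MULTIPLE = 64   # minidisk size must be a multiple of 8 pages
MAX_CYLINDERS = 32767    # larger minidisks are not eligible for MDC


def fba_extent_eligible(start_block: int, block_count: int) -> bool:
    """Check the FBA alignment and size rule (hypothetical helper).

    Assumption: a page boundary is a block number divisible by 8.
    """
    starts_on_page = start_block % PAGE_BLOCKS == 0
    size_ok = block_count > 0 and block_count % FBA_SIZE_MULTIPLE == 0
    return starts_on_page and size_ok


def cylinder_count_eligible(cylinders: int) -> bool:
    """Check the 32767-cylinder size limit (hypothetical helper)."""
    return 0 < cylinders <= MAX_CYLINDERS
```

These checks cover only the size and alignment rules; the remaining restrictions (I/O type, shared DASD, overlap with CP space, and so on) depend on system state that is not modeled here.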

Concepts

Minidisk cache is a data-in-memory technique that attempts to improve performance by decreasing the DASD I/O required for minidisk I/O. It trades increased use of real and expanded storage for decreased DASD I/O. Since paging to DASD increases as the amount of available real and expanded storage decreases, you should expect some increase in paging I/O when exploiting minidisk cache. An increase in paging is not necessarily bad. Looking at the total real DASD I/O rate and user state sampling can show whether the system is benefiting from minidisk cache. For example, if minidisk cache reduces the real DASD I/O rate by 300 I/Os per second and paging DASD I/O increases by 50 per second, the net reduction is 250 I/Os per second, which is a good result. Likewise, if user state sampling shows users waiting for virtual I/O much less than before, with only a small increase in page wait time, that is also a good result.
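The trade-off in the example above is a simple subtraction. A minimal sketch, using a hypothetical helper name and the figures from the text:

```python
def net_dasd_io_reduction(mdc_io_reduction, paging_io_increase):
    """Net change in real DASD I/Os per second.

    A positive result means the system performs fewer real DASD I/Os
    overall with minidisk cache; a negative result means the extra
    paging outweighs the I/O that the cache avoided.
    """
    return mdc_io_reduction - paging_io_increase


# Figures from the example in the text: 300 avoided, 50 added by paging.
net = net_dasd_io_reduction(300, 50)
```

The I/O rate alone is not the whole picture; as noted above, user state sampling should be examined alongside it.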

Minidisk Cache Arbiter

By default, the CP arbiter function determines how much real and expanded storage to give to minidisk cache and paging. The arbiter determines the division separately for real and expanded storage. The goal of the arbiter is to keep the average page life of a minidisk cache page equal to the average page life of a page used for paging. This is not measured directly, but estimated based on steal rates for the two usage types. Several commands are available to influence the arbiter. The amount of real or expanded storage used by minidisk cache can be set by the SET MDCACHE STORAGE or SET MDCACHE XSTORE commands. For both commands, minimum and maximum values can be set. When the minimum and maximum values are equal, the storage associated with minidisk cache is a fixed amount and the arbiter is effectively turned off. Unequal minimum and maximum values can be used to bound the cache size determined by the arbiter. An additional method of influencing the arbiter is setting a bias with the SET MDCACHE STORAGE BIAS or SET MDCACHE XSTORE BIAS commands. If you find the arbiter favoring paging or minidisk cache more than is optimal for your system, you may bias the arbiter for or against minidisk cache.
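The effect of minimum and maximum values on the arbiter's decision can be pictured as a clamp. The sketch below is a simplified model, not CP's algorithm: the function name is hypothetical, and treating the bias as a multiplier on the arbiter's proposed size is an assumption made purely for illustration.

```python
def bounded_cache_size(arbiter_proposal, minimum, maximum, bias=1.0):
    """Model how min/max settings bound the size the arbiter would pick.

    Assumption: the bias scales the arbiter's proposal. CP's actual use
    of the bias value is not described in this article.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    biased = arbiter_proposal * bias
    # Clamp the biased proposal into the [minimum, maximum] range.
    return max(minimum, min(maximum, biased))


# Equal minimum and maximum: the size is fixed regardless of the proposal.
fixed = bounded_cache_size(500, 256, 256)
# Unequal values: the proposal is only bounded, here by the maximum.
capped = bounded_cache_size(500, 100, 400)
```

In this model, a bias below one shrinks the proposal before the bounds are applied, which matches the intent of biasing against minidisk cache.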

Fair Share Limit

The z/VM minidisk cache is implemented as a system cache whose storage is shared among all the users on the system. To prevent any one user from negatively impacting other users, CP enforces a fair share limit. This is a limit on the amount of I/O that a user can insert into the cache over a fixed period of time. The limit is a dynamic value based on the amount of storage available and the number of users that want to make inserts into the cache. If a user reaches the fair share limit, the I/O still completes but the data is not inserted into the cache. The NOMDCFS operand on the OPTION statement of a user directory entry can be used to override the fair share limit.
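The behavior described above (the I/O always completes, but the insert is skipped once a user reaches the limit) can be modeled with a per-user counter. This is a toy model, not CP's implementation: the class and method names are hypothetical, the limit is passed in as a fixed number rather than computed dynamically, and the exempt set stands in for users with NOMDCFS.

```python
class FairShareTracker:
    """Toy model of a per-user insert limit over one time interval."""

    def __init__(self, limit, exempt_users=()):
        self.limit = limit                 # assumed fixed; CP's is dynamic
        self.exempt = set(exempt_users)    # stand-in for NOMDCFS users
        self.inserted = {}                 # user -> units inserted so far

    def should_insert(self, user, units):
        """Return True if the data may be inserted into the cache.

        The I/O itself completes either way; only the insert is skipped.
        """
        if user in self.exempt:
            return True
        used = self.inserted.get(user, 0)
        if used + units > self.limit:
            return False
        self.inserted[user] = used + units
        return True

    def start_new_interval(self):
        """Reset all counters at the start of the next time period."""
        self.inserted.clear()
```

For example, with a limit of 10, a user who has inserted 6 units is refused a second insert of 6 units in the same interval, while an exempt server machine is never refused.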

Minidisk Cache Enabling

By default all minidisks that meet the requirements for minidisk cache are enabled. There are three levels of control for enabling minidisk cache.

System level
The system level is the highest level of control and is managed by the SET MDCACHE SYSTEM command. Minidisk cache is either on or off at the system level. If it is off, then no minidisk caching is done regardless of other settings at lower levels. Minidisk cache is on at the system level by default.
Real device level
If minidisk cache is enabled at the system level, then the SET MDCACHE RDEV command, SET RDEVICE command, or RDEVICE configuration statement can be used to further enable or disable minidisk cache at the real device level. There are three settings at the real device level.

DFLTON
Default On enables minidisk caching for a real device, yet allows the ability to disable minidisk caching for a particular minidisk on that device. All eligible minidisks will be cached on this device except those that have caching off at the minidisk level. This is the default.
DFLTOFF
Default Off disables minidisk caching for a real device, yet allows the ability to enable caching for a particular minidisk on that device. No minidisks will be cached on this device except those that have caching on at the minidisk level.
OFF
OFF disables minidisk caching for a real device. It cannot be overridden at the minidisk level. No minidisks will be cached on this device.
Minidisk level
The last level of control is the minidisk level which is managed by the CP command SET MDCACHE MDISK or MINIOPT statements in the system directory. By default minidisk cache is enabled at the minidisk level. One can explicitly specify ON for minidisks on devices with a DFLTOFF setting or OFF for minidisks on devices with a DFLTON setting. The choice between track level caching and record level caching is also made at the minidisk level.

The ability also exists to flush data from minidisk cache for particular devices and minidisks with the SET MDCACHE RDEV FLUSH or SET MDCACHE MDISK FLUSH commands.
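The three levels of enabling control combine into a single decision, which can be sketched as follows. The function name is hypothetical, the minidisk is assumed to meet all eligibility requirements already, and `None` represents a minidisk with no explicit ON or OFF setting.

```python
from typing import Optional


def is_cached(system_on: bool, rdev_setting: str,
              mdisk_setting: Optional[str] = None) -> bool:
    """Decide whether an eligible minidisk is cached.

    rdev_setting is "DFLTON", "DFLTOFF", or "OFF".
    mdisk_setting is "ON", "OFF", or None (no explicit setting).
    """
    if not system_on:
        return False                     # system level overrides everything
    if rdev_setting == "OFF":
        return False                     # cannot be overridden per minidisk
    if rdev_setting == "DFLTON":
        return mdisk_setting != "OFF"    # cached unless explicitly off
    if rdev_setting == "DFLTOFF":
        return mdisk_setting == "ON"     # cached only if explicitly on
    raise ValueError(f"unknown real device setting: {rdev_setting!r}")
```

The ordering of the checks mirrors the hierarchy: a lower level can refine the decision only when every higher level permits caching.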

Guidelines

Here are some suggestions that will help you use minidisk cache more effectively:

Ensure that the paging configuration is appropriate.
As discussed above, the use of minidisk cache usually results in an increase in paging and the arbiter assumes the paging configuration is appropriate. There can be significant performance degradation if minidisk cache is used on a system that has not had the paging configuration tuned. You want to make sure that there is sufficient paging space allocated. z/VM likes large paging allocations in order to effectively use block paging. If the block size of page reads as reported by monitor is less than 10, there is probably not enough DASD space allocated for paging. Paging space should also be isolated so that the seldom-ending channel program (start subchannel/resume subchannel) technique is effective. Do not mix paging space with other types of data, including spool space. Balance the paging I/O over multiple volumes where appropriate.
When using minidisk cache, always have some real storage defined for minidisk cache.
In systems where the I/O buffers in virtual storage of the user are not page aligned, there can be a significant performance degradation if you do not have some real storage allocated for minidisk cache.
Consider biasing the arbiter against minidisk cache if the system is very rich in storage.
Setting a value less than one on the BIAS option of the SET MDCACHE command will bias against minidisk cache. Measurement results to date suggest that in environments that show no storage constraint, the arbiter sometimes uses more storage for minidisk cache than is optimal. The more storage constrained a system is, the better the arbiter tends to work. A bias in minidisk cache size can also be achieved by using SET MDCACHE to set a maximum size.
When planning what data should be enabled for minidisk cache, it is generally better to start with everything enabled and then selectively disable minidisks or volumes.
- Volumes with data that are physically shared between VM systems should be disabled for minidisk cache. There is no handshaking between VM systems to ensure that changes to a minidisk are reflected in the minidisk cache of other systems. This also applies to sharing minidisks between first and second level systems.
- Minidisks where the majority of I/Os are write I/Os should be disabled for minidisk cache. Examples of this include the log disks for Shared File System.
- Minidisks with data that is only read once should be disabled for minidisk cache. The real benefit of minidisk cache is only seen for data that is referenced multiple times.
- Disable minidisk cache for volumes that are mapped to VM data spaces. If data is being accessed by the mapped mdisk feature of VM data spaces, there is no additional benefit from minidisk cache. There can be significant overhead in processing to ensure data integrity when both minidisk cache and mapped mdisks are used for the same minidisk.
- Disable minidisk cache for minidisks that are the target of FlashCopy requests. This includes FlashCopy requests that may be initiated by virtual machines, such as the HSM component of z/OS.
- You might want to temporarily disable minidisk cache for backup or scan routines that reference all the data on a disk, but just once. This can be done with the SET MDCACHE INSERT or SET MDCACHE MDISK CP commands.
Disable the minidisk cache fair share limit for key users.
Server machines or guests that do I/O on behalf of many users should not be subject to the minidisk cache fair share limit. Use the NOMDCFS operand on the OPTION statement in the user entry of the system directory to turn off the fair share limit.
Prepare for minidisk caching on devices shared between first and second level systems.
Care should be used for minidisks shared with first level systems so that all changes are reflected to the second level system. For example, a read-write first level minidisk is shared with a second level system as a read-only minidisk. The minidisk is cached by the second level system, and then a change is made to the minidisk by the first level system. The change will not be reflected on the second level system if the data had been in the second level system's cache. In order to see the changes, one must purge the cache on the second level system with the SET MDCACHE MDISK FLUSH command and reaccess the minidisk.
Minidisks residing on emulated-FBA-on-SCSI devices are eligible for MDC. Because the processor cost per I/O is greater for emulated FBA devices than for traditional disk devices (e.g., ECKD), MDC is a particularly important tool for improving I/O performance of emulated FBA. More information about the performance of emulated FBA and the effect of MDC on emulated FBA can be found in the z/VM Performance Report.
With VM/ESA 2.2.0 and later, minidisks defined via the DEVNO clause of the CP directory MDISK statement are in fact eligible for MDC.
Minidisks being used for Linux file systems should be configured with MDC OFF unless they are almost exclusively read-only. This especially means that minidisks holding Linux swap extents should be run with MDC OFF. The read fraction is not high enough to pay off the processor penalty paid for updating MDC on reads. For more information about Linux swapping, check our Linux tips page.
