

Minidisk Cache

Last updated 9 November 2005


Minidisk cache may provide performance and administrative benefits to VM/ESA systems. The amount of data that exists is much larger than the amount of data that is frequently used. This tends to be true for systems as well as individual virtual machines. The concept of caching builds off this behavior by keeping the frequently referenced data where it can be efficiently accessed. For minidisk cache, CP uses real or expanded storage or both as a cache for data from virtual I/O. By default, both real and expanded storage are used. Accessing electronic storage is much more efficient than accessing DASD.

The rest of this section is broken down into subsections. The first five provide background information on how minidisk cache works. The last subsections provide recommendations and guidelines for using minidisk cache more effectively.

Requirements

Eligibility for minidisk cache depends on the type of I/O and on the type of data or minidisk. The following restrictions apply:

  • Data must be referenced via Diagnose X'18', Diagnose X'A4', Diagnose X'A8', Diagnose X'250', *BLOCKIO, SSCH, SIO, or SIOF.

  • Minidisk must be on 3380, 3390, 9345, or FBA DASD.

  • Dedicated devices are not eligible.

  • FBA minidisks must be defined on page boundaries, and the size must be a multiple of 8 pages (64 512-byte blocks).

  • Minidisks created with Diagnose X'E4' are not eligible.

  • Minidisk caching is not supported for minidisks on shared DASD. There is no support for handshaking between VM/ESA systems for minidisk cache.

  • Minidisks greater than 32767 cylinders are not eligible for MDC.

  • Fullpack minidisks for a V=R machine with SET CCWTRAN OFF, which is the default, become ineligible for minidisk caching at the time the first channel program to the device bypasses CCW translation.

  • When you SET SHARED ON for a device, minidisks on that device become ineligible for minidisk cache.

  • Minidisks that overlap CP space allocated for page, spool, directory, or temporary disk are not eligible. In the temporary disk case, this refers to a minidisk defined in the directory or with the DEFINE MDISK command. It does not refer to minidisks created with the DEFINE TEMPDISK command.

  • I/Os can be aborted from MDC processing for various reasons. These include the following (the list is not exhaustive):
    • The record size is greater than 32767.
    • The channel program includes a backward TIC that does not point to a search command.
    • The channel program includes a format write (X'03') command.
    • The channel program includes a sense CCW (X'04').
    • The channel program includes a read sector CCW (X'22').
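The size and alignment rules in the list above can be expressed as a small check. The following Python sketch is illustrative only and is not CP code. The function names are hypothetical, and it assumes that "defined on page boundaries" means the starting block number is a multiple of 8 (one 4096-byte page is eight 512-byte blocks). It covers only the FBA alignment rule and the 32767-cylinder limit, not the other restrictions.

```python
BLOCK_SIZE = 512                              # bytes per FBA block
PAGE_SIZE = 4096                              # bytes per page
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE     # 8 blocks per page
SIZE_MULTIPLE_BLOCKS = 8 * BLOCKS_PER_PAGE    # 8 pages = 64 blocks
MAX_CYLINDERS = 32767                         # larger minidisks are not eligible


def fba_minidisk_eligible(start_block: int, size_blocks: int) -> bool:
    """Check the FBA alignment and size rules (hypothetical helper).

    Assumption: the page-boundary rule applies to the starting block.
    """
    on_page_boundary = start_block % BLOCKS_PER_PAGE == 0
    size_ok = size_blocks > 0 and size_blocks % SIZE_MULTIPLE_BLOCKS == 0
    return on_page_boundary and size_ok


def cylinder_count_eligible(cylinders: int) -> bool:
    """Check the 32767-cylinder limit (hypothetical helper)."""
    return 0 < cylinders <= MAX_CYLINDERS


if __name__ == "__main__":
    print(fba_minidisk_eligible(8, 128))    # aligned start, size is 2 x 64 blocks
    print(fba_minidisk_eligible(4, 64))     # start is not on a page boundary
    print(cylinder_count_eligible(32768))   # one cylinder over the limit
```

A minidisk that passes these checks can still be ineligible for any of the other reasons listed above, such as residing on shared DASD or overlapping CP space.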

Concepts

Minidisk cache is a data-in-memory technique that attempts to improve performance by decreasing the DASD I/O required for minidisk I/O. It trades increased use of real and expanded storage for decreased DASD I/O. Because paging to DASD increases as the amount of available real and expanded storage decreases, you should expect some increase in paging I/O when exploiting minidisk cache. An increase in paging is not necessarily bad. Looking at the total real DASD I/O rate and user state sampling can show whether the system is benefiting from minidisk cache. For example, if minidisk cache reduces the real DASD I/O rate by 300 I/Os per second and paging DASD I/O increases by 50 I/Os per second, the net reduction is 250 I/Os per second, which is a gain for the system. Similarly, if user state sampling shows that users wait for virtual I/O much less than before, with only a small increase in page wait time, the system is benefiting.
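The trade-off described above comes down to simple arithmetic on two measured rates. The Python sketch below uses a hypothetical function name. In practice the two inputs would come from your performance monitor, measured before and after enabling minidisk cache.

```python
def net_dasd_io_reduction(mdc_io_reduction: float, paging_io_increase: float) -> float:
    """Net change in real DASD I/Os per second (hypothetical helper).

    A positive result means the system does less real DASD I/O overall.
    """
    return mdc_io_reduction - paging_io_increase


if __name__ == "__main__":
    # Figures from the example in the text: 300 I/Os per second saved by
    # minidisk cache, 50 I/Os per second added by extra paging.
    print(net_dasd_io_reduction(300, 50))
```

A negative result would suggest that the storage given to minidisk cache is costing more in paging than it saves in minidisk I/O.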

Minidisk Cache Arbiter

By default, the CP arbiter function determines how much real and expanded storage to give to minidisk cache and paging. The arbiter determines the division separately for real and expanded storage. The goal of the arbiter is to keep the average page life of a minidisk cache page equal to the average page life of a page used for paging. This is not measured directly, but estimated based on steal rates for the two usage types. Several commands are available to influence the arbiter. The amount of real or expanded storage used by minidisk cache can be set by the SET MDCACHE STORAGE or SET MDCACHE XSTORE commands. For both commands, minimum and maximum values can be set. When the minimum and maximum values are equal, the storage associated with minidisk cache is a fixed amount and the arbiter is effectively turned off. Unequal minimum and maximum values can be used to bound the cache size determined by the arbiter. An additional method of influencing the arbiter is setting a bias with the SET MDCACHE STORAGE BIAS or SET MDCACHE XSTORE BIAS commands. If you find the arbiter favoring paging or minidisk cache more than is optimal for your system, you may bias the arbiter for or against minidisk cache.
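The effect of minimum and maximum values on the arbiter can be pictured as a clamp. The Python sketch below is an illustration, not a description of how CP implements the arbiter. The function name and the unit (megabytes) are assumptions, and the bias setting is deliberately left out because the text does not say how it is applied.

```python
def bounded_cache_size(arbiter_target_mb: int, minimum_mb: int, maximum_mb: int) -> int:
    """Clamp the size the arbiter would choose to the configured bounds.

    Hypothetical model: when minimum equals maximum, the result is fixed
    and the arbiter's preference has no effect.
    """
    if minimum_mb > maximum_mb:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum_mb, min(arbiter_target_mb, maximum_mb))


if __name__ == "__main__":
    print(bounded_cache_size(500, 100, 300))   # target above the maximum
    print(bounded_cache_size(50, 100, 300))    # target below the minimum
    print(bounded_cache_size(500, 256, 256))   # equal bounds: fixed size
```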

Fair Share Limit

The VM/ESA minidisk cache is implemented with a system cache where the storage for the cache is shared with all the users on the system. In order to prevent any one user from negatively impacting other users on the system, CP enforces a fair share limit. This is a limit on the amount of I/O that a user can insert into the cache over a fixed period of time. The limit is a dynamic value based on the amount of storage available and the number of users that want to make inserts to the data. If a user reaches the fair share limit, the I/O still completes but the data is not inserted into the cache. The NOMDCFS operand on the OPTION statement of a user directory entry can be used to override the fair share limit.

Minidisk Cache Enabling

By default all minidisks that meet the requirements for minidisk cache are enabled. There are three levels of control for enabling minidisk cache.

  1. System level

    The system level is the highest level of control and is managed by the SET MDCACHE SYSTEM command. Minidisk cache is either on or off at the system level. If it is off, then no minidisk caching is done regardless of other settings at lower levels. Minidisk cache is on at the system level by default.

  2. Real device level

    If minidisk cache is enabled at the system level, then the SET MDCACHE RDEV command, SET RDEVICE command, or RDEVICE configuration statement can be used to further enable or disable minidisk cache at the real device level. There are three settings at the real device level.

    DFLTON
    Default On enables minidisk caching for a real device, yet allows the ability to disable minidisk caching for a particular minidisk on that device. All eligible minidisks will be cached on this device except those that have caching off at the minidisk level. This is the default.
    DFLTOFF
    Default Off disables minidisk caching for a real device, yet allows the ability to enable caching for a particular minidisk on that device. No minidisks will be cached on this device except those that have caching on at the minidisk level.
    OFF
    OFF disables minidisk caching for a real device. It cannot be overridden at the minidisk level. No minidisks will be cached on this device.

  3. Minidisk level

    The last level of control is the minidisk level, which is managed by the CP command SET MDCACHE MDISK or by MINIOPT statements in the system directory. By default, minidisk cache is enabled at the minidisk level. You can explicitly specify ON for minidisks on devices with a DFLTOFF setting, or OFF for minidisks on devices with a DFLTON setting. The choice between track level caching and record level caching is also made at the minidisk level. Support for this was introduced in VM/ESA 2.3.0, and in VM/ESA 2.1.0 and 2.2.0 with APAR VM61045. The APAR version supports only the CP SET command changes for record level minidisk caching.

The ability also exists to flush data from minidisk cache for particular devices and minidisks with the SET MDCACHE RDEV FLUSH or SET MDCACHE MDISK FLUSH commands.
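The three levels of control described above combine into a single decision for each minidisk. The Python sketch below models that decision and is not CP code. The function name is hypothetical, and it assumes that a minidisk with no explicit setting is represented as None and simply follows the real device default. Eligibility (see Requirements) is a separate question that the sketch does not cover.

```python
from typing import Optional


def caching_enabled(system_on: bool,
                    rdev_setting: str = "DFLTON",
                    mdisk_setting: Optional[str] = None) -> bool:
    """Resolve the system, real device, and minidisk settings (hypothetical model).

    rdev_setting:  "DFLTON", "DFLTOFF", or "OFF"
    mdisk_setting: "ON", "OFF", or None when nothing is set explicitly
    """
    if rdev_setting not in ("DFLTON", "DFLTOFF", "OFF"):
        raise ValueError("unknown real device setting: %r" % (rdev_setting,))
    if mdisk_setting not in (None, "ON", "OFF"):
        raise ValueError("unknown minidisk setting: %r" % (mdisk_setting,))

    if not system_on:
        return False                    # system level off overrides everything
    if rdev_setting == "OFF":
        return False                    # cannot be overridden at minidisk level
    if rdev_setting == "DFLTON":
        return mdisk_setting != "OFF"   # cached unless explicitly turned off
    return mdisk_setting == "ON"        # DFLTOFF: cached only if explicitly on


if __name__ == "__main__":
    print(caching_enabled(True))                    # all defaults
    print(caching_enabled(True, "DFLTOFF"))         # device default off
    print(caching_enabled(True, "DFLTOFF", "ON"))   # minidisk opts in
    print(caching_enabled(True, "OFF", "ON"))       # device OFF wins
```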

Guidelines

Here are some suggestions that will help you use minidisk cache more effectively:

  1. Ensure that the paging configuration is appropriate.

    As discussed above, the use of minidisk cache usually results in an increase in paging and the arbiter assumes the paging configuration is appropriate. There can be significant performance degradation if minidisk cache is used on a system that has not had the paging configuration tuned. You want to make sure that there is sufficient paging space allocated. VM/ESA likes large paging allocations in order to effectively use block paging. If the block size of page reads as reported by monitor is less than 10, there is probably not enough DASD space allocated for paging. Paging space should also be isolated so that the seldom-ending channel program (start subchannel/resume subchannel) technique is effective. Do not mix paging space with other types of data, including spool space. Balance the paging I/O over multiple volumes where appropriate.

  2. When using minidisk cache, always have some real storage defined for minidisk cache.

    In systems where the I/O buffers in virtual storage of the user are not page aligned, there can be a significant performance degradation if you do not have some real storage allocated for minidisk cache.

  3. Reconfigure expanded storage as main storage if it is specifically for minidisk caching.

    Some processors allow a portion of main storage to be configured as expanded storage. Installations that configure storage as expanded storage in order to do minidisk caching should consider reconfiguring this expanded storage as main storage. However, there can be some advantage to keeping some storage configured as expanded storage to create a paging hierarchy.

  4. Consider biasing the arbiter against minidisk cache if the system is very rich in storage.

    Setting a value less than one on the BIAS option of the SET MDCACHE command biases the arbiter against minidisk cache. Measurement results to date suggest that in environments that show no storage constraint, the arbiter sometimes uses more storage for minidisk cache than is optimal. The more storage constrained a system is, the better the arbiter tends to work. The minidisk cache size can also be limited by using SET MDCACHE to set a maximum size.

  5. When planning what data should be enabled for minidisk cache, it is generally better to start with everything enabled and then selectively disable minidisks or volumes.

    • Volumes with data that are physically shared between VM systems should be disabled for minidisk cache. There is no handshaking between VM systems to ensure that changes to a minidisk are reflected in the minidisk cache of other systems. This also applies to sharing minidisks between first and second level systems.

    • Minidisks where the majority of I/Os are write I/Os should be disabled for minidisk cache. Examples of this include the log disks for Shared File System.

    • Minidisks with data that is only read once should be disabled for minidisk cache. The real benefit of minidisk cache is only seen for data that is referenced multiple times.

    • Disable minidisk cache for volumes that are mapped to VM data spaces. If data is being accessed by the mapped mdisk feature of VM data spaces, there is no additional benefit from minidisk cache. There can be significant overhead in processing to ensure data integrity when both minidisk cache and mapped mdisks are used for the same minidisk.

    • You might want to temporarily disable minidisk cache for backup or scan routines that reference all the data on a disk, but just once. This can be done with the SET MDCACHE INSERT or SET MDCACHE MDISK CP commands.

  6. Disable the minidisk cache fair share limit for key users.

    Server machines or guests that do I/O on behalf of many users should not be subject to the minidisk cache fair share limit. Use the NOMDCFS operand on the OPTION statement in the user entry of the system directory to turn off the fair share limit.

  7. Remove duplication of minidisks.

    If you duplicated minidisks in the past in order to balance I/O, that may not be necessary with minidisk cache. Duplication can actually decrease performance since it might result in duplicate copies of the data in minidisk cache.

  8. Minidisk cache was significantly changed in VM/ESA Version 1 Release 2.2. If you are migrating from a release prior to that, consider the following additional guidelines:

    • In older releases, minidisk cache could not use real storage. If you configured expanded storage in order to use minidisk cache, consider changing the storage to real storage where possible. However, there are cases where keeping some expanded storage for paging use is beneficial. This creates a paging hierarchy.

    • By default the current minidisk cache uses both real and expanded storage. If you set minimum and maximum bounds for expanded storage usage in the past, you may want to revisit those values and set bounds on real storage usage.

    • Use the CP SET MDCACHE or SET RDEVICE commands to control minidisk cache usage instead of SET SHARED. In older releases, if a volume initially had SET SHARED ON and then SET SHARED OFF was issued, a newly defined minidisk would be eligible for minidisk cache. In the current release, you must explicitly enable minidisk cache for volumes that start with SET SHARED ON.

    • Prepare for minidisk caching on devices shared between first and second level systems. Prior to VM/ESA Version 1 Release 2.2, most second level systems did not use minidisk cache because it required expanded storage, which most second level test systems do not have. Now that minidisk caching can use real storage instead of expanded storage, second level systems benefit. Care should be used for minidisks shared with first level systems so that all changes are reflected to the second level system. For example, a read-write first level minidisk is shared with a second level system as a read-only minidisk. The minidisk is cached by the second level system, and then a change is made to the minidisk by the first level system. The change will not be reflected on the second level system if the data had been in the second level system's cache. In order to see the changes, one must purge the cache on the second level system with the SET MDCACHE MDISK FLUSH command and reaccess the minidisk.

    • Disable caching for minidisks that are poor candidates. There may be some minidisks that are poor candidates for minidisk caching, but did not matter in the past since the type of I/O or format made them ineligible for minidisk caching. With several restrictions being lifted, it may be worthwhile to revisit these minidisks and make sure they have minidisk caching disabled. An example might be VSE paging volumes.

    • Reformat some minidisks to smaller block size. Each CMS file consists of at least one DASD record even if the file is very small. The capacity of 4096 byte formatted minidisks that consist of mostly small files may be increased by reformatting them using 2048, 1024, or 512 byte sizes. In the current minidisk cache, these formats are all eligible for caching. However, as the record size gets smaller, the number of available bytes per track is reduced.

    • There may be a performance degradation for applications that do large amounts of I/O for small amounts of data with poor locality of reference. In older releases, the minidisk cache operated on a block basis. Currently, the initial read of the data from DASD is done by reading an entire track of data into storage. When the I/O pattern is made up of many I/O requests for small amounts of data that is scattered across tracks, this can result in extra overhead in reading in extra data that is never referenced. Disabling minidisk cache for these applications or changing the applications may provide some improvement.

      VM/ESA 2.3.0 and APAR VM61045 (available for VM/ESA 2.1.0 and 2.2.0) provide support for record level MDC for non-FBA CMS minidisks. This is quite different from the old record level MDC of VM/ESA 1.2.1. While it may help performance, care should be used in implementing this feature. IBM recommends you check with VM performance before using this new feature.

    • Be careful when drawing conclusions between releases based on the minidisk cache hit ratio. A lower hit ratio could be the result of more I/Os being eligible for minidisk cache which would increase the denominator of the ratio. For example, a hit ratio of 90% could be the result of 90 I/Os being satisfied from cache out of 100 I/Os that are eligible. On a new release, there may be 200 I/Os that are eligible with 150 being satisfied from cache which results in a lower hit ratio of 75%. However, the total real I/Os avoided is better (150 compared to 90). Also, some products determine the hit ratio differently on various VM/ESA releases.

    • Consider the impact of full pack minidisks (FPMs). Prior to VM/ESA 1.2.2, FPMs were not eligible for MDC. Now, FPMs defined via the VOLSER option are eligible; FPMs defined via DEVNO are not eligible (but see the update of January 7, 2005 below regarding DEVNO). There are two general uses for FPMs: use with guest systems and use with system utilities (e.g. backup utilities). A problem arises in that the MINIOPT directory statement is not valid for FPMs and the DASDOPT directory statement does not have a NOMDC option.

      In the guest environment, one might want to disable MDC because the volume is a guest paging volume or is already being cached by the guest. In these cases, the best alternative is to use the RDEV statement in the system config file to set MDC off for the entire real volume.

      In the FPM overlay scenario, we want to protect the system from a utility program adding more data to MDC than is necessary, and to avoid the program doing I/O to CP system areas (directory, warmstart, checkpoint, etc.), since this causes data to be flushed from the cache or made ineligible. In the latter case, it is often better either to define a minidisk for these applications that overlaps only the system areas and not the whole volume, or to move all CMS and user data off these volumes. To protect from utility programs, the alternatives include:

      1. Define the volume in the system config file with the RDEV statement as MDC DFLTOFF (MDC is off by default, but can be enabled at the minidisk level) and then use MINIOPT statements to enable all the minidisks that are subsets of the FPM. This is tedious, but it works.
      2. Autolog a userid that links to all the FPMs and then issues SET MDC MDISK commands to disable MDC for those FPMs. This is effective but has a few more moving parts than we typically like.
      3. Leave MDC on as the default, but use the SET MDC INSERT OFF command in the profile or utility program that runs against the FPMs.
      4. Just leave everything as default, and count on the fair share algorithms to protect you.
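The hit ratio caution in the migration guidelines above is easy to verify numerically. The Python sketch below uses a hypothetical function name and the figures from that example. It shows a lower hit ratio alongside a larger number of real I/Os avoided.

```python
def hit_ratio(hits: int, eligible_ios: int) -> float:
    """Fraction of eligible I/Os satisfied from the cache (hypothetical helper)."""
    if eligible_ios <= 0:
        raise ValueError("eligible_ios must be positive")
    return hits / eligible_ios


if __name__ == "__main__":
    old_hits, old_eligible = 90, 100     # older release
    new_hits, new_eligible = 150, 200    # newer release, more I/Os eligible

    print(hit_ratio(old_hits, old_eligible))   # ratio on the older release
    print(hit_ratio(new_hits, new_eligible))   # lower ratio on the newer release
    print(new_hits - old_hits)                 # additional real I/Os avoided
```

Comparing the number of I/Os avoided rather than the ratio alone avoids the misleading conclusion that the newer release performs worse.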

Update, January 7, 2005

Since this article was written, some additional considerations regarding minidisk cache have come to light.

  • Simulated ECKD devices residing in the 2105 ESS are eligible for MDC. However, because the ESS is so heavily cached compared to other kinds of storage subsystems, and because the 2105 ESS also supports FICON (very fast) channels, customers in storage-constrained environments might want to consider turning off MDC for minidisks hosted in the ESS, if storage compromises need to be made.

  • Minidisks residing on emulated-FBA-on-SCSI devices (first available in z/VM 5.1.0) are eligible for MDC. Because the processor cost per I/O is greater for emulated FBA devices than for traditional disk devices (e.g., ECKD), MDC is a particularly important tool for improving I/O performance of emulated FBA. More information about the performance of emulated FBA and the effect of MDC on emulated FBA can be found in the z/VM Performance Report.

  • With VM/ESA 2.2.0 and later, minidisks defined via the DEVNO clause of the CP directory MDISK statement are in fact eligible for MDC.

  • Minidisks being used for Linux file systems should be configured with MDC OFF unless they are almost exclusively read-only. This especially means that minidisks holding Linux swap extents should be run with MDC OFF. The read fraction is not high enough to pay off the processor penalty paid for updating MDC on reads. For more information about Linux swapping, check our Linux tips page.
