Minidisk Cache
Last updated 9 November 2005
Minidisk cache may provide performance and administrative benefits to
VM/ESA systems.
The amount of data that exists is much larger than the amount of data
that is frequently used.
This tends to be true for systems as well as individual virtual
machines.
The concept of caching builds off this behavior by keeping the
frequently referenced data where it can be efficiently accessed.
For minidisk cache, CP uses real or expanded storage or both as a cache
for data from virtual I/O.
By default, both real and expanded storage are used.
Accessing electronic storage is much more efficient than accessing DASD.
The rest of this section is broken down into subsections.
The first five provide background information on how minidisk cache
works.
The last subsections provide recommendations and guidelines for using
minidisk cache more effectively.
Requirements
Restrictions on minidisk cache eligibility concern the type of I/O and
the type of data or minidisk.
The following is a list of restrictions for minidisk cache eligibility:
- Data must be referenced via Diagnose X'18', Diagnose X'A4',
Diagnose X'A8', Diagnose X'250', *BLOCKIO, SSCH, SIO, or SIOF.
- Minidisk must be on 3380, 3390, 9345, or FBA DASD.
- Dedicated devices are not eligible.
- FBA minidisks must be defined on page boundaries, and the
size must be a multiple of 8 pages (64 512-byte blocks).
- Minidisks created with Diagnose X'E4' are not eligible.
- Minidisk caching is not supported for minidisks on shared DASD.
There is no support for handshaking between VM/ESA systems for minidisk
cache.
- Minidisks greater than 32767 cylinders are not eligible for MDC.
- Fullpack minidisks for a V=R machine with SET CCWTRAN OFF, which is
the default, become ineligible for minidisk caching at the time the
first channel program to the device bypasses CCW translation.
- When you SET SHARED ON for a device, minidisks on that device
become ineligible for minidisk cache.
- Minidisks that overlap CP space allocated for page, spool,
directory, or temporary disk are not eligible. In the temporary disk
case, this refers to a minidisk defined in the directory or with the
DEFINE MDISK command. It does not refer to minidisks created with the
DEFINE TEMPDISK command.
- I/Os can be aborted from MDC processing for various reasons.
These include (the list is not all-inclusive):
- The record size is greater than 32767.
- The channel program includes a backward TIC that does not point
to a search command.
- The channel program includes a format write (X'03') command.
- The channel program includes a sense CCW (X'04').
- The channel program includes a read sector CCW (X'22').
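A few of the static restrictions above lend themselves to a quick pre-check. The following is a minimal sketch only: the function name, its parameters, and the reason strings are hypothetical, it assumes 4096-byte pages and 512-byte FBA blocks, and it covers only some of the listed rules. CP makes the real eligibility decision.

```python
# Illustrative pre-check for a subset of the restrictions listed above.
# All names here are hypothetical; this is not a CP interface.
ELIGIBLE_DEVICE_TYPES = {"3380", "3390", "9345", "FBA"}
MAX_CYLINDERS = 32767
BLOCKS_PER_PAGE = 8      # assumes 4096-byte pages and 512-byte FBA blocks
FBA_SIZE_MULTIPLE = 64   # 8 pages expressed in 512-byte blocks


def mdc_ineligibility_reasons(device_type, dedicated=False, shared=False,
                              cylinders=None, fba_start_block=None,
                              fba_block_count=None):
    """Return the reasons (possibly none) a minidisk fails these checks."""
    reasons = []
    if device_type not in ELIGIBLE_DEVICE_TYPES:
        reasons.append("unsupported device type")
    if dedicated:
        reasons.append("dedicated device")
    if shared:
        reasons.append("shared DASD")
    if cylinders is not None and cylinders > MAX_CYLINDERS:
        reasons.append("more than 32767 cylinders")
    if device_type == "FBA":
        if fba_start_block is not None and fba_start_block % BLOCKS_PER_PAGE:
            reasons.append("FBA minidisk not on a page boundary")
        if fba_block_count is not None and fba_block_count % FBA_SIZE_MULTIPLE:
            reasons.append("FBA size not a multiple of 64 blocks")
    return reasons
```

An empty list means only that none of the modeled restrictions applies; dynamic conditions such as aborted channel programs are not represented.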
Concepts
Minidisk cache is a data-in-memory technique that attempts to improve
performance by decreasing the I/O to DASD required for minidisk I/O.
Minidisk cache trades increased use of
real and expanded storage for decreased DASD I/O.
Since paging to DASD increases as the amount of available
real and expanded storage decreases, you should expect some increase
in paging I/O when exploiting minidisk cache.
An increase in paging is not necessarily bad.
Looking at the total real DASD I/O rate and user state sampling can
show whether the system is benefiting from minidisk cache.
For example,
if minidisk cache reduces the real DASD I/O rate by 300 I/Os per second
and paging DASD I/O increases by 50 per second, then there would be a
net reduction of 250 I/Os per second.
This is good.
Also, looking at user state sampling might indicate users are waiting
for virtual I/O much less than before with just a small increase in
page wait time.
This would also be good.
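The trade-off in the example above is simple arithmetic. As a small sketch (the function name is hypothetical, and the rates would come from whatever performance monitor you use):

```python
def net_dasd_io_change(minidisk_io_saved_per_sec, paging_io_added_per_sec):
    """Net reduction in real DASD I/Os per second; negative means a net loss."""
    return minidisk_io_saved_per_sec - paging_io_added_per_sec


# The figures from the example above: 300 I/Os saved, 50 paging I/Os added.
print(net_dasd_io_change(300, 50))  # prints 250
```

A negative result would indicate that the added paging I/O outweighs the minidisk I/O avoided.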
Minidisk Cache Arbiter
By default, the CP arbiter function
determines how much real and expanded storage to give
to minidisk cache and paging.
The arbiter determines the division separately for real and expanded
storage.
The goal of the arbiter is to keep the average page life of a minidisk
cache page equal to the average page life of a page used for paging.
This is not measured directly, but estimated based on steal rates for
the two usage types.
Several commands are available to influence the arbiter.
The amount of real or expanded storage used by minidisk cache can be
set by the SET MDCACHE STORAGE or SET MDCACHE XSTORE commands.
For both commands, minimum and maximum values can be set.
When the minimum and maximum values are equal, the storage associated
with minidisk cache is a fixed amount and the arbiter is effectively
turned off.
Unequal minimum and maximum values can be used to bound the cache size
determined by the arbiter.
An additional method of influencing the arbiter is setting a bias with
the SET MDCACHE STORAGE BIAS or SET MDCACHE XSTORE BIAS commands.
If you find the arbiter favoring paging or minidisk cache more than is
optimal for your system, you may bias the arbiter for or against
minidisk cache.
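The effect of minimum and maximum values on the arbiter's choice can be pictured as a simple clamp. This is only a sketch of the bounding behavior described above: the function name is hypothetical, the units are arbitrary, and it does not model how the arbiter arrives at its choice or how a bias is applied.

```python
def bounded_cache_size(arbiter_choice, minimum, maximum):
    """Clamp the size the arbiter would choose to the configured bounds."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(arbiter_choice, maximum))


# Unequal bounds: the arbiter's choice is limited to the range 64..256.
print(bounded_cache_size(500, 64, 256))  # prints 256
# Equal bounds: the size is fixed, so the arbiter is effectively off.
print(bounded_cache_size(500, 128, 128))  # prints 128
```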
Fair Share Limit
The VM/ESA minidisk cache is implemented with a system cache where
the storage for the cache is shared with all the users on the system.
In order to prevent any one user from negatively impacting other users
on the system, CP enforces a fair share limit.
This is a limit on the amount of I/O that a user can insert into the
cache over a fixed period of time.
The limit is a dynamic value based on the amount of storage available
and the number of users that want to make inserts to the data.
If a user reaches the fair share limit, the I/O still completes but
the data is not inserted into the cache.
The NOMDCFS operand on the OPTION statement of a user directory entry
can be used to override the fair share limit.
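The observable behavior of the fair share limit can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the class and method names are hypothetical, the limit is passed in as a fixed number of inserts per period, and CP's actual dynamic calculation of the limit is not represented.

```python
class FairShareToy:
    """Toy model: an I/O always completes, but may skip the cache insert."""

    def __init__(self, limit, exempt_users=()):
        self.limit = limit                # inserts allowed per user per period
        self.exempt = set(exempt_users)   # users treated as having NOMDCFS
        self.inserted = {}                # inserts made so far in this period

    def complete_io(self, user):
        """Complete one I/O; return True if its data went into the cache."""
        used = self.inserted.get(user, 0)
        if user not in self.exempt and used >= self.limit:
            return False                  # limit reached: no insert
        self.inserted[user] = used + 1
        return True

    def start_new_period(self):
        """Reset the per-user counts at the start of a new period."""
        self.inserted.clear()
```

In this model a user at the limit still gets the data; it simply does not displace other users' data in the shared cache until the next period.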
Minidisk Cache Enabling
By default all minidisks that meet the requirements for minidisk cache
are enabled.
There are three levels of control for enabling minidisk cache.
- System level
The system level is the highest level of control and is managed by
the SET MDCACHE SYSTEM command.
Minidisk cache is either on or off at the system level.
If it is off, then no minidisk caching is done regardless of other
settings at lower levels.
Minidisk cache is on at the system level by default.
- Real device level
If minidisk cache is enabled at the system level, then the SET MDCACHE
RDEV command, SET RDEVICE command, or RDEVICE configuration statement
can be used to further enable or disable minidisk cache at the real
device level.
There are three settings at the real device level.
- DFLTON
Default On enables minidisk caching for a real device, yet allows the
ability to disable minidisk caching for a particular minidisk on that
device.
All eligible minidisks will be cached on this device except those that
have caching off at the minidisk level.
This is the default.
- DFLTOFF
Default Off disables minidisk caching for a real device, yet allows
the ability to enable caching for a particular minidisk on that device.
No minidisks will be cached on this device except those that have
caching on at the minidisk level.
- OFF
OFF disables minidisk caching for a real device.
It cannot be overridden at the minidisk level.
No minidisks will be cached on this device.
- Minidisk level
The last level of control is the minidisk level which is managed by
the CP command SET MDCACHE MDISK or MINIOPT statements in the system
directory.
By default minidisk cache is enabled at the minidisk level.
One can explicitly specify ON for minidisks on devices with a DFLTOFF
setting or OFF for minidisks on devices with a DFLTON setting.
The choice between track level caching and record level caching is
also made at the minidisk level. Support for this was introduced in
VM/ESA 2.3.0 and in VM/ESA 2.1.0 and 2.2.0 with APAR VM61045. The
APAR version supports only the CP SET command changes for record
level minidisk caching.
The ability also exists to flush data from minidisk cache for
particular devices and minidisks with the SET MDCACHE RDEV FLUSH or
SET MDCACHE MDISK FLUSH commands.
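The way the three levels combine can be summarized as a small decision function. This is a sketch of the rules as described in this section, not CP code: the function name and argument encoding are hypothetical, and the minidisk is assumed to meet the eligibility requirements already.

```python
def is_cached(system_on, rdev_setting, mdisk_setting=None):
    """Decide whether an eligible minidisk is cached.

    rdev_setting:  "DFLTON", "DFLTOFF", or "OFF"
    mdisk_setting: "ON", "OFF", or None when nothing explicit is set
    """
    if not system_on:
        return False                    # system level overrides everything
    if rdev_setting == "OFF":
        return False                    # cannot be overridden per minidisk
    if rdev_setting == "DFLTON":
        return mdisk_setting != "OFF"   # cached unless explicitly disabled
    if rdev_setting == "DFLTOFF":
        return mdisk_setting == "ON"    # cached only if explicitly enabled
    raise ValueError("unknown real device setting: %r" % (rdev_setting,))


print(is_cached(True, "DFLTON"))          # prints True
print(is_cached(True, "DFLTOFF", "ON"))   # prints True
print(is_cached(True, "OFF", "ON"))       # prints False
```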
Guidelines
Here are some suggestions that will help you use minidisk cache more
effectively:
- Ensure that the paging configuration is appropriate.
As discussed above, the use of minidisk cache usually results in an
increase in paging and the arbiter assumes the paging configuration is
appropriate.
There can be significant performance degradation if minidisk cache is
used on a system that has not had the paging configuration tuned.
You want to make sure that there is sufficient paging space allocated.
VM/ESA likes large paging allocations in order to effectively use block
paging.
If the block size of page reads as reported by monitor is less than 10,
there is probably not enough DASD space allocated for paging.
Paging space should also be isolated so that the seldom-ending channel
program (start subchannel/resume subchannel) technique is effective.
Do not mix paging space with other types of data, including spool space.
Balance the paging I/O over multiple volumes where appropriate.
- When using minidisk cache, always have some real storage defined
for minidisk cache.
In systems where the I/O buffers in virtual storage of the user are
not page aligned, there can be a significant performance degradation if
you do not have some real storage allocated for minidisk cache.
- Reconfigure expanded storage as main storage if it is specifically
for minidisk caching.
Some processors allow a portion of main storage to be configured as
expanded storage.
Installations that configure storage as expanded storage in order to
do minidisk caching should consider reconfiguring this expanded storage
as main storage.
However, there can be some advantage to keeping some
storage configured as expanded storage to create a paging hierarchy.
- Consider biasing the arbiter against minidisk cache if the system
is very rich in storage.
Setting a value less than one on the BIAS option of the SET MDCACHE
command will bias against minidisk cache.
Measurement results to date suggest that in environments that show no
storage constraint, the arbiter sometimes uses more storage for
minidisk cache than is optimal.
The more storage constrained a system is, the better the arbiter tends
to work.
A bias in minidisk cache size can also be achieved by using SET
MDCACHE to set a maximum size.
- When planning what data should be enabled for minidisk cache, it
is generally better to start with everything enabled and then selectively
disable minidisks or volumes.
- Volumes with data that are physically shared between VM systems
should be disabled for minidisk cache.
There is no handshaking between VM systems to ensure that changes to
a minidisk are reflected in the minidisk cache of other systems.
This also applies to sharing minidisks between first and second level
systems.
- Minidisks where the majority of I/Os are write I/Os
should be disabled for minidisk cache.
Examples of this include the log disks for Shared File System.
- Minidisks with data that is only read once should be disabled for
minidisk cache.
The real benefit of minidisk cache is only seen for data that is
referenced multiple times.
- Disable minidisk cache for volumes that are mapped to VM data
spaces.
If data is being accessed by the mapped mdisk feature of VM data spaces,
there is no additional benefit from minidisk cache.
There can be significant overhead in processing to ensure data integrity
when both minidisk cache and mapped mdisks are used for the same
minidisk.
- You might want to temporarily disable minidisk cache for backup or
scan routines that reference all the data on a disk, but just once.
This can be done with the CP commands SET MDCACHE INSERT or
SET MDCACHE MDISK.
- Disable the minidisk cache fair share limit for key users.
Server machines or guests that do I/O on behalf of many users should
not be subject to the minidisk cache fair share limit.
Use the NOMDCFS operand on the OPTION statement in the user entry of
the system directory to turn off the fair share limit.
- Remove duplication of minidisks.
If you duplicated minidisks in the past in order to balance I/O, that
may not be necessary with minidisk cache.
Duplication can actually decrease performance since it might result in
duplicate copies of the data in minidisk cache.
- Minidisk cache was significantly changed in
VM/ESA Version 1 Release 2.2.
If you are migrating from a release prior to that, consider the following
additional guidelines:
- In older releases, minidisk cache could not use real storage.
If you configured expanded storage in order to use minidisk cache,
consider changing the storage to real storage where possible.
However, there are cases where keeping some expanded storage for paging
use is beneficial.
This creates a paging hierarchy.
- By default the current minidisk cache uses both real and expanded
storage.
If you set minimum and maximum bounds for expanded storage usage in
the past, you may want to revisit those values and set bounds on real
storage usage.
- Use the CP SET MDCACHE or SET RDEVICE commands to control
minidisk cache usage instead of SET SHARED.
In older releases, if a volume initially had SET SHARED ON and then
SET SHARED OFF was issued, a minidisk newly defined would be eligible
for minidisk cache.
In the current release, you must explicitly enable minidisk cache for
volumes that start with SET SHARED ON.
- Prepare for minidisk caching on devices shared between first and
second level systems.
Prior to VM/ESA Version 1 Release 2.2, most second level systems did not
use minidisk cache because it required expanded storage, which most
second level test systems do not have.
Now that minidisk caching can use real storage instead of expanded
storage, second level systems benefit.
Care should be used for minidisks shared with first level systems so
that all changes are reflected to the second level system.
For example, a read-write first level minidisk is shared with a second
level system as a read-only minidisk.
The minidisk is cached by the second level system, and then a change
is made to the minidisk by the first level system.
The change will not be reflected on the second level system if the data
had been in the second level system's cache.
In order to see the changes, one must purge the cache on the second
level system with the SET MDCACHE MDISK FLUSH command and reaccess the
minidisk.
- Disable caching for minidisks that are poor candidates.
There may be some minidisks that are poor candidates for minidisk
caching, but did not matter in the past since the type of I/O or
format made them ineligible for minidisk caching.
With several restrictions being lifted, it may be worthwhile to revisit
these minidisks and make sure they have minidisk caching disabled.
An example might be VSE paging volumes.
- Reformat some minidisks to smaller block size.
Each CMS file consists of at least one DASD record even if the file is
very small.
The capacity of 4096 byte formatted minidisks that consist of mostly
small files may be increased by reformatting them using 2048, 1024, or
512 byte sizes.
In the current minidisk cache, these formats are all eligible for
caching.
However, as the record size gets smaller, the number of available bytes
per track is reduced.
- There may be a performance degradation for applications that do
large amounts of I/O for small amounts of data with poor locality of
reference.
In older releases, the minidisk cache operated on a block basis.
Currently, the initial read of the data from DASD is done by reading
an entire track of data into storage.
When the I/O pattern is made up of many I/O requests for small amounts
of data that is scattered across tracks, this can result in extra
overhead in reading in extra data that is never referenced.
Disabling minidisk cache for these applications or changing the
applications may provide some improvement.
VM/ESA 2.3.0 and
APAR VM61045 (available for VM/ESA 2.1.0 and 2.2.0) provide
support for record level MDC for non-FBA CMS minidisks. This is quite
different from the old record level MDC of VM/ESA 1.2.1. While it may
help performance, care should be used in implementing this feature.
IBM recommends that you check with VM performance before using this new
feature.
- Be careful when drawing conclusions between releases based on the
minidisk cache hit ratio.
A lower hit ratio could be the result of more I/Os being eligible
for minidisk cache which would increase the denominator of the ratio.
For example, a hit ratio of 90% could be the result of 90 I/Os being
satisfied from cache out of 100 I/Os that are eligible.
On a new release, there may be 200 I/Os that are eligible with 150
being satisfied from cache which results in a lower hit ratio of 75%.
However, the total real I/Os avoided is better (150 compared to 90).
Also, some products determine the hit ratio differently on various
VM/ESA releases.
- Consider the impact of full pack minidisks (FPMs). Prior to
VM/ESA 1.2.2, FPMs were not eligible for MDC. Now, FPMs defined via
the VOLSER option are eligible; FPMs defined via DEVNO are not
eligible. There are two general uses for FPMs: use with guest
systems and use with system utilities (e.g. backup utilities).
A problem arises in that the MINIOPT directory statement is not
valid for FPMs and the DASDOPT directory statement does not have
a NOMDC option.
In the guest environment, one might want to disable MDC because
the volume is a guest paging volume or is already being cached in the
guest. In these
cases, the best alternative is to use the RDEV statement in the
system config file to set MDC off for the entire real volume.
In the FPM overlay scenario, we want to protect the system from
a utility program adding more data than is necessary to MDC, and to
avoid the program doing I/O to CP system areas (directory, warmstart,
checkpoint, etc.) since this causes data to be flushed from the
cache or made ineligible. In the latter case, it is often better to
define a minidisk for these applications that overlaps only the
system areas and not the whole volume, or to move all CMS and user
data off these volumes. To protect from utility programs, the
alternatives include:
- Define the volume in the system config file with the RDEV statement
as MDC DFLTOFF (MDC is off by default, but can be enabled at minidisk
level) and then use MINIOPT statements to enable all the minidisks
that are subsets of the FPM. This is tedious, but it works.
- Autolog a userid that links to all the FPMs and then issues
SET MDC MDISK commands to disable MDC for those FPMs. This is effective
but has a few more moving parts than we typically like.
- Leave MDC on as the default, but use the SET MDC INSERT OFF
command in the profile or utility program that runs against the
FPMs.
- Just leave everything as default, and count on the fair share
algorithms to protect you.
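The caution in the guidelines above about comparing minidisk cache hit ratios between releases can be checked with the figures given there. A minimal sketch, with a hypothetical function name:

```python
def hit_ratio_percent(hits, eligible_ios):
    """Hit ratio as a percentage of the I/Os eligible for minidisk cache."""
    return 100.0 * hits / eligible_ios


# Figures from the guideline: (I/Os satisfied from cache, eligible I/Os).
old_hits, old_eligible = 90, 100
new_hits, new_eligible = 150, 200

print(hit_ratio_percent(old_hits, old_eligible))  # prints 90.0
print(hit_ratio_percent(new_hits, new_eligible))  # prints 75.0
print(new_hits - old_hits)                        # prints 60
```

The ratio drops from 90% to 75%, yet 60 more real I/Os are avoided, which is why the count of I/Os avoided is the more useful comparison.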
Update, January 7, 2005
Since this article was written, some additional considerations
have come to light as regards Minidisk Cache.
- Simulated ECKD devices residing in the 2105 ESS are eligible for
MDC. However, because the ESS is so heavily cached compared to
other kinds of storage subsystems, and because the 2105 ESS
also supports FICON (very fast) channels, customers in
storage-constrained environments might want to consider turning off
MDC for minidisks hosted in the ESS, if storage compromises need to
be made.
- Minidisks residing on emulated-FBA-on-SCSI devices (first available
in z/VM 5.1.0) are eligible for MDC. Because the processor cost per
I/O is greater for emulated FBA devices than for traditional disk
devices (e.g., ECKD), MDC is a particularly important tool for
improving I/O performance of emulated FBA. More information about the
performance of emulated FBA and the effect of MDC on emulated FBA can
be found in the z/VM Performance Report.
- With VM/ESA 2.2.0 and later, minidisks defined via the DEVNO clause
of the CP directory MDISK statement are in fact eligible for MDC.
- Minidisks being used for Linux file systems should be configured
with MDC OFF unless they are almost exclusively read-only.
This especially means that minidisks holding Linux swap
extents should be run with MDC OFF.
The read fraction is not high enough to pay off the processor penalty
paid for updating MDC on reads. For more information about
Linux swapping, check our Linux tips page.