Minidisk Cache
Minidisk cache may provide performance and administrative benefits to
z/VM systems.
The amount of data that exists is much larger than the amount of data
that is frequently used.
This tends to be true for systems as well as individual virtual
machines.
The concept of caching builds off this behavior by keeping the
frequently referenced data where it can be efficiently accessed.
For minidisk cache, CP uses real or expanded storage or both as a cache
for data from virtual I/O.
By default, both real and expanded storage are used.
Accessing electronic storage is much more efficient than accessing DASD.
The rest of this section is broken down into subsections.
The first five provide background information on how minidisk cache
works.
The last subsections provide recommendations and guidelines for using
minidisk cache more effectively.
Requirements
Restrictions for minidisk cache are in the area of the type of I/O and
the type of the data or minidisk.
The following is a list of restrictions for minidisk cache eligibility:
- Data must be referenced via Diagnose X'18', Diagnose X'A4',
Diagnose X'A8', Diagnose X'250', *BLOCKIO, or SSCH.
- Minidisk must be on 3380, 3390, or FBA DASD; or hardware that
emulates those architectures.
- Dedicated devices are not eligible.
- FBA minidisks must be defined on page boundaries, and the
size must be a multiple of 8 pages (64 512-byte blocks).
- Minidisks created with Diagnose X'E4' are not eligible.
- Minidisk caching is not supported for minidisks on shared DASD.
There is no support for handshaking between z/VM systems for minidisk
cache.
- Minidisks greater than 32767 cylinders are not eligible for MDC.
- When you SET SHARED ON for a device, minidisks on that device
become ineligible for minidisk cache.
- Minidisks that overlap CP space allocated for page, spool,
directory, or temporary disk are not eligible. In the temporary disk
case, this refers to a minidisk defined in the directory or with the
DEFINE MDISK command. It does not refer to minidisks created with the
DEFINE TEMPDISK command.
- I/Os can be aborted from MDC processing for various reasons.
These include, but are not limited to, the following:
- The record size is greater than 32767.
- The channel program includes a backward TIC (transfer in channel)
that does not point to a search command.
- The channel program includes a format write (x'03') command.
- The channel program includes a sense ccw (x'04').
- The channel program includes a read sector ccw (x'22').
- The channel path mask for the real device and the channel path mask
for the virtual device do not match.
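The two size-related restrictions in the list above lend themselves to a
quick check. The following is a minimal sketch, not CP's actual
eligibility logic. It assumes that "page boundary" means a starting block
number that is a multiple of 8 (one 4096-byte page holds eight 512-byte
blocks); the function names are hypothetical.

```python
BLOCKS_PER_PAGE = 8        # assumption: 4096-byte page / 512-byte FBA block
SIZE_GRANULE_BLOCKS = 64   # 8 pages, as stated in the restriction list
MAX_CYLINDERS = 32767      # larger minidisks are not eligible for MDC


def fba_extent_eligible(start_block: int, block_count: int) -> bool:
    """Check the FBA alignment and size rules (hypothetical helper)."""
    starts_on_page = start_block % BLOCKS_PER_PAGE == 0
    size_ok = block_count > 0 and block_count % SIZE_GRANULE_BLOCKS == 0
    return starts_on_page and size_ok


def cylinder_count_eligible(cylinders: int) -> bool:
    """Check the 32767-cylinder size limit (hypothetical helper)."""
    return 0 < cylinders <= MAX_CYLINDERS
```

A minidisk that passes these checks can still be ineligible for any of
the other reasons listed above, such as residing on a dedicated or shared
device.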
Concepts
Minidisk cache is a data-in-memory technique that attempts to improve
performance by decreasing the I/O to DASD required for minidisk I/O.
Minidisk cache trades increased use of
real and expanded storage for decreased DASD I/O.
Since paging to DASD increases as the amount of available
real and expanded storage decreases, you should expect some increase
in paging I/O when exploiting minidisk cache.
An increase in paging is not necessarily bad.
Looking at the total real DASD I/O rate and user state sampling can
show whether the system is benefiting from minidisk cache.
For example, if minidisk cache reduces the real DASD I/O rate by 300
I/Os per second and paging DASD I/O increases by 50 I/Os per second,
the net result is a reduction of 250 I/Os per second, which is a clear
gain.
Similarly, user state sampling might show that users are waiting for
virtual I/O much less than before, with only a small increase in page
wait time.
This would also indicate that the system is benefiting.
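The arithmetic in the example can be written down as a one-line helper.
This is only an illustration of the comparison described above; the
function name and the idea of feeding it before/after rates taken from
your own monitor data are assumptions.

```python
def net_dasd_io_reduction(mdc_io_reduction: float,
                          paging_io_increase: float) -> float:
    """Net change in real DASD I/Os per second after enabling MDC.

    A positive result means total real DASD I/O went down.
    """
    return mdc_io_reduction - paging_io_increase


# The example from the text: 300 fewer minidisk I/Os, 50 more paging I/Os.
saved = net_dasd_io_reduction(300, 50)
print(f"Net reduction: {saved} I/Os per second")
```

A negative result would suggest that the extra paging outweighs the
minidisk I/O that the cache avoids.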
Minidisk Cache Arbiter
By default, the CP arbiter function
determines how much real and expanded storage to give
to minidisk cache and paging.
The arbiter determines the division separately for real and expanded
storage.
The goal of the arbiter is to keep the average page life of a minidisk
cache page equal to the average page life of a page used for paging.
This is not measured directly, but estimated based on steal rates for
the two usage types.
Several commands are available to influence the arbiter.
The amount of real or expanded storage used by minidisk cache can be
set by the SET MDCACHE STORAGE or SET MDCACHE XSTORE commands.
For both commands, minimum and maximum values can be set.
When the minimum and maximum values are equal, the storage associated
with minidisk cache is a fixed amount and the arbiter is effectively
turned off.
Unequal minimum and maximum values can be used to bound the cache size
determined by the arbiter.
An additional method of influencing the arbiter is setting a bias with
the SET MDCACHE STORAGE BIAS or SET MDCACHE XSTORE BIAS commands.
If you find the arbiter favoring paging or minidisk cache more than is
optimal for your system, you may bias the arbiter for or against
minidisk cache.
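The effect of minimum and maximum values on the arbiter can be pictured
as a clamp. The sketch below is a simplified model, not the CP
algorithm: the page units, the function name, and in particular the way
the bias is applied (as a plain multiplier on the arbiter's own choice)
are assumptions made for illustration only.

```python
def bounded_cache_size(arbiter_pages: int, minimum: int, maximum: int,
                       bias: float = 1.0) -> int:
    """Clamp the arbiter's proposed cache size to [minimum, maximum].

    Assumption: the bias simply scales the proposal before clamping.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    biased = int(arbiter_pages * bias)
    return max(minimum, min(maximum, biased))
```

With equal minimum and maximum values the result no longer depends on
the arbiter's proposal, which mirrors the statement that the arbiter is
effectively turned off in that case.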
Fair Share Limit
The z/VM minidisk cache is implemented with a system cache where
the storage for the cache is shared with all the users on the system.
In order to prevent any one user from negatively impacting other users
on the system, CP enforces a fair share limit.
This is a limit on the amount of I/O that a user can insert into the
cache over a fixed period of time.
The limit is a dynamic value based on the amount of storage available
and the number of users that want to insert data into the cache.
If a user reaches the fair share limit, the I/O still completes but
the data is not inserted into the cache.
The NOMDCFS operand on the OPTION statement of a user directory entry
can be used to override the fair share limit.
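The behavior described above can be modeled with a small toy class. This
is a sketch under stated assumptions, not the CP implementation: the
formula used for the limit (available pages divided by the number of
users inserting in the current interval) and all names are hypothetical.
Only the observable behavior comes from the text: an over-limit user's
I/O still completes, its data is simply not cached, and an exempt user
is never refused.

```python
class FairShareCache:
    """Toy model of a per-interval fair share limit on cache inserts."""

    def __init__(self, available_pages: int):
        self.available_pages = available_pages
        self.inserted = {}    # user -> pages inserted in this interval
        self.exempt = set()   # users treated as if NOMDCFS were specified

    def limit(self) -> int:
        # Assumed formula: split the available storage evenly among the
        # users that have tried to insert during this interval.
        active_users = max(1, len(self.inserted))
        return self.available_pages // active_users

    def try_insert(self, user: str, pages: int) -> bool:
        """Return True if the data is cached, False if it is skipped."""
        used = self.inserted.setdefault(user, 0)
        if user not in self.exempt and used + pages > self.limit():
            return False      # the I/O itself still completes
        self.inserted[user] = used + pages
        return True

    def new_interval(self) -> None:
        """Start a new measurement interval."""
        self.inserted.clear()
```

In this model the limit shrinks as more users compete for inserts, which
is one simple way to obtain the dynamic behavior the text describes.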
Minidisk Cache Enabling
By default all minidisks that meet the requirements for minidisk cache
are enabled.
There are three levels of control for enabling minidisk cache.
- System level
The system level is the highest level of control and is managed by
the SET MDCACHE SYSTEM command.
Minidisk cache is either on or off at the system level.
If it is off, then no minidisk caching is done regardless of other
settings at lower levels.
Minidisk cache is on at the system level by default.
- Real device level
If minidisk cache is enabled at the system level, then the SET MDCACHE
RDEV command, SET RDEVICE command, or RDEVICE configuration statement
can be used to further enable or disable minidisk cache at the real
device level.
There are three settings at the real device level.
- DFLTON: Default On enables minidisk caching for a real device, yet
allows the ability to disable minidisk caching for a particular
minidisk on that device.
All eligible minidisks will be cached on this device except those that
have caching off at the minidisk level.
This is the default.
- DFLTOFF: Default Off disables minidisk caching for a real device, yet
allows the ability to enable caching for a particular minidisk on that
device.
No minidisks will be cached on this device except those that have
caching on at the minidisk level.
- OFF: OFF disables minidisk caching for a real device.
It cannot be overridden at the minidisk level.
No minidisks will be cached on this device.
- Minidisk level
The last level of control is the minidisk level which is managed by
the CP command SET MDCACHE MDISK or MINIOPT statements in the system
directory.
By default minidisk cache is enabled at the minidisk level.
One can explicitly specify ON for minidisks on devices with a DFLTOFF
setting or OFF for minidisks on devices with a DFLTON setting.
The choice between track level caching and record level caching is
also made at the minidisk level.
The ability also exists to flush data from minidisk cache for
particular devices and minidisks with the SET MDCACHE RDEV FLUSH or
SET MDCACHE MDISK FLUSH commands.
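The three levels of control combine as a simple decision. The sketch
below restates the rules from this subsection in code; it assumes the
minidisk already meets the eligibility requirements, and it represents
"no explicit ON or OFF at the minidisk level" as None, which is a
modeling choice rather than CP terminology. The names are hypothetical.

```python
from enum import Enum
from typing import Optional


class RdevSetting(Enum):
    DFLTON = "DFLTON"
    DFLTOFF = "DFLTOFF"
    OFF = "OFF"


def caching_enabled(system_on: bool,
                    rdev: RdevSetting = RdevSetting.DFLTON,
                    mdisk: Optional[bool] = None) -> bool:
    """Decide whether an eligible minidisk is cached.

    mdisk is True for an explicit ON, False for an explicit OFF, and
    None when nothing was specified at the minidisk level.
    """
    if not system_on:
        return False                 # system level overrides everything
    if rdev is RdevSetting.OFF:
        return False                 # cannot be overridden per minidisk
    if rdev is RdevSetting.DFLTON:
        return mdisk is not False    # cached unless explicitly OFF
    return mdisk is True             # DFLTOFF: cached only if explicitly ON
```

For example, `caching_enabled(True, RdevSetting.DFLTOFF, True)` is True,
while any combination with `system_on=False` is False.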
Guidelines
Here are some suggestions that will help you use minidisk cache more
effectively:
- Ensure that the paging configuration is appropriate.
As discussed above, the use of minidisk cache usually results in an
increase in paging and the arbiter assumes the paging configuration is
appropriate.
There can be significant performance degradation if minidisk cache is
used on a system that has not had the paging configuration tuned.
You want to make sure that there is sufficient paging space allocated.
z/VM likes large paging allocations in order to effectively use block
paging.
If the block size of page reads as reported by monitor is less than 10,
there is probably not enough DASD space allocated for paging.
Paging space should also be isolated so that the seldom-ending channel
program (start subchannel/resume subchannel) technique is effective.
Do not mix paging space with other types of data, including spool space.
Balance the paging I/O over multiple volumes where appropriate.
- When using minidisk cache, always have some real storage defined
for minidisk cache.
In systems where the I/O buffers in virtual storage of the user are
not page aligned, there can be a significant performance degradation if
you do not have some real storage allocated for minidisk cache.
- Consider biasing the arbiter against minidisk cache if the system
is very rich in storage.
Setting a value less than one on the BIAS option of the SET MDCACHE
command will bias against minidisk cache.
Measurement results to date suggest that in environments that show no
storage constraint, the arbiter sometimes uses more storage for
minidisk cache than is optimal.
The more storage constrained a system is, the better the arbiter tends
to work.
A bias in minidisk cache size can also be achieved by using SET
MDCACHE to set a maximum size.
- When planning what data should be enabled for minidisk cache, it
is generally better to start with everything enabled and then selectively
disable minidisks or volumes.
- Volumes with data that are physically shared between VM systems
should be disabled for minidisk cache.
There is no handshaking between VM systems to ensure that changes to
a minidisk are reflected in the minidisk cache of other systems.
This also applies to sharing minidisks between first and second level
systems.
- Minidisks where the majority of I/Os are write I/Os
should be disabled for minidisk cache.
Examples of this include the log disks for Shared File System.
- Minidisks with data that is only read once should be disabled for
minidisk cache.
The real benefit of minidisk cache is only seen for data that is
referenced multiple times.
- Disable minidisk cache for volumes that are mapped to VM data
spaces.
If data is being accessed by the mapped mdisk feature of VM data spaces,
there is no additional benefit from minidisk cache.
There can be significant overhead in processing to ensure data integrity
when both minidisk cache and mapped mdisks are used for the same
minidisk.
- Disable minidisk cache for minidisks that are the target of FlashCopy
requests. This includes FlashCopy requests that may be initiated by
virtual machines, such as the HSM component of z/OS.
- You might want to temporarily disable minidisk cache for backup or
scan routines that reference all the data on a disk, but just once.
This can be done with the CP commands SET MDCACHE INSERT or
SET MDCACHE MDISK.
- Disable the minidisk cache fair share limit for key users.
Server machines or guests that do I/O on behalf of many users should
not be subject to the minidisk cache fair share limit.
Use the NOMDCFS operand on the OPTION statement in the user entry of
the system directory to turn off the fair share limit.
- Prepare for minidisk caching on devices shared between first and
second level systems.
Take care with minidisks shared between first and second level systems
so that all changes are reflected on the second level system.
For example, a read-write first level minidisk is shared with a second
level system as a read-only minidisk.
The minidisk is cached by the second level system, and then a change
is made to the minidisk by the first level system.
The change will not be reflected on the second level system if the data
had been in the second level system's cache.
In order to see the changes, one must purge the cache on the second
level system with the SET MDCACHE MDISK FLUSH command and reaccess the
minidisk.
- Minidisks residing on emulated-FBA-on-SCSI devices are eligible for
MDC.
Because the processor cost per I/O is greater for emulated FBA devices
than for traditional disk devices (e.g., ECKD), MDC is a particularly
important tool for improving I/O performance of emulated FBA.
More information about the performance of emulated FBA and the effect
of MDC on emulated FBA can be found in the z/VM Performance Report.
- With VM/ESA 2.2.0 and later, minidisks defined via the DEVNO clause
of the CP directory MDISK statement are in fact eligible for MDC.
- Minidisks being used for Linux file systems should be configured
with MDC OFF unless they are almost exclusively read-only.
This especially means that minidisks holding Linux swap extents should
be run with MDC OFF.
The read fraction is not high enough to pay off the processor penalty
paid for updating MDC on reads.
For more information about Linux swapping, check our Linux tips page.
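Several of the guidelines above name conditions under which a minidisk
should have caching disabled. They can be collected into a simple
checklist. This is a hedged sketch for planning purposes only: the
field names are hypothetical, and the 0.5 threshold is just a literal
reading of "the majority of I/Os are write I/Os".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinidiskProfile:
    write_fraction: float = 0.0          # writes / total I/Os
    shared_between_systems: bool = False
    read_once: bool = False
    mapped_to_data_space: bool = False
    flashcopy_target: bool = False


def reasons_to_disable_mdc(p: MinidiskProfile) -> List[str]:
    """List the guidelines that argue for disabling MDC on a minidisk."""
    reasons = []
    if p.shared_between_systems:
        reasons.append("physically shared between VM systems")
    if p.write_fraction > 0.5:
        reasons.append("majority of I/Os are writes")
    if p.read_once:
        reasons.append("data is only read once")
    if p.mapped_to_data_space:
        reasons.append("mapped to a VM data space")
    if p.flashcopy_target:
        reasons.append("target of FlashCopy requests")
    return reasons
```

An empty list means none of these particular guidelines apply; it does
not by itself show that caching the minidisk is beneficial.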