VM/ESA 1.2.2 Performance Changes
- Performance Improvements
- Minidisk Caching
- Scheduler Share Capping and Proportional Distribution
- SPXTAPE Command
- IUCV Pathlength Reduction
- Global Lock Removal for ISFC I/O Buffer
- Asynchronous Data Mover Facility (ADMF)
- TRSOURCE Selectivity
- Clean Start
- Storage Management Enhancements
- CMS - Larger Preallocated Stack for DMSITS
- CMS - Improved Block Allocation for Virtual Disk in Storage
- SFS - Improved Handling of Released File Blocks
- SFS - Revoke Performance
- Performance Considerations
- Performance Management
Major enhancements have been made to minidisk caching in VM/ESA 1.2.2. These enhancements broaden minidisk cache's scope of applicability, improve minidisk cache performance, and provide better controls over the caching process.
Prior to VM/ESA 1.2.2, minidisk caching was subject to a number of restrictions:
- CMS 4K-formatted minidisks only
- access via diagnose or *BLOCKIO interfaces only
- the cache must reside in expanded storage
- the minidisk cannot be on shared DASD
- dedicated devices are not supported
- FBA minidisks must be on page boundaries
With VM/ESA 1.2.2, there are fewer restrictions. The last three of the above restrictions still apply and the minidisk must be on 3380, 3390, 9345, or FBA DASD. However:
- The minidisk can be in any format.
- In addition to the diagnose and *BLOCKIO interfaces, minidisk caching now also applies to DASD accesses that are done using SSCH, SIO, or SIOF.
- The cache can reside in real storage, expanded storage, or a combination of both.
By lifting these restrictions, the benefits of minidisk caching are now available to a much broader range of computing situations. Guest operating systems are a prime example. Other examples include CMS minidisks that are not 4K-formatted, applications that use SSCH, SIO, or SIOF to access minidisks, and systems that do not have expanded storage.
Even in situations where minidisk caching is already being used, enhanced minidisk caching can provide performance benefits. This is primarily due to the following factors:
- The new minidisk cache reads and caches whole tracks instead of individual blocks. The entire track is read in with one I/O, resulting in improved efficiency and reduced average access times.
- When the minidisk cache is placed in real storage, performance is improved relative to having the cache in expanded storage.
Commands have been added that allow you to:
- set and query the size of the cache
- set and query cache settings for a real device or for a minidisk
- purge the cache of data from a specific real device or minidisk
- change a user's ability to insert data into the cache
- bias the arbiter for or against minidisk caching.
The CP scheduler has been enhanced in two significant ways in VM/ESA 1.2.2. First, surplus processing from users who are not using their entire share will be given out to other users in a manner that is proportional to their shares. In prior releases, surplus processing was given to the ready-to-run user having the highest share.
Second, an installation is now able to specify a limit on the amount of processing resource a user may receive. Three types of limits are supported: NOLIMIT, LIMITSOFT, and LIMITHARD.
- NOLIMIT (the default) means that the user will not be limited. This results in scheduling that is equivalent to prior VM/ESA releases.
- LIMITSOFT means that the user will not get more than its share if there is any other user that can use the surplus without exceeding its own limit. However, if no other user can use the processing time, the limit will be overridden to give the user more than its share rather than letting the processor time go to waste.
- LIMITHARD means that the user will not get more than its share, even if no one else can use the surplus.
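The proportional distribution rule can be pictured with a small sketch (a deliberately simplified model for illustration only, not CP's actual scheduler code; the function names are invented):

```python
def distribute_surplus(shares, surplus):
    """VM/ESA 1.2.2 rule: hand out surplus processing in proportion
    to each ready user's relative share (simplified model)."""
    total = sum(shares.values())
    return {user: surplus * s / total for user, s in shares.items()}

def old_distribute_surplus(shares, surplus):
    """Pre-1.2.2 rule: the entire surplus went to the ready-to-run
    user with the highest share (simplified model)."""
    top = max(shares, key=shares.get)
    return {user: (surplus if user == top else 0.0) for user in shares}

shares = {"server": 120, "user1": 100, "user2": 100}
new_rule = distribute_surplus(shares, 64)       # server gets 24, others 20 each
old_rule = old_distribute_surplus(shares, 64)   # server gets all 64
```

Note how a server with only a slightly higher share no longer collects the entire surplus; this is the behavior change discussed under migration considerations later in this section.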
SPXTAPE saves spool files (standard spool files and system data files) on tape and restores saved files from tape to the spooling system. SPXTAPE provides upgraded and extended function compared with the SPTAPE command, enabling it to handle the large numbers of spool files used by VM/ESA customers.
In most cases, the elapsed time required to dump or load spool files is an order of magnitude less than with SPTAPE. The performance is good enough to make it practical to back up all spool files on a nightly basis if desired.
The large reduction in elapsed time is primarily due to the following factors:
- Spool files are written to tape in 32K blocks.
- Tape I/O is overlapped with DASD I/O.
- Many spool file blocks are read at one time using the paging subsystem. This results in reduced DASD access time and overlap of DASD I/Os that are done to different spool volumes.
- Multiple tape drives are supported. These are used to eliminate tape mount delays and increase the amount of overlap between tape I/O and DASD I/O.
SPTAPE writes a tape mark between each backed up spool file. SPXTAPE writes the spool file data as one tape file consisting of 32K blocks. This reduces the number of tape volumes required to hold the spool files. Relative to SPTAPE, reductions ranging from 30% to 60% have been observed. The smaller the average spool file size, the larger the reduction.
IUCV and APPC/VM processor usage was reduced substantially in VM/ESA 1.1.1 and VM/ESA 1.2.1. Processor usage has been further reduced in VM/ESA 1.2.2.
Prior to VM/ESA 1.2.2, all ISFC activity was serialized by the ISFC global lock. With VM/ESA 1.2.2, each active link has a link lock associated with it and the I/O-related functions of ISFC are now serialized by this link lock instead of by the ISFC global lock. This change reduces potential contention for the ISFC global lock, thus improving responsiveness and increasing the maximum amount of message traffic that ISFC can handle when there are multiple active links.
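The effect of this change can be pictured as replacing one global mutex with one mutex per link (an illustrative sketch, not ISFC's actual implementation):

```python
import threading

class Link:
    """Each active ISFC link now carries its own lock, so I/O-related
    processing on one link no longer blocks another (illustrative model)."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()

link_a, link_b = Link("A"), Link("B")

# With per-link locks, holding link A's lock does not block link B,
# whereas a single global lock would serialize both:
link_a.lock.acquire()
b_available = link_b.lock.acquire(blocking=False)
link_b.lock.release()
link_a.lock.release()
```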
VM/ESA's support of the ADMF hardware feature provides an extension to the existing channel subsystem that off-loads page move activity onto the I/O processor, freeing the instruction processor for other work while the page movement is performed. No external control or intervention is necessary. ADMF is made available to qualified guests provided that they:
- Have VM/ESA 1.2.2 or above running natively on the hardware
- Have the Dynamic Relocation Facility (DRF) available
- Are a preferred guest (V=R, V=F)
This enhancement provides conditional statements for TRSOURCE data traces so that the user can determine what data (if any) needs to be collected when a trace point is executed. If, on a given trace point occurrence, no data is to be collected, no trace record is cut. This capability allows high frequency code paths to be traced with minimal impact on system performance.
A clean start IPLs the system without attempting to recover spool files and system data files that existed prior to system shutdown. A clean start will typically take less time than a cold start because a cold start recovers the system data files.
APAR VM57456 includes three CP storage management changes that can benefit performance. Two of them affect the demand scan algorithm while the third change is to the expanded storage migration algorithm.
- Formerly, as soon as the available list was replenished to the high threshold, the demand scan was discontinued. With VM57456, when demand scan visits a virtual machine, it now steals all of that virtual machine's unreferenced page frames. (1) If that results in the high threshold being reached or exceeded, the demand scan is then discontinued.
This change is not expected to have any significant performance effect for most types of system usage. It is directed towards situations where a virtual machine is rapidly referencing large numbers of pages and many of those pages are not referenced again for a long time. With VM57456, these unneeded page frames are identified more quickly and returned to the system for other uses.
- Demand scan has changed how it handles pages that have been read in as part of a block but are never referenced. In some cases, it now steals such pages much more aggressively and does not page them out to expanded storage. The net effect is to further increase the preference that is given to referenced pages.
- An optimization has been added to expanded storage migration. A bit is maintained in invalid segment table entries that indicates whether or not any of that segment's pages reside in expanded storage. If there are none, that segment's PGMBK is not examined for migration candidates. The primary performance benefit occurs in the case where the PGMBK was previously paged out. In that case, this optimization avoids paging in the PGMBK in order to examine it.
This APAR is being made available on VM/ESA 1.2.2 and all earlier VM/ESA releases back to VM/ESA 1.1.1. It is especially applicable to VM systems that are running SQL/DS* with the VM Data Spaces Support Feature.
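The first of the demand scan changes can be sketched as follows (a simplified model whose essential point is "steal all of a visited machine's unreferenced frames, then test the threshold"; all names are illustrative):

```python
def demand_scan(unreferenced_frames_by_vm, high_threshold):
    """VM57456 behavior: visit virtual machines in turn, steal ALL of a
    visited machine's unreferenced frames, and only then test whether the
    high threshold has been reached or exceeded (simplified model)."""
    stolen = []
    for vm, frames in unreferenced_frames_by_vm.items():
        stolen.extend(frames)            # take every unreferenced frame
        if len(stolen) >= high_threshold:
            break                        # threshold met: discontinue scan
    return stolen

# A VM that rapidly referenced many now-unneeded pages gives them all
# back at once, even though only 3 frames were needed:
stolen = demand_scan({"bigvm": [1, 2, 3, 4, 5], "other": [6, 7]}, 3)
```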
This change reduces the number of dynamic storage requests made by CMS during SVC processing. This results in a reduction in processor requirements for a broad range of CMS functions.
In VM/ESA 1.2.1, block allocation was always done using a moving cursor algorithm. This method continues its scan for free blocks from the point where it left off, wrapping around when the end of the minidisk is reached. For normal minidisks on DASD, this algorithm is advantageous because it helps to keep a given file's blocks near to each other (and often contiguous). This reduces the number of I/Os and DASD access time. However, this algorithm is not as well suited to virtual disks in storage.
With VM/ESA 1.2.2, if the minidisk is a virtual disk in storage, the block allocation algorithm is changed so as to scan for blocks starting at the first available block on the minidisk. In this way, blocks that become available as files are erased are more quickly reused. As a result, the virtual disk in storage will tend to generate less paging activity and have fewer page frames associated with it.
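The two allocation strategies can be contrasted in a short sketch (illustrative only; CMS's actual data structures are not shown):

```python
def moving_cursor_alloc(free, cursor):
    """Scan from the cursor, wrapping at the end of the minidisk; keeps
    a file's blocks near each other, which suits real DASD (simplified)."""
    n = len(free)
    for i in range(n):
        blk = (cursor + i) % n
        if free[blk]:
            free[blk] = False
            return blk, (blk + 1) % n   # allocated block, new cursor
    raise RuntimeError("minidisk full")

def low_block_alloc(free):
    """VM/ESA 1.2.2 rule for virtual disks in storage: always scan from
    the first block, so freed blocks are reused quickly and the disk's
    page-frame footprint stays small (simplified)."""
    blk = free.index(True)
    free[blk] = False
    return blk

# Blocks 0 and 1 were just freed by an erase; the cursor sits at block 4.
free = [True, True, False, False, True, True]
cursor_blk, _ = moving_cursor_alloc(list(free), 4)   # picks block 4
low_blk = low_block_alloc(list(free))                # reuses block 0
```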
When one or more file blocks are released (from erasing a file, for example), those blocks normally become immediately available for use by other SFS requests once the change is committed. However, when SFS is required to maintain a consistent image of that file, or the directory or file space that file resides in, the released blocks cannot be made available for reuse until that requirement goes away. For example, if some other user currently has that file open for read, that file's blocks cannot be made available until that other user has closed the file. Other examples of read consistency include DIRCONTROL directories (which have ACCESS to RELEASE read consistency) and the DMSOPCAT interface.
Prior to VM/ESA 1.2.2, the way in which these blocks were managed could cause instances of high processor utilization in the SFS server in cases where very large numbers of deferred file blocks were involved. A change has been included in VM/ESA 1.2.2 that eliminates most of the processing time used to manage blocks that require deferred availability.
The processing associated with revoking authority from an SFS file or directory has been changed to reduce the likelihood of catalog I/O having to occur when there are still other users who have individual authorizations to that object.
The performance of the VMFBLD function has been improved. This will generally result in a decrease in elapsed time required to build a nucleus.
The automation of more service processing in VMSES/E R2.2 eliminates certain manual tasks. Therefore, the overall time required to do these tasks will decrease. The following automation functions have been added to VMSES/E in VM/ESA 1.2.2:
- The VMFPSU command automates planning for a Product Service Upgrade.
- The GENCPBLS command automates the process for modifying the CPLOAD EXEC when you make local modifications to HCPMDLAT MACRO.
With the enhancements to the CP minidisk cache feature in VM/ESA (ESA Feature), the following are potential items to consider when migrating from previous releases of VM/ESA (ESA Feature) or VM/XA SP. For more details see the VM/ESA: Planning and Administration book.
- Remove expanded storage from the system if it was added specifically for minidisk cache.
- Review storage allocation for minidisk cache.
- Use SET MDCACHE or SET RDEVICE commands instead of SET SHARED to enable minidisk cache on volumes.
- Enable caching for minidisks that were poor candidates in the past.
- Disable caching for minidisks that are poor candidates.
- Disable minidisk cache fair share limit for key users.
- Reformat some minidisks to smaller blocksize.
- Prepare for minidisk caching on devices shared between first and second level systems.
- Avoid mixing standard format and non-standard format records on the same cylinder.
ISFC pathlengths have increased in VM/ESA 1.2.2. This will ordinarily have no significant effect on overall system performance. However, applications that make heavy use of ISFC may experience some decrease in performance.
In prior releases, surplus processing was given to the ready-to-run user having the highest share. This has been changed in VM/ESA 1.2.2. Surplus processing from users who are not using their entire share is now given out to other users in a manner that is proportional to their shares.
For most installations, this change will either have no significant effect or result in improved performance characteristics. However, there may be cases where an installation's desired performance characteristics have an implicit dependency upon the old method of allocating excess share.
For example, consider a VM/ESA system where most of the users run at the default relative share setting of 100, an important server machine that does large amounts of processing has a relative share of 120, and there are several other virtual machines that have very large relative shares. Prior to VM/ESA 1.2.2, the server machine may have provided excellent performance, but only because it was preferentially receiving large amounts of unused share. With VM/ESA 1.2.2, that server machine's allocation of the excess share can become much smaller as a result of the new proportional distribution method, possibly resulting in periods of unacceptable server performance.
Before migrating to VM/ESA 1.2.2, then, check your virtual machine share allocations for situations like this. If you find any such case, increase that virtual machine's share allocation to more properly reflect that virtual machine's true processing requirements.
One side effect of the scheduler changes that were made to implement the proportional distribution of excess share is that there now tends to be somewhat less favoring of short transactions over long-running transactions. This shows up as increased trivial response time and somewhat decreased non-trivial response time. This effect is generally small, but is more significant on small processors where processing time is a larger percentage of overall response time.
The SET SRM IABIAS command can be used, if desired, to increase the extent to which the CP scheduler favors short transactions over longer ones. Doing so can result in an improvement in overall system responsiveness.
The default STORBUF settings have been increased from 100%, 85%, 75% to 125%, 105%, 95%. If your system is currently running with the default settings, you can continue to run with the old defaults by issuing SET SRM STORBUF 100 85 75.
Experience has shown that most VM/ESA systems run best with some degree of storage overcommitment. The new defaults are a reasonable starting point for systems that do not use expanded storage for paging. Systems that do use expanded storage for paging often run best with even larger overcommitment. You can do this either by specifying larger STORBUF values or by using SET SRM XSTORE to tell the scheduler what percentage of expanded storage to include when determining the amount of available storage for dispatching purposes. See for more information.
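The scheduler's storage admission test can be pictured roughly like this (a deliberately simplified model; CP actually keeps three STORBUF percentages for the different transaction classes, and its exact computation is not reproduced here):

```python
def available_storage(real_pages, xstore_pages, xstore_pct=0):
    """SET SRM XSTORE tells the scheduler what percentage of expanded
    storage to count when determining available storage (simplified)."""
    return real_pages + xstore_pages * xstore_pct / 100

def admits(current_ws_sum, candidate_ws, avail_pages, storbuf_pct):
    """Admit a user to the dispatch list while the total working set
    stays within STORBUF percent of available storage (simplified)."""
    return current_ws_sum + candidate_ws <= avail_pages * storbuf_pct / 100

avail = available_storage(real_pages=100_000, xstore_pages=50_000, xstore_pct=50)
ok_new_default = admits(100_000, 30_000, avail, 125)   # new default: admitted
ok_old_default = admits(100_000, 30_000, avail, 100)   # old default: held back
```

Raising the STORBUF percentages thus lets the scheduler overcommit storage further before holding users in the eligible list.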
VM/ESA 1.2.2 provides a new SNAPDUMP command that can be used to generate a system dump, identical to a hard abend dump, without bringing the system down. When using this command, keep in mind that:
- All activity in the system is stopped while the dump is in progress.
- The elapsed time required to take the dump is similar to the elapsed time required to obtain a system abend dump.
- The elapsed time can be reduced by using the SET DUMP command to restrict the dump to just those areas that are required.
This change will allow applications running on GCS to obtain performance benefits by using VM Data Spaces through use of the existing CP macros. When running in an XC mode virtual machine, this support requires some additional processing by GCS. For example, the access registers must be saved, along with the general registers, whenever a GCS supervisor call is made. To avoid this additional processing, do not run GCS in an XC mode virtual machine unless you are running a GCS application that makes use of data spaces.
A number of new monitor records and fields have been added. Some of the more significant changes are summarized below. For a complete list of changes, see the MONITOR LIST1403 file for VM/ESA 1.2.2. See for information about this file.
- Scheduler Monitor Enhancements
The monitor has been enhanced to provide data on the new maximum share feature of the scheduler. This includes the maximum share setting in the user configuration and scheduler records, a system count of the users currently in the limit list, and the rate at which users are added to the limit list. A state counter for users in the limit list has been added to high frequency user sampling and system sampling.
Data related to existing scheduler features has also been added. This includes the following fields:
- The amount of total storage considered available when making scheduler decisions.
- The sum of the working sets for users in the various dispatch classes.
- The percentage of expanded storage to use in available memory calculations as set by the SET SRM XSTORE command.
- Minidisk Cache Monitor Changes
The new minidisk cache required that several changes be made to related monitor data. Several previously existing fields contain minidisk cache information. The source (control block field) for these monitor fields has been changed to maintain compatibility where possible. Some of the existing fields are no longer meaningful because of the different design of the new minidisk cache; these have been made reserved fields. In addition, some new information has been added to the monitor:
- The system-wide setting for minidisk cache.
- Real storage usage by minidisk cache.
- Expanded storage usage by minidisk cache.
- Related settings for individual virtual machines (NOMDCFS and NOINSERT).
- Cache eligibility settings for each real device.
- Improved User State Sampling
The accuracy of high frequency state sampling for virtual machines has been improved. Previously, the "running" state was skewed low; on uniprocessor machines, the percentage of time spent in the "running" state was shown as zero because, at the moment virtual machines are sampled, it is CP (the monitor) that is actually running. The skewing decreases on an n-way as n increases, but it is still present. This has been corrected in VM/ESA 1.2.2 by checking whether a user virtual machine has been displaced by the monitor and, if so, marking that virtual machine as running.
- Other Monitor Enhancements
Other information added to the monitor includes:
- Information on processor spin time for formal spin locks.
- Information in the user domain for the new "logon by" feature.
- Indication of Asynchronous Data Mover installation.
Two of the INDICATE commands have been extended to accommodate the scheduler and minidisk caching changes.
- Another line is added to the response from INDICATE LOAD to show the number of users who are currently in the limit list. The limit list is introduced in VM/ESA 1.2.2 by the new maximum share scheduler function. This list represents the subset of users on the dispatch list who are currently being prevented from running because they would exceed their maximum share setting.
The response from INDICATE LOAD has been changed slightly to reflect the fact that the minidisk cache can reside in real storage as well as expanded storage. When minidisk caching is being done in both real and expanded storage, the numbers shown reflect the combined benefits of both caches.
In VM/ESA 1.2.1, the MDC hit ratio is computed on a block basis. In VM/ESA 1.2.2, it is computed on an I/O basis.
- In the INDICATE QUEUES response, a user who is currently in the limit list will be designated as L0, L1, L2, or L3. The current Q0, Q1, Q2, and Q3 designations will be shown for users who are in the dispatch list and not on the limit list.
As mentioned in the discussion of monitor changes, the new minidisk cache design means that some of the MDC performance measures no longer apply, while others have a somewhat different meaning. Because of this, you should exercise caution when comparing the performance of minidisk caching on VM/ESA 1.2.2 with the performance of minidisk caching when the same system was running an earlier VM/ESA release.
The MDC hit ratio is especially unsuitable for comparing minidisk caching performance between VM/ESA 1.2.2 and a prior VM/ESA release.
- Because the enhanced minidisk cache lifts a number of restrictions, it is likely that a significant amount of data that was ineligible will now start participating in the minidisk cache. This is likely to affect the MDC hit ratio. If there is some constraint on the MDC size, this additional data may well cause the hit ratio to go down. At the same time, however, the number of real I/Os that are avoided is likely to go up because these additional DASD areas now benefit from minidisk caching.
- In VM/ESA 1.2.1, the MDC hit ratio is computed on a block basis. In VM/ESA 1.2.2, it is computed on an I/O basis. This difference can sometimes result in a significant difference in the computed hit ratio.
- There is another important difference if you are looking at RTM data. In VM/ESA 1.2.1, the MDC hits are only divided by those DASD reads that are eligible for minidisk caching. In VM/ESA 1.2.2, the MDC hits are divided by all DASD reads (except for page, spool, and virtual disk in storage I/O). This can lead to MDC hit ratios that appear lower on VM/ESA 1.2.2 than were experienced on earlier releases, even though minidisk caching may actually be more effective.
To avoid these problems, look at I/Os avoided instead. I/Os avoided is a bottom-line measure of how effective MDC is at reducing DASD I/Os. Further, this measure is very similar in meaning between VM/ESA 1.2.2 and prior VM/ESA releases. RTM VM/ESA provides I/Os avoided on a system basis. (For VM/ESA 1.2.1, look at MDC_IA on the SYSTEM screen. For VM/ESA 1.2.2, look at VIO_AVOID on the new MDCACHE screen.) VMPRF's DASD_BY_ACTIVITY report shows I/Os avoided on a device basis.
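The divergence between the two hit ratio computations can be seen in a small sketch (how CP scores a partially satisfied I/O is an assumption here; treat this only as an illustration of why the ratios differ):

```python
ios = [
    (8, 8),   # (blocks requested, blocks found in cache): full hit
    (8, 4),   # half of the blocks were cached
    (8, 0),   # complete miss
]

def hit_ratio_block_basis(ios):
    """VM/ESA 1.2.1 style: cached blocks over requested blocks."""
    return sum(hit for _, hit in ios) / sum(req for req, _ in ios)

def hit_ratio_io_basis(ios):
    """VM/ESA 1.2.2 style: count each I/O once; here an I/O is scored a
    hit only when every block it needs is in the cache (an assumption)."""
    return sum(1 for req, hit in ios if hit == req) / len(ios)

block_ratio = hit_ratio_block_basis(ios)   # 12/24 = 0.5
io_ratio = hit_ratio_io_basis(ios)         # 1/3
```

The same cache activity yields two noticeably different numbers, which is why hit ratios should not be compared across the release boundary.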
The MONVIEW package is a set of tools that can assist you in looking at raw VM/XA or VM/ESA monitor data. It accepts monitor data from tape or disk and creates a CMS file with a single record for each monitor record. Options exist to translate the header of monitor data for domain/record/timestamp.
MONVIEW is provided on an "as is" basis, and is installed as samples on the MAINT 193 disk.
The real storage requirements of a CMS-intensive environment can often be reduced by placing the frequently executed S-disk modules into a logical segment so that one copy is shared by all users. This used to be done as an extra step following VM/ESA installation. With VM/ESA 1.2.2, this has now been integrated into the VM/ESA installation process. Two logical segments are used: one for modules that can run above the 16 meg line and one for modules that cannot. A discussion of how to manually create a logical segment for modules has been retained in for reference by those who wish to customize this step.
The following list describes fields in the virtual machine resource usage accounting record (type 01) that may be affected by performance changes in VM/ESA 1.2.2. The columns where the field is located are shown in parentheses.
- Milliseconds of processor time used (33-36)
- This is the total processor time charged to a user and includes both CP and emulation time. For most workloads, this should not change much as a result of the changes made in VM/ESA 1.2.2. Most CMS-intensive workloads are expected to experience little change in virtual processor time and a slight decrease in CP processor time. I/O-intensive environments that are set up to use the enhanced minidisk cache and were not using minidisk caching prior to VM/ESA 1.2.2 can experience larger decreases in total processor time (up to 6%).
- Milliseconds of Virtual processor time (37-40)
- This is the virtual time charged to a user. As mentioned above, little change is expected for most workloads.
- Requested Virtual nonspooled I/O Starts (49-52)
- This is a total count of requests; not all requests necessarily complete. The value of this field should see little change in most cases.
- Completed Virtual nonspooled I/O Starts (73-76)
- This is a total count of completed requests. The value of this field should see little change in most cases.
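As a sketch of how these columns might be picked out of a record (the 4-byte big-endian binary encoding assumed here is an assumption; verify it against the documented accounting record layout before relying on it):

```python
import struct

# 1-based column ranges from the text, converted to 0-based slices.
FIELDS = {
    "cpu_ms":         (32, 36),   # cols 33-36: total processor time
    "virtual_cpu_ms": (36, 40),   # cols 37-40: virtual processor time
    "vio_requested":  (48, 52),   # cols 49-52: requested virtual I/O starts
    "vio_completed":  (72, 76),   # cols 73-76: completed virtual I/O starts
}

def type01_fields(record: bytes) -> dict:
    """Extract the performance-related fields from an 80-byte type 01
    accounting record, assuming 4-byte big-endian unsigned counts."""
    return {name: struct.unpack(">I", record[lo:hi])[0]
            for name, (lo, hi) in FIELDS.items()}

# Build a dummy record to show the slicing:
rec = bytearray(80)
rec[32:36] = (1500).to_bytes(4, "big")   # 1500 ms total processor time
rec[36:40] = (1200).to_bytes(4, "big")   # 1200 ms virtual processor time
fields = type01_fields(bytes(rec))
```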
RTM VM/ESA 1.5.2 has been updated to include performance data for the new minidisk cache in VM/ESA 1.2.2. Most of this data is provided in two new screens--MDCACHE and MDLOG. The remaining data is provided as updates to the existing SYSTEM and XTLOG screens.
The calculation of the minidisk cache hit ratio (MDHR) has been changed in two ways.
- In VM/ESA 1.2.1, the MDC hit ratio is computed on a block basis. In VM/ESA 1.2.2, it is computed on an I/O basis. This difference is also the case for the MDC hit ratio reported by the INDICATE LOAD command.
- In VM/ESA 1.2.1, the MDC hits are only divided by those DASD reads that are eligible for minidisk caching. In VM/ESA 1.2.2, the MDC hits are divided by all DASD reads (except for page, spool, and virtual disk in storage I/O). This difference only applies to the MDC hit ratio as reported by RTM. The MDC hit ratio reported by the INDICATE USER command continues to be MDC hits divided by MDC-eligible DASD reads.
(1) During demand scan pass 1, this is all unreferenced page frames on the unreferenced portion of the user-owned frame list, subject only to that virtual machine's reserved page count (if any). On average, these frames have not been referenced for 1.5 reorder intervals.