IBM: VM/ESA 1.2.0 Performance Changes

Performance Improvements
Performance Considerations
Performance Management

Performance Improvements

CP Free Storage

The code that handles short-term free storage requests was redesigned to reduce the processor requirements of this CP function. The processing time in HCPFRE was reduced by over one-third. The degree of benefit is proportional to the frequency of free storage requests.

In addition, two new macros were created that normally obtain or release a block of free storage without calling HCPFRE. These macros are intended for use by performance-critical CP components and are not for general use. They are currently exploited by three other performance enhancements: HCPCFR (described in "Reduced Master Processor Usage"), IUCV (described in "IUCV Storage Management"), and fast path CCW translation (described in "Fast Path CCW Translation Extensions").

IUCV Storage Management

In VM/ESA Version 1 Release 1.1, one of the several changes to IUCV that led to improved performance was the more efficient storage management of control blocks MSGBK and IUSBK. Storage management was improved by making the control blocks semi-permanent and by exploiting stack management. This reduced the processor usage of HCPFRE.

However, this improvement made short-term garbage collection less efficient, because the control blocks were held for several minutes (long-term), which caused storage fragmentation. In an attempt to improve short-term garbage collection in VM/ESA 1.2.0, the control blocks were instead obtained from long-term storage. On large systems, however, the long-term storage algorithm showed large-system effects when obtaining this storage, which caused unacceptable response times and internal throughput rates.

APAR VM54161 to VM/ESA 1.2.0 corrects three problems by using inline macros to get the control blocks out of short-term storage.

Large-system effect
This is avoided because the control blocks come out of short-term storage. There is no net improvement over VM/ESA 1.1.1. However, avoiding the large-system effect is the primary reason for this APAR.
HCPFRE calls
By using the inline macros, calls are avoided to HCPFRE. This is an improvement over VM/ESA 1.1.1.
Short-term garbage collection
Storage fragmentation does not occur, because storage is no longer held long-term. This is an improvement over VM/ESA 1.1.1.

For more information on the inline macros, see "CP Free Storage".

Reduced Master Processor Usage

In VM/ESA, one method of serialization is to limit certain functions to run on only the master processor. Only one processor is designated as the master in VM/ESA. The IPLed processor is usually the master. However, if the IPLed processor has a special feature (vector or crypto) installed, then another processor without a special feature is selected as the master. The master can also be changed by disabling processors (using the VARY OFFLINE PROCESSOR command). You can determine which is the master processor by issuing the CP QUERY PROCESSORS command.

Because there is only one master processor, it has the potential to be a bottleneck if the demand for master-only work is great. As the number of processors in the complex increases, the potential for a master processor bottleneck increases. For example, a master processor bottleneck is more likely on a 6-way system than on a 2-way system.

Two changes were made in VM/ESA 1.2.0 that reduce the master processor requirements. Neither change involved moving work off the master, because moving work off the master would involve replacing the serialization methodology, a nontrivial task. Instead, these changes improve the efficiency of processing on the master. As a result, they also have a positive effect for environments that are not master processor constrained. The two items are as follows:

HCPCNF improvement - DIAGNOSE code X'08' handling
HCPCFR improvement - storage management for the console formatter

The HCPCNF module formats console output from DIAGNOSE code X'08' processing. This DIAGNOSE code issues CP commands. The modification applies to console output that goes to a buffer, not the user's screen. This would apply to programs that use EXECIO or CMS Pipelines to capture output from CP commands.

CP takes special precautions to prevent a single user from flooding the system if the user issues DIAGNOSE code X'08's that generate a lot of output. Prior to VM/ESA 1.2.0, each time this DIAGNOSE code was issued, CP opened a dispatch window from HCPCNF to allow another user to run. This kept the virtual machine that repeatedly issued the DIAGNOSE code from flooding the system with work.

In VM/ESA 1.2.0, the frequency of HCPCNF opening the dispatch window is changed. Instead of opening the dispatch window every time, it is opened approximately every sixteenth time. This eliminates most of the overhead while maintaining the protection feature.
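The throttling idea can be illustrated with a small counter. This is only a sketch under stated assumptions: the class name, the exact interval of 16, and the per-caller counter are hypothetical stand-ins for whatever HCPCNF does internally.

```python
WINDOW_INTERVAL = 16  # assumed from "approximately every sixteenth time"


class DispatchWindowThrottle:
    """Hypothetical model: decide when a repeated request should yield."""

    def __init__(self, interval: int = WINDOW_INTERVAL) -> None:
        self.interval = interval
        self.count = 0

    def should_open_window(self) -> bool:
        # Count each request; open the dispatch window only on every
        # interval-th request instead of on every request.
        self.count += 1
        if self.count >= self.interval:
            self.count = 0
            return True
        return False


throttle = DispatchWindowThrottle()
opened = sum(throttle.should_open_window() for _ in range(160))
print(opened)  # 160 requests open the window 10 times
```

The protection is kept because a user who issues the request in a tight loop still yields regularly, but most requests skip the cost of opening the window.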

The HCPCFR module is not a master-only module. However, it is usually called by master-only modules, and therefore ends up completing on the master processor. This module handles console output formatting related to CP commands. In the OfficeVision* environment, it is called frequently. The improvement in HCPCFR involved replacing costly CP free storage calls with the new inline macros. These inline macros are new in VM/ESA 1.2.0. See the "CP Free Storage" section for additional information on them.

Improving the efficiency of master processor work reduces processor consumption. In addition, master processor constrained environments may also see significant improvement in response times. However, the great majority of systems are not master processor constrained. A system probably has a meaningful master processor constraint if the processor utilization of the master processor is much higher than the utilization of the other processors, and if the percent emulation time and idle time on the master processor are less than 5%.

Monitor Enhancement

This enhancement applies to VM systems that run with user state sampling (MONITOR SAMPLE ENABLE USER). When user state sampling is enabled, HCPMOU is called once for every logged on or disconnected user at the end of each high frequency sample interval. In VM/ESA 1.2.0, HCPMOU no longer calls HCPFRE to obtain or release free storage for a module work area. Instead, HCPMOU uses leftover storage in the save block (used for linkage). Reducing the number of free storage calls reduces processor usage and improves response time.

The number of calls avoided is proportional to the number of users and the monitor sample rate.
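As a rough illustration of that proportionality, the following sketch estimates the number of HCPFRE calls avoided. It assumes one obtain and one release per HCPMOU invocation, which the text implies but does not state as a count; the function and parameter names are hypothetical.

```python
def hcpfre_calls_avoided(users: int, samples: int,
                         calls_per_invocation: int = 2) -> int:
    """Estimate HCPFRE calls avoided.

    users: logged-on plus disconnected users
    samples: number of high-frequency sample intervals that elapsed
    calls_per_invocation: assumed 2 (one obtain, one release)
    """
    return users * samples * calls_per_invocation


# 1000 users sampled 60 times
print(hcpfre_calls_avoided(1000, 60))  # 120000
```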

DASD Slot Allocation

The DASD slot allocation algorithm, which gives out page and spool slots, was redesigned to take advantage of block paging and to reduce I/O seek times by increasing the number of contiguous DASD slots.

As in the past, a moving cursor approach is used. However, the new scheme allocates contiguous slots as much as possible by scanning ahead of the cursor for groups of contiguous available slots of appropriate sizes that reside on the same cylinder. This is accomplished in two ways:

These groups of contiguous available slots have a minimum size of two. Anything less than this minimum size is ignored. The one with the best fit is selected.
Allocation no longer switches volumes in the middle of allocating a block. Instead it may switch a little prematurely or a little late.
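The scan-ahead and best-fit selection described above can be sketched as follows. This is a simplified model, not CP's algorithm: a cylinder is represented as a list of booleans, and "best fit" is assumed to mean the smallest group that satisfies the request, falling back to the largest group when none is big enough. All names are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_RUN = 2  # groups smaller than this are ignored (from the text)


def free_runs(slots: List[bool], cursor: int) -> List[Tuple[int, int]]:
    """Return (start, length) of each free run at or after the cursor."""
    runs = []
    start = None
    for i in range(cursor, len(slots)):
        if slots[i]:  # True means the slot is available
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(slots) - start))
    return [r for r in runs if r[1] >= MIN_RUN]


def best_fit(slots: List[bool], cursor: int,
             wanted: int) -> Optional[Tuple[int, int]]:
    """Pick the run that best fits a request for `wanted` slots."""
    runs = free_runs(slots, cursor)
    if not runs:
        return None
    big_enough = [r for r in runs if r[1] >= wanted]
    if big_enough:
        # Smallest sufficient run; earliest one on ties.
        return min(big_enough, key=lambda r: (r[1], r[0]))
    # Assumed fallback: the largest run available.
    return max(runs, key=lambda r: (r[1], -r[0]))


cylinder = [c == "1" for c in "0101111101110"]
print(best_fit(cylinder, 0, 3))  # (9, 3)
print(best_fit(cylinder, 0, 6))  # (3, 5)
```

The single free slot at index 1 is never chosen because it falls below the minimum group size.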

Environments that do significant paging to DASD should see an improvement in response time because of faster page fault resolution and a reduction in I/Os on the paging volumes. The benefits increase as the system ages. For a CMS-intensive benchmark, the average pages per page I/O increased 13% as a result of this improvement.

Fast Path CCW Translation Extensions

Fast path CCW translation was introduced as APAR VM51012 to VM/ESA 1.1.1.

Fast path CCW translation was extended in VM/ESA 1.2.0 to include support for FBA DASD. In addition, it now applies to more types of DASD channel programs. Fast path CCW translation also takes advantage of two new macros that normally obtain or release free storage without having to call HCPFRE.

Typical DASD I/O such as SSCH (Start Subchannel), DIAGNOSE code X'A8', SIO (Start I/O), and SIOF (Start I/O fast) will benefit from this. However, not all I/O is DASD I/O. For example, if the system is reporting a high rate of DIAGNOSE code X'A8's, many of these could be for unit record devices, which do not benefit from this improvement.

In VSE guest environments that are heavy in DASD I/O, a significant performance improvement can be gained by this enhancement. In a CMS-intensive environment, where most minidisk I/Os are DIAGNOSE code X'A4's and SFS I/Os are block I/O, the performance improvement is less significant. Fast path CCW translation does not apply to these cases because they take a different path through CP's I/O subsystem. The input to CP is not an actual channel program, but rather input from which CP builds the channel program.

DSPSLICE Calculation Change

CP initialization establishes a minor time slice for the dispatcher to use when running virtual machines. This code calculates the amount of time to complete about 100 000 selected instructions. The result is used as the dispatching minor time slice. Installations can change the size of the time slice with the CP SET SRM DSPSLICE command, but a reasonable default value is important. As the minor time slice becomes too small, the cost to run and to stop running users goes up. As the time slice is increased, the responsiveness of the system may decrease.

Before VM/ESA 1.1.1, the minimum value for the minor time slice was 1 millisecond. If the CP initialization code calculated a value less than 1 millisecond, CP used 1 millisecond instead. On some large S/390* processors, this led to minor time slices of 2 milliseconds or less. Such a small value caused a noticeable increase in the CP processor time needed to run and to stop running users. In VM/ESA 1.1.1, the minimum default time slice was changed to 5 milliseconds because of this problem.

Even after this change, the time slice value for some of the smaller S/390 processors remained too small. Many of the instructions used by the initialization code to determine the minor time slice were too short relative to a typical instruction mix and this discrepancy was particularly pronounced on smaller S/390 processors. Although the default minor time slice for these processors fell above the 5 milliseconds minimum introduced in VM/ESA 1.1.1, the value was still too low for efficient dispatching of work.

VM/ESA 1.2.0 changes the loop of instructions used to determine the minor time slice. The number of instructions completed is still about the same, but on average the instructions are longer. This loop produces a reasonable minor time slice value for the whole range of S/390 processors. The minimum default time slice is still 5 milliseconds, and the maximum remains at 100 milliseconds.
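The bounds on the default value amount to a simple clamp. In this sketch, `measured_ms` stands for the time the initialization loop took; the function name is hypothetical, and only the 5 and 100 millisecond limits come from the text.

```python
MIN_SLICE_MS = 5.0    # minimum default minor time slice
MAX_SLICE_MS = 100.0  # maximum default minor time slice


def default_minor_time_slice(measured_ms: float) -> float:
    """Clamp the measured loop time to the allowed default range."""
    return max(MIN_SLICE_MS, min(MAX_SLICE_MS, measured_ms))


print(default_minor_time_slice(2.0))    # 5.0
print(default_minor_time_slice(40.0))   # 40.0
print(default_minor_time_slice(250.0))  # 100.0
```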

There are two principal benefits of this change:

Installations with small S/390 processors do not have to issue the CP SET SRM DSPSLICE command to achieve reasonable dispatching costs.
Installations with small S/390 processors that do not issue CP SET SRM DSPSLICE to increase the minor time slice may see lower CP processor costs. When the DSPSLICE is increased, workloads consisting of long-running transactions experience the largest dispatching efficiency benefits.

The DSPSLICE value is reported in monitor data and, with less precision, by the QUERY SRM DSPSLICE command.

Free Storage Limit Detection

In prior releases, a virtual machine could request excessive amounts of CP free storage, which could disable the system and be seen as performance degradation or an empty available list. This situation could be caused by:

Repeatedly issuing CP commands that consume free storage
Performing tasks in a disconnected machine that caused large amounts of console output to be routed to a secondary user
Accounting or EREP records not being retrieved by the accounting or EREP virtual machine

In VM/ESA 1.2.0, the free storage limit detection function tracks free storage so that a virtual machine cannot request excessive amounts. Three thresholds are calculated based on the size of the dynamic paging area. If the virtual machine exceeds the first threshold, it receives a warning message. If the virtual machine exceeds the second threshold, it is put into stopped state. If the virtual machine exceeds the third threshold, it is forced off the system. Messages are sent to the virtual machine and the operator user ID for each threshold crossed.
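The escalation across the three thresholds can be sketched as below. The text does not give the formula that derives the thresholds from the dynamic paging area, so this sketch takes them as plain inputs; the names and the unit (pages) are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    NONE = "no action"
    WARN = "warning message"
    STOP = "stopped state"
    FORCE = "forced off the system"


def limit_action(pages_in_use: int,
                 thresholds: Tuple[int, int, int]) -> Action:
    """Map a virtual machine's free storage usage to the resulting action."""
    first, second, third = thresholds
    # Check the highest threshold first so the most severe action wins.
    if pages_in_use > third:
        return Action.FORCE
    if pages_in_use > second:
        return Action.STOP
    if pages_in_use > first:
        return Action.WARN
    return Action.NONE


limits = (100, 200, 300)  # illustrative values only
for usage in (50, 150, 250, 301):
    print(usage, limit_action(usage, limits).value)
```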

Four new CP commands were added. The QUERY STGLIMIT command (privilege classes: A, B, C, and E) checks the state of free storage limit detection for the system. The SET STGLIMIT command (privilege classes: A, B, and C) controls the state of free storage limit detection for the system. The QUERY STGEXEMP command (privilege classes: A, B, C, and E) checks the state of free storage limit detection for virtual machines. The SET STGEXEMP command (privilege classes: A, B, and C) controls the state of free storage limit detection for virtual machines.

Exempting a user ID ensures that the user ID is not subject to being stopped or forced off because of the amount of free storage it causes CP to consume. This is recommended for special purpose user IDs that are vital to the installation, user IDs running trusted code, and user IDs that should never be forced off the system. If no action is taken, free storage limits are enforced for all users.

CP Configurability and the Frame Table

For VM/ESA 1.2.0, the value that CP uses to determine the size of the frame table to build has changed. Prior to VM/ESA 1.2.0, CP used the RMSIZE value generated in HCPSYS. In VM/ESA 1.2.0, CP builds the frame table for the smaller value of either the actual real storage or RMSIZE of the SYSTORE macro in HCPSYS. With this improvement, the system may use less fixed real storage for the frame table.

The reported SYSGEN value for the QUERY FRAMES command was changed to reflect the smaller value of either actual real storage or RMSIZE.
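The sizing rule reduces to taking the smaller of two values. The sketch below assumes one frame table entry per 4KB real storage frame; the function name and the example sizes are hypothetical.

```python
PAGE_SIZE = 4096  # bytes per real storage frame


def frame_table_entries(actual_real_bytes: int, rmsize_bytes: int) -> int:
    """Number of frames the frame table must describe in VM/ESA 1.2.0.

    Assumes one entry per 4KB frame. Earlier releases would have sized
    the table from rmsize_bytes alone.
    """
    return min(actual_real_bytes, rmsize_bytes) // PAGE_SIZE


MB = 1024 * 1024
# RMSIZE generated as 512MB on a machine with 256MB of real storage
print(frame_table_entries(256 * MB, 512 * MB))  # 65536
```

In this example the table covers half as many frames as it would if it were sized from RMSIZE alone, which is where the saving in fixed real storage comes from.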

The frame table is no longer built as part of the CP nucleus. It is built in the dynamic paging area. This reduces the size of the CP nucleus, which reduces the time to read the nucleus at system IPL.

Read-Only Minidisks

The format of the FSTs (File Status Tables) for accessed read-only CMS minidisk directories was improved. The old layout, called hyperblock format, alternated a 30-byte header with a block of FSTs, where the size of the block equaled the minidisk blocksize. The new layout has one header followed by all the FSTs. These FSTs are organized so that the blocks of FSTs in the hyperblock map are aligned on page boundaries. This structure has been in use for some time for the S and Y disks.

During FST lookup, the hyperblock map points to the page of desired FSTs. If the desired page does not exist, only one page of FSTs is referenced. Previously, the hyperblock of FSTs would span pages, causing two pages to be referenced. For 4KB-blocked minidisks with an average-to-large number of files, this new layout typically eliminates one referenced nonshared page per read-only minidisk (other than S and Y) in the search order. Note that this applies only to read-only minidisks, not read-only SFS directories.

CMS Record Manager

Improvements were made to the CMS record manager (DMSRCM) to increase from 8 to 500 the maximum number of 4KB blocks that DMSRCM reads or writes with one DIAGNOSE code X'A4'. This can result in reduced DIAGNOSE code X'A4's, processor usage, I/O time, and device usage. This improvement applies to minidisk files used by applications (like XEDIT) that specify a large buffer. Most applications are unaffected by this improvement, and it does not apply to SFS.
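The effect on the number of DIAGNOSE code X'A4' requests can be estimated with a ceiling division. This sketch assumes the application buffer is large enough that every request transfers the maximum number of blocks; the function name is hypothetical.

```python
def diag_a4_requests(blocks: int, max_blocks_per_request: int) -> int:
    """Requests needed to transfer `blocks` 4KB blocks (ceiling division)."""
    return -(-blocks // max_blocks_per_request)


# A 1000-block transfer before and after the change
print(diag_a4_requests(1000, 8))    # 125
print(diag_a4_requests(1000, 500))  # 2
```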

CMS Storage Management

Since CMS 5.5, page boundary alignment was forced for storage requests greater than or equal to 4KB. In VM/ESA 1.2.0, this is no longer done. As a result, applications that do a large number of requests greater than or equal to 4KB may see a reduction in virtual or real storage requirements and paging. However, if no such requests are made, it costs the CMS user one more referenced nonshared page.

For those applications that require page boundary alignment, CMSSTOR has an option to do this.

CMS Application Multitasking

CMS provides a multitasking environment for applications and servers. The multitasking facilities are available only at the application programming level and are provided as routines in a callable service library. The CMS user still runs one application at a time, but the application can split itself into multiple threads, and the multiple threads can be dispatched on multiple virtual processors. These multitasking facilities allow applications to harness the power of the underlying multiprocessor complex and to overlap operations, achieving high performance. Multiprocessor exploitation is supported in XA or XC mode virtual machines only.

NAMEFIND Improvement

In VM/ESA 1.2.0, improvements were made to the processing of the NAMES file. These improvements include caching the NAMES file in storage and optimizing the search for tags. The degree of improvement depends on the size of the NAMES file and the number of names requested; the larger the NAMES file and number of requested names, the larger the improvement.

REXX Storage Management

REXX storage management APARs VM47302 and VM50916 for VM/ESA 1.1.1 can improve performance. These REXX storage management improvements are part of VM/ESA 1.2.0. Most of the performance change in REXX from VM/ESA 1.1.1 to VM/ESA 1.2.0 is because of these changes.

The improvements are to obtain or release storage in larger, page-aligned areas. By obtaining larger areas, there are fewer calls to CMS storage management. This reduces pathlength, but depending on the REXX program, some additional virtual storage may be required.

By obtaining page-aligned areas, a header address can be calculated by clearing the low order bits rather than scanning a chain. Removing this scan can reduce pathlength and thrashing when REXX releases variable storage.
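The address calculation can be illustrated as follows. The sketch assumes each area is a power-of-two size (4KB here), aligned on its own size, with its header at the start of the area; the actual REXX area size and layout are not given in the text, and the names are hypothetical.

```python
from typing import Iterable

AREA_SIZE = 4096  # assumed area size; must be a power of two


def header_by_mask(addr: int) -> int:
    """Find the area header by clearing the low-order address bits."""
    return addr & ~(AREA_SIZE - 1)


def header_by_scan(addr: int, headers: Iterable[int]) -> int:
    """Older style: walk a chain of headers until one contains addr."""
    for header in headers:
        if header <= addr < header + AREA_SIZE:
            return header
    raise ValueError("address not in any area")


chain = [0x00120000, 0x00121000, 0x00122000, 0x00123000]
addr = 0x00123ABC
print(hex(header_by_mask(addr)))         # 0x123000
print(hex(header_by_scan(addr, chain)))  # 0x123000
```

The mask costs one operation regardless of how many areas exist, while the scan touches every header ahead of the target, which is the source of the extra pathlength and page references.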

REXX and Global Variables

The REXX VALUE function was extended in VM/ESA 1.2.0 to permit manipulation of global variables (as is done by the CMS GLOBALV command). Prior to this, manipulation of global variables from REXX required use of the GLOBALV CMS command. The VALUE function is much faster than GLOBALV when there is only one variable involved. When working with a list of variables, the GLOBALV command has an advantage because it supports a list of variable names.

SFS Checkpoint

The checkpoint process consists of writing back to DASD all changed catalog blocks and releasing all shadow blocks allocated since the last checkpoint. The checkpoint routine serializes the operation of the server before beginning this process. The checkpoint process was changed in two ways to reduce the amount of time that the server is serialized. First, the writeback of changed catalog buffers now exploits multiblock block I/O. Second, writeback of changed catalog buffers is now done before checkpoint serialization starts. This is called preflush. Any catalog buffer changes that may have occurred after the preflush and before the checkpoint is serialized are still written back during the serialization period.

These improvements benefit users of read/write SFS file pools. The benefit in reduced checkpoint serialization time results in a lower average response time, a more consistent response time, and a reduced SFS server working set. In addition, the reduced checkpoint time allows a larger CATBUFFERS setting for better exploitation of large real memories.

From QUERY FILEPOOL STATUS, checkpoint duration can be calculated by dividing checkpoint time by checkpoints taken.
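A minimal sketch of that calculation, assuming the two counters have already been read from the command output; the parameter names are hypothetical, and the result is in whatever time unit the checkpoint time counter uses.

```python
def average_checkpoint_duration(checkpoint_time: float,
                                checkpoints_taken: int) -> float:
    """Average duration per checkpoint, in the counter's own time unit."""
    if checkpoints_taken == 0:
        return 0.0  # no checkpoints yet, so there is nothing to average
    return checkpoint_time / checkpoints_taken


print(average_checkpoint_duration(12000, 48))  # 250.0
```

Comparing this value before and after migration gives a direct view of the reduced checkpoint time.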

SFS Catalog Insert Algorithm

The SFS catalog insert algorithm was changed to remember the location of the last inserted row and begin the search for space at that location. The benefits from this change are reduced processor usage and, in some cases, such as very large catalog spaces, reduced I/Os. The degree of benefit is proportional to the rate of inserts into the catalog and the catalog size. Typically, most catalog inserts come from GRANT AUTH and files that are larger than 32KB.
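The idea of resuming the search where the last insert succeeded can be sketched with a toy model. Here catalog space is a list of pages, each with a count of free rows; the class, its fields, and the wraparound search are hypothetical and much simpler than the real catalog structure.

```python
from typing import List


class CatalogSpace:
    """Toy catalog: remembers where the last row was inserted."""

    def __init__(self, free_rows_per_page: List[int]) -> None:
        self.free = list(free_rows_per_page)
        self.last = 0    # page of the most recent insert
        self.probes = 0  # pages examined so far, as a cost measure

    def insert(self) -> int:
        """Insert one row, searching from the remembered location."""
        pages = len(self.free)
        for step in range(pages):
            page = (self.last + step) % pages
            self.probes += 1
            if self.free[page] > 0:
                self.free[page] -= 1
                self.last = page
                return page
        raise RuntimeError("catalog space is full")


catalog = CatalogSpace([0, 0, 0, 2])
print(catalog.insert(), catalog.probes)  # 3 4
print(catalog.insert(), catalog.probes)  # 3 5
```

The first insert pays for skipping the three full pages; the second starts at the remembered page and examines only one. A search that always began at page 0 would have examined four pages each time.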

SFS Log Manager

The SFS log manager was enhanced to exploit multiblock block I/O. This enhancement eliminates log writes because of a full log buffer. Instead, the full log buffer is written at the next commit along with the buffer containing the commit log record. In addition, the log manager routines were combined and streamlined for reduced pathlength.

These changes apply to users of read/write file pools and result in fewer log I/Os and reduced processor usage.

SFS Default File Cache Size Increased

The default file cache size for SFS files (a CMS nucleus generation option) was increased from 12KB to 20KB. This has the advantage of reduced server communication and I/Os, but increases virtual storage requirements and may increase paging. The 20KB default should be a better trade-off for most installations.

Performance Considerations

CMS Application Multitasking

The CMS application multitasking code is in a callable services library called VMMTLIB. It is important to save VMMTLIB in a shared segment whether CMS application multitasking is in use or not, in order to keep virtual and real storage requirements to a minimum.

Virtual Storage Increase

CMS virtual storage requirements for VM/ESA 1.2.0 increased by about 25 pages. The majority of this increase is due to CMS application multitasking. These pages are referenced at CMS initialization or when CMS multitasking is used. For workloads that do not use multitasking, these pages are migrated out to DASD, causing an increase in DASD slot usage and an increase in paging.

Productivity Aids

The performance of the CMS productivity aids changed. Most of the commands were rewritten in REXX, causing an increase in pathlength. This tends to be balanced by the NAMEFIND improvement discussed earlier.

FBA DASD Considerations

Because FBA devices are, in general, slower than CKD devices, minidisk caching is particularly beneficial for reducing the response times of FBA DASD. To use minidisk caching with FBA minidisks, it is important to format the CMS minidisks with a block size of 4KB. However, the CMS default block size for FBA devices is 1KB. Therefore, 4KB blocking must be specified explicitly. In addition to this requirement, when CMS minidisks are allocated on an FBA DASD volume, they must begin on a 4KB page boundary (that is, a block address that is divisible by 8).
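The alignment requirement can be checked with simple arithmetic on the starting block number. The sketch relies on the 512-byte FBA block size, so that eight blocks make up one 4KB page; the function names are hypothetical.

```python
FBA_BLOCK_BYTES = 512
BLOCKS_PER_PAGE = 4096 // FBA_BLOCK_BYTES  # 8


def is_page_aligned(start_block: int) -> bool:
    """True if a minidisk starting here begins on a 4KB boundary."""
    return start_block % BLOCKS_PER_PAGE == 0


def next_aligned_block(start_block: int) -> int:
    """Round a starting block up to the next 4KB boundary."""
    return -(-start_block // BLOCKS_PER_PAGE) * BLOCKS_PER_PAGE


print(is_page_aligned(16))     # True
print(is_page_aligned(12))     # False
print(next_aligned_block(12))  # 16
```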

LOCATEVM and LOCATE (Storage) Commands

The LOCATEVM and LOCATE CPReal/LOCATE CPVirtual commands permit users to search storage. Both commands can consume very large amounts of resources when the given range for the locate is large. The LOCATEVM command is a class G command and therefore permits any end user to impact system performance. User class restructure could be used to change the class of the LOCATEVM command to avoid potential problems.

Performance Management

Monitor Enhancements

A number of new monitor records and fields have been added. Some of the more significant changes are summarized below. For a complete list of changes, see the MONITOR LIST1403 file for VM/ESA 1.2.0. For information about the content and format of the monitor records file, see the VM/ESA: Performance book.

LPAR Monitor Enhancements
Prior to VM/ESA 1.2.0, monitor reported partition processor consumption, but did not report LPAR management time (the processor busy time that is not charged to any given partition). Monitor was enhanced to report LPAR management time. A flag in monitor domain 0 record 15 (D0/R15) tells whether the information is reported by the hardware or not, and a new record (D0/R17) reports the management time per physical processor.
There is a new flag in D0/R15 that reports whether capping is in effect. Capping limits the processor resources that a partition may consume.
ESCON* Multiple Image Facility (EMIF) Monitor Enhancements
EMIF provides the capability to share a physical ESCON channel among multiple logical partitions running on the same processor. Prior to EMIF, processor channels were dedicated to individual logical partitions. The new EMIF monitor information (found in D0/R18) reports busy time because of the partition in which VM/ESA is running.
Additional Monitor Enhancements
The following fields were added:
- User configuration fields were added: virtual machine size; account number; RACF* group name; count of reserved pages; logon time; QUICKDSP status; flag for dialed or SNA user; and a V=V, V=F, or V=R flag. The records that were affected are D1/R15, D4/R1, D4/R3, and D4/R9.
- Relative and absolute SHARE settings by user ID were added. These affect records D2/R5, D4/R3, and D4/R9.
- User logoff data and resource consumption statistics from the user activity record (D4/R3) were added to the logoff event record (D4/R2). Now user resource consumption statistics are not lost for the time between the last sample interval and when the user logs off.
- The processor version (determines the type of processor) was added to D1/R5 (for example: 9021-580, not 9021).
- A count of dialed users and a count of SNA users were added to D0/R8.
- SET SRM DSPBUF settings were added to D1/R16 and D2/R7.

SFS Enhancements

Nine new QUERY FILEPOOL commands were added to display the output of the QUERY FILEPOOL STATUS command in a more readable format. QUERY FILEPOOL REPORT displays all of the information that is contained in the QUERY FILEPOOL STATUS output plus additional information. The eight other new commands each display a specified subset of that output.

The additional information provided by QUERY FILEPOOL REPORT includes the following:

The date and time when the file pool server was last started
The date and time that this query report was requested
The date and time of the last control data backup
The maximum number of IUCV and APPC connections that are allowed to the file pool server machine
The number of addressable 4KB blocks in the file pool that are currently defined
Total number of agents
The number of storage groups and minidisks that are in use
Block usage information, aggregated by storage group
Virtual storage size of the file pool server machine
The control minidisk size in 512-byte blocks
Virtual addresses of the control minidisk and the log minidisks

SFS administrator authority is no longer required to enter the QUERY FILEPOOL STATUS command except when the CATALOG option is specified. The same applies to the new QUERY FILEPOOL REPORT command. Of the eight subset commands, only QUERY FILEPOOL CATALOG requires SFS administrator authority.

FBA DASD Considerations

For integrated adapter and control unit DASD, including all models of the 9332, 9335, and 9336 DASD, the hardware updates only the subchannel measurement block I/O request count. It does not update device connect time, device disconnect time, control unit queuing time (pending time), or device active-only time. As a result, performance products show zero or inaccurate values for device service times, utilizations, or response times.

MLOAD statistics, however, contain accurate queuing time for page and spool volumes. CP computes MLOAD statistics by recording the number of paging or spooling requests and the time required to process the requests. These statistics are kept for each page or spool volume. They are maintained by the paging subsystem in CP internal control blocks for the purpose of load balancing. This information is only useful for volumes that just contain page or spool space.

For volumes that do not contain page or spool space, the average queue length for the device can provide some information. The queue length value is from a field in the Real Device (RDEV) control block. However, the RDEV queue length field is not updated for page or spool I/Os. A high queue length could indicate a performance problem. A performance problem due to seek time for I/Os from a single server would not result in a high queue length. Therefore, a low queue length value does not always mean there are no performance problems with the given volume.

Both the MLOAD statistics and the real device queue length are included in monitor data.

Accounting Data Changes

Values in accounting records may change in relationship to other changes in VM/ESA 1.2.0 that involve resource consumption. The degree of change in accounting data is workload dependent.

Virtual Machine Resource Usage Record

The following list describes fields in the virtual machine resource usage accounting record (type 01) that may be affected by performance changes in VM/ESA 1.2.0. The columns where the field is located are shown in parentheses.

Milliseconds of processor time used (33-36): This is the total processor time charged to a user and includes both CP and emulation time. Some CMS changes resulted in increased processor usage while several CP improvements resulted in decreased processor usage. Some system overhead improvements do not show up in normal user type 01 records, but do affect the type 01 record for the system.
Milliseconds of Virtual processor time (37-40): This is the virtual time charged to a user. As mentioned earlier, changes in CMS resulted in increased processor usage. Therefore, the value for this field will probably increase for CMS users.
Total Page Reads (41-44): CMS storage requirements increased this release, so this field will increase for CMS users.
Total Page Writes (45-48): CMS storage requirements increased this release, so this field will increase for CMS users.
Requested Virtual nonspooled I/O Starts (49-52): This is a total count of requests. All requests may not complete. The value of this field will decrease in proportion to the benefit of the CMS record manager (DMSRCM) improvement. See "CMS Record Manager" for details.
Completed Virtual nonspooled I/O Starts (73-76): This is a total count of completed requests. All requests may not complete. The value of this field will decrease in proportion to the benefit of the CMS record manager (DMSRCM) improvement.

Temporary Disk Space Record

The accounting record for temporary disk space (record type 03) was modified because of FBA support. For FBA devices, the number of FBA blocks is given. For CKD and ECKD devices, cylinders remain as the given units. A new field was added that lists the size of the temporary disk space in pages for either type of device.
