VM/ESA 1.2.0 Performance Changes
- Performance Improvements
- Performance Considerations
- Performance Management
Performance Improvements
The code that handles short-term free storage requests was redesigned
to reduce the processor requirements of this CP function.
The processing time in HCPFRE was reduced by over one-third.
The degree of benefit is proportional to the frequency of free storage
requests.
In addition, two new macros were created that normally obtain or
release a block of free storage without calling HCPFRE.
These macros are intended for use by performance-critical CP components
and are not for general use.
They are currently exploited by three other performance enhancements:
HCPCFR (described in "Reduced Master Processor Usage"), IUCV (described in
"IUCV Storage Management"), and fast path CCW translation (described in
"Fast Path CCW Translation Extensions").
In VM/ESA Version 1 Release 1.1, one of the several changes
to IUCV that led to improved
performance was the more efficient storage management of control blocks
MSGBK and IUSBK.
Storage management was improved by making the control blocks
semi-permanent and by exploiting stack management.
This reduced the processor usage of HCPFRE.
However, this improvement made short-term garbage collection less
efficient, because the control blocks were held for several minutes
(that is, long-term).
This caused storage fragmentation.
In an attempt to improve short-term garbage collection in VM/ESA 1.2.0,
the control blocks were obtained from long-term storage.
However, on large systems the long-term storage algorithm showed
large-system effects when obtaining this storage.
This caused unacceptable response times and internal throughput rates.
APAR VM54161 to VM/ESA 1.2.0
corrects three problems by using inline macros
to obtain the control blocks from short-term storage.
- Large-system effect
This is avoided because the control blocks come out of short-term
storage.
There is no net improvement over VM/ESA 1.1.1.
However, avoiding the large-system effect is the primary reason for
this APAR.
- HCPFRE calls
Using the inline macros avoids calls to HCPFRE.
This is an improvement over VM/ESA 1.1.1.
- Short-term garbage collection
Storage fragmentation does not occur, because storage is no longer held
long-term.
This is an improvement over VM/ESA 1.1.1.
For more information on the inline macros, see "CP Free Storage".
In VM/ESA, one method of serialization is to limit certain functions to
run on only the master processor.
Only one processor is designated as the master in VM/ESA.
The IPLed processor is usually the master.
However, if the IPLed processor has a special feature (vector or
crypto) installed, then another processor without a special feature is
selected as the master.
The master can also be changed by disabling processors (using the VARY
OFFLINE PROCESSOR command).
You can determine which is the master processor by issuing the CP QUERY
PROCESSORS command.
Because there is only one master processor, it has the potential to be
a bottleneck if the demand for master-only work is great.
As the number of processors in the complex increases, the potential for
a master processor bottleneck increases.
For example, a master processor bottleneck is more likely on a 6-way
system than on a 2-way system.
Two changes were made in VM/ESA 1.2.0 that reduce the master processor
requirements.
Neither change involved moving work off the master, because moving work
off the master would involve replacing the serialization methodology, a
nontrivial task.
Instead, these changes improve the efficiency of processing on the
master.
As a result, they also have a positive effect for environments that are
not master processor constrained.
The two items are as follows:
- HCPCNF improvement - DIAGNOSE code X'08' handling
- HCPCFR improvement - storage management for the console formatter
The HCPCNF module formats console output from DIAGNOSE
code X'08' processing.
This DIAGNOSE code issues CP commands.
The modification applies to console output that goes to a buffer, not
the user's screen.
This would apply to programs that use EXECIO or CMS Pipelines to
capture output from CP commands.
CP takes special precautions to prevent a single user from flooding the
system if the user issues DIAGNOSE code X'08' requests that generate a
lot of output.
Prior to VM/ESA 1.2.0, each time this DIAGNOSE code was issued, CP opened a
dispatch window from HCPCNF to allow another user to run.
This kept the virtual machine that repeatedly issued the DIAGNOSE code
from flooding the system with work.
In VM/ESA 1.2.0, the frequency of HCPCNF opening the dispatch window is
changed.
Instead of opening the dispatch window every time, it is opened
approximately every sixteenth time.
This eliminates most of the overhead while maintaining the protection
feature.
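The throttling idea can be illustrated with a minimal Python sketch.
It is not CP code: the class name, the simple counter, and the exact
interval of 16 are assumptions chosen to model "approximately every
sixteenth time"; the text does not describe the mechanism HCPCNF uses.

```python
class DispatchWindowThrottle:
    """Hypothetical model: open the dispatch window on every Nth request."""

    def __init__(self, interval=16):
        self.interval = interval  # assumed fixed interval
        self.count = 0

    def should_open_window(self):
        """Return True when this request should let another user run."""
        self.count += 1
        if self.count >= self.interval:
            self.count = 0
            return True
        return False


throttle = DispatchWindowThrottle()
opened = sum(throttle.should_open_window() for _ in range(160))
print(opened)  # 10 windows for 160 requests, instead of 160
```

In this model the cost of opening the window is paid on one request in
sixteen, while a virtual machine that issues the DIAGNOSE code in a
tight loop is still interrupted regularly.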
The HCPCFR module is not a master-only module.
However, it is usually called by master-only modules, and therefore
ends up completing on the master processor.
This module handles console output formatting related to CP commands.
In the OfficeVision* environment, it is called frequently.
The improvement in HCPCFR involved replacing costly CP free storage
calls with the new inline macros.
These inline macros are new in VM/ESA 1.2.0.
See the "CP Free Storage" section for additional information on
them.
Improving the efficiency of master processor work reduces processor
consumption.
In addition, master processor constrained environments may also see
significant improvement in response times.
However, the great majority of systems are not master processor
constrained.
A system probably has a meaningful master processor constraint if the
processor utilization of the master processor is much higher than the
utilization of the other processors, and if the percent emulation time
and idle time on the master processor are less than 5%.
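This rule of thumb can be written as a small check. The Python sketch
below is only an illustration: the function name, the 20-point margin
used for "much higher", and the reading of the 5% limit as applying to
emulation time and idle time separately are assumptions, not values
from this document.

```python
def looks_master_constrained(master_util, other_utils,
                             master_emulation_pct, master_idle_pct,
                             much_higher_margin=20.0):
    """Apply the rule of thumb to processor utilization percentages.

    much_higher_margin is a hypothetical stand-in for "much higher".
    """
    avg_other = sum(other_utils) / len(other_utils)
    much_higher = (master_util - avg_other) >= much_higher_margin
    # Assumption: emulation and idle must each be below 5%.
    little_slack = master_emulation_pct < 5.0 and master_idle_pct < 5.0
    return much_higher and little_slack


print(looks_master_constrained(98.0, [60.0, 62.0, 58.0], 2.0, 1.0))
print(looks_master_constrained(70.0, [65.0, 66.0, 64.0], 30.0, 25.0))
```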
This enhancement applies to VM systems that run with user state
sampling (MONITOR SAMPLE ENABLE USER).
When user state sampling is enabled, HCPMOU is called once for every
logged on or disconnected user at the end of each high frequency sample
interval.
In VM/ESA 1.2.0, HCPMOU no longer calls HCPFRE to obtain or release free
storage for a module work area.
Instead, HCPMOU uses leftover storage in the save block (used for
linkage).
Reducing the number of free storage calls reduces processor usage and
improves response time.
The number of calls avoided is proportional to the number of users and
the monitor sample rate.
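As a rough illustration of this proportionality, the Python sketch
below estimates the rate of HCPFRE calls avoided. It assumes that each
HCPMOU invocation previously made one obtain call and one release call;
that count, the function name, and the sample figures are assumptions
for illustration only.

```python
def hcpfre_calls_avoided_per_second(users, sample_interval_seconds,
                                    calls_per_invocation=2):
    """Estimate free storage calls avoided per second.

    calls_per_invocation=2 assumes one obtain plus one release
    for each HCPMOU invocation (an assumption, not a documented value).
    """
    return users * calls_per_invocation / sample_interval_seconds


print(hcpfre_calls_avoided_per_second(1000, 1.0))
print(hcpfre_calls_avoided_per_second(500, 2.0))
```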
The DASD slot allocation algorithm, which gives out page and spool
slots, was redesigned to take advantage of block paging and to reduce
I/O seek times by increasing the number of contiguous DASD slots.
As in the past, a moving cursor approach is used.
However, the new scheme allocates contiguous slots as much as possible
by scanning ahead of the cursor for groups of contiguous available
slots of appropriate sizes that reside on the same cylinder.
This is accomplished in two ways:
- These groups of contiguous available slots have a minimum size of
two.
Anything less than this minimum size is ignored.
The one with the best fit is selected.
- Allocation no longer switches volumes in the middle of allocating a
block.
Instead it may switch a little prematurely or a little late.
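The scan-ahead selection can be illustrated with a minimal Python
sketch. It is not CP's implementation: the list-of-booleans model of a
cylinder, the function names free_runs and pick_run, and the reading of
"best fit" as the smallest group that satisfies the request (falling
back to the largest group when none is big enough) are assumptions made
for illustration.

```python
def free_runs(slots, start=0):
    """Yield (offset, length) for each run of free slots at or after start."""
    run_start = None
    for i in range(start, len(slots)):
        if slots[i]:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            yield run_start, i - run_start
            run_start = None
    if run_start is not None:
        yield run_start, len(slots) - run_start


def pick_run(slots, wanted, cursor=0, min_run=2):
    """Choose a group of contiguous free slots ahead of the cursor.

    Groups shorter than min_run are ignored. Returns (offset, length)
    or None if the cylinder has no usable group.
    """
    candidates = [r for r in free_runs(slots, cursor) if r[1] >= min_run]
    if not candidates:
        return None
    big_enough = [r for r in candidates if r[1] >= wanted]
    if big_enough:
        return min(big_enough, key=lambda r: r[1])  # tightest fit
    return max(candidates, key=lambda r: r[1])      # assumed fallback


# True marks an available slot on one cylinder.
cylinder = [True, False, True, True, False,
            True, True, True, True, False,
            True, True, True]
print(pick_run(cylinder, wanted=3))
```

In this example the single free slot at offset 0 is skipped because it
is below the minimum group size, and the three-slot group is preferred
over the four-slot group for a three-slot request.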
Environments that do significant paging to DASD should see an
improvement in response time because of faster page fault resolution
and a reduction in I/Os on the paging volumes.
The benefits increase as the system ages.
For a CMS-intensive benchmark, the average pages per page I/O increased
13% as a result of this improvement.
Fast path CCW translation was introduced as APAR VM51012 to VM/ESA 1.1.1.
Fast path CCW translation was extended in VM/ESA 1.2.0 to include support
for FBA DASD.
In addition, it now applies to more types of DASD channel programs.
Fast path CCW translation also takes advantage of two new macros that
normally obtain or release free storage without having to call HCPFRE.
Typical DASD I/O such as SSCH (Start Subchannel), DIAGNOSE
code X'A8', SIO (Start I/O), and SIOF (Start I/O fast) will
benefit from this.
However, not all I/O is DASD I/O.
For example, if the system is reporting a high rate of DIAGNOSE
code X'A8' requests, many of these could be for unit record devices,
which do not benefit from this improvement.
In VSE guest environments that are heavy in DASD I/O, a significant
performance improvement can be gained by this enhancement.
In a CMS-intensive environment, where most minidisk I/Os are DIAGNOSE
code X'A4' requests and SFS I/Os are block I/O, the performance
improvement is less significant.
Fast path CCW translation does not apply to these cases because they
take a different path through CP's I/O subsystem.
The input to CP is not an actual channel program, but rather input from
which CP builds the channel program.
CP initialization establishes a minor time slice for the dispatcher to
use when running virtual machines.
This code calculates the amount of time to complete about
100 000 selected instructions.
The result is used as the dispatching minor time slice.
Installations can change the size of the time slice with the CP SET SRM
DSPSLICE command, but a reasonable default value is important.
As the minor time slice becomes too small, the cost to run and to stop
running users goes up.
As the time slice is increased, the responsiveness of the system may
decrease.
Before VM/ESA 1.1.1, the minimum value for the minor time slice was 1
millisecond.
If the CP initialization code calculated a value less than 1
millisecond, CP used 1 millisecond instead.
On some large S/390* processors, this led to minor time slices of 2
milliseconds or less.
Such a small value caused a noticeable increase in the CP processor
time needed to run and to stop running users.
In VM/ESA 1.1.1, the minimum default time slice was changed to 5
milliseconds because of this problem.
Even after this change, the time slice value for some of the smaller
S/390 processors remained too small.
Many of the instructions used by the initialization code to determine
the minor time slice were too short relative to a typical instruction
mix and this discrepancy was particularly pronounced on smaller S/390
processors.
Although the default minor time slice for these processors fell above
the 5-millisecond minimum introduced
in VM/ESA 1.1.1, the value was still
too low for efficient dispatching of work.
VM/ESA 1.2.0 changes the loop of instructions used to determine the minor
time slice.
The number of instructions completed is still about the same, but on
average the instructions are longer.
This loop produces a reasonable minor time slice value for the whole
range of S/390 processors.
The minimum default time slice is still 5 milliseconds, and the maximum
remains at 100 milliseconds.
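The effect of the default limits can be shown with a short Python
sketch. Only the 5-millisecond and 100-millisecond bounds come from the
text; the function name and the idea of passing in an already measured
loop time are assumptions for illustration.

```python
MIN_SLICE_MS = 5.0    # minimum default minor time slice
MAX_SLICE_MS = 100.0  # maximum default minor time slice


def default_minor_time_slice(measured_loop_ms):
    """Clamp the measured instruction-loop time to the default limits."""
    return max(MIN_SLICE_MS, min(MAX_SLICE_MS, measured_loop_ms))


for measured in (2.0, 37.5, 250.0):
    print(measured, "->", default_minor_time_slice(measured))
```

A very fast processor that completes the loop in 2 milliseconds gets
the 5-millisecond minimum, and a slow processor that needs 250
milliseconds gets the 100-millisecond maximum.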
There are two principal benefits of this change:
- Installations with small S/390 processors do not have to issue the
CP SET SRM DSPSLICE command to achieve reasonable dispatching costs.
- Installations with small S/390 processors that do not issue CP SET
SRM DSPSLICE to increase the minor time slice may see lower CP
processor costs.
When the DSPSLICE is increased, workloads consisting of long-running
transactions experience the largest dispatching efficiency benefits.
The DSPSLICE value is reported by monitor data and also by the QUERY
SRM DSPSLICE command, with less precision.
In prior releases, a virtual machine could request excessive amounts of
CP free storage, which could disable the system and appear as
performance degradation or an empty available list.
This situation could be caused by:
- Repeatedly issuing CP commands that consume free storage
- Performing tasks in a disconnected machine that caused large
amounts of console output to be routed to a secondary user
- Accounting or EREP records not being retrieved by the accounting or
EREP virtual machine.
In VM/ESA 1.2.0, the free storage limit detection function tracks free
storage so that a virtual machine cannot request excessive amounts.
Three thresholds are calculated based on the size of the dynamic paging
area.
If the virtual machine exceeds the first threshold, it receives a
warning message.
If the virtual machine exceeds the second threshold, it is put into
stopped state.
If the virtual machine exceeds the third threshold, it is forced off
the system.
Messages are sent to the virtual machine and the operator user ID for
each threshold crossed.
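The escalation across the three thresholds can be sketched in Python.
How CP derives the thresholds from the size of the dynamic paging area
is not stated here, so the sketch takes them as caller-supplied values;
the function name, the action strings, and the sample numbers are
hypothetical.

```python
def storage_limit_action(in_use, warn_limit, stop_limit, force_limit):
    """Return the action for a virtual machine's free storage usage.

    The three limits are hypothetical inputs; CP calculates its own
    thresholds from the size of the dynamic paging area.
    """
    if in_use > force_limit:
        return "force"  # forced off the system
    if in_use > stop_limit:
        return "stop"   # put into stopped state
    if in_use > warn_limit:
        return "warn"   # warning message only
    return "none"


for usage in (50, 150, 250, 350):
    print(usage, storage_limit_action(usage, 100, 200, 300))
```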
Four new CP commands were added.
The QUERY STGLIMIT command (privilege classes: A, B, C, and E) checks
the state of free storage limit detection for the system.
The SET STGLIMIT command (privilege classes: A, B, and C) controls the
state of free storage limit detection for the system.
The QUERY STGEXEMP command (privilege classes: A, B, C, and E) checks
the state of free storage limit detection for virtual machines.
The SET STGEXEMP command (privilege classes: A, B, and C) controls the
state of free storage limit detection for virtual machines.
Exempting a user ID ensures that the user ID is not subject to being
stopped or forced off because of the amount of free storage it causes
CP to consume.
This is recommended for special purpose user IDs that are vital to the
installation, user IDs running trusted code, and user IDs that should
never be forced off the system.
If no action is taken, free storage limits are enforced for all users.
For VM/ESA 1.2.0, the value that CP uses to determine the size of the frame
table to build has changed.
Prior to VM/ESA 1.2.0, CP used the RMSIZE value generated in HCPSYS.
In VM/ESA 1.2.0, CP builds the frame table for the smaller value of either
the actual real storage or RMSIZE of the SYSTORE macro in HCPSYS.
With this improvement, the system may use less fixed real storage for
the frame table.
The reported SYSGEN value for the QUERY FRAMES command was changed to
reflect the smaller value of either actual real storage or RMSIZE.
The frame table is no longer built as part of the CP nucleus.
It is built in the dynamic paging area.
This reduces the size of the CP nucleus, which reduces the time to read
the nucleus at system IPL.
The format of the FSTs (File Storage Tables) for accessed read-only CMS
minidisk directories was improved.
The old layout, called hyperblock format, alternated a 30-byte header
with a block of FSTs, where the size of the block equaled the minidisk
blocksize.
The new layout has one header followed by all the FSTs.
These FSTs are organized so that each block of FSTs in the hyperblock
map is aligned on a page boundary.
This structure has been in use for some time for the S and Y disks.
During FST lookup, the hyperblock map points to the page of desired
FSTs.
If the desired page does not exist, only one page of FSTs is
referenced.
Previously, a hyperblock of FSTs could span pages, causing two pages
to be referenced.
For 4KB-blocked minidisks with an average-to-large number of files,
this new layout typically eliminates one referenced nonshared page per
read-only minidisk (other than S and Y) in the search order.
Note, this applies only to read-only minidisks, not read-only SFS
directories.
Improvements were made to the CMS record manager (DMSRCM) to increase
from 8 to 500 the maximum number of 4KB blocks that DMSRCM reads or
writes with one DIAGNOSE code X'A4'.
This can reduce the number of DIAGNOSE code X'A4' requests, processor
usage, I/O time, and device usage.
This improvement applies to minidisk files used by applications (like
XEDIT) that specify a large buffer.
Most applications are unaffected by this improvement, and it does not
apply to SFS.
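The effect of the larger limit on the number of DIAGNOSE code X'A4'
requests can be estimated with simple arithmetic. The Python sketch
below assumes that the application's buffer is large enough for every
request to carry the maximum number of blocks; the function name and
the 1000-block file are illustrative.

```python
import math


def diagnose_a4_requests(blocks_4kb, max_blocks_per_request):
    """Requests needed to move a file, assuming every request is full."""
    return math.ceil(blocks_4kb / max_blocks_per_request)


file_blocks = 1000  # hypothetical file of 1000 4KB blocks
print(diagnose_a4_requests(file_blocks, 8))    # old limit
print(diagnose_a4_requests(file_blocks, 500))  # new limit
```

An application that reads or writes only a few blocks at a time issues
the same number of requests under either limit, which is why most
applications are unaffected.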
Since CMS 5.5, page boundary alignment was forced for storage requests
greater than or equal to 4KB.
In VM/ESA 1.2.0, this is no longer done.
As a result, applications that do a large number of requests greater
than or equal to 4KB may see a reduction in virtual or real storage
requirements and paging.
However, if no such requests are made, it costs the CMS user one more
referenced nonshared page.
For those applications that require page boundary alignment, CMSSTOR
has an option to do this.
CMS provides a multitasking environment for applications and servers.
The multitasking facilities are available only at the application
programming level and are provided as routines in a callable service
library.
The CMS user still runs one application at a time, but the application
can split itself into multiple threads, and the multiple threads can be
dispatched on multiple virtual processors.
These multitasking facilities allow applications to harness the power
of the underlying multiprocessor complex and to overlap operations,
achieving high performance.
Multiprocessor exploitation is supported in XA or XC mode virtual
machines only.
In VM/ESA 1.2.0, improvements were made to the processing of the NAMES file.
These improvements include caching the NAMES file in storage and
optimizing the search for tags.
The degree of improvement depends on the size of the NAMES file and the
number of names requested; the larger the NAMES file and number of
requested names, the larger the improvement.
REXX storage management APARs VM47302 and VM50916 for VM/ESA 1.1.1 can
improve performance.
These REXX storage management improvements are part of VM/ESA 1.2.0.
Most of the performance change in REXX from VM/ESA 1.1.1 to VM/ESA 1.2.0 is
because of these changes.
The improvements are to obtain or release storage in larger,
page-aligned areas.
By obtaining larger areas, there are fewer calls to CMS
storage management.
This reduces pathlength, but depending on the REXX program, some
additional virtual storage may be required.
By obtaining page-aligned areas, a header address can be
calculated by clearing the low order bits rather than scanning a chain.
Removing this scan can reduce pathlength and thrashing when REXX
releases variable storage.
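The address calculation can be illustrated in Python. The sketch
assumes that each area is aligned on a power-of-two boundary equal to
its size, with 4KB as the default, and that the header sits at the
start of the area; the function name and the sample addresses are
hypothetical.

```python
PAGE_SIZE = 4096


def area_header_address(address, alignment=PAGE_SIZE):
    """Find the start of the aligned area that contains address.

    Assumes alignment is a power of two, so clearing the low-order
    bits yields the area start without scanning any chain.
    """
    return address & ~(alignment - 1)


print(hex(area_header_address(0x0002A7C8)))  # 0x2a000
print(hex(area_header_address(0x00005000)))  # 0x5000
```

The calculation is a single AND operation regardless of how many areas
exist, whereas a chain scan grows with the number of areas and touches
the storage of each area it visits.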
The REXX VALUE function was extended in VM/ESA 1.2.0 to permit manipulation
of global variables (as is done by the CMS GLOBALV command).
Prior to this, manipulation of global variables from REXX required use
of the GLOBALV CMS command.
The VALUE function is much faster than GLOBALV when there is only one
variable involved.
When working with a list of variables, the GLOBALV command has an
advantage because it supports a list of variable names.
The checkpoint process consists of writing back to DASD all changed
catalog blocks and releasing all shadow blocks allocated since the last
checkpoint.
The checkpoint routine serializes the operation of the server before
beginning this process.
The checkpoint process was changed in two ways to reduce the amount of
time that the server is serialized.
First, the writeback of changed catalog buffers now exploits multiblock
block I/O.
Second, writeback of changed catalog buffers is now done before
checkpoint serialization starts.
This is called preflush.
Any catalog buffer changes that may have occurred after the preflush
and before the checkpoint is serialized are still written back during
the serialization period.
These improvements benefit users of read/write SFS file pools.
The benefit in reduced checkpoint serialization time results in a lower
average response time, a more consistent response time, and a reduced
SFS server working set.
In addition, the reduced checkpoint time allows a larger CATBUFFERS
setting for better exploitation of large real memories.
From QUERY FILEPOOL STATUS, checkpoint duration can be calculated by
dividing checkpoint time by checkpoints taken.
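The calculation is shown below as a small Python helper. The function
name and the sample values are illustrative, and the result is in
whatever time unit the QUERY FILEPOOL STATUS output uses for checkpoint
time.

```python
def average_checkpoint_duration(checkpoint_time, checkpoints_taken):
    """Average time per checkpoint, or None if none were taken."""
    if checkpoints_taken == 0:
        return None
    return checkpoint_time / checkpoints_taken


# Hypothetical counter values read from the status output.
print(average_checkpoint_duration(12000, 48))
```

Comparing this average before and after migration gives a direct
measure of the reduction in checkpoint serialization time.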
The SFS catalog insert algorithm was changed to remember the location
of the last inserted row and begin the search for space at that
location.
The benefits from this change are reduced processor usage and, in some
cases, such as very large catalog spaces, reduced I/Os.
The degree of benefit is proportional to the rate of inserts into the
catalog and the catalog size.
Typically, most catalog inserts come from GRANT AUTH and files that are
larger than 32KB.
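The idea of resuming the search at the last insert location can be
sketched in Python. This is not the SFS implementation: the class
name, the flat list of rows, and the wrap-around to the start of the
space when the end is reached are assumptions for illustration.

```python
class CatalogSpace:
    """Hypothetical catalog space that remembers the last insert point."""

    def __init__(self, size):
        self.rows = [None] * size
        self.last = 0  # location of the last inserted row

    def insert(self, row):
        """Search for a free row starting at the last insert location."""
        n = len(self.rows)
        for step in range(n):
            i = (self.last + step) % n  # assumed wrap-around
            if self.rows[i] is None:
                self.rows[i] = row
                self.last = i
                return i
        raise RuntimeError("catalog space is full")


space = CatalogSpace(4)
positions = [space.insert(name) for name in ("a", "b", "c")]
print(positions)
```

Because each search begins near the previous insert rather than at the
start of the space, a long stretch of occupied rows at the front of a
large catalog is not rescanned on every insert.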
The SFS log manager was enhanced to exploit multiblock block I/O.
This enhancement eliminates log writes because of a full log buffer.
Instead, the full log buffer is written at the next commit along with
the buffer containing the commit log record.
In addition, the log manager routines were combined and streamlined for
reduced pathlength.
These changes apply to users of read/write file pools and result in
fewer log I/Os and reduced processor usage.
The default file cache size for SFS files (a CMS nucleus generation
option) was increased from 12KB to 20KB.
This has the advantage of reduced server communication and I/Os, but
increases virtual storage requirements and may increase paging.
The 20KB default should be a better trade-off for most installations.
Performance Considerations
The CMS application multitasking code is in a callable services library
called VMMTLIB.
It is important to save VMMTLIB in a shared segment whether CMS
application multitasking is in use or not, in order to keep virtual and
real storage requirements to a minimum.
CMS virtual storage requirements for VM/ESA 1.2.0 increased by about 25
pages.
The majority of this increase is due to CMS application multitasking.
These pages are referenced at CMS initialization or when CMS
multitasking is used.
For workloads that do not use multitasking, these pages are migrated
out to DASD, causing an increase in DASD slot usage and an increase in
paging.
The performance of the CMS productivity aids changed.
Most of the commands were rewritten in REXX, causing an increase in
pathlength.
This tends to be balanced by the NAMEFIND improvement discussed earlier.
Because FBA devices are, in general, slower than CKD devices, minidisk
caching is particularly beneficial for reducing the response times of
FBA DASD. To use minidisk caching with FBA minidisks, it is important
to format the CMS minidisks with a block size of 4KB. However, the CMS
default block size for FBA devices is 1KB. Therefore, 4KB blocking
must be specified explicitly. In addition to this requirement, when
CMS minidisks are allocated on an FBA DASD volume, they must begin on a
4KB page boundary (that is, a block address that is divisible by 8).
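The alignment rule can be checked with a few lines of Python. The
sketch assumes 512-byte FBA blocks, which is what makes a 4KB page
equal to 8 blocks; the function names and the sample block addresses
are illustrative.

```python
FBA_BLOCK_BYTES = 512
PAGE_BYTES = 4096
BLOCKS_PER_PAGE = PAGE_BYTES // FBA_BLOCK_BYTES  # 8


def is_page_aligned(start_block):
    """True if a minidisk starting at this block begins on a 4KB boundary."""
    return start_block % BLOCKS_PER_PAGE == 0


def next_aligned_block(start_block):
    """Round a starting block address up to the next 4KB boundary."""
    return -(-start_block // BLOCKS_PER_PAGE) * BLOCKS_PER_PAGE


print(is_page_aligned(16), is_page_aligned(20))
print(next_aligned_block(20))
```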
The LOCATEVM and LOCATE CPReal/LOCATE CPVirtual commands permit users
to search storage.
Both commands can consume very large amounts of resources when the
given range for the locate is large.
The LOCATEVM command is a class G command and therefore permits any end
user to impact system performance.
User class restructure could be used to change the class of the
LOCATEVM command to avoid potential problems.
Performance Management
A number of new monitor records and fields have been added. Some of
the more significant changes are summarized below. For a complete list
of changes, see the MONITOR LIST1403 file for VM/ESA 1.2.0.
For information about the content and format of the monitor records
file, see the VM/ESA: Performance book.
- LPAR Monitor Enhancements
Prior to VM/ESA 1.2.0, monitor reported partition processor consumption, but
did not report LPAR management time (the processor busy time that is
not charged to any given partition). Monitor was enhanced to report
LPAR management time. A flag in monitor domain 0 record 15 (D0/R15)
tells whether the information is reported by the hardware or not, and a
new record (D0/R17) reports the management time per physical processor.
There is a new flag in D0/R15 that reports whether capping is in
effect. Capping limits the processor resources that a partition may
consume.
- ESCON* Multiple Image Facility (EMIF) Monitor Enhancements
EMIF provides the capability to share a physical ESCON channel among
multiple logical partitions running on the same processor. Prior to
EMIF, processor channels were dedicated to individual logical
partitions. The new EMIF monitor information (found in D0/R18) reports
busy time because of the partition in which VM/ESA is running.
- Additional Monitor Enhancements
The following fields were added:
- User configuration fields were added: virtual machine size;
account number; RACF* group name; count of reserved pages; logon time;
QUICKDSP status; a flag for dialed or SNA user; and a V=V, V=F, or V=R
flag. The records that were affected are D1/R15, D4/R1, D4/R3, and
D4/R9.
- Relative and absolute SHARE settings by user ID were added.
These affect records D2/R5, D4/R3, and D4/R9.
- User logoff data and resource consumption statistics from the user
activity record (D4/R3) were added to the logoff event record (D4/R2).
Now user resource consumption statistics are not lost for the time
between the last sample interval and when the user logs off.
- The processor version (determines the type of processor) was added
to D1/R5 (for example: 9021-580, not 9021).
- A count of dialed users and a count of SNA users were added to D0/R8.
- SET SRM DSPBUF settings were added to D1/R16 and D2/R7.
Nine new QUERY FILEPOOL commands were added to display the output of
the QUERY FILEPOOL STATUS command in a more readable format. QUERY
FILEPOOL REPORT displays all of the information that is contained in
the QUERY FILEPOOL STATUS output plus additional information. The
eight other new commands each display a specified subset of that output.
The additional information provided by QUERY FILEPOOL REPORT includes
the following:
- The date and time when the file pool server was last started
- The date and time that this query report was requested
- The date and time of the last control data backup
- The maximum number of IUCV and APPC connections that are allowed to
the file pool server machine
- The number of addressable 4KB blocks in the file pool that are
currently defined
- Total number of agents
- The number of storage groups and minidisks that are in use
- Block usage information, aggregated by storage group
- Virtual storage size of the file pool server machine
- The control minidisk size in 512-byte blocks
- Virtual addresses of the control minidisk and the log minidisks
SFS administrator authority is no longer required to enter the QUERY
FILEPOOL STATUS command except when the CATALOG option is specified.
The same applies to the new QUERY FILEPOOL REPORT command. Of the
eight subset commands, only QUERY FILEPOOL CATALOG requires SFS
administrator authority.
For integrated adapter and control unit DASD, including all models of
the 9332, 9335, and 9336 DASD, the hardware updates only the subchannel
measurement block I/O request count. It does not update device connect
time, device disconnect time, control unit queuing time (pending time),
or device active-only time. As a result, performance products show
zero or inaccurate values for device service times, utilizations, or
response times.
MLOAD statistics, however, contain accurate queuing time for page and
spool volumes. CP computes MLOAD statistics by recording the number of
paging or spooling requests and the time required to process the
requests. These statistics are kept for each page or spool volume.
They are maintained by the paging subsystem in CP internal control
blocks for the purpose of load balancing.
This information is useful only for volumes that contain nothing but
page or spool space.
For volumes that do not contain page or spool space, the average queue
length for the device can provide some information. The queue length
value is from a field in the Real Device (RDEV) control block.
However, the RDEV queue length field is not updated for page or spool
I/Os. A high queue length could indicate a performance problem. A
performance problem due to seek time for I/Os from a single server
would not result in a high queue length. Therefore, a low queue length
value does not always mean there are no performance problems with the
given volume.
Both the MLOAD statistics and the real device queue length are included
in monitor data.
Values in accounting records may change in relationship to other
changes in VM/ESA 1.2.0 that involve resource consumption.
The degree of change in accounting data is workload dependent.
The following list describes fields in the virtual machine resource
usage accounting record (type 01) that may be affected by performance
changes in VM/ESA 1.2.0.
The columns where the field is located are shown in parentheses.
- Milliseconds of processor time used (33-36)
- This is the total processor time charged to a user and includes
both CP and emulation time. Some CMS changes resulted in increased
processor usage while several CP improvements resulted in decreased
processor usage. Some system overhead improvements do not show up in
normal user type 01 records, but do affect the type 01 record for the
system.
- Milliseconds of Virtual processor time (37-40)
- This is the virtual time charged to a user. As mentioned earlier,
changes in CMS resulted in increased processor usage.
Therefore, the value for this field will probably increase for CMS
users.
- Total Page Reads (41-44)
- CMS storage requirements increased in this release, so this field will
increase for CMS users.
- Total Page Writes (45-48)
- CMS storage requirements increased in this release, so this field will
increase for CMS users.
- Requested Virtual nonspooled I/O Starts (49-52)
- This is a total count of requests. Not all requests may complete.
The value of this field will decrease in proportion to the benefit of
the CMS record manager (DMSRCM) improvement. See "CMS Record Manager"
for details.
- Completed Virtual nonspooled I/O Starts (73-76)
- This is a total count of completed requests. Not all requests
complete. The value of this field will decrease in proportion to the
benefit of the CMS record manager (DMSRCM) improvement.
The accounting record for temporary disk space (record type 03) was
modified because of FBA support.
For FBA devices, the number of FBA blocks is given.
For CKD and ECKD devices, cylinders remain as the given units.
A new field was added that lists the size of the temporary disk space
in pages for either type of device.