Performance Improvements
The following items improve performance:
- Enhanced Large Real Storage Exploitation
- QDIO Enhanced Buffer State Management
- Storage Management APARs
- Contiguous Frame Management Improvements
- Extended Diagnose X'44' Fast Path
- Dispatcher Improvements
- Fast Steal from Idling Virtual Machines
- Reduced Page Reorder Processing for High-CPU Virtual Machines
- Improved SCSI Disk Performance
- VM Resource Manager Cooperative Memory Management
- Improved DirMaint Performance
Enhanced Large Real Storage Exploitation
Substantial changes have been made to CP in z/VM 5.2.0 that allow for
much improved exploitation of large real storage sizes.
Prior to z/VM 3.1.0, CP did not support real storage sizes larger than
2 GB -- the extent of 31-bit addressability. Starting with the 64-bit
build of z/VM 3.1.0 through z/VM 5.1.0, CP was changed to provide a
limited form of support for real storage sizes greater than 2G. All CP
code and data structures still had to reside below the 2G line and most
of the CP code continued to use 31-bit addressing. Guest pages could
reside above 2G, and they could stay there when referenced by
64-bit-capable CP code using access registers, but only a few places
in CP did that. Normally, when a guest page that mapped to a real
storage frame above 2G had to be referenced by CP (as, for example,
during the handling of a guest I/O request), it first had to be moved
to a frame below the 2G line before it could be manipulated by the
31-bit CP code. In large systems, this sometimes led to contention for
the limited number of frames below 2G, limiting system throughput.
CP runs in its own address space, called the System
Execution Space (SXS), which can be up to 2G in size. Prior to z/VM
5.2.0, the SXS was identity-mapped -- that is, all logical addresses
were the same as the corresponding real addresses. With z/VM 5.2.0,
the SXS is no longer identity-mapped, thus allowing logical pages in
the SXS to reside anywhere in real storage. Now, when CP needs to
reference a guest page, it maps (aliases) that page to a logical page
in the SXS. This allows most of the CP code to continue to run using
31-bit addressability while also eliminating
the need to move pages to real frames that are below the 2G line.
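The aliasing scheme can be pictured with a small toy model. The sketch below is illustrative only: the class, its method names, and the dictionary-based page table are hypothetical and are not CP's actual data structures. It shows a guest page resident above the 2G line being made addressable through a logical page below 2G without the real frame being moved.

```python
FRAME_SIZE = 4096
FRAMES_BELOW_2G = (2 * 1024**3) // FRAME_SIZE  # 4K frames below the 2G line


class SystemExecutionSpace:
    """Toy model of a non-identity-mapped address space (hypothetical)."""

    def __init__(self):
        self.page_table = {}  # SXS logical page number -> real frame number
        self.next_page = 0    # next unused logical page (never reused here)

    def alias(self, real_frame: int) -> int:
        """Map a real frame to a free logical page; no data is copied."""
        if self.next_page >= FRAMES_BELOW_2G:
            raise MemoryError("the SXS holds at most 2G of logical pages")
        page = self.next_page
        self.page_table[page] = real_frame
        self.next_page += 1
        return page

    def resolve(self, page: int) -> int:
        """Return the real frame that backs a logical page."""
        return self.page_table[page]


sxs = SystemExecutionSpace()
guest_frame = 5 * FRAMES_BELOW_2G       # a real frame far above the 2G line
page = sxs.alias(guest_frame)

assert page < FRAMES_BELOW_2G           # reachable with 31-bit addressing
assert sxs.resolve(page) == guest_frame # the real frame did not move
```

In an identity-mapped space the logical page number would have to equal the real frame number, which is why the guest page previously had to be moved below 2G first.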
The CP code and most CP data structures can now reside above 2G. For
example, Frame Table Entries (FRMTEs) can now reside above 2G. These
can require considerable space because there is one 32-byte FRMTE for every 4K
frame of real storage. The most notable exception is the Page
Management Blocks (PGMBKs), which must still reside below 2G. These
are discussed further under Performance
Considerations.
This change effectively removes the 2G
line constraint and, as test measurements have
demonstrated, allows for effective utilization of real storage up to
the maximum 128 GB currently supported by z/VM.
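To put the FRMTE overhead in perspective, the arithmetic can be worked through directly. The sketch below uses only the figures given above (one 32-byte FRMTE per 4K frame, 128 GB maximum real storage) and assumes that GB means 2^30 bytes; the function name is made up for illustration.

```python
FRAME_SIZE = 4 * 1024   # bytes per real storage frame (4K, from the text)
FRMTE_SIZE = 32         # bytes per Frame Table Entry (from the text)
GIB = 1024 ** 3         # assumption: "GB" in the text means 2**30 bytes
MIB = 1024 ** 2


def frame_table_bytes(real_storage_bytes: int) -> int:
    """Bytes of FRMTEs needed to describe the given amount of real storage."""
    frames = real_storage_bytes // FRAME_SIZE
    return frames * FRMTE_SIZE


for size_gib in (2, 16, 128):
    table = frame_table_bytes(size_gib * GIB)
    print(f"{size_gib:>3} GB real storage -> {table // MIB} MB of FRMTEs")
```

Under these assumptions the frame table is 1/128 of real storage, which comes to 1 GB at the 128 GB maximum. That is half of the space below the 2G line, so being able to place FRMTEs above 2G matters.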
QDIO Enhanced Buffer State Management
The Queued Direct I/O (QDIO)
Enhanced Buffer State Management (QEBSM) facility provides
virtual machines running under z/VM an optimized mechanism for
transferring data via QDIO (including FCP, which uses QDIO).
Prior to this new facility, z/VM had
to act as an intermediary between the virtual machine and the adapter
during QDIO data transfers. With QEBSM, z/VM does not have to get
involved in a typical QDIO data transfer for operating systems or device
drivers that support the facility.
Starting with z/VM 5.2.0 and z990/890 with QEBSM Enablement
applied, a program running in a virtual machine has the
option of using QEBSM to manage QDIO buffers. By using QEBSM for
buffer management, the processor millicode can perform the shadow queue
processing typically performed by z/VM for a QDIO connection. This
eliminates the z/VM and hardware overhead associated with SIE entry and
exit for every QDIO data transfer. The shadow queue processing still
requires processor time, but much less than required when done by the
software. The net effect is a small increase in virtual CPU time
coupled with a much larger decrease in CP CPU time used on behalf of
the guest.
Measurement results show reductions in
total CPU usage ranging from 13% to 36%, resulting in throughput
improvements ranging from 0% to 50% for the measured QDIO, HiperSockets,
and FCP workloads.
Storage Management APARs
There are a number of improvements to the performance of CP's
storage management functions that have been made available to z/VM 5.1.0
(and, in some cases, z/VM 4.4.0) through the service stream. All of
these have been incorporated into z/VM 5.2.0.
VM63636 - This corrects a condition where an excessive number
of frames were being stolen for potential future use in satisfying
contiguous frame requests.
VM63729 - This applies to storage-constrained systems that have
more than 2 GB of main storage. Demand scan CPU usage is reduced by
bypassing the scan of a user's frames when that user does not
have any frames of the type required (below 2G or above 2G). This
service is also available on z/VM 4.4.0.
VM63730 - This applies to systems that support large amounts of
virtual storage. Contiguous frame management is improved, resulting
in reduced CPU usage and better scaling.
VM63752 - This applies to storage-constrained systems that have
more than 2G of main storage and that have expanded storage available for paging.
Performance is improved by making more extensive use of the above-2G main
storage when pages must be moved from the below-2G main storage. This
service is also available on z/VM 4.4.0.
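The demand scan bypass described for VM63729 amounts to a cheap filter applied before an expensive scan. A minimal sketch, assuming hypothetical per-user frame counts (the `User` record and the `users_to_scan` function are illustrative names, not CP structures):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class User:
    name: str
    frames_below_2g: int  # frames this user owns below the 2G line
    frames_above_2g: int  # frames this user owns above the 2G line


def users_to_scan(users: Iterable[User], need_below_2g: bool) -> Iterator[User]:
    """Yield only the users that own at least one frame of the needed type."""
    for user in users:
        owned = user.frames_below_2g if need_below_2g else user.frames_above_2g
        if owned == 0:
            continue  # bypass: scanning this user's frames cannot help
        yield user


users = [
    User("LINUX01", frames_below_2g=0, frames_above_2g=50_000),
    User("LINUX02", frames_below_2g=1_200, frames_above_2g=30_000),
    User("CMSUSER", frames_below_2g=300, frames_above_2g=0),
]

# Demand scan is looking for frames below 2G, so LINUX01 is skipped.
print([u.name for u in users_to_scan(users, need_below_2g=True)])
```

The saving comes from never walking the frame list of a user who cannot contribute a frame of the required type.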
Contiguous Frame Management Improvements
In addition to VM63636 and VM63730 discussed above, z/VM
5.2.0 includes other improvements that further reduce the time required
to search for contiguous real storage frames.
Extended Diagnose X'44' Fast Path
When running in a virtual machine with multiple virtual processors,
guest operating systems such as Linux use Diagnose x'44' to notify CP
whenever a process within that guest must wait for an MP spin lock.
This allows CP to dispatch any other virtual processors for that guest
that are ready to run.
Prior VM releases include a Diagnose x'44' fast path for the case
of virtual 2-way configurations. When the guest has no other work that
is waiting to run on its other virtual processor (either because it is
already running or it has nothing to do), the fast path applies and CP
does an immediate return to the guest. Normally, most Diagnose x'44' requests
qualify for the fast path. With z/VM 5.2.0, this fast path has been
extended so that it applies to any virtual multiprocessor
configuration.
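The fast path test can be sketched as a simple predicate. This is a toy model, not CP code: the state names and the function are hypothetical, and it assumes that the extended fast path generalizes the 2-way rule to "no other virtual processor of the guest is waiting to run".

```python
from enum import Enum
from typing import Iterable


class VcpuState(Enum):
    RUNNING = "running"  # already dispatched on a real processor
    IDLE = "idle"        # has nothing to do
    READY = "ready"      # has work and is waiting to be dispatched


def diag44_fast_path(other_vcpus: Iterable[VcpuState]) -> bool:
    """True when CP could return to the guest immediately.

    The fast path applies when none of the guest's other virtual
    processors is waiting to run, so giving up the processor would
    not help any of them.
    """
    return all(state is not VcpuState.READY for state in other_vcpus)


# Virtual 4-way: the issuing processor's three peers are running or idle.
print(diag44_fast_path([VcpuState.RUNNING, VcpuState.IDLE, VcpuState.RUNNING]))

# One peer is waiting to run, so the normal (slower) path is taken.
print(diag44_fast_path([VcpuState.RUNNING, VcpuState.READY]))
```

With a single-element list this reduces to the virtual 2-way case handled by prior releases.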
The fast path improves guest throughput by reducing the average
delay between the time that the guest lock becomes available and the
time the delayed guest process resumes execution. It also reduces the
load on CP's scheduler lock, which can improve overall system
performance in cases where there is high contention for that lock.
See Extended Diagnose X'44' Fast Path for
further discussion and measurement results.
Dispatcher Improvements
APAR VM63687, available on z/VM 5.1.0, fixes a dispatcher problem that
can result in long delays right after IPL of a multiprocessor z/VM
system while a CP background task is completing initialization of
central storage above 2 GB. This fix has been integrated into z/VM
5.2.0.
The dispatcher was changed to reduce the amount of non-master
work that gets assigned to the master processor, thus allowing it to
handle more master-only work. This can result in improved throughput
and processing efficiency for workloads that cause execution in
master-only CP modules.
Fast Steal from Idling Virtual Machines
When a virtual machine becomes completely idle (uses no CPU),
it quickly gets moved to the dormant list and, if central storage is
constrained, its frames are quickly stolen and made available for
satisfying other paging requests. A CMS virtual machine is a good
example of this case. Most virtual machines that run servers and
guest operating systems, however, do not go completely idle when they
run out of work. Instead, they typically enter a timer-driven polling
loop. From CP's perspective, such a virtual machine is still active and
frames are stolen from it just like any other active virtual machine.
This is based on the frames' hardware reference bit settings, which are
tested and reset each time that virtual machine's frames are examined
by CP's reorder background task.
Prior to z/VM 5.2.0, active virtual machines that used little
CPU were infrequently reordered. As a result, they tended to keep
their frames for a long time even if the system was storage
constrained. With z/VM 5.2.0, the reorder task is run more frequently
for such virtual machines so that their frames can be stolen more
quickly when needed. This is done by basing reorder frequency on how
much CPU time is made available to a virtual machine instead of the
amount of CPU time it actually consumes. This change can result in a
significant reduction in total paging for storage-constrained systems
where a significant proportion of real storage is used by guest/server
virtual machines that frequently cycle between periods of idling and
activity.
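The effect of changing the basis for reorder frequency can be illustrated with a toy calculation. Everything here is hypothetical: the quantum value, the function name, and the idea that one reorder happens per fixed quantum of accumulated CPU time. The sketch only contrasts counting CPU time consumed with counting CPU time made available.

```python
REORDER_QUANTUM = 2.0  # hypothetical: CPU seconds accumulated per reorder


def reorders(cpu_seconds: float) -> int:
    """Number of reorders triggered after accumulating cpu_seconds."""
    return int(cpu_seconds // REORDER_QUANTUM)


# A polling guest over some interval: it was offered 60 s of CPU time
# but only used 0.5 s of it in its timer-driven loop.
cpu_available = 60.0
cpu_consumed = 0.5

old_basis = reorders(cpu_consumed)   # prior releases: CPU time consumed
new_basis = reorders(cpu_available)  # z/VM 5.2.0: CPU time made available

print(f"reorders on consumed basis:  {old_basis}")
print(f"reorders on available basis: {new_basis}")
```

On the consumed basis the mostly idle guest is never reordered in this interval, so its reference bits are not tested and its frames look permanently in use. On the available basis it is reordered regularly, which lets unreferenced frames be identified and stolen.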
Reduced Page Reorder Processing for High-CPU Virtual Machines
The improvement described above also reduces how frequently the
frames of high-CPU usage virtual machines are reordered, thus reducing
system CPU usage. In prior releases, such virtual machines were being
reordered more frequently than necessary.
Improved SCSI Disk Performance
z/VM-owned support for SCSI disks (via FBA emulation) was introduced
in z/VM 5.1.0. Since then, improvements have been made that
reduce the amount of processing time required by this support. One
of these improvements is available on z/VM 5.1.0 as APARs
VM63725 and VM63534 (as part of DS8000 support) and is now integrated
into z/VM 5.2.0. It can greatly improve the performance of I/O to
minidisks on emulated FBA on SCSI devices for CMS or any other guest
application that uses I/O buffers that are not 512-byte aligned. Total
CPU time decreased 10-fold for the Xedit read workload used to evaluate
this improvement.
Additional changes to the SCSI code in z/VM 5.2.0 have improved the
efficiency of CP paging to SCSI devices. Measurement results show a
14% decrease in CP CPU time per page read/written for an example
workload and configuration.
For further information on both of these improvements, see
CP Disk I/O Performance.
VM Resource Manager Cooperative Memory Management
VM Resource Manager Cooperative Memory Management (VMRM-CMM) can be
used to help manage total system memory constraint in a z/VM system.
Based on several variables obtained from the System and Storage domain
CP monitor data, VMRM
detects when there is such constraint and asks the Linux guests to
reduce their use of virtual memory. The guests can then take appropriate
action to reduce their memory utilization in order to relieve this
constraint on the system. When the system constraint goes away, VMRM
notifies those guests that more memory can now be used. For more
information, see
VMRM-CMM.
For measurement evaluation results, see Memory
Management: VMRM-CMM and CMMA.
Improved DirMaint Performance
Changes to an existing directory statement can, in many cases, be
put online very quickly using Diagnose x'84' -
Directory-Update-In-Place. However, it is not possible to add a new
directory statement or delete an existing directory statement using
Diagnose x'84'. In such cases, prior to z/VM 5.2.0, DirMaint had to
run the DIRECTXA command against the full source directory to rebuild
the entire object directory. For large directories, this can be quite
time-consuming.
In z/VM 5.2.0, CP and DirMaint have been updated to allow virtual
machine directory entries to be added, deleted, or updated without
having to rewrite the entire object directory. Instead, this is done
by processing a small subset of the source directory. This can
substantially improve DirMaint responsiveness when the directory is
large.