CP Regression Measurements
This section summarizes z/VM 5.1.0 to z/VM 5.2.0 performance comparison results for workloads that do not have a 2G storage constraint in z/VM 5.1.0. Results for workloads that are constrained in z/VM 5.1.0 are presented in the Enhanced Large Real Storage Exploitation section.
Factors that most affect the performance results include:
- The CP virtual address space no longer has a direct identity mapping to real storage
- The CP control block that maps real storage (frame table) was moved above 2G
- Some additional CP modules were changed to 64-bit addressing
- Some CP modules (including those involved in Linux virtual I/Os) were changed to use access registers to reference guest pages that reside above 2G
What workloads have a below-2G constraint?
The effects of these changes depend heavily on workload characteristics. Workloads currently constrained for page frames below the 2G line should benefit in proportion to the degree of constraint.
Systems with 2G or less of real storage receive no benefit from these enhancements and may experience some reduction in performance from this support.
Systems with 2G to 4G of real storage need careful evaluation to decide if they will receive a performance benefit or a performance decrease. These environments can have more available storage below 2G than above 2G, which can lead to a constraint for above-2G frames. Storage-constrained workloads can see an increase in the demand scan processing when the number of above-2G frames is not larger than the number of below-2G frames.
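The frame arithmetic behind the 2G-to-4G caution can be sketched as follows. This is a minimal illustration, assuming 4 KB page frames and that "G" means 2^30 bytes; the function names are hypothetical, and the sketch ignores any frames CP reserves for its own use.

```python
PAGE_SIZE = 4096          # bytes per page frame (assumption: 4 KB frames)
GIB = 1024 ** 3           # assumption: "G" in the text means 2**30 bytes
TWO_G_LINE = 2 * GIB


def frames_by_region(real_storage_bytes):
    """Return (below_2g_frames, above_2g_frames) for a real storage size."""
    below = min(real_storage_bytes, TWO_G_LINE) // PAGE_SIZE
    above = max(real_storage_bytes - TWO_G_LINE, 0) // PAGE_SIZE
    return below, above


def above_not_larger_than_below(real_storage_bytes):
    """True when above-2G frames do not outnumber below-2G frames."""
    below, above = frames_by_region(real_storage_bytes)
    return above <= below


if __name__ == "__main__":
    for size_g in (2, 3, 4, 5):
        below, above = frames_by_region(size_g * GIB)
        flag = above_not_larger_than_below(size_g * GIB)
        print(f"{size_g}G: below={below} above={above} above<=below: {flag}")
```

Under these assumptions a 3G system has twice as many frames below 2G as above, while a 5G system has more frames above 2G than below.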
At least one of the following characteristics must be present in a pre-z/VM-5.2.0 system to receive a benefit from the z/VM 5.2.0 large real storage enhancements:
- A high below-2G paging rate
- A high expanded storage (Xstor) and/or DASD paging rate combined with a large number of available frames above the 2G line
A high expanded storage and/or DASD paging rate with a low below-2G paging rate and full utilization of the frames above the 2G line may indicate a storage-constrained workload, but not a below-2G-constrained workload.
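The criteria above can be expressed as a rough decision sketch. The thresholds `high_rate` and `full_fraction` are hypothetical placeholders, since the text does not define what counts as a "high" paging rate or "full" utilization; the function name and return strings are also illustrative only.

```python
def classify_constraint(below_2g_page_rate, xstor_dasd_page_rate,
                        above_2g_resident, above_2g_total,
                        high_rate=1000.0, full_fraction=0.9):
    """Classify a pre-5.2.0 workload using the criteria in the text.

    high_rate and full_fraction are assumed thresholds, not values
    taken from the source.
    """
    if above_2g_total == 0:
        # Systems with 2G or less of real storage receive no benefit.
        return "no benefit expected"
    if below_2g_page_rate >= high_rate:
        return "below-2G-constrained"
    if xstor_dasd_page_rate >= high_rate:
        utilization = above_2g_resident / above_2g_total
        if utilization < full_fraction:
            # High paging while many above-2G frames sit unused.
            return "below-2G-constrained"
        # High paging with the above-2G frames fully used.
        return "storage-constrained"
    return "no constraint indicated"
```

For instance, a high Xstor paging rate with roughly 97% of the above-2G frames resident would be classified as storage-constrained, while the same rate with roughly 11% resident would be classified as below-2G-constrained.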
The Apache workload was used to create a 3G workload that was below-2G-constrained and a 3G workload that was storage-constrained. The below-2G-constrained workload received a benefit from z/VM 5.2.0 but the storage-constrained workload showed a performance decrease.
The following table compares the characteristics of these two workloads, including some measurement data from z/VM 5.1.0 in a 3-way LPAR on a 2064-116 system.
|Workload||Storage-constrained||Below-2G-constrained|
|Server virtual machines||12||3|
|Number of Files||600||5000|
|Location of the file||Linux cache||Xstor MDC|
|Below-2G Page Rate||28||83|
|Xstor Total Rate||26170||29327|
|Resident Pages above 2G||253692||28373|
|Resident Pages below 2G||495504||503540|
|z/VM 5.1.0; 2064-116; 3 dedicated processors, 3G central storage, 8G expanded storage; 1024M server virtual storage size; 1 megabyte URL file size; 1 server virtual processor; 2 client virtual machines; 1 client virtual processor; 1 client connection per server|
z/VM 5.2.0 results for the storage-constrained workload are discussed later in this section. z/VM 5.2.0 results for the below-2G-constrained workload are discussed in the Enhanced Large Real Storage Exploitation section.
Both are measured with 3G of real storage, 8G of expanded storage, 2 AWM client virtual machines, and URL files of 1 megabyte. Both have low below-2G paging rates, high expanded storage paging rates, and no DASD paging. However, the storage-constrained workload is utilizing all the frames above the 2G line while the below-2G-constrained workload is utilizing a very small percentage of the above-2G frames.
For this experiment, location of the files is the primary controlling factor for the results, and location is controlled by the number of files and server virtual storage size.
For the storage-constrained workload, all 600 files reside in the Linux page cache of all 12 servers. Retrieving URL files from these Linux page caches does not require CP to move pages below the 2G line.
For the below-2G-constrained workload, most of the 5000 files reside in z/VM's expanded storage minidisk cache, because fewer than 1000 fit in any one server's page cache and files not in that cache must be read by each Linux server. All page frames related to these Linux I/Os are required to be below 2G. Since the majority of page frames are in this category, above-2G storage frames are not fully utilized.
Regression Workload Expectations
Most workloads that are not 2G-constrained will show an increase in CP CPU time because of the following factors.
- Address translation, including CCW translation, is more costly since CP is no longer identity-mapped to real storage
- Module linkage is more costly because of the mode switches between 31-bit and 64-bit
- Saving and restoring status, including access registers, is more costly with 64-bit addresses
- Trace table pages fill up faster because trace table entries have become larger due to the presence of 64-bit addresses instead of 31-bit addresses. This results in an increase in trace-table-full interrupts
Some performance improvements partially offset these factors. Each specific combination of the CP regression factors and the offsetting improvements will cause unique regression results. Workloads that concentrate on one particular CP service can experience a significantly different performance impact than comprehensive workloads. Storage-constrained workloads in real storage sizes where the number of above-2G frames is not larger than the number of below-2G frames also show higher regression ratios than the more comprehensive workloads.
Virtual time should not be directly affected by these CP changes. Exceptions will be discussed in the detailed sections. Virtual time can be indirectly affected by uncontrollable factors such as hardware cache misses and timer-related activities.
Transaction rate is affected by a number of factors. Workloads currently limited by think time that do not fully utilize the processor capacity generally show very little change. Workloads that currently run at 100% of processor capacity will generally see a decrease in the transaction rate that is proportional to the increase in CPU time per transaction. Workloads currently limited by virtual MP locking may see an increased transaction rate because the Diagnose X'44' fast path can reduce the time it takes to dispatch another virtual processor once the lock is freed.
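For the 100%-utilization case, the relationship can be sketched with simple arithmetic. This assumes processor capacity is fixed and fully consumed both before and after the change, so the transaction rate scales with the reciprocal of CPU time per transaction; the function name is hypothetical.

```python
def etr_change_at_full_utilization(cpu_per_tx_increase_pct):
    """Estimate the percent change in transaction rate when the processor
    is saturated and CPU time per transaction rises by the given percent.

    Assumes all capacity is used before and after, so
    rate_new / rate_old = 1 / (1 + increase).
    """
    ratio = 1.0 + cpu_per_tx_increase_pct / 100.0
    return (1.0 / ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Sample inputs; any percent increase can be substituted.
    for increase in (1.7, 11.7):
        change = etr_change_at_full_utilization(increase)
        print(f"+{increase}% CPU/tx -> {change:.2f}% transaction rate")
```

For small increases the decrease is close to the increase itself, which is why the two are described as roughly proportional.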
The following table provides an overall summary of the regression workloads. Values are provided for the transaction rate, total microseconds (µsec) per transaction, CP µsec per transaction, and virtual µsec per transaction. All values are expressed as the percentage change between z/VM 5.1.0 and z/VM 5.2.0. The meaning of "transaction" varies greatly from one workload to another. More details about each individual workload follow the table.
|Workload||ETR||Total CPU/tx||CP CPU/tx||Virtual CPU/tx|
|Apache nonpaging (2064-116)||1.9||4.9||1.7||6.4|
|Apache nonpaging (2094-738)||-1.3||1.7||-0.3||2.3|
|Apache paging Xstor (3G)||-11||11||22||3.5|
|Apache paging mixed (5G)||2.7||2.8||6.2||0.0|
|VSE Guest (PACE)||0.0||2.2||12||0.0|
|Values are expressed as a percentage change from z/VM 5.1.0. Positive numbers in the ETR field mean that z/VM 5.2.0 had a higher transaction rate than z/VM 5.1.0. Positive numbers in the CPU/tx field mean that z/VM 5.2.0 required more processor time than z/VM 5.1.0.|
The Apache nonpaging regression measurements were completed in a 2064-116 LPAR with 9 dedicated processors, 30G of real storage, and 1G of expanded storage. They were also completed in a 2094-738 LPAR with 4 dedicated processors, 5G of real storage, and 4G of expanded storage. z/VM 5.2.0 regression characteristics were better on the 2094-738 than on the 2064-116.
Since these measurements use a 3-way client virtual machine, z/VM 5.2.0 receives some benefit from the Diagnose X'44' (DIAG 44) improvement described in the Extended Diagnose X'44' Fast Path section.
The following table contains selected measurement results for the 2094-738 measurement.
|z/VM Release||5.1.0||5.2.0|
|DIAG 44/sec - Normal||24514||3511|
|DIAG 44/sec - Fast Path||0||42368|
|DIAG 44/sec - Total||24514||45879|
|Resident Pages above 2G||334800||377825|
|Resident Pages below 2G||70675||475|
|Steady-State CPU Util||97.8||98.1|
|2094-738; 4 dedicated processors, 5G real storage, 4G expanded storage; 10 connections|
This is a good regression example of a nonpaging z/VM 5.1.0 MP guest with more than 2 virtual processors. Although the configuration has 5G of real storage, the resident page count shows that less than 2G is needed for this workload.
The normal-path DIAG 44 rate decreased by 85%, while the DIAG 44 fast-path rate increased from zero to a rate that caused an 87% increase in the overall virtual DIAG 44 rate. This causes a shift of processor time from CP to the virtual machine, resulting in a decrease in CP µsec per transaction but an increase in virtual µsec per transaction. Total µsec per transaction increased by 1.7% and since the base processor utilization was nearly 100%, transaction rate decreased by a similar amount.
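The rate changes quoted above follow from the DIAG 44 rows of the 2094-738 table. A minimal sketch of that arithmetic, using a hypothetical `percent_change` helper and the table values as input:

```python
def percent_change(old, new):
    """Percent change from old to new; None when the base value is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


# (z/VM 5.1.0, z/VM 5.2.0) DIAG 44 rates per second from the table.
diag44_rates = {
    "normal": (24514, 3511),
    "fast path": (0, 42368),
    "total": (24514, 45879),
}

if __name__ == "__main__":
    for path, (old, new) in diag44_rates.items():
        change = percent_change(old, new)
        if change is None:
            print(f"{path}: {old} -> {new} (no base rate to compare)")
        else:
            print(f"{path}: {old} -> {new} ({change:+.1f}%)")
```

The fast-path row has no percentage because its base rate on z/VM 5.1.0 is zero.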
Although no run data are included for the 2064-116 measurement, it showed a slight improvement in transaction rate because there were enough idle processor cycles to absorb the increase in total µsec per transaction.
The Apache workload was used to create a z/VM 5.1.0 storage-constrained workload that was measured in different paging configurations. The following table contains the Apache workload parameter settings.
|Real storage||3G||5G|
|Server virtual machines||12||12|
|Client virtual machines||2||2|
|Client connections per server||1||1|
|Number of 1M files||600||800|
|Location of the files||Linux cache||Linux cache|
|Server virtual storage||1024M||1024M|
|Server virtual processors||1||1|
|Client virtual processors||1||1|
The specific paging environment was controlled by the following Xstor values.
- 8G for Xstor paging
- 4G for mixed paging
- 0 for DASD paging
The following table contains selected results between z/VM 5.2.0 and z/VM 5.1.0 for the 3G Xstor paging measurements.
|z/VM Release||5.1.0||5.2.0|
|Below-2G Page Rate||28||0|
|Xstor Total Rate||26170||22072|
|Resident Pages above 2G||253692||247390|
|Resident Pages below 2G||495504||501020|
|Steady-State CPU Util||99.8||99.8|
|2064-116; 3 dedicated processors, 3G central storage, 8G expanded storage; 24 connections|
This workload shows a higher increase in CP µsec per transaction than any other measurement in the regression summary table. This is a storage-constrained workload in a configuration where the number of above-2G pages is not larger than the number of below-2G pages. This causes a large increase in the demand scan activity. The reason for this increase is still under investigation. It also shows an increase in virtual µsec per transaction. Total µsec per transaction increased by 11.7% and since the base processor utilization was nearly 100%, there was a corresponding decrease in the transaction rate.
Although the following 5G Apache paging workload is just as storage-constrained and has just as high a paging rate, it does not have the increased demand scan activity because the number of above-2G pages is much larger than the number of below-2G pages.
Although no run data are included for the 3G mixed paging and DASD paging measurements, they show similar characteristics in CP µsec per transaction and virtual µsec per transaction but show a smaller decrease in transaction rate because they are not limited by 100% processor utilization.
The following table contains selected results between z/VM 5.2.0 and z/VM 5.1.0 for the 5G mixed paging measurements.
|z/VM Release||5.1.0||5.2.0|
|Below-2G Page Rate||29||0|
|Xstor Total Rate||13787||16556|
|Xstor Migr Rate||1463||2615|
|Pages Read from DASD||3749||2724|
|Pages Written to DASD||3450||2602|
|Resident Pages above 2G||778626||767286|
|Resident Pages below 2G||489267||497691|
|2064-116; 3 dedicated processors, 5G central storage, 4G expanded storage; 24 connections|
The 5G measurement showed some different characteristics from the 3G measurements. The transaction rate increased 2.7% instead of decreasing. Virtual µsec per transaction remained nearly identical instead of increasing. Both CP µsec per transaction and total µsec per transaction increased by a smaller percentage. These results demonstrate the expected regression characteristics of z/VM 5.2.0 compared to z/VM 5.1.0 for this workload.
These Apache measurements use guest LAN QDIO connectivity which contains one of the offsetting improvements. Results with vSwitch, real QDIO, or other connectivity methods may show a higher increase in CP µsec per transaction.
All disk I/O is avoided in the Apache measurements because the URL files are preloaded in either a z/VM expanded storage minidisk cache or the Linux page cache. Three separate disk I/O workloads, each exercising very specific system functions, are discussed in CP Disk I/O Performance.
The minidisk version of the CMS1 workload described in CMS-Intensive (CMS1) was measured in a 2064-116 LPAR with 2 dedicated processors, 1G of real storage, and 2G of expanded storage. Results demonstrate expected regression characteristics of z/VM 5.2.0 compared to z/VM 5.1.0.
The PACE workload described in VSE Guest (DYNAPACE) was measured in a 2064-116 LPAR with 2 dedicated processors, 1G of real storage and no expanded storage. Results demonstrate expected regression characteristics of z/VM 5.2.0 compared to z/VM 5.1.0.