z/VM ITR Scaling, z9 to z10

The purpose of this document is to discuss the factors that affect the scaling of z/VM workloads from the z9 to the z10, the data available for detecting these factors, and how to estimate a scaling ratio from these factors.

At the time this document was created, the z/VM LSPR ITR Ratios for IBM Processors showed a z/VM scaling ratio from the z9 to the z10 ranging from 1.63 on a 1-way to 1.30 on a 16-way. These LSPR measurements were completed on dedicated systems, and real storage was increased in the same ratio as the number of processors.

The scaling ratios discussed in this document were measured in dedicated LPARs, but not on fully dedicated systems, and are thus subject to a variable amount of interference from other LPARs for shared resources. The workloads discussed in this document have widely varying characteristics and demonstrate a number of factors that affect the expected scaling ratio between the z9 and z10. Most ratios fall within the range published in the LSPR data, and this document shows workload characteristics that fall in various portions of this range. Workloads that scaled better than the LSPR range were single-guest, compute-intensive workloads with very little storage access and no storage overcommitment. Workloads that scaled worse than the LSPR range contained many of the documented scaling factors, and all had storage overcommitment.

Here is a summary of scaling ranges and the workload factors that fall into each range; a sketch of how the table might be applied follows it.

Scaling Range   Scaling Factors
Above 1.6       Uniprocessor, compute-intensive, with a very small storage reference pattern and very few system interactions (low T/V ratio)
1.5 to 1.6      MP without storage overcommitment and with small data movement, or uniprocessor without storage overcommitment but with increased data movement
1.4 to 1.5      Small number of processors (fewer than 8) with small storage overcommitment and limited data movement
1.3 to 1.4      MP workload with mild storage overcommitment, average storage references, and average system interactions
Below 1.3       MP workloads with high storage overcommitment that causes high storage management activity
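
As a rough illustration of how this table might be applied, the following Python sketch encodes the ranges as a heuristic. The factor names, classifications, and decision order are our own labels for the workload characteristics above; they are not Performance Toolkit fields or an official estimation method.

    # A minimal sketch of the summary table as a heuristic.  Factor names
    # and classifications ("low"/"average"/"high") are illustrative only.

    def estimate_scaling_range(n_cpus, overcommit, data_movement,
                               storage_refs, system_interactions):
        """Return an expected z9-to-z10 scaling range as (low, high);
        None marks an open end.  overcommit is "none", "mild", or "high"."""
        if (n_cpus == 1 and overcommit == "none"
                and storage_refs == "low" and system_interactions == "low"):
            return (1.6, None)      # above 1.6: compute-intensive 1-way
        if overcommit == "none" and (data_movement == "low" or n_cpus == 1):
            return (1.5, 1.6)       # no overcommitment, small data movement
        if n_cpus < 8 and overcommit == "mild" and data_movement == "low":
            return (1.4, 1.5)       # few CPUs, small overcommitment
        if overcommit == "mild":
            return (1.3, 1.4)       # MP with mild overcommitment
        return (None, 1.3)          # high overcommitment

    # Example: an 8-way workload with mild overcommitment, average behavior
    print(estimate_scaling_range(8, "mild", "average", "average", "average"))
    # -> (1.3, 1.4)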

The following sections provide more detail about the factors known to affect the scaling ratios. The example used for any individual scaling factor may also be affected by other scaling factors; in these cases, no attempt was made to quantify the exact effect of each scaling factor.

Number of Processors

Everything else being equal, the scaling ratio varies inversely with the number of processors. This was validated by measuring the same workload with different numbers of processors. Unlike in the LSPR measurements, storage reference was not increased between the 2-way and the 8-way measurements. The number of processors is available as "CPU online" on the SYSSUMLG Performance Toolkit for VM report.

Workload               Number of Processors   z9 to z10 Scaling Ratio
Apache CPU-intensive   2                      1.43
Apache CPU-intensive   8                      1.41
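
For an n-way configuration between two measured points, a simple linear interpolation gives a rough estimate, as in the sketch below. This is our own approximation, not an LSPR or Performance Toolkit method.

    def interpolate_ratio(n, points):
        """Linearly interpolate a scaling ratio for an n-way configuration
        from measured (n_cpus, ratio) points; a rough approximation only."""
        points = sorted(points)
        for (n0, r0), (n1, r1) in zip(points, points[1:]):
            if n0 <= n <= n1:
                return r0 + (r1 - r0) * (n - n0) / (n1 - n0)
        raise ValueError("n-way outside the measured range")

    # Using the two Apache CPU-intensive measurements above:
    print(round(interpolate_ratio(4, [(2, 1.43), (8, 1.41)]), 3))  # ~1.423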

Storage References

Everything else being equal, the scaling ratio varies inversely with the amount of storage that is referenced. This was validated by varying the number of URL files used for a measurement, the virtual storage size of the servers, and the number of servers. For our workload, the amount of storage being referenced is calculated from the >System< userid information on the UPAGE Performance Toolkit for VM report using the formula (Nr of Users) * ("R<2GB" + "R>2GB"). If not all users are included in this report, an alternate method is required.

Workload   User Resident Pages   z9 to z10 Scaling Ratio
Apache     5,971,948             1.41
Apache     39,317,553            1.23
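
The formula above is straightforward to compute once the >System< values have been extracted from the UPAGE report. A minimal sketch follows; the variable names are ours, not Toolkit column headings, and the input values are made up (chosen to land near the first Apache row above).

    def referenced_storage_pages(nr_of_users, r_below_2gb, r_above_2gb):
        """Referenced storage per the formula above:
        (Nr of Users) * ("R<2GB" + "R>2GB"), in resident pages."""
        return nr_of_users * (r_below_2gb + r_above_2gb)

    # Made-up values: 12 users averaging about 497,662 resident pages each
    pages = referenced_storage_pages(12, 400_000, 97_662)
    print(pages, "pages, about", round(pages * 4096 / 2**30, 1), "GB")  # 4 KB pages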

Data Movement

Everything else being equal, the scaling ratio varies inversely with the amount of data movement. This was validated by measuring our AWM to Apache application using 1 MB files versus two small files (10 KB and 20 KB). Moving large amounts of data generally requires references to real storage. Performance Toolkit provides information about data moved across FICON channels, across virtual networks, and through various communication services, but not all data movement can be determined from this information.

Workload                            Average File Size   z9 to z10 Scaling Ratio
Apache (2 clients and 12 servers)   15 KB               1.53
Apache (2 clients and 12 servers)   1024 KB             1.31
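
Because not all moved data is visible in the Toolkit reports, the data served per second can only be approximated; one rough estimate, sketched below with made-up numbers, is average file size times request rate.

    def data_moved_mb_per_sec(avg_file_kb, requests_per_sec):
        """Rough estimate of data served per second from average file size
        and request rate; an approximation only, since not all moved data
        is visible in the Toolkit reports."""
        return avg_file_kb * requests_per_sec / 1024

    # Illustrative only: 1024 KB files at 300 requests/sec -> 300 MB/sec
    print(data_moved_mb_per_sec(1024, 300))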

Virtual I/O to Real Devices

Everything else being equal, the scaling ratio varies inversely with the amount of I/O that is issued to real devices. This was validated by changing the amount of data that could be cached in the Linux file caches. The method used to create virtual I/O also reduced the amount of referenced storage, which should have improved the scaling ratio and thus offset some of the effect of the virtual I/Os. The virtual I/O rate is obtained from "Virtual I/O rate" on the CPU Performance Toolkit for VM report.

Workload                          Virtual I/Os per Second   z9 to z10 Scaling Ratio
Apache (1.5 GB virtual servers)   26                        1.41
Apache (256 MB virtual servers)   515                       1.35

Searches

Everything else being equal, the scaling ratio varies inversely with the amount of storage that is referenced by long searches. Although Performance Toolkit for VM reports do not provide search lengths for most storage services, information is available that implies the length of certain storage management searches, so one of these was selected to evaluate the scaling ratio of storage searches. The specific metric used for validation was the "Emergency Scan-Page Frames" counter from the DEMNDLOG Performance Toolkit for VM report. Application searches may have a higher scaling ratio than the storage management searches because the storage management searches involve storage key manipulations. Everything else being equal, the scaling ratio varies inversely with the number of storage key operations.

Workload                            Emergency Scan Frames per Second   z9 to z10 Scaling Ratio
Apache (2 clients and 12 servers)   38,055                             1.53
Apache (2 clients and 12 servers)   12,000,000                         1.31

Storage Overcommitment

Everything else being equal, the scaling ratio varies inversely with the amount of storage overcommitment. This was validated by measuring the same workload in different real storage sizes. Performance Toolkit provides extensive information about storage overcommitment, including the paging information on the UPAGE report and the demand scan counters on the DEMNDLOG report discussed above.

Workload          Storage Size   z9 to z10 Scaling Ratio
CMMA 64 servers   6 GB           1.32
CMMA 64 servers   3 GB           1.23
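
A common way to quantify storage overcommitment is the ratio of total virtual storage defined to real storage available. The sketch below uses this rule of thumb; the 256 MB virtual machine size per server is an assumption for illustration, not a value taken from the measurement data.

    def overcommit_ratio(total_virtual_gb, real_storage_gb):
        """Virtual-to-real storage ratio; values above 1.0 indicate
        storage overcommitment (a rule of thumb, not a Toolkit metric)."""
        return total_virtual_gb / real_storage_gb

    # Assumed: 64 servers at 256 MB virtual each = 16 GB total virtual
    virtual_gb = 64 * 256 / 1024
    for real_gb in (6, 3):      # the two real storage sizes measured above
        print(f"{real_gb} GB real: {overcommit_ratio(virtual_gb, real_gb):.1f}:1")
    # 6 GB real: 2.7:1;  3 GB real: 5.3:1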

Last revised March 18, 2008 (Virg)