
4 TB Real Memory Support

Abstract

With APAR VM66173, z/VM supports a maximum of 4 TB of real memory. To validate memory scaling, IBM ran selected microbenchmarks that stressed memory management. Workloads that exhibited hardware characteristics similar to LSPR workloads scaled well. Workloads that did not scale exhibited higher memory pressure, which was expected once configurations grew large enough to span multiple drawers.

The best way to predict whether your workload will scale is to collect, reduce, and evaluate CPU MF counters. Workloads whose memory references are satisfied largely within the cache hierarchy (L1 to L4) and that have low to moderate MEMP can reasonably be expected to scale.

For information on how to collect CPU MF data, see How To Collect CPU MF Counters on z/VM. For information on how to reduce and interpret CPU MF data, see Using CPU Measurement Facility Host Counters.

Method

To determine whether z/VM scales to 4 TB of memory, IBM ran workloads that stressed CP as memory was increased, with a focus on memory management.

The ability of a workload to scale is not controlled solely by whether z/VM is algorithmically correct. It is also controlled by the workload's memory reference behavior, especially when the partition crosses drawers. For more information on how to compare a production workload to IBM LSPR workloads, see the LSPR article Relating production workloads to LSPR workloads. For a general explanation of scaling z/VM workloads, see the Eighty Logical Processors article.

The studies presented here were done with several workloads. All measurements were collected on an 8561-T01. The partition was always dedicated, with all IFL processors, and no other partitions were ever activated. The number of processors was varied according to the needs of the particular workload being run. The amount of memory was varied from 512 GB or 1 TB up to 4 TB.

PTOUCH Workload

The PTOUCH workload consists of multiple instances of the PFAULT Linux application, which uses processor cycles and randomly references memory, thereby constantly changing memory pages. The workload parameters were adjusted to produce a maximum paging rate. Configurations measured ranged from 512 GB with 5 IFL cores to 4 TB with 40 IFL cores.
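The kind of random page-touching behavior that PFAULT exercises can be illustrated with a minimal sketch. This is a stand-in written for illustration, not IBM's PFAULT source (which is not shown here); the function name and the small 1 MB buffer are this sketch's own choices, scaled down from the gigabytes a real run would use.

```python
import random

PAGE_SIZE = 4096  # 4 KB pages

def touch_random_pages(buf, touches, rng=None):
    """Write one byte into randomly chosen pages of buf, so each
    touched page becomes resident and dirty. When the touched
    working set exceeds real memory, this pattern forces paging."""
    rng = rng or random.Random(0)
    pages = len(buf) // PAGE_SIZE
    for _ in range(touches):
        page = rng.randrange(pages)
        buf[page * PAGE_SIZE] = (buf[page * PAGE_SIZE] + 1) % 256
    return pages

# Small in-process stand-in: 256 pages (1 MB) instead of gigabytes.
buf = bytearray(256 * PAGE_SIZE)
touch_random_pages(buf, 1000)
```

Because the page choices are uniformly random, the touched set has essentially no locality, which is what keeps the paging rate high.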

Sweet Spot Non-Paging Workload

The Sweet Spot Non-Paging workload consists of multiple instances of the Virtual Storage Exerciser (VIRSTOR) application, which uses processor cycles and randomly references memory, thereby constantly changing memory pages. The workload parameters were adjusted to produce a steady CPU utilization with no paging. Configurations measured ranged from 1 TB with 10 IFL cores to 4 TB with 40 IFL cores.

Sweet Spot Paging Workload

The Sweet Spot Paging workload consists of multiple instances of the Virtual Storage Exerciser (VIRSTOR) application, which uses processor cycles and randomly references memory, thereby constantly changing memory pages. The workload parameters were adjusted to produce a steady CPU utilization, a steady paging rate, and a memory overcommitment of about 28%. Configurations measured ranged from 1 TB with 10 IFL cores to 4 TB with 40 IFL cores.
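The overcommitment arithmetic can be sketched briefly, assuming overcommitment is expressed as the percentage by which instantiated guest virtual memory exceeds real memory; the helper name is illustrative, not part of any z/VM tooling.

```python
def overcommit_percent(virtual_gb, real_gb):
    """Percent by which instantiated virtual memory exceeds real memory."""
    return 100.0 * (virtual_gb - real_gb) / real_gb

# Under this definition, about 28% overcommitment on a 4 TB (4096 GB)
# partition implies roughly 4096 * 1.28 = 5243 GB of instantiated
# virtual memory.
print(overcommit_percent(5243, 4096))  # about 28
```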

CM4 Priming Workload

The CM4 Priming workload is a subset of the Sweet Spot Paging workload. It consists of multiple instances of the Virtual Storage Exerciser (VIRSTOR). The workload parameters were adjusted to produce a maximum paging rate when instantiating a memory overcommitment of about 28%. Configurations measured ranged from 1 TB with 10 IFL cores to 4 TB with 40 IFL cores.

Apache Tiled Workload

The Apache Tiled workload consisted of a number of small groups of client guests and server guests, the members of each group connected to one another through a group-specific vswitch; each such group is called a tile. Each client guest sent HTTP transactions to all the server guests on its tile. To ramp up the workload, IBM added tiles, processors, and central memory. Non-SMT configurations measured ranged from 1 TB with 20 IFL cores to 4 TB with 80 IFL cores. SMT-2 configurations measured ranged from 1 TB with 20 IFL cores to 2 TB with 40 IFL cores.

Results and Discussion

Table 1 summarizes the workloads run and whether each scaled.

Table 1. Workload Scaling Results.

  Workload                SMT-2   L1MP (%)     RNI         Hint        MEMP (%)     Scale?
  PTOUCH                  No      about 3.0    0.5 - 1.0   LOW - AVG   about 1.0    yes
  PTOUCH                  Yes     about 3.2    0.4 - 0.9   LOW - AVG   0.9 - 1.4    yes
  Sweet Spot non-paging   No      about 0.005  0.9 - 2.1   AVG         3.3 - 6.7    yes
  Sweet Spot non-paging   Yes     about 0.005  0.8 - 1.6   AVG         3.0 - 5.4    yes
  Sweet Spot paging       No      about 0.04   0.6 - 0.8   LOW         about 1.3    yes
  Sweet Spot paging       Yes     about 0.04   0.4 - 0.7   LOW         1.0 - 1.2    yes
  CM4 Priming             No      about 5.0    about 0.5   LOW         about 1.0    yes
  CM4 Priming             Yes     about 6.5    about 0.4   AVG         0.7 - 1.4    yes
  Apache Tiled            No      about 3.3    3.6 - 4.1   HIGH        about 15.0   no
  Apache Tiled            Yes     about 4.3    about 3.3   HIGH        about 13.0   no

Notes: 8561-T01, z/VM 7.2.

The Hint column is the nest pressure hint as defined by IBM LSPR. This value is based on L1MP (L1 miss percentage) and RNI (Relative Nest Intensity). The majority of the workloads were classified as low to average; the Apache Tiled workload was classified as high. The Scale? column indicates whether the z/VM workload scaled to 4 TB.
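The mapping from L1MP and RNI to the hint can be sketched as follows. The thresholds below are approximate values drawn from published LSPR guidance and are illustrative only; they are not necessarily the exact rules IBM's tooling applies, and the function name is this sketch's own.

```python
def nest_pressure_hint(l1mp, rni):
    """Approximate LSPR nest-pressure hint from the L1 miss
    percentage (l1mp) and Relative Nest Intensity (rni)."""
    if l1mp < 3.0:
        # Few L1 misses: nest pressure is at most average.
        return "AVERAGE" if rni >= 0.75 else "LOW"
    if l1mp <= 6.0:
        # Moderate L1 misses: hint depends on how deep in the
        # nest the misses are typically resolved.
        if rni > 1.0:
            return "HIGH"
        return "AVERAGE" if rni >= 0.6 else "LOW"
    # Many L1 misses: nest pressure is at least average.
    return "HIGH" if rni >= 0.75 else "AVERAGE"

# Apache Tiled (L1MP about 3.3, RNI about 3.6) classifies as HIGH.
print(nest_pressure_hint(3.3, 3.6))
```

Applied to Table 1, these thresholds reproduce the reported hints: for example, CM4 Priming without SMT (L1MP about 5.0, RNI about 0.5) classifies as LOW, and both Apache Tiled rows classify as HIGH.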

Workloads whose hint was LOW or AVERAGE scaled well. The Apache Tiled workload did not scale to 4 TB: rolloff occurred once the workload became large enough to cross drawers. MEMP (the percentage of L1 misses sourced from memory) was very high, and IBM believes this was the cause of the failure to scale.

Summary

With APAR VM66173, z/VM supports a maximum of 4 TB of real memory. Workloads that exhibited hardware characteristics similar to LSPR workloads scaled well. Workloads that did not scale exhibited higher memory pressure, which was expected once configurations grew large enough to span multiple drawers.
