IBM: VMDUMP Performance

The VMDUMP command dumps the memory of a virtual machine, and it has historically been very helpful in problem determination. However, the performance of VMDUMP can be unacceptable in some scenarios. In z/VM 6.2, VMDUMP was made interruptible. This support was also delivered through APAR VM64548, as PTF UM33400 for z/VM 5.4 and PTF UM33401 for z/VM 6.1 (both on RSU 1102). Without the interruption support, it is important to understand the impact of VMDUMP before issuing the command. While VMDUMP can be used for any virtual machine, this article focuses on its use for a Linux guest.

In most cases, when performance is a concern, the Linux dump alternative should be used. The key cases where VMDUMP might be required are:

When DCSS and NSS pages are required in the dump as well.
When a non-disruptive dump (a 'snapshot') is required.
When the virtual machine has not been pre-configured to use any other alternative, or no second Linux guest is available to set up for the alternative. (Pre-configuration is simple.)

More detail on these cases and the differences can be found below. This document contains the following information:

A comparison between VMDUMP and Linux DASD Dump
A description of a methodology to estimate the size of a dump and the time required for VMDUMP.
A list of references to additional information

Comparison between VMDUMP and Linux DASD Dump

Here is a table that compares and contrasts the Linux and VMDUMP techniques over a number of attributes. Additional information on the Linux method is available in the reference section below.

Attribute	Linux Disk Dump	VMDUMP Dump
Performance
Creating dump	Faster (more details below)	Slower (more details below)
Post-processing dump	???	???
Predictability	Performance is fairly linear and tracks I/O performance.	Several factors make VMDUMP processing less predictable. They are listed in the sections below.
Configuration
Configuration Requirements	Configuration is required to provide a disk of sufficient size. This disk can be used by different Linux virtual machines, but not at the same time. If shared, it has to be of sufficient size for the largest virtual machine.	The z/VM system must be provisioned with enough spool space for at least the largest dump, more if multiple dumps are desired. This space is configured for the system and then available for all virtual machines.
Capability
Saved Segments (DCSS and NSS)	Saved Segments are NOT included in dump when they reside outside the configured storage of the virtual machine.	Saved Segments are included in the dump when the ALL option is used.
Disruptive	The Linux disk dump effectively restarts the virtual machine in the dump process and does not allow a 'snapshot' of the virtual machine where it continues to run.	VMDUMP allows the virtual machine to continue to run after the dump has been completed. This is helpful for scenarios where a 'snapshot' of the virtual machine is required.
Status	Messages issued every 5000 records	None
Pages included	Dumps all pages (except NSS & DCSS), including pages of zeroes.	Does not dump pages of zeroes. Can request ranges of pages.
Size of Virtual Machine	?????????????.	Maximum of 512GB.

Estimating the Time for VMDUMP

This section is presented in two parts: first we estimate the size of the dump and then we estimate the time required to dump that amount of data on a given system. In the second part, an approach is also given to estimate how much longer VMDUMP has to run.

While we have attempted to make this methodology simple, a number of steps are required. There are two scenarios where you might want to do this estimation. The first is when you are planning your debug strategies, prior to any problems. The other is when a particular virtual machine is currently running on the system and the need to capture debug information arises, and you want to know how long VMDUMP will take. As a warning, consider the following:

VMDUMP processing has been seen as high as 40 minutes per GB of the virtual machine.

Estimating the Size of the Dump

In the case where you are planning for virtual machines, you can estimate the size by taking the size of the virtual machine and adding the size of any NSS or DCSSs that would be included in the dump.

In the case where you have a running virtual machine, you can use the output of the INDICATE USER userid EXPANDED command to estimate.

IND USER LNXG7001 EXP
Userid=LNXG7001 Mach=XA V=V Attached xstore=NONE
Iplsys=DEV 0201 Devnum=36
Spool: Reads=0 Writes=300
Owned spaces: Number=1 Owned size=1G
Pages: LockedReal=14 LockedLogical=3
Primary space: ID=LNXG7001:BASE PRIVATE
  Defined size=1G Address limit=1G
Private spaces: Number=1 Owned size=1G
  Pages: Main=214390 Xstore=7112 Dasd=60035
         WS=214376 Reserved=0
         ResidentLogical=711 LockedLogical=3
Shared spaces: Number=0 Owned size=0
  Pages: Main=0 Xstore=0 Dasd=0
         ResidentLogical=0 LockedLogical=0

The above example shows that this Linux virtual machine is a 1GB virtual machine. Under Private spaces, we can tell it has:
214390 pages resident in memory (Main= field)
7112 pages in expanded storage (Xstore= field)
60035 pages that have been paged out to disk (Dasd= field)
Adding those pages up gives a total of 281537 pages, or roughly 1100 MB at 4 KB per page. Let's also keep track of the pages on disk separately, which come to roughly 235MB. This value is needed because those pages must be paged into real memory before they can be dumped.
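
The page arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample text is abbreviated from the example output above, and the parsing assumes the Main=, Xstore= and Dasd= fields appear after the "Private spaces:" label as they do in that example.

```python
import re

PAGE_SIZE_KB = 4  # a z/VM page is 4 KB

# Abbreviated from the INDICATE USER example above.
SAMPLE_OUTPUT = """\
Owned spaces: Number=1 Owned size=1G
Private spaces: Number=1 Owned size=1G
Pages: Main=214390 Xstore=7112 Dasd=60035
WS=214376 Reserved=0
Shared spaces: Number=0 Owned size=0
Pages: Main=0 Xstore=0 Dasd=0
"""


def parse_private_pages(indicate_output: str) -> dict:
    """Return the Main, Xstore and Dasd page counts for the private spaces.

    Hypothetical helper; raises ValueError if the 'Private spaces:' label
    is missing from the text.
    """
    start = indicate_output.index("Private spaces:")
    end = indicate_output.find("Shared spaces:", start)
    section = indicate_output[start:end] if end != -1 else indicate_output[start:]
    counts = {}
    for field in ("Main", "Xstore", "Dasd"):
        match = re.search(rf"\b{field}=(\d+)", section)
        counts[field] = int(match.group(1)) if match else 0
    return counts


def pages_to_mb(pages: int) -> float:
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_SIZE_KB / 1024


counts = parse_private_pages(SAMPLE_OUTPUT)
total_pages = sum(counts.values())
print(total_pages, round(pages_to_mb(total_pages)))    # total pages and MB
print(counts["Dasd"], round(pages_to_mb(counts["Dasd"])))  # pages on disk and MB
```

Restricting the search to the text between "Private spaces:" and "Shared spaces:" keeps the zero counts reported for shared spaces from being picked up by mistake.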

Estimating the Time to Dump the Estimated Data

From the earlier step, we have the number of pages, that is, the size of the data to be dumped. By design, VMDUMP does not stream this data out to the spool system, so the estimate is built from per-page service times. If you use Performance Toolkit or a product that reports similar information, you can get an estimate for page and spool performance. In Performance Toolkit, this information can be found on the FCX109 (DEV CPOWN) or FCX146 (AUXLOG) reports.

ElapsedTime = TotalPages * SpoolServTime + DiskPages * PageServTime + TotalPages * 2ms

where:
ElapsedTime is in milliseconds; divide by 1000 to get seconds
SpoolServTime is the spool service time in milliseconds
PageServTime is the page service time in milliseconds

Let's assume that, in our example, the paging devices give a 2.6 millisecond service time and the spooling devices a 1.1 millisecond service time. Plugging these values into the formula above gives the following result:

elapsed time = (281537 * 1.1 + 60035 * 2.6 + 281537 * 2) / 1000 = 1029 seconds (17 minutes 9 seconds)
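
For repeated estimates, the formula can be wrapped in a small Python function. This is a minimal sketch: the function and parameter names are hypothetical, and the fixed 2 ms per-page term is taken directly from the formula above without further interpretation.

```python
def estimate_vmdump_seconds(total_pages: int,
                            disk_pages: int,
                            spool_serv_ms: float,
                            page_serv_ms: float,
                            fixed_ms_per_page: float = 2.0) -> float:
    """Estimate VMDUMP elapsed time in seconds.

    total_pages        pages to be dumped (Main + Xstore + Dasd)
    disk_pages         pages that must first be paged in from disk (Dasd)
    spool_serv_ms      spool service time in milliseconds
    page_serv_ms       page service time in milliseconds
    fixed_ms_per_page  the constant 2 ms per-page term from the formula
    """
    elapsed_ms = (total_pages * spool_serv_ms
                  + disk_pages * page_serv_ms
                  + total_pages * fixed_ms_per_page)
    return elapsed_ms / 1000.0


# Values from the worked example above.
seconds = estimate_vmdump_seconds(281537, 60035,
                                  spool_serv_ms=1.1, page_serv_ms=2.6)
minutes, secs = divmod(round(seconds), 60)
print(f"{round(seconds)} seconds ({minutes} minutes {secs} seconds)")
```

Rerunning the function with service times taken from a busier interval gives a quick sense of how sensitive the estimate is to system load.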

In cases where you have already started VMDUMP, you can get a measure of progress by doing a QUERY RDR to see the number of records in the open dump file. If you do multiple QUERY RDRs, taking note of the elapsed time between the QUERY commands, you can get a rate of records per second. This can be used to approximate the remaining time.
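
The rate calculation can be sketched as follows. The record counts and the 60-second interval are made-up illustration values, not measurements, and the function names are hypothetical. The sketch also assumes that the final record count will be close to the estimated number of pages to dump; that is an assumption of this example, not something the QUERY RDR output states.

```python
def records_per_second(first_count: int, second_count: int,
                       seconds_between: float) -> float:
    """Rate of growth of the open dump file between two QUERY RDR samples."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (second_count - first_count) / seconds_between


def remaining_seconds(current_count: int, expected_total: int,
                      rate: float) -> float:
    """Approximate time left, given the current count and the observed rate."""
    if rate <= 0:
        raise ValueError("rate must be positive; take samples further apart")
    return max(expected_total - current_count, 0) / rate


# Hypothetical samples: 50000 records, then 62000 records 60 seconds later.
rate = records_per_second(50000, 62000, 60)
# Assumption: expected total is roughly the 281537 pages estimated earlier.
left = remaining_seconds(62000, 281537, rate)
print(f"{rate:.0f} records/second, about {left / 60:.0f} minutes remaining")
```

Because the rate can change as VMDUMP moves from resident pages to pages that must be paged in from disk, it is worth repeating the calculation with fresh samples rather than relying on a single pair.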

There are many reasons why performance of VMDUMP may vary. The following list contains some of the known factors for the variability of VMDUMP performance:

Number of Non-Zero pages
DASD - type/speed/paths, etc.
Spool configuration
System configuration and load (processors and memory)
Location of the virtual machine pages (resident, xstore, DASD)
Master processor contention

References

The following are links to additional reference material:

Using the Dump Tools - Linux Kernel 2.6. Book that describes the Linux Disk Dump tools.
Device Drivers, Features, and Commands - Linux Kernel 2.6. Additional details on Linux commands and diagnostics.
