
Performance Toolkit for VM and PAV

With VM63855, the z/VM Control Program (CP) now uses PAV aliases to perform I/O to volumes attached to SYSTEM but not present in the CPOWNED list. Practically speaking, this means that CP now uses PAV to parallelize guest I/O to minidisks.

Performance Toolkit does not accurately calculate I/O response time when CP is using PAV. In particular, I/O wait time (some folks call this queue time) is calculated incorrectly. This throws off the ensuing I/O response time calculation. In this brief writeup we describe the problems and illustrate how customers can calculate corrected numbers if they choose.

Note that the following discussion does not apply to the case where PAV alias devices are attached or dedicated to guests. This discussion applies only to disk volumes where CP is using PAV to parallelize disk I/O to real volumes attached to SYSTEM and not in the CPOWNED list. In such a scenario, the PAV base device and all of its alias devices are attached to SYSTEM and varied on.

General Problem

The general problem is that Performance Toolkit does not account for the idea that CP uses a single wait queue to hold I/Os waiting for a given real disk volume, no matter how many PAV aliases CP might be using for that volume. CP uses the PAV base device to queue waiting I/Os, but it uses the base or any of its aliases to perform I/Os as they arrive at the front of the wait queue. This is like "bank teller" queueing in that there is a single wait queue and a multitude of servers that can process items on the queue.
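To make the "bank teller" idea concrete, here is a minimal Python sketch of one FIFO wait queue worked by several devices. It is a toy model under stated assumptions, not CP's actual scheduling code: the function name simulate_single_queue, the fixed per-I/O service time, and the arrival pattern are all hypothetical. Each I/O, taken in arrival order, is started on whichever device becomes free first, which is how a single queue feeding multiple servers behaves.

```python
import heapq


def simulate_single_queue(arrivals, service_ms, servers):
    """Return the mean wait (msec) for I/Os served FIFO from one shared queue.

    Hypothetical toy model:
      arrivals   -- sorted arrival times in msec
      service_ms -- fixed service time per I/O (real I/Os vary)
      servers    -- devices working the queue (PAV base plus its aliases)
    """
    free_at = [0.0] * servers  # time at which each device next becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for t in arrivals:
        earliest = heapq.heappop(free_at)  # device that frees up first
        start = max(t, earliest)           # the I/O waits only if every device is busy
        total_wait += start - t
        heapq.heappush(free_at, start + service_ms)
    return total_wait / len(arrivals)


# Ten I/Os arriving together, 5 msec each.
print(simulate_single_queue([0.0] * 10, 5.0, 1))  # one device: 22.5
print(simulate_single_queue([0.0] * 10, 5.0, 5))  # base plus four aliases: 2.5
```

The point of the toy is that the queue length and the arrival rate belong to the queue, not to any one device; the more devices draining it, the shorter each I/O's wait.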

For example, suppose we have real volume 1000 attached to SYSTEM but not in the CPOWNED list. If there are user minidisks on real volume 1000 and if real volume 1000 has aliases 1001 and 1002, I/Os to those minidisks would always queue on 1000, but CP might perform them on 1000, 1001, or 1002. In calculating performance statistics for 1000, 1001, and 1002, Performance Toolkit must take into account that these three devices are working the single wait queue formed behind base device 1000.

What Specifically is Wrong?

Performance Toolkit's reports and screens that comment on a disk's I/O response time are incorrect when the disk is part of a CP-managed PAV group and a wait queue is forming for the volume. There are two problems in this scenario. First, for the PAV base device, the I/O response time that Performance Toolkit calculates is too large. This is because Performance Toolkit fails to account for CP using PAV alias devices to help service I/Os waiting on the base device's wait queue. Second, for PAV alias devices, Performance Toolkit reports I/O response time as equal to I/O service time, because it sees no wait queues on the alias RDEVs and therefore calculates there is no I/O wait time. Performance Toolkit fails to recognize that I/Os queued for the PAV group queue only on the base RDEV's wait queue.

Note that if no wait queue is forming at the volume, Performance Toolkit's calculation of I/O response time is correct.

Example

For example, consider the following excerpt from the DEVICE report (aka FCX108) for one of our recent measurements. In this experiment, we had base device 3702 with alias devices 37F4, 37F5, 37F6, and 37F7. (Notice Performance Toolkit's use of "->" notation to express the alias-to-base relationship.) For the sake of presentation in this article, we have truncated the report on the right, beyond the "requests queued" column.

    FCX108  Run 2006/03/30 21:53:31          DEVICE
                                             General I/O Device Load and Performance
    From 2006/03/30 17:28:22
    To   2006/03/30 17:39:22
    For  660 Secs 00:11:00                   Result of N327A044 Run
    ______________________________________________________________________________

    <--Device Descr.-->  Mdisk Pa- <-Rate/s--> <------- Time (msec) -------> Req.
    Addr Type   Label/ID Links ths   I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
    3702 3390-3 BWPVS2      40   2   183    .0  2.4   .0  2.5  4.9 78.3   .0 13.4
    37F5 ->3702 BWPVS2      40   2   182    .0  2.4   .0  2.5  4.9  4.9   .0   .0
    37F4 ->3702 BWPVS2      40   2   182    .0  2.4   .0  2.5  4.9  4.9   .0   .0
    37F7 ->3702 BWPVS2      40   2   182    .0  2.4   .0  2.5  4.9  4.9   .0   .0
    37F6 ->3702 BWPVS2      40   2   182    .0  2.4   .0  2.5  4.9  4.9   .0   .0

In this excerpt, we see Performance Toolkit showing base device 3702 having an I/O wait time of (78.3 - 4.9) = 73.4 msec while its aliases had no wait time at all. Performance Toolkit reached this erroneous conclusion by calculating the base's wait time by Little's Law, namely, [ (13.4 items queued) / (183 I/Os per sec) ] = 73.4 msec of wait time. (Working from the rounded values displayed in the report, the quotient comes out to about 73.2 msec.) This calculation of I/O wait time is in error. Performance Toolkit should be using the aggregate PAV group I/O rate when it calculates wait time. In this excerpt, the aggregate I/O rate to the PAV group is 183 + (4 × 182) = 911 I/Os per second, and so the correct wait time would be 13.4 / 911 = 14.7 msec. Further, Performance Toolkit should show the 14.7 msec of wait time for every device in the PAV group, not just for the base device, making each device's corrected I/O response time 4.9 + 14.7 = 19.6 msec.

The DEVICE report (FCX108) is not the only report that is affected. Any report or screen that comments on a disk's I/O response time will be affected. For example, the DEVLOG report (aka FCX168) is also incorrect.

How to Correct It

Customers wanting to correct Performance Toolkit's assessment of I/O wait time can do so by dividing the base device's average queue length by the PAV group's aggregate I/O rate. The quotient is the correct I/O wait time for the group. To calculate I/O response time for each member of the group, add the calculated wait time to each device's respective I/O service time.
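The correction can be expressed as a short Python sketch. The DeviceStats class and the corrected_times function are hypothetical helpers, not part of Performance Toolkit; the sketch assumes the per-device values have been transcribed by hand from the DEVICE report (FCX108) columns named in the comments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Hypothetical record of one device's FCX108 values."""
    addr: str            # real device number, e.g. "3702"
    io_rate: float       # I/Os per second (FCX108 "Rate/s I/O" column)
    serv_ms: float       # I/O service time in msec (FCX108 "Serv" column)
    queued: float = 0.0  # average requests queued (FCX108 "Req. Qued" column)


def corrected_times(base: DeviceStats, aliases: list[DeviceStats]) -> tuple[float, dict[str, float]]:
    """Return (wait_ms, {addr: resp_ms}) for one CP-managed PAV group."""
    group = [base] + aliases
    # All I/Os for the volume wait on the base device's queue, so the
    # arrival rate feeding that queue is the whole group's I/O rate.
    aggregate_rate = sum(d.io_rate for d in group)
    # Little's Law: wait = queue length / arrival rate (seconds -> msec).
    wait_ms = base.queued / aggregate_rate * 1000.0
    # Every member of the group sees the same wait; add each device's own service time.
    resp_ms = {d.addr: d.serv_ms + wait_ms for d in group}
    return wait_ms, resp_ms


# Values transcribed from the FCX108 excerpt in this article.
base = DeviceStats("3702", io_rate=183, serv_ms=4.9, queued=13.4)
aliases = [DeviceStats(a, io_rate=182, serv_ms=4.9) for a in ("37F4", "37F5", "37F6", "37F7")]
wait_ms, resp_ms = corrected_times(base, aliases)
print(f"wait {wait_ms:.1f} msec")              # wait 14.7 msec
print(f"37F5 resp {resp_ms['37F5']:.1f} msec")  # 37F5 resp 19.6 msec
```

The wait is computed once per group rather than per device because the aliases hold no queues of their own; only the base device's queue length enters the calculation.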
