z/VM, PAV, and MDC
In May 2006 IBM shipped PTF UM31771 (APAR VM63855), which lets the
z/VM Control Program (CP) exploit Parallel Access Volumes (PAV)
technology so as to parallelize I/O to user volumes attached
to SYSTEM.
This support lets CP run more than one I/O at a time to a volume
containing user minidisks. CP takes advantage of this capability
so as to reduce, and in some cases eliminate, I/O queueing at
heavily used user volumes. Our PAV performance article illustrates
the effect PAV can have on application performance.
In March 2007 IBM became aware that when z/VM Minidisk Cache (MDC)
is enabled for a user volume, CP fails to exploit PAV for that
volume, even if the customer has correctly configured z/VM and
the storage controller (e.g., 2105) for PAV. This failure can
reduce transaction rates for applications using minidisks on
the volume, compared to what the applications would see if MDC
were off for the volume. PTF UMxxxxx (VM64199) repairs the defect.
In this brief article we illustrate how to use Performance Toolkit
to detect whether a system is experiencing the behavior
repaired by UMxxxxx. Other performance monitoring products'
corresponding reports could also be used.
Illustrative Workload
To illustrate the problem, we set up as follows on one of our
test systems:
- We created four user IDs, SQPERF1..4.
- SQPERF1..4 share a single read-only minidisk on volume
LDB307.
- Each of SQPERF1..4 also has a private read-write minidisk
on volume LDB307.
- Volume LDB307 is on RDEV E700 (one PAV base) and E7FD..E7FF
(three PAV aliases). All four RDEVs are ATTACHed to SYSTEM.
Provided the minidisk I/O burden generated by SQPERF1..4 is great
enough, this arrangement lets
CP use the PAV aliases to drive I/O to the volume's minidisks.
Next we set up SQPERF1..4 in a COPYFILE loop, each user ID copying
all the files from the shared read-only minidisk to its respective
read-write minidisk. This workload is a good test case, because
when MDC is enabled for LDB307, the shared read-only minidisk should
find its way into MDC pretty quickly, and transaction rate, as
indicated by virtual I/O rate, should increase.
With MDC OFF
With MDC off, the FCX112 (USER) screen showed us the following
virtual I/O rates for SQPERF1..4. We see that each of the four
user IDs under test is experiencing about the same transaction rate.
         <----- CPU Load ----->  <------ Virtual IO/s ------>
               <-Seconds->  T/V
Userid    %CPU  TCPU  VCPU Ratio Total  DASD Avoid Diag98   UR Pg/s User Status
SQPERF1   1.01  .303  .137   2.2   517   517    .0     .0   .0   .0 XC, CL3,DIS
SQPERF2    .97  .291  .126   2.3   537   537    .0     .0   .0   .0 XC, CL3,DIS
SQPERF3    .98  .295  .123   2.4   558   558    .0     .0   .0   .0 XC, CL3,DIS
SQPERF4   1.01  .304  .134   2.3   536   536    .0     .0   .0   .0 XC, CL3,DIS
When we look at the FCX108 (DEVICE) screen for the LDB307 volume, we see
the base and three aliases all in use about equally. More important,
we see there is no I/O wait queue forming at base RDEV E700.
<-- Device Descr. --> Mdisk Pa- <-Rate/s->  <------- Time (msec) -------> Req.
Addr Type   Label/ID  Links ths   I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
E700 3390   LDB307       15   4   527    .0   .3   .2  1.3  1.8  1.8   .0   .0
E7FD ->E700 LDB307       15   4   524    .0   .3   .2  1.3  1.8  1.8   .0   .0
E7FE ->E700 LDB307       15   4   521    .0   .3   .2  1.3  1.8  1.8   .0   .0
E7FF ->E700 LDB307       15   4   524    .0   .3   .2  1.3  1.8  1.8   .0   .0
Because one of the minidisks involved is read-only, we would expect
that turning on MDC for the LDB307 volume would improve transaction rate (aka
virtual I/O rate). Let's see what happens.
With MDC ON
The FCX108 (DEVICE) excerpt below shows what happened to the LDB307 volume when
we turned on MDC.
At first glance, we seem to have improved matters. Avoided I/Os started
showing up for the base RDEV, and the total I/O rate to the volume went
down, from about 2100 per second across the four exposures with MDC off
to about 630 per second.
This is roughly what we'd expect if MDC were in play.
<-- Device Descr. --> Mdisk Pa- <-Rate/s->  <------- Time (msec) -------> Req.
Addr Type   Label/ID  Links ths   I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
E700 3390   LDB307       15   4   575 174.2   .3   .1  1.3  1.7  1.9   .0  2.6
E7FD ->E700 LDB307       15   4  17.7    .0   .4   .2  1.3  1.9  2.1   .0   .0
E7FE ->E700 LDB307       15   4  21.5    .0   .4   .4  1.3  2.1  2.3   .0   .0
E7FF ->E700 LDB307       15   4  18.1    .0   .4   .2  1.3  1.9  2.1   .0   .0
However, look closely at the base RDEV, E700. As indicated by the "Req. Qued"
column, I/Os are queueing at the base. Further, the aliases E7FD..E7FF are
barely in use. This means CP is not exploiting the aliases.
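The check just described (requests queued at the base while the aliases sit nearly idle) can be sketched in Python. This is only an illustration and is not part of Performance Toolkit: the names DeviceRow, parse_row, and pav_looks_unexploited are hypothetical, the parsing assumes whitespace-separated fields in the order shown in the excerpt above, and both thresholds are arbitrary assumptions rather than IBM guidance.

```python
from dataclasses import dataclass


@dataclass
class DeviceRow:
    addr: str        # real device address, e.g. "E700"
    is_alias: bool   # True when the Type column reads "->base"
    io_rate: float   # "I/O" column (per second)
    queued: float    # "Req. Qued" column


def parse_row(line: str) -> DeviceRow:
    """Parse one data row of an FCX108 excerpt as shown in this article.

    Assumption: fields are whitespace-separated and in the order shown;
    a real report may need fixed-column parsing instead.
    """
    f = line.split()
    return DeviceRow(addr=f[0], is_alias=f[1].startswith("->"),
                     io_rate=float(f[5]), queued=float(f[-1]))


def pav_looks_unexploited(rows, min_queue=0.5, max_alias_share=0.5):
    """Return True when the base is queueing while its aliases are barely used.

    Both thresholds are illustrative assumptions.
    """
    bases = [r for r in rows if not r.is_alias]
    aliases = [r for r in rows if r.is_alias]
    if len(bases) != 1 or not aliases:
        return False
    base = bases[0]
    mean_alias_rate = sum(r.io_rate for r in aliases) / len(aliases)
    return (base.queued > min_queue
            and mean_alias_rate < max_alias_share * base.io_rate)


# Data rows copied from the MDC ON excerpt above.
mdc_on = [parse_row(line) for line in """\
E700 3390 LDB307 15 4 575 174.2 .3 .1 1.3 1.7 1.9 .0 2.6
E7FD ->E700 LDB307 15 4 17.7 .0 .4 .2 1.3 1.9 2.1 .0 .0
E7FE ->E700 LDB307 15 4 21.5 .0 .4 .4 1.3 2.1 2.3 .0 .0
E7FF ->E700 LDB307 15 4 18.1 .0 .4 .2 1.3 1.9 2.1 .0 .0
""".splitlines()]

print(pav_looks_unexploited(mdc_on))  # True for the MDC ON excerpt
```

Applied to the MDC OFF excerpt, where nothing is queued at E700 and all four exposures run at about 525 I/Os per second, the same check returns False.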
The impact of the reduced I/O capability to LDB307 is clearly seen in the transaction
rates (aka virtual I/O rates) experienced by SQPERF1..4. The FCX112 (USER) report
tells the tale. Even though some virtual I/Os are being handled by MDC, each user ID's
rate falls from roughly 520 to 560 virtual I/Os per second with MDC off to roughly 160
with MDC on.
         <----- CPU Load ----->  <------ Virtual IO/s ------>
               <-Seconds->  T/V
Userid    %CPU  TCPU  VCPU Ratio Total  DASD Avoid Diag98   UR Pg/s User Status
SQPERF1    .35  .105  .050   2.1   158   158  17.3     .0   .0   .0 XC, CL3,DIS
SQPERF2    .32  .095  .045   2.1   164   164  21.7     .0   .0   .0 XC, CL3,DIS
SQPERF3    .32  .095  .047   2.0   163   163  21.8     .0   .0   .0 XC, CL3,DIS
SQPERF4    .31  .092  .044   2.1   163   163  21.4     .0   .0   .0 XC, CL3,DIS
The Problem and the Fix
CP was performing its MDC full-track reads in such a way that they were
ineligible for PAV fanout. Thus all of MDC's reads were being forced to the
base RDEV. UMxxxxx fixes the problem, letting MDC's full-track reads fan out.
With the Fix
The reports pretty much speak for themselves. The FCX112 (USER) screen
shows each of SQPERF1..4 now running about 570 to 590 virtual I/Os per
second, better than with MDC OFF and far better than with MDC on before
the fix.
         <----- CPU Load ----->  <------ Virtual IO/s ------>
               <-Seconds->  T/V
Userid    %CPU  TCPU  VCPU Ratio Total  DASD Avoid Diag98   UR Pg/s User Status
SQPERF1   1.26  .378  .148   2.6   573   573   112     .0   .0   .0 XC, CL3,DIS
SQPERF2   1.25  .374  .145   2.6   593   593   114     .0   .0   .0 XC, CL3,DIS
SQPERF3   1.24  .372  .144   2.6   587   587   115     .0   .0   .0 XC, CL3,DIS
SQPERF4   1.33  .398  .153   2.6   569   569   113     .0   .0   .0 XC, CL3,DIS
Further, the FCX108 (DEVICE) screen shows I/O fanned out to all four
exposures, a significant fraction of the I/O being handled by
MDC, and no requests queueing on the base RDEV.
<-- Device Descr. --> Mdisk Pa- <-Rate/s->  <------- Time (msec) -------> Req.
Addr Type   Label/ID  Links ths   I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
E700 3390   LDB307       15   4   444 474.1   .3   .3  1.4  2.0  2.0   .0   .0
E7FD ->E700 LDB307       15   4   422    .0   .4   .3  1.5  2.2  2.2   .0   .0
E7FE ->E700 LDB307       15   4   411    .0   .4   .3  1.5  2.2  2.2   .0   .0
E7FF ->E700 LDB307       15   4   406    .0   .4   .3  1.5  2.2  2.2   .0   .0
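To put the three runs side by side, the per-user "Total" virtual I/O rates from the FCX112 (USER) excerpts can be summed and compared against the MDC OFF run. The short Python sketch below does only that arithmetic; the run labels and function names are ours, chosen for illustration, and the figures are copied from the excerpts in this article.

```python
# "Total" virtual I/O rates for SQPERF1..4, copied from the FCX112 excerpts.
runs = {
    "MDC off":            [517, 537, 558, 536],
    "MDC on, before fix": [158, 164, 163, 163],
    "MDC on, with fix":   [573, 593, 587, 569],
}


def total_rate(rates):
    """Aggregate virtual I/O rate across the four user IDs."""
    return sum(rates)


def percent_change(new, baseline):
    """Change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


baseline = total_rate(runs["MDC off"])
for label, rates in runs.items():
    total = total_rate(rates)
    change = percent_change(total, baseline)
    print(f"{label:<20} {total:>5}/s  ({change:+.1f}% vs. MDC off)")
```

By this measure the aggregate rate is 2148 per second with MDC off, 648 per second (about 70% lower) with MDC on before the fix, and 2322 per second (about 8% higher) with MDC on and the fix applied.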
Last revised March 29, 2007 (BKW)