z/VM HyperPAV Support
In z/VM 5.3, the Control Program (CP) can use the HyperPAV feature of the IBM System Storage DS8000 line of storage controllers. The HyperPAV feature is similar to IBM's PAV (Parallel Access Volumes) feature in that HyperPAV offers the host system more than one device number for a volume, thereby enabling per-volume I/O concurrency. Further, z/VM's use of HyperPAV is like its use of PAV: the support is for ECKD disks only, the bases and aliases must all be ATTACHed to SYSTEM, and only guest minidisk I/O or I/O provoked by guest actions (such as MDC full-track reads) is parallelized.
We used our PAV measurement workload to study the performance of HyperPAV aliases as compared to classic PAV aliases. We found, as we expected, that HyperPAV aliases match the performance of classic PAV aliases. However, HyperPAV aliases require different management and tuning techniques than classic PAV aliases do. This section discusses the differences and illustrates how to monitor and tune a z/VM system that uses PAV or HyperPAV aliases.
In May 2006 IBM equipped z/VM 5.2 with the ability to use Parallel Access Volumes (PAV) aliases so as to parallelize I/O to user extents (minidisks) on SYSTEM-attached volumes. In its PAV section, this report describes the performance characteristics of z/VM's PAV support, under various workloads, on several different IBM storage subsystems. Readers not familiar with PAV or not familiar with z/VM's PAV support should read that section and our PAV technology description before continuing here.
With z/VM 5.3, IBM extended z/VM's PAV capability so as to support the IBM 2107's HyperPAV feature. Like PAV, HyperPAV offers the host system the opportunity to use many different device numbers to address the same disk volume, thereby enabling per-volume I/O concurrency. Recall that with PAV, each alias device is affiliated with exactly one base, and it remains with that base until the system programmer reworks the I/O configuration. With HyperPAV, though, the base and alias devices are grouped into pools, the rule being that an alias device in a given pool can perform I/O on behalf of any base device in said pool. This lets the host system achieve per-volume I/O concurrency while potentially consuming fewer device numbers for alias devices.
IBM's performance objective for z/VM's HyperPAV support was that with equivalent numbers of aliases, HyperPAV disk performance should equal PAV disk performance. Measurements showed z/VM 5.3 meets this criterion, to within a very small margin. The study revealed, though, that the performance management techniques necessary to exploit HyperPAV effectively are not the same as the techniques one would use to exploit PAV. Rather than discussing the performance of HyperPAV aliases, this section describes the performance management techniques necessary to use HyperPAV effectively. For completeness' sake, this section also discusses tuning techniques appropriate for classic PAV.
Customers must apply VM64248 (UM32072) to z/VM 5.3 for its HyperPAV support to work correctly. This fix is not on the z/VM 5.3 GA RSU. Customers must order it from IBM.
z/VM Performance Toolkit does not calculate volume response times correctly for base or alias devices, either classic PAV or HyperPAV. Service times, I/O rates, and queue depths are correct. In this section, DEVICE report excerpts for classic PAV scenarios have been hand-corrected to show accurate response time values. DEVICE report excerpts for HyperPAV scenarios have not been corrected.
Understanding Disk Performance
Largely speaking, z/VM disk performance can be understood by looking at the amount of time a guest machine perceives is required to do a single I/O operation to a single disk volume. This time, called response time, consists of two main components. The first, queue time (aka wait time), is the time the guest's I/O spends waiting for access to the appropriate real volume. The second component, service time, is the time required for the System z I/O subsystem to perform the real I/O, once z/VM starts it.
Technologies like PAV and HyperPAV can help reduce queue time in that they provide the System z host with means to run more than one real I/O to a volume concurrently. This is similar to there being more than one teller window operating at the local bank. Up to a certain point, adding tellers helps decrease the amount of time a customer stands in line waiting for a teller to become available. In a similar fashion, PAV and HyperPAV aliases help decrease the amount of time a guest I/O waits in queue for access to the real volume.
This idea -- that PAV and HyperPAV offer I/O concurrency and thereby decrease the likelihood of I/Os queueing at a volume -- leads us to our first principle as regards using PAV or HyperPAV to adjust volume performance. If a volume is not experiencing queueing, adding aliases for the volume will not help the volume's performance. Consider adding aliases for a volume only if there is an I/O queue for the volume.
In the bank metaphor, once a given customer has reached a teller, the number of other tellers working does not appreciably change the time needed to perform that customer's transaction. With PAV and HyperPAV, though, IBM has seen evidence that in some environments, increasing the number of aliases for a volume can increase service time for the volume. Most of the time, the decrease in wait time outweighs the increase in service time, so response time improves. At worst, service time increases exactly as wait time decreases, so response time stands unchanged.
This trait -- that adding aliases will generally change the split between wait time and service time, but will generally not increase their sum -- leads us to our second principle for using PAV or HyperPAV. If a queue is forming at a volume, add aliases until you run out of alias capability, or until the queue disappears. Depending on the workload, it might take several aliases before things start to get better.
A performance analyst can come to an understanding of the right number of PAV or HyperPAV aliases for his environment by examining the disk performance statistics z/VM emits in its monitor data. Performance monitoring products such as IBM's z/VM Performance Toolkit comment on device performance and thus are invaluable in tuning and configuring PAV or HyperPAV.
The Basic DEVICE Report
z/VM Performance Toolkit emits a report called DEVICE which comments on the performance statistics for the z/VM system's real devices. This report is the analyst's primary tool for understanding disk performance. Below is an excerpt from the DEVICE report for one of the disk exercising workloads we use for PAV and HyperPAV measurements, run with no aliases.
Readers: please note that due to browser window width limitations, all of the report excerpts in this section are truncated on the right, after the "Req. Qued" column. The rest of the columns are interesting, but not in this discussion. Ed.
FCX108  Run 2007/06/05 14:00:12         DEVICE
                                        General I/O Device Load and Performance
From 2007/05/27 15:00:14
To   2007/05/27 15:10:14
For    600 Secs 00:10:00                Result of Y040180P Run
________________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390-3 BWPVS0         0   4  755    .0   .2   .2   .9  1.3  3.7   .0 1.84
The following columns are interesting in this discussion:
- I/O is the I/O rate for the device, in operations per second.
- Serv (service time) is the average amount of time (milliseconds) the device spends performing a single I/O.
- Resp (response time) is the average amount of time (milliseconds) the guest machine perceives is required to perform an I/O to its minidisk on the volume. Response time includes service time plus wait time (aka queue time).
- Req. Qued is the average length of the wait queue for the real device. This is the average number of I/Os waiting in line to use the volume.
The excerpt above shows that device 522A has a wait queue and low pending time. This suggests an opportunity to tune the volume by using PAV or HyperPAV. Let's look at the two approaches.
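As a cross-check on the columns described above, wait time can be backed out of the DEVICE report as Resp minus Serv. The sketch below does that for the no-alias 522A excerpt. As an added illustration that is not part of the measurement methodology, it also applies Little's law (average queue length is roughly arrival rate times average wait) to see whether the result lands near the reported Req. Qued value. The function names are illustrative only.

```python
def wait_time_msec(resp_msec: float, serv_msec: float) -> float:
    """Queue (wait) time is response time minus service time."""
    return resp_msec - serv_msec


def queue_depth_estimate(io_rate_per_sec: float, wait_msec: float) -> float:
    """Little's law: average number waiting = arrival rate * average wait.

    This is an approximation brought in for illustration; the DEVICE report
    derives Req. Qued from monitor data, not from this formula.
    """
    return io_rate_per_sec * (wait_msec / 1000.0)


# Values taken from the no-alias DEVICE excerpt for device 522A.
wait = wait_time_msec(resp_msec=3.7, serv_msec=1.3)
depth = queue_depth_estimate(io_rate_per_sec=755, wait_msec=wait)

print(f"wait time   ~ {wait:.1f} msec")
print(f"queue depth ~ {depth:.2f} (DEVICE report shows 1.84)")
```

The estimate comes out close to the reported queue depth, which is consistent with the volume spending most of its response time in queue rather than in service.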
DASD Tuning via Classic PAV
For classic PAV, the strategy is to add aliases for the volume until the volume's I/O rate maximizes or the volume's wait queue disappears, whichever comes first. Ordinarily, we would expect these to happen simultaneously.
First, let's use Performance Toolkit's DEVICE report to estimate how many aliases the volume will need. The Req. Qued column gives us the number we seek. For a given volume, the estimate for aliases needed is just the queue depth, rounding any fractional part up to the next integer.
In the excerpt above, device 522A is reporting a queue depth of 1.84. This suggests two aliases will be needed to tune the volume. Keep this in mind as we work through the tuning exercise.
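The estimating rule is simple enough to express in a few lines. This is a minimal sketch; the function name and the `negligible` cutoff (below which we treat the queue as absent) are assumptions made for illustration, not values defined by z/VM or Performance Toolkit.

```python
import math


def aliases_needed(queue_depth: float, negligible: float = 0.05) -> int:
    """Estimate classic PAV aliases for one volume from its Req. Qued value.

    The rule is to round the queue depth up to the next integer.
    `negligible` is a hypothetical cutoff so that a residual depth such
    as .02 is read as "no queue" rather than "one more alias".
    """
    if queue_depth < negligible:
        return 0
    return math.ceil(queue_depth)


print(aliases_needed(1.84))  # 2
print(aliases_needed(0.02))  # 0
```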
Starting small, we first added one alias to the workload. Here is the corresponding DEVICE excerpt, showing how the performance of the 522A volume changed, now that one alias is helping.
FCX108  Run 2007/06/05 14:03:05         DEVICE
                                        General I/O Device Load and Performance
From 2007/05/27 14:48:26
To   2007/05/27 14:58:26
For    600 Secs 00:10:00                Result of Y040181P Run
________________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390-3 BWPVS0         0   4  477    .0   .2   .3  1.4  1.9  2.8   .0  .91
5249 ->522A BWPVS0         0   4  465    .0   .2   .3  1.5  2.0  2.9   .0  .00
Notice several things about this example:
- In this situation, there is one base device, 522A, and there is one classic PAV alias device for it, as notated in the second column by ->522A.
- The 522A volume, although it now has one alias, is still experiencing queueing. We could have forecast this, given our estimate that two aliases would be needed.
- The alias device does not have a wait queue. When CP owns the base and alias devices, volume queueing happens only on the volume's base device, not on its alias devices. Guest minidisk I/O never queues on an alias device.
- Volume I/O rate has increased from 755/sec to (477+465) = 942/sec.
- The service time has increased from 1.3 msec to about 1.9 msec. However, because wait time is reduced, response time improved from 3.7 msec to about 2.8 msec.
Adding this one alias increased volume I/O rate and decreased volume response time. We made progress.
Because there's still a wait queue at base device 522A, and because we'd estimated that two aliases would be needed to tune the volume, let's keep going. Let's see what happens if we add another classic PAV alias for volume 522A.
FCX108  Run 2007/06/05 14:27:46         DEVICE
                                        General I/O Device Load and Performance
From 2007/05/27 14:36:37
To   2007/05/27 14:46:37
For    600 Secs 00:10:00                Result of Y040182P Run
________________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390-3 BWPVS0         0   4  552    .0   .3   .2  1.2  1.7  1.7   .0  .02
5249 ->522A BWPVS0         0   4  545    .0   .3   .2  1.2  1.7  1.7   .0  .00
524C ->522A BWPVS0         0   4  522    .0   .3   .2  1.3  1.8  1.8   .0  .00
By adding another PAV alias, we increased the volume I/O rate to (552+545+522) = 1619/sec. Note we also decreased response time to about 1.7 msec. Because the 522A wait queue is now gone, adding more aliases will not further improve volume performance.
The overall result was that we tuned 522A from 755/sec and 3.7 msec response time to 1619/sec and 1.7 msec response time.
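The volume I/O rates quoted in this exercise come from summing the I/O column over the base and its classic PAV aliases. A short sketch of that arithmetic, using the rates from the three DEVICE excerpts, follows. The dictionary layout and function name are illustrative choices only.

```python
def volume_io_rate(device_rates):
    """Sum the per-device I/O rates (base plus its classic PAV aliases)."""
    return sum(device_rates)


# I/O column values for base 522A and its aliases in each run.
runs = {
    "no aliases":  [755],            # 522A only
    "one alias":   [477, 465],       # 522A, 5249
    "two aliases": [552, 545, 522],  # 522A, 5249, 524C
}

for label, rates in runs.items():
    print(f"{label}: {volume_io_rate(rates)}/sec")
```

This works for classic PAV because each alias is bound to exactly one base, so every device row belongs unambiguously to one volume.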
DASD Tuning via HyperPAV
With HyperPAV, base devices and alias devices are organized into pools. Each alias in the pool can perform I/O on behalf of any base device in its same pool.
To reduce queueing at a base device, we add an alias to the pool in which the base resides. However, we must remember that said alias will be used to parallelize I/O for all bases in the pool. It follows that with HyperPAV, there really isn't any notion of "volume tuning" per se. Rather, we tune the pool.
Usually, some base devices in a pool will be experiencing little queueing while others will be experiencing more. The design of HyperPAV makes it possible to add just enough aliases to satisfy the I/O concurrency level for the pool. Usually this will result in needing fewer aliases, as compared to having to equip each base with its own aliases.
For example, in a pool having ten base devices, it might be possible to satisfy the I/O concurrency requirements for all ten bases by adding merely five aliases to the pool. This lets us conserve device numbers. IBM is aware that in large environments, conservation of device numbers is an important requirement.
Let's look at a DEVICE report excerpt for a measurement involving our DASD volumes 522A-5231.
FCX108  Run 2007/06/05 11:01:14         DEVICE
                                        General I/O Device Load and Performance
From 2007/02/04 13:28:34
To   2007/02/04 13:38:35
For    600 Secs 00:10:00                Result of Y032180H Run
_______________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390   BWPVS0         0   4  711    .0   .2   .2   .9  1.3  3.9   .0  1.8
522B 3390   BWPVS1         0   4  745    .0   .2   .2   .9  1.3  3.8   .0  1.8
522C 3390   BWPVS2         0   4  744    .0   .2   .2   .9  1.3  3.8   .0  1.8
522D 3390   BWPVS3         0   4  745    .0   .2   .2   .9  1.3  3.8   .0  1.8
522E 3390   BWPVT0         0   4  769    .0   .2   .2   .8  1.2  3.6   .0  1.8
522F 3390   BWPVT1         0   4  740    .0   .2   .2   .9  1.3  3.7   .0  1.8
5230 3390   BWPVT2         0   4  716    .0   .2   .2   .9  1.3  3.9   .0  1.8
5231 3390   BWPVT3         0   4  719    .0   .2   .2   .9  1.3  3.9   .0  1.8
In this workload we see that each of the eight volumes is experiencing queueing, with pending time not being an issue. Again, volume tuning looks promising.
Notice also that each volume is experiencing an I/O rate of about 735/sec and a response time of about 3.8 msec. When we are done tuning this pool, we will take another look at these, to see what happened.
Because we are going to use HyperPAV this time, we will not be tuning these volumes individually. Rather, we will be tuning them as a group. Noticing that the total queue depth for the group is (1.8*8) = 14.4, we can estimate that 15 HyperPAV aliases should suffice to tune the pool. Let's start by adding eight HyperPAV aliases 5249-5250 and see what happens.
FCX108  Run 2007/06/05 14:33:16         DEVICE
                                        General I/O Device Load and Performance
From 2007/02/04 13:16:46
To   2007/02/04 13:26:47
For    600 Secs 00:10:00                Result of Y032181H Run
_______________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390-3 BWPVS0         0   4  665    .0   .2   .2  1.0  1.4  2.8   .0  .90
522B 3390-3 BWPVS1         0   4  651    .0   .2   .2  1.0  1.4  2.9   .0  .98
522C 3390-3 BWPVS2         0   4  658    .0   .2   .2  1.0  1.4  2.8   .0  .90
522D 3390-3 BWPVS3         0   4  644    .0   .2   .2  1.0  1.4  2.8   .0  .90
522E 3390-3 BWPVT0         0   4  597    .0   .2   .2  1.1  1.5  3.1   .0  .93
522F 3390-3 BWPVT1         0   4  721    .0   .2   .1   .9  1.2  2.4   .0  .89
5230 3390-3 BWPVT2         0   4  606    .0   .2   .2  1.1  1.5  3.1   .0  .95
5231 3390-3 BWPVT3         0   4  649    .0   .2   .2  1.0  1.4  2.8   .0  .94
5249 3390-3                0   4  608    .0   .2   .2  1.1  1.5  1.5   .0  .00
524A 3390-3                0   4  621    .0   .2   .2  1.0  1.4  1.4   .0  .00
524B 3390-3                0   4  611    .0   .2   .2  1.0  1.4  1.4   .0  .00
524C 3390-3                0   4  601    .0   .2   .2  1.1  1.5  1.5   .0  .00
524D 3390-3                0   4  578    .0   .2   .2  1.1  1.5  1.5   .0  .00
524E 3390-3                0   4  615    .0   .2   .2  1.1  1.5  1.5   .0  .00
524F 3390-3                0   4  562    .0   .2   .2  1.2  1.6  1.6   .0  .00
5250 3390-3                0   4  592    .0   .2   .2  1.1  1.5  1.5   .0  .00
There are lots of interesting things in this report, such as:
- Base devices 522A-5231 are still experiencing queueing. So some benefit could likely be had by adding more HyperPAV aliases to the pool. We predicted this.
- Alias devices are not showing "->" notation to indicate base affiliation. That's because the base with which a HyperPAV alias is affiliated changes for every I/O the alias does.
- Alias devices 5249-5250 are not showing volume labels. Again, affiliation changes with every I/O, so an alias has no long-lived volume label.
- Alias devices 5249-5250 are not showing device queues. This is correct. Again, I/O queues form only on base devices.
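For a HyperPAV pool, the alias estimate works the same way as for a single volume, except that the queue depths are summed across all of the pool's bases before rounding up. The sketch below applies that rule to the Req. Qued values from the no-alias excerpt and from the eight-alias excerpt. The function name is hypothetical and the rule is an estimate, not a guarantee.

```python
import math


def pool_alias_estimate(base_queue_depths):
    """Estimate aliases still needed by a HyperPAV pool.

    Sum Req. Qued over the pool's base devices (aliases never queue),
    then round up to the next integer.
    """
    total = round(sum(base_queue_depths), 2)  # trim floating-point noise
    return total, math.ceil(total)


# Req. Qued for bases 522A-5231 with no aliases in the pool.
before = [1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8]
# Req. Qued for the same bases after eight aliases were added.
after_eight = [0.90, 0.98, 0.90, 0.90, 0.93, 0.89, 0.95, 0.94]

for label, depths in (("no aliases", before), ("eight aliases", after_eight)):
    total, more = pool_alias_estimate(depths)
    print(f"{label}: total queue depth {total}, estimate {more} more aliases")
```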
One note about I/O rates needs mention. When we tuned via classic PAV, it was easy to calculate the aggregate I/O rate for a volume. All we did was add up the rates for the volume's base and alias devices. By doing this summing, we could see the volume I/O rates rise as we added aliases. With HyperPAV, though, an alias does I/Os for all of the bases in the pool. Thus there is no way from the DEVICE report to calculate the aggregate I/O rate for a specific volume. There is relief in the raw monitor data, though. More on this later.
Bear in mind also that z/VM Performance Toolkit does not calculate response times correctly in PAV or HyperPAV situations, so we can't really see how well we're doing at this interim step. Again, there is relief in the raw monitor data. More on this later, too.
To continue to tune this pool, we can add some more HyperPAV aliases. Again summing the queue depths for the pool's base devices yields a sum of 7.39 I/Os still queued for these bases. Let's add eight more HyperPAV aliases for this pool at device numbers 5251-5258 and see what happens. For convenience, we have sorted the report by device number.
FCX108  Run 2007/06/05 14:39:47         DEVICE
                                        General I/O Device Load and Performance
From 2007/02/04 13:04:57
To   2007/02/04 13:14:57
For    600 Secs 00:10:00                Result of Y032182H Run
________________________________________________________________________________

 .    .      .                    ___     .    .    .    .    .    .    .    .
<-- Device Descr. -->  Mdisk Pa- <-Rate/s-> <------- Time (msec) -------> Req.
Addr Type   Label/ID   Links ths  I/O Avoid Pend Disc Conn Serv Resp CUWt Qued
522A 3390-3 BWPVS0         0   4  536    .0   .3   .2  1.2  1.7  1.8   .0  .03
522B 3390-3 BWPVS1         0   4  553    .0   .3   .2  1.2  1.7  1.7   .0  .00
522C 3390-3 BWPVS2         0   4  576    .0   .3   .2  1.1  1.6  1.6   .0  .00
522D 3390-3 BWPVS3         0   4  570    .0   .3   .2  1.1  1.6  1.6   .0  .01
522E 3390-3 BWPVT0         0   4  548    .0   .3   .2  1.2  1.7  1.7   .0  .00
522F 3390-3 BWPVT1         0   4  573    .0   .3   .2  1.1  1.6  1.6   .0  .01
5230 3390-3 BWPVT2         0   4  584    .0   .3   .2  1.1  1.6  1.6   .0  .00
5231 3390-3 BWPVT3         0   4  572    .0   .3   .2  1.1  1.6  1.6   .0  .00
5249 3390-3                0   4  558    .0   .3   .2  1.2  1.7  1.7   .0  .00
524A 3390-3                0   4  569    .0   .3   .2  1.1  1.6  1.6   .0  .00
524B 3390-3                0   4  562    .0   .3   .2  1.1  1.6  1.6   .0  .00
524C 3390-3                0   4  566    .0   .3   .2  1.1  1.6  1.6   .0  .00
524D 3390-3                0   4  564    .0   .3   .2  1.1  1.6  1.6   .0  .00
524E 3390-3                0   4  538    .0   .3   .2  1.2  1.7  1.7   .0  .00
524F 3390-3                0   4  563    .0   .3   .2  1.1  1.6  1.6   .0  .00
5250 3390-3                0   4  548    .0   .3   .2  1.2  1.7  1.7   .0  .00
5251 3390-3                0   4  524    .0   .3   .2  1.2  1.7  1.7   .0  .00
5252 3390-3                0   4  535    .0   .3   .2  1.2  1.7  1.7   .0  .00
5253 3390-3                0   4  568    .0   .3   .2  1.1  1.6  1.6   .0  .00
5254 3390-3                0   4  570    .0   .3   .2  1.1  1.6  1.6   .0  .00
5255 3390-3                0   4  557    .0   .3   .2  1.2  1.7  1.7   .0  .00
5256 3390-3                0   4  543    .0   .3   .2  1.2  1.7  1.7   .0  .00
5257 3390-3                0   4  544    .0   .3   .2  1.2  1.7  1.7   .0  .00
5258 3390-3                0   4  574    .0   .3   .2  1.1  1.6  1.6   .0  .00
We see that by adding the eight HyperPAV aliases, we have eliminated queueing at the eight bases, which was our objective.
Further, now that there is no queueing, we can assess volume response time by inspecting the service times in the DEVICE report. For this example, we can conclude that we reduced volume response time in this pool from about 3.8 msec to about 1.7 msec.
Because this pool consists only of bases 522A-5231 and aliases 5249-5258, summing the device I/O rates gives 13395/sec aggregate I/O rate to the pool. We can approximate the volume I/O rate by dividing by 8, because there are eight bases. This gives us a volume I/O rate of about 1674/sec, which is an increase from our original value of 735/sec.
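The pool arithmetic can be reproduced directly from the I/O column of the final DEVICE excerpt. The sketch below is illustrative only. The per-volume figure is an average that assumes the load is spread evenly across the eight bases, which is roughly true for this workload but need not be true in general.

```python
# I/O column for bases 522A-5231 in the sixteen-alias run.
base_rates = [536, 553, 576, 570, 548, 573, 584, 572]
# I/O column for HyperPAV aliases 5249-5258 in the same run.
alias_rates = [558, 569, 562, 566, 564, 538, 563, 548,
               524, 535, 568, 570, 557, 543, 544, 574]

# Every device in the pool contributes to the pool's aggregate rate.
pool_rate = sum(base_rates) + sum(alias_rates)

# Approximate per-volume rate: divide by the number of bases (volumes).
volume_rate = pool_rate / len(base_rates)

print(f"pool I/O rate:          {pool_rate}/sec")
print(f"approx volume I/O rate: {volume_rate:.0f}/sec")
```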
Regarding HyperPAV pools, one other point needs mention. The span of a HyperPAV pool is typically the logical subsystem (LSS) (aka logical control unit, or LCU) within the IBM 2107. IBM anticipates that customers using HyperPAV will have more than one LSS (LCU) configured in HyperPAV mode, and so keeping track of the base-alias relationships can become a bit challenging. Unfortunately, z/VM Performance Toolkit does not report on the base-alias relationships for HyperPAV, so the system administrator must resort to other means. The CP command QUERY PAV yields comprehensive console output that describes the organization of HyperPAV bases and aliases into pools, thereby telling the system administrator what he needs to know to interpret Performance Toolkit reports and subsequently tune the I/O subsystem. Customers interpreting raw monitor data will notice that the MRIODDEV record has new bits IODDEV_RDEVHPBA and IODDEV_RDEVHPAL which tell whether the device is a HyperPAV base or alias respectively. If one of those bits is set, a new field, IODDEV_RDEVHPPL, gives the pool number in which the device resides.
Another Look at HyperPAV Alias Provisioning
New monitor record MRIODHPP (domain 6, record 28) comments on the configuration of a HyperPAV pool.
From a pool tuning perspective, perhaps the most important fields in this record are IODHPP_HPPTRIES and IODHPP_HPPFAILS. The former counts the number of times CP went to the pool to try to get an alias to do an I/O on behalf of a base. The latter counts the number of those tries where CP came up empty, that is, there were no aliases available.
Trend analysis on IODHPP_HPPTRIES and IODHPP_HPPFAILS reveals whether there are enough aliases in the pool. If HPPTRIES is increasing but HPPFAILS is remaining constant, there are enough aliases. If both are rising, there are not enough aliases.
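A reduction program could apply this trend test along the following lines. This is a sketch under stated assumptions: it presumes the two counters have already been extracted from successive MRIODHPP records as plain integers, that they are cumulative, and that they have not wrapped during the interval examined. The function name and the sample values are hypothetical; they are not measurement data.

```python
from typing import Sequence, Tuple


def classify_pool(samples: Sequence[Tuple[int, int]]) -> str:
    """Classify a HyperPAV pool from (HPPTRIES, HPPFAILS) readings.

    `samples` holds counter pairs from successive monitor records,
    oldest first. Counter wrap is not handled in this sketch.
    """
    if len(samples) < 2:
        return "insufficient data"
    tries_delta = samples[-1][0] - samples[0][0]
    fails_delta = samples[-1][1] - samples[0][1]
    if tries_delta <= 0:
        return "no alias requests in interval"
    if fails_delta > 0:
        return "not enough aliases"
    return "enough aliases"


# Hypothetical readings, for illustration only.
steady = [(1000, 12), (5000, 12), (9000, 12)]     # fails stay constant
starved = [(1000, 12), (5000, 480), (9000, 950)]  # fails rise with tries

print(classify_pool(steady))   # enough aliases
print(classify_pool(starved))  # not enough aliases
```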
Fields IODHPP_HPPMINCT and IODHPP_HPPMAXCT are low-water and high-water marks on free alias counts. CP updates these fields each time it tries to get an alias from the pool, and it resets them each time it cuts an MRIODHPP record. Thus each MRIODHPP record comments on the minimal and maximal number of free aliases CP found in the pool since the previous MRIODHPP record. If IODHPP_HPPMINCT is consistently large, we can conclude the pool probably has too many aliases. If our I/O configuration is suffering for device numbers, some of those aliases could be removed and the device numbers reassigned for other purposes.
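The low-water mark lends itself to a similar check. In the sketch below, the smallest IODHPP_HPPMINCT seen across a series of monitor intervals is taken as the number of aliases that were never needed at the same time. The function name, the `headroom` safety margin, and the sample values are all assumptions made for illustration. How much margin to keep is a judgment call for the analyst.

```python
from typing import Sequence


def removable_aliases(minct_samples: Sequence[int], headroom: int = 2) -> int:
    """Suggest how many aliases a pool could give up.

    `minct_samples` holds IODHPP_HPPMINCT from successive MRIODHPP
    records. The smallest value is the fewest free aliases CP ever
    found; aliases beyond that, less a hypothetical safety margin,
    were idle throughout the period sampled.
    """
    if not minct_samples:
        return 0
    return max(0, min(minct_samples) - headroom)


# Hypothetical low-water marks from five monitor intervals.
print(removable_aliases([6, 5, 7, 5, 6]))  # 3
print(removable_aliases([1, 0, 2, 0, 1]))  # 0
```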
z/VM Performance Toolkit does not yet report on the MRIODHPP record. The customer must use other means, such as the MONVIEW package on our download page, to inspect it.
A Unified Look at Volume I/O Rate and Volume Response Time
In z/VM 5.3, IBM has extended the MRIODDEV record so that it comments on the I/O contributions made by alias devices, no matter which aliases contributed. These additional fields make it simple to calculate volume performance statistics, such as volume I/O rate, volume service time, and volume response time.
Analysts familiar with fields IODDEV_SCGSSCH, IODDEV_SCMCNTIM, IODDEV_SCMFPTIM, and friends already know how to use those fields to calculate device I/O rate, device pending time, device connect time, device service time, and so on. In z/VM 5.3, this set of fields continues to have the same meaning, but it's important to realize that in a PAV or HyperPAV situation, those fields comment on the behavior of only the base device for the volume.
The new MRIODDEV fields IODDEV_PAVSSCH, IODDEV_PAVCNTIM, IODDEV_PAVFPTIM, and friends comment on the aggregate corresponding phenomena for all aliases ever acting for this base, regardless of PAV or HyperPAV, and regardless of alias device number.
What this means is that by looking at MRIODDEV and doing the appropriate arithmetic, a reduction program can calculate volume behavior quite easily, by weighting the base and aggregate-alias contributions according to their respective I/O rates.
For example, if the traditional MRIODDEV base device fields show an I/O rate of 400/sec and a connect time of 1.2 msec, and the same calculational technique applied to the new aggregate-alias fields reveals an I/O rate of 700/sec and a connect time of 1.4 msec, the expected value of the volume's connect time is calculated to be (400*1.2 + 700*1.4) / (400 + 700), or 1.33 msec.
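The weighting in this example can be sketched as a small function. It assumes the base and aggregate-alias I/O rates and times have already been derived from the MRIODDEV fields; the function and parameter names are illustrative and do not correspond to monitor field names.

```python
def volume_time_msec(base_rate: float, base_msec: float,
                     alias_rate: float, alias_msec: float) -> float:
    """Rate-weighted average of a base time and an aggregate-alias time.

    Works for any per-I/O time component (connect, pending, service)
    as long as both inputs measure the same component.
    """
    total_rate = base_rate + alias_rate
    if total_rate == 0:
        return 0.0  # no I/O in the interval, so nothing to average
    return (base_rate * base_msec + alias_rate * alias_msec) / total_rate


# The worked example from the text: base 400/sec at 1.2 msec connect,
# aliases 700/sec at 1.4 msec connect.
connect = volume_time_msec(400, 1.2, 700, 1.4)
print(f"volume I/O rate:     {400 + 700}/sec")
print(f"volume connect time: {connect:.2f} msec")  # 1.33 msec
```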
Authors of reduction programs can update their software so as to calculate and report on volume behavior.
z/VM Performance Toolkit does not yet report on the new MRIODDEV fields. The customer must use other means, such as the MONVIEW package on our download page, to examine them.
I/O Parallelism: Other Thoughts
In this section we have discussed that for the case of guest I/O to minidisks, z/VM CP can exploit PAV or HyperPAV so as to parallelize the corresponding real I/O, thereby reducing or eliminating CP's serializing on real volumes. This support is useful in removing I/O queueing in the case where several guests require access to the same real volume, each such guest manipulating only its own slice (minidisk) of the volume.
Depending on the environment or configuration, other opportunities exist in a z/VM system for disk I/O to become serialized inadvertently, and other remedies exist too.
For example, even though z/VM can itself use PAV or HyperPAV to run several guest minidisk I/Os concurrently to a single real volume, each such guest still can do only one I/O at a time to any given minidisk. Depending on the workload inside the guest, the guest itself might be maintaining its own I/O queue for the minidisk. z/VM Performance Toolkit would not ordinarily report on such a queue, nor would z/VM's PAV or HyperPAV support be useful in removing it.
A holistic approach to removing I/O queueing requires an end-to-end analysis of I/O performance and the application of appropriate relief measures at each step. Returning to the earlier example, if guest I/O queueing is the real concern, and if the guest is PAV-aware, it might make sense to move the guest's data to a dedicated volume, and then attach the volume's base and alias device numbers to the guest. Such an approach would give the guest an opportunity to use PAV or HyperPAV to do its own I/O scheduling and thereby mitigate its queueing problem.
If the guest is PAV-aware, another tool for removing a guest I/O queue is to have z/VM create some virtual PAV aliases for the guest's minidisk. z/VM can virtualize either classic PAV aliases or HyperPAV aliases for the minidisk. This approach lets the guest's data continue to reside on a minidisk but also lets the guest itself achieve I/O concurrency for the minidisk.
Depending on software levels, guests running Linux for System z can use PAV to parallelize I/O to volumes containing Linux file systems, so as to achieve file system I/O concurrency. Again, whether this is useful or appropriate depends on the results of a comprehensive study of the I/O habits of the guest.
As always, customers must study performance data and apply tuning techniques appropriate to the bottlenecks discovered.