CP Disk I/O Performance
Introduction
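The purpose of this study was to measure how the processor cost of
disk I/O changed from z/VM 5.1.0 to z/VM 5.2.0 in a set of regression
disk environments, that is, disk I/O scenarios available on both releases.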
Simply put, we defined a number of different measurement
scenarios that exercise z/VM's ability to do disk I/O. We
ran each measurement case on both z/VM 5.1.0 and z/VM 5.2.0
and compared corresponding runs.
For this study, we devised and ran three measurement suites:
- To evaluate guest-initiated I/O (SSCH, Diagnose x'250', and
FCP SCSI), we used
the Linux disk exerciser IOzone to measure
disk performance a Linux guest experiences. We ran IOzone
against a wide variety of disk choices a Linux guest might use.
- To evaluate CMS-initiated diagnose I/O,
we ran an XEDIT loop that repeatedly read a 400 KB file
from a minidisk located on an emulated FBA volume.
- To evaluate CP paging I/O,
we ran a CMS-based program designed to induce
page faults. We ran this program in
a system configuration specifically contrived to be
particularly stressful on the Control Program. These stressors
included more than just being short on real storage.
We chose these measurements specifically because they
exercise disk I/O scenarios present in both z/VM 5.1.0
and z/VM 5.2.0.
We need to mention, though, that
z/VM 5.2.0 also contains new I/O support.
In our
Linux Disk I/O Alternatives
chapter, we describe 64-bit extensions to Diagnose x'250'
and the Linux device driver that exploits them.
In the following sections we describe the results of each of
our disk I/O regression experiments.
z/VM 5.3 note:
For later information about SCSI performance, see
our z/VM 5.3 SCSI study.
Linux IOzone
To measure disk performance with Linux guests, we set up a single
Linux guest running the IOzone disk exerciser. IOzone
is a file system exercise tool. See our
IOzone workload description
for details of how we run IOzone.
For this experiment, we used the following configuration:
- 2084-324, two-way dedicated partition,
2 GB central storage, 2 GB expanded storage.
- No activity in the partition being measured, except our
Linux guest and the test case driver.
- 2105-F20, 16 GB of cache, FICON and FCP attached.
- z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through
mid-October 2005, or z/VM 5.2.0 GA RSU plus certain
other PTFs,
as called out
in the tables below.
- Linux SLES 9 SP 1, 192 MB virtual uniprocessor,
ext2 file systems.
The Linux guest is 64-bit in all runs except the Diagnose
x'250' runs. In those runs, we used a 31-bit Linux guest.
We could not do a regression measurement of
64-bit Diagnose x'250' I/O, because it is not supported
on z/VM 5.1.0.
Note that we sized the ballast file at about four times the
virtual machine's storage size.
This ensures that the Linux page cache plays little to
no role in buffering IOzone's file operations. We wanted
to be sure to measure CP I/O processing performance,
not the performance of the Linux page cache.
We remind the reader that this chapter studies disk performance
from a regression perspective.
To see performance comparisons among choices,
and to read about new z/VM 5.2.0 disk choices for Linux guests,
the reader should
refer to our Linux Disk I/O Alternatives
chapter.
We also remind the reader that our primary intent in conducting these
experiments was to measure the Control Program overhead (processor time
per unit of work) associated with these disk I/O methods, so as
to determine how that overhead had changed from z/VM 5.1.0 to z/VM 5.2.0.
It was not
our intent to measure or report on the maximum I/O rate or throughput
achievable with z/VM, or with a specific processor, or with a specific
type or model of disk hardware.
The disk configurations mentioned in this chapter
(e.g., EDED, LNS0, and so on)
are defined in our
IOzone workload description
appendix.
For the reader's convenience, we offer here the following brief
tips on interpreting the configuration names:
| Name prefix | Name suffix |
|---|---|
| E for ECKD with the Linux ECKD CCW driver. | DED for a dedicated volume; MD0 for a minidisk with MDC OFF; MD1 for a minidisk with MDC ON. |
| F for emulated FBA with the Linux FBA CCW driver. | DED for a dedicated volume; MD0 for a minidisk with MDC OFF; MD1 for a minidisk with MDC ON. |
| D2 for ECKD minidisk with the Linux Diagnose X'250' driver. | 10 for blocksize 1024, MDC OFF; 11 for blocksize 1024, MDC ON; 20 for blocksize 2048, MDC OFF; 21 for blocksize 2048, MDC ON; 40 for blocksize 4096, MDC OFF; 41 for blocksize 4096, MDC ON. |
| LNS0 for Linux owning an FCP subchannel and using the zFCP driver. | None. |
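Because every configuration name is built mechanically from a prefix and a suffix, it can be decoded programmatically. The sketch below is a hypothetical helper (not an IBM or IOzone tool); the class and function names are our own, and it encodes only the conventions listed in the naming table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical helper: decodes IOzone configuration names (EDED, FMD1, D240,
# LNS0, ...) using only the prefix/suffix conventions described in the text.

@dataclass(frozen=True)
class DiskConfig:
    device: str               # "ECKD", "emulated FBA", or "FCP"
    driver: str               # Linux driver the guest uses
    dedicated: bool           # dedicated volume/subchannel (True) or minidisk (False)
    mdc: Optional[bool]       # minidisk cache ON/OFF; None when there is no minidisk
    blocksize: Optional[int]  # only given for the Diagnose x'250' configurations

_CCW_PREFIXES = {"E": ("ECKD", "ECKD CCW"), "F": ("emulated FBA", "FBA CCW")}
_D2_BLOCKSIZES = {"1": 1024, "2": 2048, "4": 4096}

def decode_config(name: str) -> DiskConfig:
    if name == "LNS0":
        return DiskConfig("FCP", "zFCP", True, None, None)
    # D2 + blocksize code (1, 2, 4) + MDC code (0 = OFF, 1 = ON)
    if (len(name) == 4 and name.startswith("D2")
            and name[2] in _D2_BLOCKSIZES and name[3] in ("0", "1")):
        return DiskConfig("ECKD", "Diagnose x'250'", False,
                          name[3] == "1", _D2_BLOCKSIZES[name[2]])
    prefix, suffix = name[:1], name[1:]
    if prefix in _CCW_PREFIXES:
        device, driver = _CCW_PREFIXES[prefix]
        if suffix == "DED":
            return DiskConfig(device, driver, True, None, None)
        if suffix in ("MD0", "MD1"):
            return DiskConfig(device, driver, False, suffix == "MD1", None)
    raise ValueError(f"unknown configuration name: {name!r}")

print(decode_config("D240"))
```

For example, D240 decodes to an ECKD minidisk driven through Diagnose x'250' with a 4096-byte blocksize and MDC OFF.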
For each configuration,
the tables below show the ratio of the z/VM 5.2.0 result to
the corresponding z/VM 5.1.0 result.
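Each table reports on one phase of IOzone. A KB/sec ratio above 1 means
z/VM 5.2.0 moved data faster; a CPU/KB ratio below 1 means z/VM 5.2.0
spent less processor time per KB.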
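As a minimal sketch of how a scaled entry is read, the code below computes the ratio and classifies it by metric direction. The 2% "about the same" band is our own assumption for illustration, not a threshold used in this report; the example ratios come from the tables.

```python
# Sketch: interpret a scaled ratio (z/VM 5.2.0 result / z/VM 5.1.0 result).
# The 2% band is an illustrative assumption, not a threshold from the report.

HIGHER_IS_BETTER = {
    "KB/sec": True,
    "Total CPU/KB": False,
    "CP CPU/KB": False,
    "Virtual CPU/KB": False,
}

def scaled(result_520: float, result_510: float) -> float:
    """Ratio of a z/VM 5.2.0 result to the matching z/VM 5.1.0 result."""
    return result_520 / result_510

def verdict(metric: str, ratio: float, band: float = 0.02) -> str:
    """Say whether a scaled ratio means z/VM 5.2.0 did better or worse."""
    if abs(ratio - 1.0) <= band:
        return "about the same"
    improved = ratio > 1.0 if HIGHER_IS_BETTER[metric] else ratio < 1.0
    return "better" if improved else "worse"

print(verdict("KB/sec", 2.31))      # D240 initial write data rate -> better
print(verdict("CP CPU/KB", 0.734))  # D240 initial write CP cost   -> better
print(verdict("CP CPU/KB", 1.50))   # EDED initial read CP cost    -> worse
```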
IOzone Initial Write Results (scaled to z/VM 5.1.0)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 0.963 | 1.01 | 1.09 | 0.997 |
| EMD0 | 1.12 | 1.01 | 1.05 | 1.01 |
| EMD1 | 1.01 | 1.01 | 1.05 | 1.00 |
| Emulated FBA ccw | | | | |
| FDED | 0.999 | 1.01 | 1.04 | 1.000 |
| FMD0 | 1.00 | 1.02 | 1.04 | 1.01 |
| FMD1 | 0.998 | 1.01 | 1.03 | 0.994 |
| ECKD Diag X'250' (31-bit) | | | | |
| D210 | 1.07 | 1.01 | 1.04 | 1.01 |
| D211 | 0.999 | 1.03 | 1.08 | 1.01 |
| D220 | 2.24 | 0.964 | 0.820 | 0.992 |
| D221 | 1.02 | 1.03 | 1.06 | 1.03 |
| D240 | 2.31 | 0.954 | 0.734 | 0.985 |
| D241 | 1.02 | 1.00 | 0.994 | 1.00 |
| Dedicated FCP | | | | |
| LNS0 | 1.00 | 1.01 | 1.03 | 1.01 |

Notes:
2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE.
2105-F20 16 GB FICON/FCP.
z/VM 5.1.0 GA RSU.
z/VM 5.2.0 GA RSU + VM63893.
Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
IOzone Rewrite Results (scaled to z/VM 5.1.0)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 0.976 | 1.03 | 1.04 | 1.02 |
| EMD0 | 1.01 | 1.02 | 1.05 | 1.01 |
| EMD1 | 1.00 | 1.01 | 1.01 | 1.00 |
| Emulated FBA ccw | | | | |
| FDED | 0.999 | 1.04 | 1.06 | 1.02 |
| FMD0 | 0.999 | 1.04 | 1.04 | 1.04 |
| FMD1 | 0.999 | 1.01 | 1.03 | 0.999 |
| ECKD Diag X'250' (31-bit) | | | | |
| D210 | 0.995 | 1.00 | 1.02 | 0.991 |
| D211 | 0.998 | 1.04 | 1.06 | 1.03 |
| D220 | 0.992 | 1.04 | 1.10 | 1.02 |
| D221 | 1.00 | 1.04 | 1.11 | 1.03 |
| D240 | 1.00 | 1.02 | 1.08 | 1.01 |
| D241 | 0.965 | 1.03 | 1.07 | 1.02 |
| Dedicated FCP | | | | |
| LNS0 | 0.969 | 1.000 | 0.999 | 1.000 |

Notes:
2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE.
2105-F20 16 GB FICON/FCP.
z/VM 5.1.0 GA RSU.
z/VM 5.2.0 GA RSU + VM63893.
Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
IOzone Initial Read Results (scaled to z/VM 5.1.0)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 0.995 | 1.09 | 1.50 | 1.02 |
| EMD0 | 0.995 | 1.07 | 1.40 | 1.01 |
| EMD1 | 0.995 | 1.04 | 1.12 | 0.986 |
| Emulated FBA ccw | | | | |
| FDED | 0.994 | 1.05 | 1.06 | 1.03 |
| FMD0 | 0.995 | 1.03 | 1.05 | 1.02 |
| FMD1 | 0.998 | 1.01 | 1.01 | 1.01 |
| ECKD Diag X'250' (31-bit) | | | | |
| D210 | 0.999 | 1.04 | 1.09 | 1.01 |
| D211 | 0.999 | 1.07 | 1.09 | 1.05 |
| D220 | 0.999 | 1.04 | 1.11 | 1.02 |
| D221 | 0.998 | 1.04 | 1.05 | 1.02 |
| D240 | 0.998 | 1.00 | 0.994 | 1.01 |
| D241 | 0.996 | 1.06 | 1.14 | 1.01 |
| Dedicated FCP | | | | |
| LNS0 | 0.997 | 1.02 | 0.973 | 1.02 |

Notes:
2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE.
2105-F20 16 GB FICON/FCP.
z/VM 5.1.0 GA RSU.
z/VM 5.2.0 GA RSU + VM63893.
Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
IOzone Reread Results (scaled to z/VM 5.1.0)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 0.996 | 1.02 | 1.32 | 0.962 |
| EMD0 | 0.995 | 1.06 | 1.41 | 0.999 |
| EMD1 | 0.954 | 1.27 | 1.41 | 1.22 |
| Emulated FBA ccw | | | | |
| FDED | 0.994 | 1.03 | 1.06 | 0.994 |
| FMD0 | 0.996 | 1.03 | 1.04 | 1.01 |
| FMD1 | 0.992 | 0.922 | 0.925 | 0.919 |
| ECKD Diag X'250' (31-bit) | | | | |
| D210 | 0.999 | 1.03 | 1.09 | 1.000 |
| D211 | 0.954 | 1.16 | 1.18 | 1.14 |
| D220 | 1.000 | 1.04 | 1.10 | 1.02 |
| D221 | 0.975 | 1.03 | 1.05 | 1.01 |
| D240 | 0.998 | 1.02 | 1.08 | 1.01 |
| D241 | 0.954 | 0.988 | 1.09 | 0.952 |
| Dedicated FCP | | | | |
| LNS0 | 0.998 | 1.01 | 1.06 | 0.999 |

Notes:
2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE.
2105-F20 16 GB FICON/FCP.
z/VM 5.1.0 GA RSU.
z/VM 5.2.0 GA RSU + VM63893.
Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
IOzone Overall Results (scaled to z/VM 5.1.0)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 0.979 | 1.03 | 1.21 | 1.00 |
| EMD0 | 1.04 | 1.03 | 1.20 | 1.01 |
| EMD1 | 1.00 | 1.05 | 1.13 | 1.02 |
| Emulated FBA ccw | | | | |
| FDED | 0.997 | 1.03 | 1.05 | 1.01 |
| FMD0 | 0.999 | 1.03 | 1.04 | 1.02 |
| FMD1 | 0.998 | 1.000 | 1.01 | 0.989 |
| ECKD Diag X'250' (31-bit) | | | | |
| D210 | 1.02 | 1.02 | 1.06 | 1.00 |
| D211 | 0.998 | 1.06 | 1.10 | 1.04 |
| D220 | 1.37 | 1.01 | 1.01 | 1.01 |
| D221 | 1.00 | 1.04 | 1.06 | 1.03 |
| D240 | 1.42 | 0.990 | 0.942 | 0.998 |
| D241 | 0.993 | 1.02 | 1.09 | 0.999 |
| Dedicated FCP | | | | |
| LNS0 | 0.988 | 1.01 | 1.02 | 1.01 |

Notes:
2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE.
2105-F20 16 GB FICON/FCP.
z/VM 5.1.0 GA RSU.
z/VM 5.2.0 GA RSU + VM63893.
Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
Discussion
Most of the IOzone cases show regression results consistent
with the trends reported in our
general discussion of the regression traits of
z/VM 5.2.0, namely, some rise in CP CPU/tx, but
little change in total CPU/tx or transaction rate.
Notable are the
Diagnose x'250'
initial write cases with MDC OFF at block sizes 2048 and 4096
(D220 and D240), which showed markedly higher
data rates and lower CP CPU/KB than on z/VM 5.1.0.
We investigated this for each block size
by comparing the z/VM 5.1.0 initial write pass to its rewrite pass,
and the z/VM 5.2.0 initial write pass to its rewrite pass.
Our intuition was that on a given z/VM release, the initial
write pass should perform about the same as the rewrite pass.
On z/VM 5.2.0, that is what we found.
On z/VM 5.1.0, however, the initial write pass
performed markedly worse than the rewrite pass (about a 56% drop
in throughput and about a 43% rise in CP CPU/tx).
We also found that the z/VM 5.1.0 rewrite numbers were
about equal to both the initial write and rewrite numbers for
z/VM 5.2.0. Based on these findings, we concluded that z/VM 5.2.0
must have coincidentally repaired a z/VM 5.1.0 defect in
Diagnose x'250' write processing, and so we studied the anomaly
no further.
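The cross-release ratios in the Initial Write table fit this account. Since the z/VM 5.1.0 rewrite rate was about equal to the z/VM 5.2.0 initial write rate, a KB/sec ratio r implies that the z/VM 5.1.0 initial write rate was lower by a fraction of about 1 - 1/r. The sketch below does that arithmetic for D220 and D240 (the helper name is ours); it yields about 55% and 57%, close to the roughly 56% drop described above.

```python
def percent_lower_on_510(scaled_ratio: float) -> float:
    """For a higher-is-better metric with scaled = v520 / v510, return how
    many percent lower the z/VM 5.1.0 value was than the z/VM 5.2.0 value."""
    return (1.0 - 1.0 / scaled_ratio) * 100.0

# KB/sec ratios from the IOzone Initial Write table (Diagnose x'250', MDC OFF)
initial_write_kb_per_sec = {"D220": 2.24, "D240": 2.31}

for name, ratio in initial_write_kb_per_sec.items():
    drop = percent_lower_on_510(ratio)
    print(f"{name}: z/VM 5.1.0 initial write rate about {drop:.0f}% lower")
```

Only the throughput ratios are converted here; the CP CPU comparison in the text is within a release and is not reconstructed from the cross-release table.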
During z/VM 5.2.0 development, we measured a variety of
workloads, some known to be heavily constrained on z/VM 5.1.0,
others not. We knew from our measurements of unconstrained
workloads that they would pay a CPU consumption penalty for
the constraint relief technology put into z/VM 5.2.0, even
though they would get no direct benefit from that technology.
To compensate,
we looked for ways to put in offsetting improvements for
such workloads. For work resembling IOzone, we made
improvements in
CCW translation and
in support for emulated FBA volumes. This helped us
maintain regression performance for z/VM 5.2.0.
XEDIT Read Loop
In this experiment we gave a CMS guest a 4-KB-formatted
minidisk on an emulated FBA volume, MDC OFF. We ran an
exec that looped on XEDIT reading a 100-block file from
the minidisk. We measured
XEDIT file loads per second
and total CPU time per XEDIT file load.
In this measurement we deliberately confined ourselves to an
emulated FBA minidisk with MDC OFF, because we knew that the
regression performance of ECKD volumes and of MDC was being
covered by the Linux IOzone experiments. We also knew that
excessive
CPU time per XEDIT file load had been addressed in the
z/VM 5.1.0 service stream since GA. We wanted to assess
the impact of that service.
As in other experiments, we used the zSeries hardware sampler
to measure CPU time consumed.
For this experiment, we used the following configuration:
- 2064-116, three-way dedicated partition,
4 GB central storage, 2 GB expanded storage.
- No activity in the partition being measured, except our
CMS guest and the test case driver.
All other partitions either shut down or idling.
- 2105-F20, 16 GB of cache, FCP attached.
- z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through
mid-October 2005, or z/VM 5.2.0 GA RSU, as called out
in the table below.
- CMS minidisk, formatted at 4 KB, MDC OFF.
In the table below, a transaction is one XEDIT load
of the 100-block (400 KB) file.
XEDIT Results (scaled to 5.1 GA RSU)

| Run | 5.1 GA RSU | 5.1 Oct 2005 | 5.2 GA RSU |
|---|---|---|---|
| Tx/sec | 6.42 | 71.9 | 76.5 |
| scaled | | 11.2 | 11.9 |
| CP/tx (usec) | 24400 | 1980 | 2050 |
| scaled | | 0.0809 | 0.0837 |
| Virt/tx (usec) | 575 | 552 | 548 |
| scaled | | 0.961 | 0.954 |
| Total/tx (usec) | 25000 | 2530 | 2590 |
| scaled | | 0.101 | 0.104 |

Notes:
2064-116. Three-way dedicated partition. 4 GB central. 2 GB XSTORE.
2105-F20 16 GB FCP.
EFBA minidisk with MDC OFF.
Discussion
In these measurements, z/VM 5.2.0 GA RSU achieved 11.9 times the
throughput rate of z/VM 5.1.0 GA RSU (an increase of about 1090%),
with a decrease of about 91.6% in CP CPU time per transaction and
about 89.6% in total CPU time per transaction.
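These percentages follow directly from the raw values in the XEDIT table. The sketch below recomputes them. Because the published raw values are rounded, the results can differ slightly in the last digit from ratios computed on unrounded data (for example, CP/tx gives 0.0840 here versus 0.0837 in the table).

```python
# Raw values from the XEDIT Results table (rounded as published).
base_510_ga = {"tx_per_sec": 6.42, "cp_usec": 24400, "total_usec": 25000}
new_520_ga = {"tx_per_sec": 76.5, "cp_usec": 2050, "total_usec": 2590}

def pct_change(new: float, old: float) -> float:
    """Percent change from old to new (negative means a decrease)."""
    return (new / old - 1.0) * 100.0

rate = pct_change(new_520_ga["tx_per_sec"], base_510_ga["tx_per_sec"])
cp = pct_change(new_520_ga["cp_usec"], base_510_ga["cp_usec"])
total = pct_change(new_520_ga["total_usec"], base_510_ga["total_usec"])
print(f"Tx/sec {rate:+.0f}%, CP/tx {cp:+.1f}%, Total/tx {total:+.1f}%")
```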
APARs VM63534 and VM63725,
available correctively for z/VM 5.1.0 and included in z/VM 5.2.0,
contained performance enhancements for I/O to emulated FBA
volumes. These changes are especially helpful for applications
whose file I/O buffers are not aligned on 512-byte boundaries.
They also help applications that tend to issue
multi-block I/Os. XEDIT does both of these things, so
these changes were effective for it.
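To make the two conditions concrete: a buffer is 512-byte aligned when its address is a multiple of 512, and the 100-block (400 KB) test file on the 4 KB-formatted minidisk occupies 800 emulated FBA blocks of 512 bytes. The hypothetical helpers below only illustrate this arithmetic. They do not describe how VM63534 or VM63725 changed CP internally.

```python
FBA_BLOCK = 512   # bytes per block on an emulated FBA volume
CMS_BLOCK = 4096  # the test minidisk was formatted at 4 KB

def is_fba_aligned(buffer_address: int) -> bool:
    """True if an I/O buffer starts on a 512-byte boundary."""
    return buffer_address % FBA_BLOCK == 0

def fba_blocks_for(cms_blocks: int) -> int:
    """512-byte FBA blocks spanned by a run of 4 KB CMS blocks."""
    return cms_blocks * (CMS_BLOCK // FBA_BLOCK)

print(is_fba_aligned(0x7F000), is_fba_aligned(0x7F010))  # True False
print(fba_blocks_for(1), fba_blocks_for(100))            # 8 800
```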
Paging
In this experiment we used a CMS Rexx program to induce
paging on a z/VM system specifically configured to be
storage-constrained. This program used the Rexx
storage() function to touch virtual storage pages
randomly, with a uniform distribution.
By running this program in a storage-constrained
environment, we induced page faults.
In this experiment, we measured transaction rate as the number of
pages the thrashing program (the thrasher) touched per second.
Being interested in how Control Program overhead
had changed since z/VM 5.1.0,
we also measured CP CPU time per page moved.
Finally, being interested in the efficacy of CP's storage
management logic, we calculated the pages CP moved per page
the thrasher touched.
Informally, we thought of this metric as commenting on how
"smart" CP was being about keeping the "correct" pages in
storage for the thrasher. Though this metric isn't directly
related to a regression assessment of DASD I/O performance,
we are reporting it here anyway as a matter of general
interest.
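The three metrics are simple ratios over the measured interval. In the sketch below, the counter names are hypothetical, and the input numbers are made-up round values chosen only to show the arithmetic. They are not measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PagingInterval:
    """Hypothetical counters for one measured interval (names are illustrative)."""
    seconds: float      # length of the measured interval
    touches: int        # pages the thrasher touched
    moves: int          # pages CP moved during the interval
    cp_cpu_usec: float  # CP CPU time consumed during the interval

    def touches_per_sec(self) -> float:
        return self.touches / self.seconds

    def cp_usec_per_move(self) -> float:
        return self.cp_cpu_usec / self.moves

    def moves_per_touch(self) -> float:
        return self.moves / self.touches

# Made-up round numbers for a five-minute interval, for illustration only.
sample = PagingInterval(seconds=300, touches=480_000, moves=672_000,
                        cp_cpu_usec=6_720_000)
print(sample.touches_per_sec(), sample.cp_usec_per_move(), sample.moves_per_touch())
# prints: 1600.0 10.0 1.4
```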
For this experiment, we used the following configuration:
- 2084-324, two-way dedicated partition,
2 GB central storage, 0 GB expanded storage.
- No activity in the partition being measured, except our
CMS guest and the test case driver.
All other partitions either shut down or idling.
- 2105-F20, 16 GB of cache, FICON or FCP attached.
- z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through
mid-October 2005, or z/VM 5.2.0, as called out
in the tables below.
- Two 2 GB paging volumes.
- 512 MB CMS guest, configured to use the Rexx
storage() function to touch pages randomly
within a 480 MB address range of the virtual machine.
- 944 MB of other virtual machines logged on, all
with their address spaces completely locked into
storage via CP LOCK REAL.
- CP SET TRACEFRAMES was set artificially
high so that we would end up with about 180 MB of
real storage frames available for holding pages of
the thrashing CMS guest.
- We ran the thrasher for 20 minutes unmeasured, then
collected data for five minutes. Measurements reported
here are from the five-minute measured interval.
The net effect of this configuration was that the z/VM Control
Program would have about 180 MB of real storage to use to
run a CMS guest that was trying to touch about 480 MB worth
of its pages.
This ratio created a high paging rate.
Further, the Control Program would have to run this
guest while dealing with large numbers of locked user pages
and CP trace table frames.
This let us
exercise real storage management routines that were significantly
rewritten for z/VM 5.2.0.
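Why does 180 MB of frames for a 480 MB touch range produce so much paging? If each touch is independent and uniform over the range, then once all frames are in use the next touch finds its page resident with probability 180/480 = 0.375, whichever pages are resident. So about 62.5% of touches fault under any replacement policy that keeps all frames in use. The sketch below checks this with a simple LRU simulation. LRU is a stand-in chosen for simplicity, not CP's actual algorithm, and the 4 KB page size and simulation lengths are our assumptions.

```python
import random
from collections import OrderedDict

def simulate_fault_rate(range_pages: int, frames: int, touches: int,
                        warmup: int, seed: int = 1) -> float:
    """Touch pages uniformly at random in [0, range_pages), keeping at most
    `frames` pages resident under LRU; return the fraction of post-warm-up
    touches that found their page non-resident (a page fault)."""
    rng = random.Random(seed)
    resident: OrderedDict = OrderedDict()
    faults = 0
    for i in range(warmup + touches):
        page = rng.randrange(range_pages)
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            if i >= warmup:
                faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict least recently used
            resident[page] = None
    return faults / touches

PAGE = 4096
MB = 1024 * 1024
rate = simulate_fault_rate(480 * MB // PAGE, 180 * MB // PAGE,
                           touches=200_000, warmup=100_000)
print(f"simulated fault rate {rate:.3f}; 1 - 180/480 = {1 - 180 / 480:.3f}")
```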
ECKD On All Releases
Paging Results (scaled to ECKD 5.1 GA RSU)

| Run | ECKD 5.1 GA RSU | ECKD 5.1 Oct 2005 | ECKD 5.2 GA RSU + APARs |
|---|---|---|---|
| Touches/sec | 1670 | 1620 | 1600 |
| scaled | | 0.973 | 0.960 |
| CP/move (usec) | 9.84 | 9.54 | 11.5 |
| scaled | | 0.970 | 1.17 |
| Moves/touch | 1.38 | 1.43 | 1.45 |
| scaled | | 1.04 | 1.05 |

Notes:
2084-324.
Two-way dedicated partition.
2 GB central.
0 GB XSTORE.
2105-F20 16 GB FICON/FCP.
944 MB of locked users.
180 MB of DPA.
RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.
z/VM 5.2.0 includes VM63845, VM63877, and VM63892.
Emulated FBA On All Releases
Paging Results (scaled to 2105 5.1 GA RSU)

| Run | 2105 5.1 GA RSU | 2105 5.1 Oct 2005 | 2105 5.2 GA RSU + APARs |
|---|---|---|---|
| Touches/sec | 2150 | 2120 | 2700 |
| scaled | | 0.985 | 1.25 |
| CP/move (usec) | 44.7 | 43.3 | 37.9 |
| scaled | | 0.970 | 0.850 |
| Moves/touch | 1.48 | 1.53 | 1.41 |
| scaled | | 1.04 | 0.953 |

Notes:
2084-324.
Two-way dedicated partition.
2 GB central.
0 GB XSTORE.
2105-F20 16 GB FICON/FCP.
944 MB of locked users.
180 MB of DPA.
RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.
z/VM 5.2.0 includes VM63845, VM63877, and VM63892.
Discussion
For ECKD, we saw a small drop in transaction rate and a rise
of 17% in CP CPU time per page moved.
This is consistent with our
general findings
for high paging workloads
not constrained by the 2 GB line on
z/VM 5.1.0. The rise is spread among
linkage, dispatcher, address translation, and paging modules
in the Control Program.
We did find drops in time spent in available list scan and
in management of spin locks.
For emulated FBA, we saw a
25% rise in transaction rate and a
15% drop in CP CPU time per page moved.
Significant
improvements in the Control Program's
SCSI modules (both the generic SCSI modules
and the modules associated with paging to SCSI EDEVs)
accounted for most of the reduction.
The reductions in spin lock time and
available list scan time that we saw in the ECKD case also
appeared in the emulated FBA runs, but they contributed
proportionally less to the drop, because the SCSI modules
are the dominant consumer of CP CPU time when z/VM pages to SCSI.
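The two tabulated metrics can be combined into CP CPU time per page touched (CP/move times moves/touch). This is a derived figure that the tables do not show. The sketch below computes it from the published values. For ECKD it comes out about 23% higher on z/VM 5.2.0, because the 17% rise per move compounds with about 5% more moves per touch. For emulated FBA it comes out about 19% lower.

```python
# (touches/sec, CP usec per move, moves per touch) from the Paging Results tables
runs = {
    "ECKD 5.1 GA RSU": (1670, 9.84, 1.38),
    "ECKD 5.2 GA RSU + APARs": (1600, 11.5, 1.45),
    "2105 5.1 GA RSU": (2150, 44.7, 1.48),
    "2105 5.2 GA RSU + APARs": (2700, 37.9, 1.41),
}

def cp_usec_per_touch(cp_per_move: float, moves_per_touch: float) -> float:
    """CP CPU time attributable to each page the thrasher touched."""
    return cp_per_move * moves_per_touch

for base, new in [("ECKD 5.1 GA RSU", "ECKD 5.2 GA RSU + APARs"),
                  ("2105 5.1 GA RSU", "2105 5.2 GA RSU + APARs")]:
    b = cp_usec_per_touch(*runs[base][1:])
    n = cp_usec_per_touch(*runs[new][1:])
    print(f"{new}: {n:.1f} usec/touch vs {b:.1f}, ratio {n / b:.2f}")
```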
We emphasize that customers must apply
VM63845, VM63877, and VM63892
to see correct results in environments having large numbers
of locked pages.