
CP Disk I/O Performance

Introduction

The purpose of this study was to measure the regression performance, particularly the processor costs, of z/VM disk I/O. Simply put, we defined a number of different measurement scenarios that exercise z/VM's ability to do disk I/O. We ran each measurement case on both z/VM 5.1.0 and z/VM 5.2.0 and compared corresponding runs.

For this study, we devised and ran three measurement suites:

  • To evaluate guest-initiated I/O (SSCH, Diagnose x'250', and FCP SCSI), we used the Linux disk exerciser IOzone to measure the disk performance a Linux guest experiences. We ran IOzone against a wide variety of disk choices a Linux guest might use.
  • To evaluate CMS-initiated diagnose I/O, we ran an XEDIT loop that repeatedly read a 400 KB file from a minidisk located on an emulated FBA volume.
  • To evaluate CP paging I/O, we ran a CMS-based program designed to induce page faults. We ran this program in a system configuration specifically contrived to be particularly stressful on the Control Program. These stressors included more than just being short on real storage.

We chose these measurements specifically because they exercise disk I/O scenarios present in both z/VM 5.1.0 and z/VM 5.2.0. We need to mention, though, that z/VM 5.2.0 also contains new I/O support. In our Linux Disk I/O Alternatives chapter, we describe 64-bit extensions to Diagnose x'250' and the Linux device driver that exploits them.

In the following sections we describe the results of each of our disk I/O regression experiments.

z/VM 5.3 note: For later information about SCSI performance, see our z/VM 5.3 SCSI study.

Linux IOzone

To measure disk performance with Linux guests, we set up a single Linux guest running IOzone, a file system exerciser. See our IOzone workload description for details of how we run IOzone.

For this experiment, we used the following configuration:

  • 2084-324, two-way dedicated partition, 2 GB central storage, 2 GB expanded storage.
  • No activity in the partition being measured, except our Linux guest and the test case driver.
  • 2105-F20, 16 GB of cache, FICON and FCP attached.
  • z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through mid-October 2005, or z/VM 5.2.0 GA RSU plus certain other PTFs, as called out in the tables below.
  • Linux SLES 9 SP 1, 192 MB virtual uniprocessor, ext2 file systems.

The Linux guest is 64-bit in all runs except the Diagnose x'250' runs. In those runs, we used a 31-bit Linux guest. We could not do a regression measurement of 64-bit Diagnose x'250' I/O, because it is not supported on z/VM 5.1.0.

It is important to note that we chose the ballast file to be about four times the size of the virtual machine. This ensures that the Linux page cache plays little or no role in buffering IOzone's file operations. We wanted to measure CP's I/O processing performance, not the performance of the Linux page cache.

We remind the reader that this chapter studies disk performance from a regression perspective. To see performance comparisons among choices, and to read about new z/VM 5.2.0 disk choices for Linux guests, the reader should refer to our Linux Disk I/O Alternatives chapter.

We also remind the reader that our primary intent in conducting these experiments was to measure the Control Program overhead (processor time per unit of work) associated with these disk I/O methods, so as to determine how said overhead had changed from z/VM 5.1.0 to z/VM 5.2.0. It was not our intent to measure or report on the maximum I/O rate or throughput achievable with z/VM, or with a specific processor, or with a specific type or model of disk hardware.

The disk configurations mentioned in this chapter (e.g., EDED, LNS0, and so on) are defined in our IOzone workload description appendix. For the reader's convenience, we offer here the following brief tips on interpreting the configuration names:

E for ECKD with the Linux ECKD CCW driver. Suffixes:
  • DED for a dedicated volume.
  • MD0 for a minidisk with MDC OFF.
  • MD1 for a minidisk with MDC ON.
F for emulated FBA with the Linux FBA CCW driver. Suffixes:
  • DED for a dedicated volume.
  • MD0 for a minidisk with MDC OFF.
  • MD1 for a minidisk with MDC ON.
D2 for an ECKD minidisk with the Linux Diagnose x'250' driver. Suffixes:
  • 10 for block size 1024, MDC OFF.
  • 11 for block size 1024, MDC ON.
  • 20 for block size 2048, MDC OFF.
  • 21 for block size 2048, MDC ON.
  • 40 for block size 4096, MDC OFF.
  • 41 for block size 4096, MDC ON.
LNS0 for Linux owning an FCP subchannel and using the zFCP driver. No suffix.

For example, EMD1 is an ECKD minidisk with MDC ON, and D240 is a Diagnose x'250' minidisk with block size 4096 and MDC OFF.

For each configuration, the tables below show the ratio of the z/VM 5.2.0 result to the corresponding z/VM 5.1.0 result; a KB/sec ratio greater than 1 therefore means higher throughput on z/VM 5.2.0, while a CPU/KB ratio greater than 1 means higher CPU cost. Each table covers a particular phase of IOzone.

IOzone Initial Write Results
(scaled to z/VM 5.1.0)
Configuration KB/sec Total CPU/KB CP CPU/KB Virtual CPU/KB
ECKD ccw        
EDED 0.963 1.01 1.09 0.997
EMD0 1.12 1.01 1.05 1.01
EMD1 1.01 1.01 1.05 1.00
Emulated FBA ccw        
FDED 0.999 1.01 1.04 1.000
FMD0 1.00 1.02 1.04 1.01
FMD1 0.998 1.01 1.03 0.994
ECKD Diag X'250' (31-bit)        
D210 1.07 1.01 1.04 1.01
D211 0.999 1.03 1.08 1.01
D220 2.24 0.964 0.820 0.992
D221 1.02 1.03 1.06 1.03
D240 2.31 0.954 0.734 0.985
D241 1.02 1.00 0.994 1.00
Dedicated FCP        
LNS0 1.00 1.01 1.03 1.01
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.1.0 GA RSU. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

IOzone Rewrite Results
(scaled to z/VM 5.1.0)
Configuration KB/sec Total CPU/KB CP CPU/KB Virtual CPU/KB
ECKD ccw        
EDED 0.976 1.03 1.04 1.02
EMD0 1.01 1.02 1.05 1.01
EMD1 1.00 1.01 1.01 1.00
Emulated FBA ccw        
FDED 0.999 1.04 1.06 1.02
FMD0 0.999 1.04 1.04 1.04
FMD1 0.999 1.01 1.03 0.999
ECKD Diag X'250' (31-bit)        
D210 0.995 1.00 1.02 0.991
D211 0.998 1.04 1.06 1.03
D220 0.992 1.04 1.10 1.02
D221 1.00 1.04 1.11 1.03
D240 1.00 1.02 1.08 1.01
D241 0.965 1.03 1.07 1.02
Dedicated FCP        
LNS0 0.969 1.000 0.999 1.000
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.1.0 GA RSU. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

IOzone Initial Read Results
(scaled to z/VM 5.1.0)
Configuration KB/sec Total CPU/KB CP CPU/KB Virtual CPU/KB
ECKD ccw        
EDED 0.995 1.09 1.50 1.02
EMD0 0.995 1.07 1.40 1.01
EMD1 0.995 1.04 1.12 0.986
Emulated FBA ccw        
FDED 0.994 1.05 1.06 1.03
FMD0 0.995 1.03 1.05 1.02
FMD1 0.998 1.01 1.01 1.01
ECKD Diag X'250' (31-bit)        
D210 0.999 1.04 1.09 1.01
D211 0.999 1.07 1.09 1.05
D220 0.999 1.04 1.11 1.02
D221 0.998 1.04 1.05 1.02
D240 0.998 1.00 0.994 1.01
D241 0.996 1.06 1.14 1.01
Dedicated FCP        
LNS0 0.997 1.02 0.973 1.02
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.1.0 GA RSU. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

IOzone Reread Results
(scaled to z/VM 5.1.0)
Configuration KB/sec Total CPU/KB CP CPU/KB Virtual CPU/KB
ECKD ccw        
EDED 0.996 1.02 1.32 0.962
EMD0 0.995 1.06 1.41 0.999
EMD1 0.954 1.27 1.41 1.22
Emulated FBA ccw        
FDED 0.994 1.03 1.06 0.994
FMD0 0.996 1.03 1.04 1.01
FMD1 0.992 0.922 0.925 0.919
ECKD Diag X'250' (31-bit)        
D210 0.999 1.03 1.09 1.000
D211 0.954 1.16 1.18 1.14
D220 1.000 1.04 1.10 1.02
D221 0.975 1.03 1.05 1.01
D240 0.998 1.02 1.08 1.01
D241 0.954 0.988 1.09 0.952
Dedicated FCP        
LNS0 0.998 1.01 1.06 0.999
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.1.0 GA RSU. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

IOzone Overall Results
(scaled to z/VM 5.1.0)
Configuration KB/sec Total CPU/KB CP CPU/KB Virtual CPU/KB
ECKD ccw        
EDED 0.979 1.03 1.21 1.00
EMD0 1.04 1.03 1.20 1.01
EMD1 1.00 1.05 1.13 1.02
Emulated FBA ccw        
FDED 0.997 1.03 1.05 1.01
FMD0 0.999 1.03 1.04 1.02
FMD1 0.998 1.000 1.01 0.989
ECKD Diag X'250' (31-bit)        
D210 1.02 1.02 1.06 1.00
D211 0.998 1.06 1.10 1.04
D220 1.37 1.01 1.01 1.01
D221 1.00 1.04 1.06 1.03
D240 1.42 0.990 0.942 0.998
D241 0.993 1.02 1.09 0.999
Dedicated FCP        
LNS0 0.988 1.01 1.02 1.01
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.1.0 GA RSU. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.
Discussion

Most of the IOzone cases show regression results consistent with the trends reported in our general discussion of the regression traits of z/VM 5.2.0, namely, some rise in CP CPU/tx with little change in overall CPU/tx or transaction rate.

Notable are the Diagnose x'250' initial write cases for block sizes 2048 and 4096, which showed markedly improved data rates and CP CPU times compared to z/VM 5.1. We investigated this for each block size by comparing the z/VM 5.1 initial write pass to its rewrite pass, and by comparing the z/VM 5.2 initial write pass to its rewrite pass. It was our intuition that on a given z/VM release, we should find that the initial write pass had about the same performance as the rewrite pass. On z/VM 5.2, we indeed found what we expected. On z/VM 5.1, however, we found that the initial write pass experienced degraded performance (about 56% drop in throughput and about 43% rise in CP CPU/tx) compared to the rewrite pass. We also found that the rewrite numbers for z/VM 5.1 were about equal to both the initial write and rewrite numbers for z/VM 5.2. Based on these findings, we concluded that z/VM 5.2 must have coincidentally repaired a z/VM 5.1 defect in Diagnose x'250' write processing, and so we studied the anomaly no further.

During z/VM 5.2.0 development, we measured a variety of workloads, some known to be heavily constrained on z/VM 5.1.0 and others not. We knew from our measurements of unconstrained workloads that they would pay a CPU consumption penalty for the constraint relief technologies put into z/VM 5.2.0, even though they would get no direct benefit from those technologies. To compensate, we looked for ways to add offsetting improvements for such workloads. For work resembling IOzone, we made improvements in CCW translation and in support for emulated FBA volumes. These improvements helped us maintain regression performance for z/VM 5.2.0.

XEDIT Read Loop

In this experiment we gave a CMS guest a 4-KB-formatted minidisk on an emulated FBA volume, MDC OFF. We ran an exec that looped on XEDIT reading a 100-block file from the minidisk. We measured XEDIT file loads per second and total CPU time per XEDIT file load.
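
The exec itself is not reproduced in this report. The sketch below illustrates the technique in Rexx; the file name and iteration count are hypothetical, and this is not the exact exec we measured.

  /* Minimal sketch of an XEDIT read loop. The file name and        */
  /* iteration count are hypothetical; this is not the exact exec   */
  /* used for the measurements reported here.                       */
  arg iterations .
  if iterations = '' then iterations = 1000

  do i = 1 to iterations
     /* Stack a QQUIT so XEDIT ends as soon as the file is loaded.  */
     /* Each pass therefore costs one full read of the 100-block    */
     /* (400 KB) file from the minidisk.                            */
     queue 'QQUIT'
     'XEDIT TESTFILE DATA A'
  end

  say iterations 'XEDIT file loads completed.'

Because MDC was OFF, each XEDIT load drives real I/O to the emulated FBA volume rather than being satisfied from the minidisk cache.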

In this measurement we deliberately confined ourselves to an emulated FBA minidisk with MDC OFF. We did this because the regression performance of ECKD volumes and of MDC was already covered by the Linux IOzone experiments. We also knew that excessive CPU time per XEDIT file load had been addressed in the z/VM 5.1.0 service stream since GA, and we wanted to assess the impact of that service.

As in other experiments, we used the zSeries hardware sampler to measure CPU time consumed.

For this experiment, we used the following configuration:

  • 2064-116, three-way dedicated partition, 4 GB central storage, 2 GB expanded storage.
  • No activity in the partition being measured, except our CMS guest and the test case driver. All other partitions either shut down or idling.
  • 2105-F20, 16 GB of cache, FCP attached.
  • z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through mid-October 2005, or z/VM 5.2.0 GA RSU, as called out in the table below.
  • CMS minidisk, formatted at 4 KB, MDC OFF.

In the table below, a transaction is one XEDIT load of the 100-block (400 KB) file.

XEDIT Results
(scaled to 5.1 GA RSU)
Run 5.1 GA RSU 5.1 Oct 2005 5.2 GA RSU
Tx/sec 6.42 71.9 76.5
scaled   11.2 11.9
CP/tx (usec) 24400 1980 2050
scaled   0.0809 0.0837
Virt/tx (usec) 575 552 548
scaled   0.961 0.954
Total/tx (usec) 25000 2530 2590
scaled   0.101 0.104
Notes: 2064-116. Three-way dedicated partition. 4 GB central. 2 GB XSTORE. 2105-F20 16 GB FCP. EFBA minidisk with MDC OFF.
Discussion

In these measurements, compared to z/VM 5.1.0 GA RSU, throughput improved to 11.9 times the base rate and CPU time per transaction decreased by roughly 90%. APARs VM63534 and VM63725, available correctively for z/VM 5.1.0 and included in z/VM 5.2.0, contained performance enhancements for I/O to emulated FBA volumes. These changes are especially helpful for applications whose file I/O buffers are not aligned on 512-byte boundaries, and for applications that tend to issue multi-block I/Os. XEDIT does both of these things, so the changes were effective for it.

Paging

In this experiment we used a CMS Rexx program to induce paging on a z/VM system specifically configured to be storage-constrained. This program used the Rexx storage() function to touch virtual storage pages randomly, with a uniform distribution. By running this program in a storage-constrained environment, we induced page faults.
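
The thrasher is not reproduced in this report. The sketch below illustrates the technique in Rexx; the base address and run length are hypothetical, and this is not the RXTHR2 program cited in the table notes. The 480 MB touch range matches the configuration described below.

  /* Minimal sketch of a storage()-based page toucher. The base     */
  /* address and run length are hypothetical; this is not the       */
  /* RXTHR2 program used for the measurements reported here.        */
  arg minutes .
  if minutes = '' then minutes = 25     /* 20 min warmup + 5 min    */

  base    = x2d('01000000')             /* hypothetical 16 MB base  */
  mbrange = 480                         /* touch range, in MB       */
  touches = 0

  call time 'R'                         /* reset elapsed timer      */
  do while time('E') < minutes * 60
     /* Pick a page uniformly at random: first a megabyte within    */
     /* the range, then one of the 256 4 KB pages within it.        */
     mb   = random(0, mbrange - 1)
     pg   = random(0, 255)
     addr = base + mb * 1024 * 1024 + pg * 4096
     /* Reading one byte forces CP to make the page resident,       */
     /* which causes a page fault if the page is not in storage.    */
     byte = storage(d2x(addr), 1)
     touches = touches + 1
  end

  say touches 'pages touched in' format(time('E'),,0) 'seconds.'

Because the touch range is far larger than the roughly 180 MB of real storage left for the guest, a large fraction of the touches fault and must be resolved by CP paging I/O.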

In this experiment, we measured transaction rate by measuring pages touched per second by the thrasher. Being interested in how Control Program overhead had changed since z/VM 5.1.0, we also measured CP CPU time per page moved. Finally, being interested in the efficacy of CP's storage management logic, we calculated the pages CP moved per page the thrasher touched. Informally, we thought of this metric as commenting on how "smart" CP was being about keeping the "correct" pages in storage for the thrasher. Though this metric isn't directly related to a regression assessment of DASD I/O performance, we are reporting it here anyway as a matter of general interest.

For this experiment, we used the following configuration:

  • 2084-324, two-way dedicated partition, 2 GB central storage, 0 GB expanded storage.
  • No activity in the partition being measured, except our CMS guest and the test case driver. All other partitions either shut down or idling.
  • 2105-F20, 16 GB of cache, FICON or FCP attached.
  • z/VM 5.1.0 GA RSU, or z/VM 5.1.0 serviced through mid-October 2005, or z/VM 5.2.0, as called out in the tables below.
  • Two 2 GB paging volumes.
  • 512 MB CMS guest, configured to use the Rexx storage() function to touch pages randomly within a 480 MB address range of the virtual machine.
  • 944 MB of other virtual machines logged on, all with their addresses spaces completely locked into storage via CP LOCK REAL.
  • CP SET TRACEFRAMES was set artificially high so that we would end up with about 180 MB of real storage frames available for holding pages of the thrashing CMS guest.
  • We ran the thrasher for 20 minutes unmeasured, then collected data for five minutes. Measurements reported here are from the five-minute measured interval.

The net effect of this configuration was that the z/VM Control Program would have about 180 MB of real storage to use to run a CMS guest that was trying to touch about 480 MB worth of its pages. This ratio created a healthy paging rate. Further, the Control Program would have to run this guest while dealing with large numbers of locked user pages and CP trace table frames. This let us exercise real storage management routines that were significantly rewritten for z/VM 5.2.0.

ECKD On All Releases

Paging Results
(scaled to ECKD 5.1 GA RSU)
Run ECKD 5.1 GA RSU ECKD 5.1 Oct 2005 ECKD 5.2 GA RSU + APARs
Touches/sec 1670 1620 1600
scaled   0.973 0.960
CP/move (usec) 9.84 9.54 11.5
scaled   0.970 1.17
Moves/touch 1.38 1.43 1.45
scaled   1.04 1.05
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 0 GB XSTORE. 2105-F20 16 GB FICON/FCP. 944 MB of locked users. 180 MB of DPA. RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB. z/VM 5.2.0 includes VM63845, VM63877, and VM63892.

Emulated FBA On All Releases

Paging Results
(scaled to 2105 5.1 GA RSU)
Run 2105 5.1 GA RSU 2105 5.1 Oct 2005 2105 5.2 GA RSU + APARs
Touches/sec 2150 2120 2700
scaled   0.985 1.25
CP/move (usec) 44.7 43.3 37.9
scaled   0.970 0.850
Moves/touch 1.48 1.53 1.41
scaled   1.04 0.953
Notes: 2084-324. Two-way dedicated partition. 2 GB central. 0 GB XSTORE. 2105-F20 16 GB FICON/FCP. 944 MB of locked users. 180 MB of DPA. RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB. z/VM 5.2.0 includes VM63845, VM63877, and VM63892.
Discussion

For ECKD, we saw a small drop in transaction rate and a rise of 17% in CP CPU time per page moved. This is consistent with our general findings for high paging workloads not constrained by the 2 GB line on z/VM 5.1.0. The rise is spread among linkage, dispatcher, address translation, and paging modules in the Control Program. We did find drops in time spent in available list scan and in management of spin locks.

For emulated FBA, we saw a 25% rise in transaction rate and a 15% drop in CP CPU time per page moved. Significant improvements in the Control Program's SCSI modules -- both the generic SCSI modules and the modules associated with paging to SCSI EDEVs -- accounted for most of the reduction. The drops in spin lock time and available list scan time that we saw in the ECKD case also appeared in the emulated FBA runs, but they accounted for a smaller share of the improvement, because the SCSI modules dominate CP CPU time when z/VM pages to SCSI.

We emphasize that customers must apply VM63845, VM63877, and VM63892 to see correct results in environments having large numbers of locked pages.
