
SCSI Performance Improvements

Abstract

z/VM 5.3 contains several performance improvements for I/O to emulated FBA on SCSI (EFBA, aka EDEV) volumes. First, z/VM now exploits the SCSI write-same function of the IBM 2105 and 2107 DASD subsystems, so as to accelerate the CMS FORMAT function for minidisks on EDEVs. Compared to z/VM 5.2, z/VM 5.3 finishes such a FORMAT in 41% less elapsed time and consumes 97% less CPU time. Second, the Control Program (CP) modules that support SCSI were tuned to reduce path length for common kinds of I/O requests. This tuning resulted in anywhere from a 4% to 15% reduction in CP CPU time per unit of work, depending on the workload. Third, for CP paging to EDEVs, the Control Program paging subsystem was changed to bypass FBA emulation and instead call the SCSI modules directly. In our workload, this enhancement decreased CP CPU time per page moved by about 25%.

Introduction

In z/VM 5.1, IBM shipped support that let the Control Program (CP) use a zSeries Fibre Channel Protocol (FCP) adapter to perform I/O to SCSI LUNs housed in various IBM storage controllers, such as those of the IBM 2105 family. The basic idea behind the z/VM SCSI support was that a fairly low layer in CP would use SCSI LUNs as backing store for emulation of Fixed Block Architecture (FBA) disk volumes. With this FBA emulation in place, higher levels of CP, such as paging and spooling, could use low-cost SCSI DASD instead of more-expensive ECKD DASD. The FBA emulation also let CP place user minidisks on SCSI volumes. Thus, guests not aware of SCSI and FCP protocols could use SCSI storage, CP having fooled those guests into thinking the storage was FBA. IBM's objective in supporting SCSI DASD on z/VM was to help customers reduce the cost of their disk storage subsystems.

Since z/VM 5.1, IBM has made improvements in the performance of z/VM's use of SCSI LUNs. Late in z/VM 5.1, IBM shipped APARs VM63534 and VM63725, which contained performance improvements for I/O to emulated FBA (EFBA) volumes. IBM included those APARs in z/VM 5.2 and documented their effect in its study of z/VM 5.2 disk performance.

In z/VM 5.3, IBM continued its effort to improve performance of emulated FBA volumes, doing work in these areas:

  • The CMS FORMAT command and the CP SCSI layer (sometimes called the SCSI container) now exploit the write-same function of the IBM 2105 and 2107 disk subsystems. Write-same lets CP pass the storage subsystem a single 512-byte buffer and tell the storage subsystem to write that buffer repeatedly onto a sequence of FB-512 blocks on the LUN.

  • The SCSI container was tuned to remove instructions from frequently-used code paths.

  • The CP paging and spooling subsystems no longer build FBA channel programs to do I/O to emulated FBA DASD. In other words, CP paging and spooling no longer depend on CP's FBA emulation to translate FBA I/O requests to SCSI ones. Rather, paging and spooling now call the SCSI container directly, bypassing the building of FBA channel programs and bypassing FBA emulation's conversion and handling of those channel programs. There are two consequences to this change. First, CPU time per page moved is reduced, because the overhead of building an FBA channel program and the overhead of emulating it on SCSI were both eliminated. Second, because the SCSI container can overlap I/Os to a LUN, paging and spooling can now have more than one I/O in progress at a time when using EDEVs.
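To make the write-same idea from the first item concrete, here is a minimal Python sketch. It is a toy model only: the LUN is a bytearray, and the names write_same and write_each are hypothetical, not CP or SCSI interfaces. It shows that replicating one 512-byte buffer gives the same on-disk result as writing every block individually, while the host supplies far less data.

```python
BLOCK_SIZE = 512  # FB-512 block size in bytes


def write_same(lun: bytearray, start_block: int, count: int, pattern: bytes) -> int:
    """Toy write-same: the 'subsystem' replicates one buffer onto `count` blocks.

    Returns the number of payload bytes the host had to supply.
    """
    if len(pattern) != BLOCK_SIZE:
        raise ValueError("pattern must be exactly one block")
    start = start_block * BLOCK_SIZE
    end = start + count * BLOCK_SIZE
    if start_block < 0 or count < 0 or end > len(lun):
        raise ValueError("block range is outside the LUN")
    lun[start:end] = pattern * count  # replication happens on the device side
    return len(pattern)


def write_each(lun: bytearray, start_block: int, count: int, pattern: bytes) -> int:
    """Toy conventional path: the host sends a buffer for every block."""
    if len(pattern) != BLOCK_SIZE:
        raise ValueError("pattern must be exactly one block")
    sent = 0
    for i in range(count):
        offset = (start_block + i) * BLOCK_SIZE
        lun[offset:offset + BLOCK_SIZE] = pattern
        sent += len(pattern)
    return sent


# Hypothetical 1000-block LUN, formatted both ways with a zero-filled block.
pattern = bytes(BLOCK_SIZE)
lun_a = bytearray(b"\xff" * (1000 * BLOCK_SIZE))
lun_b = bytearray(b"\xff" * (1000 * BLOCK_SIZE))
sent_same = write_same(lun_a, 0, 1000, pattern)
sent_each = write_each(lun_b, 0, 1000, pattern)
print(sent_same, sent_each, lun_a == lun_b)
```

How CP actually splits a large format into write-same requests is not described in this chapter, so the sketch issues a single request for the whole range.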

This report chapter describes the four different experiments IBM performed to measure the effects of these improvements.


SCSI Write-Same: CMS FORMAT

Method

Overview: We set up a CMS user ID with a minidisk on an EDEV. We formatted the minidisk with write-same disabled and then again with write-same enabled. For each case, we measured elapsed time and processor time consumed.

Environment: See table notes.

Data collected: We collected CP QUERY TIME data and CP monitor data.

Results and Discussion

EFBA Minidisk Fast-Format Results
Metric                    z/VM 5.2  z/VM 5.3     Delta   % Delta
Elapsed time (sec)             468       276      -192      -41%
CP CPU time (msec)             610        20      -590      -97%
Virtual CPU time (msec)          0         0         0        0%
Total CPU time (msec)          610        20      -590      -97%
Notes: 2084-C24, model-capacity indicator 322. Two dedicated engines, 2 GB central, 2 GB XSTORE. 2105-F20, 16 GB, one 1 Gb FCP chpid. z/VM 5.2 serviced through May 2007. z/VM 5.3 GA RSU. CMS FORMAT ( BLKSIZE 4K of a 10 GB minidisk on a 100 GB EDEV.

SCSI write-same removed 41% of the elapsed time and 97% of the CP CPU time from the formatting of this minidisk.
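The Delta and % Delta columns used throughout this chapter follow the usual convention: delta is the z/VM 5.3 value minus the z/VM 5.2 value, and percent delta is that difference relative to the z/VM 5.2 value. A small Python sketch of the arithmetic, using the FORMAT measurements, follows. Reporting 0% when the base value is zero is an assumption made here to match the Virtual CPU time row.

```python
from typing import Tuple


def delta(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute delta, percent delta relative to `before`)."""
    diff = after - before
    # Assumption: a zero base is reported as 0%, as in the Virtual CPU row.
    pct = 100.0 * diff / before if before != 0 else 0.0
    return diff, pct


# (z/VM 5.2, z/VM 5.3) pairs taken from the FORMAT results table.
format_results = {
    "Elapsed time (sec)": (468, 276),
    "CP CPU time (msec)": (610, 20),
    "Virtual CPU time (msec)": (0, 0),
}

for metric, (v52, v53) in format_results.items():
    diff, pct = delta(v52, v53)
    print(f"{metric}: delta {diff:+.0f}, {pct:+.0f}%")
```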


SCSI Container Tuning: XEDIT Read

Method

Overview: We gave a CMS guest a 4-KB-formatted minidisk on an emulated FBA volume, MDC OFF. We ran an exec that looped on XEDIT reading a 100-block file from the minidisk. We measured XEDIT file loads per second and CP CPU time per XEDIT file load.

Environment: See table notes.

Data collected: We counted XEDIT file loads per second and used this as the transaction rate. We also collected zSeries hardware sampler data. We used the sampler data to calculate CP CPU time used per transaction.

Results and Discussion

EFBA Minidisk XEDIT Read Results
Metric                    z/VM 5.2  z/VM 5.3     Delta   % Delta
Read rate (/sec)             100.9     102.2       1.3      1.3%
CP CPU/read (usec)         1121.23   1005.17   -116.06    -10.4%
Virtual CPU/read (usec)     360.79    366.12      5.33      1.5%
Total CPU/read (usec)      1482.02   1371.29   -110.73    -7.47%
Notes: 2084-B16, model-capacity indicator 320. Partition with three dedicated engines, 4 GB central, 2 GB XSTORE. 2105-F20, 16 GB, one 1 Gb FCP chpid. z/VM 5.2 with all service applied (May 2007). z/VM 5.3 GA RSU.

The SCSI container tuning resulted in about a 10% reduction in CP CPU time per unit of data moved. Transaction rate increased slightly.
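Total CPU time per read drops less than CP CPU time per read because total time is the sum of the CP and virtual components, and only the CP component was tuned. The following Python sketch works through that arithmetic with the table values. The variable names are illustrative only.

```python
# CPU time per XEDIT read in microseconds, from the results table.
cp = {"5.2": 1121.23, "5.3": 1005.17}
virt = {"5.2": 360.79, "5.3": 366.12}

# Total CPU per read is the sum of the CP and virtual components.
total_52 = cp["5.2"] + virt["5.2"]
total_53 = cp["5.3"] + virt["5.3"]

# Share of each component in the z/VM 5.2 total.
cp_share = cp["5.2"] / total_52
virt_share = virt["5.2"] / total_52

# Fractional change of each component from z/VM 5.2 to z/VM 5.3.
cp_change = (cp["5.3"] - cp["5.2"]) / cp["5.2"]
virt_change = (virt["5.3"] - virt["5.2"]) / virt["5.2"]

# The total change is the share-weighted sum of the component changes.
total_change = cp_share * cp_change + virt_share * virt_change

print(f"CP share of total on z/VM 5.2: {100 * cp_share:.1f}%")
print(f"CP change {100 * cp_change:.1f}%, virtual change {100 * virt_change:.1f}%")
print(f"Total change {100 * total_change:.2f}%")
```

Because CP accounts for roughly three quarters of the z/VM 5.2 total, a reduction of about 10% in CP time translates to a reduction of about 7.5% in total time.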


SCSI Container Tuning: Linux IOzone

Method

Overview: We ran a subset of our IOzone workloads as described in our IOzone appendix. Because Linux disk performance is a topic of continuing interest, we chose to run not only the emulated FBA cases, but also some ECKD and Linux-native cases.

Environment: See table notes.

Data collected: To assess data rates, we collected IOzone console output. To assess CPU time per unit of work, we used the zSeries hardware sampler.

Results and Discussion

IOzone Overall Results
(scaled to z/VM 5.2)
Configuration         KB/sec  Total CPU/KB  CP CPU/KB  Virtual CPU/KB
ECKD SSCH
  EDED                  1.00          0.99       0.94            1.00
  EMD0                  0.99          0.98       0.94            0.99
  EMD1                  1.00          0.98       0.96            0.99
EFBA SSCH
  FDED                  1.01          0.92       0.85            0.99
  FMD0                  1.01          0.92       0.86            0.99
  FMD1                  1.00          0.93       0.87            1.00
ECKD Diag X'250'
  D240                  1.00          0.99       0.94            0.99
  D241                  1.01          0.99       0.98            0.99
EFBA Diag X'250'
  G240                  1.01          0.96       0.87            0.99
  G241                  1.00          0.96       0.91            1.00
Linux native SCSI
  LNS0                  1.01          0.98       0.90            0.99
Notes: 2084-B16, model-capacity indicator 320. Partition with three dedicated engines, 4 GB central, 2 GB XSTORE. 2105-F20, 16 GB, one FICON chpid, one 1 Gb FCP chpid. z/VM 5.2 with all service applied (May 2007). z/VM 5.3 GA RSU. Linux SLES 9 SP 3, 192 MB, 64-bit, virtual uniprocessor. See our IOzone appendix for workload descriptions.

We see that in all cases, z/VM 5.3 equalled z/VM 5.2 in data rate and in virtual time per unit of work. For CP CPU time per unit of work, improvements range from 4% to 15%. Improvements in the FBA cases (Fxxx, Gxxx) exceed improvements in the other cases (Exxx, Dxxx, Lxxx) because of z/VM 5.3's tuning in the SCSI container.
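Because the table is scaled so that z/VM 5.2 equals 1.00, an entry of 0.85 means z/VM 5.3 used 15% less CP CPU time per KB. As a rough illustration of the comparison between emulated FBA and the other configurations, the Python sketch below averages the CP CPU/KB column by group. Grouping by the first letter of the configuration name and taking a simple unweighted mean are choices made here for illustration, not part of the original analysis.

```python
# CP CPU/KB on z/VM 5.3, scaled so that z/VM 5.2 = 1.00 (from the table).
cp_cpu_per_kb = {
    "EDED": 0.94, "EMD0": 0.94, "EMD1": 0.96,  # ECKD SSCH
    "FDED": 0.85, "FMD0": 0.86, "FMD1": 0.87,  # EFBA SSCH
    "D240": 0.94, "D241": 0.98,                # ECKD Diag X'250'
    "G240": 0.87, "G241": 0.91,                # EFBA Diag X'250'
    "LNS0": 0.90,                              # Linux native SCSI
}


def mean_improvement_pct(configs):
    """Mean percent reduction in CP CPU/KB across the named configurations."""
    ratios = [cp_cpu_per_kb[name] for name in configs]
    return 100.0 * (1.0 - sum(ratios) / len(ratios))


# Emulated FBA cases are the Fxxx and Gxxx configurations.
efba = [name for name in cp_cpu_per_kb if name[0] in "FG"]
other = [name for name in cp_cpu_per_kb if name[0] not in "FG"]

efba_mean = mean_improvement_pct(efba)
other_mean = mean_improvement_pct(other)
print(f"EFBA mean improvement:  {efba_mean:.1f}%")
print(f"Other mean improvement: {other_mean:.1f}%")
```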


Paging and Spooling: FBA Emulation Bypass

Method

Overview: We used a CMS Rexx program to induce paging on a z/VM system specifically configured to be storage-constrained. This program used the Rexx storage() function to touch virtual storage pages randomly, with a uniform distribution. By running this program in a storage-constrained environment, we induced page faults.

Configuration: We used the following configuration:

  • 2084-C24, model-capacity indicator 322.
  • Partition with two dedicated engines, 2 GB central storage, 0 GB expanded storage.
  • No activity in the partition being measured, except our CMS guest and the test case driver. All other partitions either shut down or idling.
  • 2105-F20, 16 GB of cache, FICON or FCP attached.
  • z/VM 5.2 serviced through May 2007, or z/VM 5.3, as called out in the tables below.
  • Two 2 GB paging volumes, either two ECKD or two EFBA (EDEV).
  • 512 MB CMS guest, configured to use the Rexx storage() function to touch pages randomly within a 480 MB address range of the virtual machine.
  • 944 MB of other virtual machines logged on, all with their address spaces completely locked into storage via CP LOCK REAL.
  • CP SET TRACEFRAMES was set artificially high so that we would end up with about 180 MB of real storage frames available for holding pages of the thrashing CMS guest.
  • We ran the thrasher for 20 minutes unmeasured, then collected data for five minutes. Measurements reported here are from the five-minute measured interval.

The net effect of this configuration was that the z/VM Control Program would have about 180 MB of real storage to use to run a CMS guest that was trying to touch about 480 MB worth of its pages. This ratio created a healthy paging rate. Further, the Control Program would have to run this guest while dealing with large numbers of locked user pages and CP trace table frames. This let us exercise real storage management routines that were significantly rewritten for z/VM 5.3.
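To see why this ratio produces a steady paging rate, consider a toy model: a program touching 480 units of storage uniformly at random while only 180 units can be resident. Under those conditions roughly 1 - 180/480, or about 62%, of touches should miss. The Python sketch below simulates this with a simple LRU replacement policy. LRU is a stand-in chosen for illustration; it is not a description of CP's storage management algorithms, and the function name and scaled-down page counts are hypothetical.

```python
import random
from collections import OrderedDict


def simulate_fault_ratio(total_pages, resident_frames, touches, seed=0):
    """Fraction of uniformly random touches that miss an LRU-managed resident set."""
    rng = random.Random(seed)
    resident = OrderedDict()  # keys are resident pages, ordered oldest first
    faults = 0
    for _ in range(touches):
        page = rng.randrange(total_pages)
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= resident_frames:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    return faults / touches


# One unit per MB: 480 MB touched, about 180 MB available to hold the pages.
ratio = simulate_fault_ratio(total_pages=480, resident_frames=180, touches=200_000)
print(f"simulated fault ratio: {ratio:.3f}")
print(f"analytic estimate:     {1 - 180 / 480:.3f}")
```

With uniformly random touches, no replacement policy can do much better than this estimate, which is why the workload keeps the paging subsystem busy.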

One other note about configuration: we are aware that comparing ECKD paging to SCSI paging is a topic of continuing interest, so we ran this pair of experiments with ECKD DASD as well as with SCSI DASD. This lets us illustrate the differences in CP CPU time per page moved for the two different DASD types.

Data collected: We measured transaction rate by measuring pages touched per second by the thrasher. Being interested in how CP overhead had changed since z/VM 5.2, we also measured CP CPU time per page moved. Finally, being interested in the efficacy of CP's storage management logic, we calculated the pages CP moved per page the thrasher touched. Informally, we thought of this metric as commenting on how "smart" CP was being about keeping the "correct" pages in storage for the thrasher. Though this metric isn't directly related to an assessment of SCSI I/O performance, we are reporting it here anyway as a matter of general interest.

Results and Discussion

SCSI Paging, z/VM 5.2 to z/VM 5.3
Metric                    z/VM 5.2  z/VM 5.3     Delta   % Delta
Page moves (/sec)             3528      3523        -5        0%
CP/move (usec)                37.7      28.5      -9.2      -24%
Page touches (/sec)           2627      2616       -11     -0.4%
Virt/touch (usec)             51.8      51.6      -0.2        0%
Moves/touch                   1.35      1.35         0        0%
Notes: 2084-C24, model-capacity indicator 322. Two dedicated engines, 2 GB central, 0 GB XSTORE. 2105-F20, 16 GB, one 1 Gb FCP chpid. z/VM 5.2 with all service applied (May 2007). z/VM 5.3 GA RSU. 944 MB of locked users. 180 MB of DPA. RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.

For paging to SCSI, we see that page move rate and page touch rate (our transaction rate) are essentially unchanged, but CP time per page moved is down about 25%. This is due to the z/VM 5.3 FBA bypass for paging and spooling.

ECKD Paging, z/VM 5.2 to z/VM 5.3
Metric                    z/VM 5.2  z/VM 5.3     Delta   % Delta
Page moves (/sec)             2277      2119      -158     -6.9%
CP/move (usec)                10.2      11.8       1.6     15.7%
Page touches (/sec)           1597      1785       188     11.8%
Virt/touch (usec)             51.1      51.1         0        0%
Moves/touch                   1.43      1.18     -0.25    -17.5%
Notes: 2084-C24, model-capacity indicator 322. Two dedicated engines, 2 GB central, 0 GB XSTORE. 2105-F20, 16 GB, one 1 Gb FCP chpid. z/VM 5.2 with all service applied (May 2007). z/VM 5.3 GA RSU. 944 MB of locked users. 180 MB of DPA. RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.

For paging to ECKD, we see that CP time per page moved is elevated slightly in z/VM 5.3. Analysis of zSeries hardware sampler data showed that the increases are due to changes in the CP dispatcher so as to support specialty engines. (For paging to SCSI, the dispatcher growth from specialty engines support is also present, but said growth was more than paid off by the FBA emulation bypass.) We also see that page touches per second are increased by 12%, with moves per touch down by almost 18%. For this particular workload, z/VM 5.3 was more effective than z/VM 5.2 at keeping the correct user pages in storage, thus letting the application experience a higher transaction rate (aka page touch rate).
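The moves-per-touch metric is simply the page move rate divided by the page touch rate. The Python sketch below recomputes it from the rounded rates in the two paging tables. The recomputed values differ from the published ones by a few thousandths, presumably because the published figures were derived from unrounded measurements; that explanation is an assumption.

```python
def moves_per_touch(page_moves_per_sec, page_touches_per_sec):
    """Pages CP moved per page the thrasher touched."""
    return page_moves_per_sec / page_touches_per_sec


# (page moves/sec, page touches/sec) from the SCSI and ECKD paging tables.
runs = {
    "SCSI, z/VM 5.2": (3528, 2627),
    "SCSI, z/VM 5.3": (3523, 2616),
    "ECKD, z/VM 5.2": (2277, 1597),
    "ECKD, z/VM 5.3": (2119, 1785),
}

ratios = {name: moves_per_touch(moves, touches)
          for name, (moves, touches) in runs.items()}

for name, value in ratios.items():
    print(f"{name}: {value:.3f} moves/touch")
```

A lower value means CP moved fewer pages for each page the workload touched, which is the sense in which z/VM 5.3 kept more of the correct pages in storage for the ECKD runs.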

Finally, the CPU cost of SCSI paging compared to ECKD paging is a topic of continuing interest. On z/VM 5.2, we see that the ratio of CP/move is (37.7/10.2), or 3.7x. On z/VM 5.3, we see that the ratio is (28.5/11.8), or 2.4x. The FBA emulation bypass helped bring the CPU cost of SCSI paging toward the cost of ECKD paging.
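The comparison above can be reproduced with a few lines of Python. In addition to the ratios quoted in the text, the sketch prints the absolute difference in CP time per page moved, which is derived here from the same table values and is not a separately measured result.

```python
# CP CPU time per page moved, in microseconds, from the two paging tables.
cp_usec_per_move = {
    "z/VM 5.2": {"SCSI": 37.7, "ECKD": 10.2},
    "z/VM 5.3": {"SCSI": 28.5, "ECKD": 11.8},
}


def scsi_vs_eckd(release):
    """Return (SCSI/ECKD cost ratio, SCSI minus ECKD cost in usec) for a release."""
    cost = cp_usec_per_move[release]
    return cost["SCSI"] / cost["ECKD"], cost["SCSI"] - cost["ECKD"]


for release in cp_usec_per_move:
    ratio, gap = scsi_vs_eckd(release)
    print(f"{release}: SCSI costs {ratio:.1f}x ECKD, {gap:.1f} usec more per move")
```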
