SCSI Performance Improvements
Abstract
z/VM 5.3 contains several
performance improvements for I/O to emulated FBA on SCSI
(EFBA, aka EDEV) volumes.
First, z/VM now exploits the
SCSI write-same function of the IBM 2105 and 2107 DASD subsystems,
so as to accelerate the CMS FORMAT function for minidisks on EDEVs.
Compared to z/VM 5.2, z/VM 5.3 finishes such a FORMAT in 41% less
elapsed time and consumes 97% less CPU time.
Second, the Control Program
(CP) modules that support SCSI were tuned to reduce path length for
common kinds of I/O requests. This tuning resulted in anywhere from
a 4% to 15% reduction in CP CPU time per unit of work, depending on
the workload.
Third, for CP paging to EDEVs, the Control
Program paging subsystem was changed to bypass FBA emulation and
instead call the SCSI modules directly.
In our workload, this enhancement
decreased CP CPU time per page moved by about 25%.
Introduction
In z/VM 5.1, IBM shipped support that let the Control Program (CP)
use a zSeries Fibre Channel Protocol (FCP) adapter to perform I/O
to SCSI LUNs housed in various IBM storage controllers, such as
those of the IBM 2105 family. The basic idea behind the z/VM SCSI
support was that a fairly low layer in CP would use SCSI LUNs as
backing store for emulation of Fixed Block Architecture (FBA)
disk volumes. With this FBA emulation in place, higher levels of
CP, such as paging and spooling, could use low-cost SCSI DASD
instead of more-expensive ECKD DASD. The FBA emulation also let
CP place user minidisks on SCSI volumes. Thus, guests not aware
of SCSI and FCP protocols could use SCSI storage, because CP
presented that storage to those guests as FBA. IBM's
objective in supporting SCSI DASD on z/VM was
to help customers reduce the cost of their
disk storage subsystems.
Since z/VM 5.1, IBM has made improvements in the performance of
z/VM's use of SCSI LUNs. Late in z/VM 5.1, IBM shipped APARs
VM63534 and VM63725, which contained performance improvements
for I/O to emulated FBA (EFBA) volumes. IBM included those APARs
in z/VM 5.2 and documented their effect in its
study of z/VM 5.2 disk performance.
In z/VM 5.3, IBM continued its effort to improve performance of
emulated FBA volumes, doing work in these areas:
- The CMS FORMAT command and the
CP SCSI layer (sometimes called the SCSI container)
now exploit the write-same function of the
IBM 2105 and 2107 disk subsystems. Write-same lets CP pass
the storage subsystem a single 512-byte buffer and tell
the storage subsystem to write that buffer repeatedly onto a
sequence of FB-512 blocks on the LUN.
- The SCSI container
was tuned to remove instructions from frequently used code
paths.
- The CP paging and spooling subsystems no longer build FBA
channel programs
to do I/O to emulated FBA DASD.
In other words, CP
paging and spooling no longer
depend on CP's FBA emulation to translate FBA
I/O requests to SCSI ones. Rather, paging and spooling now
call the SCSI container directly, bypassing the building of
FBA channel programs and bypassing FBA emulation's
conversion and handling of those channel programs.
There are two consequences to this change. First, CPU time
per page moved is reduced, because the overhead of building
an FBA channel program and the overhead of then emulating it
on SCSI were both eliminated. Second, because the SCSI container
can overlap I/Os to a LUN, paging and spooling can now have
more than one I/O in progress at a time when using EDEVs.
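To make the write-same idea concrete, here is a minimal sketch in Python of how much data the host would have to hand to the storage subsystem with and without write-same. It is an illustration only, not CP's implementation. The function name, the per-command block limit, and the reading of "10 GB" as 10 x 2**30 bytes are all assumptions made for the example.

```python
FB_BLOCK_BYTES = 512  # size of one FB-512 block, as described in the text


def host_payload_bytes(total_blocks, write_same, max_blocks_per_command):
    """Bytes of block data the host passes to the storage subsystem.

    max_blocks_per_command is a hypothetical limit on how many blocks
    one command may cover; the report does not state the real value.
    """
    if not write_same:
        # Every block's contents are sent individually.
        return total_blocks * FB_BLOCK_BYTES
    # With write-same, each command carries one 512-byte buffer that
    # the subsystem replicates over the blocks the command covers.
    commands = -(-total_blocks // max_blocks_per_command)  # ceiling division
    return commands * FB_BLOCK_BYTES


# Assumption: a 10 GB minidisk taken as 10 * 2**30 bytes.
blocks = 10 * 2**30 // FB_BLOCK_BYTES
plain = host_payload_bytes(blocks, write_same=False, max_blocks_per_command=65535)
same = host_payload_bytes(blocks, write_same=True, max_blocks_per_command=65535)
print(f"{blocks} blocks: {plain} bytes without write-same, {same} bytes with")
```

Whatever the real per-command limit is, the point of the sketch is that the host-side payload shrinks from one buffer per block to one buffer per command.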
This report chapter describes the four different experiments
IBM performed to measure the effects of these improvements.
SCSI Write-Same: CMS FORMAT
Method
Overview:
We set up a CMS user ID with a minidisk on an EDEV. We formatted
the minidisk with write-same disabled and then again with write-same
enabled.
For each case, we measured elapsed time and processor time consumed.
Environment:
See table notes.
Data collected:
We collected CP QUERY TIME data and CP monitor data.
Results and Discussion
EFBA Minidisk Fast-Format Results
| Metric | z/VM 5.2 | z/VM 5.3 | Delta | % Delta |
| --- | --- | --- | --- | --- |
| Elapsed time (sec) | 468 | 276 | -192 | -41% |
| CP CPU time (msec) | 610 | 20 | -590 | -97% |
| Virtual CPU time (msec) | 0 | 0 | 0 | 0% |
| Total CPU time (msec) | 610 | 20 | -590 | -97% |
Notes:
2084-C24, model-capacity indicator 322.
Two dedicated engines, 2 GB central, 2 GB XSTORE.
2105-F20, 16 GB, one 1 Gb FCP chpid.
z/VM 5.2 serviced through May 2007.
z/VM 5.3 GA RSU.
CMS FORMAT ( BLKSIZE 4K of a 10 GB minidisk on a 100 GB EDEV.
SCSI write-same removed 41% of the elapsed time and
97% of the CP CPU time from the formatting of this
minidisk.
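The Delta and % Delta columns in the tables of this chapter follow from the two measured values. As a minimal sketch in Python (the function names are ours, not part of any IBM tooling), the arithmetic is:

```python
def delta(base, new):
    """Absolute change from the z/VM 5.2 value to the z/VM 5.3 value."""
    return new - base


def pct_delta(base, new):
    """Change relative to the z/VM 5.2 value, in percent."""
    return (new - base) / base * 100.0


# Values taken from the fast-format table above.
elapsed_pct = pct_delta(468, 276)   # elapsed time, seconds
cp_cpu_pct = pct_delta(610, 20)     # CP CPU time, milliseconds
print(f"elapsed: {elapsed_pct:.1f}%  CP CPU: {cp_cpu_pct:.1f}%")
```

Rounded to whole percents, these reproduce the -41% and -97% shown in the table.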
SCSI Container Tuning: XEDIT Read
Method
Overview:
We gave a CMS guest a 4-KB-formatted
minidisk on an emulated FBA volume, MDC OFF. We ran an
exec that looped on XEDIT reading a 100-block file from
the minidisk. We measured
XEDIT file loads per second
and CP CPU time per XEDIT file load.
Environment:
See table notes.
Data collected:
We counted XEDIT file loads per second and used this as
the transaction rate. We also collected zSeries hardware sampler
data. We used the sampler data to calculate CP CPU time
used per transaction.
Results and Discussion
EFBA Minidisk XEDIT Read Results
| Metric | z/VM 5.2 | z/VM 5.3 | Delta | % Delta |
| --- | --- | --- | --- | --- |
| Read rate (/sec) | 100.9 | 102.2 | 1.3 | 1.3% |
| CP CPU/read (usec) | 1121.23 | 1005.17 | -116.06 | -10.4% |
| Virtual CPU/read (usec) | 360.79 | 366.12 | 5.33 | 1.5% |
| Total CPU/read (usec) | 1482.02 | 1371.29 | -110.73 | -7.47% |
Notes:
2084-B16, model-capacity indicator 320.
Partition with three dedicated engines, 4 GB central, 2 GB XSTORE.
2105-F20, 16 GB, one 1 Gb FCP chpid.
z/VM 5.2 with all service applied (May 2007).
z/VM 5.3 GA RSU.
The SCSI container tuning resulted in about a 10% reduction in
CP CPU time per unit of data moved. Transaction rate increased
slightly.
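One way to read these numbers is to multiply the read rate by the total CPU time per read, which gives the CPU time consumed per second of elapsed time. The Python sketch below does that arithmetic. It assumes the table's total CPU figure covers all CPU charged to the workload, and the function name is ours.

```python
def engine_busy_fraction(reads_per_sec, total_cpu_usec_per_read):
    """CPU-seconds consumed per elapsed second, in units of one engine."""
    return reads_per_sec * total_cpu_usec_per_read / 1_000_000


# Values taken from the XEDIT read table above.
busy_52 = engine_busy_fraction(100.9, 1482.02)
busy_53 = engine_busy_fraction(102.2, 1371.29)
print(f"z/VM 5.2: {busy_52:.3f} engines, z/VM 5.3: {busy_53:.3f} engines")
```

Under that assumption both runs used well under one of the three engines, which would be consistent with a workload paced by I/O rather than by CPU, and so with a CPU saving that shows up mostly in CPU time per read rather than in read rate.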
SCSI Container Tuning: Linux IOzone
Method
Overview:
We ran a subset of our IOzone workloads as described in
our IOzone appendix.
Because Linux disk performance is a topic of continuing
interest, we chose to run not only the emulated FBA
cases, but also some ECKD and Linux-native cases.
Environment:
See table notes.
Data collected:
To assess data rates, we collected IOzone console output.
To assess CPU time per unit of work, we used the zSeries hardware
sampler.
Results and Discussion
IOzone Overall Results (scaled to z/VM 5.2)
| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
| --- | --- | --- | --- | --- |
| ECKD SSCH | | | | |
| EDED | 1.00 | 0.99 | 0.94 | 1.00 |
| EMD0 | 0.99 | 0.98 | 0.94 | 0.99 |
| EMD1 | 1.00 | 0.98 | 0.96 | 0.99 |
| EFBA SSCH | | | | |
| FDED | 1.01 | 0.92 | 0.85 | 0.99 |
| FMD0 | 1.01 | 0.92 | 0.86 | 0.99 |
| FMD1 | 1.00 | 0.93 | 0.87 | 1.00 |
| ECKD Diag X'250' | | | | |
| D240 | 1.00 | 0.99 | 0.94 | 0.99 |
| D241 | 1.01 | 0.99 | 0.98 | 0.99 |
| EFBA Diag X'250' | | | | |
| G240 | 1.01 | 0.96 | 0.87 | 0.99 |
| G241 | 1.00 | 0.96 | 0.91 | 1.00 |
| Linux native SCSI | | | | |
| LNS0 | 1.01 | 0.98 | 0.90 | 0.99 |
Notes:
2084-B16, model-capacity indicator 320.
Partition with three dedicated engines, 4 GB central, 2 GB XSTORE.
2105-F20, 16 GB, one FICON chpid, one 1 Gb FCP chpid.
z/VM 5.2 with all service applied (May 2007).
z/VM 5.3 GA RSU.
Linux SLES 9 SP 3, 192 MB, 64-bit, virtual uniprocessor.
See our IOzone appendix for workload descriptions.
We see that in all cases, z/VM 5.3 equalled z/VM 5.2 in
data rate and in virtual time per unit of work. For
CP CPU time
per unit of work, improvements range from 4% to
15%. Improvements in the FBA cases (Fxxx, Gxxx)
exceed improvements in the other cases (Exxx, Dxxx, Lxxx)
because of z/VM 5.3's tuning in the SCSI container.
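Because the table is scaled to z/VM 5.2, a value of 0.85 means z/VM 5.3 used 85% of the z/VM 5.2 amount, that is, a 15% improvement. The Python sketch below applies that reading to the CP CPU/KB column and averages the emulated FBA cases separately from the others. The grouping by first letter of the configuration name and the use of a simple mean are our choices for illustration, not part of the original analysis.

```python
from statistics import mean

# CP CPU/KB, scaled to z/VM 5.2, copied from the IOzone table.
cp_cpu_per_kb = {
    "EDED": 0.94, "EMD0": 0.94, "EMD1": 0.96,
    "FDED": 0.85, "FMD0": 0.86, "FMD1": 0.87,
    "D240": 0.94, "D241": 0.98,
    "G240": 0.87, "G241": 0.91,
    "LNS0": 0.90,
}


def mean_improvement_pct(first_letters):
    """Mean percent reduction for configurations whose name starts
    with one of the given letters."""
    ratios = [r for name, r in cp_cpu_per_kb.items()
              if name[0] in first_letters]
    return (1.0 - mean(ratios)) * 100.0


efba = mean_improvement_pct("FG")     # emulated FBA cases (Fxxx, Gxxx)
other = mean_improvement_pct("EDL")   # ECKD and Linux-native cases
print(f"EFBA mean improvement: {efba:.1f}%, other cases: {other:.1f}%")
```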
Paging and Spooling: FBA Emulation Bypass
Method
Overview:
We used a CMS Rexx program to induce
paging on a z/VM system specifically configured to be
storage-constrained. This program used the Rexx
storage() function to touch virtual storage pages
randomly, with a uniform distribution.
By running this program in a storage-constrained
environment, we induced page faults.
Configuration:
We used the following configuration:
- 2084-C24, model-capacity indicator 322.
- Partition with two dedicated engines,
2 GB central storage, 0 GB expanded storage.
- No activity in the partition being measured, except our
CMS guest and the test case driver.
All other partitions either shut down or idling.
- 2105-F20, 16 GB of cache, FICON or FCP attached.
- z/VM 5.2 serviced through May 2007, or z/VM 5.3,
as called out in the tables below.
- Two 2 GB paging volumes, either two ECKD or two
EFBA (EDEV).
- 512 MB CMS guest, configured to use the Rexx
storage() function to touch pages randomly
within a 480 MB address range of the virtual machine.
- 944 MB of other virtual machines logged on, all
with their address spaces completely locked into
storage via CP LOCK REAL.
- CP SET TRACEFRAMES was set artificially
high so that we would end up with about 180 MB of
real storage frames available for holding pages of
the thrashing CMS guest.
- We ran the thrasher for 20 minutes unmeasured, then
collected data for five minutes. Measurements reported
here are from the five-minute measured interval.
The net effect of this configuration was that the z/VM Control
Program would have about 180 MB of real storage to use to
run a CMS guest that was trying to touch about 480 MB worth
of its pages.
This ratio created a healthy paging rate.
Further, the Control Program would have to run this
guest while dealing with large numbers of locked user pages
and CP trace table frames.
This let us
exercise real storage management routines that were significantly
rewritten for z/VM 5.3.
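The access pattern described above, uniform random touches over a range larger than the storage available to hold it, can be sketched with a small simulation. The Python below is not the Rexx thrasher used in the measurements, and it does not model CP's frame selection. The least-recently-used replacement, the scaled-down page counts, and all names are assumptions made for illustration.

```python
import random
from collections import OrderedDict


def simulate_thrasher(total_pages, resident_frames, touches, warmup, seed=0):
    """Fraction of measured touches that miss resident storage.

    Pages are touched uniformly at random. Replacement is LRU, which
    is an assumption here and not a description of CP.
    """
    rng = random.Random(seed)
    resident = OrderedDict()          # page number -> True, in LRU order
    faults = 0
    for i in range(warmup + touches):
        page = rng.randrange(total_pages)
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
            continue
        if len(resident) >= resident_frames:
            resident.popitem(last=False)      # evict least recently used
        resident[page] = True
        if i >= warmup:                       # count faults after warm-up only
            faults += 1
    return faults / touches


# Scaled-down stand-in for 480 MB touched with about 180 MB available.
rate = simulate_thrasher(total_pages=480, resident_frames=180,
                         touches=100_000, warmup=5_000)
print(f"simulated fault rate: {rate:.3f}")
```

With uniform random touches, the simulated fault rate should land near 1 - 180/480 = 0.625, which is why this configuration sustains a steady paging load. Note that this is a fault rate per touch, not the moves-per-touch metric reported in the tables below, which counts page movement in both directions.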
One other note about configuration. We are aware that
comparing ECKD paging to SCSI paging is a topic of continuing
interest. So, we ran this pair of experiments with ECKD DASD
as well as with SCSI DASD. This lets us illustrate the
differences in
CP CPU time per page moved
for the two different DASD types.
Data collected:
We measured transaction rate
by measuring pages touched per second by the thrasher.
Being interested in how CP overhead
had changed since z/VM 5.2,
we also measured CP CPU time per page moved.
Finally, being interested in the efficacy of CP's storage
management logic, we calculated the pages CP moved per page
the thrasher touched.
Informally, we thought of this metric as commenting on how
"smart" CP was being about keeping the "correct" pages in
storage for the thrasher. Though this metric isn't directly
related to an assessment of SCSI I/O performance,
we are reporting it here anyway as a matter of general
interest.
Results and Discussion
SCSI Paging, z/VM 5.2 to z/VM 5.3
| Metric | z/VM 5.2 | z/VM 5.3 | Delta | % Delta |
| --- | --- | --- | --- | --- |
| Page moves (/sec) | 3528 | 3523 | -5 | 0% |
| CP/move (usec) | 37.7 | 28.5 | -9.2 | -24% |
| Page touches (/sec) | 2627 | 2616 | -11 | -0.4% |
| Virt/touch (usec) | 51.8 | 51.6 | -0.2 | 0% |
| Moves/touch | 1.35 | 1.35 | 0 | 0% |
Notes:
2084-C24, model-capacity indicator 322.
Two dedicated engines, 2 GB central, 0 GB XSTORE.
2105-F20, 16 GB, one 1 Gb FCP chpid.
z/VM 5.2 with all service applied (May 2007).
z/VM 5.3 GA RSU.
944 MB of locked users.
180 MB of DPA.
RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.
For paging to SCSI, we see that transaction rate and page touch
rate are unchanged, but CP time per page moved is down about
25%. This is due to the z/VM 5.3 FBA bypass for paging and
spooling.
ECKD Paging, z/VM 5.2 to z/VM 5.3
| Metric | z/VM 5.2 | z/VM 5.3 | Delta | % Delta |
| --- | --- | --- | --- | --- |
| Page moves (/sec) | 2277 | 2119 | -158 | -6.9% |
| CP/move (usec) | 10.2 | 11.8 | 1.6 | 15.7% |
| Page touches (/sec) | 1597 | 1785 | 188 | 11.8% |
| Virt/touch (usec) | 51.1 | 51.1 | 0 | 0% |
| Moves/touch | 1.43 | 1.18 | -0.25 | -17.5% |
Notes:
2084-C24, model-capacity indicator 322.
Two dedicated engines, 2 GB central, 0 GB XSTORE.
2105-F20, 16 GB, one 1 Gb FCP chpid.
z/VM 5.2 with all service applied (May 2007).
z/VM 5.3 GA RSU.
944 MB of locked users.
180 MB of DPA.
RXTHR2 in 512 MB virtual, thrashing randomly in 480 MB.
For paging to ECKD, we see that CP time per page moved is elevated
slightly in z/VM 5.3. Analysis of zSeries hardware sampler data
showed that the increases are due to changes in the CP dispatcher
so as to support specialty engines. (For paging to SCSI, the
dispatcher growth from specialty engines support
is also present, but said growth
was more than paid off
by the FBA emulation bypass.) We also see that page touches per
second are increased by 12%, with moves per touch down by almost
18%. For this particular workload, z/VM 5.3 was more effective
than z/VM 5.2
at keeping the correct user pages in storage, thus letting the
application experience a higher transaction rate (aka page
touch rate).
Finally, the CPU cost of SCSI paging compared to ECKD paging is
a topic of continuing interest. On z/VM 5.2, we see that the
ratio of CP/move is (37.7/10.2), or 3.7x. On z/VM 5.3, we see
that the ratio is (28.5/11.8), or 2.4x. The FBA emulation
bypass helped bring the CPU cost of SCSI paging toward the cost
of ECKD paging.
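The two ratios quoted above come straight from the CP/move rows of the SCSI and ECKD paging tables. A minimal Python sketch of the arithmetic, with names of our own choosing:

```python
# CP CPU time per page moved, in microseconds, from the two paging tables.
cp_usec_per_move = {
    "z/VM 5.2": {"SCSI": 37.7, "ECKD": 10.2},
    "z/VM 5.3": {"SCSI": 28.5, "ECKD": 11.8},
}


def scsi_to_eckd_ratio(release):
    """How many times more CP CPU a SCSI page move costs than an ECKD one."""
    costs = cp_usec_per_move[release]
    return costs["SCSI"] / costs["ECKD"]


for release in cp_usec_per_move:
    print(f"{release}: {scsi_to_eckd_ratio(release):.1f}x")
```

Part of the narrowing comes from the SCSI cost falling and part from the ECKD cost rising, as both tables show.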