
High Performance FICON

Abstract

The IBM System z platform introduced High Performance FICON (zHPF), which uses a new I/O channel program format referred to as transport-mode I/O. Transport-mode I/O requires less overhead between the channel subsystem and the FICON adapter than traditional command-mode I/O does. Because of this lower overhead, transport-mode I/Os complete faster than command-mode I/Os, resulting in higher I/O rates and less CPU overhead.

In our experiments transport-mode I/Os averaged a 35% increase in I/O rate, an 18% decrease in service time per I/O, and a 45% to 75% decrease in %CP-CPU per I/O. %CP-CPU per I/O depended on I/O size but varied little when I/O size was held constant. Service time per I/O and I/O rate varied considerably; we believe this was due to external interference induced by our shared environment.

Introduction

zHPF was introduced to improve the execution performance of FICON channel programs. zHPF achieves a performance improvement using a new channel program format that reduces the handshake overhead (fetching and decoding commands) between the channel subsystem and the FICON adapter. This is particularly beneficial for small block transfers.

z/VM 6.2 plus VM65041 lets a guest operating system use transport-mode I/O provided the channel and control unit support it. For more information about the z/VM support, see our z/VM 6.2 recent enhancements page.

To evaluate the benefit of transport-mode I/O we ran a variety of I/O-bound workloads, varying read-write mix, volume concurrency, and I/O size, running each combination with command-mode I/O and again with transport-mode I/O. To illustrate the benefit, we collected and tabulated key I/O performance metrics.

Method

IO3390 Workload
Our exerciser IO3390 is a CMS application that uses Start Subchannel (SSCH) to perform random I/Os to a partial-pack minidisk, full-pack minidisk, or dedicated disk formatted at 4 KB block size. The random block numbers are drawn from a uniform distribution [0..size_of_disk-1]. For more information about IO3390, refer to its appendix.
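IO3390 itself is a CMS application, but the uniform block selection it performs can be sketched in Python (the function name and the example disk size are ours; the 4 KB block size is from the text):

```python
import random

BLOCK_SIZE = 4096  # disks are formatted at 4 KB block size

def pick_block(size_of_disk: int) -> int:
    """Draw a block number uniformly from [0, size_of_disk - 1]."""
    return random.randint(0, size_of_disk - 1)

# Each I/O targets an independently chosen block, so the load spreads
# evenly across the volume with no sequential locality.
blocks = [pick_block(50_000) for _ in range(10)]
```

Because every block is equally likely, cache hits in the controller come only from capacity, not from access-pattern locality.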

For partial-pack minidisks and full-pack minidisks we organized the IO3390 machines' disks onto real volumes so that as we logged on additional virtual machines, we added load to the real volumes equally. For example, with eight virtual machines running, we had one IO3390 instance assigned to each real volume. With sixteen virtual machines we had two IO3390s per real volume. Using this scheme, we ran 1, 3, 5, 7, and 10 IO3390s per volume with 83-cylinder partial-pack and full-pack minidisks. For dedicated disk we ran 1 IO3390 per volume.
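The assignment scheme above amounts to dealing IO3390 instances round-robin across the real volumes; a minimal sketch, assuming eight volumes as in the text (the function name is ours):

```python
def assign_round_robin(num_guests: int, num_volumes: int = 8):
    """Return the number of IO3390 guests assigned to each real volume,
    adding load to the volumes equally as guests are logged on."""
    load = [0] * num_volumes
    for guest in range(num_guests):
        load[guest % num_volumes] += 1
    return load

# With 8 guests, each volume carries one exerciser; with 16, each carries two.
```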

For each combination of number of IO3390s, we tried four different I/O mixes: 0% reads, 33% reads, 67% reads, and 100% reads.

For each I/O mix we varied the number of records per I/O: 1 record per I/O, 4 records per I/O, 16 records per I/O, 32 records per I/O, and 64 records per I/O.

We ran each configuration with command-mode I/O and again with transport-mode I/O.

The IO3390 agents are CMS virtual uniprocessor machines with 24 MB of storage.

System Configuration

Processor: 2097-E64, model-capacity indicator 742, 30G central, 2G XSTORE, four dedicated processors. Thirty-four 3390-3 paging volumes.

IBM TotalStorage DS8800 (2421-931) DASD: 6 GB cache, four 8 Gb FICON chpids leading to a FICON switch, then four 8 Gb FICON chpids from the switch to the DS8800. Twenty-four 3390-3 volumes in a single LSS: eight for partial-pack minidisks, eight for full-pack minidisks, and eight for dedicated disks.

We ran all measurements with z/VM 6.2.0 plus APAR VM65041, with CP SET MDCACHE SYSTEM OFF in effect.

Metrics
For each experiment, we measured I/O rate, I/O service time, percent busy per volume, and %CP-CPU per I/O.

I/O rate is the rate at which I/Os complete at a volume. As long as the size of the I/Os remains constant, a higher I/O rate means more data moved each second, so using a different type of I/O to achieve a higher I/O rate for a volume is a performance improvement.

I/O service time is the amount of time it takes the DASD subsystem to perform the requested operation once the host system starts the I/O. Factors influencing I/O service time include channel speed, load on the DASD subsystem, the amount of data moved in the I/O, whether the I/O is a read or a write, and the presence and availability of cache memory in the controller, among others.

Volume percent busy is the percentage of time during which the device was busy. It is calculated as the count of I/Os in a time interval, times the average service time per I/O, divided by the length of the interval, times 100.

Percent CP-CPU per I/O is CP CPU utilization divided by I/O rate.
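As a worked check of the two definitions above, using the first command-mode measurement from the results tables (3373.4 I/Os/vol/sec at 0.2525 msec each; variable names are ours):

```python
# Worked example of the metric definitions, using the first
# command-mode measurement (dedicated disk, 0% reads, 1 record per I/O).
io_rate = 3373.4   # I/Os per second per volume
serv_ms = 0.2525   # service time per I/O, in msec

# Volume percent busy: I/Os in the interval times service time per I/O,
# divided by the interval length, times 100. Using a 1-second interval:
pct_busy = (io_rate * (serv_ms / 1000.0) / 1.0) * 100.0  # ~85.2%

# %CP-CPU per I/O is CP CPU utilization divided by I/O rate, so the
# reported 0.00156 %CP-CPU/I/O implies a CP CPU utilization of:
cp_cpu_util = 0.00156 * io_rate  # ~5.3% CP-CPU
```

The computed percent busy (~85.2%) agrees with the tabulated %Busy/vol of 85.1740 to within rounding.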

We ran each configuration for five minutes, with CP Monitor set to emit sample records at one-minute intervals.

Results and Discussion

For our measurements, when we removed the outliers we believe were caused by our shared environment, transport-mode I/O averaged a 35% increase in I/O rate, an 18% decrease in service time per I/O, and a 45% to 75% decrease in %CP-CPU per I/O. %CP-CPU per I/O depended on I/O size but varied little when I/O size was held constant. Service time per I/O and I/O rate varied considerably; we believe this was due to external interference induced by our shared environment.

In doing our analysis we discovered that some small amount of time is apparently missing from the service time accumulators for command-mode I/O. This causes service time per I/O to report as smaller than it really is and thereby prevents the percent-busy calculation from ever reaching 100%.

As records per I/O increased, the delta in %CP-CPU per I/O between command mode and transport mode widened: transport-mode I/O scaled more efficiently as I/O sizes grew.

Disk configuration (partial-pack minidisk, full-pack minidisk, or dedicated disk) did not influence the results.

Introducing transport-mode I/O support did not cause any regression to the performance of command-mode I/O.

The following tables compare command-mode I/O to transport-mode I/O. These measurements were done with dedicated disks and one 4 KB record per I/O, varying the percentage of I/Os that were reads. The results show the benefit we received from using transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 0% reads, 1 record per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001218 (c)        3373.4           0.2525    85.1740      0.00156
            JB001219 (t)        4499.1           0.2038    91.6766      0.00090
            Delta               1125.7          -0.0487     6.5026     -0.00067
            %Delta               33.37           -19.29       7.63       -42.74
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 33% reads, 1 record per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001228 (c)        3455.8           0.2447    84.5469      0.00156
            JB001229 (t)        4663.8           0.1962    91.4839      0.00089
            Delta               1208.0          -0.0485     6.9370     -0.00066
            %Delta               34.96           -19.82       8.20       -42.61
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 67% reads, 1 record per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001238 (c)        3605.1           0.2326    83.8508      0.00156
            JB001239 (t)        4914.6           0.1855    91.1570      0.00090
            Delta               1309.5          -0.0471     7.3062     -0.00066
            %Delta               36.32           -20.25       8.71       -42.33
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 100% reads, 1 record per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001248 (c)        3728.1           0.2240    83.5212      0.00157
            JB001249 (t)        5225.0           0.1739    90.8752      0.00091
            Delta               1496.9          -0.0501     7.3540     -0.00066
            %Delta               40.15           -22.37       8.80       -42.23
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

As we increased I/O size, the delta in %CP-CPU per I/O between command-mode I/O and transport-mode I/O increased: transport mode was more beneficial for workloads with larger I/Os than for workloads with smaller I/Os. The following tables show this larger delta in %CP-CPU per I/O, demonstrating the benefit that large I/Os received from transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 0% reads, 64 records per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001226 (c)         310.1           3.0893    95.8157      0.00975
            JB001227 (t)         428.0           2.2957    98.2642      0.00246
            Delta                117.9          -0.7936     2.4485     -0.00729
            %Delta               38.02           -25.69       2.56       -74.76
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 33% reads, 64 records per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001236 (c)         328.2           2.9120    95.5772      0.00972
            JB001237 (t)         463.5           2.1183    98.1808      0.00243
            Delta                135.3          -0.7937     2.6036     -0.00729
            %Delta               41.22           -27.26       2.72       -74.96
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 67% reads, 64 records per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001246 (c)         359.4           2.6454    95.0844      0.00976
            JB001247 (t)         554.4           1.7660    97.9029      0.00242
            Delta                195.0          -0.8794     2.8185     -0.00734
            %Delta               54.26           -33.24       2.96       -75.17
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Transport-mode I/O vs. command-mode I/O, 100% reads, 64 records per I/O
Guests/vol  Run Name      I/Os/vol/sec  Serv/I/O (msec)  %Busy/vol  %CP-CPU/I/O
1           JB001256 (c)         388.8           2.4330    94.6053      0.00988
            JB001257 (t)         700.1           1.3924    97.4776      0.00241
            Delta                311.3          -1.0406     2.8723     -0.00747
            %Delta               80.07           -42.77       3.04       -75.65
Notes: (c) denotes command-mode I/O. (t) denotes transport-mode I/O.

Summary and Conclusions

For our workloads transport-mode I/Os averaged a 35% increase in I/O rate, an 18% decrease in service time per I/O, and a 45% to 75% decrease in %CP-CPU per I/O. We attribute this to transport-mode channel programs requiring less handshaking between the channel subsystem and the FICON adapter than command-mode programs do.

Workloads that do large I/Os benefit the most from transport-mode I/O.
