
64-bit Fast CCW Translation

Purpose

In z/VM 4.2.0, IBM extended CP's fast CCW translation facility to support 64-bit disk I/O. One reason for this enhancement was to give 64-bit guests the reduced CPU consumption that fast translation provides. Another was to enable minidisk cache (MDC) for 64-bit I/O; recall that only disk I/Os that succeed in using fast translation are eligible for MDC.

The purpose of this experiment was to quantify the reduction in CP CPU consumption offered by the 64-bit fast translation extension and to compare it with the reduction fast translation already offers for 31-bit I/O.

We deliberately did not design this experiment to measure the impact of MDC on 64-bit disk I/O, because the impact of MDC on disk I/O is known from other experiments.

Executive Summary of Results

The new fast translation support reduces CP CPU time per MB for 64-bit Linux DASD I/O by about 38% for write-only runs and by about 33% for runs that write the file and then read it once. This is not as dramatic as fast CCW translation's effect on 31-bit Linux DASD I/O (about 59% and 57%, respectively), but it is still substantial. We also saw that 64-bit Linux DASD I/O costs about the same (CP CPU time per CCW) as 31-bit I/O.

Hardware

2064-109, LPAR with 2 dedicated CPUs, 1 GB real, 2 GB XSTORE, LPAR dedicated to this experiment during the runs. DASD is RAMAC-1 behind a 3990-6 controller.

Software

z/VM 4.2.0, and an internal development driver of 64-bit Linux configured with no swap partition and with one 12 GB LVM logical volume holding an ext2 file system. All file systems resided on DEDICATEd full-pack volumes. Finally, we used a DASD I/O exercising tool that opens a Linux file (the "ballast file"), writes it in 16 KB chunks until the desired file size is reached, closes it, and then performs N (N>=0) open-read-close passes over the file, reading it in 16 KB chunks during each pass.
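The exercising tool itself is not published here, so the following is only a hypothetical Python sketch of the behavior described above: write the ballast file in 16 KB chunks, close it, then make N open-read-close passes reading 16 KB at a time, and report KB/sec rates. The function name `exercise` is illustrative, and the `os.fsync` call before the write timer stops is an assumption (the text does not say how the tool handles buffered writes).

```python
import os
import time

CHUNK = 16 * 1024  # 16 KB, the chunk size the text describes


def exercise(path, file_size, read_passes):
    """Hypothetical sketch of the ballast-file DASD I/O exerciser.

    Writes file_size bytes to path in 16 KB chunks, closes the file, then
    makes read_passes open-read-close passes, reading 16 KB at a time.
    Returns (write_kb_per_sec, read_kb_per_sec); the read rate is None
    when read_passes is 0, like the "n/a" entries in the results table.
    """
    chunk = b"\xa5" * CHUNK
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < file_size:
            n = min(CHUNK, file_size - written)
            f.write(chunk[:n])
            written += n
        # Assumption: force the data to disk so the timing covers the writes.
        f.flush()
        os.fsync(f.fileno())
    write_secs = max(time.perf_counter() - start, 1e-9)

    bytes_read = 0
    read_secs = 0.0
    for _ in range(read_passes):
        start = time.perf_counter()
        with open(path, "rb") as f:  # one open-read-close pass
            while data := f.read(CHUNK):
                bytes_read += len(data)
        read_secs += time.perf_counter() - start

    write_rate = (file_size / 1024) / write_secs
    read_rate = None
    if read_passes:
        read_rate = (bytes_read / 1024) / max(read_secs, 1e-9)
    return write_rate, read_rate
```

A small-guest run in this experiment would correspond to something like `exercise(path, 384 * 1024 * 1024, 1)`, where `path` is any file on the test volume.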

Experiment

We ran the DASD I/O exercising tool a number of times, each run in a different configuration. For each run, we collected elapsed time (seconds, via CP QUERY TIME), virtual CPU time (hundredths of seconds, via CP QUERY TIME), CP CPU time (hundredths of seconds, via CP QUERY TIME), and virtual I/O count (via CP INDICATE USER * EXP). The tool also prints its observed write data rate (KB/sec) and observed read data rate (KB/sec) when it finishes its run.
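To show how the CP CPU per MB figures in the results table relate to the raw QUERY TIME values, here is a small hypothetical helper. It assumes the CP CPU time is the difference between readings taken before and after a run. The text does not say which MB figure the table divides by (the ballast file size, or total bytes moved including read passes), so the sketch takes it as a parameter.

```python
def cp_msec_per_mb(cp_before_hundredths, cp_after_hundredths, mb):
    """Hypothetical helper: CP CPU msec per MB from two QUERY TIME readings.

    CP CPU time is recorded in hundredths of a second, so the delta is
    multiplied by 10 to convert it to milliseconds before dividing by MB.
    """
    if mb <= 0:
        raise ValueError("mb must be positive")
    delta_hundredths = cp_after_hundredths - cp_before_hundredths
    return delta_hundredths * 10.0 / mb
```

For instance, a delta of 24 hundredths of a second over a 384 MB write-only run would work out to 0.625 msec/MB, the value reported for run 3M1F10.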

We ran the tool 16 times, varying the Linux virtual machine architecture mode (ESA/390 or z/Architecture), the number of read passes over the ballast file (0 or 1), the setting of CP SET MDCACHE (OFF or ON), and whether fast CCW translation was intentionally disabled via a zap (CP STORE HOST) in CP.
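The 16 runs form a full two-level factorial design over four factors (2 x 2 x 2 x 2 = 16). A minimal Python sketch that enumerates the combinations; the factor names and level labels are illustrative and are not taken from any tool used in the experiment.

```python
from itertools import product

# The four two-level factors varied across the runs.
FACTORS = {
    "architecture": ("ESA/390", "z/Architecture"),
    "read_passes": (0, 1),
    "mdcache": ("OFF", "ON"),
    "fast_ccw": ("enabled", "disabled"),  # "disabled" = the CP zap
}

# Full factorial design: every combination of factor levels is one run.
runs = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]

print(len(runs))  # 16
```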

Observations

In the results table, each run name is a six-character token smmffr, where:

Portion   Meaning
s         3 for a 128 MB Linux guest doing ESA/390 I/O to a 384 MB ballast file, or 6 for a 3072 MB Linux guest doing z/Architecture I/O to a 9216 MB ballast file
mm        M1 for MDC enabled, or M0 for MDC disabled
ff        F1 for fast CCW translation enabled, or F0 for fast CCW translation disabled
r         The number of read passes over the file, 0 or 1

For example, run 3M0F01 is the small Linux guest with MDC disabled, fast CCW translation disabled, and one read pass.

Note that Linux automatically selects 31-bit mode or 64-bit mode depending on whether the virtual machine's storage size exceeds 2 GB. So, to vary the mode, we simply varied the storage size. Note also that we made the ballast file three times the size of the Linux virtual machine, to defeat Linux's attempts to use its internal file cache.
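The run-naming scheme and the sizing rules above can be captured in a short decoder. This is a hypothetical Python sketch: it derives the architecture from the stated 2 GB rule and the ballast size from the three-times rule rather than hard-coding them, so it also serves as a consistency check on the definition of the "s" digit.

```python
import re
from typing import NamedTuple

TWO_GB_MB = 2048  # Linux picks 64-bit mode above 2 GB of storage (per the text)

# Guest storage size (MB) selected by the leading "s" digit of a run name.
GUEST_MB = {"3": 128, "6": 3072}

RUN_NAME = re.compile(r"([36])M([01])F([01])([01])")


class RunConfig(NamedTuple):
    guest_mb: int
    ballast_mb: int
    architecture: str
    mdc: bool
    fast_ccw: bool
    read_passes: int


def decode_run_name(name: str) -> RunConfig:
    """Decode an smmffr run name such as '3M0F01'."""
    m = RUN_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not an smmffr run name: {name!r}")
    s, mdc, fast, reads = m.groups()
    guest_mb = GUEST_MB[s]
    # Architecture follows from storage size; the ballast file is 3x storage.
    arch = "z/Architecture" if guest_mb > TWO_GB_MB else "ESA/390"
    return RunConfig(guest_mb, 3 * guest_mb, arch, mdc == "1", fast == "1", int(reads))


print(decode_run_name("3M0F01"))
# RunConfig(guest_mb=128, ballast_mb=384, architecture='ESA/390', mdc=False, fast_ccw=False, read_passes=1)
```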

Here are the results we collected:

Run name   CP CPU per MB (msec/MB)   Observed read rate (KB/sec)   Observed write rate (KB/sec)
3M1F10     0.625                     n/a                           2853
3M0F10     0.651                     n/a                           2553
3M1F00     1.589                     n/a                           2551
3M0F00     1.484                     n/a                           3268
6M1F10     2.077                     n/a                           3043
6M0F10     2.064                     n/a                           3056
6M1F00     3.352                     n/a                           3008
6M0F00     3.237                     n/a                           3024
3M1F11     1.224                     8414                          2548
3M0F11     1.354                     8259                          2580
3M1F01     3.047                     4973                          3270
3M0F01     3.021                     8122                          2580
6M1F11     4.301                     2170                          3047
6M0F11     4.349                     2063                          3043
6M1F01     6.373                     2163                          3080
6M0F01     6.429                     2166                          3055
Note: 2064-109, LPAR with 2 dedicated CPUs, 1 GB real, 2 GB XSTORE, LPAR dedicated to these runs. RAMAC-1 behind 3990-6. z/VM 4.2.0. Internal driver of 64-bit Linux.

We wish to emphasize that previous experience with this DASD I/O tool has shown us that small variations from one "identical" run to another are to be expected, so small unexplained changes in measured variables are probably due to natural run variation. Unfortunately, this experiment's runs are so time-consuming that repeating each configuration to quantify run variability was not practical.
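The fast translation savings quoted in the Executive Summary can be recomputed from the table. We do not know exactly how the quoted percentages were derived; this Python sketch assumes a simple average of the MDC-on and MDC-off pairs, and the averages it produces land within about one percentage point of the rounded figures in the text.

```python
# CP CPU per MB (msec/MB) from the results table, keyed by run name.
CP_MSEC_PER_MB = {
    "3M1F10": 0.625, "3M0F10": 0.651, "3M1F00": 1.589, "3M0F00": 1.484,
    "6M1F10": 2.077, "6M0F10": 2.064, "6M1F00": 3.352, "6M0F00": 3.237,
    "3M1F11": 1.224, "3M0F11": 1.354, "3M1F01": 3.047, "3M0F01": 3.021,
    "6M1F11": 4.301, "6M0F11": 4.349, "6M1F01": 6.373, "6M0F01": 6.429,
}


def fast_ccw_reduction(size, reads):
    """Percent reduction in CP CPU/MB from enabling fast CCW translation,
    averaged over the MDC-on (M1) and MDC-off (M0) pairs (an assumption)."""
    pcts = []
    for mdc in ("M1", "M0"):
        on = CP_MSEC_PER_MB[f"{size}{mdc}F1{reads}"]
        off = CP_MSEC_PER_MB[f"{size}{mdc}F0{reads}"]
        pcts.append(100.0 * (1.0 - on / off))
    return sum(pcts) / len(pcts)


for size, label in (("3", "small (ESA/390)"), ("6", "large (z/Architecture)")):
    for reads, kind in ((0, "write-only"), (1, "write + one read")):
        print(f"{label:24} {kind:18} {fast_ccw_reduction(size, reads):5.1f}%")
```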

Discussion

  1. We can see that fast CCW translation is definitely working for both the ESA/390 Linux guest and the z/Architecture Linux guest. For the write-only runs, enabling fast translation reduces CP CPU per MB by about 59% for the small guest and by about 38% for the large guest. For the write-once, read-once runs, enabling fast translation reduces CP CPU per MB by about 57% for the small guest and by about 33% for the large guest.

  2. With fast CCW translation disabled, CP CPU per MB for a large guest run is about twice that of the corresponding small guest run; with fast translation enabled, the ratio is higher (roughly 3.2 to 3.5) because fast translation saves proportionally less for the large guest. This was disconcerting until we looked at I/O traces for the respective runs. The Linux guest allocates the same size memory buffer for its channel program whether it is doing z/Architecture I/O or ESA/390 I/O, but z/Architecture channel programs take twice as much storage per CCW as ESA/390 channel programs do. As a result, the large Linux guest performs half as many CCWs per Start Subchannel as the small guest, and therefore does twice as many Start Subchannel operations per MB. This explains the difference in CP CPU time per MB.

    We did not collect monitor data to verify that the large guest's Start Subchannel rate is twice the corresponding small guest's rate. The CP CPU time evidence coupled with inspection of the I/O traces satisfied us.

  3. These experiments read the ballast file at most once. We saw in another experiment (see Linux Guest DASD Performance) that MDC does not come reliably into play until we read the file more than once. So these runs should not be taken to be in any way indicative of MDC effects.

  4. Readers should note that an I/O that fails to meet the criteria for fast CCW translation is also ineligible for MDC.
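The reasoning in item 2 is simple arithmetic, sketched below as a toy Python model rather than a description of the actual Linux channel programs. The buffer size, bytes per CCW, and data moved per CCW are hypothetical values, chosen only so that a z/Architecture CCW takes twice the storage of an ESA/390 one; assuming each CCW moves the same amount of data in both modes, a fixed buffer halves the CCWs per Start Subchannel and doubles Start Subchannels per MB. The sketch then compares that factor of 2 with the observed large-to-small CP CPU per MB ratios for the runs with fast CCW translation disabled.

```python
# Hypothetical channel-program parameters; only their ratio matters here.
BUFFER_BYTES = 4096                           # same buffer size in both modes
ESA_BYTES_PER_CCW = 16                        # illustrative value
ZARCH_BYTES_PER_CCW = 2 * ESA_BYTES_PER_CCW   # twice the storage per CCW
DATA_PER_CCW = 4096                           # illustrative: 4 KB moved per CCW


def ssch_per_mb(bytes_per_ccw):
    """Start Subchannels needed per MB when the CCW buffer size is fixed."""
    ccws_per_ssch = BUFFER_BYTES // bytes_per_ccw
    return (1024 * 1024) / (ccws_per_ssch * DATA_PER_CCW)


predicted = ssch_per_mb(ZARCH_BYTES_PER_CCW) / ssch_per_mb(ESA_BYTES_PER_CCW)

# Observed large/small CP CPU per MB ratios from the results table,
# fast CCW translation disabled (F0) runs only.
observed = {
    "M1F00": 3.352 / 1.589,
    "M0F00": 3.237 / 1.484,
    "M1F01": 6.373 / 3.047,
    "M0F01": 6.429 / 3.021,
}

print(f"predicted SSCH/MB ratio: {predicted:.1f}")
for config, ratio in observed.items():
    print(f"observed {config}: {ratio:.2f}")
```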

Conclusion

z/VM 4.2.0's fast CCW translation for 64-bit DASD I/O is doing what it is supposed to be doing.
