64-bit Fast CCW Translation
Purpose
In z/VM 4.2.0, IBM extended CP's fast CCW translation facility to provide fast translation support for 64-bit disk I/O. One reason for this enhancement was to give 64-bit guests the reduced CPU consumption benefit of fast translation. Another reason was to enable 64-bit I/O for minidisk cache (MDC); recall that only those disk I/Os that succeed in using fast translation are eligible for MDC.
The purpose of this experiment was to quantify the CP CPU consumption improvement offered by the 64-bit fast translation extension and compare said improvement to the corresponding improvement fast translation already offers to 31-bit I/O.
We specifically did not design this experiment to measure the impact of MDC on 64-bit disk I/O. This is because the impact of MDC on disk I/O is known from other experiments.
Executive Summary of Results
The new fast translation support reduces CP CPU time per MB for 64-bit Linux DASD I/O by about 38% for writes and by about 33% for reads. This is not quite as dramatic as fast CCW translation's effect on 31-bit Linux DASD I/O, but it is still quite good. We also saw that 64-bit Linux DASD I/O costs about the same (CP CPU time per CCW) as 31-bit I/O.
Hardware
2064-109, LPAR with 2 dedicated CPUs, 1 GB real, 2 GB XSTORE, LPAR dedicated to this experiment during the runs. DASD is RAMAC-1 behind a 3990-6 controller.
Software
z/VM 4.2.0. We also used an internal development driver of 64-bit Linux, configured with no swap partition and with one 12 GB LVM logical volume holding an ext2 file system. All file systems resided on DEDICATEd full-pack volumes. Finally, we used a DASD I/O exercising tool which opens a Linux file (the "ballast file"), writes it in 16 KB chunks until the desired file size is reached, closes it, then performs N (N>=0) open-read-close passes over the file, reading the file in 16 KB chunks during each pass.
Experiment
The general idea is that we ran the DASD I/O exercising tool a number of times, each run having a different environmental configuration. For each run, we collected elapsed time (seconds, via CP QUERY TIME), virtual CPU time (hundredths of seconds, via CP QUERY TIME), CP CPU time (hundredths of seconds, via CP QUERY TIME), and virtual I/O count (via CP INDICATE USER * EXP). Also, the tool prints its observed write data rate (KB/sec) and observed read data rate (KB/sec) when it finishes its run.
We ran the tool 16 times, varying the Linux virtual machine architecture mode (ESA/390 or z/Architecture), the number of read passes over the ballast file (0 or 1), the setting of CP SET MDCACHE (OFF or ON), and whether fast CCW translation was intentionally disabled via a zap (CP STORE HOST) in CP.
Observations
In the results table, each run name is a six-character token smmffr, where:
Portion | Meaning |
---|---|
s | 3 for a 128 MB Linux guest doing ESA/390 I/O to a 384 MB ballast file, or 6 for a 3072 MB Linux guest doing z/Architecture I/O to a 9216 MB ballast file |
mm | M1 for MDC enabled, or M0 for MDC disabled |
ff | F1 for fast CCW translation enabled, or F0 for fast CCW translation disabled |
r | The number of read passes over the file, 0 or 1 |
So run 3M0F01 would be the small Linux guest, MDC disabled, fast CCW translation disabled, and one read pass.
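The naming scheme above can be sketched as a small decoder. This is only an illustration of the convention, not part of the measurement tooling; the function name `decode_run_name`, the dictionary keys, and the `ALL_RUNS` list are hypothetical names introduced here.

```python
from itertools import product

# Guest configurations keyed by the leading character of the run name
# (sizes taken from the naming table above).
GUESTS = {
    "3": {"mode": "ESA/390", "guest_mb": 128, "ballast_mb": 384},
    "6": {"mode": "z/Architecture", "guest_mb": 3072, "ballast_mb": 9216},
}


def decode_run_name(name):
    """Split a six-character run name 'smmffr' into its settings."""
    valid = (
        len(name) == 6
        and name[0] in GUESTS
        and name[1] == "M" and name[2] in ("0", "1")
        and name[3] == "F" and name[4] in ("0", "1")
        and name[5] in ("0", "1")
    )
    if not valid:
        raise ValueError(f"not a valid run name: {name!r}")
    settings = dict(GUESTS[name[0]])          # copy so GUESTS stays unchanged
    settings["mdc_enabled"] = name[2] == "1"
    settings["fast_ccw_enabled"] = name[4] == "1"
    settings["read_passes"] = int(name[5])
    return settings


# Every combination of guest size, MDC setting, fast translation setting,
# and read-pass count: 2 x 2 x 2 x 2 = 16 runs.
ALL_RUNS = [f"{s}M{m}F{f}{r}" for s, m, f, r in product("36", "10", "10", "01")]

if __name__ == "__main__":
    print(decode_run_name("3M0F01"))
    print(len(ALL_RUNS), "runs")
```

Enumerating the four two-valued factors this way accounts for the 16 runs, and the guest table makes the three-to-one ratio of ballast file size to guest size explicit.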
Note that Linux automatically selects 31-bit mode or 64-bit mode according to whether the storage size is greater than 2 GB. So, to vary the mode, we just varied the storage size. Note also that we chose the ballast file to be three times the size of the Linux virtual machine, so as to suppress Linux's attempts to use its internal file cache.
Here are the results we collected:
Run name | CP CPU / MB (msec/MB) | Observed read rate (KB/sec) | Observed write rate (KB/sec) |
---|---|---|---|
3M1F10 | 0.625 | n/a | 2853 |
3M0F10 | 0.651 | n/a | 2553 |
3M1F00 | 1.589 | n/a | 2551 |
3M0F00 | 1.484 | n/a | 3268 |
6M1F10 | 2.077 | n/a | 3043 |
6M0F10 | 2.064 | n/a | 3056 |
6M1F00 | 3.352 | n/a | 3008 |
6M0F00 | 3.237 | n/a | 3024 |
3M1F11 | 1.224 | 8414 | 2548 |
3M0F11 | 1.354 | 8259 | 2580 |
3M1F01 | 3.047 | 4973 | 3270 |
3M0F01 | 3.021 | 8122 | 2580 |
6M1F11 | 4.301 | 2170 | 3047 |
6M0F11 | 4.349 | 2063 | 3043 |
6M1F01 | 6.373 | 2163 | 3080 |
6M0F01 | 6.429 | 2166 | 3055 |
Note: 2064-109, LPAR with 2 dedicated CPUs, 1 GB real, 2 GB XSTORE, LPAR dedicated to these runs. RAMAC-1 behind 3990-6. z/VM 4.2.0. Internal driver of 64-bit Linux.
We wish to emphasize that previous experience with this DASD I/O tool has shown us that small variations from one "identical" run to another are to be expected. Small unexplainable changes in measured variables are probably due to natural run variation. Unfortunately, the runs in this experiment are so time-consuming that repeating each configuration often enough to quantify run variability was not practical.
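As a rough cross-check, the effect of fast CCW translation can be recomputed from the CP CPU / MB column of the table. The sketch below assumes one particular method, averaging the MDC-on and MDC-off runs of each configuration before comparing; the report does not state how its percentages were derived, so this averaging choice and the name `reduction_pct` are assumptions. Under that assumption the results land within about one percentage point of the reductions this report quotes.

```python
# CP CPU per MB (msec/MB), copied from the results table above.
CP_CPU_PER_MB = {
    "3M1F10": 0.625, "3M0F10": 0.651, "3M1F00": 1.589, "3M0F00": 1.484,
    "6M1F10": 2.077, "6M0F10": 2.064, "6M1F00": 3.352, "6M0F00": 3.237,
    "3M1F11": 1.224, "3M0F11": 1.354, "3M1F01": 3.047, "3M0F01": 3.021,
    "6M1F11": 4.301, "6M0F11": 4.349, "6M1F01": 6.373, "6M0F01": 6.429,
}


def reduction_pct(guest, read_passes):
    """Percent reduction in CP CPU/MB with fast translation enabled.

    guest is "3" or "6"; read_passes is "0" or "1". The MDC-on and
    MDC-off runs are averaged first (an assumption, see the lead-in).
    """
    fast_on = [CP_CPU_PER_MB[f"{guest}M{m}F1{read_passes}"] for m in "01"]
    fast_off = [CP_CPU_PER_MB[f"{guest}M{m}F0{read_passes}"] for m in "01"]
    mean_on = sum(fast_on) / len(fast_on)
    mean_off = sum(fast_off) / len(fast_off)
    return 100.0 * (1.0 - mean_on / mean_off)


if __name__ == "__main__":
    for guest in "36":
        for read_passes in "01":
            pct = reduction_pct(guest, read_passes)
            print(f"guest {guest}, read passes {read_passes}: {pct:.1f}% reduction")
```

Because each configuration was run only once, differences of a point or so between this calculation and the quoted figures are well inside the run-to-run variation described above.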
Discussion
- We can see that fast CCW translation is definitely working for both the ESA/390 Linux guest and the z/Architecture Linux guest. For the write-only runs, fast translation being enabled reduces CP CPU per MB by about 59% for the small guest and by about 38% for the large guest. For the write-once, read-once runs, fast translation being enabled reduces CP CPU per MB by about 57% for the small guest and by about 33% for the large guest.
- CP CPU per MB for a given large guest run is about twice the CP CPU per MB for the corresponding small guest run. This was disconcerting until we looked at I/O traces for the respective runs. We found that the large Linux guest performs half as many CCWs per Start Subchannel as the small guest. We found that the Linux guest allocates the same size memory buffer for its channel program, whether it is doing z/Architecture I/O or ESA/390 I/O. However, z/Architecture channel programs take twice as much storage per CCW as ESA/390 channel programs do. Therefore the large Linux guest is doing twice as many Start Subchannel operations per MB as the small Linux guest does, and so the difference in CP CPU time per MB is easily explained.
We did not collect monitor data to verify that the large guest's Start Subchannel rate is twice the corresponding small guest's rate. The CP CPU time evidence coupled with inspection of the I/O traces satisfied us.
- These experiments read the ballast file at most once. We saw in another experiment (see Linux Guest DASD Performance) that MDC does not come reliably into play until we read the file more than once. So these runs should not be taken to be in any way indicative of MDC effects.
- Readers should note that if an I/O happens to fail the criteria for fast CCW translation, it will also be ineligible for MDC.
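The Start Subchannel argument in the second bullet above can be worked through with a few lines of arithmetic. Every number in this sketch is a placeholder chosen for illustration (the buffer size, the storage per CCW, and the data moved per CCW were not reported); the only relationships taken from the text are that the channel program buffer is the same size in both modes and that z/Architecture needs twice the storage per CCW.

```python
MB = 1024 * 1024


def ssch_per_mb(buffer_bytes, storage_per_ccw, data_per_ccw):
    """Start Subchannel operations needed to move 1 MB of data.

    buffer_bytes    -- size of the guest's channel program buffer
    storage_per_ccw -- channel program storage consumed per CCW
    data_per_ccw    -- bytes of data transferred by each CCW
    """
    ccws_per_ssch = buffer_bytes // storage_per_ccw
    return MB / (ccws_per_ssch * data_per_ccw)


# Hypothetical values, NOT measurements from this experiment.
BUFFER_BYTES = 4096                 # same buffer size in both modes
DATA_PER_CCW = 4096                 # assumed identical in both modes
ESA390_STORAGE_PER_CCW = 16
ZARCH_STORAGE_PER_CCW = 2 * ESA390_STORAGE_PER_CCW   # "twice as much storage"

small = ssch_per_mb(BUFFER_BYTES, ESA390_STORAGE_PER_CCW, DATA_PER_CCW)
large = ssch_per_mb(BUFFER_BYTES, ZARCH_STORAGE_PER_CCW, DATA_PER_CCW)

if __name__ == "__main__":
    print("ratio of Start Subchannel operations per MB (large/small):", large / small)
```

Whatever the real buffer and CCW sizes are, halving the number of CCWs that fit in a fixed buffer doubles the number of Start Subchannel operations per MB, which is the ratio the bullet relies on.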
Conclusion
z/VM 4.2.0's fast CCW translation for 64-bit DASD I/O is doing what it is supposed to be doing.