
QDIO Enhanced Buffer State Management

Introduction 

The Queued Direct I/O (QDIO) Enhanced Buffer State Management (QEBSM) facility provides an optimized mechanism for transferring data via QDIO (including FCP, which uses QDIO) to and from virtual machines. Prior to this facility, z/VM had to mediate between the virtual machine and the OSA Express or FCP adapter during QDIO data transfers. With QEBSM, z/VM is not involved in a typical QDIO data transfer when the guest operating system or device driver supports the facility.

Starting with z/VM 5.2.0 on a z990 or z890 with QEBSM Enablement applied (refer to Performance Related APARs for a list of required maintenance), a program running in a virtual machine has the option of using QEBSM when performing QDIO operations. With QEBSM, the processor millicode performs the shadow-queue processing that z/VM would otherwise perform for a QDIO operation. This eliminates the z/VM and hardware overhead associated with SIE entry and exit for every QDIO data transfer. The shadow-queue processing still requires processor time, but much less than when it is done in software by z/VM. The net effect is a small increase in virtual CPU time coupled with a much larger decrease in CP CPU time.
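
In the tables that follow, the total CPU time per unit of work is the sum of the virtual CPU time (time charged to the guest) and the CP CPU time (time charged to the z/VM Control Program). The short Python sketch below is an illustrative aid only, not part of the measurement tooling; it uses the 1-client OSA RR figures from Table 1 to show how the small rise in virtual CPU time is outweighed by the much larger drop in CP CPU time:

    # Illustrative sketch: CPU-time accounting for the 1-client OSA RR case
    # from Table 1, in msec per transaction.
    v510 = {"virtual": 0.033, "cp": 0.031}   # z/VM 5.1.0
    v520 = {"virtual": 0.036, "cp": 0.007}   # z/VM 5.2.0 with QEBSM

    total_510 = v510["virtual"] + v510["cp"]   # 0.064 msec/trans
    total_520 = v520["virtual"] + v520["cp"]   # 0.043 msec/trans

    # Virtual CPU time rises slightly (the buffer-state work is now driven
    # through the millicode on the guest's behalf), but CP CPU time falls
    # far more, so the total CPU cost per transaction drops by about a third.
    print(f"total 5.1.0: {total_510:.3f}  total 5.2.0: {total_520:.3f}")
    print(f"change: {(total_520 - total_510) / total_510:+.1%}")   # about -32.8%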

This section summarizes measurement results comparing Linux communicating over a QDIO connection under z/VM 5.1.0 with measurement results under z/VM 5.2.0 with QEBSM active.

Methodology 

The Application Workload Modeler (AWM) was used to drive the workload for the OSA and HiperSockets measurements. (Refer to AWM Workload for more information.) A complete set of runs was done for the request-response (RR) and streaming (STR) workloads. IOzone was used to drive the workload for native SCSI (FCP) devices. Refer to Linux IOzone Workload for details.

The measurements were done on a 2084-324 with two dedicated processors in each of the two LPARs used. Running under z/VM, the guests used an internal Linux driver at level 2.6.14-16 that supports QEBSM.

Two LPARs were used for the OSA and HiperSockets measurements. The AWM client ran in one LPAR and the AWM server ran in the other LPAR. Each LPAR had 2GB of main storage and 2GB of expanded storage. CP Monitor data were captured for one LPAR (client side) during the measurement and were reduced using Performance Toolkit for VM (Perfkit).

One LPAR was used for the FCP measurements. CP Monitor data and hardware instrumentation data were captured.

Summary of Results 


              Number of    CP CPU      Virtual CPU  Total CPU   Throughput
              Comparisons  range       range        range       range
OSA           9            -74%/-92%   25%/-9%      -13%/-29%   0%/18%
HiperSockets  9            -70%/-100%  8%/-1%       -19%/-36%   1%/50%
FCP           1            -71%        -8%          -17%        0%/1%

The direct effect of QEBSM is to decrease CPU time. This, in turn, increases throughput in cases where throughput had been limited by CPU usage. The table above shows the CPU time reduction for all three connectivity types, along with throughput gains where throughput had previously been constrained by CPU usage.

Detailed Results 

The following tables compare the measurements for OSA, HiperSockets, and FCP. The %diff values are the percent increase (or decrease) from the z/VM 5.1.0 measurement to the corresponding z/VM 5.2.0 measurement.
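
The sketch below (Python; the helper name is hypothetical and not part of any measurement tooling) shows the calculation, using the 1-client OSA RR column from Table 1. The published %diff values appear to have been computed from unrounded measurement data, so recomputing them from the rounded table entries can differ by a percentage point in some rows.

    # Hypothetical helper showing how the %diff rows are derived: relative
    # change from the z/VM 5.1.0 value to the z/VM 5.2.0 (QEBSM) value.
    def pct_diff(v510: float, v520: float) -> float:
        return (v520 - v510) / v510 * 100.0

    # 1-client OSA RR column from Table 1 below.
    print(round(pct_diff(2539.21, 2672.71)))   # trans/sec:             5
    print(round(pct_diff(0.064, 0.043)))       # Total CPU msec/trans: -33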

Table 1. OSA - RR

MTU 1492
Number of clients         01         10         50
5.1.0
  runid                   lorn0101   lorn1001   lorn5001
  trans/sec               2539.21    13756.22   28037.86
  Total CPU msec/trans    0.064      0.042      0.033
  Virtual CPU msec/trans  0.033      0.029      0.027
  CP CPU msec/trans       0.031      0.013      0.006
5.2.0 (QEBSM)
  runid                   lorn0101   lorn1001   lorn5001
  trans/sec               2672.71    14536.72   33072.56
  Total CPU msec/trans    0.043      0.030      0.025
  Virtual CPU msec/trans  0.036      0.028      0.024
  CP CPU msec/trans       0.007      0.002      0.001
% diff
  trans/sec               5%         6%         18%
  Total CPU msec/trans    -33%       -29%       -25%
  Virtual CPU msec/trans  8%         -4%        -9%
  CP CPU msec/trans       -78%       -85%       -92%
2084-324; z/VM 5.2.0; Linux 2.6.14-16

Table 2. OSA - STR

MTU 1492
Number of clients         01         10         50
5.1.0
  runid                   losn0101   losn1001   losn5001
  MB/sec                  53.6       111.9      111.7
  Total CPU msec/MB       7.76       5.92       6.25
  Virtual CPU msec/MB     5.45       4.97       5.32
  CP CPU msec/MB          2.31       0.95       0.93
5.2.0 (QEBSM)
  runid                   losn0101   losn1001   losn5001
  MB/sec                  55.4       111.9      111.7
  Total CPU msec/MB       5.81       4.84       5.14
  Virtual CPU msec/MB     5.38       4.68       4.96
  CP CPU msec/MB          0.43       0.16       0.18
% diff
  MB/sec                  3%         0%         0%
  Total CPU msec/MB       -25%       -18%       -18%
  Virtual CPU msec/MB     -1%        -6%        -7%
  CP CPU msec/MB          -81%       -83%       -81%

MTU 8992
Number of clients         01         10         50
5.1.0
  runid                   losj0101   losj1001   losj5001
  MB/sec                  99.2       117.9      117.7
  Total CPU msec/MB       4.86       3.99       4.35
  Virtual CPU msec/MB     2.72       2.48       2.65
  CP CPU msec/MB          2.14       1.51       1.70
5.2.0 (QEBSM)
  runid                   losj0101   losj1001   losj5001
  MB/sec                  102.1      117.9      117.7
  Total CPU msec/MB       3.64       3.48       3.52
  Virtual CPU msec/MB     3.17       3.09       3.13
  CP CPU msec/MB          0.47       0.39       0.39
% diff
  MB/sec                  3%         0%         0%
  Total CPU msec/MB       -25%       -13%       -19%
  Virtual CPU msec/MB     17%        25%        18%
  CP CPU msec/MB          -78%       -74%       -77%
2084-324; z/VM 5.2.0; Linux 2.6.14-16

Table 3. HiperSockets - RR

MTU 8192
Number of clients         01         10         50
5.1.0
  runid                   lhrj0101   lhrj1001   lhrj5001
  trans/sec               8069.87    22641.90   21692.50
  Total CPU msec/trans    0.052      0.044      0.045
  Virtual CPU msec/trans  0.031      0.030      0.030
  CP CPU msec/trans       0.021      0.015      0.015
5.2.0 (QEBSM)
  runid                   lhrj0101   lhrj1001   lhrj5001
  trans/sec               8150.87    33662.32   32532.32
  Total CPU msec/trans    0.039      0.028      0.029
  Virtual CPU msec/trans  0.033      0.028      0.029
  CP CPU msec/trans       0.006      0.000      0.000
% diff
  trans/sec               1.0%       48.7%      50.0%
  Total CPU msec/trans    -25.4%     -36.1%     -34.5%
  Virtual CPU msec/trans  5.2%       -3.1%      -1.0%
  CP CPU msec/trans       -70.0%     -100.0%    -100.0%
2084-324; z/VM 5.2.0; Linux 2.6.14-16

Table 4. HiperSockets - STR

MTU 8192
Number of clients         01         10         50
5.1.0
  runid                   lhsj0101   lhsj1001   lhsj5001
  MB/sec                  241.4      289.6      286.9
  Total CPU msec/MB       3.56       3.45       3.48
  Virtual CPU msec/MB     2.14       2.16       2.22
  CP CPU msec/MB          1.43       1.30       1.26
5.2.0 (QEBSM)
  runid                   lhsj0101   lhsj1001   lhsj5001
  MB/sec                  264.7      349.6      340.0
  Total CPU msec/MB       2.82       2.79       2.82
  Virtual CPU msec/MB     2.81       2.78       2.81
  CP CPU msec/MB          0.01       0.01       0.01
% diff
  MB/sec                  10%        21%        19%
  Total CPU msec/MB       -21%       -19%       -19%
  Virtual CPU msec/MB     31%        29%        27%
  CP CPU msec/MB          -100%      -100%      -100%

MTU 56K
Number of clients         01         10         50
5.1.0
  runid                   lhsh0101   lhsh1001   lhsh5001
  MB/sec                  294.3      452.2      436.7
  Total CPU msec/MB       2.30       2.21       2.22
  Virtual CPU msec/MB     1.52       1.59       1.60
  CP CPU msec/MB          0.79       0.62       0.63
5.2.0 (QEBSM)
  runid                   lhsh0101   lhsh1001   lhsh5001
  MB/sec                  318.7      566.2      553.5
  Total CPU msec/MB       1.72       1.71       1.73
  Virtual CPU msec/MB     1.58       1.71       1.73
  CP CPU msec/MB          0.14       0.00       0.01
% diff
  MB/sec                  8%         25%        27%
  Total CPU msec/MB       -25%       -23%       -22%
  Virtual CPU msec/MB     4%         7%         8%
  CP CPU msec/MB          -83%       -100%      -99%
2084-324; z/VM 5.2.0; Linux 2.6.14-16

The following table shows the results for FCP. Throughput is given in KB/sec for each IOzone phase; CPU values are given as total, CP, and virtual CPU time in microseconds (µsec) per transaction.

Table 5. FCP

IOzone         Init Write  Re-write  Init Read  Re-read   Total    CP       Virtual
               (KB/s)      (KB/s)    (KB/s)     (KB/s)    µsec/tx  µsec/tx  µsec/tx
5.1.0          39954.88    35462.16  42619.48   42848.82  1198     172      1024
5.2.0 (QEBSM)  40441.04    35527.13  43046.18   43389.61  993      50       940
% diff         1%          0%        1%         1%        -17%     -71%     -8%
2084-324; z/VM 5.2.0; Linux 2.6.14-16
