QDIO Enhanced Buffer State Management
The Queued Direct I/O (QDIO) Enhanced Buffer State Management (QEBSM) facility provides an optimized mechanism for transferring data via QDIO (including FCP, which uses QDIO) to and from virtual machines. Before this facility, z/VM had to mediate between the virtual machine and the OSA-Express or FCP adapter during every QDIO data transfer. With QEBSM, z/VM is not involved in a typical QDIO data transfer, provided the guest operating system or device driver supports the facility.
Starting with z/VM 5.2.0 and the z990/z890 with QEBSM Enablement applied (refer to Performance Related APARs for a list of required maintenance), a program running in a virtual machine has the option of using QEBSM when performing QDIO operations. With QEBSM, the processor millicode performs the shadow-queue processing that z/VM would otherwise perform for a QDIO operation. This eliminates the z/VM and hardware overhead of SIE entry and exit for every QDIO data transfer. The shadow-queue processing still requires processor time, but much less than when done in software. The net effect is a small increase in virtual CPU time coupled with a much larger decrease in CP CPU time.
This section summarizes measurement results comparing Linux communicating over a QDIO connection under z/VM 5.1.0 with measurement results under z/VM 5.2.0 with QEBSM active.
The Application Workload Modeler (AWM) was used to drive the workload for OSA and HiperSockets. (Refer to AWM Workload for more information.) A complete set of runs was done for RR and STR workloads. IOzone was used to drive the workload for native SCSI (FCP) devices. Refer to Linux IOzone Workload for details.
The measurements were done on a 2084-324 with two dedicated processors in each of the two LPARs used. The Linux guests running under z/VM used an internal Linux driver at level 2.6.14-16 that supports QEBSM.
Two LPARs were used for the OSA and HiperSockets measurements. The AWM client ran in one LPAR and the AWM server ran in the other LPAR. Each LPAR had 2GB of main storage and 2GB of expanded storage. CP Monitor data were captured for one LPAR (client side) during the measurement and were reduced using Performance Toolkit for VM (Perfkit).
One LPAR was used for the FCP measurements. CP Monitor data and hardware instrumentation data were captured.
                Number of     CP CPU       Virtual CPU   Total CPU    Throughput
                Comparisons   range        range         range        range
OSA             9             -74%/-92%    25%/-9%       -13%/-29%    0%/18%
HiperSockets    9             -70%/-100%   8%/-1%        -19%/-36%    1%/50%
FCP             1             -71%         -8%           -17%         0%/1%
The direct effect of QEBSM is to decrease CPU time. This, in turn, increases throughput in cases where throughput had been limited by CPU usage. The table above demonstrates this effect for all three cases.
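As a rough illustration of that coupling, the following sketch uses a simplified, hypothetical model (not part of the measurements): a workload that must pay its full CPU cost for every transaction cannot exceed the available CPU time divided by the CPU cost per transaction, so cutting msec/trans raises the attainable throughput ceiling proportionally.

```python
# Hypothetical CPU-bound throughput ceiling: with n_cpus dedicated
# processors, about n_cpus * 1000 msec of CPU time is available per
# second, and each transaction consumes total_cpu_msec_per_trans of it.
# Real workloads also wait on the network, so this is only an upper bound.

def cpu_bound_ceiling(n_cpus: int, total_cpu_msec_per_trans: float) -> float:
    """Upper bound on transactions/sec for a purely CPU-limited workload."""
    return n_cpus * 1000.0 / total_cpu_msec_per_trans

# Using the HiperSockets RR figures (10 clients) from the tables below,
# total CPU cost drops from 0.044 to 0.028 msec/trans:
before = cpu_bound_ceiling(2, 0.044)
after = cpu_bound_ceiling(2, 0.028)
print(f"ceiling rises from {before:.0f} to {after:.0f} trans/sec")
```

The ceiling rises by the same factor the per-transaction CPU cost falls, which is why the largest throughput gains in the tables appear at the client counts where CPU was the constraint.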
The following tables compare the measurements for OSA, HiperSockets, and FCP. The %diff numbers shown are the percent increase (or decrease) from the measurement on z/VM 5.1.0 to the measurement on z/VM 5.2.0.
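For reference, each %diff row is a simple percent change, as in this short sketch (the function name is illustrative). The published rows were computed from unrounded measurement data, so recomputing from the rounded table values can differ by a point or two.

```python
# Percent change from the z/VM 5.1.0 value to the z/VM 5.2.0 (QEBSM) value.
# A negative result means the 5.2.0 value is lower (e.g. less CPU per trans).

def pct_diff(v510: float, v520: float) -> float:
    return (v520 - v510) / v510 * 100.0

# OSA RR, 50 clients: trans/sec rises from 28037.86 to 33072.56.
print(round(pct_diff(28037.86, 33072.56)))  # about 18 (%)
```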
OSA RR, MTU 1492

Number of clients         1         10        50
5.1.0
  runid                   lorn0101  lorn1001  lorn5001
  trans/sec               2539.21   13756.22  28037.86
  Total CPU msec/trans    0.064     0.042     0.033
  Virtual CPU msec/trans  0.033     0.029     0.027
  CP CPU msec/trans       0.031     0.013     0.006
5.2.0 (QEBSM)
  runid                   lorn0101  lorn1001  lorn5001
  trans/sec               2672.71   14536.72  33072.56
  Total CPU msec/trans    0.043     0.030     0.025
  Virtual CPU msec/trans  0.036     0.028     0.024
  CP CPU msec/trans       0.007     0.002     0.001
% diff
  trans/sec               5%        6%        18%
  Total CPU msec/trans    -33%      -29%      -25%
  Virtual CPU msec/trans  8%        -4%       -9%
  CP CPU msec/trans       -78%      -85%      -92%
2084-324; z/VM 5.2.0; Linux 2.6.14-16
OSA STR, MTU 1492

Number of clients       1         10        50
5.1.0
  runid                 losn0101  losn1001  losn5001
  MB/sec                53.6      111.9     111.7
  Total CPU msec/MB     7.76      5.92      6.25
  Virtual CPU msec/MB   5.45      4.97      5.32
  CP CPU msec/MB        2.31      0.95      0.93
5.2.0 (QEBSM)
  runid                 losn0101  losn1001  losn5001
  MB/sec                55.4      111.9     111.7
  Total CPU msec/MB     5.81      4.84      5.14
  Virtual CPU msec/MB   5.38      4.68      4.96
  CP CPU msec/MB        0.43      0.16      0.18
% diff
  MB/sec                3%        0%        0%
  Total CPU msec/MB     -25%      -18%      -18%
  Virtual CPU msec/MB   -1%       -6%       -7%
  CP CPU msec/MB        -81%      -83%      -81%
OSA STR, MTU 8992

Number of clients       1         10        50
5.1.0
  runid                 losj0101  losj1001  losj5001
  MB/sec                99.2      117.9     117.7
  Total CPU msec/MB     4.86      3.99      4.35
  Virtual CPU msec/MB   2.72      2.48      2.65
  CP CPU msec/MB        2.14      1.51      1.70
5.2.0 (QEBSM)
  runid                 losj0101  losj1001  losj5001
  MB/sec                102.1     117.9     117.7
  Total CPU msec/MB     3.64      3.48      3.52
  Virtual CPU msec/MB   3.17      3.09      3.13
  CP CPU msec/MB        0.47      0.39      0.39
% diff
  MB/sec                3%        0%        0%
  Total CPU msec/MB     -25%      -13%      -19%
  Virtual CPU msec/MB   17%       25%       18%
  CP CPU msec/MB        -78%      -74%      -77%
2084-324; z/VM 5.2.0; Linux 2.6.14-16
HiperSockets RR, MTU 8192

Number of clients         1         10        50
5.1.0
  runid                   lhrj0101  lhrj1001  lhrj5001
  trans/sec               8069.87   22641.90  21692.50
  Total CPU msec/trans    0.052     0.044     0.045
  Virtual CPU msec/trans  0.031     0.030     0.030
  CP CPU msec/trans       0.021     0.015     0.015
5.2.0 (QEBSM)
  runid                   lhrj0101  lhrj1001  lhrj5001
  trans/sec               8150.87   33662.32  32532.32
  Total CPU msec/trans    0.039     0.028     0.029
  Virtual CPU msec/trans  0.033     0.028     0.029
  CP CPU msec/trans       0.006     0.000     0.000
% diff
  trans/sec               1.0%      48.7%     50.0%
  Total CPU msec/trans    -25.4%    -36.1%    -34.5%
  Virtual CPU msec/trans  5.2%      -3.1%     -1.0%
  CP CPU msec/trans       -70.0%    -100.0%   -100.0%
2084-324; z/VM 5.2.0; Linux 2.6.14-16
HiperSockets STR, MTU 8192

Number of clients       1         10        50
5.1.0
  runid                 lhsj0101  lhsj1001  lhsj5001
  MB/sec                241.4     289.6     286.9
  Total CPU msec/MB     3.56      3.45      3.48
  Virtual CPU msec/MB   2.14      2.16      2.22
  CP CPU msec/MB        1.43      1.30      1.26
5.2.0 (QEBSM)
  runid                 lhsj0101  lhsj1001  lhsj5001
  MB/sec                264.7     349.6     340.0
  Total CPU msec/MB     2.82      2.79      2.82
  Virtual CPU msec/MB   2.81      2.78      2.81
  CP CPU msec/MB        0.01      0.01      0.01
% diff
  MB/sec                10%       21%       19%
  Total CPU msec/MB     -21%      -19%      -19%
  Virtual CPU msec/MB   31%       29%       27%
  CP CPU msec/MB        -100%     -100%     -100%
HiperSockets STR, MTU 56K

Number of clients       1         10        50
5.1.0
  runid                 lhsh0101  lhsh1001  lhsh5001
  MB/sec                294.3     452.2     436.7
  Total CPU msec/MB     2.30      2.21      2.22
  Virtual CPU msec/MB   1.52      1.59      1.60
  CP CPU msec/MB        0.79      0.62      0.63
5.2.0 (QEBSM)
  runid                 lhsh0101  lhsh1001  lhsh5001
  MB/sec                318.7     566.2     553.5
  Total CPU msec/MB     1.72      1.71      1.73
  Virtual CPU msec/MB   1.58      1.71      1.73
  CP CPU msec/MB        0.14      0.00      0.01
% diff
  MB/sec                8%        25%       27%
  Total CPU msec/MB     -25%      -23%      -22%
  Virtual CPU msec/MB   4%        7%        8%
  CP CPU msec/MB        -83%      -100%     -99%
2084-324; z/VM 5.2.0; Linux 2.6.14-16
The following table shows the results for FCP. Values are given for total microseconds per transaction (µsec/tx), CP µsec/tx, and virtual µsec/tx.
IOzone          Init Write  Re-write  Init Read  Re-read   Total    CP       Virtual
                (KB/s)      (KB/s)    (KB/s)     (KB/s)    µsec/tx  µsec/tx  µsec/tx
5.1.0           39954.88    35462.16  42619.48   42848.82  1198     172      1024
5.2.0 (QEBSM)   40441.04    35527.13  43046.18   43389.61  993      50       940
% diff          1%          0%        1%         1%        -17%     -71%     -8%
2084-324; z/VM 5.2.0; Linux 2.6.14-16