IBM: z/VM Performance Report

Dedicated OSA vs. Vswitch Update

Abstract

To connect to an external network, z/VM guests can use a dedicated OSA or a vswitch. This chapter provides a comparison of how the choice impacts the transaction rate when running request-response (RR) workloads and the outbound data rate when running streaming (STR) workloads. A variety of different configurations are compared.

Introduction

The Dedicated OSA vs. VSWITCH chapter of the z/VM 5.2 Performance Report compared two connectivity options available for guests running under z/VM: direct connection to OSA and vswitch.

Here we present an update of the z/VM 5.2 information. This refresh contains a comparison of key measurement points between the two options and lists some of the reasons for choosing one over the other. Customer results will vary according to system configuration and workload.

Method

Application Workload Modeler (AWM), a Linux network benchmarking application, was used to drive network traffic between one client Linux guest and one server Linux guest. Each guest was in its own dedicated LPAR. Both dedicated OSA configurations and vswitch configurations were evaluated. Both request-response (RR) and streaming (STR) workloads were used. The RR workload consisted of the client sending 200 bytes to the server and the server responding with 1000 bytes. The STR workload consisted of the client sending 20 bytes to the server and the server responding with 20 MB. The measurement ran for 600 seconds. The workloads were run in 12 configurations. The configurations varied by maximum transmission unit (MTU) size, SMT mode, and transport mode. The table below shows the combination of workloads and configurations used.

Table 1. Combination of workloads and configurations
Workload	MTU Size	SMT Mode	Transport Mode
RR	1492	SMT-1	Layer 2
RR	1492	SMT-1	Layer 3
RR	1492	SMT-2	Layer 2
RR	1492	SMT-2	Layer 3
STR	1492	SMT-1	Layer 2
STR	1492	SMT-1	Layer 3
STR	1492	SMT-2	Layer 2
STR	1492	SMT-2	Layer 3
STR	8992	SMT-1	Layer 2
STR	8992	SMT-1	Layer 3
STR	8992	SMT-2	Layer 2
STR	8992	SMT-2	Layer 3
Note: See Layer 2 and Layer 3 for more details about transport modes.

Each combination from Table 1 was run three times: once using one socket connection, once using 10 concurrent socket connections, and once using 50 concurrent socket connections.

The measurements were done on a z15 8561-T01 using two dedicated LPARs. For SMT-1 runs, each LPAR used two logical IFL cores. For SMT-2 runs, each LPAR used one logical IFL core. Connectivity between the two LPARs was over an OSA-Express6 10GbE card. The software used included z/VM 7.2 and Linux SLES 12 SP1.

Figure 1. Vswitch Environment

Use of a vswitch to connect the client guest to the server guest.

Figure 2. OSA Environment

Use of dedicated OSA to connect the client guest to the server guest.

In both environments, the server Linux guest ran in LPAR 1 and the client Linux guest ran in LPAR 2. Each LPAR had 512 GB of central storage. CP monitor data was captured for LPAR 1 (server side) during each measurement and reduced using Performance Toolkit for VM (Perfkit).

The z/VM 5.2 measurements captured data from the client side. For this new study, the data was captured on the server side. This more closely aligns with the role typically played by a Linux guest.

Results and Discussion

The following tables contain the average of select metrics for each run. For RR runs, the focus is on transaction rate. For STR runs, the focus is on outbound data transmission rate. The tables also compare the difference in these metrics between the OSA and vswitch runs. The %diff numbers shown are the percent change comparing OSA to the vswitch. For example, if the number is positive, OSA was that percent greater than vswitch.

In general, a Linux guest using a dedicated OSA gets higher throughput and uses less CPU time than a Linux guest connected through a vswitch. However, this must be balanced against advantages gained using the vswitch, such as:

Ease of network design
Ability to share network resources (OSA card)
Management of the network including security and capabilities available to the z/VM guest on the LAN
Measurement of the network via z/VM monitor records
Layer 3 bridge
Less overhead than using a router stack

Table 2. Results of RR runs with MTU size of 1492 and using SMT-1
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	RR	RR	RR	RR	RR	RR
MTU size	1492	1492	1492	1492	1492	1492
SMT mode	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1
VSwitch
Runid	NVS1L301	NVS1L310	NVS1L350	NVS1L201	NVS1L210	NVS1L250
ETR	5,754.34	35,214.88	89,560.30	5,766.87	35,484.94	90,498.04
Total CPU msec/transaction	0.00855	0.00481	0.00507	0.00902	0.00474	0.00495
Emul CPU msec/transaction	0.00589	0.00356	0.00402	0.00617	0.00350	0.00388
CP CPU msec/transaction	0.00266	0.00125	0.00105	0.00285	0.00124	0.00107
OSA
Runid	NOS1L301	NOS1L310	NOS1L350	NOS1L201	NOS1L210	NOS1L250
ETR	9,950.98	59,350.26	160225.18	10,026.60	59,397.52	163,282.63
Total CPU msec/transaction	0.01025	0.00696	0.00475	0.01015	0.00686	0.00464
Emul CPU msec/transaction	0.00927	0.00657	0.00465	0.00920	0.00648	0.00454
CP CPU msec/transaction	0.00098	0.00039	0.00010	0.00095	0.00038	0.00010
% difference
ETR	72.93%	68.54%	78.90%	73.87%	67.39%	80.43%
Total CPU msec/transaction	19.88%	44.70%	-6.31%	12.53%	44.73%	-6.26%
Emul CPU msec/transaction	57.39%	84.55%	15.67%	49.11%	85.14%	17.01%
CP CPU msec/transaction	-63.16%	-68.80%	-90.48%	-66.67%	-69.35%	-90.65%
Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The ETR of the OSA runs was 68.54% to 80.43% higher than the equivalent vswitch runs when running the RR workload in an SMT-1 configuration with an MTU size of 1492. The total CPU per transaction of the OSA runs was between 44.73% higher to 6.31% lower than the equivalent vswitch runs.

Table 3. Results of RR runs with MTU size of 1492 and using SMT-2
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	RR	RR	RR	RR	RR	RR
MTU size	1492	1492	1492	1492	1492	1492
SMT mode	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2
VSwitch
Runid	NVS2L301	NVS2L310	NVS2L350	NVS2L201	NVS2L210	NVS2L250
ETR	5,705.34	34,202.31	75,775.96	5,675.12	34,485.39	75,912.95
Total CPU msec/transaction	0.01059	0.00571	0.00670	0.01256	0.00560	0.00658
Emul CPU msec/transaction	0.00787	0.00438	0.00532	0.00913	0.00428	0.00520
CP CPU msec/transaction	0.00272	0.00133	0.00138	0.00343	0.00132	0.00138
OSA
Runid	NOS2L301	NOS2L310	NOS2L350	NOS2L201	NOS2L210	NOS2L250
ETR	9,721.54	58,886.62	157,694.74	9,776.24	58,482.06	159,551.13
Total CPU msec/transaction	0.01192	0.00802	0.00586	0.01177	0.00782	0.00576
Emul CPU msec/transaction	0.01101	0.00762	0.00573	0.01086	0.00743	0.00564
CP CPU msec/transaction	0.00091	0.00040	0.00013	0.00091	0.00039	0.00012
% difference
ETR	70.39%	72.17%	108.11%	72.26%	69.59%	110.18%
Total CPU msec/transaction	12.56%	40.46%	-12.54%	-6.29%	39.64%	-12.46%
Emul CPU msec/transaction	39.90%	73.97%	7.71%	18.95%	73.60%	8.46%
CP CPU msec/transaction	-66.54%	-69.92%	-90.58%	-73.47%	-70.45%	-91.30%
Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The ETR of the OSA runs was 69.59% to 110.18% higher than the equivalent vswitch runs when running the RR workload in an SMT-2 configuration with an MTU size of 1492. The total CPU per transaction of the OSA runs was between 40.46% higher to 12.54% lower than the equivalent vswitch runs.

Table 4. Results of STR runs with MTU size of 1492 and using SMT-1
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	STR	STR	STR	STR	STR	STR
MTU size	1492	1492	1492	1492	1492	1492
SMT mode	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1
VSwitch
Runid	NVM1L301	NVM1L310	NVM1L350	NVM1L201	NVM1L210	NVM1L250
Outbound MB/sec	481	913	997	450	1,042	1,036
Total CPU msec/Outbound MB	2.00728	1.62651	1.40020	1.99556	1.59693	1.45753
Emul CPU msec/Outbound MB	1.15904	1.26725	1.06018	1.20244	1.25432	1.12548
CP CPU msec/Outbound MB	0.84824	0.35926	0.34002	0.79312	0.34261	0.33205
OSA
Runid	NOM1L301	NOM1L310	NOM1L350	NOM1L201	NOM1L210	NOM1L250
Outbound MB/sec	935	1,131	1,136	785	1,159	1,155
Total CPU msec/Outbound MB	0.98599	1.03802	1.09419	1.07975	1.26488	1.26320
Emul CPU msec/Outbound MB	0.98513	1.03271	1.08363	1.07873	1.25626	1.25887
CP CPU msec/Outbound MB	0.00086	0.00531	0.01056	0.00102	0.00862	0.00433
% difference
Outbound MB/sec	94.39%	23.88%	13.94%	74.44%	11.23%	11.49%
Total CPU msec/Outbound MB	-50.88%	-36.18%	-21.85%	-45.89%	-20.79%	-13.33%
Emul CPU msec/Outbound MB	-15.00%	-18.51%	2.21%	-10.29%	0.15%	11.85%
CP CPU msec/Outbound MB	-99.90%	-98.52%	-96.89%	-99.87%	-97.48	-98.70%
Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The outbound data rate of the OSA runs was 11.23% to 94.39% higher than the equivalent vswitch runs when running the STR workload in an SMT-1 configuration with an MTU size of 1492. The total CPU per outbound MB rate of the OSA runs was between 13.33% to 50.88% lower than the equivalent vswitch runs.

Table 5. Results of STR runs with MTU size of 1492 and using SMT-2
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	STR	STR	STR	STR	STR	STR
MTU size	1492	1492	1492	1492	1492	1492
SMT mode	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2
VSwitch
Runid	NVM2L301	NVM2L310	NVM2L350	NVM2L201	NVM2L210	NVM2L250
Outbound MB/sec	448	812	885	380	876	843
Total CPU msec/Outbound MB	2.21920	1.81527	1.82260	2.23289	1.92123	1.85647
Emul CPU msec/Outbound MB	1.38638	1.40640	1.40904	1.39026	1.48630	1.43416
CP CPU msec/Outbound MB	0.83282	0.40887	0.41356	0.84263	0.43493	0.42231
OSA
Runid	NOM2L301	NOM2L310	NOM2L350	NOM2L201	NOM2L210	NOM2L250
Outbound MB/sec	875	1,129	1,121	761	1,075	1,072
Total CPU msec/Outbound MB	1.05371	1.23738	1.41659	1.12431	1.54884	1.51026
Emul CPU msec/Outbound MB	1.05269	1.23206	1.40856	1.12326	1.53395	1.50466
CP CPU msec/Outbound MB	0.00102	0.00532	0.00803	0.00105	0.01489	0.00560
% difference
Outbound MB/sec	95.31%	39.04%	26.67%	100.26%	22.72%	27.16%
Total CPU msec/Outbound MB	-52.52%	-31.83%	-22.28%	-49.65%	-19.38%	-18.65%
Emul CPU msec/Outbound MB	-24.07%	-12.40%	-0.03%	-19.21%	3.21%	4.92%
CP CPU msec/Outbound MB	-99.88%	-98.70%	-98.06%	-99.88%	-96.58%	-98.67%
Notes: 8561-T01, 1 dedicated IFL core, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The outbound data rate of the OSA runs was 22.72% to 100.26% higher than the equivalent vswitch runs when running the STR workload in an SMT-2 configuration with an MTU size of 1492. The total CPU per outbound MB rate of the OSA runs was between 18.65% to 52.52% lower than the equivalent vswitch runs.

Table 6. Results of STR runs with MTU size of 8992 and using SMT-1
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	STR	STR	STR	STR	STR	STR
MTU size	8992	8992	8992	8992	8992	8992
SMT mode	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1	SMT-1
VSwitch
Runid	NVL1L301	NVL1L310	NVL1L350	NVL1L201	NVL1L210	NVL1L250
Outbound MB/sec	1011	1156	1154	747	1158	1156
Total CPU msec/Outbound MB	0.64857	0.61808	0.63648	0.63387	0.58109	0.60580
Emul CPU msec/Outbound MB	0.38586	0.38901	0.39636	0.38541	0.37332	0.39273
CP CPU msec/Outbound MB	0.26271	0.22907	0.24012	0.24846	0.20777	0.21307
OSA
Runid	NOL1L301	NOL1L310	NOL1L350	NOL1L201	NOL1L210	NOL1L250
Outbound MB/sec	1,121	1,153	1,154	1,113	1,157	1,156
Total CPU msec/Outbound MB	0.50696	0.54293	0.56205	0.54403	0.56206	0.56522
Emul CPU msec/Outbound MB	0.50000	0.53374	0.54948	0.53504	0.55315	0.55303
CP CPU msec/Outbound MB	0.00696	0.00919	0.01257	0.00899	0.00891	0.01219
% difference
Outbound MB/sec	10.88%	-0.26%	0.00%	49.00%	-0.09%	0.00%
Total CPU msec/Outbound MB	-21.83%	-12.16%	-11.69%	-14.17%	-3.27%	-6.70%
Emul CPU msec/Outbound MB	29.58%	37.20%	38.63%	38.82%	48.17%	40.82%
CP CPU msec/Outbound MB	-97.35%	-95.99%	-94.77%	-96.38%	-95.71%	-94.28%
Notes: 8561-T01, 2 dedicated IFL cores, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The outbound data rate of the OSA runs was 0.26% lower to 49.00% higher than the equivalent vswitch runs when running the STR workload in an SMT-1 configuration with an MTU size of 8992. The total CPU per outbound MB rate of the OSA runs was between 3.27% to 21.83% lower than the equivalent vswitch runs.

Table 7. Results of STR runs with MTU size of 8992 and using SMT-2
Transport mode	Layer 3	Layer 3	Layer 3	Layer 2	Layer 2	Layer 2
Number of Clients	1	10	50	1	10	50
Workload	STR	STR	STR	STR	STR	STR
MTU size	8992	8992	8992	8992	8992	8992
SMT mode	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2	SMT-2
VSwitch
Runid	NVL2L301	NVL2L310	NVL2L350	NVL2L201	NVL2L210	NVL2L250
Outbound MB/sec	916	1156	1155	598	1157	1156
Total CPU msec/Outbound MB	0.73362	0.71678	0.73792	0.70084	0.68634	0.74334
Emul CPU msec/Outbound MB	0.45382	0.46557	0.47506	0.43779	0.45748	0.50389
CP CPU msec/Outbound MB	0.27980	0.25121	0.26286	0.26305	0.22886	0.23945
OSA
Runid	NOL2L301	NOL2L310	NOL2L350	NOL2L201	NOL2L210	NOL2L250
Outbound MB/sec	1,137.00	1,145.00	1,154.00	1,123.00	1,156.00	1,156.00
Total CPU msec/Outbound MB	0.53369	0.59712	0.63527	0.57106	0.61090	0.63503
Emul CPU msec/Outbound MB	0.52647	0.58777	0.62227	0.56215	0.60156	0.62301
CP CPU msec/Outbound MB	0.00722	0.00935	0.01300	0.00891	0.00934	0.01202
% difference
Outbound MB/sec	24.13%	-0.95%	-0.09%	87.79%	-0.09%	0.00%
Total CPU msec/Outbound MB	-27.25%	-16.69%	-13.91%	-18.52%	-10.99%	-14.57%
Emul CPU msec/Outbound MB	16.01%	26.25%	30.99%	28.41%	31.49%	23.64%
CP CPU msec/Outbound MB	-97.42%	-96.28%	-95.05%	-96.61%	-95.92%	-94.98%
Notes: 8561-T01, 1 dedicated IFL core, 512 GB central storage, OSA-Express6 10GbE card, z/VM 7.2 of May 7, 2020, Linux SLES 12 SP1.

The outbound data rate of the OSA runs was 0.95% lower to 87.79% higher than the equivalent vswitch runs when running the STR workload in an SMT-2 configuration with an MTU size of 8992. The total CPU per outbound MB rate of the OSA runs was between 10.99% to 27.25% better than the equivalent vswitch runs.

Summary

The results of the experiments conducted for this report indicate that for a request-response (RR) workload, Linux guests using a dedicated OSA experience a greater ETR than Linux guests using a vswitch. Further, for a streaming (STR) workload, Linux guests using a dedicated OSA experience equal or greater outbound data rate than Linux guests using a vswitch. The degree of improvement varies depending on the number of concurrent connections used between the two guests, especially in the case of a streaming workload.

Contents | Previous | Next