Layer 3 and Layer 2 Comparisons
In addition to the measurements described in the z/VM 5.1.0 report section Virtual Switch Layer 2 Support, similar measurements were also done for:
- Virtual switch on z/VM 5.2.0
- Linux directly attached to an OSA-Express2 1 Gigabit Ethernet (GbE) card
- Linux directly attached to an OSA-Express2 10 GbE card
The 10 GbE information is provided in cooperation with IBM's STG OSA Performance Analysis & Measurement teams.
The Application Workload Modeler (AWM) product was used to drive request-response (RR) and streaming (STR) workloads over IPv4 in both layer 3 and layer 2 modes. Refer to AWM workload for a description of the workload used for the virtual switch and the 1 GbE (real QDIO) measurements. The workload for the 10 GbE measurement was the same except for the duration and the number of client-server pairs; for that measurement, the requests were repeated for 600 seconds.
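The two traffic patterns differ in shape: RR repeatedly exchanges a small request and reply over a connection and is reported in transactions per second, while STR pushes a continuous byte stream in one direction and is reported in MB per second. The following is a minimal sketch of those two patterns; it is not AWM itself, and the host, port, message sizes, and duration are illustrative assumptions.

```python
# Minimal sketch of RR and STR traffic patterns (not AWM itself).
# Host, port, message sizes, and duration are illustrative assumptions.
import socket
import time

def rr_client(host="10.0.0.1", port=5001, duration=60, req_size=200, rsp_size=1000):
    """Request-response: send a small request, wait for the full reply, repeat."""
    trans = 0
    end = time.time() + duration
    with socket.create_connection((host, port)) as s:
        while time.time() < end:
            s.sendall(b"x" * req_size)
            received = 0
            while received < rsp_size:          # read until the whole reply arrives
                chunk = s.recv(rsp_size - received)
                if not chunk:
                    return trans
                received += len(chunk)
            trans += 1
    return trans                                 # completed transactions

def str_client(host="10.0.0.1", port=5002, duration=60, chunk_size=20 * 1024):
    """Streaming: push data continuously in one direction for the interval."""
    sent = 0
    end = time.time() + duration
    with socket.create_connection((host, port)) as s:
        while time.time() < end:
            s.sendall(b"x" * chunk_size)
            sent += chunk_size
    return sent / (1024 * 1024)                  # MB sent over the interval
```

The measured runs used multiple concurrent client-server pairs (1, 10, 50, and so on); the sketch shows only a single connection of each type.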
CP monitor data was captured for the LPAR and reduced using Performance Toolkit for VM. The results shown here are for the client side only. The following table shows the differences in environment among the three measurements discussed here.
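The CPU columns in the tables that follow are CPU time normalized by work completed: milliseconds of CPU per transaction for RR and per MB for STR, with the emulation (Emul) and CP components summing to the total. A minimal sketch of that arithmetic, using made-up sample values rather than measured data:

```python
# Hedged sketch of the per-unit CPU metrics used in the tables.
# The sample inputs below are illustrative, not measured values.

def cpu_msec_per_unit(cpu_seconds, units_completed):
    """Normalize CPU seconds over the run to msec per transaction (RR) or per MB (STR)."""
    return (cpu_seconds * 1000.0) / units_completed

# Example: a run that completed 2000 trans/sec for 200 seconds,
# consuming 12 CPU-seconds of emulation time and 20 CPU-seconds of CP time.
trans = 2000 * 200
emul = cpu_msec_per_unit(12.0, trans)   # emulation (guest) component -> 0.03
cp = cpu_msec_per_unit(20.0, trans)     # CP (Control Program) component -> 0.05
total = emul + cp                       # total CPU msec/trans, as reported -> 0.08
print(f"Emul {emul:.3f}  CP {cp:.3f}  Total {total:.3f} msec/trans")
```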
Table 1. Environment Differences
Measurement | z/VM level | Processor type | # of real CPUs | # of virt CPUs | # of LPARs
---|---|---|---|---|---
Virtual Switch 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
Direct OSA 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
Direct OSA 10 GbE | 5.1.0 | 2084-B16 | 8 | 3 | 1 |
In general, Layer 2 delivered higher throughput than Layer 3 (by 0.2% to 4.0%). With the virtual switch, Layer 2 used the same or less CPU time (changes of -4.7% to 0%). Going directly through the OSA 1 GbE card, Layer 2 CPU time ranged from 0.6% lower to 7.3% higher than Layer 3. For the 10 GbE card, Layer 2 throughput was 0% to 10% higher than Layer 3, and CPU time changes ranged from -75% to 1%. Results can vary with the level of z/VM, the OSA card, and the workload.
For the virtual switch, Layer 2 performance improved dramatically on z/VM 5.2.0 relative to z/VM 5.1.0 (throughput increased by 2.2% to 54.8% and CPU time per unit of work changed by -3.1% to -30.1%), while Layer 3 performance was essentially unchanged. As a result, the relative performance of Layer 2 and Layer 3 changed significantly. This was not the case when going directly to OSA, where both Layer 2 and Layer 3 showed little change from z/VM 5.1.0 to z/VM 5.2.0.
The following three tables are included for background information. They show a comparison between z/VM 5.1.0 and z/VM 5.2.0 for both Layer 2 and Layer 3 for all three virtual switch workloads. This information is then used to better understand changes in results when comparing Layer 2 and Layer 3. Improvements in z/VM 5.2.0 for guest LAN, mentioned in CP Regression Measurements, are apparent in the results. Note that measurements going directly to OSA are not affected much by going to z/VM 5.2.0 since guest LAN is not involved.
Table 2. VSwitch base - 1 GbE - RR - 1492
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
trans/sec | -3.8% | -2.4% | -0.1% |
Total CPU msec/trans | 3.2% | 2.4% | 0.0% |
Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
CP CPU msec/trans | 5.3% | 5.0% | 0.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
trans/sec | 54.8% | 31.7% | 25.4% |
Total CPU msec/trans | -3.1% | -6.5% | -5.6% |
Emul CPU msec/trans | -7.4% | -4.3% | -4.8% |
CP CPU msec/trans | 0.0% | -8.7% | -6.7% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 3. VSwitch Base - 1 GbE - STR - 1492
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | -0.7% | 0.3% | -0.2% |
Total CPU msec/MB | 0.7% | 0.3% | 2.8% |
Emul CPU msec/MB | 0.7% | -0.9% | 1.5% |
CP CPU msec/MB | 0.7% | 1.5% | 4.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 4.4% | 28.1% | 3.0% |
Total CPU msec/MB | -28.9% | -18.0% | -13.6% |
Emul CPU msec/MB | -21.1% | -15.5% | -11.7% |
CP CPU msec/MB | -34.8% | -20.3% | -15.5% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 4. VSwitch base - 1 GbE - STR - 8992
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 0.3% | 0.0% | 0.0% |
Total CPU msec/MB | -3.5% | 2.0% | 8.7% |
Emul CPU msec/MB | -14.5% | -12.6% | -9.8% |
CP CPU msec/MB | 4.7% | 15.8% | 26.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 18.4% | 6.2% | 2.2% |
Total CPU msec/MB | -30.7% | -13.0% | -9.4% |
Emul CPU msec/MB | -32.4% | -17.8% | -16.9% |
CP CPU msec/MB | -29.6% | -9.1% | -3.4% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
The following tables compare each measurement using layer 3 against the same measurement using layer 2. Each table includes a percentage-difference section showing the percent increase (or decrease) for layer 2 relative to layer 3.
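Assuming the conventional definition, the percent difference is taken with the layer 3 result as the base, so a positive throughput entry and a negative CPU-time entry both favor layer 2. A minimal sketch of that calculation, checked against the one-client column of Table 5 below:

```python
def pct_diff(layer3, layer2):
    """Percent change of the layer 2 result relative to the layer 3 (base) result."""
    return (layer2 - layer3) / layer3 * 100.0

# Checked against the one-client column of Table 5:
print(f"{pct_diff(2124.07, 2183.77):.1f}%")  # trans/sec -> 2.8% (layer 2 higher)
print(f"{pct_diff(0.065, 0.063):.1f}%")      # Total CPU msec/trans -> -3.1% (layer 2 lower)
```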
Table 5. VSwitch - 1 GbE - RR - 1492
Number of clients | 01 | 10 | 50 |
runid (layer3) | vl4rn012 | vl4rn101 | vl4rn503 |
trans/sec | 2124.07 | 13823.21 | 31914.04 |
Total CPU msec/trans | 0.065 | 0.043 | 0.034 |
Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
CP CPU msec/trans | 0.040 | 0.021 | 0.014 |
runid (layer2) | vl5rn012 | vl5rn101 | vl5rn503 |
trans/sec | 2183.77 | 14336.50 | 32521.90 |
Total CPU msec/trans | 0.063 | 0.043 | 0.034 |
Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
CP CPU msec/trans | 0.038 | 0.021 | 0.014 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
trans/sec | 2.8% | 3.7% | 1.9% |
Total CPU msec/trans | -3.1% | 0.0% | 0.0% |
Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
CP CPU msec/trans | -5.0% | 0.0% | 0.0% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
trans/sec | -35.9% | -23.2% | -19.0% |
Total CPU msec/trans | 3.2% | 9.5% | 4.9% |
Emul CPU msec/trans | 8.0% | 4.5% | 5.0% |
CP CPU msec/trans | 0.0% | 15.0% | 4.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When traffic goes through a virtual switch to the 1 GbE card, layer 2 gets higher throughput. CPU time is the same except in the one client-server pair case, where layer 2 uses less. Notice the marked improvement for layer 2 on z/VM 5.2.0 relative to z/VM 5.1.0; this holds for this workload and for both streaming workloads that follow.
Table 6. VSwitch - 1 GbE - STR - 1492
Number of clients | 01 | 10 | 50 |
runid (layer 3) | vl4sn013 | vl4sn103 | vl4sn502 |
MB/sec | 44.6 | 92.4 | 93.3 |
Total CPU msec/MB | 7.13 | 6.93 | 6.95 |
Emul CPU msec/MB | 3.27 | 3.38 | 3.39 |
CP CPU msec/MB | 3.86 | 3.55 | 3.56 |
runid (layer2) | vl5sn013 | vl5sn103 | vl5sn502 |
MB/sec | 44.8 | 91.3 | 92.2 |
Total CPU msec/MB | 6.96 | 6.86 | 6.88 |
Emul CPU msec/MB | 3.35 | 3.48 | 3.51 |
CP CPU msec/MB | 3.62 | 3.37 | 3.36 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
MB/sec | 0.4% | -1.2% | -1.2% |
Total CPU msec/MB | -2.3% | -1.0% | -1.0% |
Emul CPU msec/MB | 2.3% | 3.1% | 3.7% |
CP CPU msec/MB | -6.2% | -4.9% | -5.5% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
MB/sec | -4.5% | -22.6% | -4.3% |
Total CPU msec/MB | 38.2% | 21.0% | 17.7% |
Emul CPU msec/MB | 30.4% | 20.9% | 19.2% |
CP CPU msec/MB | 44.9% | 21.1% | 16.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
In the virtual switch environment, the STR workload gets slightly lower throughput and uses slightly less CPU msec/MB with layer 2.
Table 7. VSwitch - 1 GbE - STR - 8992
Number of clients | 01 | 10 | 50 |
runid (layer3) | vl4sj011 | vl4sj103 | vl4sj502 |
MB/sec | 36.4 | 118.0 | 118.0 |
Total CPU msec/MB | 4.18 | 4.25 | 4.68 |
Emul CPU msec/MB | 1.59 | 1.76 | 1.88 |
CP CPU msec/MB | 2.58 | 2.49 | 2.80 |
runid (layer2) | vl5sj011 | vl5sj103 | vl5sj502 |
MB/sec | 36.7 | 118.0 | 118.0 |
Total CPU msec/MB | 3.98 | 4.12 | 4.39 |
Emul CPU msec/MB | 1.53 | 1.75 | 1.81 |
CP CPU msec/MB | 2.45 | 2.37 | 2.58 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
MB/sec | 0.8% | 0.0% | 0.0% |
Total CPU msec/MB | -4.7% | -3.2% | -6.2% |
Emul CPU msec/MB | -4.2% | -1.0% | -3.6% |
CP CPU msec/MB | -5.1% | -4.7% | -7.9% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
MB/sec | -15.2% | -5.8% | -2.1% |
Total CPU msec/MB | 35.1% | 13.6% | 14.1% |
Emul CPU msec/MB | 23.8% | 4.7% | 5.8% |
CP CPU msec/MB | 43.6% | 21.8% | 22.0% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When the large MTU size is used, throughput is the same for layer 2 and layer 3, and CPU msec/MB is less for layer 2.
Table 8. OSA - 1 GbE - RR - 1492
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | lorn0102 | lorn1002 | lorn5001 |
trans/sec | 2538.62 | 14340.35 | 31996.02 |
Total CPU msec/trans | .0559 | .0349 | .0244 |
Emul CPU msec/trans | .0244 | .0206 | .0187 |
CP CPU msec/trans | .0315 | .0143 | .0057 |
run ID (layer 2) | lorn0102 | lorn1002 | lorn5001 |
trans/sec | 2567.58 | 14681.64 | 32797.35 |
Total CPU msec/trans | .0561 | .0347 | .0245 |
Emul CPU msec/trans | .0249 | .0204 | .0187 |
CP CPU msec/trans | .0312 | .0143 | .0058 |
%diff layer 3 to layer 2 | |||
trans/sec | 1.1% | 2.4% | 2.5% |
Total CPU msec/trans | 0.4% | -0.6% | 0.4% |
Emul CPU msec/trans | 2.0% | -1.0% | 0.0% |
CP CPU msec/trans | -1.0% | 0.0% | 1.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
Going directly over the 1 GbE card, layer 2 gets better throughput, and CPU time is essentially the same as for layer 3.
Table 9. OSA - 1 GbE - STR - 1492
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | losn0101 | losn1002 | losn5001 |
MB/sec | 54.5 | 112.0 | 112.1 |
Total CPU msec/MB | 5.58 | 4.82 | 5.41 |
Emul CPU msec/MB | 3.52 | 3.71 | 4.30 |
CP CPU msec/MB | 2.06 | 1.11 | 1.11 |
run ID (layer 2) | losn0101 | losn1002 | losn5001 |
MB/sec | 54.6 | 112.0 | 112.0 |
Total CPU msec/MB | 5.64 | 4.95 | 5.64 |
Emul CPU msec/MB | 3.59 | 3.80 | 4.45 |
CP CPU msec/MB | 2.05 | 1.14 | 1.20 |
%diff layer 3 to layer 2 | |||
MB/sec | 0.2% | 0.0% | -0.1% |
Total CPU msec/MB | 1.1% | 2.6% | 4.4% |
Emul CPU msec/MB | 1.9% | 2.4% | 3.4% |
CP CPU msec/MB | -0.2% | 3.2% | 8.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
With the streaming workload, throughput was the same, but layer 2 had a higher CPU msec/MB than layer 3. The difference increased as the workload increased. This was true for both MTU sizes.
Table 10. OSA - 1 GbE - STR - 8992
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | losj0102 | losj1003 | losj5001 |
MB/sec | 64.7 | 118.0 | 118.0 |
Total CPU msec/MB | 4.70 | 4.19 | 4.27 |
Emul CPU msec/MB | 2.13 | 2.10 | 2.17 |
CP CPU msec/MB | 2.57 | 2.08 | 2.10 |
run ID (layer 2) | losj0102 | losj1003 | losj5001 |
MB/sec | 67.3 | 118.0 | 118.0 |
Total CPU msec/MB | 4.87 | 4.49 | 4.56 |
Emul CPU msec/MB | 2.20 | 2.19 | 2.27 |
CP CPU msec/MB | 2.68 | 2.31 | 2.29 |
%diff layer 3 to layer 2 | |||
MB/sec | 4.0% | 0.0% | 0.0% |
Total CPU msec/MB | 3.7% | 7.3% | 6.7% |
Emul CPU msec/MB | 3.1% | 4.0% | 4.7% |
CP CPU msec/MB | 4.2% | 10.7% | 8.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
When the MTU size is larger, layer 2 shows slightly higher throughput than layer 3 for one client-server pair.
Table 11. OSA - 10 GbE - RR - 1492
Number of clients | 01 | 10 | 50 | 100 | 200 | 300 | 400 | 500 |
run ID (layer 3) | ror001 | ror010 | ror050 | ror100 | ror200 | ror300 | ror400 | ror500 |
trans/sec | 2429.0 | 14916.0 | 34774.0 | 47574.0 | 58185.0 | 64578.0 | 69523.0 | 70533.0 |
Total CPU msec/trans | 2.40 | 0.62 | 0.31 | 0.24 | 0.22 | 0.21 | 0.22 | 0.23 |
Emul CPU msec/trans | 0.76 | 0.13 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
CP CPU msec/trans | 1.65 | 0.49 | 0.26 | 0.21 | 0.20 | 0.19 | 0.20 | 0.21 |
run ID (layer 2) | ro2001 | r02010 | r02050 | r02100 | r02200 | r02300 | r02400 | r02500 |
trans/sec | 2513.0 | 14951.0 | 36000.0 | 49579.0 | 60733.0 | 67568.0 | 72512.0 | 75060.0 |
Total CPU msec/trans | 0.60 | 0.40 | 0.27 | 0.23 | 0.21 | 0.21 | 0.21 | 0.22 |
Emul CPU msec/trans | 0.25 | 0.13 | 0.06 | 0.04 | 0.03 | 0.02 | 0.02 | 0.02 |
CP CPU msec/trans | 0.38 | 0.27 | 0.21 | 0.19 | 0.19 | 0.18 | 0.19 | 0.20 |
%diff layer 3 to layer 2 | ||||||||
trans/sec | 3% | 0% | 4% | 4% | 4% | 5% | 4% | 6% |
Total CPU msec/trans | -75% | -35% | -14% | -4% | -4% | 1% | -4% | -3% |
Emul CPU msec/trans | -66% | 0% | 10% | 10% | 6% | 6% | 7% | 5% |
CP CPU msec/trans | -77% | -45% | -20% | -7% | -6% | 6% | -5% | -4% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs |
As the workload increases, throughput is higher for Layer 2 than Layer 3 and CPU time is somewhat less. For the lighter workloads, CPU time for Layer 2 is considerably less than for Layer 3. We plan to investigate why CPU time is so much lower for this workload, as well as for the STR workload that follows.
Table 12. OSA - 10 GbE - STR - 1492
Number of clients | 01 | 08 | 16 | 32 |
run ID (layer 3) | sor001 | sor010 | sor016 | sor032 |
MB/sec | 72.0 | 145.0 | 144.0 | 136.0 |
Total CPU msec/MB | 88.89 | 45.24 | 49.44 | 55.88 |
Emul CPU msec/MB | 14.44 | 7.17 | 7.78 | 8.82 |
CP CPU msec/MB | 74.44 | 37.52 | 41.67 | 47.06 |
run ID (layer 2) | so2001 | s02008 | s02016 | s02032 |
MB/sec | 74.0 | 157.0 | 156.0 | 149.0 |
Total CPU msec/MB | 55.14 | 45.86 | 46.67 | 52.08 |
Emul CPU msec/MB | 16.22 | 7.64 | 7.69 | 8.05 |
CP CPU msec/MB | 40.00 | 38.22 | 38.97 | 44.03 |
%diff layer 3 to layer 2 | ||||
MB/sec | 3% | 8% | 8% | 10% |
Total CPU msec/MB | -38% | 1% | -6% | -7% |
Emul CPU msec/MB | 12% | 7% | -1% | -9% |
CP CPU msec/MB | -46% | 2% | -6% | -6% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs |
The same trend seen for this card with the RR workload holds for streaming: throughput is higher for Layer 2, heavier workloads show somewhat less CPU time, and lighter workloads show significantly less.