
Layer 3 and Layer 2 Comparisons

Introduction 

In addition to the measurements described in the z/VM 5.1.0 report section Virtual Switch Layer 2 Support, similar measurements were done for:

  • Virtual switch on z/VM 5.2.0
  • Linux directly attached to an OSA-Express2 1 Gigabit Ethernet (GbE) card
  • Linux directly attached to an OSA-Express2 10 GbE card

The OSA-Express features can support two transport modes: Layer 2 (Link Layer or MAC Layer) and Layer 3 (Network Layer). Both the virtual switch and Linux are then configured to support the desired capability (Layer 2 or Layer 3). In Layer 2 mode, each port is referenced by its Media Access Control (MAC) address instead of by its Internet Protocol (IP) address. Data is transported and delivered in Ethernet frames, which provides the ability to handle protocol-independent traffic, both IP and non-IP protocols such as IPX, NetBIOS, or SNA.
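
The transport mode is selected at configuration time on both sides: on the z/VM side when the virtual switch is defined (ETHERNET for Layer 2, IP for Layer 3), and on the Linux side through the layer2 attribute that the qeth driver exposes in sysfs. The following Python sketch illustrates the Linux side only; the device bus ID 0.0.0600 is an example and is not the configuration used in these measurements, and the device must be offline while the attribute is changed.

    # Minimal sketch: switch a Linux qeth device between Layer 2 and Layer 3
    # mode through sysfs. The bus ID below is illustrative only; the device
    # must be offline while the transport mode is changed.
    from pathlib import Path

    def set_qeth_transport_mode(bus_id: str, layer2: bool) -> None:
        dev = Path("/sys/bus/ccwgroup/drivers/qeth") / bus_id
        (dev / "online").write_text("0")   # take the device offline first
        # 1 = Layer 2 (Ethernet frames, MAC addressing)
        # 0 = Layer 3 (IP datagrams, IP addressing)
        (dev / "layer2").write_text("1" if layer2 else "0")
        (dev / "online").write_text("1")   # bring it back online

    if __name__ == "__main__":
        set_qeth_transport_mode("0.0.0600", layer2=True)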

Acknowledgment 

The 10 GbE information is provided in cooperation with IBM's STG OSA Performance Analysis & Measurement teams.

Methodology 

The Application Workload Modeler (AWM) product was used to drive request-response (RR) and streaming (STR) workloads over IPv4 in both layer 3 and layer 2 mode. Refer to AWM workload for a description of the workload used for the virtual switch and for the 1 GbE (real QDIO) measurements. The workload for the 10 GbE measurement was the same except for the duration and the number of client-server pairs; for this measurement the requests were repeated for 600 seconds.
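
As a rough guide to how the metrics in the tables below are formed, the following sketch shows only the arithmetic; it is not the actual Performance Toolkit reduction, and the raw totals it takes as input are assumed. The 600-second duration applies to the 10 GbE measurement described above.

    # Sketch of the arithmetic behind the request-response (RR) metrics.
    def rr_metrics(transactions: int, total_cpu_msec: float,
                   emul_cpu_msec: float, elapsed_sec: float = 600.0):
        """Return trans/sec and Total/Emul/CP CPU msec per transaction."""
        trans_per_sec = transactions / elapsed_sec
        total_per_trans = total_cpu_msec / transactions
        emul_per_trans = emul_cpu_msec / transactions
        # Total CPU time is emulation (guest) time plus CP time, so the CP
        # component is the remainder; compare Table 5, where
        # 0.025 + 0.040 = 0.065 msec/trans for the one-client layer 3 run.
        cp_per_trans = total_per_trans - emul_per_trans
        return trans_per_sec, total_per_trans, emul_per_trans, cp_per_trans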

CP monitor data was captured for the LPAR and reduced using Performance Toolkit for VM. The results shown here are for the client side only. The following table summarizes the differences in the environments for the three measurements discussed here.

Table 1. Environment Differences

Measurement             z/VM level  processor type  # of real CPUs  # of virt CPUs  # of LPARs
Virtual Switch 1 GbE    5.2.0       2084-324        2               1               2
Direct OSA 1 GbE        5.2.0       2084-324        2               1               2
Direct OSA 10 GbE       5.1.0       2084-B16        8               3               1

Summary of Results 

In general, Layer 2 has higher throughput (between 0.2% and 4.0%) than Layer 3. When using the virtual switch, CPU time is lower for Layer 2 (between -4.7% and 0%). When going directly through OSA with the 1 GbE card, Layer 2 CPU time ranged from -0.6% to 7.3% relative to Layer 3. For the 10 GbE card, Layer 2 throughput was between 0% and 10% higher than Layer 3, and CPU time ranged from -75% to 1%. Results can vary based on the level of z/VM, the OSA card, and the workload.

For the virtual switch, Layer 2 performance improved dramatically on z/VM 5.2.0 relative to z/VM 5.1.0 (throughput increased by 2.2% to 54.8% and CPU time decreased by 3.1% to 30.1%), while Layer 3 performance was essentially unchanged. As a result, the relative performance of Layer 2 and Layer 3 changed significantly. This was not the case when going directly to OSA, because both Layer 2 and Layer 3 showed little change in performance from z/VM 5.1.0 to z/VM 5.2.0.

Detailed Results 

The following three tables are included for background information. They compare z/VM 5.1.0 and z/VM 5.2.0 for both Layer 2 and Layer 3 for all three virtual switch workloads. This information is then used to better understand the changes in results when comparing Layer 2 and Layer 3. Improvements in z/VM 5.2.0 for guest LAN, mentioned in CP Regression Measurements, are apparent in the results. Note that measurements going directly to OSA are largely unaffected by the move to z/VM 5.2.0 because guest LAN is not involved.

Table 2. VSwitch base - 1 GbE - RR - 1492

Layer 3      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
trans/sec -3.8% -2.4% -0.1%
Total CPU msec/trans 3.2% 2.4% 0.0%
Emul CPU msec/trans 0.0% 0.0% 0.0%
CP CPU msec/trans 5.3% 5.0% 0.0%
Layer 2      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
trans/sec 54.8% 31.7% 25.4%
Total CPU msec/trans -3.1% -6.5% -5.6%
Emul CPU msec/trans -7.4% -4.3% -4.8%
CP CPU msec/trans 0.0% -8.7% -6.7%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

Table 3. VSwitch base - 1 GbE - STR - 1492

Layer 3      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
MB/sec -0.7% 0.3% -0.2%
Total CPU msec/MB 0.7% 0.3% 2.8%
Emul CPU msec/MB 0.7% -0.9% 1.5%
CP CPU msec/MB 0.7% 1.5% 4.0%
Layer 2      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
MB/sec 4.4% 28.1% 3.0%
Total CPU msec/MB -28.9% -18.0% -13.6%
Emul CPU msec/MB -21.1% -15.5% -11.7%
CP CPU msec/MB -34.8% -20.3% -15.5%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

Table 4. VSwitch base - 1 GbE - STR - 8992

Layer 3      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
MB/sec 0.3% 0.0% 0.0%
Total CPU msec/MB -3.5% 2.0% 8.7%
Emul CPU msec/MB -14.5% -12.6% -9.8%
CP CPU msec/MB 4.7% 15.8% 26.0%
Layer 2      
Number of clients 01 10 50
%diff 5.1.0 to 5.2.0      
MB/sec 18.4% 6.2% 2.2%
Total CPU msec/MB -30.7% -13.0% -9.4%
Emul CPU msec/MB -32.4% -17.8% -16.9%
CP CPU msec/MB -29.6% -9.1% -3.4%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

The following tables compare each measurement using layer 3 against the same measurement using layer 2. Each table includes a percentage difference section that shows the percent increase (or decrease) for layer 2 relative to layer 3.
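
For reference, each %diff value is simply the layer 2 measurement expressed as a percent change from the corresponding layer 3 measurement. A minimal sketch, using the one-client z/VM 5.2.0 values from Table 5:

    def pct_diff(layer3_value: float, layer2_value: float) -> float:
        """Percent increase (positive) or decrease (negative) of the layer 2
        value relative to the layer 3 value, as in the %diff rows."""
        return (layer2_value - layer3_value) / layer3_value * 100.0

    # One-client values from Table 5:
    print(round(pct_diff(2124.07, 2183.77), 1))  # trans/sec:             2.8
    print(round(pct_diff(0.065, 0.063), 1))      # Total CPU msec/trans: -3.1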

Table 5. VSwitch - 1 GbE - RR - 1492

Number of clients 01 10 50
runid (layer3) vl4rn012 vl4rn101 vl4rn503
trans/sec 2124.07 13823.21 31914.04
Total CPU msec/trans 0.065 0.043 0.034
Emul CPU msec/trans 0.025 0.022 0.020
CP CPU msec/trans 0.040 0.021 0.014
runid (layer2) vl5rn012 vl5rn101 vl5rn503
trans/sec 2183.77 14336.50 32521.90
Total CPU msec/trans 0.063 0.043 0.034
Emul CPU msec/trans 0.025 0.022 0.020
CP CPU msec/trans 0.038 0.021 0.014
z/VM 5.2.0      
%diff layer3 to layer2      
trans/sec 2.8% 3.7% 1.9%
Total CPU msec/trans -3.1% 0.0% 0.0%
Emul CPU msec/trans 0.0% 0.0% 0.0%
CP CPU msec/trans -5.0% 0.0% 0.0%
z/VM 5.1.0      
%diff layer3 to layer2      
trans/sec -35.9% -23.2% -19.0%
Total CPU msec/trans 3.2% 9.5% 4.9%
Emul CPU msec/trans 8.0% 4.5% 5.0%
CP CPU msec/trans 0.0% 15.0% 4.8%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

When traffic goes through a virtual switch to the 1 GbE card, layer 2 gets higher throughput. CPU time is the same, except for the one client-server pair measurement, where layer 2 uses less CPU time. Notice the marked improvement for layer 2 when using z/VM 5.2.0 over z/VM 5.1.0; this holds for this workload and for both streaming workloads that follow.

Table 6. VSwitch - 1 GbE - STR - 1492

Number of clients 01 10 50
runid (layer 3) vl4sn013 vl4sn103 vl4sn502
MB/sec 44.6 92.4 93.3
Total CPU msec/MB 7.13 6.93 6.95
Emul CPU msec/MB 3.27 3.38 3.39
CP CPU msec/MB 3.86 3.55 3.56
runid (layer 2) vl5sn013 vl5sn103 vl5sn502
MB/sec 44.8 91.3 92.2
Total CPU msec/MB 6.96 6.86 6.88
Emul CPU msec/MB 3.35 3.48 3.51
CP CPU msec/MB 3.62 3.37 3.36
z/VM 5.2.0      
%diff layer3 to layer2      
MB/sec 0.4% -1.2% -1.2%
Total CPU msec/MB -2.3% -1.0% -1.0%
Emul CPU msec/MB 2.3% 3.1% 3.7%
CP CPU msec/MB -6.2% -4.9% -5.5%
z/VM 5.1.0      
%diff layer3 to layer2      
MB/sec -4.5% -22.6% -4.3%
Total CPU msec/MB 38.2% 21.0% 17.7%
Emul CPU msec/MB 30.4% 20.9% 19.2%
CP CPU msec/MB 44.9% 21.1% 16.2%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

In the virtual switch environment, the STR workload gets slightly less throughput and uses slightly less CPU msec/MB when using layer 2.

Table 7. VSwitch - 1 GbE - STR - 8992

Number of clients 01 10 50
runid (layer3) vl4sj011 vl4sj103 vl4sj502
MB/sec 36.4 118.0 118.0
Total CPU msec/MB 4.18 4.25 4.68
Emul CPU msec/MB 1.59 1.76 1.88
CP CPU msec/MB 2.58 2.49 2.80
runid (layer2) vl5sj011 vl5sj103 vl5sj502
MB/sec 36.7 118.0 118.0
Total CPU msec/MB 3.98 4.12 4.39
Emul CPU msec/MB 1.53 1.75 1.81
CP CPU msec/MB 2.45 2.37 2.58
z/VM 5.2.0      
%diff layer3 to layer2      
MB/sec 0.8% 0.0% 0.0%
Total CPU msec/MB -4.7% -3.2% -6.2%
Emul CPU msec/MB -4.2% -1.0% -3.6%
CP CPU msec/MB -5.1% -4.7% -7.9%
z/VM 5.1.0      
%diff layer3 to layer2      
MB/sec -15.2% -5.8% -2.1%
Total CPU msec/MB 35.1% 13.6% 14.1%
Emul CPU msec/MB 23.8% 4.7% 5.8%
CP CPU msec/MB 43.6% 21.8% 22.0%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

When the large MTU size is used, throughput is the same for layer 2 and layer 3, and CPU msec/MB is less for layer 2.

Table 8. OSA - 1 GbE - RR - 1492

Number of clients 01 10 50
run ID (layer 3) lorn0102 lorn1002 lorn5001
trans/sec 2538.62 14340.35 31996.02
Total CPU msec/trans 0.0559 0.0349 0.0244
Emul CPU msec/trans 0.0244 0.0206 0.0187
CP CPU msec/trans 0.0315 0.0143 0.0057
run ID (layer 2) lorn0102 lorn1002 lorn5001
trans/sec 2567.58 14681.64 32797.35
Total CPU msec/trans 0.0561 0.0347 0.0245
Emul CPU msec/trans 0.0249 0.0204 0.0187
CP CPU msec/trans 0.0312 0.0143 0.0058
%diff layer 3 to layer 2      
trans/sec 1.1% 2.4% 2.5%
Total CPU msec/trans 0.4% -0.6% 0.4%
Emul CPU msec/trans 2.0% -1.0% 0.0%
CP CPU msec/trans -1.0% 0.0% 1.8%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

Over the 1 GbE card, layer 2 gets better throughput, and CPU time is very close to that of layer 3.

Table 9. OSA - 1 GbE - STR - 1492

Number of clients 01 10 50
run ID (layer 3) losn0101 losn1002 losn5001
MB/sec 54.5 112.0 112.1
Total CPU msec/MB 5.58 4.82 5.41
Emul CPU msec/MB 3.52 3.71 4.30
CP CPU msec/MB 2.06 1.11 1.11
run ID (layer 2) losn0101 losn1002 losn5001
MB/sec 54.6 112.0 112.0
Total CPU msec/MB 5.64 4.95 5.64
Emul CPU msec/MB 3.59 3.80 4.45
CP CPU msec/MB 2.05 1.14 1.20
%diff layer 3 to layer 2      
MB/sec 0.2% 0.0% -0.1%
Total CPU msec/MB 1.1% 2.6% 4.4%
Emul CPU msec/MB 1.9% 2.4% 3.4%
CP CPU msec/MB -0.2% 3.2% 8.2%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

With the streaming workload, throughput was the same, but layer 2 had higher CPU msec/MB than layer 3. The difference increased as the workload increased. This was true for both MTU sizes.

Table 10. OSA - 1 GbE - STR - 8992

Number of clients 01 10 50
run ID (layer 3) losj0102 losj1003 losj5001
MB/sec 64.7 118.0 118.0
Total CPU msec/MB 4.70 4.19 4.27
Emul CPU msec/MB 2.13 2.10 2.17
CP CPU msec/MB 2.57 2.08 2.10
run ID (layer 2) losj0102 losj1003 losj5001
MB/sec 67.3 118.0 118.0
Total CPU msec/MB 4.87 4.49 4.56
Emul CPU msec/MB 2.20 2.19 2.27
CP CPU msec/MB 2.68 2.31 2.29
%diff layer 3 to layer 2      
MB/sec 4.0% 0.0% 0.0%
Total CPU msec/MB 3.7% 7.3% 6.7%
Emul CPU msec/MB 3.1% 4.0% 4.7%
CP CPU msec/MB 4.2% 10.7% 8.8%
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU

When the MTU size is larger, layer 2 shows slightly higher throughput than layer 3 for one client-server pair.

Table 11. OSA - 10 GbE - RR - 1492

Number of clients 01 10 50 100 200 300 400 500
run ID (layer 3) ror001 ror010 ror050 ror100 ror200 ror300 ror400 ror500
trans/sec 2429.0 14916.0 34774.0 47574.0 58185.0 64578.0 69523.0 70533.0
Total CPU msec/trans 2.40 0.62 0.31 0.24 0.22 0.21 0.22 0.23
Emul CPU msec/trans 0.76 0.13 0.05 0.03 0.03 0.02 0.02 0.02
CP CPU msec/trans 1.65 0.49 0.26 0.21 0.20 0.19 0.20 0.21
run ID (layer 2) ro2001 ro2010 ro2050 ro2100 ro2200 ro2300 ro2400 ro2500
trans/sec 2513.0 14951.0 36000.0 49579.0 60733.0 67568.0 72512.0 75060.0
Total CPU msec/trans 0.60 0.40 0.27 0.23 0.21 0.21 0.21 0.22
Emul CPU msec/trans 0.25 0.13 0.06 0.04 0.03 0.02 0.02 0.02
CP CPU msec/trans 0.38 0.27 0.21 0.19 0.19 0.18 0.19 0.20
%diff layer 3 to layer 2                
trans/sec 3% 0% 4% 4% 4% 5% 4% 6%
Total CPU msec/trans -75% -35% -14% -4% -4% 1% -4% -3%
Emul CPU msec/trans -66% 0% 10% 10% 6% 6% 7% 5%
CP CPU msec/trans -77% -45% -20% -7% -6% 6% -5% -4%
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs

As the workload increases, throughput is higher for Layer 2 than for Layer 3, and CPU time is somewhat lower. For the lighter workloads, the CPU time for Layer 2 is considerably lower than for Layer 3. We plan to investigate why CPU time is so much lower for this workload, as well as for the STR workload that follows.

Table 12. OSA - 10 GbE - STR - 1492

Number of clients 01 08 16 32
run ID (layer 3) sor001 sor010 sor016 sor032
MB/sec 72.0 145.0 144.0 136.0
Total CPU msec/MB 88.89 45.24 49.44 55.88
Emul CPU msec/MB 14.44 7.17 7.78 8.82
CP CPU msec/MB 74.44 37.52 41.67 47.06
run ID (layer 2) so2001 so2008 so2016 so2032
MB/sec 74.0 157.0 156.0 149.0
Total CPU msec/MB 55.14 45.86 46.67 52.08
Emul CPU msec/MB 16.22 7.64 7.69 8.05
CP CPU msec/MB 40.00 38.22 38.97 44.03
%diff layer 3 to layer 2        
MB/sec 3% 8% 8% 10%
Total CPU msec/MB -38% 1% -6% -7%
Emul CPU msec/MB 12% 7% -1% -9%
CP CPU msec/MB -46% 2% -6% -6%
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs

The trend seen for this card with the RR workload also holds for streaming: throughput is higher for Layer 2, the heavier workloads show somewhat less CPU time, and the lighter workloads show significantly less CPU time.
