Layer 3 and Layer 2 Comparisons
In addition to the measurements described in the z/VM 5.1.0 report section Virtual Switch Layer 2 Support, similar measurements were also done for:
- Virtual switch on z/VM 5.2.0
- Linux directly attached to an OSA-Express2 1 Gigabit Ethernet (GbE) card
- Linux directly attached to an OSA-Express2 10 GbE card
The 10 GbE information is provided in cooperation with IBM's STG OSA Performance Analysis & Measurement teams.
The Application Workload Modeler (AWM) product was used to drive request-response (RR) and streaming (STR) workloads over IPv4 in both layer 3 and layer 2 modes. Refer to AWM workload for a description of the workload used for the virtual switch and the 1 GbE (real QDIO) measurements. The workload for the 10 GbE measurement was the same except for the duration and the number of client-server pairs; for that measurement, the requests were repeated for 600 seconds.
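The two traffic patterns differ in shape: RR repeatedly exchanges a small request and reply over a connection and is reported in transactions per second, while STR pushes a continuous byte stream in one direction and is reported in MB per second. The following is a minimal sketch of those two patterns; it is not AWM itself, and the host, port, message sizes, and duration are illustrative assumptions.

```python
# Minimal sketch of RR and STR traffic patterns (not AWM itself).
# Host, port, message sizes, and duration are illustrative assumptions.
import socket
import time

def rr_client(host="10.0.0.1", port=5001, duration=60, req_size=200, rsp_size=1000):
    """Request-response: send a small request, wait for the full reply, repeat."""
    trans = 0
    end = time.time() + duration
    with socket.create_connection((host, port)) as s:
        while time.time() < end:
            s.sendall(b"x" * req_size)
            received = 0
            while received < rsp_size:          # read until the whole reply arrives
                chunk = s.recv(rsp_size - received)
                if not chunk:
                    return trans
                received += len(chunk)
            trans += 1
    return trans                                 # completed transactions

def str_client(host="10.0.0.1", port=5002, duration=60, chunk_size=20 * 1024):
    """Streaming: push data continuously in one direction for the interval."""
    sent = 0
    end = time.time() + duration
    with socket.create_connection((host, port)) as s:
        while time.time() < end:
            s.sendall(b"x" * chunk_size)
            sent += chunk_size
    return sent / (1024 * 1024)                  # MB sent over the interval
```

The measured runs used multiple concurrent client-server pairs (1, 10, 50, and so on); the sketch shows only a single connection of each type.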
CP monitor data was captured for the LPAR and reduced using Performance Toolkit for VM. The results shown here are for the client side only. The following table shows the differences in environment among the three measurements discussed here.
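The CPU columns in the tables that follow are CPU time normalized by work completed: milliseconds of CPU per transaction for RR and per MB for STR, with the emulation (Emul) and CP components summing to the total. A minimal sketch of that arithmetic, using made-up sample values rather than measured data:

```python
# Hedged sketch of the per-unit CPU metrics used in the tables.
# The sample inputs below are illustrative, not measured values.

def cpu_msec_per_unit(cpu_seconds, units_completed):
    """Normalize CPU seconds over the run to msec per transaction (RR) or per MB (STR)."""
    return (cpu_seconds * 1000.0) / units_completed

# Example: a run that completed 2000 trans/sec for 200 seconds,
# consuming 12 CPU-seconds of emulation time and 20 CPU-seconds of CP time.
trans = 2000 * 200
emul = cpu_msec_per_unit(12.0, trans)   # emulation (guest) component -> 0.03
cp = cpu_msec_per_unit(20.0, trans)     # CP (Control Program) component -> 0.05
total = emul + cp                       # total CPU msec/trans, as reported -> 0.08
print(f"Emul {emul:.3f}  CP {cp:.3f}  Total {total:.3f} msec/trans")
```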
Table 1. Environment Differences
Measurement | z/VM level | Processor type | # of real CPUs | # of virt CPUs | # of LPARs
---|---|---|---|---|---
Virtual Switch 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
Direct OSA 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
Direct OSA 10 GbE | 5.1.0 | 2084-B16 | 8 | 3 | 1 |
In general, Layer 2 delivered higher throughput than Layer 3 (by 0.2% to 4.0%). With the virtual switch, Layer 2 used the same or less CPU time (changes of -4.7% to 0%). Going directly through the OSA 1 GbE card, Layer 2 CPU time ranged from 0.6% lower to 7.3% higher than Layer 3. For the 10 GbE card, Layer 2 throughput was 0% to 10% higher than Layer 3, and CPU time changes ranged from -75% to 1%. Results can vary with the level of z/VM, the OSA card, and the workload.
For the virtual switch, Layer 2 performance improved dramatically on z/VM 5.2.0 relative to z/VM 5.1.0 (throughput increased by 2.2% to 54.8% and CPU time per unit of work changed by -3.1% to -30.1%), while Layer 3 performance was essentially unchanged. As a result, the relative performance of Layer 2 and Layer 3 changed significantly. This was not the case when going directly to OSA, where both Layer 2 and Layer 3 showed little change from z/VM 5.1.0 to z/VM 5.2.0.
The following three tables are included for background information. They show a comparison between z/VM 5.1.0 and z/VM 5.2.0 for both Layer 2 and Layer 3 for all three virtual switch workloads. This information is then used to better understand changes in results when comparing Layer 2 and Layer 3. Improvements in z/VM 5.2.0 for guest LAN, mentioned in CP Regression Measurements, are apparent in the results. Note that measurements going directly to OSA are not affected much by going to z/VM 5.2.0 since guest LAN is not involved.
Table 2. VSwitch base - 1 GbE - RR - 1492
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
trans/sec | -3.8% | -2.4% | -0.1% |
Total CPU msec/trans | 3.2% | 2.4% | 0.0% |
Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
CP CPU msec/trans | 5.3% | 5.0% | 0.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
trans/sec | 54.8% | 31.7% | 25.4% |
Total CPU msec/trans | -3.1% | -6.5% | -5.6% |
Emul CPU msec/trans | -7.4% | -4.3% | -4.8% |
CP CPU msec/trans | 0.0% | -8.7% | -6.7% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 3. VSwitch Base - 1 GbE - STR - 1492
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | -0.7% | 0.3% | -0.2% |
Total CPU msec/MB | 0.7% | 0.3% | 2.8% |
Emul CPU msec/MB | 0.7% | -0.9% | 1.5% |
CP CPU msec/MB | 0.7% | 1.5% | 4.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 4.4% | 28.1% | 3.0% |
Total CPU msec/MB | -28.9% | -18.0% | -13.6% |
Emul CPU msec/MB | -21.1% | -15.5% | -11.7% |
CP CPU msec/MB | -34.8% | -20.3% | -15.5% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 4. VSwitch base - 1 GbE - STR - 8992
Layer 3 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 0.3% | 0.0% | 0.0% |
Total CPU msec/MB | -3.5% | 2.0% | 8.7% |
Emul CPU msec/MB | -14.5% | -12.6% | -9.8% |
CP CPU msec/MB | 4.7% | 15.8% | 26.0% |
Layer 2 | |||
Number of clients | 01 | 10 | 50 |
%diff 5.1.0 to 5.2.0 | |||
MB/sec | 18.4% | 6.2% | 2.2% |
Total CPU msec/MB | -30.7% | -13.0% | -9.4% |
Emul CPU msec/MB | -32.4% | -17.8% | -16.9% |
CP CPU msec/MB | -29.6% | -9.1% | -3.4% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
The following tables compare each measurement using layer 3 against the same measurement using layer 2. Each table includes a percentage-difference section showing the percent increase (or decrease) for layer 2 relative to layer 3.
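Assuming the conventional definition, the percent difference is taken with the layer 3 result as the base, so a positive throughput entry and a negative CPU-time entry both favor layer 2. A minimal sketch of that calculation, checked against the one-client column of Table 5 below:

```python
def pct_diff(layer3, layer2):
    """Percent change of the layer 2 result relative to the layer 3 (base) result."""
    return (layer2 - layer3) / layer3 * 100.0

# Checked against the one-client column of Table 5:
print(f"{pct_diff(2124.07, 2183.77):.1f}%")  # trans/sec -> 2.8% (layer 2 higher)
print(f"{pct_diff(0.065, 0.063):.1f}%")      # Total CPU msec/trans -> -3.1% (layer 2 lower)
```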
Table 5. VSwitch - 1 GbE - RR - 1492
Number of clients | 01 | 10 | 50 |
runid (layer3) | vl4rn012 | vl4rn101 | vl4rn503 |
trans/sec | 2124.07 | 13823.21 | 31914.04 |
Total CPU msec/trans | 0.065 | 0.043 | 0.034 |
Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
CP CPU msec/trans | 0.040 | 0.021 | 0.014 |
runid (layer2) | vl5rn012 | vl5rn101 | vl5rn503 |
trans/sec | 2183.77 | 14336.50 | 32521.90 |
Total CPU msec/trans | 0.063 | 0.043 | 0.034 |
Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
CP CPU msec/trans | 0.038 | 0.021 | 0.014 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
trans/sec | 2.8% | 3.7% | 1.9% |
Total CPU msec/trans | -3.1% | 0.0% | 0.0% |
Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
CP CPU msec/trans | -5.0% | 0.0% | 0.0% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
trans/sec | -35.9% | -23.2% | -19.0% |
Total CPU msec/trans | 3.2% | 9.5% | 4.9% |
Emul CPU msec/trans | 8.0% | 4.5% | 5.0% |
CP CPU msec/trans | 0.0% | 15.0% | 4.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When traffic goes through a virtual switch to the 1 GbE card, layer 2 gets higher throughput. CPU time is the same except in the one client-server pair case, where layer 2 uses less. Notice the marked improvement for layer 2 on z/VM 5.2.0 relative to z/VM 5.1.0; this holds for this workload and for both streaming workloads that follow.
Table 6. VSwitch - 1 GbE - STR - 1492
Number of clients | 01 | 10 | 50 |
runid (layer 3) | vl4sn013 | vl4sn103 | vl4sn502 |
MB/sec | 44.6 | 92.4 | 93.3 |
Total CPU msec/MB | 7.13 | 6.93 | 6.95 |
Emul CPU msec/MB | 3.27 | 3.38 | 3.39 |
CP CPU msec/MB | 3.86 | 3.55 | 3.56 |
runid (layer2) | vl5sn013 | vl5sn103 | vl5sn502 |
MB/sec | 44.8 | 91.3 | 92.2 |
Total CPU msec/MB | 6.96 | 6.86 | 6.88 |
Emul CPU msec/MB | 3.35 | 3.48 | 3.51 |
CP CPU msec/MB | 3.62 | 3.37 | 3.36 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
MB/sec | 0.4% | -1.2% | -1.2% |
Total CPU msec/MB | -2.3% | -1.0% | -1.0% |
Emul CPU msec/MB | 2.3% | 3.1% | 3.7% |
CP CPU msec/MB | -6.2% | -4.9% | -5.5% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
MB/sec | -4.5% | -22.6% | -4.3% |
Total CPU msec/MB | 38.2% | 21.0% | 17.7% |
Emul CPU msec/MB | 30.4% | 20.9% | 19.2% |
CP CPU msec/MB | 44.9% | 21.1% | 16.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
In the virtual switch environment, the STR workload gets slightly lower throughput and uses slightly less CPU msec/MB with layer 2.
Table 7. VSwitch - 1 GbE - STR - 8992
Number of clients | 01 | 10 | 50 |
runid (layer3) | vl4sj011 | vl4sj103 | vl4sj502 |
MB/sec | 36.4 | 118.0 | 118.0 |
Total CPU msec/MB | 4.18 | 4.25 | 4.68 |
Emul CPU msec/MB | 1.59 | 1.76 | 1.88 |
CP CPU msec/MB | 2.58 | 2.49 | 2.80 |
runid (layer2) | vl5sj011 | vl5sj103 | vl5sj502 |
MB/sec | 36.7 | 118.0 | 118.0 |
Total CPU msec/MB | 3.98 | 4.12 | 4.39 |
Emul CPU msec/MB | 1.53 | 1.75 | 1.81 |
CP CPU msec/MB | 2.45 | 2.37 | 2.58 |
z/VM 5.2.0 | |||
%diff layer3 to layer2 | |||
MB/sec | 0.8% | 0.0% | 0.0% |
Total CPU msec/MB | -4.7% | -3.2% | -6.2% |
Emul CPU msec/MB | -4.2% | -1.0% | -3.6% |
CP CPU msec/MB | -5.1% | -4.7% | -7.9% |
z/VM 5.1.0 | |||
%diff layer3 to layer2 | |||
MB/sec | -15.2% | -5.8% | -2.1% |
Total CPU msec/MB | 35.1% | 13.6% | 14.1% |
Emul CPU msec/MB | 23.8% | 4.7% | 5.8% |
CP CPU msec/MB | 43.6% | 21.8% | 22.0% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When the large MTU size is used, throughput is the same for layer 2 and layer 3, and CPU msec/MB is less for layer 2.
Table 8. OSA - 1 GbE - RR - 1492
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | lorn0102 | lorn1002 | lorn5001 |
trans/sec | 2538.62 | 14340.35 | 31996.02 |
Total CPU msec/trans | .0559 | .0349 | .0244 |
Emul CPU msec/trans | .0244 | .0206 | .0187 |
CP CPU msec/trans | .0315 | .0143 | .0057 |
run ID (layer 2) | lorn0102 | lorn1002 | lorn5001 |
trans/sec | 2567.58 | 14681.64 | 32797.35 |
Total CPU msec/trans | .0561 | .0347 | .0245 |
Emul CPU msec/trans | .0249 | .0204 | .0187 |
CP CPU msec/trans | .0312 | .0143 | .0058 |
%diff layer 3 to layer 2 | |||
trans/sec | 1.1% | 2.4% | 2.5% |
Total CPU msec/trans | 0.4% | -0.6% | 0.4% |
Emul CPU msec/trans | 2.0% | -1.0% | 0.0% |
CP CPU msec/trans | -1.0% | 0.0% | 1.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
Going directly over the 1 GbE card, layer 2 gets better throughput, and CPU time is essentially the same as for layer 3.
Table 9. OSA - 1 GbE - STR - 1492
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | losn0101 | losn1002 | losn5001 |
MB/sec | 54.5 | 112.0 | 112.1 |
Total CPU msec/MB | 5.58 | 4.82 | 5.41 |
Emul CPU msec/MB | 3.52 | 3.71 | 4.30 |
CP CPU msec/MB | 2.06 | 1.11 | 1.11 |
run ID (layer 2) | losn0101 | losn1002 | losn5001 |
MB/sec | 54.6 | 112.0 | 112.0 |
Total CPU msec/MB | 5.64 | 4.95 | 5.64 |
Emul CPU msec/MB | 3.59 | 3.80 | 4.45 |
CP CPU msec/MB | 2.05 | 1.14 | 1.20 |
%diff layer 3 to layer 2 | |||
MB/sec | 0.2% | 0.0% | -0.1% |
Total CPU msec/MB | 1.1% | 2.6% | 4.4% |
Emul CPU msec/MB | 1.9% | 2.4% | 3.4% |
CP CPU msec/MB | -0.2% | 3.2% | 8.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
With the streaming workload, throughput was the same, but layer 2 had a higher CPU msec/MB than layer 3. The difference increased as the workload increased. This was true for both MTU sizes.
Table 10. OSA - 1 GbE - STR - 8992
Number of clients | 01 | 10 | 50 |
run ID (layer 3) | losj0102 | losj1003 | losj5001 |
MB/sec | 64.7 | 118.0 | 118.0 |
Total CPU msec/MB | 4.70 | 4.19 | 4.27 |
Emul CPU msec/MB | 2.13 | 2.10 | 2.17 |
CP CPU msec/MB | 2.57 | 2.08 | 2.10 |
run ID (layer 2) | losj0102 | losj1003 | losj5001 |
MB/sec | 67.3 | 118.0 | 118.0 |
Total CPU msec/MB | 4.87 | 4.49 | 4.56 |
Emul CPU msec/MB | 2.20 | 2.19 | 2.27 |
CP CPU msec/MB | 2.68 | 2.31 | 2.29 |
%diff layer 3 to layer 2 | |||
MB/sec | 4.0% | 0.0% | 0.0% |
Total CPU msec/MB | 3.7% | 7.3% | 6.7% |
Emul CPU msec/MB | 3.1% | 4.0% | 4.7% |
CP CPU msec/MB | 4.2% | 10.7% | 8.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU |
When the MTU size is larger, layer 2 shows slightly higher throughput than layer 3 for one client-server pair.
Table 11. OSA - 10 GbE - RR - 1492
Number of clients | 01 | 10 | 50 | 100 | 200 | 300 | 400 | 500 |
run ID (layer 3) | ror001 | ror010 | ror050 | ror100 | ror200 | ror300 | ror400 | ror500 |
trans/sec | 2429.0 | 14916.0 | 34774.0 | 47574.0 | 58185.0 | 64578.0 | 69523.0 | 70533.0 |
Total CPU msec/trans | 2.40 | 0.62 | 0.31 | 0.24 | 0.22 | 0.21 | 0.22 | 0.23 |
Emul CPU msec/trans | 0.76 | 0.13 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
CP CPU msec/trans | 1.65 | 0.49 | 0.26 | 0.21 | 0.20 | 0.19 | 0.20 | 0.21 |
run ID (layer 2) | ro2001 | r02010 | r02050 | r02100 | r02200 | r02300 | r02400 | r02500 |
trans/sec | 2513.0 | 14951.0 | 36000.0 | 49579.0 | 60733.0 | 67568.0 | 72512.0 | 75060.0 |
Total CPU msec/trans | 0.60 | 0.40 | 0.27 | 0.23 | 0.21 | 0.21 | 0.21 | 0.22 |
Emul CPU msec/trans | 0.25 | 0.13 | 0.06 | 0.04 | 0.03 | 0.02 | 0.02 | 0.02 |
CP CPU msec/trans | 0.38 | 0.27 | 0.21 | 0.19 | 0.19 | 0.18 | 0.19 | 0.20 |
%diff layer 3 to layer 2 | ||||||||
trans/sec | 3% | 0% | 4% | 4% | 4% | 5% | 4% | 6% |
Total CPU msec/trans | -75% | -35% | -14% | -4% | -4% | 1% | -4% | -3% |
Emul CPU msec/trans | -66% | 0% | 10% | 10% | 6% | 6% | 7% | 5% |
CP CPU msec/trans | -77% | -45% | -20% | -7% | -6% | 6% | -5% | -4% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs |
As the workload increases, throughput is higher for Layer 2 than Layer 3 and CPU time is somewhat less. For the lighter workloads, CPU time for Layer 2 is considerably less than for Layer 3. We plan to investigate why CPU time is so much lower for this workload, as well as for the STR workload that follows.
Table 12. OSA - 10 GbE - STR - 1492
Number of clients | 01 | 08 | 16 | 32 |
run ID (layer 3) | sor001 | sor010 | sor016 | sor032 |
MB/sec | 72.0 | 145.0 | 144.0 | 136.0 |
Total CPU msec/MB | 88.89 | 45.24 | 49.44 | 55.88 |
Emul CPU msec/MB | 14.44 | 7.17 | 7.78 | 8.82 |
CP CPU msec/MB | 74.44 | 37.52 | 41.67 | 47.06 |
run ID (layer 2) | so2001 | s02008 | s02016 | s02032 |
MB/sec | 74.0 | 157.0 | 156.0 | 149.0 |
Total CPU msec/MB | 55.14 | 45.86 | 46.67 | 52.08 |
Emul CPU msec/MB | 16.22 | 7.64 | 7.69 | 8.05 |
CP CPU msec/MB | 40.00 | 38.22 | 38.97 | 44.03 |
%diff layer 3 to layer 2 | ||||
MB/sec | 3% | 8% | 8% | 10% |
Total CPU msec/MB | -38% | 1% | -6% | -7% |
Emul CPU msec/MB | 12% | 7% | -1% | -9% |
CP CPU msec/MB | -46% | 2% | -6% | -6% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368; 1 LPAR; 8 CPUs; 3 virtual CPUs |
The same trend seen for this card with the RR workload holds for streaming: throughput is higher for Layer 2, heavier workloads show somewhat less CPU time, and lighter workloads show significantly less.