Layer 3 and Layer 2 Comparisons
Introduction:
In addition to the measurements described in the z/VM 5.1.0 report
section Virtual Switch Layer 2 Support,
similar measurements were also done for:
- Virtual switch on z/VM 5.2.0
- Linux directly attached to an OSA-Express2 1 Gigabit Ethernet (GbE) card
- Linux directly attached to an OSA-Express2 10 GbE card
The OSA-Express features can support two transport modes: Layer 2
(Link Layer or MAC Layer) and Layer 3 (Network Layer). Both the
virtual switch and Linux are then configured to support the desired
capability (Layer 2 or Layer 3).
In Layer 2 mode, each port is referenced by its Media Access Control
(MAC) address instead of by its Internet Protocol (IP) address. Data is
transported and delivered in Ethernet frames, so the transport is
protocol-independent and can carry both IP traffic and non-IP traffic
such as IPX, NetBIOS, or SNA.
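As an illustration of why frame-level delivery is protocol-independent, the following minimal sketch parses an Ethernet II header and reads the EtherType field, which is what identifies the payload protocol. It is not taken from the measurement setup: the MAC addresses and payload are made up, the function names are hypothetical, and only Ethernet II framing is assumed (NetBIOS and SNA traffic often uses 802.2 LLC framing instead, which this sketch does not handle).

```python
import struct

# A few well-known EtherType values; the names are for display only.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x8137: "IPX",
    0x86DD: "IPv6",
}

def format_mac(raw: bytes) -> str:
    """Render a 6-byte MAC address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet_header(frame: bytes) -> dict:
    """Split the 14-byte Ethernet II header into destination, source, and type."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "protocol": ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04x})"),
        "payload": frame[14:],
    }

# Hypothetical frame: made-up locally administered MACs carrying an IPX payload.
frame = bytes.fromhex("020000000002" "020000000001" "8137") + b"payload"
header = parse_ethernet_header(frame)
print(header["dst"], header["src"], header["protocol"])
```

Delivery depends only on the destination MAC address, so the same path carries the frame whether the type field names IP or a non-IP protocol.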
Acknowledgment:
The 10 GbE information is provided in cooperation with
IBM's STG OSA
Performance Analysis & Measurement teams.
Methodology:
The Application Workload Modeler (AWM) product was used to drive
request-response (RR) and streaming (STR) workloads
with IPv4 layer 3 and layer 2. Refer to
AWM workload for a description of the workload used for the
virtual switch and for the 1 GbE (real QDIO) measurements.
The workload for
the 10 GbE measurement was the same with the exception of the
duration and the number of client-server pairs. For this measurement
the requests were repeated for 600 seconds.
CP monitor data was captured for the LPAR and reduced using
Performance Toolkit for VM. The results shown here are
for the client side only.
The following table shows any differences in the environments
for the three measurements discussed here.
Table 1. Environment Differences
| Measurement | z/VM level | Processor type | # of real CPUs | # of virtual CPUs | # of LPARs |
| Virtual Switch 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
| Direct OSA 1 GbE | 5.2.0 | 2084-324 | 2 | 1 | 2 |
| Direct OSA 10 GbE | 5.1.0 | 2084-B16 | 8 | 3 | 1 |
Summary of Results:
In general, Layer 2 has higher throughput (between 0.2% and 4.0%)
than Layer 3. When using virtual switch, CPU time is less for Layer 2
(between 0% and 4.7% less). When going directly through OSA on the
1 GbE card, CPU time for Layer 2 ranged from 0.6% lower to 7.3% higher
than for Layer 3. For the 10 GbE card, Layer 2 throughput
was between 0% and 10% higher than Layer 3, and Layer 2 CPU time
ranged from 75% lower to 1% higher.
Results can vary based on the level of z/VM, the OSA card, and the
workload.
For virtual switch, Layer 2 performance improved dramatically on
z/VM 5.2.0 relative to z/VM 5.1.0
(throughput increased between 2.2% and 54.8%
and CPU time decreased between 3.1% and 30.1%) while Layer 3
performance was essentially unchanged. As a result, the relative
performance of Layer 2 and Layer 3 changed significantly. This was
not the case when going directly to OSA because both Layer 2 and
Layer 3 showed little change in performance when going from
z/VM 5.1.0 to z/VM 5.2.0.
Detailed Results:
The following three tables are included for background information.
They show a comparison between z/VM 5.1.0 and z/VM 5.2.0 for both
Layer 2 and Layer 3 for all three virtual switch workloads.
This information is then used to better understand changes in results
when comparing
Layer 2 and Layer 3. Improvements in z/VM 5.2.0 for guest LAN,
mentioned in CP Regression Measurements,
are apparent in the results.
Note that measurements going directly to OSA are not affected much
by going to z/VM 5.2.0 since guest LAN is not involved.
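The way the release-to-release tables feed into the Layer 2 versus Layer 3 comparison can be sketched with a little arithmetic. The following is an illustrative check, not part of the original analysis: it converts each published percentage into a ratio and combines the z/VM 5.1.0 layer 3 to layer 2 difference with the per-layer release changes to estimate the z/VM 5.2.0 difference. The function names are hypothetical, and because the inputs are already rounded the result only approximates the published figure.

```python
def to_ratio(pct_change: float) -> float:
    """Convert a percent change such as -35.9 into a multiplier such as 0.641."""
    return 1.0 + pct_change / 100.0

def implied_diff_on_520(l3_to_l2_on_510: float,
                        l2_510_to_520: float,
                        l3_510_to_520: float) -> float:
    """Percent difference, layer 3 to layer 2, implied for z/VM 5.2.0."""
    ratio = (to_ratio(l3_to_l2_on_510)
             * to_ratio(l2_510_to_520)
             / to_ratio(l3_510_to_520))
    return (ratio - 1.0) * 100.0

# trans/sec percentages for 1, 10, and 50 client-server pairs (RR, MTU 1492).
l3_to_l2_on_510 = [-35.9, -23.2, -19.0]  # Table 5, z/VM 5.1.0 section
l2_510_to_520 = [54.8, 31.7, 25.4]       # Table 2, Layer 2
l3_510_to_520 = [-3.8, -2.4, -0.1]       # Table 2, Layer 3
reported_on_520 = [2.8, 3.7, 1.9]        # Table 5, z/VM 5.2.0 section

implied = [
    implied_diff_on_520(a, b, c)
    for a, b, c in zip(l3_to_l2_on_510, l2_510_to_520, l3_510_to_520)
]
for got, want in zip(implied, reported_on_520):
    print(f"implied {got:+.1f}%  reported {want:+.1f}%")
```

With these inputs the implied values agree with the published z/VM 5.2.0 figures to within about half a percentage point, which is consistent with the Layer 2 improvement on z/VM 5.2.0 accounting for the change in relative performance.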
Table 2. VSwitch base - 1 GbE - RR - 1492
| Layer 3 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| trans/sec | -3.8% | -2.4% | -0.1% |
| Total CPU msec/trans | 3.2% | 2.4% | 0.0% |
| Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
| CP CPU msec/trans | 5.3% | 5.0% | 0.0% |
| Layer 2 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| trans/sec | 54.8% | 31.7% | 25.4% |
| Total CPU msec/trans | -3.1% | -6.5% | -5.6% |
| Emul CPU msec/trans | -7.4% | -4.3% | -4.8% |
| CP CPU msec/trans | 0.0% | -8.7% | -6.7% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 3. VSwitch base - 1 GbE - STR - 1492
| Layer 3 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| MB/sec | -0.7% | 0.3% | -0.2% |
| Total CPU msec/MB | 0.7% | 0.3% | 2.8% |
| Emul CPU msec/MB | 0.7% | -0.9% | 1.5% |
| CP CPU msec/MB | 0.7% | 1.5% | 4.0% |
| Layer 2 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| MB/sec | 4.4% | 28.1% | 3.0% |
| Total CPU msec/MB | -28.9% | -18.0% | -13.6% |
| Emul CPU msec/MB | -21.1% | -15.5% | -11.7% |
| CP CPU msec/MB | -34.8% | -20.3% | -15.5% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Table 4. VSwitch base - 1 GbE - STR - 8992
| Layer 3 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| MB/sec | 0.3% | 0.0% | 0.0% |
| Total CPU msec/MB | -3.5% | 2.0% | 8.7% |
| Emul CPU msec/MB | -14.5% | -12.6% | -9.8% |
| CP CPU msec/MB | 4.7% | 15.8% | 26.0% |
| Layer 2 | | | |
| Number of clients | 01 | 10 | 50 |
| %diff 5.1.0 to 5.2.0 | | | |
| MB/sec | 18.4% | 6.2% | 2.2% |
| Total CPU msec/MB | -30.7% | -13.0% | -9.4% |
| Emul CPU msec/MB | -32.4% | -17.8% | -16.9% |
| CP CPU msec/MB | -29.6% | -9.1% | -3.4% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
The following tables compare each measurement using layer 3 against
the same measurement using layer 2. Each table includes a
percentage difference section that shows the percent increase
(or decrease) for layer 2 relative to layer 3.
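The percentage difference rows can be reproduced from the raw rows. The following minimal sketch assumes the usual definition, with layer 3 as the base, and applies it to the z/VM 5.2.0 trans/sec values from Table 5; the function name is hypothetical.

```python
def pct_diff(layer3: float, layer2: float) -> float:
    """Percent increase (positive) or decrease (negative) for layer 2."""
    return (layer2 - layer3) / layer3 * 100.0

# trans/sec for 1, 10, and 50 client-server pairs (Table 5, z/VM 5.2.0).
layer3_tps = [2124.07, 13823.21, 31914.04]
layer2_tps = [2183.77, 14336.50, 32521.90]

diffs = [round(pct_diff(l3, l2), 1) for l3, l2 in zip(layer3_tps, layer2_tps)]
print(diffs)  # [2.8, 3.7, 1.9]
```

For throughput rows a positive value favors layer 2, while for the CPU msec rows a negative value favors layer 2.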
Table 5. VSwitch - 1 GbE - RR - 1492
| Number of clients | 01 | 10 | 50 |
| runid (layer3) | vl4rn012 | vl4rn101 | vl4rn503 |
| trans/sec | 2124.07 | 13823.21 | 31914.04 |
| Total CPU msec/trans | 0.065 | 0.043 | 0.034 |
| Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
| CP CPU msec/trans | 0.040 | 0.021 | 0.014 |
| runid (layer2) | vl5rn012 | vl5rn101 | vl5rn503 |
| trans/sec | 2183.77 | 14336.50 | 32521.90 |
| Total CPU msec/trans | 0.063 | 0.043 | 0.034 |
| Emul CPU msec/trans | 0.025 | 0.022 | 0.020 |
| CP CPU msec/trans | 0.038 | 0.021 | 0.014 |
| z/VM 5.2.0 | | | |
| %diff layer3 to layer2 | | | |
| trans/sec | 2.8% | 3.7% | 1.9% |
| Total CPU msec/trans | -3.1% | 0.0% | 0.0% |
| Emul CPU msec/trans | 0.0% | 0.0% | 0.0% |
| CP CPU msec/trans | -5.0% | 0.0% | 0.0% |
| z/VM 5.1.0 | | | |
| %diff layer3 to layer2 | | | |
| trans/sec | -35.9% | -23.2% | -19.0% |
| Total CPU msec/trans | 3.2% | 9.5% | 4.9% |
| Emul CPU msec/trans | 8.0% | 4.5% | 5.0% |
| CP CPU msec/trans | 0.0% | 15.0% | 4.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When traffic goes through a virtual switch to the 1 GbE card,
layer 2 gets higher throughput. CPU time is the same except in the
single client-server pair case, where layer 2 uses less CPU time.
Notice the marked improvement for layer 2 when using z/VM 5.2.0 over
z/VM 5.1.0. This is true for this workload and for both streaming
workloads that follow.
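When reading the CPU rows, it can help to confirm that they are internally consistent. The sketch below assumes that total CPU time is the sum of emulation time and CP time, which the report does not state explicitly, and checks the layer 3 rows of Table 5 against that assumption. The function name and the tolerance (chosen to absorb rounding to three decimals) are hypothetical.

```python
def cpu_split_consistent(total: float, emul: float, cp: float,
                         tolerance: float = 0.0015) -> bool:
    """True if total matches emul + cp within a rounding tolerance."""
    return abs(total - (emul + cp)) <= tolerance

# (Total, Emul, CP) CPU msec/trans for 1, 10, and 50 client-server pairs,
# taken from the layer 3 rows of Table 5.
layer3_rows = [
    (0.065, 0.025, 0.040),
    (0.043, 0.022, 0.021),
    (0.034, 0.020, 0.014),
]

checks = [cpu_split_consistent(*row) for row in layer3_rows]
print(checks)
```

Under this assumption, a change in total CPU time can be attributed to the emulation component, the CP component, or both by comparing the corresponding rows.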
Table 6. VSwitch - 1 GbE - STR - 1492
| Number of clients | 01 | 10 | 50 |
| runid (layer 3) | vl4sn013 | vl4sn103 | vl4sn502 |
| MB/sec | 44.6 | 92.4 | 93.3 |
| Total CPU msec/MB | 7.13 | 6.93 | 6.95 |
| Emul CPU msec/MB | 3.27 | 3.38 | 3.39 |
| CP CPU msec/MB | 3.86 | 3.55 | 3.56 |
| runid (layer2) | vl5sn013 | vl5sn103 | vl5sn502 |
| MB/sec | 44.8 | 91.3 | 92.2 |
| Total CPU msec/MB | 6.96 | 6.86 | 6.88 |
| Emul CPU msec/MB | 3.35 | 3.48 | 3.51 |
| CP CPU msec/MB | 3.62 | 3.37 | 3.36 |
| z/VM 5.2.0 | | | |
| %diff layer3 to layer2 | | | |
| MB/sec | 0.4% | -1.2% | -1.2% |
| Total CPU msec/MB | -2.3% | -1.0% | -1.0% |
| Emul CPU msec/MB | 2.3% | 3.1% | 3.7% |
| CP CPU msec/MB | -6.2% | -4.9% | -5.5% |
| z/VM 5.1.0 | | | |
| %diff layer3 to layer2 | | | |
| MB/sec | -4.5% | -22.6% | -4.3% |
| Total CPU msec/MB | 38.2% | 21.0% | 17.7% |
| Emul CPU msec/MB | 30.4% | 20.9% | 19.2% |
| CP CPU msec/MB | 44.9% | 21.1% | 16.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
In the virtual switch environment, the STR workload with layer 2 gets
slightly less throughput at 10 and 50 client-server pairs and uses
slightly less total CPU msec/MB.
Table 7. VSwitch - 1 GbE - STR - 8992
| Number of clients | 01 | 10 | 50 |
| runid (layer3) | vl4sj011 | vl4sj103 | vl4sj502 |
| MB/sec | 36.4 | 118.0 | 118.0 |
| Total CPU msec/MB | 4.18 | 4.25 | 4.68 |
| Emul CPU msec/MB | 1.59 | 1.76 | 1.88 |
| CP CPU msec/MB | 2.58 | 2.49 | 2.80 |
| runid (layer2) | vl5sj011 | vl5sj103 | vl5sj502 |
| MB/sec | 36.7 | 118.0 | 118.0 |
| Total CPU msec/MB | 3.98 | 4.12 | 4.39 |
| Emul CPU msec/MB | 1.53 | 1.75 | 1.81 |
| CP CPU msec/MB | 2.45 | 2.37 | 2.58 |
| z/VM 5.2.0 | | | |
| %diff layer3 to layer2 | | | |
| MB/sec | 0.8% | 0.0% | 0.0% |
| Total CPU msec/MB | -4.7% | -3.2% | -6.2% |
| Emul CPU msec/MB | -4.2% | -1.0% | -3.6% |
| CP CPU msec/MB | -5.1% | -4.7% | -7.9% |
| z/VM 5.1.0 | | | |
| %diff layer3 to layer2 | | | |
| MB/sec | -15.2% | -5.8% | -2.1% |
| Total CPU msec/MB | 35.1% | 13.6% | 14.1% |
| Emul CPU msec/MB | 23.8% | 4.7% | 5.8% |
| CP CPU msec/MB | 43.6% | 21.8% | 22.0% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When the large MTU size is used,
throughput is the same for layer 2 and layer 3,
and CPU msec/MB is less for layer 2.
Table 8. OSA - 1 GbE - RR - 1492
| Number of clients | 01 | 10 | 50 |
| run ID (layer 3) | lorn0102 | lorn1002 | lorn5001 |
| trans/sec | 2538.62 | 14340.35 | 31996.02 |
| Total CPU msec/trans | .0559 | .0349 | .0244 |
| Emul CPU msec/trans | .0244 | .0206 | .0187 |
| CP CPU msec/trans | .0315 | .0143 | .0057 |
| run ID (layer 2) | lorn0102 | lorn1002 | lorn5001 |
| trans/sec | 2567.58 | 14681.64 | 32797.35 |
| Total CPU msec/trans | .0561 | .0347 | .0245 |
| Emul CPU msec/trans | .0249 | .0204 | .0187 |
| CP CPU msec/trans | .0312 | .0143 | .0058 |
| %diff layer 3 to layer 2 | | | |
| trans/sec | 1.1% | 2.4% | 2.5% |
| Total CPU msec/trans | 0.4% | -0.6% | 0.4% |
| Emul CPU msec/trans | 2.0% | -1.0% | 0.0% |
| CP CPU msec/trans | -1.0% | 0.0% | 1.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
Over the 1 GbE card, layer 2 gets better throughput, and its CPU time
is nearly the same as for layer 3.
Table 9. OSA - 1 GbE - STR - 1492
| Number of clients | 01 | 10 | 50 |
| run ID (layer 3) | losn0101 | losn1002 | losn5001 |
| MB/sec | 54.5 | 112.0 | 112.1 |
| Total CPU msec/MB | 5.58 | 4.82 | 5.41 |
| Emul CPU msec/MB | 3.52 | 3.71 | 4.30 |
| CP CPU msec/MB | 2.06 | 1.11 | 1.11 |
| run ID (layer 2) | losn0101 | losn1002 | losn5001 |
| MB/sec | 54.6 | 112.0 | 112.0 |
| Total CPU msec/MB | 5.64 | 4.95 | 5.64 |
| Emul CPU msec/MB | 3.59 | 3.80 | 4.45 |
| CP CPU msec/MB | 2.05 | 1.14 | 1.20 |
| %diff layer 3 to layer 2 | | | |
| MB/sec | 0.2% | 0.0% | -0.1% |
| Total CPU msec/MB | 1.1% | 2.6% | 4.4% |
| Emul CPU msec/MB | 1.9% | 2.4% | 3.4% |
| CP CPU msec/MB | -0.2% | 3.2% | 8.2% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
With the streaming workload, throughput was the same, but layer 2
had a higher CPU msec/MB than layer 3. The difference increased as
the workload increased. This was true for both MTU sizes.
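To put the CPU msec/MB difference in context, the per-megabyte cost can be multiplied by the throughput to estimate CPU time consumed per elapsed second. This is a rough sketch rather than a figure from the report: the function name is hypothetical, and the result is expressed as a percentage of one CPU on the client side only. The values are the 50 client-server pair columns of Table 9.

```python
def cpu_busy_pct(mb_per_sec: float, cpu_msec_per_mb: float) -> float:
    """CPU msec consumed per elapsed second, as a percentage of one CPU."""
    cpu_msec_per_sec = mb_per_sec * cpu_msec_per_mb
    return cpu_msec_per_sec / 1000.0 * 100.0

# Table 9, 50 client-server pairs: (MB/sec, Total CPU msec/MB).
layer3 = (112.1, 5.41)
layer2 = (112.0, 5.64)

layer3_busy = cpu_busy_pct(*layer3)
layer2_busy = cpu_busy_pct(*layer2)
print(f"layer 3: {layer3_busy:.1f}% of one CPU")
print(f"layer 2: {layer2_busy:.1f}% of one CPU")
```

Because throughput is essentially equal in this case, the difference in CPU msec/MB carries over almost directly into the difference in CPU consumed.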
Table 10. OSA - 1 GbE - STR - 8992
| Number of clients | 01 | 10 | 50 |
| run ID (layer 3) | losj0102 | losj1003 | losj5001 |
| MB/sec | 64.7 | 118.0 | 118.0 |
| Total CPU msec/MB | 4.70 | 4.19 | 4.27 |
| Emul CPU msec/MB | 2.13 | 2.10 | 2.17 |
| CP CPU msec/MB | 2.57 | 2.08 | 2.10 |
| run ID (layer 2) | losj0102 | losj1003 | losj5001 |
| MB/sec | 67.3 | 118.0 | 118.0 |
| Total CPU msec/MB | 4.87 | 4.49 | 4.56 |
| Emul CPU msec/MB | 2.20 | 2.19 | 2.27 |
| CP CPU msec/MB | 2.68 | 2.31 | 2.29 |
| %diff layer 3 to layer 2 | | | |
| MB/sec | 4.0% | 0.0% | 0.0% |
| Total CPU msec/MB | 3.7% | 7.3% | 6.7% |
| Emul CPU msec/MB | 3.1% | 4.0% | 4.7% |
| CP CPU msec/MB | 4.2% | 10.7% | 8.8% |
2084-324; Linux SLES 8; 2 LPARs; 2 CPUs each; 1 virtual CPU
When the MTU size is larger,
layer 2 shows slightly higher throughput than layer 3 for
one client-server pair.
Table 11. OSA - 10 GbE - RR - 1492
| Number of clients | 01 | 10 | 50 | 100 | 200 | 300 | 400 | 500 |
| run ID (layer 3) | ror001 | ror010 | ror050 | ror100 | ror200 | ror300 | ror400 | ror500 |
| trans/sec | 2429.0 | 14916.0 | 34774.0 | 47574.0 | 58185.0 | 64578.0 | 69523.0 | 70533.0 |
| Total CPU msec/trans | 2.40 | 0.62 | 0.31 | 0.24 | 0.22 | 0.21 | 0.22 | 0.23 |
| Emul CPU msec/trans | 0.76 | 0.13 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
| CP CPU msec/trans | 1.65 | 0.49 | 0.26 | 0.21 | 0.20 | 0.19 | 0.20 | 0.21 |
| run ID (layer 2) | ro2001 | r02010 | r02050 | r02100 | r02200 | r02300 | r02400 | r02500 |
| trans/sec | 2513.0 | 14951.0 | 36000.0 | 49579.0 | 60733.0 | 67568.0 | 72512.0 | 75060.0 |
| Total CPU msec/trans | 0.60 | 0.40 | 0.27 | 0.23 | 0.21 | 0.21 | 0.21 | 0.22 |
| Emul CPU msec/trans | 0.25 | 0.13 | 0.06 | 0.04 | 0.03 | 0.02 | 0.02 | 0.02 |
| CP CPU msec/trans | 0.38 | 0.27 | 0.21 | 0.19 | 0.19 | 0.18 | 0.19 | 0.20 |
| %diff layer 3 to layer 2 | | | | | | | | |
| trans/sec | 3% | 0% | 4% | 4% | 4% | 5% | 4% | 6% |
| Total CPU msec/trans | -75% | -35% | -14% | -4% | -4% | 1% | -4% | -3% |
| Emul CPU msec/trans | -66% | 0% | 10% | 10% | 6% | 6% | 7% | 5% |
| CP CPU msec/trans | -77% | -45% | -20% | -7% | -6% | 6% | -5% | -4% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368;
1 LPAR; 8 CPUs; 3 virtual CPUs
As workload increases,
throughput is higher for Layer 2 than Layer 3 and CPU time is
somewhat less. For the lighter workloads,
the CPU time for Layer 2 is considerably less than for Layer 3.
We plan to investigate why CPU time is so much less for this
workload, as well as for the STR workload that follows.
Table 12. OSA - 10 GbE - STR - 1492
| Number of clients | 01 | 08 | 16 | 32 |
| run ID (layer 3) | sor001 | sor010 | sor016 | sor032 |
| MB/sec | 72.0 | 145.0 | 144.0 | 136.0 |
| Total CPU msec/MB | 88.89 | 45.24 | 49.44 | 55.88 |
| Emul CPU msec/MB | 14.44 | 7.17 | 7.78 | 8.82 |
| CP CPU msec/MB | 74.44 | 37.52 | 41.67 | 47.06 |
| run ID (layer 2) | so2001 | s02008 | s02016 | s02032 |
| MB/sec | 74.0 | 157.0 | 156.0 | 149.0 |
| Total CPU msec/MB | 55.14 | 45.86 | 46.67 | 52.08 |
| Emul CPU msec/MB | 16.22 | 7.64 | 7.69 | 8.05 |
| CP CPU msec/MB | 40.00 | 38.22 | 38.97 | 44.03 |
| %diff layer 3 to layer 2 | | | | |
| MB/sec | 3% | 8% | 8% | 10% |
| Total CPU msec/MB | -38% | 1% | -6% | -7% |
| Emul CPU msec/MB | 12% | 7% | -1% | -9% |
| CP CPU msec/MB | -46% | 2% | -6% | -6% |
2084-B16; Linux SLES 8; z/VM 5.1.0; 10 GbE card with feature 3368;
1 LPAR; 8 CPUs; 3 virtual CPUs
The same trend seen for this card and the RR workload is true for
streaming, with throughput being higher for Layer 2, heavy workloads
showing somewhat less CPU time and lighter workloads showing
significantly less CPU time.