Linux Guest IUCV Driver
Executive Summary
We used a CMS test program to measure the best data rate one could expect between two virtual machines connected by IUCV. We then measured the data rate experienced by two Linux guests connected via the Linux IUCV line driver. We found that the Linux machines could drive the IUCV link to about 30% of its capacity.
We also conducted a head-to-head data rate comparison of the Linux IUCV and Linux CTC line drivers. We found that the data rate with the CTC driver was at best 72% of the IUCV driver's data rate, and the larger the MTU size, the larger the gap.
We did not measure the latency of the IUCV or CTC line drivers. Nor did we measure the effects of larger n-way configurations or other scaling scenarios.
Procedure
To measure the practical upper limit on the data rate, we prepared a pair of CMS programs that would exchange data using APPC/VM (also known as "synchronous IUCV"). We chose APPC/VM because, when WAIT=YES is used, fewer interrupts are delivered to the guests, and thus the data rate is higher.
The processing performed by the two CMS programs looks like this:
```
Requester                             Server
---------                             ------
                                      0. Identify resource manager
1. Allocate conversation
                                      2. Accept conversation
3. Do "I" times:
4.   Sample TOD clock
5.   Do "J" times:
6.     Transmit a value "N"
                                      7. Receive value of "N"
8.     Transmit "N" bytes
                                      9. Receive "N" bytes
10.  Sample TOD clock again
11.  Subtract TOD clock samples
12.  Print microseconds used
13. Deallocate conversation
                                      14. Wait for another round
```
We chose I=20 and J=1000.
We ran this pair of programs for increasing values of N (1, 100, 200, 400, 800, 1000, 2000, ..., 8000000) and recorded the value of N at which the curve flattened out, that is, beyond which a larger data rate was not achieved.
For each value of N, we used CP QUERY TIME to record the virtual time and CP time for the requester and the server. We added the two machines' virtual and CP times together so that we could see the distribution of total processor time between CP and the two guests.
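The derived metrics reported in the tables below follow directly from these raw measurements. Here is a minimal sketch of the arithmetic (the helper name is ours, and we assume MB means 10^6 bytes):

```python
def derived_metrics(bytes_moved, elapsed_sec, virt_cpu_sec, cp_cpu_sec):
    """Derive the report's four rate metrics from raw measurements.

    virt_cpu_sec and cp_cpu_sec are the two machines' times summed,
    as recorded with CP QUERY TIME before and after the run.
    """
    mb = bytes_moved / 1e6                  # assumes MB = 10**6 bytes
    total_cpu = virt_cpu_sec + cp_cpu_sec   # virtual + CP, both guests
    return {
        "MB/sec": mb / elapsed_sec,
        "MB/CPU-sec": mb / total_cpu,
        "CPU-sec/sec": total_cpu / elapsed_sec,
        "CP CPU-sec/MB": cp_cpu_sec / mb,
    }
```

CPU-sec/sec close to 1.0 indicates the run kept one processor's worth of CPU busy for its entire duration; CP CPU-sec/MB isolates the Control Program's share of the cost of moving each megabyte.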
To measure the data rate between Linux guests, we set up two Linux 2.2.16 systems and connected them via the IUCV line driver (GA-equivalent level) with various MTU sizes.[1] We then ran an IBM internal network performance measurement tool in stream put mode, transfer size 20,000,000 bytes, 10 samples of 100 repetitions each, with API crossing sizes equal to the MTU size. We used CP QUERY TIME to record the virtual time and CP time used by each Linux machine during the run. We added the two machines' virtual and CP times together so that we could see the distribution of total processor time between CP and the two guests.
We repeated the aforementioned Linux experiment, using the CTC driver and a VCTC connection instead of the IUCV driver, and took the same measurements during the run.
Hardware Used
9672-XZ7, two-processor LPAR with both processors dedicated. The LPAR had 2 GB of main storage and 2 GB of expanded storage (XSTORE). z/VM V3.1.0.
Results
Here are the results we observed for our CMS test program.
Table 1. CMS APPC/VM Data Rates
Transfer Size (bytes) | MB/sec | MB/CPU-sec | CPU-sec/second | CP CPU-sec/MB | %CP |
---|---|---|---|---|---|
1 | 0.017 | 0.018 | 0.944 | 46.137344 | 84.620 |
100 | 1.723 | 1.870 | 0.921 | 0.450888 | 84.310 |
1000 | 17.174 | 18.165 | 0.945 | 0.046662 | 84.760 |
1500 | 25.871 | 27.777 | 0.931 | 0.030409 | 84.470 |
2000 | 34.091 | 36.680 | 0.929 | 0.023331 | 85.580 |
4000 | 65.477 | 72.661 | 0.901 | 0.011665 | 84.760 |
8000 | 120.576 | 125.072 | 0.964 | 0.006947 | 86.890 |
9000 | 121.666 | 126.222 | 0.964 | 0.006991 | 88.240 |
10000 | 133.723 | 142.339 | 0.939 | 0.006134 | 87.310 |
20000 | 219.809 | 227.065 | 0.968 | 0.003985 | 90.480 |
32764 | 286.070 | 294.775 | 0.970 | 0.003121 | 91.980 |
40000 | 306.755 | 313.967 | 0.977 | 0.002976 | 93.420 |
80000 | 380.396 | 386.298 | 0.985 | 0.002470 | 95.440 |
100000 | 401.215 | 404.957 | 0.991 | 0.002380 | 96.390 |
200000 | 452.134 | 455.758 | 0.992 | 0.002144 | 97.730 |
400000 | 480.947 | 482.873 | 0.996 | 0.002045 | 98.730 |
800000 | 494.869 | 496.221 | 0.997 | 0.002000 | 99.220 |
1000000 | 496.635 | 497.612 | 0.998 | 0.001996 | 99.320 |
2000000 | 499.014 | 499.698 | 0.999 | 0.001990 | 99.460 |
4000000 | 499.976 | 500.485 | 0.999 | 0.001989 | 99.570 |
8000000 | 501.732 | 502.066 | 0.999 | 0.001985 | 99.660 |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB.
Here are the results for our Linux IUCV line driver experiments.
Table 2. Linux IUCV Data Rates
MTU Size (bytes) | MB/sec | MB/CPU-sec | CPU-sec/sec | CP CPU-sec/MB | %CP |
---|---|---|---|---|---|
1500 | 7.84 | 8.12 | 0.966 | 0.053393 | 43.33 |
9000 | 33.33 | 34.17 | 0.975 | 0.012367 | 42.26 |
32764 | 73.09 | 74.13 | 0.986 | 0.005608 | 41.57 |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB.
Here are the results for our Linux CTC line driver experiments.
Table 3. Linux CTC Data Rates
MTU Size (bytes) | MB/sec | MB/CPU-sec | CPU-sec/sec | CP CPU-sec/MB | %CP |
---|---|---|---|---|---|
1500 | 5.95 | 5.95 | 1.00 | 0.053472 | 31.97 |
9000 | 17.33 | 17.84 | 0.971 | 0.017920 | 31.97 |
32764 | 29.71 | 30.91 | 0.961 | 0.010346 | 31.98 |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB.
Analysis
First let's compare some data rates, at a selection of transfer sizes (that is, MTU sizes).
Table 4. Data Rate (MB/CPU-sec) Comparisons
MTU Size (bytes) | CMS/IUCV | Linux/IUCV | Linux/CTC |
---|---|---|---|
1500 | 27.8 | 8.12 (0.292) | 5.95 (0.214) [0.733] |
9000 | 126.2 | 34.17 (0.271) | 17.84 (0.141) [0.522] |
32764 | 294.8 | 74.13 (0.251) | 30.91 (0.105) [0.417] |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB. In the Linux/IUCV and Linux/CTC columns, a number in parentheses is the cell value's fraction of the CMS/IUCV value in the same row. In the Linux/CTC column, a number in brackets is the cell value's fraction of the Linux/IUCV value in the same row.
These numbers illustrate Linux's ability to utilize the IUCV pipe. Utilization at MTU 1500 runs at about 29%. As we move toward larger and larger frames, IUCV utilization goes down.
We see also that the Linux IUCV line driver is a better data rate performer than the Linux CTC line driver, at each MTU size we measured.
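The parenthesized and bracketed ratios used in Tables 4 through 6 are plain quotients of the table cells. For instance, the MTU-1500 row of Table 4 can be reproduced as follows (a sketch; the variable names are ours):

```python
# MB/CPU-sec values from the MTU-1500 row of Table 4
cms_iucv, linux_iucv, linux_ctc = 27.8, 8.12, 5.95

# Parenthesized values: fraction of the CMS/IUCV cell in the same row
assert round(linux_iucv / cms_iucv, 3) == 0.292   # Linux/IUCV vs. CMS
assert round(linux_ctc / cms_iucv, 3) == 0.214    # Linux/CTC vs. CMS

# Bracketed value: Linux/CTC as a fraction of Linux/IUCV
assert round(linux_ctc / linux_iucv, 3) == 0.733
```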
Next we examine CP CPU time per MB transferred, for a selection of MTU sizes.
Table 5. CP CPU-sec/MB Comparisons
MTU Size (bytes) | CMS/IUCV | Linux/IUCV | Linux/CTC |
---|---|---|---|
1500 | 0.030409 | 0.053393 (1.756) | 0.053472 (1.758) [1.001] |
9000 | 0.006991 | 0.012367 (1.770) | 0.017920 (2.563) [1.449] |
32764 | 0.003121 | 0.005608 (1.796) | 0.010346 (3.315) [1.845] |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB. In the Linux/IUCV and Linux/CTC columns, a number in parentheses is the cell value's fraction of the CMS/IUCV value in the same row. In the Linux/CTC column, a number in brackets is the cell value's fraction of the Linux/IUCV value in the same row.
We see here that the Linux/IUCV cases use about 1.8 times as much CP CPU time per MB as the CMS/IUCV case. This is likely indicative of the extra CP time required to deliver the extra IUCV interrupt to the Linux guest, though other Linux overhead issues (e.g., timer tick processing) also contribute.
We also see that in the Linux/CTC case, CP CPU time is greater than in the Linux/IUCV case, and as MTU size grows, the gap widens. Apparently there is more overhead in CP CTC processing than in CP IUCV processing, so the fixed cost is not amortized as quickly.
Now we examine virtual CPU time per MB transferred, for a selection of MTU sizes.
Table 6. Virtual CPU-sec/MB Comparisons
MTU Size (bytes) | CMS/IUCV | Linux/IUCV | Linux/CTC |
---|---|---|---|
1500 | 0.005592 | 0.069819 (12.5) | 0.114499 (20.5) [1.64] |
9000 | 0.000932 | 0.016896 (18.1) | 0.038124 (40.9) [2.26] |
32764 | 0.000272 | 0.007882 (29.0) | 0.022001 (80.9) [2.79] |
Note: 9672-XZ7, 2 GB main, 2 GB expanded. Two-processor LPAR, both dedicated. z/VM V3.1.0. Guests 128 MB. In the Linux/IUCV and Linux/CTC columns, a number in parentheses is the cell value's fraction of the CMS/IUCV value in the same row. In the Linux/CTC column, a number in brackets is the cell value's fraction of the Linux/IUCV value in the same row.
These numbers illustrate the cost of the TCP/IP layers in the Linux kernel and its line drivers. This cost is the biggest reason why Linux is unable to drive the IUCV connection to its capacity: CPU resource is being spent on running the guest instead of on running CP, where the data movement actually takes place.
These numbers also show us that the Linux guest consumes more CPU in the CTC case than in the IUCV case. Apparently the Linux CTC line driver uses more CPU per MB transferred than the Linux IUCV line driver does.
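The virtual CPU-sec/MB figures in Table 6 can be cross-checked against the earlier tables: total CPU time per MB is the reciprocal of MB/CPU-sec, and subtracting CP's share leaves the guests' virtual time. A sketch for the CMS/IUCV case at transfer size 1500 (cell values copied from the CMS results above):

```python
# CMS/IUCV at transfer size 1500 bytes
mb_per_cpu_sec = 27.777       # MB/CPU-sec
cp_cpu_sec_per_mb = 0.030409  # CP CPU-sec/MB

# virtual CPU-sec/MB = total CPU-sec/MB - CP CPU-sec/MB
virt_cpu_sec_per_mb = 1 / mb_per_cpu_sec - cp_cpu_sec_per_mb
assert round(virt_cpu_sec_per_mb, 6) == 0.005592   # matches Table 6
```

The Linux rows reproduce the same way, to within the rounding of the published cells.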
Conclusions
- The Linux IUCV driver is able to drive the IUCV pipe to at best about 30% of its capacity. As the MTU size grows, the percentage drops.
- The Linux IUCV line driver provides a greater data rate than the Linux CTC line driver over a virtual CTC for the streaming workload measured.
- CP CTC support uses more CPU per MB transferred than CP IUCV support, at a given MTU size.
- For large transfers between Linux guests, the data rate would be improved if the IUCV line driver were modified to support MTU sizes beyond 32764. Based on our measurements, supporting a maximum MTU size as high as 2 MB would probably be sufficient.
However, use of a large MTU size to increase data rate between two Linux images must be balanced against fragmentation issues. As long as the traffic on the IUCV link is destined to remain within the VM image, very large MTU sizes are probably OK. However, if the packets will eventually find their way onto real hardware, they will be fragmented down to the minimum MTU size encountered on the way to the eventual destination. This might be as small as 576 bytes depending on network configuration. The system programmer needs to take this into account when deciding whether to use a very large MTU size on an IUCV link.
- IUCV and virtual CTC are memory-to-memory data transfer technologies. Their speeds will improve as processor speed improves and as memory access speed improves.
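The fragmentation penalty mentioned in the conclusions is easy to estimate. Here is a simplified sketch (IPv4 with a 20-byte header and no options; the function is illustrative, not drawn from any measured configuration):

```python
import math

def ipv4_fragments(payload_bytes, path_mtu, ip_header=20):
    """Count the IPv4 fragments needed to carry one payload over path_mtu.

    Every fragment repeats the IP header, and every fragment but the
    last must carry a payload that is a multiple of 8 bytes.
    """
    per_fragment = (path_mtu - ip_header) // 8 * 8  # usable bytes per fragment
    return math.ceil(payload_bytes / per_fragment)

# A full 32764-byte IUCV-link packet (32744 bytes of payload) crossing a
# 576-byte path MTU splits into 60 fragments, each repeating the header.
```

Each of those fragments costs a header and per-packet processing at every hop, which is why a very large MTU only pays off when the traffic stays inside the VM image.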
Footnotes:
[1] `uname -a` reported `Linux prf3linux 2.2.16 #1 SMP Mon Nov 13 09:51:30 EST 2000 s390 unknown`.