TCP/IP Stack Improvement Part 2
In TCP/IP 430, performance enhancements were made to the VM TCP/IP stack focusing on the device layer (see TCP/IP Stack Performance Improvements). For TCP/IP 440, the focus of the enhancements was on the TCP layer and the major socket functions. As in TCP/IP 430, the improvements in TCP/IP 440 were achieved by optimizing high-use paths, improving algorithms, and implementing performance related features. The goal of these improvements was to increase the performance of the stack when it is acting as a host. This section summarizes the results of a performance evaluation of these improvements by comparing TCP/IP 430 with TCP/IP 440.
Methodology: An internal tool was used to drive connect-request-response (CRR), streaming, and request-response (RR) workloads utilizing either the TCP or UDP protocol. The CRR workload, which used the TCP protocol, had the client connecting, sending 64 bytes to the server, the server responding with 8K, and the client then disconnecting. The streaming workload, which also used the TCP protocol, consisted of the client sending 1 byte to the server and the server responding with 20MB. The RR workload used the UDP protocol and consisted of the client sending 200 bytes to the server and the server responding with 1000 bytes. In each case above, the client/server sequences were repeated for 400 seconds. A complete set of runs, consisting of 3 trials for each case, was done.
The measurements were done on a 2064-109 with 3 dedicated processors for the LPAR used. The LPAR had 1GB of central storage and 2GB expanded storage. CP monitor data was captured for the LPAR during the measurement and reduced using VMPRF.
In the measurement environment there was one client, one server, and one TCP/IP stack. Both the client and the server communicated with the TCP/IP stack using IUCV via the loopback feature of TCP/IP.
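The internal workload driver itself is not described further; as a rough illustration of the CRR transaction pattern (connect, send 64 bytes, receive 8K, disconnect), the following is a minimal Python sketch over the loopback interface. All names (`crr_server`, `crr_client`, `run_crr`) and the use of Python sockets and threads are illustrative assumptions, not the actual tool:

```python
import socket
import threading

REQUEST_SIZE = 64      # client sends 64 bytes per transaction (CRR workload)
RESPONSE_SIZE = 8192   # server responds with 8K per transaction

def crr_server(listener, n_conns):
    # Accept one connection per transaction; read the 64-byte request,
    # send the 8K response, then let the client disconnect.
    for _ in range(n_conns):
        conn, _ = listener.accept()
        with conn:
            received = b""
            while len(received) < REQUEST_SIZE:
                chunk = conn.recv(REQUEST_SIZE - len(received))
                if not chunk:
                    break
                received += chunk
            conn.sendall(b"\0" * RESPONSE_SIZE)

def crr_client(port, n_conns):
    # One CRR transaction: connect, send 64 bytes, receive 8K, disconnect.
    completed = 0
    for _ in range(n_conns):
        with socket.create_connection(("127.0.0.1", port)) as sock:
            sock.sendall(b"\0" * REQUEST_SIZE)
            received = b""
            while len(received) < RESPONSE_SIZE:
                chunk = sock.recv(65536)
                if not chunk:
                    break
                received += chunk
        completed += 1
    return completed

def run_crr(n_conns=10):
    # Run server and client against loopback; return transactions completed.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    server = threading.Thread(target=crr_server, args=(listener, n_conns))
    server.start()
    done = crr_client(port, n_conns)
    server.join()
    listener.close()
    return done
```

The real measurements communicated with the stack via IUCV rather than TCP sockets; the sketch only shows the shape of the transaction loop, with the driver reporting the completed-transaction rate (trans/sec) over the timed interval.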
Results:
The following tables show the comparison between results on TCP/IP 430 and the enhancements on TCP/IP 440. MB/sec (megabytes per second) and trans/sec (transactions per second) were supplied by the workload driver and show the throughput rate. All other values are from CP monitor data or derived from CP monitor data.
Table 1. CRR Workload (TCP protocol)
runid                | icr0b01 | icr0n10 | Percent
TCP/IP Level         |   4.3.0 |   4.4.0 | Difference
---------------------|---------|---------|-----------
trans/sec            |   43.79 |  212.99 |      386.4
tot_cpu_util         |   34.60 |   32.50 |       -6.1
tot_cpu_msec/trans   |   15.81 |    3.05 |      -80.7
emul_msec/trans      |   15.35 |    2.61 |      -83.0
cp_msec/trans        |    0.46 |    0.44 |       -4.4
Note: 2064-109; LPAR with 3 dedicated processors
Table 2. Streaming Workload (TCP protocol)
runid                | ist0b01 | ist0n10 | Percent
TCP/IP Level         |   4.3.0 |   4.4.0 | Difference
---------------------|---------|---------|-----------
MB/sec               |   72.18 |   87.90 |       21.8
tot_cpu_util         |   34.70 |   35.60 |        2.6
tot_cpu_msec/MB      |    9.62 |    8.10 |      -15.8
emul_msec/MB         |    5.57 |    4.23 |      -24.1
cp_msec/MB           |    4.05 |    3.87 |       -4.4
Note: 2064-109; LPAR with 3 dedicated processors
Table 3. RR Workload (UDP protocol)
runid                | iud0b02 | iud0n14 | Percent
TCP/IP Level         |   4.3.0 |   4.4.0 | Difference
---------------------|---------|---------|-----------
trans/sec            | 1549.78 | 1745.09 |       12.6
tot_cpu_util         |   33.70 |   35.30 |        4.8
tot_cpu_msec/trans   |    0.43 |    0.41 |       -4.7
emul_msec/trans      |    0.30 |    0.28 |       -6.7
cp_msec/trans        |    0.13 |    0.13 |        0.0
Note: 2064-109; LPAR with 3 dedicated processors
As seen in the tables above, in all three workloads the amount of CPU used per transaction or per MB was reduced. This, in turn, allowed the throughput (trans/sec or MB/sec) to increase. While the focus of the enhancements was on the TCP layer and the major socket functions, bottlenecks were also found in the connect-disconnect code, which showed up as limitations in performance runs. As a result, this code was updated as well, and the improvement can be seen in the increased throughput of the CRR workload.
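For reference, the Percent Difference column in the tables above follows the usual convention of change relative to the TCP/IP 430 baseline, rounded to one decimal place; a minimal sketch (the function name is illustrative):

```python
def percent_difference(base, new):
    """Percent change from the TCP/IP 430 baseline value to the 440 value.

    Negative values indicate a reduction (e.g. less CPU per transaction).
    """
    return round((new - base) / base * 100, 1)

# Examples using values from the tables above:
# CRR trans/sec:          percent_difference(43.79, 212.99) -> 386.4
# CRR tot_cpu_msec/trans: percent_difference(15.81, 3.05)   -> -80.7
# Streaming MB/sec:       percent_difference(72.18, 87.90)  -> 21.8
```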