Linux Guest Crypto on z990
This section presents and discusses new measurements designed to characterize the performance of the z990 cryptographic support. Included are:
- a processor comparison between the z990 2084-308 and the z900 2064-2C8 with z/VM 4.3.0
- a processor comparison between the z990 2084-316 and the z990 2084-308 with z/VM 4.3.0
- a comparison between z/VM 4.4.0 and z/VM 4.3.0 on a z990 2084-316
On the IBM z990 systems available as of June 2003, the only cryptographic hardware was the CP Assist for Cryptographic Function (CPACF) associated with each z990 processor and, optionally, up to 12 Peripheral Component Interconnect Cryptographic Accelerator (PCICA) cards. The IBM complementary metal-oxide semiconductor (CMOS) Cryptographic Coprocessor Feature (CCF) is no longer available, and no other secure cryptographic device is available.
The section titled Linux Guest Crypto Support describes the original cryptographic support and the original methodology. Measurements were completed using the Linux OpenSSL Exerciser Performance Workload described in Linux OpenSSL Exerciser. Specific parameters used can be found in the measurement item list or in the various table columns.
Some of the original methodology has changed, including system levels, client machines, and connectivity types.
The following list defines the terms and labels used in the tables in this section of the report:
- #Clients/Sys(Path: The number of clients with the specified system model and the specified connectivity path to the server.
- Sys:
  - XZ7: An IBM S/390 G6 Server Model XZ7.
  - 112: An IBM eServer zSeries 900 Model 112.
  - 116: An IBM eServer zSeries 900 Model 116.
- CCF Chips: The number of CCF chips available for use during the measurement.
- Cipher RC4 MD5 US: The SSL cipher algorithm used 128-bit RC4 encryption, with MD5 message authentication, and RSA key exchange.
- Client Authentication: Identifies whether client authentication was being used. The SSL server application handshake type is set to CLIENT AUTH and the client's certificate is verified using the local server key and certificate.
- Client Machines: The number of client machines used for the measurement.
- Client Threads: The number of client threads used for the measurement.
- Connections: The number of times the client connects to and disconnects from the server for each iteration of the workload.
- CPACFs: The number of CP Assist for Cryptographic Function (CPACF) facilities available for the measurement.
- Encryption Facility: The cryptographic facilities used:
  - SSL-SL: The SSL data cryptographic operations were done by SSL software routines, regardless of the presence of hardware cryptographic facilities. The SSL handshake consists of DES encryption or decryption and PKA Encrypt or Decrypt operations. All the SSL handshake cryptographic operations were done in hardware by one or more PCICA cards.
- Guest Type V=V: A z/VM measurement on a real z900 or z990 system model with Linux defined as a virtual=virtual (V=V) machine.
- Key Size: The length (in bits) of the key used for SSL data cryptographic operations.
- Linux Kernel Level: The Linux kernel level used for the measurement.
- Linux Kernel Mode MP: The Linux kernel is running in multiprocessor mode.
- Linux Kernel Size 31-bit: The Linux kernel used 31-bit addressing.
- Linux System Level: The Linux system level used for the measurement.
- Linux Virt CPUs: The number of virtual processors defined for the Linux virtual machines.
- Linux Virt Mach Size: The storage size defined for the Linux virtual machines.
- No. of Domains per PCICA: The number of PCICA domains that are accessible from the measurement system.
- No. of TCP/IP Virt Machs: The number of z/VM TCP/IP virtual machines defined on the z/VM server to handle the communication traffic to the client systems.
- Number of Guests: The number of z/VM virtual machines defined for LINUX server systems.
- OpenSSL Code Level: The OpenSSL code level used for the measurement.
- Packets: The number of packets that are sent between the client and the server for each iteration of the workload.
- PCICA Cards Active: The number of PCICA cards available for use during the measurement.
- Real Connectivity: Specifies the real communication handler between the client machines and the z/VM server machine.
- Receive Bytes: The number of bytes in each packet that is returned from the server to the client.
- Send Bytes: The number of bytes in each packet that is sent from the client to the server.
- Server Model: The system model used for the server.
- Server SID Cache: The size of the SSL server session ID cache.
- Server Threads: The number of server threads defined for the measurement.
- Servers: The number of servers used for the measurement.
- TCP/IP Virt Mach Mode UP: The TCP/IP code is executing in uni-processor mode.
- TCP/IP Virt Mach Size: Storage size defined for the z/VM TCP/IP virtual machines.
- Virtual Connectivity: Specifies the type of virtual connectivity between the z/VM TCP/IP virtual machines and the Linux virtual machines.
- z90Crypt Level: The z90Crypt level used for the measurement.
- z900 2064-2C8: An IBM eServer zSeries 900 Model 2C8 (8-way) system represented by a logical partition with 8 dedicated processors on a 2064-216.
- z990 2084-308: An IBM eServer zSeries 990 Model 308 (8-way) system represented by a logical partition with 8 dedicated processors on a B16.
- z990 2084-316: An IBM eServer zSeries 990 Model 316 (16-way) system represented by a logical partition with 16 dedicated processors on a B16.
Table 1. Common items for measurements in this section
| Item | Value |
|---|---|
| Client Authentication | No |
| Encryption Facility | SSL-SL |
| No. of Domains per PCICA | 1 |
| Server SID Cache | Disabled |
| Cipher | RC4 MD5 US |
| Connections | 1 |
| Packets | 1 |
| Send Bytes | 2048 |
| Receive Bytes | 2048 |
| Key Size | 1024 |
| Linux System Level | SLES8 |
| Linux Kernel Level | 2.4.19 |
| OpenSSL Code Level | 0.9.6E |
| z90Crypt Level | 1.1.2 |
| Real Connectivity | VM TCP/IP |
| Virtual Connectivity | vCTC |
| TCP/IP Virt Mach Size | 256M |
| TCP/IP Virt Mach Mode | UP |
| Guest Type | V=V |
| Number of Guests | 120 |
| Linux Virt Mach Size | 128M |
| Linux Virt CPUs | 1 |
| Linux Kernel Mode | MP |
| Linux Kernel Size | 31-bit |
Comparison between z990 2084-308 and z900 2064-2C8:
Measurements were obtained to compare the performance of the SSL workload with hardware encryption between a z990 2084-308 and a z900 2064-2C8. For these measurements, there were 120 Linux guests running in an LPAR with 8 dedicated processors. The LPAR was configured with one domain of 9 PCICA cards (z900) or 12 PCICA cards (z990). The results are summarized in Table 2.
Table 2. Benefit of z990 processor
| Server Model | z900 2064-2C8 | z990 2084-308 |
|---|---|---|
| CCF Chips | 2 | na |
| CPACFs | na | 8 |
| PCICA Cards Active | 9 | 12 |
| Servers | 360 | 198 |
| Server Threads | 360 | 198 |
| Client Machines | 1 | 2 |
| Client Threads | 360 | 198 |
| No. of TCP/IP Virt Machs | 5 | 2 |
| #Clients/Sys(Path | 1/XZ7(6 CTC | 1/XZ7(4 CTC |
| #Clients/Sys(Path | na | 1/112(6 CTC |
| Run ID | E2C06BV3 | E3503BV1 |
| Avg Spin Lock Rate (v) | 4969 | na |
| Spin Time (v) | 2.037 | na |
| Pct Spin Time (v) | 1.012 | na |
| ETR | 2253.08 | 4456.91 |
| ETR Ratio | 1.00 | 1.97 |
| ITR (R) | 2432.60 | 4479.30 |
| ITR Ratio (R) | 1.00 | 1.84 |
| Processor Utilization (R) | 92.62 | 99.50 |
| Total CPU/Tx (R) | 3.28 | 1.78 |
| Emul CPU/Tx (R) | 2.19 | 1.22 |
| CP CPU/Tx (R) | 1.49 | 0.56 |

Note: Workload: Linux OpenSSL Exerciser; z/VM 4.3.0 SLU 0000; LPAR with 8 dedicated processors; (v) = VMPRF; (R) = RTM
ITR improved when moving from the 2064-2C8 to the 2084-308 because of the increased processor speed.
ETR improved more than ITR because the z990 measurement obtained a higher processor utilization. The z900 measurement was limited by the single client configuration.
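The relationship between the ETR, ITR, and processor utilization rows of Table 2 can be checked with a short calculation. The sketch below assumes that ITR is ETR divided by the processor utilization fraction; this section does not state that definition, although the table values are consistent with it. The function name `itr` and the dictionary layout are illustrative only.

```python
def itr(etr: float, utilization_pct: float) -> float:
    """Assumed definition: ETR normalized to 100% processor utilization."""
    return etr / (utilization_pct / 100.0)


# ETR and processor utilization values copied from Table 2.
z900 = {"etr": 2253.08, "util": 92.62}
z990 = {"etr": 4456.91, "util": 99.50}

itr_z900 = itr(z900["etr"], z900["util"])
itr_z990 = itr(z990["etr"], z990["util"])

etr_ratio = z990["etr"] / z900["etr"]
itr_ratio = itr_z990 / itr_z900

print(f"ITR z900: {itr_z900:.2f}  ITR z990: {itr_z990:.2f}")
print(f"ETR ratio: {etr_ratio:.3f}  ITR ratio: {itr_ratio:.3f}")
```

Under this assumption the ETR ratio exceeds the ITR ratio only because the z990 run reached a higher utilization (99.50% versus 92.62%); dividing by utilization removes that difference.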
Comparison between z990 2084-316 and z990 2084-308:
An additional measurement was obtained on a 16-way processor to see how performance scaled from the 8-way to the 16-way. The results are summarized in Table 3.
Table 3. z990 16-way versus 8-way
| Server Model | z990 2084-308 | z990 2084-316 |
|---|---|---|
| Processors (p) | 8 | 16 |
| CPACFs | 8 | 16 |
| Servers | 198 | 297 |
| Server Threads | 198 | 297 |
| Client Machines | 2 | 3 |
| Client Threads | 198 | 297 |
| No. of TCP/IP Virt Machs | 2 | 3 |
| #Clients/Sys(Path | 1/XZ7(4 CTC | 1/XZ7(4 CTC |
| #Clients/Sys(Path | 1/112(6 CTC | 1/112(6 CTC |
| #Clients/Sys(Path | na | 1/116(6 CTC |
| Run ID | E3503BV1 | E3503BV2 |
| Avg Spin Lock Rate (v) | na | 13174 |
| Spin Time (v) | na | 24.989 |
| Pct Spin Time (v) | na | 32.92 |
| ETR | 4456.91 | 4426.48 |
| ETR Ratio | 1.00 | 0.99 |
| ITR (R) | 4479.30 | 4719.06 |
| ITR Ratio (R) | 1.00 | 1.05 |
| Processor Utilization (R) | 99.50 | 93.80 |
| Total CPU/Tx (p) | 1.784 | 3.391 |
| Emul CPU/Tx (p) | 1.224 | 0.961 |
| CP CPU/Tx (p) | 0.560 | 2.430 |

Note: Workload: Linux OpenSSL Exerciser; z/VM 4.3.0 SLU 0000; LPAR with 8 or 16 dedicated processors; 12 PCICA Cards; (p) = z/VM FCON/ESA on 4.3.0; (v) = VMPRF; (R) = RTM
With z/VM 4.3.0, the 2084-316 measurement experienced severe spin lock serialization in the z/VM scheduler and scaled poorly relative to the 2084-308 measurement. It provided an ETR of 0.99 times and an ITR of 1.05 times the 2084-308 measurement, at a processor utilization of 93.80%. Milliseconds of CP time per transaction increased by 333% between the 8-way and 16-way measurements (0.560 to 2.430). The spin lock percentage was 33%, compared to about 1% for the 8-way 2064-2C8 measurement shown in Table 2.
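The CPU/Tx rows of Table 3 can be related to the ITR row with a small calculation. The sketch below assumes that total CPU milliseconds per transaction equals the number of processors times 1000, divided by ITR; that formula is an assumption, not something this section states, and the 8-way value it produces differs from the table by about 0.002, possibly because the ITR and CPU/Tx rows come from different tools. The helper name `cpu_ms_per_tx` is hypothetical.

```python
def cpu_ms_per_tx(processors: int, itr: float) -> float:
    """Assumed relation: processor milliseconds available per second / ITR."""
    return processors * 1000.0 / itr


# Processor counts and ITR values copied from Table 3.
total_8way = cpu_ms_per_tx(8, 4479.30)
total_16way = cpu_ms_per_tx(16, 4719.06)

# Share of total CPU time spent in CP, using the Table 3 CPU/Tx rows.
cp_share_8way = 0.560 / 1.784
cp_share_16way = 2.430 / 3.391

print(f"Total CPU/Tx  8-way: {total_8way:.3f} ms  16-way: {total_16way:.3f} ms")
print(f"CP share      8-way: {cp_share_8way:.1%}  16-way: {cp_share_16way:.1%}")
```

Viewed this way, doubling the processors roughly doubled the CPU cost per transaction, and the CP share of that cost rose from under a third to over two thirds, which is consistent with time being spent spinning on the scheduler lock.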
Release comparison of z/VM 4.4.0 and z/VM 4.3.0 on z990:
Because the 16-way measurement on z/VM 4.3.0 was limited by spin lock serialization, it was repeated on z/VM 4.4.0. The 16-way results are summarized in Table 4.
Table 4. Benefit of z/VM 4.4.0
| z/VM System Level | 4.3.0 SLU 0000 | 4.4.0 SLU 0000 |
|---|---|---|
| Servers | 297 | 360 |
| Server Threads | 297 | 360 |
| Client Threads | 297 | 360 |
| Run ID | E3503BV2 | E3521BV1 |
| Avg Spin Lock Rate (v) | 13174 | 5772 |
| Spin Time (v) | 24.989 | 8.984 |
| Pct Spin Time (v) | 32.92 | 5.186 |
| ETR | 4426.48 | 7696.83 |
| ETR Ratio | 1.00 | 1.73 |
| ITR (p) | 4719.06 | 8294.00 |
| ITR Ratio (p) | 1.00 | 1.75 |
| Processor Utilization (p) | 93.80 | 92.80 |
| Total CPU/Tx (p) | 3.391 | 1.929 |
| Emul CPU/Tx (p) | 0.961 | 1.083 |
| CP CPU/Tx (p) | 2.430 | 0.846 |

Note: Workload: Linux OpenSSL Exerciser; z990 2084-316; LPAR with 16 dedicated processors; 12 PCICA Cards; 3 TCP/IP Virt Machs; 3 Client Machines connected by CTCs (1/XZ7(4 CTC, 1/112(6 CTC, 1/116(6 CTC); (p) = z/VM Perfkit on 4.4.0 and z/VM FCON/ESA on 4.3.0; (v) = VMPRF
On the 2084-316, z/VM 4.4.0 provided a 73% ETR improvement and a 75% ITR improvement over z/VM 4.3.0. The spin lock percentage was about 5%, compared to about 33% for the 4.3.0 measurement, and milliseconds of CP time per transaction decreased by 65%. This spin lock improvement is discussed in the section titled Scheduler Lock Improvement.
Despite this large improvement, the z/VM 4.4.0 guest measurement was still limited by z/VM spin lock serialization, and processor utilization remained below 100% (92.80%).
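The release-to-release percentages quoted for the 16-way measurements can be reproduced from the Table 4 values with a simple percent-change calculation. This is a minimal sketch; the helper `pct_change` and the variable names are illustrative and not part of any measurement tool.

```python
def pct_change(old: float, new: float) -> float:
    """Percent change from old to new (positive means an increase)."""
    return (new - old) / old * 100.0


# (z/VM 4.3.0 value, z/VM 4.4.0 value) pairs copied from Table 4.
table4 = {
    "ETR": (4426.48, 7696.83),
    "ITR": (4719.06, 8294.00),
    "CP CPU/Tx": (2.430, 0.846),
    "Pct Spin Time": (32.92, 5.186),
}

changes = {name: pct_change(old, new) for name, (old, new) in table4.items()}

for name, change in changes.items():
    print(f"{name:14s} {change:+.1f}%")
```

The computed ETR and ITR changes land slightly above the quoted 73% and 75%, which suggests the report truncates its ratios rather than rounding them.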