Linux Guest Crypto on z990

This section presents and discusses the results of new measurements designed to characterize the performance of the z990 cryptographic support. Included are:

  • a processor comparison between the z990 2084-308 and the z900 2064-2C8 with z/VM 4.3.0
  • a processor comparison between the z990 2084-316 and the z990 2084-308 with z/VM 4.3.0
  • a comparison between z/VM 4.4.0 and z/VM 4.3.0 on a z990 2084-316

On the IBM z990 systems available as of June 2003, the only cryptographic hardware at the time of this report was the CP Assist for Cryptographic Function (CPACF) associated with each z990 processor and, optionally, up to 12 Peripheral Component Interconnect Cryptographic Accelerator (PCICA) cards. The IBM complementary metal-oxide semiconductor (CMOS) Cryptographic Coprocessor Feature (CCF) is no longer available, and no other secure cryptographic device is available.

The section titled Linux Guest Crypto Support describes the original cryptographic support and the original methodology. Measurements were completed using the Linux OpenSSL Exerciser Performance Workload described in Linux OpenSSL Exerciser. Specific parameters used can be found in the measurement item list or in the various table columns.

Some of the original methodology has changed, including system levels, client machines, and connectivity types.

The following list defines the terms and labels used in the tables in this section of the report:

  • #Clients/Sys(Path: The number of clients with the specified system model and the specified connectivity path to the server.

    • Sys:
      • XZ7: An IBM S/390 G6 Server Model XZ7.
      • 112: An IBM eServer zSeries 900 Model 112.
      • 116: An IBM eServer zSeries 900 Model 116.

  • CCF Chips: The number of CCF chips available for use during the measurement.

  • Cipher RC4 MD5 US: The SSL cipher algorithm used 128-bit RC4 encryption, with MD5 message authentication, and RSA key exchange.

  • Client Authentication: Identifies whether client authentication was used. When it is used, the SSL server application handshake type is set to CLIENT AUTH and the client's certificate is verified using the local server key and certificate.

  • Client Machines: The number of client machines used for the measurement.

  • Client Threads: The number of client threads used for the measurement.

  • Connections: The number of times the client connects to and disconnects from the server for each iteration of the workload.

  • CPACFs: The number of CP Assist for Cryptographic Function (CPACF) facilities available for the measurement.

  • Encryption Facility: The cryptographic facilities used:
    • SSL-SL: The SSL data cryptographic operations were done by SSL software routines, regardless of the presence of hardware cryptographic facilities. The SSL handshake consists of DES encryption or decryption and PKA Encrypt or Decrypt operations. All the SSL handshake cryptographic operations were done in hardware by one or more PCICA cards.

  • Guest Type V=V: A z/VM measurement on a real z900 or z990 system model with Linux defined as a virtual-equals-virtual (V=V) machine.

  • Key Size: The length (in bits) of the key used for SSL data cryptographic operations.

  • Linux Kernel Level: The Linux kernel level used for the measurement.

  • Linux Kernel Mode MP: The Linux kernel ran in multiprocessor mode.

  • Linux Kernel Size 31-bit: The Linux kernel used 31-bit addressing.

  • Linux System Level: The Linux system level used for the measurement.

  • Linux Virt CPUs: The number of virtual processors defined for the Linux virtual machines.

  • Linux Virt Mach Size: The storage size defined for the Linux virtual machines.

  • No. of Domains per PCICA: The number of PCICA domains that are accessible from the measurement system.

  • No. of TCP/IP Virt Machs: The number of z/VM TCP/IP virtual machines defined on the z/VM server to handle the communication traffic to the client systems.

  • Number of Guests: The number of z/VM virtual machines defined for Linux server systems.

  • OpenSSL Code Level: The OpenSSL code level used for the measurement.

  • Packets: The number of packets that are sent between the client and the server for each iteration of the workload.

  • PCICA Cards Active: The number of PCICA cards available for use during the measurement.

  • Real Connectivity: Specifies the real communication handler between the client machines and the z/VM server machine.

  • Receive Bytes: The number of bytes in each packet that is returned from the server to the client.

  • Send Bytes: The number of bytes in each packet that is sent from the client to the server.

  • Server Model: The system model used for the server.

  • Server SID Cache: The size of the SSL server session ID cache.

  • Server Threads: The number of server threads defined for the measurement.

  • Servers: The number of servers used for the measurement.

  • TCP/IP Virt Mach Mode UP: The TCP/IP code ran in uniprocessor mode.

  • TCP/IP Virt Mach Size: Storage size defined for the z/VM TCP/IP virtual machines.

  • Virtual Connectivity: Specifies the type of virtual connectivity between the z/VM TCP/IP virtual machines and the Linux virtual machines.

  • z90Crypt Level: The z90Crypt level used for the measurement.

  • z900 2064-2C8: An IBM eServer zSeries 900 Model 2C8 (8-way) system represented by a logical partition with 8 dedicated processors on a 2064-216.

  • z990 2084-308: An IBM eServer zSeries 990 Model 308 (8-way) system represented by a logical partition with 8 dedicated processors on a B16.

  • z990 2084-316: An IBM eServer zSeries 990 Model 316 (16-way) system represented by a logical partition with 16 dedicated processors on a B16.

Items common to all the measurements in this section are summarized in Table 1.

Table 1. Common items for measurements in this section

Item                        Value
--------------------------  ----------
Client Authentication       No
Encryption Facility         SSL-SL
No. of Domains per PCICA    1
Server SID Cache            Disabled
Cipher                      RC4 MD5 US
Connections                 1
Packets                     1
Send Bytes                  2048
Receive Bytes               2048
Key Size                    1024

Linux System Level          SLES8
Linux Kernel Level          2.4.19
OpenSSL Code Level          0.9.6E
z90Crypt Level              1.1.2

Real Connectivity           VM TCP/IP
Virtual Connectivity        vCTC
TCP/IP Virt Mach Size       256M
TCP/IP Virt Mach Mode       UP

Guest Type                  V=V
Number of Guests            120
Linux Virt Mach Size        128M
Linux Virt CPUs             1
Linux Kernel Mode           MP
Linux Kernel Size           31-bit

Comparison between z990 2084-308 and z900 2064-2C8 

Measurements were obtained to compare the performance of the SSL workload with hardware encryption between a z990 2084-308 and a z900 2064-2C8. For these measurements, there were 120 Linux guests running in an LPAR with 8 dedicated processors. The LPAR was configured with one domain of 9 or 12 PCICA cards. The results are summarized in Table 2.

Table 2. Benefit of z990 processor

Server Model                  z900 2064-2C8    z990 2084-308
----------------------------  -------------    -------------
CCF Chips                     2                na
CPACFs                        na               8
PCICA Cards Active            9                12
Servers                       360              198
Server Threads                360              198
Client Machines               1                2
Client Threads                360              198
No. of TCP/IP Virt Machs      5                2
#Clients/Sys(Path             1/XZ7(6 CTC      1/XZ7(4 CTC
#Clients/Sys(Path             na               1/112(6 CTC

Run ID                        E2C06BV3         E3503BV1

Avg Spin Lock Rate (v)        4969             na
Spin Time (v)                 2.037            na
Pct Spin Time (v)             1.012            na

ETR                           2253.08          4456.91
ETR Ratio                     1.00             1.97
ITR (R)                       2432.60          4479.30
ITR Ratio (R)                 1.00             1.84
Processor Utilization (R)     92.62            99.50

Total CPU/Tx (R)              3.28             1.78
Emul CPU/Tx (R)               2.19             1.22
CP CPU/Tx (R)                 1.49             0.56

Note: Workload: Linux OpenSSL Exerciser; z/VM 4.3.0 SLU 0000; LPAR with 8 dedicated processors; (v) = VMPRF; (R) = RTM

ITR improved when moving from the 2064-2C8 to the 2084-308 because of the increased processor speed.

ETR improved more than ITR because the z990 measurement obtained a higher processor utilization. The z900 measurement was limited by the single client configuration.
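The relationship between ETR, ITR, and processor utilization in Table 2 can be checked with a short calculation. The sketch below assumes the usual definition of ITR as ETR normalized to 100% processor utilization; that definition is not stated in this section, but it reproduces the ITR values in the table. The dictionary and function names are illustrative only.

```python
# Values copied from Table 2: (ETR, processor utilization in percent).
runs = {
    "z900 2064-2C8": (2253.08, 92.62),
    "z990 2084-308": (4456.91, 99.50),
}


def itr(etr, util_pct):
    """Assumed definition: ETR normalized to 100% processor utilization."""
    return etr / (util_pct / 100.0)


base_etr, base_util = runs["z900 2064-2C8"]
for name, (etr, util) in runs.items():
    etr_ratio = etr / base_etr
    itr_ratio = itr(etr, util) / itr(base_etr, base_util)
    # ETR ratio exceeds ITR ratio when the run also reached higher utilization.
    print(f"{name}: ITR={itr(etr, util):.2f} "
          f"ETR ratio={etr_ratio:.3f} ITR ratio={itr_ratio:.3f}")
```

Because the z990 run reached higher utilization than the z900 run, its ETR ratio is larger than its ITR ratio, which is the effect described above.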

Comparison between z990 2084-316 and z990 2084-308 

An additional measurement was obtained on a 16-way processor to see how performance scaled from the 8-way to the 16-way. The results are summarized in Table 3.

Table 3. z990 16-way versus 8-way

Server Model                  z990 2084-308    z990 2084-316
----------------------------  -------------    -------------
Processors (p)                8                16
CPACFs                        8                16
Servers                       198              297
Server Threads                198              297
Client Machines               2                3
Client Threads                198              297
No. of TCP/IP Virt Machs      2                3
#Clients/Sys(Path             1/XZ7(4 CTC      1/XZ7(4 CTC
#Clients/Sys(Path             1/112(6 CTC      1/112(6 CTC
#Clients/Sys(Path             na               1/116(6 CTC

Run ID                        E3503BV1         E3503BV2

Avg Spin Lock Rate (v)        na               13174
Spin Time (v)                 na               24.989
Pct Spin Time (v)             na               32.92

ETR                           4456.91          4426.48
ETR Ratio                     1.00             0.99
ITR (R)                       4479.30          4719.06
ITR Ratio (R)                 1.00             1.05
Processor Utilization (R)     99.50            93.80

Total CPU/Tx (p)              1.784            3.391
Emul CPU/Tx (p)               1.224            0.961
CP CPU/Tx (p)                 0.560            2.430

Note: Workload: Linux OpenSSL Exerciser; z/VM 4.3.0 SLU 0000; LPAR with 8 or 16 dedicated processors; 12 PCICA Cards; (p) = z/VM FCON/ESA on 4.3.0; (v) = VMPRF; (R) = RTM

With z/VM 4.3.0, the 2084-316 measurement experienced severe spin lock serialization in the z/VM scheduler and did not scale very well compared to the 2084-308 measurement. The 2084-316 measurement provided an ETR of 0.99 times and an ITR of 1.05 times the 2084-308 measurement. The processor utilization was less than 100%. Milliseconds of CP time per transaction increased by 333% between the 8-way and 16-way measurements. The spin lock percentage was 33% compared to about 1% for the 8-way 2064-2C8 measurement shown in Table 2.
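The scaling figures quoted above can be re-derived from the rounded values in Table 3. The following is a minimal sketch with illustrative variable names. The "scaling efficiency" it computes (16-way ITR divided by twice the 8-way ITR) is a convenience measure introduced here and is not a metric defined by this report.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Rounded values copied from Table 3.
cp_8way, cp_16way = 0.560, 2.430        # CP CPU/Tx (msec per transaction)
itr_8way, itr_16way = 4479.30, 4719.06  # ITR

cp_increase = pct_change(cp_8way, cp_16way)
itr_ratio = itr_16way / itr_8way
# Hypothetical helper metric: 1.0 would mean perfect doubling from 8 to 16 processors.
scaling_efficiency = itr_16way / (2 * itr_8way)

print(f"CP CPU/Tx increase: {cp_increase:.1f}%")
print(f"ITR ratio (16-way / 8-way): {itr_ratio:.2f}")
print(f"ITR per processor: {itr_8way / 8:.1f} (8-way) vs {itr_16way / 16:.1f} (16-way)")
print(f"Scaling efficiency: {scaling_efficiency:.2f}")
```

With the rounded table values the CP time increase comes out very close to the 333% quoted in the text; the small difference is presumably due to rounding in the published figures.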

Release comparison of z/VM 4.4.0 and z/VM 4.3.0 on z990 

Since the 16-way measurement on 4.3.0 was affected by the spin lock, it was repeated on z/VM 4.4.0. The 16-way results are summarized in Table 4.

Table 4. Benefit of z/VM 4.4.0


z/VM System Level             4.3.0 SLU 0000   4.4.0 SLU 0000
----------------------------  --------------   --------------
Servers                       297              360
Server Threads                297              360
Client Threads                297              360

Run ID                        E3503BV2         E3521BV1

Avg Spin Lock Rate (v)        13174            5772
Spin Time (v)                 24.989           8.984
Pct Spin Time (v)             32.92            5.186

ETR                           4426.48          7696.83
ETR Ratio                     1.00             1.73
ITR (p)                       4719.06          8294.00
ITR Ratio (p)                 1.00             1.75
Processor Utilization (p)     93.80            92.80

Total CPU/Tx (p)              3.391            1.929
Emul CPU/Tx (p)               0.961            1.083
CP CPU/Tx (p)                 2.430            0.846

Note: Workload: Linux OpenSSL Exerciser; z990 2084-316; LPAR with 16 dedicated processors; 12 PCICA Cards; 3 TCP/IP Virt Machs; 3 Client Machines connected by CTCs (1/XZ7(4 CTC, 1/112(6 CTC, 1/116(6 CTC); (p) = z/VM Perfkit on 4.4.0 and z/VM FCON/ESA on 4.3.0; (v) = VMPRF

On the 2084-316, z/VM 4.4.0 provided a 73% ETR improvement and a 75% ITR improvement over z/VM 4.3.0. The spin lock percentage was about 5%, compared to about 33% for the 4.3.0 measurement, and milliseconds of CP time per transaction decreased by 65%. This spin lock improvement is discussed in the section titled Scheduler Lock Improvement.

Despite this large improvement, the z/VM 4.4.0 guest measurement was still limited by z/VM spin lock serialization, and processor utilization was less than 100%.
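One way to see where the z/VM 4.4.0 benefit comes from is to split the Table 4 CPU time per transaction into its emulation and CP components. The sketch below uses only the rounded table values; the dictionary layout and names are illustrative assumptions.

```python
# Rounded values copied from Table 4 (msec of CPU time per transaction).
cpu_per_tx = {
    "4.3.0": {"total": 3.391, "emul": 0.961, "cp": 2.430},
    "4.4.0": {"total": 1.929, "emul": 1.083, "cp": 0.846},
}

cp_share = {}
for level, t in cpu_per_tx.items():
    # Total CPU/Tx should be the sum of the emulation and CP components.
    assert abs(t["emul"] + t["cp"] - t["total"]) < 0.001
    cp_share[level] = t["cp"] / t["total"] * 100.0
    print(f"z/VM {level}: CP share of CPU/Tx = {cp_share[level]:.1f}%")

cp_old = cpu_per_tx["4.3.0"]["cp"]
cp_new = cpu_per_tx["4.4.0"]["cp"]
cp_decrease = (cp_old - cp_new) / cp_old * 100.0
print(f"CP CPU/Tx decrease: {cp_decrease:.1f}%")
```

The emulation time per transaction changes little between the two releases, so the reduction in total CPU time per transaction is almost entirely a reduction in CP time, consistent with the spin lock improvement described above.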