
Guest Cryptographic Enhancements

This section summarizes the results of new measurements designed to characterize the performance of the enhanced cryptographic support provided in z/VM 5.2.0.

Introduction

z/VM 5.2.0 extended the existing shared and dedicated cryptographic queue support to include the Cryptographic Express2 coprocessor (CEX2C) on the z990 and z9 processors and the Cryptographic Express2 Accelerator coprocessor (CEX2A) on the z9 processor.

Support for the CEX2C card is also available in z/VM 5.1.0, and support for the CEX2A card was provided for z/VM 5.1.0 via APAR VM63646.

The existing z/VM 5.1.0 support, component terminology, and measurement methodology are described in z990 Guest Crypto Enhancements.

z/VM 5.2.0 also provided support for the CHSC Store Crypto-Measurement data command. These Crypto-Measurement data, along with z/VM internal data, are now included in the z/VM monitor data. See Monitor Enhancements for details.

In addition to these z/VM enhancements, z/OS provided support for additional features of the CP Assist for Cryptographic Functions (CPACF) that are available on the z9.

Summary of Results

The results of individual measurements are affected by the cryptographic card configuration, processor configuration, cryptographic sharing configuration, cryptographic operations, the guest operating system, and the cipher for the SSL workloads.

For dedicated cryptographic cards, both z/OS and Linux route cryptographic operations to all available cards, and each can use the full capacity of the encryption facilities unless limited by processor utilization or other serialization. z/VM guest measurements are generally limited by the same factor as the corresponding native measurement. All of the Linux guest SSL measurements with dedicated cryptographic cards are limited by processor utilization. The z/OS guest SSL measurements are limited by some undetermined serialization. The z/OS guest ICSF measurements are nearly identical to the native z/OS measurements, and each is limited by the same factor as the native measurement.

For shared cryptographic cards, z/VM routes cryptographic operations to all available real cryptographic cards. For a single guest, the external throughput rate is limited by whichever of the following is reached first: the throughput obtainable through the 8 virtual queues, the maximum capacity of the encryption facilities, processor utilization, or other serialization. Examples of external throughput rates limited by the 8 virtual queues and by the maximum throughput rate of the real cryptographic cards are included in the detailed measurement section. Processor time per transaction is higher with shared cryptographic cards than with dedicated cryptographic cards.

With a sufficient number of Linux guests, the shared cryptographic support will reach 100% utilization of the real cryptographic configuration unless processor utilization or other serialization becomes the limiting factor. All of the multiple guest measurements included in the detailed measurement section are limited by processor utilization.

Results for measurements of the SSL workload vary by SSL cipher. For the SSL workload, CEX2C and CEX2A cards are used only for the SSL handshake. Data encryption using the specified cipher is handled by software encryption routines or by CPACF. The detailed measurement section contains both z/OS and Linux results by SSL cipher. Ratios between ciphers vary depending on the guest operating system and the processor model.

Linux Guest Crypto on z990 and z9

The z/VM 4.3.0 section titled Linux Guest Crypto Support describes the original cryptographic support and the original methodology. The z/VM 4.4.0 section titled Linux Guest Crypto on z990 and the z/VM 5.1.0 section titled Linux Guest Crypto on z990 describe additional cryptographic support and methodology.

Measurements were completed using the Linux OpenSSL Exerciser Performance Workload described in Linux OpenSSL Exerciser. The specific parameters used are listed in the common items table, in the table columns, or in the table footnotes.

Items common to the measurements in this section are summarized in Table 1.

Table 1. Common items for measurements in this section

Client Authentication               No
Server SID Cache                    Disabled
Client SID Cache                    0
Client SID Timeout                  180
Connections                         1
Packets                             1
Send Bytes                          2048
Receive Bytes                       2048
Key Size                            1024
z/VM System Level                   5.2.0 GA
Linux System Level                  SLES9 SP2
Linux Kernel Level                  2.6.13-14
OpenSSL Code Level                  0.9.7C
z90Crypt Level                      1.2.2
Real Connectivity (1 Guest)         Linux TCP/IP
Real Connectivity (30 Guests)       z/VM TCP/IP
Virtual Connectivity                Guest LAN QDIO
TCP/IP Virt Mach Size               256M
TCP/IP Virt Mach Mode               UP
Guest Type                          V=V
Linux Kernel Mode                   MP
Linux Kernel Size                   64-bit
Linux Virt Mach Size (1 Guest)      2G
Linux Virt Mach Size (30 Guests)    128M
Linux Virt Processors (1 Guest)     4
Linux Virt Processors (30 Guests)   1
Servers                             100-280
Server Threads                      100-280
Client Threads                      100-280
Client Machines                     1-2
Server Model                        z990, z9 dedicated LPAR
Client Model                        z990
Connective Paths                    CTCs

Dedicated and Shared CEX2C cards on z990 

Table 2 contains a summary of the results from measurements using 6 CEX2C cards including dedicated cards for a single guest, shared cards for a single guest, and shared cards for 30 guests.

For dedicated CEX2C cards, a single Linux guest routes cryptographic operations to all dedicated cards. A single guest can obtain the maximum throughput rate of the real cards unless processor utilization becomes a limit. The measurement with 6 dedicated CEX2C cards is limited by nearly 100% processor utilization while the utilization of the CEX2C cards was only 80%.

For shared CEX2C cards, z/VM routes cryptographic operations to all available real cards. For a single guest, the total external throughput rate is determined by the amount that can be obtained through the 8 virtual queues or the maximum throughput rate of the real cards. The single user measurement with 6 shared CEX2C cards is limited by the 8 virtual queues with processor utilization of 74% and CEX2C utilization of 54%.

Processor time per transaction is higher with shared cards than with dedicated cards. In the single-user measurements with 6 CEX2C cards, processor time per transaction with shared cards was 23% higher than with dedicated cards.

With a sufficient number of Linux guests, the shared cryptographic support will reach 100% utilization of the real CEX2C cards unless 100% processor utilization becomes the limiting factor. The 30 user measurement with 6 shared CEX2C cards is limited by nearly 100% processor utilization while the utilization of the CEX2C cards was only 56%.

Processor time per transaction is higher with 30 guests than with a single guest. In the measurements with 6 CEX2C cards, processor time per transaction with 30 guests increased 16% from the measurement with a single guest.

Table 2. Dedicated and Shared CEX2C cards by number of Linux guests

Dedicated CEX2C cards   6           0           0
Shared CEX2C cards      0           6           6
No. Linux Guests        1           1           30
Run ID                  E5723BV3    E5724BV1    E5727BV1
Tx/sec (w)              4574.52     2805.18     3205.70
Total Util/Proc (p)     99.2        74.6        99.2
Total msec/Tx (p)       0.867       1.064       1.238
CEX2C Utilization (m)   80          54          56
Notes: z/VM 520 GA; 2084-324; 4 dedicated processors; 4G central storage; no expanded storage; workload: SSL Exerciser; RC4 MD5 US cipher; no session id caching; (w) = test case; (p) = z/VM Performance Toolkit (Perfkit); (m) = z/VM Monitor
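The Total msec/Tx column in Table 2 is consistent with the other two Perfkit columns: with 4 dedicated processors, msec/Tx equals the processor milliseconds consumed per second (Util/Proc percent of 4 processors) divided by Tx/sec. That relationship is inferred from the table values rather than stated by the tools. The following Python sketch applies it to the three runs and recomputes the 23% and 16% processor-time increases discussed above; the helper names are illustrative only.

```python
# Table 2 values: z/VM 5.2.0 on a 2084-324 with 4 dedicated processors.
PROCESSORS = 4

# run_id: (Tx/sec, Total Util/Proc in percent, reported Total msec/Tx)
RUNS = {
    "E5723BV3": (4574.52, 99.2, 0.867),  # 6 dedicated CEX2C cards, 1 guest
    "E5724BV1": (2805.18, 74.6, 1.064),  # 6 shared CEX2C cards, 1 guest
    "E5727BV1": (3205.70, 99.2, 1.238),  # 6 shared CEX2C cards, 30 guests
}


def msec_per_tx(tx_per_sec: float, util_pct: float, processors: int = PROCESSORS) -> float:
    """Processor milliseconds consumed per transaction.

    util_pct is the average busy percentage per processor, so the system
    consumes util_pct / 100 * processors * 1000 CPU milliseconds each second.
    """
    cpu_msec_per_sec = util_pct / 100.0 * processors * 1000.0
    return cpu_msec_per_sec / tx_per_sec


def pct_increase(new: float, old: float) -> float:
    """Percentage by which new exceeds old."""
    return (new / old - 1.0) * 100.0


computed = {rid: msec_per_tx(tx, util) for rid, (tx, util, _) in RUNS.items()}
for rid, (_, _, reported) in RUNS.items():
    print(f"{rid}: computed {computed[rid]:.3f} msec/Tx, reported {reported:.3f}")

shared_vs_dedicated = pct_increase(computed["E5724BV1"], computed["E5723BV3"])
guests30_vs_1 = pct_increase(computed["E5727BV1"], computed["E5724BV1"])
print(f"shared vs dedicated (1 guest): +{shared_vs_dedicated:.0f}%")
print(f"30 guests vs 1 guest (shared): +{guests30_vs_1:.0f}%")
```

Recomputing from Util/Proc and Tx/sec rather than reading msec/Tx directly makes it easy to confirm that the reported per-transaction costs and the quoted percentage increases agree with the raw utilization data.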

Shared CEX2C cards on z990 with 30 Linux guests by SSL cipher 

Table 3 contains a summary of the results from measurements using 6 CEX2C shared cards for 30 guests and various SSL ciphers.

With 6 CEX2C cards, measurements for all four ciphers are limited by nearly 100% processor utilization. The external throughput rates vary by more than 40%, with the DES SHA cipher providing the highest rate and the AES-256 SHA US cipher providing the lowest rate. The AES-256 SHA US cipher achieved an external throughput rate of 0.71 times the external throughput rate achieved with the DES SHA cipher.

Results with the RC4 MD5 US and DES SHA ciphers are nearly identical to the results of similar measurements using PCIXCC cards, shown in the table Shared PCIXCC and PCICA cards with 30 Linux guest by SSL cipher, in the z/VM 5.1.0 section titled z990 Guest Crypto Enhancements.

Table 3. Six shared CEX2C cards with 30 Linux guests by cipher

Cipher                RC4 MD5 US      DES SHA         AES-128 SHA US  AES-256 SHA US
Run ID                E5727BV1        E5728BV1        E5731BV1        E5731BV2
Tx/sec (w)            3205.70         3632.55         2783.58         2576.08
Total Util/Proc (p)   99.2            98.9            99.2            99.4
Total msec/Tx (p)     1.238           1.089           1.426           1.543
Notes: z/VM 520 GA; 2084-324; 4 dedicated processors; 4G central storage; no expanded storage; workload: SSL Exerciser; no session id caching; (w) = workload; (p) = z/VM Perfkit
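The cipher comparison above reduces to two numbers: the ratio of the lowest rate to the highest rate, and the percentage by which the highest rate exceeds the lowest (the reading of "vary by" assumed here). A short Python sketch computing both from Table 3 follows; `cipher_spread` is a hypothetical helper, not part of any workload tooling.

```python
# Table 3: six shared CEX2C cards, 30 Linux guests, z990 (Tx/sec by cipher)
TX_PER_SEC = {
    "RC4 MD5 US": 3205.70,
    "DES SHA": 3632.55,
    "AES-128 SHA US": 2783.58,
    "AES-256 SHA US": 2576.08,
}


def cipher_spread(rates: dict[str, float]) -> tuple[str, str, float, float]:
    """Return (highest cipher, lowest cipher, lowest/highest ratio,
    percent by which the highest rate exceeds the lowest)."""
    best = max(rates, key=rates.get)
    worst = min(rates, key=rates.get)
    ratio = rates[worst] / rates[best]
    spread_pct = (rates[best] / rates[worst] - 1.0) * 100.0
    return best, worst, ratio, spread_pct


best, worst, ratio, spread_pct = cipher_spread(TX_PER_SEC)
print(f"highest: {best}; lowest: {worst}")
print(f"{worst} rate = {ratio:.2f} x {best} rate")
print(f"highest exceeds lowest by {spread_pct:.0f}%")
```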

Dedicated and Shared CEX2A cards on z9 

Table 4 contains a summary of the results from z9 measurements using CEX2A cards including 3 dedicated cards for a single guest, 1 shared card for a single guest, 3 shared cards for a single guest, and 3 shared cards for 30 guests.

For dedicated CEX2A cards, a single Linux guest routes cryptographic operations to all dedicated cards. A single guest can obtain the maximum throughput rate of the real cards unless processor utilization becomes a limit. The measurement with 3 dedicated CEX2A cards is limited by nearly 100% processor utilization while the utilization of the CEX2A cards was only 60%.

For shared CEX2A cards, z/VM routes cryptographic operations to all available real cards. For a single guest, the total external throughput rate is determined by the amount that can be obtained through the 8 virtual queues or the maximum throughput rate of the real cards.

The single user measurement with 1 shared CEX2A card is limited by the CEX2A utilization of 97% while processor utilization was 70%. The single user measurement with 3 shared CEX2A cards is limited by the 8 virtual queues with processor utilization of 92% and CEX2A utilization of 50%.

Processor time per transaction is higher with shared cards than with dedicated cards. In the single-user measurements with 3 CEX2A cards, processor time per transaction with shared cards was 12% higher than with dedicated cards.

With a sufficient number of Linux guests, the shared cryptographic support will reach 100% utilization of the real CEX2A cards unless 100% processor utilization becomes the limiting factor. The 30 user measurement with 3 shared CEX2A cards is limited by nearly 100% processor utilization while the utilization of the CEX2A cards was only 45%.

Processor time per transaction is higher with 30 guests than with a single guest. In the measurements with 3 CEX2A cards, processor time per transaction with 30 guests increased 16% from the measurement with a single guest.

Table 4. Dedicated and Shared CEX2A cards on z9 by number of Linux guests

Dedicated CEX2A cards   3           0           0           0
Shared CEX2A cards      0           1           3           3
No. Linux Guests        1           1           1           30
Run ID                  E5B09BV1    E6114BV1    E6118BV3    E6128BV5
Tx/sec (w)              5584.15     3141.89     4657.00     4305.92
Total Util/Proc (p)     99.1        70.2        92.6        99.2
Total msec/Tx (p)       0.710       0.894       0.795       0.922
CEX2A Utilization (m)   60          97          50          45
Notes: z/VM 520 GA; 2094-734; 4 dedicated processors; 5G central storage; 4G expanded storage; workload: SSL Exerciser; RC4 MD5 US cipher; no session id caching; (w) = workload; (p) = z/VM Perfkit; (m) = z/VM Monitor
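The limiting-factor reasoning used for the CEX2A runs can be written as a simple rule: if the processors are saturated, the run is processor-limited; otherwise, if the cards are saturated, it is card-limited; otherwise something else (for a single guest with shared cards, the 8 virtual queues) serializes the work. The hypothetical Python sketch below applies that rule to the Table 4 runs. The 95% saturation threshold is an assumption chosen for illustration, not a value from the measurements, and the rule cannot by itself distinguish the virtual-queue limit from other serialization.

```python
from dataclasses import dataclass

# Assumed cutoff for "saturated"; illustrative only, not taken from the report.
SATURATED_PCT = 95.0


@dataclass
class Run:
    run_id: str
    description: str
    proc_util_pct: float  # Total Util/Proc (p)
    card_util_pct: float  # CEX2A Utilization (m)


def limiting_factor(run: Run, saturated: float = SATURATED_PCT) -> str:
    if run.proc_util_pct >= saturated:
        return "processor utilization"
    if run.card_util_pct >= saturated:
        return "cryptographic card capacity"
    # Neither resource is saturated, so something else serializes the work,
    # such as the 8 virtual queues available to a single guest.
    return "virtual queues or other serialization"


TABLE_4 = [
    Run("E5B09BV1", "3 dedicated CEX2A, 1 guest", 99.1, 60.0),
    Run("E6114BV1", "1 shared CEX2A, 1 guest", 70.2, 97.0),
    Run("E6118BV3", "3 shared CEX2A, 1 guest", 92.6, 50.0),
    Run("E6128BV5", "3 shared CEX2A, 30 guests", 99.2, 45.0),
]

for run in TABLE_4:
    print(f"{run.run_id} ({run.description}): {limiting_factor(run)}")
```

With these inputs the rule reproduces the conclusions stated for each run: processor-limited for the dedicated and 30-guest runs, card-limited for the single shared card, and queue-limited for a single guest on 3 shared cards.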

Shared CEX2A cards on z9 with 30 Linux guests by SSL cipher 

Table 5 contains a summary of the results from z9 measurements using 3 CEX2A shared cards for 30 guests and various SSL ciphers.

With 3 CEX2A cards, measurements for all five ciphers are limited by nearly 100% processor utilization. The external throughput rates vary by about 31%, with the DES SHA cipher providing the highest rate and the AES-256 SHA US cipher providing the lowest rate. The AES-256 SHA US cipher achieved an external throughput rate of 0.76 times the external throughput rate achieved with the DES SHA cipher.

For every cipher measured on both processors, the external throughput rate is higher than in the corresponding z990 measurement with CEX2C cards because of the faster z9 processor.

Table 5. Three shared CEX2A cards with 30 Linux guests by cipher

Cipher                RC4 MD5 US      DES SHA         TDES SHA US     AES-128 SHA US  AES-256 SHA US
Run ID                E6128BV5        E6128BV6        E6129BV1        E6129BV2        E6130BV1
Tx/sec (w)            4305.92         4805.16         4704.17         3867.81         3671.86
Total Util/Proc (p)   99.2            98.9            99.0            99.4            99.5
Total msec/Tx (p)     0.922           0.823           0.842           1.028           1.084
Notes: z/VM 520 GA; 2094-734; 4 dedicated processors; 5G central storage; 4G expanded storage; workload: SSL Exerciser; no session id caching; (w) = workload; (p) = z/VM Perfkit
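A Python sketch of the z9 cipher comparison follows. It computes the AES-256 SHA US to DES SHA ratio and the spread between the highest and lowest rates from Table 5, then divides each z9 rate by the z990 rate from Table 3 for the four ciphers measured on both systems (TDES was not measured on the z990). The card configurations differ (3 shared CEX2A versus 6 shared CEX2C), so the per-cipher ratios are only a rough indication; both sets of runs are reported as processor-limited.

```python
# Table 5: three shared CEX2A cards, 30 Linux guests, z9 (Tx/sec by cipher)
Z9 = {
    "RC4 MD5 US": 4305.92,
    "DES SHA": 4805.16,
    "TDES SHA US": 4704.17,
    "AES-128 SHA US": 3867.81,
    "AES-256 SHA US": 3671.86,
}
# Table 3: six shared CEX2C cards, 30 Linux guests, z990 (no TDES run)
Z990 = {
    "RC4 MD5 US": 3205.70,
    "DES SHA": 3632.55,
    "AES-128 SHA US": 2783.58,
    "AES-256 SHA US": 2576.08,
}

best = max(Z9, key=Z9.get)
worst = min(Z9, key=Z9.get)
ratio = Z9[worst] / Z9[best]
spread_pct = (Z9[best] / Z9[worst] - 1.0) * 100.0
print(f"z9: {worst} / {best} = {ratio:.2f}; spread {spread_pct:.1f}%")

# Compare only the ciphers measured on both processors.
z9_over_z990 = {cipher: Z9[cipher] / Z990[cipher] for cipher in Z990}
for cipher, r in z9_over_z990.items():
    print(f"{cipher}: z9 / z990 throughput = {r:.2f}")
```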

z/OS Guest with CEX2A on z9

The z/VM 5.1.0 section titled z/OS Guest Crypto on z990 describes the original cryptographic support and the original methodology.

Guest versus native for z/OS ICSF with 1 dedicated CEX2A card on a z9

Measurements were completed using the z/OS ICSF Performance Workload PCXA sweep described in z/OS Integrated Cryptographic Service Facility (ICSF) Performance Workload. ICSF test cases developed for the PCICA card execute on the CEX2A card, and ICSF test cases developed for the PCIXCC card execute on the CEX2C card.

The external throughput rates achieved by a z/OS guest using the z/VM dedicated cryptographic support are nearly identical to z/OS native for all measured test cases. In all 56 individual test case comparisons, the guest rate was within 3% of the native rate.

The 56 individual test cases produced too much data to include in this report, but Table 6 summarizes the guest-to-native throughput ratios for all measurements.

Multiple jobs provided a higher external throughput rate than a single job for every test case, although the specific ratios varied dramatically by test case. The number of jobs in the multiple-job measurements is sufficient to reach the full capacity of the specified encryption facility.

Table 6. Guest to Native Throughput Ratio for z/OS ICSF PCXA Sweep

CEX2A             8           8
Jobs              1           8
PCXA Sweep (28 test cases)
Run ID (Native)   E5831IW1    E5901IW1
Run ID (Guest)    E5905IV1    E5905IV2
Ratio (Average)   0.990       0.985
Ratio (Minimum)   0.988       0.974
Ratio (Maximum)   0.992       1.000
Notes: 2094-738; 1 dedicated processor; 4G central storage; no expanded storage; workload: z/OS ICSF Sweeps; ratios are calculated from workload data
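Because Table 6 reports the minimum and maximum guest-to-native ratio over each 28-test-case sweep, the "within 3%" statement can be checked from the summary alone. A minimal Python sketch of that check:

```python
# Table 6: guest/native throughput ratios over each 28-test-case PCXA sweep
RATIOS = {
    "1 job": {"average": 0.990, "minimum": 0.988, "maximum": 0.992},
    "8 jobs": {"average": 0.985, "minimum": 0.974, "maximum": 1.000},
}

# The minimum and maximum bound every individual test case in a sweep,
# so the largest deviation from 1.0 among them bounds all 56 comparisons.
worst_gap_pct = max(
    abs(1.0 - RATIOS[jobs][bound]) * 100.0
    for jobs in RATIOS
    for bound in ("minimum", "maximum")
)
print(f"largest guest/native deviation: {worst_gap_pct:.1f}%")
```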

Guest versus native for z/OS SSL with 8 dedicated CEX2A cards on a z9

Measurements were completed for a data exchange test case with both servers and clients on the same z/OS system, using the z/OS SSL Performance Workload described in z/OS Secure Sockets Layer (System SSL) Performance Workload. The specific parameters used are listed in the table columns and table footnotes.

Table 7 contains a summary of results for the dedicated guest cryptographic support and native z/OS measurements.

With 8 dedicated CEX2A cards, the native measurement is limited by nearly 100% processor utilization. The guest measurement achieved only 90% processor utilization and 14% CEX2A utilization and appears to be limited by some undetermined system serialization. The guest measurement achieved an external throughput rate of 0.83 times the native measurement. Processor time per transaction for the guest measurement is 8% higher than for the native measurement.

Table 7. Guest versus native for z/OS System SSL

Type                          Native      Guest
Dedicated CEX2A cards         8           8
Run ID                        E5831BE3    E5904VE5
Tx/sec (w)                    2575.58     2149.50
Total Util/Proc (f)           99.93       na
Total Util/Proc (p)           na          90.0
Total msec/Tx (u)             1.55        na
Total msec/Tx (p)             na          1.675
CEX2A Utilization (f)         16          na
CEX2A Utilization (m)         na          14
Tx/sec Ratio (Guest/Native)   -           0.835
Notes: z/VM 5.2.0 GA; z/OS V1R6 with updated ICSF code; 2094-738; 4 dedicated processors; 4G central storage; no expanded storage; workload: SSL Exerciser; no client authentication; 1 connection; 1 packet; 2048 bytes each way; 1024-bit keys; AES-256 SHA US cipher; no session id caching; 2 servers; 40 server threads; 40 client threads; server SID cache = 16000; client SID cache = 0; server SID timeout = 180; client SID timeout = 180; (w) = workload; (u) = z/OS RMF & workload RUN-DATA; (f) = z/OS RMF; (p) = z/VM Perfkit; (m) = z/VM Monitor
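As a check on Table 7, the following Python sketch recomputes the 0.835 guest-to-native throughput ratio and the roughly 8% processor-time increase. It derives msec/Tx from Util/Proc and Tx/sec on the assumption, consistent with the Perfkit-based tables in this section, that msec/Tx is the processor milliseconds consumed per second (Util/Proc percent of 4 processors) divided by Tx/sec. The native msec/Tx in the table came from RMF and workload data, so the recomputed native value is close to, but not exactly, 1.55.

```python
PROCESSORS = 4  # 4 dedicated processors, per the Table 7 notes

native_tx, guest_tx = 2575.58, 2149.50  # Tx/sec (w)
native_util, guest_util = 99.93, 90.0   # average percent busy per processor


def msec_per_tx(tx_per_sec: float, util_pct: float, processors: int = PROCESSORS) -> float:
    """CPU milliseconds per transaction from utilization and throughput."""
    return util_pct / 100.0 * processors * 1000.0 / tx_per_sec


throughput_ratio = guest_tx / native_tx
native_msec = msec_per_tx(native_tx, native_util)
guest_msec = msec_per_tx(guest_tx, guest_util)
cpu_increase_pct = (guest_msec / native_msec - 1.0) * 100.0

print(f"guest/native throughput: {throughput_ratio:.3f}")
print(f"msec/Tx: native {native_msec:.3f}, guest {guest_msec:.3f}")
print(f"guest processor time per transaction: +{cpu_increase_pct:.0f}%")
```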

Dedicated CEX2A cards on a z9 with z/OS by SSL cipher 

Table 8 contains a summary of the results from z9 measurements using 8 dedicated CEX2A cards for a z/OS guest and various SSL ciphers.

With 8 CEX2A cards, the measured external throughput rates for all five ciphers varied by less than 6%, with the DES SHA cipher providing the highest rate and the RC4 MD5 US cipher providing the lowest rate. Native z/OS measurements, not included in this report, showed up to 50% improvement using the new CPACF support for the AES ciphers. The AES-256 SHA US cipher achieved an external throughput rate of 0.97 times the external throughput rate achieved with the DES SHA cipher. This ratio is much better than the corresponding ratios reported in the Linux section and demonstrates that the z/OS guest receives a benefit similar to native z/OS from the new CPACF support provided by z/OS.

Because neither processor utilization nor CEX2A utilization reaches 100%, all of these measurements are limited by the undetermined system serialization.

Table 8. Eight dedicated CEX2A cards with one z/OS guest by cipher

Cipher                RC4 MD5 US      DES SHA         TDES SHA US     AES-128 SHA US  AES-256 SHA US
Run ID                E5904VE1        E5904VE2        E5904VE3        E5904VE4        E5904VE5
Tx/sec (w)            2101.33         2224.92         2202.33         2221.92         2149.50
Total Util/Proc (p)   92.7            82.8            83.9            83.2            90.0
Total msec/Tx (p)     1.765           1.489           1.524           1.498           1.675
Notes: z/VM 520 GA; 2094-734; 4 dedicated processors; 5G central storage; 4G expanded storage; workload: SSL Exerciser; no session id caching; (w) = workload; (p) = z/VM Perfkit
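The z/OS cipher comparison can be reproduced from Table 8 with a short Python sketch. It computes the spread between the highest and lowest rates and the AES-256 SHA US to DES SHA ratio (about 0.966), and sets that ratio beside the same ratio for the Linux guests on the z9 from Table 5. The Linux runs used 3 shared cards and 30 guests rather than 8 dedicated cards and one guest, so this is a loose comparison of cipher cost, not of configurations.

```python
# Table 8: eight dedicated CEX2A cards, one z/OS guest, z9 (Tx/sec by cipher)
ZOS_GUEST = {
    "RC4 MD5 US": 2101.33,
    "DES SHA": 2224.92,
    "TDES SHA US": 2202.33,
    "AES-128 SHA US": 2221.92,
    "AES-256 SHA US": 2149.50,
}
# Table 5 (Linux guests on the z9): AES-256 SHA US and DES SHA rates
LINUX_AES256, LINUX_DES = 3671.86, 4805.16

best = max(ZOS_GUEST, key=ZOS_GUEST.get)
worst = min(ZOS_GUEST, key=ZOS_GUEST.get)
spread_pct = (ZOS_GUEST[best] / ZOS_GUEST[worst] - 1.0) * 100.0

# AES-256 relative to DES SHA: z/OS guest versus Linux guests
zos_aes256_vs_des = ZOS_GUEST["AES-256 SHA US"] / ZOS_GUEST["DES SHA"]
linux_aes256_vs_des = LINUX_AES256 / LINUX_DES

print(f"z/OS guest: highest {best}, lowest {worst}, spread {spread_pct:.1f}%")
print(f"AES-256/DES ratio: z/OS guest {zos_aes256_vs_des:.3f}, "
      f"Linux {linux_aes256_vs_des:.3f}")
```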
