Guest Cryptographic Enhancements
This section summarizes the results of new measurements designed to
characterize the performance of the enhanced cryptographic support
provided in z/VM 5.2.0.
Introduction
z/VM 5.2.0 extended the existing shared and dedicated cryptographic queue
support to include the
Cryptographic Express2 coprocessor (CEX2C) on the z990
and z9 processors
and the
Cryptographic Express2 Accelerator coprocessor (CEX2A) on the z9
processor.
Support of the CEX2C card is also available in z/VM 5.1.0
and support of the CEX2A card was provided on
z/VM 5.1.0 via APAR VM63646.
The existing z/VM 5.1.0
support, component terminology, and measurement
methodology are described in
z990 Guest Crypto Enhancements.
z/VM 5.2.0 also
provided support for the CHSC Store Crypto-Measurement data command.
This Crypto-Measurement data along with z/VM internal data are now
included in z/VM monitor data. See
Monitor Enhancements
for details.
In addition to these z/VM enhancements, z/OS provided support for
additional features of the
CP Assist for Cryptographic Functions (CPACF) that are available on
the z9.
Summary of Results
The results of individual measurements
are affected by the cryptographic card configuration, the processor
configuration, the cryptographic sharing configuration, the cryptographic
operations, the guest operating system, and the
cipher for the SSL workloads.
For dedicated cryptographic cards, both
z/OS and Linux will route cryptographic operations
to all available cards
and each can use the full capacity of the encryption facilities unless
limited by
processor utilization or other serialization.
z/VM guest measurements are generally limited by the same factor as a
native measurement.
All of the
Linux guest SSL
measurements with dedicated cryptographic
cards are limited by processor utilization.
The z/OS guest SSL measurements are limited by some undetermined serialization.
The z/OS guest ICSF measurements are nearly identical to the native z/OS
measurements, and each is limited by the same factor as the native measurement.
For shared cryptographic cards,
z/VM routes cryptographic
operations
to all available real cryptographic cards.
For a single guest, the external throughput
rate is determined by
the amount that can be obtained through
the 8 virtual queues,
the maximum capacity of the encryption facilities,
processor utilization, or other serialization.
Examples of external throughput rates limited by
the 8 virtual queues and by the
maximum throughput
rate of the real cryptographic
cards are included in the detailed measurement section.
Processor time per transaction is higher with shared cryptographic
cards than with dedicated cryptographic cards.
With a sufficient number of Linux guests, the shared cryptographic
support will reach 100% utilization of the real cryptographic
configuration
unless processor utilization or other serialization
becomes the limiting factor.
All of
the multiple
guest measurements included in the detailed measurement section
are limited by processor utilization.
Results for measurements of the SSL workload
vary by SSL cipher.
For the SSL workload, CEX2C
and CEX2A cards are used only for the SSL handshake. Data encryption
using the specified cipher is handled by software encryption routines or
by CPACF. The detailed measurement section contains both z/OS and Linux
results by SSL cipher. Ratios between ciphers vary depending on the guest
operating system and the processor model.
The z/VM 4.3.0 section titled
Linux Guest Crypto Support
describes the original cryptographic support and the original methodology.
The z/VM 4.4.0 section titled
Linux Guest Crypto on z990
and the z/VM 5.1.0 section titled
Linux Guest Crypto on z990
describe additional cryptographic support and methodology.
Measurements were completed using the
Linux OpenSSL Exerciser
Performance Workload
described in
Linux OpenSSL Exerciser.
Specific parameters used can be found
in the common items
table, various table columns, or table footnotes.
Items common to the measurements in this section are
summarized
in Table 1.
Table 1. Common items for measurements in this section
| Client Authentication | No |
| Server SID Cache | Disabled |
| Client SID Cache | 0 |
| Client SID Timeout | 180 |
| Connections | 1 |
| Packets | 1 |
| Send Bytes | 2048 |
| Receive Bytes | 2048 |
| Key Size | 1024 |
| z/VM System Level | 5.2.0 GA |
| Linux System Level | SLES9 SP2 |
| Linux Kernel Level | 2.6.13-14 |
| OpenSSL Code Level | 0.9.7C |
| z90Crypt Level | 1.2.2 |
| Real Connectivity (1 Guest) | Linux TCP/IP |
| Real Connectivity (30 Guests) | z/VM TCP/IP |
| Virtual Connectivity | Guest LAN QDIO |
| TCP/IP Virt Mach Size | 256M |
| TCP/IP Virt Mach Mode | UP |
| Guest Type | V=V |
| Linux Kernel Mode | MP |
| Linux Kernel Size | 64-bit |
| Linux Virt Mach Size (1 Guest) | 2G |
| Linux Virt Mach Size (30 Guests) | 128M |
| Linux Virt Processors (1 Guest) | 4 |
| Linux Virt Processors (30 Guests) | 1 |
| Servers | 100-280 |
| Server Threads | 100-280 |
| Client Threads | 100-280 |
| Client Machines | 1-2 |
| Server Model | z990,z9 dedicated LPAR |
| Client Model | z990 |
| Connective Paths | CTCs |
Dedicated and Shared CEX2C cards on z990:
Table 2
contains a summary of the
results from measurements using 6 CEX2C cards including
dedicated cards for a single guest, shared cards for a single guest, and
shared cards for 30 guests.
For dedicated CEX2C cards, a single Linux guest
routes cryptographic
operations to all dedicated cards.
A single guest
can obtain the
maximum throughput
rate of the real cards unless processor utilization becomes a
limit.
The measurement with 6 dedicated CEX2C cards
is limited by nearly 100% processor utilization while the
utilization of the CEX2C cards was only 80%.
For shared CEX2C cards, z/VM routes cryptographic
operations
to all available real cards.
For a single guest, the total external throughput
rate is determined by
the amount that can be obtained through the 8 virtual queues or the
maximum throughput
rate of the real cards.
The single user measurement with 6 shared CEX2C cards is limited by the
8 virtual queues with processor utilization of 74% and CEX2C utilization
of 54%.
Processor time per transaction is higher with shared cards than with
dedicated cards. In the single-guest measurements with 6 CEX2C cards,
processor time per transaction with shared cards increased 23% from the
measurement with dedicated cards.
With a sufficient number of Linux guests, the shared cryptographic
support will reach 100% utilization of the real CEX2C cards
unless 100% processor utilization
becomes the limiting factor.
The 30 user measurement with 6 shared CEX2C cards
is limited by nearly 100% processor utilization while the
utilization of the CEX2C cards was only 56%.
Processor time per transaction is higher with 30 guests than with
a single guest. In the measurements with 6 CEX2C cards,
processor time per transaction with 30 guests increased 16% from the
measurement with a single guest.
Table 2. Dedicated and Shared CEX2C cards by number of Linux guests
| Dedicated CEX2C cards | 6 | 0 | 0 |
| Shared CEX2C cards | 0 | 6 | 6 |
| No. Linux Guests | 1 | 1 | 30 |
| Run ID | E5723BV3 | E5724BV1 | E5727BV1 |
| Tx/sec (w) | 4574.52 | 2805.18 | 3205.70 |
| Total Util/Proc (p) | 99.2 | 74.6 | 99.2 |
| Total msec/Tx (p) | 0.867 | 1.064 | 1.238 |
| CEX2C Utilization (m) | 80 | 54 | 56 |
|
Notes:
z/VM 520 GA;
2084-324;
4 dedicated processors;
4G central storage;
no expanded storage;
workload: SSL Exerciser;
RC4 MD5 US cipher;
no session id caching;
(w) = test case;
(p) = z/VM Performance Toolkit (Perfkit)
(m) = z/VM Monitor
|
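The Total msec/Tx row of Table 2 appears to follow from the other two rows. The Python sketch below assumes that processor time per transaction equals the busy processor time accumulated per second (utilization per processor times the number of processors) divided by the transaction rate. This relationship is inferred from the table values rather than stated in the report, and the function name `cpu_msec_per_tx` is hypothetical.

```python
def cpu_msec_per_tx(util_pct_per_proc, n_procs, tx_per_sec):
    """Estimate processor milliseconds consumed per transaction.

    Assumption (inferred from the table, not stated in the report):
    busy processor-msec per elapsed second = util% / 100 * n_procs * 1000.
    """
    busy_msec_per_sec = util_pct_per_proc / 100.0 * n_procs * 1000.0
    return busy_msec_per_sec / tx_per_sec


# (Total Util/Proc, Tx/sec) pairs copied from Table 2.
# All three runs used 4 dedicated processors (see the table notes).
table2 = {
    "6 dedicated, 1 guest": (99.2, 4574.52),
    "6 shared, 1 guest": (74.6, 2805.18),
    "6 shared, 30 guests": (99.2, 3205.70),
}

for label, (util, rate) in table2.items():
    print(f"{label}: {cpu_msec_per_tx(util, 4, rate):.3f} msec/Tx")
```

Under this assumption, the computed values agree with the Total msec/Tx row of Table 2 to within rounding.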
Shared CEX2C cards on z990 with 30 Linux guests by SSL cipher:
Table 3
contains a summary of the
results from measurements using 6 CEX2C
shared cards for 30 guests and various SSL ciphers.
With 6 CEX2C
cards, measurements for all four ciphers shown are limited
by nearly 100% processor utilization. The external throughput
rates vary by more than 40%,
with the DES SHA cipher providing the highest rate and the AES-256
SHA US cipher providing the lowest rate.
The AES-256 SHA US cipher achieved an external throughput
rate of 0.71 times the external throughput
rate achieved
with the DES SHA cipher.
Results with the RC4 MD5 US and DES SHA ciphers
are nearly identical to
results for similar measurements using PCIXCC cards
shown in the table
Shared PCIXCC and PCICA cards with 30 Linux guest by SSL cipher
of the z/VM 5.1.0 section titled
z990 Guest Crypto Enhancements.
Table 3. Six shared CEX2C cards with 30 Linux guests by cipher
| cipher | RC4 MD5 US | DES SHA | AES-128 SHA US | AES-256 SHA US |
| Run ID | E5727BV1 | E5728BV1 | E5731BV1 | E5731BV2 |
| Tx/sec (w) | 3205.70 | 3632.55 | 2783.58 | 2576.08 |
| Total Util/Proc (p) | 99.2 | 98.9 | 99.2 | 99.4 |
| Total Msec/Tx (p) | 1.238 | 1.089 | 1.426 | 1.543 |
|
Notes:
z/VM 520 GA
2084-324;
4 dedicated processors;
4G central storage;
no expanded storage;
Workload: SSL exerciser;
no session id caching;
(w) = Workload,
(p) = z/VM Perfkit
|
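The cipher comparison above can be reproduced from the Tx/sec row of Table 3. The following Python sketch normalizes each rate to the fastest cipher and computes the spread between the highest and lowest rates. The dictionary and variable names are illustrative only.

```python
# Tx/sec values copied from Table 3 (6 shared CEX2C cards, 30 Linux guests).
tx_per_sec = {
    "RC4 MD5 US": 3205.70,
    "DES SHA": 3632.55,
    "AES-128 SHA US": 2783.58,
    "AES-256 SHA US": 2576.08,
}

fastest = max(tx_per_sec, key=tx_per_sec.get)
slowest = min(tx_per_sec, key=tx_per_sec.get)

# Throughput of each cipher relative to the fastest one.
relative = {name: rate / tx_per_sec[fastest] for name, rate in tx_per_sec.items()}

# Spread between the highest and lowest rates, as a percentage of the lowest.
spread_pct = (tx_per_sec[fastest] / tx_per_sec[slowest] - 1.0) * 100.0

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f} of {fastest}")
print(f"spread: {spread_pct:.0f}%")
```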
Dedicated and Shared CEX2A cards on z9:
Table 4
contains a summary of the
results from z9 measurements using CEX2A cards including
3 dedicated cards for a single guest,
1 shared card for a single guest,
3 shared cards for a single guest,
and
3 shared cards for 30 guests.
For dedicated CEX2A cards, a single Linux guest
routes cryptographic
operations to all dedicated cards.
A single guest
can obtain the
maximum throughput
rate of the real cards unless processor utilization becomes a
limit.
The measurement with 3 dedicated CEX2A cards
is limited by nearly 100% processor utilization while the
utilization of the CEX2A cards was only 60%.
For shared CEX2A cards, z/VM routes cryptographic
operations
to all available real cards.
For a single guest, the total external throughput
rate is determined by
the amount that can be obtained through the 8 virtual queues or the
maximum throughput
rate of the real cards.
The single user measurement with 1 shared CEX2A card is limited by the
CEX2A utilization of 97% while processor utilization was 70%.
The single user measurement with 3 shared CEX2A cards is limited by the
8 virtual queues with processor utilization of 92% and CEX2A utilization
of 50%.
Processor time per transaction is higher with shared cards than with
dedicated cards. In the single-guest measurements with 3 CEX2A cards,
processor time per transaction with shared cards (0.795 msec) increased 12% from the
measurement with dedicated cards (0.710 msec).
With a sufficient number of Linux guests, the shared cryptographic
support will reach 100% utilization of the real CEX2A cards
unless 100% processor utilization
becomes the limiting factor.
The 30 user measurement with 3 shared CEX2A cards
is limited by nearly 100% processor utilization while the
utilization of the CEX2A cards was only 45%.
Processor time per transaction is higher with 30 guests than with
a single guest. In the measurements with 3 CEX2A cards,
processor time per transaction with 30 guests increased 16% from the
measurement with a single guest.
Table 4. Dedicated and Shared CEX2A cards on z9 by number of Linux guests
| Dedicated CEX2A cards | 3 | 0 | 0 | 0 |
| Shared CEX2A cards | 0 | 1 | 3 | 3 |
| No. Linux Guests | 1 | 1 | 1 | 30 |
| Run ID | E5B09BV1 | E6114BV1 | E6118BV3 | E6128BV5 |
| Tx/sec (w) | 5584.15 | 3141.89 | 4657.00 | 4305.92 |
| Total Util/Proc (p) | 99.1 | 70.2 | 92.6 | 99.2 |
| Total msec/Tx (p) | 0.710 | 0.894 | 0.795 | 0.922 |
| CEX2A Utilization (m) | 60 | 97 | 50 | 45 |
|
Notes:
z/VM 520 GA;
2094-734;
4 dedicated processors;
5G central storage;
4G expanded storage;
workload: SSL Exerciser;
RC4 MD5 US cipher;
no session id caching;
(w) = workload,
(p) = z/VM Perfkit,
(m) = z/VM Monitor
|
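The reasoning used to name the limiting factor for each run in Table 4 can be sketched as a simple rule: a run is processor limited when processor utilization is close to 100%, card limited when card utilization is close to 100%, and otherwise limited by something else (for a single guest with shared cards, the 8 virtual queues). The Python sketch below is only an illustration; the function name and both thresholds are assumptions, not values taken from the report.

```python
def limiting_factor(proc_util_pct, card_util_pct,
                    proc_threshold=98.0, card_threshold=95.0):
    """Guess which resource limits a run.

    The thresholds are illustrative assumptions, not values from the report.
    """
    if proc_util_pct >= proc_threshold:
        return "processor"
    if card_util_pct >= card_threshold:
        return "cryptographic cards"
    # Neither resource is saturated: for shared cards the report attributes
    # this to the 8 virtual queues; otherwise to some other serialization.
    return "virtual queues or other serialization"


# (Total Util/Proc, CEX2A Utilization) pairs copied from Table 4.
runs = {
    "3 dedicated, 1 guest": (99.1, 60),
    "1 shared, 1 guest": (70.2, 97),
    "3 shared, 1 guest": (92.6, 50),
    "3 shared, 30 guests": (99.2, 45),
}

for label, (proc, card) in runs.items():
    print(f"{label}: {limiting_factor(proc, card)}")
```

With these assumed thresholds, the classification matches the limits described in the text for all four runs.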
Shared CEX2A cards on z9 with 30 Linux guests by SSL cipher:
Table 5
contains a summary of the
results from z9
measurements using 3 CEX2A
shared cards for 30 guests and various SSL ciphers.
With 3 CEX2A
cards, measurements for all five ciphers are limited
by nearly 100% processor utilization, and the external throughput
rates vary by about 31%,
with the DES SHA cipher providing the highest rate and the AES-256
SHA US cipher providing the lowest rate.
The AES-256 SHA US cipher achieved an external throughput
rate of 0.76 times the external throughput
rate achieved
with the DES SHA cipher.
All results are higher than the z990 measurements with CEX2C cards
because of the faster processor.
Table 5. Three shared CEX2A cards with 30 Linux guests by cipher
| cipher | RC4 MD5 US | DES SHA | TDES SHA US | AES-128 SHA US | AES-256 SHA US |
| Run ID | E6128BV5 | E6128BV6 | E6129BV1 | E6129BV2 | E6130BV1 |
| Tx/sec (w) | 4305.92 | 4805.16 | 4704.17 | 3867.81 | 3671.86 |
| Total Util/Proc (p) | 99.2 | 98.9 | 99.0 | 99.4 | 99.5 |
| Total msec/Tx (p) | 0.922 | 0.823 | 0.842 | 1.028 | 1.084 |
|
Notes:
z/VM 520 GA
2094-734;
4 dedicated processors;
5G central storage;
4G expanded storage;
workload: SSL Exerciser;
no session id caching;
(w) = workload;
(p) = z/VM Perfkit
|
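To put a number on the comparison with the z990 results, the Python sketch below divides each z9 rate in Table 5 by the z990 rate for the same cipher in Table 3. The names are illustrative. The two configurations differ in more than the processor (3 CEX2A cards versus 6 CEX2C cards), so the ratios are only a rough indication; both sets of runs were processor limited.

```python
# Tx/sec with 30 Linux guests and shared cards, copied from Table 3 (z990)
# and Table 5 (z9).
z990_tx = {
    "RC4 MD5 US": 3205.70,
    "DES SHA": 3632.55,
    "AES-128 SHA US": 2783.58,
    "AES-256 SHA US": 2576.08,
}
z9_tx = {
    "RC4 MD5 US": 4305.92,
    "DES SHA": 4805.16,
    "TDES SHA US": 4704.17,
    "AES-128 SHA US": 3867.81,
    "AES-256 SHA US": 3671.86,
}

# Compare only the ciphers that appear in both tables.
speedup = {cipher: z9_tx[cipher] / z990_tx[cipher] for cipher in z990_tx}

for cipher, ratio in speedup.items():
    print(f"{cipher}: z9/z990 = {ratio:.2f}")
```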
z/OS Guest with CEX2A on z9
The z/VM 5.1.0
section titled
z/OS Guest Crypto on z990
describes the original cryptographic
support and the original methodology.
Guest versus native for z/OS ICSF with 1 dedicated CEX2A card on a z9
Measurements were completed using the
z/OS ICSF Performance Workload
PCXA sweep
described in
z/OS Integrated Cryptographic Service Facility (ICSF) Performance Workload.
ICSF
test cases developed for the PCICA card will execute on the CEX2A card
and
ICSF
test cases developed for the PCIXCC card will execute on the CEX2C card.
The external throughput
rates achieved by a z/OS guest using the z/VM dedicated
cryptographic
support are nearly identical to z/OS native for all measured test cases.
Of the 56 individual test case comparisons,
all of the guest rates were
within 3% of the native measurement.
The 56 individual test cases
produced far too much data to
include in this report but
Table 6
has a summary of guest to native
throughput
ratios
for all measurements.
Multiple jobs provided a higher external throughput
rate than a single job for all
test cases. Specific ratios varied dramatically by test case.
The number of jobs in the multiple job
measurements is enough to reach full capacity of the specified
encryption facility.
Table 6. Guest to Native Throughput Ratio for z/OS ICSF PCXA Sweep
| CEX2A | 8 | 8 |
| Jobs | 1 | 8 |
| PCXA Sweep (28 test cases) | | |
| Run ID (Native) | E5831IW1 | E5901IW1 |
| Run ID (Guest) | E5905IV1 | E5905IV2 |
| Ratio (Average) | 0.990 | 0.985 |
| Ratio (Minimum) | 0.988 | 0.974 |
| Ratio (Maximum) | 0.992 | 1.000 |
|
Notes:
2094-738;
1 dedicated processor, 4G central storage,
no expanded storage;
workload: z/OS ICSF Sweeps;
Ratios are calculated from workload data
|
Guest versus native for z/OS SSL with 8 dedicated CEX2A cards on a z9
Measurements were completed for a data exchange test case with both
servers and clients on the same z/OS system
using the
z/OS SSL Performance Workload
described in
z/OS Secure Sockets Layer (System SSL) Performance Workload.
Specific parameters used can be found
in the various table columns
or table footnotes.
Table 7
contains a
summary of
results for the
dedicated guest cryptographic
support and
native z/OS measurements.
With 8 dedicated
CEX2A cards,
the native
measurement is limited by nearly 100% processor utilization.
The guest measurement achieved only 90% processor utilization
and 14% CEX2A utilization and appears to be limited by some
undetermined system serialization.
The guest measurement achieved an external throughput
rate of 0.83 times the native
measurement.
Processor
time per transaction for the guest measurement is 8% higher
than the native measurement.
Table 7. Guest versus native for z/OS System SSL
| Type | Native | Guest |
| Dedicated CEX2A cards | 8 | 8 |
| Run ID | E5831BE3 | E5904VE5 |
| Tx/sec (w) | 2575.58 | 2149.50 |
| Total Util/Proc (f) | 99.93 | na |
| Total Util/Proc (p) | na | 90.0 |
| Total msec/Tx (u) | 1.55 | na |
| Total msec/Tx (p) | na | 1.675 |
| CEX2A Utilization (f) | 16 | na |
| CEX2A Utilization (m) | na | 14 |
| Tx/sec Ratio (Guest/Native) | - | 0.835 |
|
Notes:
z/VM 5.2.0 GA;
z/OS V1R6 with updated ICSF code;
2094-738;
4 dedicated processors,
4G central storage,
no expanded storage;
workload: SSL Exerciser,
no client authentication,
1 connection,
1 packet,
2048 bytes each way,
1024 bit keys,
AES-256 SHA US cipher,
no session id caching;
2 servers,
40 server threads,
40 client threads;
server SID cache = 16000,
client SID cache = 0,
server SID timeout = 180,
client SID timeout = 180;
(w) = workload,
(u) = z/OS RMF & workload RUN-DATA,
(f) = z/OS RMF,
(p) = z/VM Perfkit,
(m) = z/VM Monitor
|
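The guest-to-native comparison above is simple arithmetic on the Table 7 values. The Python sketch below recomputes the throughput ratio and the increase in processor time per transaction; the helper `pct_change` and the dictionary layout are illustrative, not part of the measurement tooling.

```python
def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


# Values copied from Table 7 (8 dedicated CEX2A cards, z/OS System SSL).
native = {"tx_per_sec": 2575.58, "msec_per_tx": 1.55}
guest = {"tx_per_sec": 2149.50, "msec_per_tx": 1.675}

throughput_ratio = guest["tx_per_sec"] / native["tx_per_sec"]
cpu_cost_increase = pct_change(guest["msec_per_tx"], native["msec_per_tx"])

print(f"Tx/sec ratio (guest/native): {throughput_ratio:.3f}")
print(f"msec/Tx increase (guest vs native): {cpu_cost_increase:.0f}%")
```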
Dedicated CEX2A cards on a z9 with z/OS by SSL cipher:
Table 8
contains a summary of the
results from z9
measurements using 8 dedicated CEX2A
cards for a z/OS guest and various SSL ciphers.
With 8 CEX2A
cards, measured external throughput
rates
for all five ciphers
varied by less than 6%
with the DES SHA cipher providing the highest rate and the RC4 MD5
US cipher providing the lowest rate.
Native z/OS measurements, not included in this report, showed up to 50%
improvement using the new CPACF support for the AES ciphers.
The AES-256 SHA US cipher achieved an external throughput
rate of 0.97 times the external throughput
rate achieved
with the DES SHA cipher.
This ratio is much better than the ones
reported in the Linux section, which
suggests that the
z/OS guest receives a benefit similar to native z/OS
from the new CPACF support provided by z/OS.
Neither processor utilization nor CEX2A utilization
is at 100%, so all of these measurements are
limited by
the undetermined
system serialization.
Table 8. Eight dedicated CEX2A cards with one z/OS guest by cipher
| cipher | RC4 MD5 US | DES SHA | TDES SHA US | AES-128 SHA US | AES-256 SHA US |
| Run ID | E5904VE1 | E5904VE2 | E5904VE3 | E5904VE4 | E5904VE5 |
| Tx/sec (w) | 2101.33 | 2224.92 | 2202.33 | 2221.92 | 2149.50 |
| Total Util/Proc (p) | 92.7 | 82.8 | 83.9 | 83.2 | 90.0 |
| Total msec/Tx (p) | 1.765 | 1.489 | 1.524 | 1.498 | 1.675 |
|
Notes:
z/VM 520 GA
2094-734;
4 dedicated processors;
5G central storage;
4G expanded storage;
workload: SSL Exerciser;
no session id caching;
(w) = Workload,
(p) = z/VM Perfkit
|