
z990 Guest Crypto Enhancements

This section presents and discusses the results of a number of new measurements that were designed to understand the performance characteristics of the enhanced z990 cryptographic support. This support includes:

  • shared cryptographic support for the PCIXCC card
  • dedicated queue cryptographic support for PCICA and PCIXCC cards
  • dedicated queue cryptographic support for z/OS
  • LINUX SSL support for more ciphers

On the eServer z890 and z990 systems, the cryptographic hardware available at the time of this report was:

  • The CP Assist for Cryptographic Function (CPACF) associated with each z990 processor
  • Up to 6 PCI Cryptographic Accelerator (PCICA) features (2 cards per feature)
  • Up to 4 PCI-X Cryptographic Coprocessor (PCIXCC) features (1 Coprocessor per feature)
The total number of PCICA and PCIXCC features cannot exceed 8.
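The feature limits listed above can be expressed as a simple check. The following is a minimal Python sketch: the constant and function names are hypothetical and chosen only for illustration, and the limits are taken directly from the list above.

```python
from typing import Tuple

# Limits as stated in the text above. The names below are illustrative
# only and are not part of any IBM tool or interface.
MAX_PCICA_FEATURES = 6
MAX_PCIXCC_FEATURES = 4
MAX_TOTAL_FEATURES = 8


def is_valid_feature_mix(pcica_features: int, pcixcc_features: int) -> bool:
    """Return True if the feature counts satisfy all three stated limits."""
    if pcica_features < 0 or pcixcc_features < 0:
        return False
    return (
        pcica_features <= MAX_PCICA_FEATURES
        and pcixcc_features <= MAX_PCIXCC_FEATURES
        and pcica_features + pcixcc_features <= MAX_TOTAL_FEATURES
    )


def card_counts(pcica_features: int, pcixcc_features: int) -> Tuple[int, int]:
    """PCICA has 2 cards per feature; PCIXCC has 1 coprocessor per feature."""
    return 2 * pcica_features, pcixcc_features
```

For example, 6 PCICA features with 2 PCIXCC features passes the check, while 6 PCICA features with 3 PCIXCC features does not, because the combined total would be 9.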

LINUX Guest Crypto on z990

The section titled Linux Guest Crypto Support describes the original cryptographic support and the original methodology. The section titled Linux Guest Crypto on z990 describes additional cryptographic support and methodology.

Measurements were completed using the Linux OpenSSL Exerciser Performance Workload described in Linux OpenSSL Exerciser. Specific parameters used can be found in the measurement item list, various table columns, or table footnotes.

Some of the original methodology has changed, including system levels, client machines, and connectivity types.

Client machines, client threads, servers, server threads, and connectivity paths were increased as needed to maximize usage of the processors and encryption facilities in each individual configuration. The range of values is listed in the common items table but individual measurement values are not in the tables because they had no specific impact on the results.

For some measurements, a z/OS system was active in a separate LPAR on the measurement system. z/OS RMF collects data for the full cryptographic card configuration. This RMF data is used to calculate PCIXCC and PCICA card utilization data for some tables in this report.

Items common to the previous measurements are summarized in . Items common to the measurements in this section are summarized in Table 1.

Table 1. Common items for measurements in this section


Client Authentication               No
Server SID Cache                    Disabled
Client SID Cache                    0
Client SID Timeout                  180
Connections                         1
Packets                             1
Send Bytes                          2048
Receive Bytes                       2048
Key Size                            1024

z/VM System Level                   5.1.0 SLU 000
Linux System Level                  SLES8
Linux Kernel Level                  2.4.21-110
OpenSSL Code Level                  0.9.7C
z90Crypt Level                      1.2.2

Real Connectivity (1 Guest)         LINUX TCP/IP
Real Connectivity (30 Guests)       z/VM TCP/IP
Virtual Connectivity                vCTC
TCP/IP Virt Mach Size               256M
TCP/IP Virt Mach Mode               UP

Guest Type                          V=V
Linux Kernel Mode                   MP
Linux Kernel Size                   31-bit

Linux Virt Mach Size (1 Guest)      2G
Linux Virt Mach Size (30 Guests)    128M
Linux Virt Processors (1 Guest)     4
Linux Virt Processors (30 Guests)   1

Servers                             100-180
Server Threads                      100-180
Client Threads                      100-180
Client Machines                     1-2

Server Model                        z990 dedicated LPAR
Client Model                        z990, z900, S/390 LPAR
Connective Path 1                   HiperSockets
Connective Path 2                   CTCs
Connective Path 3                   CTCs

Shared PCIXCC and PCICA cards with 1 LINUX guest

For both shared PCICA and PCIXCC cards, VM routes cryptographic operations to all available real cards. For a single guest, the total rate is limited by whichever is lower: the rate that can be obtained through the 8 virtual queues or the maximum rate of the real cards. A single LINUX guest achieved a higher throughput rate with 1 PCIXCC card than with 1 PCICA card. With additional PCICA cards, the rate remained nearly constant. With additional PCIXCC cards, the rate continued to increase but did not reach 100% card utilization.

With a sufficient number of LINUX guests, the shared cryptographic support will reach 100% utilization of the real PCICA or PCIXCC cards unless 100% processor utilization becomes the limiting factor.

Measurements were obtained to compare the performance of the SSL workload using hardware encryption between the newly supported PCIXCC cards and the existing support for the PCICA cards.

Results of the new shared queue support for PCIXCC cards along with existing support for PCICA cards are summarized in Table 2.

Table 2. Shared PCIXCC and PCICA cards with 1 LINUX guest


Shared PCICA cards               1         6         0         0         0
Shared PCIXCC cards              0         0         1         2         4

Run ID                    E4827BV1  E4719BV1  E4726BV1  E4720BV2  E4426BV1

Tx/sec (m)                  876.09    871.50   1145.76   1969.90   2431.50
Total Util/Proc (v)           26.7      27.2      34.4      59.0      64.5
Total Util/PCICA (f)            na      13.0        na        na        na
Total Util/PCIXCC (f)           na        na      97.8      90.3        na

Note: 2084-324; 4 dedicated processors, 4G central storage, no expanded storage; Workload: SSL exerciser, RC4 MD5 US cipher, no session id caching; 1 LINUX Guest; (m) = Workload MINRATES, (v) = VMPRF, (f) = z/OS RMF

Dedicated PCIXCC and PCICA cards with 1 LINUX guest 

For both dedicated PCICA and PCIXCC cards, a single LINUX guest routes cryptographic operations to all dedicated cards. A single guest can obtain the maximum rate of the real cards unless processor utilization becomes a limit. A single LINUX guest achieved a higher throughput rate with 1 PCIXCC card than with 1 PCICA card. The measurement with 2 PCIXCC cards achieved nearly 2.0 times the rate of the measurement with 1 PCIXCC card. The measurement with 6 PCICA cards and the measurement with 4 PCIXCC cards are limited by nearly 100% processor utilization and thus do not achieve the maximum rate for the encryption configuration.

Results for the new dedicated queue support for PCIXCC cards and PCICA cards are summarized in Table 3.

Table 3. Dedicated PCIXCC and PCICA cards with 1 LINUX guest


Dedicated PCICA cards            1         6         0         0         0
Dedicated PCIXCC cards           0         0         1         2         4

Run ID                    E4721BV3  E4719BV2  E4720BV1  E4719BV3  E4426BV2

Tx/sec                     1080.39   3967.17   1186.46   2384.11   4351.34
Total Util/Proc (v)           31.4      96.9      34.7      66.6      99.8
Total Util/PCICA (f)          98.4      59.3        na        na        na
Total Util/PCIXCC (f)           na        na      99.9     100.0        na

Note: 2084-324; 4 dedicated processors, 4G central storage, no expanded storage; Workload: SSL exerciser, RC4 MD5 US cipher, no session id caching; 1 LINUX Guest; (v) = VMPRF, (f) = z/OS RMF
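The scaling statements for dedicated PCIXCC cards can be checked directly against the Table 3 values. The following Python sketch is illustrative only; the dictionary layout and function name are assumptions made for this example, and the numbers are copied from Table 3.

```python
# Tx/sec and processor utilization copied from Table 3
# (dedicated PCIXCC cards, 1 LINUX guest). Keys are card counts.
dedicated_pcixcc = {
    1: {"tx_per_sec": 1186.46, "proc_util": 34.7},
    2: {"tx_per_sec": 2384.11, "proc_util": 66.6},
    4: {"tx_per_sec": 4351.34, "proc_util": 99.8},
}


def scaling_vs_one_card(cards: int) -> float:
    """Throughput with `cards` cards relative to the 1-card measurement."""
    base = dedicated_pcixcc[1]["tx_per_sec"]
    return dedicated_pcixcc[cards]["tx_per_sec"] / base


for cards in sorted(dedicated_pcixcc):
    row = dedicated_pcixcc[cards]
    print(cards, round(scaling_vs_one_card(cards), 3), row["proc_util"])
```

The 2-card ratio comes out at about 2.009, matching the "nearly 2.0 times" statement. The 4-card ratio is about 3.67 rather than 4.0, which is consistent with that measurement being limited by processor utilization (99.8%).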

Dedicated versus shared cards with 1 LINUX guest

Dedicated cards provided a higher rate than shared cards for all measured encryption configurations. However, there was a wide variation in the ratio between dedicated and shared. The minimum ratio of 1.036 occurred with the 1 PCIXCC card configuration because the shared measurement achieved 97.8% utilization on the PCIXCC card, thus allowing little opportunity for improvement. The maximum ratio of 4.552 occurred with the 6 PCICA card configuration because the shared measurement achieved only 13% utilization on the 6 PCICA cards, thus leaving a lot of opportunity for the dedicated measurement.

The shared results from Table 2 and the dedicated results from Table 3 are combined for comparison in Table 4.

Table 4. Dedicated versus shared cards with 1 LINUX guest


PCICA cards                        1         6         0         0         0
PCIXCC cards                       0         0         1         2         4

Run ID (Shared)             E4827BV1  E4719BV1  E4726BV1  E4720BV2  E4426BV1
Run ID (Dedicated)          E4721BV3  E4719BV2  E4720BV1  E4719BV3  E4426BV2

Tx/sec (Shared)               876.09    871.50   1145.76   1969.90   2431.50
Tx/sec (Dedicated)           1080.39   3967.17   1186.46   2384.11   4351.34

Ratio (Dedicated/Shared)       1.233     4.552     1.036     1.210     1.790

Note: see Table 2 and Table 3
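The ratio row of Table 4 is the dedicated rate divided by the shared rate for each configuration. A minimal Python sketch of that arithmetic follows; the variable names and configuration labels are chosen for this example, and the rates are copied from Tables 2 and 3.

```python
# (configuration label, shared Tx/sec, dedicated Tx/sec), from Tables 2 and 3.
configs = [
    ("1 PCICA", 876.09, 1080.39),
    ("6 PCICA", 871.50, 3967.17),
    ("1 PCIXCC", 1145.76, 1186.46),
    ("2 PCIXCC", 1969.90, 2384.11),
    ("4 PCIXCC", 2431.50, 4351.34),
]

# Dedicated/shared throughput ratio, rounded to 3 decimals as in Table 4.
ratios = {name: round(dedicated / shared, 3)
          for name, shared, dedicated in configs}

smallest = min(ratios, key=ratios.get)  # configuration with the least gain
largest = max(ratios, key=ratios.get)   # configuration with the most gain
```

With these inputs, the smallest ratio belongs to the 1 PCIXCC configuration and the largest to the 6 PCICA configuration, in agreement with the discussion above.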

Shared PCIXCC and PCICA cards with 30 LINUX guests

30 LINUX guests obtain a higher throughput than a single LINUX guest. Single guest measurements are limited by the 8 virtual queues for shared cryptographic support. 30 guest measurements are limited by either 100% processor utilization or 100% cryptographic card utilization.

With 6 PCICA cards, processor utilization becomes a limiting factor before the PCICA cards reach 100% utilization. Processor time per transaction with 30 LINUX guests was 8% lower than the 1 guest measurement.

With 2 PCIXCC cards, the PCIXCC utilization becomes a limiting factor before the processor reaches 100% utilization. Processor time per transaction with 30 LINUX guests was 13% higher than the 1 guest measurement.

The measurement with 2 PCIXCC cards achieved a rate 0.679 times the measurement with 6 PCICA cards. Processor time per transaction for the 2 PCIXCC card measurement is 19% higher than the 6 PCICA card measurement.

With 4 PCIXCC cards, processor utilization becomes a limiting factor before the PCIXCC cards reach 100% utilization. Processor time per transaction with 30 LINUX guests was 4% higher than the 1 guest measurement.

The measurement with 4 PCIXCC cards achieved a rate 1.037 times the measurement with 6 PCICA cards and 1.528 times the measurement with 2 PCIXCC cards. Processor time per transaction for the 4 PCIXCC card measurement is 3.2% lower than the 6 PCICA card measurement and 19% lower than the 2 PCIXCC card measurement.

Results of the 30 guest measurements and corresponding 1 guest measurements are summarized in Table 5.

Table 5. Shared PCIXCC and PCICA cards by number of LINUX guests


Shared PCICA cards               6         6         0         0         0         0
Shared PCIXCC cards              0         0         2         2         4         4
No. LINUX Guests                 1        30         1        30         1        30

Run ID                    E4719BV1  E4715BV2  E4720BV2  E4720BV3  E4426BV1  E4419BV1

Tx/sec                      871.50   3470.93   1969.90   2355.10   2431.50   3599.51
Total Util/Proc (h)          27.10     98.96     58.98     79.95     64.48     99.27
Total msec/Tx (h)            1.244     1.140     1.198     1.358     1.061     1.103
Total Util/PCICA (f)          13.0      52.2        na        na        na        na
Total Util/PCIXCC (f)           na        na      90.3      99.5        na        na

Note: 2084-324; 4 dedicated processors, 4G central storage, no expanded storage; Workload: SSL exerciser, RC4 MD5 US cipher, no session id caching; (h) = hardware instrumentation, (f) = z/OS RMF
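The Total msec/Tx values in Table 5 appear to be consistent with dividing the total processor busy time per second (utilization per processor times the 4 dedicated processors) by the transaction rate. This relationship is inferred from the table values and is not stated in the report, so the Python sketch below should be read as an assumption; the function name is hypothetical.

```python
PROCESSORS = 4  # from the table note: 4 dedicated processors


def msec_per_tx(util_per_proc_pct: float, tx_per_sec: float,
                processors: int = PROCESSORS) -> float:
    """Processor milliseconds consumed per transaction (inferred formula)."""
    # Busy milliseconds accumulated across all processors in one second.
    busy_msec_per_sec = util_per_proc_pct / 100.0 * processors * 1000.0
    return busy_msec_per_sec / tx_per_sec


# (Total Util/Proc, Tx/sec, Total msec/Tx) rows copied from Table 5.
table5_rows = [
    (27.10, 871.50, 1.244),
    (98.96, 3470.93, 1.140),
    (58.98, 1969.90, 1.198),
    (79.95, 2355.10, 1.358),
    (64.48, 2431.50, 1.061),
    (99.27, 3599.51, 1.103),
]

# Largest difference between the inferred formula and the reported values.
max_error = max(abs(msec_per_tx(util, tx) - reported)
                for util, tx, reported in table5_rows)
```

For all six rows the computed value agrees with the reported Total msec/Tx to within rounding. The percentage comparisons in the text, such as 1.358 versus 1.198 being about 13% higher, follow from these values.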

Shared PCIXCC and PCICA cards with 30 LINUX guests by SSL cipher

Changing the cipher had very little impact on any of the results. With 6 PCICA cards, measurements for all three ciphers are limited by nearly 100% processor utilization and the rates vary by less than 5%. With 2 PCIXCC cards, measurements for all three ciphers are limited by nearly 100% card utilization and the rates vary by less than 1.1%.

With an RC4 MD5 US cipher, the measurement with 2 PCIXCC cards achieved a rate 0.679 times the measurement with 6 PCICA cards. Processor time per transaction for the 2 PCIXCC card measurement is 19% higher than the 6 PCICA card measurement.

With a DES SHA US cipher, the measurement with 2 PCIXCC cards achieved a rate 0.659 times the measurement with 6 PCICA cards. Processor time per transaction for the 2 PCIXCC card measurement is 10% higher than the 6 PCICA card measurement.

With a TDES SHA US cipher, the measurement with 2 PCIXCC cards achieved a rate 0.687 times the measurement with 6 PCICA cards. Processor time per transaction for the 2 PCIXCC card measurement is 8% higher than the 6 PCICA card measurement.

Results by cipher for both PCIXCC and PCICA cards are summarized in Table 6.

Table 6. Shared PCIXCC and PCICA cards with 30 LINUX guests by SSL cipher


Cipher                     RC4 MD5 US   DES SHA US  TDES SHA US   RC4 MD5 US   DES SHA US  TDES SHA US
Shared PCICA cards                  6            6            6            0            0            0
Shared PCIXCC cards                 0            0            0            2            2            2

Run ID                       E4715BV2     E4715BV1     E4715BV3     E4720BV3     E4721BV1     E4721BV2

Tx/sec                        3470.93      3577.01      3429.93      2355.10      2357.94      2357.77
Total Util/Proc (h)             98.96        99.07        98.53        79.95        72.11        73.20
Total msec/Tx (h)               1.140        1.108        1.149        1.358        1.223        1.242
Total Util/PCICA (f)             52.2         53.7         51.4           na           na           na
Total Util/PCIXCC (f)              na           na           na         99.5         99.6         99.6

Note: 2084-324; 4 dedicated processors, 4G central storage, no expanded storage; Workload: SSL exerciser, no session id caching; 30 LINUX Guests; (h) = hardware instrumentation, (f) = z/OS RMF
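The statements about how little the rate varies by cipher can be checked from the Tx/sec row of Table 6. The Python sketch below measures the spread as the highest rate relative to the lowest rate; that definition is an assumption, since the report does not say how the variation was computed, and the names are chosen for this example.

```python
# Tx/sec by cipher, copied from Table 6 (30 LINUX guests).
pcica_6_cards = {"RC4 MD5 US": 3470.93, "DES SHA US": 3577.01,
                 "TDES SHA US": 3429.93}
pcixcc_2_cards = {"RC4 MD5 US": 2355.10, "DES SHA US": 2357.94,
                  "TDES SHA US": 2357.77}


def spread_pct(rates: dict) -> float:
    """Percent by which the highest rate exceeds the lowest rate."""
    values = list(rates.values())
    return (max(values) / min(values) - 1.0) * 100.0


def pcixcc_to_pcica_ratio(cipher: str) -> float:
    """Rate with 2 PCIXCC cards relative to 6 PCICA cards for one cipher."""
    return pcixcc_2_cards[cipher] / pcica_6_cards[cipher]
```

Under this definition the spread is about 4.3% for the 6 PCICA card measurements and about 0.1% for the 2 PCIXCC card measurements, both within the bounds given in the text. The per-cipher ratios round to 0.679, 0.659, and 0.687.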

z/OS Guest Crypto on z990

z/VM support for z/OS Guest Crypto on z990 is new with z/VM 5.1.0 and was evaluated with both dedicated PCICA and PCIXCC cards. Two separate z/OS performance workloads were used for this evaluation.

  • z/OS Integrated Cryptographic Service Facility (ICSF) Sweeps

  • z/OS System Secure Sockets Layer (System SSL)

z/OS Integrated Cryptographic Service Facility (ICSF) is a software element of z/OS that works with the hardware cryptographic features and the z/OS Security Server--Resource Access Control Facility (RACF), to provide secure, high-speed cryptographic services in the z/OS environment. ICSF provides the application programming interfaces by which applications request the cryptographic services. The cryptographic features are secure, high-speed hardware that do the actual cryptographic functions. The cryptographic features available to applications depend on the server or processor hardware.

z/OS System Secure Sockets Layer (System SSL) is part of the Cryptographic Services element of z/OS. Secure Sockets Layer (SSL) is a communications protocol that provides secure communications over an open communications network (for example, the Internet). The SSL protocol is a layered protocol that is intended to be used on top of a reliable transport, such as Transmission Control Protocol/Internet Protocol (TCP/IP). SSL provides data privacy and integrity as well as server and client authentication based on public key certificates. Once an SSL connection is established between a client and server, data communications between client and server are transparent to the encryption and integrity added by the SSL protocol. System SSL supports the SSL V2.0, SSL V3.0 and TLS (Transport Layer Security) V1.0 protocols. TLS V1.0 is the latest version of the SSL protocol. z/OS provides a set of SSL C/C++ callable application programming interfaces that, when used with the z/OS Sockets APIs, provide the functions required for applications to establish this secure sockets communications. In addition to providing the API interfaces to exploit the SSL and TLS protocols, System SSL also provides a suite of Certificate Management APIs. These APIs give the capability to create and manage your own certificate databases, to use certificates stored in key databases and key rings for purposes other than SSL, and to build and process Public-Key Cryptography Standards (PKCS) #7 standard messages.

The SSL protocol begins with a "handshake." During the handshake, the client authenticates the server, the server optionally authenticates the client, and the client and server agree on how to encrypt and decrypt information. A non-cached measurement is created by setting a cache size of zero for the client application. This ensures no Session IDs are found in the cache and allows a measurement with no cache hits. Although no session IDs are found in any server cache, the size of the cache is important because all new session IDs must be placed in the cache and it will generally become full before the server Session ID Timeout value expires. For measurement consistency when comparing different numbers of servers, the server Session ID cache is set to "32000 divided by the number of servers" instead of the original methodology value of "32000 per server".
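The server Session ID cache sizing rule described above can be written as a one-line calculation. The Python sketch below is illustrative only: the constant and function names are hypothetical, and the use of integer division when 32000 is not evenly divisible by the number of servers is an assumption.

```python
# Total number of server Session ID cache entries, from the text above.
TOTAL_SERVER_SID_CACHE = 32000


def server_sid_cache_size(servers: int) -> int:
    """Per-server Session ID cache size: 32000 divided by the server count."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    # Integer division is an assumption for counts that do not divide evenly.
    return TOTAL_SERVER_SID_CACHE // servers
```

With 2 servers this gives 16000 entries per server, which agrees with the Server SID Cache value listed in the note for Table 8.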

Dedicated PCIXCC and PCICA cards with 1 z/OS guest for ICSF test cases 

The rates achieved by a z/OS guest using the z/VM dedicated cryptographic support are nearly identical to z/OS native for all measured test cases. A few unexplained anomalies caused the minimum and maximum ratios to differ considerably from the average ratio. In many of these anomalies, the guest measurement achieved a higher rate than the z/OS native measurement. Of the 663 individual test case comparisons, 85% of the guest rates were within 1% of the native measurement and 95% of the guest rates were within 2% of the native measurement. The remaining 5% fall into the unexplained anomalies.

Measurements were completed using the z/OS ICSF Performance Workload described in z/OS Integrated Cryptographic Service Facility (ICSF) Performance Workload.

The number of test cases and configurations produced far too much data to include in this report, but Table 7 has a summary of guest to native throughput ratios for all measurements. The number of test cases measured for each ICSF sweep is included in parentheses following the sweep name.

Table 7. Guest to Native Throughput Ratio for z/OS ICSF Sweeps


PCICA cards                  1         1         0         0         0
PCIXCC cards                 0         0         1         1         2
Jobs                         1         8         1         7        14

PCXA Sweep (28)
Run ID (Native)       E4806IW1  E4805IW3
Run ID (Guest)        E4808IV1  E4806IV1
Ratio (Average)          0.996     1.001
Ratio (Minimum)          0.987     0.938
Ratio (Maximum)          1.021     1.082

XCPC Sweep (81)
Run ID (Native)                           E4728IW1  E4727IW1  E4727IW2
Run ID (Guest)                            E4728IV1  E4726IV1  E4728IV2
Ratio (Average)                              0.998     1.001     0.998
Ratio (Minimum)                              0.991     0.997     0.992
Ratio (Maximum)                              1.085     1.009     1.002

XCDC Sweep (86)
Run ID (Native)                           E4817IW2  E4817IW1
Run ID (Guest)                            E4819IV2  E4819IV1
Ratio (Average)                              0.995     0.998
Ratio (Minimum)                              0.935     0.982
Ratio (Maximum)                              1.041     1.004

XCPB Sweep (32)
Run ID (Native)                           E4805IW1  E4730IW1
Run ID (Guest)                            E4804IV3  E4804IV2
Ratio (Average)                              0.997     0.998
Ratio (Minimum)                              0.988     0.995
Ratio (Maximum)                              1.027     1.001

XCDB Sweep (64)
Run ID (Native)                           E4802IW2  E4730IW2
Run ID (Guest)                            E4803IV1  E4802IV1
Ratio (Average)                              1.001     0.999
Ratio (Minimum)                              0.986     0.997
Ratio (Maximum)                              1.250     1.000

Note: 2084-324; 1 dedicated processor, 4G central storage, no expanded storage; Workload: z/OS ICSF Sweeps; Ratios are calculated from workload data
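The figure of 663 individual test case comparisons can be reproduced from Table 7 by multiplying the number of test cases in each sweep by the number of card and job configurations for which that sweep has ratio data. The Python sketch below does this arithmetic; the configuration counts are read from the populated columns of Table 7, and the variable names are chosen for this example.

```python
# sweep name -> (test cases per sweep, configurations with ratio data).
# Test case counts are the parenthesized values in Table 7; configuration
# counts are the number of columns that contain data for that sweep.
sweeps = {
    "PCXA": (28, 2),
    "XCPC": (81, 3),
    "XCDC": (86, 2),
    "XCPB": (32, 2),
    "XCDB": (64, 2),
}

comparisons_per_sweep = {name: cases * configs
                         for name, (cases, configs) in sweeps.items()}
total_comparisons = sum(comparisons_per_sweep.values())
```

The total is 663, which matches the number of comparisons cited in the discussion of the ICSF results.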

Other observations available from this set of measurement data but without any supporting data in this report include:

  • Multiple jobs provided a higher rate than a single job for all except 1 test case. Specific ratios varied dramatically by test case. Generally, ratios were higher for the PCICA card measurements than the PCIXCC card measurements. The number of jobs in the multiple job measurements is enough to reach full capacity of the specified encryption facility.

  • 28 of the ICSF test cases are supported on both the PCICA card and the PCIXCC card. With 1 job, the PCIXCC card achieved a higher rate than the PCICA card for all test cases but the amount of increase varied dramatically by individual test case. The minimum increase was 1.049 times and the maximum increase was 5.060 times. With multiple jobs (enough to reach 100% card utilization), the PCIXCC card achieved a lower rate than the PCICA card for 24 ICSF test cases and a higher rate than the PCICA card for 4 ICSF test cases. The minimum ratio was 0.355 times and the maximum ratio was 1.699 times.

  • 2 PCIXCC cards achieved nearly 2.0 times the rate of 1 PCIXCC card for all test cases.

Dedicated PCIXCC and PCICA cards with 1 z/OS guest for System SSL 

Measurements were completed for a data exchange test case with both servers and clients on the same z/OS system using the z/OS SSL Performance Workload described in z/OS Secure Sockets Layer (System SSL) Performance Workload. Specific parameters used can be found in the measurement item list, various table columns, common items table, or table footnotes.

No z/VM or hardware instrumentation data was collected for these measurements, so the only available data is z/OS RMF data and workload data.

Results for the dedicated guest cryptographic support and native z/OS measurements are summarized in Table 8.

With 8 dedicated PCICA cards, both the native and guest measurements are limited by nearly 100% processor utilization. The guest measurement achieved a rate of 0.946 times the native measurement.

With 2 dedicated PCIXCC cards, both the native and guest measurements are limited by 100% PCIXCC card utilization. Both measurements achieved nearly identical rates. Processor time per transaction for the guest measurement is 13.6% higher than the native measurement.

The guest measurement with 2 PCIXCC cards is limited by 100% PCIXCC card utilization and achieved a rate 0.565 times the guest measurement with 8 PCICA cards, which is limited by 100% processor utilization. Processor time per transaction for the 2 PCIXCC card guest measurement is 7% higher than the 8 PCICA card guest measurement.

Table 8. Guest versus Native for z/OS System SSL


Type                             Native     Guest    Native     Guest
Dedicated PCICA cards                 8         8         0         0
Dedicated PCIXCC cards                0         0         2         2

Run ID                         E4806BE1  E4808VE1  E4805BE1  E4808VE2

Tx/sec (u)                     1640.867  1552.000   875.146   876.564
Total Util/Proc (u)               99.98     99.98     53.17     60.47
Total Msec/tx (u)                  2.44      2.58      2.43      2.76
Total Util/PCICA (f)               26.1        na        na        na
Total Util/PCIXCC (f)                na        na     100.0        na

Tx/sec Ratio (Guest/Native)           -     0.946         -     1.002

Note: z/VM 5.1.0 SLU 000; z/OS V1R4; 2084-324; 4 dedicated processors, 4G central storage, no expanded storage; Workload: SSL exerciser, No Client Authentication, 1 Connection, 1 Packet, 2048 bytes each way, 1024 bit keys, RC4 MD5 US cipher, no session id caching; 2 Servers, 40 Server Threads, 40 Client Threads; Server SID Cache = 16000, Client SID Cache = 0, Server SID Timeout = 180, Client SID Timeout = 180; (u) = z/OS RMF & Workload RUN-DATA, (f) = z/OS RMF
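The guest versus native comparisons for System SSL can be reproduced from the Tx/sec and Total Msec/tx rows of Table 8. The Python sketch below is illustrative; the dictionary keys and helper names are assumptions made for this example, and the values are copied from Table 8.

```python
# (Tx/sec, Total Msec/tx) for each measurement, copied from Table 8.
runs = {
    ("native", "8 PCICA"): (1640.867, 2.44),
    ("guest", "8 PCICA"): (1552.000, 2.58),
    ("native", "2 PCIXCC"): (875.146, 2.43),
    ("guest", "2 PCIXCC"): (876.564, 2.76),
}


def tx_ratio(numerator, denominator) -> float:
    """Throughput of one measurement relative to another."""
    return runs[numerator][0] / runs[denominator][0]


def cpu_per_tx_increase_pct(numerator, denominator) -> float:
    """Percent increase in processor time per transaction."""
    return (runs[numerator][1] / runs[denominator][1] - 1.0) * 100.0


guest_vs_native_pcica = tx_ratio(("guest", "8 PCICA"), ("native", "8 PCICA"))
guest_vs_native_pcixcc = tx_ratio(("guest", "2 PCIXCC"), ("native", "2 PCIXCC"))
pcixcc_vs_pcica_guest = tx_ratio(("guest", "2 PCIXCC"), ("guest", "8 PCICA"))
```

These ratios round to 0.946, 1.002, and 0.565. The processor time per transaction increases are about 13.6% for guest versus native with 2 PCIXCC cards and about 7% for the 2 PCIXCC card guest versus the 8 PCICA card guest.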
