Specialty Engine Support
Abstract
Guest support is provided for the virtual CPU types zAAP (IBM System z Application Assist Processor), zIIP (IBM z9 Integrated Information Processor), and IFL (Integrated Facility for Linux), in addition to general purpose CPs (Central Processors). These types of virtual processors can be defined for a z/VM user by issuing the DEFINE CPU command or by placing the DEFINE CPU command in the directory. The system administrator can issue the SET CPUAFFINITY command to specify whether z/VM should dispatch a user's specialty CPUs on real CPUs that match their types (if available) or simulate them on real CPs. On system configurations where the CPs and specialty engines are the same speed, performance results are similar whether the virtual specialty CPUs are dispatched on specialty engines or simulated on CPs. On system configurations where the specialty engines are faster than CPs, performance results are better when using the faster specialty engines and scale correctly based on the relative processor speed. CP monitor data and Performance Toolkit for VM both provide information about the specialty engines.
Introduction
This section of the report provides general observations about performance results and more detail about the performance information available for effective use of the zAAP and zIIP specialty engine support. It deals only with the z/VM support for zIIP and zAAP processors as used by a z/OS guest. References to specialty engine support in this section apply only to zIIP and zAAP processors and not to IFLs.
Valid combinations of processor types are defined in z/VM: Running Guest Operating Systems. IFLs cannot be defined in the same virtual machine as a zIIP or a zAAP.
Without proper balance between the LPAR, z/VM, and guest settings, a system can have a large queue for one processor type while other processor types remain idle.
Method
The specialty engine support was evaluated using z/OS guest virtual machines and three separate workloads.
A z/OS JAVA Workload described in z/OS JAVA Encryption Performance Workload provided use of the zAAP. This workload will run a processor at 100% utilization and is mostly eligible for a zAAP.
A z/OS DB2 Utility Workload described in z/OS DB2 Utility Workload provided use of a zIIP. Due to DASD I/O delays and processing that is not eligible for a zIIP, this workload can only utilize about 10% of a zIIP.
A z/OS SSL Performance Workload described in z/OS Secure Sockets Layer (System SSL) Performance Workload provided utilization of the CPs. It is capable of using all the available CP processors.
The workloads were measured independently and together in many different configurations. The workloads were measured with and without specialty engines in the configuration. The workloads were measured with all available SET CPUAFFINITY values (ON, OFF, and Suppressed). The workloads were also measured with z/OS running directly in an LPAR. Measurements of individual workloads were used to verify quantitative performance results. Measurements involving multiple workloads were used to evaluate the various controlling parameters and to demonstrate the available performance information but not for quantitative results.
New z/VM monitor data available with the specialty engine support is described in z/VM 5.3 Performance Management.
This report section will deal mostly with the controlling parameters and the available performance information rather than the quantitative results.
Results and Discussion
Results were always consistent with the speed and number of engines provided to the application. Balancing of the LPAR, z/VM, and guest processor configurations is the key to optimal performance. This section will deal mostly with performance information available for effective use of the specialty engine support. Without proper balance between the LPAR, z/VM, and guest settings, a system can have a large queue for one processor type while other processor types remain idle.
This section contains examples of both Performance Toolkit data and z/OS RMF data. Terminology for processor type has varied in both and includes CP for Central Processors; IFA, AAP, and ZAAP for zAAP; and IIP and ZIIP for zIIP.
Specialty Engines from a LPAR Perspective
The LPAR in which z/VM is running can have a mixture of central processors and various types of specialty engines. Processors can be dedicated to the z/VM LPAR or they can be shared with other LPARs. For LPARs with shared processors, the LPAR weight is used to determine the capacity factor for the z/VM LPAR. On z9 processors, the weight for specialty engines can be different than the weight for the primary engine. Shared processors can be capped or non-capped. A capped LPAR cannot exceed its defined capacity factor but a non-capped LPAR can use excess capacity from other LPARs. On some z9 and z990 models, the specialty engines are faster than the primary engines.
Identifying the Relative Processor Speeds
The Performance Toolkit can be used to tell whether CPs and specialty processors run at the same speed or at different speeds.
Here is an example (Runid E7411ZZ2) of the Performance Toolkit SYSCONF screen showing the same Cap value for CPs and specialty engines so dispatching on a virtualized engine does not have any performance advantage over simulation on a CP.
FCX180  Run 2007/06/20 12:21:10  SYSCONF
System Configuration, Initial and Changed
_________________________________________________________________________________
Initial Status on 2007/04/11 at 16:42, Processor 2094-733
Real Proc: Cap 1456, Total 40, Conf 33, Stby  0, Resvd  7
Sec. Proc: Cap 1456, Total  5, Conf  5, Stby  0, Resvd  2
Here is an example (Runid E7307ZZ2) of the Performance Toolkit SYSCONF screen showing a different Cap value for CPs and specialty engines so dispatching on a virtualized engine has a performance advantage over simulation on a CP. A smaller number means a faster processor.
FCX180  Run 2007/06/20 09:53:49  SYSCONF
System Configuration, Initial and Changed
_________________________________________________________________________________
Initial Status on 2007/03/07 at 21:30, Processor 2096-X03
Real Proc: Cap 2224, Total  7, Conf  3, Stby  0, Resvd  4
Sec. Proc: Cap 1760, Total  3, Conf  3, Stby  0, Resvd  2
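As a rough reading aid, the two Cap values from a SYSCONF screen can be turned into a relative speed. The following is a minimal Python sketch, not part of the Performance Toolkit. It assumes that processor speed is inversely proportional to the Cap value; the report itself only states that a smaller number means a faster processor. The function name `relative_speed` is hypothetical.

```python
def relative_speed(cp_cap: float, specialty_cap: float) -> float:
    """Return how many times faster a specialty engine is than a CP.

    Assumption (not stated in the report): speed scales as 1 / Cap.
    A result of 1.0 means both engine types run at the same speed.
    """
    if cp_cap <= 0 or specialty_cap <= 0:
        raise ValueError("Cap values must be positive")
    return cp_cap / specialty_cap


# Cap values taken from the two SYSCONF examples above.
same_speed = relative_speed(1456, 1456)    # Runid E7411ZZ2 (2094-733)
faster_engine = relative_speed(2224, 1760)  # Runid E7307ZZ2 (2096-X03)

print(f"2094-733: specialty engines run at {same_speed:.2f}x CP speed")
print(f"2096-X03: specialty engines run at {faster_engine:.2f}x CP speed")
```

Under that assumption, a ratio above 1.0 indicates a configuration where dispatching on the real specialty engine should outperform simulation on a CP.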
Identifying Dedicated, Shared Weights, and Capping
Quantitative results can be affected by how the processors are defined for the z/VM LPAR. With dedicated processors, the z/VM LPAR gets full utilization of the processors. With shared processors, the z/VM LPAR's capacity factor is determined by the z/VM LPAR weight, the total weights for each processor type, and the total number of processors of each type. If capping is specified, the z/VM LPAR cannot exceed its calculated capacity factor. If capping is not specified, the z/VM LPAR competes with other LPARs for unused cycles by processor type.
Here is an example (Runid E7307ZZ2) of the Performance Toolkit LPAR screen for the KST1 LPAR with dedicated CP, zAAP, and zIIP processors. It shows 100% utilization regardless of how much is actually being used by z/VM because it is a dedicated partition.
FCX126  Run 2007/06/20 09:53:49  LPAR
Logical Partition Activity
__________________________________________________________________________________________
Partition Nr. Upid #Proc Weight Wait-C Cap %Load CPU %Busy %Ovhd %Susp %VMld %Logld Type
KST1        4   04     5    DED    YES  NO  83.3   0 100.0    .0    .1  99.8   99.8 CP
                            DED         NO         1 100.0    .0    .1  99.8   99.8 CP
                            DED         NO         2 100.0    .0    .1  99.8   99.8 CP
                            DED         NO         3 100.0    .0    .1  96.0   96.0 ZAAP
                            DED         NO         4 100.0    .0    .1   8.8    8.8 ZIIP
Here is an example (Runid E7123ZI2) of the Performance Toolkit LPAR screen for the KST1 LPAR with shared but non-capped CP, zAAP, and zIIP processors. KST1's weight for specialty processors is 80 versus only 20 for CPs. A new table at the bottom of this screen shows the total weights by processor type. Although KST1's fair share of CPs is only 20%, its actual utilization is 99.9% because all other LPARs are idle. This particular measurement did not have any JAVA work so the zAAP utilization is nearly zero. The zIIP utilization is only 9.3% but that is about the maximum that can be achieved by a single copy of the DB2 Utility workload.
FCX126  Run 2007/06/20 09:55:12  LPAR
Logical Partition Activity
__________________________________________________________________________________________
Partition Nr. Upid #Proc Weight Wait-C Cap %Load CPU %Busy %Ovhd %Susp %VMld %Logld Type
KST1        4   04     5     20     NO  NO  51.5   0  99.9    .0    .1  99.9   99.9 CP
                             20         NO         1  99.9    .0    .1  99.8   99.9 CP
                             20         NO         2  99.9    .1    .1  99.8   99.9 CP
                             80         NO         3    .0    .0    .4    .0     .0 ZAAP
                             80         NO         4   9.3    .2    .3   9.0    9.0 ZIIP

Summary of physical processors:
Type  Number  Weight  Dedicated
CP         3     100          0
ZAAP       1     100          0
IFL        1       0          0
ZIIP       1     100          0
Here is an example (Runid E6B20ZA3) of the Performance Toolkit LPAR screen for the KST1 LPAR with shared capped CP, zAAP, and zIIP processors. KST1's weight for zAAPs is 80 versus only 20 for CPs and zIIPs. Utilization of the CPs showed the expected 20%. The summary shows there are 2 real zAAPs with a total weight of 100 and so KST1's weight of 80 would allow a capacity factor equal to 1.6 zAAPs. However, since the KST1 LPAR has a single zAAP it cannot use all the allocated capacity and it is limited to 100% of its single zAAP.
FCX126  Run 2007/06/20 09:56:47  LPAR
Logical Partition Activity
__________________________________________________________________________________________
Partition Nr. Upid #Proc Weight Wait-C Cap %Load CPU %Busy %Ovhd %Susp %VMld %Logld Type
KST1        4   04     5     20     NO YES  22.5   0  20.7    .0  78.3  20.7   95.0 CP
                             20        YES         1  20.7    .0  78.9  20.7   98.0 CP
                             20        YES         2  20.7    .0  78.6  20.7   96.5 CP
                             80        YES         3  95.5    .1    .1  95.4   95.5 ZAAP
                             20        YES         4    .0    .0    .1    .0     .0 ZIIP

Summary of physical processors:
Type  Number  Weight  Dedicated
CP         3     100          0
ZAAP       2     100          0
IFL        1       0          0
ZIIP       1     100          0
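The capacity arithmetic in the capped example can be written out explicitly. The following is a simplified Python sketch, assuming the capacity factor is just weight divided by total weight, multiplied by the number of shared physical processors of that type, and limited by the number of logical processors the LPAR has defined. The actual LPAR algorithm may consider more than this, and the function name `lpar_capacity` is hypothetical.

```python
def lpar_capacity(weight, total_weight, physical_count, logical_count):
    """Return (entitled, usable) capacity in units of physical processors.

    entitled: share implied by the LPAR weight for one processor type
    usable:   entitled share limited by the LPAR's logical processor count
    Simplified model; an assumption made for illustration only.
    """
    entitled = weight / total_weight * physical_count
    usable = min(entitled, logical_count)
    return entitled, usable


# zAAP pool from Runid E6B20ZA3: weight 80 of 100, 2 real zAAPs, 1 logical zAAP.
zaap_entitled, zaap_usable = lpar_capacity(80, 100, 2, 1)

# CP pool from the same run: weight 20 of 100, 3 real CPs, 3 logical CPs.
cp_entitled, cp_usable = lpar_capacity(20, 100, 3, 3)
cp_busy_per_logical = cp_usable / 3 * 100  # expected %Busy per logical CP

print(f"zAAP: entitled {zaap_entitled:.1f}, usable {zaap_usable:.1f}")
print(f"CP:   entitled {cp_entitled:.1f}, about {cp_busy_per_logical:.0f}% per logical CP")
```

With these inputs the model gives an entitlement of 1.6 zAAPs that is limited to the single logical zAAP, and about 20% per logical CP, which is consistent with the 20.7% shown on the screen.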
Here is an example (Runid E7123ZI2) of the Performance Toolkit LPARLOG screen for the KST1 LPAR with shared non-capped CP, zAAP, and zIIP processors. KST1's weight for zAAPs and zIIPs is 80 versus only 20 for CPs. This screen does not separate data by processor type, so the utilization is an average for all types. The weight shown on this screen is for the last processor listed in the LPAR screen (a zIIP in this case). The label in the rightmost report column identifies KST1 as an LPAR with a mixture of engine types.
FCX202  Run 2007/06/20 09:55:12  LPARLOG
Logical Partition Activity Log
_________________________________________________________________________________________________
Interval  <Partition->                                    <- Load per Log. Processor -->
End Time  Name     Nr. Upid #Proc Weight Wait-C Cap %Load %Busy %Ovhd %Susp %VMld %Logld Type
>>Mean>>  KST1       4   04     5     80     NO  NO   ...  61.8    .1    .2  61.7   61.7 MIX
>>Mean>>  Total     ..   ..     6    100     ..  ..    .1  30.9    .0   ...   ...    ... ..
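Because LPARLOG reports a single average across all engine types, its %Busy value can be cross-checked against the per-processor %Busy values on the LPAR screen for the same run. The short Python sketch below does that for Runid E7123ZI2, assuming a simple unweighted mean over the five logical processors; the variable names are illustrative.

```python
# %Busy per logical processor from the LPAR screen (Runid E7123ZI2):
# three CPs, one zAAP, one zIIP.
lpar_busy = {
    "CP0": 99.9,
    "CP1": 99.9,
    "CP2": 99.9,
    "ZAAP3": 0.0,
    "ZIIP4": 9.3,
}

# Assumption: LPARLOG averages all logical processors with equal weight.
mix_busy = sum(lpar_busy.values()) / len(lpar_busy)

print(f"Average %Busy across all engine types: {mix_busy:.1f}")
```

The unweighted mean agrees with the 61.8 shown for KST1 on the LPARLOG screen, which illustrates how a nearly idle zAAP and a lightly used zIIP can hide fully busy CPs in the combined figure.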
Specialty Engines from a z/VM Perspective
The CPUAFFINITY value is used to determine whether simulation or virtualization is desired for a guest's specialty engines. With CPUAFFINITY ON, z/VM will dispatch a user's specialty CPUs on real CPUs that match their types. If no matching CPUs exist in the z/VM LPAR, z/VM will suppress this CPUAFFINITY and simulate these specialty engines on CPs. With CPUAFFINITY OFF, z/VM will simulate specialty engines on CPs regardless of the existence of matching specialty engines. z/VM's only use of specialty engines is for guest code that is dispatched on a virtual specialty processor. Without any guest virtual specialty processors, z/VM's real specialty processors will appear nearly idle in both the z/VM monitor data and the LPAR data. Interrupts are enabled, so their usage will not be absolute zero.
The Performance Toolkit SYSCONF screen was updated to provide information about the processor types and capacity factor by processor type.
Here is an example (Runid E7123ZI2) of the Performance Toolkit SYSCONF screen showing the same number and capacity factor by processor type. Since this was a non-capped LPAR, the capacity shows 1000. The z/VM LPAR can only use this much if other shared LPARs do not use their fair share.
FCX180  Run 2007/06/20 09:55:12  SYSCONF
System Configuration, Initial and Changed
_________________________________________________________________________________
Initial Status on 2007/01/23 at 09:57, Processor 2096-X03
Log. CP  : CAF 1000, Total  3, Conf  3, Stby  0, Resvd  0, Ded  0, Shrd  3
Log. ZAAP: CAF 1000, Total  1, Conf  1, Stby  0, Resvd  0, Ded  0, Shrd  0
Log. ZIIP: CAF 1000, Total  1, Conf  1, Stby  0, Resvd  0, Ded  0, Shrd  0

The Performance Toolkit PROCLOG screen was updated to provide the processor type for each individual processor and to include averages by processor type.
Here is an example (Runid E7307ZZ2) of the Performance Toolkit PROCLOG screen showing the utilization of the individual processors and the average utilization by processor type. This example has all three workloads active, the z/OS and z/VM configurations are identical, CPUAFFINITY is ON, and shows full utilization of CPs and zAAPs, but only about 10% utilization of the zIIP. These values are consistent with the workload characteristics.
FCX144  Run 2007/06/20 09:53:49  PROCLOG
Processor Activity, by Time
_______________________________________________________________________
                <------ Percent Busy -------> <--- Rates per Sec.--->
         C
Interval P                                     Inst
End Time U Type Total  User  Syst  Emul  Vect  Siml  DIAG  SIGP  SSCH
>>Mean>> 0 CP    99.8  99.5    .3  97.9  .... 125.0  12.6    .7  71.6
>>Mean>> 1 CP    99.8  99.5    .2  98.0  .... 120.9   4.5    .8  58.4
>>Mean>> 2 CP    99.8  99.5    .3  98.0  .... 123.4   3.2    .7  59.5
>>Mean>> 3 ZAAP  96.0  96.0    .1  95.8  ....   1.1    .0  36.6   1.4
>>Mean>> 4 ZIIP   8.8   8.4    .4   8.1  ....   1.0    .0 289.9   7.5
>>Mean>> . CP    99.7  99.5    .2  98.0  .... 123.0   6.7    .7  63.1
>>Mean>> . ZAAP  96.0  96.0    .1  95.8  ....   1.1    .0  36.6   1.4
>>Mean>> . ZIIP   8.8   8.4    .4   8.1  ....   1.0    .0 289.9   7.5

(Report split for formatting purposes. Ed.)
___________________________________________________________
<--------- PLDV ---------->    <------ Paging ------->  <Comm>
  Pct  Mean VMDBK VMDBK    To Below        Fast  Page
  Em-  when Mastr Stoln Mastr   2GB  PGIN  Path Reads  Msgs
  pty Non-0  only    /s    /s    /s    /s     %    /s    /s
    0     1     0    .2    .8    .0    .0  ....    .0    .6
  100     0  ....    .1    .0    .0    .0  ....    .0    .1
  100     1  ....    .1    .0    .0    .0  ....    .0    .1
  100     1  ....    .0    .0    .0    .0  ....    .0    .0
   97     1  ....    .0    .0    .0    .0  ....    .0    .0
   66     0  ....    .1    .2    .0    .0  ....    .0    .2
  100     1  ....    .0    .0    .0    .0  ....    .0    .0
   97     1  ....    .0    .0    .0    .0  ....    .0    .0
Here is an example (runid E7320ZZ2) of the Performance Toolkit PROCLOG screen showing the utilization of the individual processors and the average utilization by processor type. This example has all three workloads active, the z/OS and z/VM configurations are identical (3 CPs, 1 zAAP, and 1 zIIP), and CPUAFFINITY is OFF. With CPUAFFINITY OFF, z/VM will simulate the virtual zAAP and zIIP on CPs, resulting in 100% utilization plus queuing for the CPs while the zAAP and zIIP are idle. Since CPUAFFINITY defaults to ON, the SET CPUAFFINITY command must be used to create this configuration.
FCX144  Run 2007/06/20 09:50:18  PROCLOG
Processor Activity, by Time
_______________________________________________________________________
                <------ Percent Busy -------> <--- Rates per Sec.--->
         C
Interval P                                     Inst
End Time U Type Total  User  Syst  Emul  Vect  Siml  DIAG  SIGP  SSCH
>>Mean>> 0 CP    99.9  99.4    .4  98.2  ....  98.5  18.8    .2  54.2
>>Mean>> 1 CP    99.9  99.6    .3  98.3  ....  94.9   3.5    .3  43.5
>>Mean>> 2 CP    99.9  99.5    .3  98.3  ....  85.8   4.5    .3  41.9
>>Mean>> 3 ZAAP    .0    .0    .0    .0  ....    .0    .0    .0   1.5
>>Mean>> 4 ZIIP    .0    .0    .0    .0  ....    .0    .0    .0   5.0
>>Mean>> . CP    99.8  99.5    .3  98.3  ....  93.0   8.9    .2  46.5
>>Mean>> . ZAAP    .0    .0    .0    .0  ....    .0    .0    .0   1.5
>>Mean>> . ZIIP    .0    .0    .0    .0  ....    .0    .0    .0   5.0

(Report split for formatting purposes. Ed.)
___________________________________________________________
<--------- PLDV ---------->    <------ Paging ------->  <Comm>
  Pct  Mean VMDBK VMDBK    To Below        Fast  Page
  Em-  when Mastr Stoln Mastr   2GB  PGIN  Path Reads  Msgs
  pty Non-0  only    /s    /s    /s    /s     %    /s    /s
    0     1     0   2.3    .9    .0    .0  ....    .0    .7
   57     1  ....   2.0    .0    .0    .0  ....    .0    .1
   57     1  ....   2.0    .0    .0    .0  ....    .0    .1
  100     0  ....    .0    .0    .0    .0  ....    .0    .0
  100     0  ....    .0    .0    .0    .0  ....    .0    .0
   38     1  ....   2.1    .3    .0    .0  ....    .0    .2
  100     0  ....    .0    .0    .0    .0  ....    .0    .0
  100     0  ....    .0    .0    .0    .0  ....    .0    .0
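The CPUAFFINITY behavior described in this section (ON, OFF, and suppressed) can be summarized as a small decision function. The following Python sketch is a model of the rules as stated in the text, not z/VM code; the names `Affinity` and `dispatch_target` are hypothetical.

```python
from enum import Enum


class Affinity(Enum):
    ON = "ON"
    OFF = "OFF"


def dispatch_target(vcpu_type, affinity, real_types):
    """Return (real processor type used, effective affinity state).

    vcpu_type:  type of the guest's virtual CPU, e.g. "CP", "ZAAP", "ZIIP"
    affinity:   the CPUAFFINITY setting
    real_types: set of real processor types present in the z/VM LPAR
    """
    if vcpu_type == "CP":
        # General purpose virtual CPUs always run on real CPs.
        return "CP", affinity.value
    if affinity is Affinity.OFF:
        # Specialty engines are simulated on CPs even if real ones exist.
        return "CP", "OFF"
    if vcpu_type in real_types:
        # A matching real engine exists, so the virtual CPU is dispatched on it.
        return vcpu_type, "ON"
    # ON was requested but no matching real engine exists: suppressed.
    return "CP", "SUPPRESSED"


lpar = {"CP", "ZAAP", "ZIIP"}
for vtype in ("CP", "ZAAP", "ZIIP"):
    print(vtype, dispatch_target(vtype, Affinity.ON, lpar),
          dispatch_target(vtype, Affinity.OFF, lpar))
print("ZAAP without a real zAAP:", dispatch_target("ZAAP", Affinity.ON, {"CP"}))
```

In the OFF and suppressed cases every virtual processor lands on the CP pool, which matches the second PROCLOG example where the CPs are saturated while the real zAAP and zIIP stay idle.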
Specialty Engines from a z/VM Guest Perspective
Performance of an individual guest is controlled by the z/VM share setting and the SET CPUAFFINITY command.
The share setting for a z/VM guest determines the percentage of available processor resources for the individual guest. The share setting for a virtual machine applies to each pool of the processor types (CP, IFL, zIIP, zAAP). Shares are normalized to the sum of shares for virtual machines in the dispatcher list for each pool of processor type. Since the sum will not necessarily be the same for each processor type, an individual guest could get a different percentage of a real processor for each processor type. The share setting for individual guests is shown in the Performance Toolkit UCONF screen.
Here is an example (Runid E7307ZZ2) of the Performance Toolkit UCONF screen showing the number of processors and the share settings for the ZOS1 virtual machine. It shows the 5 defined processors but does not show the individual processor types (3 CPs, 1 zAAP, and 1 zIIP). The relative share of 100 will be applied independently for each processor type and so these 5 virtual processors will not necessarily have the same percentage of a real processor.
FCX226  Run 2007/06/20 09:53:49  UCONF
User Configuration Data
____________________________________________________________________________________________________
                            <-------- Share -------->               No        Stor
            Virt Mach Stor                 %    Max.  Max.  Max. QUICK  MDC   Size Reserved
Userid  SVM CPUs Mode Mode Relative Absolute Value/% Share Limit   DSP Fair   (MB)    Pages
ZOS1    No     5 EME  V=V       100      ...     ...    ..    ..   Yes  Yes  4096M        0

The sum of dispatcher list share setting is shown in the Performance Toolkit SCHEDLOG screen. It is a global value and does not contain any information about the individual processor types.
Here is an example (Runid E7307ZZ2) of the Performance Toolkit SCHEDLOG screen showing 5 virtual processors in the dispatcher list with total share settings of 101. It does not show the individual processor types. It does show that ZOS1 is generally the only guest in the dispatch list.
FCX145  Run 2007/06/20 09:53:49  SCHEDLOG
Scheduler Queue Lengths, by Time
___________________________________________________________________________
          Total     <-- Users in Dispatch List --->      Lim    <- In Eligible List -->
Interval  VMDBK                        <- Loading -->     it                  < Loading->
End Time   in Q   Q0   Q1   Q2   Q3   Q0   Q1   Q2   Q3  Lst   E1   E2   E3   E1   E2   E3
>>Mean>>    5.1  5.1   .0   .0   .0   .0   .0   .0   .0   .0   .0   .0   .0   .0   .0   .0

(Report split for formatting purposes. Ed.)
_____________________________________________________
Class 1  Sum of  Sum of  <----- Storage (Pages) ------>
Elapsed    Abs.    Rel.   Total  <----- Total WSS ----->
T-Slice  Shares  Shares  Consid     Q0    Q1    Q2    Q3
  1.133      0%     101   1022k   881k    56     0     0
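The per-pool share normalization described in this section can be illustrated with a small calculation. The Python sketch below assumes relative shares only and assumes every listed guest is in the dispatcher list; it is a simplified model, not the z/VM scheduler. The second guest, LINUX1, is hypothetical and is not part of the measured configuration.

```python
def normalized_shares(guests):
    """Normalize relative shares separately for each processor type pool.

    guests: dict of guest name -> (relative share, set of virtual CPU types)
    Returns dict of processor type -> dict of guest name -> fraction of pool.
    """
    pools = {}
    for name, (share, vcpu_types) in guests.items():
        for vcpu_type in vcpu_types:
            pools.setdefault(vcpu_type, {})[name] = share
    return {
        vcpu_type: {name: share / sum(members.values())
                    for name, share in members.items()}
        for vcpu_type, members in pools.items()
    }


guests = {
    "ZOS1": (100, {"CP", "ZAAP", "ZIIP"}),  # share and types as in the UCONF example
    "LINUX1": (100, {"CP"}),                # hypothetical CP-only guest
}

for vcpu_type, members in sorted(normalized_shares(guests).items()):
    print(vcpu_type, members)
```

In this hypothetical case ZOS1 would be entitled to half of the CP pool but all of the zAAP and zIIP pools, which is why the same relative share can translate into a different percentage of a real processor for each type.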
The overall processor usage for individual guests is shown in the Performance Toolkit USER screen but it does not show individual processor types.
Here is an example (Runid E7307ZZ2) of the Performance Toolkit USER screen showing the ZOS1 guest using slightly more than 4 processors. It does not show the individual processor types (3 CPs, 1 zAAP, and 1 zIIP).
FCX112  Run 2007/06/20 09:53:49  USER
General User Resource Utilization
_________________________________________________________________________________
          <----- CPU Load ----->     <------ Virtual IO/s ------>
                 <-Seconds->   T/V
Userid    %CPU   TCPU   VCPU Ratio  Total  DASD Avoid Diag98    UR  Pg/s  User Status
ZOS1       403   4914   4853   1.0    173   173    .0     .0    .0    .0  EME,CL0,DISP

(Report split for formatting purposes. Ed.)
______________________________________________
<-User Time->  <--Spool-->    MDC
<--Minutes-->  Total   Rate Insert         Nr of
Logged Active  Pages  SPg/s  MDC/s  Share  Users
    20     20      0    .00    ...    100
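Although the USER screen does not break usage down by processor type, its %CPU value can be cross-checked against the PROCLOG data for the same run. The Python sketch below sums the per-processor User busy percentages from the PROCLOG example for Runid E7307ZZ2. Treating that sum as attributable to ZOS1 is an assumption that is reasonable here only because ZOS1 is generally the only guest in the dispatch list.

```python
# PROCLOG "User" percent busy per real processor (Runid E7307ZZ2).
proclog_user_busy = {
    "CP0": 99.5,
    "CP1": 99.5,
    "CP2": 99.5,
    "ZAAP3": 96.0,
    "ZIIP4": 8.4,
}

total_user_busy = sum(proclog_user_busy.values())
processors_used = total_user_busy / 100  # in units of one full processor

print(f"Sum of PROCLOG User busy: {total_user_busy:.1f}%")
print(f"Equivalent processors:    {processors_used:.2f}")
```

The sum rounds to the 403 reported as %CPU for ZOS1, so the "slightly more than 4 processors" is made up of three nearly saturated CPs, a nearly saturated zAAP, and a lightly used zIIP.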
The Performance Toolkit USER Resource Detail Screen (FCX115) has additional information for a virtual machine but it does not show processor type so no example is included.
For a z/OS guest, RMF data provides number and utilization of CP, zAAP, and zIIP virtual processors. The RMF reporting of data is not affected by the CPUAFFINITY setting but the actual values can be affected, as demonstrated by the next two examples.
Although the Performance Toolkit does not provide any information about the CPUAFFINITY setting, it can be determined from the QUERY CPUAFFINITY command or from a flag in z/VM monitor data Domain 4 Record 3.
Here is an example (Runid E7307ZZ2) of the RMF CPU Activity report showing the processor utilization by processor type with all three workloads active, identical z/OS and z/VM configurations (3 CPs, 1 zAAP, and 1 zIIP), and CPUAFFINITY ON.
The RMF reported processor utilization for each processor type matches the z/VM reported utilization for this measurement which is shown as an example in Performance Toolkit data.
                        C P U  A C T I V I T Y
z/OS V1R8        SYSTEM ID UNKN          DATE 03/07/2007
                 RPT VERSION V1R8 RMF    TIME 21.31.08
CPU 2096  MODEL X03  H/W MODEL S07
---CPU---  ONLINE TIME  LPAR BUSY  MVS BUSY   CPU SERIAL  I/O TOTAL
NUM  TYPE  PERCENTAGE   TIME PERC  TIME PERC  NUMBER      INTERRUPT RAT
 0   CP    100.00       ----       99.96      047D9B          0.06
 1   CP    100.00       ----       99.96      047D9B          0.06
 2   CP    100.00       ----       99.96      047D9B         175.8
CP   TOTAL/AVERAGE      ----       99.96                     176.0
 3   AAP   100.00       ----       97.46      047D9B
AAP  AVERAGE            ----       97.46
 4   IIP   100.00       ----        9.16      047D9B
IIP  AVERAGE            ----        9.16
Here is an example (Runid E7320ZZ2) of the RMF CPU Activity report showing processor utilization by processor type with all three workloads active, identical z/OS and z/VM configurations (3 CPs, 1 zAAP, and 1 zIIP), and CPUAFFINITY OFF.
With CPUAFFINITY OFF, z/VM will not use the real zAAP or zIIP but simulate them on a CP, thus the five virtual processors need to be dispatched on the three z/VM CPs. The RMF reported zIIP utilization is much higher than our workload is capable of generating and demonstrates the need to balance the LPAR, z/VM, and z/OS configuration. With CPUAFFINITY OFF, z/OS will report the zIIP as busy when it is queued for a processor by z/VM.
The RMF reported processor utilization for each processor type does not match the z/VM reported utilization for this measurement, which is shown as an example in Performance Toolkit data.
                        C P U  A C T I V I T Y
z/OS V1R8        SYSTEM ID UNKN          DATE 03/20/2007
                 RPT VERSION V1R8 RMF    TIME 16.31.22
CPU 2096  MODEL X03  H/W MODEL S07
---CPU---  ONLINE TIME  LPAR BUSY  MVS BUSY   CPU SERIAL  I/O TOTAL
NUM  TYPE  PERCENTAGE   TIME PERC  TIME PERC  NUMBER      INTERRUPT RATE
 0   CP    100.00       ----       99.95      047D9B          0.03
 1   CP    100.00       ----       99.96      047D9B          0.03
 2   CP    100.00       ----       99.96      047D9B         124.4
CP   TOTAL/AVERAGE      ----       99.96                     124.5
 3   AAP   100.00       ----       99.57      047D9B
AAP  AVERAGE            ----       99.57
 4   IIP   100.00       ----       29.58      047D9B
IIP  AVERAGE            ----       29.58

(Report split for formatting purposes. Ed.)
INTERVAL 19.59.939  CYCLE 1.000 SECONDS
% I/O INTERRUPTS HANDLED VIA TPI
0.00  4.88  5.15  5.15
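The agreement and disagreement between the RMF and z/VM views in the two measurements can be checked mechanically. The Python sketch below compares the RMF MVS BUSY averages with the PROCLOG Total averages by processor type. The 5-point tolerance and the function name `utilization_mismatches` are assumptions chosen for illustration, and RMF's AAP and IIP labels are mapped to ZAAP and ZIIP by hand.

```python
def utilization_mismatches(rmf, zvm, tolerance=5.0):
    """Return processor types whose RMF and z/VM busy values differ.

    rmf, zvm:  dict of processor type -> percent busy
    tolerance: allowed absolute difference in percentage points (assumed)
    """
    return {
        ptype: (rmf[ptype], zvm[ptype])
        for ptype in rmf
        if abs(rmf[ptype] - zvm[ptype]) > tolerance
    }


# CPUAFFINITY ON (Runid E7307ZZ2): RMF averages versus PROCLOG type averages.
rmf_on = {"CP": 99.96, "ZAAP": 97.46, "ZIIP": 9.16}
zvm_on = {"CP": 99.7, "ZAAP": 96.0, "ZIIP": 8.8}

# CPUAFFINITY OFF (Runid E7320ZZ2).
rmf_off = {"CP": 99.96, "ZAAP": 99.57, "ZIIP": 29.58}
zvm_off = {"CP": 99.8, "ZAAP": 0.0, "ZIIP": 0.0}

print("ON: ", utilization_mismatches(rmf_on, zvm_on))
print("OFF:", utilization_mismatches(rmf_off, zvm_off))
```

With CPUAFFINITY ON no processor type is flagged, while with CPUAFFINITY OFF both the zAAP and the zIIP are flagged because z/OS sees its virtual specialty processors as busy while the real ones are idle.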
It will be more difficult to correlate the z/OS data and the z/VM data when multiple guests have specialty engines and different share values.