An Unparking Example

(Last revised: 2018-05-18, BKW)

In a recent exchange with the field I explained z/VM's unparking habits in the context of an SMT-2, all-IFL LPAR. I feel the explanation is reusable (those of you who know my history know I like reusable things), so I am posting it here.

(Note: before continuing, you might like to read the SMT vocabulary article.)

A little context will help. The LPAR in question has 11 logical IFL cores. The LPAR's entitlement is 905%, so it has eight Vh logical cores, two Vm logical cores, and one Vl logical core. It is running SMT-2 so it has 22 logical CPUs. The unparking heuristic is UNPARKING SMALL. And last, GPD is on.

Under the above conditions, z/VM watches both the LPAR's core utilization and the CPC's unentitled power availability. UNPARKING SMALL gives us the following rules:

z/VM sets into unparked state only the logical cores needed to cover the logical core utilization ceiling projection plus CPUPAD,
... with the extra proviso that z/VM will unpark a vertical-low logical core only if it detects PR/SM will power the vertical-low.
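
The two rules above can be sketched in a few lines of Python. This is an illustration only, not z/VM's actual code. The function name, its parameters, the assumption that each unparked logical core supplies 100% of core capacity, the assumption that at least one core always stays unparked, and the reduction of the PR/SM check to a single true/false value are all simplifications made for the sketch.

```python
import math

def small_unparked_cores(ceiling_pct, cpupad_pct, n_vh, n_vm, n_vl, vl_will_be_powered):
    """Sketch of UNPARKING SMALL: how many logical cores to leave unparked.

    Hypothetical helper. Assumes one unparked logical core covers 100%
    of core utilization and that at least one core is always unparked.
    """
    # Rule 1: cover the projected utilization ceiling plus CPUPAD.
    needed_pct = ceiling_pct + cpupad_pct
    wanted = max(1, math.ceil(needed_pct / 100.0))

    # Rule 2: go beyond the entitled (Vh + Vm) cores only if PR/SM
    # is expected to power the vertical-lows.
    entitled = n_vh + n_vm
    if wanted > entitled and not vl_will_be_powered:
        wanted = entitled

    # Never ask for more cores than the LPAR defines.
    return min(wanted, n_vh + n_vm + n_vl)
```

For the LPAR described above (eight Vh, two Vm, one Vl), a projected ceiling of 620% with a pad of 50% gives seven unparked cores under this sketch. A projected ceiling of 1020% with the same pad gives ten cores if the Vl is not expected to be powered and eleven if it is.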

The points of confusion were:

The difference between logical cores and logical processors, and
The difference between core utilization and CPU utilization, and
The failure to realize that it's core utilization that z/VM examines in making unparking decisions.

And now, the explanation.

Hi Xxxxx,

SMALL unparking is done based on core utilization, not CPU utilization. In SMT-2 there is a difference between those two. For background you can read our article http://www.vm.ibm.com/perf/tips/smtutil.html.

CPUPAD is a safety margin added to the projected ceiling of total core utilization when deciding how many logical cores to set into unparked state.

Here are some highlights and then some explanation of what is going on in your LPAR.

The first point is that the phrase "an IFL" is ambiguous. We can talk about IFL cores or we can talk about IFL CPUs. In SMT-2 each logical IFL core contains two logical IFL CPUs. I strongly suggest you no longer say "an IFL", because that term is undefined in SMT-2. Say instead "an IFL core" or "an IFL CPU".
The entity of dispatch for PR/SM is the logical core. PR/SM dispatches logical cores on the machine's physical cores.
For a given logical core, the percent of time the logical core is in dispatch on a physical core is called the logical core's core utilization.
A given LPAR has a total core utilization for each type of core defined in the LPAR. For example, if we have LPAR FRED with IFL cores 0, 1, and 2 with core utilizations 35%, 55%, and 20% respectively, LPAR FRED's total IFL core utilization is 35+55+20 = 110%.
Perfkit FCX126 LPAR, FCX306 LSHARACT, and FCX202 LPARLOG tabulate logical core utilization.
In a similar way, for a given physical core, the percent of time the physical core has a logical core dispatched upon it is called the physical core's core utilization.
On the CPC, a given physical type-pool has a total core utilization equal to the sum of the core utilizations of the physical cores making up the type pool.
Perfkit FCX302 PHYSLOG tabulates physical core utilization.
During the time a logical core is in dispatch, its two logical CPUs can move in and out of wait state independently. Thus CPU utilization and core utilization are only loosely related. Suppose we have logical core 6 with logical CPUs 12 and 13. Suppose logical core 6 is in dispatch on a physical core 85% of the time. Suppose also that during that same period, CPU 12 runs 71% busy and CPU 13 runs 55% busy. The core utilization for logical core 6 is 85%, the CPU utilization for CPU 12 is 71%, and the CPU utilization for CPU 13 is 55%.
Perfkit FCX225 SYSSUMLG, FCX100 CPU, FCX144 PROCLOG, and FCX304 PRCLOG tabulate logical CPU utilization.
You can see now that in SMT-2, just because an LPAR is completely logical-core-busy (e.g., three IFL logical cores with 300% core utilization), that doesn't mean the LPAR is out of capacity. The logical IFL CPUs might still be loading wait states.
You can also see now that in SMT-2, just because FCX302 PHYSLOG reports the CPC's shared type-pool is completely physical-core-busy, that doesn't mean the physical type pool is out of capacity. For example, even though we have 11 IFL shared physical cores running 1100% busy, the logical IFL CPUs of the SMT-2 LPARs might still be loading wait states.
Refer now to file p1120006.$d5r16, posted below. You can open the file with Notepad.
Per the posted file, your LPAR has 11 logical cores, so its core utilization can run from 0 to 1100%.
Your LPAR's logical cores break out this way: there are eight vertical-high (Vh) logical cores, two vertical-medium (Vm) logical cores, and one vertical-low (Vl) logical core.
z/VM does parking and unparking on a logical core basis. The entity we park is the logical core. The entity we unpark is the logical core. The reason is this: the purpose of doing parking is to try to decrease the number of dispatchable units PR/SM has to juggle. PR/SM dispatches logical cores, so we park and unpark logical cores.
The metric we use to decide how many cores to put into unparked state is core utilization.
Per the posted file,
- In column U-Last we can see that your LPAR runs with a variety of total core utilizations in the range of 333% to 989%, mean 589%. This has almost nothing to do with the CPU utilization on the 22 logical IFL CPUs.
- In column U-Next we can see z/VM is forming a statistically derived projected ceiling for total core utilization U-Next. The projected ceiling is calculated by doing some math on the last ten samples of U-Last.
- In column H-Pad we see your CPUPAD value.
- In column C-Need we see the sum U-Next + H-Pad. The sum C-Need is the needed core capacity z/VM is going to configure for the LPAR by parking or unparking logical cores to match.
- In column WkCt we see the number of logical cores z/VM decided to have in unparked state for the next interval, so as to be able to achieve needed core capacity C-Need.
- In the first data row we see U-Last 815.38, U-Next 840.76, H-Pad 50.0, C-Need 890.76, and WkCt 9. All of this is absolutely correct for UNPARKING SMALL.
None of this has anything to do with CPU utilization as reported in FCX144 PROCLOG or FCX304 PRCLOG. The reason is that while a given logical core is in dispatch, its logical CPUs move in and out of wait state independently.
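
As a quick check of the arithmetic quoted above, here is a minimal Python sketch. The variable names are chosen for illustration, and the step that turns C-Need into a core count assumes each unparked logical core supplies 100% of core capacity. That is a simplification for the sketch, not a statement of how z/VM computes WkCt internally.

```python
import math

# Values quoted above from the first data row of the posted file.
u_next = 840.76   # projected ceiling on total core utilization, percent
h_pad = 50.0      # CPUPAD, percent

# C-Need is the sum U-Next + H-Pad.
c_need = round(u_next + h_pad, 2)

# Whole logical cores needed, assuming 100% of capacity per core.
wk_ct = math.ceil(c_need / 100.0)
print(c_need, wk_ct)

# LPAR FRED example: total core utilization is the sum of the
# core utilizations of the LPAR's logical cores of that type.
fred_core_util = {0: 35.0, 1: 55.0, 2: 20.0}
fred_total = sum(fred_core_util.values())
print(fred_total)
```

Under that assumption the sketch arrives at C-Need 890.76 and nine cores, which agrees with the WkCt value in the first data row. The FRED total comes out to 110%.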

Here is a log of logical-core-busy for your LPAR from your MONWRITE file P1120006 MONDATA. In the log you can see evidence that z/VM is trying hard to run the partition's workload on only the vertical-high logical cores, aka cores 0-7. The vertical-medium logical cores (cores 8 and 9) and the vertical-low logical core (core 10) have significantly less core utilization. You can also see that logical core 7, the last vertical-high, has less core utilization than the other seven vertical-highs, cores 0-6. In other words, z/VM is even parking vertical-high logical core 7 when it feels it can do so safely.

If you want z/VM to spread into the entitled cores 7-9 a little more, you could try SET SRM UNPARKING MEDIUM. This will cause z/VM to:

keep all the Vh and Vm logical cores unparked all the time, and
unpark the Vl logical cores only when:
1. the projected ceiling on core utilization suggests they will be needed, and
2. the projected floor on the LPAR's expected take from the CPC's unentitled capacity suggests they will be powered.

Regards,
Brian