Brian's z/VM Performance Best Practices
Last revised: 2022-05-13, BKW
In my role as a performance specialist I often look at MONWRITE data from many different installations. What I have seen has driven me to put together this list of best practices on a number of topics.
This is a living document, so you might like to click on the Notify me link in the left-nav area.
Goals and Measurements
- Define performance success metrics for your workload and set some thresholds for what constitutes "good enough" behavior. For example, maybe transaction response time is important to you and you need transaction time to be less than some threshold T.
- Routinely measure the success metrics you defined. Track how you are doing against the thresholds you selected.
- Routinely collect and archive MONWRITE data including CPU MF counters. For help on MONWRITE, look here. For help on the CPU MF counters, look here. For a package that can collect MONWRITE data for you, look here. A short command sketch for turning on monitor data collection follows this list.
- Routinely collect and archive performance data collected inside the guests you are running. For example, for Linux guests, routinely collect and archive sar data.
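If monitor data collection is not already automated at your installation, here is a minimal sketch of the CP side of it. The domains shown are only the common ones; enable whatever your site needs.

   CP MONITOR SAMPLE ENABLE PROCESSOR
   CP MONITOR SAMPLE ENABLE STORAGE
   CP MONITOR SAMPLE ENABLE USER ALL
   CP MONITOR SAMPLE ENABLE I/O ALL
   CP MONITOR START

The MONWRITE utility is then run in a service virtual machine to write the monitor data to CMS files; see the MONWRITE help referenced above for its exact invocation. Note also that the CPU MF counters must be authorized for the partition in its activation profile on the HMC, or the counter records will not appear.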
Reporting Problems
- When you report a problem, include measurement data from all layers in the software stack, collected concurrently. For example, if you experience a CPU steal-time problem in a Linux guest, when you report the problem to IBM you will need to send along the sar data from the Linux guest and the MONWRITE data from the z/VM system, both collected at the moment the incident happened.
- Generally, sending formatted reports such as Perfkit screens is not useful to the IBM analysts who are going to look at your problem. The analysts generally need the "raw data" for their layer, packaged in a form native to that layer. For the z/VM layer, this generally means a MONWRITE file packaged with COPYFILE ( PACK or packaged into a VMARC archive file.
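For example, to package a MONWRITE file for transmission, either of these CMS approaches works (the file names here are made up, and the VMARC invocation shown is approximate):

   COPYFILE D0713 MONDATA A = PACKED = (PACK
   VMARC PACK D0713 MONDATA A D0713 VMARC A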
Upgrades
For best practices around an upgrade, visit here.
Partitions
A lot of this is drawn from my presentation Topics in LPAR Performance.
- If you can get away with it, of course, use a dedicated LPAR. But most clients can't get away with this, because they need those physical cores' capacity to be usable by other LPARs when the LPAR of interest is quiet.
- For shared LPARs, learn what entitlements are, learn how to calculate entitlements, and learn how to compare entitlements to logical core counts.
- For shared LPARs, make sure you harmonize entitlement and logical core count. Perfkit's FCX306 LSHARACT report helps a lot with this. What we are looking for here is for the partition to have just the right number of logical cores to absorb its entitlement, plus one or two VL (vertical-low) cores. A worked example follows this list.
Why is this vital?
- Partitions having too few logical cores for their entitlement deprive other partitions of entitled logical cores.
- Partitions having too many logical cores for their entitlement risk running too much of their work on VLs, which do not fare all that well in the world of PR/SM dispatching.
- Make sure every LPAR has enough entitlement to get its work done on entitled power most of the time. FCX306 LSHARACT can help with this too. If you routinely see column "Excess" nonzero, your LPAR is routinely trying to get its work done on unentitled power and its behavior will be correspondingly and routinely erratic.
- Do not be afraid to change LPAR weights and logical core counts as your situation changes.
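To make the arithmetic concrete, here is a made-up example. Entitlement for a shared partition is its weight's share of the shared physical cores of its type. Suppose a CEC has 20 shared physical IFLs and three partitions with weights 500, 300, and 200. The weight-500 partition's entitlement is (500 / 1000) x 20 = 10 cores' worth of power. Giving that partition 11 or 12 logical cores, enough to absorb the entitlement of 10 plus one or two more, lines up with the guidance above; defining it with, say, 18 or 20 logical cores would leave many of them running as VLs.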
z/VM Control Program
System Settings
- Vertical mode: be sure to run vertically. On all modern z/VM releases, vertical mode is the default.
- SMT: run with SMT-2 if your workload can tolerate the single-CPU speed. Running SMT-2 lets you get the most out of every physical core.
- Unparking: use the MEDIUM unparking model. This shuts off unnecessary VLs. In the system configuration file, code SRM UNPARKING MEDIUM. Or, in your startup processing, code CP SET SRM UNPARKING MEDIUM. On z/VM 7.2 and later, MEDIUM is the default. A short sketch for setting and verifying these system settings follows this list.
- Cores vs. processors: Learn and remember the difference between logical cores and logical processors. In non-SMT and SMT-1 deployments the difference is esoteric at best. But in SMT-2 deployments the difference is very real. Know the concepts and use the words correctly.
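As a quick sketch, the settings above can be put in place and then checked with commands like these; output formats vary by release.

   In the system configuration file:   SRM UNPARKING MEDIUM
   Or dynamically:                     CP SET SRM UNPARKING MEDIUM
   To verify:                          CP QUERY SRM
                                       CP QUERY MULTITHREAD
                                       CP QUERY PROCESSORS

QUERY SRM shows the unparking model among the other SRM settings, QUERY MULTITHREAD shows whether SMT is enabled and how many threads are activated per core, and QUERY PROCESSORS lists the resulting logical processors.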
Processors
- Configuration: make your system ready for logical processors to be added or removed. In the image activation profile, code some standby processors. Recognize that if the z/VM system is running SMT-2, these standby entities are actually cores.
- Number: spreading a given workload over too many logical processors contributes to serialization overhead in the Control Program. Think carefully about your configuration. Use no more logical processors than you really need. But also remember that if you pack work too tightly onto logical processors, response time will suffer.
Central Memory
- Configuration: make your system ready for storage to be added or removed. In the image activation profile, code some standby storage. In the system configuration file, code some reconfigurable storage. A sketch follows this list.
- Amount: a tendency of the system to page is generally not a problem as long as the workload's success thresholds are being satisfied. Thus we do not make any recommendation here about "memory overcommitment ratio", nor do we even attempt here to define it.
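As a sketch only: the statement and command below illustrate the idea, but the amount is a placeholder and the exact operands should be checked against the CP Planning and Administration book for your release.

   In the system configuration file:   STORAGE RECONFIGURABLE 64G
   To check the current layout:        CP QUERY STORAGE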
Paging
For best practices for paging, look here. The information there is very complete.
Networking
- If you have many guests each with light networking requirements, use vswitches to provide networking to those guests. Give each guest a VNIC and couple the VNIC to some vswitch. A configuration sketch follows this list.
- If you have a guest that has extraordinary need for networking, dedicate some OSA rdevs to that guest. You probably don't have to dedicate the whole OSA chpid.
- If you are using vswitch, consider link aggregation, for its capacity and failover advantages.
- To provide networking between partitions on the same CEC, one often thinks of HiperSockets. We generally prefer OSA adapters for such jobs, because doing so offloads networking work to a very capable coprocessor (the OSA chpid), leaving precious central processor power available to do other work. Also, when one uses an OSA, networking behavior does not become erratic as central processor utilization climbs, as it can for HiperSockets.
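As a sketch for the light-requirements case in the first bullet above, with made-up names and device numbers: define a vswitch backed by a couple of OSA devices, then give the guest a VNIC coupled to it in its directory entry.

   On the z/VM system (CP command or the SYSTEM CONFIG equivalent):
      DEFINE VSWITCH VSW1 RDEV 1000 2000
   In the guest's directory entry:
      NICDEF 0600 TYPE QDIO LAN SYSTEM VSW1

Giving the vswitch two RDEVs on different OSA chpids provides a failover path; link aggregation goes beyond that but needs additional setup on both the vswitch and the physical switch it connects to.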
Classic I/O
- Remember to take advantage of zHPF (High Performance FICON). Configure guest operating systems or their applications to exploit it.
- Remember to take advantage of HyperPAV aliases. In each LCU define some device numbers as HyperPAV aliases. Attach some of those aliases to SYSTEM so that CP can use them to parallelize I/O to guest minidisks and to CP paging packs. A command sketch follows this list.
- Remember to take advantage of multiple channel paths per control unit. This increases both reliability and performance.
- For volumes holding minidisks, remember to take advantage of Minidisk Cache (MDC).
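For the HyperPAV item above, a sketch with made-up device numbers: once some alias device numbers exist in each LCU on the storage controller, attach a share of them to SYSTEM so CP can use them.

   CP ATTACH 1A80-1A8F TO SYSTEM
   CP QUERY PAV

QUERY PAV shows how the base and alias devices are arranged.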
FCP-SCSI EDEVs
- For each EDEV, configure multiple FCP paths. This gives CP a failover option.
- For a given EDEV, put its multiple FCP paths on distinct FCP chpids, so that the loss of a single chpid does not take away every path to the device.
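A sketch of an EDEV with two FCP paths on two different FCP chpids follows. Every device number, WWPN, and LUN here is fabricated, the command is wrapped for readability, and the exact operands should be verified against the SET EDEVICE documentation for your release.

   CP SET EDEVICE 0200 TYPE FBA ATTR SCSI
        FCP_DEV B000 WWPN 5005076300C300AA LUN 0001000000000000
        FCP_DEV C000 WWPN 5005076300C800BB LUN 0001000000000000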
ISFC Links
Inter-System Facility for Communications, or ISFC, is the transport that carries guest relocations. ISFC also carries APPC/VM traffic, such as the traffic that flows in support of the CMS Shared File System.
- An ISFC logical link connects two adjacent z/VM systems.
- An ISFC logical link is made up of a number of real CTC connections ganged together into one large pipe.
- A single real CTC connection is one CTC device number on system A providing a half-duplex pipe to one CTC device number on system B.
- At the very least the ISFC logical link should consist of two CTC connections on one CTC chpid. This lets CP avoid dealing with the half-duplex nature of a real CTC connection, but it offers no fault tolerance.
- A maximally configured ISFC logical link generally consists of sixteen CTC connections hosted on eight CTC chpids. Each CTC chpid would host two CTC device numbers.
- Use an ISFC logical link configuration that meets your performance requirements, as measured by guest relocation time.
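As a sketch of the minimal configuration described above, with made-up CTC device numbers, each member activates its end of the link either with an ACTIVATE ISLINK statement in the system configuration file or with the corresponding CP command:

   ACTIVATE ISLINK 5000 5001

A larger link simply lists more CTC device numbers, spread across several CTC chpids.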
Configuration of Guests
- For the system as a whole, the ratio of total number of guest virtual processors to total number of host logical processors, generally known as CPU overcommitment, is not usually a relevant factor as long as the workload's success thresholds are being met.
- Give each guest no more virtual processors than it can keep reasonably busy. Having excess virtual processors in a guest causes excess serialization overhead in that guest, which in turn drives excess simulation, serialization, and dispatch overhead in the Control Program.
- Generally, give a guest no more virtual processors than the underlying system has entitled logical processors. For example, if the z/VM system has 16 entitled logical processors, give a guest no more than 16 virtual processors.
- Remember the first key word here is entitled. If a system has 16 VH logical processors, 2 VM logical processors, and 12 VL logical processors, give the guest certainly no more than 18 virtual processors and probably no more than 16. Logical processor counts are available on FCX304 PRCLOG.
- Remember the second key word here is processors. If you are running SMT-2 your partition has logical cores and logical processors. We are talking here about processors.
- For each guest, do not overconfigure guest-real memory. Think carefully about the needs of the workload running in the guest and consider that to be your starting point for the size of the guest. Remember that many operating systems we run in guests of z/VM are written with the assumption that they are running on physical hardware, and so from their point of view it is better to use a frame of memory for something than to leave the frame unused. But that approach can be disastrous in a virtualized world. The underlying hypervisor has to manage all those dubiously useful guest pages, and the corresponding management overhead will be apparent. A sample directory entry illustrating several of these guest configuration points follows this list.
- Give the guest I/O consistent with your balance of needs between performance and manageability. Giving the guest an FCP device and some SCSI LUNs might be the highest-performing alternative but it might not be the most manageable. There are many choices here: FCP with SCSI LUNs, dedicated ECKD, minidisks on EDEVs, and so on.
- Give the guest networking consistent with its networking requirements. If it has fairly mild networking requirements, give it a VNIC coupled to some vswitch. If the guest needs substantial networking, maybe the guest needs a dedicated OSA.
- If the guest is Linux, give it two swap extents, one on each of two VDISKs.
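Pulling several of these points together, here is a hypothetical fragment of a directory entry for a Linux guest. Every name, size, and device number is made up for illustration; adjust to your own workload and conventions.

   USER LINUX01 XXXXXXXX 4G 8G G
   *  Virtual processors: no more than the guest can keep busy
   MACHINE ESA 2
   CPU 00 BASE
   CPU 01
   *  Networking: a VNIC coupled to a vswitch
   NICDEF 0600 TYPE QDIO LAN SYSTEM VSW1
   *  Two VDISK swap extents for Linux
   MDISK 0111 FB-512 V-DISK 1048576 MR
   MDISK 0112 FB-512 V-DISK 1048576 MR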