z/VM Support - System Preparation and Best Practices

Last Updated: 22 February 2022

This page contains information to help customers prepare their z/VM system for optimal use. It also contains best practices information that will enable customers to properly monitor and maintain their systems on an ongoing basis.

This information is divided into logical sections:

Security
Disk - Auxiliary Storage Configuration and Management (FICON DASD and FCP SCSI)
Networking
System Backups and Disaster Recovery (DR)
Paging Space
Spooling Space
System Performance
Guest Virtual Machine Monitoring
Understanding and Using the SYSTEM CONFIG File
Software Maintenance
Communication
Prepare for z/VM Failures
OPERATOR's Console
Support Process

Security

Preparation:

Implement an External Security Manager (ESM) on your system. This is the most important thing you can do for the security of your system. IBM recommends the RACF Security Server feature of z/VM. Other ESMs are available from Independent Software Vendors (ISVs).
Be aware of any vulnerabilities that could adversely affect your system security. IBM Z does not publicly disclose whether its hardware or software is affected by any particular vulnerability. We recommend that you register for the IBM Z Security Portal; all communications for IBM Z and IBM LinuxONE about potential security vulnerabilities are published there. Information from the Portal is not shared in support cases.
- For more information about IBM Z system integrity, and registration instructions for the IBM Z Security Portal, visit https://www.ibm.com/it-infrastructure/z/capabilities/system-integrity
  Once registered, you can access the Portal from the Resource Link site.
- Refer to the Communication section of this page for information on how to access Resource Link.

Best Practices:

Conform to all of your company's security policies for computing
Follow consistent, common-sense security practices, including:
- Change default installation passwords for all components and ISV products
- Make sure passwords comply with any corporate security policies
- Audit the activities of privileged system administrators
- Use role-based access rights to simplify administration
Use multi-factor authentication to access privileged system administrator IDs. IBM recommends IBM Z Multi-Factor Authentication

Disk - Auxiliary Storage Configuration and Management (FICON DASD and FCP SCSI)

Preparation:

Plan allocation across logical partition(s). For example, consider standards on size and device numbers

Best Practices:

Avoid using duplicate volume serials (VOLSERs) unless absolutely necessary
Use a naming convention for VOLSERs.
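Both practices above can be checked mechanically against a volume inventory. The following is a minimal sketch in Python, not an IBM tool. The naming convention it enforces (a two-letter role prefix followed by four hexadecimal digits) and the sample VOLSERs are hypothetical; substitute your own standard.

```python
import re
from collections import Counter

# Hypothetical site convention (an assumption, not an IBM rule):
# a 2-letter role prefix followed by 4 hexadecimal digits, 6 characters total.
VOLSER_PATTERN = re.compile(r"^(PG|SP|DM|US)[0-9A-F]{4}$")


def find_duplicates(volsers):
    """Return the sorted list of VOLSERs that appear more than once."""
    counts = Counter(v.upper() for v in volsers)
    return sorted(v for v, n in counts.items() if n > 1)


def find_nonconforming(volsers, pattern=VOLSER_PATTERN):
    """Return VOLSERs that do not match the naming convention, in input order."""
    return [v for v in volsers if not pattern.match(v.upper())]


# Hypothetical inventory, for illustration only.
inventory = ["PG0A01", "PG0A02", "SP0B01", "PG0A01", "TEMP01"]
print("Duplicates:", find_duplicates(inventory))
print("Nonconforming:", find_nonconforming(inventory))
```

Running a check like this before new volumes are attached keeps duplicates and off-convention labels from reaching the system in the first place.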

Networking

Preparation:

Implement built-in failover
Virtual Switch (VSWITCH) and VSWITCH with Link Aggregation are preferred for their availability characteristics.

Best Practices:

Maintain ongoing, open communication with your network team

System Backups and Disaster Recovery (DR)

Preparation:

Develop and periodically review your DR strategy
Identify all resources that require backup. Exclude resources that do not require backup (Example: Paging volumes).
Identify backup programs to be used. Examples:
- IBM Backup and Restore Manager for z/VM
- IBM Tape Manager for z/VM
- DASD Dump Restore (DDR). Part of z/VM
- ISV applications
Make testing of DR process and backups a normal part of your systems maintenance schedule
Special Single System Image (SSI) consideration:
- If you use backups (meaning that your DR site is sourced from a production backup from some time ago), you will need to use the CLEARPDR IPL parameter on the Stand Alone Program Loader (SAPL) panel to IPL the first DR SSI member. Many customers use this method, and IPL only one of the SSI members on their DR site.

Best Practices:

IMPORTANT! Test your backup and restore process and data recovery integrity
Use SPXTAPE or an ISV equivalent to back up spool files and z/VM system data files

Paging Space

Preparation:

Paging space holds user and system pages on external disk storage. Paging space usage tends to grow over time due to factors such as increased workload. Running out of paging space will lead to an outage in the form of a PGT004 abend. Consult the CP Planning and Administration manual to help you estimate how much space you will need.

Best Practices:

Periodically issue CP Q ALLOC PAGE to monitor paging space utilization
Install other software products to monitor page space and other system functions
- IBM Operations Manager for z/VM
- IBM Tivoli OMEGAMON XE
- Other ISV products
- VIR2REAL package on the z/VM download page can help monitor this.
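The periodic check can be reduced to a simple threshold test on the page counts you read from the CP Q ALLOC PAGE response. The sketch below is written in Python for illustration and does not parse the command output; you supply the totals yourself. The warning and critical thresholds are hypothetical defaults, not IBM guidance, so choose values that suit your workload.

```python
def utilization_percent(pages_in_use, total_pages):
    """Return paging space utilization as a percentage."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * pages_in_use / total_pages


def classify(pages_in_use, total_pages, warn_pct=50.0, critical_pct=80.0):
    """Map utilization to OK / WARNING / CRITICAL.

    warn_pct and critical_pct are assumed defaults for this sketch.
    """
    pct = utilization_percent(pages_in_use, total_pages)
    if pct >= critical_pct:
        return "CRITICAL"
    if pct >= warn_pct:
        return "WARNING"
    return "OK"


# Hypothetical figures transcribed from a Q ALLOC PAGE response.
print(classify(pages_in_use=120000, total_pages=600840))
```

The same test applies unchanged to the figures reported by CP Q ALLOC SPOOL.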

Spooling Space

Preparation:

Spooling space is required for normal system operation. Functions that require spool space include dumps, console files, NSS files, etc. Spooling space utilization tends to grow over time and must be monitored.

Best Practices:

Periodically issue CP Q ALLOC SPOOL to monitor spooling space utilization
Use the CMS utility SFPURGER
Utilize z/VM download packages such as SPOOLPIG, which can be found at https://www.vm.ibm.com/download/packages/
Install other software products to monitor spool space and other system functions
- IBM Operations Manager for z/VM
- Other ISV products

System Performance

Preparation:

If you open up a z/VM case with a performance-related problem, it is very likely that you will be asked to send in MONWRITE data, both from a period in which your system/guest was performing well, and from a period of poor performance. As a result, you should:
- Set up collection of MONWRITE data: https://www.vm.ibm.com/devpages/bkw/monsimp.html
- If you want automation so that z/VM continues to collect data and keeps only the most recent files, the MONCLEAN package (https://www.vm.ibm.com/download/packages/) is one example of how to do that.
Be aware of and prepared for periodic workload spikes (for example, weekly or quarterly processing).
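The idea of keeping only the most recent MONWRITE files comes down to a retention rule. The Python sketch below illustrates that rule only; it is not how the MONCLEAN package is implemented, and the file names and the retention count are hypothetical.

```python
from datetime import datetime


def files_to_delete(files, keep):
    """Return the names of all but the `keep` newest files, oldest first.

    `files` is an iterable of (name, created) tuples.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(files, key=lambda f: f[1])  # oldest first
    cutoff = max(len(ordered) - keep, 0)
    return [name for name, _ in ordered[:cutoff]]


# Hypothetical monitor data files, for illustration only.
collected = [
    ("MON0101", datetime(2022, 1, 1)),
    ("MON0103", datetime(2022, 1, 3)),
    ("MON0102", datetime(2022, 1, 2)),
]
print(files_to_delete(collected, keep=2))
```

When sizing the retention count, keep enough history to cover both a period of good performance and a period of poor performance, since support is likely to ask for both.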

Best Practices:

Install software products to monitor performance:
- IBM Performance Toolkit
- IBM Tivoli OMEGAMON XE on z/VM and Linux
- Other ISV products

Guest Virtual Machine Monitoring

Preparation:

Identify, and be familiar with the functions of, all guest systems running on your z/VM host system
- Linux guests
- Other operating system guests such as z/OS, z/VSE, z/TPF, etc
- Other CMS guests
  - IBM guests (RSCS, PVM, TCPIP, etc.)
  - Guests from ISV products

Best Practices:

Monitor guest logs as appropriate
- z/VM console logs
- Other guest-specific log data collected by the guest

Archive critical guest logs. For example, a guest event that was logged days ago might be needed to debug a guest problem that occurred today.
Install IBM Operations Manager for z/VM to save and manage guest consoles and logs.
- The VIEWCON tool allows real-time viewing of events, which makes management easier

Understanding and Using the SYSTEM CONFIG File

Preparation:

The SYSTEM CONFIG file defines your z/VM host system configuration
It resides on the PMAINT CF0 minidisk

Best Practices:

Develop a consistent process for changing it and backing it up
Have an additional backup stored in a place outside of the z/VM system that you can access in an emergency
File updates must be checked!
- Use the CPSYNTAX EXEC (on MAINT 193) to check for errors. Even an improperly coded comment could prevent your system from IPLing next time!
- If possible, have a colleague peer review your changes. Even if the syntax is right, you may have introduced other problems.
Note that changes do not take effect until the next system IPL. An error may not be discovered for months.

Software Maintenance

Preparation:

Develop a strategy to apply z/VM maintenance at least twice a year
Recommended Service Upgrades (RSUs) are made available about every 6 months during the early part of a z/VM release's life cycle. See https://www.vm.ibm.com/service/rsu/rsuplan.HTML for RSU plan.
Pay attention to HIPER and Red Alert APARs (http://www.vm.ibm.com/service/redalert/) that might not be on an RSU yet.

Best Practices:

Apply RSUs regularly and HIPER / Red Alert APARs as needed
Set up a test LPAR or second-level z/VM system where you can apply and test service before applying it to production

Communication

Preparation:

Know who your end users are and what they are doing on the system/guests
Be engaged with your users so that you can prepare for user requirements and workload changes.
Be engaged with colleagues from other companies
- Bug/process fixes ("Yeah that happened to us and here's how we fixed it")
- Common experiences, successes, and failures
- If possible, attend customer conferences (VM Workshop, SHARE, etc.)

Best Practices:

Communication tools
- Various social media
- VM Community: https://www.vm.ibm.com/techinfo/forums.html
- List Servers, especially IBMVM and LINUX-390
  - Available 24 / 7 / 365
  - Relatively low traffic, low spam, advice from experienced system programmers.
  - Great for finding friendly, helpful, lasting contacts ("birds of a feather")
- IBM Resource Link: https://www-01.ibm.com/servers/resourcelink/svc03100.nsf?OpenDatabase

Prepare for z/VM Failures

Preparation:

z/VM CP is very stable but you must be prepared in case of failure.
- CP Abend
- System or user hangs where an HMC PSW Restart must be taken
Dump space is part of spool space and is allocated at system IPL time
Dumps are processed on the OPERATNS user (default). However, as shipped, the A-disk for OPERATNS is usually too small for processing a dump. You should ensure that OPERATNS has a sufficiently large read-write disk to contain the processed dump.
Dedicated dump volume(s) are highly recommended (DUMP option on CP_Owned statement in SYSTEM CONFIG)
Failures in z/VM guests may require virtual machine dumps for debugging

Best Practices:

At a minimum, issue CP QUERY DUMP periodically to make sure you have dump space available. Better still, use automation through products like Operations Manager for z/VM to check for, and alert on, insufficient z/VM system dump space. For more information on estimating and validating dump space requirements, see the video below:
- How to estimate and monitor z/VM dump space
Set up dedicated dump volume(s) by using the DUMP option on the CP_Owned statement in SYSTEM CONFIG (IPL required)
Virtual machine dumps:
- For zLinux guests, consult your zLinux support team for instructions on getting a dump
- For other guests, support may request that you create a dump with the CP VMDUMP command, which creates a dump of the given virtual machine.
CP SNAPDUMPS:
- While a SNAPDUMP allows a z/VM system to stay up during the dump collection, it does delay processing, which may cause a network connection or heartbeat to time out. The delay is proportional to the amount of data being dumped. There may be a temporary loss or degradation of connectivity across other LPARs on the same CEC when taking a large SNAPDUMP on an LPAR that is a member of a Multi-VSwitch LAG group and contains the Active LAG Port Controller. Note that using either or both of the FRMTBL YES and PGMBKS ALL operands will likely make the dump large enough to cause a temporary disruption of connectivity. IBM recommends using the default SNAPDUMP operands unless z/VM support personnel instruct you to do otherwise in order to diagnose a specific problem.
- To prevent a temporary connectivity loss from occurring in a Multi-VSwitch LAG group environment when using SNAPDUMP with either or both PGMBKS ALL and FRMTBL YES operands, issue the QUERY VSWITCH command to determine if the LPAR contains the Active LAG Port Controller. If the LPAR does contain the Active LAG Port Controller, issue the following commands in order to preserve connectivity across LPARs on the same CEC:
  
  1) SET VSWITCH <switchname> UPLINK DISCONNECT
  2) SNAPDUMP PGMBKS ALL FRMTBL YES
  3) SET VSWITCH <switchname> UPLINK CONNECT
  
  The above commands will force the shared OSA CHPIDs to select a standby instance to become the active instance momentarily while the SNAPDUMP is issued, and then resume connectivity locally once the SNAPDUMP is completed. See the figure below for an illustration of this.
  
  Figure: Global z/VM Virtual Switch
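The ordering of the three commands above matters, so it can help to capture it in a runbook or automation script. The Python sketch below only builds the command list described above; it does not issue commands or interpret QUERY VSWITCH output. The switch name VSW1 is hypothetical, and whether the LPAR holds the Active LAG Port Controller is something you determine yourself from QUERY VSWITCH.

```python
def snapdump_sequence(switch_name, holds_active_lag_controller):
    """Return the CP commands, in order, for a SNAPDUMP that uses
    the PGMBKS ALL and FRMTBL YES operands.

    If this LPAR holds the Active LAG Port Controller, the uplink is
    disconnected before the dump and reconnected afterwards.
    """
    dump = "SNAPDUMP PGMBKS ALL FRMTBL YES"
    if not holds_active_lag_controller:
        return [dump]
    return [
        f"SET VSWITCH {switch_name} UPLINK DISCONNECT",
        dump,
        f"SET VSWITCH {switch_name} UPLINK CONNECT",
    ]


# VSW1 is a hypothetical switch name used for illustration.
for command in snapdump_sequence("VSW1", holds_active_lag_controller=True):
    print(command)
```

Keeping the disconnect and reconnect in the same list as the dump makes it harder to forget the final UPLINK CONNECT step.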

OPERATOR's Console

Preparation:

The default z/VM system operator is the user OPERATOR, although this can be changed in the configuration and can also change if OPERATOR is logged off. You can validate the current system operator by issuing the CP command QUERY SYSOPER.
By default, user OPERATOR's console is spooled to its virtual printer. That is, commands issued at the command line and all messages are captured.
The Operator's console contains a sequential record of system activity and error messages that are extremely valuable for debugging. z/VM support will ask for this often.
If you are using the IBM Operations Manager for z/VM application, you can specify the userids (perhaps in addition to OPERATOR) for which you would like to have console data collected and written to disk in real-time. For instructions on doing that, refer to Chapter 2 Step 4 of the IBM Operations Manager for z/VM Administration Guide.

Best Practices:

At a minimum, manually save and archive the operator console log on a regular basis
Preferably, use an automated program to save and archive it on a regular basis
- IBM Operations Manager for z/VM
- Other ISV products

Support Process

Preparation:

Be familiar with the support process and have the information needed for opening an IBM support case on hand. For example:
- Your IBMid and password
See https://www.ibm.com/mysupport/

Best Practices:

Since you may be asked to describe the configuration of the system and/or network, it helps to have a diagram or write-up available before you hit a problem.
Have a change management process that allows you to describe recent changes.