Last Updated: 22 February 2022


This page contains information to help customers prepare their z/VM system for optimal use. It also contains best practices that help customers monitor and maintain their systems on an ongoing basis.

This information is divided into logical sections:


Security

Preparation:
  • Implement an External Security Manager (ESM) on your system. This is the most important thing you can do for the security of your system. IBM recommends the RACF Security Server feature of z/VM. Other ESMs are available from Independent Software Vendors (ISVs).
  • Be aware of any vulnerabilities that could adversely affect your system security. IBM Z does not publicly disclose whether its hardware or software is affected by any particular vulnerability. We recommend that you register for the IBM Z Security Portal; all communications for IBM Z and IBM LinuxONE about potential security vulnerabilities are published there. Information from the Portal is not shared in support cases.
Best Practices:
  • Conform to all of your company's security policies for computing
  • Follow consistent, common-sense security practices, including:
    • Change default installation passwords for all components and ISV products
    • Make sure passwords comply with any corporate security policies
    • Audit the activities of privileged system administrators
    • Use role-based access rights to simplify administration
  • Use multi-factor authentication to access privileged system administrator IDs. IBM recommends IBM Z Multi-Factor Authentication

 

Disk - Auxiliary Storage Configuration and Management (FICON DASD and FCP SCSI)

Preparation:
  • Plan disk allocation across logical partitions (LPARs). For example, establish standards for volume sizes and device numbers.
Best Practices:
  • Avoid using duplicate volume serials (VOLSERs) unless absolutely necessary
  • Use a naming convention for VOLSERs.
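One way to apply both practices above is to keep an inventory of your six-character VOLSERs and check it automatically. The Python sketch below assumes a hypothetical convention (two-character system ID, two-character usage code, two-digit sequence number) and a plain list of labels that you gather yourself; neither the convention nor the function name comes from z/VM.

```python
import re
from collections import Counter

# Hypothetical convention: 2-character system ID, 2-character usage code,
# 2-digit sequence number. The usage codes are examples only:
# RS = system residence, PG = paging, SP = spool, DM = dump, US = user minidisks.
VOLSER_RULE = re.compile(r"^[A-Z0-9]{2}(RS|PG|SP|DM|US)[0-9]{2}$")

def check_volsers(volsers):
    """Return (labels that break the convention, labels used more than once)."""
    labels = [v.strip().upper() for v in volsers]
    nonconforming = [v for v in labels if not VOLSER_RULE.match(v)]
    duplicates = sorted(v for v, n in Counter(labels).items() if n > 1)
    return nonconforming, duplicates

# Example: one label breaks the convention and one is duplicated.
bad, dups = check_volsers(["VMRS01", "VMPG01", "VMPG01", "scratch"])
print("Nonconforming:", bad)
print("Duplicates:", dups)
```

Running a check like this before adding volumes to SYSTEM CONFIG catches an unintended duplicate VOLSER while it is still easy to fix.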

 

Networking

Preparation:
  • Implement built-in failover
  • Virtual Switch (VSWITCH), and VSWITCH with Link Aggregation, are preferred for their availability characteristics.
Best Practices:
  • Maintain ongoing, open communication with your network team

 

System Backups and Disaster Recovery (DR)

Preparation:
  • Develop and periodically review your DR strategy
  • Identify all resources that require backup. Exclude resources that do not require backup (Example: Paging volumes).
  • Identify backup programs to be used. Examples:
    • IBM Backup and Restore Manager for z/VM
    • IBM Tape Manager for z/VM
    • DASD Dump Restore (DDR), part of z/VM
    • ISV applications
  • Make testing of DR process and backups a normal part of your systems maintenance schedule
  • Special Single System Image (SSI) consideration:
    • If you use backups (meaning that your DR site is sourced from a production backup from some time ago), you will need to use the CLEARPDR IPL parameter on the Stand Alone Program Loader (SAPL) panel to IPL the first DR SSI member. Many customers use this method, and IPL only one of the SSI members on their DR site.
Best Practices:
  • IMPORTANT! Test your backup and restore process and data recovery integrity
  • Use SPXTAPE or an ISV equivalent to back up spool files and z/VM system data files
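Testing data recovery integrity means proving that restored data matches what was backed up. A minimal sketch, assuming you can stage copies of the original and the restored files as ordinary files on a workstation (for example, after transferring them from a test restore): it compares SHA-256 checksums file by file. The directory layout and function names are hypothetical and are not part of any z/VM backup product.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(original_dir, restored_dir):
    """Compare every file under original_dir with its counterpart under restored_dir.

    Returns a dict mapping relative path -> "missing" or "mismatch".
    An empty dict means every original file was restored intact.
    """
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = {}
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems[str(rel)] = "missing"
        elif sha256_of(src) != sha256_of(dst):
            problems[str(rel)] = "mismatch"
    return problems
```

Recording the result of each test restore alongside the date gives you evidence that the backup process actually works, not just that it ran.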

 

Paging Space

Preparation:
  • Paging space is the area on external disk storage that holds user and system pages. Paging space requirements tend to grow over time due to factors such as increased workload. Running out of paging space causes an outage in the form of a PGT004 abend. Consult the CP Planning and Administration manual to estimate how much space you will need.
Best Practices:
  • Periodically issue CP Q ALLOC PAGE to monitor paging space utilization
  • Install other software products to monitor paging space and other system functions
    • IBM Operations Manager for z/VM
    • IBM Tivoli OMEGAMON XE
    • Other ISV products
    • The VIR2REAL package, available on the z/VM download page, can also help monitor this.
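The page counts reported by CP Q ALLOC PAGE can feed a simple threshold check. The Python sketch below takes total and in-use page counts that you read from the command output yourself (it does not parse the output) and classifies utilization. The 50% warning and 80% critical thresholds, and the function names, are assumptions for illustration; set thresholds from your own capacity planning.

```python
import math

def paging_status(total_pages, in_use_pages, warn_pct=50.0, critical_pct=80.0):
    """Classify paging space utilization. Thresholds are hypothetical defaults."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    pct = 100.0 * in_use_pages / total_pages
    if pct >= critical_pct:
        level = "CRITICAL"
    elif pct >= warn_pct:
        level = "WARNING"
    else:
        level = "OK"
    return round(pct, 1), level

def extra_pages_needed(total_pages, in_use_pages, target_pct=50.0):
    """Pages of additional paging space needed to bring utilization down to target_pct."""
    required_total = math.ceil(in_use_pages * 100 / target_pct)
    return max(0, required_total - total_pages)

# Example with made-up numbers: 600 of 1000 pages in use.
print(paging_status(1000, 600))
print(extra_pages_needed(1000, 600))
```

A scheduled job that raises an alert on WARNING gives you time to add paging volumes well before a PGT004 abend becomes possible.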

 

Spooling Space

Preparation:
  • Spooling space is required for normal system operation. Functions that require spool space include dumps, console files, NSS files, etc. Spooling space utilization tends to grow over time and must be monitored.
Best Practices:
  • Periodically issue CP Q ALLOC SPOOL to monitor spooling space utilization
  • Use the CMS SFPURGER utility to purge unneeded spool files
  • Use z/VM download packages such as SPOOLPIG, available at https://www.vm.ibm.com/download/packages/
  • Install other software products to monitor spool space and other system functions
    • IBM Operations Manager for z/VM
    • Other ISV products
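Because spooling space utilization tends to grow, a trend is often more useful than a single reading. This sketch fits a least-squares line to hypothetical daily utilization samples (for example, the percentage reported by CP Q ALLOC SPOOL, recorded once a day) and estimates how many days remain before the spool fills. A straight-line projection is an assumption; bursty activity such as a large dump will not follow it.

```python
def days_until_full(samples, full_pct=100.0):
    """Estimate days remaining until spool utilization reaches full_pct.

    samples: list of (day_number, percent_used) pairs.
    Returns days counted from the last sample, or None if usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(p for _, p in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must come from more than one day")
    # Ordinary least-squares slope and intercept.
    slope = sum((d - mean_x) * (p - mean_y) for d, p in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(d for d, _ in samples)
    day_full = (full_pct - intercept) / slope
    return max(0.0, day_full - last_day)

# Example: 40%, 45%, 50% used on days 0, 10, 20 (made-up values).
print(days_until_full([(0, 40.0), (10, 45.0), (20, 50.0)]))
```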

 

System Performance

Preparation:
  • If you open a z/VM case for a performance-related problem, you will very likely be asked to send MONWRITE data from both a period in which your system or guest performed well and a period of poor performance. As a result, you should routinely collect MONWRITE data and keep a baseline from a period of good performance.
  • Be aware of and prepared for periodic workload spikes (for example, weekly or quarterly processing).
Best Practices:
  • Install software products to monitor performance:
    • IBM Performance Toolkit
    • IBM Tivoli OMEGAMON XE on z/VM and Linux
    • Other ISV products

 

Guest Virtual Machine Monitoring

Preparation:
  • Identify, and be familiar with the functions of, all guest systems running on your z/VM host system
    • Linux guests
    • Other operating system guests such as z/OS, z/VSE, z/TPF, etc.
    • Other CMS guests
      • IBM guests (RSCS, PVM, TCPIP, etc.)
      • Guests from ISV products
Best Practices:
  • Monitor guest logs as appropriate
    • z/VM console logs
    • Other guest-specific log data collected by the guest
  • Archive critical guest logs. For example, a guest event that was logged days ago might be needed to debug a guest problem that occurred today.
  • Install IBM Operations Manager for z/VM to save and manage guest consoles and logs.
    • The VIEWCON tool allows real-time viewing of console events, which makes management easier

 

Understanding and Using the SYSTEM CONFIG File

Preparation:
  • Defines your z/VM host system configuration
  • Resides on PMAINT CF0 minidisk
Best Practices:
  • Develop a consistent process for changing it and backing it up
  • Have an additional backup stored in a place outside of the z/VM system that you can access in an emergency
  • File updates must be checked!
    • Use the CPSYNTAX EXEC (on MAINT 193) to check for errors. Even an improperly coded comment could prevent your system from IPLing next time!
    • If possible, have a colleague peer review your changes. Even if the syntax is correct, a change may introduce other problems
  • Changes do not take effect until the next system IPL, so an error may go undiscovered for months.
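Part of a consistent change process can be automated. This sketch assumes you keep copies of SYSTEM CONFIG as text files outside the z/VM system (how you transfer them is up to you); it saves a timestamped backup and produces a unified diff that a colleague can review. It complements, and does not replace, the CPSYNTAX check. The backup file naming and function names are hypothetical.

```python
import difflib
from datetime import datetime
from pathlib import Path

def save_backup(config_text, backup_dir, now=None):
    """Write a timestamped copy of SYSTEM CONFIG text to backup_dir (hypothetical naming)."""
    now = now or datetime.now()
    path = Path(backup_dir) / f"SYSTEM-CONFIG-{now:%Y%m%d-%H%M%S}.txt"
    path.write_text(config_text)
    return path

def _lines(text):
    # Normalize line endings so the last line diffs cleanly even without a trailing newline.
    return [line + "\n" for line in text.splitlines()]

def review_diff(current_text, proposed_text):
    """Return a unified diff of the proposed change for peer review ('' if identical)."""
    diff = difflib.unified_diff(
        _lines(current_text),
        _lines(proposed_text),
        fromfile="SYSTEM CONFIG (current)",
        tofile="SYSTEM CONFIG (proposed)",
    )
    return "".join(diff)
```

Keeping every saved backup, rather than overwriting a single copy, means you can return to a known-good file even if a bad change is only discovered at the next IPL.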

 

Software Maintenance

Preparation:
Best Practices:
  • Apply Recommended Service Upgrades (RSUs) regularly, and HIPER and Red Alert APARs as needed
  • Set up a test LPAR or second-level z/VM system where you can apply and test service before applying it to production

 

Communication

Preparation:
  • Know who your end users are and what they are doing on the system/guests
  • Be engaged with your users so that you can prepare for user requirements and workload changes.
  • Be engaged with colleagues from other companies
    • Bug/process fixes ("Yeah that happened to us and here's how we fixed it")
    • Common experiences, successes, and failures
    • If possible, attend customer conferences (VM Workshop, SHARE, etc.)
Best Practices:

 

Prepare for z/VM Failures

Preparation:
  • z/VM CP is very stable, but you must be prepared for failures such as:
    • CP Abend
    • System or user hangs where an HMC PSW Restart must be taken
  • Dump space is part of spool space and is allocated at system IPL time
  • By default, dumps are processed on the OPERATNS user ID. However, as shipped, the OPERATNS A-disk is usually too small for processing a dump, so make sure OPERATNS has a read-write disk large enough to contain the processed dump.
  • Dedicated dump volume(s) are highly recommended (DUMP option on CP_Owned statement in SYSTEM CONFIG)
  • Failures in z/VM guests may require virtual machine dumps for debugging
Best Practices:
  • At a minimum, issue CP QUERY DUMP periodically to make sure you have dump space available. Better still, use automation through products such as Operations Manager for z/VM to check for, and alert on, insufficient z/VM system dump space. For more information on estimating and validating dump space requirements, see the video below.
  • Set up dedicated dump volume(s) by using the DUMP option on the CP_Owned statement in SYSTEM CONFIG (IPL required)
  • Virtual machine dumps:
    • For zLinux guests, consult your zLinux support team for instructions on getting a dump
    • For other guests, support may request that you create a dump with the CP VMDUMP command, which creates a dump of the given virtual machine.
  • CP SNAPDUMPS:
    • While a SNAPDUMP allows a z/VM system to stay up during dump collection, it does delay processing, which may cause a network connection or heartbeat to time out. The delay is proportional to the amount of data being dumped. Connectivity across other LPARs on the same CEC may be temporarily lost or degraded when a large SNAPDUMP is taken on an LPAR that is a member of a Multi-VSwitch LAG group and contains the Active LAG Port Controller. Using either or both of the FRMTBL YES and PGMBKS ALL operands will likely make the dump large enough to cause a temporary disruption of connectivity. IBM recommends using the default SNAPDUMP operands unless z/VM support personnel instruct you otherwise to diagnose a specific problem.
    • To prevent a temporary connectivity loss in a Multi-VSwitch LAG group environment when using SNAPDUMP with either or both of the PGMBKS ALL and FRMTBL YES operands, issue the QUERY VSWITCH command to determine whether the LPAR contains the Active LAG Port Controller. If it does, issue the following commands, in the order shown, to preserve connectivity across LPARs on the same CEC:
       
      1) SET VSWITCH <switchname> UPLINK DISCONNECT
      2) SNAPDUMP PGMBKS ALL FRMTBL YES
      3) SET VSWITCH <switchname> UPLINK CONNECT
       
      These commands force the shared OSA CHPIDs to momentarily select a standby instance as the active instance while the SNAPDUMP is taken, and then resume connectivity locally once the SNAPDUMP completes. See the figure below for an illustration of this.
       
      Figure: Global z/VM Virtual Switch traffic
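The SNAPDUMP decision described in this section can be captured in a small helper, which may be useful when writing a runbook or automation script. This sketch only assembles the command strings given in the procedure; it assumes you have already determined, with QUERY VSWITCH, whether the LPAR holds the Active LAG Port Controller. The function name and the switch name VSWLAG1 are hypothetical.

```python
def snapdump_plan(switch_name, holds_active_lag_port_controller, extended_operands=False):
    """Return the ordered CP commands for a SNAPDUMP.

    extended_operands=True means PGMBKS ALL and FRMTBL YES were requested
    (normally only at the direction of z/VM support). The uplink is disconnected
    around the dump only when those operands are used on the LPAR that holds
    the Active LAG Port Controller.
    """
    snapdump = "SNAPDUMP PGMBKS ALL FRMTBL YES" if extended_operands else "SNAPDUMP"
    if extended_operands and holds_active_lag_port_controller:
        return [
            f"SET VSWITCH {switch_name} UPLINK DISCONNECT",
            snapdump,
            f"SET VSWITCH {switch_name} UPLINK CONNECT",
        ]
    # Default operands, or this LPAR is not the Active LAG Port Controller.
    return [snapdump]

# Example: support requested the extended operands on the controlling LPAR.
for command in snapdump_plan("VSWLAG1", True, extended_operands=True):
    print(command)
```

Generating the sequence from one function helps ensure the UPLINK CONNECT step is never forgotten after the dump.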

 

OPERATOR's Console

Preparation:
  • The default z/VM system operator is the user OPERATOR. This can change, however, either through the configuration or when OPERATOR is logged off. You can check the current system operator by issuing the CP command QUERY SYSOPER.
  • By default, the OPERATOR user's console is spooled to its virtual printer, so all commands issued at the command line and all messages are captured.
  • The operator's console contains a sequential record of system activity and error messages that is extremely valuable for debugging. z/VM support often asks for it.
  • If you are using the IBM Operations Manager for z/VM application, you can specify the userids (perhaps in addition to OPERATOR) for which you would like to have console data collected and written to disk in real-time. For instructions on doing that, refer to Chapter 2 Step 4 of the IBM Operations Manager for z/VM Administration Guide.
Best Practices:
  • At a minimum, manually save and archive on a regular basis
  • Use an automated program to save and archive on a regular basis
    • IBM Operations Manager for z/VM
    • Other ISV products
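When saving and archiving console files on a schedule, a predictable naming scheme makes it easy to find an older log and to decide when to move it to long-term storage. The sketch below assumes a hypothetical file name format (user ID, date, and a .CONSOLE suffix) for console files already copied off the system, and lists the files older than a retention window. It does not delete anything; the names and the scheme are illustrative only.

```python
from datetime import date, datetime

def archive_name(day, userid="OPERATOR"):
    """Build an archive file name such as OPERATOR-20220222.CONSOLE (hypothetical scheme)."""
    return f"{userid}-{day:%Y%m%d}.CONSOLE"

def older_than(file_names, today, keep_days):
    """Return archived console files dated more than keep_days before today.

    Names that do not follow the scheme are skipped, never selected.
    """
    selected = []
    for name in file_names:
        stem = name.rsplit(".", 1)[0]   # drop the .CONSOLE suffix
        parts = stem.rsplit("-", 1)     # split off the date stamp
        if len(parts) != 2:
            continue
        try:
            stamp = datetime.strptime(parts[1], "%Y%m%d").date()
        except ValueError:
            continue
        if (today - stamp).days > keep_days:
            selected.append(name)
    return sorted(selected)

# Example: find files to move to long-term storage after 30 days.
files = ["OPERATOR-20220101.CONSOLE", "OPERATOR-20220220.CONSOLE", "README"]
print(older_than(files, date(2022, 2, 22), keep_days=30))
```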

 

Support Process

Preparation:
  • Be familiar with the support process, and have the information needed to open an IBM support case ready. For example:
    • Your IBMid and password
  • See https://www.ibm.com/mysupport/
Best Practices:
  • Because you may be asked to describe the configuration of your system and/or network, it helps to have a diagram or write-up of it available before you hit a problem.
  • Have a change management process that allows you to describe recent changes.