Last Updated: 22 February 2022
This page contains information to help customers prepare their z/VM system for optimal use. It also contains best practices information that will enable customers to properly monitor and maintain their systems on an ongoing basis.
This information is divided into logical sections:
- Security
- Disk - Auxiliary Storage Configuration and Management (FICON DASD and FCP SCSI)
- Networking
- System Backups and Disaster Recovery (DR)
- Paging Space
- Spooling Space
- System Performance
- Guest Virtual Machine Monitoring
- Understanding and Using the SYSTEM CONFIG File
- Software Maintenance
- Communication
- Prepare for z/VM Failures
- OPERATOR's Console
- Support Process
Security
Preparation:
- Implement an External Security Manager (ESM) on your system. This is the most important thing you can do for the security of your system. IBM recommends the RACF Security Server feature of z/VM. Other ESMs are available from Independent Software Vendors (ISVs).
- Be aware of any vulnerabilities that could adversely affect your system security. IBM Z does not publicly disclose whether its hardware or software is or is not affected by any particular vulnerability. We recommend that you register for the IBM Z Security Portal; all communications for IBM Z and IBM LinuxONE about potential security vulnerabilities are published there. The information from the Portal is not shared in support cases.
- For more information about IBM Z system integrity, and registration instructions for the IBM Z Security Portal, visit https://www.ibm.com/it-infrastructure/z/capabilities/system-integrity
- Once registered, access to the Portal can be found on the Resource Link site. Refer to the Communication section of this page for information on how to access Resource Link.
- Conform to all of your company's security policies for computing
- Execute consistent and common sense security practices, including
- Change default installation passwords for all components and ISV products
- Make sure passwords comply with any corporate security policies
- Audit the activities of privileged system administrators
- Use role-based access rights to simplify administration
- Use multi-factor authentication to access privileged system administrator IDs. IBM recommends IBM Z Multi-Factor Authentication
Disk - Auxiliary Storage Configuration and Management (FICON DASD and FCP SCSI)
Preparation:
- Plan allocation across logical partition(s). For example, consider standards on size and device numbers
- Avoid using duplicate volume serials (VOLSERs) unless absolutely necessary
- Use a naming convention for VOLSERs.
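The two VOLSER recommendations above can be checked mechanically against a volume inventory. The following is a minimal sketch only: the naming convention in `VOLSER_PATTERN` (two-letter site code, one-letter role, three hexadecimal digits) and the sample volume names are hypothetical and exist purely for illustration. Substitute whatever convention your installation adopts.

```python
import re
from collections import Counter

# Hypothetical convention (an assumption, not an IBM standard):
# 2-letter site code, 1-letter role (P = paging, S = spool, U = user data),
# then 3 hexadecimal digits, for a total of 6 characters.
VOLSER_PATTERN = re.compile(r"[A-Z]{2}[PSU][0-9A-F]{3}")


def check_volsers(volsers):
    """Return (duplicates, nonconforming) for a list of volume serials."""
    counts = Counter(v.upper() for v in volsers)
    # A VOLSER that appears more than once is reported as a duplicate.
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    # A VOLSER that does not match the convention is reported once.
    nonconforming = sorted(v for v in counts if not VOLSER_PATTERN.fullmatch(v))
    return duplicates, nonconforming


if __name__ == "__main__":
    dups, bad = check_volsers(["NYP001", "NYS001", "NYU0A1", "NYP001", "WORK01"])
    print("duplicates:", dups)
    print("nonconforming:", bad)
```

Running a check like this whenever the volume inventory changes helps catch a duplicate or off-convention label before the volume is attached to a system.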
Networking
Preparation:
- Implement built-in failover
- Virtual Switch (VSWITCH) and VSWITCH with Link Aggregation are preferred for their availability characteristics.
- Maintain ongoing, open communication with your network team
System Backups and Disaster Recovery (DR)
Preparation:
- Develop and periodically review your DR strategy
- Identify all resources that require backup. Exclude resources that do not require backup (Example: Paging volumes).
- Identify backup programs to be used. Examples:
- IBM Backup and Restore Manager for z/VM
- IBM Tape Manager for z/VM
- DASD Dump Restore (DDR). Part of z/VM
- ISV applications
- Make testing of DR process and backups a normal part of your systems maintenance schedule
- Special Single System Image (SSI) consideration:
- If you use backups (meaning that your DR site is sourced from a production backup from some time ago), you will need to use the CLEARPDR IPL parameter on the Stand Alone Program Loader (SAPL) panel to IPL the first DR SSI member. Many customers use this method, and IPL only one of the SSI members on their DR site.
- IMPORTANT! Test your backup and restore process and data recovery integrity
- Use SPXTAPE or an ISV equivalent to back up spool files and z/VM system data files
Paging Space
Preparation:
- Paging space is user and system pages on External Disk Storage. Paging space tends to grow over time due to factors such as increased workload. Running out of paging space will lead to an outage in the form of a PGT004 abend. Consult the CP Planning and Administration manual to help you estimate how much space you will need.
- Periodically issue CP Q ALLOC PAGE to monitor paging space utilization
- Install other software products to monitor page space and other system functions
- IBM Operations Manager for z/VM
- IBM Tivoli OMEGAMON XE
- Other ISV products
- VIR2REAL package on the z/VM download page can help monitor this.
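Because paging space utilization tends to grow over time, it can help to project when the trend will reach a level you consider unsafe. The sketch below assumes you have already recorded periodic utilization percentages (for example, transcribed from CP Q ALLOC PAGE output); it does not parse CP output itself. The function name, the sample data, and the default `threshold` value are illustrative assumptions; choose a threshold based on the guidance in the CP Planning and Administration manual.

```python
def days_until_threshold(samples, threshold=50.0):
    """Estimate days until utilization reaches `threshold` percent.

    `samples` is a list of (day_number, percent_used) pairs. A least-squares
    line is fitted through them. Returns 0.0 when the latest sample is already
    at or above the threshold, and None when the trend is flat or decreasing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(p for _, p in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((d - mean_x) * (p - mean_y) for d, p in samples) / sxx
    last_day, last_pct = max(samples)  # sample with the latest day number
    if last_pct >= threshold:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - last_day)


if __name__ == "__main__":
    # Hypothetical weekly readings: (day number, percent of paging space used).
    readings = [(0, 18.0), (7, 21.0), (14, 23.5), (21, 27.0)]
    print(days_until_threshold(readings))
```

A straight-line fit is a deliberately simple choice. It will not anticipate periodic workload spikes, so treat the result as an early warning rather than a forecast.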
Spooling Space
Preparation:
- Spooling space is required for normal system operation. Functions that require spool space include dumps, console files, NSS files, etc. Spooling space utilization tends to grow over time and must be monitored.
- Periodically issue CP Q ALLOC SPOOL to monitor spooling space utilization
- Use the CMS utility SFPURGER
- Utilize z/VM download packages such as SPOOLPIG, which can be found at https://www.vm.ibm.com/download/packages/
- Install other software products to monitor spool space and other system functions
- IBM Operations Manager for z/VM
- Other ISV products
System Performance
Preparation:
- If you open a z/VM case for a performance-related problem, you will very likely be asked to send in MONWRITE data, both from a period in which your system/guest was performing well and from a period of poor performance. As a result, you should:
- Set up collection of MONWRITE data: https://www.vm.ibm.com/devpages/bkw/monsimp.html
- If you want automation so that z/VM continues to collect data and save the most recent files, see the MONCLEAN package: https://www.vm.ibm.com/download/packages/ for one example of how to do that.
- Be aware of and prepared for periodic workload spikes (for example, weekly or quarterly processing).
- Install software products to monitor performance:
- IBM Performance Toolkit
- IBM Tivoli OMEGAMON XE on z/VM and Linux
- Other ISV products
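The idea of continuously collecting MONWRITE data while keeping only the most recent files, as mentioned earlier in this section, comes down to a simple retention rule. The sketch below illustrates that rule only; it is not how the MONCLEAN package works, and the function name, file names, and `pinned` parameter are hypothetical. The `pinned` set reflects the point above that support may ask for data from a period of good performance, so a known-good baseline file should never be purged.

```python
def files_to_purge(files, keep=7, pinned=()):
    """Return names of files that fall outside the `keep` most recent.

    `files` is a list of (name, timestamp) pairs; any sortable timestamp works.
    Names listed in `pinned` (for example, a known-good baseline) are never
    returned and do not count toward `keep`.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    pinned = set(pinned)
    candidates = [f for f in files if f[0] not in pinned]
    newest_first = sorted(candidates, key=lambda f: f[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


if __name__ == "__main__":
    # Hypothetical file names with day numbers as timestamps.
    collected = [("D0101", 1), ("D0102", 2), ("D0103", 3), ("D0104", 4)]
    print(files_to_purge(collected, keep=2, pinned=["D0101"]))
```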
Guest Virtual Machine Monitoring
Preparation:
- Identify, and be familiar with the functions of, all guest systems running on your z/VM host system
- Linux guests
- Other operating system guests such as z/OS, z/VSE, z/TPF, etc
- Other CMS guests
- IBM guests (RSCS, PVM, TCPIP, etc.)
- Guests from ISV products
- Monitor guest logs as appropriate
- z/VM console logs
- Other guest-specific log data collected by the guest
- Archive critical guest logs. For example, a guest event that was logged days ago might be needed to debug a guest problem that occurred today.
- Install IBM Operations Manager for z/VM to save and manage guest consoles and logs.
- The VIEWCON tool allows real-time viewing of events, which makes management easier
Understanding and Using the SYSTEM CONFIG File
Preparation:
- The SYSTEM CONFIG file defines your z/VM host system configuration
- It resides on the PMAINT CF0 minidisk
- Develop a consistent process for changing it and backing it up
- Have an additional backup stored in a place outside of the z/VM system that you can access in an emergency
- File updates must be checked!
- Use the CPSYNTAX EXEC (on MAINT 193) to check for errors. Even an improperly coded comment could prevent your system from IPLing next time!
- If possible, have a colleague peer review your changes. Even if the syntax is right, you may have injected other problems
- Note that changes do not take effect until the next system IPL. An error may not be discovered for months.
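Since a badly formed comment can keep a system from IPLing, a quick pre-check of comment delimiters on a copy of the file can be a useful extra step before peer review. The sketch below is a rough heuristic, not a replacement for the CPSYNTAX EXEC: it assumes comments are delimited by `/*` and `*/`, treats them as non-nesting, and ignores quoted strings. The function name and the sample configuration text are illustrative.

```python
def find_comment_errors(text):
    """Return a list of (line_number, message) for unbalanced /* */ comments.

    Assumes comments do not nest and ignores quoting rules, so treat any
    finding as a prompt to look closer rather than a verdict.
    """
    errors = []
    open_line = None  # line on which the currently open comment started
    for lineno, line in enumerate(text.splitlines(), start=1):
        i = 0
        while i < len(line) - 1:
            pair = line[i:i + 2]
            if open_line is None and pair == "/*":
                open_line = lineno
                i += 2
            elif open_line is not None and pair == "*/":
                open_line = None
                i += 2
            elif open_line is None and pair == "*/":
                errors.append((lineno, "comment close without open"))
                i += 2
            else:
                i += 1
    if open_line is not None:
        errors.append((open_line, "comment opened but never closed"))
    return errors


if __name__ == "__main__":
    # Hypothetical excerpt with an unterminated comment on line 1.
    sample = "/* System identifier\nSystem_Identifier_Default ZVMTEST\n"
    for lineno, message in find_comment_errors(sample):
        print(f"line {lineno}: {message}")
```

A clean result from this check says nothing about statement syntax, so CPSYNTAX remains the authoritative check.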
Software Maintenance
Preparation:
- Develop a strategy to apply z/VM maintenance at least twice a year
- Recommended Service Upgrades (RSUs) are made available about every 6 months during the early part of a z/VM release's life cycle. See https://www.vm.ibm.com/service/rsu/rsuplan.HTML for RSU plan.
- Pay attention to HIPER and Red Alert APARs (http://www.vm.ibm.com/service/redalert/) that might not be on an RSU yet.
- Apply RSUs regularly and HIPER / Red Alert APARs as needed
- Set up a test LPAR or second-level z/VM system on which to apply service, so that you can test the service before applying it to production
Communication
Preparation:
- Know who your end users are and what they are doing on the system/guests
- Be engaged with your users so that you can prepare for user requirements and workload changes.
- Be engaged with colleagues from other companies
- Bug/process fixes ("Yeah that happened to us and here's how we fixed it")
- Common experiences, successes, and failures
- If possible, attend customer conferences (VM Workshop, SHARE, etc.)
- Communication tools
- Various social media
- VM Community: https://www.vm.ibm.com/techinfo/forums.html
- List Servers, especially IBMVM and LINUX-390
- Available 24 / 7 / 365
- Relatively low traffic, low spam, advice from experienced system programmers.
- Great for finding friendly, helpful, lasting contacts ("birds of a feather")
- IBM Resource Link: https://www-01.ibm.com/servers/resourcelink/svc03100.nsf?OpenDatabase
Prepare for z/VM Failures
Preparation:
- z/VM CP is very stable, but you must be prepared in case of failure.
- CP Abend
- System or user hangs where an HMC PSW Restart must be taken
- Dump space is part of spool space and is allocated at system IPL time
- Dumps are processed on the OPERATNS user (default). However, as shipped the A-disk for OPERATNS is usually too small for processing a dump. You should ensure that OPERATNS has a sufficiently large read-write disk to contain the processed dump.
- Dedicated dump volume(s) are highly recommended (DUMP option on CP_Owned statement in SYSTEM CONFIG)
- Failures in z/VM guests may require virtual machine dumps for debugging
- At a minimum, issue CP QUERY DUMP periodically to make sure you have dump space available. Better still, use automation through products like Operations Manager for z/VM to check and alert if there is insufficient z/VM system dump space. For more information on estimating and validating dump space requirements, see the video below:
- Set up dedicated dump volume(s) by using the DUMP option on the CP_Owned statement in SYSTEM CONFIG (IPL required)
- Virtual machine dumps:
- For zLinux guests, consult your zLinux support team for instructions on getting a dump
- For other guests, support may request that you create a dump with the CP VMDUMP command, which creates a dump of the given virtual machine.
- CP SNAPDUMPS:
- While a SNAPDUMP allows a z/VM system to stay up during dump collection, it does delay processing, which may cause a network connection or heartbeat to time out. The delay is proportional to the amount of data being dumped. There may be a temporary loss or degradation of connectivity across other LPARs on the same CEC when taking a large SNAPDUMP on an LPAR that is a member of a Multi-VSwitch LAG group and contains the Active LAG Port Controller. Note that using either or both of the FRMTBL YES and PGMBKS ALL operands will likely make the dump large enough to cause a temporary disruption of connectivity. IBM recommends using the default SNAPDUMP operands unless z/VM support personnel instruct you to do otherwise in order to diagnose a specific problem.
- To prevent a temporary connectivity loss from occurring in a Multi-VSwitch LAG group environment when using SNAPDUMP with either or both PGMBKS ALL and FRMTBL YES operands, issue the QUERY VSWITCH command to determine if the LPAR contains the Active LAG Port Controller. If the LPAR does contain the Active LAG Port Controller, issue the following commands in order to preserve connectivity across LPARs on the same CEC:
1) SET VSWITCH <switchname> UPLINK DISCONNECT
2) SNAPDUMP PGMBKS ALL FRMTBL YES
3) SET VSWITCH <switchname> UPLINK CONNECT
The above commands will force the shared OSA CHPIDs to select a standby instance to become the active instance momentarily while the SNAPDUMP is issued, and then resume connectivity locally once the SNAPDUMP is completed. See the figure below for an illustration of this.
Global z/VM Virtual Switch
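The decision described above (disconnect the uplink only when a large SNAPDUMP is taken on the LPAR holding the Active LAG Port Controller, and always reconnect afterward) can be written down as a small helper that builds the command list for an operator or automation product to issue. This is a sketch under stated assumptions: the function name and the switch name `VSW1` are hypothetical, the caller is assumed to have already determined from QUERY VSWITCH whether the LPAR contains the Active LAG Port Controller, and the helper only returns command strings rather than issuing them.

```python
def snapdump_commands(switch_name, has_active_lag_controller, large_dump=False):
    """Build the ordered CP commands for taking a SNAPDUMP.

    `large_dump` selects the PGMBKS ALL and FRMTBL YES operands, which should
    be used only when z/VM support asks for them. The uplink is disconnected
    and reconnected around the dump only when both conditions hold.
    """
    dump = "SNAPDUMP PGMBKS ALL FRMTBL YES" if large_dump else "SNAPDUMP"
    if large_dump and has_active_lag_controller:
        return [
            f"SET VSWITCH {switch_name} UPLINK DISCONNECT",
            dump,
            f"SET VSWITCH {switch_name} UPLINK CONNECT",
        ]
    return [dump]


if __name__ == "__main__":
    for command in snapdump_commands("VSW1", True, large_dump=True):
        print(command)
```

Keeping the three commands together in one list makes it harder to forget the final UPLINK CONNECT step after the dump completes.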
OPERATOR's Console
Preparation:
- The default z/VM system operator is the user OPERATOR, although this can change through the configuration or when OPERATOR is logged off. You can validate the current system operator by issuing the CP command QUERY SYSOPER.
- By default, user OPERATOR's console is spooled to their virtual printer. That is, commands issued at command line and all messages will be captured.
- The Operator's console contains a sequential record of system activity and error messages that are extremely valuable for debugging. z/VM support will ask for this often.
- If you are using the IBM Operations Manager for z/VM application, you can specify the userids (perhaps in addition to OPERATOR) for which you would like to have console data collected and written to disk in real-time. For instructions on doing that, refer to Chapter 2 Step 4 of the IBM Operations Manager for z/VM Administration Guide.
- At a minimum, manually save and archive the console log on a regular basis
- Use an automated program to save and archive it on a regular basis
- IBM Operations Manager for z/VM
- Other ISV products
Support Process
Preparation:
- Be familiar with the support process and have the appropriate information needed for opening an IBM support case. For example:
- Your IBMid and password
- See https://www.ibm.com/mysupport/
- Since you may be asked to describe the configuration of the system and/or network, it helps to have a diagram or write-up of it available before you hit a problem.
- Have a change management process that allows you to describe recent changes.