Last Updated: 9 February 2022
This section contains details of "must gather" items that are referenced elsewhere on this site. Information here includes where to find these items, and how to prepare them to be sent to IBM.
This information is divided into logical sections:
Detailed Problem Description
One of the most important things that can be provided to IBM Support is a detailed problem description of the incident that is being reported.
This description must contain:
- A description of the failure, which could include, for example:
- Time and date of the failure
- Abend code
- Error messages observed
- System behavior, including hung users, poor performance, etc.
- Circumstances surrounding the failure, which could be:
- Changes to the system that occurred before the incident, including software, hardware, increased workload, etc. Recall that some changes to z/VM, and guests, may take effect at a different time than when they are made. For example, changes to the z/VM System Configuration file do not take effect until the next z/VM IPL.
- Anything unusual observed before or after the incident
- What, if anything, has been done to mitigate the incident
- Has your system recovered? What is the current status of the system?
- Additional background information which could be helpful:
- Naming conventions of virtual machines and/or disks (e.g. LDB2xxxx = DB2 Linux guests)
- Primary workloads involved (e.g. WebSphere and Oracle)
- Diagrams of the network topology, if the problem is network related
- Timezones used if sending data from different sources (e.g. z/VM data at UTC and Linux data at EST)
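When data comes from sources that use different timezones, it can help to line the timestamps up on a single clock before describing the sequence of events. The following is a minimal sketch of that idea, not an IBM-provided tool. The timestamp format, the sample events, and the fixed UTC-5 offset used for EST are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so the sketch does not depend on a time zone database.
UTC = timezone.utc
# Assumption: EST is treated as a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")


def to_utc(local_time: str, source_tz: timezone) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' timestamp from source_tz to UTC."""
    parsed = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S")
    aware = parsed.replace(tzinfo=source_tz)
    return aware.astimezone(UTC).strftime("%Y-%m-%d %H:%M:%S UTC")


# Hypothetical events: one from a z/VM console (UTC), one from a Linux log (EST).
zvm_event = to_utc("2022-02-09 15:02:10", UTC)
linux_event = to_utc("2022-02-09 10:02:07", EST)
print("z/VM :", zvm_event)
print("Linux:", linux_event)
```

Once both events are on the same clock, it becomes apparent that the Linux message in this made-up example preceded the z/VM message by three seconds, which is the kind of detail worth stating in the problem description.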
Control Program (CP) Dumps
These dumps are created in multiple ways:
- After a z/VM abend (meaning that your z/VM encountered an unplanned outage). These are created automatically by z/VM.
- As a result of a CP SNAPDUMP command (usually at the request of IBM z/VM support)
- As a result of a PSW RESTART operation on the Hardware Management Console (HMC), again usually at the request of IBM z/VM support.
By default, CP dumps are placed in the reader of user OPERATNS. (The userid that receives dumps is controlled by the SYSTEM_USERIDS statement in SYSTEM CONFIG. If you have specified another userid on that statement, then you will need to process the dump from that user's reader.)
To process the dump spool file, use the DUMPLD2 utility, which creates CMS file(s) that can be used for debugging. DUMPLD2 is the replacement for the DUMPLOAD utility. DUMPLOAD is still supported and the two utilities are similar, but IBM encourages the use of the newer DUMPLD2 because it enables you to create dumps composed of multiple files. The resulting CMS files are smaller, which makes transmission of the files to IBM less error-prone. Both utilities are documented in the CP Command and Utilities manual. You can also type HELP DUMPLD2 on your CMS command line for help information.
Some tips for this process:
- In the Preparation section, we discussed increasing the disk for OPERATNS. If you did not do that, then you will need to do it now or find a user id or disk that is large enough. By default, user OPERATNS does not have a large enough R/W minidisk defined to hold most large dump files produced by DUMPLD2. You can either define a new, large R/W OPERATNS minidisk, or use the CP TRANSFER command to transfer the dump spool file in OPERATNS reader to the reader of another userid (perhaps MAINT, for example) that already has minidisk space capable of holding the dump files. The DUMPLD2 utility could then be executed by the other userid.
- If DUMPLD2 fails for any reason (most commonly because there was not enough minidisk space available), note that the dump spool file will still exist but it will have a USER hold on it. This will prevent DUMPLD2 from finding the dump spool file if you execute it again after obtaining more minidisk space. You will get a message that a dump cannot be found. To fix this, use the CP CHANGE command to remove the USER hold on that file (see the CP Command and Utilities manual for details). After you do that, you will be able to run DUMPLD2 again.
- DUMPLD2 typically ends with an error message of "HCPDLD8247E HCQ720 MODULE was not found". You can ignore this message and send the dump to IBM as normal.
- In the event that you are asked to take a SNAPDUMP with either or both of the FRMTBL YES and PGMBKS ALL operands on an LPAR that is a member of a Multi-VSwitch LAG group, you may experience a temporary loss or degradation in connectivity across the members of that group while the SNAPDUMP is being taken. See https://www.vm.ibm.com/support/prep.html#failures for more details on SNAPDUMP and how to mitigate this exposure.
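The recovery steps described in the tips above (removing the USER hold, and optionally moving the dump to a userid with more minidisk space) can be sketched as a small helper that assembles the CP commands to type. This is only an illustration: the function name is hypothetical, and the operand forms shown (CHANGE RDR spoolid NOHOLD and TRANSFER RDR spoolid TO userid) are assumptions that should be confirmed against the CP Command and Utilities manual for your z/VM release before use.

```python
from typing import List, Optional


def dump_retry_commands(spool_id: int, transfer_to: Optional[str] = None) -> List[str]:
    """Build the CP commands to issue before rerunning DUMPLD2.

    spool_id    -- spool file number of the dump, as shown by QUERY RDR ALL
    transfer_to -- optional userid with enough minidisk space (for example MAINT)
    """
    if not 1 <= spool_id <= 9999:
        raise ValueError("spool file numbers are assumed to be 1 through 9999")
    sid = f"{spool_id:04d}"

    # Assumed syntax: remove the USER hold left behind by a failed DUMPLD2 run.
    commands = [f"CP CHANGE RDR {sid} NOHOLD"]

    if transfer_to:
        # Assumed syntax: move the dump to another userid's reader.
        commands.append(f"CP TRANSFER RDR {sid} TO {transfer_to.upper()}")
    return commands


# Hypothetical spool file 42, moved to MAINT because OPERATNS lacks disk space.
for command in dump_retry_commands(42, transfer_to="maint"):
    print(command)
```

After these commands have been issued, DUMPLD2 would be run again, either by OPERATNS or by the userid that received the transferred spool file.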
Virtual Machine Dumps
When a guest virtual machine encounters a failure, support may request that you provide a dump of the guest's virtual storage (memory).
- For any CMS guest, issue #CP VMDUMP 0-END FORMAT CMS DCSS to create the VMDUMP.
- For TCP/IP related service virtual machines, the requested virtual machine dump should be routed to the TCPMAINT user ID by adding the "TO" option (for example, VMDUMP 0-END FORMAT CMS DCSS TO TCPMAINT). For more details on TCP/IP virtual machine dumps, please see https://www.vm.ibm.com/related/tcpip/vmdump.html.
- For Linux guests, additional guidance for collecting dumps across multiple distributions and releases can be found here: https://www.ibm.com/docs/en/linux-on-systems?topic=troubleshooting-using-dump-tools
- For other guest operating systems (for example, z/OS), consult with the support team for that operating system for instructions on creating a dump.
Console Files
In general, the phrase "Console Files" is used to describe the input and output that has appeared on the 'console' of a virtual machine. Think of it as a log of commands, command output, and messages. These files can be very beneficial to problem determination. Where you go to get the console files depends on the tools and products you use for consoles. In the past, it was very common to 'spool' the console of virtual machines to the virtual printer and perhaps close the spool file at midnight each day. This results in the console file being written to spool space. Presently, IBM suggests that you further harden the console files by writing them to disk outside the spool subsystem, as there are cases where spool space is not recovered after an outage. IBM Operations Manager for z/VM has the ability to harden the console files and archive them into a single daily log file for each z/VM system. There are also other ISV products that do similar things.
Depending on the problem at hand, the consoles for different virtual machines might be helpful. Below, we describe some of the more common virtual machines where the console is often requested.
- CP OPERATOR's console:
The Operator of a z/VM system (by default, userid OPERATOR) is sent many important messages while the system is running. Many error and informational messages are sent to OPERATOR, which can be extremely valuable when trying to debug a z/VM problem. By default, these messages are saved in a z/VM console file. Customers are strongly encouraged to manage their OPERATOR's console data, so that it can be sent to z/VM service personnel upon request.
To understand the basics of the OPERATOR's console data, including how to process and save console files, read the chapter "Collecting Information about System Operation" in the z/VM System Operation manual.
Many customers use automated programs to manage this. You can use the Programmable Operator function of z/VM, which is described in the CMS Planning and Administration manual. There are also several programs from IBM business partners that can be used for this purpose. You will need to use the instructions from those programs to collect and save the console data, before it can be sent to IBM support.
If you are using IBM Operations Manager for z/VM, by default the daily log of all monitored users' consoles is stored on OPMGRM1's 194 disk. The naming convention of this file is sysname yyyymmdd, where sysname is the z/VM system name and yyyymmdd is the date of the data in the file. More details about log file management are in Appendix E of the IBM Operations Manager for z/VM Administration Guide.
- Console files from CMS service virtual machines:
Some CMS virtual machines provide automated systems management functions to help run your system. Console files from these machines can be required to debug a problem. Examples of these are:
- RACFVM - IBM's External Security Manager (ESM) for z/VM
- DIRMAINT - Directory maintenance (TBD.....)
- TCPIP/VM - From userid TCPMAINT, issue "NETSTAT CP SP CONS CLOSE" to close the file.
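For sites that use IBM Operations Manager for z/VM, the daily log naming convention described above (sysname yyyymmdd) makes it straightforward to work out which file covers the day of an incident. The helper below is a minimal sketch of that convention only. The function name and the sample system name are hypothetical, and the 8-character limit on the system name is an assumption.

```python
from datetime import date


def opmgr_log_name(sysname: str, day: date) -> str:
    """Return the daily log file name in the form 'sysname yyyymmdd'."""
    name = sysname.strip().upper()
    # Assumption: a z/VM system name is 1 to 8 characters long.
    if not 1 <= len(name) <= 8:
        raise ValueError("system name is assumed to be 1 to 8 characters")
    return f"{name} {day:%Y%m%d}"


# Hypothetical system ZVMSYS1, incident on 9 February 2022.
print(opmgr_log_name("zvmsys1", date(2022, 2, 9)))
```

If an incident spans midnight, the logs for both dates would be needed, so the helper would be called once for each day.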
System Configuration file (SYSTEM CONFIG)
SYSTEM CONFIG is a file that defines your system configuration, and it is often needed by IBM support for debugging purposes. It resides on userid PMAINT's CF0 disk. Please be prepared to send the SYSTEM CONFIG file to IBM Support upon request. If you use embedded files, please remember to send all the necessary files. If you don't know what embedded files are, you aren't using them.
Data Collection for Performance Problems
For most z/VM performance problems, the data most often required is MONWRITE data. For the most basic information on collecting MONWRITE data, go here: https://www.vm.ibm.com/devpages/BKW/monsimp.html
It is important that you provide detailed information regarding the performance problem, including, but not limited to:
- When did the problem first appear?
- How frequently does it appear?
- Has anything changed on the system, including workload, hardware, software, microcode, storage, etc.?
- How are you measuring performance?