z/VM Support - Troubleshooting

Last Updated: 17 November 2021

This page provides a summary of materials for you to collect when you have a problem with your z/VM system. For more details, select the problem you are troubleshooting, and follow the instructions. Data that needs to be collected and sent to IBM is highlighted in the table below. If there is "data to collect" for your problem, refer to the "Must Gather - Details" section for details on how to collect the data.

A list of issues and troubleshooting techniques for z/VM and related components.
#	Category	Problem Description	Actions / Data to Collect
01.	General	z/VM system abend. In this case your z/VM system has gone down with a failure. The abend will have an abend code, for example, HTT001. If your system is set up properly, a CP dump will be produced and your system will automatically reIPL itself.	Make sure all of your guests are up and running and all system functions are operational. Collect and send the CP dump to IBM. Collect and send the system operator's console (OPERATOR by default) to IBM.
02.	General	Multiple z/VM guests appear to be hung, to such an extent that your production guests can no longer function and your business is severely impacted.	Perform a PSW Restart function from the HMC. This reIPLs your system AND creates a CP dump. Note: It is very important that you do a PSW Restart and NOT an IPL from the Load panel on the HMC. An IPL from the Load panel will NOT create a CP dump, and as a result, debugging your problem will be nearly impossible. Make sure all of your guests are up and running and all system functions are operational. Collect and send the CP dump to IBM. Collect and send the system operator's console (OPERATOR by default) to IBM. Collect and send MONWRITE data to IBM.
03.	General	One or more z/VM guests appear to be hung or non-responsive.	Contact IBM Support. In some cases it is possible to recover hung users, and Support will provide guidance for this attempt. IBM Support may instruct you to do a PSW Restart from the HMC. If the users that are hung are not critical to your business, you may choose to delay the PSW Restart until a scheduled maintenance window, to reduce the impact on your users. Make sure all of your guests are up and running and all system functions are operational. Collect and send the CP dump to IBM. Collect and send the system operator's console (OPERATOR by default) to IBM.
04.	General	Non-CMS guest failure. In this case the guest is running an operating system such as Linux, z/VSE, z/TPF, or z/OS.	Contact your guest operating system support team. That team will instruct you regarding what data to send. If a dump of the guest is requested, make sure you collect the dump using the instructions specific to the operating system, and NOT the z/VM VMDUMP command. For Linux guests, see https://www.ibm.com/docs/en/linux-on-systems?topic=troubleshooting-using-dump-tools for additional guidance on collecting dumps across multiple distributions and releases. Send the dump and any other requested files using the instructions provided by your guest operating system support team.
05.	CMS	CMS guest failure	CMS virtual machines are used for various purposes. This entry describes generic use of CMS; other entries here cover specific uses, such as TCP/IP or SSL. Use the CP VMDUMP command to create a VMDUMP. This command is documented in the CP Commands and Utilities manual. Send the VMDUMP to IBM. If available, collect and send the console file for the CMS guest to IBM.
06.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - General instructions. Note: There are multiple entries here for TCP/IP. IBM Support will help determine the most appropriate one based on the configuration.	Determine whether you are using a real OSA or a NIC device by issuing 'ifconfig -a' or 'NETSTAT DEVL DETAILS'. Verify that the device is in the 'Ready' state. Test connectivity by issuing PING and TRACERTE from both inside and outside the network, and report the results. If the device is not ready, check the following. If using a real OSA, query the device to make sure it has been attached to the TCPIP stack. If using a NIC, issue NETSTAT CP QUERY VIRTUAL NIC to the TCP/IP stack to verify that the device exists and that it is coupled to the appropriate virtual switch. Verify that no configuration errors are reported in the TCPIP console log. Collect and send the profile for TCPIP from the TCPMAINT 198 minidisk; this file has a filetype of TCPIP. Collect and send the TCPIP console log: issue 'NETSTAT CP SP CONS CLOSE' from TCPMAINT and get the log. The log will be sent to the userid specified on the :OWNER tag of the TCP/IP server DTCPARMS entry (for which the default is TCPMAINT). If the device is ready, collect the output from the following TCP/IP commands issued from TCPMAINT: NETSTAT GATE; NETSTAT HOME; NETSTAT CONFIG ALL. Collect and send a VMDUMP using the CP VMDUMP command, if instructed to do so by z/VM Support. Collect and send the command output from: CP Q VSWITCH DETAILS; CP Q V NIC.
07.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - Native TCP/IP Packet Trace required	Perform the steps below on user TCPMAINT to take a native packet trace: (1) Issue NETSTAT CP SP CONS CLOSE. (2) Issue NETSTAT OBEY MORETRACE IPUP IPDOWN PACKET. (3) Issue NETSTAT OBEY TRACEONLY a.b.c.d ENDTRACEONLY. (4) Recreate the problem to be traced. (5) Issue NETSTAT OBEY NOTRACE. (6) Issue NETSTAT OBEY TRACEONLY ENDTRACEONLY. (7) Issue NETSTAT CP SP CONS CLOSE and collect the trace data. The trace will be sent to the userid specified on the :OWNER tag of the TCP/IP server DTCPARMS entry (for which the default is TCPMAINT). NOTE: The PKTTRACE SAMPEXEC file that is shipped on the TCPMAINT 592 disk can be used as an alternative to manually issuing the above commands. To use this exec, copy it to the 198 disk, rename it with a filetype of EXEC, and remove the EXIT 0 statement from within the EXEC.
08.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - TCP/IP Packet Trace with CP TRSOURCE command required.	From TCPMAINT perform the following steps: (1) Issue NETSTAT OBEY PACKETTRACESIZE 100. (2) Issue NETSTAT OBEY TRACEONLY <devname> ENDTRACEONLY. (3) Issue CP TRSOURCE ID TCP1 TYPE GT BLOCK FOR USER TCPIP. (4) Issue CP TRSOURCE ENABLE ID TCP1. (5) Recreate the problem to be traced. (6) Issue NETSTAT OBEY PACKETTRACESIZE 0. (7) Issue NETSTAT OBEY TRACEONLY ENDTRACEONLY. (8) Issue CP TRSOURCE DISABLE ID TCP1. (9) Issue CP Q TRF ALL and note the spool file id of the trace file for trace id TCP1. (10) Issue 'TRACERED nnnn CMS TRACE1 TRCDATA A (ALL', where "nnnn" is the spool id from above. This will create a file called TRACE1 TRCDATA on your A disk. (11) Use IPFORMAT to format the data either in IPF format that the IPFORMAT command can view or as a PCAP file for viewing by programs like Wireshark. Issue either 'IPFORMAT TRACE1 (OUTFILE TCPTRC1 IPFDATA A' or 'IPFORMAT TRACE1 (OUTFILE TCPTRC1 PCAP A FORMAT PCAP'. (12) Send the TCPTRC1 IPFDATA or TCPTRC1 PCAP file to IBM Support. NOTE: The PKTTRACE SAMPEXEC file that is shipped on the TCPMAINT 592 disk can be used as an alternative to manually issuing the above commands. To use this exec, copy it to the 198 disk, rename it with a filetype of EXEC, and remove the EXIT 0 statement from within the EXEC.
09.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - MPROUTE problems	Please collect the following from user TCPMAINT: (1) The routing table, which can be obtained via 'SMSG MPROUTE RTTABLE' ('SMSG MPROUTE HELP' will show a list of available commands). (2) The MPROUTE CONFIG file. (3) The stack configuration file (PROFILE TCPIP or its equivalent). (4) NETSTAT GATE output. (5) The TCPIP console log (NETSTAT CP SP CONS CLOSE). An MPROUTE trace may be needed. We generally recommend D2, T2, but this will generate a lot of data on a busy system, so don't let it run too long: issue SMSG MPROUTE DEBUG=2 and SMSG MPROUTE TRACE=2, let the trace run for a couple of Dead Router Intervals, then close the console by issuing SEND CP MPROUTE SP CONS CLOSE, and then issue SMSG MPROUTE DEBUG=0 and SMSG MPROUTE TRACE=0. A related TCPIP stack trace may also be needed; see entry 7 above for more details. If MPROUTE is not responding to SMSG commands, take a dump of the MPROUTE virtual machine: issue SEND CP MPROUTE VMDUMP 0-END DCSS FORMAT CMS TO TCPMAINT. The dump will be sent to the reader of userid TCPMAINT. From TCPMAINT, use the DUMPLOAD utility to create a dump file on a TCPMAINT minidisk. Send this dump file to IBM Support. Send all console logs collected above to IBM Support.
10.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - SSL problems. The SSL server can run as a single server or as a pool of servers. We recommend a pool. Connections are routed to the first SSL server on the Active list until all of its available connections are in use; further connections then go to the next server.	z/VM Support may request the following, as issued by TCPMAINT: (1) Issue NETSTAT CONFIG SSL, which reports the state of the SSL servers. (2) Issue SSLADMIN Q STATUS DETAILS. (3) Issue NETSTAT IDENT SSL. (4) Your customized SYSTEM (or nodeID named) DTCPARMS file. (5) GSKADMIN (gskkyman) information for the certificate in use; contact z/VM Support for further instructions. Obtain trace information: (1) For a TCPIP stack trace, issue NETSTAT OBEY MORETRACE SSL SOCKET. (2) For an SSL server trace, issue SSLADMIN TRACE DEBUG. (3) Try to establish a secure connection. (4) Issue NETSTAT OBEY NOTRACE. (5) Issue SSLADMIN NOTRACE. (6) Issue SSLADMIN CLOSECON to close the console. The log will be sent to the userid specified on the :OWNER tag of the TCP/IP server DTCPARMS entry (for which the default is TCPMAINT). (7) Issue NETSTAT CP SP CONS CLOSE. The log will be sent to the userid specified on the :OWNER tag of the TCP/IP server DTCPARMS entry (for which the default is TCPMAINT). z/VM Support may also request a System SSL trace. If so, refer to the website listed here: http://www.vm.ibm.com/related/tcpip/
11.	TCP/IP	TCP/IP for z/VM - Loss of connectivity - FTP problems	z/VM Support may request the following. From the client: tracing can be started when the FTP command is issued, with 'FTP <ip_address> (trace'. To enable or disable trace mode interactively, use the DEBUG subcommand of FTP. From the server: specify the TRACE statement in the SRVRFTP CONFIG file. Tracing can also be activated dynamically through the SMSG interface, with 'SMSG FTPSERVE TRACE ON CONSOLE'. These stack commands, issued from TCPMAINT, may be helpful: NETSTAT CONN; NETSTAT CONFIG PORT.
12.	Wave	IBM Wave for z/VM	For instructions on what data to collect, use this page: https://www.ibm.com/support/pages/node/720615
13.	Performance	z/VM system performance problems	z/VM Support may request the data below. Note: This data should be collected from time periods in which the system was performing well, as well as from time periods in which the system was running poorly. Collect and send MONWRITE data to IBM. Collect and send the CP OPERATOR's console to IBM.
14.	Install	Problems when doing an upgrade installation on z/VM	Below are some of the items that z/VM Support may request from you. From MIGMAINT 24CC: $INST$ $FILE$. From MIGMAINT 2CF0: $STAGE2$ $TABLE$; $STAGE2$ $WRNFILE; INSTUPGR $CONSLOG; INSTUPGR $DEBUG$; <sysname> $WRNFILE; $STAGE1$ $TABLE$; $STAGE1$ $WRNFILE. Directory manager log file, one of: INSTDMXT $LOGFILE (if you edit the directory by hand); UPGDVHXT $LOGFILE (if running DirMaint); UPGDMIXT $LOGFILE (if running a directory manager from another ISV). From MIGMAINT 191: all files with a filetype of $MSGLOG.
15.	Networking	Virtual Switch (VSWITCH) General Instructions	From any user with class B privilege, execute the following commands: CP QUERY VMLAN, to get global VM LAN information (e.g. limits) and to find out what service has been applied; CP QUERY VSWITCH DETAILS, to find out the state of the Uplink devices, which users are coupled, and which IP addresses are active; CP QUERY PORT GROUP DETAILS (if using Link Aggregation), to find out the state of the port group and LACP activity.
16.	Networking	Virtual Switch Specific user ID issues	From any user with class C privilege, execute the following command for the user ID in question: CP FOR <user> CMD QUERY VIRTUAL NIC DETAILS, to find out whether your adapter is coupled, whether your adapter is initialized, whether your IP addresses have been registered, and how many bytes/packets were sent, received, or discarded.
17.	Networking	Virtual Switch - Loss of connectivity - Packet Trace with CP TRSOURCE command required	From any user with class C privilege, issue the following commands for the VSwitch in question: (1) TRSAVE CP ON DEFERIO FRAMES 600 DASD TO * KEEP 4. (2) TRSOURCE ID VSW TYPE LAN OWNER SYSTEM LANNAME <vswitchname>. (3) TRSOURCE ENABLE ID VSW. (4) Recreate the problem to be traced; for example, ping to and from the z/VM guest that is experiencing connectivity problems. (5) TRSOURCE DISABLE ID VSW. (6) Q TRF ALL, and note the spool file id(s) of the trace file for trace id VSW. (7) 'TRACERED nnnn CMS VSW TRCDATA A (ALL', where "nnnn" is the spool id(s) from above. This will create a file called VSW TRCDATA on your A disk. (8) Send the VSW TRCDATA file to IBM Support. This can create quite a lot of trace data on a busy network. It may be necessary to limit the TRSOURCE trace with the NIC, TRUNK, or DROPPED options, although it is preferred not to limit the output.
18.	Networking	Virtual Switch - Uplink issues	Determine the state of the Uplink ports by issuing QUERY VSWITCH. Look at the "Uplink Port:" section and note any devices that are not "Ready". Messages related to the state of the VSwitch Uplink devices are written to the system operator console. Collect and send the system operator's console (OPERATOR by default) to IBM. Collect and send the consoles of the VSwitch controllers (by default DTCVSW1, DTCVSW2, DTCVSW3, DTCVSW4), which are sent to MAINT by default, to IBM. If you need to close the active console, you can issue this command from TCPMAINT for each controller: NETSTAT TCP DTCVSW1 CP SP CONSOLE CLOSE; NETSTAT TCP DTCVSW2 CP SP CONSOLE CLOSE; NETSTAT TCP DTCVSW3 CP SP CONSOLE CLOSE; NETSTAT TCP DTCVSW4 CP SP CONSOLE CLOSE. Collect and send the output from the NETSTAT OSAINFO DETAILS command: NETSTAT TCP DTCVSW1 OSAINFO DETAILS; NETSTAT TCP DTCVSW2 OSAINFO DETAILS; NETSTAT TCP DTCVSW3 OSAINFO DETAILS; NETSTAT TCP DTCVSW4 OSAINFO DETAILS.
19.	Networking	Virtual Switch data collection	When collecting this data, it may be useful to use a PIPE command to capture the command output into a single file that can be sent to IBM. For example: (1) PIPE | CP QUERY TIME | > VNET DATA A (2) PIPE | CP QUERY CPLEVEL | >> VNET DATA A (3) PIPE | CP QUERY VMLAN | >> VNET DATA A (4) PIPE | CP QUERY VSWITCH DETAILS | >> VNET DATA A (5) PIPE | CP QUERY PORT GROUP DETAILS | >> VNET DATA A (6) PIPE | CMS NETSTAT TCP DTCVSW1 OSAINFO DETAILS | >> VNET DATA A (7) PIPE | CMS NETSTAT TCP DTCVSW2 OSAINFO DETAILS | >> VNET DATA A (8) PIPE | CMS NETSTAT TCP DTCVSW3 OSAINFO DETAILS | >> VNET DATA A (9) PIPE | CMS NETSTAT TCP DTCVSW4 OSAINFO DETAILS | >> VNET DATA A. Then send the VNET DATA file to IBM.
20.	Crypto devices	Problems with Crypto devices	From the guest virtual machine that is using the device: the output of the CP QUERY VIRTUAL CRYPTO command, and any Crypto error messages displayed by the guest operating system. From MAINT or any class A/B/C/E user, the output of these commands: CP QUERY CRYPTO DOMAINS USERS; CP QUERY CPSERVICE. From the HMC, screenshots of the following screens: (a) Single Object Operations -> System Management -> (open the CPC) -> Partitions -> (open the partition in question) -> Cryptos; (b) the Crypto page in the image profile for the LPAR in question. If you have issued any commands to dynamically manage the Crypto Express adapters, the commands and the output they produced (e.g., console file). The SYSTEM CONFIG file. All directory statements for the guest virtual machine that is using the device.
21.	CP - Accounting	These messages are being observed: HCP8082I Accounting records are accumulating for userid userid. HCP8083I Accounting record threshold has been exceeded for userid userid. Currently nnnnn records are enqueued. These messages do not report an immediate, serious problem. However, unless you take action, over a long period of time these records can consume all of your checkpoint area space. If this happens, you may lose important spool files, including system abend dumps and OPERATOR's consoles, which are critical z/VM debugging files.	If you DO NOT need to systematically collect any accounting data, do the following. Issue these commands: CP RECORDING ACCOUNT OFF; CP RECORDING ACCOUNT PURGE. In addition, add these commands to OPERATOR's PROFILE EXEC, so that they are executed each time your system IPLs. These actions will cause the above messages to no longer be issued, and your checkpoint area will not fill up with accounting records. If you DO need to systematically collect accounting data, do the following. You may be getting message HCP8083I because the A disk of your accounting userid (by default, DISKACNT) is full. Log on to DISKACNT and clear the A disk, either by saving the accounting files somewhere else or by erasing them if you don't need them. If DISKACNT's A disk is too small, consider making it bigger to hold more accounting data. Once you have done the above, make sure DISKACNT is running the RETRIEVE utility in its PROFILE EXEC. RETRIEVE gets the accounting data from the checkpoint area and stores it on DISKACNT's A disk, which helps keep checkpoint space available and prevents messages HCP8082I and HCP8083I. Implement a plan to periodically make sure that your accounting data is being retrieved by RETRIEVE, that there is space available on DISKACNT's A disk, and that accounting data is being archived if necessary.
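
The Virtual Switch data-collection commands in entry 19 follow a regular pattern: the first PIPE creates VNET DATA A with `>`, and every later PIPE appends to it with `>>`. As a minimal sketch, assuming you want to generate that command list off-platform (for example, to paste into an EXEC at a site whose controller names differ from the defaults), the Python below rebuilds the same lines. The function name `vswitch_collection_commands` and its parameters are hypothetical and are not part of z/VM; the command text is taken unchanged from entry 19, with the stage separators written as plain `|`.

```python
def vswitch_collection_commands(controllers, outfile="VNET DATA A"):
    """Return the PIPE command lines from entry 19 as a list of strings.

    controllers: VSwitch controller user IDs (assumption: your site's names).
    outfile:     CMS file (fn ft fm) that collects the output.
    """
    # CP queries listed in entry 19, in the same order.
    cp_queries = [
        "QUERY TIME",
        "QUERY CPLEVEL",
        "QUERY VMLAN",
        "QUERY VSWITCH DETAILS",
        "QUERY PORT GROUP DETAILS",
    ]
    stages = [f"CP {query}" for query in cp_queries]
    # One OSAINFO query per controller, issued as a CMS command.
    stages += [f"CMS NETSTAT TCP {name} OSAINFO DETAILS" for name in controllers]

    commands = []
    for index, stage in enumerate(stages):
        # The first command creates the file; the rest append to it.
        redirect = ">" if index == 0 else ">>"
        commands.append(f"PIPE | {stage} | {redirect} {outfile}")
    return commands


if __name__ == "__main__":
    default_controllers = ["DTCVSW1", "DTCVSW2", "DTCVSW3", "DTCVSW4"]
    for line in vswitch_collection_commands(default_controllers):
        print(line)
```

Run with the four default controller names, the sketch prints the nine commands of entry 19 in order. It only builds text; it does not issue anything against a z/VM system.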