Last Updated: 23 August 2023

z/VM Greater than 1 TB Guest Support Considerations


z/VM allows guests that exceed the currently supported memory size limit of 1 TB to be defined and used. However, there are several considerations outlined below that may affect the viability of exceeding this limit.


Reset Time

A guest reset (e.g., performed as part of LOGOFF or re-IPL processing) may take an extremely long time. Minor changes have made this issue less problematic: they reduce the interval between guest LOGOFF initiation and subsequent re-LOGON, and they clarify when such a guest is in LOGOFF. Note that the anticipated use of the guest after it is logged back on is not to instantiate large amounts of memory again, but rather to perform the management and maintenance functions that occasioned the LOGOFF.

Recommendation: Be careful to avoid overloading the memory subsystem by restarting a large guest workload prematurely after LOGOFF. This is a customer responsibility as there is no inherent mechanism in the system for detecting and preventing such a situation.


Live Guest Relocation

Relocating a large guest may take an extremely long time and may result in network or transaction timeouts or other workload problems. It may also require using the FORCE STORAGE and MAXQUIESCE options of the VMRELOCATE command and adjusting the MAXTOTAL option, if provided.

Recommendation: Use a second guest on another system to take over the workload and provide relief.


Small Guests

A smaller guest may be disadvantaged by a larger one if memory is overcommitted. In such a situation, a small guest's memory may be reduced more aggressively because it is, for a variety of reasons, easier to steal than that of a large guest.

Recommendation: If overcommitment cannot be avoided (e.g., by adding real memory to the z/VM logical partition), the working set of a smaller guest that cannot tolerate the adverse performance implications of this behavior may be protected using SET RESERVE.


Dynamic Memory Reconfiguration

z/VM dynamic memory downgrade (DMD) may take longer and be more difficult to accomplish with a large guest that is active.

Recommendation: Use DMD only when a large guest is not particularly busy.


System SHUTDOWN

During SHUTDOWN, if a large guest receives the shutdown signal, it may not be able to complete its termination process in the allotted SHUTDOWNTIME interval, depending on what that process entails. A failure to complete it would cause a preemptive system termination that could result in Linux file system damage.

Recommendation: Use of a journaling file system to facilitate recovery from file system damage is advisable. Increasing the SHUTDOWNTIME limit might be helpful. Avoid guest actions (e.g., LOGOFF) that cause a system reset.


Helpful Commands and Tools

VMDUMP of a large guest may run too slowly to be viable and, in any event, can dump memory only up to 512 GB.

Recommendation: Exploit a guest standalone or other dump mechanism.
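As a rough illustration of the two limits mentioned on this page (the 1 TB supported guest size and the 512 GB VMDUMP ceiling), the following Python sketch flags which considerations apply to a given guest memory size. The size strings such as "2T", the helper names, and the use of binary units are assumptions made for this example; they are not part of any z/VM interface.

```python
# Hypothetical planning helper; names and size syntax are illustrative only.
UNITS = {"M": 2**20, "G": 2**30, "T": 2**40}  # assumes binary units

SUPPORTED_GUEST_LIMIT = 1 * UNITS["T"]  # supported guest size limit stated in the text
VMDUMP_LIMIT = 512 * UNITS["G"]         # VMDUMP ceiling stated in the text


def parse_size(text: str) -> int:
    """Convert a size such as '512G' or '2T' to bytes."""
    text = text.strip().upper()
    number, unit = text[:-1], text[-1:]
    if unit not in UNITS or not number.isdigit():
        raise ValueError(f"unrecognized size: {text!r}")
    return int(number) * UNITS[unit]


def review_guest(size_text: str) -> dict:
    """Report which of the two limits a guest of this size runs into."""
    size = parse_size(size_text)
    return {
        "exceeds_supported_limit": size > SUPPORTED_GUEST_LIMIT,
        "vmdump_covers_all_memory": size <= VMDUMP_LIMIT,
    }


if __name__ == "__main__":
    for candidate in ("512G", "1T", "2T"):
        print(candidate, review_guest(candidate))
```

A guest for which vmdump_covers_all_memory is False is a candidate for the standalone or other dump mechanism recommended above.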


Performance Analysis

Performance analysis products may not be prepared to handle monitor data values that are larger for large guests.

Recommendation: Review performance analysis results (e.g., reports) and ensure values are reported accurately.
For the results of the IBM z/VM Performance Team's study, read here.

Important note related to: QUERY NAMES, INDICATE PAGING, LOCATE VMDBK

If a large virtual machine is logged off multiple times and two or more logged-off instances are still going through memory reset, the user ID (transformed from uppercase to lowercase) will appear multiple times in the responses to these commands. This behavior is unusual; however, it is not considered harmful. Eventually, each instance will finish memory reset and will disappear from the responses.

The following sequence of events is an example of this type of situation:

  1. A user logs on virtual machine "LARGEONE" and runs a workload that instantiates a large amount of memory.
  2. The user logs off the virtual machine. This instance of virtual machine "LARGEONE" becomes known as instance "largeone."
  3. Responses to the following commands now include:
    • QUERY NAMES: "largeone - LOF."
    • INDICATE PAGING: "largeone --- yyyyyyyyyy LOGOFF."
    • LOCATE VMDBK: "largeone nnnnnnnn LOGOFF nnnnnnnnnnnnnnnn."
  4. The user logs back on to virtual machine "LARGEONE" and runs a workload that instantiates a large amount of memory.
  5. The user logs off. This instance of virtual machine "LARGEONE" also becomes known as instance "largeone."
  6. Responses to the following commands now include two occurrences:
    • QUERY NAMES: "largeone - LOF."
    • INDICATE PAGING: "largeone --- yyyyyyyyyy LOGOFF."
    • LOCATE VMDBK: "largeone nnnnnnnn LOGOFF nnnnnnnnnnnnnnnn."
  7. Eventually instance "largeone" from Step 2 completes reset and disappears.
  8. Responses to the following commands now include only one occurrence:
    • QUERY NAMES: "largeone - LOF."
    • INDICATE PAGING: "largeone --- yyyyyyyyyy LOGOFF."
    • LOCATE VMDBK: "largeone nnnnnnnn LOGOFF nnnnnnnnnnnnnnnn."
  9. Eventually instance "largeone" from Step 5 completes reset and disappears.
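
The sequence above can be checked mechanically. The following Python sketch counts how many logged-off instances of each user ID are still present in captured QUERY NAMES response text. It assumes that entries have the shape "userid - status" and are separated by commas or line breaks, and that the text has already been captured by some other means; the function name and the sample text are hypothetical.

```python
import re
from collections import Counter

# Assumed entry shape: "userid - status", e.g. "largeone - LOF".
ENTRY = re.compile(r"(\S+)\s+-\s+(\S+)")


def count_logoff_instances(response: str) -> Counter:
    """Count entries per user ID whose status field is LOF."""
    counts = Counter()
    # Entries are assumed to be separated by commas or line breaks.
    for chunk in re.split(r"[,\n]", response):
        match = ENTRY.search(chunk)
        if match and match.group(2) == "LOF":
            counts[match.group(1)] += 1
    return counts


# Made-up sample resembling the state after Step 5.
sample = "largeone - LOF, OPERATOR - 01F\nlargeone - LOF, TCPIP    - DSC"
pending = count_logoff_instances(sample)

for user_id, instances in pending.items():
    print(f"{user_id}: {instances} instance(s) still in memory reset")
```

A count greater than one for a lowercase user ID corresponds to Step 6; the count falls as each instance completes its reset.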