Pre-z/VM 5.2.0 64-bit Considerations

Introduction

The following discussion applies to the initial 64-bit support that was first provided by z/VM 3.1.0 and that continues, with incremental improvements, through z/VM 5.1.0. If you are currently running on one of these releases with more than 2GB of central storage, this discussion can help you to determine whether your system's performance is being affected by 2GB-line constraints and, if so, what actions you might take while still running on these releases.

z/VM 64-bit support was greatly enhanced in z/VM 5.2.0. Migration to z/VM 5.2.0 is highly recommended for systems that are currently experiencing significant 2GB-line constraints. See z/VM 5.2.0 64-bit Considerations for further information.

Understanding Use of Memory below 2GB

While z/VM has 64-bit support for virtual storage and can support more than 2GB of central storage (real memory), there are areas of the Control Program (CP) that are limited to 31-bit (2GB). A guest page can reside above 2GB and have instructions in it executed or data referenced without being moved. However, pages that require certain CP processing must reside below 2GB in z/VM's central storage (host real memory). This includes things such as I/O channel programs and data (both traditional SSCH and the newer QDIO), simulation of instructions, and locked pages (e.g. QDIO structures for real devices).

For example, if a guest is executing an I/O where the channel program and data area reside in a guest page which is in a frame (4KB of central storage) above 2GB, then when CP does page translation (in order to process the virtual I/O) the guest page is moved to a frame of central storage below 2GB. In the case of I/O, that page is locked until the I/O completes. Once the page is moved below 2GB host real, it remains there until it is stolen or the page is released (pages can be released explicitly or when the guest logs off or goes through system reset). This can occur for guests that run with either 31-bit or 64-bit addressing. The determining factor is not the location of the page within guest memory, but the location within z/VM (host real) memory. A guest that is 31-bit can have its pages above 2GB in host real memory.

A page can be stolen as part of the CP steal processing, which is started when the number of available frames below 2GB is too low. During steal processing, CP makes various passes in an attempt to add frames to the available list. Each pass represents a step in the hierarchy of page value. For example, unreferenced pages of idle users will be stolen before pages owned by CP. There is a separate steal process for pages above 2GB. When stolen, a page will be moved to expanded storage if available. This is one of the reasons configuring expanded storage is highly recommended. If there is significant contention on the storage below 2GB, thrashing can occur. The page rates to expanded storage, and potentially DASD, grow very large (1000s of pages per second). In this thrashing scenario, the pages stolen from below 2GB will often be ones that will need to be brought back below 2GB in a short period of time.

Identifying a Constraint with 2GB

The symptoms of being constrained by 2GB are high paging activity and having storage available above 2GB. The simplest way to see this is with the CP INDICATE LOAD and CP QUERY FRAMES commands. Additionally, since stealing and related processing is a CP system function, you will see an increase in System processor time. In the example that follows, the key values to look at are -

  • Indicate Load's XSTORE - rate of paging to expanded storage (13301 in our example).
  • Indicate Load's PAGING - rate of paging to DASD (112 in our example).
  • Indicate Load's STEAL - percentage of pages stolen from active users (91% in our example). A high percentage is not necessarily bad if the rates for XSTORE and PAGING are low.
  • Query Frames' >2G Available List - the number of frames on the available list for use above 2GB (58339 in our example).
  • Query Frames' AVAIL - different from the available list count discussed above. It is the count of pages potentially in use by guests below 2GB (504611 in our example).

In this example, we have a high paging rate and a large number of frames available above 2GB. Therefore, this is likely to be a system constrained on storage below 2GB.

    INDICATE LOAD
    AVGPROC-008% 03
    XSTORE-013301/SEC MIGRATE-0023/SEC
    MDC READS-000020/SEC WRITES-000000/SEC HIT RATIO-100%
    STORAGE-030% PAGING-0112/SEC STEAL-091%
    Q0-00008(00000)                           DORMANT-0044
    Q1-00001(00000)           E1-00000(00000)
    Q2-00001(00000) EXPAN-002 E2-00000(00000)
    Q3-00038(00000) EXPAN-002 E3-00000(00000)
    PROC 0000-009%          PROC 0001-008%
    PROC 0002-008%
    LIMITED-00000

    QUERY FRAMES
    SYSGEN  REAL    USABLE  OFFLINE
    524287  524287  524287  000000
    V=R     RESNUC  PAGING  TRACE   RIO370
    000000  000735  523272  000280  000000
    AVAIL   PAGNUC  LOCKRS  LOCKCP  SAVE    FREE    LOCKRIO
    504611  011941  000902  000000  000042  005776  000000
    Storage >= 2G:
    Online = 1048576  Available List = 58339
    Not init = 0      Offline = 0
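
As a rough illustration of the check described above, the following Python sketch pulls the key values out of captured INDICATE LOAD and QUERY FRAMES responses and flags the "high paging plus free frames above 2GB" pattern. The sample text is trimmed from the example in this section. The function names and, in particular, the two thresholds (min_page_rate and min_avail_frames) are assumptions made for illustration only; they are not values defined by z/VM and should be tuned to your own system.

```python
import re

# Trimmed copies of the command responses from the example in this section.
INDICATE_LOAD = """\
AVGPROC-008% 03
XSTORE-013301/SEC MIGRATE-0023/SEC
MDC READS-000020/SEC WRITES-000000/SEC HIT RATIO-100%
STORAGE-030% PAGING-0112/SEC STEAL-091%
"""

QUERY_FRAMES = """\
AVAIL   PAGNUC  LOCKRS  LOCKCP  SAVE    FREE    LOCKRIO
504611  011941  000902  000000  000042  005776  000000
Storage >= 2G:
Online = 1048576  Available List = 58339
"""


def _field(pattern, text):
    """Return the first captured group of `pattern` in `text` as an int."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def parse_indicate_load(text):
    """Extract the XSTORE and PAGING rates and the STEAL percentage."""
    return {
        "xstore_per_sec": _field(r"XSTORE-(\d+)/SEC", text),
        "paging_per_sec": _field(r"PAGING-(\d+)/SEC", text),
        "steal_pct": _field(r"STEAL-(\d+)%", text),
    }


def parse_avail_above_2g(text):
    """Extract the >2G Available List frame count from QUERY FRAMES output."""
    return _field(r"Available List\s*=\s*(\d+)", text)


def looks_constrained(load, avail_above_2g,
                      min_page_rate=1000, min_avail_frames=1000):
    """Heuristic only: both thresholds are illustrative assumptions."""
    page_rate = load["xstore_per_sec"] + load["paging_per_sec"]
    return page_rate >= min_page_rate and avail_above_2g >= min_avail_frames


load = parse_indicate_load(INDICATE_LOAD)
avail = parse_avail_above_2g(QUERY_FRAMES)
print(load, avail, looks_constrained(load, avail))
```

A result of True only suggests that the system deserves a closer look; monitor data remains the better source for confirming a constraint below 2GB.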

Additional details on processing associated with storage below 2GB can be found in VM monitor data. Various performance products may report on these values.

How to Project Requirements for Storage Below 2GB

The following are factors that need to be understood to determine your storage requirements.

Factors

There are a number of factors that impact the requirements for storage under 2GB. Some of these can be influenced by configurations and other tuning, while others are strictly workload dependent.

V=R Area is storage that is configured for V=R and V=F guests. This area must reside below 2GB. Since CP and its control blocks must also reside below 2GB, the V=R Area is limited to less than 2GB.

CP LOCK Command can be used to explicitly lock a guest's pages. When this command is used, the pages are locked below 2GB. (The CP SET RESERVE command is a better alternative to CP LOCK.)

Real QDIO (FCP) Devices when used with a Linux guest require certain structures to be locked below 2GB. Because of the direct memory access nature of the QDIO architecture, these pages are locked for as long as the device is in use. The default settings for a Linux guest with a real QDIO device result in about 8MB of locked storage for network devices. (For QDIO GbE there are 3 device addresses associated with a connection; the 8MB is the total for all 3.) More recent releases of Linux (SLES 9 and RHEL 4) allow one to configure the QDIO setup so that the maximum number of buffers (128) is not always used. The default has also been lowered significantly so that the amount of locked memory is closer to 1MB. For FCP SCSI devices, there is a single device. The storage locked below 2GB is not as predictable, but 800 pages is a good planning number.

Virtual QDIO for things such as Guest LAN and Virtual Switch does not require the structures to be permanently locked. However, while an I/O is being processed, pages will be locked below 2GB. The number of pages is dependent on the I/O.

Traditional SSCH I/O, except in assisted cases with V=R or V=F guests, requires CCW translation by CP. In the CCW translation, CP will need to have the guest page resident below 2GB. Also, when the real I/O is issued, the I/O data areas are locked below 2GB until the I/O completes.

Other CP Control Blocks besides the control blocks referenced above also reside below 2GB. While some of these control blocks are pageable, most are not. Some of the control blocks that may consume the most storage are -

  • VMDBKs - Virtual Machine Description Block describes each virtual processor per guest logged on or running disconnected on the VM system.
  • PGMBKs - Page Management Blocks (page tables). User PGMBKs can be paged out under certain criteria. The number of PGMBKs is proportional to the amount of virtual storage used.
  • Other DAT (Dynamic Address Translated) Control blocks such as Region and Segment tables for guests also consume storage below 2GB. Unlike PGMBKs, these are not pageable. The number is also proportional to the amount of virtual storage used.
  • VDEVs - Virtual Device Blocks represent each virtual device defined.
  • RDEVs - Real Device Blocks are created for each real device known to VM.
  • MDISKs - Minidisk Control Blocks are created for each minidisk in the VM system. This control block only represents the minidisk, not the data for that minidisk or related MDC structures.
  • Spool Related control blocks on systems with a large number of spool files (Reader, Printer, Punch, Console) may also consume a significant amount of storage below 2GB.
  • Frame Table - CP uses a frame table to manage central storage. There is an entry in the frame table for every frame of central storage. So the more central storage available to VM, the larger the frame table. (4MB of frame table entries for each GB of central storage).

Number of Virtual Machines - since many of the control blocks listed above are required for virtual machines, having large numbers of extra virtual machines running on the system increases the storage requirements below the line. Included in this are service virtual machines that are started automatically, but not used.

Worksheet for Storage Requirements

The following worksheet can be used to help compute fixed storage requirements below 2GB. All values are decimal and are expressed in pages (4096 bytes each). The worksheet assumes CP is running in 64-bit mode.

    Frame Table     = 1 page per MB of Real storage
    Trace Table     = 100 pages + (number_CPUs - 1) * 75
    V=R/F Area      = V=R size from system config
    Dedicated OSAs  = 2000 * number_of_dedicated_devices (triplets)
                      (use 150 instead of 2000 for newer releases of Linux)
    Dedicated FCPs  = 800 * number_of_dedicated_devices
    RDEVs           = 0.125 * number_of_physical_devices
    Spool Files     = 0.05 * number_of_spool_files
    Segment Tables  = 0.001 * total_virtual_storage_MBs
    Fixed PGMBKs    = 2 * total_vdisk_size_MBs
    Pageable PGMBKs = 2 * active_virtual_storage_MBs
    VDEVs           = 0.04 * total_number_virtual_devices
    VMDBKs          = 1 * total_number_virtual_processors
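
To make the worksheet easier to apply, here is a minimal Python sketch that evaluates each line and totals the result in pages and megabytes. The multipliers come straight from the worksheet; the Config class, its field names, and the sample configuration are hypothetical and exist only for illustration. The sketch assumes the V=R size has already been converted to pages.

```python
from dataclasses import dataclass

PAGE_BYTES = 4096  # worksheet unit: one 4096-byte page


@dataclass
class Config:
    """Hypothetical container for the worksheet inputs."""
    real_storage_mb: float
    cpus: int
    vr_area_pages: float = 0            # V=R size, already in pages
    dedicated_osa_triplets: int = 0
    osa_pages_per_triplet: int = 2000   # use 150 for newer Linux releases
    dedicated_fcp_devices: int = 0
    physical_devices: int = 0
    spool_files: int = 0
    total_virtual_storage_mb: float = 0
    total_vdisk_mb: float = 0
    active_virtual_storage_mb: float = 0
    virtual_devices: int = 0
    virtual_processors: int = 0


def worksheet(cfg):
    """Return the worksheet line items, in pages, keyed by worksheet label."""
    return {
        "Frame Table": 1 * cfg.real_storage_mb,
        "Trace Table": 100 + (cfg.cpus - 1) * 75,
        "V=R/F Area": cfg.vr_area_pages,
        "Dedicated OSAs": cfg.osa_pages_per_triplet * cfg.dedicated_osa_triplets,
        "Dedicated FCPs": 800 * cfg.dedicated_fcp_devices,
        "RDEVs": 0.125 * cfg.physical_devices,
        "Spool Files": 0.05 * cfg.spool_files,
        "Segment Tables": 0.001 * cfg.total_virtual_storage_mb,
        "Fixed PGMBKs": 2 * cfg.total_vdisk_mb,
        "Pageable PGMBKs": 2 * cfg.active_virtual_storage_mb,
        "VDEVs": 0.04 * cfg.virtual_devices,
        "VMDBKs": 1 * cfg.virtual_processors,
    }


def pages_to_mb(pages):
    """Convert a page count to megabytes."""
    return pages * PAGE_BYTES / 2**20


# Made-up sample system, not a measured configuration.
sample = Config(
    real_storage_mb=6144, cpus=3,
    dedicated_osa_triplets=10, dedicated_fcp_devices=2,
    physical_devices=2000, spool_files=10000,
    total_virtual_storage_mb=40960, total_vdisk_mb=1024,
    active_virtual_storage_mb=20480,
    virtual_devices=5000, virtual_processors=60,
)

items = worksheet(sample)
total_pages = sum(items.values())
for label, pages in items.items():
    print(f"{label:<16}= {pages:>10.2f} pages")
print(f"{'Total':<16}= {total_pages:>10.2f} pages "
      f"({pages_to_mb(total_pages):.1f} MB of the 2048 MB below 2GB)")
```

Printing the individual line items rather than only the total makes it easier to see which factor dominates and therefore which of the methods in the next section is likely to help most.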

Methods to Lower Storage Requirements Below 2GB

There are several approaches to help improve the storage requirements. Many of them involve trade-offs with other resources or characteristics. The following are some of the most common approaches used, listed in no particular order.

Get on Current Software Levels

The following improvements can be found in various releases; more details can be found in the z/VM performance report.

  • Linux Timer Kernel Patch reduces the timer pop frequency for idle Linux guest virtual machines.
  • z/VM 4.2.0 extended Minidisk Cache to 64-bit channel programs.
  • z/VM 4.3.0, for certain cases, allowed pages stolen below 2GB to be moved to central storage above 2GB instead of being paged out to expanded storage or DASD. The control program is conservative in choosing central storage above 2GB over expanded storage.
  • z/VM 4.4.0 changed how the dispatcher determines idle users.
  • z/VM 4.4.0 allowed virtual disk in storage pages to be page faulted in above 2GB.
  • z/VM 5.1.0 further changed how the dispatcher determines idle users.

Other service to consider includes -

  • VM63729 - improves efficiency when searching for pages below 2GB to steal. Applies to z/VM 4.4.0 and z/VM 5.1.0.
  • VM63730 - improves management of contiguous frame requests for below 2GB.
  • VM63752 - allows steal for below 2GB to be more aggressive about moving stolen pages to real memory above 2GB rather than to expanded storage.

Consider using the Linux Fixed I/O Buffer feature

On Linux SLES 9 SP1 or RHEL 4 systems, there is the ability to use the Fixed I/O Buffer feature. This minimizes the number of guest pages Linux uses for I/O at the cost of additional data moves inside the guest. This is described in greater detail on the zSeries Linux Performance page.

Consider using a Guest LAN or Virtual Switch

In connecting guests to the physical network, there are many choices. Using a Guest LAN with a virtual router (or the Virtual Switch in z/VM 4.4.0) instead of giving each guest its own GbE device reduces the storage required below 2GB.

Consider defining additional Expanded Storage

By configuring central storage as expanded storage, you lower the storage required by the frame table. Also, if you have a large amount of storage on the available list above 2GB, then a portion of that storage would probably be more effective as expanded storage.

Lower Number of Real and Virtual Devices in System

While it can be nice to have access to all devices on all systems, this approach consumes additional storage below 2GB.

Clean up Unnecessary Spool Files

Process or Disable Accounting, Logrec, Symptom Records

Various CP system services exist that will generate system information which is kept in storage below 2GB if not processed. These services can be disabled or one can set up the service machines to process the records.

Exploit Segments (DCSSs) Where Appropriate

One of the unique advantages of z/VM is the ability to use segments and share them amongst guest virtual machines. There are two things to consider in this area. First, there are a number of segments that may be predefined on your system. If these are not used, then they should not be loaded into storage. The QUERY NSS MAP command can be used to determine the number of users (#USERS field) using a segment. The second consideration is that when a segment is shared with multiple virtual machines, there is only one set of PGMBKs (page tables), which can reduce storage requirements under 2GB. Using a segment for the Linux kernel is an example of this.

Limit Virtual Machine Storage Sizes

Since the number of segment and page tables is determined by the amount of virtual storage a guest uses, you can limit these control blocks by lowering the virtual machine storage size. PGMBKs are pageable, but only under certain criteria (e.g. all pages in the segment represented by the PGMBK must be paged out already).

Review usage of Virtual Disk in Storage

A virtual disk in storage is a volatile FBA disk that is backed by memory (a system utility space) and is pageable. Virtual disks in storage are often used as swap disks for Linux. While the pages that make up the virtual disk data blocks are pageable, the PGMBKs for the virtual disk are not pageable. (Also prior to z/VM 4.4.0, the virtual disk in storage pages were page faulted in below 2GB.) Therefore, it would be good to define the virtual disks in storage no larger than they need to be.

Evaluate MDC Usage for Traditional (SSCH) Disk I/O

Since I/Os satisfied from MDC do not require pages to be moved below 2GB, getting the full benefit of MDC can help lower storage requirements below 2GB. For I/Os to be satisfied by MDC, they must be reads that reference data that already exists in the cache. Possible ways to influence this are to ensure there is sufficient memory for MDC (either expanded storage or central storage) and that disks are eligible. Dedicated disks are not eligible for MDC. In addition, some advantage may be found in using Record level MDC instead of Track level (default). Record level MDC requires diagnose I/O instead of SSCH I/O. The Linux DASD device driver can be configured to use diagnose x'250' for its disk I/O.

Ensure Idle Guest Virtual Machines Are Treated as Idle

There have been a number of improvements to help ensure that inactive guest virtual machines appear idle to the VM scheduler. When this occurs the VM storage management algorithms can be more effective in reclaiming the most appropriate page frames. The Linux Kernel Timer Patch and z/VM APAR VM63282 (in base of z/VM 4.4.0) are examples of these improvements.

Splitting into Multiple LPARs

A more involved alternative is to split the VM system into multiple z/VM LPARs, thus effectively getting multiple 2GBs of storage. This may also be desirable to provide hot backup or high availability solutions. There is some duplication of base costs for running multiple z/VM systems as well as administration costs to be considered.

