Memory Overcommitment
Virtualization often involves overcommitting resources, so memory overcommitment is a frequent topic of discussion. First, one needs to agree on the definition of overcommitment. In this document, we define it as the ratio of the total virtual memory of the started (logged on) virtual machines to the total real memory available to the z/VM system.
For example, if there is 10GB of real memory available to z/VM and there are 15 virtual machines, each 1GB, that are started, then the ratio of virtual to real would be 15:10, or 1.5:1. Some will simplify this by saying 1.5.
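The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definition used in this document, not part of any z/VM tooling; the function name `overcommit_ratio` and its parameters are assumptions made for the example.

```python
def overcommit_ratio(guest_sizes_gb, real_memory_gb):
    """Return total virtual memory of started guests divided by real memory."""
    if real_memory_gb <= 0:
        raise ValueError("real memory must be positive")
    return sum(guest_sizes_gb) / real_memory_gb


# The example from the text: 15 started guests of 1GB each, 10GB of real memory.
ratio = overcommit_ratio([1.0] * 15, 10.0)
print(f"{ratio:.1f}:1")  # prints "1.5:1"
```

Only started (logged on) virtual machines belong in the list of guest sizes; defined but inactive guests do not count toward the ratio.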
Guidelines for Acceptable Ratios
A number of factors determine what ratio is acceptable. These include:
- What percentage of virtual machines are idle/active?
- How much is shared memory exploited?
- Were the virtual machines defined with the appropriate amount of virtual memory?
- What memory management capabilities are being exploited (e.g. Cooperative Memory Management)?
- What sort of Service Level Agreements (SLAs) are there? Do they require 100% of transactions to meet a threshold, or something less restrictive such as 99.9%?
- The performance characteristics of the z/VM paging configuration. Are you using all-flash storage? Multiple channels? HyperPAV and zHPF?
Some guidelines to keep in mind are:
- When planning whether memory can be overcommitted in a z/VM LPAR, the most important thing is to understand the usage pattern and characteristics of the applications, and to plan for the peak period of the day. This will allow you to plan the most effective strategy for utilizing your z/VM system's ability to overcommit memory while meeting application-based business requirements.
- For z/VM LPARs where all started guests are heavily used production WAS servers that are constantly active, attempt memory overcommitment with caution.
- In other cases where started guests experience some idle time, overcommitment of memory is possible.
- As discussed, how much you can reasonably overcommit depends on a number of factors. While exceptions exist, if you overcommit more than 1.8 to 1 for production workloads or more than 3 to 1 for test environments, you should do a detailed analysis to ensure adequate performance.
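The last guideline can be expressed as a simple check. The sketch below is illustrative only: the names `GUIDELINE_RATIO` and `needs_detailed_analysis` are hypothetical, and the thresholds are the rule-of-thumb values quoted above, not hard limits.

```python
# Guideline ratios quoted in the text; beyond these, detailed analysis is advised.
GUIDELINE_RATIO = {"production": 1.8, "test": 3.0}


def needs_detailed_analysis(ratio, environment):
    """Return True when the virtual-to-real ratio exceeds the guideline."""
    return ratio > GUIDELINE_RATIO[environment]


print(needs_detailed_analysis(1.5, "production"))  # False
print(needs_detailed_analysis(2.0, "production"))  # True
print(needs_detailed_analysis(2.0, "test"))        # False
```

A ratio under the guideline does not guarantee adequate performance; the factors listed earlier (idle versus active guests, paging configuration, SLAs) still apply.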
Determining Impact of Memory Overcommitment
While you can plan for a particular level of overcommitment, it is valuable to validate the plan and measure any impact. Some very simple tools are available to do this, and VIR2REAL is one of them. It is a simple exec that can be downloaded to provide a point-in-time ratio for the started virtual machines. Note that it includes VDisk virtual memory as well. A link is included in the reference section of this article.
For more detailed analysis, you may want to use a performance monitor such as Performance Toolkit. We will briefly describe some of the key fields here and how to interpret them. For historical reasons, there are times when these reports refer to 'storage' when they are really discussing 'memory'.
FCX113 UPAGE - User Paging Activity Report
For each virtual machine, the Toolkit UPAGE report shows both the location of the pages making up the virtual machine and the movement of those pages. Let's look at the following example.
The "Paging Activity" section shows the rate (per second) of page movements between the three locations (main storage, expanded storage, and paging DASD). The movements that have the greatest impact tend to be the read-type operations, as these are the cases where the z/VM Control Program needs to make pages resident in order for the virtual machine to run. For Linux guests, some page fault requests may not stop the virtual machine, because Linux and z/VM handshake on some of these requests to allow Linux to run other processes when appropriate. The read-type requests are reflected in the "Page Read" and "Page Migration X>MS" (pages moved from expanded storage to main storage) fields.
FCX114 USTAT - User State Sampling Report
The User State Sampling Report is useful for determining delays to virtual machines. A subset of it is shown below:
- %PGW - percentage of samples depicting waiting on synchronous page faults.
- %PGA - percentage of samples depicting waiting on asynchronous page faults.
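To make the idea of state sampling concrete, the sketch below computes the share of samples in which a guest was seen in each state. It illustrates the general technique only and does not reproduce how Performance Toolkit derives its fields; the function name, the state labels other than PGW and PGA, and the sample data are all hypothetical.

```python
from collections import Counter


def state_percentages(samples):
    """Return the percentage of samples observed in each state."""
    total = len(samples)
    if total == 0:
        return {}
    counts = Counter(samples)
    return {state: 100.0 * n / total for state, n in counts.items()}


# Hypothetical data for one guest: 20 observations of its state.
samples = ["RUN"] * 12 + ["PGW"] * 3 + ["PGA"] * 1 + ["IDLE"] * 4
pct = state_percentages(samples)
print(pct["PGW"], pct["PGA"])  # 15.0 5.0
```

In this made-up data the guest was waiting on a synchronous page fault in 15% of the samples, which is the kind of delay the %PGW field is meant to expose.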
FCX143 PAGELOG - Total Paging Activity
The final report discussed here is the PAGELOG report, which shows overall z/VM system paging information over time. Again, we show a couple of subsets of the report. You'll see that there are three main sections, one for each of the three key areas: Expanded Storage, Real Storage, and Paging DASD.
References
The following are links to additional reference material:
- Understanding and Tuning z/VM Paging: This article describes the basic workings of z/VM paging, the metrics that describe the performance, and some of the tuning options.
- VIR2REAL Tool: A very simple tool to compute the Logged On Virtual to Real Memory Ratio.