Common Performance Problems and Solutions
I have over twenty logical partitions and each has a large number of logical
processors defined. In total the ratio of logical to physical processors is
over 10 to 1. Is this a problem even if they're just logical processors?
Solution: There are a couple of different rules of thumb. The most common is to not go above 2:1 logical to physical processors for any processor type. That is, if you have a mix of IFLs and CPs where the IFL ratio is 3:1 but the CP ratio is 1.5:1, this still breaks the rule. Configuring an IBM Z or LinuxONE server with a high ratio puts undue stress on the environment. This can cause elongated dispatch times, growing workload queues, and large increases in overhead at the PR/SM, hypervisor, and OS levels. At its worst, symptoms will include hangs as queues keep growing and can never catch up. See Tech Doc TD106388 for additional information.
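As a rough illustration of the 2:1 rule of thumb, the following Python sketch checks the ratio separately for each processor type. The function name and all processor counts are hypothetical and chosen only to reproduce the 3:1 IFL and 1.5:1 CP example above.

```python
# Rule of thumb from the text: keep the logical:physical ratio at or
# below 2:1 for every processor type. All counts here are hypothetical.
MAX_RATIO = 2.0

def overcommitted_types(logical, physical, max_ratio=MAX_RATIO):
    """Return {processor_type: ratio} for each type above max_ratio."""
    flagged = {}
    for cpu_type, logical_count in logical.items():
        ratio = logical_count / physical[cpu_type]
        if ratio > max_ratio:
            flagged[cpu_type] = ratio
    return flagged

# Hypothetical machine: 10 physical IFLs and 8 physical CPs.
physical = {"IFL": 10, "CP": 8}
# Logical processors summed across all partitions, per type.
logical = {"IFL": 30, "CP": 12}

print(overcommitted_types(logical, physical))
```

Because the check is per type, the IFL pool is flagged even though the CP pool is within the guideline.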
I believe I have a problem with my OSAs, as I see queues forming and overflowing.
What can I review to better understand what might be causing this?
Solution: The VSWITCH overflow counters are a gauge on how responsive z/VM and an OSA are in processing the VSWITCH UpLink Port. The following is a description of each counter:
Input Queue Overflow - The number of times z/VM went to process additional empty buffers on the input queue, but discovered all the existing buffers have been filled by the OSA. In this case the OSA is running faster than z/VM can process the incoming data. Insufficient processor resources or a lack of timely dispatching is the typical cause for this occurrence. Ideally, we want z/VM to process the queues faster than the OSA.
Output Queue Overflow - The number of times z/VM went to send a packet to the physical network but found that the output queue was full (all 128 buffers waiting for the OSA to process). In this case z/VM is running faster than the OSA can process the data. Most likely OSA's bandwidth is stretched to its maximum either by z/VM or some other LPAR. This may be normal, and typically occurs with large streaming TCP/IP workloads, given that z/VM typically runs faster than the OSA.
Overflows in general are not bad for short periods of time. Both the OSA and z/VM will queue packets waiting to be transmitted in order to prevent packet loss. A high overflow count only becomes problematic when the overflow condition lasts long enough to cause the queueing logic to start discarding packets. In this case you will also see the Uplink Port's TX or RX packet discard rate or the OSA's packet loss counters increase. In either case the overflow counters point to the culprit (OSA or z/VM).
I have a Linux virtual machine
that is fairly processor intensive. I've increased
the number of virtual CPUs defined for the virtual machine, but it did
not seem to improve the performance. Why?
Solution: There are a couple of things to remember in this scenario. First, remember that the share value assigned to a virtual machine is distributed across the started virtual CPUs. So if you have a virtual machine set with Relative 100 and move from 1 virtual CPU to 2 virtual CPUs without changing the Share setting, you really just have two Relative 50 virtual CPUs for the work. Another thing to remember is that if the workload in the virtual machine is dominated by a single process (or a small number of processes), each process can only be dispatched on one virtual CPU at a time. For example, if a single process consumes 30% of a workload, then running on more than a virtual 3-way will not be efficient, because a single virtual CPU of a 3-way is 33% of the potential resources. Adding more virtual CPUs in scenarios such as that can actually lower performance as the impact from the MP effect increases.
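The two points above come down to simple arithmetic, sketched here in Python. The function names are hypothetical, and the second function is only one way to read the 30% example (the largest n-way in which a single virtual CPU still covers the dominant process); it is not a z/VM formula.

```python
import math

def share_per_vcpu(relative_share, started_vcpus):
    """Share each started virtual CPU effectively receives."""
    return relative_share / started_vcpus

def max_useful_vcpus(dominant_fraction):
    """Largest n for which one virtual CPU (1/n of the guest) can still
    hold a process that consumes dominant_fraction of the workload."""
    return math.floor(1 / dominant_fraction)

# Relative 100 spread over 2 started virtual CPUs.
print(share_per_vcpu(100, 2))
# A single process that makes up 30% of the workload.
print(max_useful_vcpus(0.30))
```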
I heard that I should not mix my paging devices across different type of disk
(SCSI and ECKD) or even different sizes? Is this true? If so, why?
Solution: Yes, this is correct. You should use uniform devices for paging. They should be of the same type (either FCP SCSI EDEVs or traditional ECKDs) and all of the same size and speed. The z/VM Paging subsystem has a load balancing algorithm. To be effective, that algorithm assumes the devices are of the same size and basic characteristics. The calculations are done very differently for ECKD and EDEVs. Trying to mix the two will result in unpredictable, and potentially poor, performance.
I see a non-trivial number in my Linux Reports for %Steal. Is this
Solution: It depends. Current Linux distributions report %Steal as well as pct User and pct System. The %Steal field is a Linux view of the percent of time that it had work to run but was unable to run. There are a number of things that could contribute to this, and they are not all reasons to be alarmed. The most commonly known case is that z/VM was dispatching other virtual machines. To check for this case, compare to %CPU Wait in the z/VM state sampling report. Another case could be that z/VM was executing on behalf of the Linux virtual processor; compare to the CP CPU usage of the Linux virtual machine. This could be z/VM doing useful things like I/O channel program translation for an I/O Linux issued. A different aspect of this could be z/VM processing a page fault. A lesser known case is that Linux yielded its time slice to z/VM via diagnose 0x9C instead of spinning on a formal spin lock. Examine diagnose rates to see if this is a possibility. Also remember the z/VM partition might not have been able to run due to another logical partition being dispatched at the LPAR level. So while the field is called Steal, it doesn't always mean stealing resources from one virtual machine to give to another.
The AVGPROC field on INDICATE LOAD command is showing 100%, but
Performance Toolkit and other monitors show a lower value. Why?
Solution: The INDICATE LOAD command AVGPROC field gives the processor utilization out of 100% for the number of processors VM is running on. It is based on processor time used and voluntary wait time. Therefore, it is skewed when running in an LPAR that uses shared processors. It will further differ because it reports a smoothed average, while other tools report averages for a given monitor interval. Another way of looking at it is that z/VM was using 100% of what LPAR was giving it. For this reason, we don't recommend INDICATE LOAD for capacity planning.
In the Performance Toolkit User State Sampling Report (FCX114),
the Other state (%OTH) is showing a non-zero value. What is this?
Solution: The Other state is a catch-all for states of virtual processors that are not one of the existing known states. When a scenario resulting in Other happens often enough, IBM tends to create a new state to define that scenario, or to redefine an existing state to include it where that makes sense. One scenario that started appearing as Other around 2007 is a virtual MP configuration where the base virtual processor is idle but the other virtual processors in the configuration are in the dispatch list. The base virtual processor must stay in the dispatch list when any of the other virtual processors are there. The monitor was changed in z/VM V6.2 to adjust for this scenario.
We recently added additional paging volumes on a newer control unit,
as our page space utilization was getting high. Now we see less
consistent paging performance.
Solution: The paging algorithms are designed to work with paging volumes that have the same characteristics. IBM does not recommend mixing different size or different speed devices for paging space. Undesirable effects may occur.
In the user state sampling report on my performance monitor product,
I see a large percentage of samples for
waiting on Test Idle (%TIW). It is not
clear to me which resource is experiencing a shortage.
Solution: There is no shortage for this virtual machine, other than perhaps a shortage of work. Test Idle is a state the virtual machine is put into when it first goes idle (loads a wait state). To avoid the overhead of dropping a virtual machine from the dispatch list and then immediately re-adding it, z/VM keeps the virtual machine in the dispatch list in a state called test idle for a period of time in case other work may start. After 300 milliseconds of being in test idle, z/VM will drop the user to the dormant list. So time spent in test idle is really a measure of how long the virtual machine spent idle for short periods of time. It is not a problem by itself. This value also shows up in the Performance Toolkit User Detail screen (FCX115) as PSW Wait.
I have a channel that is no longer being reported on my performance
Solution: Check to see if the channel was taken offline via the HMC without first taking the devices on the channel offline. If this is the case, the problem can sometimes be resolved by varying the channel off and then back on at the z/VM level (via VARY CHPID). You may also need to vary the path online (via VARY PATH).
Performance Toolkit (or insert favorite monitor here) is giving me
alerts about the C1ETS being too high.
Solution: The C1ETS stands for class 1 Elapsed Time Slice. Each scheduler class has an Elapsed Time Slice (ETS) associated with it. The Class 1 ETS is dynamically adjusted by the scheduler. All the other time slices are multiples of the C1ETS (classes 0/2/3 multiplication factors are 6/8/48 respectively). The scheduler adjusts C1ETS in order to try and keep 85% of the transactions as trivial (that is within the first ETS). On systems where there are guests that never go truly idle, the transactions are very infrequent and therefore can cause the scheduler to increase the C1ETS. This isn't necessarily a problem since the transactions are not real transactions.
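The relationship between the class time slices can be written out directly. The Python sketch below uses only the multiplication factors quoted above; the function name and the sample C1ETS value are hypothetical, and no unit is assumed.

```python
# Multiplication factors relative to the class 1 ETS, as quoted in the
# text: classes 0/2/3 use 6/8/48. Class 1 is the base (factor 1).
CLASS_FACTORS = {0: 6, 1: 1, 2: 8, 3: 48}

def elapsed_time_slices(c1ets):
    """Return the elapsed time slice of each class for a given C1ETS."""
    return {cls: c1ets * factor for cls, factor in CLASS_FACTORS.items()}

# Hypothetical C1ETS value, in whatever unit the monitor reports.
print(elapsed_time_slices(2))
```

Since every other class is a multiple of C1ETS, a scheduler adjustment to C1ETS moves all four time slices together.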
I am looking at my performance monitor and I see transaction rates
and transaction response times. However, they make no sense to me.
There also appear to be very large spikes in these numbers. My
workload is primarily Linux guests.
Solution: You can ignore these transaction-related numbers. They are based on the Control Program's (CP) view of a transaction, which is actually an estimation. CP basically defines a transaction as ending whenever a virtual processor goes long-term dormant. That is, a transaction will span test idle grace periods. So in many Linux environments, the guests do not go long-term dormant frequently enough to make the transaction counts of value.
I increased the number of virtual processors for a given virtual
machine, but it doesn't seem to be able to use additional processor
Solution: First, did you really need to add virtual processors? Often you can achieve the same result by increasing the Share setting. Second, can your workload effectively use additional processor resources? You can check whether your virtual machine is using the existing processors with Performance Toolkit or another monitor. In most cases, if it is not using over 80% of the existing virtual processors, there is no value in adding more. Also, some workloads do not scale, and one needs to be aware of that. Third, if you increased the number of virtual processors, you might need to increase the Share setting, as it is distributed across the virtual processors for a given machine. For example, suppose a 1-way virtual machine with the default relative share setting is given a second virtual processor. If the Share setting is left the same, both virtual processors are run as if they were each Share Relative 50 virtual machines. A common rule is to increase the Share setting in proportion to the number of virtual processors for the given virtual machine. So in our example, set the share to Relative 200 for the virtual 2-way virtual machine. This is a simple approach, and you might also find it helpful to change share values for other considerations.
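The proportional rule in the third point can be sketched as a one-line calculation. This Python example uses a hypothetical function name and simply scales the Relative share by the change in virtual processor count; it is a sketch of the rule of thumb, not of how the scheduler works.

```python
def scaled_relative_share(current_share, old_vcpus, new_vcpus):
    """Scale a Relative share so the per-virtual-processor share stays
    the same when the number of virtual processors changes."""
    return current_share * new_vcpus / old_vcpus

# The example from the text: default Relative 100, going from 1 to 2.
print(scaled_relative_share(100, 1, 2))
```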
Very high processor overhead with z/OS guests using virtual
CTCs and CFs
Solution: Check whether the z/OS virtual machines are attempting to use IRD and WLM. If so, disable that function.
I'm seeing high processor utilization when running a VM system
as a guest of VM when running on LPAR.
Solution: This type of configuration results in three levels of dispatching (SIE - start interpretive execution): LPAR, 1st level VM, and 2nd level VM. The hardware implementation supports 2 layers of SIE. However, when the third level is reached, a layer of SIE needs to be virtualized which is very expensive in terms of processor time. This third level is not supported for production work. See Preferred Guest Migration Considerations for related discussion.
High channel utilization that cannot be explained by activity on
active logical partitions.
Solution: Check if you have a partition that is sitting in a wait state (such as 1010 for z/VM). If you have a Storage Server (e.g. DS8000) or other device that is signaling all known partitions for things like state change (flash copies), the state changes are reflected to all active partitions whether they are able to deal with them or not. Deactivate the idle partition to avoid extra traffic.
Linux guest has very slow response when connecting to the network.
Solution: Several things can affect performance when connecting, whether by ftp, telnet or any other method. The most common thing to check is:
- Make sure your specification for the name server is correct. Look in /etc/resolv.conf.
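One way to review the name server specification is to list the `nameserver` entries and confirm each one is reachable and correct. The Python sketch below parses text in the resolv.conf format; the function name is hypothetical, and the sample addresses come from a documentation-only range.

```python
def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Sample content only; the addresses are placeholders.
sample = """\
# example configuration
search example.com
nameserver 192.0.2.53
nameserver 192.0.2.54
"""
print(nameservers(sample))
```

On a real guest the text could be read with `open("/etc/resolv.conf").read()`. An unreachable first entry typically shows up as a delay at connect time while the resolver waits before trying the next server.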
I shut down a number of my Linux systems running in virtual machines,
but I still see them consuming storage in memory or on paging DASD.
Solution: Since VM virtualizes the architecture, it cannot free up backing storage unless that is explicitly requested. In the case where you shut down the Linux system but keep the virtual machine logged on or disconnected, VM will continue to maintain the memory that Linux last used. To clear this up, either log off the virtual machine or issue a CP SYSTEM CLEAR command. If you reboot Linux using the IPL command with the CLEAR option, this will also clear memory.
Problem: I get the following error messages on the SFS server:
DMS5GM3001W The file pool server has used up 92 percent of the
DMS5GM3001W available virtual storage
However, when I look at Query Filepool Report and compare the Virtual Storage Size in Bytes to Virtual Storage Highest Value there does not appear to be a problem.
Solution: The gotcha is that the second value is in Kbytes, while the first is really in bytes.
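The unit mismatch is easy to see with numbers. In the Python sketch below, both report values are hypothetical, the function name is made up for illustration, and one Kbyte is assumed to be 1024 bytes.

```python
def pct_virtual_storage_used(size_bytes, highest_kbytes):
    """Percent of virtual storage used, after converting the
    'highest value' from Kbytes to bytes (1 Kbyte = 1024 bytes assumed)."""
    return 100.0 * (highest_kbytes * 1024) / size_bytes

# Hypothetical Query Filepool Report values.
size_bytes = 33_554_432      # Virtual Storage Size in Bytes
highest_kbytes = 30_147      # Virtual Storage Highest Value (Kbytes)

# Naive comparison that treats both numbers as bytes.
print(100.0 * highest_kbytes / size_bytes)
# Comparison with the units aligned.
print(pct_virtual_storage_used(size_bytes, highest_kbytes))
```

The naive comparison suggests almost nothing is in use, while the unit-corrected figure is consistent with the 92 percent in the warning message.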
For my older style network connections like LCS and Virtual CTC,
I have 2 devices defined: one for input and
one for output. The utilization of the input device appears to be
near 100% all the time, while the output line is much lower.
Solution: TCP/IP and other network servers may use seldom-ending channel programs for I/O on the device line that receives data. If you look at the breakdown of service time (disconnect, pending, and connect time) you will see that the near-100% device has most of the service time in the disconnect state. This is really the idle time between actual I/Os in this scenario. The connect time is the time data is actually being moved on the channel subsystem. The connect times for the two devices should be fairly close (assuming the same amount of data is coming in as going out). So the high utilization is only an anomaly, and not something to be concerned about.
I used the QMDC tool from the z/VM download library, but it seems to
give strange results for SFS minidisks.
Solution: There is a problem in that the VDEVIOCA counter is not updated in some scenarios, particularly those involving multiple block I/O requests. This results in low MDC rates as reported by QMDC. However, MDC is working. Use Performance Toolkit or similar tool to look at the real I/O avoided due to MDC at the real device level.
I see large amounts of CP CPU time being charged to my DB2 Server for
VSE and VM (SQL/DS).
Solution: Check if you are using minidisks mapped to VM data spaces for SQL, and if so, make sure those minidisks are disabled for MDC. The large amount of CP processor time is most likely a result of overhead in searching for data from that minidisk in MDC that needs to be flushed when SQL uses the SAVE function to force the data in the data space to be written to the mapped minidisks.
I have a processor with storage that can be configured as either
central or expanded storage. Which is better?
Solution: Typically, it is better to have more real storage than expanded storage. However, rule number one applies. "It depends." See Configuring Processor Storage for more details.
I migrated from VM/ESA 1.2.1 to VM/ESA 2.2.0 and now my full pack
minidisks (FPMs) are being minidisk cached. I don't want them to be
eligible for MDC.
Solution: Yes, prior to VM/ESA 1.2.2, full pack minidisks were not eligible for MDC. Now FPMs defined via VOLSER are eligible. Unfortunately, the MINIOPT directory statement is not valid for FPMs and the DASDOPT directory statement does not include a NOMDC option. The following are alternatives.
- Use system config file and the RDEV statement to set MDC off for the entire volume. This is most effective when the FPMs are used exclusively for guest systems.
- Use system config file and the RDEV statement to set MDC off by default (DFLTOFF) for the volume and then explicitly set MDC on for minidisks (via MINIOPT) overlaid by the FPM. The downside to this solution is all the work to explicitly turn on MDC for each new minidisk.
- Autolog a userid that links to all the FPMs and then issues SET MDC MDISK commands to disable MDC for those FPMs. This is effective but has a few more moving parts than we typically like.
- Allow MDC to be on by default for the volume, but use SET MDC INSERT OFF in the backup program (or whichever userid you do not want to use MDC). This can be appropriate for backup software and scanning software.
- Just let the system take the defaults. In many systems the fair share algorithms will keep any one user from abusing MDC.
See Minidisk Cache Guidelines for more details.
The RTM SYSDASD display or other vendor products show paging activity
to non-paging devices.
Solution: This is most likely paging activity for VM data spaces that are mapped to minidisks. This allows an application to just reference a storage location in the VM data space and have CP handle making sure the data is moved from the minidisk to virtual storage. CP uses the paging subsystem to accomplish this. See VM Data Spaces for more details.
INDICATE LOAD, Monitor, and other sources say one of my processors is
100% busy. But it is clear there really is not that much work. Other
processors are not near 100%.
Solution: Check if the processor is a dedicated processor. If so, it will look to VM as if it is 100% or near 100% busy. Some background: for dedicated virtual processors, CP marks the virtual processor as being eligible for the wait state assist. This means that when the virtual processor loads a wait state, the hardware will not automatically exit SIE. The virtual processor continues to run under SIE until an interrupt occurs. The virtual machine will exit SIE eventually and CP will update the timers appropriately but while the virtual processor has been in a wait state, VM thinks it has been running the entire time. For dedicated processors, one really needs to look at the 2nd level guest's performance monitor tools to get accurate measures.
Problem: It takes 15 minutes for the spool initialization on my system to complete. There are 90,000 files in the system.
Solution: This is as expected. While VM/ESA spool file initialization has been improved greatly over VM/XA, it still takes a noticeable amount of time for the spool file to initialize with large numbers of spool files. Additional information can be found on the Understanding Spool File Initialization Performance page.
Disconnect time for certain volumes is huge.
Solution: Validate that processor microcode is current. Disconnect time is recorded by the hardware in the subchannel measurement block in a 4 byte field. Several processor lines have had a problem where they accidentally increment the first byte out of sync with the other 3 bytes. Ouch!
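To see why one stray increment makes the disconnect time look huge, the Python sketch below simulates a single extra increment of the first byte of a 4-byte counter. It assumes the first byte is the high-order byte and says nothing about the time unit of the field; the function name and the sample value are hypothetical.

```python
def with_stray_first_byte_increment(value):
    """Simulate one extra increment of the first (assumed high-order)
    byte of a 4-byte counter, wrapping at 32 bits."""
    return (value + (1 << 24)) & 0xFFFFFFFF

true_value = 5_000  # hypothetical accumulated disconnect time, in field units
reported = with_stray_first_byte_increment(true_value)
print(true_value, reported, reported - true_value)
```

A single stray increment adds 2**24 (16,777,216) units, which is far larger than any plausible real disconnect time for an interval.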
After migration to VM/ESA 1.2.2, minidisk cache has very
high steal rate and low hit ratio.
Solution: Make sure there is some central storage being used for minidisk cache. This is especially important when the virtual I/O buffers are not page aligned. In that case, CP cannot move data directly from expanded storage to the user's buffer. There are stats in RTM and monitor data that reflect this.
After migration to VM/ESA 1.2.2, minidisk cache seems less
efficient, and the MDC inserts rejected due to fair share
limit have increased.
Solution: Make sure the directory option NOMDCFS is used for all the appropriate server virtual machines. Also note that the fair share limit minimum value in 1.2.2 may be too low. This has been changed from 8 to 150 in VM/ESA 2.1.0. The APAR to VM/ESA 1.2.2 is VM59590 (PTF UM27424).
The limit can also be modified by changing the values in HCPTCMBK to the following:
   TCMFSLIM DC    F'150'    Fair share insert limit for this interval
   TCMFSLMM DC    F'150'    Minimum fair share insert limit per interval
And then re-assemble HCPTCM.
After migration to VM/ESA 1.2.2, minidisk cache hit
ratio went down (got worse).
Solution: This may not be a problem. If the hit ratio is from RTM/ESA, note that prior to VM/ESA 1.2.2, RTM computed hit ratio by only looking at MDC eligible I/Os, while in 1.2.2 and beyond all virtual I/Os are considered. If the hit ratio is from the INDICATE LOAD command (which only looks at eligible I/Os), note that the count of eligible I/Os may be greater in VM/ESA 1.2.2. So while the hit ratio went down, total I/Os avoided went up. I/Os avoided is reported on VMPRF DASD_BY_ACTIVITY reports.
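A small worked example shows how the hit ratio can fall while the number of I/Os avoided rises. All counts in this Python sketch are hypothetical and the function name is made up for illustration; only the relationship between the two measures is the point.

```python
def hit_ratio(hits, ios_considered):
    """Hit ratio as a percentage of the I/Os considered."""
    return 100.0 * hits / ios_considered

# Hypothetical interval before migration: fewer I/Os are MDC eligible.
before_eligible, before_hits = 1000, 900
# Hypothetical interval after migration: more I/Os are MDC eligible.
after_eligible, after_hits = 1800, 1440

print(hit_ratio(before_hits, before_eligible), before_hits)
print(hit_ratio(after_hits, after_eligible), after_hits)
```

In this made-up case the ratio drops from 90% to 80%, yet 540 more real I/Os are avoided, which is the measure that matters.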
After migration to VM/ESA 1.2.2, system is paging more.
Solution: In VM/ESA 1.2.2, minidisk cache will use real storage as part of the cache by default. Having less real storage may increase paging, but the design's hope is that overall performance is better. Check on the total real I/O rates to DASD. If all goes as planned, the I/O increase to paging DASD will be easily offset by the I/O decrease for virtual DASD I/O. If it is still a concern, set a more appropriate limit for real storage (SET MDC STORAGE 0M ?M). It is strongly recommended you do not set real storage off for MDC entirely. Note that with the MDC changes more data is eligible for MDC (1K and 2K formatted minidisks or VSE data for example). Therefore, MDC may require additional storage, which would lead to less real DASD I/O for user I/O at the cost of extra paging I/O.
Guest with multiple virtual processors waits for CPU.
Solution: Increase the Share setting. Basically, the current share setting is divided among the virtual processors. For example, a virtual 4-way with the default Relative 100 Share is treated as four Relative 25 share processors in making scheduler decisions.
System performance slows down whenever DIRECTXA is issued.
Solution: Check if the DIRECTXA target is a full pack minidisk. If so, data in the minidisk cache for any other minidisks overlapped by the full pack is invalidated when the directory writes occur. This should not be as significant after VM/ESA 1.2.2, which does a better job of handling overlapping minidisks.
VSE guests get stuck in E-list (seen with IND QUEUES EXP
Solution: Increase SET SRM STORBUF settings. A rough starting point could be to multiply by 3 (100 85 75 becomes 300 255 225). You might also need to increase LDUBUF values, but just try STORBUF to start.
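The "multiply by 3" starting point can be applied to whatever the current settings are. This Python sketch uses hypothetical function names and only formats the resulting command text for review; it does not issue anything to CP.

```python
def scale_storbuf(values, factor=3):
    """Multiply each STORBUF percentage by the same factor."""
    return tuple(v * factor for v in values)

def storbuf_command(values):
    """Format the scaled values as SET SRM STORBUF command text."""
    return "SET SRM STORBUF " + " ".join(str(v) for v in values)

current = (100, 85, 75)          # the starting values used in the text
print(storbuf_command(scale_storbuf(current)))
```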
Recent migration shows higher paging and less available
Solution: Check if default trace table size being used and adjust accordingly. Can set without regen on VM/ESA 1.2.0 and higher with CP SET TRACEFRAMES command.
Poor response time with much more paging in system. CMS
working sets have increased.
Solution: Check for missing saved segments (VMLIB, VMMTLIB, Help, Pipelines, and modules moved from CMS nuc to S-disk, installed as lsegs CMSQRYH and CMSQRYL in VM/ESA 1.2.2). The Q NSS MAP command can be helpful in determining which segments are being used. The CMS NUCXMAP * (SEGINFO and QUERY SEGMENT commands can be helpful to see what is being loaded for CMS.
Recent migration to or over VM/ESA 1.2.0 shows higher paging
especially during morning crunch period with high logon rate.
CMS 9 (VM/ESA 1.2.0) references about 25 additional pages
during initialization. Therefore high logon rate means high
IPL CMS rate. Also associated with shops where users have
short sessions (users log on each time they want to check mail
and then log off right away).
Solution: Double check all segments to make sure you are not making it worse than necessary. Try saving storage elsewhere (exploit segments, SAVEFD, tracetable, etc). It may require more storage. Some shops have found relief by not forcing idle users as quickly, or by autologging users at 4am so the overhead of the IPL is over by the time people come in.
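The effect of the roughly 25 extra pages per CMS initialization scales directly with the logon rate, as this Python sketch shows. The function name and the logon count are hypothetical.

```python
# Approximate figure from the text: CMS 9 references about 25
# additional pages during initialization.
EXTRA_PAGES_PER_CMS_IPL = 25

def extra_page_references(logons, extra_pages=EXTRA_PAGES_PER_CMS_IPL):
    """Additional page references caused by a given number of logons."""
    return logons * extra_pages

# Hypothetical morning crunch: 400 logons in the peak period.
print(extra_page_references(400))
```

Spreading the same logons over a longer period, for example by autologging users early, leaves the total unchanged but moves it out of the peak.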
Paging seems less efficient as seen in higher service time for
Solution: Check that you are not running with the paging/spooling config out of the box. As delivered, page, spool, and user areas are mixed on the same volumes as small areas. This causes bad seek patterns and interferes with the Paging subsystem's never-ending channel program, and small areas are bad for block paging efficiency. We suggest you reconfigure with dedicated page and dedicated spool volumes.
You just upgraded processor to relieve CPU bottleneck.
New processor is 40% faster, but performance didn't improve
Solution: Check for latent demand and whether a different bottleneck (I/O or storage) has been hit first.
You just migrated to 9221 and didn't get expected performance.
Solution: Check if ESA or 370. Could be caused by misleading marketing information (i.e. sized an ESA system based on 370 planning Mips). There are some things we can do to soften the blow:
- exploit MDC if possible
- increase VTAM delay factor
- increase DSPSLICE
DASD on 3990 Cache Controllers do not appear to be caching.
Solution: Do QUERY DASD DETAILS for the volume and make sure CACHE is Yes for both subsystem and device for regular cache. For DASD fast write, NVS must be Y for subsystem and DFW must be Y for device. Use SET CACHE, SET NVS, and SET DASDFW to correct.
Short sporadic periods of terrible response time with lots
of expanded storage (> 1GB) on VM/ESA 1.2.1 or earlier. The
sporadic hits map to high system spin time (%SP on RTM SLOG
Solution: Use RETAIN XSTORE to fix amount of storage used for MDC. The sporadic hits are caused by holding the MDC lock while re-hashing the hash table for different size cache. Fixing the amount of storage for MDC avoids the size change.
Devices show 0% on 9221. They are attached
via integrated adapters. Device utilization in RTM and Monitor
is computed by using the timing values supplied in the
subchannel measurement block (Connect, Disconnect, Pending, etc)
which is updated by the hardware. Unfortunately, the IA hardware
doesn't support the timing values and therefore everything shows
up as zero.
Solution: Some information can be determined by looking at queueing on devices. Monitor does hi-frequency sampling on queue value in RDEV and VMPRF reports it in DASD_BY_ACTIVITY report. This value will not be accurate for page, spool, or mapped minidisks since the paging subsystem keeps its own queueing stats. For page, spool, and mapped mdisk see the VMPRF report DASD_SYSTEM_AREAS or the RTM SYSDASD display.
System seems sluggish with no obvious reason.
Solution: Check whether the TOD was changed at the last IPL, and if so, whether it was changed by a large amount of time. If the TOD is changed by more than a few minutes, unpredictable results can occur, since timer blocks that were supposed to go off for system feedback algorithms may not go off for a long period of time, and that could be bad. Note, this does not apply to systems with APAR VM60324 applied or at VM/ESA 2.2.0 or higher.
Performance Toolkit reports misleading I/O response times
for devices that are part of a Parallel Access Volumes (PAV) group.
Solution: There is a known problem with z/VM 5.2.0's Performance Toolkit that causes it to report I/O response times incorrectly in certain PAV situations. Refer to the z/VM 5.2.0 performance management discussion for more information.