Common Performance Problems and Solutions
I believe I have a problem with my OSAs and queues forming and overflowing.
What can I review to better understand?
The VSWITCH overflow counters are a gauge of how responsive z/VM and OSA
are in processing the VSWITCH UpLink Port. The
following is a description of each counter:
Input Queue Overflow -
The number of times z/VM went to replenish additional empty buffers on
the input queue, but discovered all the existing buffers had already been
filled by OSA. In this case OSA is running faster than z/VM can process
the incoming data. Insufficient processor resources or dispatching delays
are the typical cause of this occurrence. Ideally, we want z/VM to run
as fast as or faster than OSA.
Output Queue Overflow -
The number of times z/VM went to send a packet to the physical network but
encountered the output queue full (all 128 buffers waiting for OSA to
process). In this case z/VM is running faster than OSA can process the
data, which is the typical situation, given that z/VM usually runs faster
than OSA. Most likely OSA's bandwidth is stretched to its maximum, either
by z/VM or by some other LPAR. This is not necessarily bad and typically
occurs with large streaming TCP/IP workloads.
Overflows in general are not bad for short periods of time. Both OSA and
z/VM will queue packets waiting for transmission to prevent packet loss.
A high overflow count only becomes problematic when the overflow condition
lasts long enough to cause the queuing logic to start discarding packets.
In that case you will also see the UpLink Port's TX or RX packet discard
rate or OSA's packet loss counters go up. In either case the overflow
counters point to the culprit (OSA or z/VM).
I have a Linux virtual machine
that is fairly processor intensive. I've increased
the number of virtual CPUs defined for the virtual machine, but it did
not seem to improve the performance. Why?
There are a couple of things to remember in this scenario. First, the
share value assigned to a virtual machine is distributed across the
started virtual CPUs. So if you have a virtual machine set with Relative 100
and move from 1 virtual CPU to 2 virtual CPUs without changing the Share
setting, you really just have two Relative 50 virtual CPUs for the work.
Another thing to remember is that if the workload in the virtual machine
is dominated by a single process (or a small number of processes), that
process can only be dispatched on one virtual CPU at a time. For example,
if a single process consumes 30% of a workload, then running on more than
a virtual 3-way will not be efficient, since a single virtual CPU of a
3-way is 33% of the potential resources. Adding more virtual CPUs in
scenarios such as that can actually lower performance as the impact from
the MP effect increases.
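The arithmetic behind this rule of thumb can be sketched as follows. This is a minimal illustration, not a sizing tool; the 30% figure is the example from the text and `max_useful_vcpus` is a hypothetical helper name.

```python
# Sketch: why extra virtual CPUs don't help a workload dominated by one
# process. A single process runs on one vCPU at a time, so once each
# vCPU's slice of the machine (1/n) drops below the dominant process's
# share of the workload, extra vCPUs add MP overhead without adding speed.

def max_useful_vcpus(dominant_share):
    """Largest n-way for which one vCPU (1/n of the machine) still
    covers the dominant process's share of the workload."""
    n = 1
    while 1.0 / (n + 1) >= dominant_share:
        n += 1
    return n

# One process is 30% of the workload: a vCPU of a 3-way is ~33% of the
# machine, but a vCPU of a 4-way is only 25%, so beyond a 3-way the
# dominant process gains nothing.
print(max_useful_vcpus(0.30))  # -> 3
```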
I heard that I should not mix my paging devices across different types of disk
(SCSI and ECKD) or even different sizes. Is this true? If so, why?
Yes, this is correct. You should use uniform devices for paging. They should
be of the same type (either FCP SCSI EDEVs or traditional ECKDs) and all of the
same size and speed. The z/VM Paging subsystem has a load balancing algorithm.
To be effective, that algorithm assumes the devices are of the same size and
basic characteristics. The calculations are done very differently for ECKD and
EDEVs. Trying to mix the two will result in unpredictable, and potentially poor,
performance.
I see a non-trivial number in my Linux Reports for %Steal. Is this a problem?
Current Linux distributions (RHEL 5 & SLES 10) report %Steal as well as pct User and pct System.
The %Steal field is the
Linux view of the percentage of time that it had work to run but was unable to run.
There are a number of things that could contribute to this and they are not all reasons to
be alarmed. The most commonly known case is
z/VM was dispatching other virtual machines. To check for this case,
compare to %CPU Wait in z/VM state sampling report.
Another case could be that
z/VM was executing on behalf of the Linux virtual processor; compare to the CP CPU usage of the Linux virtual machine. This
could be z/VM doing useful things like I/O channel program translation for an I/O that Linux issued. A different aspect of
this could be z/VM processing a page fault. A less known case is that
Linux yielded its time slice to z/VM via Diagnose X'9C' instead of spinning on a formal spin lock. Examine diagnose rates
to see if this is a possibility. Also remember the
z/VM partition might not have been able to
run due to another logical partition being dispatched at the LPAR level.
So while the field is called Steal, it doesn't always mean stealing resources from one virtual machine to give to
another.
The AVGPROC field on INDICATE LOAD command is showing 100%, but
Performance Toolkit and other monitors show a lower value. Why?
The INDICATE LOAD command AVGPROC field gives the processor
utilization out of 100% for the number of processors VM is running on.
It is based on processor time used and voluntary wait time. Therefore,
it is skewed when running in an LPAR when the partition is using shared
processors. It will further differ by the fact that it reports a
smoothed average, while other tools report averages for given monitor
intervals. Another way of looking at it: z/VM was using 100% of what the
LPAR was giving it. For this reason, we don't recommend INDICATE LOAD for
detailed performance analysis.
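The distinction can be sketched numerically. The interval and CPU-time figures below are made up for illustration; the function names are hypothetical, not actual command output fields.

```python
# Sketch of why INDICATE LOAD's AVGPROC can read 100% while other
# monitors show less. AVGPROC is based on processor time used plus
# voluntary wait time, i.e. utilization of what the LPAR actually gave
# z/VM, not utilization of the whole interval.

def avgproc(cpu_used, voluntary_wait):
    """AVGPROC-style view: busy as a fraction of used + voluntary wait."""
    return 100 * cpu_used / (cpu_used + voluntary_wait)

def interval_view(cpu_used, interval):
    """Monitor-style view: busy as a fraction of the whole interval."""
    return 100 * cpu_used / interval

# In a 60s interval the shared-processor partition received only 30s of
# CPU, used all of it, and had no voluntary wait:
print(avgproc(30, 0))         # -> 100.0 (z/VM used all it was given)
print(interval_view(30, 60))  # -> 50.0  (but only half the interval)
```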
In the Performance Toolkit User State Sampling Report (FCX114),
the Other state (%OTH) is showing a non-zero value. What is this?
The Other state is a catch-all for states of virtual processors that
are not one of the existing known states. When a scenario resulting
in Other happens often enough, IBM tends to create a new state to define
that scenario, or redefine an existing state to include it where that
makes sense. One scenario that started appearing around 2007 for Other
is a virtual MP configuration where the base virtual processor is idle
but the other virtual processors in the configuration are in the
dispatch list. The base virtual processor must stay in the dispatch
list when any of the other virtual processors are there. The monitor
was changed in z/VM V6.2 to account for this scenario.
We recently added additional paging volumes on a newer control unit,
as our page space utilization was getting high. Now we see less
consistent paging performance.
The paging algorithms are designed to work with paging volumes that
have the same characteristics. IBM does not recommend mixing different
size or different speed devices for paging space. Undesirable
effects may occur.
In the user state sampling report on my performance monitor product,
I see a large percentage of samples for
waiting on Test Idle (%TIW). It is not
clear to me which resource is experiencing a shortage.
There is no shortage for this virtual machine, other than perhaps
a shortage of work. Test Idle is a state the virtual machine is put
into when it first goes idle (loads a wait state). To avoid the overhead
of dropping a virtual machine from the dispatch list and then
immediately re-adding it, z/VM keeps the virtual machine in the
dispatch list in a state called test idle for a period of time
in case other work may start. After 300 milliseconds of being in test
idle, z/VM will drop the user to the dormant list. So time spent in
test idle is really a measure of how long the virtual machine spent
idle for short periods of time. It is not a problem by itself.
This value also shows up in the Performance Toolkit User Detail
screen (FCX115) as PSW Wait.
I have a channel that is no longer being reported on my performance monitor.
Check to see if the channel was taken offline via the HMC without
first taking the devices on the channel offline. If this is the
case, the problem can sometimes be resolved by varying the channel
off and then back on at the z/VM level (via VARY CHPID). You may also need to vary
the path online (via VARY PATH).
Performance Toolkit (or insert favorite monitor here) is giving me
alerts about the C1ETS being too high.
The C1ETS stands for class 1 Elapsed Time Slice. Each scheduler
class has an Elapsed Time Slice (ETS) associated with it. The
Class 1 ETS is dynamically adjusted by the scheduler. All the
other time slices are multiples of the C1ETS (classes 0/2/3
multiplication factors are 6/8/48 respectively). The scheduler
adjusts the C1ETS to try to keep 85% of transactions trivial
(that is, completed within the first ETS). On systems where
there are guests that never go truly idle, transactions are
very infrequent and can therefore cause the scheduler to
increase the C1ETS. This isn't necessarily a problem, since
those transactions are not real transactions.
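The relationship between the C1ETS and the other classes' time slices can be sketched directly from the multiplication factors given above. The 50 ms starting value is purely illustrative.

```python
# Sketch: the other scheduler classes' elapsed time slices are fixed
# multiples of the dynamically adjusted class 1 ETS (classes 0/2/3 use
# factors 6/8/48, per the text).

C1_MULTIPLIERS = {0: 6, 2: 8, 3: 48}

def time_slices(c1ets_ms):
    """Return each class's elapsed time slice for a given C1ETS (ms)."""
    slices = {1: c1ets_ms}
    for cls, factor in C1_MULTIPLIERS.items():
        slices[cls] = c1ets_ms * factor
    return slices

# With a hypothetical C1ETS of 50 ms:
print(time_slices(50))  # -> {1: 50, 0: 300, 2: 400, 3: 2400}
```

This is why a rising C1ETS stretches every class's time slice, not just class 1's.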
I am looking at my performance monitor and I see transaction rates
and transaction response times. However, they make no sense to me.
There also appear to be very large spikes in these numbers. My
workload is primarily Linux guests.
You can ignore these transaction related numbers. They are based
on the Control Program's (CP) view of a transaction, which is
actually an estimation. CP basically defines a transaction as ending
whenever a virtual processor goes long-term dormant; that is, a
transaction will span test idle grace periods. So in many Linux
environments, the guests never go long-term dormant frequently
enough for the transaction counts to be of value.
I increased the number of virtual processors for a given virtual
machine, but it doesn't seem to be able to use additional processor
resources. Why?
First, did you really need to add virtual processors? Often you can
achieve the same result by increasing the Share setting. Second,
can your workload effectively use additional processor resources? You
can check whether your virtual machine is using its existing processors
in Performance Toolkit or another monitor; in most cases, if it is not
using over 80% of the existing virtual processors, there is no value in
adding more. Also, some workloads do not scale, and one needs to be
aware of that. Third, when you increase the number of virtual processors,
you might need to increase the Share setting, as it is distributed across
the virtual processors of a given machine. For example, suppose a 1-way
virtual machine with the default relative Share setting is given a second
virtual processor. If the Share setting is left the same, both virtual
processors run as if each were a Share Relative 50 virtual machine. A
common rule is to increase the Share setting in proportion to the number
of virtual processors for the given virtual machine. So in our example,
set the Share to Relative 200 for the virtual 2-way.
This is a simple approach and you might find changing share values for
other considerations also helpful.
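The proportional-Share rule above amounts to simple division. This is a sketch of the arithmetic only; `per_vcpu_share` is a hypothetical name, and real dispatching involves more than this ratio.

```python
# Sketch: z/VM distributes a virtual machine's relative share across
# its started virtual processors, so adding vCPUs without raising the
# Share setting dilutes each vCPU's weight.

def per_vcpu_share(relative_share, vcpus):
    """Effective relative share of each started virtual processor."""
    return relative_share / vcpus

# Relative 100 1-way grown to a 2-way with no SHARE change:
print(per_vcpu_share(100, 2))  # -> 50.0 per vCPU
# Scaling the Share proportionally (Relative 200) restores the weight:
print(per_vcpu_share(200, 2))  # -> 100.0 per vCPU
```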
Very high processor overhead with z/OS guests using virtual
CTCs and CFs
Check whether IRD and WLM are being used on the z/OS virtual
machines. If so, disable them.
I'm seeing high processor utilization when running a VM system
as a guest of VM when running on LPAR.
This type of configuration results in three levels of
dispatching (SIE - start interpretive execution): LPAR, 1st level
VM, and 2nd level VM. The hardware implementation supports 2 layers
of SIE. However, when the third level is reached, a layer of SIE
needs to be virtualized, which is very expensive in terms of
processor time. This third level is not supported for production use.
See Preferred Guest Migration Considerations
for related discussion.
High channel utilization that cannot be explained by activity on
active logical partitions.
Check if you have a partition that is sitting in a wait state (such
as 1010 for z/VM). If you have a Storage Server (e.g. DS8000) or
other device that
is signaling all known partitions for things like state change (flash
copies), the state changes are reflected to all active partitions
whether they are able to deal with them or not. Deactivate the idle
partition to avoid extra traffic.
Linux guest has very slow response when connecting to the network.
Several things can affect performance when connecting, whether by
ftp, telnet or any other method. The most common thing to check:
- Make sure your specification for the name server is correct.
For more information refer to
z/VM Virtual Networking
Hints and Tips.
I shut down a number of my Linux systems running in virtual machines,
but I still see them consuming storage in memory or on paging DASD.
Since VM virtualizes the architecture, it cannot free up backing
storage unless that is explicitly requested. In the case where you
shut down the Linux system but keep the virtual machine logged on or
disconnected, VM will continue to maintain the memory that Linux
last used. To clear this up, either logoff the virtual machine or
issue a CP SYSTEM CLEAR command. If you reboot Linux using the IPL
command with the CLEAR option, this will also clear memory.
I get the following error messages on the SFS server:
DMS5GM3001W The file pool server has used up 92 percent of the
DMS5GM3001W available virtual storage
However, when I look at Query Filepool Report and compare the
Virtual Storage Size in Bytes to Virtual Storage Highest
Value there does not appear to be a problem.
The gotcha is that the second value is in kilobytes, while the first is
in bytes.
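A quick sketch of the unit mismatch, with made-up figures (the 92% in the DMS5GM3001W message above is from the text; the storage sizes here are illustrative):

```python
# Sketch: "Virtual Storage Size in Bytes" is in bytes, but "Virtual
# Storage Highest Value" is in kilobytes, so comparing them directly
# wildly understates usage.

size_bytes = 33_554_432   # Virtual Storage Size in Bytes (32 MB, made up)
highest_kb = 30_720       # Virtual Storage Highest Value (KB, made up)

naive_pct = 100 * highest_kb / size_bytes            # units mismatched
actual_pct = 100 * (highest_kb * 1024) / size_bytes  # convert KB first

print(round(naive_pct, 2))   # -> 0.09 (looks like no problem at all)
print(round(actual_pct))     # -> 94   (consistent with the warning)
```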
For my older style network connections like LCS and Virtual CTC,
I have 2 devices defined: one for input and
one for output. The utilization of the input device appears to be
near 100% all the time, while the output line is much lower.
TCP/IP and other network servers may use seldom-ending channel
programs for I/O on the device line that receives data. If you
look at the breakdown of service time (disconnect, pending, and
connect time) you will see that the near-100% device has most of
the service time in the disconnect state. This is really the
idle time between actual I/Os in this scenario. The connect time
is the time data is actually being moved on the channel subsystem.
The connect times for the two devices should be fairly close
(assuming the same amount of data is coming in as going out).
So the high utilization is only an anomaly, and not something
to be concerned about.
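The service-time breakdown described above can be sketched with hypothetical millisecond figures (the function and numbers are illustrative, not real measurement-block output):

```python
# Sketch: device utilization computed from subchannel measurement-block
# timings. For a receive device driven by a seldom-ending channel
# program, disconnect time (idle time between actual I/Os) dominates,
# so utilization looks near 100% even though little data is moving.

def utilization(connect_ms, disconnect_ms, pending_ms, interval_ms):
    """Percent of the interval the device spent in any service state."""
    return 100 * (connect_ms + disconnect_ms + pending_ms) / interval_ms

# Input device: channel program sits disconnected waiting for data.
print(utilization(connect_ms=40, disconnect_ms=950, pending_ms=5,
                  interval_ms=1000))   # -> 99.5
# Output device: ordinary I/Os, mostly connect time.
print(utilization(connect_ms=40, disconnect_ms=10, pending_ms=5,
                  interval_ms=1000))   # -> 5.5
```

Note the connect times (40 ms) are the same in both cases, matching the text's point that the two devices move comparable amounts of data.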
I used the QMDC tool from the z/VM download library, but it seems to
give strange results for SFS minidisks.
There is a problem in that the VDEVIOCA counter is not updated in
some scenarios, particularly those involving multiple block I/O
requests. This results in low MDC rates as reported by QMDC.
However, MDC is working. Use Performance Toolkit
or similar tool to look at the
real I/O avoided due to MDC at the real device level.
I see large amounts of CP CPU time being charged to my DB2 Server for
VSE and VM (SQL/DS).
Check if you are using minidisks mapped to VM data spaces for SQL, and
if so, make sure those minidisks are disabled for MDC. The large amount
of CP processor time is most likely a result of overhead in searching
for data from that minidisk in MDC that needs to be flushed when SQL uses
the SAVE function to force the data in the data space to be written to
the mapped minidisks.
I have a processor with storage that can be configured as either
central or expanded storage. Which is better?
Typically, it is better to have more real storage than expanded storage.
However, rule number one applies. "It depends."
See Configuring Processor Storage for more details.
I migrated from VM/ESA 1.2.1 to VM/ESA 2.2.0 and now my full pack
minidisks (FPMs) are being minidisk cached. I don't want them to be
eligible for MDC.
Yes, prior to VM/ESA 1.2.2, full pack minidisks were not eligible for
MDC. Now FPMs defined via VOLSER are eligible. Unfortunately, the
MINIOPT directory statement is not valid for FPMs and the DASDOPT
directory statement does not include a NOMDC option. The following
are possible approaches:
- Use system config file and the RDEV statement to set MDC off for
the entire volume.
This is most effective when the FPMs are used exclusively by guests.
- Use system config file and the RDEV statement to set MDC off by
default (DFLTOFF) for the volume and then explicitly set MDC on for
minidisks (via MINIOPT) overlaid by the FPM. The downside to this
solution is all the work of explicitly turning on MDC for each new
minidisk.
- Autolog a userid that links to all the FPMs and then issues
SET MDC MDISK commands to disable MDC for those FPMs. This is effective
but has a few more moving parts than we typically like.
- Allow MDC to be on by default for the volume, but use SET MDC
INSERT OFF in the backup program (or whichever userid you do not
want using MDC). This can be appropriate for backup software and
similar applications.
- Just let the system take the defaults. In many systems the fair
share algorithms will keep any one user from abusing MDC.
See Minidisk Cache Guidelines for more details.
The RTM SYSDASD display or other vendor products show paging activity
to non-paging devices.
This is most likely paging activity for VM data spaces that are mapped
to minidisks. This allows an application to just reference a storage
location in the VM data space and have CP handle making sure the data
is moved from the minidisk to virtual storage. CP uses the paging
subsystem to accomplish this.
See VM Data Spaces for more details.
INDICATE LOAD, Monitor, and other sources say one of my processors is
100% busy. But it is clear there really is not that much work. Other
processors are not near 100%.
Check if the processor is a dedicated processor. If so, it will look
to VM as if it is 100% or near 100% busy. Some background: for dedicated
virtual processors, CP marks the virtual processor as being eligible for
the wait state assist. This means that when the virtual processor loads
a wait state, the hardware will not automatically exit SIE. The virtual
processor continues to run under SIE until an interrupt occurs. The
virtual machine will exit SIE eventually and CP will update the timers
appropriately but while the virtual processor has been in a wait state,
VM thinks it has been running the entire time. For dedicated processors,
one really needs to look at the 2nd level guest's performance monitor
tools to get accurate measures.
It takes 15 minutes for the spool initialization on my system to
complete. There are 90,000 files in the system.
This is as expected. While VM/ESA spool file initialization has been
improved greatly over VM/XA, it still takes a noticeable amount of time
for the spool file to initialize with large numbers of spool files.
Additional information can be found on the
Spool File Initialization Performance page.
Disconnect time for certain volumes is huge.
Validate that processor microcode is current. Disconnect
time is recorded by the hardware in the subchannel
measurement block in a 4 byte field. Several processor
lines have had a problem where they accidentally increment
the first byte out of sync with the other 3 bytes. Ouch!
After migration to VM/ESA 1.2.2, minidisk cache has very
high steal rate and low hit ratio.
Make sure there is some central storage being used for
minidisk cache. This is especially important when the
virtual I/O buffers are not page aligned. In the non-page-aligned
case, CP cannot move data directly from expanded storage to the
user's buffer. There are stats in
RTM and monitor data that reflect this.
After migration to VM/ESA 1.2.2, minidisk cache seems less
efficient, and the MDC inserts rejected due to fair share
limit have increased.
Make sure the directory option NOMDCFS is used for all the
appropriate server virtual machines. Also note that the
fair share limit minimum value in 1.2.2 may be too low.
This has been changed from 8 to 150 in VM/ESA 2.1.0. The
APAR to VM/ESA 1.2.2 is VM59590 (PTF UM27424).
It can be modded by changing the values in HCPTCMBK to the following:
   TCMFSLIM DC F'150'    Fair share insert limit
   TCMFSLMM DC F'150'    Minimum fair share insert limit
and then reassembling HCPTCM.
After migration to VM/ESA 1.2.2, minidisk cache hit
ratio went down (got worse).
This may not be a problem. If the hit ratio is from RTM/ESA,
note that prior to VM/ESA 1.2.2, RTM computed hit ratio by
only looking at MDC-eligible I/Os, while in 1.2.2 and beyond all
virtual I/Os are considered. If the hit ratio is from the
INDICATE LOAD command (which only looks at eligible I/Os), note
that the count of eligible I/Os may be greater in VM/ESA 1.2.2.
So while hit ratio went down, total I/Os avoided went up. I/Os
avoided is reported on VMPRF DASD_BY_ACTIVITY reports.
After migration to VM/ESA 1.2.2, system is paging more.
In VM/ESA 1.2.2, minidisk cache will use real storage as part
of the cache by default. Having less real storage available may
increase paging, but the design's intent is that overall
performance is better. Check on the total real I/O rates to
DASD. If all goes as planned, the I/O increase to paging DASD
will be easily offset by I/O decrease for virtual DASD I/O.
If it is still a concern, set a more appropriate limit for
real storage (SET MDC STORAGE 0M ?M). It is strongly recommended
you do not set real storage off for MDC entirely.
Note that with MDC changes more data is eligible for MDC (1K
and 2K formatted minidisks or VSE data for example).
Therefore, MDC may require additional storage, which would
lead to less real DASD I/O for user I/O at the cost of extra
paging.
Guest with multiple virtual processors waits for CPU.
Increase Share setting. Basically, the current share
setting is divided among the virtual processors. For
example, a virtual 4-way with the default Relative 100 Share
is treated as four Relative 25 processors in making
scheduling decisions.
System performance slows down whenever DIRECTXA is issued.
Check if DIRECTXA target is full pack minidisk. If so,
data in the minidisk cache for any other minidisks overlapped
by the full pack are invalidated when the directory writes
occur. This should not be as significant after VM/ESA 1.2.2,
which does a better job of handling overlapping minidisks.
VSE guests get stuck in E-list (seen with IND QUEUES EXPANDED).
Increase the SET SRM STORBUF settings. A rough starting point is
to multiply each value by 3 (100 85 75 becomes 300 255 225).
You might also need to increase the LDUBUF values, but just
try STORBUF to start.
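The suggested starting point is just a uniform scaling of the three STORBUF percentages; a one-line sketch (the function name is hypothetical):

```python
# Sketch: rough STORBUF starting point from the text -- multiply each
# SET SRM STORBUF percentage by 3.

def scale_storbuf(q1_q2_q3, factor=3):
    """Scale the three STORBUF queue percentages by a common factor."""
    return tuple(v * factor for v in q1_q2_q3)

print(scale_storbuf((100, 85, 75)))  # -> (300, 255, 225)
```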
Recent migration shows higher paging and less available storage.
Check whether the default trace table size is being used and adjust
accordingly. It can be set without a regen on VM/ESA 1.2.0 and higher
with the CP SET TRACEFRAMES command.
Poor response time with much more paging in system. CMS
working sets have increased.
Check for missing saved segments (VMLIB, VMMTLIB, Help,
Pipelines, and modules moved from the CMS nucleus to the S-disk,
installed as logical segments CMSQRYH and CMSQRYL in VM/ESA 1.2.2).
The QUERY NSS MAP command can be helpful in determining which
segments are being used. The CMS NUCXMAP * (SEGINFO and
QUERY SEGMENT commands can be helpful to see what is being
loaded for CMS.
Recent migration to or over VM/ESA 1.2.0 shows higher paging
especially during morning crunch period with high logon rate.
CMS 9 (VM/ESA 1.2.0) references about 25 additional pages
during initialization. Therefore a high logon rate means a high
IPL CMS rate. This is also associated with shops where users have
short sessions (a user logs on each time they want to check mail
and then logs off right away).
Double-check all segments to make sure you are not making it worse
than necessary. Try saving storage elsewhere (exploit segments,
SAVEFD, trace table, etc.). This may require more storage.
Some shops have found relief by not forcing idle users off as
quickly, or by autologging users at 4am so the overhead of IPL is
over by the time people come in.
Paging seems less efficient, as seen in higher service times for
paging devices.
Check that you are not running with the paging/spooling configuration
out of the box. As delivered, page, spool, and user data are mixed on
same volumes as small areas. This causes bad seek patterns,
interferes with the paging subsystem's never-ending channel program,
and small areas are bad for block paging efficiency. We suggest
you reconfigure with dedicated page and dedicated spool volumes.
You just upgraded the processor to relieve a CPU bottleneck.
The new processor is 40% faster, but performance didn't improve.
Check for latent demand and whether a different bottleneck
(I/O or storage) has been hit first.
You just migrated to 9221 and didn't get expected performance.
Check whether it is ESA or 370 mode. This could be caused by misleading
marketing information (i.e., an ESA system sized based on 370 planning
MIPS). There are some things we can do to soften the blow:
- exploit MDC if possible
- increase VTAM delay factor
- increase DSPSLICE
DASD on 3990 Cache Controllers do not appear to be caching.
Do QUERY DASD DETAILS for the volume and make sure CACHE is
Yes for both subsystem and device for regular cache. For
DASD fast write, NVS must be Y for subsystem and DFW must be
Y for the device. Use SET CACHE, SET NVS, and SET DASDFW to
enable them.
Short sporadic periods of terrible response time with lots
of expanded storage (> 1GB) on VM/ESA 1.2.1 or earlier. The
sporadic hits map to high system spin time (%SP on the RTM SLOG
display).
Use RETAIN XSTORE to fix amount of storage used for MDC. The
sporadic hits are caused by holding the MDC lock while
re-hashing the hash table for different size cache. Fixing
the amount of storage for MDC avoids the size change.
Devices show 0% utilization on a 9221. They are attached
via integrated adapters.
Device utilization in RTM and Monitor is computed using the
timing values supplied in the subchannel measurement block
(connect, disconnect, pending, etc.), which is updated by the
hardware. Unfortunately, the IA hardware doesn't support the
timing values, and therefore everything shows up as zero.
Some information can be determined by looking at queueing on
devices. Monitor does high-frequency sampling of the queue value
in the RDEV, and VMPRF reports it in the DASD_BY_ACTIVITY report. This
value will not be accurate for page, spool, or mapped minidisks
since the paging subsystem keeps its own queueing stats. For
page, spool, and mapped mdisk see the VMPRF report
DASD_SYSTEM_AREAS or the RTM SYSDASD display.
System seems sluggish with no obvious reason.
Check to see whether the TOD clock was changed at the last IPL,
and if so, whether it was changed by a large amount. If the TOD
is changed by more than a few minutes, unpredictable results can
occur, since timer blocks that were supposed to go off for system
feedback algorithms may not go off for a long period of time, and
that could be bad. Note that this does not apply to systems with
APAR VM60324 applied or at VM/ESA 2.2.0 or later.
Performance Toolkit reports misleading I/O response times
for devices that are part of a Parallel Access Volumes (PAV) group.
There is a known problem with z/VM 5.2.0's Performance Toolkit
that causes it to report I/O response times incorrectly in
certain PAV situations. Refer to the z/VM 5.2.0
discussion for more information.