Linux Performance when running under VM
The following are some things to keep in mind when running Linux as a
guest of VM.
Some of these factors are more important if you are running 100s of
guests as opposed to a single guest.
Many of the guidelines that exist for other guests
(VSE and OS/390)
apply here as well.
Why is my network response so slow for my Linux guest?
Several factors can affect response time. Be sure to check and
watch the following:
- Ensure /etc/resolv.conf points to the correct (valid) name servers.
- As much as possible, make sure that the MTU is the same size
along the entire path. Run traceroute to see what path is taken;
any negotiation that occurs for the MTU can affect performance.
(A brief example follows this list.)
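A quick sketch of checking both, using tools typical of Linux systems of
this era; the interface and host names are hypothetical:

  ifconfig eth0 | grep -i mtu      # show the MTU configured on the local interface
  traceroute linux1.example.com    # show which hops the traffic actually takes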
Why does my idle Linux Guest consume processor resources?
By default Linux "wakes up" 100 times per second. These timer pops
are part of the way that it determines if it has work to do. It also
maintains its "jiffies" through this mechanism. These timer pops have
three system impacts when Linux is run as a guest of VM:
- Processor time is consumed in this process. The majority of the
cycles are used by VM's control program in virtualizing the interrupt.
- Parts of the interrupt reflection must run on VM's master processor.
- Because the Linux guest wakes up so often, VM considers it to always
be active. Therefore, it will not be dropped from the dispatch list, and
the storage management routines will tend not to steal pages from these
guests.
A couple of approaches have been used to address this problem:
- Do not define more virtual CPUs for a guest than are needed. The
timer pops are on a per virtual processor basis (not per virtual
machine). See also Scheduler Basics.
- Some customers have changed the HZ value in Linux (default of
100) to a lower value. Caution must be used, since lowering the HZ
value could make Linux less responsive or stop it from functioning. I have
heard of some success at a value of 16, with lower responsiveness.
(A sketch of the change follows this list.)
- There has been work on a patch for the 2.4 kernel that avoids the
timer pops and jiffies. There are ongoing discussions about whether
this should be incorporated into the official Linux kernel. Some
zSeries distributions incorporate it already.
Our measurement results showed it lowered the total
processor time for an idle Linux guest by about 78%.
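To illustrate the HZ change mentioned above (a sketch only, not a
supported procedure; the header location is the usual one in a 2.4
source tree): HZ is a compile-time constant, so the kernel must be
rebuilt and carefully re-tested after editing it.

  /* include/asm-s390/param.h, in a 2.4 kernel source tree */
  #define HZ 16    /* lowered from the default of 100 */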
How Big of a Virtual Machine Should I Use?
In general, do not define the virtual machine larger
than you need.
Consider decreasing the size of
the virtual machine and watching the Linux swapping. If you decrease
the virtual machine and no swapping occurs, then the smaller machine
size may be acceptable.
Sometimes a good guess at virtual machine size
is the z/VM scheduler's
assessment of the Linux guest's working set size.
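One way to see that assessment (a sketch; the PAGES line of INDICATE USER
output reports the working set as WS=, and LINDV1 is the example guest
name used later on this page):

  #CP INDICATE USER          (from the Linux guest's own console)
  CP INDICATE USER LINDV1    (for another guest, from a suitably privileged user)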
Excessive virtual machine sizes negatively impact system performance.
As far as the Linux guest is concerned, the storage
it has access to is all real and it should use it. Therefore, whatever
storage it doesn't need for code and control blocks and virtual pages,
it will often use as part of its file buffer cache. So if you define
the storage for a guest to be 2 GB, it will often use all of that
storage, even though the applications the Linux guest is running
could fit effectively into a far smaller virtual machine.
Where should Linux swap?
Try to avoid swapping in Linux whenever possible.
It adds path length and a
significant hit to response time.
However, sometimes swapping is unavoidable. If you have
to swap, these are your choices (a combined setup sketch follows this list):
- Dedicated volume
- If the storage load on your Linux guest is large, the
guest might need a lot of room for swap. One way to
accomplish this is simply to ATTACH or DEDICATE an entire
volume to Linux for swapping. If you have the DASD to
spare, this can be a simple and effective approach.
- Traditional Minidisk
- Using a traditional minidisk on physical DASD requires some setup
and formatting the first time and whenever changes in size of swap
space are required. However, the storage burden on z/VM to support
minidisk I/O is small, the controllers are well-cached, and I/O
performance is generally very good.
If you use a traditional minidisk, you should
disable z/VM Minidisk Cache (MDC) for that minidisk
(use the MINIOPT NOMDC
statement in the user directory).
- VM T-disk
- A VM temporary disk (t-disk) could be used. This lets one define
disks of various sizes with less consideration for
placement (having to find 'x' contiguous cylinders by hand if you
don't have DIRMAINT or a similar product). However, t-disk space
does not persist across logoff,
so it needs to be configured (perhaps via PROFILE EXEC)
whenever the Linux virtual machine logs on.
Storage and performance benefits of traditional minidisk I/O apply.
If you use a t-disk, you should disable minidisk cache
for that minidisk.
- VM V-disk
- A VM virtual disk in storage (VDISK)
is transient like a t-disk is. However, VDISK is backed by
a memory address space instead of by real DASD.
While in use, VDISK blocks reside in central
storage (which makes it very fast). When not in use, VDISK
can be paged out to expanded storage or paging DASD. The use of
VDISK for swapping is sufficiently complex that we have
written a separate tips page for it.
- Attach expanded storage to the Linux guest and allow it to swap
to this media. This can give good performance if the Linux guest
makes good use of the memory, but it can waste valuable memory
if Linux uses it poorly or not at all. In general, this is not
recommended for use in a z/VM environment.
- Create an EW/EN DCSS and configure the Linux guest to swap
to the DCSS. This technique is useful for cases where the Linux
guest is storage-constrained but the z/VM system is not. The
technique lets the Linux guest dispose of the overhead associated
with building channel programs to talk to the swap device.
A separate page illustrates the use of swap-to-DCSS.
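Pulling a few of these choices together, here is a minimal sketch of
defining swap devices; the device numbers, sizes, and volume label are
hypothetical:

In the CP user directory (a traditional swap minidisk, MDC disabled):
  MDISK 0201 3390 100 200 LNXVOL MR
  MINIOPT NOMDC

CP commands, e.g. issued from PROFILE EXEC so the transient disks are
re-created at each logon:
  CP DEFINE T3390 AS 0202 CYL 200
  CP DEFINE VFB-512 AS 0203 BLK 256000

Inside Linux, once the device has been formatted and brought online
(device name hypothetical):
  mkswap /dev/dasdb1
  swapon /dev/dasdb1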
Linux assigns priorities to swap extents. So, for example,
you could set up a small VDISK with higher priority (higher
numeric value) and it would be selected for swap as long as
there is space on the VDISK
to contain the process being swapped.
Swap extents of equal priority are used in round-robin
fashion. Equal prioritization can be used to spread swap I/O
across chpids and controllers, but if you are doing this,
be careful not to put all
the swap extents on minidisks on
the same physical DASD volume, for if you do, you will not
be accomplishing any spreading.
Use swapon -p ... to set swap extent priorities.
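For example (device names and priority values hypothetical), to prefer a
small VDISK and fall back to a minidisk:

  swapon -p 10 /dev/dasdb1    # VDISK: preferred while it has free space
  swapon -p 5  /dev/dasdc1    # minidisk: used once the VDISK fills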
Should I set Quick Dispatch on for my Linux Guest?
Quick dispatch (QUICKDSP) can be set in the directory or via the CP
SET command. It makes a virtual machine exempt from being held back
in the eligible list during scheduling. Instead, the virtual
machine goes directly to the dispatch list.
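A sketch of both forms, using the example guest name from later on this
page:

  CP SET QUICKDSP LINDV1 ON    (dynamic, via the CP SET command)
  OPTION QUICKDSP              (static, in the guest's directory entry)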
In general, we recommend setting QUICKDSP on for
production guests and server virtual
machines that perform critical system functions.
However, you may not want to set it on for all your test or
development guests. Allowing the VM scheduler to create an eligible list lets it
avoid some thrashing situations that could occur from overcommitting storage.
There is also an
excellent synopsis of the situation on the Linux-390
listserver from Malcolm Beattie.
Eligible lists are forming on my system. Certain Linux
guests are remaining unscheduled for very long periods.
What do I do?
This is closely related to the above question.
When the sum of the virtual machine sizes of the logged-on
Linux guests approaches the size of central storage, eligible
lists will tend to form. This is because Linux guests tend
to want to touch all of their pages and because Linux guests
tend not to drop from the dispatch list.
You can solve this with one of two approaches: use of QUICKDSP or
changing the SRM STORBUF settings.
The choice depends on where you want the responsibility for
protecting the system from thrashing to lie. The more you use QUICKDSP, the
greater the responsibility you take on yourself. Setting appropriate
STORBUF values puts the responsibility on the VM scheduler.
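For example (values hypothetical, though in this range they were a common
starting point discussed for Linux guests): raising the STORBUF
percentages tells the scheduler how far it may overcommit storage before
forming an eligible list.

  CP SET SRM STORBUF 300% 250% 200%
  CP QUERY SRM                        (display the resulting settings)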
The next line of defense is to set up the Linux guests conservatively
as regards the virtual storage sizes and to set up the VM system well
for paging. Here are some guidelines:
- Set each Linux machine's virtual storage size only as large as it
needs to be to let the desired Linux application(s) run. This
will suppress the Linux guest's tendency to use its entire address
space for file cache. Make up for this with MDC if the Linux file
system is hit largely by reads. Otherwise turn MDC off, because
it induces about an 11% instruction path length penalty on writes,
consumes storage for the cached data, and pays off little because
the read fraction is not high enough.
- Use whole volumes for VM paging, instead of fractional volumes.
In other words, never mix paging I/O and non-paging I/O on the same
volume.
- Implement a one-to-one relationship between paging CHPIDs and
paging volumes.
- Spread the paging volumes over as many DASD control units as you can.
- If the paging control units support NVS or DASDFW, turn them on
(applies to RAID devices).
- Provide at least twice as much DASD paging space (CP QUERY ALLOC
PAGE) as the sum of the Linux guests' virtual storage sizes.
- Having at least one paging volume per Linux guest is a great thing.
If the Linux guest is using synchronous page faults, exactly one
volume per Linux guest will be enough. If the guest is using
asynchronous page faults, more than one per guest might be
appropriate; one per active Linux application would be more like it.
- It is best if the VM paging volumes are all of the same model and
characteristics. Undesirable effects can occur when mixing devices of
different sizes or speeds.
- In QDIO-intensive environments, plan that 1.25 MB per idling real
QDIO adapter will be consumed out of CP below-2GB free storage,
for CP control blocks (shadow queues). If the adapter is being
driven very hard, this number could rise to as much as 40 MB per
adapter. This tends to hit the below-2GB storage pretty hard.
CP prefers to resolve below-2GB contention by using XSTORE.
Consider configuring at least 2 GB to 3 GB of XSTORE so as to
back the below-2GB central storage, even if central storage is
plentiful.
- If you need to favor storage use toward certain Linux
guests, CP SET RESERVED might be something to try (a sketch
follows this list).
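A sketch of reserving pages for a particular guest (the page count is
hypothetical; the operand is a number of pages):

  CP SET RESERVED LINDV1 25600    (reserve 25600 pages, roughly 100 MB, for LINDV1)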
How should I configure Linux guests to communicate with each other?
For early 2.2 based Linux systems,
IUCV is faster than virtual CTC for communicating between
two Linux guests.
See the z/VM Performance Report
for some measurement information.
However, some recent 2.4 measurements show IUCV to be slightly slower
than virtual CTC, at least at small MTU sizes.
Not all non-Linux guests can use IUCV, so connecting to an OS/390 guest
may require use of virtual CTCs. One must also balance RAS considerations
when making a choice in communication methods.
For customers on z/VM 4.2.0 and with appropriate support in Linux,
the Guest LAN connectivity is a good choice.
Unlike point-to-point methods such as IUCV and vCTC, a Guest LAN is a
simulated LAN segment that many guests can share, and it can be much
simpler to configure and maintain.
And the performance is good.
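A minimal sketch of setting one up with CP commands (the LAN name and
device number are hypothetical; the simulated adapter type shown is
HiperSockets, the type available when Guest LAN support was introduced):

  CP DEFINE LAN LINLAN OWNERID SYSTEM    (create the Guest LAN)
  CP DEFINE NIC 0500 HIPERSOCKETS        (give a guest a simulated adapter)
  CP COUPLE 0500 TO SYSTEM LINLAN        (connect the adapter to the LAN)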
How should I configure processor memory for this environment?
On most S/390 and zSeries processors, one has the option of configuring
processor memory as either central or expanded storage. With 31-bit
addressing systems, 2 gigabytes of central storage was the limit that
could be used. With the introduction of z/VM and 64-bit support, that
limit has been lifted, which begs the question:
do I still need expanded storage?
Yes. Even with 64-bit support, we still recommend that some processor
storage be defined as expanded storage. See
Configuring Processor Storage for more
details. A good starting point may be to define 25% of the processor
storage as expanded storage.
On z/VM Version 4 systems, if there is contention for real memory below
2GB, some relief can be found by limiting use of minidisk cache (MDC)
in expanded storage, via the CP command SET MDC XSTORE.
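For example (sizes hypothetical), to cap MDC's use of expanded storage:

  CP SET MDCACHE XSTORE 0M 512M    (let MDC use at most 512 MB of expanded storage)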
Can I run a shared copy of Linux like I run a shared CMS?
Yes, to some extent. There has been work done to create an NSS (named
saved system) for the Linux kernel.
Think of an NSS as a snapshot of the Linux kernel at boot time. Instead
of IPLing (booting) the kernel off of disk and reading in all of the
kernel, you can IPL this snapshot which VM can keep in memory. Further,
the pages in memory making up this snapshot can be shared with many
virtual machines. This decreases memory requirements by having one
copy of the kernel in memory instead of one for each guest.
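A rough sketch of the mechanics (the NSS name, page range, and size are
illustrative only; the exact DEFSYS operands and the save step depend on
your kernel and distribution):

  CP DEFSYS LNXNSS 0-FFF EW MINSIZE=64M    (define the skeleton for the saved system)
  (IPL the kernel once; when it reaches the appropriate point, issue:)
  CP SAVESYS LNXNSS
  (thereafter each guest can boot the shared copy with:)
  IPL LNXNSS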
Do I need Paging Space on DASD?
Yes. One of the common mistakes new VM customers make is to
ignore paging space. The VM system as shipped contains enough page
space to get the system installed and running trivial work.
However, you should add DASD page space to do real work.
The Planning and Admin book has details on determining how much
space is required. Here are a few thoughts:
- If the system is not paging, you may not care where you put the
page space. However, it has been my experience that sooner or later
the system grows to a point where it pages and then you'll wish you
had thought about it.
- VM paging is most optimal when it has large contiguous available
space on volumes that are dedicated to paging. Therefore, do not mix
page space with other space (user, tdisk, spool, etc.).
- A rough starting point for page allocation is to add up the
virtual machine sizes of the virtual servers running and multiply by 2
(a worked example follows this list).
Keep an eye on the allocation percentage and the block read set size.
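For instance (guest counts and sizes hypothetical):

  20 guests x 256 MB =  5 GB
   5 guests x 1 GB   =  5 GB
  Total virtual storage = 10 GB; doubled, that suggests about 20 GB of DASD page space.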
See also: Understanding poor performance due to paging
Other Miscellaneous Tips
- After you shut down a Linux system, it is best to log off the virtual
machine. If you only
shut down a Linux system running in a guest virtual machine,
VM continues to back the memory used by Linux.
You can also use the CP SYSTEM CLEAR command to reset the virtual
machine and clear free backing storage.
- Be careful of cron jobs. Some distributions ship with cron
jobs configured to do processing automatically, such as file integrity
or security scans. These may run fine for one or two virtual servers, but
could cause significant problems for many servers, depending on the job.
For example, having 100 virtual servers all wake up at midnight to scan
and validate every file on the system would result in significant storage
and processor resource consumption. You might want to adjust or stagger
the cron schedules in those cases.
- Application performance can have an impact on system performance.
An application in an error loop or an application processing large
amounts of data in a byte-by-byte fashion are two examples where a
larger than expected impact can be placed on the system.
- You can get a sense of the system your Linux virtual server is
running on by issuing cat /proc/sysinfo.
In the example below, the virtual machine LINDV1, running Linux, is a
guest on a z/VM 4.3.0 system with 1 virtual processor. The z/VM
system runs in an LPAR with 3 logical processors dedicated to
that partition. The physical machine is a 9-way (2064-109).
Sequence Code: 0000000000051542
CPUs Total: 10
CPUs Configured: 9
CPUs Standby: 0
CPUs Reserved: 1
Adjustment 02-way: 94
Adjustment 03-way: 90
Adjustment 04-way: 87
Adjustment 05-way: 84
Adjustment 06-way: 81
Adjustment 07-way: 79
Adjustment 08-way: 76
Adjustment 09-way: 73
Adjustment 10-way: 70
LPAR Number: 3
LPAR Characteristics: Dedicated
LPAR Name: SPRF3
LPAR Adjustment: 333
LPAR CPUs Total: 3
LPAR CPUs Configured: 3
LPAR CPUs Standby: 0
LPAR CPUs Reserved: 0
LPAR CPUs Dedicated: 3
LPAR CPUs Shared: 0
VM00 Name: LINDV1
VM00 Control Program: z/VM 4.3.0
VM00 Adjustment: 333
VM00 CPUs Total: 1
VM00 CPUs Configured: 1
VM00 CPUs Standby: 0
VM00 CPUs Reserved: 0
Is there other information of interest?
The following links may be useful.
Back to the Performance Tips Page