General VM FAQ
What can be done if a user is stuck in Logoff/Force Pending
(you can't FORCE the user and can't log on to it)?
Logoff Force pending problems are generally NOT recoverable.
Over the past years there has been a dramatic reduction in occurrences
of these, but they may still happen. The ability of z/VM to log a user
off the system has been enhanced and the holes have been plugged.
We aggressively investigate any of these problems we see
either internal to our test systems or in a Customer environment.
In the off chance this happens to you, gathering documentation
will help.
- Gather any information as to what the virtual machine was doing prior to
the hang.
- Console logs from the Operator and guest
- Ask the user. Generally they may have an idea as to what was
going on even if they don't understand the cause.
- Other commands like INDICATE I/O, INDICATE USER, QUERY USER, QUERY
NAMES etc. can also be potentially useful.
- In the process of gathering this information note what other
virtual machines interact with the hung user.
Once in a while you can log one of those users off and the 'stuck' user
will also be logged off.
Of course sometimes you end up with two hung users.
- Get CP documentation, either via a SNAPDUMP or a restart dump.
This will at least give the support center a fighting chance to
diagnose the problem. Without this there is little that can be done.
+ A display of the VMDBK, by first using "LOCATE " and then
"DISPLAY HT.1000", can be useful but is not as good as the SNAPDUMP.
Now, how to recover after you obtain a SNAPDUMP.
- Other than restarting or IPLing the system there is no
proven 'Safe' way to recover the system.
Bad things "may" happen if you attempt to change CP
storage. Some examples:
- Changing the VM Userid can cause a soft or hard ABEND when CP
attempts to recalculate the hash table used for look-up.
* If changing the userid is successful and the user IS able to
logon again it may still have problems.
- Service machines or other users may not recognize a
2nd instance of this userid
- Minidisk may be forced Read Only because the first
user is still holding the write link
- Changing the deferred work counter (to one, of course) can leave you
open to STK017 ABENDs if the other work ever decides to finish.
- Finally, if you do make changes to the userid or other areas in the VMDBK,
it can make diagnosis of the problem much more difficult. This is the
reason for taking a SNAPDUMP before attempting to fix the problem.
- After all of this, please open a problem with the support center. We will
likely ask you to send in the dump so it can be analyzed to find the root
cause. It is nearly impossible to identify the cause of these without
seeing documentation. Once in a while we get lucky using the console
log or other information, but it's not a high probability.
How does the Sysplex Timer affect VM
systems running in LPAR and basic modes?
The Sysplex Timer is an External Time Reference (ETR). For the rest of
this discussion, I will refer to it with the generic term ETR.
Each processor's physical TOD clocks are synchronized to the same time.
When running in LPAR mode, each partition may run with a different TOD
setting. Each logical partition has an EPOCH setting, which maintains
an offset from the physical TOD clock. When a system is IPL'd in a
logical partition and requests that the TOD be changed, it is actually
the offset in that partition's EPOCH which is changed by the SET CLOCK
instruction, not the physical TOD clock. This is the same concept as
changing a virtual machine's TOD setting for a VM guest system. Each
time a logical partition is activated, its EPOCH is initialized to zero,
which results in that partition's TOD matching the physical TOD.
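The relationship between the physical TOD and a partition's EPOCH can be sketched as a small model. This is only an illustration of the arithmetic described above, not how PR/SM is implemented: the `Machine` and `Partition` classes, their method names, and the integer clock units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical stand-in for the physical TOD clock (arbitrary units)."""
    physical_tod: int


@dataclass
class Partition:
    """Hypothetical logical partition; epoch is its offset from the physical TOD."""
    machine: Machine
    epoch: int = 0

    def activate(self) -> None:
        # Activation resets the offset, so the partition sees the physical TOD.
        self.epoch = 0

    def store_clock(self) -> int:
        # What the partition observes: physical TOD plus its own offset.
        return self.machine.physical_tod + self.epoch

    def set_clock(self, new_tod: int) -> None:
        # SET CLOCK in a partition changes only the offset, never the physical TOD.
        self.epoch = new_tod - self.machine.physical_tod


machine = Machine(physical_tod=1_000)
lp1 = Partition(machine)
lp2 = Partition(machine)
lp1.activate()
lp2.activate()

lp1.set_clock(5_000)          # TOD changed at IPL in partition 1 only
machine.physical_tod += 10    # the physical clock keeps ticking

result = (machine.physical_tod, lp1.store_clock(), lp2.store_clock(), lp1.epoch)
print(result)
```

In this sketch the second partition keeps following the physical TOD, while the first runs at a fixed offset from it.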
The ETR by itself does not change the physical TOD clock nor the EPOCH
offset in an LPAR. The ETR sends signals. Out-of-sync conditions are
detected by the S/390 processor, which then generates interrupts to the
operating system. It is then up to the operating system to accept and
process these interrupts. LPAR and OS/390 (MVS) support them; VM does not.
When VM is running in basic mode, ETR sync requests are simply ignored.
Therefore, the physical TOD clock would never change on a system running
VM natively, even if it were connected to an ETR. While this will not
have any impact on the VM system (although TOD clock changes on a running
VM system could cause problems), the physical TOD clock on this system
could get out of sync with the others connected to the same ETR.
MVS and OS/390 systems do support ETRs. They enable themselves to
accept sync-check interrupts from the ETR. When running in basic mode,
they issue SET CLOCK instructions to synchronize the physical TOD clock
with the ETR.
When running in LPAR mode, sync-check interrupts are accepted and
processed by the LPAR (PR/SM). LPAR re-synchronizes the physical TOD
clocks and adjusts the EPOCHs of the logical partitions so it appears to
the partitions that the TOD has not changed.
This makes ETR re-syncs and physical TOD changes transparent to VM
systems running in an LPAR. As with basic mode, the VM system's TOD
could get out of sync with the rest of the sysplex.
For MVS and OS/390 systems running in logical partitions, the operating
system can enable for ETR related events (interrupts) and can read the
ETR data needed to re-synchronize itself to the ETR once LPAR has
re-synchronized the physical TODs. In this process, the operating system
will issue SET CLOCK which will result in LPAR adjusting that logical
partition's EPOCH value accordingly. This process enables MVS and
OS/390 systems running in LPAR mode to stay synchronized with the ETR.
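The LPAR-mode behaviour described above can be illustrated with a toy model: LPAR steps the physical TOD to the ETR and compensates every EPOCH, and only an ETR-aware operating system then issues SET CLOCK to follow the ETR. The `Lpar` class, the partition names `VM1` and `MVS1`, and the integer clock values are assumptions made for this sketch, not actual PR/SM interfaces.

```python
class Lpar:
    """Hypothetical model: one physical TOD shared by named partitions."""

    def __init__(self, physical_tod, names):
        self.physical_tod = physical_tod
        # Each partition starts with an EPOCH offset of zero.
        self.epoch = {name: 0 for name in names}

    def tod(self, name):
        # The TOD a partition sees is the physical TOD plus its EPOCH.
        return self.physical_tod + self.epoch[name]

    def resync_physical(self, etr_time):
        # LPAR steps the physical TOD to the ETR time and adjusts every
        # EPOCH by the opposite amount, so no partition sees a change.
        delta = etr_time - self.physical_tod
        self.physical_tod = etr_time
        for name in self.epoch:
            self.epoch[name] -= delta

    def set_clock(self, name, new_tod):
        # SET CLOCK from a partition only changes that partition's EPOCH.
        self.epoch[name] = new_tod - self.physical_tod


lpar = Lpar(physical_tod=1_000, names=["VM1", "MVS1"])
before = (lpar.tod("VM1"), lpar.tod("MVS1"))

lpar.resync_physical(etr_time=1_003)     # sync-check handled by LPAR
after_lpar = (lpar.tod("VM1"), lpar.tod("MVS1"))

lpar.set_clock("MVS1", 1_003)            # ETR-aware OS re-synchronizes itself
after_os = (lpar.tod("VM1"), lpar.tod("MVS1"))

print(before, after_lpar, after_os)
```

In this model the VM partition's view of the TOD never moves, which is why it can drift from the rest of the sysplex, while the MVS partition ends up back in step with the ETR.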