LONG RUNNING C APPLICATION HITS OUT-OF-STORAGE ABEND CONDITION
APAR Identifier ...... VM63819 Last Changed ........ 06/10/25
Symptom ...... AB ABEND Status ........... CLOSED PER
Severity ................... 2 Date Closed ......... 06/01/06
Component .......... 568411201 Duplicate of ........
Reported Release ......... 510 Fixed Release ............ 999
Component Name VM CMS Special Notice
Current Target Date ..05/11/18 Flags
SCP ...................
Platform ............
Status Detail: SHIPMENT - Packaged solution is available for
shipment.
PE PTF List:
PTF List:
Release 510 : UM31618 available 06/01/12 (0601 )
Release 520 : UM31619 available 06/01/12 (0601 )
Parent APAR:
Child APAR list:
ERROR DESCRIPTION:
User machine is program checking or abending with the message:
DMSABE155T User abend 4093 called from 0800F74A reason code
when running PING commands in a very large loop in a REXX EXEC.
LOCAL FIX:
PROBLEM SUMMARY:
****************************************************************
* USERS AFFECTED: Affected are all users running C/C++ *
* applications, especially any which are long *
* running and take a while to return to the *
* CMS READY message. *
****************************************************************
* PROBLEM DESCRIPTION: *
****************************************************************
* RECOMMENDATION: APPLY PTF *
****************************************************************
User had a long running REXX EXEC which continually called
PING and recorded the status of the target systems. When the
user migrated from z/VM 3.1.0 to z/VM 5.1.0, this long running
application suddenly began failing with an out-of-storage
condition. The user would see error message DTCINB004S from
the PING command followed by error message DMSABE155T from CMS.
PROBLEM CONCLUSION:
There were several areas in CMS multitasking, CMS sockets, and
LE where storage was being obtained and not correctly returned
to CMS storage management. This eventually led to the
out-of-storage condition and the failure of the application.
The LE changes are made via APAR VM63820. The CMS changes are
being made here.
Herewith, an explanation of what has changed and why.
The basic problems fixed are storage "leaks" encountered in
several different ways and thread timing problems leaving loose
signals around.
DMSMPT (Mutex Process Terminate)
--------------------------------
mutex create (DMSMCR) gets a single chunk of storage for both
the mutex control block and the mutex name. However, these two
entities are defined as separate definitions with distinct
starting addresses and lengths. When mutex process terminate is
doing its clean up work, one of the things it releases is the
mutex control block. However, mutex process terminate never
returns the chunk of storage for the mutex name. This results
in a slow fragmentation of virtual storage because we keep
leaving the blocks of mutex names lying around. This also very
quickly chews up CMS MT's pre-allocated storage blocks forcing
us to continuously issue new CMSSTOR OBTAINS to add pages to the
MT storage pool.
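
The leak pattern described above can be sketched in C. This is a toy
model only, not the DMSMCR/DMSMPT code: the arena, the byte counter,
and every name prefixed toy_ are assumptions made for illustration.
One chunk holds both the control block and the name; the "leaky"
terminate releases only the control block portion, while the "fixed"
terminate releases both portions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy storage manager (hypothetical): a bump arena plus a byte counter. */
#define TOY_ALIGN 16u
static union { max_align_t align; unsigned char bytes[4096]; } toy_arena;
static size_t toy_next;    /* next free offset in the arena        */
static size_t toy_in_use;  /* bytes obtained and not yet released  */

static size_t toy_round_up(size_t n)
{
    return (n + (TOY_ALIGN - 1u)) & ~(size_t)(TOY_ALIGN - 1u);
}

static void *toy_obtain(size_t len)
{
    assert(len % TOY_ALIGN == 0);
    if (len > sizeof toy_arena.bytes - toy_next)
        return NULL;
    void *p = &toy_arena.bytes[toy_next];
    toy_next += len;
    toy_in_use += len;
    return p;
}

/* Release is by address and length, so a chunk can be returned in parts. */
static void toy_release(void *addr, size_t len)
{
    (void)addr;
    assert(len <= toy_in_use);
    toy_in_use -= len;
}

size_t toy_bytes_in_use(void) { return toy_in_use; }

/* Control block and name live in ONE chunk but are tracked separately. */
struct toy_mutex {
    char  *name;      /* starts right after the control block */
    size_t cb_len;    /* length of the control block portion  */
    size_t name_len;  /* length of the name portion           */
};

struct toy_mutex *toy_mutex_create(const char *name)
{
    size_t cb_len   = toy_round_up(sizeof(struct toy_mutex));
    size_t name_len = toy_round_up(strlen(name) + 1);
    struct toy_mutex *m = toy_obtain(cb_len + name_len);
    if (m == NULL)
        return NULL;
    m->name     = (char *)m + cb_len;
    m->cb_len   = cb_len;
    m->name_len = name_len;
    strcpy(m->name, name);
    return m;
}

/* Problem pattern: only the control block portion is returned. */
void toy_mutex_terminate_leaky(struct toy_mutex *m)
{
    toy_release(m, m->cb_len);
}

/* Corrected pattern: the name portion is returned as well. */
void toy_mutex_terminate_fixed(struct toy_mutex *m)
{
    char  *name     = m->name;
    size_t name_len = m->name_len;
    size_t cb_len   = m->cb_len;
    toy_release(name, name_len);
    toy_release(m, cb_len);
}
```

In this model, each create/terminate cycle through the leaky path
leaves the name portion counted as in use, so the in-use total grows
with every cycle. The fixed path returns the total to its prior value.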
DMSTDY
--------------------------------
An earlier development change has been removed from DMSTDY.
That change caused CMS MT to ignore loose signals in order to
keep the thread priority up and not cause inordinate delays on
the thread issuing ThreadDelay. Ignoring loose signals caused
CMS MT to leave thousands of signal blocks, monitor blocks, etc.
lying around in storage. This eventually led to an
out-of-storage condition.
Also changed is the error path after the EventMonitorCreate
call. The error path attempted to call TimerStop to cancel
the wait time for the ThreadDelay. However, the last parameter
of the TimerStop call is an output parameter. This parameter
was hardcoded inside DMSTDY which is in the CMS nucleus. When
TimerStop tried to store into the parameter, DMSTDY died with
a protection exception.
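
The output-parameter problem can be illustrated with a short C sketch.
toy_timer_stop is a hypothetical stand-in, not the real TimerStop
interface, and the value it stores is arbitrary. The point is only
that the caller must supply writable storage for any parameter the
callee stores into.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a call whose LAST parameter is an output. */
int toy_timer_stop(int timer_id, int *remaining_out)
{
    assert(remaining_out != NULL);
    if (timer_id < 0) {
        *remaining_out = 0;
        return -1;               /* no such timer */
    }
    *remaining_out = 250;        /* arbitrary illustrative value */
    return 0;
}

/*
 * Problem pattern (shown only as a comment): passing the address of a
 * constant that lives in read-only storage. The callee's store into
 * the output parameter then faults.
 *
 *     static const int unused = 0;
 *     toy_timer_stop(id, (int *)&unused);
 */

/* Corrected pattern: give the callee writable scratch storage. */
int toy_cancel_delay(int timer_id, int *remaining_out)
{
    int remaining = 0;           /* writable local for the output */
    int rc = toy_timer_stop(timer_id, &remaining);
    if (remaining_out != NULL)
        *remaining_out = remaining;
    return rc;
}
```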
DMSSGC (Complete signal request)
--------------------------------
For asynchronous signals, we keep a count of how many processes are
looking at the signal. The old check was when the count reached
zero, we set a flag (disdata) saying we can delete the signal
data. However, there are cases where the count field can go
negative. At the points where we checked the count to see if we
can set the disdata flag, the count was not zero, we did not set
the flag, and the signal data was never deleted. This led to
storage fragmentation because we were leaving signal data around
long after its useful life. The signal data started out in
pre-allocated storage, which was used up quickly in the case of
PMR 30994,344,000. The "fix" was to set the disdata flag on
when the count field was LESS THAN OR EQUAL TO zero.
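
A minimal C sketch of the two checks follows. The structure and the
function names are assumptions for illustration; only the disdata flag
name and the zero versus less-than-or-equal comparison come from the
description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical signal bookkeeping: a watcher count and a delete flag. */
struct toy_signal {
    int  watchers;   /* processes still looking at the signal */
    bool disdata;    /* true when the signal data may be deleted */
};

/* Old check: the flag is set only when the count is exactly zero. */
void toy_signal_done_old(struct toy_signal *s)
{
    s->watchers--;
    if (s->watchers == 0)
        s->disdata = true;
}

/* New check: a count that has gone negative also allows deletion. */
void toy_signal_done_fixed(struct toy_signal *s)
{
    s->watchers--;
    if (s->watchers <= 0)
        s->disdata = true;
}
```

If the count is already zero when one more completion arrives, the
old check sees -1 and leaves the flag off, so the signal data is never
deleted. The new check sets the flag in that case.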
DMSSWT
--------------------------------
An earlier APAR, VM63151, corrected an ABEND when the semaphore
handle table grew in size and moved in storage. However, in
certain instances, the SemWait caller's input parameter for its
semaphore handle pointer could still point to the old table,
not the new table. With all the other changes made here, this
particular condition arose and produced a x'F09' kernel ABEND.
The "grow-the-table" code in DMSSWT has been altered to
re-initialize the semaphore handles so the caller's pointer is
now pointing to the new table, not the old.
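
The stale-pointer condition can be sketched in C as follows. This is
an illustrative model under stated assumptions, not the DMSSWT code:
the table layout, the index-based lookup, and all toy_ names are
hypothetical. The wait routine may grow (and therefore move) the
table, so it re-derives the caller's pointer from the new table before
using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_sem   { int count; };
struct toy_table { struct toy_sem *slots; size_t capacity; };

/* capacity must be greater than zero in this sketch */
int toy_table_init(struct toy_table *t, size_t capacity)
{
    t->slots = calloc(capacity, sizeof *t->slots);
    t->capacity = (t->slots != NULL) ? capacity : 0;
    return (t->slots != NULL) ? 0 : -1;
}

/* Grow by copying into a new block; the table moves in storage. */
int toy_table_grow(struct toy_table *t)
{
    size_t new_cap = t->capacity * 2;
    struct toy_sem *n = calloc(new_cap, sizeof *n);
    if (n == NULL)
        return -1;
    memcpy(n, t->slots, t->capacity * sizeof *n);
    free(t->slots);              /* old pointers into the table are now stale */
    t->slots = n;
    t->capacity = new_cap;
    return 0;
}

struct toy_sem *toy_sem_lookup(struct toy_table *t, size_t index)
{
    return (index < t->capacity) ? &t->slots[index] : NULL;
}

/*
 * Wait-like routine. If the table has to grow, the caller's pointer is
 * re-initialized from the NEW table before it is dereferenced.
 */
int toy_sem_wait(struct toy_table *t, size_t index,
                 struct toy_sem **sem_io, int must_grow)
{
    if (must_grow && toy_table_grow(t) != 0)
        return -1;
    *sem_io = toy_sem_lookup(t, index);   /* refresh the caller's pointer */
    if (*sem_io == NULL)
        return -1;
    (*sem_io)->count--;
    return 0;
}
```

Keeping an index (or handle) as the stable identity and treating the
pointer as something to be re-derived after any grow is one common way
to avoid this class of failure.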
DMS9IC and DMS9LO
--------------------------------
DMS9IC was changed to correctly release signal data storage
during the thread delete process. Also, several changes for a
reworked process clean up are included. Together, these changes
return the fragments of storage which had contributed to
the out-of-storage conditions.
DMSTST (z/VM 5.2.0 only)
--------------------------------
An update for DMSTST is the only piece of this APAR to apply
to z/VM 5.2.0. All of the changes cited above exist in the
z/VM 5.2.0 base as well as a change to DMSTST which has been
discovered to cause problems when "time remaining" is returned
to a caller. This problem is corrected as part of this APAR
since the problem was introduced as part of the overall changes
cited above.
TEMPORARY FIX:
COMMENTS:
MODULES/MACROS: DMSMPT DMSSGC DMSSWT DMSTDY DMSTST
DMS9IC DMS9LO VMMTLIB
SRLS: NONE
RTN CODES:
CIRCUMVENTION:
MESSAGE TO SUBMITTER: