LONG RUNNING C APPLICATION HITS OUT-OF-STORAGE ABEND CONDITION


 
 APAR Identifier ...... VM63819      Last Changed ........ 06/10/25
 LONG RUNNING C APPLICATION HITS OUT-OF-STORAGE ABEND CONDITION
 
 Symptom ...... AB ABEND             Status ........... CLOSED  PER
 Severity ................... 2      Date Closed ......... 06/01/06
 Component .......... 568411201      Duplicate of ........
 Reported Release ......... 510      Fixed Release ............ 999
 Component Name VM CMS               Special Notice
 Current Target Date ..05/11/18      Flags
 SCP ...................
 Platform ............
 
 Status Detail: SHIPMENT - Packaged solution is available for
                           shipment.
 
 PE PTF List:
 
 PTF List:
 Release 510   : UM31618 available 06/01/12 (0601 )
 Release 520   : UM31619 available 06/01/12 (0601 )
 
 Parent APAR:
 Child APAR list:
 
 ERROR DESCRIPTION:
 User machine is program checking or abending with the message:
 DMSABE155T User abend 4093 called from 0800F74A reason code
 when running PING commands in a very large loop in a REXX EXEC.
 
 LOCAL FIX:
 
 PROBLEM SUMMARY:
 ****************************************************************
 * USERS AFFECTED: Affected are all users running C/C++         *
 *                 applications, especially any which are long  *
 *                 running and take a while to return to the    *
 *                 CMS READY message.                           *
 ****************************************************************
 * PROBLEM DESCRIPTION:                                         *
 ****************************************************************
 * RECOMMENDATION: APPLY PTF                                    *
 ****************************************************************
 User had a long running REXX EXEC which continually called
 PING and recorded the status of the target systems.  When the
 user migrated from z/VM 3.1.0 to z/VM 5.1.0, this long running
 application suddenly began failing with an out-of-storage
 condition.  The user would see error message DTCINB004S from
 the PING command followed by error message DMSABE155T from CMS.
 
 PROBLEM CONCLUSION:
 There were several areas in CMS multitasking, CMS sockets, and
 LE where storage was being obtained and not correctly returned
 to CMS storage management.  This eventually led to the
 out-of-storage condition and the failure of the application.
 The LE changes are made via APAR VM63820.  The CMS changes are
 being made here.
 
 Herewith, an explanation of what has changed and why.
 
 The basic problems fixed are storage "leaks" encountered in
 several different ways and thread timing problems leaving loose
 signals around.
 
 DMSMPT (Mutex Process Terminate)
 --------------------------------
 Mutex create (DMSMCR) gets a single chunk of storage for both
 the mutex control block and the mutex name.  However, these two
 entities are defined separately, each with its own starting
 address and length.  When mutex process terminate is doing its
 clean up work, one of the things it releases is the mutex
 control block.  However, mutex process terminate never returned
 the chunk of storage for the mutex name.  This resulted in a
 slow fragmentation of virtual storage because we kept leaving
 the blocks of mutex names lying around.  This also very quickly
 chewed up CMS MT's pre-allocated storage blocks, forcing us to
 continuously issue new CMSSTOR OBTAINs to add pages to the MT
 storage pool.
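
 The effect of releasing only the control block can be sketched
 with a small accounting model in C.  This is an illustration
 only, not the CMS code: the pool counter, the demo_* names, and
 the 64-byte control block length are all hypothetical, and no
 CMSSTOR call is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a storage pool: only a byte counter. */
static size_t pool_in_use = 0;

static void pool_obtain(size_t n)  { pool_in_use += n; }
static void pool_release(size_t n) { assert(n <= pool_in_use); pool_in_use -= n; }

/* One chunk is obtained, but it is described by two lengths:
 * the control block and the name that follows it. */
typedef struct {
    size_t cb_len;    /* length of the control block part (assumed) */
    size_t name_len;  /* length of the name part */
} demo_mutex;

demo_mutex demo_mutex_create(size_t name_len)
{
    demo_mutex m = { 64, name_len };       /* 64 is an arbitrary size */
    pool_obtain(m.cb_len + m.name_len);    /* single obtain for both  */
    return m;
}

/* Leaky terminate: returns the control block, forgets the name. */
void demo_terminate_leaky(const demo_mutex *m)
{
    pool_release(m->cb_len);
}

/* Corrected terminate: returns both parts of the chunk. */
void demo_terminate_fixed(const demo_mutex *m)
{
    pool_release(m->cb_len);
    pool_release(m->name_len);
}

size_t demo_pool_in_use(void) { return pool_in_use; }
```

 In this model, each create/terminate cycle through the leaky
 path leaves name_len bytes behind, so a long-running loop grows
 the pool usage without bound, while the corrected path returns
 to its starting level.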
 
 DMSTDY
 --------------------------------
 An earlier development change has been removed from DMSTDY.
 That change caused CMS MT to ignore loose signals in order to
 keep the thread priority up and not cause inordinate delays on
 the thread issuing ThreadDelay.  Ignoring loose signals caused
 CMS MT to leave thousands of signal blocks, monitor blocks, etc.
 lying around in storage.  This eventually led to an
 out-of-storage condition.
 
 Also changed is the error path after the EventMonitorCreate
 call.  The error path attempted to call TimerStop to cancel
 the wait time for the ThreadDelay. However, the last parameter
 of the TimerStop call is an output parameter.  This parameter
 was hardcoded inside DMSTDY which is in the CMS nucleus.  When
 TimerStop tried to store into the parameter, DMSTDY died with
 a protection exception.
 
 DMSSGC (Complete signal request)
 --------------------------------
 For asynchronous signals, we keep a count of how many processes
 are looking at the signal.  The old check was that when the
 count reached zero, we set a flag (disdata) saying we can delete
 the signal data.  However, there are cases where the count field
 can go negative.  At the points where we checked the count to
 see if we could set the disdata flag, the count was not zero, we
 did not set the flag, and the signal data was never deleted.
 This led to storage fragmentation because we were leaving signal
 data around long after its useful life.  The signal data started
 out in pre-allocated storage, which was used up quickly in the
 case of PMR 30994,344,000.  The "fix" was to set the disdata
 flag on when the count field was less than or equal to zero.
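
 The difference between the two checks can be sketched in C as
 follows.  This is a simplified illustration, not the DMSSGC
 code: the demo_signal structure and function names are
 hypothetical, and the sketch assumes the count goes negative
 through one extra decrement, which the text above does not
 spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-signal bookkeeping. */
typedef struct {
    int  watchers;  /* how many processes are looking at the signal */
    bool disdata;   /* true once the signal data may be deleted     */
} demo_signal;

/* Old check: the flag is set only when the count is exactly zero,
 * so a count that has gone negative never triggers deletion. */
void demo_release_old(demo_signal *s)
{
    assert(s != NULL);
    s->watchers--;
    if (s->watchers == 0)
        s->disdata = true;
}

/* Corrected check: zero or any negative count sets the flag. */
void demo_release_fixed(demo_signal *s)
{
    assert(s != NULL);
    s->watchers--;
    if (s->watchers <= 0)
        s->disdata = true;
}
```

 With a starting count of zero, the old check leaves disdata off
 after the decrement to -1, while the corrected check turns it
 on; with a starting count of one, both behave the same.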
 
 DMSSWT
 --------------------------------
 An earlier APAR, VM63151, corrected an ABEND when the semaphore
 handle table grew in size and moved in storage.  However, in
 certain instances, the SemWait caller's input parameter for its
 semaphore handle pointer could still point to the old table,
 not the new table.  With all the other changes made here, this
 particular condition arose and produced an x'F09' kernel ABEND.
 The "grow-the-table" code in DMSSWT has been altered to
 re-initialize the semaphore handles so the caller's pointer is
 now pointing to the new table, not the old.
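
 The stale-pointer hazard described above is a general one for
 any table that can move when it grows.  A minimal C sketch,
 assuming a hypothetical table of demo_sem entries addressed by
 index (none of these names or layouts come from DMSSWT):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int count; } demo_sem;   /* hypothetical entry */

typedef struct {
    demo_sem *entries;
    size_t    capacity;
} demo_sem_table;

int demo_table_init(demo_sem_table *t, size_t capacity)
{
    assert(t != NULL && capacity > 0);
    t->entries = calloc(capacity, sizeof *t->entries);
    t->capacity = (t->entries != NULL) ? capacity : 0;
    return (t->entries != NULL) ? 0 : -1;
}

/* Grow by copying into a new table and freeing the old one, so
 * every pointer into the old table becomes invalid. */
int demo_table_grow(demo_sem_table *t)
{
    size_t new_capacity = t->capacity * 2;
    demo_sem *moved = calloc(new_capacity, sizeof *moved);
    if (moved == NULL)
        return -1;
    memcpy(moved, t->entries, t->capacity * sizeof *moved);
    free(t->entries);
    t->entries = moved;
    t->capacity = new_capacity;
    return 0;
}

/* Derive an entry pointer from its index in the current table. */
demo_sem *demo_sem_lookup(demo_sem_table *t, size_t index)
{
    return (index < t->capacity) ? &t->entries[index] : NULL;
}

/* A wait-like operation that grows the table part way through.
 * The entry pointer is refreshed after the grow, never reused. */
int demo_wait_after_grow(demo_sem_table *t, size_t index)
{
    demo_sem *sem = demo_sem_lookup(t, index);
    if (sem == NULL)
        return -1;
    if (demo_table_grow(t) != 0)
        return -1;
    sem = demo_sem_lookup(t, index);  /* re-point at the new table */
    sem->count -= 1;
    return sem->count;
}

void demo_table_free(demo_sem_table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->capacity = 0;
}
```

 The design point is that the index stays valid across a grow
 while a raw pointer does not, so the pointer is looked up again
 after any step that may have moved the table.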
 
 DMS9IC and DMS9LO
 --------------------------------
 DMS9IC was changed to correctly release signal data storage
 during the thread delete process.  Also, several changes for a
 reworked process clean up are included.  Together, these changes
 return fragments of storage that were previously left behind and
 that contributed to the out-of-storage conditions.
 
 DMSTST (z/VM 5.2.0 only)
 --------------------------------
 An update for DMSTST is the only piece of this APAR to apply
 to z/VM 5.2.0.  All of the changes cited above exist in the
 z/VM 5.2.0 base as well as a change to DMSTST which has been
 discovered to cause problems when "time remaining" is returned
 to a caller.   This problem is corrected as part of this APAR
 since the problem was introduced as part of the overall changes
 cited above.
 
 TEMPORARY FIX:
 
 COMMENTS:
 
 MODULES/MACROS:   DMSMPT   DMSSGC   DMSSWT   DMSTDY   DMSTST
 DMS9IC   DMS9LO   VMMTLIB
 
 SRLS:      NONE
 
 RTN CODES:
 
 CIRCUMVENTION:
 
 MESSAGE TO SUBMITTER: