LONG RUNNING C APPLICATION HITS OUT-OF-STORAGE ABEND CONDITION
APAR Identifier ...... VM63819      Last Changed ........ 06/10/25

Symptom ...... AB ABEND             Status ........... CLOSED  PER
Severity ................... 2      Date Closed ......... 06/01/06
Component .......... 568411201      Duplicate of ........
Reported Release ......... 510      Fixed Release ............ 999
Component Name VM CMS               Special Notice
Current Target Date ..05/11/18      Flags
SCP ...................
Platform ............

Status Detail: SHIPMENT - Packaged solution is available for
               shipment.

PE PTF List:

PTF List:
Release 510   : UM31618 available 06/01/12 (0601 )
Release 520   : UM31619 available 06/01/12 (0601 )

Parent APAR:
Child APAR list:

ERROR DESCRIPTION:
User machine is program checking or abending with the message:
DMSABE155T User abend 4093 called from 0800F74A reason code
when running PING commands in a very large loop in a REXX EXEC.

LOCAL FIX:

PROBLEM SUMMARY:
****************************************************************
* USERS AFFECTED: Affected are all users running C/C++         *
*                 applications, especially any which are long  *
*                 running and take a while to return to the    *
*                 CMS READY message.                           *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
User had a long running REXX EXEC which continually called PING
and recorded the status of the target systems. When the user
migrated from z/VM 3.1.0 to z/VM 5.1.0, this long running
application suddenly began failing with an out-of-storage
condition. The user would see error message DTCINB004S from the
PING command followed by error message DMSABE155T from CMS.

PROBLEM CONCLUSION:
There were several areas in CMS multitasking, CMS sockets, and LE
where storage was being obtained and not correctly returned to
CMS storage management.
This eventually led to the out-of-storage condition and the
failure of the application. The LE changes are made via APAR
VM63820. The CMS changes are being made here.

Herewith, an explanation of what has changed and why. The basic
problems fixed are storage "leaks" encountered in several
different ways and thread timing problems leaving loose signals
around.

DMSMPT (Mutex Process Terminate)
--------------------------------
Mutex create (DMSMCR) gets a single chunk of storage for both the
mutex control block and the mutex name. However, these two
entities are defined as separate definitions with distinct
starting addresses and lengths. When mutex process terminate is
doing its cleanup work, one of the things it releases is the
mutex control block. However, mutex process terminate never
returns the chunk of storage for the mutex name. This results in
a slow fragmentation of virtual storage because we keep leaving
the blocks of mutex names lying around. This also very quickly
chews up CMS MT's pre-allocated storage blocks, forcing us to
continuously issue new CMSSTOR OBTAINs to add pages to the MT
storage pool.

DMSTDY
--------------------------------
An earlier development change has been removed from DMSTDY. That
change caused CMS MT to ignore loose signals in order to keep the
thread priority up and not cause inordinate delays on the thread
issuing ThreadDelay. Ignoring loose signals caused CMS MT to
leave thousands of signal blocks, monitor blocks, etc. lying
around in storage. This eventually led to an out-of-storage
condition.

Also changed is the error path after the EventMonitorCreate call.
The error path attempted to call TimerStop to cancel the wait
time for the ThreadDelay. However, the last parameter of the
TimerStop call is an output parameter. This parameter was
hardcoded inside DMSTDY, which is in the CMS nucleus. When
TimerStop tried to store into the parameter, DMSTDY died with a
protection exception.
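
The DMSMPT leak described above can be pictured with a small C
sketch. It is an illustration only, under stated assumptions: the
pool, the structures, the 8-byte rounding, and every function
name (pool_obtain, pool_release, mutex_create,
mutex_terminate_old, mutex_terminate_fixed, run_cycles) are
hypothetical and are not taken from the CMS source. The sketch
only models the accounting: one chunk is obtained, it is
described as two areas, and the old terminate path gives back
just one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pool: storage is obtained by length and released by
   address and length, so part of a chunk can be given back. */
static union { unsigned char bytes[4096]; long long align; } arena;
static size_t arena_next;     /* bump pointer; this sketch never reuses storage */
static size_t bytes_in_use;   /* obtained but not yet released */

/* Assumed granularity for this sketch: round lengths up to 8 bytes. */
static size_t round8(size_t n) { return (n + 7) & ~(size_t)7; }

static void *pool_obtain(size_t len)
{
    void *p;
    if (arena_next + len > sizeof arena.bytes)
        return NULL;
    p = &arena.bytes[arena_next];
    arena_next += len;
    bytes_in_use += len;
    return p;
}

static void pool_release(void *addr, size_t len)
{
    (void)addr;               /* only the accounting matters here */
    assert(len <= bytes_in_use);
    bytes_in_use -= len;
}

struct mutex_cb { int owner; int waiters; };

/* One chunk, described as two separate areas. */
struct mutex {
    struct mutex_cb *cb;   size_t cb_len;
    char            *name; size_t name_len;
};

static int mutex_create(struct mutex *m, const char *name)
{
    size_t cb_len = round8(sizeof(struct mutex_cb));
    size_t name_len = round8(strlen(name) + 1);
    unsigned char *chunk = pool_obtain(cb_len + name_len);
    if (chunk == NULL)
        return -1;
    m->cb = (struct mutex_cb *)chunk;
    m->cb_len = cb_len;
    m->name = (char *)chunk + cb_len;
    m->name_len = name_len;
    memset(m->cb, 0, sizeof *m->cb);
    strcpy(m->name, name);
    return 0;
}

/* Old behaviour as described: only the control block is released. */
static void mutex_terminate_old(struct mutex *m)
{
    pool_release(m->cb, m->cb_len);
}

/* Corrected behaviour: both areas go back to the pool. */
static void mutex_terminate_fixed(struct mutex *m)
{
    pool_release(m->cb, m->cb_len);
    pool_release(m->name, m->name_len);
}

/* Create and terminate a mutex `cycles` times; return bytes left in use. */
static size_t run_cycles(int cycles, int fixed)
{
    int i;
    arena_next = 0;
    bytes_in_use = 0;
    for (i = 0; i < cycles; i++) {
        struct mutex m;
        if (mutex_create(&m, "LOCK1") != 0)
            break;
        if (fixed)
            mutex_terminate_fixed(&m);
        else
            mutex_terminate_old(&m);
    }
    return bytes_in_use;
}
```

With the rounding assumed here, run_cycles(100, 0) leaves 800
bytes in use (one 8-byte name area per cycle), while
run_cycles(100, 1) leaves none. In a long running application the
first pattern grows without bound, which matches the slow
fragmentation the APAR describes.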

DMSSGC (Complete signal request)
--------------------------------
For asynchronous signals, we keep a count of how many processes
are looking at the signal. The old check was: when the count
reached zero, we set a flag (disdata) saying we can delete the
signal data. However, there are cases where the count field can
go negative. At the points where we checked the count to see if
we can set the disdata flag, the count was not zero, we did not
set the flag, and the signal data was never deleted. This led to
storage fragmentation because we were leaving signal data around
long after its useful life. The signal data started out in
pre-allocated storage, which was used up quickly in the case of
PMR 30994,344,000. The "fix" was to set the disdata flag on when
the count field was LESS THAN OR EQUAL TO zero.

DMSSWT
--------------------------------
An earlier APAR, VM63151, corrected an ABEND when the semaphore
handle table grew in size and moved in storage. However, in
certain instances, the SemWait caller's input parameter for its
semaphore handle pointer could still point to the old table, not
the new table. With all the other changes made here, this
particular condition arose and produced an x'F09' kernel ABEND.
The "grow-the-table" code in DMSSWT has been altered to
re-initialize the semaphore handles so the caller's pointer is
now pointing to the new table, not the old.

DMS9IC and DMS9LO
--------------------------------
DMS9IC was changed to correctly release signal data storage
during the thread delete process. Also, several changes for a
reworked process cleanup are included. Together, these changes
return the fragments of storage that had been contributing to
the out-of-storage conditions.

DMSTST (z/VM 5.2.0 only)
--------------------------------
An update for DMSTST is the only piece of this APAR to apply to
z/VM 5.2.0.
All of the changes cited above exist in the z/VM 5.2.0 base,
along with a change to DMSTST that has been discovered to cause
problems when "time remaining" is returned to a caller. This
problem is corrected as part of this APAR since the problem was
introduced as part of the overall changes cited above.

TEMPORARY FIX:

COMMENTS:

MODULES/MACROS:
DMSMPT   DMSSGC   DMSSWT   DMSTDY   DMSTST   DMS9IC   DMS9LO
VMMTLIB

SRLS: NONE

RTN CODES:

CIRCUMVENTION:

MESSAGE TO SUBMITTER: