Performance and Monitor Data Collection

Last updated 02 December 2011


Some useful CP commands

INDICATE QUEUES EXP
Useful for scheduler-related problems; it gives a snapshot of what users are waiting on. Because it is only a snapshot, capture several invocations.
QUERY FRAMES
Useful for storage-related problems.
INDICATE I/O
Shows which users are waiting on which I/O. It can be a great way to spot a volume for which minidisk cache or control unit (CU) cache was mistakenly turned off.
QUERY NSS MAP
Use this to ensure that segments are being used when investigating storage-related problems.

Monitor Data Collection

The exact data we collect may differ slightly from problem to problem, but the following sections give insight into what data to collect and how to go about doing that. On current z/VM releases, there is a preconfigured userid called MONWRITE and the default installation creates a MONDCSS saved segment. In those cases, all you need is the following:

Quick Cheat Sheet

  1. Validate MONDCSS. Issue CP QUERY NSS NAME MONDCSS to check that a DCSS for monitor exists. If it does not exist, see the Background section for additional information on creating a MONDCSS.
  2. Start Monitor. If you do not currently run a performance monitor such as Performance Toolkit, you will need to enable the appropriate monitor domains. Issue the following commands from a userid, such as MAINT, that has privilege class A or E to enable and start monitor:
         CP MONITOR SAMPLE ENABLE ALL
         CP MONITOR EVENT ENABLE STORAGE
         CP MONITOR EVENT ENABLE PROCESSOR
         CP MONITOR EVENT ENABLE I/O ALL
         CP MONITOR EVENT ENABLE APPLDATA ALL
         CP MONITOR START
    
  3. Start MONWRITE. The preconfigured MONWRITE userid has an A-disk that is sufficient for multiple hours of data. For longer spans of data, you will want to increase the size of the minidisk used. In this example, we assume the minidisk is accessed as filemode A. From the MONWRITE userid, issue the following command to collect data to a file id of MONITOR DATA A:
         MONWRITE MONDCSS *MONITOR DISK MONITOR DATA A
    
  4. Stop MONWRITE and Monitor. From the writer id issue
         MONWSTOP
    

    From privileged id, issue

         CP MONITOR STOP
    

Automation

You may want to consider some automation to keep Monwrite running continuously. If so, the following steps may be helpful.

  1. In the PROFILE EXEC for MONWRITE, insert a command such as the following. It will create a new Monwrite file that closes at the end of the hour (60 minutes). The files will be named Dxxxxxx Txxxxxx, where the names correspond to the day and time.
      MONWRITE MONDCSS *MONITOR DISK CLOSE 60
    
  2. Eventually the A-disk will become full. There is a package on the z/VM Download page, the MONCLEAN package, which will delete the oldest files as the disk fills up.

Background

If the above cheat sheet seemed too simple, read the following for a discussion of why and what the above is doing.

  1. If you currently don't run Monitor, you'll need to create a DCSS. Example commands follow (note that the DCSS size depends on the configuration and the domains collected). The RSTD option states that only a userid with an appropriate NAMESAVE statement in its directory entry can load the segment. This is for security reasons. The segment is defined for a total of 64 meg starting at 144 meg. You need to make sure that no other segments used by the userid overlap with this one.
         CP DEFSEG MONDCSS 9000-CFFF SC RSTD
         CP SAVESEG MONDCSS
    

    NOTE: A default MONDCSS definition is provided with your z/VM system. See the Guide for Automatic Installation, Chapter 9. "Contents of Your z/VM System" under the section titled "Saved Segments on the z/VM System".

    More details on the Monitor DCSS size are available.

  2. If you currently don't run Monitor, you will also need to set up a userid to collect the data. Create a userid as you would normally and add the following statements.
         MACH XA
         NAMESAVE MONDCSS
         IUCV *MONITOR MSGLIMIT 255
         SHARE ABSOLUTE 3%
         OPTION QUICKDSP
    

    The virtual machine will run CMS and needs to be an XA or XC mode machine. The NAMESAVE statement allows the virtual machine to load a restricted DCSS. It is not needed if you did not use the RSTD option on the earlier DEFSEG command. The IUCV statement allows the virtual machine to connect to the Monitor CP system service.

    NOTE: A default MONWRITE userid is provided in the USER DIRECT file on the MAINT 2CC minidisk provided with your z/VM system.

  3. Typically, we enable for all SAMPLE data. Some people only include DASD devices in I/O Domain. The following EVENT domains are also of interest: STORAGE, PROCESSOR, I/O (not SEEKS!), and APPLDATA. Enabling the SCHEDULER domain in general is not a good idea because of the volume of data it creates. For scheduler related problems, enabling for a dummy user (such as $SPOOL$) can be of help. There are times when it is helpful to collect Seeks data, however, in general it is not recommended. The following commands accomplish the above.
         CP MONITOR SAMPLE ENABLE ALL
         CP MONITOR EVENT  ENABLE STORAGE
         CP MONITOR EVENT  ENABLE PROCESSOR
         CP MONITOR EVENT  ENABLE I/O       ALL
         CP MONITOR EVENT  ENABLE APPLDATA  ALL
    
  4. Data should cover "good" times as well as "bad" times if possible.

  5. The monitor interval is the amount of time between the CP monitor's collections of the data written as sample records. The interval should be small enough to get many samples, capture the problem, and not risk counters wrapping. For VMPAF analysis it is nice to have at least 100 samples. If the problem being diagnosed is a system slowdown that lasts only 2 minutes, a 3 minute sample interval is too large, since the problem could come and go between samples. Some counters can start to wrap after 15 minutes. The interval size is a balance: the smaller the interval, the more data is collected. The default is 1 minute. It can be changed (for example) to 5 minutes.
         CP MONITOR SAMPLE INTERVAL 5 MIN
    
  6. Start the CP monitor.
         CP MONITOR START
    
  7. Start the virtual machine configured to collect monitor data. It will need access to the MONWRITE utility.
         MONWRITE MONDCSS *MONITOR DISK fn ft fm
      or MONWRITE MONDCSS *MONITOR TAPE 181
    
  8. To stop, issue MONWSTOP for MONWRITE and CP MONITOR STOP for CP.

  9. See CP Command and Utility Reference for information on the MONITOR commands and MONWRITE (note that MONWRITE is in the back of the book with the other utilities).

  10. ROT (Rule of Thumb): raw monitor data size in bytes
            ((users + devices) * 500)* samples
         example: 5 minute interval for 8 hour prime shift with 1000 users
                  and 500 devices would be
           ((1000+500)*500)*(8*60/5) = 69 meg (about 98 3390 cylinders)
    
  11. If collecting to disk, make the minidisk ineligible for MDC (MINIOPT NOMDC).

  12. MONWRITE format is best for us; if you use another format, we must know exactly what it is.

  13. If you collected data to disk but want to send a tape, you can use TAPE DUMP, VMFPLC2 DUMP, or PIPELINES. Please do not use DDR, SPXTAPE, or SPTAPE. Please also tell us what format you used.

  14. Please note the type of tape drive used (compression, 3480 vs 3490, etc.).
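
The rule of thumb in item 10 and the sample-count guidance in item 5 can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of any z/VM tooling; the function names are made up for illustration, and the cylinder conversion assumes a 3390 minidisk formatted with 4 KB blocks (180 blocks per cylinder), which is an assumption not stated in the text above.

```python
import math

# Rule-of-thumb constant from item 10: about 500 bytes per user or device per sample.
BYTES_PER_ENTITY_PER_SAMPLE = 500
BYTES_PER_MEG = 1024 * 1024
# Assumption: 3390 cylinder formatted with 4 KB blocks, 180 blocks per cylinder.
BYTES_PER_3390_CYL = 180 * 4096


def sample_count(duration_minutes, interval_minutes):
    """Number of whole sample intervals that fit in the collection period."""
    return duration_minutes // interval_minutes


def raw_monitor_bytes(users, devices, samples):
    """Rule-of-thumb size of raw monitor data, in bytes."""
    return (users + devices) * BYTES_PER_ENTITY_PER_SAMPLE * samples


# Example from item 10: 5 minute interval, 8 hour shift, 1000 users, 500 devices.
samples = sample_count(8 * 60, 5)
size = raw_monitor_bytes(1000, 500, samples)
megs = round(size / BYTES_PER_MEG)
cylinders = math.ceil(size / BYTES_PER_3390_CYL)
print(samples, "samples,", size, "bytes, about", megs, "meg,", cylinders, "cylinders")
```

With the numbers from item 10 this gives 96 samples, which is just under the 100 samples suggested in item 5; a slightly shorter interval or a longer collection period would close that gap at the cost of more data.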

FTP Considerations

If you need to move the raw monitor data by FTP, there are a few things to keep in mind:

  • Monitor data should be treated as binary data
  • If you are FTPing from another platform to VM, then use BINARY F LRECL 4096 and QUOTE SITE FIX 4096 to ensure the file arrives as a binary file of fixed-length records of size 4096.
  • If you forget to do the above, you can often recover with the following Pipelines command on VM - PIPE < mondata_file_id | FBLOCK 4096 00 | > mondata_file_id F 4096

If the file is on a PC and you can open it in an editor and read dates, volsers, and userids, then chances are someone moved it as text and it was translated from EBCDIC to ASCII. This is not good: the binary monitor records have been altered by the translation.
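
Before sending a file from a non-VM platform, it can help to confirm that its length is a whole number of 4096-byte records. The following Python sketch only illustrates that arithmetic; the function names are hypothetical, padding the last record with X'00' mirrors the intent of the FBLOCK 4096 00 stage shown above, and it is not a substitute for transferring the file in binary or for the Pipelines recovery on VM.

```python
RECORD_LENGTH = 4096  # fixed record length named in the FTP notes above


def padding_needed(size_in_bytes, record_length=RECORD_LENGTH):
    """Bytes of padding required to fill out the last fixed-length record."""
    return -size_in_bytes % record_length


def pad_to_fixed_records(data, record_length=RECORD_LENGTH):
    """Return data padded with X'00' to a whole number of records."""
    return data + b"\x00" * padding_needed(len(data), record_length)


def is_whole_records(size_in_bytes, record_length=RECORD_LENGTH):
    """True when the size is already an exact multiple of the record length."""
    return padding_needed(size_in_bytes, record_length) == 0


# Usage: a 5000-byte payload is padded up to two full 4096-byte records.
padded = pad_to_fixed_records(b"\x01" * 5000)
print(len(padded), is_whole_records(len(padded)))
```

A file whose size is not a multiple of 4096 is not necessarily damaged, but it is a hint that the transfer settings should be checked.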

