VM/ESA® Performance on P/390 and R/390
PC Server 520 and RS/6000 591
IBM VM/ESA® World Wide Marketing Programs
Last updated December 04, 1996
VM/ESA performs very well for CMS interactive workloads on P/390 and R/390. For the program development workload measured in the laboratory for this study, 190 to 200 users was the upper limit we were able to sustain with subsecond response time at the end users' terminals. This performance characteristic makes VM/ESA on P/390 and R/390 an attractive entry-level general business system. A previous study, performed in 1995 and published as WSC Flash 9522, PC Server 500 System/390 Performance, showed that VM/ESA on P/390 is also a good performer for VSE/ESA guest workloads, although that study is not discussed in this document.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an "as is" basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
Performance data contained in this presentation were determined in various controlled laboratory environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments as system performance standards. The results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.
References in this presentation to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM licensed program in this publication is not intended to state or imply that only IBM's programs may be used. Any functionally equivalent program may be used instead.
The following are trademarks of IBM in the United States and other countries.
- RISC System/6000®
Pentium® is a registered trademark of Intel Corporation
This paper is based upon a presentation first given at the VM/ESA and VSE/ESA Technical Conference in Rome, Italy, October 1996. It is a discussion of laboratory measurements of VM/ESA on an IBM PC Server 520 System/390 (commonly called a P/390) and an IBM RISC System/6000 with System/390 Server-on-Board (commonly called an R/390).
Thanks go to Wes Ernsberger of IBM's VM Performance group at the VM Development Lab for his analysis of VM performance data.
VM/ESA workload discussion
Prior to the announcement of the PC Server 500 System/390 in June 1995, a performance study was executed to determine how well a CMS-intensive program development workload for VM/ESA would perform. The results of the 1995 performance study were published as WSC Flash 9522, PC Server 500 System/390 Performance, which is available to customers via IBM's FLASHES database on IBMLINK. An ASCII version can also be downloaded as a zip file from the world wide web at http://www.pc.ibm.com/files.html by searching the downloadable files for pcsvr390.zip. Some of the measurements from the 1995 study are included here for comparison's sake. It is important to note that the PC Server 500 has been withdrawn from marketing by IBM and that its current closest equivalent replacement as a platform for the MicroChannel System/390 Microprocessor Complex is the PC Server 520 model MD0. Throughout 1995 and 1996, changes were made in the P/390 OS/2 support programs that improved performance. Regression runs on the PC Server 500 were not performed with the newer P/390 support code, since the PC Server 500 is no longer marketed by IBM. Therefore it is important to remember that the comparisons are between a PC Server 520 System/390 and a PC Server 500 System/390 executing the older P/390 support code, without the performance improvements that later became available.
PC Server configurations measured
The following table contains information about the hardware and software that were measured in the laboratory.
|Table 1. PC Server hardware and software configurations measured.|

| System | PC Server 500 (old) | PC Server 520 (new) |
|---|---|---|
| PC processor | 90MHz Pentium with 256K of L2 cache (Synchrostream) | 133MHz Pentium with 512K of L2 asynchronous cache |
| Approximate capacity/drive | 2.25G Fast/Wide | 2.25G Fast/Wide |
| Array stripe width | 64K | 64K |
| Array adapter | IBM F/W Streaming RAID Adapter | IBM SCSI-2 PCI-Bus RAID adapter |
| Adapter cache settings | Read ahead=ON | Read ahead=ON, write-back |
| OS/2 file system | HPFS | HPFS386 |
| Logical drive type | RAID-5 | RAID-5 |
| Number of logical drives | 2 | 2 |
| Number of partitions per logical drive | 1 | 1 |
| LAN adapter | IBM AutoLANStreamer MC 32 | IBM AutoLANStreamer MC 32 |
| OS/2 level | V3 Warp Fullpack | V4 Warp Server Advanced |
| P/390 support code level | March 1995 | Version 2.2 |
Of particular interest in Table 1 are the following items.
- The data on the PC Server 520 was striped across five drives as opposed to three on the PC Server 500.
- The RAID adapter in the PC Server 520 was a PCI architecture RAID adapter as opposed to a MicroChannel RAID adapter. The 520 runs also exploited the adapter's cache for write caching.
- OS/2 V4 Warp Server Advanced is the recommended operating system on the PC Server 520. OS/2 Warp Server Advanced allows use of the HPFS386 file system. A major benefit of HPFS386 over basic HPFS is that it allows use of a file cache greater than 2Megabytes in size.
End-user response time measurements and the actual driving of the workload on the measured VM/ESA system were obtained using IBM's Teleprocessing Network Simulator (TPNS). TPNS drives actual VTAM cross-domain logons to the VSCS APPL on the target system. The users log on to CMS and run the workload scripts with appropriate think times and randomness, as would occur in an actual environment.
It is important to stress that this is the exact same methodology used by the VM Performance Lab for measuring VM/ESA on mainframe processors. In these measurements, VM/ESA with VTAM and TPNS on one P/390 drives the target VM/ESA system. Using this method instead of internal drivers allows network delays introduced by software, as well as by the LAN adapter (which is used to simulate a 3172 control unit to VM/ESA), to be understood.
Brief workload description
The workload executed on the measured VM/ESA system is the FS8F CMS interactive workload. This is basically a CMS program development workload. A few highlights of what the users do in the workload are:
- CMS file management (FILELIST, LISTFILE, COPY)
- XEDIT (work on program source, etc.)
- Compile and run (HLASM, Cobol II, Fortran, Pliopt)
- Link and Access minidisks
- Manage spool files
- Use CMS HELP
- Obtain and format TDISK
- XEDIT a script file and invoke DCF
- Use CP PER
- and more...
Results and comparisons of PC Server S/390 runs
The following table contains information from runs performed on the PC Server 500 S/390 in 1995 as well as PC Server 520 S/390 runs.
|Table 2. VM/ESA measurements on PC Servers. (Metrics reported: OS/2 cache size, AVG LAST (TPNS), RIO rate (VMPRF), MDC real size (MB), MDC hit ratio.)|
PC Server runs discussion
A few points of discussion can be made about this data.
- Comparing the systems when both have a 2MB file cache size showed that the PC Server 520 provided approximately 20% better end-user response time.
- Exploiting a larger HPFS386 cache (17MB in this case) provided approximately 28% better end-user response time than the PC Server 500 did for 190 CMS users.
- While end-user response time improved on the PC Server 520 when compared to the 500, there is not a significant increase in capacity of CMS users.
- The S/390 Microprocessor Complex is identical in both the 500 and the 520.
- The S/390 processor is nearly 85% utilized at 200 users.
- At 220 CMS users on the R/390 (see ahead), VM/ESA spent more time doing real storage management (paging) resulting in poor end user response time. Similar results would be expected on the 520 had a run with 220 CMS users been attempted.
The following table describes the RISC System/6000 that was used during laboratory measurements of VM/ESA.
|Table 3. RISC System/6000 hardware and software configuration measured.|

| System | RISC System/6000 Model 591 |
|---|---|
| Installed memory for AIX | 64M |
| Approximate capacity per drive | 4.5G |
| Array adapter | 7137 Tower |
| Logical drive type | RAID-5 |
| LAN adapter | IBM AutoLANStreamer MC 32 |
| P/390 support code level | 18.104.22.168 |
Additionally, for the run named PR591201, the same system was used but the VM/ESA DASD volumes were spread across three 4.5G internal hard drives (non array) instead of the 7137 tower.
R/390 Measurement results
The following table contains data for the performance runs executed on the R/390 model 591.
|Table 4. VM/ESA measurements on R/390 model 591. (Metrics reported: P/390 code level, AVG LAST (TPNS), RIO rate (VMPRF), MDC real size (MB), MDC hit ratio, IOSTAT data (AIX).)|
R/390 runs discussion
- For 190 CMS users, when compared to the PC Server 520, the RISC System/6000 591 offered approximately 30% better end-user response time for this workload. However, a 5% increase in the number of CMS users on the R/390 nearly negated that difference.
- Increasing the number of CMS users in this workload beyond 200 users "broke the knee in the curve" for the VM/ESA system to be able to deliver subsecond response time to the end-users. Multiple things conspired to make this happen.
- Increased S/390 processor utilization caused by the additional users.
- Increased S/390 processor utilization as a result of CP having to do more real storage management.
- Increased paging I/O as a result of increased real storage management.
- One observation from these results is that while the RISC System/6000 591 offered better end-user response time than the PC Server 520 for this workload, it offered little or no additional S/390 capacity at that level of S/390 utilization. This should not be surprising, given that it is the exact same S/390 Microprocessor Complex in both systems. The point is that while capacity and performance are related, they are not the same thing.
PERFORMANCE TUNING HINTS AND TIPS
This section contains performance and tuning hints that were gathered during laboratory measurements and experiences during the 1995 performance study for the PC Server 500 System/390. It is intended to give information that may be useful in planning for, installing, and tuning a PC Server System/390. Some of the information may also be useful in a RISC System/6000 with Server-on-Board (R/390) environment.
PC SERVER SYSTEM/390 ARRAY CONSIDERATIONS
ARRAY STRIPE UNIT SIZE
On array models of the PC Server System/390, the customer sets the stripe unit size (amount of data written on a given disk before writing on the next disk). The default stripe unit size is 8K. Choices are 8K, 16K, 32K, and 64K. Sizes larger than 8K will probably yield better performance for S/390 workloads than the default 8K.
Also consider the I/O characteristics of any other OS/2 applications that you may run concurrently on the PC Server 500 System/390 when choosing a stripe unit size. For example, larger stripe sizes may not be the best performing choice for LAN file serving workloads. A compromise between larger and smaller stripe sizes might be in order depending on the overall system I/O characteristics.
WARNING: Once the stripe unit is chosen and data is stored in the logical drives, the stripe unit cannot be changed without destroying data in the logical drives.
WRITE POLICY
There are two choices for write policy with the RAID adapter. The default write policy is write-through (WT), where the completion status is sent after the data is written to the hard disk drive.
To improve performance, you can change this write policy to write-back (WB) where the completion status is sent after the data is copied to the RAID adapter's cache memory, but before the data is actually written to the storage device. There is 4MB of cache memory, of which more than 3MB are available for caching data.
WARNING: If you lose power before the data is actually written to the storage device, data in cache memory is lost. See also section "Lazy writes" for related information.
You can achieve a performance improvement by using WB, but you run a far greater risk of data loss in the event of a power loss. An uninterruptible power supply (UPS) can help minimize this risk and is highly recommended for this reason and for the other power protection benefits it supplies as well.
READ AHEAD CACHING
The IBM RAID adapters used in the PC Server S/390 include a function called read ahead caching. When a read I/O is satisfied from one of the hard drives in an array, the RAID adapter controller stages the remainder of that stripe (from the data just read to the end of the stripe) into the adapter's cache. This can be beneficial for sequential processing. It is less beneficial for random reads, where SCSI bandwidth can be wasted and higher disk utilization can occur when transferring data to the cache that will not be needed.
There are no rules of thumb for setting read ahead on or off. It depends on the workload and, to some degree, on the kind of emulated DASD (see the section named S/390 DASD Device Drivers for further information). Experimentation with the setting on and off may be the only way to discover the benefits for a workload. The Read Ahead setting can be toggled on or off for logical drives within an array by booting the RAID controller diskette. The default for logical drives when created is for Read Ahead to be ON.
OS/2 HPFS CACHE
BASE OS/2 SYSTEM HPFS CACHE SIZE
The HPFS.IFS device driver delivered with OS/2 WARP has a maximum cache size of 2048K (2 Megabytes). The /CACHE:NNNN parameter of the IFS device driver specifies the size of the cache. The default is 10% of available RAM (if not specified) with a maximum of 2048K. The specified value after an install of OS/2 is dependent on installed RAM at the time of installation. If you are using the standard OS/2 provided IFS device driver, then specifying /CACHE:2048 is highly recommended. Enter HELP HPFS.IFS at the OS/2 command prompt for further explanation of the parameters.
/CRECL ON IFS HPFS CACHE
The /CRECL parameter of the HPFS IFS driver allows you to specify the size of the largest record eligible for this cache. The OS/2 default is 4K. From a S/390 perspective, increasing this value may increase cache read hits if the S/390 operating system is performing repetitive I/Os of the same data in blocks bigger than the default 4K. You can use performance analysis tools for each S/390 operating system to understand the characteristics of I/Os that are being performed by the S/390 operating system and applications. OS/2 performance tools like IBM's SPM/2 V2 can also assist in tuning the /CRECL value.
If using emulated CKD devices (for example 3380 or 3390), then you should increase /CRECL to its maximum of 64K. Otherwise you are unlikely to ever have a cache hit for the OS/2 file holding the emulated volume. For further explanation of this, see section S/390 DASD Device Drivers.
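As an illustration of the two recommendations above (the driver path and values are examples; adjust for your installation), the IFS line in CONFIG.SYS might read:

```
IFS=C:\OS2\HPFS.IFS /CACHE:2048 /CRECL:64
```

Here /CACHE:2048 requests the 2 Megabyte maximum for the base HPFS cache, and /CRECL:64 sets the largest cacheable record to its 64K maximum, which matters when emulated CKD volumes cause full-track I/Os.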
HPFS386 CACHE with OS/2 Warp Server Advanced and OS/2 Lan Server Advanced
OS/2 Warp Server Advanced and OS/2 LanServer Advanced provide an optional installable file system named HPFS386. HPFS386 provides the ability to specify caches larger than 2M. If you choose to exploit HPFS386 for larger caches, then this cache must be tuned. Refer to the Warp Server or LanServer Advanced documentation for information on tuning this cache. It is similar to tuning the base OS/2 HPFS cache, but is done with a file named HPFS386.INI.
See Table 2. VM/ESA measurements on PC Servers earlier in this document for an example of possible benefits of using a cache greater than 2M in size.
LAZY WRITES
HPFS386 allows greater granularity in controlling write policies than does basic HPFS. You can allow lazy writes on a drive partition basis with HPFS386. For example, lazy writes can be turned off for drive C: but on for drive D:. With basic HPFS, lazy writes are on or off for the entire system.
Lazy writes default to ON with OS/2's HPFS and also HPFS386. If lazy writes are enabled, then when a write occurs for a block of data that is eligible for the HPFS cache, the application is given completion status before the data is actually written to the hard drive. The data is actually written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a significant performance enhancement, especially if accessing data on hard drives through a SCSI adapter that does not have a hardware cache aboard. (The IBM RAID adapters used in the IBM PC Server System/390 have a hardware cache. Other SCSI adapters added by customers to their systems may not have a cache.)
WARNING: There is a risk to the data in the event of an OS/2 software failure or power loss before the data is written from the cache to the hard drive. See section "Write policy" for related information. You can control whether lazy writes are enabled, as well as the maximum age and idle times for the disk and cache buffers, with the OS/2 CACHE command (or the CACHE386 command if using HPFS386). Enter HELP CACHE at the OS/2 command prompt for further information, or CACHE386 ? for help with CACHE386. Initial write policies for HPFS386 drives can also be set in HPFS386.INI.
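For example (the values are illustrative only, not measured recommendations), lazy writes and buffer aging for the base HPFS cache could be set from an OS/2 command prompt or STARTUP.CMD with something like:

```
CACHE /LAZY:ON /MAXAGE:7500 /DISKIDLE:1000 /BUFFERIDLE:500
```

MAXAGE, DISKIDLE, and BUFFERIDLE are specified in milliseconds; HELP CACHE documents the valid ranges. HPFS386 users would make the equivalent settings with CACHE386 or in HPFS386.INI.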
OS/2 FAT CACHE
Lazy writes default to ON with OS/2's FAT DISKCACHE. If lazy writes are enabled, then when a write occurs for a block of data that is eligible for the FAT cache, the application is given completion status before the data is actually written to the hard drive. The data is actually written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a significant performance enhancement, especially if accessing data on hard drives through a SCSI adapter that does not have a hardware cache aboard. (The IBM RAID adapters used in the IBM PC Server System/390 have a hardware cache. Other SCSI adapters added by customers to their systems may not have a cache.)
WARNING: There is a risk to the data in the event of an OS/2 software failure or power loss before the data is written from the cache to the hard drive. See section "Write policy" for related information. You can control whether or not lazy writes occur for the FAT cache with parameters on the DISKCACHE= statement in CONFIG.SYS. Enter HELP DISKCACHE at the OS/2 command prompt for more information on DISKCACHE parameters.
OS/2 CONFIG.SYS TUNING
MAXWAIT in CONFIG.SYS defines the number of seconds that an OS/2 thread waits before being assigned a higher dispatching priority. Applications that are I/O intensive could benefit from setting MAXWAIT=1 in CONFIG.SYS. Since the S/390 operating system running on the PC Server System/390 is likely to be I/O intensive, setting MAXWAIT=1 is generally recommended on the PC Server System/390. The valid range for MAXWAIT is 1 to 255; the OS/2 default is 3 seconds. Tuning this setting may only show results when there is other OS/2 work being performed in addition to the S/390 workload.
If your PC Server System/390 has no FAT-formatted partitions, then the DISKCACHE= statement can be commented out (REM) in the PC Server System/390's CONFIG.SYS in order to save some memory. By default, OS/2 places this statement in CONFIG.SYS. The size of the DISKCACHE may be tuned. Enter HELP DISKCACHE for information on the parameters that may be specified on DISKCACHE.
PRIORITY_DISK_IO
The PRIORITY_DISK_IO statement in the CONFIG.SYS file controls whether or not an application running in the foreground of the OS/2 desktop receives priority for its disk accesses over an application running in the background. Because the S/390 operating system is probably serving multiple clients accessing the system over the LAN or other communication methods, you would not want users of the S/390 operating system to receive lower priority for S/390 I/Os in the event someone opens an OS/2 application or window in the foreground.
Specifying PRIORITY_DISK_IO=NO is recommended. NO specifies that all applications (foreground and background) are to be treated equally with regard to disk access. The default is YES. YES specifies that applications running in the foreground are to receive priority for disk access over applications running in the background.
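Pulling the CONFIG.SYS suggestions in this section together, an illustrative fragment (the comments and the DISKCACHE size are examples, not measured values) might look like:

```
REM No FAT partitions on this server, so the FAT cache is disabled:
REM DISKCACHE=64,LW
REM I/O-intensive S/390 work: promote long-waiting threads quickly:
MAXWAIT=1
REM Do not let foreground desktop applications preempt S/390 disk I/O:
PRIORITY_DISK_IO=NO
```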
S/390 DASD DEVICE DRIVERS
FUNCTIONAL DIFFERENCES IN DASD EMULATION
The AWSCKD device driver has some functional differences when compared with the AWSFBA device driver. Independent of how much data is requested in the channel program, the AWSCKD device driver reads and writes a full emulated CKD device track when an I/O is performed. The device driver has an internal cache where the track is kept until it must be flushed. Because the AWSFBA device driver does not implement an internal cache, the performance characteristics of the two can differ depending upon the I/O workload.
VM/ESA ESA Feature's block paging methodology seemed to benefit from the internal cache of the AWSCKD device driver in controlled laboratory experiments. Analysis showed the primary reason: VM/ESA uses data chaining in the channel program to perform a block page I/O to an FBA paging device, but does not use data chaining for block page I/O to CKD or ECKD devices. When a block page I/O is performed to an FBA page volume on a P/390, the resulting OS/2 I/O appears as individual 4K I/Os to the OS/2 file holding the FBA page volume. I/Os to a CKD paging volume on a P/390 always resolve to a full track read (48K in the case of an emulated 3380, 56K for a 3390) from OS/2's perspective. Fewer full track reads yielded better performance than individual 4K page reads in the laboratory measurements, though the difference was much smaller in the PC Server 520 study than in the PC Server 500 study due to improvements in the P/390 support code. Given that the subsequent pages in the requested block are likely to be within the track in AWSCKD's buffer, those pages can be satisfied from the buffer without further OS/2 I/Os. For this reason, you should consider using 3380 or 3390 volumes for VM/ESA ESA Feature CP paging volumes. (Note: The pre-configured VM/ESA systems for P/390 and R/390 configure only a token amount of CP paging on an FBA volume when first installed from the CDROM.)
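The block-paging behavior described above can be sketched with a toy model. The 48K track size and 4K page size come from the text; the assumption that the whole block of pages falls on a single emulated 3380 track is hypothetical and only illustrates the best case for AWSCKD:

```python
# Rough model of the OS/2-level I/O generated by one CP block page-in.
# Assumptions (illustrative, from the text): pages are 4K; AWSCKD always
# reads a full emulated track (48K for a 3380) and caches it; AWSFBA
# issues one OS/2 I/O per 4K page; the whole block fits on one CKD track.

PAGE = 4 * 1024
TRACK_3380 = 48 * 1024

def fba_block_pagein(pages):
    """AWSFBA: one 4K OS/2 I/O per page in the block."""
    ios = pages
    bytes_read = pages * PAGE
    return ios, bytes_read

def ckd_block_pagein(pages):
    """AWSCKD: a single full-track read; later pages in the block are
    then satisfied from the driver's track buffer with no further I/O."""
    ios = 1
    bytes_read = TRACK_3380
    return ios, bytes_read

# A 10-page block page-in:
print(fba_block_pagein(10))   # 10 OS/2 I/Os, 40960 bytes
print(ckd_block_pagein(10))   # 1 OS/2 I/O, 49152 bytes
```

The CKD path moves slightly more data but performs far fewer OS/2 I/Os, which is the effect the laboratory measurements observed for CP paging.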
You should not generalize this observation into a statement that AWSCKD performs better than AWSFBA. In fact, AWSFBA DASD volumes performed extremely well in laboratory experiments and offer some benefits over AWSCKD, including finer granularity in OS/2 file allocation sizes, less Pentium time to handle S/390 I/Os, and a close mapping to the underlying sectors of the DASD media. VM/ESA and VSE/ESA utilize FBA DASD in a very efficient manner. The flexibility of the PC Server System/390 in supporting a mixture of both CKD and FBA emulated volumes allows you to easily have both types in your configuration.
For CMS workloads, think carefully about how CMS file I/O occurs from an OS/2 perspective and you can see that there are some pathological cases. Consider a CMS minidisk formatted in 4K blocks that contains a CMS file occupying multiple 4K blocks, fragmented across several different tracks on the minidisk. If the minidisk is an FBA volume, then when the OS/2 I/Os are performed to satisfy the emulated channel program, there would be a series of 4K I/Os performed by OS/2. However, if the minidisk is on an emulated CKD volume (for example, a 3380), then the same CMS I/O will result in multiple full track reads (48K each) by AWSCKD. The less data required from each track (a single 4K block in the pathological case), the more expensive the I/O from a PC Server perspective. If other 4K blocks of the file are within a single track, then it is possible to get some benefit from the AWSCKD track cache in the same manner as CP block paging does.
LAN ADAPTERS
Newer technology LAN adapters such as IBM's Streamer family are highly recommended for maximizing the communications throughput of the PC Server System/390.
XMITBUFSIZE IF USING THE IBM 16/4 TOKEN RING ADAPTER/A
Information in this section is specific to the named adapter and does not apply to the "Streamer" family of IBM LAN adapters.
The value of XMITBUFSIZE (Transmit Buffer Size) is tunable for this adapter card. The default value used for IBM's 16/4 Token Ring Adapter/A may be a poor choice if you are using VTAM for communications between two VTAM subareas. When performing full-screen operations such as XEDIT under VM/CMS, the buffer used by VTAM will exceed the XMITBUFSIZE specified in PROTOCOL.INI and cause segmentation. For example, when using a 16/4 Token Ring Adapter/A in a laboratory environment, multi-second response time was observed while scrolling in XEDIT when logged on via a cross-domain VTAM session from one PC Server System/390 to another. Increasing the value of XMITBUFSIZE so that it was larger than the VTAM RUSIZE restored response time to its expected sub-second value. A rule of thumb for tuning XMITBUFSIZE:
z = (VTAM RUSIZE in bytes) + 9 + 40
minimum XMITBUFSIZE = Round-to-next-highest-multiple-of-eight( z )
where the "9" is the nine bytes for the transmit header and the request header, and the "40" is some extra to give a little room for bytes that may not be accounted for at this time. Note that there are different maximums for XMITBUFSIZE depending on whether your token ring is a 4Mbit or 16Mbit ring. For example, the maximum size of XMITBUFSIZE for the IBM 16/4 Token Ring Adapter/A on a 4Mbit ring is 4456. Other older adapters have limits that are smaller still for 4Mbit rings.
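The rule of thumb above is easy to capture in a small helper (a sketch; the RUSIZE value is site-specific, and the result must still respect your adapter's maximum):

```python
def min_xmitbufsize(rusize):
    """Rule of thumb from the text: RUSIZE plus 9 bytes of transmit and
    request headers plus 40 bytes of slack, rounded up to the next
    multiple of eight."""
    z = rusize + 9 + 40
    return ((z + 7) // 8) * 8

# For a VTAM RUSIZE of 1024 bytes:
print(min_xmitbufsize(1024))  # 1080
```

Remember to cap the computed value at the adapter's maximum for your ring speed, such as 4456 for the IBM 16/4 Token Ring Adapter/A on a 4Mbit ring.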
It should also be noted that in this particular situation, when REMOTE was set ON under CMS/XEDIT, data compression performed by CMS for fullscreen I/O also restored sub-second response time. This indicates the continued value of this virtual machine setting in tuning for VTAM use in a VM environment.
VM/ESA GENERAL TUNING
In general, anything that can be done to avoid CP having to perform an I/O is likely to provide better performance. This is especially true on P/390 and R/390 since they both rely on emulation to complete the I/O. Therefore, work done to avoid I/O that must be handled outside the S/390 processor is usually worth the effort. Make sure you take advantage of data-in-memory techniques available in VM/ESA today (such as shared segments and VM dataspaces) to help CP avoid paging. Some recent references for data-in-memory techniques and paging performance can be found in the following documents on the world wide web at http://www.vm.ibm.com/perf.
- VM/ESA data in memory techniques - A discussion of the relative merits of saved segments, VM data spaces, minidisk caching, virtual disks in storage, and NUCXLOAD/EXECLOAD for improving performance.
- CMS Saved Segment Management - describes the purpose, use, and location requirements for various segments shipped with CMS.
- Understanding poor performance due to paging increases - A brief explanation of what can cause a system to page more and how to determine that cause.