VM/ESA® Performance on P/390 and R/390
PC Server 520 and RS/6000 591
Gary Eheman
IBM VM/ESA® World Wide Marketing Programs
eheman@vnet.ibm.com
Last updated December 04, 1996
Table 1. PC Server hardware and software configurations measured.

| System | PC Server 500 (old) | PC Server 520 (new) |
|---|---|---|
| PC processor | 90 MHz Pentium with 256K of L2 cache (Synchrostream) | 133 MHz Pentium with 512K of L2 asynchronous cache |
| Hard drives | 3 | 5 |
| Approximate capacity/drive | 2.25G Fast/Wide | 2.25G Fast/Wide |
| Array stripe width | 64K | 64K |
| Array Adapter | IBM F/W Streaming RAID Adapter | IBM SCSI-2 PCI-BUS RAID adapter |
| Adapter cache settings | Read ahead=ON Writeback=OFF | Read ahead=ON Writeback=ON |
| OS/2 file system | HPFS | HPFS386 |
| Logical drive type | RAID-5 | RAID-5 |
| Number of logical drives | 2 | 2 |
| Number of partitions per logical drive | 1 | 1 |
| LAN Adapter | IBM AutoLANStreamer MC 32 | IBM AutoLANStreamer MC 32 |
| OS/2 Level | V3 Warp Fullpack | V4 Warp Server Advanced |
| P/390 Support code level | March 1995 | Version 2.2 |
Of particular interest in Table 1 are the following items.
End-user response times were measured, and the workload on the measured VM/ESA system was driven, using IBM's Teleprocessing Network Simulator (TPNS). TPNS drives actual VTAM cross-domain logons to the VSCS APPL on the target system. The simulated users log on to CMS and run the workload scripts with think times and randomness like those of an actual environment.
Brief workload description
The workload executed on the measured VM/ESA system is the FS8F CMS interactive workload. This is basically a CMS program development workload. A few highlights of things done by the users in the workload are
Results and comparisons of PC Server S/390 runs
The following table contains information from runs performed on the PC Server 500 S/390 in 1995 as well as PC Server 520 S/390 runs.
Table 2. VM/ESA measurements on PC Servers.

| RUN ID | PC7E5190 | PC520192 | PC520190 | PC520200 |
|---|---|---|---|---|
| Variables | | | | |
| System | | | | |
| S/390 Memory | | | | |
| P/390 code level | | | | |
| OS/2 Cache size | | | | |
| CMS Users | | | | |
| Response Time | | | | |
| TRIV INT | | | | |
| AVG LAST (TPNS) | | | | |
| S/390 Proc util. | | | | |
| Total (VMPRF) | | | | |
| I/O | | | | |
| RIO Rate (VMPRF) | | | | |
| MDC Real Size (MB) | | | | |
| MDC Hit Ratio | | | | |
| SPM/2 | | | | |
| Pentium Util. | | | | |
| I/O Req/sec | | | | |
| HPFS hit% | | | | |
| HPFS386 hits r%/w% | | | | |
PC Server runs discussion
A few points of discussion can be made about this data.
The following table describes the RISC System/6000 that was used during laboratory measurements of VM/ESA.
Table 3. RISC System/6000 hardware and software configuration measured.

| System | RISC System/6000 Model 591 |
|---|---|
| Installed memory for AIX | 64M |
| Hard drives | 3 |
| Approximate capacity per drive | 4.5G |
| Array adapter | 7137 Tower |
| Logical drive type | RAID-5 |
| LAN Adapter | IBM AutoLANStreamer MC 32 |
| AIX Level | 4.1.4 |
| P/390 Support code level | 4.1.0.0 |
Additionally, for the run named PR591201, the same system was used but the VM/ESA DASD volumes were spread across three 4.5G internal hard drives (non array) instead of the 7137 tower.
R/390 Measurement results
The following table contains data for the performance runs executed on the R/390 model 591.
Table 4. VM/ESA measurements on R/390 model 591.

| RUN ID | PR591190 | PR591200 | PR591201 | PR591220 |
|---|---|---|---|---|
| Variables | | | | |
| System | | | | |
| S/390 Memory | | | | |
| I/O subsystem | | | | |
| P/390 code level | | | | |
| AIX memory | | | | |
| CMS users | | | | |
| Response Time | | | | |
| TRIV INT | | | | |
| AVG LAST (TPNS) | | | | |
R/390 runs discussion
This section contains performance and tuning hints gathered from laboratory measurements and experience during the 1995 performance study for the PC Server 500 System/390. It is intended to provide information useful in planning the installation and tuning of a PC Server System/390. Some of the information may also be useful in a RISC System/6000 with Server on Board (R/390) environment.
PC SERVER SYSTEM/390 ARRAY CONSIDERATIONS
ARRAY STRIPE UNIT SIZE
On array models of the PC Server System/390, the customer sets the stripe unit size (amount of data written on a given disk before writing on the next disk). The default stripe unit size is 8K. Choices are 8K, 16K, 32K, and 64K. Sizes larger than 8K will probably yield better performance for S/390 workloads than the default 8K.
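To see why the stripe unit size matters for S/390 I/O, the following minimal Python sketch counts how many stripe units (and therefore potentially how many physical drives) a single OS/2 request touches. It is a simplified model under stated assumptions: RAID-5 parity placement is ignored, and the 48K request size is taken from the emulated 3380 full-track read described in the section S/390 DASD Device Drivers. The function name is hypothetical.

```python
# Sketch: how many stripe units does one request span?
# Assumption: simple striping model; RAID-5 parity rotation is ignored.

KB = 1024

def stripe_units_touched(offset: int, length: int, stripe_unit: int) -> int:
    """Return the number of stripe units covered by bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return last - first + 1

if __name__ == "__main__":
    track_read = 48 * KB  # emulated 3380 full-track read
    for unit_kb in (8, 16, 32, 64):
        n = stripe_units_touched(0, track_read, unit_kb * KB)
        print(f"{unit_kb:>2}K stripe unit: {n} unit(s) for an aligned 48K read")
```

With an aligned 48K read, an 8K stripe unit splits the request across six units, while a 64K unit keeps it within one. An unaligned read (for example, starting 32K into a unit) can still straddle two 64K units.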
Also consider the I/O characteristics of any other OS/2 applications that you may run concurrently on the PC Server 500 System/390 when choosing a stripe unit size. For example, larger stripe sizes may not be the best performing choice for LAN file serving workloads. A compromise between larger and smaller stripe sizes might be in order depending on the overall system I/O characteristics.
WARNING: Once the stripe unit is chosen and data is stored in the logical drives, the stripe unit cannot be changed without destroying data in the logical drives.
WRITE POLICY
There are two choices for write policy with the RAID adapter. The default write policy is write-through (WT), where the completion status is sent after the data is written to the hard disk drive.
To improve performance, you can change this write policy to write-back (WB) where the completion status is sent after the data is copied to the RAID adapter's cache memory, but before the data is actually written to the storage device. There is 4MB of cache memory, of which more than 3MB are available for caching data.
WARNING: If you lose power before the data is actually written to the storage device, data in cache memory is lost. See also the section "Lazy writes" for related information.
You can achieve a performance improvement by using WB, but you run a far greater risk of data loss in the event of a power loss. An uninterruptible power supply (UPS) can help minimize this risk and is highly recommended for this reason and for the other power protection benefits it supplies as well.
READ AHEAD
The IBM RAID adapters used in the PC Server S/390 include a function called Read Ahead caching. When a read I/O is satisfied from one of the hard drives in an array, the RAID adapter controller stages the remainder of that stripe, up to the end of the stripe, into the adapter's cache. This can be beneficial for sequential processing. It is less beneficial for random reads, where transferring data to the cache that will not be needed can waste SCSI bandwidth and increase disk utilization.
There are no rules of thumb for setting Read Ahead on or off. It depends on the workload and, to some degree, on the kind of emulated DASD (see the section S/390 DASD Device Drivers later in this document). Experimenting with the setting on and off may be the only way to discover the benefits for a workload. The Read Ahead setting can be toggled on or off for logical drives within an array by booting from the RAID controller diskette. When logical drives are created, Read Ahead defaults to ON.
OS/2 HPFS CACHE
BASE OS/2 SYSTEM HPFS CACHE SIZE
The HPFS.IFS device driver delivered with OS/2 WARP has a maximum cache size of 2048K (2 Megabytes). The /CACHE:NNNN parameter of the IFS device driver specifies the size of the cache. The default is 10% of available RAM (if not specified) with a maximum of 2048K. The specified value after an install of OS/2 is dependent on installed RAM at the time of installation. If you are using the standard OS/2 provided IFS device driver, then specifying /CACHE:2048 is highly recommended. Enter HELP HPFS.IFS at the OS/2 command prompt for further explanation of the parameters.
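The default-sizing rule above is simple arithmetic. As a minimal sketch, the Python functions below apply it (10% of available RAM, capped at 2048K) and build an IFS= statement with the recommended explicit /CACHE:2048. The C:\OS2\HPFS.IFS path and both helper names are illustrative assumptions; check the IFS= line already in your CONFIG.SYS and HELP HPFS.IFS for the exact parameters on your system.

```python
# Sketch of the HPFS.IFS cache-size rule described above.
# Assumptions: RAM is given in KB; the IFS path below is illustrative only.

HPFS_MAX_CACHE_KB = 2048  # maximum cache for the base OS/2 Warp HPFS.IFS

def default_hpfs_cache_kb(available_ram_kb: int) -> int:
    """Default /CACHE value if none is specified: 10% of available RAM, max 2048K."""
    return min(available_ram_kb // 10, HPFS_MAX_CACHE_KB)

def hpfs_ifs_line(cache_kb: int = HPFS_MAX_CACHE_KB,
                  ifs_path: str = r"C:\OS2\HPFS.IFS") -> str:
    """Build an IFS= statement with an explicit /CACHE value (hypothetical helper)."""
    if not 0 < cache_kb <= HPFS_MAX_CACHE_KB:
        raise ValueError(f"/CACHE must be positive and at most {HPFS_MAX_CACHE_KB}K")
    return f"IFS={ifs_path} /CACHE:{cache_kb}"

print(default_hpfs_cache_kb(16 * 1024))  # 16MB available -> 1638
print(default_hpfs_cache_kb(64 * 1024))  # 64MB available -> capped at 2048
print(hpfs_ifs_line())                   # IFS=C:\OS2\HPFS.IFS /CACHE:2048
```

On a machine with plenty of RAM the default already reaches the cap, but specifying /CACHE:2048 explicitly removes any dependence on how much RAM was installed when OS/2 was set up.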
/CRECL ON IFS HPFS CACHE
The /CRECL parameter of the HPFS IFS driver allows you to specify the size of the largest record eligible for this cache. The OS/2 default is 4K. From a S/390 perspective, increasing this value may increase cache read hits if the S/390 operating system is performing repetitive I/Os of the same data in blocks bigger than the default 4K. You can use performance analysis tools for each S/390 operating system to understand the characteristics of I/Os that are being performed by the S/390 operating system and applications. OS/2 performance tools like IBM's SPM/2 V2 can also assist in tuning the /CRECL value.
If using emulated CKD devices (for example 3380 or 3390), then you should increase /CRECL to its maximum of 64K. Otherwise you are unlikely to ever have a cache hit for the OS/2 file holding the emulated volume. For further explanation of this, see section S/390 DASD Device Drivers.
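The reason a 4K /CRECL defeats caching for emulated CKD volumes can be shown with a one-line eligibility check. This minimal Python sketch assumes, as the text states, that every AWSCKD I/O reaches OS/2 as a full-track request (about 48K for a 3380, 56K for a 3390) and that a request is cacheable only if it is no larger than /CRECL. The function name is hypothetical.

```python
# Sketch: is an OS/2 request small enough to be eligible for the HPFS cache?
# Assumption: eligibility is simply request size <= /CRECL, per the text above.

CRECL_MAX_KB = 64

# Approximate full-track request sizes issued by AWSCKD (from the text)
TRACK_READ_KB = {"3380": 48, "3390": 56}

def hpfs_cache_eligible(request_kb: int, crecl_kb: int) -> bool:
    if not 0 < crecl_kb <= CRECL_MAX_KB:
        raise ValueError("/CRECL must be positive and at most 64K")
    return request_kb <= crecl_kb

for dev, size_kb in TRACK_READ_KB.items():
    for crecl in (4, CRECL_MAX_KB):
        ok = hpfs_cache_eligible(size_kb, crecl)
        print(f"{dev} track ({size_kb}K) with /CRECL:{crecl} -> eligible={ok}")
```

Under this model a full-track request is never eligible at the 4K default, and both emulated track sizes become eligible at 64K.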
Enter HELP HPFS.IFS at the OS/2 command prompt for further explanation of the parameters.
HPFS386 CACHE with OS/2 Warp Server Advanced and OS/2 LAN Server Advanced
OS/2 Warp Server Advanced and OS/2 LAN Server Advanced provide an optional installable file system named HPFS386. HPFS386 lets you specify caches larger than 2M. If you exploit HPFS386 for larger caches, then you must tune this cache. Refer to the Warp Server or LAN Server Advanced documentation for information on tuning it. Tuning is similar to tuning the base OS/2 HPFS cache, but is done with a file named HPFS386.INI.
See Table 2. VM/ESA measurements on PC Servers earlier in this document for an example of possible benefits of using a cache greater than 2M in size.
HPFS386 allows greater granularity in controlling write policies than does basic HPFS. You can allow lazy writes on a drive partition basis with HPFS386. For example, lazy writes can be turned off for drive C: but on for drive D:. With basic HPFS, lazy writes are on or off for the entire system.
LAZY WRITES
Lazy writes default to ON with OS/2's HPFS and also with HPFS386. With lazy writes enabled, when a write occurs for a block of data that is eligible for the HPFS cache, the application is given completion status before the data is actually written to the hard drive. The data is written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a significant performance enhancement, especially when accessing data on hard drives through a SCSI adapter that has no hardware cache. (The IBM RAID adapters used in the IBM PC Server System/390 have a hardware cache. Other SCSI adapters added by customers to their systems may not.)
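The lazy-write behavior described above can be illustrated with a small write-back cache model. This Python sketch is a deliberate simplification, not OS/2's actual cache algorithm: a write completes as soon as the block is in the cache, and dirty blocks are flushed either when the system is idle or when they reach a maximum age. Times are passed in explicitly so the example is deterministic; the class and parameter names are hypothetical.

```python
# Simplified lazy-write (write-back) cache model; not the real HPFS implementation.
from dataclasses import dataclass, field

@dataclass
class LazyWriteCache:
    max_age: float                                 # seconds a dirty block may wait
    dirty: dict = field(default_factory=dict)      # block -> (data, time first dirtied)
    disk: dict = field(default_factory=dict)       # simulated hard drive contents

    def write(self, block: int, data: bytes, now: float) -> str:
        # Completion is reported as soon as the data is cached, before any disk write.
        first_dirty = self.dirty.get(block, (None, now))[1]
        self.dirty[block] = (data, first_dirty)
        return "complete"

    def tick(self, now: float, idle: bool) -> list:
        """Flush all dirty blocks if idle, otherwise only those at or past max_age."""
        flushed = []
        for block, (data, since) in list(self.dirty.items()):
            if idle or now - since >= self.max_age:
                self.disk[block] = data
                del self.dirty[block]
                flushed.append(block)
        return flushed

cache = LazyWriteCache(max_age=5.0)
cache.write(1, b"page-1", now=0.0)
cache.write(2, b"page-2", now=3.0)
print(cache.tick(now=5.0, idle=False))  # [1]  block 1 reached its maximum age
print(cache.tick(now=6.0, idle=True))   # [2]  idle time flushes the rest
```

Anything still in `dirty` when power is lost never reaches `disk`, which is exactly the exposure described in the warning that follows.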
WARNING: There is a risk to the data in the event of an OS/2 software failure or power loss before the data is written from the cache to the hard drive. See the section "Write policy" for related information. You can control whether lazy writes are enabled, as well as the maximum age and idle times for the disk and cache buffers, with the OS/2 CACHE command (or the CACHE386 command if you are using HPFS386). Enter HELP CACHE at the OS/2 command prompt for further information. Enter CACHE386 ? for help with CACHE386. Initial write policies for HPFS386 drives can also be set in HPFS386.INI.
OS/2 FAT CACHE
LAZY WRITES
Lazy writes default to ON with OS/2's FAT DISKCACHE. With lazy writes enabled, when a write occurs for a block of data that is eligible for the FAT cache, the application is given completion status before the data is actually written to the hard drive. The data is written to the hard drive during idle time or when the maximum age for the data is reached. Lazy writes are a significant performance enhancement, especially when accessing data on hard drives through a SCSI adapter that has no hardware cache. (The IBM RAID adapters used in the IBM PC Server System/390 have a hardware cache. Other SCSI adapters added by customers to their systems may not.)
WARNING: There is a risk to the data in the event of an OS/2 software failure or power loss before the data is written from the cache to the hard drive. See the section "Write policy" for related information. You can control whether lazy writes occur for the FAT cache with parameters on the DISKCACHE= statement in CONFIG.SYS. Enter HELP DISKCACHE at the OS/2 command prompt for more information on DISKCACHE parameters.
OS/2 CONFIG.SYS TUNING
MAXWAIT
MAXWAIT in CONFIG.SYS defines the number of seconds that an OS/2 thread waits before being assigned a higher dispatching priority. I/O-intensive applications can benefit from setting MAXWAIT=1 in CONFIG.SYS. Because the S/390 operating system running on the PC Server System/390 is likely to be I/O intensive, setting MAXWAIT=1 is generally recommended on the PC Server System/390. The valid range for MAXWAIT is 1 to 255; the OS/2 default is 3 seconds. Changing this setting may show results only when other OS/2 work is running in addition to the S/390 workload.
FAT DISKCACHE
If your PC Server System/390 has no FAT-formatted partitions, then the DISKCACHE= statement can be commented out (with REM) in the PC Server System/390's CONFIG.SYS to save some memory. By default, OS/2 places this statement in CONFIG.SYS. The size of the DISKCACHE can be tuned; enter HELP DISKCACHE for information on the parameters that can be specified on DISKCACHE.
PRIORITY_DISK_IO
This command in the CONFIG.SYS file controls whether or not an application running in the foreground of the OS/2 desktop receives priority for its disk accesses over an application running in the background. Because the S/390 operating system is probably serving multiple clients accessing the system over LAN or other communication methods, you would not want users of the S/390 operating system to receive lower priority for the S/390 I/Os in the event someone opens an OS/2 application or window in the foreground.
Specifying PRIORITY_DISK_IO=NO is recommended. NO specifies that all applications (foreground and background) are to be treated equally with regard to disk access. The default is YES. YES specifies that applications running in the foreground are to receive priority for disk access over applications running in the background.
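The CONFIG.SYS recommendations in this section can be applied mechanically. The following Python sketch rewrites a list of CONFIG.SYS lines: it sets MAXWAIT=1 and PRIORITY_DISK_IO=NO (appending them if absent) and, only when told there are no FAT partitions, comments out the DISKCACHE= statement with REM. The function name and the sample DISKCACHE=D,LW line are illustrative assumptions, not taken from a measured system.

```python
# Sketch: apply the CONFIG.SYS tuning recommendations described above.

def tune_config_sys(lines, has_fat_partitions=True, maxwait=1):
    if not 1 <= maxwait <= 255:
        raise ValueError("MAXWAIT must be between 1 and 255")
    wanted = {"MAXWAIT": str(maxwait), "PRIORITY_DISK_IO": "NO"}
    seen = set()
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip().upper()
        if key in wanted:
            out.append(f"{key}={wanted[key]}")
            seen.add(key)
        elif key == "DISKCACHE" and not has_fat_partitions:
            out.append("REM " + line)  # saves memory when no FAT partitions exist
        else:
            out.append(line)           # everything else (including REM lines) unchanged
    # Append any recommended setting that was not already present
    out.extend(f"{k}={v}" for k, v in wanted.items() if k not in seen)
    return out

sample = ["PRIORITY_DISK_IO=YES", "DISKCACHE=D,LW", "MAXWAIT=3"]
for line in tune_config_sys(sample, has_fat_partitions=False):
    print(line)
```

Because an already-commented `REM DISKCACHE=...` line does not parse as a DISKCACHE key, running the helper twice leaves the file unchanged after the first pass.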
S/390 DASD DEVICE DRIVERS
FUNCTIONAL DIFFERENCES IN DASD EMULATION
The AWSCKD device driver has some functional differences when compared with the AWSFBA device driver. Independent of how much data is requested in the channel program, the AWSCKD device driver reads and writes a full emulated CKD track when an I/O is performed. The device driver keeps the track in an internal cache until it must be flushed. Because the AWSFBA device driver does not implement an internal cache, the performance characteristics of the two can differ depending on the I/O workload. VM/ESA ESA Feature's block paging methodology seemed to benefit from the internal cache of the AWSCKD device driver in controlled laboratory experiments. Analysis showed that the primary reason is that VM/ESA uses data chaining in the channel program to perform a block page I/O to an FBA paging device, but does not use data chaining for block page I/O to CKD or ECKD devices. When a block page I/O is performed to an FBA page volume on P/390, the resulting OS/2 I/O appears as individual 4K I/Os to the OS/2 file holding the FBA page volume. I/Os to a CKD paging volume on a P/390 always resolve to a full track read (48K for an emulated 3380, 56K for a 3390) from OS/2's perspective. Fewer full-track reads yielded better performance than individual 4K page reads in the laboratory measurements, though the difference was much smaller in the PC Server 520 study than in the PC Server 500 study because of improvements in the P/390 support code. Because the subsequent pages in the requested block are likely to be within the track in AWSCKD's buffer, those pages can be satisfied from the buffer without further OS/2 I/Os. For this reason, consider using 3380 or 3390 volumes for VM/ESA ESA Feature CP paging volumes. (Note: The preconfigured VM/ESA systems for P/390 and R/390 configure only a token amount of CP paging on an FBA volume when first installed from the CD-ROM.)
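The block-paging effect can be approximated by counting OS/2-level I/Os. In this minimal Python sketch, an FBA block page request becomes one 4K OS/2 I/O per page, as described above, while a CKD request becomes one full-track read per distinct track, with the other pages on that track served from the AWSCKD track buffer. The pages-per-track values (10 for a 3380, 12 for a 3390) are assumptions for 4K page slots, the slot numbers are hypothetical, and buffer reuse across separate requests is not modeled.

```python
# Sketch: OS/2-level I/Os for one CP block page read, FBA vs emulated CKD.
# Assumptions: 4K pages; the pages-per-track values below are illustrative.

PAGES_PER_TRACK = {"3380": 10, "3390": 12}  # assumed 4K page slots per track

def fba_os2_ios(page_slots):
    """FBA: each 4K page in the block becomes its own OS/2 I/O."""
    return len(page_slots)

def ckd_os2_ios(page_slots, device="3380"):
    """CKD: one full-track read per distinct track; other pages hit the track buffer."""
    per_track = PAGES_PER_TRACK[device]
    return len({slot // per_track for slot in page_slots})

block = list(range(20, 28))  # a hypothetical block of 8 contiguous page slots
print("FBA OS/2 I/Os: ", fba_os2_ios(block))          # 8
print("3380 OS/2 I/Os:", ckd_os2_ios(block, "3380"))  # 1 (slots 20-27 share a track)
print("3390 OS/2 I/Os:", ckd_os2_ios(block, "3390"))  # 2 (block crosses a track boundary)
```

The trade-off is fewer but larger OS/2 I/Os (one 48K track read instead of eight 4K reads in the 3380 case), which is the pattern the laboratory measurements favored for CP paging.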
You should not generalize this observation into a statement that AWSCKD performs better than AWSFBA. In fact, AWSFBA DASD volumes performed extremely well in laboratory experiments and offer some benefits over AWSCKD, including finer granularity in OS/2 file allocation sizes, less Pentium time to handle S/390 I/Os, and a close mapping to the underlying sectors of the DASD media. VM/ESA and VSE/ESA use FBA DASD very efficiently. Because the PC Server System/390 supports a mixture of CKD and FBA emulated volumes, you can easily have both types in your configuration.
For CMS workloads, think carefully about how CMS file I/O occurs from an OS/2 perspective, and you can see that there are some pathological cases. Consider a CMS minidisk formatted in 4K blocks that contains a CMS file occupying multiple 4K blocks, fragmented across several different tracks on the minidisk. If the minidisk is on an FBA volume, then after the channel program is emulated, OS/2 satisfies the I/O with a series of 4K I/Os. However, if the minidisk is on an emulated CKD volume (for example 3380), then the same CMS I/O results in multiple full-track reads (48K) by AWSCKD. The less data required from each track (a single 4K block in the pathological case), the more expensive the I/O from a PC Server perspective. If other 4K blocks of the file are within a single track, then it is possible to get some benefit from the AWSCKD track cache in the same manner as CP block paging does.
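The pathological case is easiest to see as the amount of data OS/2 moves compared with what the CMS file actually needs. This Python sketch assumes a 4K-block CMS file whose blocks are identified by the emulated track they live on; it charges FBA 4K per block and an emulated 3380 one 48K full-track read (the figure from the text) per distinct track. The track numbers and the function name are hypothetical.

```python
# Sketch: OS/2 read volume for a CMS file, FBA vs emulated 3380.
# Assumption: each distinct track is read once per request (track buffer reused).

BLOCK_KB = 4
TRACK_KB_3380 = 48

def os2_kb_read(block_tracks, ckd):
    """KB transferred by OS/2 to read the given 4K blocks (one track number per block)."""
    if ckd:
        return len(set(block_tracks)) * TRACK_KB_3380
    return len(block_tracks) * BLOCK_KB

fragmented = [3, 17, 42, 88]  # four blocks, each on a different track
contiguous = [5, 5, 5, 5]     # four blocks on the same track

for name, tracks in (("fragmented", fragmented), ("contiguous", contiguous)):
    needed = len(tracks) * BLOCK_KB
    print(f"{name}: need {needed}K, FBA reads {os2_kb_read(tracks, ckd=False)}K, "
          f"3380 reads {os2_kb_read(tracks, ckd=True)}K")
```

In this model the fragmented file costs 192K of track reads to deliver 16K of data, a twelvefold amplification, while the contiguous file costs a single 48K read.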
LAPS TUNING
Newer technology LAN adapters such as IBM's Streamer family are highly recommended for maximizing the communications throughput of the PC Server System/390.
XMITBUFSIZE IF USING THE IBM 16/4 TOKEN RING ADAPTER/A
Information in this section is specific to the named adapter and does not apply to the "Streamer" family of IBM LAN adapters.
XMITBUFSIZE (Transmit Buffer Size) is a tunable value for this adapter. The default value for IBM's 16/4 Token Ring Adapter/A may be a poor choice if you are using VTAM for subarea communications between two VTAM subareas. When performing full-screen operations such as XEDIT under VM/CMS, the buffer used by VTAM will exceed the XMITBUFSIZE specified in PROTOCOL.INI and cause segmentation. For example, with a 16/4 Token Ring Adapter/A in a laboratory environment, multi-second response times were observed while scrolling in XEDIT when logged on via a cross-domain VTAM session from one PC Server System/390 to another. Increasing XMITBUFSIZE to more than the VTAM RUSIZE restored response time to its expected sub-second value. A rule of thumb for tuning XMITBUFSIZE:
z = (VTAM RUSIZE in bytes) + 9 + 40
minimum XMITBUFSIZE = Round-to-next-highest-multiple-of-eight( z )
where "9" is the nine bytes for the transmit header and the request header, and "40" is a margin for bytes that may not otherwise be accounted for. Note that the maximum XMITBUFSIZE differs depending on whether your token ring is a 4Mbit or 16Mbit ring. For example, the maximum XMITBUFSIZE for the IBM 16/4 Token Ring Adapter/A on a 4Mbit ring is 4456. Some older adapters have even smaller limits on 4Mbit rings.
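The rule of thumb above translates directly into code. This Python sketch interprets "round to the next highest multiple of eight" as rounding up (a value already divisible by eight is kept as is) and checks the result against the 4456-byte maximum quoted for the IBM 16/4 Token Ring Adapter/A on a 4Mbit ring. The 16Mbit maximum is not given in the text, so the caller must supply it; the function names and RUSIZE values are illustrative.

```python
# Sketch of the XMITBUFSIZE rule of thumb for the IBM 16/4 Token Ring Adapter/A.

TH_RH_BYTES = 9    # transmit header + request header
MARGIN_BYTES = 40  # extra room for bytes not otherwise accounted for
MAX_4MBIT = 4456   # adapter maximum on a 4Mbit ring (from the text)

def min_xmitbufsize(rusize: int) -> int:
    z = rusize + TH_RH_BYTES + MARGIN_BYTES
    return -(-z // 8) * 8  # round up to a multiple of eight

def check_xmitbufsize(rusize: int, ring_max: int = MAX_4MBIT) -> int:
    """Return the minimum XMITBUFSIZE, or raise if the ring/adapter limit is exceeded."""
    size = min_xmitbufsize(rusize)
    if size > ring_max:
        raise ValueError(f"RUSIZE {rusize} needs XMITBUFSIZE {size} > maximum {ring_max}")
    return size

print(min_xmitbufsize(256))     # 312
print(check_xmitbufsize(4096))  # 4152, within the 4Mbit maximum
```

An RUSIZE of 4096 bytes therefore still fits on a 4Mbit ring with this adapter, while anything above 4407 bytes would need a buffer larger than 4456.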
It should also be noted that in this particular situation, when REMOTE was set ON under CMS/XEDIT, data compression performed by CMS for fullscreen I/O also restored sub-second response time. This indicates the continued value of this virtual machine setting in tuning for VTAM use in a VM environment.
VM/ESA GENERAL TUNING
In general, anything that can be done to avoid CP having to perform an I/O is likely to provide better performance. This is especially true on P/390 and R/390 since they both rely on emulation to complete the I/O. Therefore, work done to avoid I/O that must be handled outside the S/390 processor is usually worth the effort. Make sure you take advantage of data-in-memory techniques available in VM/ESA today (such as shared segments and VM dataspaces) to help CP avoid paging. Some recent references for data-in-memory techniques and paging performance can be found in the following documents on the world wide web at http://www.vm.ibm.com/perf.