
DCSS Above 2 GB

Abstract

In z/VM 5.4, the usability of Discontiguous Saved Segments (DCSSs) is improved. DCSSs can now be defined in storage up to 512 GB, and so more DCSSs can be mapped into each guest. A Linux enhancement takes advantage of this to build a large block device out of several contiguously-defined DCSSs. Because Linux can build an ext2 execute-in-place (XIP) file system on a DCSS block device, large XIP file systems are now possible.

Compared to sharing large read-only file systems via DASD or Minidisk Cache (MDC), ext2 XIP in DCSS offers reductions in storage and CPU utilization. In the workloads measured for this report, we saw reductions of up to 67% in storage consumption, up to 11% in CPU utilization, and elimination of nearly all virtual I/O and real I/O. Further, compared to achieving data-in-memory via large Linux file caches, XIP in DCSS offers savings in storage, CPU, and I/O.

Introduction

With z/VM 5.4 the restriction of having to define Discontiguous Saved Segments (DCSSs) below 2 GB is removed. The new support lets a DCSS be defined in storage up to the 512 GB line.

Though the maximum size of a DCSS remains 2047 MB, new Linux support lets numerous DCSSs defined contiguously be used together as one large block device. We call such contiguous placement stacking and we call such DCSSs stacked DCSSs. The Linux support for this is available from the features branch of the git390 repository found at "git://git390.osdl.marist.edu/pub/scm/linux-2.6.git features". Customers should check with specific distributions' vendors for information about availability in future distributions.

This article evaluates the performance benefits when Linux virtual machines share read-only data in stacked DCSSs. This evaluation includes measurements that compare storing shared read-only data in a DCSS to storing shared read-only data on DASD, or in MDC, or in the individual servers' Linux file caches.

Background

Both z/VM and Linux have made recent improvements to enhance their DCSS support for larger Linux block devices, Linux filesystems, and Linux swap devices.

With z/VM 5.4, a DCSS can be defined in guest storage up to but not including 512 GB. For information on how to define a DCSS in CP, see Chapter 1 of z/VM: Saved Segments Planning and Administration.

Additionally, Linux has added support to exploit stacked DCSSs as one large block device. Although z/VM continues to restrict the size of a DCSS to 2047 MB, this support removes the 2047 MB size restriction from Linux. Note that for Linux to combine the DCSSs in this way, the DCSSs must be defined contiguously. Linux cannot combine discontiguous DCSSs into one large block device.

For Linux to use a DCSS, it must create tables large enough to map memory up to and including the highest DCSS it is using. Linux supports a mem=xxx kernel parameter to size these tables to span the DCSSs being used. For more information on how to extend the Linux address space, see Chapter 33, Selected Kernel Parameters, of Device Drivers, Features and Commands.

The Linux kernel requires 64 bytes of kernel memory for each page defined in the mem=xxx statement. For example, a Linux guest capable of mapping memory up to the 512 GB line will need 8 GB of kernel memory to construct the map. Defining the stacked DCSSs lower in guest storage will reduce the amount of kernel memory needed to map them.
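
As a worked example, assuming the 4 KB page size used throughout this report:

    512 GB / 4 KB per page = 134,217,728 pages; 134,217,728 x 64 bytes = 8 GB of kernel memory
    25 GB / 4 KB per page  = 6,553,600 pages;   6,553,600 x 64 bytes   = 400 MB of kernel memory

The second line corresponds to the mem=25G setting used for the measurements in this report; the 400 MB figure reappears later in the discussion of the servers' kernel memory usage.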

DCSS Type: SN versus SR

There are two primary methods for defining a segment for Linux usage. They are SN (shared read/write access) and SR (shared read-only access). The following table lists a few trade offs for SN and SR.

Trade-offs: SN versus SR

DCSS attribute                                                          SN: Shared R/W                  SR: Shared R/O
Initial elapsed time to populate the DCSS with the files to be shared   faster (no DASD I/O necessary)  slower (DASD I/O required)
File system loaded into DCSSs gets written to z/VM spool                no                              yes
Spool processing for DCSS can delay other spool activity                no                              yes

Note: A file system built in an SN segment does not survive a z/VM IPL.

Method

Three separate Apache workloads were used to evaluate the system benefits obtained when Linux stacked-DCSS exploitation was applied to a file-I/O-intensive workload running in several different base-case configurations.

The first base-case environment studied is a non-cached virtual I/O environment in which the files served by Apache reside on minidisk. Because MDC is disabled and the virtual machine size is small enough to make the Linux file cache ineffective, the z/VM system is constrained by real I/O.

The second base-case environment studied is the MDC environment. We attempted to size MDC in such a way that the majority, if not all, of the served files would be found in MDC. The number of servers was chosen in such a way that paging I/O would not be a constraining factor.

The last base-case environment studied is a Linux file cache (LFC) environment. The Linux servers are sized sufficiently large so that the served files find their way into, and remain in, the Linux file cache. Central storage is sized sufficiently large to hold all user pages.

The following table contains the configuration parameters for the three base-case environments.

Apache workload parameters for various base-case environments

Attribute or parameter            Non-cached virtual I/O environment   MDC environment   LFC environment
Processors                        4                                    4                 3
Central Memory                    10 GB                                8 GB              64 GB
XSTORE                            OFF                                  2 GB              2 GB
PAGE slots                        10378K                               10378K            10378K
MDC                               OFF                                  6 GB (capped)     ON (default)
Server virtual machines           16                                   4                 6
Client virtual machines           1                                    2                 3
Client connections per server     1                                    1                 1
Number of 1 MB HTML files         3000 (3 GB)                          5000 (5 GB)       10000 (10 GB)
Files reside during measurement   Minidisk                             MDC               Linux file cache
Server virtual memory             512 MB                               512 MB            10 GB
Client virtual memory             1 GB                                 1 GB              1 GB
Server virtual processors         1                                    1                 1
Client virtual processors         3                                    1                 1

Notes:
System Model: 2094-719
DASD subsystem: 2105-E8, 8 GB, 4 FICON chpids
Linux device driver: SSCH
Minidisk file system: 10 GB residing on five 2-GB minidisks, mounted ext3 ro, five mount points

To construct a corresponding DCSS comparison case for each of the three base-case configurations above, the 10 GB file system was copied from DASD to an XIP-in-DCSS file system, and mem=25G was added to the Linux kernel parameter file to extend the Linux address space.
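
As an illustration of the kernel parameter change, here is a minimal sketch assuming a zipl.conf-style boot configuration; the file location, the root device, and any other parameters are placeholders that depend on the distribution:

    # /etc/zipl.conf excerpt (illustrative)
    parameters = "root=/dev/dasda1 mem=25G"

After editing the parameter line, zipl must be rerun and the guest rebooted for the larger address space to take effect.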

To provide storage for the XIP-in-DCSS file system, six DCSSs, each 2047 MB (x'7FF00' pages) in size, were defined contiguously in storage to hold 10 GB worth of files to be served by Apache. The first segment starts at the 12 GB line and runs for 2047 MB. The next five segments are stacked contiguously above the first. This excerpt from QUERY NSS MAP ALL illustrates the segments used. The output is sorted in starting-address order, so the reader can see the contiguity.

FILE  FILENAME  FILETYPE  BEGPAG  ENDPAG  TYPE  CL  #USERS
0101  HTTP1     DCSSG     300000  37FEFF  SN    A   00000
0102  HTTP2     DCSSG     37FF00  3FFDFF  SN    A   00000
0103  HTTP3     DCSSG     3FFE00  47FCFF  SN    A   00000
0104  HTTP4     DCSSG     47FD00  4FFBFF  SN    A   00000
0105  HTTP5     DCSSG     4FFC00  57FAFF  SN    A   00000
0106  HTTP6     DCSSG     57FB00  5FF9FF  SN    A   00000

For this report, all of the segments were defined as SN.
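
For illustration only (the exact commands used for this report are not reproduced here; see z/VM: Saved Segments Planning and Administration for the authoritative syntax), SN segments with the page ranges shown above could be defined with CP DEFSEG commands of this form:

    DEFSEG HTTP1 300000-37FEFF SN
    DEFSEG HTTP2 37FF00-3FFDFF SN
    DEFSEG HTTP3 3FFE00-47FCFF SN
    DEFSEG HTTP4 47FD00-4FFBFF SN
    DEFSEG HTTP5 4FFC00-57FAFF SN
    DEFSEG HTTP6 57FB00-5FF9FF SN

Each DEFSEG creates a skeleton (class S) segment; a subsequent SAVESEG command for each segment makes it available (class A), which is reflected in the CL column of the query output above.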

The DCSS file system was mounted as read-only ext2 with execute-in-place (XIP) technology. Using XIP lets Linux read the files without copying the file data from the DCSS to primary memory. As the report shows later, this offers opportunity for memory savings.
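
The following is a sketch of how a stacked-DCSS device of this kind is typically brought online in Linux, based on the dcssblk block device driver described in Device Drivers, Features and Commands; the device node (/dev/dcssblk0) and mount point are illustrative assumptions, and the colon-separated add syntax assumes the extended dcssblk support for stacked segments:

    # load the DCSS block device driver
    modprobe dcssblk
    # add the six contiguous segments as one block device
    echo "HTTP1:HTTP2:HTTP3:HTTP4:HTTP5:HTTP6" > /sys/devices/dcssblk/add
    # mount the ext2 file system read-only with execute-in-place
    mount -t ext2 -o ro,xip /dev/dcssblk0 /srv/www

Mounting with the xip option is what allows the file data to be mapped directly from the DCSS rather than copied into the Linux file cache.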

For most real customer workloads using shared read-only file systems, it is likely the workload will reference only some subset of all the files actually present in the shared file system. Therefore, for each DCSS measurement, we copied all 10,000 of our ballast files into the DCSS, even though each measurement actually touched only a subset of them.

Finally, for each run that used any kind of data-in-memory technique (MDC, LFC, DCSS), the run was primed before measurement data were collected. By priming we mean that the workload ran unmeasured for a while, so as to touch each file of interest and thereby load it into memory, so that once the measurement finally began, files being touched would already be in memory. For the DCSS runs, we expected that during priming, CP would page out the unreferenced portions of the DCSSs.

Results and Discussion

Non-Cached Virtual I/O versus DCSS

Table 1 compares the non-cached virtual I/O environment to its corresponding DCSS environment.

Table 1. Non-Cached Virtual I/O versus DCSS

Apache HTTP files              Non-cached virtual I/O  DCSS      Delta     Pct
Run ID                         DASDGR00                DCSSGR00
Tx/sec (p)                     84.7                    106.9     22.2      26.2
Total Util/Proc (p)            63.8                    87.8      24.0      37.6
Virtual I/O Rate (p)           436                     14        -404      -96.7
DASD Paging Rate* (p)          0.0                     1095.0    1095.0    ****
DASD I/O Rate (p)              465                     434       -31       -6.7
Apache server I/O Wait (p)     78                      0         -78       -100.0
Resident Pages Total (p)       2206116                 409761    -1796355  -81.4
Resident Pages DASD (p)        33                      1949178   1949145   99.9
Resident Shared Frames (p)     5379                    2133241   2127862   396
DASD service time** msec (p)   7.1                     .4        -6.7      -94.4
DASD response time** msec (p)  31.2                    .4        -30.8     -98.7
DASD wait queue (p)            1.9                     0         -1.9      -100.0
Notes: (p) = Data taken from Performance Toolkit; Resident Pages Total = total number of user pages in central storage; Resident Pages DASD = total number of pages on paging DASD; Resident Shared Frames = total number of shared frames in central storage.

* This is the paging rate. The paging DASD I/O rate is lower because it takes into account I/O chaining.

** Average for the 5 user volumes that contain the URL files

Configuration: Processor Model 2094-719; Processors 4; Central Storage 10G; XSTORE OFF; MDC OFF; 16 servers (512M); 1 client (1G); Apache files 3000

The base measurement was constrained by real I/O. The Apache servers were waiting on minidisk I/O 78% of the time. The DASD I/O rate is slightly higher than the virtual I/O rate because CP monitor is reading performance data. The average DASD response time is greater than the DASD service time. Additionally, the wait queue is not zero. This all demonstrates the workload is constrained by real I/O.

Switching to an XIP-in-DCSS file system increased the transaction rate by 26.2%. Several factors contributed to the increase. The virtual I/O rate decreased by 96.7% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 8 GB. Server I/O wait disappeared completely. The average DASD response time and average service time were reduced by 98.7% and 94.4%, respectively, and the DASD wait queue decreased to zero. All of this illustrates that the DASD I/O constraint is eliminated when the URL files reside in shared memory.

The servers used most of their 512 MB virtual memory to build the page and segment tables. Approximately 400 MB of kernel memory was needed to build the page and segment tables for 25 GB of virtual memory.

Paging DASD I/O increased in the DCSS environment. As the measurement progressed, the paging I/O rate decreased, suggesting that CP was moving the unreferenced pages out to paging DASD. Thus the DCSS run became constrained by paging DASD I/O.

Total processor utilization increased from 63.8% to 87.8%. This was attributed to the reduction in virtual I/O and the corresponding real I/O to user volumes, which allowed the transaction rate to rise. CPU time per transaction increased in the DCSS case because CP was managing an additional 10 GB of shared memory.

MDC versus DCSS

Table 2 compares the base-case MDC environment to its corresponding DCSS environment.

Table 2. MDC versus DCSS

Apache HTTP files            MDC       DCSS      Delta     Pct
Run ID                       MDC0GR03  DCSSGR07
Tx/sec (p)                   188.8     229.8     41.0      21.7
CP msec/Tx (p)               6.5       5.8       -0.7      -10.8
Emul msec/Tx (p)             12.3      11.2      -1.1      -9.2
Total Util/Proc (p)          90.9      94.6      3.7       4.1
Virtual I/O Rate (p)         957       8         -949      -99.2
DASD Avoid Rate (p)          927       .2        -926.8    -100.0
DASD paging rate* (p)        0.0       81.1      81.1      ****
DASD I/O Rate (p)            354       40.8      -313.2    -88.5
Resident Pages Total (p)     734998    148236    -586762   -79.8
Resident Pages DASD (p)      0         517022    517022    ****
Resident Shared Frames (p)   5379      1881079   1875700   34870.8
Notes: See footnotes in Table 1 for data definitions.

Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 4 servers (512M); 2 client (1G); Apache files 5000

In the base measurement, unexpected DASD I/O caused by the MDC problem prevented the run from reaching 100% CPU utilization. This I/O was unexpected because we had configured the measurement so that CP had enough available pages to hold all of the referenced URL files in MDC. As a consequence, this base case measurement did not yield optimum throughput for its configuration.

The throughput increased by 21.7% in the DCSS environment. Several factors contributed to the benefit. The virtual I/O rate decreased by 99.2% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 7 GB. The other factor is that the base measurement did not yield optimum results.

The DCSS run was nearly 100% CPU busy, but unexpected paging DASD I/Os prevented it from reaching an absolute 100%. CP was paging to move the unreferenced URL files out of storage.

Overall, eliminating the majority of the virtual I/O increased processor utilization by 4.1% and decreased CP msec/tx and emulation msec/tx by 10.8% and 9.2%, respectively.

In the special studies section, two additional pairs of MDC-vs.-DCSS measurements were completed. In the first pair, we increased the number of servers from 4 to 12. In the second pair, we both increased the number of servers from 4 to 12 and decreased central storage from 8 GB to 6 GB.

LFC versus DCSS

Table 3 compares the Linux file cache environment to its corresponding DCSS environment.

Table 3. Linux File Cache versus DCSS

Apache HTTP files            Linux File Cache  DCSS      Delta      Pct
Run ID                       LXCACHE1          DCSSLFC1
Tx/sec (p)                   149.5             157.4     8.0        5.3
CP msec/Tx (p)               6.9               6.3       -0.6       -8.2
Emul msec/Tx (p)             14.2              13.3      -0.9       -6.2
Total Util/Proc (p)          98.7              98.6      -0.1       -0.1
Resident Pages Total (p)     15666025          4018575   -11647450  -74.3
Resident Pages All (p)       16067050          4018575   -12048475  -75.0
Resident Shared Frames (p)   696               2665862   2665166    ****
Notes: See footnotes in Table 1 for data definitions.

Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 client (1G); Apache files 10000

In the base measurement, all of the URL files reside in the Linux file cache of each server. In the DCSS environment the URL files reside in the XIP-in-DCSS file system.

The throughput increased by 5.3% in the DCSS environment, but the significant benefit was the reduction in memory. In the DCSS environment the number of resident pages decreased by 75.0% or approximately 46 GB. This is because when a read-only file system is mounted with the option -xip, the referenced data is never inserted into the six Linux server file caches.

CP msec/tx and emulation msec/tx decreased slightly because both CP and Linux were managing less storage.

In the special studies section a pair of measurements was completed to isolate and study the effect of the -xip option when using DCSS file systems.

Special Studies

MDC versus DCSS, More Servers

Table 4 compares an adjusted MDC run to its corresponding DCSS case. The MDC run is like the MDC standard configuration described above, with the number of servers increased from 4 to 12. We added servers to try to drive up CPU utilization for the MDC case.

Table 4. MDC versus DCSS with 12 servers

Apache HTTP files            MDC       DCSS      Delta      Pct
Run ID                       MDC0GR05  DCSSGR05
Tx/sec (p)                   181.6     221.4     39.9       22.0
CP msec/Tx (p)               6.6       6.2       -0.4       -5.9
Emul msec/Tx (p)             14.0      12.2      -1.8       -13.1
Total Util/Proc (p)          89.4      95.7      6.3        7.0
Virtual I/O Rate (p)         874       13        -861       -98.5
DASD paging rate* (p)        1.2       668.0     667.0      ****
Resident Pages Total (p)     1732962   229648    -1503314   -86.7
Resident Shared Frames (p)   1070      1791124   1790054    ****
Notes: See footnotes in Table 1 for data definitions.

Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 client (1G); Apache files 5000

In the base case, the unexpected DASD I/O caused by the MDC problem prevented the workload from reaching 100% CPU utilization.

The throughput increased by 22.0% in the DCSS environment. Several factors contributed to the benefit. The virtual I/O rate decreased by 98.5% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 7 GB. On average, CP was moving approximately 667 pages/sec to paging DASD. The run was nearly 100% CPU busy at steady state, but the paging DASD I/O prevented it from reaching an absolute 100%. Overall, eliminating virtual I/O and reducing the amount of memory management in both CP and Linux provided benefit in the DCSS environment.

Comparing this back to Table 2, we expected that as we added servers both measurements would reach 100% CPU busy. But again, in the base case the unexpected DASD I/O caused by the MDC problem prevented it from reaching 100% CPU busy. The DCSS run was nearly 100% CPU busy, but unexpected paging DASD I/Os prevented it from reaching an absolute 100%.

MDC versus DCSS, More Servers and Constrained Storage

Table 5 compares selected values for a configuration like the MDC standard configuration, except that the number of servers was increased from 4 to 12 and central storage was reduced from 8 GB to 6 GB.

Table 5. MDC versus DCSS, 12 Servers, 6 GB Central Storage

Apache HTTP files            MDC       DCSS      Delta      Pct
Run ID                       MDCGR6G0  DCSGR6G3
Tx/sec (p)                   192.3     218.7     26.4       13.7
CP msec/Tx (p)               6.6       6.2       -0.4       -5.9
Emul msec/Tx (p)             13.8      12.2      -1.7       -11.9
Total Util/Proc (p)          92.9      94.5      1.6        1.7
Virtual I/O Rate (p)         916       14        -902       -98.5
DASD paging rate* (p)        246.7     969.2     722.5      292.9
Resident Pages Total (p)     1525860   241830    -1284030   -84.2
Resident Shared Frames (p)   55        1261542   1261487    ****
Notes: See footnotes in Table 1 for data definitions.

Configuration: Processor Model 2094-719; Processors 4; Central Storage 6G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 client (1G); Apache files 5000

In the base case, the unexpected DASD I/O caused by the MDC problem prevented the run from reaching the expected storage overcommitment.

The throughput increased by 13.7% in the DCSS environment. Virtual I/O decreased by 98.5% because the URL files resided in shared memory. Resident pages decreased by approximately 5 GB while shared pages increased by approximately 5 GB. Paging DASD I/O increased because CP was managing an extra 10 GB of shared memory. The DCSS run was nearly 100% CPU busy at steady state. Paging I/O was preventing the system from reaching 100% CPU utilization.

Serving pages in the DCSS environment cost less than in the MDC environment. Emulation msec/tx decreased by 11.9% because Linux memory management activity decreased; DCSS XIP made it unnecessary to read the files into the Linux file cache. CP msec/tx decreased by 5.9% because CP handled less virtual I/O.

Comparing this back to Table 4, the throughput for the MDC case increased as central storage was reduced. Again, this is the MDC problem; the system is less affected by it when memory contention increases. The throughput for the DCSS case decreased because paging DASD I/O increased.

DCSS non-xip versus DCSS xip

Table 6 has a comparison of selected values for DCSS without XIP versus DCSS with XIP, using the LFC configuration.

Table 6. DCSS without -xip option versus DCSS with -xip option

Apache HTTP files mounted    non-xip    xip       Delta      Pct
Run ID                       DCSSNXG4   DCSSNXG2
Tx/sec (p)                   150.0      156.1     10.1       6.9
CP msec/Tx (p)               6.7        6.4       -0.4       -5.5
Emul msec/Tx (p)             14.2       13.3      -0.8       -6.0
Total Util/Proc (p)          97.0       98.5      1.5        1.5
DASD paging rate* (p)        748.8      0.0       -748.8     -100.0
Avail List >2 GB (p)         4241       7836000   7831759    ****
Avail List <2 GB (p)         165        275000    274835     ****
Resident Pages Total (p)     15707550   4098300   -11609250  -73.9
Resident Shared Frames (p)   542928     2665817   2211889    391.0
Notes: See footnotes in Table 1 for data definitions.

Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 client (1G); Apache files 10000

In the base case, CP was managing seven copies of the 10,000 URL files. In the DCSS XIP case, CP was managing one copy of the 10,000 URL files.

The throughput increased by 6.9%. This was attributed to the reduction in memory requirements that eliminated DASD paging. We estimated the memory savings to be about 47 GB, based on the available list having grown by 31 GB, plus 9 GB of Linux file cache space that one guest used at the beginning of the run to load the XIP file system and never released, plus 7 GB that MDC used during the XIP load and never released. The "Resident Pages Total" row of the table shows a decrease of 44 GB in resident pages, which roughly corroborates the 47 GB estimate. Because the partition was sized at 64 GB central plus 2 GB XSTORE, we roughly estimate the percent memory savings to be at least (44 GB / 66 GB) or 67%.

Again, CP msec/tx and emulation msec/tx were reduced because both CP and Linux were managing less memory.

Summary and Conclusions

Overall, sharing read-only data in a DCSS reduced system resource requirements.

Compared to the non-cached virtual I/O base case, the corresponding DCSS environment reduced the number of virtual I/Os and real I/Os. Paging DASD I/O increased but this was to be expected because CP was managing more memory.

In the MDC configurations, the corresponding DCSS environments reduced the number of virtual I/Os. Paging DASD I/O increased in all three configurations but this did not override the benefit.

In the Linux file cache configuration, where the Linux file cache was large enough to hold the URL files, the DCSS environment reduced the memory requirement.

Compared to not using XIP, the Linux mount option -xip eliminated the need to move the read-only shared data from the DCSS into the individual Linux server file caches. This reduced the memory requirement and memory management overhead.

It should be stressed that the mount option -xip was an important factor in all of our DCSS measurement results.
