DCSS Above 2 GB
Abstract
In z/VM 5.4, the usability of Discontiguous Saved Segments (DCSSs) is improved. DCSSs can now be defined in storage up to 512 GB, and so more DCSSs can be mapped into each guest. A Linux enhancement takes advantage of this to build a large block device out of several contiguously-defined DCSSs. Because Linux can build an ext2 execute-in-place (XIP) file system on a DCSS block device, large XIP file systems are now possible.
Compared to sharing large read-only file systems via DASD or Minidisk Cache (MDC), ext2 XIP in DCSS offers reductions in storage and CPU utilization. In the workloads measured for this report, we saw reductions of up to 67% in storage consumption, up to 11% in CPU utilization, and elimination of nearly all virtual I/O and real I/O. Further, compared to achieving data-in-memory via large Linux file caches, XIP in DCSS offers savings in storage, CPU, and I/O.
Introduction
With z/VM 5.4 the restriction of having to define Discontiguous Saved Segments (DCSSs) below 2 GB is removed. The new support lets a DCSS be defined in storage up to the 512 GB line. Though the maximum size of a DCSS remains 2047 MB, new Linux support lets numerous DCSSs defined contiguously be used together as one large block device. We call such contiguous placement stacking, and we call such DCSSs stacked DCSSs. The Linux support for this is available from the features branch of the git390 repository found at "git://git390.osdl.marist.edu/pub/scm/linux-2.6.git features". Customers should check with specific distributions' vendors for information about availability in future distributions.
This article evaluates the performance benefits when Linux virtual machines share read-only data in stacked DCSSs. This evaluation includes measurements that compare storing shared read-only data in a DCSS to storing shared read-only data on DASD, or in MDC, or in the individual servers' Linux file caches.
Background
Both z/VM and Linux have made recent improvements to enhance their DCSS support for larger Linux block devices, Linux filesystems, and Linux swap devices.
With z/VM 5.4, a DCSS can be defined in guest storage up to but not including 512 GB. For information on how to define a DCSS in CP, see Chapter 1 of z/VM: Saved Segments Planning and Administration.
Additionally, Linux has added support to exploit stacked DCSSs as one large block device. Although z/VM continues to restrict the size of a DCSS to 2047 MB, this support removes the 2047 MB size restriction from Linux. Note that for Linux to combine the DCSSs in this way, the DCSSs must be defined contiguously. Linux cannot combine discontiguous DCSSs into one large block device.
For Linux to use a DCSS, Linux must create memory-management tables large enough to map memory up to and including the highest DCSS it uses. Linux supports a mem=xxx kernel parameter to size these tables to span the DCSSs being used. For more information on how to extend the Linux address space, see Chapter 33, Selected Kernel Parameters, of Device Drivers, Features and Commands.
The Linux kernel requires 64 bytes of kernel memory for each page defined in the mem=xxx statement. For example, a Linux guest capable of mapping memory up to the 512 GB line will need 8 GB of kernel memory to construct the map. Defining the stacked DCSSs lower in guest storage will reduce the amount of kernel memory needed to map them.
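As a rough illustration of this sizing rule, here is a short Python sketch (not part of the original report) that computes the kernel memory needed for the Linux memory map for a given mem= value, assuming 4 KB pages and the 64 bytes per page noted above:

# Estimate the kernel memory Linux needs for its memory map for a given
# mem= setting, assuming 4 KB pages and 64 bytes of kernel memory per page.
PAGE_SIZE = 4 * 1024           # bytes per page
BYTES_PER_PAGE_ENTRY = 64      # kernel memory needed to describe one page

def map_memory_gb(mem_gb):
    """Kernel memory (in GB) needed to map an address space of mem_gb GB."""
    pages = mem_gb * 1024**3 / PAGE_SIZE
    return pages * BYTES_PER_PAGE_ENTRY / 1024**3

print(map_memory_gb(512))   # 8.0  -- mapping up to the 512 GB line costs 8 GB
print(map_memory_gb(25))    # 0.39 -- mem=25G, used later in this report, costs roughly 400 MB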
DCSS Type: SN versus SR
There are two primary methods for defining a segment for Linux usage: SN (shared read/write access) and SR (shared read-only access). The following table lists a few trade-offs between SN and SR.
DCSS Attribute | SN: Shared R/W | SR: Shared R/O
Initial elapsed time to populate the DCSS with the files to be shared | faster (no DASD I/O necessary) | slower (DASD I/O required)
File system loaded into the DCSS is written to z/VM spool | no | yes
Spool processing for the DCSS can delay other spool activity | no | yes
Note: A file system built in an SN segment does not survive a z/VM IPL.
Method
Three separate Apache workloads were used to evaluate the system benefits of Linux stacked-DCSS exploitation when applied to a file-I/O-intensive workload running in several different base-case configurations.
The first base-case environment studied is a non-cached virtual I/O environment in which the files served by Apache reside on minidisk. Because MDC is disabled and the virtual machine size is small enough to keep the Linux file cache from holding the files, the z/VM system is constrained by real I/O.
The second base-case environment studied is the MDC environment. We attempted to size MDC in such a way that the majority, if not all, of the served files would be found in MDC. The number of servers was chosen in such a way that paging I/O would not be a constraining factor.
The last base-case environment studied is a Linux file cache (LFC) environment. The Linux servers are sized sufficiently large so that the served files find their way into, and remain in, the Linux file cache. Central storage is sized sufficiently large to hold all user pages.
The following table contains the configuration parameters for the three base-case environments.
Apache workload parameters for the various base-case environments
Attribute or parameter | Non-cached virtual I/O environment | MDC environment | LFC environment |
Processors | 4 | 4 | 3 |
Central Memory | 10 GB | 8 GB | 64 GB |
XSTORE | OFF | 2 GB | 2 GB |
PAGE slots | 10378K | 10378K | 10378K |
MDC | OFF | 6 GB (capped) | ON (default) |
Server virtual machines | 16 | 4 | 6 |
Client virtual machines | 1 | 2 | 3 |
Client connections per server | 1 | 1 | 1 |
Number of 1 MB HTML files | 3000 (3 GB) | 5000 (5 GB) | 10000 (10 GB) |
Where files reside during measurement | Minidisk | MDC | Linux file cache |
Server virtual memory | 512 MB | 512 MB | 10 GB |
Client virtual memory | 1 GB | 1 GB | 1 GB |
Server virtual processors | 1 | 1 | 1 |
Client virtual processors | 3 | 1 | 1 |
Notes:
System model: 2094-719
DASD subsystem: 2105-E8, 8 GB, 4 FICON chpids
Linux device driver: SSCH
Minidisk file system: 10 GB residing on five 2-GB minidisks, mounted ext3 read-only, five mount points
To construct a corresponding DCSS comparison case for each of the three base-case configurations above, the 10 GB file system was copied from DASD to an XIP-in-DCSS file system, and mem=25G was added to the Linux kernel parameter file to extend the Linux address space.
To provide storage for the XIP-in-DCSS file system, six DCSSs, each 2047 MB (x'7FF00' pages) in size, were defined contiguously in storage to hold 10 GB worth of files to be served by Apache. The first segment starts at the 12 GB line and runs for 2047 MB. The next five segments are stacked contiguously above the first. This excerpt from QUERY NSS MAP ALL illustrates the segments used. The output is sorted in starting-address order, so the reader can see the contiguity.
For this report, all of the segments were defined as SN.
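The address arithmetic behind this stacked layout can be retraced with a short Python sketch (an illustrative calculation, not the QUERY NSS MAP output referenced above); it lists the starting and ending page numbers of six contiguous 2047 MB segments beginning at the 12 GB line and shows that the stack tops out just below 24 GB, which is why mem=25G is sufficient:

# Compute the page ranges of six 2047 MB DCSSs stacked contiguously
# starting at the 12 GB line (illustrative arithmetic only).
PAGE_SIZE = 4096                              # bytes per page
SEG_PAGES = 0x7FF00                           # 2047 MB expressed in 4 KB pages
start_page = 12 * 1024**3 // PAGE_SIZE        # page number of the 12 GB line (x'300000')

for i in range(6):
    first = start_page + i * SEG_PAGES
    last = first + SEG_PAGES - 1
    print(f"segment {i + 1}: pages {first:06X}-{last:06X}")

top_gb = (start_page + 6 * SEG_PAGES) * PAGE_SIZE / 1024**3
print(f"top of stack: {top_gb:.2f} GB")       # just under 24 GB, so mem=25G covers it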
The DCSS file system was mounted as read-only ext2 with execute-in-place (XIP) technology. Using XIP lets Linux read the files without copying the file data from the DCSS to primary memory. As the report shows later, this offers opportunity for memory savings.
For most real customer workloads using shared read-only file systems, it is likely the workload will reference only some subset of all the files actually present in the shared file system. Therefore, for each DCSS measurement, we copied all 10,000 of our ballast files into the DCSS, even though each measurement actually touched only a subset of them.
Finally, for each run that used any kind of data-in-memory technique (MDC, LFC, DCSS), the run was primed before measurement data were collected. By priming we mean that the workload ran unmeasured for a while to touch each file of interest and load it into memory, so that by the time the measurement began, the files being touched were already resident. For the DCSS runs, we expected that during priming, CP would page out the unreferenced portions of the DCSSs.
Results and Discussion
Non-Cached Virtual I/O versus DCSS
Table 1 compares the non-cached virtual I/O environment to its corresponding DCSS environment.
Table 1. Non-Cached Virtual I/O versus DCSS
Apache HTTP files | Non-cached virtual I/O | DCSS | ||
Run ID | DASDGR00 | DCSSGR00 | Delta | Pct |
Tx/sec (p) | 84.7 | 106.9 | 22.2 | 26.2 |
Total Util/Proc (p) | 63.8 | 87.8 | 24.0 | 37.6 |
Virtual I/O Rate (p) | 436 | 14 | -404 | -96.7 |
DASD Paging Rate* (p) | 0.0 | 1095.0 | 1095.0 | **** |
DASD I/O Rate (p) | 465 | 434 | -31 | -6.7 |
Apache server I/O Wait (p) | 78 | 0 | -78 | -100.0 |
Resident Pages Total (p) | 2206116 | 409761 | -1796355 | -81.4 |
Resident Pages DASD (p) | 33 | 1949178 | 1949145 | 99.9 |
Resident Shared Frames (p) | 5379 | 2133241 | 2127862 | 396 |
DASD service time** msec (p) | 7.1 | .4 | -6.7 | -94.4 |
DASD response time** msec(p) | 31.2 | .4 | -30.8 | -98.7 |
DASD wait queue (p) | 1.9 | 0 | -1.9 | -100.0 |
Notes: (p) = Data taken from Performance Toolkit;
Resident Pages Total = total number of user pages in central storage;
Resident Pages DASD = total number of pages on paging DASD;
Resident Shared Frames = total number of shared frames in central storage.
* This is the paging rate. The paging DASD I/O rate is lower because it takes into account I/O chaining.
** Average for the 5 user volumes that contain the URL files.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 10G; XSTORE OFF; MDC OFF; 16 servers (512M); 1 client (1G); Apache files 3000
The base measurement was constrained by real I/O. The Apache servers were waiting on minidisk I/O 78% of the time. The DASD I/O rate is slightly higher than the virtual I/O rate because CP monitor is reading performance data. The average DASD response time is greater than the DASD service time. Additionally, the wait queue is not zero. This all demonstrates the workload is constrained by real I/O.
Switching to an XIP-in-DCSS file system increased the transaction rate by 26.2%. Several factors contributed to the increase. The virtual I/O rate decreased by 96.7% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 8 GB. Server I/O wait disappeared completely. The average DASD response time and average service time were reduced by 98.7% and 94.4%, respectively. Additionally, the DASD queue length decreased by 100%. All of this illustrates that the DASD I/O constraint is eliminated when the URL files reside in shared memory.
The servers used most of their 512 MB virtual memory to build the page and segment tables. Approximately 400 MB of kernel memory was needed to build the page and segment tables for 25 GB of virtual memory.
Paging DASD I/O increased in the DCSS environment. As the measurement progressed, paging I/O was decreasing, suggesting that CP was moving the unreferenced pages to paging DASD. Thus, the DCSS run became constrained by paging DASD I/O.
Total processor utilization increased from 63.8 to 87.8%. This was attributed to the reduction in virtual I/O and corresponding real I/O to user volumes. CPU time per transaction increased in the DCSS case because CP was managing an additional 10 GB of shared memory.
MDC versus DCSS
Table 2 compares the base-case MDC environment to its corresponding DCSS environment.
Table 2. MDC versus DCSS
Apache HTTP files | MDC | DCSS | |
Run ID | MDC0GR03 | DCSSGR07 | Delta | Pct |
Tx/sec (p) | 188.8 | 229.8 | 41.0 | 21.7 |
CP msec/Tx (p) | 6.5 | 5.8 | -0.7 | -10.8 |
Emul msec/Tx (p) | 12.3 | 11.2 | -1.1 | -9.2 |
Total Util/Proc (p) | 90.9 | 94.6 | 3.7 | 4.1 |
Virtual I/O Rate (p) | 957 | 8 | -949 | -99.2 |
DASD Avoid Rate (p) | 927 | .2 | -926.8 | -100.0 |
DASD paging rate* (p) | 0.0 | 81.1 | 81.1 | **** |
DASD I/O Rate (p) | 354 | 40.8 | -313.2 | -88.5 |
Resident Pages Total (p) | 734998 | 148236 | -586762 | -79.8 |
Resident Pages DASD (p) | 0 | 517022 | 517022 | **** |
Resident Shared Frames (p) | 5379 | 1881079 | 1875700 | 34870.8 |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 4 servers (512M); 2 clients (1G); Apache files 5000
In the base measurement, unexpected DASD I/O caused by the MDC problem prevented the run from reaching 100% CPU utilization. This I/O was unexpected because we had configured the measurement so that CP had enough available pages to hold all of the referenced URL files in MDC. As a consequence, this base case measurement did not yield optimum throughput for its configuration.
The throughput increased by 21.7% in the DCSS environment. Several factors contributed to the benefit. The virtual I/O rate decreased by 99.2% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 7 GB. The other factor is that the base measurement did not yield optimum results.
The DCSS run was nearly 100% CPU busy, but unexpected paging DASD I/Os prevented it from reaching an absolute 100%. CP was paging to move the unreferenced URL files out of storage.
Overall, eliminating the majority of the virtual I/O increased processor utilization by 4.1% and decreased CP msec/tx and emulation msec/tx by 10.8% and 9.2%, respectively.
In the special studies section, two additional pairs of MDC-vs.-DCSS measurements were completed. In the first pair, we increased the number of servers from 4 to 12. In the second pair, we both increased the number of servers from 4 to 12 and decreased central storage from 8 GB to 6 GB.
LFC versus DCSS
Table 3 compares the Linux file cache environment to its corresponding DCSS environment.
Table 3. Linux File Cache versus DCSS
Apache HTTP files | Linux File Cache | DCSS | ||
Run ID | LXCACHE1 | DCSSLFC1 | Delta | Pct |
Tx/sec (p) | 149.5 | 157.4 | 8.0 | 5.3 |
CP msec/Tx (p) | 6.9 | 6.3 | -0.6 | -8.2 |
Emul msec/Tx (p) | 14.2 | 13.3 | -0.9 | -6.2 |
Total Util/Proc (p) | 98.7 | 98.6 | -0.1 | -0.1 |
Resident Pages Total (p) | 15666025 | 4018575 | -11647450 | -74.3 |
Resident Pages All (p) | 16067050 | 4018575 | -12048475 | -75.0 |
Resident Shared Frames (p) | 696 | 2665862 | 2665166 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 clients (1G); Apache files 10000
In the base measurement, all of the URL files reside in the Linux file cache of each server. In the DCSS environment the URL files reside in the XIP-in-DCSS file system.
The throughput increased by 5.3% in the DCSS environment, but the significant benefit was the reduction in memory. In the DCSS environment the number of resident pages decreased by 75.0% or approximately 46 GB. This is because when a read-only file system is mounted with the option -xip, the referenced data is never inserted into the six Linux server file caches.
CP msec/tx and emulation msec/tx decreased slightly because both CP and Linux were managing less storage.
In the special studies section a pair of measurements was completed to isolate and study the effect of the -xip option when using DCSS file systems.
Special Studies
MDC versus DCSS, More Servers
Table 4 compares an adjusted MDC run to its corresponding DCSS case. The MDC run is like the MDC standard configuration described above, with the number of servers increased from 4 to 12. We added servers to try to drive up CPU utilization for the MDC case.
Table 4. MDC versus DCSS with 12 servers
Apache HTTP files | MDC | DCSS | ||
Run ID | MDC0GR05 | DCSSGR05 | Delta | Pct |
Tx/sec (p) | 181.6 | 221.4 | 39.9 | 22.0 |
CP msec/Tx (p) | 6.6 | 6.2 | -0.4 | -5.9 |
Emul msec/Tx (p) | 14.0 | 12.2 | -1.8 | -13.1 |
Total Util/Proc (p) | 89.4 | 95.7 | 6.3 | 7.0 |
Virtual I/O Rate (p) | 874 | 13 | -861 | -98.5 |
DASD paging rate* (p) | 1.2 | 668.0 | 667.0 | **** |
Resident Pages Total (p) | 1732962 | 229648 | -1503314 | -86.7 |
Resident Shared Frames (p) | 1070 | 1791124 | 1790054 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 clients (1G); Apache files 5000
In the base case, the unexpected DASD I/O caused by the MDC problem prevented the workload from reaching 100% CPU utilization.
The throughput increased by 22.0% in the DCSS environment. Several factors contributed to the benefit. The virtual I/O rate decreased by 98.5% because the URL files resided in shared memory; this can be seen in the increase in resident shared pages to approximately 7 GB. On average, CP was paging approximately 667 pages/sec to paging DASD; the run was nearly 100% CPU busy at steady state, but the paging DASD I/O prevented it from reaching an absolute 100%. Overall, eliminating virtual I/O and reducing the amount of memory management in both CP and Linux provided benefit in the DCSS environment.
Comparing this back to Table 2, we expected that as we added servers both measurements would reach 100% CPU busy. But again, in the base case the unexpected DASD I/O caused by the MDC problem prevented it from reaching 100% CPU busy. The DCSS run was nearly 100% CPU busy, but unexpected paging DASD I/Os prevented it from reaching an absolute 100%.
MDC versus DCSS, More Servers and Constrained Storage
Table 5 compares an adjusted MDC run to its corresponding DCSS case. The configuration is the MDC standard configuration described above, except that the number of servers was increased from 4 to 12 and central storage was reduced from 8 GB to 6 GB.
Table 5. MDC versus DCSS, 12 Servers, 6 GB Central Storage
Apache HTTP files | MDC | DCSS | ||
Run ID | MDCGR6G0 | DCSGR6G3 | Delta | Pct |
Tx/sec (p) | 192.3 | 218.7 | 26.4 | 13.7 |
CP msec/Tx (p) | 6.6 | 6.2 | -0.4 | -5.9 |
Emul msec/Tx (p) | 13.8 | 12.2 | -1.7 | -11.9 |
Total Util/Proc (p) | 92.9 | 94.5 | 1.6 | 1.7 |
Virtual I/O Rate (p) | 916 | 14 | -902 | -98.5 |
DASD paging rate* (p) | 246.7 | 969.2 | 722.5 | 292.9 |
Resident Pages Total (p) | 1525860 | 241830 | -1284030 | -84.2 |
Resident Shared Frames (p) | 55 | 1261542 | 1261487 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 6G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 clients (1G); Apache files 5000
In the base case, the unexpected DASD I/O caused by the MDC problem prevented the run from reaching the expected storage overcommitment.
The throughput increased by 13.7% in the DCSS environment. Virtual I/O decreased by 98.5% because the URL files resided in shared memory. Resident pages decreased by approximately 5 GB while shared pages increased by approximately 5 GB. Paging DASD I/O increased because CP was managing an extra 10 GB of shared memory. The DCSS run was nearly 100% CPU busy at steady state. Paging I/O was preventing the system from reaching 100% CPU utilization.
Serving pages in the DCSS environment cost less than in the MDC environment. Emulation msec/tx decreased by 11.9% because Linux memory management activity decreased: DCSS XIP made it unnecessary to read the files into the Linux file cache. CP msec/tx decreased by 5.9% because CP handled less virtual I/O.
Comparing this back to Table 4, the throughput for the MDC case increased as memory was reduced. Again, this is the MDC problem; the system is less affected by it when memory contention increases. The throughput for the DCSS case decreased because paging DASD I/O increased.
DCSS non-xip versus DCSS xip
Table 6 compares selected values for DCSS without XIP versus DCSS with XIP, using the LFC configuration.
Table 6. DCSS without -xip option versus DCSS with -xip option
Apache HTTP files mounted | non-xip | xip | ||
Run ID | DCSSNXG4 | DCSSNXG2 | Delta | Pct |
Tx/sec (p) | 150.0 | 156.1 | 10.1 | 6.9 |
CP msec/Tx (p) | 6.7 | 6.4 | -0.4 | -5.5 |
Emul msec/Tx (p) | 14.2 | 13.3 | -0.8 | -6.0 |
Total Util/Proc (p) | 97.0 | 98.5 | 1.5 | 1.5 |
DASD paging rate* (p) | 748.8 | 0.0 | -748.8 | -100.0 |
Avail List >2 GB (p) | 4241 | 7836000 | 7831759 | **** |
Avail List <2 GB (p) | 165 | 275000 | 274835 | **** |
Resident Pages Total (p) | 15707550 | 4098300 | -11609250 | -73.9 |
Resident Shared Frames (p) | 542928 | 2665817 | 2122889 | 391.0 |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 clients (1G); Apache files 10000
In the base case, CP was managing seven copies of the 10,000 URL files. In the DCSS XIP case, CP was managing one copy of the 10,000 URL files.
The throughput increased by 6.9%. This was attributed to the reduction in memory requirements that eliminated DASD paging. We estimated the memory savings to be about 47 GB, based on the available list having grown by 31 GB, plus 9 GB of Linux file cache space that one guest used at the beginning of the run to load the XIP file system and never released, plus 7 GB that MDC used during the XIP load and never released. The "Resident Pages Total" row of the table shows a decrease of 44 GB in resident pages, which roughly corroborates the 47 GB estimate. Because the partition was sized at 64 GB central plus 2 GB XSTORE, we roughly estimate the percent memory savings to be at least (44 GB / 66 GB) or 67%.
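The arithmetic behind this estimate can be retraced with a short Python sketch; the values below are simply the figures quoted in the preceding paragraph:

# Retrace the memory-savings estimate using the values quoted above (in GB).
avail_list_growth = 31        # growth of the CP available list
linux_cache_never_freed = 9   # Linux file cache used to load the XIP file system, never released
mdc_never_freed = 7           # MDC used during the XIP load, never released
print(avail_list_growth + linux_cache_never_freed + mdc_never_freed)   # 47 GB estimated savings

resident_page_drop = 44       # decrease in "Resident Pages Total", roughly corroborating the estimate
configured_memory = 64 + 2    # 64 GB central storage plus 2 GB XSTORE
print(f"{resident_page_drop / configured_memory:.0%}")                 # about 67%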
Again, CP msec/tx and emulation msec/tx were reduced because both CP and Linux were managing less memory.
Summary and Conclusions
Overall, sharing read-only data in a DCSS reduced system resource requirements.
Compared to the non-cached virtual I/O base case, the corresponding DCSS environment reduced the number of virtual I/Os and real I/Os. Paging DASD I/O increased but this was to be expected because CP was managing more memory.
In the MDC configurations, the corresponding DCSS environments reduced the number of virtual I/Os. Paging DASD I/O increased in all three configurations but this did not override the benefit.
In the Linux file cache configuration, where the Linux file cache was large enough to hold the URL files, the DCSS environment reduced the memory requirement.
Compared to not using XIP, the Linux mount option -xip eliminated the need to move the read-only shared data from the DCSS into the individual Linux server file caches. This reduced the memory requirement and memory management overhead.
It should be stressed that the mount option -xip was an important factor in all of our DCSS measurement results.