DCSS Above 2 GB
Abstract
In z/VM 5.4, the usability of Discontiguous Saved Segments (DCSSs)
is improved. DCSSs can now
be defined in storage up to 512 GB, and so more DCSSs
can be mapped into each guest.
A
Linux enhancement
takes advantage of this to build a large
block device out of several contiguously-defined DCSSs.
Because Linux can build an ext2 execute-in-place
(XIP) file system on a
DCSS block device,
large XIP file systems are now possible.
Compared to sharing large read-only file systems
via DASD or
Minidisk Cache (MDC),
ext2 XIP in DCSS
offers reductions in storage and CPU
utilization.
In the workloads measured for this report,
we saw reductions of
up to 67% in storage consumption,
up to 11% in CPU utilization,
and elimination of nearly all virtual I/O and real I/O.
Further, compared to achieving data-in-memory via large
Linux file caches, XIP in DCSS offers savings in storage,
CPU, and I/O.
Introduction
With z/VM 5.4 the restriction of having to define
Discontiguous Saved Segments (DCSSs)
below 2 GB is removed.
The new support lets a DCSS be defined
in storage up to the 512 GB line.
Though the maximum size of a DCSS remains 2047 MB,
new Linux support lets
numerous DCSSs defined contiguously
be used together as one large block device.
We call such contiguous placement
stacking and we call such DCSSs stacked DCSSs.
The Linux support for this is available from the features branch of the
git390 repository found at
"git://git390.osdl.marist.edu/pub/scm/linux-2.6.git features".
Customers should check with specific distributions' vendors
for information about availability in future distributions.
This article evaluates the performance benefits when
Linux virtual machines
share read-only data in stacked DCSSs.
This evaluation includes measurements that compare
storing shared read-only data
in a DCSS to
storing shared read-only data on DASD, or
in MDC, or
in the individual servers' Linux file caches.
Background
Both z/VM and Linux have made recent
improvements to enhance their DCSS support for
larger Linux block devices,
Linux filesystems,
and Linux swap devices.
With z/VM 5.4,
a DCSS can be defined
in guest storage up to but not including
512 GB.
For information on how to define a DCSS in CP, see Chapter 1 of
Saved Segments Planning and Administration.
Additionally, Linux has
added support to exploit
stacked DCSSs
as one large block device.
Although z/VM continues to
restrict the size of a DCSS to 2047 MB, this support
removes the 2047 MB size restriction from Linux.
Note that for Linux to combine the DCSSs in this way,
the DCSSs must be defined contiguously. Linux cannot
combine discontiguous DCSSs into one large block device.
For Linux to use a DCSS, it must create
tables large enough to map memory up to
and including
the highest DCSS it is using.
Linux supports
a mem=xxx kernel parameter
to size these tables
to span the DCSSs being used.
For more information on how to extend the Linux address space, see
Chapter 33, Selected Kernel Parameters, of
Device Drivers, Features and Commands.
The Linux kernel requires 64 bytes
of kernel memory
for each page defined in the
mem=xxx
statement.
For example,
a Linux guest capable of mapping memory up to the 512 GB line
will need 8 GB of kernel memory to construct the map.
Defining the stacked DCSSs
lower in guest storage will reduce the amount of
kernel memory needed to map them.
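The 64-bytes-per-page rule can be turned into a quick estimate. The following Python sketch is illustrative only. It assumes 4 KB pages and binary units (1 GB = 2^30 bytes), and the function name kernel_map_bytes is a hypothetical helper, not part of any Linux or z/VM interface.

```python
PAGE_SIZE = 4096            # bytes per page (assumed 4 KB pages)
BYTES_PER_MAPPED_PAGE = 64  # kernel memory per page covered by mem=, per the text

GIB = 1 << 30
MIB = 1 << 20


def kernel_map_bytes(mem_limit_bytes: int) -> int:
    """Estimate kernel memory needed to map storage up to mem_limit_bytes."""
    pages = mem_limit_bytes // PAGE_SIZE
    return pages * BYTES_PER_MAPPED_PAGE


if __name__ == "__main__":
    # 512 GB is the highest line a DCSS can reach; 25 GB is the mem= value
    # used for the measurements in this report.
    for limit_gib in (25, 512):
        need = kernel_map_bytes(limit_gib * GIB)
        print(f"mem={limit_gib}G needs about {need / MIB:.0f} MB of kernel memory")
```

Under these assumptions the estimate for the 512 GB line is 8 GB, and the estimate for mem=25G is 400 MB, which is why placing the stacked DCSSs lower in guest storage matters for small guests.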
DCSS Type: SN versus SR
There are two primary methods for defining a segment for Linux usage:
SN (shared read/write access) and SR (shared read-only
access). The following table lists a few trade-offs between SN and SR.
Trade-offs: SN vs. SR
| DCSS Attribute | SN: Shared R/W | SR: Shared R/O |
| Initial elapsed time to populate the DCSS with the files to be shared | faster (no DASD I/O necessarily) | slower (DASD I/O required) |
| File system loaded into DCSSs gets written to z/VM spool | no | yes |
| Spool processing for DCSS can delay other spool activity | no | yes |
Note: A file system built in an SN segment does not survive a z/VM IPL.
Method
Three separate Apache workloads
were used to evaluate the
system benefits experienced
when Linux stacked-DCSS exploitation
was applied to a Linux-file-I/O-intensive
workload running in several different base case
configurations.
The first base-case
environment studied is a non-cached virtual I/O
environment in which the files served by Apache reside
on minidisk. Because MDC is disabled and the server virtual
machines are defined small enough to make the Linux file cache
ineffective,
the z/VM system is constrained by real I/O.
The second base-case
environment studied is the MDC
environment.
We attempted
to size MDC in such a way that the majority, if not all, of the
served files would be found in MDC.
The number of servers was chosen in such a way that
paging I/O would not be a
constraining factor.
The last base-case
environment studied is a Linux file cache (LFC)
environment.
The Linux servers are sized sufficiently
large so that the served files find their way into, and
remain in, the Linux file cache. Central storage is sized sufficiently
large to hold all user pages.
The following table contains the configuration parameters
for the three base-case environments.
Apache workload parameters for various base-case environments
| Attribute or parameter | Non-cached virtual I/O environment | MDC environment | LFC environment |
| Processors | 4 | 4 | 3 |
| Central Memory | 10 GB | 8 GB | 64 GB |
| XSTORE | OFF | 2 GB | 2 GB |
| PAGE slots | 10378K | 10378K | 10378K |
| MDC | OFF | 6 GB (capped) | ON (default) |
| Server virtual machines | 16 | 4 | 6 |
| Client virtual machines | 1 | 2 | 3 |
| Client connections per server | 1 | 1 | 1 |
| Number of 1 MB HTML files | 3000 (3 GB) | 5000 (5 GB) | 10000 (10 GB) |
| Files reside during measurement | Minidisk | MDC | Linux file cache |
| Server virtual memory | 512 MB | 512 MB | 10 GB |
| Client virtual memory | 1 GB | 1 GB | 1 GB |
| Server virtual processors | 1 | 1 | 1 |
| Client virtual processors | 3 | 1 | 1 |
Notes:
System Model: 2094-719
DASD subsystem: 2105-E8, 8 GB, 4 FICON chpids
Linux device driver: SSCH
Minidisk file system: 10 GB residing on five 2-GB minidisks, mounted ext3 ro, five mount points
For each of the three base-case configurations above,
to construct a corresponding
DCSS comparison case,
the 10 GB file system was copied from DASD to an XIP-in-DCSS
file system
and mem=25G was added to the Linux kernel parameter file to
extend the Linux kernel address space.
To provide storage for the XIP-in-DCSS file system,
six DCSSs, each 2047 MB (x'7FF00' pages)
in size, were defined contiguously in storage
to hold 10 GB worth of files to be served by Apache.
The first segment starts at the 12 GB line and runs for
2047 MB. The next five segments are stacked contiguously
above the first.
This excerpt from
QUERY NSS MAP ALL illustrates the segments used.
The output is sorted in starting-address order, so the reader
can see the contiguity.
FILE FILENAME FILETYPE BEGPAG ENDPAG TYPE CL #USERS
0101 HTTP1 DCSSG 300000 37FEFF SN A 00000
0102 HTTP2 DCSSG 37FF00 3FFDFF SN A 00000
0103 HTTP3 DCSSG 3FFE00 47FCFF SN A 00000
0104 HTTP4 DCSSG 47FD00 4FFBFF SN A 00000
0105 HTTP5 DCSSG 4FFC00 57FAFF SN A 00000
0106 HTTP6 DCSSG 57FB00 5FF9FF SN A 00000
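The contiguity visible in the excerpt can also be checked mechanically. The following Python sketch is illustrative only. The segment names and page ranges are copied from the excerpt above, the helper names is_stacked and span_bytes are hypothetical and are not part of CP or Linux, and 4 KB pages are assumed.

```python
PAGE_SIZE = 4096          # bytes per page (assumed 4 KB pages)
SEGMENT_PAGES = 0x7FF00   # 2047 MB expressed in 4 KB pages
GIB = 1 << 30
MIB = 1 << 20

# (name, BEGPAG, ENDPAG) copied from the QUERY NSS MAP excerpt
SEGMENTS = [
    ("HTTP1", 0x300000, 0x37FEFF),
    ("HTTP2", 0x37FF00, 0x3FFDFF),
    ("HTTP3", 0x3FFE00, 0x47FCFF),
    ("HTTP4", 0x47FD00, 0x4FFBFF),
    ("HTTP5", 0x4FFC00, 0x57FAFF),
    ("HTTP6", 0x57FB00, 0x5FF9FF),
]


def is_stacked(segments):
    """Return True if every segment begins on the page after the previous one ends."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(ordered, ordered[1:]))


def span_bytes(segments):
    """Return the size in bytes of the range from the lowest to the highest page."""
    first = min(seg[1] for seg in segments)
    last = max(seg[2] for seg in segments)
    return (last - first + 1) * PAGE_SIZE


if __name__ == "__main__":
    print("stacked:", is_stacked(SEGMENTS))
    print("first segment starts at", SEGMENTS[0][1] * PAGE_SIZE // GIB, "GB")
    print("combined size:", span_bytes(SEGMENTS) // MIB, "MB")
```

With these values the first segment begins at the 12 GB line, the six segments span 6 x 2047 MB, and the highest page lies below 25 GB, which is consistent with the mem=25G setting.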
For this report, all of the segments were defined as SN.
The DCSS file system was mounted as read-only
ext2 with
execute-in-place
(XIP) technology.
Using XIP lets Linux read the files without copying the file data
from the DCSS to primary memory. As the report shows later,
this offers opportunity for memory savings.
For most real customer workloads using shared read-only
file systems,
it is likely the workload will reference only some subset
of all the files actually present in the shared file system.
Therefore, for each DCSS measurement,
we copied all 10,000 of our ballast
files into the DCSS, even though each
measurement actually touched only
a subset of them.
Finally, for each run that used any kind of data-in-memory
technique (MDC, LFC, DCSS),
the run was primed before measurement data were
collected. By priming we mean that the workload ran
unmeasured for a while, so as to touch each file of interest
and thereby load it into memory, so that once the measurement
finally began, files being touched would already be in memory.
For the DCSS runs, we expected that during priming,
CP would page out the unreferenced portions of the DCSSs.
Results and Discussion
Non-Cached Virtual I/O versus DCSS
Table 1
compares the non-cached virtual
I/O environment
to its corresponding
DCSS environment.
Table 1. Non-Cached Virtual I/O versus DCSS
| Apache HTTP files | Non-cached virtual I/O | DCSS | | |
| Run ID | DASDGR00 | DCSSGR00 | Delta | Pct |
| Tx/sec (p) | 84.7 | 106.9 | 22.2 | 26.2 |
| Total Util/Proc (p) | 63.8 | 87.8 | 24.0 | 37.6 |
| Virtual I/O Rate (p) | 436 | 14 | -422 | -96.7 |
| DASD Paging Rate* (p) | 0.0 | 1095.0 | 1095.0 | **** |
| DASD I/O Rate (p) | 465 | 434 | -31 | -6.7 |
| Apache server I/O Wait (p) | 78 | 0 | -78 | -100.0 |
| Resident Pages Total (p) | 2206116 | 409761 | -1796355 | -81.4 |
| Resident Pages DASD (p) | 33 | 1949178 | 1949145 | 99.9 |
| Resident Shared Frames (p) | 5379 | 2133241 | 2127862 | 396 |
| DASD service time** msec (p) | 7.1 | 0.4 | -6.7 | -94.4 |
| DASD response time** msec (p) | 31.2 | 0.4 | -30.8 | -98.7 |
| DASD wait queue (p) | 1.9 | 0 | -1.9 | -100.0 |
Notes: (p) = Data taken from Performance Toolkit; Resident Pages Total = total number of user pages in central storage; Resident Pages DASD = total number of pages on paging DASD; Resident Shared Frames = total number of shared frames in central storage.
* This is the paging rate. The paging DASD I/O rate is lower because it takes into account I/O chaining.
** Average for the 5 user volumes that contain the URL files.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 10G; XSTORE OFF; MDC OFF; 16 servers (512M); 1 client (1G); Apache files 3000
The base measurement
was constrained by real I/O.
The Apache
servers were waiting on minidisk I/O 78% of the time.
The DASD I/O rate is slightly higher than the virtual I/O rate because
CP monitor is reading performance data.
The average DASD response time is greater than the DASD service time.
Additionally, the wait queue is not zero. This all demonstrates the
workload is constrained by real I/O.
Switching to an XIP-in-DCSS file system
increased the transaction rate
by 26.2%.
Several factors contributed to the increase.
The virtual I/O rate decreased by 96.7% because the URL files
resided in shared memory.
This can be seen by the increase in resident shared pages to
approximately 8 GB.
Server I/O wait disappeared completely.
The average DASD response time and average service time decreased by
98.7% and 94.4%, respectively. Additionally, the DASD queue length
decreased by 100%. All of
this illustrates that the DASD I/O constraint is
eliminated when the URL files reside in shared memory.
The servers used most of their 512 MB virtual memory to build the
page and segment tables.
Approximately 400 MB
of kernel memory was needed to build the page and segment tables
for 25 GB of virtual memory.
Paging DASD I/O increased in the DCSS environment.
As the measurement progressed, paging I/O decreased,
suggesting that CP was moving the unreferenced pages to paging DASD.
Thus, the DCSS run became constrained by paging DASD I/O.
Total processor utilization increased from 63.8% to 87.8%.
This was attributed to the reduction in virtual I/O and
corresponding real I/O to user volumes.
CPU time per transaction increased in the DCSS case because
CP was managing an additional 10 GB of shared memory.
MDC versus DCSS
Table 2
compares the base-case
MDC environment to its corresponding
DCSS environment.
Table 2. MDC versus DCSS
| Apache HTTP files | MDC | DCSS | | |
| Run ID | MDC0GR03 | DCSSGR07 | Delta | Pct |
| Tx/sec (p) | 188.8 | 229.8 | 41.0 | 21.7 |
| CP msec/Tx (p) | 6.5 | 5.8 | -0.7 | -10.8 |
| Emul msec/Tx (p) | 12.3 | 11.2 | -1.1 | -9.2 |
| Total Util/Proc (p) | 90.9 | 94.6 | 3.7 | 4.1 |
| Virtual I/O Rate (p) | 957 | 8 | -949 | -99.2 |
| DASD Avoid Rate (p) | 927 | 0.2 | -926.8 | -100.0 |
| DASD paging rate* (p) | 0.0 | 81.1 | 81.1 | **** |
| DASD I/O Rate (p) | 354 | 40.8 | -313.2 | -88.5 |
| Resident Pages Total (p) | 734998 | 148236 | -586762 | -79.8 |
| Resident Pages DASD (p) | 0 | 517022 | 517022 | **** |
| Resident Shared Frames (p) | 5379 | 1881079 | 1875700 | 34870.8 |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 4 servers (512M); 2 clients (1G); Apache files 5000
In
the base measurement, unexpected DASD I/O caused by the
MDC problem
prevented the run from reaching 100% CPU utilization.
This I/O was unexpected because
we had configured the measurement so that CP
had enough available pages to hold all of the
referenced URL files in MDC.
As a consequence, this base case measurement did not
yield optimum throughput for its configuration.
The throughput increased by 21.7% in the DCSS environment.
Several factors contributed to the benefit.
The virtual I/O rate decreased by 99.2% because the URL
files resided in shared memory.
This can be seen by the increase in resident shared pages to
approximately 7 GB.
The other factor is that the base measurement did not yield optimum
results.
The DCSS run
was nearly 100% CPU busy, but unexpected paging DASD I/Os
prevented it from reaching an absolute 100%.
CP was paging to move the
unreferenced URL files out of storage.
Overall, by eliminating a majority of the virtual I/O, processor
utilization increased by 4.1% and CP msec/tx and emulation
msec/tx decreased
by 10.8% and 9.2% respectively.
In the special studies
section, two additional pairs of MDC-vs.-DCSS measurements
were completed. In the first pair, we increased the
number of servers from 4 to 12. In the second pair, we
both increased the number of servers from 4 to 12 and
decreased central storage from 8 GB to 6 GB.
LFC versus DCSS
Table 3
compares the Linux file cache environment to
its corresponding DCSS environment.
Table 3. Linux File Cache versus DCSS
| Apache HTTP files | Linux File Cache | DCSS | | |
| Run ID | LXCACHE1 | DCSSLFC1 | Delta | Pct |
| Tx/sec (p) | 149.5 | 157.4 | 8.0 | 5.3 |
| CP msec/Tx (p) | 6.9 | 6.3 | -0.6 | -8.2 |
| Emul msec/Tx (p) | 14.2 | 13.3 | -0.9 | -6.2 |
| Total Util/Proc (p) | 98.7 | 98.6 | -0.1 | -0.1 |
| Resident Pages Total (p) | 15666025 | 4018575 | -11647450 | -74.3 |
| Resident Pages All (p) | 16067050 | 4018575 | -12048475 | -75.0 |
| Resident Shared Frames (p) | 696 | 2665862 | 2665166 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 clients (1G); Apache files 10000
In the base measurement,
all of the URL files reside in the Linux file cache of each
server. In the DCSS environment the URL files
reside in the XIP-in-DCSS file system.
The throughput increased by 5.3% in the DCSS environment,
but the significant benefit was the reduction in memory.
In the DCSS environment the number of resident pages
decreased by 75.0% or approximately 46 GB.
This is because
when a read-only
file system is mounted with the option -xip,
the referenced data is never inserted into
the six Linux server file caches.
CP msec/tx and emulation msec/tx decreased slightly because
both CP and Linux were managing less storage.
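The percentage and the GB figure quoted for the memory reduction follow directly from the "Resident Pages All" row of Table 3. The Python sketch below reproduces that arithmetic. It assumes 4 KB pages and binary gigabytes, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096   # bytes per page (assumed 4 KB pages)
GIB = 1 << 30

# "Resident Pages All" values from Table 3
LFC_RESIDENT_PAGES_ALL = 16067050
DCSS_RESIDENT_PAGES_ALL = 4018575


def pages_to_gib(pages: int) -> float:
    """Convert a page count to binary gigabytes."""
    return pages * PAGE_SIZE / GIB


def pct_change(base: float, new: float) -> float:
    """Percent change from base to new, as used in the Pct column."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    saved_pages = LFC_RESIDENT_PAGES_ALL - DCSS_RESIDENT_PAGES_ALL
    print(f"change: {pct_change(LFC_RESIDENT_PAGES_ALL, DCSS_RESIDENT_PAGES_ALL):.1f}%")
    print(f"saved:  {pages_to_gib(saved_pages):.1f} GB")
```

The same two helpers can be applied to any page-count row in the tables of this report.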
In the special studies
section
a pair of measurements was completed to isolate and
study the effect
of the -xip option when using DCSS file systems.
MDC versus DCSS, More Servers
Table 4
compares an adjusted MDC run to its corresponding DCSS case.
The MDC run is like the MDC
standard configuration
described above,
with the number of servers increased from 4 to 12.
We added servers to try to drive up CPU utilization for
the MDC case.
Table 4. MDC versus DCSS with 12 servers
| Apache HTTP files | MDC | DCSS | | |
| Run ID | MDC0GR05 | DCSSGR05 | Delta | Pct |
| Tx/sec (p) | 181.6 | 221.4 | 39.9 | 22.0 |
| CP msec/Tx (p) | 6.6 | 6.2 | -0.4 | -5.9 |
| Emul msec/Tx (p) | 14.0 | 12.2 | -1.8 | -13.1 |
| Total Util/Proc (p) | 89.4 | 95.7 | 6.3 | 7.0 |
| Virtual I/O Rate (p) | 874 | 13 | -861 | -98.5 |
| DASD paging rate* (p) | 1.2 | 668.0 | 667.0 | **** |
| Resident Pages Total (p) | 1732962 | 229648 | -1503314 | -86.7 |
| Resident Shared Frames (p) | 1070 | 1791124 | 1790054 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 8G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 clients (1G); Apache files 5000
In the base case,
the unexpected DASD I/O caused by the MDC problem prevented
the workload from reaching 100% CPU utilization.
The throughput increased by 22.0% in the DCSS environment.
Several factors contributed to the benefit.
The virtual I/O rate decreased by 98.5% because the URL files resided in
shared memory. This can be seen by the increase in resident shared pages
to approximately 7 GB.
On average,
CP was paging approximately 667 pages/sec to paging DASD.
The DCSS run ran at nearly 100% CPU utilization at steady state,
but the paging DASD I/O prevented it
from reaching an absolute 100%.
Overall, eliminating virtual I/O and reducing the amount of
memory management in both CP and Linux provided benefit
in the DCSS environment.
Comparing this back to Table 2,
we expected that as we added servers
both measurements would reach 100% CPU busy.
But again, in the base case
the unexpected DASD I/O caused by the MDC problem prevented
it from reaching 100% CPU busy.
The DCSS run was nearly 100% CPU busy, but unexpected paging DASD I/Os
prevented it from reaching an absolute 100%.
MDC versus DCSS, More Servers and Constrained Storage
Table 5
compares selected values for MDC and DCSS runs that use the MDC
standard configuration,
except that
the number of servers was increased from 4 to 12 and central
storage was reduced from 8 GB to
6 GB.
Table 5. MDC versus DCSS, 12 Servers, 6 GB Central Storage
| Apache HTTP files | MDC | DCSS | | |
| Run ID | MDCGR6G0 | DCSGR6G3 | Delta | Pct |
| Tx/sec (p) | 192.3 | 218.7 | 26.4 | 13.7 |
| CP msec/Tx (p) | 6.6 | 6.2 | -0.4 | -5.9 |
| Emul msec/Tx (p) | 13.8 | 12.2 | -1.7 | -11.9 |
| Total Util/Proc (p) | 92.9 | 94.5 | 1.6 | 1.7 |
| Virtual I/O Rate (p) | 916 | 14 | -902 | -98.5 |
| DASD paging rate* (p) | 246.7 | 969.2 | 722.5 | 292.9 |
| Resident Pages Total (p) | 1525860 | 241830 | -1284030 | -84.2 |
| Resident Shared Frames (p) | 55 | 1261542 | 1261487 | **** |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 4; Central Storage 6G; XSTORE 2 GB; MDC 6G (capped); 12 servers (512M); 2 clients (1G); Apache files 5000
In the base case, the unexpected DASD I/O caused by the MDC problem
prevented it from reaching the expected storage over commitment.
The throughput increased by 13.7% in the DCSS environment.
Virtual I/O decreased by 98.5% because the URL files
resided in shared memory.
Resident pages decreased by approximately 5 GB while shared
pages increased by approximately 5 GB.
Paging DASD I/O increased because CP was managing an extra
10 GB of shared memory.
The DCSS run was nearly 100% CPU busy at steady state.
Paging I/O was preventing the system from reaching 100% CPU
utilization.
Serving pages in the DCSS environment cost less than in the
MDC environment.
Emulation
msec/tx decreased by 11.9% because DCSS XIP made it unnecessary to read
the files into the Linux file cache, which reduced Linux memory
management activity. CP msec/tx decreased by 5.9%
because CP handled less virtual I/O.
Comparing this back to Table 4,
the throughput for the MDC case increased as
memory was reduced. Again, this is the MDC problem.
The system is less affected by the
MDC problem when memory contention increases.
The throughput for the DCSS case decreased because paging DASD I/O
increased.
DCSS non-xip versus DCSS xip
Table 6
compares selected values for DCSS without XIP
versus DCSS with XIP,
using the LFC configuration.
Table 6. DCSS without -xip option versus DCSS with -xip option
| Apache HTTP files mounted | non-xip | xip | | |
| Run ID | DCSSNXG4 | DCSSNXG2 | Delta | Pct |
| Tx/sec (p) | 146.0 | 156.1 | 10.1 | 6.9 |
| CP msec/Tx (p) | 6.7 | 6.4 | -0.4 | -5.5 |
| Emul msec/Tx (p) | 14.2 | 13.3 | -0.8 | -6.0 |
| Total Util/Proc (p) | 97.0 | 98.5 | 1.5 | 1.5 |
| DASD paging rate* (p) | 748.8 | 0.0 | -748.8 | -100.0 |
| Avail List >2 GB (p) | 4241 | 7836000 | 7831759 | **** |
| Avail List <2 GB (p) | 165 | 275000 | 274835 | **** |
| Resident Pages Total (p) | 15707550 | 4098300 | -11609250 | -73.9 |
| Resident Shared Frames (p) | 542928 | 2665817 | 2122889 | 391.0 |
Notes: See footnotes in Table 1 for data definitions.
Configuration: Processor Model 2094-719; Processors 3; Central Storage 64G; XSTORE 2 GB; MDC ON (default); 6 servers (10G); 3 clients (1G); Apache files 10000
In the base case, CP was
managing seven
copies of the 10,000 URL files.
In the DCSS XIP case, CP was managing one copy of the 10,000
URL files.
The throughput increased by 6.9%.
This was attributed to the reduction in memory requirements that
eliminated DASD paging.
We estimated the memory savings to be about 47 GB, based on
the available list having grown by 31 GB, plus 9 GB of Linux
file cache space that one guest
used at the beginning of the run
to load the XIP file system and never released,
plus 7 GB that
MDC used during the XIP load and never
released. The "Resident Pages Total" row
of the table
shows a decrease of 44 GB in resident pages,
which roughly corroborates the 47 GB
estimate. Because the
partition was sized at 64 GB central plus 2 GB XSTORE, we
roughly estimate the percent
memory savings to be at least
(44 GB / 66 GB)
or 67%.
Again, CP msec/tx and emulation msec/tx were
reduced because
both CP and Linux were managing less memory.
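The memory-savings estimate above combines several figures, so it may help to see the arithmetic in one place. The Python sketch below recomputes it from the Table 6 page counts and the 9 GB and 7 GB figures quoted in the text. It assumes 4 KB pages and binary gigabytes, and the function names are hypothetical.

```python
PAGE_SIZE = 4096   # bytes per page (assumed 4 KB pages)
GIB = 1 << 30

# Page counts from Table 6 (non-xip run, xip run)
AVAIL_PAGES_NON_XIP = 4241 + 165          # available list >2 GB plus <2 GB
AVAIL_PAGES_XIP = 7836000 + 275000
RESIDENT_PAGES_NON_XIP = 15707550
RESIDENT_PAGES_XIP = 4098300

# Figures quoted in the text, in GB
LFC_NEVER_RELEASED_GB = 9    # Linux file cache used by one guest to load the file system
MDC_NEVER_RELEASED_GB = 7    # MDC used during the XIP load
PARTITION_GB = 64 + 2        # central storage plus XSTORE


def pages_to_gib(pages: int) -> float:
    """Convert a page count to binary gigabytes."""
    return pages * PAGE_SIZE / GIB


def estimated_savings_gb() -> int:
    """Available-list growth plus storage that was used once and never released."""
    avail_growth = round(pages_to_gib(AVAIL_PAGES_XIP - AVAIL_PAGES_NON_XIP))
    return avail_growth + LFC_NEVER_RELEASED_GB + MDC_NEVER_RELEASED_GB


def resident_decrease_gb() -> int:
    """Decrease in resident pages, rounded to whole GB."""
    return round(pages_to_gib(RESIDENT_PAGES_NON_XIP - RESIDENT_PAGES_XIP))


if __name__ == "__main__":
    print("estimated savings (GB):", estimated_savings_gb())
    print("resident page decrease (GB):", resident_decrease_gb())
    print(f"savings vs partition: {resident_decrease_gb() / PARTITION_GB:.0%}")
```

Using the smaller of the two figures, the resident page decrease, as the numerator keeps the percentage a conservative lower bound, in line with the "at least" wording above.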
Summary and Conclusions
Overall, sharing read-only data in a DCSS reduced system
resource requirements.
Compared to the non-cached virtual I/O base case,
the corresponding
DCSS environment
reduced the number of virtual I/Os and real I/Os.
Paging DASD I/O increased but this was to be expected because
CP was managing more memory.
In the MDC configurations, the corresponding
DCSS environments
reduced the number of virtual I/Os.
Paging DASD I/O increased in all three configurations but this did not
override the benefit.
In the Linux file cache configuration, where the Linux file cache
was large enough to hold the URL files, the DCSS environment
reduced the memory requirement.
Compared to not using XIP,
the Linux mount option -xip
eliminated the need to move the read-only shared
data from the DCSS into the individual Linux server file caches.
This
reduced the memory requirement and memory
management overhead.
It should be stressed that the mount option
-xip was an
important factor in all of our DCSS
measurement results.