Linux Disk I/O Alternatives
Introduction
With z/VM 5.2.0, customers have a number of choices for the technology they use
to perform disk I/O with Linux guest systems. In this chapter, we compare and
contrast the performance of several alternatives for Linux disk I/O as a guest
of z/VM. The purpose is to provide insight into the z/VM alternatives that may
yield the best performance for Linux guest application workloads that include
heavy disk I/O. This study does not explore Linux-specific alternatives.
The evaluated disk I/O choices are:
- Dedicated Extended Count Key Data (ECKD) on an ESS 2105-F20, via FICON channel
- Minidisks on ECKD on ESS 2105-F20, via FICON channel
- Dedicated emulated Fixed Block Architecture (FBA) on ESS 2105-F20, via Fibre
Channel Protocol (FCP)
- Minidisks on emulated FBA on ESS 2105-F20, via FCP
- Linux-owned FCP subchannel to an ESS 2105-F20, via FCP
- Linux Diagnose X'250' Block I/O to minidisks on ECKD on ESS 2105-F20, via
FICON channel
The Diagnose X'250' evaluation was done with an internal Diagnose driver for 64-bit
Linux. This driver is not yet available in any Linux distributions. It is
expected to be available in distributions sometime during 2006.
Absent from this study is an evaluation of Diagnose X'250' with emulated FBA DASD. The
version of the internal Diagnose driver that we used has a kernel level dependency
that is not met with SLES 9 SP 1. We expect to include this choice in future Linux
disk I/O evaluations.
For this study, we used the Linux disk exerciser IOzone to measure the disk performance
that a Linux guest experiences. We ran IOzone against the disk choices listed above. In
the following sections we discuss the results of the experiments using these choices.
Method
To measure disk performance with Linux guests, we set up a single Linux guest running
the IOzone disk exerciser with an 800 MB file. IOzone is a file system exercise tool.
See our IOzone workload description for details about how we run IOzone.
For this experiment, we used the following configuration:
- 2084-324, two-way dedicated partition, 2 GB central storage, 2 GB expanded storage
- No activity in the partition being measured, except our Linux guest and the test
case driver
- 2105-F20, 16 GB of cache, FICON and FCP attached
- z/VM 5.2.0 GA RSU
- One VM virtual machine, 192 MB virtual uniprocessor, running 64-bit
Linux SLES 9 SP 1, Linux ext2 file systems
Notes:
- A 31-bit Linux SLES 9 SP 1 system was also used with the ECKD Diagnose X'250' cases.
The results were very similar to the 64-bit Linux cases presented here.
- A virtual machine size of 192 MB was used to ensure that a significant portion of the
800 MB ballast file did not fit into Linux page cache. If the file does fit into the
page cache, few if any disk I/Os will be done.
- Native SCSI, using the FCP subchannel, can move more data in a single I/O request
than traditional ECKD, FBA, or Diagnose X'250'. With the large ballast file
used in this study (800 MB), this gives native SCSI an advantage over the other disk
I/O alternatives.
- The 2105-F20 had multiple FICON chpids attaching it to the 2084.
The Linux guest had one FCP device attached to it, so it was not doing
any nontrivial path selection activity. Note also that the IOzone workload
we ran was serial and single-threaded.
This chapter compares disk I/O choices with Linux as a guest virtual machine on
z/VM 5.2.0. To view performance comparisons from a regression perspective (z/VM 5.2.0
compared with z/VM 5.1.0), refer to the CP Disk I/O Performance chapter.
The disk configurations mentioned in this chapter (e.g., EDED, LNS0, and so on) are
defined in detail in our IOzone workload description appendix. The configuration naming
conventions used in the tables in this chapter include some key indicators that help the
reader to decode the configuration without the need to refer to the appendix:

| Name prefix | Name suffix |
|---|---|
| E for ECKD with the Linux ECKD CCW driver. | DED for a dedicated volume; MD0 for a minidisk with MDC OFF; MD1 for a minidisk with MDC ON. |
| F for emulated FBA on 2105-F20 with the Linux FBA CCW driver. | DED for a dedicated volume; MD0 for a minidisk with MDC OFF; MD1 for a minidisk with MDC ON. |
| D2 for ECKD minidisk with the Linux Diagnose X'250' driver. | 10 for blocksize 1024, MDC OFF; 11 for blocksize 1024, MDC ON; 20 for blocksize 2048, MDC OFF; 21 for blocksize 2048, MDC ON; 40 for blocksize 4096, MDC OFF; 41 for blocksize 4096, MDC ON. |
| LNS0 for Linux owning an FCP subchannel and using the zFCP driver. | None. |

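The naming convention above can be applied mechanically. The following is a minimal sketch, assuming only the prefixes and suffixes listed in the table; the function name `decode_config` and the shape of the returned dictionary are illustrative choices and do not come from the appendix.

```python
# Prefix and suffix meanings copied from the naming table above.
PREFIXES = {
    "E": "ECKD, Linux ECKD CCW driver",
    "F": "emulated FBA on 2105-F20, Linux FBA CCW driver",
    "D2": "ECKD minidisk, Linux Diagnose X'250' driver",
}
# Suffix -> (volume type, MDC setting). None means the table states no MDC setting.
VOLUME_SUFFIXES = {
    "DED": ("dedicated volume", None),
    "MD0": ("minidisk", False),
    "MD1": ("minidisk", True),
}
BLOCK_SIZES = {"1": 1024, "2": 2048, "4": 4096}


def decode_config(name):
    """Describe a configuration name such as 'EMD1', 'D241', or 'LNS0'."""
    if name == "LNS0":
        return {"driver": "Linux-owned FCP subchannel, zFCP driver",
                "volume": None, "blocksize": None, "mdc": None}
    if name.startswith("D2") and len(name) == 4:
        size, mdc = name[2], name[3]
        if size in BLOCK_SIZES and mdc in "01":
            return {"driver": PREFIXES["D2"], "volume": "minidisk",
                    "blocksize": BLOCK_SIZES[size], "mdc": mdc == "1"}
    elif name[:1] in ("E", "F") and name[1:] in VOLUME_SUFFIXES:
        volume, mdc = VOLUME_SUFFIXES[name[1:]]
        return {"driver": PREFIXES[name[0]], "volume": volume,
                "blocksize": None, "mdc": mdc}
    raise ValueError(f"unrecognized configuration name: {name!r}")


print(decode_config("D241"))
```

For example, `decode_config("D241")` reports the Diagnose X'250' driver on a minidisk with a 4096-byte block size and MDC ON.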
Summary of Results
While this study of performance shows
native SCSI (Linux-owned FCP subchannel) is the best overall choice for
Linux disk I/O, customers should also consider
the challenges associated with managing the different disk I/O configurations
as part of their system.
This evaluation of disk I/O alternatives for Linux running as a z/VM guest
considers performance characteristics only.
It is also important to keep in mind that this evaluation was done with
FCP and FICON channels, which have comparable bandwidth characteristics.
If ESCON channels had been used for the ECKD configuration, the
throughput results would be significantly different.
The results of this study show that native SCSI outperforms all of
the other choices evaluated in this experiment when considering reads and writes.
It combines high levels of throughput with efficient use of CPU capacity.
That said, there may be other I/O choices that provide favorable throughput and
efficient use of CPU capacity based on the I/O characteristics of customer
application workloads.
For application workloads that are predominantly read I/O with many rereads
(for example, in cases where shared, read only DASD is used),
the Linux-owned FCP subchannel is a good choice, but there are other
attractive choices when minidisk cache is exploited.
ECKD minidisk, Diagnose X'250' ECKD minidisk (with block sizes of 2K
or 4K), and emulated FBA minidisk on ESS 2105-F20 are all good choices with MDC ON.
They all provide impressive throughput rates with efficient use of CPU capacity.
For application workloads that are predominantly write I/O, Linux-owned FCP
subchannel is the best choice. It provides the best throughput rates with the most
efficient use of CPU time. However, customers may want to consider other choices
that yield improvements in throughput and use less CPU time when compared to the
baseline dedicated ECKD case.
Customers should consider the characteristics of their environment when considering
which disk I/O configuration to use for
Linux guest systems on z/VM. Characteristics such as systems management and
disaster recovery should be considered along with application workload characteristics.
Discussion of Results
For each configuration, the tables show the configuration values as a
ratio scaled to the dedicated ECKD case. The tables are organized to
show the KB per second ratio (KB/sec), total CPU time per KB ratio (Total
CPU/KB), the VM Control Program CPU per KB ratio (CP CPU/KB), and the
virtual CPU per KB ratio (Virtual CPU/KB). There are five tables in
all. The first four let us compare the data rates
and CPU consumption for each of the four IOzone phases:
- Initial write phase
- Rewrite phase
- Initial Read phase
- Reread phase
The last table is a summary table that shows the average of the ratios that are shown
in the four IOzone phases. For customers that have applications that result in a mixture
of writes and reads where the percentage of each is similar or the percentages have not
been determined, this table can be valuable as a summary of overall performance.
For customers with applications that are heavily skewed to read or write I/O operations,
the other four tables will provide valuable insight as to the best choices and acceptable
alternatives.
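To make the scaling concrete, here is a minimal sketch of how raw measurements could be turned into ratios against the dedicated ECKD case. The raw numbers, the metric names, and the configuration name `CAND` are hypothetical and are not taken from this study; only the idea of dividing each value by the EDED value comes from the text above.

```python
def scale_to_baseline(measurements, baseline="EDED"):
    """Divide every metric of every configuration by the baseline's value."""
    base = measurements[baseline]
    return {
        config: {metric: value / base[metric] for metric, value in metrics.items()}
        for config, metrics in measurements.items()
    }


# Hypothetical raw measurements, NOT taken from this study.
raw = {
    "EDED": {"kb_per_sec": 1000.0, "total_cpu_per_kb": 4.0},
    "CAND": {"kb_per_sec": 1500.0, "total_cpu_per_kb": 3.0},
}

scaled = scale_to_baseline(raw)
print(scaled["CAND"])
```

With this scaling the baseline row is 1.00 in every column by construction. A KB/sec ratio above 1.00 means higher throughput than dedicated ECKD, and a CPU/KB ratio below 1.00 means less CPU time per KB moved.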
IOzone Initial Write Results
IOzone Initial Write Results (scaled to EDED)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 1.00 | 1.00 | 1.00 | 1.00 |
| EMD0 | 1.12 | 0.996 | 0.946 | 1.00 |
| EMD1 | 1.12 | 0.989 | 0.961 | 0.993 |
| ECKD Diag X'250' (64-bit) | | | | |
| D210 | 0.245 | 1.27 | 2.89 | 1.06 |
| D211 | 0.452 | 1.18 | 2.38 | 1.03 |
| D220 | 0.354 | 1.10 | 1.68 | 1.02 |
| D221 | 0.800 | 1.04 | 1.30 | 1.00 |
| D240 | 0.467 | 1.03 | 1.16 | 1.02 |
| D241 | 0.965 | 0.969 | 0.866 | 0.982 |
| EFBA ccw (2105-F20) | | | | |
| FDED | 1.24 | 1.57 | 5.44 | 1.06 |
| FMD0 | 1.24 | 1.58 | 5.46 | 1.07 |
| FMD1 | 1.24 | 1.57 | 5.45 | 1.06 |
| Dedicated FCP (2105-F20) | | | | |
| LNS0 | 1.54 | 0.952 | 0.674 | 0.988 |

Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

The IOzone initial write results show that the native SCSI case (Linux-owned
FCP subchannel) is the best performer. It provides a 54% improvement
in throughput over the baseline dedicated ECKD case and a savings of 4.8%
in total CPU time per KB moved.
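The percentages quoted for the native SCSI case follow directly from its table ratios. A small sketch of the arithmetic, using the LNS0 initial write values (1.54 for KB/sec and 0.952 for Total CPU/KB); the function names are illustrative only:

```python
def throughput_gain_pct(kb_per_sec_ratio):
    """Percent throughput improvement over the baseline (ratio 1.00)."""
    return (kb_per_sec_ratio - 1.0) * 100.0


def cpu_savings_pct(cpu_per_kb_ratio):
    """Percent CPU time per KB saved relative to the baseline (ratio 1.00)."""
    return (1.0 - cpu_per_kb_ratio) * 100.0


lns0_gain = throughput_gain_pct(1.54)    # LNS0 KB/sec ratio, initial write
lns0_savings = cpu_savings_pct(0.952)    # LNS0 Total CPU/KB ratio, initial write
print(f"throughput gain {lns0_gain:.1f}%, CPU savings {lns0_savings:.1f}%")
```

The same two formulas apply to the percentages quoted in the discussions of the other phases.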
The ECKD Diagnose X'250' cases show that throughput is the best at the
4K block size.
The emulated FBA cases on the 2105-F20 show much higher CPU time per
transaction to achieve their throughput. Much of this can be attributed to the
additional processing required in the VM Control Program to emulate FBA.
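The emulated FBA rows raise a natural question: why does a CP CPU/KB ratio above 5 produce a Total CPU/KB ratio of only about 1.6? The sketch below works through it under one assumption that this chapter does not state explicitly, namely that total CPU time is the sum of CP CPU time and virtual CPU time. Under that assumption the total ratio is a weighted average of the CP and virtual ratios, weighted by the share of CP time in the baseline, so that share can be estimated from any row. The function names are illustrative.

```python
def baseline_cp_share(total_ratio, cp_ratio, virtual_ratio):
    """Estimate the fraction of EDED CPU time per KB that is CP time.

    Assumes total = CP + virtual, so that
    total_ratio = w * cp_ratio + (1 - w) * virtual_ratio, solved here for w.
    """
    return (total_ratio - virtual_ratio) / (cp_ratio - virtual_ratio)


def config_cp_share(total_ratio, cp_ratio, virtual_ratio):
    """Fraction of this configuration's own CPU time per KB that is CP time."""
    w = baseline_cp_share(total_ratio, cp_ratio, virtual_ratio)
    return w * cp_ratio / total_ratio


# FDED initial write row: Total CPU/KB 1.57, CP CPU/KB 5.44, Virtual CPU/KB 1.06.
w_eded = baseline_cp_share(1.57, 5.44, 1.06)
fded_share = config_cp_share(1.57, 5.44, 1.06)
print(f"CP share of baseline: {w_eded:.3f}, CP share for FDED: {fded_share:.3f}")
```

Read this way, CP time is a small part of the baseline's CPU cost per KB, so even a fivefold increase in CP time raises the total by a much smaller factor. The estimate is sensitive to rounding in the published ratios, especially for rows where the CP and virtual ratios are close together.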
IOzone Rewrite Results
IOzone Rewrite Results (scaled to EDED)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 1.00 | 1.00 | 1.00 | 1.00 |
| EMD0 | 1.08 | 0.992 | 1.01 | 0.989 |
| EMD1 | 1.09 | 0.986 | 0.960 | 0.991 |
| ECKD Diag X'250' (64-bit) | | | | |
| D210 | 0.454 | 1.27 | 2.37 | 1.05 |
| D211 | 0.448 | 1.29 | 2.50 | 1.05 |
| D220 | 0.787 | 1.07 | 1.29 | 1.03 |
| D221 | 0.781 | 1.08 | 1.43 | 1.02 |
| D240 | 1.10 | 0.946 | 0.799 | 0.975 |
| D241 | 1.08 | 0.961 | 0.868 | 0.979 |
| EFBA ccw (2105-F20) | | | | |
| FDED | 1.23 | 1.93 | 6.12 | 1.12 |
| FMD0 | 1.23 | 1.95 | 6.11 | 1.14 |
| FMD1 | 1.22 | 1.93 | 6.06 | 1.13 |
| Dedicated FCP (2105-F20) | | | | |
| LNS0 | 1.46 | 0.915 | 0.628 | 0.971 |

Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

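For the rewrite phase, we see similar results to the write phase. Native SCSI is again
the best performer, with a 46% improvement in throughput over the baseline dedicated
ECKD case and an 8.5% savings in total CPU time per KB moved.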
IOzone Initial Read Results
IOzone Initial Read Results (scaled to EDED)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 1.00 | 1.00 | 1.00 | 1.00 |
| EMD0 | 1.00 | 0.998 | 0.997 | 0.998 |
| EMD1 | 0.539 | 1.34 | 2.65 | 1.02 |
| ECKD Diag X'250' (64-bit) | | | | |
| D210 | 0.380 | 1.34 | 2.38 | 1.08 |
| D211 | 0.206 | 1.72 | 4.85 | 0.946 |
| D220 | 0.656 | 1.08 | 1.36 | 1.01 |
| D221 | 0.357 | 1.49 | 4.09 | 0.846 |
| D240 | 0.976 | 0.905 | 0.711 | 0.954 |
| D241 | 0.546 | 1.36 | 2.76 | 1.01 |
| EFBA ccw (2105-F20) | | | | |
| FDED | 0.950 | 2.31 | 6.73 | 1.21 |
| FMD0 | 0.950 | 2.29 | 6.60 | 1.21 |
| FMD1 | 0.771 | 2.64 | 9.17 | 1.01 |
| Dedicated FCP (2105-F20) | | | | |
| LNS0 | 1.64 | 0.852 | 0.515 | 0.936 |

Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

For the initial read phase, the native SCSI case (Linux-owned FCP subchannel) is the
best performer once again. It provides a 64% improvement in throughput over the
baseline dedicated ECKD case, along with a 14.8% savings in total CPU time per KB
moved.
The ECKD minidisk cases illustrate the cost in throughput and CPU time per transaction
when MDC is ON. Comparing the EMD0 and EMD1
runs, there is a 46% loss in throughput and a 34% increase in total CPU time per KB
moved with MDC ON. These costs are the result of populating the minidisk cache.
When we look at the reread phase, we should find that there is a significant benefit
with MDC ON as the read is done from the cache (i.e., no I/O is performed from the disk).
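The EMD0 versus EMD1 comparison is between two non-baseline configurations, so the change is computed from the quotient of their ratios (the EDED baseline cancels out). A minimal sketch using the initial read table values; the function name is illustrative:

```python
def relative_change_pct(new_ratio, old_ratio):
    """Percent change going from one configuration to another.

    Both arguments are ratios scaled to the same baseline, so dividing
    them removes the baseline from the comparison.
    """
    return (new_ratio / old_ratio - 1.0) * 100.0


# Initial read phase, EMD0 (MDC OFF) -> EMD1 (MDC ON).
throughput_change = relative_change_pct(0.539, 1.00)   # KB/sec
cpu_change = relative_change_pct(1.34, 0.998)          # Total CPU/KB
print(f"throughput {throughput_change:+.1f}%, total CPU/KB {cpu_change:+.1f}%")
```

A negative throughput change is a loss and a positive CPU change is an added cost, which corresponds to the roughly 46% and 34% figures quoted above.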
The ECKD Diagnose X'250' cases show a similar trend to the write and rewrite phases
in terms of block size. The 4K block size results in the best throughput.
Comparing the 4K block size cases (D240 and D241), we find a similar trend to the
ECKD minidisk runs related to MDC. The cost of MDC ON is paid in terms of throughput
and CPU time per transaction. As mentioned above in the ECKD minidisk discussion, we
should find that there is a significant benefit with MDC ON in the reread phase.
The emulated FBA cases on the 2105-F20 show much higher CPU time per
transaction, as is the case in the write and rewrite phases. A difference in the
read phase is that the throughput in 2105-F20 cases is less than the dedicated ECKD
baseline case. In the write and rewrite phases we saw that there was a significant
increase in throughput at the cost of high CPU time per transaction.
IOzone Reread Results
IOzone Reread Results (scaled to EDED)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 1.00 | 1.00 | 1.00 | 1.00 |
| EMD0 | 0.999 | 1.03 | 1.03 | 1.03 |
| EMD1 | 10.9 | 0.783 | 1.09 | 0.706 |
| ECKD Diag X'250' (64-bit) | | | | |
| D210 | 0.380 | 1.35 | 2.37 | 1.09 |
| D211 | 9.49 | 0.929 | 2.29 | 0.588 |
| D220 | 0.656 | 1.10 | 1.29 | 1.05 |
| D221 | 10.2 | 0.929 | 2.25 | 0.598 |
| D240 | 0.975 | 0.973 | 0.785 | 1.02 |
| D241 | 11.0 | 0.642 | 0.918 | 0.573 |
| EFBA ccw (2105-F20) | | | | |
| FDED | 0.947 | 2.35 | 6.78 | 1.25 |
| FMD0 | 0.948 | 2.35 | 6.78 | 1.24 |
| FMD1 | 9.07 | 0.901 | 2.20 | 0.576 |
| Dedicated FCP (2105-F20) | | | | |
| LNS0 | 1.64 | 0.951 | 0.609 | 1.04 |

Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

For the reread phase, the native SCSI case (Linux-owned FCP subchannel) is again the
best performer among the configurations that do not read from minidisk cache. It
provides a 64% improvement in throughput over the baseline dedicated ECKD case, along
with a 4.9% savings in total CPU time per KB moved.
As expected, the ECKD minidisk case with MDC ON
yields a very large benefit in throughput (10.9 times the baseline)
and a significant savings in CPU time per KB moved of 21.7%. As discussed in the read phase,
this benefit is achieved because the reread phase performs the reread using the minidisk
cache, so there is no I/O performed with the disk.
The benefit of MDC is even more
substantial when you consider z/VM environments with multiple Linux guest
systems sharing read only minidisks as part of their application workload.
Please note, however, that in cases where the Linux page cache is made large enough to
achieve a high hit ratio, you should consider turning off MDC because it is redundant.
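The cost of populating the cache on the initial read and the benefit on rereads can be combined into a rough break-even estimate. The sketch below is illustrative only and rests on two assumptions that this chapter does not state: elapsed time per KB is taken as the reciprocal of the KB/sec ratio, and the dedicated ECKD baseline is assumed to run at the same absolute rate in the initial read and reread phases. The function names are hypothetical.

```python
def elapsed(read_ratio, reread_ratio, rereads):
    """Relative time for one initial read plus a number of rereads.

    Time is expressed in units of one baseline (EDED) pass over the file,
    assuming time per pass is the reciprocal of the KB/sec ratio.
    """
    return 1.0 / read_ratio + rereads / reread_ratio


def break_even_rereads(mdc_off, mdc_on, limit=1000):
    """Smallest number of rereads at which MDC ON is faster overall, or None."""
    for n in range(limit + 1):
        if elapsed(*mdc_on, n) < elapsed(*mdc_off, n):
            return n
    return None


# (initial read KB/sec ratio, reread KB/sec ratio) from the two tables.
EMD0 = (1.00, 0.999)   # ECKD minidisk, MDC OFF
EMD1 = (0.539, 10.9)   # ECKD minidisk, MDC ON

n = break_even_rereads(EMD0, EMD1)
print(f"MDC ON is ahead after {n} reread(s)")
```

Under these assumptions the extra time spent populating the cache is recovered as soon as the data is reread, which is consistent with the recommendation of MDC for read-mostly workloads with many rereads.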
The ECKD Diagnose X'250' cases with MDC ON all show large improvements in
throughput ratios, similar to what we see with the ECKD minidisk cases, along with significant
savings in CPU time per transaction. As in the other three IOzone phases, ECKD Diagnose X'250'
shows the most benefit using a block size of 4K. In this case, the throughput is
improved by 1000%, and the total CPU time per KB moved is reduced by 35.8% over the baseline
dedicated ECKD case. For the MDC OFF cases, the 4K block size case yields the best
throughput.
The emulated FBA cases on the 2105-F20 show much higher CPU time per transaction,
as is the case in the other three phases, with one exception. The emulated FBA
case for the 2105-F20 with MDC ON (FMD1) shows a reduction in total CPU time of 9.9%
along with a more than 800% increase in throughput. The other emulated FBA cases
(FDED and FMD0) have a very high CPU time per transaction.
Overall IOzone Results
IOzone Overall Results (scaled to EDED)

| Configuration | KB/sec | Total CPU/KB | CP CPU/KB | Virtual CPU/KB |
|---|---|---|---|---|
| ECKD ccw | | | | |
| EDED | 1.00 | 1.00 | 1.00 | 1.00 |
| EMD0 | 1.06 | 1.00 | 0.994 | 1.00 |
| EMD1 | 1.07 | 1.02 | 1.40 | 0.944 |
| ECKD Diag X'250' (64-bit) | | | | |
| D210 | 0.337 | 1.30 | 2.51 | 1.07 |
| D211 | 0.432 | 1.27 | 2.99 | 0.938 |
| D220 | 0.535 | 1.09 | 1.41 | 1.03 |
| D221 | 0.752 | 1.12 | 2.24 | 0.903 |
| D240 | 0.739 | 0.975 | 0.871 | 0.995 |
| D241 | 1.03 | 0.980 | 1.34 | 0.911 |
| EFBA ccw (2105-F20) | | | | |
| FDED | 1.12 | 1.95 | 6.25 | 1.14 |
| FMD0 | 1.12 | 1.95 | 6.22 | 1.14 |
| FMD1 | 1.29 | 1.74 | 5.72 | 0.979 |
| Dedicated FCP (2105-F20) | | | | |
| LNS0 | 1.55 | 0.923 | 0.608 | 0.983 |

Notes: 2084-324. Two-way dedicated partition. 2 GB central. 2 GB XSTORE. 2105-F20 16 GB FICON/FCP. z/VM 5.2.0 GA RSU + VM63893. Linux SLES 9 SP 1, 192 MB virtual uniprocessor.

The overall IOzone results table summarizes the performance of disk I/O choices across
all four IOzone phases (initial write, rewrite, initial read, reread). This table
characterizes the performance that can be expected for each choice for customers that
have workloads that are not predominantly write or predominantly read.
As in the four phase discussions, the native SCSI case (Linux-owned FCP subchannel)
is the clear winner. It outperforms all other choices with a 55%
improvement in throughput, with a 7.7% savings in total CPU time per KB moved in
comparison to the dedicated ECKD baseline case.
The ECKD minidisk cases show an increase in throughput over the dedicated ECKD case
with little change in CPU cost.
The ECKD Diagnose X'250' cases show that throughput is best at the
4K block size. Minidisk cache (MDC) ON improves throughput over MDC OFF at every
block size, while total CPU time per KB changes little (slightly lower at 1K,
slightly higher at 2K and 4K).
The emulated FBA cases on the 2105-F20 show very high CPU time per transaction
to achieve their throughput.
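One way to read the overall table is to rank the configurations by throughput and then keep only those that cost no more CPU per KB than the baseline. The sketch below does this with the KB/sec and Total CPU/KB columns copied from the overall table; the selection rule itself is an illustrative choice, not a criterion used in this study.

```python
# (KB/sec ratio, Total CPU/KB ratio) from the overall results table.
OVERALL = {
    "EDED": (1.00, 1.00), "EMD0": (1.06, 1.00), "EMD1": (1.07, 1.02),
    "D210": (0.337, 1.30), "D211": (0.432, 1.27), "D220": (0.535, 1.09),
    "D221": (0.752, 1.12), "D240": (0.739, 0.975), "D241": (1.03, 0.980),
    "FDED": (1.12, 1.95), "FMD0": (1.12, 1.95), "FMD1": (1.29, 1.74),
    "LNS0": (1.55, 0.923),
}

# Highest throughput first.
by_throughput = sorted(OVERALL, key=lambda c: OVERALL[c][0], reverse=True)

# Configurations at least as fast as EDED that use no more CPU per KB.
no_worse = [
    c for c in by_throughput
    if c != "EDED" and OVERALL[c][0] >= 1.0 and OVERALL[c][1] <= 1.0
]

print("fastest:", by_throughput[0])
print("faster without extra CPU:", no_worse)
```

By this rule LNS0, EMD0, and D241 match or beat the baseline on both measures, while the emulated FBA cases rank high on throughput but are excluded by their CPU cost.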