IBM: z/VM Performance Report: Scheduler Lock Improvement

Scheduler Lock Improvement

Prior to z/VM 4.4.0, the CP scheduler lock was used to serialize scheduler activities, timer requests, and processor local dispatch vectors (PLDVs). z/VM 4.4.0 adds a separate timer request lock (the TRQBK lock) to the z/VM Control Program to serialize timer requests. Moving timer requests off the scheduler lock reduces contention on it, which increases the number of Linux guests and other guest operating systems that a single z/VM image can manage concurrently. This can improve capacity on large n-way configurations; systems with very few processors see little or no effect.
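The general technique described here is often called lock splitting: state that used to share one lock is given its own lock, so that work on one kind of state no longer waits behind work on the other. The following is a minimal Python sketch of that idea only. It is not CP's implementation; the class, its methods, and the guest names are hypothetical and chosen purely for illustration.

```python
import heapq
import threading


class SplitLockControl:
    """Toy model: run queue and timer queue guarded by separate locks."""

    def __init__(self):
        self.scheduler_lock = threading.Lock()      # guards run_queue only
        self.timer_request_lock = threading.Lock()  # guards timer_queue only
        self.run_queue = []
        self.timer_queue = []  # min-heap of (expiry_ms, guest)

    def dispatch(self, guest):
        with self.scheduler_lock:
            self.run_queue.append(guest)

    def add_timer_request(self, guest, expiry_ms):
        # Never takes scheduler_lock, so it cannot contend with dispatch().
        with self.timer_request_lock:
            heapq.heappush(self.timer_queue, (expiry_ms, guest))

    def next_timer(self):
        # Peek at the earliest pending timer request, if any.
        with self.timer_request_lock:
            return self.timer_queue[0] if self.timer_queue else None


control = SplitLockControl()
with control.scheduler_lock:
    # Timer requests can still be queued while the scheduler lock is held.
    control.add_timer_request("IDLE0001", expiry_ms=10)
    control.add_timer_request("IDLE0002", expiry_ms=5)

print(control.next_timer())  # (5, 'IDLE0002')
```

With a single non-reentrant lock guarding both queues, the calls inside the `with` block would have to wait for the scheduler lock to be released; with two locks they proceed independently.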

This section summarizes the results of a performance evaluation that was done to verify that there is a significant reduction in scheduler lock contention in environments with large numbers of guest virtual machines running on large n-way systems. This was accomplished by comparing the overall time spent spinning on CP locks and CP CPU time per transaction. The expected result was that the number of requests to spin (Avg Spin Lock Rate), the time spent spinning on CP locks (Spin Time), and the CP CPU time per transaction (CP msec/Tx) would all decrease.

Methodology:

All performance measurements were done on z900 systems. A 2064-109 system was used to conduct experiments with 3-way and 9-way LPAR configurations; a 2064-116 was used to experiment with a 16-way LPAR configuration. The 3-way and 9-way LPAR configurations each included dedicated processors, 6.5GB of central storage and 0.5GB of expanded storage. The 16-way LPAR configuration included dedicated processors, 16GB of central storage and 1GB of expanded storage. ¹

The software configuration for the evaluation used z/VM 4.3.0 for the baseline comparison against z/VM 4.4.0. The application workload included a combination of busy and idle users. The purpose of including idle users was to create an environment with numerous timer interrupts to evaluate the effect of the new TRQBK lock on system performance. The specifics of the application workload are:

100 Linux guest virtual machines performing HTTP web serving with zero think time (these machines were constantly busy serving up web pages).
500 idle guest virtual machines generating timer requests every 10 milliseconds. ² This portion of the workload put a significant load on CP to manage timer interrupts. Typical customer environments with large numbers of Linux guests would have the timer patch applied.
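To give a sense of the timer load this workload implies, the arithmetic can be sketched as follows. This assumes each idle guest issues exactly one timer request per 10-millisecond interval, which is a simplification of the description above; the variable names are illustrative.

```python
# Workload parameters taken from the description above.
idle_guests = 500
timer_interval_ms = 10

# Assumption: one timer request per guest per interval.
requests_per_guest_per_sec = 1000 // timer_interval_ms
total_requests_per_sec = idle_guests * requests_per_guest_per_sec

# Mean spacing between timer requests across the whole system.
mean_gap_microseconds = 1_000_000 / total_requests_per_sec

print(total_requests_per_sec)   # 50000
print(mean_gap_microseconds)    # 20.0
```

Under that assumption, CP sees on the order of 50,000 timer requests per second from the idle guests alone, or one roughly every 20 microseconds.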

The scenarios included the application workload being executed on z/VM 4.3.0 using the 3-way, 9-way, and 16-way LPAR configurations discussed above to create a set of baseline measurements for comparison. z/VM 4.4.0 was measured with the same workload for each of the LPAR configurations. A discussion of the results for each LPAR configuration follows.

Internal tools were used to drive the application workload for each scenario. The idle users were allowed to reach a steady state (indicated by the absence of paging activity) before taking measurements. Hardware instrumentation and CP monitor data were collected for each scenario.

Results:

Figure 1 shows the percent of time spent spinning on CP locks, comparing z/VM 4.3.0 to z/VM 4.4.0. Figure 2 shows the CP CPU time for each transaction comparing z/VM 4.3.0 to z/VM 4.4.0.

Table 1 shows a summary of the data collected for the 3-way, 9-way, and 16-way comparisons.

Figure 1. Reduction in Time Spent Spinning on CP Locks

Figure 2. Reduction in CP CPU Time per Transaction

In the 3-way LPAR comparison, there is a noticeable improvement in the CP spin lock area. The average spin lock rate (the number of times a request to spin is made) is reduced by 55%, and the percent spin time (percentage of time spent spinning on a lock) is reduced by 62%. However, the CP CPU time per transaction (CP msec/Tx) does not change much. This is expected since CP locking is not a significant problem in the base case (z/VM 4.3.0) with the 3-way LPAR configuration. Generally, as the number of CPUs increases, and each of them is sending timer requests to CP, there are more timer requests queued up to be handled.

With the increased number of CPUs in the 9-way LPAR comparison, the benefit of splitting off the timer request management from the scheduler lock stands out. The CP CPU time per transaction is reduced by 87%, along with a 54% reduction in the average spin lock rate and a 92% reduction in the percent spin time. With 9 CPUs, this data illustrates that having the new TRQBK lock serialize timer requests has a very positive effect on the system, because CP locking is a severe problem in the base case.

With the 16-way LPAR configuration, there is again an improvement in the CP CPU time per transaction. It is reduced by 13%, which is much less of an improvement than in the 9-way comparison. The average spin lock rate is reduced by 29% and the percent spin time is reduced by 41%. While these improvements are significant, they are less dramatic in this configuration because the z/VM 4.4.0 case is now being limited by CP lock contention. For the base case, CP lock contention is much more severe, so z/VM 4.4.0 still shows an improvement.
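The percentage reductions quoted for each configuration follow from the ratios in Table 1 as (1 - ratio) x 100. A short sketch of that arithmetic, using ratio values copied from the table (the dictionary layout and function name are illustrative):

```python
# Ratios of z/VM 4.4.0 to z/VM 4.3.0, copied from the Ratios rows of Table 1.
ratios = {
    "3-way": {"CP msec/Tx": 0.970, "Avg Spin Lock Rate": 0.455, "Pct Spin Time": 0.373},
    "9-way": {"CP msec/Tx": 0.131, "Avg Spin Lock Rate": 0.457, "Pct Spin Time": 0.079},
    "16-way": {"CP msec/Tx": 0.868, "Avg Spin Lock Rate": 0.713, "Pct Spin Time": 0.594},
}


def percent_reduction(ratio):
    """Convert a new/old ratio into a percentage reduction."""
    return (1.0 - ratio) * 100.0


for config, metrics in ratios.items():
    summary = ", ".join(
        f"{name}: {percent_reduction(ratio):.1f}%" for name, ratio in metrics.items()
    )
    print(f"{config}: {summary}")
```

The values computed this way agree with the percentages in the text to within one percentage point.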

Table 1. Reduced Scheduler Lock Contention: 3-Way, 9-Way, 16-Way LPAR

z/VM Version              4.3.0     4.4.0     4.3.0     4.4.0     4.3.0     4.4.0
# of CPUs                 3         3         9         9         16        16
Run ID                    43UPP3    44UPP3    43UPP9    44UPP9    43S1UP16  44S1UP16

Tx/sec (n)                416.73    420.88    270.11    999.88    248.63    268.35
SIE Instruction Rate (v)  159414    160020    88416     197199    59856     77632

Total msec/Tx (h)         7.110     7.033     28.213    7.864     58.627    51.868
CP msec/Tx (h)            3.231     3.133     23.113    3.034     54.142    47.014
Emul msec/Tx (h)          3.880     3.901     5.101     4.830     4.484     4.854

Avg Spin Lock Rate (v)    6150      2797      10047     4587      4108      2929
Spin Time (v)             3.338     2.739     24.782    4.275     104.895   87.360
Pct Spin Time (v)         2.053     0.766     24.90     1.961     43.09     25.59

Total Util/Proc (v)       99.3      99.5      83.4      87.4      90.3      86.6
System Util/Proc (v)      9.5       9.8       42.0      12.9      54.2      72.0

Ratios

Tx/sec (n)                1.000     1.010     1.000     3.702     1.000     1.079
SIE Instruction Rate (v)  1.000     1.004     1.000     2.230     1.000     1.297

Total msec/Tx (h)         1.000     0.989     1.000     0.279     1.000     0.885
CP msec/Tx (h)            1.000     0.970     1.000     0.131     1.000     0.868
Emul msec/Tx (h)          1.000     1.005     1.000     0.947     1.000     1.083

Avg Spin Lock Rate (v)    1.000     0.455     1.000     0.457     1.000     0.713
Spin Time (v)             1.000     0.821     1.000     0.173     1.000     0.833
Pct Spin Time (v)         1.000     0.373     1.000     0.079     1.000     0.594

Total Util/Proc (v)       1.000     1.002     1.000     1.048     1.000     0.959
System Util/Proc (v)      1.000     1.032     1.000     0.307     1.000     1.328
Note: These measurements were done on z900 machines in a non-paging environment; the workload consisted of 100 zLinux guests performing HTTP web serving and 500 simulated idle Linux guests.
Note: The transaction/sec rate represents the rate for the busy virtual machines performing HTTP web serving. It does not include the idle users that are generating timer interrupts.
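Two relationships among the Table 1 columns can be checked numerically. First, Total msec/Tx is the sum of CP msec/Tx and Emul msec/Tx. Second, the Pct Spin Time values match Avg Spin Lock Rate x Spin Time / 10,000, which is what one would expect if the rate is in spins per second and Spin Time is in microseconds per spin. The report does not state those units, so the second relationship is an inference from the numbers rather than a documented definition. A sketch of both checks:

```python
# Run ID: (Total msec/Tx, CP msec/Tx, Emul msec/Tx,
#          Avg Spin Lock Rate, Spin Time, Pct Spin Time), copied from Table 1.
rows = {
    "43UPP3":   (7.110, 3.231, 3.880, 6150, 3.338, 2.053),
    "44UPP3":   (7.033, 3.133, 3.901, 2797, 2.739, 0.766),
    "43UPP9":   (28.213, 23.113, 5.101, 10047, 24.782, 24.90),
    "44UPP9":   (7.864, 3.034, 4.830, 4587, 4.275, 1.961),
    "43S1UP16": (58.627, 54.142, 4.484, 4108, 104.895, 43.09),
    "44S1UP16": (51.868, 47.014, 4.854, 2929, 87.360, 25.59),
}

sum_gaps = {}
pct_gaps = {}
for run_id, (total, cp, emul, rate, spin_time, pct) in rows.items():
    # Check 1: total CPU time per transaction = CP time + emulation time.
    sum_gaps[run_id] = abs(total - (cp + emul))
    # Check 2 (assumed units: spins/sec and microseconds/spin):
    # rate * spin_time is microseconds spinning per second; /10,000 gives percent.
    derived_pct = rate * spin_time / 10_000
    pct_gaps[run_id] = abs(pct - derived_pct)
    print(f"{run_id}: derived Pct Spin Time = {derived_pct:.3f}, table = {pct}")

print(max(sum_gaps.values()) < 0.002)  # True
print(max(pct_gaps.values()) < 0.01)   # True
```

Both checks hold for all six runs to within the rounding of the table values.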

Summary:

In all cases tested, CP CPU time per transaction improved with the introduction of the timer request lock (TRQBK lock), although the gain was small in the 3-way configuration. The TRQBK lock significantly reduces the bottleneck at the scheduler lock when the workload on the system includes a large number of timer interrupts to be handled. Since idle Linux systems generate frequent timer interrupts to look for work, the TRQBK lock plays an important role in maintaining good performance in z/VM systems when there are large numbers of Linux guests present. ³

Additional evidence of the positive effect of the TRQBK lock is illustrated in the Linux Guest Crypto on z990 section of this report.

In the case of the 16-way configuration, CP lock contention is reduced when compared to z/VM 4.3.0, but is still a significant limitation with the workload that was used. However, as mentioned earlier, this workload was created to stress CP's management of timer interrupts. Typical customer environments with Linux guests would perform much better because the Linux guests would have the timer patch applied.

Footnotes:

¹: The breakout of central storage and expanded storage for this evaluation was arbitrary. Similar results are expected with other breakouts because the measurements were obtained in a nonpaging environment.
²: These virtual machines emulate idle Linux guests without the timer patch installed.
³: Such Linux systems are good candidates for the timer patch to reduce the frequency of timer interrupts.
