One-Lane Bridge

From CloudScale
Jump to: navigation, search

Contents

One-Lane Bridge

A One Lane Bridge occurs, if a passive resource limits the concurrency in an application. Passive resources can be for instance mutexes, connection pools, or database locks. It is analogous to the one lane bridge problem in vehicular traffic, where traffic may only travel in one direction at a time, and, if there are multiple lanes of traffic all moving in parallel, they must merge and proceed across the bridge. [1]

Also Known As

OLB

Example

In a sales system where orders are recorded in the database, the system may be designed such that each order requires a database update for each item ordered. If the database assigns a new order item number to each item, and inserts all items at the end of the table, and if every new update must go to the same physical location, and all new items are “inserted,” then the update behaves like a One-Lane Bridge. In such a system, only one insert may proceed at a time and all others must wait. [2]

Context

Systems with concurrency and mostly those that access a database.

Problem

The One Lane Bridge typically occurs in multi-threaded software systems, where only few threads can resume execution while most threads have to wait. Synchronization points in the program flow or database locks are often reasons for the One Lane Bridge. At a One Lane Bridge only few threads are active concurrently. Thus, less work can be processed resulting in higher response times. Furthermore, the One Lane Bridge can lead to a congestion resulting in the Traffic Jam antipattern.

Detection

Since a One Lane Bridge is a typical scalability problem, we look at the performance behaviour with respect to an increasing level of concurrency. To detect this antipattern, we define a series of experiments observing the end-to-end response time while increasing the number of users for each experiment. The strategy increases the number of users until

  1. a resource is fully utilized (i.e., its utilization is larger than 90%),
  2. response times increase more than 10 times, or
  3. the maximum number of potential concurrent users is reached.

In order to distinguish an OLB from a Bottleneck Resource, we additionally measure resource utilization during each experiment. The following figure illustrates the measurement results for three different scenarios: i) the system under test (SUT) contains an OLB, ii) the SUT contains no problem, and iii) the SUT contains a bottleneck resource (BR). The graphs on the left hand side show the average response times with respect to the number of users. The graphs on the right hand side show the corresponding mean CPU utilization.

OLB.png

If the SUT contains an OLB, its critical passive resource leads to strongly increasing response times for an increasing number of users. Additionally, CPU utilization is low since the throughput is limited by the passive resource. If the CPU is a Bottleneck Resource (BR), response times increase and CPU utilization is high. Thus, we do not assume an OLB to be present. Finally, if no performance problem occurs, response times and CPU utilization increase only moderately.

Solution

In vehicular traffic, the one-lane bridge problem can be resolved by constructing multiple lanes, constructing additional bridges, or rerouting traffic. In the realm of software systems we have similar solutions:

  • Increasing the number of threads (or processes) that can simultaneously access the resource. In our example above, this can be achieved for example by using multiple tables that can be consolidated later.
  • Reducing the holding time, by limiting the time every process needs to hold the resource, either by removing code from the holding-block that does not need a lock on the resource, or reducing the service time of the resource.

See also

Traffic Jam

References

  1. A. Wert, J. Happe, and L. Happe, Supporting Swift Reaction: Automatically Uncovering Performance Problems by Systematic Experiments, in Proceedings or the International Conference on Software Engineering (ICSE), 2013.
  2. C. Smith and L. Williams, Software Performance Antipatterns, in Proceedings of WOSP. ACM, 2000, pp. 127–136.