Bufferbloat
Bufferbloat is high latency in packet-switched networks caused by excess buffering of packets. It can also cause packet delay variation (also known as jitter) and reduce overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications such as voice over IP (VoIP), online gaming, and even ordinary web surfing.
Some communications equipment manufacturers designed unnecessarily large buffers into some of their network products. In such equipment, bufferbloat occurs when a network link becomes congested, causing packets to become queued for long periods in these oversized buffers. In a first-in first-out queuing system, overly large buffers result in longer queues and higher latency, and do not improve network throughput.
The bufferbloat phenomenon was described as early as 1985.[1] It gained more widespread attention starting in 2009.[2]
Buffering
An established rule of thumb among network equipment manufacturers was to provide buffers large enough to hold at least 250 ms of traffic passing through a device. For example, a router's Gigabit Ethernet interface would require a relatively large 32 MB buffer.[3] Buffers sized this way can cause the TCP congestion control algorithm to fail. The buffers then take some time to drain before congestion control resets and the TCP connection ramps back up to speed and fills the buffers again.[4] Bufferbloat thus causes problems such as high and variable latency, and it chokes network bottlenecks for all other flows, because the buffer fills with the packets of one TCP stream and other packets are then dropped.[5]
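The arithmetic behind the 250 ms rule of thumb can be sketched as follows. This is only an illustration: the function name `buffer_bytes` is hypothetical, and the calculation assumes decimal units (1 Gbit/s = 10^9 bit/s, 1 MB = 10^6 bytes).

```python
def buffer_bytes(link_rate_bps: float, buffer_delay_s: float) -> float:
    """Bytes needed to hold buffer_delay_s worth of traffic at link_rate_bps."""
    return link_rate_bps * buffer_delay_s / 8  # 8 bits per byte


GIGABIT_ETHERNET_BPS = 1_000_000_000  # assumed nominal line rate

size = buffer_bytes(GIGABIT_ETHERNET_BPS, 0.250)
print(f"{size / 1e6:.2f} MB")  # 31.25 MB
```

The result, about 31 MB, is consistent with the roughly 32 MB figure cited above.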
A bloated buffer has an effect only when it is actually used. In other words, oversized buffers have a damaging effect only when the link they buffer becomes a bottleneck. The size of the buffer serving a bottleneck can be measured using the ping utility provided by most operating systems. First, the other host should be pinged continuously; then, a download from it lasting several seconds should be started and stopped a few times. By design, the TCP congestion avoidance algorithm will rapidly fill up the bottleneck on the route. If downloading (or uploading) correlates with a direct and significant increase in the round-trip time reported by ping, the buffer of the current bottleneck in the download (or upload) direction is bloated. Since the increase in round-trip time is caused by the buffer at the bottleneck, the maximum increase gives a rough estimate of its size in milliseconds.
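The estimate described above can be sketched in a few lines, assuming the round-trip times have already been collected from ping output. The function name `estimate_buffer_delay_ms` and the sample values are hypothetical; using the median of the idle samples as the baseline is one reasonable choice, not the only one.

```python
from statistics import median


def estimate_buffer_delay_ms(idle_rtts_ms, loaded_rtts_ms):
    """Rough bottleneck buffer size in ms: worst loaded RTT minus idle baseline."""
    baseline = median(idle_rtts_ms)  # typical RTT with no competing traffic
    return max(loaded_rtts_ms) - baseline


# Hypothetical samples: pings while idle, then pings during a download.
idle = [20.0, 21.0, 19.0]
loaded = [22.0, 150.0, 420.0, 380.0]

print(estimate_buffer_delay_ms(idle, loaded))  # 400.0
```

An increase of a few milliseconds under load is unremarkable; an increase of hundreds of milliseconds, as in these made-up samples, would point to a bloated buffer.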
In the previous example, using an advanced traceroute tool (for example, MTR) instead of simple pinging will not only demonstrate the existence of a bloated buffer at the bottleneck, but will also pinpoint its location in the network. Traceroute achieves this by displaying the route (path) and measuring transit delays of packets across the network. The history of the route is recorded as the round-trip times of the packets received from each successive host (remote node) in the route.[6]
Mechanism
Most TCP congestion control algorithms rely on measuring the occurrence of packet drops to determine the available bandwidth between two ends of a connection. The algorithms speed up the data transfer until packets start to drop, then slow down the transmission rate. Ideally, they keep adjusting the transmission rate until it reaches an equilibrium speed of the link. So that the algorithms can select a suitable transfer speed, the feedback about packet drops must occur in a timely manner. With a large buffer that has been filled, the packets will arrive at their destination, but with a higher latency. The packets were not dropped, so TCP does not slow down once the uplink has been saturated, further filling the buffer. Newly arriving packets are dropped only when the buffer is fully saturated. Once this happens TCP may even decide that the path of the connection has changed, and again go into the more aggressive search for a new operating point.[7]
Packets are queued within a network buffer before being transmitted; in problematic situations, packets are dropped only if the buffer is full. On older routers, buffers were fairly small so they filled quickly and therefore packets began to drop shortly after the link became saturated, so the TCP protocol could adjust and the issue would not become apparent. On newer routers, buffers have become large enough to hold several seconds of buffered data. To TCP, a congested link can appear to be operating normally as the buffer fills. The TCP algorithm is unaware the link is congested and does not start to take corrective action until the buffer finally overflows and packets are dropped.
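How buffer size delays the drop signal can be illustrated with a deliberately simplified model. It assumes a sender transmitting at a constant rate above the link rate, which real TCP does not do; the function name and the buffer sizes and rates are hypothetical.

```python
def time_to_first_drop_s(buffer_bytes, send_rate_bps, link_rate_bps):
    """Seconds until an initially empty tail-drop buffer overflows."""
    excess_bps = send_rate_bps - link_rate_bps  # rate at which the queue grows
    if excess_bps <= 0:
        return float("inf")  # the link keeps up, so the buffer never fills
    return buffer_bytes * 8 / excess_bps


LINK_BPS = 10_000_000  # assumed bottleneck rate
SEND_BPS = 12_000_000  # assumed constant sending rate

for label, size in (("64 kB buffer", 64_000), ("4 MB buffer", 4_000_000)):
    print(label, time_to_first_drop_s(size, SEND_BPS, LINK_BPS), "s")
```

Under these assumptions the small buffer signals congestion after about a quarter of a second, while the large one lets the sender overrun the link for 16 seconds before the first packet is dropped.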
All packets passing through a simple buffer implemented as a single queue will experience similar delay, so the latency of any connection that passes through a filled buffer will be affected. Available channel bandwidth can also end up being unused, as some fast destinations may not be promptly reached due to buffers clogged with data awaiting delivery to slow destinations. These effects impair interactivity of applications using other network protocols, including UDP used in latency-sensitive applications like VoIP and online gaming.[8]
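The effect of a shared single queue on an unrelated flow can be sketched with a small FIFO model. This is an illustration under stated assumptions: the buffer is unbounded, the link rate is 1 Mbit/s, and the function name, flow labels, and packet sizes are hypothetical.

```python
def fifo_delays(arrivals, link_rate_bps):
    """Delay of each packet through one FIFO queue feeding one link.

    arrivals: list of (arrival_time_s, size_bytes, flow), sorted by time.
    Returns a list of (flow, delay_s), where delay covers waiting in the
    queue plus the packet's own transmission time.
    """
    link_free_at = 0.0
    delays = []
    for arrival, size_bytes, flow in arrivals:
        start = max(arrival, link_free_at)  # wait for earlier packets
        link_free_at = start + size_bytes * 8 / link_rate_bps
        delays.append((flow, link_free_at - arrival))
    return delays


LINK_BPS = 1_000_000
voip = (0.001, 200, "voip")
bulk = [(0.0, 1500, "bulk")] * 100  # burst already queued ahead of the call

alone = fifo_delays([voip], LINK_BPS)[-1][1]
shared = fifo_delays(bulk + [voip], LINK_BPS)[-1][1]
print(f"alone: {alone * 1000:.1f} ms, behind bulk burst: {shared * 1000:.1f} ms")
```

The small packet needs under 2 ms on an idle link, but waits more than a second when 100 full-size packets from another flow are queued ahead of it.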
Impact on applications
Regardless of bandwidth requirements, any type of service that requires consistently low latency or jitter-free transmission can be affected by bufferbloat. Examples include voice calls, online gaming, video chat, and other interactive applications such as instant messaging, radio streaming, video on demand, and remote login.
When bufferbloat is present and the network is under load, even normal web page loads can take many seconds to complete, and simple DNS queries can fail due to timeouts.[9] In fact, any TCP connection can time out and disconnect, and UDP packets can be lost. Because a TCP download stream depends on ACK packets travelling in the upload direction, bufferbloat in the upload direction can cause unrelated download applications to fail, since their ACK packets do not reach the Internet server in time. For example, limiting the transmission rate of a OneDrive upload synchronisation can keep it from disturbing other users of a home network.
Diagnostic tools
The DSL Reports Speedtest[10] is an easy-to-use test that includes a score for bufferbloat. The ICSI Netalyzr[11] was another on-line tool that could be used for checking networks for the presence of bufferbloat, together with checking for many other common configuration problems.[12] The service was shut down in March 2019. The bufferbloat.net web site lists tools and procedures for determining whether a connection has excess buffering that will slow it down.[13]
Solutions and mitigations
Several technical solutions exist, which can be broadly grouped into two categories: solutions that target the network and solutions that target the endpoints. The two types of solutions are often complementary. The problem sometimes arises with a combination of fast and slow network paths.
Network solutions generally take the form of queue management algorithms. This type of solution has been the focus of the IETF AQM working group.[14] Notable examples include:
- AQM algorithms such as CoDel and PIE.[15]
- Hybrid AQM and packet scheduling algorithms such as FQ-CoDel.[16]
- Amendments to the DOCSIS standard[17] to enable smarter buffer control in cable modems.[9]
- Integration of queue management (FQ-CoDel) into the WiFi subsystem of the Linux operating system, as Linux is commonly used in wireless access points.[18]
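The idea shared by delay-based AQM schemes such as CoDel, acting on how long packets have waited rather than on how full the buffer is, can be sketched as below. This is a simplified illustration and not the CoDel algorithm as specified: it drops at most one packet per interval and omits CoDel's control law, which shortens the spacing between drops while delay stays high. The class name `SojournDropQueue` is hypothetical; the 5 ms target and 100 ms interval are CoDel's commonly cited defaults. Times are integer milliseconds.

```python
from collections import deque

TARGET_MS = 5      # acceptable standing queue delay
INTERVAL_MS = 100  # how long delay may stay above target before a drop


class SojournDropQueue:
    """FIFO that drops from the head when queue delay stays high."""

    def __init__(self):
        self._queue = deque()      # entries are (enqueue_time_ms, packet)
        self._above_since = None   # when delay was first seen above target
        self.dropped = 0

    def enqueue(self, now_ms, packet):
        self._queue.append((now_ms, packet))

    def dequeue(self, now_ms):
        """Return the next packet to send, or None if nothing is queued."""
        if not self._queue:
            self._above_since = None
            return None
        enqueued_at, packet = self._queue.popleft()
        if now_ms - enqueued_at < TARGET_MS:
            self._above_since = None   # delay is fine, reset the timer
            return packet
        if self._above_since is None:
            self._above_since = now_ms  # start timing the high-delay period
            return packet
        if now_ms - self._above_since < INTERVAL_MS:
            return packet               # high, but not yet persistently so
        # Delay stayed above target for a full interval: drop this packet,
        # restart the timer, and send the packet behind it instead.
        self.dropped += 1
        self._above_since = now_ms
        return self._queue.popleft()[1] if self._queue else None


# Hypothetical load: 300 packets queued at t=0, one dequeue per ms from t=10.
q = SojournDropQueue()
for i in range(300):
    q.enqueue(0, i)
sent = [q.dequeue(t) for t in range(10, 211)]
print("dropped:", q.dropped)
```

Because the drop decision depends only on sojourn time, the same code behaves sensibly whether the buffer is small or very large, which is the property that makes this family of algorithms useful against bufferbloat.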
Notable examples of solutions targeting the endpoints are:
- The BBR congestion control algorithm for TCP.
- The Micro Transport Protocol employed by many BitTorrent clients.
- Techniques for using fewer connections, such as HTTP pipelining or HTTP/2 instead of the plain HTTP protocol.[9]
The problem may also be mitigated by reducing the buffer size in the OS[9] and in network hardware. However, buffer size is often not configurable, and the optimal size depends on the line rate, which may differ between destinations.
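Why a single fixed buffer size cannot suit every line rate can be seen from a short calculation. The function name, the 256 kB buffer, and the three rates are hypothetical values chosen for illustration.

```python
def full_buffer_delay_ms(buffer_bytes, line_rate_bps):
    """Worst-case queueing delay in ms when the buffer is completely full."""
    return buffer_bytes * 8 / line_rate_bps * 1000


BUFFER_BYTES = 256_000  # one fixed buffer size

for rate_bps in (1_000_000, 10_000_000, 100_000_000):
    delay = full_buffer_delay_ms(BUFFER_BYTES, rate_bps)
    print(f"{rate_bps // 1_000_000:>3} Mbit/s: {delay:.2f} ms")
```

With these numbers the same buffer adds about 20 ms at 100 Mbit/s but about 2 seconds at 1 Mbit/s, so a size that is harmless on a fast path is bloated on a slow one.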
DiffServ neither solves nor avoids the bufferbloat problem, since all packets are affected once the problem occurs.[19]
References
- "On Packet Switches With Infinite Storage". December 31, 1985.
- van Beijnum, Iljitsch (January 7, 2011). "Understanding Bufferbloat and the Network Buffer Arms Race". Ars Technica. Retrieved November 12, 2011.
- Guido Appenzeller; Isaac Keslassy; Nick McKeown (2004). "Sizing Router Buffers" (PDF). ACM SIGCOMM. ACM. Retrieved October 15, 2013.
- Nichols, Kathleen; Jacobson, Van (May 6, 2012). "Controlling Queue Delay". ACM Queue. ACM Publishing. Retrieved September 27, 2013.
- Gettys, Jim (May–June 2011). "Bufferbloat: Dark Buffers in the Internet". IEEE Internet Computing. IEEE. pp. 95–96. doi:10.1109/MIC.2011.56. Archived from the original on October 12, 2012. Retrieved February 20, 2012.
- "traceroute(8) – Linux man page". die.net. Retrieved September 27, 2013.
- Jacobson, Van; Karels, MJ (1988). "Congestion avoidance and control" (PDF). ACM SIGCOMM Computer Communication Review. 18 (4). Archived from the original (PDF) on June 22, 2004.
- "Technical Introduction to Bufferbloat". Bufferbloat.net. Retrieved September 27, 2013.
- Gettys, Jim; Nichols, Kathleen (January 2012). "Bufferbloat: Dark Buffers in the Internet". Communications of the ACM. 55 (1). ACM: 57–65. doi:10.1145/2063176.2063196. Retrieved February 28, 2012.
- "Speed test - how fast is your internet?". dslreports.com. Retrieved October 26, 2017.
- "ICSI Netalyzr". berkeley.edu. Archived from the original on April 7, 2019. Retrieved January 30, 2015.
- "Understanding your Netalyzr results". Retrieved October 26, 2017.
- "Tests for Bufferbloat". bufferbloat.net. Retrieved October 26, 2017.
- "IETF AQM working group". ietf.org. Retrieved October 26, 2017.
- Pan, Rong; Natarajan, Preethi; Piglione, Chiara; Prabhu, Mythili; Subramanian, Vijay; Baker, Fred; VerSteeg, Bill (2013). PIE: A Lightweight Control Scheme To Address the Bufferbloat Problem. 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR). IEEE. doi:10.1109/HPSR.2013.6602305.
- Høiland-Jørgensen, Toke; McKenney, Paul; Taht, Dave; Gettys, Jim; Dumazet, Eric (March 18, 2016). "The FlowQueue-CoDel Packet Scheduler and Active Queue Management Algorithm". Retrieved September 28, 2017.
- "DOCSIS "Upstream Buffer Control" feature". CableLabs. pp. 554–556. Retrieved August 9, 2012.
- Høiland-Jørgensen, Toke; Kazior, Michał; Täht, Dave; Hurtig, Per; Brunstrom, Anna (2017). Ending the Anomaly: Achieving Low Latency and Airtime Fairness in WiFi. 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX - The Advanced Computing Systems Association. pp. 139–151. ISBN 978-1-931971-38-6. Retrieved September 28, 2017.
- Hein, Mathias. "Bufferbloat » ADMIN Magazine". ADMIN Magazine. Retrieved June 11, 2020.
External links
- BufferBloat: What's Wrong with the Internet? A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys
- Google Tech Talk on YouTube April, 2011, by Jim Gettys, introduction by Vint Cerf
- Bufferbloat: Dark Buffers in the Internet — Demonstrations Only on YouTube April, 2011, by Jim Gettys, introduction by Vint Cerf
- Bufferbloat: Dark Buffers in the Internet — Demonstrations and Discussions on YouTube 21 minute demonstration and explanation of typical broadband bufferbloat
- LACNIC - BufferBloat on YouTube May 2012, by Fred Baker (former IETF chair) in Spanish, English slides available
- TSO sizing and the FQ scheduler (Jonathan Corbet, LWN.net)