AQM Working Group D. Täht
Internet-Draft Teklibre
Intended status: Informational Feburary 2014
Expires: August 24, 2014

Implementing comprehensive queue management on home routers
draft-taht-bcp-queuemanagement-homerouter-00

Abstract

This memo presents techniques for utilizing "Flow Queuing" and the Codel AQM algorithm together on home gateways.

On bandwith constrained home gateways the need for some form of rate limitation and packet scheduling has been embedded in many devices.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 24, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

This document presents best current practices for best combining Flow Queuing, with AQM, traffic shaping and packet prioritization on Internet edge CPE and head ends.

It does not address queue management problems on wireless technologies.

While the new single queued and latency sensitive AQM systems PIE, CODEL [CODEL2012] offer tremendous benefits standalone, they are insufficient to meet all needs on an edge gateway, which has additional needs to serve:

End to end techniques, such as [SABRE], and LEDBAT [RFC6817] offer at best a reduction of queue latency below 100ms, leaving a side trip of halfway around the planet, under load, on an edge gateway.

Combining better packet scheduling, AQM and prioritization techniques leads to an enormous reduction in induced latency under load, to below 5ms for queue building flows, and and nearly 0 for non-queue building flows.

Large network bursts caused by IW10, TCP offloads, 802.11n aggregation, and misbehaved applications are all mitigated by using flow queueing techniques, making them nearly invisible to other network applications.

2. TAHT14

This memo does not deprecate [RFC4594]. However we note that humans slicing and dicing up bandwidth along 64 different options in general has un-expected side effects, and it seems better to do better mixing of flows, more tightly control latency, and allow for group congestion control than allow for extensive classification. Three to five bands of traffic types are enough.

3. Inbound and Outbound Queue Management

Much work [CABLELABS12], [CABLELABS13], has focused on reducing the queue length on outbound flows from the CPE. However a large portion of the edge gateway bufferbloat problem occurs with dramatic overbuffering coming from the DSLAM, CMTS, and gpon host device.

Since the publication of BROADBAND09, DSL devices have largely been shown to have reasonable ingress buffering, and some form of AQM or packet management on egress.

This is not the case on VERIZON and COMCAST, where downstream buffering has been measured at 300ms and 1800ms respectively.

Therefore, applying smart queuing techniques on both inbound and outbound traffic is both desirable and necessary.

4. Examples of deployed Smart Queueing Systems

4.1. FREE.FR Queue Management system

The French ISP "Free.fr", adopted fq-codel as their default Smart Queue Management system for their Revolution "V6" CPE product in August, 2012, replacing their previous SFQ based system.

Their system uses 3 bands of classification and priority, utilizing DRR in the second 2 bands to avoid starvation of the background queue, and fq-codel on all 3 bands to break up flows and manage queue length.

The first queue is strictly admission controlled, and can starve other queues if needed.

prio queue 2 bands, priomap 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
|
|= fq-codel (Strict priority admission controlled)
|
|= drr (80% 20%)
     |
     |= fq-codel (best effort)
     |
     |= fq-codel (background)

Figure 1: Description of Free.fr SQM system

Rate limiting is provided by the native line rate of their DSL devices, and their device drivers are tuned to create no more than 1ms of native latency in the hardware on egress from the home gateway device.

ECN is enabled on egress from the CPE.

FQ-Codel is set to quantum of 300, giving priority to smaller packets, while not overly starving larger ones.

The FQ-codel "target" parameter is increased from the default of 5ms to a formula relative to the bandwidth on egress:

delay=$((1500*5/((bw_up/8))))
[ "$delay" -lt 5 ] && delay=5

There is no (despite a desire for such) queue management system presently in place for ingress as the DSLAM vendor has no support for it. And: rate limiting inbound on the CPE cannot currently account for the non-rate limited TV broadcast streams multiplexed on the same virtual circuit.

Despite extensive benchmarking #FREEBENCHMARK, we've been unable to come up with a better system.

4.2. CeroWrt SQM system

The principal researh tool of the Bufferbloat effort has been poured into the CeroWrt SQM system, which, while defaulting to sensible values, can also be configured to use variants of codel, pie, and SFQ for experimentation.

Emulations of typical byte-limited queues of DSL and Cable are also available, as well as naive head and tail drop fifo Qdiscs.

It has been ported to mainline Linux variants and multiple embedded Linux systems.

There are two main versions, one that exercises a single band queuing system (with variants of codel, fq-codel, PIE, SFQ, SFQRED, DRR, and RED) and a simple 3 band system modeled on the extremely successful but obsolete [wondershaper].

The simplest system is useful primarily as a test of new queuing approaches, and as a reference to compare against the multi-band approaches.

4.3. Example #2 Simple

Simple.qos is a three band system that can use diffserv and simple prioritization to give or remove priority to certain kinds of identifiable flows.

It gives priority to CPE generated DNS and NTP packets, respects a few diffserv markings, and deprioritizes CS1.

It is not feature-competitive with OpenWrt's qos-scripts, which have (for example) l7 inspection to find common torrent-like protocols, and a gui with lots of knobs to control that aspect.

It does, however, work correctly with ipv6 and IPv4, which most shapers today do not do.

HTB queue 3 bands
|
|= fq-codel (Strict priority max 30% admission controlled)
|
|= fq-codel (best effort)
|
|= fq-codel (background max 30%)

Figure 2: Description of CeroWrt 3 band SQM system

The first band is strictly admission controlled, and is limited to DNS and NTP queries emitted directly from the local router's DNS and NTP servers, and packets marked TOS-immediate.

Optionally the user can specify additional devices, protocols or diffserv classifications to fit into this band (or any other band).

The priority band is limited to a maximum of 30% of all bandwidth by default.

The second band (Best Effort) can absorb all bandwidth available on the system.

The third band (Background) is limited to a minimum of 30%, due to observing many mis-classified flows in the field. Experiments with a maximum of 5% starved many common flow types.

By default ECN is enabled on ingress and disabled on egress. It is recomended that ECN be enabled when using fq_codel or pie at bandwidths above 10Mbit.

FQ-Codel uses a quantum of 300, giving some priority to smaller packets.

There is extensive support in the HTB-based rate limiter for correctly estimating bandwidth for various forms of (mostly xDSL) based framing.

TBF + DRR could be substituted for HTB.

4.4. Example #3 StreamBoost(tm)

Streamboost(tm) implements a 5 band fq-codel based system, which does deep packet inspection using a regularly updated classification table (primarily for games), and allows for reservation of bandwidth on a per device basis.

4.5. Other implementations

TBD (1/d)

5. Overall Summary of recomendations

  • Queue-building flows MUST be broken up into packets and mixed with other flows.
  • Local latency sensitive traffic (dns, ntp) SHOULD be in the priority queue.
  • No queue should have starvation possibilities
  • ICMP SHOULD be moved to the background queue.
  • Classification can help.

6. Known Problems with these approaches

6.1. LEDBAT

It has been shown [YIXI2012] that any form of AQM or Fair Queue-ing induces reprioritization in low priority congestion control algorithms such as LEDBAT, turning all known forms of delay based congestion control back into loss-based or ecn-based congestion control. However this is not much of a problem as queue delays under fq_codel rarely exceed 20ms, and are usually invisible, rather than the target 100ms of LEDBAT.

Verses web traffic, fq-codel shoots at many LEDBAT streams, forcing them to back off into their slower slow start phase, so web and LEDBAT traffic tend to co-exist nicely.

Still, torrent flows SHOULD be depriorized via classification, and torrent clients SHOULD use diffserv markings and use a more limited number of flows (10) rather than the default 50.

It is possible to do so using UPNP rules on inbound and outbound IPv4 traffic.

6.2. Single large flows (DASH) & VPNS

The hashing logic contains means to disambiguate GRE based flows. FIXME, write more about the flaws with VPNS and how to fix them.

6.3. Bandwidths below 4Mbit

The combination of big-iron server side techniques like IW10, and large MTUs (1500 bytes), make finding any general purpose solution difficult at uplink bandwidths less than 4Mbit.

While the above techniques have been applied at rates as low as 128kbit, some things simply cannot be adaquately compensated for. Increasing the codel target does help.

6.4. CPU usage

The FQ-Codel and DRR algorithms by themselves incur very little CPU impact. However the rate limiter algorithms (HTB and HFSC) can and do occupy a great deal of CPU, particularly at higher bandwidths, on current hardware.

More research into effective rate limitation techniques is desireable.

7. Deprecated techniques

7.1. Policing

Policing as presently used (WSHAPE,CISCO) involves setting a fixed buffer limit to inbound streams. Although this technique is simple and easy to use, it is insensitive to latency and to bandwidth changes, and can do harm WONDERSHAPERDIE, when used inappropriately.

Conventional policing is deprecated in favor of applying Smart Queueing techniques also to inbound streams, which has a single core parameter: allowable latency.

7.2. TCP ACK prioritization

TCP ACK prioritization is deprecated.

With the movement of major protocols from TCP to others such as QUIC, SPDY, µTP, and wide adoption of new TCP options (timestamping, TFO), explicit optimizations for recognisable TCP segments or acknowledgements is not advised.

7.3. Port Reservations UPnP and PCP

8. Dependencies

The core fq-codel and codel algorthms have been in the Linux kernel since version 3.5, and have been backported to kernel versions as old as 2.6.32. Improvements to the fq-codel jhash algoritm landed in 3.8, supporting better 6in4, 6rd, and GRE hashing. Improvements to HTB landed in 3.11.

9. Future work

Combination with weighting techniques (QFQ), lighter weight shaping techniques, and full diffserv classification are only beginning to be explored. Full, rather than stochastic, fair queueing techniques are being explored. [SCH_FQ].

Refinements to the Codel [NFQCODEL],[SFQCODEL] algorithm are possible. The combination of Flow Queueing with (PIE) or RED (SFQRED) has not been fully explored.

10. Definition of Terms

10.1. Rate Limiting

Using algorithms such as TBF, HTB or HFSC.

10.2. Classification

Deep Packet Inspection (DPI), per-port analysis, and utilization of the Diffserv field all count as "Classification".

10.3. Policing

10.4. Shaping

10.5. Active Queue Management

[RFC2309] is a specification for utilizing active queue management.

bufferbloat

10.6. Smart Queue Management

11. Methods of flow queueing

11.1. 1/F(low)

1/F flow queuing tries to provide fairness by allocating an equal portion of the bandwidth to each flow on the network.

11.2. 1/D(evice)

1/D flow queueing tries to provide fairness by allocating an equal portion of the bandwidth to each queue generating device on the network.

Unfortunately in an IPv6 world, devices can contain (far) more than a single ip address, and to correctly optimize on a per device basis requires co-joining all information about it's IPv4 and IPv6 address set into a single lookup table.

11.3. 1/U(ser)

1/U flow queueing attempts to provide fairness over users.

11.4. Mixed

It is anticipated that most deployed systems will be a hybrid of all techniques.

12. Security Considerations

This document raises no security issues. The hash values of fq_codel are permuted at invocation time and are difficult to guess. Deprioritizing ICMP slightly, and applying flow queueing techniques, mitigates some forms of attack (naive udp and and ping floods), while rendering more difficult disabling a system entirely with multiple forms of attack.

13. IANA Considerations

This document has no actions for IANA.

14. Acknowlegements

The members of Bufferbloat.net, OpenWrt, Eric Dumazet, Kathy Nichols, Van Jacobson, Maxime Bizon.

15. References

15.1. Normative References

[RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", RFC 896, January 1984.
[RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 970, December 1985.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J. and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.
[RFC4594] Babiarz, J., Chan, K. and F. Baker, "Configuration Guidelines for DiffServ Service Classes", RFC 4594, August 2006.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J. and M. Kuehlewind, "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, December 2012.
[TSV2011] Gettys, J., "Bufferbloat: Dark Buffers in the internet, IETF 80", March 2011.

15.2. Informative References

[BB2011] Nichols, K. and J. Gettys, "Bufferbloat: Dark Buffers in the Internet, Communications of the ACM 9(11),pp 57-65", May 2011.
[BMPFQ] Suter, B., "Buffer Management Schemes for Supporting TCP in Gigabit Routers with Per-flow Queueing", June 1999.
[CODEL2012] Nichols, K. and V. Jacobson, "Controlling Queue Delay", July 2012.
[DRR] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing Using Deficit Round Robin", June 1996.
[REDL1998] Jacobson, V., Nichols, K. and K. Poduri, "RED in a different Light", September 1999.
[SABRE] Mansy, A., Steeg, B. and M. Ammar, "SABRE: A Client based technique for mitigating the bufferbloat effect of adaptive video flows", September 2013.
[SFQ] McKenney, P., "Stochastic Fairness Queuing", September 1990.
[VANQ2006] Jacobson, V., "A Rant on Queues", July 2006.
[YIXI2012] Gong, Y., Rossi, D., Testa, C., Valenti, S. and D. Täht, "Fighting the Bufferbloat: On the coexistences of AQM and low priority congestion control", April 2012.

Author's Address

Dave Täht Teklibre 2104 W First street Apt 2002 FT Myers, FL 33901 USA EMail: d@taht.net URI: http://www.teklibre.com/