As in the original paper the second goal is not discussed here. This behavior
is affected by used scheduler algorithm. Most CBQ implementations use deficit
round-robin (DRR) [4] in which excess bandwidth is distributed according to the
weight parameters specified. I will address related issues later in this paper.
In [1], formal guidelines are described which implement the goals
above. They are repeated here for your convenience:
A class can continue unregulated if one of the following conditions hold:
(see [1] for terms' declarations).
The second rule ensures that we don't borrow bandwidth from parent class while there is child which itself can send without borrowing.
The leaf can continue unregulated only if there exists no overlimit ancestor class at level L.
In the case where all leaves are overlimit (step 1 didn't yield a packet) we
increment L and try again until we either get packet or L is bigger than the
level of root class.
It is not hard to prove correctness of algorithm above.
For L=1 the algorithm behaves exactly as first rule of formal guidelines. To
agree with the second rule we have to assure that we don't allow a leaf to
borrow from class at level N if other leaf exists which has demand and can
borrow from class at lower level. In iterative approach we always test lower
levels before higher ones thus we satisfy both formal sharing rules.
The algorithm can be implemented in faster way. During the first step (with L=1) we can remember the lowest level we can send at and use it directly in the second pass (if the first one didn't yield us a packet). I call this two phase approach.
The key idea is to maintain one sorted list of classes per level. Each list
contains only active classes from single level and these are sorted by
time which they become underlimit at. When we need to know top level then we can
simply look heads of the sorted lists up starting from lowest level. The first
level we find underlimit class at becomes new top level.
The ceiling will
make it a bit more complicated but still it is not hard to prove correctness of
the idea above.
Note that this idea was not tested and should be treated as such.
I replaced the estimator by a classical token bucket one [2]. It is simple to
setup as the most of network administrators have knowledge about it. It uses
only two parameters (rate and burst) of which both have unambiguous physical
meaning.
In my feeling the major contribution is fact that token bucket
algorithm is simple to implement on systems with finite nonzero granularity of
timer and it doesn't depent on hardware rate of network device.
The token bucket algorithm has to remember not only positive tokens (burst)
but also negative one at least for non leaf classes. It is because parent class
can be overlimit while one of its children is underlimit. Sending from the child
causes overlimit parent to be yet more overlimit thus forcing tokens to be
negative.
To assure that the overal goal of scheduler is still satisfied,
inner classes must allow for negative tokens up to sum of bursts of all
descendant classes. This limit can however be set to arbitrary large value, say
2 minutes because it is unlikely that user will use bursts bigger than several
seconds. Bursts are in this case specified in the time domain, whereas they are
usually expressed in bytes. These two views can be converted bidirectionaly
using the token bucket's rate).
Probably you see the advantages of this approach. One can guarantee 1Mbit to an agency and allow it to use up to 2Mbit (ceil) if the link is idle. From the implementation side it is trivial. Just another estimator test in underlimit parent lookup loop.
The simple solution is to change DRR to use separate deficit and active class
pointer for each borrowing level. Because active pointer is private for given
borrowing level then class borrowing from other level (and possibly forcing us
to be regulated) can't mess our DRR position.
Tests in my implementation
shows rate ratios improvement ranging from 5 to 50% when implementing this
simple solution.
Use of clock based scheduler like WF2Q with link sharing could be helpful to better isolate interactive traffic and provide better delay bounds.
[1] Link-sharing and Resource Management models for Packet Networks, Sally
Floyd and Van Jacobson, 1995
[2] Token Bucket ???? - can't remember the
paper :(
[3] HTB for Linux, http://luxik.cdi.cz/~devik/qos/htb
[4]
Efficient Fair Queuing using Deficit Round Robin, M. Shreedhar and G. Varghese
[5] WF2Q: Worst-case Fair Weighted Fair Queuing, J.C.R. Bennet and Hui Zhang