[tor-dev] Bandwidth scanner: request for feedback
teor
2018-08-27 03:50:01 UTC
Hi,

Juga and pastly have been working hard on sbws.

Sbws' results are now similar to torflow's results:
https://trac.torproject.org/projects/tor/attachment/ticket/27135/20180826_081902.png

Now that sbws is close to torflow, we want some feedback on its
design. We’ll work on the design at the tor meeting in September.

Please feel free to give feedback by email, or on the tickets:


What happens when sbws doesn't match torflow?

https://trac.torproject.org/projects/tor/ticket/27339

We suggest this rule:

If an sbws deployment is within X% of an existing bandwidth
authority, sbws is ok. (The total consensus weights of the
existing bandwidth authorities are within 25% - 50% of each
other, see #25459.)
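
A minimal sketch of this rule, assuming "within X%" means the relative
difference between total consensus weights, and that matching any one
existing bandwidth authority is enough. The function names and the
sample totals are hypothetical, not taken from sbws:

```python
def within_percent(candidate_total, reference_total, max_percent):
    """Return True if candidate_total differs from reference_total
    by at most max_percent percent of reference_total."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    difference = abs(candidate_total - reference_total) / reference_total * 100.0
    return difference <= max_percent


def sbws_is_ok(sbws_total, bwauth_totals, max_percent):
    """Assumed reading of the rule: sbws is ok if its total is within
    max_percent of at least one existing bandwidth authority's total."""
    return any(
        within_percent(sbws_total, total, max_percent)
        for total in bwauth_totals
    )


if __name__ == "__main__":
    # Made-up totals, for illustration only.
    print(sbws_is_ok(110, [100, 200], max_percent=25))
```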


How long should sbws keep relay bandwidths?

https://trac.torproject.org/projects/tor/ticket/27338

Torflow uses the latest self-reported relay observed bandwidth
and bandwidth rate.

Torflow uses a complex feedback loop for measured bandwidths.
We think sbws can use a simple average or exponentially
decaying weighted average.
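
To make the two options concrete, here is a rough sketch of a simple
average and an exponentially decaying weighted average over a relay's
measurements. The decay factor and the oldest-to-newest ordering are
assumptions for illustration, not sbws settings:

```python
def simple_average(measurements):
    """Plain mean of all stored measurements."""
    return sum(measurements) / len(measurements)


def decaying_average(measurements, decay=0.5):
    """Weighted mean where each step back in time multiplies the
    weight by `decay`. `measurements` is ordered oldest to newest,
    so the newest measurement has weight 1."""
    newest_index = len(measurements) - 1
    weights = [decay ** (newest_index - i) for i in range(len(measurements))]
    weighted_sum = sum(w * m for w, m in zip(weights, measurements))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    history = [100, 200, 300]  # made-up bandwidths, oldest first
    print(simple_average(history))
    print(decaying_average(history, decay=0.5))
```

With decay=1.0 the two functions agree; smaller values make the result
follow recent measurements more closely.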


How should we scale sbws consensus weights?

https://trac.torproject.org/projects/tor/ticket/27340

If sbws' total consensus weight is different to torflow's total
consensus weight, how should we scale sbws?

(The weights might differ because the measurement method is
different, or because scanners and servers are in different
locations.)

In the bandwidth file spec, we suggest linear scaling.
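
As a sketch of what linear scaling could look like, assuming we scale
every sbws weight by one common factor so that the sbws total matches a
target total (for example, torflow's). The relay names and numbers are
made up:

```python
def scale_linearly(sbws_weights, target_total):
    """Multiply every weight by the same factor so the scaled
    weights sum to target_total. Ratios between relays are kept."""
    sbws_total = sum(sbws_weights.values())
    if sbws_total <= 0:
        raise ValueError("sbws total weight must be positive")
    factor = target_total / sbws_total
    return {relay: weight * factor for relay, weight in sbws_weights.items()}


if __name__ == "__main__":
    weights = {"relayA": 100, "relayB": 300}  # hypothetical relays
    print(scale_linearly(weights, target_total=800))
```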


How should we round sbws consensus weights?

https://trac.torproject.org/projects/tor/ticket/27337

Torflow currently rounds to 3 significant figures (a maximum rounding
error of 0.5%). But I suggest 2 significant figures for sbws (a maximum
rounding error of 5%), because:
- tor has a daily usage cycle that varies by 10% - 20%
- existing bandwidth authorities vary by 25% - 50%

Proposal 276 contains a slightly more complicated rounding algorithm,
which we may want to implement in sbws or in tor:

https://gitweb.torproject.org/torspec.git/tree/proposals/276-lower-bw-granularity.txt
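
The error bounds above can be checked with a short sketch. It assumes
integer weights and round-half-up, and it does not implement the more
complicated algorithm from Proposal 276:

```python
def round_sig_figs(weight, sig_figs):
    """Round a positive integer weight to sig_figs significant
    figures, rounding halves up. Non-positive weights become 0."""
    weight = int(weight)
    if weight <= 0:
        return 0
    digits = len(str(weight))
    if digits <= sig_figs:
        return weight  # already short enough, nothing to round
    step = 10 ** (digits - sig_figs)
    return ((weight + step // 2) // step) * step


def relative_error(weight, sig_figs):
    """Relative change introduced by rounding a positive weight."""
    return abs(round_sig_figs(weight, sig_figs) - weight) / weight


if __name__ == "__main__":
    print(round_sig_figs(12345, 3), round_sig_figs(12345, 2))
    print(max(relative_error(w, 3) for w in range(1, 100000)))
    print(max(relative_error(w, 2) for w in range(1, 100000)))
```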


Does sbws need a maximum consensus weight fraction?

https://trac.torproject.org/projects/tor/ticket/27336

Torflow uses 5%, but I suggest 1%, because the largest relay right
now is only 0.5%.
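
One possible way to apply a maximum fraction, sketched under the
assumption that the cap is computed once from the uncapped total (so a
capped relay's share of the new, smaller total can end up slightly above
the limit). The names and numbers are hypothetical:

```python
def cap_weights(weights, max_fraction):
    """Limit each relay's weight to max_fraction of the total of
    the original (uncapped) weights."""
    total = sum(weights.values())
    cap = total * max_fraction
    return {relay: min(weight, cap) for relay, weight in weights.items()}


if __name__ == "__main__":
    weights = {"big": 900, "small1": 50, "small2": 50}  # made-up values
    print(cap_weights(weights, max_fraction=0.05))
```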

T

--
teor

Please reply @torproject.org
New subkeys 1 July 2018
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
----------------------------------------------------------------------
Mike Perry
2018-08-29 21:11:57 UTC
Post by teor
Hi,
Juga and pastly have been working hard on sbws.
https://trac.torproject.org/projects/tor/attachment/ticket/27135/20180826_081902.png
Congratulations, Juga and pastly!
Post by teor
Now that sbws is close to torflow, we want some feedback on its
design. We’ll work on the design at the tor meeting in September.
What happens when sbws doesn't match torflow?
https://trac.torproject.org/projects/tor/ticket/27339
If an sbws deployment is within X% of an existing bandwidth
authority, sbws is ok. (The total consensus weights of the
existing bandwidth authorities are within 25% - 50% of each
other, see #25459.)
I would like an additional criterion for when we finally replace torflow
with sbws.

Ideally, I would like us to perform A/B experiments to ensure that our
performance metrics do not degrade in terms of average *or* quartile
range/performance variance. (Ie: alternate torflow results for a week vs
sbws for a week, and repeat for a few weeks). I realize this might be
complicated for dirauth operators, though. Can we make it easier
somehow, so that it is easy to switch which result files they are voting
with?

If we can't do this, at minimum, we should definitely watch the change
in our average and quartile variance performance metrics when we first
switch to sbws.
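
A rough sketch of the comparison described here, assuming the metric is
a list of download times in seconds (lower is better) collected during
a torflow period and an sbws period. The function names, the tolerance
parameter, and the sample data are all hypothetical:

```python
import statistics


def summarize(download_times):
    """Return the mean and the interquartile range of the samples."""
    q1, _median, q3 = statistics.quantiles(download_times, n=4)
    return {"mean": statistics.mean(download_times), "iqr": q3 - q1}


def degraded(baseline_times, candidate_times, tolerance=0.0):
    """True if the candidate period is worse than the baseline in
    either the mean or the interquartile range, beyond `tolerance`
    (a fraction, e.g. 0.05 allows a 5% increase)."""
    base = summarize(baseline_times)
    cand = summarize(candidate_times)
    worse_mean = cand["mean"] > base["mean"] * (1 + tolerance)
    worse_spread = cand["iqr"] > base["iqr"] * (1 + tolerance)
    return worse_mean or worse_spread


if __name__ == "__main__":
    torflow_week = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up samples
    sbws_week = [0.0, 1.0, 3.0, 5.0, 6.0]     # same mean, wider spread
    print(summarize(torflow_week), summarize(sbws_week))
    print(degraded(torflow_week, sbws_week))
```

The second period has the same mean but a wider spread, so a check on
the mean alone would miss the change.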

Additionally, if we ever change how sbws behaves to be different than
torflow, I would like sbws to have a well-defined load balancing
equilibrium goal, and I would like us to not change this load balancing
equilibrium goal unless we perform A/B testing and compare the average
and variance of our performance metrics.

I'll explain what I mean by "load balancing equilibrium goal" below,
when I try to explain the PID mechanism again.
Post by teor
How long should sbws keep relay bandwidths?
https://trac.torproject.org/projects/tor/ticket/27338
Torflow uses the latest self-reported relay observed bandwidth
and bandwidth rate.
Torflow uses a complex feedback loop for measured bandwidths.
We think sbws can use a simple average or exponentially
decaying weighted average.
As I said in
https://lists.torproject.org/pipermail/tor-dev/2017-December/012714.html,
this feedback loop is disabled. I know you don't believe that the
bandwidth auth spec is accurate, but I'm telling you it is. There's
just a lot going on there because the bwauths have required a long
history of experimentation to get to where they are now, just as sbws is
now encountering with trying to make various measurement and scaling
decisions. (As you A/B test ways to improve performance on the live
network, you tend to accumulate a lot of options for different ways of
doing things).

The point of the PID control stuff was to formalize the type of load
balancing equilibrium goal that the bandwidth auths are using, and to
experiment with convergence on a specific target load balancing
equilibrium point (where that target equilibrium point is "all relays
have the same spare capacity for one additional client stream"). The
problem was that when you only use this criterion, faster relays run out
of CPU, memory, or sockets before this criterion was satisfied for them.
Hence all of the circuit failure reason statistics in the code base (to
try to back off on PID control if we hit a different limiting factor
other than bandwidth).

Unfortunately, Tor does not provide enough error code feedback to
reliably determine if a relay is low on memory, sockets, or CPU. Funding
ended for the bandwidth auths before we could implement proper overload
error feedback in Tor, and we got funding for me to work on Tor Browser
instead.

With the parameters in the current consensus (currently bwauthpid=1, and
no others), the PID control is operating as only "Proportional control":
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n476
(The default values for K_i and K_d are 0, as per Section 3.6 of the
spec).

In section 3.1 of the spec, I have a proof that using "Proportional
control" (ie PID control with no I or D) is equivalent to what we were
doing in Section 2.2. This means that Section 2.2 does describe what we
are doing now:
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n390
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n298

I left the PID code itself enabled (but in "Proportional-only" mode)
because it is cleaner, and it makes it formally clear that the bandwidth
authorities are actually measuring the difference in the ability of
relays to carry additional client traffic, and correcting for that
difference by adjusting weights in proportion to that difference. I
naively assumed that eventually Tor would get funding to implement
better feedback for CPU, memory, and socket overload. That was almost 10
years ago.
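
The equivalence can be illustrated with a small sketch. The error term
and update rule below are simplified assumptions chosen to show the
algebra; they leave out filtering and other details, and README.spec.txt
remains the reference. All names are hypothetical:

```python
def ratio_weight(base_bw, stream_bw, avg_stream_bw):
    """Ratio-style scaling: multiply a relay's base bandwidth by its
    measured stream bandwidth relative to the network average."""
    return base_bw * (stream_bw / avg_stream_bw)


def proportional_weight(base_bw, stream_bw, avg_stream_bw, k_p=1.0):
    """Proportional-only control: adjust the base bandwidth in
    proportion to the relative error, with no I or D terms."""
    error = (stream_bw - avg_stream_bw) / avg_stream_bw
    return base_bw + k_p * base_bw * error


if __name__ == "__main__":
    # Made-up values: a relay measured 20% faster than average.
    print(ratio_weight(1000, 120, 100))
    print(proportional_weight(1000, 120, 100, k_p=1.0))
```

With k_p = 1, base_bw + base_bw * (s - a) / a simplifies to
base_bw * s / a, so both functions give the same result.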

(Incidentally, the trickiest non-bandwidth overload condition to report
was memory shortage -- a problem that effectively goes away in a
datagram Tor world with bounded queue lengths. In fact, CPU limits
could also be implemented as a congestion drop condition in a datagram
scenario, and measured indirectly via relay throughput, leaving only
socket limits.)


I'm glad that we are exploring load balancing again, and with a modern,
simpler, and well-tested code base. That's all good. But as you make
choices about how to load balance, please have a specific goal as to
what target load balancing equilibrium point you're actually going for.

(The reason why I did not consider raw measured stream bandwidth to be a
valid equilibrium is because it is *not* measuring the total capacity of
a relay, and it does not have an equilibrium point in terms of expected
client performance or overall load balancing. In order to use raw
measurements directly as a load balancing equilibrium point, you
actually need to measure total relay throughput in some way, such as
Peerflow did).
Post by teor
How should we scale sbws consensus weights?
https://trac.torproject.org/projects/tor/ticket/27340
If sbws' total consensus weight is different to torflow's total
consensus weight, how should we scale sbws?
(The weights might differ because the measurement method is
different, or because scanners and servers are in different
locations.)
In the bandwidth file spec, we suggest linear scaling.
This seems reasonable.

I am wary of the idea of trying to use some kind of ideal distribution
for relays, that you mention in #27135. That is not something that you
can enforce in measurement without causing performance variance to
suffer tremendously. It can only be enforced by a consensus cutoff
threshold and/or relay operator incentive mechanisms.

I believe quite strongly that even if the Tor network gets faster on
average, if this comes at the cost of increased performance variance,
user experience and perceived speed of Tor will be much worse. There's
nothing more annoying than a system that is *usually* fast enough to do
what you need it to do, but fails to be fast enough for that activity at
unpredictable times.
Post by teor
How should we round sbws consensus weights?
https://trac.torproject.org/projects/tor/ticket/27337
Torflow currently rounds to 3 significant figures (which is a maximum
of 0.5%). But I suggest 2 significant figures for sbws (or max 5%),
- tor has a daily usage cycle that varies by 10% - 20%
- existing bandwidth authorities vary by 25% - 50%
Proposal 276 contains a slightly more complicated rounding algorithm,
https://gitweb.torproject.org/torspec.git/tree/proposals/276-lower-bw-granularity.txt
If we can measure relays frequently enough such that we can accurately
report the effects of Tor's daily usage cycle and adjust our weights
accordingly, then I think that retaining the ability to represent this
variance is worth the overhead.

Again, this comes back to my belief that performance variance is
actually the major performance problem facing Tor right now.

On the other hand, if we cannot measure accurately or often enough for
this to matter, then it doesn't matter.

But a successor to sbws might, if we can manage to build one sooner than
a decade from now, so it would be wise not to bake this sig fig limit
into our actual consensus format.
Post by teor
Does sbws need a maximum consensus weight fraction?
https://trac.torproject.org/projects/tor/ticket/27336
Torflow uses 5%, but I suggest 1%, because the largest relay right
now is only 0.5%.
Sounds reasonable.

If we ever get working multi-core crypto+networking, this number will
change, though.
--
Mike Perry
Tom Ritter
2018-08-30 21:35:06 UTC
Post by Mike Perry
Ideally, I would like us to perform A/B experiments to ensure that our
performance metrics do not degrade in terms of average *or* quartile
range/performance variance. (Ie: alternate torflow results for a week vs
sbws for a week, and repeat for a few weeks). I realize this might be
complicated for dirauth operators, though. Can we make it easier
somehow, so that it is easy to switch which result files they are voting
with?
Having both voting files means running both scanners at the same time.
Depending on one's pipes, that might skew the results from the
scanners.

-tom
teor
2018-11-19 12:36:28 UTC
Hi,

We have deployed sbws on one bandwidth authority (longclaw).

Here's a request for additional feedback, and a progress update:


Request for Feedback: Relay Bandwidth Self-Tests

Torflow and sbws use relays' self-reported observed bandwidths for
load balancing. But relays can have really low bandwidths because
they're new, or due to random path selection.

In torflow, relays can get stuck in a low-bandwidth partition. sbws
doesn't have partitions. But in both systems, low bandwidths can
cause inaccurate or unstable load balancing.

Since torflow and sbws need accurate self-reported relay bandwidths,
some component of the Tor network needs to send enough bandwidth
through every relay.

Here are our current choices:

Tor relays can do a regular bandwidth self-test, so that their
first descriptor has an accurate bandwidth (up to some minimum). But
the current self-test is too small, and buggy.

sbws already sends bandwidth to all relays to measure them. sbws gets
accurate bandwidths for most relays within 2 weeks, but the fastest
relays can take a month to ramp up. (sbws starts measuring at the
median relay bandwidth, and can double every 5 days.)
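
For a feel of the numbers, here is a back-of-the-envelope sketch of the
ramp-up time. It assumes a relay's bandwidth doubles exactly once per
5-day period, starting from the median, which is a simplification of
what sbws actually does:

```python
def rampup_days(target_bw, median_bw, doubling_days=5):
    """Days until a bandwidth that starts at median_bw and doubles
    every doubling_days reaches at least target_bw."""
    if median_bw <= 0:
        raise ValueError("median_bw must be positive")
    days = 0
    current = median_bw
    while current < target_bw:
        current *= 2
        days += doubling_days
    return days


if __name__ == "__main__":
    # Hypothetical relays at 8x and 64x the median bandwidth.
    print(rampup_days(8, 1))
    print(rampup_days(64, 1))
```

Under these assumptions, a relay at 8 times the median needs 15 days,
and one at 64 times the median needs 30 days.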

Should we improve relay bandwidth self-tests? (#22453)
Or should we rely on sbws to create the bandwidths it needs?
What about test networks?

Should we make bandwidths grow faster in sbws?
Or is a ramp-up period of 2-5 weeks fast enough?

(We won't modify and re-deploy torflow.)


Progress Update
Post by Mike Perry
Post by teor
What happens when sbws doesn't match torflow?
https://trac.torproject.org/projects/tor/ticket/27339
If an sbws deployment is within X% of an existing bandwidth
authority, sbws is ok. (The total consensus weights of the
existing bandwidth authorities are within 25% - 50% of each
other, see #25459.)
We have successfully used this rule to discover and fix some bugs in
sbws.
Post by Mike Perry
I would like an additional criterion for when we finally replace torflow
with sbws.
Ideally, I would like us to perform A/B experiments to ensure that our
performance metrics do not degrade in terms of average *or* quartile
range/performance variance. (Ie: alternate torflow results for a week vs
sbws for a week, and repeat for a few weeks). I realize this might be
complicated for dirauth operators, though. Can we make it easier
somehow, so that it is easy to switch which result files they are voting
with?
We do not have the capacity to A/B test sbws and torflow.
(As far as I understand, we don't have enough people, and we don't have
enough servers.)
Post by Mike Perry
If we can't do this, at minimum, we should definitely watch the change
in our average and quartile variance performance metrics when we first
switch to sbws.
We deployed sbws on 1/6 bandwidth authorities, and the performance of
the network has been stable:
https://metrics.torproject.org/torperf.html?start=2018-01-21&end=2018-11-19&source=all&server=public&filesize=50kb

(The drop in performance at the start of the year was due to extra
network load.)
Post by Mike Perry
Additionally, if we ever change how sbws behaves to be different than
torflow, I would like sbws to have a well-defined load balancing
equilibrium goal, and I would like us to not change this load balancing
equilibrium goal unless we perform A/B testing and compare the average
and variance of our performance metrics.
I'll explain what I mean by "load balancing equilibrium goal" below,
when I try to explain the PID mechanism again.
sbws has adopted Torflow's load-balancing equilibrium goal.

Our priority is transitioning away from Torflow successfully.

We've deferred changes to the load-balancing goal until a later sbws
release. We may never make this change.
Post by Mike Perry
Post by teor
How long should sbws keep relay bandwidths?
https://trac.torproject.org/projects/tor/ticket/27338
Torflow uses the latest self-reported relay observed bandwidth
and bandwidth rate.
Torflow uses a complex feedback loop for measured bandwidths.
We think sbws can use a simple average or exponentially
decaying weighted average.
As I said in
https://lists.torproject.org/pipermail/tor-dev/2017-December/012714.html,
this feedback loop is disabled. I know you don't believe that the
bandwidth auth spec is accurate, but I'm telling you it is.
Improving bandwidth measurement has been one of the most difficult
things I have done with Tor.

You're right: I don't know if the Torflow spec is accurate, because I
often struggle to find the information I need in the spec.

That's not anyone's fault: it's a difficult and complex topic. But it
does mean that I need your help to answer some questions about Torflow.
Post by Mike Perry
The point of the PID control stuff was to formalize the type of load
balancing equilibrium goal that the bandwidth auths are using, and to
experiment with convergence on a specific target load balancing
equilibrium point (where that target equilibrium point is "all relays
have the same spare capacity for one additional client stream"). The
problem was that when you only use this criterion, faster relays run out
of CPU, memory, or sockets before this criterion was satisfied for them.
Hence all of the circuit failure reason statistics in the code base (to
try to back off on PID control if we hit a different limiting factor
other than bandwidth).
...
I'm glad that we are exploring load balancing again, and with a modern,
simpler, and well-tested code base. That's all good. But as you make
choices about how to load balance, please have a specific goal as to
what target load balancing equilibrium point you're actually going for.
sbws has adopted Torflow's goals.
Post by Mike Perry
Post by teor
How should we scale sbws consensus weights?
https://trac.torproject.org/projects/tor/ticket/27340
If sbws' total consensus weight is different to torflow's total
consensus weight, how should we scale sbws?
(The weights might differ because the measurement method is
different, or because scanners and servers are in different
locations.)
In the bandwidth file spec, we suggest linear scaling.
This seems reasonable.
Unfortunately, linear scaling did not work.

sbws now uses Torflow's scaling method, with relay observed bandwidths.
Post by Mike Perry
...
I believe quite strongly that even if the Tor network gets faster on
average, if this comes at the cost of increased performance variance,
user experience and perceived speed of Tor will be much worse. There's
nothing more annoying than a system that is *usually* fast enough to do
what you need it to do, but fails to be fast enough for that activity at
unpredictable times.
I agree. And I'm usually using Tor in high-latency locations, so I see
this variance every day.
Post by Mike Perry
Post by teor
How should we round sbws consensus weights?
https://trac.torproject.org/projects/tor/ticket/27337
Torflow currently rounds to 3 significant figures (which is a maximum
of 0.5%). But I suggest 2 significant figures for sbws (or max 5%),
- tor has a daily usage cycle that varies by 10% - 20%
- existing bandwidth authorities vary by 25% - 50%
Proposal 276 contains a slightly more complicated rounding algorithm,
https://gitweb.torproject.org/torspec.git/tree/proposals/276-lower-bw-granularity.txt
If we can measure relays frequently enough such that we can accurately
report the effects of Tor's daily usage cycle and adjust our weights
accordingly, then I think that retaining the ability to represent this
variance is worth the overhead.
Again, this comes back to my belief that performance variance is
actually the major performance problem facing Tor right now.
On the other hand, if we cannot measure accurately or often enough for
this to matter, then it doesn't matter.
I don't believe sbws can measure relays fast enough for it to matter.

sbws was rounding to the nearest 1000 kilobytes on 1/6 authorities, with
no discernible effect on network performance. We've fixed this bug, and
sbws will now round to 2 significant figures. (We haven't implemented the
extra last-digit rounding in prop276.)

If sbws can measure fast enough in future, we can modify it to report more
accurate bandwidths.
Post by Mike Perry
But a successor to sbws might, if we can manage to build one sooner than
a decade from now, so it would be wise not to bake this sig fig limit
into our actual consensus format.
Thanks, we won't modify tor to round bandwidths: that responsibility
belongs in the bandwidth measurement code.

But tor's consensus diffs and compression benefit from rounded relay
bandwidths, so any performance gain needs to be measured against an
increase in consensus download sizes.
Post by Mike Perry
Post by teor
Does sbws need a maximum consensus weight fraction?
https://trac.torproject.org/projects/tor/ticket/27336
Torflow uses 5%, but I suggest 1%, because the largest relay right
now is only 0.5%.
Sounds reasonable.
If we ever get working multi-core crypto+networking, this number will
change, though.
We went with 5%, to match Torflow.

T
Nick Mathewson
2018-11-20 19:06:31 UTC
Post by teor
Hi,
We have deployed sbws on one bandwidth authority (longclaw).
Request for Feedback: Relay Bandwidth Self-Tests
Torflow and sbws use relays' self-reported observed bandwidths for
load balancing. But relays can have really low bandwidths because
they're new, or due to random path selection.
In torflow, relays can get stuck in a low-bandwidth partition. sbws
doesn't have partitions. But in both systems, low bandwidths can
cause inaccurate or unstable load balancing.
Since torflow and sbws need accurate self-reported relay bandwidths,
some component of the Tor network needs to send enough bandwidth
through every relay.
Tor relays can do a regular bandwidth self-test, so that their
first descriptor has an accurate bandwidth (up to some minimum). But
the current self-test is too small, and buggy.
sbws already sends bandwidth to all relays to measure them. sbws gets
accurate bandwidths for most relays within 2 weeks, but the fastest
relays can take a month to ramp up. (sbws starts measuring at the
median relay bandwidth, and can double every 5 days.)
Should we improve relay bandwidth self-tests? (#22453)
Or should we rely on sbws to create the bandwidths it needs?
What about test networks?
Hi! I don't think I have the answers here, but maybe I can think
aloud in a useful way.

From my point of view, either of these is a fine idea, if it works.
We could decide based on a lot of factors, like:

* Which one is easier to do?
* Which creates the greater maintenance burden, moving forward?
* Which is more robust if something breaks in the future?
* Which consumes the most relay bandwidth?
* Which requires SBWS to use the most bandwidth?

Maybe if we had those figured out, we'd have a better time deciding.
Post by teor
Should we make bandwidths grow faster in sbws?
Or is a ramp-up period of 2-5 weeks fast enough?
I think that's fast enough, though I'm not sure. How does it compare
with the current average torflow ramp-up time?
Post by teor
(We won't modify and re-deploy torflow.)