Discussion:
[tor-dev] monitoring significant drops of flags in dirauth votes
nusenu
2018-02-11 09:21:00 UTC
Hi Damian and Tom,

Roger discovered that dannenberg did not include any exit flags
in certain votes anymore [1].

It would be great if we could detect such events and get notified about them in the future.

I see two places where this could be added:

DocTor:
a new check that alerts when a dir auth either no longer includes
certain flags (guard, hsdir, exit, ..) at all
or, better, when the number of relays with a certain flag drops significantly
(by xx %) from one vote to the next.
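
A minimal sketch of such a vote-to-vote comparison, not DocTor's actual code. The function name, the flag list and the 50% default are assumptions made for illustration; the inputs are per-flag relay counts taken from two successive votes of the same authority.

```python
from typing import Dict, List

# Flags named in the proposal; the exact list is an assumption for this sketch.
WATCHED_FLAGS = ("Guard", "HSDir", "Exit")


def flag_drops(previous: Dict[str, int], current: Dict[str, int],
               max_drop: float = 0.5) -> List[str]:
    """Describe every watched flag that vanished or dropped by more than max_drop."""
    alerts = []
    for flag in WATCHED_FLAGS:
        before = previous.get(flag, 0)
        after = current.get(flag, 0)
        if before == 0:
            continue  # flag was absent before, so there is nothing to compare
        if after == 0:
            alerts.append(f"{flag}: no longer present (was {before})")
        elif (before - after) / before > max_drop:
            alerts.append(f"{flag}: dropped from {before} to {after}")
    return alerts
```

Calling it with the counts from the previous and the current vote returns an empty list when nothing dropped past the threshold.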

consensus-health graphs:
we have nice graphs per dirauth and bwauth; if we also had
per-dirauth-vote, per-flag graphs (mainly guard, exit, hsdir - we already have running)
we could spot such events (and even trends)
better. (btw: what caused the recent flat-line in the graphs on 2018-02-03 - 2018-02-05?)

What do you think?


thanks for considering it,
nusenu


[1] https://lists.torproject.org/pipermail/tor-relays/2018-February/014480.html
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Damian Johnson
2018-02-11 20:58:52 UTC
Thanks nusenu! Nice idea, added it to DocTor...

https://gitweb.torproject.org/doctor.git/commit/?id=8945013

It gives a notice if the number of flags issued by an authority is 50% different
from the consensus. Presently there's only one instance of that...

[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
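
For readers who want to see the shape of such a check, here is a rough sketch. It is not the code from the commit above: the function names, the plain sets of flag names as input, and the reading of "50% different" as a relative difference against the consensus count are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def count_flags(relay_flags: Iterable[Set[str]]) -> Counter:
    """Count how many relays carry each flag."""
    counts = Counter()
    for flags in relay_flags:
        counts.update(flags)
    return counts


def flag_notices(authority: str,
                 vote_flags: Iterable[Set[str]],
                 consensus_flags: Iterable[Set[str]],
                 threshold: float = 0.5) -> List[str]:
    """Return a notice for each flag whose vote count is far from the consensus count."""
    vote_counts = count_flags(vote_flags)
    consensus_counts = count_flags(consensus_flags)
    notices = []
    for flag, in_consensus in sorted(consensus_counts.items()):
        in_vote = vote_counts.get(flag, 0)
        if abs(in_vote - in_consensus) / in_consensus > threshold:
            notices.append(
                f"{authority} had {in_vote} {flag} flags in its vote "
                f"but the consensus had {in_consensus}")
    return notices
```

Each input is one set of flag names per relay entry, so the same function works for any flag that appears in the consensus.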
nusenu
2018-02-11 21:21:00 UTC
Post by Damian Johnson
Thanks nusenu! Nice idea, added it to DocTor...
thanks for implementing the new check so fast.
Post by Damian Johnson
https://gitweb.torproject.org/doctor.git/commit/?id=8945013
It gives a notice if flags issued by an authority are 50% different
from the consensus. Presently there's only one instance of that...
This is also very useful but slightly different from what I had in mind,
because it would not trigger if dirauths upgrade from A to B in the
same hour and most exits, guards or hsdirs are gone due to a bug in version B.
Post by Damian Johnson
NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs
because moria1 (the affected dirauth) is the only one running tor alpha
but I didn't find anything related to a change in what is required
to earn the HSDir flag. Has there been any change related to how
HSDir is assigned that would explain that significant difference?
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Damian Johnson
2018-02-12 01:59:09 UTC
Post by nusenu
thanks for implementing the new check so fast.
No problem! Thanks for suggesting it.
Post by nusenu
This is also very useful but slightly different from what I had in mind,
because it would not trigger if dirauths upgrade from A to B in the
same hour and most exits, guards or hsdirs are gone due to a bug in version B.
This should catch a bug with B unless every authority upgrades to B in
the same hour. Otherwise we'd get an alert - either because the
majority is B and the remaining A votes are out of band, or the
consensus is made with A and authorities that upgraded to B are
different.

Is there another check in particular that you'd like? One gotcha is
that checks that require state (such as comparing with the last hour's
consensus) are a bit more work.
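
To illustrate what "state" would mean here, a small sketch that keeps the last run's per-flag counts in a JSON file. This is not how DocTor stores anything; the file layout and both function names are hypothetical.

```python
import json
import os
from typing import Dict, Optional


def load_previous_counts(path: str) -> Optional[Dict[str, int]]:
    """Return the counts saved by the last run, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as state_file:
        return json.load(state_file)


def save_counts(path: str, counts: Dict[str, int]) -> None:
    """Write the counts to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as state_file:
        json.dump(counts, state_file)
    os.replace(tmp_path, path)  # avoids leaving a half-written state file
```

An hourly run would load the previous counts, compare them with the current ones, and then save the current ones for the next run.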
Post by nusenu
I tried to find something related to this in the 0.3.3.x changelogs
because moria1 (the affected dirauth) is the only one running tor alpha
but I didn't find anything related to a change in what is required
to earn the HSDir flag. Has there been any change related to how
HSDir is assigned that would explain that significant difference?
For what it's worth I started with alarming when authorities differed
more than 20% from the consensus but it was a bit noisier...

[consensus-health] NOTICE: longclaw had 3100 HSDir flags in its vote
but the consensus had 2583
[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
[consensus-health] NOTICE: moria1 had 1397 Guard flags in its vote but
the consensus had 1761
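
The arithmetic behind those three notices can be checked with a few lines. The counts are the ones quoted above; the assumption is that "differed more than 20%" means the absolute difference divided by the consensus count, which may not be exactly how the check computes it.

```python
# (authority, flag, count in vote, count in consensus), from the notices above
OBSERVED = [
    ("longclaw", "HSDir", 3100, 2583),
    ("moria1", "HSDir", 756, 2583),
    ("moria1", "Guard", 1397, 1761),
]


def relative_difference(in_vote: int, in_consensus: int) -> float:
    """Difference between vote and consensus as a fraction of the consensus count."""
    return abs(in_vote - in_consensus) / in_consensus


def exceeding(threshold: float):
    """List the (authority, flag) pairs whose difference is above the threshold."""
    return [(authority, flag)
            for authority, flag, in_vote, in_consensus in OBSERVED
            if relative_difference(in_vote, in_consensus) > threshold]


for authority, flag, in_vote, in_consensus in OBSERVED:
    print(f"{authority} {flag}: {relative_difference(in_vote, in_consensus):.1%}")
```

Under that definition all three entries are above 20% (longclaw's HSDir count only barely), while only moria1's HSDir count is above 50%.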
nusenu
2018-02-12 15:12:00 UTC
Post by Damian Johnson
Post by nusenu
thanks for implementing the new check so fast.
No problem! Thanks for suggesting it.
Post by nusenu
This is also very useful but slightly different from what I had in mind,
because it would not trigger if dirauths upgrade from A to B in the
same hour and most exits, guards or hsdirs are gone due to a bug in version B.
This should catch a bug with B unless every authority upgrades to B in
the same hour. Otherwise we'd get an alert - either because the
majority is B and the remaining A votes are out of band, or the
consensus is made with A and authorities that upgraded to B are
different.
Is there another check in particular that you'd like?
Yes, but not directly related to this thread. I will file it via trac.tpo.
Post by Damian Johnson
One gotcha is
that checks that require state (such as comparing with the last hour's
consensus) are a bit more work.
Yes, that is why I was wondering whether DocTor already keeps any state at all.
Post by Damian Johnson
Post by nusenu
I tried to find something related to this in the 0.3.3.x changelogs
because moria1 (the affected dirauth) is the only one running tor alpha
but I didn't find anything related to a change in what is required
to earn the HSDir flag. Has there been any change related to how
HSDir is assigned that would explain that significant difference?
For what it's worth I started with alarming when authorities differed
more than 20% from the consensus but it was a bit noisier...
[consensus-health] NOTICE: longclaw had 3100 HSDir flags in its vote
but the consensus had 2583
[consensus-health] NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
[consensus-health] NOTICE: moria1 had 1397 Guard flags in its vote but
the consensus had 1761
I assume this has not been deployed - 50% or maybe 40% are fine I guess.
To come up with good threshold values
one would need to look at historic data for the past few months.
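
One possible way to turn historic data into a threshold, sketched under assumptions: the input is a list of past relative differences between an authority's flag count and the consensus, and the threshold is picked as a high quantile of those values using the nearest-rank method. The function name and the 99% default are hypothetical.

```python
import math
from typing import Sequence


def suggest_threshold(deviations: Sequence[float], quantile: float = 0.99) -> float:
    """Return the nearest-rank quantile of the historic deviations."""
    if not deviations:
        raise ValueError("need at least one historic sample")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(deviations)
    rank = math.ceil(quantile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```

A threshold chosen this way would have fired on roughly the top 1% of past hours, which gives a feel for how noisy it would be.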
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Damian Johnson
2018-02-12 16:31:13 UTC
Post by nusenu
I assume this has not been deployed - 50% or maybe 40% are fine I guess.
To come up with good threshold values
one would need to look at historic data for the past few months.
Nope, it is deployed (if by 'deployed' you mean DocTor is presently
performing this check). From David's reply about moria1 it sounds like
any check of this sort may be a red herring since they experiment with
moria1, but I'll leave that up to you guys. Just let me know what kind
of check you want.
Roger Dingledine
2018-02-15 17:51:19 UTC
Post by Damian Johnson
Nope, it is deployed (if by 'deployed' you mean DocTor is presently
performing this check). From David's reply about moria1 it sounds like
any check of this sort may be a red herring since they experiment with
moria1, but I'll leave that up to you guys. Just let me know what kind
of check you want.
It might be smartest to just put in an exception for moria1's
HSDir votes, since we know it's being different.

--Roger
nusenu
2018-02-16 23:10:00 UTC
Post by Roger Dingledine
It might be smartest to just put in an exception for moria1's
HSDir votes, since we know it's being different.
yes, please :)

and it would also be nice to have:
https://trac.torproject.org/projects/tor/ticket/25222

so we can filter for those emails that we care about most
(or filter those that we do not care about)
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Damian Johnson
2018-02-17 00:59:23 UTC
Post by Roger Dingledine
It might be smartest to just put in an exception for moria1's
HSDir votes, since we know it's being different.
Suppressed any notices for HSDir flags. Also fixed the time based
suppression for the check (it should have sent one notice a day rather
than one an hour).
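
A minimal sketch of time-based suppression of this kind, sending a given notice at most once per day. It is not DocTor's implementation; the class name and the use of a string key per notice are assumptions.

```python
import time
from typing import Dict, Optional

ONE_DAY = 24 * 60 * 60  # seconds


class NoticeSuppressor:
    """Allow each notice key through at most once per interval."""

    def __init__(self, interval: float = ONE_DAY) -> None:
        self.interval = interval
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str, now: Optional[float] = None) -> bool:
        """Return True and record the time if the notice is due, else False."""
        now = time.time() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # already sent within the interval
        self._last_sent[key] = now
        return True
```

An hourly run would call should_send() with a key such as the authority and flag name, so a persistent difference produces one message a day instead of twenty-four. The record of last-sent times would have to be persisted between runs for this to work across separate invocations.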

David Goulet
2018-02-12 14:49:33 UTC
Post by nusenu
Post by Damian Johnson
Thanks nusenu! Nice idea, added it to DocTor...
thanks for implementing the new check so fast.
Post by Damian Johnson
https://gitweb.torproject.org/doctor.git/commit/?id=8945013
It gives a notice if flags issued by an authority are 50% different
from the consensus. Presently there's only one instance of that...
This is also very useful but slightly different from what I had in mind,
because it would not trigger if dirauths upgrade from A to B in the
same hour and most exits, guards or hsdirs are gone due to a bug in version B.
Post by Damian Johnson
NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs
because moria1 (the affected dirauth) is the only one running tor alpha
but I didn't find anything related to a change in what is required
to earn the HSDir flag. Has there been any change related to how
HSDir is assigned that would explain that significant difference?
This is because moria1 is running an experimental patch that drastically cuts
down the number of relays it votes HSDir for. It has been doing that for a while now. I
can't recall the ticket, but this was an attempt a while back to see how bad
it would be to narrow the HSDir vote down to only the most stable relays on
the network.

Roger can probably explain it better but all in all there is nothing to be worried about
there, as it is expected.

Actually, you should expect moria1 to behave strangely from time to time
since it is often running alpha code :).

Cheers!
David
--
And8vxUcJVOn9srRjJ3mpKMUC5pScfYMRq9Qv9yt54Y=
nusenu
2018-02-12 15:09:00 UTC
Post by David Goulet
Post by nusenu
Post by Damian Johnson
NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
I tried to find something related to this in the 0.3.3.x changelogs
because moria1 (the affected dirauth) is the only one running tor alpha
but I didn't find anything related to a change in what is required
to earn the HSDir flag. Has there been any change related to how
HSDir is assigned that would explain that significant difference?
This is because moria1 is running an experimental patch that drastically cuts
down the number of relays it votes HSDir for. It has been doing that for a while now. I
can't recall the ticket, but this was an attempt a while back to see how bad
it would be to narrow the HSDir vote down to only the most stable relays on
the network.
Roger can probably explain it better but all in all there is nothing to be worried about
there, as it is expected.
Thanks for the explanation!

I tried to find it on trac, I guess this is:
https://trac.torproject.org/projects/tor/ticket/19162
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Roger Dingledine
2018-02-15 17:47:47 UTC
Post by nusenu
Post by Damian Johnson
NOTICE: moria1 had 756 HSDir flags in its vote but
the consensus had 2583
https://trac.torproject.org/projects/tor/ticket/19162
Yes, correct. moria1 runs all sorts of experimental patches.

One of them is choosing the HSDir flag for relays based on:

+ hsdir_tk = find_nth_long(tks, n_active, n_active*3/4);
+ hsdir_bandwidth = find_nth_uint32(bandwidths_kb, n_active, n_active/4);

That is, the relay needs to be in the top quarter of the relays
by time-known, and in the top three-quarters of the relays by
bandwidth weights (as decided by moria1's bwauth).
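
The two patch lines can be mirrored in Python to make the selection concrete. This is only a sketch: find_nth() stands in for tor's find_nth_long() and find_nth_uint32() and is assumed to return the nth smallest value (0-based), and the `>=` comparison in is_eligible() is an assumption, since the snippet above only shows how the thresholds are computed.

```python
from typing import Sequence, Tuple


def find_nth(values: Sequence[int], nth: int) -> int:
    """Return the nth smallest value (0-based)."""
    return sorted(values)[nth]


def hsdir_thresholds(time_known: Sequence[int],
                     bandwidths_kb: Sequence[int]) -> Tuple[int, int]:
    """Compute the time-known and bandwidth cutoffs over the active relays."""
    if len(time_known) != len(bandwidths_kb):
        raise ValueError("expected one value of each kind per active relay")
    n_active = len(time_known)
    # Three quarters of the relays sit below this value: the top quarter passes.
    hsdir_tk = find_nth(time_known, n_active * 3 // 4)
    # One quarter of the relays sit below this value: the top three quarters pass.
    hsdir_bandwidth = find_nth(bandwidths_kb, n_active // 4)
    return hsdir_tk, hsdir_bandwidth


def is_eligible(time_known: int, bandwidth_kb: int,
                thresholds: Tuple[int, int]) -> bool:
    """Check one relay against both cutoffs (the >= is an assumption)."""
    hsdir_tk, hsdir_bandwidth = thresholds
    return time_known >= hsdir_tk and bandwidth_kb >= hsdir_bandwidth
```

With eight relays, for example, the time-known cutoff is the seventh smallest value and the bandwidth cutoff is the third smallest.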

I think the time-known idea is a potentially really smart one, since if
we do it right we force attacking hsdir relays to be in the network for
a long time before they are allowed to become hsdirs.

--Roger
Tom Ritter
2018-02-11 21:22:54 UTC
I think the doctor notification is the best mechanism.

I'm not opposed to adding more graphs to consensus-health, but I think
I'd want to coordinate with the metrics team. There was talk about
them absorbing consensus-health in some capacity, so I'd prefer to
avoid doing a lot of work on graphs if it's going to be redone or
thrown away.

The host running depictor was down for several days, which explains
the gap in data.

Thanks for the thoughts!

-tom