Discussion:
[tor-dev] Rebooting work on proposal #247 (guard discovery)
George Kadianakis
2017-11-29 12:08:15 UTC
Hello Mike,

I'm finally getting out of the prop224/microdescriptor bug pile, and
getting more time to start working on guard stuff like prop247 again.

I'm planning to spend a few days next week to regain knowledge on
prop247. I'll check out the notes from the Wilmington hackfest, re-read
my old simulator's code, etc.

I know you have thought more about prop247 over the past months, and it
would be great if you could brief me on any updates that I should know
about. Specifically, I'm wondering if you have any new insights on how
the proposed prop247 changes interact with Tor's guard algorithm (prop271)?

Also any other things I should know about from your work on the
performance simulator? Perhaps ideas about performance, topology or path
restrictions?

I'm hoping that such updates might help me boot up faster!

Cheers! :-)
Mike Perry
2017-12-05 04:19:29 UTC
Post by George Kadianakis
Hello Mike,
I'm finally getting out of the prop224/microdescriptor bug pile, and
getting more time to start working on guard stuff like prop247 again.
I'm planning to spend a few days next week to regain knowledge on
prop247. I'll check out the notes from the Wilmington hackfest, re-read
my old simulator's code, etc.
I was not involved with prop271, so I am not deeply familiar with it.
However, it has several things we do not need. In particular, the plan
for prop247 still is to treat consensus information as the official
notion of vanguard reachability, so there is no need to try to determine
censorship, firewall, or local network reachability information. If a
node is in the consensus, it stays in our vanguard set and does not get
replaced until it actually leaves the consensus. This is consistent with
how the consensus is currently used for interior hops, and mitigates
path bias attacks.

I have not thought hard about what to do about nodes that leave the
consensus while they are still in our vanguard sets and rejoin later. I
am thinking that the simplest solution is to just pick a new node to
replace them and not worry about it.
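
The replacement rule described above can be sketched roughly as follows. This is only an illustration, not code from tor or from the prototype: the function name, the use of plain strings as relay identifiers, and the uniform random choice of replacements are all assumptions (real selection would presumably be bandwidth-weighted).

```python
import random

def refresh_vanguards(vanguards, consensus, set_size, rng=random):
    """Return an updated vanguard list for the given consensus.

    vanguards: current list of relay identifiers (hypothetical strings).
    consensus: set of relay identifiers listed in the current consensus.
    set_size:  target number of vanguards in this layer.
    """
    # Keep every vanguard that is still listed in the consensus.
    # No censorship, firewall, or local reachability checks are made.
    kept = [v for v in vanguards if v in consensus]
    # Replace departed vanguards with fresh picks not already in the set.
    # Uniform sampling is an assumption made only for this sketch.
    candidates = sorted(consensus - set(kept))
    need = min(set_size - len(kept), len(candidates))
    if need > 0:
        kept.extend(rng.sample(candidates, need))
    return kept
```

In this sketch a node that drops out and later rejoins gets no special treatment: once replaced, it is just another candidate.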

If it is dead-simple to use only the consensus uptime portions of
prop271 without the reachability code, I could be convinced of that. But
as it is, the rotation times do not need to be as long as those for
guards, and the implementation simplification here is attractive. Plus,
nodes that fall completely out of the consensus periodically like this
are probably bad choices anyway.

What do you think?
Post by George Kadianakis
I know you have thought more about prop247 over the past months, and it
would be great if you could brief me on any updates that I should know
about. Specifically I'm wondering if you have any new insights on how
the proposed prop247 changes interact with Tor's guard algorithm (prop271)?
Also any other things I should know about from your work on the
performance simulator? Perhaps ideas about performance, topology or path
restrictions?
Yes. I have decided to simplify everything as much as possible. I am
going with a mesh topology for the prop247 performance tests (via
https://bugs.torproject.org/13837, https://bugs.torproject.org/23101
and https://bugs.torproject.org/24487). That is the simplest option to
implement and test for performance, and intuitively seems to have
almost as good security properties as the bin version (unless your
security simulator tells us otherwise).

I am also aiming for these high-level design goals, most important
first:

0. All service-side circuits use 3 hops of vanguards.
1. Hidden services should avoid trivially disclosing their third
vanguard to a non-network adversary (ie one that is not running nodes
but that is watching either HSDESCS or connecting to the service).
This means their paths look like this:
S - G - L2 - L3 - HSDIR
S - G - L2 - L3 - Intro
S - G - L2 - L3 - M - Rend
2. Clients should avoid revealing their third vanguard hop to services
and to nodes that have information about which service they are
accessing. This means that their paths look like this:
C - G - L2 - L3 - M - HSDIR
C - G - L2 - L3 - M - Intro
C - G - L2 - L3 - Rend
3. Clients use 3 hops of vanguards for all hidden service circuits.

If we do all of these, it will mean that we will have long path lengths
(8 hop rends), but it also means that it is easy to reason about
linkability and information disclosure. My thinking is that we should do
the performance tests with the safest option first (ie: all of these
goals), and see exactly how bad it is, and then make compromises if it
turns out to be much worse performance than status quo.
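
As a quick check of the "8 hop rends" figure, the paths listed in the goals above can be counted mechanically. The dictionary and function names below are made up for this sketch; only the path layouts come from the list above.

```python
# Paths copied from the goals above. The first element is the endpoint
# (S = service, C = client) and is not itself a relay hop.
SERVICE_PATHS = {
    "hsdir": ["S", "G", "L2", "L3", "HSDIR"],
    "intro": ["S", "G", "L2", "L3", "Intro"],
    "rend": ["S", "G", "L2", "L3", "M", "Rend"],
}
CLIENT_PATHS = {
    "hsdir": ["C", "G", "L2", "L3", "M", "HSDIR"],
    "intro": ["C", "G", "L2", "L3", "M", "Intro"],
    "rend": ["C", "G", "L2", "L3", "Rend"],
}

def relay_hops(path):
    """Number of relays on one side of the connection."""
    return len(path) - 1

def joined_hops(service_path, client_path):
    """Relays between client and service when both paths end at one relay."""
    if service_path[-1] != client_path[-1]:
        raise ValueError("paths do not meet at the same relay")
    # The shared final relay (for example the rendezvous point) is counted once.
    return relay_hops(service_path) + relay_hops(client_path) - 1

rend_total = joined_hops(SERVICE_PATHS["rend"], CLIENT_PATHS["rend"])
```

Here the service side contributes 5 relays (G, L2, L3, M, Rend) and the client side 4 (G, L2, L3, Rend), with the rendezvous point shared, so rend_total is 8.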

In the event of bad performance, I would alter property #3 before
messing with property #2, and alter #2 before property #1, but I could
be talked into a different strategy, or driven to one based on data.


In terms of pre-building and cannibalization
(https://bugs.torproject.org/23101), for vanguard-enabled clients, I am
going with the plan to create a special HS_GENERAL pre-built circuit
set. HS_GENERAL circuits will be four hops long (3 vanguards plus a
random middle), and will be used for all vanguard circuits except for
service-side INTRO circuits (since those are already long-lived and
pre-built, and don't need the extra middle). I have an implementation of
this and have tested it lightly -- it seems to work.

One additional wrinkle is that we will need to reverse our path
selection order, so that we do not leak information about earlier
vanguards to later hops in the path. This is
https://bugs.torproject.org/24487. For now, so I can have more of an
apples-to-apples comparison in terms of vanguard set sizes, I simply
allow the same vanguard to appear in multiple positions in the circuit,
if the prototype is enabled. I do hope to get #24487 done for 0.3.3,
though.
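
One possible reading of the reversed selection order is sketched below: hops are picked from the last position backwards, and each pick excludes only relays already chosen for later positions, so the choice of a later hop never depends on which earlier vanguards are in use. This is an assumption about the intent of #24487, not a description of the actual patch; the function name and the list-of-candidate-lists input are hypothetical.

```python
import random

def pick_path_reversed(layers, rng=random):
    """Pick one relay per layer, choosing the last layer first.

    layers: list of candidate lists ordered from the first hop to the last,
            e.g. [guards, l2_vanguards, l3_vanguards] (hypothetical names).
    """
    chosen = []  # built from the last hop backwards
    for candidates in reversed(layers):
        # Exclude only relays already picked for later positions.
        allowed = [r for r in candidates if r not in chosen]
        if not allowed:
            raise ValueError("no usable relay left for this layer")
        chosen.append(rng.choice(allowed))
    chosen.reverse()  # return the path in first-hop-to-last-hop order
    return chosen
```

With forward selection the exclusions would run the other way, and the set of relays that never show up in a later position could hint at the earlier hops.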


I have not written up the set of performance experiments I intend to run
yet, but at a high level I want to measure two things for a few
different L2 and L3 guard set sizes:

A. How does the average performance compare to existing onionperf data
at https://metrics.torproject.org/torperf.html?
B. What is the variance over time in performance with a fixed entry
guard, as the L2 and L3 guards rotate? Is the variance measurably
different than what happens on onionperf?

#A here will tell us if our paths are too long and seriously impact
average performance, meaning we have to revisit goals #0-3.

#B will tell us how much a really bad L2 or L3 set can impact
performance, and how often that happens. I expect that as we increase L2
and L3 sizes, variance in performance will go down, until we hit
diminishing returns. The goal is to find that sweet spot for choosing L2
and L3 as small as possible for as little variance as possible.
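
A minimal sketch of how measurements for #A and #B might be summarized per set size is shown below. It assumes the samples are timing values already grouped by (L2 size, L3 size); the function name and input layout are hypothetical, and no real measurements are implied.

```python
from statistics import mean, pvariance

def summarize(samples_by_size):
    """Map each (L2 size, L3 size) pair to (mean, variance) of its samples.

    samples_by_size: dict from (l2_size, l3_size) tuples to lists of
                     timing samples (hypothetical input layout).
    """
    return {
        sizes: (mean(samples), pvariance(samples))
        for sizes, samples in samples_by_size.items()
    }
```

The mean speaks to #A and the variance to #B. Comparing the variance across increasing set sizes is one way to look for the point of diminishing returns.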

It would be great if your security simulator can tell us which L2 and L3
values are worth considering, so I can gather more useful (and more
detailed) performance data with fewer experiments.


I think that is it for now. As far as implementation goes, I am doing my
best to keep
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorV up
to date and stick with that timetable.

This means I want to merge all of the torrc options needed for the
performance tests into 0.3.3 (by mid January), so that hidden
service operators have the option of using the performance test
controller to get vanguard behavior if they want. My assumption here is
that we basically can all agree on the high level approach, and all
agree it is an improvement over status quo, but we will want the extra
time to actually make specific parameter choices and decide if we need
to or want to live with shorter paths for some scenarios.
--
Mike Perry
George Kadianakis
2017-12-06 13:33:33 UTC
Post by Mike Perry
Post by George Kadianakis
Hello Mike,
I'm finally getting out of the prop224/microdescriptor bug pile, and
getting more time to start working on guard stuff like prop247 again.
I'm planning to spend a few days next week to regain knowledge on
prop247. I'll check out the notes from the Wilmington hackfest, re-read
my old simulator's code, etc.
Hey, and thanks for the reply!
Post by Mike Perry
I was not involved with prop271, so I am not deeply familiar with it.
However, it has several things we do not need. In particular, the plan
for prop247 still is to treat consensus information as the official
notion of vanguard reachability, so there is no need to try to determine
censorship, firewall, or local network reachability information. If a
node is in the consensus, it stays in our vanguard set and does not get
replaced until it actually leaves the consensus. This is consistent with
how the consensus is currently used for interior hops, and mitigates
path bias attacks.
If it is dead-simple to use only the consensus uptime portions of
prop271 without the reachability code, I could be convinced of that. But
as it is, the rotation times do not need to be as long as guards, and
the implementation simplification here is attractive. Plus, nodes that
fall completely out of the consensus periodically like this are probably
bad choices anyway.
What do you think?
Sounds plausible.

I'm slightly worried that not tracking transient reachability status
might cause situations where all the L2 guards are temporarily down,
bringing the service/client to a complete halt. I'm not sure
how likely this is to happen, and it surely depends on L2 size and L2
node selection parameters.
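
As a rough way to see how this depends on L2 size: if each L2 guard is down independently with the same probability, the chance that all of them are down at once shrinks geometrically with the set size. Both the independence assumption and the function name below are mine, for illustration only; real outages may well be correlated.

```python
def p_all_down(p_down, l2_size):
    """Probability that every L2 guard is down at the same time.

    Assumes each guard is down independently with probability p_down.
    """
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be a probability")
    if l2_size < 1:
        raise ValueError("l2_size must be at least 1")
    return p_down ** l2_size
```

For example, with a per-guard downtime probability of 0.5, going from 2 to 4 L2 guards lowers the chance of a total outage from 0.25 to 0.0625 under this model.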

I have not thought about the engineering aspects of this. I think
bending the prop271 code to do this might be a PITA. But I'm also not
sure if not using the guard layer is gonna be easier either. I imagine
there will be engineering complexities either way here. Will try to
figure this out in the coming weeks.
Post by Mike Perry
Post by George Kadianakis
I know you have thought more about prop247 over the past months, and it
would be great if you could brief me on any updates that I should know
about. Specifically I'm wondering if you have any new insights on how
the proposed prop247 changes interact with Tor's guard algorithm (prop271)?
Also any other things I should know about from your work on the
performance simulator? Perhaps ideas about performance, topology or path
restrictions?
Yes. I have decided to simplify everything as much as possible. I am
going with a mesh topology for the prop247 performance tests (via
https://bugs.torproject.org/13837, https://bugs.torproject.org/23101
and https://bugs.torproject.org/24487). That is the simplest option to
implement and test for performance, and intuitively seems to have
almost as good security properties as the bin version (unless your
security simulator tells us otherwise).
Sounds reasonable!
Post by Mike Perry
I am also aiming for these high-level design goals, most important
first:
0. All service-side circuits use 3 hops of vanguards.
1. Hidden services should avoid trivially disclosing their third
vanguard to a non-network adversary (ie one that is not running nodes
but that is watching either HSDESCS or connecting to the service).
This means their paths look like this:
S - G - L2 - L3 - HSDIR
S - G - L2 - L3 - Intro
S - G - L2 - L3 - M - Rend
2. Clients should avoid revealing their third vanguard hop to services
and to nodes that have information about which service they are
accessing. This means that their paths look like this:
C - G - L2 - L3 - M - HSDIR
C - G - L2 - L3 - M - Intro
C - G - L2 - L3 - Rend
3. Clients use 3 hops of vanguards for all hidden service circuits.
If we do all of these, it will mean that we will have long path lengths
(8 hop rends), but it also means that it is easy to reason about
linkability and information disclosure. My thinking is that we should do
the performance tests with the safest option first (ie: all of these
goals), and see exactly how bad it is, and then make compromises if it
turns out to be much worse performance than status quo.
In the event of bad performance, I would alter property #3 before
messing with property #2, and alter #2 before property #1, but I could
be talked into a different strategy, or driven to one based on data.
Sounds reasonable.

In general, I imagine that this feature is gonna be opt-in initially
which makes me worry less about performance in the beginning.

Also, I'm currently not too afraid of guard discovery attacks on the
client side, which might mean that we can leave the vanguard feature
optional for clients for much longer (bringing the hop count to 7).
Post by Mike Perry
I have not written up the set of performance experiments I intend to run
yet, but at a high level I want to measure two things for a few
different L2 and L3 guard set sizes:
A. How does the average performance compare to existing onionperf data
at https://metrics.torproject.org/torperf.html?
B. What is the variance over time in performance with a fixed entry
guard, as the L2 and L3 guards rotate? Is the variance measurably
different than what happens on onionperf?
#A here will tell us if our paths are too long and seriously impact
average performance, meaning we have to revisit goals #0-3.
#B will tell us how much a really bad L2 or L3 set can impact
performance, and how often that happens. I expect that as we increase L2
and L3 sizes, variance in performance will go down, until we hit
diminishing returns. The goal is to find that sweet spot for choosing L2
and L3 as small as possible for as little variance as possible.
It would be great if your security simulator can tell us which L2 and L3
values are worth considering, so I can gather more useful (and more
detailed) performance data with fewer experiments.
ACK. Will be working on this.
Post by Mike Perry
I think that is it for now. As far as implementation goes, I am doing my
best to keep
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorV up
to date and stick with that timetable.
This means I want to merge all of the torrc options needed for the
performance tests into 0.3.3 (by mid January), so that hidden
service operators have the option of using the performance test
controller to get vanguard behavior if they want. My assumption here is
that we basically can all agree on the high level approach, and all
agree it is an improvement over status quo, but we will want the extra
time to actually make specific parameter choices and decide if we need
to or want to live with shorter paths for some scenarios.
Makes sense.
