Discussion:
[tor-dev] Proposal #291 (two guards) IRC meeting Wed Apr 18,
Mike Perry
2018-04-16 18:35:08 UTC
Heyo.

We're going to have a meeting to discuss Proposal 291. See this thread:
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html

The meeting will be at 17:00 UTC, on Wednesday, April 18th, in
#tor-meeting on irc.oftc.net. (That's 10:00 left coast, 12:00 middle
coast, 13:00 right coast, and 19:00 in several socialist paradises that
strangely do not have public water fountains.)
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20180415T1700

Things we need to decide:
1. Do we abandon Tor's path restrictions?
2. Do we use two guards?

At the end of this meeting, we should commit to one or both of these
things long-term. (Surprise twist: we're already doing #2!)

Each of these choices is a nuanced thing. And just picking one or the
other doesn't solve everything. I think it's best to think of them as a
commitment to a plan over some timescale, based on the information we
have available today.


People who mos def should attend:
George Kadianakis,
Roger,
Nick,
Me

People who probably maybe should attend:
Aaron Johnson,
Isis (and others concerned about guard fingerprinting),
You?
--
Mike Perry
Mike Perry
2018-04-18 19:39:31 UTC
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
Ok, we had this meeting. High level (amended) action items are:

1. Use patches in https://trac.torproject.org/projects/tor/ticket/25843
to set NumEntryGuards=2 in torrc, and observe results. Please join us!
Stuff we are looking for during testing is on that ticket!
2. Merge that patch to make the torrc guard options do what we meant for
them to do. Probably backport it.
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
4. Agree on an order of operations for fixes+changes, ideally such that we
don't block forever trying to come up with a perfect solution. Things
are pretty bad now. All we really need to do is agree on steps to make
it better.

The full meeting logs are here:
http://meetbot.debian.net/tor-meeting/2018/tor-meeting.2018-04-18-17.01.log.html

Our notes from the pad (https://pad.riseup.net/p/TwoGuardMeeting) are
also below, for archival. Please comment further here on list or in the
testing ticket, not on the pad. It will disappear eventually (and/or get
edited by randos). Please pay particular attention to the proposal
variants we have below, and weigh in if you like (especially with
adversary differentiation).

===============================

Things to decide:
1. Remove some or all of Tor's path restrictions?
1a. Remove some, for some hops? (Allow just same node, or same /16 + family too? and for which hops?)
1b. Remove all?
1c. Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
2. Use two guards?
2a. Set prop#271 values?
2b. Modify prop#271 behavior?
2c. Two directory guards?
3. Alternatives?
3a. Allow some leakage about the guard, such as dividing guards into sets sharing similar /16 and family restrictions and then choosing exits and middles in a way that violates no path restrictions for any guard in your set. Taken to the extreme, we get the radical solution of two Tors: A-Tor and B-Tor. A-Tor exits, middles, and guards don't conflict with each other, and similarly for B-Tor. Alternately, we can just enforce that no exit is in the same /16 or family as any guard.


Reasons for 1:
1. Eliminates cases where adversary gets to influence your guard choice
2. Doing 1b also makes vanguard implementation simpler (no risk of choosing an impossible set of vanguards)

Blockers to 1:
1. Relay operators may like node family as protection?
2. 1b would make nearly _all_ kinds of path restriction impossible, indefinitely.
3. Circular paths make traffic analysis easier.
4. Circular paths are scary. :/


Reasons for 2:
1. Two guards inherently more resilient to downtime/DoS than one.
2. Helps conceal transition information when adding/removing single guards
3. Conflux will help us in more ways than just performance (reliability, congestion/DoS resistance)

Blockers for 2:
1. Current Prop#271 options may not be what we want (what do we do when two guards go down?)
2. May still need to remove/relax some restrictions, to avoid using 3rd guard if one is down.
3. Sybil time is halved (but still large)
4. Prop#271 mishandles directory guards (but maybe in a way we want it to)
5. Two-equal-guards means 2X external observers on the path for 1/2 of client traffic (but more multiplexed activity)


Relevant tickets related to guard-selection/path-restriction designs:
https://trac.torproject.org/projects/tor/ticket/14917 (Original bug that caused us to use a second guard)
https://trac.torproject.org/projects/tor/ticket/25347 (Clients thrash at one busy guard)
https://trac.torproject.org/projects/tor/ticket/13908 (one directory guard?)
https://trac.torproject.org/projects/tor/ticket/25546 (vanguard patches -- open children are all about restriction issues)
https://trac.torproject.org/projects/tor/ticket/25783 (prop#271 bug we might encounter if we switch to prop#291 (2 primaries) right now. there's probably more where this came from)
https://bugs.torproject.org/17773 (How to transition if a guard loses the Guard flag?)
https://bugs.torproject.org/2998 (Bridge path restriction circuit failure bug)
Other relevant tickets:
https://trac.torproject.org/projects/tor/ticket/24309 (UX for communicating guard purpose / protection to user)

Roger's proposal:
* Remove /16 and family path restrictions between guard and last hop
* Optionally, dir auths don't give you Guard if you're an Exit
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families

asn proposal:
* Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
* Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.

Nick's proposal:
* allow two primary guards
* tweak guard design so that primary guards are not chosen in same /16 or family
* separately, consider relaxing path restriction rules. Not removing.
* separately, consider other proposals for new behavior on guard failure (as modification to guard-spec).
* separately, consider requiring introduce cells to contain >=two possible rendezvous points in separate families.
* separately, require that introduction points be chosen from different families.

Aaron's proposal:
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families

Mike's proposal:
* Set "num primary guards"=2 and "num primary guards to use"=2
* Make no other changes right now
* File a path selection parent ticket to decide/fix path selection issues
* Tweak prop#271 behavior when both guards are down
* Investigate either favor-one-guard preference, conflux, and/or padding, but do this carefully.

Concrete things we can do now:
#1: ourselves set those guard params to 2 and find bugs. once #3 below is done, encourage others, like on tor-talk, to do it too.
#2: enumerate the current situations where we use a guard other than our first guard, especially noting the ones where the attacker can make us use a guard other than our first guard. fix as many as we want to fix. maybe categorize by whether they cause us to mark our first guard as down or not.
#3: merge a patch to make the torrc guard options do what we meant for them to do
#4: Describe adversary models for above proposals? (Why do we disagree? In Mike's case, my disagreements are primarily because I think each step is an improvement over previous/status quo -- we can decide harder things later and still do better).


===================
--
Mike Perry
George Kadianakis
2018-04-20 11:20:13 UTC
Post by Mike Perry
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
1. Use patches in https://trac.torproject.org/projects/tor/ticket/25843
to set NumEntryGuards=2 in torrc, and observe results. Please join us!
Stuff we are looking for during testing is on that ticket!
2. Merge that patch to make the torrc guard options do what we meant for
them to do. Probably backport it.
Hello,

I wrote the patch on #25843 and I'm now testing 2-guards on my Tor. So far so
good, but I think we need people on more unstable connections to test this.
Post by Mike Perry
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
Here is my proposal, but please don't consider it set in stone. I
actually think these are really complicated issues that take a while to
understand, and we should probably not rush it. Even on a short first
IRC meeting we came up with new issues and ideas while discussing this
topic.

asn proposal:
1) Allow "same node, same /16, same family" between guard and last hop.
If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
2) Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.
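
A minimal sketch of point 1 above, assuming a toy model in which a relay has one nickname, one /16, and one family label. The `Relay` type and `build_path` function are hypothetical names for illustration and are not taken from the Tor source. It shows that only the same-node case (A - B - A) grows to four hops, while a last hop that merely shares a /16 or family with the guard is simply allowed.

```python
from typing import List, NamedTuple


class Relay(NamedTuple):
    name: str      # hypothetical relay nickname
    subnet16: str  # first two octets of the address, e.g. "203.0"
    family: str    # simplification: a single family label per relay


def build_path(guard: Relay, middle: Relay, last_hop: Relay,
               extra_middle: Relay) -> List[Relay]:
    """Return the hops of a circuit under the relaxed restriction.

    The last hop may share a /16 or family with the guard, or even be the
    guard itself. Only the same-node case (A - B - A) gets an extra hop,
    giving A - B - C - A.
    """
    if last_hop.name == guard.name:
        return [guard, middle, extra_middle, last_hop]
    return [guard, middle, last_hop]


a = Relay("A", "203.0", "fam-a")
b = Relay("B", "198.51", "fam-b")
c = Relay("C", "192.0", "fam-c")
d = Relay("D", "203.0", "fam-a")  # same /16 and family as A, different node

print([r.name for r in build_path(a, b, a, c)])  # same node as guard: 4 hops
print([r.name for r in build_path(a, b, d, c)])  # same /16 and family: 3 hops
```

Because the guard is never swapped out in either branch, the last hop has no influence on which guard is used in this model.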

Rationale:

I care about an attacker who is trying to deanon a Tor client by setting
up Tor nodes and comboing various active attacks. In particular, I worry
about an adversary who uses guard discovery to learn a client's guard nodes
and then uses #14917 or tries to DoS them.

I like two guards because it makes us stronger and more redundant
against such attacks, and also because it improves congestion. The
"pad-to-backup" idea seems too experimental to me, and not sufficiently
specified right now hence I'm unable to analyze it (e.g. how much do we
pad, how often, can this actually mask us against adversary who launches
#14917 repeatedly?).

I propose altering the above path restrictions because that seems to be
the only way to concretely defend against #14917 (e.g. see attacks
against idle clients on meeting log, etc.). Attackers who have already
owned our guard node are not in my threat model wrt these attacks. IMO
simple A - B - A path restrictions don't help us against such persistent
adversaries; e.g. attacker can simply spawn up another tiny relay C on
another data center and do an A - B - C correlation attack.
Post by Mike Perry
4. Agree on an order of operations for fixes+changes, ideally such that we
don't block forever trying to come up with a perfect solution. Things
are pretty bad now. All we really need to do is agree on steps to make
it better.
I think (1) and (2) above can be considered as orthogonal issues and get
done in any order. IMO, here are the prerequisites for doing these tasks:

For path restrictions: Specify current path restrictions through the whole Tor circuit
and write a concrete proposal with proposed changes. I think we
are looking for 0.3.5 if we want to do this.

For 2-guards: Get the 2-guard design sufficiently tested to ensure that we
are not gonna bug out the whole network by switching to
2-guards. I'm particularly worried about clients on bad
networks, and clients continuously flapping on-and-off the net.
If we toggle the consensus param switch soon, we should be
prepared for another round of guard bugs in 034, and that's fine.

Cheers! :)
Mike Perry
2018-04-25 08:30:46 UTC
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first
start with enumerating the properties the proposals below provide.
Properties #1-6 have parentheses at the end of them. When the condition
in parentheses is met for property #N, we'll call that "strong #N".

1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.

Roger's proposal:
* Remove /16 and family path restrictions between guard and last hop
* Optionally, dir auths don't give you Guard if you're an Exit
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)

Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)

It provides strong:
5. Nodes are not reused for Guard and Exit positions ("any" positions)

It provides:
7. Relays in the same family can't be forced to correlate Exit traffic.

It does not provide:
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).

asn proposal:
* Allow "same node, same /16, same family" between guard and last hop. If it's a 3-hop circ (A - B - A), extend it to a 4-hop circ (A - B - C - A).
* Switch to two primary guards; and revisit prop#271 as needed to make this possible and good.
This proposal provides strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)

If we fix prop#271's downtime detection for the two primaries, it provides:
4. DoS/Guard node downtime signals are rare (absent)

If the client chooses its primary guards from the same /16 or family, it
does not provide #6 (since the hop before the RP won't ever be in that
family):
6. Information about the guard(s) does not leak to the website/RP (at all).

It does not provide:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.

Nick's proposal:
* allow two primary guards
* tweak guard design so that primary guards are not chosen in same /16 or family
* separately, consider relaxing path restriction rules. Not removing.
* separately, consider other proposals for new behavior on guard failure (as modification to guard-spec).
* separately, consider requiring introduce cells to contain >=two possible rendezvous points in separate families.
* separately, require that introduction points be chosen from different families.
In the short term, this proposal provides #1,3-4,6 (not strong, because
if one of the primary guards is down, you can be forced into using a
third):
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).

In the short term, it gets strong #5 and #7, though this may change if
we relax restrictions.
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.

Changing the introduce cell will provide strong #1-2. Improving guard
failure conditions gets it strong #4:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
4. DoS/Guard node downtime signals are rare (absent)

Aaron's proposal:
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
Depending on how good the padding is, this proposal maybe-provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)

Depending on how good the detection mechanism is:
4. DoS/Guard node downtime signals are much more rare (absent)

It provides strong #5:
5. Nodes are not reused for Guard and Exit positions ("any" positions)

It provides #7:
7. Relays in the same family can't be forced to correlate Exit traffic.

It does not provide #2 or #6:
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).

Mike's proposal:
* Set "num primary guards"=2 and "num primary guards to use"=2
* Make no other changes right now
* File a path selection parent ticket to decide/fix path selection issues
* Tweak prop#271 behavior when both guards are down
* Investigate either favor-one-guard preference, conflux, and/or padding, but do this carefully.
In the short term, this proposal provides #1,3-4,6 (not strong, because if
one of the primary guards is down, you can be forced into using a
third):
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).

If you get unlucky and choose both primaries from the same /16 or
family, you also lose #1,3,6.

In the short term, it gets strong #5 and #7, though this may
change if we relax restrictions:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.

Improving guard failure conditions gets it strong #4:
4. DoS/Guard node downtime signals are rare (absent)

It does not provide:
2. Hidden service use can't influence your choice of guard (at all).
[Mike's rules proposal to Roger in the other thread]
1. Two equal guards, chosen from not the same /16 or family
2. Choose each vanguard layer members such that each layer has at least
one node from a unique /16 and family.
3. Build paths in a strict order, from last hop towards guard. If you
can't build a path with this ordering, start over with a sampled guard.
(With rule #1 and #2, this should be very rare and should mean that
a guard is marked down locally but still marked up in the consensus.)
4. No guards as exits (Not needed but do it anyway for other reasons).
0. When a guard leaves the consensus, replace it with a new primary
guard.
1. Temporarily pick a new guard when your two primaries are locally down
or unusable (ie step #3 above fails).
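
Rule 1 of the list above (two equal guards, not from the same /16 or family) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not Tor's guard-selection code: `Relay`, `conflicts`, and `pick_primaries` are hypothetical names, and a family is modeled as a single label per relay.

```python
from typing import List, NamedTuple


class Relay(NamedTuple):
    name: str      # hypothetical relay nickname
    subnet16: str  # first two octets of the address
    family: str    # simplification: a single family label per relay


def conflicts(a: Relay, b: Relay) -> bool:
    """True if the relays are the same node or share a /16 or a family."""
    return a.name == b.name or a.subnet16 == b.subnet16 or a.family == b.family


def pick_primaries(sampled: List[Relay], count: int = 2) -> List[Relay]:
    """Walk the sampled guards in order, skipping any that conflict with
    an already chosen primary, until `count` primaries are found."""
    primaries: List[Relay] = []
    for candidate in sampled:
        if any(conflicts(candidate, p) for p in primaries):
            continue
        primaries.append(candidate)
        if len(primaries) == count:
            break
    return primaries


sampled = [
    Relay("G1", "10.1", "fam-x"),
    Relay("G2", "10.1", "fam-y"),  # skipped: same /16 as G1
    Relay("G3", "10.2", "fam-x"),  # skipped: same family as G1
    Relay("G4", "10.3", "fam-z"),
]
primaries = pick_primaries(sampled)
print([g.name for g in primaries])

# A last hop that conflicts with one primary still leaves the other usable.
last_hop = Relay("X", "10.1", "fam-q")
print([g.name for g in primaries if not conflicts(g, last_hop)])
```

A last hop whose /16 matches one primary and whose family matches the other would still block both in this model, which is the rare case where rule 3 starts over with a sampled guard.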
This gets #1 and #6, but not strong (if one guard is temporarily down):
1. Hidden service use can't push you over to an unused guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).

It gets strong #3-5:
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)

It does not provide:
2. Hidden service use can't influence your choice of guard (at all).

======================================================================

Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is
* Set "num primary guards"=2 and "num primary guards to use"=2
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops long for these
cases).
* Allow same /16 and same family for HS circuits.
* When a primary guard leaves the consensus, pick a new one.
* If both primary guards are down/not completing circuits, pick a new one.
Strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)

Regular:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.

It gives up on strong #5 to get strong #1 and strong #2, because I don't
see a lot of difference between an HS circuit that uses the same Guard
as the RP vs one that uses the same Guard node for one of the other
side's middle or Guard hops (which we can't prevent).

We don't get strong #6, because if one guard is temporarily down but
still in the consensus and the adversarial RP makes enough circuits fast
enough, it could theoretically notice that the next node is never the
remaining not-down Guard. This window of time can be minimized by more
eagerly switching guards when one of them is unresponsive. It could be
eliminated by using S - G - L2 - L3 - R paths with vanguards (at the
expense of directly exposing service L3 vanguards to the RP, and
creating service linkability).
--
Mike Perry
George Kadianakis
2018-04-26 11:02:49 UTC
Post by Mike Perry
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first
start with enumerating the properties the proposals below provide.
Properties #1-6 have parentheses at the end of them. When the condition
in parentheses is met for property #N, we'll call that "strong #N".
Thanks Mike for this email. I think this moves us forward quite a bit.
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
Can we have a bit more detailed description of the two properties above?
(2) seems like a superset of (1), so making these properties clear would be useful.
Post by Mike Perry
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Also, what does property (4) mean exactly?
Post by Mike Perry
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
Also it might be useful to rate the current guard design with these
properties and see how well we are currently doing.

IIUC, since we use all the primaries for dirguards it provides:
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)

Because of the path restrictions it also provides:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.

It does *not* provide
2. Hidden service use can't influence your choice of guard (at all).
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).

Let me know if I messed it up.

Clearly since everyone in this thread wants to improve the current
situation, the properties the current system lacks are important. In
particular it seems like (2) and (6) are particularly important properties.
Post by Mike Perry
* Remove /16 and family path restrictions between guard and last hop
* Optionally, dir auths don't give you Guard if you're an Exit
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are much more rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
How does it provide 7?
Post by Mike Perry
<snip>
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are much more rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
How come Aaron's proposal provides the same benefits as Roger's even though
they are different? Am I missing something?
Post by Mike Perry
<snip>
Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is
* Set "num primary guards"=2 and "num primary guards to use"=2
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops long for these
cases).
* Allow same /16 and same family for HS circuits.
* When a primary guard leaves the consensus, pick a new one.
We already do this one. Primary guards come from the filtered set, and
filtered set guards need to be listed in the consensus. See
entry_guard_passes_filter(). If this is not the case in reality, it's a bug.
Post by Mike Perry
* If both primary guards are down/not completing circuits, pick a new one.
Hmm, this is almost impossible to do. People with laptops and unstable
networks frequently have both of their primary guards marked as
unreachable while Tor is trying to reach the network. Picking new primaries
at that point would not be a good move.
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
All in all I like the above proposal (modulo the issues above) and I
think it's quite sane, and gets the best of most worlds ;) We should
perhaps think more about it and try to spec it out! :)

Let's see what other people think.
Mike Perry
2018-04-26 10:32:19 UTC
Post by George Kadianakis
Post by Mike Perry
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
Ok, in the interest of getting closer to an adversary model, let's first
start with enumerating the properties the proposals below provide.
Properties #1-6 have parentheses at the end of them. When the condition
in parentheses is met for property #N, we'll call that "strong #N".
Thanks Mike for this email. I think this moves us forward quite a bit.
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
Can we have a bit more detailed description of the two properties above?
(2) seems like a superset of (1), so making these properties clear would be useful.
Yes, if a defense provides #2, then it always provides #1. Also, a
defense can provide #1 without providing #2 (by using two guards equally,
for example).

Or said a different way, an attacker who can break #2 can sometimes use
that to break #1.

To avoid confusion, I don't think we should change the property wording
or numbering until we do another round of proposal comparison, and/or
until people propose new properties that some designs satisfy (or
failed to satisfy).
Post by George Kadianakis
Post by Mike Perry
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
Also, what does property (4) mean exactly?
Property 4 is the best argument for using two guards as opposed to only
fiddling with restrictions. With the current way we handle onionskin
failure (#25347), clients will simply lose connectivity by way of
endless DESTROY responses before making a valid circuit. This means that
the adversary can onionskin-DoS guard nodes one at a time, and wait for
a hidden service to become unresponsive. That is what it means to have a
DoS (or downtime) signal.

Using two guards dumbly makes this rare. Both are down at the same time
by chance much less frequently than one is down, and a two-node DoS
search is harder to pull off when the adversary has to keep pairs of (or
more) nodes offline at the same time, without taking other services
offline and causing false positives.
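
As a rough worked example of the "by chance" part of that argument, assume each guard is down independently with the same probability. Both the independence assumption and the 1% figure below are illustrative only, not measurements, and `p_all_down` is a hypothetical helper name.

```python
def p_all_down(p_down: float, num_guards: int) -> float:
    """Probability that every one of `num_guards` guards is down at once,
    assuming each is down independently with probability `p_down`."""
    return p_down ** num_guards


p = 0.01  # illustrative per-guard downtime probability, not a measurement
for n in (1, 2):
    print(f"{n} guard(s): chance of total downtime = {p_all_down(p, n):.6f}")
```

Under these assumptions the chance-downtime signal shrinks from p to p squared. A targeted DoS is not an independent event, so this only covers accidental downtime.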

Using additional guards as soon as things fail makes these signals
absent, in theory. If a client is always trying to connect to new
guards, as long as the client can connect to the network, it will find a
guard that works pretty soon. This is also another way of using two
guards dumbly, though.
Post by George Kadianakis
Post by Mike Perry
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
Also it might be useful to rate the current guard design with these
properties and see how well we are currently doing.
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
If by current design, you mean the current network as-is without
changing any consensus parameters, then these two aren't provided.

Since the current design is "num primary guards to use"=1, the current
design tries really hard to use only this guard. This means that as soon
as a hidden service chooses that guard as its RP, it will use a second
guard. This second guard is normally unused. Hence: Hidden service use
pushed the service over to an unused guard.
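
The push-over can be sketched in a toy model, assuming a preference-ordered guard list and the existing same-node, /16, and family restriction between guard and last hop. `Relay`, `conflicts`, and `guard_for_circuit` are hypothetical names for illustration, not functions from the Tor source.

```python
from typing import List, NamedTuple, Optional


class Relay(NamedTuple):
    name: str      # hypothetical relay nickname
    subnet16: str  # first two octets of the address
    family: str    # simplification: a single family label per relay


def conflicts(a: Relay, b: Relay) -> bool:
    """True if the relays are the same node or share a /16 or a family."""
    return a.name == b.name or a.subnet16 == b.subnet16 or a.family == b.family


def guard_for_circuit(guards: List[Relay], last_hop: Relay) -> Optional[Relay]:
    """Return the most preferred guard that the path restrictions allow
    for this last hop, or None if every guard conflicts."""
    for guard in guards:
        if not conflicts(guard, last_hop):
            return guard
    return None


g1 = Relay("G1", "10.1", "fam-x")  # the one guard normally in use
g2 = Relay("G2", "10.2", "fam-y")  # normally unused backup
guards = [g1, g2]

exit_relay = Relay("E", "10.9", "fam-e")
print(guard_for_circuit(guards, exit_relay).name)  # ordinary circuit
print(guard_for_circuit(guards, g1).name)          # RP is the guard itself
```

Whoever picks the rendezvous point controls `last_hop`, so in this model they also control when the normally unused G2 gets contacted.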

Similarly, if a website can cause a client to keep connecting through
different circuits over and over (via at least 3 different attacks,
mentioned in the other thread), then it can eventually cause that client
to use a second guard. We want to fix this for other reasons (guard
discovery), but that doesn't change it as a property here. And there may
be more like them if we fix just these three.
Post by George Kadianakis
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
Correct. It does provide these.
Post by George Kadianakis
It does *not* provide
2. Hidden service use can't influence your choice of guard (at all).
4. DoS/Guard node downtime signals are rare (absent)
6. Information about the guard(s) does not leak to the website/RP (at all).
Correct. It does not provide these.
Post by George Kadianakis
Let me know if I messed it up.
Clearly since everyone in this thread wants to improve the current
situation, the properties the current system lacks are important. In
particular it seems like (2) and (6) are particularly important properties.
Post by Mike Perry
* Remove /16 and family path restrictions between guard and last hop
* Optionally, dir auths don't give you Guard if you're an Exit
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are much more rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
How does it provide 7?
Woops, it does not. All it does is prevent the same *node* from being
used in the Guard and Exit position. I mixed that up with an earlier
revision of these properties...
Post by George Kadianakis
Post by Mike Perry
* Use first guard but pad to backup guard so the switch isn't as obvious
* First and backup guard are chosen in different /16's and different families
1. Hidden service use can't push you over to an unused guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are much more rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
7. Relays in the same family can't be forced to correlate Exit traffic.
2. Hidden service use can't influence your choice of guard (at all).
6. Information about the guard(s) does not leak to the website/RP (at all).
How come Aaron's proposal provides the same benefits as Roger's even though
they are different? Am I missing something?
Aaron's proposal actually does provide #7.

The key difference between the two is Roger's "Remove /16 and family
path restrictions between guard and last hop". That causes Roger to lose
#7. They also differ in the Guard+Exit flag assignment, but in this case
that does not change the properties provided, because no node
restrictions are removed.
Post by George Kadianakis
Post by Mike Perry
<snip>
Ok, so here's a proposal that gets strong #1-4, and regular #5-7. It is
* Set "num primary guards"=2 and "num primary guards to use"=2
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops long for these
cases).
* Allow same /16 and same family for HS circuits.
* When a primary guard leaves the consensus, pick a new one.
We already do this one. Primary guards come from the filtered set, and
filtered set guards need to be listed in the consensus. See
entry_guard_passes_filter(). If this is not the case in reality, it's a bug.
Good.
Post by George Kadianakis
Post by Mike Perry
* If both primary guards are down/not completing circuits, pick a new one.
Hmm, this is almost impossible to do. People with laptops and unstable
networks frequently have both of their primary guards marked as
unreachable while Tor is trying to reach network. Picking new primaries
at that point would not be a good move.
Yuck. Well, minimizing this time/chance perhaps. Like if the client has
a TLS connection but both are failing all onionskins, then choose a
third?
Post by George Kadianakis
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
All in all I like the above proposal (modulo the issues above) and I
think it's quite sane, and gets the best of most worlds ;) We should
perhaps think more about it and try to spec it out! :)
I would prefer a proposal that has strong #6, but I think we are close
to that. All we need to do is prevent the case where "one guard down &&
guards can be chosen next to the RP."

Right now, I am leaning towards a hack that says "Vanguards can choose a
guard before the RP." We'd still be S - G - L2 - L3 - G - RP in that
case, though. As I said, an alternative is S - G - L2 - L3 - RP, but I
think I would rather preserve unlinkability for services run on the same
Tor client. A third alternative is trying to minimize the "only one
guard down" time. Such downtime minimization does seem tricky, though.

I would also like to try to beef up #4 as much as we can. If we can't
make all node downtime signals absent, we should aim to minimize them.
--
Mike Perry
teor
2018-04-26 22:16:47 UTC
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's
not clear which property above corresponds to these properties:

* Is Tor reliable and responsive when guards go down, or when I move
networks, or when I have lost and regained service?

I also think it's missing an implicit property, which we should make explicit:

* Can Tor users be fingerprinted by their set of guards or directory guards?

Perhaps this property is out of scope.

T
Mike Perry
2018-04-27 10:43:27 UTC
Post by teor
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's
* Is Tor reliable and responsive when guards go down, or when I move
networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue.
If (any of) a client Guard(s) are down, and the adversary can detect
this based on client behavior, well, that is a side channel signal that
provides information about the Guard. So by satisfying #4, we also
satisfy the weaker conditions of general reliability and responsiveness.
Post by teor
* Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do
another round of evaluation.

But remember that we are already in the situation where Tor is using two
guards for a lot of (or all) users right now: it uses a second guard right
now whenever an RP or Exit is the same as the Guard node, or is chosen
from the same /16 or family as the Guard node. Depending on how unlucky
you are, you could be using 2 guards pretty often right now. Just not
often enough to benefit from any multiplexing and netflow padding.

Tor also currently uses 3 directory guards, and unless we set "num entry
guards to use" and "num entry guards" to the same number, these are
different nodes than the primary guard. Miraculously, if we set this to
two, then Tor uses those two primary guards *as* its directory guards.
This means that any proposal that said "Set these to 2" has *less*
fingerprinting than those that did not. My proposal was the only one
that explicitly said this, but I think asn wants this too.
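To make the fingerprinting comparison concrete, here is a small Python sketch. It rests on two loud assumptions that are not from this thread: that guard sets are drawn uniformly from a pool of `total_guards` relays (Tor actually weights selection by bandwidth, so the real numbers will differ), and that the pool size of 2000 is just an illustrative figure. The function names are hypothetical.

```python
from math import comb, log2


def distinct_first_hops(num_primary, num_dir, dir_reuses_primary):
    """Number of distinct relays the client talks to directly.

    If directory guards are the same nodes as the primary guards, the
    sets overlap; otherwise they are assumed to be disjoint.
    """
    if dir_reuses_primary:
        return max(num_primary, num_dir)
    return num_primary + num_dir


def fingerprint_bits(total_guards, set_size):
    """log2 of the number of possible guard sets of the given size.

    Assumes uniform selection, which is a simplification.
    """
    return log2(comb(total_guards, set_size))


POOL = 2000  # hypothetical number of Guard-flagged relays

# One primary guard plus three separate directory guards.
today = distinct_first_hops(num_primary=1, num_dir=3, dir_reuses_primary=False)
# Both settings at 2, with the primary guards doubling as directory guards.
proposed = distinct_first_hops(num_primary=2, num_dir=2, dir_reuses_primary=True)

bits_today = fingerprint_bits(POOL, today)
bits_proposed = fingerprint_bits(POOL, proposed)
```

Under these assumptions `today` is 4 and `proposed` is 2, so `bits_proposed` is smaller than `bits_today`. That is the sense in which setting both values to 2 gives a smaller guard fingerprint than one primary guard with three separate directory guards.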

That means if we accept the proposal at the end of my mail, which gets
us strong #1-4, non-strong #5, strong #6 (with mods), and #7, then we'll
have less guard fingerprintability than today.
--
Mike Perry
Mike Perry
2018-04-27 11:56:35 UTC
Post by Mike Perry
Post by teor
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's
* Is Tor reliable and responsive when guards go down, or when I move
networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue.
If (any of) a client Guard(s) are down, and the adversary can detect
this based on client behavior, well, that is a side channel signal that
provides information about the Guard. So by satisfying #4, we also
satisfy the weaker conditions of general reliability and responsiveness.
Post by teor
* Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do
another round of evaluation.
Alright, for the sake of argument, let's call this Property #8:
8. Less information from guard fingerprinting (the least information)

I argue that this #8 is also equivalent to a #9 that Roger would ask
for:
9. Fewer points of observation into the network (the fewest points).

To avoid TL;DR, that argument is an exercise to the reader ;).

Here is a proposal that beats my previous proposal on Property #8 and
#9, while trying to preserve as many of the other properties as
possible:

* Set "num primary guards"=1 and "num primary guards to use"=1
* Set "num directory guards"=1 and "num directory guards to use"=1
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops).
* Allow same /16 and same family for HS circuits.
* When a primary guard leaves the consensus, pick a new one.
* When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.

This proposal gets strong:
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
8. Less information from guard fingerprinting (the least information)

It loses #4 (and your reliability point above), because if we transition
to a second guard too quickly when the first one starts failing, then we
lose the winning fingerprinting property we want to keep. Therefore,
we must tolerate failure and RESOURCELIMIT issues and suffer
through connectivity issues during DoS:
4. DoS/Guard node downtime signals are rare (absent)

It then gets us regular:
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.

And again, we could get strong #6 if we allow the guard node for both RP
and the node before the RP:
6. Information about the guard(s) does not leak to the website/RP (at all).


So the key thing (in this property list) that forcing one guard causes us
to lose is reliability under DoS, which is a guard discovery vector (and
probably a source of other side channels, too).
--
Mike Perry
George Kadianakis
2018-05-03 11:14:02 UTC
Post by Mike Perry
Post by Mike Perry
Post by teor
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's
* Is Tor reliable and responsive when guards go down, or when I move
networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue.
If (any of) a client Guard(s) are down, and the adversary can detect
this based on client behavior, well, that is a side channel signal that
provides information about the Guard. So by satisfying #4, we also
satisfy the weaker conditions of general reliability and responsiveness.
Post by teor
* Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do
another round of evaluation.
8. Less information from guard fingerprinting (the least information)
I argue that this #8 is also equivalent to a #9 that Roger would ask
9. Fewer points of observation into the network (the fewest points).
If we are actually aiming for 8 and 9 we need to do something about the
numdirguard=3 situation, otherwise we still have a huge guard fpr and we
still expose ourselves to more of the network even if we keep one guard.
Post by Mike Perry
To avoid TL;DR, that argument is an exercise to the reader ;).
Here is a proposal that beats my previous proposal on Property #8 and
#9, while trying to preserve as many of the other properties as
* Set "num primary guards"=1 and "num primary guards to use"=1
* Set "num directory guards"=1 and "num directory guards to use"=1
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops).
* Allow same /16 and same family for HS circuits.
Is this for all hops? So all service-side HS circ hops can share the same
family? I guess that's OK since we don't know what's happening on the
other side of the HS circuit anyhow? Or what?
Post by Mike Perry
* When a primary guard leaves the consensus, pick a new one.
* When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.
What is the $MAGIC_FAILURE_HEURISTIC supposed to do? Also I doubt we can
do anything magic here, we even have trouble doing very naive stuff when
it comes to network-uptime response.
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
8. Less information from guard fingerprinting (the least information)
It loses #4 (and your reliability point above), because if we transition
to a second guard too quickly when the first one starts failing, then we
lose the winning fingerprinting property we want to keep. So then
therefore, we must tolerate failure and RESOURCELIMIT issues and suffer
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
And again, we could get strong #6 if we allow the guard node for both RP
6. Information about the guard(s) does not leak to the website/RP (at all).
So the key thing (in this property list) that forcing one guard causes us
to lose is reliability under DoS, which is a guard discovery vector (and
probably a source of other side channels, too).
Mike Perry
2018-05-02 20:16:21 UTC
Post by George Kadianakis
Post by Mike Perry
Post by Mike Perry
Post by teor
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
I think this list is missing some important user-visible properties, or it's
* Is Tor reliable and responsive when guards go down, or when I move
networks, or when I have lost and regained service?
I think this is implicitly provided by #4. Downtime is a security issue.
If (any of) a client Guard(s) are down, and the adversary can detect
this based on client behavior, well, that is a side channel signal that
provides information about the Guard. So by satisfying #4, we also
satisfy the weaker conditions of general reliability and responsiveness.
Post by teor
* Can Tor users be fingerprinted by their set of guards or directory guards?
Perhaps this property is out of scope.
I think it is worth considering. We should add it if we need to do
another round of evaluation.
8. Less information from guard fingerprinting (the least information)
I argue that this #8 is also equivalent to a #9 that Roger would ask
9. Fewer points of observation into the network (the fewest points).
If we are actually aiming for 8 and 9 we need to do something about the
numdirguard=3 situation, otherwise we still have a huge guard fpr and we
still expose ourselves to more of the network even if we keep one guard.
Yeah. Hrmm. I suppose this is a way that property #8 differs from
property #9... The dirguard usage increases fingerprinting, but if
observation for #9 means "observation of relayed application traffic",
then not setting the dirguards to 1 costs us #8, but not #9.
Post by George Kadianakis
Post by Mike Perry
To avoid TL;DR, that argument is an exercise to the reader ;).
Here is a proposal that beats my previous proposal on Property #8 and
#9, while trying to preserve as many of the other properties as
* Set "num primary guards"=1 and "num primary guards to use"=1
* Set "num directory guards"=1 and "num directory guards to use"=1
* Don't give Exit nodes the Guard flag.
* Allow "same node, same /16, same family" between guard and last hop,
but only for HS circuits (which are at least 4 hops).
* Allow same /16 and same family for HS circuits.
Is this for all hops? So all service-side HS circ hops can share the same
family? I guess that's OK since we don't know what's happening on the
other side of the HS circuit anyhow? Or what?
Yeah, that was my reasoning for defining property #7 in terms of Exit
traffic only. There may be alterations of this that prevent the same
family from being in every position of one end of the circuit, but since
we can't prevent the case where the same family is on both entry points
across the entire HS connection to correlate the entire circuit, I am
not sure how to define this property.

Maybe there is a difference if the same family is allowed to be the IP
and HSDIR, though, since that could allow forced correlation to
deanonymize the HS itself... We could consider preventing that. With one
guard, it definitely will leak information about the choice of IPs over
time, though, which is worse (and is the case today :/). With two guards
chosen from different families and /16, it should be fine with respect
to chosen IPs and used HSDIRs, except in the event that one guard's
downtime happens at the same time as an IP or HSDIR is chosen
from the same family as the still-up guard. This is a much more rare and
less risky event than the similar situation with an RP, though (since
the RP cycles frequently and can be adversary controlled).
Post by George Kadianakis
Post by Mike Perry
* When a primary guard leaves the consensus, pick a new one.
* When a primary guard fails circuits, do $MAGIC_FAILURE_HEURISTIC.
What is the $MAGIC_FAILURE_HEURISTIC supposed to do? Also I doubt we can
do anything magic here, we even have trouble doing very naive stuff when
it comes to network-uptime response.
In order to preserve property #8 (and #9), this failure heuristic has to
try really hard not to quickly switch over to the second guard as soon
as there is a RESOURCELIMIT or other failure. It needs to be "sure" that
the guard is really down. This means waiting for some number of
RESOURCELIMITs or other failures to happen before the switch to the
second guard, which necessarily introduces some level of downtime
signal, which costs us property #4. (We already have decided in
https://trac.torproject.org/projects/tor/ticket/25347 that it is
preferable to accept large amounts of RESOURCELIMITs before switching
guards.)
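The trade-off described above can be sketched as a simple failure counter. This is only an illustration in Python, not Tor's implementation and not a concrete proposal for $MAGIC_FAILURE_HEURISTIC: the `GuardFailureTracker` class and its `threshold` parameter are hypothetical, and the thread does not specify any particular number of failures.

```python
class GuardFailureTracker:
    """Counts consecutive circuit failures on the current guard.

    `threshold` is a hypothetical tunable: how many failures in a row
    (RESOURCELIMIT or otherwise) we tolerate before moving on.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record_success(self):
        # Any working circuit is evidence the guard is not really down.
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1

    def should_switch(self):
        # Only switch once we are "sure" the guard is down.
        return self.consecutive_failures >= self.threshold


tracker = GuardFailureTracker(threshold=3)
tracker.record_failure()
tracker.record_failure()
still_waiting = not tracker.should_switch()  # two failures: keep the guard
tracker.record_failure()
gave_up = tracker.should_switch()            # third failure: switch
```

A high `threshold` keeps the client on one guard longer (good for #8 and #9) but means more failed circuits before recovery, which is the downtime signal that costs #4. A low `threshold` does the reverse.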

That was the point of this proposal -- I wanted to demonstrate that with
only one guard, we basically have to accept either a louder downtime
signal, or we have to accept cases where we use two guards more often.

I still believe that two always-on guards is the better choice (and
gives us more flexibility with alternate ways to handle things like
family restrictions above), but I also wanted to compare apples to
apples in terms of one guard vs two guard proposals.
Post by George Kadianakis
Post by Mike Perry
1. Hidden service use can't push you over to an unused guard (at all).
2. Hidden service use can't influence your choice of guard (at all).
3. Exits and websites can't push you over to an unused guard (at all)
8. Less information from guard fingerprinting (the least information)
It loses #4 (and your reliability point above), because if we transition
to a second guard too quickly when the first one starts failing, then we
lose the winning fingerprinting property we want to keep. So then
therefore, we must tolerate failure and RESOURCELIMIT issues and suffer
4. DoS/Guard node downtime signals are rare (absent)
5. Nodes are not reused for Guard and Exit positions ("any" positions)
6. Information about the guard(s) does not leak to the website/RP (at all).
7. Relays in the same family can't be forced to correlate Exit traffic.
And again, we could get strong #6 if we allow the guard node for both RP
6. Information about the guard(s) does not leak to the website/RP (at all).
So the key thing (in this property list) that forcing one guard causes us
to lose is reliability under DoS, which is a guard discovery vector (and
probably a source of other side channels, too).
--
Mike Perry
George Kadianakis
2018-05-09 09:08:32 UTC
Post by Mike Perry
Post by Mike Perry
Heyo.
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
1. Use patches in https://trac.torproject.org/projects/tor/ticket/25843
to set NumEntryGuards=2 in torrc, and observe results. Please join us!
Stuff we are looking for during testing is on that ticket!
2. Merge that patch to make the torrc guard options do what we meant for
them to do. Probably backport it.
3. Describe adversary models for our variant proposals from the notes.
(Why do we disagree? In Mike's case, my disagreements are because I
think each step is an improvement over previous/status quo -- we can
decide harder things later and still do better both now and later.)
4. Agree on an order of operations for fixes+changes, ideally such that we
don't block forever trying to come up with a perfect solution. Things
are pretty bad now. All we really need to do is agree on steps to make
it better.
<snip>
#1: ourselves set those guard params to 2 and find bugs. once #3 below is done, encourage others, like on tor-talk, to do it too.
#2: enumerate the current situations where we use a guard other than our first guard, especially noting the ones where the attacker can make us use a guard other than our first guard. fix as many as we want to fix. maybe categorize by whether they cause us to mark our first guard as down or not.
OK, I did a bit of #2 yesterday as part of an IRC discussion with Mike
and Roger. In particular, I attempted to enumerate the places in our
codebase where we mark a guard as unreachable and hence skip it for
future circuits.

The key functions here are entry_guard_failed() and entry_guard_chan_failed().
These are called in the following places:

1) circuit_build_failed(): We blame the guard if there was an error
during path building when we don't have the first hop open on the
circuit yet. We don't blame the guard for errors during path
selection.

2) connection_dir_request_failed(): We blame the guard if we fail to
connect to a dirserver because of network error.

3) connection_or_about_to_close(): We blame the guard when we are
closing an OR connection that started at us but never made it to
state open. We do this because otherwise we would keep beating our
heads against a broken guard.

4) connection_or_client_learned_peer_id(): We blame the guard when we
receive the wrong RSA identity key from the guard during the TLS handshake.

The first 3 cases here seem to handle the cases of network errors and
unreachable guards. It's interesting how we have to handle this case in
three different places. I wonder if we are missing any other places here.

The last case seems to handle the case of network MITM attacks. I don't
see anything wrong with that, since encountering an MITM certainly means
that something bad is going on, and an MITM adversary could also
cause one of the first 3 cases.
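For anyone who wants to continue the categorization from item #2, the enumeration above can be written down as a small table. This Python sketch only restates the four call sites listed in this mail; the category labels "network" and "mitm" are my own shorthand, and `sites_by_category` is a hypothetical helper, not anything in the Tor codebase.

```python
# Call sites of entry_guard_failed() / entry_guard_chan_failed() as
# enumerated above, tagged with a shorthand category (labels are assumptions).
BLAME_SITES = {
    "circuit_build_failed": "network",
    "connection_dir_request_failed": "network",
    "connection_or_about_to_close": "network",
    "connection_or_client_learned_peer_id": "mitm",
}


def sites_by_category(sites):
    """Group call-site names by their category label."""
    grouped = {}
    for name, category in sites.items():
        grouped.setdefault(category, []).append(name)
    return grouped


grouped = sites_by_category(BLAME_SITES)
```

Here `grouped["network"]` holds the first three functions and `grouped["mitm"]` holds the last one. Any further call sites that turn up could be added to the table with their own label.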
