Discussion:
[tor-dev] The case for Tor-over-QUIC
Mike Perry
2018-03-23 23:18:47 UTC
Permalink
In Rome, I held a session about network protocol upgrades. My intent was
to cover the switch to two guards, conflux, datagram transports, and
QUIC. We ended up touching only briefly on everything but QUIC, but we
went into enough depth on QUIC itself that it was a worthwhile and very
productive session.

Our notes are here:
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/FutureTorNetworkProtocolUpgrades

I wanted to give a bit of background and some historical perspective
about datagram transports for Tor, as well as explain those notes in
long form, to get everybody on the same page about this idea. With
the forthcoming IETF standard ratification of QUIC along with several
solid reference implementations (canonical list:
https://github.com/quicwg/base-drafts/wiki/Implementations), I believe
we are close to the point where we can finally put together a plan (and
a funding proposal) to undertake this work.

Consider this mail a pre-proposal to temperature check and solicit early
feedback about a Tor-over-QUIC deployment, before we invest the effort
to deep dive into the framing, protocol, and implementation details that
will be necessary for a full proposal.


# Historical Background

So, first: Tor has been considering UDP transports for a long time.
There have been a series of research papers on the topic, dating back to
Proposal 100 -- the very first Tor proposal:
https://gitweb.torproject.org/torspec.git/tree/proposals/100-tor-spec-udp.txt

That proposal only examines adding an unreliable DTLS link between
relays, to enable transmission of unreliable (UDP) traffic end-to-end.
Despite being marked as Dead, it is a good conceptual starting point,
because it examines the relay-to-relay link crypto properties we need
from a datagram layer and has some good comments at the bottom.

However, adding the ability to simply carry UDP is not sufficient. What
we have always really wanted is to transition the network to some form
of end-to-end drop-based reliable congestion and flow control. There are
numerous benefits from such a transition.

Drop-signaled end-to-end congestion control means that instead of having
to maintain arbitrary queues at relays (to make room for Tor's fixed
SENDME window sizes for every single Tor circuit -- which as we have
seen does not scale and is vulnerabe to DoS and OOM attacks), we can
instead use a global fixed-length queue at each relay, and have each
relay simply drop packets when this queue is full. This behavior mirrors
how the internet works, and will not only put an upper bound on the
end-to-end queuing latency of the Tor network, but it will also allow us
to better ensure fairness, eliminate OOM conditions, and mitigate DoS
and congestion attacks. QUIC's stream management and byte-acked teardown
mechanisms will also eliminate a class of sidechannel issues in Tor
where an adversary gets to send ignored data on invalid and/or recently
closed streams: https://trac.torproject.org/projects/tor/ticket/25573

Furthermore, the switch to a secure datagram link protocol will
eliminate RST spamming attacks, where an adversary spoofs RSTs to/from
relay IP addresses to tear down our TLS links to gain information from
the resulting circuit failures and retries across the network.

Prior to the advent of QUIC, we studied many options for achieving
end-to-end congestion control, but have found all of them to be lacking
in one way or another. In 2011, Steven Murdoch did a literature review
of these approaches:
https://research.torproject.org/techreports/datagram-comparison-2011-11-07.pdf

Section 4 of that review is the most useful thing to skim -- it explains
why we found each of the end-to-end transport modes available at the
time to be either insufficient for Tor's use, or encumbered by licensing
or engineering incompatibilities.


# QUIC Overview

QUIC, however, is different. It basically serves as a complete drop-in
replacement for Tor's n:1 streams-on-circuit connection model. QUIC has
a connection layer, which corresponds to Tor's circuit concept, and a
framing layer, which supports multiplexing for multiple reliable streams
inside a single connection (corresponding to Tor's streams).

Each QUIC connection provides an outer layer of drop-based congestion
control. Unlike latency-based congestion control like bittorrent's utp,
which was designed to compensate for poor queuing management at edge
routers (as well as yield to TCP flows), QUIC's drop-based congestion
control is designed to find the common bottleneck anywhere on the
connection's path, and use packet drops from that queue to signal
congestion and ensure fairness among connections at that bottleneck
point. QUIC also provides optimizations to handle higher latency and
drop-heavy links (SACK, SACK ranges, early retransmit, reordering
tolerance, RTT estimates):
https://quicwg.github.io/base-drafts/draft-ietf-quic-recovery.html

Additionally, QUIC provides flow control at the connection level, as
well as at the individual stream level. This is analogous to Tor's flow
control model, but unlike Tor, QUIC implementations can adjust these
window sizes automatically based on RTT measurements and congestion
feedback:
https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-3.4
https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport#section-11
https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pgjl4

It is perhaps no small coincidence that QUIC's architecture fits so
perfectly into Tor's circuit model. While at an IETF meeting years ago,
an attendee quietly whispered in my ear, "You know, we're designing IETF
QUIC with Tor-like networks in mind." I always love me a good prospiracy
(https://www.urbandictionary.com/define.php?term=prospiracy ;).


# Using QUIC in Tor

While QUIC does indeed solve a great many problems for Tor out of the
box, the path to Tor-over-QUIC requires that we make a few design and
engineering decisions specific to Tor.

There are two research implementations that study using Tor-over-QUIC:
Quux and QuTor. Quux was implemented by wrapping Chromium's C++ QUIC
implementation in C bindings:
https://www.benthamsgaze.org/2016/09/29/quux-a-quic-un-multiplexing-of-the-tor-relay-transport/
https://www.benthamsgaze.org/wp-content/uploads/2016/09/393617_Alastair_Clark_aclark_2360661_860598830.pdf

QuTor is unfortunately paywalled at researchgate, so it is hard for me
to tell what they did:
https://www.researchgate.net/profile/Raik_Aissaoui/publication/292392094_QUTor_QUIC-based_Transport_Architecture_for_Anonymous_Communication_Overlay_Networks/links/56ae170008ae43a3980e6890.pdf?origin=publication_list

Unfortunately, the Quux implementation appears to use QUIC at a
suboptimal position -- they replace Tor's TLS connections with QUIC, and
use QUIC streams in place of Tor's circuit ID -- but only between
relays. This means that it does not benefit from QUIC's end-to-end
congestion control for the entire path of the circuit. Such a design
will not solve the queuing and associated OOM problems at Tor relays,
since relays would be unable to drop cells to signal backpressure to
endpoints. Drops will instead block every circuit on a connection
between relays, and even then, due to end-to-end reliability, relays
will still need to queue without bound, subject to Tor's current (and
inadequate) flow control.

The optimal design is to use QUIC end-to-end. (Based on the researchgate
abstract, it looks like this might be what QuTor did, but it is hard to
tell.) Instead of replacing TLS with QUIC, we should use an unreliable
link encryption protocol between relays (such as DTLS), and replace
Tor's circuits with QUIC connections. In this design, Tor applications
connect to the local SOCKS port using TCP, and the tor client converts
it to a QUIC stream on a QUIC connection-circuit that runs all the way
to the exit or onion service, which then will convert the stream back to
TCP. The exit/service keeps buffering to a minimum by using QUIC
pushback on the Tor side, and TCP pushback on the server side.

While IETF QUIC provides its own crypto layer based on TLS 1.3, we won't
really need this encryption. Instead, we will layer QUIC inside of Tor's
layered onion encryption, with an ntor handshake to each hop in the
path.


# QUIC Deployment Considerations

Past the above high-level decision of how to use QUIC, a handful of
deployment considerations remain.

## Implementation

As mentioned previously, there are a growing number of QUIC
implementations out there, in several languages, with varying degrees of
IETF compatibility:
https://github.com/quicwg/base-drafts/wiki/Implementations

For our purposes, the most interesting ones are ngtcp2 (native C
implementation, BoringSSL, MIT licensed), picoquic (C, MIT license), and
mozquic (C++ with C bindings, NSS, MPL licensed). The QUIC in Chromium
is not IETF compliant, and is C++ with no C bindings. Sadly Rust
implementations of QUIC are not complete. Google does also maintain a Go
QUIC, which is complete. This suggests that experimental designs using
one of the golang Tor implementations might be worthwhile.

I'm told that the IETF hopes to finalize the QUIC standard within the
year.

## Framing

QUIC actually differentiates between connection-level packet sizes and
frame sizes. If we want to preserve Tor's wire behavior, we will want to
enforce a one-frame-per-packet policy from our QUIC library, and use
QUIC's padding frames to make packet sizes uniform.

## Upgrade Path

The upgrade path to QUIC will require some thorough thinking -- if all
relays on a path don't yet support datagram links, should we embed QUIC
inside an opaque RELAY cell type, or should wait for a flag day to
switch over once sufficient relays have upgraded? Embedding QUIC in
opaque RELAY cells seems like the natural choice, but will require
additional engineering considerations.

Additionally, a small and declining percentage of networks currently
block all UDP other than DNS. At last measurement, the rate of QUIC
failure was below 5%. However, like Google, we will likely want to
retain the ability to fall back to TLS over TCP for these situations.

## Signaling

Because we need to transition to unreliable relay links to get the most
out of QUIC, we will need to figure out how to handle circuit setup --
there is currently no mechanism to ensure reliability for onionskins in
Tor, since this is provided by TLS today. The simplest solution is to
retain TLS between relays to transmit circuit extend cells out-of-band.
Padding could be used to obscure timing information of this signaling.

The alternative would be to implement some onionskin-level acking
between relays, so that the datagram transport could be used for all
traffic. However, since our deployment path will likely involve
migration from TCP anyway, and since a small (and shrinking) percentage
of networks simply block all non-DNS traffic, we will likely want to
retain TCP connections for some time, making out-of-band signaling a
natural first choice.

## Native UDP Support

Unfortunately, during the standardization process, the IETF QUIC decided
to drop support for unreliable datagram streams inside the QUIC
connection, to simplify the layering of their implementation. However,
because we will still need to layer QUIC inside of an onion layer, we
can use that layer to carry either QUIC or UDP (at the expense of an
additional field to differentiate this).


# QUIC Research Questions

A couple areas of open research into how QUIC will behave in a Tor
deployment remain. It is my belief that while these areas are worth
further study, they do not represent blockers to QUIC, as a naive
deployment of QUIC will substantially outperform our current (lack of)
congestion and poor flow control.

## End-to-end Congestion Control Study

Unfortunately, it appears that the Quux QUIC paper studied QUIC at the
wrong position - between relays, and the QuTor implementation is
unclear. This means that it may still be an open question as to if
QUIC's congestion control algorithms will behave optimally in a Tor-like
network under heavy load.

However, Google has conducted extensive internet-scale research into the
congestion control algorithm, which very likely covers enough networks
with Tor-like latency and drop characteristics. Because of this, I do
not expect us to have to do a lot of tuning of their congestion control
here, but it is worth investigating.

## Queue Management

Fairness among flows in drop-based congestion control is heavily
influenced by how the queue is managed, and particularly by the decision
of which packet to drop when the queue is full. Modern favored
algorithms involve probabilistic early dropping from the middle of the
queue. The most popular approaches are the adaptive variants of RED and
Blue:
https://en.wikipedia.org/wiki/Random_early_detection
https://en.wikipedia.org/wiki/Blue_(queue_management_algorithm)

The simplest possible queue management design would be to set Tor's
queues to 0, and allow the OS kernel or upstream router to apply its
drop strategy. While this will be sufficient to ensure backpressure under
congestion, it is not optimal for fairness.

To enforce fairness, Tor would want to take information about QUIC
connection IDs into account when dropping packets, similar to what is
done with circuit ids by EWMA currently. This information will be
unavailable to the kernel and upstream routers, due to the DTLS link
encryption that will conceal the multiplexing of QUIC connections on
connections to relays. Moreover, ensuring that sufficient queuing
happens inside the Tor daemon (as opposed to the kernel) will require an
aggressive socket-reading strategy. Tor's single-threaded nature will
make this difficult -- to ensure that queuing happens inside the Tor
daemon, packets must be continuously read from the kernel with minimal
processing delay or delay from other events. Even in this case,
optimality will still depend upon the bottleneck being Tor, and not in
the kernel or in an upstream internet router. Thus, optimal queue
management in Tor for QUIC remains an open research and engineering
problem.

(However, this is not really a deployment blocker. The deployment of
QUIC on the internet has similar shortcomings -- the vast majority of
internet routers also do not take QUIC connection ID into account when
making UDP drop decisions, and yet the protocol still outperforms TCP in
link utilization and drop recovery).

## Anonymity

Since a Tor with bounded-length queues will have more predictable
latency characteristics, a faster Tor based on QUIC may be more
vulnerable to inferences about location that can be made by connection
latency. Particularly, middle nodes may be able to tell how far the
client is from the guard by measuring RTT. One solution to this may be
adding delay in cases where RTT could be inferred. Another solution is
to have two middle nodes for four hop paths, so that it is no longer
clear which middle node you are in a QUIC circuit.

There may be other attacks that come out of using end-to-end congestion
control, but I am personally having a hard time coming up with cases
where such side channels and resource starvation conditions could be any
worse than those that already exist, or are unique to the OOM and DoS
conditions currently possible due to Tor's lack of congestion control.
--
Mike Perry
Fabio Pietrosanti - Lists
2018-03-26 09:52:46 UTC
Permalink
I would like to add and advocate that, whatever a network protocol
upgrade will be done, Tor should starts supporting NAT traversal for
it's relay, enabling users to contribute also without a "public ip
address" .

Enabling NATted users to become Tor Relay would increase the baseline of
contributors.


Protocols for NAT Traversal are used since +10 years in VoIP network
stacks successfully:

ICE - Internet Connectivity Establishment

https://www.ietfjournal.org/interactive-connectivity-establishment/

NAT Traversal Practices for Client-Server SIPĀ 

https://tools.ietf.org/html/rfc6314

Fabio
Post by Mike Perry
In Rome, I held a session about network protocol upgrades. My intent was
to cover the switch to two guards, conflux, datagram transports, and
QUIC. We ended up touching only briefly on everything but QUIC, but we
went into enough depth on QUIC itself that it was a worthwhile and very
productive session.
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/FutureTorNetworkProtocolUpgrades
Rob Jansen
2018-03-27 16:37:33 UTC
Permalink
Thanks for the detailed write-up Mike! Theoretically, moving to QUIC seems great; it seems to solve a lot of problems and has enough advantages that we could just run with it.

I'll reiterate some of my primary concerns that I gave in Rome:

- I think it would be a mistake to push forward with such a significant change to Tor's transport layer without significant testing and experimentation. We have tools that allow us to do full-network-sized experiments (Shadow), and we have interested researchers that want to help (including me).

- However, I am much less optimistic than you that it will just work and instantly improve performance. You mention that Google has done lots of tests, but my guess is they test in environments that look like the Internet - i.e., fast core and slow edges. Do we know how it would perform when the path contains additional 6 edges sandwiching 3 potentially low bandwidth Tor relays? Tor is a significantly different environment than the Internet; for example, an end-to-end congestion signal in Tor will take orders of magnitude longer to reach the client than in traditional networks.

- Because of the above, I'm not sure that an end-to-end design is the right way to go. As I mentioned, we have simulators and researchers, so we should be able to learn more and make a more informed decision before committing to a design that will be difficult to change later.

- We should be sure to pay close attention to how this will affect emerging networks and applications, e.g., mobile devices and onion services.

- The DoS attacks will change form, but I don't think they will disappear. I think it would be wise to understand how DoS might change, which is much easier once we have a design to analyze. Your summary helps with that.

I think it would be worth including R&D effort to investigate these issues in any proposal that gets written.

Cheers,
Rob
Post by Mike Perry
In Rome, I held a session about network protocol upgrades. My intent was
to cover the switch to two guards, conflux, datagram transports, and
QUIC. We ended up touching only briefly on everything but QUIC, but we
went into enough depth on QUIC itself that it was a worthwhile and very
productive session.
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/FutureTorNetworkProtocolUpgrades
I wanted to give a bit of background and some historical perspective
about datagram transports for Tor, as well as explain those notes in
long form, to get everybody on the same page about this idea. With
the forthcoming IETF standard ratification of QUIC along with several
https://github.com/quicwg/base-drafts/wiki/Implementations), I believe
we are close to the point where we can finally put together a plan (and
a funding proposal) to undertake this work.
Consider this mail a pre-proposal to temperature check and solicit early
feedback about a Tor-over-QUIC deployment, before we invest the effort
to deep dive into the framing, protocol, and implementation details that
will be necessary for a full proposal.
Mike Perry
2018-03-28 19:23:41 UTC
Permalink
Post by Rob Jansen
Thanks for the detailed write-up Mike! Theoretically, moving to QUIC
seems great; it seems to solve a lot of problems and has enough
advantages that we could just run with it.
- I think it would be a mistake to push forward with such a
significant change to Tor's transport layer without significant
testing and experimentation. We have tools that allow us to do
full-network-sized experiments (Shadow), and we have interested
researchers that want to help (including me).
I am not trying to argue against doing the research. My goal was to make
enough of a case for QUIC that we can begin work on an implementation
and tune it as we study it, or at least identify the minimum set of
experiments that are needed before we could commit to such a path.

I expect us to have to tune things like queue length, queue management,
slow start, reordering parameters, drop recovery strategies, and the
backoff rates when drops happen. QUIC is also extensible enough such
that things like explicit congestion notification and link capacity
estimates can be used to try to avoid drops (though we would need to do
this at the onion layer rather than the QUIC layer, because intermediate
relays will not be able to add in ECN information in-band in our use of
QUIC, due to onion crypto):
https://tools.ietf.org/html/draft-johansson-quic-ecn-00

Because of this flexibility, I would be very surprised to discover that
QUIC proves impossible to tune in order to outperform our current lack
of congestion control.
Post by Rob Jansen
- However, I am much less optimistic than you that it will just work
and instantly improve performance. You mention that Google has done
lots of tests, but my guess is they test in environments that look
like the Internet - i.e., fast core and slow edges. Do we know how it
would perform when the path contains additional 6 edges sandwiching 3
potentially low bandwidth Tor relays? Tor is a significantly
different environment than the Internet; for example, an end-to-end
congestion signal in Tor will take orders of magnitude longer to
reach the client than in traditional networks.
In drop-based congestion control, the duration of how long the drop
signal takes to reach the client is not a function of where the drop
happens. It is a function of the total RTT of the path. A drop early on
the path takes just as long to discover as one burried in the middle.

As a result, higher RTT latency does impact drop-based schemes quite
heavily (and the higher the drop rates, the worse this gets), but Tor's
latency is only orders of magnitude greater than the internet because of
queuing. If our queues can be bounded, then the latency multiplier
should be proportional to the number of Tor hops (and the average
physical distance of these paths).
Post by Rob Jansen
- Because of the above, I'm not sure that an end-to-end design is the
right way to go. As I mentioned, we have simulators and researchers,
so we should be able to learn more and make a more informed decision
before committing to a design that will be difficult to change later.
I suppose that before we undertake or commit to a full implementation, a
couple of basic experiments could inform us as to if Tor's latency and
drop characteristics might severely impact vanilla QUIC performance.

1. What drop rates do fully-utilized QUIC networks tend to see? Are
QUIC's backoff and recovery properties sufficient such that packet loss
will remain reasonable under heavy use? Is this drop rate a function of
the number of concurrent QUIC connections or other network properties?
(I bet this information is known by groups studying QUIC and similar
congestion control schemes, but I am not finding it with casual
searching. TCP folk lore says "drop rate increases as concurrent
connections increases" but I can't find concrete relationships.)

2. Given the above information about the level of drop rates that
fully-utilized QUIC networks see under what circumstances, we can then
conduct an experiment to inform us of what fairness and goodput look
like under various link latencies with these drop rates.

#2 will inform us about whether QUIC is acceptable as-is, or if we would
need to explore ECN or other non-drop based congestion signals.

We will need to be careful while conducting these experiments, though. I
found a few research papers on QUIC, but nearly all of them state
limitations wrt varying aspects of the protocol being disabled or
enabled depending on Chromium/QUIC version (presumably due to whatever
experiments Google was conducting at the time).

Additionally, it looks like many of the QUIC implementations do not
implement all (or any) of the drop detection and recovery strategies
mentioned in the spec, and even the Google implementation goes back and
forth on things like FEC. So we will need to be careful to test what we
intend to use.
Post by Rob Jansen
- We should be sure to pay close attention to how this will affect
emerging networks and applications, e.g., mobile devices and onion
services.
- The DoS attacks will change form, but I don't think they will
disappear. I think it would be wise to understand how DoS might
change, which is much easier once we have a design to analyze. Your
summary helps with that.
I agree that DoS will change form. But, the key draw for me is that
instead of having DoS attacks that kill the circuit or even bring down
the relay due to OOM (which provides a strong, clear signal to the
adversary and enables Sniper attacks), DoS attacks will become
congestion attacks that slow down service at specific bottlenecks, which
I believe that conflux can further mitigate by dynamically avoiding
them.
Post by Rob Jansen
I think it would be worth including R&D effort to investigate these
issues in any proposal that gets written.
Cheers,
Rob
Post by Mike Perry
In Rome, I held a session about network protocol upgrades. My intent was
to cover the switch to two guards, conflux, datagram transports, and
QUIC. We ended up touching only briefly on everything but QUIC, but we
went into enough depth on QUIC itself that it was a worthwhile and very
productive session.
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/FutureTorNetworkProtocolUpgrades
I wanted to give a bit of background and some historical perspective
about datagram transports for Tor, as well as explain those notes in
long form, to get everybody on the same page about this idea. With
the forthcoming IETF standard ratification of QUIC along with several
https://github.com/quicwg/base-drafts/wiki/Implementations), I believe
we are close to the point where we can finally put together a plan (and
a funding proposal) to undertake this work.
Consider this mail a pre-proposal to temperature check and solicit early
feedback about a Tor-over-QUIC deployment, before we invest the effort
to deep dive into the framing, protocol, and implementation details that
will be necessary for a full proposal.
_______________________________________________
tor-dev mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
--
Mike Perry
Mike Perry
2018-04-02 10:50:48 UTC
Permalink
Post by Mike Perry
In Rome, I held a session about network protocol upgrades. My intent was
to cover the switch to two guards, conflux, datagram transports, and
QUIC. We ended up touching only briefly on everything but QUIC, but we
went into enough depth on QUIC itself that it was a worthwhile and very
productive session.
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/FutureTorNetworkProtocolUpgrades
# QUIC Research Questions
A couple areas of open research into how QUIC will behave in a Tor
deployment remain. It is my belief that while these areas are worth
further study, they do not represent blockers to QUIC, as a naive
deployment of QUIC will substantially outperform our current (lack of)
congestion and poor flow control.
## End-to-end Congestion Control Study
Unfortunately, it appears that the Quux QUIC paper studied QUIC at the
wrong position - between relays, and the QuTor implementation is
unclear. This means that it may still be an open question as to if
QUIC's congestion control algorithms will behave optimally in a Tor-like
network under heavy load.
However, Google has conducted extensive internet-scale research into the
congestion control algorithm, which very likely covers enough networks
with Tor-like latency and drop characteristics. Because of this, I do
not expect us to have to do a lot of tuning of their congestion control
here, but it is worth investigating.
## Queue Management
Fairness among flows in drop-based congestion control is heavily
influenced by how the queue is managed, and particularly by the decision
of which packet to drop when the queue is full. Modern favored
algorithms involve probabilistic early dropping from the middle of the
queue. The most popular approaches are the adaptive variants of RED and
https://en.wikipedia.org/wiki/Random_early_detection
https://en.wikipedia.org/wiki/Blue_(queue_management_algorithm)
An IETFer mailed me and pointed out that the IETF now recommends
PIE or FQ-CoDel over RED/BLUE, since they explicitly control
queue latency:
https://tools.ietf.org/html/rfc8033
https://tools.ietf.org/html/rfc8290

FQ-CoDel is appealing for us because it breaks the queue into separate
per-flow queues, and has mechanisms to favor quieter flows over busy
ones when deciding which packets to send first, and which to drop. It
sounds quite promising for getting similar properties out of a QUIC-Tor
as we have with Circuit EWMA scheduling.
--
Mike Perry
Loading...