Discussion:
[tor-dev] Connection, Channel and Scheduler - An Intense Trek
David Goulet
2017-10-30 19:57:04 UTC
Permalink
Hello everyone!

DISCLAIMER: The following is enormous and tries to describe in some level of
detail the situation in tor with connection<->channel<->scheduler. After we
merged the KIST scheduler, we realized many things weren't what they were
supposed to be or meant for. In the end, I'm asking questions so we can move
forward with development and fixing things.

Last thing before you start your journey into the depths of Tor: the 3
subsystems I'm going to talk about, and how they interact, are very
complicated, so it is quite possible that I've gotten things wrong or missed
some details. Please point them out so we can be better documented, better
informed and make good decisions. I plan to document as much as I can from
this process for a new file in the torguts.git repository.

This is part of the work from ticket #23993.

Enjoy!

== Part One - The Distant Past ==

Once upon a time, Tor was at version 0.2.3. It was before the "big change",
and it was using connections and circuits in a simplistic yet clever way. I'll
briefly describe here how it worked and why we wanted to change it.

Roughly, once a cell comes in, it is put in the inbound buffer, or "inbuf", of
a connection_t and processed. If it needs to be relayed, that is, sent along
the circuit it came in on, it is put in that circuit's queue. Each
connection_t had a priority queue of "active circuits", prioritized using the
EWMA policy. The first circuit would be flushed of a certain number of cells
onto the connection_t outbound buffer, or "outbuf", from which the libevent
main loop takes over to actually write the bytes to the network. Then the
circuit pqueue is updated and we move on. The following is a high level
diagram of a relayed cell:

+--------------+     +-----------+     +----------------+
|connection_t A|     | circuit_t |     | connection_t B |
|   (inbuf)    |---> |  (pqueue) |---> |    (outbuf)    |---> Network
+--------------+     +-----------+     +----------------+

That had at least one big problem: not considering the entire set of circuits
meant one connection (with some circuits on it) could bloat the relay's
network link. It was actually worse than that under bandwidth limitation: you
want to pick the connection carrying the highest-priority circuit and flush
that one.

All in all, tor needed to move to a logic where _all_ circuits are considered
using the EWMA policy, flushing the cell(s) of that picked circuit on the
connection outbuf so it is prioritized on the network.

Then in 0.2.4, everything changed!

== Part Two - The Big Change ==

Two new abstractions were added in Tor 0.2.4 (#6465): channel and circuitmux
(short for circuit multiplexer or multiplexing, history is unclear ;). To this
day, this is what tor is using.

The channel_t object was intended to provide a "transport" abstraction layer
for relay-to-relay connections, which are represented by connection_t objects.
Each channel has a circuitmux_t object that encapsulates the queuing logic and
circuit selection policy for that channel. In other words, the circuits that
go through that connection are listed in that object so we can do cell
prioritization.

To summarize, this roughly looks like this now:

+------------+
|connection_t|
+------------+
      |
      v 1
 +---------+
 |channel_t|
 +---------+
      |
      v 0..n
 +---------+
 |circuit_t|
 +---------+

Channels are supposed to act as a protocol abstraction on top of that
connection. The original intent was to make it easier to implement different
protocols, for instance to help researchers experiment with other things more
easily. TLS was decoupled from connection_t and put into a subclass of
channel_t, and thus channel_tls_t was born. It implements the channel_t
interface and is in theory designed to handle the TLS layer of a
relay-to-relay connection (handshake, cells, etc.). So far, tor only has that
one channel class.

Now that we had this layer and a way to multiplex circuits on a single
connection through the use of channels, we needed one last piece: a way to
prioritize cells on the network considering all circuits. Along came the "cell
scheduler" (scheduler.c) in Tor 0.2.6, now called the Vanilla scheduler.

In a nutshell, the libevent main loop calls the scheduler at a constant rate,
and the subsystem compares all channels against each other using circuit
priority under the EWMA policy (an interface was created to implement more
policies but I won't get into that in this email).

So now tor finally had a good infrastructure to prioritize cells on the
network by looking at all circuits at once. TCP connections would be
established between relays, the channel layer would handle the TLS business
and the scheduler would flush cells onto the network considering all circuits
on all channels. It is a really nicely modularized design that should improve
tor's performance.

But, that's not entirely the reality of things right now.

== Part Three - The Truth ==

In 2014, a paper came out titled "Never Been KIST: Tor's Congestion Management
Blossoms with Kernel-Informed Socket Transport"
(http://www.robgjansen.com/publications/kist-sec2014.pdf). It described an
improved way for tor to schedule data transmission properly by asking the
kernel for information about the state of the TCP buffers, so tor could flush
data onto the network without bloating the local link.
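The core KIST idea can be sketched as follows. This is an assumed
simplification based on the paper's description, not tor's actual
implementation; the function name and formula are illustrative. On Linux the
inputs would come from getsockopt(fd, IPPROTO_TCP, TCP_INFO, ...).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: only write as many bytes as the TCP congestion
 * window can accept, so excess data queues inside tor (where we can
 * reprioritize it) instead of in the kernel's socket buffer. */
static uint64_t kist_writable_bytes(uint32_t snd_cwnd, /* cwnd, in packets */
                                    uint32_t snd_mss,  /* bytes per packet */
                                    uint32_t unacked,  /* packets in flight */
                                    uint64_t notsent)  /* queued, unsent bytes */
{
  uint64_t cwnd_bytes = (uint64_t)snd_cwnd * snd_mss;
  uint64_t inflight = (uint64_t)unacked * snd_mss;
  if (cwnd_bytes <= inflight + notsent)
    return 0; /* window already full: writing more only bloats the kernel */
  return cwnd_bytes - inflight - notsent;
}
```

With an estimate like this, the scheduler can skip sockets that have no room
and spend the relay's bandwidth on channels that can actually transmit now.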

Then, from 2016 up to now, Matthew Traudt implemented KIST in tor, which
required an important refactoring of the scheduler subsystem in order to make
it work with multiple scheduling types. It is today the Scheduler option in
torrc, and all 0.3.2 tor uses KIST by default. Many things were discovered
during that piece of work by Matt. For instance, the very important bug
#20459, where we were badly comparing cmux policies and thus ultimately
prioritizing our circuits wrongly.

But fundamental issues in the design of channels and the scheduler were also
found. I'll list some here; they ultimately lead to Part Four of this email
about our next steps moving forward.

1. Channels implemented cell queues that weren't really queues.

If you look at the channel_t object, you'll notice "incoming_queue" and
"outgoing_queue", which basically handle the transition of a cell from a
connection_t inbuf to the circuit queue and then to the connection_t outbuf.
It looks like this for a relayed cell:

conn inbuf -> chan in_queue -> circ queue -> chan out_queue -> conn outbuf

The idea behind this is to quickly empty the connection inbuf into an
intermediary object (the channel), leaving the kernel inbound TCP queue as
much room as we can. In other words, offload the kernel buffer to user space,
since we need to run some processing on that data and there is no need to
leave it in the kernel buffer during that time.

Then, from the channel, a cell would be put on the circuit queue and in turn
be processed by the scheduler. The scheduler would select a circuit (EWMA
policy), flush cells from its queue onto the channel outgoing queue, and from
there they would be flushed to the connection outbuf, which libevent takes
care of writing to the socket (to the network).

One could argue here that in terms of design we have too many queues, but that
would only hold if the channel queues were actually acting like queues. It
turns out that the incoming and outgoing queues of a channel are always empty
(#23709), because the code queues a cell and then processes it immediately,
meaning it is either put on the circuit queue or the connection outbuf.

Each cell is memcpy()ed onto a channel queue and then processed right away,
just to be copied again into the circuit queue or outbuf. This has a
performance cost over time, as a relay sees thousands of cells a second. It
also makes the channel subsystem much more complicated in terms of code,
trying to handle those queues every step of the way in and out of the channel
subsystem. But the worst part is that the scheduler subsystem based some
decisions on the outgoing queue size, and when it is always 0, issues arise
like #23676.
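A toy model of what #23709 describes, with hypothetical names: the channel
"queue" is filled and drained within the same call, so its observable length
is always 0 while every cell still pays for two copies.

```c
#include <assert.h>
#include <string.h>

#define CELL_LEN 514 /* illustrative wire-cell size */

typedef struct {
  unsigned char slot[CELL_LEN];
  int len; /* 0 or 1: this "queue" never accumulates cells */
} toy_chan_queue_t;

static int g_cells_copied; /* counts memcpy()s, to expose the double copy */

/* Stand-in for putting the cell on a circuit queue. */
static void toy_circuit_enqueue(const unsigned char *cell)
{
  static unsigned char circ_queue[CELL_LEN];
  memcpy(circ_queue, cell, CELL_LEN); /* second copy: channel -> circuit */
  g_cells_copied++;
}

/* Queue the cell on the channel, then process it immediately. */
static void toy_channel_write_cell(toy_chan_queue_t *q,
                                   const unsigned char *cell)
{
  memcpy(q->slot, cell, CELL_LEN); /* first copy: inbuf -> channel queue */
  g_cells_copied++;
  q->len = 1;
  toy_circuit_enqueue(q->slot); /* ...processed right away... */
  q->len = 0;                   /* ...so the queue is empty again */
}
```

Any code that inspects the queue length between calls always sees 0, which is
exactly the situation the scheduler was (wrongly) basing decisions on.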

2. DESTROY cells handling

Within a circuitmux object, there is a "destroy cell queue" onto which a
DESTROY cell is put for one of the circuits on the cmux. An important thing
for tor is that when it needs to send a DESTROY, it needs to _stop_ sending
any queued cells on that circuit, dump them and send only the DESTROY cell.

Many things are problematic currently. For starters, sending a DESTROY cell
bypasses the scheduler (#23712). Tor will immediately try to flush the cell on
the network *if* the channel doesn't have queued *writes*, that is, if the
connection outbuf is empty. If not, once the outbuf is sufficiently drained,
the channel will be scheduled in and the DESTROY cell finally sent. I've
observed this process taking from 0.5 seconds up to 10 seconds on a normal
relay if the outbuf has data on it.

Second, because of this separate queue for DESTROY cells, tor alternates
between that queue and the normal queue of a circuit. Remember that the
destroy queue is *not* per circuit but per circuitmux. This is driven by
"cmux->last_cell_was_destroy", which is set to 1 if the last cell the cmux
handled was a DESTROY, or 0 if not.

This has a side effect. Before sending a DESTROY cell, the circuit queue is
cleared out without changing the circuit's EWMA priority, in order to make
sure we don't send cells from that point on. However, because we alternate
between queues, the EWMA policy will pick the right channel based on the
overall state of the circuits on it (see scheduler_compare_channels()), but
will not use that policy to prioritize the circuit carrying the DESTROY cell.
Furthermore, the destroy cell queue is a FIFO, so if 10 DESTROY cells were
queued but the 10th one is actually for the circuit with the highest priority,
we won't process it until the 9 that came before it are flushed.
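The head-of-line blocking can be illustrated with a toy destroy queue (all
names here are illustrative, not the circuitmux code):

```c
#include <assert.h>

/* Toy model of the per-circuitmux DESTROY queue: a plain FIFO, so the
 * DESTROY for the highest-priority circuit still waits behind every
 * DESTROY queued before it. */
#define DQ_MAX 16

typedef struct {
  int circ_ids[DQ_MAX];
  int head, tail;
} destroy_queue_t;

static void dq_push(destroy_queue_t *q, int circ_id)
{
  q->circ_ids[q->tail++] = circ_id;
}

static int dq_pop(destroy_queue_t *q)
{
  return q->circ_ids[q->head++];
}
```

No matter how urgent circuit 10's teardown is, its DESTROY is emitted dead
last, because nothing in the FIFO consults circuit priority.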

3. Connection and Channel decoupling is kind of a lie

The connection_t object is bound to be a TCP connection, except for DNSPort,
which uses UDP. That object also has a "TLS connection state" (tor_tls_t) and
a channel_tls_t object (chan).

First, it turns out that the connection subsystem, not the channel TLS layer,
establishes the TLS session. When opening a circuit, it looks like this:
channel_connect_for_circuit()
  |- channel_connect()
     |- channel_tls_connect()
        |- connection_or_connect()
           |- connection_tls_start_handshake()

Second, when processing a cell from the connection inbuf in
connection_or_process_cells_from_inbuf(), we'll directly call
channel_tls_handle_cell(). The channel_tls_t layer handles PADDING, VERSIONS,
CERTS, AUTH_CHALLENGE, AUTHENTICATE and NETINFO cells. The other cell types
will be queued on the corresponding circuit queue.

Third, once the TLS session has been established, a VERSIONS cell can be sent,
which is done by the connection subsystem with connection_or_send_versions().
But this bypasses the channel_tls_t layer that processes them and should write
them using channel_tls_write_cell_method(). Furthermore, the channel_tls_t
layer uses the connection layer to send those cells, via
connection_or_send_certs_cell() and co.

My point here is that the channel abstraction is violated in many ways, by
either directly calling the TLS channel layer or not using it to send specific
cells that are part of the TLS handshake. Also, the TLS state is baked into
the connection subsystem. This means that anyone wanting to implement a new
channel will need to do a major refactoring first, which unfortunately leaves
this abstraction kind of pointless. And do not make the mistake of labeling
the channel subsystem a "transport layer": it is at best a protocol layer;
connection_t handles transport, not channels.

4. Vanilla scheduler is not really a scheduler.

It turns out that the Vanilla scheduler is not really scheduling things but
rather shoving as much as it can onto the connection outbuf. This isn't ideal,
because tor selects a circuit based on a policy (EWMA) and, once the circuit
is selected, pushes onto the connection outbuf, either writing as much as it
can or stopping at 1000 cells (see MAX_FLUSH_CELLS), which is 512 * 1000 bytes
== 0.5MB. One can argue: is 0.5MB per scheduler tick too little, or actually
too much?

So imagine tor flushing 1000 cells every time it picks a circuit before moving
on to the next circuit. I'm not sure this plays well with a relay under
bandwidth limitation, for instance. If you have 100 circuits and 100KB to
spend, you might not want to spend it all on one single circuit.
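A back-of-envelope sketch of that concern, using the numbers above (the
per-tick budget logic here is hypothetical, not tor's code):

```c
#include <assert.h>

#define CELL_SIZE       512
#define MAX_FLUSH_CELLS 1000

/* How many distinct circuits get served in one tick if each circuit pick
 * flushes a full MAX_FLUSH_CELLS burst until the byte budget runs out? */
static unsigned circuits_served_per_tick(unsigned budget_bytes)
{
  unsigned served = 0;
  while (budget_bytes >= CELL_SIZE) {
    unsigned burst = MAX_FLUSH_CELLS * CELL_SIZE; /* 512,000 bytes */
    if (burst > budget_bytes)
      burst = budget_bytes;
    budget_bytes -= burst;
    served++;
  }
  return served;
}
```

With a 100KB budget, the very first circuit picked absorbs everything and the
other 99 circuits see nothing that tick.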

Also, because this scheduler runs as often as possible, only one channel is
considered each time. Matt's experiments showed this happening in practice,
and a scheduler that only ever considers one channel can't make good priority
decisions, because there is nothing to compare against in the first place.

Finally, by bloating the outbuf as much as possible (and I'm not sure exactly
how libevent operates here), it leaves the decision of when and what to flush
on the network to the code that handles the outbuf. In other words, the
scheduler doesn't actually control when the data is written to the socket, so
the scheduling decisions aren't necessarily followed through at the transport
layer. This is something the scheduler and connection subsystems should be
tightly coordinated on.

== Part Four - The Conclusion ==

Through this epic journey, we've discovered some issues as well as design
problems. Now the question is: what should and can we do about it?

In a nutshell, there are a couple of questions we should ask ourselves and try
to answer so we can move forward:

* I believe now that we should seriously discuss the relevance of channels.
Originally the idea was good: provide an abstraction layer for the
relay-to-relay handshake, and send/process the cells related to that protocol.
But as of now, they are only half doing it.

There is an important cost in code and maintenance of something that is not
properly implemented/finished (the channel abstraction) and also largely
unused. An abstraction implemented for only one thing is not really useful,
except maybe to offer an example for others? But we aren't providing a good
example right now imo...

That being said, we can spend time fixing the channel subsystem, trying to
turn it into a nicer interface, fixing all the issues I've described above
(and I suspect there might be more) so the cell scheduler can play nicely with
channels. Or we could rip them out, eliminating lots of code and reducing our
technical debt. I would like us to think seriously about what we want, because
that channel subsystem is _complicated_ and very few of us fully understand
it afaict.

Which would bring us back to this (which is btw basically what we have now,
considering the channel queues are useless):

conn inbuf -> circ queue -> conn outbuf

If we don't want to get rid of channels, the fixes are non trivial. For
starters, we have to decide if we want to keep the channel queues or not, and
if yes, we need to almost start from square one in terms of testing, because
we would basically be introducing a new layer of queuing cells.

* Dealing with the DESTROY cell design issue will require trickier work, I
think. We must not starve a circuit with a DESTROY cell pending to be sent,
else the other side keeps sending data. But we should also not starve all the
other circuits, because if we ever need to send a gazillion DESTROY cells in
priority, we'll make the relay useless (a DoS vector).

The question is: do we trust our EWMA policy to be wise enough to pick the
circuit in a reasonable amount of time so we can flush the DESTROY cell from
the circuit queue? Or do we really need to bypass or somehow prioritize that
cell in order to send it asap, to avoid load on the network because the other
side of the circuit is still sending?

* In the short term, we should get rid of the Vanilla scheduler, because it
complexifies the scheduler code a lot by adding unneeded things to channel_t,
and also bloats the scheduler interface with pointless function pointers, for
instance. And in my opinion, it is not helping performance the way it is done
right now.

Cheers!
David
--
teor
2017-11-01 03:28:03 UTC
Permalink
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
There is an important cost in code and maintanance of something that is not
properly implemented/finished (channel abstraction) and also something that
is unused. An abstraction implemented only for one thing is not really useful
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it in a nicer interface, fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them off eliminating lots of code and reducing our
technical debt. I would like us to think about what we want seriously because
that channel subsystem is _complicated_ and very few of us fully understands
it afaict.
It depends what the goal of the channel layer is.

Do we seriously think we will use another protocol in place of TLS?

Even if we are serious about non-TLS connections, do we want to rip out
this code now, and write something better when we know what we
actually need?

Is the channel layer the right place to hide the differences between
TLS-over-IPv4 and TLS-over-IPv6?
(I don't think it is, but it's worth thinking about how much work it
was to add IPv6 support, and using that as a guide for how much work
it would be to add address/port/keys/etc. for another protocol.)

T
--
Tim / teor

PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------
Ian Goldberg
2017-11-01 11:31:50 UTC
Permalink
Post by teor
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
There is an important cost in code and maintanance of something that is not
properly implemented/finished (channel abstraction) and also something that
is unused. An abstraction implemented only for one thing is not really useful
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it in a nicer interface, fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them off eliminating lots of code and reducing our
technical debt. I would like us to think about what we want seriously because
that channel subsystem is _complicated_ and very few of us fully understands
it afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for
experiments with other transports, such as UDP-based ones, QUIC-Tor,
etc. I would be a little sad to see it disappear completely.
--
Ian Goldberg
Professor and University Research Chair
Cheriton School of Computer Science
University of Waterloo
David Goulet
2017-11-01 13:09:36 UTC
Permalink
Post by Ian Goldberg
Post by teor
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
There is an important cost in code and maintanance of something that is not
properly implemented/finished (channel abstraction) and also something that
is unused. An abstraction implemented only for one thing is not really useful
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it in a nicer interface, fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them off eliminating lots of code and reducing our
technical debt. I would like us to think about what we want seriously because
that channel subsystem is _complicated_ and very few of us fully understands
it afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for
experiments with other transports, such as UDP-based ones, QUIC-Tor,
etc. I would be a little sad to see it disappear completely.
So after the Montreal meeting, I got access to the QUIC-Tor code. And, as to
the misconception about channels: they aren't about "transport" but
"protocol".

Thus the QUIC-Tor code didn't even *touch* channels ;). Everything they did
had to be done mostly at the connection layer.

For research to experiment with channels, it would basically have to be
research based on _removing_ TLS between relays. I'm not aware of such a thing
right now, but I'm sure someone has poked at it!

Cheers!
David
Post by Ian Goldberg
--
Ian Goldberg
Professor and University Research Chair
Cheriton School of Computer Science
University of Waterloo
_______________________________________________
tor-dev mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
--
Ian Goldberg
2017-11-01 13:18:28 UTC
Permalink
Post by David Goulet
Post by Ian Goldberg
Post by teor
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
There is an important cost in code and maintanance of something that is not
properly implemented/finished (channel abstraction) and also something that
is unused. An abstraction implemented only for one thing is not really useful
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it in a nicer interface, fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them off eliminating lots of code and reducing our
technical debt. I would like us to think about what we want seriously because
that channel subsystem is _complicated_ and very few of us fully understands
it afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
The channel layer has certainly been used fruitfully in the past for
experiments with other transports, such as UDP-based ones, QUIC-Tor,
etc. I would be a little sad to see it disappear completely.
So after Montreal meeting, I got access to QUIC-Tor code. And, as a
misconception of channels, they aren't about "transport" but "protocol".
Thus the QUIC-Tor code didn't even *touch* channels ;). Everything they did
had to be done mostly at the connection layer.
For some reasearch to experiement with channels, it would be basically a
research based on _removing_ TLS between relays. I'm not aware of such a thing
right now but I'm sure someone did poked at it for sure!
Ah, bad example I guess. But some of my work in the past certainly has
used channels for this purpose.
David Goulet
2017-11-01 13:08:05 UTC
Permalink
Post by teor
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
There is an important cost in code and maintanance of something that is not
properly implemented/finished (channel abstraction) and also something that
is unused. An abstraction implemented only for one thing is not really useful
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it in a nicer interface, fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them off eliminating lots of code and reducing our
technical debt. I would like us to think about what we want seriously because
that channel subsystem is _complicated_ and very few of us fully understands
it afaict.
It depends what the goal of the channel layer is.
Do we seriously think we will use another protocol in place of TLS?
Even if we are serious about non-TLS connections, do we want to rip out
this code now, and write something better when we know what we
actually need?
Right, that is basically a very good question to try to answer. I think it is
totally conceivable that researchers would be willing to experiment there and
go with a Tor without TLS. At that point, channels could be useful, but only
in a state that is actually a proper abstraction (not the case right now). We
would need work to make it happen.

However, I do *NOT* see Tor moving away from TLS anytime soon. The network and
clients out there are all using it; moving to something else would mean a
dual-stack protocol and years of ramping up to network maturity (ntor vs. tap
is a good example, I think). And at that point, we'll have to do a massive
amount of work in the channel/connection subsystem.

All in all, my answer is: "No, we aren't serious about moving away from TLS,
but it is always possible".

Thus, in the end, this channel subsystem is really about letting researchers
play with it, not about helping us developers do our job. There could be a
gain in fixing it, but it would be one-sided for now, imo.
Post by teor
Is the channel layer the right place to hide the differences between
TLS-over-IPv4 and TLS-over-IPv6?
That would be the connection layer, handling 1-to-1 socket connections and
talking to the kernel.

SCTP-Tor, for instance, would also be at the connection layer. That subsystem,
in my opinion, would benefit *greatly* from a nice interface, but that is a
huge amount of work.

Thus, I want to reiterate that if we care about providing a nice abstraction
for researchers to do a better job, we have a broken one at the channel layer
and none at the connection layer, which is actually the one that would be most
useful (QUIC, SCTP, UDP, ...). So even today we do *not* offer anything useful
to researchers, imo.

Cheers!
David
Post by teor
(I don't think it is, but it's worth thinking about how much work it
was to add IPv6 support, and using that as a guide for how much work
it would be to add address/port/keys/etc. for another protocol.)
T
--
Tim / teor
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------
--
teor
2017-11-12 21:07:43 UTC
Permalink
Post by David Goulet
2. DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" on which a DESTROY
cell is put in for one of the circuit on the cmux. An important thing for tor
is that when it needs to send a DESTROY, it needs to _stop_ sending any queued
cell on that circuit, dump them and only send the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought
to finish flushing the currently queued cells first. If it discards
the queued cells and only sends the destroy cell, then we end up with
missing data.
Sending a DESTROY cell after dropping data still tears down a circuit, but
(depending on the sender's position in the circuit) it tears it down with a digest
error. Which is probably not what we want.

That said, there may be no way to tell if the application-level data is complete
or not, so an error teardown may be appropriate.

T
David Goulet
2017-11-13 16:06:12 UTC
Permalink
Post by David Goulet
2. DESTROY cells handling
Within a circuitmux object, there is a "destroy cell queue" on which a DESTROY
cell is put in for one of the circuit on the cmux. An important thing for tor
is that when it needs to send a DESTROY, it needs to _stop_ sending any queued
cell on that circuit, dump them and only send the DESTROY cell.
Careful! I think this might be the opposite of what it needs to do.
If Tor wants to tear down a circuit, in normal circumstances it ought
to finish flushing the currently queued cells first. If it discards
the queued cells and only sends the destroy cell, then we end up with
missing data.
This is *exactly* what Tor does right now: it clears the circuit queue
immediately when it is ready to send the DESTROY cell.

You (others) might want to double check that, but here are hints (notice the
cell queue being cleared):

circuit_about_to_free():

  if (circ->n_chan) {
    circuit_clear_cell_queue(circ, circ->n_chan);
    /* Only send destroy if the channel isn't closing anyway */
    if (!CHANNEL_CONDEMNED(circ->n_chan)) {
      channel_send_destroy(circ->n_circ_id, circ->n_chan, reason);
    }
  }

connection_edge_process_relay_cell():
Look for channel_send_destroy(); you'll notice the queue being cleared just
before.

Still bad... ?
Post by David Goulet
Second, because of this concept of different queue for the DESTROY cell, tor
will back and forth between that queue and the normal queue of a circuit.
Remember, that the destroy queue is *not* per circuit but per circuitmux. This
is used by the "cmux->last_cell_was_destroy" which is set to 1 if the last cell
the cmux handled was a DESTROY or 0 if not.
Yes, this part is definitely a mess.
I think we need to invent and design a better way to handle destroys --
getting the abstraction layer right between the "control" cells and the
"data" cells is definitely a hard problem, especially when both kinds
of cells end up being sent through the same mechanism.
Right. It is also unclear to me how much it would affect tor if we simply put
the DESTROY cell on the circuit queue and let the scheduler take care of it...
I would say there could be cases where it takes a bit more time than right now
to send the DESTROY.

(Right now, it is sent right away, bypassing the scheduler.)
Post by David Goulet
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good that is providing an abstraction layer for the
relay to relay handshake and send/process cells related to the protocol. But,
as of now, they are half doing it.
I am not opposed to ripping out the channel abstraction.
You make a good case that it was never completed, and now it will be
harder to complete since it's been so long (and the person who designed
and built it won't be completing it). Also, if we are hoping to modularize
the Tor architecture to prepare it for component-by-component transitions
to Rust or some other language, then simplifying first is a good move.
I guess the other option is to keep it limping along and make a plan to
fix it and move to the right abstraction levels. That option would be best
if we have a particular new channel in mind that we want to add -- such
as the switch to udp transport that various research papers have proposed.
Right, so UDP (see the earlier post in the thread with Ian) has nothing to do
with channels :). It has to do with TLS cells.
At least last I checked though, udp transport implies user-space tcp
which is not a practical thing at our scale.
All of this said though, Nick is the Chief Architect, so I will defer
to his judgment on which approach will get us a great fast stable Tor
in the long term.
Post by David Goulet
* In the short term, we should get rid of the Vanilla scheduler because it
complexifies the scheduler code a lot by adding unneeded things to channel_t,
and it also bloated the scheduler interface with pointless function pointers,
for instance. And in my opinion, it is not helping performance the way it is
done right now.
Getting rid of the Vanilla scheduler is fine with me. I imagine the Kist
plan was to leave the Vanilla scheduler in place so there's something
to fall back to in case of bugs or design surprises. It might be wisest
to leave it in during 0.3.2, so Kist can stabilize, and plan to take it
out during 0.3.3 or 0.3.4 if Kist continues looking good.
Agree.

Thanks!
David
--Roger
_______________________________________________
tor-dev mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Nick Mathewson
2017-11-15 18:49:54 UTC
Permalink
Post by David Goulet
Hello everyone!
DISCLAIMER: The following is enormous and tries to describe in some level of
detail the situation in tor with connection<->channel<->scheduler. This comes
after we merged the KIST scheduler and realized many things weren't what
they were supposed to be or meant for. In the end, I'm asking questions so we
can move forward with development and fixing things.
Last thing before you start your journey into the depths of Tor: the 3 subsystems
I'm going to talk about, and how they interact, are very complicated, so
it is very possible that I have gotten things wrong or missed some details.
Please point them out so we can better document, be better informed and make
good decisions. I plan to document as much as I can from this process for a new
file in the torguts.git repository.
Snipping the analysis, and going straight to the conclusions. I'll leave
Post by David Goulet
Many things are problematic currently
They sure are. :)

== Part Four - The Conclusion ==
Post by David Goulet
Through this epic journey, we've discovered some issues as well as design
problems. Now the question is what we should and can do about it.
In a nutshell, there are a couple of questions we should ask ourselves, and
* I believe now that we should seriously discuss the relevance of channels.
Originally, the idea was good: provide an abstraction layer for the
relay-to-relay handshake and for sending/processing cells related to the
protocol. But, as of now, channels are only half doing it.
There is an important cost in code and maintenance for something that is not
properly implemented/finished (the channel abstraction) and also something that
is unused. An abstraction implemented for only one thing is not really useful,
except maybe to offer an example for others? But we aren't providing a good
example right now imo...
That being said, we can spend time fixing the channel subsystem, trying to
turn it into a nicer interface and fixing all the issues I've described above (and
I suspect there might be more) so the cell scheduler can play nicely with
channels. Or, we could rip them out, eliminating lots of code and reducing our
technical debt. I would like us to think seriously about what we want, because
that channel subsystem is _complicated_ and very few of us fully understand
it afaict.
Which would bring us back to this (which is btw basically what we have now):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are non-trivial. For
starters, we have to decide if we want to keep the channel queue or not, and if
yes, we need to almost start from square one in terms of testing, because we
would basically introduce a new layer of queuing cells.
So, this is the question I'm least sure about. Please take the following as
tentative.

I think that the two choices ("refactor channels" and "rip out channels")
may be less different than we think. Neither one is going to be trivial to
do, and we shouldn't assume that sticking everything together into one big
type will actually make the code _simpler_.

The way I think about the code right now, "channel" is an interface which
"connection_or" implements, and there is no meaningful barrier between
connection_or and channeltls. I _do_ like the idea of keeping some kind of
abstraction barrier, though: a "channel" is "whatever we can send and
receive cells from", whereas an "or_connection" has a lot of other baggage
that comes with it.

From my POV, we *should* definitely abolish the channels' queues, and
minimize the amount of logic that channels do on their own. I'm not sure if
we should rip them out entirely, or just simplify them a lot. I don't think
either is necessarily simpler or less bug-prone than the other.

Perhaps we should sketch out what the new interface would look like? Or
maybe do an hour or two worth of exploratory hacking on each approach?

(This reminds me of another change I want someday, which is splitting
edge_connection_t into an "edge_connection" type that implements a "stream"
interface: right now, we have quite a few streams that aren't actually edge
connections, but which use the type anyway.)

* Dealing with the DESTROY cell design issue will require a bit more tricky
Post by David Goulet
work I think. We must not starve a circuit with a DESTROY cell pending to be
sent, else the other side keeps sending data. But we should also not starve
all the circuits, because if we ever need to send a gazillion DESTROY cells in
priority, we'll make the relay useless (DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick the
circuit in a reasonable amount of time so we can flush the DESTROY cell from
the circuit queue? Or do we really need to bypass or prioritize that cell
somehow in order to send it asap, to avoid load on the network because
the other side of the circuit is still sending?
So, elsewhere in the thread, folks have been discussing whether a circuit
that's going to send a DESTROY cell should flush its pending cells first.

The answer is sadly, "it depends".

Case 1: Suppose Alice is downloading a webpage. Suppose we are the middle
relay and we lose our connection to the exit. It would be nice to keep
flushing the data we have towards Alice -- maybe, if she can use partial
data. But any data that Alice sends to us would be lost, so it would be
good if we had some way to tell Alice "stop sending please".

Case 2: Suppose Alice is uploading something to a webserver. Suppose we are
the middle relay and we lose our connection from Alice. In this case,
there's no point in sending any more data towards the webserver before we
send it a DESTROY cell. (Even if Alice was in the middle of a big upload,
she'll need to repeat any part of it that wasn't ACKed, since she won't
know what was received and what wasn't.)

Case 3: Suppose we hit our OOM killer. In this case, we had better discard
all the data on the circuit we're killing, or we're vulnerable to "sniper
attacks" again.

So it's clear that sometimes we should dump the data, and sometimes we
shouldn't. I think this is an independent question from what we're asking
here. (My own take is that solving case 1 right requires "RELAY_TRUNCATED"
cells, which I believe we don't implement today.)

What we're asking here is: how can we reintegrate DESTROY cells with the
rest of the scheduler logic?

I think that, from a priority POV, DESTROY cells are in a certain sense the
_opposite_ of traffic, and we might actually want to treat them differently
from data cells. Consider that if we have a choice between DESTROYing a
busy circuit or a quiet one, we will save more bandwidth by destroying the
busy circuit first, so that no more data is sent to us over it.

On the other hand, this doesn't mean that the FIFO structure we have today
is a good idea at all. It probably makes sense to use the same priority
queue-based scheduler thing that we use everywhere else, but possibly with
a different (inverted??) priority parameter for destroyed circuits.

One more thing to be aware of: the destroy_cell_queue exists in part
because we tear down circuits at the same time that we queue their destroy
cells. If we changed Tor so that "destroyed" circuits were kept around
somehow until their cells could be sent, then we'd be introducing a new
state to our state machine, to represent circuits that were schedulable but
not actually usable for traffic. We'd need to be careful to handle that
correctly: this kind of "unusable object that still exists" has caused us
problems before. (The solution I like best for avoiding this confusion is
to make it so the scheduler can schedule two types of "schedule-able"
things: circuits, and "pending destroy cells".)
Post by David Goulet
* In the short term, we should get rid of the Vanilla scheduler because it
complexifies the scheduler code a lot by adding unneeded things to channel_t,
and it also bloated the scheduler interface with pointless function pointers,
for instance. And in my opinion, it is not helping performance the way it is
done right now.
I agree with Roger here: it's fine to throw away the vanilla scheduler, but
we should wait until KIST has been running unproblematically in a stable
release for a while. 0.3.4 seems like a good time for this.
--
Nick
David Goulet
2017-11-16 13:56:59 UTC
Permalink
[snip]
Post by Nick Mathewson
Post by David Goulet
Through this epic journey, we've discovered some issues as well as design
problems. Now the question is what we should and can do about it.
In a nutshell, there are a couple of questions we should ask ourselves
* I believe now that we should seriously discuss the relevance of
channels. Originally, the idea was good: provide an abstraction
layer for the relay-to-relay handshake and for sending/processing cells
related to the protocol. But, as of now, channels are only half doing it.
There is an important cost in code and maintenance for something that is
not properly implemented/finished (the channel abstraction) and also
something that is unused. An abstraction implemented for only one thing
is not really useful, except maybe to offer an example for others? But we
aren't providing a good example right now imo...
That being said, we can spend time fixing the channel subsystem, trying
to turn it into a nicer interface and fixing all the issues I've described
above (and I suspect there might be more) so the cell scheduler can play
nicely with channels. Or, we could rip them out, eliminating lots of code
and reducing our technical debt. I would like us to think seriously about
what we want, because that channel subsystem is _complicated_ and very
few of us fully understand it afaict.
Which would bring us back to this (which is btw basically what we have now):
conn inbuf -> circ queue -> conn outbuf
If we don't want to get rid of channels, the fixes are non-trivial. For
starters, we have to decide if we want to keep the channel queue or not,
and if yes, we need to almost start from square one in terms of testing,
because we would basically introduce a new layer of queuing cells.
So, this is the question I'm least sure about. Please take the following as
tentative.
I think that the two choices ("refactor channels" and "rip out channels")
may be less different than we think. Neither one is going to be trivial to
do, and we shouldn't assume that sticking everything together into one big
type will actually make the code _simpler_.
The way I think about the code right now, "channel" is an interface which
"connection_or" implements, and there is no meaningful barrier between
connection_or and channeltls. I _do_ like the idea of keeping some kind of
abstraction barrier, though: a "channel" is "whatever we can send and
receive cells from", whereas an "or_connection" has a lot of other baggage
that comes with it.
From my POV, we *should* definitely abolish the channels' queues, and
minimize the amount of logic that channels do on their own. I'm not sure if
we should rip them out entirely, or just simplify them a lot. I don't think
either is necessarily simpler or less bug-prone than the other.
Perhaps we should sketch out what the new interface would look like? Or
maybe do an hour or two worth of exploratory hacking on each approach?
I tend to agree with you here. I already did the exercise of removing the
channel queues, and I ran a relay on it for weeks without any apparent
problems. It removed 1000+ lines and made everything simpler.

So, I do think having the "channel" abstraction is good, both in terms of
future uses of channels and in terms of maintainability.

That being said, I think the next move for me will be to act on removing the
channel queues and to try to decouple channeltls and connection_or so we can
end up with a real abstraction. Achieving this will give us more
encapsulation between subsystems and thus let us provide strong guarantees
between them, i.e. between channel and scheduler.

I do think the current channel interface is "ok" for now but as we do this
exercise and refactoring, we'll discover more things which will just improve
the subsystem overall.
Post by Nick Mathewson
(This reminds me of another change I want someday, which is splitting
edge_connection_t into an "edge_connection" type that implements a "stream"
interface: right now, we have quite a few streams that aren't actually edge
connections, but which use the type anyway.)
* Dealing with the DESTROY cell design issue will require a bit more tricky
Post by David Goulet
work I think. We must not starve a circuit with a DESTROY cell pending to
be sent, else the other side keeps sending data. But we should also not
starve all the circuits, because if we ever need to send a gazillion
DESTROY cells in priority, we'll make the relay useless (DoS vector).
The question is: do we trust our EWMA policy to be wise enough to pick
the circuit in a reasonable amount of time so we can flush the DESTROY
cell from the circuit queue? Or do we really need to bypass or prioritize
that cell somehow in order to send it asap, to avoid load on
the network because the other side of the circuit is still sending?
So, elsewhere in the thread, folks have been discussing whether a circuit
that's going to send a DESTROY cell should flush its pending cells first.
The answer is sadly, "it depends".
Case 1: Suppose Alice is downloading a webpage. Suppose we are the middle
relay and we lose our connection to the exit. It would be nice to keep
flushing the data we have towards Alice -- maybe, if she can use partial
data. But any data that Alice sends to us would be lost, so it would be
good if we had some way to tell Alice "stop sending please".
Case 2: Suppose Alice is uploading something to a webserver. Suppose we are
the middle relay and we lose our connection from Alice. In this case,
there's no point in sending any more data towards the webserver before we
send it a DESTROY cell. (Even if Alice was in the middle of a big upload,
she'll need to repeat any part of it that wasn't ACKed, since she won't
know what was received and what wasn't.)
Case 3: Suppose we hit our OOM killer. In this case, we had better discard
all the data on the circuit we're killing, or we're vulnerable to "sniper
attacks" again.
So it's clear that sometimes we should dump the data, and sometimes we
shouldn't. I think this is an independent question from what we're asking
here. (My own take is that solving case 1 right requires "RELAY_TRUNCATED"
cells, which I believe we don't implement today.)
For the record, right now in tor, whether in case (1) or (2), when the middle
relay decides it needs to send a DESTROY, the circuit queues are cleared
(nothing is sent) and a DESTROY is sent very fast.
Post by Nick Mathewson
What we're asking here is: how can we reintegrate DESTROY cells with the
rest of the scheduler logic?
I think that, from a priority POV, DESTROY cells are in a certain sense the
_opposite_ of traffic, and we might actually want to treat them differently
from data cells. Consider that if we have a choice between DESTROYing a
busy circuit or a quiet one, we will save more bandwidth by destroying the
busy circuit first, so that no more data is sent to us over it.
On the other hand, this doesn't mean that the FIFO structure we have today
is a good idea at all. It probably makes sense to use the same priority
queue-based scheduler thing that we use everywhere else, but possibly with
a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict, because of the parent
relationship between cells via their digests (à la git). And that is of course
per circuit.)
Post by Nick Mathewson
One more thing to be aware of: the destroy_cell_queue exists in part
because we tear down circuits at the same time that we queue their destroy
cells. If we changed Tor so that "destroyed" circuits were kept around
somehow until their cells could be sent, then we'd be introducing a new
state to our state machine, to represent circuits that were schedulable but
not actually usable for traffic. We'd need to be careful to handle that
correctly: this kind of "unusable object that still exists" has caused us
problems before. (The solution I like best for avoiding this confusion is
to make it so the scheduler can schedule two types of "schedule-able"
things: circuits, and "pending destroy cells".)
After some thought, yes, I do think you are right here, for the two reasons
you mentioned:

1. DESTROY cells need to be considered differently in terms of priority
   depending on the scheduler. With KIST, for instance, we might want to
   bypass the kernel limit and just dump them asap. But we would also need to
   select the active circuit using a different (inverted) priority policy, as
   you suggested, to prioritize busy circuits.

2. We can't keep circuit objects alive while we wait for the destroy cell; the
   logic inside tor might become complicated, and it also opens a way for
   memory DoS if we ever have a DESTROY cell starvation problem in tor.

Then the problem becomes how those two queues (normal and DESTROY cells)
interact with each other. The issue at hand becomes possible starvation.

Right now, Tor goes back and forth between sending a DESTROY cell and a
normal cell (regardless of the circuit; in fact never for the same circuit).
The destroy cell queue is a FIFO, which is not ideal given what we've
discussed, but over time it does avoid starvation, because DESTROY cells
aren't picked by priority.
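That back-and-forth can be modeled with a toy sketch (names invented, the two queues reduced to plain counters; the real code tracks this with cmux->last_cell_was_destroy on the circuitmux):

```c
#include <assert.h>

/* Toy model of the current alternation between the per-cmux destroy FIFO
 * and the normal circuit queues (all names invented for this sketch). */
typedef struct {
  int destroy_cells;          /* cells waiting in the destroy FIFO */
  int normal_cells;           /* cells waiting across the circuit queues */
  int last_cell_was_destroy;  /* cf. cmux->last_cell_was_destroy */
} toy_cmux_t;

/* Returns 1 if the next cell handed out is a DESTROY, 0 otherwise.
 * When both queues are non-empty we alternate; otherwise we drain
 * whichever queue still has cells. */
static int toy_cmux_pick(toy_cmux_t *m)
{
  int pick_destroy;
  if (m->destroy_cells > 0 && m->normal_cells > 0)
    pick_destroy = !m->last_cell_was_destroy;
  else
    pick_destroy = (m->destroy_cells > 0);
  if (pick_destroy)
    m->destroy_cells--;
  else
    m->normal_cells--;
  m->last_cell_was_destroy = pick_destroy;
  return pick_destroy;
}
```

With both queues loaded, the picks strictly alternate destroy/normal/destroy/normal, which is why neither side starves today even though the destroy FIFO ignores priority.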

Thus, if the DESTROY queue becomes a priority queue, starvation becomes a
problem. As an extra difficulty, the priority in the normal cell queue can't
be compared to the one in the destroy cell queue, because the former evolves
over time while the latter is frozen in time, since the circuit simply doesn't
exist anymore. In other words, we can't consider the two queues' priorities
within the same policy.

One avenue of solution is having some sort of "throttling" or "threshold" up
to which the scheduler allows a queue to be emptied so that the other queue
doesn't starve. Chances are that the threshold in our case would be a maximum
number of cells before the destroy cell queue yields its time back to the
normal cell queue, which would also need a threshold. And to be efficient
there, there is a good chance the threshold would have to be adaptive to the
load of the relay.
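As a rough illustration of such a thresholded scheme (names invented, burst sizes fixed here, whereas the idea above is to make them adapt to relay load): each queue may emit at most a bounded burst of cells per round before yielding, so the two incomparable priority policies are never compared to each other directly:

```c
#include <assert.h>

/* Hypothetical sketch: two queues with incomparable priority policies,
 * interleaved by bounded bursts instead of by comparing priorities. */
typedef struct {
  int destroy_cells;   /* cells in the destroy priority queue */
  int normal_cells;    /* cells across the normal circuit queues */
  int destroy_burst;   /* max DESTROYs sent before yielding */
  int normal_burst;    /* max normal cells sent before yielding */
} toy_sched_t;

/* Drain one round: up to destroy_burst DESTROY cells, then up to
 * normal_burst normal cells.  Returns the number of cells sent. */
static int toy_sched_round(toy_sched_t *s)
{
  int sent = 0, i;
  for (i = 0; i < s->destroy_burst && s->destroy_cells > 0; i++) {
    s->destroy_cells--;   /* would pop the destroy pqueue here */
    sent++;
  }
  for (i = 0; i < s->normal_burst && s->normal_cells > 0; i++) {
    s->normal_cells--;    /* would pop the EWMA-best circuit here */
    sent++;
  }
  return sent;
}
```

Within each burst, each queue is free to use its own policy (inverted priority for destroys, EWMA for circuits), while the burst caps bound how long either can hold the scheduler.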

All in all, this isn't an easy problem, so simply switching the DESTROY cell
queue to a priority queue will need important changes.

We are getting into proposal land, so I'll stop for now, do some more
homework on the starvation problem in scheduling (vis-à-vis routing), and
write something up in a proposal so we can go from there. But at least now
we've come to some agreement and gathered more data points on how DESTROY
cells need to be handled in Tor.
Post by Nick Mathewson
Post by David Goulet
* In the short term, we should get rid of the Vanilla scheduler because it
complexifies the scheduler code a lot by adding unneeded things to channel_t,
and it also bloated the scheduler interface with pointless function pointers,
for instance. And in my opinion, it is not helping performance the way it is
done right now.
I agree with Roger here: it's fine to throw away the vanilla scheduler, but
we should wait until KIST has been running unproblematically in a stable
release for a while. 0.3.4 seems like a good time for this.
Agree.

Thanks!
David
Post by Nick Mathewson
--
Nick
Nick Mathewson
2017-11-16 14:06:03 UTC
Permalink
[...]
Post by David Goulet
Post by Nick Mathewson
On the other hand, this doesn't mean that the FIFO structure we have today
is a good idea at all. It probably makes sense to use the same priority
queue-based scheduler thing that we use everywhere else, but possibly with
a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict because of the parent
relationship between cells with their digest (à la git). And that is of course
per circuit.)
Are you sure? DESTROY cells aren't relay cells; they don't have relay
crypto done to them, and I think it's okay to re-order them with
respect to other cells. I don't think they have a digest on them, do
they?

peace,
--
Nick
David Goulet
2017-11-16 14:10:41 UTC
Permalink
Post by Nick Mathewson
[...]
Post by David Goulet
Post by Nick Mathewson
On the other hand, this doesn't mean that the FIFO structure we have today
is a good idea at all. It probably makes sense to use the same priority
queue-based scheduler thing that we use everywhere else, but possibly with
a different (inverted??) priority parameter for destroyed circuits.
(We kind of need the FIFO concept for cells afaict because of the parent
relationship between cells with their digest (à la git). And that is of course
per circuit.)
Are you sure? DESTROY cells aren't relay cells; they don't have relay
crypto done to them, and I think it's okay to re-order them with
respect to other cells. I don't think they have a digest on them, do
they?
OH sorry, I thought you were talking about the normal circuit queue here... I
misread.

But yes, as I mentioned later in that email, moving to a prio queue for
instance has starvation implications.

Sorry!
David
Post by Nick Mathewson
peace,
--
Nick