[tor-dev] Proposal 286: Controller APIs for hibernation access on mobile

On Thu, 30 Nov 2017 07:55:49 -0500

Post by Nick Mathewson
Filename: 286-hibernation-api.txt
Title: Controller APIs for hibernation access on mobile
Author: Nick Mathewson
Created: 30-November-2017
Status: Open

[snip]

Is this a general call for feedback/questions? If so, what do you have
in mind for Pluggable Transports?

Currently I can count on zero fingers the number of PTs that honor
hibernation state, or that have provisions for something like a
hibernation state.

I assume that if this was to be solved, the hibernation code would need
to tear down/respawn PTs, or someone needs to design an out-of-band IPC
mechanism between tor and PTs that can signal hibernation status.

The current approach to this problem involves toggling `DisableNetwork`.
See: https://trac.torproject.org/projects/tor/ticket/13213
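For reference, the DisableNetwork workaround amounts to a controller sending SETCONF over the control port when the platform goes to sleep or wakes up. The following is a minimal Python sketch, assuming an already-authenticated control connection; the helper names are hypothetical, and the reply reader does not handle multi-line data replies (lines beginning with "+").

```python
def setconf_disable_network(disabled):
    """Build the control-port line that sets DisableNetwork to 1 or 0."""
    return "SETCONF DisableNetwork={}\r\n".format(1 if disabled else 0).encode("ascii")


def read_reply(reader):
    """Read one control-port reply as a list of lines without CRLF.

    Every reply line starts with a 3-digit status code; intermediate
    lines have "-" after the code, and the final line has a space.
    """
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("control connection closed mid-reply")
        line = raw.decode("ascii").rstrip("\r\n")
        lines.append(line)
        if len(line) >= 4 and line[3] == " ":
            return lines


def set_network_disabled(sock, reader, disabled):
    """Send SETCONF DisableNetwork and raise unless tor answers 250."""
    sock.sendall(setconf_disable_network(disabled))
    reply = read_reply(reader)
    if not reply[-1].startswith("250"):
        raise RuntimeError("SETCONF failed: {}".format(reply))
```

A controller would call set_network_disabled(sock, reader, True) on a platform sleep notification and pass False on wake. The same reader object should be reused across calls, since a buffered reader may hold the start of the next reply.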

Regards,

--
Yawning Angel

teor

2017-11-30 21:35:24 UTC

Hi Nick,

Post by Nick Mathewson
Filename: 286-hibernation-api.txt
Title: Controller APIs for hibernation access on mobile
Author: Nick Mathewson
Created: 30-November-2017
Status: Open
1. Introduction
On mobile platforms, battery life is achieved by reducing
needless network access and CPU access. Tor currently provides
few ways for controllers and operating systems to tune its
behavior.
This proposal describes controller APIs for better management of
Tor's hibernation mechanisms, and extensions to those mechanisms,
for better power management in mobile environments.
1.1. Background: hibernation and idling in Tor today
We have an existing "hibernation" mechanism that we use to
implement "bandwidth accounting" and "slow shutdown" mechanisms.
When a Tor instance is close to its bandwidth limit, it stops
accepting new connections or circuits, and only processes those
it has, until the bandwidth limit is reached. Once the bandwidth
limit is reached, Tor closes all connections and circuits, and
all non-controller listeners, until a new accounting period
begins.
Tor handles the INT signal on relays similarly: it stops
accepting new connections or circuits, and gives the existing
ones a short interval in which to shut down. Then Tor closes all
connections and exits the process entirely.
Tor's "idle" mechanism is related to hibernation, though its
implementation is separate. When a Tor client has passed a
certain amount of time without any user activity, it declares
itself "idle" and stops performing certain background tasks, such
as fetching directory information, or building circuits in
anticipation of future needs. (This is tied in the codebase to
the "predicted ports" mechanism, but it doesn't have to be.)
1.2. Background: power-management signals on mobile platforms
(I'm not a mobile developer, so I'm about to wildly oversimplify.
Please let me know where I'm wrong.)
Mobile platforms achieve long battery life by turning off the
parts they don't need. The most important parts to turn off are
the antenna(s) and the screen; the CPU can be run in a slower
mode.
But it doesn't do much good turning things off when they're
unused, if some background app is going to make sure that they're
always in use! So mobile platforms use signals of various kinds
to tell applications "okay, shut up now".
Some apps need to do online background activities periodically;
to help this out, mobile platforms give them a signal "Hey, now
is a good time if you want to do that" and "stop now!"
1.3. Mostly out-of-scope: limiting CPU wakeups when idle.
The changes described here will be of limited use if we do not
also alter Tor so that, when it's idle, the CPU is pretty quiet.
That isn't the case right now: we have large numbers of callbacks
that happen periodically (every second, every minute, etc.)
whether they need to or not. We're hoping to limit those, but
that's not what this proposal is about.
2. Improvements to the hibernation model
To present a consistent interface that applications and
controllers can use to manage power consumption, we make these
enhancements to our hibernation model.
First, we add three new hibernation states: "IDLE",
"IDLE_UPDATING", "SLEEP", and "SLEEP_UPDATING".

That's four new hibernation states, not three.

Post by Nick Mathewson
In the "IDLE" state, Tor doesn't launch circuits or start any directory activity, but
its listeners are still open. Tor clients can enter the IDLE
state on their own when they are LIVE, but haven't gotten any
client activity for a while. Existing connections and circuits
are not closed. If the Tor instance receives any new connections,
it becomes LIVE.
"IDLE_UPDATING" is like IDLE, except that Tor should check for
directory updates as appropriate. If there are any, it should
fetch directory information, and then become IDLE again.
"SLEEP" is like the current "dormant" state we use for
bandwidth exhaustion, but it is controller-initiated: it begins
when Tor is told to enter it, and ends when Tor is told to leave
it. Existing connections and circuits are closed; listeners are
closed too.
"SLEEP_UPDATING" is like SLEEP, except that Tor should check for
directory updates as appropriate. If there are any, it should
fetch directory information, and then SLEEP again.
2.1. Relay operation
Relays and bridges should not automatically become IDLE on their
own.
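The automatic transitions in section 2 can be sketched as a small state machine. This is a hypothetical Python illustration, not Tor's code: the inactivity threshold is invented (the draft only says "for a while"), and directory fetching is reduced to a single completion event.

```python
from enum import Enum


class State(Enum):
    LIVE = "live"
    IDLE = "idle"
    IDLE_UPDATING = "idle-update"
    SLEEP = "sleep"
    SLEEP_UPDATING = "sleep-update"


# Hypothetical: the draft says "for a while" without giving a number.
IDLE_AFTER_SECONDS = 60 * 60


def on_tick(state, seconds_since_activity, is_relay):
    """Clients (never relays or bridges) drift from LIVE to IDLE on their own."""
    if (state is State.LIVE and not is_relay
            and seconds_since_activity >= IDLE_AFTER_SECONDS):
        return State.IDLE
    return state


def on_new_connection(state):
    """Any new connection wakes an idle instance. In the SLEEP states the
    listeners are closed, so no new connections arrive there."""
    if state in (State.IDLE, State.IDLE_UPDATING):
        return State.LIVE
    return state


def on_directory_fetch_done(state):
    """The *_UPDATING states drop back to their base state after fetching."""
    return {
        State.IDLE_UPDATING: State.IDLE,
        State.SLEEP_UPDATING: State.SLEEP,
    }.get(state, state)
```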
2.2. Onion service operation
When a Tor instance that is running an onion service is IDLE, it
does the minimum to try to remain responsive on the onion
service: It keeps its introduction points open if it can. Once a
day, it fetches new directory information and opens new
introduction points.

… and re-posts its descriptor?

And if an IP goes down, does it pick a new one?

Or if the descriptor expires?
How often does that happen in v2?
I think it happens after 3 hours by default in v3.

Post by Nick Mathewson
3. Controller hibernation API
3.1. Examining the current hibernation state
We define a new "GETINFO status/hibernation" to inspect the
current hibernation state. Possible values are:
- "live"
- "idle:control"
- "idle:no-activity"
- "sleep:control"
- "sleep:accounting"
- "idle-update:control"
- "sleep-update:control"
- "shutdown:exiting"
- "shutdown:accounting"
- "shutdown:control"

Why is there no reason for "live"?

Post by Nick Mathewson
"live" -- completely awake
"idle" -- waiting to see if anything happens
"idle-update" -- waiting to see if anything happens; probing
for directory information
"sleep" -- completely unresponsive

Missing sleep-update

Post by Nick Mathewson
"shutdown" -- unresponsive to new requests; still processing
existing requests.
The second part of each value indicates the reason that Tor
entered this state:
"control" -- a controller told us to do this.
"no-activity" -- Tor became idle on its own due to not
noticing any requests.
"accounting" -- the bandwidth system told us to enter this
state.
"exiting" -- Tor is in this state because it's getting ready
to exit.

Missing a reason corresponding to StartIdle

Post by Nick Mathewson
HIBERNATION
"STATUS=" (one of the status pairs above.)
Indicates that Tor's hibernation status has changed.
Note: Controllers MUST accept status values here that they don't
recognize.
The "GETINFO accounting/hibernating" value and the "STATUS_SERVER
HIBERNATION_STATUS" event keep their old meaning.
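A controller reading these values might split them on the first colon, and must not fail on values it doesn't know. A sketch, assuming GETINFO answers in the usual single-line "250-key=value" form and that the new event arrives as a "650 HIBERNATION STATUS=..." line; the function names are hypothetical.

```python
KNOWN_STATES = {"live", "idle", "idle-update", "sleep", "sleep-update", "shutdown"}
KNOWN_REASONS = {"control", "no-activity", "accounting", "exiting"}


def parse_status(value):
    """Split "state:reason" into (state, reason); "live" has no reason."""
    state, sep, reason = value.partition(":")
    return state, (reason if sep else None)


def is_recognized(state, reason):
    """Useful for logging only: unknown values must still be accepted."""
    return state in KNOWN_STATES and (reason is None or reason in KNOWN_REASONS)


def parse_getinfo_hibernation(reply_lines):
    """Extract the status from e.g. ["250-status/hibernation=idle:control", "250 OK"]."""
    prefix = "250-status/hibernation="
    for line in reply_lines:
        if line.startswith(prefix):
            return parse_status(line[len(prefix):])
    raise ValueError("reply has no status/hibernation value")


def parse_hibernation_event(line):
    """Parse an async event line such as "650 HIBERNATION STATUS=sleep:control"."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "650" or parts[1] != "HIBERNATION":
        raise ValueError("not a HIBERNATION event: {!r}".format(line))
    for token in parts[2:]:
        if token.startswith("STATUS="):
            return parse_status(token[len("STATUS="):])
    raise ValueError("HIBERNATION event without STATUS=")
```

Unrecognized pairs pass through parse_status unchanged, so a newer Tor can add states or reasons without breaking an older controller.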
3.2. Changing the hibernation state
We add the following new possible values to the SIGNAL controller
"SLEEP" -- enter the sleep state, after an appropriate
shutdown interval.
"IDLE" -- enter the idle state
"SLEEPWALK" -- If in sleep or idle, start probing for
directory information in the sleep-update or idle-update
state respectively. Remain in that state until we've
probed for directory information, or until we're told to
IDLE or SLEEP again, or (if we're idle) until we get client
activity. Has no effect if not in sleep or idle.
"WAKEUP" -- If in sleep, sleep-update, idle, idle-update, or
shutdown:sleep state, enter the live state. Has no effect
in any other state.
3.3. New configuration parameters
StartIdle -- Boolean. If set to 1, Tor begins in IDLE mode.
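The SIGNAL semantics above can be summarized as a transition function. A hypothetical Python sketch: states use the status names from section 3.1, SLEEP's "appropriate shutdown interval" is modeled as the "shutdown:sleep" state mentioned under WAKEUP, and the timer that ends that interval is left out. Treating SLEEP from an already-sleeping state as immediate is my reading, not something the draft states.

```python
# States that WAKEUP returns to "live", as listed in the draft.
WAKEUP_FROM = {"sleep", "sleep-update", "idle", "idle-update", "shutdown:sleep"}


def initial_state(start_idle):
    """StartIdle 1 means tor begins in IDLE rather than LIVE."""
    return "idle" if start_idle else "live"


def apply_signal(state, signal):
    if signal == "SLEEP":
        if state in ("sleep", "sleep-update"):
            return "sleep"  # nothing left open to shut down
        # Otherwise sleep only after a shutdown interval.
        return "shutdown:sleep"
    if signal == "IDLE":
        return "idle"
    if signal == "SLEEPWALK":
        # Only meaningful from sleep or idle; otherwise it has no effect.
        return {"sleep": "sleep-update", "idle": "idle-update"}.get(state, state)
    if signal == "WAKEUP":
        return "live" if state in WAKEUP_FROM else state
    raise ValueError("unknown signal: {!r}".format(signal))


def on_shutdown_interval_elapsed(state):
    """Called by a (not shown) timer once the shutdown interval is over."""
    return "sleep" if state == "shutdown:sleep" else state
```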

Michael Rogers

2017-12-01 10:56:11 UTC

Hi Nick,

Post by Nick Mathewson
2. Improvements to the hibernation model
To present a consistent interface that applications and
controllers can use to manage power consumption, we make these
enhancements to our hibernation model.
First, we add three new hibernation states: "IDLE",
"IDLE_UPDATING", "SLEEP", and "SLEEP_UPDATING".
In the "IDLE" state, Tor doesn't launch circuits or start any directory activity, but
its listeners are still open. Tor clients can enter the IDLE
state on their own when they are LIVE, but haven't gotten any
client activity for a while. Existing connections and circuits
are not closed. If the Tor instance receives any new connections,
it becomes LIVE.

Does receiving a new connection include receiving a rendezvous cell from
one of the instance's intro points? If not, do we need a new status
message to tell the controller about this, or is there an existing
message we can use?

Post by Nick Mathewson
2.2. Onion service operation
When a Tor instance that is running an onion service is IDLE, it
does the minimum to try to remain responsive on the onion
service: It keeps its introduction points open if it can. Once a
day, it fetches new directory information and opens new
introduction points.

If a client connects to the service, the service will need to build a
circuit to the rendezvous point. Does it fetch up-to-date directory
information before doing so? If so, there's a delay that may let the
client know the service was idle. Is that a problem?

Two other possibilities would be for the service to fetch directory
information every hour in case a client connects, or to build the
circuit using whatever information it has available, which may be up to
a day old. Is that a problem?

Post by Nick Mathewson
3.2. Changing the hibernation state
We add the following new possible values to the SIGNAL controller
command:
"SLEEP" -- enter the sleep state, after an appropriate
shutdown interval.
"IDLE" -- enter the idle state
"SLEEPWALK" -- If in sleep or idle, start probing for
directory information in the sleep-update or idle-update
state respectively. Remain in that state until we've
probed for directory information, or until we're told to
IDLE or SLEEP again, or (if we're idle) until we get client
activity. Has no effect if not in sleep or idle.
"WAKEUP" -- If in sleep, sleep-update, idle, idle-update, or
shutdown:sleep state, enter the live state. Has no effect
in any other state.

How does the controller find out when the Tor instance next needs to
fetch directory information (or post a hidden service descriptor) so it
can send a SLEEPWALK command at the right time? Or should the controller
just send the command periodically, maybe once an hour?

Cheers,
Michael

teor

2017-12-01 12:16:40 UTC

Post by Michael Rogers
Hi Nick,

For services, it should include receiving an INTRODUCE cell.

For clients, it should include application activity on the SOCKSPort
and similar.

Do clients fetch directory information on an application request?
It might have to get microdescs to get the right position on the HSDir
hash ring.
For a client .onion address request, does the client fetch the latest onion
service descriptor before connecting?

If a client connects to the service, the service will need to build a
circuit to the rendezvous point. Does it fetch up-to-date directory
information before doing so?

Interesting question.

It's not required, because the INTRODUCE cell contains all the
rendezvous point details. But I think we should be consistent,
and fetch a consensus and enough microdescs before performing
any client or service activity, just like we do when bootstrapping.
Otherwise, we'll end up with weird bugs.

(Services require microdescs to choose intro points, and to post
their descriptors to the right HSDirs on the hash ring.)

Post by Michael Rogers
If so, there's a delay that may let the
client know the service was idle. Is that a problem?

Mobile clients typically have high latency already.
If enough clients do this, it won't be a problem.

Post by Michael Rogers
Two other possibilities would be for the service to fetch directory
information every hour in case a client connects,

Even if it isn't a CPU drain, fetching info every hour would add to
network usage.

Post by Michael Rogers
or to build the
circuit using whatever information it has available, which may be up to
a day old. Is that a problem?

It increases the failure rate.

It also makes the client stick out a bit more. We could mitigate this by
allowing all clients to use stale directory information to build circuits.

But I still think it's best to be consistent and get enough directory info.

I'm trying to work out what the use case is here, and why SLEEPWALK
is a good solution.

If the controller sends SLEEPWALK, and Tor has nothing to do, it should
immediately return to IDLE or SLEEP.

If the controller puts the Tor instance in IDLE mode, it doesn't need to
issue a SLEEPWALK command every hour, because Tor will do the
minimum it needs to do to be connected.

If the controller wants to keep the directory information more up to date
(so it can "warm start" with less delay), issuing a SLEEPWALK every
2 hours should be sufficient. (Clients typically have consensuses that
expire in 1.5 - 3 hours.)
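The SLEEPWALK behavior teor describes (fetch only if something is stale, otherwise drop straight back to IDLE or SLEEP) could look like this. A sketch under the simplifying assumption that "something to do" means only "the consensus is past its valid-until time"; real Tor also tracks descriptors and onion-service work, and the function names are hypothetical.

```python
UPDATING = {"idle": "idle-update", "sleep": "sleep-update"}


def consensus_is_stale(now, consensus_valid_until):
    """Simplification: the only work tracked is keeping a live consensus."""
    return now >= consensus_valid_until


def handle_sleepwalk(state, now, consensus_valid_until):
    """Enter the *-update state only if there is something to fetch;
    otherwise stay in (i.e. immediately return to) IDLE or SLEEP."""
    if state not in UPDATING:
        return state  # SLEEPWALK has no effect outside idle/sleep
    if consensus_is_stale(now, consensus_valid_until):
        return UPDATING[state]
    return state
```

A "WARM" state would run the same staleness check automatically whenever the consensus expires, instead of waiting for each controller to pick its own SLEEPWALK interval.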

The more options that Tor provides, and the more unusual things a
controller tries to do, the more clients will stick out due to delays.
So I don't think SLEEPWALK is a good idea, because it allows every
different controller to pick a different update interval.

So if IDLE doesn't meet your needs, it would help us to know why. If
there's enough demand for it, it may be better to add a "WARM" state,
where Tor checks for directory documents whenever a consensus
expires, and otherwise acts like IDLE.

That might be better than every controller choosing a separate
SLEEPWALK interval. In fact, it would allow us to remove the SLEEPWALK
state in favour of "WARM".

Is there any reason that controllers would want to do a once-off
SLEEPWALK, rather than staying "WARM" under certain conditions?

I can imagine that if we are on battery and wifi, we would want "WARM".
(It's simpler than triggering periodic SLEEPWALKs.)

But if the user launches a Tor-using app, or unlocks the screen, would we
want to issue one SLEEPWALK? Or would we want to stay "WARM" the
entire time the app or screen was being used?

The difference is subtle: it would only become apparent after 1-2 hours of
continuous device use. In the "WARM" state, the consensus would still
be live. If SLEEPWALK was issued once a few hours ago, it wouldn't be.

(The nice thing about SLEEPWALK is that controllers don't have to tell Tor
to IDLE again. Maybe that's enough of a benefit to justify it?)

T

Michael Rogers

2017-12-05 18:12:39 UTC

Post by Nick Mathewson
When a Tor instance that is running an onion service is IDLE, it
does the minimum to try to remain responsive on the onion
service: It keeps its introduction points open if it can. Once a
day, it fetches new directory information and opens new
introduction points.

If a client connects to the service, the service will need to build a
circuit to the rendezvous point. Does it fetch up-to-date directory
information before doing so?

Interesting question.
It's not required, because the INTRODUCE cell contains all the
rendezvous point details. But I think we should be consistent,
and fetch a consensus and enough microdescs before performing
any client or service activity, just like we do when bootstrapping.
Otherwise, we'll end up with weird bugs.

Could/should this be done by reusing the existing bootstrapping process,
i.e. by reverting back to an earlier stage in the process and repeating
the rest of the process?

Post by Michael Rogers
If so, there's a delay that may let the
client know the service was idle. Is that a problem?

Mobile clients typically have high latency already.
If enough clients do this, it won't be a problem.

If the service needs to fetch a consensus and microdescs before it can
respond to a rendezvous cell, the delay could be far longer than the
difference in latency between a mobile phone and a laptop. So my point
is that the client will be able to tell that the service was woken from
idle by the rendezvous cell, which might have implications for the
service's anonymity.

For example, it lets the client know that the service isn't running on
the same device as another service the client recently connected to,
otherwise the device wouldn't have been idle. Maybe that's unavoidable,
or not worth avoiding, but I just wanted to flag the issue.

Maybe I've misunderstood the proposal, but I thought the intent was that
Tor wouldn't fetch anything in IDLE mode, and wouldn't automatically
change from IDLE to IDLE_UPDATING - it would need a SLEEPWALK signal to
tell it to change to IDLE_UPDATING, and then it would automatically
change back to IDLE when it was done.

I'm guessing that although limiting CPU wakeups is outside the scope of
this proposal, the SLEEPWALK mechanism is meant to be compatible with
some future changes where the device will be allowed to go into a sleep
state from which Tor can't wake it, and the controller will use the
platform's alarm API to schedule a SLEEPWALK signal to wake Tor so it
can perform its periodic tasks.

Post by teor
The more options that Tor provides, and the more unusual things a
controller tries to do, the more clients will stick out due to delays.
So I don't think SLEEPWALK is a good idea, because it allows every
different controller to pick a different update interval.

Rather than the controller picking an interval, would it be better for
Tor to specify (maybe in its response to the IDLE signal) when it next
needs to be woken?

Post by teor
So if IDLE doesn't meet your needs, it would help us to know why. If
there's enough demand for it, it may be better to add a "WARM" state,
where Tor checks for directory documents whenever a consensus
expires, and otherwise acts like IDLE.

Within the scope of this proposal that sounds like a good solution. But
if we're looking ahead to changes that allow the device to sleep without
shutting down Tor or disabling its network connectivity, then the
controller will need to be responsible for managing sleeps and wakeups,
which fits better with [my guess as to the intent of] the SLEEPWALK
mechanism than a WARM state.

Cheers,
Michael

teor

2017-12-05 22:18:12 UTC

If a client connects to the service, the service will need to build a
circuit to the rendezvous point. Does it fetch up-to-date directory
information before doing so?

Interesting question.
It's not required, because the INTRODUCE cell contains all the
rendezvous point details. But I think we should be consistent,
and fetch a consensus and enough microdescs before performing
any client or service activity, just like we do when bootstrapping.
Otherwise, we'll end up with weird bugs.

Could/should this be done by reusing the existing bootstrapping process,
i.e. by reverting back to an earlier stage in the process and repeating
the rest of the process?

Yes, this is what Tor already does when its clock changes or its directory
documents expire. We'll just have to make sure we don't break it!

Post by Michael Rogers
If so, there's a delay that may let the
client know the service was idle. Is that a problem?

Mobile clients typically have high latency already.
If enough clients do this, it won't be a problem.

If the service needs to fetch a consensus and microdescs before it can
respond to a rendezvous cell, the delay could be far longer than the
difference in latency between a mobile phone and a laptop. So my point
is that the client will be able to tell that the service was woken from
idle by the rendezvous cell, which might have implications for the
service's anonymity.
For example, it lets the client know that the service isn't running on
the same device as another service the client recently connected to,
otherwise the device wouldn't have been idle. Maybe that's unavoidable,
or not worth avoiding, but I just wanted to flag the issue.

We try to avoid attacks like this.
Or, if we can't, we try to minimise their effect.

But when multiple onion services or clients share a tor instance, they also
share the state of the consensus, directory documents, and guards.

Our best answer is probably: "don't share a tor instance if you want
unlinkable onion services".

Or: "don't IDLE if you want unlinkable onion services".
(Also, never lose your network connection.)

LIVE fetches directory documents so it always has a live consensus.
IDLE fetches directory documents just often enough to stay online.
SLEEP fetches nothing.
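teor's three-line summary maps onto a per-state fetch decision. A sketch, assuming two timestamps are known: the consensus valid-until time, and a hypothetical docs_usable_until marking when the current documents stop being good enough to build circuits. Real Tor schedules LIVE downloads before the consensus expires; this sketch only checks the deadline.

```python
def should_fetch(state, now, consensus_valid_until, docs_usable_until):
    """Decide whether to start a directory fetch in a given state.

    docs_usable_until is hypothetical: the time after which the documents
    we hold are no longer good enough to build circuits.
    """
    if state == "live":
        # Always keep a live consensus.
        return now >= consensus_valid_until
    if state == "idle":
        # Fetch only when we would otherwise drop offline.
        return now >= docs_usable_until
    if state == "sleep":
        return False
    raise ValueError("unexpected state: {!r}".format(state))
```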

Post by Michael Rogers
I'm guessing that although limiting CPU wakeups is outside the scope of
this proposal, the SLEEPWALK mechanism is meant to be compatible with
some future changes where the device will be allowed to go into a sleep
state from which Tor can't wake it, and the controller will use the
platform's alarm API to schedule a SLEEPWALK signal to wake Tor so it
can perform its periodic tasks.

Rather than the controller picking an interval, would it be better for
Tor to specify (maybe in its response to the IDLE signal) when it next
needs to be woken?

Or, "the latest time it can be woken to have directory documents with
property X", where X is some combination of:
* a live consensus
* a reasonably live consensus
* enough non-expired descriptors to build circuits
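One way to compute "the latest time it can be woken" is to take the earliest deadline among the requested properties. A sketch: the 24-hour allowance for a "reasonably live" consensus is an assumption here, and descriptors_expire_at stands in for whatever descriptor-expiry bookkeeping Tor would actually do.

```python
from datetime import timedelta

# Assumption: how long past valid-until a consensus still counts as
# "reasonably live". Illustrative value only.
REASONABLY_LIVE_SLACK = timedelta(hours=24)


def latest_wake_time(required, consensus_valid_until, descriptors_expire_at):
    """Latest time tor can be woken and still hold documents with every
    property in `required` ("live", "reasonably-live", "circuits")."""
    deadlines = {
        "live": consensus_valid_until,
        "reasonably-live": consensus_valid_until + REASONABLY_LIVE_SLACK,
        "circuits": descriptors_expire_at,
    }
    unknown = set(required) - set(deadlines)
    if unknown:
        raise ValueError("unknown properties: {}".format(sorted(unknown)))
    if not required:
        raise ValueError("need at least one property")
    # Every property must still hold, so the earliest deadline wins.
    return min(deadlines[p] for p in required)
```

A controller could hand the result straight to the platform's alarm API, which keeps the reasoning about *why* that time matters inside Tor.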

We do need a use case here :-)

And yes, I agree that the controller should be able to manage wakeups.

T

Michael Rogers

2017-12-06 16:21:21 UTC

Post by Michael Rogers
If the service needs to fetch a consensus and microdescs before it can
respond to a rendezvous cell, the delay could be far longer than the
difference in latency between a mobile phone and a laptop. So my point
is that the client will be able to tell that the service was woken from
idle by the rendezvous cell, which might have implications for the
service's anonymity.
For example, it lets the client know that the service isn't running on
the same device as another service the client recently connected to,
otherwise the device wouldn't have been idle. Maybe that's unavoidable,
or not worth avoiding, but I just wanted to flag the issue.

We try to avoid attacks like this.
Or, if we can't, we try to minimise their effect.
But when multiple onion services or clients share a tor instance, they also
share the state of the consensus, directory documents, and guards.
Our best answer is probably: "don't share a tor instance if you want
unlinkable onion services".
Or: "don't IDLE if you want unlinkable onion services".
(Also, never lose your network connection.)

Sounds reasonable. Maybe something to this effect could be added to the
proposal, so app developers know what to expect in terms of linkability?

Could a long delay between receiving a rendezvous cell and responding
cause any other issues? For example, is there a high probability of the
client timing out before the service has fetched enough directory info
to be able to respond? If so, maybe it's worthwhile for the service to
be more proactive about keeping its directory info fresh?

Post by Michael Rogers
Maybe I've misunderstood the proposal, but I thought the intent was that
Tor wouldn't fetch anything in IDLE mode, and wouldn't automatically
change from IDLE to IDLE_UPDATING - it would need a SLEEPWALK signal to
tell it to change to IDLE_UPDATING, and then it would automatically
change back to IDLE when it was done.

LIVE fetches directory documents so it always has a live consensus.
IDLE fetches directory documents just often enough to stay online.
SLEEP fetches nothing.

OK, so I guess the use case for SLEEPWALK is telling Tor to fetch a live
consensus and microdescs when it otherwise wouldn't have done - i.e. it
allows the controller to manage the freshness of the directory info?

But I'm really just guessing here. Nick, can you clarify?

Post by Michael Rogers
Rather than the controller picking an interval, would it be better for
Tor to specify (maybe in its response to the IDLE signal) when it next
needs to be woken?

Or, "the latest time it can be woken to have directory documents with
property X", where X is some combination of:
* a live consensus
* a reasonably live consensus
* enough non-expired descriptors to build circuits

Yup, I think that makes sense - it achieves [what I guess is] the
purpose of SLEEPWALK while keeping knowledge about *why* Tor needs to be
woken at that time encapsulated within Tor, which is an improvement.

We do need a use case here :-)
And yes, I agree that the controller should be able to manage wakeups.

OK, I have two use cases. They go beyond the scope of this proposal
because they're also concerned with CPU wakeups, but I'm not sure we can
really design the controller API without considering CPU wakeups at all.

The first use case is saving power by putting the device to sleep, while
keeping a hidden service available.

"Sleep" on Android is similar to suspend on Linux (for recent Android
kernels it's identical). User-space code is paused and the kernel only
responds to a limited set of interrupts, including network activity and
alarms.

Entering this state without disabling Tor's network connectivity causes
it to panic when the device wakes up - its libevent timers don't fire
during sleep, so it thinks the clock has jumped. Just suppressing that
panic reaction doesn't seem like a good idea. It would be better if we
could warn Tor that the device is about to sleep, so it should stop
building circuits and making connections, and should expect the clock to
jump. This was my original motivation for asking for an "idle mode".
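A sketch of the warning Michael asks for: a hypothetical watchdog that treats a large gap between timer ticks as a clock jump, unless the controller said beforehand that the device was about to sleep. The tolerance value and the "resume" action are invented for illustration; this is not how Tor's own clock-jump detection is written.

```python
# Hypothetical: seconds of unexplained gap before we call it a clock jump.
JUMP_TOLERANCE = 60.0


class ClockWatch:
    """Classify each periodic tick as normal, a planned sleep, or a jump."""

    def __init__(self, now):
        self.last = now
        self.expect_sleep = False

    def prepare_for_sleep(self):
        """Hypothetical controller hint: the device is about to suspend."""
        self.expect_sleep = True

    def tick(self, now, expected_interval=1.0):
        elapsed = now - self.last
        self.last = now
        if abs(elapsed - expected_interval) <= JUMP_TOLERANCE:
            return "ok"
        if self.expect_sleep:
            # We were warned: resume calmly, e.g. by refreshing directory info.
            self.expect_sleep = False
            return "resume"
        return "clock-jump"
```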

If Tor tells the controller when it needs to be woken then the
controller can schedule an alarm.

The second use case is similar to the first, except the controller has
less flexibility about scheduling alarms because the device is in "doze
mode". This is a new power-saving mode on Android 6 that imposes extra
restrictions beyond normal sleep, such as ignoring wake locks, shutting
off network access and postponing alarms. If the user adds the app to a
whitelist then it can hold wake locks and use the network, but it's
still limited to scheduling alarms once every 15 minutes.

(This is totally braindead, by the way - if you want to do one second of
work every 14 minutes, your only choice is to hold a wake lock that
keeps the device awake the whole time, whereas before the new "power
saving" restrictions you could have scheduled an alarm and let the
device sleep. But such is Android.)

So for this use case, the controller can't wake Tor more than once every
15 minutes. If we're only waking Tor to fetch directory info and publish
HS descriptors, that's probably fine. But other ideas we considered,
like waking Tor once per minute so it can keep behaving more or less
normally, won't work in doze mode.
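Combining the two ideas above, a controller might turn Tor's "wake me by" time into an alarm, falling back to a wake lock when doze mode won't let an alarm fire soon enough. A sketch, assuming Tor reports such a deadline (the thread only proposes this) and using the 15-minute doze limit described above; plan_wakeup is a hypothetical name.

```python
from datetime import timedelta

# From the thread: under doze, even whitelisted apps can schedule
# alarms at most once every 15 minutes.
DOZE_MIN_ALARM_GAP = timedelta(minutes=15)


def plan_wakeup(tor_deadline, now, last_alarm, dozing):
    """Return ("wake-now", None), ("alarm", when) or ("wake-lock", until)."""
    if tor_deadline <= now:
        return ("wake-now", None)
    earliest = now
    if dozing and last_alarm is not None:
        earliest = max(earliest, last_alarm + DOZE_MIN_ALARM_GAP)
    if tor_deadline >= earliest:
        return ("alarm", tor_deadline)
    # Doze won't let an alarm fire that soon: the only option left is to
    # hold a wake lock (keeping the device awake) until the deadline.
    return ("wake-lock", tor_deadline)
```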

In both scenarios (sleep and doze), if there's any network activity
before the next alarm, the controller can grab a wake lock to keep the
device awake while Tor handles the network activity. There are two ways
we could achieve that: Tor could inform the controller about the network
activity, or Tor could connect to the network through a localhost proxy
provided by the controller, which would manage the wake locks transparently.
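For the localhost-proxy option, the controller needs to hold a wake lock only while at least one proxied connection is busy. A minimal reference-counting sketch; acquire_fn and release_fn are hypothetical stand-ins for the platform's wake-lock API, and the proxy itself is not shown.

```python
import threading


class WakeLockRefCount:
    """Hold one platform wake lock while any proxied connection is active."""

    def __init__(self, acquire_fn, release_fn):
        self._acquire = acquire_fn  # hypothetical platform hook
        self._release = release_fn  # hypothetical platform hook
        self._count = 0
        self._lock = threading.Lock()

    def __enter__(self):
        with self._lock:
            self._count += 1
            if self._count == 1:  # first active connection
                self._acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._count -= 1
            if self._count == 0:  # last active connection finished
                self._release()
        return False
```

Each proxy thread would wrap its relaying of a connection in `with guard:`, so the device stays awake exactly as long as Tor has network traffic in flight.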

Cheers,
Michael

meejah

2017-12-08 16:37:17 UTC