[tor-dev] Proposal 295: Using the ATL construction for relay cryptography (solving the crypto-tagging attack)

Discussion:

Nick Mathewson

2018-07-03 21:41:25 UTC

Hi, all!

I've got the privilege to relay the following proposal from Tomer
Ashur: it reflects a whole lot of work. I didn't make any changes,
other than wrapping the lines: please let me know if corrupted
anything.

Filename: 295-relay-crypto-with-atl.txt
Title: Using the ATL construction for relay cryptography (solving the
crypto-tagging attack)
Author: Tomer Ashur
Created: 09 Apr 2018
Status: Open

0. Context

Although Crypto Tagging Attacks were identified already in the
original Tor design, it was not before the rise of the Procyonidae in
2012 that their severity was fully realized. In Proposal 202 (Two
improved relay encryption protocols for Tor cells) Nick Mathewson
discussed two approaches to stymie tagging attacks and generally
improve Tor's cryptography. In Proposal 261 (AEZ for relay
cryptography) Nick put forward a concrete approach which uses the
tweakable wide-block cipher AEZ.

1. Motivation

For motivations, see proposal 202.

2. Summary and preliminaries

This proposal suggests an alternative approach to Proposal 261 using
the notion of Release (of) Unverified Plaintexts (RUP-security). It
describes an improved algorithm for circuit encryption based on
CTR-mode which is already used in Tor, and an additional component
for hashing. When this additional component is GHash, a GCM-based
solution can be used from the OpenSSL library. If a solution that is
based solely on components that already exist in Tor is desirable,
SHA-1 can be used for hashing (however, this is highly discouraged as
SHA-1 is broken!).

Incidentally, and similar to Proposal 261, this proposal employs the
ENCODE-then-ENCIPHER approach thus it improves Tor's E2E integrity by
using (sufficiently enough) redundancy.

For more information about the scheme and a security proof for its
RUP-security see

Tomer Ashur, Orr Dunkelman, Atul Luykx: Boosting Authenticated
Encryption Robustness with Minimal Modifications. CRYPTO (3) 2017:
3-33 available online at https://eprint.iacr.org/2017/239 .

2.1 An instructive 1-node circuit

Noting that in CTR-mode the nonce is essential for the
encryption/decryption operations, the proposed solution is based on
encrypting its value in such a way that any change to the ciphertext
or the tag (e.g., through a tagging attack) would randomize the nonce
and result in a random value as the output of CTR-mode.

The basic crypto components of the proposal are:
* a block cipher E(·)
* A universal hash function, H(·);

Using the block cipher E(·) and a non-repeating nonce N, one can
construct the CTR mode-of-operation:

CTR(N) = E(N)||E(N+1)||...||E(N+l)

For simplicity, let us for now consider normal encryption using
CTR-mode (This is akin to building a circuit with a single node. In
the sequel we will see what happens when this idea is used in a 3-node
circuit):

* Input: a nonce N, a message M = (m_0,...,m_l)
1. (z_0,...,z_l) = CTR(N)
2. (c_0,...,c_l) = (z_0 ^ m_0, ... z_l ^ m_l)

The client sends C = (c_0,...,c_l) to the receiver which repeats Step
(1) to generate the random stream Z = (z_0,...,z_l) and recovers the
message through M = C ^ Z. A protocol using this scheme is vulnerable
to the crypto tagging attack due to the malleability of
CTR-mode. Indeed, since Z depends only on N rather than on the
message M itself, a change to a ciphertext bit affects the respective
plaintext bit and nothing else.

Our solution is based on the idea that linking N and C makes it
impossible to change C without affecting N (and hence Z) in such a
way that it is impossible to recover M. This can be done by extending
the protocol in the following way:

3. X = H(C)
4. S = X ^ E(N ^ X)

Now, instead of sending C, the client sends C||S to the receiver. To
decrypt, the receiver first repeats Step (3) to obtain X, then
inverts Step (4) to obtain N: N = D(S ^ X) ^ X (where D(·) is the
inverse function of E(·)). Then, having obtained N, the receiver
continues to decrypt C as before. If the receiver already knows N
(e.g., in the case of a synchronized nonce), they can authenticate C
by comparing N = D(S ^ X) ^ X to the nonce they expect.

Let us consider what happens when a change is made to C. Let C' be a
variant of C with one or more bits flipped. Since H(·) is a universal
hash function, X' = H(C' ≠ C) is random. Trying to execute D(S ^ X')
^ X' would result in a random N' ≠ N which in turn would result in a
random Z' (and hence a random M'). Similarly for a change in S.

2.2 Extending to a 3-node Tor circuit

The Tor protocol uses layered encryption to encapsulate the
message. Generally speaking, this can be written as:

* input: synchronized nonces (N^0, N^1, N^2), a message M = (m_0,...,m_l)
1. (z^2_0,...,z^2_l) = CTR(N^2)
2. (c^2_0,...,c^2_l) = (z^2_0 ^ m_0, ... z^2_l ^ m_l)
3. (z^1_0,...,z^1_l) = CTR(N^1)
4. (c^1_0,...,c^1_l) = (z^1_0 ^ c^2_0, ... z^1_l ^ c^2_l)
5. (z^0_0,...,z^0_l) = CTR(N^0)
6. (c^0_0,...,c^0_l) = (z^0_0 ^ c^1_0, ... z^0_l ^ c^0_l)

The crypto-tagging stems from the fact that a change to C affects M
directly since all z^j_i depend only on N^j rather than on C^j.

An extension of the 1-layer solution presented in Section 2.1 would
look like this:

* Input: a nonce N, a message M = (m_0,...,m_l)
1. (z^2_0,...,z^2_l) = CTR(N)
2. (c^2_0,...,c^2_l) = (z^2_0 ^ m_0, ... z^2_l ^ m_l)
3. X^2 = H(C^2)
4. S^2 = X^2 ^ E(N^2 ^ X^2)

5. (z^1_0,...,z^1_l) = CTR(S^2)
6. (c^1_0,...,c^1_l) = (z^1_0 ^ c^2_0, ... z^1_l ^ c^2_l)
7. X^1 = H(C^1)
8. S^1 = X^1 ^ E(S^2 ^ X^1)

9. (z^0_0,...,z^0_l) = CTR(S^1)
10. (c^0_0,...,c^0_l) = (z^0_0 ^ c^1_0, ... z^1_l ^ c^0_l)
11. X^0 = H(C^0)
12. S^0 = X^0 ^ E(S^1 ^ X^0)

The client sends the message C^0||S^0 = ((c^0_0,...,c^0_l)||S^0) to
the guard. To decrypt, the guard repeats Step (11) and inverts Step
(12) to obtain S^1 which it uses to generate Z^0 and decrypt C^0 into
C^1. The guard then forwards C^1||S^1 to the middle node which
repeats this process and forwards C^2||S^2 to the exit. The exit
removes the final encryption layer and processes the cell as in the
old protocol.

Let us now consider what happens when the crypto tagging attack is
applied to this protocol. Without loss of generality, after inverting
Step (11) the guard tags either C^1 or S^1 and forwards C'^1||S'^1 to
the middle node. The middle node, unaware of the change follows the
protocol to decrypt C'^1||S'^1 which results in a random string as
per the explanation above. Since both CircID and CMD are not part of
the payload, the middle node can still forward the random string
(unaware of this fact) to the exit node. Upon receiving the random
string, the exit node decrypts it, sees that the 'Rec' field is not
all-zero and acts accordingly.

3. Design considerations

3.1 Choice of primitives

We stress that the proposed protocol is primitive agnostic.

Noting that Tor currently uses AES_CTR we tried to design a solution
that requires only minimal changes. Most importantly, the encryption
is still performed using CTR-mode, and can be instantiated using
AES_CTR (however, the solution works just as well with any other
block cipher).

The hashing of the ciphertext requires a universal hash function. The
GCM mode of operation uses CTR+GHash and is available from the
OpenSSL crypto library which is already used within Tor. Any
available universal hash function can be used instead of GHash if the
latter introduces an unacceptable latency.

The nonce-encryption step uses a tweakable block cipher. In the
example above we used the XEX trick to transform a "regular" block
cipher into a tweakable block cipher which allows reusing whatever
underlying primitive is used in CTR-mode. A tweakable block cipher
can be used directly (with X as the tweak) if one is available but
this is not required.

3.2 Security requirements

In Proposal 202, Nick Mathewson outlined the security requirements
for any solution. We copy them here verbatim and discuss how each of
them is addressed in the proposed solution:

* " ... we'd like any attacker who controls a node on the circuit
not to have a good way to learn any other nodes, even if he/she
controls those nodes" - By construction, once processed by an
honest node the cell's payload is completely destroyed, thus
unlinking any relation between what is seen by nodes further
down the circuit and the original message. An adversary
controlling the exit node can still see that the circuit turned
into junk (borrowing the professional jargon of Proposal 202);
this seems unavoidable (if any of the other proposals somehow
solved this, we would like to know so we can see if it can
somehow be adapted here.

* "no relay should be able to alter a cell between an honest sender
and an honest recipient in a way that they cannot detect" - Any
alternation of a cell between a sender and a recipient will be
decrypted to junk and never delivered to the recipient. (**)

* "Relay cells should be secret: nobody but the sender and
recipient of a relay cell should be able to learn what it says."
- since the protocol is an extension of the existing one,
whatever secrecy was present before remains so.

* "... Circuits should resist transparent, recoverable tagging
attacks... " - that's the whole point. Once a change was injected
and passed to an adjacent honest node, no node further down the
circuit can recover the relay cell.

* "... The above properties should apply to sequences of cells
too... " - this one is more tricky. To the best of my
understanding this property is currently ensured through the
'Digest' field in the payload. Keeping this field, our solution
can provide the same functionality as before. However,
re-arranging the 'Rec' and 'Digest' fields in a way similar to
Proposal 261 would reduce the overhead but would require
developing a different solution. It seems possible though. (*)

In addition to supporting these security requirements, this proposal
improves Tor's E2E authentication by replacing the broken SHA-1 with
an ENCODE-then-ENCIPHER authentication. By introducing redundancy to
the cell (either through the 'Rec' and 'Digest' fields or otherwise)
the exit node can verify that the cell was not altered en route. This
is similar to what is done in Proposal 261 but without the trouble of
using a tweakable wide-block cipher.

(*) a possible solution would be to digest X as part of H(C) but
this requires further discussion.

(**) my understanding is that a single circuit can be used for
more than one TCP stream. If this is not the case, then the
recipient receives a random string and can identify that this is
not a valid message.

3.3 Feature requirements

In addition to security requirements, Proposal 202 also discusses
some feature requirements. We discuss two of them here as the others
seems to be trivially supported (although this needs to be verified
by someone who understands Tor better than me).

* "Any new approach should be able to coexist on a circuit with the
old approach" - following the IRC discussion, we had an internal
discussion and we think that the proposed solution works even if
only the middle node supports the new protocol. This involves
encrypting the nonce only for the middle nonce. We came up with
two ways to achieve this, both have their pro's and con's and
should be discussed once we agree on the general idea.

Since the Crypto tagging attack requires a collusion between a
corrupted guard and a corrupted exit, it does not make sense to
discuss what happens when one of them runs the new protocol.

* "Cell length needs to be constant as cells move through the
network." - The solution was designed to allow for a fixed cell
size. this should be discussed though.

4. Specifications
TBD

5. Compatibility
See Section 3.3

6. Implementation
TBD

7. Performance and scalability notes
TBD

Nick Mathewson

2018-07-04 18:16:09 UTC

Permalink

Oh no! I uploaded the wrong version of this. I'll replace it once I
reformat the new one, but in the meantime, I'm attaching (what I hope
is) the latest version from Tomer.

Nick Mathewson

2018-07-04 20:42:31 UTC

Permalink

Here is (I hope) the correct version of this proposal, reformatted.
Filename: 295-relay-crypto-with-atl.txt
Title: Using ADL-GCM for relay cryptography (solving the crypto-tagging attack)
Author: Tomer Ashur
Created: 09 Apr 2018
Status: Open

0. Context

Although Crypto Tagging Attacks were identified already in the
original Tor design, it was not before the rise of the Procyonidae in
2012 that their severity was fully realized. In Proposal 202 (Two
improved relay encryption protocols for Tor cells) Nick Mathewson
discussed two approaches to stymie tagging attacks and generally
improve Tor's cryptography. In Proposal 261 (AEZ for relay
cryptography) Mathewson puts forward a concrete approach which uses
the tweakable wide-block cipher AEZ.

1. Motivation

For motivations, see proposal 202.

2. Summary and preliminaries

This proposal suggests an alternative approach to Proposal 261 using
the notion of Release (of) Unverified Plaintexts (RUP-security). It
describes an improved algorithm for circuit encryption based on
CTR-mode which is already used in Tor, and an additional component
for hashing. When this additional component is GHash, a GCM-based
solution can be used from the OpenSSL library.

Incidentally, and similar to Proposal 261, this proposal employs the
ENCODE-then-ENCIPHER approach thus it improves Tor's E2E integrity by
using (sufficiently enough) redundancy.

For more information about the scheme and a security proof for its
RUP-security see

Tomer Ashur, Orr Dunkelman, Atul Luykx: Boosting Authenticated
Encryption Robustness with Minimal Modifications. CRYPTO (3)
2017: 3-33 available online at https://eprint.iacr.org/2017/239 .

2.1. Cryptographic algorithms

As an encryption primitive we use AES. We denote one call to the
primitive by E(·), e.g.,

Z = E_k(X)

where k is a K-bit key, X is the input to AES and Z is the output. We
omit k where it is irrelevant or is clear from context. While we
propose to set K=256 to ensure long-term post-quantum resistance, we
stress that the solution also works for other key lengths.

To encrypt, we use AES is counter mode and a nonce N to produce a
random bit stream, e.g.,

CTR(N) = E(N)||E(N+1)||...||E(N+l) = z_1||z_2||...||Z_{n+1} = Z

and XOR Z with the message M to produce the ciphertext C, e.g.,

C = Z ^ M

For authentication, we use the universal hash function GHASH and a
key k, e.g.,

T = H(K||X) = H_k(X)

where K is a 256-bit key, X an arbitrary length string, and T a
unique authentication tag for X. While we propose to set K=256 to
ensure long-term post-quantum resistance, we stress that the solution
works for any universal hash function as well as for keys of length
K!=256.

3. The proposed solution

3.1 An overview of the proposed solution

Noting that in CTR-mode the nonce is essential for the
encryption/decryption operations, the proposed solution is based on
linking the nonce and the transmitted data in such a way that any
change to the data (e.g., through a tagging attack) randomizes the
nonce and results in a random value Z' != Z for CTR(N' != N).

3.1.1. 'Forward' direction (same direction as CREATE):

When preparing a message to be sent in the forward direction, the
client is responsible for encrypting all layers. The client starts by
digesting the message to create a synthetic tweak T_0 = H_{k_fM}(M)
and uses this tweak to encrypt an all-zero string to create a
synthetic IV: S_0 = T_0 ^ E_{k_fE}(0 ^ T_0).

The message and the synthetic IV (i.e., M||S_0) are passed as input
to the first encryption layer which uses S_0 to produce the random
stream Z = CTR_k1(S_0) and the ciphertext C = Z ^ M. Then, a tweak T
= H_k2(C) is generated by digesting the ciphertext of this
layer. Finally, S_0 is encrypted using AES and the tweak T to produce
an encrypted nonce S = T ^ E_k3(N ^ T) which is appended to the the
ciphertext and is used as the nonce for the next encryption layer.

After encrypting 3 times, the client sends C||S to the guard. Whereas
the protocol previously required that the nonce be kept in sync
between each node and the client, this is no longer required as it
can now be recovered from C||S. To decrypt, the node first computes T
= H_k2(C) then uses it to obtain S = D_k3(S ^ T) ^ T. Finally, CTR(S)
is used to decrypt C into M. The once-decrypted nonce is appended to
the data and M||S is sent to the next node which repeats this
process.

The exit verifies the integrity of the message by digesting it and
using the resulting T to decrypt S_0 into N. If N=0, the message is
authenticated. Otherwise, if the message has been changed by an
adversary, N!=0 and the circuit is destroyed.

3.1.2 'Back' direction (opposite direction from CREATE):

When sending back a message from the exit to the client, each node
adds an additional layer of encryption. The exit starts by digesting
the message to create a synthetic tweak T_0 = H_{k_bM}(M) and uses
this tweak to encrypt an all-zero string to create a synthetic IV:
S_0 = T_0 ^ E_{k_bE}(0 ^ T_0). The synthetic IV is then used to
produce the random stream Z = CTR_k1(S_0) and uses that to encrypt
the plaintext through C = Z ^ M. The nonce is appended to the
ciphertext and C||S_0 is sent to the next node.

Receiving a message from the exit, the middle starts by producing a
tweak T = H_k2(C). The tweak is then used to encrypt the nonce
through S = T ^ E_k3(N ^ T) which is then used to produce the random
stream Z' = CTR_k1(S). Finally, the message is encrypted via C' = Z'
^ C and the encrypted nonce is appended to C'. The message C'||S is
sent to the next node which repeats this procedure.

When the message arrives at the client, the client peels each layer
using the nonce that is appended at the end of the message. The
decrypted layer is used to compute the tweak T = H_k2(P), which is
used to decrypt this layer's nonce into the next layer's nonce.

The client verifies the integrity of the message by digesting it and using
the resulting T to decrypt S_0 into N. If N=0, the message is
authenticated. Otherwise, if the message has been changed by an adversary,
N!=0 and the circuit is destroyed.

3.2 Circuit management

3.2.1 Creating circuits

The general process of creating a circuit remains the same (see
tor-spec.txt), except for step 5 where instead of creating a pair of
keys (kf_1,kb_1) for forward and backward communication,
respectively, 2 triplets of keys (kf_1,kf_2,kf_3) and
(kb_1,kb_2,kb_3) are created for communication in the forward and
backward directions, respectively. Two additional pairs of
"authentication keys" (k_fM,k_fE) and (k_bM,k_bE) are created between
the client and the exit in the forward and backward directions,
respectively.

note: the keys can be created by running KDF-RFC5869 3 times in each
direction, or by running it once in each direction to create a
master-key and derive the rest as subkeys from this master key

3.2.2 Routing relay cells

When an OR receives a RELAY or RELAY_EARLY cell, it checks the cell's
circID and determines whether it has a corresponding circuit along
that connection. If not, the OR drops the cell.

Otherwise, if the OR is not at the OP edge of the circuit (that is,
either an 'exit node' or a non-edge node), it de/encrypts the payload
with the stream cipher, as follows (the message structure is C||S):

'Forward' relay cell (same direction as CREATE)
(input structure is: C||S):

1. T = H_{kf_2}(T'||C); digest ciphertext (T' is a running digest
of all previous cells, i.e., last round's T)

2. N = T ^ D_{kf_3}(S ^ T); decrypt nonce

3. M = CTR_{kf_1}(N) ^ C; decrypt message
(output structure is M||N)

'Back' relay cell (opposite direction from CREATE)
(input structure is: M||N):

1. T = H_{kb_2}(T'||M); digest ciphertext (T' is a running digest
of all previous cells, i.e., last round's T)

2. S = T ^ E_{kb_3}(N ^ T); encrypt nonce

3. C = CTR_{kb_1}(S) ^ M; encrypt message
(output structure is C||S)

Note that in counter mode, decrypt (D) and encrypt (E) are the same
operation.

The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in Section 4.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.

4. Application connections and stream management

4.1 Relay cells

Within a circuit, the OP and the exit node use the contents of RELAY
packets to tunnel end-to-end commands and TCP connections ("Streams")
across circuits. End-to-end commands can be initiated by either
edge; streams are initiated by the OP.

The payload of each unencrypted RELAY cell consists of:
Relay command [1 byte]
'Recognized' [2 bytes]
StreamID [2 bytes]
Length [2 bytes]
Data [PAYLOAD_LEN-23 bytes]

The 'recognized' field is used for a simple indication if the cell
still encrypted or not. When sending cells, the unencrypted
'recognized' MUST be set to zero.

When receiving and decrypting cells the 'recognized' will always be
zero if we're the endpoint that the cell is destined for. For cells
that we should relay, the 'recognized' field will usually be nonzero,
but will accidentally be zero with P=2^-32.

The old Digest field is removed, since sufficient information for
authentication is now included in the all-zero string used to create the
nonce for the first encryption layer.

The 'Length' field of a relay cell contains the number of bytes in
the relay payload which contain real payload data. The remainder of
the payload is padded with NUL bytes.

4.2 Appending the encrypted nonce

When a cell is prepared by the client to be sent in the 'forward'
direction, as explained in Section 3.2.2, the encrypted nonce S is
appended to the encrypted cell (occupying the last 16 bytes of the
cell). If the cell is prepared to be sent to a node supporting the
new protocol, S is used as the nonce for the next round's
encryption. Otherwise, if the next node only supports the old
protocol, S is still appended to the encrypted cell, but a
synchronized nonce (as per the old protocol) is used for encryption.

When a cell is sent along the circuit in the 'backward' direction,
nodes supporting the new protocol always assume that the last 16
bytes of the input are the nonce used by the previous node, which
they process as per Section 3.2.2. If the previous node also supports
the new protocol, these cells are indeed the nonce. If the previous
node only supports the old protocol, these bytes are either encrypted
NUL bytes or encrypted data.

4.3 Authenticating the message

End-to-end authentication is performed in both directions in the same
way. Once the final layer of encryption is removed, the
message-to-be-authenticated is digested and this digest is used to
decrypt the nonce used in this layer. If the decrypted nonce is
all-zero, the message is authentic. Otherwise, if a change was made
to the ciphertext or the encrypted nonce somewhere along the circuit,
the nonce decrypts into a random value and the circuit is destroyed.

teor

2018-07-20 07:27:42 UTC

Permalink

Hi,

A few of us discussed this proposal in the #tor-dev IRC earlier this week.

There are still a few things in the proposal that we found confusing.
(Some of these things are our mistakes in tor-spec, which were copied
to the proposal. Weâre working on fixing them in tor-spec.)

Once the proposal has been revised, we would like to:
* try condensing the proposal into an algorithm in pseudocode
* estimate how it will perform compared to Torâs current relay crypto
* write a tor-spec patch based on the proposal

Hereâs a summary of our feedback:

Changing the Proposal Structure

Weâre confused by 3.1 and 3.2, which seem to be duplicate sections using
different notation.

We think the following structure would help us understand the proposal
better:
* 3.1. Describe key derivation for a single hop circuit, like tor-spec
section 5.2.2 at:
https://github.com/torproject/torspec/blob/master/tor-spec.txt#L1210
* 3.2. Describe encryption and decryption for a single-hop circuit, like
tor-spec sections 0.3 and 0.4 at:
https://github.com/torproject/torspec/blob/master/tor-spec.txt#L77
https://github.com/torproject/torspec/blob/master/tor-spec.txt#L124
* 3.3. Describe encryption and decryption for multi-hop circuits, based on
the latest tor-spec section 5.5 at:
https://github.com/torproject/torspec/pull/27/files#diff-390fa89f5c74e769e6e5bef55fc1918bR1366

Making Re-Keying and Tweaking more Explicit

Torâs current cell crypto keeps the AES-CTR counters synchronized and
incrementing, while this proposal sems to derive them from the cell
contents.

In this proposal, we had trouble working out:
* when the AES is re-keyed vs tweaked
* when the GHASH is re-keyed vs updated

So we wondered if the proposed scheme is not IND-CPA. i.e., if the client
repeats a plaintext cell, it will produce identical ciphertext cells,
because the AES-CTR nonce is based solely on the message and keys.

We think the new structure will be clearer, because the re-keying should
all be in section 3.1, and the tweaks/updates should be in section 3.2.

Generalising to Different Numbers of Hops

The proposal assumes that all circuits are 3-hop circuits, but Tor typically
builds 1, 3, 4, and 5-hop circuits. Also, Tor currently generates the same
number of keys for each hop. Thereâs no way it can just generate kf_M, kf_E,
kb_M, and kb_E for the final hop, because sometimes the final hop changes.
(Due to circuit cannibalisation, or failed intro extension.)

Please generalise the proposal so that:
* all references to â3-hop circuitâ are changed to "N-hop circuit".
* all references to kf_1,2,3, kb_1,2,3, k_fM, k_fE, k_bM and k_bE;
are changed to kf_N, kb_N, kfM_N, kfE_N, kbM_N and kbE_N.

Do we really need 6 keys per hop?
Itâs ok if we do, letâs make sure there are no redundant keys.

Explaining which Hops do Integrity Checks

Are kfM_N, kfE_N, kbM_N and kbE_N only used for the innermost encryption and
decryption? (That is, when N is the clientâs target hop, or the cell is sent
from hop N to the client.)
If so, please explain that the integrity checks only apply to the lowest
layer, and how this is secure.
If we can do the integrity checks for each layer, perhaps we should, because
then we are protecting against tagging attacks on some nodes, even when the
exit doesnât support the protocol.

For example, in a standard 3-hop circuit:
C = G = M - E
Where:
= means end-to-end integrity checks
- means no integrity checks
If C and M support the proposed protocol, can they protect against tagging
between G and E, even if G and E donât support the proposed protocol?
(Perhaps this could be a good example for the proposal?)

Protecting Onion Services

When onion services communicate with clients, we want to make sure that
they arenât vulnerable to tagging attacks.

For example, in a standard onion service rendezvous, two circuits are
spliced at the rendezvous point R:
C = G = M1 = R ~ M2 ~ M3 ~ G ~ S
Where:
= means end-to-end integrity checks between C and R
~ means end-to-end integrity checks between S and R
- means no integrity checks

We think that the cells from C to S are protected from tagging, if the
C to R and S to R circuits are protected from tagging, because tagging
requires two unprotected nodes, one to add the tag, and one to remove it.
(Also, like the scenario above with the exit circuit, if some of the hops
support the protocol, then any hops closer to the client or service canât
add or remove tags.)

If onion services are protected, letâs explain that in the proposal.

Minor Stylistic Comments

Weâre not used to the notation that youâre using, so itâs hard for us
to follow some sections.

In particular:
* Using the notation X||Y to represent a tuple of arguments to an
operation is confusing, because || usually means concatenation.
(e.g., H(K||X) should probably be H(K, X) because GHASH doesn't do a
simple concatentation of K and X)
* The capitalization of "K" vs âkâ is inconsistent, so itâs hard to
tell if keys are the same or different.

Here are some of our notes for our future reference:

Impact on Relay Performance

We would like to estimate how the proposed encryption will perform,
compared to Torâs current relay crypto.

We think the proposed scheme could be faster, because:
* If we can throw out SHA1, we're saving a lot in practice: SHA1 is
expensive compared to AES_CTR on current cpus
* We should figure this out with the assumption that we do/don't have
aesni and/or carryless-multiply
* If we have a no-carry multiply (*CLMUL*), fast ghash is easy
* If not, it's a bit of a dark art to get it without timing sidechannels

It looks like we trade 1 SHA1 for N+1 GHASHes, where N is the number of
hops. (GHASH is much faster than SHA1, we think more than 3x.)

Post by Nick Mathewson
Here is (I hope) the correct version of this proposal, reformatted.
Filename: 295-relay-crypto-with-atl.txt
Title: Using ADL-GCM for relay cryptography (solving the crypto-tagging attack)
Author: Tomer Ashur
Created: 09 Apr 2018
Status: Open
0. Context
Although Crypto Tagging Attacks were identified already in the
original Tor design, it was not before the rise of the Procyonidae in
2012 that their severity was fully realized. In Proposal 202 (Two
improved relay encryption protocols for Tor cells) Nick Mathewson
discussed two approaches to stymie tagging attacks and generally
improve Tor's cryptography. In Proposal 261 (AEZ for relay
cryptography) Mathewson puts forward a concrete approach which uses
the tweakable wide-block cipher AEZ.
1. Motivation
For motivations, see proposal 202.
2. Summary and preliminaries
This proposal suggests an alternative approach to Proposal 261 using
the notion of Release (of) Unverified Plaintexts (RUP-security). It
describes an improved algorithm for circuit encryption based on
CTR-mode which is already used in Tor, and an additional component
for hashing. When this additional component is GHash, a GCM-based
solution can be used from the OpenSSL library.
Incidentally, and similar to Proposal 261, this proposal employs the
ENCODE-then-ENCIPHER approach thus it improves Tor's E2E integrity by
using (sufficiently enough) redundancy.
For more information about the scheme and a security proof for its
RUP-security see
Tomer Ashur, Orr Dunkelman, Atul Luykx: Boosting Authenticated
Encryption Robustness with Minimal Modifications. CRYPTO (3)
2017: 3-33 available online at https://eprint.iacr.org/2017/239 .
2.1. Cryptographic algorithms
As an encryption primitive we use AES. We denote one call to the
primitive by E(Â·), e.g.,
Z = E_k(X)
where k is a K-bit key, X is the input to AES and Z is the output. We
omit k where it is irrelevant or is clear from context. While we
propose to set K=256 to ensure long-term post-quantum resistance, we
stress that the solution also works for other key lengths.
To encrypt, we use AES is counter mode and a nonce N to produce a
random bit stream, e.g.,
CTR(N) = E(N)||E(N+1)||...||E(N+l) = z_1||z_2||...||Z_{n+1} = Z
and XOR Z with the message M to produce the ciphertext C, e.g.,
C = Z ^ M
For authentication, we use the universal hash function GHASH and a
key k, e.g.,
T = H(K||X) = H_k(X)
where K is a 256-bit key, X an arbitrary length string, and T a
unique authentication tag for X. While we propose to set K=256 to
ensure long-term post-quantum resistance, we stress that the solution
works for any universal hash function as well as for keys of length
K!=256.
3. The proposed solution
3.1 An overview of the proposed solution
Noting that in CTR-mode the nonce is essential for the
encryption/decryption operations, the proposed solution is based on
linking the nonce and the transmitted data in such a way that any
change to the data (e.g., through a tagging attack) randomizes the
nonce and results in a random value Z' != Z for CTR(N' != N).
When preparing a message to be sent in the forward direction, the
client is responsible for encrypting all layers. The client starts by
digesting the message to create a synthetic tweak T_0 = H_{k_fM}(M)
and uses this tweak to encrypt an all-zero string to create a
synthetic IV: S_0 = T_0 ^ E_{k_fE}(0 ^ T_0).
The message and the synthetic IV (i.e., M||S_0) are passed as input
to the first encryption layer which uses S_0 to produce the random
stream Z = CTR_k1(S_0) and the ciphertext C = Z ^ M. Then, a tweak T
= H_k2(C) is generated by digesting the ciphertext of this
layer. Finally, S_0 is encrypted using AES and the tweak T to produce
an encrypted nonce S = T ^ E_k3(N ^ T) which is appended to the the

Duplicate word "the theâ.

Also the N should probably be S_0.

Post by Nick Mathewson
ciphertext and is used as the nonce for the next encryption layer.
After encrypting 3 times, the client sends C||S to the guard. Whereas

Some circuits are not 3 hops, see above.

Entry nodes can be bridges or guards or a random node (when UseEntryGuards
is 0). So letâs say âentryâ, not âguard".

Post by Nick Mathewson
the protocol previously required that the nonce be kept in sync
between each node and the client, this is no longer required as it
can now be recovered from C||S. To decrypt, the node first computes T
= H_k2(C) then uses it to obtain S = D_k3(S ^ T) ^ T. Finally, CTR(S)

should be S_0 = D_k3(S ^ T) ^ T and CTR_k1(S_0)?

Post by Nick Mathewson
is used to decrypt C into M. The once-decrypted nonce is appended to
the data and M||S is sent to the next node which repeats this
process.
The exit verifies the integrity of the message by digesting it and

End nodes can be exits or directories or intro or rend nodes.
This is an existing problem in tor-spec, see:
https://trac.torproject.org/projects/tor/ticket/26885

Post by Nick Mathewson
using the resulting T to decrypt S_0 into N. If N=0, the message is
authenticated. Otherwise, if the message has been changed by an
adversary, N!=0 and the circuit is destroyed.
When sending back a message from the exit to the client, each node
adds an additional layer of encryption. The exit starts by digesting
the message to create a synthetic tweak T_0 = H_{k_bM}(M) and uses
S_0 = T_0 ^ E_{k_bE}(0 ^ T_0). The synthetic IV is then used to
produce the random stream Z = CTR_k1(S_0) and uses that to encrypt
the plaintext through C = Z ^ M. The nonce is appended to the
ciphertext and C||S_0 is sent to the next node.
Receiving a message from the exit, the middle starts by producing a
tweak T = H_k2(C). The tweak is then used to encrypt the nonce
through S = T ^ E_k3(N ^ T) which is then used to produce the random
stream Z' = CTR_k1(S). Finally, the message is encrypted via C' = Z'
^ C and the encrypted nonce is appended to C'. The message C'||S is
sent to the next node which repeats this procedure.

Some circuits are not 3 hops, see above.

Post by Nick Mathewson
When the message arrives at the client, the client peels each layer
using the nonce that is appended at the end of the message. The
decrypted layer is used to compute the tweak T = H_k2(P), which is
used to decrypt this layer's nonce into the next layer's nonce.

During our review, we noticed that the decryption order in tor-spec is
wrong, so we fixed it:
https://trac.torproject.org/projects/tor/ticket/26860

It looks like this proposal has the correct order, but we think it will be
clearer if it is re-structured. (See above.)

Post by Nick Mathewson
The client verifies the integrity of the message by digesting it and using
the resulting T to decrypt S_0 into N. If N=0, the message is
authenticated. Otherwise, if the message has been changed by an adversary,
N!=0 and the circuit is destroyed.
3.2 Circuit management
3.2.1 Creating circuits
The general process of creating a circuit remains the same (see
tor-spec.txt), except for step 5 where instead of creating a pair of
keys (kf_1,kb_1) for forward and backward communication,
respectively, 2 triplets of keys (kf_1,kf_2,kf_3) and
(kb_1,kb_2,kb_3) are created for communication in the forward and
backward directions, respectively.

Some circuits are not 3 hops, see above.

Post by Nick Mathewson
Two additional pairs of
"authentication keys" (k_fM,k_fE) and (k_bM,k_bE) are created between
the client and the exit in the forward and backward directions,

We think we might need these keys for each node, not just the exit.
(See above.)

Post by Nick Mathewson
respectively.
note: the keys can be created by running KDF-RFC5869 3 times in each
direction, or by running it once in each direction to create a
master-key and derive the rest as subkeys from this master key

Letâs use the same construction as tor-spec to derive the keys, but
generate more data when we need more keys.

Post by Nick Mathewson
3.2.2 Routing relay cells
When an OR receives a RELAY or RELAY_EARLY cell, it checks the cell's
circID and determines whether it has a corresponding circuit along
that connection. If not, the OR drops the cell.

This paragraph is copied from tor-spec.txt.
I think we can drop it from the proposal, and reference tor-spec instead.

Post by Nick Mathewson
Otherwise, if the OR is not at the OP edge of the circuit (that is,
either an 'exit node' or a non-edge node), it de/encrypts the payload
with the stream cipher,

This sentence is copied from tor-spec.txt.
Itâs confusing by itself.

The corresponding section in tor-spec.txt contains some errors.
And itâs also hard to read.

We opened a ticket for this issue:
https://trac.torproject.org/projects/tor/ticket/26859

And then we fixed it in this ticket, because the modified the same sections:
https://trac.torproject.org/projects/tor/ticket/26860

Weâd like to see the proposal revised so it matches the new tor-spec structure:
* forward encryption at the client
* forward decryption at ORs
* backward encryption at the end (exit)
* backward decryption at the client

Post by Nick Mathewson
'Forward' relay cell (same direction as CREATE)
1. T = H_{kf_2}(T'||C); digest ciphertext (T' is a running digest
of all previous cells, i.e., last round's T)
2. N = T ^ D_{kf_3}(S ^ T); decrypt nonce
3. M = CTR_{kf_1}(N) ^ C; decrypt message
(output structure is M||N)
'Back' relay cell (opposite direction from CREATE)
1. T = H_{kb_2}(T'||M); digest ciphertext (T' is a running digest
of all previous cells, i.e., last round's T)
2. S = T ^ E_{kb_3}(N ^ T); encrypt nonce
3. C = CTR_{kb_1}(S) ^ M; encrypt message
(output structure is C||S)

Is Tâ the previous T (tweak) held over as a piece of state?

Because the "running digestâ doesnât seem to be the internal state of
a digest algorithm.

Post by Nick Mathewson
Note that in counter mode, decrypt (D) and encrypt (E) are the same
operation.

This paragraph is copied from tor-spec.txt.
In this context, itâs incorrect.

Post by Nick Mathewson
The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in Section 4.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.

This paragraph is copied from tor-spec.txt.
I think we can drop it from the proposal, and reference tor-spec instead.

Post by Nick Mathewson
4. Application connections and stream management
4.1 Relay cells
Within a circuit, the OP and the exit node use the contents of RELAY
packets to tunnel end-to-end commands and TCP connections ("Streams")
across circuits. End-to-end commands can be initiated by either
edge; streams are initiated by the OP.
Relay command [1 byte]
'Recognized' [2 bytes]
StreamID [2 bytes]
Length [2 bytes]
Data [PAYLOAD_LEN-23 bytes]
The 'recognized' field is used for a simple indication if the cell
still encrypted or not. When sending cells, the unencrypted
'recognized' MUST be set to zero.

We revised this paragraph in tor-spec, see below.

Maybe the proposal can just reference tor-spec, rather than copying
this text?

Post by Nick Mathewson
When receiving and decrypting cells the 'recognized' will always be
zero if we're the endpoint that the cell is destined for. For cells
that we should relay, the 'recognized' field will usually be nonzero,
but will accidentally be zero with P=2^-32.

2 bytes, P=2^-16

This error seems to be copied from tor-spec.txt section 6.1.

For a fix, see:
https://trac.torproject.org/projects/tor/ticket/26860

Post by Nick Mathewson
The old Digest field is removed, since sufficient information for
authentication is now included in the all-zero string used to create the
nonce for the first encryption layer.
The 'Length' field of a relay cell contains the number of bytes in
the relay payload which contain real payload data. The remainder of
the payload is padded with NUL bytes.

We have open proposals and tickets on padding, so I suggest that we use
this wording:

"The remainder of the payload is padded with padding bytes.
See tor-spec Section 3 for details."

For those who are interested, here's the list of upcoming changes:

Clarify/determine specification for padding
https://trac.torproject.org/projects/tor/ticket/26228

prop289: Implement authenticated SENDME
https://trac.torproject.org/projects/tor/ticket/26288

In particular:

prop289: Leave unused random bytes in relay cell payload
https://trac.torproject.org/projects/tor/ticket/26846

prop289: randomize the unused part of relay payloads
https://trac.torproject.org/projects/tor/ticket/26871

Post by Nick Mathewson
4.2 Appending the encrypted nonce
When a cell is prepared by the client to be sent in the 'forward'
direction, as explained in Section 3.2.2, the encrypted nonce S is
appended to the encrypted cell (occupying the last 16 bytes of the
cell).

Torâs current digest is in the cell header. But this proposal moves
the integrity check from the header to end of the cell.
Is there any reason why we canât put S in the cell header?

If we did, weâd need a specification like the existing tor-spec
digest:

âThe nonce S is generated based on the content of the entire cell,
with the nonce field filled with NUL bytes.â

We should decide if itâs better to:
* have non-data fields at the start and end of the cell, or
* fill a field with NUL bytes, calculate the nonce, then replace
the NUL bytes with the nonce.

Post by Nick Mathewson
If the cell is prepared to be sent to a node supporting the
new protocol, S is used as the nonce for the next round's
encryption. Otherwise, if the next node only supports the old
protocol, S is still appended to the encrypted cell, but a
synchronized nonce (as per the old protocol) is used for encryption.

Mixed version networks will be easier to implement if we derive keys
pairwise with each node based on the handshake.
(See above.)

We should probably say something like:
"Both the client and the node need to support the new protocol in this
proposal, otherwise we fall back to the old one."

Post by Nick Mathewson
When a cell is sent along the circuit in the 'backward' direction,
nodes supporting the new protocol always assume that the last 16
bytes of the input are the nonce used by the previous node, which
they process as per Section 3.2.2. If the previous node also supports
the new protocol, these cells are indeed the nonce. If the previous
node only supports the old protocol, these bytes are either encrypted
NUL bytes or encrypted data.

âencrypted padding bytes", see above.

Post by Nick Mathewson
4.3 Authenticating the message
End-to-end authentication is performed in both directions in the same
way. Once the final layer of encryption is removed, the
message-to-be-authenticated is digested and this digest is used to
decrypt the nonce used in this layer. If the decrypted nonce is
all-zero, the message is authentic. Otherwise, if a change was made
to the ciphertext or the encrypted nonce somewhere along the circuit,
the nonce decrypts into a random value and the circuit is destroyed.

We think every hop should authenticate, at least in the transition
period. (See above.)

T

--
teor

Please reply @torproject.org
New subkeys 1 July 2018
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B

Taylor Yu

2018-07-24 20:48:15 UTC

Permalink

Post by teor
A few of us discussed this proposal in the #tor-dev IRC earlier this week.

Hi teor, and thanks for writing this summary! I generally agree with
what you've written.

Post by teor
We’re confused by 3.1 and 3.2, which seem to be duplicate sections using
different notation.

On further reflection, I think sections 3.1 and 3.2 describe two
different revisions of the proposal. 3.1 seems to not include the
previous tweak (T' in 3.2.2) in computing a new tweak, while 3.2.2 does.
It seems like if we adapt 3.1 to incorporate some chaining of the
tweak, the scheme will be IND-CPA. (This is because the ciphertext that
the endpoint receives will no longer be solely a function of the
plaintext and the key.)

I'm going to tentatively propose some more consistent notation here:

C ciphertext

M plaintext message

S encrypted nonce

N CTR mode nonce

T nonce tweak

T' saved nonce tweak

U pre-nonce tweak

U' saved pre-nonce tweak

keys

kf, kb CTR mode key

hf, hb GHASH key

cf, cb nonce-encryption key

gf, gb pre-nonce GHASH key

bf, bb pre-nonce encryption key

(I'm happy to hear feedback on alternative designators.)

-----

single hop outbound (same direction as CREATE):

input C||S

1. T = H_hf(T'||C) -- generate tweak T by hashing saved previous tweak
T' and ciphertext C

2. N = T ^ D_cf(S ^ T) -- decrypt encrypted nonce S using AES and tweak T

3. M = C ^ CTR_kf(N) -- decrypt ciphertext C using CTR and nonce N

4. T' = T -- save tweak for next cell

5. output M||N

single hop inbound (same direction as CREATED):

input M||N

1. C = M ^ CTR_kb(N) -- encrypt message M using CTR and nonce N

2. T = H_hb(T'||C) -- generate tweak T by hashing saved previous tweak
T' and ciphertext C

3. S = T ^ E_cb(N ^ T) -- encrypt nonce N using AES and tweak T

4. T' = T -- save tweak for next cell

5. output C||S

receiving/decrypting a cell at the endpoint (e.g., exit):

input C||S

1. decrypt as above to obtain M and N

2. U = H_gf(U'||M) -- generate tweak U by hashing saved previous tweak
U' and plaintext M

3. V = U ^ D_bf(N ^ U) -- decrypt nonce N with AES and pre-nonce tweak U
for verification

4. U' = U -- save pre-nonce tweak for next cell

5. if V == 0, cell is authentic

sending/encrypting a cell from the endpoint:

input M

1. U = H_gb(U'||M) -- generate tweak U by hashing saved previous tweak
U' and plaintext M

2. N = U ^ E_bb(0 ^ U) -- generate nonce N by encrypting all-zeroes
block using AES and pre-nonce tweak U

3. U' = U

4. encrypt as above using M and N

-----

Open question: what do we initialize the saved tweaks T', U' to? Is it
safe to use an all-zeroes block?

Post by teor
Generalising to Different Numbers of Hops
The proposal assumes that all circuits are 3-hop circuits, but Tor typically
builds 1, 3, 4, and 5-hop circuits. Also, Tor currently generates the same
number of keys for each hop. There’s no way it can just generate kf_M, kf_E,
kb_M, and kb_E for the final hop, because sometimes the final hop changes.
(Due to circuit cannibalisation, or failed intro extension.)
* all references to “3-hop circuit” are changed to "N-hop circuit".
* all references to kf_1,2,3, kb_1,2,3, k_fM, k_fE, k_bM and k_bE;
are changed to kf_N, kb_N, kfM_N, kfE_N, kbM_N and kbE_N.

Judging by the cited paper, the subkeys k1, k2, k3 are all used in the
same hop. Given that we already use digits for hop numbers, we should
choose some alphabetic subkey designators.

(In the paper: k1 = CTR key; k2 = GHASH key for computing tweak; k3 =
AES key for encrypting nonce.)

Post by teor
Do we really need 6 keys per hop?
It’s ok if we do, let’s make sure there are no redundant keys.

It looks like ten keys per hop. Every ordinary relay hop needs the
endpoint authentication keys kf_M, kf_E, kb_M, kb_E to handle
EXTEND/EXTENDED cells at least.

I'm not sure yet which of these are actually required to be independent
to achieve the desired security properties.

Best regards,
-Taylor