Discussion:
[tor-dev] Tor Bandwidth List Format specification
juga
2018-05-09 12:08:00 UTC
Hi,

after nick, irl and teor reviewed the last version i sent [1], i paste
below a new version of the specification versions 1.1.0 and 1.0.0.
It's the same version as commit
https://github.com/juga0/torspec/commit/c7cdfd4fcb4b5623e1407e2bec38e9fdf7b70e6b.

The main question that came up was whether we should create a backwards
incompatible specification version 2.0.0.

Since right now it's faster to implement the version 1.1.0 of this
specification, and assuming that we can add the specification and the
code later, i'd propose to continue with 1.1.0.

I've asked dirauths about their opinion about this.

Thanks,
juga

[1] https://lists.torproject.org/pipermail/tor-dev/2018-May/013141.html
[2] https://lists.torproject.org/pipermail/tor-dev/2018-May/013154.html

-----------------------------------------------------------------------

Tor Bandwidth List Format
juga
teor

1. Scope and preliminaries

This document describes the format of Tor's Bandwidth List,
versions 1.0.0, 1.1.0 and later.
It is a new specification for the existing format 1.0.0.
It also describes a new format, 1.1.0, which is backwards compatible
with 1.0.0 parsers.

Since Tor version 0.2.4.12-alpha the directory authorities use
the Bandwidth List file called "V3BandwidthsFile" generated by
Torflow [1]. The format is described in Torflow's README.spec.txt and
is considered to be version 1.0.0.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.

1.2. Acknowledgements

The original bandwidth generator (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth generator implementation.

This specification was revised after feedback from:

Nick Mathewson (nickm)
Iain Learmonth (irl)

1.3. Outline

The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
and 3.4.2, uses the term "bandwidth measurements" to refer to what
is called a Bandwidth List here.
A Bandwidth List file contains information on relays' bandwidth
capacities and is produced by bandwidth generators, previously known
as bandwidth scanners.

1.4. Format Versions

1.0.0 - The legacy fallback Bandwidth List format

1.1.0 - Adds KeyValue Lines to the Header List section, adds KeyValue
pairs to RelayLines, and adds format versions.

All Tor versions can consume format version 1.0.0.
All Tor versions can consume format version 1.1.0,
but they warn on additional header Lines.
[TODO: this might be fixed, and if it is fixed should be said which
version of Tor]

2. Format details

The Bandwidth List MUST contain the following sections:
- Header List (exactly once)
- Relays' Bandwidth List (zero or more times)
If it does not contain these sections, parsers SHOULD ignore the file.

2.1. Definitions

The following nonterminals are defined in Tor directory protocol
sections 1.2., 2.1.1., 2.1.3.:

Int
SP (space)
NL (newline)
Keyword
ArgumentChar
nickname
hexdigest (a '$', followed by 40 hexadecimal characters
([A-Fa-f0-9]))

Nonterminal defined in section 2 of version-spec.txt [4]:

version_number

We define the following nonterminals:

Line ::= ArgumentChar* NL
RelayLine ::= KeyValue (SP KeyValue)* NL
KeyValue ::= Keyword "=" Value
Value ::= ArgumentCharValue+
ArgumentCharValue ::= any printing ASCII character except NL and SP.
Terminator ::= "====="
Timestamp ::= Int
Bandwidth ::= Int
MasterKey ::= a base64-encoded Ed25519 public key, with
padding characters omitted.
DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601

Note that key_value and value are defined in Tor directory protocol
with different formats to KeyValue and Value here.

All Lines in the file MUST be 510 characters or fewer, to allow for the
trailing newline and NUL characters.
The previous limit was 254 characters in Tor 0.2.6.2-alpha and
earlier.
The parser MAY ignore longer Lines.
[TODO: Change this restriction in 1.1.0 or later]
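
As an illustration only, the KeyValue grammar and the Line length limit
above could be checked with a sketch like the following Python. The names
parse_keyvalue, line_is_acceptable and MAX_LINE_LEN are hypothetical, and
Keyword is approximated here as ASCII letters, digits, "-" and "_"; the
authoritative definition remains the one in the Tor directory protocol.

```python
import re

MAX_LINE_LEN = 510  # limit stated above, not counting the trailing newline

# Assumption: Keyword is approximated as letters, digits, "-" and "_".
# Value is one or more printing ASCII characters, excluding SP and NL.
KEYVALUE_RE = re.compile(r"([A-Za-z0-9_-]+)=([\x21-\x7e]+)")


def parse_keyvalue(token):
    """Return (keyword, value) for one KeyValue token, or None if malformed."""
    match = KEYVALUE_RE.fullmatch(token)
    if match is None:
        return None
    return match.group(1), match.group(2)


def line_is_acceptable(line):
    """Check the 510-character limit on a Line, ignoring its newline."""
    return len(line.rstrip("\n")) <= MAX_LINE_LEN
```

Because the Keyword part cannot contain "=", a Value that itself contains
"=" is still split at the first "=" of the token.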

2.2. Header List format

Some header Lines MUST appear in specific positions, as documented
below.
All other Lines can appear in any order.
If a parser does not recognize any extra material in a header Line,
the Line MUST be ignored.
If a header Line does not conform to this format, the Line SHOULD be
ignored by parsers.

It consists of:

Timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds when the file was created.
It does not follow the KeyValue format for backwards
compatibility with version 1.0.0.

"version=" version_number NL

[In second position, zero or one time.]

The specification document format version.
It uses semantic versioning [5].

This Line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the
version_number is considered to be "1.0.0".

"software=" Value NL

[Zero or one time.]

The name of the software that created the document.

This Line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the software
is considered to be "torflow".

"software_version=" Value NL

[Zero or one time.]

The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.

This Line has been added in version 1.1.0 of this specification.

"generator_started=" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone
when the generator started.

This Line has been added in version 1.1.0 of this specification.

"earliest_bandwidth=" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone
when the first relay bandwidth was obtained.

This Line has been added in version 1.1.0 of this specification.

KeyValue NL

[Zero or more times.]

There MUST NOT be multiple KeyValue header Lines with the same key.
If there are, the parser SHOULD choose an arbitrary Line.

If a parser does not recognize a Keyword in a KeyValue Line, it
MUST be ignored.

Future format versions may include additional KeyValue header Lines.
Additional header Lines will be accompanied by a minor version
increment.

Implementations MAY add additional header Lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for
the same header keys.

Parsers MUST NOT rely on the order of these additional Lines.

Additional header Lines MUST NOT use any keywords specified in the
relay measurements format.
If they do, the parser MAY ignore the conflicting keywords.

Terminator NL

[Zero or one time.]

The Header List section ends with this Terminator.

In version 1.0.0, Header List ends when the first relay bandwidth
is found conforming to the next section.
Implementations of version 1.1.0 SHOULD include this Line.
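
The header rules above can be sketched in Python as follows. This is a
minimal illustration, not the parser used by Tor: parse_header is a
hypothetical name, the position of the "version" Line is not enforced,
and duplicate keys are resolved by keeping the first Line seen (one
possible "arbitrary" choice).

```python
TERMINATOR = "====="


def parse_header(lines):
    """Parse header Lines from a list of strings without newlines.

    Returns (header dict, index of the first RelayLine).
    """
    if not lines or not (lines[0].isascii() and lines[0].isdigit()):
        raise ValueError("first Line must be a Timestamp")
    parsed = {}
    body_start = len(lines)
    for index, line in enumerate(lines[1:], start=1):
        if line == TERMINATOR:
            body_start = index + 1
            break
        if any(tok.startswith("node_id=") for tok in line.split(" ")):
            # 1.0.0 style: the header ends at the first relay bandwidth.
            body_start = index
            break
        key, sep, value = line.partition("=")
        if sep and key and value and " " not in line:
            parsed.setdefault(key, value)  # duplicates: keep the first
        # Lines that are not KeyValue Lines are ignored.
    # Values assumed for 1.0.0 documents, which lack these Lines.
    header = {"version": "1.0.0", "software": "torflow"}
    header.update(parsed)
    header["timestamp"] = int(lines[0])
    return header, body_start
```

Unrecognized keywords are kept in the returned dict, so a caller can
simply ignore the ones it does not know.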

2.3. Relays' Bandwidth List format

It consists of zero or more RelayLines with the relays' bandwidth
in arbitrary order.

There MUST NOT be multiple KeyValue pairs with the same key in the same
RelayLine.
If there are, the parser SHOULD choose an arbitrary Value.

There MUST NOT be multiple RelayLines per relay identity (node_id or
master_key_ed25519).
If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
value or ignore both values.

If a parser does not recognize any extra material in a RelayLine,
the extra material MUST be ignored.

Each RelayLine MUST include the following KeyValue pairs.
In version 1.0.0, node_id MUST NOT be at the end of the Line.
In version 1.1.0, the KeyValue pairs can be in any order.
[TODO: list of Tor version that support it, when it's done]

"node_id=" hexdigest

[Exactly once.]

The fingerprint for the relay's RSA identity key.

"master_key_ed25519=" MasterKey

[Zero or one time.]

The relay's master Ed25519 key, base64 encoded,
without trailing "="s, to avoid ambiguity with the KeyValue "="
character.

Implementations of version 1.1.0 SHOULD include both node_id and
master_key_ed25519.
Parsers SHOULD accept Lines that contain at least one of them.

"bw=" Bandwidth

[Exactly once.]

The measured bandwidth of this relay.

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
If there are zero bandwidths, the parser MAY ignore them.

Multiple measurements can be aggregated using an averaging scheme,
such as a mean, median, or decaying average.

Torflow scales bandwidths to kilobytes per second. Other
implementations SHOULD use kilobytes per second for their initial
bandwidth scaling.

If different implementations or configurations are used in votes for
the same network, their measurements MAY need further scaling. See
Appendix B for information about scaling, and one possible scaling
method.

KeyValue

[Zero or more times.]

Future format versions may include additional KeyValue pairs on a
RelayLine.
Additional KeyValue pairs will be accompanied by a minor version
increment.

Implementations MAY add additional relay KeyValue pairs as needed.
This specification SHOULD be updated to avoid conflicting meanings
for the same Keywords.

Parsers MUST NOT rely on the order of these additional KeyValue
pairs.

Additional KeyValue pairs MUST NOT use any keywords specified in the
header format.
If they do, the parser MAY ignore the conflicting keywords.
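
A RelayLine parser following the rules in this section might look like
the sketch below. The function names are hypothetical, and the handling
of the cases the specification leaves open is an assumption: duplicate
keys keep the first Value, and relays that appear on more than one
RelayLine are dropped entirely (a real parser SHOULD also warn).

```python
def parse_relay_line(line):
    """Parse one RelayLine into a dict, or return None if it is unusable."""
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        key, sep, value = token.partition("=")
        if not (sep and key and value):
            continue  # extra material that is not a KeyValue is ignored
        pairs.setdefault(key, value)  # duplicate keys: keep the first Value
    if "node_id" not in pairs and "master_key_ed25519" not in pairs:
        return None  # at least one relay identity is needed
    bw = pairs.get("bw", "")
    if not (bw.isascii() and bw.isdigit()):
        return None  # bw must appear and be an Int
    pairs["bw"] = int(bw)
    return pairs


def parse_relay_lines(lines):
    """Return {identity: pairs}, ignoring relays listed more than once."""
    relays, duplicated = {}, set()
    for line in lines:
        pairs = parse_relay_line(line)
        if pairs is None:
            continue
        identity = pairs.get("node_id") or pairs["master_key_ed25519"]
        if identity in relays or identity in duplicated:
            duplicated.add(identity)
            relays.pop(identity, None)
            continue
        relays[identity] = pairs
    return relays
```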

2.4. Implementation notes

This section lists the KeyValue pairs that current implementations
generate in RelayLines.

2.4.1. Simple Bandwidth Scanner

Every RelayLine in sbws version 0.1.0 consists of:

"node_id=" hexdigest SP

As above.

"bw=" Bandwidth SP

As above.

"nick=" nickname SP

[Exactly once.]

The relay nickname.

"rtt=" Int SP

[Exactly once.]

The Round Trip Time in milliseconds to obtain 1 byte of data.

"time=" DateTime NL

[Exactly once.]

The date and time timestamp in ISO 8601 format and UTC time zone
when the last bandwidth was obtained.
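
For illustration, a generator could assemble a RelayLine in this layout
roughly as follows. This is a sketch and not code taken from sbws:
format_relay_line and its parameters are hypothetical names, and
measured_at is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone


def format_relay_line(node_id, bw, nick, rtt_ms, measured_at):
    """Build one RelayLine in the sbws 0.1.0 layout described above."""
    bw = max(int(bw), 1)  # zero bandwidths SHOULD NOT be produced
    time_str = measured_at.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S")
    line = "node_id=${} bw={} nick={} rtt={} time={}\n".format(
        node_id, bw, nick, int(rtt_ms), time_str)
    if len(line) - 1 > 510:  # limit excludes the trailing newline
        raise ValueError("RelayLine exceeds 510 characters")
    return line
```

The time is converted to UTC before formatting, since DateTime carries no
time zone suffix.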

2.4.2. Torflow

Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/

A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

A.2. Generated by sbws version 0.1.X
[TODO: this needs to be implemented when this spec is finished]

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
generator_started=2018-05-08T16:13:25
earliest_bandwidth=2018-05-08T16:13:26
=====
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36

B. Scaling bandwidths

B.1. Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
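
The two checks above could be applied to a set of scaled bandwidths as in
this Python sketch. apply_zero_checks is a hypothetical name, and using
one as the equal bandwidth in the all-zero case is an assumption; the
text above only requires that the bandwidths be equal.

```python
def apply_zero_checks(scaled):
    """Apply the two checks above to a {relay: scaled bandwidth} mapping."""
    if sum(scaled.values()) == 0:
        # Total bandwidth is zero: give every relay the same bandwidth.
        return {relay: 1 for relay in scaled}
    # Truncate to an Int and round any zero result up to one.
    return {relay: max(int(bw), 1) for relay, bw in scaled.items()}
```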

Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.

B.2. A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth
in all votes, by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all generators on the network.

2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling
factor.

Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the vote quota.
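
Steps 2 to 4 can be sketched in Python as follows. scale_linearly is a
hypothetical name. The relay quota from step 1 is taken as a parameter,
defaulting to the April 2018 estimate of 7500 given above, because
computing it requires the measured bandwidths from all votes. The
zero-bandwidth handling follows B.1.

```python
def scale_linearly(measured, relay_quota=7500):
    """Scale {relay: unscaled bandwidth} so the total is near the vote quota."""
    total = sum(measured.values())
    if total == 0:
        # B.1: with a zero total, all relays get equal bandwidths.
        return {relay: 1 for relay in measured}
    vote_quota = relay_quota * len(measured)   # step 2
    factor = vote_quota / total                # step 3
    return {relay: max(round(bw * factor), 1)  # step 4, never below one
            for relay, bw in measured.items()}
```

For example, three relays measured at 600, 300 and 100 give a vote quota
of 22500 and a scaling factor of 22.5, so the scaled bandwidths are
13500, 6750 and 2250.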

B.3. Quota changes

If all generators are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.
juga
2018-05-10 13:40:00 UTC
Post by juga
The main question that came up was whether we should create a backwards
incompatible specification version 2.0.0.
I do think it may be easier to finish this specification first before
moving on to a 2.0.0 version. We discussed this specification at the
last Tor Metrics meeting: Currently Tor Metrics' descriptor parsing
library (metrics-lib) does not have a parser for the current (1.0.0)
bandwidth lists. We should implement this, and hopefully it would not be
too much extra work to make this also work for 1.1.0 lists.
It shouldn't be much extra work, if metrics-lib ignores the extra header
lines and extra bandwidth KeyValue pairs (as Tor currently does).
If there were to be a 2.0.0 version, I would hope this brings the syntax
closer to other Tor descriptors to enable us to reuse existing code in
our parser (and also for tor to reuse existing code too) rather than
being something entirely new.
That was the idea :)

Thanks!
juga.
juga
2018-05-21 12:52:00 UTC
Hi,

as commented by nickm [0], the specification has been merged into
torspec git repository.

If you find any issues with it, as usual, we can open a ticket and patch
the specification.

Thanks,
juga.

[0] https://trac.torproject.org/projects/tor/ticket/25869#comment:10