Discussion:
[tor-dev] Proposal: Tor bandwidth measurements document format
juga
2018-04-30 13:21:00 UTC
Hi,

after teor's revision, second version pasted below.

Changes can be seen in
https://github.com/juga0/torspec/commits/bandwidth-file-spec

Best,
juga

=================================================================

Tor Bandwidth Measurements Document Format
juga
teor

1. Scope and preliminaries

This document describes the format of Tor's bandwidth measurements
document, version 1.0.0 and later.

Since Tor version 0.2.4.12-alpha the directory
authorities use the bandwidth measurements document called
"V3BandwidthsFile" and produced by Torflow [1]
(format described in README.spec.txt [2]).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.

1.2. Acknowledgements

The original bandwidth measurement scanner (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth scanner implementation.

This specification was revised after feedback from:

XXX

1.3. Outline

The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of "Tor directory protocol" (dir-spec.txt) [3] are obtained
by bandwidth authorities, which generate a file storing information
on relays' measured bandwidth capacities.

1.4. Format Versions

1.0.0 - The legacy fallback bandwidth measurements document format

1.1.0 - Adds key_value lines to the header, format version,
optional ones and section separator.

2. Format details

Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)

2.1. Definitions

The following nonterminals are defined in dir-spec.txt, sections
1.2., 2.1.1., 2.1.3.:

Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
nickname

Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
section 2.2.1.:

version_number

We define the following nonterminals:

value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL

2.2. Header format

Some header lines MUST appear in specific positions, as documented below.
All other lines can appear in any order.

There MUST NOT be multiple key_value header lines with the same key.

It consists of:

timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds when the file was created.

"version=" version_number NL

[In second position, zero or one time.]

The specification document format version.
It uses semantic versioning [5].

This line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".

"software=" value NL

[Zero or one time.]

The name of the software that created the document.

This line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".

"software_version=" value NL

[Zero or one time.]

The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.

This line has been added in version 1.1.0 of this specification.

"scanner_started=" timestamp NL

[Zero or one time.]

The Unix Epoch time in seconds when the scanner that generates the
measurements document started.

This line has been added in version 1.1.0 of this specification.

"earliest_measurement=" timestamp NL

[Zero or one time.]

The Unix Epoch time in seconds when the first relay measurement
was obtained.

This line has been added in version 1.1.0 of this specification.

key_value NL

[Zero or more times.]

Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version
increment.

Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.

Parsers MUST NOT rely on the order of these additional lines.

Additional header lines MUST NOT use any keywords specified in the
relay measurements format.

If a header line does not conform to this format, the line SHOULD be
ignored by parsers.

NL

[Zero or one time.]

The header ends.

This line has been added in version 1.1.0 of this specification.

For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
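
As an illustration of the header rules above, here is a minimal Python
sketch of a header parser. It is not taken from Tor, Torflow or sbws. The
name parse_header and the returned dict layout are assumptions made for
this example. It also assumes that a version 1.0.0 header ends at the
first line containing "node_id=", that later duplicate keys are ignored,
and it does not check that "version=" is in second position.

```python
import re


def parse_header(lines):
    """Split a bandwidth file into (header dict, remaining relay lines).

    `lines` is a list of strings without trailing newlines.
    """
    if not lines or not re.fullmatch(r"[0-9]+", lines[0]):
        raise ValueError("first line must be a Unix timestamp")
    header = {
        "timestamp": int(lines[0]),
        # Defaults that the spec assigns to version 1.0.0 documents.
        "version": "1.0.0",
        "software": "torflow",
    }
    seen = set()
    for index, line in enumerate(lines[1:], start=1):
        if line == "":
            # Version 1.1.0: an empty line ends the header.
            return header, lines[index + 1:]
        if "node_id=" in line:
            # Assumption: in version 1.0.0 the first relay line ends the header.
            return header, lines[index:]
        key, sep, value = line.partition("=")
        if not sep or not key or not value or key in seen:
            continue  # Non-conforming or duplicate line: ignore it.
        seen.add(key)
        header[key] = value
    return header, []
```

Called on the lines of the sample in A.2, this would return the header
keys as strings (plus the integer timestamp) and the two relay lines.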

2.3. Relay measurements format

It consists of zero or more relay_line with the measurement results
of relays in arbitrary order.

There can be at most one relay_line per relay identity (fingerprint).

There MUST NOT be multiple key_value pairs with the same key in the same
relay_line.

Each relay_line MUST include the following key_value in arbitrary order:

"node_id=" fingerprint

[Exactly once.]

The fingerprint of the relay being measured.

"bw=" bandwidth

[Exactly once.]

The measured bandwidth of this relay.

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.

Multiple measurements can be aggregated using an averaging scheme, such
as a mean, median, or decaying average.

Torflow scales bandwidths to kilobytes per second. Other implementations
SHOULD use kilobytes per second for their initial bandwidth scaling.

If different implementations or configurations are used in votes for the
same network, their measurements MAY need further scaling. See
Appendix B
for information about scaling, and one possible scaling method.

key_value

[Zero or more times.]

Future format versions may include additional key_value pairs on a
relay_line.
Additional key_value pairs will be accompanied by a minor version
increment.

Implementations MAY add additional relay key_value pairs as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same relay keys.

Parsers MUST NOT rely on the order of these additional key_value pairs.

Additional key_value pairs MUST NOT use any keywords specified in the
header format.

If a relay line does not conform to this format, the line SHOULD be
ignored by parsers.
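
The relay_line rules above can be sketched in Python as follows. This is
only an illustration, not an existing parser. The name parse_relay_line
is hypothetical, and the sketch assumes that values never contain spaces,
so that a single SP always separates two key_value pairs.

```python
import re


def parse_relay_line(line):
    """Return a dict of key_value pairs, or None if the line does not conform."""
    pairs = {}
    for token in line.split(" "):
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            return None  # Not a key_value.
        if key in pairs:
            return None  # Duplicate key on the same relay_line.
        pairs[key] = value
    # node_id and bw are the only mandatory keys; bw must be an Int.
    if "node_id" not in pairs:
        return None
    if not re.fullmatch(r"[0-9]+", pairs.get("bw", "")):
        return None
    pairs["bw"] = int(pairs["bw"])
    return pairs
```

Returning None for a bad line lets the caller skip it, which matches the
"SHOULD be ignored by parsers" rule. Unknown keys are kept as strings.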

2.4. Implementation notes

2.4.1. Simple Bandwidth Scanner

Every relay measurement in sbws version 0.1.0 consists of:

"node_id=" fingerprint SP

As above.

"bw=" bandwidth SP

As above.

"nick=" nickname SP

[Exactly once.]

The relay nickname.

"rtt=" Int SP

[Exactly once.]

The Round Trip Time in milliseconds to obtain 1 byte of data.

"time=" timestamp NL

[Exactly once.]

The Unix Epoch time in seconds when the last measurement was performed.

2.4.2. Torflow

Torflow relay lines include node_id and bw, and other key_value pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git
2.
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
5. https://semver.org/

A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719
pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577
circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994
pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988
circ_fail=0.0 scanner=/filepath

A.2. Generated by sbws version 0.1.0

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757

node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
rtt=378 time=1523911623

B. Scaling bandwidths

B.1. Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.

Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.

B.2. A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth
in all votes, by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all scanners on the network.

2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling
factor.

Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the quota.
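
Steps 2 to 4, together with the checks from B.1, can be sketched in
Python like this. The function name scale_bandwidths is hypothetical.
The relay quota from step 1 is passed in as a parameter (defaulting to
the 7500 figure mentioned above) rather than computed from all votes,
and giving every relay the relay quota when the total is zero is an
assumption of this sketch; the text only asks for equal bandwidths.

```python
def scale_bandwidths(measured, relay_quota=7500):
    """Scale {fingerprint: bandwidth} so the total is about quota * relays."""
    if not measured:
        return {}
    total = sum(measured.values())
    if total == 0:
        # B.1: if the total bandwidth is zero, give all relays equal
        # bandwidths. Using relay_quota as that value is an assumption.
        return {fp: relay_quota for fp in measured}
    vote_quota = relay_quota * len(measured)  # Step 2.
    factor = vote_quota / total               # Step 3.
    # Step 4, with B.1's rule that a scaled zero is rounded up to one.
    return {fp: max(1, round(bw * factor)) for fp, bw in measured.items()}
```

For example, two relays measured at 100 and 300 would be scaled by a
factor of 37.5, so the scaled total equals the vote quota of 15000.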

B.3. Quota changes

If all scanners are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.
Nick Mathewson
2018-05-01 12:36:19 UTC
Hi, Juga!

This is a review of the document from
https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
, which I *think* is the same as the document you have below.

I'm reviewing this as though it were a fully new format, since I'm not sure
how much we already have locked-in based on existing code, and how much is
new. We might decide that backward compatibility is more important than
consistency, and if so, we won't want to take all of my recommendations
here.
Post by juga
Tor Bandwidth Measurements Document Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Post by juga
Since Tor version 0.2.4.12-alpha the directory
authorities use the bandwidth measurements document called
"V3BandwidthsFile" and produced by Torflow [1]
(format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".

Explanation needed: Is this a new format, or a new specification of the
existing format? Let's say so here.

Question: If this is a different format, and we're calling it version
1.0.0, what should we call the old one? But later it seems that we're
introducing 1.1.0, and we're calling the old one 1.0.0.

Suggestion: let's be explicit that we're only describing the format
here, and *not* describing how bwauths generate their data.
Post by juga
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format was
created by mike. Teor suggested to write this specification while
contributing on pastly's new bandwidth scanner implementation.
XXX
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of "Tor directory protocol" (dir-spec.txt) [3] are obtained
by bandwidth authorities, which generate a file storing information
on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds key_value lines to the header, format version,
optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can
consume these versions.
Post by juga
2. Format details
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
Post by juga
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
Does this have to start with a "$" ? I think it does. Maybe we should be
explicit about that.
Post by juga
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
version_number
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
Some header lines MUST appear in specific positions, as documented below.
All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
Maybe this line belongs below in the key_value section?
Post by juga
timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?

Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS
format?
Post by juga
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version.
It uses semantic versioning [5].
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If
we use "SP" instead, then we can reuse a lot of the same machinery tor
currently uses to parse other documents.
Post by juga
"software=" value NL
[Zero or one time.]
The name of the software that created the document.
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".
"software_version=" value NL
[Zero or one time.]
The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.
This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the scanner that generates the
measurements document started.
This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify
times elsewhere in Tor.
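
To make the difference between the two time formats concrete, here is a
small Python sketch that converts the Unix epoch seconds used in the
draft into the YYYY-MM-DDTHH:MM:SS form. The function name epoch_to_iso
is hypothetical, and treating the timestamps as UTC is an assumption of
this example.

```python
from datetime import datetime, timezone


def epoch_to_iso(seconds):
    """Format Unix epoch seconds as YYYY-MM-DDTHH:MM:SS in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


# The scanner_started value from the sbws sample in appendix A.2.
print(epoch_to_iso(1523911756))
```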
Post by juga
"earliest_measurement=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the first relay measurement
was obtained.
This line has been added in version 1.1.0 of this specification.
See note above about time format.
Post by juga
key_value NL
[Zero or more times.]
Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version
increment.
Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.
Parsers MUST NOT rely on the order of these additional lines.
Additional header lines MUST NOT use any keywords specified in the
relay measurements format.
If a header line does not conform to this format, the line SHOULD be
ignored by parsers.
Suggestion: say what recipients of this document should do with
unrecognized data. In general, it's good for forward compatibility to
say something like, "Recipients MUST ignore key_value lines if they do
not recognize the keyword. Recipients MUST ignore any extra material in
a line that they do not recognize."

Also see suggestion above about using SP as our separator rather than
"=" for consistency with other documents Tor parses.
Post by juga
NL
[Zero or one time.]
The header ends.
This line has been added in version 1.1.0 of this specification.
For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for
consistency with other documents.
Post by juga
2.3. Relay measurements format
It consists of zero or more relay_line with the measurement results
of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same
relay_line.
Do existing implementations accept arbitrary order here?
Post by juga
"node_id=" fingerprint
[Exactly once.]
The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being
measured. Say that implementations SHOULD include both RSA fingerprint
and Ed25519 identity, and that implementations SHOULD accept lines that
contain at least one of them.
Post by juga
"bw=" bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
Multiple measurements can be aggregated using an averaging scheme, such
as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations
SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the
same network, their measurements MAY need further scaling. See
Appendix B
for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of
ArgumentChar, and ArgumentChar can contain spaces. So if we were
parsing
"foo=abc bar=def"
we might be parsing either one key_value ("foo", "abc bar=def") or two
("foo", "abc"), ("bar", "def").
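
A short Python sketch of the two readings, for illustration only. Both
are plausible interpretations of the grammar as written, not the
behaviour of any existing parser, and the variable names are
hypothetical.

```python
line = "foo=abc bar=def"

# Reading 1: value is ArgumentChar+ and may contain spaces, so the
# value runs from the first "=" to the end of the line.
key, _, value = line.partition("=")
one_pair = [(key, value)]

# Reading 2: SP separates key_value pairs, so values cannot contain spaces.
two_pairs = [tuple(token.split("=", 1)) for token in line.split(" ")]

print(one_pair)   # [('foo', 'abc bar=def')]
print(two_pairs)  # [('foo', 'abc'), ('bar', 'def')]
```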
Post by juga
Future format versions may include additional key_value pairs on a
relay_line.
Additional key_value pairs will be accompanied by a minor version
increment.
Implementations MAY add additional relay key_value pairs as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same relay keys.
Parsers MUST NOT rely on the order of these additional key_value pairs.
Additional key_value pairs MUST NOT use any keywords specified in the
header format.
As above, let's say that a parser should ignore key_value entries with
keywords that it doesn't recognize.
Post by juga
If a relay line does not conform to this format, the line SHOULD be
ignored by parsers.
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.]
The Unix Epoch time in seconds when the last measurement was performed.
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
1. https://gitweb.torproject.org/torflow.git
2.
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719
pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577
circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994
pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988
circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
rtt=378 time=1523911623
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
1. Calculate the relay quota by dividing the total measured bandwidth
in all votes, by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all scanners on the network.
2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling
factor.
Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the quota.
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.
Karsten Loesing
2018-05-01 14:35:41 UTC
Hi Juga,
Post by Nick Mathewson
This is a review of the document from
https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
, which I *think* is the same as the document you have below.
I'd like to review this document format, too, in particular with regard
to archiving these documents with CollecTor in the future. (Unless there
are no plans to archive them, ever.)

Should I wait for you to revise the document and join in the next review
round, or should I review the document now? In the latter case, where
would I find the most recent version?

Thanks!

All the best,
Karsten
juga
2018-05-01 15:37:00 UTC
Post by Karsten Loesing
Hi Juga,
Post by Nick Mathewson
This is a review of the document from
https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
, which I *think* is the same as the document you have below.
I'd like to review this document format, too, in particular with regard
to archiving these documents with CollecTor in the future. (Unless there
are no plans to archive them, ever.)
Should I wait for you to revise the document and join in the next review
round, or should I review the document now?
From my side, you can review this now.

Post by Karsten Loesing
In the latter case, where
would I find the most recent version?
I don't know if I interpret you correctly, but while I'm working on it
outside the canonical torspec repo, the latest version should be in
https://github.com/juga0/torspec/tree/bandwidth-file-spec.

Thanks!,
juga.
juga
2018-05-01 17:08:00 UTC
Hi,

Thanks, Nick, for the comments. I'm replying only to the parts where I
can give an answer or have more questions. I'll accept the rest of your
suggestions unless there are further comments.
Post by Nick Mathewson
Hi, Juga!
This is a review of the document from
https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
, which I *think* is the same as the document you have below.
Yes, it is.
Post by Nick Mathewson
I'm reviewing this as though it were a fully new format, since I'm not sure
how much we already have locked-in based on existing code, and how much is
new. We might decide that backward compatibility is more important than
consistency, and if so, we won't want to take all of my recommendations
here.
Post by juga
Tor Bandwidth Measurements Document Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Post by juga
Since Tor version 0.2.4.12-alpha the directory
authorities use the bandwidth measurements document called
"V3BandwidthsFile" and produced by Torflow [1]
(format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".
Explanation needed: Is this a new format, or a new specification of the
existing format? Let's say so here.
It is a new version of the existing format, though the old version
(Torflow's) didn't have a specification in the sense that this one does.
Post by Nick Mathewson
Question: If this is a different format, and we're calling it version
1.0.0, what should we call the old one? But later it seems that we're
introducing 1.1.0, and we're calling the old one 1.0.0.
Yes, this would be 1.1.0; the old one (Torflow's) would be 1.0.0.
Post by Nick Mathewson
Suggestion: let's be explicit that we're only describing the format
here, and *not* describing how bwauths generate their data.
Post by juga
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format was
created by mike. Teor suggested to write this specification while
contributing on pastly's new bandwidth scanner implementation.
XXX
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of "Tor directory protocol" (dir-spec.txt) [3] are obtained
by bandwidth authorities, which generate a file storing information
on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds key_value lines to the header, format version,
optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can
consume these versions.
Post by juga
2. Format details
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
Post by juga
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
Does this have to start with a "$" ? I think it does. Maybe we should be
explicit about that.
Yes
Post by Nick Mathewson
Post by juga
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
version_number
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
One more thing that teor pointed out to me: any line MUST be shorter than
512 characters (legacy restriction).
I thought it applied only to the timestamp, but then I
realized it applies to any line.
Post by Nick Mathewson
Post by juga
Some header lines MUST appear in specific positions, as documented below.
All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
Maybe this line belongs below in the key_value section?
Post by juga
timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?
Yes, because of the way Tor [0] parses it, and the way Torflow generates it.
Post by Nick Mathewson
Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS
format?
In this case we would need to patch current versions to accept it. Would
that be OK? In that case we could also make it a key_value.
We need one patch right now: change the function in [0] to accept additional
headers (ticket #25960).
Post by Nick Mathewson
Post by juga
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version.
It uses semantic versioning [5].
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If
we use "SP" instead, then we can reuse a lot of the same machinery tor
currently uses to parse other documents.
I guess we should then see how much we would need to refactor the function
in [0] to reuse parsecommon.c (as you pointed out to me on IRC).
Post by Nick Mathewson
Post by juga
"software=" value NL
[Zero or one time.]
The name of the software that created the document.
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".
"software_version=" value NL
[Zero or one time.]
The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.
This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the scanner that generates the
measurements document started.
This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify
times elsewhere in Tor.
Since this is new, there is no problem changing to this format.
Post by Nick Mathewson
Post by juga
"earliest_measurement=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the first relay measurement
was obtained.
This line has been added in version 1.1.0 of this specification.
See note above about time format.
Post by juga
key_value NL
[Zero or more times.]
Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version
increment.
Post by juga
Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.
Parsers MUST NOT rely on the order of these additional lines.
Additional header lines MUST NOT use any keywords specified in the
relay measurements format.
If a header line does not conform to this format, the line SHOULD be
ignored by parsers.
Suggestion: say what recipients of this document should do with
unrecognized data. In general, it's good for forward compatibility to
say something like, "Recipients MUST ignore key_value lines if they do
not recognize the keyword. Recipients MUST ignore any extra material in
a line that they do not recognize."
Also see suggestion above about using SP as our separator rather than
"=" for consistency with other documents Tor parses.
Post by juga
NL
[Zero or one time.]
The header ends.
This line has been added in version 1.1.0 of this specification.
For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for
consistency with other documents.
Also to avoid interpreting section ends when there was just garbage.
Any suggestion on which one to use? dir-list-spec.txt uses "=====";
I don't know which ones other documents use.
Post by Nick Mathewson
Post by juga
2.3. Relay measurements format
As in 2.2, to be compatible with current implementations, it MUST be
shorter than 512 characters.
Post by Nick Mathewson
Post by juga
It consists of zero or more relay_line with the measurement results
of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same
relay_line.
Do existing implementations accept arbitrary order here?
Good question, it seems like bw must be behind node_id, but they can
have things in front and behind. I probably should create a ticket to
add more test lines in [1] or include them in #25960.
Post by Nick Mathewson
Post by juga
"node_id=" fingerprint
[Exactly once.]
The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being
measured. Say that implementations SHOULD include both RSA fingerprint
and Ed25519 identity, and that implementations SHOULD accept lines that
contain at least one of them.
Post by juga
"bw=" bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
Multiple measurements can be aggregated using an averaging scheme, such
as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations
SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the
same network, their measurements MAY need further scaling. See Appendix B
for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of
ArgumentChar, and ArgumentChar can contain spaces. So if we were
parsing
"foo=abc bar=def"
we might be parsing either one key_value ("foo", "abc bar=def") or two
("foo", "abc"), ("bar", "def").
You're right. The closest from dir-spec.txt is KeywordChar, but that
doesn't include colon, for instance. So, we would need to define what is
accepted here (unless it is defined in some other document).
Post by Nick Mathewson
Post by juga
Future format versions may include additional key_value pairs on a
relay_line.
Additional key_value pairs will be accompanied by a minor version
increment.
Implementations MAY add additional relay key_value pairs as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same relay keys.
Parsers MUST NOT rely on the order of these additional key_value
pairs.
Additional key_value pairs MUST NOT use any keywords specified in the
header format.
As above, let's say that a parser should ignore key_value entries with
keywords that it doesn't recognize.
Post by juga
If a relay line does not conform to this format, the line SHOULD be
ignored by parsers.
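As an illustration of the relay_line rules in 2.3, here is a minimal Python sketch of a parser. It is not taken from Tor or sbws. The function name, the fingerprint pattern ("$" plus 40 hex digits, as in the sample data), the choice to split on single spaces, and the choice to reject lines with duplicate keys are all assumptions made for this example.

```python
import re

# Assumption: a fingerprint is "$" followed by 40 hex digits, as in Appendix A.
FINGERPRINT_RE = re.compile(r"\$[0-9A-Fa-f]{40}")
INT_RE = re.compile(r"[0-9]+")


def parse_relay_line(line):
    """Return a dict of key_value pairs, or None if the line does not conform.

    Hypothetical helper: values are assumed to contain no spaces, so a
    relay_line can be split on SP before splitting each item on "=".
    """
    pairs = {}
    for item in line.rstrip("\n").split(" "):
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            return None  # item is not a key_value
        if key in pairs:
            return None  # same key twice in one relay_line
        pairs[key] = value
    # node_id and bw must each appear exactly once.
    if "node_id" not in pairs or "bw" not in pairs:
        return None
    if not FINGERPRINT_RE.fullmatch(pairs["node_id"]):
        return None
    if not INT_RE.fullmatch(pairs["bw"]):
        return None
    return pairs
```

A caller would skip every line for which this returns None, which matches "the line SHOULD be ignored by parsers". Unknown keys are kept in the dict and can simply be left unused.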
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.]
The Unix Epoch time in seconds when the last measurement was
performed.
Post by juga
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 rtt=378 time=1523911623
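To make the header rules concrete, here is a rough Python sketch that separates the header from the relay lines for both format versions. The function name and the way a relay line is detected (any space-separated item starting with "node_id=") are assumptions for this example, not how Tor's parser works. The defaults for "version" and "software" follow the text of 2.2.

```python
import re


def split_header(lines):
    """Return (timestamp, header_dict, relay_lines) for a list of file lines.

    Hypothetical helper: the header ends at an empty line (1.1.0) or at the
    first line that looks like a relay line (1.0.0).
    """
    lines = [line.rstrip("\n") for line in lines]
    if not lines or not re.fullmatch(r"[0-9]+", lines[0]):
        raise ValueError("first line must be a Unix timestamp")
    timestamp = int(lines[0])
    found = {}
    body_start = len(lines)
    for i in range(1, len(lines)):
        line = lines[i]
        if line == "":
            body_start = i + 1  # explicit separator, skip it
            break
        if any(item.startswith("node_id=") for item in line.split(" ")):
            body_start = i  # first relay line ends a 1.0.0 header
            break
        key, sep, value = line.partition("=")
        if sep and key and value:
            found.setdefault(key, value)  # keep the first of duplicate keys
        # Lines that are not key_value are ignored.
    # Documents without these lines are treated as 1.0.0 from torflow.
    header = {"version": "1.0.0", "software": "torflow"}
    header.update(found)
    return timestamp, header, lines[body_start:]
```

Applied to the A.2 sample, this would yield software "sbws" and two relay lines. Applied to the A.1 sample, it would yield the 1.0.0 defaults.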
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.
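The two checks in B.1 can be sketched in Python as follows. The function name and the dict representation are hypothetical, and using one as the equal bandwidth when the total is zero is an assumption, since the text only says the bandwidths should be equal.

```python
def apply_zero_checks(bandwidths):
    """Apply the B.1 checks to a dict of fingerprint -> scaled bandwidth.

    Assumes all bandwidths are non-negative integers.
    """
    if not bandwidths:
        return {}
    if sum(bandwidths.values()) == 0:
        # Total is zero: give every relay the same bandwidth.
        # Assumption: one is used as the equal value.
        return {fp: 1 for fp in bandwidths}
    # Round any zero bandwidth up to one.
    return {fp: max(bw, 1) for fp, bw in bandwidths.items()}
```

For example, a vote with bandwidths 0 and 5 would become 1 and 5, and a vote with only zeros would become all ones.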
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth
in all votes, by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all scanners on the network.
2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling
factor.
Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the quota.
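Steps 2 to 4 above can be sketched in Python like this. The relay quota from step 1 is passed in as a parameter, since the text suggests it should come from a consensus parameter. The function name, the rounding to integers, and the minimum of one are assumptions for this example.

```python
def scale_linearly(measured, relay_quota):
    """Scale one bandwidth authority's measurements to the shared quota.

    measured: dict of fingerprint -> unscaled measured bandwidth
    relay_quota: per-relay quota from step 1 (for example 7500)
    """
    total = sum(measured.values())
    if total == 0:
        raise ValueError("total measured bandwidth is zero, see B.1")
    # Step 2: vote quota = relay quota * number of measured relays.
    vote_quota = relay_quota * len(measured)
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4: multiply each bandwidth by the factor.
    # Assumption: round to an integer and never go below one.
    return {fp: max(1, round(bw * factor)) for fp, bw in measured.items()}
```

With two relays measured at 100 and 300 and a relay quota of 7500, the vote quota is 15000, the factor is 37.5, and the scaled bandwidths are 3750 and 11250. Because of rounding, the scaled total is only approximately equal to the vote quota in general.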
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.
[0] https://gitweb.torproject.org/tor.git/tree/src/or/dirserv.c#n2563
[1] https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n131
[2] https://gitweb.torproject.org/tor.git/tree/src/test/test_dir.c#n1495
juga
2018-05-02 09:44:00 UTC
Permalink
Post by juga
Post by Nick Mathewson
Do existing implementations accept arbitrary order here?
Good question, it seems like bw must be behind node_id, but they can
have things in front and behind. I probably should create a ticket to
add more test lines in [1] or include them in #25960.
Checked: in the current implementation, the only order required is that
bw must appear before node_id. It probably does not make sense, but to
be compatible with it, it is what this spec should say.


[1] https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n131
Iain Learmonth
2018-05-01 18:29:35 UTC
Permalink
Hi,
Post by juga
Tor Bandwidth Measurements Document Format
"Measurement" could mean a method for performing a measurement, a single
measurement task, a schedule for a repeating measurement task, a
measurement result or a few other things.

When Large MeAsurement Platforms (LMAP) wrote documents in the IETF,
they only ever used measurement as an adjective to avoid any ambiguity.

https://www.ietf.org/archive/id/draft-eardley-lmap-terminology-02.txt

The architecture for LMAP may not fit well with the bandwidth scanner
architecture, and so I'm not suggesting we adopt the terminology in that
document throughout.
Post by juga
Post by juga
2. Format details
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".

If desirable, I'd be happy to check through the document for any other
places ambiguities pop up, but I'll let others finish having their
comments integrated first.

Thanks,
Iain.
juga
2018-05-02 08:34:00 UTC
Permalink
Hi Iain,
Post by Iain Learmonth
Hi,
Post by juga
Tor Bandwidth Measurements Document Format
"Measurement" could mean a method for performing a measurement, a single
measurement task, a schedule for a repeating measurement task, a
measurement result or a few other things.
I also wondered whether that was the correct word and considered
"capacity", but it didn't convince me.
Teor also suggested that I remove "Document", but I thought I'd keep it,
to convey that the spec is only about the "file" and not the
process or how the measurements are formatted somewhere else.

Do you have a suggestion for another word to use instead of "measurements"?
Post by Iain Learmonth
When Large MeAsurement Platforms (LMAP) wrote documents in the IETF,
they only ever used measurement as an adjective to avoid any ambiguity.
https://www.ietf.org/archive/id/draft-eardley-lmap-terminology-02.txt
The architecture for LMAP may not fit well with the bandwidth scanner
architecture, and so I'm not suggesting we adopt the terminology in that
document throughout.
Post by juga
Post by juga
2. Format details
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".
More accurate, though it starts becoming a bit too long. The title should
then probably become: "Tor Bandwidth Measurements Results Document Format".
Any shorter suggestion?
Post by Iain Learmonth
If desirable, I'd be happy to check through the document for any other
places ambiguities pop up, but I'll let others finish having their
comments integrated first.
It's fine to continue to make comments on the thread where others
commented, no need to wait until those comments are integrated. But
either way works.

Thanks for your comments!,
juga.
teor
2018-05-02 08:59:57 UTC
Permalink
Post by juga
Post by Iain Learmonth
Post by Nick Mathewson
Post by juga
2. Format details
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".
More accurate, though it starts becoming a bit too long. The title should
then probably become: "Tor Bandwidth Measurements Results Document Format".
Any shorter suggestion?
"Measurements Results" describes how the bandwidths are created by some
generators. But a generator that believes self-reported results doesn't measure,
it just aggregates. (As does a peerflow-style generator.)

"Document" is vague. Let's describe what the document is: a list.

Let's use:
Tor Bandwidth List Format

What is the document?
A Tor Bandwidth List

How do I parse it?
Using the Tor Bandwidth List Format

Are there any similar formats?
The Tor Directory List Format
https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt

T
Iain Learmonth
2018-05-02 09:18:28 UTC
Permalink
Hi,
Post by teor
Tor Bandwidth List Format
As we are already using this for the directory lists, I think this makes
sense as a name for the format.
Post by teor
"Measurements Results" describes how the bandwidths are created by
some generators. But a generator that believes self-reported results
doesn't measure, it just aggregates. (As does a peerflow-style generator.)

I'm not sure I understand this. Are you saying that the format will be
used to aggregate results that are collected? In this case, I think the
results can still be called results in that they correspond to an active
measurement of a relay and have a value.

Thanks,
Iain.
teor
2018-05-02 09:31:53 UTC
Permalink
Post by Iain Learmonth
Post by teor
"Measurements Results" describes how the bandwidths are created by
some generators. But a generator that believes self-reported results
doesn't measure, it just aggregates. (As does a peerflow-style generator.)
I'm not sure I understand this. Are you saying that the format will be
used to aggregate results that are collected? In this case, I think the
results can still be called results in that they correspond to an active
measurement of a relay and have a value.
No, I'm saying that the spec is about the format.
It's not about how the numbers in a file in the format are created.

"Measurement" is one way we can create the file.

Other ways to create the file are:
* "copy" self-reported bandwidths from relay descriptors into the
required format (the naive, pre-bandwidth scanner method)
* "aggregate" bandwidths passively observed by other relays into the
required format (the peerflow method)
* assign all relays equal bandwidths (the fallback method in Appendix B)

So let's try to keep "relay measurement" and "relay bandwidths" as
separate concepts.

T
Iain Learmonth
2018-05-02 10:14:06 UTC
Permalink
Hi,
Post by teor
So let's try to keep "relay measurement" and "relay bandwidths" as
separate concepts.
Aaah, ok. Yes, I much prefer "Relay Bandwidth" as the name for the
section in §2. There are then also lots of references to measurement in
§2.2, that should also be changed to talk about bandwidths instead, e.g.
"earliest_bandwidth".

Thanks,
Iain.
teor
2018-05-02 12:39:12 UTC
Permalink
Hi Nick,

Juga asked me to comment on your review, so she could read it before our bandwidth meeting this week. If I don't comment on a suggestion, you should assume I agree with it.

Backwards Compatibility

Nick asked about backwards compatibility. This format uses semantic versioning. Tor 0.2.9 - 0.3.3 reads format version 1.0.0. It also reads format 1.1.0, but ignores the new features with warnings.

If we want to introduce an incompatible format, we should call it 2.0.0, because semantic versioning requires a major increment for breaking changes.

Here's how we could add the new format:
* The new format should have a new torrc option.
* Tor should be modified to support the new format, and we should put time on the roadmap for people to work on implementing, testing, or reviewing it.
* Either we should backport the new format to the latest stable release, or sbws should produce both formats.

The current implementation has at least one security bug, some weird order restrictions, and some line length restrictions. So I would support re-implementing it using the standard directory document parsing code. Even if that takes more time.

Testing the format

Most of us don't have a spare directory authority for testing.

If you run chutney with my bwfile branch, all the authorities in the network read /tmp/bwfile for every consensus. Look for the warnings at the end of the chutney output.

The basic-min network is fast:
chutney/tools/test-network.sh --flavour basic-min

Here's the branch:
https://github.com/teor2345/chutney/commit/ebdb4760fbcae40979ab248e4208c27a71cccb11

I've already found one minor security bug using this branch: #26007.

Next Steps

I'm going to be away next week for a week and a half. I encourage other people to make decisions while I'm away, so we can keep making progress.
Post by Nick Mathewson
Hi, Juga!
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt , which I *think* is the same as the document you have below.
I'm reviewing this as though it were a fully new format, since I'm not sure how much we already have locked-in based on existing code, and how much is new. We might decide that backward compatibility is more important than consistency, and if so, we won't want to take all of my recommendations here.
Post by juga
Tor Bandwidth Measurements Document Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
Replace measurements document with list?
Post by Nick Mathewson
Post by juga
document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Post by juga
Since Tor version 0.2.4.12-alpha the directory
authorities use the bandwidth measurements document called
Replace measurements document with list?
Post by Nick Mathewson
Post by juga
"V3BandwidthsFile" and produced by Torflow [1]
(format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".
Explanation needed: Is this a new format, or a new specification of the
existing format? Let's say so here.
A new specification for the existing format 1.0.0.
A new format 1.1.0, which is backwards compatible with 1.0.0 parsers.
Post by Nick Mathewson
Question: If this is a different format, and we're calling it version
1.0.0, what should we call the old one? But later it seems that we're
introducing 1.1.0, and we're calling the old one 1.0.0.
"The Legacy Torflow format" or just "legacy"?
Post by Nick Mathewson
Suggestion: let's be explicit that we're only describing the format
here, and *not* describing how bwauths generate their data.
I agree. We want to leave room for peerflow and future schemes.
So we might want to:
* replace every "measurements document" with "list"
* replace every "measurements scanner" with "generator"
Post by Nick Mathewson
Post by juga
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow)
Replace measurement scanner with generator?
Post by Nick Mathewson
and format was
Post by juga
created by mike. Teor suggested to write this specification while
contributing on pastly's new bandwidth scanner implementation.
XXX
Please update.
Post by Nick Mathewson
Post by juga
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
Hmm, the dir-spec calls them measurements.
Maybe we should fix it as well.
Post by Nick Mathewson
Post by juga
of "Tor directory protocol" (dir-spec.txt) [3] are obtained
by bandwidth authorities,
Is a bandwidth authority a directory authority that votes for bandwidths?
Or is it a bandwidth generator that produces the bandwidth file?
Post by Nick Mathewson
which generate a file storing information
Post by juga
on relays' measured bandwidth capacities.
Remove "measured".
Post by Nick Mathewson
Post by juga
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
Instead of "bandwidth measurements document format", say "bandwidth list"?
Post by Nick Mathewson
Post by juga
1.1.0 - Adds key_value lines to the header, format version,
optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can
consume these versions.
All Tor versions can consume format version 1.0.0.
All Tor versions can consume format version 1.1.0, but they warn on header lines.
See https://trac.torproject.org/projects/tor/ticket/25960
Post by Nick Mathewson
Post by juga
2. Format details
Bandwidth measurements MUST contain the following sections:
And if they don't, the file SHOULD be ignored.
Post by Nick Mathewson
Post by juga
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
Replace "measurements" with "bandwidths"?
Post by Nick Mathewson
Post by juga
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections
1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
Does this have to start with a "$" ? I think it does. Maybe we should be explicit about that.
It does. And we should.
Post by Nick Mathewson
Post by juga
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
version_number
value ::= ArgumentChar+
Excluding SP
Post by Nick Mathewson
Post by juga
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
Some header lines MUST appear in specific positions, as documented below.
And if they don't, the file SHOULD be ignored.
Post by Nick Mathewson
Post by juga
All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
And if there are, the parser SHOULD choose an arbitrary line.

All lines in the file MUST be 510 characters or less, to allow for the trailing newline and NUL characters. (The previous limit was 254 characters in Tor 0.2.6.2-alpha and earlier.)

The parser MAY ignore longer lines.

Should we lift this restriction in 1.1.0?
Post by Nick Mathewson
Maybe this line belongs below in the key_value section?
Post by juga
timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?
Yes, tor expects a Unix timestamp on a single line by itself.
Post by Nick Mathewson
Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS
format?
Tor refuses to read bandwidth files unless they start with an integer on a line by itself. So this would be a breaking change.
Post by Nick Mathewson
Post by juga
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version.
It uses semantic versioning [5].
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If
we use "SP" instead, then we can reuse a lot of the same machinery tor
currently uses to parse other documents.
I think using SP is fine.

But if we want to re-use the parsing machinery, we probably need to add a keyword to the initial timestamp. That would be a breaking change.
Post by Nick Mathewson
Post by juga
"software=" value NL
[Zero or one time.]
The name of the software that created the document.
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".
"software_version=" value NL
[Zero or one time.]
The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.
This line has been added in version 1.1.0 of this specification.
If we use SP as a separator, we can make these two lines:

"software" SP name_value SP version_value NL
Post by Nick Mathewson
Post by juga
"scanner_started=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the scanner that generates the
measurements document started.
This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify
times elsewhere in Tor.
This is a new field, so we can choose the format.
Post by Nick Mathewson
Post by juga
"earliest_measurement=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the first relay measurement
was obtained.
This line has been added in version 1.1.0 of this specification.
See note above about time format.
Post by juga
key_value NL
[Zero or more times.]
Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version increment.
Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.
Parsers MUST NOT rely on the order of these additional lines.
Additional header lines MUST NOT use any keywords specified in the
relay measurements format.
And if there are, the parser MAY ignore conflicting keywords.
Post by Nick Mathewson
Post by juga
If a header line does not conform to this format, the line SHOULD be
ignored by parsers.
Suggestion: say what recipients of this document should do with
unrecognized data. In general, it's good for forward compatibility to
say something like, "Recipients MUST ignore key_value lines if they do
not recognize the keyword. Recipients MUST ignore any extra material in
a line that they do not recognize."
We should specify what parsers should do with every MUST in the document.
Post by Nick Mathewson
Also see suggestion above about using SP as our separator rather than
"=" for consistency with other documents Tor parses.
Post by juga
NL
[Zero or one time.]
The header ends.
This line has been added in version 1.1.0 of this specification.
For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for
consistency with other documents.
Post by juga
2.3. Relay measurements format
It consists of zero or more relay_line with the measurement results
of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same
relay_line.
And if there are, the parser SHOULD choose an arbitrary value.
Post by Nick Mathewson
Do existing implementations accept arbitrary order here?
Existing Tor implementations do not accept node_id at the end of a line.
https://trac.torproject.org/projects/tor/ticket/26004

We should:
* add this as a MUST NOT in 1.0.0, and
* allow it in 1.1.0, with a list of tor versions that support it

If we use the standard directory parser, each relay line will have to start with a keyword. Perhaps we should use "b" or "r" or "n". This would be a breaking change.
Post by Nick Mathewson
Post by juga
"node_id=" fingerprint
[Exactly once.]
The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being
measured. Say that implementations SHOULD include both RSA fingerprint
and Ed25519 identity, and that implementations SHOULD accept lines that
contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
Post by Nick Mathewson
Post by juga
"bw=" bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
And if there are zero bandwidths, the parser MAY ignore them.
Post by Nick Mathewson
Post by juga
Multiple measurements can be aggregated using an averaging scheme, such
as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations
SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the
same network, their measurements MAY need further scaling. See Appendix B
for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of
ArgumentChar, and ArgumentChar can contain spaces. So if we were
parsing
"foo=abc bar=def"
we might be parsing either one key_value ("foo", "abc bar=def") or two
("foo", "abc"), ("bar", "def").
Let's exclude SP from value to resolve this issue.
Post by Nick Mathewson
Post by juga
Future format versions may include additional key_value pairs on a relay_line.
Additional key_value pairs will be accompanied by a minor version increment.
Implementations MAY add additional relay key_value pairs as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same relay keys.
Parsers MUST NOT rely on the order of these additional key_value pairs.
Additional key_value pairs MUST NOT use any keywords specified in the
header format.
And if there are, the parser MAY ignore conflicting keywords.
Post by Nick Mathewson
As above, let's say that a parser should ignore key_value entries with
keywords that it doesn't recognize.
Post by juga
If a relay line does not conform to this format, the line SHOULD be
ignored by parsers.


T
teor
2018-05-05 01:18:47 UTC
Permalink
Post by teor
Post by juga
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
And if there are zero bandwidths, the parser MAY ignore them.
Bandwidth files also need to respect MaxAdvertisedBandwidth and
RelayBandwidthRate/Burst. We need to specify that the relay descriptor
bandwidth rate and burst should limit the bandwidths in the file.

Torflow supports MaxAdvertisedBandwidth by putting relays in partitions
that match their bandwidth. Maybe it also does some other adjustments.

sbws can probably just do a min() using the measured bandwidth:
https://github.com/pastly/simple-bw-scanner/issues/155

For details, see:
https://trac.torproject.org/projects/tor/ticket/8494#comment:5

T
juga
2018-05-06 20:54:00 UTC
Permalink
Post by teor
Hi Nick,
Post by Nick Mathewson
Suggestion: Add a field to hold the Ed25519 Identity of the relay being
measured. Say that implementations SHOULD include both RSA fingerprint
and Ed25519 identity, and that implementations SHOULD accept lines that
contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
You're talking about the certificate, right?
This would change the concept of "line", since the certificate is more
than one "line".

This is how it is defined in dir-list-spec.txt

base64-encoded-ed25519-identity :== "-----BEGIN ED25519 CERT-----" NL
certificate "-----END ED25519 CERT-----" NL
teor
2018-05-06 22:39:19 UTC
Permalink
Post by juga
Post by teor
Hi Nick,
Post by Nick Mathewson
Suggestion: Add a field to hold the Ed25519 Identity of the relay being
measured. Say that implementations SHOULD include both RSA fingerprint
and Ed25519 identity, and that implementations SHOULD accept lines that
contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
You're talking about the certificate, right?.
This would change the concept of "line", since the certificate is more
than one "line".
This is how it is defined in dir-list-spec.txt
base64-encoded-ed25519-identity :== "-----BEGIN ED25519 CERT-----" NL
certificate "-----END ED25519 CERT-----" NL
The certificate is a proof of identity.
But we only need to refer to a relay by its ed25519 public key:

"master-key-ed25519" SP MasterKey NL

[At most once]

Contains the base-64 encoded ed25519 master key as a single argument. If it is present, it MUST match the identity key in the identity-ed25519 entry.

https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n416
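To illustrate the earlier suggestion of base64 without a trailing "=", here is a small Python sketch. The function names are hypothetical and no key name for the relay line is implied. A 32-byte ed25519 public key encodes to 44 base64 characters including one "=" of padding, so the unpadded form is 43 characters and cannot be confused with the "=" in a key_value.

```python
import base64


def encode_ed25519_id(public_key):
    """Return the unpadded base64 form of a 32-byte ed25519 public key."""
    if len(public_key) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    return base64.b64encode(public_key).decode("ascii").rstrip("=")


def decode_ed25519_id(text):
    """Restore the padding that was stripped, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)
```

Decoding restores the original 32 bytes, so a parser can compare the value with the master key taken from a relay descriptor.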

T
