[tor-dev] Tor Bandwidth Measurements Document Format
juga
2018-04-17 11:23:00 UTC
Hi,

As discussed with teor and pastly, I am sending in-line a draft
specification for the document format that bandwidth scanner
implementations should produce.

I've left my own questions/notes in square brackets.

Thanks,
juga.

=======================================

Tor Bandwidth Measurements Document Format
[juga: which name should we give to this document?]

1. Scope and preliminaries

This document describes the format of Tor's bandwidth measurements
document, version X.X.X [juga: which version should be this?]
and later.

Since Tor version X.X.X [juga: which tor version?] the directory
authorities have used a bandwidth measurements document, called
"V3BandwidthsFile", produced by Torflow [1]
(format described in README.spec.txt [2]).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.

1.2. Acknowledgements

The original bandwidth measurement scanner (Torflow) and its format
were created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth scanner implementation.

This specification was revised after feedback from:

XXX

1.3. Outline

The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of dir-spec.txt [3] are obtained by bandwidth authorities, which are
either directory authorities or other servers running bandwidth
measurement scanners and sending the results to the former.
[juga: it seems that bandwidth authorities have not been formally
defined before]

2. Format details

Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relay measurements (zero or more times)

Each section (or entry) ends with a separator.

2.1. Nonterminals

The following nonterminals are defined in the Onionoo details
document specification [4]:

fingerprint
nickname

In the bandwidth measurement documents nickname is optional.

The following nonterminals are defined in dir-spec.txt:

NL (newline)
SP (space)

"bw" = INT, the aggregated measured bandwidth of this relay, in
kilobytes per second.

We introduce the following nonterminals:
[juga: this should probably be defined more formally and should
probably link to other documents, which ones?]

"version" = The name and the version of the bandwidth scanner
software, such as "sbws 0.1.0".
The name of the software, if absent, is assumed to be "torflow".
[juga: which should be the version if absent?]

"timestamp" = INT, the Unix Epoch time when the file was created.

2.2. Header format

It MUST consist of:

"timestamp" timestamp NL
"version" version NL

2.3. Relay measurements format

Relay measurements MUST consist of the following items.

"node_id" fingerprint SP
"bw" bandwidth SP

When there are no more items, the "bw" item ends with NL instead of
SP.
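A minimal Python sketch of reading one relay line, assuming the
key=value spelling used in the sample data (Appendix A) rather than
the bare keywords above, and assuming items may appear in any order.
parse_relay_line and REQUIRED are hypothetical names; unrecognised
items are kept so that implementation-specific extras (section 2.4)
are not lost.

```python
from typing import Dict

# Items every relay line must carry, per section 2.3 of the draft.
REQUIRED = ("node_id", "bw")


def parse_relay_line(line: str) -> Dict[str, str]:
    """Split one space-separated relay line into its key=value items."""
    items: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"item without '=': {token!r}")
        items[key] = value  # unknown keys (rtt, pid_bw, ...) are kept
    missing = [key for key in REQUIRED if key not in items]
    if missing:
        raise ValueError(f"missing required items: {missing}")
    try:
        int(items["bw"])  # bw is an integer number of kilobytes per second
    except ValueError:
        raise ValueError(f"bw is not an integer: {items['bw']!r}") from None
    return items
```

On the A.2 sbws relay line this returns a dict with node_id, bw, nick,
rtt and time; a line without a bw item raises ValueError.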

2.4. Optional extra items

Different implementations of bandwidth measurement scanners MAY
include other items per relay.

For instance, sbws includes:

"rtt" = INT, Round Trip Time (to obtain 1B)

Every relay measurement in sbws consists of:

"node_id" fingerprint SP
"bw" bandwidth SP
"nick=" nickname SP
"rtt=" rtt SP
"time=" timestamp NL

Every relay measurement in Torflow consists of:

"node_id" fingerprint SP
"bw" bandwidth SP
"nick=" nickname SP
"measured_at=" slice timestamp NL

The "measured_at" does not correspond to the "time" in sbws.
[juga: actually, if bwauths use "measured_at", then the code on them
or sbws should be changed].

Torflow includes other items that are out of the scope of this
document.

References:

1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details


A. Sample data

A.1. Torflow

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath

A.2. sbws

1523911758
version=0.1.0
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test rtt=380 time=1523911725
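The sbws relay layout from section 2.4 can be reproduced with a small
formatter. This is a sketch, assuming the key=value spelling of the
sample above and that nick may be omitted; format_sbws_line is a
hypothetical helper, not sbws's actual code.

```python
from typing import Optional


def format_sbws_line(fingerprint: str, bw: int, rtt: int, timestamp: int,
                     nick: Optional[str] = None) -> str:
    """Build one relay line in the item order shown for sbws in 2.4."""
    items = [f"node_id={fingerprint}", f"bw={bw}"]
    if nick is not None:  # nickname is optional in this format
        items.append(f"nick={nick}")
    items += [f"rtt={rtt}", f"time={timestamp}"]
    return " ".join(items) + "\n"  # items separated by SP, line ends with NL
```

With the values from the A.2 sample it returns that relay line,
terminated by NL.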
teor
2018-04-17 13:01:30 UTC
Hi,

Thanks for writing this draft spec.
Post by juga
Hi,
As discussed with teor and pastly, I am sending in-line a draft
specification for the document format that bandwidth scanner
implementations should produce.
I've left my own questions/notes in square brackets.
Thanks,
juga.
=======================================
Tor Bandwidth Measurements Document Format
[juga: which name should we give to this document?]
That's a fine name.
You can leave out the "Document" if you want.
Post by juga
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
document, version X.X.X [juga: which version should be this?]
It doesn't matter, so let's use semantic versioning:
* the original torflow format was 1.0.0
* the format in this spec adds the "version" feature, so it is 1.1.0
(it is compatible with 1.0.0, as long as parsers ignore unrecognised
lines)
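To illustrate the semantic versioning rule proposed here, a sketch of
how a parser might decide whether it can read a file: the same major
version means compatible, because minor versions only add lines that
older parsers ignore. The helper names and the policy of rejecting a
different major version are assumptions, not something decided in
this thread.

```python
from typing import Tuple


def parse_semver(text: str) -> Tuple[int, int, int]:
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_read(file_version: str, parser_version: str) -> bool:
    """Assumed policy: a parser reads any file with the same major version.

    Newer minor versions only add lines, which the parser ignores.
    """
    return parse_semver(file_version)[0] == parse_semver(parser_version)[0]
```

Under this policy a 1.0.0 parser accepts a 1.1.0 file (it skips the
new "version" line), while a hypothetical 2.0.0 file is rejected.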
Post by juga
and later.
Since Tor version X.X.X [juga: which tor version?]
It looks like 0.2.4.12-alpha added measured bandwidths
https://gitweb.torproject.org/tor.git/tree/ChangeLog#n12710
Post by juga
the directory
authorities have used a bandwidth measurements document, called
"V3BandwidthsFile", produced by Torflow [1]
(format described in README.spec.txt [2]).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and its format
were created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth scanner implementation.
XXX
1.3. Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of dir-spec.txt [3] are obtained by bandwidth authorities, which are
either directory authorities or other servers running bandwidth
measurement scanners and sending the results to the former.
[juga: it seems that bandwidth authorities have not been formally
defined before]
You could use the definition in the man page:
"the bandwidth-authority generated file storing information on
relays' measured bandwidth capacities"
Post by juga
2. Format details
- Header (exactly once)
- Relay measurements (zero or more times)
Each section (or entry) ends with a separator.
This line is a copy-paste error, it should be deleted.
Post by juga
2.1. Nonterminals
The following nonterminals are defined in the Onionoo details
fingerprint
nickname
This file format gets the fingerprint and nickname from the
consensus, so you should reference dir-spec.txt.

(dir-list-spec.txt gets relay fingerprints and nicknames from
Onionoo. That's why it uses the Onionoo definitions.)

Here are the definitions of hexdigest (fingerprint) and nickname:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1268
Post by juga
In the bandwidth measurement documents nickname is optional.
"optional" is not relevant in a definition.
Let's delete this line, it's already documented as optional later on.
Post by juga
NL (newline)
SP (space)
"bw" = INT, the aggregated measured bandwidth of this relay, in
kilobytes per second.
bw is not defined in dir-spec.txt. And the formatting is confusing.
Double quotes are used for ASCII literal strings in dir-spec.txt.
Can you please follow the format used in dir-spec.txt?

Here is one example:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n210

Here's how you can define bw using the Int definition from
dir-spec.txt:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n795

bw = Int

bw is the aggregated measured bandwidth of this relay, in kilobytes
per second.
Post by juga
[juga: this should probably be defined more formally and should
probably link to other documents, which ones?]
dir-spec.txt
Post by juga
"version" = The name and the version of the bandwidth scanner
software, such as "sbws 0.1.0".
Our newest spec uses "version" for the file format version:
https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n148

So please don't make a field with a different meaning and structure,
and call it "version".

I suggest:
* use "version" for the file format version (or don't use "version")
* use "source" for the implementation software name and version

Please fix the formatting of this definition to be like dir-spec.txt.
This definition has two space-separated arguments: the name and the
version.
Post by juga
The name of the software, if absent, is assumed to be "torflow".
[juga: which should be the version if absent?]
"if absent" is not relevant in a definition.
Let's move these lines to the header section.

The software version should be optional.
Torflow does not have a version, so we cannot require a version.
Post by juga
"timestamp" = INT, the Unix Epoch time when the file was created.
Please fix the formatting of this definition to be like dir-spec.txt.
Post by juga
2.2. Header format
We should say if order matters.
We should say how new items get added to the header.
(We could say that parsers MUST ignore unrecognised lines.)
Post by juga
"timestamp" timestamp NL
"version" version NL
The sbws sample data has:
1523911758
version=0.1.0

The first line does not have a "timestamp" string literal.
The second line has an equals sign.
The second line is optional (see the torflow sample data).

Does Tor ignore the version line?
If it does, we should document it.
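A sketch of a header reader that follows the sample data rather than
section 2.2: the first line is a bare Unix timestamp, later header
lines are key=value pairs, and unrecognised lines are skipped, as
suggested above. The names parse_header and _is_relay_line, and the
rule that the header ends at the first line containing a node_id=
item, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def _is_relay_line(line: str) -> bool:
    # Assumption: any line carrying a node_id=... item is a relay line.
    return any(tok.startswith("node_id=") for tok in line.split())


def parse_header(lines: List[str]) -> Tuple[int, Dict[str, str], int]:
    """Return (timestamp, header items, index of the first relay line)."""
    if not lines:
        raise ValueError("empty bandwidth file")
    timestamp = int(lines[0].strip())  # bare Unix epoch seconds
    items: Dict[str, str] = {}
    i = 1
    while i < len(lines) and not _is_relay_line(lines[i]):
        key, sep, value = lines[i].strip().partition("=")
        if sep:
            items[key] = value  # e.g. version=0.1.0
        # Lines without "=" are unrecognised and skipped.
        i += 1
    return timestamp, items, i
```

For the A.2 sample it returns (1523911758, {"version": "0.1.0"}, 2);
for the A.1 Torflow sample the header dict is empty, which matches the
observation that the version line is optional.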
Post by juga
2.3. Relay measurements format
You should say that order on a line doesn't matter, and relay order
also doesn't matter.
Post by juga
Relay measurements MUST consist of the following items.
"node_id" fingerprint SP
"bw" bandwidth SP
The format has equals signs, but this definition does not.
Post by juga
When there are no more items, the "bw" item ends with NL instead of
SP.
It might be easier to say that each line allows extra arguments, and
reference the dir-spec.txt definition:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n261

But does each argument need an equals sign?
Post by juga
2.4. Optional extra items
Different implementations of bandwidth measurement scanners MAY
include other items per relay.
"rtt" = INT, Round Trip Time (to obtain 1B)
This definition belongs in the definition section.
Please fix the formatting of this definition to be like dir-spec.txt.
Post by juga
"node_id" fingerprint SP
"bw" bandwidth SP
The format has equals signs, but this definition does not.

The fingerprints in the sample data have $ signs.
Does Tor require them? Or are they optional?
We should document it either way.
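If the leading $ turns out to be optional, a parser could accept both
spellings and normalise them. This sketch assumes a fingerprint is a
40-hex-digit digest (the dir-spec.txt hexdigest referenced earlier)
with an optional $ prefix; whether Tor itself requires the $ is the
open question, so this does not describe Tor's behaviour.
normalise_fingerprint is a hypothetical name.

```python
import re

# A relay fingerprint: exactly 40 hexadecimal digits.
_HEX40 = re.compile(r"[0-9A-Fa-f]{40}")


def normalise_fingerprint(node_id: str) -> str:
    """Strip an optional leading '$' and return the upper-case hex digest."""
    digest = node_id[1:] if node_id.startswith("$") else node_id
    if not _HEX40.fullmatch(digest):
        raise ValueError(f"not a 40-hex-digit fingerprint: {node_id!r}")
    return digest.upper()
```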
Post by juga
"nick=" nickname SP
"rtt=" rtt SP
"time=" timestamp NL
The equals signs are correct here.
Post by juga
"node_id" fingerprint SP
"bw" bandwidth SP
The format has equals signs, but this definition does not.
Post by juga
"nick=" nickname SP
"measured_at=" slice timestamp NL
slice is not defined, just use "timestamp", then explain using
the next line.
Post by juga
The "measured_at" does not correspond to the "time" in sbws.
Is it worth explaining the difference?
Post by juga
[juga: actually, if bwauths use "measured_at", then the code on them
or sbws should be changed].
Tor does not contain the string "measured_at":
https://github.com/torproject/tor/search?q=measured_at

For consistency, please remove "measured_at", or add "updated_at".
Post by juga
Torflow includes other items that are out of the scope of this
document.
We should think about which torflow fields are worth documenting.
Post by juga
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
A. Sample data
Maybe the sample data should contain more than one relay?
Post by juga
A.1. Torflow
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
A.2. sbws
1523911758
version=0.1.0
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test rtt=380 time=1523911725
T
