[tor-dev] Error-Correcting Onions with Bech32

Discussion:

nullius

2017-12-31 00:53:04 UTC

# Synopsis

The Bech32 standard for error-correcting base32 strings was developed
explicitly for relative ease and reliability in human communication of
pseudorandom bitstrings. I invite discussion of specifying Bech32 as an
alternative means for representing RFC 7686 .onion domain names. Should
the response hereto be positive, then I will offer a formal proposal.

I have written and released a tool which automatically recognizes and
encodes/decodes .onion addresses in Bech32. To complement whatever I
here say, please get a hands-on feel for Bech32 .onions:

https://github.com/nym-zone/bech32

Manpage (yes, a real manpage!):
https://raw.githubusercontent.com/nym-zone/bech32/master/bech32.1.txt

# Background: About Bech32

Bech32 is specified by the Bitcoin BIP 173 standard,[1] co-authored by
Pieter Wuille and Greg Maxwell. According to Mr. Maxwell, “Bech32 is
designed for human use and basically nothing else”; the underlying
research and development process involved extensive testing with human
users, analysis of NIST visual confusability data, and the integration
of a BCH code with strong error correction and detection properties.

[1] https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki

I refer to BIP 173 for further explanation of Bech32’s design
properties, its rationales, and the limits of its error handling.

A specific application of Bech32 is Bitcoin’s new address format for the
future, which I call “Bravo Charlie Addresses” after the letters “bc”
specified for Bitcoin addresses in the standard’s “human-readable part”
(HRP). However, the standard was written to permit general use in other
applications.

Having in hand a standard explicitly designed to ease the pain which
wetware suffers when it comes into contact with pseudorandom gibberish,
the cypherpunk in me is overjoyed at the potentials. One is a concept
which I call “PGP Descriptors”, which I am currently working to specify
with a few extra features and nuances. And of course, I think of
.onions!

# Bech32 for .onion

I hereby nominate “onion” as the logical HRP for RFC 7686 .onion
special-use domain names.

Here is Bech32 .onion by example, using my bech32 tool with its built-in
.onion support to encode and decode the name for the Tor Project’s
.onion equivalent of its “www” site:

```
$ bech32 -e expyuzz4wqqyqhjn.onion
onion1yh0c5eeuksscs8fdyd8406
$ bech32 -d onion1yh0c5eeuksscs8fdyd8406
expyuzz4wqqyqhjn.onion
```

The string is longer, because it contains 6 base32 characters’ worth of
error-correcting code. N.b. also, the foregoing should work just fine
for v3 onions (formerly prop-224).

Imagine the impact on users who have a practical need to transmit a
.onion address by verbal communication, or via a handwritten note. Now
they can get some help with errors, instead of wondering why they can’t
connect to a nonexistent .onion site.

The standard enjoins applications against autocorrecting Bitcoin
addresses, so as to prevent even the slightest possibility of causing
funds loss by being too “helpful”. But in applications where it would
be safe to do so, Bech32 can indeed correct small errors (as well as
reliably detecting much worse errors). I suggest that such automatic
correction would be suitable for .onion addresses.
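A sketch of what “correct small errors” can mean in practice. This is a brute-force stand-in, not the algebraic decoder in sipa’s demo: try every single-character substitution in the data part, and accept a repair only if exactly one candidate verifies. BIP 173 guarantees detection of any error affecting at most four characters, so two valid strings differ in at least five positions. A single substitution therefore has exactly one valid repair, and with two or three substitutions no one-character repair can verify. The names are hypothetical, and the sketch assumes the HRP and the `1` separator arrive intact.

```python
# Hypothetical brute-force corrector for a single substituted character.
# Assumes the HRP and the "1" separator were received intact.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def is_valid(hrp, data):
    if len(data) < 6 or any(c not in CHARSET for c in data):
        return False
    expanded = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    return polymod(expanded + [CHARSET.index(c) for c in data]) == 1


def correct_one_error(s):
    """Return s if valid, else its unique one-character repair, else None."""
    hrp, sep, data = s.lower().rpartition("1")
    if not sep:
        return None
    if is_valid(hrp, data):
        return hrp + sep + data
    fixes = set()
    for pos in range(len(data)):
        for c in CHARSET:
            candidate = data[:pos] + c + data[pos + 1:]
            if c != data[pos] and is_valid(hrp, candidate):
                fixes.add(hrp + sep + candidate)
    return fixes.pop() if len(fixes) == 1 else None


# A letter "o" typed in place of the digit "0":
print(correct_one_error("onion1yh0c5eeuksscs8fdyd84o6"))
```

For that input, the only one-character repair that verifies is the `onion1yh0c5eeuksscs8fdyd8406` string from the example above.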

Bech32 co-author Dr. Wuille (sipa) has published JavaScript reference
code, plus a JavaScript error-correction demo, under an MIT license.
Perhaps this may be easily adapted into Torbutton, for automagic
decoding of Bech32 “onion1” to .onion domains in the Tor Browser address
bar. The code is in the same repository whence I copied the Bech32
reference C code I use internally in my tool:

https://github.com/sipa/bech32

# Conclusion (or, to be continued...)

An alternative representational format with error-correcting codes will
make .onion addresses more human-friendly. I look forward to the day
when “onion1” addresses can be passed by handwritten notes, vocalized
with a radio alphabet, stuffed into QR codes, scrawled on parchments
placed in bottles tossed to sea, rocketed into space, and then
conveniently transformed with appropriate corrections into the DNS-style
.onion format specified by RFC 7686.

Here’s to the alternative Onion format of the future!

--
***@nym.zone | PGP ECC: 0xC2E91CD74A4C57A105F6C21B5A00591B2F307E0C
Bitcoin: bc1qcash96s5jqppzsp8hy8swkggf7f6agex98an7h | (Segwit nested:
3NULL3ZCUXr7RDLxXeLPDMZDZYxuaYkCnG) (PGP RSA: 0x36EBB4AB699A10EE)
“‘If you’re not doing anything wrong, you have nothing to hide.’
No! Because I do nothing wrong, I have nothing to show.” – nullius

Alec Muffett

2017-12-31 00:57:49 UTC

Permalink

Thanks! That's very interesting! TIL :-)

What would you propose to do with subdomains, like
www.facebookcorewwwi.onion? Or is that outside the scope of your proposal?

- alec

_______________________________________________
tor-dev mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

nullius

2017-12-31 02:46:00 UTC

Permalink

Post by Alec Muffett
Thanks! That's very interesting! TIL :-)

Why, if it isn’t instant feedback from the RFC 7686 co-author! In
response to what you said, in brief: I will propose that any subdomain
data (which is presumably human-readable) be transmitted in a separate
or affixed string, leaving Bech32 to deal with the pseudorandom blobs.
Technical details follow.

Post by Alec Muffett
What would you propose to do with subdomains, like
www.facebookcorewwwi.onion? Or is that outside the scope of your proposal?

Good question. That had briefly occurred to me; but I couldn’t figure
out any feasible means to stuff subdomains into the Bech32 string, for
the following reasons:

(0) RFC 1034 DNS names may be up to 255 octets in length. But Bech32
strings are more length-limited. After subtracting an HRP of “onion” (5
chars), the required separator of “1”, and the 6 characters of ECC
checksum in the data part, the 90-character total length limit can only
spare up to 78 characters for the onion address data. For both v2 and
v3 onions, that’s more than sufficient. But even if the length limit
could be raised, an excessively long string would destroy the
human-friendliness which is the raison d’être for Bech32.

(I *infer* that this last may be one reason for the length limit.
Although of course I can’t say for certain, I’ve read Greg Maxwell
discussing some of the user testing involved in the standard’s
development; and 90 chars seems to me the extreme of what a mortal
flesh-and-blood creature could handle with such a string.)

(1) Bech32 is a base-32 encoding, only with a different alphabet than
RFC 4648. Thus, it would be necessary to design another layer of
encoding to most efficiently represent subdomain labels and the
dot-separator with an alphabet of 38 characters [-0-9a-z.]. Worse,
depending on which standards an implementation follows or ignores, that
is not really a strict limitation on names seen in the wild. How should
the Bech32 transformation deal with names containing an underscore “_”?
Or other characters? I think it would only be safe to go with full
octets. This would severely exacerbate the problem of (0) above.
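The length arithmetic in points (0) and (1) can be checked in a few lines of Python. The 280-bit figure for v3 assumes the 56-character label of rend-spec-v3 (32-byte key, 2-byte checksum, 1 version byte). The helper names are hypothetical.

```python
# Back-of-envelope arithmetic for points (0) and (1) above.
import math

MAX_LEN = 90  # BIP 173 total length limit


def data_budget(hrp="onion"):
    """Characters left for data after the HRP, the "1" separator and 6 checksum chars."""
    return MAX_LEN - len(hrp) - 1 - 6


def bech32_chars(bits):
    """Each Bech32 data character carries 5 bits."""
    return math.ceil(bits / 5)


print(data_budget())          # 78
print(bech32_chars(80))       # v2 label, 16 chars = 80 bits -> 16
print(bech32_chars(280))      # v3 label, 56 chars = 280 bits -> 56
print(bech32_chars(8 * 63))   # one maximal 63-octet DNS label as raw octets -> 101
print(math.log2(38) > 5)      # a 38-symbol alphabet needs more than 5 bits/char -> True
```

A single maximal DNS label carried as raw octets would already exceed the 78-character budget on its own, before the onion itself is added.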

(Aside: The special alphabet is bound to raise some eyebrows; so I will
here quote its rationale from BIP 173: “The character set is chosen to
minimize ambiguity according to [this](https://hissa.nist.gov/~black/GTLD/)
visual similarity data, and the ordering is chosen to minimize the
number of pairs of similar characters (according to the same data) that
differ in more than 1 bit. As the checksum is chosen to maximize
detection capabilities for low numbers of bit errors, this choice
improves its performance under some error models.” From what I
understand, a large amount of CPU time was spent crunching over the data
in search of the most error-resistant alphabet.)

(2) Most subdomains are human-memorable (in your example, “www”). Coding
them with Bech32 would decrease human-friendliness, which is the precise
opposite of my objective in making this suggestion. Bech32 is great for
helping humans deal with pseudorandom blobs; for those, it improves upon
RFC4648 Base32, Base64, hexadecimal, or in Bitcoin’s case, the old
base58-based address encoding. But it is absolutely inappropriate as a
coding format for text which humans can easily read, type, and remember.

It is also important to consider relative impact in common usage. I
observe that most .onions do not use subdomains. I do think that it’s
important to support this use case; but if tradeoffs must be made, then
I would optimize more for making that pseudorandom blob less brittle in
human hands.

For the foregoing reasons, I will propose that subdomain data, if any,
be kept separate from the Bech32 coding. It may be either kept in a
separate string, or somehow affixed with a special delimiter either
before or after the Bech32 representation of the onion. Off-the-cuff,
which of these looks best to you?

www:onion19qzypww2zw3ykkkglr4tu9

onion19qzypww2zw3ykkkglr4tu9:www

another-level.www:onion19qzypww2zw3ykkkglr4tu9

(My choice of a delimiter here may be wrong, if we want for the
browser’s address bar to translate it. I should think more about this.)

Finally, I think I should mention: Yes, “onion19qzypww2zw3ykkkglr4tu9”
is not as pretty as “facebookcorewwwi.onion”. But few .onion sites have
the compute power available to Facebook! Moreover, my proposal should
apply to v3 onions, where nobody on Earth will be able to fully
bruteforce out a human-memorable string.

I would advise users to stick to the DNS-style coding for
facebookcorewwwi.onion, and take advantage of Bech32 as an alternative
representation for http://yz7lpwfhhzcdyc5y.onion/,
http://5nca3wxl33tzlzj5.onion/, and other such strings. Those are pure
pain for users now, and it will only get worse when v3 onions get uptake.
Error-correcting codes do not make the names any easier to read; but
they certainly do help with the inevitable mistakes in all the use cases
which involve voice, handwriting, manual typing, carrier pigeons, etc.

teor

2017-12-31 03:23:39 UTC

Permalink

Hi,

Please read the naming layer API proposal before writing your proposal:

https://gitweb.torproject.org/torspec.git/tree/proposals/279-naming-layer-api.txt

In particular, if you added a unique top-level domain (.bech?), you
would only have to specify how the bech translation plugin works.
(It would be a much shorter proposal.)

For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding. It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion. Off-the-cuff, which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want for the browser’s address bar to translate it. I should think more about this.)

Why not:

www.onion19qzypww2zw3ykkkglr4tu9

Transforming the final 2 components and leaving the rest intact seems
like the most usable form. Particularly if you're going to add a .bech at
the end of the address for prop#279.
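teor’s rule (transform only the final components, leave the rest intact) is simple enough to sketch. This hypothetical Python version decodes a final `onion1...` label back to DNS style and, if present, first strips a trailing `.bech` label (teor’s tentative prop#279 TLD, not a settled name). It uses the same character-per-5-bit assumption as the other sketches in this thread and is not a prop#279 plugin.

```python
# Hypothetical sketch of teor's suggestion: translate only a final "onion1..."
# label (optionally followed by a ".bech" TLD) and keep leading labels as-is.
RFC4648 = "abcdefghijklmnopqrstuvwxyz234567"
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def decode_onion1(label):
    hrp, _, rest = label.rpartition("1")
    expanded = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    data = [CHARSET.index(c) for c in rest]  # ValueError on bad characters
    if hrp != "onion" or len(data) < 6 or polymod(expanded + data) != 1:
        raise ValueError("invalid onion1 label: " + label)
    return "".join(RFC4648[d] for d in data[:-6]) + ".onion"


def to_dns_style(name):
    labels = name.lower().split(".")
    if len(labels) > 1 and labels[-1] == "bech":  # hypothetical prop#279 TLD
        labels.pop()
    if labels[-1].startswith("onion1"):
        labels[-1] = decode_onion1(labels[-1])
    return ".".join(labels)


print(to_dns_style("www.onion19qzypww2zw3ykkkglr4tu9"))
print(to_dns_style("another-level.www.onion19qzypww2zw3ykkkglr4tu9.bech"))
```

Names already in DNS style pass through unchanged, so the function can sit in front of every lookup.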

T

nullius

2017-12-31 10:12:53 UTC

Permalink

Post by teor
https://gitweb.torproject.org/torspec.git/tree/proposals/279-naming-layer-api.txt
In particular, if you added a unique top-level domain (.bech?), you
would only have to specify how the bech translation plugin works. (It
would be a much shorter proposal.)

Thanks, teor. I reviewed the spec (version 13cbcbc) carefully, and
opened https://trac.torproject.org/24774 attaching a `git diff` patch
with proposed changes.

The crux of the matter is support for what I will call alternative name
representations. Prop-279 assumed quasi-DNS names resolved through some
sort of a network or database lookup. However, an alternative
representation can be entirely self-contained. Thus, one of the changes
I request is to explicitly permit a global wildcard '*' tld for plugins
which can be sandboxed with neither network nor filesystem access (and
will return answers in microseconds).

I also proposed changes to permit the UTF-8 characters required for
representing names in languages other than American English, and some
other technical improvements. I added status code 5 to support plugins
which can discern when a name is in a recognized format, but is
intrinsically invalid e.g. due to checksum failure; and I expanded the
description of status code 2, for plugins which do not have TLDs but do
recognize a definite syntax.

The potential use cases here extend beyond my suggestion for
Bech32-coded .onions. I also wish to encode .onion addresses in a
mnemonic phrase, similar to those generated by this tool:

easyseed(1) BIP 39 mnemonic phrase generator
https://github.com/nym-zone/easyseed
manpage:
https://raw.githubusercontent.com/nym-zone/easyseed/master/easyseed.1.txt

Out of the box, that will make a mnemonic from the raw data for a v3
.onion address, but not v2 (too short). I could easily draw up a spec
to represent v2 .onions as 8 words, and v3 onions as 24–25 words, each
including a simple checksum. The mnemonic standard I’ve been using
includes carefully designed wordlists for nine different languages; I
will soon be adding multilanguage support to my tool, which I could copy
over to a prop-279 name system plugin.
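The bit layout behind the “8 words for v2, 24 words for v3” figures can be sketched as follows. The rule is a hypothetical generalization of BIP 39: append the first k bits of SHA-256(data), where k pads the bit length up to a multiple of 11, then cut the result into 11-bit word indices. For a 32-byte v3 key this is exactly the BIP 39 rule (8 checksum bits, 24 words). For an 80-bit v2 onion it gives 8 checksum bits and 8 words. The wordlist lookup is left out (index i would select `wordlist[i]`), and whether this reproduces the phrases shown later in the thread is not known.

```python
# Hypothetical sketch: onion key material -> 11-bit word indices, BIP 39 style.
import base64
import hashlib


def mnemonic_indices(data):
    ent = 8 * len(data)
    cs = -ent % 11  # checksum bits needed to reach a multiple of 11
    if cs == 0:
        raise ValueError("bit length already a multiple of 11; scheme undefined")
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    bits = (int.from_bytes(data, "big") << cs) | (digest >> (256 - cs))
    n_words = (ent + cs) // 11
    return [(bits >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


def v2_onion_bytes(name):
    label = name.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    return base64.b32decode(label.upper())  # 16 chars -> 10 bytes (80 bits)


print(mnemonic_indices(v2_onion_bytes("wlupld3ptjvsgwqw.onion")))  # 8 indices
print(len(mnemonic_indices(bytes(32))))                           # 24
```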

Now, imagine an activist under a repressive régime whispering in the ear
of a whistleblower eight words for the address of a SecureDrop. Or
scrawling a Bech32 address on a scrap of paper in a hurry. The
possibilities are many.

Should my proposed changes be accepted, I will be eager to write tools
and plugins for .onion alternative representations which look either
like this (a real address, properly encoded in Bech32):

onion1kt50trm0nf4jxkskpcjy74

...or approximately like this (random words off a wordlist, for example
only):

mad century mirror awkward glory shine cake fat

...with out-of-the-box support for Chinese (Simplified), Chinese
(Traditional), French, Italian, Japanese, Korean, and Spanish, in
addition to English.

Wordlists, all designed to minimize user error:
https://github.com/bitcoin/bips/tree/master/bip-0039
(In the English list, all words are unique within the first four
characters; and similar/confusable words are excluded.)

Given appropriate prop-279 changes, I won’t need to draw a proposal.
I’ll simply write code!

Yawning Angel

2017-12-31 10:48:52 UTC

Permalink

I commented on the ticket but I'll do it here for completeness sake:

On Sun, 31 Dec 2017 10:12:53 +0000

Post by nullius
I also proposed changes to permit the UTF-8 characters required for
representing names in languages other than American English, and some
other technical improvements. I added status code 5 to support
plugins which can discern when a name is in a recognized format, but
is intrinsically invalid e.g. due to checksum failure; and I expanded
the description of status code 2, for plugins which do not have TLDs
but do recognize a definite syntax.

This is pointless because internationalized domain names are
standardized around Punycode encoding (Unicode<->ASCII), and said
standard is supported by applications that support IDN queries.

I am firmly against this change, and I'm not particularly thrilled by
the thought of homograph attacks either.

Post by nullius
Given appropriate prop-279 changes, I won’t need to draw a proposal.
I’ll simply write code!

It's worth keeping in mind that no one to my knowledge has implemented
prop 279 in the tor code itself, though there is (IIRC) a python kludge
that kind of allows development.

Regards,

--
Yawning Angel

nullius

2018-01-01 08:45:57 UTC

Permalink

Post by Yawning Angel
This is pointless because internationalized domain names are
standardized around Punycode encoding (Unicode<->ASCII), and said
standard is supported by applications that support IDN queries.
I am firmly against this change, and I'm not particularly thrilled by
the thought of homograph attacks either.

Happy New Year, Yawning; and apologies for the delayed reply. I thought
I’d best work up some code for an object demonstration of why I urge the
importance of UTF-8 (and also embedded spaces, which I forgot to mention
explicitly).

Here is an 8-word mnemonic phrase encoding for Wikileaks
(http://wlupld3ptjvsgwqw.onion/), in 8 different languages or writing
systems:

real element glow tennis pluck museum hair shuffle
æŽ ç± å± ä»° æ³ª åŽ ä¹ æ
æœ æ å± ä»° æ· å³ ä¹ æ
parole distance fautif sombre notoire loyal flairer ratisser
retina erba idillio suonare potassio opposto india scuderia
ã«ãã€ãããããããã¡ãããã»ããããšããããããŸã€ããããããã¯ããã
áá¡á·áá¡ááµ áá¡á«áá®áš áá¡áŒáá®á· áá³á«áá¡á¯ ááµáá®á¯ áá§á¯áá¡ áá¥á«áá®áŒááµ áá®áŒáá¡á«
pie dulce gimnasio tabla oscuro molde guerra repetir

Imagine an activist whispering this address in someone’s ear, in the
people’s native tongue!

Respectively, those mnemonics are in English, Chinese (Simplified),
Chinese (Traditional), French, Italian, Japanese, Korean, and Spanish.
Those are not my selections; they are the languages for which wordlists
are currently available in the standard I am adapting. Here is a hint
on how to produce these phrases:
https://github.com/nym-zone/easyseed/commit/ba77be1b1a1f0c6af50ceba5c89f4adece7e5dff

As for Punycode vs. UTF-8:

Homograph attacks are not “solved” by Punycode any more than they would
be fixed by base64ing all addresses. Punycode is not a security
feature; to the contrary! CVE-2013-7424, CVE-2015-8948, CVE-2016-6261,
CVE-2016-6262, CVE-2017-14062.... Need I say more?

With some care, I can write a perfectly secure UTF-8 handler (forbidding
non-shortest form, with a proper U+FFFD replacement algorithm, etc.).
Whereas I have never seen a Punycode decoder which gives me confidence
in its behaviour under all possible inputs. I assiduously avoid
interacting with the bloat and pitfalls of IDNA and Punycode, insofar as
I can. By contrast, UTF-8 has been happily in use on Unix/Plan9 systems
for a quarter-century.
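For concreteness, Python’s standard codecs show both behaviours discussed here. Its strict UTF-8 decoder refuses the overlong two-byte form C0 AF of “/”, `errors="replace"` substitutes U+FFFD, and the `idna` codec produces the Punycode ACE form that browsers put on the wire. This only illustrates the encodings; it is not Tor code, and neither form does anything about homographs.

```python
# Illustration with Python's standard codecs only (not Tor code).
def strict_utf8(raw):
    """Decode raw bytes as UTF-8, returning None for malformed input.

    Python's decoder rejects non-shortest ("overlong") forms such as the
    two bytes C0 AF, an overlong encoding of "/".
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(strict_utf8(b"\xc0\xaf"))                        # None
print(b"\xc0\xaf".decode("utf-8", errors="replace"))   # U+FFFD replacement characters
print("bücher".encode("idna"))                         # b'xn--bcher-kva'
```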

I know that as you say, applications which handle a string as a “domain”
will Punycode it before Tor even sees it. But my thinking from the
beginning was not in terms of DNS names. One of my constructive
criticisms of prop-279 is that it makes that assumption.

The proper question is not, “How do we make more flexible pseudo-DNS
lookups?”, but rather more generally: **How can we turn the pseudorandom
binary data from .onion names into forms friendlier to humans?** If the
Name System API could be in some way modified to admit better answers in
the long term, then it would be my pleasure to help achieve that.

Now since I know that Alec Muffett is reading this thread, here are
mnemonics in the same languages for facebookcorewwwi.onion:

chimney capital common neither demand certain hen athlete
èº« ç ç å·š çœ® è¯ å ç¶
èº« ç± ç å·š çœ® è å ç¶
caméra boussole chasseur mairie crayon butiner fougère annuel
casuale buffone collare osare derivare capello intuito apatico
ãããã€ããããããããããã¡ãããããããããããããããããã¯ãããã¯ã€
áá©áá§áš ááµáá¬áš áá¡ážáá§á« áášáá¡áŒ áá¢áá¡áŒ áá¡á·áá¡ ááŠáá¯á¯ áá©áá³áž
calor brazo centro mover crema cabeza helio antojo

Dare to dream outside the quasi-DNS box about how .onion addresses can
be represented!

Yawning Angel

2018-01-01 09:04:28 UTC

Permalink

On Mon, 1 Jan 2018 08:45:57 +0000

On 2017-12-31 at 10:48:52 +0000, Yawning Angel

I'm aware of the use cases for IDNs.

Homograph attacks are not “solved” by Punycode any more than they
would be fixed by base64ing all addresses. Punycode is not a
security feature; to the contrary! CVE-2013-7424, CVE-2015-8948,
CVE-2016-6261, CVE-2016-6262, CVE-2017-14062.... Need I say more?

Sigh, the problem is encoding format agnostic.

My point was, by allowing non-ASCII characters the onus is on *someone*
to solve the problem of homograph attacks (which admittedly is a bit of
a tangent). I'm painfully aware that all browsers, including Tor
Browser have utterly inadequate solutions here.

I know that as you say, applications which handle a string as a
“domain” will Punycode it before Tor even sees it. But my thinking
from the beginning was not in terms of DNS names. One of my
constructive criticisms of prop-279 is that it makes that assumption.

It makes that assumption because it is an entirely reasonable thing
to do in the context of Tor.

Dare to dream outside the quasi-DNS box about how .onion addresses
can be represented!
a) if Onion addresses suddenly stop looking very-similar-to DNS
addresses, Tor risks returning to a world where special expertise is
necessary to build software for it, thereby harming growth/adoption

The current proposal can get "very similar-to DNS addresses" IDNs by
using the same encoding format that DNS uses.

Regards,

--
Yawning Angel

meejah

2018-01-12 22:00:57 UTC

Permalink

Post by Yawning Angel
It's worth keeping in mind that no one to my knowledge has implemented
prop 279 in the tor code itself, though there is (IIRC) a python kludge
that kind of allows development.

Said kludge is here, for completeness:

https://github.com/meejah/torns

(It's definitely not a thing you should use "in production" or whatever,
but a nice toy if you want to play with a Prop279 implementation). I'm
happy to merge PRs to fix things etc but I'm not "actively developing"
it.

Also worth noting that Tor doesn't play nicely with multiple controllers
that try to do stream-attaching; the above thing does stream-attaching.

--
meejah

Alec Muffett

2017-12-31 11:46:28 UTC

Permalink

For the foregoing reasons, I will propose that subdomain data, if any, be
kept separate from the Bech32 coding. It may be either kept in a separate
string, or somehow affixed with a special delimiter either before or after
the Bech32 representation of the onion. Off-the-cuff, which of these looks
best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want for the browser’s
address bar to translate it. I should think more about this.)

I need to think about this more, and after coffee, but my first concerns
would be:

1) that having multiple representations of a site's onion address is likely
to break many/most sites, because of Host/Origin headers being complicated
enough already.

2) anything involving colons in any position ("https://onion19qzypww2zw3ykkkglr4tu9:www/")
is likely to break both
client-side-web-browsers and server-side-CMS-software unless they are
specially re-engineered for Tor, which is likely to inhibit use *of* Tor;
colons are a port-number separator in URLs, unless they come as part of an
IPv6 address in [square brackets].
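Alec’s point about colons is easy to reproduce with Python’s standard URL parser, used here only as a stand-in for browser and server URL handling (which may differ in detail). In both colon forms, the text after the colon is taken as a port, and reading it fails because it is not a number. A dot-separated form parses as an ordinary hostname. The helper name is hypothetical.

```python
from urllib.parse import urlsplit


def split_host(url):
    """Return (hostname, port) as Python's URL parser sees them."""
    parts = urlsplit(url)
    try:
        port = parts.port  # None when the URL has no port at all
    except ValueError:
        port = "INVALID"   # text after ":" is not a port number
    return parts.hostname, port


for url in ("https://onion19qzypww2zw3ykkkglr4tu9:www/",
            "https://www:onion19qzypww2zw3ykkkglr4tu9/",
            "https://www.onion19qzypww2zw3ykkkglr4tu9/"):
    print(url, "->", split_host(url))
```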

My general sense is that:

a) if Onion addresses suddenly stop looking very-similar-to DNS addresses,
Tor risks returning to a world where special expertise is necessary to
build software for it, thereby harming growth/adoption

b) if Onion addresses have 2+ forms, one like the current
(www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and the
other being apparently more human-usable because it contains a CRC, the one
which allows access to websites will win.

My expectation to date has been that the problem with
"4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad" is that there
is no place for the eyeball to rest when typing it in; as such I've
presumed that a canonical form, defined by Tor, would be something like:

https://www.4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5-sqbsnzjp-bi7utij-cltosqem-ad.onion/

...being N groups of M characters (where N and M can be argued, feel
free...) and where any unused characters within the 63-character
DNS-compliant budget can be used to implement a credit-card-like running
checksum or CRC, for quick client-side checks; eg: the URL bar can identify
that you are typing in an Onion address and leave it red-or-grey until you
type something which satisfies the checksum, before flinging it at
tor-daemon for attempted resolution.

Or, indeed, you could leave out the hyphens and do the same; the Prop224
Onion address is 59 characters, leaving a budget of 63-59==4 characters or
20 bits; we could put these at the end, in the space marked "@@@@":

https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onion/

....and use those 20 bits to implement 5x 4-bit checksums over 12-character
chunks:

https://{www4acth47i6}{kxnvkewtm6q7}{ib2s3ufpo5sq}{bsnzjpbi7uti}{jcltosqemad@}@@@.onion/

...so that any UX component which wants to help the user can highlight (in
red? or bold?) where the problem is, picking out a chunk of 12 characters
which contain the typo:

https://www4acth47i6kxnvkewtm6q7*ib2s3ujpo5sq*bsnzjpbi7utijcltosqemadwxyz.onion/
---------------------------------^^^^^^^^^^^^

Spot the errant 'j'.

The advantage of a system like this is that it's not perfect, but a typo
mostly has to happen twice and be quite fortunate to go undetected.

Of course it's not perfect, but nothing will be, and clever selection of
checksum and encoding will result in something which is still DNS- and
Browser-compliant.
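A hypothetical Python sketch of the per-chunk idea, under stated assumptions. It uses the bare 56-character v3 label, 12-character chunks (so 5 chunks, the last one 8 characters), and, as each 4-bit checksum, a sum of base32 values weighted by odd numbers, mod 16. Alec did not specify a checksum, so that choice is purely illustrative. The 5 nibbles (20 bits) become 4 extra base32 characters, for a 60-character label within the 63-character DNS limit.

```python
# Hypothetical per-chunk checksums for a v3 onion label (not from any Tor spec).
B32 = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648 alphabet, lowercase
CHUNK = 12


def chunk_checksums(label):
    """One 4-bit checksum per 12-character chunk (the last chunk may be short)."""
    sums = []
    for start in range(0, len(label), CHUNK):
        chunk = label[start:start + CHUNK]
        # Odd weights: a single substitution goes unnoticed only if the two
        # symbols' base32 values differ by exactly 16.
        sums.append(sum((2 * j + 1) * B32.index(c) for j, c in enumerate(chunk)) % 16)
    return sums


def add_checks(label):
    """Append 5 nibbles (20 bits) as 4 base32 characters to a 56-char label."""
    if len(label) != 56:
        raise ValueError("sketch assumes a 56-character v3 label")
    bits = 0
    for s in chunk_checksums(label):
        bits = (bits << 4) | s
    return label + "".join(B32[(bits >> (5 * (3 - i))) & 31] for i in range(4))


def bad_chunks(checked):
    """Indices of 12-character chunks whose stored checksum does not match."""
    label, tail = checked[:-4], checked[-4:]
    bits = 0
    for c in tail:
        bits = (bits << 5) | B32.index(c)
    stored = [(bits >> (4 * (4 - i))) & 15 for i in range(5)]
    return [i for i, (a, b) in enumerate(zip(chunk_checksums(label), stored)) if a != b]


full = add_checks("4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad")
typo = full[:27] + "j" + full[28:]  # the errant 'j' from the example above
print(bad_chunks(full), bad_chunks(typo))
```

Here the typo is flagged in chunk 2 (characters 24 to 35 of the bare label), which a URL bar could highlight.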

-a

--
http://dropsafe.crypticide.com/aboutalecm

Alec Muffett

2017-12-31 12:01:44 UTC

Permalink


One other advantage: a DNS-format-compliant checksum like this could be
trivially baked into an SSL certificate without requiring CA/Browser Forum
to invent a wholly new kind of certificate just-for-Tor

This would result in Prop224 Onion Addresses which would not only be
typo-resistant, but could also continue to be issued with EV certificates
where site-attestation is beneficial.

Further: adding segment-checksum bits at the end would be (I think?)
backwards compatible with existing Prop224 addresses.

-a

--
http://dropsafe.crypticide.com/aboutalecm

teor

2017-12-31 12:22:10 UTC

Permalink

Hi,

b) if Onion addresses have 2+ forms, one like the current (www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and the other being apparently more human-usable because it contains a CRC, the one which allows access to websites will win.

What if they both allow access to websites?

I had always thought that prop#279 addresses would be
translated into their canonical forms before the browser
acts on them. But the current proof-of-concept
implementation would include them in the Host header,
because the translation is done at the Tor layer
(not the browser layer).

This also makes a mess of security certificates.
(Or it means that both names would need to be in the certificate.)

And there's the issue of having two names for the same site.

https://www.4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5-sqbsnzjp-bi7utij-cltosqem-ad.onion/
...being N groups of M characters (where N and M can be argued, feel free...)

That's not what's specified right now, and it's not what will be
released in 0.3.2 in a few weeks.

But we could implement a grouping and checksum mechanism
like this using a prop#279 plugin, much like the bech transform.

Depending on where we do the name translation, this change
would cause the same Host header and certificate issues.
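A minimal sketch of the hyphen-grouping idea follows. It assumes fixed groups of eight characters; the group size is one of the parameters left open above. `group_label` and `ungroup_label` are hypothetical helpers and are not part of Tor or prop#279. For a 56-character v3 label, seven groups of eight plus six hyphens come to 62 characters, just under the 63-octet DNS label limit.

```python
def group_label(label: str, size: int = 8) -> str:
    """Hypothetical helper: split an onion label into hyphen-separated groups."""
    if size < 1:
        raise ValueError("group size must be positive")
    groups = [label[i:i + size] for i in range(0, len(label), size)]
    return "-".join(groups)


def ungroup_label(grouped: str) -> str:
    """Hypothetical helper: strip the hyphens to recover the canonical label."""
    return grouped.replace("-", "")


label = "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
grouped = group_label(label)
print(grouped)       # 4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5s-qbsnzjpb-i7utijcl-tosqemad
print(len(grouped))  # 62
```

Stripping the hyphens recovers the canonical label exactly. Whether that stripping happens in the browser or in Tor is what decides the Host header and certificate questions.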

The advantage of a system like this is that it's not perfect, but a typo mostly has to happen twice and be quite fortunate to go undetected.
Of course it's not perfect, but nothing will be, and clever selection of checksum and encoding will result in something which is still DNS- and Browser-compliant.

This is true. We should make any schemes DNS-compliant,
which is how the examples in prop#279 work.

This would result in Prop224 Onion Addresses which would not only be typo-resistant, but could also continue to be issued with EV certificates where site-attestation is beneficial.
Further: adding segment-checksum bits at the end would be (I think?) backwards compatible with existing Prop224 addresses.

They would be compatible, as would most prop#279 addresses,
apart from the issues mentioned above.

Are you aware that there's already a checksum in v3 onion
service addresses?

"The onion address of a hidden service includes its identity public key,
a version field and a basic checksum."

https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n2012
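For reference, here is a minimal Python sketch of the v3 address layout as rend-spec-v3 § 6 describes it:

- The address is base32(PUBKEY | CHECKSUM | VERSION) + ".onion".
- CHECKSUM is the first two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION).
- VERSION is 0x03.

The function names are illustrative, and the key is a dummy byte string rather than a real ed25519 key. The two-byte checksum only detects errors. A random corruption of the key bits slips through with probability of roughly 2^-16, and nothing is corrected.

```python
import base64
import hashlib

VERSION = b"\x03"


def onion_checksum(pubkey: bytes, version: bytes = VERSION) -> bytes:
    """CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]


def encode_onion(pubkey: bytes) -> str:
    """Build a v3 address: base32(PUBKEY | CHECKSUM | VERSION) + ".onion"."""
    if len(pubkey) != 32:
        raise ValueError("ed25519 public key must be 32 bytes")
    raw = pubkey + onion_checksum(pubkey) + VERSION  # 35 bytes = 280 bits
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def check_onion(address: str) -> bytes:
    """Return the public key if the version byte and checksum are valid."""
    label = address.lower().removesuffix(".onion")
    raw = base64.b32decode(label.upper())  # raises binascii.Error on bad base32
    if len(raw) != 35:
        raise ValueError("expected 35 bytes: key | checksum | version")
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION or checksum != onion_checksum(pubkey, version):
        raise ValueError("bad version byte or checksum")
    return pubkey


addr = encode_onion(bytes(range(32)))  # dummy key, not a real ed25519 point
print(addr)
```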

T

Alec Muffett

2017-12-31 13:04:06 UTC

On 31 Dec 2017 12:22, "teor" <***@gmail.com> wrote:

Are you aware that there's already a checksum in v3 onion
service addresses?

No I was not*, that's great!

"The onion address of a hidden service includes its identity public key,
a version field and a basic checksum."

It would be great to get the human interface elements to leverage this;
perhaps overall we are premature in trying to solve the presumed HCI issues
of long onions?

- a

*entirely

nullius

2018-01-01 23:51:14 UTC

On 2018-01-01 at 22:36:53 +0000, Taylor R Campbell

Date: Sun, 31 Dec 2017 11:46:28 +0000
Or, indeed, you could leave out the hyphens and do the same; the Prop224
Onion address is 59 characters, leaving a budget of 63-59==4 characters or

Actually, the label part is 56 characters, not 59 characters.
rend-spec-v3.txt, § 6 [ONIONADDRESS]. See also § 1.2 [NAMING] (“The
result is a 56-character domain name”; nit: that should be “label”).
Using the first example address therefrom:

$ bech32 -e pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion
onion10x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx

Of course, 56 + 6 = ...

$ echo -n \
0x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx \
| wc -c
62
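Here is a minimal sketch of what the conversion above appears to do. It assumes the tool follows the BIP 173 reference algorithm with HRP "onion". The checksum functions mirror the BIP 173 reference code, and `onion_to_bech32` is a hypothetical name, not the tool's API. The 35-byte payload is 280 bits, an exact multiple of 5, so each RFC 4648 base32 character of the label is already one quintet. Only the alphabet changes before the 6-character checksum is appended.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet (BIP 173)
B32 = "abcdefghijklmnopqrstuvwxyz234567"       # RFC 4648 base32 alphabet


def bech32_polymod(values):
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_create_checksum(hrp, data):
    polymod = bech32_polymod(bech32_hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


def onion_to_bech32(address: str, hrp: str = "onion") -> str:
    """Hypothetical: re-encode a v3 .onion label as a Bech32 string."""
    label = address.lower().removesuffix(".onion")
    if len(label) != 56:
        raise ValueError("expected a 56-character v3 label")
    # 280 bits split evenly into 56 quintets: remap alphabets, no repacking.
    data = [B32.index(c) for c in label]
    data += bech32_create_checksum(hrp, data)
    return hrp + "1" + "".join(CHARSET[d] for d in data)


print(onion_to_bech32(
    "pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion"))
```

The first 56 data characters (0x7vvfgc...) are the label's own quintets written in the Bech32 alphabet. If the tool implements BIP 173 as specified, the final six characters are the BCH checksum.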

N.b. that this still includes the two octets of truncated SHA3-256,
wrapped inside a format with 30 bits of error-correcting BCH code.
Decoding/re-encoding the name to drop the SHA3 bits would cut the
payload from 280 to 264 bits, which could be represented in 53+6=59
Bech32 characters with the BCH ECC.
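The arithmetic can be checked with a short sketch. It assumes the payload is simply PUBKEY | VERSION once the two truncated SHA3-256 bytes are dropped. `strip_sha3_checksum` and `bech32_data_chars` are hypothetical helpers. A decoder would then have to recompute those two bytes to recover the canonical .onion label.

```python
import base64


def strip_sha3_checksum(label: str) -> bytes:
    """Hypothetical: return PUBKEY | VERSION (33 bytes) from a v3 label."""
    raw = base64.b32decode(label.upper())
    if len(raw) != 35:
        raise ValueError("expected 35 bytes: key | checksum | version")
    return raw[:32] + raw[34:]  # drop the 2 truncated SHA3-256 bytes


def bech32_data_chars(payload_bits: int, checksum_chars: int = 6) -> int:
    """Characters after the '1' separator: ceil(bits / 5) quintets + checksum."""
    return -(-payload_bits // 5) + checksum_chars


payload = strip_sha3_checksum("pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd")
bits = len(payload) * 8
print(bits, bech32_data_chars(bits))  # 264 59
print(bech32_data_chars(35 * 8))      # 62, with the SHA3 bytes kept
```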

I also question whether the onion version needs a whole octet. In the
specific application of Bech32 to Bitcoin, the “witness version”
(version of encoded tx auth program) is restricted to 0–16, inclusive;
and the Bech32 coding is done with one of what I will call a “quintet”
char (5 bits) for the version, followed by the encoding of 8-bit octets
of the witness program.[0] If the .onion version were restricted to 0–15
so as to fit in 4 bits, then only 260 bits = 52 quintets would be needed
to express the version plus the 256-bit master identity key. How many
.onion address versions are expected in, say, the next 20–30 years?
Adding a 6-char BCH code, the total label length would be 58 quintet
characters.
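Here is a sketch of the proposed 4-bit layout. It assumes the version nibble simply precedes the 256-bit key in a single bit string. This packing and both helper names are hypothetical, and it is not what rend-spec-v3 specifies. The 4 + 256 = 260 bits fill exactly 52 quintets with no padding.

```python
def pack_version_and_key(version: int, pubkey: bytes) -> list[int]:
    """Hypothetical layout: 4-bit version, then the 256-bit key, as 52 quintets."""
    if not 0 <= version <= 15:
        raise ValueError("version must fit in 4 bits")
    if len(pubkey) != 32:
        raise ValueError("key must be 32 bytes")
    value = (version << 256) | int.from_bytes(pubkey, "big")  # 260 bits
    return [(value >> (5 * i)) & 31 for i in reversed(range(52))]


def unpack_version_and_key(quintets: list[int]) -> tuple[int, bytes]:
    """Inverse of pack_version_and_key."""
    if len(quintets) != 52:
        raise ValueError("expected 52 quintets")
    value = 0
    for q in quintets:
        value = (value << 5) | q
    return value >> 256, (value & ((1 << 256) - 1)).to_bytes(32, "big")


quintets = pack_version_and_key(3, bytes(32))
print(len(quintets), len(quintets) + 6)  # 52 58
```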

At these lengths, I think every character of pseudorandom data which can
be reasonably shaved off is a significant win for wetware UX.

0. Note, Bech32 encoding rules do not require that the encoded bit
length be a multiple of 5. The standard prescribes the simple rule that
strings of octets be zero-padded to a multiple of 5 bits when encoding,
and decoded to octets with up to 4 trailing 0 bits discarded.
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
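The padding rule in this footnote corresponds to the `convertbits` routine in the Python reference implementation that accompanies BIP 173, and the version below follows that routine. Encoding zero-pads the final group. Strict decoding (`pad=False`) rejects five or more leftover bits, or any non-zero padding bit. The 33-byte payload here is just sample data.

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup a sequence of frombits-wide integers into tobits-wide ones."""
    acc = 0
    bits = 0
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or value >> frombits:
            return None  # input value out of range
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            ret.append((acc << (tobits - bits)) & maxv)  # zero-pad the tail
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None  # too much padding, or non-zero padding bits
    return ret


payload = bytes(range(33))             # e.g. 32-byte key + 1-byte version
quintets = convertbits(payload, 8, 5)  # 264 bits -> 53 quintets (1 pad bit)
print(len(quintets))                   # 53
print(bytes(convertbits(quintets, 5, 8, pad=False)) == payload)  # True
```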

Existing checksum in v3 addresses aside, what would prevent using a
second DNS label for a longer checksum if you wanted a bigger budget?
The labels are limited to 63 octets, but the whole name can be up to
255 (including label length bytes).
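As a rough check on that budget, here is a sketch of the RFC 1035 length limits, assuming plain ASCII labels. Each label may be at most 63 octets. The wire-format name (one length byte per label, the label octets, and the terminating root byte) may be at most 255 octets. The extra "checksum label" in the example is hypothetical.

```python
def dns_name_length_ok(name: str) -> bool:
    """Check RFC 1035 limits: labels <= 63 octets, wire-format name <= 255."""
    labels = name.rstrip(".").split(".")
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # One length byte per label, the label octets, plus the zero-length root label.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= 255


v3 = "a" * 56 + ".onion"
with_check_label = "c" * 20 + "." + v3  # hypothetical extra checksum label
print(dns_name_length_ok(v3), dns_name_length_ok(with_check_label))  # True True
print(dns_name_length_ok("a" * 64 + ".onion"))                       # False
```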

I expect that the user burden of a greater length of pseudorandom
gibberish would outweigh any possible UX benefit of adding more checksum
data. A 6-quintet BCH code already provides error correction,
guarantees detection of errors affecting not more than 4 characters, and
has a <10^-9 probability of failing to detect a greater number of
errors. Is better than that really needed?
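The detection guarantee can be exercised directly. The sketch below reuses the checksum functions from the BIP 173 reference code and checks that every single-character substitution in a 62-character data part is rejected. The sample payload is arbitrary. This illustrates only the "at most 4 characters" guarantee, not the 10^-9 bound for larger errors.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_create_checksum(hrp, data):
    polymod = bech32_polymod(bech32_hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


def undetected_substitutions(hrp, data):
    """Count single-character substitutions that still pass the checksum."""
    misses = 0
    for pos in range(len(data)):
        for replacement in range(32):
            if replacement == data[pos]:
                continue
            mutated = data[:pos] + [replacement] + data[pos + 1:]
            if bech32_verify_checksum(hrp, mutated):
                misses += 1
    return misses


payload = [i % 32 for i in range(56)]  # arbitrary 56-quintet payload
codeword = payload + bech32_create_checksum("onion", payload)
print(len(codeword), undetected_substitutions("onion", codeword))  # 62 0
```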

Upon the same cryptographic self-validation principle which .onion
applies in the first place, I have also considered such possibilities as
encoding a TLS public key fingerprint in subdomain labels. The
fingerprint could be automatically verified by the connecting TLS client
against the same data it itself provides via SNI. This could alleviate
the current need to get CAB Forum to approve some form of DV for .onion
certificates. However, the results must be considered absolutely
impracticable for human transcription. The usage model would rely
exclusively on bookmarks, copypaste, etc.
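Here is a hypothetical sketch of the fingerprint-in-a-label idea. It assumes the fingerprint is SHA-256 over the certificate's DER-encoded SubjectPublicKeyInfo, written as unpadded lowercase base32 in the leftmost label. None of this is specified anywhere, and both function names are illustrative. A 32-byte digest yields a 52-character label, which fits in one DNS label but is no easier to transcribe than the onion label itself.

```python
import base64
import hashlib


def fingerprint_label(spki_der: bytes) -> str:
    """Hypothetical: unpadded lowercase base32 of SHA-256(SubjectPublicKeyInfo)."""
    digest = hashlib.sha256(spki_der).digest()  # 32 bytes -> 52 base32 chars
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def hostname_matches_key(sni_hostname: str, spki_der: bytes) -> bool:
    """Compare the leftmost label of the name the client sent via SNI
    with the fingerprint of the key the server actually presented."""
    leftmost = sni_hostname.split(".", 1)[0].lower()
    return leftmost == fingerprint_label(spki_der)


server_spki = b"placeholder DER bytes"  # stand-in; a client would take this from the cert
host = fingerprint_label(server_spki) + "." + "a" * 56 + ".onion"
print(len(fingerprint_label(server_spki)), hostname_matches_key(host, server_spki))  # 52 True
```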

teor

2018-01-02 02:47:46 UTC

Date: Sun, 31 Dec 2017 11:46:28 +0000
Or, indeed, you could leave out the hyphens and do the same; the Prop224
Onion address is 59 characters, leaving a budget of 63-59==4 characters or

Actually, the label part is 56 characters, not 59 characters. rend-spec-v3.txt, § 6 [ONIONADDRESS]. See also § 1.2 [NAMING] (“The result is a 56-character domain name”—nit, that should be “label”).

We would happily take a patch that makes the wording more
precise throughout the proposal and Tor's other specifications.

…
N.b. that this still includes the two octets of truncated SHA3-256, wrapped inside a format with 30 bits of error-correcting BCH code. Decoding/re-encoding the name to drop the SHA3 bits would cut the payload from 280 to 264 bits, which could be represented in 53+6=59 Bech32 characters with the BCH ECC.

You could safely drop and recalculate the hash, but if the onion
address encoding changes in a future version, you would have
to patch all the bech code.

I also question whether the onion version needs a whole octet. In the specific application of Bech32 to Bitcoin, the “witness version” (version of encoded tx auth program) is restricted to 0–16, inclusive; and the Bech32 coding is done with one of what I will call a “quintet” char (5 bits) for the version, followed by the encoding of 8-bit octets of the witness program.[0] If the .onion version were restricted to 0–15 so as to fit in 4 bits, then only 260 bits = 52 quintets would be needed to express the version plus the 256-bit master identity key. How many .onion address versions are expected in, say, the next 20–30 years? Adding a 6-char BCH code, the total label length would be 58 quintet characters.
At these lengths, I think every character of pseudorandom data which can be reasonably shaved off is a significant win for wetware UX.

We won't be revising the spec at this point, because it's been
implemented. However, you could suggest that the next
version of onion services only uses 5 bits to encode the version.

You could safely encode the current version 3 in zero bits, but if
the onion address encoding changes in a future version, you
would have to patch all the bech code.

One way of doing this is to make the bech prefix "onion3".
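Here is one possible reading of that suggestion, as a sketch. The version is carried in the human-readable part ("onion3") and only the 32-byte key is encoded, so no version bits appear in the data part. The helper names are hypothetical, and the checksum functions follow the BIP 173 reference code. The 256 bits need 52 quintets (with 4 zero padding bits) plus 6 checksum characters, giving 58 data characters after "onion31".

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_create_checksum(hrp, data):
    polymod = bech32_polymod(bech32_hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def to_quintets(data: bytes) -> list[int]:
    """8-bit bytes to 5-bit groups, zero-padding the tail (BIP 173 rule)."""
    nbits = len(data) * 8
    pad = (-nbits) % 5
    value = int.from_bytes(data, "big") << pad
    count = (nbits + pad) // 5
    return [(value >> (5 * i)) & 31 for i in reversed(range(count))]


def encode_onion3(pubkey: bytes, hrp: str = "onion3") -> str:
    """Hypothetical: version in the HRP, only the 32-byte key in the data part."""
    if len(pubkey) != 32:
        raise ValueError("key must be 32 bytes")
    data = to_quintets(pubkey)
    data += bech32_create_checksum(hrp, data)
    return hrp + "1" + "".join(CHARSET[d] for d in data)


encoded = encode_onion3(bytes(range(32)))
print(encoded[:7], len(encoded) - len("onion31"))  # onion31 58
```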

...