[tor-dev] Atlas is not that friendly to Web Archive

Iain Learmonth

2018-02-13 16:42:23 UTC

Hi,

Post by Leonid Evdokimov
I've recently found out that the new Atlas redesign is not that friendly
to web archives. http://archive.li/ can't properly detect the "page
loaded" event, which leads to it capturing the "loading" page[%].
Moreover, https://web.archive.org/ can't capture #-based links at all,
as far as I can see.

This is an interesting point. There is currently no real way to link to
a relay at a particular point in time. The data itself is preserved in
CollecTor, but not in an easy-to-consume form.
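
As a rough illustration of why #-based links are hard for a server-side
archiver: the fragment (everything after the "#") is handled by the
browser and is not sent in the HTTP request, so an archiver that keys
snapshots on the requested URL only ever sees the base page. A minimal
Python sketch, assuming a relay link of the form shown below (the URL
layout and the all-zero fingerprint are placeholders for illustration,
and what_the_server_sees is a hypothetical helper name):

```python
from urllib.parse import urldefrag


def what_the_server_sees(url):
    """Split a URL into the part sent in the HTTP request and the
    fragment, which stays on the client side."""
    base, fragment = urldefrag(url)
    return base, fragment


# Assumed link shape; the fingerprint is a placeholder, not a real relay.
link = ("https://atlas.torproject.org/"
        "#details/0000000000000000000000000000000000000000")

base, fragment = what_the_server_sees(link)
print("requested from server:", base)
print("kept in the browser:  ", fragment)
```

Every relay link of this shape maps to the same requested URL, so
telling relays apart depends on the archiver running the page's
JavaScript and waiting for it to finish.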

Capturing rendered pages for later viewing is probably not the most
useful thing that humanity could be doing with its disk drives. The
reason that we currently cannot have a time travel service for Relay
Search is that Onionoo would not be able to handle that amount of data
with its current architecture.

If someone produces a patch that fixes this for Relay Search, I'd be
happy to review it. I haven't yet investigated exactly what would be
required. In the long term though, I would like to fix this issue with a
service that can provide time travel information.

There is also another option involving raw descriptors, which is not
quite as pretty but may be useful enough for this purpose. #22026 would
create a service for accessing raw descriptors, which we could perhaps
turn into a time travel service, giving you a link with which to cite a
raw descriptor.

Thanks,
Iain.