[tor-dev] anonbib

Discussion:

ng0

2017-11-03 21:58:35 UTC

Hi,

our plan with the bibliography collection of GNUnet is to
implement something similar to your/freehaven's anonbib.

While running the build and cache update of it from
current git HEAD on the anonbib.cfg I noticed a number of
outdated and broken links.

I'm currently playing with 2 options: re-use anonbib as it is
and change the style + some of its content (for us at GNUnet)
or write something similar to it.
From my perspective option 1 would be the best as we could
work on fixing links together, keep the content up-to-date
and at the same time keep the duplicate efforts and work
down to a minimum.

What do you think?

--
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://dist.ng0.infotropique.org/dist/keys/
https://www.infotropique.org https://ng0.infotropique.org

grarpamp

2017-11-04 00:29:00 UTC

At some time in the past I noticed that the anonbib did
not have links to local copies of some of the materials.
If that's still the case, I'd definitely suggest creating them
at this opportunity.
And though rare and more curation work, some papers do
receive content / errata updates.

Nick Mathewson

2017-11-04 15:08:23 UTC

Post by ng0
Hi,
our plan with the bibliography collection of GNUnet is to
implement something similar to your/freehaven's anonbib.
While running the build and cache update of it from
current git HEAD on the anonbib.cfg I noticed a number of
outdated and broken links.
I'm currently playing with 2 options: re-use anonbib as it is
and change the style + some of its content (for us at GNUnet)
or write something similar to it.
From my perspective option 1 would be the best as we could
work on fixing links together, keep the content up-to-date
and at the same time keep the duplicate efforts and work
down to a minimum.
What do you think?

Hi! I'd love to have more people working on the anonbib content. The code
itself is an old yucky kludge to which I feel no strong attachment, and the
generated HTML is also in need of a revamp.

So, "patches welcome"!

--
Nick

Roger Dingledine

2017-11-04 18:54:46 UTC

Post by ng0
our plan with the bibliography collection of GNUnet is to
implement something similar to your/freehaven's anonbib.

Great.

See also the censorbib, for another example.

Post by ng0
While running the build and cache update of it from
current git HEAD on the anonbib.cfg I noticed a number of
outdated and broken links.

Yep. Many links have failed over the years. That was one of the big
reasons to have the local cached version of each file.

Post by ng0
I'm currently playing with 2 options: re-use anonbib as it is
and change the style + some of its content (for us at GNUnet)
or write something similar to it.
From my perspective option 1 would be the best as we could
work on fixing links together, keep the content up-to-date
and at the same time keep the duplicate efforts and work
down to a minimum.

Sounds plausible to me. I think we would be excited to take patches for
broken links -- even if the new link becomes just a link to our cached
version, which will hopefully live forever. :)
https://www.freehaven.net/anonbib/cache/

But for the ones that have a broken link *and* don't have a cached
version, it would be especially awesome for somebody to track those down.

It's not entirely clear what we ought to do with anonbib. At the
beginning, there was no google scholar, so it really was the place to go
to find out about papers in the anonymous communications area. And also,
back then, there were only 10 or 15 papers and you could feasibly read
all of them.

Now I think anonbib needs to be something other than "all of the papers
about the topic". One way forward would be to cull it even more, so it
becomes more of a recommended reading list.

--Roger

Moritz Bartl

2017-11-04 19:24:22 UTC

Post by Roger Dingledine

Post by ng0
our plan with the bibliography collection of GNUnet is to
implement something similar to your/freehaven's anonbib.

Great.
See also the censorbib, for another example.

There's also a mixnet bibliography at https://bib.mixnetworks.org/ /
https://github.com/applied-mixnetworks/mixbib . If you come across
papers related to mixnets, please submit a patch! Also, we should add
highlights like the anonbib has.

ng0

2017-11-04 19:57:05 UTC

Post by Roger Dingledine

Post by ng0
our plan with the bibliography collection of GNUnet is to
implement something similar to your/freehaven's anonbib.

Great.
See also the censorbib, for another example.

Thanks, I'll search for it.

Post by Roger Dingledine

Post by ng0
While running the build and cache update of it from
current git HEAD on the anonbib.cfg I noticed a number of
outdated and broken links.

Yep. Many links have failed over the years. That was one of the big
reasons to have the local cached version of each file.

Sounds plausible to me. I think we would be excited to take patches for
broken links -- even if the new link becomes just a link to our cached
version, which will hopefully live forever. :)
https://www.freehaven.net/anonbib/cache/
But for the ones that have a broken link *and* don't have a cached
version, it would be especially awesome for somebody to track those down.
It's not entirely clear what we ought to do with anonbib. At the
beginning, there was no google scholar, so it really was the place to go
to find out about papers in the anonymous communications area. And also,
back then, there were only 10 or 15 papers and you could feasibly read
all of them.
Now I think anonbib needs to be something other than "all of the papers
about the topic". One way forward would be to cull it even more, so it
becomes more of a recommended reading list.
--Roger

Christian Grothoff and I have a different understanding of
how we would apply anonbib to our work, but essentially we would
have 2 different "flavors". Anonbib has a specific focus (I assume,
I didn't go through all the papers yet) and our paper selection
would be more focused on another topic. Christian's idea is that
we'd have two different topics hosted.
We are discussing this here right now: https://gnunet.org/bugs/view.php?id=5121

Post by Roger Dingledine
Working together on the anonbib code: great. Just to clarify: we
would host _our_ bibliography and they'd continue to host theirs,
right? Because the focus (secure P2P vs. anonymity) is somewhat
different, so it does make sense to have two different sites with different papers.

Now the "problem" is neither our bibliography nor yours seem to be
completely "ours" or "yours", we mix in what we picked up on the
way to where we are now. Our bibliography.git export right now
counts 1045 files.

I agree with you, trimming them down could be necessary.
For example we could concentrate on creating selected volumes
of papers and the cross-links between them, and stay within
a chosen topic.

I have no idea (at the moment) what has been collected on our
side and how many of the files are outside of a common theme,
I only did the export to git recently.

I'd rather not let people depend on Google's infrastructure
for knowledge, but it shouldn't be our job to maintain a
complete and growing library of knowledge either, so picking
a topic and cutting down to that sounds reasonable to me.

--
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://dist.ng0.infotropique.org/dist/keys/
https://www.infotropique.org https://ng0.infotropique.org

grarpamp

2017-11-04 22:01:40 UTC

Post by Roger Dingledine

Post by ng0
our plan with the bibliography collection of GNUnet is to

https://gnunet.org/bibliography

Post by Roger Dingledine

Post by ng0
implement something similar to your/freehaven's anonbib.

https://www.freehaven.net/anonbib/
https://www.onion-router.net/Publications.html

Post by Roger Dingledine
See also the censorbib, for another example.

https://censorbib.nymity.ch/

# I2Pbib
https://geti2p.net/en/papers/

There are a few more I can't recall right now. If anyone knows of other
community curated collections in the overlay routing mixnet messaging
p2p privacy crypto comms distributed filesharing storage spaces...
feel free to post links to them in this subthread.

Post by Roger Dingledine
It's not entirely clear what we ought to do with anonbib. At the
beginning, there was no google scholar, so it really was the place to go
to find out about papers in the anonymous communications area. And also,
back then, there were only 10 or 15 papers and you could feasibly read
all of them.

Yes there are lots of papers all over the net, and in massive
collections like arXiv, SSRN, etc... but few places collected and
curated by a community of relevance here.
One could envision a large community-curated bibliography database
of papers and multimedia presentations, with a tagging and export system
(perhaps JSON / HTML, pick and choose your fields) for those papers
that each official project tags as references or relevant to their interests.
Click 'Tor', get Tor's... click 'Briar', get Briar's... etc., in
addition to the obvious function of global search and browsing everything
in the database by whatever sorting / filters / rankings the reader chooses.
Open submission by anyone (i.e. bibinfos not yet submitted or tagged by a
project) would feed new entries into a 'potentially relevant to community'
subpool, so that they might eventually be tagged by projects and readers
as desired.

Saves a lot of duplicative work at the projects, is easily mirrored,
imported into web pages, etc.
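
As a rough illustration of the tagging and export idea above, here is a
minimal Python sketch. The record layout, the example keys and tags, and
the function names select and export_json are all assumptions made for
this example; none of them come from anonbib or any existing bib tool.

```python
import json

# Hypothetical records: keys, titles, years, and tags are illustrative only.
ENTRIES = [
    {"key": "example-a", "title": "Example paper A", "year": 2004, "tags": ["Tor"]},
    {"key": "example-b", "title": "Example paper B", "year": 2012, "tags": ["Briar"]},
    {"key": "example-c", "title": "Example paper C", "year": 2016, "tags": ["Tor", "GNUnet"]},
]


def select(entries, tag, fields):
    """Return the entries carrying `tag`, reduced to the requested fields."""
    return [
        {field: entry[field] for field in fields if field in entry}
        for entry in entries
        if tag in entry.get("tags", [])
    ]


def export_json(entries, tag, fields):
    """Serialize one project's view of the shared database as JSON."""
    return json.dumps(select(entries, tag, fields), indent=2, sort_keys=True)


if __name__ == "__main__":
    # "Click 'Tor', get Tor's": only the chosen fields are exported.
    print(export_json(ENTRIES, "Tor", ["key", "title"]))
```

Each project's page would then be one filtered view over the same data,
so adding a tag to an entry is the only per-project maintenance step.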

Post by Roger Dingledine
Now I think anonbib needs to be something other than "all of the papers
about the topic". One way forward would be to cull it even more, so it
becomes more of a recommended reading list.

grarpamp

2017-11-04 22:40:20 UTC

Post by grarpamp
Saves a lot of duplicative work at the projects, is easily mirrored,
imported into web pages, etc.

With mentioned problems of
- Google threat covered by community hosting and replication.
- Separate / overlapping project / topic focus covered by a
flexible tagging and views system.
- Not easily being able to find and read what other projects in the space
are referencing covered by now having a combined database itself.
- Maintaining effort of growing multiple bib systems covered by everyone
lending some minor time coding to the main bib project db itself,
freeing up time for each project to then focus on submit / tag
and reading / using the materials as the more beneficial result.

And so on.

grarpamp

2017-11-04 23:01:56 UTC

Post by grarpamp
With mentioned problems of

Scfith Riseup

2017-11-05 00:05:08 UTC

I wonder if there is an option to start to use ipfs ( https://ipfs.io/ ) or something like it to permanently and resiliently store items for posterity?

Post by grarpamp

Post by grarpamp
With mentioned problems of

- Broken links, not founds, redirects covered by a single monthly
crawl thus being regular and benefitting all projects at once.
- Size, could apply common compression such as xz or even ZSTD
to entire mirrorable local archive. Similar for video materials.
http://open-zfs.org/w/images/b/b3/03-OpenZFS_2017_-_ZStandard_in_ZFS.pdf
http://youtu.be/hWnWEitDPlM
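
The periodic link crawl mentioned in the list above could look roughly
like the following Python sketch. It is not how anonbib checks links;
the (key, url) input format and the names http_status and find_broken
are assumptions for illustration, and only the standard library is used.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the final HTTP status for `url`, or None if it is unreachable.

    urlopen follows redirects, so a moved paper that still resolves
    reports the status of its final location.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def find_broken(entries, fetch=http_status):
    """Return (key, url, status) for every entry whose link looks dead.

    `entries` is an iterable of (key, url) pairs; `fetch` can be swapped
    out, for example to test the logic without network access.
    """
    broken = []
    for key, url in entries:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((key, url, status))
    return broken
```

Some servers reject HEAD requests, so a real crawl would probably retry
with GET before reporting a link as broken.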

grarpamp

2017-11-05 07:14:47 UTC

Post by Scfith Riseup
I wonder if there is an option to start to use ipfs ( https://ipfs.io/ ) or
something like it to permanently and resiliently store items for posterity?

Bib users would need a client to avoid abusing inproxy.
Though a client would offload from the bib.

There doesn't seem to be much of a data loss issue now,
papers with broken links are still refindable and fixable
if searched for hard enough, no?

But it might be said there's organization, maintenance, and
wider audience utility issues with current bibs.

However
- Once a better bib gets made, someone should consider
pushing the dataset into IPFS, gnunet, storj, whatever.
Object hash deduplicated systems among them are storage
efficient, no matter how many people push the same thing.
- Since most video presentation data exists only on youtube
(aka: google) at their whim, I assign high risk of loss to that
community corpus. It's a mess. All projects should be publishing
local copies of theirs for mirroring. Also, it's hard to autodedupe
down from youtube since they embed uniques per download / view.
- Projects should self host, or at least dual home themselves,
in their own overlays, for reference and other uses.
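
To make the deduplication point in the list above concrete, here is a
small Python sketch of a content-addressed store. It is a simplification
under stated assumptions: real systems such as IPFS use their own
identifier formats and chunking, whereas this example keys whole objects
by a plain SHA-256 digest. DedupStore and content_id are hypothetical
names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes alone (plain SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """In-memory store keyed by content hash, so identical data is kept once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        # Only the first copy of a given content is stored.
        self._blobs.setdefault(cid, data)
        return cid

    def get(self, cid: str) -> bytes:
        return self._blobs[cid]

    def __len__(self):
        return len(self._blobs)
```

Two projects pushing the same paper get the same identifier back, and
the store grows by one object rather than two.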

ng0

2017-11-06 05:56:30 UTC

Post by grarpamp

Post by Scfith Riseup
I wonder if there is an option to start to use ipfs ( https://ipfs.io/ ) or
something like it to permanently and resiliently store items for posterity?

Bib users would need a client to avoid abusing inproxy.
Though a client would offload from the bib.
There doesn't seem to be much of a data loss issue now,
papers with broken links are still refindable and fixable
if searched for hard enough, no?
But it might be said there's organization, maintenance, and
wider audience utility issues with current bibs.
However
- Once a better bib gets made, someone should consider
pushing the dataset into IPFS, gnunet, storj, whatever.
Object hash deduplicated systems among them are storage
efficient, no matter how many people push the same thing.
- Since most video presentation data exists only on youtube
(aka: google) at their whim, I assign high risk of loss to that
community corpus. It's a mess. All projects should be publishing
local copies of theirs for mirroring. Also, it's hard to autodedupe
down from youtube since they embed uniques per download / view.
- Projects should self host, or at least dual home themselves,
in their own overlays, for reference and other uses.

Good morning,

I like the proposed ideas so far (especially the idea of
being able to filter by tags and keeping one code repository
that could be reused at each other's location).
We could try to use http://libgen.io/ and https://sci-hub.cc/
as a fallback search if there's a generic API for them
(I haven't tried so far). I heard they are good, although
they might be in a legal grey area depending
on where you are located.

I think videos should be a separate issue, we selfhost them
already as far as I know but integrating them into git is
no (good) solution.
If you don't go for something like Mediagoblin, you could
ask the higher level organization you are part of (for
example GNU, in our case) if video/audio hosting capabilities
exist. Asking CCC for hosting would be another choice, for
their media they have a good amount of mirrors.
In the long term this should be replaced, but for now this
is good enough.
However, this is derailing a bit from the original issue.

You listed some bibs that are similar to the ones already
mentioned and proto-bibs (like ours at GNUnet). Should we
track down more of them to ask the groups and people
running them if they want to get involved?
Or do you want to get started?

I'll need the feedback of Grothoff before I can say whether
we as a group agree or not. My opinion is that it's good
and reusable at our side without causing too much confusion
about content and location.

--
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://dist.ng0.infotropique.org/dist/keys/
WWW: https://ng0.infotropique.org

grarpamp

2017-11-06 09:45:39 UTC

Post by ng0
I think videos should be a separate issue, we selfhost them
already as far as I know but integrating them into git is
no (good) solution.

Don't think I would propose committing the actual videos / papers
to git... too much bloat... just the bib / meta / hash info and links.
Perhaps the links would point to files on the joint webserver.
Mirrors could clone the git and rsync the files.
Primary video links could be out to youtube.
Secondary sets of links that require clients could go to IPFS
or wherever for both papers and videos, or even to torrent magnet
infohashes; seeding bandwidth could be shared across projects
as well.
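
One way to picture the "bib / meta / hash info in git, files via rsync"
split described above is a manifest that mirrors can check their synced
files against. The following Python sketch assumes a manifest shaped as
{key: {"file": ..., "sha256": ...}}; that layout and the names sha256_of
and verify_mirror are hypothetical, not an existing anonbib format.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(manifest, root):
    """Return keys whose mirrored file is missing or differs from its hash.

    `manifest` stands in for the metadata tracked in git; `root` is the
    directory the mirror filled with rsync.
    """
    bad = []
    for key, meta in manifest.items():
        path = Path(root) / meta["file"]
        if not path.is_file() or sha256_of(path) != meta["sha256"]:
            bad.append(key)
    return bad
```

A mirror would run the check after each sync and re-fetch whatever keys
come back.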

Post by ng0
If you don't go for something like Mediagoblin
exist. Asking CCC for hosting would be another choice, for
their media they have a good amount of mirrors.

Whatever works.

Post by ng0
Should we
track down more of them to ask the groups and people
running them if they want to get involved?

If in the crypto privacy messaging overlay etc etc etc spaces,
it could be beneficial to at least send them a link to this thread.
Since each project can freely tag to its own desire / view, and it
saves maintenance, it could be a hit.