Discussion:
[tor-dev] Proposal: Check Maxmind GeoIP DB before distributing
Jaskaran Singh
2018-06-30 11:53:27 UTC
Hi List,

Please have a look at this proposal.

Filename: Check-Maxmind-GeoIP-DB-before-distributing.txt
Title: Check Maxmind GeoIP-DB before distributing
Ticket(s): #26240
Author: Jaskaran Singh
Created: June 2018
Status: Open

0. Motivation and Overview
We're using Maxmind's (company registered in the US) GeoIP Database,
which is not just antithetical to the philosophy that one should not
totally rely on a service/software for all needs, but has some serious
security repercussions too.

Trusting Maxmind's GeoIP Database is dangerous, as it may lead to some
possible attacks on the Network. We propose that the Database be checked
for integrity before distributing to the users. The whole process of
checking for integrity can be assigned to the Directory Authorities (or
any trusted systems) who would be responsible for completing it using a
script.

We should also give a choice to the user whether she wants to use
Maxmind's DB or any other DB of her choice, or even to not use any
Geo-IP DB at all.

1. Threat Model
We assume an adversary that is capable of introducing false information
in the Maxmind GeoIP database, either by its influence over the company
or otherwise. The adversary also has enough resources to perform a Sybil
attack on the network.

2. Attacks on the Network

2.1 Sybil attack under the Radar
The Tor Network is constantly monitored for any suspicious spike in
nodes, as it may be an indication of an upcoming or ongoing Sybil
attack. A powerful adversary can coerce Maxmind to map some specific IP
address blocks to different countries. The people and scripts
monitoring the network may then see nothing suspicious in the event,
and the adversary would stay under the radar.

2.2 False Location indication for a shady node
A large percentage of people don't want the exit of their circuits to be
located in certain countries where the communication is under
surveillance. The powerful adversary knows this as well. Users generally
add a line in their config that prevents circuits from being built
through nodes located in those countries. To overcome this, the
adversary can coerce Maxmind to alter its database to map some
particular IPs to locations which the user thinks are havens of free
speech.

3. Design of the Solution
We should check the Maxmind database against its own previous versions.
Additionally, we should simply stop using a GeoIP database
intrinsically for every purpose, but still allow users to plug in their
own databases through the interface we implement. Perhaps the latter can
be introduced as a ./configure option for when the user is highly
distrustful of Maxmind and wants to use a service she trusts, or doesn't
want to use one at all. The two solutions are explained below.

3.1 Checking for integrity

Step 1: The Dir Authorities (or any trusted computers) fetch the latest
maxmind geoip-db along with its previous versions.

Step 2: The Tor nodes' locations are checked against the previous
versions for any changes.

Step 3: All the Dir Authorities perform the above two steps
independently of each other. A count of the number of changes in node
locations is maintained. If the number of changes is significant, they
are viewed with suspicion, since this can be the preparation of a Sybil
attack by the adversary. In such a case, the new changes to the
database can be discarded. Even a change in a single node's location is
concerning, but it is not easy to attribute that change to malice.
Sometimes there are genuine reasons for a location to change.

Step 4: This database is then distributed to the users.
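
Steps 2 and 3 could be sketched roughly as follows. This is only an
illustration, not part of the proposal: the database is modelled as a
plain dict from CIDR range to country code rather than Maxmind's real
file format, and the names lookup, changed_relays, accept_update and the
max_fraction threshold are all hypothetical. The proposal does not say
what counts as a "significant" number of changes, so the 1% default is a
placeholder.

```python
import ipaddress

def lookup(db, ip):
    """Return the country code of the most specific range containing ip."""
    addr = ipaddress.ip_address(ip)
    best_len, best_cc = -1, None
    for cidr, cc in db.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_len:
            best_len, best_cc = net.prefixlen, cc
    return best_cc

def changed_relays(old_db, new_db, relay_ips):
    """Map each relay IP whose country differs to (old_cc, new_cc)."""
    changes = {}
    for ip in relay_ips:
        old_cc, new_cc = lookup(old_db, ip), lookup(new_db, ip)
        if old_cc != new_cc:
            changes[ip] = (old_cc, new_cc)
    return changes

def accept_update(old_db, new_db, relay_ips, max_fraction=0.01):
    """Accept the new database only if few relay locations changed.

    max_fraction is a hypothetical threshold, not a value from the
    proposal.
    """
    changes = changed_relays(old_db, new_db, relay_ips)
    fraction = len(changes) / len(relay_ips) if relay_ips else 0.0
    return fraction <= max_fraction, changes

# Toy data using documentation address ranges.
old_db = {"192.0.2.0/24": "de", "198.51.100.0/24": "us"}
new_db = {"192.0.2.0/24": "nl", "198.51.100.0/24": "us"}
relays = ["192.0.2.10", "198.51.100.7"]

ok, changes = accept_update(old_db, new_db, relays)
```

Each authority would run such a check independently. The changes dict
is returned alongside the decision so that a flagged update can be
inspected by hand, since a single change may well be legitimate.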

3.2 Doing away with GeoIP location altogether
GeoIP databases are occasionally unreliable and can be done away with
safely. We can provide a ./configure option to the users that enables
them to plug in their own trusted service. If the user doesn't have
access to a database of her own choice, she can simply choose Maxmind,
or not use any database at all. This would remove our dependence on just
one database, and diversify our usage.

4. Licensing issues
Maxmind has a pretty liberal license when it comes to their database, as
summarized below:

Maxmind - CC BY-SA 4.0
* Copy and redistribute the material in any medium or format
* Remix, transform, and build upon the material for any purpose,
  even commercially

5. Dealing with false positives
Maxmind calculates the geolocation of an IP address using WHOIS records,
Reverse DNS etc. It claims to have a precision rate of 99.5% at the
country level. The other 0.5% is most likely those IP addresses for
which neither a WHOIS record nor Reverse DNS is set up.

A very large percentage of Tor Nodes are run from datacenters, which
usually have all their records set up. It's highly unlikely for an IP
address belonging to a datacenter to be mapped to a wrong location.

Hence, false positives would be very few, and can be safely ignored
after a simple manual/scripted investigation.
--
Jaskaran Veer Singh (jvsg)
jvsg1303 at gmail dot com
PGP 2814 3FB7 A32D 429B 092E 27F0 8AA3 C532 9E1A 6AD8
Iain Learmonth
2018-07-01 14:54:11 UTC
Hi,
Post by Jaskaran Singh
0. Motivation and Overview
We're using Maxmind's (company registered in the US) GeoIP Database,
which is not just antithetical to the philosophy that one should not
totally rely on a service/software for all needs, but has some serious
security repercussions too.

I would love to see a full list of all the places we currently use this
database and what the security consequences could be.

Relevant tickets to this discussion that you may want to read have the
keyword "metrics-geoip" in trac.

Also, you may be interested in karsten's comment on #22203 where we talk
about downloading signed GeoIP files from the dirauths instead of
shipping them in the distribution.

Thanks,
Iain.
Katharina Kohls
2018-07-03 09:34:33 UTC
Hi,
Post by Jaskaran Singh
5. Dealing with false positives
Maxmind calculates geolocation of an IP addr using WHOIS records,
Reverse DNS etc. It claims to have precision rate of 99.5% on country
level. The other 0.5% is more likely to be those IP addresses for which
neither WHOIS record nor Reverse DNS are setup.
A very large percentage of Tor Nodes are run from datacenters, which
usually have all their records set up. It's highly unlikely for an IP
address belonging to a datacenter to be mapped to a wrong location.
Hence, false positives would be very few, and can be safely ignored
after a simple manual/scripted investigation.

We measured Tor relay locations a while ago using ICMP RTT measurements
from multiple server instances located in Europe, North America, Asia,
and Oceania. Using the minimum RTT for each connection*, we applied
multilateration for estimating the location of a relay. Even though this
approach is noisy because of varying network conditions and routes, we
still get a good estimate of the relay's actual position.

We compared our estimated ICMP relay locations with the GeoIP information:
- our test set consisted of a full consensus
- we conducted the measurements within 5 days and repeated reference
experiments a month later to test the stability of results
- we sent 500 pings per relay from 8 remote servers and repeated the
measurements multiple times
- we used the minimum RTT as input for the multilateration
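
For readers unfamiliar with multilateration, a minimal sketch of the
idea follows. It is not the code used for these measurements. It assumes
that each minimum RTT has already been converted into a distance
estimate in km (the conversion is not described in this mail), and it
uses a coarse brute-force grid search rather than a proper solver. The
names haversine_km and multilaterate, and the server coordinates, are
made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def multilaterate(vantage_points, distances_km, step_deg=1):
    """Return the grid point whose distances to the vantage points best
    match the estimated distances (least squared error)."""
    best, best_err = None, float("inf")
    for lat in range(-90, 91, step_deg):
        for lon in range(-180, 180, step_deg):
            err = sum((haversine_km((lat, lon), vp) - d) ** 2
                      for vp, d in zip(vantage_points, distances_km))
            if err < best_err:
                best, best_err = (lat, lon), err
    return best

# Synthetic example: four hypothetical measurement servers and a relay
# whose true position is known, so the distances are exact.
servers = [(52.0, 5.0), (40.7, -74.0), (35.7, 139.7), (-33.9, 151.2)]
true_pos = (50, 8)
distances = [haversine_km(true_pos, s) for s in servers]

estimate = multilaterate(servers, distances)
```

With real RTT data the distances are noisy and usually overestimated
because routes are not straight lines, so the best grid point is only an
estimate of the relay's position.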

Results can be summarized as follows:
- the median location error is about 440 km
- 287 outliers are more than 2654 km away from the position that GeoIP
suggested. This represents ~4.6% of the tested relays
- the 75th percentile of the location error is above 1000 km

We are currently repeating the experiments with 16 instead of 8 servers
and working on the evaluation to improve the location estimate.

We cannot take these results as a ground truth, as a majority of GeoIP
locations already document the actual country and continent a relay is
in. Nevertheless, this is a good way to add an independent verification
step. The location error for the outliers is proof that there are
nodes that actually run on a different continent, and this is an
important security issue if users want to avoid a certain country.
The same applies to the 75th percentile, which also leads to updated
country information for a significant set of relays.

We can conclude that yes, a large percentage of Tor nodes have OK
records. But the number of false positives is not that low and, in my
opinion, cannot be ignored. Besides an independent verification step,
for which I suggest timing measurements and multilateration, location
errors that lead to an updated country code should be treated as an
update (or the respective nodes should be flagged).

*this follows the motivation that no transmission can ever be faster
than a certain threshold, so the minimum RTT is the closest we can get
to this threshold
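
The threshold argument in the footnote can be turned into a simple
plausibility check, sketched below. The sketch assumes signals travel at
roughly 200 km per millisecond in fibre (about two thirds of the speed
of light in vacuum); that figure and the names max_distance_km and
geoip_location_plausible are illustrative assumptions, not values from
the measurements above. A minimum RTT then gives an upper bound on the
distance between a server and a relay, and a GeoIP location outside that
bound for any server cannot be right.

```python
import math

FIBER_KM_PER_MS = 200.0   # assumption: ~2/3 of the speed of light in vacuum
EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def max_distance_km(min_rtt_ms):
    """Upper bound on the one-way distance implied by a round-trip time."""
    return (min_rtt_ms / 2.0) * FIBER_KM_PER_MS

def geoip_location_plausible(claimed, measurements):
    """measurements is a list of ((lat, lon), min_rtt_ms) per server.

    Returns False if the claimed location is farther from any server
    than that server's minimum RTT physically allows.
    """
    return all(haversine_km(claimed, server) <= max_distance_km(rtt)
               for server, rtt in measurements)

# Hypothetical server near Amsterdam that saw a 10 ms minimum RTT.
measurements = [((52.0, 5.0), 10.0)]
near = geoip_location_plausible((50.0, 8.0), measurements)    # a few hundred km away
far = geoip_location_plausible((35.7, 139.7), measurements)   # thousands of km away
```

The check can only rule locations out. A location that passes is
merely consistent with the measurements, since real paths add delay.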


Cheers,
Katharina
grarpamp
2018-07-04 02:52:02 UTC
Thanks for your work.
You may also consider Africa and South America, Canada, Russia, etc.
Also consider locations in the interior of each, so that contacts
within a given RTT are less likely to be across a pond or other border
than they would be from some edge IX or landing. Cable maps may assist.
