Go Proxy Security, Part 3: Behind Google's Curtain - Auditing Gosumdb

tl;dr: https://gosumdb.scrutineer.tech

In Part 1 and Part 2 of this series, I explained why goproxy and gosumdb exist. In this final part, I introduce Gosumdb Audit, an auditing tool for gosumdb that I published.

Goproxy was built with auditability in mind. In my research, I could not find a publicly available audit of goproxy and the supporting gosumdb, so I built one.

Two things need auditing: the Merkle Tree itself and the logical integrity of the records stored in the Tree.

Verifying the Merkle Tree

Two kinds of proofs are usually performed for a Merkle Tree: Inclusion Proofs and Consistency Proofs.

Inclusion Proofs link a record to the Tree Head Hash. This check is performed automatically by a client downloading dependencies with go mod. The client looks up the record for a go module in gosumdb. Because the client usually does not store all records locally, it looks the record up online.

Hashed according to the rules of the Merkle Tree, this record contributes to the Tree Head Hash. The client verifies that the record hash feeds into the calculation of the Tree Head Hash. To save time, only the parts of the Merkle Tree needed for that calculation are used. I covered this in Part 2 when I explained Tiles.
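To make the idea concrete, here is a minimal sketch of an Inclusion Proof check in Go. It assumes the RFC 6962 hashing scheme (0x00 prefix for leaves, 0x01 prefix for interior nodes) and a plain audit path of sibling hashes. The go command works with Tiles rather than this flat path, so the names `Hash`, `leafHash`, `nodeHash` and `verifyInclusion` are illustrative and not taken from the Go toolchain or from Gosumdb Audit.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hash is a SHA-256 digest.
type Hash [sha256.Size]byte

// leafHash hashes a record with the 0x00 prefix reserved for leaves.
func leafHash(record []byte) Hash {
	return sha256.Sum256(append([]byte{0x00}, record...))
}

// nodeHash combines two child hashes with the 0x01 prefix reserved for interior nodes.
func nodeHash(left, right Hash) Hash {
	buf := make([]byte, 0, 1+2*sha256.Size)
	buf = append(buf, 0x01)
	buf = append(buf, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

// verifyInclusion recomputes the Tree Head Hash from a single leaf hash and
// its audit path (sibling hashes from the leaf up to the root) and compares
// the result with the expected root.
func verifyInclusion(leaf Hash, index, treeSize uint64, path []Hash, root Hash) bool {
	if index >= treeSize {
		return false
	}
	fn, sn := index, treeSize-1
	r := leaf
	for _, p := range path {
		if sn == 0 {
			return false // path is longer than the tree is deep
		}
		if fn&1 == 1 || fn == sn {
			// The current node is a right child: the sibling goes on the left.
			r = nodeHash(p, r)
			// Skip levels where the node has no sibling (right edge of the tree).
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			// The current node is a left child: the sibling goes on the right.
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && r == root
}

func main() {
	// A tree with three made-up records.
	a, b, c := leafHash([]byte("a")), leafHash([]byte("b")), leafHash([]byte("c"))
	ab := nodeHash(a, b)
	root := nodeHash(ab, c)

	// The audit path for record "c" (index 2) consists of the single hash ab.
	fmt.Println("record c included:", verifyInclusion(c, 2, 3, []Hash{ab}, root))
}
```

The verifier only needs the sibling hashes along one path, never the full list of records, which is why a client can run this check without downloading the whole log.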

Because Inclusion Proofs are performed by all clients by default, it doesn’t make sense to implement an auditing tool for them myself. Consistency Proofs are not part of the default checks, so that is what I focused on.

A Consistency Proof verifies that an older Tree T is fully contained in a newer Tree T'.

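A Consistency Proof is powerful in its statement because it verifies that no part of the older Tree was removed or changed, and that the newer Tree is an extension of the older Tree rather than a different Tree.

The two claims defend against different attacks.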

If an attacker can remove parts of the Merkle Tree, they could add malicious records, wait for a few victims, and, once the victims have been served, remove the malicious records again so that an audit would not find suspicious records.

If an attacker can replace the Merkle Tree with a different one, (malicious) records could be added and removed arbitrarily.

Gosumdb Audit periodically stores the published Tree Head Hashes. For every new Tree Head Hash, the tool checks that all former Tree Head Hashes are contained in the newer and larger Tree. The check is performed after all records have been hashed individually, which yields the Tree Head Hash.
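One simple way to perform such a check when all records are available locally is to recompute the Tree Head Hash for each stored Tree size and compare it with the stored value. The sketch below does that in Go, assuming RFC 6962 hashing. It is a brute-force recomputation, not a compact Consistency Proof, and it is not the actual Gosumdb Audit code. The names `TreeHead`, `treeHead` and `checkConsistency` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hash is a SHA-256 digest.
type Hash [sha256.Size]byte

// TreeHead is a stored observation: the Tree size and its Tree Head Hash.
type TreeHead struct {
	Size int
	Hash Hash
}

// leafHash hashes a record with the 0x00 leaf prefix.
func leafHash(record []byte) Hash {
	return sha256.Sum256(append([]byte{0x00}, record...))
}

// nodeHash combines two child hashes with the 0x01 interior node prefix.
func nodeHash(left, right Hash) Hash {
	buf := append([]byte{0x01}, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

// treeHead computes the Merkle Tree Hash over a list of leaf hashes.
// The list is split at the largest power of two smaller than its length.
func treeHead(leaves []Hash) Hash {
	switch len(leaves) {
	case 0:
		return sha256.Sum256(nil)
	case 1:
		return leaves[0]
	}
	k := 1
	for k*2 < len(leaves) {
		k *= 2
	}
	return nodeHash(treeHead(leaves[:k]), treeHead(leaves[k:]))
}

// checkConsistency verifies that every stored Tree Head matches the Tree
// built from the first Size records of the current log.
func checkConsistency(records [][]byte, stored []TreeHead) error {
	leaves := make([]Hash, len(records))
	for i, r := range records {
		leaves[i] = leafHash(r)
	}
	for _, h := range stored {
		if h.Size < 0 || h.Size > len(leaves) {
			return fmt.Errorf("stored head of size %d does not fit a log of %d records", h.Size, len(leaves))
		}
		if treeHead(leaves[:h.Size]) != h.Hash {
			return fmt.Errorf("stored head of size %d does not match the current log", h.Size)
		}
	}
	return nil
}

func main() {
	// Made-up records standing in for gosumdb entries.
	log := [][]byte{[]byte("r0"), []byte("r1"), []byte("r2")}

	// Pretend this head was observed when the log had two records.
	old := TreeHead{Size: 2, Hash: nodeHash(leafHash(log[0]), leafHash(log[1]))}

	fmt.Println("untampered log:", checkConsistency(log, []TreeHead{old}))

	// Rewriting an old record breaks consistency with the stored head.
	log[0] = []byte("evil")
	fmt.Println("tampered log:", checkConsistency(log, []TreeHead{old}))
}
```

Recomputing every head from scratch is wasteful for a log with millions of records. A real tool would cache subtree hashes or verify compact proofs, but the property being checked is the same.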

Logical Integrity

The validity of the stored records is as important as the integrity of the Merkle Tree itself.

One of the attack vectors I identified is the duplication of records. If a client looks up a go module, the gosumdb API returns one record. The client verifies that this record is contained in the Merkle Tree.

If the same go module (path + version) exists twice among all records, the gosumdb API could serve different repository hashes to different clients. Clients could not detect this, as they usually only perform an Inclusion Proof, which would succeed for duplicates.

Because Gosumdb Audit stores all records locally, it verifies that the combination (path + version) has no duplicates.
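A duplicate check of this kind can be sketched in a few lines of Go. The `Record` type and its fields are assumptions made for this illustration. They do not reflect the storage format of Gosumdb Audit, and the module paths and hashes are made up.

```go
package main

import "fmt"

// Record is a simplified, hypothetical view of one gosumdb entry.
type Record struct {
	ID      int64  // position of the record in the log
	Path    string // module path
	Version string // module version
	Hash    string // repository hash stored in the record
}

// moduleKey identifies a go module version.
type moduleKey struct {
	Path    string
	Version string
}

// findDuplicates returns every (path + version) combination that occurs
// more than once, together with the IDs of the records involved.
func findDuplicates(records []Record) map[moduleKey][]int64 {
	seen := make(map[moduleKey][]int64)
	for _, r := range records {
		k := moduleKey{Path: r.Path, Version: r.Version}
		seen[k] = append(seen[k], r.ID)
	}
	// Keep only the keys that were seen at least twice.
	for k, ids := range seen {
		if len(ids) < 2 {
			delete(seen, k)
		}
	}
	return seen
}

func main() {
	records := []Record{
		{ID: 0, Path: "example.com/a", Version: "v1.0.0", Hash: "h1:aaa"},
		{ID: 1, Path: "example.com/b", Version: "v1.0.0", Hash: "h1:bbb"},
		{ID: 2, Path: "example.com/a", Version: "v1.0.0", Hash: "h1:ccc"},
	}
	for k, ids := range findDuplicates(records) {
		fmt.Printf("duplicate %s@%s in records %v\n", k.Path, k.Version, ids)
	}
}
```

The check deliberately ignores the stored hash. Two records for the same module version are suspicious even if their hashes agree, because a unique key is what makes a lookup unambiguous.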

Future improvements

While reading, you might have thought about the most obvious logical integrity check. Does the stored repository hash in a record match the hash of the original repository?

There are more than 18 million records stored. Each corresponds to a specific go module version. Downloading and verifying all repositories would be the best check possible, but it is not practical: the amount of data to download and verify is very difficult to handle. In addition, some modules might only be stored in goproxy and be long gone from their source.

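What can we do, though? Two approaches come to mind: checking externally provided repositories against their records, and setting up bait repositories.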

Both approaches have their deficiencies. Externally provided repositories can get corrupted, for example, if the maintainer overwrites a git tag. This could lead to false positives.

Bait repositories could be recognized as such by gosumdb and always be treated correctly, so a clean result would not prove much.

Neither approach is implemented in Gosumdb Audit yet. Both will be added at a later stage.

By Raphael Sprenger licensed under CC BY-NC 4.0