This has probably never happened in the real world yet, and may never happen, but let's consider this: say you have a Git repository, make a commit, and get very, very unlucky: one of the blobs ends up having the same SHA-1 as another that is already in your repository. The question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.

1 Answer

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210 (Debian version).

I basically just reduced the hash size from 160 bits to 4 bits by applying the following diff and rebuilding Git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
 	blk_SHA1_Update(ctx, padlen, 8);
 
 	/* Output hash */
-	for (i = 0; i < 5; i++)
-		put_be32(hashout + i * 4, ctx->H[i]);
+	for (i = 0; i < 1; i++)
+		put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+	for (i = 1; i < 5; i++)
+		put_be32(hashout + i * 4, 0);
 }
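
To make the effect of the patch concrete, here is a minimal Python sketch of the same masking applied to a Git object ID. It assumes the mask is used exactly as written in the diff, and the helper names (`git_object_sha1`, `patched_hash`) are made up for this illustration. It only models the output step, not the rest of Git's hashing code.

```python
import hashlib
import struct

MASK = 0xf000000  # copied as written in the diff; numerically 0x0f000000


def git_object_sha1(obj_type: str, payload: bytes) -> bytes:
    # Git hashes "<type> <size>" + NUL + payload, not the bare payload.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def patched_hash(obj_type: str, payload: bytes) -> str:
    # Mimic the patched output loop: mask word 0, write zeros for words 1 to 4.
    full = git_object_sha1(obj_type, payload)
    (h0,) = struct.unpack(">I", full[:4])            # first 32-bit word, big-endian
    out = struct.pack(">I", h0 & MASK) + b"\0" * 16  # 4 + 16 = 20 bytes
    return out.hex()


# Hash 1000 different blobs and count how many distinct IDs survive.
ids = {patched_hash("blob", str(i).encode()) for i in range(1000)}
print(len(ids))  # at most 16, since only 4 bits are kept
```

With only 4 bits left, any 17 distinct objects are guaranteed to contain a collision, which is why a handful of commits is enough to hit every case listed below.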

Then I did a few commits and noticed the following.

1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be OK, but when you push, someone clones, or you revert, you will lose the latest version (in line with what is explained above).

2. If a tree object already exists and you create a blob with the same hash: everything will seem normal until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.

3. If a commit object already exists and you create a blob with the same hash: same as #2, the repo is corrupt.

4. If a blob already exists and you create a commit object with the same hash, it will fail when updating the "ref".

5. If a blob already exists and you create a tree object with the same hash, it will fail when creating the commit.

6. If a tree object already exists and you create a commit object with the same hash, it will fail when updating the "ref".

7. If a tree object already exists and you create a tree object with the same hash, everything will seem OK. But when you commit, all of the repository will reference the wrong tree.

8. If a commit object already exists and you create a commit object with the same hash, everything will seem OK. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.

9. If a commit object already exists and you create a tree object with the same hash, it will fail when creating the commit.
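
The silent data loss in the blob-on-blob case and the type confusion in the blob-on-tree case can both be reproduced with a toy content-addressed store. This is only a sketch under stated assumptions: `ToyStore`, `weak_id` and `colliding_payload` are hypothetical names, the 4-bit ID is the top nibble of the real object SHA-1 rather than the patched output above, and the "skip the write if the ID already exists" rule is a simplification of how an object store can treat an existing ID as identical content.

```python
import hashlib


def weak_id(obj_type: str, payload: bytes) -> str:
    # Hypothetical 4-bit object ID: top nibble of the real Git object SHA-1.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return format(hashlib.sha1(header + payload).digest()[0] >> 4, "x")


class ToyStore:
    """Toy content-addressed store; not Git's actual implementation."""

    def __init__(self):
        self.objects = {}

    def write(self, obj_type, payload):
        oid = weak_id(obj_type, payload)
        # An ID that is already present is assumed to hold identical
        # content, so the new object is silently dropped.
        self.objects.setdefault(oid, (obj_type, payload))
        return oid

    def read(self, oid, expected_type):
        obj_type, payload = self.objects[oid]
        if obj_type != expected_type:
            raise TypeError(f"object {oid} is a {obj_type}, not a {expected_type}")
        return payload


def colliding_payload(target_oid, obj_type):
    # Brute-force a payload of the given type whose weak ID equals target_oid.
    i = 0
    while weak_id(obj_type, f"payload {i}\n".encode()) != target_oid:
        i += 1
    return f"payload {i}\n".encode()


store = ToyStore()
first = store.write("blob", b"version 1\n")
newer = colliding_payload(first, "blob")
assert store.write("blob", newer) == first  # no warning, same ID returned
print(store.read(first, "blob"))            # b'version 1\n'
```

Writing a colliding blob on top of an existing tree in the same toy store makes `read(oid, "blob")` raise a type mismatch, which is the same shape of failure as the "is a tree, not a blob" error shown below.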

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)

if you delete the file and then run "git checkout file.txt".

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref

when running "git commit".

In this case you can typically just run "git commit" again, since this will create a new hash (because of the changed timestamp).
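
A retry usually helps because a commit's hash covers its author and committer timestamps. The following is a minimal sketch of that dependency; the identity, the timestamps and the helper names are made up for the example, and the commit is a root commit with no parent line.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree ID


def git_object_id(obj_type: str, payload: bytes) -> str:
    # Git hashes "<type> <size>" + NUL + payload.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def commit_id(tree: str, timestamp: int, message: str) -> str:
    # Hypothetical identity; the timestamp is part of the hashed payload.
    who = f"A U Thor <author@example.com> {timestamp} +0000"
    body = f"tree {tree}\nauthor {who}\ncommitter {who}\n\n{message}\n"
    return git_object_id("commit", body.encode())


first_try = commit_id(EMPTY_TREE, 1449705600, "test")
retry = commit_id(EMPTY_TREE, 1449705601, "test")  # same commit, one second later
print(first_try != retry)  # True
```

With a hash cut down to 4 bits, the retried commit can still land on an occupied ID (roughly a 1 in 16 chance per existing ID if the IDs are evenly spread), which is why the retry only typically works.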

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object

when running "git commit".

If somebody tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob, d000000000000000000000000000000000000000 is commit, f000000000000000000000000000000000000000 is tree)
Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk (f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

...