This is git-annex's bug list. Link bugs to done when done.
subtle build issue on OSX 10.7 and Haskell Platform (if you have the 32bit version installed)
Posted Sat Sep 22 03:36:59 2012
watcher commits unlocked files
Posted Sat Sep 22 03:36:59 2012
unannex command doesn't all files
Posted Sat Sep 22 03:36:59 2012
Issue on OSX with some system limits
Posted Wed Sep 19 16:46:44 2012
annexed symlink mtime matching code is disabled on non-linux systems; needs testing
Posted Tue Jul 17 17:54:56 2012
unannex and uninit do not work when git index is broken
Posted Tue Jul 17 17:54:56 2012
Unfortunate interaction with Calibre
Posted Tue Jul 17 17:54:56 2012
Prevent accidental merges
Posted Tue Jul 17 17:54:56 2012
softlink mtime
Posted Tue Jul 17 17:54:56 2012
signal weirdness
Posted Tue Jul 17 17:54:56 2012
git rename detection on file move
Posted Tue Jul 17 17:54:56 2012
S3 memory leaks
Posted Tue Jul 17 17:54:56 2012
[[!edittemplate template=templates/bugtemplate match="bugs/*" silent=yes]]
git annex unused --debug; this will tell us the git command that's outputting the data it cannot process. Then you can try running that git command and see what the problem filename is.
locale setting may also be relevant. FWIW, I've tried to create a file with \xb4 in its name and have not gotten git-annex unused to crash on it.
git-annex was not crashing due to content in the git-annex branch, but due to a symlink in one of your regular git branches, probably master and origin/master.
This bug is fixed in git master, if you need the fix before the next release.
Yes, the problem is fixed.
The repository was a normal git repository with path /tmp/çüş (git init) and with annex description "çüş" (git annex init çüş)
afaict, i can't reproduce the problem anymore either :-)
Doing,
Somewhat works for me, git-annex watch at least starts up and takes a while to scan the directory, but it's not ideal. Also, creating files seems to work okay, when I remove a file the changes don't seem to get pushed across my other repos, running a sync on the remote repo fixes things.
To re-inject new content for a file, you really want to get a new key for the file. Otherwise, other repos that have the old file will never get the new content. So:
a0826293 fixed the last problem. There is coreutils available in macports; if they are installed you get the gnu equivalents, but they are prefixed with a g (e.g. gchmod instead of chmod). I guess not everyone will have these installed or prefer these on OSX.
Some more tests fail now...
On a side note, I think I found another bug in the testing. I had tested in a virtual machine running archlinux (a very recently updated version). Please see the report here: tests fail when there is no global .gitconfig for the user
Ah, great, thanks very much for the quick fix!
Yes, when I mentioned three defunct git processes, there were three processes shown as "git [defunct]", plus the three git processes I listed, plus two "git-annex" processes. Upon cancel/resume, there were no defunct git processes when I checked, but by the time I found the bug report on the forum and commented I'd already successfully upgraded my annex (by repeatedly attaching strace) and couldn't really easily get at either additional 'ps' info or a fuller strace than what I posted (that was just the log from one of the attach/detach cycles), so it's a relief you managed to pinpoint the problem.
Or, even better, wouldn't it make sense to have SHA backends always default to --fast and, only when a snag is hit, use non-fast mode for that file?
Though if we continue here, we should probably move this to its own page.
Lauri a scratch patch would be very helpful. Encoding stuff makes my head explode.
However, I am very worried by haskell's changes WRT unicode and filenames. Based on user input, git-annex users like to use it on diverse sets of files, with diverse and ill-defined encodings. Faffing about with converting between encodings seems likely to spectacularly fail.
Outside the test suite, git-annex's actual use of cp puts fairly low demands on it. It tries to use cp -a or cp -p if available just to preserve whatever attributes it can preserve, but the worst case is that you have a symlink pointing to a file that doesn't have the original timestamp or whatever. And there's little expectation git preserves that stuff anyway.
I will probably try to make the test suite entirely use git clone rather than cp.
Joey, sorry, I got it wrong. I thought upgrading git didn't help and you adjusted things in git-annex instead.
Anyway, can I get around upgrading on all hosts by reformatting the drive to case-sensitive HFS+? Or will I have to upgrade git (currently version 1.7.2.5) eventually anyway?
On second thought and after some messing (trying most of the options and combinations of options on OSX for).... I tried replacing cp with gnu cp from coreutils on my OSX install, and all the tests passed. sigh cp -a is preserving some permissions and attributes but not all, it's not behaving in the same way as the gnu cp does... the closest thing that I have found on OSX that behaves in the same way as gnu "cp -pr" is to use "ditto".
Just doing a "ditto SOURCE DEST" in the tests passes everything. I'm not sure if it's a good idea to use this even though it works. Though this is just the tests, does it affect CopyFile.hs where "cp" is called?
It seems the objects are in the remote after all, but the remote is unaware of this fact. No idea where/why the remote lost that info, but.. Anyway, with the SHA backends, wouldn't it make sense to simply return "OK" and update the annex logs accordingly, no?
Local:
Remote:
So, there is evidence here of a circumstance caused by the other bug, as I suspected.
I don't think that manual git commit -a caused the problem. I suspect it was a subsequent git add that caused git to follow the wrong case paths and add the files in the wrong place. Ie, when you run "git add .git-annex", it recurses into .git-annex/Gm/, and adds files using that case, that were previously added from .git-annex/GM/.
For completeness, can you verify this repo's core.ignorecase setting?
I hate that you are stuck using loop filesystems to work around this bug. If my guess is correct, you don't need to, as long as you avoid manually running "git add .git-annex". I take this bug seriously. While I'm currently very involved in adding Amazon S3 support to git-annex (which will take days more of solid work), I do plan to make a loop filesystem of my own, probably vfat, so I can try and reproduce this on a case-insensitive filesystem. If you could confirm my above hypothesis, that would speed things up for me.
It's possible I will have to tweak the hash directories. Hopefully if so, I will only tweak them for new keys; if I had to do a v3 backend just to fix this stupid thing, I'd be sad -- upgrading all my offline disks from v1 to v2 took me many days.
I forgot to mention that the statfs64 stuff in OSX seems to be deprecated, see http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/statfs64.2.html
on a slightly different note, is anonymous pushing to the "wiki" over git allowed? I'd prefer to be able to edit stuff inline for updating some of my own comments if I can :P
Try the changes I've pushed to use statfs64 on apple.
There is actually a standardized statvfs that I'd rather use, but after the last time that I tried going with the POSIX option first only to find it was not broadly implemented, I was happy to find some already existing code that worked for some OSs.
(While ikiwiki supports anonymous git push, it's a feature we have not rolled out on Branchable.com yet, and anyway, ikiwiki disallows editing existing comments that way. I would, however, be happy to git pull changes from somewhere.)
That's odd, I have the md5sha1sum package installed and it still fails with pretty much the same error
the configure script finds sha1sum, builds and starts to run.
FYI, (the following is on OSX 10.7 on two different machines)
On my 64bit install of haskell platform...
On my 32bit install of haskell platform...
Running cabal build or cabal install git-annex as you suggest with the 32bit install does do the right thing.
I have pushed out a preliminary fix. The old mixed-case directories will be left where they are, and still read from by git-annex. New data will be written to new, lower-case directories. I think that once git stops seeing changes being made to mixed-case, colliding directories, the bugs you ran into won't manifest any more.
You will need to find a way to get your git repository out of the state where it complains about uncommitted files (and won't let you commit them). I have not found a reliable way to do that; git reset --hard worked in one case but not in another. May need to clone a fresh git repository.
Let me know how it works out.
What an evil little bug. In retrospect, this probably bit my own test upgrades, but I ran git annex fsck everywhere and so avoided the location log breakage.
I've fixed the bug, which also involved files with other punctuation in their names [&:%] when using the WORM backend.
The only way I have to recover repos that have already been upgraded is to run git annex fsck --fast in each clone of such a repo, which will let it rebuild the location log information. I think that is the best way to recover; ie I can't think of a way to recover that doesn't need to do everything fsck does anyway.
So, it appears that you're using git annex copy --fast. As documented that assumes the location log is correct. So it avoids directly checking if the bare repo contains the file, and tries to upload it, and the bare repo is all like "but I've already got this file!". The only way to improve that behavior might be to let rsync go ahead and retransfer the file, which, with recovery, should require sending little data etc. But I can't say I like the idea much, as the repo already has the content, so unlocking it and letting rsync mess with it is an unnecessary risk. I think it's ok for --fast to blow up if its assumptions turn out to be wrong.
If you use git annex copy without --fast in this situation, it will do the right thing.
This only happens with the WORM backend (or possibly with SHA1E if the file's extension has a newline).
The problem is not the newline in the file, but the newline in the key generated for the file. It's probably best to just disallow such keys being created.
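A minimal sketch of the "disallow such keys" idea, written in Python for illustration rather than in git-annex's own Haskell. The function name validate_key and the example key strings are hypothetical, and the assumption that keys end up in line-oriented files is mine, not something stated above.

```python
def validate_key(key: str) -> str:
    """Return the key unchanged, or raise if it contains a newline."""
    # Assumption of this sketch: keys are later written to line-oriented
    # files, so an embedded newline would split one key across two lines.
    if "\n" in key:
        raise ValueError(f"refusing key containing a newline: {key!r}")
    return key


if __name__ == "__main__":
    print(validate_key("WORM-example-key"))
    try:
        validate_key("WORM-bad\nkey")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting at key-generation time keeps the check in one place, instead of having every consumer of the key defend against it.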
Version: 0.20110503
My local non-bare repo is copying to a remote bare repo.
I have been recovering in a non-bare repo.
If there is anything I can send you to help... If I removed said files and went through http://git-annex.branchable.com/bugs/No_easy_way_to_re-inject_a_file_into_an_annex/ -- would that help?
In the meantime, would it be acceptable to split the pre-commit hook into two discrete parts?
This would allow to (if preferred) defer "git annex fix" until post-commit while still keeping the safety net for unlocked files.
Alternatively, you can just load it up in ghci and see if it reports numbers that make sense:
Hi,
(I'm new to git and git annex, so please forgive any mistakes I make...)
My repo is messed up right now. The fact that I copied the repo with rsync -a back and forth from a case insensitive filesystem to a case sensitive one, probably didn't help.
I believe the annexed files in .git/annex/objects/ are still using a mixed case directory hashing scheme. That's the problem I'm having. The symlinks point to the wrong case and are now broken. I don't think the latest versions of git-annex changed that (it only changed the hashing under .git-annex, right?).
Even if I clean up my repo, I think I'm still going to have a problem because I have one repo on an OS X case insensitive filesystem and my other repos on case sensitive Linux filesystems. Potentially the directory name under .git/annex/objects will have a different case. Then the symlink might have a different case than my Linux FS. Does git-annex track changes in git by the contents of the symlink? In which case the case difference would show up as a change even though there is no change?
Is it possible to change the directory hashing scheme under .git/annex/objects to use lowercase names?
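To see which directories would clash when a repo moves between the two kinds of filesystem, something like the following Python sketch could be run over a list of paths. It only illustrates the collision; the paths are made up, and casefold() is a stand-in for whatever case folding the filesystem really applies.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by case."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() approximates a case-insensitive filesystem's view of
        # the name; real filesystems may fold case differently.
        groups[path.casefold()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    example = [
        ".git/annex/objects/Gm/x1/key",
        ".git/annex/objects/GM/x1/key",
        ".git/annex/objects/zz/y2/other",
    ]
    for group in case_collisions(example):
        print("collide on a case-insensitive filesystem:", group)
```

All-lowercase directory names avoid the problem by construction, since no two distinct names can fold to the same string.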
Seems like you probably have files in git with nearly as long filenames as the key files. Course, you can rename those yourself.
This couldn't be changed directly in WORM without some ugly transition, but it would be possible to implement it as a WORM100 or so. OTOH, if you're going to git annex migrate, you might as well use SHA1.
A google search http://www.google.com/search?hl=en&sclient=psy-ab&q=conq%3A+invalid+command+syntax&btnG= finds other examples of this error message related to ssh, mercurial, and bitbucket. What that has to do with git is anyone's guess, but I'm pretty sure git-annex is not related to it at all.
Hey @fmarier. Well, this bug report is closed because you can already get rid of the symlinks. Just put a bare git repo on your fat filesystem, and use git-annex copy --to/--from there.
Now, that puts all the files that are on the device in .git/annex/objects/xx/yy/blah.mp3 -- how well rockbox would support that I don't know. And if it tries to modify or delete those files, git annex also can't help you manage those changes.
Another recent option is the directory special remote type, which again uses "xx/yy/blah.mp3" and can't track changes made to the files. This could perhaps be extended in the direction you suggest, although trying to fit this into the special remote infrastructure might not be a good fit really.
The most likely way this has to get dealt with is really by using smudge filters, which would eliminate the symlinks and allow copying a non-bare git repo onto vfat.
Yeap, that did the trick. I just tested a few separate OSX 10.6.6 systems and the tests are better behaved now, only 3 failures now.
So the tests behave better (at least we don't get resource fork errors any more)
On all the systems I tested on, I'm down to 3 failures now.
It's the same set of failures across all the OSX systems that I have tested on. Now I just need to figure out why there are still these three failures.
Adam, this bug was fixed a long time ago, first using option #2 above, but later switching to option #3 -- git-annex treats filenames as opaque binary blobs and never decodes them in any encoding; haskell's normal encoding support for stdio is disabled.
And it never resulted in a failure like you show. I cannot reproduce your problem, but it is a different bug, please open a new bug report.
It exists locally; whereis tells me it exists locally, and locally only.
The object is not in the bare repo.
The file might have gone missing before I upgraded my annex backend version to 2. Could this be a factor?
That seems an excellent idea, also eliminating the need for git annex fix after moving.
However, I think CVS and svn have taught us the pain associated with a version control system putting something in every subdirectory. Would this pain be worth avoiding the minor pain of needing git annex fix and sometimes being unable to follow renames?
Hm, if path's ok, guess there's no way around git-bisect indeed. Wonder if there's some kind of ccache for haskell...
OS is linux, amd64 on "host1" and i386 on "host2" where git-annex-shell is crashing. I'll try to come up with a commit, thanks for clarifications.
Actually I may have just been stupid and should have read the man page on statfs...
yields this...
we could just stick another if defined (APPLE) instead of what I previously had and it looks like it will do the right thing on OSX.
I also encountered Adam's bug. The problem seems to be that communication with the git process is done with Char8-bytestrings. So, when L.unpack is called, all filenames that git outputs (with ls-files or ls-tree) are interpreted to be in latin-1, which wreaks havoc if they are really in UTF-8.
I suspect that it would be enough to just switch to standard Strings (or Data.Text.Text) instead of bytestrings for textual data, and to Word8-bytestrings for pure binary data. GHC should nowadays handle locale-dependent encoding of Strings transparently.
Repeated bisect with -j1, just to be sure it's not a random error, and it gave me 828a84ba3341d4b7a84292d8b9002a8095dd2382 again. Guess I'll look through the changes there a bit later and try to revert these until it works.
Not sure if it's repeatable by anyone but me (and hence worth fixing), but here's a bit more of info about the system:
(some stuff listed here as ::installed, but contains no files, since these packages detect whether ghc-7.0.2 already comes with the same/newer package version)
I meant to say it wasn't reliable when I was following the instructions for "Comment 12". I did find that just doing a "git annex copy -t externalusb ." then a "git annex drop ." from the root of my cloned and "none trusted" annexed repos to be more reliable, it just means I temporarily need a load of space to get myself out of my earlier mess.
On testing this bug fix, I found a minor behavioural issue: git annex copy -f REMOTE . doesn't work as expected
I also failed to mention, that in the case when i have stray log files after what has happened in comment 2, I get this left over after a commit when git is confused...
Up until now I have just been updating the status of the staged files by hand and committing it on my mac x00, this probably isn't helping. I'd rather not lose the tracking information.
Currently fsck silently ignores --to/--from. It should at least complain if it is not supported.
Thanks to your feedback, I got it going.
Maybe those two should be added to the 'OSX how-to' in the forum
[realizes pcre-light is needed but pcre not installed on my mac]
sudo port install pcre
sudo cabal install pcre-light
[tests are failing, need haskell's quickcheck]
sudo cabal install quickcheck
I think I know how I got myself into this mess... I was on my mac workstation and I had just pulled in a change set from another repo on a linux workstation after I had made a bunch of moves. Here's a bit of a log of what happened...
Despite status listing S3 support, your git-annex is actually built with S3stub, probably because it failed to find the necessary S3 module at build time. Rebuild git-annex and watch closely, you'll see "** building without S3 support". Look above that for the error and fix it.
It was certainly a bug that it showed S3 as supported when built without it. I've fixed that.
If you try to clone a git repo that has a symlink over to a VFAT filesystem, you get (in its place) a regular file that contains the name of the symlink target. So why can't git-annex use that? I could still do git annex get on this file, git annex would still "know" that it's a symlink, and could replace it with a copy of the real file (instead of putting it in .git/annex).
I know if it were that simple, someone would have done it already, so what am I missing? I guess trying to get the file FROM the repository would fail because it wouldn't find the file in .git/annex? Couldn't you store a reverse mapping? You wouldn't be able to move the file around, but you already lose that once you give up symlinks. It would also be a little harder to tell which symlinks were "dangling"; I don't see an easy way to get around that. It would still be better than a bare repo..
I just noticed this issue, and was wondering what the current status is.
Finally got around to report the issue to GHC tracker.
Looks quite alike (at least to the haskell-illiterate person like me) to a highest-priority issue that's hanging right at the top of the list. There are other similar reports, but they seem to be either related to PowerPC Macs, closed as invalid or due to needinfo inactivity.
Guess any further discussion belongs there, unless ghc developers will bounce it back. Thanks a lot for your help, Joey, and for sharing a great thing that git-annex is.
S3 doesn't support encryption at all, yet.
It certainly makes sense to use a different portion of the encrypted secret key for HMAC than is used as the gpg symmetric encryption key.
The two keys used in HMAC would be the secret key and the key/value key for the content being stored.
There is a difficult problem with encrypting filenames in S3 buckets, and that is determining when some data in the bucket is unused for dropunused. I've considered two choices:
gpg encrypt the filenames. This would allow dropunused to recover the original filenames, and is probably more robust encryption. But it would double the number of times gpg is run when moving content in/out, and to check for unused content, gpg would have to be run once for every item in the bucket, which just feels way excessive, even though it would not be prompting for a passphrase. Still, haven't ruled this out.
HMAC or other hash. To determine what data was unused the same hash and secret key would have to be used to hash all filenames currently used, and then that set of hashes could be intersected with the set in the bucket. But then git-annex could only say "here are some opaque hashes of content that appears unused by anything in your current git repository, but there's no way, short of downloading it and examining it to tell what it is". (This could be improved by keeping a local mapping between filenames and S3 keys, but maintaining and committing that would bring pain of its own.)
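The HMAC option can be sketched roughly as follows, in Python for illustration. The function names, the SHA-1 digest and the example key names are all assumptions of the sketch; nothing here is git-annex's actual S3 code. The sketch hashes every key currently in use with the shared secret, then compares that set with the names found in the bucket.

```python
import hashlib
import hmac


def hashed_name(secret: bytes, key: str) -> str:
    """Opaque bucket name for a key: an HMAC of the key under the secret."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha1).hexdigest()


def unused_hashes(secret: bytes, keys_in_use, bucket_names):
    """Bucket entries whose name matches no key currently in use."""
    in_use = {hashed_name(secret, key) for key in keys_in_use}
    return set(bucket_names) - in_use


if __name__ == "__main__":
    secret = b"example secret"
    bucket = [hashed_name(secret, k) for k in ("key-a", "key-b", "key-c")]
    # Only key-a and key-b are still referenced by the repository.
    for name in unused_hashes(secret, ["key-a", "key-b"], bucket):
        print("appears unused:", name)
```

As the comment says, the result is only a set of opaque hashes: without a local mapping there is no way to turn them back into filenames.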
I agree, it's weird, but that's what I'm seeing:
Output:
I also ran into problems on a case-insensitive HFS+ file system, it seems. I tried following the instructions in comment 12:
However, I still see upper and lower case directories in .git-annex. Did I misunderstand that they should all be lower case now?
You're missing the sha1sum command, everything else is a followon error from that. Added a hint about this to install, and in the next version configure will check for sha1sum.
Thanks for the reply @joey.
While it would certainly be possible for a bare repo to exist on my iRiver, the problem is that the music player uses the filesystem to organize files into directories like "Artist/Album/Track.ogg". So replacing that with "..../xx/yy/Track.ogg" would make it fairly difficult to browse my music collection and select the album/track I want to listen to :)
So unless I have the files physically organized like the symlinks, then it's probably not going to work very well for that particular workflow. Smudge filters are interesting though. In the meantime, I'll look into rsyncing from another box which has the right filesystem layout onto my iRiver directly.
I've posted about this on the git mailing list. It's possible that these bugs, which can be shown to affect things other than just git-annex, will be fixed in git.
I will wait a while to see. But am considering making git-annex use all-lowercase hash dirs for the log files. Maybe it could first look for .git-annex/aaaa/bbbb/foo.log, but also look for, read, and merge in any info from .git-annex/Aa/Bb/foo.log. And always write to the new style filenames. This would avoid confusing git with changes to mixed-case files, and avoid another massive transition.
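A rough Python sketch of that transition scheme: read from both the new lower-case directory and the old mixed-case one, merge what is found, and only ever write to the new location. The helper names, the line-based log format and the way duplicates are merged are assumptions of the sketch; how git-annex derives the directory names is left to the caller.

```python
import tempfile
from pathlib import Path


def read_log(root: Path, new_dir: str, old_dir: str, name: str):
    """Merge log lines from the new and old locations, dropping duplicates."""
    lines = []
    for directory in (new_dir, old_dir):
        path = root / directory / name
        if path.is_file():
            for line in path.read_text().splitlines():
                if line not in lines:
                    lines.append(line)
    return lines


def write_log(root: Path, new_dir: str, name: str, lines):
    """Always write to the new lower-case location; old files are untouched."""
    path = root / new_dir / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(line + "\n" for line in lines))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old = root / "Aa" / "Bb"
        old.mkdir(parents=True)
        (old / "foo.log").write_text("old entry\n")
        write_log(root, "aaaa/bbbb", "foo.log", ["new entry"])
        print(read_log(root, "aaaa/bbbb", "Aa/Bb", "foo.log"))
```

Because the old files are never modified, git stops seeing changes under the mixed-case directories, which is the point of the scheme.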
git annex fsck or reset to the old git tree (and git config annex.version 2) and upgrade again.
.zshrc is only read for interactive shells, so ssh mybox 'echo $PATH' displayed /usr/bin:/bin:/usr/sbin:/sbin. Using .zshenv, which is used even for non-interactive shells, did the trick. Thanks!
git-annex reinject.
What you're describing should be impossible; the error message shown can only occur if the object is present in the annex where git-annex-shell recvkey is run. So something strange is going on.
Try reproducing it by running on the remote system, git-annex-shell recvkey /remote/repo.git $key .. if you can reproduce it, I guess the next thing to do will be to strace the command and see why it's thinking the object is there.
I see the same results ("touch: cannot touch 'Zp': File exists") on these Debian systems:
It does NOT happen on this Ubuntu system:
So really it seems like only the Ubuntu kernel is the outlier here? Maybe it has something to do with charsets or something; I think FAT is a mess in that regard and even long versus short filenames can behave differently.
This (rather longish) thread discusses the current situation, the planned changes for 7.2 and the various issues: http://haskell.org/pipermail/glasgow-haskell-users/2011-November/021115.html
The summary seems to be: From 7.2 on, getDirectoryContents will return proper Strings, i.e. where a Char represents a Unicode code point, and not a Word8, which will fix the problem of outputting them.
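The difference between treating each byte as a character and decoding to code points is easy to demonstrate. The following is a Python illustration, not Haskell, and the function names are made up for the sketch. It uses the name çüş from the report above.

```python
def decode_per_byte(raw: bytes) -> str:
    """Treat every byte as one character, which is what latin-1 decoding does."""
    return raw.decode("latin-1")


def decode_utf8(raw: bytes) -> str:
    """Decode the bytes as UTF-8, giving one character per code point."""
    return raw.decode("utf-8")


if __name__ == "__main__":
    raw = "çüş".encode("utf-8")      # the bytes a UTF-8 filename is made of
    mangled = decode_per_byte(raw)
    proper = decode_utf8(raw)
    print(len(raw), len(mangled), len(proper))
    # The per-byte reading loses nothing, it just displays wrongly:
    print(mangled.encode("latin-1") == raw)
```

The per-byte reading is reversible, which is why treating filenames as opaque bytes is safe for passing them around, while displaying them needs the real encoding.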
git-annex uses locking to avoid problems if multiple processes are run at the same time.
I just tested on NFS, with Linux on the server and client, and it works ok. It seems your NFS client (or server) must not support fcntl locking. What OS is your NAS running?
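One way to check whether a given mount honours fcntl locks, independent of git-annex, is a small probe like this Python sketch (Unix only). The function name is hypothetical, and this is not how git-annex itself takes its locks; it only tries to take and release an exclusive lock on a scratch file in the directory being tested.

```python
import fcntl
import os
import tempfile


def supports_fcntl_locking(directory: str) -> bool:
    """Try to take and release an exclusive fcntl lock on a scratch file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        # A filesystem without lock support typically reports an error here.
        return False
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(supports_fcntl_locking(tempfile.gettempdir()))
```

Pointing it at a directory on the NFS mount instead of the temp directory would show whether the client and server pair supports locking at all.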
I did not. Thanks :)
This still means that you can't re-inject a new version of a file unless you have the old one if you are using a SHA* backend, but that might be a corner case anyway.
I wouldn't say it's completely impossible for a WORM100 to work. It would just have the contract that the pair of mtime+100chars has to be unique for each unique piece of data.
But, I have yet to be convinced there's any point, since SHA1 exists.
The dtrace puzzlingly does not have the same errors shown above, but a set of mostly new errors. I don't know what to make of that.
This seems to be caused by it setting the execute bit on the file. I don't know why that would fail; it's just written the file and renamed it into place so clearly should be able to write to it.
This also suggests something breaking with permissions.
All right, I see the same thing with linux 3.1.0. It seems this behavior has changed since linux 3.0.0. Mounting with shortname=lower avoids the problem.
I feel a good case could be made that this new behavior is a linux bug. Your example with touch particularly shows how weird it is.
Hmm.. is utimensat available at all?
I've committed an update that may convince at least some compilers to expose this newer POSIX stuff. I don't know if it will help, please let me know.
You convince me for unannex, but isn't the goal of uninit to revert all annex operations? In the current state, a clean revert is not possible (because of the broken symlinks after uninit). Instead of copying, is using hard links out of the question?
For my needs, is the command "git annex unlock ." (from the root of the repo) a correct workaround?
git annex whereis say about it? Is the content actually present in annex/objects/ on the bare repository? Does that contradict whereis?
Nice work on the bisection. It's obviously a compiler bug. Having two test cases that differ in only as trivial and innocuous a commit as 828a84ba3341d4b7a84292d8b9002a8095dd2382 might help a GHC developer track it down.
We should probably forward this as a GHC bug. I hope you can find a different version or build of GHC to build git-annex with.
Ah, that gave me a good clue, my system just got pretty confused with a mixture of quickcheck and testpack installs. Would it be possible to put up a list of versions of the software you are using on your development environment? (at least the minimum tested version)
I guess it shouldn't matter to most users who are going to rely on packagers to sort these dependency issues, but it's nice to know.
Anyway, the tests build now, and they seem to fail on my (rather messy) install of haskell platform + ghc 6.12 on osx 10.6.6.
I assumed that since the tests built, then running them shouldn't be a problem. It looks like some argument isn't being passed about for the location of the .t directory that gets created. I will check the dependencies on my system again.
if you go for the two-commits version, small intermediate branches (or git-commit-tree) could be used to create a tree like this:
while the first commit (436e46f) has a "/subdir/foo → ../.git-annex/where_foo_is", the intermediate (9395665) has "/subdir/deeper/foo → ../.git-annex/where_foo_is", and the final commit (106eef2) has "/subdir/deeper/foo → ../../.git-annex/where_foo_is".
--follow uses the intermediate commit to find the history, but the intermediate commit would neither show up in git log --first-parent nor affect git diff HEAD^.. & co. (there could still be confusion over git show, though).
I'm not sure how this happened, as far as I can see, and based on my testing, git annex upgrade does stage the location log files. OTOH, I vaguely remember needing to stage some of them when I was doing my own upgrades, but that was a while ago, and I don't remember the details.
Your upgrade seems to have gone ok from the file lists you sent, so you can just:
git add .git-annex; git commit
Git can follow the rename fine if the file is committed before git annex fix (you can git commit -n to see this), so making git-annex pre-commit generate a fixup commit before the staged commit would be one way. Or the other two ways I originally mentioned when writing down this minor issue. I like all those approaches better than .git-annex clutter.
It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.
Now, if we define the symlink's target relative to the git repo's root (eg. using the $GIT_DIR environment variable, which can be a relative or absolute path itself), this unfortunately results in an absolute symlink, which would -for obvious reasons- only be usable locally:
So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.
It is possible, using variable/variant symlinks, yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.
Thoughts on this?
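For reference, the way a relative symlink's target depends on the link's own location can be worked through with plain path arithmetic. This Python sketch uses posixpath only, so no real symlinks are created. The function names are hypothetical, and the paths are the ones from the subdir/foo example used elsewhere on this page.

```python
import posixpath


def relative_target(link_path: str, target_path: str) -> str:
    """Target string a relative symlink at link_path needs to reach
    target_path. Both paths are relative to the repository root."""
    return posixpath.relpath(target_path, start=posixpath.dirname(link_path))


def resolve(link_path: str, target: str) -> str:
    """Where a relative symlink points, given where the link itself lives."""
    link_dir = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(link_dir, target))


if __name__ == "__main__":
    target = ".git-annex/where_foo_is"
    print(relative_target("subdir/foo", target))         # ../.git-annex/where_foo_is
    print(relative_target("subdir/deeper/foo", target))  # ../../.git-annex/where_foo_is
    # Moving the link one level deeper without rewriting it breaks it:
    print(resolve("subdir/deeper/foo", "../.git-annex/where_foo_is"))
```

The last line resolves to subdir/.git-annex/where_foo_is, which does not exist, and that is the breakage git annex fix repairs after a move.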
Ok, well it looks like it isn't doing anything useful at all.
.git-annex/?? if you want to, then running git annex fsck --fast in each of your clones would regenerate the data using only the lower-case hash directories.
Surely this could be handled with an extra layer of indirection?
git-annex would ensure that every directory containing annexed data contains a new symlink .git-annex which points to $git_root/.git/annex. Then every symlink to an annexed object uses a relative symlink via this: .git-annex/objects/xx/yy/ZZZZZZZZZZ. Even though this symlink is relative, moving it to a different directory would not break anything: if the move destination directory already contained other annexed data, it would also already contain .git-annex so git-annex wouldn't need to do anything. And if it didn't, git-annex would simply create a new .git-annex symlink there.
These .git-annex symlinks could either be added to .gitignore, or manually/automatically checked in to the current branch - I'm not sure which would be best. There's also the option of using multiple levels of indirection:
I'm not sure whether this would bring any advantages. It might bring a performance hit due to the kernel having to traverse more symlinks, but without benchmarking it's difficult to say how much. I'd expect it only to be an issue with a large number of deep directory trees.
Yes, encrypting the symmetric key with users' regular gpg keys is the plan.
I don't think that encryption of content in a git annex remote makes much sense; the filenames obviously cannot be encrypted there. It's more likely that the same encryption would get used for a bup remote, or with the directory remote I threw in today.
ps or strace to see what it's doing.
Personally I'd rather have working rename detection but I agree it's not 100% ideal to be littering multiple directories like this, so perhaps you could make it optional, e.g. based on a git config setting?
Here are a few more considerations, some in defence of the approach, some against it:
.git-annex is hidden; CVS/ is not.
Unlike CVS/ and .svn/, it's only a symlink, not a directory containing other files.
.git-annex was moved within the repository: .git-annex in any subdirectory is always a symlink to ../.git-annex so instead you would need to check that all of the new ancestors contain this symlink too, and optionally remove any no longer needed symlinks.
find $git_root/foo -follow, diff -r etc. would traverse into $git_root/.git/annex
This last point is the only downside to this approach I can think of which gives me any noticeable cause for concern. However, people are already used to working around this from CVS and svn days, e.g. diff -r -x .svn so I don't think it's anywhere near bad enough to rule it out.
As my comment from work is stuck in moderation:
I ran this twice:
but nothing changed
'git add .git-annex' didn't do anything. That's when I noticed that this repository is on a case-insensitive HFS+ file system.
So, if I get this right it's not a new bug, but similar to this situation: git-annex directory hashing problems on osx
Assuming that it was the file system's fault, I went ahead and upgraded yet another clone. That one (on an ext3 file system) had neither staged changes nor left-over untracked files. Everything seems to just have fallen right into place. Is that possible or still weird?
Hmm. Old versions may have forgotten to git add a .git-annex location log file when recovering content with fsck. That could be another reason things are out of sync.
But I'm not clear on which repo is trying to copy files to which.
(NB: If the files were recovered on a bare git repo, fsck cannot update the location log there, which could also explain this.)
I've seen this kind of piping stall that is unblocked by strace before. It can vary with versions of GHC, so it would be good to know what version built git-annex (and on what OS version). I filed a bug report upstream before at http://bugs.debian.org/624389.
I really need a full strace -f from the top, or at least a complete strace -o log of git-annex from one hang through to another hang. The strace you pastebinned does not seem complete. If I can work out which specific git command is being written to when it hangs I can lift the writing out into a separate thread or process to fix it.
@pavel, you mentioned three defunct git processes, and then showed ps output for 3 git processes. Were there 6 git processes in total? And then when you ran it again you said there were no defunct gits -- were the other 3 git processes running once again?
As best I can make out from the (apparently) running git processes, it seems like the journal files for the upgrade had all been written, and the hang occurred when staging them all into the index in preparation for a commit. I have committed a change that lifts the code that does that write out into a new process, which, if I am guessing right on the limited info I have, will avoid the hang.
However, since I can't reproduce it, even when I put 200 thousand files in the journal and have git-annex process them, I can't be sure.
Yeah, I saw those google links myself, but couldn't see why the bitbucket/ssh would be relevant.
The strange thing is that I only get this message when running git-annex.
I also don't have a conq in my path so I don't know where it is running from.
Oh well, if I ever sort it out I'll post back here.
ok, pulling the latest master and building on OSX now does this...
changing the #if 0 to 1 gives this...
it seems that commit 6634b6a6b84a924f6f6059b5bea61f449d056eee has broken support for OSX.
Just did some minor digging around and checking, this seems to satisfy the compilers etc... I have yet to confirm that it really is working as expected. Also it might be better to check for a darwin operating system instead of apple I think, though I don't know of anyone really using a pure darwin OS. But for now it works (I think)
Completed git-bisect twice, getting roughly the same results:
contents of final refs/bisect:
"roughly" because second bisect gave two commits as a result, failing to build one of them (missing .o file on link, guess it's because of -j4 and bad deps in that version's build system):
Also noticed that the "git-annex-shell ..." command succeeds if run as the root user, while failing as an unprivileged one. There are no permission/access errors in "strace -f git-annex-shell ...", so I guess it could be some bug in GHC indeed.
JIC, logged a whole second bisect operation. Resulting log: http://fraggod.net/static/share/git-annex-bisect.log
Bisect script I've used (git-annex-shell dies with error code 134 - SIGABRT on GHC error):
hSetEncoding h localeEncoding to suitable places. Making things work properly with an arbitrary locale encoding would be more complicated.
I could dig it out, but I am sure I said dots are fine and a whirly better.
Still, WONTFIX is fine.
git annex describe only sets the description to avoid complication. Imagine using it in a script for example.
git annex status shows the description. It does not show the trust level because I have not thought of a visually pleasing and compact way to show it in the repository list there. Suggestions appreciated, since the same list is used by whereis, and showing trust levels there would be particularly useful.
I think the correct steps should be, make a backup first :) then ...
I eventually migrated all of my own annex'd repos and I no longer have the old hashed directories but the new ones in the form
I did lose some tracking information but not data (as far as I can see for now), but that was quickly fixed by pushing and pulling to my bare repo which tracks most of my data.
I also found that it worked a bit more reliably for me on the copies of repos that were located on case sensitive filesystems, but I guess that was expected.
git 1.7.4 does not make things better. With it, if I add first "X/foo" and then "x/bar", it commits "X/bar".
That will certainly cause problems when interoperating with a repo clone on a case-sensitive filesystem, since git-annex there will not see the location log that git committed to the wrong case directory.
It's possible there is some interoperability problem when pulling from linux like you did, onto HFS+, too. I am not quite sure. Ah, I did find one.. if I clone the repo with "X/foo" in it to a case-sensitive filesystem, and add a "x/foo" there, and pull that commit back to HFS+, git says:
Aha -- that lets me reproduce your problem with the same file being staged twice with different capitalizations, too:
And modified files that git refuses to commit, which entirely explains git-annex has issues with git when staging/committing logs.
I think git is, frankly, buggy. It seems I will need to work around this by stopping using mixed case hashing for location logs.
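To check whether a repository is exposed to this, one option is to look for paths whose components differ only by letter case. The following is a rough Python sketch; case_collisions is a hypothetical helper, and Python's str.casefold is only an approximation of the case folding that HFS+ or git's core.ignorecase actually perform.

```python
from collections import defaultdict

def case_collisions(paths):
    """Group path prefixes that differ only by letter case.

    Every prefix is checked, so "X/foo" and "x/bar" are reported
    because the directories "X" and "x" collide.
    """
    groups = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

The input could be the list of paths printed by git ls-files, or by git ls-tree -r --name-only for the branch holding the location logs.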
The code is:
The error message from the compiler, followed by the above error message does not seem "silent". It does exit 0 without running the test suite if it cannot be built.
You might try mounting your NAS with the mount option local_lock=all. This will keep the lock files on your (I assume linux) client. If you do this make sure you don't have another client using git-annex in the same NFS directory.
I think I have figured out why
It goes back to this piece of code (in test.hs)
It seems that on OSX it does not preserve the symbolic link information; basically cp is not gnu cp on OSX. Doing a "cp -a SOURCE DEST" seems to do the right thing on OSX. I tried it out on my archlinux workstation by replacing -pr with just -a and all the tests passed on archlinux.
I'm not sure what the implications of changing the cp command in the test would be.
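For comparison, here is what "copy a tree and keep the symlinks as symlinks" looks like when it does not depend on which cp is installed. This is a Python sketch using the standard shutil module, not what test.hs does; copy_preserving_links is a hypothetical name.

```python
import shutil

def copy_preserving_links(src, dst):
    """Recursively copy SRC to a new directory DST, keeping symlinks as symlinks."""
    # symlinks=True recreates each link instead of copying the file it points to,
    # so dangling links (annexed files whose content is absent) survive too.
    # Regular files go through shutil.copy2, which also tries to keep mtimes
    # and permission bits.
    return shutil.copytree(src, dst, symlinks=True)
```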
Haven't given these any serious thought (which will become apparent in a moment) but hoping they will give birth to some less retarded ideas:
Bait'n'switch
In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.
To prevent git from reporting ALL annexed files as unstaged changes after running post-commit hook, their paths would need to be added to .gitignore.
This wouldn't cause any issues when adding files, very little when modifying files (would need some alterations to "git annex unlock"), BUT would make git totally oblivious to removals...
Manifest-based (re)population
... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.
Wide open to suggestions, criticism, mocking laughter and finger-pointing :)
I doubt that git-annex can be used with QuickCheck 1.2.0. The QuickCheck I've tested it with is 2.1.0.3 actually.
I suspect you have an old version of the TestPack haskell library on your system, that is linked against QuickCheck 1.2.0. Git-annex has been tested with TestPack 2.0.0, which uses QuickCheck 2.x.
In any case, you don't have to run 'make test' to build git-annex, and my comments above should make the main program compile, I expect.
Possible solutions:
This:
or this:
or this:
If you want to reformat this output, putting 'here', 'origin', etc into fixed formatting might make sense, as well. -- Richard
After mulling this over, I think actually encrypting the filenames is preferable.
Did you consider encrypting the symmetric key with an asymmetric one? That's what TrueCrypt etc are using to allow different people access to a shared volume. This has the added benefit that you could, potentially, add new keys for data that new people should have access to while making access to old data impossible. Or keys per subdirectory, or, or, or.
As an aside, could the same mechanism be extended to transparently encrypt data for a remote annex repo? A friend of mine is interested to host his data with me, but he wants to encrypt his data for obvious reasons.
I'm using git-annex to keep my music in sync between all of my different machines. What I'd love to be able to do is to also keep it in sync with my iRiver player. Unfortunately, the firmware, Rockbox, doesn't support ext3, so I'm stuck with a FAT filesystem.
I can see how the design of git-annex makes it rather difficult to get rid of the symlinks, so how about taking a different approach: something like a "git annex export DEST" which would take a destination (not a git remote) and rsync the content over to there as regular files.
Maybe "git annex sync DEST" or "git annex rsync DEST" would be better names if we want to convey the idea that the destination will be made to look like the source repo, including performing the necessary deletions.
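Roughly what such an export could do, as a Python sketch and not an existing git-annex command: walk the working tree, follow each symlink, and write regular files to the destination. export_tree is a hypothetical name; the sketch leaves out the deletions mentioned above and does not descend into symlinked directories.

```python
import os
import shutil

def export_tree(repo, dest):
    """Copy the working tree of REPO to DEST as regular files.

    Returns the relative paths that were skipped because their annexed
    content is not present locally (dangling symlinks).
    """
    skipped = []
    for root, dirs, files in os.walk(repo):
        dirs[:] = [d for d in dirs if d != ".git"]  # never export git metadata
        rel = os.path.relpath(root, repo)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            if not os.path.exists(src):  # follows symlinks; False when dangling
                skipped.append(os.path.normpath(os.path.join(rel, name)))
                continue
            # copyfile follows symlinks, so DEST gets the content, not a link.
            shutil.copyfile(src, os.path.join(target_dir, name))
    return skipped
```

Since the destination ends up with plain files only, this would also work on a FAT filesystem, which has no symlinks.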
I followed this to re-inject files which git annex fsck listed as missing.
For every one of those files, I get
when trying to copy the files to the remote.
-- Richard
AFAICs, you probably just have a "conq" program that is running in the background and emitted this error.
The error message is not part of git-annex; it does not run any "conq" thing itself. Although you could try passing the --debug parameter to check the commands it does run to see if one of them somehow causes this conq thing.
It may be possible that OSX has some low resource limits for user processes (266 per user I think); doing a
seems to change the behaviour of the tests a bit...
the number of failures varies as I change the value of maxprocs. I think I have narrowed it down to OSX just being stupid with limits, thus causing the tests to fail.
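One way to see the per-user process limit a test run actually inherits is to ask from inside a process. A small Python sketch using the standard resource module (Unix only); process_limit and describe are hypothetical names, and this only reads the limit, it does not change it.

```python
import resource

def process_limit():
    """Return the (soft, hard) limit on the number of processes for this user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)

def describe(limit):
    """Render a limit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

if __name__ == "__main__":
    soft, hard = process_limit()
    print("max user processes: soft=%s hard=%s" % (describe(soft), describe(hard)))
```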
When I reproduce this, the file is not gone, it's been moved under .git/annex/objects. There is no way an add can delete a file, since all it does is rename it. It would be good for it to error unwind and move the file back though.
Seems you built it using make .. could you try instead building with cabal, ie run cabal install git-annex or cabal build in the source tree. I think cabal will probably do the right thing.
I could fix the Makefile, I suppose. What does this say: ghc -e 'print System.Info.arch'
Alright, I have created a case-insensitive HFS+ filesystem here on my linux laptop.
I have not been able to trick git into staging the same file with 2 different capitalizations yet.
It might be helpful if you can send me a copy of a git repository where 'git add -i' shows the same file staged with two capitalizations. Leaving out .git/annex of course. (joey@kitenet.net; a tarball would probably work)
It seems that git add only started properly working on case-insensitive filesystems quite recently. The commit in question is 5e738ae820ec53c45895b029baa3a1f63e654b1b, "Support case folding for git add when core.ignorecase=true", which was first released in git 1.7.4, January 30, 2011. If you don't yet have that version, that could explain the problem entirely. In about half an hour (dialup!) I will have downloaded an older git and will see if I can reproduce the problem with it.
I'm running ghc 6.12.3 with the corresponding haskell-platform package from the HP site, which I installed in preference to the macports version of haskell-platform (it's quite old). It seems that when you install quickcheck, the version that is installed is 2.4.0.1 and not 1.2.0, which git-annex depends on for its tests.
it fails with this
I'd imagine if I could downgrade, it would compile and pass the tests (I hope)
git annex unlock; modify; git-annex lock
Ah, I see, I was not thinking about the location log update that's done on the remote side.
For transfers over ssh, that's a separate git-annex-shell invoked per change. For local-local transfers, it's all done in a single process but it spins up a state to handle the remote and then immediately shuts it down, also generating a commit.
In either case, I think there is a nice fix. Since git-annex does have a journal nowadays, and goes to all the bother to support recovery if a process was interrupted and journalled changes that did not get committed, there's really no reason in either of these cases for the remote end to do anything more than journal the change. The next time git-annex is actually run on the remote, and needs to look up location information, it will merge the journalled changes into the branch, in a single commit.
My only real concern is that some remotes might never have git-annex run in them directly, and would just continue to accumulate journal files forever. Although due to the way the journal is structured, it can have, at a maximum, the number of files in the git-annex branch. However, the number of files in it is expected to be relatively small and it might get a trifle inefficient, as it lacks directory hashing. These performance problems could certainly be dealt with if they do turn out to be a problem.
That sounds just fine, but indeed my use case was a bare backup/transfer repository that is meant to always be only at the remote end of git-annex operations. So why not as well do a single commit after everything has been copied and journaled? That's what's done at the other end too, after all. Or, if commits are to be minimized, just stage the journal into the index before finishing, but don't commit it yet?
(I would actually prefer this mode of usage for other git-annex operations, too. In git you can add stuff little by little and commit them all in one go. In git-annex the add immediately creates a commit, which is unexpected and a bit annoying.)
If you install the monads-fd package (with cabal install for instance), then you can no longer build git-annex:
I'm leaving this bug open because this feature, however minor, is not available on OSX and BSD.
I have added a partial implementation using lutimes(3), which should be available on the BSDs. However, it's ifdefed out due to a casting problem: The TimeSpec uses a CTime, while lutimes uses a CLong. These data types may be internally the same on some or all platforms, so if you want this feature you can try changing the "ifdef 0" in Touch.hsc to 1 and try it, see if "git annex add" mirrors file modification time in created symlinks, and let me know.
@seqq git-annex always uses the same case when creating and accessing the files pointed to by the symlinks. So it will not matter if it's used on a case-insensitive, or case-insensitive but preserving system like OSX.
You need to fix up the cases of the files in .git/annex/objects to what it expects. I'm not sure what would be the best way to do that. The method described in recover data from lost+found might work well.
Keep in mind that lots of small files may have significant overhead, so a warning that it's not possible to make sure there's enough space would make sense for certain corner cases. Actually finding out the exact overhead is beyond git-annex' scope and, given transparent compression etc, ability, but a warning, optionally with a "do you want to continue" prompt can't hurt.
-- RichiH
It doesn't need to be installed into the system PATH; just the user PATH. Which you should be able to control.
Exactly how to do this surely varies, but here I have a ~/.bashrc containing PATH=$HOME/bin:$PATH; export PATH and I keep git-annex-shell in bin and it's available to eg "ssh mybox git-annex-shell"
Yes, makes sense. I am so used to using --fast, I forgot a non-fast mode existed. I still think it would be a good idea to fall back to non-fast mode if --fast runs into an error from the remote, but as that is well beyond my abilities how about this patch?
I've also seen this apparent hang during upgrade to v3. A few more details:
The annex in question has just under 18k files (and hence that many log files), which can slow down directory operations when they're all in the same place (like, for example, .git/annex/journal).
git-annex uses virtually no CPU time and disk IO when it's hanging like this; the first time it happened, 'ps' showed three defunct git processes, with two "git-annex" processes and three "git" procs:
I Ctrl+C'd that and tried again, but it hung again -- this time without the defunct gits.
An strace of the process and its children at the time of hang can be found at http://pastebin.com/4kNh4zEJ . It showed somewhat weird behaviour: When I attached with strace, it would scroll through a whole bunch of syscalls making up the open-fstat-read-close-write loop on .git/annex/journal files, but then would block on a write (sorry, don't have that in my scrollback any more so can't give more details) until I Ctrl+C'd strace; when attaching again, it would again scroll through the syscalls for a second or so and then hang with no output.
Ultimately I detached/reattached with strace about two dozen times and that caused it (?) to finish the upgrade; not really sure how to explain it, but it seems like too much of a timing coincidence.
I use Debian Squeeze, I have the Debian package cabal-install 0.8.0-1 installed.
This installed: Cabal-1.10.2.0, zlib-0.5.3.1, cabal-install 0.10.2. No version of monad-control or monadIO installed.
After I added a dependency for monadIO to the git-annex.cabal file, it installed correctly.
-- Thomas
I checked out the git-annex branch and using
I found a file
The corresponding file also existed in the master branch (as a link).
I moved both these files to a folder outside my repository and synced my git-annex branch with my master server. I still get the same error. Is there any other place where information about this file is stored?
Yes, this is a known problem with kqueue, it has to keep every directory in the tree open. On inotify I have a note that it may need to fork off extra watcher processes to deal with this. Of course that adds significant complication.
In the meantime, you may be able to increase your system's maximum allowed number of open files per process somehow.
(I doubt that the ssh-agent is related; git-annex does not use ssh-agent directly anyway..)
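As a rough illustration of raising the open-files limit, here is a Python sketch using the standard resource module (Unix only). raise_open_file_limit is a hypothetical helper; it only affects the current process and whatever it starts afterwards, so the watcher would have to be launched from it (or from a shell after ulimit -n). How high the OS lets the limit go is system-specific, hence the fallback.

```python
import resource

def raise_open_file_limit(wanted):
    """Try to raise the soft open-files limit to WANTED; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    # Never ask for more than the hard limit allows.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target <= soft:
        return soft  # never lower an existing limit
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused; keep what we had
    return target
```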
I don't think this has to do with the path name of the repository containing utf-8 at all.
Your recipe for reproducing this depends on some pre-existing repository that I don't know how to set up to reproduce this bug. All I can guess is that, based on the "decodeUtf8" in the error message, it's coming from the one part of the code that still uses that, the union merger.
This is what happens when I add the debug parameter
git annex unused --debug
unused . (checking for unused data...)
git ["--git-dir=/home/kristian/AnnexMedia/.git","--work-tree=/home/kristian/AnnexMedia","ls-files","--cached","-z","--","/home/kristian/AnnexMedia"]
git ["--git-dir=/home/kristian/AnnexMedia/.git","--work-tree=/home/kristian/AnnexMedia","show-ref"]
(checking master...)
git ["--git-dir=/home/kristian/AnnexMedia/.git","--work-tree=/home/kristian/AnnexMedia","ls-tree","--full-tree","-z","-r","--","refs/heads/master"]
git ["--git-dir=/home/kristian/AnnexMedia/.git","--work-tree=/home/kristian/AnnexMedia","cat-file","--batch"]
git-annex: Cannot decode byte '\xb4': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream
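To find which entry carries the offending byte, the NUL-separated output of the ls-tree command shown in the debug log can be scanned for records that are not valid UTF-8. A Python sketch follows; undecodable_names and lenient are hypothetical helpers, and this only illustrates the failure, it is not how git-annex decodes its input.

```python
def undecodable_names(raw):
    """Return the NUL-separated records of RAW (bytes) that are not valid UTF-8."""
    bad = []
    for record in raw.split(b"\0"):
        try:
            record.decode("utf-8")  # strict decoding, like the failing decodeUtf8
        except UnicodeDecodeError:
            bad.append(record)
    return bad

def lenient(record):
    """Decode without failing; invalid bytes are kept as lone surrogate escapes."""
    return record.decode("utf-8", errors="surrogateescape")
```

A lone \xb4 byte is the acute accent in Latin-1 but is not valid on its own in UTF-8, which is why strict decoding stops there while a lenient decoder can carry the byte through unchanged.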