git-annex's high-level design is mostly inherent in the data that it stores in git, and alongside git. See internals for details.
See encryption for design of encryption elements.
See watch and branch for design of those components.

Assuming you're storing your encrypted annex with me and I with you, our regular cron jobs to verify all data will catch corruption in each other's annexes.
Checksums of the encrypted objects could be optional, mitigating any potential attack scenarios.
It's not only about the cost of setting up new remotes. It would also be a way to keep data in one annex while making it accessible only in a subset of them. For example, I might need some private letters at work, but I don't want my work machine to be able to access them all.
@Richard the easy way to deal with that scenario is to set up a remote that work can access, and put in it only the files work should be able to see. Needing to specify which key a file should be encrypted to when putting it in a remote that supported multiple keys would add another level of complexity, which that approach avoids.
Of course, the right approach is probably to have a separate repository for work. If you don't trust it with seeing file contents, you probably also don't trust it with the contents of your git repository.
I always appreciate your OSX work Jimmy...
Could it be put into macports?
New encryption keys could be used for different directories/files/patterns/times/whatever. One could then encrypt this new key for the public keys of other people/machines and push them out along with the actual data. This would allow some level of access restriction or future revocation. git-annex would need to keep track of which files can be decrypted with which keys. I am undecided if that information needs to be encrypted or not.
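To sketch the bookkeeping this implies, in Haskell (these types are entirely hypothetical, not git-annex's actual internals):

```haskell
import qualified Data.Map as M

newtype KeyName  = KeyName String  deriving (Eq, Ord, Show)
newtype GpgKeyId = GpgKeyId String deriving (Eq, Show)

-- One symmetric key: wrapped (encrypted) to each pubkey allowed to use it,
-- plus the directories/files/patterns it covers.
data SymmetricKey = SymmetricKey
  { wrappedFor :: [GpgKeyId]
  , covers     :: [FilePath]
  } deriving Show

-- The map git-annex would need to track: which named key decrypts what.
type KeyRing = M.Map KeyName SymmetricKey
```

Revoking access for one machine would then mean generating a fresh key for the affected paths and re-wrapping it for the remaining pubkeys.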
Encrypted object files should be checksummed in encrypted form so that it's possible to verify integrity without knowing any keys. Same goes for encrypted keys, etc.
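A minimal sketch of such keyless verification, assuming the cryptonite library (the function name is made up):

```haskell
import Crypto.Hash (Digest, SHA256, hashlazy)
import qualified Data.ByteString.Lazy as L

-- Checksum the *encrypted* object, so any holder of the remote's data
-- can verify integrity without possessing a decryption key.
checksumEncrypted :: FilePath -> IO (Digest SHA256)
checksumEncrypted f = hashlazy <$> L.readFile f
```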
Chunking files in this context seems like needless overkill. It might make sense for storing a DVD image on CDs or similar, at some point, but not for encryption, imo. Coming up with sane chunk sizes for all use cases is all but impossible and, as you pointed out, correlation by the remote admin is trivial.
In relation to MacPorts, I often found that the Haskell ports in MacPorts are behind other distros, and I'm not willing to put much effort into maintaining or updating those ports. I found that installing MacPorts manually and then installing the Haskell Platform from upstream is the best way to get the most up-to-date dependencies for git-annex.
FYI, in MacPorts GHC is at version 6.10.4 and the Haskell Platform is at version 2009.2, so there are a significant number of ports to update.
I was thinking about this a bit more, and I reckon it might be easier to try to build a self-contained .pkg with all the needed binaries in a .app style package; that would work well when the webapp comes along. I will take a look at it in a week or two (currently moving house, so I don't have much time).
I see no use case for verifying encrypted object files w/o access to the encryption key. And I do see possible use cases for not allowing anyone to verify your data.
If there are to be multiple encryption keys usable within a single encrypted remote, then they would need to be given some kind of name (since a symmetric key is used, there is no pubkey to provide a name), and the name encoded in the files stored in the remote. While certainly doable, I'm not sold that adding a layer of indirection is worthwhile. It only seems worthwhile if setting up a new encrypted remote were expensive to do. Perhaps that could be the case for some type of remote other than S3 buckets.
For the unfamiliar, it's hard to tell if a command like that would persist. I'd suggest being as clear as possible, e.g.:
It's not much for now... but see http://www.sgenomics.org/~jtang/gitbuilder-git-annex-x00-x86_64-apple-darwin10.8.0/ I'm ignoring the debian-stable and pristine-tar branches for now, as I am just building and testing on osx 10.7.
Hope the autobuilder will help you develop the OSX side of things without having direct access to an osx machine! I will try and get gitbuilder to spit out appropriately named tarballs of the compiled binaries in a few days when I have more time.
A complete fsck is good, but once a week is probably enough.
But please see if you can make fsck optional depending on if the machine is running on battery.
Hey Joey!
I'm not very tech savvy, but here is my question. I think all cloud service providers place an upload limit on how big one file may be. For example, I can't upload a file bigger than 100 MB to box.net. Does this affect git-annex at all? Will git-annex automatically split the file depending on the cloud provider, or will I have to create small RAR archives of one large file and upload those?
Thanks! James
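For illustration, client-side chunking to stay under a provider's limit could look like this Haskell sketch (splitFile and its naming scheme are made up, not git-annex behaviour):

```haskell
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as L
import Text.Printf (printf)

-- Split a file into numbered chunks no larger than chunkSize bytes,
-- e.g. splitFile (100 * 1024 * 1024) "big.iso" for a 100 MB limit.
splitFile :: Int64 -> FilePath -> IO ()
splitFile chunkSize f = L.readFile f >>= go (0 :: Int)
  where
    go _ b | L.null b = return ()
    go n b = do
      let (chunk, rest) = L.splitAt chunkSize b
      L.writeFile (printf "%s.%03d" f n) chunk
      go (n + 1) rest
```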
Wasn't there some filesystem functionality that could tell you the number of open file handles on a certain file? I thought this was tracked per file too. Or maybe I'm just confusing it with the number of hard links (which stat can tell you); anyway, something to look into.
hfsevents seems usable; git-annex does not need to watch for file changes on remotes on other media.
But, trying kqueue first.
You could perhaps run the autobuilder on a per-commit basis.
Corner case, but if the other program finishes writing while you are annexing and your check shows no open files, you are left with a bad checksum on a correct file. This "broken" file will propagate, and the next round of fsck will show that all copies are "bad".
Without verifying if this is viable, could you set the file RO and thus block future writes before starting to annex?
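Something like this rough Haskell sketch is what I have in mind (makeReadOnly is a made-up name; it assumes a POSIX filesystem that honours the mode bits):

```haskell
import Data.Bits (complement)
import System.Posix.Files
  ( fileMode, getFileStatus, groupWriteMode, intersectFileModes
  , otherWriteMode, ownerWriteMode, setFileMode, unionFileModes )

-- Drop all write bits before checksumming, so no writer can slip in
-- mid-annex. (The owner can still restore the bits and bypass this.)
makeReadOnly :: FilePath -> IO ()
makeReadOnly f = do
  st <- getFileStatus f
  let writeBits = ownerWriteMode `unionFileModes`
                  groupWriteMode `unionFileModes`
                  otherWriteMode
  setFileMode f (fileMode st `intersectFileModes` complement writeBits)
```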
@wichert All this inotify stuff is entirely Linux specific AFAIK anyway, so it's fine for workarounds to limitations in inotify functionality to also be Linux specific.
@dieter I think you're thinking of hard links, filesystems don't track number of open file handles afaik.
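For reference, the hard link count is what stat exposes; reading it from Haskell via the unix package:

```haskell
import System.Posix.Files (getFileStatus, linkCount)
import System.Posix.Types (LinkCount)

-- stat(2) tracks how many directory entries point at the inode,
-- not how many processes have it open.
hardLinkCount :: FilePath -> IO LinkCount
hardLinkCount f = linkCount <$> getFileStatus f
```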
@Jimmy, I'm planning to get watch going on freebsd (and hopefully that will also cover OSX), after merging it :)
@Richard, the file is set RO while it's being annexed, so any lsof would come after that point.
Maybe at some point, your tool could show "warning, the following files are still open and are hence not being annexed" to avoid the nasty surprise of a file not being annexed without the user realizing it.
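A rough sketch of such a check, shelling out to lsof (stillOpen is a hypothetical helper; lsof -t prints matching PIDs and exits nonzero when no process has the file open):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, catch)
import System.Process (readProcess)

-- True if some process still holds the file open; readProcess throws
-- on lsof's nonzero exit, which we treat as "nothing has it open".
stillOpen :: FilePath -> IO Bool
stillOpen f =
  (not . null <$> readProcess "lsof" ["-t", f] "")
    `catch` \(_ :: SomeException) -> return False
```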
Homebrew is a much better package manager than MacPorts IMO.