Chris's Wiki :: blog/tech/ModernAgeGit Commentshttps://utcc.utoronto.ca/~cks/space/blog/tech/ModernAgeGit?atomcommentsDWiki2009-04-07T10:02:56ZRecent comments in Chris's Wiki :: blog/tech/ModernAgeGit.From 83.8.221.71 on /blog/tech/ModernAgeGittag:CSpace:blog/tech/ModernAgeGit:f8a38fec3e953fba95d1598be0a9c3553efeb4b1From 83.8.221.71<div class="wikitext"><p>First, even though git originally, at the beginning, didn't use any form of delta encoding to reduce disk space (disk is cheap), it still used compression, and it stored only unique different versions of contents of a file (thanks to its content hash addressed design borrowed from Monotone). This storage format is called <strong>loose format</strong>.</p>
<p>Second, quite soon after it acquired <strong>packed format</strong>, which uses <em>delta encoding</em>. It uses the same format for network (bandwidth is not as large and as cheap as disk space) as for disk. You should repack your repository from time to time (it nowadays, with modren git, should usually happen automatically) mainly not because it reduces disk space, but because it reduces access time (and wasted disk space and inodes). BTW. delta encoding used by git is XDiffLib binary delta encoding...</p>
<p>Still, I agree that "disk is cheap" mentality allowed for very clear git repository model, without need to build it around deltaification scheme (or equivalent, like weave storage model).</p>
<p>--Jakub Narębski</p>
</div>2009-04-07T10:02:56ZBy Chris Siebenmann on /blog/tech/ModernAgeGittag:CSpace:blog/tech/ModernAgeGit:ce1ac70efa645a0a3a1e4fb5bf0a0e0850e9b6f3Chris Siebenmann<div class="wikitext"><p>As you mention, the difference between pre-Unix filesystems with versioning
and git is that the filesystem versioning wasn't used to keep a full history
of your files; one frequently pruned old versions (either automatically or
manually). If you wanted a full history you had to use some sort of source
code management system, and I expect that they used delta formats precisely
to save precious space.</p>
<p>(I suspect that file versioning on pre-Unix filesystems was mostly designed
for and used for quick and easy recoveries from various sorts of accidents.)</p>
</div>2009-03-26T18:06:40ZFrom 91.22.168.242 on /blog/tech/ModernAgeGittag:CSpace:blog/tech/ModernAgeGit:f6453a9b42c30b6900d6a685b3b1a270749ba352From 91.22.168.242<div class="wikitext"><p>I disagree, there were many pre-UNIX filesystems that had file versioning (VMS and the ITS file system come to mind), I don't think these used deltas internally, and they were effectively used for version control, but on a file level. (You had to prune them often, of course.)</p>
<p>What Git changed is that it does content-indexing, which probably would have been too expensive back in the days. Then, Plan9's Venti pretty much is the same.</p>
<p>Still, Git is such an easy concept one wonders why nobody came up with it earlier (I think the packed file algorithm is pretty "recent", but its not like algorithms of this kind didn't exist before).</p>
<p>--chris2</p>
</div>2009-03-26T17:30:53ZBy Chris Siebenmann on /blog/tech/ModernAgeGittag:CSpace:blog/tech/ModernAgeGit:f6b65ec8dbf17b2f056b94edb0d6b130ac654c6bChris Siebenmann<div class="wikitext"><p>Current versions of git have an optional delta-based 'pack' format for
storage, but the core git storage format is one Unix file per repository
object (see for example <a href="http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#pack-files">here</a>).
This is part of the reason that git can be crazy fast at operations like
commits, because they require just a few file copies (and of things that
are probably hot in caches, too).</p>
</div>2009-03-26T16:06:10ZFrom 62.140.137.62 on /blog/tech/ModernAgeGittag:CSpace:blog/tech/ModernAgeGit:d9219d4a8875cb8cc6e216be2d21a3204b94d658From 62.140.137.62<div class="wikitext"><p>Huh? I haven't used git much, but I'm a hg developer, and we surely don't store full copies on every change. I strongly doubt that git does, either. In Mercurial, we store deltas from a base revision, right until the delta becomes almost-as-large as (a new) full copy. Since git does content-based versioning, I expect that it actually only stores the changed content, and some hash structure that tells it how to reconstruct the changed file from that.</p>
<p>Dirkjan Ochtman</p>
</div>2009-03-26T09:40:55Z