Wandering Thoughts archives

2012-05-28

Yes, git add makes a difference (no matter what people think)

One of the things said about git is that it's less user friendly and takes longer to learn than Mercurial; the first exhibit for this difference is usually git add and by extension git's index. Unfortunately, a common reaction among git fans to both the general issue and git add in particular is a kind of defensive denial, where they hold forth that it's not that difficult and people learn it fine and really, git is user friendly.

You may already have gotten an idea of my views on this. I'm here to tell you, from a mostly outsider perspective, that git add really does make a difference in initial user friendliness, one that makes Mercurial easier to pick up and use for straightforward uses.

(I've used git to a certain extent, for example for my Github stuff, but I am not up to the experienced user level. I'm not really at that level with Mercurial either, partly because I haven't needed to be and partly because I'd rather learn git; Mercurial is easier but I like git more.)

Before people freak out too much, let me be explicit: all of this is about initial user friendliness, the ease of doing straightforward things and picking up the system. In the long run I think that the git index is the right design decision (for a programmer focused VCS) because it creates an explicit model for doing a number of important but tricky things, a model that can be manipulated and inspected and reasoned about, and once you learn git and use it regularly dealing with the index becomes second nature. But people generally do not defend the index in these terms; instead, they try to maintain with a straight face that it's no real problem for people even at the start.

(If you think that the index does not cause problems for git beginners, I would gently suggest that you trawl through some places where they ask questions.)

The usability problem with git add is not just the need for git add itself as an extra step, it is that the existence of the index has additional consequences that ripple through to using other bits of git. For example, let us take the case of the disappearing diff:

; git diff a
[...]
-hi there
+hi there, jim
; git add a
; git diff
;

If you already know git you know what's going on here (and you're going to reach for 'git diff --cached'). If you're learning git, well, your change just disappeared. Of course this happens the other way around too; 'git diff' shows you nice diffs, then you do 'git commit' and it tells you nothing to commit. Wait, what? The diffs are right there.

(There are worse bear traps in the woods for beginners, too, like doing a 'git add' and then further editing the file. Here 'git diff' will show you a diff but it is not what will be committed.)
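To make the three views concrete, here is a minimal sketch in a throwaway repository. It assumes git is installed; the temporary directory, the placeholder identity, and the variable names are made up for illustration. It stages one edit, keeps editing, and then captures what each form of git diff reports.

```shell
#!/bin/sh
# Throwaway repository; nothing here touches your real work.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"

echo "hi there" > a
git add a
git commit -q -m "initial"

echo "hi there, jim" > a            # first edit
git add a                           # ... which gets staged
echo "hi there, jim and bob" > a    # second edit, made after the 'git add'

# Keep only the changed lines, dropping the '---' and '+++' file headers.
changes() { grep '^[-+][^-+]' || true; }

unstaged=$(git diff --no-color | changes)           # index vs working tree
staged=$(git diff --no-color --cached | changes)    # HEAD vs index
total=$(git diff --no-color HEAD | changes)         # HEAD vs working tree

printf 'git diff:\n%s\n\n' "$unstaged"
printf 'git diff --cached:\n%s\n\n' "$staged"
printf 'git diff HEAD:\n%s\n' "$total"
```

Here the 'git diff --cached' output is what a plain 'git commit' would record, the 'git diff HEAD' output is what 'git commit -a' would record, and bare 'git diff' shows only the second edit.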

All of this is a cognitive burden. When you use git, you have to learn and remember the existence of the index and how this affects what you do, and you probably need to take extra steps or pay extra attention to what 'git commit' and so on tell you. This cognitive burden is real, although it can be (and will be) overcome with familiarity and what it enables has important benefits. It is a mistake and a lie to try to pretend otherwise. Honesty in git advocacy is to say straightforwardly that the index is worth it in the end (possibly unless you have simple work patterns).

(A system where the index or its equivalent is an advanced feature, one not exposed by default, really does have a simpler initial workflow. If it's designed competently (and Mercurial is), everything 'just works' the way you expect; hg commit commits what hg diff shows you and so on. In real life this makes a difference to people's initial acceptance of a new VCS, especially if the simple workflow is adequate for almost everything you'll ever do with the system. This is not true of the sort of advanced VCS use that programmers can practice routinely, but it can be of other VCS uses.)

Sidebar: the problem with 'git commit -a'

At this point some people may come out of the woodwork to tell me about git commit -a, or even about creating an alias like 'git checkin' that always forces -a. There are two pragmatic problems with this.

First, the index still exists even if you're trying to pretend otherwise. This means that you can accidentally use the index; you can run git add because something said to, or you can run straight git commit, and so on. All of these will create confusion and cause git to do what (to you) looks like the wrong thing.

(In fact you have to run git add every so often, to add new files.)

Second, it is not at all obvious from simply reading documentation that using git commit -a is a fully reliable way of transmuting git into Mercurial. Maybe it is, maybe it isn't, but as a beginner you don't know (not without doing more research than I myself have done). Because many git operations are fundamentally built around the existence of the index, the safest assumption to make is that the index really does matter and git commit -a is probably an incomplete workaround.

(For example, at the point where you do git add to add a new file you'll become familiar with git diff HEAD in order to get the true diffs for what will be committed when you run git commit -a, which I hope illustrates my point adequately. And maybe there's a better command for doing that, which also illustrates my point because git diff HEAD is what I came up with as a relative git novice.)
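As a small illustration of the new-file case, the following sketch (again in a throwaway repository, with made-up file names and a placeholder identity) modifies a tracked file, creates a file that is never passed to git add, and then runs git commit -a. It assumes only that git is installed.

```shell
#!/bin/sh
# Throwaway repository to show what 'git commit -a' does and does not pick up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"

echo "one" > tracked
git add tracked
git commit -q -m "initial"

echo "two" >> tracked        # modify a file git already tracks
echo "new" > untracked       # create a new file, never 'git add'ed
git commit -q -a -m "commit everything?"

# Files recorded in the commit we just made.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
# Whatever git still considers uncommitted.
leftover=$(git status --porcelain)

echo "committed: $committed"
echo "leftover:  $leftover"
```

The commit contains only the tracked file; the new file is still sitting there as untracked, which is why git add cannot be avoided entirely.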

tech/GitAddMatters written at 23:34:04

How to do a very cautious LVM storage migration

A while back I wrote about how I was tempted by LVM mirroring when I wanted to migrate my LVM setup from a RAID mirror on some old disks to a new RAID mirror on some new(er) disks. Because I am some peculiar combination of cautious and daring, I gave in to this temptation recently. Now that the migration has more or less finished, it's time I reported on how it went and how to do this.

The short summary is that using LVM mirroring to migrate my LVM volume group from disk to disk worked without problems, but the next time I need to do this I will probably just use pvmove, because establishing the actual mirrors was achingly slow and the whole process was kind of a tedious pain in the rear. I don't know if pvmove would be faster, but I can hope.

(The mirrors seemed to perform decently once they were synchronized. But initial synchronization of about 250 GB of data took literally days and it was not disk speed limited; LVM never drove the disks at full bandwidth or full IOPS rates.)

There are two advantages of using LVM mirroring instead of pvmove and I used both of them. First, you can run for a while on both the new storage and the old storage at the same time, to build up confidence in the new storage. Second, you can preserve a complete and usable copy of all of your data on the old storage, a copy that you can inspect, mount, and so on if you wind up having to. With pvmove, your data just moves; you wind up only on the new storage and there's nothing left on the old storage.
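For comparison, the pvmove route would look roughly like the following. This is an untested sketch built from the standard LVM commands, not something that was run for this migration, and it only prints the plan instead of executing it. The names /dev/OLD, /dev/NEW and vg0 are the same placeholders the walkthrough below uses; the OLD, NEW and VG variables and the pvmove_plan function are made up for the example.

```shell
#!/bin/sh
# Print (do not run) the commands for a straight pvmove migration.
# OLD, NEW and VG are placeholders; override them in the environment.
OLD=${OLD:-/dev/OLD}
NEW=${NEW:-/dev/NEW}
VG=${VG:-vg0}

pvmove_plan() {
    cat <<EOF
pvcreate $NEW
vgextend $VG $NEW
pvmove $OLD $NEW
vgreduce $VG $OLD
pvremove $OLD
EOF
}

pvmove_plan
```

Note what is missing compared to mirroring: once pvmove finishes and the old physical volume is removed from the volume group, there is no second copy of the data left on the old storage.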

I read a number of writeups of how to do LVM mirroring on the web, but I found all of them to be a little bit unclear (partly because the logic of when you specified which disk device wasn't always clear). So here are the annotated steps that I used. First, let's say that the old disk space you're migrating away from is /dev/OLD and the new disk space is /dev/NEW, and you're migrating the LVM volume group vg0 with the single volume vg0/data, mounted on /data. Then:

  1. Initialize /dev/NEW as an LVM physical volume:
    pvcreate /dev/NEW
  2. Add it to the volume group:
    vgextend vg0 /dev/NEW

  3. Mirror each volume/filesystem to the new storage:
    lvconvert -m1 --mirrorlog mirrored --alloc anywhere vg0/data /dev/NEW

    This is the step that takes forever, and you have to repeat it for each filesystem (I did not try to lvconvert multiple volumes at once; I did them one at a time).

    It's possible that you will not need '--alloc anywhere'; leave it out the first time to see (if you do need it, LVM will report that it can't find space to put stuff). The important arguments are -m1, which tells lvconvert to create a mirror (on /dev/NEW, because that's the physical volume we specified) and --mirrorlog mirrored which tells it to create a (mirrored) persistent on-disk log of what bits of the mirror are in sync.

    If I was doing this again I might just use --mirrorlog disk, because as it happens LVM put both of my mirror log mirrors on /dev/NEW for its own inscrutable reasons (it's possible that --alloc anywhere influenced this). I didn't let this worry me because the whole situation was temporary and /dev/NEW was itself a mirrored RAID array, so it was already pretty reliable.

    (It's possible that a non-mirrored mirrorlog would speed things up.)

  4. Verify that everything looks good:
    lvs -a -o+devices

    What this should show is that vg0/data now has four internal subvolumes. The _mimage_N subvolumes are the actual mirrors (the original volume you started with and the mirror on the new storage), one on each of /dev/OLD and /dev/NEW, and you'll also have two additional subvolumes for the mirror log (ideally one on each disk, but see above).

    At this point you can run with full mirroring for as long as you want in order to build up confidence in the new disk(s). Once you're fully happy with them, it's time to complete the migration by splitting off the old disks.

  5. Split apart each volume, leaving the live version on the new disk and creating a new volume that is the data on the old disk. I believe I read that this goes better if the filesystem is unmounted at the time, so that's how I did it:
    umount /data
    lvconvert --splitmirrors 1 -n data-o vg0/data /dev/OLD
    mount /data

    The -n data-o gives the volume name of the 'new' volume (ie, the name you want for the original volume on the original disk). We specify /dev/OLD here to tell lvconvert that it should act on the mirror side that is on /dev/OLD.

    If you run 'lvs -a -o+devices' afterwards, you should see that all of those internal subvolumes have disappeared and you now have two volumes: vg0/data should be entirely on /dev/NEW and vg0/data-o should be entirely on /dev/OLD.

  6. After doing this for each filesystem you have one volume group using both /dev/OLD and /dev/NEW but all of your live volumes are on /dev/NEW; all of the volumes on /dev/OLD are unused. The final step is to split apart the volume group itself into two, the live one on /dev/NEW and a second volume group that is just all of the old volumes on /dev/OLD.

    First, we need to make all of the volumes on /dev/OLD inactive:

    lvchange -an vg0/data-o

    This should complete without complaints because none of these volumes should be in use; they should all be quiescent, unmounted, and so on.

    Then we can split the volume group itself:

    vgsplit vg0 vg0-o /dev/OLD

    Here vg0-o is the name of the 'new' volume group, ie the old copy of the data on the old storage. We specify /dev/OLD to tell vgsplit to act on the volumes (and physical volume and so on) on /dev/OLD.

    Running 'lvs -a -o+devices' should now show two volume groups, with vg0 using only /dev/NEW and vg0-o using only /dev/OLD.
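The numbered steps above can be collected into a small script. This is only a sketch: it prints the commands by default instead of running them, and the variable names (OLD, NEW, VG, LVS, DRY_RUN) and function names are assumptions made for this example. It leaves out '--alloc anywhere', which you may turn out to need, and it leaves the umount and mount around the split as a manual step. Newer LVM releases may default to a different mirror type where --mirrorlog does not apply, so check lvconvert(8) on your system before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of the mirror-then-split migration.
# Prints commands unless DRY_RUN=0 (in which case run it as root, carefully).
OLD=${OLD:-/dev/OLD}        # placeholder: old physical volume
NEW=${NEW:-/dev/NEW}        # placeholder: new physical volume
VG=${VG:-vg0}               # volume group being migrated
LVS=${LVS:-data}            # space-separated logical volume names
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-4: add the new storage and mirror every volume onto it.
mirror_phase() {
    run pvcreate "$NEW"
    run vgextend "$VG" "$NEW"
    for lv in $LVS; do
        run lvconvert -m1 --mirrorlog mirrored "$VG/$lv" "$NEW"
    done
    run lvs -a -o+devices
}

# Steps 5-6: split off the old side of each mirror, then the volume group.
split_phase() {
    for lv in $LVS; do
        # Unmount the filesystem by hand before this, as in step 5.
        run lvconvert --splitmirrors 1 -n "${lv}-o" "$VG/$lv" "$OLD"
        run lvchange -an "$VG/${lv}-o"
    done
    run vgsplit "$VG" "${VG}-o" "$OLD"
    run lvs -a -o+devices
}

case "${1:-}" in
    mirror) mirror_phase ;;
    split)  split_phase ;;
esac
```

Running it with 'mirror' prints the first phase; once the mirrors are in sync and you trust the new disks, 'split' prints the second.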

After this is done you can decommission vg0-o at your leisure. I haven't gotten around to doing that since I haven't quite reached the point where I want to physically remove the old disks (I still have my boot partition on them, partly because I need to figure out which physical SATA plug on the motherboard actually is sda, sdb, and so on).

(I don't know if you can just disconnect the disks without doing anything special in LVM. That would be the ideal way to do it since it would preserve vg0-o and its volumes completely intact for any future need, but LVM might get upset when you reboot your machine because a volume group it expects isn't there.)

linux/LVMCautiousMigration written at 00:33:42


