2012-05-28
Yes, git add makes a difference (no matter what people think)
One of the things said about git is that it's less user friendly and takes longer to learn than Mercurial; the first exhibit for this difference is usually git add and by extension git's index. Unfortunately, a common reaction among git fans to both the general issue and git add in particular is a kind of defensive denial, where they hold forth that it's not that difficult and people learn it fine and really, git is user friendly.
You may already have gotten an idea of my views on this. I'm here to tell you, from a mostly outsider perspective, that git add really does make a real difference in initial user friendliness, one that makes Mercurial easier to pick up and use for straightforward uses.
(I've used git to a certain extent, for example for my Github stuff, but I am not up to the experienced user level. I'm not really at that level with Mercurial either, partly because I haven't needed to be and partly because I'd rather learn git; Mercurial is easier but I like git more.)
Before people freak out too much, let me be explicit: all of this is about initial user friendliness, the ease of doing straightforward things and picking up the system. In the long run I think that the git index is the right design decision (for a programmer focused VCS) because it creates an explicit model for doing a number of important but tricky things, a model that can be manipulated and inspected and reasoned about, and once you learn git and use it regularly dealing with the index becomes second nature. But people generally do not defend the index in these terms; instead, they try to maintain with a straight face that it's no real problem for people even at the start.
(If you think that the index does not cause problems for git beginners, I would gently suggest that you trawl through some places where they ask questions.)
The usability problem with git add is not just the need for git add itself as an extra step, it is that the existence of the index has additional consequences that ripple through to using other bits of git. For example, let us take the case of the disappearing diff:
    ; git diff a
    [...]
    -hi there
    +hi there, jim
    ; git add a
    ; git diff
    ;
If you already know git you know what's going on here (and you're going to reach for 'git diff --cached'). If you're learning git, well, your change just disappeared. Of course this happens the other way around too; 'git diff' shows you nice diffs, then you do 'git commit' and it tells you nothing to commit. Wait, what? The diffs are right there.
(There are worse bear traps in the woods for beginners, too, like doing a 'git add' and then further editing the file. Here 'git diff' will show you a diff but it is not what will be committed.)
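The disappearing diff and the edit-after-add trap can be reproduced in a throwaway repository. This is only a minimal sketch: the temporary directory, the g helper, and the demo identity are illustrative assumptions, not anything from a real setup.

```shell
#!/bin/sh
# Throwaway repository so the demo cannot touch real work.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Hypothetical helper: run git with a fixed identity so commits work
# even when no global user.name/user.email is configured.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo "hi there" > a
g add a
g commit -q -m "initial version of a"

echo "hi there, jim" > a
before_add=$(g diff --name-only)        # working tree vs index: lists a
g add a
after_add=$(g diff --name-only)         # empty: the change now lives in the index
staged=$(g diff --cached --name-only)   # index vs HEAD: lists a

# The bear trap: edit the file again after 'git add'.
echo "hi there, jim and bob" > a
later_diff=$(g diff --name-only)        # lists a again, but only for the newest edit
in_index=$(g show :a)                   # the content a plain 'git commit' would record

echo "git diff before add:       ${before_add:-<nothing>}"
echo "git diff after add:        ${after_add:-<nothing>}"
echo "git diff --cached:         ${staged:-<nothing>}"
echo "git diff after re-editing: ${later_diff:-<nothing>}"
echo "staged content of a:       $in_index"
```

The last two lines of output are the trap in miniature: 'git diff' reports a change to a, but the staged content is still the older edit.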
All of this is a cognitive burden. When you use git, you have to learn and remember the existence of the index and how this affects what you do, and you probably need to take extra steps or pay extra attention to what 'git commit' and so on tell you. This cognitive burden is real, although it can be (and will be) overcome with familiarity and what it enables has important benefits. It is a mistake and a lie to try to pretend otherwise. Honesty in git advocacy is to say straightforwardly that the index is worth it in the end (possibly unless you have simple work patterns).
(A system where the index or its equivalent is an advanced feature, one not exposed by default, really does have a simpler initial workflow. If it's designed competently (and Mercurial is), everything 'just works' the way you expect; hg commit commits what hg diff shows you and so on. In real life this makes a difference to people's initial acceptance of a new VCS, especially if the simple workflow is adequate for almost everything you'll ever do with the system. This is not true of the sort of advanced VCS use that programmers can practice routinely, but it can be of other VCS uses.)
Sidebar: the problem with 'git commit -a'
At this point some people may come out of the woodwork to tell me about git commit -a, or even about creating an alias like 'git checkin' that always forces -a. There are two pragmatic problems with this.
First, the index still exists even if you're trying to pretend otherwise. This means that you can accidentally use the index; you can run git add because something said to, or you can run straight git commit, and so on. All of these will create confusion and cause git to do what (to you) looks like the wrong thing.
(In fact you have to run git add every so often, to add new files.)
Second, it is not at all obvious from simply reading documentation that using git commit -a is a fully reliable way of transmuting git into Mercurial. Maybe it is, maybe it isn't, but as a beginner you don't know (not without doing more research than I myself have done). Because many git operations are fundamentally built around the existence of the index, the safest assumption to make is that the index really does matter and git commit -a is probably an incomplete workaround.
(For example, at the point where you do git add to add a new file you'll become familiar with git diff HEAD in order to get the true diffs for what will be committed when you run git commit -a, which I hope illustrates my point adequately. And maybe there's a better command for doing that, which also illustrates my point because git diff HEAD is what I came up with as a relative git novice.)
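Both halves of this sidebar can be checked in a scratch repository: 'git commit -a' leaves brand new files alone, and once a new file has been added, plain 'git diff' and 'git diff HEAD' disagree. This is a rough sketch; the file names, the g helper, and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Scratch repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Hypothetical helper so commits work without any global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo one > tracked
g add tracked
g commit -q -m "initial"

# Change a tracked file and create a brand new one, then use 'commit -a'.
echo two >> tracked
echo new > newfile
g commit -q -a -m "edit tracked"
left_over=$(g status --porcelain)       # newfile is still untracked

# Add the new file (which puts it in the index), then edit the tracked file again.
g add newfile
echo three >> tracked
plain_diff=$(g diff --name-only)        # working tree vs index: only tracked
head_diff=$(g diff --name-only HEAD)    # working tree vs HEAD: newfile and tracked

echo "after 'commit -a', status says: $left_over"
echo "git diff lists:      $plain_diff"
echo "git diff HEAD lists:" $head_diff
```

Here 'git commit -a' would commit both files, which matches what 'git diff HEAD' lists rather than what plain 'git diff' lists.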
How to do a very cautious LVM storage migration
A while back I wrote about how I was tempted by LVM mirroring when I wanted to migrate my LVM setup from a RAID mirror on some old disks to a new RAID mirror on some new(er) disks. Because I am some peculiar combination of cautious and daring, I gave in to this temptation recently. Now that the migration has more or less finished, it's time I reported on how it went and how to do this.
The short summary is that using LVM mirroring to migrate my LVM volume group from disk to disk worked without problems, but the next time I need to do this I will probably just use pvmove, because establishing the actual mirrors was achingly slow and the whole process was kind of a tedious pain in the rear. I don't know if pvmove would be faster, but I can hope.
(The mirrors seemed to perform decently once they were synchronized. But initial synchronization of about 250 GB of data took literally days and it was not disk speed limited; LVM never drove the disks at full bandwidth or at their full IOPS rate.)
There are two advantages of using LVM mirroring instead of pvmove and I used both of them. First, you can run for a while on both the new storage and the old storage at the same time, to build up confidence in the new storage. Second, you can preserve a complete and usable copy of all of your data on the old storage, a copy that you can inspect, mount, and so on if you wind up having to. With pvmove, your data just moves; you wind up only on the new storage and there's nothing left on the old storage.
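For comparison, here is a rough sketch of what the plain pvmove route might look like. I did not do the migration this way, so treat it as an untested outline; the run and migrate_with_pvmove helpers and the DRY_RUN switch are made up for illustration, and /dev/OLD, /dev/NEW and vg0 are placeholder names. By default it only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 to really run them, which needs root and real devices.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the usual pvmove sequence.
migrate_with_pvmove() {
    vg=$1
    old=$2
    new=$3
    run pvcreate "$new"          # make the new device an LVM physical volume
    run vgextend "$vg" "$new"    # add it to the volume group
    run pvmove "$old" "$new"     # move all allocated extents off the old device
    run vgreduce "$vg" "$old"    # drop the now-empty old device from the group
}

plan=$(migrate_with_pvmove vg0 /dev/OLD /dev/NEW)
printf '%s\n' "$plan"
```

Unlike the mirroring approach, nothing here leaves a usable copy of the data behind on the old device.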
I read a number of writeups of how to do LVM mirroring on the web, but I found all of them to be a little bit unclear (partly because the logic of when you specified which disk device wasn't always clear). So here are the annotated steps that I used. First, let's say that the old disk space you're migrating away from is /dev/OLD and the new disk space is /dev/NEW, and you're migrating the LVM volume group vg0 with the single volume vg0/data, mounted on /data. Then:
- Initialize /dev/NEW as an LVM physical volume:
pvcreate /dev/NEW
- Add it to the volume group:
vgextend vg0 /dev/NEW
- Mirror each volume/filesystem to the new storage:
lvconvert -m1 --mirrorlog mirrored --alloc anywhere vg0/data /dev/NEW
This is the step that takes forever, and you have to repeat it for each filesystem (I did not try to lvconvert multiple volumes at once, I did them one at a time).
It's possible that you will not need '--alloc anywhere'; leave it out the first time to see (if you do need it, LVM will report that it can't find space to put stuff). The important arguments are -m1, which tells lvconvert to create a mirror (on /dev/NEW, because that's the physical volume we specified) and --mirrorlog mirrored, which tells it to create a (mirrored) persistent on-disk log of what bits of the mirror are in sync.
If I was doing this again I might just use --mirrorlog disk, because as it happens LVM put both of my mirror log mirrors on /dev/NEW for its own inscrutable reasons (it's possible that --alloc anywhere influenced this). I didn't let this worry me because the whole situation was temporary and /dev/NEW was itself a mirrored RAID array, so it was already pretty reliable.
(It's possible that a non-mirrored mirrorlog would speed things up.)
- Verify that everything looks good:
lvs -a -o+devices
What this should show is that vg0/data now has four internal subvolumes. The _mimage_N subvolumes are the actual mirrors (the original volume you started with and the mirror on the new storage), one on each of /dev/OLD and /dev/NEW, and you'll also have two additional subvolumes for the mirror log (ideally one on each disk, but see above).
At this point you can run with full mirroring for as long as you want in order to build up confidence in the new disk(s). Once you're fully happy with them, it's time to complete the migration by splitting off the old disks.
- Split apart each volume, leaving the live version on the new disk and
creating a new volume that is the data on the old disk. I think
that I read that this apparently goes better if the filesystem
is unmounted at the time, so that's how I did it:
umount /data
lvconvert --splitmirrors 1 -n data-o vg0/data /dev/OLD
mount /data
The -n data-o gives the volume name of the 'new' volume (ie, the name you want for the original volume on the original disk). We specify /dev/OLD here to tell lvconvert that it should act on the mirror side that is on /dev/OLD.
If you run 'lvs -a -o+devices' afterwards, you should see that all of those internal subvolumes have disappeared and you now have two volumes; vg0/data should be entirely on /dev/NEW and vg0/data-o should be entirely on /dev/OLD.
- After doing this for each filesystem you have one volume group using
both /dev/OLD and /dev/NEW but all of your live volumes are on
/dev/NEW; all of the volumes on /dev/OLD are unused. The final step is
to split apart the volume group itself into two, the live one on /dev/NEW
and a second volume group that is just all of the old volumes on /dev/OLD.
First, we need to make all of the volumes on /dev/OLD inactive:
lvchange -an vg0/data-o
This should complete without complaints because none of these volumes should be in use; they should all be quiescent, unmounted, and so on.
Then we can split the volume group itself:
vgsplit vg0 vg0-o /dev/OLD
Here vg0-o is the name of the 'new' volume group, ie the old copy of the data on the old storage. We specify /dev/OLD to tell vgsplit to act on the volumes (and physical volume and so on) on /dev/OLD.
Running 'lvs -a -o+devices' should now show two volume groups, with vg0 using only /dev/NEW and vg0-o using only /dev/OLD.
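Eyeballing lvs output after each step is easy to get wrong, so here is a small sketch of a filter that lists which volumes have extents on a given device. The lvs_on_device function is hypothetical, and the sample text only imitates what 'lvs --noheadings -o lv_name,devices' prints; it is not captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper: read 'lvs --noheadings -o lv_name,devices' style
# output on stdin and print the names of volumes using the given device.
lvs_on_device() {
    awk -v dev="$1" '
    {
        # The devices column looks like "/dev/OLD(0)" and may hold
        # several comma-separated entries; strip the "(extent)" part.
        n = split($2, parts, ",")
        for (i = 1; i <= n; i++) {
            d = parts[i]
            sub(/\(.*$/, "", d)
            if (d == dev) { print $1; next }
        }
    }'
}

# Made-up sample of what the output might look like after the final split.
sample='  data   /dev/NEW(0)
  data-o /dev/OLD(0)'

on_old=$(printf '%s\n' "$sample" | lvs_on_device /dev/OLD)
on_new=$(printf '%s\n' "$sample" | lvs_on_device /dev/NEW)
echo "volumes on /dev/OLD: $on_old"
echo "volumes on /dev/NEW: $on_new"
```

On a real system the input would come from something like 'lvs --noheadings -o lv_name,devices vg0' instead of the sample text.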
After this is done you can decommission vg0-o at your leisure. I haven't gotten around to doing that since I haven't quite reached the point where I want to physically remove the old disks (I still have my boot partition on them, partly because I need to figure out which physical SATA plug on the motherboard actually is sda, sdb, and so on).
(I don't know if you can just disconnect the disks without doing anything special in LVM. That would be the ideal way to do it since it would preserve vg0-o and its volumes completely intact for any future need, but LVM might get upset when you reboot your machine because a volume group it expects isn't there.)