2007-02-21
Fixing Python's string .join()
The thing that has always irritated me about string .join() is that it doesn't stringify its arguments; if one of the things in the sequence to be joined isn't a string, .join() doesn't call str() on it, it just pukes. This is periodically annoying and inconvenient.
It recently occurred to me that this can be fixed, like so:
class StrifyingStr(str):
    def join(self, seq):
        s2 = [str(x) for x in seq]
        return super(StrifyingStr, self).join(s2)

def str_join(js, seq):
    return StrifyingStr(js).join(seq)
(A similar version for Unicode strings is left as an exercise for the reader.)
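To make the difference concrete, here is a small self-contained sketch that restates the class and then joins a mixed sequence both ways. The sample data and the variable names (mixed, plain_error, joined) are made up for illustration.

```python
class StrifyingStr(str):
    def join(self, seq):
        # Stringify every element before handing off to the real str.join().
        s2 = [str(x) for x in seq]
        return super(StrifyingStr, self).join(s2)

def str_join(js, seq):
    return StrifyingStr(js).join(seq)

# Illustrative data: only the first element is actually a string.
mixed = ["a", 1, 2.5, None]

# The plain .join() refuses the non-string elements.
try:
    ", ".join(mixed)
    plain_error = None
except TypeError as exc:
    plain_error = exc

# The stringifying version calls str() on each element first.
joined = str_join(", ", mixed)
print(joined)  # a, 1, 2.5, None
```

Note that only the elements are stringified; the separator still has to be a string to begin with.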
You might think that a generator expression would be more efficient than a list comprehension here; in fact, that's what my first version used. Then I actually timed it, and found that regardless of whether .join() was passed a list or an iterator, and for sizes of the list (or iterator) from 10 elements to 10,000, doing the list comprehension was slightly faster.
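A rough way to repeat this sort of measurement is sketched below using the standard timeit module. This is not the original timing harness; the helper names (join_listcomp, join_genexp, compare) and the repeat count are assumptions, and the numbers will vary with the Python version and machine.

```python
import timeit

def join_listcomp(seq):
    # Build the full list of strings first, then join it.
    return "".join([str(x) for x in seq])

def join_genexp(seq):
    # Hand .join() a generator expression instead of a list.
    return "".join(str(x) for x in seq)

def compare(size, number=200):
    # Time both variants over the same input; returns (list_time, gen_time).
    data = list(range(size))
    t_list = timeit.timeit(lambda: join_listcomp(data), number=number)
    t_gen = timeit.timeit(lambda: join_genexp(data), number=number)
    return t_list, t_gen

if __name__ == "__main__":
    for size in (10, 100, 1000, 10000):
        t_list, t_gen = compare(size)
        print("%6d  list comp: %.4fs  gen exp: %.4fs" % (size, t_list, t_gen))
```

One plausible explanation, at least in CPython, is that .join() needs a real sequence so it can size the result before copying, so an iterator gets turned into a list internally anyway and the generator only adds overhead.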
Now that I have this I can think of a number of places where I may wind up using it, which kind of makes me wish I'd scratched this irritation before now.
The quick overview of DiskSuite failover
Solaris 8 DiskSuite does failover on disks (logical or otherwise), not filesystems or partitions. Solaris then gives you up to seven partitions per disk (technically you get eight, but DiskSuite takes one for its metadata); you then use these partitions as the building blocks for mirrors, stripes, and filesystems.
Disks are grouped together into metasets, and one machine in your failover cluster owns each metaset at any given time and is the only machine allowed to do IO to any of its disks. As a consequence, all mirroring, striping, and so on has to be within a metaset. In our setup, each metaset is all of the disks in a virtual NFS server. A single physical system can be the owner of more than one metaset (and thus more than one virtual NFS server).
(Failover itself is done by changing the owner of a metaset, possibly forcibly.)
In Solaris 8, all of the disks in a metaset have to appear as the same devices on all of the machines participating in the failover pool for that metaset (eg, c0t0d0 has to be c0t0d0 everywhere). This is apparently a limitation in the metadata that DiskSuite keeps, and I believe it's been relaxed by Solaris 10. As a practical matter, this means you want identical hardware configurations for all of your fileserver machines.
If you have a SAN and want any of your filesystems to have SAN RAID controller redundancy (so the filesystem keeps going even if one controller falls over), this means the filesystem's metaset must include disks from more than one controller. Unless you dedicate all of two controllers to a single (probably very big) metaset, you will probably wind up with a situation where a single SAN RAID controller has (different) disks in several different metasets. This unfortunately complicates load calculations and working out what the consequences are of taking a single controller down.
(In the extreme case the failure of a single controller could affect all of your virtual NFS servers.)
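Working out the consequences of taking a controller down is basically a matter of inverting the metaset-to-disk mapping. The sketch below shows the idea; the metaset names, controller names, and disk layout are entirely hypothetical, and it does not query DiskSuite itself.

```python
from collections import defaultdict

# Hypothetical layout: metaset name -> list of (controller, disk) pairs.
# Each metaset draws disks from two controllers for redundancy.
METASETS = {
    "nfs1": [("ctrlA", "c2t0d0"), ("ctrlB", "c3t0d0")],
    "nfs2": [("ctrlA", "c2t1d0"), ("ctrlC", "c4t0d0")],
    "nfs3": [("ctrlB", "c3t1d0"), ("ctrlC", "c4t1d0")],
}

def metasets_by_controller(metasets):
    # Invert the mapping: controller -> set of metasets with disks on it.
    result = defaultdict(set)
    for name, disks in metasets.items():
        for controller, _disk in disks:
            result[controller].add(name)
    return dict(result)

def affected_by(metasets, controller):
    # Metasets that lose at least one disk if this controller goes down.
    return sorted(metasets_by_controller(metasets).get(controller, set()))

if __name__ == "__main__":
    for ctrl in ("ctrlA", "ctrlB", "ctrlC"):
        print("%s: %s" % (ctrl, ", ".join(affected_by(METASETS, ctrl))))
```

In this made-up layout every controller touches two of the three virtual NFS servers, which is the kind of overlap that makes the load and failure picture hard to keep in your head.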