== Today's Solaris 10 irritation: the fault manager daemon

More and more, Solaris 10 strikes me as being much like Ubuntu 6.06: a system with plenty of big ideas but only half-finished implementations. Today's half-implemented idea is _fmd_, the new [[fault manager daemon http://www.sun.com/bigadmin/features/articles/selfheal.jsp]].

One of the things I expect out of a fault monitoring system is that it should not report things as faulted when they are now fine, especially not with scary messages that get dumped on the console at every boot. (It's acceptable to report them as faulted and now better, provided that you only do it once.)

As I discovered today, under some circumstances involving ZFS pools and iSCSI, _fmd_ falls down on this; I got verbose error messages about missing pools (that were there and fine) dumped to the console (and syslog) on every boot.

Unfortunately, I couldn't find any simple way to clear these errors. There is probably a magic _fmadm flush_ incantation, but I couldn't find the right argument, and doing _fmadm reset_ on the two ZFS modules that _fmadm config_ reported didn't do anything. I had to resort to picking event UUIDs out of _fmadm faulty_ output and running _fmadm repair_ on each one (there's a sketch of this at the end of this entry).

(And why didn't Sun give the fault manager an option to send email to someone when faults happen? I'd have thought that that would be basic functionality, and it would make it actually useful for us.)

=== Sidebar: How I got _fmd_ to choke this way

I ran a test overnight that hung the iSCSI target machine, which caused the Solaris machine to reboot and then hang during boot. In the process of straightening all of this out, there was a time when the iSCSI machine was refusing connections, which caused the Solaris machine to finally boot but with none of the ZFS pools available. When I brought the iSCSI machine back up, the pools reappeared, but the fault manager had somehow latched on to the original 'pool not present' events and kept repeating them.
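
=== Sidebar: a sketch of the cleanup

For what it's worth, this is roughly the sort of loop I mean by picking event UUIDs out of _fmadm faulty_ output and feeding them to _fmadm repair_. The column layout of _fmadm faulty_ varies between Solaris releases, so the nawk pattern that guesses at which fields are UUIDs is an assumption; look at the output by hand before trusting it, and run the whole thing as root.

  # List current faults, pull out the fields that look like event UUIDs
  # (hex groups joined by dashes; adjust the pattern to your actual
  # 'fmadm faulty' output), then mark each event as repaired.
  fmadm faulty |
    nawk '{ for (i = 1; i <= NF; i++)
              if ($i ~ /^[0-9a-f]+-[0-9a-f-]*[0-9a-f]$/) print $i }' |
    sort -u |
    while read uuid; do
      fmadm repair "$uuid"
    done

This is no substitute for a real _fmadm flush_ style 'forget about these' operation, but it at least gets the stale faults off the console.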