Chris's Wiki :: blog/solaris/OmniOSUpgradeDifficulties Commentshttps://utcc.utoronto.ca/~cks/space/blog/solaris/OmniOSUpgradeDifficulties?atomcommentsDWiki2015-03-17T20:48:02ZRecent comments in Chris's Wiki :: blog/solaris/OmniOSUpgradeDifficulties.By Chris Siebenmann on /blog/solaris/OmniOSUpgradeDifficultiestag:CSpace:blog/solaris/OmniOSUpgradeDifficulties:85cd6510ffdaeca3a74bc8596fe670954af2e884Chris Siebenmann<div class="wikitext"><p>It's extremely difficult to make a fileserver transparently redundant
if you don't have fast failover, and unfortunately ZFS does not; pool
import is a very slow process in at least some environments. Without
transparent redundancy, users notice if you take a fileserver down for
any amount of time.</p>
<p>(We have a hot spare fileserver, but even a planned, deliberate failover
would probably take ten to twenty minutes. That much downtime is definitely
user-visible and has real effects on our overall environment, even for
users who are not on that fileserver.)</p>
</div>2015-03-17T20:48:02ZBy Erik Mathis on /blog/solaris/OmniOSUpgradeDifficultiestag:CSpace:blog/solaris/OmniOSUpgradeDifficulties:b403d4dc44a2fc6d205b0fd1e544ec16e5f95e43Erik Mathis<div class="wikitext"><p>This is a working example of why you make everything redundant from the get-go. Plan for and expect midday outages, updates, and emergency patch releases. It seems silly to me in 2015 to have only one server to do anything.</p>
</div>2015-03-17T20:12:38ZBy Chris Siebenmann on /blog/solaris/OmniOSUpgradeDifficultiestag:CSpace:blog/solaris/OmniOSUpgradeDifficulties:8373cb0c28453d00467a1724eff9b32f8506a7dfChris Siebenmann<div class="wikitext"><p>All of these issues are surmountable but they make things non-easy (and
really, non-trivial). As for emergency outages for rollbacks: we can
have them, but then people may well get unhappy with us for needing
them in the first place. From the perspective of users, what matters is
us providing a reliable service; if we can't do that and if we're just
flailing around (or if we do things that in practice make it worse),
they're going to get very unhappy with us and start asking pointed
questions about things.</p>
<p>On our Linux machines, we only have outages for non-optional security
upgrades and we basically don't do rollbacks (partly this has been
because we haven't encountered any fatal issues on them). For obvious
reasons it would take a pretty severe problem for us to reintroduce a
known, must-patch security issue.</p>
</div>2015-03-16T16:17:54ZBy liam at unc edu on /blog/solaris/OmniOSUpgradeDifficultiestag:CSpace:blog/solaris/OmniOSUpgradeDifficulties:10a8ab99011e09002e885bc6295fefb6a1793740liam at unc edu<div class="wikitext"><p>This is standard system administration in many, many places. Is your problem that you can't get maintenance windows from your customers? Or that your customers don't understand the concept of an emergency outage to roll back a change that causes problems?</p>
<p>One of the downsides of Linux compared to Solaris and AIX is the difficulty of providing a clean rollback of a system upgrade. If you are having issues with OmniOS, which does have a rollback mechanism, why can't you just apply the practice you use for your Linux upgrades to your OmniOS boxes? Or is it that you create only 'cattle' with Linux, and leave OmniOS for 'pets'?</p>
</div>2015-03-16T13:28:35Z