2014-06-25
A retrospective on our Solaris ZFS-based NFS fileservers (part 1)
We're in the slow process of replacing our original Solaris ZFS fileserver environment with a second generation environment. With our current fileservers entering their sunset period, it's a good time to take an uncommon retrospective look back over their six years of operation and talk about what went well and what didn't. Today I'm going to lead with the good stuff about our Solaris machines.
(I'm actually a bit surprised that it's been six years, but that's what the dates say. I wrote the fileservers up in October of 2008 and they'd already been in operation for several months at that point.)
The headline result is that our fileserver environment has worked great overall. We've had six years of service with very little disruption and no data loss. We've had many disks die, we've had entire iSCSI backends fail, and through it all ZFS and everything else has kept trucking along. This is actually well above my expectations six years ago, when I had a very low view of ZFS's long-term reliability and expected to someday lose a pool to ZFS corruption over the lifetime of our fileservers.
The basics of ZFS have been great and using ZFS has been a significant advantage for us. From my perspective, the two big wins with ZFS have been flexible space management for actual filesystems and ZFS checksums and scrubs, which have saved us in ways large and small. Flexible space management has sometimes been hard to explain to people in a way that they really get, but it's been very nice to simply be able to make filesystems for logical reasons and not have to ask people to pre-plan how much space they get; they can use as little or (more or less) as much as they need.
Solaris in general and Solaris NFS in particular have been solid in normal use and we haven't seen any performance issues. We used to have some mysterious NFS mount permission issues (where a filesystem wouldn't mount or work on some systems) but from what I remember they haven't cropped up on our systems for a few years. Our Solaris 10 update 8 installs may not be the most featureful or up to date systems but in general they've given us no problems; they just sit in their racks and run and run and run (much like the iSCSI backends). I think it says good things that they reached over 650 days of uptime recently before we decided to reboot them as a sort of precaution after one crashed mysteriously.
Okay, I'll admit it: Solaris has not been completely and utterly rock solid for us. We've had one fileserver that just doesn't seem to like life, for reasons that we're not sure about; it is far more sensitive to disk errors and it's locked up several times over the years. Since we've replaced the hardware and reinstalled the software, my vague theory is that it's something to do with the NFS load it gets, the disks it's dealing with, or both (it has most of our flaky 1TB Seagate disks, which fail at rates far higher than the other drives).
One Solaris feature deserves special mention. DTrace (and with it Solaris source code) turned out to be a serious advantage and very close to essential for solving an important performance problem we had. We might have eventually found our issue without DTrace but I'm pretty sure DTrace made it faster, and DTrace has also given us useful monitoring tools in general. I've come around to considering DTrace an important feature and I'm glad I get to keep it in our second generation environment (which will be using OmniOS on the fileservers).
I guess the overall summary is that for six years, our Solaris ZFS-based NFS fileservers have been boring almost all of the time; they work and they don't cause problems, even when crazy things happen. This has been especially true for the last several years, ie after we shook out the initial problems and got used to what to do and not to do.
(We probably could have made our lives more exciting for a while by upgrading past Solaris 10 update 8 but we never saw any reason to do that. After all, the machines worked fine with S10U8.)
That isn't to say that Solaris has been completely without problems and that everything has worked out for us as we planned. But that's for another entry (this one is already long enough).
Update: in the initial version of this entry I completely forgot to mention that the Solaris iSCSI initiator (the client) has been problem free for us (and it's obviously a vital part of the fileserver environment). There are weird corner cases but those happen anywhere and everywhere.
How my new responsive design here works
Encouraged by the commentators (and their suggestions) on my earlier entry about responsive design here, I sat down and banged out some CSS and revised my markup. Since I went through a bunch of iterations (many of them not working) to get my current results, I want to write down everything before I forget how it all works and why I needed to do things the way I have.
Following Aristotle Pagaltzis's suggestion, the core styling is done with 'display: table...' settings on <div>s. The div tree looks like this (roughly):
    <div class="wtblog">
      <div class="maintext"> ... left column contents ... </div>
      <div class="sidebar"> ... sidebar contents ... </div>
    </div>
In the normal CSS rules, wtblog is set to display: table while the other two are set to display: table-cell with their widths set to 76% and 24% respectively. This creates an implicit table row and stacks them up side by side with most of the space given to the main content. The table-* display styles seem well supported on anything I really care about (IE 7 users are out of luck, though). This is basically exactly the structure I used to create via actual <table>, <tr>, and <td> elements. The initial rewrite to this form was pretty much easy and painless.
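As a rough sketch of what those normal rules might look like (the class names and the 76%/24% split come from the description above; the 'width: 100%' on the outer <div> is my assumption, since nothing above says how wide the table itself is):

```css
/* Two-column layout using table display styles on plain <div>s. */
.wtblog {
  display: table;
  width: 100%;        /* assumption: the table's own width is not stated */
}

/* The two children become cells; the browser supplies an anonymous row. */
.maintext,
.sidebar {
  display: table-cell;
}

.maintext { width: 76%; }   /* main content gets most of the space */
.sidebar  { width: 24%; }
```

No explicit 'display: table-row' element is needed here; as far as I understand the CSS table model, the missing row box is generated automatically around the two cells.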
My first CSS attempt to transform this into a minimized version with the sidebar below the main content was too clever. In my media qualifier rules I reset each column to 'display: table-row' in order to get them to stack on top of each other, which worked but had the problem that display: table-row entities can't have borders and I wanted to set a top border on the sidebar. This caused me to go through several iterations of inventing extra <div>s so that I would have something to make into a display: table-cell <div> inside the table-row <div>.
After a while I came to my senses and realized the straightforward, obvious solution: plain 'display: block' <div>s already stack on top of each other. So now the minimized version resets all three <div>s to be 'display: block; width: auto;' (in addition to tinkering with margins, borders, and various other things). This just works.
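A minimal sketch of that minimized version, assuming a made-up 50em breakpoint and made-up border and spacing values (only the 'display: block; width: auto;' reset is taken from the text above):

```css
@media (max-width: 50em) {          /* 50em is a hypothetical breakpoint */
  /* Undo the table layout: plain blocks stack vertically on their own. */
  .wtblog,
  .maintext,
  .sidebar {
    display: block;
    width: auto;
  }

  /* Now that the sidebar is an ordinary block it can carry a top border. */
  .sidebar {
    border-top: 1px solid #888;     /* hypothetical styling */
    margin-top: 1em;
    padding-top: 1em;
  }
}
```

Because the sidebar comes after the main text in the markup, it ends up below the main content without any reordering.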
I did go through some amount of pain finding a @media query that would work on the iPad Mini, not just in a desktop browser when it was narrowed. After some fiddling I made it work by checking against max-device-width as well as plain max-width (which is what the browsers are happy with). I also have a really iPad Mini specific rule to increase the font sizes some as well; I aimed for something that would make my content look much like the 'readability' view you can get in the iPad browser.
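Here is one way the combined check could be written. The breakpoint, the device dimensions in the second query, and the font size are all guesses on my part for illustration, not the actual values used here:

```css
/* A comma in a media query list means "or": the rules apply if either
   the window is narrow or the device itself is narrow. */
@media (max-width: 50em), (max-device-width: 50em) {
  .wtblog,
  .maintext,
  .sidebar {
    display: block;
    width: auto;
  }
}

/* Hypothetical tablet-only font bump. 768x1024 is an assumed screen size
   in CSS pixels; a query like this would also match other devices that
   report the same dimensions, so it is not truly iPad Mini only. */
@media only screen and (device-width: 768px) and (device-height: 1024px) {
  body {
    font-size: 120%;                /* assumed value */
  }
}
```

One caution: I believe the device-width family of media features has since been deprecated in newer versions of the Media Queries specification, so this is a sketch of the approach described above rather than a recommendation.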
While I was fiddling around with my CSS I also set up a maximum width so that people with giant browsers on giant screens don't get text that sprawls all over the place. The maximum width is probably still too wide for good readability, but I don't know what the right maximum width is considered to be (casual web searches did not help answer this question).
Because I'm lazy and not crazy I specified almost all of my limits and sizes in ems so I didn't have to care about font sizes. In fact I think this works best; someone who has really increased their font size because they find it more readable doesn't magically want to read fewer words in a line than normal. Unfortunately not everything has sensible default font sizes, especially the iPad Mini.
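As an illustration of em-based limits, something like the following would cap the page width relative to the reader's font size. The 70em figure and the choice of putting it on 'body' are assumptions for the example (I picked 'body' because, as I understand it, CSS 2.1 leaves max-width on table-display elements undefined):

```css
body {
  /* An em-based cap scales with the reader's chosen font size, so a
     larger font gives a wider column, not fewer words per line. */
  max-width: 70em;                  /* hypothetical limit */
}

.sidebar {
  padding-left: 1em;                /* hypothetical: spacing in ems too */
}
```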
(In writing this entry I've discovered that CSS has added all sorts of exciting new sizing units since I last looked at it quite a lot of years ago. Possibly I will use some of them in my CSS at some point, once I understand things like rem and vw better.)
The whole experience has been a lot less painful than I expected it to be. Dealing with the iPad Mini's peculiarities was annoying and involved a lot of experimentation with things that didn't work, but apart from that things went pretty smoothly. I ran into one CSS quirk but it's documented, more or less, and I think it existed even in the <table> version of my layout.
(The quirk is that almost all of the ways you might think of to move the first line of one table cell down relative to the first line of the other table cell don't work. They either don't do anything or they move both columns down at once. The solution is to explicitly set 'vertical-align: top;' in the table cell you want to offset; then things like padding will start working.)
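A small sketch of that fix, assuming it is the sidebar you want to push down and using a made-up amount of padding:

```css
.sidebar {
  display: table-cell;
  vertical-align: top;   /* stop aligning this cell to the row's baseline */
  padding-top: 2em;      /* hypothetical offset; now moves only this cell */
}
```

My understanding of why this works is that a 'display: table-cell' <div> defaults to 'vertical-align: baseline', which lines up the first text baselines of all the cells in the row. Padding on one cell then shifts the shared baseline instead of shifting just that cell, which would explain the "both columns move at once" behaviour.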