Chris's Wiki :: blog/linux/SSDRootDilemma
Recent comments on https://utcc.utoronto.ca/~cks/space/blog/linux/SSDRootDilemma

By Anon on 2015-12-02 06:53 UTC:

(Apologies for the previous double comment)
If you don't mind getting technical and your system is new enough, you can use the perf-based technique described at http://www.brendangregg.com/blog/2014-12-31/linux-page-cache-hit-ratio.html to work out page cache hit rates. You might also be able to work out which parts of files are currently cached using pcstat (https://github.com/tobert/pcstat).
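For instance, once pcstat is installed, pointing it at a few files prints a small table of how many of each file's pages are resident in the page cache (the file paths here are just examples):

    # how much of these files is currently in the page cache?
    pcstat /bin/bash /etc/passwd /var/log/syslog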
By Anon on 2015-12-01 20:12 UTC:

If you can't stay cutting edge I'd give bcache a miss for now... - http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3097/focus=3098
By Anon on 2015-12-01 20:11 UTC:

If you can't stay cutting edge I'd give bcache a miss for now... - http://news.gmane.org/gmane.linux.kernel.bcache.devel
By Chris Siebenmann on 2015-12-01 15:57 UTC:

The other issue with an L2ARC is that an L2ARC eats ZFS ARC memory in order to store the L2ARC metadata in a way that a native SSD pool does not. See e.g. this message from Richard Elling (https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34674.html). A back of the envelope calculation suggests that a 250 GB L2ARC could easily take a GB or more of RAM for metadata.
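To make that back of the envelope visible (the per-block header size is an assumption; figures quoted around this era were on the order of 180 bytes of ARC header per L2ARC block, and the real cost depends heavily on your average block size):

    # assume ~180 bytes of ARC header per cached block; sizes inside $(( )) are in KB
    # 250 GB of 16 KB blocks:
    echo $(( (250 * 1024 * 1024 / 16) * 180 / 1024 / 1024 )) MB    # ~2800 MB
    # 250 GB of 128 KB blocks:
    echo $(( (250 * 1024 * 1024 / 128) * 180 / 1024 / 1024 )) MB   # ~350 MB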
(On the other hand, I currently have 49.6 GB in my L2ARC and I'm using ~76 MB of RAM for L2ARC headers. This is the l2_hdr_size stat in /proc/spl/kstat/zfs/arcstats for ZFS on Linux.)
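For anyone who wants to check their own numbers, that stat is just a row in the kstat file and the third column is the value in bytes, so something like this works:

    # RAM used by L2ARC headers, reported in MB
    awk '$1 == "l2_hdr_size" { printf "%.1f MB\n", $3 / 1048576 }' /proc/spl/kstat/zfs/arcstats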
Grant (http://dotfiles.tnetconsulting.net/home.html): My current root filesystem is ext3 (on a software RAID mirror) with no special tuning. I don't know if ext3 does anything particularly special on SSDs.
(I'm deliberately conservative with my root filesystem for obvious reasons, and this is an old root filesystem anyways. I'd probably make a new one a native ext4 filesystem but not change anything else.)
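(As a general illustration of what "FS specific methods" exist for ext4 on an SSD, and not something I'm planning for this setup: about the only common knob is making sure free space gets TRIMed, either with the discard mount option or periodically, e.g.:)

    # one-off TRIM of free blocks on the root filesystem
    fstrim -v /
    # or, on distributions that ship it, the periodic systemd timer
    systemctl enable --now fstrim.timer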
By Chris Siebenmann on 2015-12-01 14:56 UTC:

What I have in ZFS filesystems is things like my home directory and all of the source code that I build things from (and the build areas where the object files go and so on). I may also put some virtual machine images on the SSD, depending on how much disk space I wind up with. I believe that all of these are significantly 'hotter' than the root filesystem and that the collective working set exceeds my RAM, although I may be wrong.
(Building software also involves writing things to disk, which SSDs help with.)
I already have an L2ARC attached to the pool, but it's only a 60 GB SSD. I suppose a simple first step would actually be to swap that out for one of the 250 GB SSDs and see what happens. If I'm daring I could partition the disk as a split L2ARC/ZIL and experiment with how well it performs. This is of course less exciting than a ZFS pool on the SSDs, but it's a lot easier to set up.
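For concreteness, both the swap and the split experiment are only a few commands; the pool name and device paths below are placeholders, not real ones:

    # swap the 60 GB cache device for a 250 GB one
    zpool remove tank /dev/disk/by-id/OLD-60GB-SSD
    zpool add tank cache /dev/disk/by-id/NEW-250GB-SSD

    # or the split experiment: a small partition as ZIL/SLOG, the rest as L2ARC
    zpool add tank log /dev/disk/by-id/NEW-250GB-SSD-part1
    zpool add tank cache /dev/disk/by-id/NEW-250GB-SSD-part2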
(The one annoying thing about using an L2ARC instead of an actual pool is that the L2ARC is not persistent over reboots, which means a long process of reloading the L2ARC every time you reboot. Persistent L2ARC is a feature that's coming, well, sometime.)
By Alan on 2015-12-01 13:53 UTC:

You've mentioned bloating / (or /var) by caching RPMs. That's a good example of something that could live on a hard drive. In theory one can use bind mounts to play more fine-grained games here; personally I found them quite annoying, e.g. they pollute the output of df. dpkg is explicitly happy for symlinks to be used in this case instead. Unfortunately I don't know if that's supported in rpm / Fedora; I just know some people are doing it anyway.
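Roughly, the two options look like this (the paths are made-up examples):

    # bind mount via /etc/fstab: keep the package cache on a hard drive
    /bulk/dnf-cache   /var/cache/dnf   none   bind   0 0

    # symlink, which dpkg/apt is explicitly fine with:
    mv /var/cache/apt/archives /bulk/apt-archives
    ln -s /bulk/apt-archives /var/cache/apt/archives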
---
Systemd does now support a separate /usr, and mounts it in the initramfs. It looks like it was a nasty transition :( but it was finally fixed in F18 (https://bugzilla.redhat.com/show_bug.cgi?id=815264). I'm sure they don't consider it deprecated, because it's useful for the work on "stateless systems" etc. (see Lennart's blog post: http://0pointer.net/blog/projects/stateless.html). Given how powerful that solution is, I don't think systemd has any problem with a separate /var.
The /var issue you mentioned on Fedora seems unrelated to systemd. It's an issue with yum upgrades run from a live-cd style upgrader (which neglected to mount that filesystem). It was considered a bug and (eventually?) a workaround was provided. Fedora upgrades now run on the host (as a special boot target), starting with fedup and now dnf-plugin-system-upgrade, so all OS filesystems would be mounted, including /var.
The /var/run insanity you debugged on 2006-era Ubuntu is clearly pre-systemd. Hopefully the new /run has sorted all that out. (It solved a real problem, as everyone was using random tmpfs mounts like /dev/.mystuff before / became writable.)
I understand this is not necessarily reassuring enough to try it again :). Personally I like the initramfs mounting, merged /usr, the new /run, and the organisation for "stateless systems". But I've never had to deal with a separate /usr or /var. I agree separate /usr partitions are currently unloved (I certainly don't have any use for one). I would really have expected a separate /var to work properly, but it sounds like distros aren't always getting it right.

The bright side is that where/when systemd gets it right, that applies to all distros :).
By The Col on 2015-12-01 06:14 UTC:

I think that this is an absolute no-brainer. Given the amount of logging Linux writes to disk and the amount of IO generated, why wouldn't you want this running as fast as possible? I have not seen a HDD outperform an SSD in ages. The only possible exception would be for sensor monitoring, where you get a lot of little sequential writes.
The only caveat is if you are using hardware RAID: how much overhead does that RAID controller place on your mirrored pair (assuming RAID 1)?
But if you are in any doubt, run iostat on your OS and see how much disk activity there is with no workload and how much there is under load.
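For example (iostat comes from the sysstat package):

    # extended per-device statistics in megabytes, refreshed every 5 seconds;
    # run it once while idle and once under your normal workload and compare
    iostat -d -x -m 5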
By Grant (http://dotfiles.tnetconsulting.net/) on 2015-12-01 05:36 UTC:

Have you considered using the SSDs as a ZIL / SLOG (?) for ZFS? I think you would get a LOT of benefit from the SSDs for your whole ZFS pool if you did that.

(I'm relatively new to ZFS on Linux, so I may have the incorrect terms, as I've not done this yet myself.)

I did not realize that you were running ZFS on Linux until the end of the article. What filesystem(s) are you using for your root? Do they have any FS-specific methods to benefit from the SSD?
By Ewen McNeill on 2015-12-01 05:36 UTC:

I'm struggling to imagine what you have in your ZFS filesystem(s), outside your root disk (including /usr and /var), which is more performance sensitive/intensive than anything on your root disk *and* would not also benefit from the "well, I have 32 GB of RAM" rationale that you are considering as a reason *not* to put your root on SSD.
SSDs basically offer two performance boosts:

1. random reads are significantly faster (no waiting on mechanics)

2. writes flush to disk significantly faster
The first (random reads) is basically relevant at boot time, and any time you access something that is not already in your cache. Given a large enough cache and infrequent enough reboots, it should tend towards "relevant at boot time" after a while. So if your RAM is larger than your maximum working set, after N days this may be irrelevant. But getting there might take a while (at least "run all the usual applications in all the usual ways with all the usual data").
The second (writes commit faster) is relevant any time anything is waiting on fsync before proceeding. Which is surprisingly often for a wide range of tasks (database-like tasks being an obvious one, but by no means the only one -- eg, even writing from an editor will typically fsync() the file for safety).
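A crude way to feel that difference is to force synchronous writes with dd and compare the two kinds of disk (the path is a placeholder; this creates and then removes a small test file):

    # 1000 x 4 KB writes, each one synced to the device before the next starts
    dd if=/dev/zero of=/mnt/target/ddtest bs=4k count=1000 oflag=dsync
    rm /mnt/target/ddtest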
If it were me, my null hypothesis would be "mirror the SSDs with MD to provide an 80 GB root disk, and put root on there", and then give the remainder of each disk to ZFS. AFAICT (eg, this ZFS performance FAQ: http://open-zfs.org/wiki/Performance_tuning) the main considerations for ZFS on partitions are making sure they're erase-block-aligned (which you *also want to do with MD RAID and extN/xfs filesystems*), and that ZFS won't turn on write caching within the device (which should matter much less with an SSD).
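A minimal sketch of that layout, with placeholder device names (starting partitions at 1 MiB takes care of erase-block alignment on any sane SSD):

    # identical GPT layout on each SSD: ~80 GB for root, the rest for ZFS
    parted -s /dev/sdX -- mklabel gpt mkpart root 1MiB 80GiB mkpart zfs 80GiB 100%
    parted -s /dev/sdY -- mklabel gpt mkpart root 1MiB 80GiB mkpart zfs 80GiB 100%

    # MD mirror for the root filesystem
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
    mkfs.ext4 /dev/md0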
Possibly you'd just want to give ZFS the remainder of the SSDs for the ZIL and/or L2ARC, and leave the (presumably larger) underlying data on the existing disks. In that case the capacity loss should be less of a concern. (IIRC ZFS will acknowledge synchronous writes as "on stable storage" once they hit the ZIL, so a ZIL on SSD should give you the write boost of the SSD; and the L2ARC will cache more data for reading, and for longer, than RAM alone.)
Ewen
PS: About the only thing I can think of is if you have a database on your ZFS now which is sufficiently large and performance critical to justify a dedicated SSD set. But you seem to be talking about a desktop rather than a transaction server...
PPS: It seems to be generally accepted now that if you purchase a reasonable quality SSD and don't constantly thrash it with writes (eg, a busy database or logging), you can expect it to last at least as long as the warranty -- probably much longer if it's lightly written to (eg, desktop rather than server). For the write-intensive case the rationale seems to be to treat the SSD like "racing tires" which are known to wear out quicker, but are worth it for the performance; most modern SSDs can give some estimate of the write lifetime left -- which is basically a function of the reserve capacity left for when blocks reach their write limits. (And larger SSDs benefit from proportionally fewer writes per cell for the same write traffic -- eg, double the disk size with the same writes and you're writing half as many "full disk writes" per period.)
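Most drives expose that estimate via SMART, though the attribute names differ between vendors, so treat this as the general shape rather than an exact recipe:

    # wear/endurance attributes go by various names depending on the vendor:
    # Wear_Leveling_Count, Media_Wearout_Indicator, Percent_Lifetime_Remain, ...
    smartctl -A /dev/sdX | grep -iE 'wear|lifetime|lbas_written'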