Sometimes there are drawbacks to replicating configuration files

November 14, 2014

This is a war story, but not my war story; this is all my coworkers' work.

Writing a working Samba configuration is a lot of painful work. There are many options, many of them interact with clients in odd and weird ways, and the whole thing often feels like a delicately balanced house of cards. As a result we have a configuration that we've painstakingly evolved over the many years that we've been using Samba. When we needed a second Samba server dedicated to a particular group but still using our NFS fileservers, we of course copied the file, changed the server name, and used it as is. Starting from scratch would have been crazy; our configuration is battle-tested and we know it works.

We recently built out an infrastructure for cheap bulk storage, originally intended for system backups; the core idea is that people buy some number of disks, give them to us, and we make them accessible via Time Machine (for Macs) and Samba (for Windows). Of course we set up this machine's Samba using our master Samba configuration (again with server names changed, and this time around with a lot of things taken out because, for example, this server doesn't support printing). Recently we discovered that Samba write performance on this server was absolutely and utterly terrible (we're talking about the kilobytes or very small megabytes a second range). My coworkers chased all sorts of worrisome potential causes and wound up finding it in our standard smb.conf, which had the following lines:

# prevent Windows clients copying files to
# full disks without warning. This can lead
# to data loss.

strict sync = yes
sync always = yes

Surprisingly, when you tell your Samba server to fsync() your writes all the time your write performance on local disks turns out to be terrible. Performance was okay on our main Samba servers for complex reasons involving our NFS servers.
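To get a feel for the cost, here is a minimal Python sketch (not Samba's code) that times the same sequence of writes with and without an fsync() after each one. The function name, chunk count, and chunk size are arbitrary choices for illustration, and the actual gap depends heavily on the disks and filesystem involved.

```python
import os
import tempfile
import time


def timed_writes(path, chunks=200, chunk_size=64 * 1024, sync_each=False):
    """Write `chunks` blocks to `path` and return the elapsed seconds.

    With sync_each=True every write is followed by fsync(), which is
    roughly the behaviour the two sync settings above ask Samba for.
    """
    data = b"x" * chunk_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(chunks):
            os.write(fd, data)
            if sync_each:
                # Block until the kernel reports the data is on stable storage.
                os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = timed_writes(os.path.join(tmp, "buffered"))
        synced = timed_writes(os.path.join(tmp, "synced"), sync_each=True)
        print(f"buffered: {buffered:.3f}s  fsync per write: {synced:.3f}s")
```

On a local spinning disk the second number is usually much larger, but treat any single run as a rough indication rather than a benchmark.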

The comment explains the situation we ran into fairly well; Windows clients copying files from the local disk to a Samba disk could run out of space on the filesystem backing the Samba disk, have the write fail, not notice, and delete the local file because it 'copied'. That was very bad. Forcing syncs flushed the writes from the Samba server to the NFS fileserver and guaranteed that if the fileserver accepted them there was space in the filesystem (and conversely that if you were out of space the Samba server knew before it replied to the client). All of this is perfectly rational; we ran into a Samba issue, found some configuration options that fixed it, put them in, and even documented them.
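The underlying principle can be sketched in a few lines of Python. This is only an illustration of "do not delete the source until the destination has been synced", not what Windows or Samba actually do; move_after_sync is a hypothetical helper name. The assumption behind it is that on a buffered filesystem such as an NFS mount, an out-of-space error may only be reported when the data is flushed, so the fsync() is what forces the error to show up before the delete.

```python
import os
import shutil
import tempfile


def move_after_sync(src, dst):
    """Copy src to dst and delete src only once dst has been synced.

    Any OSError from write, flush, fsync, or close (for example ENOSPC)
    propagates before os.remove() runs, so the source survives a failed copy.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's buffer down to the kernel
        os.fsync(fout.fileno())  # make deferred write errors surface here
    os.remove(src)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "local.dat")
        dst = os.path.join(tmp, "remote.dat")
        with open(src, "wb") as f:
            f.write(b"important data")
        move_after_sync(src, dst)
        print("source removed:", not os.path.exists(src))
```

A client that skips the sync step and trusts the write call alone is in the position the comment in our smb.conf describes.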

(Maybe there are other configuration options that would have fixed this problem and maybe this problem is not an issue any more on current versions of Samba and everything else in our environment, but remember what I said about us not rewriting Samba configuration files because they're a house of cards.)

This whole thing is a nice illustration of the downside of replicating configuration files when you're setting up new services. Not starting from scratch is a lot faster and may well save you a lot of painful bad experiences, but it can let things slip through that have unpleasant side effects in a new environment. And it's not like you can really avoid this problem without starting from scratch; going through to question and re-validate every configuration setting is almost certainly too mind-numbing to work. Plus there's no guarantee that even a thorough inspection would have caught this issue, since the setting looks perfectly rational unless you've got the advantage of hindsight.
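One modest way to make a partial review less mind-numbing is to pull out only the settings that match some keyword of concern. The Python sketch below does that for smb.conf-style text. It is a rough illustration under several assumptions: the function names are hypothetical, the parser ignores backslash line continuations, and the default keyword ("sync") is one you would only know to pick with the hindsight described above.

```python
def explicit_settings(conf_text):
    """Yield (section, name, value) for each parameter set in smb.conf-style text."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if "=" in line:
            name, value = line.split("=", 1)
            # Collapse whitespace and case so "Strict  Sync" matches "strict sync".
            yield section, " ".join(name.split()).lower(), value.strip()


def flag_for_review(conf_text, keywords=("sync",)):
    """Return the settings whose parameter name contains any of the keywords."""
    return [s for s in explicit_settings(conf_text)
            if any(k in s[1] for k in keywords)]


SAMPLE = """\
[global]
# prevent Windows clients copying files to
# full disks without warning.
strict sync = yes
sync always = yes
server string = example
"""

if __name__ == "__main__":
    for section, name, value in flag_for_review(SAMPLE):
        print(f"[{section}] {name} = {value}")
```

This narrows the list a human has to think about; it does nothing to tell you which keywords matter in the new environment.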
