What is going on with Samba's POSIX locking on NFS on Linux

October 6, 2010

Once we looked at the evidence from our Samba problem, it was relatively easy to come up with a broad theory of what was going wrong. All of the signs pointed to some sort of NFS-related locking clash where Samba was applying multiple locks that it (and the local kernel) did not think clashed with each other, but where the NFS server felt that they did clash and rejected one.

As far as I can see, this is indeed what is going on.

Current versions of Samba both flock() files and (if POSIX locking is enabled) apply fcntl() locks to them. This does not conflict for local files as the two sorts of locks are independent (and this is even documented in the flock(2) manpage).

In old versions of the Linux kernel, flock() didn't work at all over NFS; flock() locks were purely local (where they continued to not conflict with POSIX locks). In Linux 2.6.12, the Linux NFS client was changed to make flock() locks work over NFS by quietly converting flock() locks to server side POSIX locks; this conversion happens in the depths of the kernel NFS client, and the local kernel's general locking layer is unaware of it. Since two POSIX locks can obviously conflict with each other, this conversion means that from 2.6.12 onwards flock() locks conflict with fcntl() locks on NFS filesystems.
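To make the difference concrete, here is a minimal Python sketch (not Samba's actual code) that takes an exclusive flock() and then an exclusive fcntl() lock on the same file through two separate opens. The helper name try_both_locks is made up for illustration. On a local Linux filesystem both locks should be granted; based on the behaviour described above, I would expect the second one to be refused when the path is on an NFS mount from a 2.6.12 or later client, although this sketch has not been checked against Samba's exact locking sequence.

```python
import fcntl
import os
import tempfile


def try_both_locks(path):
    """Take an exclusive flock() and then an exclusive fcntl() lock on path.

    Returns (flock_ok, fcntl_ok). Each lock uses its own open file so the
    two requests are clearly separate. (Hypothetical helper for illustration.)
    """
    results = []
    with open(path, "r+b") as flock_file, open(path, "r+b") as posix_file:
        attempts = (
            (fcntl.flock, flock_file),   # BSD-style whole-file lock
            (fcntl.lockf, posix_file),   # POSIX lock, implemented via fcntl()
        )
        for lock_call, fileobj in attempts:
            try:
                # LOCK_NB: fail immediately instead of blocking on a conflict
                lock_call(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
                results.append(True)
            except OSError:
                results.append(False)
    # Closing the files releases both locks.
    return tuple(results)


if __name__ == "__main__":
    # Uses a temporary file in the default temp directory; pass a path on an
    # NFS mount instead to see how the NFS server treats the same sequence.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(try_both_locks(path))
    finally:
        os.unlink(path)
```

Running it once against a local path and once against a path on an NFS mount is a quick way to check whether a given client and server combination shows the clash.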

Hence the common symptom of the problem: if you upgraded your system such that you crossed over the 2.6.12 version boundary, Samba's dual locking went from non-conflicting to conflicting (on NFS filesystems only) if the Windows client program made just the right sort of locking requests. Evidently Office 2003's 'save in Office 2003 format' code does so, and other programs do not.

(I believe that Samba takes a read flock() when clients open files, so from the symptoms it looks like Office 2003 was trying to acquire a write lock of some sort when you opened or saved the file. I can see why this was changed in subsequent versions of Office.)


Comments on this page:

From 12.192.123.196 at 2010-10-29 16:01:46:

Try adding the following line to the global section of your smb.conf:

nt acl support = no

Worked for us with the same problem you describe. It actually makes sense due to the lack of support for ACLs in NFSv3. Perhaps they will get it to work with NFSv4.

Shawn shawn(dot)stephens(at)gmail(dot)com

By cks at 2010-11-01 14:56:24:

I've now tried this and sadly it doesn't help.

(Having read the Samba code, I'm not terribly surprised by this; my memory is that the flock() is unconditional and the fcntl() is conditional only on POSIX locking being on.)

From 68.149.54.23 at 2011-03-17 22:30:56:

Thank you very much for this information! I've been pulling my hair out for days, and my Google searches led me around in circles. :( I'm doing an upgrade of a server that used to work just fine, even with nfs mounted samba shares, and am having nothing but problems.

Initially it was a problem with roaming profiles failing with errors. I isolated that to a change in the default behavior of strict locking from false to true. Solved that by adding:

strict locking = no

Also learned that oplocks were a bad idea on an nfs mounted share, so I turned all that off:

oplocks = no
kernel oplocks = no
level2 oplocks = no

Not sure if this is still needed, but really, I don't like the idea of oplocks anyway and the files being shared at most of my sites aren't the type that could really benefit from this scheme. Opening an office document shouldn't require locking the file - opening it and checking that the file hasn't changed before you save it back out makes far more sense for the majority of office documents out there and is way easier to manage than dealing with locks. It's only applications that share a file between multiple machines that need things like oplocks and these days, that's a bad software design. Way better to have a proper client/server design than to rely on the filesystem to handle shared access to data files.

I also tried turning off acl support like this:

nt acl support = no

But that didn't solve my problem either. I was researching posix locking when I came across this site! Thanks again! I'll have to wait till morning to confirm this solves the problem, but already I feel good about it! It's the only locking option left, but besides that, your explanation sounds dead on!

...Izzy

Last modified: Wed Oct 6 01:31:09 2010