Wandering Thoughts archives

2017-03-29

The work of safely raising our local /etc/group line length limit

My department has now been running our Unix computing environment for a very long time (which has some interesting consequences). When you run a Unix environment over the long term, old historical practices slowly build up and get carried forward from generation to generation of the overall system, because you've probably never restarted everything from complete scratch. All of this is an elaborate way of saying that as part of our local password propagation infrastructure, we have a program that checks /etc/passwd and /etc/group to make sure they look good, and this program puts a 512-byte limit on the size of lines in /etc/group. If it finds a group line longer than that, it complains and aborts, and you get to fix it.

(Don't ask what our workaround is for groups with large memberships. I'll just say that it raises some philosophical and practical questions about what group membership means.)
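The line length check described above is easy to picture. What follows is a minimal Python sketch of that kind of check, not the actual program; the names `overlong_group_lines` and `GROUP_LINE_LIMIT` are made up for illustration, and it assumes the limit is counted in bytes of the line without its trailing newline.

```python
# Hypothetical sketch of an /etc/group line length check.
# GROUP_LINE_LIMIT and overlong_group_lines are illustrative names,
# not taken from any real tool.

GROUP_LINE_LIMIT = 512  # bytes, the limit discussed in the text


def overlong_group_lines(group_text, limit=GROUP_LINE_LIMIT):
    """Return (group name, size in bytes) for every line longer than limit."""
    problems = []
    for line in group_text.splitlines():
        # Assumption: the limit applies to the encoded byte length,
        # not the character count.
        size = len(line.encode("utf-8"))
        if size > limit:
            name = line.split(":", 1)[0]
            problems.append((name, size))
    return problems


if __name__ == "__main__":
    with open("/etc/group", encoding="utf-8", errors="replace") as f:
        for name, size in overlong_group_lines(f.read()):
            print("group %s: line is %d bytes" % (name, size))
```

In a sketch like this, raising the limit is a one-argument change (`limit=4096`), which is exactly why the change looks more trivial than it is.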

We would like to remove this limit; it makes our life more complicated in a number of ways, causes problems periodically, and we're pretty sure that it's no longer needed and probably hasn't been needed for years. So we should just take that bit of code out, or at least change the '> 512' to '> 4096', right?

Not so fast, please. We're pretty sure that doing so is harmless, but we're not certain. And we would like to not blow up some part of our local environment by mistake if it turns out that there is still something around here that gets heartburn from long /etc/group lines. So in order to remove the limit we need to test to make sure everything still works, and one of the things that this has meant is sitting down and trying to think of all of the places in our environment where something could go wrong with a long group line. It turned out that there are a number of these places:

  • Linux could fail to properly recognize group membership for people in long groups. I rated this as unlikely, since the glibc people are good at avoiding small limits and relatively long group lines are an obvious thing to think about.

  • OmniOS on our fileservers could fail to recognize group membership. Probably unlikely too; the days when people put 512-byte buffers or the like into getgrent() and friends are likely to be long over by now.

    (Hopefully those days were long over by, say, 2000.)

  • Our Samba server might do something special with group handling and so fail to properly deal with a long group, causing it to think that someone wasn't a member or deny them access to group-protected files.

  • The tools we use to build an Apache format group file from our /etc/group could blow up on long lines. I thought that this was unlikely too; awk and sed and so on generally don't have line length limitations these days.

    (They did in the past, in one form or another, which is probably part of why we had this /etc/group line length check in the first place.)

  • Apache's own group authorization checking could fail on long lines, either completely or just for logins at the end of the line.

  • Even if they handled regular group membership fine, perhaps our OmniOS fileservers would have a problem with NFS permission checks if you were in more than 16 groups and one of your extra groups was a long group, because this case causes the NFS server to do some additional group handling. I thought this was unlikely, since the code should be using standard OmniOS C library routines and I would have verified that those worked already, but given how important NFS permissions are for our users I felt I had to be sure.
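For the C library cases in the list above, one way to test is to put a deliberately long group into a test machine's /etc/group and then compare what the line says with what the system reports. Here is a minimal Python sketch of that comparison. It assumes that Python's standard `grp.getgrnam()` reflects what the local C library returns, and the helper names (`make_test_group_line`, `members_in_line`, `missing_members`) are invented for this example. It says nothing about Samba, Apache, or NFS behaviour, which need their own tests.

```python
# Hypothetical sketch: does the system see every member of a long group?
# All function names here are illustrative.


def make_test_group_line(name, gid, members):
    """Build an /etc/group style line for a synthetic test group."""
    return "%s:x:%d:%s" % (name, gid, ",".join(members))


def members_in_line(line):
    """Return the member logins listed in one /etc/group line."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 4:
        raise ValueError("not a four-field group line")
    return [m for m in fields[3].split(",") if m]


def missing_members(line, lookup=None):
    """Return members in the line that the group lookup does not report.

    By default the lookup is grp.getgrnam, which asks the system's
    group database; a different lookup can be passed in for testing.
    """
    if lookup is None:
        import grp  # Unix only
        lookup = grp.getgrnam
    name = line.split(":", 1)[0]
    seen = set(lookup(name).gr_mem)
    return [m for m in members_in_line(line) if m not in seen]
```

An empty result from `missing_members()` for a line well over 512 bytes is the hoped-for outcome. The interesting failure mode is a result that contains only the logins from the end of the line.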

(I was already confident that our local tools that dealt with /etc/group would have no problems; for the most part they're written in Python and so don't have any particular line length or field count limitations.)
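To illustrate why line length is a non-issue for tools of this sort, here is a minimal Python sketch of turning /etc/group content into Apache format group file lines (`groupname: user1 user2 ...`, the format used by Apache's `AuthGroupFile`). This is not the awk and sed tooling mentioned above. The function name `apache_group_lines`, the optional `wanted` filter, and the decision to skip groups with no listed members are all assumptions made for the example.

```python
# Hypothetical sketch: convert /etc/group text to Apache group file lines.
# apache_group_lines and its 'wanted' parameter are illustrative names.


def apache_group_lines(group_text, wanted=None):
    """Return Apache group file lines for the groups in group_text.

    wanted, if given, is a collection of group names to include.
    Groups with no listed members are skipped (an assumption).
    """
    out = []
    for line in group_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 4:
            continue  # malformed line; a real tool might complain instead
        name = fields[0]
        members = [m for m in fields[3].split(",") if m]
        if wanted is not None and name not in wanted:
            continue
        if not members:
            continue
        out.append("%s: %s" % (name, " ".join(members)))
    return out
```

Nothing here depends on how long a line is or how many members a group has. Python strings and lists grow as needed, so the output line for a large group is simply as long as it needs to be. Whether Apache itself copes with that line is a separate question.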

It's probably worth explicitly testing Linux tools like useradd and groupadd to make sure that they have no problems manipulating group membership in the presence of long /etc/group lines. I can't imagine them failing (just as I didn't expect the C library to have any problems), but that just means it would be really embarrassing if they turned out to have some issue and I hadn't checked.

All of this goes to show that getting rid of bits of the past can be much more work and hassle than you'd like. And it's not particularly interesting work, either; it's all dotting i's and crossing t's just in case, testing things that you fully expect to just work (and that have just worked so far). But we've got to do this sometime, or we'll spend another decade with /etc/group lines limited to 512 bytes or less.

(System administration life is often not particularly exciting.)

sysadmin/GroupSizeIncreaseWorries written at 01:57:29;

