2017-03-29
The work of safely raising our local /etc/group line length limit
My department has now been running our Unix computing environment
for a very long time (which has some interesting consequences). When you run a Unix environment over the
long term, old historical practices slowly build up and get carried
forward from generation to generation of the overall system, because
you've probably never restarted everything from complete scratch.
All of this is an elaborate route to say that as part of our local
password propagation infrastructure, we have a program that checks
/etc/passwd and /etc/group to make sure they look good, and this
program puts a 512 byte limit on the size of lines in /etc/group.
If it finds a group line longer than that, it complains and aborts
and you get to fix it.
(Don't ask what our workaround is for groups with large memberships. I'll just say that it raises some philosophical and practical questions about what group membership means.)
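As an illustration of the kind of check involved, here is a minimal sketch in Python. It is not the actual program; the function name, the return format, and the choice to measure each line in bytes without its trailing newline are all assumptions made for the example.

```python
MAX_GROUP_LINE = 512  # bytes; the limit described above


def overlong_group_lines(text, limit=MAX_GROUP_LINE):
    """Return (line number, group name, byte length) for each line over the limit.

    Hypothetical helper: the real checker's interface is not shown in this entry.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        size = len(line.encode("utf-8"))  # length in bytes, newline excluded
        if size > limit:
            group_name = line.split(":", 1)[0]
            problems.append((lineno, group_name, size))
    return problems


if __name__ == "__main__":
    with open("/etc/group", encoding="utf-8", errors="replace") as f:
        for lineno, name, size in overlong_group_lines(f.read()):
            print(f"/etc/group line {lineno} ({name}) is {size} bytes")
```

In a sketch like this, raising the limit is a one-argument change (limit=4096), which is exactly why the change looks so tempting.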
We would like to remove this limit; it makes our life more complicated
in a number of ways, causes problems periodically, and we're pretty
sure that it's no longer needed and probably hasn't been needed for
years. So we should just take that bit of code out, or at least
change the '> 512' to '> 4096', right?
Not so fast, please. We're pretty sure that doing so is harmless,
but we're not certain. And we would like to not blow up some part
of our local environment by mistake if it turns out that actually
there is still something around here that has heartburn on long
/etc/group lines. So in order to remove the limit we need to
test to make sure everything still works, and one of the things that
this has meant is sitting down and trying to think of all of the
places in our environment where something could go wrong with a
long group line. It's turned out that there were a number of these
places:
- Linux could fail to properly recognize group membership for people
in long groups. I rated this as unlikely, since the glibc people
are good at avoiding small limits and relatively long group lines
are an obvious thing to think about.
- OmniOS on our fileservers could fail to recognize group membership.
Probably unlikely too; the days when people put 512-byte buffers
or the like into getgrent() and friends are likely to be long over
by now. (Hopefully those days were long over by, say, 2000.)
- Our Samba server might do something special with group handling and
so fail to properly deal with a long group, causing it to think that
someone wasn't a member or to deny them access to group-protected files.
- The tools we use to build an Apache format group file from our
/etc/group could blow up on long lines. I thought that this was
unlikely too; awk and sed and so on generally don't have line length
limitations these days. (They did in the past, in one form or another,
which is probably part of why we had this /etc/group line length
check in the first place.)
- Apache's own group authorization checking could fail on long
lines, either completely or just for logins at the end of the
line.
- Even if they handled regular group membership fine, perhaps our OmniOS fileservers would have a problem with NFS permission checks if you were in more than 16 groups and one of your extra groups was a long group, because this case causes the NFS server to do some additional group handling. I thought this was unlikely, since the code should be using standard OmniOS C library routines and I would have verified that those worked already, but given how important NFS permissions are for our users I felt I had to be sure.
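For the Apache group file item, the conversion itself is simple enough to sketch. The following is a hypothetical Python version, not the awk and sed based tools mentioned above. It assumes the target is the plain-text format read by Apache's AuthGroupFile directive (group name, a colon, then space-separated logins), and the function name is made up for the example.

```python
def to_apache_group_line(group_line):
    """Convert 'name:passwd:gid:u1,u2' into Apache's 'name: u1 u2' form.

    Hypothetical helper; assumes a well-formed four-field /etc/group line.
    """
    name, _passwd, _gid, members = group_line.rstrip("\n").split(":", 3)
    logins = [m for m in members.split(",") if m]  # drop empty entries
    return name + ": " + " ".join(logins)
```

A useful test for a long line is to check that the last login on the /etc/group line is still the last login on the converted line, since truncation would most likely eat the end.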
(I was already confident that our local tools that dealt with
/etc/group would have no problems; for the most part they're
written in Python and so don't have any particular line length or
field count limitations.)
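One way to check the C library items on a given machine is to compare what a group line says against what the system's group lookup reports. The sketch below uses Python's standard grp module, which goes through the platform's getgrnam(); the helper names are hypothetical, and it assumes the group comes only from the local file rather than being merged from several name service sources.

```python
import grp


def parse_group_line(line):
    """Split one /etc/group line into (name, set of member logins)."""
    name, _passwd, _gid, members = line.rstrip("\n").split(":", 3)
    return name, {m for m in members.split(",") if m}


def missing_members(line, lookup=grp.getgrnam):
    """Return the logins on the line that the lookup does not report.

    'lookup' defaults to the real C library lookup; it is a parameter
    only so the comparison can be exercised without touching /etc/group.
    """
    name, expected = parse_group_line(line)
    seen = set(lookup(name).gr_mem)
    return sorted(expected - seen)
```

If a library truncated long lines, the logins from the end of the line would show up in the result; an empty list means every listed login was seen.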
It's probably worth explicitly testing Linux tools like useradd and
groupadd to make sure that they have no problems manipulating group
membership in the presence of long /etc/group lines. I can't
imagine them failing (just as I didn't expect the C library to have
any problems), but that just means it would be really embarrassing
if they turned out to have some issue and I hadn't checked.
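Tests like these need a long group line to exist, ideally on a scratch machine rather than in production. Here is a minimal sketch of generating one; the function name, the made-up login prefix, and the example group name and GID are all assumptions, and the generated logins would have to correspond to real test accounts for membership checks to mean anything.

```python
def make_long_group_line(name, gid, target_bytes, prefix="tuser"):
    """Build a group line of at least target_bytes bytes from made-up logins.

    Hypothetical test fixture; all generated text is ASCII, so the
    character count equals the byte count.
    """
    header = f"{name}:x:{gid}:"
    members = []
    line = header
    i = 0
    while len(line) < target_bytes:
        members.append(f"{prefix}{i:04d}")
        line = header + ",".join(members)
        i += 1
    return line


if __name__ == "__main__":
    # Just past the old 512 byte limit; append to a scratch group file by hand.
    print(make_long_group_line("longgrp", 9999, 600))
```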
All of this goes to show that getting rid of bits of the past can
be much more work and hassle than you'd like. And it's not particularly
interesting work, either; it's all dotting i's and crossing t's
just in case, testing things that you fully expect to just work
(and that have just worked so far). But we've got to do this sometime,
or we'll spend another decade with /etc/group lines limited to
512 bytes or less.
(System administration life is often not particularly exciting.)