2015-11-26
Some notes on Apache's suexec
We've recently been wrestling with suexec in an attempt to get it to do something that it seemed that suexec would do. As a result of that learning experience, I feel like writing down some things about suexec. You may wish to also read the official Apache documentation on suexec, but note that you may have to pay close attention to some of the things that it says (and a few things appear to be outright wrong).
Suexec has two modes:
- Running /~<user>/... CGIs as the particular user involved. This needs no special extra configuration for suexec and simply just happens. Per-user CGIs must be located under a specific subdirectory in the user's Unix home directory, by default public_html; suexec documentation calls this subdirectory name the userdir.
- Running CGIs for a virtual host as a particular user and group. This must be configured with the SuexecUserGroup directive. All virtual host CGIs must be located under a specific top level directory, by default often /var/www; suexec documentation calls this directory the docroot.
(Suexec also does various ownership and permissions checks on the CGIs and the directory they are directly in. Those are beyond the scope of these notes.)
The first important thing here is that the suexec docroot and
userdir are not taken from the Apache DocumentRoot and UserDir
settings; instead, they're hard coded into suexec itself. Any time
that suexec logs errors like 'command not in docroot', the docroot
it means is not the Apache DocumentRoot you've configured. It
pretty much follows that if your Apache settings do not match the
hardcoded suexec settings, suexec will thumb its nose at you.
(Also, the only form of UserDir directive that will ever work
with suexec is 'UserDir somename'. You cannot use either 'UserDir
/some/dir' or 'UserDir /some/*/subdir' with suexec. The suexec
documentation notes this.)
The second important thing is that Apache and suexec explicitly
distinguish between the two modes based on the incoming request
itself, not the final paths involved, and these two modes are
exclusive. If you make a request for a CGI via a /~user/... URL,
the only thing that matters is if the eventual path is under the
user's home directory plus the suexec userdir. If you make a
request to a virtual host with a SuexecUserGroup directive, the
only thing that matters is if the eventual path is under the suexec
docroot. In particular, you cannot configure a virtual host for
a user, point its DocumentRoot to that user's userdir, and have
suexec run CGIs. This path would be perfectly acceptable if the
CGIs were invoked via /~user/... URLs, but when invoked for a plain
virtual host, suexec will reject these requests because the paths
aren't under its docroot.
(Mechanically, Apache prefixes the user name it passes to the suexec
binary with a ~ if it is a UserDir request. This is undocumented
behavior reverse engineered from the code, so you shouldn't count
on it.)
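To make the two exclusive modes concrete, here is a minimal Python sketch of the decision as I understand it. This is not suexec's actual code; the function name required_root, its parameters, and the home_of lookup callable are all made up for illustration, and the ~ convention is the undocumented behavior mentioned above.

```python
import os

def required_root(user_arg, docroot, userdir_suffix, home_of):
    """Return the directory a CGI must live under for this request.

    user_arg is the user name as Apache hands it over; a leading '~'
    (undocumented behavior) marks a /~user/... request.
    home_of is a hypothetical callable mapping a user name to a home directory.
    """
    if user_arg.startswith("~"):
        # UserDir mode: only the user's home plus the userdir matters.
        username = user_arg[1:]
        return os.path.join(home_of(username), userdir_suffix)
    # Virtual host mode: only the compiled-in docroot matters.
    return docroot
```

The point of the sketch is that the same CGI path gets checked against a different directory depending purely on how the request arrived.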
The third important thing is that suexec ignores symlinks in all
of this checking; it uses only the 'real' physical paths, after
symlinks have been traversed. As a result you cannot fool suexec
by, for example, putting symlinks to elsewhere under what it considers
its docroot. However it is fine for the home directories in user
/etc/passwd entries to include symlinks (as we do); suexec will not
be upset by that.
Normally the suexec docroot and userdir are set when suexec
is compiled and are fixed afterwards, which obviously creates some
problems if you need something different. Debian and Ubuntu provide
a second version of suexec that can look these up at runtime from
a configuration file (this is the apache2-suexec-custom package).
Failing this, well, you'll be arranging (somehow) for all of your
virtual hosts to appear under /var/www (or at least all of the
ones that need CGIs).
(You can determine the userdir and docroot settings for your
suexec with 'suexec -V' as root. You want AP_DOC_ROOT and
AP_USERDIR_SUFFIX.)
Sidebar: what 'command not in docroot' really means
The suexec error 'command not in docroot' is actually generic and is used for both modes of requests. So what suexec means by 'docroot' here is either the actual docroot, for a virtual host request, or the user's home directory plus the userdir subdirectory, for a /~user/... request. Unfortunately you cannot tell from suexec's log messages whether it was invoked for what it thought was a user home directory request or for a virtual host request; that has to be obtained from the Apache logs.
The check is done by a simple brute force method: first, chdir()
to the CGI's directory and do a getcwd(). Then chdir() to either
the docroot or the user's home directory plus the userdir and
do another getcwd(). Compare the two directory paths and fail if
the first is not underneath the second. Because it uses getcwd(),
all symlinks involved in either path will wind up getting fully
expanded.
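The brute force check described here can be sketched in a few lines of Python. This is an illustration of the technique, not a transcription of suexec's C code; the function name command_in_docroot is my own, and comparing on a path component boundary is my choice for the sketch rather than something I'm claiming suexec does.

```python
import os

def command_in_docroot(cgi_dir, root_dir):
    """Return True if cgi_dir is physically at or beneath root_dir.

    root_dir stands in for either the docroot or the user's home
    directory plus the userdir, depending on the request mode.
    """
    saved = os.getcwd()
    try:
        os.chdir(cgi_dir)
        cwd = os.getcwd()    # physical path, symlinks already expanded
        os.chdir(root_dir)
        root = os.getcwd()   # same treatment for the root
    finally:
        os.chdir(saved)      # be polite and restore the caller's directory
    # Compare on a component boundary so /var/wwwx is not "under" /var/www.
    return cwd == root or cwd.startswith(root.rstrip(os.sep) + os.sep)
```

Because both paths come from getcwd(), a symlink under the root that points elsewhere shows up as its real location and fails the check.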
2015-11-16
The problems with creating a new template language
One reaction to my entry saying you shouldn't create new templating languages is to ask why this is so. My original entry was written from the perspective of someone who's actually done this so I just assumed that all of the problems with creating your own were obvious, but this is not necessarily the case. So let's run down the problems here.
When you create a new web templating system, you face a number of problems:
- You need to design the templating language. Language design is
hard and my own experience strongly suggests that you
don't want a (too) minimal design. Many of the design decisions
you make here will constrain your further steps and what can be
done with the templating system, often in ways that are not
necessarily obvious until later.
(A too complicated templating language has its own drawbacks as well, but there are tradeoffs and decisions that depend on the environment that the templating system will be used in. For example, are the people writing the templates also going to be coding the web app, so that you can move complexity from the templating language to the app itself, or are different groups doing each side of things?)
- You need to design the API for using the templating system; how
you specify and load templates, how you expand them, how you
provide data used during expansion, and so on. As part of this
you will face issues of what happens in various sorts of errors.
If your template language has loops or other potentially unbounded
constructs, you're going to need to decide how you limit them (or
if you just let template coding errors cause template expansion
to run forever).
One issue that you will want to consider is how expansion strings will or won't be automatically escaped by the templating system. Not doing escaping at all has proven to be extremely dangerous, but at the same time inserting unescaped text is sometimes necessary. HTML has several different contexts that need different sorts of things escaped, then there's URLs, and you may also want your templating system to be useful for more than HTML.
- You need to actually write all of the code. I hope that by now
you see that this is much more sophisticated than just printf
expansion of strings; we're talking about a full scale parser and
interpreter of your language, which probably has conditions, loops,
and so on. In the process of this (if you haven't already done so
earlier), you're going to wind up dealing with character set
conversion issues.
- Once you've written the basic template handler there are a bunch
of efficiency issues that come into the picture. A good template
system does not reparse everything from scratch on every request,
which means both pre-parsing things and figuring out how to detect
when you need to reload changed template files off disk. Then
there's various sorts of caching for template expansion, and
perhaps you want some way to generate ETag and Last-Modified
information without running a full template expansion (and then
often throwing away the result). Can you write out partial template
results in chunks to hold down memory requirements, or do you have
to fully generate even huge pages in memory before you can start
output?
(And there's the efficiency issue of simply making the code run fast. Profiling and performance tuning code takes work all by itself.)
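As one small illustration of the 'don't reparse on every request' point, here is a minimal Python sketch of a cache that reparses a template file only when its modification time or size changes. The TemplateCache class and the parse callable are hypothetical names for this example, and a real system would have more to worry about (included sub-templates, deleted files, concurrent requests).

```python
import os

class TemplateCache:
    """Cache parsed templates, reparsing when the file changes on disk."""

    def __init__(self, parse):
        self._parse = parse   # hypothetical: turns template text into a parsed form
        self._cache = {}      # path -> ((mtime_ns, size), parsed)

    def get(self, path):
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == stamp:
            return hit[1]     # file unchanged since we last parsed it
        with open(path, encoding="utf-8") as f:
            parsed = self._parse(f.read())
        self._cache[path] = (stamp, parsed)
        return parsed
```

Even this costs a stat() per template per request, which is exactly the sort of tradeoff you wind up having to decide on.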
You can write simpler templating systems that skip some or many of these considerations. Some of them are relatively unimportant at small scale (DWiki gets by without any sort of template preparsing) but others may cause you serious security problems if you neglect them. On top of that, there are any number of issues that have proven to be inobvious to people who are writing their first templating system. No matter what scale of templating system you're writing, you can expect to run into problems that you don't even initially recognize or realize that you have.
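To show why escaping is not a single switch, here is a small Python sketch using only the standard library. The three helper names are mine, picked for illustration; the point is just that the same string needs different treatment depending on where it lands.

```python
import html
from urllib.parse import quote

def escape_html_text(s):
    # Element content: <, > and & must be escaped; quotes can stay.
    return html.escape(s, quote=False)

def escape_html_attr(s):
    # Quoted attribute value: quotes must be escaped as well.
    return html.escape(s, quote=True)

def escape_url_component(s):
    # One URL path or query component: percent-encode everything unsafe,
    # including '/', '?' and '&'.
    return quote(s, safe="")
```

A templating system has to know (or be told) which of these applies at each expansion point, and this sketch doesn't even touch contexts like inline JavaScript or CSS.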
(I'm not even convinced I know how to design and write a good templating system, and I have the advantage of having done it once already.)
Using an existing templating system instead of writing your own has the great advantage that other people have already worried about and faced all of these issues. If you pick a good templating system, other people should have already invested all of the time and work to come up with good solutions (and they will continue to put effort into things like bug fixes and performance improvements). In fact they may well have solved problems you don't yet realize even exist.
All of that work saving is nice. But there's a deeper reason not to roll your own here:
You are probably not going to do as good a job as existing template systems do.
Writing a good templating system is hard work that takes a lot of specialized knowledge and skill. Unless you put a quite large amount of time into it, your new templating system is very likely to not be as nice as existing templating systems. It will be incomplete and inefficient and limited and possibly buggy and problematic. This should not be surprising, since major templating systems have had a great deal of work put into them by a bunch of smart people. It would be really amazing if you could duplicate that all on your own in a relatively small amount of time.
(Of course you may have the advantage of writing a more focused and narrower templating system than those major templating systems, which tend to be quite general. My personal opinion is that you're probably not going to be making one that's narrow enough to make up all that ground.)
2015-11-11
No new web templating languages; use an existing one
Suppose, hypothetically, that you are creating a web application. Let's even suppose that it's a very small and simple one, almost an embarrassingly small one. As part of this app, you need a very little bit of something like a templating system. Not much, just a bit more than printing formatted strings. Clearly you have such a trivial situation that you can just bang together a tiny and simple mini-templating language, right?
Let me save you some time and effort: no. Don't do it. The reality is that we've reached a point in time where writing your own (web) templating language or system is basically guaranteed to be a mistake. I know, you have a trivial application and you don't want to take an external dependency, you hardly need anything, all of the existing templating systems are wrong or too heavyweight, there's a whole list of excuses. Don't accept them. Suck it up, take an external dependency, and use an existing templating system even if it's vast overkill for your problem. Your future self will thank you in a few years.
(I could almost go further than this and maybe I should, but that's another entry.)
All of this especially applies if you have an application that needs more than a trivial templating system; I picked an extreme case because it's where the temptation can be strongest. Writing your own non-tiny templating system today is an especially masochistic exercise because even a basic one is a bunch of work and raises a moderate ton of questions that you're ill-equipped to answer (or even recognize) unless this is not your first templating system.
In hindsight, writing my own 'simple' templating system was one of the mistakes I made when I wrote DWiki (the code that powers Wandering Thoughts). It's been a very educational mistake, but unless you really want to do things the hard way for the experience I can't recommend it.
(Note that rolling your own is not a great learning experience unless you live with the result for a number of years, so that you have plenty of time to run into the lurking problems. Almost anything can look good if you write it, use it briefly, and then abandon it. Many of my painful lessons took years to smack me in the face.)
PS: This assumes that you aren't working in a new language where no one's written a decent templating system. If you are, I think that you should at least steal one of the battle-tested designs from good templating systems in other languages.
(Also, yes, a very few people have very special needs and have to write their own systems. They know who they are.)