Wandering Thoughts archives

2015-11-26

Some notes on Apache's suexec

We've recently been wrestling with suexec in an attempt to get it to do something that it seemed like suexec should do. As a result of that learning experience, I feel like writing down some things about suexec. You may also wish to read the official Apache documentation on suexec, but note that you may have to pay close attention to some of the things that it says (and a few things appear to be outright wrong).

Suexec has two modes:

  1. Running /~<user>/... CGIs as the particular user involved. This needs no special extra configuration for suexec and simply happens. Per-user CGIs must be located under a specific subdirectory in the user's Unix home directory, by default public_html; suexec documentation calls this subdirectory name the userdir.

  2. Running CGIs for a virtual host as a particular user and group. This must be configured with the SuexecUserGroup directive. All virtual host CGIs must be located under a specific top level directory, often /var/www by default; suexec documentation calls this directory the docroot.

(Suexec also does various ownership and permissions checks on the CGIs and the directory they are directly in. Those are beyond the scope of these notes.)

The first important thing here is that the suexec docroot and userdir are not taken from the Apache DocumentRoot and UserDir settings; instead, they're hard coded into suexec itself. Any time that suexec logs errors like 'command not in docroot', the docroot it means is not the Apache DocumentRoot you've configured. It pretty much follows that if your Apache settings do not match the hardcoded suexec settings, suexec will thumb its nose at you.

(Also, the only form of UserDir directive that will ever work with suexec is 'UserDir somename'. You cannot use either 'UserDir /some/dir' or 'UserDir /some/*/subdir' with suexec. The suexec documentation notes this.)

The second important thing is that Apache and suexec explicitly distinguish between the two modes based on the incoming request itself, not the final paths involved, and these two modes are exclusive. If you make a request for a CGI via a /~user/... URL, the only thing that matters is whether the eventual path is under the user's home directory plus the suexec userdir. If you make a request to a virtual host with a SuexecUserGroup directive, the only thing that matters is whether the eventual path is under the suexec docroot. In particular, you cannot configure a virtual host for a user, point its DocumentRoot to that user's userdir, and have suexec run CGIs. This path would be perfectly acceptable if the CGIs were invoked via /~user/... URLs, but when they're invoked through a plain virtual host, suexec will reject these requests because the paths aren't under its docroot.

(Mechanically, Apache prefixes the user name it passes to the suexec binary with a ~ if it is a UserDir request. This is undocumented behavior reverse engineered from the code, so you shouldn't count on it.)
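To make the mode split concrete, here's a minimal Python sketch of the decision as described above: a leading ~ on the user name means a UserDir request, which is checked against the user's home directory plus the userdir, and anything else is checked against the single docroot. The function names and parameters are hypothetical stand-ins; the real logic lives in suexec's C source, and the ~ convention is itself undocumented.

```python
import os
import pwd
from typing import Callable, Tuple


def lookup_home(user: str) -> str:
    """Return the user's home directory from the password database."""
    return pwd.getpwnam(user).pw_dir


def suexec_base_dir(target_uname: str, docroot: str, userdir: str,
                    home_of: Callable[[str], str] = lookup_home) -> Tuple[str, str]:
    """Hypothetical helper: pick the directory a CGI must live under.

    docroot and userdir stand in for suexec's compiled-in settings,
    not Apache's DocumentRoot and UserDir directives.
    """
    if target_uname.startswith("~"):
        # /~user/... request: home directory plus the userdir subdirectory.
        user = target_uname[1:]
        return user, os.path.join(home_of(user), userdir)
    # Virtual host (SuexecUserGroup) request: always the one docroot.
    return target_uname, docroot
```

Note that nothing here consults the virtual host's DocumentRoot, which is why pointing a virtual host at someone's userdir doesn't work.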

The third important thing is that suexec ignores symlinks in all of this checking; it uses only the 'real' physical paths, after symlinks have been traversed. As a result you cannot fool suexec by, for example, putting symlinks to elsewhere under what it considers its docroot. However, it is fine for user /etc/passwd entries to include symlinks (as we do); suexec will not be upset by that.

Normally the suexec docroot and userdir are set when suexec is compiled and are fixed afterwards, which obviously creates some problems if you need something different. Debian and Ubuntu provide a second version of suexec that can look these up at runtime from a configuration file (this is the apache2-suexec-custom package). Failing this, well, you'll be arranging (somehow) for all of your virtual hosts to appear under /var/www (or at least all of the ones that need CGIs).

(You can determine the userdir and docroot settings for your suexec with 'suexec -V' as root. You want AP_DOC_ROOT and AP_USERDIR_SUFFIX.)
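If you want to pull those two values out programmatically, here's a rough sketch. It assumes that 'suexec -V' prints lines of the form ' -D NAME="value"' or ' -D NAME=value', and since the output may go to standard error it reads both streams. The default suexec_path of /usr/lib/apache2/suexec is an assumed Debian-style location; check your own system's binary and output before relying on this.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed output format: ' -D NAME="value"', ' -D NAME=value' or ' -D NAME'.
_DEFINE = re.compile(r"^\s*-D\s+(\w+)(?:=(.*))?$")


def parse_suexec_v(output: str) -> Dict[str, Optional[str]]:
    """Turn 'suexec -V' style output into a {name: value} dict."""
    settings: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        m = _DEFINE.match(line)
        if not m:
            continue
        name, value = m.group(1), m.group(2)
        if value is not None:
            value = value.strip()
            # Drop surrounding double quotes from string settings.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
        settings[name] = value
    return settings


def suexec_settings(suexec_path: str = "/usr/lib/apache2/suexec") -> Dict[str, Optional[str]]:
    """Run suexec -V (normally as root) and parse what it reports."""
    proc = subprocess.run([suexec_path, "-V"], capture_output=True, text=True)
    return parse_suexec_v(proc.stdout + proc.stderr)
```

With that, suexec_settings().get("AP_DOC_ROOT") and .get("AP_USERDIR_SUFFIX") give you the two values that matter here.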

Sidebar: what 'command not in docroot' really means

The suexec error 'command not in docroot' is actually generic and is used for both modes of requests. So what suexec means by 'docroot' here is either the actual docroot, for a virtual host request, or the user's home directory plus the userdir subdirectory, for a /~user/... request. Unfortunately you cannot tell from suexec's log messages whether it was invoked for what it thought was a user home directory request or for a virtual host request; that has to be obtained from the Apache logs.

The check is done by a simple brute force method: first, chdir() to the CGI's directory and do a getcwd(). Then chdir() to either the docroot or the user's home directory plus the userdir and do another getcwd(). Compare the two directory paths and fail if the first is not underneath the second. Because it uses getcwd(), all symlinks involved in either path will wind up getting fully expanded.
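Here's a minimal Python sketch of that brute-force check, using os.chdir() and os.getcwd() the same way so that symlinks on either side get expanded. The names physical_path and command_in_docroot are hypothetical. A plain string-prefix comparison would also accept /var/www2 as being under /var/www, so this sketch adds a path-boundary check; it's an assumption, not something established above, that suexec itself does the same.

```python
import os


def physical_path(path: str) -> str:
    """chdir() into path and getcwd(), restoring the previous directory."""
    saved = os.getcwd()
    try:
        os.chdir(path)
        return os.getcwd()  # fully symlink-expanded physical path
    finally:
        os.chdir(saved)


def command_in_docroot(cgi_dir: str, base_dir: str) -> bool:
    """True if cgi_dir physically lies at or under base_dir.

    base_dir is the docroot for a virtual host request, or the user's
    home directory plus the userdir for a /~user/... request.
    """
    cwd = physical_path(cgi_dir)
    dwd = physical_path(base_dir)
    # Prefix match on a path boundary, so /var/www2 isn't 'under' /var/www.
    return cwd == dwd or cwd.startswith(dwd.rstrip("/") + "/")
```

A symlink under the docroot that points elsewhere fails this check because its getcwd() result is the target's real path, while a symlinked docroot (or home directory) is fine because it's expanded in the same way.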

ApacheSuexecNotes written at 01:02:43

2015-11-16

The problems with creating a new template language

One reaction to my entry saying you shouldn't create new templating languages is to ask why this is so. My original entry was written from the perspective of someone who's actually done this, so I just assumed that all of the problems with creating your own were obvious, but this is not necessarily the case. So let's run down the problems here.

When you create a new web templating system, you face a number of problems:

  • You need to design the templating language. Language design is hard and my own experience strongly suggests that you don't want a (too) minimal design. Many of the design decisions you make here will constrain your further steps and what can be done with the templating system, often in ways that are not necessarily obvious until later.

    (A too complicated templating language has its own drawbacks as well, but there are tradeoffs and decisions that depend on the environment that the templating system will be used in. For example, are the people writing the templates also going to be coding the web app, so that you can move complexity from the templating language to the app itself, or are different groups doing each side of things?)

  • You need to design the API for using the templating system: how you specify and load templates, how you expand them, how you provide data used during expansion, and so on. As part of this you will face issues of what happens on various sorts of errors. If your template language has loops or other potentially unbounded constructs, you're going to need to decide how you limit them (or whether you just let template coding errors cause template expansion to run forever).

    One issue that you will want to consider is how expansion strings will or won't be automatically escaped by the templating system. Not doing escaping at all has proven to be extremely dangerous, but at the same time inserting unescaped text is sometimes necessary. HTML has several different contexts that need different sorts of things escaped, then there are URLs, and you may also want your templating system to be useful for more than HTML.

  • You need to actually write all of the code. I hope that by now you see that this is much more sophisticated than just printf expansion of strings; we're talking about a full scale parser and interpreter of your language, which probably has conditions, loops, and so on. In the process of this (if you haven't already done so earlier), you're going to wind up dealing with character set conversion issues.

  • Once you've written the basic template handler there are a bunch of efficiency issues that come into the picture. A good template system does not reparse everything from scratch on every request, which means both pre-parsing things and figuring out how to detect when you need to reload changed template files off disk. Then there are various sorts of caching for template expansion, and perhaps you want some way to generate ETag and Last-Modified information without running a full template expansion (and then often throwing away the result). Can you write out partial template results in chunks to hold down memory requirements, or do you have to fully generate even huge pages in memory before you can start output?

    (And there's the efficiency issue of simply making the code run fast. Profiling and performance tuning code takes work all by itself.)
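As a small illustration of the escaping issue, here's a sketch of context-dependent escaping using only Python's standard library (html.escape and urllib.parse.quote). The context names and the Raw marker class, a way of saying 'insert this unescaped', are hypothetical and not any particular templating system's API.

```python
import html
from urllib.parse import quote


class Raw(str):
    """Hypothetical marker: text the template author says is already safe."""


def escape_for(context: str, value: str) -> str:
    """Escape value for one output context; Raw values pass through as-is."""
    if isinstance(value, Raw):
        return str(value)
    if context == "html_text":
        # Element content: &, < and > must be escaped.
        return html.escape(value, quote=False)
    if context == "html_attr":
        # Quoted attribute values: also escape " and '.
        return html.escape(value, quote=True)
    if context == "url_component":
        # A single path segment or query value: percent-encode everything
        # unsafe, including '/', '&' and '='.
        return quote(value, safe="")
    raise ValueError(f"unknown escaping context: {context!r}")
```

Real templates also nest contexts (a URL inside an HTML attribute, for example), and handling that nesting automatically is one of the harder parts of an escaping design.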

You can write simpler templating systems that skip some or many of these considerations. Some of them are relatively unimportant at small scale (DWiki gets by without any sort of template preparsing) but others may cause you serious security problems if you neglect them. On top of that, there are any number of issues that have proven to be inobvious to people who are writing their first templating system. No matter what scale of templating system you're writing, you can expect to run into problems that you don't even initially recognize or realize that you have.
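For a sense of what template preparsing and reloading changed files involves, here's a rough sketch of one common approach: parse each template file once, cache the parsed form, and reparse only when the file's modification time or size changes. The {{ name }} syntax, parse(), render() and TemplateCache are all hypothetical and deliberately tiny; substituted values are HTML-escaped with html.escape.

```python
import html
import os
import re
from typing import Dict, List, Tuple

# Hypothetical mini-syntax: {{ name }} substitutes a variable.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")

Parsed = List[Tuple[str, str]]  # ("lit", text) or ("var", name)


def parse(text: str) -> Parsed:
    """Pre-parse template text into literal and variable pieces."""
    parts: Parsed = []
    pos = 0
    for m in _VAR.finditer(text):
        if m.start() > pos:
            parts.append(("lit", text[pos:m.start()]))
        parts.append(("var", m.group(1)))
        pos = m.end()
    if pos < len(text):
        parts.append(("lit", text[pos:]))
    return parts


def render(parsed: Parsed, values: Dict[str, object]) -> str:
    """Expand a parsed template, HTML-escaping every substituted value."""
    out = []
    for kind, item in parsed:
        out.append(item if kind == "lit" else html.escape(str(values[item])))
    return "".join(out)


class TemplateCache:
    """Caches parsed templates, reparsing only when the file changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Tuple[int, int], Parsed]] = {}
        self.parse_count = 0  # how many times a file was actually parsed

    def get(self, path: str) -> Parsed:
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == stamp:
            return entry[1]
        with open(path, encoding="utf-8") as f:
            parsed = parse(f.read())
        self.parse_count += 1
        self._entries[path] = (stamp, parsed)
        return parsed
```

Even this toy has decisions baked in: it does a stat() on every request, there's a small race between the stat() and reading the file, and a missing variable raises KeyError. Each of those is the kind of question a real templating system has to answer.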

(I'm not even convinced I know how to design and write a good templating system, and I have the advantage of having done it once already.)

Using an existing templating system instead of writing your own has the great advantage that other people have already worried about and faced all of these issues. If you pick a good templating system, other people should have already invested all of the time and work to come up with good solutions (and they will continue to put effort into things like bug fixes and performance improvements). In fact they may well have solved problems you don't yet realize even exist.

All of that work saving is nice. But there's a deeper reason not to roll your own here:

You are probably not going to do as good a job as existing template systems do.

Writing a good templating system is hard work that takes a lot of specialized knowledge and skill. Unless you put a quite large amount of time into it, your new templating system is very likely to not be as nice as existing templating systems. It will be incomplete and inefficient and limited and possibly buggy and problematic. This should not be surprising, since major templating systems have had a great deal of work put into them by a bunch of smart people. It would be really amazing if you could duplicate that all on your own in a relatively small amount of time.

(Of course you may have the advantage of writing a more focused and narrower templating system than those major templating systems, which tend to be quite general. My personal opinion is that you're probably not going to be making one that's narrow enough to make up all that ground.)

TemplateLanguageProblems written at 01:34:32

2015-11-11

No new web templating languages; use an existing one

Suppose, hypothetically, that you are creating a web application. Let's even suppose that it's a very small and simple one, almost an embarrassingly small one. As part of this app, you need a very little bit of something like a templating system. Not much, just a bit more than printing formatted strings. Clearly you have such a trivial situation that you can just bang together a tiny and simple mini-templating language, right?

Let me save you some time and effort: no. Don't do it. The reality is that we've reached a point in time where writing your own (web) templating language or system is basically guaranteed to be a mistake. I know, you have a trivial application and you don't want to take an external dependency, you hardly need anything, all of the existing templating systems are wrong or too heavyweight, there's a whole list of excuses. Don't accept them. Suck it up, take an external dependency, and use an existing templating system even if it's vast overkill for your problem. Your future self will thank you in a few years.

(I could almost go further than this and maybe I should, but that's another entry.)

All of this especially applies if you have an application that needs more than a trivial templating system; I picked an extreme case because it's where the temptation can be strongest. Writing your own non-tiny templating system today is an especially masochistic exercise because even a basic one is a bunch of work and raises a moderate ton of questions that you're ill-equipped to answer (or even recognize) unless this is not your first templating system.

In hindsight, writing my own 'simple' templating system was one of the mistakes I made when I wrote DWiki (the code that powers Wandering Thoughts). It's been a very educational mistake, but unless you really want to do things the hard way for the experience I can't recommend it.

(Note that rolling your own is not a great learning experience unless you live with the result for a number of years, so that you have plenty of time to run into the lurking problems. Almost anything can look good if you write it, use it briefly, and then abandon it. Many of my painful lessons took years to smack me in the face.)

PS: This assumes that you aren't working in a new language where no one's written a decent templating system. If you are, I think that you should at least steal one of the battle-tested designs from good templating systems in other languages.

(Also, yes, a very few people have very special needs and have to write their own systems. They know who they are.)

NoNewTemplateLanguages written at 01:13:33

