Wandering Thoughts archives

2014-09-26

The practical problems with simple web apps that work as HTTP servers

These days there are a number of languages and environments with relatively simple-to-learn frameworks for doing web activity (I am deliberately phrasing that broadly). Both node and Go have such things, for example, and often make a big deal of them.

(I know that 'let's do some web stuff' is a popular Go tutorial topic to show how easy it is.)

All of this makes it sound like these should be good alternatives to CGI for simple problems (especially with their collections of modules and packages and so on). Unfortunately this is not the case in default usage, and one significant part of why not is exactly that these systems are pretty much set up to be direct HTTP servers out of the box.
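
To make 'out of the box' concrete, here is roughly the canonical minimal Go web app; the port and the handler body are arbitrary placeholders, not anything from a real setup:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func hello(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello from %s\n", r.URL.Path)
    }

    func main() {
        http.HandleFunc("/", hello)
        // The program itself is the entire web server,
        // listening directly on its own port.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }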

Being a direct HTTP server is a marvelously simple approach for a web app if and only if you're the only thing running on the web server. If you have a single-purpose web server that exists merely for your one web application, it's great that you can expose the web app directly (and in simple internal setups you don't particularly need the services of Apache or nginx or lighttpd or the like). But this single-purpose web server is very rarely the case for simple CGI-level things. Far more common is a setup where you have a whole collection of both static pages and various simple web applications aggregated together under one web server.

(In general I feel that anything big enough for its own server is too big to be sensible as a 'simple CGI'. Good simple CGI problems are small almost by definition.)

If you try hard you can still make this a single server in Go or node, but you're going to wind up with kind of a mess where you have several different projects glued together, all sharing the same environment with each other (and then there are the static files to serve). If the projects are spread across several people, things will get even more fun. Everything in one bucket is simply not a good engineering answer here. So you need to separate things out, and any way you do that makes for more and more work.

If you separate things out as separate web servers, you need multiple IPs (even if you put them all on the same host) and multiple names to go with them, which are going to be visible to your users. If you separate things out with a frontend web server and reverse proxying, all of your simple web apps have to be written to deal with this (and with the various issues involved in using HTTP as a transport). Both complicate your life, eroding some of the theoretical simplicity you're supposed to get.
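
As a hedged sketch of the reverse proxy option, here is what a minimal Go frontend might look like; the URL prefixes, backend ports, and static file directory are all made up for illustration:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    // proxyTo builds a reverse proxy to one backend app.
    func proxyTo(backend string) *httputil.ReverseProxy {
        u, err := url.Parse(backend)
        if err != nil {
            log.Fatal(err)
        }
        return httputil.NewSingleHostReverseProxy(u)
    }

    func main() {
        // Each simple web app runs as its own backend HTTP
        // server; these addresses are hypothetical.
        http.Handle("/app1/", proxyTo("http://127.0.0.1:9001"))
        http.Handle("/app2/", proxyTo("http://127.0.0.1:9002"))
        // Static pages are served directly by the frontend.
        http.Handle("/", http.FileServer(http.Dir("/var/www")))
        log.Fatal(http.ListenAndServe(":80", nil))
    }

Note that each backend still sees the '/app1/' URL prefix and only the frontend's version of the request; coping with that is exactly the sort of thing the apps now have to be written for.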

(However, Go does have a FastCGI package (as well as a CGI package, but then you're back to CGI), apparently with an API that's a drop-in replacement for the native Go HTTP server. Node appears to have at least one FastCGI module that's said to be more or less a drop-in replacement for its http module. FastCGI does leave you with the general problems of needing daemons, though.)
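
To sketch how drop-in the Go version is, the minimal HTTP server from earlier becomes a FastCGI backend by changing only the serving call; the listening address is an arbitrary choice that the frontend web server would have to be configured to match:

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/http"
        "net/http/fcgi"
    )

    func hello(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello over FastCGI from %s\n", r.URL.Path)
    }

    func main() {
        // The frontend speaks FastCGI to us on this (made up)
        // local address instead of clients talking HTTP to us.
        l, err := net.Listen("tcp", "127.0.0.1:9000")
        if err != nil {
            log.Fatal(err)
        }
        // The handler is unchanged from the plain HTTP version.
        log.Fatal(fcgi.Serve(l, http.HandlerFunc(hello)))
    }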

PS: I'm handwaving the potentially significant difference in programming models between CGI's 'no persistent state between requests' and the shared-context web app model of 'all the persistent state you want (or forget to scrub)'. I will note that the former is much simpler and more forgiving than the latter, even in garbage-collected environments such as Go and Node.

Sidebar: the general issues with daemons

Although it is not specific to systems that want to be direct HTTP servers, the other problem with any sort of separate process model for simple web apps is exactly that it involves separate processes for each app. Separate processes mean that you've added more daemons to be configured, started, monitored and eventually restarted. Also, those daemons will be sitting there consuming resources on your host even if their app is completely unused at the moment.

You can make this easy if you try hard, but today that involves crafting a significant amount of automation, because pretty much no out-of-the-box Unix system is designed for this sort of operation. Building this automation is a not insignificant setup cost for your 'simple' web apps (well, for your first few).

(If you try really hard and have the right programming model you can get apps to be started on demand and stopped when the demand goes away, but this actively requires extra work and complexity in your systems and so on.)

HTTPAppProblem written at 03:09:43

2014-09-25

Why CGI-BIN scripts are an attractive thing for many people

The recent Bash vulnerability has people suddenly talking about CGI-BIN scripts, among other things, and so the following Twitter exchange took place:

@dreid: Don't use CGI scripts. For reals.

@thatcks: My lightweight and simple deployment options are CGI scripts or PHP code. I'll take CGI scripts as the lesser evil.

@eevee: i am pretty interested in solving this problem. what are your constraints

This really deserves more of a reply than I could give on Twitter, so here's my attempt at explaining the enduring attraction of CGI scripts.

In a nutshell, the practical attraction of CGI is that it starts really simple and then lets you make things more elaborate if you need to. Once the web server supports it in general, the minimal CGI script deployment is a single executable file written in the language of your choice. For GET-based CGI scripts, this program runs in a quite simple environment (call it an API if you want) for both looking at the incoming request and dumping out its reply (life is slightly more difficult if you're dealing with POST requests). Updating your CGI script is as simple as editing it or copying in a new version, and your update is guaranteed to take effect immediately; deactivating your script is equally simple. If you're using at least Apache, you can easily give your CGI script a simple authentication system (with HTTP Basic authentication).
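
As one illustration of how small the minimal case can be, here is a complete CGI program written in Go using its standard net/http/cgi package (the 'name' query parameter is just an example); the compiled binary simply gets dropped into a CGI-enabled directory:

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "net/http/cgi"
    )

    func main() {
        // cgi.Serve reads the CGI environment the web server
        // sets up and runs our handler on the single request.
        handler := func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "text/plain")
            fmt.Fprintf(w, "Hello, %s\n", r.FormValue("name"))
        }
        if err := cgi.Serve(http.HandlerFunc(handler)); err != nil {
            log.Fatal(err)
        }
    }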

In the category of easy deployment, Apache often allows you to exercise a lot of control over this process without needing to get the web server administrator to change the global web server configuration. Given .htaccess control you can do things like add your own basic authentication, add access control, and do some URL rewriting. This is part of how CGI scripts allow you to make things more elaborate if you need to. In particular, if your 'CGI script' grows big enough you don't have to stick with a single file; depending on your language there are all sorts of options to expand into auxiliary files and quite complicated systems (my Rube Goldberg lashup is an extreme case).
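
For instance, adding HTTP Basic authentication can be as small as this .htaccess fragment (the realm name and password file path here are hypothetical, and the server must allow these overrides):

    AuthType Basic
    AuthName "My little CGI app"
    AuthUserFile /home/me/.htpasswd
    Require valid-user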

Out of all of the commonly available web application systems (at least on Unix), the only one that has a similar feature list and a similar ease of going from small to large is PHP. Just like CGI scripts you can start with a single PHP file that you drop somewhere and then can grow in various ways, and PHP has a simple CGI-like API (maybe even simpler, since you can conveniently intermix PHP and HTML). Everything else has a more complex deployment process (especially if you're starting from scratch) and often a more complex management process.

CGI scripts are not ideal for large applications, to say the least. But they are great for small, quick things and they have an appealingly simple deployment story for even moderate jobs like a DHCP registration system.

By the way, this is part of the reason that people write CGI scripts in (Bourne) shell. Bourne shell is itself a very concise language for relatively simple things, and if all you are doing is something relatively simple, well, there you go. A Bourne shell script is likely to be shorter and faster to write than almost anything else.

(Expert Perl programmers can probably dash off Perl scripts that are about as compact as that, but I think there are more relatively competent Bourne shell scripters among sysadmins than there are relatively expert Perl programmers.)

CGIAttractions written at 02:04:20

2014-09-14

My current hassles with Firefox, Flash, and (HTML5) video

When I've written before about my extensions, I've said that I didn't bother with any sort of Flash blocking because NoScript handled that for me. The reality turns out to be that I was sort of living a charmed life, one that has recently stopped working the way I want and has forced me into a series of attempts at workarounds.

How I want Flash and video to work is that no Flash or video content activates automatically (autoplay is evil, among other things) but that I can activate any particular piece of content if and when I want to. Ideally this activation (by default) only lasts while the window is open; if I discard the window and revisit the URL, I don't get an autoplaying video or the like. In particular I want things to work this way on YouTube, which is my single most common source of videos (and also my most common need for JavaScript).

For a long time, things worked this way with just NoScript. Then at some point recently this broke down; if I relied only on NoScript, YouTube videos either would never play or would autoplay the moment the page loaded. If I turned on Firefox's 'ask to activate' for Flash, Firefox enabled and disabled things on a site-wide basis (so the second YouTube video I'd visit would autoplay). I wound up having to add two extensions to stop this:

  • Flashblock is the classic Flash blocker. Unlike Firefox's native 'ask to activate', it acts by default on a per-item basis, so activating one YouTube video I watch doesn't auto-play all future ones I look at. To make Flashblock work well I have disabled NoScript's blocking of Flash content so that I rely entirely on Flashblock; this has had the useful side effect of allowing me to turn on Flash elements on various things besides YouTube.

    But recently YouTube added a special trick to their JavaScript arsenal: if Flash doesn't seem to work but HTML5 video does, they auto-start their HTML5 player instead. For me this includes the case where Flashblock is blocking their Flash, so I had to find some way to deal with that.

  • StopTube stops YouTube autoplaying HTML5 videos. With both Flashblock and StopTube active, YouTube winds up using Flash (which is blocked and then enabled on demand by Flashblock). I don't consider this ideal as I'd rather use HTML5, but YouTube is what it is. As the name of this addon sort of suggests, StopTube has the drawback that it only stops HTML5 video on YouTube itself. HTML5 videos elsewhere are not blocked by it, including YouTube videos embedded on other people's pages. So far those embedded videos aren't autoplaying for me, but they may in the future. That might force me to stop whitelisting YouTube for JavaScript (at which point I might almost as well turn JavaScript off entirely in my browser).

    (An energetic person might be able to make such an addon starting from StopTube's source code.)

Some experimentation suggests that I might get back to what I want with just NoScript if I turn on NoScript's 'Apply these restrictions to whitelisted sites too' option for embedded content it blocks. But for now I like Flashblock's interface better (and I haven't been forced into this by being unable to block autoplaying HTML5 video).

There are still unfortunate aspects to this setup. One of them is that Firefox doesn't appear to have an 'ask to activate' (or more accurately 'ask to play') option for its HTML5 video support; this forces me to keep NoScript blocking that content instead of being able to use a nicer interface for enabling it when I want to. It honestly surprises me that Firefox doesn't already do this; it's an obvious feature and is only going to be more and more asked for as more people start using auto-playing HTML5 video for ads.

(See also this superuser.com question and its answers.)

FirefoxFlashVideoHassles written at 01:15:58

2014-09-04

Some other benefits of using non-HTTP frontend to backend transports

A commentator left a very good comment on my entry about why I don't like HTTP as a frontend-to-backend transport, one that points out the security benefits of using a simple protocol instead of a complex one. That makes a good start to talking about the general benefits of using a non-HTTP transport, beyond the basic lack of encapsulation I talked about in that entry.

The straightforward security benefit (as noted by the commentator) is that a simple protocol exposes less attack surface and will lead to implementations that are easier to audit. Very related to this is that full HTTP is a very complicated protocol with many dark corners, especially if people start trying tricky exploits against you. In practice, HTTP used as a transport is unlikely to be full HTTP (and hopefully many of the perverse portions will be sanitized by the frontend); however, just what subset of HTTP gets used is going to be somewhat unpredictable, generally undocumented, and variable (between frontends). As a result, if you're implementing HTTP for a backend you have a problem; put simply, where do you stop? You probably don't want to try to implement the warts-and-all version of full HTTP, if only because some amount of the code you're writing will never get used in practice, but you don't necessarily know where it's safe to stop.

(Related to that is how do you test your implementation, especially its handling of errors and wacky requests? In a sense, real HTTP servers have a simple life here; you can simply expose them on the Internet and see what crazy things clients send you and expect to work.)

A transport protocol, even a relatively complicated one like FastCGI, gives you a definite answer to this. The protocol is much simpler than HTTP and much more bounded; you know what you have to write and what you have to support.

(There is a devil's advocate take on this that I will get to in a sidebar.)

Another pragmatic advantage is that using a separate transport protocol imposes a strong separation between the HTTP URL, host, port, and so on of the original request and the possible TCP port and endpoint of the transport protocol. Your backend software has to work very hard to confuse the two and thus to generate URLs that use the backend information instead of the real frontend request information. By contrast, software behind a HTTP reverse proxy has to take very special care to use the right host, URL, and so on; in many configurations it needs to be specifically configured with the appropriate frontend URL information instead of being able to pull it from the request. This is a perennial issue with software.
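
As a hedged sketch of that special care in Go, assuming the frontend has been configured to set the conventional X-Forwarded-* headers (which is itself part of the configuration burden):

    package webapp

    import "net/http"

    // externalURL tries to reconstruct the URL the client
    // actually used, assuming the frontend sets the conventional
    // X-Forwarded-* headers. If it doesn't, we fall back to the
    // backend's own view of the request, which is exactly how
    // software winds up generating URLs with the backend's host
    // and port in them.
    func externalURL(r *http.Request) string {
        scheme := r.Header.Get("X-Forwarded-Proto")
        if scheme == "" {
            scheme = "http"
        }
        host := r.Header.Get("X-Forwarded-Host")
        if host == "" {
            host = r.Host
        }
        return scheme + "://" + host + r.URL.RequestURI()
    }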

Sidebar: the devil's advocate view of full HTTP complexity

Using a non-HTTP transport protocol doesn't necessarily mean that you avoid all of the complexity of HTTP, because after all your application is still dealing with HTTP requests. What your backend gets to avoid is some amount of parsing the HTTP request and sanitizing it; with some frontend servers you can also avoid handling things like compressing your response in a way the client can deal with. Even under the best of circumstances this still leaves your backend (generally your application and framework) to handle the contents of the HTTP headers and the HTTP request itself. This is where a great deal of the complexity of HTTP resides, and it's relatively irreducible because the headers contain application-level information.

You are also at the mercy of your frontend for how much sanitization is done for you, and this may not be well documented. Is there a header size limit? Does it turn multi-line headers (if a client is perverse enough to send them) into single-line entities or does it leave them with embedded newlines? And so on.

(Let's not even ask about, say, chunked input.)

If you're dealing with the contents of HTTP headers and so on anyways, I think that you can rationally ask if not having to parse the HTTP request is such a great advantage.

NonHTTPTransportBenefits written at 00:07:48

