Wandering Thoughts archives

2012-03-25

Atom feeds and constrained (web) environments

Recently, Aristotle Pagaltzis noticed that I had misspelled the filename of this entry. When I renamed it to fix this, the WanderingThoughts Atom feed repeated the entry (and the comments feed repeated the comments on it); this led to a discussion on Atom entry IDs between Aristotle and me that I am now going to surface as an actual entry so I can discuss DWiki's problem with Atom entries at length.

Atom is in general a nice feed format, but it has one awkward requirement: it absolutely demands that you give each of your feed entries a globally unique ID. Anything that parses an Atom feed is entitled to assume that a new ID means a new entry, regardless of anything else (eg, the text exactly duplicating another entry). This requires per-entry metadata, and metadata is one of the deep problems of file-based content engines. Aristotle suggested:

But isn't it feasible to append a header (well, footer) line to an article file containing an ID, the first time the file is found not to have one?

Part of the problem is the practical difficulties of doing this. For instance, you need some sort of locking so that two simultaneous requests for the Atom feed do not both attempt to invent the Atom ID for an entry, and then you get to worry about the user also editing the file at the same time. All of these difficulties are why I would require an explicit 'publication' step if I were writing a new file-based blog engine (I discussed this here).
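
To make those difficulties concrete, here's a rough Python sketch of what the careful version would look like on a Unix system. The 'Atom-ID:' footer format and the lock file convention are invented for illustration, and note that the lock still does nothing about the user editing the file at the same moment:

import fcntl
import uuid

def ensure_atom_id(entry_path):
    # Serialize ID assignment through a lock file so that two
    # simultaneous feed requests can't both invent an ID.
    with open(entry_path + '.lock', 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(entry_path, 'r+') as f:
            for line in f.read().splitlines():
                if line.startswith('Atom-ID:'):
                    return line.split(':', 1)[1].strip()
            # No ID yet: invent one and append it as a footer line.
            # The lock only covers the engine itself; a user editing
            # the file at the same moment can still lose this write.
            new_id = 'urn:uuid:%s' % uuid.uuid4()
            f.write('\nAtom-ID: %s\n' % new_id)
            return new_id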

Beyond those problems, DWiki (the code behind WanderingThoughts) operates in a uniquely constrained environment; it was written to only require read access to the files that it was serving. Partly this is because it might run as a different user (for example, the web server user), and partly this is because I don't like to give web applications that much power and freedom; it's much easier to feel confident about code that writes things only in very constrained and limited ways. Beyond DWiki's specific circumstances I think that this is a good constraint to assume in general for a file-based system, because modifying files on the fly plays badly with things like keeping the files for your website or blog in a VCS repository (which is one of the big attractions of a file-based engine).

In this sort of environment you simply don't have a unique ID for entries. There is nothing that exists in the filesystem that you can safely use, and you have no way to make up an ID yourself and firmly associate it with the entry. About the best you can do is use the filename as the unique ID and hope that it changes only rarely. This is pretty much what DWiki does, and that means that on the rare occasions when I have to rename an entry, I violate the Atom spec.
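
As a purely illustrative sketch, a filename-based ID might be a tag: URI formed like this; the domain and date here are placeholders, and this isn't necessarily DWiki's exact scheme:

def atom_id_for(page_path):
    # Illustration only: the ID is a pure function of the file's
    # path, so renaming the file silently changes the ID.
    return 'tag:example.org,2012:/%s' % page_path

print(atom_id_for('blog/web/AtomConstrainedEnvironments'))
# -> tag:example.org,2012:/blog/web/AtomConstrainedEnvironments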

Sidebar: doing better with more work

It's possible to do somewhat better than just using the filename as part of the ID. Ignoring the locking issues for the moment, what you need to do is make up an ID the first time you see an entry and then record the file-to-ID association in a separate file. Using a separate file avoids all of the issues with updating the entry itself, and still allows the user to correct the mapping by hand if they ever have to rename an entry's file.
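
A minimal sketch of that mapping file (ignoring the locking, as promised); note that even this version needs write access to one file, which collides with DWiki's read-only constraint:

import uuid

def load_id_map(map_path):
    # One mapping per line: '<file path> <atom id>'.
    mapping = {}
    try:
        with open(map_path) as f:
            for line in f:
                fname, atom_id = line.split(None, 1)
                mapping[fname] = atom_id.strip()
    except IOError:
        pass
    return mapping

def id_for(fname, mapping, map_path):
    # Invent and record an ID the first time we see a file; renaming
    # the file just means fixing one line in the mapping by hand.
    if fname not in mapping:
        mapping[fname] = 'urn:uuid:%s' % uuid.uuid4()
        with open(map_path, 'a') as f:
            f.write('%s %s\n' % (fname, mapping[fname]))
    return mapping[fname]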

AtomConstrainedEnvironments written at 01:11:33

2012-03-14

The right way to do wikitext transitions

Suppose that you have a wiki and for some reason you really need to change the wikitext dialect that it accepts (sadly this is not always hypothetical). As I once alluded to in a parenthetical aside, there is a right way to do this, one that will not make people swear off your software. As part of this you can make a small change to your wiki engine that will make all sorts of transitions much easier and thus make people happier with your markup language.

To put it simply, the wrong way to do wikitext transitions is anything that does not use your normal wikitext rendering engine. The right way is to use your regular wikitext rendering engine, but instead of having it output HTML, have it output your new wikitext markup. Using your regular engine means that the conversion process interprets the wikitext exactly as it usually gets displayed; you never have a case where the conversion thinks the wikitext markup means one thing but it actually means another.

(You are also quite likely to have a complete conversion, since the rendering engine is itself a natural checklist of all of your markup. And if you miss some markup that's actually used, you can spot it from unexpected HTML in the output.)
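
To sketch what I mean (with invented node types and invented markup; a real engine's parse tree is much richer), the new dialect is just another output backend over the same parsed form:

def render(nodes, backend):
    # 'nodes' is the parsed form of the wikitext; each backend maps
    # a node type to its output text.
    return ''.join(backend[kind](text) for kind, text in nodes)

html_backend = {
    'text': lambda t: t,
    'bold': lambda t: '<b>%s</b>' % t,
    'link': lambda t: '<a href="%s">%s</a>' % (t, t),
}

# Converting to the new dialect is just a different backend over
# the exact same parse:
newwiki_backend = {
    'text': lambda t: t,
    'bold': lambda t: '**%s**' % t,
    'link': lambda t: '[[%s]]' % t,
}

nodes = [('text', 'see '), ('link', 'SomePage'), ('text', ' '), ('bold', 'now')]
print(render(nodes, html_backend))     # the usual HTML
print(render(nodes, newwiki_backend))  # the same parse, new markup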

So why don't wiki authors routinely do this? My guess is that many wikis don't actually have rendering engines with real parsers, but instead mostly use regular expression based progressive rewrites of the input text. Such progressive rewrites are relatively easy for wikitext to HTML because your output format is generally hard to confuse with your input format (which means that you don't run the risk of accidentally reprocessing already fully processed output). They are not as easy with wikitext to wikitext, because here your output format is easily confused with as-yet-unprocessed input.

(This is the old general regular expression problem of wanting to rename A to B at the same time that you rename B to A.)
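
You can see both the problem and the standard escape from it, doing all of the rewrites in a single pass, in miniature:

import re

s = "A likes B, and B likes A"

# Two sequential rewrites go wrong because the second rewrite
# re-matches text that the first one just produced:
print(s.replace('A', 'B').replace('B', 'A'))
# -> "A likes A, and A likes A"

# One combined pattern in a single pass never re-reads its output:
swap = {'A': 'B', 'B': 'A'}
print(re.sub(r'\b[AB]\b', lambda m: swap[m.group(0)], s))
# -> "B likes A, and A likes B"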

A closely related way to make people happy with you is to have some way to dump out raw (untemplated) HTML for wikitext pages. People like this because it makes migrating away from your wikitext engine much simpler. Content in plain HTML is extremely portable and relatively easy to put into something else; the HTML that your wiki outputs for actual pages is not so much, because it is ornamented with navigation, sidebars, and so on. Also, when you have a specific 'output plain HTML' mode you can easily make it walk all wikitext pages for people instead of forcing them to crawl their site.

(This is on my mind lately because we are staring at this issue; we have a MoinMoin wiki that we need to turn into something else, and extracting the content in some usable form is clearly going to be a pain.)
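
The 'walk all pages' mode doesn't have to be anything fancy. Here's a sketch, where render_body is a stand-in for however your particular engine renders a single page to bare, untemplated HTML:

import os

def dump_site(srcdir, outdir, render_body):
    # render_body(page) must return the page's HTML with no
    # navigation, sidebars, or other template decoration.
    for dirpath, _, files in os.walk(srcdir):
        for fname in files:
            page = os.path.relpath(os.path.join(dirpath, fname), srcdir)
            dest = os.path.join(outdir, page + '.html')
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, 'w') as out:
                out.write(render_body(page))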

I understand that some wikitext engines can import sufficiently plain and straightforward HTML and turn it into wiki markup (eg, I believe there is software to do this for Markdown). I consider this going above and beyond the call of duty for a wiki, but if you want to do it and can do it well it'll certainly be appreciated. If you support both simple HTML output and simple HTML input, try to make sure that doing a round trip doesn't change the markup (because sooner or later some joker will try it, just to see what happens).
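
The round trip property is at least easy to state as a check, with to_html and from_html standing in for the hypothetical plain HTML output and input sides:

def check_round_trip(to_html, from_html, wikitext):
    # Converting wikitext to plain HTML and then importing that
    # HTML back should hand you your original markup.
    assert from_html(to_html(wikitext)) == wikitext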

GoodWikiTextTransitions written at 21:34:45

2012-03-07

How not to use Apache's ProxyPass directive

Periodically we need to set up reverse proxies with Apache's ProxyPass directive (to support our solution to the multiuser PHP problem). On the surface, doing this is fairly simple and straightforward; however, the important devil is in this spotlighted bit of the documentation:

If the first argument ends with a trailing /, the second argument should also end with a trailing / and vice versa. Otherwise the resulting requests to the backend may miss some needed slashes and do not deliver the expected results.

Since I have now thoroughly stubbed my toe on this, here are several ways not to use ProxyPass for this, all of which fall afoul of the above warning (some in less than obvious ways).

To start with, the basic template of ProxyPass is 'ProxyPass /a/path http://somewhere/else'. When Apache sees any URL that starts with /a/path, it removes /a/path from the front of the URL, puts whatever remains on the end of the second URL, and tries to fetch the resulting URL.
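
To make the substitution concrete with those placeholder names: given

ProxyPass /a/path http://somewhere/else

a request for /a/path/x.html has /a/path stripped off, and the leftover /x.html is glued onto the second argument, giving http://somewhere/else/x.html.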

In all of the following examples, we want /url/ to be reverse proxied as a directory; the target has a page at the top level with a relative link to a.html.

First mistake:

ProxyPass /url/ http://localhost:8080

The top level page works and the link to a.html shows as a link to /url/a.html, but attempts to follow the link fail with Apache being unable to fetch the URL http://localhost:8080a.html. This shows that Apache is effectively forming the URL by text substitution and then interpreting it later; because there is no / at the end of the second argument, it simply glued the raw text of everything past /url/ onto it and the result fails badly.

(This also doesn't do anything to handle a request for just '/url', but one can get around that with other tricks.)

Second mistake:

ProxyPass /url http://localhost:8080

If you request /url/ everything works. But if you request just /url you still get the page (instead of a redirection to the version with a / on the end) and the relative link to a.html comes out as a link to /a.html (which doesn't exist and in any case is not reverse proxied) instead of /url/a.html, because your browser sees /url as a page in / instead of a directory.

This case is the tricky case because it's not obvious that we're breaking the rule from the documentation; after all, everything looks right since neither argument ends with a /. The problem is that when you make a bare request for http://localhost:8080, as you do when you ask for '/url', Apache implicitly adds a / on the end (because it has to; it must GET something from the server at localhost:8080). This implicit / means you have a / on the end of the second argument but not on the end of the first argument and have thus broken the rule.

My belief is that there is no simple way for whatever is behind the reverse proxy to fix this. Without peeking at special request headers that Apache reverse proxying supplies, it cannot tell whether a request for / is from someone who asked for '/url/' (and is okay) or someone who asked for '/url' (and should get redirected to /url/).

Third mistake:

ProxyPass /url http://localhost:8080/

If you ask for /url/ or anything under /url/, the reverse proxied web server receives a request for the (local) URL // or something that starts with that. Many web servers are unhappy about this. If you ask for just /url you get a page, but the relative links on the page are broken as before because it's still not redirected to /url/.

(However, now a suitably crazy web app can actually tell the difference between the two requests.)

As far as I can tell the only proper way to use ProxyPass in this situation is as follows:

ProxyPass /url/ http://localhost:8080/

This follows the rules and does not result in doubled /'s. It doesn't handle requests for /url at all, but I believe that you can arrange for /url to be redirected to /url/ by having a real /url directory in an appropriate place in your filesystem.
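
(Alternately, and untested by me, mod_alias ought to be able to issue the redirect directly with something like:

RedirectMatch ^/url$ /url/

without needing a real directory in the filesystem.)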

(In our environment most of these redirections are for user home pages, where /~user will already get redirected appropriately.)

ApacheProxyPass written at 02:24:29

2012-03-05

Web frameworks should be secure by default

In reaction to the recent GitHub Rails vulnerability, raganwald wrote in part:

The Rails team went with the original Rails perspective on this: Rails developers are required to act like adults and be careful when working with sharp tools.

I understand why this view of insecure-by-default is popular, but it's making a fundamental mistake. The problem with the 'insecurity as a sharp tool' view is that insecurity is not like the other sharp tools in Rails' toolbox. When you misuse or misapply one of those other sharp tools, you're very likely to find out right away because your webapp doesn't work right; things break, assuming that you actually test your features. You know that you've cut yourself and you need to fix it. With security issues, it's not at all obvious that you've just cut yourself on the edge of a sharp tool, because nothing obvious breaks or goes wrong. Security bugs are not like normal bugs; this is a large part of why they are hard to find.

(To extend the metaphor beyond the breaking point, default insecurity is not just a sharp tool, it is a sharp tool coated with an anti-coagulant and a numbing agent. Most people wouldn't want to work with such a thing.)

The real world argument for frameworks being secure by default is pragmatics. We pretty much know how most people develop web applications using frameworks; 90% or more of them do just enough to get their application working and tested (and many apps will not be fully tested, especially things like error paths). Only a very few people will go back and carefully audit their application for security issues (or probably even read your security documentation unless it's shoved in their face). If your framework ships as insecure by default, most of your developers will never notice and many of the web applications written in it will be insecure. This is not theoretical; as we've just seen with GitHub and Rails, it's what really happens, even with smart people and very good teams, and it happens over and over again. You cannot argue that people should be smarter and better than that, not if you are designing a framework for real people, because we know that they aren't.

(One fundamental reason it keeps happening is that security is almost always an overhead, not a selling feature. Teams are under relentless pressure to prioritize for features, because that's what people care about and can actually see. If they have to remember to do extra things for security, sooner or later those things are going to be overlooked in the rush to produce.)

FrameworksDefaultSecure written at 20:14:49

Convenience in web frameworks is often insecure

For those of you who have not heard, GitHub was compromised today, or more exactly a long-standing vulnerability was demonstrated today. They were compromised because of a feature in Rails called 'mass assignment' that by default allows web operations to update any field of the model record.

(I don't know enough Rails to confidently say what web operations in specific allow this, although some sources suggest PUT operations. I also don't know if they're tied to forms or can be submitted just out of the blue.)
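
To show the general shape of the problem, here's a sketch in Python rather than Rails (since I don't know the Rails specifics); the model and the form fields are invented:

class User(object):
    def __init__(self):
        self.name = 'guest'
        self.is_admin = False

def mass_assign(model, form):
    # The convenient version: copy every submitted field onto the
    # model. Nothing visibly breaks when you test your own forms,
    # but an attacker can submit fields you never put in any form.
    for field, value in form.items():
        setattr(model, field, value)

u = User()
mass_assign(u, {'name': 'mallory', 'is_admin': True})
print(u.is_admin)   # True: the attacker just promoted themselves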

No doubt there are a certain number of people who are now pointing and laughing at Rails. They probably shouldn't be. Lots of web frameworks have a terrible history of this sort of vulnerability, because of a core conflict: convenience in frameworks is often insecure, by the nature of how web frameworks achieve convenience.

Fundamentally, a great deal of the convenience of frameworks comes from not having to say things; this is the mantra of 'convention over configuration'. The problem is that being secure invariably requires saying things, one way or another (either to allow access to some things when the default is no access or to block access to some things when the default is access to all). Thus the most 'convenient' way for a framework to operate, the one that requires saying the least, is to be insecure by default. This doesn't require your users to say anything unless they notice the security issues and care, whereas the other way around requires your users to say extra things to get access to stuff they want.
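
A sketch of what 'saying things' looks like for the mass assignment case (again invented Python, this time with a plain dict standing in for the model):

def safe_assign(record, form, allowed):
    # The secure version makes you say something: an explicit list
    # of the fields that this particular operation may touch.
    for field, value in form.items():
        if field in allowed:
            record[field] = value

user = {'name': 'guest', 'is_admin': False}
safe_assign(user, {'name': 'mallory', 'is_admin': True}, allowed={'name'})
print(user['is_admin'])   # still False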

(Sometimes, if you're clever, you can figure out how to be both secure and convenient. This is great when it works and framework authors should do as much of it as possible; for example, these days there isn't any excuse for a framework that doesn't validate a received form submission against the form's allowed parameters. Note that this doesn't seem to have been the Rails problem here.)

And so over and over again, frameworks are insecure by default and people do not say the things to use them securely. Or, to put the blame where it belongs, the creators of frameworks decide that making them insecure is okay because people who need security will fix them, even though everyone has seen repeatedly that people don't do that.

(Thus, framework creators often fail to solve the real problem. If you create a system that people do not use securely you have failed to create a secure system in practice, even if it's secure in theory. Always remember, security is people, not math and theory.)

ConvenientFrameworksAndSecurity written at 00:39:12

