Wandering Thoughts archives

2014-02-26

Saying goodbye to the PHP pokers the easy way

If you have a public web site or a web app, you almost certainly have people attempting drive-by PHP exploits against you, whether or not your site shows any sign of using PHP. The people (or software) behind these don't care; they seem to operate by taking one of your URLs, slapping the page name (and sometimes query parameters) of a vulnerable bit of PHP onto it, and then seeing if the result works. I see requests like:

GET /~cks/space/blog/linux/images/stories/food.php?rf
POST /~cks/space/blog/linux/index.php?option=com_jce&task=plugin&plugin=imgmanager&file=imgmanager&version=1576&cid=20
POST /~cks/space/blog/linux//components/com_jnews/includes/openflashchart/php-ofc-library/ofc_upload_image.php?name=guys.php
GET /~cks/space/blog/linux//components/com_jnews/includes/openflashchart/tmp-upload-images/guys.php?rf

If you have anything other than a static site, these requests are at least annoying (in that they're forcing your code to run just to give the attacker a 'no such URL' answer). If you log potential security issues (such as odd POST content-types or the like) they can also make your logs nag at you. Recently I got irritated at these people and decided to make them go away the easy way.

The easy way here is to have your web server handle refusing the requests instead of letting them go all the way to your actual app code. Front end web servers generally have highly developed and very CPU-efficient ways of doing this (exactly how varies with the web server), plus this means your app code won't be logging any errors because it's never going to see the requests in the first place. In my case this host runs Apache and so the simplest way is a RewriteRule:

RewriteRule ^.*\.php$ - [F,L]

No fuss, no muss, no CPU consumption from my Rube Goldberg stack, and no more log messages.

(Arguably this is the wrong HTTP error code, if you think that matters; it returns a 403 instead of the theoretically more correct 404.)
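If the 403 bothers you, mod_rewrite can produce the 404 directly. When the R flag is given a status code outside the redirect range (300-399), Apache drops the substitution string and stops rewriting, returning that status as an error; this is a sketch of that variant, not necessarily what I run here:

```
RewriteRule ^.*\.php$ - [R=404,L]
```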

Of course you can only do this trick if you can guarantee that you'll never use a URL ending in .php. This isn't necessarily something you can assert for a general use web program (cf), but it often is something you can say about your particular site. It's certainly something I can say about here; even though I theoretically could create a perfectly valid URL ending in .php (although it wouldn't be a PHP page), I'm never going to.

(And if I do, I can change or remove my RewriteRule.)

PHPPokersGoodbye written at 00:03:26; Add Comment

2014-02-22

A subtle advantage of generating absolute path URLs during HTML rendering

If you're writing a multi-page web application of some sort, sooner or later you'll want to turn some abstract name for another page into the URL for that page, or more exactly into a URL that you can put into a link on the current page. For a non-hypothetical example you might be writing a wiki or a blog engine and linking one entry to another one. When you're doing this, a certain sort of person will experience a little voice of temptation urging them to be clever and generate relative paths in those URLs. After all if you're rendering /a/path/page1 and linking to /a/path/page2 you can simply generate a '<a href="page2">' for your link instead of putting the whole absolute path in.

(And this sort of cleverness appeals to any number of programmers.)

The obvious reason not to do this is that it's more work. Your code almost certainly already has to be able to generate the absolute URLs for pages, while converting those absolute URLs to relative ones will take additional code. So let's assume that you have a library that will do this for free. Generating relative URLs is still a bad idea because of what it does to your (potential) caching.

An HTML fragment with absolute path URLs is page-independent; it can be included as-is anywhere on your site and it will still work. But an HTML fragment with relative path URLs is page-dependent. It works only on a specific page and can't be reused elsewhere, or at least it can only be reused in certain select other pages, not any arbitrary page. Relative path URLs require more cache entries; instead of caching 'HTML fragment X', you have to cache 'HTML fragment X in the context of directory Y' (and repeat for all different Ys you have). Some web apps have a lot of such directories and thus would need a huge number of such cache entries. Which is rather wasteful, to put it one way.
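To make the cache-key point concrete, here is a small Python sketch; 'relativize' is a hypothetical helper of my own naming, standing in for whatever link-relativizing library you'd use:

```python
import posixpath

def relativize(target, page):
    """Turn an absolute-path URL into one relative to the directory
    of the page it will appear on (the 'clever' approach)."""
    return posixpath.relpath(target, posixpath.dirname(page))

# The same link comes out differently depending on which page it's
# rendered on:
relativize("/a/path/page2", "/a/path/page1")   # 'page2'
relativize("/a/path/page2", "/a/other/page3")  # '../path/page2'

# So an absolute-URL fragment can be cached under one key, while a
# relative-URL fragment needs a key per directory it's rendered in:
abs_key = ("fragment-X",)
rel_key = ("fragment-X", posixpath.dirname("/a/other/page3"))
```

Every distinct directory Y adds another ("fragment-X", Y) entry, which is the cache blowup described above.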

This is one of those fortuitous design decisions that I stumbled into back at the start of writing DWiki. I made it due to laziness (I didn't want to write something to relativize links, however nifty it would have been) but it turned out to be an excellent idea due to the needs of caching.

(Note that in most blog engines, one sort of 'HTML fragments' that you will be reusing is blog entries or at least their leadin text. Blogs typically have lots of places where entries appear.)

AbsoluteURLsAdvantage written at 00:12:00; Add Comment

2014-02-17

File based engines and the awkward problem of special URLs

I was recently asked a good question on Twitter:

@thatcks Do you publish feed URLs on your blog besides '/blog/?atom'? My reader of choice sadly has issues re: dropping the GET param.

The answer is unfortunately not. So, you might reasonably wonder, why do syndication feeds here use a query parameter? The answer is that DWiki (the engine behind Wandering Thoughts) is a file and directory based engine and when you build such an engine, you wind up with a URL namespace problem.

Put simply, when you're presenting a view of a directory hierarchy the user de facto owns the URL namespace. They create valid URLs by creating files and directories, and it's within their power and their right to create even ones with awkward names. If you add your own names to this namespace (for example a 'blog/atom' URL for the blog's Atom syndication feed) you're at risk of colliding with a name the user is creating in the directory hierarchy. Collisions are generally bad, especially when you haven't told the user what happens when one occurs.

I think that there are three main things you can do here. First, you can simply reserve some names in the namespace, ie you tell the user 'you can't create a file or directory called 'atom', that's a reserved name'. There are several versions of name reservation but I think that they're all unappetising for various reasons. Reserved names also give you problems if you want to add features that require new ones, since the user may already be using the name you want to take over.

(This is a familiar issue for programming languages; adding new reserved keywords for things like new syntax is fraught with peril and the possibility that old programs suddenly won't work with your new version because they're using what is now a reserved keyword as a variable name.)

The second and related approach is to fence off certain classes of names as invalid for the user and thus available for your program's synthetic URLs. This can work reasonably well if you create rules that match user expectations and have solid, appealing reasons. For example, DWiki won't serve files with names that start in '.' or end in '~', and so both categories of names are available for synthetic URLs. The drawback of this is that the resulting synthetic URLs generally look ugly; you would have 'blog/.atom' or 'blog/atom~'.

(DWiki uses a few such synthetic URLs but all for transient things, not for any URL that users will ever want to bookmark or pass around.)

The third approach is to take your program's synthetic URLs completely out of the user's namespace, such as by making them query parameters. Even if a user creates a file with a '?' in its name, it simply won't be represented as a URL with a query parameter; to be done right, the '?' will have to be %-encoded in the URL. This approach has two virtues. First, it's simple. Second, it can be applied to any regular URL, whether it's a directory or a file, and it doesn't require turning a file into a pseudo-directory (eg going from 'blog/anentry' to 'blog/anentry/commentfeed', which raises the question of what 'blog/anentry/' should mean). DWiki takes this approach, and so syndication feeds and in fact all alternate views of directories or files are implemented as query parameters.
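The %-encoding is what keeps the two namespaces disjoint, as a quick check with Python's standard library shows (the file name here is made up for illustration):

```python
from urllib.parse import quote

# A user file whose name contains '?' can never collide with a real
# query parameter, because the '?' must be %-encoded in its URL:
url_for_file = "/blog/" + quote("odd?name")  # '/blog/odd%3Fname'

# ...while the synthetic Atom feed view is a genuine query parameter:
feed_url = "/blog/?atom"
```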

(From the right perspective, a syndication feed is just an alternate view of a directory hierarchy. Or at least that's my story and I'm sticking to it.)

FileBasedUrlConstraints written at 00:12:28; Add Comment

2014-02-16

Why comments aren't immediately visible on entries here

Recently, a commentator on this entry left a comment with a good question that was unrelated to the entry. GlacJAY asked:

Why do I need one more click to see the comments?

The most useful answer is that things remain this way as a deliberate design decision that I've made because of how I want Wandering Thoughts to operate and come across to readers. I could sugar coat this, but I should be honest: the entries are what I really want people to read, not the comments. I see comments as an optional supplement for the entries, similar in spirit to footnotes.

Making it take an extra click to read comments for many URLs is a conscious way of de-emphasising the comments in favour of the entry text. I want you to read the entry text; then you can go on to read comments if you find the idea interesting enough. If I embedded comments on the main entry page, there are some entries (often entries that I care relatively strongly about) where the comments section would come to dominate the overall entry simply because of the relative volumes of text (eg this recent one). I very much don't want that. My writing is the important thing here as far as I'm concerned (and yes I'm biased).

(Related to this, I consider it a feature that you can't start reading an entry and then trivially skip down to the comments partway through. There is at least a little bit of a roadblock.)

This is not a blog design decision that works everywhere and for everyone. Some people want their entries to be the starting point for discussion and interaction; these people clearly want to make their comments more accessible and more prominent than I do here. I read a number of blogs like that, some of them where the comments section can be as interesting as the blog entries themselves.

(Some people go the other way and don't want on-blog comments at all. I don't feel this way and value comments here, but I do feel that comments are here primarily for me instead of for my readers. Which is a reason I'm willing to de-emphasise them for readers in the way I do.)

PS: that comments are treated this way is also caught up in the history of DWiki's original design and intended purpose (which was not to be a blog engine). But that's another story for another entry.

CommentsVisibilityIssues written at 01:28:12; Add Comment



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.