Subdirectories: NewFeatures.
2013-03-06
DWiki's caching system
DWiki has optional caching in order to speed up generating results repeatedly. DWiki uses a disk-based cache for this (although the interface is abstracted and alternate forms of caching may be introduced someday). There are three caches, which can be enabled separately: the renderer cache, a brute force page cache, and an in-memory brute force page cache that is only used if DWiki is running as a preforking SCGI server.
DWiki never removes the files of out of date cache entries from the disk cache; instead, it stops considering out of date ones to be valid. Cleaning out the detritus is left for an external process. ChrisSiebenmann considers this safer; giving a program an automated
unlink()makes him nervous.See ConfigurationFile for the options controlling the behavior of the caches.
In theory DWiki's caching is optional. In practice a decent sized DWiki is simply too slow without caching for some of the more expensive operations and caching becomes more or less a necessity. ChrisSiebenmann now believes that you should configure all levels of caching in basically any DWiki unless you have some unusual need and are sure.
The brute force cache
The brute force page cache is about as simple as you can get: it caches complete requests for a configured time (called a time-to-live, or TTL). That's it. The BFC is intended as a load-shedding measure when DWiki is under significant load, so it only acts under certain circumstances:
- only on
GETorHEADrequests.- only on requests without a
Cookie:header.- requests only get put into the cache if the system seems loaded.
(For speed, when something is valid in the cache DWiki just serves it without checking the system load.)
A good BFC TTL is on the order of 30 seconds to three minutes or so; long enough to shed significant load if you are getting a lot of hits to a few pages and short enough that dynamic pages won't become too outdated. (And that waiting to see a comment show up or whatever is not too annoying.)
Because Atom syndication requests are among the most expensive pages to compute, the BFC can be set to give them a longer TTL than usual. There is a second TTL that can be set for Atom requests that aren't using conditional
GET; the idea is that if requesters cannot be bothered to be polite, we can't be bothered to serve fresh content. Setting this option always caches the results of such requests, even if the load is low, which means that even people doing proper conditional GET requests will use the cached results for as long as their (lower) TTL says to.It's actually faster to serve static pages from the static page server code than from the BFC, so the BFC doesn't try to cache static pages.
The two sides of the BFC
It's important to understand that the BFC does not check load when it is checking to see if something is in its cache. This means there are two stages to processing a request: deciding what TTL to use for cache checks, and deciding whether to cache something that was not current in the cache.
The TTL used is:
bfc-atom-nocond-ttlif this is an unconditional request for an Atom view, if set.bfc-atom-ttlfor Atom view requests in general, if set.bfc-cache-ttlotherwise.Pages enter the BFC cache either because the system seems to be loaded or because
bfc-atom-nocond-ttlwas set and they were an unconditional request for an Atom view.Once something is in the cache, it will be served from the cache if it is not older than the check TTL. Different requests can use different check TTLs for the same cached page; for example, conditional GETs versus other requests for Atom views.
The in-memory cache
The in-memory cache is essentially a version of the brute force cache that holds pages in memory instead of on disk. It's only effective in environments where DWiki serves multiple requests from the same process; currently it's only used if DWiki is running as a preforking SCGI server. Because it holds pages in memory as page response objects, the in-memory cache is about the fastest way that DWiki can serve requests. In particular it's faster to serve static pages from the IMC than from disk, so unlike the BFC the IMC does cache static pages.
Because IMC entries disappear automatically and are essentially free to create, the IMC caches pages unconditionally when active (unlike the BFC). This means that it should normally have a relatively low TTL, often lower than the BFC's TTL. Note that because the IMC is before the BFC, it can load its cache from BFC cache hits.
For obvious reasons, it's pointless to set the IMC cache size to be larger than the number of requests a preforked SCGI process will serve before exiting.
To keep IMC memory usage under control, the IMC has a settable maximum page size that it will cache. Tune this as appropriate for your environment.
The IMC can be deliberately forced on with
imc-force-on, in case you're running DWiki in some other preforking environment (for example as a WSGI application under a preforking WSGI server such as uWSGI).Considerations for the IMC TTL
Under some setups, DWiki will only be running as a (preforking) SCGI server when it's under heavy load; in others DWiki is running this way all of the time, even when the load is light. Because the IMC unconditionally caches pages the latter situation can be annoying; it means that someone who, say, writes and posts a new comment may not see that comment until the IMC TTL expires. DWiki makes some attempt to bypass the IMC (and the BFC) in the common case of someone leaving a comment. However this is not perfect (in part because it requires the web browser to accept cookies from DWiki).
(This also applies if you're running DWiki in some other preforking environment and have forced the IMC on.)
If you're running DWiki full time in an IMC-on environment, you likely want to set a quite low IMC TTL, such as 15 to 30 seconds. If you're running DWiki with the IMC on only under heavy load you can set a higher IMC TTL, such as two minutes (120 seconds).
The renderer cache
The renderer cache is actually two caches. The renderer cache proper caches the output of various renderers (cf TemplateSyntax). The output is cached with a validator and the cached results are fully validated before they get used; this means that renderer cache entries do not normally use a TTL and in theory could be valid for years.
The (heuristic) generator cache caches the output of some expensive precursor generator routines. These cache entries only have heuristic validators, where DWiki can be fooled if people try hard enough. Generator cache entries do have a TTL, so that if the heuristic is fooled DWiki will pick up the new result sooner or later. Some cache entries can also explicitly invalidated by DWiki in a pretty reliable process; by default, these have a much longer TTL than plain heuristic cache entries. These are called 'flagged' (heuristic) generator entries and various ConfigurationFile settings controlling how they behave are
render-heuristic-flagged-....(Trivia: the 'flagged' name is because such entries are invalidated using a flag file, or more accurately a flag cache entry.)
Currently the main renderer cache caches the output of various wikitext to HTML rendering routines while the generator cache caches the results of various filesystem 'find all descendents' walks that are used to build lists of comments (for Atom comments feeds and some wikitext macros; this uses explicit invalidation) and lists of pages (for Atom feeds and various blog renderers such as
blog::prevnext).Unfortunately, a DWiki page that has comment or access restrictions must be cached separately for each DWiki user that views it. Under some situations this can result in a number of identical copies being cached under different names. If you want to avoid this, DWiki lets you turn off renderer caching for non-anonymous users.
Force-invalidating list of pages caches
The general validator for
blog::prevnextcache entries is the modification time for all of the directories involved that had files in them at the time (the latter condition is for technical reasons). The heuristical validator checks that some of the file timestamps are still the same, but it can't check all of them and still be a useful cache.So the easy way to invalidate this is to change the modification time of a directory involved, for example with
touch.The 'list of pages' cache is similarly invalidated by changing a directory modification time. Unlike the
blog::prevnextcase, the directory times are the only thing that this cache checks. This is a bit of a pity but the performance improvements from caching this information are very visible.Disk space usage and directories
Much like comments, each page that has something cached for it becomes a subdirectory, with the various cached things in files. The different sorts of caches use different top-level directories under the
cachedir, so you have paths likecachedir/bfc,cachedir/renderers, andcachedir/generators.Because some results include absolute URLs that mention the current hostname, DWiki must maintain separate caches for each
Host:header it sees in the BFC and the general renderers cache. These are handled as subdirectories in each cache directory, socachedir/bfc/localhost/...and so on. Entries in the generator cache don't depend on the currentHost:header, so there is only one (sub)cache for all requests,cachedir/generators/all/....Generally the general renderers cache uses the largest amount of disk space, followed by the BFC, and the generator cache is the smallest.
If you're using caching (and as mentioned, you probably want to), you'll want to periodically trim the caches. ChrisSiebenmann just does this by hand every so often by removing the cache directories entirely; DWiki will then rebuild them as necessary.
2013-02-25
DWiki Global Variables
As TemplateSyntax discusses, one can use global variables in templates in several ways. However, it helps to know what global variables are available. Thus this incomplete listing.
First, all ConfigurationFile directives are available as global variables.
Then, during request processing DWiki internally defines a number of additional global variables:
pageThe current page's full path, in DWiki form. abspageThe current page's full path, including a ' /' at the start.pagenameThe page's name; its last path component. pagetypeThe type of the page, usually 'file' or 'dir'. view-formatThe current view being processed. relnameIn blog::blog, the name of the current page relative to the blog directory being displayed. basepageIn a VirtualDirectory context, the full path of the non-virtual directory. Otherwise the same as page.:wikitext:titleAfter a piece of wikitext has been rendered (more exactly after any wikitext template renderer has been used, including wikitext:cache), this is its title if any exists. The 'title' of a piece of wikitext is the text of the header that is at the start of the text, if there is one. This is the same as thewikitext:titletemplate renderer but may be more convenient to use.:wikitext:title:nohtmlThis is the title but without any HTML markup, making it useful for eg a <title>. It's the same as the :wikitext:title:nohtmltemplate renderer.loginThe currently authenticated user. comment-ipIP address that posted the current comment. comment-userUser that posted the current comment. :comment:postThe result of an attempt to post a comment. One of 'good', 'bad', 'badchar', or 'nocomment' (the latter if it was an attempt to post an empty comment). (Only defined during comment posting.) :error:errorError type. Only defined during error processing. http-commandThe type of HTTP command being processed, either GET or POST. http-versionThe (claimed) version of HTTP that the current request used. remote-ipThe IP address the current request came from. server-nameThe hostname or IP address for this web server that the sender of the current request claims to have used. Not all of these are defined all of the time. Generally a context-dependant variable is only defined when the current thing being processed has that sort of information.
There are other global variables that get set, but they are for more internal use, and you're best off browsing the source code for them.
2013-02-01
DWiki can now restrict what sorts of VirtualDirs advertise AtomFeeds (both in SyndicationDiscovery and in the Atom toolbar) and/or provide them if they're requested by URL.
It turns out that when you have a fair amount of content in a DWiki your VirtualDirs and thus your AtomFeeds proliferate like over-active rabbits. Then SyndicationDiscovery kicks in so that anyone who looks at a virtual directory can discover its Atom feed and either start polling it or just crawl it. Once your DWiki gets big enough this becomes not really a good thing, as Chris has found out with his techblog.
2011-12-09
Directories can now say that they don't want to be rendered in specific view types. The usage case Chris has in mind is his techblog, where the blogdir view of categories is utterly huge because it renders hundreds of entries. Because this is intended to be a graceful gentle fix, trying to view a directory in a disallowed view generates a (permanent) redirection to the default view of the directory. To avoid redirection loops, this redirection only happens if the view has been specified explicitly as a URL parameter.
(For obvious reasons, disallowed views are also disallowed in virtual directories derived from a particular real directory.)
This is done similarly to DefaultDirViews: touch a file in the directory
called .flag.noview:<viewname>. Unlike default views, this is not
currently inherited by child directories.
The 'See As' page tools links also exclude disallowed view types. Right now they do so a little bit too thoroughly, in that they exclude the default view if it's also disallowed. Moral: don't do that, even though the code saves you from a redirection loop in this case.
Right now there is no restrictions on what (directory) views can be disallowed, so you can disallow Atom feeds. This is probably not a feature and will probably not be staying, although Chris may change his mind about this or just be lazy.
2011-06-01
Not covered before now are various new configuration options that have been quietly added to DWiki over the five or so years that I have been using it as mostly a blogging engine. As you might expect, a bunch of these have to do with dealing with obnoxious clients of various sorts.
They are by and large documented in ConfigurationFile. I am not going to try to remember them here.
These are directives that change how DWikiText is interpreted to do things like turn off certain font characters or map a simple-to-type character sequence like '->' to a HTML entity. They are documented in DWikiText so I am not going to repeat myself here.
In the process I added a new and less annoying plain quoting mechanism: ``...''. It looks better in ASCII than it probably does in the font here.
(The code for this was written in August of 2007 or so, but I sat on it because I wasn't entirely sure I liked the feature. Well, nuts to that; time to roll things out and just go.)
2006-04-08
I've decided that sometimes I really want a directory to have an
index page, not just a list of the contents. So now I can, with the
unimaginatively named 'index' view. It's mostly template based; the
normal template uses inject::index to display an __index file.
However, the view has several special properties:
__index in the directory.__index is a redirection, the index view will just generate
a redirect to the target.(Thus, the index view could be used to replace the wikiroot
configuration directive.)
The index view is valid on directories even without a __index
page in the directory. Right now the template just displays a normal
directory listing in that case.
2006-03-25
#pragma search ...I got tired of the relative links in my blog drafts (written in an
entirely different directory than their eventual home) not working.
So, introducing #pragma search; this adds additional directories to
search for pages when resolving relative links, both WikiWords and
explicit [[...]] links. The directories listed in a search pragma must
be absolute path names, and are searched only after the hard-coded
possibilities have been exhausted (including the alias directory for
WikiWords).
You can't have both a #pragma search ... and a #pragma pre
in the same page, partly because it doesn't make any sense.
(This is the simple way of getting out of handling multiple
#pragma directives.)