2013-10-26
10G Ethernet and network buffer sizes (at least on Linux)
I spent a chunk of today checking out the performance of 10G networking on what we hope will be our new iSCSI backend hardware and in the process I made an interesting discovery: at 10G speeds, the size of your program's network buffers can matter. In fact it can matter a lot.
I'm used to a network environment where it doesn't really matter how
much you write() to the network at once as long as it's not stupidly
small. Going much smaller than the 1500 byte MTU on 1G Ethernet will
cost you TCP performance but that's about it. Certainly once you hit the
MTU you're in the clear and any competent modern machine and network
will deliver wire bandwidth. With 10G Ethernet this doesn't seem to
be true at all. Getting close to theoretical wire speeds even in a
relatively tight sending loop seems to require network buffer sizes that
are much bigger than the MTU. I was seeing performance increases all the
way out to 128 KB write() buffers.
(128 KB is the interesting size for me because 128 KB is the common ZFS block size and thus the usual size for ZFS disk IOs. iSCSI overhead makes that a bit more than 128 KB on the wire but not that much.)
I don't know exactly why this is so, but one obvious theory is that 10G Ethernet transmit speeds are so ferociously fast that you need relatively large buffer sizes to keep the operating system's transmit buffers filled. If I'm doing the math right, it takes about 12 microseconds to transmit a 1500 byte Ethernet packet on 1G Ethernet; in order to keep the network saturated your program needs to push around that much write data into the kernel every 12 microseconds to avoid stalling the network. By contrast, 10G will write about 15 KB to the network over that same 12 microseconds, so you need to push about ten times more data into the kernel to keep the network fed and avoid stalls.
(This assumes that the transmit paths for 10G are no more efficient than for 1G. This may be a bad assumption; I wouldn't be surprised if there is more hardware acceleration and thus less kernel overhead for 10G.)
If my theory is right what matters here is not your program's total bandwidth into the kernel but latency. If you write with small buffers you must provide more data to the kernel almost immediately in order to avoid a stall. If you write with large buffers you buy yourself latency insurance and can afford more of a delay before your program gives the kernel more data. I'm not entirely satisfied with this explanation but it's the best I have right now.
(It may be lucky that my test sending program is not the absolute fastest thing it could be and has some extra overhead because of how I'm using it; if I'm right, I wouldn't necessarily have seen this effect with a really fast and minimal test program.)
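To make this concrete, here's a stripped down sketch of the kind of test sender I'm talking about (this is an illustration, not my actual test program; the 4 GB total is arbitrary and the other end is assumed to be something that just reads and discards data, eg 'nc -l' with output sent to /dev/null):
    # Write a fixed total amount of data to HOST:PORT using a given
    # per-write() buffer size and report the bandwidth we managed.
    import socket, sys, time

    def send_test(host, port, bufsize, total=4 * 1024 * 1024 * 1024):
        buf = b'\0' * bufsize
        conn = socket.create_connection((host, port))
        sent = 0
        start = time.time()
        while sent < total:
            conn.sendall(buf)
            sent += bufsize
        duration = time.time() - start
        conn.close()
        # 10G Ethernet tops out at about 1.25 GBytes/sec on the wire.
        print("%d byte writes: %.1f MBytes/sec" % (bufsize, sent / duration / 1e6))

    if __name__ == "__main__":
        send_test(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
Comparing something like 4 KB writes against 128 KB writes with this sort of loop is the kind of test I was doing.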
2013-10-14
The importance of small UI tweaks (for me), dmenu edition
The basic behavior of dmenu is that it reads a bunch of things from standard input, then puts up a window which displays those things (or as many of them as fit) and lets you type things with autocompletion from the list. The standard version of dmenu puts as much as possible of what it reads from standard input into the display area, and as a result if you have a lot of completion items you get a very cluttered display. Because I don't like this sort of clutter, I spent a long time keeping my dmenu completion items relatively modest.
(Modern wide screens exacerbate this effect because they create lots of display area.)
Relatively recently I made a little change to my private version of dmenu: I added support for completion items that didn't show up in the initial display (they show only if you type something and they match it). The compounding effect of this small change was quite interesting. Because extra items no longer cluttered up the display, I became willing to put in a whole bunch of extra items. These extra items, now available for fast completion, made dmenu more convenient to use for access to them and as a result made me happier about using dmenu in general. In short, yet another little point of friction I hadn't even realized was there had been abraded away (I've seen this sort of invisible friction before).
(See here for the source for this if you're interested.)
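If you just want the flavour of the change, here is the matching behaviour sketched in Python (dmenu itself is C and the real change is to its completion code, so this is purely an illustration):
    # 'Hidden' completion items never appear when nothing has been typed,
    # but they match normally once you start typing something.
    def matching_items(typed, visible_items, hidden_items):
        if not typed:
            # Nothing typed yet: only the normal items clutter the display.
            return list(visible_items)
        return [item for item in list(visible_items) + list(hidden_items)
                if typed in item]

    print(matching_items("", ["firefox", "xterm"], ["some-rare-script"]))
    print(matching_items("rare", ["firefox", "xterm"], ["some-rare-script"]))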
What this says to me is that small UI tweaks matter. Over and over again (cf), small things that I thought would be insignificant have turned out not to be so. And if these things are lurking in my environment, how many of them are lurking in the programs that I write? There are almost certainly little things that really do matter lurking in my programs and I should probably pay more attention to my sense of niggling irritations and 'this could be better' in them.
(This isn't perfect because it reflects how I use my programs and systems, not necessarily how other people do. Exposing something to other people and seeing how they use it is always a humbling experience when I can observe it.)
2013-10-11
Some pain points of parsing wikitext (and simplifications that avoid them)
I've talked before in general about things that make parsing a general wikitext hard or at least inobvious but I've never given specific examples. Today I want to do so and as an associated issue talk about what simplifications and shortcuts you can take to make it easier.
First, let's talk about errors. Specifically, that in parsing general
wikitext there aren't errors; you've always got to
keep on producing output. In a wikitext dialect where * is the
emphasis delimiter, consider the text 'This is *an error'. In a
conventional parser the result of this is an error that boils down
to 'unterminated *'. A generally useful wikitext parser must either
add an implicit terminating * or treat the visible * as just a
literal character. What complicates this is that
many conventional parsing techniques generate errors only well after
the actual error point; for example, the literal error here might be
'unexpected end of text'. The good news here is that error recovery is
actually a well studied thing so there are probably techniques that can
be adopted even if you can't use a standard parser generator with its
standard terrible error handling.
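As a concrete illustration, here is a minimal sketch of the 'treat the visible * as just a literal character' recovery, assuming a dialect where *...* is emphasis (this is just the shape of the idea, not DWiki's code):
    def emphasize(text):
        # Turn properly paired *...* into <em>, but if we hit a dangling
        # * with no closing one, leave the rest of the text alone instead
        # of reporting an error.
        start = text.find('*')
        if start == -1:
            return text
        end = text.find('*', start + 1)
        if end == -1:
            return text
        return (text[:start] + '<em>' + text[start + 1:end] + '</em>' +
                emphasize(text[end + 1:]))

    print(emphasize('This is *an error'))       # the * stays literal
    print(emphasize('This is *not* an error'))  # this becomes <em>not</em>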
The two possible simplifications here are to either ignore the possibility of errors entirely or to declare that it's okay for errors to make the wiki page unreadable (yes, these have the same end result). The more restricted you make who can edit the wiki, the easier it will be to get away with this.
In a general wikitext, text style changes are a pain because they
can nest and thus overlap. It's valid to have bold italics but
'*~bold italics* bold~' either doesn't work or requires you to do
peculiar things to synthetically generate a properly nested result. I'm
not sure how many parser generators will allow you to write a recursive
general production rule for this and how many will require you to write
a whole series of restrictive sub-rules for 'text but no <whatever>
markers'. Relatedly, text style changes can nest inside other inline
elements like the text for links. As a result, sorting out how to handle
the following is going to be interesting:
This is *italic [[(with an *italic* link) http://somewhere]] text*.
The simplification is to disallow all nested style changes. Plain text can be set in only one style and all link text and other embedded text doesn't allow style changes at all.
(Related to this is text formatting that actually has the effect of turning off formatting characters to quote what it encloses. The traditional parsing approach pushes such quoting into the lexer but this leaves you with your text markup split between the lexer and the parser proper.)
The big pain point with block level parsing is nesting constructs, especially mixed line-level nesting constructs. For example:
A paragraph.
> A nested paragraph
> > A further nested paragraph,
> > with running multi-line text.
> * A list.
>   * With a nested one.
>     but this is continued.
>   * another list entry.
> > == And a nested header
> (Yes, people do this.)
Really.
In a way, the important thing here is the nested paragraph with running multi-line text. Parsing this wikitext must reassemble the paragraph as a single entity despite there being non-paragraph tokens between the end of one line and the start of its text on the next line. The traditional solution (as used in things like Python) is to have the lexer swallow the line level tokens and generate synthetic tokens for changes in the level and type of nesting. The pain point is that putting this in your lexer is embedding a significant amount of your actual grammar and parsing in the lexer instead of the parser.
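Here's a sketch of what those synthetic tokens can look like for just one construct, '>' blockquote nesting (a real block-level lexer has to juggle lists, headers, and so on as well, which is exactly the problem):
    # Strip leading '> ' markers and emit synthetic NEST/DENEST tokens
    # whenever the quoting depth changes, plus a TEXT token per line.
    def lex_blockquotes(lines):
        depth = 0
        for line in lines:
            newdepth = 0
            rest = line
            while rest.startswith('> '):
                newdepth += 1
                rest = rest[2:]
            while depth < newdepth:
                depth += 1
                yield ('NEST', depth)
            while depth > newdepth:
                depth -= 1
                yield ('DENEST', depth)
            yield ('TEXT', rest)

    for token in lex_blockquotes(['A paragraph.',
                                  '> A nested paragraph',
                                  '> > with running multi-line text.',
                                  'Really.']):
        print(token)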
The simplification is to disallow complex nesting and have at most nesting that can be easily and relatively generically handled in the lexer (often this means whitespace based only, perhaps just for nested lists). You still have a certain amount of the grammar in the lexer, but at least it's a simple chunk.
(This also neglects any whitespace based parsing you might want to do for things like writing tables in a natural looking plain text style.)
There is also a general simplification possible for many of these pain points: you can do only limited parsing through your formal grammar and then post-process the results after the basic parse has finished. For example you could deal with many or all of the text formatting issues by having your lexing and parsing process only insert marker nodes for potential style changes and so on. The postprocessing pass would then turn these into actual style AST nodes or demote them to plain text as appropriate (or swallow them as redundant, as in the case of italics inside link text inside text that was already italic).
(This approach is unsuitable for single pass 'straight to HTML' wikitext parsing, but you should be parsing to an AST anyways.)
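To sketch the postprocessing idea for a single style, suppose the earlier passes emit a flat stream of text nodes and marker nodes; a second pass can then pair markers up into style nodes and demote stray ones back to plain text (handling nested and redundant markers takes more bookkeeping but has the same shape; again, this is an illustration, not DWiki's code):
    def resolve_markers(nodes, marker='*', style='italic'):
        out = []
        open_at = None   # index in out of an as yet unmatched marker
        for kind, val in nodes:
            if kind == 'marker' and val == marker:
                if open_at is None:
                    # Provisionally treat the marker as plain text.
                    out.append(('text', marker))
                    open_at = len(out) - 1
                else:
                    # Found the closer: fuse everything since the opener
                    # into a single style node.
                    out[open_at:] = [(style, out[open_at + 1:])]
                    open_at = None
            else:
                out.append((kind, val))
        # Any leftover opener has already been demoted to plain text.
        return out

    print(resolve_markers([('text', 'This is '), ('marker', '*'),
                           ('text', 'an error')]))
    print(resolve_markers([('text', 'This is '), ('marker', '*'),
                           ('text', 'fine'), ('marker', '*')]))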
Note that it's certainly possible to create wikitext dialects with all of these simplifications and then to parse them with relatively robust general techniques. But these dialects are clearly limited compared to a fuller wikitext and it's my strong belief that the more general wikitext is going to be better for users outside of limited domains.
(Disclaimer: it's quite possible that I'm missing various general parsing techniques that make some or all of these pain points go away (I have a reasonable but not completely deep exposure to parsing techniques, especially modern ones). If so I'd love to hear about them.)
How DWiki parses its wikitext (part 1)
I have a long-standing view that parsing wikitext is both hard and not discussed very much; if for some reason you have to write a wikitext parser, there are very few good examples to easily crib from. So I feel like describing at least some stuff about how DWiki parses its wikitext dialect.
(Most of current DWiki wikitext parsing is not my work; the core code was generously improved and rewritten by Daniel Martin many years ago.)
The first big thing is that DWiki actually effectively has two separate parsers. One parser handles embedded formatting in running text (things like fonts, links, and so on) and the other one handles all of the line oriented block level structures like paragraphs, headers, blockquotes, lists, etc. What makes it work is that the block level parser doesn't parse running text immediately for multi-line things like paragraphs; instead it does what I'll call 'paragraph fusion', where all of the lines of raw text are gathered up until the end of the paragraph and only then parsed together. Splitting up the parsing job this way significantly simplifies both sides and makes it much simpler to handle text formatting that spans across multiple physical lines (since you're processing all of the lines at once).
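A minimal sketch of the paragraph fusion part looks something like this (it's not DWiki's actual code, just the general shape; parse_inline stands in for the running text parser):
    # Accumulate raw lines until a blank line ends the paragraph, then
    # hand the fused text to the running-text parser in one go.
    def fuse_paragraphs(lines, parse_inline):
        pending = []
        for line in lines:
            if line.strip():
                pending.append(line.strip())
            elif pending:
                yield ('para', parse_inline(' '.join(pending)))
                pending = []
        if pending:
            yield ('para', parse_inline(' '.join(pending)))

    # With a do-nothing inline parser, formatting that spans physical
    # lines still arrives at the parser as one piece:
    for block in fuse_paragraphs(['Some *emphasis that', 'spans lines*.', ''],
                                 parse_inline=lambda text: text):
        print(block)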
(If I was rewriting DWiki's parsing code from scratch I would completely separate the parser code for running text from the block level parser code.)
Both the block level parser and the running text parser are complicated beasts but the broad form of the running text parser is easier to describe. It's a highly optimized (by Daniel Martin) version of the standard regular expression based substitution process where you walk through the source string, find everything that needs you to do something special (either to change it into some sort of HTML markup or to quote it) and process them. There are a bunch of ad hoc complications that are handled in ad hoc code on the fly, which I've become convinced is the best approach. One thing worth noting is that this step at a time process means that the text parser can be invoked recursively in order to handle the text of a link (which is allowed to have wikitext markup).
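Much simplified, the shape of that process looks something like this (DWiki's real running text parser handles far more cases and is considerably more optimized):
    import re

    # One master regexp finds the next special thing; plain text between
    # matches just gets HTML-quoted.
    SPECIALS = re.compile(r'\*(?P<em>[^*]+)\*|(?P<url>https?://\S+)')

    def quote(text):
        # Minimal HTML quoting for this sketch.
        return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

    def tohtml(text):
        out = []
        pos = 0
        for m in SPECIALS.finditer(text):
            out.append(quote(text[pos:m.start()]))
            if m.group('em'):
                out.append('<em>%s</em>' % quote(m.group('em')))
            else:
                out.append('<a href="%s">%s</a>' % (m.group('url'), quote(m.group('url'))))
            pos = m.end()
        out.append(quote(text[pos:]))
        return ''.join(out)

    print(tohtml('Go *here*: https://example.org/ & enjoy.'))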
(The need for ad hoc things in wikitext parsing really calls for a separate entry, but I'm convinced that a strict and regular wikitext dialect is not 'humane'.)
The running text parser is also the single biggest place where a faster language than Python would improve the code. The current code attempts to do as much as possible in regular expressions for the traditional Python reason and this creates much of its complexity in various ways. A straight reimplementation in eg Go could probably dispense with a lot of this in favour of simpler lexing.
PS: this is going to be an irregular series because to write entries I have to actually understand chunks of DWiki's wikitext rendering code and this takes serious work at this point. If I'm a smart person I'll write all sorts of comments in the source code during this process in addition to these entries.