2007-10-31
What may be causing my random NumLock issues
I think I may have finally woken up to a (potential) cause for my
random NumLock issues: the xine media player. One
indicator, one that I really should have paid attention to earlier, is
that often starting xine will turn on NumLock (and sometimes turn it
off immediately afterwards). It is of course hard to be entirely sure
since the problem is erratic, but I've been doing some testing since
I noticed the correlation and I don't think I've ever had the problem
when xine wasn't running.
I suspect that xine is not the only thing at fault, partly because this happens far more on my home machine than my work machine. It's certainly possible that xine happens to be the only program I usually run that does some particular operation that makes the X server hiccup in a way that toggles NumLock.
This is kind of irritating, since xine is what I use to play AAC+ music streams, and an AAC+ stream is what I tend to put on as background music at the office. With my keyboard, flipping NumLock on causes very wild things to happen in vi, and now that I have a strong suspicion, I should probably stop using xine at work.
(I can't say I'm exactly listening to the stream a lot of the time; one of the reasons I listen to this stream instead of actual music albums or the like is that I can stand to interrupt it at the drop of a hat. With albums, I get grumpy if I can't listen to them all the way through without interruptions.)
Mind you, the real cure may be to upgrade my ancient Fedora Core 6 installations to Fedora Core 7 (or wait a bit for FC8). That might even fix xmms's problems with AAC+ streams, which would make me happy since I am not entirely fond of xine.
(In the meantime, it appears that using mplayer gets me most of what
I want. It handles AAC+ streams, it's verbose but non-GUI, and it uses
both less memory and less CPU than xine. I'll call myself sold.)
2007-10-23
How we sized the overcommit ratio
When we set up strict overcommit mode, we had to pick an overcommit ratio or, to look at it another way, we had to pick how much total address space commitment to allow. Because we first did this for compute servers, we decided to size things so that an active process would be able to use more or less all of the machine's physical memory, plus some extra on top to account for system processes that would get pushed into swap.
The logic is relatively straightforward:
- on dedicated compute servers with large amounts of RAM, we can
assume that must-have kernel memory is a negligible amount of
real memory.
- the last thing we want on a compute server is swap thrashing because that will kill performance for
both the job and the system; we would rather have jobs fail
outright.
- we have to assume that processes that ask for a lot of memory will use it; it is the only safe assumption.
- we further assume that there is no such thing as idle jobs;
if they exist, they're running (and thus using their memory,
and will thrash the machine if they wind up in swap).
- there will be some amount of ssh daemons, shells, and so on, but they will not use much memory.
Hence our target total address space commitment is the amount of RAM on the server plus a gigabyte or two to account for both the kernel's memory needs and the idle extra processes that will get shoved off to swap. Allowing more than physical memory does open up the possibility of going into swap thrashing, but it seems better to err on the liberal side just to make sure that people can extract every usable byte of RAM if they want to. (I am pretty sure that our users do not want us to save them from themselves quite that badly.)
(Unfortunately this requires fiddling the overcommit ratio on each machine to make the numbers come out right for its specific amount of RAM. I wish you could specify the total address space commitment as 'real memory plus <X>', where <X> might be negative.)
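As a rough sketch of the per-machine arithmetic (not the procedure we actually used): in strict mode the kernel's commit limit is, ignoring huge pages, swap plus overcommit_ratio percent of RAM, so the ratio for a target of 'RAM plus X' can be solved for directly. The function names and the example sizes below are made up for illustration.

```python
GIB_KB = 1024 * 1024  # one GiB expressed in kB, the unit /proc/meminfo uses


def overcommit_ratio(ram_kb: int, swap_kb: int, extra_kb: int) -> int:
    """Ratio that gives a commit limit of roughly RAM + extra.

    Assumes the strict-mode formula (huge pages ignored):
        CommitLimit = swap + RAM * overcommit_ratio / 100
    The result is rounded down and clamped at zero; it can exceed 100
    when the machine has little swap.
    """
    target_kb = ram_kb + extra_kb
    return max(0, (target_kb - swap_kb) * 100 // ram_kb)


def commit_limit_kb(ram_kb: int, swap_kb: int, ratio: int) -> int:
    """Commit limit the kernel would compute for a given ratio."""
    return swap_kb + ram_kb * ratio // 100


# Hypothetical machine: 16 GiB RAM, 4 GiB swap, 2 GiB of extra allowance.
ratio = overcommit_ratio(16 * GIB_KB, 4 * GIB_KB, 2 * GIB_KB)
print(ratio, commit_limit_kb(16 * GIB_KB, 4 * GIB_KB, ratio))
```

The resulting number is what would go into the vm.overcommit_ratio sysctl on that machine. Because the ratio is an integer percentage, the achieved limit only approximates the target, which is part of why a direct 'real memory plus X' setting would be nicer.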
Our concerns with strict overcommit on our login servers come up precisely where these assumptions start breaking down, and now that I've written them out explicitly I can easily see that. We're probably okay on kernel memory usage, but some of the others are clearly off (eg, that all memory consuming processes are active at once).
2007-10-22
Vim options it turns out I want
This server recently moved from Fedora
Core 2 to FreeBSD, and in the process I discovered that I had quietly
become addicted to a few vim features, despite past dislikes of overly intelligent things that call themselves vi.
(I am pleased to report that the default FreeBSD version of vim
does not behave that way.)
So it turns out to be necessary to set a few vim parameters to get
it to behave the way I want. For my future reference, here's what I've
found I need to set in $HOME/.vimrc so far:
- set nocompatible: This is the easiest way to get multi-level undo,
which has become my single must-have, cannot-live-without-it
vim feature. (I should have expected this; I already knew that multi-level undo was addictive from using other editors with it.)
- set backspace=indent,eol,start: I have also gotten used to being able
to backspace over anything, end of line included.
- let loaded_matchparen = 1: This is one of those anti-features; I do
not want vim to be freakily super-intelligent about (allegedly) matching delimiters.
I still sort of want vim to behave like basic vi, but apparently
missing these features is now too basic for me. Such is the corrupting
experience of using Linux, with its array of convenient extensions and
GNU this and that.
(I care about this partly because I write most WanderingThoughts entries
on this machine in vi, mostly out of inertia and habit.)
2007-10-17
Our experience with Linux's strict overcommit mode
As a follow-up to 64BitDrawback: after we had several machines crash due to being driven out of memory, dealing with the whole issue suddenly got a whole lot more urgent and we opted to try to solve it by turning on Linux's strict overcommit mode for swap allocation. At first we did this only on our compute servers, but after some of our login servers also OOM'd and crashed, we enabled it on them too.
(Strict overcommit has the great advantage that we don't need to pick a somewhat arbitrary number for a per-process size limit, and thus it is hard for people to be too displeased with it.)
On the compute servers this has worked great and I consider it a big win. We have seen it choke off runaway jobs that would otherwise have killed the machine, and do so without perturbing anything else, so it definitely does what we want it to, and no users have complained. (I'm not sure any of them have noticed, since the overcommit ratio we picked allows them to use all of the physical memory.)
Things are less clear on the login servers. Despite having lots of free memory and no swap usage, their committed address space grows slowly over time and after a while approaches the commit limit; at the worst, this could leave us with a system that can't start new processes despite having lots of capacity left.
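One way to watch this happen is to compare the Committed_AS and CommitLimit fields that Linux reports in /proc/meminfo. The following is a minimal sketch of that check, not a tool we actually run; the function name and the sample numbers are invented for illustration.

```python
def commit_usage(meminfo_text: str) -> float:
    """Return Committed_AS / CommitLimit from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "Committed_AS:   9395240 kB"; keep the number.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["Committed_AS"] / fields["CommitLimit"]


# Made-up sample; on a real machine, read the text from /proc/meminfo:
#     with open("/proc/meminfo") as f:
#         print(f"{commit_usage(f.read()):.0%} of the commit limit is in use")
sample = (
    "MemTotal:       16777216 kB\n"
    "CommitLimit:    18790481 kB\n"
    "Committed_AS:    9395240 kB\n"
)
print(f"{commit_usage(sample):.0%}")
```

Logging this figure periodically on a login server would show whether committed address space levels off or keeps creeping toward the limit.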
I can only conclude that modern graphical applications actually do allocate a bunch of address space that they don't wind up using, for whatever reason. Over time, more people log in and run more programs, many of which are idle, many of which are not using all of their committed address space and never will, and the total committed space grows and grows. (Perhaps sometime it will reach a steady state.)
It's not at all clear what commit limit is appropriate in this situation, although we can probably defend a ratio that is very close to 100. If even that is too small, we might as well turn off strict overcommit (and look for another solution to groups of runaway programs; the login servers are 32-bit machines, so no single process can OOM them).