Wandering Thoughts archives

2016-05-06

A weird little Firefox glitch with cut and paste

Yesterday I wrote about Chrome's cut and paste, so to be fair today is about my long-standing little irritation with Firefox's cut and paste. Firefox doesn't have Chrome's problem; I can cut and paste from xterm to it without issues. It's going the other way where there's a little issue, under what I've now determined are some odd and very limited circumstances.

The simplest way to discuss this is to show you the minimal HTML to reproduce this issue. Here it is:

<html> <body>
<div>word1
  word2</div>
</body> </html>

If you put this in a .html file, point Firefox at the file, double click on 'word2', and try to paste it somewhere, you will discover that Firefox has put a space in front of it (at least on X). If you take out the whitespace before 'word2' in the HTML source, the space in the paste goes away. No matter how many spaces are before 'word2', you only get one in the pasted version; however, if you put a real hard tab before word2, you get a tab instead of a space.

(You can add additional ' wordN' lines, and they'll get spaces before the wordN when pasted. Having only word2 is just the minimal version.)
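
(If you want to see the extra space directly instead of pasting somewhere and squinting, you can dump the X PRIMARY selection after double-clicking 'word2'. Here's a quick sketch; using xsel or xclip is just my suggestion for a convenient way to look at the selection, not part of the reproduction itself.)

  # after double-clicking 'word2' in Firefox, dump the PRIMARY selection
  xsel -o --primary | od -c
  # or, with xclip instead of xsel:
  xclip -o -selection primary | od -c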

You might wonder how I noticed such a weird thing. The answer is that this structure is present on Wandering Thoughts entry pages, such as the one for this entry. If you look up at the breadcrumbs at the top (and at the HTML source), the name of the page is structured like this. As it happens, I do a lot of selecting the filenames of existing entries when I'm writing new entries (many of my entries refer back to other entries), so I hit this all the time.

(Ironically I would not have hit this issue if I didn't care about making the HTML generated by DWiki look neat. The breadcrumbs are autogenerated, so there's no particular reason to indent them in the HTML; it just makes the HTML look better.)

This entry is also an illustration of the value of writing entries at all. Firefox has been doing this for years and years, and for those years and years I just assumed it was a known issue because I never bothered to narrow down exactly when it happened. Writing this entry made me systematically investigate the issue and even narrow down a minimal reproduction, so I can file a Firefox bug report. I might even get a fixed version someday.

PS: If you use Firefox on Windows or Mac, I'd be interested to know if this cut and paste issue happens on them or if it's X-specific.

web/FirefoxCutAndPasteBug written at 00:58:35; Add Comment

2016-05-04

My annoyance with Chrome's cut and paste support under X

In the past I've vaguely alluded to having problems getting xterm and Chrome to play nicely together for selecting text in xterm and pasting it into Chrome (eg here). At the time I couldn't clearly reproduce my problems, so I wrote it off as due to fallible memory. Well, my memory may be fallible but I also wasn't wrong about Chrome not playing nice with xterm's normal selections. It definitely happens and it's one of the real irritations with using Chrome for me.

Certainly, sometimes I can select text in xterm and paste it into an HTML form field in Chrome with the middle mouse button. But not always. I'm not sure if it most frequently happens with the fields involved in login forms, or if that's just my most common use of moving text from xterm to Chrome. Of course, being unable to paste a complex random password is very irritating. As a result, I basically always resort to my way of making xterm do real Copy and Paste; this is longer, but so far it's always worked.

(Instead of 'select, move mouse over, paste with middle mouse button', it is now 'select, hit Ctrl-Shift-C, move mouse over, use right mouse button to bring up Chrome field menu, select Paste'. I could save a bit of time by remembering that Ctrl-V is paste in Chrome, but it doesn't yet quite seem worth doing that.)
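
(The distinction at work in that longer dance is the one between the X PRIMARY selection, which a plain mouse select sets and which middle-button paste normally reads, and the CLIPBOARD selection, which explicit Copy and Paste operations use. If you want to poke at what actually wound up in each one while reproducing this, something like the following works; note that a Ctrl-Shift-C 'copy to clipboard' binding is my own xterm setup, not a universal default.)

  # what a plain mouse select in xterm left in the PRIMARY selection:
  xclip -o -selection primary
  # what an explicit copy-to-clipboard put in the CLIPBOARD selection:
  xclip -o -selection clipboard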

Now that I look, this seems to be a known issue, based on Chromium bug #68886. That's been open since 2011, so I doubt things are going to change any time soon. I guess maybe I should memorize Ctrl-V.

(I know, this sounds petty. But sometimes it's the little things that really get under my skin and rob me of confidence in a program. If I don't understand when and why Chrome is not accepting text I'm trying to paste in from xterm, what other unpleasant things are lurking in its behavior that I just haven't stumbled into yet because I don't (yet) use it enough?)

web/ChromeCutAndPasteAnnoyance written at 23:53:11; Add Comment

The better way to clear SMART disk complaints, with safety provided by ZFS

A couple of months ago I wrote about clearing SMART complaints about one of my disks by very carefully overwriting sectors on it, and how ZFS made this kind of safe. In a comment, Christian Neukirchen recommended using hdparm --write-sector to overwrite sectors with read errors instead of the complicated dance with dd that I used in my entry. As it happens, that disk coughed up a hairball of smartd complaints today, so I got a chance to go through my procedures again and the advice is spot on. Using hdparm makes things much simpler.

So my revised steps are:

  1. Scrub my ZFS pool in the hopes that this will make the problem go away. It didn't, which means that any read errors in the partition for the ZFS pool are in space that ZFS shouldn't be using.

  2. Use dd to read all of the ZFS partition. I did this with 'dd if=/dev/sdc7 of=/dev/null bs=512k conv=noerror iflag=direct'. This hit several bad spots, each of which produced kernel errors that included a line like this:
    blk_update_request: I/O error, dev sdc, sector 1748083315
    

  3. Use hdparm --read-sector to verify that this is indeed the bad sector:
    hdparm --read-sector 1748083315 /dev/sdc
    

    If this is the correct sector, hdparm will report a read error and the kernel will log a failed SATA command. Note that this is not a normal disk read, as hdparm is issuing a low-level read, so you don't get a normal error message; instead you get something like this:

    ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    ata3.00: irq_stat 0x40000001
    ata3.00: failed command: READ SECTOR(S) EXT
    ata3.00: cmd 24/00:01:73:a2:31/00:00:68:00:00/e0 tag 3 pio 512 in
             res 51/40:00:73:a2:31/00:00:68:00:00/00 Emask 0x9 (media error)
    [...]
    

    The important thing to notice here is that you don't get the sector reported (at least not in decoded form), so you have to rely on getting the sector number correct in the hdparm command instead of being able to cross check it against earlier kernel logs.

    (Sector 1748083315 is 0x6831a273 in hex. All the bytes are there in the cmd part of the message, but clearly shuffled around.)

  4. Use hdparm --write-sector to overwrite the sector, forcing it to be spared out:
    hdparm --write-sector 1748083315 <magic option> /dev/sdc
    

    (hdparm will tell you what the hidden magic option you need is when you use --write-sector without it. There's a small scripted sketch of this verify-and-overwrite dance after the end of this list.)

  5. Scrub my ZFS pool again and then re-run the dd to make sure that I got all of the problems.
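
Here's the small scripted sketch of the verify-and-overwrite dance I mentioned in step 4. It only does the read check; the actual --write-sector command is printed as a reminder rather than run, partly because I don't want a script blindly overwriting sectors and partly to leave out hdparm's confirmation option (hdparm itself will tell you what it is). The sector number is just the example from above.

  # sketch: re-check suspect sector numbers taken from the kernel's
  # blk_update_request error messages
  for sect in 1748083315; do
      echo "== sector $sect"
      hdparm --read-sector "$sect" /dev/sdc
      echo "   if that reported a read error, spare the sector out with:"
      echo "   hdparm --write-sector $sect <magic option> /dev/sdc"
  done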

I was pretty sure I'd gotten everything even before the re-scrub and the re-dd scan, because smartd reported that there were no more currently unreadable (pending) sectors or offline uncorrectable sectors, both of which it had been complaining about before.
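
(If you don't want to wait for smartd, you can check the same attributes by hand with smartctl. The attribute names below are the usual ones, although your drive may label them a bit differently.)

  smartctl -A /dev/sdc | grep -E 'Current_Pending_Sector|Offline_Uncorrectable'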

This was a lot easier and more straightforward to go through than my previous procedure, partly because I can directly reuse the sector numbers from the kernel error messages without problems and partly because hdparm does exactly what I want.

There's probably a better way to scan the hard drive for read errors than dd. I'm a little bit nervous about my 512 KB block size here potentially hiding a second bad sector that's sufficiently close to the first, but especially with direct IO I think it's a tradeoff between speed and thoroughness. Possibly I should explore how well the badblocks program works here, since it's the obvious candidate.
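
For what it's worth, a basic read-only badblocks scan of the same partition would look something like this. My understanding is that badblocks defaults to a non-destructive read-only test, but double-check that on your version before pointing it at a disk you care about.

  # read-only scan with progress; -b is the test block size in bytes
  badblocks -sv -b 4096 /dev/sdc7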

(These days I force dd to use direct IO when talking to disks because that way dd does much less damage to the machine's overall performance.)

(This is the kind of entry that I write because I just looked up my first entry for how to do it again, so clearly I'm pretty likely to wind up doing this a third time. I could just replace the drive, but at this point I don't have enough drive bay slots in my work machine's case to do this easily. Also, I'm a peculiar combination of stubborn and lazy where it comes to hardware.)

linux/ClearingSMARTComplaintsII written at 00:19:21; Add Comment

2016-05-02

How I think you set up fair share scheduling under systemd

When I started writing this entry, I was going to say that systemd automatically does fair share scheduling between users and describe the mechanisms that make that work. However, this turns out to be false as far as I can see; systemd can easily do fair share scheduling, but it doesn't do this by default.

The basic mechanics of fair share scheduling are straightforward. If you put all of each user's processes into a separate cgroup it happens automatically. Well. Sort of. You see, it's not good enough to put each user into a separate cgroup; you have to make it a CPU accounting cgroup, and a memory accounting cgroup, and so on. Systemd normally puts all processes for a single user under a single cgroup, which you can see in eg systemd-cgls output and by looking at /sys/fs/cgroup/systemd/user.slice, but by default it doesn't enable any CPU or memory or IO accounting for them. Without those enabled, the traditional Linux (and Unix) behavior of 'every process for itself' still applies.

(You can still use systemd-run to add your own limits here, but I'm not quite sure how this works out.)
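
(If you want to see what accounting is actually turned on for a particular user's slice, something like the following should show it. I'm using UID 1000's slice purely as an example here.)

  systemd-cgls /user.slice
  systemctl show -p CPUAccounting,MemoryAccounting,BlockIOAccounting user-1000.slice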

Now, I haven't tested the following, but from reading the documentation it seems that what you need to do to get fair share scheduling for users is to enable DefaultCPUAccounting and DefaultBlockIOAccounting for all user units by creating an appropriate file in /etc/systemd/user.conf.d, as covered in the systemd-user.conf manpage and the systemd.resource-control manpage. You probably don't want to turn this on for system units, or at least I wouldn't.
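
As a concrete sketch of what I mean (and again, I haven't tested this; it's just my reading of the manpages), the drop-in file might look like this, with the file name itself being arbitrary:

  # /etc/systemd/user.conf.d/50-fairshare.conf
  [Manager]
  DefaultCPUAccounting=yes
  DefaultBlockIOAccounting=yes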

I don't think there's any point in turning on DefaultMemoryAccounting. As far as I can see there is no kernel control that limits a cgroup's share of RAM, just the total amount of RAM it can use, so cgroups just can't enforce fair share scheduling of RAM the way you can for CPU time (unless I've overlooked something here). Unfortunately, missing fair share memory allocation definitely hurts the overall usefulness of fair share scheduling; if you want to ensure that no user can take an 'unfair' share of the machine, it's often just as important to limit RAM as CPU usage.

(Having discovered this memory limitation, I suspect that we won't bother trying to enable fair share scheduling in our Ubuntu 16.04 installs.)

linux/SystemdFairshareScheduling written at 23:11:23; Add Comment

The state of supporting many groups over NFS v3 in various Unixes

One of the long standing limits with NFSv3 is that the protocol only uses 16 groups; although you can be in lots of groups on both the client and the server, the protocol itself only allows the client to tell the server about 16 of them. This is a real problem for places (like us) who have users who want or need to be in lots of groups for access restriction reasons.

For a long time the only thing you could do was shrug and work around this by adding and removing users from groups as their needs changed. Fortunately this has been slowly changing, partly because people have long seen this as an issue. Because the NFS v3 protocol is fixed, everyone's workaround is fundamentally the same: rather than taking the list of groups from the NFS request itself, the NFS server looks up what groups the user is in on the server.

(In theory you could merge the local group list with the request's group list, but I don't think anyone does that; they just entirely overwrite the request.)
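
(As one concrete illustration of this server-side lookup, and purely as an aside on my part: on the Linux kernel NFS server this is what rpc.mountd's --manage-gids option turns on. Where the option gets set varies by distribution; the variable below is the Debian/Ubuntu one, if I remember it right.)

  # eg in /etc/default/nfs-kernel-server on Debian/Ubuntu:
  RPCMOUNTDOPTS="--manage-gids"
  # or when running rpc.mountd directly:
  rpc.mountd --manage-gids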

As far as I know, the current state of affairs for various Unixes that we care about runs like this:

I care about how widespread the support for this is because we've finally reached a point where our fileservers all support this and so we could start putting people in more than 16 groups, something that various parties are very much looking forward to. So I wanted to know whether officially adding support for this would still leave us with plenty of options for what OS to run on future fileservers, or whether this would instead be a situation more like ACLs over NFS. Clearly the answer is good news; basically anything we'd want to use as a fileserver OS supports this, even the unlikely candidate of Oracle Solaris.

(I haven't bothered checking out the state of support for this on the other *BSDs because we're not likely to use any of them for an NFS fileserver. Nor have I looked at the state of support for this on dedicated NFS fileserver appliances, because I don't think we'll ever have the kind of budget or need that would make any of them attractive. Sorry, NetApp, you were cool once upon a time.)

unix/NFSManyGroupsState written at 00:46:00; Add Comment

2016-04-30

I should keep and check notes even on my own little problems

I mentioned yesterday that I had a serious issue when I installed a VMWare Workstation update, going from 12.1.0 to 12.1.1. I wound up being very grumpy about it, had the remaining part of my work day disrupted, filed a support request with VMWare, and so on, and eventually VMWare support came through with the cause and a workaround.

It turns out that I could have avoided all of that, because I ran into this same problem back when I upgraded to Fedora 23. At the time I did my Internet research, found the workaround, and applied it to my machine. This means that I could have proceeded straight to re-doing the workaround if I'd remembered this. Or, more likely, if I'd kept good notes on the problem then and remembered to read them this time.

We try to make and keep good notes for problems on our production systems, or even things that we run into in testing things for our work environment; we have an entire system for it. But I generally don't bother doing the same sort of thing for my own office workstation; when I find and fix problems and issues I may take some notes, but they're generally sketchy, off the cuff, and not centrally organized. And partly because of this, I often don't think to check them; I think I just assume I'm going to remember things about my own workstation (clearly this is wrong).

So, stating the obvious: I would be better off if I kept organized notes about what I had to do to fix problems and get various things going on my workstation, and put the notes into one place in some format (perhaps a directory with text files). Then I could make it a habit to look there before I do some things, or at least when I run into a problem after I do something.

Also, when I make these notes I should make them detailed, including dates and versions of what they're about. It turns out that I actually had some very sketchy notes about this problem from when I upgraded to Fedora 23 (they were some URLs that turned out to be discussions about the issue), but they didn't have a date or say 'this applied when I upgraded to Fedora 23 with VMWare 12' or anything like that. So when I stumbled over the file and skimmed it, I didn't realize that the URLs were still relevant; I skipped that because I assumed that of course it had to be outdated.

(I'm sure that when I wrote the note file in the first place I assumed that I'd always remember the context. Ha ha, silly me, I really should know better by now. Especially since I've written more than one entry here about making just that assumption and being wrong about it.)

sysadmin/KeepAndCheckNotesOnMyProblems written at 23:32:46; Add Comment

A story of the gradual evolution of network speeds without me noticing

A long time ago I had a 28.8Kbps dialup connection running PPP (it lasted a surprisingly long time). A couple of times I really needed to run a graphical X program from work while I was at home, so I did 'ssh -X work' and then started whatever program it was. And waited. And waited. Starting and using an X program that is moving X protocol traffic over a 28.8K link gives you a lot of time to watch the details of how X applications paint their windows, and it teaches you patience. It's possible, but it's something you only really do in desperation.

(I believe one of the times I did this was when I really needed to dig some detail out of SGI's graphical bug reporting and support tool while I was at home. This was back in the days before all of this was done through the web.)

Eventually I finally stepped up to DSL (around this time), although not particularly fast DSL; I generally got 5000 Kbps down and 800 Kbps up. I experimented with doing X over my DSL link a few times and it certainly worked, but it still wasn't really great. Simple text stuff like xterm (with old school server side XLFD fonts) did okay, but trying to run something graphical like Firefox was still painful and basically pointless. At the time I first got my DSL service I think that 5/.8 rate was pretty close to the best you could get around here, but of course that changed and better and better speeds became possible. Much like I stuck with my dialup, I didn't bother trying to look into upgrading for a very long time. More speed never felt like it would make much of a difference to my Internet experience, so I took the lazy approach.

Recently various things pushed me over the edge and I upgraded my DSL service to what is about 15/7.5 Mbps. I certainly noticed that this made a difference for things like pushing pictures up to my Flickr, but sure, that was kind of expected with about ten times as much upstream bandwidth. Otherwise I didn't feel like it was any particular sea change in my home Internet experience.

Today I updated my VMWare Workstation install and things went rather badly. Since I'd cleverly started doing all of this relatively late in the day, I wound up going home before VMWare had a chance to reply to the bug report I filed about this. When I got home, I found a reply from VMWare support that, among other things, pointed me to this workaround. I applied the workaround, but how to test it? Well, the obvious answer was to try firing up VMWare Workstation over my DSL link. I didn't expect this to go very well for the obvious reasons; VMWare Workstation is definitely a fairly graphical program, not something simple (in X terms) like xterm.

Much to my surprise, VMWare Workstation started quite snappily. In fact, it started so fast and seemed so responsive that I decided to try a crazy experiment: I actually booted up one of my virtual machines. Since this requires rendering the machine's console (more or less embedded video), I expected it to be really slow, but even this went pretty well.

Bit by bit and without me noticing, my home Internet connection had become capable enough to run even reasonably graphically demanding X programs. The possibility of this had never even crossed my mind when I considered a speed upgrade or got my 15/7.5 DSL speed upgrade; I just 'knew' that my DSL link would be too slow to be really viable for X applications. I didn't retest my assumptions when my line speed went up, and if it hadn't been for this incident going exactly like it did I might not have discovered this sea change for years (if ever, since when you know things are slow you generally don't even bother trying them).

There's an obvious general moral here, of course. There are probably other things I'm just assuming are too slow or too infeasible or whatever that are no longer this way. Assumptions may deserve to be questioned and re-tested periodically, especially if they're assumptions that are blocking you from nice things. But I'm not going to be hard on myself here, because assumptions are hard to see. When you just know something, you are naturally a fish in water. And if you question too many assumptions, you can spend all of your time verifying that various sorts of water are still various sorts of wet and never get anything useful done.

(You'll also be frustrating yourself. Spending more than a small bit of your time verifying that water is still wet is not usually all that fun.)

tech/HomeInternetSpeedChanges written at 02:18:23; Add Comment

2016-04-29

You should plan for your anti-spam scanner malfunctioning someday

Yesterday I mentioned that the commercial anti-spam and anti-virus system we use ran into a bug where it hung up on some incoming emails. One reaction to this is to point and laugh; silly us for using a commercial anti-spam system, we probably got what we deserved here. I think that this attitude is a mistake.

The reality is that all modern anti-spam and anti-virus systems are going to have bugs. It's basically inherent in the nature of the beast. These systems are trying to do a bunch of relatively sophisticated analysis on relatively complicated binary formats, like ZIP files, PDFs, and various sorts of executables; it would be truly surprising if all of the code involved in doing this was completely bug free, and every so often the bugs are going to have sufficiently bad consequences to cause explosions.

(It doesn't even need to be a bug as such. For example, many regular expression engines have pathological behavior when exposed to a combination of certain inputs and certain regular expressions. This is not a code bug since the RE engine is working as designed, but the consequences are similar.)

What this means is that you probably want to think ahead about what you'll do if your scanner system starts malfunctioning at the level of either hanging or crashing when it processes a particular email message. The first step is to think about what might happen with your overall system and what it would look like to your monitoring. What are danger signs that mean something isn't going right in your mail scanning?

Once you've considered the symptoms, you can think about pre-building some mail system features to let you deal with the problem. Two obvious things to consider are documented ways of completely disabling your mail scanner and forcing specific problem messages to bypass the mail scanner. Having somewhat gone through this exercise myself (more than once by now), I can assure you that developing mailer configuration changes on the fly as your mail system is locking up is what they call 'not entirely fun'. It's much better to have this sort of stuff ready to go in advance even if you never turn out to need it.

(Building stuff on the fly to solve your urgent problem can be exciting even as it's nerve-wracking, but heroism is not the right answer.)

At this point you may also want to think about policy issues. If the mail scanner is breaking, do you have permission to get much more aggressive with things like IP blocks in order to prevent dangerous messages from getting in, or is broadly accepting email important enough to your organization to live with the added risks of less or no mail scanning? There's no single right answer here and maybe the final decisions will only be made on the spot, but you and your organization can at least start to consider this now.

spam/PlanForSpamScannerMalfunction written at 00:21:06; Add Comment

2016-04-28

You should probably track what types of files your users get in email

Most of the time our commercial anti-spam system works fine and we don't have to think about it or maintain it (which is one of the great attractions of using a commercial system for this). Today was not one of those times. This morning, we discovered that some incoming email messages we were receiving make its filtering processes hang using 100% CPU; after a while, this caused all inbound email to stop. More specifically, the dangerous incoming messages appeared to be a burst of viruses or malware in zipped .EXEs.

This is clearly a bug and hopefully it will get fixed, but in the mean time we needed to do something about it. Things like, say, blocking all ZIP files, or all ZIP files with .EXEs in them. As we were talking about this, we realized something important: we had no idea how many ZIP files our users normally get, especially how many (probably) legitimate ones. If we temporarily stopped accepting all ZIP file attachments, how many people would we be affecting? No one, or a lot? Nor did we know what sort of file types are common or uncommon in the ZIP files that our users get (legitimate or otherwise), or what sort of file types users get other than ZIP files. Are people getting mailed .EXEs or the like directly? Are they getting mailed anything other than ZIP files as attachments?

(Well, the answer to that one will be 'yes', as a certain amount of HTML email comes with attached images. But you get the idea.)

Knowing this sort of information is important for the same reason as knowing what TLS ciphers your users are using. Someday you may be in our situation and really want to know if it's safe to temporarily (or permanently) block something, or whether it'll badly affect users. And if something potentially dangerous has low levels of legitimate usage, well, you have a stronger case for preemptively doing something about it. All of this requires knowing what your existing traffic is, rather than having to guess or assume, and for that you need to gather the data.

Getting this sort of data for email does have complications, of course. One of them is that you'd really like to be able to distinguish between legitimate email and known spam in tracking this sort of stuff, because blocking known spam is a lot different than blocking legitimate email. This may require logging things in a way that either directly ties them to spam level information and so on or at least lets you cross-correlate later between different logs. This can affect where you want to do the logging; for example, you might want to do logging downstream of your spam detection system instead of upstream of it.

(This is particularly relevant for us because obviously we now need to do our file type blocking and interception upstream of said anti-spam system. I had been dreaming of ways to make it log information about what it saw going by even if it didn't block things, but now maybe not; it'd be relatively hard to correlate its logs against our anti-spam logs.)

spam/KnowingAttachmentTypes written at 01:36:06; Add Comment

2016-04-26

How 'there are no technical solutions to social problems' is wrong

One of the things that you will hear echoing around the Internet is the saying that there are no technical solutions to social problems. This is sometimes called 'Ranum's Law', where it's generally phrased as 'you can't fix people problems with software' (cf). Years ago you probably could have found me nodding along sagely to this and wholeheartedly agreeing with it. However, I've changed; these days, I disagree with the spirit of the saying.

It is certainly true that you cannot outright solve social problems with technology (well, almost all of the time). Technology is not that magical, and the social is more powerful than the technical barring very unusual situations. And social problems are generally wicked problems, which are extremely difficult to tackle. This is an important thing to realize, because social problems matter and computing has a great tendency to either ignore them outright or assume that our technology will magically solve them for us.

However, the way that this saying is often used is for technologists to wash their hands of the social problems entirely, and this is a complete and utter mistake. It is not true that technical measures are either useless or socially neutral, because the technical is part of the world and so it basically always affects the social. In practice, in reality, technical features often strongly influence social outcomes, and it follows that they can make social problems more or less likely. That social problems matter means that we need to explicitly consider them when building technical things.

(The glaring example of this is all the various forms of spam. Spam is a social problem, but it can be drastically enabled or drastically hindered by all sorts of technical measures and so sensible modern designers aggressively try to design spam out of their technical systems.)

If we ignore the social effects of our technical decisions, we are doing it wrong (and bad things usually ensue). If we try to pretend that our technical decisions do not have social ramifications, we are either in denial or fools. It doesn't matter whether we intended the social ramifications or didn't think about them; in either case, we may rightfully be at least partially blamed for the consequences of our decisions. The world does not care why we did something, all it cares about is what consequences our decisions have. And our decisions very definitely have (social) consequences, even for small and simple decisions like refusing to let people change their login names.

Ranum's Law is not an excuse to live in a rarefied world where all is technical and only technical, because such a rarefied world does not exist. To the extent that we pretend it exists, it is a carefully cultivated illusion. We are certainly not fooling other people with the illusion; we may or may not be fooling ourselves.

(I feel I have some claim to know what the original spirit of the saying was because I happened to be around in the right places at the right time to hear early versions of it. At the time it was fairly strongly a 'there is no point in even trying' remark.)

tech/SocialProblemsAndTechnicalDecisions written at 23:50:13; Add Comment

