Wandering Thoughts archives

2022-11-14

Firefox will now copy non-breaking spaces from HTML and that can be a problem

For many years, Firefox's copy and paste had a limitation or bug: if you selected something that contained non-breaking spaces and copied it, the non-breaking spaces turned into regular spaces when you pasted. This was tracked in bug #359303 and then bug #1769534. In recent Firefox Nightly builds, Mozilla (finally) changed this behavior, but the change has created a new issue for some people, me included. At least on Unix, a number of programs don't consider non-breaking spaces to be spaces (or whitespace at all).

(I initially thought this change landed in Firefox 103, but that release only has the fix for the main bug, #359303. The copy-from-HTML issue, bug #1769534, is only fixed in Nightly. Or changed, from my perspective.)

Suppose, not hypothetically, that you have an email archive of instructional email on your local internal website, one that was made by something like Hypermail. Your original plain text email contained indented commands to run, which Hypermail rendered using non-breaking spaces to force a given amount of indent. When you then select and copy these indented commands, various things go wrong.

  • If you paste such an indented command into Bash (or some other shell using GNU Readline), it will fail with an error. Readline passes the non-breaking spaces through intact, and since Bash doesn't consider them whitespace, it takes them as part of the command name. This can result in cryptic errors of the form '<some amount of apparent whitespace> prog: command not found'.

  • If you paste this into Vim (in a terminal such as xterm or gnome-terminal), what you get looks like a line that starts with some spaces, but Vim commands like '<<' won't unindent it and programs like 'unexpand' won't turn those 'spaces' back into tabs.

  • If you paste a code example for a language like Python (that cares about indentation) into either a file or an interactive Python session, you're likely to get an error like 'SyntaxError: invalid non-printable character U+00A0'.
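The inconsistency between tools is easy to reproduce from Python, which conveniently has both behaviors in one place. The sketch below uses the standard `shlex` module, whose default whitespace set is only space, tab, carriage return and newline, as a rough stand-in for shell word splitting (an approximation, not Bash's actual parser), and `compile()` as a stand-in for pasting into a Python session. The helper `compiles` is a hypothetical name used for illustration. Python's own `str.split()` does treat U+00A0 as whitespace, which is part of why the problem is easy to miss.

```python
import shlex

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, what &nbsp; decodes to

# What a copied, NBSP-indented shell command looks like.
pasted = NBSP * 4 + "ls -l /tmp"
print(pasted.encode("utf-8")[:2])  # b'\xc2\xa0', not b' '

# str.split() uses Unicode whitespace rules, so it drops the NBSPs.
print(pasted.split())       # ['ls', '-l', '/tmp']

# shlex approximates POSIX shell word splitting with an ASCII-only
# whitespace set, so the NBSPs stay glued to the command name,
# much as they do in Bash.
print(shlex.split(pasted))  # ['\xa0\xa0\xa0\xa0ls', '-l', '/tmp']


def compiles(source: str) -> bool:
    """Return True if source compiles as Python; report any SyntaxError."""
    try:
        compile(source, "<paste>", "exec")
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False
    return True


# The same code indented with NBSPs versus ordinary spaces.
snippet = "if True:\n" + NBSP * 4 + "print('hi')\n"
print(compiles(snippet))                     # False (after the error message)
print(compiles(snippet.replace(NBSP, " ")))  # True
```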

The last point means that if you want to render indented code examples on the web with nice fonts and everything, instead of in a <pre> block, you very much don't want to do the indentation with non-breaking spaces (or at least not only with non-breaking spaces). You'll break the code examples for people in at least some environments.
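To make the contrast concrete, here is a small Python sketch with two hypothetical renderers. `render_nbsp_indented` imitates the NBSP-for-indentation style described above (an assumption about that kind of output, not Hypermail's actual code), and `render_pre` keeps the ordinary spaces inside `<pre>`. Decoding entities with `html.unescape()` roughly approximates the characters a reader ends up selecting.

```python
import html


def render_nbsp_indented(source: str) -> str:
    """Hypothetical renderer: each leading space becomes &nbsp;."""
    out = []
    for line in source.splitlines():
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        out.append("&nbsp;" * indent + html.escape(body) + "<br>")
    return "\n".join(out)


def render_pre(source: str) -> str:
    """Render code in <pre>, keeping the original ASCII spaces."""
    return "<pre><code>" + html.escape(source) + "</code></pre>"


code = "if True:\n    print('hi')\n"

# html.unescape() turns entities into the characters a browser shows;
# tags such as <br> are left in place, which doesn't matter for this check.
nbsp_version = html.unescape(render_nbsp_indented(code))
pre_version = html.unescape(render_pre(code))

print("\u00a0" in nbsp_version)  # True
print("\u00a0" in pre_version)   # False
```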

(In case you're curious, GitHub appears to render code samples using tables and HTML <span> alignment. Somehow. I'm sure it's a chunk of work to design.)

The Firefox change for copying from HTML text has some heuristics for detecting when non-breaking spaces should be preserved and when they should be transformed into spaces instead, but those heuristics are fragile. One situation they don't cover is when an HTML renderer has used only non-breaking spaces to create indentation, with no regular spaces interspersed (or before or after them). You can read a description of the specific heuristic in the changeset description or in the comment in the change to nsPlainTextSerializer.cpp.
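If you generate or host HTML like this, it may be worth checking whether it falls into that uncovered case. The following sketch flags lines whose leading indentation consists only of U+00A0 characters, with no ordinary spaces mixed in. It checks for that one pattern as described here; it is not a reimplementation of Firefox's heuristic, and `nbsp_only_indents` is a hypothetical name.

```python
import html

NBSP = "\u00a0"


def nbsp_only_indents(text: str) -> list[int]:
    """Return 1-based numbers of lines indented purely with NBSPs."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Measure the run of leading space-like characters (space or NBSP).
        stripped = line.lstrip(" " + NBSP)
        indent = line[: len(line) - len(stripped)]
        if indent and set(indent) == {NBSP}:
            flagged.append(lineno)
    return flagged


# Decode entities first so the check sees what a browser would render.
page_text = html.unescape(
    "Run this:\n"
    "&nbsp;&nbsp;&nbsp;&nbsp;make install\n"
    "&nbsp; &nbsp; make clean\n"
)
print(nbsp_only_indents(page_text))  # [2]
```

Line 3 is not flagged because its indentation mixes NBSPs with ordinary spaces.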

At the moment, Chrome on Unix still appears to convert non-breaking spaces to plain spaces when you copy text out of it and into a terminal window (since this is X, that qualification is necessary; X copy and paste is somewhat intricate and tangled). I haven't tested this much, since I don't normally use Chrome. It's possible that Chrome does something similar to Firefox's new heuristics but has better-developed heuristics (or at least heuristics that work if you have a bunch of non-breaking spaces used to create indentation). I don't know what Safari or Edge do.

(Microsoft has a Linux version of Edge, which may someday be required in order to use Microsoft Teams and other online apps on Linux, but I have not installed it and don't plan to any time soon.)

As far as I know, there's no about:config option for this behavior in Firefox; the change was considered merely a bugfix. Working around it in Unix programs requires all sorts of per-program settings, although I think you can at least fix it fairly broadly in things that use GNU Readline with the first instructions from this SO answer. There is probably a way to make Vim convert non-breaking spaces into spaces when you paste text in insert mode, but it's a bit tangled because of Vim's 'paste' option, which I have turned on because I need its effects.
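Short of per-program settings, a blunt workaround is to normalize pasted text yourself before handing it to a shell, editor, or interpreter. Here is a minimal Python sketch, assuming UTF-8 text and that only U+00A0 needs fixing. `normalize_nbsp` is a hypothetical name, and this is not the Readline fix from the SO answer.

```python
# Map U+00A0 NO-BREAK SPACE to an ordinary ASCII space. Other Unicode
# spaces (for example U+202F NARROW NO-BREAK SPACE) are deliberately
# left alone; add them to the table if they turn up in your pastes.
NBSP_TABLE = str.maketrans({"\u00a0": " "})


def normalize_nbsp(text: str) -> str:
    """Return text with every U+00A0 replaced by a plain space."""
    return text.translate(NBSP_TABLE)


pasted = "if True:\n\u00a0\u00a0\u00a0\u00a0print('hi')\n"
fixed = normalize_nbsp(pasted)
print(repr(fixed))  # "if True:\n    print('hi')\n"

# After normalization the snippet compiles as ordinary Python.
compile(fixed, "<paste>", "exec")
```

To use this as a pipe filter, you could wrap it in a small script that does `sys.stdout.write(normalize_nbsp(sys.stdin.read()))`, then pipe your clipboard tool's output through it before running or saving the result.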

FirefoxNonbreakingSpacesCopyIssue written at 22:54:31

