Wandering Thoughts archives

2021-10-29

Why browsers are driven to offer some degree of remote control

Not too long ago I wrote about changes in Firefox's remote control on Unix, and in that entry I mentioned that probably all browsers on all platforms implement some degree of remote control. You might wonder why Firefox has a remote control feature and why this is probably a general thing. My belief is that there is a lesser and a greater reason, but to get there I'm going to start with a scenario.

Suppose that you're reading email in your mail client, and you hit a message with a link that you want to visit. You click on the link, the mail client does something and your browser starts on the link, and then you park the browser with the web page unread (as one does). You move on to another message, which also has a link you need to look at, so you click on the link, the mail client does something, and one way or another you can read the new web page.

One possible answer for what happened is that you now have two running copies of your browser, one displaying the first web page and the second displaying the second web page (and every new link you click in your mail client will start another copy of the browser). However, there are two problems with this. The lesser problem is that browsers are big, resource-consuming things, so running extra copies of them is less than ideal. All browsers can open multiple windows or tabs, so it would be a significant resource saving if we could run just one copy and have it display all of the web pages.

The bigger problem is your web browser profile, which almost always contains things that your browser will want to update (such as your history and your current cookies). If there are multiple copies of your browser running they all need to coordinate access to your profile; they need to carefully lock it for updates, and probably to notify each other that important things have been updated. Essentially it's a shared database, and shared databases are pain points. Given the locking, it's also a fine way for one copy of the browser to cause another copy to perform badly, as the other copy waits for an update lock or to pick up new things or whatever.

Life is much easier if only one instance of the browser can be running in a given profile at a time. Then that browser instance can write out updates relatively freely, and its internal information (such as your cookies) is as coherent between windows and tabs as it wants it to be (in the old days lots of things were always coherent, but these days they're increasingly isolated).

These two issues together provide strong motivation for browsers to implement some sort of remote control. This might be through a defined system API that browsers hook into (and your mail client uses), or it might be through the browser itself being able to find the master instance that currently owns access to a particular profile. The latter approach means that if people just run the browser (either from their desktop or a command line, with or without a URL), they get what they want.
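As a minimal sketch of the second approach (the browser finding the master instance itself), here is one way it could work on Unix with a socket that lives in the profile directory. This is an illustration under my own assumptions, not how Firefox or any other browser actually does it; the function name open_or_forward and the socket file name remote-control.sock are made up. A real implementation would also need a lock to handle two copies starting at the same moment.

```python
import os
import socket
import tempfile  # only needed for the usage example below


def open_or_forward(profile_dir, url):
    """Hand url to a running master instance, or become the master.

    Returns ("forwarded", None) if a master took the URL, or
    ("master", listening_socket) if this process now owns the profile.
    """
    # Hypothetical socket name inside the profile directory.
    path = os.path.join(profile_dir, "remote-control.sock")

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
    except (FileNotFoundError, ConnectionRefusedError):
        # No socket file, or a stale one with nobody listening.
        client.close()
    else:
        with client:
            client.sendall(url.encode("utf-8") + b"\n")
        return "forwarded", None

    # Nobody answered: remove any stale socket file and take ownership.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    return "master", server


# Usage: the first call becomes the master, the second forwards to it.
profile = tempfile.mkdtemp()
first_role, master_sock = open_or_forward(profile, "https://example.org/a")
second_role, _ = open_or_forward(profile, "https://example.org/b")
```

The point of the sketch is that the second invocation never touches the profile at all; it only passes the URL along and exits, which is what lets the master be the sole writer.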

web/BrowsersWhyRemoteControl written at 23:01:03

Things to do in Python 3 when your Unix standard input is badly encoded

Today I had a little adventure with Python 3. I have a program that takes standard input, reads and lightly processes a bunch of headers before writing them out, then just copies the body (of an email message, as it happens) from standard input to standard output. Normally it gets well formed input, with no illegally encoded UTF-8. Today, there were some stray bytes and the world blew up. Dealing with this was far harder than it should have been, partly because the documentation has issues.

Although the documentation for sys.stdin will not tell you this, sys.stdin most likely has the API of io.TextIOWrapper. Otherwise, your only method for finding out what attributes and methods it supports is the ever friendly 'help(type(sys.stdin))' in a Python interpreter. If you're on Python 3.7 or later, what you probably want to do about a badly encoded standard input is change how it handles encoding errors with .reconfigure():

sys.stdin.reconfigure(errors="surrogateescape")

Now that I've learned about this, I think that you should generally do this as the first operation in any Python 3 program that reads from standard input, unless you are absolutely sure that the input being not well-formed UTF-8 is a fatal error (it almost never is).
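To see what "surrogateescape" actually does with stray bytes, here is a small sketch that uses in-memory streams as stand-ins for standard input and standard output. The round_trip function is a made-up helper for illustration only, and the sample header line is invented.

```python
import io


def round_trip(raw, errors):
    """Decode raw as UTF-8 text, then encode it again, using one error handler."""
    # newline="" turns off newline translation so the bytes are left untouched.
    reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                              errors=errors, newline="")
    text = reader.read()

    sink = io.BytesIO()
    writer = io.TextIOWrapper(sink, encoding="utf-8", errors=errors, newline="")
    writer.write(text)
    writer.flush()
    return text, sink.getvalue()


raw = b"Subject: caf\xe9\n"  # \xe9 is Latin-1 here, so this is not valid UTF-8

# With "surrogateescape" the bad byte is carried in the string as a lone
# surrogate and turned back into the original byte on the way out.
text, out = round_trip(raw, "surrogateescape")
```

Note that the writing side needs the same error handler. With real streams that means also calling sys.stdout.reconfigure(errors="surrogateescape"); a strict UTF-8 sys.stdout will raise UnicodeEncodeError when it is handed the escaped surrogate.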

Unfortunately for me, Ubuntu 18.04 LTS has Python 3.6.9 as its /usr/bin/python3 so I can't do this. One option appears to be to detach the underlying io.BufferedReader behind sys.stdin and recreate it with your desired error handling. I believe this would be:

import io
import sys

b = sys.stdin.detach()
sys.stdin = io.TextIOWrapper(b, errors="surrogateescape")

Your options for errors= are documented in the codecs module's documentation on Error handlers. You may prefer something like "backslashreplace", since it makes the output UTF-8 correct (the similar "namereplace" only works when encoding, so it isn't an option for standard input). I'm old-fashioned, so I prefer to pass through the bad bytes exactly as they are.
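If the same program has to run on both Python 3.6 and 3.7+, the two approaches can be folded into one helper. This is a sketch under my own assumptions; the names rewrap and with_error_handler are invented, and it carries over the stream's existing encoding and line buffering rather than taking the defaults.

```python
import io


def rewrap(stream, errors="surrogateescape"):
    """Python 3.6 style: detach the binary buffer and wrap it again."""
    encoding = stream.encoding              # read these before detaching
    line_buffering = stream.line_buffering
    return io.TextIOWrapper(stream.detach(), encoding=encoding,
                            errors=errors, line_buffering=line_buffering)


def with_error_handler(stream, errors="surrogateescape"):
    """Return a text stream like `stream` that uses the given error handler."""
    if hasattr(stream, "reconfigure"):      # Python 3.7 and later
        stream.reconfigure(errors=errors)
        return stream
    return rewrap(stream, errors)
```

You would use it as 'sys.stdin = with_error_handler(sys.stdin)', before anything has been read from standard input.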

Another option is to directly use the underlying sys.stdin.buffer object without changing sys.stdin. This object supports all of the usual IO methods like .readline(), but it returns bytes instead of strings; you can then deal with the bytes however you want, with or without decoding them with some form of error handling. Similarly, sys.stdout.buffer takes bytes for .write(), not strings. This means that the trouble-free way of copying standard input to standard output is:

sys.stdout.buffer.write( sys.stdin.buffer.read() )

If you've previously written to the text mode sys.stdout, you need to flush it before you start this copy with 'sys.stdout.flush()'. If you omit this, Python may do odd and unhelpful things with your initial output.
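Putting the bytes approach together for the sort of program described at the start of this entry (lightly process the headers, then copy the body), a sketch might look like the following. The function copy_message and its fix_header parameter are hypothetical names, and what you do to each header line is up to you.

```python
import io      # only needed for the in-memory example below
import shutil


def copy_message(src, dst, fix_header=lambda line: line):
    """Copy an email-like message between two binary streams.

    Header lines go through fix_header (bytes in, bytes out); everything
    after the first blank line is copied through untouched.
    """
    for line in iter(src.readline, b""):
        if line in (b"\n", b"\r\n"):
            dst.write(line)          # the blank line that ends the headers
            break
        dst.write(fix_header(line))
    # Copy the body in chunks instead of reading it all into memory.
    shutil.copyfileobj(src, dst)


# Example with in-memory streams and deliberately bad bytes.
src = io.BytesIO(b"Subject: caf\xe9\nX-Test: 1\n\nbody \xff\xfe bytes\n")
dst = io.BytesIO()
copy_message(src, dst)

# With the real streams (after sys.stdout.flush() if you've already
# written text output):
#   copy_message(sys.stdin.buffer, sys.stdout.buffer)
```

Since nothing is ever decoded, there is nothing to blow up on, and the stray bytes come out exactly as they went in.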

(This is probably all well known in the community of frequent Python developers, but these days I'm an infrequent Python programmer.)

python/StdinHandlingBadEncoding written at 00:20:08



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.