== Things to do in Python 3 when your Unix standard input is badly encoded

Today [[I had a little adventure with Python 3 https://twitter.com/thatcks/status/1453715712317526018]]. I have a program that takes standard input, reads and lightly processes a bunch of headers before writing them out, then just copies the body (of an email message, as it happens) from standard input to standard output. Normally it gets well formed input, with no illegally encoded UTF-8. Today, there were some stray bytes and the world blew up. Dealing with this was far harder than it should have been, partly because [[the documentation has issues https://twitter.com/thatcks/status/1453719136895336452]].

Although the documentation for [[sys.stdin https://docs.python.org/3/library/sys.html#sys.stdin]] will not tell you this, _sys.stdin_ most likely has the API of [[io.TextIOWrapper https://docs.python.org/3/library/io.html#io.TextIOWrapper]]. Otherwise, your only method for finding out what attributes and methods it supports is the ever friendly '_help(type(sys.stdin))_' in a Python interpreter.

If you're on Python 3.7 or later, what you probably want to do about a badly encoded standard input is change how it handles encoding errors with [[_.reconfigure()_ https://docs.python.org/3/library/io.html#io.TextIOWrapper.reconfigure]]:

.pn prewrap on

> sys.stdin.reconfigure(errors="surrogateescape")

Now that I've learned about this, I think that you should generally do this as the first operation in any Python 3 program that reads from standard input, unless you are absolutely sure that input which is not well-formed UTF-8 is a fatal error (it almost never is).

Unfortunately for me, Ubuntu 18.04 LTS has Python 3.6.9 as its /usr/bin/python3, so I can't do this there. One option appears to be to detach the underlying [[io.BufferedReader https://docs.python.org/3/library/io.html#io.BufferedReader]] behind _sys.stdin_ and wrap it in a new text stream with your desired error handling.
I believe this would be:

> b = sys.stdin.detach()
> sys.stdin = io.TextIOWrapper(b, errors="surrogateescape")

Your options for _errors=_ are documented in [[the _codecs_ module's documentation on Error handlers https://docs.python.org/3/library/codecs.html#error-handlers]]. You may prefer something like "backslashreplace" or "replace", since the text they produce can be written back out as valid UTF-8 ("namereplace" looks tempting, but it only applies when encoding, not when decoding input). I'm old-fashioned, so I prefer to pass through the bad bytes exactly as they are.

Another option is to directly use the underlying _sys.stdin.buffer_ object without changing _sys.stdin_. This object supports all of the usual IO methods like _.readline()_, but it returns bytes instead of strings; you can then deal with the bytes however you want, with or without decoding them with some form of error handling. Similarly, _sys.stdout.buffer_ takes bytes for _.write()_, not strings. This means that the trouble free way of copying standard input to standard output is:

> sys.stdout.buffer.write(sys.stdin.buffer.read())

If you've previously written to the text mode _sys.stdout_, you need to flush it before you start this copy with '_sys.stdout.flush()_'. If you omit this, text that is still sitting in _sys.stdout_'s own buffer can come out after the copied bytes instead of before them.

(This is probably all well known in the community of [[frequent Python developers ../programming/FrequentVsInfrequentDevs]], but these days I'm an infrequent Python programmer.)
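
To make the _surrogateescape_ approach concrete, here is a minimal sketch that tries it on an in-memory stream instead of the real standard input. The helper names _lenient_text_stream_ and _round_trip_ are made up for illustration, and the sketch assumes the input is supposed to be UTF-8. It uses _.reconfigure()_ when the stream has it (Python 3.7 or later) and otherwise falls back to detaching and re-wrapping the buffer:

```python
import io


def lenient_text_stream(stream, errors="surrogateescape"):
    """Return a text stream over the same bytes that decodes with `errors`.

    Hypothetical helper, not a standard library function.
    """
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the error handler in place.
        stream.reconfigure(errors=errors)
        return stream
    # Older Pythons: take the binary buffer and wrap it again.
    encoding = stream.encoding
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors)


def round_trip(raw):
    """Decode `raw` as lenient UTF-8 text, then encode it back to bytes."""
    # Stand-in for sys.stdin: a text wrapper around an in-memory buffer.
    src = lenient_text_stream(io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8"))
    text = src.read()

    # Stand-in for sys.stdout. It needs the same error handler, since the
    # escaped bytes cannot be encoded under the default "strict" handler.
    # newline="" turns off newline translation on output.
    sink = io.BytesIO()
    dst = io.TextIOWrapper(
        sink, encoding="utf-8", errors="surrogateescape", newline=""
    )
    dst.write(text)
    dst.flush()
    return sink.getvalue()


if __name__ == "__main__":
    bad = b"Subject: caf\xe9\n\nbody \xff\n"  # two bytes that are not valid UTF-8
    assert round_trip(bad) == bad
```

One thing this brings out is that the output side matters too. If you read through a _surrogateescape_ standard input and then write the resulting strings to a text mode _sys.stdout_ that is still using strict error handling, the write raises _UnicodeEncodeError_, so in a real program you would probably want to give _sys.stdout_ the same treatment.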