Some notes on abusing the pexpect Python module

May 20, 2016

What you are theoretically supposed to use pexpect for is to have your program automatically interact with interactive programs. When they produce certain sorts of output, you recognize it and take action; when you see prompts, you can automatically answer them. Pexpect is often used this way to automate things that expect to be operated manually by a real person. This is not what I'm using pexpect for. What I'm using it for is to start a program in what it thinks is an interactive environment, capture its output if all goes well, and if things go wrong allow a human operator to step in and interact with the program (all the while still capturing the output). This means that I'm ignoring almost all of pexpect's functionality and abusing parts of the rest in ways that it was probably not designed for.

Before I start, I need to throw in a disclaimer. There are multiple versions of pexpect out there; my impression is that development stalled for a while and then picked up recently. As I write this, the pexpect documentation talks about 4.0.1, but what I've used is no later than 3.1. Pexpect 4 may fix some of the issues I'm going to grumble about.

Supposing that my case is what you want to do, you start out by spawning a command:

import pexpect

child = pexpect.spawn(YOURCOMMAND, args=args, timeout=None)

It's important to set a timeout of None as the starting timeout. If you want to have a timeout at all, for example to detect that the remote end has gone silent, you want to control it on a call by call basis.

Now you want to collect output from the child command:

res = []
while not child.closed and child.isalive():
   try:
      r = child.read_nonblocking(size=16*1024, timeout=YOURTIMEOUT)
      res.append(r)
   except pexpect.EOF:
      # expected, just stop
      break
   except pexpect.TIMEOUT:
      # do whatever you want to recover
      return recover_child(child, res)

You might as well set size to something large here. Although the documentation doesn't tell you this, it is just the maximum amount of data your read can ever return; it doesn't block until that much data is available. My principle is 'if the command generates a lot of output, let's read it in big blocks'.
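To see why a large size is harmless, here is a small stand-alone sketch that uses a plain OS pipe instead of pexpect. It assumes (and this is my understanding, not something the documentation promises) that read_nonblocking ends up doing an os.read() on the pty, which has the same 'at most this many bytes' behaviour. The function name read_available is made up for illustration.

```python
import os

def read_available(fd, size=16 * 1024):
    # os.read() returns as soon as some data is available; 'size' is
    # only an upper bound on how much comes back from one call.
    return os.read(fd, size)

rfd, wfd = os.pipe()
os.write(wfd, b"hello")
chunk = read_available(rfd)   # asks for up to 16 KB, gets what is there
os.close(wfd)
os.close(rfd)
print(len(chunk))
```

The read asks for 16 KB but comes back immediately with just the bytes that were written.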

We're not done once pexpect has raised an EOF. We need to do some cleanup to make sure that the child's exit status is available:

# Some of this is probably superstition
if not child.closed and child.isalive():
   child.wait()

return (res, child.status)

Pexpect 3.1's documentation is not entirely clear on what you have to check when in order to see if the child is alive or not. Note that .isalive() has the (useful) side effect of harvesting the child's exit status if the child is not alive. Unhelpfully, it's not valid to call .wait() on a dead child, at least in 3.1, so you have to check carefully first.

As pexpect documents, it splits the actual OS process exit status into child.exitstatus and child.signalstatus (and various things return one or the other). The whole status is available as child.status, but you may find one or the other variant more useful (for example if you're really only interested in 'did the command exit with status 0 or did something go boom').
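If you only have the raw child.status and want the split form yourself, the standard os module can decode it. This is a sketch of the general Unix approach, not pexpect's own code; split_status and went_boom are names I've invented here.

```python
import os

def split_status(status):
    """Return (exitstatus, signalstatus) for a raw Unix wait status."""
    if os.WIFEXITED(status):
        # Normal exit: there is an exit code and no signal.
        return (os.WEXITSTATUS(status), None)
    if os.WIFSIGNALED(status):
        # Killed by a signal: there is a signal number and no exit code.
        return (None, os.WTERMSIG(status))
    # Stopped or otherwise unusual; neither value applies.
    return (None, None)

def went_boom(status):
    """True unless the command exited normally with status 0."""
    exitstatus, _signalstatus = split_status(status)
    return exitstatus != 0
```

With this, went_boom(child.status) answers the 'status 0 or not' question without caring whether the failure was an exit code or a signal.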

Allowing the user to interact with the child is somewhat more involved. Fundamentally we call child.interact() repeatedly, but there are a bunch of things that you need to do around this.

def talkto(child):
   # Set up to log interactive output
   res = []
   def save_output(data):
      if data: res.append(data)
      return data

   while not child.closed and child.isalive():
      try:
         child.interact(output_filter=save_output)
      except OSError as e:
         # Usually an EOF from the command.
         # Complain somehow.
         break

      # If the child is alive here, the user has
      # typed a ^] to escape from interact().
      # What happens next is up to you.

Yes, you read that right. Uniquely, pexpect's child.interact() does not raise pexpect.EOF on EOF from the child; instead it generally passes through an underlying OSError that it got (my notes don't say what that OSError usually is). In general, if you get an OSError here you have to assume that the session is dead, although pexpect doesn't necessarily know it yet.

Usefully, child.interact() sets things up so that control characters and so on that the user types are normally passed through directly to the child process instead of affecting your Python program. This means that under normal circumstances, if you type eg ^C your Python code won't get hit with a SIGINT; it'll go through to the child program and the child program will do whatever it does in reaction.
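The save_output() closure in talkto() is the one piece of this that can be tried out without a live child, so here it is pulled out on its own. This is only a sketch; make_recorder is a hypothetical name, and the byte strings stand in for whatever the child would really print.

```python
def make_recorder():
    """Return (output_filter, chunks); the filter logs data and passes it on."""
    chunks = []

    def output_filter(data):
        if data:
            chunks.append(data)
        # interact() shows the user whatever the filter returns, so
        # hand the data back unchanged.
        return data

    return output_filter, chunks

record, chunks = make_recorder()
# With a real child this would be: child.interact(output_filter=record)
for piece in (b"login: ", b"", b"ok\r\n"):
    record(piece)
session_log = b"".join(chunks)
```

Returning the data unchanged matters; a filter that returns nothing would log the session but leave the user staring at a blank terminal.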

What you do if the user chooses to use ^] to exit from child.interact() is up to you. Note that you can allow them to resume the interaction; just go back through your loop to call child.interact() again. If you allow the user to abandon the child and exit your talkto() function (you probably want to), you need to do some more cleanup of the child:

# after interact() returns, try to
# read anything left over, then close the child.
try:
   r = child.read_nonblocking(size=128*1024, timeout=0)
   res.append(r)
except (pexpect.EOF, pexpect.TIMEOUT, OSError):
   pass

child.close(force=True)

Calling read_nonblocking with timeout=0 means what you think it does; it's a non-blocking read of whatever (final) data is available right now, with no waiting for anything more to come in from the child.
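The same 'poll, don't wait' idea can be shown with nothing but the standard library. This sketch uses select() with a zero timeout on an ordinary pipe; it illustrates the semantics and is not pexpect's implementation, and drain_now is an invented name.

```python
import os
import select

def drain_now(fd, size=128 * 1024):
    """Return whatever is readable on fd right now, or b'' if nothing is."""
    # A timeout of 0 makes select() a pure poll: it never waits.
    readable, _, _ = select.select([fd], [], [], 0)
    if not readable:
        return b""
    return os.read(fd, size)

rfd, wfd = os.pipe()
nothing = drain_now(rfd)        # nothing has been written yet
os.write(wfd, b"last words")
leftover = drain_now(rfd)       # picks up the pending data at once
os.close(wfd)
os.close(rfd)
```

The first call returns immediately with no data instead of blocking, which is the behaviour you want when mopping up after interact().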

At least in pexpect 3.1, you basically should call child.close() with force=True or you will get a pexpect error if the child stays alive, which it may. Setting force winds up hitting the child with a SIGKILL if nothing else seems to work, which is relatively sure.

(Although the documentation doesn't mention it, if the child is alive it always gets sent SIGHUP and then SIGINT first. Well, this happens in older versions of pexpect; the 4.0.1 code is a bit different and I haven't dug through it.)
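For illustration, here is roughly what that sort of escalation looks like when written against a plain subprocess.Popen object in Python 3. It follows the SIGHUP, SIGINT, SIGKILL order described above but is my own sketch, not pexpect's code; stop_process and the grace parameter are assumptions made up for this example.

```python
import signal
import subprocess
import sys

def stop_process(proc, grace=0.5, force=True):
    """Try SIGHUP, then SIGINT, then (if force) SIGKILL. True if proc is gone."""
    attempts = [signal.SIGHUP, signal.SIGINT]
    if force:
        attempts.append(signal.SIGKILL)
    for sig in attempts:
        if proc.poll() is not None:
            return True
        proc.send_signal(sig)
        try:
            # Give the process a moment to react before escalating.
            proc.wait(timeout=grace)
            return True
        except subprocess.TimeoutExpired:
            continue
    return proc.poll() is not None

# A stand-in child that would otherwise sleep for a minute.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stopped = stop_process(proc)
```

Without force, a child that ignores both polite signals survives and the function returns False, which is the situation where pexpect complains.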

Possibly there is a better Python module for this sort of interaction in general. If so, it is too late for me; I've already written all of this code and I hope to not have to touch it again before we have to port it to Python 3 (if ever).

(My impression is that you should try to use pexpect 4 if you can, as the code has been overhauled and the documentation at least somewhat improved.)

Written on 20 May 2016.