Wandering Thoughts archives

2005-12-31

A logical consequence of def being an executable statement

I've mentioned before that in Python, def is actually an executable statement (in FunctionDefinitionOrder). A logical consequence of this is that default values for function arguments are evaluated only once, when the def runs.

I say this because expressions generally get evaluated when a Python statement runs, so the expressions in something like 'def foobar(a, b=greeble(c), d=None):' are no exception. The exception would be if they were not evaluated then and were instead preserved as little lambda expressions to be evaluated later.
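
You can watch the evaluation happen when the def runs, right at the interpreter (greeble() here is a hypothetical stand-in with no arguments):

def greeble():
  print "greeble evaluated"
  return 10

def foobar(a, b=greeble(), d=None):
  return a + b

# 'greeble evaluated' was printed once, when the def ran;
# calling foobar() afterwards does not print it again.
print foobar(1)    # -> 11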

As an interesting side note, setting default values for arguments is one of the two places in Python where the same variable name can be in two different scopes simultaneously; the other is invoking a function with keyword arguments. Everywhere else you write 'a=a' the two a's are the same, but in these two cases the a being assigned to is in the new function's scope while the expression's a is in your current scope.

The result can be a little bit confusing, as you can see in StructsWithDefaults. (Which is one reason I like to avoid it.)

Sidebar: mutable default arguments

This means that mutable default arguments are usually not what you want, because if you change them they will stay mutated in subsequent invocations of the function. The usual pattern around this is something like:

def foobar(a, deflst=None):
  if deflst is None:
    deflst = []
  ....
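
And the pitfall itself, as a quick demonstration (the function name is mine):

def badfoobar(a, deflst=[]):
  deflst.append(a)
  return deflst

print badfoobar(1)   # -> [1]
print badfoobar(2)   # -> [1, 2]; the same list lives on between calls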
DefAsStatementConsequence written at 02:45:49

2005-12-30

A Python surprise: the consequences of variable scope

The comment on my AClosureConfusion entry brought up a little Python surprise that I had known about in the back of my mind but never thought about fully before: the consequences of Python's scoping rules for variables.

Ignoring closures for the moment, Python only has two scopes for variables: global to the module and local to the entire function. (Closures introduce additional scopes for the variables of the 'outer' functions.)

Well, sure, you know that. But the consequence is that any variable in a function is 'live' for the entire function, including variables used only as the index in for loops and variables used only for elements in list comprehensions. So when you write:

newlst = [x for x in lst if foobar(x)]

For the rest of the function (or the entire module) you will have an x variable (in this case with the value of the last element of lst at the time of the list comprehension).
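
For instance, this minimal sketch (with a hypothetical foobar() predicate) shows x surviving the comprehension:

def foobar(x):    # hypothetical predicate
  return x > 0

lst = [-1, 2, 3]
newlst = [x for x in lst if foobar(x)]
print x           # -> 3; x is still live after the comprehension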

This is a little bit surprising, at least for me, because intellectually I usually consider such variables dead the moment the list comprehension or for loop is done. For example, I don't read their value after that point.

In some languages, index variables really are dead after the loop finishes; references to them outside the loop get some sort of 'no such variable' error. In other languages, such as Perl, such scope restriction is optional but is the recommended style.

VariableScopeConsequences written at 00:40:19

2005-12-26

Thinking about a closure confusion

Consider the following Python code, which is a very simplified version of certain sorts of real code that people write:

ll = []
for i in range(1,10):
  ll.append(lambda z: i+z)

print ll[0](0), ll[8](0)

A lot of people writing Python code like this for the first time expect to see it print '1 9', when in fact it prints '9 9'.

What I think is going on here is that people are thinking of closures as doing 'value capture' instead of what I will call 'slot capture'. In value capture the closure would capture i's current value, and things would work right. In slot capture the closure captures i's slot in the stack frame and uses it to fish out the actual value when the closure runs. Since i uses the same slot on every trip through the for loop, every lambda captures the same slot and thus they all evaluate to the same thing afterwards.

Slot capture is harder to think about because you have to know more about language implementation; in this case, you need to know what does and doesn't create a new, unique stack frame. For example, this slightly more verbose version of the code does work right:

def make(i):
  return lambda z: i+z

ll = []
for i in range(1,10):
  ll.append(make(i))

print ll[0](0), ll[8](0)   # now this prints '1 9'

Here the make function is needed for nothing more than magically forcing the creation of a new unique stack frame, with the net effect of capturing the value of each i in the lambdas. Is it any wonder that people scratch their heads and get this wrong every so often?
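
Another common idiom for forcing value capture, one not from the original code, is to abuse default arguments, since (as covered in DefAsStatementConsequence) they are evaluated when the lambda is created:

ll = []
for i in range(1,10):
  ll.append(lambda z, i=i: i+z)

print ll[0](0), ll[8](0)   # -> 1 9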

You can think about this not as stack frames but as scopes. This may make the make() example clearer: functions have a different scope from their callers, but the inside of a loop is in the same scope as outside it. (There are some languages where this is not true, so you can define variables inside a loop that aren't visible after it finishes. Even then, you may or may not get a new stack frame every time through the loop. Aren't closures fun?)

This sort of closure confusion is not restricted to Python; here is an example of the same issue coming up in Javascript, in what amounts to a real-world version of my Python example.

Scope rules can get quite interesting and complicated, and of course they interact with closures in fun ways. For example, Javascript Closures has a long writeup of the Javascript scope rules, which are somewhat more exciting than the Python ones. (It also has nice examples of the (abstract) implementation details.)

AClosureConfusion written at 00:07:27

2005-12-22

What I really want is error-shielding interfaces

Recently (for my version of 'recently'), the blogosphere had a little tiff about 'humane' versus 'minimalist' APIs, starting with Martin Fowler's article and continuing onwards (there's a roundup here). To caricature the positions, the minimalist side feels that APIs should have only the basic building blocks, and the humane side feels that APIs should have all of the commonly used operations.

(Part of the fun of the whole exchange is that it got framed as a Ruby versus Java issue, due to the examples picked.)

I come down somewhere in between, because I am nervous about the large APIs that humane design creates but I think that minimalist APIs offload too much work to the programmer. What I like is APIs with basic building blocks and routines to do the common operations that are easy to get wrong. Unfortunately I have no snappier name for this approach than 'error-shielding interfaces'.

For example, consider getting the last element of a list. Something like 'aList.get(aList.size - 1)' contains enough code that there's room for error, so I much prefer either 'aList.last' or 'aList.get(-1)'. As a bonus, they clearly communicate your intent to people reading your code without making them decode the effects of the expression. (I have a vague preference for the aList.get(-1) approach, because it clearly generalizes to the Nth-last element.)

Similarly, I think that Python's .startswith() and .endswith() string methods are great additions to the API. They're common enough operations, and they make sure no programmer will ever write a stupid off-by-one error in the equivalent Python code. (I've written it. And eyeballed it carefully to make sure I got it right.)

In Python, there's another reason to add common operations to extension module APIs: an extension module can often implement a common operation significantly more efficiently than Python code can. For example, the Python equivalent of .endswith() pretty much has to make a temporary string object.
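
As a sketch, here is my reconstruction of roughly what that pure-Python equivalent looks like (this is not the actual library code):

def myendswith(s, suffix):
  # the slice creates a temporary string object on every call,
  # and the empty-suffix case is exactly the sort of edge that
  # is easy to get wrong by hand
  if not suffix:
    return True
  return s[-len(suffix):] == suffix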

(There's also Ian Bicking's contribution, more or less on interface design in general here, which is well worth reading and thinking about.)

OnInterfaceStyles written at 02:56:53

2005-12-19

Initializing Python struct objects with optional defaults

Recently I was writing code to register names and their attributes. There were enough attributes that I didn't want to specify all of them all of the time, so I did the obvious Python thing: I made the register() function take a bunch of keyword arguments that had default values. The attributes are stored in a struct object, because I wanted an attrs.attribute syntax for accessing them.

The straightforward way to initialize the struct object was to write 'vi = ViewInfo(factory = factory, onDir = onDir, ...)', but that sort of repetition is annoying, especially when I had a perfectly good set of name/value pairs in the form of register()'s arguments. If only I could get at them.

It turns out that you can use the locals() dictionary for this, if you use it before you set any local variables in the function. So:

class ViewInfo(Struct):   # Struct is the base class from EmulatingStructsInPython
  pass

def register(name, factory, onDir = False, \
             onFile = True, ...):
  vi = ViewInfo(**locals())
  view_dict[name] = vi

(I did not strictly need the name attribute in the ViewInfo data, but it doesn't do any harm and it meant I could use locals() straight.)

A similar pattern can be done directly in a struct class as:

class ViewInfo:
  def __init__(self, name, factory, \
               onDir = False, ...):
    for k, v in locals().items():
      if k != "self":
        setattr(self, k, v)

(You really want to exclude self, since circular references make the garbage collector work harder than necessary.)
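
A quick usage sketch (the factory here is a hypothetical stand-in, and I'm assuming the elided arguments all have defaults):

def blogview_factory():   # hypothetical stand-in factory
  pass

vi = ViewInfo("blog", blogview_factory)
print vi.name, vi.onDir   # -> blog False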

StructsWithDefaults written at 01:33:47

2005-12-18

Emulating C structs in Python

One of the few data types from C that I miss when writing Python code is structs. The simplest replacement is dictionaries, but that means you have to write thing['field'] instead of thing.field. I can't stand that (it's the extra characters).

If you want thing.field syntax in Python, you need an object. The simplest C struct emulation is just to use a blank object and set fields on it:

class MyStruct:
  pass

ms = MyStruct()
ms.foo = 10
ms.bar = "abc"

Some people will say that this is an abuse of objects, since they don't have any code, just data. I say to heck with such people; sometimes all I want is data.

(Avoid the temptation to just use 'ms = object()'. Instances of a bare object() can't take new attributes at all, and even an empty class of your own is better for telling different types of structs apart via introspection.)

Initialization this way is tedious, though. We can do it more easily and compactly by using keyword arguments when we create the object, with a little help from the class. Like so:

class Struct:
  def __init__(self, **kwargs):
    for k, v in kwargs.items():
      setattr(self, k, v)

class MyStruct(Struct):
  pass

ms = MyStruct(foo = 10, bar = "abc")

(And look, now our objects have some code.)

It's possible to write the __init__ function as 'self.__dict__.update(kwargs)', but that is fishing a little too much into the implementation of objects for me. I would rather use the explicit setattr loop just to be clear about what's going on.

(I am absolutely sure people have been using this idiom for years before I got here.)

Sidebar: dealing with packed binary data

If you need to deal with packed binary data in Python, you want the struct module.
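
For example, packing and unpacking a little fixed-layout record:

import struct

# an unsigned 16-bit length plus a 4-byte tag, big-endian,
# with no padding anywhere
data = struct.pack(">H4s", 10, "ABCD")
length, tag = struct.unpack(">H4s", data)
print length, tag    # -> 10 ABCD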

This is a much better tool than C has, because structs are not good for this (contrary to what some people think); structs do not actually fully specify the memory layout. C compilers are free to insert padding to make field access more efficient, which makes struct memory layout machine and compiler dependent.

(I sometimes find it ironic that supposedly 'high level' languages like Python and Perl have better tools to deal with binary structures than 'low level' C.)

EmulatingStructsInPython written at 17:30:55

2005-12-15

Another introspection trick

Here's another example of Python's introspection and command interpreter being useful:

[x for x in dir(m) if isinstance(getattr(m, x), str) and 'localhost' in getattr(m, x)]

One of our Mailman lists had been accidentally set up thinking that the machine's name was 'localhost' instead of the machine's actual hostname, and this was causing problems. Mailman is written in Python and offers access to the internals of list data via an interactive Python interpreter (through the withlist program). This one-off bit of introspection was basically a grep over all of the list's attributes.

I won't claim that this would be good style in a program, but as something I typed at the Python interpreter's command line it was very handy. In this, it's like shells and shell scripts; we do things on the command line that we'd never do in shell scripts.

(The isinstance() check is necessary to keep the next clause from potentially throwing an exception on non-string attributes, which would abort the entire list comprehension.)
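
If I found myself doing this a lot, it could be wrapped up as a little helper function (a hypothetical one; Mailman doesn't provide it):

def grepattrs(obj, needle):
  # names of obj's string attributes whose values contain needle
  return [x for x in dir(obj)
          if isinstance(getattr(obj, x), str) and needle in getattr(obj, x)]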

AnotherIntrospectionTrick written at 14:32:52

2005-12-13

What Python threads are good for

Because of the sometimes much-maligned Global Interpreter Lock, pure Python code itself can't run simultaneously on multiple CPUs. So what should you use Python threads for?

The real use for Python threads is turning synchronous functions in extension modules into asynchronous things that don't delay your main program. Often these functions have no asynchronous equivalents (unlike network IO), so it's either use threads or have your main program stall. This works for sufficiently compute-intensive extension functions (provided that they release the GIL while they work) as well as for functions, like socket.gethostbyname, that have to wait on outside things.

Python threads are not a good way to do asynchronous network IO, because it's inefficient overkill; use either select() or poll() from the select module instead (along with non-blocking sockets and so on). If you need a canned solution for this, consider Twisted, or asyncore and asynchat from the standard library.

Note that threads are the only way to make gethostbyname() and gethostbyaddr() asynchronous, because they don't necessarily just do DNS lookups. Exactly what data sources they consult and how is highly system dependent; you really need to just be calling the platform C library routines. This cuts both ways; if you want just DNS lookups, do just DNS lookups via something like dnspython.

My thread-using Python programs wind up being built around completion queues and thread pools; they hand off work to auxiliary threads and then wait for things to finish. (Sometimes in conjunction with network IO; see here for how I mix work completion notification and select() et al.)
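
Here is a minimal sketch of that completion-queue pattern (not the module I actually use):

import threading, Queue, socket

def worker(inq, outq):
  # pull (function, args) jobs and push (args, result) completions
  while True:
    func, args = inq.get()
    outq.put((args, func(*args)))

inq, outq = Queue.Queue(), Queue.Queue()
for _ in range(4):
  t = threading.Thread(target=worker, args=(inq, outq))
  t.setDaemon(True)
  t.start()

# hand a blocking lookup off to the pool and wait for the answer
inq.put((socket.gethostbyname, ("www.python.org",)))
args, addr = outq.get()
print args[0], addr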

(Someday I will have a general 'thread pool' module that I'm happy with. I probably need to write more thread-using programs first.)

UsefulPythonThreads written at 01:29:00

2005-12-09

Security versus resilience

A while back I wrote this, about an exception created by the cgi module when crackers submitted XML-RPC calls instead of form POSTs. It makes a great example for discussing the difference between 'secure systems' and 'resilient systems'.

Put broadly, security is keeping people out, while resilience is continuing to operate when people attack you. The cgi module example shows that you can have one without the other. Sometimes this may even be deliberate; an exceptionally paranoid system could shut itself down any time it saw unexpected input, just to be sure. This would be quite secure but not at all resilient.

(There are real systems that are close to this paranoid, for example the PAL systems that try to prevent unauthorized use of nuclear weapons.)

The cgi module seems to be secure (and I say 'seems' only because I haven't personally analyzed the code). To a large extent Python makes it easy to be secure; you are protected from basic issues like buffer overruns, and exceptions force you to handle errors one way or another. Python code may fail, but it almost always fails safely. (This does leave design issues, where the code is right but the algorithm is horribly wrong, but no language can really help there.)

However, resilience is much harder and less common, as the cgi module example demonstrates (and there are a number of other ways to make programs using the cgi module unhappy). If this is sloppy programming on the part of the cgi module, then such sloppy programming is practically endemic; truly paranoid programming, even for network applications, is still rare. (And I'm not going to claim that I've managed it.)

I think that resilience is in general harder than security. Security is all about confining things and making sure that things don't happen, whereas resilience is about thinking about everything that could go wrong. This makes resilience much more of an open-ended problem than security, with many more things to think about and keep track of.

Because resilience is about 'what can go wrong?', it also requires you to look behind the convenient abstractions, like 'network IO is just a stream of bytes'. (It is, but it's a stream of bytes that may come very slowly or very fast, not come at all, or be incomplete. What happens to your program in each case?)

On a concrete level, I'm pretty confident in DWiki's security, and its design has a certain amount of thought put into the issues. I'm equally confident that DWiki is not resilient and that there are a bunch of ways (even without writing comments) to hammer it. (DWiki gets a certain amount of resilience from being run as a CGI-BIN by Apache, but this only goes so far.)

SecurityVsResilience written at 02:16:05

2005-12-03

How to do TCP keepalives in Python

TCP keepalives are do-nothing packets the TCP layer can send to see if a connection is still alive or if the remote end has gone unreachable (due to a machine crash, a network problem, or whatever). Keepalives are not default TCP behavior (at least not in any TCP stack that conforms to the RFCs), so you have to specifically turn them on. (There are various reasons why this is sensible.)

In Python you can do this with the .setsockopt() socket method, using the socket.SO_KEEPALIVE option and setting it to the integer value 1. The only mystery is what the level parameter should be; despite what you might guess, it is socket.SOL_SOCKET. So a complete code example is:

import socket
def setkeepalives(sck):
  sck.setsockopt(socket.SOL_SOCKET, \
                 socket.SO_KEEPALIVE, 1)

Various sources recommend turning keepalives on as soon as possible after you have the socket.

(Keepalives are only applicable to TCP sockets, so one might expect SOL_TCP or at least SOL_IP, but no; they are a generic socket level option. Go figure.)

On Linux, you can control various bits of keepalive behavior by setting the additional SOL_TCP integer parameters TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT; Python defines them all in the socket module. See the tcp(7) manpage for details. The default values are found in /proc/sys/net/ipv4 in the files tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes, and are fairly large.
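
A sketch of tuning these on Linux (the function name is mine and the numbers are just examples):

import socket
def settunedkeepalives(sck):
  sck.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
  # probe after 120 seconds of idleness, then every 20 seconds,
  # and declare the connection dead after 5 failed probes
  sck.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 120)
  sck.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 20)
  sck.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 5)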

TcpKeepalivesInPython written at 03:17:29

