2012-11-25
More thoughts on why Python doesn't see much monkey-patching
In yesterday's entry I advanced the idea that a significant part of why Python doesn't see much monkey patching (and Ruby does) is that you can't monkey patch a lot of fundamental Python classes and types because they're implemented in C. In sober hindsight, I think that I'm understating two intangibles (one of which I touched on in passing) in favour of a technical explanation.
First, I think that the effects of culture matter more than I initially thought. Culture plays a strong role in shaping how we write code in a language, both explicitly, in what people tell you to do and not do, and implicitly, in things like what tutorials present and what approaches they take to solve problems. To invert the saying that you can write Fortran in any language, the reality is that people don't, and culture is a good part of why.
Second is the issue of syntax. Ruby has a clear and direct syntax for adding your own methods to outside classes, while Python does not. In Python you have to go out of your way to modify a class after it's been created, and the syntax is at least somewhat awkward and indirect. It's probably not longer (in lines) than the Ruby version, but it's certainly less direct and obvious. Since I'm a big believer that syntax matters, I think that this can't help but have an effect in the two languages. I don't know if Ruby's syntax steers people towards monkey patching, but the fact that it's both unobvious and a pain in Python has to steer people away from it.
Sidebar: the syntax in each language
In Ruby, monkey patching looks like this:
class SomeClass
  def newfunction
    [whatever]
  end
end
(I believe this is correct; I'm taking it from online examples, since I'm not a Ruby programmer.)
I think that this is the exact same Ruby syntax as you use when defining a full class yourself.
In Python, the equivalent is:
def newfunction(self, ...):
    [whatever]
SomeClass.newfunction = newfunction
This is shorter but less direct, involves more repetition, and doesn't
clearly mark the new function as purely a method of the class (to be
really equivalent to the Ruby version we should do 'del newfunction' to
clean it out of our namespace). Full-scale lambdas would make this slightly
better because then you wouldn't need to define the function separately
and then glue it on to the class, but it would still look nothing like
the way you define classes themselves.
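To make the Python side concrete, here is a minimal runnable sketch of the pattern above. SomeClass and newfunction are the placeholder names from the example; the 'name' attribute and the method body are invented purely for illustration.

```python
class SomeClass:
    """Stand-in for a class defined somewhere else (hypothetical)."""

    def __init__(self, name):
        self.name = name


# Define the new method as an ordinary module-level function...
def newfunction(self):
    return "hello from " + self.name


# ...then glue it onto the class after the fact.
SomeClass.newfunction = newfunction

# Optional cleanup, so that the function only lives on the class.
del newfunction

obj = SomeClass("patched")
result = obj.newfunction()
print(result)
```

Because method lookup goes through the class, instances created before the assignment pick up the new method as well.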
The limits of monkey patching in Python
One of the things I find interesting is the question of why monkey-patching is a common thing in the Ruby world but not in the Python world. Certainly a significant part of this is cultural (Python culture is very much against monkey patching, Ruby culture seems to be completely accepting of it), but it's hard for me to believe that that's the whole story. People do things to solve problems and I don't believe that Python is magically without the problems that cause people to monkey-patch Ruby. I've recently come up with not so much a theory as an observation on this.
One of the things that people do with Ruby monkey-patching is adding convenience methods to core Ruby classes, things like strings and arrays. You can't do this in Python, because Python monkey-patching has a fundamental limit: you can't monkey-patch anything written in C. Well, you can't monkey-patch it unless it went out of its way to let you, and most modules written in C don't.
There are two aspects to this. To start with, a fair number of the interesting Python classes and modules are written in C, including (of course) all of the fundamental types. These are all de facto sealed from modification, and given that Python's culture is against monkey-patching it's unlikely that a proposal to change that would be accepted (to put it one way). This means that you simply can't do a fair amount of the monkey patching that happens in Ruby.
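As a small illustration of the fundamental types being sealed, the following sketch tries to add a convenience method to str and then to a class written in Python. It assumes CPython (written here for Python 3); the shout() method and the Greeting subclass are made-up names for the example.

```python
def shout(self):
    return self.upper() + "!"


# Built-in types implemented in C reject new attributes outright.
try:
    str.shout = shout
    patched = True
except TypeError as exc:
    patched = False
    print("cannot patch str:", exc)


# A class written in Python accepts the same assignment without complaint.
class Greeting(str):
    pass


Greeting.shout = shout
loud = Greeting("hi").shout()
print(loud)
```

The subclass only helps for objects you create yourself; ordinary string literals are still plain str objects and never see the new method.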
More generally, that C-level things can't be monkey patched makes it at least somewhat dangerous to patch anything that might plausibly be turned into a C module even if it isn't one right now. This is probably especially likely to happen to popular data structures (any number of which have made journeys from Python to C). If such a transition does happen, your monkey patching will immediately fail and you'll have to find another way.
(I don't know if this really acts as a disincentive, though, because I'm not sure that people who might monkey patch such things are aware of this issue.)
2012-11-19
Python 3's print() annoys me (although maybe it shouldn't)
One of a number of changes in Python 3 that I have a visceral unhappy
reaction to is the replacement of the print statement by the print()
function. On the one hand I sort of understand the 'computer science'
perspective on changing it to a function; it moves Python's explicit
statements closer to being purely intrinsic language operators. On the
other hand, I just don't like it.
I think that part of the reason I don't like it is that it remains more
or less magic while hiding that magic. When print was a statement,
it was clear that it was pretty unusual and special. Turning print()
into a function does not really make it any less magical (or central),
but it sort of pretends otherwise in my eyes; the magic is now hidden
behind a function call. Part of this feeling probably comes because
print() is a built-in function, which makes it specially privileged;
I'd consider sys.print() to be less magical (although more stupid;
printing things is a pretty common and important operation, so short
names for it matter).
On a pragmatic level, though, I'm clearly wrong; Python 3 print()
is significantly less magical than Python 2 print, at least in
CPython. CPython 2 has special bytecodes for printing things and
print compiles down to them, while in Python 3 uses of print()
compile to ordinary (global) function calls (and you can even redefine
print() yourself if you want to be perverse). The arguments and
special behaviors are also more regular and more easily discoverable (as
a function, print() has in-Python documentation that you can see with
help()).
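To show how ordinary print() now is, here is a minimal Python 3 sketch that redefines it at module level and still reaches the original through the builtins module. The 'captured' list and the wrapper's behaviour are hypothetical, chosen only to make the effect visible.

```python
import builtins

captured = []


def print(*args, **kwargs):
    """Hypothetical replacement: record the message, then really print it."""
    captured.append(" ".join(str(a) for a in args))
    builtins.print(*args, **kwargs)


# This call resolves to our module-level function, not the builtin.
print("hello", 42)
```

Nothing like this was possible with the Python 2 print statement, since there was no name to rebind.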
(That print() is a real function means that code making heavy use of
print() is open to the usual sorts of optimizations for function calls.
It also means that print() is intrinsically somewhat slower than the
Python 2 version, since it necessarily involves a function name lookup
and a function call. It's unlikely that this will matter to any real
code, but you never know.)
PS: I know, I know. Even Python 2 has plenty of magical functions in the
global namespace, things like type() or str(). I don't claim that my
annoyance at print() is rational and I probably wouldn't be annoyed if
Python had always had a print() function. Which implies that part of
my annoyance may be due to what I see as a basically pointless renaming
of a builtin to a function, one that forces a whole bunch of noisy code
changes for no really compelling reason. (There is an entire rant about
language changes without strong reasons that could go here.)
Also, I just tend to think that print reads better because it stands
out more than yet another function call.
Sidebar: why C's printf() doesn't irritate me similarly
The simple version is that printf() is clearly a library function.
A large part of the reason that this works in C and doesn't work
in Python is namespaces, in that Python has them and C doesn't. In C
everyone can dump functions into what in Python is the global namespace;
in Python, modules cannot do this, which makes builtin names like
print() special. Another reason this works in C is that producing
output is clearly out of scope of the core C language, which is very
limited and confined (it doesn't even have dynamic memory allocation).
2012-11-11
A reminder: string concatenation really is string concatenation
Once upon a time when I was starting to write Python, I scribbled down the following code:
def warn(s):
    sys.stderr.write(sys.argv[0] + ": " + s + "\n")
(More or less. My actual code had an error and so didn't even work.)
Many Python programmers are wincing, because of course string
concatenation is both somewhat inefficient and not the idiomatic way to
do this; you should be using % string formatting. But there's another
somewhat more subtle reason to avoid code like this, one that I ran into
recently when I stumbled over this code the hard way by having it blow up
in my face.
The surrounding code went something like this:
try:
    o, r = getopt.getopt(....)
except getopt.error, cause:
    warn(cause)
    ....
This failed. You see, the subtle problem with string concatenation is
that it really is string concatenation. Unlike % formatting, it will
not try to str() objects to convert them to strings; if they are not
strings already, it just fails. It is of course easy to overlook this if
you usually give your code actual strings; passing in a non-string object
that can be stringified may be an uncommon corner case that you don't
test explicitly.
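The difference is easy to see in a small sketch. This one is written for Python 3 rather than the Python 2 of the original code, and the two helper names and the sample exception are invented for the example; both helpers build the message instead of writing it, so that the results can be inspected.

```python
import sys


def warn_concat(msg):
    # Plain concatenation: raises TypeError unless msg is already a str.
    return sys.argv[0] + ": " + msg + "\n"


def warn_format(msg):
    # % formatting calls str() on msg for us.
    return "%s: %s\n" % (sys.argv[0], msg)


# An exception object that can be stringified but is not itself a string.
err = ValueError("option -x not recognized")

try:
    warn_concat(err)
    concat_failed = False
except TypeError:
    concat_failed = True

formatted = warn_format(err)
sys.stderr.write(formatted)
```

With an actual string argument both versions produce the same message, which is exactly why the problem is easy to miss in testing.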
This code actually exposes an interesting effect of Python's slow
changes between Python 1.x and Python 2. Back in the old days exceptions
actually were strings instead of objects that can be string-ified, and
so this code could work when fed one of those exceptions. I wrote the
program this code appears in back in 2003 or earlier and I believe we
were still using Python 1.5 at the time (although it wasn't the current
version even then); the 1.5.2 version of the getopt module appears to
still have been using string exceptions at the time. So this might have
been less crazy back then than it appears now (although it was still the
wrong way to do it plus my actual implementation had a bug).