2010-09-23
Why things should be in Python's standard library
There's another advantage of having things in the standard library: if they're in the standard library, they'll get used. This is what you want when the standard library module is the right solution to its problem (and a terrible idea if the module is a bad solution), especially when there are worse alternatives.
There are a lot of problems that have a few good solutions and a lot of bad ones; a classical example is parsing XML (using regular expressions versus a real parser). If one of the good solutions is not in the standard library but you can build a bad solution from other standard library bits, it's pretty much guaranteed that lots of people will build versions of the bad solution; people solve a lot of their problems using whatever tools the standard library gives them.
(Some of the people will start down the bad road because they don't know any better and don't realize how much pain they're getting themselves into. Some will do it because it works well enough for their current needs and it is the fast way to solve their problem.)
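As a small illustration of the XML example, here is a sketch that puts a regular expression built from standard library bits next to the standard library's own parser. The sample document and both function names are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# A made-up document with two real items and one commented-out item.
DOC = """<items>
  <item name="first">plain text</item>
  <item
      name="second"><![CDATA[text with <b>markup</b>]]></item>
  <!-- <item name="ghost">commented out</item> -->
</items>"""


def names_with_regex(text):
    # The tempting approach: assumes a fixed tag layout and knows
    # nothing about comments.
    return re.findall(r'<item name="([^"]*)">', text)


def names_with_parser(text):
    # The real parser copes with odd whitespace, CDATA, and comments.
    root = ET.fromstring(text)
    return [item.get("name") for item in root.iter("item")]


print(names_with_regex(DOC))   # ['first', 'ghost']
print(names_with_parser(DOC))  # ['first', 'second']
```

The regular expression misses the item whose attributes are on a new line and picks up the one that is commented out. Both mistakes are silent, which is part of why the bad road looks fine at first.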
For a sufficiently common problem, especially if the wrong ways are sufficiently unproductive (or are sufficiently bad ideas), the conclusion is that you want an implementation of the right way to be in the standard library so that people will use it instead of slapping together yet another cringe-inducing implementation of a bad solution.
(If this is an obscure problem, well, you can't put everything in the standard library.)
One short way of putting this is that the standard library should have good implementations of common things that it is (too) easy to get wrong.
(By the way, it does no good to say that people should not be so foolish as to tackle these problems the wrong way. This may be technically correct, but it does not solve the social problem of getting people to produce good code.)
Sidebar: the drawback of complexity
I cheated a bit. It's not enough that the standard library module be the right way to solve its problem, because that's not sufficient by itself to get people to use it over the alternatives. It must be the easiest way to solve its problem (or at least look like it). There are two sides to this: the amount of code you have to write with and without the module, and the amount of mental work you need to do in order to put together a solution, ie the module's complexity.
The module can usually win on the amount of code you have to write (if it can't, it has serious problems). Some standard library modules have not been too successful on the complexity front, though.
(Examples can help. Especially examples of doing simple things.)
2010-09-17
More on the module selection problem
A commentator on reddit made a decent point on my last entry; to quote:
Sometimes you just have to set aside time to find the best of breed and get to know the code you are using. [...] Exploration has a cost in software projects, and sometimes software projects fail (sub-project in this case).
I agree with this on large, important projects. If you're settling in to make a significant investment of time in programming something, you should expect to take a certain amount of time to figure various things out. If you've got a several week programming project, spending a day to figure out which of three or four modules you want to use is par for the course. Like a lot of other things in programming, this is partly an issue of optimizing costs; it makes sense to spend this time when it's going to make an important difference to your program.
But I also disagree. Not everything we use Python for is large, especially if we can use third party modules to do a lot of the work, and when you're working on small things the balance of time and effort is a lot different.
Much of what I write is relatively small things, and often they're sideline projects as opposed to things that we have to have. With such small things I wind up with three choices:
- I can spend an appreciable amount of time (compared to how much
work I expect the rest of the program to be) trying to figure out
if the third party module is any good. Even if it is good this
can be a mis-optimization because I'm not saving time overall.
- I can slap together my own version of the module. It won't work
as well as a good real version, but writing it will probably be
faster than trying to find a good real version unless I get lucky.
- I can abandon the program entirely because either of the other options would take more time than can be justified given the program's benefits. There's lots of things that I would like but that I don't have all that much time to spend on.
I often wind up going to the second or third option because the first one can be quite dangerous in a relatively small project. Often the only solid way to evaluate a promising module is to use it for real, ie to try to write my program using it. But if the module turns out not to work, I face the time consuming and seriously demotivating prospect of ripping much of my code apart to take out the bad module and swap in some replacement.
(It's much of my code because in a small program, outside modules are likely to be a fairly significant part of the code instead of being contained in a relatively small corner.)
2010-09-15
Why I like modules to be in the Python standard library
Even when there may be perfectly good third party modules for something, I really want there to be a module for it in Python's standard library. Part of the reason is obviously how I find third party modules to be awkward, but another part of it is what I call the selection problem.
The selection problem is the problem of picking a third party module (sometimes even finding it, although PyPI helps with that) and figuring out if it's any good. Simply figuring out the quality of a module is a bunch of work, and the amount of work multiplies drastically if there's several third party modules that all do what I want. Often, the only way I can really tell if a module is going to work well is to actually try using it. Generally this has to be in a real program (I find toy examples both frustrating to write and uninformative), which means that if I have picked poorly I may have wasted a bunch of time and effort. Even if I can rule out a module relatively early, I had to spend the time to read documentation or skim code or the like, and that time's all wasted.
(And frankly it's frustrating to run into near misses, modules that almost do what I need and almost work. Faced with this, it often at least feels easier to write something from scratch myself if what I want isn't too big.)
When a module has made it into the standard library, I don't have to go through all of this; I can just use the module, secure in the confidence that this is a good implementation of whatever it is that I want to do. Someone else has already gone through all of this quality assurance work, and if there were multiple implementations the Python people have probably either picked the best one or at least determined that they are more or less equivalent and so I am not missing anything very important by not looking at the other options.
(Yes, sometimes this confidence is misplaced. But generally it's at least close.)
Update: see also WhyInStandardLibraryII for additional comments on the time drain of the selection problem.
2010-09-14
A confession: I find third party modules awkward in Python
One of my weaknesses is that with rare exceptions, I pretty much never use third party Python modules even when they'd make my life easier. I will sometimes use third party modules that are single files, but multi-file modules are generally too annoying (especially if they require compiling something).
The problem with third party modules is that installing and managing them is usually too much work, especially if they are large. You basically have four choices:
- hope that all of the operating systems that you want to use the
module on have prebuilt packages (which are sufficiently modern)
and you can persuade your system administrator to install it.
- if you are your system administrator, you can install the modules
by hand (or by various Python packaging methods) in the system
module area. This leads to sysadmin heartache and leaves you exposed if you ever
find yourself wanting to use your program on a machine where you
don't have this power.
- put the third party modules straight into your program's main
source directory (presuming that you have one, and that your
program was not a single Python file before you started depending
on third-party modules).
- some module install schemes support an alternate installation
location. If you take this option, you generally have to manually
fiddle with sys.path in your programs to get them to find these
modules (either directly or by setting various environment
variables in wrapper scripts).
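To make the last option concrete, here is a minimal sketch of the manual sys.path fiddling it involves. The private install location and the module name thirdparty_demo are hypothetical; a temporary directory stands in for wherever the modules were actually installed.

```python
import os
import sys
import tempfile

# Hypothetical alternate install location; a temporary directory
# stands in for it so that the sketch is self-contained.
private_lib = tempfile.mkdtemp()
with open(os.path.join(private_lib, "thirdparty_demo.py"), "w") as f:
    f.write("ANSWER = 42\n")

# The manual fiddling: every program that wants the module has to
# do this (or something equivalent) before the import.
sys.path.insert(0, private_lib)

import thirdparty_demo

print(thirdparty_demo.ANSWER)  # 42
```

The environment variable version of this is to set PYTHONPATH in a wrapper script, which moves the friction around instead of removing it.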
All of these approaches have aspects that annoy me. The friction they add is just enough that when I ask myself 'do I really need this module or is it just convenient', the answer usually is that it is just a convenience and I can live without it.
I'm aware that this is a personal twitch, and that a lot of people use the various install options happily. Probably they're more productive than me, too, because they're open to using convenient third party modules when they make sense. I did say this was a personal weakness.
Sidebar: how I'd like third party module installs to work
The basic answer is that I want to be able to configure a place (the
root of a directory hierarchy) for them to go, once, and then to have
this place automatically added to the Python search path for everything
I run so that my code can just do 'import <whatever>' without me
having to do anything else.
I want this to happen without the need for environment variables or special cover scripts in front of the Python interpreter or whatever, because any of those requirements impose their own frictions. (For example, special environment variables make it annoying to run things from cron or other contexts where you don't have all of your environment automatically initialized for you.)
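For comparison, the standard library's site module has a per-user site-packages directory that comes reasonably close to this, as far as I can tell. It is searched automatically if it exists, although you don't get to pick its location without setting PYTHONUSERBASE. A small sketch that just reports what the running interpreter would use:

```python
import site

# Where this interpreter looks for per-user modules. The directory is
# only added to sys.path if it actually exists.
user_site = site.getusersitepackages()
print(user_site)

# Whether the per-user directory is enabled for this interpreter; it
# can be turned off, for example by running Python with -s.
print(site.ENABLE_USER_SITE)
```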
2010-09-02
Why Python's global is necessary
When I started out programming in Python, I didn't really like
global. For a long time I considered it unaesthetic, annoying,
and on the whole an irritating wart on the language. As I mentioned recently,
I have come around to a different view of global, and it goes like
this.
If you want to have both global variables and lexically scoped local variables, you have to be able to tell whether a given name being assigned to in a function is a local or a global variable at the time that the function is being defined. Assuming that you want as much as possible of this to be implicit for various reasons, there are three relatively reasonable choices that I can think of:
- you must declare globals explicitly; otherwise names are local.
- you must declare locals explicitly; otherwise names are global.
- the decision is made implicitly by what global names already exist when the function is being defined; a name that exists globally is taken as a global variable, and otherwise the name is taken as a local variable.
(If a name is never assigned to within a function but only read from, it's either a global variable or a 'use of an undefined value' error. Python opts to consider it a global variable.)
The third option is fragile (and un-Pythonic). This leaves you with a
choice between the first and the second options, and either way you are
going to need a keyword for it. Python makes the decision that writing
to global variables will be rare and so it forces you to declare them
explicitly; local variables, the common case, are handled implicitly. So
it needs global, because having local instead would be worse (and
having neither would be much worse).
(This decision might be either a pragmatic one, based on what was expected to be common, or a philosophical choice to make global variables more inconvenient in the hopes of making them less common. I don't know the Python history involved, so I have no idea which it was.)
Other languages make different choices here, sometimes for philosophical reasons that come down on the other side and sometimes just for historical ones (eg, if they started out without local variables or lexical scoping at all).
Sidebar: the many problems with the fully implicit option
The core problem with the fully implicit option, why it is fragile in many ways, is that it makes the meaning of a function dependent on its surrounding context. You can't just read a function and know what it does and what it manipulates; instead you have to know what global names exist when the function is defined.
One consequence of this is that anything that changes what global names are defined can change the meaning of the function. In a language like Python where function definition is an ordinary executable statement, one done immediately when encountered, merely moving a function definition forward or backwards inside a file could change the function's meaning even without any other code changes (as you move it before or after where global names are created or even deleted).
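Python can't demonstrate the fully implicit option directly, but it can show the property that option would give up. In this sketch (with made-up names) the same function body is defined once before and once after a global called total exists, and Python treats total as a local variable both times.

```python
def defined_before():
    total = 1  # assigned without a 'global' declaration
    return total


total = 100  # the global name only comes into existence here


def defined_after():
    total = 1  # same body, defined after the global exists
    return total


# Both functions were compiled with 'total' as a local variable.
print(defined_before.__code__.co_varnames)  # ('total',)
print(defined_after.__code__.co_varnames)   # ('total',)

defined_before()
defined_after()
print(total)  # 100
```

Under the fully implicit option, defined_after would have quietly overwritten the global while defined_before would not, even though the two bodies are identical.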