Exploring the irritating thing about Python's .join()

June 17, 2015

Let's start out with the tweets:

@Twirrim:
list_object.join(',')

AttributeError: 'list' object has no attribute 'join'
*facepalm*

','.join(list_object)

@thatcks: It's quite irritating that you can't ask lists to join themselves w/ a string, you have to ask a string to join a list with itself.

Python has some warts here and there. Not necessarily big warts, but warts that make you give it a sideways look and wonder what people were thinking. One of them is how you do the common operation of turning a sequence of strings into a single string, with the individual strings separated by some common string like ','. As we see here, a lot of people expect this to be a list operation; you ask the list 'turn yourself into a string with the following separator'. But that's not how Python does it; instead it's a string operation where you do the odd thing of asking the separator string to assemble a list around itself. This is at least odd, and some people find it bizarre. Arguably the logic is completely backwards.
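For concreteness, here is a minimal sketch of the two spellings from the tweet. The variable names are made up for illustration.

```python
words = ["alpha", "beta", "gamma"]

# The separator string does the joining; the list is just the argument.
joined = ",".join(words)
print(joined)

# Lists have no .join() method, which is the error in the tweet.
try:
    words.join(",")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```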

There are two reasons Python wound up here. The first is that back in the old days there was no .join() method on strings and this was just implemented as a function in the string module, string.join(). This makes perfect sense as a place to put this operation, as it's a string-making operation. But when Python did its great method-ization of various module functions, it of course made most of the string module functions into methods on the string type, so we wound up with the current <str>.join(). Since then it's become Python orthodoxy to invoke list-to-string joining as 'sep.join(lst)' instead of 'string.join(lst, sep)'.
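To compare the two call shapes side by side, here is a rough sketch. The join() helper below is a hypothetical stand-in written in the spirit of the old string.join(lst, sep), not the original function (which is gone from the string module in Python 3).

```python
def join(words, sep=" "):
    """Function-style join: the sequence comes first, the separator second.

    Hypothetical helper for illustration only.
    """
    return sep.join(words)


letters = ["a", "b", "c"]

function_style = join(letters, ",")   # old-style argument order
method_style = ",".join(letters)      # the current orthodox spelling
# The method can also be called through the type, separator first.
via_type = str.join(",", letters)
```

All three produce the same string; only the place where the separator goes differs.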

The other reason can be illuminated by noting that if Python did it the other way around you wouldn't have just lst.join(), you'd also have to have tuple.join() and in fact a .join() method on every sequence-compatible type or even iterators. Anything that you wanted to join together into a string this way would have to implement a .join(), which would be a lot of types even in the standard library. And because of how both CPython and Python are structured, a lot of this would involve re-implementation and duplication of identical or nearly identical code. If you have to have .join() as a method on something, putting it on the few separator types means that you have far less code duplication and that any new sequence type automatically supports doing this in the correct orthodox way.
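A small sketch of what 'automatically supports' means in practice. The Countdown class is a hypothetical user-defined iterable invented for this example; note that it defines no .join() of its own.

```python
class Countdown:
    """Hypothetical iterable that yields start, start-1, ..., 1 as strings."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return (str(n) for n in range(self.start, 0, -1))


# One method on the separator covers every kind of iterable of strings.
from_list = "-".join(["a", "b"])
from_tuple = "-".join(("a", "b"))
from_generator = "-".join(str(n) for n in range(3))
from_dict_keys = "-".join({"x": 1, "y": 2})   # iterating a dict yields its keys
from_custom = "-".join(Countdown(3))
```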

(I'm sure that people would write iterator or sequence types that didn't have a .join() method if it was possible to do so, because sooner or later people leave out every method they don't think they're going to use.)

Given the limitations of Python, I'll reluctantly concede that the current .join() approach is the better alternative. I don't think you can even get away with having just string.join() and no string .join() method (however much an irrational bit of me would like to throw the baby out with the bathwater here). Even ignoring people's irritation with having to do 'import string' just to get access to string.join(), there would be some CPython implementation challenges.

Sidebar: The implementation challenges

String joining is a sufficiently frequent operation that you want it to be efficient. Doing it efficiently requires doing it in C so that you can do tricks like pre-compute the length of the final string, allocate all of the memory once, and then memcpy() all of the pieces into place. However, you also have both byte strings and Unicode strings, and each needs its own specialized C-level string joining implementation (especially as modern Unicode strings have a complex internal storage structure).
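The 'compute the length, allocate once, copy the pieces' idea can be sketched in Python for byte strings. This is only an illustration of the general technique under my own assumptions, not CPython's actual C code; join_bytes is a hypothetical name, and a bytearray stands in for the single allocation.

```python
def join_bytes(sep, pieces):
    """Join byte strings with one up-front allocation (illustrative only)."""
    pieces = list(pieces)            # two passes are needed, so materialize
    if not pieces:
        return b""

    # Pass 1: work out the final length before allocating anything.
    total = sum(len(p) for p in pieces) + len(sep) * (len(pieces) - 1)
    buf = bytearray(total)           # the one allocation

    # Pass 2: copy each piece (and separator) into its final position.
    pos = 0
    for i, piece in enumerate(pieces):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    return bytes(buf)


result = join_bytes(b",", [b"a", b"bc", b""])
```

In real code you would simply call b",".join(...), which does this sort of thing for you at the C level.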

The existing string module is actually a Python level module. So how do you go from an in-Python string.join() function to specific C code for byte strings or Unicode strings, depending on what you're joining? The best mechanism CPython has for this is actually 'a method on the C level class that the Python code can call', at which point you're back to strings having a .join() method under some name. And once you have the method under some name, you might as well expose it to Python programmers and call it .join(), ie you're back to the current situation.
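As a sketch of that dispatch, a Python-level function can stay ignorant of the string type and let the method lookup on the separator pick the right implementation. The join() function here is a hypothetical wrapper, not the real string module code.

```python
def join(pieces, sep):
    # sep.join is looked up on the separator's own type, so str and bytes
    # each run their own implementation without any type checks here.
    return sep.join(pieces)


text = join(["a", "b"], ", ")        # uses the str implementation
raw = join([b"a", b"b"], b", ")      # uses the bytes implementation

# Mixing the two kinds of string is rejected by the type-specific method.
try:
    join([b"a", b"b"], ", ")
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```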

I may not entirely like .join() in its current form, but I have to admit that it's an impeccably logically assembled setup where everything is basically the simplest and best choice I can see.


Comments on this page:

By Zev Weiss at 2015-06-17 04:29:29:

While I can sort of see the appeal of list.join(sep), I'd find it pretty weird to have a method on lists that was only really applicable to lists of strings. I guess perhaps you could generalize it to apply to lists of sequences, but even so I think it'd be quite out of place among the existing list methods, which are all much more generic.

By dozzie at 2015-06-17 05:46:18:

@cks:

The other reason can be illuminated by noting that if Python did it the other way around you wouldn't have just lst.join(), you'd also have to have tuple.join() and in fact a .join() method on every sequence compatible type or even iterators.

And that's why map(), filter() and reduce() are separate from instances they operate on, so the precedent is already well-established.

it’s an impeccably logically assembled setup

But it’s robot logic though, isn’t it, in a way? Isn’t that why it bugs you regardless?

By himdel at 2015-06-18 05:45:55:

But isn't Python's biggest limitation just being unwilling to do a pragmatic half-assed solution?

Surely, just adding list.join and tuple.join methods that simply call str.join with the arguments flipped would cover most of the use cases... (and is O(1) not O(N); sure, it may have to be in C and possibly not as straightforward but still..).

And there could be a standard mixin you can use in your classes.

Or am I overlooking something?
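As a rough sketch of the mixin idea raised here: the names JoinMixin, JoinableList and JoinableTuple are hypothetical and nothing like them exists in the standard library.

```python
class JoinMixin:
    """Adds obj.join(sep) to any iterable of strings by flipping the arguments."""

    def join(self, sep):
        return sep.join(self)


class JoinableList(JoinMixin, list):
    pass


class JoinableTuple(JoinMixin, tuple):
    pass


list_result = JoinableList(["a", "b", "c"]).join(",")
tuple_result = JoinableTuple(("x", "y")).join("-")

# Plain built-in lists are untouched and still have no .join().
plain_list_has_join = hasattr([], "join")
```

The catch is visible in the last line: only types that opt in get the method, so code handed an ordinary list still has to fall back on sep.join(lst).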

By cks at 2015-06-18 17:52:14:

I don't think that this quite rises to the level of irritation and user hostility of full blown robot logic, although I may be wrong about that because I'm so thoroughly acclimatized to Python. It's more of a quirk than a 'makes you go well out of your way' thing.

@himdel: My feeling is that if you keep str.join() it is un-Pythonic to have list.join, tuple.join, and so on. Python language design avoids redundant convenient methods, which is what they would be. If you remove a Python-accessible string method for this, you have the implementation issues. In either case you have broad support issues, because a lot of things would not pick up explicit support for a .join() method, which would make use of it chancy and push people towards explicit measures (eg I'd expect a lot of code that does 'list(obj).join(...)').

My impression is that there are languages that can automatically apply mixins (or generic implementations or etc) to types to create derived operations like a general .join() method; in such a language this design would still give you broad support for .join() on basically anything you could use it on in Python. But Python is clearly not such a language.

(If it was, you would not have to code all of the special methods for eg comparison; you could code one and let Python deduce the rest through usual arithmetic comparison rules or whatever. Python has code to do this for you, eg functools.total_ordering, if you find it and invoke it, but it's not automatic and as a result people write classes that only support certain comparison operations just because it's easier and those operations are the only ones they need right then.)
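As an illustration of the functools.total_ordering point, here is a minimal sketch. The Version class is made up for the example; you write __eq__ and one ordering method, and the decorator fills in the rest, but only because you asked for it.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical two-part version number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


# Only __eq__ and __lt__ were written; <=, > and >= come from the decorator.
older, newer = Version(1, 2), Version(1, 3)
derived_ok = (older <= newer) and (newer > older) and (newer >= older)
```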

By himdel at 2015-06-20 04:49:00:

@cks .. I guess it's more of a philosophical issue, really. I don't really see anything wrong with having both, and just use the weird one when needed, and the simple one when not. (for simple = non-surprising)

I mean I understand that some kind of language purity was a consideration when designing Python and this is really an effect of that decision.. but ultimately the effect is that in this case, it's purity over convenience.

Though, I suspect it's more of a concern for me as someone who touches Python from time to time but usually works in other languages than for people who use Python daily.

As for the mixins.. I really wouldn't want them applied automatically, I'm not a big fan of magic, and yes, that means people will forget them from time to time, but so what, code usually isn't set in stone.

By Ewen McNeill at 2015-06-26 06:20:25:

In some languages the solution to this problem would be to have an iterator base type that had these common methods on it, to which all the list/tuple/etc classes could fall through if they didn't have a better implementation. Python instead has an iterator convention, so misses out on this logical central place to put common code.

Another solution from other languages is to have a standardised iterator interface that causes the compiler/interpreter to force you to implement the required things. Python's choice of just a convention doesn't really let it do that either, except by failing at runtime. So... arguably having things that act like an iterator but don't implement join(), when others do implement join(), causing a runtime error is very Pythonic.

It does feel like it'd require a Major Version change to actually get changed though. And it seems... unlikely Python will have another one of those in a hurry.

Ewen

Written on 17 June 2015.

Last modified: Wed Jun 17 02:08:42 2015