Exploring the irritating thing about Python's .join()

June 17, 2015

Let's start out with the tweets:

@Twirrim:
list_object.join(',')

AttributeError: 'list' object has no attribute 'join'
*facepalm*

','.join(list_object)

@thatcks: It's quite irritating that you can't ask lists to join themselves w/ a string, you have to ask a string to join a list with itself.

Python has some warts here and there. Not necessarily big warts, but warts that make you give it a sideways look and wonder what people were thinking. One of them is how you do the common operation of turning a sequence of strings into a single string, with the individual strings separated by some common string like ','. As we see here, a lot of people expect this to be a list operation; you ask the list 'turn yourself into a string with the following separator string'. But that's not how Python does it; instead it's a string operation where you do the odd thing of asking the separator string to assemble a list around itself. This is at least odd and some people find it bizarre. Arguably the logic is completely backwards.
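For concreteness, here is a small sketch of the two spellings from the tweets. The list contents and variable names are made up for illustration.

```python
words = ["alpha", "beta", "gamma"]

# The separator string does the joining; the list is just its argument.
joined = ",".join(words)
print(joined)

# Lists have no .join() method, so the 'obvious' spelling raises AttributeError.
try:
    words.join(",")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```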

There are two reasons Python wound up here. The first is that back in the old days there was no .join() method on strings and this was just implemented as a function in the string module, string.join(). This makes perfect sense as a place to put this operation, as it's a string-making operation. But when Python did its great method-ization of various module functions, it of course made most of the string module functions into methods on the string type, so we wound up with the current <str>.join(). Since then it's become Python orthodoxy to invoke list-to-string joining as 'sep.join(lst)' instead of 'string.join(lst, sep)'.

The other reason can be illuminated by noting that if Python did it the other way around you wouldn't have just lst.join(), you'd also have to have tuple.join() and in fact a .join() method on every sequence-compatible type and even on iterators. Anything that you wanted to join together into a string this way would have to implement a .join(), which would be a lot of types even in the standard library. And because of how both CPython and Python are structured, a lot of this would involve re-implementation and duplication of identical or nearly identical code. If you have to have .join() as a method on something, putting it on the few separator types means that you have far less code duplication and that any new sequence type automatically supports doing this in the correct orthodox way.

(I'm sure that people would write iterator or sequence types that didn't have a .join() method if it was possible to do so, because sooner or later people leave out every method they don't think they're going to use.)
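To illustrate the point, here is a minimal sketch showing that one separator method covers every iterable of strings. The Countdown class is a hypothetical type invented for this example; note that it defines no .join() of its own.

```python
class Countdown:
    """Hypothetical iterable that yields strings and has no .join() method."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for n in range(self.start, 0, -1):
            yield str(n)


sep = ", "

# The same str.join() works on a list, a tuple, a generator expression,
# and a user-defined iterable, without any of them cooperating.
from_list = sep.join(["3", "2", "1"])
from_tuple = sep.join(("3", "2", "1"))
from_generator = sep.join(str(n) for n in (3, 2, 1))
from_custom = sep.join(Countdown(3))

print(from_list, from_tuple, from_generator, from_custom, sep="\n")
```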

Given the limitations of Python, I'll reluctantly concede that the current .join() approach is the better alternative. I don't think you can even get away with having just string.join() and no string .join() method (however much an irrational bit of me would like to throw the baby out with the bathwater here). Even ignoring people's irritation with having to do 'import string' just to get access to string.join(), there would be some CPython implementation challenges.

Sidebar: The implementation challenges

String joining is a sufficiently frequent operation that you want it to be efficient. Doing it efficiently requires doing it in C so that you can do tricks like pre-compute the length of the final string, allocate all of the memory once, and then memcpy() all of the pieces into place. However, you also have both byte strings and Unicode strings, and each needs its own specialized C-level string joining implementation (especially as modern Unicode strings have a complex internal storage structure).
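The shape of that trick can be modelled in pure Python for byte strings. This is only an illustrative sketch of the 'measure, allocate once, copy into place' idea, not CPython's actual C code; join_bytes is a hypothetical name, and slice assignment into a bytearray stands in for memcpy().

```python
def join_bytes(sep, pieces):
    """Illustrative model of a preallocating join for byte strings."""
    pieces = list(pieces)  # we need two passes, so materialize the iterable
    if not pieces:
        return b""

    # Pass 1: work out the final length up front.
    total = sum(len(p) for p in pieces) + len(sep) * (len(pieces) - 1)

    # Allocate the whole result buffer once.
    buf = bytearray(total)

    # Pass 2: copy each separator and piece into its final position.
    pos = 0
    for i, piece in enumerate(pieces):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    return bytes(buf)


result = join_bytes(b",", [b"a", b"bc", b""])
print(result)
```

In real Python code you would simply call the built-in b",".join(...); the point here is only to show why the operation wants to see all of the pieces before it allocates anything.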

The existing string module is actually a Python-level module. So how do you go from an in-Python string.join() function to specific C code for byte strings or Unicode strings, depending on what you're joining? The best mechanism CPython has for this is actually 'a method on the C-level class that the Python code can call', at which point you're back to strings having a .join() method under some name. And once you have the method under some name, you might as well expose it to Python programmers and call it .join(), i.e. you're back to the current situation.
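A minimal sketch of that dispatch, assuming a hypothetical module-level join() written in the spirit of the old string.join(): the Python function does no work itself and lets the separator's type select the right C implementation through its method.

```python
def join(words, sep=" "):
    """Hypothetical module-level join that delegates to the separator's method."""
    # The type of sep (str or bytes) decides which C-level code runs.
    return sep.join(words)


text_result = join(["a", "b", "c"], ",")
bytes_result = join([b"a", b"b", b"c"], b",")
print(text_result, bytes_result)
```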

I may not entirely like .join() in its current form, but I have to admit that it's an impeccably logically assembled setup where everything is basically the simplest and best choice I can see.

Written on 17 June 2015.
