2015-06-17
Exploring the irritating thing about Python's .join()
Let's start out with the tweets:
@Twirrim:
list_object.join(',')
AttributeError: 'list' object has no attribute 'join'
*facepalm*
','.join(list_object)
@thatcks: It's quite irritating that you can't ask lists to join themselves w/ a string, you have to ask a string to join a list with itself.
Python has some warts here and there. Not necessarily big warts, but warts that make you give it a sideways look and wonder what people were thinking. One of them is how you do the common operation of turning a sequence of strings into a single string, with the individual strings separated by some common string like ','. As we see here, a lot of people expect this to be a list operation; you ask the list 'turn yourself into a string with the following separator'. But that's not how Python does it; instead it's a string operation where you do the odd thing of asking the separator string to assemble a list around itself. This is at least odd and some people find it bizarre. Arguably the logic is completely backwards.
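To make the complaint concrete, here is a small sketch of both spellings in current Python 3. The variable names are made up for illustration.

```python
words = ["alpha", "beta", "gamma"]

# What many people reach for first: lists have no .join() method,
# so this raises AttributeError.
try:
    words.join(",")
except AttributeError as exc:
    error_message = str(exc)

# What Python actually wants: ask the separator string to do the joining.
joined = ",".join(words)
print(joined)
```

The first form fails in exactly the way the tweet shows; the second is the orthodox one.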
There are two reasons Python wound up here. The first is that back
in the old days there was no .join() method on strings and this
was just implemented as a function in the string module,
string.join(). This makes perfect sense as a place to put this
operation, as it's a string-making operation. But when Python did
its great method-ization of various module functions, it of course
made most of the string module functions into methods on the
string type, so we wound up with the current <str>.join(). Since
then it's become Python orthodoxy to invoke list to string joining
as 'sep.join(lst)' instead of 'string.join(lst, sep)'.
The other reason can be illuminated by noting that if Python did
it the other way around you wouldn't have just lst.join(), you'd
also have to have tuple.join() and in fact a .join() method on
every sequence compatible type or even iterators. Anything that you
wanted to join together into a string this way would have to implement
a .join(), which would be a lot of types even in the standard
library. And because of how both CPython and Python are structured,
a lot of this would involve re-implementation and duplication of
identical or nearly identical code. If you have to have .join()
as a method on something, putting it on the few separator types
means that you have far less code duplication and that any new
sequence type automatically supports doing this in the correct
orthodox way.
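As a rough illustration of that last point, the sketch below feeds several different kinds of iterables to the same str.join() method. The Countdown class is a made-up example type, not anything from the standard library; the point is only that it never has to know joining exists.

```python
class Countdown:
    """A hypothetical iterable that has no idea about joining."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield start, start-1, ..., 1 as strings.
        for n in range(self.start, 0, -1):
            yield str(n)


sep = ", "
from_list = sep.join(["a", "b", "c"])
from_tuple = sep.join(("a", "b", "c"))
from_generator = sep.join(str(n * n) for n in range(4))
from_custom = sep.join(Countdown(3))
```

All four calls go through the one implementation on the separator's type; none of the things being joined needed a method of its own.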
(I'm sure that people would write iterator or sequence types that
didn't have a .join() method if it was possible to do so, because
sooner or later people leave out every method they don't think
they're going to use.)
Given the limitations of Python, I'll reluctantly concede that the
current .join() approach is the better alternative. I don't think
you can even get away with having just string.join() and no string
.join() method (however much an irrational bit of me would like
to throw the baby out with the bathwater here). Even ignoring
people's irritation with having to do 'import string' just to get
access to string.join(), there would be some CPython implementation
challenges.
Sidebar: The implementation challenges
String joining is a sufficiently frequent operation that you want
it to be efficient. Doing it efficiently requires doing it in C so
that you can do tricks like pre-compute the length of the final
string, allocate all of the memory once, and then memcpy() all
of the pieces into place. However, you also have both byte strings
and Unicode strings, and each needs its own specialized C level
string joining implementation (especially as modern Unicode strings
have a complex internal storage structure).
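The 'measure, allocate once, copy into place' trick can be sketched in Python for byte strings. This is only an illustration of the idea under my own assumptions, not CPython's actual C code, and join_bytes_sketch is a hypothetical name; the real thing is the built-in bytes.join().

```python
def join_bytes_sketch(sep, pieces):
    """Illustrative two-pass join for byte strings (not CPython's code)."""
    pieces = list(pieces)  # we need two passes, so materialize the input
    if not pieces:
        return b""

    # Pass 1: pre-compute the length of the final result.
    total = sum(len(p) for p in pieces) + len(sep) * (len(pieces) - 1)

    # Allocate the whole buffer once.
    buf = bytearray(total)

    # Pass 2: copy each piece (and separator) into its final position,
    # which is the role memcpy() plays at the C level.
    pos = 0
    for i, piece in enumerate(pieces):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)

    # Note: bytes() makes one more copy here; C code would not need to.
    return bytes(buf)


result = join_bytes_sketch(b",", [b"a", b"bc", b""])
```

A Unicode version would need more work than this, since the pieces may not all use the same internal storage width.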
The existing string module is actually a Python level module. So
how do you go from an in-Python string.join() function to specific
C code for byte strings or Unicode strings, depending on what you're
joining? The best mechanism CPython has for this is actually 'a
method on the C level class that the Python code can call', at which
point you're back to strings having a .join() method under some
name. And once you have the method under some name, you might as
well expose it to Python programmers and call it .join(), ie
you're back to the current situation.
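Here is a minimal sketch of what such a Python level function ends up looking like in Python 3. The join() function is hypothetical, written in the style of the old string.join() (including its default of a single space as the separator); all it can really do is hand the work to a method on the separator's type.

```python
def join(items, sep=" "):
    """Hypothetical module-level join, in the style of the old string.join()."""
    # Calling the method on sep lets the separator's own type (str or
    # bytes) pick the specialized C level implementation.
    return sep.join(items)


text = join(["a", "b"], ", ")     # dispatches to str.join
data = join([b"a", b"b"], b", ")  # dispatches to bytes.join

# Mixing the two kinds of strings is rejected by the underlying method.
try:
    join(["a", "b"], b", ")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The function adds nothing of its own, which is more or less the argument for just exposing the method.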
I may not entirely like .join() in its current form, but I have
to admit that it's an impeccably logically assembled setup where
everything is basically the simplest and best choice I can see.