Wandering Thoughts archives

2015-06-17

Exploring the irritating thing about Python's .join()

Let's start out with the tweets:

@Twirrim:
list_object.join(',')

AttributeError: 'list' object has no attribute 'join'
*facepalm*

','.join(list_object)

@thatcks: It's quite irritating that you can't ask lists to join themselves w/ a string, you have to ask a string to join a list with itself.

Python has some warts here and there. Not necessarily big warts, but warts that make you give it a sideways look and wonder what people were thinking. One of them is how you do the common operation of turning a sequence of strings into a single string, with the individual strings separated by some common string like ','. As we see here, a lot of people expect this to be a list operation; you ask the list 'turn yourself into a string with the following separator'. But that's not how Python does it; instead it's a string operation where you do the odd thing of asking the separator string to assemble a list around itself. This is odd at the very least, and some people find it bizarre. Arguably the logic is completely backwards.
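To make the two spellings from the tweets concrete, here is a minimal sketch. The name list_object comes from the tweet; the sample values are made up for illustration.

```python
list_object = ["a", "b", "c"]

# Lists have no .join() method, so asking the list to join itself fails.
try:
    list_object.join(",")
except AttributeError as e:
    error = str(e)

# Instead, the separator string is asked to join the list.
joined = ",".join(list_object)

print(error)
print(joined)
```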

There are two reasons Python wound up here. The first is that back in the old days there was no .join() method on strings and this was just implemented as a function in the string module, string.join(). This makes perfect sense as a place to put this operation, as it's a string-making operation. But when Python did its great method-ization of various module functions, it of course made most of the string module functions into methods on the string type, so we wound up with the current <str>.join(). Since then it's become Python orthodoxy to invoke list-to-string joining as 'sep.join(lst)' instead of 'string.join(lst, sep)'.

The other reason can be illuminated by noting that if Python did it the other way around you wouldn't have just lst.join(), you'd also have to have tuple.join() and in fact a .join() method on every sequence-compatible type and even on iterators. Anything that you wanted to join together into a string this way would have to implement a .join(), which would be a lot of types even in the standard library. And because of how both CPython and Python are structured, a lot of this would involve re-implementation and duplication of identical or nearly identical code. If you have to have .join() as a method on something, putting it on the few separator types means that you have far less code duplication and that any new sequence type automatically supports doing this in the correct orthodox way.
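A quick sketch of what this buys you in practice: the separator's .join() accepts any iterable of strings, so none of the types below needs a .join() of its own. The Countdown class is a hypothetical example type, not something from the standard library.

```python
class Countdown:
    """A hypothetical iterable that knows nothing about joining."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield start, start-1, ..., 1 as strings.
        return (str(n) for n in range(self.start, 0, -1))


sep = ", "
from_list = sep.join(["a", "b", "c"])
from_tuple = sep.join(("a", "b", "c"))
from_generator = sep.join(str(n) for n in range(3))
from_dict_keys = sep.join({"x": 1, "y": 2})  # iterating a dict yields its keys
from_custom = sep.join(Countdown(3))
```

The new Countdown type works with the orthodox spelling the moment it is iterable, with no joining code written for it.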

(I'm sure that people would write iterator or sequence types that didn't have a .join() method if it was possible to do so, because sooner or later people leave out every method they don't think they're going to use.)

Given the limitations of Python, I'll reluctantly concede that the current .join() approach is the better alternative. I don't think you can even get away with having just string.join() and no string .join() method (however much an irrational bit of me would like to throw the baby out with the bathwater here). Even ignoring people's irritation with having to do 'import string' just to get access to string.join(), there would be some CPython implementation challenges.

Sidebar: The implementation challenges

String joining is a sufficiently frequent operation that you want it to be efficient. Doing it efficiently requires doing it in C so that you can do tricks like pre-compute the length of the final string, allocate all of the memory once, and then memcpy() all of the pieces into place. However, you also have both byte strings and Unicode strings, and each needs its own specialized C-level string joining implementation (especially as modern Unicode strings have a complex internal storage structure).
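The shape of that trick can be sketched in Python for byte strings. This is only an illustration of the two-pass idea (measure, allocate once, copy into place), not CPython's actual C code; join_bytes is a hypothetical name, and the final bytes() call makes one extra copy that a C implementation would avoid.

```python
def join_bytes(sep, pieces):
    """Join byte strings using one up-front allocation for the result."""
    pieces = list(pieces)  # two passes are needed, so materialize iterators
    if not pieces:
        return b""

    # Pass 1: compute the exact length of the final result.
    total = sum(len(p) for p in pieces) + len(sep) * (len(pieces) - 1)
    out = bytearray(total)  # allocate all of the memory once

    # Pass 2: copy every piece into place (the memcpy() step).
    pos = 0
    for i, piece in enumerate(pieces):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(piece)] = piece
        pos += len(piece)
    return bytes(out)
```

For example, join_bytes(b",", [b"a", b"bc"]) gives the same result as b",".join([b"a", b"bc"]).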

The existing string module is actually a Python-level module. So how do you go from an in-Python string.join() function to specific C code for byte strings or Unicode strings, depending on what you're joining? The best mechanism CPython has for this is actually 'a method on the C-level class that the Python code can call', at which point you're back to strings having a .join() method under some name. And once you have the method under some name, you might as well expose it to Python programmers and call it .join(), i.e. you're back to the current situation.
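As a sketch of that mechanism, a Python-level function can simply call the method on the separator and let the separator's type pick the right implementation. The join() function here is a hypothetical stand-in written for illustration; it is not a claim about how the real string module is written.

```python
def join(words, sep):
    """Hypothetical module-level join that defers to the separator's type."""
    # str separators use the Unicode joining code, bytes separators use
    # the byte string joining code; the function itself does not care.
    return sep.join(words)


text = join(["a", "b"], ",")
data = join([b"a", b"b"], b",")
```

Mixing the two types, as in join(["a"], b","), raises TypeError, because each type's method only accepts pieces of its own kind.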

I may not entirely like .join() in its current form, but I have to admit that it's an impeccably logically assembled setup where everything is basically the simplest and best choice I can see.

JoinDesignDecisions written at 02:08:42

2015-06-07

You won't get people off Python 2 by making their lives worse

This is one of those times when I'm just going to quote someone, but hey, it's Guido van Rossum (via):

However this talk of "wasting our time with Python 2" needs to stop, and if you think that making Python 2 less attractive will encourage people to migrate to Python 3, think again. [...]

What he said, with all the emphasis you can imagine.

If the Python developers really think that making Python 2 less attractive will push people to Python 3, it's rather sad. Of course they wouldn't be the first people to believe that; the trick has a long history in computing, even if it often backfires.

I also personally think that it is stupid at this point in Python 3's life cycle. By now, there are probably two major classes of people who are still using Python 2: the people who are waiting for dependencies to get ported and the people who have decided that it is not a worthwhile expenditure of their time to port their code. Deliberately screwing these people does nothing to get them to move to Python 3. To the extent that they are aware that they are getting deliberately screwed by Python developers, it is more likely to encourage them to port their code to something else, anything else.

(Probably there is a third class of people, namely people who wrote some Python a while back and haven't touched it since because it works and what's this Python 3 thing and why should they care? These people are ignoring the whole mess, but in practice they are probably lost to Python 3 for good; you might as well consider them 'will never port'.)

In short: harming the remaining Python 2 users will not get them to migrate to Python 3 any faster than they already are; it just pisses them off. They are not migrating because it is impossible (at least currently) or too hard or too risky or the like.

(I could blather about what Python 3 'should' do to push for more migration, but it doesn't matter on several levels and anyways, I would be speaking from an uninformed and purely personal position. But in general, if the rate of Python 3 migration is not pleasing the Python developers, I prescribe a mirror.)

Sidebar: Why I say that people who are ignoring this are probably lost

I'm sure that there's a bunch of people out in the world who haven't heard about the Python 2 to Python 3 commotion; they have some Python 2 code, it works, they don't care about anything else. Due to being out of the loop, the first time they're likely to come into contact with this issue is when Python 2 isn't there on some new system and their old code immediately stops working.

(This can be either through /usr/bin/python disappearing or through it becoming Python 3.)

At this point, I think the most likely reaction of these people will be to discard their now-ancient (Python 2) system. If what it does is still needed, they'll probably rewrite from scratch using whatever is their current language and environment (which is not Python 3, because remember, they're out of the loop). If porting to Python 3 is easy they might do that instead, but I suspect it's not; they're going to basically be dealing with legacy code.

We're very close to being in this boat ourselves at work. While we have some Python code and not all of it's written by me, I think I'm the only person who's really following the Python 2 vs Python 3 issue. In my absence our Python code would run until it fell over and couldn't be easily patched, and then my co-workers might well pick another language they like better (whatever it would be at the time).

Python2NoBeatings written at 01:52:51

