2014-04-13
A problem: handling warnings generated at low levels in your code
Python has a well-honed approach for handling errors that happen at a low level in your code; you raise a specific exception and let it bubble up through your program. There's even a pattern for adding more context as you go up through the call stack, where you catch the exception, add more context to it (in one of various ways), and then propagate the exception onwards.
(You can also use things like phase tracking to make error messages more specific. And you may want to catch and re-raise exceptions for other reasons, such as wrapping foreign exceptions.)
All of this is great when it's an error. But what about warnings? I recently ran into a case where I wanted to 'raise' (in the abstract) a warning at a very low level in my code, and that left me completely stymied about what the best way to do it was. The disconnect between errors and warnings is that in most cases errors immediately stop further processing while warnings don't, so you can't deal with warnings by raising an exception; you need to somehow both 'raise' the warning and continue further processing.
I can think of several ways of handling this, all of which I've sort of used in code in the past:
- Explicitly return warnings as part of the function's output. This
is the most straightforward but also sprays warnings through your
APIs, which can be a problem if you realize that you've found a
need to add warnings to existing code.
- Have functions accumulate warnings on some global or relatively
global object (perhaps hidden through 'record a warning' function
calls). Then at the end of processing, high-level code will go
through the accumulated warnings and do whatever is desired with
them (there's a small sketch of this just after this list).
- Log the warnings immediately through a general logging system that you're using for all program messages (ranging from simple to very complex). This has the benefit that both warnings and errors will be produced in the correct order.
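As a minimal sketch of that second approach (the WarningList class, the module-level accumulator, and the low_level() routine are all invented for illustration):

class WarningList(object):
    def __init__(self):
        self.warnings = []

    def warn(self, msg, *args):
        self.warnings.append(msg % args)

# A module-level accumulator that low-level code records warnings into.
accumulated = WarningList()

def low_level(value):
    if value < 0:
        accumulated.warn("negative value %d clamped to 0", value)
        value = 0
    return value * 2

def main():
    results = [low_level(v) for v in (3, -1, 7)]
    # Only at the end does high-level code decide what to do with
    # whatever warnings piled up along the way.
    for w in accumulated.warnings:
        print "warning:", w
    print results

main()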
The second and third approaches have the problem that it's hard for intermediate layers to add context to warning messages; they'll wind up wanting or needing to pass the context down to the low level routines that generate the warnings. The third approach can have the general options problem when it comes to controlling what warnings are and aren't produced, or you can try to control this by having the high level code configure the logging system to discard some messages.
I don't have any answers here, but I can't help thinking that I'm missing a way of doing this that would make it all easy. Probably logging is the best general approach for this and I should just give in, learn a Python logging system, and use it for everything in the future.
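If I did give in and use logging, a minimal version might look like this (the logger name and parse_record() are made up, and a real program would configure logging more carefully):

import logging
import sys

# A module-level logger for the low-level code.
log = logging.getLogger("myprog.lowlevel")

def parse_record(line):
    # A low-level routine that wants to warn but keep going.
    fields = line.split(":")
    if len(fields) < 3:
        log.warning("malformed record, padding it: %r", line)
        fields += [""] * (3 - len(fields))
    return fields

if __name__ == "__main__":
    # High-level code decides what happens to warnings: here they go to
    # stderr, but they could be filtered or discarded instead.
    logging.basicConfig(stream=sys.stderr, level=logging.WARNING)
    print parse_record("a:b")

One appeal of this is that intermediate layers don't have to touch the warnings at all; the cost, as mentioned, is that they also can't easily add context to them.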
(In the incident that sparked this entry, I wound up punting and just
printing out a message with sys.stderr.write() because I wasn't in a
mood to significantly restructure the code just because I now wanted to
emit a warning.)
2014-03-17
Simple versus complex marshalling in Python (and benchmarks)
If you have an external caching layer in your Python application, any caching layer, one of the important things that dictates its speed is how fast you can turn Python data structures into byte blobs, stuff them into the cache, and then get byte blobs back from the cache and turn them back into data structures. Many caches will store arbitrary blobs for you so your choice of marshalling protocols (and code) can make a meaningful difference. And there are a lot of potential options; marshal, cPickle, JSON, Google protobuf, msgpack, and so on.
One of the big divisions here is what I could call the JSON versus pickle split, namely whether you can encode and decode something close to full Python objects or whether you can only encode and decode primitive types. All else being equal it seems like you should use simple marshalling, since creating an actual Python class instance necessarily has some overhead over and above just decoding primitive types. But this leaves you with a question; put simply, how is your program going to manipulate the demarshalled entities?
In many Python programs these entities would normally be objects, partly because objects are the natural primitive of Python (among other reasons, classes provide convenient namespaces). This basically leaves you with two options. If you work with objects but convert them to and from simple types around the cache layer, you've really built your own two-stage complex marshalling system. If you work with simple entities throughout your code you're probably going to wind up with more awkward and un-Pythonic code. In many situations what I think you'll really wind up doing is converting those simple cache entities back to objects at some point (and converting from objects to simple cache entities when writing cache entries).
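As a concrete (and hedged) sketch of that first option, converting to and from primitive types right at the cache boundary, here the Host class, its to_primitive()/from_primitive() methods, and the dict standing in for a cache are all invented for illustration:

import json

class Host(object):
    # A deliberately small object whose state is just primitive types.
    def __init__(self, name, port):
        self.name = name
        self.port = port

    def to_primitive(self):
        # What goes into the cache: only things JSON can encode.
        return {"name": self.name, "port": self.port}

    @classmethod
    def from_primitive(cls, d):
        return cls(d["name"], d["port"])

def cache_set(cache, key, obj):
    # 'cache' is anything with dict-like get/set of byte blobs.
    cache[key] = json.dumps(obj.to_primitive())

def cache_get(cache, key):
    blob = cache.get(key)
    return Host.from_primitive(json.loads(blob)) if blob else None

cache = {}
cache_set(cache, "h1", Host("example.org", 25))
print cache_get(cache, "h1").port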
Which brings me around to the subject of benchmarks. You can find a certain amount of marshalling benchmarks out there on the Internet, but what I've noticed is that they're basically all benchmarking the simple marshalling case. This is perfectly understandable (since many marshalling protocols can only do primitive types) but not quite as useful for me as it looks. As suggested above, what I really want to get into and out of the cache in the long run is some form of objects, whether the marshalling layer handles them for me or I have to do the conversion by hand. The benchmark that matters for me is the total time starting from or finishing with the object.
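If you want to benchmark things this way, a rough timeit sketch might look like the following; it reuses the same invented Host shape (redefined inside the setup string so timeit can see it) and times the full object-to-blob and blob-to-object round trips instead of just raw JSON encoding:

import timeit

setup = '''
import json

class Host(object):
    def __init__(self, name, port):
        self.name = name
        self.port = port
    def to_primitive(self):
        return {"name": self.name, "port": self.port}
    @classmethod
    def from_primitive(cls, d):
        return cls(d["name"], d["port"])

objs = [Host("host%d" % i, 7000 + i) for i in range(1000)]
blobs = [json.dumps(o.to_primitive()) for o in objs]
'''

enc = timeit.timeit("[json.dumps(o.to_primitive()) for o in objs]",
                    setup=setup, number=100)
dec = timeit.timeit("[Host.from_primitive(json.loads(b)) for b in blobs]",
                    setup=setup, number=100)
print "encode+convert: %.3fs  decode+convert: %.3fs" % (enc, dec)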
With that said, if caches are going to be an important part of your
system it likely pays to think about how you're going to get entries
into and out of them efficiently. You may want to have deliberately
simplified objects near the cache boundaries that are mostly thin
wrappers around primitive types. Plus Python gives you a certain
amount of brute force hacks, like playing games with obj.__dict__.
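As a sketch of the sort of brute force hack I mean (the Record class and the helper functions are invented), you can marshal just an instance's __dict__ and then rebuild the object without going through __init__:

import cPickle as pickle

class Record(object):
    def __init__(self, name, count):
        self.name = name
        self.count = count

def dump_record(rec):
    # Marshal only the instance's attribute dictionary, not the object.
    return pickle.dumps(rec.__dict__, pickle.HIGHEST_PROTOCOL)

def load_record(blob):
    # Allocate the instance without calling __init__, then restore state.
    rec = Record.__new__(Record)
    rec.__dict__.update(pickle.loads(blob))
    return rec

blob = dump_record(Record("demo", 42))
print load_record(blob).count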
(I don't have any answers here, or benchmark results for that matter. And I'm sure there's situations where it makes sense to go with just primitive types and more awkward code instead of using Python objects.)
Sidebar: The other marshalling benchmark problem
Put simply, different primitive types generally encode and decode at different speeds (and the same is true for different sizes of primitive types like strings). This means you need to pay attention to what people are encoding and decoding, not just what the speed results are; if they're not encoding something representative of what you want to, all bets may be off.
(My old tests of marshal versus cPickle showed some interesting type-based variations of this nature.)
You can also care more about decoding speed than encoding speed, or vice versa. My gut instinct is that you probably want to care more about decoding speed if your cache is doing much good, because getting things back from the cache (and the subsequent decodes) should be more frequent than putting things into it.
2014-03-13
The argument about unbound methods versus functions
As mentioned in How functions become bound or unbound methods, in Python 2 when you access a function on
a class (eg you do cls.func) it becomes an 'unbound method'. In
Python 3 this is gone, as Peter Donis
mentioned in a comment on that entry; if you do cls.func you get
back the plain function. I'm not entirely sure how I feel about
this, so let's start by asking a relevant question: what's the
difference between an unbound method and the underlying function?
The answer is that calling an unbound method adds extra type checking
on the first argument. When cls.func is an unbound method and you
call it, the first argument must be an instance of cls or a
subclass of it (ie, something for which isinstance() would return
True). If it isn't you get a TypeError (much like the ones we
saw back here). Calling the
function directly has no such requirement; you can feed it anything
at all, even though as a class method it's probably expecting an
instance of the class as its first argument.
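Here's a quick demonstration of the difference (the class names are invented):

class Base(object):
    def func(self):
        return "called with %r" % (self,)

class Other(object):
    pass

um = Base.func            # an unbound method in Python 2
print um(Base())          # fine: the first argument is a Base instance

try:
    um(Other())           # rejected by the unbound method's type check
except TypeError as e:
    print "TypeError:", e

# The plain function underneath has no such check:
print um.__func__(Other())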
I'll admit that it's an open argument whether this type checking
is a good thing or not. It's certainly atypical for Python and the
conversion from a plain function into an unbound method is a bit
surprising to people. There aren't that many situations in Python
where making something an attribute of a simple class magically
changes it into something else; normally you expect 'A.attr =
something; A.attr' to give you 'something' again. The argument
in defense of the checking is that it's a useful safety measure for
functions that are almost certainly coded with certain assumptions
and directly calling class methods on the class is not exactly a
common thing.
Now that I've written this entry, I can see why Python 3 took unbound methods out. They might be handy but they're not actually essential to how things work (unlike bound methods) and Python's mechanics are mostly about what actively has to be there. I guess my view is now that I don't mind them in Python 2 but I doubt I'm going to miss them in Python 3 (if I ever do anything with Python 3).
2014-03-11
How functions become bound or unbound methods
Suppose that you have a class:
class A(object):
    def fred(self, a):
        print "fred", self, a
Then we have:
>>> a = A()
>>> A.fred
<unbound method A.fred>
>>> b = a.fred
>>> b
<bound method A.fred of <__main__.A object at 0x1b9c210>>
An unbound method is essentially a function with some trimmings.
A 'bound method' is called that because the first argument (ie
self) is already set to a; you can call b(10) and it works
just the same way as if you had done a.fred(10) (this is actually
necessary given how CPython operates). So far so good, but how
does Python make this all work?
One way that people sometimes explain how Python makes this work
is to say that A.fred has been turned into a Python descriptor. This is sort of
true but it is not quite the full story. What is really going on
is that functions are leading a double life: functions are also
descriptors. All functions are descriptors all of the time,
whether or not they're in a class. At this point you might rationally
ask how a bare function (outside of a class) manages to still work;
after all, when you look at it or try to call it, shouldn't the
descriptor stuff kick in? The answer is descriptors only work
inside classes. Outside of classes, descriptors just sort of sit
there and you can access them without triggering their special
behavior; in the case of functions, this means that you can call
them (and look at their attributes if you feel so inclined).
So the upshot is that if you look at a function outside of a class, it is a function and you can do all of the regular functiony things with it. If you look at it inside a class it instantly wraps itself up inside a bound or unbound method (which you can then pry the original function back out of if you're so inclined). This also neatly explains why other callables don't get wrapped up as bound or unbound methods; they aren't (normally) also descriptors that do this.
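You can see the descriptor protocol at work by invoking it by hand, reusing the class A from the example above (the output comments are what I'd expect from CPython 2):

a = A()
func = A.__dict__['fred']     # the raw function; no descriptor magic here
print func                    # <function fred at 0x...>

# Calling __get__ ourselves does exactly what A.fred and a.fred do:
print func.__get__(None, A)   # <unbound method A.fred>
print func.__get__(a, A)      # <bound method A.fred of <...>>

# The bound method really does have self filled in:
func.__get__(a, A)(10)        # same as a.fred(10)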
This is rather niftier than I expected it to be when I started digging. I'm impressed with Python's cleverness here; I would never have expected function objects to be living a double life. And I have to agree that this is an elegantly simple way to make everything work out just right.
(This entry was inspired by a question Pete Zaitcev asked, which started me wondering about how things actually worked.)
PS: All of this is actually described in the descriptor documentation in the Functions and Methods section. I just either never read that or never fully understood it (or forgot it since then).
Sidebar: Why CPython has to expose bound methods
Given how CPython makes calls, returning bound methods all the time is actually utterly essential. CPython transforms Python code to bytecode and in its bytecode there is no 'call <name>' operation; instead you look up <name> with a general lookup operation and then call the result. Since the attribute lookup doesn't know what the looked up value is going to be used for, it has to always return a bound method.
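You can see this directly with the dis module; this small function is just for illustration:

import dis

def demo(a):
    a.fred(10)

# The bytecode looks up 'fred' as an ordinary attribute and then calls
# whatever came back; there is no combined 'call this method' operation.
dis.dis(demo)
# On CPython 2.7 I'd expect something like:
#   LOAD_FAST      0 (a)
#   LOAD_ATTR      0 (fred)
#   LOAD_CONST     1 (10)
#   CALL_FUNCTION  1
#   ...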
Of course, bound methods are also the right thing to do with method
functions in general if you believe that functions are first class
entities. It'd be very strange for 'b = a.fred; a.fred(10); b(10)' to
have the two function calls behave differently.
(The argument over returning unbound methods instead of the bare function is a bit more abstract but I think it's probably the right thing to do.)
2014-02-24
The origins of DWiki and its drifting purpose
One of the interesting things about writing Wandering Thoughts has been getting a vivid and personal experience with what happens when some code you've written gets repurposed for something rather different than what it was originally designed for. Because, you see, DWiki (the wiki engine behind the blog) was not originally intended to be a blog engine and what it was originally designed for shaped it in a number of ways that still show today.
(I alluded to this when I talked about why comments aren't immediately visible on entries.)
Put simply, I originally designed DWiki as yet another attempt to build
a local sysadmin documentation wiki that my co-workers would use. We
hadn't shown much enthusiasm for writing HTML pages and I didn't think
I could get my co-workers to edit things through the web, but I figured
I at least had a shot if I gave them simple and minimal markup that
they could edit by going 'cd /some/directory; vi file'. This idea
never went anywhere but once I had the core wiki engine I added enough extra
features to make it able to create a
blog, and then I decided I might as well use the features and write one.
(From the right perspective a blog is just a paged time-based view over a directory hierarchy. So are Atom syndication feeds.)
One feature that this original purpose strongly affected is how comments are displayed. To put it one way, if you're creating a sysadmin documentation wiki, input from outsiders is not a primary source of content. It's a potential source of feedback to us, but it's definitely not on par with the (theoretical) stuff we were going to be writing. So I decided that (by default) comments would get a secondary position; if you were just browsing the wiki, you'd have to go out of your way to see the comments. As a wiki, if people left comments with seriously worthwhile feedback we'd fold that feedback into the main page.
(Adding comments was also a sop to the view that all true wikis are web-editable by outsiders. I wasn't going to make the wiki itself web-editable, but this way I could say that we were wiki-like in that we were still allowing outsiders to have a voice.)
Another thing that this original purpose strongly affected was DWiki's
choice of text formatting characters, especially its choice of _
as the 'typewriter text' formatting character. If you're writing about
sysadmin things it's quite common to want to set text in typewriter
text to denote (Unix) commands so you want a nice convenient character
sequence for it; _ looks like a great choice because almost nothing
you write about is going to have actual underscores (they're very
uncommon in Unix command lines). When I instead started using DWiki to
write more and more about code, this turned into a terrible decision
since _ is an extremely common character in identifiers.
(Another choice that looked sensible for writing about Unix commands
but turned out to be bad for writing about code is using ((...)) for a
block of typewriter text with no further formatting. The problem is that
when you're writing about code you often wind up wanting to write about
things with (...) on the end and that confuses the text parser.)
PS: In hindsight I can see all sorts of problems with my idea of a sysadmin documentation wiki. Even if I'd tried to market it better to my co-workers I suspect that it wouldn't have worked, especially as something that was publicly visible.
2014-02-05
An interesting internal Django error we just got
As a result of someone trying to either exploit or damage it, our account request system just notified us that it had hit an internal exception in the depths of Django. While that's not too great, what was really interesting was the specific exception and where it happened; it boiled down to:
[...]
  File ".../django/db/backends/sqlite3/base.py", line XXX, in execute
    return Database.Cursor.execute(self, query, params)
OverflowError: long too big to convert
Wait, what?
It turns out that the cause of this is (to me) very interesting and also
completely explicable once we trace down the layers. We need to start at
the actual form. Among other things, this presents a <select> element
with the possible values drawn from the database. How Django implements
this in our case is that the text of each option is under our control
but the HTML form 'value' for each option is an integer (which happens
to be the database row's internal primary key). Ie, it looks like this:
[...]
<option value="5">Blah blah</option>
<option value="6">More blah</option>
[...]
Our attacker edited the HTML (I believe using Firefox's developer
tools) to provide a really absurdly large value for the option
that they picked; for example, one attempt had '518446744073709551616'
for this form element. Because Django is a modern web framework it of course does not simply trust this submitted
value; instead it validates it and in the process turns it into a
proper reference to an ORM object representing the particular
database row. If I'm reading the code right, this validation is
done ultimately by making a SQL query to look up the row given its
primary key (in the process this validates that it's among the set
you provided).
This is where we descend down the layers to the SQLite driver.
Because the SQLite driver is a good modern database driver, it uses
SQL placeholders. Since the
primary key field is an integer, this means that the SQLite driver
must convert the value passed down by Django to an actual integer
in order to pass it as a placeholder, and not just a Python integer
but an actual C-level long. As it happens, Django has not passed
down the raw string value but has already called int() on the
hacked up string, which has given us a Python long integer. This
long integer is of course far too big to fit into the C-level long
that the SQLite driver requires and the driver notices, giving us
this OverflowError (well, it turns out that it is the core Python
code that notices, but close enough).
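You can reproduce the core of this outside of Django by talking to the sqlite3 module directly (the table here is invented; I'm assuming the driver behaves the same way on its own as it does under Django):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, what TEXT)")

# Binding a value that doesn't fit in SQLite's 64-bit integers blows up
# in the driver layer, not in our own code:
try:
    conn.execute("SELECT what FROM requests WHERE id = ?",
                 (int("518446744073709551616"),))
except OverflowError as e:
    print "driver said:", e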
(If you modify the form to something that is not an integer at all, Django detects it at a much higher level and rejects the form cleanly.)
I find this an interesting error partly because of how the low level issues involved show through. A whole cascade of things had to combine together to create this error, including Python's unification of ints and longs, and it is the sort of really obscure corner case that can easily slip through and be overlooked.
(Since it can be triggered from the outside it's probably worth reporting it as a Django bug, but I need to verify that it's still there in the current version. We're a bit behind by now for various reasons.)
PS: We found out about this problem because one of Django's cool
features is that it can be set to email you reports about uncaught
exceptions such as this. The reports include not just the backtrace but
also things like the form POST parameters, which was vital in this
case. Without the POST parameters I would have been totally lost; with
them, once I started looking the absurd values of this particular form
field jumped right out at me.
2014-01-31
Why I now believe that duck typed metaclasses are impossible in CPython
As I mentioned in my entry on fake versus real metaclasses, I've wound up a bit obsessed with the question
of whether it's possible to create a fully functional metaclass
that doesn't inherit from type. Call this a 'duck typed metaclass'
or if you want to be cute, a 'duck typed type' (DTT). As a result
of that earlier entry and some additional
exploration I now believe that it's impossible.
Let's go back to MetaclassFakeVsReal for a moment and look at the
fake metaclass M2:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        return type(name, bases, dct)

class C2(object):
    __metaclass__ = M2

class C4(C2):
    pass
As we discovered, the problem is that C2 is not an instance of M2
and so (among other things) its subclass C4 will not invoke M2 when
it is being created. The real metaclass M1 avoided this problem by
instead using type.__new__() in its __new__ method. So why
not work around the problem by making M2 do so too, like this:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        return type.__new__(self, name, bases, dct)
Here's why:
TypeError: Error when calling the metaclass bases
type.__new__(M2): M2 is not a subtype of type
I believe that this is an old friend in a new
guise. Instances of M2 would normally be based on the C-level
structure for object (since it is a subclass of object), which
is not compatible with the C-level type structure that instances
of type and its subclasses need to use. So type says 'you cannot
do this' and walks away.
Given that we need C2 to be an instance of M2 so that things work
right for subclasses of C2 and we can't use type, we can try brute
force and fakery:
class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        r = super(M2, self).__new__(self)
        r.__dict__.update(dct)
        r.__bases__ = bases
        return r
This looks like it works in that C4 will now get created by M2.
However this is an illusion and I'll give you two examples of the
ensuing problems, each equally fatal.
Our first problem is creating instances of C2, ie the actual
objects that we will want to use in code. Instance creation is
fundamentally done by calling C2(), which means that M2 needs a
__call__ special method (so that C2, an instance of M2, becomes
callable). We'll try a version that delegates all of the work to type:
def __call__(self, *args, **kwargs):
    print "M2 call", self, args, kwargs
    return type.__call__(self, *args, **kwargs)
Unsurprisingly but unfortunately this doesn't work:
TypeError: descriptor '__call__' requires a 'type' object but received a 'M2'
Okay, fine, we'll try more or less the same trick as before (which is now very dodgy, but ignore that for now):
def __call__(self, *args, **kwargs):
    print "M2 call", self, args, kwargs
    r = super(M2, self).__new__(self)
    r.__init__(*args, **kwargs)
    return r
You can probably guess what's coming:
TypeError: object.__new__(X): X is not a type object (M2)
We are now well and truly up the creek because classes are the only
thing in CPython that can have instances. Classes are instances of
type and as we've seen we can't create something that is both an
instance of M2 (so that M2 is a real metaclass instead of a fake
one) and an instance of type. Classes without instances are obviously
not actually functional.
The other problem is that despite how it appears C4 is not actually
a subclass of C2 because of course classes are the only thing
in CPython that can have subclasses. In specific, attribute lookups
on even C4 itself will not look at attributes on C2:
>>> C2.dog = 10
>>> C4.dog
AttributeError: 'M2' object has no attribute 'dog'
The __bases__ attribute that M2.__new__ glued on C4 (and C2)
is purely decorative. Again, looking attributes up through the chain of
bases (and the entire method resolution order)
is something that happens through code that is specific to instances of
type. I believe that much of it lives under the C-level function that
is type.__getattribute__, but some of it may be even more magically
intertwined into the guts of the CPython interpreter than that. And as
we've seen, we can't call type.__getattribute__ ourselves unless we
have something that is an instance of type.
Note that there are literally no attributes we can set on non-type
instances that will change this. On actual instances of type, things
like __bases__ and __mro__ are not actual attributes but are
instead essentially descriptors that look up and manipulate fields
in the C-level type struct. The actual code that does things like
attribute lookups uses the C-level struct fields directly, which is one
reason it requires genuine type instances; only genuine instances even
have those struct fields at the right places in memory.
(Note that attribute inheritance in subclasses is far from the only
attribute lookup problem we have. Consider accessing C2.afunction
and what you'd get back.)
Either problem is fatal, never mind both of them at once (and note
that our M2.__call__ is nowhere near a complete emulation of
what type.__call__ actually does). Thus as far as I can tell
there is absolutely no way to create a fully functional duck typed
metaclass in CPython. To do one you'd need access to the methods
and other machinery of type and type reserves that machinery
for things that are instances of type (for good reason).
I don't think that there's anything in general Python semantics that
require this, so another Python implementation might allow or support
enough to enable duck typed metaclasses. What blocks us in CPython is
how CPython implements type, object, and various core functionality
such as creating instances and doing attribute lookups.
(I tried this with PyPy and it failed with a different set of errors
depending on which bits of type I was trying to use. I don't have
convenient access to any other Python implementations.)
2014-01-21
Fake versus real metaclasses and what a fully functional metaclass is
Lately I've become a little bit obsessed with the question of whether
you can create a fully functional metaclass that doesn't inherit
from type (partly this was sparked by an @eevee tweet, although
it's an issue I brushed against a while back).
It's not so much that I want to do this or think that it's sensible
as that I can't prove what the answer is either way and that bugs
me. But before I try to tackle the big issues I want to talk about
what I mean by 'fully functional metaclass'.
Let's start with some very simple metaclasses, one of which inherits
from type and one of which doesn't:
class M1(type):
    def __new__(self, name, bases, dct):
        print "M1", name
        return super(M1, self).__new__(self, name, bases, dct)

class M2(object):
    def __new__(self, name, bases, dct):
        print "M2", name
        return type(name, bases, dct)

class C1(object):
    __metaclass__ = M1

class C2(object):
    __metaclass__ = M2
M2 certainly looks like a metaclass despite not inheriting from type
(eg if you try this out you can see that it is triggered on the creation
of C2). But appearances are deceiving. M2 is not a fully functional
metaclass (and there are ways to demonstrate this). So let me show you
what's really going on:
>>> type(C1)
<class 'M1'>
>>> type(C2)
<type 'type'>
(We can get the same information by looking at each class's __class__
attribute.)
The type of a class with a metaclass is the metaclass while the
type of a class without a metaclass is type, and as we can see
from this, C2 doesn't actually have a metaclass. The reason for
this is that M2 created the actual class object for C2 by calling
type() directly, which does not give the newly created class a
metaclass (instead it becomes a direct instance of type). If all
you're interested in is changing a class as it's being created this may not matter, or at least you may not
notice any side effects if you don't subclass your equivalent of
C2.
In this example M1 is what I call a fully functional metaclass and
M2 is not. It looks like one and partly acts like one, but that is an
illusion; at best it can do only one of the many things metaclasses
can do. A fully functional metaclass like M1 can do
all of them.
Now let's come back to a demonstration that M2 is not a real
metaclass. The most alarming way to demonstrate this is to subclass
both classes:
class C3(C1): pass
class C4(C2): pass
If you try this out you'll see that M1 is triggered when C3 is
created but M2 is not triggered when C4 is created.
This is very confusing because C4 (and C2 for that matter) has
a visible __metaclass__ attribute. It's just not meaningful
after the creation of C2, contrary to what some documentation
sometimes says. Note that this is sort of documented if you read
Customizing class creation
very carefully; see the section on precedence rules, which only
talks about looking at a __metaclass__ attribute in the actual
class dictionary, not the class dictionaries of any base classes.
Note that this means that general callables cannot be true
metaclasses. To create a true metaclass, one that will be inherited
by subclasses, you must arrange for the created classes to be
instances of you, and only classes can have instances. If you have a
__metaclass__ of, say, a function, it will be called only when
classes explicitly list it as their metaclass; it will not be called for
subclasses. This is going to surprise everyone except experts in Python
arcana, so don't do that even if you think you have a use for it.
(If you do want to customize only classes that explicitly specify a
__metaclass__ attribute, do this check in your __new__ function
by looking at the passed in dictionary. Then people who read the code of
your metaclass have a chance of working out what's going on.)
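Here's a hedged sketch of what I mean (the class names are invented):

class OnlyExplicit(type):
    def __new__(mcls, name, bases, dct):
        # Subclasses are still created through us (so the metaclass really
        # is inherited), but we only do our extra work for classes that
        # explicitly asked for it in their own class body.
        if '__metaclass__' in dct:
            print "customizing", name
        return super(OnlyExplicit, mcls).__new__(mcls, name, bases, dct)

class Base(object):
    __metaclass__ = OnlyExplicit      # prints: customizing Base

class Child(Base):
    pass                              # created via OnlyExplicit, no print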
I will admit that Python 3 cleaned this up by removing the magic
__metaclass__ attribute. Now you can't be misled quite as
much by the presence of C2.__metaclass__ and the visibility of
C4.__metaclass__. To determine whether something really has a
metaclass in Python 3 you have no choice but to look at type(),
which is always honest.
2014-01-16
Link: Armin Ronacher's 'More About Unicode in Python 2 and 3'
Armin Ronacher's More About Unicode in Python 2 and 3 contains a lot of information about the subject from someone who works with this stuff and so is much better informed about it in practice than I am. A sample quote:
I will use this post to show that from the pure design of the language and standard library why Python 2 the better language for dealing with text and bytes.
Since I have to maintain lots of code that deals exactly with the path between Unicode and bytes this regression from 2 to 3 has caused me lots of grief. Especially when I see slides by core Python maintainers about how I should trust them that 3.3 is better than 2.7 makes me more than angry.
I learned at least two surprising things from reading this. The first was that I hadn't previously realized that string formatting is not available for bytes in Python 3, only for Unicode strings. The second is that Mercurial has not and is not being ported to Python 3. As Ronacher notes, it turns out that these two issues are not unrelated.
For me, the lack of formatting for bytes adds another reason for not using Python 3 even for new code because it forces me into more Unicode conversion even if I know exactly what I'm doing with those unconverted bytes. Since I use Unix, with its large collection of non-Unicode byte APIs, there are times when this matters.
(For instance, it is perfectly sensible to manipulate Unix file paths as bytes without trying to convert them to Unicode. You can split them into path components, add prefixes and suffixes, and so on all without having to interpret the character sets of the file name components. In fact, in degenerate situations the file name components may be in different character sets, with a directory name in UTF-8 and file name inside a subdirectory in something else. At that point there is no way to properly decode the entire file path to meaningful Unicode. But I digress from Armin Ronacher's article.)
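To make that file path point concrete, in Python 2 this sort of byte-level path manipulation just works; the path components here are invented, one UTF-8 and one Latin-1, and nothing ever gets decoded:

import os.path

dirname = '/srv/data/caf\xc3\xa9'        # UTF-8 encoded directory name
fname = 'r\xe9sum\xe9.txt'               # Latin-1 encoded file name

path = os.path.join(dirname, fname)
print repr(path)
print repr(os.path.splitext(path))
print repr(os.path.basename(path))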
2014-01-06
The problem with compiling your own version of Python 3
I've previously mentioned in passing that I simply can't use Python 3 on some platforms because it's not there (or perhaps it's there only in an old and lacking version). As reported by Twirrim, in some places the popular answer to this issue is to say that I should just compile my own local version of Python 3 by hand (perhaps in a virtualenv). At this point most of the sysadmins in the audience are starting to get out of their chairs, but hold on for a moment; I want to make a general argument.
There is a spectrum of Python coding that ranges from big core systems that are very important down to casual utilities. For something that is already big and complex, the extra overhead of compiling a specific version of Python is small (you've probably already got complex installation and management procedures even if you've automated them) and can potentially increase the reliability of the result. Nor is the extra disk space for another copy of the Python interpreter likely to be a problem; even if the disk space used by your system doesn't dwarf it, your core system is important enough that the disk space doesn't matter. But all of this turns on its head for sufficiently little and casual utilities; because they're so small, building and maintaining a custom Python interpreter drastically increases the amount of effort required for them as a total system.
Somewhere on the line between big core systems and little casual utilities is an inflection point where the pragmatic costs of a custom Python interpreter exceed either or both of the benefits from the utilities and the benefits from using Python 3 instead of whatever version of Python 2 you already have. Once you hit it, 'install Python 3 to use this' ceases being a viable approach. Where exactly this inflection point is varies based on local conditions (including how you feel about various issues involved), but I argue that it always exists. So there are always going to be situations where you can't use your own version of Python 3 because the costs exceed the benefits.
(With that settled, the sysadmins can now come out of their chairs to argue about exactly where the inflection point is and point out any number of issues that push it much further up the scale of systems than one might expect.)