2024-11-29
Python type hints are probably "worth it" in the large for me
I recently added type hints to a little program, and that experience wasn't entirely positive; it left me feeling that maybe I shouldn't bother. Because I don't promise to be consistent, I went back and re-added type hints to the program all over again, starting from the non-hinted version. This time I did the type hints rather differently, and the result came out well enough that I'm going to keep it.
Perhaps my biggest change was to entirely abandon NewType(). Instead I set up two NamedTuples and used type aliases for everything else, which amounts to three type aliases in total. Since I was using type aliases anyway, I only added them when it was annoying to enter the real type (and I was doing it often enough). I skipped doing a type alias for 'list[namedTupleType]' because I couldn't come up with a name that I liked well enough, and because the fact that it's a list is fundamental to how it's interacted with in the code involved, so I didn't feel like obscuring that.
Adding type hints 'for real' had the positive aspect of encouraging me to write a bunch of comments about what things were and how they worked, which will undoubtedly help future me when I want to change something in six months. Since I was using NamedTuples, I changed to accessing the elements of the tuples through the names instead of the indexes, which improved the code. I had to give up 'list(adict.items())' in favour of a list comprehension that explicitly created the named tuple, but this is probably a good thing for the overall code quality.
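The switch from 'list(adict.items())' to a list comprehension might look something like this sketch, again with hypothetical names and simplified to alert names only. The comprehension builds each named tuple explicitly, so the result really is a list of hostAlerts rather than a list of plain tuples.

```python
from typing import NamedTuple


class hostAlerts(NamedTuple):
    host: str
    alerts: list[str]  # simplified: alert names only


def toHostList(adict: dict[str, list[str]]) -> list[hostAlerts]:
    # Before: list(adict.items()), which yields plain (key, value) tuples.
    # After: construct each named tuple explicitly.
    return [hostAlerts(host, alerts) for host, alerts in adict.items()]


result = toHostList({"web1": ["DiskFull"], "db1": ["HighLoad", "Swap"]})
print(result[0].host, result[0].alerts)  # web1 ['DiskFull']
```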
(I also changed the type of one thing I had as 'int' to a float, which is what it really should have been all along even if all of the normal values were integers.)
Overall, I think I've come around to the view that doing all of this is good for me in the same way that using shellcheck is good for my shell scripts, even if I sometimes roll my eyes at things it says. I also think that just making mypy silent isn't the goal I should be aiming for. Instead, I should be aiming for what I did to my program on this second pass, doing things like introducing named tuples (in some form), adding comments, and so on. Adding final type hints should be a prompt for a general cleanup.
(Perhaps I'll someday get to a point where I add basic type hints as I write the code initially, just to codify my belief about the shape of what I'm returning and passing in, and use them to find my mistakes. But that day is probably not today, and I'll probably want better LSP integration for it in my GNU Emacs environment.)
2024-11-27
Some notes on my experiences with Python type hints and mypy
As I thought I might, today I spent some time adding full and relatively honest type hints to my recent Python program. The experience didn't go entirely smoothly and it left me with a number of learning experiences and things I want to note down in case I ever do this again. The starting point is that my normal style of coding small programs is to not make classes to represent different sorts of things and instead use only basic built in collection types, like lists, tuples, dictionaries, and so on. When you use basic types this way, it's very easy to pass or return the wrong 'shape' of thing (I did it once in the process of writing my program), and I'd like Python type hints to be able to tell me about this.
(The first note I want to remember is that mypy becomes very irate at you in obscure ways if you ever accidentally reuse the same (local) variable name for two different purposes with two different types. I accidentally reused the name 'data', using it first for a str and second for a dict that came from an 'Any' typed object, and the mypy complaints were hard to decode; I believe it complained that I couldn't index a str with a str on a line where I did 'data["key"]'.)
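A hypothetical reconstruction of the pattern, and the simple fix of giving each value its own name. The 'parse_payload' function and its JSON input are made up for illustration. The commented-out version is roughly the shape described above: 'data' is first inferred as a str, so mypy keeps treating it as one after it is reassigned to the parsed JSON.

```python
import json
from typing import Any


def parse_payload(raw: bytes) -> str:
    # Problem shape (commented out): 'data' is inferred as str from its
    # first assignment, so mypy objects to 'data["key"]' as indexing a str.
    #   data = raw.decode("utf-8")
    #   data = json.loads(data)
    #   return data["key"]

    # Fix: one name per purpose, each with one type.
    text = raw.decode("utf-8")       # str
    payload: Any = json.loads(text)  # parsed JSON, typed as Any
    return str(payload["key"])


print(parse_payload(b'{"key": "value"}'))  # value
```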
When you work with data structures created from built-in collections, you can wind up with long, tangled compound type names, like 'tuple[str, list[tuple[str, int]]]' (which is a real type in my program). These are annoying to keep typing and easy to make mistakes with, so Python type hints provide two ways of giving them short names: type aliases and typing.NewType(). These look almost the same:

# type alias:
type hostAlertsA = tuple[str, list[tuple[str, int]]]

# NewType():
hostAlertsT = NewType('hostAlertsT', tuple[str, list[tuple[str, int]]])
The problem with type aliases is that they are aliases. All aliases for a type are considered to be the same, and mypy won't warn if you call a function that expects one with a value that was declared to be another. Suppose you have two sorts of strings, ones that are a host name and ones that are an alert name, and you would like to keep them straight. Suppose that you write:
# simple type aliases
type alertName = str
type hostName = str

def manglehost(hname: hostName) -> hostName:
    [....]
Because these are only type aliases, and all aliases for the same type are treated as the same, you have not achieved your goal of keeping yourself from confusing host and alert names when you call 'manglehost()'. To do this, you need to use NewType(), at which point mypy will complain (and will also often force you to explicitly mark bare strings as one or the other, with 'alertName(yourstr)' or 'hostName(yourstr)').
If I want as much protection as possible against this sort of type confusion, I want to make as many things as possible be NewType()s instead of type aliases. Unfortunately, as far as I can see, NewType()s have some drawbacks in mypy for my sort of usage.
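Here is the same example sketched with NewType(). The body of manglehost() is elided in the original, so the lowercasing and dot-stripping below is purely hypothetical. Both names are plain str at runtime; the distinction only exists for mypy.

```python
from typing import NewType

# Distinct types for mypy; both are plain str at runtime.
alertName = NewType('alertName', str)
hostName = NewType('hostName', str)


def manglehost(hname: hostName) -> hostName:
    # Hypothetical body: normalize case and drop a trailing dot.
    # str methods return plain str, so the result is re-wrapped.
    return hostName(hname.lower().rstrip('.'))


h = hostName('Web1.Example.Org.')
print(manglehost(h))  # web1.example.org

# Both of these would run, but mypy rejects them:
#   manglehost(alertName('DiskFull'))  # an alertName is not a hostName
#   manglehost('web1.example.org')     # a bare str is not a hostName either
```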
The first drawback is that you cannot create a NewType of 'Any':
error: Argument 2 to NewType(...) must be subclassable (got "Any") [valid-newtype]
In order to use NewType, I must specify concrete details of my actual (current) implementation, rather than saying just 'this is a distinct type but anything can be done with it'.
The second drawback is that this distinct typing is actually a problem when you do certain sorts of transformations of collections. Let's say we have alerts, which have a name and a start time, and hosts, which have a hostname and a list of alerts:
alertT = NewType('alertT', tuple[str, int])
hostAlT = NewType('hostAlT', tuple[str, list[alertT]])

We have a function that receives a dictionary where the keys are hosts and the values are their alerts, and turns it into a sorted list of hosts and their alerts, which is to say a list[hostAlT]. The following Python code looks sensible on the surface:

def toAlertList(hosts: dict[str, list[alertT]]) -> list[hostAlT]:
    linear = list(hosts.items())
    # Don't worry about the sorting for now
    return linear
If you try to check this, mypy will declare:
error: Incompatible return value type (got "list[tuple[str, list[alertT]]]", expected "list[hostAlT]") [return-value]
Initially I thought this was mypy being limited, but in writing this entry I've realized that mypy is correct. Each item from .items() is a tuple[str, list[alertT]], and while that has the same shape as our hostAlT, it is not the same thing; that's what it means for hostAlT to be a distinct type.
However, it is a problem that, as far as I know, there is no type checked way to get mypy to convert the list we have into a list[hostAlT]. If you create a new NewType to be the list type, call it 'aListT', and try to convert 'linear' to it with 'l2 = aListT(linear)', you will get more or less the same complaint:
error: Argument 1 to "aListT" has incompatible type "list[tuple[str, list[alertT]]]"; expected "list[hostAlT]" [arg-type]
This is a case where, as far as I can see, I must use a type alias for 'hostAlT' in order to get the structural equivalence conversion, or alternatively use the wordier and (as far as I know) less efficient list comprehension version of list() so that I can tell mypy that I'm transforming each key/value pair into a hostAlT value:
linear = [hostAlT(x) for x in hosts.items()]
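Putting the pieces together, here is a sketch of a toAlertList() that should pass mypy with the NewType() definitions above, including the sorting that the original set aside. Sorting alphabetically by host name is an assumption about what 'sorted' means here.

```python
from typing import NewType

alertT = NewType('alertT', tuple[str, int])
hostAlT = NewType('hostAlT', tuple[str, list[alertT]])


def toAlertList(hosts: dict[str, list[alertT]]) -> list[hostAlT]:
    # Wrap each (host, alerts) pair explicitly so mypy sees a hostAlT;
    # list(hosts.items()) would give a list of plain tuples instead.
    linear = [hostAlT(x) for x in hosts.items()]
    # Assumed ordering: alphabetically by host name.
    linear.sort(key=lambda ha: ha[0])
    return linear


alerts = {
    "web2": [alertT(("HighLoad", 120))],
    "db1": [alertT(("DiskFull", 300)), alertT(("Swap", 60))],
}
for host, hostAlerts in toAlertList(alerts):
    print(host, len(hostAlerts))
# db1 2
# web2 1
```

If hostAlT were a type alias instead of a NewType(), the original 'list(hosts.items())' would type check as it is.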
I'd have the same problem in the actual code (instead of in the type hint checking) if I was using, for example, a namedtuple to represent a host and its alerts. Calling hosts.items() wouldn't generate objects of my named tuple type, just unnamed standard tuples.
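A quick sketch of that runtime difference, with a hypothetical hostAlerts named tuple: the items from .items() are ordinary tuples, and getting named tuples means converting each one, here with the named tuple class's _make() method.

```python
from typing import NamedTuple


class hostAlerts(NamedTuple):
    host: str
    alerts: list[str]


hosts = {"web1": ["DiskFull"], "db1": ["HighLoad"]}

plain = list(hosts.items())
print(isinstance(plain[0], hostAlerts))  # False: just a (str, list) tuple

named = [hostAlerts._make(item) for item in hosts.items()]
print(isinstance(named[0], hostAlerts), named[0].host)  # True web1
```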
Possibly this is a sign that I should go back through my small programs after I more or less finish them and convert this sort of casual use of tuples into namedtuple (or the type hinted version) and dataclass types. If nothing else, this would serve as more explicit documentation for future me about what those tuple fields are. I would have to give up those clever 'list(hosts.items())' conversion tricks in favour of the more explicit list comprehension version, but that's not necessarily a bad thing.
Sidebar: aNewType(...) versus typing.cast(typ, ....)
If you have a distinct NewType() and mypy is happy enough with you, both of these will cause mypy to consider your value to now be of the new type. However, they have different safety levels and restrictions. With cast(), there are no type hint checking guardrails at all; you can cast() an integer literal into an alleged string and mypy won't make a peep. With, for example, 'hostAlT(...)', mypy will apply a certain amount of compatibility checking. However, as we saw above in the 'aListT' example, mypy may still report a problem on the type change and there are certain type changes you can't get it to accept.
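A small sketch of the difference, using a hypothetical hostName NewType. Neither cast() nor a NewType call changes anything at runtime; the difference is only in what mypy is willing to check.

```python
from typing import NewType, cast

hostName = NewType('hostName', str)

# cast() does no checking at all: mypy accepts this even though 42 is
# obviously not a str, and at runtime the value is still an int.
bogus = cast(str, 42)
print(type(bogus).__name__)  # int

# Calling the NewType is checked by mypy (the argument must be a str),
# but at runtime it also just returns its argument unchanged.
h = hostName("web1")
print(type(h).__name__)  # str

# hostName(42)  # would run, but mypy reports an argument type error
```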
As far as I know, there's no way to get mypy to temporarily switch to structural compatibility checking here. Perhaps there are deep type safety reasons to disallow that.
2024-11-26
Python type hints may not be for me in practice
Python 3 has optional type hints (and has had them for some time), and some time ago I was a bit tempted to start using some of them; more recently, I wrote a small amount of code using them. Recently I needed to write a little Python program and as I started, I was briefly tempted to try type hints. Then I decided not to, and I suspect that this is how it's going to go in the future.
The practical problem of type hints for me when writing the kind of (small) Python programs that I do today is that they necessarily force me to think about the types involved. Well, that's wrong, or at least incomplete; in practice, they force me to come up with types. When I'm putting together a small program, generally I'm not building any actual data structures, records, or the like (things that have a natural type); instead I'm passing around dictionaries and lists and sets and other basic Python types, and I'm revising how I use them as I write more of the program and evolve it. Adding type hints requires me to navigate assigning concrete types to all of those things, and then updating them if I change my mind as I come to a better understanding of the problem and how I want to approach it.
(In writing this it occurs to me that I do often know that I have distinct types (for example, for what functions return) and I shouldn't mix them, but I don't want to specify their concrete shape as dicts, tuples, or whatever. In looking through the typing documentation and trying some things, it doesn't seem like there's an obvious way to do this. Type aliases are explicitly equivalent to their underlying thing, so I can't create a bunch of different names for eg typing.Any and then expect type checkers to complain if I mix them.)
After the code has stabilized I can probably go back to write type hints (at least until I get into apparently tricky things like JSON), but I'm not sure that this would provide very much value. I may try it with my recent little Python thing just to see how much work it is. One possible source of value is if I come back to this code in six months or a year and want to make changes; type hints could give me both documentation and guardrails, given that I'll have forgotten about a lot of the code and structure by then.
(I think the usual advice is that you should write type hints as you write the program, rather than go back after the fact and try to add them, because incrementally writing them during development is easier. But my new Python programs tend to be sufficiently short that doing all of the type hints afterward isn't too much work, and if it gets me to do it at all it may be an improvement.)
PS: It might be easier to do type hints on the fly if I practiced with them, but on the other hand I write new Python programs relatively infrequently these days, making type hints yet another Python thing I'd have to try to keep in my mind despite it being months since I used them last.
PPS: I think my ideal type hint situation would be if I could create distinct but otherwise unconstrained types for things like function arguments and function returns, have mypy or other typing tools complain when I mixed them, and then later go back to fill in the concrete implementation details of each type hint (eg, 'this is a list where each element is a ...').