# Wandering Thoughts archives

2018-10-16

## Quickly bashing together little utilities with Python is nice

One of the reasons that I love Python, and one of the things that I love using it for, is how quickly and easily it lets me bash together little utility programs. By their nature, these things don't get talked about very much (they're so small and often so one-off), but this time I have a couple of examples to talk about.

As part of writing yesterday's entry on the external email delivery delays we see, I found myself wanting to turn the human-friendly format that Exim uses to report time durations into an integer number of seconds, so that I could sort the results. Later on I found myself wanting to convert the other way, so that I could see what the few large second counts I was getting looked like in human-readable terms.

Exim's queue delay format looks like '1d19h54m13s'. The guts of my little Python program to convert these into seconds look like this:

```
import re
import sys

rexp = re.compile("([a-z])")
timemap = {'s': 1,
           'm': 60,
           'h': 60*60,
           'd': 24*60*60,
           }

def process():
    for l in sys.stdin:
        sr = rexp.split(l)
        # The minimum split should be '1', 's', '\n'.
        if len(sr) < 3:
            continue
        secs = 0
        for i in range((len(sr)-1) // 2):
            o = i*2
            secs += int(sr[o]) * timemap[sr[o+1]]
        print(secs)
```

The core trick is to use a Python regexp to split '1d19h54m13s' into `['1', 'd', '19', 'h', '54', 'm', '13', 's']` (plus a trailing newline in this case). We can then take pairs of these things, turn the first into a number, and multiply it by the unit conversion determined by the second.
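Here is a minimal sketch of that split-and-pair step on its own, the way one might poke at it in the interpreter. The `pairs` name is made up for illustration; the indexing is the same as in the loop above.

```python
import re

rexp = re.compile("([a-z])")

# The capturing group in the pattern makes split() keep the unit
# letters in the result instead of throwing them away.
sr = rexp.split("1d19h54m13s\n")
print(sr)

# Pair each number with the unit letter that follows it, ignoring
# the leftover trailing element (the newline).
pairs = [(int(sr[i*2]), sr[i*2 + 1]) for i in range((len(sr) - 1) // 2)]
print(pairs)
```

The first `print()` shows the eight number and unit strings followed by `'\n'`; the second shows the four `(number, unit)` tuples that the conversion loop effectively works through.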

Going the other direction looks surprisingly similar (for example, I literally copied the `timemap` over):

```
timemap = {'s': 1,
           'm': 60,
           'h': 60*60,
           'd': 24*60*60,
           }

def tsstring(secs):
    o = []
    for i in ('d', 'h', 'm', 's'):
        if secs >= timemap[i] or o:
            n = secs // timemap[i]
            o.append("%d%s" % (n, i))
            secs %= timemap[i]
    return "".join(o)
```
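As a quick sanity check, here is a sketch that repeats `timemap` and `tsstring()` so that it runs on its own and feeds in a few values. The sample values are my own picks, not ones from the actual mail logs.

```python
timemap = {'s': 1, 'm': 60, 'h': 60*60, 'd': 24*60*60}

def tsstring(secs):
    o = []
    for i in ('d', 'h', 'm', 's'):
        # Skip leading units that are too big; once one unit has been
        # emitted, emit every smaller unit too, even if it is zero.
        if secs >= timemap[i] or o:
            n = secs // timemap[i]
            o.append("%d%s" % (n, i))
            secs %= timemap[i]
    return "".join(o)

print(tsstring(158053))   # 1d19h54m13s
print(tsstring(3600))     # 1h0m0s
print(tsstring(59))       # 59s
print(repr(tsstring(0)))  # ''
```

The `or o` part of the condition is what produces `1h0m0s` rather than just `1h`. The last line shows an edge case: since no unit ever qualifies for 0, the function as written returns an empty string.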

There are probably other, more clever ways to do both conversions, as well as little idioms that could make these shorter and perhaps more efficient. But one of the great quiet virtues of Python is that I didn't need to reach for any special idioms to bash these together. The only trick is the regular expression split and subsequent pairing in the first program. Everything else I just wrote out as the obvious thing to do.

Neither of these programs worked quite right the first time I wrote them, but in both cases they were simple enough that I could realize my oversight by staring at things for a while (and it didn't take very long). Neither needed a big infrastructure around them, and with both I could explore their behavior and the behavior of pieces of them interactively in the Python interpreter.

(Exploring things in the Python interpreter was where I discovered that the `.split()` was going to give me an odd number of elements no matter what, so I realized that I didn't need to `.strip()` the input line. I'm sure that people who work with RE `.split()` in Python know this off the top of their head, but I'm an infrequent user at this point.)
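The odd-length behavior is easy to check. A small sketch, with sample inputs that I made up:

```python
import re

rexp = re.compile("([a-z])")

# With and without a trailing newline, plus a line with no units at all.
samples = ("13s\n", "13s", "1d19h54m13s\n", "\n")

for line in samples:
    sr = rexp.split(line)
    # Each match adds two elements (the text before it and the captured
    # letter), and there is always one final remainder, so len(sr) is odd.
    print(repr(line), len(sr), sr)
```

Without the newline, the final element is simply an empty string instead of `'\n'`, so the pairing loop sees the same number of pairs either way.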

Neither of these programs has any error handling, but neither would really be improved in practice by having to include it. They would be more technically correct, but I should never feed these bad input, and if I do, Python will give me an exception despite my skipping over the entire issue. That's another area where Python has an excellent tradeoff for quickly putting things together; I don't have to spend any time on error handling, but at the same time I can't blow my foot off (in the sense of quietly getting incorrect results) if an error does happen.
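To illustrate, here is a sketch of what some bad input does. `to_seconds()` is a hypothetical helper that just wraps the per-line body of `process()` above so it can be called directly, and the bad strings are invented.

```python
import re

rexp = re.compile("([a-z])")
timemap = {'s': 1, 'm': 60, 'h': 60*60, 'd': 24*60*60}

def to_seconds(line):
    # Hypothetical helper: the same logic as the inner part of process().
    sr = rexp.split(line)
    secs = 0
    for i in range((len(sr) - 1) // 2):
        o = i*2
        secs += int(sr[o]) * timemap[sr[o+1]]
    return secs

errors = []
for bad in ("3w2d\n", "1.5h\n", "hm\n"):
    try:
        print(to_seconds(bad))
    except (KeyError, ValueError) as e:
        # An unknown unit letter fails the timemap lookup; a number that
        # int() cannot parse (including an empty string) fails in int().
        errors.append((bad, type(e).__name__))
        print(repr(bad), "->", type(e).__name__, e)
```

An unknown unit such as `w` raises `KeyError`, and a non-integer or missing number raises `ValueError`. Not every malformed line is caught this way, though: a trailing number with no unit, as in `1d5`, ends up in the leftover final element and is quietly ignored by the pairing loop.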

(I almost started writing the Exim format to seconds conversion program in Go, except that when I looked it up, `time.ParseDuration()` turned out not to accept 'd' as a unit. I think writing it in Python was easier overall.)

### Sidebar: Code bumming and not thinking things through

I knew I would be processing a decent number of Exim delay strings in my program, so I was worried about it being too slow. This led me to write the core loop in its current, not quite obvious form with integer indexing, when it would have been more straightforward to iterate through the split list itself, pulling elements off the front until there weren't enough left. There are also alternate approaches with an explicit index.
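For comparison, here is a sketch of what the more straightforward 'pull elements off the front' version might look like. This is a reconstruction for illustration, not code from the actual program, and `to_seconds_popping()` is a made-up name.

```python
import re

rexp = re.compile("([a-z])")
timemap = {'s': 1, 'm': 60, 'h': 60*60, 'd': 24*60*60}

def to_seconds_popping(line):
    # Hypothetical variant: consume the split list from the front.
    sr = rexp.split(line)
    secs = 0
    # Stop when only the leftover final element (such as '\n') remains.
    while len(sr) >= 2:
        num = sr.pop(0)
        unit = sr.pop(0)
        secs += int(num) * timemap[unit]
    return secs

print(to_seconds_popping("1d19h54m13s\n"))  # 158053
```

Each `pop(0)` has to shift the rest of the list down, which is the sort of overhead that integer indexing avoids, although with lists this short it is unlikely to matter much.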

Looking at this now the day afterward, I see that I could have made my life simpler by remembering that `range()` takes an iteration step. That would make the core loop:

```
        for i in range(0, len(sr)-1, 2):
            secs += int(sr[i]) * timemap[sr[i+1]]
```

This would probably also perform a bit better. Such are the perils of paying too close attention to one thing and not looking around to think about whether there's a better, more direct way.