2012-09-09
The core difference between Debian source packages and RPMs
At least from my perspective, the two big source package formats in the Linux world are Debian's and (source) RPMs. I've worked with both (although far more with RPMs than with debs) and recently I've formed an opinion on what the core difference between them is and what each is better (or best) at.
The Debian source format is optimized for the case where the 'upstream' developer is also effectively the Debian packager (in what Debian calls a 'native' package). The Debian control files live in the general distribution tarball and you can build Debian packages right from the development tree with no fuss and bother. You don't need to have any extra bureaucracy or keep things outside the source tree.
The RPM source format is optimized for packaging (and changing) other people's packages. Everything lives outside the source tree (indeed in a completely separate area) and from the start all modifications were supposed to be made as a sequence of patches. In theory RPM has support for 'native' packages, packages with a spec file integrated into their source tarball, but I don't think many people really use this and it's certainly not the natural way to work with RPM packages.
Even though RPM has some 'native' support, it has no way to build a package from an unpacked source tree the way that Debian does. By contrast, building from an unpacked tree is the fundamental operation in Debian packaging. If you're developing your program and want to repeatedly build the package, the Debian approach is much more convenient. The reverse is true (in my opinion) if you're packaging and possibly modifying an upstream package; there the RPM approach is cleaner and easier to work with, as I've sort of grumbled about before.
This doesn't quite turn arguments about which source format is better into arguments about editors, but in my opinion it does move the question one step further away. The right question is not which format is better but which situation is more common.
(In my biased opinion I believe that the answer is 'packaging other people's programs' and in fact it's proven to be a mistake to have the upstream developer try to also package the program, but the latter is a topic for another entry.)
When I've interned Python strings
One day, I read the following on reddit's r/python:
Is manually interning a string ever a good idea? I'm having trouble thinking of a use case where the cost is justified outside the compilation cycle.
and a reply:
I've yet to run into a situation where it was, but I guess it could be in a very restricted number of situations, such as a small set of non-literal strings generated over and over again (e.g. some sort of parser), interning could reduce memory pressure.
This is exactly the case that I ran into at one point, with one of my Python daemons which was being used in a fairly demanding situation where I wanted to minimize the memory usage and memory churn over time. My program had three features that made interning worthwhile: it had big files to parse, there was a lot of repeated text in the files (text that had to be retained), and the files were changed a bit and reloaded relatively frequently.
Interning repeated text is an obvious win for memory usage if you have a decent amount of it (measuring is how you find out). In my situation it also helped avoid memory churn during reloads of the files. When you reload a configuration file, the usual case is that almost all of the text is the same as it was the last time; re-parsing the file gives the same results as last time for most of it, which creates a lot of duplicated strings. Interning strings here ensures that you do not create a boatload of new strings every time you reload the configuration file (and discard a boatload of old ones); instead you're likely to create only a few new ones and discard a few old ones.
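Here is a minimal sketch of the reload pattern, assuming a simple hypothetical 'key = value' configuration format (the actual file format and parser of my daemon aren't described here). In Python 3 the function is `sys.intern()`; in Python 2 it was the builtin `intern()`. Every string the parser keeps goes through `sys.intern()`, so when a reloaded file repeats old text, the new parse gets back the existing string objects instead of holding on to fresh copies.

```python
import sys


def parse_config(text):
    """Parse hypothetical 'key = value' lines into a dict.

    Every key and value passes through sys.intern(), so text identical
    to an already-interned string comes back as that existing object.
    The temporary strings made by partition() and strip() are dropped
    almost immediately; only genuinely new text becomes a new
    long-lived string.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        config[sys.intern(key.strip())] = sys.intern(value.strip())
    return config


# Simulate a reload where only one value changed.
old = parse_config("host = web01\nport = 8080\n")
new = parse_config("host = web01\nport = 8081\n")
print(old["host"] is new["host"])  # True: one shared string object
```

Note that the old configuration is still referenced while the new one is parsed, which is the normal state of affairs during a reload. The Python documentation warns that interned strings are not immortal and that you must keep a reference to the result of interning to benefit from it.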
Of course all of this care and interning may be a micro-optimization that doesn't make any difference in your actual circumstance. Interning strings is a performance optimization, so like any other optimization you should measure it to see if it gives you any benefits.
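As one possible way to do that measurement, the standard library's `tracemalloc` module (Python 3.4 and later) can report how much memory is still allocated after parsing. The workload below is invented purely for illustration: 100,000 lines drawn from only ten distinct texts, which is a deliberately favorable case for interning.

```python
import sys
import tracemalloc


def parse_lines(lines, use_intern):
    """Strip each line and keep the result, optionally interning it."""
    kept = []
    for line in lines:
        # The input has surrounding spaces, so strip() returns a new object.
        text = line.strip()
        kept.append(sys.intern(text) if use_intern else text)
    return kept


def measure(use_intern):
    """Return (bytes still allocated after parsing, parsed result)."""
    # Hypothetical input with heavy repetition, like a re-read config file.
    lines = ["  entry number %d  " % (i % 10) for i in range(100000)]
    tracemalloc.start()
    result = parse_lines(lines, use_intern)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, result


plain_bytes, plain = measure(False)
interned_bytes, interned = measure(True)
print("without intern:", plain_bytes, "bytes")
print("with intern:   ", interned_bytes, "bytes")
```

On input with little repetition, interning buys little or nothing (every string still has to be created before it can be interned, and the interning table has its own overhead), which is exactly why it's worth measuring on your real data.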
(For my program in our specific situation, it was one of a number of things that did make a visible difference.)