Handling lines with something-separated fields for Python

March 4, 2007

As a system administrator, I spend a bunch of my time dealing with files made up of lines of fields separated by some character. A classic example is /etc/passwd, with its colon-separated fields. These file formats are ordered lists with named fields, which should sound familiar. But they don't show up as Python lists; they show up as lines of text, and they need to be written back out as text. That we use lists to represent them is just an implementation detail.

This only takes a little bit of extra work to implement on top of our previous SetMixin class:

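class FieldLine(SetMixin, list):
    # The character that separates fields; subclasses can override it.
    separator = ":"
    def __init__(self, line):
        # Split the line of text into fields and use them as our list contents.
        n = line.split(self.separator)
        super(FieldLine, self).__init__(n)
    def __str__(self):
        # Rejoin the (possibly modified) fields into a line of text.
        return self.separator.join(self)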

class PasswdLine(FieldLine):
    fields = gen_fields('name', 'passwd',
                        'uid', 'gid', 'gecos',
                        'dir', 'shell')

(Here gen_fields is basically the dict()-ized version of enum_args from an earlier entry.)
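SetMixin and gen_fields come from earlier entries and aren't reproduced here. As a rough, self-contained sketch of how named fields can map onto list positions, here is a version with hypothetical stand-ins: NamedFieldsMixin plays the role of SetMixin (covering only plain attribute reads and writes, not slicing), and this gen_fields is a guess that simply maps each name to its index. Unlike the code above, this FieldLine also strips a trailing newline, since lines read from a file usually end with one.

```python
# Hypothetical stand-ins for the earlier SetMixin and gen_fields;
# they only illustrate the name -> list index mapping.

def gen_fields(*names):
    """Map each field name to its position, e.g. {'name': 0, ...}."""
    return dict((name, i) for i, name in enumerate(names))

class NamedFieldsMixin:
    fields = {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[self.fields[name]]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Writes to a field name go into the underlying list slot.
        if name in self.fields:
            self[self.fields[name]] = value
        else:
            super().__setattr__(name, value)

class FieldLine(NamedFieldsMixin, list):
    separator = ":"

    def __init__(self, line):
        # Drop a trailing newline so the last field stays clean.
        super().__init__(line.rstrip("\n").split(self.separator))

    def __str__(self):
        return self.separator.join(self)

class PasswdLine(FieldLine):
    fields = gen_fields('name', 'passwd', 'uid', 'gid',
                        'gecos', 'dir', 'shell')


p = PasswdLine("root:x:0:0:root:/root:/bin/bash\n")
print(p.shell)   # /bin/bash
p.shell = "/bin/sh"
print(p)         # root:x:0:0:root:/root:/bin/sh
```

Because the object is still a list, index access (p[6]) and attribute access (p.shell) always see the same value.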

Now that I've written these entries, I have a confession: this is actually where I started. I didn't first build a general 'ordered list with named fields' class and then realize it could be used to deal with /etc/passwd lines; I started out needing to deal with /etc/passwd lines, decided that I wanted read/write access to named fields, and then built downwards. I just wrote it up backwards because it looks neater that way.

(In fact this is the cleaned-up and idealized version of this class. The real one in my program does not subclass list; instead it is a normal class with a private 'field_store' list that everything manipulates directly. It also doesn't handle the slicing cases, because I didn't need to. I wrote the new version for this entry for various reasons, including that it was a good excuse to play around with subclassing built-in types.)
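For comparison, the composition approach described in that parenthetical (a plain class holding a private field_store list) might look something like the sketch below. Apart from the field_store name, everything here (the PasswdEntry class, field_index) is an illustrative assumption, not the real program's code; like the real one, it skips slicing.

```python
class PasswdEntry:
    """Hypothetical plain-class version: fields live in a private list."""
    separator = ":"
    # Field name -> position in a passwd line.
    field_index = {name: i for i, name in enumerate(
        ('name', 'passwd', 'uid', 'gid', 'gecos', 'dir', 'shell'))}

    def __init__(self, line):
        # Use object.__setattr__ so our own __setattr__ doesn't intercept it.
        object.__setattr__(self, "field_store",
                           line.rstrip("\n").split(self.separator))

    def __getattr__(self, name):
        # Called only for names not found normally, i.e. the field names.
        if name in self.field_index:
            return self.field_store[self.field_index[name]]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.field_index:
            self.field_store[self.field_index[name]] = value
        else:
            object.__setattr__(self, name, value)

    def __str__(self):
        return self.separator.join(self.field_store)


e = PasswdEntry("daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n")
e.gecos = "System daemon"
print(e)   # daemon:x:1:1:System daemon:/usr/sbin:/usr/sbin/nologin
```

The trade-off is that nothing list-like (len(), iteration, indexing) comes for free; each operation the program needs has to be written against field_store by hand.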


Comments on this page:

From 62.30.223.72 at 2007-03-10 01:52:16:

Hi Chris, there is also the csv module. It even has an example (the second one) here: http://docs.python.org/lib/csv-examples.html
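The csv documentation's passwd example uses csv.reader with ':' as the delimiter and csv.QUOTE_NONE, so that quote characters get no special treatment. A minimal version of that idea, run on made-up sample text instead of the real /etc/passwd, looks roughly like this:

```python
import csv
import io

# Made-up sample entries in /etc/passwd format.
data = ("root:x:0:0:root:/root:/bin/bash\n"
        "bin:x:1:1:bin:/bin:/sbin/nologin\n")

# delimiter=":" splits on colons; QUOTE_NONE treats '"' as an ordinary
# character, since passwd files have no quoting convention.
reader = csv.reader(io.StringIO(data), delimiter=":", quoting=csv.QUOTE_NONE)
rows = list(reader)
print(rows[1])   # ['bin', 'x', '1', '1', 'bin', '/bin', '/sbin/nologin']
```

Each row comes back as a plain list, so fields are still addressed by position.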

By cks at 2007-03-10 02:26:43:

Splitting up the passwd lines is the least interesting part; if I had just wanted that, I would just have used .split() myself. What I wanted was writable passwd fields I could refer to by name, since that was a lot clearer than having to remember and deal with field indexes.

(That I needed to update fields and write passwd entries out made the pwd module unsuitable; it is read-only, and only reads from the real passwd database anyways.)

From 62.30.223.72 at 2007-03-11 05:34:39:

Hi again Chris, I'm paddy3118, by the way (paddy3118.blogspot.com). I took the time to write my own code snippet that lets me easily refer to password field items:

>>> from UserDict import IterableUserDict
>>> class D(IterableUserDict): pass
... 
>>> passwdlines = '''Administrators:*:544:544:,S-1-5-32-544::
... Administrator:unused_by_nt/2000/xp:244:245:\
  U-PADDYS-HPLAPTOP\Administrator,S-1-5-26-26XX959-1726ZZ9-41423YY:\
  /home/Administrator:/bin/bash'''.split('\n')
>>> fieldnames = ['name', 'passwd', 'uid', 'gid', 'gecos', 'dir', 'shell']
>>> pswds = [ D(zip(fieldnames, l.split(':'))) for l in passwdlines]
>>> for d in pswds:
... 	d.__dict__.update(d.iteritems())
... 
>>> pswds[0]['name']
'Administrators'
>>> pswds[0].name
'Administrators'
>>> 

I use the trick of subclassing IterableUserDict and then adding the entries to its __dict__, so I can access data fields with either .name or ['name'].

- Paddy.
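UserDict.IterableUserDict and dict.iteritems() exist only in Python 2. A rough Python 3 translation of the same trick, using collections.UserDict (which is already iterable) and .items(), and keeping only the first sample line for brevity, might be:

```python
from collections import UserDict

class D(UserDict):
    pass

fieldnames = ['name', 'passwd', 'uid', 'gid', 'gecos', 'dir', 'shell']
passwdlines = ["Administrators:*:544:544:,S-1-5-32-544::"]

# One dict-like object per line, keyed by field name.
pswds = [D(zip(fieldnames, l.split(':'))) for l in passwdlines]
for d in pswds:
    # Copy every key/value pair into the instance's attribute namespace,
    # so d.name works as well as d['name'].
    d.__dict__.update(d.items())

print(pswds[0]['name'])   # Administrators
print(pswds[0].name)      # Administrators
```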

By cks at 2007-03-11 17:30:47:

(An administrative note: I have used magic site admin powers to split a very long <pre> line in paddy3118's example into multiple lines, so that it doesn't force the text very wide. People playing with the example should join the second line (the one starting 'Administrator:') and the two lines after it into one long line, deleting the '\' at the end of each line.)

By cks at 2007-03-12 15:42:59:

This is an interesting way of getting a dictionary that you can address as a structure (i.e., with 'obj.field'). Note that the fields are effectively read-only, because nothing forces a write to one naming of a field to also update the other.
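A small sketch of that caveat, plus one common alternative that is a swapped-in technique rather than what the commenter wrote: pointing an instance's __dict__ at the dict itself, so attribute and key access share one storage. AttrDict is a hypothetical name. One known drawback is that a key named like a dict method (items, keys, and so on) shadows that method when accessed as an attribute.

```python
# The copy-based approach: attributes are a snapshot of the keys.
class Copied(dict):
    pass

c = Copied(name="root", shell="/bin/bash")
c.__dict__.update(c.items())
c.shell = "/bin/sh"          # updates only the attribute copy
print(c["shell"])            # /bin/bash
# The key and the attribute now disagree.


# Hypothetical alternative: make the attribute namespace *be* the dict.
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self     # attributes and keys share one dict

fieldnames = ['name', 'passwd', 'uid', 'gid', 'gecos', 'dir', 'shell']
a = AttrDict(zip(fieldnames, "root:x:0:0:root:/root:/bin/bash".split(":")))
a.shell = "/bin/sh"
print(a["shell"])            # /bin/sh
a["gecos"] = "Super User"
print(a.gecos)               # Super User
```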

Written on 04 March 2007.

Last modified: Sun Mar 4 22:44:56 2007