A thought on reading multiline records

I've recently been writing some small programs to digest multiline records which don't have an end of record marker, just a start of record one (in my case, the output of Solaris …).

In awk this structure works decently, although it can get unclear and your 'start of line' code can get quite big. It's more problematic in something like Python, because it cuts against the natural subroutine structure of the problem. The obvious structure is a subroutine that processes a record, but if you do this you wind up passing it the first line of its record and having it return the first line of the next one.

When I was writing code to do this, it struck me that the way out is to have a specialized file reader that returns a special 'end of record' marker as well as an 'end of file' one. This lets your 'process a record' subroutine just read and process lines until it gets an end of record result. (Internally, the specialized reader has to store the first line of the new record and return it the next time it's called.)

There's more overall code in the version of my program that uses the specialized reader approach, but it's clearer code, so I like it better.

Sidebar: simple record reader code

Here is the code I wound up using for this:
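import re

# Sentinel that RecordReader.readline() returns at the end of each record.
EOR = object()

class RecordReader(object):
    def __init__(self, fo, sre):
        self.pending = None         # first line of the next record, if held
        self.mre = re.compile(sre)  # matches start-of-record lines
        self.fo = fo
        self.eof = False
        self.first = True           # suppresses an EOR before the first record

    def readline(self):
        # Hand back the start-of-record line we held on to last time.
        if self.pending is not None:
            pl = self.pending
            self.pending = None
            return pl
        line = self.fo.readline()
        if not line:
            # End of file: note it and return the empty string.
            self.eof = True
            return line
        if self.mre.match(line) and not self.first:
            # A new record starts here: save its line and signal end of record.
            self.pending = line
            return EOR
        else:
            self.first = False
            return line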
It takes a file object and an (uncompiled) regular expression that matches the start-of-record lines. As I found out the hard way, you need the …

Given this, we can write the obvious function to read an entire record:
def readrecord(reader):
    # Collect lines until we hit end of record (EOR) or end of file ('').
    lns = []
    while True:
        line = reader.readline()
        if not line or line is EOR:
            break
        lns.append(line)
    return lns
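The caller that drives these isn't shown here, so the following is only a minimal sketch of one way to use them. It assumes the caller checks `reader.eof` to tell end of file apart from end of record, since `readrecord()` stops at both. The `readall()` helper, the sample input, and the `Record ` pattern are all hypothetical; the reader is repeated in condensed form so the sketch runs on its own.

```python
import io
import re

EOR = object()  # end-of-record sentinel

class RecordReader(object):
    """Condensed copy of the reader above, so this sketch is self-contained."""
    def __init__(self, fo, sre):
        self.pending = None
        self.mre = re.compile(sre)
        self.fo = fo
        self.eof = False
        self.first = True

    def readline(self):
        if self.pending is not None:
            pl, self.pending = self.pending, None
            return pl
        line = self.fo.readline()
        if not line:
            self.eof = True
            return line
        if self.mre.match(line) and not self.first:
            self.pending = line
            return EOR
        self.first = False
        return line

def readrecord(reader):
    lns = []
    while True:
        line = reader.readline()
        if not line or line is EOR:
            break
        lns.append(line)
    return lns

def readall(fo, sre):
    """Hypothetical helper: return every record in fo as a list of lines."""
    reader = RecordReader(fo, sre)
    records = []
    # readrecord() stops at both EOR and EOF; reader.eof tells them apart.
    while not reader.eof:
        rec = readrecord(reader)
        if rec:  # an empty file gives one empty result; skip it
            records.append(rec)
    return records

# Hypothetical sample input: two records, each starting with "Record ".
SAMPLE = "Record 1\n  a\n  b\nRecord 2\n  c\n"
for rec in readall(io.StringIO(SAMPLE), r"Record "):
    print(rec)
```

Each printed record is a list of its lines, starting with its 'Record ' line. The per-record processing code never has to see or hand back the first line of the next record; the reader holds it in `pending`.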
The …
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.