Using content hashing to avoid the double post problem
For those who have not encountered it, the double post problem (or the double comment problem) happens when your web system is just slow enough to respond that the user clicks 'Post <whatever>' again in their browser and re-submits the same post/comment/what have you. In a straightforwardly implemented system, this results in a second copy of the comment or post appearing.
(This is of course a specific instance of a general double submission problem for all web forms.)
I worried about this problem when writing DWiki's comment system, and the way I chose to deal with it was to use a (cryptographic) hash of the comment's content as the internal name of the comment. Since the contents of repeated posts are the same, they all get the same name, and so no matter what there will only be one copy of the comment.
(DWiki's code detects the case of trying to post a comment that already exists and quietly tells people that they succeeded.)
To me, the appeal of this approach is that I get all of this for free. I have to generate some internal name for the comment; by making it a hash of the content, I get duplicate suppression without having to do anything extra.
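As a minimal sketch of the idea in Python (this is not DWiki's actual code; the in-memory `store`, the function names, and the choice of SHA-256 are all assumptions made for illustration):

```python
import hashlib

def comment_name(content: str) -> str:
    """Derive the comment's internal name from its content.

    SHA-256 is an assumed choice; any cryptographic hash would do.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def post_comment(store: dict, content: str) -> bool:
    """Store a comment under its content hash.

    Returns True if a new comment was stored and False if an identical
    one already existed. Either way the caller can tell the user that
    the post succeeded.
    """
    name = comment_name(content)
    if name in store:
        return False
    store[name] = content
    return True

store = {}
post_comment(store, "Nice article!")  # first click
post_comment(store, "Nice article!")  # impatient second click
print(len(store))  # 1
```

The duplicate check falls out of the naming scheme: there is no separate 'have I seen this before' table to maintain.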
When you take this approach, one of the important things that you need to decide is what makes a comment or a post 'the same', such that two separate submissions should hash to the same name and turn into one. Is it the contents alone, the contents plus the authorship (and if so, what elements of authorship for unauthenticated comments), or the contents plus the authorship plus the time to some resolution?
(For comments specifically, I think that this is going to depend to some extent on what sort of environment you want. Choosing to hash only comment content will have the effect of suppressing duplicate short posts such as 'me too', 'I agree', and so on, even if they're written by different people at different times.)
For DWiki, I chose to hash on the comment content plus the authorship, which includes the IP address. This will usually suppress real duplicate posts, but in theory it could fail if the comment is being submitted through something where the IP address keeps changing (such as a revolving web proxy, or a machine that changed IP addresses between two submission attempts).
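To make the tradeoff concrete, here is a hedged Python sketch of how the choice of hashed fields changes what counts as 'the same' comment. The `comment_key` function, its parameters, and the example names and addresses are hypothetical, not anything from DWiki:

```python
import hashlib

def comment_key(content, author=None, ip=None):
    """Hash whichever fields define 'the same comment'.

    Fields are joined with NUL characters so that plain concatenation
    cannot make ("ab", "c") and ("a", "bc") collide (assuming no field
    itself contains a NUL).
    """
    fields = [content, author or "", ip or ""]
    data = "\0".join(fields).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

# Content only: 'me too' from two different people collapses into one comment.
same_by_content = comment_key("me too") == comment_key("me too")

# Content plus authorship (name and IP): different people stay distinct ...
alice = comment_key("me too", "alice", "192.0.2.1")
bob = comment_key("me too", "bob", "192.0.2.2")

# ... but a resubmission after an IP address change is no longer suppressed.
alice_new_ip = comment_key("me too", "alice", "192.0.2.77")

print(same_by_content, alice == bob, alice == alice_new_ip)  # True False False
```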
Poking around the OpenSolaris codebase (for sysadmins)
If you do much work with Solaris things like DTrace and mdb -k, you
are sooner or later going to want to poke around the OpenSolaris code,
both for kernels and for utilities and so on. If you do this very much,
you are going to want your own local copy of the OpenSolaris codebase
(while you can use the OpenSolaris website, sooner or later navigating
through it will drive you mad). You can get a copy with Mercurial; see
here for instructions on
how. For spelunking purposes, there is little to no reason to get the
binary-only bits.
(Just to confuse you, OpenSolaris is called 'ON/Nevada' in much of this.)
Now, there's an important caution to this: OpenSolaris source is not
the same thing as Solaris source. You can generally use OpenSolaris
source as a guide to what you'll find with DTrace et al, but it's not
a sure thing (even if you go back in version history), and sometimes
you will find important differences. Some of these can be spotted by
looking at structure definitions in your own system's include files in
/usr/include, but not all of the interesting header files make it
there. In some cases you may have to resort to dumping structures with
Everything useful in the onnv-gate repository lives in
and I'm going to quote paths relative to this from now on.
- In general, everything has most of its code in a <whatever>/common
subdirectory (for 'code common across all architectures', I assume).
- mdb source is in cmd/mdb, and the mdb modules are mostly in the
mdb module source can be the best way to find out interesting
mdb commands and exactly what they do; this can lead to useful discoveries.
- Most interesting kernel source is in uts/common, in a relatively obvious layout. Many internal header files are in the sys/ subdirectory here; others can be found in the source area for their code, eg fs/zfs/sys for internal ZFS headers.
- ZFS commands rely on some ZFS libraries to do most of the work; see lib/libzfs. These are what you need to look at if you want to figure out the division between user and kernel space, and also what limitations are artificially imposed by zfs and what limitations are real.
In general the repository history in the onnv-gate repository is not
very useful. Sometimes you can use 'hg log -v' and so on to pick out
the specific code change that fixed a bug number that you're interested
in, and thus see how applicable to your particular circumstances it may
be.
(The other thing I've used the repo history for is to trace the code for a particular ZFS kernel feature that I wanted to use back in time to establish that I would have to use a relatively recent OpenSolaris build in order to get it.)
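If you want to automate picking changes out of 'hg log -v' output, something like the following Python sketch may help. It assumes Mercurial's default log format, where each entry starts with a 'changeset:' line; the function name, the sample log text, and the bug number in it are all made up for illustration.

```python
import re

def changesets_mentioning(log_text, bug_id):
    """Return the 'hg log -v' entries that mention bug_id as a whole word.

    Entries are split at lines starting with 'changeset:', which is how
    Mercurial's default log output begins each entry.
    """
    entries, current = [], []
    for line in log_text.splitlines():
        if line.startswith("changeset:") and current:
            entries.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current).strip())
    pattern = re.compile(r"\b" + re.escape(bug_id) + r"\b")
    return [e for e in entries if pattern.search(e)]

# Made-up sample in the shape of 'hg log -v' output.
sample = """\
changeset:   2:bbbbbbbbbbbb
user:        someone
date:        Thu Jan 01 00:00:00 1970 +0000
files:       a.c
description:
1234567 made-up bug synopsis


changeset:   1:aaaaaaaaaaaa
user:        someone
date:        Thu Jan 01 00:00:00 1970 +0000
files:       b.c
description:
7654321 another made-up synopsis
"""

for entry in changesets_mentioning(sample, "1234567"):
    print(entry.splitlines()[0])
```

In real use you would feed it the actual log text, for example the standard output of running 'hg log -v' inside your copy of the repository via `subprocess.run(["hg", "log", "-v"], capture_output=True, text=True)`.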