Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.
2009-11-30

Using content hashing to avoid the double post problem

For those who have not encountered it, the double post problem (or the double comment problem) happens when your web system is just slow enough to respond that the user clicks 'Post <whatever>' again in their browser and re-submits the same post/comment/what have you. In a straightforwardly implemented system, this results in a second copy of the comment or post appearing. (This is of course a specific instance of a general double submission problem for all web forms.)

I worried about this problem when writing DWiki's comment system, and the way I chose to deal with it was to use a (cryptographic) hash of the comment's content as the internal name of the comment. Since the contents of repeated posts are the same, they will all have the same name and so no matter what, there would only be one copy of the comment. (DWiki's code detects the case of trying to post a comment that already exists and quietly tells people that they succeeded.)

To me, the appeal of this approach is that I get all of this for free. I have to generate some internal name for the comment; by making it a hash of the content, I get duplicate suppression without having to do anything extra.

When you take this approach, one of the important things that you need to decide is what makes a comment or a post 'the same', such that two separate submissions should hash to the same name and turn into one. Is it the contents alone, the contents plus the authorship (and if so, what elements of authorship for unauthenticated comments), or the contents plus the authorship plus the time to some resolution?

(For comments specifically, I think that this is going to depend to some extent on what sort of environment you want. Choosing to hash only comment content will have the effect of suppressing duplicate short posts such as 'me too', 'I agree', and so on, even if they're written by different people at different times.)
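To make the choices above concrete, here is a minimal sketch of how the decision about what counts as 'the same' might show up in code. This is not DWiki's actual code: the function `comment_name`, its parameters, and the one-hour default resolution are all made up for illustration, and SHA-256 simply stands in for whatever cryptographic hash you prefer.

```python
import hashlib


def comment_name(content, author=None, timestamp=None, resolution=3600):
    """Return a hex name for a comment (hypothetical helper, not DWiki's API).

    Only the fields that are passed in take part in the hash, so the
    caller decides what makes two submissions 'the same'.
    """
    parts = [("content", content)]
    if author is not None:
        parts.append(("author", author))
    if timestamp is not None:
        # Reduce the time to a bucket number, so that two submissions
        # in the same bucket hash identically.
        parts.append(("time", str(int(timestamp) // resolution)))

    h = hashlib.sha256()
    for label, value in parts:
        data = value.encode("utf-8")
        # Label and length-prefix each field so that different field
        # combinations cannot run together into the same byte stream.
        h.update(label.encode("ascii") + b":%d:" % len(data))
        h.update(data)
    return h.hexdigest()
```

Calling `comment_name("me too")` hashes the content alone, so every 'me too' collapses into one comment; adding `author=` keeps different people's identical comments apart; adding `timestamp=` also lets the same person repeat themselves once the time bucket changes. One caveat with a bucketed time is that a double post that happens to straddle a bucket boundary will not be suppressed.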
For DWiki, I chose to hash on the comment content plus the authorship, which includes the IP address. This will usually suppress real duplicate posts but in theory could fail if the comment is being submitted through something where the IP address keeps changing (such as a revolving web proxy, or from a machine that changed IP addresses between two submission attempts).
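As a rough illustration of the overall scheme, the sketch below stores comments in a dictionary keyed by a hash of the content plus the authorship, and treats a repeated submission as a quiet success. It is an assumption-laden toy rather than DWiki's real implementation: the `CommentStore` class, its `post` method, the in-memory dictionary, and the use of a user name plus IP address as 'authorship' are all hypothetical.

```python
import hashlib


class CommentStore:
    """Toy in-memory comment store (hypothetical; not DWiki's code)."""

    def __init__(self):
        self.comments = {}

    @staticmethod
    def _name(content, user, ip):
        # Hash the content plus the authorship (user name and IP address).
        h = hashlib.sha256()
        for field in (content, user, ip):
            data = field.encode("utf-8")
            # Length-prefix each field so adjacent fields cannot blur together.
            h.update(b"%d:" % len(data))
            h.update(data)
        return h.hexdigest()

    def post(self, content, user, ip):
        """Store a comment and return (name, created).

        A resubmission gets the same name back with created=False, so
        the caller can still tell the user that the post succeeded.
        """
        name = self._name(content, user, ip)
        if name in self.comments:
            return name, False
        self.comments[name] = {"content": content, "user": user, "ip": ip}
        return name, True
```

Posting `("Nice article", "alice", "192.0.2.1")` twice leaves a single stored comment, while the same text and user arriving from `192.0.2.77` produces a second one, which is the changing-IP failure case described above.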
Poking around the OpenSolaris codebase (for sysadmins)

If you do much work with Solaris things like DTrace and

(Just to confuse you, OpenSolaris is called 'ON/Nevada' in much of this.)

Now, there's an important caution to this: OpenSolaris source is not
the same thing as Solaris source. You can generally use OpenSolaris
source as a guide to what you'll find with DTrace et al, but it's not
a sure thing (even if you go back in version history), and sometimes
you will find important differences. Some of these can be spotted by
looking at structure definitions in your own system's include files in
Everything useful in the onnv-gate repository lives in
In general the repository history in the onnv-gate repository is not
very useful. Sometimes you can use '

(The other thing I've used the repo history for is to trace the code for a particular ZFS kernel feature that I wanted to use back in time to establish that I would have to use a relatively recent OpenSolaris build in order to get it.)
These are my WanderingThoughts. This is part of CSpace, and is written by ChrisSiebenmann.