Wandering Thoughts archives

2010-03-23

No DSL is 'human-readable' as such

One of the popular advantages attributed to using domain-specific languages is that they're 'human-readable'. This label is highly misleading, because there is no such thing as a human-readable computer language of any sort, DSL or otherwise.

Despite decades of AI research, we cannot instruct computers in natural language; it is too slippery and imprecise. As a result, all languages used to specify things to computers have specific semantics and are not simply 'plain English'. Even if you try hard to use English words and English phrases, what you wind up with is something like legal language (and for much the same reasons); the words may be English, but everything has a specific technical meaning and you cannot truly understand what programs actually do or write them until you understand those technical meanings.

(One of the heights of these attempts to be as English as possible is COBOL, and that didn't work; you still can't completely understand or write COBOL programs without knowing COBOL, and the process created an ugly monstrosity.)

DSLs are not magically exempt from this because they are small and simple (if they are). Just like other computer languages, you cannot really understand what any DSL 'program' does without learning the DSL itself, which is to say learning the specific technical meaning and semantics that the DSL attaches to words and phrases (and punctuation and so on). To the extent that a DSL looks 'human readable' without this learning, it's fooling you; you don't truly understand what things mean, and if you believe otherwise you can go horribly wrong.

(Sometimes you get away with it, just as you can sometimes hack away in a full programming language that is vaguely similar to one that you already know. But that's not something to rely on; for real production work, as opposed to quick test hacks, you're going to need to learn the DSL.)
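A toy example makes this concrete. The little 'cleanup language' below is entirely hypothetical (it's not any real DSL), but it shows the pattern: the rule reads like plain English, yet what it actually does depends on technical meanings ('older than' is strictly greater-than, a 'day' is a 24-hour period) that you can only get by learning the DSL, not by reading the English:

```python
# Toy "English-like" cleanup DSL (hypothetical, for illustration only).
# 'delete files older than 30 days' looks like plain English, but the
# DSL attaches exact technical meanings to its words.

def run_rule(rule, files):
    """Run one rule against files, a dict of name -> age in hours.

    Returns the names the rule deletes. In this DSL, 'older than N days'
    means strictly more than N*24 hours -- not 'N calendar days or more',
    which is what a casual English reading might suggest.
    """
    words = rule.split()
    assert words[:4] == ["delete", "files", "older", "than"]
    threshold_hours = int(words[4]) * 24   # a 'day' is a 24-hour period
    return [name for name, age in files.items() if age > threshold_hours]

files = {"a.log": 30 * 24, "b.log": 30 * 24 + 1, "c.log": 12}
# A naive reading might expect a.log (exactly 30 days old) to go too;
# the DSL's strict 'older than' keeps it.
print(run_rule("delete files older than 30 days", files))  # ['b.log']
```

The point is not this particular semantic choice; real DSLs make such choices constantly, and until you've learned them you only think you understand what a rule does.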

I believe that there are real advantages to DSLs, including being easier to read. But this is not the same thing as being 'human readable', and pretending it is distorts discussions of DSLs and their advantages (and drawbacks). Specifically, using a DSL instead of some alternative doesn't mean that people magically don't have to learn the DSL because they already understand what it does; they don't, not really.

(I alluded to this in passing in an earlier entry.)

DSLIsNotHumanReadable written at 00:03:01

2010-03-21

The power of 'I like this' in social applications

One of the things that turns plain applications into social applications is the ability to get feedback from other people, the ability to know that other people are appreciating your stuff. I think it follows as a corollary that if you want to build a popular social application, getting the feedback right matters a lot; the more feedback people get, the more likely they are to continue using your social application.

To be specific for a bit, let's talk about Flickr (partly because I have experience with it as a user). The leading feedback mechanism on Flickr is leaving comments on other people's photographs. This has the drawback that writing a comment requires you to either have something to say about the picture or be willing to leave a meaningless noise comment; otherwise, you won't do anything, reducing the level of feedback and thus the 'stickiness' of Flickr as a whole.

(You can in theory mark a photograph as a 'favorite', but this is a bad feedback mechanism for various reasons, including that people like far more pictures than they would count as favorites.)

Experiencing this effect directly has led me to a thesis: in a social application, it's useful to give people a lightweight, socially acceptable way of saying 'I liked this', 'thumbs up', or the equivalent, without forcing them to find something to actually say (or to clutter up actual comments with 'me too'). People are not necessarily articulate about things, they don't necessarily want to write, and forcing them to do so in order to create feedback lessens the amount of feedback that they leave. Reducing the amount of work and effort that it takes to create feedback means that you'll get more of it, with the accompanying good effects.

(In support of this thesis, I note that a number of recent social applications have an explicit 'I like this' feedback option, such as Facebook and Tumblr. Facebook even illustrates how you can condense such feedback so that it doesn't take up as much space as comments.)

PowerOfLike written at 02:24:04

2010-03-12

End results versus what's inside the black box

One of the divisions in technology is between people who mostly care about the end results and people who care (sometimes very passionately) about what is inside the black boxes that they use. The former sort say things like 'the Pentium is the best-performing CPU right now'; the latter say things like 'the SPARC architecture is far more elegant than the ugly hacks of the x86'.

(This division is not exclusive to computer hardware, but computer hardware and especially CPU architecture is a common hotbed of people who care a lot about it.)

I used to think that I was more the latter sort of person than the former, but either that changed over time or I was lying to myself. These days, it's pretty clear I'm much more someone who cares about the ends than someone who cares about what's inside the box. I certainly don't make my technology decisions (even for my personal machines) based on the elegance of the hardware; by now, I care far more about how well it runs things that I care about.

(For example, the x86 architecture is a horrible mess but you know what, I don't care. The compiler worries about the ugliness and the limited register set, and Intel and AMD have consistently delivered the affordable performance that all of the RISC vendors failed to manage. I would be happier if it had been the other way around, but I don't feel very strongly about it any more.)

This obviously strongly influences my attitudes on things like Unix workstation mythology. Because I care more about end results these days, I'm not much taken with arguments that old Unix workstation hardware, old RISC chips, and so on were intrinsically superior to today's PC hardware because they were more elegant and less of a horrible kludge; since what I care most about is how well the resulting machine runs my Unix environment, I prefer today's PCs, warts and all. I know that there are people who don't hold this view and who feel strongly enough about it to make different choices, but in many ways we're on different sides of a fairly large gulf, one that there's very little point in arguing over.

(Instead I argue that the Unix workstations were less elegant than people remember and had their own share of warts and kludges.)

It's worth noting that I am not an absolutist on this. After all, I'm using slower PC hardware because it's what my operating system supports with open source drivers, and not using various attractive programs because they're not open source or they're just ugly inside, and so on, so clearly I still care about the details to some degree. Sysadmins are somewhat biased in this anyways, because for us the end results include things like 'can we support this and troubleshoot it or is it going to cause us heartburn at 3am', and these practically require us to peek inside the black boxes and care about the contents to some degree.

EndsVsDetails written at 01:27:47

2010-03-06

Pushing code changes upstream is hard work

This is a followup to SupportingVsForking, where I talked about the difference between supporting some open source code and forking it being whether you could get your changes accepted upstream. One thing that is not widely understood is that getting bugfix changes accepted upstream is hard work at the best of times, with a cooperative upstream.

(We can see this from how many private changes to the Linux kernel each Linux distribution maintains.)

The problems are many. Often there is a conflict between the expedient way for you to fix a problem now and the 'right' way to fix the problem, which the upstream is going to argue for and which may require quite a lot of development (and arguing with developers); sometimes the upstream won't even know what the right way is, but they'll know that your way is the wrong way. In some cases, you and the upstream may disagree about whether there is a bug and (if there is a bug) where it exists and what exactly it is. Sometimes the upstream may accept that there is a bug but feel that fixing it is too disruptive at the current time.

(And all of this assumes that your proposed change is good code. Sometimes it isn't; the most common case in Linux is new hardware drivers, which often contain code that varies from the merely bad to the outright wretched. A distribution often needs to support the new stuff soon and didn't write the drivers; the upstream needs to have code that can be maintained over the long term by people other than its original authors.)

All of these translate to 'thanks but no thanks' for your bug fix or change, which means that you get to maintain more code for a while.

It's worth noting that accepting downstream changes is work for the upstream too. Many of these problems require an investment of time from upstream developers to read code, debate approaches, investigate problems, and so on, and the time of upstream developers is in limited supply. Plus, things like code reviews and arguing with people about whether something is the right approach or is actually a bug are not very rewarding or fun activities, which makes it harder to persuade developers to do them.

(All of this goes even more so if you are adding features or removing limitations instead of fixing bugs, because those raise much larger questions of whether they should be done at all and if your approach is the right approach.)

UpstreamingChangesIsHard written at 00:49:59

2010-03-03

All syndication formats use XML

I am pretty sure that this isn't the first time that I've seen people be grumpy about Atom because it's an XML-based format. Unfortunately, I have bad news for such people; to put it one way, it's XML all the way down.

More directly, all syndication feed formats, RSS's many variants as well as Atom, are XML-based (including the versions of RSS that are based on RDF, since the RDF used is XML-based). This is not just at a light structural level in RSS's case; you can routinely find RSS feeds that have <![CDATA[...]]> sections and other significant XML-isms that cannot just be treated as text (or HTML) inside elements that you strip off with a regexp.
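A minimal sketch of why this matters, using Python's standard library (the feed snippet is made up for the example): a real XML parser transparently unwraps a CDATA section into the element's text, while the naive 'strip the tags with a regexp' approach leaves the raw CDATA markers behind.

```python
# CDATA in an RSS item cannot be handled by regexp tag-stripping;
# you need an actual XML parser to recover the real content.
import re
import xml.etree.ElementTree as ET

item = """<item>
  <title>Example</title>
  <description><![CDATA[Some <b>HTML</b> & text]]></description>
</item>"""

# XML parser: CDATA is unwrapped into the element's text.
desc = ET.fromstring(item).find("description").text
print(desc)   # Some <b>HTML</b> & text

# Naive regexp approach: the CDATA wrapper survives, mangling the content.
naive = re.sub(r"<description>(.*)</description>", r"\1",
               item.splitlines()[2].strip())
print(naive)  # <![CDATA[Some <b>HTML</b> & text]]>
```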

Equally, no syndication format is really XML in real life, in that attempting to parse any format with a strict XML parser will not infrequently give you errors (cf this comic). This is not even considering using a validating parser that actually checks the relevant syndication format specification (you can see how your favorite feeds would score at feedvalidator.org). In practice you can produce any syndication feed format with string bashing and have it consumed, despite errors, by most feed readers.
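You can see the gap between strict XML parsing and what feed readers tolerate with a few lines of Python; the bare '&' below is a classic error in string-bashed feeds (the feed itself is invented for the example):

```python
# A hand-built feed with an unescaped '&' is not well-formed XML;
# a strict XML parser rejects it outright, where most feed readers
# would shrug and carry on.
import xml.etree.ElementTree as ET

bad_feed = """<rss version="2.0"><channel>
  <title>Tom & Jerry's blog</title>
</channel></rss>"""

try:
    ET.fromstring(bad_feed)
    parsed = True
except ET.ParseError as err:
    parsed = False
    print("strict parser rejects it:", err)
```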

(Actually, I don't know for sure that Google Reader accepts invalid syndication feeds. I'd expect it to, but one can never be sure; online aggregators have been surprisingly picky in the past.)

My overall opinion of the relative merits of Atom and RSS remains unchanged. However, there's little reason to switch if RSS meets your needs and doesn't cause problems; feed readers, aggregators, and so on are going to support both for the indefinite future.

RSSisXML written at 01:49:55


