In practice, anything involving the JVM is often a heavyweight thing

October 31, 2015

Last week I asked on Twitter if anyone had a good replacement for swish-e for indexing and searching some HTML pages. Several people suggested Apache Solr; my immediate reaction was that this sounded too heavyweight for what we wanted. It was then asserted that Solr is not that heavy if you disable enough things. I had a number of reactions to that, but my instant one was 'nothing involving the JVM is lightweight'. Today I want to talk about that.

I call JVM-based things 'heavyweight' not because Java itself can easily eat up lots of memory (although that's certainly a potential issue). What makes the JVM heavy for us is that we don't already run any JVM-based services and that all too often, Java is not like other languages. With languages like Python, Perl, Ruby, or even PHP, as a sysadmin you can generally be pretty indifferent to the language the system is written in. You install the system (ideally through a package manager), you get some binaries and maybe some crontab jobs, and you run the binaries. You're done. With Java, my impression and to some extent my experience is that you also have to administer and manage a JVM. A Java system is not 'run some programs and forget'; it's putting .jars in the right place, it's loading certificates into JVM trust stores, it's configuring JVM parameters, and so on and so forth. There is a whole level of extra things to learn and things to do that you take on in order to maintain the JVM environment for the system you actually want to run.

(One way to put it is that a JVM seems to often be a system inside your normal system. You get to maintain your normal system and you also get to learn how to maintain the JVM system as well.)

All of this makes any JVM-based system a heavyweight one, because adopting it means not just learning the system but also learning how to manage a probably-complex JVM environment. If we were already running JVM-based things it would be a different issue, of course, because we'd probably already have this expertise (and the JVM way might even work better for us), but as it stands we don't.

(Similar issues probably hold for any Node-based system, partly because of Node itself and partly because Node has its own very popular package management system that we'd probably have to learn and wrangle in order to run any Node-based thing.)

It's probably possible to design JVM-using systems that are not 'JVM-based' in this way and that encapsulate all of the complexity inside themselves. But I suspect that something labeled on its website as 'enterprise' has not been designed to operate this way.

(I've mostly talked about the JVM instead of Java specifically because I suspect most of these issues also apply to any other JVM-based language, such as Scala, Clojure, and so on.)


Comments on this page:

By Ewen McNeill at 2015-10-31 03:05:31:

While I agree that Java-based applications tend to be relatively heavyweight -- due to mostly being used for "enterprise" software -- I suspect the difference between the "overhead" of JVM/Java and some other "runs in a Virtual Machine" languages is smaller than you think. Modern JVMs, running not-gigantic programs on not-so-constrained systems, typically need much less tuning than used to be required to fit any VM onto a system, say, 20 years ago.

For a sysadmin, the things that seem to me to matter in language choice are (a) whether the language tools/libraries are well packaged for the OS (Perl, Python, etc. typically yes; Ruby, Java, Node, etc. much less so), (b) whether it needs a non-trivial amount of extra feeding and caring, and (c) whether that effort is well documented. Those are all genuine concerns. But I'm not sure if I'd call something "heavyweight" just because it wasn't familiar, because depending on what one has used before, anything could be extra work to get familiar with. (Given a choice I'll certainly go for a tool that is in a language where the whole ecosystem is familiar though; anything else is making more work.)

FWIW Apache Solr does ship with everything included in the archive to run it, except a JVM. And IIRC it runs okay with a JVM installed from OS packages. That combination isn't ideal for scaling up, but it works for small use cases. From memory the only pain I had when trialing it for a project was needing to write something to run its start script on boot.

Ewen

PS: Glimpse comes to mind for your original problem. Written in C :-). No idea what the web integration of it looks like after all these years.

By Anon at 2015-10-31 12:55:04:

How about http://xapian.org ?

Written on 31 October 2015.


Last modified: Sat Oct 31 01:05:43 2015