


Tom White's Blog

Programming Archives


More Literate Programming: Language-Level Anaphora

Posted by tomwhite on June 29, 2006 at 01:21 PM | Permalink | Comments (5)

Last month I blogged about Literate Programming with jMock, and also about using anaphora to avoid repetition in the tests. (An anaphor is a word like "it" that refers to something previously referred to.)

This got me thinking: is it possible to use anaphora more widely at the language level? Would such constructs be useful? Before trying to do this in Java I looked at more dynamic languages, starting with a very quick look at Lisp, where I first came across anaphora in programming languages.

Continue Reading...



A Faster Java Regex Package

Posted by tomwhite on March 27, 2006 at 12:02 PM | Permalink | Comments (4)

Anders Møller's dk.brics.automaton is a Java regex package whose main claim to fame is that it is significantly faster than all other Java regex libraries, including the java.util.regex classes in the JDK. Like many things in computer science, the speed gains come at a price. In this case, the regular expression language supported is not as rich as the Perl 5 syntax that is prevalent in today's tools, including the JDK implementation (which has only minor variations from Perl 5). That said, for some applications this trade-off may be worth it. (For example, it is going to be used in Nutch - see discussion on this thread.) In the rest of this post I shall try to explore this trade-off a bit more.

Continue Reading...



Modularize Early

Posted by tomwhite on May 30, 2005 at 03:57 PM | Permalink | Comments (2)

There is an old saying that mathematicians only know three numbers: 0, 1 and ∞ (infinity). There is some truth in this in computing too, as dealing with a single entity can be very different to dealing with a multiplicity of that entity. In JDK 1.5 speak: Am I using a Thing or a Collection<Thing>?

It is possible to build a system using single entities at each level of the system hierarchy. Imagine a system running on a single virtual machine composed of a single deployment unit containing a single package with a single class that has a single method which is a single line of code. It is possible to do this, but no one takes it to this extreme. (Although the thinlet Java GUI toolkit comes close, being a single class of 6000-plus lines, an example of the blob anti-pattern.) The usual guidance for getting the right system decomposition is to follow the Unix philosophy of "Do one thing, do it well" at all levels. Following this advice, you split things when they try to do too much. This is easy enough to understand and apply at the level of individual lines of code, but harder to realize as we move up the levels of the hierarchy.

What often happens is that at the higher levels we just have single instances, which is fine when the system starts out, but over time it grows into an unwieldy monolith. The problem is that by the time you recognise that it really is time to split up the system, it is actually quite difficult to do so since it is so complicated! It would be easier to modularise early. Knowing that your system can function with (say) two EARs tells you how you can expand it to run from three in the future. But imagine you had just a single large EAR. When you came to split it, as well as solving the usual problem of where to split the code, you would have to work out how to split it - that is, how to make the system run from two deployment units as opposed to one. The combination of these problems can be enough to make any reasonable programmer balk, and carry on bloating the single EAR... I know this has happened to me.

One of the striking things about Jini is the way it encourages you to embrace multiplicity at the high end of the hierarchy from the outset. This is for a simple reason: a Jini system is a collection of orchestrated services running in many VMs. Perhaps we can learn some lessons from Jini about building more modular systems. This is not a new thought: the pioneering computer scientist Grace Hopper made a similar point in the mid-twentieth century:

In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.


groovy -pi -e

Posted by tomwhite on April 18, 2005 at 12:55 PM | Permalink | Comments (1)

Perl is famous for its one-liners. By using the -e command line switch you can execute the script supplied as an argument. But the real power comes when you use -p (to process each line of a supplied file), and -i (to modify the file in place). The classic example is to perform search and replace on a bunch of files. The following will replace all occurrences of "curious george" with "the gruffalo":

perl -pi -e 's/curious george/the gruffalo/g' favourites.html

I was happy to learn recently that you can do the same with Groovy. Groovy supports the same set of command line arguments, but the script is obviously more Java-ish (the =~ operator is a Groovy regex operator):

groovy -pi -e "(line =~ 'curious george').replaceAll('the gruffalo')" favourites.html

OK, it's not quite as terse, but it's still a one-liner. (Actually, there is a bug in the latest version of Groovy, JSR 1, which prevents the -i switch from functioning correctly. Hopefully it will be fixed soon. As a workaround you can call groovy -pi.bak -e ... which has the same effect and in addition backs up the original file.)
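For comparison, here is roughly what the Groovy expression does to each line, written out in plain Java. Groovy's =~ operator produces a java.util.regex.Matcher, so the one-liner boils down to a call to Matcher.replaceAll. This is only a sketch: the class and method names (ReplaceLine, replace) are made up for illustration, and the file reading and writing that -p and -i provide is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Groovy or the JDK.
class ReplaceLine {
    // Compile the pattern once and reuse it for every line.
    private static final Pattern CURIOUS = Pattern.compile("curious george");

    // Plain-Java counterpart of (line =~ 'curious george').replaceAll('the gruffalo')
    static String replace(String line) {
        Matcher matcher = CURIOUS.matcher(line);
        return matcher.replaceAll("the gruffalo");
    }

    public static void main(String[] args) {
        System.out.println(replace("My favourite is curious george"));
    }
}
```

Seen this way, the Groovy version is not so verbose after all: the switches take care of the loop over lines and the file handling.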

More complicated examples provide a great way to learn more about Groovy features. This awk-like replacement prints the first and penultimate whitespace-separated columns from a text file, and shows off Groovy's Python-inspired slicing ability.

groovy -pe "line.split('\\s')[0, -2]" *
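To see what the [0, -2] subscript is doing, here is a plain Java sketch of the same per-line operation. Java has no negative indices, so the penultimate column has to be computed from the array length. The names (Columns, firstAndPenultimate) are hypothetical, and the sketch assumes each line has at least two columns.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class Columns {
    // Plain-Java counterpart of line.split('\\s')[0, -2]
    // Assumes the line splits into at least two columns.
    static List<String> firstAndPenultimate(String line) {
        String[] cols = line.split("\\s");
        // Index -2 in Groovy means "second from the end".
        return Arrays.asList(cols[0], cols[cols.length - 2]);
    }

    public static void main(String[] args) {
        System.out.println(firstAndPenultimate("alpha beta gamma delta"));
    }
}
```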

And this will find duplicate words:

groovy -ne "if (line =~ '\\b(\\w+)\\b\\s+\\b\\1\\b') println line" *
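The same pattern works unchanged with java.util.regex, since that is what Groovy uses underneath. A minimal Java sketch of the test applied to each line follows; the class and method names (DuplicateWords, hasDuplicate) are invented for this example. Using a Matcher as a condition in Groovy corresponds to calling find() in Java.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class DuplicateWords {
    // A whole word, some whitespace, then the same whole word again (\1).
    private static final Pattern DUPLICATE =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b");

    // Plain-Java counterpart of: if (line =~ '...') println line
    static boolean hasDuplicate(String line) {
        return DUPLICATE.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "this is is a test";
        if (hasDuplicate(line)) {
            System.out.println(line);
        }
    }
}
```

The word boundaries matter: without the leading \b, a line such as "this is fine" would be reported, because the "is" at the end of "this" is followed by another "is".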

Do you have any Groovy one-liners?




