


Tom White's Blog

March 2006 Archives


A Faster Java Regex Package

Posted by tomwhite on March 27, 2006 at 12:02 PM | Permalink | Comments (4)

Anders Møller's dk.brics.automaton is a Java regex package whose main claim to fame is that it is significantly faster than all other Java regex libraries, including the java.util.regex classes in the JDK. Like many things in computer science, the speed gains come at a price. In this case, the regular expression language supported is not as rich as the Perl 5 syntax that is prevalent in today's tools, including the JDK implementation (which has only minor variations from Perl 5). That said, for some applications this trade-off may be worth it. (For example, it is going to be used in Nutch; see the discussion on this thread.) In the rest of this post I shall try to explore this trade-off a bit more.
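To make the trade-off a little more concrete, here is a minimal sketch that uses only the JDK. It does not call dk.brics.automaton at all: the hand-written state machine is a hypothetical stand-in for the kind of finite automaton that an automaton-based package builds from a regex, and the class name, method names and example patterns are my own assumptions. It shows a simple pattern that both styles can handle, and a Perl 5 style backreference that java.util.regex accepts but that no finite automaton can express in general.

```java
import java.util.regex.Pattern;

public class RegexTradeoffSketch {

    // Hand-built DFA for the regex [0-9]+(\.[0-9]+)? (an illustrative example,
    // not code from dk.brics.automaton).
    // States: 0 = start, 1 = integer part, 2 = just saw '.', 3 = fraction part.
    static boolean dfaMatches(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            switch (state) {
                case 0: state = digit ? 1 : -1; break;
                case 1: state = digit ? 1 : (c == '.' ? 2 : -1); break;
                case 2: state = digit ? 3 : -1; break;
                case 3: state = digit ? 3 : -1; break;
                default: return false;
            }
            // -1 is the dead state: no further input can lead to a match.
            if (state < 0) {
                return false;
            }
        }
        // Accept only if we stopped in the integer part or the fraction part.
        return state == 1 || state == 3;
    }

    // The same language expressed with java.util.regex.
    static final Pattern DECIMAL = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    // A backreference (\1) is part of the Perl 5 style syntax the JDK supports.
    // Matching "the same word twice" is not a regular language, so it is the
    // sort of feature an automaton-based matcher has to give up.
    static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    static boolean jdkMatches(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static boolean hasRepeatedWord(String s) {
        return REPEATED_WORD.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"42", "3.14", "3.", ".5"};
        for (String input : inputs) {
            System.out.println(input + " dfa=" + dfaMatches(input)
                    + " jdk=" + jdkMatches(input));
        }
        System.out.println(hasRepeatedWord("it is is fine"));
    }
}
```

The automaton version reads each character exactly once and never backtracks, which is where the speed of this approach comes from. The price is that anything needing memory of earlier input, such as the backreference above, is out of reach.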




Affordable Web-Scale Computing

Posted by tomwhite on March 17, 2006 at 06:10 AM | Permalink | Comments (0)

With the launch of Amazon S3 (Simple Storage Service) we are seeing a continuation of the trend for the big web companies to monetize their computing infrastructure by opening it up to developers. It is probably only a matter of time before we see Google create something similar, which would essentially be a limited public interface onto the Google File System.

I would love an API that exposes Google's MapReduce, a simple programming model for crunching large datasets. You can write and run MapReduce programs today, using Hadoop, but it's only really useful if you have enough machines at your disposal. The pay-as-you-go model of S3 (and Sun Grid) would be very attractive to developers who want to run ad hoc computations, or who can't afford the upfront investment in hardware.
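For readers who haven't met the MapReduce model, here is a minimal single-machine sketch of the idea, using the customary word count example. It is plain Java and deliberately does not use Hadoop's API; the class and method names are hypothetical, and a real framework would run the map and reduce steps across many machines rather than in one loop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: combine all the values emitted for a single key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, group the pairs by key (the "shuffle"),
    // then run reduce once per key. Here it is all in memory on one machine.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("the quick brown fox");
        lines.add("The lazy dog and the fox");
        System.out.println(run(lines));
    }
}
```

Because each map call looks at one line and each reduce call looks at one key, the work splits naturally across machines, which is why the model only pays off when you have plenty of them.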




