
Stupid Scanner tricks...

Posted by pat on October 24, 2004 at 1:18 AM PDT

One of the things I've always wanted in Java is a "one liner" trick to read all of the text from a stream. For example, I often want to be able to grab the contents of a URL or file as a simple String, without a lot of typing. The URL class tantalizingly holds out its getContent() method, but sadly, content handlers were never really taken seriously. I don't even particularly care about performance, I'd just like something for the simple case, in standard Java, that's not too hard to remember. Well, the Java 1.5 java.util.Scanner class finally has the answer...

Suppose I have a stream:

    InputStream source = new URL("http://pat.net/misc/foo.txt").openStream();

The canonical way to gather it to a String has always been to use a BufferedReader, e.g.

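    // Wrap the byte stream in a Reader, then buffer it so we can read line by line
    BufferedReader br = new BufferedReader( new InputStreamReader( source ) );
    StringBuffer text = new StringBuffer();
    for ( String line; (line = br.readLine()) != null; )
        text.append( line ).append( '\n' ); // readLine() strips the line terminator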

This is about 4 lines of tedious code (assuming the resulting StringBuffer is good enough), uses two classes, a loop, and too many parentheses. I must have typed code like this a million times, and I bet a lot of people have.
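For anyone who wants to run that loop as-is, here's a self-contained sketch. It assumes UTF-8 input (the snippet above uses the platform default charset), swaps in StringBuilder, and the class name ReadLines is just made up for illustration. Note that because readLine() drops line terminators, this version normalizes them all to '\n' and always ends the text with one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadLines {
    // Collects all text from a stream with the classic BufferedReader loop.
    // Assumes UTF-8; line terminators (\n, \r\n, \r) come back as '\n'.
    static String readAll(InputStream source) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(source, "UTF-8"));
        try {
            StringBuilder text = new StringBuilder();
            for (String line; (line = br.readLine()) != null; )
                text.append(line).append('\n');
            return text.toString();
        } finally {
            br.close(); // also closes the underlying stream
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("one\r\ntwo\nthree".getBytes("UTF-8"));
        System.out.print(readAll(in)); // one, two, three on separate lines
    }
}
```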

I've often been tempted to try to shorten it a bit using the DataInputStream readFully() method:

    byte [] buf = new byte[ source.available() ];
    new DataInputStream( source ).readFully( buf );
    String text = new String( buf );

That would be a bit less typing and involve only an array and a class. The problem is that it relies on the input stream's available() method to reflect the total size of the data to be returned... which in general it doesn't. The available() method works for files, and you could always substitute your own size if you can get it from other metadata, but it's still a messy solution and doesn't exactly roll off the fingertips.
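To see the available() problem concretely, here's a small sketch (the class and method names are hypothetical). A SequenceInputStream's available() only reports what its current underlying stream has, so for two three-byte pieces it answers 3 rather than 6, and the readFully() trick above would silently truncate. The readToEnd() helper is the usual fallback of reading chunks until read() returns -1; it isn't the trick this post is about, just a more dependable replacement for sizing the buffer with available().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

public class AvailablePitfall {
    // A stream made of two 3-byte parts; available() only sees the current part.
    static InputStream twoPartStream() {
        return new SequenceInputStream(
            new ByteArrayInputStream("abc".getBytes()),
            new ByteArrayInputStream("def".getBytes()));
    }

    // Reads until end of stream instead of trusting available().
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; )
            out.write(chunk, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(twoPartStream().available());       // 3, not 6
        System.out.println(readToEnd(twoPartStream()).length); // 6
    }
}
```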

Now, at last, with Java 1.5's Scanner I have a true one-liner:

    String text = new Scanner( source ).useDelimiter("\\A").next();

One line, one class. The only trick is to remember the regex \A, which matches the beginning of input. Using it as the delimiter effectively tells Scanner to treat the entire stream as a single token, from the beginning to the (illogical) next beginning. As a bonus, Scanner can work not only with an InputStream as the source, but also with a File, a ReadableByteChannel, or anything that implements the new java.lang.Readable interface. For example, to read a file:

    String text = new Scanner( new File("poem.txt") ).useDelimiter("\\A").next();
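Here's a self-contained version of the Scanner trick fed from a Readable (a StringReader), so it runs without a network or a poem.txt. One caveat the one-liner glosses over: an empty source has no token at all, so next() throws NoSuchElementException; this sketch guards with hasNext(). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Scanner;

public class SlurpScanner {
    // With the beginning-of-input anchor "\\A" as the delimiter, the first
    // (and only) token is the whole source. Empty input yields "" instead of
    // a NoSuchElementException.
    static String slurp(Readable source) {
        Scanner s = new Scanner(source).useDelimiter("\\A");
        try {
            return s.hasNext() ? s.next() : "";
        } finally {
            s.close(); // closes the source too, if it is Closeable
        }
    }

    public static void main(String[] args) {
        System.out.println(slurp(new StringReader("line one\nline two")));
        System.out.println(slurp(new StringReader("")).isEmpty()); // true
    }
}
```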

Finally, before someone chastises me, I should point out that you can accommodate a specific character set in all of the above examples. In the first you'd set the charset in the InputStreamReader, in the second you'd specify it in the String constructor, and in the Scanner example you can pass a charset name to the constructor.
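A sketch of the Scanner case with an explicit charset name, via the Scanner(InputStream, String) constructor. The sample bytes are "café" encoded as UTF-8, where é takes two bytes: decoding them as UTF-8 gives 4 characters, while decoding the same bytes as ISO-8859-1 gives 5, since the é becomes two Latin-1 characters. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.Scanner;

public class CharsetSlurp {
    // Same "\\A" trick, but decoding with a named charset instead of the
    // platform default.
    static String slurp(InputStream source, String charsetName) {
        Scanner s = new Scanner(source, charsetName).useDelimiter("\\A");
        try {
            return s.hasNext() ? s.next() : "";
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] utf8 = "caf\u00e9".getBytes("UTF-8"); // 5 bytes
        System.out.println(slurp(new ByteArrayInputStream(utf8), "UTF-8").length());      // 4
        System.out.println(slurp(new ByteArrayInputStream(utf8), "ISO-8859-1").length()); // 5
    }
}
```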

Enjoy!
