The Source for Java Technology Collaboration



Pat Niemeyer

Pat Niemeyer's Blog

Stupid Scanner tricks...

Posted by pat on October 24, 2004 at 01:18 AM | Comments (5)

One of the things I've always wanted in Java is a "one liner" trick to read all of the text from a stream. For example, I often want to be able to grab the contents of a URL or file as a simple String, without a lot of typing. The URL class tantalizingly holds out its getContent() method, but sadly, content handlers were never really taken seriously. I don't even particularly care about performance, I'd just like something for the simple case, in standard Java, that's not too hard to remember. Well, the Java 1.5 java.util.Scanner class finally has the answer...

Suppose I have a stream:

    InputStream source = new URL("http://pat.net/misc/foo.txt").openStream();

The canonical way to gather it to a String has always been to use a BufferedReader, e.g.

    BufferedReader br = new BufferedReader( new InputStreamReader( source ) );
    StringBuffer text = new StringBuffer();
    for ( String line; (line = br.readLine()) != null; )
        text.append( line );

This is about 4 lines of tedious code (assuming the resulting StringBuffer is good enough), uses two classes, a loop, and too many parentheses. I must have typed code like this a million times, as I bet a lot of people have.

I've often been tempted to try to shorten it a bit using the DataInputStream readFully() method:

    byte [] buf = new byte[ source.available() ];
    new DataInputStream( source ).readFully( buf );
    String text = new String( buf );

That would be a bit less typing and involve only an array and a class. The problem is that it relies on the input stream's available() method to reflect the total size of the data to be returned... which in general it doesn't. The available() method works for files, and you could always substitute your own size if you can get it from other metadata, but it's still a messy solution and doesn't exactly roll off the fingertips.
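To make the available() problem concrete, here is a minimal sketch. The class and method names (AvailableDemo, opaque, readViaAvailable) are hypothetical and exist only for illustration. The wrapper stream does not override available(), so it inherits the InputStream default of 0, and the readFully() approach quietly returns an empty string even though the data is there.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableDemo {
    // Hypothetical stream that cannot report its size: it does not
    // override available(), so the InputStream default (0) applies.
    static InputStream opaque(final byte[] data) {
        final InputStream in = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return in.read();
            }
        };
    }

    // The readFully() approach: the buffer is sized by available().
    static String readViaAvailable(InputStream source) throws IOException {
        byte[] buf = new byte[source.available()];
        new DataInputStream(source).readFully(buf);
        return new String(buf);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes();
        // ByteArrayInputStream reports its remaining bytes, so this works.
        System.out.println(readViaAvailable(new ByteArrayInputStream(data)));
        // The opaque stream reports 0 available, so nothing is read.
        System.out.println(readViaAvailable(opaque(data)).length());
    }
}
```

Real network streams usually fall somewhere in between, reporting only the bytes that can be read without blocking, so the result may be truncated rather than empty.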

Finally now with Java 1.5's Scanner I have a true one-liner:

    String text = new Scanner( source ).useDelimiter("\\A").next();

One line, one class. The only tricky part is remembering the regex \A, which matches the beginning of input. This effectively tells Scanner to tokenize the entire stream, from beginning to (illogical) next beginning. As a bonus, Scanner can work not only with an InputStream as the source, but also a File, Channel, or anything that implements the new java.lang.Readable interface. For example, to read a file:

    String text = new Scanner( new File("poem.txt") ).useDelimiter("\\A").next();
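One caveat worth sketching: if the source is empty there is no token at all, and next() throws NoSuchElementException. The example below is a minimal sketch with a hypothetical helper named slurp (not part of Scanner). It uses a String as the Scanner source so that it runs without a file or network connection, and it checks hasNext() first.

```java
import java.util.Scanner;

class SlurpDemo {
    // Hypothetical helper: returns everything the Scanner can read as one token.
    static String slurp(Scanner scanner) {
        scanner.useDelimiter("\\A");
        // An empty source has no token, so guard the call to next().
        return scanner.hasNext() ? scanner.next() : "";
    }

    public static void main(String[] args) {
        String input = "line one\nline two\n";
        // Line terminators are preserved, unlike with readLine().
        System.out.println(slurp(new Scanner(input)).equals(input));
        System.out.println(slurp(new Scanner("")).isEmpty());
    }
}
```

The guard costs the one-liner its brevity, so it is probably only worth it when empty input is a realistic case.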

Finally, before someone chastises me, I should point out that you can accommodate a specific character set with all of the above examples. In the first you'd set the charset in the InputStreamReader, in the second you'd specify it with the String constructor, and in the Scanner example you can pass a charset to the constructor.
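For the Scanner case, a minimal sketch might look like the following. The helper name slurp and the sample text are assumptions made for illustration; the two-argument Scanner(InputStream, String) constructor takes the charset name. The sketch also closes the Scanner when done, which closes the underlying stream as well.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.Scanner;

class CharsetSlurp {
    // Hypothetical helper: read the whole stream, decoding with the named charset.
    static String slurp(InputStream source, String charsetName) {
        Scanner scanner = new Scanner(source, charsetName);
        try {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            scanner.close(); // also closes the underlying stream
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample bytes containing non-ASCII characters, encoded as UTF-8.
        byte[] utf8 = "caf\u00e9 na\u00efve".getBytes("UTF-8");
        System.out.println(slurp(new ByteArrayInputStream(utf8), "UTF-8"));
    }
}
```

The equivalents for the other two approaches are new InputStreamReader( source, "UTF-8" ) and new String( buf, "UTF-8" ).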

Enjoy!


Comments
Comments are listed in date ascending order (oldest first)

  • Good tip, thanks!

    Posted by: pdm on October 24, 2004 at 10:39 AM

  • Pat... Great tip, but it has one flaw that bit me. The Scanner doesn't get closed. I used your tip to read in text from a file, perform regex replacement on the contents and write them back out. But some of the files are subsequently moved based on another regex. Without closing the Scanner first that fails. So it ends up a nice 3-liner instead:

        Scanner s = new Scanner(file);
        String text = s.useDelimiter("\\A").next();
        s.close();

    Dean

    Posted by: dwette on February 28, 2005 at 04:22 AM

  • Thanks for your idea of Delimiter. Is Fantastic!

    Posted by: sermax on December 07, 2005 at 04:07 AM

  • Hi, I am trying to scan a File for functions ...... basically the pattern (); Example: strcat("fat",her"); add(); I tried this: sc.findInLine("([a-zA-Z]+) (\p{Punct})"); //to get the function name and left bracket. It's giving error messages for the punctuation part .... could you help? Thanks

    Posted by: ashucool83 on June 18, 2006 at 05:07 PM




