Scanners live in vain

Posted by cayhorstmann on October 14, 2013 at 9:22 PM PDT

Do you remember the olden days when reading lines from a file was as easy as eating soup with a fork?

BufferedReader reader = new BufferedReader(new InputStreamReader(someInputStream));
String line;
while ((line = reader.readLine()) != null)
   process(line);

Just about ten years ago, Java 5 put an end to that nonsense.

Scanner in = new Scanner(/* just about anything at all that makes sense here */);
while (in.hasNextLine())
   process(in.nextLine());

Right now, I am putting the final touches on "Java 8 for the Impatient" and I describe the changes in the I/O API. You can read a file into a Stream<String>. That is nice. The stream is lazy. Once you have found what you are looking for, nothing else is read.

try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
   Optional<String> passwordEntry = lines.filter(s -> s.startsWith("password=")).findFirst();
   ...
}
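
Here is a minimal, self-contained sketch of that lazy lookup. The class name, the helper firstLineStartingWith, and the temporary file contents are made up for illustration; only Files.lines, filter, and findFirst come from the API discussed above. Note that findFirst returns an Optional<String>, since there may be no matching line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;
import java.util.stream.Stream;

class FirstMatchDemo {
    // Hypothetical helper: the first line that starts with the prefix, if there is one.
    static Optional<String> firstLineStartingWith(Path path, String prefix) throws IOException {
        // try-with-resources closes the underlying file when we are done with the stream
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            // findFirst short-circuits: lines after the first match are never processed
            return lines.filter(s -> s.startsWith(prefix)).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data, written to a temporary file so the example runs anywhere
        Path path = Files.createTempFile("settings", ".txt");
        try {
            Files.write(path,
                Arrays.asList("user=fred", "password=secret", "host=example.com"),
                StandardCharsets.UTF_8);
            Optional<String> entry = firstLineStartingWith(path, "password=");
            System.out.println(entry.orElse("(no password entry)"));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```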

What if you want to read from a URL instead? Did they retrofit Scanner? No sir. Scanners live in vain, in the java.util package. (Extra credit if you know where that comes from.) Instead, someone went back into the graveyard and retrofitted BufferedReader. Does BufferedReader have a constructor that takes an InputStream? No sir. It hasn't been touched for ten years. So, here we are, in 2013, and have to write

try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
   Stream<String> lines = reader.lines();
   ...
}
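
If you have to do this more than once, you can hide the layering in a small helper of your own. The sketch below is just that, a sketch: linesOf is a made-up name, not a JDK method, and a ByteArrayInputStream stands in for url.openStream() so that the example runs without a network. One wrinkle worth knowing: the stream returned by BufferedReader.lines() does not close the reader when the stream is closed, so the helper attaches a close handler with onClose.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamLines {
    // Hypothetical convenience method: a lazy stream of lines from any input stream.
    static Stream<String> linesOf(InputStream in, Charset charset) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        // Closing the returned stream closes the reader, and with it the input stream
        return reader.lines().onClose(() -> {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(); the bytes are assumed sample data
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        try (Stream<String> lines = linesOf(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            List<String> upper = lines.map(String::toUpperCase).collect(Collectors.toList());
            System.out.println(upper);
        }
    }
}
```

With a real URL, the call would be linesOf(url.openStream(), StandardCharsets.UTF_8), assuming the content is UTF-8.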

I realize the Java API is vast, but really, it isn't that vast. All the file stuff is in java.io and java.nio, and yes, java.util.Scanner, and every year or two I get to revisit it as I update a book. If I can keep track of it, so should the folks at Oracle. Moving forward, it would be good to keep a few principles in mind.

  • Everyone loves the convenience methods in Files. Keep them coming.
  • Nobody loves the layered stuff, like new BufferedReader(new InputStreamReader(...)). That was a bad idea from the start, and I said so almost twenty years ago in the early editions of Core Java, where I pointed out that for the preceding twenty years programmers had been able to open files and got buffering behind the scenes without any of that nonsense.
  • Maybe the age of scanners has come to an end, and streams are the new way for consuming input. But learn from the scanners. One thing that made them attractive was that they are omnivores. You could construct them from a file. An input stream. A string. A ReadableByteChannel. That is how it should be. If you feel the urge to ignore Scanner and resurrect BufferedReader, just add those constructors.
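
To make the "omnivore" point concrete, here is a small sketch that feeds the same two lines to a Scanner from a string, an input stream, a ReadableByteChannel, and a file. The class name, the readAll helper, and the sample text are invented for the example; the Scanner constructors are the existing ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSources {
    // Hypothetical helper: collects all lines, then closes the scanner and its source.
    static List<String> readAll(Scanner in) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = in) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = "one\ntwo\n"; // assumed sample data
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // From a string
        System.out.println(readAll(new Scanner(text)));

        // From an input stream, with an explicit charset name
        System.out.println(readAll(new Scanner(new ByteArrayInputStream(bytes), "UTF-8")));

        // From a ReadableByteChannel
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(bytes));
        System.out.println(readAll(new Scanner(channel, "UTF-8")));

        // From a file, given as a Path
        Path path = Files.createTempFile("scanner", ".txt");
        try {
            Files.write(path, bytes);
            System.out.println(readAll(new Scanner(path, "UTF-8")));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The same loop works for every source. That is the convenience I would like to see carried over to whatever replaces Scanner.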

Comments

Possibly Scanner's performance is not as good as BufferedReader's. On my machine, Scanner seems to be about 3x slower than BufferedReader at reading lines. I am using a PC with Windows 7 and a 1 TB hard disk.

Here are the results for a very short test:

"Testing 5000 times on file with size 234402.
[scanner: 28.991 sec]
[bufferedreader: 9.672 sec]
[scanner_bufferedreader: 28.229 sec]"

Here is the code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;
import net.sf.antcontrib.perf.StopWatch;

/**
*
* @author Gil Fernandes
*/
public class ScannerTest {

    public static void main(String[] args) throws Exception {
        int times = 5000;
        final File file = new File("D:\\dev\\data\\input\\TestData.csv");
        System.out.printf("Testing %d times on file with size %d.%n", times, file.length());
        final String encoding = "UTF-16";
       
        StopWatch stopWatch = new StopWatch("scanner");
        stopWatch.start();
        scannerTest(times, file, encoding);
        stopWatch.stop();
        System.out.println(stopWatch);
       
        stopWatch = new StopWatch("bufferedreader");
        stopWatch.start();
        bufferereaderTest(times, file, encoding);
        stopWatch.stop();
        System.out.println(stopWatch);
       
        stopWatch = new StopWatch("scanner_bufferedreader");
        stopWatch.start();
        scannerTest2(times, file, encoding);
        stopWatch.stop();
        System.out.println(stopWatch);
    }

    private static void scannerTest(int times, final File file, final String encoding) throws FileNotFoundException {
        for (int i = 0; i < times; i++) {
            try (Scanner scanner = new Scanner(file, encoding)) {
                while (scanner.hasNextLine()) {
                    scanner.nextLine();
                }
            }
        }
    }
   
    private static void scannerTest2(int times, final File file, final String encoding) throws Exception {
        for (int i = 0; i < times; i++) {
            try (Scanner scanner = new Scanner(new BufferedReader(new InputStreamReader(new java.io.FileInputStream(file), encoding)))) {
                while (scanner.hasNextLine()) {
                    scanner.nextLine();
                }
            }
        }
    }

    private static void bufferereaderTest(int times, final File file, final String encoding) throws IOException {
        for (int i = 0; i < times; i++) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new java.io.FileInputStream(file), encoding))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Discard the line; only the time spent reading is measured.
                }
            }
        }
    }
}

Hi Cay,
Apart from the classes of the collections API, most of the changes made to retrofit classes for lambdas were driven by the community. Sending an email to the lambda-dev mailing list was the best way to make sure that your favorite class got retrofitted.
Obviously, not all classes were retrofitted :)
It's too late for Java 8 now, but I'm sure that the work on Java 9 will start soon.

cheers,
Rémi