PMD optimisation rules put to the test - the AvoidEmptyStrings rule

Posted by johnsmart on April 18, 2008 at 3:50 AM PDT

PMD is an excellent static code analysis tool, with a rich set of rules regarding coding best practices and potential errors. The trick is working out which rules apply for your code.

Out of curiosity, I ran some benchmarks on the Optimization PMD rules, to see how they measure up to the latest JDKs. The results were interesting...

Consider the AvoidEmptyStrings rule.
PMD Rule: AvoidEmptyStrings "Finds empty string literals which are being added. This is an inefficient way to convert any type to a String."

Today, it would appear not. Look at the following code, which illustrates this supposedly bad practice:

    for (int i = 1; i < 10000000; i++) {
        String s = "" + 123;
    }

This takes around 7 ms to run on my laptop.

Now look at the following "improved" version:

    for (int i = 1; i < 10000000; i++) {
        String s = Integer.toString(123);
    }

This took 425 ms to run, or approximately 60 times slower. Hmmm.

There may be other reasons to use Integer.toString(), but it would appear that performance is not one of them.

That said, this sort of optimization is often very much an intellectual exercise with today's JDKs. Objectively, half a second for 10 million operations is still probably plenty fast for most situations, as other, slower operations like database calls will form a bottleneck long before String operations will. The rule of thumb is, as always, benchmark before you optimize, measure before and after, and know why you are optimising.

Still, if you need to squeeze every last CPU cycle, this sort of thing is good to know ;-).

Update

Later on, I reran some tests which seem to confirm the role played by the constant value pointed out by some of the comments below. Using variable values, the results are more what you would expect:

    for (int i = 1; i < 10000000; i++) {
        String s = "" + i;
    }

Results: 1382ms

    for (int i = 1; i < 10000000; i++) {
        String s = Integer.toString(i);
    }

Results: 757ms

So Integer.toString() is faster in situations using variables. When constant values are used, the compiler optimises things a little better when simple String concatenation is used.
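
One weakness of timing loops like the ones above is that the strings they build are never used, so the JIT compiler is free to discard some or all of the work. The following is a minimal sketch of a slightly more careful comparison, not the code behind the figures in this post: the class and method names are hypothetical, each loop returns a checksum so that its result stays observable, and one untimed pass is run first as a warm-up. A dedicated harness such as JMH would be more trustworthy still.

```java
// Hypothetical benchmark sketch; names are illustrative and not from the original post.
class StringConversionBench {

    // Converts each i in [0, n) with "" + i and returns the summed string lengths.
    static long concatLoop(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            String s = "" + i;
            total += s.length(); // use the result so the conversion cannot simply be dropped
        }
        return total;
    }

    // Same work, but converting with Integer.toString(i).
    static long toStringLoop(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            String s = Integer.toString(i);
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10000000;

        // Untimed warm-up pass so both loops have a chance to be JIT-compiled first.
        concatLoop(n);
        toStringLoop(n);

        long t0 = System.nanoTime();
        long a = concatLoop(n);
        long t1 = System.nanoTime();
        long b = toStringLoop(n);
        long t2 = System.nanoTime();

        System.out.println("\"\" + i:             " + (t1 - t0) / 1000000 + " ms (checksum " + a + ")");
        System.out.println("Integer.toString(i): " + (t2 - t1) / 1000000 + " ms (checksum " + b + ")");
    }
}
```

The absolute numbers will vary with the JDK version and the machine, so treat them as a relative comparison only.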

Comments

To clarify, this is a Java language level optimisation. See 15.28 Constant Expression of JLS 3rd Ed.

Bytecode generated by javac can be inspected with javap -c. In this case we have:

     0: iconst_1
     1: istore_1
     2: iload_1
     3: ldc #2; //int 10000000
     5: if_icmpge 17
     8: ldc #3; //String 123
    10: astore_2
    11: iinc 1, 1
    14: goto 2
    17: return

Here we can see the interned string "123" stored in local variable 2 and unused.
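
The same effect can be observed from Java code without reading bytecode. The sketch below (the class and method names are hypothetical) relies on two language rules: compile-time constant strings are interned, and a concatenation that is not a constant expression produces a newly created String. Comparing with == is deliberate here, since it tests identity rather than content.

```java
// Hypothetical demo class; names are illustrative and not from the original post.
class ConstantFoldingDemo {

    // "" + 123 is a compile-time constant expression, so javac emits the constant "123".
    static boolean literalIsFolded() {
        String s = "" + 123;
        return s == "123"; // identity check on purpose: same interned constant on both sides
    }

    // A non-final local is not a constant variable, so the concatenation runs at run time.
    static boolean variableIsFolded() {
        int val = 123;
        String s = "" + val;
        return s == "123"; // a freshly built String is a different object from the constant
    }

    public static void main(String[] args) {
        System.out.println("literal folded:  " + literalIsFolded());
        System.out.println("variable folded: " + variableIsFolded());
    }
}
```

On a conforming compiler the first check should report true and the second false, which matches the gap between the constant and variable timings.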

But "" + finalString is not optimised down to finalString :(

So Integer.toString() is faster in situations using variables. When constant values are used, the compiler optimises things a little better when simple String concatenation is used.
With this code

    public class AvoidEmptyString {
        public static void main(String[] args) {
            long start;
            final int finalInt = 123;
            final String finalString = String.valueOf(finalInt);

            start = System.currentTimeMillis();
            for (int i = 0; i < 10000000; i++) {
                final String s = "" + i;
            }
            System.out.println(System.currentTimeMillis() - start);

            start = System.currentTimeMillis();
            for (int i = 0; i < 10000000; i++) {
                final String s = "" + finalString;
            }
            System.out.println(System.currentTimeMillis() - start);

            start = System.currentTimeMillis();
            for (int i = 0; i < 10000000; i++) {
                final String s = Integer.toString(i);
            }
            System.out.println(System.currentTimeMillis() - start);

            start = System.currentTimeMillis();
            for (int i = 0; i < 10000000; i++) {
                final String s = "" + finalInt;
            }
            System.out.println(System.currentTimeMillis() - start);

            start = System.currentTimeMillis();
            for (int i = 0; i < 10000000; i++) {
                final String s = finalString;
            }
            System.out.println(System.currentTimeMillis() - start);
        }
    }
I have

3387 2264 1875 22 22
We can see "" + finalInt is as fast as using finalString directly.
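
A likely explanation for the difference between the two final variables is that the language only treats a final variable as a constant when it is initialised with a constant expression. finalInt qualifies, while finalString does not, because String.valueOf(...) is a method call. The sketch below (hypothetical class and method names) checks this with a deliberate identity comparison against the interned constant "123".

```java
// Hypothetical demo class; names are illustrative and not from the original comment.
class ConstantVariableDemo {

    // final + constant initialiser: finalInt is a constant variable, so "" + finalInt is folded.
    static boolean finalIntIsFolded() {
        final int finalInt = 123;
        String s = "" + finalInt;
        return s == "123"; // identity check on purpose
    }

    // final, but initialised by a method call: not a constant variable, so nothing is folded.
    static boolean finalStringFromCallIsFolded() {
        final String finalString = String.valueOf(123);
        String s = "" + finalString;
        return s == "123"; // s is built at run time, so it is a different object
    }

    public static void main(String[] args) {
        System.out.println("final int folded:              " + finalIntIsFolded());
        System.out.println("final String from call folded: " + finalStringFromCallIsFolded());
    }
}
```

Only the final int case should report true, which is consistent with the 22 ms versus 2264 ms timings quoted above.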

Here are some of the results from my machine.

Example 1: 14ms
Example 2: 744 ms

Now I believe rjray was correct about the constant-folding optimization. So I ran this code next.

Example 3:

    int val = 123;
    for (int i = 1; i < 10000000; i++) {
        String s = "" + val;
    }

Results: 1607 ms.

I also ran code arsene suggested and got this:
Results 3863 ms.

I would say PMD is on the right track with its suggestion.

I would agree with the previous comment. I suspect that the presence of two constants in that expression led to constant-folding optimization at the JVM level.

What about?

    for (int i = 0; i < 10000000; i++) {
        final String s = "" + i;
    }

Let's do some back of the envelope calculations. For argument's sake, let's say your laptop can get through 5 billion instructions per second (on a single thread). In 7 ms that is 35 million instructions. Your loop goes around (one short of) 10 million times. So that would be about 3.5 instructions per iteration, far too few to build a String each time. Hmm, seems like microbenchmarks can be misleading.