Pivot Faces the "One Million Records" Challenge

Posted by gkbrown on October 29, 2008 at 1:05 PM PDT

Earlier this week I came across this article on Inside RIA:

I decided to see how Pivot would handle this challenge. The results can be seen here:

Like the author of the Flex version, I omitted the 1,000,000-row dataset from the online example due to file size. However, I did run the test a number of times, and the numbers are as follows:

10k: ~200ms

100k: ~2s

1M: ~20s

Nice and linear. Unfortunately, about half as fast as the Flex version when running locally. I had expected the performance of a Java app to exceed that of the Flash player, so I was disappointed.

I did a little research to try to identify the bottleneck. A significant part of it appears to be Pivot's use of hash maps, rather than the arrays the Flex version appears to use, to store the deserialized CSV data. So some optimization may be in order, both in Pivot's map handling and in whatever else is contributing to the additional processing time, which I haven't yet had time to identify.
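
To make the contrast concrete, here is a minimal sketch of the two storage strategies for a parsed CSV row. This is illustrative only; it is not Pivot's or Flex's actual deserialization code:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: two ways to store a row parsed from the header "id,name,price".
    public class RowStorage {
        public static void main(String[] args) {
            String[] keys = {"id", "name", "price"};
            String[] fields = "42,Widget,19.99".split(",");

            // Map-backed row: one HashMap allocation plus hashing per field,
            // but each field can be found by name.
            Map<String, String> mapRow = new HashMap<>();
            for (int i = 0; i < keys.length; i++) {
                mapRow.put(keys[i], fields[i]);
            }
            System.out.println(mapRow.get("name")); // lookup by key

            // Array-backed row: the split() result itself; cheaper to build,
            // but callers must know that "name" lives at index 1.
            System.out.println(fields[1]); // lookup by position
        }
    }

Building a map per row costs an allocation plus a hash insertion per field, which adds up quickly over a million rows.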

In any case, while the numbers aren't ideal, I was pleased to discover that Pivot was up to the "million record challenge" and fared pretty well, even if it didn't take first place.


Note - the demo has moved since this entry was published. The new location is:

The demo has also been updated to use a streaming model to load the data, rather than loading it all up front before populating the table. The total time to retrieve the records is the same, but the user can see the data begin to load almost immediately, resulting in much better perceived performance.
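
The streaming idea looks roughly like the following sketch. The Consumer-based interface here is illustrative, not the demo's actual code:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;
    import java.util.function.Consumer;

    // Sketch: hand each parsed row to the table as soon as it is read,
    // instead of collecting every row before populating the view.
    public class StreamingCsvLoader {
        public static void load(Reader reader, Consumer<String[]> rowHandler)
            throws IOException {
            BufferedReader in = new BufferedReader(reader);
            String line;
            while ((line = in.readLine()) != null) {
                // The handler might append the row to the table model on the
                // UI thread, so rows become visible while the load continues.
                rowHandler.accept(line.split(","));
            }
        }

        public static void main(String[] args) throws IOException {
            load(new StringReader("1,foo\n2,bar\n"),
                row -> System.out.println(row[0] + " -> " + row[1]));
        }
    }

The total work is unchanged, but the first rows appear on screen after one read rather than after a million.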

Pivot's default cell renderer expects row data to implement the Map interface - this allows the physical position of a column to be decoupled from the position of the field in the record. The tradeoff appears to be field lookup time; however, that is something that can probably be optimized.
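
A minimal sketch of why Map-backed rows buy that decoupling (illustrative, not the actual Pivot renderer API):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a column only needs the key of the field it displays, so the
    // on-screen column order is independent of the field order in the record.
    public class CellLookup {
        static String cellText(Map<String, String> row, String columnKey) {
            return row.get(columnKey); // one hash lookup per cell
        }

        public static void main(String[] args) {
            Map<String, String> row = new HashMap<>();
            row.put("id", "42");
            row.put("name", "Widget");

            // Columns can be rendered in any order without touching row data.
            for (String key : new String[] {"name", "id"}) {
                System.out.println(cellText(row, key));
            }
        }
    }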

The HashMap vs. array comparison is an odd parallel. Is it that you don't really need the random-access lookup capability of the HashMap, and might just as well have used an ArrayList (which will also populate faster, though perhaps not as fast as a plain array)? Or, if that capability is in fact valuable, then Flex must be paying for it somewhere else, since it would have to search the array, and benchmarks other than simply loading the data might come out more in Pivot's favor.
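
A crude way to measure the populating cost mentioned above would be something like this sketch (informal, not a rigorous benchmark; the numbers will vary by machine and JVM):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: compare the cost of building a million array-backed rows
    // against a million map-backed rows.
    public class PopulateCost {
        public static void main(String[] args) {
            int rows = 1_000_000; // reduce if heap space is tight

            long t0 = System.nanoTime();
            List<String[]> arrays = new ArrayList<>(rows);
            for (int i = 0; i < rows; i++) {
                arrays.add(new String[] {"id" + i, "name" + i});
            }
            long t1 = System.nanoTime();

            List<Map<String, String>> maps = new ArrayList<>(rows);
            for (int i = 0; i < rows; i++) {
                Map<String, String> row = new HashMap<>(4);
                row.put("id", "id" + i);
                row.put("name", "name" + i);
                maps.add(row);
            }
            long t2 = System.nanoTime();

            System.out.printf("arrays: %d ms, maps: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }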