Kirill Grouchnikov's Blog
January 2005 Archives

Garbage collection - not a panacea
Posted by kirillcool on January 26, 2005 at 01:16 AM

Sometimes working with a good profiler gives you very valuable insights into performance bottlenecks. Of course, there is a convention of "don't start profiling until your code is production quality", but there are a few steps you can take to spare yourself a lot of trouble. I've been looking at the source code of the JXM project as part of the bindmark initiative. Although this project appears to be dead (the last posting dates back to September 2003), it is a perfect demonstration of the techniques that should be avoided from the very beginning. Its performance, both in time and in memory, was dead last among all the frameworks that were tested, and a quick glance in a profiler reveals why. The class com.lifecde.jxm.XMLToClassMap provides a method called getField():
public FieldMap getField(Object obj, String name)
{
    // Called for every XML tag, so every tag gets a brand-new FieldMap
    FieldMap field = new FieldMap();
    field.setName(namePolicy.javaName(name));
    return field;
}
The class FieldMap's constructor looks innocuous enough:
public FieldMap()
{
    // Runs once per FieldMap instance: three new SimpleDateFormat objects every time
    TimeZone z = TimeZone.getTimeZone("GMT");
    timeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z");
    timeFormat.setTimeZone(z);
    dateFormat = new SimpleDateFormat("yyyy-MM-dd z");
    dateFormat.setTimeZone(z);
    timestampFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    timestampFormat.setTimeZone(z);
}
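Below is a minimal sketch of the "create the formats once" alternative this post argues for. It assumes Java 8 or later. The post does not mention one caveat: SimpleDateFormat is not thread-safe, so a plain static instance is only safe if parsing happens on a single thread. The sketch therefore keeps one copy of each format per thread through ThreadLocal. That still cuts the cost from three formatters per XML tag to three per thread. The class SharedFormats and its methods are hypothetical and not part of JXM. Pinning Locale.ROOT is an added assumption, because the original relies on the JVM default locale. On Java 8 and later, java.time's DateTimeFormatter is immutable and thread-safe, so one could instead simply hold it in a static final field.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical helper (not part of JXM): the three formats from FieldMap's
 * constructor, built once per thread instead of once per FieldMap.
 */
public final class SharedFormats {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // SimpleDateFormat is not thread-safe, so each thread lazily gets its own copy,
    // which is then reused for every tag that thread processes.
    private static final ThreadLocal<SimpleDateFormat> TIME_FORMAT =
            ThreadLocal.withInitial(() -> gmtFormat("yyyy-MM-dd HH:mm:ss z"));
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
            ThreadLocal.withInitial(() -> gmtFormat("yyyy-MM-dd z"));
    private static final ThreadLocal<SimpleDateFormat> TIMESTAMP_FORMAT =
            ThreadLocal.withInitial(() -> gmtFormat("yyyy-MM-dd HH:mm:ss"));

    private SharedFormats() {
    }

    private static SimpleDateFormat gmtFormat(String pattern) {
        // Assumption: Locale.ROOT pins the output; the original used the JVM default locale.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(GMT);
        return format;
    }

    public static String formatTime(Date date) {
        return TIME_FORMAT.get().format(date);
    }

    public static String formatDate(Date date) {
        return DATE_FORMAT.get().format(date);
    }

    public static String formatTimestamp(Date date) {
        return TIMESTAMP_FORMAT.get().format(date);
    }

    public static Date parseTimestamp(String text) throws ParseException {
        return TIMESTAMP_FORMAT.get().parse(text);
    }

    public static void main(String[] args) throws ParseException {
        Date epoch = new Date(0L);
        System.out.println(formatTimestamp(epoch)); // 1970-01-01 00:00:00
        System.out.println(parseTimestamp("1970-01-01 00:00:00").getTime()); // 0
    }
}
```

With this design, a FieldMap-like object would just call these static methods and carry no formatter fields of its own.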
However, FieldMap's constructor is a real performance killer (and not in a good way). First of all, the format members are used only for converting dates and timestamps, yet every FieldMap pays for them. Second, since the formats never change, they should have been declared as static members and initialized only once. Another option would be to cache them in a HashMap of some sort. You may say now: it's not that bad, so we allocate three objects which then get garbage-collected. No harm, no foul. Wrong.

The getField() function is called on every XML tag. When parsing 2,000 moderately-sized XML strings (about 1 KB each), the FieldMap constructor accounts for 2,411,079 allocated objects, which make up 25.4% of all allocated memory. Creating the 120,000 FieldMap objects takes 106.82 seconds in total, which is 27.6% of the total running time. You may ask how 120,000 constructor calls result in 2.5 million allocations. Just look at the constructor of SimpleDateFormat in the JDK and see for yourself. In this case, the simple technique of making the objects static would have saved roughly 25% of both memory and time.

In addition, reflection (or introspection) is heavily used throughout the library. Class.forName() alone accounts for 12% of the time and 16.2% of the memory (1,441,039 allocations). Caching the results in a HashMap (possibly with soft references) would have given an additional speedup.

To sum up: the profiler is not some extravagant toy. It should be used from the very beginning. In addition, use static members for constant fields and hash-based data structures for speedups.

AOP - a poor man's excuse for writing ugly code
Posted by kirillcool on January 14, 2005 at 04:26 AM

Let's take two examples that are given for any AOP language: logging and context passing. AOP prides itself on the fact that it allows "injecting" code at the beginning and at the end of any method (specified using sophisticated "regular expressions"). But does this really qualify as a logging and tracing mechanism? Not really.
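The paragraph that follows describes a particular kind of logging: log points placed inside individual branches rather than only at method entry and exit, with a level that can be changed while the program runs. Here is a minimal sketch of that, written with the JDK's own java.util.logging. The post does not name a logging library, so this choice is an assumption. The class BranchLogging, its classify method, and the in-memory handler are hypothetical illustrations. Localizing the messages (java.util.logging can attach a resource bundle to a Logger) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical example: log points inside the branches, not just at entry/exit. */
public class BranchLogging {
    private static final Logger LOG = Logger.getLogger(BranchLogging.class.getName());
    private static final List<String> RECORDS = new ArrayList<>();

    static {
        // Capture records in memory instead of printing them, so the effect of the level is visible.
        LOG.setUseParentHandlers(false);
        LOG.addHandler(new Handler() {
            @Override
            public void publish(LogRecord record) {
                RECORDS.add(record.getLevel().getName() + ": " + record.getMessage());
            }

            @Override
            public void flush() {
            }

            @Override
            public void close() {
            }
        });
    }

    /** A method with several exit points; each branch logs what matters at that point. */
    public static String classify(int quantity, boolean express) {
        LOG.fine(() -> "classifying quantity " + quantity);
        if (quantity <= 0) {
            LOG.warning(() -> "rejected non-positive quantity " + quantity);
            return "REJECTED";
        }
        if (express) {
            LOG.info(() -> "express handling for quantity " + quantity);
            return "EXPRESS";
        }
        LOG.fine(() -> "standard handling for quantity " + quantity);
        return "STANDARD";
    }

    /** The level can be changed at runtime, e.g. after erroneous behaviour is spotted. */
    public static void setLevel(Level level) {
        LOG.setLevel(level);
    }

    /** Returns the captured records and clears the buffer. */
    public static List<String> drainRecords() {
        List<String> copy = new ArrayList<>(RECORDS);
        RECORDS.clear();
        return copy;
    }

    public static void main(String[] args) {
        setLevel(Level.WARNING);
        classify(0, false);
        classify(5, true);
        System.out.println(drainRecords()); // [WARNING: rejected non-positive quantity 0]

        setLevel(Level.FINE);
        classify(5, true);
        System.out.println(drainRecords());
        // [FINE: classifying quantity 5, INFO: express handling for quantity 5]
    }
}
```

Each message is written where the relevant objects are in scope. An entry/exit interceptor cannot see those objects.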
Any non-academic application has functions with multiple exit points (although this in itself can be called bad programming practice) and multi-branch logic inside. Typically I would have 5-8 log points with various log levels. The important lines that I wish to log almost always lie inside the code, and each one prints information on specific objects. Moreover, these messages must be localized using a DB (or a configuration XML file) and filtered based on a log level that can change at runtime (presumably after erroneous behaviour is spotted). AspectJ most certainly does not provide any means to do this.

Second, context passing. Now we have a stack of 10 functions and we need to pass an additional parameter to the innermost function. There are two cases: either there are already way too many parameters and we don't want to pass an additional one (this is called "not modular" in the AspectJ tutorial), or we have some class that contains all the arguments and we are too lazy to change it. The second case is obvious: it costs next to nothing to add a member to this class without breaking modularity. The first one is trickier. If you already have 15-20 parameters for each function, then your coding practices really suck. If you are unwilling to add a parameter to a function that doesn't really need it but has to pass it to another function that does, you should rethink the design of your functions. In any case, "pushing" a parameter into code that wasn't designed to handle it is very bad design. Moreover, this forces you to read multiple files in order to understand what each function does.

Other examples of AspectJ from the tutorial, in short:
Open source - the curse of the abundance
Posted by kirillcool on January 11, 2005 at 03:25 AM

As this blog shows, almost every imaginable field of programming has been covered by open source Java projects, all vying for the same #1 spot. It may seem at first that this can only be a good thing, with competition driving the creators forward to build better and faster libraries. However, not all is fair in this kingdom. A recent survey of XML data binding frameworks (available here) has given me some interesting insights. In this survey, nine different libraries were tested for time and memory performance on the same XML input (and its Java representation). The biggest problem? I had to write code for each one of them from scratch. Even though the JAXB API has been available since 2001, the frameworks have not adopted the proposed approach (each one for its own reasons, I'm sure). The result: each marshalling and unmarshalling class has its own unique name, unique signatures for its functions and so on. The bottom line for a developer who wishes to switch implementations: no can do.

This seemingly innocent approach of reinventing the wheel (even when the original one is good enough and the only problem with it is that it wasn't invented by you) effectively prevents a potentially successful framework from taking the market lead. Suppose tomorrow a great library comes out, blazing with performance, with full support for XSD, DTD and Relax. Who will use it? Only new projects. The cost of switching to a new library and rewriting your code will be prohibitive for any existing project of mid-scale and up.

In addition, this also results in libraries that have not been tested thoroughly in production environments. A recent example of this has been discussed in this entry. This particular class has test functions; they just don't cover the buggy line.
The bug has been there since version 0.6.0 (up to the current 0.6.4). The availability of the source code and the testing by the author apparently have not been enough.

The data structure market has been divided among so many players that a new player just goes unnoticed (except by Google). The lack of coordination between different libraries (even at the most basic interface level) doesn't allow me, as the end user, to test the libraries by simply switching the implementation in a configuration file. The result: a really good library is sitting somewhere out there, languishing.

Enjoy
Kirill
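The wish to switch binding libraries through a configuration file, without rewriting the calling code, can be sketched as follows. Everything here is a hypothetical illustration and not any surveyed library's API: the shared XmlBinder interface, the two toy implementations standing in for competing frameworks, and the binder.class property name. The idea is broadly similar to how standard APIs such as JAXB locate a provider at runtime. Because the Class.forName() lookup happens once at startup rather than per XML tag, it stays off the hot path discussed in the garbage collection post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PluggableBinding {

    /** Hypothetical common contract that every binding library would implement. */
    public interface XmlBinder {
        String marshal(String name, String value);
    }

    /** Toy implementation standing in for one framework. */
    public static class CompactBinder implements XmlBinder {
        public String marshal(String name, String value) {
            return "<" + name + ">" + value + "</" + name + ">";
        }
    }

    /** Toy implementation standing in for a competing framework. */
    public static class AttributeBinder implements XmlBinder {
        public String marshal(String name, String value) {
            return "<" + name + " value=\"" + value + "\"/>";
        }
    }

    /** Instantiates the implementation named by the "binder.class" property. */
    public static XmlBinder fromConfig(String configText)
            throws IOException, ReflectiveOperationException {
        Properties config = new Properties();
        config.load(new StringReader(configText));
        String className = config.getProperty("binder.class");
        if (className == null) {
            throw new IllegalArgumentException("binder.class is not set");
        }
        return Class.forName(className)
                .asSubclass(XmlBinder.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Switching frameworks means editing one configuration line, not the calling code.
        XmlBinder first = fromConfig("binder.class=" + CompactBinder.class.getName());
        XmlBinder second = fromConfig("binder.class=" + AttributeBinder.class.getName());
        System.out.println(first.marshal("price", "42"));  // <price>42</price>
        System.out.println(second.marshal("price", "42")); // <price value="42"/>
    }
}
```

In a real deployment the configuration text would come from a file on disk. The code that calls marshal() never names a concrete library.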