Beware of the Natives
Posted by mlam on December 07, 2006 at 02:11 PM | Comments (16)
There are a lot of not so nice things about using native methods. Here are some:
- less safe - think "stray pointers".
- less portable - you'll have to recompile them for every target device architecture you deploy on, present and future.
- less cost effective - need extra work to build and test all the architecture variations, extra disk storage for deploying all the different binary versions, etc.
- less manageable - can be a "binary version" tracking, management, and device provisioning nightmare.
But if these reasons aren't enough to deter you from using native methods, try this on for size:
Native code can hurt performance
This seems to go against most people's expectations, but it is the truth. First of all, there is what goes on in the runtime stacks when you invoke native methods. I've talked about that in my previous articles (here, here, and here). There, I showed that using native methods incurs bootstrapping and extra frame pushing/popping overhead, which results in degraded performance. But there are also many other reasons besides this.
To be fair, native code can be used to help improve performance when used in the right places. I will explain those cases as well. The key is to use native code "carefully".
Ok, let's go bust the "native" myth ...
Anything you can do, I can do better!
Having a dynamic adaptive compiler (JIT) is more than simply translating bytecodes into equivalent machine instructions one bytecode at a time. Such a naive translation strategy is more like an assembler than a compiler, and usually yields performance gains of anywhere from less than 2x to 5x. The 5x figure is a generous estimate, and can only be realized if the method being compiled does nothing but some very simple arithmetic in a small loop. Any real world application, even a HelloWorld program, will not realize that kind of gain. Even the smallest JITs today, such as the one in the phoneME Feature (CLDC) VM, are able to do better than this.
The phoneME Advanced (CDC) VM, CVM, has a JIT that is far more sophisticated than that. The CVM JIT can yield gains from 4x to 14x (over interpreted bytecodes) depending on the application. The high end numbers are for very simple applications like the small naive loop. Note: these numbers are very rough estimates that I made using an educated guess based on internal measurement results. Actual performance numbers may vary (lower or higher) depending on the device architecture and benchmarks used.
But is this enough? Can JIT compiled code perform as well as or better than native code? Well, first we need to understand how the JIT gives us some of these huge gains. I'll highlight 2 aspects:
Inlining
When a JIT compiles a method, it can choose to inline a method that is called from the method being compiled. For example, method mA calls method mB, and we're compiling mA. In this case, mB can be inlined into mA as part of mA's compilation. Inlining reduces call overhead between methods (e.g. for pushing/popping frames). Also by inlining the code of mB into mA, the 2 pieces are now placed in closer proximity. This improves cache locality which helps performance. Since mA is likely to call mB, it would hurt performance if the 2 are far apart.
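As a rough illustration of the effect described above, here is a hand-written Java sketch. The method names mA and mB follow the example in the text; mAInlined is a hypothetical stand-in for what the compiled code of mA behaves like after the JIT has inlined mB. A real JIT does this on machine code, not on source, so treat this only as a picture of the idea.

```java
// Illustrative only: mAInlined imitates by hand what a JIT might produce.
final class InliningSketch {
    // The callee: small, and called from inside mA's loop.
    static int mB(int x) {
        return x * x + 1;
    }

    // mA as written in source: every iteration pays for a call to mB
    // (frame push, argument passing, frame pop).
    static int mA(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += mB(i);
        }
        return sum;
    }

    // The same computation with the body of mB placed directly in the loop.
    // No frame is pushed or popped for mB, and the code sits close together.
    static int mAInlined(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * i + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both forms must compute the same value; only the call overhead differs.
        System.out.println(mA(1000) == mAInlined(1000));
    }
}
```

The two methods are interchangeable in result, which is exactly why the JIT is free to substitute one for the other.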
Runtime Profiling
Unlike static compilers (like C compilers), a JIT makes use of runtime profiling techniques to determine what methods and/or code paths are hot (i.e. frequently used). And the JIT will only compile the methods that are hot.
In CVM, each method has an invocation counter that tracks its "hotness". The JIT inliner factors the hotness of callee methods into its decision on whether to inline or not. Hence, hotter callee methods are more likely to be inlined, and less hot ones are less likely to be. This means that inlining is selective, favoring the hot code paths that the application actually takes.
For example, mA calls mB1 and mB2. mB1 calls mC1. mB2 calls mC2. From profiling, we know that mA calls up to mC2 a lot via mB2, but not so much to mC1. When compiling mA, this allows us to choose to inline mB1, mB2, and mC2 into mA, but not mC1.
This allows us to invest the cost of inlining where it will yield us the most performance gains, while not incurring the cost as much in other lower yield areas.
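The mA/mB1/mB2/mC1/mC2 example can be sketched as a toy model. This is not CVM's actual inliner: the threshold, the counter values, and the class and method names below are all made up for illustration. It only shows how per-method invocation counters could steer the choice of which callees to inline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of hotness-driven inlining; all numbers are hypothetical.
final class HotnessSketch {
    // Assumed cutoff: callees invoked at least this often count as hot.
    static final int INLINE_THRESHOLD = 100;

    // Invocation counters as a runtime profiler might have recorded them.
    static final Map<String, Integer> COUNTERS = new HashMap<>();

    // Call graph from the text: mA -> mB1 -> mC1 and mA -> mB2 -> mC2.
    static final Map<String, List<String>> CALLEES = new HashMap<>();

    static {
        COUNTERS.put("mB1", 500);
        COUNTERS.put("mB2", 5000);
        COUNTERS.put("mC1", 3);     // rarely reached via mB1
        COUNTERS.put("mC2", 5000);  // reached a lot via mB2
        CALLEES.put("mA", Arrays.asList("mB1", "mB2"));
        CALLEES.put("mB1", Arrays.asList("mC1"));
        CALLEES.put("mB2", Arrays.asList("mC2"));
    }

    // Returns the callees that would be inlined when compiling 'root'.
    static List<String> inlineCandidates(String root) {
        List<String> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(String caller, List<String> result) {
        List<String> targets = CALLEES.getOrDefault(caller, Collections.<String>emptyList());
        for (String callee : targets) {
            if (COUNTERS.getOrDefault(callee, 0) >= INLINE_THRESHOLD) {
                result.add(callee);
                // Only look deeper through callees that were themselves inlined.
                collect(callee, result);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(inlineCandidates("mA")); // prints [mB1, mB2, mC2]
    }
}
```

With these made-up counters, mC1 is left as an ordinary call, so the cost of inlining is spent only on the paths the profile says are worth it.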
No, you can't!
But can't native methods also do the same? Yes, to some extent, if you are talking only about C functions and not Java methods that are native. C compilers do have inlining. But there are problems as we'll see below.
There are also profilers for C code. But applying these to Java native methods is something else.
Yes, I can!
C functions can certainly do some inlining. The issue is which callee functions to inline. The right choice is seldom known to the C compiler because compilation is done at build time rather than at runtime. To get around this problem, people do use native profilers to capture a picture of the application's hot functions and paths. This profile is then fed back into the C compiler to guide its inlining and optimizations. This is all done during development time.
The problem with this is that it assumes that the captured profile is representative of the actual execution profile of the application at runtime under real usage. A lot of times, this static approach does yield fairly good results. But it is no substitute for what a JIT can do with a runtime profiler if the application is very dynamic in nature.
Now, factor in that the Java platform is a dynamic environment where new code can be downloaded at runtime. In other words, the callee method may not even be known or available to the C compiler and/or profiler at development time. There is no option to tweak the optimizations for executing code that doesn't exist yet.
Another issue is with the Java language's support for virtual methods. With virtual methods, the C compiler won't be able to know which callee method to inline. Most modern JITs (including CVM's) employ a technique called speculative inlining to solve the problem of inlining virtual methods. What happens is that the JIT will inline an expected callee method based on its hotness. When the compiled method is executed, it will first check if the method to be called is the expected one that was inlined. If so, it will proceed with executing the inlined code. Otherwise, it will do a regular virtual invocation. There are also many other variations on this scheme, but the basic idea is the same.
This speculative inlining together with the runtime profiling allows virtual methods to be inlined efficiently by JITs. It would be difficult for a C (or C++) compiler to implement speculative inlining in an efficient way without the help of a runtime profiler. The alternatives are to inline everything blindly or to use some heuristics. The blind approach results in too much code bloat without necessarily yielding results. Code bloat also hurts cache locality and, therefore, performance. The heuristics approach is still a guess. You may have a winner or a loser, and you won't know for sure until after your code has been deployed.
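The guard-then-inlined-body shape of speculative inlining can be imitated by hand in Java. The Shape, Square, and Rect classes and the assumption that the profile picked Square.area() as the hot target are invented for this sketch; a real JIT emits the guard in machine code and may use a different check than a class comparison.

```java
// Hand-written imitation of speculative inlining; all types are hypothetical.
final class SpeculativeInliningSketch {
    static class Shape {
        int area() { return 0; }
    }

    static final class Square extends Shape {
        final int side;
        Square(int side) { this.side = side; }
        @Override int area() { return side * side; }
    }

    static final class Rect extends Shape {
        final int w;
        final int h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        @Override int area() { return w * h; }
    }

    // Source-level view: a virtual call whose target is not known statically.
    static int total(Shape[] shapes) {
        int sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    // What compiled code could behave like if profiling said Square is hot.
    static int totalSpeculative(Shape[] shapes) {
        int sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {   // guard: is it the expected callee?
                Square sq = (Square) s;
                sum += sq.side * sq.side;         // inlined body of Square.area()
            } else {
                sum += s.area();                  // fallback: regular virtual call
            }
        }
        return sum;
    }

    // Small mixed input so that both the guard hit and the fallback are exercised.
    static Shape[] sample() {
        return new Shape[] { new Square(3), new Rect(2, 5), new Square(4) };
    }

    public static void main(String[] args) {
        System.out.println(total(sample()) == totalSpeculative(sample()));
    }
}
```

The guard costs one comparison per call site, which pays off only when the guess is usually right. That is why the runtime profile matters.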
Anything I can do, I do better than you!
When I started this discussion, I was talking about Java native methods, not ordinary C code. For native methods, the performance impact is even worse. This is because JITs do not know what goes on in a native method (with one exception explained below). Having a native method in a call chain essentially stops a JIT inliner at the point where the native method is encountered. It cannot inline past the native method.
Secondly, in order to be at least somewhat portable, native methods need to be written using an API like the Java Native Interface (JNI). If you look at the JNI specification, you'll see that every access to Java data structures like methods and fields is made through an indirect function call through the JNI environment interface. Hence, if your method needs to manipulate Java object fields on a regular basis, the field accesses will incur a lot of overhead in terms of multiple function calls. In contrast, JIT compiled code will be able to access the same object field just like C accesses a struct field, i.e. in a few machine instructions or even a single one. The JNI overhead is on the order of something like 20x to 100x. That's a stiff penalty.
To be proper and correct, a lot of JNI API calls need to be followed by an explicit exception check using JNI's ExceptionCheck() API. For example, this needs to be done after calls to all method invocation and allocation APIs. This is because the VM has a right to throw an OutOfMemoryError at any of these junctures. Without the check, the native method may be proceeding in an unstable environment, or using memory where it is unavailable. That would simply be bad programming. These checks will add additional overhead.
In JIT compiled code, these checks are not explicit, i.e. no time is spent on executing any such checks. The VM simply handles the exception automatically. The only case where the VM can't handle it automatically is when it encounters a native method in the call chain. Go figure! That's when it has to hand control over to the native method, and let the native method do the check explicitly.
Apart from these, there are other costs to using native methods that hurt performance. These include marshalling costs of method arguments and return values. Remember that CVM uses 2 stacks: a native stack, and a Java stack. Arguments will need to be marshalled across these stacks. Even JavaSE's VM, which uses only one stack, will incur additional marshalling costs when crossing from Java code to native, and vice versa. However, its cost is less than CVM's.
Another cost is the cost of VM state changes. In CVM, a thread is always in one of 2 states: GC safe or GC unsafe. I'll leave the meaning of these states for a later day when I discuss the GC (garbage collector). But for reasons that are necessary but not explained here, native methods need to operate in a mostly GC safe state, while for performance reasons, compiled code operates in a mostly GC unsafe state. Going across that boundary from compiled Java code to a native method will incur some additional overhead in terms of these state changes. The JavaSE VM also has similar (but different) VM states, and the issue exists there as well.
So, if you don't have a good reason for using a native method, then don't. Chances are your native code will be slower, and you may also get sloppy and introduce bugs (such as forgetting to do that exception check at the proper place). Not only can native methods not be optimized by the JIT, they also prevent the JIT from doing as good a job as it can when there are no native methods in the call chain.
Why Bother?
So, why allow native methods at all? At some point, we will have to access OS or hardware resources. Even the Java libraries that provide access to these need to do some native work at some point. If the Java libraries do not provide access to these resources, then that may be a case when a native method is warranted. Chances are there will already be such APIs. If not, perhaps you should get involved with the JCP process and get a JSR together to handle the missing piece. You could go with the native method solution, but as I pointed out, there can be significant cost to doing so.
For the application developer, it is usually best to just write in the Java language. There is one exception to this rule of thumb. If the type of computation that needs to be done by the native method does not involve a lot of access to Java objects and does not call Java methods frequently, then you may have a candidate for a native method. First, you should make sure that there is actually a benefit in writing that method as a native. An example of such a method would be a ZIP library method that decompresses a compressed file. The decompression works purely on a C array and does not return to the Java domain until its job is done. In other words, in order to do its job, it does not need to cross that boundary between the C and Java domains frequently. Generally, one big source of the performance cost of native methods comes from the crossing of that domain boundary.
Use the Standard Java Libraries
Note that the standard class libraries already come with a java.util.zip package to let you do that zipping stuff. In general, I think the standard class libraries usually already provide you all the APIs you will need to access OS and hardware resources, or to do computationally intense operations (like the ZIP operations).
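To make the boundary-crossing point concrete, here is a minimal sketch that uses java.util.zip.Deflater and Inflater on whole buffers. The class name, the helper names, and the 8 KB chunk size are arbitrary choices for this example. The idea is simply to hand the library a large array per call, so that whatever native work sits underneath is entered a handful of times rather than once per byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: compress and decompress in bulk to keep library calls coarse-grained.
final class BulkZipSketch {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);   // the whole input goes in with one call
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();             // release resources held outside the Java heap
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        byte[] packed = compress(data);
        System.out.println(packed.length + " bytes after compression");
        System.out.println(java.util.Arrays.equals(data, decompress(packed)));
    }
}
```

Writing to a compressing stream one byte at a time produces the same output but makes a library call per byte, which is the fine-grained pattern the text warns against.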
There is another incentive to use the standard libraries instead of rolling your own native. I explained that native methods are not only themselves not optimal, but also stop the JIT from doing its work optimally. This is because the JIT has no knowledge of what's in the native methods. However, for native methods in the standard class libraries which need to be native, there is a solution that can allow the JIT to still perform optimally. This solution is the use of intrinsic methods (commonly abbreviated as intrinsics). Support for intrinsics is available both in CVM and the JavaSE VM.
Intrinsics are methods whose semantics are already known by the VM and its JIT at VM development time. Hence, the only native methods that can be intrinsics are the ones in the standard libraries. Amongst other benefits, intrinsics effectively allow JITs to inline native methods (in some form) where it couldn't be done before. Hence, intrinsics allow native (OS and hardware) resources to be accessed without necessarily incurring all the costs of a native method. JITs will still be able to generate optimal code around calls to these intrinsic native methods. You won't be able to get this benefit if you roll your own native methods to access those resources.
Having said this, not every native resource can be accessed through intrinsics. The choice of which methods to make intrinsic depends on how the VM and library developers choose to optimize for the device. There are tradeoffs involved and a cost to having intrinsic methods. Therefore, it is neither possible nor wise to make every method in the standard library an intrinsic. However, this choice is usually made only by platform developers. Application developers have no direct control over this, nor any awareness of which methods are intrinsics.
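One commonly cited candidate is System.arraycopy, which is declared native in the standard library. Whether a particular VM treats it as an intrinsic is an assumption here, since that is a platform decision the application cannot see. The sketch below only shows that preferring the library call costs nothing in correctness: the hand-rolled loop and the library call produce the same result, and the library call leaves the VM free to substitute its own optimized code.

```java
import java.util.Arrays;

// Sketch: prefer the standard library call that a VM may treat as an intrinsic.
final class IntrinsicSketch {
    // Hand-rolled copy: plain Java that the JIT compiles like any other loop.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Library copy: a native method whose semantics the VM may already know.
    static int[] copyWithLibrary(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = new int[1000];
        for (int i = 0; i < src.length; i++) {
            src[i] = i * 3;
        }
        System.out.println(Arrays.equals(copyWithLoop(src), copyWithLibrary(src)));
    }
}
```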
Final Word
In the above, I talked about some of the costs incurred with using native methods. This discussion is by no means exhaustive, but it is adequate to illustrate the issue. In general, for the application developer, using native code means getting less of the good stuff and more of the headaches. For platform / device manufacturers, native code is a necessary pain. But intrinsics can at least help with some of the performance impact of having native methods.
So, beware of natives. :-)
Comments
Comments are listed in date ascending order (oldest first)
-
Wow, a lot of information on this topic, thanks. Anecdotally, I remember being endlessly amused when my co-workers decided that static-compiling their server-side app with GCJ would significantly improve the speed, and when they were done they found that the increase was, at most, 5%.
Posted by: invalidname on December 07, 2006 at 03:15 PM
-
These are old issues. The issues for the future are the lack of explicit parallelism in Java and the fact that with 64bit addressed memory the garbage collector will crowd out the running code.
Posted by: bellbux on December 08, 2006 at 04:41 AM
-
Sun continues to make speed based arguments to convince people not to use native code. There are many people like me who are forced to use native code because of the paltry development tools available for compiling legacy C, C++, and Fortran on the JVM. If you have a large codebase in one of those languages, the best solution would often be to recompile it to bytecode, if the tools existed. There are a few tools to support C on the JVM, but it doesn't really compare to C on Windows or Java on the JVM.
Posted by: coxcu on December 08, 2006 at 05:38 AM
-
Hi bellbux,
Yes, these are old issues. I had previously started the article out with some background stating that this subject has been explained by many experts before. I cut that part out in my edit because I didn't want the article to be too lengthy. Regardless, I thought that for those people who have not heard of the issues before, it would be informative. Plus, it was a means for me to explain more about the workings of CVM internals to the JavaME community (which is one of my goals). These issues may be more relevant to platform developers in the JavaME space than to you, presumably. Sorry that it didn't do anything for you.
Regarding the explicit parallelism and 64-bit issues, those are problems in the JavaSE domain that I am not an expert on. But I believe that there is the java.util.concurrent package in Java 1.5 that caters to explicit parallelism. And a quick google tells me that folks have been looking at supporting NUMA architectures as far back as 2000. I'm guessing that NUMA is one aspect of addressing collection of a large 64-bit heap. Again, I am not an expert in these areas, nor am I knowledgeable of the extent of the support/solutions for them available today.
Regards, Mark
Posted by: mlam on December 08, 2006 at 07:17 AM
-
Thanks for the blogs. I found all the VM design entries to be quite interesting reading, even though I do all my coding for SE. :-)
Posted by: afishionado on December 08, 2006 at 08:34 AM
-
Dynamic compilation from HotSpot is a marvelous technology, but it lacks one small item that would weaken the arguments of static compilation defenders: memory between runs. When a Java application starts anew, HotSpot has to find from scratch where the hot methods are, again and again at every run. The information collected from a previous run is not kept to speed up the compiler and its optimizations for the next run. And this could be one of the reasons Java applications are so slow to start, a recurring complaint from users of Java applications...
It's like you take your car every morning to go to your office. At some point of the drive, you have to take a decision to go on the left or the right road. The first morning, you select to turn right; the next one, you take the left path. Based on your experience, you decide to always take the left road because you spend less time in your car. All is well up to the time when there is a congestion for a few days on your usual "left" road and you decide to turn right and discover that now this path is better.
This is dynamic decision, and based on memory from previous experiences, you are able to take optimal decisions in less time.
The HotSpot compiler is able to take such decisions, but every day, as a driver, it has to go on both left and right paths to decide which one is best, spending a lot of energy in the process.
A static compiler like C or C++ is like a driver who has been conditioned to always use the left road because at the time both roads were constructed, the left one was the shortest.
If HotSpot kept profiling information and executable code from previous runs...
Posted by: genepi on December 08, 2006 at 10:57 AM
-
Mark,
Just because they are old doesn't mean they are uninteresting. I just meant to say that it would also have been interesting to look to the future. Or flag it up as a J2ME specific article in the heading.
To exploit parallelism you need to be able to control which processor your data structures are stored on. Otherwise you waste a lot of time fetching data from other processors before you can perform an operation. Efficient parallel algorithms keep data in the neighbourhood of the processor that is going to perform operations. NUMA will help with this. But Java's emphasis on shielding you from the machine will limit the effectiveness.
As for 64 bit. If you have 16000000000000000000 bytes of addressable memory ... then the garbage collector is going to be pretty strained scanning through that lot. That is assuming all 64 bits are used. It is only just over a decade since we hit the limit of 16 bits. So within twenty years or so we'll have 64 addressable bits worth of memory actually in use.
Posted by: bellbux on December 08, 2006 at 10:59 AM
-
Hi coxcu,
Sorry I couldn't reply earlier. Java.net was down for a while. Anyway, I apologize if my article gave you the wrong impression. Just to clarify, I am not Sun's spokesman, and my blog is not Sun's soapbox. It is mine (though I happen to work at Sun). :-) Also, I am not trying to convince people to not use native code categorically. The article went into details about what happens behind the scenes specifically in order that the developer will be able to make a more informed decision.
There are certainly good reasons for using native code, and your case of getting Java code to work with legacy code is one good reason ... I call it the "glue" reason (as in glue to legacy code). Sorry, I forgot to talk about that (it escaped my mind). I say "good" not because it is easy, or desirable from a performance standpoint. But "good" because under real-world development schedules and budgets, you may not always be able to do a re-write of legacy code using the Java language. Now, that sounds outrageous ... to have to rewrite all that legacy code. There will be significant development cost, a risk of introducing bugs and destabilizing a system, and a whole host of other negatives that may or may not apply depending on your situation. Yes, I know.
But simply compiling C (or C++ or Fortran) code directly into Java bytecodes may not yield the same level of performance one gets when compiling C to machine code. This is because the programming paradigms don't match exactly. C may look like the Java language but the resemblance doesn't go much further beyond its syntactic appearance. To do a proper translation job requires taking the difference in paradigms into account. This is a lot more difficult to do than conventional compilation. Maybe this is why tools for such purposes are scarce. But I am just making an educated guess here. I don't work with such translators/compilers and therefore don't have detailed knowledge of the degree of efficacy they can achieve.
What is left is the common way (which you are stuck with) of interfacing legacy code via native glue. To get performance there, my advice (the same one I give to JavaME developers) is to minimize code flow transitions across the native and Java domains. That's where both the native and Java sides lose performance. If you are able to use the legacy code as a somewhat self-contained service (it's like an RPC to a remote service), then the native code can retain its performance, and the boundary transition penalty can be relatively infrequent and negligible.
Hence, writing native methods is also an art of choosing where to draw that boundary between the native and Java domains. Draw it too high (less Java code), and you will get less portability. Draw it too low (a lot of rewrite into Java code, but not a complete one), and you will incur a high native boundary crossing penalty. The ideal line is drawn somewhere in between. There is, of course, the option to rewrite everything in the Java language i.e. abandon the old system, and upgrade. While some people do this, not everyone can afford the upgrade. Sorry. It's an imperfect world. But thanks for your input.
Regards, Mark
Posted by: mlam on December 08, 2006 at 12:41 PM
-
Hi genepi,
Those are interesting ideas. And my team (the JavaME CDC team) has actually talked about it before. Implementation, however, is another issue. The compiled code that is most optimal is based on code paths that have been executed multiple times. This means that certain constant pool entries have been resolved, static initializers have been run, certain objects have been instantiated and initialized in memory, and that certain hardware resources have been initialized in a certain way. If any of these are not persisted to the next run as well (or re-initialized properly then), the compiled code from the previous run could/would be invalid. Hence, it takes more than persisting the profiling information and the compiled code.
In C or C++, these additional artifacts would manifest as global variables for the data, or initializers that get run unconditionally in some order for the initialization process. It's not like you get them for free. The only difference is that you don't incur the compilation time, and the allocation (since C/C++ can use globals).
But certainly, there may be something that can be done to lessen the repetition. Thanks again for your input.
Regards, Mark
Posted by: mlam on December 08, 2006 at 12:58 PM
-
Hi bellbux,
Thanks for your input on parallelism and the 64bit heap issues. Sorry, I wasn't able to say something about them in the first place. I wasn't even conscious about issues in those domains.
I also apologize if I have misrepresented the article. While editing for brevity, I did cut out a lot of introduction text that could have framed the article better. However, while I speak mostly from the perspective of the CVM, I did intend to give some perspective on how wide spread the issues are by pointing out their presence or absence in other VMs. The issues are applicable to JavaSE (to some degree) though I see that I don't have the complete picture there yet. Thanks again for your input.
Regards, Mark
Posted by: mlam on December 08, 2006 at 01:15 PM
-
Mark,
Your article was great, don't be hard on yourself. It was just that I came to the article via the front page of java.net, which provided no hint it was a J2ME article. It's something to bear in mind for the web - people can come to your article from all sorts of places with no context - so intro paras are rarely a waste of space!
Posted by: bellbux on December 09, 2006 at 01:57 AM
-
I'm truly sorry for posting this publicly, but I couldn't find any other way to contact you. I don't mean to embarrass you, but I think you chose an unfortunate example with ZLIB. I've known for a while that native zlib is actually slower than a pure Java implementation. Your blog convinced me to finally report the issue to Sun (see below). If your initial reaction is that "this guy is crazy", then I'm in good company. I didn't believe it myself. I encourage you to run the test program from the incident report. Try it with multiple threads on a multi-processor machine as well. I'm not affiliated with the JZLib project, other than being a satisfied user (after much pain and suffering caused by the native zlib implementation that has memory leaks, see this: Bug 4797189).
Maybe as a Sun engineer you could use your influence to have this looked at. It would be a poster child for JVM technology. Just think of the headlines: Pure Java code faster than hand coded assembly!
Again, sorry for sticking my tongue at you in public. It was only meant in the best spirit of open information exchange.
Thanks
Moh
---------------------------------------------------------------
Your Report (Review ID: 860642) - Deflater and Inflater in java.util.zip use slow native code
Date Created: Sat Dec 09 09:57:17 MST 2006
Type: rfe
Customer Name: Mohammad Rezaei
SDN ID: rezaei
status: Waiting
Category: java
Subcategory: classes_util
Company: none
release: 5.0
hardware: x86
OSversion: win_xp
priority: 4
Synopsis: Deflater and Inflater in java.util.zip use slow native code
Description:
A DESCRIPTION OF THE REQUEST :
java.util.zip has been available since Java 1.0. At that time, it was necessary to use native code for performance reasons. However, since then, JVM technology has progressed to a point where native code is actually slower than pure Java code. This can easily be demonstrated using the following test program. The program requires a port of zlib to pure Java, currently available as an open source project (JZlib, http://www.jcraft.com/jzlib/).
On my machine, the program runs 3x faster with pure Java code.
JUSTIFICATION :
1) Native code is slower (see above example)
2) Native code is insecure. There have been several fixes in zlib over the past few years having to do with buffer overflow. In a pure Java implementation, there can be no such attack vector. A Java program using native code is at the mercy of the locally installed zlib version.
3) Native code is unpredictable. There is no guarantee that the version of native code is the one the program was tested with.
4) Deflater and Inflater have been the cause of many bugs, mostly having to do with memory leaks and performance. These are a direct result of using native code, as the garbage collector cannot reclaim natively allocated memory. A pure Java implementation does not suffer from the same. (6487640, 4813885, 6293787 and many more)
5) Zlib compression is critical to many aspects of not only application programs, but also Java itself. Application startup time can potentially be improved with better performance.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Please replace the native calls with pure Java.
ACTUAL -
Calling native code causes poor performance, memory leaks and platform dependence.
---------- BEGIN SOURCE ----------
package test;
import com.jcraft.jzlib.ZOutputStream;
import com.jcraft.jzlib.ZInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.*;
public class TestZlib extends Thread
{
private static final int DEFAULT_MEMORY_SIZE = 25*1024*1024; // 25 megs
private static final int DEFAULT_THREADS = 1;
private int memorySize;
public TestZlib(int memorySize)
{
this.memorySize = memorySize;
}
public static void usage()
{
System.out.println("usage: TestZlib <memory per thread> <number of threads>. Defaults are 1 thread and 25000000 (25M)");
}
public void run()
{
byte[] incoming = createInput();
while (true)
{
try
{
long now = System.currentTimeMillis();
int standard = runTestWithStandardLib(incoming);
long delta = System.currentTimeMillis() - now;
System.out.println("Standard Java libraries that call native code took "+delta+" ms. Compressed size "+standard);
now = System.currentTimeMillis();
int jzlib = runTestWithJzlib(incoming);
delta = System.currentTimeMillis() - now;
System.out.println("Pure Java JZLib took "+delta+" ms. Compressed size "+jzlib);
}
catch (Throwable t)
{
System.out.println("caught exception: " + t.getClass().getName() + ": " + t.getMessage());
t.printStackTrace();
}
}
}
private int runTestWithStandardLib(byte[] incoming)
throws IOException
{
ByteArrayOutputStream bos = new ByteArrayOutputStream(1000);
DeflaterOutputStream zos = new DeflaterOutputStream(bos, new Deflater(5, false));
for (int i = 0; i < incoming.length; i++)
{
zos.write(incoming[i]);
}
zos.close();
byte[] buf = bos.toByteArray();
ByteArrayInputStream bis = new ByteArrayInputStream(buf);
InflaterInputStream zis = new InflaterInputStream(bis);
int bytesRead = 0;
int totalRead = 0;
while ((bytesRead = zis.read(incoming, totalRead, incoming.length - totalRead)) > 0)
totalRead += bytesRead;
if (totalRead < memorySize)
{
System.out.println("did not decompress the same number of bytes!");
}
return buf.length;
}
private int runTestWithJzlib(byte[] incoming)
throws IOException
{
ByteArrayOutputStream bos = new ByteArrayOutputStream(1000);
ZOutputStream zos = new ZOutputStream(bos, 5);
for (int i = 0; i < incoming.length; i++)
{
zos.write(incoming[i]);
}
zos.close();
byte[] buf = bos.toByteArray();
ByteArrayInputStream bis = new ByteArrayInputStream(buf);
ZInputStream zis = new ZInputStream(bis);
int bytesRead = 0;
int totalRead = 0;
while ((bytesRead = zis.read(incoming, totalRead, incoming.length - totalRead)) > 0)
totalRead += bytesRead;
if (totalRead < memorySize)
{
System.out.println("did not decompress the same number of bytes!");
}
return buf.length;
}
private byte[] createInput()
{
byte[] incoming = new byte[memorySize];
for (int i = 0; i < incoming.length; i++)
{
incoming[i] = (byte) (Math.sin(i) * 256);
}
return incoming;
}
public static void main(String[] args)
{
int numThreads = DEFAULT_THREADS;
int memSize = DEFAULT_MEMORY_SIZE;
try
{
if (args.length > 0)
{
memSize = Integer.parseInt(args[0]);
}
if (args.length > 1)
{
numThreads = Integer.parseInt(args[1]);
if (numThreads > 200)
{
System.out.println("that's too many threads!");
}
}
}
catch (NumberFormatException n)
{
System.out.println("quit fooling around!");
usage();
System.exit(1);
}
System.out.println("running with " + numThreads + " threads and " + (memSize / 1024 / 1024) + "M per thread");
for (int i = 0; i < numThreads; i++)
{
new TestZlib(memSize).start();
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Use JZlib in application code.
workaround:
Posted by: mohrezaei on December 09, 2006 at 09:44 AM
-
Hi Moh,
Thanks for the info. And I'm not embarrassed. :-) I think it's actually better that this information is disclosed out in the open rather than emailed to me personally. It will do more good this way. Regarding my influence as a Sun engineer, I actually don't have a lot more influence than you as a member of the community. In fact, nowadays, with the Java platform being open-sourced, you can actually implement this change yourself. The decision as to whether to accept the change or not is based on the merit of the change itself. This is the power of open source, and one of the major reasons to open-source the Java platform ... you, an individual in the community, can change the JDK. You don't have to be a Sun employee. And everyone benefits mutually.
BTW, though I'm intrigued by what you've pointed out, just to be clear, this is not an endorsement of the proposed changes. They still need to go through the proper submission and review process. The same goes for proposed changes from within Sun too. It's just to make sure that all the kinks have been ironed out. Thanks again.
Regards, Mark
Posted by: mlam on December 09, 2006 at 10:11 AM
-
After a bit of experimentation, I found out that the zlib native code can be faster than the pure Java implementation. The piece of code that exposes the weakness of the native code is this:
for (int i = 0; i < incoming.length; i++)
{
zos.write(incoming[i]);
}
Sending one byte at a time slows down the native code a lot more than the pure Java code. Sending 16 bytes at a time will show equal performance for both native and pure Java. At 128 or more, the native code approaches the steady state, which is about 2x faster than the pure Java code.
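For anyone who wants to try this themselves, here is a minimal sketch of the batching described above. It uses only java.util.zip and reuses the sine-based input from the test program. The class name ChunkedDeflateSketch, the helper methods, and the list of chunk sizes are made up for illustration and are not part of the original test. It only measures the native-backed Deflater path, and the timings it prints will vary by VM and machine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of the original TestZlib program.
public class ChunkedDeflateSketch {

    // Same synthetic input pattern as createInput() in the test program.
    public static byte[] createInput(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (Math.sin(i) * 256);
        }
        return data;
    }

    // Compress 'input', handing the stream 'chunkSize' bytes per write() call.
    // chunkSize == 1 behaves like the original byte-at-a-time loop.
    public static byte[] compress(byte[] input, int chunkSize) throws IOException {
        Deflater deflater = new Deflater(5, false);
        ByteArrayOutputStream bos = new ByteArrayOutputStream(1000);
        try {
            DeflaterOutputStream zos = new DeflaterOutputStream(bos, deflater);
            for (int off = 0; off < input.length; off += chunkSize) {
                int len = Math.min(chunkSize, input.length - off);
                zos.write(input, off, len);
            }
            zos.close();
        } finally {
            // A Deflater passed in by the caller is not released by close().
            deflater.end();
        }
        return bos.toByteArray();
    }

    // Inflate 'compressed' back into a buffer of at most 'expectedSize' bytes.
    public static byte[] decompress(byte[] compressed, int expectedSize) throws IOException {
        InflaterInputStream zis = new InflaterInputStream(new ByteArrayInputStream(compressed));
        byte[] out = new byte[expectedSize];
        int total = 0;
        int n;
        while (total < expectedSize && (n = zis.read(out, total, expectedSize - total)) > 0) {
            total += n;
        }
        zis.close();
        return Arrays.copyOf(out, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] input = createInput(1000000);
        int[] chunkSizes = {1, 16, 128, 4096}; // illustrative values only
        for (int chunk : chunkSizes) {
            long start = System.currentTimeMillis();
            byte[] packed = compress(input, chunk);
            long delta = System.currentTimeMillis() - start;
            boolean ok = Arrays.equals(input, decompress(packed, input.length));
            System.out.println("chunk " + chunk + ": " + delta + " ms, compressed size "
                    + packed.length + ", round trip ok: " + ok);
        }
    }
}
```

Each write() call on the DeflaterOutputStream ends up crossing into native code, so larger chunks mean fewer crossings for the same amount of data. Swapping in JZlib's ZOutputStream inside compress() should give the pure Java side of the comparison.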
So maybe ZLIB is not a terrible example from a performance perspective. It does, however, expose all the other problems of native code (memory leaks, insecurity, platform dependence, etc).
Thanks
Moh
Posted by: mohrezaei on December 09, 2006 at 10:11 AM
-
Mark, I don't quite follow the statement:
"Even JavaSE's VM which uses only one stack..."
Do you mean that Java SE's VM has a mixed stack that contains Java and native frames? As per the VM spec and AFAIK (given my limited knowledge) of Sun's implementation, aren't the Java stack and the native stack distinct & different? Can you elaborate please?
Posted by: bharathch on December 09, 2006 at 11:51 PM
-
Hi bharathch,
My article is one of many in a series that talks about the internals of the phoneME Advanced VM (CVM) for JavaME CDC. The context for that statement is based on information that I disclosed in previous articles, especially this one. However, I only explained JavaSE's single stack approach very briefly there. But essentially what I meant is that the JavaSE VM pushes and pops Java frames on the physical native stack (based on my limited understanding ... Disclaimer: I work in JavaME, not JavaSE). I guess you can call it mixing (though the term "mixing" does not capture all the details / nuances of this arrangement). The VM spec section you pointed out only states the need (or lack thereof) for a native stack which is, conceptually (in abstraction), different than the Java stack. In implementation, the VM can choose to use whatever data structure it wants to implement this abstraction. In the JavaSE VM's case, it uses the same physical native stack to implement the 2 logical stacks. Hence, a single "physical" stack in contrast with CVM's dual "physical" stacks.
Regards, Mark
Posted by: mlam on December 10, 2006 at 08:06 AM