
Beware of the Natives

Posted by mlam on December 7, 2006 at 2:11 PM PST

There are a lot of not so nice things about using native methods. Here are some:

  • less safe - think "stray pointers".
  • less portable - you'll have to recompile them for every target device architecture you deploy on, present and future.
  • less cost effective - need extra work to build and test all the architecture variations, extra disk storage for deploying all the different binary versions, etc.
  • less manageable - can be a "binary version" tracking, management, and device provisioning nightmare.

But if these reasons aren't enough to deter you from using native methods, try this on for size:

Native code can hurt performance

This seems to go against most people's expectations, but it is the truth. First of all, there is the overhead of what goes on in the runtime stacks when you invoke native methods. I've talked about that in my previous articles (here, here, and here). There, I showed that invoking native methods incurs bootstrapping and extra frame pushing/popping overhead, which degrades performance. But there are many other reasons besides this.

To be fair, native code can be used to help improve performance when used in the right places. I will explain those cases as well. The key is to use native code "carefully".

Ok, let's go bust the "native" myth ...

Anything you can do, I can do better!

Having a dynamic adaptive compiler (JIT) means more than simply translating bytecodes into equivalent machine instructions one bytecode at a time. Such a naive translation strategy is more like an assembler than a compiler, and usually yields anywhere from less than 2x to 5x performance gains. The 5x figure at the high end is a generous estimate, and can only be realized if the method being compiled does nothing but some very simple arithmetic in a small loop. Any real-world application, even a HelloWorld program, will not realize that kind of gain. Even the smallest JITs today, such as the one in the phoneME Feature (CLDC) VM, can do better than this.

The phoneME Advanced (CDC) VM, CVM, has a JIT that is far more sophisticated than that. The CVM JIT can yield gains from 4x to 14x (over interpreted bytecodes) depending on the application. The high end numbers are for very simple applications like the small naive loop. Note: these numbers are very rough estimates that I made using an educated guess based on internal measurement results. Actual performance numbers may vary (lower or higher) depending on the device architecture and benchmarks used.

But is this enough? Can JIT compiled code perform as well as or better than native code? Well, first we need to understand how the JIT gives us some of these huge gains. I'll highlight 2 aspects:


Method Inlining

When a JIT compiles a method, it can choose to inline a method that is called from the method being compiled. For example, say method mA calls method mB, and we're compiling mA. In this case, mB can be inlined into mA as part of mA's compilation. Inlining reduces call overhead between methods (e.g. for pushing/popping frames). Also, by inlining the code of mB into mA, the 2 pieces are now placed in closer proximity. This improves cache locality, which helps performance. Since mA is likely to call mB, it would hurt performance if the 2 were far apart.
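To make the idea concrete, here is a toy Java sketch (the method names and bodies are hypothetical, and the "after" version is hand-written to show the concept, not actual JIT output):

```java
public class InlineSketch {
    static int mB(int x) {
        return x * 2;
    }

    // Before inlining: mA pays call overhead (frame push/pop) on every call to mB.
    static int mA(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += mB(i);
        }
        return sum;
    }

    // After inlining: the JIT conceptually rewrites mA so that mB's body
    // sits inside the loop -- no call overhead, and better cache locality.
    static int mAInlined(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * 2;   // body of mB, inlined
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(mA(10) == mAInlined(10)); // prints "true"
    }
}
```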

Runtime Profiling

Unlike static compilers (like C compilers), a JIT makes use of runtime profiling techniques to determine what methods and/or code paths are hot (i.e. frequently used). And the JIT will only compile the methods that are hot.

In CVM, each method has an invocation counter that tracks its "hotness". The JIT inliner actually factors the hotness of callee methods into its determination of whether to inline or not. Hence, hotter callee methods are more likely to be inlined, and less hot methods are less likely to be. This means that the inlining is selective of the hot code paths taken by the application.
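A toy model of the counter scheme (the threshold, the map-based bookkeeping, and the method names here are all hypothetical; this is not CVM's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of invocation-counter-based hotness tracking.
public class HotnessModel {
    static final int COMPILE_THRESHOLD = 1000; // hypothetical threshold

    // One counter per method, bumped by the interpreter on every invocation.
    static final Map<String, Integer> counters = new HashMap<>();

    // Returns true once the method has become hot enough to hand to the JIT.
    static boolean invoked(String method) {
        int count = counters.merge(method, 1, Integer::sum);
        return count >= COMPILE_THRESHOLD;
    }

    public static void main(String[] args) {
        boolean hot = false;
        for (int i = 0; i < 1500; i++) {
            hot = invoked("mB") || hot;
        }
        System.out.println(hot); // prints "true" -- mB is hot, so compile it
    }
}
```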

For example, mA calls mB1 and mB2. mB1 calls mC1. mB2 calls mC2. From profiling, we know that mA calls up to mC2 a lot via mB2, but not so much to mC1. When compiling mA, this allows us to choose to inline mB1, mB2, and mC2 into mA, but not mC1.

This allows us to invest the cost of inlining where it will yield us the most performance gains, while not incurring the cost as much in other lower yield areas.
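Continuing the mA/mB1/mB2 example, here is a hand-written sketch of what profile-guided selective inlining might produce (all bodies are hypothetical; mACompiled is written by hand to show the shape of the result):

```java
public class SelectiveInlining {
    static int mC1(int x) { return x - 1; }       // cold: rarely reached
    static int mC2(int x) { return x + 1; }       // hot
    static int mB1(int x) { return mC1(x) * 3; }  // hot enough to inline itself
    static int mB2(int x) { return mC2(x) * 5; }  // hot

    static int mA(int x) { return mB1(x) + mB2(x); }

    // What the JIT conceptually produces for mA:
    static int mACompiled(int x) {
        int a = mC1(x) * 3;      // mB1 inlined, but cold mC1 left as an out-of-line call
        int b = (x + 1) * 5;     // mB2 and hot mC2 both inlined
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(mA(7) == mACompiled(7)); // prints "true"
    }
}
```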

No, you can't!

But can't native methods also do the same? Yes, to some extent, if you are talking only about C functions and not Java methods that are native. C compilers do have inlining. But there are problems as we'll see below.

There are also profilers for C code. But applying these to Java native methods is something else.

Yes, I can!

C compilers can certainly do some inlining. The issue is: which callee functions should be inlined? The choice is seldom known to the C compiler, because compilation is done at build time rather than at runtime. To get around this problem, people do use native profilers to capture a picture of the application's hot methods and paths. This profile is then fed back into the C compiler to guide its inlining and optimizations. This is all done at development time.

The problem with this is that it assumes that the captured profile is representative of the actual execution profile of the application at runtime under real usage. A lot of times, this static approach does yield fairly good results. But it is no substitute for what a JIT can do with a runtime profiler if the application is very dynamic in nature.

Now, factor in that the Java platform is a dynamic environment where new code can be downloaded at runtime. In other words, the callee method may not even be known or available to the C compiler and/or profiler at development time. There is no option to tweak the optimizations for executing code that doesn't exist yet.

Another issue is the Java language's support for virtual methods. With virtual methods, the C compiler won't be able to know which callee method to inline. Most modern JITs (including CVM's) employ a technique called speculative inlining to solve the problem of inlining virtual methods. What happens is that the JIT will inline an expected callee method based on its hotness. When the compiled method is executed, it will first check if the method to be called is the expected one that was inlined. If so, it will proceed with executing the inlined code. Otherwise, it will do a regular virtual invocation. There are also many other variations on this scheme, but the basic idea is the same.
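Here is a hand-written Java sketch of the guard that speculative inlining conceptually emits (class names are hypothetical, and a real JIT does this at the machine-code level, not in source):

```java
public class SpeculativeInline {
    interface Shape { int area(); }

    static class Square implements Shape {
        final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    static class Circle implements Shape {
        public int area() { return 0; } // placeholder body
    }

    // What the JIT conceptually generates when profiling says the
    // receiver is usually a Square:
    static int areaSpeculative(Shape s) {
        if (s.getClass() == Square.class) {
            // fast path: inlined body of Square.area()
            int side = ((Square) s).side;
            return side * side;
        }
        // slow path: fall back to a regular virtual invocation
        return s.area();
    }

    public static void main(String[] args) {
        System.out.println(areaSpeculative(new Square(3))); // prints "9"
        System.out.println(areaSpeculative(new Circle()));  // prints "0"
    }
}
```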

This speculative inlining, together with runtime profiling, allows virtual methods to be inlined efficiently by JITs. It would be difficult for a C (or C++) compiler to implement speculative inlining in an efficient way without the help of a runtime profiler. One possible approach is to inline everything blindly, or to use some heuristics. The blind approach results in too much code bloat without necessarily yielding results; code bloat also hurts cache locality and, therefore, performance. The heuristics approach is a blind guess. You may have a winner or a loser, and you won't know for sure until after your code has been deployed.

Anything I can do, I do better than you!

When I started this discussion, I was talking about Java native methods, not ordinary C code. For native methods, the performance impact is even worse. This is because JITs do not know what goes on in a native method (with one exception explained below). Having a native method in a call chain essentially stops a JIT inliner at the point where the native method is encountered. It cannot inline past the native method.

Secondly, in order to be at least somewhat portable, native methods need to be written using an API like the Java Native Interface (JNI). If you look at the JNI specification, you'll see that every access to Java data structures like methods and fields is done through an indirect function call through the JNI environment interface. Hence, if your method needs to manipulate Java object fields on a regular basis, the field accesses will incur a lot of overhead in the form of multiple function calls. In contrast, JIT compiled code will be able to access the same object field just like C accesses a struct field, i.e. in a single (or a few) machine instructions. The JNI overhead is on the order of something like 20x to 100x. That's a stiff penalty.
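To illustrate, here is a sketch contrasting the two (the field and method names are hypothetical, and the C implementation of the native method is assumed and shown only in comments):

```java
// Contrast sketch: JNIEnv-mediated field access vs. JIT-compiled access.
public class FieldAccess {
    private int count;

    // The C side of this method (not shown) must fetch the field via the
    // JNIEnv, something like:
    //   jfieldID fid = (*env)->GetFieldID(env, cls, "count", "I");
    //   jint c = (*env)->GetIntField(env, obj, fid);
    //   (*env)->SetIntField(env, obj, fid, c + 1);
    // -- several indirect function calls just to bump one int.
    public native void incrementNative();

    // JIT-compiled code accesses the field directly, like a C struct
    // member: typically a single load/add/store sequence.
    public void incrementJava() { count++; }

    public int get() { return count; }

    public static void main(String[] args) {
        FieldAccess f = new FieldAccess();
        f.incrementJava();
        f.incrementJava();
        System.out.println(f.get()); // prints "2"
    }
}
```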

To be proper and correct, a lot of JNI API calls need to be followed by an explicit exception check using JNI's ExceptionCheck() API. For example, this needs to be done after calls to all method invocation and allocation APIs. This is because the VM has the right to throw an OutOfMemoryError at any of these junctures. Without the check, the native method may proceed in an unstable environment, or use memory that is unavailable. That would simply be bad programming. These checks add additional overhead.

In JIT compiled code, these checks are not explicit, i.e. no time is spent executing any such checks. The VM simply handles the exception automatically. The only case where the VM can't handle it automatically is when it encounters a native method in the call chain. Go figure! That's when it has to hand control over to the native method, and let the native method do the check explicitly.

Apart from these, there are other costs to using native methods that hurt performance. These include the marshalling costs of method arguments and return values. Remember that CVM uses 2 stacks: a native stack and a Java stack. Arguments need to be marshalled across these stacks. Even JavaSE's VM, which uses only one stack, incurs additional marshalling costs when crossing from Java code to native and vice versa. However, its cost is less than CVM's.

Another cost is that of VM state changes. In CVM, a thread is always in one of 2 states: GC safe or GC unsafe. I'll leave the meaning of these states for a later day when I discuss the GC (garbage collector). But for reasons I won't go into here, native methods need to operate in a mostly GC safe state, while for performance reasons, compiled code operates in a mostly GC unsafe state. Crossing that boundary from compiled Java code to a native method incurs some additional overhead in the form of these state changes. The JavaSE VM also has similar (but different) VM states, and the issue exists there as well.

So, if you don't have a good reason for using a native method, then don't. Chances are your native code will be slower, and you may also get sloppy and introduce bugs (such as forgetting to do that exception check at the proper place). Not only can native methods themselves not be optimized by the JIT, they also prevent the JIT from doing the best job it can, as it would when there are no native methods in the call chain.

Why Bother?

So, why allow native methods at all? At some point, we have to access OS or hardware resources. Even the Java libraries that provide access to these need to do some native work at some point. If the Java libraries do not provide access to the resources you need, then that may be a case where a native method is warranted. But chances are there will already be such APIs. If not, perhaps you should get involved with the JCP process and get a JSR together to handle the missing piece. You could go with the native method solution, but as I pointed out, there can be a significant cost to doing so.

For the application developer, it is usually best to just write in the Java language. There is one exception to this rule of thumb. If the computation that needs to be done by the native method does not involve a lot of access to Java objects and does not call Java methods frequently, then you may have a candidate for a native method. First, you should make sure that there is actually a benefit to writing that method as a native method. An example of such a method would be a ZIP library method that decompresses a compressed file. The decompression works purely on a C array and does not return to the Java domain until its job is done. In other words, in order to do its job, it does not need to cross the boundary between the C and Java domains frequently. Generally, one big source of the performance costs of native methods comes from crossing that domain boundary.

Use the Standard Java Libraries

Note that the standard class libraries already come with a package to let you do that zipping stuff. In general, I think the standard class libraries usually already provide you all the APIs you will need to access OS and hardware resources, or to do computationally intense operations (like the ZIP operations).
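For example, here is a minimal round trip through java.util.zip's Deflater and Inflater, with all the native work hidden behind the library's pure-Java API:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Round-trip compression using the standard java.util.zip package.
// The library does its native work internally, behind a Java API.
public class ZipDemo {
    static byte[] compress(byte[] data) throws Exception {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "Beware of the natives!".getBytes("UTF-8");
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip, "UTF-8")); // prints the original string
    }
}
```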

There is another incentive to use the standard libraries instead of rolling your own native methods. I explained that native methods are not only themselves not optimal, but also stop the JIT from doing its work optimally. This is because the JIT has no knowledge of what's in the native methods. However, for native methods in the standard class libraries which need to be native, there is a solution that can still allow the JIT to perform optimally: the use of intrinsic methods (commonly abbreviated as intrinsics). Support for intrinsics is available both in CVM and the JavaSE VM.

Intrinsics are methods whose semantics are already known to the VM and its JIT at VM development time. Hence, the only native methods that can be intrinsics are the ones in the standard libraries. Amongst other benefits, intrinsics effectively allow JITs to inline native methods (in some form) where it couldn't be done before. Hence, intrinsics allow native (OS and hardware) resources to be accessed without necessarily incurring all the costs of a native method. JITs will still be able to generate optimal code around calls to these intrinsic native methods. You won't get this benefit if you roll your own native methods to access those resources.
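A well-known example is System.arraycopy: it is declared native in java.lang.System, yet JITs commonly recognize it as an intrinsic (HotSpot does; I'm assuming similar treatment in other VMs) and emit an optimized copy inline, so callers don't pay the full native-method penalty:

```java
// System.arraycopy is declared native, yet JITs commonly compile calls
// to it as an intrinsic: an inlined, optimized memory copy rather than
// a full native-method invocation.
public class IntrinsicDemo {
    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        int[] dst = new int[5];
        System.arraycopy(src, 0, dst, 0, src.length);
        System.out.println(dst[4]); // prints "5"
    }
}
```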

Having said this, not every native resource can be accessed through intrinsics. The choice of which methods to make intrinsic depends on how the VM and library developers choose to optimize for the device. There are tradeoffs involved, and a cost to having intrinsic methods. Therefore, it is neither possible nor wise to make every method in the standard library into an intrinsic. However, this choice is usually made only by platform developers. Application developers have no direct control over this, nor any awareness of which methods are intrinsics or not.

Final Word

In the above, I talked about some of the costs incurred with using native methods. This discussion is by no means exhaustive, but it is adequate to illustrate the issue. In general, for the application developer, using native code means getting less of the good stuff and more of the headaches. For platform / device manufacturers, native code is a necessary pain. But intrinsics can at least help with some of the performance impact of having native methods.

So, beware of natives. :-)
