|
|
||
Mark Lam's BlogCVM JIT Constant Pool DumpsPosted by mlam on March 12, 2008 at 11:52 PM | Comments (8)Hello World! It's been a long time ... ummm ... like 6 months since I last wrote an entry. What can I say? That's the problem with having a day job, and so far, all the ideas for things that I want to write about involves some heavy duty writing that will take up a lot of time. So, I've been putting it off. Sorry. However, this inquiry came in today on one of my previous blog entries. Now, this, I can answer without taking up a few days of writing time. So, here you are ... The Question
I took some liberty with editing the comment for clarity. Jamsheed, I hope you don't mind. What's a Constant Pool? Instead, Jamsheed is referring to constants that are referenced by code that the CVM JIT compiler generates. These constants may be the same values that are fetched from the classfile constant pool, but that's beside the point. Now, there will be more than one of these constants used in the JIT generated code. So, instead of spreading them all over the generated code, we "pool" them together i.e. we keep them in one (or a few) places. We call these places (where we pool the constants) the constant pools. This is the constant pool that Jamsheed was referring to. Why Pool Constants? Now consider what happens when you have your constants spread out throughout interlaced between all the JIT generated code. If that occurs, when you execute code that is located adjacent to the constant, the constant may also be loaded into the i-cache as well simply because it is located near the needed code. The cache manager doesn't actually know if the bits in memory are code or data. It just loads a few words of memory at a time into the respective cache. If those few words include a constant in the midst of code being executed, the constant will get loaded into the i-cache as well. When this happens, some space in the i-cache will taken up for something (i.e. the constant) that will never be executed as code. This is inefficient use of the limited and precious i-cache space. Similarly, when loading the constant as data, the cache manager will load a few words of memory around the constant into the d-cache. If all the words around the constant are actually code and not data, then the d-cache will now contain wasted space for words that will never be used as data. The end result of all this is less cache locality, and that means that the code will run slower. By pooling the constants together, we lessen the probability of these kinds of i-cache and d-cache inefficiencies occurring. :) But I digress. Now, let's get back to Jamsheed's question ... The ARM Instruction Set Let's say you (i.e. the JIT) just emitted a load instruction to load a constant. Because you want to pool the constants together, you don't actually emit the constant yet. Instead, you keep a record of where the load instruction was emitted, and later on when you emit the constant (and therefore, finally know where it is located), you'll come back and fix up this load instruction with the proper offset for that constant. The question is ... how do you know when to actually emit (also commonly referred to as "dump") the constant? If you dump it too early, then you may not be pooling as many constants as you possibly could, thereby increasing the cache inefficiency issue I described earlier. If you dump it too late, then the constant may be out of reach of the 4K range of the load instruction that needs to reach it. The answer is to do periodic checks for a need to dump constants i.e. we'll dump them out into a pool whenever we feel that we may reach the 4K range limit soon. See CVMJITcpoolNeedDump() in src/share/javavm/runtime/jit/jitconstantpool.c. Does the CVM JIT really do that? Hence, we can't actually wait till we reach the 4K limit before dumping the constants. Note that there's also a chance that we may have collected a large number of constants. When we dump the constants, each constant also further increases the offset for the next constant. Also, if we don't dump right now, we don't know when the next opportunity to dump will show up. If it shows up too late, then we'll have a JIT compilation failure. To address this issue, the CVM JIT uses a heuristic and dump whenever we reach a distance of 2K limit from the original load instruction i.e. we tradeoff some cache inefficiency to make sure that we can reach the constants from the load instructions. By now, you might be thinking ... what a retarded JIT compiler! Surely it can do something more intelligent and inch out every last bit of offset possible between the load instruction and the constant pool dump. Well, theoretically, the JIT can do that. But in this case, we're talking about a JavaME VM JIT, and it needs to be fast and efficient i.e. the CVM JIT is not allowed to take up too much time and memory to do the compilation. Using the above heuristic is a cheap but effective trick that gets the job done without sacrificing too much performance. "Cheap" is good for embedded devices. :) More CVM JIT Details So, remember earlier when I said that there's one other place where we can dump constants? Well, that place is in between every sequence of instructions that the JIT grammar rules may emit (with some proper code to branch around the constant pool dump, of course). There is a logical break between each sequence of instructions emitted for each JIT grammar rule where we can insert a constant pool dump. Because of the small offset range of ARM VFP constants, the CVM JIT is forced to allow dumps more frequently like this. This doesn't necessarily mean that there will always be a constant pool dump every 128 bytes of instructions or so. It only means that when there are constants to dump, you may see them show up every 128 bytes or so in the worst case. Fortunately, our benchmark data shows that performance is not impacted by this (or at least not significantly enough to be noticed). But I am still digressing ... The Question again The Answer ... finally Having said that, if you take a look at the range limit issues and the heuristic that the CVM JIT has to employ to predict when the last possible opportunity to dump constants is, you may find that this optimization will have very little effect on the overall scheme of things. Yes, dumping in forward order will help. How it will help is perhaps to allow the CVM JIT to use a heuristic ratio that is less than half the max offset range (currently it is half). This will allow more constants to be pooled before we do a dump. However, I'm not sure how much it will help and how much to change the heuristic ratio. That will be an interesting exercise to do when someone can find the time. As I've said when I started this entry, the day job is not leaving us a lot of time to play. :( Is anyone in the community willing to give this a try and report your findings? Of course, I can provide a few tips on what to do if you are interested. Last Words Have a nice day. :) Tags: CDC CVM Java J2ME JavaME JIT phoneME phoneME Advanced embedded systems Bookmark blog post: CommentsComments are listed in date ascending order (oldest first) | Post Comment
| ||
|
|