
Deferring partition placement in a grid to avoid redistribution thrashing and other issues

Posted by bnewport on August 1, 2007 at 6:05 AM PDT

We have scenarios where a customer may want to have, say, 200 partitions and preload the data into the grid when the partition primaries are initially placed. Suppose the customer wants to load 100GB of data and plans on 500MB of primary data and 500MB of replica data per JVM. Obviously, this needs around 200 JVMs with heap sizes of roughly 1.5GB to 2GB (around 400GB of heap in total) before it's even possible to start preloading.
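As a quick back-of-envelope check (a hypothetical sketch of my own that just restates the arithmetic above in code; the numbers mirror the scenario in this post), the JVM count falls out of the data size and the per-JVM primary budget:

```java
public class GridSizing {
    public static void main(String[] args) {
        long totalPrimaryDataMB = 100 * 1024;   // 100GB of application data
        long primaryPerJvmMB    = 500;          // primary data budget per JVM
        long replicaPerJvmMB    = 500;          // replica data budget per JVM

        // Each JVM holds 500MB of primaries, so ~200 JVMs are needed for 100GB.
        long jvmsNeeded = totalPrimaryDataMB / primaryPerJvmMB;      // 200

        // Each JVM carries 1GB of data (primary + replica); allow headroom
        // for the JVM itself, so budget roughly a 2GB heap per JVM.
        long heapPerJvmMB = 2 * 1024;
        long totalHeapGB  = jvmsNeeded * heapPerJvmMB / 1024;        // ~400GB overall

        System.out.println(jvmsNeeded + " JVMs, ~" + totalHeapGB + "GB total heap");
    }
}
```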

ObjectGrid could start placing partitions as soon as a single JVM starts, but that won't work in this case. We'd place all 200 primaries on that one JVM with a 2GB heap, each primary would start preloading, and we'd run out of memory in under a second trying to load 100GB of data into a 2GB JVM. The lesson here is that naive partition placement like this is very dangerous.

This is why we have the numInitialContainers attribute on an ObjectGrid. It lets the application developer tell ObjectGrid not to start partition placement until at least that number of JVMs have started. We could set it to 200 in this scenario: nothing happens until all 200 JVMs have started, and then the partition primaries and replicas are placed across them. There is no memory issue because each of the 200 JVMs has 2GB of heap space.
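As a rough illustration (my own sketch, not taken from the post; the element and attribute names follow the ObjectGrid/WebSphere eXtreme Scale deployment policy descriptor as I recall it, so check them against the product documentation), numInitialContainers sits alongside the partition and replica settings for a map set:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <objectgridDeployment objectgridName="Grid">
        <!-- 200 partitions with one synchronous replica each; hold off
             placement until 200 container JVMs have started. -->
        <mapSet name="mapSet" numberOfPartitions="200"
                minSyncReplicas="1" maxSyncReplicas="1"
                numInitialContainers="200">
            <map ref="Map1"/>
        </mapSet>
    </objectgridDeployment>
</deploymentPolicy>
```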

Another issue is that ObjectGrid automatically redistributes, or rebalances, partitions as new JVMs start. Let's suppose we had a less extreme situation: only 10MB per partition, but 200 partitions. That's 2GB of primary data, or 4GB with replicas. Imagine the thrashing at initial startup if we placed all the partitions on the first 10 JVMs that started and then started another 90 JVMs. The amount of rebalancing and redundant shifting of data between JVMs would be ridiculous. The numInitialContainers attribute can be used to prevent this kind of silliness as well. It delays initial partition placement until the cluster reaches a stable initial start state, and therefore prevents this kind of thrashing. This eliminates the startup redistribution issue completely.

You can read more about this on our wiki.
