
Characteristics of DataGrids

Posted by bnewport on February 5, 2008 at 8:01 AM PST

This is a great question. Depending on who you ask, you get a different answer. Vendors will pitch what they think and why competitors are wrong. I guess I just want to put out what I think it is so here goes.


Topology wise, there are three styles. The first is a fixed number of partitions where each record hashes to exactly one partition. This is the default mode of ObjectGrid and Coherence; GigaSpaces can do this also. Clients typically hash the key of a record and then select a partition using modulo arithmetic. A route table contains an entry for each partition saying which JVM hosts the primary and which JVMs host any replicas for that partition. Routing is then a matter of selecting the right JVM based on the record key.
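
The hash-and-modulo routing above can be sketched in a few lines of Java. The class and method names here are purely illustrative, not any product's API; a real route table would also track replicas and be refreshed as partitions move.

```java
import java.util.List;

// Minimal sketch of fixed-partition routing: hash the key, take it modulo
// the (fixed) partition count, then look up the primary's JVM in a route table.
public class FixedPartitionRouter {
    private final int numPartitions;
    private final List<String> primaryEndpoints; // index = partition id

    public FixedPartitionRouter(int numPartitions, List<String> primaryEndpoints) {
        this.numPartitions = numPartitions;
        this.primaryEndpoints = primaryEndpoints;
    }

    // Hash the key, then use modulo arithmetic to pick a partition.
    public int partitionFor(Object key) {
        // Mask the sign bit rather than Math.abs (which fails for MIN_VALUE).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Look up the JVM currently hosting the primary for that partition.
    public String endpointFor(Object key) {
        return primaryEndpoints.get(partitionFor(key));
    }
}
```

Because the partition count is fixed, every client can compute the same partition for a key with no coordination; only the partition-to-JVM mapping needs to be kept current.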

The next style is different because there isn't a fixed number of partitions. When a new JVM starts, it creates a fixed number of partitions, each of which receives a unique number. The grid software places the primaries for those partitions in the newly started JVM and, if needed, creates replicas on the other JVMs in the grid. This differs from the first kind because the number of partitions varies and the numbering is sparse: the first kind has partitions numbered 0 to N, while this kind has a dynamic set of partitions numbered essentially at random. Clients route differently with this style. Clearly the key can't be mapped mathematically to a partition, although consistent hashing algorithms can achieve something similar through various indirect approaches. Usually a client selects an available partition using some algorithm (random, round robin, weighted round robin, etc.) and then receives a token/cookie that lets it return to that partition in the future. ObjectGrid uses this style for our HTTP Session manager; we store the token in a session cookie. Outside of ObjectGrid's use of this for sessions, GigaSpaces uses something like it for spaces. I don't think Coherence has a mode like this, but of course it may.

The route table in this case may distinguish between local and foreign partitions, i.e. whether the primary is still running on the JVM that created it or has failed over elsewhere. This is important from a life cycle point of view. Imagine that every time a new JVM starts, we make 5 more partitions with primaries and replicas. When that JVM crashes, we promote the replicas to primaries and create additional replicas to maintain HA. If clients could route new requests to the failed-over partitions, those partitions would never empty, and over time we'd end up with an ever-growing number of partitions, which is a bad thing. So the route table indicates which partitions are local and which are foreign (i.e. ones that have already failed over), and clients are routed to local partitions only for new requests. This allows the foreign ones to drain as their existing work completes; when a foreign partition is empty, it is destroyed. This keeps the number of partitions roughly equal to the number of JVMs times the number of partitions created per JVM.
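
The local/foreign distinction can be sketched as a route table that only hands out local partitions for new work, so failed-over partitions drain and can eventually be destroyed. All names here are hypothetical, not from any product:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of sparse-partition routing: new sessions go only to *local*
// partitions (primaries still on their creating JVM); foreign ones drain.
public class SparsePartitionRouter {
    public static final class Entry {
        public final int partitionId;
        public final boolean local; // still on the JVM that created it?
        public Entry(int partitionId, boolean local) {
            this.partitionId = partitionId;
            this.local = local;
        }
    }

    private final List<Entry> entries;
    private final Random rng = new Random();

    public SparsePartitionRouter(List<Entry> entries) {
        this.entries = entries;
    }

    // Pick a random local partition; the returned id is the token/cookie
    // the client uses to come back to that partition on later requests.
    public int assignNewSession() {
        List<Entry> locals = new ArrayList<>();
        for (Entry e : entries) {
            if (e.local) locals.add(e); // foreign partitions are skipped
        }
        if (locals.isEmpty()) throw new IllegalStateException("no local partitions");
        return locals.get(rng.nextInt(locals.size())).partitionId;
    }
}
```

Existing sessions still reach foreign partitions via their stored token; only the assignment of new work is restricted, which is what lets a foreign partition empty out.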

The last style is range-based partitions. Here, the number of partitions is dynamic, and instead of using a hash function to map a record to a fixed set of partitions, we use key ranges. We start with one partition with a key range of A-Z. We add data to it, and when it reaches a certain size, we split it into two partitions, each with half the data and half the key range. This process continues, and clients map a key to a partition by selecting the partition owning the range containing that key. This has the advantage of being expandable forever. The normal fixed-hash style of partitioning can suffer when a non-uniform hash is used: records don't distribute uniformly over the partitions, and you get hot partitions with more data than their peers, which leads to runtime issues. Range-based partitioning avoids this issue.
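
The split-on-size behavior can be sketched as follows. This is a toy in-memory model under my own assumptions (split at the median key, a fixed size threshold), not how any particular product implements it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy range-partitioned table: one partition owns everything at first,
// and any partition that grows past a threshold splits at its median key.
public class RangePartitionTable {
    public static final class Partition {
        public final String low;   // inclusive lower bound ("" = open)
        public final String high;  // exclusive upper bound (null = open)
        public final TreeMap<String, String> data = new TreeMap<>();
        Partition(String low, String high) { this.low = low; this.high = high; }
    }

    private final List<Partition> partitions = new ArrayList<>();
    private final int splitThreshold;

    public RangePartitionTable(int splitThreshold) {
        this.splitThreshold = splitThreshold;
        partitions.add(new Partition("", null)); // one partition owns the full range
    }

    // Route a key to the partition whose range contains it.
    public Partition partitionFor(String key) {
        for (Partition p : partitions) {
            if (key.compareTo(p.low) >= 0 && (p.high == null || key.compareTo(p.high) < 0)) {
                return p;
            }
        }
        throw new IllegalStateException("ranges should cover all keys");
    }

    public void put(String key, String value) {
        Partition p = partitionFor(key);
        p.data.put(key, value);
        if (p.data.size() > splitThreshold) split(p);
    }

    // Split at the median key so each half gets roughly half the data.
    private void split(Partition p) {
        String mid = new ArrayList<>(p.data.keySet()).get(p.data.size() / 2);
        Partition left = new Partition(p.low, mid);
        Partition right = new Partition(mid, p.high);
        left.data.putAll(p.data.headMap(mid));
        right.data.putAll(p.data.tailMap(mid));
        partitions.remove(p);
        partitions.add(left);
        partitions.add(right);
    }

    public int partitionCount() { return partitions.size(); }
}
```

Because each partition keeps its keys sorted (a TreeMap here), scanning records in key order is a natural operation, which is the streaming advantage discussed below.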

Of course, a more developed hash-based partitioning scheme can use hierarchical hashing. Initially we'd have a fixed set of partitions and a simple hash, but if one partition got 'hot' then we could treat that one partition as N more partitions and rehash the original partition into the N new ones, and so on. This gives a hierarchical hash. Consistent hashing algorithms are probably a more efficient approach to this problem anyway. This partition splitting is similar to the key-range style but has more routing overhead and also cannot efficiently stream records in key order to a client. Google's Bigtable uses key-range partitioning where the key is a reversed URL like com.cnn.www/index.html. This allows it to stream all pages from the grid to a client, which is difficult to do efficiently from a hash-partitioned grid.
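
For reference, here is a minimal consistent-hash ring of the kind mentioned above: each JVM owns many points on a ring, and a key routes to the first point at or after its hash, wrapping around. This is a generic sketch of the technique, not any vendor's implementation, and a production ring would use a stronger hash than Object.hashCode():

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring. Each node gets several points on the ring
// so that adding or removing a node moves only a fraction of the keys.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int pointsPerNode;

    public ConsistentHashRing(int pointsPerNode) {
        this.pointsPerNode = pointsPerNode;
    }

    public void addNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) {
            ring.put(hash(node + "#" + i), node); // virtual points for this node
        }
    }

    // Route to the first ring point at or after the key's hash.
    public String nodeFor(Object key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around
    }

    private int hash(Object o) {
        return o.hashCode() & Integer.MAX_VALUE; // force non-negative
    }
}
```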

So, that's the 3 (or maybe 3.5) ways I know to organize a data grid. All products should virtualize the JVMs hosting the grid: make sure that partitions are evenly distributed across the available JVMs, maintain availability levels by creating replicas as needed, and make sure primaries and replicas are separated sufficiently, i.e. placed on different floors/buildings/cities etc.

Data Access method

Next, we have the access method. The first is network attach. Here, clients attach over the network to the 'datagrid' using one of the three routing approaches described above. The datagrid looks like a shared data resource to the applications attaching to it. Clients may use a local or near cache to hold frequently accessed data, avoiding network hops; this cache can go stale and needs to be invalidated/updated when the main datagrid data is modified. ObjectGrid supports this, as does everyone else. This is probably the MOST common scenario: the datagrid is used as a shared, network-attached cache.

The second is to move the business logic inside the datagrid and route the business logic requests to the partition for a specific key. The logic then executes within the same address space as the data, with no latency. This is the 'XTP' style, and investment banks use it to process requests with very low latencies. The downside is that applications may need to deploy code to the grid rather than just access data in it. There are lightweight approaches to deploying code, though, such as using a language like Groovy, Jython or JRuby to send the code with the request, or preregistering it and then invoking it on the grid.
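
The code-to-data idea can be sketched like this. The interfaces are hypothetical (real products have their own agent/entry-processor APIs); the point is simply that the function runs next to the partition's data instead of pulling the record across the network:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Sketch of the 'XTP' style: ship a small function to the partition that
// owns the key and run it in the same address space as the data.
public class GridAgentSketch {
    // Stands in for the partition's local data, co-located with this code.
    private final Map<String, Integer> partitionData = new ConcurrentHashMap<>();

    public void put(String key, int value) {
        partitionData.put(key, value);
    }

    // In a real grid, 'logic' would be routed to the key's partition first;
    // here we just run it in-process against the local data.
    public <R> R execute(String key, BiFunction<String, Integer, R> logic) {
        return logic.apply(key, partitionData.get(key)); // no network hop to fetch the record
    }
}
```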

Integration with legacy data backends

Last but not least, how does the datagrid live alongside traditional or legacy data sources like databases or mainframe applications? There are three approaches here.

First, it's completely unintegrated. This means the applications using the datagrid first check if data is in the grid and if it is then they use it but if not then the application pulls the data from the real data source and puts it in the datagrid for next time. Object relational mappers like OpenJPA or Hibernate are examples of this approach.
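The unintegrated approach (often called cache-aside) looks roughly like this. The types here are illustrative; the dataSource function stands in for whatever real backend read the application does:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the unintegrated (cache-aside) approach: the *application*,
// not the grid, checks the cache and falls back to the real data source.
public class CacheAside {
    private final Map<String, String> grid = new ConcurrentHashMap<>();
    private final Function<String, String> dataSource; // e.g. a database read

    public CacheAside(Function<String, String> dataSource) {
        this.dataSource = dataSource;
    }

    public String get(String key) {
        String cached = grid.get(key);
        if (cached != null) return cached;       // hit: use the grid copy
        String fresh = dataSource.apply(key);    // miss: go to the real source
        if (fresh != null) grid.put(key, fresh); // store it for next time
        return fresh;
    }
}
```

The grid here knows nothing about the database; all the integration logic lives in the application (or in an ORM layer on its behalf).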

Second, we have write through. Here, a 'loader' type plugin is attached to the grid. If a client looks up a record that isn't in the grid, the grid automatically asks the loader to try to get it from the data source and then stores it in the grid. If a client changes data in the grid, the grid synchronously tells the loader to write it back to the data source. The problem with this approach is that if the data source isn't available, you can't read data that wasn't already stored in the grid, and you can't change data at all. Those are two big issues compromising the availability of your datagrid. Performance is also an issue: as the datagrid warms up, the data source may be heavily stressed by requests from the grid. Updates also happen using one data source transaction per grid transaction, which again may overload even the largest of data sources.
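
A write-through loader can be sketched like this. The Loader interface is my own illustration of the plugin concept, not ObjectGrid's or Coherence's actual SPI:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of write-through: the grid calls the loader synchronously on a
// read miss and on every write, so the data source is always consistent
// with the grid -- but also always in the critical path.
public class WriteThroughGrid {
    public interface Loader {
        String load(String key);              // read one record from the backing store
        void store(String key, String value); // synchronous write-back
    }

    private final Map<String, String> grid = new ConcurrentHashMap<>();
    private final Loader loader;

    public WriteThroughGrid(Loader loader) {
        this.loader = loader;
    }

    // On a miss, the grid (not the client) fetches from the data source.
    public String get(String key) {
        return grid.computeIfAbsent(key, loader::load);
    }

    // If the data source is down, this write fails -- the availability
    // problem described above.
    public void put(String key, String value) {
        loader.store(key, value); // synchronous: one store call per grid update
        grid.put(key, value);
    }
}
```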

Lastly, we have write behind. Here the grid asks the loader to preload all the data into the grid. This allows the grid to service read requests even if the data source isn't available. The preload is optional, but without it you clearly can't read uncached data when the data source is down. Updates are written to the grid but buffered in replicated memory. Periodically, the buffered changes are written to the data source using a large batch transaction. If the same record is updated multiple times during this period, only the last version is written to the data source. If the data source isn't available, the grid just keeps going and tries again later. The main issue with this approach is that the data source isn't always up to date. This may not be a problem, as this style of grid can be used to preprocess large amounts of data at grid speed before writing it to the data source for later processing/reporting etc. All products can do write behind.
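
The buffering and coalescing behavior can be sketched as follows. This toy keeps the dirty buffer in a plain map (a real grid would hold it in replicated memory, as described above) and uses a map in place of the database:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of write-behind: updates hit the grid immediately and are
// buffered per key; a periodic flush writes only the *latest* version of
// each dirty record to the data source in one batch.
public class WriteBehindGrid {
    private final Map<String, String> grid = new ConcurrentHashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>(); // one entry per dirty key
    private final Map<String, String> dataSource; // stands in for the database

    public WriteBehindGrid(Map<String, String> dataSource) {
        this.dataSource = dataSource;
    }

    public synchronized void put(String key, String value) {
        grid.put(key, value);
        dirty.put(key, value); // a later update to the same key overwrites the buffered one
    }

    // Called periodically; models one large batch transaction. Returns the
    // number of records written so the coalescing is visible.
    public synchronized int flush() {
        int written = dirty.size();
        dataSource.putAll(dirty);
        dirty.clear();
        return written;
    }
}
```

If the flush fails because the data source is down, a real implementation would simply keep the dirty buffer and retry on the next period, which is exactly why the grid stays available while the database lags behind.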


So, that's it. This should let you characterize a datagrid to fit your application. As you can see, there are a variety of styles along a couple of dimensions, and choosing the right 'shape' of datagrid is essential for the success of a project using datagrid technology.
