Peer Presence in JXTA
First, I apologize for my long absence from blogging at java.net. I have a busy job as Chief Architect at No Magic: I give talks at conventions, run training, manage development in three countries (Lithuania, Thailand, and the US), and sing and dance for customers. That makes it kind of hard to find time to blog about my passions, P2P and Java. So, without further ado, no more excuses.
Lots of people assume you can just ask who is online. Unfortunately, that is not going to work. The problem is that the request goes through the protocol rather than through a DB call, which is what you would expect.
In the current implementation, the RDV (rendezvous peer) holds in-memory knowledge of the peers and their information. As each new peer arrives, its info is added to the list and broadcast to all connected peers. The new peer gets a copy of the list so that it knows about everyone currently online.
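To make the current scheme concrete, here is a minimal sketch in plain Java. It is not JXTA code: SimpleRendezvous, PeerListener, and join are hypothetical names, and the listener callback simply stands in for whatever connection the RDV really holds to each peer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the current scheme; none of these types are JXTA classes.
class SimpleRendezvous {
    /** Stands in for the RDV's connection to one connected peer. */
    interface PeerListener {
        void peerArrived(String peerId, String info);
    }

    private final Map<String, String> infoByPeer = new LinkedHashMap<>();
    private final Map<String, PeerListener> listeners = new LinkedHashMap<>();

    /** Registers a new peer, tells the others, and returns a snapshot of who is online. */
    synchronized Map<String, String> join(String peerId, String info, PeerListener listener) {
        // Broadcast the newcomer to every peer that is already connected.
        for (PeerListener l : listeners.values()) {
            l.peerArrived(peerId, info);
        }
        infoByPeer.put(peerId, info);
        listeners.put(peerId, listener);
        // The newcomer gets a copy, so later changes on the RDV do not leak through.
        return new LinkedHashMap<>(infoByPeer);
    }
}
```

Note that everything each peer publishes lives in the RDV's memory and travels in every broadcast, which is the cost the rest of this post is about.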
In my new version, the data passed around will be minimal: just the peer ID, an email address, and a last-update date. The bulk of the data will be stored in an advertisement indexed by the peer ID. If the last update is newer than the local advert, the peer fetches the advert by specifying the peer ID as the peer to ask and using the peer ID again as the index (confused yet?). This works because the advert is matched via a primary key. If you just asked for adverts of a particular type, you would only get the first few advertisements.
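As a rough illustration of the "is my local advert stale?" check, here is a small sketch. The class and field names (PresenceSummary, CachedAdvert, AdvertCache, needsRefresh) are made up for this example and are not part of JXTA. I am also assuming that a peer with no local copy at all should fetch one.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the minimal record that gets passed around.
class PresenceSummary {
    final String peerId;
    final String email;
    final Instant lastUpdate;

    PresenceSummary(String peerId, String email, Instant lastUpdate) {
        this.peerId = peerId;
        this.email = email;
        this.lastUpdate = lastUpdate;
    }
}

// Hypothetical: the bulky data, stored locally and keyed by peer ID.
class CachedAdvert {
    final String peerId;
    final Instant updated;
    final String body;

    CachedAdvert(String peerId, Instant updated, String body) {
        this.peerId = peerId;
        this.updated = updated;
        this.body = body;
    }
}

class AdvertCache {
    private final Map<String, CachedAdvert> byPeerId = new HashMap<>();

    void store(CachedAdvert advert) {
        byPeerId.put(advert.peerId, advert);
    }

    /** True when there is no local copy (assumption) or the summary reports a newer update. */
    boolean needsRefresh(PresenceSummary summary) {
        CachedAdvert local = byPeerId.get(summary.peerId);
        return local == null || summary.lastUpdate.isAfter(local.updated);
    }
}
```

Only when needsRefresh returns true does the peer go out on the network, so an unchanged peer costs nothing beyond the small summary.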
Now, if the peer is coming online for the first time, it gets a list of the peers online. It iterates through this list, asking for the advert of each peer. It asks only for the advertisement and does not specify that only that peer be searched, so the nearest peer that has the advertisement replies. The peer then compares the date of the last update with the date in the advertisement. If the advertisement's date is older, it re-requests the advertisement from the peer itself (peer plus peer ID) so that it gets the most up-to-date copy. The upshot is that the info about a peer gets replicated, so no one peer is inundated with requests, and each peer caches its view, so requests are limited to peers that have just changed their résumé.
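The two-step lookup described above might look something like this. Again, this is only a sketch: Discovery, queryAnyPeer, queryOwner, and ProfileResolver are hypothetical names standing in for the real discovery calls, and falling back to the stale replica when the owner does not answer is my own assumption.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical stand-in for a peer's advertisement.
class ProfileAdvert {
    final String peerId;
    final Instant updated;
    final String body;

    ProfileAdvert(String peerId, Instant updated, String body) {
        this.peerId = peerId;
        this.updated = updated;
        this.body = body;
    }
}

// Hypothetical stand-in for the discovery layer.
interface Discovery {
    /** Unscoped query: whichever peer holds a copy may answer. */
    Optional<ProfileAdvert> queryAnyPeer(String peerId);

    /** Scoped query: only the owning peer is asked. */
    Optional<ProfileAdvert> queryOwner(String peerId);
}

class ProfileResolver {
    private final Discovery discovery;

    ProfileResolver(Discovery discovery) {
        this.discovery = discovery;
    }

    /** Returns the best advert available for a peer, given its advertised last-update date. */
    Optional<ProfileAdvert> resolve(String peerId, Instant lastUpdate) {
        // Step 1: take a replica from whoever answers first.
        Optional<ProfileAdvert> replica = discovery.queryAnyPeer(peerId);
        if (replica.isPresent() && !replica.get().updated.isBefore(lastUpdate)) {
            return replica; // The replica is current; the owner is never bothered.
        }
        // Step 2: the replica is missing or stale, so ask the owner directly.
        Optional<ProfileAdvert> fresh = discovery.queryOwner(peerId);
        // Assumption: a stale copy is better than nothing if the owner is silent.
        return fresh.isPresent() ? fresh : replica;
    }
}
```

The owner is only contacted when its replicas have fallen behind, which is what spreads the load across the group.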
Why not store all this on a RDV? The problem is that it turns the RDV into a server. It loads the RDV down, costing it CPU, bandwidth, and memory. It also makes the RDV a bigger point of failure. The less information stored on a RDV, the better.
Now for the magic of the presence system. I had said we don't want the RDV to serve. Well, it has to serve a little, but only to the peers connected to it. The data is also transient. As a peer connects, it adds its data. As it disconnects, its data is removed. And if the RDV fails or is taken offline, peers reconnect to a new RDV and add their info there. So this is in fact fault tolerant.
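Here is a toy model of that transient behavior, assuming hypothetical TransientRendezvous and PresenceClient classes (neither is a JXTA type). The failed RDV is simply abandoned, since everything it held was throwaway data anyway.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical RDV that holds presence data only while peers are connected.
class TransientRendezvous {
    private final Map<String, String> emailByPeer = new ConcurrentHashMap<>();

    void connect(String peerId, String email) {
        emailByPeer.put(peerId, email);
    }

    void disconnect(String peerId) {
        emailByPeer.remove(peerId);
    }

    /** Snapshot of the peer IDs currently connected. */
    Set<String> online() {
        return new TreeSet<>(emailByPeer.keySet());
    }
}

// Hypothetical peer-side helper that re-registers whenever it attaches to an RDV.
class PresenceClient {
    private final String peerId;
    private final String email;
    private TransientRendezvous current;

    PresenceClient(String peerId, String email) {
        this.peerId = peerId;
        this.email = email;
    }

    /** Used for the first connection and again after the previous RDV has gone away. */
    void attach(TransientRendezvous rdv) {
        // Assumption: the old RDV is unreachable, so there is nothing to clean up on it.
        current = rdv;
        rdv.connect(peerId, email);
    }

    void leave() {
        if (current != null) {
            current.disconnect(peerId);
            current = null;
        }
    }
}
```

Because each peer carries its own summary and re-adds it on attach, a replacement RDV rebuilds its list without any state being copied from the one that failed.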
Now for my next trick. Peers need to belong to peer groups. Peer groups do three things: they define an address space, they restrict messages to peers inside the group, and they restrict the resources used to those within the group. That last bit is of most concern to us. Since the core of peer presence runs from the RDV, only the peers in the peer group will get messages and be able to see data about fellow peers. This helps the system scale. Plus, if we do this the most efficient way, a couple of peers within each group volunteer as RDVs, which means no single computer does the duty for all peers.
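The scoping idea can be sketched in a few lines. GroupScopedPresence is a hypothetical class, not a JXTA API, and for simplicity it keeps all groups in one object, whereas in practice each group's RDVs would hold only their own group's list.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical: presence lists kept separately per peer group.
class GroupScopedPresence {
    private final Map<String, Set<String>> peersByGroup = new HashMap<>();

    void join(String groupId, String peerId) {
        peersByGroup.computeIfAbsent(groupId, g -> new TreeSet<>()).add(peerId);
    }

    /** Peers visible from inside one group; members of other groups never appear. */
    Set<String> onlineIn(String groupId) {
        Set<String> members = peersByGroup.get(groupId);
        if (members == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(new TreeSet<>(members));
    }
}
```

Each list only grows with the size of its own group, not with the size of the whole network.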