When can JMX notifications be lost?

Posted by emcmanus on August 23, 2007 at 12:32 AM PDT

The JMX Best Practices guide
(http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/best-practices.jsp)
says notifications can sometimes be lost. Why is that? When might
it happen? Read on.

Here's the relevant text from the Best Practices guide
(http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/best-practices.jsp#mozTocId387765):

It is important to be aware of the semantics of notification
delivery when defining how notifications are used in a
model. Remote clients cannot assume that they will receive all
notifications for which they are listening. The JMX Remote API
only guarantees a weaker condition:

A client either receives all notifications for which it is
listening, or can discover that notifications may have been
lost.

This text might seem somewhat alarming. First of all, notice
that it only applies to remote clients. A local client
(within the same Java VM) will reliably get all notifications it
asks for.
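
For example, a local listener registered directly with the MBean
server receives notifications through an ordinary method call, with
nothing in between that could drop them. A minimal sketch (the
MBean name is invented for illustration, and assumed to be
registered already):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.Notification;
    import javax.management.NotificationListener;
    import javax.management.ObjectName;

    public class LocalListener {
        public static void main(String[] args) throws Exception {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            // Hypothetical MBean name, for illustration only.
            ObjectName name = new ObjectName("com.example:type=Cache");
            NotificationListener listener = new NotificationListener() {
                public void handleNotification(Notification n, Object hb) {
                    // In the local case this is a plain method call from
                    // the broadcaster, so nothing is dropped.
                    System.out.println("Got " + n.getType());
                }
            };
            mbs.addNotificationListener(name, listener, null, null);
        }
    }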

Secondly, the text is describing something that will only
happen in unusual circumstances. Notifications will only be lost
when they arrive so fast that they cannot be delivered to the
remote client quickly enough, or if there is a long network outage
during which enough notifications arrive to overflow the
notification buffer on the server. If you're sure that the rate
of notifications is always low then you probably don't need to
worry. Long network outages will probably trigger other problems
in your client, so you'll need to deal with them more generally
than just worrying about lost notifications.

Careful clients

But if you have many notifications, you probably want to follow
the advice in the subsequent paragraphs of the Best Practices
guide:

Notifications should never be used to deliver information
that is not also available in another way. The typical client
observes the initial state of the information model, then reacts
to changes in the model signalled by notifications. If it sees
that notifications may have been lost, it goes back and observes
the state of the model again using the same logic as it used
initially. The information model must be designed so that this
is always possible. Losing a notification must not mean losing
information irretrievably.

When a notification signals an event that might require
intervention from the client, the client should be able to
retrieve the information needed to react. This might be an
attribute in an MBean that contains the same information as was
included in the notification. If the information is just that a
certain event occurred, it is often enough just to have a
counter of how many times it occurred. Then a client can detect
that the event occurred just by seeing that the counter has
changed.
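
As an illustration of that last point, here is a minimal sketch
(all names invented) of an MBean that sends a notification for each
event and also exposes a counter, so that a client which suspects
loss can simply re-read the counter:

    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.Notification;
    import javax.management.NotificationBroadcasterSupport;

    // FailureMonitorMBean.java
    public interface FailureMonitorMBean {
        long getFailureCount();
    }

    // FailureMonitor.java
    public class FailureMonitor extends NotificationBroadcasterSupport
            implements FailureMonitorMBean {
        private final AtomicLong failureCount = new AtomicLong();

        // Called by the managed resource whenever a failure occurs.
        public void failureHappened() {
            long count = failureCount.incrementAndGet();
            sendNotification(new Notification(
                    "com.example.failure", this, count,
                    "Failure number " + count));
        }

        // A client that may have missed notifications compares this
        // value with the last one it saw to detect missed events.
        public long getFailureCount() {
            return failureCount.get();
        }
    }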

Stateless servers

The design of the existing standard connectors is such that
notification loss can happen when there are many notifications
coming from the MBeans in the MBean Server. This is true even for
clients that are only listening for a small subset of those
notifications. In the extreme case, a client that is listening
for a very rare notification might not see it, because other
MBeans are generating frequent notifications that nobody is
listening to. Once again, the client can tell that this has
happened, via JMXConnector.addConnectionNotificationListener
(http://java.sun.com/javase/6/docs/api/javax/management/remote/JMXConnector.html#addConnectionNotificationListener(javax.management.NotificationListener,%20javax.management.NotificationFilter,%20java.lang.Object)).

The existing connectors behave like this because they have been
designed to have no non-transient state on the server. A
consequence is that the server has no non-transient record of
which clients are interested in which notifications. Therefore it
has to store all notifications in its buffer, in case
some client it doesn't remember is interested in them.
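
As an aside, in the JDK's implementation of the standard connectors
that shared buffer is bounded, and its size can be tuned through
the nonstandard jmx.remote.x.notification.buffer.size environment
property. A sketch (the property is implementation-specific, not
part of the JMX Remote API standard):

    import java.lang.management.ManagementFactory;
    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    public class BufferSizeDemo {
        public static void main(String[] args) throws Exception {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///");
            Map<String, Object> env = new HashMap<String, Object>();
            // Implementation-specific: capacity of the server's shared
            // notification buffer.
            env.put("jmx.remote.x.notification.buffer.size", 5000);
            JMXConnectorServer server = JMXConnectorServerFactory
                    .newJMXConnectorServer(url, env, mbs);
            server.start();
            System.out.println("Connector server: " + server.getAddress());
        }
    }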

The servers were designed to have no non-transient state for
better scalability. In retrospect, this was probably a design
mistake. In many client/server systems, you have one server, or
just a few servers, and a large number of clients. So limiting
state in the server is an excellent idea, because it allows the
server to handle many more clients. But in management systems,
the situation is usually the opposite: you typically have one
client (a management program such as JConsole,
http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html)
that may connect to and manage many servers. There are no common
use cases where a server might have a large number of JMX
clients.

In version 2.0 of the JMX API, being defined by JSR 255
(http://jcp.org/en/jsr/detail?id=255), we are adding an Event
Service (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5108776).
Among other things, this will fix the problem where
a client might lose notifications that it is interested in because
there are many other notifications that it is not interested in.

Notification loss is inevitable

Even with the new Event Service, however, notification loss will
still be possible. Can't we get rid of it?

To answer this question, consider what happens when
notifications are produced faster than they can be handled. This
might be because of network delays, or because the client needs to
do some work for each notification. Suppose this situation
persists. What should the system do?

There are basically three possibilities:

  1. Some notifications are eventually dropped. This is
    what the JMX Remote API does, and it is also what the new Event
    Service will do.
  2. Notification senders are slowed down. This is what
    usually happens in the local case. An MBean sends a
    notification to a local listener by invoking the listener's
    handleNotification method
    (http://java.sun.com/javase/6/docs/api/javax/management/NotificationListener.html#handleNotification(javax.management.Notification,%20java.lang.Object)).
    Unless it has multiple threads, the MBean will wait for
    that method to complete before doing anything else, including
    sending any more notifications. (The sketch after this list
    makes this blocking behaviour concrete.)
  3. Notifications accumulate in an unbounded buffer.
    This is actually the worst solution. In the real world there is
    no such thing as an unbounded buffer. And even if you save the
    notifications in a giant disk, which is effectively unbounded,
    you still haven't fixed the problem that the client is getting
    further and further behind the server. When the client finally
    gets a notification that was sent yesterday, is that still any
    use?
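
To make option 2 concrete: with its default (no-Executor)
constructor, NotificationBroadcasterSupport calls each listener's
handleNotification in the thread that calls sendNotification, so a
slow listener slows the sender down. A minimal sketch (the sleep
just simulates a slow listener):

    import javax.management.Notification;
    import javax.management.NotificationBroadcasterSupport;
    import javax.management.NotificationListener;

    public class BlockingSendDemo {
        public static void main(String[] args) {
            NotificationBroadcasterSupport sender =
                    new NotificationBroadcasterSupport();
            NotificationListener slow = new NotificationListener() {
                public void handleNotification(Notification n, Object hb) {
                    try {
                        Thread.sleep(1000);  // simulate a slow listener
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            sender.addNotificationListener(slow, null, null);

            long start = System.currentTimeMillis();
            // With the default dispatch model, this call does not
            // return until the listener has finished.
            sender.sendNotification(
                    new Notification("com.example.demo", sender, 1L));
            System.out.println("send took "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }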

When we were designing the JMX Remote API, we assumed that most
MBeans that send notifications were not expecting sending to be
slow. In the local case, sending is just invoking a method, and
that method is usually punctual. If we had wanted to apply
solution 2, slowing down senders, that could have broken the
assumptions of existing MBeans. Coding MBeans so that they can
cope with a blocked send would also be considerably more
difficult. So, even though this solution (flow control,
http://en.wikipedia.org/wiki/Flow_control) is arguably better, we
were reluctant to impose it.

The future: JMX Event Service

As I mentioned, in version 2.0 of the JMX API we are designing
a new Event Service. This will be part of the JDK 7 platform.
Though it will not eliminate notification loss, it will
significantly reduce the likelihood of such loss. And it will
also allow you to plug in your own transport for notifications.
In particular, you could plug in the Java Message Service
(http://java.sun.com/jms) to use an existing message bus.
