


Eamonn McManus

Eamonn McManus's Blog

Making a JMX connection with a timeout

Posted by emcmanus on May 23, 2007 at 01:23 PM | Comments (14)

One question I encounter frequently about the JMX Remote API is how to reduce the time taken to notice that a remote machine is dead when making a connection to it. The default timeout is typically a couple of minutes! Here's one way to do it.

Probably the cleanest technique for connection timeouts in general is to set a connection timeout on the socket. The idea is that instead of using...

Socket s = new Socket(host, port);

...you use...

SocketAddress addr = new InetSocketAddress(host, port);
Socket s = new Socket();
s.connect(addr, timeoutInMilliSeconds);
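For readers who want to try the idiom in isolation, here is a minimal, self-contained sketch. The class name SocketConnectTimeoutDemo and the throwaway local ServerSocket that stands in for the remote machine are assumptions made for illustration; only the Socket calls themselves come from the text above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

class SocketConnectTimeoutDemo {
    // Opens a TCP connection, giving up if the peer has not answered
    // within timeoutMillis.
    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        SocketAddress addr = new InetSocketAddress(host, port);
        Socket s = new Socket();
        try {
            s.connect(addr, timeoutMillis); // SocketTimeoutException if too slow
            return s;
        } catch (IOException e) {
            s.close(); // don't leak the unconnected socket
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local listener plays the part of the remote machine.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket s = connect(server.getInetAddress().getHostAddress(),
                                server.getLocalPort(), 5000)) {
            System.out.println("connected: " + s.isConnected());
        }
    }
}
```

Closing the socket on failure is a choice of this sketch rather than something the idiom requires, but it avoids holding on to a file descriptor for a connection that never happened.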

The problem is that this is at a rather low level. If you're making connections with the JMX Remote API you usually don't see Socket objects at all. It's still possible to use this technique, but it requires a certain amount of fiddling, and the particular fiddling you need depends on which connector protocol you are using.

A lot of the time, a much simpler and more general technique is applicable. You simply create the connection in another thread, and you wait for that thread to complete. If it doesn't complete before your timeout, you just abandon it. It might still take two minutes to notice that the remote machine is dead, but in the meantime you can continue doing other things.

If you're making a lot of connections to a lot of machines, you might want to think twice about abandoning threads, because you might end up with a lot of them. But in the more typical case where you're just making one connection, this technique may well be for you.

Assuming you're using at least Java SE 5, you'll certainly want to use java.util.concurrent to manage the thread creation and communication. There are a few ways of doing it, but the easiest is probably a single-thread executor.

The method below allows you to connect to a given JMXServiceURL with a timeout of five seconds like this:

JMXConnector jmxc = connectWithTimeout(jmxServiceURL, 5, TimeUnit.SECONDS);

My first cut at the problem

In my first version of this entry, I proposed a solution with the following outline.

JMXConnector connectWithTimeout(JMXServiceURL url, long timeout, TimeUnit unit) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<JMXConnector> future = executor.submit(new Callable<JMXConnector>() {
	public JMXConnector call() {
	    return JMXConnectorFactory.connect(url);
	}
    });
    return future.get(timeout, unit);
}

Half an hour after posting, I suddenly realised that this version is incorrect. It reminds me of the saying that for every complex problem there is a solution that is simple, obvious, and wrong.

This solution does the right thing when the connection succeeds within the time limit, and also in the case of the problem we are trying to solve, where it takes a very long time to fail. But if the connection succeeds after the time limit, connectWithTimeout will already have thrown a TimeoutException to its caller, and we'll have made a connection that nobody knows about!
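The leak can be reproduced without any JMX machinery. The sketch below is an illustration under assumptions: LeakyTimeoutDemo, the openConnections counter and the string "connection" are hypothetical stand-ins for a real JMXConnector, and a latch forces the fake connect to finish only after the caller has given up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class LeakyTimeoutDemo {
    // Hypothetical bookkeeping: how many "connections" were opened.
    static final AtomicInteger openConnections = new AtomicInteger();

    // Returns true if the caller timed out and a connection was opened anyway.
    static boolean demonstrateLeak() throws Exception {
        final CountDownLatch callerGaveUp = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(new Callable<String>() {
                public String call() throws Exception {
                    callerGaveUp.await();              // the connect outlasts the timeout
                    openConnections.incrementAndGet(); // ...and then succeeds
                    return "connection";
                }
            });
            try {
                future.get(100, TimeUnit.MILLISECONDS);
                return false;                          // finished in time, nothing leaked
            } catch (TimeoutException e) {
                // A real caller would return here and never look at the future again.
                callerGaveUp.countDown();
                future.get();                          // demo only: let the task finish
                return openConnections.get() > 0;
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaked: " + demonstrateLeak());
    }
}
```

In the demo the second future.get() exists only so the result can be observed. In the first-cut method nothing holds on to the Future after the timeout, so the late connection is never closed.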

The second attempt

This is the outline of my second attempt, which I believe is correct. There are several refinements we'll need to apply before having a solution that actually works.

// This is just an outline: the real code appears later
JMXConnector connectWithTimeout(JMXServiceURL url, long timeout, TimeUnit unit) {
    final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.submit(new Runnable() {
	public void run() {
	    JMXConnector connector = JMXConnectorFactory.connect(url);
	    if (!mailbox.offer(connector))
		connector.close();
	}
    });
    Object result = mailbox.poll(timeout, unit);
    if (result == null) {
	if (!mailbox.offer(""))
	    result = mailbox.take();
    }
    return (JMXConnector) result;
}

To understand how and why this works, notice that exactly one object always gets posted to the mailbox. There are three cases:

  • If the connection attempt finishes before the timeout, then the connector object will be posted to the mailbox and returned to the caller.
  • If the timeout happens, then the main thread will try to stuff the mailbox with an arbitrary object (here the empty string, but any object would do), so the connection thread will realise it has connected too late and close the newly-made connection.
  • If the timeout happens at exactly the same time as the connection is made, then the main thread may find that the mailbox is already full, in which case it again picks up the connector object and returns it.
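The first two cases can be watched in action with a stand-in for the connector. This is a sketch under assumptions: MailboxHandshakeDemo, FakeConnection and the connectMillis parameter (which simulates how long the connect takes) are hypothetical names introduced for the illustration, and a plain daemon thread replaces the executor to keep it short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class MailboxHandshakeDemo {
    // Hypothetical stand-in for a JMXConnector that records being closed.
    static class FakeConnection {
        private final CountDownLatch closed = new CountDownLatch(1);
        void close() { closed.countDown(); }
        boolean awaitClosed(long millis) throws InterruptedException {
            return closed.await(millis, TimeUnit.MILLISECONDS);
        }
    }

    // Returns conn if it "connects" within the timeout, or null on timeout.
    static FakeConnection connectWithTimeout(
            final FakeConnection conn, final long connectMillis, long timeoutMillis)
            throws InterruptedException {
        final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(connectMillis);   // pretend to connect
                } catch (InterruptedException e) {
                    return;
                }
                if (!mailbox.offer(conn))
                    conn.close();                  // mailbox already stuffed: too late
            }
        });
        worker.setDaemon(true);
        worker.start();
        Object result = mailbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (result == null && !mailbox.offer(""))
            result = mailbox.take();               // the connection arrived after all
        return (FakeConnection) result;            // null means we timed out
    }

    public static void main(String[] args) throws InterruptedException {
        FakeConnection fast = new FakeConnection();
        System.out.println("in time, returned: " + (connectWithTimeout(fast, 0, 5000) == fast));
        FakeConnection slow = new FakeConnection();
        System.out.println("too late, returned: " + (connectWithTimeout(slow, 1000, 50) != null));
        System.out.println("too late, closed: " + slow.awaitClosed(5000));
    }
}
```

The third case, the exact tie, is hard to provoke on demand, but the capacity-one queue covers it: whichever offer fails tells that thread the other side got there first.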

Making it work

The code above is just an outline, and leaves out some necessary details. We need to refine it in several ways to make it work.

The first refinement we'll need is exception handling. The result of the connection attempt could be an exception instead of a JMXConnector. This doesn't change the reasoning above, but it does complicate the code.

The main thread calls BlockingQueue.poll, which can throw InterruptedException, so we must handle that.

About half of the final version of connectWithTimeout involves footering about with exceptions. It's times like this that I'm inclined to join the checked-exception-haters.

The second refinement is to clean up the connect thread when we're finished with it. The outline code doesn't call shutdown() on the ExecutorService, so every time connectWithTimeout is called, a new single-thread executor is created, and therefore a new thread. If you're lucky, the garbage-collector will pick up your executors and their threads at some stage, but you don't want to depend on luck.

A more subtle point about threads is that the outline code will create non-daemon threads. Your application will not exit when the main thread exits if there are any non-daemon threads. So as written, if you have a thread stuck in a connection attempt and your application is otherwise finished, it will stay around until the connection attempt finally times out. That's pretty much exactly the sort of thing we're trying to avoid. So we'll need to arrange to create a daemon thread instead.

All right, so here's the real code.

    public static JMXConnector connectWithTimeout(
	    final JMXServiceURL url, long timeout, TimeUnit unit)
	    throws IOException {
	final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
	ExecutorService executor =
		Executors.newSingleThreadExecutor(daemonThreadFactory);
	executor.submit(new Runnable() {
	    public void run() {
		try {
		    JMXConnector connector = JMXConnectorFactory.connect(url);
		    if (!mailbox.offer(connector))
			connector.close();
		} catch (Throwable t) {
		    mailbox.offer(t);
		}
	    }
	});
	Object result;
	try {
	    result = mailbox.poll(timeout, unit);
	    if (result == null) {
		if (!mailbox.offer(""))
		    result = mailbox.take();
	    }
	} catch (InterruptedException e) {
	    throw initCause(new InterruptedIOException(e.getMessage()), e);
	} finally {
	    executor.shutdown();
	}
	if (result == null)
	    throw new SocketTimeoutException("Connect timed out: " + url);
	if (result instanceof JMXConnector)
	    return (JMXConnector) result;
	try {
	    throw (Throwable) result;
	} catch (IOException e) {
	    throw e;
	} catch (RuntimeException e) {
	    throw e;
	} catch (Error e) {
	    throw e;
	} catch (Throwable e) {
	    // In principle this can't happen but we wrap it anyway
	    throw new IOException(e.toString(), e);
	}
    }

    private static <T extends Throwable> T initCause(T wrapper, Throwable wrapped) {
	wrapper.initCause(wrapped);
	return wrapper;
    }

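    private static class DaemonThreadFactory implements ThreadFactory {
        // Keep a single delegate so thread numbers count up within one pool name
        private final ThreadFactory delegate = Executors.defaultThreadFactory();
        public Thread newThread(Runnable r) {
            Thread t = delegate.newThread(r);
            t.setDaemon(true);
            return t;
        }
    }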
    private static final ThreadFactory daemonThreadFactory = new DaemonThreadFactory();

The initCause method is only used once but it's handy to have around for those troublesome exceptions that don't have a Throwable cause parameter.

I think it would be awfully nice if java.util.concurrent supplied DaemonThreadFactory rather than everyone having to invent it all the time.

Shouldn't this be simpler?

I admit I'm a bit uncomfortable with the code here. I'd be happier if I didn't need to reason about it in order to convince myself that it's correct. But I don't see any simpler way of using the java.util.concurrent API to achieve the same effect. Uses of cancel or interrupt tend to lead to race conditions, where the task can be cancelled after it has already delivered its result, and again we can get a JMXConnector leak; or we might close a JMXConnector that the main thread is about to return. I'd be interested in suggestions for simplification.
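One possible shortening, offered as a sketch rather than a settled answer: on Java 8 or later (which postdates this entry), CompletableFuture can play the part of the mailbox, because complete and completeExceptionally each succeed for exactly one caller. The names CompletableFutureTimeoutDemo, callWithTimeout and the discard callback are hypothetical; for a JMX connection, discard would close the connector.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

class CompletableFutureTimeoutDemo {
    // Runs task in a daemon thread. If the caller times out first, a result
    // that arrives later is handed to discard instead of being leaked.
    static <T> T callWithTimeout(Callable<T> task, Consumer<T> discard,
                                 long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<T> result = new CompletableFuture<T>();
        Thread worker = new Thread(() -> {
            try {
                T value = task.call();
                if (!result.complete(value))
                    discard.accept(value);       // caller already gave up
            } catch (Throwable t) {
                result.completeExceptionally(t); // no-op if the caller gave up
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            return result.get(timeout, unit);
        } catch (TimeoutException e) {
            if (result.completeExceptionally(e)) // we claimed the future first
                throw e;
            return result.get();                 // the worker won the race after all
        }
    }

    public static void main(String[] args) throws Exception {
        String fast = callWithTimeout(() -> "connected", v -> { }, 5, TimeUnit.SECONDS);
        System.out.println(fast);
        CountDownLatch discarded = new CountDownLatch(1);
        try {
            callWithTimeout(() -> { Thread.sleep(1000); return "late"; },
                            v -> discarded.countDown(), 50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            boolean cleanedUp = discarded.await(5, TimeUnit.SECONDS);
            System.out.println("timed out, late result discarded: " + cleanedUp);
        }
    }
}
```

The reasoning is the same as for the mailbox: exactly one completion wins, and the loser cleans up. Whether that is easier to convince yourself of is a matter of taste. A failure in the task surfaces here as an ExecutionException, so the caller still has to unwrap it.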

Conclusion of the foregoing

This is a useful technique in many cases, subject to the caution above. It's not limited to the JMX Remote API, either; you might use it when accessing a remote web service or EJB or whatever, without having to figure out how to get hold of the underlying Socket so you can set its timeout.

My thanks to Sébastien Martin for the discussion that led to this entry.

Comments

  • Hmm, I come in at #4 and #5 on your 'everyone' query!

    I'm not sure that there is a completely universal solution, and simple concrete pools are available.

    I'm certainly not keen on completely abandoning threads though, ever, unless you impose an upper bound on unreaped threads, else you have created a way of blowing up long-running apps at random running out of memory, etc...

    Rgds

    Damon

    Posted by: damonhd on May 23, 2007 at 02:13 PM

  • Hi,

    what if we cannot open threads because we are using the JMX connection within tomcat ?

    Posted by: applebanana8 on May 24, 2007 at 01:44 AM

  • Damon, I agree with your comment about abandoning threads, which is why I added a caution about when the technique is valid.

    I'm not sure what you mean by "simple concrete pools are available". I'd certainly be interested if there's some library that would implement the logic of "wait for the operation or abandon it and clean it up when it finishes".

    applebanana8, if you can't create threads, then you are screwed, to use a technical term. Seriously, it would still be possible to fall back on the socket timeout option I described at the outset, but that option does vary from somewhat complicated to horribly complicated.

    Posted by: emcmanus on May 24, 2007 at 06:18 AM

  • Hi,

    I meant that things like Executors.newScheduledThreadPool(int corePoolSize) exist, so that a simple static definition can be used to set up a pool rather than writing a new class, except for paranoid perfectionists like me...

    One simple way to cap and use timeouts as just being discussed is to use a thread pool with capped size and then use the facility to have excess work handled in the calling thread when the pool gets full. That might be a reasonable compromise between simplicity and safety, though I'd still prefer to stack up async and relatively cheap file handles than expensive and big threads if possible.

    Rgds

    Damon

    Posted by: damonhd on May 24, 2007 at 01:53 PM

  • Yes, I deliberately chose a solution with an Executor so you could easily replace it with another flavour of Executor. I discarded an alternative solution that required singleThreadExecutor semantics for this reason. As you say, you could easily avoid catastrophic failure in (putatively) pathological cases with a bounded thread pool.

    I do agree that explicit connection timeouts are better if you can persuade your API to provide them. With the JMX Remote API and its RMI connector, that involves some fairly complicated magic with RMIClientSocketFactory, and it doesn't compose well with other uses of socket factories.

    Posted by: emcmanus on May 25, 2007 at 01:32 AM

  • No big deal but I'm assuming that your very last catch handler should have the following line of code instead...


    throw initCause(new IOException(e.toString()), e);

    Posted by: jswift on September 05, 2007 at 04:04 PM

  • jswift, in fact the code works as written on JDK 6. But your fix is necessary to compile on JDK 5, since the IOException constructors with a "cause" parameter didn't exist in JDK 5. Thanks for pointing that out!

    Posted by: emcmanus on September 06, 2007 at 03:11 AM

  • Hi Eamonn,

    Great article.
    One small point on the DaemonThreadFactory. Do you think it should keep an instance of the defaultThreadFactory rather than creating a new one every time? As it stands (for me at least), the names of the threads created are

    [Pool1-Thread1, Pool2-Thread1, Pool3-Thread1, ...]

    which doesn't seem right?
    Keeping an instance of defaultThreadFactory gives
    [Pool1-Thread1, Pool1-Thread2, Pool1-Thread3, ...]

    But again, minor point on a good article.

    Shaun

    Posted by: sabram on February 27, 2008 at 04:41 PM

  • Oops, I just realized that I modified the code to take into account Shaun's excellent suggestion in the previous comment, but forgot to say so. Thanks Shaun!

    Posted by: emcmanus on April 04, 2008 at 09:14 AM

  • Hi,
    I read (http://www.performanceengineer.com/blog/monitoring-weblogic-using-jmx/)
    that the opposite problem is also possible:
    "... If there are a large number of MBeans in your monitored application, you may run into a problem using Windows where the IIOP connection will timeout...."

    To solve this problem the author suggests setting CORBA.transport.ORBTCPReadTimeouts appropriately (because the example refers to RMI/IIOP).
    I don't know exactly what this means, but I suppose it is a low-level solution, and that a similar result could be obtained using your smart solution with a suitable timeout.
    Is that correct?
    Thanks

    Posted by: lukebike on May 27, 2008 at 04:08 AM

  • lukebike, CORBA is a mysterious and disturbing black box which I try not to go near if I can avoid it.

    I think you are right that the problem described there is the opposite of what I was talking about. I was explaining how you can abandon a connection if it doesn't work after a certain time. The article you link to describes a system that already does this, but sometimes too soon. Therefore you need to adjust its timeout, using the property they mention.

    Posted by: emcmanus on May 27, 2008 at 04:36 AM

  • Good article, I've been battling with a similar problem myself. Have you thought about creating a proxy that would time out for you and connecting to the jmx server through this proxy? This way you don't need to modify the client application at all and avoid all the problems with concurrency.

    Posted by: iminar on June 10, 2008 at 10:09 AM

  • iminar, I'm not sure what you have in mind. An unmodified client application will typically connect using JMXConnectorFactory.connect(jmxServiceURL). You could do some magic with jar service providers so that a JMXServiceURL looking like service:jmx:rmi-timeout://blah/blah connects to service:jmx:rmi://blah/blah but with the timeout logic above. Is that the sort of thing you were thinking of?

    Posted by: emcmanus on June 11, 2008 at 12:49 AM

  • I wrote a simple proxy server that a client can connect to. The proxy server then re-sends all the requests to the (jmx) server. Since I have full access to the network socket connected to the (jmx) server, I can easily set the timeout.

    I'm testing this solution right now. So far it looks good.

    cheers,
    Igor

    Posted by: iminar on June 11, 2008 at 11:09 PM


