
Dealing with real-time data in UIs (and others)

Posted by carcassi on February 15, 2012 at 12:57 PM PST

I've been working for a number of years now in the NSLS-II Control System group, creating tools that hook up to the control system. What I do is soft real-time stuff (I can drop data on the floor, I don't have hard latency requirements, etc.), mostly writing clients that display the data and let operators interact with the control system.

In these conditions, you have to think through all your threading, throttling, caching and queueing, or you'll have unresponsive UIs that don't even show current data. Surprisingly, the problems are always the same, but they are tedious and easy to get wrong. I was able to put together a general purpose library that takes care of all the plumbing, but before going into that I thought I would give an overview of the problems.

These issues are actually relevant to most UIs in any language (typically the ones that don't shine for responsiveness), so a brief introduction may be useful to many.

What not to do

Let's say you have some library that notifies you asynchronously (a control system, a web service, a file system, SNMP, ...), something like:

source.addNotificationListener(new NotificationListener() {
    public void notification(NotificationEvent event) {
        // Now what?
    }
});

You are tempted to do something like:

source.addNotificationListener(new NotificationListener() {
    public void notification(NotificationEvent event) {
        myWidget.setFoo(event.getBar());
    }
});

Which is horrible, because UI subsystems are typically single-threaded, so this is just a mess. So, you consult your toolkit documentation (or any blog, since this has been blogged to death), and you figure out how to dispatch a Runnable to the UI thread. So you write something like:

source.addNotificationListener(new NotificationListener() {
    public void notification(final NotificationEvent event) {
        MyToolkit.asyncExec(new Runnable() {
            public void run() {
                myWidget.setFoo(event.getBar());
            }
        });
    }
});

Now this is less horrible, but you may still have a problem: the event may just point to a buffer that is guaranteed to hold the event data only within the callback. After that, it may get overwritten with a new event, so without proper synchronization you may or may not be accessing the right data, which is still horrible. Even if you get the synchronization right, you still need to notify the UI whenever that object has actually changed, or it won't repaint, leading to the worst thing you could have: the data in memory is not the data on screen. Better to have an application crash, die and display no information at all than display the wrong information!

So you may need to make a copy, and an immutable one would be better. If it's mutable, you may need to worry about locking, and check what effect the locking can have on your UI thread (every ms lost there is lost forever).

OK, so you get the javadocs, you figure out that you indeed need to make a copy, you make it immutable, and you have:

source.addNotificationListener(new NotificationListener() {
    public void notification(NotificationEvent event) {
        final Bar barCopy = copyBar(event.getBar());
        MyToolkit.asyncExec(new Runnable() {
            public void run() {
                myWidget.setFoo(barCopy);
            }
        });
    }
});

Good, now you have something that at least writes the right data on the right thread... which works ok, until it's under load. See: you are pumping tasks into the (single) UI thread at the rate you get from your source (possibly from multiple threads). Say you have 100 listeners, and each notifies at 10 Hz: now you have 1000 notifications a second hammering the (single) UI thread. Which is, again, horrible.

First, this may be a complete waste. The screen refreshes at 60/70 Hz, so any update faster than that is useless. If it's text you are displaying, and you are changing it 50 times a second, who is going to be able to read it? Updates faster than a couple of Hz may be more "precise", but can be counterproductive in this case.

Second, you can start queuing up tasks faster than the event thread can process them, which can completely clog the event thread, making the UI unresponsive, and can, after some time, exhaust the memory you have available and crash the whole application (if lucky) or leave it in a completely indeterminate state (if not lucky). Not a good thing.

And here is the main take-home message: when you are listening to asynchronous events, you need to decouple the rate at which they are received from the rate at which they are processed. You often need to aggregate, so that a number of events on one side generates a single event on the other. And while you are at it, you may want to do the heavy-lifting computation there, so that your UI (if you have one) gets only what it needs to put on screen when it's done.
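To make this concrete, here is a minimal sketch of such a decoupling, reusing the placeholder types from the snippets above (Bar, MyWidget, MyToolkit, NotificationEvent, copyBar); the fixed 25 Hz rate is just an illustrative choice. The source threads only overwrite a last-value cache, and a separate task forwards it to the UI thread at its own pace, no matter how fast notifications come in:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CoalescingBridge {
    // Last-value cache: extra events are silently dropped
    private final AtomicReference<Bar> latest = new AtomicReference<Bar>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // Called from the source thread(s), at whatever rate the source chooses
    public void notification(NotificationEvent event) {
        latest.set(copyBar(event.getBar()));
    }

    // Forwards at most 25 updates a second to the UI thread
    public void start(final MyWidget myWidget) {
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                final Bar bar = latest.getAndSet(null);
                if (bar == null) {
                    return; // nothing new since the last tick
                }
                MyToolkit.asyncExec(new Runnable() {
                    public void run() {
                        myWidget.setFoo(bar);
                    }
                });
            }
        }, 0, 40, TimeUnit.MILLISECONDS);
    }
}

Many notifications in, at most 25 updates out per second: the rates are decoupled, and the aggregation policy (here, keep only the last value) is explicit.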

What you need

In the vast majority of cases, you'll end up with some incarnation of this:

On the right half you have processing that goes at the rate set by the source. On the left, you have processing at a rate set by the destination. Don't be fooled: there is always an optimal rate, above which things become either very inefficient or simply useless. You need to show data in a UI? Above 50 Hz is going to be useless. You have to write data to a database? You may not want to send more than a query a second. You need to write to disk? You may want to fill a buffer first. So, even in the optimal case, you'll have to decouple rates.

In the middle, you have what I call the collector: it takes data in at one rate, and it is queried at another. This is where the synchronization needs to be done. And this is where you have to decide what to do with the extra notifications: do I only need a cache with the last one? Or do I need a cache with the last N? Or a cache with the ones that came in the last N seconds? Or do I want a queue, with only the new ones? And if the queue fills up, which data should I dump? Should I save it to disk? And if the disk is full? Should I send an alarm to someone?
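As a sketch, a "cache with the last N" collector could look like the class below (Bar is still the hypothetical payload type from the examples above); the eviction policy in add() is exactly the kind of decision just listed:

import java.util.ArrayList;
import java.util.List;

public class Collector {
    private final List<Bar> cache = new ArrayList<Bar>();
    private final int maxSize;

    public Collector(int maxSize) {
        this.maxSize = maxSize;
    }

    // Called at the rate set by the source
    public synchronized void add(Bar bar) {
        cache.add(bar);
        if (cache.size() > maxSize) {
            cache.remove(0); // policy decision: dump the oldest
        }
    }

    // Called at the rate set by the destination; returns and empties the batch
    public synchronized List<Bar> drain() {
        List<Bar> batch = new ArrayList<Bar>(cache);
        cache.clear();
        return batch;
    }
}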

Between the source and the collector, you may have some processing. You may want to extract only the data you are interested in, so you don't waste memory in the queue, or do some pre-computation on it, including the copying we talked about.

On the other side, you have what I call the scanner: it starts the processing at the desired rate. This could be a timer, could start based on some notification from the collector (maybe based on how much data was collected), or could be triggered by readiness of the target subsystem. Whatever logic starts it, the scanner will need to throttle back: it will need to adapt the rate to what the target subsystem can handle. Suppose you want to update your UI at 50 Hz, but your UI thread is busy: you need to skip, or you are just going to compound the problem. Same thing if your database is not responding as fast as you'd like. By skipping and consolidating the processing into fewer notifications, you will typically have a system that is sturdier and much better at "catching up" when it's behind.
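One possible sketch of that throttling, building on the Collector above (MyWidget and MyToolkit are still the placeholders from the earlier snippets): the scanner refuses to queue a second UI task while one is still pending, so a busy UI thread causes skips instead of a backlog:

import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class Scanner {
    private final Collector collector;
    private final AtomicBoolean pending = new AtomicBoolean(false);

    public Scanner(Collector collector) {
        this.collector = collector;
    }

    // Called by a timer, or by the collector when new data arrives
    public void scan(final MyWidget myWidget) {
        // If the previous task has not run yet, skip: its drain() will
        // pick up this data too, consolidating the work
        if (!pending.compareAndSet(false, true)) {
            return;
        }
        MyToolkit.asyncExec(new Runnable() {
            public void run() {
                pending.set(false);
                List<Bar> batch = collector.drain();
                if (!batch.isEmpty()) {
                    myWidget.setFoo(batch.get(batch.size() - 1));
                }
            }
        });
    }
}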

The scanner will trigger some other processing, aggregating the data both in time and from different sources, and computing the end result. And in the end, you will ship the final result to the target subsystem. In the UI case, that means queueing it to the UI thread of your toolkit.

Plenty of cases fit this picture, and, once you look at it this way, it's actually pretty simple. But it's tedious: you have to do all these things, always, for all the applications you are building, and you have to test that the collector is doing it right, that the scanner is throttling back, ... If you, like me, need to crank out different applications, then you'll want a framework to do this. A framework that can work with whatever the destination needs to have and the source gives. A framework that does not care whether you are using Swing, SWT or whether you have a UI at all! Where you can unit test the pipeline with mock data sources and without the UI. And that is what I have been working on. And that's the topic for another post.

 

Attachment: DecouplingRate.png (89.66 KB)

Comments


The article is missing a link to Gabriele's work:

http://pvmanager.sourceforge.net


Nice to hear from you :-) even though with some delay (I've been in a black hole). I agree with you, I've seen this problem lots of times (even though not with the numbers you experience in your job) and decoupling is the key. The way I'm approaching it is by means of agents, which sounds like the same stuff (decoupled threads, queues of messages, etc...). I've done a very simple prototype and now I'm trying to see whether it can be implemented on an existing actor framework, such as Akka. Unfortunately I'm really behind my original schedule.


One problem I've had is that decoupling these too much leads to a lot of latency in updates. This isn't too bad for updates of values, but for status updates this can cause trouble. We had one very painful demo years ago of a system where status updates were all polled across 3 layers, and the updates were very slow. The (potential) customer would pull a cable, then start looking at his watch and tapping his foot while the status information took 30 seconds or a minute to burble up through the API.

Since that experience I've tried to have push-based status updates wherever possible. This does complicate things, because the status push may not be on the same thread as the periodic update, but from a user-perception point of view, having a snappy response to errors is very valuable, especially when demonstrating and testing the system.


The latency that I am introducing is on the order of a couple of milliseconds, mainly because I didn't bother making it better. It could be brought to zero latency in the following way:

  • Assume max notification rate is 1000 Hz (i.e. minimum spacing 1 ms)
  • Collector notifies scanner that new data is available
  • If last notification happened more than 1 ms ago, process right away
  • If last notification happened less than 1 ms ago, schedule a new one for 1 ms after the previous notification

In case you get a burst of 1000 notifications, the first goes through with minimum latency, and all the others are combined into the next one for maximum throughput.
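In code, that logic could look something like this sketch (the ScheduledExecutorService and the fixed 1 ms spacing are illustrative assumptions; the processing callback is a placeholder):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RateLimiter {
    private final long minSpacingNanos = TimeUnit.MILLISECONDS.toNanos(1);
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private long lastRunNanos = System.nanoTime() - minSpacingNanos;
    private boolean scheduled = false;

    // Called by the collector whenever new data is available
    public synchronized void dataAvailable(final Runnable processing) {
        if (scheduled) {
            return; // a run is already pending; it will see this data too
        }
        long elapsed = System.nanoTime() - lastRunNanos;
        if (elapsed >= minSpacingNanos) {
            // Last run was more than 1 ms ago: process right away
            runNow(processing);
        } else {
            // Less than 1 ms ago: schedule 1 ms after the previous run
            scheduled = true;
            executor.schedule(new Runnable() {
                public void run() {
                    synchronized (RateLimiter.this) {
                        scheduled = false;
                        runNow(processing);
                    }
                }
            }, minSpacingNanos - elapsed, TimeUnit.NANOSECONDS);
        }
    }

    private void runNow(Runnable processing) {
        lastRunNanos = System.nanoTime();
        processing.run();
    }
}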

The way that I think of decoupling is not push vs. pull. It's the ability to have one outgoing notification for N incoming notifications, so that the incoming rate can be different from the outgoing rate. However you do it (push or pull), it does not matter: whatever works best in that case. But you need to do it. If you are not prepared to "batch" requests, you may waste time under stress, make it a lot harder for the system to catch up, and introduce delays: I have seen UIs that kept going for minutes after an initial burst before they caught up with the live data!