JavaFX Life: From Script to Production

Posted by opinali on June 4, 2010 at 11:20 AM PDT

I've finished the development of my Game of Life, with a couple final fixes and new features... including a solution to the bad performance reported before. Once again the work has uncovered some surprises; read on.

Un-Scripting JavaFX Script

The first version used a "scriptish" style: all code thrown into a single .fx file, with only average effort on structure. Now I have three files: World.fx with the World class (data model and Life algorithms); IO.fx with new support for loading patterns; and Main.fx with the UI. This refactoring required declaring some classes, functions or properties as public[-read|-init]. Some extra noise, but the Java veteran inside me feels much warmer and fuzzier with encapsulated code. I still appreciate, though, the ability to bang out prototype code without thinking about such issues.

I'm a bit annoyed with the absence of private visibility, but arguably that's unnecessary: if you have global functions/variables or multiple classes in the script, you are likely in the prototype stage and won't bother with encapsulation. On the other hand, I'm worried that the javafxc output uses public visibility for all source features, losing VM-level enforcement of visibility. The bytecode contains some annotations like @ScriptPrivate, but these serve only the compiler; they are ignored by the VM's classloading and verification. You cannot trust JavaFX's visibilities for security purposes. A more important impact, perhaps, is that bytecode optimization/obfuscation tools can't take full advantage of restricted visibility for closed-world analyses.

Some I/O and Parsing

Several people complained that it's too much work setting Game of Life (GOL) patterns manually, one click per cell. The Internet is literally infested with GOL resources. (Indeed, the web can be divided into four major groups: Game of Life; Fractals; Retrocomputing; and Boring sites. Thanks to me, java.net is just moving out of the Boring category.) There are many popular GOL programs for all computers since the ENIAC, and they have developed a few standard file formats, the most popular being LIF and RLE (each with a couple of variants...). The LIF (Life 1.06) format is braindead simple; it can be parsed with very modest code:

function parseLIF (text:String):Point2D[] {
    for (cell in text.split("\n") where indexof cell > 0) {
        def xy = cell.split(" ");
        Point2D { x: Integer.valueOf(xy[0]) y: Integer.valueOf(xy[1]) }
    }
}

My parsing functions use JavaFX's Point2D as a cell coordinate; the output is a sequence of such coordinates for all "live" cells in the pattern. I can use Java's string manipulation facilities, including regular expressions, so the job is pretty easy. JavaFX's sequences and generators contribute again to minimal coding.
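For readers more at home in Java, the same idea can be sketched roughly as follows. This is only an illustration, not part of the Life program: the class name LifParser and the use of int[] pairs instead of Point2D are assumptions made for the example, and like parseLIF() it assumes the first line is the "#Life 1.06" header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class LifParser {
    /** Returns one {x, y} pair per live cell listed in a Life 1.06 text. */
    static List<int[]> parse(String text) {
        List<int[]> cells = new ArrayList<>();
        String[] lines = text.split("\n");
        // Start at 1: the first line is assumed to be the "#Life 1.06" header.
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] xy = line.split("\\s+");
            cells.add(new int[] { Integer.parseInt(xy[0]), Integer.parseInt(xy[1]) });
        }
        return cells;
    }
}
```

The explicit list and loop index are exactly the noise that the JavaFX Script for expression avoids.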

Problem: I've used String.split(), not available on the JavaFX Mobile platform. The compiler will catch this only if I reconfigure the project for JavaFX Mobile.

RFE for the people writing IDE plugins: allow me to create a project of type "JavaFX library" that I can configure for any of the JavaFX profiles, including common, so the compiler will allow me to use only the strict set of APIs (from both JavaFX and the underlying Java runtime) that are guaranteed to exist in the selected profile. Notice that the mobile profile is a proper subset of desktop, but only for the JavaFX APIs; for the underlying Java APIs this is not true. I cannot use mobile as a G.C.D. configuration for code that should run in any profile, because this would allow the project to use JavaME-specific APIs that are not available in JavaFX Desktop (even CLDC alone includes at least javax.microedition.io - the base GCF package).

This reminds me that the Generic Connection Framework is a great API that should really be available on JavaSE. That was the plan of JSR-197, but unfortunately this idea never took off. JavaFX lacks a full-blown I/O API; javafx.io is a good start as an "80/20 rule" API for higher-level needs, but many complex programs will be tied to tons of java.io / java.nio / java.net / ..., or equivalent JavaME APIs. Except that they wouldn't, if the GCF was an official part of JavaSE. Perhaps Oracle should promote this idea - add the JSR-197 jars to the JavaFX runtime as an extension package (i.e. a separate jar, only downloaded or loaded by apps that need it). But inclusion in JavaSE would be much better, perhaps also solving that platform's embarrassing deficiency of standard support for some kinds of I/O (yes, I know about JavaComm, which is another part of the problem, not the solution).

But the LIF format is very dumb (bloated files); what you really want is the RLE format:

function parseRLE (text:String):Point2D[] {
    def lines = for (line in text.split("\n") where not line.startsWith('#')) line;
    def header = for (l in lines[0].split(", ")) l.substring(l.lastIndexOf('=') + 1).trim();
    def x = Integer.valueOf(header[0]);
    def y = Integer.valueOf(header[1]);
    var currX = 0;
    var currY = 0;
    def run = new StringBuffer();
    for (line in lines where indexof line > 0) {
        for (i in [0 ..< line.length()]) {
            def c = line.charAt(i);
            if (Character.isDigit(c)) {
                run.append(c);
                null
            } else {
                def len = if (run.length() == 0) 1 else Integer.valueOf(run.toString());
                run.setLength(0);
                for (l in [1..len]) {
                    if (c == '$'.charAt(0) or currX == x) {
                        ++currY;
                        currX = 0;
                    }
                    if (c == 'b'.charAt(0) or c == 'o'.charAt(0)) {
                        def cell = if (c == 'o'.charAt(0)) Point2D { x: currX y: currY } else null;
                        ++currX;
                        cell
                    } else null
                }
            }
        }
    }
}

The big, outer for loop that contains most of parseRLE() will produce (and return) a Point2D[] (the return type declaration is optional, the compiler could infer it). The entire state machine that parses the RLE format is inside this for. Each step through the state machine will either deliver a Point2D value that is appended to the return sequence (actually, to a sub-sequence that is eventually flattened into the return), or a null value that is ignored (JavaFX Script's sequences cannot contain null; inserting null is a no-op). It's a nice example that justifies both the auto-flattening and the restriction of nulls. These features let me code parseRLE() in a quasi-functional style, without any ugly explicit sequence mutation. The only explicit variables are the locals currX and currY, part of the state of my state machine. The remaining state is the iteration variables line, i and l, but these are all "managed" - JavaFX Script fixes Java's mistakes by not allowing user modification of loop control variables, and also function parameters. This makes the for construct functional, unless you throw extra variables and assignments.

This is the popular Glider pattern in RLE format:

# The Glider
x = 3, y = 3
3o$o$bo
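As a cross-check of the format, here is a rough Java sketch of an RLE decoder run against the Glider above. It is not a translation of parseRLE(): it applies a run count to '$' as that many line ends, stops at the '!' terminator, and does not wrap at the declared width. The class name RleParser, the int[] pairs, and the assumption that the first non-comment line is the "x = ..., y = ..." header are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class RleParser {
    /** Returns one {x, y} pair per live cell encoded in an RLE pattern. */
    static List<int[]> parse(String text) {
        List<int[]> live = new ArrayList<>();
        int x = 0, y = 0, run = 0;
        boolean headerSeen = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // comment or blank line
            }
            if (!headerSeen) {
                headerSeen = true; // assumed header: "x = 3, y = 3"
                continue;
            }
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (Character.isDigit(c)) {
                    run = run * 10 + (c - '0'); // accumulate the run count
                    continue;
                }
                int len = (run == 0) ? 1 : run; // a missing count means 1
                run = 0;
                switch (c) {
                    case 'b': // dead cells: just advance
                        x += len;
                        break;
                    case 'o': // live cells: record each one
                        for (int k = 0; k < len; k++) {
                            live.add(new int[] { x++, y });
                        }
                        break;
                    case '$': // end of row(s)
                        y += len;
                        x = 0;
                        break;
                    case '!': // end of pattern
                        return live;
                    default:  // ignore anything else
                        break;
                }
            }
        }
        return live;
    }
}
```

For the Glider text this yields five cells: (0,0), (1,0), (2,0), (0,1) and (1,2).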

The most remarkable thing in parseRLE() is the ugly handling of characters, e.g. if (c == '$'.charAt(0)). JavaFX Script doesn't have a first-class character type, a common trait of scripting languages. The problem is, JavaFX Script does not "box" chars - coming from non-FX APIs like String.charAt() - into strings of length 1. These chars remain with the Character type. But the language doesn't have a character literal syntax; '$' is a string and not a character. Writing if (c == '$') will grant you a compiler error about incomparable Character and String types.

RFE: Either add a character literal syntax, or promote chars to strings (but with the necessary unboxing optimizations, to keep the efficiency of a simple char wherever possible).

The problem is bigger than this, however; even strings are a second-class type in JavaFX Script. It seems to me that strings should be handled as a special kind of sequence, whose elements are characters (or 1-char strings). I want to iterate a string with for (c in line); I want to get a substring with slicing syntax like line[5..<10]. Today you can declare a variable with the sequence type Character[], which is even optimized internally with a special-cased sequence class CharArraySequence; but that is completely unrelated to the String type.

First-class support for strings could be added as compiler sugar. The same old good, efficient and interoperable java.lang.String class could be used to store string data, without any extra wrapper; but the compiler would overload the syntaxes of sequences and for to handle strings. As a simple example, the code:

noSpaces = for (c in line where not Character.isSpace(c)) c;

could be de-sugared into this (Java) code:

StringBuilder noSpaces$sb = new StringBuilder();
for (int c$index = 0; c$index < line.length(); ++c$index) {
    char c = line.charAt(c$index);
    if (!Character.isSpace(c)) {
        noSpaces$sb.append(c);
    }
}
String noSpaces = noSpaces$sb.toString();

Now I know that this is easier said than done, because it's not just throwing a handful of special-case translations. The right way to do this requires that strings and sequences are "normalized" to a sufficiently homogeneous AST, so the code generation is able to implement either common or separate handling as necessary, for every combination of strings vs. other kinds of sequences, as well as with other language features.

The language already performs some custom handling of strings, for interpolation with {}. A great start, but we need more :) Besides integration with sequences, first-class (and portable) regex support would be another hit. This obvious RFE is already filed as JFXC-2757: JavaFX Script should support regex literals, and as the comments explain, it's not as easy as in other languages that have this feature, because there are interactions with binding and triggers. (But this also means that first-class regex would be more powerful than in other languages.)

Reading from the Web

I won't embed Life patterns in the program; it will fetch these from the web. The site conwaylife.com contains many patterns, well organized and available at stable URLs and in several formats. The front page is also a great Life Java applet, a surprise for me because it loads very fast and smooth. When I wrote the original Life blog & program, I didn't find this superior Java GOL (but that's a very complex, optimized implementation - the Game of Life (and cellular automata in general) allows some crazy optimizations - not adequate for my purposes).

public class LifeRequest extends HttpRequest {
    public-read var result:Point2D[];
    override var onInput = function (is) {
        try {
            def sb = new StringBuffer(is.available());
            while (is.available() > 0) sb.append(is.read() as Character);
            result = parseRLE(sb.toString());
        } finally {
            try { is.close() } catch (e:IOException) {}
        }
    }
}

Class LifeRequest makes an HTTP request to a URL that contains a Life pattern in RLE format, then reads the input stream and parses it. Yeah, the code that consumes the stream is stupid (one byte at a time). But it seems the underlying HTTP stream - for the record, an FX-specific com.sun.javafx.io.http.impl.WaitingInputStream - is buffered; I didn't notice any performance impact reading large patterns. Once again I wish we could have some extra string power, or perhaps higher-level I/O APIs. I cannot use methods like read(byte[]) because I don't want to write a Java class just to allocate a nativearray. And I don't want to rely on additional JavaSE-only APIs like BufferedReader either; even that wouldn't help a lot - I'd still need a loop, invoking readLine() for each line and using a StringBuilder. What I really need is an API that "slurps" the whole stream into a string. Or perhaps something more JavaFX-style, like being able to create a "view sequence" of several component types (think java.io buffers); this would probably need the language to offer forward-only sequences that support sequential iteration but not random access (but this opens yet another big avenue of new language designs... let's skip that).
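On the Java side, such a "slurp" helper is only a few lines. A minimal sketch, assuming a hypothetical Streams.slurp() helper and UTF-8 decoding (a superset of the plain ASCII found in RLE files). It loops until read() reports end of stream instead of trusting available(), which only estimates how many bytes can be read without blocking and may return 0 before the stream is exhausted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Streams {
    /** Reads the whole stream into a String, closing the stream afterwards. */
    static String slurp(InputStream in) {
        try (InputStream input = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 only at end of stream.
            while ((n = input.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumption: the content is ASCII/UTF-8 text.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This is the kind of code that needs a byte array, which is why it cannot be written directly in JavaFX Script today.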

def patterns = [
    "b52bomber", "B-52 bomber",
    "blinkerpuffer1", "Blinker puffer",
...
];

Next comes a static list of the Popular Patterns offered by the site mentioned above. This is a simple list of key/value pairs, where the key is the part of the URL that will fetch the data. Except, of course, that this is a flat sequence. Now I can plug once again my favorite RFE: I need a native map data type. :-)
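For comparison, this is roughly what folding such a flat key/value list into a real map looks like when dropping down to Java. The class and method names are made up for the example, and only the two entries shown above are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class PatternTable {
    /** Folds a flat [key, value, key, value, ...] list into an ordered map. */
    static Map<String, String> fromFlat(String... flat) {
        if (flat.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        // LinkedHashMap keeps the original order, which matters for a UI list.
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < flat.length; i += 2) {
            map.put(flat[i], flat[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> patterns = fromFlat(
            "b52bomber", "B-52 bomber",
            "blinkerpuffer1", "Blinker puffer");
        System.out.println(patterns.get("b52bomber"));
    }
}
```

With a map, the index arithmetic on the flat sequence (even positions for keys, odd positions for display names) disappears.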

/*** Begin Digression: How "complete" should JavaFX be?

At this point, I hear some people screaming - "just use Java!!" for these things that JavaFX is not yet ideally suited for, like nontrivial string manipulation or I/O. And don't pile up layers of new RFEs demanding the language to become more powerful (read: complex) and the javafx.* APIs more complete (read: bloated).

In fact, I often don't even need to drop to Java code; I can just use Java APIs directly, in the Java way (without insisting on support for sequences and other JavaFX features) but all inside normal JavaFX Script functions. That would be uglier JavaFX Script code, but would arguably be smoother than moving some code into a separate .java source, with a different syntax and harder integration, e.g. for methods that would need to call back into JavaFX Script objects. (The only problem here is that I cannot allocate a nativearray from JavaFX Script.)

All so-called scripting / higher-level languages assume that you may have to fall back to "system" code for some tasks. That's why languages like Ruby, Python, Perl etc., have a system interface (to C language / native shared libs) that's much less torturing than JavaSE's JNI. For alternative JVM languages it's even better, the system fallback usually means calling Java classes, not C/native code. Even with issues like the SE-vs-ME fragmentation, Java is usually an order of magnitude better than C as a system-level language for carrying the load that a higher-level language cannot. (For the few exceptions, there's still JNI so you lose nothing... well, except for that JNI=torturing detail.)

The only issue, of course, is where exactly to draw the line. People coming from Java may consider JavaFX already good enough. You can't build a complex app in pure JavaFX, but so what? "It's a goddamn UI DSL! Just use Java for any non-UI work." I don't see it that way; I think JavaFX has great potential to be a great platform on its own.

Even if you buy the DSL argument, the frontiers between application layers are blurred and dynamic... even in a well-architected front end, the UI typically shares significant code with other layers: POJOs, validation, general utilities. And you have lots of communication between these layers, e.g. querying some business Facade to populate a form. This is typically smooth when all layers share a single language and SDK, but much harder otherwise. And what happens when you change your mind or find a design mistake, and need to push a bit of code from one layer to another? Any refactoring that straddles a barrier of language/SDK will be much more difficult, certainly beyond the ability of IDEs' automatic and safe refactoring commands... Obviously, it's much more convenient being able to code the entire application in a single language/SDK. Then you fall back to the system level in a much more limited and ad-hoc manner, e.g. to optimize a performance-critical algorithm, or to better reuse a system library that doesn't have a wrapper for the higher-level language/SDK, or for legacy support, etc.

The high-level language/SDK should provide at least the reasonable basics, on all fundamental features. That RFE for a built-in map type is fundamental, because you can go very far with "only" sequences and maps, while only sequences is definitely limited (if you ignore performance, having only maps would be less limited; maps are more general). But having a very rich data structures library, like JavaSE's java.util, is not fundamental - I'd say >95% of the Collections API are just performance optimizations (or convenience algorithms/APIs e.g. Stack) over the basic list/sequence & map that most scripting languages offer as their single built-in data structures.

Notice that language-integrated data structures are very powerful; the compiler can often make decisions such as selecting a specialized implementation of sequences or hashtables that's more efficient for a specific program usage. You don't need manual choices such as ArrayList vs. LinkedList: you trust the compiler to make that choice. Only when the compiler fails in such magic optimizations, and only when that failure is found to be a significant performance problem, do you optimize it manually.

I don't want to bloat the JavaFX APIs either, but many interesting FX-specific APIs could be implemented as a thin layer over some SE/ME-specific APIs. We still need that FX layer because it makes the same features more portable, and more powerful and easier to program as the API can take advantage of features like binding, sequences and first-class functions & closures. This is again not different from other JVM languages, see for example Groovy or Scala. Both communities seem to believe that it's worth the effort and runtime size, to either wrap or replace many Java APIs like JAXP, Swing, Collections, concurrency, JDBC; or to provide full-new frameworks for critical tasks like web development. Not to mention the languages that are independent from the JVM and carry over their own completely independent set of standard libraries for everything, plus big app frameworks (e.g. Rails for JRuby).

Compared to these languages, JavaFX would need fewer and lighter API wrappers. The language is very close to Java (Groovy looks closer to a superset of Java; but Groovy's dynamic typing and high reliance on metaprogramming make it actually much less close than the surface syntax suggests). Different from the likes of JRuby, there's no need to support any feature or library that was not designed for the JVM. Different from Clojure, there's no radical paradigm shift towards full-blown functional programming. I think we could have a nice set of "thin wrapper" APIs, with very small weight in runtime size and CPU/memory overhead, to cover a very good range of extra functionality like XML(*), I/O, concurrency, perhaps some enterprise / distribution stuff (CDI and some extra client-side support for trivial consumption of EJB / JMS / JAX-WS servers), etc. The NetBeans JavaFX Composer already has some draft of this - if you add a JDBC Data Source to your design, Composer will spit ten .fx files into your project - a thin FX API for things like RecordSet. But everybody hates these IDE-proprietary libraries. I guess that in the future, these will evolve into official JavaFX APIs, e.g. javafx.sql. The canonical example is JavaSE 6's GroupLayout, first born as a proprietary library of the NetBeans "Matisse" Swing editor.

(*) Yes JavaFX does XML, but it's a simple API with its own small parser implementation. The same is true for some other JavaFX APIs that one could imagine to be thin wrappers for Java APIs. This is actually nice for light weight (no Mb-size parser like Xerces making your applets slower to load) and portability (exact same parser implementation used in all JavaFX profiles). But some apps will need the full power of JAXP, and JavaFX could make this power available, with a friendly JavaFX wrapper, at least for the higher profiles like desktop and tv.

End Digression: How "complete" should JavaFX be? ***/

Back to the UI...

def patternCB = ChoiceBox {
    layoutInfo: LayoutInfo { width: 160 }
    items: for (p in patterns where indexof p mod 2 == 1) p
}

This new ChoiceBox allows me to pick one of the patterns.

onMouseClicked: function (e:MouseEvent) {
    if (e.button == MouseButton.PRIMARY and not
            (e.altDown or e.controlDown or e.shiftDown or e.metaDown)) {
        world.flip(xx, yy);
    } else {
        def req:IO.LifeRequest = IO.LifeRequest {
            location: "http://www.conwaylife.com/pattern.asp?p={
                patterns[patternCB.selectedIndex * 2]}.rle"
            onDone: function () { world.set(xx, yy, req.result) }
        }
        req.start();
    }
    toolbar.requestFocus();
}

I've changed the existing mouse event handler: now only the left mouse button will toggle a cell. For the right button (UPDATE: or your Mac's single button + any control key), I pick the ChoiceBox selection, do some simple arithmetic to get its "key", build a full URL, then invoke the LifeRequest. I provide an onDone handler that passes the result (as well as the closure-captured cell position) to the new World.set() function:

public function set (x:Integer, y:Integer, cells:Point2D[]):Void {
    for (cell in cells) {
        def xx = (x + cell.x) as Integer;
        def yy = (y + cell.y) as Integer;
        if (xx >= 0 and xx < SIZE and yy >= 0 and yy < SIZE)
            this.cells[yy*SIZE + xx] = true
    }
}

The latter is pretty easy. It would be half the size if I didn't have to cast Point2D's coordinates to Integer (this reuse of Point2D was questionable... but I'm lazy). Notice that the Life pattern is contained in a rectangle, and I rubber-stamp the live cells in that rectangle to the world, using the selected cell as the top-left corner.
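The rubber-stamping step, with its clipping against the world's borders, can be sketched in Java as follows. This is an illustration under assumptions, not the program's code: the Grid class, its stamp() method, and the int[] pattern cells are invented for the example, while the row-major layout (index = yy * size + xx) follows the World code above.

```java
// Hypothetical stand-in for the World class, for illustration only.
final class Grid {
    final int size;
    final boolean[] cells; // row-major: index = y * size + x

    Grid(int size) {
        this.size = size;
        this.cells = new boolean[size * size];
    }

    /** Stamps the pattern's live cells with (x, y) as the top-left corner. */
    void stamp(int x, int y, int[][] pattern) {
        for (int[] cell : pattern) {
            int xx = x + cell[0];
            int yy = y + cell[1];
            // Both coordinates must lie in [0, size); cells that fall
            // outside the world are silently clipped.
            if (xx >= 0 && xx < size && yy >= 0 && yy < size) {
                cells[yy * size + xx] = true;
            }
        }
    }
}
```

Stamping a Glider near the bottom-right corner of a small grid sets only the cells that fit; the rest are dropped rather than wrapped around.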

Exercise for the reader: (or maybe I will do it later) Make the right-click-down event activate an outline rectangle with the exact width/height of the selected pattern; so at right-click-up the pattern is actually set in the world. This needs reading the pattern at right-click-down, so you know its shape... a better idea is reading it even before, when the ChoiceBox selection is set or changed; just do that in background so the UI doesn't freeze. Then the pattern loading would appear to happen instantly. In a variant of this idea, instead of a boring rectangular outline, the preloaded pattern could be overlaid (with the obvious translucency-with-radial-fade effect) on top of the live world, until you "drop" it in the desired position.

Behold!...

The finished program, for this version - click to launch. (If you didn't read the whole blog: use right-mouse click, or click while pressing any control key, to load the selected pattern at the cell under mouse pointer.)

Life2

The source code is now 3 files and ~200 LOC, including imports and metadata for 25 patterns. Notice that the "oscillator" patterns are also good for performance benchmarking.

The screenshot above is taken with Prism; it's noticeably different from the previous screenshot (antialiasing of the rectangle borders). I'm not sure which toolkit is "wrong" here, but most likely Prism as it is still in early access, and its output looks more "blurred".

Performance Mystery I: JavaFX Script Functions

The JavaFX team clarified to me that they don't recreate the internal scene graph nodes after property changes (like I do with Rectangle.fill); this destroys my obvious shot for the cause of bad performance. On the other hand, they found that text formatting and rendering (for my status label) was a bottleneck (at least for the simpler tests without actual Life action). Part of the problem here is bug JFXC-3483: Use of String.format for string concatenation hurts performance.

I then tried some quick profiling with the NetBeans Profiler, and a lot of cycles go into binding (remarkably, runtime methods like notifyDependents()), and into several compiler-generated methods like World$1Local$57.doit$$56(). As it turns out, javafxc is compiling some of my functions into something... different. My World.life() method, which calculates the new state of a single cell, contains an inner class 1Local$57; this class is a closure that captures all local variables from the life() method (the parameters x and y, the local count, and the receiver this). In short, the entire content of the life() function is wrapped as a closure. This is the (decompiled) code generated for the "do it" method of the closure. (The mangled names and synthetic methods should disappear in JavaFX 1.3.1, thanks to JDI support - at least in the debugger and profiler, but not in decompiled bytecode.)

public boolean doit$$56() {
    _cls57 receiver$ = this;
    VFLG$Local$57$count = (short)(VFLG$Local$57$count & 0xffffffc7 | 8);
    applyDefaults$(0);
    _cls57 _tmp = this;
    int yy$ind = Math.max(y - 1, 0);
    for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
        int yy = yy$ind;
        int xx$ind = Math.max(x - 1, 0);
        for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
            int xx = xx$ind;
            if(elem$World$cells(yy * get$SIZE() + xx))
                $Local$57$count = get$Local$57$count() + 1;
        }
    }
    return get$Local$57$count() == 3 || get$Local$57$count() == 2 &&
        ((Boolean)isLive$bFunc$int__int(FXConstant.make(Integer.valueOf(x)), 0,
        FXConstant.make(Integer.valueOf(y)), 0).get()).booleanValue();
}

This code is pretty good... except for all the closure overhead. The closure class contains several other methods, and invocations to life() must go through all this baggage including allocation of the closure, extra indirection for locals lifted to the heap, and full binding support for locals (!). This overhead is not related to the first-class status of JavaFX Script's functions (a different, very efficient mechanism is used to wrap functions into values).

The life() method finishes by invoking another function, isLive(), which is compiled with even more weird stuff (name mangling, different calling convention) that's due to its being a bound function.

And it gets worse: if I add to life() a conditional return statement before that function's end, this return is compiled as a closure's non-local return. That means raising a (runtime-internal) NonLocalReturnException that will be handled by the (also generated-code) caller. Non-local returns are necessary to allow the code inside a closure to break/continue a loop that contains the closure, or return from the method that contains the closure. Java exceptions are a great mechanism to implement non-local returns. But it seems that javafxc is abusing this technique, using the non-local return exception for trivial return statements that are not non-local returns - in javafxc-generated closures, no less. Also, it seems the technique is not implemented efficiently, showing Throwable.<init>() as the third top CPU hotspot in one of my profiling sessions.
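To make the cost argument concrete: the expensive part of constructing a Throwable is normally capturing the stack trace, and since Java 7 an exception used purely for control flow can opt out of that through the protected four-argument constructor. The sketch below is plain Java, not what javafxc generates; the names Return, forEach and firstNegative are invented for the example.

```java
import java.util.function.IntConsumer;

// Illustration only: a non-local return out of a closure, using an
// exception that never captures a stack trace.
final class NonLocalReturnDemo {
    static final class Return extends RuntimeException {
        final int value;

        Return(int value) {
            // enableSuppression = false, writableStackTrace = false:
            // no stack walk happens when this exception is constructed.
            super(null, null, false, false);
            this.value = value;
        }
    }

    /** A loop whose body is a closure, as in a language-level for expression. */
    static void forEach(int[] data, IntConsumer body) {
        for (int v : data) {
            body.accept(v);
        }
    }

    /** Returns the first negative element, or fallback if there is none. */
    static int firstNegative(int[] data, int fallback) {
        try {
            // The throw inside the lambda acts as "return v" from firstNegative.
            forEach(data, v -> {
                if (v < 0) {
                    throw new Return(v);
                }
            });
            return fallback;
        } catch (Return r) {
            return r.value;
        }
    }
}
```

A compiler could go one step further and reuse a preallocated instance, but a shared exception object that carries a return value needs care with threads and reentrancy, so this sketch allocates a cheap one per return instead.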

Then I further investigated this issue, and discovered that this trivial optimization...

function life (x:Integer, y:Integer) {
    var count = if (cells[y * SIZE + x]) then -1 else 0;
    for (yy in [max(y - 1, 0) .. min(y + 1, SIZE - 1)])
        for (xx in [max(x - 1, 0) .. min(x + 1, SIZE - 1)])
            if (cells[yy * SIZE + xx]) ++count;
    // Before: count == 3 or count == 2 and isLive(x, y)
    count == 3 or count == 2 and cells[y * SIZE + x]
}

...would change the generated code into:

public boolean life(int x, int y) {
    World receiver$ = this;
    int count = elem$World$cells(y * get$SIZE() + x) ? -1 : 0;
    int yy$ind = Math.max(y - 1, 0);
    for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
        int yy = yy$ind;
        int xx$ind = Math.max(x - 1, 0);
        for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
            int xx = xx$ind;
            if(elem$World$cells(yy * get$SIZE() + xx))
                count++;
        }
    }
    return count == 3 || count == 2 && elem$World$cells(y * get$SIZE() + x);
}

The whole closure overhead was gone. No closure class anymore. A single method is generated, whose bytecode is just as efficient as what javac would produce for equivalent Java code. No locals lifted to the heap, no extra binding support, etc. Notice, for example, the simple "count++" instead of the previous gobbledygook "$Local$57$count = get$Local$57$count() + 1".

The big performance screwup was the fact that I was invoking a bound function, isLive(). This caused the caller function life() to "inherit" a ton of overhead that's apparently necessary to deal with bound functions. But this is probably a compiler bug/limitation, because life() is not itself a bound function, unless I don't understand the reason for that compilation strategy.

The bad news is that javafxc has some potential performance bugs (or missing optimizations):

  1. Inefficient use of NonLocalReturnException: a) Use in places where it is apparently not necessary; b) should reuse a preallocated exception object;
  2. Absence of optimized compilation of script-private functions that are never used as values (don't need the code for "first-class" support);
  3. Unnecessary propagation of overhead from bound functions to common (non-bound) caller function;
  4. Induction of binding overheads for local variables that are lifted to closure fields;

All these issues must be confirmed, I'm not intimate with the javafxc compiler. Alas, the identified overheads are actually pretty common in other high-level languages... although they are often "hidden" inside interpreters or runtimes, but "exposed" in JavaFX Script which is fully static-typed and compiled. This exposure is good because programmers can easily spot useless bloat and complain about it. ;-) The compiler will certainly keep improving its intelligence to only add extra overhead where it is really necessary.

But if I found a single important new fact about JavaFX's performance, this is it: Bound functions are expensive and dangerous. The extra overhead is not limited to the compiled code of the bound function itself, or even to call-sites; if you have any common function that contains call-sites to any bound function, this entire function will be compiled with lots of extra overhead. In my Life program, the bound function was very simple, so I just manually inlined it. Otherwise I would have refactored it into a pair of functions: a (possibly script-private) function that performs the actual work, and a public bound function that wraps over it and is only invoked by code that really needs the bound behavior.

Performance Mystery II: Redundant Binding

This section could also be titled: "I am stupid".

Text rendering performance was still a major problem, so I proceeded to investigate it. I know that Java's string formatting APIs are somewhat expensive, but they shouldn't be that bad - the profiler was showing some enormous overhead, in CPU and memory allocation, coming from places like Matcher.<init>() and Formatter.format().

Then I noticed the bug. I have a label with a bound expression:

Label { text: bind "({animSlider.value as Integer}) Gen: {world.gen} Pop: {world.pop}" }

The bug was simple: the variable world.pop used to be updated incrementally, once for each live cell, in the method World.run(). This is the fixed version:

public function run ():Void {
    ++gen;
    var pop = 0;
    cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
        def cell = life(x, y);
        if (cell) ++pop;
        cell
    }
    this.pop = pop;
}

Fixing the bug was trivial: I created a local variable pop, so I can do a single update to the field in the end of the method. The previous code was forcing the entire rendering of the Label (formatting, rasterization, layout, clipping...) to be repeated for each live cell accounted in each generation.

This is the flipside of JavaFX Script's binding feature being so simple, so seamless: you don't notice the overhead. There are no explicit setters or firePropertyChange() calls. A Swing programmer would never make this kind of mistake, because the property-change stuff is all explicit. Spotting this kind of performance bug is difficult, maybe due to the immaturity of tooling: no JavaFX-specific support in profilers. Two JavaFX engineers, who told me that they found a huge bottleneck in the Label formatting and rendering, didn't notice the cause.

My new rule of thumb: Don't update public[-read] properties inside loops. Ever. Even for non-public properties, you are advised to avoid repetitive updates. Just mirror the property in a local variable, and update the field only at method's end.
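The effect of this rule is easy to measure in a toy model where every change to an observed property notifies its listeners, which is roughly what a bind does behind the scenes. The sketch below is plain Java with hand-rolled listeners; the PopModel class and its method names are assumptions made for the example, not JavaFX APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an observed property, for illustration only.
final class PopModel {
    private final List<Runnable> listeners = new ArrayList<>();
    private int pop;

    void onChange(Runnable listener) {
        listeners.add(listener);
    }

    int getPop() {
        return pop;
    }

    /** Every effective change notifies all listeners (think: re-render the Label). */
    private void setPop(int value) {
        if (value == pop) {
            return;
        }
        pop = value;
        for (Runnable l : listeners) {
            l.run();
        }
    }

    /** The buggy strategy: one property update per live cell. */
    void countPerCell(boolean[] cells) {
        setPop(0);
        for (boolean alive : cells) {
            if (alive) {
                setPop(pop + 1);
            }
        }
    }

    /** The fixed strategy: count in a local, publish once at the end. */
    void countOnce(boolean[] cells) {
        int local = 0;
        for (boolean alive : cells) {
            if (alive) {
                local++;
            }
        }
        setPop(local);
    }
}
```

In this model, countPerCell() triggers one notification per live cell, while countOnce() triggers at most one per generation, no matter how large the population is.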

Even in Java this is an interesting micro-optimization, although in JavaFX Script (definitely not a system-level language) we're not supposed to use such low-level techniques... except if, as demonstrated now, there are new, higher-level reasons for that. ;-)

Conclusions

My Life program is now incredibly faster; it runs the "Life" test at full 64 rows @ 50ms delay, without dropping frames, scoring ~19.9 fps. Memory allocation is much saner at ~1095Kb/s (~1 young-GC of 4Mb, costing only 3ms each 4s). CPU usage is still higher than that of a competing Swing program, but that's due to my purist use of sequences and binding; I could easily optimize these... but I'm happy that I didn't, because this pushed me to find my real performance problems.

The graphics / animation engine is not the bad guy that I suspected in the previous blog. It's not doing any stupid reconstruction of the entire internal scenegraph just because I change a trivial fill property of some nodes. Even the string interpolation bug was ultimately insignificant.

People planning to use JavaFX for advanced animation and games just need to take some care, like not allowing an avalanche of binding events in every frame, and not updating bound(able) properties inside tight loops (duh!). I also advise completely avoiding bound functions in code that's even remotely performance-critical.

As a final note, I know that my animation strategy is "wrong"; I shouldn't trigger direct changes to the scene graph when a new Life generation is calculated. I should use a separate Timeline to refresh the display. The current strategy, coupling internal state changes to display updates, makes it impossible to run GOL in high-speed mode - I can easily calculate many thousands of generations per second, but no graphics technology would be able to keep up with that many frames per second.