
The tales of the four Fibonacci's

Posted by forax on November 1, 2009 at 5:17 AM PST

Let me introduce a new language named pseudo. (Why this name?
Why another language? Why, God? All these questions will be answered
in a later blog entry.)


One great feature of this language is that it allows
gradual typing,
i.e. you can assign a type to a parameter, a local variable, etc.,
or leave it untyped.


If a variable has no type, the compiler considers it to be any.
You can assign a value of any type to a variable typed any,
and you can assign a variable typed any to a variable of any type
(but the latter may cause an exception at runtime).

any is similar to java.lang.Object in Java, but
it doesn't require narrowing casts (casts are implicit), and
any is also a supertype of all primitive types.
In fact, in pseudo, there is no distinction between reference types
and primitive types.
any is equivalent to C# 4.0's dynamic, but its
implementation requires, I think, less boxing and fewer conversions than
its C# counterpart.
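To make the comparison with Java concrete, here is a hypothetical sketch (plain Java, not pseudo) of what any hides: the narrowing casts and the boxing below are exactly what pseudo's compiler inserts for you.

```java
// Hypothetical illustration: what pseudo's `any` hides from the programmer.
// In Java, going from Object back to a concrete type needs an explicit
// narrowing cast, and primitives must be boxed; pseudo does both implicitly.
public class AnySketch {
    public static void main(String[] args) {
        Object any = 42;          // the int is auto-boxed to an Integer
        int n = (Integer) any;    // explicit cast + unboxing in Java;
                                  // implicit in pseudo (may fail at runtime)
        any = "hello";            // a value of any type may be assigned
        String s = (String) any;  // again, pseudo would not need the cast
        System.out.println(n + " " + s); // prints "42 hello"
    }
}
```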

Status of the language

pseudo will run on the upcoming Java 7 platform, a Java platform
that enables JSR 292, i.e. a Java platform with the new bytecode instruction invokedynamic.


The grammar is almost finished; yes, it's that easy if, like me, you have already
written the compiler generator Tatoo (v4.2).


The type checker is half-baked, and the support for objects and first-order functions is flaky.
The backend creates javac AST trees and lets javac generate the bytecode.


Anyway, the compiler is already able to generate bytecode for a simple Fibonacci function (*).

def fib(n) {
  if (n < 2) {
    return n
  }
  return fib(n - 1) + fib(n - 2)
}
  public static java.lang.Object fib(java.lang.Object) throws java.lang.Throwable;
    Code:
       0: aload_0      
       1: iconst_2     
       2: invokedynamic #2,  0      // NameAndType "__operator__:<":(Ljava/lang/Object;I)Z
       7: ifeq          12
      10: aload_0      
      11: areturn      
      12: aload_0      
      13: iconst_1     
      14: invokedynamic #3,  0      // NameAndType "__operator__:-":(Ljava/lang/Object;I)Ljava/lang/Object;
      19: invokestatic  #4          // Method fib:(Ljava/lang/Object;)Ljava/lang/Object;
      22: aload_0      
      23: iconst_2     
      24: invokedynamic #3,  0      // NameAndType "__operator__:-":(Ljava/lang/Object;I)Ljava/lang/Object;
      29: invokestatic  #4          // Method fib:(Ljava/lang/Object;)Ljava/lang/Object;
      32: invokedynamic #5,  0      // NameAndType "__operator__:+":(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      37: areturn

As you can see, most calls are translated into invokedynamic, but not all of them:
recursive calls to fib are translated into invokestatic.
So even in a dynamic language, some parts of the code are not dynamic at all.


A closer look at the code shows that it uses the constants iconst_1 and iconst_2,
and that the signatures of some invokedynamic calls contain an 'I' or a 'Z'
(for the VM, this means the call takes an integer or returns a boolean).
So even in a dynamic language, 1 and 2 are not dynamic values, and the result of a test is always a boolean.


The invokedynamic instruction is really interesting because it allows this kind of information
to surface directly to the runtime system of the language.
For example, for the invokedynamic at instruction 2, instead of trying all '<' operators, the runtime
knows that it has to look up a '<' that takes a dynamic value and an integer.
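As a rough sketch of that lookup, using the java.lang.invoke API behind JSR 292 (the lessThan operator method below is a made-up name, not pseudo's actual runtime):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch: how a language runtime could resolve the '<' call site from the
// exact descriptor (Ljava/lang/Object;I)Z surfaced by invokedynamic.
// `lessThan` is a hypothetical operator method, not part of pseudo's runtime.
public class OperatorLookup {
    public static boolean lessThan(Object left, int right) {
        return ((Number) left).intValue() < right;
    }

    public static void main(String[] args) throws Throwable {
        // The descriptor tells the runtime the right operand is already an int,
        // so it can look up one exact candidate instead of trying every '<'.
        MethodHandle mh = MethodHandles.lookup().findStatic(
            OperatorLookup.class, "lessThan",
            MethodType.methodType(boolean.class, Object.class, int.class));
        System.out.println((boolean) mh.invokeExact((Object) 1, 2)); // true
    }
}
```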

As I said in the introduction, pseudo is a language that allows gradual typing.
You can, for example, indicate that the parameter of fib is an int.


How does this change the generated bytecode?

def fib(int n) {
  if (n < 2) {
    return n
  }
  return fib(n - 1) + fib(n - 2)
}
  public static java.lang.Object fib(int) throws java.lang.Throwable;
    Code:
       0: iload_0      
       1: iconst_2     
       2: if_icmpge     10
       5: iload_0      
       6: invokestatic  #2          // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       9: areturn      
      10: iload_0      
      11: iconst_1     
      12: isub         
      13: invokestatic  #3          // Method fib:(I)Ljava/lang/Object;
      16: iload_0      
      17: iconst_2     
      18: isub         
      19: invokestatic  #3          // Method fib:(I)Ljava/lang/Object;
      22: invokedynamic #4,  0      // NameAndType "__operator__:+":(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      27: areturn

Wow, only the + between the two recursive calls to fib is dynamic.
Also note that n (local variable 0) is boxed into an Integer at instruction 6.


Clearly, adding a type to the parameter helps the compiler reduce the number of dynamic calls.
We will see further on whether this has a big impact on performance.


What if, instead of assigning a type to the parameter of fib,
one just gives fib a return type?

def fib(n):int {
  if (n < 2) {
    return n
  }
  return fib(n - 1) + fib(n - 2)
}
  public static int fib(java.lang.Object) throws java.lang.Throwable;
    Code:
       0: aload_0      
       1: iconst_2     
       2: invokedynamic #2,  0      // NameAndType "__operator__:<":(Ljava/lang/Object;I)Z
       7: ifeq          17
      10: aload_0      
      11: invokedynamic #3,  0      // NameAndType __cast__:(Ljava/lang/Object;)I
      16: ireturn      
      17: aload_0      
      18: iconst_1     
      19: invokedynamic #4,  0      // NameAndType "__operator__:-":(Ljava/lang/Object;I)Ljava/lang/Object;
      24: invokestatic  #5          // Method fib:(Ljava/lang/Object;)I
      27: aload_0      
      28: iconst_2     
      29: invokedynamic #4,  0      // NameAndType "__operator__:-":(Ljava/lang/Object;I)Ljava/lang/Object;
      34: invokestatic  #5          // Method fib:(Ljava/lang/Object;)I
      37: iadd         
      38: ireturn   

Compared to the fully dynamic version, the only optimisation here is that the + between the two fib calls
can be computed using iadd instead of relying on a dynamic call.
Moreover, a dynamic cast was added at instruction 11 to convert n to an int.


Whether this code is less dynamic than the fully dynamic version is not clear.


And the version with all types:

def fib(int n):int {
  if (n < 2) {
    return n
  }
  return fib(n - 1) + fib(n - 2)
}

  public static int fib(int) throws java.lang.Throwable;
    Code:
       0: iload_0      
       1: iconst_2     
       2: if_icmpge     7
       5: iload_0      
       6: ireturn      
       7: iload_0      
       8: iconst_1     
       9: isub         
      10: invokestatic  #2          // Method fib:(I)I
      13: iload_0      
      14: iconst_2     
      15: isub         
      16: invokestatic  #2          // Method fib:(I)I
      19: iadd         
      20: ireturn 

The generated code is exactly the same as the one javac generates
for a static method fib written in Java. No dynamic calls at all.
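For reference, the plain Java method whose javac-generated bytecode matches the listing above:

```java
public class Fib {
    // Static Java version: javac emits the same iload/isub/invokestatic/iadd
    // sequence as pseudo's fully typed fib.
    public static int fib(int n) {
        if (n < 2) {
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(40)); // 102334155, the result in the benchmark below
    }
}
```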

Benchmark

No blog entry is really a good blog entry without a benchmark :)


We have four versions of the Fibonacci function that are more or less dynamic.
Which one is faster, and which one is slower?


The best way to test performance would be to use the latest
beta of JDK 7 patched with the mlvm repository, but I currently
have a problem building it. I hope it will be solved soon.


Anyway, I can test with the backport:

               result      time
  fib        : 102334155   3.62 s
  fib-param  : 102334155   3.05 s
  fib-return : 102334155   3.42 s
  fib-typed  : 102334155   1.52 s
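For the curious, a minimal harness of the kind used for such measurements (the warm-up loop is my assumption; n = 40 matches the result column, since fib(40) = 102334155):

```java
public class FibBench {
    // Fully typed version; the dynamic variants would be timed the same way.
    public static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        // Warm up so the JIT has compiled fib before the measured run.
        for (int i = 0; i < 5; i++) {
            fib(30);
        }
        long start = System.nanoTime();
        int result = fib(40);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("result %d in %.2f s%n", result, seconds);
    }
}
```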

First, using a dynamic language has a cost at runtime.
Using a partially typed language reduces the overhead a bit, but
the cost of boxing/unboxing is still there.


Even if there is an overhead to using a dynamic language, the
invokedynamic infrastructure allows you to reduce it:
the fully dynamic version running about 2.4 times slower (3.62 s vs 1.52 s) is not much
compared to the current overhead of dynamic languages like Ruby or Groovy.

Cheers,

Rémi


* The Fibonacci function should be
coded like this.

Comments

Based on this example one

Based on this example one might be forgiven for thinking there is no point bothering with gradual typing since you had to go all the way to full typing to get any appreciable benefit. There are other arguments than performance for gradual typing, though.

Re: Based on this example

Yes, you're right.

Detecting errors at compile time is the major benefit of gradual typing.

Rémi

Could also be that the

Could also be that the example is unrealistic. The experience from commercial Lisp environments seems to be that strategically-placed type annotations can indeed make a great difference but I have no idea how many such strategic locations there are in a typical program, maybe only a handful, in which case you can't intelligently discuss the performance benefits of gradual (that is to say, partial) typing from such a micro-sample.

What happens at runtime?

Remi, cool stuff! It would be very instructional if you could elaborate what happens at runtime, i.e. one-time/repeated dynamic binding, inline caches assuming "int" as receiver type, or whatever is going on there... Matthias

Re: What happens at runtime?

Hi Matthias,

The runtime relies on the invokedynamic machinery, so it specializes the calling code for each call site.
On the first call, the runtime gathers the parameters typed as any and tries to find a corresponding implementation method
using the parameter classes.
Then it installs runtime checks against the parameter classes, so the same method is called as long as the parameter classes stay the same.
So it's a kind of inline cache.

Rémi
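The check-then-call scheme Rémi describes maps directly onto JSR 292's MethodHandles.guardWithTest; a sketch with made-up operator methods (not pseudo's actual runtime):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative sketch of a monomorphic inline cache: guard on the first
// argument's class, run the cached fast path on a hit, fall back on a miss.
public class InlineCacheSketch {
    public static boolean isInteger(Object o) { return o instanceof Integer; }
    public static Object addInts(Object a, Object b) { return (Integer) a + (Integer) b; }
    public static Object slowPath(Object a, Object b) { return a.toString() + b; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType objObj = MethodType.methodType(Object.class, Object.class, Object.class);
        MethodHandle test = l.findStatic(InlineCacheSketch.class, "isInteger",
            MethodType.methodType(boolean.class, Object.class));
        MethodHandle fast = l.findStatic(InlineCacheSketch.class, "addInts", objObj);
        MethodHandle slow = l.findStatic(InlineCacheSketch.class, "slowPath", objObj);
        // guardWithTest runs the test on the leading arguments; on success the
        // cached target runs, otherwise control drops to the fallback.
        MethodHandle site = MethodHandles.guardWithTest(test, fast, slow);
        System.out.println(site.invoke(1, 2));     // fast path: 3
        System.out.println(site.invoke("a", "b")); // fallback: ab
    }
}
```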

Fools seldom differ....

I was just musing the other day, when reading a blog about Silverlight, JavaFX and JavaScript/canvas benchmarks, that what you really want is just-in-time typing: use type inference to maximize runtime static typing and performance, but leave the code as flexible as possible, only typing on use or where you want to. The key here is how to help the programmer minimize programming errors without the type system creating extra work via boilerplate code.

The key problem is that in reality object-to-object messaging is done at the method level, but object integrity and meaning are at the object level. I.e. potentially the same method signature on different types of objects may have very different results; in other situations you don't care. The problem occurs in dynamic languages where you ignore this underlying difference and don't discriminate; in typed languages the problem becomes defining loads of boilerplate interface definitions and implementations in situations where you don't care! Also, dynamic calls cost more.

As you point out, the cost of dynamic calls can be reduced if you can analyse the code in a specific context; this could be done at compile time or at runtime (where the running includes the compilation, in effect). You could imagine an IDE where the construction of an object and all method calls on it are tracked and checked, so at the method level there is safety (for a closed system; obviously if you use APIs where the code within is hidden you can't). I.e. those APIs are automatically defined by the usage of the objects, i.e. dynamic static typing: the type of a parameter is the list of methods used on it within the implementation of the API method! You have dynamically generated static typing; for your particular code base and usage, the types of parameters could be completely defined by usage (unless you explicitly choose to type at a higher level than methods because you really do want to tie things down).

The type of the return parameters would be any, though again, in your code base, if a parameter is then returned, you could work out all the calls on that object and work out whether there is any problem with the incoming types' implementations. Maybe I don't get out enough, and languages like this exist already :-) Alternatively it could be one of those really bad half-baked ideas you have in the shower.