In Praise of Language Specs

Posted by cayhorstmann on June 27, 2011 at 7:25 AM PDT

When I see a language specification like this, my first instinct is to run for the hills. Still, I am a firm believer in specifications and multiple implementations. Here's a case in point. I put together an example of implicit conversions for my upcoming “Scala for the Impatient” book.

object FractionConversions {
  implicit def int2Fraction(n: Int) = new Fraction(n, 1)
  implicit def fraction2Double(f: Fraction) = f.num * 1.0 / f.den
}

class Fraction(n: Int, d: Int) {
  val num: Int = if (d == 0) 1 else n / gcd(n, d)
  val den: Int = if (d == 0) 0 else d / gcd(n, d)
  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
  override def toString = num + "/" + den
  def *(other: Fraction) = new Fraction(num * other.num, den * other.den)
  // other operators...
}
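
Here is a quick sketch of both conversions at work, assuming the definitions above are in scope:

import FractionConversions._

val f: Fraction = 3                 // the compiler inserts int2Fraction(3), i.e. 3/1
val x: Double = new Fraction(3, 4)  // the compiler inserts fraction2Double, i.e. 0.75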

One always worries when there are too many conversions in and out of a particular type. If you translate the code above to C++, with two conversions

Fraction::Fraction(int n)
Fraction::operator double() const

you run into ambiguities that make the class essentially impossible to use.

Of course, in Scala, you can turn off unhelpful conversions by importing only the ones you want:

import FractionConversions.int2Fraction

or excluding the ones you don't want:

import FractionConversions.{fraction2Double => _, _} 
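
With fraction2Double masked out, a Fraction can no longer silently turn into a Double. A minimal sketch of the effect, assuming the definitions above:

import FractionConversions.{fraction2Double => _, _}

val f = new Fraction(3, 4)
f * 5                 // still compiles: int2Fraction(5) is in scope; yields 15/4
// val x: Double = f  // would no longer compile: fraction2Double is not in scope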

I also found reassurance in the following quote from the Odersky/Spoon/Venners book:

An implicit conversion is only inserted if there is no other possible conversion to insert. If the compiler has two options to fix x * y, say using either convert1(x) * y or convert2(x) * y, then it will report an error and refuse to choose between them. It would be possible to define some kind of “best match” rule that prefers some conversions over others. However, such choices lead to really obscure code. Imagine the compiler chooses convert2, but you are new to the file and are only aware of convert1—you could spend a lot of time thinking a different conversion had been applied!

That's great. And it is technically, literally true. But it doesn't mean you won't ever spend a lot of time figuring out which conversion has been applied. Look at this:

val f = new Fraction(3, 4)
f * 5

What is it?

f * int2Fraction(5)

or

fraction2Double(f) * 5

It could be either, right? So it's ambiguous. So it won't compile, right? Except it does.

The compiler sees a * method applied to a Fraction. The parameter type is wrong, but it can be patched up, so that's what it does: f * int2Fraction(5).

Now look at the opposite:

5 * f

The compiler sees a * method applied to an Int. The parameter type is wrong, but it can be patched up, so that's what it does: 5 * fraction2Double(f).
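
The result types confirm which conversion was inserted; a quick sketch, assuming both conversions are in scope:

val f = new Fraction(3, 4)
f * 5   // Fraction 15/4, via f * int2Fraction(5)
5 * f   // Double 3.75, via 5 * fraction2Double(f)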

How can I make it ambiguous? Like this, surely:

def mul(a: Double, b: Double) = a * b
def mul(a: Fraction, b: Fraction) = a * b

Nope. mul(f, 5) yields a Fraction(15, 4) without a murmur. Huh? Aren't there two possible conversions? mul(fraction2Double(f), 5.toDouble) and mul(f, int2Fraction(5)). Shouldn't it be ambiguous? I had a hard time reading the spec (the infamous section 6.26), so I started an email thread. People had various conflicting theories. None of them were able to account for the fact that the seemingly identical

def mul(a: Double, b: Float) = a * b
def mul(a: Fraction, b: Fraction) = a * b

is ambiguous.

Daniel Sobral figured it out, not by reasoning from experience or common sense, but by reading the spec. When choosing among overloaded methods, Scala prefers the most specific one.

Why is mul(Fraction, Fraction) more specific than mul(Double, Double)? Consider the call mul(0.5, 0.5). You can't use the Fraction version—there is no Double-to-Fraction conversion. But with mul(f, f), either one will work. So the Double version is more general—it works in strictly more cases. The Fraction version is more specific, and more specific is considered better.
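
Spelled out as a sketch, assuming the conversions and overloads above are in scope:

def mul(a: Double, b: Double) = a * b
def mul(a: Fraction, b: Fraction) = a * b

val f = new Fraction(3, 4)
mul(0.5, 0.5)  // only mul(Double, Double) applies: no Double-to-Fraction conversion exists
mul(f, f)      // both apply: fraction2Double makes the Double version work too
mul(f, 5)      // both apply, and the more specific mul(Fraction, Fraction) wins: 15/4

As far as I can tell, that is also why the Double/Float pair is ambiguous: a Fraction converts to a Double but not to a Float, so mul(Double, Float) is not applicable to mul(f, f), and neither overload is strictly more general than the other.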

That's perhaps more intuitive in the case of inheritance. You'd want to prefer a fun(Person) over a fun(Object) when the parameter is a Person or Student. Specific is better.

If you think this is yet another proof that Scala is more complex than Java, click here and weep. The Java spec is strictly more complex in this regard.

What's the point? In a language that isn't formally specified, these rules can and do change on a whim, as implementors fine-tune the compiler to achieve this or that pretty effect. You have no recourse if your code breaks as a result.

In Scala, there is a language specification, and the behavior isn't likely to change except by a conscious effort. If something doesn't work according to the spec, I can file a bug, and there is no debate about whether it is a bug or not.

Some time ago, there was a discussion in the Java Champions mailing list whether there was any value to having multiple implementations of the JDK. Some people thought it was fine that the open-source implementors had only one choice—take OpenJDK and tweak it. Me, not so much. I am a huge fan of multiple implementations. It puts pressure on the spec authors to separate essential and ephemeral complexity, and, of course, it contributes to specs that are comprehensible and implementable. Nobody would dream of having just one implementation of HTML or C++, so why be satisfied with one implementation of the Java platform?

Comments

> Nobody would dream of having just one implementation of HTML or C++, so why be satisfied with one implementation of the Java platform?

Many C/C++ developers have dreamed for years of a single implementation - that's precisely why gcc is so popular. Multiple implementations of C and C++ meant we had to suffer the pain of porting every time we wanted an app to run on some other hardware or with a different vendor's compiler. Virtually all HTML and JavaScript developers do dream of having a single implementation of HTML. Supporting all the different browsers is a nightmare.

I do see very small advantages to multiple implementations. Microsoft tried several ways to be incompatible while still conforming to the Java spec, and that caused Sun to tighten up the spec. But that benefit is dwarfed by the drawbacks of having to deal with incompatibilities (see decades of porting in C/C++ and the hassle of writing cross-browser HTML and Javascript today).

The fact that C++ has an 1100-page spec is an indication that the language is too complex. Whether it's "clear" or not is subjective - I think anything that's that complex cannot be clear by its very nature. I would say one measure of the usefulness of a programming language is its ability to have programmers understand its behavior *without* having to refer to the spec. While a language spec is necessary, the vast majority of developers will never look at it, and shouldn't have to.

It seems to me that you want to have your cake and eat it too.

There is no law that forces you to port your gcc4 application to Visual C++, or your IE6 application to another browser. If you want a single target, you just declare that to be your target. Those customers who agree with you will run gcc4 or IE6. Those who don't... well, it's their loss, isn't it? You can't have it both ways--either you target a single platform, or, if you want to target multiple ones, you'd better hope that there is a spec.

I agree that the vast majority of developers don't need to look at the spec. I didn't need to look at the Scala spec in this instance. I could have just accepted what the compiler did, or, more likely, realized that I was on thin ice and programmed more defensively. In this case, I was curious and wanted to look at the spec. No, actually, I wanted someone else to look at it :-) Which is just what happened--Daniel did and settled the question.


I do want to target multiple platforms, as does just about everyone working on real-world apps. I can't just say "this only works on IE" or "this only works on Ubuntu" - that's almost never practical for real-world products.

To support multiple platforms, I don't hope for a spec, I hope for a single implementation. Fortunately, I have Sun's single Java implementation, ported to multiple platforms. The alternative is to have a complete spec, and then another implementation such as Harmony, gcj, or Kaffe/CLASSPATH. None of those work reasonably well, and never will. That's because the spec (and even the TCK) can never have enough detail to ensure it. It's conceivable that a gcc application will just compile and run correctly on, say, Visual Studio. But that won't work for, say, Java vs. Harmony. The libraries are just too huge to make that practical.

IMHO, having only a single implementation is a huge advantage - no porting. That's perhaps Java's biggest advantage over C and C++.

The biggest problem with C++ (in terms of complexity) isn't that it has a complicated spec. It's that the spec allows undefined behavior in certain places, either because it's easier to implement that way, or because it's a feature that lets you get down to the nitty-gritty bits of your architecture. The other thing that makes porting so difficult is the presence of extensions in the various implementations of the language (and even G++ has those). HTML and JavaScript have a similar problem, which is that the players aren't interested in following the spec.

It's the lack of holes in the spec, and Sun/Oracle's dogged determination to defend it, that make different Java implementations 100% compatible with each other.

I agree completely. And the presence of extensions is the fault of the *spec*, not developers. The Java spec says "these are the *only* keywords". IIRC, the C/C++ spec essentially says "these are the keywords, but we're not disallowing the addition of others." Hence, every implementation complies with the spec, and very few programs actually work across different implementations.

Sun's done a great job defending the spec, but the reality is that multiple implementations will never be 100% compatible, mostly because the libraries are just too huge for any spec to cover completely. For example, when are two DateFormat objects "equal"? The Javadoc will never specify that exactly.

Hi Cay,
Scala is more complex than Java in that case.
Java selects the most specific method among the applicable methods using only the inheritance graph, which is, as you say, more intuitive.
Implicit conversions help the writer of the code, not the reader, and we read more code than we write. I'll let you draw your own conclusion :)

I agree with you that having more than one implementation is important.
But in the case of your examples, there exist (at least) two Java compilers, OpenJDK javac and Eclipse ecj, that both exercise the Java spec, and there are also two implementations of Groovy, Groovy and Groovy++, even if Groovy++ is not fully compliant with Groovy by design.

Rémi

I'll let you guys decide which is the more complex one, in technical terms. But I agree with the substance of Rémi's comment, that Java is more intuitive. Fewer surprises are possible. And yes, I prefer a language that facilitates reading over writing. I have feared implicit conversion rules ever since being burned by C++ - while I agree that C++ left much to be desired in the way of clear specs and that Scala might do better, if you need to read the spec to understand a piece of code, there's something wrong.

When C++ first came out, lots of C programmers said: It's totally unintuitive that I can't tell which function gets called when I see x->someVirtualFunction(args). Crazy stuff, that polymorphism. In C, what you see is what you get. Of course, in Java, dynamic dispatch is the norm, and most people would agree that is a good thing. It makes the code easier to read because you operate on a higher conceptual level.

The Ceylon designers say "compile-time overloading is evil" because (as this blog shows) you can't always tell which method is called without turning on the apparatus above the neck. They set out to eliminate it from the language, and I wish them luck. What will people do when they have two methods of the same conceptual name with different parameter types? Hungarian notation probably. RectangularShape.setFramePD(Point p, Dimension d). I don't think that makes code easier to read.

I disagree that implicits always make your code harder to read. For example, in Scala, an implicit conversion lets you operate on strings as if they were collections of characters. That makes string operations more uniform--code is easier to read.
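
For instance, here is a quick sketch (in Scala 2.9, the conversion in question is Predef.augmentString, which wraps a String in a StringOps):

"Sevens".filter(_ != 'e')   // "Svns" - collection methods on a plain String
"mississippi".distinct      // "misp"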

Yes, you can create complex situations, but that's true for any language feature. I just read through a flamewar about how Scala was deficient in that it didn't offer enums. The proponents for adding this feature trotted out the most byzantine examples of Java enums that I had ever seen. I vaguely knew that you could abuse Java enums, but I hadn't realized that people actually do that.

And finally, whatever you can say about C++, it does not lack clear specs. The specification is very thorough, and it clearly states what happens with implicit conversions. It's just that the wrong thing can easily happen because you can't control their scope.


Ceylon lets you use named parameters to avoid overloading and Hungarian notation.

The main problem is not that code with implicits is harder to read; it's that it is harder to debug.
An implicit conversion creates an intermediate object, so when you call foo(bar), foo() doesn't receive bar as its argument but another object converted from bar. That introduces three risks: the user may forget that conversion, the user may have trouble finding which conversion was applied, and the bug may lie in the conversion itself. The last is horrible because fixing the conversion may introduce new bugs in non-obvious places.

If you want more uniform code, you want objects to behave as if they had the same type, which basically means objects having the same set of functions. So the solution is to be able to add functions to an already-defined object, not to convert one object to another. Extension methods à la C# are the raw way to do that (you lose polymorphism); the Groovy metaclass/meta-object protocol is the generalized way to do it. There is a wide range of solutions in between. But implicits are not the solution; they introduce too much pain.
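
To make the distinction concrete, a rough Scala sketch (names invented for illustration); the implicit below only introduces a short-lived wrapper that carries an extra method, and never stands in for the original object in some other API:

class StringGoodies(s: String) {
  def truncate(n: Int) = if (s.length <= n) s else s.take(n) + "..."
}
implicit def enrichString(s: String) = new StringGoodies(s)

"implicit conversions".truncate(8)   // "implicit..."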

Rémi


The Fantom language has shown that no overloading works well - http://fantom.org/doc/docLang/Structure.html. In fact, I rate dropping overloading as one of the biggest decisions you can make to improve a language (key benefits flow to reflection and type inference). To make it work, you have to have default parameters, which are very useful anyway. The only place where overloading can be useful is constructors. But Fantom has a solution there that allows limited overloading, which is all that is actually needed.

Scala's implicits are, IMO, going to be one of the key planks in its downfall. They are simply too random and too hidden to be usable long term. However, simple automated type conversion IS a necessary feature for a real future language. I outlined my vision here - http://fantom.org/sidewalk/topic/1309 (explicits, not implicits)

It will be interesting to see how Fantom evolves to address this issue, but I wouldn't hold my breath waiting for implicits to bring down Scala. IDEs are getting the hang of showing what conversions are happening, and some (many?) will prefer this selective display to the old-fashioned "show me everything in the source code, whether or not I want to see it" approach. It's really no different from type inference where one mouses over a variable and sees the inferred type.

In C++, the problem with type conversions isn't that they are automatic, but that you have so little recourse when they don't work the way you want them to. In Scala, the right thing happens almost all the time, but if it doesn't, the language gives you the tools to fix it.

I share the desire for a language that is so simple and so rational that one can understand everything that happens with a quick glance at the source. But those languages don't necessarily do well in the marketplace because they don't let the wizards produce the kind of magic that ultimately gives rise to the kind of frameworks, libraries, or tools that make people switch. If Ruby didn't have metaprogramming, Rails would have never existed, and where would Ruby be today?