
Android = Java

Posted by opinali on August 17, 2010 at 12:38 PM EDT

The Java community is now swamped with discussions about Oracle's patent suit against Google's Android platform. I've been contributing my opinion in several places, but there is one critical topic on which I keep having to repeat the same comments everywhere... so this blog post spills the beans once and for all.

The 8th Millennium Problem: Android = Java?

The announcement that a researcher had proved P != NP, a few days ago, caused lots of enthusiasm in the programming community - at least for a couple of days, until the first reviewers showed several flaws in the proof. I studied the subject during my CS degree, but admittedly I don't know the advanced math needed to follow these proofs (P = NP? is one of the Clay Institute's Millennium Problems for a good reason). So, let's talk about a much simpler equation: Is Android equivalent to Java? Notice I didn't say equal, I said equivalent, just like in P = NP.

Equivalent class/bytecode formats

On many levels, the Android = Java equivalence is obvious. Android apps are written in the Java(TM) language, and compiled by the JDK's javac compiler (or equivalent, like ECJ). This produces standard Java bytecode (.class files). These files are then converted into Android's .dex, for all practical purposes just a different file format for Java classes. Yes, it's a better format; an improvement over Sun's ~1994 design. But you can also take a GIF image and convert it into the superior PNG format, and both images will be perfectly equivalent even though the byte streams are completely different.
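To make the "different byte streams" point concrete, here is a minimal sketch that tells the two container formats apart by their leading magic bytes alone. The class name FormatSniffer and its methods are my own illustration, not part of any SDK; the only facts it relies on are that a .class file begins with the bytes CA FE BA BE and that a .dex file begins with the ASCII text "dex" followed by a newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of the JDK or the Android SDK):
// classifies a file header as Java .class or Android .dex by magic bytes only.
class FormatSniffer {
    // A .class file starts with the four bytes CA FE BA BE.
    static final byte[] CLASS_MAGIC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
    // A .dex file starts with the ASCII text "dex\n" (a version string follows).
    static final byte[] DEX_MAGIC = "dex\n".getBytes(StandardCharsets.US_ASCII);

    static String detect(byte[] header) {
        if (startsWith(header, CLASS_MAGIC)) return "class";
        if (startsWith(header, DEX_MAGIC)) return "dex";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // Sample headers built by hand for illustration, not read from real files.
        byte[] classHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};
        byte[] dexHeader = "dex\n035\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(classHeader));
        System.out.println(detect(dexHeader));
    }
}
```

The containers are trivially distinguishable at the byte level; the argument of this section is about what they store, not how the bytes are laid out.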

Equivalent file formats are largely implementation detail, usually for optimization. We could avoid all the trouble of fighting the MPEGLA video codec patents, for example, if we simply settled for a less efficient video stream without sophisticated, differential cross-frame compression techniques.

Android's different classfile design had several motivations; but avoidance of Sun's IP was certainly a major factor. Anyway, Google didn't move far enough away from Java. The two formats are equivalent. They differ in specific low-level data structures, but they are semantically identical, storing the exact same information. I'm sure a JavaSE or JavaME VM could easily add a .dex parser to its system classloader to load "Android classes".

The Android SDK relies on the fact that the .java -> .class -> .dex conversion is both trivial and lossless. The "lossless" part is important: While GIF = PNG, a lossy JPG file is less equivalent - it won't decode to the same exact information. If the JVM and Dalvik were really independent, you could hardly write a relatively simple tool that converts compiled code from one form to the other without any compromise: no loss of information, no bloat to compensate for features that are first-class in one VM but not in the other, no extra runtime layer to implement one VM's core APIs in terms of the other's.

(I know how complex the dx translator is. I've looked at the source code. The bytecode translator is big, a full decompiler/re-compiler, complete with SSA building. But this translation is still conceptually trivial; the mapping from Java to Dalvik bytecode is smooth, by design. Stack versus register architectures is an optimization detail; important things like the VM-level type system are identical.)
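A toy illustration of why stack versus register is a detail: the sketch below evaluates a + b * c twice, once in the style of a stack machine and once in the style of a register machine. Everything here (the class StackVsRegister, the tiny arrays standing in for an operand stack and a register file) is invented for the example; it is not real JVM or Dalvik code, and the instruction names in the comments are only there to suggest the flavor of each style.

```java
// Toy model only: two ways of computing a + b * c over plain ints.
class StackVsRegister {
    // Stack style: operands are pushed; each operator pops its inputs and pushes the result.
    static int evalStack(int a, int b, int c) {
        int[] stack = new int[3];
        int sp = 0;
        stack[sp++] = a;            // push a
        stack[sp++] = b;            // push b
        stack[sp++] = c;            // push c
        int rhs = stack[--sp];      // multiply: pop c, pop b, push b * c
        int lhs = stack[--sp];
        stack[sp++] = lhs * rhs;
        rhs = stack[--sp];          // add: pop (b * c), pop a, push the sum
        lhs = stack[--sp];
        stack[sp++] = lhs + rhs;
        return stack[--sp];
    }

    // Register style: each instruction names its source and destination registers.
    static int evalRegister(int a, int b, int c) {
        int[] v = new int[3];
        v[0] = a;
        v[1] = b;
        v[2] = c;
        v[1] = v[1] * v[2];         // in the spirit of "mul v1, v1, v2"
        v[0] = v[0] + v[1];         // in the spirit of "add v0, v0, v1"
        return v[0];
    }

    public static void main(String[] args) {
        System.out.println(evalStack(2, 3, 4));
        System.out.println(evalRegister(2, 3, 4));
    }
}
```

Both versions operate on the same typed values and produce the same result; only the bookkeeping of where intermediate values live differs, which is the kind of mapping a translator can do mechanically.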

Equivalent VMs

The Dalvik = JVM equivalence is also easy to show. It's not just the source or bytecode formats: it's their runtime counterpart too. Once an "Android class" is loaded by the Dalvik VM, it walks like a Java class and quacks like a Java class. If you know Java programming (down to advanced and low-level details), you know Android programming. It's just a matter of learning some new APIs and framework concepts. They are equivalent systems.

Remember Microsoft's .NET? When .NET was introduced, the Java community was quick to denounce .NET as a Java ripoff. I was in that crowd, but today I know better. Yeah it was largely a ripoff; the C# 1.0 language for one thing... the easiest way to distinguish programs of either language was style conventions - e.g. toString() versus ToString(). But in the critical VM specs, Microsoft did its homework well. The CLR, CLI, and core frameworks are sufficiently different from Java that we cannot state a JVM = CLR equivalence. You can't run a simple file-format conversion tool on your compiled Java classes and get something that runs on the bare .NET runtime.

Want proof? Just look at IKVM. This is a very interesting project that enables cross-compilation of Java into .NET, so your Java code will run unchanged on top of the CLR (or an equivalent .NET runtime like Mono)... except that IKVM is not a simple, dx-like file format converter. The conversion of Java classes, and the adaptation of Java's core APIs, to .NET are very complex for anything beyond HelloWorld. Internals of each platform like reflection, security, concurrency, exception handling, bytecode verification, I/O and other core APIs are roughly similar in feature set but completely different in details and corner cases - forcing IKVM to jump through endless hoops so Java code will run on a .NET VM. This also needs a very large layer of extra runtime, basically the full JavaSE APIs adapted from OpenJDK sources. I've been loosely tracking IKVM's development for years - reading the great IKVM Blog - so I have a good idea of the massive effort that it takes to adapt Java code and JavaSE apps to .NET. (The work is not yet complete; and the parts that are complete often have some performance tradeoff.)

(The old Visual J# (not Visual J++, as I first wrote) was not a simple Java-to-.NET translator either. I won't discuss it, but it's sufficient to state that Visual J#'s compatibility with Java was much inferior to even very early IKVM releases.)

I've brought P = NP to the debate; somebody could bring Turing-equivalence and state that any Turing-complete platform / language / VM is equivalent to any other. This is true, but irrelevant. The Turing model is way too general; taking it at face value would destroy the entire software patent system (not a bad thing though!). We need to draw the line of JVM-equivalence in the sand, closer to pragmatic needs than Turing-equivalence. In my opinion, the trivial binary format translation, and extremely high source-level and runtime compatibility, put Android definitely inside the line of Java equivalence.

Equivalent APIs and Runtime

Android uses a pretty big subset of the JavaSE APIs. These APIs (from Harmony) are clean-room implementations, but they have JavaSE as a model. Harmony would even be JavaSE certified, if not for the TCK licensing issue. This doesn't change the fact that the Harmony and JavaSE APIs are completely equivalent - on purpose, not by accident. As Charles Nutter, of JRuby fame, recently wrote:

Android supports a rough (but large) subset of the Java 1.5 class libraries. That subset is large enough that projects as complicated as JRuby can basically run unmodified on Android, with very few restrictions.

It seems that Dalvik is close enough to the JVM that it should be fully compliant with a big chunk of the JVM specification, including the complete and very detailed JMM (as Android supports Java-style threading and concurrency, down to the advanced java.util.concurrent package). So much for "Dalvik is a new VM" or "Dalvik doesn't run Java classes" (statements found in 90% of the blogs and forums debating the suit).
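As a small sketch of what this source-level compatibility means in practice, the class below uses nothing but java.lang, java.util and java.util.concurrent, in deliberately old-fashioned Java 5 style (anonymous classes, no lambdas). PortableSum is a name I made up for the example; my assumption, following the Nutter quote above, is that code restricted to this subset should compile against either platform's class library without changes. I have only reasoned about it as plain JavaSE code here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: sums 1..upTo on a small thread pool,
// using only APIs from the Java 1.5 class libraries.
class PortableSum {
    static long parallelSum(final int upTo, final int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int w = 0; w < workers; w++) {
                final int start = w + 1;
                // Worker w handles start, start + workers, start + 2 * workers, ...
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (int i = start; i <= upTo; i += workers) {
                            sum += i;
                        }
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();   // blocks until that worker is done
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(100, 4));
    }
}
```

The point is not the arithmetic but the absence of any platform-specific import: the threading and memory-model guarantees the code leans on are the ones the Java specifications define.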

Final Thoughts

This blog is not about the merits of the Oracle vs Google suit. I will ignore (and I may delete) any off-topic comments (outside the issue of Android = Java equivalence). I'm just sick of the "Android is completely unrelated to Java" nonsense; Google and Android advocates must find much better arguments than this.

(I am saving my full judgment of the suit for the future, when all details and outcomes are known. Unless you have inside info (I don't), don't be naïve. Stay cool. We don't really know Oracle's - or Google's - full intentions and plans. We don't know the behind-the-scenes story, since 2007 when Google first announced Android (causing massive disruption of the JavaME ecosystem) and Sun was pissed off but had to put their tail between the legs. I don't buy altruistic motivations from any billion-dollar, shareholder-controlled company: not Google, not Oracle, not even old-beloved Sun. Anyway let's wait and see.)

I don't think Google was incompetent when they created a Java-based platform that didn't deviate further from Java (like .NET did). Dalvik, and the Android frameworks, are probably as good as you can get while balancing the desire to keep huge compatibility with existing Java code and libraries, Java talent, and Java toolchain. Microsoft took the longer road to create .NET without the benefit of instant migration from Java. Google didn't.

The Android = Java equivalence is obviously not all-inclusive on both sides (not a bijection). Each platform has some unique APIs, and of course, Android is a complete operating system including a Linux-based kernel, graphics and telephony stacks, etc. I'm obviously only talking about the common parts: the Java-based userland / application frameworks that rely on Java sources, Java classes (whatever the format), Java APIs (including thousands of common JavaSE APIs), and very remarkably a Java-like Virtual Machine. A precise statement of Android's relationship to other Java platforms might use the concept of Editions or Profiles. I remember a blogger saying something like "there's no 'J' in Android". Well, it's never too late: my suggestion is renaming it to Java GE (Java Google Edition). That would clear the confusion once and for all. ;-)


UPDATE: I wrote this blog after some research to refresh my knowledge of the discussed issues; but since posting I've read some tweets worth attaching here. Oracle's Terrence Barr points to a Good analysis: "Is Android Evil?", from Andreas Constantinou. (My summary: Android is quite closed from the POV of handset vendors and carriers; this doesn't make Android "evil" but it surely puts Android safely outside the rainbow-pooping-Unicorns domain of fully-open platforms... not that Java is in that domain either, as Oracle's suit shows.) From the other side of the fence, Google's Bob Lee states that Dalvik's design was driven by technical concerns; to his knowledge IP wasn't a factor. Bob links to the article Dalvik Optimization and Verification With dexopt. (My summary: Yes, there are a lot of good technical ideas behind the different Dalvik VM. That is not disputed.)

Comments

Yes, Android = java, only

Yes, Android = java, only that the java.* classes in Android came from OS Apache Harmony, not from the Sun releases which are now owned by Oracle.
But I won't comment about the merits of the Oracle vs Google suit either.

Virtual Machine

I'm new to Java, but what strikes me from reading the first few pages of many books is that one of the central tenets of Java is the VM. Android will not run Java programs because it does not have a VM. A prime example of this is Pogo.com. I cannot run any of the games on Pogo because my Android phone does not have a Java VM. Just my $.02

good article

Thanks for this wonderful article. But I may not agree with you completely. As a beginner, if someone asks me is Android = Java, I would say NO. Java is a programming language whereas Android is a full OS. Wouldn't it be more appropriate to call it Dalvik = JVM rather than Android = Java?

Maybe...

...but then, Dalvik is not the only piece of Android that's similar to Java; a big chunk of the Java SE core class libraries are identical too. But my choice for the title "Android = Java" was more rhetorical than an attempt to be extremely precise. :)

Nice article!

Nice article Osvaldo! Good read. :) One has to be extremely partial or without proper knowledge to not understand that Android is a Java clone. Google is a copy-paste expert company. That's not to say I don't like Google or I'm in favor of Oracle's lawsuit; in fact I love Google and would love them to continue growing at this astonishing rate. (off topic) P.S: I don't understand the captchas on java.net: how are they supposed to prevent automated spam if their difficulty lies in mathematical problems? Are computers worse than humans at solving mathematical problems?

Android SDK is what JavaME should have been

Oracle and all Java lovers should look forward to help Android grow. Mobile development needs this. My 2 cents

Google should have...

...paid up for licenses and helped evolve J2ME, used more MySQL support, hosted on Sun grids, not allowing Sun to become an easy acquisition target :) Too late for that. Herein a lesson, NIH syndrome and constant reimplementation of well-known concepts will bite you the harder, the bigger you are, because someone is holding a patent for what you'd done.

Nice try some good points...

Hi, This will be decided in court so I will not comment on that. Android=Java is true from a high level perspective but from a SE perspective it is not true. When Android can run Swing I will start to believe it. I understand from a Google performance and memory resource perspective why they would not want all the libraries but then what happened to the modular concept of only load what is needed rather than leave out functionality in SE? Regarding Google blogs I did notice but realized the goal of this site is to promote java otherwise Oracle would have transformed it along with all the old Sun web sites. And not surprising the old Sun engineers (sorry, I know some of you are below 40yrs old) probably are not focused any longer on Desktop or J2ME because of their new jobs. It hurts but that is the way it is. I am a desktop developer on my spare time (for over 13yrs sometimes fulltime) and not a mobile device developer right now and who knows I might change my spots some day. :-) Good to see such passion for java still! Best Regards to all! Tony Anecito Founder, MyUniPortal http://www.myuniportal.com

True, but not relevant to the suit

You make some very interesting and valid technical points. None of which have any bearing on the Oracle/Google suit, but to be fair you didn't claim that they did. What matters is not how JVM-like the Dalvik VM is, it's whether Dalvik infringes seven specific patents or not. My hunch is that several of the patents will be shown not to be relevant, Google will find prior art for a few, and just one or two will be left for a cross-licensing deal.

A cross-licensing deal...

is a pretty likely outcome, I agree. One thing makes me a little cautious, though, is that while Oracle is a software company that benefits from patents and cross-licensing, Google is at the core a marketing, advertising and market research company, and may not see continuing support of software patents in the courts as a good thing in the long run. I just hope it gets cleared up in a reasonable time frame.

Things that make you go hmm

I can't help but notice the pro oracle slant to the blogs on this oracle owned site, it seems at odds w/the reaction from rest of the java community sites. I'm just sayin.

The bias is pro-Java...

...not necessarily pro-Oracle (or pro-Sun in the old days). Speaking for myself, I've set up my blog here years ago because java.net is a relatively "elite" portal for Java-centric blogging, news and other content. You can't automatically create a java.net blog after you have a java.net account; there is some lightweight Editorial control and gatekeeping; blogs must be on-topic for the Java community. See also java.net governance. Oracle, of course, has power and influence (AFAIK they pay the bills...); on the other hand there are no ads, and no spam of offtopic material (java.net's blogging feed is the single such aggregated feed that I can stand having in my newsreader).

But yeah, I guess many java.net bloggers are Java enthusiasts that are at least, not very opposed to the official leadership of Java (Sun in the past, Oracle now).

Good article

Well informed article, I learned new things (about Android ecosystem). And it remains on a strict technical ground (something that lot of commenters didn't understand, obviously...).
Just a little niggle: AFAIK (but I don't know much on the topic), Visual J++ largely predates .NET, it is contemporary of Visual Basic 6... I recall Sun suing Microsoft for making a non-compatible "Java".

Not "that" Visual J++ :-)

I wanted to refer to the later Visual J# product that was indeed a Java-language-to-.NET bridge (it came with a Conversion Assistant thing, and a .NET Framework extension for J#). These are all dead now too. Sorry for the confusion.

Dumb ass

By your logic Microsoft Visual C++ is the same as GCC, or OpenOffice is the same as MS Word, or Oracle Financials is the same as SAP ERP. Its about PATENTS not FUNCTIONALITY.

++1

Further, if MS create a java to .net bytecode translator and the translator plus .net clone of jre library is complete enough to run most of java project, may we say .Net = Java?

It depends

If this combination of translator + libraries is too simple, then yes, we could say that .NET = Java. For a (sarcastic) example, if the binary code translator was as simple as a "XOR 0xFF" of each byte from the Java classfile, anybody would recognize the .NET code format as a badly-disguised ripoff of Java's format. Dalvik's dex format is certainly not that, but it's also not different enough to be considered something independent (like I discuss at length and depth in the article).
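For the record, the sarcastic translator fits in a few lines. The sketch below (XorTranslator is a made-up name) flips every bit of every byte; applying it twice gives back the original bytes, so it is a perfectly lossless "conversion" that changes the whole byte stream while changing nothing about the content.

```java
import java.util.Arrays;

// The sarcastic "translator": XOR every byte with 0xFF.
class XorTranslator {
    static byte[] translate(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        // The first four bytes of any Java classfile.
        byte[] original = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        byte[] disguised = translate(original);
        byte[] restored = translate(disguised);
        System.out.println(Arrays.equals(original, disguised));
        System.out.println(Arrays.equals(original, restored));
    }
}
```

A format can look completely different at the byte level and still be the same thing; how different two formats really are depends on what it takes to map one onto the other.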

Notice once again that the similarity of binary formats is not important in itself; not even important techniques like Dalvik's register-based bytecode (very different from Java's stack-based instructions) or .NET's dynamic-typed bytecode (again very different from Java's separate opcodes for each basic data type). The binary formats are only important in the essential bits that are bound to the VM architecture. For example, the JVM only supports a specific set of basic data types: byte, short, char, int, long, float, double, object references, and single-dimension arrays of all these. And a single complex type construction, the Class. (Booleans are partially supported; there are no separate bytecodes or on-stack/heap type, the VM uses ints.) And this is of course, the exact same type-system of Dalvik. But .NET VMs have a very different type-system (a much richer one BTW - IMHO, the single advantage of .NET over Java). There are also other important fundamental .NET x Java diffs, like .NET's Assemblies. But doing this kind of platform-definition comparison of Dalvik x Java, you won't find any significant difference.
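A quick way to see this VM-level type system from plain Java code is to ask the runtime for the internal names of a few array classes. The sketch below (TypeNames is an invented name) only uses java.lang.Class methods; the one-letter codes it prints are the JVM's own descriptors for the basic types. My understanding is that the dex format uses the same descriptor scheme, but I haven't verified that here, so treat it as an assumption.

```java
// Prints the JVM's internal names for a few array types.
class TypeNames {
    static String nameOf(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(int[].class));       // array of int
        System.out.println(nameOf(boolean[].class));   // array of boolean
        System.out.println(nameOf(String[].class));    // array of object references
        // A "two-dimensional" array is just an array whose elements are int[] references.
        System.out.println(int[][].class.getComponentType() == int[].class);
    }
}
```

The last line is the "single-dimension arrays" remark in action: the VM has no true multi-dimensional array type, only arrays of arrays.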

As for the libraries: as IKVM shows, you need a massive library layer (that's significantly difficult to port, and adds significant overhead to apps) to make Java applications run on a .NET VM. This shows that .NET and Java are different. On the other hand, you need ZERO extra libraries to make pretty big Java apps run unchanged on Android, because the exact same libraries are available. And this shows that Android = Java. Once again, the most important part is not that the libs were copied, but the fact that these libs run on the VM "bare metal". These low-level libraries (java.lang etc.) are EXTREMELY tightly-coupled to the VM architecture; so if you have a different VM that can use basically the same libs, that is by definition a Java VM.

Yes, Visual C++ and GCC are equivalent...

...notice however, that this is not a disputed issue, because the C and C++ languages are not trademarked, copyrighted, or patented. Nobody "owns" these languages and their standard libraries, compiler technology, or related file formats (i.e. COFF and ELF - the compiled-code formats of libraries and executables). That's why we don't see companies suing each other over the high level of similarity between different C/C++ toolchains, not to mention the platforms built on top of C/C++.

Things are different for Java (for better or worse). There are Java trademarks, copyrights, and patents. There are standards that are controlled by the JCP, which in turn was mostly controlled by Sun and is now mostly controlled by Oracle. Google tried to avoid Oracle's trademarks, copyrights and patents, by using clean-room rewrites of the APIs, not using the 'J'-word, replacing some of the APIs, making some changes in VM architecture and file formats, etc. And it's quite possible that they did a sufficiently good job wrt the disputed patents - IANAL, and that was not the subject of my blog. I'm only debating the wider claim (of many Android advocates) that Android is such a completely, mind-bogglingly different system from Java. This is patently false. People who don't know Android and/or Java well enough are easily fooled by superficial things like the different .dex file format.

If C++ were like Java, but owned by Microsoft, and MS was suing the FSF over GCC (or suing Oracle over its Sun Studio C++ compiler or whatever), would you (as a fanboi/advocate of the latter company) claim that these compilers/toolchains are independent, unrelated things? You can write million-line codebases that will compile perfectly on several compilers (ok maybe with some #ifdefs here and there, and restrictions like not having functionality like GUI that's not covered by C++'s standard libs). And that's my point, I'm only making the (obvious) case that Android and Java are just as similar/different as Visual C++ and GCC. (Thanks for the extra analogy and argument... dumbass.)

This is ridiculous. You can

This is ridiculous. You can not trademark, copyright nor patent a language!
You can trademark the word "Java". This was not violated according to Oracle's claims.
You can own copyright of source code. No source code was used unlicensed. It is completely OK to implement the same functionality, that does not violate copyright.
You can have patents on patentable technologies. Java itself is not patented and a language is not patentable. The patents in the actual claims are ridiculous.
It does not matter that C++ is like GCC or Java is like .NET or Java is like Android, or my bread is LIKE your bread. That's completely legal.

You are totally trolling. What does it mean that Android is LIKE Java?! Yeah BSD is LIKE MacOS, and NetBeans is LIKE Visual Studio, and Goethe's poems is LIKE Nietzsche's. So what? That's not violating any copyright or whatsoever. This is called BUSINESS, INNOVATION, INVENTION.

I'm sorry, but Oracle is not the only company in the world allowed to license a programming language. Even if it is LIKE any other language...

I didn't say that a language can be copyrighted etc.

I was not referring to Java-the-language; "Java" means multiple things. I could have been more explicit, saying e.g. "Java SE Platform", but this gets boring to write every time, so I just assume that the reader will have some intelligence and attention, and also consider the context of my previous writing.

It doesn't matter indeed that the language compiled by GCC is the same language compiled by MSVC++. But both compilers/toolchains/SDKs contain tons of specific techniques - compiler algorithms, runtime designs, file formats - which ARE patentable (at least in the US). And for the 10th time, that was not even the subject of my blog. I'm only saying that Android is largely a Java copycat. (I'm not claiming that this fact is bad by itself - it's not; or that Google didn't add lots of great innovation on top of it - they did.)

++1

++1

arbitrary equivalence

Just LAW does not work this way. You can NOT claim, that this and that LOOKS like the thingy you did, so you want money from them. Arbitrary defined equivalences do not hold in court. Just because you make bread you can not sue others also baking bread. Of course there'll be always people who claim to have written a verse and therefore want exclusive rights on poetry...

First, there is copyright: Apache Harmony, Spring, .NET etc. have a completely different CODEBASE, copyright does not apply! Just like using Oracle's GPL JDK in any GPL project is untouchable in court.

Second, there are the totally ridiculous software patents (only in the US, not in the EU for God's sake). The software patents have nothing to do with Java or technology; all major companies try to patent the double linked list and similar "inventions". Then they try to sue anyone to rip off profit and stop others from innovating... Fortunately Google has lawyers good enough to invalidate those nonsense patents they are sued with.
Read this to see how ridiculous Oracle's ACTUAL claims are: http://blog.headius.com/2010/08/my-thoughts-on-oracle-v-google.html.

Did you RTFA?...

...because the link in the end of your comment is one of the very few links in my blog, and clearly identified - so I assume you're a typical troll who jumped the gun without considering carefully what I wrote...

Anyway, I'm not debating law or the Oracle suit (once again RTFA). But replying to that: Oracle is not suing Google because Android is similar to Java; it is suing Google because they believe that Android violates a bunch of patents that cover Java technology. It's a completely different issue. BTW, this kind of suit would be possible even if Android was a completely unrelated system, with an entirely new language, APIs, VM architecture and most other things I mention in the blog (y'know, the blog that you did not read). Many of these patents are general enough that they might be violated by (say) .NET, Ruby, or some other completely non-Java platform. And yeah that's a problem with patents, but that's another story. But the fact that Android is so similar to Java certainly makes dodging those patents much harder than it is for other systems that were not created as Java clones with some pieces changed, and have way more freedom in their design space.

Android = Java

Fascinating read. Thanks!

Hear what you are saying...

I hear what you are saying; but by Sun/Oracle's OWN definition Android != Java. Android does not fit within any of the 3 silos as mandated by Sun/Oracle, and Harmony (which Android uses) is still also not recognized as Java either - if Harmony claims this, they can be sued! Sun/Oracle first and foremost HAS to play by their own rules.

The fact that I can write a program that verifies an NP problem in P time is analogous to the fact that I can write a program for Android and verify it on a JSE. Just as it does not prove that P=NP, it also does not prove that Android = JSE.

At the end of the day, this still has nothing to do with Java (read: syntax), it has to do with patents and corporate powerplay brought on by the failure of Sun to foresee Google's calculated, but legit, maneuvers. Google could just as easily have gone with the C# syntax, which even permits supersets, but then it would probably just be Microsoft rather than Oracle running to the lawyers.

You are correct, but that's irrelevant

Oracle is not suing Google for violating the Java(TM) language or JavaSE(TM) or JavaME(TM) platform trademarks. Google doesn't use any trademarked name, and they don't publicly claim compatibility (not even partially) with these legally-controlled platform definitions.

Your comment on P=NP is a bit confusing. (Readers unfamiliar with this problem just check Wikipedia.) I guess you're saying that we can compile/debug/test Android code in a JavaSE VM, but this doesn't prove any Android=Java equivalence, right? Well I didn't claim that, I made very specific technical points, not any general claim like this. You can even compile arbitrary C code (with low-level memory access, inline Assembly and all) to run on top of the JavaSE; there are pure-Java CPU emulators and full PC emulators that allow that... but the enormous effort of emulation clearly makes these techniques irrelevant to prove that Java and C are equivalent systems.