My new open source project: Flying Saucer, an all-Java XHTML renderer.
Posted by joshy on June 18, 2004 at 01:05 PM | Comments (18)
I normally try to be even-handed, unbiased, and bipartisan, but today
I'm going to shamelessly use my muchly vaunted position as a highly skilled
blogologist in the field of java.net to plug my new project: Flying Saucer, an
all-Java XHTML + CSS renderer.
When I was doing research for my two-part series on HTML renderers for
Swing (parts 1 and 2),
I got to thinking: "Why are there so few renderers, and almost none that
are open source? Is it really that hard?" Initially I tried to fix some
of the bugs in the HTMLEditorKit that comes with Swing, but found a slew of
private methods that prevented me from improving it through subclassing. I
also tried editing the Swing code itself during a four-hour cross-country
flight, but to no avail. And even if I had made the changes, I couldn't have
released them under the current Swing license. With so little to go on, I
struck out on my own. I mean, really, how hard could it be? :)
After two months of hacking an hour here and there, I found out how hard. It was both
easier and harder than I expected. To make life easier on myself, I decided
to leave out everything that didn't directly have to do with rendering HTML on
the screen. I dropped the idea of making a complete browser with a UI,
JavaScript, debugging, bookmarks, history, etc. Those are all important but
not the hard part. Since CSS and XML parsing are already provided by other
libraries, I reused them instead of reinventing the wheel. It's not that I
dislike wheel inventing, it's just that I have to be realistic about what
one programmer can do in his spare time.
Even with that out of the way, it was still too big. The core renderer of
Mozilla took years to develop and the full-time work of several top-notch
programmers. Gotta cut it down. Thinking about it, though, I don't need a
complete web browser like IE or Mozilla. Since this is for embedding in my
own programs, most of the content will be generated by my own code. I don't
need to deal with every browser bug and every malformed web page out there.
I could get by with strictly compliant pages and worry about quirks
handling in a compatibility library (to be built later, of course :).
Now, if I'm going to write a new renderer from scratch, I should go for
the gold, i.e., complete XHTML + CSS 2.0 (now 2.1). It's actually not as hard
as I thought. The W3C writes some truly exhaustive specifications. The CSS 2
spec in particular is huge, but it does describe in great and explicit
detail how each feature should behave (giving Internet Explorer no
excuse for its bugs).
So what do we have? I've written a renderer that takes an
org.w3c.dom.Document object with inline styles and renders it into a
scrollable JPanel. Most of plain HTML is supported, as are the full box
model, tables, and images. Parts of relative, absolute, and fixed
positioning work. The main issues are the bugs in float/clear and the lack
of forms support. Oh yeah, and it's really, really, really slow. (I need some
help with that; I used regexes for the text parsing.) But it works, and the
supported features are pretty compliant.
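The renderer's own entry point isn't shown in this post, so the sketch below only covers the input side: turning a well-formed XHTML string into the org.w3c.dom.Document the renderer takes, using the standard JAXP parser that ships with the JDK. The class name XhtmlInput is hypothetical, and switching off the external DTD load relies on a Xerces feature flag that the JDK's built-in parser understands.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: parses strict XHTML into a DOM Document for a renderer. */
class XhtmlInput {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // XHTML elements live in the XHTML namespace, so keep namespace information.
            factory.setNamespaceAware(true);
            // Avoid fetching the XHTML DTD over the network if a DOCTYPE is present
            // (a Xerces feature supported by the JDK's default parser).
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Strict pages only: malformed markup is rejected rather than repaired.
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html xmlns=\"" + XHTML_NS + "\"><body>"
                + "<p style=\"color: red\">Hello, Flying Saucer</p></body></html>");
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

Markup that isn't well-formed (an unclosed p element, say) throws instead of being guessed at, which fits the decision above to handle only strictly compliant pages and leave quirks to a separate library.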
I was actually surprised at how much I've done by myself. Still, I know
the limits of one programmer, which is why I'm launching it as an
open source project right here on Java.net. And I need your help!
This is a challenging project that needs some top-notch people. If you
feel you're up to it, sign on to the project and join the dev
mailing list. In particular, I want to start breaking it up into modules and
find owners for each:
- Rendering compliance (research the latest standards, build it, test it)
- CSS (parsing, converting, optimizing, rendering new features)
- HTML->XHTML converter (support for HTML 3/4.0 pages)
- Browser component (a standalone web browser based on Flying Saucer)
- JavaScript support (how do we plug in Rhino?)
- I18N efforts (how do we make sure the whole thing is i18n-able?)
- Forms (input, select, textarea, etc.)
- Object and plugin support (SVG, XForms, Flash, PDF)
- Printing module (we can't print anything yet!)
- Network module (SSL, redirects, image loading, etc.)
- Performance and memory optimization (how do we make the whole thing as fast and light as possible?)
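The regex-based text parsing mentioned above isn't shown, so this is only one possible direction for the performance and I18N items, assuming the parsing in question is splitting a text run into pieces that can be wrapped onto lines. java.text.BreakIterator's line instance finds legal break opportunities using locale-aware Unicode rules rather than hand-written patterns. The class name LineBreaks is hypothetical, and whether this is actually faster than the existing code would need measuring.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits a text run at legal line-break opportunities. */
class LineBreaks {
    static List<String> segments(String text, Locale locale) {
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = breaks.first();
        // Each piece runs up to the next break opportunity and keeps its trailing
        // whitespace, so a layout engine can measure pieces and wrap between them.
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(segments("The quick brown fox", Locale.ENGLISH));
    }
}
```

Because the pieces concatenate back to the original string, nothing is lost, and the same code handles scripts whose break rules a simple whitespace regex would get wrong.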
I really think that a complete XHTML renderer is a vital component of
any modern programming toolkit, and I'd like to see Flying Saucer become
the best-of-breed implementation for Java. It's a lot of work, but it's
going to be rewarding. Come on in. The water's fine!
Oh, to just play with it quickly, check the source out of CVS or download this
zip file. You only need Ant and the 1.4 JDK. Run ant test
to launch the test program, then select different tests from the
Test menu.
Update: I forgot some links. The project is here and the mailing list is here.
Comments
-
I don't get it...
You reviewed previous releases and found (about) 15 individual implementations of the same idea.
I pointed you to one that is quite complete and open source (Apache licence).
And you go and reinvent the wheel.
What's up with that?!
Posted by: zander on June 19, 2004 at 03:19 PM
-
I don't get it...
Based on what I found, there wasn't anything that was open source and also supported XHTML and CSS to any reasonable degree. CalPane was the best I had found, but it hadn't been updated in a long time and I couldn't get in touch with the author. Had I known that it was open sourced at the time I wrote the articles (which was actually about three months ago), I probably wouldn't have started my own project.
All that said, I think there is a need for both a forward-looking XHTML renderer and a backward-compatible HTML renderer. What is the code for CalPane like? How does it support CSS? Directly or through transcoding?
One more question: what do you think is the best license to use? Some have said that the LGPL is too restrictive.
- Joshua
Posted by: joshy on June 19, 2004 at 07:17 PM
-
Open Source
Hi Joshua,
I'm not aware that CalPane does any CSS... The CalPane cleanup is a bit stalled on my side due to the lack of a good refactoring application. (vi does wonders, but it will only take you so far.) I'm confident that will change soon, though.
If you have some newfound experience with XHTML and CSS parsing, I would find it more efficient to combine your code with the CalPane code. (As pointed out in another thread, the CalPane code is in the http://uic.sf.net CVS.)
On licencing:
When I started doing open source Java, there was a lot of talk about the LGPL's wording meaning that it would effectively act like the GPL if it ever came to court. Naturally, the libraries I use should be usable by my boss, since I spend some of my boss's hours on this.
This is the reason I chose the Apache licence for the UICompiler project.
A couple of companies and a couple of dozen individuals use the project, and we have had some good feedback, but only about two people have actually come over to contribute code or bugfixes.
In the end, the difference between the Apache licence and the LGPL for a Java application is just not visible.
So the right question (IMOHO) is simply between GPL and Apache-style licensing for any open source Java project.
Posted by: zander on June 20, 2004 at 12:32 AM
-
you just couldn't resist, could you...
The CSS 2 spec in particular is huge, but it does describe in great and explicit detail how each feature should behave (giving Internet Explorer no excuse for its bugs).
Keep your timelines and terminology straight...
When the CSS 2 implementation in IE was created, the CSS 2 specification was not finalised, so any inconsistencies are more likely than not caused by Microsoft working against a draft version which has seen changes since.
I'm sick and tired of the constant jabs and stings against Microsoft, especially in unrelated documents (were this a review of IE, you might have some reason for your statement, were it correct), and certainly when the author doesn't even take care to get his facts right, since he obviously doesn't WANT the facts when they might break down his concept of Microsoft as the Great Evil.
Grow up...
Posted by: jwenting on June 21, 2004 at 12:27 AM
-
I don't get it...
I think he started this project before your post.
You could have done more yourself to publicize CalPane as an open source project. Instead, you rolled it into your existing UI compiler project, where it is easily overlooked by the many people interested in having an all-Java web browser.
Why don't you start a new project on Java.net independent of your UI compiler project? Maybe the CalPane source can be merged into Josh's project?
Posted by: rabbe on June 21, 2004 at 05:47 AM
-
you just couldn't resist, could you...
I don't consider them a great evil. I'm sure that they were working with a draft when IE 4.x came out. They probably even had some employees on the draft committee. The fact remains, however, that the spec has been finalized for several years, and it's very specific, not leaving much room for interpretation. By now they should have released a new version that correctly implements it; according to the W3C, CSS 2 was finalized in 1998.
- Joshua
Posted by: joshy on June 21, 2004 at 06:06 AM
-
Re: I don't get it...
| Why don't you start a new project on Java.net independent of your UI compiler project?
Because the technology in the UICompiler project will make it easier to add features like anti-aliasing and various other low-level Swing features. I like to have one place where I work around Swing bugs :)
| Maybe the calpane source can be merged into Josh's project?
As the actions framework and theme features are important for code maintainability (in other words, it depends on the libraries in UIC), that would not be a very smart thing to do.
Don't you think that an HTML widget fits in the UICompiler widget set? I mean, a file chooser, color selector, wizard, etc. seem like good company for an HTML widget...
Posted by: zander on June 21, 2004 at 06:44 AM
-
Re: I don't get it...
"Don't you think that a HTML widget fits in the UICompiler widgetset? I mean; a filechooser, colorselector, wizard etc seem good company for a html widget..."
You can make a case for it, but with all due respect to your project, it also makes sense that it should live beyond the UI compiler package and its dependencies.
Posted by: rabbe on June 21, 2004 at 07:16 AM
-
Why the name?
Why did you pick the name 'Flying Saucer'? I hope you change it. I can't tell my boss that I've integrated a Flying Saucer into our code. He'll think I've lost it.
Posted by: jeremyzacker on June 21, 2004 at 11:34 AM
-
Re: I don't get it...
| It also makes sense that it should live beyond the UI
| compiler package and it's dependancies.
Using the HTML widget requires one extra jar, which is approximately half its own size. The disadvantage is about as big as a JTextField depending on java.text classes.
The advantage, on the other hand, is the ability to instantly get a huge number of usability features and all the other stuff UIC delivers. I'm not going to abuse this blog for that; I'll just have to work on better info on the website...
Posted by: zander on June 21, 2004 at 11:42 AM
-
you just couldn't resist, could you...
I don't consider them a great evil.
No, they are THE great evil.
Joshua, you are absolutely correct: Microsoft has NO excuse for not making IE compliant with the spec, especially with as much manpower and cash as they have. They do have reasons, though, primarily that they have refocused on desktop religion, and web services are the enabler for rich clients that make web pages irrelevant in their world. Are rich clients good? Yes. Are web pages bad? No. Microsoft has an obligation to support both, and they're doing a disservice to users everywhere by not supporting web standards, and even worse by terminating development of IE as a stand-alone browser.
Posted by: gerryg on June 21, 2004 at 11:54 AM
-
Why the name?
Just on a whim, because I like 1950s-style space kitsch. I didn't name the project itself (in the directory) Flying Saucer because I assumed that we would find a better name sooner or later.
- Joshua
Posted by: joshy on June 21, 2004 at 12:42 PM
-
you just couldn't resist, could you...
Have you ever tried building a website? At a conservative estimate, I'd say that the current project I'm working on has taken 50% longer because of the lack of standards compliance and just plain bugs in IE/Windows.
Try Googling for "Internet Explorer hacks", or just look at any site or discussion forum frequented by people who make websites. And I'm not talking about security problems here; I'm talking about fundamental issues in rendering pages. For instance, have a look for "CSS Box Model": Internet Explorer ignores the standard and does it differently from every other browser, and (the pièce de résistance) the way to get around it is to use ANOTHER bug in IE's CSS support to include a hacky bit of illegal CSS code which works around the first bug.
IE is widely known as the worst browser out there in terms of standards compliance, and even ENORMOUS bugs haven't been fixed in years and years. Why is this? Could it be because it forces everyone to choose between riddling their code with horrible and unmaintainable hacks in order to support IE, or giving up compatibility with other browsers (and the published standards) and only supporting IE? Which would you do, given IE's dominance? The end result is that lots of websites are written which break the standards just so they can work on IE, and in the process cement its dominance because they don't look right in the standards-compliant browsers.
Sorry this is a bit of a rant, but I've just this minute had to deal with exactly this issue, and now have to decide whether I want my menu button labels to line up in Internet Explorer (by coding my page incorrectly) or in everything else (by doing the right thing). And it's left a bad taste in my mouth.
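To make the box model difference concrete: in CSS 2.1 the width property sets the content width, and padding and borders are added outside it, while IE 5.x (and IE 6 in quirks mode) treats width as already including padding and borders. The hypothetical helper below computes both readings for the horizontal direction only, assuming equal left and right padding and borders and ignoring margins.

```java
/** Hypothetical helper contrasting two readings of the CSS 'width' property. */
class BoxWidths {
    /** CSS 2.1: 'width' is the content width; padding and borders are added on both sides. */
    static int standardBorderBoxWidth(int width, int padding, int border) {
        return width + 2 * padding + 2 * border;
    }

    /** IE 5.x / quirks mode: 'width' already includes padding and borders on both sides. */
    static int quirksContentWidth(int width, int padding, int border) {
        return Math.max(0, width - 2 * padding - 2 * border);
    }

    public static void main(String[] args) {
        // div { width: 300px; padding: 20px; border: 5px solid }
        System.out.println("standard: box " + standardBorderBoxWidth(300, 20, 5) + "px, content 300px");
        System.out.println("quirks:   box 300px, content " + quirksContentWidth(300, 20, 5) + "px");
    }
}
```

For width: 300px; padding: 20px; border: 5px, a standards-compliant browser draws a 350px box around 300px of content, while the quirks reading draws a 300px box around 250px of content, which is why the same stylesheet lines up differently in the two.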
Posted by: jportway on June 21, 2004 at 02:42 PM
-
Why the name?
I like the name. Keep it as the nickname and keep the project name the same as you've got it. Good choices, and ignore the corporate grunts worried about their pointy-haired bosses.
Posted by: gerryg on June 22, 2004 at 07:53 AM
-
Some Help
There was a project on netbeans.org, NetBrowser. It started from a merger of the feature-rich XBrowser browser and the Jazilla HTML renderer. I remember the whole story. The ideas, approaches, and solutions were exactly the same, so the issues will be the same too.
1. CSS
The whole tag instance factory should use a "default" CSS that defines which tags are blocks, which are inlines, and which are table models. This will cut rendering down to supporting only 3 parameterized objects, and will allow a tag to be redefined as a block, for example.
2. HTML->XHTML converter
THE MOST COMPLEX THING.
I take off my virtual hat to the MS guys.
I was trying to use different HTML parsers with HTML 4.0 transitional DTDs, but real-world pages constantly broke them. How they solved this, I can only imagine.
Real (legacy) HTML is NOT XHTML. It is about 20% of the whole Internet.
3. Browser component
Use an existing one.
- JavaScript support (how do we plug in Rhino?)
Easily.
4. I18N efforts (how do we make sure the whole thing is i18n-able)
What do you mean?
HTML results are Unicode already. The [font] tag will use the system's fonts. So?
5. object and plugin support (SVG, xforms, flash, pdf)
8-)))
AWT comes to help???
8-)
I even created an AppletTag that drew the applet off-screen and copied the image on every redraw.
In a nutshell, you will use Swing, but applets using AWT will look ugly.
6. printing module (we can't print anything yet!)
No problem. You can render on any canvas.
7. network module
Use an HTTP library that's out there: Jakarta, W3C. Support for the cookies file format should be extensible.
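The "default CSS" idea in point 1 can be sketched as a small table from tag name to layout kind, which the renderer consults instead of hard-coding a class per tag. The names DefaultDisplay and Display, and the handful of tags listed, are illustrative assumptions rather than a real default stylesheet; unknown tags fall back to inline, which is the initial value of the CSS display property.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical "default stylesheet" mapping tag names to a few layout kinds. */
class DefaultDisplay {
    /** The three kinds of object point 1 says the renderer would need. */
    enum Display { BLOCK, INLINE, TABLE }

    private final Map<String, Display> rules = new HashMap<>();

    DefaultDisplay() {
        // A tiny illustrative subset of user-agent defaults, expressed as data.
        for (String tag : new String[] {"html", "body", "div", "p", "h1", "h2", "ul", "ol"}) {
            rules.put(tag, Display.BLOCK);
        }
        // All parts of the table model share one kind in this simplified sketch.
        for (String tag : new String[] {"table", "tr", "td", "th"}) {
            rules.put(tag, Display.TABLE);
        }
    }

    /** An author stylesheet can redefine a tag, e.g. turn an inline tag into a block. */
    void override(String tag, Display display) {
        rules.put(tag.toLowerCase(Locale.ROOT), display);
    }

    /** Unlisted tags get INLINE, the CSS initial value of 'display'. */
    Display displayOf(String tag) {
        return rules.getOrDefault(tag.toLowerCase(Locale.ROOT), Display.INLINE);
    }

    public static void main(String[] args) {
        DefaultDisplay css = new DefaultDisplay();
        System.out.println(css.displayOf("span"));
        css.override("span", Display.BLOCK);
        System.out.println(css.displayOf("span"));
    }
}
```

Keeping the mapping as data means a redefinition like the one in point 1 is a table update, not a new rendering class.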
Posted by: wwk_killer on July 07, 2004 at 11:13 PM
-
Re: Some Help
1. CSS
>Whole tag instance factory should use "default" CSS that defines what tags are blocks, what are inlines and what are table models. This will cut rendering to support only 3 parameterized objects. And will allow redefinition of tags so that will be block, for example.
Cool. That's exactly what I'm doing. There are really only about four types in the system.
2. HTML->XHTML converter
>Real (legacy) HTML is NOT XHTML. It is about 20% of all inet.
This is true, and that's why I decided not to do it. I really don't want to make a Java webbrowser. I want to make a renderer that can be embedded in other applications, which (I hope) is a much easier task.
>4. I18N efforts (how do we make sure the whole thing is i18n-able)
>
>What you mean?
>HTML results are Unicode already. [font] tag will use system's. So?
I mean: what do we need to do to support different character sets, should we embed our own fonts, and what about RTL languages? I'm sure there's a lot I haven't thought of.
>6. printing module (we can't print anything yet!)
>
>No problems. You render on any canvas.
True, but CSS defines special properties for handling printing: for example, laying out using inches instead of pixels or ems, or breaking long tables across pages. There's a lot to consider, and since Flying Saucer will probably be used for some report generation, that'll be important.
>7. network module
>
>Use HTTP library out there. Jakarta, W3C. Support for cookies file format should be expandable.
I had forgotten about that. I'll have to go see where they are these days.
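For the printing point above, absolute CSS units map to physical sizes (1in = 2.54cm = 25.4mm = 72pt = 6pc), so a print path can convert them to device pixels at whatever resolution the printer reports. The helper below is a hypothetical sketch of that conversion only; relative units such as em and percentages need font and containing-block context and are rejected here.

```java
import java.util.Locale;

/** Hypothetical helper: converts absolute CSS lengths to device pixels. */
class CssLength {
    static double toDevicePixels(double value, String unit, double dotsPerInch) {
        double inches;
        switch (unit.toLowerCase(Locale.ROOT)) {
            case "in": inches = value; break;
            case "cm": inches = value / 2.54; break;  // 1in = 2.54cm
            case "mm": inches = value / 25.4; break;  // 1in = 25.4mm
            case "pt": inches = value / 72.0; break;  // 1pt = 1/72in
            case "pc": inches = value / 6.0; break;   // 1pc = 12pt = 1/6in
            default:
                // em, ex, %, px and friends depend on context not available here.
                throw new IllegalArgumentException("Not an absolute unit: " + unit);
        }
        return inches * dotsPerInch;
    }

    public static void main(String[] args) {
        // A 0.5in margin on a 300 dpi printer versus a 96 dpi screen.
        System.out.println(toDevicePixels(0.5, "in", 300));
        System.out.println(toDevicePixels(0.5, "in", 96));
    }
}
```

Doing layout in these physical units and converting at the last step is what lets the same page come out the same size on screen and on paper.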
Thanks,
- J
Posted by: joshy on July 11, 2004 at 12:21 AM
-
Open Source
Hi Zander,
Help me out here. Out of curiosity, I tried to identify the CalPane classes within UIC, but I'm not able to do so.
What is the package and where in the Demo is it used?
Thanks
KK
Posted by: kajkandler on August 22, 2004 at 06:43 PM
-
When you Google Flying Saucer and Java, this blog entry still comes up. I was looking for more information and landed here. I think this project is fantastic. Javaians desperately need a way to embed rendered HTML+CSS and, quite frankly, leave JavaScript, plugins, bookmarks (easy enough), history, cookies, and all the rest to the browsers.
Webclient (XULRunner for Java embedding) is not pure Java. For me, that means it doesn't work on my Windows XP x64, since the binaries are all 32-bit DLLs. If you go to the Bugzilla tracker, you can read the thread where that issue is being discussed among the engineers: #defines, MASM assembly compatibility, the size of integers, the 2 GB memory limits of 32-bit machines, and all the rest are on parade.
Like I said, it's not pure Java.
I hope Sun continues to concentrate on the core of Java and its WORA (write once, run anywhere) value proposition, because amid the hype of everything else, people forget what an incredible boon to productivity, production, and quality Java has been: it freed developers from THAT and let them concentrate on making great programs and solving real-world issues.
A large part of life comes down to just reading text. The majority of text is going to be XHTML, if it's not already. So keeping this project focused on doing one thing really well is just perfect.
Posted by: swv on November 28, 2007 at 05:29 PM