The big secret revealed! A PDF viewing library!
Posted by joshy on December 13, 2007 at 07:39 AM | Comments (35)
Last week I told you we had a secret new open source project to release. Think of it as an early Christmas present. A project that you've never heard of and has nothing to do with JavaFX (which is partially untrue, but I'll get to that in a second). Well, it's almost the end of the week so here is the secret. You can listen to the MP3 announcement (played on stage at the JavaPosse's JavaPolis session), or simply read on. We are releasing an
Open Source 100% Java PDF Renderer/Viewer
That's right, a 100% Java library which can parse PDF files and draw them to the screen. It's (creatively) named the SwingLabs PDF Renderer, and hosted at pdf-renderer.dev.java.net.
It's the same license as the rest of SwingLabs (LGPL) so you can easily embed it in your own applications. Several of us inside the desktop Java team here at Sun have been working hard on getting this released and now it's finally here. Go check it out at pdf-renderer.dev.java.net.
So, you probably have a few questions. First of all:
Why should I care?
You should care because PDF is one of the formats that makes the web go 'round. Soon to be an ISO spec, PDF is the standard way of exchanging non-interactive documents on the web. Everything from tax forms to clip art can be stored in PDFs. Mac OS X makes heavy use of PDF both as an asset format (the many widget images found in Aqua) and also as an ideal archive format using AppleScript workflows. PDF is everywhere.
Once a PDF is created you know with great certainty that it will display and print exactly as you want on any platform. Hmm. Write a PDF once and run it anywhere? Sounds like a good fit for Java! Combined with PDF writing libraries (like iText), you can do pretty much anything you want with PDFs.
What can I do with it?
Anything you want! You can embed a PDF in your Swing app, draw on top of it, and even render to places other than the screen (like PNG images). The awesome guys over at Project Wonderland have even started experimenting with projecting PDFs into their 3D shared universe. Most importantly, we know you'll come up with things we never thought of. That's why we are open sourcing it.

Experimental Project Wonderland support
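To make the "render to places other than the screen" idea concrete, here is a minimal sketch using only the standard JDK. It paints into an off-screen BufferedImage through a Graphics2D and encodes the result as PNG. The PagePainter interface and the OffscreenRender class are hypothetical names made up for this illustration; they are not part of the PDF Renderer API, and the placeholder painter just draws a filled rectangle where a real page would go.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical stand-in for a renderer: anything that can paint one page
// onto a Graphics2D. This is NOT a type from the PDF Renderer library.
interface PagePainter {
    void paint(Graphics2D g, int width, int height);
}

class OffscreenRender {

    // Paints into an off-screen image instead of a Swing component.
    static BufferedImage renderToImage(PagePainter painter, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                               RenderingHints.VALUE_ANTIALIAS_ON);
            // Start from a white "sheet of paper" before the painter draws.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            painter.paint(g, width, height);
        } finally {
            g.dispose(); // release the graphics context when done
        }
        return image;
    }

    // Encodes the image as PNG bytes in memory.
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder painter; a real one would delegate to the PDF library.
        PagePainter placeholder = (g, w, h) -> {
            g.setColor(Color.BLACK);
            g.fillRect(10, 10, w - 20, h - 20);
        };
        byte[] png = toPng(renderToImage(placeholder, 200, 300));
        System.out.println("PNG bytes: " + png.length);
    }
}
```

Because the painter only ever sees a Graphics2D, the same drawing code could target a Swing component, an image, or a printer; check the project's javadoc for the actual calls that draw a page.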
As another bonus, we plan to use this library to build a PDF importer for the designer tool that I'm working on. So, technically, this does have something to do with JavaFX, but that's not the focus. The focus is general PDF support for Java.
Where did it come from and who is running it?
The SwingLabs PDF Renderer was originally written in 2003 by researchers at Sun Labs for an internal collaboration tool called Sun(TM) Labs Meeting Suite. It was originally targeted at output from OpenOffice, so you will find it can support most OpenOffice PDF exports.
While the original code drop is from Sun, we want to get the community heavily involved. To make sure that happens we have recruited Tom Oke from Elluminate to run the project. He will act as project owner and lead architect. He is rapidly becoming an expert in the code and looks forward to discussing features with other contributors.
And speaking of other contributors...
What about iText and JPedal?
JPedal uses the GPL license, making it non-viable for certain applications. We think that the LGPL is a better fit for a library like this. iText is not a viewer/renderer. iText generates PDFs, it doesn't view them. This makes iText and the SwingLabs PDF Renderer great partners. I look forward to seeing how people combine them.
What are the limitations and how can I help?
As I said, we originally targeted OpenOffice exports, so a few things are missing. It implements most of the PDF 1.4 spec but is missing transparency, fill-in forms, and certain font encodings. We hope that interested developers in the community will help us fill in these missing features.
If you want to get started then head over to the PDF-Renderer project website, download the code, and join the mailing lists.
Comments
Comments are listed in date ascending order (oldest first)
-
Great work, Josh, and all those who contributed. The demo viewer rocks. I talked to Richard today and he told me how long you all have been working on getting this code opened up--big kudos! for sticking with it. This is a great component to have in our toolbox. Cheers! Patrick
Posted by: pdoubleya on December 13, 2007 at 07:47 AM
-
Please help the hard of thinking: can this be used as an image source, eg with JAI and Java3D?
Rgds
Damon
Posted by: damonhd on December 13, 2007 at 08:29 AM
-
damonhd Yes, you should be able to draw to any Graphics2D object, including a buffered image. Once it's an image you could use it with lots of other things.
Posted by: joshy on December 13, 2007 at 08:35 AM
-
Nice to see it! I remember Richard talking about the renderer a long time ago in the SwingLabs forums.
As it relates to JavaFX, will we ever get to download the binaries/source of the JavaFX PDF viewer shown at JavaOne?
Posted by: wsnyder6 on December 13, 2007 at 08:51 AM
-
Good piece of work. A few years ago I went mad trying to use an unsupported Adobe jar for this, and it worked badly.
Posted by: fabriziogiudici on December 13, 2007 at 09:03 AM
-
wsnyder6 the plan is to release the JavaFX PDF viewer now that the lib is open source.
fabriziogiudici Actually, I've just posted a message to the yahoo group for that Adobe jar telling them about the new project.
Posted by: joshy on December 13, 2007 at 09:36 AM
-
How about the parser? Can it be used independently to parse an InputStream? And does it return a CharSequence or just images?
Posted by: i30817 on December 13, 2007 at 09:39 AM
-
i30817: I don't see why this wouldn't work. I suggest looking at the code to see how difficult it would be.
Posted by: joshy on December 13, 2007 at 09:42 AM
-
Did you try to get the Multivalent or JPedal Open Source developers involved before launching yet another PDF library which will need work to bring up to that spec?
Posted by: markee174 on December 13, 2007 at 11:37 AM
-
That's great news, Josh!
PDF plays such a big role these days and having to play around with native libs and depending on Adobe Reader really is a pita. I took the PDF demo viewer out for a short ride and it behaved great, even displayed 454 pages of "Lucene In Action" without problems.
It's released, it performs well, it is open and it works!
Thanks to all who made this possible.
Posted by: alarenal on December 13, 2007 at 01:04 PM
-
There have been Open Source PDF viewers out there since 2000 and some decent commercial ones as well. JPedal uses a dual licensing model to fund its developments - if Sun thinks PDF should be free to commercial developers, they will find keeping up with Adobe is a lot of work....
Posted by: markee174 on December 13, 2007 at 01:13 PM
-
Humph. I just tried the Multivalent parser and it seems pretty raw. A very extensive class hierarchy, that when using their Extract text feature:
a) Didn't insert \n on the end of paragraphs
b) Output the exceptions to the same PrintStream as the parsed text. Yes, I don't want to read NullPointerException in the parsed text.
c) Locked up on the first pdf i found in google, for reference :
http://www.portugal.gov.pt/NR/rdonlyres/EF50C8AB-7823-45E9-9E0D-3399777C2888/0/Programa_Simplex.pdf
Posted by: i30817 on December 13, 2007 at 04:52 PM
-
But to be fair they have the useful feature of transforming pdf to xml, but again they don't surround paragraphs with <p> tags (as with the \n). And the font size attributes seem flaky.
Posted by: i30817 on December 13, 2007 at 04:54 PM
-
Is there a way to extract text to a String using this library? I've been looking through the source but didn't find such functionality.
i30817, which lib has a pdf to xml feature?
Posted by: dling on December 13, 2007 at 05:20 PM
-
Multivalent. But as I (tried to) say, it doesn't emit the "well-known html tag for paragraph that I don't know how to escape". And I think that there is a PDFParser class in this library that does what you want. JPedal has a strange license, supposedly GPL for non-commercial projects, but they only distribute a "Demo" and a "Premium" version (the demo totally cripples the lib). I don't know if they distribute the source, but the "Demo" has "You shall not reverse engineer" etc., etc., so I bailed.
Posted by: i30817 on December 13, 2007 at 05:40 PM
-
For extracting text you are better off using a library like iText.
Posted by: joshy on December 13, 2007 at 05:40 PM
-
I don't remember iText being able to parse PDFs easily. At least it's not easy to see in their API or examples. I thought that they painted into images.
Really I think that pdf is the worst text format ever. A supposedly resolution independent format that doesn't have a way to specify text order and no way to see if it's text or a series of images. Not to mention that it can embed its non-default fonts, duh that's smart. Not.
Posted by: i30817 on December 13, 2007 at 06:15 PM
-
i30817 You are correct. PDF is the worst text format ever. That's because it's not meant to be a text format. PDF is a vector imaging format. Its purpose in life is to create a document that will look and print identically everywhere. It's very good at that. But for storing machine readable text it is horrible.
Posted by: joshy on December 13, 2007 at 06:36 PM
-
Great Job!
Wow.. just what I have been wishing Java could do!
-Carl
Posted by: carldea on December 13, 2007 at 07:44 PM
-
I can't tell from your description, the site, or the javadoc... is the built-in PDF viewer frame heavyweight or lightweight? Since as of JDK6 you still can't mix the two, it makes a big difference to me.
Also, looking at the javadocs, this project claims to be part of the "Meeting Central" project, although that link is broken. What's Meeting Central?
Posted by: samkass on December 13, 2007 at 08:25 PM
-
samkass The viewer is a lightweight swing component. It uses no heavyweights or native components. Meeting Central was an internal Sun project. Where is the link you are referring to?
Posted by: joshy on December 13, 2007 at 09:01 PM
-
Meeting Central is the internal name for Sun(TM) Labs Meeting Suite a collaborative application for distributed meetings. The PDF Renderer and jVoiceBridge are technology from that project which have so far been released as open source.
Posted by: nsimpson on December 13, 2007 at 11:43 PM
-
I was at JavaPolis the day before yesterday, signing my book 'iText in Action' (by the way, I'm Bruno Lowagie, the guy from iText). If only I had known there were people from SUN working on a PDF viewer in Antwerp, I'd have loved talking with them!
As for the question about iText and parsing PDF: we don't promote iText as a PDF parser, but you can use iText to extract the content stream of a page. If you do so, you'll soon find out why joshua is right when he confirms that PDF is the worst text format ever.
If you want to extract content from a PDF document, you're better off with OCR tools than with a parser. I've always found it a waste of energy trying to develop text extracting functionality for iText; the same goes for writing a PDF viewer: why would I go that direction with iText when there are alternatives?
That's why iText specializes in PDF generation and manipulation. We support very specific PDF functionality such as digital signing, (certificate) encryption, form filling,... I also look forward to seeing people combine iText and PDF Renderer. A pity we didn't meet at JavaPolis, but I understand: you couldn't give away your scoop ;-)
Nevertheless: if you ever want to meet, I live in Ghent, that's a one hour train ride away from Antwerp.
Posted by: blowagie on December 14, 2007 at 04:15 AM
-
How about patent indemnification? We can't use any open source here at work unless we can get some kind of patent indemnification.
Posted by: aberrant on December 14, 2007 at 10:57 AM
-
nsimpson: The sole javadoc comment for the class "PDFViewer" is: "A PDF Viewer application that integrates with the Meeting Central project."
Posted by: samkass on December 14, 2007 at 12:12 PM
-
Do you know if there's a way to create a bookmark tree? I scanned through some of the javadoc and didn't see how to do it. The JNLP example only shows preview pages in the left-hand frame.
- Mark
Posted by: phidias on December 14, 2007 at 01:40 PM
-
"JPedal uses the GPL license, making it non-viable for certain applications. "
It's under a commercial or GPL license so the only group it's not viable for is commercial users who want a free library. If Sun feels this is necessary, you are welcome to provide it. But are you expecting them to contribute the 1000s of hours to add the support for compressed objects, forms, annotations, search capabilities, highlighting, DeviceN, etc, and debug all the odd PDF definitions out there?
I presume Sun have seen the JPedal viewer at http://81.21.79.168:8081/webstart/jpedalviewer.jnlp
Adobe themselves licensed JPedal to add PDF support to ColdFusion, rather than implement their own version in Java or update their bean. But then what would they know about PDF?
You have a vibrant little ecosystem of small companies providing a mix of Open Source and Commercial solutions who will probably now be considering whether they should port their code to DotNet rather than invest any more effort in Java, while you try and catch up where they already are :-(
Posted by: markee174 on December 15, 2007 at 01:44 AM
-
Footprint uses iText and we have long been dreaming of a configuration GUI... this is a new chance :) Thanks for the tip :)
Posted by: felipegaucho on December 15, 2007 at 01:51 AM
-
Hey, big big thanks josh !!! I tried JPedal but it's really really slow !!!
I'm going to use your swinglabs one now for sure !!!
Nice component we get !!!!!
thanks again to all project members !!
Posted by: aleixmr on December 15, 2007 at 03:35 AM
-
Alex,
Can you send me the file so I can have a look? We've spent the last 8 months optimising all aspects of JPedal and it flies on most files we have now. It's very hard given the sheer variety of PDFs out there, as Sun will find out, especially without any feedback.
Posted by: markee174 on December 15, 2007 at 05:10 AM
-
@markee174 - What's up with you? You must be a JPedal developer, right? Do you think this will detract from your income from JPedal? Are you threatened by this? Wider adoption of PDF can only drive more business to your product if it is indeed better. If Sun has to deal with Apache Harmony you can take a little competition from pdf-renderer. It's not like they folded it into the JDK.
Posted by: aberrant on December 15, 2007 at 06:11 AM
-
@aberrant: I'm delighted that Sun is finally taking PDF seriously. I've been trying to get Sun interested in PDF for years. I have no problem with Sun releasing a PDF library - as you say, when people realise just how much is missing from this implementation, it may well generate more traffic for us and our GPL or commercial versions. It's a subset of 1.4 and we are not far off having 1.7 and we have forms support. We've got Adobe using our PDF library so we reckon it's fairly good. Maybe this means we might finally get Sun to support all the Tiff variants present in PDF we have been sending them :-) It's arbitrary comments that our library is slow or that a GPL license is somehow not appropriate that annoy me and that I respond to.....
Posted by: markee174 on December 15, 2007 at 08:35 AM
-
This is LGPL, right? Because I checked out the source from CVS and the file headers say "SUN PROPRIETARY/CONFIDENTIAL.".
Posted by: benloud on December 15, 2007 at 05:23 PM