Reading office files with the Java API
Last year, while working on a project, I had to deal with a lot of documents (requirements, user guides, architecture, etc.) from different sources (email attachments, file shares, backups, old version control). Many had the same name but a different date and size.
So how could I tell which one was the latest and delete the others?
There were two ways to achieve that:
- Open each file and see its properties, like date and last author.
- Programmatically read their properties and print them out.
And these were their costs:
- Very time-consuming if you have many files, as I did (300 files)
- Also time-consuming, but you only do it once and can write a blog post about it (WoW)
Of course, I went with the second option.
In a short time I wrote a small program that reads a set of files and prints each file's name, date of last modification and full path.
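Once name, path and date are collected, picking the latest copy of each name is a simple grouping step, which could also be automated: group the entries by file name and keep the one with the most recent date. A minimal sketch follows. DocInfo, NewestCopy and their methods are hypothetical names, not part of DocViewer, and the dates here are plain millisecond timestamps (for example from File.lastModified()) rather than the document properties DocViewer reads through OpenOffice.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one file found on disk (not taken from DocViewer).
final class DocInfo {
    final String name;   // file name, e.g. "guide.doc"
    final String path;   // full path of this particular copy
    final long modified; // last modification, in milliseconds since the epoch

    DocInfo(String name, String path, long modified) {
        this.name = name;
        this.path = path;
        this.modified = modified;
    }
}

final class NewestCopy {
    /** For every file name, keeps the entry with the most recent modification time. */
    static Map<String, DocInfo> newestByName(List<DocInfo> docs) {
        Map<String, DocInfo> newest = new LinkedHashMap<String, DocInfo>();
        for (DocInfo d : docs) {
            DocInfo current = newest.get(d.name);
            // Strictly newer wins; on a tie the first entry seen is kept.
            if (current == null || d.modified > current.modified) {
                newest.put(d.name, d);
            }
        }
        return newest;
    }

    /** Every entry that is not the newest copy of its name: candidates for deletion. */
    static List<DocInfo> olderCopies(List<DocInfo> docs) {
        Map<String, DocInfo> newest = newestByName(docs);
        List<DocInfo> older = new ArrayList<DocInfo>();
        for (DocInfo d : docs) {
            if (newest.get(d.name) != d) { // identity check: same object as the winner?
                older.add(d);
            }
        }
        return older;
    }

    public static void main(String[] args) {
        List<DocInfo> docs = new ArrayList<DocInfo>();
        docs.add(new DocInfo("guide.doc", "/backup/guide.doc", 1000L));
        docs.add(new DocInfo("guide.doc", "/share/guide.doc", 2000L));
        for (DocInfo d : olderCopies(docs)) {
            System.out.println("older copy: " + d.path + " (" + new Date(d.modified) + ")");
        }
    }
}
```

On ties the first entry seen is kept and the other copies are reported as older, so it is worth double-checking those by hand before deleting anything.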
It uses the OpenOffice SDK libraries, so you need an OpenOffice 2.x installation somewhere. Currently it reads a small set of office file formats: sxw, doc, xls, odt, ods, pps, ppt and odp. Feel free to modify it to suit your needs, or even extend its functionality.
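Restricting the scan to those formats is essentially an extension filter. A sketch of one follows, assuming that matching on the file extension is enough (DocViewer's own check may differ); OfficeFormats and isSupported are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class OfficeFormats {
    // Extensions listed in the article; add more here to support other formats.
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(new HashSet<String>(
            Arrays.asList("sxw", "doc", "xls", "odt", "ods", "pps", "ppt", "odp")));

    /** True if the file name ends with one of the supported extensions (case-insensitive). */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ENGLISH);
        return SUPPORTED.contains(ext);
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + (isSupported(name) ? "supported" : "skipped"));
        }
    }
}
```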
The Java source code can be downloaded here: DocViewer.java
* it's a UTF-8 file
** remove the .txt extension after downloading
As I am a Linux user, it was written with Linux in mind. Windows users have reported that it works with small changes to its runtime settings, but I don't know exactly which changes are needed.
The following software is needed:
- OpenOffice 2.x installation
- X Virtual Frame Buffer (Xvfb)
- Java (version 5 or newer)
$OO_HOME points to the OpenOffice installation. On my machine it is installed at /opt/broffice.org2.3
* BrOffice is the official Brazilian version of OpenOffice
It is very easy to compile:
javac -classpath /opt/broffice.org2.3/program/classes/\* src/claudius/DocViewer.java
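Note that I used a classpath wildcard (the escaped \*), which javac only supports from Java 6 onwards. To compile with JDK 5, list each jar file from the classes directory explicitly instead.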
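Classpath wildcards arrived in Java 6, so with JDK 5 every jar has to be named one by one. That list can be generated instead of typed. A minimal sketch, assuming all the needed jars sit directly in the given directory (for example /opt/broffice.org2.3/program/classes); ExplicitClasspath and jarsIn are hypothetical names.

```java
import java.io.File;
import java.util.Arrays;

final class ExplicitClasspath {
    /** Returns every *.jar directly inside dir, joined with the platform path separator. */
    static String jarsIn(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Arrays.sort(files); // predictable order in the resulting classpath
        StringBuilder cp = new StringBuilder();
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".jar")) {
                if (cp.length() > 0) {
                    cp.append(File.pathSeparator); // ':' on Linux, ';' on Windows
                }
                cp.append(f.getAbsolutePath());
            }
        }
        return cp.toString();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ExplicitClasspath <directory with jars>");
            return;
        }
        System.out.println(jarsIn(new File(args[0])));
    }
}
```

Its output can be fed straight to javac, e.g. javac -classpath "$(java ExplicitClasspath /opt/broffice.org2.3/program/classes)" src/claudius/DocViewer.java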
To run it, the standalone OpenOffice program needs to be running. To avoid a graphical window popping up hundreds of times, it can run in a non-graphical way. To achieve that I used X Virtual Frame Buffer (Xvfb), an X server that draws into memory instead of onto a screen, which is useful for running graphical programs on server machines.
The OpenOffice SDK connects to the OpenOffice program through a socket, since the standalone program does the real work of reading the office files.
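Because everything goes through that socket, a quick way to check that soffice is up before starting DocViewer is to try a plain TCP connection to the port given in the soffice -accept option used below. The sketch also builds the client-side UNO URL that pairs with that option in the OpenOffice Developer's Guide examples (uno:socket,host=...,port=...;urp;StarOffice.ServiceManager), which is the kind of string a UNO URL resolver (XUnoUrlResolver) receives; whether DocViewer connects exactly this way is an assumption. SofficeSocket, unoUrl and isListening are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class SofficeSocket {
    /** UNO URL matching soffice -accept="socket,host=...,port=...;urp;" (object name assumed). */
    static String unoUrl(String host, int port) {
        return "uno:socket,host=" + host + ",port=" + port + ";urp;StarOffice.ServiceManager";
    }

    /** True if something accepts TCP connections on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }
    }

    public static void main(String[] args) {
        String host = "127.0.0.1";
        int port = 8100; // same port as in the soffice -accept option
        if (isListening(host, port, 2000)) {
            System.out.println("soffice is reachable, connect with: " + unoUrl(host, port));
        } else {
            System.out.println("nothing listening on " + host + ":" + port + " yet");
        }
    }
}
```

A successful connection only proves that something is listening on the port, not that it is soffice; it is mainly useful for waiting until soffice has finished starting.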
- Run the Xvfb program
Xvfb :5 -screen 0 800x600x16 &
- Load the OpenOffice program into memory and tell it to use the X server at display :5
$OO_HOME/program/soffice -accept="socket,host=127.0.0.1,port=8100;urp;" -display :5 -headless -norestore -invisible &
- Run the Java program
java -classpath $CP claudius.DocViewer <path>
$CP points to the classpath used for compiling, plus the src directory, where javac wrote the compiled classes.
<path> points to a single file or to a directory. If it is a directory, its subdirectories are searched as well.
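Handling <path> as either a single file or a directory searched recursively can be sketched with java.io.File alone. FileCollector and collect are hypothetical names, and this is not the code from DocViewer.java.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FileCollector {
    /** A single file is returned as-is; a directory is searched recursively. */
    static List<File> collect(File path) {
        List<File> result = new ArrayList<File>();
        collectInto(path, result);
        return result;
    }

    private static void collectInto(File path, List<File> result) {
        if (path.isFile()) {
            result.add(path);
            return;
        }
        File[] children = path.listFiles(); // null if not a directory or not readable
        if (children == null) {
            return;
        }
        Arrays.sort(children); // predictable output order
        for (File child : children) {
            collectInto(child, result);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java FileCollector <file or directory>");
            return;
        }
        for (File f : collect(new File(args[0]))) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Symbolic links to directories are followed, so a link pointing back up the tree would make the walk revisit the same files; a real tool should guard against that.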
The output looks similar to this:
dir = /home/claudio/resources/palestras/2007/10_justjava
file = diagnostico2.odp
Modified by: Claudio Miranda 5/10/2007 17:46:8
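The date in the sample appears to be day/month/year without zero padding (5 October 2007, which fits the 10_justjava directory name). A sketch that reproduces this layout follows, assuming that date pattern; the author and date would come from the document properties DocViewer reads through OpenOffice, but here they are plain parameters. DocReport, formatDate and report are hypothetical names.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

final class DocReport {
    /** Day/month/year, no zero padding, e.g. 5/10/2007 17:46:8 (assumed pattern). */
    static String formatDate(Date date) {
        // Locale.ENGLISH keeps the digits ASCII whatever the default locale is.
        return new SimpleDateFormat("d/M/yyyy H:m:s", Locale.ENGLISH).format(date);
    }

    /** Builds the three output lines shown above for one document. */
    static String report(String dir, String file, String author, Date modified) {
        String nl = System.getProperty("line.separator");
        return "dir = " + dir + nl
                + "file = " + file + nl
                + "Modified by: " + author + " " + formatDate(modified);
    }

    public static void main(String[] args) {
        Calendar c = Calendar.getInstance();
        c.clear(); // zero the milliseconds and other fields
        c.set(2007, Calendar.OCTOBER, 5, 17, 46, 8);
        System.out.println(report("/home/claudio/resources/palestras/2007/10_justjava",
                "diagnostico2.odp", "Claudio Miranda", c.getTime()));
    }
}
```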
If this piece of code is useful to you, or if you have made any modifications, please share them and leave a comment.