Achieving better compression with Deflater
Posted by mister__m on December 26, 2003 at 10:48 AM | Comments (4)
I've recently been using CVS more intensively (I had always relied on either my IDE's CVS support or one of the nice GUI clients available) and learned more about GZIP compression than I knew before. That's my main motivation for this post.
Java has provided an API for ZLIB compression for quite a while now (since JDK 1.1, according to the javadocs). The java.util.zip package contains classes for reading and writing the GZIP and ZIP formats, as well as the Inflater and Deflater classes for working with the compression library directly.
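To make the last point concrete, here is a minimal sketch of using Deflater and Inflater directly, with no stream wrapper: compress a byte array, inflate it back and compare. The class and method names (DeflaterRoundTrip, deflate, inflate) are made up for illustration, and the sample text is arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterRoundTrip {

    // Compresses the whole input with a raw Deflater (ZLIB format, default level).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory right away
        }
    }

    // Decompresses data produced by deflate().
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // avoid looping forever on truncated input
                    throw new DataFormatException("truncated or dictionary-based input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);
        byte[] restored = inflate(compressed);
        System.out.println(original.length + " -> " + compressed.length
                + " bytes, round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Calling end() frees the native zlib memory immediately; the GZIP stream classes used in the rest of this post release their own Deflater when they are closed.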
So, getting straight to code, if you want to compress an object you are writing to a stream:
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public void writeCompressed(OutputStream os, Object toWrite) throws IOException {
    ObjectOutputStream oos = null;
    try {
        oos = new ObjectOutputStream(new GZIPOutputStream(os));
        oos.writeObject(toWrite);
    } finally {
        if (oos != null) {
            try {
                // Closing oos also finishes the GZIP stream and closes os
                oos.close();
            } catch (IOException ioe) {
                /*
                 * The day someone gives me a sensible explanation why this method
                 * throws an exception (as if there was something I could do about it or
                 * if I cared!), I will be sooooo grateful :-D
                 */
                ioe.printStackTrace();
            }
        }
    }
}
Apart from the ugly try inside the finally block (which deserves a whole post of its own, called "API design we don't get", probably better written by Hani), it's pretty simple. I've been working a lot with Prevayler (put simply, a very good open-source substitute for databases, and far faster), and since it works with serialization, I thought it would be a good idea to compress the serialized stream it generates. I've written a class that does just that for use with Prevayler, as part of my open-source project, reusable-components, and it had been a while since I last modified it. However, after some time dealing with CVS manually, I noticed that GZIP streams can have different compression levels and started wondering whether java.util.zip supported them.
Indeed, Deflater supports compression levels through a method named setLevel(int). Its argument is yet-another-magical-int-constant-in-the-world: an int ranging from 1, a.k.a. BEST_SPEED, to 9, a.k.a. BEST_COMPRESSION, with 0 (NO_COMPRESSION) and -1 (DEFAULT_COMPRESSION) also accepted. Deflater is used internally by DeflaterOutputStream, the superclass of the GZIPOutputStream used in the example above. So if there is a method for setting the compression level, doing it must be pretty simple, right? Well, it's easy, but it could be easier.
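The level constants are plain ints on Deflater, and setLevel() accepts 0 (NO_COMPRESSION) and -1 (DEFAULT_COMPRESSION) as well as 1 to 9; anything else causes an IllegalArgumentException. The sketch below prints the constants and probes which values are accepted. The DeflaterLevels class and its accepts helper are hypothetical names used only for illustration.

```java
import java.util.zip.Deflater;

public class DeflaterLevels {

    // Hypothetical helper: true if Deflater.setLevel accepts the given value.
    static boolean accepts(int level) {
        Deflater deflater = new Deflater();
        try {
            deflater.setLevel(level);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } finally {
            deflater.end(); // free native zlib memory
        }
    }

    public static void main(String[] args) {
        System.out.println("NO_COMPRESSION      = " + Deflater.NO_COMPRESSION);
        System.out.println("BEST_SPEED          = " + Deflater.BEST_SPEED);
        System.out.println("BEST_COMPRESSION    = " + Deflater.BEST_COMPRESSION);
        System.out.println("DEFAULT_COMPRESSION = " + Deflater.DEFAULT_COMPRESSION);
        for (int level = -2; level <= 10; level++) {
            System.out.println("setLevel(" + level + ") accepted: " + accepts(level));
        }

        // When you create the Deflater yourself, the level can go straight into the constructor.
        Deflater best = new Deflater(Deflater.BEST_COMPRESSION);
        best.end();
    }
}
```

The constructor form only helps when you own the Deflater, which is exactly what GZIPOutputStream does not let you do.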
The problem is that DeflaterOutputStream keeps its Deflater instance in a protected field named def and offers no public way to reach it, so you cannot simply get the Deflater instance and set its compression level. Because the field is protected, though, subclassing GZIPOutputStream makes it accessible. A simple way to do that (simple as a practical solution, not a very readable one) is an anonymous inner class with an instance initializer block, sometimes called an "anonymous constructor", as shown below:
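import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public void writeCompressed(OutputStream os, Object toWrite) throws IOException {
    ObjectOutputStream oos = null;
    try {
        oos = new ObjectOutputStream(new GZIPOutputStream(os) {
            {
                // Instance initializer: runs right after GZIPOutputStream's constructor,
                // before any data is compressed, so the level applies to the whole stream
                def.setLevel(Deflater.BEST_COMPRESSION);
            }
        });
        oos.writeObject(toWrite);
    } finally {
        if (oos != null) {
            try {
                oos.close();
            } catch (IOException ioe) {
                /*
                 * The day someone gives me a sensible explanation why this method
                 * throws an exception (as if there was something I could do about it or
                 * if I cared!), I will be sooooo grateful :-D
                 */
                ioe.printStackTrace();
            }
        }
    }
}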
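If the anonymous subclass feels too clever, the same trick fits in a small named subclass that takes the level as a constructor argument. This is a sketch under my own naming: LeveledGZIPOutputStream is a hypothetical class, not part of java.util.zip, and the example uses try-with-resources, so it assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: a GZIPOutputStream whose compression level is chosen at construction time.
public class LeveledGZIPOutputStream extends GZIPOutputStream {

    public LeveledGZIPOutputStream(OutputStream out, int level) throws IOException {
        super(out); // writes the GZIP header; nothing has been compressed yet
        // def is the protected Deflater inherited from DeflaterOutputStream
        def.setLevel(level);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (LeveledGZIPOutputStream gzip =
                 new LeveledGZIPOutputStream(bytes, Deflater.BEST_COMPRESSION)) {
            gzip.write("some repetitive text, some repetitive text".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("compressed size: " + bytes.size() + " bytes");
    }
}
```

With a class like this, writeCompressed only needs new LeveledGZIPOutputStream(os, level) in place of the anonymous subclass, and the level can come from configuration instead of being hard-coded.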
According to my tests, using Deflater.BEST_COMPRESSION instead of the default compression level reduces the total size of a reasonably large stream (more than 20 KB) by around 10%. GZIP compression makes my serialized objects 80% smaller, which is good, at least for me. This method can also be used to fine-tune the compression level in favor of speed, for example to spend fewer CPU cycles compressing data before sending it over the network. After some experimenting, you may find the ideal value for your specific case and use it as the compression level for your own GZIPOutputStream. Yet another obscure feature hidden inside the API, recently found. :-D
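To look for that ideal value, one approach is to serialize a representative object at every level and compare output sizes and elapsed times. In this sketch the LevelSurvey class and the sample payload (a list of 5,000 similar strings) are hypothetical stand-ins; substitute your own object graph, since the best level depends heavily on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.zip.GZIPOutputStream;

public class LevelSurvey {

    // Serializes the object through a GZIP stream at the given level and returns the compressed size.
    static int compressedSize(Serializable object, final int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes) {
            {
                def.setLevel(level); // same anonymous-subclass trick as in the post
            }
        });
        try {
            oos.writeObject(object);
        } finally {
            oos.close(); // finishes the GZIP stream, so bytes.size() is final afterwards
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample payload standing in for a real object graph.
        ArrayList<String> sample = new ArrayList<String>();
        for (int i = 0; i < 5000; i++) {
            sample.add("item-" + i);
        }
        for (int level = 0; level <= 9; level++) {
            long start = System.nanoTime();
            int size = compressedSize(sample, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("level " + level + ": " + size + " bytes in " + micros + " us");
        }
    }
}
```

Single-run timings like these are noisy (JIT warm-up, garbage collection), so repeat each level several times before drawing conclusions; the sizes, by contrast, are repeatable for a given input and zlib version.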
If you happen to be using Prevayler and would like smaller snapshots, take a look at reusable-components and download the latest version. Also, the Enum class has been enhanced to support anonymous subclasses, and some minor javadoc clarifications have been made, thanks to Jonathan O'Connor's suggestions. If you want to join the project, I'd be glad to have you.
Comments
-
Why close throws IOException.
When you close a stream, any buffered data is flushed and resources are released. Suppose your code writes some data to a buffered network stream. Your dog then pulls out the network cable. Finally, your code closes the stream. You've unintentionally lost some data, so an IOException must be thrown, and it has to be thrown from close().
You don't want to ignore the exception from close() unless you are already handling some other exception. So, at the end of your try blocks (before the finally), add an explicit ios/oos.close(); and perhaps ios/oos = null;.
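The pattern described here, calling close() at the end of the try block and keeping the finally block only for cleanup after a failure, might look like the following sketch. UnpluggedOutputStream is a hypothetical sink standing in for the pulled cable, CloseInTry and writeAll are made-up names, and the behavior shown assumes a current JDK (Java 8 or later).

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CloseInTry {

    // Hypothetical sink standing in for the unplugged network cable: every write fails.
    static class UnpluggedOutputStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("cable unplugged");
        }
    }

    static void writeAll(OutputStream os, byte[] data) throws IOException {
        OutputStream out = new BufferedOutputStream(os);
        try {
            out.write(data); // small write: the bytes just sit in the buffer
            out.close();     // the buffer is flushed here, so a dead connection fails here
            out = null;      // closed successfully, nothing left for finally to do
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // an exception is already on its way to the caller; don't replace it
                }
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream healthy = new ByteArrayOutputStream();
        try {
            writeAll(healthy, new byte[] {1, 2, 3});
            System.out.println("healthy sink received " + healthy.size() + " bytes");
            writeAll(new UnpluggedOutputStream(), new byte[] {1, 2, 3});
            System.out.println("no error reported");
        } catch (IOException e) {
            System.out.println("caller sees: " + e.getMessage());
        }
    }
}
```

Because the buffered bytes only leave during close(), that first close() is where the failure shows up, and it reaches the caller instead of being printed and forgotten.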
Posted by: tackline on December 26, 2003 at 06:19 PM
-
Re: Why close throws IOException.
Thanks, tackline, I agree with you when it comes to flushing. However (and this is not shown in the code above), the exception from close() is definitely not useful if you have already called flush() directly. That is what I meant, but not what is shown :-D
The problem with close() on streams, as well as on java.sql.Connection, ResultSet and Statement, is that there seems to be no point in catching an exception that simply means they could not be closed. I mean, what are you supposed to do when that happens? Tell the user? Log it just so it isn't ignored? If there is a problem with flushing, I have to agree with you, but the JavaDoc doesn't say so. That's a real problem, because we don't understand why the API designers thought it would be useful to us.
Again, thanks for your comment about flushing; it's a really good point (even I forgot to flush it :-D)
Posted by: mister__m on December 28, 2003 at 08:16 AM
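For GZIP streams in particular, flush() alone does not complete the output: the GZIP trailer (a CRC-32 and the uncompressed size, 8 bytes in total) is only written by finish(), which close() calls. The sketch below, using a hypothetical FlushIsNotFinish class, measures how many bytes have been written after flush() and after finish(). Since Java 7 there is also a GZIPOutputStream(OutputStream, boolean syncFlush) constructor that makes flush() push out pending compressed data, but even then the trailer only comes from finish().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class FlushIsNotFinish {

    // Returns {bytes written after flush(), bytes written after finish()}.
    static int[] sizesAfterFlushAndFinish(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(bytes);
        try {
            gzip.write(data);
            gzip.flush();                  // flushes the underlying stream; the GZIP data is still incomplete
            int afterFlush = bytes.size();
            gzip.finish();                 // writes remaining compressed data plus the 8-byte trailer
            int afterFinish = bytes.size();
            return new int[] {afterFlush, afterFinish};
        } finally {
            gzip.close();                  // close() would have called finish() itself
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "payload that still needs its GZIP trailer".getBytes(StandardCharsets.UTF_8);
        int[] sizes = sizesAfterFlushAndFinish(data);
        System.out.println("after flush(): " + sizes[0] + " bytes, after finish(): " + sizes[1] + " bytes");
    }
}
```

So for a GZIP stream, an exception from close() can still mean lost data even after an explicit flush().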
-
Why do you want GZip over Zip?
The setLevel() method is public in ZipOutputStream and JarOutputStream.
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipOutputStream.html#setLevel(int)
http://java.sun.com/j2se/1.4.2/docs/api/java/util/jar/JarOutputStream.html
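Because setLevel() is public on ZipOutputStream, no subclass is needed there. Below is a sketch that stores one serialized object as a single ZIP entry; the ZipLevelExample class, zipObject method and entry name are illustrative assumptions. For a single stream, a ZIP archive also carries per-entry headers and a central directory, so it comes out a little larger than the equivalent GZIP output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipLevelExample {

    // Serializes one object into a single ZIP entry, using ZipOutputStream's public setLevel.
    static byte[] zipObject(Object toWrite, String entryName, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zip = new ZipOutputStream(bytes);
        try {
            zip.setLevel(level); // public here, no subclassing needed
            zip.putNextEntry(new ZipEntry(entryName));
            ObjectOutputStream oos = new ObjectOutputStream(zip);
            oos.writeObject(toWrite);
            oos.flush();         // push serialized bytes into the entry; closing oos would close the ZIP
            zip.closeEntry();
        } finally {
            zip.close();         // writes the central directory
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ArrayList<String> sample = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            sample.add("item-" + i);
        }
        byte[] fast = zipObject(sample, "sample.ser", Deflater.BEST_SPEED);
        byte[] small = zipObject(sample, "sample.ser", Deflater.BEST_COMPRESSION);
        System.out.println("BEST_SPEED: " + fast.length + " bytes, BEST_COMPRESSION: " + small.length + " bytes");
    }
}
```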
Why do you prefer GZip?
I only have 2 problems with ZipOutputStream.
1) There is an API mistake that should be fixed.
http://developer.java.sun.com/developer/bugParade/bugs/4512189.html
2) Real-life usage of ZipOutputStreams as part of a long chain of OutputStreams can completely bypass the Hotspot compiler. This is most likely to happen with small buffer sizes, large stream sizes, high compression levels, and the -client flag, but I haven't found any sure-fire way to prevent it. When it does happen, writing to the stream is about as fast as running with the -Xint (interpreted) flag, so the performance hit is huge (writing is dozens to hundreds of times slower). Perhaps someone on the Hotspot team could write about this. I'd like to know whether it might eventually be fixed by better Hotspot compilation, or whether one simply has to write a single OutputStream with the functionality of the entire chain (potentially a very big task indeed).
BTW -- Zoe has an ObjectOutputStream variant that you may be interested in.
http://article.gmane.org/gmane.mail.zoe.devel/437
Posted by: coxcu on December 29, 2003 at 09:45 AM
-
Why do you want GZip over Zip?
> Why do you prefer GZip?
Zip is better than GZIP if you are dealing with more than one file. GZIP is the right choice if you just want to compress a single general-purpose stream, for example a serialized object. That is the big difference between them. The fact that ZipOutputStream has a setLevel method just makes me think someone forgot to add one to GZIPOutputStream :-P.
> 1) There is an API mistake that should be fixed.
> http://developer.java.sun.com/developer/bugParade/bugs/4512189.html
Apparently someone hadn't read Joshua Bloch's book, or my blog, before writing core JDK classes... :-D
> 2) (too long)...
Hmm, that's interesting. Try GZIP instead, if it suits your needs, and let me know what happens.
> BTW -- Zoe has an ObjectOutputStream variant that you may be interested in.
I'll definitely take a look when I have the time, thank you.
Posted by: mister__m on December 29, 2003 at 10:25 AM