Do I really need all those jars in my classpath?

Posted by emcmanus on February 21, 2008 at 9:56 AM PST

Big applications have a tendency to accumulate enormous
classpaths. Looking at such a classpath, you might be hard put
to know whether any given jar is really needed. Perhaps it was
needed at the time it was added, but that need has long since
evaporated. How can you tell? Having jars you don't need means
your application will be slower starting up, and perhaps also
while running. It also means that you might be worrying
unnecessarily about getting the latest version of a jar that
you're not actually using.

Kyrill Alyoshin (http://www.linkedin.com/ppl/webprofile?id=6042912)
has an elegant solution using a java.lang.instrument agent
(http://java.sun.com/javase/6/docs/api/java/lang/instrument/package-summary.html).
The basic idea is that you run your application with
java -javaagent:loosejar.jar ... and the
agent in loosejar.jar can find all the classes that
the application has loaded using Instrumentation.getAllLoadedClasses()
(http://java.sun.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html#getAllLoadedClasses()).
Then for each of those classes it can find what jar it came
from. Thus for each jar on the classpath it can compute the
number of classes that have actually been loaded from it. When
this number is zero, the jar might be unnecessary.
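
As a rough illustration of the counting step, here is a minimal sketch of such an agent. It is not loosejar's actual code: the class name JarUsageAgent and the helper countBySource are made up for this example, and it only reports locations from which at least one class was loaded, so you would still need the full list of jars to spot the ones with a count of zero.

```java
import java.lang.instrument.Instrumentation;
import java.net.URL;
import java.security.CodeSource;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical agent class, for illustration only. In a real agent jar this
// class would be named by the Premain-Class attribute of the jar's manifest.
final class JarUsageAgent {

    // Called by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        // Print the per-location counts when the application exits.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            countBySource(inst.getAllLoadedClasses())
                .forEach((source, n) -> System.err.println(n + "\t" + source))));
    }

    // Counts classes per code source location (typically a jar or a directory).
    static Map<String, Integer> countBySource(Class<?>[] classes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Class<?> c : classes) {
            CodeSource cs = c.getProtectionDomain().getCodeSource();
            URL location = (cs == null) ? null : cs.getLocation();
            // Classes from the bootstrap loader (java.lang.String, arrays,
            // primitives) have no code source, so they share one bucket.
            String key = (location == null) ? "<bootstrap or unknown>" : location.toString();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

The counting is kept in a separate static method so that it can be tried out on any array of classes without starting a JVM with an agent.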

Of course for the results to be valid you have to exercise the
application so that it does everything it can do. That might
not always be easy, but for a candidate jar you can often figure
out what to do to make the jar be referenced. To help you do
this incrementally, the loosejar agent exports a JMX MBean (http://java.sun.com/jmx) that
allows you to ask for a report while the application is
running. So you can connect with JConsole and get a report, see
what jars might be unnecessary, try to provoke class-loading
from those jars, get another report, and so on.
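
To give an idea of what exporting such a report involves, here is a minimal sketch of a Standard MBean with a single report operation, registered in the platform MBean server. All the names here (JarReportDemo, JarReportMBean, the ObjectName example.jarusage:type=JarReport) are invented for the example and are not loosejar's; the report text comes from whatever Supplier you pass in.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; this is not loosejar's actual MBean.
final class JarReportDemo {

    // Standard MBean convention: class X is managed through a public
    // interface called XMBean.
    public interface JarReportMBean {
        String report(); // not a getter, so it is exposed as an operation
    }

    public static final class JarReport implements JarReportMBean {
        private final Supplier<String> source;

        JarReport(Supplier<String> source) {
            this.source = source;
        }

        @Override
        public String report() {
            return source.get(); // computed afresh on every request
        }
    }

    // Registers the MBean in the platform MBean server and returns its name.
    static ObjectName register(Supplier<String> source) {
        try {
            ObjectName name = new ObjectName("example.jarusage:type=JarReport");
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new JarReport(source), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An agent could call register(...) from its premain method. Once JConsole is connected to the process, the MBean should appear under the example.jarusage domain, with report available as an operation you can invoke repeatedly.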

One subtlety is that the set of jars is not just the contents
of the classpath. A jar can reference other jars through a Class-Path
entry in its manifest
(http://java.sun.com/docs/books/tutorial/deployment/jar/downman.html).
You'd like to know if all of those other
jars are really necessary too. Kyrill and I were unable to find
a better way to get the transitive closure
(http://en.wikipedia.org/wiki/Transitive_closure) of referenced jar
files than to use ClassLoader.getResources("META-INF/MANIFEST.MF")
(http://java.sun.com/javase/6/docs/api/java/lang/ClassLoader.html#getResources(java.lang.String))
and parse the returned jar: URLs. Every jar has a
META-INF/MANIFEST.MF file, so every jar known by
the ClassLoader will show up in the result of this call, but
ugh. There has to be a better way.
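
For what it's worth, the hack looks roughly like the sketch below. The class name KnownJars and its two methods are made up for this example. It simply keeps the part of each jar: URL between the "jar:" prefix and the "!/" separator, and skips any manifest that is not inside a jar (for instance one found in a classes directory).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class KnownJars {

    // Returns the locations of the jars whose manifest the loader can see.
    static Set<String> find(ClassLoader loader) {
        Set<String> jars = new LinkedHashSet<>();
        try {
            for (URL url : Collections.list(loader.getResources("META-INF/MANIFEST.MF"))) {
                String jar = jarOf(url.toString());
                if (jar != null) {
                    jars.add(jar);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jars;
    }

    // "jar:file:/lib/a.jar!/META-INF/MANIFEST.MF" becomes "file:/lib/a.jar".
    // Returns null for URLs that do not point inside a jar.
    static String jarOf(String manifestUrl) {
        int bang = manifestUrl.lastIndexOf("!/");
        if (!manifestUrl.startsWith("jar:") || bang < 0) {
            return null;
        }
        return manifestUrl.substring("jar:".length(), bang);
    }
}
```

Calling KnownJars.find(ClassLoader.getSystemClassLoader()) gives the set to compare against the jars that classes were actually loaded from.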

Kyrill's loosejar project is hosted on Google Code (http://code.google.com/p/loosejar/).

Comments

Sorry, the layout got scrambled:

Getting hold of the jar files might be a bit complicated in NetBeans. I'll try to give some pointers based on my limited knowledge.

When you create a distribution, the modules will be organized in clusters. The root directory holds sub-directories for each of the clusters. For example:

-myapp
--myapp
--nb6.0
--Platform7

Each of the clusters contains modules. Each of the modules comes with an XML file that sits outside the jar. This is how modules are found by the module system. This XML config contains the location of the module's jar file. In the above example the XML files are located under these paths:

myapp/myapp/Config/Modules
myapp/nb6.0/Config/Modules
myapp/Platform7/Config/Modules

Here's the example content for a javax mail library wrapper module's config (myapp/myapp/Config/Modules/javax-mail.xml):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE module PUBLIC "-//NetBeans//DTD Module Status 1.0//EN" "http://www.netbeans.org/dtds/module-status-1_0.dtd">
<module name="javax.mail">
    <param name="autoload">true</param>
    <param name="eager">false</param>
    <param name="jar">modules/javax-mail.jar</param>
    <param name="reloadable">false</param>
    <param name="specversion">1.0</param>
</module>

The param name="jar" points to the module's jar file (myapp/myapp/modules/javax-mail.jar). The manifest of this javax-mail.jar may contain dependencies on other modules:

OpenIDE-Module-Module-Dependencies: javax.activation > 1.0

but I assume you can ignore those, since you can retrieve their jars from their own XML config. More importantly, it may contain references to other jars in the Class-Path attribute (especially for library wrappers):

Class-Path: ext/dsn.jar ext/mail.jar ext/pop3.jar ext/smtp.jar ext/mai
 lapi.jar ext/imap.jar

Following this path you should be able to get a complete list of jars loaded by the app.

It is more complicated when you run the app from inside the IDE, since the platform modules will then be loaded from the IDE installation and won't be in the build directory. Sorry for the lengthy comment - Toni

Nice article, you have indeed covered the topic in detail with sample code; it's indeed a topic which requires a deeper understanding than many other Java topics. I have also blogged some of my experience about classpath in How Classpath works in Java

Thanks
Javin http://javarevisited.blogspot.com/2011/01/how-classpath-work-in-java.html

To be much more accurate (and safe) one needs not only to use the class load event but ** also ** to analyze the actual byte code (and/or class metadata), or to ensure every method has been executed, as a Java class can easily be loaded while the classes it directly depends on (through method refs) are not. I have seen this countless times with particular application server products when accessing internal state via reflection.

Cédrik, URLClassLoader.getURLs() is good, but it doesn't get you jars that are referenced indirectly by being mentioned in the Class-Path: attribute of one of the classpath jars. That's why I suggested the getResources hack.

samuelto, Alexis M-P also mentioned the possibility of using -verbose:class. It's certainly possible, but requires more setup than just adding -javaagent if you want to capture and analyse the reported classes, especially if there is other stuff going to stdout or if you want to see the list at several times during application execution.

arafalov, I suspect that the approach you describe wouldn't work. The JDK used to map a jar file into memory the first time it was accessed, so dTrace would just show you all the jars being mapped. I'm not sure it still does that, but it probably does read and store the list of classes in each jar rather than accessing all the jar files every time a class is being looked for.

The ClassLoader.getResources("META-INF/MANIFEST.MF") is a clever trick!
To fetch all loaded JAR files, you could also use the fact that most of the time Web ClassLoaders are instance of URLClassLoader, and thus call the .getURLs() on it. This is the very technique I use to display the hierarchy of ClassLoaders in my very own open-source monitoring tool: MessAdmin.

Yes, wlouth, I know, your product is "better"... we've met on that point already! :-)

Another way would be to observe file system access (FileMon/dTrace).

The classloader will access the jars in a specific (classpath) order, so if you prepend a fake jar at the start, you may be able to extract access sequences. Then, the last jar accessed in that sequence is where the class was found. This assumes only one class at a time is loaded, which I believe is true.

A wrinkle exists with classes not found (actually a frequent case with inner classes, etc.), so the last jar would show up as a result in those cases. If it is possible to put a fake jar at the end of the classpath, you could avoid this problem.

I am not sure if MANIFEST entries go at the beginning or end of the classpath, but with enough data collected it should be possible to figure this out using statistical analysis.

You are right: as soon as Eamonn had used the "commercial" label I should have realized that any discussion on the technical merits of different designs and solutions was over.

Would -verbose:class do the trick? It records each class being loaded, along with the jar file it came from, to stdout. You can then write a simple script to get the list of jars containing at least one loaded class:

[Loaded java.util.EventListener from /opt/jdk1.5.0_11/jre/lib/rt.jar]
[Loaded javax.servlet.ServletContextListener from file:/myapp/j2ee/lib/servlet.jar]
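
Such a script could be as small as the following sketch, written in Java here for consistency with the rest of the post. It assumes the "[Loaded <class> from <location>]" line format quoted in this comment; other JVM versions print class-loading messages differently, so the pattern would need adapting. The class name VerboseClassJars is made up for the example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class VerboseClassJars {

    // Assumed line format: [Loaded some.ClassName from /path/or/url/of.jar]
    private static final Pattern LOADED =
        Pattern.compile("\\[Loaded (\\S+) from (.+)\\]");

    // Returns the distinct locations that at least one class was loaded from.
    static Set<String> sources(Iterable<String> lines) {
        Set<String> sources = new TreeSet<>();
        for (String line : lines) {
            Matcher m = LOADED.matcher(line.trim());
            if (m.matches()) {
                sources.add(m.group(2));
            }
            // Anything else on stdout is ignored.
        }
        return sources;
    }

    // Reads captured output from standard input and prints one location per line.
    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        sources(in.lines()::iterator).forEach(System.out::println);
    }
}
```

You would save the application's output to a file and feed that file to this program on standard input; lines that do not match the pattern, such as the application's own output, are skipped.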

wow wlouth, Can you push JXInsight just a little harder? I'm not quite getting the message

Time and again, I see projects: web war/ear files deployed with junit.jar, with every possible dependency of hibernate or spring, or struts. Static analysis may actually give a wrong picture (saying that those dependencies are actually needed!). Yes, Spring does have to have, say, ibatis to compile, but it doesn't need to be on your app classpath if the project uses hibernate.

Over the last 2 months I have provided training for 4 large US-based organizations, none of which required a single commercial license of JXInsight for their development teams; all licensed the product for pre-production or production environments with not a single developer in sight. From my experience most development teams can execute all their automated use cases within the time frame allowed by the development edition, unless of course they are performing stress testing, which is off-topic for this thread and yes, does require a commercial license. Most developers would not have the patience you indicate. I think you are stretching more than time here with this.

William, actually I did see that, but I had the impression that if you needed to analyse an app over a period of hours the free version would not apply.

Eamonn, if you had spent a little bit more time on our website you would have noticed that there is a FREE development edition of JXInsight which "others" (commercial) performance management and problem diagnostics vendors do not provide.

My goodness, five comments in the space of a day! I'm not sure I've ever had that.

I thought of static analysis tools and should have mentioned them. Thanks tcurdt and Xavi for the pointers. I see the static and dynamic approaches as complementary. First, static analysis could show that some jar can theoretically be accessed but dynamic analysis could show that it never is, and you could then look at how to rewrite your app so that even the static reference is removed. Second, static analysis cannot follow reflection, such as driver class names that appear in XML or properties files, so dynamic analysis can show what extra root jars you need to add to the static analysis.

William, someone looking for a commercial product to solve this problem (and many others) should certainly consider yours. But I don't see the use of the JMX API here as being problematic.

Sorry Eamonn but this again seems like a misuse of JMX which makes it practically impossible to model efficiently and easily complex state structures including class meta data.

There is already a much better solution that applies the concepts underlying a CMDB to runtime state inspection and diagnostics - JXInsight's JVMInsight. We already include a classes inspection extension that pulls all this meta data in one go and allows one to save it for off-line analysis as well as annotation.

Insight: Resolving ClassCastExceptions
http://blog.jinspired.com/?p=56
Here is an extension on this extension which shows the instrumentation performed by AspectJ.
http://blog.jinspired.com/?p=135

This video shows what it really is like to visual(ize a )vm.
http://www.jinspired.com/products/jxinsight/videos/insights/insights.html
William Louth
JXInsight Product Architect

Eamonn,

there's a powerful utility called Jar Jar Links, which can find dependencies at the class level and at the JAR level. It uses static analysis, so you don't need to run your application to find dependencies. You can get information on its usage here.

Regards,

Xavi

You can also use static analysis of the byte code. Minijar does that http://vafer.org/blog/tag/minijar based on http://vafer.org/projects/dependency/howto.html A much improved version is soon coming up though.

Interesting solution. I had done something similar when estimating how much of rt.jar was needed by certain applications - http://blogs.sun.com/alexismp/entry/is_javabe_justified1 - it simply analyzed the output of -verbose:class and was less elegant for sure. I do agree a module system would be a welcome refactoring and probably a better long-term solution (at a fairly expensive cost though).

The idea is really meritorious, but I can't imagine how to ensure correct results for big enterprise solutions. How long do you need to monitor the application: a month, two, half a year? If you have integration tests which cover the whole functionality then you're OK with this, but what if you're dealing with legacy stuff without integration/performance tests? I think that one of the best solutions is to migrate all your libraries to OSGi and forget about the unused jars.

NetBeans has three types of hierarchically organized classloaders. I guess you'd only need to get hold of the System Classloader, which has all the modules' classloaders and the original (application) classloader as its parents (described here: http://wiki.netbeans.org/DevFaqClassLoaders). Would that work with loosejar?

First, loosejar does do detection on a per-classloader basis. But I guess this is not what you're looking for. There is an RFE to enable detection of OSGi bundles. I am figuring out how to get from an OSGi bundle entry in the MANIFEST.MF to the actual jars that the classloader has available. I am not familiar with NetBeans modules. Maybe you can point me in the right direction? The only thing I need to know is how to discover the jars available to the classloader.

Very nice. Is there a way to use loosejar with a module system and multiple classloaders, like in NetBeans RCP projects?