About This Blog

Hi, I'm Ben Pryor. This blog contains my thoughts about general software engineering topics, and occasionally specifics that I find interesting. If you see something here that sparks your interest, please feel free to comment on a post or send me an email at ben at benpryor.com.

30 June 2006 - 7:00

Pack200

Perhaps you’ve heard the term Pack200 but haven’t had a chance to become familiar with it. Or maybe you already know that Pack200 is related to deploying Java applications, but aren’t sure how you could use it with your own application. Compared to other new features of Java 5, Pack200 hasn’t gotten much attention. In this entry I’ll answer two questions: what is Pack200, and why would I want to use it?

Pack200 was released as part of the Java 5 platform, and is essentially a technology for achieving much better compression ratios of deployable Java code. Java code has traditionally been packaged and deployed as JAR (Java Archive) files, which are nothing more than standard zip files with the extension .jar. Pack200 can result in radically higher compression ratios of Java bytecode when compared to traditional JAR packaging.

The name Pack200 comes from two sources, and to understand why it was chosen you need to know a little of the technology’s history. William Pugh (best known as the developer of FindBugs) published a paper detailing a number of advanced techniques for compressing Java class files. Sun used these techniques to decrease the size of the JRE and JDK downloads, starting around the Java 1.4.1 release. William Pugh’s ideas and the format Sun used were referred to as Pack. Around the same time, Java Web Start / JNLP technology was gaining popularity, and Java applets were becoming popular again. A JSR (Java Specification Request), JSR 200, was created to specify a “dense download” format for Java bytecode, based on the technology presented in William Pugh’s paper and already in use internally at Sun. The resulting technology was named Pack200 (after the JSR number) and became public and supported with Java 5.

The target use case for JSR 200 was more efficient web deployment of Java applications, specifically Java Web Start and applet applications. The motivation was to reduce download and update times for the client and bandwidth usage for the server. Java 5 includes hooks in the JNLP bits and the browser plugins so that you can use Pack200 out of the box with Web Start applications and applets (see Sun’s deployment guide). Even though Pack200 was designed for the Web Start scenario, it applies to any kind of client-side Java - I’ll explain how later in this entry.

The Pack200 technology is intended to be a deployment vehicle, and doesn’t come into play at runtime. A Java virtual machine still works only with class files and JAR archives on the classpath - Pack200 packages must be unpacked before the bytecode inside them can be loaded into a virtual machine. The right way to think about Pack200 is that it’s a faster, more efficient way to get bytecode onto client machines.

The basic premise of Pack200 is simple. Because the JAR format is a standard, generic file format, it doesn’t treat Java class files any differently from other file types. Java class files, however, have certain properties that a more specialized packaging technology can cleverly exploit. By taking advantage of specific aspects of the class file format, Pack200 can generate an extremely dense representation of a collection of class files. Traditional JAR files normally achieve compression ratios of 1:2 or 1:3 (the archive is a half to a third of the size of the original class files). It’s not uncommon for Pack200 files to achieve compression ratios of 1:10 or better.

As an example, I took a collection of about 1000 class files that make up a library in one of the projects I’m working on right now. I created a traditional JAR archive and a Pack200 package containing these class files. The class files take up 3,987,908 bytes on disk. The JAR archive takes up 1,834,269 bytes, or about 46% of the size of the original class files. The Pack200 package takes up 266,803 bytes: 15% of the size of the JAR file, or an amazing 6.7% of the original size.
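To double-check the arithmetic in this example, here is a tiny Java program that recomputes the percentages from the three byte counts given above. The class and method names are purely illustrative. The Pack200-to-JAR figure comes out at 14.5%, which the text rounds to 15%. The overall reduction works out to roughly 1:15, consistent with the "1:10 or better" claim.

```java
import java.util.Locale;

/** Recomputes the size comparisons from the example (byte counts copied from the text). */
public class SizeRatios {
    /** Size of {@code part} expressed as a percentage of {@code whole}. */
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long classFiles = 3_987_908L; // raw .class files on disk
        long jar = 1_834_269L;        // traditional JAR
        long pack = 266_803L;         // Pack200 package

        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
        System.out.printf(Locale.ROOT, "JAR vs. class files:     %.1f%%%n", percentOf(jar, classFiles));  // 46.0%
        System.out.printf(Locale.ROOT, "Pack200 vs. JAR:         %.1f%%%n", percentOf(pack, jar));        // 14.5%
        System.out.printf(Locale.ROOT, "Pack200 vs. class files: %.1f%%%n", percentOf(pack, classFiles)); // 6.7%
        System.out.printf(Locale.ROOT, "Overall ratio: about 1:%.0f%n", (double) classFiles / pack);      // about 1:15
    }
}
```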

So how does Pack200 achieve such high compression ratios? For the full answer, read William Pugh’s paper mentioned above. In short, Pack200 combines a number of low-level techniques, each of which exploits a different aspect of the Java class file format or Java bytecode. Among other things, Pack200:

– Merges the constant pools of the individual class files into one constant pool that’s shared by a collection of class files. This is a big win, since many of the same constants (common class names, for instance) appear in many of the individual constant pools. Basically, this technique eliminates the redundancy among the constant pools of the class files in a package.

– Uses delta encoding whenever possible. As a trivial example, to store two Strings that share a common section of characters, only one complete String needs to be stored; the second String is stored as a delta from the first. This kind of encoding can be applied in many different areas - even sequences of similar numbers can be stored more compactly with this technique.

– Implements variable-length encoding, which allows values of the same data type to be stored using different amounts of space. For example, a small numeric value can be stored using fewer bytes than a larger value of the same type.

– Enables more effective secondary compression. Pack200 packaging is normally followed immediately by a gzip stage, which reduces the size of the package further.

– Incorporates a number of optional techniques for further reducing the size of class files. For instance, debug attributes can be stripped out of the class files.
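The first technique in the list, merging constant pools, can be illustrated with a toy model in which each class’s constant pool is simply a list of strings. That is an assumption made for illustration only: real class-file constant pools hold typed entries, and Pack200 stores its shared pool very differently. The sketch interns every distinct constant once and replaces each class’s private copies with indices into the shared pool. For the three hypothetical classes in main, nine separate entries shrink to four shared ones.

```java
import java.util.*;

/**
 * Toy model of constant-pool merging. Each "class" is just a list of constant strings;
 * this is NOT the real class-file or Pack200 layout.
 */
public class SharedPoolDemo {
    /**
     * Appends each distinct constant to {@code shared} exactly once and returns, per class,
     * indices into that shared pool in place of private copies of the strings.
     */
    static Map<String, int[]> merge(Map<String, List<String>> perClassPools, List<String> shared) {
        Map<String, Integer> indexOf = new HashMap<>();
        Map<String, int[]> refs = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : perClassPools.entrySet()) {
            List<String> constants = e.getValue();
            int[] ids = new int[constants.size()];
            for (int i = 0; i < ids.length; i++) {
                String c = constants.get(i);
                Integer id = indexOf.get(c);
                if (id == null) {          // first time this constant is seen in any class
                    id = shared.size();
                    shared.add(c);
                    indexOf.put(c, id);
                }
                ids[i] = id;
            }
            refs.put(e.getKey(), ids);
        }
        return refs;
    }

    public static void main(String[] args) {
        // Hypothetical classes that reference mostly the same constants.
        Map<String, List<String>> pools = new LinkedHashMap<>();
        pools.put("com/example/A", Arrays.asList("java/lang/Object", "java/lang/String", "toString"));
        pools.put("com/example/B", Arrays.asList("java/lang/Object", "java/lang/String", "length"));
        pools.put("com/example/C", Arrays.asList("java/lang/Object", "toString", "length"));

        List<String> shared = new ArrayList<>();
        Map<String, int[]> refs = merge(pools, shared);

        int before = 0;
        for (List<String> p : pools.values()) before += p.size();
        // prints: separate pools: 9 entries, shared pool: 4 entries
        System.out.println("separate pools: " + before + " entries, shared pool: " + shared.size() + " entries");
        for (Map.Entry<String, int[]> e : refs.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```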
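Delta encoding and variable-length encoding work especially well together. Once a sorted sequence is stored as small differences, most values fit in a single byte. The sketch below uses a common 7-bits-per-byte variable-length scheme, in which the high bit means "more bytes follow". This scheme is a stand-in chosen for illustration; Pack200’s own codings are more elaborate. The class name and sample values are hypothetical. For the six offsets in main, 24 bytes of fixed-width ints become 7 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Illustrative delta + variable-length encoding; not Pack200's actual wire format. */
public class DeltaVarint {
    /** Writes a non-negative int using 7 bits per byte; high bit set = another byte follows. */
    static void writeVarint(ByteArrayOutputStream out, int v) {
        if (v < 0) throw new IllegalArgumentException("negative value: " + v);
        while (v >= 0x80) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    /** Encodes a non-decreasing sequence as deltas from the previous value, each as a varint. */
    static byte[] encodeSorted(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int v : values) {
            writeVarint(out, v - prev); // small gaps between neighbours take a single byte
            prev = v;
        }
        return out.toByteArray();
    }

    /** Reverses encodeSorted: reads count varints and re-accumulates the running sum. */
    static int[] decodeSorted(byte[] data, int count) {
        int[] result = new int[count];
        int pos = 0, prev = 0;
        for (int i = 0; i < count; i++) {
            int v = 0, shift = 0, b;
            do {
                b = data[pos++] & 0xFF;
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += v;
            result[i] = prev;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] offsets = {1000, 1004, 1010, 1011, 1030, 1100}; // hypothetical sorted values
        byte[] packed = encodeSorted(offsets);
        // prints: fixed-width: 24 bytes, delta+varint: 7 bytes
        System.out.println("fixed-width: " + offsets.length * 4 + " bytes, delta+varint: " + packed.length + " bytes");
        System.out.println(Arrays.toString(decodeSorted(packed, offsets.length)));
    }
}
```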

Pack200 packing takes a JAR file as input and produces a .pack file containing the packed bytecode. The pack process normally finishes by gzipping the .pack file (the secondary compression mentioned above), yielding a .pack.gz file. The unpack process simply reverses these steps: first it ungzips the .pack.gz file, and then it produces a JAR file from the .pack file. Packing and unpacking are fairly resource-intensive (especially compared to simple JARing). The pack process is done once on the server, and the unpack process is done once on each client machine at install or deployment time.

Pack200 has two interfaces: a command line interface and a programmatic interface. The Java 5 JDK and JRE ship with two command line tools: pack200 and unpack200. Interestingly enough, unpack200 is a native C++ executable with no dependency on a Java runtime (pack200 itself is a Java application). There is also a class that can be used to pack and unpack programmatically: java.util.jar.Pack200.
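Here is a minimal sketch of the programmatic route through java.util.jar.Pack200. It mirrors the pack-then-gzip and ungzip-then-unpack steps described above. The class name PackRoundTrip and the command-line arguments are hypothetical. The code uses try-with-resources, so it needs Java 7 or later. The Pack200 API was removed in JDK 14 (JEP 367), so this sketch compiles only on Java 7 through 13. The comments show the roughly equivalent pack200/unpack200 command lines.

```java
import java.io.*;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Pack200;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch of a Pack200 round trip; requires a JDK that still ships java.util.jar.Pack200. */
public class PackRoundTrip {
    /** Server side, done once: JAR -> packed bytes -> gzip (cf. "pack200 app.pack.gz app.jar"). */
    static void pack(File jar, File packGz) throws IOException {
        Pack200.Packer packer = Pack200.newPacker();
        // Effort ranges from 0 to 9; higher effort means a slower pack but usually smaller output.
        packer.properties().put(Pack200.Packer.EFFORT, "9");
        try (JarFile in = new JarFile(jar);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(packGz))) {
            packer.pack(in, out); // closes its input; try-with-resources closes the output
        }
    }

    /** Client side, done once per install: ungzip -> rebuild a JAR (cf. "unpack200 app.pack.gz app.jar"). */
    static void unpack(File packGz, File jar) throws IOException {
        Pack200.Unpacker unpacker = Pack200.newUnpacker();
        try (InputStream in = new GZIPInputStream(new FileInputStream(packGz));
             JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            unpacker.unpack(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PackRoundTrip app.jar app.pack.gz app-unpacked.jar
        pack(new File(args[0]), new File(args[1]));
        unpack(new File(args[1]), new File(args[2]));
    }
}
```

The unpacked JAR will generally not be byte-for-byte identical to the input, for the reasons discussed in the next paragraph. The effort level of 9 is only an example; the packer’s default is 5.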

Even though Pack200 works on JAR files, it’s not correct to think of it as a generic compressor for JAR files. For instance, a JAR file containing mostly resources will not achieve nearly as high a compression ratio as a JAR file containing mostly class files. Pack200 is a compressor for Java bytecode; it just uses JARs as input for convenience and for integration with existing packaging mechanisms. The Pack200 format is also lossy: run a JAR all the way through (pack and unpack) and the final JAR will differ from the original. Of course, the two JARs will be equivalent from the point of view of a Java virtual machine. This has implications for JAR signing, so be sure to read Sun’s deployment guide, mentioned above, if you plan to combine Pack200 and JAR signing.

Don’t make the mistake of thinking that Pack200 is useful only for Web Start and applets. Any time Java bytecode needs to move from a server onto many client machines, Pack200 can make life easier for both the client and the server. A heavyweight Java client application could benefit greatly from Pack200. All it requires is Java 5 on the client machine and an install routine that unpacks the Pack200 packages during installation. This install routine could itself be written in Java, using the Pack200 support in Java 5 either programmatically or through the command line tools. An install scenario like this could cut the download size of a client-side Java application by half or more, greatly reducing both the bandwidth used by the server and the client’s wait for the download.

A real-world example of this technology being used successfully outside the Web Start and applet case is the popular Eclipse IDE. Starting with version 3.2, the Eclipse update manager (used to get new Eclipse features and update existing ones) will use Pack200 when Java 5 is available on the client. This will greatly reduce the wait time for updates through the update manager, as well as the bandwidth burden on the Eclipse mirrors.

If you’re writing client-side Java applications in 2006 and haven’t yet looked at Pack200, take some time to evaluate the technology and see how it might fit into your overall deployment strategy.
