Misadventures Recompiling the Java Ecosystem from Source

November 2023


Something is rotten in the state of Maven.

When you compile a Rust project, the package manager Cargo will download and compile all the project's dependencies from source.

When you compile a C project, you typically rely on compiled libraries living somewhere on your system. But those libraries were likely compiled from source recently, which matters because improvements and bug fixes in C compilers lead to better binaries.

When you compile a Java project, you always rely on compiled libraries downloaded from an online "repository" like Maven Central. In principle, all of the compiled libraries in Maven Central are open-source, but if you want to compile them from source yourself, you are in for a real Kafkaesque nightmare.

Deviating from the norm: Why compile your Java dependencies from source?

I know of one compelling reason and two minor reasons why we should want to compile our dependencies from source instead of downloading them from Maven Central.

The compelling reason is to prove that our project is immune to NoClassDefFoundError and NoSuchMethodError. When you compile a Java library, the Java compiler ensures that every class and method you use is really present. However, when your library runs as part of a bigger project, it may be using a different version of a dependency that does not have exactly the same classes and methods.

For example, did you know that JUnit 4, the most widely used testing library for Java, is not compatible with Hamcrest 2, even though Hamcrest 2 is turning five years old this month? You cannot use JUnit 4 and Hamcrest 2 in the same project—a fact you will only discover at run time, not compile time.

The Java compiler only checks your code against the libraries you compile it with. It never re-checks already-compiled libraries against each other. Because of this, it isn't hard to hit run-time errors in projects that compile just fine:

  1. Somebody writes Library A version 1.0.0.
  2. Somebody writes Library B version 1.0.0, which depends on Library A.
  3. Library A gets an upgrade to version 1.1.0, removing a class and adding a parameter to a method.
  4. I write Executable C, which depends on Library A (1.1.0) and Library B (1.0.0). My project compiles just fine, but crashes when I try to invoke some of Library B's routines, since they rely on the class and method signature that Library A 1.1.0 removed.

There isn't much we can do about this situation, but I would at least want to know about it at compile time.
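Knowing about it up front amounts to a set comparison. On one side are the symbols Library B was compiled against. On the other are the symbols that the version of Library A on the classpath actually exports.

Below is a toy model of that comparison. It is a minimal sketch that assumes each library's API can be flattened into strings like `a.Widget` or `a.Widget#render(int)`. The class `LinkageCheck` and every library symbol in it are hypothetical. A real tool would have to extract Library B's references from its bytecode rather than from a hand-written list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the Library A / B / C scenario. All names are hypothetical.
// A symbol is either a class name ("a.Widget") or "Class#method(paramTypes)".
final class LinkageCheck {

    // Symbols exported by each version of Library A.
    static final Map<String, Set<String>> LIBRARY_A = Map.of(
            "1.0.0", Set.of("a.Widget", "a.Legacy", "a.Widget#render()"),
            // 1.1.0 removes a.Legacy and adds an int parameter to render.
            "1.1.0", Set.of("a.Widget", "a.Widget#render(int)"));

    // Symbols Library B 1.0.0 references, fixed when B was compiled against A 1.0.0.
    static final List<String> LIBRARY_B_REFERENCES =
            List.of("a.Widget", "a.Legacy", "a.Widget#render()");

    // Returns the references that the given exported symbols cannot satisfy.
    // A missing class would surface as NoClassDefFoundError at run time,
    // a missing method as NoSuchMethodError.
    static List<String> missing(List<String> references, Set<String> exported) {
        List<String> unresolved = new ArrayList<>();
        for (String ref : references) {
            if (!exported.contains(ref)) {
                unresolved.add(ref);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        for (String version : List.of("1.0.0", "1.1.0")) {
            List<String> unresolved = missing(LIBRARY_B_REFERENCES, LIBRARY_A.get(version));
            System.out.println("A " + version + ": "
                    + (unresolved.isEmpty() ? "ok" : "missing " + unresolved));
        }
    }
}
```

Running `main` prints `A 1.0.0: ok` and `A 1.1.0: missing [a.Legacy, a.Widget#render()]`. Those are exactly the two run-time failures from step 4, reported before anything runs.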

The minor reasons regard improvements to javac itself. Java does most of its heavy lifting in the JIT, so javac has seen few meaningful changes over the years. Still, there have been some: numerous bug fixes and invokedynamic, to name a few. (Yes, invokedynamic is used even in normal Java code! It helps make lambdas fast.) Java projects compiled with javac 1.5 will not benefit from any of those fixes or new JVM bytecodes.
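You can see which bytecode target a downloaded dependency was built for by reading its class-file header. The header starts with four magic bytes (0xCAFEBABE), followed by a two-byte minor version and a two-byte major version. Major version 49 corresponds to Java 5, and the JVM only accepts invokedynamic in class files of major version 51 (Java 7) or later.

The sketch below parses a hand-written header rather than a real jar. `ClassFileVersion` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the bytecode target from a class-file header. Names are hypothetical.
final class ClassFileVersion {
    static final int MAGIC = 0xCAFEBABE;
    // The JVM only accepts invokedynamic in class files of major version 51 (Java 7) or later.
    static final int FIRST_INVOKEDYNAMIC_MAJOR = 51;

    // Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor_version, not needed here
        return data.readUnsignedShort(); // major_version
    }

    // Major 49 is Java 5, 52 is Java 8, 61 is Java 17, and so on.
    static String release(int major) {
        return major >= 49 ? "Java " + (major - 44) : "pre-Java 5";
    }

    public static void main(String[] args) throws IOException {
        // Hand-written header for a class compiled for Java 5: CAFEBABE, minor 0, major 49.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println(release(major) + ", invokedynamic allowed: "
                + (major >= FIRST_INVOKEDYNAMIC_MAJOR));
    }
}
```

To check a real dependency, you would pass `majorVersion` the stream for one of its `.class` entries, for example from `java.util.jar.JarFile.getInputStream`.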

What does it take to build a Maven project from source?

Maven is the premier build system for Java. If we are going to compile any Java projects from source, we should start there, since most major projects use Maven.

At its heart, Maven is not complicated (although I'll be the first to point out that you need to learn a thousand new words to be able to use it effectively). Each Maven project needs three things to build from source:

  1. A parent, which is just another Maven project file (a POM) that sets up some default settings. If you don't specify one, Maven gives you a default parent (the Super POM).
  2. Dependencies, which are names and versions of other projects.
  3. Plugins, which are names and versions of other projects that will be invoked at build time.

Maven has a lot of verbs, but the only one that matters to us is install. The overall plan is simply:

  1. Download the source for your project's parent. Run mvn --offline install to build and install it.
  2. Download the source for each of your project's dependencies. Run mvn --offline install to build and install each one.
  3. Download the source for each of your project's plugins. Run mvn --offline install to build and install each one.
  4. Run mvn --offline install to build and install your project!
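The plan above is a topological sort: each project has to be installed after its parent, its dependencies, and its plugins.

Here is a minimal sketch using Kahn's algorithm. It assumes the graph has already been collected into a map from each project to the projects it needs. The `BuildOrder` class and the project names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Computes an install order with Kahn's algorithm. Names are hypothetical.
final class BuildOrder {

    // needs: each project mapped to what must be installed before it
    // (its parent, dependencies, and plugins).
    static List<String> installOrder(Map<String, List<String>> needs) {
        Map<String, Integer> unmet = new LinkedHashMap<>();    // project -> prerequisites not yet installed
        Map<String, List<String>> neededBy = new HashMap<>(); // prerequisite -> projects waiting on it
        for (Map.Entry<String, List<String>> entry : needs.entrySet()) {
            unmet.putIfAbsent(entry.getKey(), 0);
            for (String prerequisite : entry.getValue()) {
                unmet.putIfAbsent(prerequisite, 0);
                unmet.merge(entry.getKey(), 1, Integer::sum);
                neededBy.computeIfAbsent(prerequisite, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Start with everything that needs nothing.
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((project, count) -> {
            if (count == 0) {
                ready.add(project);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String project = ready.poll();
            order.add(project); // i.e. run mvn --offline install for this project
            for (String waiting : neededBy.getOrDefault(project, List.of())) {
                if (unmet.merge(waiting, -1, Integer::sum) == 0) {
                    ready.add(waiting);
                }
            }
        }
        if (order.size() < unmet.size()) {
            List<String> stuck = new ArrayList<>();
            unmet.forEach((project, count) -> {
                if (count > 0) {
                    stuck.add(project);
                }
            });
            throw new IllegalStateException("unorderable because of a cycle: " + stuck);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> needs = Map.of(
                "my-project", List.of("my-parent", "library-a", "some-plugin"),
                "library-a", List.of("my-parent"),
                "some-plugin", List.of(),
                "my-parent", List.of());
        System.out.println(installOrder(needs));
    }
}
```

If the map contains a cycle, `installOrder` throws an `IllegalStateException` listing the projects it could not order, instead of returning a partial plan.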

Signs of trouble

Let's start fresh, with nothing downloaded in Maven's cache:

$ rm -rf ~/.m2/repository

Obviously we'll need some plugins, so let's start with the basic remote-resources-plugin:

$ git clone git@github.com:apache/maven-remote-resources-plugin.git
$ mvn --offline install -f maven-remote-resources-plugin/pom.xml
[...]
The following artifacts could not be resolved: org.apache.maven.plugins:maven-plugins:pom:40

Makes sense; this project uses a non-default parent, so we'll need that first. It lives in a different repo, so let's grab that:

$ git clone git@github.com:apache/maven-parent.git
$ mvn --offline install -f maven-parent/maven-plugins/pom.xml
[...]
The following artifacts could not be resolved: org.apache:apache:pom:31-SNAPSHOT

Yet another parent, but we can get that one too:

$ git clone git@github.com:apache/maven-apache-parent.git
$ mvn --offline install -f maven-apache-parent/pom.xml
[...]
The following artifacts could not be resolved: org.apache.maven.plugins:maven-remote-resources-plugin:jar:3.1.0

Uh-oh. UH OH. That looks a lot like a circular dependency!
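Each failure in this session names the missing artifact in Maven's `groupId:artifactId:extension[:classifier]:version` form, so chasing them down is mechanical.

Below is a minimal sketch that pulls those coordinates out of an error line. It also computes where the file would sit under `~/.m2/repository` in Maven's default layout. It assumes the error text looks like the lines shown above, and the `MissingArtifacts` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Extracts Maven coordinates from "could not be resolved" errors. Names are hypothetical.
final class MissingArtifacts {

    // groupId:artifactId:extension[:classifier]:version, as printed in the errors above.
    record Coordinates(String groupId, String artifactId, String extension,
                       String classifier, String version) {

        // Path under ~/.m2/repository in Maven's default layout
        // (assumes a release or locally installed SNAPSHOT version).
        String repositoryPath() {
            String file = artifactId + "-" + version
                    + (classifier.isEmpty() ? "" : "-" + classifier) + "." + extension;
            return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + file;
        }
    }

    static final String PREFIX = "The following artifacts could not be resolved: ";

    // Returns the coordinates listed after PREFIX, or an empty list for other lines.
    static List<Coordinates> parse(String line) {
        List<Coordinates> found = new ArrayList<>();
        int start = line.indexOf(PREFIX);
        if (start < 0) {
            return found;
        }
        for (String piece : line.substring(start + PREFIX.length()).split(",")) {
            // Keep only the first token, in case an explanation follows the coordinates.
            String[] parts = piece.trim().split("\\s+")[0].split(":");
            if (parts.length == 4) {
                found.add(new Coordinates(parts[0], parts[1], parts[2], "", parts[3]));
            } else if (parts.length == 5) {
                found.add(new Coordinates(parts[0], parts[1], parts[2], parts[3], parts[4]));
            } else {
                break; // no longer looking at a coordinate
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String error = PREFIX + "org.apache.maven.plugins:maven-plugins:pom:40";
        for (Coordinates c : parse(error)) {
            System.out.println(c.repositoryPath());
        }
    }
}
```

For the first error in this session, `main` prints `org/apache/maven/plugins/maven-plugins/40/maven-plugins-40.pom`. That is the file `mvn --offline install` of the parent project should produce.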

The first problem: Staircases

As it turns out, there isn't a circular dependency. But the Java ecosystem isn't a clean tree either. Instead, the Java ecosystem is a staircase:

Maven is far from the only offender. Antlr is another example of a staircase in a popular Java project:

It's no wonder these staircases keep cropping up, given how easy it is to depend on something once any version of it shows up in Maven Central:

  1. Author library A and upload it to Maven Central
  2. Author library B, depending on A, and upload it to Maven Central
  3. Update library A to depend on B, and upload it to Maven Central
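The three steps above never create a cycle between versions: the new A depends on B, which depends on the old A. The cycle only appears once every version of a project is treated as the same thing. That is effectively what you do when you try to build each project from its current source.

Here is a minimal sketch of that distinction. The `project:version` names and the `Staircase` class are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Shows a "staircase": acyclic per version, cyclic per project. Names are hypothetical.
final class Staircase {

    // "A:1.1" -> "A" (drops everything after the last colon).
    static String project(String artifact) {
        return artifact.substring(0, artifact.lastIndexOf(':'));
    }

    // Merges every version of a project into a single node.
    static Map<String, Set<String>> collapseVersions(Map<String, List<String>> graph) {
        Map<String, Set<String>> collapsed = new HashMap<>();
        graph.forEach((artifact, deps) -> {
            Set<String> edges = collapsed.computeIfAbsent(project(artifact), k -> new HashSet<>());
            for (String dep : deps) {
                edges.add(project(dep));
            }
        });
        return collapsed;
    }

    // Depth-first search: reaching a node that is still on the stack closes a cycle.
    static boolean hasCycle(Map<String, ? extends Collection<String>> graph) {
        Set<String> done = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : graph.keySet()) {
            if (visit(node, graph, done, onStack)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, ? extends Collection<String>> graph,
                                 Set<String> done, Set<String> onStack) {
        if (onStack.contains(node)) {
            return true;
        }
        if (done.contains(node)) {
            return false;
        }
        onStack.add(node);
        Collection<String> next = graph.get(node);
        if (next != null) {
            for (String n : next) {
                if (visit(n, graph, done, onStack)) {
                    return true;
                }
            }
        }
        onStack.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> artifacts = Map.of(
                "A:1.0", List.of(),          // step 1
                "B:1.0", List.of("A:1.0"),   // step 2
                "A:1.1", List.of("B:1.0"));  // step 3
        System.out.println("cycle between versions: " + hasCycle(artifacts));
        System.out.println("cycle between projects: " + hasCycle(collapseVersions(artifacts)));
    }
}
```

Running it prints `cycle between versions: false` followed by `cycle between projects: true`.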

These staircases have a real cost, too! If you run mvn install on a trivial, zero-dependency project with no explicit plugins, Maven 3.9.5 will download 126 pom files and 44 jar files, including:

Do I look like I'm made of disk space?

The second problem: Lost cities

Maybe you're willing to accept some staircases and break some of them by hand. If you continue down this road and start really chasing down every dependency of every project, you start finding lost projects that aren't truly open-source anymore because their source code isn't hosted anywhere.

Here's a good example:

But where is oss-parent's source code? Someone uploaded it to Maven Central on January 27, 2014, but the official repo only has version 8-SNAPSHOT.

In other words, the transitive dependency tree of every project built with Maven today includes something that effectively isn't open source anymore.

Here's another example:

JUnit 4, the most popular testing framework for Java, can't be built without maven-replacer-plugin (a project that lived on Google Code), which is no longer available anywhere online as far as I can tell. It was last updated in 2012 and pulls in dependencies with numerous vulnerabilities. I imagine that is only remotely acceptable because the plugin runs at build time rather than at run time.

Here's another example:

Guice, a very popular dependency injection framework, directly depends on a library called aopalliance. While aopalliance is technically still online, its source code is only available via CVS, and today (2023/11/6) SourceForge's CVS feature is down. It is scheduled to be retired permanently soon.

Here's another example:

Log4j, one of the most popular logging frameworks, transitively depends on iso-relax:

The iso-relax project is in the same broken CVS boat as aopalliance.

What next?

My efforts to build all my Java dependencies from scratch are on hold. Once you go chasing down every compile-time dependency, the ecosystem is simply too broken to build most major packages.

Maven Central never should have allowed uploads that create cycles between projects, even when the graph between individual versions stays acyclic.

Many key projects are still using ancient build-time dependencies that have long outlived their mission lifetimes.

I'd love to see a Java ecosystem bootstrapped with Nix to ensure everything builds together compatibly, but I have no idea how to get there.
