Java on Linux has always been a “special” topic. The two don’t mix well.
The mindset of Linux distributions is very different from that of the Java world when it comes to building software. This is understandable, as they have different requirements.
In the Java world, there is the concept of artifacts. You build org.foo.bar:bar-moo:1.1 once and it stays there forever, archived for anyone to use. Tools like Maven and Ivy let developers specify, in their source tree, the exact dependencies of their components. Those are grabbed from the network, the software is built, and the output is published as a new artifact that others can grab.
Linux distributions, on the other hand, bootstrap the complete stack from source. They don’t take the binary artifact from upstream but build it themselves, and then use the binary they built to build the next one. This seems to work pretty well for C and C++, and for Ruby and Python, let’s say it “works”.
When it comes to packaging Java software, Linux distributors find themselves in the following situation:
This clashes with Linux in several ways:
Distributions need to build from source. Even if you get rid of the above requirement and bundle all your dependencies, distributions want to build everything from source. This has technical and legal reasons. The SUSE build system runs very complex checks on every package it builds. Those checks are part of the quality we sell to our customers. Other reasons are legal: I am still trying, for example, to build the Play! framework. Even though it is BSD-licensed, it includes some .jars of unknown origin. What would happen if one of these jars turned out to be proprietary? Michael Vyskocil had a similar issue with openproj and its bundled dependencies.
Another reason to build from source is support. Enterprise distributions sell support, and if a customer has a problem, we fix it ourselves rather than wait for upstream to release a new version. Having a standardized way to build from source with our own fixes allows us to serve our customers. We can bundle jars in our application, but if a bug traces back to a jar we included, we would need to change the complete build description of the product in order to pick up the fixed component, assuming we are able to rebuild it at all. This already happened to us once with an XML-RPC library, and we were glad that it could be fixed by adding a patch to the rpm build description.
Because Linux distributions know that they are not the center of the universe, they adapted. At the beginning things were still OK. Ant was very popular, and you basically packaged all build dependencies recursively until you could build your package, always in the same way:
Something like this:
find . -iname '*.jar' | xargs rm -rf
export CLASSPATH=$(build-classpath foo)
As long as this held, the world was still fine. Ant needed bootstrapping, but this was doable.
Maven is at the same time revolutionary and one of the biggest atrocities I have seen when building software.
On the positive side:
On the negative side:
All the above means that maven basically requires all the software it is supposed to build. Not the best design for a build system.
To make things worse, Maven grabs dependencies from the network, and network access is disabled in our distro build process.
Fedora has made quite some progress in providing a Maven stack by improving and extending the conventions the JPackage project started for Maven packages. This is implemented using what is called a “dependency map”.
The approach works by installing some xml files per-package that map the maven artefact identifiers (groupId, artifactId) to a local jar in the system. Then maven itself is patched to include a resolver for artefacts with some properties:
What I don’t like about this approach is:
Why would anyone in their right mind use XML files to create mappings to files when you are on a UNIX-like OS and you have the filesystem and symbolic links?
It is very explicit. It does not rely on a simple convention.
A second issue is how packages are built. This is SUSE-specific. Fedora can bootstrap packages with circular dependencies by introducing a binary package A and building the other dependencies until it can build the real A. Once a package is built, it stays frozen in the build system as an artefact (just like in the Java world).
In the openSUSE Build Service, the repository is always ready to bootstrap. For circular dependencies you create a package A-bootstrap that provides A, and set the project config to prefer A. When A does not exist, A-bootstrap is grabbed, but as soon as A is there, it is preferred and used. When a package changes, the packages depending on it are rebuilt automatically. This approach has several advantages, but it makes it hard to bootstrap a collection of packages where everything depends on everything.
In openSUSE, we have successfully built many Maven-dependent packages in the Java:base project without having a Maven package, by using the Maven Ant plugin to generate a tarball with Ant build files.
This method does not work for every package, especially when files are generated, because then one also needs to include those. But it may be good enough for solving our specific bootstrapping problem. The question is how many bootstrap packages we would need.
Another idea is to use packages with binary jars for bootstrapping.
Fedora is not very happy with the current situation either, and they have been researching adding native support to Koji to build maven packages.
In any case, I think there is room for improvement everywhere. I think the Maven infrastructure can be simplified, taking into account that what Maven contributed to the world was a (now) popular way to identify a module, and that this is now also being used outside of Maven. Apache Ivy, SBT, Gradle, etc. all support Maven-style repositories and support referring to an artefact as groupId:artifactId:version.
Why not instead of a depmap just have:
/usr/share/java/foo.jar
/usr/share/java/org.bar/foo.jar -> /usr/share/java/foo.jar
/usr/share/java/org.bar/foo.pom
And have the patched Maven resolver just look there?
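On the packaging side, producing that layout could be as small as the following sketch. The function name `link_artifact` and the `JAVADIR` override are my own assumptions for illustration, not part of any existing tool, and the symlink target is absolute, so staging into a build root would need an adjusted target.

```shell
#!/bin/sh
# Hypothetical helper: expose an installed jar under a groupId directory.
# JAVADIR defaults to /usr/share/java; override it to experiment elsewhere.
JAVADIR="${JAVADIR:-/usr/share/java}"

# link_artifact GROUP_ID ARTIFACT_ID JAR_NAME [POM_FILE]
link_artifact() {
    group_id=$1
    artifact_id=$2
    jar_name=$3
    pom_file=${4:-}

    mkdir -p "$JAVADIR/$group_id"
    # The groupId entry is only a symlink to the flat jar.
    ln -sf "$JAVADIR/$jar_name" "$JAVADIR/$group_id/$artifact_id.jar"
    # The pom, if given, is copied next to the symlink.
    if [ -n "$pom_file" ]; then
        cp "$pom_file" "$JAVADIR/$group_id/$artifact_id.pom"
    fi
}
```

Calling `link_artifact org.bar foo foo.jar foo.pom` would create the two org.bar entries shown in the listing.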
If you need parallel versions, then just
/usr/share/java/foo1.jar
/usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.jar -> /usr/share/java/foo1.jar
/usr/share/java/org.bar/2.0/foo.jar -> /usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.pom
/usr/share/java/org.bar/2.0/foo.pom
Or use the standard alternatives:
/usr/share/java/foo.jar -> /etc/alternatives/foo.jar
The resolver would first look for the specific version described in the .pom file as /usr/share/java/$groupId/$version/$artifactId.ext. If it is not found, it could fall back to just looking for /usr/share/java/$groupId/$artifactId.ext. This supports the common case where we have just one version for the system, plus exceptions for some packages where a specific version must also be provided in parallel. If the same jar is also known under a different groupId, well, then you create another symlink.
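To make the lookup order concrete, here is a minimal shell sketch of it. This is not Maven's resolver, which would be Java code inside Maven or a plugin; `resolve_artifact` and the `JAVADIR` override are names I made up for the illustration.

```shell
#!/bin/sh
# Sketch of the proposed lookup order; not an existing tool.
JAVADIR="${JAVADIR:-/usr/share/java}"

# resolve_artifact GROUP_ID ARTIFACT_ID VERSION [EXT]
# Prints the first matching path and returns 0, or returns 1 if none exists.
resolve_artifact() {
    group_id=$1
    artifact_id=$2
    version=$3
    ext=${4:-jar}

    # Try the versioned location first, then the unversioned one.
    for candidate in \
        "$JAVADIR/$group_id/$version/$artifact_id.$ext" \
        "$JAVADIR/$group_id/$artifact_id.$ext"
    do
        # -e follows symlinks, so a dangling link counts as missing.
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

With this order, a package that ships org.bar/2.0/foo.jar wins for requests of version 2.0, while every other requested version silently gets the system-wide jar.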
Then, build-classpath is enhanced so that in addition to being able to say ‘build-classpath commons-logging’ you can also call ‘build-classpath commons-logging:commons-logging’. Identify every module by this convention.
The same goes for Provides: java(commons-logging:commons-logging). Fedora is already doing this as mvn(..), but is this Maven-specific?
Why do we need XML files with maps, fragments of XML files that need to be updated at install and uninstall time?
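A rough sketch of how such a key could be handled follows. It is not the real build-classpath script; `jar_for_key`, `classpath_for` and the `JAVADIR` override are hypothetical names, and keys with a third `:version` field are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical lookup helpers, not the real build-classpath.
JAVADIR="${JAVADIR:-/usr/share/java}"

# jar_for_key KEY
# KEY is a plain jar name ("commons-logging") or
# "groupId:artifactId" ("commons-logging:commons-logging").
jar_for_key() {
    key=$1
    case "$key" in
        *:*)
            group_id=${key%%:*}     # text before the first ':'
            artifact_id=${key#*:}   # text after the first ':'
            path="$JAVADIR/$group_id/$artifact_id.jar"
            ;;
        *)
            path="$JAVADIR/$key.jar"
            ;;
    esac
    [ -e "$path" ] || return 1
    printf '%s\n' "$path"
}

# classpath_for KEY...
# Joins the resolved jars with ':'; fails if any key cannot be resolved.
classpath_for() {
    cp=""
    for key in "$@"; do
        jar=$(jar_for_key "$key") || return 1
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}
```

Because the groupId entries are plain symlinks, both key styles end up at the same jar, so legacy packages and Maven-aware ones can share one installation.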
I discussed this with Fedora developers Alexander Kurtakov and Stanislav Ochotnicky and they mostly agreed with my concerns. They pointed me to Carlo de Wolf’s work on a similar solution, but using a standard maven repository layout.
Carlo’s solution does not touch Maven itself. It is implemented as a plugin loaded through a custom config file, which is used when you call the wrapper script fmvn (for Fedora-Maven) instead of mvn.
The whole solution as they described it has some extras, like macros to symlink the Maven repository artifacts so that they can also be found as artifacts in the JPP layout. I am not sure if we need this. What I like about the solution on its own:
I have been playing with Carlo’s plugins and they look very promising. Fedora would need time to switch to a solution like this, but at SUSE we don’t have Maven in our stack, so we have nothing to lose, and at the same time we can help by serving as a test bed.
Not having the need to patch maven allows us to use a vanilla build of Maven for bootstrapping.
maven-bootstrap (upstream binary release, Provides: maven)
fmvn-bootstrap (binary jars built locally with maven, Provides: fmvn)
Note: If you have more than one package with the same capability and want to use it in (Build)Requires, you will need to set up “Prefer:” in prjconf.
We would now like to build Maven using fmvn. Here is where the circular dependencies start. We need Maven (provided by maven-bootstrap) and its dependencies, like plexus and a big bunch of Maven plugins.
So let’s say I need a bootstrap package for maven-compiler-plugin:
org.apache.maven.plugins:maven-compiler-plugin : using version 2.3.2
Which generates the following files:
If I build it, I get an rpm with the following layout:
/usr/share/java/maven-compiler-plugin.jar is just a symlink to the real jar. This layout is enough for fmvn to find the artifact and also for legacy packages to just use build-classpath. It would still be better to enhance build-classpath to also accept groupId:artifactId keys and return the path to the jar.
The -bin suffix allows the real package (built from source) to coexist in the same repository. The package with the -bin suffix also “Provides:” the package without the suffix, so it can be used by dependent packages. Actually, both packages “Provide:” java(org.apache.maven.plugins:maven-compiler-plugin), which is what a package that depends on it should “BuildRequire:”.
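As a rough illustration, the relevant header lines of such a bootstrap spec file might look like the fragment below. The package name with the -bin suffix, the version and the java(...) capability come from the text above; the rest of the spec is omitted and the exact tags are only my sketch.

```
Name:           maven-compiler-plugin-bin
Version:        2.3.2
# Lets packages that ask for the unsuffixed name use this one.
Provides:       maven-compiler-plugin = %{version}
# The capability dependent packages should BuildRequire, i.e.
# BuildRequires: java(org.apache.maven.plugins:maven-compiler-plugin)
Provides:       java(org.apache.maven.plugins:maven-compiler-plugin)
```

The real package built from source would carry the same java(...) Provides, so dependent packages would not need to change once it replaces the -bin one.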
Once Carlo’s resolver works with “mvn install” I will try to build a repository following this method.