Monday, April 4, 2011

What's in your <license>?

Not very long ago, and not very far away, a conversation opened up about checking the licenses used on dependencies, to avoid issues with pulling in dependencies with toxic licensing in relation to commercial work, or at least pulling them in by accident.

Whilst I suspect this is not exactly a 'new' issue, its certainly one which people haven't been particularly inclined to deal with in a systematic way in the past.

Now, as I'm now doing a lot of my work inside the bounding box of structured projects, and as I had indeed seen a dependency pull in with GPLish consequences, I thought I'd look-see if maven could actually be our friend here, and help deal with this systematic issue - heck its dealing with dependency pulls in a systematic way, so why not the license checks too?

Well as it turns out, inside the structure of a maven pom, there is a licenses element, which can contain one (well actually zero but lets not be picky) or more license strings, like this :-

       <name>The Apache Software License, Version 2.0</name>  

Great! I'll just go dig for the maven plugin to validate my project dependencies against their license definitions and bob will, indeed, become my father's brother.

Well, it turns out, not so much.

First of all, it didn't look like there was such a plugin (apart from Apache RAT Maven plugin, but that's specifically to Apache projects) - nevermind, I should be able put something together ... so...

.. I did, and here it is the maven-license-validator-plugin.

I'm not going to bang this drum too hard, as its actually very little code at all - very very little indeed. But it works. Which is nice.

Anyway, that being done, I've come to realise just how much of a fricking mess the whole 'license' element is in in maven.

Take a look at this :-

             <value>Apache License v2</value>  
             <value>Common Public License Version 1.0</value>  
             <value>The Apache Software License, Version 2.0</value>  
             <value>The Apache Software License, Version 2.0</value>  
             <value>Apache Software License - Version 2.0</value>  
             <value>Apache License, Version 2.0</value>  
             <value>Apache License Version 2.0</value>  
             <value>Apache License</value>  
             <value>Apache 2</value>  
             <value>CDDL 1.1</value>  
             <value>COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0</value>  
             <value>Common Development and Distribution License (CDDL) v1.0</value>  
             <value>Public Domain</value>  
             <value>Bouncy Castle Licence</value>  
             <value>BSD style</value>  
             <value>Google Web Toolkit Terms</value>  
             <value>ICU License</value>  
             <value>Revised BSD</value>  
             <value>org.slf4j:slf4j-api:jar:1.5.6</value> <!-- no license info - confirmed at MIT Licensed -->  

1. How many ways are there to write 'Apache License V2'?! I know I could write up funky regexp to support picking out the variants being used and reduce the number of elements, but <sigh/>, and in any case I'd be worried to then have an overly-relaxed check miss some strange wording

2. Sun/Oracle and not putting any license in their maven artifacts - now as I understand it getting those artifacts into Central was a fight ... but no license ... grr.

Anyway - I'm seeing vibes indicating that this is an area the gods of Maven are starting to focus on, but as I can see it, its going to be a long hard road towards cleaning this area of maven up and reaching that clear nirvana of easily being able to go 'yep', licenses are a-ok.



Post a Comment