Friday, April 3, 2015

Command-line utility for PDI Marketplace (using Spoon?!)

The PDI Marketplace is a great way to extend the capabilities of your PDI installation, using excellent contributions from the community, and some less-excellent ones from yours truly ;)  At present, the Marketplace is a core PDI plugin (meaning it is not in the engine itself, but is included in all PDI distributions, both CE and EE).

As a proper plugin, its classes and "API" methods are not so easily accessible from the outside world, although it is possible using reflection or other methods. The good news is that because it is a core plugin, we know the JAR(s) will always be in a constant location (provided you haven't uninstalled or deleted it manually). Thus we could always add the plugins/market folder to our library/class paths if we needed direct access to the API.

So to get a command-line utility for listing, installing, and uninstalling PDI plugins via the Marketplace API, I could've created a Java project, done some classpath black magic, and gotten it working.  But to be honest that doesn't sound very fun, and this blog is all about fun with PDI :)

Now to the fun part! Did you know that when you run spoon.sh or spoon.bat, you are really leveraging a capability called the Pentaho Application Launcher (on GitHub here)? It was designed to offer a better experience for setting up the JVM environment than the "java" executable. In the first place, it allowed us to add all JARs in a folder to the classpath before the java interpreter supported wildcards. It also allows us to build the classpath (as a parent URLClassLoader) dynamically, which avoids the max-length problem with setting the classpath in Windows, as we add a LOT of things to the PDI classpath.
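To make the "add every JAR in a folder" idea concrete, here is a minimal Python sketch of that kind of expansion. It is an illustration only, not the launcher's actual code (which is Java and builds a URLClassLoader); the function name jars_in_folders is hypothetical.

```python
from pathlib import Path


def jars_in_folders(folders):
    """Return every .jar file found directly inside the given folders.

    Hypothetical helper for illustration only. Folders are processed in the
    order given, and the JARs within each folder are sorted by path.
    """
    jars = []
    for folder in folders:
        directory = Path(folder)
        if not directory.is_dir():
            continue  # silently skip folders that do not exist
        jars.extend(sorted(str(p) for p in directory.glob("*.jar")))
    return jars
```

Because the result is a list rather than one long string, nothing here ever has to fit into a single command-line argument, which is the point of building the classpath dynamically.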

Most users happily run Spoon and trust the system to set everything up. However, the application launcher can be configured with a properties file (our defaults are in launcher/), and also from the command line, including the following switches:

-main: Allows you to set the main class (instead of Spoon.class) that is executed.
-lib: Allows you to add library folders to the environment.
-cp: Allows you to add classpath folders to the environment.

The -lib and -cp arguments are added to those in the properties file, so you don't have to worry about mucking with the existing setup.  In fact that's the point: I wanted the regular PDI environment but with my own entry point.  The kicker is that the folders have to be relative to the launcher/ folder. If you know where the libraries/JARs are (and don't mind figuring out the relative path from the absolute one), you can just add the relative paths.

My approach was the following: I wanted to use a small Groovy script to list, install, or uninstall plugins for PDI using the Marketplace API. I chose Groovy because I didn't want to set up a whole Java project with what would've been provided dependencies, and build and deploy a JAR with that one simple class.  Here's what the Groovy script looks like:

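import org.pentaho.di.core.*
// Assumed package locations for the Marketplace classes (Market, MarketEntries)
import org.pentaho.di.core.market.*
import org.pentaho.di.core.market.entry.*

entries = new MarketEntries()

command = args[0]
if(command == 'list') {
  // Assumes each entry exposes its friendly name as a "name" property
  entries.each { println it.name }
}
else {
  args[1..-1].each { arg ->
    entries.findAll { it.name == arg }.each {
      try {
        switch( command ) {
          case 'install': Market.install(it, null); break;
          case 'uninstall': Market.uninstall(it, null, false); break;
          default: println "Didn't recognize command: $command"
        }
      }
      catch(NullPointerException npe) {
        // eat it, probably trying to get a reference to a Spoon instance
      }
    }
  }
}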

Then I needed a way to call this script with Groovy, but with the existing PDI environment available.  The whole thing is pretty easy to do with Gradle, but that approach downloads its own PDI JARs, doesn't support plugins, etc.  Plus it's not as fun as using Spoon to run a Groovy script ;)

Another thing I wanted to do was to dynamically find my Groovy interpreter, as I need to add its libraries to the library list for the application launcher. That's easily done in bash:

`dirname $(which groovy)`

That finds the folder containing the executable; the library folder is one level up, in lib/.  However, I need this path as a relative path to launcher/, which proved to be more difficult. The most concise solution I found (for *nix) was to use Python and os.path.relpath:

python -c "import os.path; print os.path.relpath('`dirname $(which groovy)`'+'/../lib','launcher')"
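The one-liner above uses the Python 2 print statement. On a system where "python" is Python 3, an equivalent could look like the sketch below. The helper name groovy_lib_relative_to is hypothetical, and it assumes, as above, that Groovy's libraries live in lib/ one level above the folder containing the executable.

```python
import os.path


def groovy_lib_relative_to(groovy_executable, start_dir):
    """Return the path of Groovy's lib/ folder relative to start_dir.

    Assumes the layout <groovy home>/bin/groovy and <groovy home>/lib.
    """
    bin_dir = os.path.dirname(groovy_executable)        # like `dirname $(which groovy)`
    lib_dir = os.path.join(bin_dir, os.pardir, "lib")   # one level up, then lib/
    return os.path.relpath(lib_dir, start_dir)          # relpath normalizes the ".."
```

To mirror the one-liner, call it as groovy_lib_relative_to(shutil.which("groovy"), "launcher") from the PDI installation folder, after checking that shutil.which actually found an executable.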

Then it was a matter of adding that folder and ../plugins/market (the relative location of the PDI Marketplace JAR) to my library path, setting the main class as groovy.ui.GroovyMain, and passing as an argument the above Groovy script (which I called market.groovy, located in the same place as the bash script, called market; both are in my ~/bin folder).  Here's the resulting bash script:


./spoon.sh -lib $(python -c "import os.path; print os.path.relpath('`dirname $(which groovy)`'+'/../lib','launcher')"):../plugins/market -main groovy.ui.GroovyMain `dirname $0`/market.groovy "$@"
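For readers who find the nested substitutions hard to follow, the same argument list can be spelled out step by step. The Python sketch below only builds the list and runs nothing; the function name launcher_args and its parameters are hypothetical, and the ":" separator between -lib folders is taken from the bash script above.

```python
def launcher_args(groovy_lib, script_path, user_args,
                  market_dir="../plugins/market"):
    """Build the launcher arguments used by the bash script above.

    groovy_lib and market_dir must already be relative to launcher/.
    """
    return [
        "-lib", f"{groovy_lib}:{market_dir}",  # extra library folders
        "-main", "groovy.ui.GroovyMain",       # run Groovy instead of Spoon
        script_path,                           # the Groovy script to execute
        *user_args,                            # e.g. ["install", "Ivy PDI Git Steps"]
    ]
```

Keeping the arguments as a list (rather than one string) means plugin names containing spaces survive intact, which is what "$@" does in the bash version.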

Now I can go to my PDI installation and type "market list", and I get the following output (snippet):
PDI MySQL Plugin
PDI NuoDB Plugin
Apple Push Notification
Android Push Notification
Ivy PDI MongoDB Steps
Ivy PDI Git Steps
Vertica Bulk Loader

I can then put any of these friendly names into a command to install/uninstall:

market install "Ivy PDI Git Steps"
market uninstall "Android Push Notification"

The logger should output the status of the install/uninstall operation:
General - Installing plugin in folder: /Users/mburgess/pdi-ee-

There you have it!  Like I said, there are probably MUCH easier ways to get this same thing done, and perhaps someday I'll write a proper utility (or we'll add a CLI to the Marketplace itself). However, it was much more fun to call Spoon to get a headless PDI and Groovy to install a Marketplace plugin.


(Note: I don't think this works with pan.sh or kitchen.sh because they already use spoon.sh and set the main class to Pan or Kitchen using the same technique)