Tuesday, December 4, 2012

GroovyConsoleSpoonPlugin with JSR-223 Support

I recently gave a presentation of my GroovyConsoleSpoonPlugin (see earlier posts) to the Pentaho crew, and I got a lot of great feedback on it. Specifically, Pentaho Architect Nick Baker suggested it would be good to have the Groovy-ized API available as a step, so we could leverage it during transformation execution. For the presentation I had changed the Kettle code so that RowListeners and TransListeners could be added, which let the Groovy Console plugin interact with a running transformation. His suggestion makes a lot of sense, though, because it lets me keep the plugin a proper plugin, with no changes to Kettle code needed.

I thought about creating a Groovy scripting step and adding my staging code to make scripting easier. However, that involves a lot of setup and boilerplate code that already exists in other steps (such as the JavaScript and Script steps). I may still do that, in order to leverage the full power of the Groovy language, but in the meantime it occurred to me that I could just create a script engine wrapper (using the javax.script.ScriptEngine interface, etc.) and use the experimental Script step as the vehicle.

So to that end I added GroovySpoonScriptEngine and GroovySpoonScriptEngineFactory classes, which wrap the existing plugin code inside a JSR-223 compliant scripting engine.  Then the Script step can execute Groovy code (with my staging code already injected) during a transformation.
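To make the wrapper idea concrete, here is a minimal Java sketch of a JSR-223 engine/factory pair. It is not the plugin's actual code: the class names StagingSketchEngine and StagingSketchEngineFactory, the "stagingLoaded" variable, and the toy "language" (a script is just a variable name whose value is returned) are all assumptions made for illustration. A real wrapper would inject its staging objects the same way and then hand the script to Groovy. The demo registers the factory programmatically; the standard JSR-223 discovery route is a META-INF/services/javax.script.ScriptEngineFactory entry on the classpath.

```java
import javax.script.AbstractScriptEngine;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;
import java.io.IOException;
import java.io.Reader;
import java.util.List;

// Hypothetical stand-in for a wrapping engine: it injects "staging"
// variables into the bindings before evaluating the script.
class StagingSketchEngine extends AbstractScriptEngine {
    private final ScriptEngineFactory factory;

    StagingSketchEngine(ScriptEngineFactory factory) { this.factory = factory; }

    @Override
    public Object eval(String script, ScriptContext ctx) throws ScriptException {
        Bindings vars = ctx.getBindings(ScriptContext.ENGINE_SCOPE);
        // Staging step (assumption): a real wrapper would expose its helpers here.
        vars.putIfAbsent("stagingLoaded", Boolean.TRUE);
        // Toy "language": the script is a variable name; return its value.
        String name = script.trim();
        if (name.isEmpty() || !vars.containsKey(name)) {
            throw new ScriptException("Unknown variable: '" + name + "'");
        }
        return vars.get(name);
    }

    @Override
    public Object eval(Reader reader, ScriptContext ctx) throws ScriptException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        try {
            for (int n = reader.read(buf); n != -1; n = reader.read(buf)) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new ScriptException(e);
        }
        return eval(text.toString(), ctx);
    }

    @Override public Bindings createBindings() { return new SimpleBindings(); }
    @Override public ScriptEngineFactory getFactory() { return factory; }
}

// Factory that advertises the engine under the short name "groovyspoon".
class StagingSketchEngineFactory implements ScriptEngineFactory {
    @Override public String getEngineName() { return "StagingSketch"; }
    @Override public String getEngineVersion() { return "0.1"; }
    @Override public List<String> getExtensions() { return List.of("groovyspoon"); }
    @Override public List<String> getMimeTypes() { return List.of(); }
    @Override public List<String> getNames() { return List.of("groovyspoon"); }
    @Override public String getLanguageName() { return "toy"; }
    @Override public String getLanguageVersion() { return "0.1"; }

    @Override
    public Object getParameter(String key) {
        if (ScriptEngine.NAME.equals(key)) return "groovyspoon";
        if (ScriptEngine.ENGINE.equals(key)) return getEngineName();
        if (ScriptEngine.ENGINE_VERSION.equals(key)) return getEngineVersion();
        if (ScriptEngine.LANGUAGE.equals(key)) return getLanguageName();
        if (ScriptEngine.LANGUAGE_VERSION.equals(key)) return getLanguageVersion();
        return null;
    }

    @Override
    public String getMethodCallSyntax(String obj, String method, String... args) {
        return obj + "." + method + "(" + String.join(", ", args) + ")";
    }

    @Override public String getOutputStatement(String toDisplay) { return toDisplay; }
    @Override public String getProgram(String... statements) { return String.join("\n", statements); }
    @Override public ScriptEngine getScriptEngine() { return new StagingSketchEngine(this); }
}

class StagingSketchDemo {
    public static void main(String[] args) throws ScriptException {
        ScriptEngineManager manager = new ScriptEngineManager();
        // Demo shortcut: register by hand instead of via META-INF/services.
        manager.registerEngineName("groovyspoon", new StagingSketchEngineFactory());
        ScriptEngine engine = manager.getEngineByName("groovyspoon");
        engine.put("fld1", "hello");                       // a row field, as a host might supply it
        System.out.println(engine.eval("fld1"));           // value supplied by the host
        System.out.println(engine.eval("stagingLoaded"));  // value injected by the wrapper
    }
}
```

The point of the design is that the host only ever sees the javax.script interfaces, so anything the wrapper injects before evaluation is available to every script without the host knowing about it.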

To get Spoon to recognize my scripting engine, I had to add my plugin directory to the list of libraries when launching Spoon.  This is done in launcher/launcher.properties like so:



After starting Spoon, I created a transformation with a Generate Rows (for one row) with a field called "fld1" with a value of "hello".  Then I wired it to a Script step, which is wired to a Dummy step.

The key to selecting a scripting engine in the Script step is to name the step with an extension that corresponds to the engine name.  So for the Groovy scripting engine the step name ends in ".groovy", and for my GroovySpoonScriptEngine the step name must end in ".groovyspoon".  My sample transformation looks like this:

Inside the Script step I put some very simple code showing the use of the row data in GStrings, output variables, local variables (which are not returned to the step processor), and the Groovy-ized API available through the plugin:



The Script step does have a bit of a limitation (due to its generic nature): the output fields must be specified in the Fields table at the bottom of the dialog. This is in contrast, for example, to the User Defined Java Class (UDJC) step, which can add values to the row on-the-fly (see previous posts).
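One way to picture why the fields must be declared: a generic host can only copy out the variables it has been told about, so undeclared locals never reach the output row. The Java sketch below illustrates that idea with plain javax.script bindings. It is an assumption about the general pattern, not the Script step's actual implementation, and the class name DeclaredOutputFields and the variables "greeting" and "scratch" are made up for the example.

```java
import javax.script.Bindings;
import javax.script.SimpleBindings;
import java.util.Arrays;
import java.util.List;

class DeclaredOutputFields {
    // Copies only the declared variables out of the script's bindings, in
    // declaration order. A declared field the script never set becomes null.
    static Object[] collectOutputs(Bindings afterScript, List<String> declaredFields) {
        Object[] row = new Object[declaredFields.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = afterScript.get(declaredFields.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // Pretend a script has just run and left these variables behind.
        Bindings vars = new SimpleBindings();
        vars.put("fld1", "hello");            // incoming row field
        vars.put("greeting", "hello world");  // declared output (hypothetical)
        vars.put("scratch", 42);              // local variable, never declared

        Object[] row = collectOutputs(vars, List.of("greeting"));
        System.out.println(Arrays.toString(row)); // prints [hello world]
    }
}
```

Here "scratch" is dropped simply because nothing asked for it, which matches the behavior described above for local variables.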

Previewing the Dummy step gives the following result:


So now the plugin supports the Groovy-ized API in the Groovy Console, the command line, and the Script step. With these three entry points, hopefully interacting with PDI will become a lot easier and even more powerful!

I hope to get time to document the Groovy-ized API on the project's wiki. In the meantime, take a look in the code at staging.groovy, which contains most of the methods, properties, etc. available. In addition, of course, the full Kettle API and all Groovy language features are available, so the sky's the limit when it comes to what can be done :)

The project is on GitHub under GroovyConsoleSpoonPlugin. As always, I welcome all comments, questions, and suggestions.  Cheers!