Wednesday, November 7, 2012

UDJC to Verify Transformations

The "Verify Transformation" capability of Pentaho Data Integration (aka Kettle) is very handy for spotting issues with your transformations before running them.  As a sanity check or as an auditing feature, I thought it would be nice to verify all transformations in the repository using a Kettle transformation.

To do this, I wrote a User Defined Java Class (UDJC) step to call the TransMeta.checkSteps() API method, just as the Spoon GUI's "Verify transformation" button does. However, instead of displaying a dialog, I put the same information out onto the stream.
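To make the idea concrete, here is a minimal, self-contained sketch of the pattern: hand the verification call an empty list, let it fill the list with remarks, then turn each remark into one output row. It does not use the Kettle libraries. `FakeTransMeta` and `Remark` are hypothetical stand-ins for `TransMeta` and its check-result objects, and the row layout is an assumption. The real `checkSteps()` takes additional parameters that vary between PDI versions, so treat this as an illustration of the flow rather than the actual UDJC code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VerifySketch {
    // Hypothetical stand-in for one verification remark (not the Kettle class).
    static class Remark {
        final String stepName;
        final String type;
        final String text;

        Remark(String stepName, String type, String text) {
            this.stepName = stepName;
            this.type = type;
            this.text = text;
        }
    }

    // Hypothetical stand-in for TransMeta. It only mimics the
    // "fill a caller-supplied list with remarks" shape of checkSteps().
    static class FakeTransMeta {
        final String name;
        final List<String> stepNames;

        FakeTransMeta(String name, List<String> stepNames) {
            this.name = name;
            this.stepNames = stepNames;
        }

        void checkSteps(List<Remark> remarks) {
            if (stepNames.isEmpty()) {
                remarks.add(new Remark("", "ERROR", "Transformation has no steps"));
                return;
            }
            for (String step : stepNames) {
                remarks.add(new Remark(step, "OK", "Step looks fine"));
            }
        }
    }

    // Builds one row per remark: {transformation, step, result type, remark text}.
    static List<String[]> toRows(FakeTransMeta meta) {
        List<Remark> remarks = new ArrayList<>();
        meta.checkSteps(remarks);

        List<String[]> rows = new ArrayList<>();
        for (Remark r : remarks) {
            rows.add(new String[] { meta.name, r.stepName, r.type, r.text });
        }
        return rows;
    }

    public static void main(String[] args) {
        FakeTransMeta meta =
            new FakeTransMeta("load_sales", Arrays.asList("Table input", "Table output"));
        for (String[] row : toRows(meta)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In the real step, each row would be written to the stream instead of printed.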

To get access to the repository's transformations, I started with a Get Repository Names step.  Since Kettle jobs don't (currently) have the same verification functionality, I set the filter to return only transformations.

NOTE: The UDJC step will accept all incoming rows (Jobs, Transformations, etc.) but will only process those whose "meta" object is of type TransMeta.
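The type check described in the note can be sketched roughly as follows. The `TransMeta` and `JobMeta` classes here are hypothetical stand-ins declared inside the example so it runs without the Kettle jars. In the real step the object would come from the "meta" field on each incoming row, and the skipping logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetaFilterSketch {
    // Hypothetical stand-ins for Kettle's TransMeta and JobMeta classes.
    static class TransMeta {
        final String name;
        TransMeta(String name) { this.name = name; }
    }

    static class JobMeta {
        final String name;
        JobMeta(String name) { this.name = name; }
    }

    // Returns the names of the objects that are TransMeta instances.
    // Anything else (jobs, nulls, other types) is accepted but skipped.
    static List<String> verifiableNames(List<Object> metaObjects) {
        List<String> names = new ArrayList<>();
        for (Object meta : metaObjects) {
            if (meta instanceof TransMeta) {
                names.add(((TransMeta) meta).name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Object> rows = Arrays.<Object>asList(
            new TransMeta("load_sales"),
            new JobMeta("nightly_job"),
            new TransMeta("clean_customers"));
        System.out.println(verifiableNames(rows));
    }
}
```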

To get access to the TransMeta object for each transformation, I used the Auto Documentation Output step, with the output type set to METADATA. This puts (among other things) a field on the stream containing the TransMeta object associated with each transformation.

I wired the autodoc step to my UDJC step, which outputs the same fields that are available in the GUI version.

If this is determined to be a useful step, I may turn it into a proper step plugin, to remove the need for the "show_successful" field, and to instead provide a dialog box to let the user choose which fields (and their names) to put out on the stream.  UDJC steps are just an easy way for me to get basic functionality out there and try to get early feedback.
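For readers wondering what a "show_successful" flag might do, here is a small sketch. It assumes the field behaves like the "show successful results" option in Spoon's verification dialog: when false, remarks with an OK result are dropped. That reading is my assumption, and the `{type, text}` array layout and the "OK" label are hypothetical, chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShowSuccessfulSketch {
    // Each remark is a hypothetical {resultType, text} pair.
    // Keeps everything when showSuccessful is true; otherwise drops "OK" remarks.
    static List<String[]> filter(List<String[]> remarks, boolean showSuccessful) {
        List<String[]> kept = new ArrayList<>();
        for (String[] remark : remarks) {
            boolean ok = "OK".equals(remark[0]);
            if (showSuccessful || !ok) {
                kept.add(remark);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> remarks = Arrays.asList(
            new String[] { "OK", "Step is receiving input" },
            new String[] { "ERROR", "No output fields defined" },
            new String[] { "WARNING", "Step is not used" });

        System.out.println("all remarks: " + filter(remarks, true).size());
        System.out.println("problems only: " + filter(remarks, false).size());
    }
}
```

A proper step plugin could replace the per-row flag with a checkbox in the step dialog, which is the change suggested above.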

I created a dummy repository on GitHub so I could have a Downloads area where I will start storing sample transformations that contain UDJC steps, etc.  This is slightly easier than putting the UDJC code on Gist or Pastebin, especially in this case since there are multiple steps involved.  The direct link to the above transformation is here.

If you give this a try or otherwise have comments, I'm eager to hear them :)