Monday, November 10, 2014

ZooKeeper Input and Output steps in PDI

While working with Apache Drill and PDI (see previous posts), I found myself needing to read and write values to and from Drill's ZooKeeper instance. Since ZooKeeper can be (and is) used for many other applications besides Drill, I thought I'd write some simple ZooKeeper steps for PDI, namely ZooKeeper Input and ZooKeeper Output. Also I thought it would be nice to be able to view and edit values in my ZooKeeper instance while designing transformations, so I integrated a cool UI called Zooviewer into Spoon.

The ZooKeeper Input step takes paths to ZooKeeper values, and fills those values into the field names you select in the dialog:




The screenshot above shows the ZooKeeper Input dialog. Note that all PDI types are supported, as long as their values can be represented as a byte array.
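To make the byte-array point concrete: ZooKeeper stores each node's data as raw bytes, so reading into a typed field means decoding those bytes. Here is a rough Python sketch of that idea. It is not the plugin's code; it assumes the values were stored as UTF-8 text, and the `DECODERS` table and `decode_value` function are names I made up for illustration.

```python
from decimal import Decimal

# Hypothetical decoders keyed by PDI type name.
# Assumption: the node data is the UTF-8 text form of the value.
DECODERS = {
    "String": lambda text: text,
    "Integer": int,
    "Number": float,
    "BigNumber": Decimal,
    "Boolean": lambda text: text.strip().lower() in ("true", "y", "1"),
}


def decode_value(raw, type_name):
    """Turn the raw bytes read from a node into a value of the requested type."""
    text = raw.decode("utf-8")
    return DECODERS[type_name](text)
```

For example, `decode_value(b"42", "Integer")` gives the integer 42 under these assumptions. A value written in some other encoding (say, a binary-serialized long) would need a different decoder.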

The ZooKeeper Output step also takes field names and paths, and will recursively create the paths if you check the "Create path(s)?" checkbox.
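Recursive creation matters because ZooKeeper will not create a node whose parent does not exist yet, so every ancestor has to be created first, shallowest to deepest. The sketch below shows the idea in Python. It is only an illustration, not the plugin's implementation: a plain dict stands in for the ZooKeeper instance, and `paths_to_create` and `create_recursively` are hypothetical helper names.

```python
def paths_to_create(path):
    """List every ancestor of `path`, shallowest first, ending with `path` itself."""
    result = []
    current = ""
    for part in path.split("/"):
        if not part:
            continue  # skip the empty piece before the leading slash
        current = current + "/" + part
        result.append(current)
    return result


def create_recursively(store, path, value):
    """Create any missing parents with empty data, then store the value at `path`.

    `store` is a dict standing in for ZooKeeper (assumption for illustration).
    """
    for node in paths_to_create(path):
        store.setdefault(node, b"")
    store[path] = value
    return store
```

With a real client, the `setdefault` call would become a "create this node if it does not exist" request against the server.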

As of version 1.1 of the plugin (now in the PDI Marketplace), the ZooKeeper Output step also supports variable and field substitution for the Path values, in some pretty cool ways:

1) In the Path column of the Output Fields table, you can use a variable/parameter, such as ${pathParam}

2) You can also use the field-substitution notation, which injects values from the given field as the path value(s). This is a little-known feature of PDI and, as far as I know, has only been implemented in the Mongo plugin and (now) the ZooKeeper Output step. To use it, suppose you have a bunch of key/value pairs on the PDI stream going into the ZooKeeper Output step, where the key is the path where you want the value stored in ZooKeeper. Then you'd set the Path value in the ZooKeeper Output dialog to ?{key}. Notice the question mark instead of the dollar sign; this indicates a field substitution rather than a variable substitution.

3) The ZooKeeper Output step will perform another variable substitution on the field value, in case your field values contain variables. Below is a screenshot showing this use case:



Notice that the ZooKeeper Output dialog uses the ?{key} notation to get its path values from the key field in the stream. The key field values include a variable, ${pathParam}, which is filled in at runtime (see the "Execute a transformation" dialog under Parameters). Running this will create three paths and store three values.
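The three substitution passes described above can be sketched in a few lines of Python. This is my own illustration of the order of operations, not the step's actual code. The `resolve_path` and `substitute` names, the sample row, and the `/test` parameter value are all hypothetical, and leaving unknown names untouched is an assumption of the sketch.

```python
import re

FIELD_PATTERN = re.compile(r"\?\{([^}]+)\}")     # matches ?{fieldName}
VARIABLE_PATTERN = re.compile(r"\$\{([^}]+)\}")  # matches ${variableName}


def substitute(pattern, text, values):
    """Replace each match with its value; unknown names are left as written."""
    return pattern.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def resolve_path(template, row, variables):
    """Resolve the Path setting for one row of the stream."""
    path = substitute(VARIABLE_PATTERN, template, variables)  # 1) variables in the Path column
    path = substitute(FIELD_PATTERN, path, row)               # 2) ?{field} taken from the row
    return substitute(VARIABLE_PATTERN, path, variables)      # 3) variables inside the field value


# Hypothetical row and parameter, loosely modeled on the example above
row = {"key": "${pathParam}/a", "value": "1"}
resolved = resolve_path("?{key}", row, {"pathParam": "/test"})
```

Here `resolved` ends up as `/test/a`: the ?{key} marker is replaced by the row's key, and the ${pathParam} inside that key is then filled in by the third pass.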

To view and edit the values in your ZooKeeper instance from PDI, select Manage ZooKeeper from the Tools drop-down menu. This will bring up a view window and an edit window, where you can create and delete child nodes, change values, etc. Here is a view of my ZooKeeper instance after running the test transformations I showed above:



I'm interested to see if folks find this plugin useful and if so, how they are using it. As always, I welcome all questions, comments, suggestions, and contributions.

Cheers!