On March 22, 2019 we released Apache NiFi MiNiFi C++ 0.6.0. This release brings more features than I can cover in a single post.
There is one in particular I've been using quite a bit for rapid prototyping: Python processors. We have a simple example in our code base called SentimentAnalysis. This processor performs a sentiment analysis on incoming text from the content of a flow file. It provides a score from 0.0 to 1.0 that indicates if the text is neutral, positive, or negative. This processor requires nltk and VaderSentiment to be installed via pip.
With the introduction of Python processors I hope that developers can quickly create and deploy features written in Python to MiNiFi C++. The most important aspect is that Python processors are easy to add, remove, and run. The default configuration defines a subdirectory, minifi-python. Simply place your Python processors into this directory. The file name becomes the processor name, which you reference as the class in your flow.
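To make the naming rule concrete, here is a minimal sketch of how a script's file name relates to the class name used in a flow. The helper processor_class_name is hypothetical and exists only for illustration; the namespace prefix is copied from the example flow's class name, and I am assuming the same prefix applies to every script in the directory.

```python
from pathlib import Path

# Assumption: the prefix matches the class name used in the example flow,
# org.apache.nifi.minifi.processors.SentimentAnalysis.
PROCESSOR_NAMESPACE = "org.apache.nifi.minifi.processors"


def processor_class_name(script_path):
    """Return the flow class name for a Python processor script (illustrative only)."""
    # Path.stem drops the directory and the .py extension.
    return f"{PROCESSOR_NAMESPACE}.{Path(script_path).stem}"


if __name__ == "__main__":
    print(processor_class_name("minifi-python/SentimentAnalysis.py"))
```

This is not how MiNiFi resolves names internally; it only shows the file-name-to-class-name relationship described above.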
To demonstrate this I've written a short flow (at the end of this post) that uses our sentiment analyzer. The flow simply pulls data from a directory on my file system using GetFile. These flow files are then sent through the sentiment analyzer, which is written in Python, and then logged with LogAttribute.
My test files are short. Here is the example output from LogAttribute with a negatively scored payload.
As you can see, the sentiment analysis provides different scores for a more positive payload. VaderSentiment with its default set does a good job at scoring text. I encourage you to read more about nltk and its sentiment analyzers.
What is required?
Python processors simply require that you implement the describe, onInitialize, and onTrigger functions. The describe function allows you to provide a description of your processor to the framework. The onInitialize function allows you to specify whether your processor supports dynamic properties and the properties that make up your processor. The onTrigger function is called when the processor is scheduled to run and is where the work on flow files happens. The YAML will configure your processor just as it would a C++ processor or a Java processor implemented with our JNI capabilities.
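A minimal sketch of those three functions might look like the following. The method names on the processor, session, and flow file objects (setDescription, setSupportsDynamicProperties, get, addAttribute, transfer) and the REL_SUCCESS constant follow my recollection of the SentimentAnalysis example, so treat them as assumptions and check them against the code base. The Fake classes are hypothetical stand-ins that let the sketch run outside MiNiFi.

```python
# Assumption: the MiNiFi runtime supplies REL_SUCCESS to the script.
# It is defined here only so the sketch runs standalone; drop it when deploying.
REL_SUCCESS = "success"


def describe(processor):
    # Tell the framework what this processor does.
    processor.setDescription("Tags each incoming flow file with an attribute")


def onInitialize(processor):
    # Declare that dynamic properties are supported.
    processor.setSupportsDynamicProperties()


def onTrigger(context, session):
    # Pull the next flow file, if any, tag it, and route it to success.
    flow_file = session.get()
    if flow_file is None:
        return
    flow_file.addAttribute("tagged.by", "ExampleTagger")
    session.transfer(flow_file, REL_SUCCESS)


# Hypothetical stand-ins for the objects MiNiFi would pass in.
class FakeProcessor:
    def __init__(self):
        self.description = None
        self.dynamic = False

    def setDescription(self, text):
        self.description = text

    def setSupportsDynamicProperties(self):
        self.dynamic = True


class FakeFlowFile:
    def __init__(self):
        self.attributes = {}

    def addAttribute(self, key, value):
        self.attributes[key] = value


class FakeSession:
    def __init__(self, queued):
        self.queued = list(queued)
        self.transferred = []

    def get(self):
        return self.queued.pop(0) if self.queued else None

    def transfer(self, flow_file, relationship):
        self.transferred.append((flow_file, relationship))


if __name__ == "__main__":
    session = FakeSession([FakeFlowFile()])
    onTrigger(None, session)
    print(session.transferred[0][0].attributes)
```

Only the first three functions belong in a real processor file; the stand-ins exist so the calling pattern can be exercised without a running agent.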
What does it all mean?
It's a little unfair to couch this as a rapid prototyping feature. I think many will use it as such; however, these processors function in the same way C++ or Java processors do. They’re simply function calls into bound functions. There will be added cost, but it’s likely not beyond that of your I/O. As a result you should be able to use Python processors in your everyday flows.
The example I provided is short but demonstrates how you can access your Python processors. Note that if a dependency for your Python script does not exist, we will not allow that processor to be loaded. If you look at the example flow below, the class name is defined as org.apache.nifi.minifi.processors.SentimentAnalysis. In future releases we hope to improve namespace references via the flow, along with how we isolate and reference Python processors.
Feel free to give it a try, and if you have any issues let me know. I encourage you to use one of our binary releases to get started.