PowerUp
"How to create TOS components" Tutorial : Part 10

Component Creation - Part 10

Today we will do something different : we will modify an existing component.

Note : I am actually modifying the component as I write this tutorial lesson, so you will see the whole trial-and-error process.
You need to open the files I mention and read the code, otherwise you will not be able to follow my explanations. I am not copying the whole file contents here because that would take too many lines.

On the Talend community forum (Talendforge) someone asked for a new feature in the standard component tFileInputJson; you can see the post here.

This component reads a JSON file (or a URL that returns a JSON structure) and sends the result to the output connections.
I found it a bit strange that the component uses the script engine (JavaScript), but I am sure there is a good reason for that. In general we don't need to analyze the whole component in detail to update it as requested.
The post author asked to add a "die on error" option, so that, if unchecked, a job could catch exceptions and keep running.
If we focus on that specific task, our job is going to be much simpler.

We are dealing with an input component. Most likely it opens a file or stream in the begin section, starts the loop over the lines at the end of that same section, extracts the fields in the main section (or at the end of the begin one) and finally closes any file still open in the end section.
This means the I/O operations that can fail normally sit in the begin and end sections, and we will need to protect them with try / catch. On top of that, some JSON processing can also fail in the begin section and in the main one (if it exists).

Getting started

First we need to get organized in order to modify and test the component, so we will remove it from the standard components folder and place it in our custom components one (we will also create a backup copy of the original, just in case we mess it up badly :) ).
So, browse your file system to your Talend install dir, look into the plugins/org.talend.designer.components.localprovider_xxx folder and move the tFileInputJson folder to your custom one, then create a backup copy of it.

Checking the content of the component folder, we immediately notice that there is no main section. That's ok, it just means that all the record output processing is handled either in the begin section (most likely) or in the end one.
In fact, if we check the end section we see it is the usual closure of the record loop, with nb_line posted to the global map. There is not even a file close, which probably indicates the file is closed in the begin section, so its content is probably fully loaded in memory (argh!) before the file is closed. We will check that.

Let's start by adding the "die on error" parameter in the XML descriptor. Typically this is done by adding this :



...
<PARAMETER
NAME="DIE_ON_ERROR"
FIELD="CHECK"
NUM_ROW="100"
>
<DEFAULT>false</DEFAULT>
</PARAMETER>
</PARAMETERS>




This is completed by adding the line DIE_ON_ERROR.NAME=Die on error in the messages.properties file. You will notice there are many versions of this file, one per language translation; we will copy the same string for all the languages from another component that has the die on error option (e.g. tMysqlOutput).
Then we are lucky because everything is handled in the begin section, so that's the only one where we need to read the new parameter (I know, I am lazy :) ).

To do that, add this line in the template part of the begin file, after String decimalSeparator =...

String dieOnError = ElementParameterParser.getValue(node, "__DIE_ON_ERROR__");



The idea, now, is to
  • Locate all the statements that can fail
  • Wrap them in try / catch blocks
  • Throw the exception in the catch block if the dieOnError is true

When reverse engineering complex components (luckily this is not really one of them) it is easy to get lost in the code, so it helps to know upfront what to look for, and especially what we can skip.
Runtime exceptions are thrown only by the generated Java output code, so there is no point in looking at the JET scriptlet code (that greatly reduces the number of lines we need to read). We will also probably not need to secure the script engine initialization and the like: if it fails, it fails systematically, and we don't want to "hide" that kind of failure.
Locate this part of code in the begin file :


...
<%if(!isUseUrl){//read from a file%>
java.io.FileReader fr_<%=cid%> = new java.io.FileReader(<%=filename %>);
<%}else{ //read from internet%>
java.net.URL url_<%=cid %> = new java.net.URL(<%=urlpath %>);
java.net.URLConnection urlConn_<%=cid %> = url_<%=cid %>.openConnection();
java.io.InputStreamReader fr_<%=cid %> = new java.io.InputStreamReader(urlConn_<%=cid %>.getInputStream());
<%}%>
java.lang.Object jsonText_<%=cid%> = org.json.simple.JSONValue.parse(fr_<%=cid%>);
jsEngine_<%=cid%>.eval("var obj="+jsonText_<%=cid%>.toString());

java.util.List<org.json.simple.JSONArray> JSONResultList_<%=cid%> = new java.util.ArrayList<org.json.simple.JSONArray>();




We know that opening a file or a URL connection can throw exceptions, so let's start by adding some code there :


...
<%if(!isUseUrl){//read from a file%>
java.io.FileReader fr_<%=cid%>=null;
try {
fr_<%=cid%> = new java.io.FileReader(<%=filename %>);
}
catch (java.lang.Exception e) {
<% if (("true").equals(dieOnError)) {%>throw(e);<% } %>
}
<%}else{ //read from internet%>
java.net.URL url_<%=cid %> = null;
java.net.URLConnection urlConn_<%=cid %> = null;
java.io.InputStreamReader fr_<%=cid %> = null;
try {
url_<%=cid %> = new java.net.URL(<%=urlpath %>);
urlConn_<%=cid %> = url_<%=cid %>.openConnection();
fr_<%=cid %> = new java.io.InputStreamReader(urlConn_<%=cid %>.getInputStream());
}
catch (java.lang.Exception e) {
<% if (("true").equals(dieOnError)) {%>throw(e);<% } %>
}
<%}%>



Ok, that one was easy I guess: for the "open" part of our source file or URL, we simply catch all exceptions and throw them back if dieOnError is set.
The lines just after parse the file into a Java Object, then the jsEngine is asked to evaluate the stringified version of that object (it's a way to pass objects from the Java host program into the embedded script engine)... I guess that process could be improved a bit, but this is not the topic today, and I am sure there are good reasons why the solution was implemented that way.
java.lang.Object jsonText_<%=cid%> = org.json.simple.JSONValue.parse(fr_<%=cid%>);
jsEngine_<%=cid%>.eval("var obj="+jsonText_<%=cid%>.toString());

We can also secure them by catching exceptions :


...
java.lang.Object jsonText_<%=cid%> = null;
try {
jsonText_<%=cid%> = org.json.simple.JSONValue.parse(fr_<%=cid%>);
jsEngine_<%=cid%>.eval("var obj="+jsonText_<%=cid%>.toString());
} catch (java.lang.Exception e) {
<% if (("true").equals(dieOnError)) {%>throw(e);<% } %>
}




At this point, if the file is parsed successfully, we have good hopes everything else will just work. In the case of a failure, however, the various variables we "protected" would not hold a usable value, and another error would be generated when trying to use them.
That would probably happen around here :

String resultObj_<%=column%>_<%=cid%> = invocableEngine_<%=cid%>.invokeFunction("jsonPath", <%=query%>).toString();


If this line fails, we'd better skip all the remaining part, because there is no way we can produce records. Luckily there is already some error checking afterwards, based on the variable int recordMaxSize_<%=cid%> = -1; : if it is still set to -1, no lines are processed.
Also, resultObj_<%=column%>_<%=cid%> can be set to "false" if an error occurs, that's probably our best bet for minimal intervention :)


...
String resultObj_<%=column%>_<%=cid%>;
try {
resultObj_<%=column%>_<%=cid%> = invocableEngine_<%=cid%>.invokeFunction("jsonPath", <%=query%>).toString();
} catch (java.lang.Exception e) {
resultObj_<%=column%>_<%=cid%> = "false";
<% if (("true").equals(dieOnError)) {%>throw(e);<% } %>
}
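
The fallback to "false" can also be checked in isolation with a plain Java sketch. Everything here is an assumption made for the example: the class SentinelSketch, the method evaluate, and the Callable that stands in for the invokeFunction("jsonPath", ...) call. It only shows that the variable always ends up assigned, either with the real result or with the "false" marker.

```java
import java.util.concurrent.Callable;

public class SentinelSketch {

    // Hypothetical stand-in for the script engine lookup: any failure is
    // replaced by the "false" marker, unless dieOnError asks for a rethrow.
    static String evaluate(Callable<Object> lookup, boolean dieOnError) throws Exception {
        String result;
        try {
            result = lookup.call().toString();
        } catch (Exception e) {
            result = "false"; // marker the later code can test for
            if (dieOnError) {
                throw e;
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // A lookup returning null makes toString() fail with a NullPointerException,
        // which is caught and turned into the marker value.
        System.out.println(evaluate(() -> null, false));
        // A working lookup simply returns its text.
        System.out.println(evaluate(() -> "[1,2,3]", false));
    }
}
```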




And that should be it, we now need to test it and see how many mistakes we (ok, I :) ) made.

My version of the component can be downloaded here
There are a few other things that should be added now (besides the testing) :
1) Some output in the console when exceptions are caught and not thrown
2) Copy the "die on error" translation in all the language files
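
For the first point, a possible shape of the catch block is sketched below in plain Java. It is not taken from the component: the class LogOnCatchSketch, the methods describe and openOrLog, the message format and the file path are all assumptions for the example, and the component id is passed in the way the template would use its cid value.

```java
import java.io.FileReader;
import java.io.Reader;

public class LogOnCatchSketch {

    // Hypothetical message format: component id, exception type, message.
    static String describe(String cid, Exception e) {
        return cid + " - " + e.getClass().getSimpleName() + ": " + e.getMessage();
    }

    // Opens the file; on failure either rethrows or reports on the console.
    static Reader openOrLog(String cid, String path, boolean dieOnError) throws Exception {
        try {
            return new FileReader(path);
        } catch (Exception e) {
            if (dieOnError) {
                throw e;
            }
            System.err.println(describe(cid, e)); // shows up in the job console
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        openOrLog("tFileInputJson_1", "missing-dir-for-sketch/input.json", false);
    }
}
```

In the template, the println line would go in the part of the catch block that is generated when "die on error" is unchecked.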

