"How to create TOS components" Tutorial : Part 7

Component Creation - Part 7

In the previous tutorial lesson we introduced the basic operations on connectors and implemented a simple exercise in which the component generated an output.
However, we assumed that all the fields were Strings and we simply sent the same content to each one of them.
When designing a component that "generates" its records it is normal to know the schema upfront, but in most cases (e.g. reading a table from a database, or a file) we need to "detect" it.

A real-life component

Today we will create our first "useful" component: it will be simple, but it will actually accomplish something.
The goal is to read a comma-separated text file from the filesystem and send its content to the output connections: a very basic tCSVInputTutorial component.
We will stick to a simple approach; however, you will be able to improve the component yourself if you like.

The first thing we want to do is create a new directory "tCSVInputTutorial" in your custom component folder, the same one where you placed tTutorial1.
Then copy the content of the tTutorial1 folder into the new one and rename all the files, substituting the old component name (tTutorial1) with the new one.
I normally do this each time I create a component, because I often recycle a great deal of code from one component to another; we will do the same here.

At a high level, our component will do the following:
In the begin section: open the text file and open a cycle over all the lines in the file.
In the main section: scan the output connections and the metadata, and output the csv line parsed with a StringTokenizer.
In the end section: close the cycle and close the text file.

Makes sense, no? So we have a plan; let's execute it, starting from our XML descriptor, where we can get rid of the existing parameters except the schema_type one.
Done? Now clean up the messages.properties file accordingly.
Which other parameters do we need? We certainly need the name of the file, plus we could add other parameters such as the field delimiter and a text enclosure character... but in order to keep it basic, for now, we will just add the filename; you will be able to experiment with any other addition once we are done.
You probably already guessed there is a parameter FIELD type "FILE", so let's add one:



....
<PARAMETER NAME="INPUTFILE" FIELD="FILE" NUM_ROW="3" REQUIRED="true"/>
....




That line obviously goes into your XML descriptor file, in the PARAMETERS section. Remember to add a nice label for INPUTFILE.NAME in the messages.properties file (and if you have no clue what I am talking about, you probably missed the previous 6 parts of this tutorial).
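For reference, the corresponding entry in messages.properties could be as simple as this (the label text itself is just a suggestion):


#label shown for our new parameter in the component's Basic settings view
INPUTFILE.NAME=Input file
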

Time to work on the java jet files. The begin part will basically do all the input file handling, so let's start from there.
The first thing will be to retrieve the filename parameter:
String filename = ElementParameterParser.getValue(node, "__INPUTFILE__");
You can get rid of all the other parameters (MYPAR1 etc.) and of the code that followed, because it's not relevant anymore.
So, the next thing will be to open the file, but what if the file does not exist? In normal java programming we would use a try / catch to handle the exception, but when writing a component that might not be the best choice.
Think about it: what should your job do if your input file is missing? In most cases it should fail, raise an exception, record the exception in a log, etc. So you normally want the exception to be raised and handled by the Talend engine.
For this reason we will, on purpose, not do any exception handling in our component. This rule cannot be applied to every situation; you need to decide case by case. In particular, you will handle exceptions when you can recover from the situation (e.g. attempt connecting to a webservice 5 times before giving up - I actually do this in one of my components).
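Just to illustrate the recoverable case, a minimal sketch in plain java could look like the following (the webservice call is purely hypothetical and not part of this component):


  // minimal retry sketch: try an unreliable call a few times, then let the
  // exception propagate so that the Talend engine handles the failure as usual
  void callWithRetry() throws Exception {
    int attempts = 0;
    while (true) {
      try {
        callWebService();   // hypothetical unreliable call, not defined in this tutorial
        return;             // success: stop retrying
      } catch (Exception e) {
        attempts++;
        if (attempts >= 5) {
          throw e;          // give up after 5 attempts
        }
      }
    }
  }
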

Note: this is not a java tutorial, so I assume you are already familiar with file handling and the related classes; I will not explain what they do or how they work.



<%@ jet
 imports="
   org.talend.core.model.process.INode
  org.talend.core.model.process.ElementParameterParser
  org.talend.core.model.metadata.IMetadataTable
  org.talend.core.model.metadata.IMetadataColumn
  org.talend.core.model.process.IConnection
  org.talend.core.model.process.IConnectionCategory
  org.talend.designer.codegen.config.CodeGeneratorArgument
  org.talend.core.model.metadata.types.JavaTypesManager
  org.talend.core.model.metadata.types.JavaType
 "
%>
<% CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
String filename = ElementParameterParser.getValue(node, "__INPUTFILE__");
%>

  java.io.FileReader fr_<%=cid %> = new java.io.FileReader( <%=filename %> );
  java.io.BufferedReader br_<%=cid %> = new java.io.BufferedReader( fr_<%=cid %> );

  String inBuffer_<%=cid %> = br_<%=cid %>.readLine( );
  while( inBuffer_<%=cid %> != null )
  {




This is our begin section: you should recognize in the template code where we get the input filename. I then used a "minimalist" approach to file reading; feel free to improve it as you wish, I just wanted to keep it simple so that we can focus mainly on the Talend part.
As in our previous exercise, we open a cycle and we don't close it here; this time we use a while loop which will stop once there are no more lines to be read from the file.
Also, I hope you noticed that in the java output code we reference classes with their full names.

    In java output code, classes are referenced with their canonical name, including the package name (e.g. java.io.FileInputStream), as no import directive is specified


Let's see now the end section, where we close the cycle we just opened :



<%@ jet
 imports="
   org.talend.core.model.process.INode
   org.talend.core.model.process.ElementParameterParser
   org.talend.core.model.metadata.IMetadataTable
   org.talend.core.model.metadata.IMetadataColumn
   org.talend.core.model.process.IConnection
   org.talend.core.model.process.IConnectionCategory
   org.talend.designer.codegen.config.CodeGeneratorArgument
   org.talend.core.model.metadata.types.JavaTypesManager
   org.talend.core.model.metadata.types.JavaType
  "
%>
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
%>
 inBuffer_<%=cid %> = br_<%=cid %>.readLine();
}
br_<%=cid %>.close();




This part is self-explanatory: we (try to) read a new line before closing the cycle (otherwise it would loop forever) and, when the loop is over, we close the file.

The main section is slightly more complicated; however, we will recycle a big part of the tTutorial1 component.
If you analyse its main section, you should be able to see that we can easily insert a StringTokenizer when we cycle over the connections, and iterate the tokens together with the columns. So we basically have two points where we need to add some code; they are marked with comments in the listing below.



<%@ jet
imports="
org.talend.core.model.metadata.IMetadataColumn
org.talend.core.model.metadata.IMetadataTable
org.talend.core.model.process.EConnectionType
org.talend.core.model.process.IConnection
org.talend.core.model.process.INode
org.talend.designer.codegen.config.CodeGeneratorArgument
java.util.List
"
%>
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
List<IMetadataTable> metadatas = node.getMetadataList();
if ((metadatas != null) && (metadatas.size() > 0)) {//b
 IMetadataTable metadata = metadatas.get(0);
 if (metadata != null) {//a
  List<IMetadataColumn> columns = metadata.getListColumns();
  List< ? extends IConnection> outConns = node.getOutgoingConnections();
  for (IConnection conn : outConns)
  { //2
   if (conn.getLineStyle().equals(EConnectionType.FLOW_MAIN)||conn.getLineStyle().equals(EConnectionType.FLOW_MERGE))
   { //3
    String outputConnName = conn.getName();
%>
  java.util.StringTokenizer st_<%=cid %> = new java.util.StringTokenizer(inBuffer_<%=cid %>,","); // here
<%

    for (int i = 0; i < columns.size(); i++)
    {//4
     IMetadataColumn column = columns.get(i);
%>
<%=outputConnName %>.<%=column.getLabel() %> = st_<%=cid %>.nextToken(); // and here

<%
    }//4
   }//3
  }//2
 }//a
}//b
%>
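Before installing it, it may help to see what the three templates produce once they are stitched together. Assuming a component instance named tCSVInputTutorial_1 with one output connection row1 carrying three String columns field1, field2 and field3 (the instance, connection and column names, and the file path, are just an example), the generated java would look roughly like this:


  // begin section
  java.io.FileReader fr_tCSVInputTutorial_1 = new java.io.FileReader( "/tmp/test.csv" );
  java.io.BufferedReader br_tCSVInputTutorial_1 = new java.io.BufferedReader( fr_tCSVInputTutorial_1 );

  String inBuffer_tCSVInputTutorial_1 = br_tCSVInputTutorial_1.readLine( );
  while( inBuffer_tCSVInputTutorial_1 != null )
  {
    // main section
    java.util.StringTokenizer st_tCSVInputTutorial_1 = new java.util.StringTokenizer(inBuffer_tCSVInputTutorial_1,",");
    row1.field1 = st_tCSVInputTutorial_1.nextToken();
    row1.field2 = st_tCSVInputTutorial_1.nextToken();
    row1.field3 = st_tCSVInputTutorial_1.nextToken();

    // ... Talend inserts here the code of the components that follow in the flow ...

    // end section
    inBuffer_tCSVInputTutorial_1 = br_tCSVInputTutorial_1.readLine();
  }
  br_tCSVInputTutorial_1.close();
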




Install the component (here is my version, in case something went wrong with yours).
The code shown here works under some assumptions:
1) There is always a token for each column
2) All the columns in the schema are strings
3) The strings do not contain any delimiter char ","

So, try out the component keeping those limitations in mind.

Point 1) could be fine as it is: we may want to let Talend handle the exceptions, or we may want to catch them and set the column value to null if no token is found. I am not going to do anything about it here; feel free to improve the code as you wish, the issue is not relevant for this lesson.
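If you do want to try, one possible approach (just a sketch, we will not use it here) is to guard the nextToken() call with hasMoreTokens(); for the String case it could be as simple as changing the assignment line in the main template to:


<%=outputConnName %>.<%=column.getLabel() %> = st_<%=cid %>.hasMoreTokens() ? st_<%=cid %>.nextToken() : null;
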
Similarly, point 3) can be improved, but it's not relevant for this tutorial (it's a pure java issue); moreover, the best way to handle a CSV read task would be to use a CSV Java library... we will probably do that in a few lessons.
What is definitely relevant for this lesson is point 2).

The idea is that if the target output column is an int then we convert the token from the tokenizer into an int, if it is a float into a float... ok, I think you got the point.
To do that we need to get some information from the metadata (in the main section); this can be done in the jet part of the code.

The information we need is the "type to generate", meaning the java type that Talend expects for the schema column; it is retrieved with the following code:

String ttg = JavaTypesManager.getTypeToGenerate(column.getTalendType(),column.isNullable());


To get there we first need to detect the Talend type of the column [ column.getTalendType() ], which will be something similar to id_Integer; then we need to detect the associated java type, which could be a primitive or an object, such as int or its wrapper Integer if the schema column is nullable.
In our case we are better off using the wrapper object anyway, since we have to perform a conversion such as Float.parseFloat(st.nextToken()), whose result gets autoboxed into the wrapper object when assigned to the column.
We will not use the column.isNullable() information; instead we will just force it to true.
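Just to make the mapping concrete, here is what I would expect getTypeToGenerate to return for a few typical cases (a sketch, with the results I would expect shown as comments):


String s1 = JavaTypesManager.getTypeToGenerate("id_String", true);   // "String"
String s2 = JavaTypesManager.getTypeToGenerate("id_Integer", true);  // "Integer" (nullable: wrapper object)
String s3 = JavaTypesManager.getTypeToGenerate("id_Integer", false); // "int"     (not nullable: primitive)
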





....
    for (int i = 0; i < columns.size(); i++)
    {//4
     IMetadataColumn column = columns.get(i);
     String ttg = JavaTypesManager.getTypeToGenerate(column.getTalendType(),true);
      if (ttg.equals("String"))
      { // string
%>
      <%=outputConnName %>.<%=column.getLabel() %> = st_<%=cid %>.nextToken();
<%      } // string
      else
      if (ttg.equals("Integer"))
      { // Integer
%>
     <%=outputConnName %>.<%=column.getLabel() %> = Integer.parseInt(st_<%=cid %>.nextToken());
<%       } // Integer
      else
      if (ttg.equals("Float"))
      { // Float
%>
     <%=outputConnName %>.<%=column.getLabel() %> = Float.parseFloat(st_<%=cid %>.nextToken());
<%       } // Float
      //else .... all the other possible types

....


This simple code should illustrate the concept quite clearly; it can certainly be improved, but hopefully you got the idea. If we had not forced the isNullable parameter to the constant "true", we would have been forced to test for both the primitive and the wrapper object, something like if (ttg.equals("Integer")||ttg.equals("int")).

Time to try it! Push the component to the palette and set up a csv file similar to this one:



test1,2,3.4
test12,22,2.3
test31,32,-3.3




Then in your test job modify the schema of the component so that the first column is a String, the second an Integer and the third a Float, specify your csv file in the filename parameter and... congratulations on creating your first component that actually does something!
Ehm... if the "something" it does is to generate an error message, then you can try with mine here.

We are almost done with this lesson; I think I have covered the main topics I wanted to. But to complete the component we also need to output the NB_LINE return value.

    As a standard, components have a return value called NB_LINE which, at the end of the process, is loaded into the globalMap and holds the number of processed records


First we need to declare the return value in the XML descriptor: in the RETURNS section add this line:



<RETURN AVAILABILITY="AFTER" NAME="NB_LINE" TYPE="id_Integer"/>



Then we need to count the records as we process them, so in the java output code of the begin section we can define an int variable:



...
int nb_line_<%=cid %> = 0;
...
[read cycle]


We increment it before closing the cycle in the end section :



...
 nb_line_<%=cid %>++;
}
globalMap.put("<%=cid %>_NB_LINE",nb_line_<%=cid %>);
....



And finally we push it to the globalMap. For this last step please notice that the correct name of the variable in the map has to be <componentUniqueName>_NB_LINE (the cid is a prefix and NOT a suffix).
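Once the job has run, any following component can read the counter from the globalMap; for example, assuming our component instance is named tCSVInputTutorial_1, a tJava placed after it could print the value with something like this (a sketch):


System.out.println("Lines read: " + (Integer)globalMap.get("tCSVInputTutorial_1_NB_LINE"));
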



This completes this tutorial lesson, which I hope you enjoyed (if not, please let us know why in our feedback module, so that we can improve it!).
In the next lesson we will create an Output component and we will discover a trick to output a number of records different from the number in input.

Part 6  Part 8