Sunday, March 28, 2010

Cascading: How does Cascading decide which fields go to a column family?

I was playing with the Cascading HBase code sample given here.

Problem statement: let's say each tuple has four fields, e.g.
line_num, lower, upper, double
1, a, A, AA

and I want to put double in its own column family, or club it with an existing column family such as 'right'. How do I do that?

Solution:
String tableName = "DataLoadTable";
Fields keyFields = new Fields("line_num");

// add the new family name
String[] familyNames = new String[] { "left", "right", "double" };

// group your fields in the order in which you would like them
// to be added to the column families
Fields[] valueFields = new Fields[] { new Fields("lower"),
        new Fields("upper"), new Fields("double") };

HBaseScheme hbaseScheme = new HBaseScheme(keyFields, familyNames,
        valueFields);
Tap sink = new HBaseTap(tableName, hbaseScheme, SinkMode.REPLACE);

// describe your tuple entry: add the new field
Fields fieldDeclaration = new Fields("line_num", "lower", "upper",
        "double");
Function function = new RegexSplitter(fieldDeclaration, ", ");

The rest of the code stays the same as in the example.
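As far as I can tell, the scheme pairs the two arrays by position: the i-th entry of familyNames receives the fields in the i-th entry of valueFields. The sketch below is plain Java, not Cascading code. FamilyMapping and familyByField are hypothetical names I made up to illustrate that pairing, and the second call shows the layout I would expect to use for clubbing double with 'right' (two families, with upper and double grouped together). Treat it as an illustration of the idea under that assumption, not as how HBaseScheme is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FamilyMapping {

    // Hypothetical helper: pairs familyNames[i] with fieldGroups[i]
    // and returns a map of field name -> column family.
    public static Map<String, String> familyByField(String[] familyNames,
            String[][] fieldGroups) {
        if (familyNames.length != fieldGroups.length) {
            throw new IllegalArgumentException(
                    "need exactly one field group per column family");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < familyNames.length; i++) {
            for (String field : fieldGroups[i]) {
                result.put(field, familyNames[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Layout from the post: "double" gets its own column family.
        System.out.println(familyByField(
                new String[] { "left", "right", "double" },
                new String[][] { { "lower" }, { "upper" }, { "double" } }));

        // Assumed alternative: club "double" with the existing family "right".
        System.out.println(familyByField(
                new String[] { "left", "right" },
                new String[][] { { "lower" }, { "upper", "double" } }));
    }
}
```

In Cascading terms, the second layout would presumably correspond to familyNames of { "left", "right" } and valueFields of { new Fields("lower"), new Fields("upper", "double") }. I have not run that variant against a table.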

Either the above was so obvious that the authors didn't talk about it in the user guide, or I did not know how to describe the problem and hence could not find it there.

Let me know if I'm wrong.
