Saturday, January 9, 2016

Difference between Text and String DataTypes in Hadoop

Here I am going to discuss some of the differences between the Text and String classes in Hadoop.

The Text class lives in the org.apache.hadoop.io package: import org.apache.hadoop.io.*;

Difference 1:
 
Text is mutable; String is immutable.

Text t = new Text("hadoop");
t.set("BigData");          // replaces the contents of the same Text object
System.out.println(t);     // prints "BigData"
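For contrast, here is a minimal sketch of the String side using only the JDK. The class name StringImmutabilityDemo and the helper rename are made up for this illustration; the point is only that a String method hands back a new object and leaves the original untouched.

```java
public class StringImmutabilityDemo {
    // "Changing" a String always produces a new object; the argument keeps its value.
    static String rename(String s) {
        return s.replace("hadoop", "BigData");
    }

    public static void main(String[] args) {
        String s = "hadoop";
        String renamed = rename(s);
        System.out.println(s);        // prints "hadoop": the original is unchanged
        System.out.println(renamed);  // prints "BigData"
    }
}
```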


Difference 2 :
Text stores the string as a UTF-8 encoded byte array.

Example : Text t = new Text("hadoop");

The string is encoded as UTF-8 and the resulting bytes are kept in the Text object's internal byte[] array.

So the string "hadoop" is stored like this: [UTF-8 code(h), UTF-8 code(a), ..., UTF-8 code(p)]

and this is the byte[] array representation of the string "hadoop":
   [104,97,100,111,111,112]
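The byte values above can be checked without Hadoop. This is a small JDK-only sketch (the class name Utf8BytesDemo and the helper utf8Bytes are my own, not part of any library) that encodes the same string as UTF-8 and prints the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BytesDemo {
    // Encodes a String as UTF-8, the same encoding Text uses for its contents.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints [104, 97, 100, 111, 111, 112]
        System.out.println(Arrays.toString(utf8Bytes("hadoop")));
    }
}
```

Every character in "hadoop" is ASCII, so each one takes exactly one byte; non-ASCII characters take two to four bytes in UTF-8.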

Why ?

Text uses standard UTF-8 which makes it potentially easier to inter-operate with other tools that
understand UTF-8.


Difference 3 :
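charAt(int index) in String returns the char at the specified index.

charAt(int position) in Text returns the Unicode code point (an int) of the character that starts at the given byte position. For the "hadoop" example above, t.charAt(2) returns 100, the code point of 'd'.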
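The difference between a char index and a byte position only shows up with non-ASCII text. The sketch below imitates the byte-position lookup with plain JDK calls, so it runs without Hadoop; codePointAtByteOffset is a hypothetical helper written for this example, not the Text implementation, and it assumes the offset points at the first byte of a character.

```java
import java.nio.charset.StandardCharsets;

public class CharAtDemo {
    // Hypothetical helper: decodes the code point that starts at a byte offset
    // of a UTF-8 array. The offset must be the first byte of a character.
    static int codePointAtByteOffset(byte[] utf8, int offset) {
        String rest = new String(utf8, offset, utf8.length - offset, StandardCharsets.UTF_8);
        return rest.codePointAt(0);
    }

    public static void main(String[] args) {
        String ascii = "hadoop";
        byte[] asciiBytes = ascii.getBytes(StandardCharsets.UTF_8);
        System.out.println(ascii.charAt(2));                       // prints d
        System.out.println(codePointAtByteOffset(asciiBytes, 2));  // prints 100

        String accented = "\u00e9ab";  // e-acute takes two bytes in UTF-8
        byte[] accentedBytes = accented.getBytes(StandardCharsets.UTF_8);
        System.out.println(accented.charAt(2));                       // prints b
        System.out.println(codePointAtByteOffset(accentedBytes, 2));  // prints 97, the code point of 'a'
    }
}
```

For ASCII-only strings the two indexes agree, which is why the "hadoop" example gives 'd' either way.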

Difference 4 : Text lacks the rich API that String offers for manipulating strings, so in many
cases we convert it to a String with toString().

Difference 5 : Iterating over the characters of a Text is a tedious process compared to String.
Example of iterating over the characters in a Text:

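// needs: import java.nio.ByteBuffer; import org.apache.hadoop.io.Text;
Text t = new Text("hadoop");
// Wrap only the valid bytes; the array returned by getBytes() may be longer than getLength().
ByteBuffer bf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
int cp;
while (bf.hasRemaining()) {
    cp = Text.bytesToCodePoint(bf);           // decodes one code point and advances the buffer
    System.out.print(Character.toChars(cp));  // also handles code points outside the BMP
}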
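For comparison, this is roughly what the same loop looks like on the String side, using only the JDK. The class name StringIterationDemo and the helper codePointsOf are made up for this sketch.

```java
public class StringIterationDemo {
    // Collects the code points of a String, in order.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        for (int cp : codePointsOf("hadoop")) {
            System.out.print(Character.toChars(cp));
        }
        System.out.println();  // the loop above prints hadoop
    }
}
```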

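Similarity 1 :  find in Text is the counterpart of indexOf in String. Note that find returns a byte offset into the UTF-8 data, which equals the char index only for ASCII text.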

Text t = new Text("hadoop");
String s = "hadoop";

System.out.println(" >>> " + t.find("o"));     // prints " >>> 3"
System.out.println(" >>> " + s.indexOf("o"));  // prints " >>> 3"
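Both calls print 3 here because "hadoop" is pure ASCII. As a rough illustration of where the two could diverge, the JDK-only sketch below computes a byte offset in the UTF-8 encoding next to the usual indexOf result. utf8OffsetOf is a hypothetical stand-in written for this example so that it runs without Hadoop; it is not how Text implements find.

```java
import java.nio.charset.StandardCharsets;

public class FindOffsetDemo {
    // Hypothetical helper: byte offset of the first match within the UTF-8
    // encoding of s, or -1 if 'what' does not occur.
    static int utf8OffsetOf(String s, String what) {
        int charIndex = s.indexOf(what);
        if (charIndex < 0) {
            return -1;
        }
        // The bytes of everything before the match give the byte offset.
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("hadoop".indexOf("o"));         // prints 3
        System.out.println(utf8OffsetOf("hadoop", "o"));   // prints 3

        String accented = "\u00e9ab";  // e-acute takes two bytes in UTF-8
        System.out.println(accented.indexOf("b"));         // prints 2
        System.out.println(utf8OffsetOf(accented, "b"));   // prints 3
    }
}
```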

