Thursday, November 26, 2015

HOW TO DEBUG DATANODE NOT RUNNING ISSUES IN HADOOP

Hi,
    I recently came across the issue below.
Problem:
   I tried to copy a file from my local file system into the Hadoop file system (HDFS) with this command:

  bin/hadoop dfs -put /home/user/sample.txt /user/username/.
But then I got this error:

15/11/26 08:36:57 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/username/Sample.txt could only be replicated to 0 nodes, instead of 1
(remaining output omitted)

This told me that HDFS could not replicate the file to any DataNode.
I checked which Hadoop daemons were running with the jps command,
which gave me the following output:

user@localhost:~$ jps
5798 SecondaryNameNode
7648 Jps
6050 TaskTracker
5896 JobTracker
5474 NameNode

Notice that DataNode is missing from the output above, so the DataNode is not running.
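
The same check can be scripted so that a missing daemon stands out. This is only a minimal sketch: the function name missing_daemons is my own (hypothetical), and the daemon list assumes the single-node Hadoop 1.x setup used in this post.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every
# expected Hadoop 1.x daemon that is not listed there.
missing_daemons() {
    local listing daemon
    listing="$(cat)"
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
        if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
            printf '%s\n' "$daemon"
        fi
    done
}
```

Usage, assuming jps is on your PATH: jps | missing_daemons
An empty result means all five daemons are up.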

Next I opened the NameNode health status page: http://localhost:50070/dfshealth.jsp

There I clicked the "Namenode Logs" link.

In the log listing I opened the most recent DataNode log, named:
hadoop-user-datanode-localhost.log 25542 bytes 
Near the end of the log file, search for the string "STARTUP_MSG: Starting DataNode".

In the lines that follow it, I found this error:
"ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 429859175; datanode namespaceID = 26135836"

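If you prefer not to read the value by eye, the NameNode's namespaceID can be pulled out of the log with grep and sed. A minimal sketch, assuming the error line has exactly the format shown above; the function name namenode_namespace_id is hypothetical.

```shell
# Hypothetical helper: prints the namenode namespaceID from the most recent
# "Incompatible namespaceIDs" error in the given DataNode log file.
namenode_namespace_id() {
    grep 'Incompatible namespaceIDs' "$1" \
        | tail -n 1 \
        | sed -n 's/.*namenode namespaceID = \([0-9][0-9]*\).*/\1/p'
}
```

Usage: namenode_namespace_id hadoop-user-datanode-localhost.log
Run it from your Hadoop log directory, or pass the full path to the log file.
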
Then, on the local file system, I changed to the DataNode data directory named in the error and listed its contents:
> cd /app/hadoop/tmp/dfs/data
> ll
total 28
drwxr-xr-x 6 sekhar sekhar 4096 Nov 26 08:34 ./
drwxrwxr-x 5 sekhar sekhar 4096 Nov 21 18:57 ../
drwxrwxr-x 2 sekhar sekhar 4096 Nov 23 23:14 blocksBeingWritten/
drwxrwxr-x 2 sekhar sekhar 4096 Nov 23 23:14 current/
drwxrwxr-x 2 sekhar sekhar 4096 Nov 21 18:57 detach/
-rw-rw-r-- 1 sekhar sekhar  157 Nov 21 18:57 storage
drwxrwxr-x 2 sekhar sekhar 4096 Nov 26 07:40 tmp/

> cd current
> ll
total 24
drwxrwxr-x 2 sekhar sekhar 4096 Nov 23 23:14 ./
drwxr-xr-x 6 sekhar sekhar 4096 Nov 26 08:34 ../
-rw-rw-r-- 1 sekhar sekhar    4 Nov 23 23:14 blk_-2585174188513577469
-rw-rw-r-- 1 sekhar sekhar   11 Nov 23 23:14 blk_-2585174188513577469_1002.meta
-rw-rw-r-- 1 sekhar sekhar  193 Nov 23 23:19 dncp_block_verification.log.curr
-rw-rw-r-- 1 sekhar sekhar  154 Nov 26 07:40 VERSION

> vi VERSION

The namespaceID in this file has to be changed to the namespaceID of the NameNode. Before the change, the file looks like this:
#Thu Nov 26 08:50:16 IST 2015
namespaceID=26135836 
storageID=DS-1674635271-127.0.0.1-50010-1448112482162
cTime=0
storageType=DATA_NODE
layoutVersion=-41

Set namespaceID to the NameNode's namespaceID, which you can read from the error in the log.
In this case it is 429859175, so the line becomes namespaceID=429859175.
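
Instead of editing with vi, the same change can be made with sed. A minimal sketch, not the only way to do it: the function name set_namespace_id is hypothetical, and it keeps a VERSION.bak copy of the original file so the edit can be undone.

```shell
# Hypothetical helper: set_namespace_id <VERSION file> <new namespaceID>
# Copies the original file to <VERSION file>.bak, then rewrites only the
# namespaceID line; all other lines are left as they were.
set_namespace_id() {
    cp "$1" "$1.bak" &&
    sed "s/^namespaceID=.*/namespaceID=$2/" "$1.bak" > "$1"
}
```

Usage for the values in this post: set_namespace_id /app/hadoop/tmp/dfs/data/current/VERSION 429859175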

That's it. Now start the cluster again (for example with bin/start-all.sh) and check the daemons with jps:
sekhar@localhost:~$ jps
15278 JobTracker
14828 NameNode
15556 Jps
15172 SecondaryNameNode
15001 DataNode
15440 TaskTracker

Wow, the DataNode is back.


