How to find the DataNodes that actually store a file in HDFS?
A file in HDFS may be split into many blocks, and each block is replicated on several DataNodes. So how do you find the DataNodes that actually store a given file?
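You can use the fsck tool from the Hadoop HDFS utilities (it is its own command, not a dfsadmin option; on newer releases the equivalent form is hdfs fsck). The -files, -blocks and -locations options make it print every block of the file together with the DataNodes holding its replicas. Here is an example: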
$ hadoop fsck /user/aaa/file.name -files -locations -blocks
Connecting to namenode via http://dstore-170:50070
FSCK started by hadoop (auth:SIMPLE) from /10.0.3.170 for path /user/path/to/file.gz at Fri Oct 17 12:25:55 HKT 2014
/user/path/to/file.gz 12448905476 bytes, 93 block(s): OK
0. BP-1960069741-10.0.3.170-1410430543652:blk_1074365040_625145 len=134217728 repl=2 [10.0.3.173:50010, 10.0.3.174:50010]
1. BP-1960069741-10.0.3.170-1410430543652:blk_1074365041_625146 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
2. BP-1960069741-10.0.3.170-1410430543652:blk_1074365042_625147 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
3. BP-1960069741-10.0.3.170-1410430543652:blk_1074365043_625148 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
4. BP-1960069741-10.0.3.170-1410430543652:blk_1074365044_625149 len=134217728 repl=2 [10.0.3.181:50010, 10.0.3.174:50010]
...
91. BP-1960069741-10.0.3.170-1410430543652:blk_1074365131_625236 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
92. BP-1960069741-10.0.3.170-1410430543652:blk_1074365132_625237 len=100874500 repl=2 [10.0.3.181:50010, 10.0.3.174:50010]
Status: HEALTHY
Total size: 12448905476 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 93 (avg. block size 133859198 B)
Minimally replicated blocks: 93 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 10
Number of racks: 1
FSCK ended at Fri Oct 17 12:25:55 HKT 2014 in 1 milliseconds
The filesystem under path '/user/aaa/file.name' is HEALTHY
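
To reduce the per-block listing to a plain list of DataNodes, the fsck output can be post-processed with standard text tools. The following is only a minimal sketch: list_datanodes is a hypothetical helper name, it assumes IPv4 ip:port addresses on the block lines as in the output above, and it relies on a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: summarise which DataNodes hold replicas of a file.
# Reads `fsck -files -blocks -locations` output on stdin and prints one line
# per DataNode: "<ip:port> <number of block replicas>", busiest node first.
# Assumption: replica locations appear as IPv4 ip:port on lines containing "blk_".
list_datanodes() {
  grep 'blk_' \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}

# Usage (path taken from the example above):
#   hadoop fsck /user/aaa/file.name -files -locations -blocks | list_datanodes
```

Filtering on blk_ first keeps other addresses in the report, such as the NameNode URL, out of the result.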