HDFS stays in safe mode because the reported blocks do not reach 0.9990 of total blocks

After a node failure and a restart of HDFS, the NameNode reports:

“The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.”

in the log.

Why does this happen, and how can it be fixed?

About why the NameNode stays in the safe mode:

At startup time, the namenode reads its namespace from disk (the
FSImage and edits files). This includes all the HDFS filenames and
block lists that it should know, but not the mappings of block
replicas to datanodes. Then it waits in safe mode for all or most of
the datanodes to send their Initial Block Reports, which let the
namenode build its map of which blocks have replicas in which
datanodes. It keeps waiting until dfs.namenode.safemode.threshold-pct
of the blocks that it knows about from FSImage have been reported from
at least dfs.namenode.replication.min (default 1) datanodes [so that’s
a third config parameter I didn’t mention earlier]. If this threshold
is achieved, it will post a log that it is ready to leave safe mode,
wait for dfs.namenode.safemode.extension seconds, then automatically
leave safe mode and generate replication requests for any
under-replicated blocks (by default, those with replication < 3).

and

If it doesn’t reach the “safe replication for all known blocks”
threshold, then it will not leave safe mode automatically. It logs
the condition and waits for an admin to decide what to do, because
generally it means whole datanodes or sets of datanodes did not come
up or are not able to communicate with the namenode. Hadoop wants a
human to look at the situation before it starts madly generating
re-replication commands for under-replicated blocks and deleting
blocks with zero replicas available.

By Matthew Foley.
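Before forcing anything, it helps to see where the cluster actually stands. The following is a minimal diagnostic sketch, assuming the hdfs client is available on the NameNode host: hdfs getconf reads back the three parameters mentioned above, and hdfs dfsadmin -report shows which datanodes actually registered (the -dead filter is only available on newer releases):

# Current safe mode state ("Safe mode is ON" while the NameNode waits).
hdfs dfsadmin -safemode get

# The three parameters that control when safe mode lifts.
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct
hdfs getconf -confKey dfs.namenode.replication.min
hdfs getconf -confKey dfs.namenode.safemode.extension

# Summary of live and dead datanodes as the NameNode sees them.
hdfs dfsadmin -report

# On newer releases only: list just the datanodes that never checked in.
hdfs dfsadmin -report -dead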

If you are sure that the missing blocks will never be reported, you can force the NameNode to leave safe mode by running:

hdfs dfsadmin -safemode leave
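(The older hadoop dfsadmin -safemode leave form still works but is deprecated in current releases.)

If you would rather let the NameNode leave safe mode on its own, for example in a restart script, the same safemode subcommand family has a blocking variant:

# Block until the NameNode exits safe mode by itself.
hdfs dfsadmin -safemode wait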

You may then run hdfs fsck / -move or hdfs fsck / -delete to move the corrupted files to /lost+found or to delete them, if you are sure you will not need the affected files any more.
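If you are not sure which files are affected, you can list them first. A short sketch, assuming you want to scan the whole namespace starting from /:

# List the files that have missing or corrupt blocks.
hdfs fsck / -list-corruptfileblocks

# Inspect one suspicious file: its blocks and where the replicas live
# (/path/to/file is a hypothetical placeholder).
hdfs fsck /path/to/file -files -blocks -locations

# Then either quarantine the damaged files under /lost+found or delete them.
hdfs fsck / -move
hdfs fsck / -delete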

