How to check the replication factor of a file in HDFS?

A related question: how do you find the replication factors of files in an HDFS cluster? Method 1: use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example:

    $ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
    -rw-r--r-- 3 hadoop zma 11906625598 2014-10-22
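
Reading the second column can also be scripted. The following is a minimal sketch, assuming the listing format shown in the excerpt; the sample line is copied from it (truncated as in the original), and `replication_of` is a hypothetical helper name, not part of Hadoop.

```shell
# Hypothetical helper: print the second column (replication factor)
# of each line of `hdfs dfs -ls` output read from stdin.
replication_of() {
    awk '{ print $2 }'
}

# Sample line copied from the excerpt above (truncated there as well).
sample='-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22'
rf=$(printf '%s\n' "$sample" | replication_of)
echo "replication factor: $rf"

# On a live cluster the same helper would be fed by the real command:
#   hdfs dfs -ls /usr/GroupStorage/data1/out.txt | replication_of
```

When a directory is listed, the output starts with a "Found N items" header and directories show "-" in this column, so the helper is only meant for lines that describe files.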

How to change a running HDFS cluster’s replication factor?

Now, I have a running HDFS cluster storing lots of files. I want to change its default replication factor. How do I change it? What will happen after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor

How to download an RTMP stream on Linux?

How to download an RTMP video stream on Linux? You can use mplayer to dump the RTMP stream like:

    mplayer -dumpstream rtmp://example.com/path/to/stream.mp4

It will generate ./stream.dump, and you can rename it to a file with the extension you need, like stream.mp4. The RTMP link can usually be found in the HTML or JavaScript source code
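
The dump-then-rename steps can be combined in a few lines. This is a sketch only: `DO_DUMP` is a hypothetical switch added here so that nothing is downloaded unless you ask for it, and the URL is the placeholder from the excerpt.

```shell
url='rtmp://example.com/path/to/stream.mp4'

# Derive the final file name from the last path component of the URL.
out=$(basename "$url")
echo "will save to: $out"

# Only attempt the download when explicitly requested and mplayer is installed.
if [ "${DO_DUMP:-0}" = 1 ] && command -v mplayer >/dev/null 2>&1; then
    # mplayer writes ./stream.dump; rename it once the dump succeeds.
    mplayer -dumpstream "$url" && mv ./stream.dump "$out"
fi
```

Running it with `DO_DUMP=1` set in the environment performs the actual dump and rename.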

How to install the latest version of Calibre?

How to install the latest version of Calibre? The versions from my distros (Ubuntu, Linux Mint, Fedora) seem to be at 1.xx while the latest Calibre is already at 2.x. You may check the Calibre website’s instructions: http://calibre-ebook.com/download_linux

    sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"

How to find which package can be installed for a file, like “yum provides”?

How to find which package can be installed for a file, like “yum provides”? That is, the package is not installed yet and I do not know which package provides the file that I want. The apt-file tool does something similar to yum provides. You may need to install it first by sudo

How to balance DataNode storage in HDFS?

As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes may become uneven: some DataNodes’ disks are almost used up while others’ are almost empty. How to balance data across DataNodes in HDFS? Hadoop provides the balancer to redistribute the data. Brief introduction to balancer in Hadoop: balancer. The design
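
To get a feel for what "unbalanced" means before running the balancer, the sketch below flags nodes whose disk usage differs from the average by more than a threshold. The node names and percentages are made up, the nodes are assumed to have equal capacity, and the 10 percent threshold is chosen to match the value commonly passed to `hdfs balancer -threshold`.

```shell
threshold=10   # percent

# Hypothetical per-node disk usage: node name, used percent.
usage='dn1 92
dn2 15
dn3 55'

# Print every node whose usage is more than $threshold points away
# from the average usage across all listed nodes.
unbalanced=$(printf '%s\n' "$usage" | awk -v t="$threshold" '
    { name[NR] = $1; used[NR] = $2; sum += $2 }
    END {
        avg = sum / NR
        for (i = 1; i <= NR; i++) {
            diff = used[i] - avg
            if (diff < 0) diff = -diff
            if (diff > t) print name[i]
        }
    }')
echo "$unbalanced"

# On a real cluster, redistribution itself is done by the balancer, e.g.:
#   hdfs balancer -threshold 10
```

With the sample numbers the average is 54 percent, so dn1 and dn2 are reported and dn3 is not.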

How to install gitbook?

How to install gitbook on my own Linux box? First, install node.js following https://www.systutorials.com/qa/1268/how-to-install-node-js-on-fedora or How to install node.js on Ubuntu/Linux Mint depending on your distro. Second, install gitbook by npm to /opt/:

    # cd /opt/
    # npm install gitbook

Then, gitbook can be invoked by /opt/node_modules/gitbook/bin/gitbook.js. You may need to install the latest

How to run gitbook on a headless server (make Calibre run in headless server)?

When using gitbook to generate an ebook, Calibre reports this: RuntimeError: X server required. If you are running on a headless machine, use xvfb. After xvfb is installed, it still does not work. How to make gitbook/Calibre work on a headless server? You need to wrap the command ebook-convert with xvfb-run. However, in gitbook (lib/generate/ebook/index.js), ebook-convert
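
One way to wrap ebook-convert with xvfb-run without touching gitbook's source is a small wrapper script placed ahead of the real binary on PATH. This is a sketch under assumptions, not necessarily the approach the full post takes: it assumes the real binary lives at /usr/bin/ebook-convert and that gitbook finds ebook-convert through PATH. The wrapper is written to a temporary directory so the sketch is harmless to run.

```shell
# Assumed location of the real binary; adjust for your install.
REAL_CONVERT=/usr/bin/ebook-convert

# Temporary directory for the wrapper (use a permanent one in practice).
wrapper_dir=$(mktemp -d)

# Write the wrapper. It must call the real binary by full path,
# otherwise it would find itself on PATH and loop forever.
cat > "$wrapper_dir/ebook-convert" <<EOF
#!/bin/sh
# Run the real ebook-convert under a virtual X server.
exec xvfb-run -a "$REAL_CONVERT" "\$@"
EOF
chmod +x "$wrapper_dir/ebook-convert"

# Put the wrapper first on PATH so callers pick it up.
PATH="$wrapper_dir:$PATH"
export PATH
command -v ebook-convert
```

The `-a` option lets xvfb-run pick a free display number, which avoids clashes when several conversions run at once.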

How to find the DataNodes that actually store a file in HDFS?

A file may be split into many chunks, with replicas stored on many DataNodes in HDFS. Now, the question is how to find the DataNodes that actually store a file in HDFS? You may use the fsck tool from Hadoop. Here is an example:

    $ hadoop fsck /user/aaa/file.name -files -locations -blocks

How to increase the number of files allowed to be opened on Linux?

On my system:

    $ ulimit -n
    1024

Some tools like GATK are aggressive in creating temporary files and create more than 1000 files under /tmp/. This will cause the program to fail. How to increase the number of files allowed to be opened on Linux? To increase the max number of open files to 10240,
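
For the current shell session, the limit can be inspected and raised with the ulimit builtin. The following is a sketch, assuming the hard limit is at least 10240; it is not necessarily the method the full post describes. A non-root user cannot raise the soft limit above the hard limit.

```shell
target=10240

soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "before: soft=$soft hard=$hard"

# Raise the soft limit for this shell only, and only if it is below the
# target and the hard limit allows it.
if [ "$soft" != unlimited ] && [ "$soft" -lt "$target" ]; then
    if [ "$hard" = unlimited ] || [ "$hard" -ge "$target" ]; then
        ulimit -Sn "$target"
    else
        echo "hard limit $hard is below $target; raise it as root first" >&2
    fi
fi
echo "after: soft=$(ulimit -Sn)"

# A persistent, system-wide setting usually lives in /etc/security/limits.conf
# (read by pam_limits at login), for example:
#   *  soft  nofile  10240
#   *  hard  nofile  10240
```

The change made by `ulimit` applies to this shell and the processes it starts; new login sessions use whatever limits.conf specifies.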

How to write an /etc/fstab entry for --bind mounting?

How to write an /etc/fstab entry for --bind mounting like

    mount --bind /home/hadoop/hdfs/store-tmp /home/store/tmp

From man 8 mount: Since Linux 2.4.0 it is possible to remount part of the file hierarchy somewhere else. The call is

    mount --bind olddir newdir

or short option

    mount -B olddir newdir

or fstab entry is:

    /olddir /newdir none bind
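
Applied to the directories in the question, the entry can be generated as follows. This is a small sketch: `bind_entry` is a hypothetical helper, and the trailing `0 0` (the dump and fsck pass fields) is written out explicitly here although the man page excerpt omits it.

```shell
# Hypothetical helper: print an fstab line for a bind mount.
# Fields: source dir, mount point, fs type "none", option "bind",
# then dump and fsck pass numbers (0 0 = skip both).
bind_entry() {
    printf '%s %s none bind 0 0\n' "$1" "$2"
}

entry=$(bind_entry /home/hadoop/hdfs/store-tmp /home/store/tmp)
echo "$entry"

# Review the line, then append it as root and mount it, for example:
#   echo "$entry" | sudo tee -a /etc/fstab
#   sudo mount /home/store/tmp
```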

How to get logs of a specific time range on Linux?

The logs I am processing are Hadoop logs (log4j), in a format like:

    2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
    2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
    2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now
    2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.

Now, I
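
Because these timestamps start with year, month, day, hour, minute and second in that order, they sort correctly as plain strings, so a time range can be selected with a string comparison in awk. The following is a sketch on the sample lines from the excerpt; `in_range` is a hypothetical helper name, and the approach is one option among several rather than necessarily the one in the full post.

```shell
# Hypothetical helper: print lines whose "date time" prefix lies
# between the two bounds, compared as strings.
in_range() {
    awk -v from="$1" -v to="$2" '{ ts = $1 " " $2 } ts >= from && ts <= to'
}

# Sample log lines copied from the excerpt above.
log=$(cat <<'EOF'
2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
EOF
)

# Keep only entries logged between 22:00:00 and 22:30:00.
matched=$(printf '%s\n' "$log" | in_range '2014-09-20 22:00:00' '2014-09-20 22:30:00')
printf '%s\n' "$matched"
```

Two caveats: a bound written without milliseconds sorts before entries from that same second, so pick the upper bound accordingly; and continuation lines without a timestamp (such as stack traces) are not handled by this sketch.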

How to change number of replications of certain files in HDFS?

HDFS has a configuration property in hdfs-site.xml, “dfs.replication”, that sets the global replication number of blocks. However, there are some “hot” files that are accessed by many nodes. How to increase the number of replicas for these particular files in HDFS? You can set the replication number of a certain file to 10: hdfs
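
For several hot files, the per-file change can be looped over with the `-setrep` option of `hdfs dfs`. This is a sketch that only prints the commands by default: `DRY_RUN` and `set_hot_replication` are hypothetical names, and the file paths are made up for illustration.

```shell
# Print the commands by default; set DRY_RUN=0 to run them on a real cluster.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: set the replication factor ($1) for each given path.
set_hot_replication() {
    rep=$1
    shift
    for path in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "hdfs dfs -setrep -w $rep $path"
        else
            # -w waits until the target replication is reached.
            hdfs dfs -setrep -w "$rep" "$path"
        fi
    done
}

# Made-up paths standing in for the "hot" files.
planned=$(set_hot_replication 10 /data/hot/part-0000 /data/hot/part-0001)
echo "$planned"
```

Waiting with `-w` can take a long time for large files; leaving it out returns immediately and lets HDFS replicate in the background.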

Making Hadoop Java process heap larger?

In Hadoop 2.5.0, I use ‘ps -aux’ and find the Java process has the option -Xmx1000m. However, my nodes have 32GB memory. How to make the Hadoop Java process heap larger? In yarn-env.sh, you can find:

    # For setting YARN specific HEAP sizes please use this
    # Parameter and set appropriately
    # YARN_HEAPSIZE=1000

In hadoop-env.sh, you can
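
A sketch of what the change and a follow-up check might look like, assuming Hadoop 2.x, where hadoop-env.sh and yarn-env.sh are sourced by the daemon start scripts and the heap size variables are given in MB. The value 4096 is only an illustration, `xmx_of` is a hypothetical helper, and the sample command line is made up apart from the -Xmx1000m option quoted above.

```shell
# Lines of this form would go into hadoop-env.sh and yarn-env.sh respectively
# (values in MB; 4096 is only an illustration).
export HADOOP_HEAPSIZE=4096
export YARN_HEAPSIZE=4096

# Hypothetical helper: extract -Xmx options from command lines on stdin.
xmx_of() {
    grep -o -- '-Xmx[0-9]*[kKmMgG]'
}

# Made-up command line, except for the -Xmx1000m option from the excerpt.
sample='java -Dproc_nodemanager -Xmx1000m org.example.Main'
current=$(printf '%s\n' "$sample" | xmx_of)
echo "current heap option: $current"

# After restarting the daemons, the live value could be checked with:
#   ps aux | grep '[j]ava' | xmx_of
```

The new heap size only takes effect once the affected daemons are restarted.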