A related question: how to find the replication factors of files in an HDFS cluster? Method 1: You can use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example:
$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22
Read more
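The "second column" rule above can be sketched as a small awk filter. This is only an illustration: the helper name replication_of is hypothetical, and the time and path columns in the sample line are assumed, since the listing in the excerpt is cut off after the date.

```shell
#!/bin/sh
# Print "<replication> <path>" for each regular file in `hdfs dfs -ls` output.
# Directories show "-" in the second column, and the "Found N items" header
# does not start with "-", so both are skipped.
replication_of() {
    awk '$1 ~ /^-/ && $2 ~ /^[0-9]+$/ { print $2, $NF }'
}

# Sample line modelled on the listing above; the time (18:12) and the trailing
# path column are assumptions added for illustration.
sample='-rw-r--r--   3 hadoop zma 11906625598 2014-10-22 18:12 /usr/GroupStorage/data1/out.txt'
printf '%s\n' "$sample" | replication_of
```

On a real cluster the same function would be fed from the command itself, for example hdfs dfs -ls /some/dir | replication_of. Because it takes the last whitespace-separated field as the path, it does not handle paths that contain spaces.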
Author: Eric Ma
Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
How to change a running HDFS cluster’s replication factor?
Now, I have a running HDFS cluster storing lots of files. I want to change its default replication factor. How do I change it, and what will happen after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor
Read more
How to download an RTMP stream on Linux?
How to download an RTMP video stream on Linux? You can use mplayer to dump the RTMP stream like:
mplayer -dumpstream rtmp://example.com/path/to/stream.mp4
It will generate ./stream.dump, which you can rename to a file with the extension you need, such as stream.mp4. The RTMP link can usually be found in the HTML or JavaScript source code
Read more
How to install the latest version of Calibre?
How to install the latest version of Calibre? The versions from my distros (Ubuntu, Linux Mint, Fedora) seem to be at 1.xx while the latest Calibre is already at 2.x. You may check the instructions on the Calibre website: http://calibre-ebook.com/download_linux
sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"
What is the design of Snapshots in HDFS?
What is the design of Snapshots in HDFS? This PDF documents the design of snapshots. Jing Zhao and Tsz-Wo Sze from Hortonworks gave a great talk on the design of HDFS snapshots. The slides can be downloaded here. The development of snapshots is tracked by HDFS-2802.
How to find which package can be installed for a file, like “yum provides”?
How to find which package can be installed for a file, like "yum provides"? That is, the package is not installed yet and I do not know which package provides a file that I want. The apt-file tool does much the same thing as yum provides. You may need to install it first by sudo
Read more
How to balance DataNode storage in HDFS?
As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes may become uneven: some DataNodes’ disks are almost used up while others’ are almost empty. How to balance data across DataNodes in HDFS? Hadoop provides the balancer to redistribute the data. Brief introduction to the balancer in Hadoop: balancer. The design
Read more
How to install gitbook?
How to install gitbook on my own Linux box? First, install node.js following https://www.systutorials.com/qa/1268/how-to-install-node-js-on-fedora or How to install node.js on Ubuntu/Linux Mint, depending on your distro. Second, install gitbook with npm to /opt/:
# cd /opt/
# npm install gitbook
Then gitbook can be invoked by /opt/node_modules/gitbook/bin/gitbook.js. You may need to install the latest
Read more
How to install node.js on Fedora?
How to install node.js on Fedora? You may install it with:
# yum install nodejs npm
How to run gitbook on a headless server (make Calibre run in headless server)?
When using gitbook to generate an ebook, Calibre reports this: RuntimeError: X server required. If you are running on a headless machine, use xvfb
Installing xvfb alone does not make it work either. How to make gitbook/Calibre work on a headless server? You need to wrap the ebook-convert command with xvfb-run. However, in gitbook (lib/generate/ebook/index.js), ebook-convert
Read more
How to install node.js on Ubuntu/Linux Mint?
How to install node.js on Ubuntu/Linux Mint? This is how I install node.js on Linux Mint:
# aptitude install nodejs nodejs-legacy npm
The nodejs-legacy package makes sure the command node invokes node.js.
How to find the DataNodes that actually store a file in HDFS?
A file may be split into many chunks, with replicas stored on many DataNodes in HDFS. Now, the question is how to find the DataNodes that actually store a file in HDFS. You may use the fsck tool from the Hadoop HDFS utilities. Here is an example:
$ hadoop fsck /user/aaa/file.name -files -locations -blocks
Read more
How to increase the number of files allowed to be opened on Linux?
On my system:
$ ulimit -n
1024
Some tools like GATK are aggressive in creating temporary files, creating more than 1000 files under /tmp/. This exceeds the limit and causes the program to fail. How to increase the number of files allowed to be opened on Linux? To increase the max number of open files to 10240,
Read more
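Before changing anything, it can help to look at both the soft and the hard limit, since a non-root process can only raise its soft limit up to the hard one. The following is a minimal sketch for the current shell only, not the persistent system-wide configuration; the variable names are illustrative and 10240 is the target value mentioned above.

```shell
#!/bin/sh
# Inspect the soft and hard limits on open file descriptors for this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# Try the target value in a subshell so the calling shell is left unchanged.
want=10240
if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$want" ]; then
    ( ulimit -Sn "$want" && echo "soft limit in subshell: $(ulimit -Sn)" )
else
    echo "hard limit $hard is below $want; it has to be raised by the administrator first"
fi
```

A limit set this way applies only to that shell and the processes it starts, so a tool such as GATK would have to be launched from the same shell.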
kernel netfront: Too many frags in Xen VM
We set up Xen F19 VMs on a Xen 3.4.3 / 2.6.32.13 xenified kernel (check here). However, we find the Xen VM keeps reporting "kernel netfront: Too many frags" and "skb rides the rocket" in dmesg. This solves the problem (assuming that on the physical server vif1.0 is for eth0 on the VM): On the
Read more
How to write an /etc/fstab entry for --bind mounting?
How to write an /etc/fstab entry for --bind mounting like
mount --bind /home/hadoop/hdfs/store-tmp /home/store/tmp
From man 8 mount: Since Linux 2.4.0 it is possible to remount part of the file hierarchy somewhere else. The call is
mount --bind olddir newdir
or shortoption
mount -B olddir newdir
or fstab entry is:
/olddir /newdir none bind
How to get logs of a specific time range on Linux?
The logs I am processing are Hadoop logs (log4j), in a format like:
2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
Now, I
Read more
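One possible way to cut a time range out of logs in this format is plain string comparison on the leading timestamp, since "YYYY-MM-DD HH:MM:SS" sorts lexicographically. The sketch below is an illustration and not necessarily the method from the full post; the function name log_range and the chosen range are hypothetical.

```shell
#!/bin/sh
# Print the lines whose first 19 characters ("YYYY-MM-DD HH:MM:SS") fall
# within [start, end], using string comparison in awk.
log_range() {
    awk -v start="$1" -v end="$2" '
        { ts = substr($0, 1, 19) }
        ts >= start && ts <= end
    '
}

# Sample lines taken from the log excerpt above.
sample_log='2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now'

# Keep only the entries logged between 22:00:00 and 22:59:59.
printf '%s\n' "$sample_log" | log_range "2014-09-20 22:00:00" "2014-09-20 22:59:59"
```

Lines that do not begin with a timestamp, such as stack-trace continuation lines, are dropped by this filter; for a real file the input would come from the log itself, for example log_range "start" "end" < hadoop.log.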
How to change number of replications of certain files in HDFS?
HDFS has a configuration in hdfs-site.xml to set the global replication number of blocks with the "dfs.replication" property. However, there are some "hot" files that are accessed by many nodes. How to increase the number of replicas for these particular files in HDFS? You can set the replication number of a certain file to 10: hdfs
Read more
Making Hadoop Java process heap larger?
In Hadoop 2.5.0, I use 'ps -aux' and find the Java process has the option -Xmx1000m. However, my nodes have 32GB memory. How to make the Hadoop Java process heap larger? In yarn-env.sh, you can find:
# For setting YARN specific HEAP sizes please use this
# Parameter and set appropriately
# YARN_HEAPSIZE=1000
In hadoop-env.sh, you can
Read more
if (p:string list) = c (is the only element)
if (p:string list) = [c] then (divide p1 c) reports "unbound value c". I want to test whether p has exactly one element (c, of any type) and then use that element as a variable.
let p = ["ocaml"]
let f s = match s with
  | [c] -> print_endline c
  | _ -> print_endline "ops"
let () = f p
It will print:
Read more
How to read email in Maildir on Linux?
How to read email in Maildir on Linux? You can use mutt with:
mutt -f /path/to/mail/dir/