The logs I am processing are Hadoop logs (log4j). They are in a format like:
2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
Now, I
Read more
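The excerpt is cut off before it says what should be done with these logs, so the following is only a minimal sketch of splitting one such line into its fields. The regular expression and the name parse_log_line are assumptions made for illustration, based on the sample lines shown (timestamp with milliseconds, level, logger name, message).

```python
import re
from datetime import datetime

# Layout inferred from the sample lines:
# "<date> <time>,<millis> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[\w.$]+): "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None when the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    # %f accepts the three millisecond digits that follow the comma.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return fields


sample = (
    "2014-09-20 21:55:11,855 INFO "
    "org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36"
)
record = parse_log_line(sample)
print(record["level"], record["logger"], record["message"])
```

Lines that do not start with a timestamp (for example, stack trace continuation lines) return None here, so a caller can decide whether to skip them or attach them to the previous record.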
Tag: map
How to set the number of mappers and reducers of Hadoop in command line?
How to set the number of mappers and reducers of Hadoop on the command line? The number of mappers and reducers can be set on the command line like this (5 mappers, 2 reducers):
-D mapred.map.tasks=5 -D mapred.reduce.tasks=2
In the code, one can configure JobConf variables:
job.setNumMapTasks(5); // 5 mappers
job.setNumReduceTasks(2); // 2 reducers
Note that on Hadoop
Read more
How to find which hard drives a LVM volume uses?
How to find which hard drives an LVM volume uses, to decide which volume will be affected if a disk fails? You can use lvdisplay with the --maps option to display the mapping of logical extents to physical volumes and physical extents:
# lvdisplay --maps
To map physical extents to logical extents, use #
Read more
How to map Win key to Ctrl on Linux?
How to map the Win key to another Ctrl on Linux? You can set it in gnome-tweak-tool in GNOME 3 by setting the “Alt/Win key behavior”.
How to remap the Caps Lock key to Control for Emacs
How to remap the Caps Lock key to Control for Emacs? My left little finger is just so tired… You can either change it in gnome-tweak-tool on GNOME 3, or make use of two tools: xev to find out the key code for Caps Lock and xmodmap to modify key maps. First, run
Read more
How to choose the number of mappers and reducers in Hadoop
How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki discusses this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
Read more
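As a rough illustration of the point that the number of maps is usually driven by the number of DFS blocks, here is a small sketch that estimates the map count from the input size and the block size. The function name estimate_map_tasks is hypothetical, and the sketch assumes the simplest case: the whole input is treated as one splittable stream with one map per block. Real jobs count blocks per file and can be affected by the input format and split settings.

```python
def estimate_map_tasks(input_bytes, block_bytes):
    """Ceiling division: one map task per (possibly partial) DFS block.

    Simplifying assumption: the input is treated as a single splittable
    stream, so small files and custom split sizes are ignored.
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    if input_bytes < 0:
        raise ValueError("input_bytes must not be negative")
    return -(-input_bytes // block_bytes)


TIB = 1024 ** 4
MIB = 1024 ** 2

# 10 TiB of input with 128 MiB blocks.
print(estimate_map_tasks(10 * TIB, 128 * MIB))  # 81920
```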
How to quickly find out failed disks’ SATA port in Linux? (how to map Linux disk names to SATA ports)
I found that one disk has failed on my server, which has several disks installed. I know the disk’s name in Linux (e.g. sda, sdb). However, the mapping from Linux disk names to SATA ports does not follow the same order. Now, I want to find the failed disks. How to quickly find them and which
Read more
Mmaping Memory Range Larger Than the Total Size of Physical Memory and Swap
In Linux, mmap() is a system call that is used to map a portion of a file or device into the memory of a process. This can be useful for a variety of purposes, such as memory-mapped I/O, shared memory, and virtual memory management. However, when mapping a large range of memory that is larger
Read more
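To make the idea of mapping a range and touching only part of it concrete, here is a minimal sketch using Python's mmap module with an anonymous mapping. The function name touch_sparse_mapping and the 64 MiB size are assumptions chosen so the example runs safely anywhere; it does not attempt a mapping larger than physical memory plus swap, and whether such a mapping succeeds depends on the kernel's overcommit settings.

```python
import mmap


def touch_sparse_mapping(size_bytes, offsets):
    """Map size_bytes of anonymous memory and touch only the given offsets.

    Returns the bytes read back from those offsets. Pages that are never
    written are typically not backed by physical memory until first use.
    """
    with mmap.mmap(-1, size_bytes) as region:  # -1 requests an anonymous mapping
        for offset in offsets:
            region[offset] = 0xAB
        return [region[offset] for offset in offsets]


size = 64 * 1024 * 1024  # deliberately small; raise with care
values = touch_sparse_mapping(size, [0, size // 2, size - 1])
print(values)
```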
Where Does Evolution Save Its Data and Configuration Files on Linux?
Evolution is a great personal information management tool that provides email, address book and calendar tools. Evolution provides many enterprise-friendly features, such as native support for Microsoft Exchange connectivity for email, address books and calendars. Evolution stores its data and configuration in various ways, including plain files and the dconf configuration system. This post will give an introduction to the
Read more
Building and Installing Linux Kernel from the Source Code in an Existing Linux OS
Building the Linux kernel may sound like a complex, geek-only task. However, as the Linux kernel itself depends on far fewer tools/packages than other software packages, it is quite easy to compile, build and install a Linux kernel from the source code in an existing Linux OS. Building the Linux kernel is needed if you need to
Read more
How to Turn GNOME terminal to a Pop-up Terminal
A pop-up terminal is great and handy on Linux and similar OSes. On KDE, Yakuake is great. On GNOME or GTK, I once tried Guake. It is quite good. However, it is not as mature, stable and feature-rich as gnome-terminal. One day, I got this idea: why not use a script/program to manage the
Read more
Hadoop Installation Tutorial (Hadoop 2.x)
Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was initially designed and implemented by Google for processing and generating large data
Read more
Keyboard Key Mapping for Emacs: Evil Mode and Rearranging Alt, Ctrl and Win Keys
Ctrl keys are important and possibly the most frequently used keys in Emacs. However, using them is painful on today’s common PC keyboards since Ctrl keys are usually in the corner of the keyboard main area. Why are the key mappings in Emacs designed like this? After it was designed, Emacs was commonly on the Lisp Machine keyboards
Read more
How to Find Out Failed Disks’ SATA Ports in Linux
The Linux disk names (e.g. sda1, hdb3, etc.) are not reliable: they may change if there are hardware changes, such as adding or removing a disk. Additionally, the order of the Linux device names is not always the same as the order of the SATA ports. For example, the disk connected to SATA port 0 (first
Read more
Improving Font Rendering for Fedora Using Bytecode Interpreter
Fedora’s font rendering isn’t very nice, at least on my laptop with Fedora 12. The Bytecode Interpreter (BCI for short) is disabled by default because of patent issues. As the TrueType bytecode patents have expired, we may enable BCI in Fedora now. FreeType announced that BCI is enabled by default from 2.4. Fedora 12’s FreeType version
Read more
Finding out Linux Network Configuration Information
Linux has various kinds of network configuration information, and many tools can be used to find it. This post introduces how to find this information, using Fedora Linux as the example. IP address, MAC address and netmask: ifconfig will print out all the network interfaces and their information including the IP address
Read more
Hadoop TeraSort Benchmark
TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort conducts the sorting. Here, we provide a short tutorial for using the Hadoop TeraSort benchmark. TeraGen generates random data that can be used as input data for a subsequent running
Read more
Hadoop Installation Tutorial (Hadoop 1.x)
Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce that was initially designed
Read more
A Simple Sort Benchmark on Hadoop
After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop installation tutorial post, we show a very simple example that greps strings from a simple set of files. In this post, we introduce the Sort program for testing and benchmarking Hadoop. The Sort program is also
Read more
mrcc – A Distributed C Compiler System on MapReduce
The mrcc project’s homepage is here: mrcc project. Abstract: mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other cloud computing platforms, such as MRlite,
Read more