Setting Up Standalone (Local) Hadoop

By Eric Ma | Apr 6, 2011 | Apr 5, 2016

Hadoop is designed to run on [[hadoop-installation-tutorial|hundreds to thousands of computers]] inside a cluster. By default, however, Hadoop is configured to run in non-distributed mode, as a single Java process. This is especially useful for debugging, since debugging a distributed system is much harder. This post shows how to set up a standalone Hadoop environment.

1. Hadoop package and software installation

Follow the instructions in the “1. Install needed packages” part of [[hadoop-installation-tutorial|Hadoop Installation Tutorial]] to install the packages. Then follow “4. Hadoop Configurations” to configure hadoop-env.sh (this file only).
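As a rough sketch of what that configuration step usually amounts to: in hadoop-env.sh, the setting that most often needs attention is JAVA_HOME. The JDK path below is a placeholder and an assumption, not a value taken from the installation tutorial; replace it with the location of the JDK on your machine.

```shell
# conf/hadoop-env.sh (fragment)
# Assumption: the JDK lives at this path. It is a placeholder only;
# point JAVA_HOME at the JDK actually installed on your system.
export JAVA_HOME=/usr/lib/jvm/default-java
```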

2. Just run Hadoop!

In standalone mode, Hadoop jobs read their input from, and write their output to, local directories. We use a simple example to show how to start a Hadoop job.

The example finds and displays every match of the given regular expression in the input files. Output is written to the given output directory. Run the commands from the Hadoop installation directory:

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*

The jar file’s name depends on the version of the Hadoop distribution; the name above is from Hadoop 0.21.0. The output directory must not exist before the job starts.
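To get a feel for what the job computes, the same kind of result can be approximated with standard Unix tools. This is only a sketch, not how Hadoop implements the example: it assumes the grep example reports each matched string together with its number of occurrences, ordered by count. The sample file and its contents are hypothetical stand-ins for conf/*.xml.

```shell
# Local approximation of the grep example using standard Unix tools.
# The sample file below is hypothetical; it stands in for conf/*.xml.
workdir=$(mktemp -d)
mkdir "$workdir/input"
printf '%s\n' \
  '<name>fs.default.name</name>' \
  '<name>fs.default.name</name>' \
  '<value>local</value>' > "$workdir/input/sample.xml"

# -o prints each match on its own line, -h drops file names,
# -E enables extended regular expressions (needed for '+').
# Then count identical matches and sort by count, highest first.
result=$(grep -ohE '[a-z.]+' "$workdir"/input/*.xml | sort | uniq -c | sort -rn)
echo "$result"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Comparing this output with `cat output/*` on the same input is a quick sanity check that the standalone setup works.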

Simple, isn’t it? Enjoy it, and then move on to the [[hadoop-installation-tutorial|Fully-distributed Hadoop Installation]].

