Setting Up Standalone (Local) Hadoop
Hadoop is designed to run on [[hadoop-installation-tutorial|hundreds to thousands of computers]] in a cluster. By default, however, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is especially useful for debugging, since debugging a distributed job is much harder. This post shows how to set up a standalone Hadoop environment.
1. Hadoop package and software installation
Follow the “1. Install needed packages” part of the [[hadoop-installation-tutorial|Hadoop Installation Tutorial]] to install the required packages. Then follow “4. Hadoop Configurations” to configure hadoop-env.sh (this file only).
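For orientation, the setting that usually matters in hadoop-env.sh is JAVA_HOME. The line below is only a sketch: the path is an assumed placeholder, not a value taken from the installation tutorial, so replace it with wherever your JDK actually lives.

```shell
# In conf/hadoop-env.sh: tell Hadoop which Java installation to use.
# NOTE: the path below is a placeholder (an assumption); point it at your own JDK.
export JAVA_HOME=/usr/lib/jvm/default-java
```

No other configuration file needs to be edited for standalone mode.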
2. Just run Hadoop!
Simply run Hadoop jobs whose input and output are in local directories. We use a simple example to show how to start one.
The example finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*
The jar file’s name may be different depending on the Hadoop distribution’s version.
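As a sanity check on the job's output, roughly the same result can be computed with standard Unix tools. This is a sketch for comparison, not how Hadoop implements the example. The helper name count_matches is hypothetical, and it assumes a grep that supports the -o and -h options (GNU and BSD grep do).

```shell
#!/bin/sh
# Rough local equivalent of the Hadoop grep example (illustration only).
# count_matches is a hypothetical helper name, not part of Hadoop.
count_matches() {
    dir=$1
    pattern=$2
    # -o prints each match on its own line, -h omits file names,
    # -E enables extended regular expressions.
    # sort | uniq -c counts identical matches; sort -rn lists the most frequent first.
    grep -ohE -- "$pattern" "$dir"/* | sort | uniq -c | sort -rn
}

# Usage sketch, run from the directory that contains input/:
# count_matches input '[a-z.]+'
```

Each line it prints is a count followed by the matched string, which can be compared against the contents of output/. The order of entries with equal counts may differ.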
Simple, isn't it? Enjoy it, and go further with the [[hadoop-installation-tutorial|Fully-distributed Hadoop Installation]].