Hadoop TeraSort Benchmark

TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark.

TeraGen generates random data that can be used as input for a subsequent TeraSort run.

Generate input with TeraGen

The syntax for TeraGen:

$ hadoop jar hadoop-*examples*.jar teragen \
<number of 100-byte rows> <output dir>
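
Because every TeraGen row is 100 bytes, the row count is simply the target data size divided by 100. The sketch below does that arithmetic in the shell and prints, rather than runs, the resulting command. The helper name `rows_for_bytes` and the output path `/user/demo/terasort-input` are hypothetical and used only for illustration.

```shell
# Each TeraGen row is 100 bytes, so rows = target size in bytes / 100.
# rows_for_bytes is a hypothetical helper, not part of Hadoop.
rows_for_bytes() {
    echo $(( $1 / 100 ))
}

# 1 TB (10^12 bytes) corresponds to 10^10 rows.
ROWS=$(rows_for_bytes 1000000000000)

# Print the command instead of running it; the output dir is a placeholder.
echo "hadoop jar hadoop-*examples*.jar teragen ${ROWS} /user/demo/terasort-input"
```

Remove the `echo` and the quotes to actually launch the job once the row count and output directory look right for your cluster.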

To make TeraGen run on multiple nodes with multiple tasks, you may need to specify the number of map tasks (30 here as an example; the property name is the Hadoop 2 one). Generic options such as -D go after the program name, and the property is given as name=value:

$ hadoop jar hadoop-*examples*.jar teragen \
-Dmapreduce.job.maps=30 \
<number of 100-byte rows> <output dir>

The number of mappers depends on the number of rows you will generate and the number of nodes you have. For more information on how to set the number of mappers and reducers, please check this post.

Run TeraSort

After the data is generated, run the sort with TeraSort:

$ hadoop jar hadoop-*examples*.jar terasort \
<input dir> <output dir>

You may also need to set the number of mappers and reducers for better performance.
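
As a rough sketch of how that might look for Hadoop 2: the reducer count can be passed with the `mapreduce.job.reduces` property, which defaults to 1 unless your cluster configuration overrides it. The number of map tasks in TeraSort generally follows the number of input splits, so it is usually the reducer count that needs setting. The helper name `terasort_cmd`, the value 30, and the directory names below are hypothetical, and the helper only prints the command.

```shell
# Print (not run) a TeraSort command with an explicit reducer count.
# terasort_cmd is a hypothetical helper; the directories are placeholders.
terasort_cmd() {
    reduces=$1
    input_dir=$2
    output_dir=$3
    echo "hadoop jar hadoop-*examples*.jar terasort -Dmapreduce.job.reduces=${reduces} ${input_dir} ${output_dir}"
}

# Example: 30 reducers, chosen arbitrarily for illustration.
terasort_cmd 30 /user/demo/terasort-input /user/demo/terasort-output
```

A sensible reducer count depends on the cluster size and the data volume, so treat 30 as a starting point to tune rather than a recommendation.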

Validate the sorted output data of TeraSort

TeraValidate ensures that the output data of TeraSort is globally sorted.

The syntax for TeraValidate:

$ hadoop jar hadoop-*examples*.jar teravalidate \
<terasort output dir> <terasort-validate dir>
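
One way to check the result is to look at what TeraValidate wrote into its own output directory. To our knowledge, it writes a checksum record and, if any keys are out of order, records whose key is `error`; treat that output format as an assumption and confirm it against your Hadoop version. The helper name `validate_ok` and the part file name in the comment are hypothetical.

```shell
# Read TeraValidate output on stdin and succeed only if no line
# starts with "error" (assumed marker for misordered keys).
# validate_ok is a hypothetical helper, not part of Hadoop.
validate_ok() {
    if grep -q '^error'; then
        return 1
    fi
    return 0
}

# Intended usage on a cluster (left commented out so the sketch
# runs without Hadoop; the part file name is an assumption):
#   hadoop fs -cat <terasort-validate dir>/part-r-00000 | validate_ok \
#       && echo "output is sorted" || echo "validation reported errors"
```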
