PUMA: A MapReduce Benchmark Suite

Posted on Dec 20, 2012 by Eric Ma In Computing systems, News

MapReduce is a well-known programming model for generating and processing large data sets. It has several implementations, of which Hadoop is probably the most widely known and used. With these frameworks in wide use, benchmarking them has become important.

Faraz Ahmad et al. developed such a benchmark suite, the PUMA MapReduce Benchmark, which they describe as follows:

During our work on MapReduce, we developed a benchmark suite which represents a broad range of MapReduce applications exhibiting application characteristics with high/low computation and high/low shuffle volumes. There are a total of 13 benchmarks, out of which Tera-Sort, Word-Count, and Grep are from Hadoop distribution. The rest of the benchmarks were developed in-house and are currently not part of the Hadoop distribution.
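To make the computation versus shuffle distinction concrete, here is a minimal single-process sketch of Word-Count in Python. It is not PUMA's code and does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. The number of intermediate pairs stands in, roughly, for shuffle volume.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group values by key. On a real cluster this step moves data
    between machines; here it is just an in-memory dictionary."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases and report how many pairs were shuffled."""
    pairs = list(map_phase(lines))
    counts = reduce_phase(shuffle_phase(pairs))
    return counts, len(pairs)


if __name__ == "__main__":
    counts, shuffled = word_count(["a b a", "B c"])
    print(counts, shuffled)
```

In this sketch every input word produces one intermediate pair, so the shuffle grows with the input size. A filter-style job that emits only matching records would send far fewer pairs through the same step, which is the kind of contrast a suite with both high and low shuffle volumes is meant to cover.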

One strength of the suite is that it provides both the source code and the datasets, which makes benchmark results easier to reproduce and compare.

The benchmark source code and datasets can be downloaded here.

3 comments

Update: the links to the PUMA homepage and datasets in the post have been updated.

The links on Google don’t work.

Eric Zhiqiang Ma says:

Nov 19, 2015 at 12:24 am

Hi Evan, thanks for reporting the broken links. I have updated the post with the new links.
