How to set replication factors for HDFS directories?

Posted on

Is it possible to set the replication factor for specific directory in HDFS to be one that is different from the default replication factor? This should set the existing files’ replication factors but also new files created in the specific directory. This can simplify the administration. We can set the replication factor of /tmp/ to
Read more

How to get logs of a specific time range on Linux?

Posted on

The logs I am processing is Hadoop log (log4j). It is in format like: 2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36 2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55 2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because ‘/etc/nfs.map’ does not exist. Now, I
Read more

Good introductions to Hadoop 2.0 (YARN)?

Posted on

Which ones are recommended introductions to Hadoop 2.0 (YARN)? Pointers to webpages are good. Those are good ones that I find: The SoCC13 paper “Apache Hadoop YARN: Yet Another Resource Negotiator” by Vinod Kumar Vavilapalli et al.: http://www.socc2013.org/home/program/a5-vavilapalli.pdf The introduction from Hortonworks by Arun Murthy:http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/ The “Official” one from Apache Hadoop website (very brief):https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html

Hadoop 2 (YARN) default configuration values

Posted on

Where to check the default Hadoop 2 (YARN) configuration values for: HDFS: hdfs-site.xml YARN: yarn-site.xml MapReduce: mapred-site.xml Default Hadoop 2 (YARN) configuration values for Hadoop 2.2.0 from Apache Hadoop website: HDFS: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml YARN: https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml MapReduce: https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Add header footer in directory listing in Apache (httpd)

Posted on

How to add header footer in directory listing in Apache (httpd)? In the web directory’s .htaccess file: Options +Indexes IndexOptions FancyIndexing VersionSort NameWidth=* HTMLTable Charset=UTF-8 HeaderName /header.html ReadmeName /footer.html IndexIgnore header.html footer.html .htaccess header.html and footer.html are under the website root directory (not the Linux root).

Installing WordPress in a sub directory while working for the whole site

Posted on

How to install WordPress in a sub directory while working for the whole site? Putting the WordPress files in the root directory seems messy. A method is introduced here: http://codex.wordpress.org/Giving_WordPress_Its_Own_Directory The process to move WordPress into its own directory is as follows: Create the new location for the core WordPress files to be stored (we
Read more

How to change the maximum execution time and memory size allowed for PHP

Posted on

I see this message in the error log of httpd: PHP Fatal error: Maximum execution time of 30 seconds exceeded in and PHP Fatal error: Allowed memory size of 268435456 bytes exhausted How to change them to a longer and larger value? To change the allowed maximum memory usage of PHP: Set memory_limit = 256M
Read more

How to choose the number of mappers and reducers in Hadoop

Posted on

How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
Read more

SQL layers on NoSQL databases

Posted on

What are the SQL layer solution over NoSQL databases such as key/value stores? Phoenix: A SQL layer on HBase: https://github.com/forcedotcom/phoenix They also show some performance results: https://github.com/forcedotcom/phoenix/wiki/Performance F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business: http://research.google.com/pubs/pub38125.html With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding,
Read more

How to force a metadata checkpointing in HDFS

Posted on

The metadata checkpointing in HDFS is done by the Secondary NameNode to merge the fsimage and the edits log files periodically and keep edits log size within a limit. For various reasons, the checkpointing by the Secondary NameNode may fail. For one example, HDFS SecondaraNameNode log shows errors in its log as follows. 2017-08-06 10:54:14,488
Read more

What Is the Name of the Linux-based OS: A Survey

Posted on

You may already well know “Linux” and may also use the “operating system based on the Linux kernel” directly or indirectly (you are indirectly using it now as this site is hosted on Linux). But how should we name the OS based on Linux? You may know there is GNU/Linux naming controversy. Different people have
Read more

Hadoop Installation Tutorial (Hadoop 2.x)

Posted on

Hadoop 2 or YARN is the new version of Hadoop. It adds the yarn resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce designed and implemented by Google initially for processing and generating large data
Read more

Speeding Up the Site With Apache GZIP Compression

Posted on

We can speed up the site with compression while save bandwidth at the same time. As most of the modern browsers support gzip encoding, we can set it up to let the users enjoy faster speed. The Apache mod_deflate is easy to set up and standard. It compress the content on the fly. We can
Read more

How to Change the Site’s Default 404 Error Not Found Page

Posted on

The apache’s default “404 Error not found” page seems ugly. And may some hosting service put theire ads in it. We can add some entry in .htaccess to change the defualt 404 error page. This method can also be used for some other error codes. A list of the server returned codes can be found
Read more

How to Redirect Old Domain to New Domain Using htaccess Redirect

Posted on

I want to move the sub domain blog.pkill.info to systutorials.com permanently. I can manage all the pages I want to post using WordPress. Changing domain in a bad way is dangerous. Put a page the tell the reader that the site is moved to a new domain is very unfriendly to the user and also
Read more

Hadoop Installation Tutorial (Hadoop 1.x)

Posted on

Update: If you are new to Hadoop and trying to install one. Please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce that is initially designed
Read more

mrcc – A Distributed C Compiler System on MapReduce

Posted on

The mrcc project’s homepage is here: mrcc project. Abstract mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of the cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other could computing platforms, such as MRlite,
Read more