Is it possible to set the replication factor for a specific directory in HDFS to one that is different from the default replication factor? This should apply to the existing files’ replication factors and also to new files created in that directory. This can simplify administration. We can set the replication factor of /tmp/ to
Tag: Apache
How to get logs of a specific time range on Linux?
The logs I am processing are Hadoop logs (log4j). They are in a format like:
2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36
2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now
2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
Now, I
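A minimal sketch of one way to select log4j lines within a time range, assuming every entry starts with a timestamp such as 2014-09-20 21:55:11,855. The names parse_ts and filter_by_time and the chosen time window are illustrative assumptions, not part of the original post. Lines without a leading timestamp (for example stack-trace continuations) are treated as belonging to the preceding entry.

```python
from datetime import datetime

# log4j default timestamp: date, time, then milliseconds after a comma
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2014-09-20 21:55:11,855")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def filter_by_time(lines, start, end):
    """Keep entries whose timestamp lies in [start, end], inclusive."""
    selected = []
    in_range = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            # A new entry begins; decide whether it falls inside the window.
            in_range = start <= ts <= end
        if in_range:
            # Continuation lines inherit the decision of the entry above them.
            selected.append(line)
    return selected


sample = [
    "2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36",
    "2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55",
    "2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now",
]

# Hypothetical window: 22:00 to 22:30 on 2014-09-20
start = datetime(2014, 9, 20, 22, 0, 0)
end = datetime(2014, 9, 20, 22, 30, 0)

for line in filter_by_time(sample, start, end):
    print(line)
```

For a real log file, the same function can be fed an open file object instead of the sample list, since it only iterates over lines.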
How to redirect HTTP to HTTPS in Apache with mod_rewrite?
I’d like to force the use of HTTPS on one of my sites. How to redirect HTTP to HTTPS in Apache with mod_rewrite?
You can put these lines in the .htaccess file in the directory from which you would like to redirect HTTP to HTTPS:
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}
Good introductions to Hadoop 2.0 (YARN)?
Which introductions to Hadoop 2.0 (YARN) are recommended? Pointers to webpages are welcome.
These are good ones that I have found:
The SoCC13 paper “Apache Hadoop YARN: Yet Another Resource Negotiator” by Vinod Kumar Vavilapalli et al.: http://www.socc2013.org/home/program/a5-vavilapalli.pdf
The introduction from Hortonworks by Arun Murthy: http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
The “official” one from the Apache Hadoop website (very brief): https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
Hadoop 2 (YARN) default configuration values
Where to check the default Hadoop 2 (YARN) configuration values for:
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce: mapred-site.xml
Default Hadoop 2 (YARN) configuration values for Hadoop 2.2.0 from the Apache Hadoop website:
HDFS: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
YARN: https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
MapReduce: https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
Add header footer in directory listing in Apache (httpd)
How to add a header and footer to the directory listing in Apache (httpd)?
In the web directory’s .htaccess file:
Options +Indexes
IndexOptions FancyIndexing VersionSort NameWidth=* HTMLTable Charset=UTF-8
HeaderName /header.html
ReadmeName /footer.html
IndexIgnore header.html footer.html .htaccess
header.html and footer.html are under the website root directory (not the Linux root).
Installing WordPress in a sub directory while working for the whole site
How to install WordPress in a sub directory while working for the whole site? Putting the WordPress files in the root directory seems messy. A method is introduced here: http://codex.wordpress.org/Giving_WordPress_Its_Own_Directory The process to move WordPress into its own directory is as follows: Create the new location for the core WordPress files to be stored (we
How to change the maximum execution time and memory size allowed for PHP
I see these messages in the error log of httpd:
PHP Fatal error: Maximum execution time of 30 seconds exceeded in
and
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted
How to change these limits to a longer time and a larger size? To change the maximum memory that PHP is allowed to use: Set memory_limit = 256M
How to choose the number of mappers and reducers in Hadoop
How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki discusses this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
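As a rough illustration of the point that the number of maps is usually driven by the number of DFS blocks in the input files, the sketch below estimates the map count for a set of inputs. The function name, the 128 MB block size, and the file sizes are assumptions chosen for illustration. The actual number of splits depends on the InputFormat and the job configuration.

```python
def estimate_num_maps(file_sizes_bytes, block_size_bytes):
    """Estimate map tasks as one per DFS block, summed over the input files."""
    # Ceiling division: a partially filled last block still needs its own map.
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes)


MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed DFS block size

# Hypothetical input files: 300 MB, 128 MB, and 1 MB
input_files = [300 * MB, 128 * MB, 1 * MB]

print(estimate_num_maps(input_files, BLOCK_SIZE))
```

Under this one-map-per-block assumption, many small files produce as many maps as files, which is one reason small inputs are often merged before a job runs.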
SQL layers on NoSQL databases
What SQL layer solutions are available over NoSQL databases such as key/value stores?
Phoenix: a SQL layer on HBase: https://github.com/forcedotcom/phoenix
They also show some performance results: https://github.com/forcedotcom/phoenix/wiki/Performance
F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business: http://research.google.com/pubs/pub38125.html
With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding,
How to install php on Apache on Fedora?
How to install PHP on Apache on Fedora? The basic support (basic PHP support, no caching, etc.) should be enough.
First, install Apache2 (httpd):
# yum install httpd
Then, enable PHP support:
# yum install php
Remember to restart httpd after you install PHP:
# service httpd restart
.htaccess: How to disable directory listing?
How to disable directory listing using .htaccess? The web server (Apache) should allow downloading files in a directory and its child directories but forbid listing the directory and its child directories. In the directory for which you want to disable directory listing, create a .htaccess file that contains: Options -Indexes This can also be done by Options
How to force a metadata checkpointing in HDFS
Metadata checkpointing in HDFS is done by the Secondary NameNode, which periodically merges the fsimage and the edits log files and keeps the edits log size within a limit. For various reasons, checkpointing by the Secondary NameNode may fail. For example, the HDFS SecondaryNameNode log shows errors as follows. 2017-08-06 10:54:14,488
What Is the Name of the Linux-based OS: A Survey
You may already know “Linux” well and may also use the “operating system based on the Linux kernel” directly or indirectly (you are indirectly using it now as this site is hosted on Linux). But how should we name the OS based on Linux? You may know there is a GNU/Linux naming controversy. Different people have
Hadoop Installation Tutorial (Hadoop 2.x)
Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications. It is an open-source variant of the MapReduce initially designed and implemented by Google for processing and generating large data
Speeding Up the Site With Apache GZIP Compression
We can speed up the site with compression while saving bandwidth at the same time. As most modern browsers support gzip encoding, we can set it up to let users enjoy faster speed. Apache’s mod_deflate is standard and easy to set up. It compresses the content on the fly. We can
How to Change the Site’s Default 404 Error Not Found Page
Apache’s default “404 Error not found” page looks ugly, and some hosting services may put their ads in it. We can add an entry in .htaccess to change the default 404 error page. This method can also be used for some other error codes. A list of the codes returned by the server can be found
How to Redirect Old Domain to New Domain Using htaccess Redirect
I want to move the subdomain blog.pkill.info to systutorials.com permanently. I can manage all the pages I want to post using WordPress. Changing the domain in a bad way is dangerous. Putting up a page that tells the reader that the site has moved to a new domain is very unfriendly to the user and also
Hadoop Installation Tutorial (Hadoop 1.x)
Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce that was initially designed
mrcc – A Distributed C Compiler System on MapReduce
The mrcc project’s homepage is here: mrcc project. Abstract: mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of the cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other cloud computing platforms, such as MRlite,