How to skip the mapper function in Hadoop

In Hadoop, I need to skip the mapper function and directly execute the reducer function. We are doing this to improve Hadoop performance: if the Hadoop framework is used to analyze the same data sets, the mapper’s output will be the same for different kinds of jobs. To avoid redundant computation of the same results, I am planning to
Read more

How to adjust the system partition (C:) size of Windows?

The disk management tools of Windows can adjust it to some extent. But there is more space available as far as I can tell. How to further adjust the system partition (C:) size of Windows? You may check these tools: EASEUS Partition Master (free): includes Partition Manager, Disk & Partition Copy Wizard and Partition Recovery
Read more

Enlarging Linux UDP buffer size

One of the most common causes of UDP datagram loss on Linux is an undersized receive buffer on the socket. How to enlarge the Linux UDP buffer size? On Linux, you can change the maximum UDP receive buffer size (e.g. to 26214400) by running, as root: sysctl -w net.core.rmem_max=26214400 The default buffer size on Linux is 131071.
Read more
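
As a rough illustration of the application side of this setting, the sketch below asks the kernel for a larger receive buffer on a single UDP socket and reads back what was actually granted. The helper name request_udp_rcvbuf is hypothetical and not from the post. On Linux, the request is limited by net.core.rmem_max, so the granted value can be smaller than the requested one until the sysctl limit is raised.

```python
import socket


def request_udp_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the kernel granted.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask for the buffer size; the kernel may cap the request.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # Read back the effective size reported by the kernel.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


granted = request_udp_rcvbuf(26214400)
print("requested 26214400 bytes, kernel reports", granted)
```

Comparing the reported value before and after running the sysctl command is one way to check whether the new limit took effect for newly created sockets.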

How to choose the number of mappers and reducers in Hadoop

How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
Read more
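
To make the "number of DFS blocks" point concrete, here is a small back-of-the-envelope sketch. It assumes one map task per block of each input file, which is only an approximation of how Hadoop computes input splits. The function name estimate_num_maps and the 128 MiB block size are assumptions for illustration, not values from the post.

```python
def estimate_num_maps(file_sizes_bytes, block_size_bytes):
    """Approximate the number of map tasks as one per DFS block per file.

    Hypothetical helper: real split computation depends on the input
    format and job configuration.
    """
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division: blocks needed to hold this file.
            total += -(-size // block_size_bytes)
    return total


# Assumed example: one 10 GiB input file with an assumed 128 MiB block size.
block_size = 128 * 1024 ** 2
print(estimate_num_maps([10 * 1024 ** 3], block_size))  # 80
```

Note that many small files each count for at least one block under this estimate, which is why a large number of small inputs can produce far more maps than their total size suggests.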

Is cin much slower than scanf in C++?

I frequently hear that cin is significantly slower than scanf in C++. Is this true? And how to improve the efficiency of cin? It is really nice to use most of the time. One discussion claiming that cin is very slow is here: http://apps.topcoder.com/forums/?module=Thread&threadID=508058&start=0&mc=7 In short: cin is not always slower (can be faster actually, see
Read more

SQL layers on NoSQL databases

What are the SQL layer solutions over NoSQL databases such as key/value stores? Phoenix: A SQL layer on HBase: https://github.com/forcedotcom/phoenix They also show some performance results: https://github.com/forcedotcom/phoenix/wiki/Performance F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business: http://research.google.com/pubs/pub38125.html With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding,
Read more

Are there good free CDNs on the Web

Are there some good free CDNs on the Web? There are some free CDNs on the Web. Cloudflare: https://www.cloudflare.com CloudFlare protects and accelerates any website online. Once your website is a part of the CloudFlare community, its web traffic is routed through our intelligent global network. We automatically optimize the delivery of your web pages
Read more

QEMU/KVM Network Mechanisms

Introduction As we know, network subsystems are important in computer systems since they are I/O systems and need to be optimized with many algorithms and techniques. This article introduces how the network part of QEMU/KVM [2] works. To keep everything simple and easy to understand, we will begin with several examples and then understand
Read more

I/O Microscopy: Tasks’ Disk I/O Information with High Accuracy

Abstract Most popular task monitoring systems (such as top, iotop, proc, etc.) can only get tasks’ disk I/O information, such as tasks’ I/O utilization percentage, once every second due to the kernel timer/tick frequency and the high time cost of system interfaces. This article presents I/O Microscopy, a new way to get tasks’ disk I/O information with high accuracy.
Read more

Make Better Decisions for Your Businesses with Data Visualization

Today, data visualization has become a significant part of an organization’s success story. With the right techniques, visualizing data can reveal insights that management can use to make sound data-driven decisions. Mapping software is among the robust data visualization tools that you
Read more

x-data-plane feature in QEMU/KVM

Abstract Systems sometimes use one global lock to synchronize different threads. QEMU/KVM (http://wiki.qemu.org/Main_Page) also follows this approach. However, this may cause lock contention, which decreases the performance and scalability of the whole system. To solve this problem in QEMU/KVM, the x-data-plane feature was designed and implemented, whose high-level idea is
Read more

How the migration thread works inside the Linux Kernel

Abstract In computer systems, resources have to be balanced to get better performance from the same hardware. In the Linux Kernel, we will see some migration kernel threads running as daemons to do this kind of job as follows. In this article, we will discuss how the Linux Kernel balances its hardware/software
Read more

Which Checksum Tool on Linux is Faster?

It is common practice to calculate the checksums of files to check their integrity. For large files, the checksum computation is slow. Now I am wondering why it is so slow and whether choosing another tool will be better. In this post, I try three common tools, md5sum, sha1sum and crc32, to compute checksums on
Read more
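
A comparison in the same spirit can be sketched with Python's standard library, using hashlib for MD5 and SHA-1 and zlib for CRC-32. This is not the post's measurement: it times library implementations on an in-memory buffer, not the md5sum, sha1sum and crc32 command-line tools on files, so the numbers are only indicative. The function name time_checksums and the 16 MiB test buffer are assumptions for illustration.

```python
import hashlib
import time
import zlib


def time_checksums(data):
    """Return {name: (hex_digest, seconds)} for MD5, SHA-1 and CRC-32.

    Hypothetical helper for illustration only.
    """
    algorithms = (
        ("md5", lambda d: hashlib.md5(d).hexdigest()),
        ("sha1", lambda d: hashlib.sha1(d).hexdigest()),
        # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
        ("crc32", lambda d: format(zlib.crc32(d), "08x")),
    )
    results = {}
    for name, compute in algorithms:
        start = time.perf_counter()
        digest = compute(data)
        results[name] = (digest, time.perf_counter() - start)
    return results


# Assumed workload: a 16 MiB buffer of zero bytes.
for name, (digest, seconds) in time_checksums(bytes(16 * 1024 * 1024)).items():
    print(f"{name}: {digest} in {seconds:.4f} s")
```

For files too large to hold in memory, the same functions can be fed in chunks (hashlib objects via update, zlib.crc32 via its running-value argument).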

What Is the Name of the Linux-based OS: A Survey

You may already know “Linux” well and may also use the “operating system based on the Linux kernel” directly or indirectly (you are indirectly using it now as this site is hosted on Linux). But how should we name the OS based on Linux? You may know there is a GNU/Linux naming controversy. Different people have
Read more

SSD Enabled For DreamHost Shared Hosting: Simple Performance Measurement

SSDs are common for VPS and PaaS virtual machines for higher I/O performance. Now, they are coming to shared hosting too. DreamHost states that “Now with solid state drives (SSDs), our standard web hosting loads pages 200% faster”. We are happy to see this performance improvement with the price kept the same. Good work,
Read more

Making GPT Partition Table and Creating Partitions Using parted in Linux

My favorite disk partition table manipulation tools on Linux are cfdisk/fdisk. However, for large disks, cfdisk/fdisk (as of the versions available when this post was written) will just give up with a message suggesting the GPT partition table format and GNU parted, like WARNING: The size of this disk is 6.0 TB (6001042391040 bytes). DOS partition
Read more

How to Create Fedora 20 Domain-U on Fedora 20 Domain-0

This post introduces how to create a file-backed virtual block device (VBD) and install Fedora 20 in a Xen DomU over the Internet. This domain is created on a Fedora 20 Dom0 as introduced in https://www.systutorials.com/installing-xen-on-fedora-20-as-domain-0/. For better performance, you may consider using an LVM-backed VM. Create file-backed VBD The actual space of VBD will
Read more

How to Install, Run and Uninstall VMware Player and VirtualBox on Fedora Linux

VMware Player and VirtualBox are two cool and free full virtualization solutions and both can run on top of a Linux host. In this post, I introduce how to install, run, and uninstall VMware Player and VirtualBox on Fedora Linux. VMware Player Install VMware Player Download the installation bundle from VMware’s website. For example, the
Read more

Improving ssh/scp Performance by Choosing Suitable Ciphers

Update on Oct. 9, 2014: You should be aware of the possible security problems of blowfish; using it is not recommended. Instead, you may consider ChaCha20 as suggested by Tony Arcieri. To use this with OpenSSH, you need to specify the Ciphers in your .ssh/config files as chacha20-poly1305@openssh.com possibly with another default
Read more