In Hadoop, I need to skip the mapper function and directly execute the reducer function. We are doing this to improve Hadoop performance: if the Hadoop framework is used to analyze the same data sets, then the mappers’ output will be the same for different kinds of jobs. To save the redundant computation for the same results, I am planning to
Tag: performance
How to adjust the system partition (C:) size of Windows?
The disk management tools of Windows can adjust it to some extent, but there is more space available as far as I can tell. How to further adjust the system partition (C:) size of Windows? You may check these tools: EASEUS Partition Master (free), which includes Partition Manager, Disk & Partition Copy Wizard and Partition Recovery
Enlarging Linux UDP buffer size
One of the most common causes of UDP datagram loss on Linux is an undersized receive buffer on the Linux socket. How to enlarge the Linux UDP buffer size? On Linux, you can change the maximum UDP receive buffer size (e.g. to 26214400) by running, as root: sysctl -w net.core.rmem_max=26214400 The default buffer size on Linux is 131071.
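The sysctl setting only raises the system-wide ceiling; a program still has to ask for a larger buffer on its own socket. Below is a minimal Python sketch of that step, assuming Linux behavior (the kernel silently caps the request at net.core.rmem_max and may report a larger value than requested because it accounts for bookkeeping overhead). The function name request_udp_rcvbuf is hypothetical and only for illustration.

```python
import socket


def request_udp_rcvbuf(requested_bytes):
    """Ask for a UDP receive buffer and return the size the kernel reports.

    Hypothetical helper: on Linux the granted size is capped by
    net.core.rmem_max, so the result can be smaller than requested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Request the buffer size for this socket only.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # Read back what the kernel actually applied.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    granted = request_udp_rcvbuf(26214400)
    print("requested 26214400 bytes, kernel reports", granted)
```

If the reported value stays far below the request, the rmem_max ceiling is probably still at its old value.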
How to improve ssh/scp performance on Linux?
ssh/scp are convenient and handy tools on Linux. Is it possible to further improve their speed/performance? Please check this post for how to improve ssh/scp performance: https://www.systutorials.com/5450/improving-sshscp-performance-by-choosing-ciphers/
How to choose the number of mappers and reducers in Hadoop
How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
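To make the "maps follow DFS blocks" point concrete, here is a rough Python sketch that estimates the number of maps from input file sizes. It assumes one map per DFS block and a 128 MiB block size; both are illustrative assumptions, since the real count depends on the cluster's block size and the job's input format and split settings. The function name estimate_num_maps is hypothetical.

```python
def estimate_num_maps(file_sizes_bytes, block_size_bytes=128 * 1024**2):
    """Rough estimate: one map per DFS block, counted per input file.

    Assumption for illustration only: the block size defaults to 128 MiB
    and split-size tuning is ignored.
    """
    # Ceiling division per file, because a block never spans two files.
    return sum(
        (size + block_size_bytes - 1) // block_size_bytes
        for size in file_sizes_bytes
    )


if __name__ == "__main__":
    ten_tib = 10 * 1024**4
    print(estimate_num_maps([ten_tib]))  # prints 81920
```

Many small files inflate the estimate quickly, because each file contributes at least one block of its own.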
Is cin much slower than scanf in C++?
I frequently hear that cin is significantly slower than scanf in C++. Is this true? And how to improve the efficiency of cin? It is really nice to use most of the time. One discussion claiming that cin is very slow is here: http://apps.topcoder.com/forums/?module=Thread&threadID=508058&start=0&mc=7 In short: cin is not always slower (can be faster actually, see
SQL layers on NoSQL databases
What are the SQL layer solutions over NoSQL databases such as key/value stores? Phoenix: A SQL layer on HBase: https://github.com/forcedotcom/phoenix They also show some performance results: https://github.com/forcedotcom/phoenix/wiki/Performance F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business: http://research.google.com/pubs/pub38125.html With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding,
Are there good free CDNs on the Web
Are there some good free CDNs on the Web? There are some free CDNs on the Web. Cloudflare: https://www.cloudflare.com CloudFlare protects and accelerates any website online. Once your website is a part of the CloudFlare community, its web traffic is routed through our intelligent global network. We automatically optimize the delivery of your web pages
QEMU/KVM Network Mechanisms
Introduction: As we know, network subsystems are important in computer systems since they are I/O systems and need to be optimized with many algorithms and techniques. This article introduces how the network part of QEMU/KVM [2] works. To keep everything simple and easy to understand, we will begin with several examples and then understand
I/O Microscopy: Tasks’ Disk I/O Information with High Accuracy
Abstract: Most popular task monitoring systems (such as top, iotop, proc, etc.) can only get tasks’ disk I/O information, such as tasks’ I/O utilization percentage, at a granularity of seconds, due to the kernel timer/tick frequency and the high time cost of system interfaces. This article presents I/O Microscopy, a new way to get tasks’ disk I/O information with high accuracy.
Make Better Decisions for Your Businesses with Data Visualization
Today, data visualization has become a significant part of the success story of an organization. With the help of the right techniques, visualizing data can reveal insights that management staff can use to make sound data-driven decisions. Mapping software is among the robust data visualization tools that you
x-data-plane feature in QEMU/KVM
Abstract: Systems sometimes use one global lock to synchronize different threads. This approach is also used in the QEMU/KVM (http://wiki.qemu.org/Main_Page) system. However, it may cause lock contention problems, which decrease the performance/scalability of the whole system. To solve this problem in QEMU/KVM, the x-data-plane feature was designed and implemented, of which the high-level idea is
How migration thread works inside of Linux Kernel
Abstract: In computer systems, resources have to be balanced to get better performance from the same hardware. In the Linux kernel, some migration kernel threads run as daemons to do this kind of job. In this article, we will discuss how the Linux kernel balances its hardware/software
Which Checksum Tool on Linux is Faster?
It is common practice to calculate the checksums of files to check their integrity. For large files, the checksum computation is slow. Now I am wondering why it is so slow and whether choosing another tool will be better. In this post, I try three common tools (md5sum, sha1sum and crc32) to compute checksums on
What Is the Name of the Linux-based OS: A Survey
You may already know “Linux” well and may also use the “operating system based on the Linux kernel” directly or indirectly (you are indirectly using it now as this site is hosted on Linux). But how should we name the OS based on Linux? You may know there is a GNU/Linux naming controversy. Different people have
SSD Enabled For DreamHost Shared Hosting: Simple Performance Measurement
SSDs are common for VPS and PaaS virtual machines for higher I/O performance. Now, they are coming to shared hosting too. DreamHost states that “Now with solid state drives (SSDs), our standard web hosting loads pages 200% faster”. We are happy to see this performance improvement with the price kept the same. Good work,
Making GPT Partition Table and Creating Partitions Using parted in Linux
My favorite disk partition table manipulation tools on Linux are cfdisk/fdisk. However, for large disks, cfdisk/fdisk (in the versions available when this post was written) will just give up with a message suggesting the GPT partition table format and GNU parted, like: WARNING: The size of this disk is 6.0 TB (6001042391040 bytes). DOS partition
How to Create Fedora 20 Domain-U on Fedora 20 Domain-0
This post introduces how to create a file-backed virtual block device (VBD) and install Fedora 20 in a Xen DomU over the Internet. This domain is created on a Fedora 20 Dom0 as introduced in https://www.systutorials.com/installing-xen-on-fedora-20-as-domain-0/. For better performance, you may consider using an LVM-backed VM. Create file-backed VBD: The actual space of the VBD will
How to Install, Run and Uninstall VMware Player and VirtualBox on Fedora Linux
VMware Player and VirtualBox are two cool and free full virtualization solutions and both can run on top of a Linux host. In this post, I introduce how to install, run, and uninstall VMware Player and VirtualBox on Fedora Linux. VMware Player: Install VMware Player: Download the installation bundle from VMware’s website. For example, the
Improving ssh/scp Performance by Choosing Suitable Ciphers
Update on Oct. 9, 2014: You should be aware of the possible security problems of blowfish; using it is not recommended. Instead, you may consider ChaCha20 as suggested by Tony Arcieri. To use this with OpenSSH, you need to specify the Ciphers in your .ssh/config files as chacha20-poly1305@openssh.com possibly with another default