Cloud storage systems utilize various consistency models to balance performance, availability, and data accuracy. This article explores these models, their trade-offs, and examples of systems using them. We’ll also discuss the CAP theorem and its implications. Consistency Models: Strong Consistency. Definition: Guarantees that any read operation returns the most recent write for a given piece
Tag: Cluster
Understanding the Raft Consensus Protocol
The Raft consensus protocol is a distributed consensus algorithm designed to be easier to understand than other consensus algorithms such as Paxos. It ensures that a cluster of servers can agree on the state of a system even in the presence of failures. Key Concepts: Raft divides the consensus problem into three relatively independent subproblems: Leader Election:
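For the leader-election subproblem, a minimal sketch of the vote-granting rule from the Raft paper may help: a server votes for a candidate only if the candidate's term is current, the server has not already voted for someone else in that term, and the candidate's log is at least as up-to-date as its own. The names RequestVote, ServerState, and handleRequestVote are hypothetical, and persistence, RPC transport, and election timers are deliberately left out.

```scala
// A minimal, single-threaded sketch of Raft's vote-granting rule.
// Names (RequestVote, ServerState, handleRequestVote) are hypothetical;
// persistence, RPC transport, and election timers are omitted.
final case class RequestVote(term: Long, candidateId: Int, lastLogIndex: Long, lastLogTerm: Long)

final case class ServerState(
    currentTerm: Long,
    votedFor: Option[Int],
    lastLogIndex: Long,
    lastLogTerm: Long
)

object RaftVote {
  // The candidate's log is "at least as up-to-date" if its last entry has a
  // higher term, or the same term and an index at least as large.
  private def logIsUpToDate(req: RequestVote, s: ServerState): Boolean =
    req.lastLogTerm > s.lastLogTerm ||
      (req.lastLogTerm == s.lastLogTerm && req.lastLogIndex >= s.lastLogIndex)

  /** Returns (vote granted?, updated server state). */
  def handleRequestVote(s0: ServerState, req: RequestVote): (Boolean, ServerState) = {
    // Seeing a higher term moves the server into that term with no vote cast yet.
    val s =
      if (req.term > s0.currentTerm) s0.copy(currentTerm = req.term, votedFor = None)
      else s0

    val grant =
      req.term == s.currentTerm &&                 // candidates with a stale term are rejected
        s.votedFor.forall(_ == req.candidateId) && // at most one vote per term
        logIsUpToDate(req, s)

    (grant, if (grant) s.copy(votedFor = Some(req.candidateId)) else s)
  }
}
```

Returning the updated state instead of mutating it keeps the rule easy to test; a real server would persist currentTerm and votedFor before replying.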
Decentralized Exchanges (DEX) vs. Centralized Exchanges (CEX): A Technical Comparison
Cryptocurrency exchanges have revolutionized the way we trade digital assets, and two main types of exchanges dominate the market: decentralized exchanges (DEX) and centralized exchanges (CEX). In this article, we’ll compare DEXs and CEXs from a technical perspective. Decentralized Exchanges (DEX): DEXs operate on a decentralized blockchain network, such as Ethereum, and are built
Linux Kernel 4.9.60 Release
This post summarizes the new features, bug fixes, and changes in the Linux 4.9.60 release. Linux 4.9.60 contains 24 changes, patches, or new features. In total, 64,224 lines of Linux source code were changed or added in the Linux 4.9.60 release compared to the Linux 4.9 release. To view the source code of the Linux 4.9.60 kernel release online,
Installing R and RStudio Server in Ubuntu Linux
R is a language and environment for statistical computing and graphics that provides a wide variety of statistical and graphical techniques. The R environment is open-source software released under the GPL. R has a rich set of software packages and is widely used for statistical analysis. RStudio Server is an R integrated development environment (IDE) that provides many useful features
The cultural impact of cloud technology
Cloud technology is one of the newest forms of technology. A cloud is where data is stored, and also where that data is managed and processed. In the cloud, the data is managed on a cluster, that is, a network of servers. All of these servers are available remotely
What is the Future of Big Data Analytics and Hadoop?
Big Data has taken the lead in the IT industry and plays a significant role in business growth and decision-making processes, giving you an edge over your competitors. This applies equally to organizations and to professionals working in the analytics domain. Big Data analytics brings an ocean of opportunities
How to estimate the memory usage of the HDFS NameNode for an HDFS cluster?
HDFS stores the metadata of files and blocks in the memory of the NameNode. How can the memory usage of the HDFS NameNode be estimated for an HDFS cluster? Each file and each block takes around 150 bytes of metadata on the NameNode, so you can base the calculation on this figure. For example, assume the block size is
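The 150-bytes-per-object rule of thumb turns into simple arithmetic: estimated memory ≈ (number of files + number of blocks) × 150 bytes, where a file of size S occupies ceil(S / blockSize) blocks. A minimal sketch under that assumption follows; the object name NameNodeMemory is hypothetical, and the estimate ignores directories and JVM overhead.

```scala
// Rough NameNode memory estimate based on the ~150 bytes-per-object rule of thumb.
// Only files and blocks are counted; directories and JVM overhead are ignored.
object NameNodeMemory {
  val BytesPerObject: Long = 150L // approximate metadata per file and per block

  // Number of HDFS blocks a file of `fileSize` bytes occupies (0 for an empty file).
  def blocksFor(fileSize: Long, blockSize: Long): Long =
    (fileSize + blockSize - 1) / blockSize

  // Estimated bytes of NameNode memory for the given file sizes.
  def estimateBytes(fileSizes: Seq[Long], blockSize: Long): Long = {
    val files  = fileSizes.size.toLong
    val blocks = fileSizes.map(blocksFor(_, blockSize)).sum
    (files + blocks) * BytesPerObject
  }
}
```

For instance, 10 million files that each fit in a single block make about 20 million objects, or roughly 3 GB (20,000,000 × 150 bytes) of NameNode memory under this rule of thumb.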
Which filesystem operations in HDFS are atomic?
Atomicity is a fundamental property of filesystems. Application semantics and many functions depend on the atomicity model of the underlying filesystem and are only available if it provides the needed guarantees. Which filesystem operations in HDFS are atomic, so that locks can be implemented on top of them? In a reasonably widely usable filesystem, some
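To show why an atomic operation is enough to build a lock, here is a minimal sketch built on an atomic "create only if absent" primitive: whoever creates the lock file first holds the lock, and everyone else fails. It is a local-filesystem analogue using java.nio.file.Files.createFile, whose Javadoc states that the existence check and the creation are a single atomic operation; it is not the HDFS API, and it assumes the target filesystem offers an equivalent atomic create-if-absent. The object name LockFile is hypothetical.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Lock built on an atomic create-if-absent primitive (local-filesystem analogue,
// not the HDFS API).
object LockFile {
  // Returns true if this caller created the lock file, i.e. acquired the lock.
  def tryAcquire(lock: Path): Boolean =
    try {
      Files.createFile(lock) // existence check and creation happen atomically
      true
    } catch {
      case _: FileAlreadyExistsException => false // someone else holds the lock
    }

  // Releases the lock by deleting the lock file.
  def release(lock: Path): Unit = {
    Files.deleteIfExists(lock)
    ()
  }
}
```

One caveat of this design: if the holder crashes before calling release, the lock file stays behind, so practical schemes usually add a lease or timeout on top.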
How to handle missing blocks and blocks with corrupt replicas in HDFS?
The hdfs dfsadmin -report output of one of our HDFS clusters shows:
Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
The under-replicated blocks can be re-replicated automatically after some time. How should the missing blocks and the blocks with corrupt replicas be handled in HDFS? Understanding these blocks: A block is “with corrupt replicas” in HDFS
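When several clusters need checking, it can help to pull these counters out of the hdfs dfsadmin -report text programmatically. A minimal sketch, assuming the "Label: number" line format quoted above (label wording may differ between Hadoop versions); the object name BlockReport and the needsAttention policy are hypothetical.

```scala
// Extracts "Label: <number>" counters from hdfs dfsadmin -report text.
// Assumes the line format quoted in the post; other lines are ignored.
object BlockReport {
  private val Counter = """^\s*([^:]+):\s*(\d+)\s*$""".r

  // Map from label (e.g. "Missing blocks") to its numeric value.
  def counters(report: String): Map[String, Long] =
    report.linesIterator.collect {
      case Counter(label, value) => label.trim -> value.toLong
    }.toMap

  // Hypothetical policy: flag the cluster if any block is missing
  // or has corrupt replicas.
  def needsAttention(report: String): Boolean = {
    val c = counters(report)
    c.getOrElse("Missing blocks", 0L) > 0 || c.getOrElse("Blocks with corrupt replicas", 0L) > 0
  }
}
```

With the report above, the parser picks up 139016, 9, and 0, and needsAttention is true because of the 9 blocks with corrupt replicas.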
How to email admins automatically after a Linux server starts?
I manage a cluster of servers and would like to be notified when a server starts. How can the Linux servers email me or other admins automatically after they start? I did this by adding a crontab entry like the following on each server:
@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com
How to add a new HDFS NameNode metadata directory to an existing cluster?
We have a running HDFS cluster. Currently, only one NameNode metadata directory is configured in hdfs-site.xml:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>NameNode directory for namespace and transaction logs storage.</description>
</property>
We would like to add a new directory to dfs.namenode.name.dir so that replicas of the metadata are kept on a separate disk for higher data reliability.
How to check the replication factor of a file in HDFS?
A related question: how can the replication factors of files in an HDFS cluster be found? Method 1: You can use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example,
$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22
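Since the replication factor is just the second whitespace-separated column, it is easy to extract when processing many ls lines. A minimal sketch; the object name LsReplication is hypothetical, and it assumes directories show "-" in that column, so it returns None for them.

```scala
// Reads the replication factor from one line of hdfs dfs -ls output.
// Returns None when the second column is not a number (directories show "-").
object LsReplication {
  def replicationOf(lsLine: String): Option[Int] =
    lsLine.trim.split("\\s+").drop(1).headOption
      .filter(field => field.nonEmpty && field.forall(_.isDigit))
      .map(_.toInt)
}
```

Applied to the example line above, replicationOf returns Some(3).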
How to change a running HDFS cluster’s replication factor?
Now I have a running HDFS cluster storing lots of files. I want to change its default replication factor. How can I change it, and what will happen after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor
How to balance DataNode storage in HDFS?
As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes can become uneven: some DataNodes’ disks are almost full while others are almost empty. How can data be balanced across DataNodes in HDFS? Hadoop provides the balancer to redistribute the data. Brief introduction to the balancer in Hadoop: balancer. The design
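The balancer's notion of "balanced" is threshold-based: a DataNode counts as balanced when its utilization (used / capacity) is within a threshold, in percentage points, of the whole cluster's utilization; the threshold defaults to 10. The sketch below is a simplified version of that check only, with hypothetical names (DataNode, BalancerCheck); the real balancer uses finer categories and also schedules block moves under placement and bandwidth constraints, which are not modeled here.

```scala
// Simplified version of the balancer's threshold check: a node is balanced
// if its utilization is within `thresholdPct` percentage points of the
// cluster-wide utilization. Block moves are not modeled.
final case class DataNode(name: String, usedBytes: Long, capacityBytes: Long) {
  def utilizationPct: Double = 100.0 * usedBytes / capacityBytes
}

object BalancerCheck {
  sealed trait Status
  case object OverUtilized  extends Status
  case object UnderUtilized extends Status
  case object Balanced      extends Status

  def classify(nodes: Seq[DataNode], thresholdPct: Double = 10.0): Map[String, Status] = {
    require(nodes.nonEmpty, "need at least one DataNode")
    // Cluster utilization is total used over total capacity,
    // not the mean of the per-node percentages.
    val clusterPct = 100.0 * nodes.map(_.usedBytes).sum / nodes.map(_.capacityBytes).sum
    nodes.map { n =>
      val diff = n.utilizationPct - clusterPct
      val status: Status =
        if (diff > thresholdPct) OverUtilized
        else if (diff < -thresholdPct) UnderUtilized
        else Balanced
      n.name -> status
    }.toMap
  }
}
```

In practice the threshold is passed on the command line, for example hdfs balancer -threshold 5 for a tighter target than the default.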
How to totally disable firewall or iptables on Fedora 20
Our servers run inside our own cluster and no firewall is needed. How to totally disable firewall or iptables on Fedora 20? Fedora 20 uses FirewallD as the firewall service. To totally disable firewalld:
# systemctl disable firewalld
# systemctl stop firewalld
Directly SSH to hosts using internal IPs through the gateway
We have many hosts with internal IPs like 10.0.3.* behind a gateway, say gateway.example.org. The hosts with internal IPs connect to the Internet through the gateway. How can we SSH directly to hosts using internal IPs through the gateway? Here is the solution: Directly SSH to Hosts with LAN IPs Through the Gateway
Random string password generator in Scala
While managing our research cluster, I frequently need to generate random strings as passwords for new users. How can they be generated automatically and randomly in Scala? The passwords may contain only the characters ‘a’ – ‘z’, ‘A’ – ‘Z’, and ‘0’ – ‘9’. This piece of code works very well for me: def randomString(len: Int): String = { val
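The original function is cut off in this excerpt, so the following is a separate minimal sketch rather than the author's code. It draws each character uniformly from the same 62-character set using java.security.SecureRandom, a choice made here because the strings are used as passwords. The names PasswordGen and randomAlphanumeric are hypothetical.

```scala
import java.security.SecureRandom

// Random password made of 'a'-'z', 'A'-'Z' and '0'-'9' only.
object PasswordGen {
  // The 62 allowed characters.
  private val Alphabet: IndexedSeq[Char] = ('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9')
  private val rng = new SecureRandom()

  // Builds a string of `len` characters, each drawn uniformly from Alphabet.
  def randomAlphanumeric(len: Int): String = {
    require(len >= 0, "length must be non-negative")
    Seq.fill(len)(Alphabet(rng.nextInt(Alphabet.length))).mkString
  }
}
```

The standard library's scala.util.Random.alphanumeric yields the same character set, but it is backed by a non-cryptographic generator, which is why SecureRandom is used here.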
Rsync with non-standard ssh ports
This problem appears when I try to rsync directories with hosts inside a cluster that uses NAT to forward ports to internal nodes. Hence, the SSH ports of the internal nodes are not the default 22. So, how can rsync be used with non-standard SSH ports? The -e option of rsync does the trick very well. For
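The essential point with -e is that the whole remote-shell command, port included, is passed to rsync as one argument, for example ssh -p 2222. A minimal sketch of launching rsync that way from Scala with scala.sys.process, where passing a Seq keeps "ssh -p 2222" as a single argument without shell quoting. The object name RsyncPort, the port 2222, and any host or paths used with it are hypothetical.

```scala
import scala.sys.process._

object RsyncPort {
  // Builds an rsync command for a host whose sshd listens on `port`.
  // The remote-shell command is passed to -e as ONE argument: "ssh -p <port>".
  def command(port: Int, src: String, dest: String): Seq[String] =
    Seq("rsync", "-avz", "-e", s"ssh -p $port", src, dest)

  // Runs the command and returns rsync's exit code (0 on success).
  def run(port: Int, src: String, dest: String): Int =
    command(port, src, dest).!
}
```

In a shell, the equivalent is rsync -avz -e 'ssh -p 2222' followed by the source and destination; the quotes play the same role as the separate Seq element.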
Systems Conferences
Which ones are good systems conferences?
Top ones by ACM and USENIX:
OSDI: https://www.usenix.org/conferences/byname/179
SOSP: http://sosp.org/
Other SIGOPS Events: http://www.sigops.org/conf-sponsored.html
EuroSys: http://www.eurosys.org/
SoCC: http://www.socc2013.org/ (SoCC 2013)
ASPLOS: http://www.sigplan.org/Conferences/ASPLOS/Main
VEE: http://www.sigplan.org/vee.htm
USENIX ATC: https://www.usenix.org/conferences/byname/131
NSDI: https://www.usenix.org/conferences/byname/178
IEEE Conferences:
ICDCS: http://www.temple.edu/cis/icdcs2013/ (2013)
IPDPS: http://www.ipdps.org/
Other related ones and workshops:
HPCA: Search HPCA Conference
SC: http://www.supercomp.org/
IEEE CLUSTER: http://www.clustercomp.org/
HotCloud: