Understanding Cloud Storage Consistency Models

Cloud storage systems use various consistency models to balance performance, availability, and data accuracy. This article explores these models, their trade-offs, and examples of systems that use them. We’ll also discuss the CAP theorem and its implications.

Consistency Models

Strong Consistency

Definition: Guarantees that any read operation returns the most recent write for a given piece

Understanding the Raft Consensus Protocol

The Raft consensus protocol is a distributed consensus algorithm designed to be more understandable than other consensus algorithms like Paxos. It ensures that a cluster of servers can agree on the state of a system even in the presence of failures.

Key Concepts

Raft divides the consensus problem into three relatively independent subproblems:

Leader Election:
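As background to the leader election subproblem named above (general knowledge about Raft, not text from the excerpt): a candidate becomes leader once it holds votes from a majority of the cluster. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and chosen only for illustration.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of votes that forms a majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate wins once its votes (including its own) reach a majority."""
    return votes_received >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """Number of servers that can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)
```

With five servers, a majority is three votes and two failures can be tolerated; adding a sixth server raises the majority to four without raising the number of tolerated failures.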

Decentralized Exchanges (DEX) vs. Centralized Exchanges (CEX): A Technical Comparison

Cryptocurrency exchanges have revolutionized the way we trade digital assets, with two main types of exchanges dominating the market: decentralized exchanges (DEX) and centralized exchanges (CEX). In this article, we’ll compare DEX and CEX from a technical perspective.

Decentralized Exchanges (DEX)

DEXs operate on a decentralized blockchain network, such as Ethereum, and are built

Linux Kernel 4.9.60 Release

This post summarizes the new features, bug fixes, and changes in the Linux 4.9.60 kernel release. Linux 4.9.60 contains 24 changes, patches, or new features. In total, 64,224 lines of Linux source code were changed or added in the 4.9.60 release compared to the 4.9 release. To view the source code of the Linux 4.9.60 kernel release online,

Installing R and RStudio Server in Ubuntu Linux

R is a language and environment for statistical computing and graphics, providing a wide variety of statistical and graphical techniques. The R environment is open source software under the GPL. R has a rich set of software packages and is widely used for statistical analysis. RStudio Server is an R integrated development environment (IDE) that provides many useful features

The cultural impact of cloud technology

Cloud technology is a relatively recent form of computing. A cloud is where data is stored, and it is also where that data is managed and processed. The cloud keeps the data on a cluster or network of servers. All of these servers are available remotely

What is the Future of Big Data Analytics and Hadoop?

Big Data has taken the lead in the IT industry and plays a significant role in business growth and in the decision-making processes that give you an edge over competitors. This applies equally to organizations and to professionals in the analytics domain. Big Data Analytics brings an ocean of opportunities

How to estimate the memory usage of HDFS NameNode for an HDFS cluster?

HDFS stores the metadata of files and blocks in the memory of the NameNode. How do you estimate the NameNode's memory usage for an HDFS cluster? Each file and each block has around 150 bytes of metadata on the NameNode, so you can base the calculation on this. For example, assume block size is
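The rule of thumb above (about 150 bytes per file and per block) can be turned into a rough calculation. The Python sketch below is only an estimate under that assumption; the function name and the block size used in the usage note are hypothetical inputs, not values taken from the post.

```python
BYTES_PER_OBJECT = 150  # approximate per-file and per-block figure quoted above


def estimate_namenode_bytes(file_sizes, block_size):
    """Roughly estimate NameNode memory used for file and block metadata.

    file_sizes: iterable of file sizes in bytes.
    block_size: HDFS block size in bytes (caller-supplied assumption).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    objects = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # integer ceiling division
        objects += 1 + blocks            # one file object plus its blocks
    return objects * BYTES_PER_OBJECT
```

For instance, 1,000 files of 1 GiB each with an assumed 128 MiB block size give 8 blocks per file, so 9,000 objects and roughly 1.35 MB of metadata. Real usage also depends on directories, path lengths, and JVM overhead, so treat the result as a lower-bound style estimate.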

How to handle missing blocks and blocks with corrupt replicas in HDFS?

For one HDFS cluster, hdfs dfsadmin -report reports:

Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0

The “Under replicated blocks” can be re-replicated automatically after some time. How to handle the missing blocks and blocks with corrupt replicas in HDFS?

Understanding these blocks

A block is “with corrupt replicas” in HDFS
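To watch these counters over time, the three summary lines shown above can be pulled out of the report text. The Python sketch below assumes the report has already been captured as a string (for example from the command's standard output) and that the labels match the ones in the post; the function name is hypothetical.

```python
WANTED = (
    "Under replicated blocks",
    "Blocks with corrupt replicas",
    "Missing blocks",
)


def parse_block_counters(report_text):
    """Return a dict of the block counters found in dfsadmin report text."""
    counters = {}
    for line in report_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Keep only the known labels, and only the first occurrence of each.
        if sep and key in WANTED and value.isdigit():
            counters.setdefault(key, int(value))
    return counters
```

A monitoring script could then alert when "Missing blocks" or "Blocks with corrupt replicas" is non-zero. Label wording may differ between Hadoop versions, so check it against your own report output first.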

How to email admins automatically after a Linux server starts?

Managing a cluster of servers, I would like to be notified when a server starts. How can I make the Linux servers email me or other admins automatically after they start? I did this by adding a crontab entry on each server like

@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com

How to add a new HDFS NameNode metadata directory to an existing cluster?

We have a running HDFS cluster. Currently, the NameNode metadata directory setting has only one directory configured in hdfs-site.xml:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>NameNode directory for namespace and transaction logs storage.</description>
</property>

We would like to add a new directory to dfs.namenode.name.dir to keep a replica of the metadata on a separate disk for higher data reliability.
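Because dfs.namenode.name.dir accepts a comma-delimited list of directories, it can help to check which directories a configuration currently names before and after editing it. The Python sketch below only reads the setting from XML text; the function name is hypothetical, and it does not cover the steps needed to populate a new directory on a running cluster.

```python
import xml.etree.ElementTree as ET


def namenode_name_dirs(hdfs_site_xml):
    """Return the directories listed in dfs.namenode.name.dir, in order."""
    root = ET.fromstring(hdfs_site_xml)
    # iter() also visits the root, so a bare <property> element works too.
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.namenode.name.dir":
            value = prop.findtext("value") or ""
            return [d.strip() for d in value.split(",") if d.strip()]
    return []
```

Run against the property shown above, this returns a single entry, file:///home/hadoop/hdfs/. After a second directory is appended to the value with a comma, the list should contain two entries.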

How to check the replication factor of a file in HDFS?

A related question: how do you find the replication factors of files in an HDFS cluster?

Method 1: You can use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example,

$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22
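When many files need checking, the second column can be extracted from each listing line programmatically. The Python sketch below assumes whitespace-separated ls output like the line above; the function name is hypothetical, and directories (which show "-" in that column) are reported as None.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one `hdfs dfs -ls` output line.

    Returns None when the column is "-", as it is for directories.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("not an ls output line: %r" % line)
    return None if fields[1] == "-" else int(fields[1])
```

Header lines such as "Found N items" do not follow this layout and should be skipped before calling the function.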

How to change a running HDFS cluster’s replication factor?

Now, I have a running HDFS cluster storing lots of files. I want to change its default replication factor. How do I change it? What will happen after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor

How to balance DataNode storage in HDFS?

As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes may become uneven. Some DataNodes’ disks are almost used up while others’ are almost empty. How to balance data across DataNodes in HDFS? Hadoop provides the balancer to redistribute the data. Brief introduction to balancer in Hadoop: balancer. The design

Random string password generator in Scala

Managing our research cluster, I frequently need to generate strings for new users’ passwords. How to generate them automatically and randomly in Scala? The passwords need characters ‘a’ – ‘z’, ‘A’ – ‘Z’ and ‘0’ – ‘9’ only. This piece of code works very well for me:

def randomString(len: Int): String = { val
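The post's Scala code is cut off in this excerpt, so it is not reproduced here. As a separate illustration of the same requirement (characters a-z, A-Z and 0-9 only), here is a minimal Python sketch. It is not the author's method; the function name is hypothetical, and it uses Python's secrets module because the strings are meant to be passwords.

```python
import secrets
import string

# Allowed characters: 'a'-'z', 'A'-'Z' and '0'-'9' only.
ALPHABET = string.ascii_letters + string.digits


def random_password(length: int) -> str:
    """Return a random string of the given length drawn from ALPHABET."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Calling random_password(12) yields a 12-character string; each call produces a different value, so no sample output is shown.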

Systems Conferences

Which ones are good systems conferences?

Top ones by ACM and USENIX:

OSDI: https://www.usenix.org/conferences/byname/179
SOSP: http://sosp.org/
Other SIGOPS Events: http://www.sigops.org/conf-sponsored.html
EuroSys: http://www.eurosys.org/
SoCC: http://www.socc2013.org/ (SoCC 2013)
ASPLOS: http://www.sigplan.org/Conferences/ASPLOS/Main
VEE: http://www.sigplan.org/vee.htm
USENIX ATC: https://www.usenix.org/conferences/byname/131
NSDI: https://www.usenix.org/conferences/byname/178

IEEE Conferences:

ICDCS: http://www.temple.edu/cis/icdcs2013/ (2013)
IPDPS: http://www.ipdps.org/

Other related ones and workshops:

HPCA: Search HPCA Conference
SC: http://www.supercomp.org/
IEEE CLUSTER: http://www.clustercomp.org/
HotCloud: