Computing systems

Computing systems | Systems | Systems 101 | Tutorial

Comparing Paxos and Raft
By Eric Ma, Sep 6, 2024

Paxos and Raft are both consensus algorithms used to ensure consistency in distributed systems. While they solve similar problems, they differ in approach and design philosophy.

Characteristics

Paxos

Roles: Proposers, Acceptors, Learners.
Phases: Two main phases (Prepare/Promise and Propose/Accept).
Leader Election: Not explicitly defined, often implemented using Multi-Paxos to handle multiple proposals efficiently.
Use Cases:…

Computing systems | Systems | Systems 101 | Tutorial

Understanding the Paxos Consensus Algorithm
By Eric Ma, Sep 6, 2024

The Paxos consensus algorithm is a fundamental concept in distributed computing that ensures a group of distributed systems can agree on a single value, even in the presence of failures. Developed by Leslie Lamport, Paxos is widely used in systems where consistency and fault tolerance are critical, such as databases and distributed ledgers.

Consensus Problem…

Computing systems | Systems | Systems 101 | Tutorial

Understanding the Raft Consensus Protocol
By Eric Ma, Aug 24, 2024

The Raft consensus protocol is a distributed consensus algorithm designed to be more understandable than other consensus algorithms like Paxos. It ensures that a cluster of servers can agree on the state of a system even in the presence of failures.

Key Concepts

Raft divides the consensus problem into three relatively independent subproblems:

Leader Election:…

Computing systems | Insights | Systems

Do big data stream processing in the stream way
By Eric Ma, Nov 27, 2018 (updated Nov 21, 2019)

Reading: Years in Big Data. Months with Apache Flink. 5 Early Observations With Stream Processing: https://data-artisans.com/blog/early-observations-apache-flink. The article suggests adopting the right solution, Flink, for big data processing. Flink is interesting and is built for stream processing. The broader takeaway may be to solve problems using the right solution. We saw many painful…

Computing systems | Resource management | Storage systems | Systems | Tutorial

Hadoop Installation Tutorial (Hadoop 2.x)
By Eric Ma, Sep 14, 2014 (updated Dec 29, 2019)

Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager to the existing HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of the MapReduce originally designed and implemented by Google for processing and generating large data…

Computing systems | Storage systems | Systems

Big Data Benchmark from AMPLab of UC Berkeley
By Eric Ma, Mar 17, 2014 (updated Sep 5, 2020)

Benchmarks are important for understanding performance and for comparing different systems quantitatively and qualitatively. Many analytic frameworks, such as Hive, Impala and Shark, have been designed and implemented in recent years and have become fundamental software for processing big data. How to benchmark these big data analytic systems is an interesting problem.

The Big Data Benchmark

The…

Computing systems | Insights | Storage systems | Systems

Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean
By Eric Ma, Jul 18, 2013 (updated Aug 30, 2020)

You can download the slides from Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. These slides contain the “Numbers everyone should know”, which everyone working on systems should be familiar with.

Numbers Everyone Should Know

L1 cache reference 0.5 ns
Branch…

Computing systems | Tutorial

Hadoop MapReduce Tutorials
By Eric Ma, Jul 17, 2013 (updated Sep 5, 2020)

Here is a list of tutorials for learning how to write MapReduce programs on Hadoop, the open-source MapReduce implementation with HDFS.

MapReduce Tutorials

The official tutorial on the Hadoop MapReduce framework: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html.

Yahoo! Hadoop Tutorial

A comprehensive tutorial on Hadoop from Yahoo! Developer Network: http://developer.yahoo.com/hadoop/tutorial/.

More about MapReduce

To better understand the design behind MapReduce, it…

Computing systems | News

PUMA: A MapReduce Benchmark Suite
By Eric Ma, Dec 20, 2012 (updated Sep 5, 2020)

MapReduce is a well-known programming model for generating and processing large data sets. There are various MapReduce implementations; Hadoop is probably the most widely known and used. Benchmarking MapReduce frameworks has therefore become important. Faraz Ahmad et al. developed a benchmark suite: PUMA MapReduce Benchmark. During our work on MapReduce, we developed a benchmark suite…

Computing systems | Storage systems

Large-scale Data Storage and Processing System in Datacenters
By Eric Ma, Dec 11, 2012 (updated Aug 30, 2020)

Research on cloud computing has made great progress, and many excellent large-scale systems have been designed in recent years. I compiled a list of some large-scale data storage and processing systems in datacenters as follows.

Storage systems

Google File System (GFS): http://research.google.com/archive/gfs.html
HDFS implementation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
Colossus (GFS2): Colossus: Successor to the Google File System (GFS)…

Computing systems | Resource management | Storage systems

Microsoft’s Cosmos Service
By Eric Ma, Dec 10, 2012 (updated May 31, 2020)

Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”. No paper or technical report about Cosmos has been published yet. I compiled a list of information about Cosmos on the Web as follows.

What is Microsoft’s Cosmos service? by Yaron Y. Goland.
Microsoft Cosmos: Petabytes perfectly processed perfunctorily by Seth…

Computing systems | Storage systems | Systems

Hadoop Installation Tutorial (Hadoop 1.x)
By Eric Ma, Oct 9, 2012 (updated Nov 28, 2020)

Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was initially designed…

Computing systems

mrcc – A Distributed C Compiler System on MapReduce
By Eric Ma, Jan 16, 2010 (updated Aug 30, 2020)

The mrcc project’s homepage is here: mrcc project.

Abstract

mrcc is an open-source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port to other cloud computing platforms, such as MRlite,…
