distributed systems

Blockchain | Systems | Systems 101 | Tutorial

Private Key Sharding: A Technical Guide
By Eric Ma | Sep 14, 2024 | May 4, 2025

Private key sharding is a technique that splits a private key into multiple parts, or “shards,” to enhance security and fault tolerance. This method is particularly useful where a single point of failure must be avoided, such as in secure communications, cryptocurrency wallets, and distributed systems.
What is Private Key Sharding?
Private…
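As a rough illustration of splitting a key into shards, here is a minimal sketch using Shamir's secret sharing. The excerpt does not say which scheme the article uses, so the choice of Shamir's scheme, the field prime, and the function names `split_secret` and `recover_secret` are all assumptions made for this example. It is not hardened for production use.

```python
import secrets

# Assumption: a Mersenne prime comfortably larger than a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shards: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shards` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= num_shards:
        raise ValueError("need 1 <= threshold <= num_shards")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            result = (result * x + c) % PRIME
        return result

    # Each shard is a point (x, f(x)) with x = 1..num_shards.
    return [(x, evaluate(x)) for x in range(1, num_shards + 1)]


def recover_secret(shards: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shards):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)  # stand-in for a real private key
shards = split_secret(key, threshold=3, num_shards=5)
assert recover_secret(shards[:3]) == key  # any 3 of the 5 shards suffice
assert recover_secret(shards[2:]) == key
```

With a threshold of 3 out of 5, losing up to two shards does not lose the key, which is where the fault tolerance comes from. Fewer than three shards reveal nothing about the key.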

Systems | Systems 101 | Tutorial

Linear Consistency Model for Computer Systems
By Eric Ma | Sep 6, 2024

Linear consistency models are crucial in ensuring reliability and coherence in distributed computer systems. These models help manage how systems handle data and operations across multiple nodes, ensuring consistency without sacrificing performance. What is a Linear Consistency Model? In distributed computing, a linear consistency model ensures that operations on distributed data appear as if they…

Computing systems | Systems | Systems 101 | Tutorial

Comparing Paxos and Raft
By Eric Ma | Sep 6, 2024

Paxos and Raft are both consensus algorithms used to ensure consistency in distributed systems. While they solve similar problems, they have different approaches and design philosophies.
Characteristics
Paxos
Roles: Proposers, Acceptors, Learners.
Phases: Two main phases (Prepare/Promise and Propose/Accept).
Leader Election: Not explicitly defined, often implemented using Multi-Paxos to handle multiple proposals efficiently.
Use Cases:…

Computing systems | Systems | Systems 101 | Tutorial

Understanding the Paxos Consensus Algorithm
By Eric Ma | Sep 6, 2024

The Paxos consensus algorithm is a fundamental concept in distributed computing that ensures a group of distributed systems can agree on a single value, even in the presence of failures. Developed by Leslie Lamport, Paxos is widely used in systems where consistency and fault tolerance are critical, such as databases and distributed ledgers. Consensus Problem…

Systems 101

Sybil Attack 101
By Ethan Ainsworth | Sep 16, 2023

Distributed systems, such as peer-to-peer networks and other decentralized platforms, have become increasingly popular due to their potential to offer more robust, scalable, and secure solutions. However, these systems face unique challenges and vulnerabilities, one of which is the Sybil attack. Named after the psychiatric case study “Sybil,” in which a person exhibits multiple…

Systems 101

Byzantine Faults 101
By Ethan Ainsworth | Sep 16, 2023

Distributed systems are becoming increasingly important in various applications, such as cloud computing and peer-to-peer networks. One of the challenges in designing robust distributed systems is dealing with Byzantine faults, a type of fault that can be particularly difficult to detect and handle. Byzantine faults, named after the Byzantine Generals’ Problem, involve components of…

Systems 101

Consensus Algorithm 101
By Ethan Ainsworth | Sep 16, 2023

Consensus algorithms play a crucial role in the functioning of decentralized networks, such as blockchain-based systems. They help maintain the integrity, security, and reliability of these networks by ensuring that all participants agree on the state of the system. In this post, we will explore the concept of consensus algorithms, their importance, and some of…

Computing systems | Insights | Systems

Do big data stream processing in the stream way
By Eric Ma | Nov 27, 2018 | Nov 21, 2019

Reading: Years in Big Data. Months with Apache Flink. 5 Early Observations With Stream Processing: https://data-artisans.com/blog/early-observations-apache-flink. The article suggests adopting the right solution, Flink, for big data processing. Flink is interesting and built for stream processing. The broader takeaway may be to solve problems using the right solution. We saw many painful…

Storage systems | Systems

How to handle missing blocks and blocks with corrupt replicas in HDFS?
By Eric Ma | Mar 24, 2018 | Feb 20, 2020

The hdfs dfsadmin -report output for one HDFS cluster shows:
Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
The “Under replicated blocks” can be re-replicated automatically after some time. How should the missing blocks and blocks with corrupt replicas in HDFS be handled?
Understanding these blocks
A block is “with corrupt replicas” in HDFS…

QA

HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks
By Eric Ma | Mar 24, 2018 | Feb 9, 2019

After a node failure and an HDFS restart, the NameNode reports in its log: “The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.” Why does this happen, and how can it be fixed? About why the NameNode stays in the safe mode:…
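The figures in the quoted log message can be checked with a little arithmetic. The sketch below assumes the target is the smallest whole number of blocks that is at least 0.9990 of the total. That rounding rule is an assumption chosen here because it reproduces the message, not something taken from the HDFS source, and the function name `blocks_still_needed` is hypothetical.

```python
def blocks_still_needed(reported: int, total: int, threshold_per_10k: int = 9990) -> int:
    """Return how many more reported blocks are needed to reach the threshold.

    Assumption (not from the HDFS source): the target is the smallest whole
    number of blocks that is >= threshold * total. The threshold is given as
    an integer count per 10,000 (9990 means 0.9990) to avoid float rounding.
    """
    target = -(-total * threshold_per_10k // 10_000)  # ceiling division
    return max(0, target - reported)


# Values from the log message quoted above.
print(blocks_still_needed(reported=1968810, total=1975856))  # prints 5071
```

Under that assumption the target is 1973881 blocks, so 5071 more reports are needed, which matches the number in the log message.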

QA

How to understand some key system consistency algorithms
By Eric Ma | Mar 24, 2018

When we design a system, we may want it to be consistent, scalable, and so on. There are several well-known consistency algorithms. How can we understand them easily?
1. Paxos and its extensions
2. Replicated State Machine mechanisms
3. Quorum
Feel free to add other well-known consistency algorithms and notes on understanding them ;-) Reading textbooks…

QA

What is the design of Snapshots in HDFS?
By Eric Ma | Mar 24, 2018

What is the design of Snapshots in HDFS? This PDF documents the design of snapshots. Jing Zhao and Tsz-Wo Sze from Hortonworks gave a great talk on the design of HDFS snapshots. The slides can be downloaded here. Snapshot development is tracked in HDFS-2802.

QA

What’s the difference between Reliability, Durability, and Availability for a data storage system?
By Weiwei Jia | Mar 24, 2018 | Jan 7, 2020

These are important concepts in distributed systems such as the Hadoop Distributed File System and the Google File System. Answer from http://www.quora.com/Whats-the-difference-between-Reliability-Durability-and-Availability-for-data-storage-system The difference between durability and availability is fairly simple. Durability is about what happens when all power goes out everywhere. Has all data been written to stable storage that doesn’t require power (e.g. disk/flash), in…

QA

Redis Architecture, consistency model, etc.
By Q A | Mar 24, 2018 | Jun 26, 2018

Technical discussions on Redis.
Redis internal documentation: http://redis.io/topics/internals
Redis manifesto, the philosophy behind Redis: http://oldblog.antirez.com/post/redis-manifesto.html
Redis architecture: Overview Of Redis Architecture
Redis data model and eventual consistency: http://antirez.com/news/36

QA

Consistency models for distributed systems
By Q A | Mar 24, 2018 | Jun 28, 2018

Which are the consistency models used for distributed systems?
Papers that survey the consistency models:
Robert C. Steinke and Gary J. Nutt. 2004. A unified theory of shared memory consistency. J. ACM 51, 5 (September 2004), 800-849. DOI=10.1145/1017460.1017464 http://doi.acm.org/10.1145/1017460.1017464
David Mosberger. 1993. Memory consistency models. SIGOPS Oper. Syst. Rev. 27, 1 (January 1993), 18-26. DOI=10.1145/160551.160553…

QA

Transactional memory learning materials
By Q A | Mar 24, 2018

I want to learn transactional memory technologies. Any suggestions on transactional memory learning materials? Thanks!
I highly recommend the Transactional Memory lecture by James R. Larus and Ravi Rajwar, from Synthesis Lectures on Computer Architecture.
The Transactional Memory lecture: http://www.morganclaypool.com/doi/abs/10.2200/S00070ED1V01Y200611CAC002
Link to the PDF: http://www.morganclaypool.com/doi/pdf/10.2200/S00070ED1V01Y200611CAC002

Programming | Tutorial

Notes for Beginners of Software Development on Linux
By Eric Ma | Nov 29, 2015 | Aug 30, 2020

Linux is a great platform for software development targeting servers or backends. In general, working on Linux is very productive. The problem that beginners on Linux face is that the learning curve is steep at the beginning. But believe me, after you get through the initial green steep learning step as in the figure below…

Computing systems | Insights | Storage systems | Systems

Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean
By Eric Ma | Jul 18, 2013 | Aug 30, 2020

You can download the slides for Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. These slides contain the “Numbers everyone should know”, which everyone working on systems should be familiar with.
Numbers Everyone Should Know
L1 cache reference 0.5 ns
Branch…

Insights | Systems

Designs, Lessons and Advice from Building Large Distributed Systems
By Eric Ma | Jan 22, 2013 | Aug 30, 2020

Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean. Everyone who is interested in large distributed systems should read: PDF for Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean.

Tutorial

Reading List for Distributed Systems and Cloud Computing
By Eric Ma | Sep 15, 2012 | Aug 30, 2020

Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help a lot for those who are just starting out in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for systems research on cloud computing. The reading…
