Private Key Sharding: A Technical Guide

Private key sharding is a technique used to distribute a private key into multiple parts, or “shards,” to enhance security and fault tolerance. This method is particularly useful in scenarios where a single point of failure must be avoided, such as in secure communications, cryptocurrency wallets, and distributed systems. What is Private Key Sharding? Private
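
As an illustration of the idea in the excerpt above, here is a minimal sketch of one simple way to split a key: an XOR-based scheme in which all shards are required to rebuild the key. The function names `split_key` and `combine_shards` are hypothetical, and this particular scheme is an assumption chosen for brevity, not necessarily the one the post describes. Because every shard is needed, it removes the single point of compromise but does not by itself provide fault tolerance; threshold schemes such as Shamir's Secret Sharing address that.

```python
import secrets


def split_key(key: bytes, n: int) -> list[bytes]:
    """Split `key` into n shards; all n are needed to rebuild it (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least two shards")
    # n - 1 shards are uniformly random bytes of the same length as the key.
    shards = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # The last shard is the key XORed with every random shard.
    last = key
    for shard in shards:
        last = bytes(a ^ b for a, b in zip(last, shard))
    shards.append(last)
    return shards


def combine_shards(shards: list[bytes]) -> bytes:
    """XOR all shards together to recover the original key."""
    result = bytes(len(shards[0]))  # all-zero bytes, the XOR identity
    for shard in shards:
        result = bytes(a ^ b for a, b in zip(result, shard))
    return result


key = secrets.token_bytes(32)          # stand-in for a real private key
shards = split_key(key, 3)
recovered = combine_shards(shards)
```

Any subset of fewer than all shards is statistically independent of the key, which is why losing a single shard here also means losing the key.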

Linear Consistency Model for Computer Systems

Linear consistency models are crucial in ensuring reliability and coherence in distributed computer systems. These models help manage how systems handle data and operations across multiple nodes, ensuring consistency without sacrificing performance. What is a Linear Consistency Model? In distributed computing, a linear consistency model ensures that operations on distributed data appear as if they

Comparing Paxos and Raft

Paxos and Raft are both consensus algorithms used to ensure consistency in distributed systems. While they solve similar problems, they have different approaches and design philosophies. Characteristics Paxos Roles: Proposers, Acceptors, Learners. Phases: Two main phases (Prepare/Promise and Propose/Accept). Leader Election: Not explicitly defined, often implemented using Multi-Paxos to handle multiple proposals efficiently. Use Cases:
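
The two Paxos phases named in the excerpt above can be sketched as follows. This is a simplified, single-process model of single-decree Paxos under stated assumptions: no networking, no failures, and a proposer that talks to every acceptor directly. The names `Acceptor` and `propose` are hypothetical and are not taken from the post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None  # (number, value) last accepted

    def prepare(self, n: int):
        """Phase 1 (Prepare/Promise): promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        """Phase 2 (Propose/Accept): accept unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n: int, value: str) -> Optional[str]:
    """Run both phases for proposal n; return the chosen value, or None if rejected."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "a")    # chooses "a"
second = propose(acceptors, 2, "b")   # must keep the already chosen "a"
stale = propose(acceptors, 1, "c")    # rejected: number 1 is now too low
```

The second proposal shows the safety property: once a value has been accepted by a majority, later proposals can only re-propose that value.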

Understanding the Paxos Consensus Algorithm

The Paxos consensus algorithm is a fundamental concept in distributed computing that ensures a group of distributed systems can agree on a single value, even in the presence of failures. Developed by Leslie Lamport, Paxos is widely used in systems where consistency and fault tolerance are critical, such as databases and distributed ledgers. Consensus Problem

Sybil Attack 101

Distributed systems, such as peer-to-peer networks and other decentralized platforms, have become increasingly popular due to their potential to offer more robust, scalable, and secure solutions. However, these systems face unique challenges and vulnerabilities, one of which is the Sybil attack. Named after the psychiatric case study “Sybil,” in which a person exhibits multiple

Byzantine Faults 101

Distributed systems are becoming increasingly important in various applications, such as cloud computing and peer-to-peer networks. One of the challenges in designing robust distributed systems is dealing with Byzantine faults, a type of fault that can be particularly difficult to detect and handle. Byzantine faults, named after the Byzantine Generals’ Problem, involve components of

Consensus Algorithm 101

Consensus algorithms play a crucial role in the functioning of decentralized networks, such as blockchain-based systems. They help maintain the integrity, security, and reliability of these networks by ensuring that all participants agree on the state of the system. In this post, we will explore the concept of consensus algorithms, their importance, and some of

Do big data stream processing in the stream way

Reading: Years in Big Data. Months with Apache Flink. 5 Early Observations With Stream Processing: https://data-artisans.com/blog/early-observations-apache-flink. The article suggests adopting the right solution, Flink, for big data processing. Flink is interesting and built for stream processing. The broader takeaway may be to solve problems using the right solution. We saw many painful

How to handle missing blocks and blocks with corrupt replicas in HDFS?

On one HDFS cluster, hdfs dfsadmin -report shows: Under replicated blocks: 139016 Blocks with corrupt replicas: 9 Missing blocks: 0 The “Under replicated blocks” can be re-replicated automatically after some time. How should the missing blocks and the blocks with corrupt replicas be handled? Understanding these blocks A block is “with corrupt replicas” in HDFS
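
To keep an eye on the three counters quoted in the excerpt above, one option is to parse them out of the report text. The sketch below is only an illustration: the helper name `parse_block_counters` is hypothetical, and the sample string is built from the numbers in the excerpt with the assumption that each counter appears as `name: number`. The exact report layout may differ between Hadoop versions.

```python
import re

# Counter names exactly as they appear in the excerpt above.
COUNTERS = (
    "Under replicated blocks",
    "Blocks with corrupt replicas",
    "Missing blocks",
)


def parse_block_counters(report: str) -> dict[str, int]:
    """Extract the block counters from report text (hypothetical helper)."""
    counts = {}
    for name in COUNTERS:
        match = re.search(re.escape(name) + r":\s*(\d+)", report)
        if match:
            counts[name] = int(match.group(1))
    return counts


# Sample text assembled from the numbers in the excerpt (layout is an assumption).
sample = (
    "Under replicated blocks: 139016\n"
    "Blocks with corrupt replicas: 9\n"
    "Missing blocks: 0\n"
)
counts = parse_block_counters(sample)

# Under-replicated blocks recover on their own; flag only the other two counters.
needs_attention = {
    name: counts[name]
    for name in ("Blocks with corrupt replicas", "Missing blocks")
    if counts.get(name, 0) > 0
}
```

In practice the text would come from the output of `hdfs dfsadmin -report`, for example captured with `subprocess.run`.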

HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks

After a node failure and a restart of HDFS, the NameNode reports: “The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.” in the log. Why does this happen, and how can it be fixed? Why the NameNode stays in safe mode:
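
The numbers in the log message above can be checked with a little arithmetic. The sketch below assumes the required block count is the threshold fraction of the total, rounded up; that rounding rule is an assumption that happens to reproduce the quoted figure, not something taken from the NameNode source. The function name is hypothetical.

```python
import math


def additional_blocks_needed(reported: int, total: int, threshold: float) -> int:
    """Blocks still missing before the safe mode threshold is reached (assumed rule)."""
    required = math.ceil(total * threshold)  # 0.9990 * 1975856 = 1973880.144 -> 1973881
    return max(0, required - reported)


# Values taken from the log message quoted above.
needed = additional_blocks_needed(reported=1968810, total=1975856, threshold=0.9990)
```

Here `needed` comes out as 5071, matching the log: the NameNode is waiting for DataNodes to report that many more blocks before it leaves safe mode on its own.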

How to understand some key system consistency algorithms

When we design a system, we may want it to be consistent, scalable, and so on. There are several well-known consistency algorithms. How can we understand them easily? 1. Paxos and its extensions 2. Replicated State Machine mechanisms 3. Quorum You are welcome to add other well-known consistency algorithms and your understanding of them ;-) Reading text books

What’s the difference between Reliability, Durability, and Availability for data storage systems?

Reliability, durability, and availability are important concepts in distributed systems such as the Hadoop Distributed File System and the Google File System. Answer from http://www.quora.com/Whats-the-difference-between-Reliability-Durability-and-Availability-for-data-storage-system The difference between durability and availability is fairly simple. Durability is about what happens when all power goes out everywhere. Has all data been written to stable storage that doesn’t require power (e.g. disk/flash), in

Consistency models for distributed systems

Which consistency models are used for distributed systems? Papers that survey the consistency models: Robert C. Steinke and Gary J. Nutt. 2004. A unified theory of shared memory consistency. J. ACM 51, 5 (September 2004), 800-849. DOI=10.1145/1017460.1017464 http://doi.acm.org/10.1145/1017460.1017464 David Mosberger. 1993. Memory consistency models. SIGOPS Oper. Syst. Rev. 27, 1 (January 1993), 18-26. DOI=10.1145/160551.160553

Transactional memory learning materials

I want to learn transactional memory technologies. Any suggestions on transactional memory learning materials? Thanks! I highly suggest the Transactional Memory lecture by James R. Larus and Ravi Rajwar in the Synthesis Lectures on Computer Architecture series. The Transactional Memory lecture: http://www.morganclaypool.com/doi/abs/10.2200/S00070ED1V01Y200611CAC002 Link to the PDF: http://www.morganclaypool.com/doi/pdf/10.2200/S00070ED1V01Y200611CAC002

Notes for Beginners of Software Development on Linux

Linux is a great platform for software development targeting servers or backends. In general, working on Linux is very productive. The problem that beginners on Linux face is that the learning curve is steep at the beginning. But believe me, after you get through the initial green steep learning step as in the figure below

Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean

You can download the slides from Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. These slides contain the “Numbers everyone should know”, which everyone working on systems should be familiar with. Numbers Everyone Should Know L1 cache reference 0.5 ns Branch

Reading List for Distributed Systems and Cloud Computing

Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help a lot for those who are just starting out in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for systems research on cloud computing. The reading