Large-scale Data Storage and Processing Systems in Datacenters

By Eric Ma, Dec 11, 2012 (updated Aug 30, 2020)

Research on cloud computing has made great progress, and many excellent large-scale systems have been designed in recent years. I have compiled the following list of large-scale data storage and processing systems used in datacenters.

Storage systems

Google File System (GFS): http://research.google.com/archive/gfs.html
HDFS implementation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
Colossus (GFS2): "Colossus: Successor to the Google File System (GFS)"
BigTable: http://research.google.com/archive/bigtable.html
Megastore: http://research.google.com/pubs/pub36971.html
Spanner: http://research.google.com/archive/spanner.html
Dynamo: http://dl.acm.org/citation.cfm?id=1294281
RAMCloud: http://dl.acm.org/citation.cfm?id=1965751 and http://dl.acm.org/citation.cfm?id=2043560

Compute systems

MapReduce: http://research.google.com/archive/mapreduce.html
Hadoop implementation: Hadoop MapReduce Tutorials
Sawzall: http://research.google.com/archive/sawzall.html
FlumeJava: http://dl.acm.org/citation.cfm?id=1806638
Pig Latin: http://dl.acm.org/citation.cfm?id=1376726
Dryad/DryadLINQ: http://research.microsoft.com/en-us/projects/dryad/
Pregel: http://dl.acm.org/citation.cfm?id=1807184 and http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
Dremel: http://research.google.com/pubs/pub36632.html
Storm: https://blog.twitter.com/2011/a-storm-is-coming-more-details-and-plans-for-release and https://github.com/nathanmarz/storm/wiki
Spark: https://www.usenix.org/conference/nsdi12/resilient-distributed-datasets-fault-tolerant-abstraction-memory-cluster-computing and http://spark-project.org/
DVM: IEEE Transactions on Computers paper and VEE paper

Resource management

Mesos: http://mesos.apache.org/documentation/latest/architecture/

Comments

Zhiqiang Ma (Aug 8, 2013):

Facebook's Memcache and TAO are also very interesting, scalable, real-world systems: http://www.systutorials.com/qa/364/cache-at-facebook
