Colossus: Successor to the Google File System (GFS)
Colossus is the successor to the Google File System (GFS), as mentioned in the Spanner paper at OSDI 2012. Colossus is also used by Spanner to store its tablets. Public information about Colossus is slim compared with GFS, which was described in a paper at SOSP 2003. Still, there is some information about Colossus on the Web; I list some of it here.
Storage Architecture and Challenges
A talk at the Google Faculty Summit on July 29, 2010, by Andrew Fikes, Principal Engineer. The slides are available.
Some interesting points:
- Storage Software: Colossus
  - Next-generation cluster-level file system
  - Automatically sharded metadata layer
  - Data typically written using Reed-Solomon (1.5x)
  - Client-driven replication, encoding and replication
  - Metadata space has enabled availability analyses
- Why Reed-Solomon?
  - Cost, especially with cross-cluster replication
  - Field data and simulations show improved MTTF
  - More flexible cost vs. availability choices (a worked example of the 1.5x overhead follows this list)
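The 1.5x figure follows from the data-to-parity ratio of the code. Below is a minimal sketch of the arithmetic in Python, assuming a hypothetical 6-data/3-parity block split; the slides state only the overall overhead, not the exact parameters:

```python
# Minimal sketch, not Google code: the (6 data, 3 parity) split is an
# assumption; the slide only gives the 1.5x overall overhead.

def storage_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_blocks + parity_blocks) / data_blocks

# A 6+3 Reed-Solomon code survives any 3 lost blocks at 9/6 = 1.5x.
print(storage_overhead(6, 3))  # 1.5

# Triple replication survives any 2 lost copies but costs 3.0x.
print(storage_overhead(1, 2))  # 3.0
```

Under these assumed parameters, encoding halves the raw storage of 3x replication while tolerating one more simultaneous block loss, which matches the cost and MTTF points above.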
A peek behind the VM at the Google Storage infrastructure
An online talk on Google Cloud storage by Dean Hildebrand, Technical Director, and Denis Serenyi, Tech Lead, Google Cloud Storage. The talk gives quite a few details on how Colossus works. View the online talk.
- Since the GFS days, Google has scaled up a lot and there is far more data to store; this higher level of scale drove the creation of Colossus.
- Colossus client: probably the most complex part of the system
  - lots of functions go directly into the client, such as:
    - software RAID (see the encoding sketch after this list)
    - application-chosen encodings
- Curators: the foundation of Colossus, its scalable metadata service
  - can scale out horizontally (see the shard-routing sketch after this list)
  - built on top of a NoSQL database like BigTable
  - allow Colossus to scale up to over 100x the size of the largest GFS clusters
- D servers: simple network-attached disks
- Custodians: background storage managers that handle tasks such as disk-space balancing and RAID reconstruction
  - ensure durability and availability
  - ensure the system works efficiently
- Data: there is hot data (e.g. newly written data) and cold data
- Mixing flash and spinning disks yields a really efficient storage organization:
  - just enough flash to serve the required I/O density per gigabyte of data
  - just enough disks to fill them all up
  - flash serves the really hot data at lower latency
- Regarding disks (see the placement sketch after this list):
  - equal amounts of hot data across disks, so each disk gets roughly the same bandwidth
  - new writes are spread evenly across all the disks so all the spindles stay busy
  - the rest of each disk is filled with cold data
  - older cold data is moved to bigger drives so the disks stay full
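To make the client-side "software RAID" point concrete, here is a hypothetical sketch in which the client itself splits a write into shards and computes a parity shard before sending each one to a different D server. Colossus clients use Reed-Solomon encodings; plain XOR parity is used here only to keep the sketch short, and all names are illustrative:

```python
# Hypothetical sketch: the *client* does the encoding work before shipping
# shards to D servers. XOR parity stands in for the real Reed-Solomon codes.

def encode_stripe(data: bytes, num_data: int) -> list[bytes]:
    """Split data into num_data shards plus one XOR parity shard."""
    shard_len = -(-len(data) // num_data)  # ceiling division
    shards = [
        bytearray(data[i * shard_len:(i + 1) * shard_len].ljust(shard_len, b"\0"))
        for i in range(num_data)
    ]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return [bytes(s) for s in shards] + [bytes(parity)]

def rebuild(shards: list[bytes], lost: int) -> bytes:
    """Recover one lost shard by XOR-ing all surviving shards."""
    out = bytearray(len(shards[0]))
    for index, shard in enumerate(shards):
        if index != lost:
            for i, byte in enumerate(shard):
                out[i] ^= byte
    return bytes(out)

stripe = encode_stripe(b"the colossus client encodes", 3)
assert rebuild(stripe, lost=1) == stripe[1]  # any one shard is recoverable
```

Because encoding and recovery run in the client, the storage servers can stay simple, which is consistent with the "D servers are simple network-attached disks" point.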
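Similarly, a hedged sketch of how a sharded metadata layer might route a lookup to one of many curators. The shard count and hash-based routing below are assumptions (real curators could just as well partition by key range; the talk says only that they scale horizontally on a BigTable-like store):

```python
# Hypothetical shard routing; NUM_CURATOR_SHARDS and curator_for are
# invented names, not Colossus APIs.

import hashlib

NUM_CURATOR_SHARDS = 128  # assumed shard count

def curator_for(path: str) -> int:
    """Deterministically map a file path to the curator shard that owns it."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CURATOR_SHARDS

# Any client can locate metadata without asking a single central master,
# which is what lets the metadata layer scale out.
print(curator_for("/example/home/user/logs/part-00000"))
```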
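Finally, a toy illustration of "equal amounts of hot data across disks": a greedy placer that puts each newly written (hot) chunk on the disk currently holding the least hot bytes, so every spindle ends up with roughly the same bandwidth demand. The policy and the numbers are invented for illustration:

```python
# Hypothetical placement policy; real Colossus rebalancing is richer
# (custodians also migrate cold data to bigger drives in the background).

import heapq

def place_hot_chunks(num_disks: int, chunk_sizes: list[int]) -> dict[int, int]:
    """Return hot bytes per disk after greedy least-loaded placement."""
    heap = [(0, disk) for disk in range(num_disks)]  # (hot_bytes, disk_id)
    heapq.heapify(heap)
    hot_bytes = dict.fromkeys(range(num_disks), 0)
    for size in chunk_sizes:
        load, disk = heapq.heappop(heap)
        hot_bytes[disk] = load + size
        heapq.heappush(heap, (hot_bytes[disk], disk))
    return hot_bytes

print(place_hot_chunks(4, [64, 64, 128, 64, 64, 128]))
# hot bytes stay roughly equal across the 4 disks
```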
GFS: Evolution on Fast-forward
An interview with Google’s Sean Quinlan, published in ACM Queue by the Association for Computing Machinery (ACM).
Some important info:
- “We also ended up doing what we call a “multi-cell” approach, which basically made it possible to put multiple GFS masters on top of a pool of chunkservers.”
- “We also have something we called Name Spaces, which are just a very static way of partitioning a namespace that people can use to hide all of this from the actual application.” … “a namespace file describes …” (a resolution sketch follows this list)
- “The distributed master certainly allows you to grow file counts, in line with the number of machines you’re willing to throw at it.” … “Our distributed master system that will provide for 1-MB files is essentially a whole new design. That way, we can aim for something on the order of 100 million files per master. You can also have hundreds of masters.”
- The interview describes BigTable “as one of the major adaptations made along the way to help keep GFS viable in the face of rapid and widespread change.”
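A small sketch of how the static namespace partitioning Quinlan describes might look: a namespace file that maps path prefixes to GFS cells, so an application sees one tree while its files live under several masters. The mapping contents and function names here are invented:

```python
# Hypothetical contents of a "namespace file"; the real format is not public.
NAMESPACE_FILE = {
    "/gfs/logs/":   "cell-a",
    "/gfs/photos/": "cell-b",
    "/gfs/crawl/":  "cell-c",
}

def resolve_cell(path: str) -> str:
    """Longest-prefix match of a path against the namespace file."""
    for prefix in sorted(NAMESPACE_FILE, key=len, reverse=True):
        if path.startswith(prefix):
            return NAMESPACE_FILE[prefix]
    raise KeyError(f"no cell owns {path}")

print(resolve_cell("/gfs/logs/2009/08/part-00001"))  # -> cell-a
```

A static table like this is what makes it possible to put multiple GFS masters on top of a pool of chunkservers while applications remain unaware of the partitioning.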
Google File System II: Dawn of the Multiplying Master Nodes
Comments on GFS2 (Colossus) by Cade Metz, writing from San Francisco for The Register. The article, with some excerpts.
This page is linked by a director at Google (https://www.linkedin.com/in/mbinde/) as a reference for Colossus!
Glad this article is referenced by the “Google Cloud Platform” Medium article “The 12 Components of Google BigQuery” (https://medium.com/google-cloud/the-12-components-of-google-bigquery-c2b49829a7c7), in its section on the opinionated managed storage engine: “Colossus is Google’s successor to GFS”.
The Google Cloud Next 2020 virtual conference’s Infrastructure week had a session giving an overview of Colossus: “A peek behind the VM at the Google Storage infrastructure” (presenters: Dean Hildebrand, Technical Director, and Denis Serenyi, Tech Lead, Google Cloud Storage): https://www.youtube.com/watch?v=q4WC_6SzBz4
Good info! Thanks, Tuomas. I added the link and a digest of the content related to Colossus.