Colossus: Successor to the Google File System (GFS)

Posted in Storage systems, Systems

Colossus is the successor to the Google File System (GFS), as mentioned in the Spanner paper at OSDI 2012; Spanner uses Colossus to store its tablets. Information about Colossus is slim compared with GFS, which was described in a paper at SOSP 2003, but there is still some information about Colossus on the Web. Here, I list some of it.

Storage Architecture and Challenges

A talk at the Faculty Summit, July 29, 2010, by Andrew Fikes, Principal Engineer. See the slides.

Some interesting points:

  • Storage Software: Colossus
    • Next-generation cluster-level file system
    • Automatically sharded metadata layer
    • Data typically written using Reed-Solomon (1.5x); see the overhead sketch after this list
    • Client-driven replication, encoding, and recovery
    • Metadata space has enabled availability analyses
  • Why Reed-Solomon?
    • Cost. Especially w/ cross cluster replication.
    • Field data and simulations show improved MTTF
    • More flexible cost vs. availability choices
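
The 1.5x figure is easy to sanity-check against plain replication. The Python sketch below is only illustrative: Google has not published the exact code parameters, and RS(6,3) (6 data shards + 3 parity shards) is simply one assumption that yields a 1.5x raw-storage overhead, versus 3x for triple replication.

```python
# Rough raw-storage overhead comparison: n-way replication vs. Reed-Solomon.
# RS(6,3) here is an illustrative assumption, not a documented Colossus setting.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte with n-way replication."""
    return float(copies)

def rs_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte with a Reed-Solomon (data+parity, data) code."""
    return (data_shards + parity_shards) / data_shards

if __name__ == "__main__":
    print(f"3x replication:        {replication_overhead(3):.2f}x raw storage")
    print(f"RS(6 data + 3 parity): {rs_overhead(6, 3):.2f}x raw storage")
```

Under these assumed parameters, an RS(6,3) chunk survives the loss of any 3 of its 9 shards, while 3-way replication survives the loss of only 2 of its 3 copies, which is consistent with the slide's claim of improved MTTF at roughly half the storage cost.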

A peek behind the VM at the Google Storage infrastructure

An online talk on Google Cloud Storage by Dean Hildebrand, Technical Director, and Denis Serenyi, Tech Lead, Google Cloud Storage, from the Google Cloud Next 2020 virtual conference (Infrastructure week). The talk gives quite a few details on how Colossus works. View the online talk: https://www.youtube.com/watch?v=q4WC_6SzBz4

  • Since the GFS days, Google has scaled a lot and there is much more data to store; this higher level of scale drove the creation of Colossus
  • Colossus client: probably the most complex part of the system
    • lots of functionality lives directly in the client, such as
      • software RAID
      • the encoding chosen by the application
  • Curators: the foundation of Colossus, its scalable metadata service
    • can scale out horizontally
    • built on top of a NoSQL database such as BigTable
    • allow Colossus to scale up to over 100x the size of the largest GFS clusters
  • D servers: simple network-attached disks
  • Custodians: background storage managers that handle tasks such as disk-space balancing and RAID reconstruction
    • ensure durability and availability
    • ensure the system is working efficiently
  • Data: there are hot data (e.g. newly written data) and cold data
  • Mixing flash and spinning disks (see the placement sketch after this list)
    • really efficient storage organization
      • just enough flash to push the I/O density per gigabyte of data
      • just enough disks to fill them all up
    • use flash to serve really hot data, and lower latency
    • as for the disks
      • equal amounts of hot data across disks
        • each disk has roughly same bandwidth
        • spreads new writes evenly across all the disks so disk spindles are busy
      • rest of disks filled with cold data
        • moves older cold data to bigger drives so disks are full
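
The hot/cold placement described above can be made concrete with a toy policy. Everything in this sketch is my own simplification for illustration, not Colossus code: the Disk class and the two heuristics (send new, hot writes to the disk with the least hot data; migrate cold data to the disk with the most free space) are assumptions that only capture the idea from the talk that hot bytes should be spread evenly so every spindle shares the I/O load, while cold bytes fill the remaining capacity of the bigger drives.

```python
# Toy data-placement policy: balance hot bytes across disks, fill the rest
# with cold bytes. The Disk model and heuristics are hypothetical.
from dataclasses import dataclass

@dataclass
class Disk:
    capacity_gb: float
    hot_gb: float = 0.0
    cold_gb: float = 0.0

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.hot_gb - self.cold_gb

def place_hot(disks: list[Disk], size_gb: float) -> Disk:
    """New (hot) writes go to the disk currently holding the least hot data."""
    target = min(disks, key=lambda d: d.hot_gb)
    target.hot_gb += size_gb
    return target

def place_cold(disks: list[Disk], size_gb: float) -> Disk:
    """Older (cold) data migrates to the disk with the most free space."""
    target = max(disks, key=lambda d: d.free_gb)
    target.cold_gb += size_gb
    return target

if __name__ == "__main__":
    disks = [Disk(4000), Disk(8000), Disk(8000)]  # one 4 TB and two 8 TB drives
    for _ in range(30):
        place_hot(disks, 10)    # fresh writes end up spread evenly by hot bytes
    for _ in range(100):
        place_cold(disks, 40)   # cold data drifts toward the bigger drives
    for i, d in enumerate(disks):
        print(f"disk{i}: hot={d.hot_gb:.0f} GB  cold={d.cold_gb:.0f} GB  free={d.free_gb:.0f} GB")
```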

GFS: Evolution on Fast-forward

An interview with Google’s Sean Quinlan by the Association for Computing Machinery (ACM).

View the interview.

Some important info:

  • “We also ended up doing what we call a “multi-cell” approach, which basically made it possible to put multiple GFS masters on top of a pool of chunkservers.”
  • “We also have something we called Name Spaces, which are just a very static way of partitioning a namespace that people can use to hide all of this from the actual application.” … “a namespace file describes” (a minimal sketch of this static partitioning follows this list)
  • “The distributed master certainly allows you to grow file counts, in line with the number of machines you’re willing to throw at it.” … “Our distributed master system that will provide for 1-MB files is essentially a whole new design. That way, we can aim for something on the order of 100 million files per master. You can also have hundreds of masters.”
  • BigTable is described “as one of the major adaptations made along the way to help keep GFS viable in the face of rapid and widespread change.”
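
The static “Name Spaces” partitioning Quinlan describes can be sketched as a prefix-to-cell lookup table. The paths, cell names, and lookup function below are hypothetical; they only illustrate the idea that a static namespace file maps parts of one logical namespace onto multiple GFS masters/cells, hiding the multi-cell layout from applications.

```python
# Hypothetical sketch of static namespace partitioning: a fixed table maps
# path prefixes to GFS cells, so applications see a single namespace.
NAMESPACE_TABLE = {
    "/gfs/logs/":    "cell-a",
    "/gfs/indices/": "cell-b",
    "/gfs/scratch/": "cell-c",
}

def resolve_cell(path: str) -> str:
    """Return the cell that owns the longest matching prefix of `path`."""
    matches = [prefix for prefix in NAMESPACE_TABLE if path.startswith(prefix)]
    if not matches:
        raise KeyError(f"no cell owns {path!r}")
    return NAMESPACE_TABLE[max(matches, key=len)]

if __name__ == "__main__":
    print(resolve_cell("/gfs/logs/2010/07/29/frontend.log"))  # -> cell-a
```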

Google File System II: Dawn of the Multiplying Master Nodes

An article with comments on GFS2 (Colossus), by Cade Metz in San Francisco.

See the article and some excerpts.


