Microsoft’s Cosmos: A Petabyte-Scale Distributed Analytics Platform
Cosmos is Microsoft’s internal petabyte-scale distributed data storage and query system, built to handle massive analytical workloads across the company’s infrastructure. It should not be confused with Azure Cosmos DB, Microsoft’s externally offered NoSQL database service; the system described here is a batch analytics platform. While Cosmos itself has never been the subject of a comprehensive technical paper, details have surfaced over the years through conference talks, engineering blogs, and published research on its components, notably Dryad (EuroSys 2007) and SCOPE (VLDB 2008).
Architecture and Scale
Cosmos operates at truly massive scale:
- Storage capacity: Approximately 62 physical petabytes of data (roughly 275 logical petabytes accounting for replication and encoding)
- Compute infrastructure: Tens of thousands of machines distributed across multiple datacenters
- Processing model: Massively parallel processing based on Dryad, a directed acyclic graph (DAG) execution engine developed by Microsoft Research
The key distinction from simpler MapReduce systems is Cosmos’s ability to represent arbitrary computational DAGs rather than simple map-reduce chains. This flexibility allows for more complex data transformations and optimizations. The system also implements automatic computation placement that considers data locality — a critical optimization at this scale to minimize network overhead.
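To make the placement idea concrete, here is a minimal, hypothetical Python sketch (not Cosmos code; all names are invented for illustration) of a scheduler that walks a DAG in dependency order and prefers machines that already hold a task’s input data, falling back to the least-loaded machine:

```python
from collections import defaultdict, deque

def topological_order(dag):
    """Return tasks in dependency order; dag maps task -> set of upstream tasks."""
    indegree = {t: len(deps) for t, deps in dag.items()}
    downstream = defaultdict(list)
    for t, deps in dag.items():
        for d in deps:
            downstream[d].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in downstream[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

def place_tasks(dag, input_location, machines):
    """Assign each task to a machine, preferring data locality.

    input_location maps a task to the machines already holding its input;
    choosing one of those machines models the network-cost avoidance
    described above.
    """
    load = {m: 0 for m in machines}
    placement = {}
    for task in topological_order(dag):
        local = [m for m in input_location.get(task, []) if m in load]
        candidates = local or machines  # prefer machines with the data
        chosen = min(candidates, key=lambda m: load[m])
        placement[task] = chosen
        load[chosen] += 1
    return placement

# Example: extract -> aggregate -> output, with inputs resident on m1/m2.
dag = {"extract": set(), "aggregate": {"extract"}, "output": {"aggregate"}}
print(place_tasks(dag, {"extract": ["m1", "m2"]}, ["m1", "m2", "m3"]))
```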
SCOPE Query Language
Data queries in Cosmos use SCOPE (Structured Computations Optimized for Parallel Execution), a SQL-like language designed for set-oriented operations on records and columns. SCOPE scripts are automatically compiled and optimized for execution across the Dryad cluster, abstracting the complexity of distributed computation away from the analyst.
The language feels familiar to SQL users while providing explicit control over parallelization strategies when needed. The compiler handles the heavy lifting of translating declarative queries into efficient DAG-based execution plans.
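As a rough illustration of what such a compiler produces, the following Python sketch (hypothetical; SCOPE’s real optimizer is far more sophisticated) shows the staged plan a declarative group-by typically lowers to on a DAG engine: a partial aggregate on each input partition, a shuffle by key, then a final merge:

```python
from collections import Counter, defaultdict

def partial_aggregate(partition):
    """Stage 1: count keys locally within one input partition."""
    return Counter(record["query"] for record in partition)

def shuffle(partials, n_reducers):
    """Stage 2: route each key's partial counts to a deterministic reducer."""
    buckets = [defaultdict(int) for _ in range(n_reducers)]
    for partial in partials:
        for key, count in partial.items():
            buckets[hash(key) % n_reducers][key] += count
    return buckets

def final_aggregate(bucket):
    """Stage 3: merge partial counts into final totals."""
    return dict(bucket)

# Declarative intent: SELECT query, COUNT(*) FROM log GROUP BY query
partitions = [
    [{"query": "weather"}, {"query": "news"}],
    [{"query": "weather"}, {"query": "maps"}],
]
partials = [partial_aggregate(p) for p in partitions]
results = [final_aggregate(b) for b in shuffle(partials, n_reducers=2)]
print(results)
```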
Resource Management
Cosmos manages resource allocation through a virtual cluster abstraction:
- Dedicated allocation: Teams can provision a guaranteed number of compute resources by providing their own hardware to the Cosmos pool
- Burst capacity: When allocated resources aren’t fully utilized, teams can use excess capacity from other clusters
- Multi-tenancy: Hundreds of virtual clusters run simultaneously across the infrastructure, with sophisticated scheduling to balance guaranteed and burst workloads
This model allows efficient resource utilization while giving teams predictable baselines for their critical workloads.
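A minimal sketch of the guaranteed-plus-burst idea (invented names and numbers, not the actual Cosmos scheduler): each virtual cluster first receives its guarantee, capped by what it actually demands, and any idle slots are then handed out as burst capacity to clusters that want more:

```python
def allocate(total_slots, clusters):
    """clusters: list of (name, guaranteed, demanded). Returns {name: granted}."""
    granted = {}
    # Pass 1: every virtual cluster receives its guarantee, capped by demand.
    for name, guarantee, demand in clusters:
        granted[name] = min(guarantee, demand)
    spare = total_slots - sum(granted.values())
    # Pass 2: hand spare slots out round-robin as burst capacity
    # to clusters whose demand exceeds what they hold so far.
    hungry = [(n, d) for n, g, d in clusters if d > granted[n]]
    while spare > 0 and hungry:
        still_hungry = []
        for name, demand in hungry:
            if spare == 0:
                break
            granted[name] += 1
            spare -= 1
            if granted[name] < demand:
                still_hungry.append((name, demand))
        hungry = still_hungry
    return granted

# Three teams on a 100-slot pool: guarantees of 40/30/10, uneven demand.
print(allocate(100, [("ads", 40, 20), ("search", 30, 70), ("maps", 10, 10)]))
```

In this toy run, the ads cluster leaves 20 guaranteed slots idle, and the search cluster absorbs them (plus the unreserved pool) as burst capacity, which is exactly the utilization win the virtual cluster model aims for.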
Data Integration
A key value proposition of Cosmos is enabling cross-dataset analysis. By providing ubiquitous access to datasets from across Microsoft’s Online Services Division (OSD), teams can combine knowledge from multiple sources to derive insights that would be impossible from isolated datasets. This capability became central to Microsoft’s big data analytics strategy.
Cosmos represents an engineering solution to the fundamental problem of operating analytics at petabyte scale: efficient data movement, intelligent query compilation, and flexible resource management across thousands of machines. While cloud services such as Azure Data Lake Analytics (whose U-SQL language descends from SCOPE) and Azure Synapse now offer similar capabilities to external customers, Cosmos remains a testament to the infrastructure required to support modern enterprise-scale analytics.
2026 Best Practices and Advanced Techniques
Understanding both the fundamentals of Cosmos and modern operational practices helps you work efficiently and avoid common pitfalls. This section extends the core article with practical advice for 2026 workflows.
Troubleshooting and Debugging
When issues arise, a systematic approach saves time. Start by checking logs for error messages or warnings. Test individual components in isolation before integrating them. Use verbose modes and debug flags to gather more information when standard output is not enough to diagnose the problem.
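For example, a debug flag that raises log verbosity is a cheap way to follow this systematic approach. This generic Python sketch (not tied to any particular system) shows the pattern:

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true", help="enable verbose logging")
args = parser.parse_args()

# Verbose mode surfaces the detail needed to isolate a failing component.
logging.basicConfig(
    level=logging.DEBUG if args.debug else logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("myjob")

log.debug("connecting to store")       # only visible with --debug
log.warning("retrying after timeout")  # always visible
```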
Performance Optimization
- Monitor system resources to identify bottlenecks
- Use caching strategies to reduce redundant computation
- Keep software updated for security patches and performance improvements
- Profile code before applying optimizations
- Use connection pooling and keep-alive for network operations
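Two of these points translate directly into code. As a sketch, Python’s standard library provides memoization for redundant computation, and the widely used third-party requests library (an assumption about your stack) pools connections with keep-alive via a Session:

```python
from functools import lru_cache

import requests  # third-party; a Session pools connections per host

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    """Cached: repeated calls with the same key skip the slow path."""
    return key.upper()  # stand-in for a costly computation

session = requests.Session()  # reuses TCP connections (keep-alive) across calls
for _ in range(3):
    response = session.get("https://example.com/")  # placeholder endpoint
    response.raise_for_status()
```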
Security Considerations
Security should be built into workflows from the start. Use strong authentication methods, encrypt sensitive data in transit, and follow the principle of least privilege for access controls. Regular security audits and penetration testing help maintain system integrity.
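As a small illustration of encrypting data in transit, this Python sketch opens a TLS connection with certificate and hostname verification enabled, which is the default behavior of the standard library’s ssl module:

```python
import socket
import ssl

# create_default_context() verifies the server certificate and hostname,
# and negotiates a modern TLS version by default.
context = ssl.create_default_context()

with socket.create_connection(("example.com", 443)) as raw:
    with context.wrap_socket(raw, server_hostname="example.com") as tls:
        print("negotiated:", tls.version())  # e.g. TLSv1.3
```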
Related Tools and Commands
These complementary tools expand your capabilities:
- Monitoring: top, htop, iotop, vmstat for system resources
- Networking: ping, traceroute, ss, tcpdump for connectivity
- Files: find, locate, fd for searching; rsync for syncing
- Logs: journalctl, dmesg, tail -f for real-time monitoring
- Testing: curl for HTTP requests, nc for port checks, openssl s_client for inspecting TLS endpoints
Integration with Modern Workflows
Consider automation and containerization for consistency across environments. Infrastructure as code tools enable reproducible deployments. CI/CD pipelines automate testing and deployment, reducing human error and speeding up delivery cycles.
Closing Notes
This extended guide goes beyond the scope of the original article. For specialized needs, consult official documentation or community resources, and practice in a test environment before deploying to production.
