In Hadoop, I need to skip the mapper function and directly execute the reducer function. We are doing this to improve Hadoop performance: if the Hadoop framework is used to analyze the same data sets, the mapper's output will be the same for different kinds of jobs. To avoid redundant computation of the same results, I am planning to
Tag: Spark
What are the DDL and DML of Shark (Spark SQL)?
Currently, I want to use Shark's (Spark SQL) DDL and DML as a reference for designing and implementing SQLE's DDL and DML. However, I cannot find its DDL and DML; I can only find several SQL statements in the Shark paper [1]. [1] Shark paper: http://tab.d-thinker.org/showthread.php?tid=2585 Shark's language is Hive QL. HQL's DDL and DML can be found at Hive
Big Data Benchmark from AMPLab of UC Berkeley
Benchmarks are important for understanding performance and for comparing different systems quantitatively and qualitatively. Many analytic frameworks, such as Hive, Impala and Shark, have been designed and implemented in recent years and have become fundamental software for processing big data. How to benchmark these big data analytic systems is an interesting problem. The Big Data Benchmark The
Large-scale Data Storage and Processing System in Datacenters
Research on cloud computing has made great progress, and many excellent large-scale systems have been designed in recent years. I compiled a list of some large-scale data storage and processing systems in datacenters as follows. Storage systems Google File System (GFS): http://research.google.com/archive/gfs.html HDFS implementation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html Colossus (GFS2): Colossus: Successor to the Google File System (GFS)
Reading List for Distributed Systems and Cloud Computing
Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help a lot for those who are just starting out in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for systems research on cloud computing. The reading