Open distributed compute enabled by Mesos

Mesos is a framework for providing generalized compute services, with networking and storage resources managed around a compute unit. Corporations spend a significant amount of time and resources managing data centers with outdated applications and processes. Mesos could simplify data center management for anyone willing to view the workload as a collection of tasks whose durations range from very short to long-running. I have some experience in this area: I was one of the lead architects of the Sun Cloud project, the first public cloud, which addressed many of the same issues Mesos addresses. I also find that Mesos has many similarities to LSF and N1 Grid Engine.

One of the most interesting talks at the conference was given by Neha Narula. Neha covered topics that usually come up in discussions of the CAP theorem and distributed transaction systems, although she never mentioned the CAP theorem by name; she did discuss the need to consider the cost of scaling. Are we building systems to scale unnecessarily? I refer to the many approaches in which one knows the system doesn’t work under certain conditions as “Approximate Computing.” Neha provided a wealth of links to current academic and commercial research:

  1. https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud+Papers
  2. https://syslab.cs.washington.edu/papers/tapir-tr14.pdf

Some readings on the larger problems that Mesos doesn’t address:

  • Holistic Configuration Management at Facebook
    • Chunqiang Tang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Je Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl (Facebook Inc.)
  • Using Crash Hoare Logic for Certifying the FSCQ File System
    • Haogang Chen, Daniel Ziegler, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich (MIT CSAIL)
  • JouleGuard: Energy Guarantees for Approximate Applications
    • Henry Hoffmann (University of Chicago)
  • Building Consistent Transactions with Inconsistent Replication
    • Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan R. K. Ports (University of Washington)
  • How to Get More Value From Your File System Directory Cache
    • Chia-Che Tsai, Yang Zhan, Jayashree Reddy, Yizheng Jiao, Tao Zhang, and Donald E. Porter (Stony Brook University)
  • Read-Log-Update: A Lightweight Synchronization Mechanism for Concurrent Programming
    • Pascal Felber and Patrick Marlier (UNINE) and Alexander Matveev and Nir Shavit (MIT)
  • Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs
    • Lu Fang, Khanh Nguyen, Guoqing (Harry) Xu, and Brian Demsky (University of California, Irvine) and Shan Lu (University of Chicago)
  • Chaos: Scale-out Graph Processing from Secondary Storage
    • Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel (EPFL)
  • Coz: Finding Code that Counts with Causal Profiling
    • Charlie Curtsinger and Emery D. Berger (University of Massachusetts Amherst)
  • Arabesque: A System for Distributed Graph Pattern Mining
    • Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgios Siganos, Mohammed Zaki, and Ashraf Aboulnaga (Qatar Computing Research Institute)
  • SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems
    • Tom Ridge (University of Leicester), David Sheets (University of Cambridge), Thomas Tuerk (FireEye), Anil Madhavapeddy (University of Cambridge), Andrea Giugliano (University of Leicester), and Peter Sewell (University of Cambridge)
  • Failure Sketching: A Technique for Automated Root Cause Diagnosis of In-Production Failures
    • Baris Kasikci and Benjamin Schubert (EPFL), Cristiano Pereira and Gilles Pokam (Intel Corporation), and George Candea (EPFL)
  • Scalable SQL storage for Web applications using distributed balanced trees
    • Marcos K. Aguilera (VMware Research Group), Joshua B. Leners (UT Austin), and Michael Walfish (NYU)
  • Drowsy Power Management
    • Matthew Lentz, James Litton, and Bobby Bhattacharjee (University of Maryland)
  • Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
    • Jonathan Mace, Ryan Roelke, and Rodrigo Fonseca (Brown University)
  • Implementing Linearizability at Large Scale and Low Latency
    • Collin Lee, Seo Jin Park, and Ankita Kejriwal (Stanford University), Satoshi Matsushita (NEC), and John Ousterhout (Stanford University)
  • Scalable Private Messaging Resistant to Traffic Analysis
    • Jelle van den Hooff, David Lazar, Matei Zaharia, and Nickolai Zeldovich (MIT)
  • Parallelizing user-defined aggregations using symbolic execution
    • Veselin Raychev (ETH Zurich) and Madanlal Musuvathi and Todd Mytkowicz (Microsoft Research)
  • Fast In-memory Transaction Processing using RDMA and RTM
    • Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen (Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University)
  • Virtual CPU Validation
    • Nadav Amit, Dan Tsafrir, and Assaf Schuster (Technion – Israel Institute of Technology) and Ahmad Ayoub and Eran Shlomo (Intel Corporation)
  • Split-Level I/O Scheduling
    • Suli Yang, Tyler Harter, Nishant Agrawal, Salini Selvaraj Kowsalya, Anand Krishnamurthy, and Samer Al-Kiswany (University of Wisconsin-Madison), Rini T. Kaushik (IBM Research – Almaden), and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau (University of Wisconsin-Madison)
  • Cross-checking Semantic Correctness: The Case of Finding File System Bugs
    • Changwoo Min, Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, and Taesoo Kim (Georgia Institute of Technology)
  • E2: A Framework for NFV Applications
    • Shoumik Palkar and Sangjin Han (UC Berkeley), Keon Jang (Intel Labs), Chang Lan and Sylvia Ratnasamy (UC Berkeley), Luigi Rizzo (Università di Pisa), and Scott Shenker (UC Berkeley and ICSI)
  • Opportunistic Storage Maintenance
    • George Amvrosiadis, Ashvin Goel, and Angela Demke Brown (University of Toronto)
  • No compromises: distributed transactions with consistency, availability and performance
    • Aleksandar Dragojevic, Dushyanth Narayanan, Ed Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro (Microsoft Research)
  • Principled and Practical Consistency Analyses in a Large Social Graph
    • Haonan Lu (University of Southern California and Facebook), Philippe Ajoux and Kaushik Veeraraghavan (Facebook), Wyatt Lloyd (University of Southern California and Facebook), and Sanjeev Kumar (Facebook)
  • Paxos Made Transparent
    • Heming Cui, Rui Gu, Cheng Liu, and Junfeng Yang (Columbia)
  • Fleet: Proving Practical Distributed Systems Correct
    • Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, and Brian Zill (Microsoft Research)
  • Software Defined Batteries
    • Anirudh Badam and Ranveer Chandra (Microsoft Research), Jon A. Dutra (Microsoft), Steve Hodges (Microsoft Research), Pan Hu (University of Massachusetts Amherst), Julia Meinershagen (Microsoft), Bodhi Priyantha and Thomas Moscibroda (Microsoft Research), Evangelia Skiani (Columbia University), and Anthony Ferrese (Tesla Motors)
  • High-Performance ACID via Modular Concurrency Control
    • Chao Xie, Chunzhi Su, Cody Littley, and Lorenzo Alvisi (University of Texas at Austin), Manos Kapritsos (Microsoft Research), and Yang Wang (Ohio State University)
