We present the Hadoop Fair Sojourn Protocol (HFSP) scheduler, which implements a size-based scheduling discipline for Hadoop. The benefits of size-based scheduling disciplines are well recognized in a variety of contexts (computer networks, operating systems, etc.), yet their practical implementation in a system such as Hadoop raises a number of important challenges. With HFSP, which is available as an open-source project, we address issues related to job size estimation and resource management, and we study the effects of a variety of preemption strategies. Although the architecture underlying HFSP is suitable for any size-based scheduling discipline, in this work we revisit and extend the Fair Sojourn Protocol, which solves problems related to job starvation that affect FIFO, Processor Sharing and a range of size-based disciplines. Our experiments, in which we compare HFSP to standard Hadoop schedulers, point to a significant decrease in average job sojourn time (the total time a job spends in the system, including both waiting and service time) for realistic workloads that we generate according to production traces available in the literature.
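To make the sojourn-time metric concrete, the following toy sketch compares the mean sojourn time of three jobs served in arrival order with the same jobs served smallest-first. It is not HFSP and not the Fair Sojourn Protocol: it assumes a single server, exact job sizes known in advance, all jobs arriving at time 0, and no preemption. The job sizes and the helper name `mean_sojourn` are illustrative assumptions, not taken from the paper.

```python
from itertools import accumulate


def mean_sojourn(sizes):
    """Mean sojourn time for jobs that all arrive at time 0 and run
    one at a time, to completion, in the order given."""
    # With a common arrival time of 0, a job's sojourn time equals its
    # completion time, which is the running sum of sizes served so far.
    completion_times = list(accumulate(sizes))
    return sum(completion_times) / len(completion_times)


# Hypothetical job sizes (arbitrary time units), listed in arrival order.
job_sizes = [10, 1, 2]

fifo = mean_sojourn(job_sizes)                # serve in arrival order
size_based = mean_sojourn(sorted(job_sizes))  # serve smallest job first

print(f"FIFO mean sojourn:       {fifo:.2f}")
print(f"Size-based mean sojourn: {size_based:.2f}")
```

In this toy setting the large job delays both small jobs under arrival-order service, whereas smallest-first service lets the small jobs finish early while the large job completes at the same time as before. A real scheduler must additionally estimate job sizes and guard against starvation of large jobs, which are the issues the abstract says HFSP addresses.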
Practical size-based scheduling for MapReduce workloads
arXiv:1302.2749, May 3rd, 2013
      
      Type:
        Conference
      Date:
        2013-05-03
      Department:
        Data Science
      Eurecom Ref:
        4004
      Copyright:
        © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in arXiv:1302.2749, May 3rd, 2013 and is available at : 
      PERMALINK : https://www.eurecom.fr/publication/4004