
Job Scheduling on Amazon EC2

Nathaniel Hart, 5/19/14


Presentation Transcript


  1. Job Scheduling on Amazon EC2. Nathaniel Hart, 5/19/14

  2. What is Amazon EC2?
     • Instant, configurable server instances
     • Pay only for what you use
     • Easy to scale
     • Frustrating
        • Instances come bare-bones
        • User configured
     • MPI can run on it
        • But the latency in a shared system can kill it, unless you pay extra for a cluster that is in the same rack.

  3. What is Job Management?
     • System level
        • Coordinating local and cloud resources
     • Cluster level
        • Dispatching jobs to all available servers in an equal manner
        • Job migration
     • Server level
        • Scheduling jobs to run efficiently

  4. System Level Job Management [Diagram labels: client (×3), Amazon EC2, corporate data center]
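The system-level idea of coordinating local and cloud resources can be illustrated with a minimal sketch. It assumes a simple overflow policy (fill the corporate data center first, send work to EC2 only when local slots run out), which is one common reading of cloudbursting and not necessarily the policy used in the cited work. The `Pool` class, `place_job` function, and the slot counts are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A group of job slots; capacities here are made-up illustration values."""
    name: str
    capacity: int
    running: int = 0

    def has_room(self) -> bool:
        return self.running < self.capacity


def place_job(local: Pool, cloud: Pool) -> str:
    """Prefer the local data center; burst to the cloud only when it is full."""
    for pool in (local, cloud):
        if pool.has_room():
            pool.running += 1
            return pool.name
    return "queued"  # neither pool has a free slot right now


local = Pool("data-center", capacity=2)
cloud = Pool("ec2", capacity=1)
placements = [place_job(local, cloud) for _ in range(4)]
print(placements)
```

A real system would also release slots when jobs finish and weigh cloud cost against queueing delay; both are left out to keep the sketch short.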

  5. Cloudbursting using CometCloud [Diagram. Source: Hyunjoo, Kim et al.]

  6. Cost Savings Using Cloudbursting (Source: Li, Yin et al.)

  7. Job Dispatching vs. Job Scheduling
     Job Dispatching (Cluster Level)
     • Load balance between available EC2 instances
     • Focused on maximizing use of all instances
     • Requires system-wide awareness (master node)
     Job Scheduling (Server Level)
     • Assigns tasks to computing resources within an instance
     • Focused on using instance resources most efficiently
     • Should be able to execute in relative isolation (worker node)
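The split between the two levels can be sketched in a few lines. This is only an illustration under stated assumptions: the dispatcher uses a least-queued-work rule and the per-instance scheduler uses shortest job first, neither of which is named in the slides. The function names, instance names, and runtime estimates are hypothetical.

```python
import heapq


def dispatch(jobs, instance_names):
    """Cluster level (master node): send each job to the least-loaded instance."""
    heap = [(0.0, name) for name in instance_names]  # (queued work, instance)
    heapq.heapify(heap)
    queues = {name: [] for name in instance_names}
    for job_id, est_runtime in jobs:
        load, name = heapq.heappop(heap)
        queues[name].append((job_id, est_runtime))
        heapq.heappush(heap, (load + est_runtime, name))
    return queues


def schedule(queue):
    """Server level (worker node): order one instance's queue, shortest job first."""
    return [job_id for job_id, _ in sorted(queue, key=lambda job: job[1])]


jobs = [("a", 5.0), ("b", 2.0), ("c", 4.0), ("d", 1.0)]
queues = dispatch(jobs, ["i-1", "i-2"])
plan = {name: schedule(queue) for name, queue in queues.items()}
print(plan)
```

Note that `dispatch` needs a view of every instance's load, while `schedule` only looks at its own queue, which mirrors the master/worker distinction on the slide.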

  8. Cluster Level Job Management [Diagram labels: decider, client (×3), cluster, instance (×4), cluster detail]

  9. Job Dispatching

  10. Job Migration
     • Load balancing:
        • If a VM is starved, send it a job.
        • Incurs a time penalty, and can get out of hand quickly if not managed.
     • Requires Management
        • Limit job migration to when jobs cannot be scheduled normally
           • Closer to end of a job, some instances may be starved of tasks.
        • Limit job migration to a fixed interval
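One way to picture managed migration is a check that runs only on a fixed interval and moves a waiting job to any starved instance. The sketch below assumes queue length as the load measure and picks the donor's last queued job; both choices, along with the `migrate_if_starved` name and the `interval` default, are hypothetical and not taken from the cited papers.

```python
def migrate_if_starved(queues, tick, interval=5):
    """Move at most one waiting job to each starved instance, on interval ticks only."""
    if tick % interval != 0:
        return []  # off-interval: do nothing, which bounds migration overhead
    moves = []
    for starved in [name for name, queue in queues.items() if not queue]:
        donor = max(queues, key=lambda name: len(queues[name]))
        if len(queues[donor]) < 2:  # donor must keep at least one job
            break
        job = queues[donor].pop()  # take the job that would wait longest
        queues[starved].append(job)
        moves.append((job, donor, starved))
    return moves


queues = {"i-1": ["j1", "j2", "j3"], "i-2": [], "i-3": ["j4"]}
skipped = migrate_if_starved(queues, tick=3)
moved = migrate_if_starved(queues, tick=5)
print(skipped, moved, queues)
```

The time penalty mentioned above is not modeled here; a fuller version would compare the estimated transfer cost against the expected wait before moving a job.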

  11. Job Scheduling

  12. Scheduling Comparison (Source: Li, Yin et al.)

  13. Pros & Cons

  14. Closing Thoughts
     • This was a summary of the methods I found. There appear to be many solutions, and each author claims that their own works wonderfully.
     • The only constants are the problems:
        • Volatile job execution time / resource requirements
        • Emergent properties and unknowns at start of job can drastically affect the job scheduling needs
        • Tradeoffs between computation and communication
        • Need for reliability
        • Need for cost efficiency

  15. Works Sourced
     • Amazon Web Services, Inc. “AWS Simple Icons for Architecture Diagrams”. https://aws.amazon.com/architecture/icons/. Retrieved May 19, 2014.
     • Hyunjoo, Kim et al. “Investigating the Use of Autonomic Cloudbursts for High-Throughput Medical Image Registration”. Retrieved from IEEE Xplore Digital Library.
     • Leslie, Luke M. et al. “Exploiting Performance and Cost Diversity in the Cloud”. 2013 IEEE Sixth International Conference on Cloud Computing. Retrieved from IEEE Xplore Digital Library.
     • Li, Yin et al. “H-PFSP: Efficient Hybrid Parallel PFSP Protected Scheduling for MapReduce System”. 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. Retrieved from IEEE Xplore Digital Library.
     • Lu, Peng, et al. “Workload Characteristic Oriented Scheduler for MapReduce”. 2012 IEEE 18th International Conference on Parallel and Distributed Systems. Retrieved from IEEE Xplore Digital Library.
     • Moschakis, Ioannis A., Karatza, Helen D. “Parallel Job Scheduling on a Dynamic Cloud Model with Variable Workload and Active Balancing”. 2012 16th Panhellenic Conference on Informatics. Retrieved from IEEE Xplore Digital Library.
     • Moschakis, Ioannis A., Karatza, Helen D. “Performance and Cost evaluation of Gang Scheduling in a Cloud Computing System with Job Migrations and Starvation Handling”. Retrieved from IEEE Xplore Digital Library.
     • Nahir, Amir, Ariel Orda, and Danny Raz. “Schedule First, Manage Later: Network-Aware Load Balancing”. 2013 Proceedings IEEE INFOCOM. Retrieved from IEEE Xplore Digital Library.

  16. ?
