Distributed Systems Introduction and background Mohan Kumar CSCI652.002 Spring 2014 B&K
Course information http://www.cs.rit.edu/~hpb/Lectures/20135/652/index.html
Requirements • CSCI-352 Operating Systems or equivalent and CSCI-603 Advanced C++ and Program Design or equivalent
Course Content • Issues and challenges in distributed systems, including: communication, distributed processes, naming and name services, synchronization, consistency and replication, transactions, fault tolerance and recovery, security, distributed objects, and distributed file systems.
Outcomes • Build a solid foundation in distributed systems. • Outcomes: • Understand fundamental concepts of distributed computing systems. • Understand modern distributed systems: P2P, mobile, pervasive, sensor, etc. • Recognize the importance of addressing challenges in modern systems to facilitate distributed computing. • Develop distributed programs on real systems. • More (you tell us at the end of the semester)
Attendance • Class participation: ACTIVE Participation will prepare students for midterms. Students are expected to interact actively during lectures. All students are expected to solve homework problems and engage in class discussions.
Course material • Reference Books • Slides by Coulouris et al. • www.cdk4.net • PowerPoint slides and whiteboard notes prepared by the professors • Students are expected to read the corresponding chapters from the textbook prior to each class (please see the tentative schedule). • PPT slides prepared by the professors may or may not be available before class, but they will be made available after class. • Reference books and articles
Course organization • The course has two main themes. • Distributed Algorithms: • distributed processes/objects, interprocess communication, remote procedure call, coordination, file systems, clocks and global states, security, concurrency, shared memory, transactions and replication. • Systems: • Operating systems, distributed file systems, name services, case studies, implementations, P2P, security • Plan 9 System
Textbook and References • Textbook • Distributed Systems: Concepts and Design, George Coulouris, Jean Dollimore and Tim Kindberg, Addison Wesley, 4th or 5th Edition (an e-version of the 5th edition is available on Kindle) • References • Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M. van Steen, Pearson Publishers, 2nd Edition. • Distributed Operating Systems & Algorithms, R. Chow and T. Johnson, Addison-Wesley, 1997. • Related Articles: details will be provided during the course
Grading • The structure of quizzes will be discussed in class, at least one week prior to the quiz. • Midterm 1: 15% • Midterm 2: 15% • Final Exam: 30% • Group Work (project, presentation, report and class participation): 40% • Group Presentations: Will be scheduled during the last week of the semester. • Group Work Reports: Due at 9 am on May 10, 2014. • Each group will have 3 members; groups are to be formed before February 15. • Group Work: Problems will be assigned by February 25 and the expected date of completion is May 10.
What is a distributed system? • Concurrent components • Independent • Use message passing to communicate and coordinate • Lack of global clock • Asynchronous • Independent failures of components • Good for fault-tolerance
“A distributed system is a collection of independent computers that appears to its users as a single coherent system” (Tanenbaum and Van Steen, Distributed Systems, 2007). • Application developers can focus on developing applications rather than system issues • The distributed system should be • Easy to expand or scale • Available all the time • Accessible uniformly • Fault-tolerant
Layered representation • Applications and services • Middleware: masks heterogeneity; provides abstraction, transparency and uniformity • Platform: operating system, communications network, hardware
Motivation • Resource sharing • CPU • Disk • Software services • Databases • Fault-tolerance • Redundancy • Replication
Challenges • Heterogeneity • Transparency, openness • Security and privacy • Scalability • Failure handling • Concurrency of components
Modern Distributed Systems • Mobility • Wireless communications • Wi-Fi, Bluetooth, Zigbee, LTE, WiMAX, Cellular • Ubiquity • Small, but multifunctional devices • Cell phones, sensors, RFIDs • Large scale • Components • Data • Users
Enablers • Computer Technology • Advanced microprocessors • Multi-core architectures • Lower costs (CPU, memory, peripheral devices) • High-speed networks • Wired and wireless • Applications • Business • Scientific • Everything else …
Examples of Distributed Systems • Airlines • Aircraft • Car • Building • University • The Internet • Intranets • Mobile and Ubiquitous Systems • Grid Computers • Pervasive Systems • Sensor Systems • P2P Networks
The challenge is to provide a uniform view
Concurrency • Program execution • Access to resources • Message passing • Coordination • Resource sharing • Coordination of concurrently executing programs
No Global Clock • Clocks of different components are not synchronized • Asynchronous • Concurrent programs coordinate their actions by passing messages
Event ordering • Lamport’s logical ordering • X sends m1 before Y receives m1 • Y sends m2 before X receives m2 • Because we know replies are sent after receiving messages • That is, if m2 is a reply to m1 • Then Y receives m1 before sending m2
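The ordering on this slide can be sketched with a Lamport logical clock. This is a minimal illustration, not course code: the class name `LamportClock` and its methods are assumptions made for the example. Each process keeps a counter, increments it on a local or send event, and on receipt jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward (illustrative names)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or send event: advance the counter by one.
        self.time += 1
        return self.time

    def receive(self, sent_time):
        # Receive event: move past both the local time and the sender's timestamp.
        self.time = max(self.time, sent_time) + 1
        return self.time


x, y = LamportClock(), LamportClock()
send_m1 = x.tick()             # X sends m1
recv_m1 = y.receive(send_m1)   # Y receives m1
send_m2 = y.tick()             # Y sends m2, the reply to m1
recv_m2 = x.receive(send_m2)   # X receives m2
print(send_m1, recv_m1, send_m2, recv_m2)  # prints: 1 2 3 4
```

The timestamps increase along the chain send(m1), receive(m1), send(m2), receive(m2), which is exactly the order the slide derives without any physical clock.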
Time services • Global time consensus is needed to • Coordinate distributed activities • File backup • Expiration time of a received message/data • Event related activities • When an event occurs or has already occurred • How long did it take • Which event occurred first
Clocks • Physical clock • Approximation of real time • Logical clock • Preserves ordering of events
Independent Failures • Distributed systems can fail in multiple ways • CPU/memory of one or more components • Network link(s) • Programs might stop executing • E.g., input/output, synchronization • System components may get isolated
Resource sharing • Hierarchy • Processors, Disks • Shared data • Shared webpages • Search engine • Weather channel • Currency converter
Services • Manage resources • Present functionalities of resources to users and applications • Coherent to applications/users • Examples • File service • Mail service • FTP service • Client-server architectures • Service may access resources remotely • Clients connect to servers • Utilize services
Basic applications • Remote login • Keyboard and display interface • Virtual terminal support • telnet, rlogin • File transfer • File, file structures, file attributes • E.g., FTP • Messaging • Send and receive • Email, SMTP • Browsing • Information retrieval • Remote execution • Execute a program on a remote server • E.g., MIME (Multipurpose Internet Mail Extensions)
System models • Architectural models • Client-server model • Peer-to-peer model • Functional models • Interaction model • Failure model • Security model
Architecture • Structural organization of various components • Simple abstraction of components • Two main objectives • Placements • Network topology • Data distribution • Interrelationships • Patterns of communications • Relationships between data objects • Data access patterns, dependencies
Peer-to-Peer and Client/Server variations • Peer-to-peer • No distinction among peers • Excellent scalability compared to C-S • Resources are utilized in a distributed network, and more efficiently. • Minimize bottleneck points • Variations • Multiple servers • Each server specializes in providing a particular service • E.g., web servers, DNS server, authentication etc. • Proxy servers • Enhance availability • Reduce latency • Caches • Objects cached to reduce latency • Mobile code and mobile agents • Mobile code (e.g., applet) downloaded to client’s site • Local interactions, fast response as there are no communication delays • Mobile agents include code and data • Move around and execute on different processors
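The proxy/cache variation can be sketched as follows. This is a hedged illustration only: `CachingProxy` and `origin_server` are hypothetical names, and the origin "server" is just a local function standing in for a remote fetch.

```python
class CachingProxy:
    """Serve repeated requests from a local cache to avoid the remote fetch."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable that contacts the origin server
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1      # answered locally: no communication delay
            return self.cache[key]
        self.misses += 1        # first request: go to the origin and remember the object
        value = self.fetch(key)
        self.cache[key] = value
        return value


def origin_server(key):
    # Stand-in for a slow remote server (an assumption for this example).
    return f"content of {key}"


proxy = CachingProxy(origin_server)
first = proxy.get("/index.html")
second = proxy.get("/index.html")
print(proxy.hits, proxy.misses)
```

The second request never reaches the origin, which is how a cache reduces latency; a real proxy would also need an expiry or invalidation policy, which this sketch omits.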
Goals • Efficiency • Propagation delays, communications • Overlapped computation/communication • Efficient distributed processing and load sharing • Flexibility • User friendly • Ability to evolve and migrate • Modularity, scalability, portability, and interoperability • Consistency • Predictability and uniformity in system behavior • Integrity in concurrency control and failure handling • Robustness • Ability to handle exceptional situations and errors • Change in topology, lost message, crashed system etc. • Reliability, protection and access control • Secure and privacy preserving
Design requirements • Performance • Responsiveness • Access to shared resources • Communication delays • Server loads, scheduling, wait periods • Control switching • Load balancing • Combined computation/communication scheduling • Scalability • Fault-tolerance
Transparency • Ability to hide/mask all system details from users/application developers • System details are irrelevant to users/developers • System details are very relevant to system managers • Creation of an illusion of the model the system is supposed to present • (Figure: layered view with applications and services on top; middleware, which masks heterogeneity and provides abstraction, transparency and uniformity; and the platform of operating system, communications network and hardware.) • This is in contrast to the everyday English meaning of transparency: open, visible, see-through, etc.
Basic Processes • Server • Accepts inputs from other processes • Performs a service • Returns outcomes • Client • User/application level • Makes requests, receives results • The roles of server and client may change with time • Peer • All are equal
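The server and client roles above can be sketched with a connected socket pair and one server thread. This is a minimal illustration, assuming a toy "service" that upper-cases the request; `serve_once` is a hypothetical name, and `socket.socketpair()` stands in for a real network connection.

```python
import socket
import threading


def serve_once(conn):
    # Server: accept an input, perform a service, return the outcome.
    request = conn.recv(1024).decode()
    conn.sendall(request.upper().encode())
    conn.close()


# A connected pair of sockets stands in for a network channel.
server_end, client_end = socket.socketpair()
worker = threading.Thread(target=serve_once, args=(server_end,))
worker.start()

# Client: make a request, receive the result.
client_end.sendall(b"hello")
reply = client_end.recv(1024).decode()
worker.join()
client_end.close()
print(reply)  # prints: HELLO
```

Nothing in the code fixes a process to one role: the same process could run `serve_once` on one connection and act as a client on another, which is the sense in which roles may change with time.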
Processes • A process is a program in execution • Sequential • A single control block regulates the execution • A control block contains state information: program counters, register contents, stack pointers, communication ports, file descriptors etc. • Process control block (PCB) • Concurrent • Simultaneously interacting sequential processes are said to be concurrent • Asynchronous • Separate address space and PCBs • Components may interact through communication/synchronization • (Figure: three processes, each with its own PCB.)
Threads • A lightweight process • Threads of a process share the same address space, but have their own registers • A thread control block (or TCB) is local to a thread • Typically, • Threads have their own PC, SP and register set. • Threads share address space, communication ports and file descriptors • Multiple threads are spawned by a process • A PCB is shared among interacting threads • Context switching among threads is lightweight compared to context switching among processes • (Figure: processes with PCBs, each containing threads with their own TCBs; thread support comes from a run-time library or from the operating system.)
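A short sketch of threads sharing one address space, using Python's standard `threading` module. The names `shared` and `worker` are illustrative assumptions. All four threads update the same dictionary, so access is guarded by a lock.

```python
import threading

shared = {"counter": 0}      # lives in the process's single address space
lock = threading.Lock()


def worker(increments):
    # Each thread has its own stack and program counter,
    # but sees the same `shared` object as every other thread.
    for _ in range(increments):
        with lock:           # coordinate access to the shared data
            shared["counter"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])  # prints: 4000
```

Separate processes would each get their own copy of `shared` and would need message passing or explicit shared memory to achieve the same effect.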
Interaction model • Process interactions • C-S, P2P, message passing, shared space, synchronous, asynchronous • Single process/thread, multiple threads • Distributed algorithms • Behavior of multiple processes • Includes message transmissions • Each process • Has its own PCB, which is inaccessible to other processes • Likely to be executing on different systems in the network • Difficult to coordinate • Two significant factors • Communication performance • Maintenance of global state • Computer clocks drift • Clock drifts differ from one another
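To see why differing drift rates make global state hard to maintain, here is a small sketch. The drift rates are made-up values chosen for illustration, and `clock_reading` is a hypothetical helper that assumes a constant drift rate.

```python
def clock_reading(true_time_s, drift_rate):
    """Reading of a clock that started in sync and drifts at a constant rate."""
    return true_time_s * (1 + drift_rate)


DAY = 86_400  # seconds in a day

fast = clock_reading(DAY, +1e-5)   # gains 10 microseconds per second (assumed)
slow = clock_reading(DAY, -2e-5)   # loses 20 microseconds per second (assumed)
skew = fast - slow

print(f"skew after one day: {skew:.3f} s")
```

With these assumed rates the two clocks disagree by roughly 2.6 seconds after a day, so timestamps taken on different machines cannot be compared directly without some form of synchronization.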
Performance of communication channels • Latency • Time taken for message to arrive at the destination • Delay in accessing the network • Delay due to processing in OS communication services at both ends • Bandwidth • Frequency • Interference • Channel sharing • Jitter • Variation in times taken to deliver different components of a message
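A rough worked example of how these quantities combine. It assumes the common first-order model in which transmission time is latency plus message size divided by bandwidth, and it measures jitter simply as the spread between the slowest and fastest delivery; both the numbers and the function names are illustrative.

```python
def transfer_time(latency_s, size_bits, bandwidth_bps):
    # First-order model: fixed latency plus time to push the bits through the channel.
    return latency_s + size_bits / bandwidth_bps


def jitter(delays_s):
    # One simple jitter measure: spread between slowest and fastest delivery.
    return max(delays_s) - min(delays_s)


# 1 MB message (8,000,000 bits) over a 100 Mbit/s link with 5 ms latency.
t = transfer_time(0.005, 8_000_000, 100_000_000)

# Observed delivery times for four parts of a message, in seconds (made-up values).
spread = jitter([0.020, 0.022, 0.019, 0.025])

print(f"transfer time: {t * 1000:.1f} ms, jitter: {spread * 1000:.1f} ms")
```

For small messages the latency term dominates; for large ones the bandwidth term does, which is why the two are listed as separate characteristics.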
Two variants • Synchronous • Process execution time is bounded • Message latency over a channel is bounded • Each process’s local clock drift is bounded • Though difficult to build, very useful as a model • Timeouts • Detect failures • Asynchronous • The bounds (assumptions) listed above do NOT hold • Most systems are asynchronous
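In the synchronous model the bounds make timeouts a sound failure detector. The sketch below assumes a heartbeat scheme with made-up constants (`HEARTBEAT_PERIOD`, `MAX_LATENCY`) and a hypothetical `suspects` function; times are simulated numbers rather than real clock readings.

```python
HEARTBEAT_PERIOD = 1.0   # each process sends a heartbeat this often (assumed), seconds
MAX_LATENCY = 0.5        # assumed upper bound on message delay, seconds
TIMEOUT = HEARTBEAT_PERIOD + MAX_LATENCY


def suspects(last_heard, now, timeout=TIMEOUT):
    """Return the processes whose last heartbeat is older than the timeout."""
    return {p for p, t in last_heard.items() if now - t > timeout}


# Time at which the monitor last heard from each process (simulated values).
last_heard = {"A": 9.2, "B": 7.0, "C": 9.9}
failed = suspects(last_heard, now=10.0)
print(failed)  # prints: {'B'}
```

In an asynchronous system no such `MAX_LATENCY` exists, so the same check can only suspect a process: a slow message is indistinguishable from a crash.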
Failure model • Omission failures • Processor/process crash • Communication failure/message drops • Arbitrary failures • Process setting wrong values in data • Data corruption during transmission • Timing failures • Synchronous systems • Real-time systems • Clock, process, channel • Masking failures • Replication • Service to mask failures
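Masking failures by replication can be sketched as a majority vote over replica replies. This is one possible scheme under stated assumptions, not a prescribed design: `masked_read` and the three replica functions are hypothetical, a crash is modelled as a `ConnectionError`, and a wrong value models an arbitrary failure.

```python
from collections import Counter


def masked_read(replicas):
    """Query every replica and return the value a strict majority agrees on."""
    replies = []
    for read in replicas:
        try:
            replies.append(read())
        except ConnectionError:
            continue  # omission failure: replica crashed or its reply was lost
    if not replies:
        raise RuntimeError("all replicas failed")
    value, votes = Counter(replies).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return value


def healthy():
    return 42


def crashed():
    raise ConnectionError("no reply")


def corrupted():
    return 41  # arbitrary failure: wrong value returned


result = masked_read([healthy, healthy, crashed, healthy, corrupted])
print(result)  # prints: 42
```

The caller sees a correct answer even though two of the five replicas misbehaved; with three or more bad replicas the vote fails and the error is no longer masked.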
Security model • Protecting objects • Who is allowed to access what data • Check access rights, verify identity • Securing processes and their interactions • Processes • Server, client, peer • Communication channel • Copy/alter messages; inject harmful messages • Encryption, authentication, time stamping • Denial of service • Mobile code, mobile agents
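Two of the ideas above, checking access rights and authenticating messages, can be sketched with the standard library. The access-control table, user names and shared key are all made up for the example; the `hmac` and `hashlib` calls are standard Python.

```python
import hashlib
import hmac

# Hypothetical access-control list: object -> principal -> rights.
ACL = {"grades.db": {"alice": {"read", "write"}, "bob": {"read"}}}


def allowed(user, obj, right):
    # Protecting objects: who is allowed to do what to which data.
    return right in ACL.get(obj, {}).get(user, set())


KEY = b"shared-secret"  # assumed to be known only to sender and receiver


def sign(message):
    # Message authentication code computed over the message with the shared key.
    return hmac.new(KEY, message, hashlib.sha256).digest()


def verify(message, tag):
    # A tampered message or a forged tag fails this check.
    return hmac.compare_digest(sign(message), tag)


tag = sign(b"transfer 10")
print(allowed("bob", "grades.db", "write"))   # prints: False
print(verify(b"transfer 10", tag))            # prints: True
print(verify(b"transfer 99", tag))            # prints: False
```

The MAC detects altered or injected messages but does not hide their contents or stop replays; those need encryption and time stamping, as listed on the slide.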
Network Background • Slides from Kurose and Ross’s book will be used • Please read the book
Networking review • Please read chapter 4 or a networking book • I will cover only mobile and wireless networking
Mobile IP