High Performance User-Level Sockets over Gigabit Ethernet

Pavan Balaji, Ohio State University, balaji@cis.ohio-state.edu
Piyush Shivam, Ohio State University, shivam@cis.ohio-state.edu
Pete Wyckoff, Ohio Supercomputer Center, pw@osc.edu
D.K. Panda, Ohio State University, panda@cis.ohio-state.edu
Presentation Overview
• Background and Motivation
• Design Challenges
• Performance Enhancement Techniques
• Performance Results
• Conclusions
Background and Motivation
• Sockets
  • Frequently used API
  • Traditional kernel-based implementation
  • Unable to exploit high performance networks
• Earlier solutions
  • Interrupt coalescing
  • Checksum offload
  • Insufficient
• It gets worse with 10 Gigabit networks
• Can we do better?
  • User-level support
Kernel-Based Implementation of Sockets
• Pros
  • High compatibility
• Cons
  • Kernel context switches
  • Multiple copies
  • CPU resources
[Figure: protocol stack — Application or Library (user space); Sockets, TCP, IP (kernel); NIC (hardware)]
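For context, a minimal sketch of the standard BSD sockets usage that any replacement layer has to preserve unchanged in order to keep the "high compatibility" advantage above; the host address and port are placeholders chosen for illustration.

```c
/* Minimal TCP client using the standard BSD sockets API.
 * A user-level sockets layer must keep exactly this interface while
 * replacing the kernel path underneath it.  Host and port are placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5000);                       /* placeholder port */
    inet_pton(AF_INET, "192.168.0.1", &addr.sin_addr);   /* placeholder host */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    char buf[64] = "hello";
    send(fd, buf, strlen(buf), 0);   /* the application sees the usual calls... */
    recv(fd, buf, sizeof(buf), 0);   /* ...regardless of what runs beneath them */
    close(fd);
    return 0;
}
```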
Alternative Implementations of Sockets (GigaNet cLAN)
• Pros
  • High compatibility
• Cons
  • Kernel context switches
  • Multiple copies
  • CPU resources
[Figure: protocol stack — Application or Library (user space); Sockets, TCP, IP, IP-to-VI layer (kernel); "VI aware" NIC (hardware)]
Sockets over User-Level Protocols
• Sockets is a generalized protocol
• Sockets over VIA
  • Developed by Intel Corporation [shah98] and ET Research Institute [sovia01]
  • GigaNet cLAN platform
• Most networks in the world are Ethernet
  • Gigabit Ethernet: backward compatible, a gigabit network over the existing installation base
  • MVIA: a version of VIA on Gigabit Ethernet, but kernel based
• A need for a High Performance Sockets layer over Gigabit Ethernet
User-Level Protocol over Gigabit Ethernet
• Ethernet Message Passing (EMP) protocol
  • Zero-copy, OS-bypass, NIC-driven user-level protocol over Gigabit Ethernet
  • Developed over the dual-processor Alteon NICs
  • Complete offload of message passing functionality to the NIC
• Piyush Shivam, Pete Wyckoff, D.K. Panda, "EMP: Zero-Copy OS-Bypass NIC-Driven Gigabit Ethernet Message Passing", Supercomputing, November '01
• Piyush Shivam, Pete Wyckoff, D.K. Panda, "Can User-Level Protocols Take Advantage of Multi-CPU NICs?", IPDPS, April '02
EMP: Latency
• A base latency of 28 μs for 4-byte messages, compared to ~120 μs for TCP
EMP: Bandwidth
• Saturates the Gigabit Ethernet network with a peak bandwidth of 964 Mbps
Proposed Solution
• High Performance Sockets over EMP
  • Eliminates the kernel context switches, multiple copies, and host CPU overhead of the kernel-based path
[Figure: protocol stack — Application or Library and EMP Library (user space); OS Agent (kernel); Gigabit Ethernet NIC (hardware)]
Presentation Overview
• Background and Motivation
• Design Challenges
• Performance Enhancement Techniques
• Performance Results
• Conclusions
Design Challenges
• Functionality mismatches
• Connection management
• Message passing
• Resource management
• UNIX sockets
Functionality Mismatches and Connection Management
• Functionality mismatches
  • No API for buffer advertising in TCP
• Connection management
  • Data message exchange
  • Descriptors required for connection management
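A minimal sketch of the kind of handshake the sockets layer has to perform on its own, since the BSD API offers no call for advertising receive buffers: each side pre-posts its receive descriptors and tells the peer how many buffers (credits) of what size are available. The conn_info_t structure and the ctrl_send/ctrl_recv/post_recv_descriptor/alloc_registered_buf helpers are hypothetical names for illustration, not the paper's actual interface.

```c
#include <stdint.h>

/* Hypothetical primitives assumed to be provided by the lower layer. */
extern void *alloc_registered_buf(uint32_t size);
extern void  post_recv_descriptor(int conn, void *buf);
extern int   ctrl_send(int conn, const void *buf, uint32_t len);
extern int   ctrl_recv(int conn, void *buf, uint32_t len);

typedef struct {
    uint32_t num_credits;   /* receive descriptors pre-posted for this connection */
    uint32_t buf_size;      /* size of each pre-posted buffer                     */
} conn_info_t;

int setup_connection(int conn, const conn_info_t *local, conn_info_t *remote)
{
    /* Pre-post our receive buffers before advertising them, so the peer's
     * first eager sends always find a descriptor waiting. */
    for (uint32_t i = 0; i < local->num_credits; i++)
        post_recv_descriptor(conn, alloc_registered_buf(local->buf_size));

    /* Exchange buffer information as ordinary data messages. */
    if (ctrl_send(conn, local,  sizeof(*local))  < 0) return -1;
    if (ctrl_recv(conn, remote, sizeof(*remote)) < 0) return -1;

    return 0;   /* the sender-side credit count now starts at remote->num_credits */
}
```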
Message Passing
• Data streaming
  • Parts of the same message can potentially be read into different buffers
• Unexpected message arrivals
  • Separate communication thread
    • Keeps track of used descriptors and re-posts them
    • Polling threads have a high synchronization cost
    • Sleeping threads are limited by OS scheduling granularity
• Rendezvous approach
• Eager with flow control
Rendezvous Approach
[Figure: sender and receiver timelines with send queues (SQ) and receive queues (RQ) — send() posts a Request to the receiver; when receive() is called, an ACK returns to the sender, which then transfers the Data]
Eager with Flow Control
[Figure: sender and receiver timelines with send queues (SQ) and receive queues (RQ) — send() transfers the Data immediately; receive() consumes it and an ACK returns to the sender, with further Data following]
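A rough sketch of the send-side decision between the two schemes shown above: small messages go eagerly (data first, consuming a pre-posted buffer on the receiver), large messages use the rendezvous handshake. The threshold value, the connection_t type, the message tags, and the emp_like_send/wait_for helpers are illustrative assumptions, not the library's actual interface.

```c
#include <sys/types.h>

/* All types, constants, and helpers below are illustrative placeholders. */
typedef struct { int credits; } connection_t;   /* buffers the peer still has posted */
enum { MSG_DATA, MSG_REQUEST, MSG_ACK };
extern void emp_like_send(connection_t *c, int tag, const void *buf, size_t len);
extern void wait_for(connection_t *c, int tag);

#define EAGER_THRESHOLD 8192     /* assumed cutoff, not taken from the paper */

ssize_t sockets_send(connection_t *c, const void *buf, size_t len)
{
    if (len <= EAGER_THRESHOLD && c->credits > 0) {
        /* Eager with flow control: the data is sent immediately and lands in
         * a buffer the receiver pre-posted; one credit is consumed. */
        c->credits--;
        emp_like_send(c, MSG_DATA, buf, len);
    } else {
        /* Rendezvous: send a small Request first; the receiver posts a
         * descriptor for the user buffer and replies with an ACK, and only
         * then does the data move, landing directly where it belongs. */
        emp_like_send(c, MSG_REQUEST, &len, sizeof(len));
        wait_for(c, MSG_ACK);
        emp_like_send(c, MSG_DATA, buf, len);
    }
    return (ssize_t)len;
}
```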
Resource Management and UNIX Sockets
• Resource management
  • Clean up unused descriptors (connection management)
  • Free registered memory
• UNIX sockets
  • Function overriding
  • Application changes
  • File descriptor tracking
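One common way to realize the "function overriding" item is symbol interposition with an LD_PRELOAD library: intercept socket() and route INET stream sockets to the user-level layer while everything else falls back to the kernel. The slide does not say which mechanism the authors used, so treat this as an illustration; uls_socket() and uls_available() are hypothetical names.

```c
/* LD_PRELOAD interposition sketch (build as a shared library, link with -ldl). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

extern int uls_socket(int domain, int type, int protocol);  /* hypothetical user-level layer */
extern int uls_available(void);                              /* hypothetical availability check */

int socket(int domain, int type, int protocol)
{
    /* Resolve the real, kernel-backed socket() for the fallback path. */
    int (*real_socket)(int, int, int) =
        (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");

    if (domain == AF_INET && type == SOCK_STREAM && uls_available())
        return uls_socket(domain, type, protocol);   /* user-level path       */

    return real_socket(domain, type, protocol);      /* ordinary kernel path  */
}
```

Keeping the returned value a genuine file descriptor (or tracking which descriptors belong to the user-level layer) is what the "file descriptor tracking" bullet refers to, so that later send()/recv()/close() calls can be routed consistently.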
Presentation Overview
• Background and Motivation
• Design Challenges
• Performance Enhancement Techniques
• Performance Results
• Conclusions
Performance Enhancement Techniques
• Credit-based flow control
• Disabling data streaming
• Delayed acknowledgments
• EMP unexpected queue
Credit-Based Flow Control
• Multiple outstanding credits
[Figure: sender and receiver timelines with send queues (SQ) and receive queues (RQ) — the sender's credit count falls from 4 to 0 as messages are sent, then returns to 4 once the receiver acknowledges]
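A minimal sketch of the sender-side accounting the diagram implies: each credit corresponds to one receive descriptor pre-posted by the peer, several messages may be outstanding at once, and the sender stalls at zero credits until an acknowledgment returns them. The names and types are placeholders, and the initial credit count simply mirrors the figure.

```c
#include <stddef.h>

typedef struct { int credits; } connection_t;                 /* placeholder type  */
extern void emp_like_send(connection_t *c, int tag, const void *buf, size_t len);
extern void progress(connection_t *c);                        /* polls for ACKs    */
enum { MSG_DATA };

#define INITIAL_CREDITS 4           /* matches the four credits in the figure */

void credit_send(connection_t *c, const void *buf, size_t len)
{
    while (c->credits == 0)
        progress(c);                /* wait until the receiver returns credits */

    c->credits--;                   /* one pre-posted remote buffer consumed   */
    emp_like_send(c, MSG_DATA, buf, len);
}
```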
Non-Data Streaming and Delayed Acknowledgments
• Disabling data streaming
  • Data streaming requires an intermediate copy
  • Non-data streaming places data directly into the user buffer
• Delayed acknowledgments
  • Increase in bandwidth
    • Less network traffic
    • The NIC has less work to do
  • Decrease in latency
    • Fewer descriptors posted
    • Less tag matching at the NIC
    • 550 ns per descriptor
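A sketch of delayed acknowledgments on the receive side: rather than acknowledging every message, the consumed buffers are re-posted and the credits returned in one batch once part of the window has been used. The batching threshold, names, and types are illustrative assumptions; only the ~550 ns per-descriptor cost comes from the slide.

```c
#include <stddef.h>

typedef struct { int consumed; } connection_t;                /* placeholder type */
extern void repost_recv_descriptors(connection_t *c, int n);
extern void emp_like_send(connection_t *c, int tag, const void *buf, size_t len);
enum { MSG_CREDIT_ACK };

#define CREDIT_WINDOW  4
#define ACK_THRESHOLD (CREDIT_WINDOW / 2)     /* assumed batching point */

void on_data_received(connection_t *c)
{
    c->consumed++;

    if (c->consumed >= ACK_THRESHOLD) {
        repost_recv_descriptors(c, c->consumed);            /* recycle buffers  */
        emp_like_send(c, MSG_CREDIT_ACK,
                      &c->consumed, sizeof(c->consumed));   /* one batched ACK  */
        c->consumed = 0;
        /* One ACK per batch instead of per message: fewer descriptors posted,
         * less tag matching on the NIC (~550 ns each), less network traffic. */
    }
}
```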
EMP Unexpected Queue
• EMP features an unexpected message queue
  • Advantage: it is the last queue to be checked
  • Disadvantage: it requires a data copy
• Acknowledgments placed in the unexpected queue
  • No copy, since acknowledgments carry no data
  • Acknowledgments are pushed out of the critical path
Presentation Overview
• Background and Motivation
• Design Challenges
• Performance Enhancement Techniques
• Performance Results
• Conclusions
Performance Results
• Micro-benchmarks
  • Latency (ping-pong)
  • Bandwidth
• FTP application
• Web server
  • HTTP/1.0 specifications
  • HTTP/1.1 specifications
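For reference, a standard ping-pong latency measurement of the kind used for the numbers that follow, written against the plain sockets API so it runs unchanged over either the kernel stack or the user-level layer. The message size mirrors the 4-byte case in the plots; the iteration count is arbitrary, and 'fd' is assumed to be an already-connected stream socket whose peer echoes each message back.

```c
/* Ping-pong latency micro-benchmark over the sockets API (client side).
 * Reports one-way latency as half the average round-trip time. */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>

#define MSG_SIZE    4        /* 4-byte messages, as in the latency results */
#define ITERATIONS  10000

double pingpong_latency_us(int fd)
{
    char buf[MSG_SIZE];
    memset(buf, 0, sizeof(buf));

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    for (int i = 0; i < ITERATIONS; i++) {
        send(fd, buf, MSG_SIZE, 0);
        recv(fd, buf, MSG_SIZE, MSG_WAITALL);   /* wait for the echo */
    }

    gettimeofday(&t1, NULL);
    double total_us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    return total_us / (2.0 * ITERATIONS);       /* one-way latency */
}
```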
Experimental Test Bed
• Four Pentium III 700 MHz quad-SMP nodes
• 1 GB main memory per node
• Alteon NICs
• Packet Engine switch
• Linux kernel version 2.4.18
Micro-benchmarks: Latency
• Up to 4 times improvement compared to TCP
• Overhead of 0.5 μs compared to EMP
Micro-benchmarks: Bandwidth
• An improvement of 53% compared to enhanced TCP
FTP Application
• Up to 2 times improvement compared to TCP

Web Server (HTTP/1.0)
• Up to 6 times improvement compared to TCP

Web Server (HTTP/1.1)
• Up to 3 times improvement compared to TCP
Conclusions
• Developed a high performance user-level sockets implementation over Gigabit Ethernet
• Latency close to base EMP (28 μs)
  • 28.5 μs for non-data streaming sockets
  • 37 μs for data streaming sockets
  • 4 times improvement in latency compared to TCP
• Peak bandwidth of 840 Mbps
  • 550 Mbps obtained by TCP with increased registered space for the kernel (up to 2 MB)
  • Default case is 340 Mbps with 32 KB
  • Improvement of 53%
Conclusions (contd.)
• FTP application shows an improvement of nearly 2 times
• Web server shows tremendous performance improvement
  • HTTP/1.0 shows an improvement of up to 6 times
  • HTTP/1.1 shows an improvement of up to 3 times
Future Work
• Dynamic credit allocation
• NIC: the trusted component
• Integrated QoS
  • Currently on Myrinet clusters
  • Commercial applications in the data center environment
• Extend the idea to next-generation interconnects
  • InfiniBand
  • 10 Gigabit Ethernet
Thank You
For more information, please visit the NBC home page: http://nowlab.cis.ohio-state.edu
Network-Based Computing Laboratory, The Ohio State University