IT Briefing Agenda 7/17/05 • Microsoft Agreement (John Ellis) • SPSS Site License (Marcy Alexander) • IMAP Polling (Ken Guyton) • NetReg/CAT Update (Alan White) • Research Cluster (Keven Haynes) • Premiere Support (Karen Jenkins) • NetCom Q&A (Paul Petersen)
Eagle Mail Performance An opportunity to make it faster Ken Guyton
Architecture • Eagle mail consists of three services • Relay • Delivery • Reading
Architecture • Relay • Moving email from computer to computer (SMTP)
Architecture • Delivery • Delivering messages into an INBOX.
Architecture • Reading • Users retrieving their messages to read them, mark as read, delete, etc. (IMAP)
Architecture [Diagram showing these components: clients, firewall, Webmail, IMAP proxy, Relay servers, Virus and Spam scanning, LDAP routing, Read/Delivery servers, disk, and other servers]
The Situation • Reading and Delivery live on the same servers.
The Situation [Bar chart: CPU utilization of Read/Delivery servers 1 to 5; labeled values 90%, 80%, 80%, 50%, 50%]
The Situation [Bar chart: users per server shown against CPU utilization for Read/Delivery servers 1 to 5; user counts 12300, 12809, 8313, 4168, and 75; CPU utilization values 90%, 80%, 80%, 50%, 50%]
The Question • What are 75 users doing to use 50% of a Read/Delivery server?
Observations • Make some measurements of busy IMAP processes • Tracing with truss • Profile processes • Packet snooping
Observations • ...to answer questions: • What are these processes doing? • What system calls are using the most CPU time? • What IMAP commands are being sent? • ...and how often?
Results • Processes are doing a lot of disk I/O. • The system calls that account for the vast majority of CPU time are read() and alarm(). • The IMAP command being sent most often is SELECT.
More Observations • Instrument the imapd daemon (we have the source code!) • Log SELECTs on a per-user and per-mailbox basis • Plot behavior
Results [Plot of logged SELECT intervals, grouped as ≤ 1 min, ≤ 5 min, and > 5 min]
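The three interval groups on this slide (≤ 1 min, ≤ 5 min, > 5 min) could be produced from a SELECT log along the following lines. This is only a sketch: the record layout (timestamp in seconds, user, mailbox) and the function name `bucket_select_intervals` are assumptions made for illustration, not the format the instrumented imapd actually wrote.

```python
from collections import defaultdict

def bucket_select_intervals(records):
    """Group gaps between consecutive SELECTs of the same (user, mailbox).

    `records` is an iterable of (timestamp_seconds, user, mailbox) tuples.
    This layout is hypothetical, not the real imapd log format.
    """
    # Collect the SELECT timestamps for each user/mailbox pair.
    by_key = defaultdict(list)
    for ts, user, mailbox in records:
        by_key[(user, mailbox)].append(ts)

    buckets = {"<= 1 min": 0, "<= 5 min": 0, "> 5 min": 0}
    for stamps in by_key.values():
        stamps.sort()
        # Compare each SELECT with the next one on the same mailbox.
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap <= 60:
                buckets["<= 1 min"] += 1
            elif gap <= 300:
                buckets["<= 5 min"] += 1
            else:
                buckets["> 5 min"] += 1
    return buckets
```

A mailbox with only one logged SELECT contributes no interval, so it does not appear in any group.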
Conclusions • High-frequency SELECTs are killing us • A new server per 75 users is EXPENSIVE!
Hypothesis • When clients check for new email they send a SELECT • (They should send a NOOP) • Users are setting their clients to check for email every minute
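The difference between the two commands can be illustrated with Python's standard `imaplib` module. This is a sketch of the polling pattern the slide recommends, not the code of any particular mail client: the function names `open_inbox` and `poll_for_new_mail` are made up for the example, and the host and credentials are placeholders supplied by the caller.

```python
import imaplib

def open_inbox(host, user, password):
    """Log in and SELECT the INBOX once, at the start of the session."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("INBOX")
    return conn

def poll_for_new_mail(conn):
    """Poll with NOOP; the mailbox stays selected, so no new SELECT is sent.

    Returns the message count from the latest untagged EXISTS response the
    client is holding, or None if there is none. The first poll after
    open_inbox() may report the count that the initial SELECT returned.
    """
    conn.noop()
    # imaplib returns [None] as the data when no EXISTS response is pending.
    _, data = conn.response("EXISTS")
    if data and data[-1] is not None:
        return int(data[-1])
    return None
```

A client would call `open_inbox` once and then call `poll_for_new_mail` on its timer, so each check costs the server a NOOP rather than a fresh SELECT.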
Final Notes • Webmail checks every five minutes (and does use SELECT) • Some clients have a drop-down menu to select this time (1-min, 5-min, etc.)
Our Plea • See if your users are polling < 5 min • 10 min is better • You can always manually check for new email • Help them change their polling time if needed
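To put rough numbers on the request above, the sketch below counts mailbox checks per hour for the 75 users mentioned earlier at different polling intervals. It assumes every user polls at the same fixed interval and that each poll is one check against the server; both are simplifications made for illustration, and `checks_per_hour` is a hypothetical helper.

```python
def checks_per_hour(users, interval_minutes):
    """Mailbox checks per hour if every user polls at the same interval."""
    return users * 60 // interval_minutes

for minutes in (1, 5, 10):
    print(f"{minutes:>2} min polling: {checks_per_hour(75, minutes)} checks/hour")
# 1 min polling: 4500 checks/hour
# 5 min polling: 900 checks/hour
# 10 min polling: 450 checks/hour
```

Under these assumptions, moving from 1-minute to 10-minute polling cuts the number of checks by a factor of ten.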
The UNIX Group • Chris Alexander • Bruce Anderson • Karla Fields • Amanda Gagnon • Ken Guyton • Curt Tucker • Eric Van Wieren
NetReg/CAT Update Alan White
Emory University High Performance Computing Cluster Keven Haynes
Need for High Performance • Large number of computations • Large data set • Complex computations • Specialized applications • More disciplines doing computational work
Need for Shared Resources • Most researchers do not have the physical resources to house large computing systems. Air conditioning, power, and security are all important and often overlooked. • Many researchers lack the technical expertise required to manage systems, especially Linux/Unix. • Most personally owned systems are underutilized and therefore not as cost-effective. • Pooled money can buy bigger and better systems.
Emory High Performance Computing Cluster • Partnership between Emory College, BIMCORE (School of Medicine) and ITD. • Emory College and individual faculty (Jaeger, Prinz) provided funding for purchase of the cluster. • BIMCORE provides software expertise and cost-recovery infrastructure. • ITD provides facility and system administration.
The Cluster - Hardware • 63 dual-processor Sun V20z “Compute Nodes” (AMD 2.2 GHz Opteron 248) • 1 quad-processor Sun V40z “Master Node” (AMD 2.2 GHz Opteron 848) • Compute Nodes have 2 GB of RAM each; the Master Node has 8 GB. • Each node has 73 GB of local disk space (RAID 1) • Master Node has 550+ GB of local disk space • Two 47U APC powered rack enclosures
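The node counts on this slide add up as follows. The snippet is just the arithmetic from the figures above; the function name `cluster_totals` is made up for the example.

```python
def cluster_totals():
    """Total processors and RAM from the node counts on the hardware slide."""
    compute_nodes, master_nodes = 63, 1
    # 2 processors per compute node, 4 in the master node.
    processors = compute_nodes * 2 + master_nodes * 4
    # 2 GB of RAM per compute node, 8 GB in the master node.
    ram_gb = compute_nodes * 2 + master_nodes * 8
    return processors, ram_gb

print(cluster_totals())  # (130, 134)
```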
The Cluster - Networking • All nodes connected via gigabit Ethernet (copper) on private network • Two SMC gigabit switches • 21 Nodes are connected via 4 gigabit Myrinet • Service processors connected via 100 Mb Ethernet • Two MRV serial console switches
The Cluster - Software • Red Hat Enterprise Linux - Version 3, x86_64, Advanced Server and Workstation • Kernel 2.4.21, glibc 2.3.2, gcc version 3.2.3 • 64-bit Operating system/runtime environment • Sun Grid Engine™: • Manages queuing and prioritization of jobs • Performs job and user accounting for time-shared resource • Can support up to 200,000 jobs simultaneously • Heterogeneous support allows connection of Mac OS X, Solaris and other execution hosts
Current and Future Applications • Genesis (neural simulator, Jaeger) • Pattern Generation and Homeostasis in Neural Circuits (Prinz) • Pharmacology (Severson) • Others: • Animation Rendering • Statistical Analysis (R) • Numerical Analysis (MATLAB) • Bioinformatics (BLAST) • Large Population Studies, GIS
Who may use the Cluster? • Researchers, namely PIs • Open to anyone affiliated with Emory, possibly some external research • Subscription Fee: ~ $3000/year or $750/quarter
Premiere Support Overview Karen Jenkins
Premiere Support • Advanced/escalated support for specific set of customers • Local support and other campus technical professionals • Executive leadership (later phase) • Benefits • Dedicated number to reduce wait times • Direct entry to high level support technicians
When to use • Reporting of system down and other performance issues only. • University Enterprise applications & Network • Examples include: Network Outage affecting a large department, building, or campus; Eagle Mail is down; PeopleSoft is crawling; other “strange” behavior • Non-critical or other work requests should go through Manage IT or ESR. • Examples include: account requests, virus reporting, suggestions to improve service, etc.
Logistics • Hours: M-F 8:00am – 5:00pm • After-hours calls are automatically forwarded to the help desk (which forwards to on-call after help desk hours). • Premiere Support Team: • Call Center supervisor • Craig Myers • Linda Ellis
Responsibilities • Obtain technical input/details • Escalate to proper Tier 3 team • Provide regular communication and updates • Via Manage IT Bulletin Board • Provide final debrief / explanation of problem • Again via Manage IT
Setup • Dedicated line rings on the primary team members' phone sets • After 4 rings, the call rolls over to the help desk FTE phones; if those are busy, it goes to the queue • Premiere Support calls are placed at the front of the queue. • Investigating adding a visual indicator for the help desk FTEs.
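The routing described on this slide can be modeled in a few lines. This is a toy model for illustration, not the phone system's configuration: the names `route_call` and `HelpDeskQueue` are hypothetical, and serving multiple waiting Premiere Support calls in arrival order is an assumption the slide does not state.

```python
from collections import deque

def route_call(team_answered_within_4_rings, help_desk_busy):
    """Return where a Premiere Support call ends up."""
    if team_answered_within_4_rings:
        return "premiere team"
    if not help_desk_busy:
        return "help desk"
    return "queue"

class HelpDeskQueue:
    """Help desk queue in which Premiere Support calls go ahead of others."""

    def __init__(self):
        self._premiere = deque()
        self._regular = deque()

    def add(self, caller, premiere=False):
        (self._premiere if premiere else self._regular).append(caller)

    def next_call(self):
        # Premiere calls are served first; each group is first in, first out.
        if self._premiere:
            return self._premiere.popleft()
        if self._regular:
            return self._regular.popleft()
        return None
```

For example, a Premiere Support call added after two regular callers is still the next call answered.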
Pre-Requisites • Requires a support account in Manage IT • Participation in the local-l listserv • Caller ID information displayed
… and the number is … Will be posted on the Manage IT Bulletin Board • Available on TBD
Quick Manage IT Update • New Manage IT Major Features for 8/31/05 • Flashboards • Port Status Table • 2-Way email • Assignment permissions group • Target Features for 9/30/05 • PS Status Table • Emory Reports • Magic View • Resolution / Communication to requester • Manage IT Training Session scheduled for September 13th @ 1:00pm, NDB Auditorium or Kennesaw
Quick ESR Update • New ESR Major Features by 8/31/05 • Change pop-ups to long names • Web & DB self-service forms • Target Features for 9/30/05 • On-behalf of (in Manage IT too) • Communication box