
Site Report



Presentation Transcript


  1. Site Report US CMS T2 Workshop Samir Cury on behalf of T2_BR_UERJ Team

  2. Servers' Hardware profile • SuperMicro machines • 2 x Intel Xeon dual-core @ 2.0 GHz • 4 GB RAM • RAID 1 – 120 GB HDs

  3. Nodes Hardware profile (40) • Dell PowerEdge 2950 • 2 x Intel Xeon quad-core @ 2.33 GHz • 16 GB RAM • RAID 0 – 6 x 1 TB hard drives • CE Resources • 8 batch slots • 66.5 HS06 per node • 2 GB RAM / slot • SE Resources • 5.8 TB usable for dCache or Hadoop • Private network only

  4. Nodes Hardware profile (2+5) • Dell R710 • 2 are Xen servers – not worker nodes • 2 x Intel Xeon quad-core @ 2.4 GHz • 16 GB RAM • RAID 0 – 6 x 2 TB hard drives • CE • 8 batch slots (or more?) • 124.41 HS06 per node • 2 GB RAM / slot • SE • 11.8 TB for dCache or Hadoop • Private network only

  5. First-phase nodes profile (82) • SuperMicro server • 2 x Intel Xeon single-core @ 2.66 GHz • 2 GB RAM • 500 GB hard drive & 40 GB hard drive • CE Resources • Not used – old CPUs & low RAM per node • SE Resources • 500 GB per node

  6. Plans for the future – Hardware • Buying 5 more Dell R710 • Deploying 5 R710 when the disks arrive • 80 more cores • 120 TB more storage • 1,244 HS06 more in total • CE – 40 PE 2950 + 10 R710 = 400 cores || 3.9 kHS06 • SE – 240 + 120 + 45 = 405 TB
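
As a back-of-the-envelope check (not from the slides), these totals follow from the per-node figures on the earlier slides, reading the per-node benchmark numbers as HS06 so that the quoted 3.9 kHS06 total comes out; the 45 TB first-phase contribution is taken as quoted:

```latex
\begin{align*}
\text{Cores: }   & 40 \times 8 + 10 \times 8 = 400 \\
\text{HS06: }    & 40 \times 66.5 + 10 \times 124.41 \approx 3904 \approx 3.9~\text{kHS06} \\
\text{Storage: } & 40 \times 6~\text{TB} + 10 \times 12~\text{TB} + 45~\text{TB} = 405~\text{TB}
\end{align*}
```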

  7. Software profile – CE • OS – CentOS 5.3, 64-bit • 2 OSG gatekeepers • Both running OSG 1.2.x • Maintenance tasks eased by redundancy – fewer downtimes • GUMS 1.2.15 • Condor 7.0.3 for job scheduling

  8. Software profile – SE • OS – CentOS 4.7, 32-bit • dCache 1.8 • 4 GridFTP servers • PNFS 1.8 • PhEDEx 3.2.0

  9. Plans for the future: Software/Network • SE migration • Right now we use dCache/PNFS • We plan to migrate to BeStMan/Hadoop • Early efforts have already produced results • Add the new nodes to the Hadoop SE • Migrate the data (one possible copy loop is sketched below) • Test with a real production environment • Jobs and users accessing it • Network improvement • RNP (our network provider) plans to deliver a 10 Gbps link to us before the next Supercomputing conference.
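
The slides do not describe how the data will be moved from dCache/PNFS into Hadoop. Purely as a sketch of one possible approach, assuming the PNFS namespace is mounted under a /pnfs path and that files are staged through local scratch with the dccp client before being written into HDFS with the standard hadoop CLI (all paths and names below are hypothetical):

```python
#!/usr/bin/env python3
# Rough sketch of a dCache -> HDFS copy loop, not the site's actual migration
# procedure (which the slides do not describe). Assumes the PNFS namespace is
# mounted under /pnfs and that the dccp and hadoop CLIs are in PATH; all paths
# below are hypothetical.
import os
import subprocess
import sys

PNFS_ROOT = "/pnfs/hepgrid.uerj.br/data/cms"  # hypothetical dCache namespace root
HDFS_ROOT = "/store"                          # hypothetical HDFS destination root
SCRATCH = "/tmp/se-migration"                 # local staging area

def migrate_file(pnfs_path):
    rel = os.path.relpath(pnfs_path, PNFS_ROOT)
    local_copy = os.path.join(SCRATCH, os.path.basename(pnfs_path))
    hdfs_path = os.path.join(HDFS_ROOT, rel)
    # Stage the file out of dCache onto local disk...
    subprocess.check_call(["dccp", pnfs_path, local_copy])
    # ...then push it into HDFS and clean up the staged copy.
    subprocess.check_call(["hadoop", "fs", "-mkdir", "-p", os.path.dirname(hdfs_path)])
    subprocess.check_call(["hadoop", "fs", "-put", local_copy, hdfs_path])
    os.remove(local_copy)

def main(file_list):
    if not os.path.isdir(SCRATCH):
        os.makedirs(SCRATCH)
    with open(file_list) as fh:
        for line in fh:
            path = line.strip()
            if path:
                migrate_file(path)

if __name__ == "__main__":
    main(sys.argv[1])  # argument: text file with one PNFS path per line
```

In practice one would also verify file sizes or checksums after each copy before deleting anything from the dCache side.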

  10. T2 analysis model & associated physics groups • We have reserved 30 TB for each of the groups: • Forward Physics • B-Physics • Studying the possibility of reserving space for Exotica • The group has had several MSc & PhD students working on CMS analysis for a long time – these users are well supported • Some Grid users submit jobs, sometimes run into trouble and give up – they don't ask for support

  11. Developments • A Condor mechanism based on job suspension that gives priority to a small pool of important users: • 1 pair of batch slots per core • When a priority user's job arrives, it suspends the normal job running on the paired batch slot • Once it finishes and vacates the slot, the paired job automatically resumes • Documentation can be made available to anyone interested • Developed by Diego Gomes
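
The suspension policy itself lives in the site's Condor configuration (a SUSPEND/CONTINUE policy by Diego Gomes) and is not shown in the slides. Purely to illustrate the pairing idea, here is a hypothetical external poller written against the modern htcondor Python bindings, which did not yet exist for Condor 7.0.3; the slot-naming scheme and the priority-user list are assumptions, not the site's actual setup:

```python
#!/usr/bin/env python3
# Illustration only: an external poller mimicking the slot-pairing idea described
# above. The real mechanism is a SUSPEND/CONTINUE policy in the site's Condor
# configuration; the slot naming and priority-user pool below are made up.
import htcondor  # modern HTCondor Python bindings (an anachronism for Condor 7.0.3)

PRIORITY_USERS = {"alice", "bob"}  # hypothetical pool of important users
PAIR_OFFSET = 8                    # assume slotN is paired with slot(N + 8)

def partner_slot(remote_host):
    """Map e.g. 'slot3@node01' to its assumed partner 'slot11@node01'."""
    if "@" not in remote_host:
        return ""
    slot, host = remote_host.split("@", 1)
    n = int(slot.replace("slot", ""))
    partner = n + PAIR_OFFSET if n <= PAIR_OFFSET else n - PAIR_OFFSET
    return "slot%d@%s" % (partner, host)

def poll_once(schedd):
    # JobStatus: 2 = running, 7 = suspended.
    jobs = schedd.query("JobStatus == 2 || JobStatus == 7",
                        ["ClusterId", "ProcId", "Owner", "JobStatus", "RemoteHost"])
    by_slot = {str(j.get("RemoteHost", "")): j for j in jobs}
    for job in jobs:
        if str(job["Owner"]) in PRIORITY_USERS:
            continue  # never touch the priority users' own jobs
        partner = by_slot.get(partner_slot(str(job.get("RemoteHost", ""))))
        partner_is_priority = (partner is not None
                               and str(partner["Owner"]) in PRIORITY_USERS
                               and int(partner["JobStatus"]) == 2)
        job_id = "%d.%d" % (job["ClusterId"], job["ProcId"])
        if int(job["JobStatus"]) == 2 and partner_is_priority:
            schedd.act(htcondor.JobAction.Suspend, [job_id])   # pause the normal job
        elif int(job["JobStatus"]) == 7 and not partner_is_priority:
            schedd.act(htcondor.JobAction.Continue, [job_id])  # priority job gone: resume

if __name__ == "__main__":
    poll_once(htcondor.Schedd())
```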

  12. Developments • Condor4Web • Web interface to visualize the Condor queue • Shows grid DNs • Useful for Grid users who want to know how their job is being scheduled inside the site – http://monitor.hepgrid.uerj.br/condor • Available at http://condor4web.sourceforge.net • Still has much room to evolve, but it already works • Developed by Samir
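
Condor4Web itself is the code published at the SourceForge link above. As a rough sketch of the same idea (dumping the Condor queue, including grid DNs, to a static HTML table), assuming the modern htcondor Python bindings and the standard x509userproxysubject job attribute:

```python
#!/usr/bin/env python3
# Minimal sketch in the spirit of Condor4Web: dump the Condor queue, including
# grid DNs, as a static HTML table. Not the actual Condor4Web code -- see
# http://condor4web.sourceforge.net for that.
import html

import htcondor  # modern HTCondor Python bindings (an anachronism for Condor 7.0.3)

STATUS = {1: "Idle", 2: "Running", 5: "Held", 7: "Suspended"}

def queue_as_html():
    schedd = htcondor.Schedd()
    # x509userproxysubject holds the grid DN for jobs submitted with an X.509 proxy.
    jobs = schedd.query("true", ["ClusterId", "ProcId", "Owner",
                                 "JobStatus", "x509userproxysubject"])
    rows = []
    for j in jobs:
        dn = str(j.get("x509userproxysubject", "(local job)"))
        rows.append("<tr><td>%s.%s</td><td>%s</td><td>%s</td><td>%s</td></tr>" % (
            j["ClusterId"], j["ProcId"],
            html.escape(str(j["Owner"])),
            STATUS.get(int(j.get("JobStatus", 0)), "?"),
            html.escape(dn)))
    return ("<table border='1'><tr><th>Job</th><th>Owner</th>"
            "<th>Status</th><th>Grid DN</th></tr>" + "".join(rows) + "</table>")

if __name__ == "__main__":
    # A cron job could regenerate this file for a web server to publish,
    # e.g. behind a URL like http://monitor.hepgrid.uerj.br/condor
    with open("condor_queue.html", "w") as out:
        out.write("<html><body><h1>Condor queue</h1>%s</body></html>" % queue_as_html())
```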

  13. CMS Center @ UERJ • During LISHEP 2009 (January) we inaugurated a small control room for CMS at UERJ.

  14. Shifts @ CMS Center • Our computing team has participated in tutorials, and we now have four potential CSP shifters

  15. CMS Centre (quick) profile • Hardware • 4 Dell workstations with 22" monitors • 2 x 47" TVs • Polycom SoundStation • Software • All conferences, including those with the other CMS Centres, are done via EVO

  16. Cluster & Team • Alberto Santoro (General supervisor) • Andre Sznajder (Project coordinator) • Eduardo Revoredo (Hardware coordinator) • Jose Afonso (Software coordinator) • Samir Cury (Site admin) • Fabiana Fortes (Site admin) • Douglas Milanez (Trainee) • Raul Matos (Trainee)

  17. 2009/2010 goals • In 2009 we worked mostly on • Getting rid of infrastructure problems • Insufficient electrical power • Air conditioning – many downtimes due to this • These are solved now • Besides those problems, we were • Running official production on small workflows • Doing private production & analysis for local and Grid users • 2010 goal • Use the new hardware and infrastructure for a more reliable site • Run heavier workflows and increase participation and presence in official production.

  18. Thanks! • I want to formally thank Fermilab, USCMS and OSG for their financial help in bringing a UERJ representative here. • I also want to thank USCMS for this very useful meeting.

  19. Questions? Comments?
