CERN openlab Board of Sponsors: Update on Computing at CERN
Frédéric Hemmer, IT Department Head, CERN
2 July 2010
Storage & Data Management (I)
• LHC is just starting to take data
• Modest experience in distributing and analyzing the data worldwide
• Already some concerns with respect to performance and scalability
• Initial assumptions in computing models that the network was the limitation
• Data Management software “home grown” and too complicated
• Concerns about the long term sustainability
• However
  • Network, (global) file systems, storage have evolved
Update on Computing at CERN - Frédéric Hemmer
Storage & Data Management (II)
• Just starting to assemble ideas
  • IT PoW 11/2009 http://indico.cern.ch/getFile.py/access?sessionId=16&resId=0&materialId=2&confId=68463
  • WLCG Data Management Jamboree http://indico.cern.ch/conferenceDisplay.py?confId=82919
• Other communities have similar or even larger scale problems
  • Biology, SKA, etc.
  • EIROLabs
• European Commission is launching calls centred around data infrastructure
• Opportunities for further collaborations
Tier 0 – Tier 1 – Tier 2
• Tier-0 (CERN):
  • Data recording
  • Initial data reconstruction
  • Data distribution
• Tier-1 (11 centres):
  • Permanent storage
  • Re-processing
  • Analysis
• Tier-2 (~130 centres):
  • Simulation
  • End-user analysis
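The tier model above can be written down as a small lookup structure. This is only an illustrative sketch: the `Tier` class, the `TIERS` tuple and the `tiers_for` helper are hypothetical names introduced here, and the centre counts are the approximate figures quoted on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    centres: int             # approximate number of centres quoted on the slide
    roles: Tuple[str, ...]   # responsibilities listed for this tier


# Values copied from the slide; Tier-0 is the single CERN centre.
TIERS = (
    Tier("Tier-0", 1, ("Data recording", "Initial data reconstruction", "Data distribution")),
    Tier("Tier-1", 11, ("Permanent storage", "Re-processing", "Analysis")),
    Tier("Tier-2", 130, ("Simulation", "End-user analysis")),
)


def tiers_for(role: str) -> List[str]:
    """Return the names of tiers with a role containing `role` (case-insensitive)."""
    needle = role.lower()
    return [t.name for t in TIERS if any(needle in r.lower() for r in t.roles)]


if __name__ == "__main__":
    print(tiers_for("analysis"))
    print(sum(t.centres for t in TIERS), "centres in total (approximate)")
```

Looking up "analysis" returns both Tier-1 and Tier-2, which reflects the split on the slide between general analysis and end-user analysis.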
WLCG Status is:
• WLCG running increasingly high workloads:
  • ~1 million jobs/day
  • Real data processing and re-processing
  • Physics analysis
  • Simulations
  • ~100 k CPU-days/day
• Unprecedented data rates
  • Traffic on OPN up to 70 Gb/s (ATLAS reprocessing campaigns)
  • Data export during data taking: according to expectations on average
  • Tier-0 data traffic: > 4 GB/s input, > 13 GB/s served
Ian.Bird@cern.ch
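The slide mixes gigabits per second (OPN traffic) with gigabytes per second (Tier-0 traffic). A minimal conversion sketch follows, assuming decimal units (1 TB = 1000 GB) and 8 bits per byte; the function names are illustrative, and the Tier-0 figures are lower bounds on the slide, so the daily volumes are lower bounds too.

```python
SECONDS_PER_DAY = 86_400


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a rate in Gb/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


def tb_per_day(gbyte_per_s: float) -> float:
    """Daily volume in TB for a sustained rate in GB/s (decimal units assumed)."""
    return gbyte_per_s * SECONDS_PER_DAY / 1000


opn_peak_gbyte_s = gbit_to_gbyte_per_s(70)  # OPN peak quoted as 70 Gb/s
tier0_in_tb_day = tb_per_day(4)             # Tier-0 input quoted as > 4 GB/s
tier0_out_tb_day = tb_per_day(13)           # Tier-0 served quoted as > 13 GB/s

if __name__ == "__main__":
    print(f"OPN peak: {opn_peak_gbyte_s:.2f} GB/s")
    print(f"Tier-0 input if sustained: {tier0_in_tb_day:.1f} TB/day")
    print(f"Tier-0 served if sustained: {tier0_out_tb_day:.1f} TB/day")
```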
The CERN “Tier-0” in numbers
• Data Centre Operations (Tier 0)
  • 24x7 operator support and System Administration services to support 24x7 operation of all IT services
  • Hardware installation & retirement
    • ~7,000 hardware movements/year; ~1000 disk failures/year
  • Management and Automation framework for large scale Linux clusters
• Assets
Some concerns
• Complex, aging infrastructure with many players
• (Too) much diversity
• Too many hardware movements
• Operations teams overloaded
• How to handle remote operations
• Virtualization
  • Handling 10x increase in host numbers
  • Many technologies
  • Security (signatures)
• Systems Management
  • Many home made developments
  • Significant cost in software development/maintenance
Tier-0 Power needs estimates May 2010
Plans for 2010-2012
• Consolidation of the existing 513 capacity
  • 600 kW of backed-up power
  • 3.5 MW of Physics capacity
• Provision of “container” style of capacity
  • Incremental addition of ~400 kW units
• Investigations of remote capacity usage
  • As a logical extension of 513, not as Grid/Cloud capacity
  • Already experimenting with 100 kW in the Geneva area
  • But is this really (economically) feasible?
• Many challenges
  • Timing, operation models, costs
Other topics/concerns/questions
• Computer Security
  • On-site – off-site
• Identity Management & Single Sign on
  • Including multi-factor authentication
• Cloud Computing
  • Can we make use of it? Economically?
• Critical data protection – backups/restores
  • Some systems at the limit – O(10**9) files
• Content Management Systems & Enterprise Search
• ITIL
  • Slow progress – as expected
• Software licenses (cost and models)
  • Common concern with EIROs
• Wireless coverage and deployment
  • (unreasonable) expectations from users
• Further progress in automation
  • Use of industry solutions?
• Global File Systems
  • Are there alternatives to AFS?
• Wide area/high speed networking evolution
Collaboration with Institutions: UNOSAT
CERN IT Department Background Information
Outline
• General Services
• Collaborative Tools
  • CDS/Invenio
  • Indico
• Networking
  • Internal
  • Wireless
  • External
  • CIXP
• Computer Security
  • CNIC/PLC
  • Spam
  • Intrusion Detection
• Grid Computing
  • Tier-0
  • WLCG
  • Data Storage
• EC Projects
• Openlab
  • Competency Centres
  • Workshops – Summer Student program
• UNOSAT
IT Organization 2010
• Director of Research and Computing: Sergio Bertolucci
• Department Head: Frédéric Hemmer
• CERN openlab
• Deputy Head: David Foster
• Planning Officer: Alan Silverman
• CERN School of Computing
• EU Projects: Bob Jones
• WLCG: Ian Bird
• Computer Security
• Department Heads Office (DHO)
• Communication Systems (CS): Jean-Michel Jouanigot
• Database Services (DB): Tony Cass
• Experiment Support (ES): Jamie Shiers
• User & Document Services (UDS): Tim Smith
• Platform & Engineering Services (PES): Helge Meinhard
• Departmental Infrastructure (DI): Alan Silverman
• Operating Systems & Information Services (OIS): Christian Isnard
• Grid Technology (GT): Markus Schulz
• Computing Facilities (CF): Wayne Salter
• Data & Storage Services (DSS): Alberto Pace
IT Staff Breakdown
General Services
• Data Centre Operations (Tier 0)
  • 24x7 operator support and System Administration services to support 24x7 operation of all IT services
  • Hardware installation & retirement (~7,000 hardware movements/year)
  • Management and Automation framework for large scale Linux clusters
• Installed Capacity
  • 6’300 systems, 39’000 processing cores
    • CPU servers, disk servers, infrastructure servers
  • 13’900 TB usable on 42’600 disk drives
  • 34’000 TB on 45’000 tape cartridges (56’000 slots), 160 tape drives
• Tenders in progress or planned (estimates)
  • 2’400 systems, 16’000 processing cores
  • 19’000 TB usable on 20’000 disk drives
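A few per-unit ratios follow directly from the installed-capacity figures above. The sketch below only divides numbers quoted on the slide; the dictionary keys and variable names are illustrative, and the results are averages over a mixed hardware fleet, not specifications of any single device.

```python
# Installed capacity as quoted on the slide (apostrophes are thousands separators).
installed = {
    "systems": 6_300,
    "cores": 39_000,
    "disk_tb": 13_900,
    "disk_drives": 42_600,
    "tape_tb": 34_000,
    "tape_cartridges": 45_000,
    "tape_slots": 56_000,
}

cores_per_system = installed["cores"] / installed["systems"]
tb_per_drive = installed["disk_tb"] / installed["disk_drives"]       # usable TB per drive
tb_per_cartridge = installed["tape_tb"] / installed["tape_cartridges"]
slot_occupancy = installed["tape_cartridges"] / installed["tape_slots"]

if __name__ == "__main__":
    print(f"Average cores per system:     {cores_per_system:.2f}")
    print(f"Usable TB per disk drive:     {tb_per_drive:.3f}")
    print(f"TB per tape cartridge:        {tb_per_cartridge:.3f}")
    print(f"Tape slots occupied:          {slot_occupancy:.1%}")
```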
General Services (II)
• E-Mail and Distribution Lists
  • Up to 2 M incoming messages/day, 99% detected as spam
  • 18’000 mailboxes (~ 68% owned by physics community)
• Web Services
  • 8’725 Web sites (~45% owned by physics community, 30% AFS-based)
• Active Directory, CERN Certification Authority & CERN Authentication
  • Central authentication service for Linux and Windows computers and applications
  • Online X509 Certificate Authority
• Windows Services
  • 60 TB of DFS workspaces
  • ~ 6’000 active PCs managed by CMF
• Windows Terminal Servers and Custom Servers
  • 120 ‘custom servers’ (not for public use) hosted for various departments, including 62 Windows Terminal Servers
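To put the mail figures above in perspective, the following rough sketch estimates how many non-spam messages remain per day and per mailbox. It assumes the peak figure of 2 M messages/day and a spam share of exactly 99%, both taken from the slide as round numbers; the variable names are illustrative.

```python
incoming_per_day = 2_000_000   # "up to 2 M incoming messages/day"
spam_percent = 99              # "99% detected as spam"
mailboxes = 18_000

# Integer arithmetic avoids floating-point noise from computing 1 - 0.99.
legitimate_per_day = incoming_per_day * (100 - spam_percent) // 100
legitimate_per_mailbox = legitimate_per_day / mailboxes

if __name__ == "__main__":
    print(f"Non-spam messages per day:         {legitimate_per_day}")
    print(f"Non-spam messages per mailbox/day: {legitimate_per_mailbox:.2f}")
```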
OIS services significant numbers
• 15000 Linux systems (Quattor managed or updating from linuxsoft.cern.ch)
  • 5700 in the Computer Center
  • 3100 elsewhere at CERN
  • 6200 outside CERN
• 6’000 active NICE PCs, >1’500 Macs
• Infrastructure of 300+ servers, including 120 ‘custom servers’ hosted for various departments, including 62 Windows Terminal Servers
• 60 TB DFS workspace, including 30 TB for Media Archive in collaboration with UDS, 15 TB Home Directories, 15 TB Project workspaces
• 18’000 mailboxes, ~ 8’000 e-groups, 3.6 TB of mail data, 40 production mail servers, ~ 2 M incoming messages/day, ~ 99% detected as spam
• Fax service: 3’000 faxes/month, 1’700 users
• 8’725 Web sites including 845 SharePoint sites, 35 production Web servers, 5.6 M hits/day, 2.2 TBytes/day transferred in June 2009
• CERN Certification Authority: 5’000 user certificates, 9’000 host certificates issued
• CERN Authentication usage increased: 50’000 authentications/day, 180 applications registered
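Two of the totals above come with a breakdown, so they can be checked for internal consistency. A minimal sketch follows; `breakdown_gap` is a hypothetical helper and the dictionary labels are paraphrased from the slide.

```python
from typing import Dict


def breakdown_gap(total: int, parts: Dict[str, int]) -> int:
    """Return the quoted total minus the sum of its parts (0 means consistent)."""
    return total - sum(parts.values())


linux_systems = {"Computer Center": 5_700, "elsewhere at CERN": 3_100, "outside CERN": 6_200}
dfs_workspace_tb = {"Media Archive": 30, "Home Directories": 15, "Project workspaces": 15}

linux_gap = breakdown_gap(15_000, linux_systems)
dfs_gap = breakdown_gap(60, dfs_workspace_tb)

if __name__ == "__main__":
    print("Linux systems gap:", linux_gap)
    print("DFS workspace gap (TB):", dfs_gap)
```

Both gaps come out as zero, so the quoted breakdowns add up to their totals.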
General Services (III)
• Database and Application Deployment Services
  • Mainly based on Oracle software
  • AIS DBs and Applications, EDMS, Accelerator DBs, IT DBs, CASTOR DBs, Physics databases (Calibration, Alignment, etc.), Public J2EE Service, etc.
  • 120 General Purpose Databases, 240 TB of NAS storage
  • 130 Web/Application Servers with 700 virtual hosts
  • 50 Terabytes of worldwide replicated Physics databases
• Engineering and Software Development Services
  • Mechanical and electronic CAE, field calculations, structural analysis, simulations, mathematics, etc.
  • 50 packages, 1000 users
• TWiki Service
  • 6000 users, 36’000 pages updated per month
• Version Control Services (CVS/SVN)
  • 2000 users, 200 projects
• Audiovisual Service: support, record and archive official committees and events
• VideoConference Service: provide video conferencing in rooms across site
• Video Conferencing System (Indico)
  • Distributed and used worldwide
• CDS-Invenio, a Digital Library Open Source Software produced, used and maintained at CERN
  • free support via mailing lists
  • commercial-like support via a maintenance contract
Invenio
• Comprehensive solution for the management of document repositories of moderate to large size
• Currently installed and in use by over a dozen scientific institutions worldwide. Some examples:
  • MeIND - HBZ NRW, Cologne, Germany
  • EPFL Infoscience - Lausanne, Switzerland
  • Aristotle University of Thessaloniki – Greece
  • Dipòsit de Documents, Universitat Aut. de Barcelona, Spain
  • RomDoc - UPB-CTTPI, Bucharest, Romania
  • Repozytorium Eny Politechnika - Wroclaw Univ. of Tech., Poland
  • Pacific Rim Library - Hong Kong
  • Academic Repository of Rwanda – KIST, Kigali, Rwanda
• Being deployed: AstroParticle Data System, NASA/Smithsonian, ILO
Invenio in Africa
• A digital library workshop initiated by UNESCO was organized in Sep 2009 in Kigali to train librarians on how to use CDS-Invenio; it was attended by librarians from Rwanda, Cameroon, Ghana and Mozambique
• The Academic Repository of Rwanda was set up and additional training was provided by CERN
• A fruitful collaboration is on-going with the Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
• The Invenio solution is under consideration for the Documentation Centre on Genocide documents and for the Parliament of Ghana documentation
Networking
• Design, implementation and support of CERN’s internal and external networking infrastructure in support of desktop, technical and scientific computing
  • Several 10 Gbps backbones in a multi-manufacturer environment: GPN, TN, ENs, LCG and External networks
  • Switching capacity of the internal LCG network is 4.8 Tbps
• Deployment and maintenance of network star points and wireless network services
  • More than 400 star points and ~50 000 UTP sockets
  • ~450 wireless base stations
• Management of the CERN firewall and provision of Internet connectivity
  • >10 Gbps Internet connectivity
  • Internal and external firewalling
• Development and management of tools for network and telecom monitoring, user request provisioning and issue tracking
  • Central database for automatic network equipment configuration
  • About 19 000 network and telecom connectivity requests per year
Fibre cut during STEP’09: Redundancy meant no interruption
Telephony and CIXP
• Provision and support of telephony services
  • Telephone exchange network of 10 000 lines
  • IP telephony, Audio conferencing, Switchboard, Call centres
• GSM Mobile services
  • Dedicated VPN of more than 4300 subscriptions
  • Including LHC Tunnel coverage
• VHF network for the fire brigade
• Integrated operation services for both network and telecom services
  • Several support contracts using same software tools
• Management of the CERN Internet Exchange (CIXP)
  • Around 40 clients (telco operators, institutions)
Computer Security
Tier 0 at CERN: Acquisition, First pass processing, Storage & Distribution
WLCG Grid Computing
• Tier-0 (CERN):
  • Data recording
  • Initial data reconstruction
  • Data distribution
• Tier-1 (11 centres):
  • Permanent storage
  • Re-processing
  • Analysis
• Tier-2 (~130 centres):
  • Simulation
  • End-user analysis
Data & Storage Services
• The data management challenges
  • Storing 15’000’000 gigabytes of data every year
  • Ensure that any file, including the smallest kilobyte, is available anywhere from the internet within a short time (small latency)
  • Cope with ever-changing storage technologies
• Long term data preservation
  • All past data must be kept readable for the future
• Castor
  • Software for data management at CERN (Castor) and for partner data centres
  • 1000 servers, 2000 cores, 10000 disks
• AFS distributed filesystem operations (30 TB, 500 Million files)
• Backups (~2 PB, 1 Billion files)
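The volumes above translate into an average ingest rate and into mean file sizes. The sketch below is a back-of-the-envelope calculation assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a 365-day year and a uniform rate over that year; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 86_400


def average_rate_gb_per_s(gb_per_year: float) -> float:
    """Average rate in GB/s if a yearly volume were written uniformly over the year."""
    return gb_per_year / SECONDS_PER_YEAR


def mean_file_size_bytes(total_bytes: int, n_files: int) -> float:
    """Mean file size in bytes for a store with the given volume and file count."""
    return total_bytes / n_files


yearly_rate = average_rate_gb_per_s(15_000_000)               # 15'000'000 GB per year
afs_mean = mean_file_size_bytes(30 * 10**12, 500 * 10**6)     # 30 TB, 500 million files
backup_mean = mean_file_size_bytes(2 * 10**15, 10**9)         # ~2 PB, 1 billion files

if __name__ == "__main__":
    print(f"Average ingest rate: {yearly_rate:.3f} GB/s")
    print(f"Mean AFS file size:  {afs_mean / 1e3:.0f} kB")
    print(f"Mean backup file:    {backup_mean / 1e6:.0f} MB")
```

The small mean file sizes show why the file count, and not only the total volume, is the scaling concern for AFS and backups.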
Data transfers
• 2009: STEP09 + preparation for data
  • Final readiness test (STEP’09)
  • Preparation for LHC startup
• LHC physics data: nearly 1 petabyte/week
• Real data from 30/3
• Castor traffic: > 4 GB/s input, > 13 GB/s served
Readiness of the computing
[Figure panels: CMS, ATLAS, LHCb]
• Has meant very rapid data distribution and analysis
• Data is processed and available at Tier 2s within hours!
Today WLCG is:
• Running increasingly high workloads:
  • Jobs in excess of 650k / day; anticipate millions / day soon
  • CPU equiv. ~100k cores
• Workloads are:
  • Real data processing
  • Simulations
  • Analysis – more and more (new) users
• Data transfers at unprecedented rates
[Figure: e.g. CMS: no. users doing analysis]
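Combining the job count and the CPU-equivalent figure above gives a feel for typical job length. This is only a rough sketch under a strong assumption that is not stated on the slide: every core is busy all day and each job uses a single core, so the result is an upper bound on the mean job duration.

```python
jobs_per_day = 650_000   # "jobs in excess of 650k / day"
cores = 100_000          # "CPU equiv. ~100k cores"

jobs_per_core_day = jobs_per_day / cores

# Assumption: fully busy cores, one core per job. Real utilisation is lower,
# so the true mean job duration would be shorter than this bound.
mean_job_hours_upper_bound = 24 / jobs_per_core_day

if __name__ == "__main__":
    print(f"Jobs per core per day:      {jobs_per_core_day:.1f}")
    print(f"Mean job length (at most):  {mean_job_hours_upper_bound:.2f} h")
```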
Impact of the LHC Computing Grid in Europe
• LCG has been the driving force for the European multi-science Grid EGEE (Enabling Grids for E-sciencE)
• EGEE is now a global effort, and the largest Grid infrastructure worldwide
• Co-funded by the European Commission (Cost: ~170 M€ over 6 years, funded by EU ~100 M€)
• EGEE already used for >100 applications, including…
  • Archeology
  • Astronomy
  • Astrophysics
  • Civil Protection
  • Comp. Chemistry
  • Earth Sciences
  • Finance
  • Fusion
  • Geophysics
  • High Energy Physics
  • Life Sciences
  • Multimedia
  • Material Sciences
  • …
• EGEE in numbers:
  • >250 sites
  • 48 countries
  • >50,000 CPUs
  • >20 PetaBytes
  • >10,000 users
  • >150 VOs
  • >150,000 jobs/day
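A couple of ratios help read the EGEE figures above. The sketch below simply divides the quoted numbers; since most of them are given as lower bounds (">"), the results are indicative only, and the variable names are illustrative.

```python
total_cost_meur = 170      # ~170 M EUR over 6 years
eu_funding_meur = 100      # ~100 M EUR funded by the EU
cpus = 50_000              # quoted as >50,000
jobs_per_day = 150_000     # quoted as >150,000

eu_share = eu_funding_meur / total_cost_meur
cost_per_year_meur = total_cost_meur / 6
jobs_per_cpu_day = jobs_per_day / cpus

if __name__ == "__main__":
    print(f"EU share of cost:      {eu_share:.1%}")
    print(f"Average cost per year: {cost_per_year_meur:.1f} M EUR")
    print(f"Jobs per CPU per day:  {jobs_per_cpu_day:.1f}")
```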
[Figure labels: Similarity Search; Measurement of Pulmonary Trunk; Temporal Modelling; RV and LV Automatic Modelling; Visual Data Mining; Surgery Planning; Genetics Profiling; Treatment Response; Personalised Simulation; Inferring Outcome; Semantic Browsing; Biomechanical Models; Tumor Growth Modelling]
Grid related collaborating projects and other IT-EC projects
• ~39 2-yr project-funded posts (~26 EGEE posts) – on project and IT group-related activities
• Total requested EC contribution (except PARTNER project): 12.1 M EUR
Collaboration with Industry: CERN openlab
• A science – industry partnership to drive R&D and innovation
• Started in 2002, now in phase 3
• Motto: “you make it – we break it”
• Evaluates state-of-the-art technologies in a very complex environment and improves them
• Test in a research environment today what will be used in industry tomorrow
• Training:
  • CERN School of Computing
  • openlab student programme
  • Topical seminars
CERN openlab phase III
• Covers 2009-2011
• Status
  • Partners: HP, Intel, Oracle and Siemens
• Topics
  • Global wireless coverage for CERN (HP ProCurve)
  • Power-efficient solutions (Intel)
  • Performance Tuning (Oracle)
  • Control systems and PLC security (Siemens)
  • Advanced storage systems and/or global file system (partner to be identified)
  • 100 Gb/s networking (partner to be identified)
openlab people: students in 2009
Collaboration with Institutions: UNOSAT