Scientific Computing at Fermilab
Our "Mission"
• Provide computing, software tools and expertise to all parts of the Fermilab scientific program, including theory simulations (e.g. Lattice QCD and cosmology) and accelerator modeling
• Work closely with each scientific program – as collaborators (where a scientist/staff member from SCD is involved) and as valued customers
• Create a coherent Scientific Computing program from the many parts and many funding sources – encouraging sharing of facilities, common approaches and re-use of software wherever possible
• Work closely with CCD as part of an overall coherent program

Scientific Computing – Fermilab S&T Review, Sept 5 2012
A Few Points
• We are ~160 strong, made up almost entirely of technically trained staff
• 26 scientists in the Division
• As the lab changes its mission, scientific computing is having to adapt to a new and more challenging landscape
• Scientific Computing is a very "matrixed" organization
• I will not try to cover all we do, but will pick and choose things that are on my mind right now…
Scientific Discovery – the Reason We Are Here
• The computing capability needed for scientific discovery is bounded only by human imagination
• The next generation of scientific breakthroughs will require major new advances in computing technology
  • Energy-efficient hardware, algorithms, applications and systems software
• Data "explosion" – Big Data is here
  • Observational data, sensor networks, and simulation
  • Computing and data-throughput challenges
About to Experience a Paradigm Shift in Computing
• For the last decade, Grid and computing resources have been very stable
• However…
  • The end of Moore's Law is looming
  • New computing technologies are on the near horizon: phase-change memories, stacked dies
  • Exponential growth in parallelism in HPC – IBM Blue Gene leading the charge
  • Heterogeneous systems delivering higher performance per watt (Titan)
  • Power is a constraint
  • Programmability…
The Computing Landscape Will Change
• HEP is going to have to adapt to this changing world
• While the future for the next few years is clear, we don't really know where we will be in the next decade or two
• Ultimately, market forces will determine the future
• We need to turn this into a positive force for both High Energy Physics and High Performance Computing
And Today…
Lattice b_c machine…
Ask yourself: has computing gotten any easier in the last 30 years?
Starting to Make the Bridge to the Future – New Funding Initiatives
• COMPASS SciDAC project (3 years): $2.2M/year
• US Lattice QCD project (5 years): ~$5M/year
• Geant4 parallelization – joint with ASCR (2 years): $1M/year
• CMS on HPC machines (1 year): $150k
• PDACS – galaxy simulation portal – joint with Argonne (1 year): $250k
• Science Framework for DES (1 year): $150k
• Tevatron data preservation (2 years): $350k/year
• Partnering with NSF through OSG
• Will be aggressive in upcoming data and knowledge discovery opportunities at DOE
Geant4
• Workshop held between HEP and ASCR
  • Discussed how to transform Geant4 to run efficiently on modern and future multi-core computers and hybrids
  • Workshop chairs were Robert Lucas (USC) and RR
• Funded at $1M/year for 2 years
• Here: algorithmic development to be able to utilize multi-core architectures, and porting sections of Geant4 to GPUs (see the sketch below)
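To illustrate the general idea of event-level parallelism being pursued here, a minimal sketch follows. This is not Geant4 code; simulate_event() is a hypothetical stand-in for a full per-event detector simulation, used only to show how independent events can be farmed out across cores.

```python
# Minimal sketch of event-level parallelism, the simplest way to exploit
# multi-core hardware for detector simulation. NOT Geant4 code;
# simulate_event() is a hypothetical stand-in for a real simulation.
from multiprocessing import Pool
import random

def simulate_event(seed):
    """Pretend per-event simulation: accumulate fake energy deposits."""
    rng = random.Random(seed)
    deposited = 0.0
    for _ in range(1000):                  # 1000 fake tracking steps
        deposited += rng.expovariate(1.0)  # fake energy deposit per step
    return deposited

if __name__ == "__main__":
    n_events = 64
    # Events are independent, so they can be distributed across all cores.
    with Pool() as pool:
        results = pool.map(simulate_event, range(n_events))
    print(f"simulated {n_events} events, mean deposit = {sum(results)/n_events:.1f}")
```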
CMS
• CMS would like to maintain the current trigger thresholds for the 2015 run to allow full Higgs characterization
  • Thus the nominal 350 Hz output would increase to ~1 kHz
• Computing budgets are expected to remain constant, not grow
• Need to take advantage of leadership-class computing facilities
• Need to incorporate more parallelism into the software
• Algorithms need to be more efficient (faster)
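A rough way to see the size of the challenge implied by these numbers (a back-of-the-envelope sketch; the ~3x factor is derived only from the rates quoted above):

```python
# Back-of-the-envelope: how much faster must CMS software become if the
# trigger output rate roughly triples while the computing budget stays flat?
trigger_rate_2012 = 350.0    # Hz, nominal output rate quoted above
trigger_rate_2015 = 1000.0   # Hz, ~1 kHz target for the 2015 run

rate_increase = trigger_rate_2015 / trigger_rate_2012
print(f"event rate grows by ~{rate_increase:.1f}x")

# With a flat budget, hardware capacity is (to first order) fixed, so the
# per-event processing cost must shrink by the same factor -- via faster
# algorithms, more parallelism, and leadership-class facilities.
print(f"per-event CPU cost must drop by ~{rate_increase:.1f}x")
```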
PDACS
• Portal for Data Analysis Services for Cosmological Simulations
• Joint project with Argonne, Fermilab, and NERSC; Salman Habib (Argonne) is the PI
• A cosmological data/analysis service at scale – a workflow management system
• Portal based on the one used for computational biology – the idea is to facilitate the analysis/simulation effort for those not familiar with advanced computing techniques

[Figure: dark energy and matter, cosmic gas, galaxies – simulations connect fundamentals with observables]
Data Archival Facility
• Would like to offer archive facilities for the broader community
• Will require work on front ends to simplify use for non-HEP users
• Have had discussions with IceCube

[Photo: one of seven 10k-slot tape robots at FNAL]
CMS Tier-1 at Fermilab
• The CMS Tier-1 facility at Fermilab, and the experienced team who operate it, enable CMS to reprocess data quickly and to distribute the data reliably to the user community around the world
• We lead US CMS, and CMS overall, in software and computing
• Fermilab also operates:
  • LHC Physics Center (LPC)
  • Remote Operations Center
  • U.S. CMS Analysis Facility
Intensity Frontier Strategy
• Common approaches and solutions are essential to support this broad range of experiments with limited SCD staff. Examples include artdaq, art, SAM IF, LArSoft, Jobsub, …
• SCD has established liaisons between ourselves and the experiments to ensure communication and to understand needs/requirements
• Completing the process of establishing MOUs between SCD and each experiment to clarify our roles and responsibilities
Intensity Frontier Strategy – 2
• A shared analysis facility where we can quickly and flexibly allocate computing to experiments
• Continue to work to "grid-enable" the simulation and processing software
  • Good success with MINOS, MINERvA and Mu2e
• All experiments use shared storage services – for data and local disk – so we can allocate resources when needed
• The perception that the intensity frontier will not be computing-intensive is wrong
artdaq Introduction
artdaq is a toolkit for creating data acquisition systems that run on commodity servers.
• It is integrated with the art event reconstruction and analysis framework for event filtering and data compression.
• It provides data transfer, event building, process management, system and process state behavior, control messaging, message logging, infrastructure for DAQ process and art module configuration, and writing of data to disk in ROOT format.
• The goal is to provide the common, reusable components of a DAQ system and allow experimenters to focus on the experiment-specific parts: the software that reads out and configures the experiment-specific front-end hardware, the analysis modules that run inside of art, and the online data-quality monitoring modules.
• As part of our work in building the DAQ software systems for upcoming experiments such as Mu2e and DarkSide-50, we will be adding more features.
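As a rough illustration of the event-building role described above, a minimal sketch follows. This is not the artdaq C++ API; the Fragment and EventBuilder names and fields are invented for illustration of the general idea that per-event fragments from several front ends are assembled into complete events.

```python
# Illustrative sketch of the event-building idea behind a DAQ toolkit:
# several front-end readers emit per-event "fragments", and an event builder
# assembles a complete event once every fragment source has reported in.
# NOT artdaq code; classes and field names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Fragment:
    event_id: int     # which event this piece of data belongs to
    source_id: int    # which front-end board/computer produced it
    payload: bytes    # raw detector data

class EventBuilder:
    def __init__(self, n_sources):
        self.n_sources = n_sources
        self.pending = defaultdict(dict)   # event_id -> {source_id: Fragment}

    def add_fragment(self, frag):
        """Store a fragment; return the complete event when all sources arrive."""
        self.pending[frag.event_id][frag.source_id] = frag
        if len(self.pending[frag.event_id]) == self.n_sources:
            return self.pending.pop(frag.event_id)   # complete event
        return None

# Toy usage: two sources, two events, fragments arriving out of order.
builder = EventBuilder(n_sources=2)
for frag in [Fragment(1, 0, b"a"), Fragment(2, 0, b"c"),
             Fragment(1, 1, b"b"), Fragment(2, 1, b"d")]:
    event = builder.add_fragment(frag)
    if event is not None:
        print("built event", frag.event_id, "with", len(event), "fragments")
```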
artdaq Introduction – 2
We are currently working with the DarkSide-50 collaboration to develop and deploy their DAQ system using artdaq.
• The DS-50 DAQ reads out ~15 commercial VME modules into four front-end computers using commercial PCIe cards and transfers the data to five event-builder and analysis computers over a QDR InfiniBand network.
• The maximum data rate through the system will be 500 MB/s, and we have achieved a data compression factor of five.
• The DAQ system is being commissioned at LNGS, and it is being used to collect data and monitor the performance of the detector as it is commissioned. (plots of phototube response?)
artdaq will be used for the Mu2e DAQ, and we are working toward a demonstration system which reads data from the candidate commercial PCIe cards, builds complete events, runs sample analysis modules, and writes the data to disk for later analysis.
• The Mu2e system will have 48 readout links from the detector into commercial PCIe cards, and the data rate into the PCIe cards will be ~30 GB/s. Event fragments will be sent to 48 commodity servers over a high-speed network, and the online filtering algorithms will run in the commodity servers (see the scale estimates below).
• We will develop the experiment-specific artdaq components as part of creating the demonstration system, and this system will be used to validate the performance of the baseline design in preparation for the CD review early next year.
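A quick sanity check on the scale of these two systems, using only the numbers quoted above (the per-link and to-disk figures below are simple derived estimates, not quoted project numbers):

```python
# Rough scale estimates derived from the rates quoted above.
# DarkSide-50: 500 MB/s through the system, compression factor of 5.
ds50_input_MBps = 500.0
ds50_compression = 5.0
print(f"DS-50 estimated rate to disk: ~{ds50_input_MBps / ds50_compression:.0f} MB/s")

# Mu2e demonstration target: ~30 GB/s into PCIe cards over 48 readout links,
# with fragments spread across 48 commodity servers.
mu2e_input_GBps = 30.0
mu2e_links = 48
per_link_MBps = mu2e_input_GBps * 1000.0 / mu2e_links
print(f"Mu2e average rate per readout link / per server: ~{per_link_MBps:.0f} MB/s")
```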
Cosmic Frontier
• Continue to curate data for SDSS
• Support data and processing for Auger, CDMS and COUPP
• Will maintain an archive copy of the DES data and provide modest analysis facilities for Fermilab DES scientists
  • Data management is an NCSA (NSF) responsibility
  • Helping NCSA by "wrapping" the science codes needed for second light while NCSA completes its framework
• DES uses Open Science Grid resources opportunistically and will make heavy use of NERSC
• Writing a Science Framework for DES – hope to extend it to LSST
• DarkSide-50 is writing their DAQ system using artdaq
Tevatron (Data) Knowledge Preservation
• Maintaining full analysis capability for the next few years, though building software to get away from custom systems
• Successful FWP funded: hired two domain-knowledgeable scientists to lead the preservation effort on each experiment (plus 5 FTE of SCD effort)
• Knowledge preservation – need to plan and execute the following:
  • Preserve analysis notes, electronic logs, etc.
  • Document how to do an analysis well
  • Document sample analyses as cross-checks
  • Understand job submission, database, and data-handling issues
  • Investigate/pursue virtualization
• Try to keep the CDF/D0 strategies in sync and leverage common resources/solutions
Synergia at Fermilab
• Synergia is an accelerator simulation package combining collective effects and nonlinear optics
• Developed at Fermilab, partially funded by SciDAC
• Synergia utilizes state-of-the-art physics and computer science
  • Physics: state of the art in collective effects and optics simultaneously
  • Computer science: scales from desktops to supercomputers
  • Efficient running on 100k+ cores
  • Best practices: test suite, unit tests
• Synergia is being used to model multiple Fermilab machines
  • Main Injector for Project X and Recycler for ANU
  • Mu2e: resonant extraction from the Debuncher
  • Booster instabilities and injection losses

[Figure: weak scaling to 131,072 cores]
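For reference, weak-scaling efficiency (the metric behind the 131,072-core result shown on this slide) compares the runtime at N cores, with the problem size grown in proportion, to a baseline runtime. A small sketch with made-up timings, not Synergia measurements:

```python
# Weak-scaling efficiency: grow the problem size in proportion to the core
# count, then compare the wall time to the small-scale baseline.
# The timings below are invented purely to illustrate the calculation.
baseline_cores, baseline_time_s = 128, 100.0
runs = {1024: 104.0, 16384: 110.0, 131072: 125.0}   # cores -> wall time (s)

for cores, time_s in runs.items():
    efficiency = baseline_time_s / time_s            # 1.0 means perfect weak scaling
    print(f"{cores:>7d} cores: weak-scaling efficiency = {efficiency:.2f}")
```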
Synergia Collaboration with CERN for LHC Injector Upgrades
• CERN has asked us to join an informal collaboration to model space charge in the CERN injector accelerators
• Currently engaged in a benchmarking exercise; current status reviewed at the Space Charge 2013 workshop at CERN
• Most detailed benchmark of PIC space-charge codes to date, using both data and analytic models
• Breaking new ground in accelerator simulation
  • Synergia has emerged as the leader in fidelity and performance
  • PTC-Orbit has been shown to have problems reproducing individual particle tunes: Synergia results are smooth and stable, while PTC-Orbit displays noise and growth over time

[Figures: individual particle tune vs. initial position; phase space showing the trapping benchmark (Synergia)]
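As background on the quantity being compared: a particle's tune is its betatron oscillation frequency per turn, commonly estimated from turn-by-turn tracking output with an FFT. A minimal sketch on synthetic data (not Synergia or PTC-Orbit output; the tune value and noise level are invented):

```python
# Estimate a single particle's tune from turn-by-turn position data via FFT.
# Synthetic data (a fake tune of 0.31 plus noise), purely to illustrate the
# standard technique used when benchmarking space-charge codes.
import numpy as np

n_turns = 1024
true_tune = 0.31
turns = np.arange(n_turns)
x = np.cos(2 * np.pi * true_tune * turns) + 0.01 * np.random.randn(n_turns)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n_turns, d=1.0)               # frequencies in units of 1/turn
measured_tune = freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC bin
print(f"measured tune = {measured_tune:.4f} (true {true_tune})")
```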
SCD Has More Work Than Human Resources
• Insufficiently staffed at the moment for:
  • Improving event generators – especially for the Intensity Frontier
  • Modeling of neutrino beamlines/targets
  • Simulation effort – all IF experiments want more resources, both technical and analysis
  • Muon Collider simulation – both accelerator and detector
  • R&D in software-defined networks
Closing Remarks
• SCD has transitioned to fully support the Intensity Frontier
• We also have a number of projects underway to prepare for the paradigm shift in computing
• We are short-handed and are having to make choices
MKIDs (Microwave Kinetic Inductance Devices)
• Pixelated array of micro-sized resonators
• Superconducting sensors with a meV energy gap – not only a single-photon detector:
  • Theoretically allow an energy resolution (E/ΔE) of about 100 in the visible and near-infrared
  • Best candidate to provide medium-resolution spectroscopy of >1 billion galaxies, QSOs and other objects from LSST data if the energy resolution is improved to 80 or better (currently ~16). Note that scanning that number of galaxies is beyond the reach of current fiber-based spectrometers.
  • An MKID array of 100,000 pixels would be enough to obtain medium-resolution spectroscopic information for all LSST galaxies up to magnitude 24.5
• High bandwidth: allows filtering of atmospheric fluctuations at ~100 Hz or faster
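To give a feel for what those resolving powers mean, here is a simple derived illustration; the 500 nm reference wavelength is chosen for the example and is not taken from the slide. For photons, E/ΔE equals λ/Δλ:

```python
# Translate the MKID resolving powers quoted above into wavelength resolution
# at a reference optical wavelength. The 500 nm choice is illustrative only.
reference_wavelength_nm = 500.0
for label, resolving_power in [("current", 16), ("target", 80), ("theoretical", 100)]:
    # For photons E = hc/lambda, so E/dE = lambda/dlambda.
    dlambda = reference_wavelength_nm / resolving_power
    print(f"{label:>11s} R = {resolving_power:>3d} -> ~{dlambda:.1f} nm resolution at 500 nm")
```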
Multi-10K-Pixel Instrument and Science with MKIDs
• PPD and SCD have teamed up to build an instrument with between 10K and 100K pixels
• External collaborators: UCSB (Ben Mazin, Giga-Z), ANL, U. Michigan
• Potential collaboration: strong coupling with the next CMB instrument proposed by John Carlstrom (U. Chicago) and Clarence Chang (ANL), which also requires the same DAQ readout electronics
• Steve Heathcote, director of the SOAR telescope, Cerro Tololo, has expressed interest in hosting the MKID R&D instrument in 2016 (ref. Steve Heathcote letter to Juan Estrada, FNAL)
• SOAR telescope operations in late 2016:
  • 10 nights x 10 hours/night
  • Would give a limiting magnitude of ~25
  • Potential science (under consideration): photometric-redshift calibration for DES, clusters of galaxies, supernova host-galaxy redshifts, strong lensing
• SCD/ESE will design the DAQ for up to a 100K-pixel instrument:
  • 1000 to 2000 MKIDs per RF feedline, 50 feedlines
  • Input bandwidth: 400 GB/s
  • Triggerless DAQ
  • Data reduction: ~200 MB/s to storage
  • Digital signal processing on FPGAs, GPUs, processors, etc.
• Status:
  • Adiabatic demagnetization refrigerator (ADR) functioning at SiDet
  • Tests of low-noise electronics underway
  • MKID testing to start this summer
  • Electronics system design underway
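A quick consistency check on the DAQ parameters above (simple derived estimates; the per-feedline bandwidth and reduction factor are not quoted on the slide):

```python
# Derived estimates from the MKID DAQ parameters quoted above.
pixels_per_feedline = (1000, 2000)
n_feedlines = 50
input_bandwidth_GBps = 400.0
output_MBps = 200.0

channels = tuple(n * n_feedlines for n in pixels_per_feedline)
print(f"total channels: {channels[0]:,} to {channels[1]:,}")       # 50k to 100k pixels

per_feedline_GBps = input_bandwidth_GBps / n_feedlines
print(f"input bandwidth per feedline: ~{per_feedline_GBps:.0f} GB/s")

reduction = input_bandwidth_GBps * 1000.0 / output_MBps
print(f"online data reduction factor: ~{reduction:.0f}x")
```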
Open Science Grid (OSG)
• The US contribution and partnership with the LHC Computing Grid is provided through OSG for CMS and ATLAS
• The Open Science Grid advances science through open distributed computing. OSG is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.
• A total of 95 sites; ½ million jobs a day; 1 million CPU-hours/day; 1 million files transferred/day
• It is cost effective, it promotes collaboration, and it is working!
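A rough sense of scale from the figures above (simple derived averages, not OSG-reported metrics):

```python
# Simple averages derived from the OSG throughput figures quoted above.
jobs_per_day = 500_000
cpu_hours_per_day = 1_000_000

avg_cpu_hours_per_job = cpu_hours_per_day / jobs_per_day
print(f"average job length: ~{avg_cpu_hours_per_job:.0f} CPU-hours")

# 1 million CPU-hours delivered every 24 hours corresponds to this many
# cores kept busy around the clock, on average.
equivalent_busy_cores = cpu_hours_per_day / 24
print(f"equivalent continuously busy cores: ~{equivalent_busy_cores:,.0f}")
```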