Level 3 Review: Short Term, NT Support, Long Term
The Level 3 Group: Michael Clements, Doug Chapin, Dave Cutts, Andy Haas, Sean Mattingly, Gordon Watts

Short Term
Goal: Build an L3 filter that can handle the data rates between now and September 1 (or the start of the Linux farm). Bug fixes too.
Requires
• Faster build/release system
• Verification (minimal)
• Better filter-author access to the NT development environment & experts

Faster Build/Release System
New Machine
• Initial tests indicate a 2x speed-up.
• Multiple logins allowed (flexibility).
Status
• Final testing of the release system scripts
• Porting of packages to NT5 (we know what has to be changed)
• Commit to a schedule
We do not think there is any real development work left here.

Faster Build/Release System
Improve Build System (Bug Fixes?)
• Initial experiments indicate ctest_nt is 10x faster than gmake.
• A recent upgrade allows one to run tests inline.
Status
• Must be incorporated into the release structure.
• How do we deal with "legacy" packages that don't support the CTEST interface?
Likely 1 week of development work here, then some amount of integration time. Initially the L3 group, then the L3 group & release managers.

NT5 Port
Status
• Done
• Changes must be fed back into CVS
  • Some have been (SSS)
  • Mostly script changes
  • Fewer than 10 individual changes
• All scripts
  • Full build system must be tested
  • Most complex parts have been tested (bldpara)
  • Initial scripts, etc.
Less than a week of work for the L3 Group and release managers. Simple author updates.

Verification
Verification System: ~7000 events/hour
• The old build system
• More effective use of the 4 CPUs
• Have money for extra disk
• Keep local copies of files
  • Initially by hand, later by script
Status
• Have to use the build systems until verification is ready
• Just starting (ftp doesn't like file names longer than 256 characters!)
2 weeks of development and setup time. Requires both the L3 Group and the Filters Group.

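A quick sanity check on the verification rate. This sketch assumes the ~7000 events/hour figure is the aggregate over all 4 CPUs of the old build machine (the slide does not say so explicitly), and borrows one day's worth of data (~0.86 million events, from the "What Can One CPU Do?" slide) as a hypothetical sample size. The function names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the verification system.
# ASSUMPTION: EVENTS_PER_HOUR is the total over all N_CPUS, not per CPU.

EVENTS_PER_HOUR = 7000   # approximate verification rate quoted on the slide
N_CPUS = 4               # CPUs on the old build machine


def rates(events_per_hour: float, n_cpus: int) -> tuple[float, float]:
    """Return (events/sec for the whole machine, seconds per event per CPU)."""
    machine_rate = events_per_hour / 3600.0
    per_cpu_rate = machine_rate / n_cpus
    return machine_rate, 1.0 / per_cpu_rate


def hours_to_verify(n_events: int, events_per_hour: float) -> float:
    """Wall-clock hours needed to push n_events through at the given rate."""
    return n_events / events_per_hour


if __name__ == "__main__":
    machine_rate, sec_per_event = rates(EVENTS_PER_HOUR, N_CPUS)
    print(f"{machine_rate:.2f} events/s total, {sec_per_event:.2f} s/event per CPU")
    # Hypothetical sample: one day's worth of data (~0.86 million events).
    print(f"{hours_to_verify(864_000, EVENTS_PER_HOUR):.0f} hours for one day's worth")
```

Under these assumptions the old machine needs roughly 123 hours, about five days, to verify one day's worth of data.
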
Better Development
Author Node
• Maintained environment for users to develop code
• No more installs on one's own machine
• Fast machine
Go to Central Processor Model
• Install is just too complex
• All platforms, particularly NT
Status
• One is up and running. Configuration complete?
• Unix access
• Second one ordered
• $$ left over for a third and possibly a fourth
• So cheap now!
No real work left (one works, the others will).

Code State
State of Code
• Minimal number of changes to get nt41 to build
• Looks like nt43 went!
• Test-script work
  • Package authors should translate filename locations using cygwin
  • About half already do
Status
• New tools coming in (CPS, Tracking)
• Most errors are understood by us
• What can we do to help get them fixed? Feedback loop?
Figuring out the error requires very little time.

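The slide asks package authors to translate filename locations with cygwin. The standard Cygwin tool for this is `cygpath` (`cygpath -w` converts to Windows form, `cygpath -u` to POSIX form), and calling it is the safer choice in real test scripts because it honors the Cygwin mount table. As a hedged illustration of what that translation does, here is a hypothetical pure-Python sketch that handles only the `/cygdrive/<drive>` prefix; `cygwin_to_windows` and `windows_to_cygwin` are illustrative names, not an existing API.

```python
import re

# Matches /cygdrive/<letter>, optionally followed by /rest-of-path.
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(?:/(.*))?$")
# Matches a drive-letter path such as C:\dir\file or C:/dir/file.
_DRIVE = re.compile(r"^([A-Za-z]):[\\/]?(.*)$")


def cygwin_to_windows(path: str) -> str:
    """Translate /cygdrive/x/a/b to X:\\a\\b.

    Only the /cygdrive prefix is handled; other POSIX paths depend on the
    Cygwin mount table, so they are rejected instead of guessed.
    """
    m = _CYGDRIVE.match(path)
    if m is None:
        raise ValueError(f"not a /cygdrive path: {path!r}")
    drive, rest = m.group(1).upper(), m.group(2) or ""
    return f"{drive}:\\" + rest.replace("/", "\\")


def windows_to_cygwin(path: str) -> str:
    """Inverse mapping: X:\\a\\b becomes /cygdrive/x/a/b."""
    m = _DRIVE.match(path)
    if m is None:
        raise ValueError(f"not a drive-letter path: {path!r}")
    drive, rest = m.group(1).lower(), m.group(2).replace("\\", "/")
    return f"/cygdrive/{drive}" + (f"/{rest}" if rest else "")


if __name__ == "__main__":
    print(cygwin_to_windows("/cygdrive/c/d0/l3/test.dat"))
    print(windows_to_cygwin("C:\\d0\\l3"))
```

Rejecting non-`/cygdrive` paths keeps the sketch from silently producing a wrong location for paths such as `/usr/bin`, whose Windows location depends on the installation.
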
Expert Access
Contacting Us?
• Since last week there has been a flood
  • Almost too much for us to keep up with!
• Most questions are one-time fixes
  • Once fixed, they will not pop up again
• Quite gratified with the communication
• SMT tracking, jet finding, real filtering in the control room!
  • Many, many people contributed to this!
Always room for improvement; ears & inbox open.

Long Term
Goal: A maintainable and smooth process for building, developing, etc. L3 executables. Complete verification.
Requires
• Faster build/release system
• Verification
• Better filter-author access to the NT development environment & experts

Build System Improvements
• Recommend moving away from SRT_D0 no matter what
  • Horribly inefficient build system
• On NT, move away from complex cygwin dependencies
  • Get rid of gmake if at all possible
• Base work on ctest_nt
  • Fast, already well supported by the L3 Group
Schedule
• Proof of principle by July 1
• First production by August 1
• Improvements through fall
0.5 FTEs for the duration

Build System
Maintain It
• Once it is stable, deliberately take the view that it need not be modified often
• L3 filters have a much more restrictive environment than offline packages
• Few extra features for L3; do not compete with all of L3
• Fewer changes, more stability

Doing Builds
Goal
• Make the build system boring
  • Close to Ferbelizing it
  • Remove the hand art
• Will still need someone to do builds continuously
  • Target: 0.25 FTE!
0.5 down to 0.25 FTEs for the first year. Decreases until the Run 2b upgrade due to reduced changes??

Verification
What Can One CPU Do? (Email from Amber)
• Same speed as the L3 Farm
• Not memory limited, etc.
• 100 ms per event
• 0.25 MB/event
L1/L2 Issues
• 0.86 million events/day
• 2.5 MB/sec
• 210 Gig/day
Bug Fixes
• Approximately one day's worth of processing
• 2-4 nodes, dual CPU, data local for speed
Or, use the production system…

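The per-CPU numbers above follow from two inputs: 100 ms per event gives 10 events/s, hence 864,000 events/day and 2.5 MB/sec at 0.25 MB/event. This sketch reproduces them and, as a hedged extension, estimates wall-clock time for bug-fix verification (one day's worth) on 2-4 dual-CPU nodes. It assumes "Gig" means 1024 MB (which reproduces the ~210 figure) and perfect parallelism with data local, ignoring I/O; `bug_fix_wall_hours` is a hypothetical helper.

```python
SEC_PER_EVENT = 0.100    # 100 ms per event (slide)
MB_PER_EVENT = 0.25      # 0.25 MB/event (slide)
SECONDS_PER_DAY = 86_400


def one_cpu_budget(sec_per_event: float, mb_per_event: float) -> dict[str, float]:
    """Daily event count, bandwidth, and data volume for a single CPU."""
    events_per_sec = 1.0 / sec_per_event
    events_per_day = events_per_sec * SECONDS_PER_DAY
    mb_per_sec = events_per_sec * mb_per_event
    # ASSUMPTION: "Gig" = 1024 MB, which reproduces the slide's ~210 figure.
    gig_per_day = events_per_day * mb_per_event / 1024.0
    return {
        "events_per_day": events_per_day,
        "mb_per_sec": mb_per_sec,
        "gig_per_day": gig_per_day,
    }


def bug_fix_wall_hours(events: float, nodes: int, cpus_per_node: int = 2,
                       sec_per_event: float = SEC_PER_EVENT) -> float:
    """Ideal wall-clock hours with events split evenly over all CPUs."""
    total_cpus = nodes * cpus_per_node
    return events * sec_per_event / total_cpus / 3600.0


if __name__ == "__main__":
    budget = one_cpu_budget(SEC_PER_EVENT, MB_PER_EVENT)
    for key, value in budget.items():
        print(f"{key}: {value:,.1f}")
    day = budget["events_per_day"]
    for nodes in (2, 4):
        print(f"{nodes} dual-CPU nodes: {bug_fix_wall_hours(day, nodes):.1f} h")
```

One day's worth is 24 CPU-hours, so under these assumptions a bug-fix verification takes about 6 hours on 2 dual-CPU nodes and about 3 hours on 4.
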
Verification
Production Release
• Order of magnitude larger
• 10-20 million events
• Terabyte of disk space ($2K?)
• SAM integration

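To size the production-release case, this sketch applies the 100 ms/event figure from the "What Can One CPU Do?" slide to 10-20 million events. Running it on the same 2-4 dual-CPU nodes (4-8 CPUs) proposed for bug fixes is an assumption; the slide does not say which hardware would be used. Scaling is ideal and I/O is ignored.

```python
SEC_PER_EVENT = 0.100  # 100 ms per event, from the one-CPU slide


def cpu_hours(n_events: int, sec_per_event: float = SEC_PER_EVENT) -> float:
    """Total CPU time needed, in hours."""
    return n_events * sec_per_event / 3600.0


def wall_hours(n_events: int, n_cpus: int,
               sec_per_event: float = SEC_PER_EVENT) -> float:
    """Ideal wall-clock hours with the sample split evenly over n_cpus."""
    return cpu_hours(n_events, sec_per_event) / n_cpus


if __name__ == "__main__":
    for n_events in (10_000_000, 20_000_000):
        print(f"{n_events:,} events: {cpu_hours(n_events):.0f} CPU-hours")
        for n_cpus in (4, 8):  # ASSUMPTION: 2 or 4 dual-CPU nodes
            print(f"  on {n_cpus} CPUs: {wall_hours(n_events, n_cpus):.1f} h")
```

That is roughly 280-560 CPU-hours, or about 35 to 140 hours of wall-clock time depending on sample size and node count, which is why the slide's "order of magnitude larger" matters for planning.
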
NT Environment
Author Nodes
• Just implemented
• Undoubtedly will need refinement
• Increase the number of nodes depending upon demand
• Each node is very cheap (L3 group)
0.1 FTEs, but in clumps
Maintain Author Nodes
0.1 FTEs for the life of the experiment (L3 group & Rel Mgrs?)

Software Changes
OS Upgrades
• MS's slavish commitment to backwards compatibility has some benefits
• Level 3 is a conservative system
• We expect OS upgrades about once every two years
Testing: 0.05 FTE (based on NT4->NT5 experience)
Implementing: 0.05 FTE
Some of this is true no matter what we do.

Farm Management
OS, general management
• Automatic tools (AutoStart, etc.)
• Minimal security updates (default-deny ACLs)
0.05 FTEs (L3 Group)
General Management
• Configuration changes, etc.
• All based on the web
True no matter which scheme we pick
0.05 FTEs (L3 Group)

Expert Access
Brown & UW Commitment
We will maintain the Level 3 Trigger/DAQ, whatever its form, while the experiment takes data!
Providing NT expertise to the experiment is part of this commitment!
0.1 FTE, in clumps (L3 Group)

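The long-term effort estimates are spread over several slides. This sketch simply adds them up. The split into "steady state" and "development period" is our reading, not the slides': "in clumps" items are treated as if averaged over time, builds are taken at the 0.25 FTE target after the first year and at 0.5 FTE during development, and the 0.5 FTE build-system work is counted only for its duration.

```python
# Effort estimates (FTE) collected from the slides.
# ASSUMPTION: grouping and averaging of "in clumps" items are ours.
STEADY_STATE = {
    "doing builds (target after first year)": 0.25,
    "author nodes, refinement (in clumps)": 0.10,
    "maintain author nodes": 0.10,
    "OS upgrades, testing": 0.05,
    "OS upgrades, implementing": 0.05,
    "farm OS management": 0.05,
    "farm general management": 0.05,
    "expert access (in clumps)": 0.10,
}

# Extra effort on top of steady state while the new build system is developed.
DEVELOPMENT_EXTRA = {
    "build system improvements (through fall)": 0.50,
    "doing builds at 0.5 instead of 0.25 FTE": 0.25,
}


def total(items: dict[str, float]) -> float:
    """Sum the estimates, rounded to two decimals to hide float noise."""
    return round(sum(items.values()), 2)


if __name__ == "__main__":
    steady = total(STEADY_STATE)
    print(f"steady state: {steady:.2f} FTE")
    print(f"development period: {steady + total(DEVELOPMENT_EXTRA):.2f} FTE")
```

Under this reading the L3 group's ongoing load is about 0.75 FTE, roughly doubling to 1.5 FTE during the build-system development period.
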
Where are we now?
Releases
• nt44 is an excellent release
• Small set of filters in the release
• Several more poised to come into the release
Build System
• Works, but slow
• ctest_nt improvements with an eye to release building
• Close to ready to move to the build machine
Verification
• Ignoring L3 Farm issues
• Just starting
• Needs our effort
• Will have to occur no matter what we do

Doing Both
[Timeline]
• Now: Linux L4 development work
• June 1: Minimal set of filters
• Sept 1: Debugged filters + unpacking
• Sept 1: Linux L4 farm initially ready

Option 2: Lots of CPU
[Diagram: four L3 Node / Linux Node pairs; other blocks labeled Front End Crates, SB, Feynman (SAM), Online, C/R, Examines, MCH, FCH]

Possible 3rd Alternative
Use Level 3 as a pre-filter/pipeline
• Raw bandwidth: 5 mb/sec (GB already?)
• Unpacked bandwidth/node: 20 mb/sec
• Simple filters and tools: 5x reduction, 200 Hz out
• Unpacked bandwidth/node after filtering: 4 mb/sec
• Simple, static
• Make use of idle CPU power
• Maintain what you'll have by the end of September

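The pre-filter numbers hang together as a simple division: a 5x reduction takes 20 mb/sec of unpacked data per node down to 4 mb/sec. This sketch also infers the input rate implied by "200 Hz out", assuming the 5x reduction applies to the event rate as well as to bandwidth (i.e. the average event size is unchanged by filtering). Units are kept as written on the slide; `prefilter` is a hypothetical helper.

```python
UNPACKED_BW_IN = 20.0   # unpacked bandwidth per node, mb/sec (slide)
REDUCTION = 5.0         # "x5 reduction" from simple filters and tools (slide)
RATE_OUT_HZ = 200.0     # output rate (slide)


def prefilter(unpacked_in: float, reduction: float, rate_out: float) -> dict[str, float]:
    """Bandwidth passed downstream and the implied input event rate.

    ASSUMPTION: the reduction factor applies equally to bandwidth and rate.
    """
    return {
        "unpacked_out": unpacked_in / reduction,
        "rate_in_hz": rate_out * reduction,
    }


if __name__ == "__main__":
    result = prefilter(UNPACKED_BW_IN, REDUCTION, RATE_OUT_HZ)
    print(f"unpacked out per node: {result['unpacked_out']:.0f} mb/sec")
    print(f"implied input rate: {result['rate_in_hz']:.0f} Hz")
```

Under that assumption, 200 Hz out corresponds to about 1000 Hz into the pre-filter stage.
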
Conclusions
• Short-term effort
  • Gives a well-functioning system by the end of summer/fall
• Long term
  • Supportable and maintainable
• What is missing?
  • Better understanding of verification
  • If we go the Linux route, we have to support both; not thought through
• Third alternative
  • Makes use of available CPU
  • Preserves the work already invested