Network Move & Upgrade 2008/2009: October 2008
Les Cottrell, SLAC, for the SCCS core services network group (Antonio Ceseracciu, Jared Greeno, Yee Ting Li, Gary Buhrmaster). Presented at the OU Admin Group Meeting, October 16, 2008. www.slac.stanford.edu/grp/scs/net/racks/netmove-oct08.ppt
Why move
• ~70 building switches are connected to the old core switch, which has to “move” to a seismically retrofitted area
• While at it, replace old, beyond-end-of-life, limited-capability switches to provide better service
Move Types
• Already done: Kavli, MCC, LCLS, SSRL (70 switches, 17 need replacing, will probably need to re-address later, SSRL decision)
• Migrate: switch beyond end of life, features missing (auto-negotiation, higher speeds) = replace switch, connect to new core, re-address hosts
  • (CGB1), TL1, (WHS), CLA1, CLA2, 280, CL1..2, B267, CGB3
• Move - 1: switch OK = use same switch but connect to temporary core switch, re-address later (after April 15th 2009)
  • B214, B031, B210, B005, B275, B279, CLR113, CLR224, CLR343, HFB1, HFB2, MCC-CORE1..2, MCC-WAPCORE1..2, ROB
  • Research Yard: SWH-RY, B062, B104A, B113, B121, B124, B128, B211, B225, B231, B420
• Move - 2: switch beyond end of life etc., but not central responsibility to upgrade = connect to temporary core switch
  • Guest House has 2, PEP ring has 4 but the ring is de-commissioned at the moment
• Move - 3: switch shares trunk cable, requires a long (2-day) workday outage or overtime (cost depends on which cables have to move etc.; estimated at probably $5K for 2 technicians for 2 days)
  • Guest House 1 & 2, ESA, CRYO, IR12, CGB2 (1 day)
• Will send an email to OU Admins with a heads-up so they can contact and warn users, get an account if non-working hours are needed, and schedule.
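The move types above amount to a small decision rule, which can be sketched in Python. This is only an illustration: the `Switch` fields and the order in which the criteria are checked are assumptions, not something the slides specify (the Guest House switches, for example, appear under both Move - 2 and Move - 3, so the categories are not strictly exclusive).

```python
from dataclasses import dataclass
from enum import Enum


class MoveType(Enum):
    ALREADY_DONE = "Already done"
    MIGRATE = "Migrate"
    MOVE_1 = "Move - 1"
    MOVE_2 = "Move - 2"
    MOVE_3 = "Move - 3"


@dataclass
class Switch:
    # All field names are hypothetical; they paraphrase the slide's criteria.
    name: str
    already_moved: bool = False
    beyond_end_of_life: bool = False
    centrally_upgraded: bool = True   # central group is responsible for upgrades
    shares_trunk_cable: bool = False


def classify(sw: Switch) -> MoveType:
    """Pick a move type; the precedence of the checks is an assumption."""
    if sw.already_moved:
        return MoveType.ALREADY_DONE
    if sw.shares_trunk_cable:
        # Needs the long outage regardless of the switch's age.
        return MoveType.MOVE_3
    if sw.beyond_end_of_life:
        # Replace and re-address only if the upgrade is a central responsibility.
        return MoveType.MIGRATE if sw.centrally_upgraded else MoveType.MOVE_2
    # Switch is fine: reconnect to the temporary core, re-address later.
    return MoveType.MOVE_1


if __name__ == "__main__":
    example = Switch("example-sw", beyond_end_of_life=True)
    print(example.name, "->", classify(example).value)
```

Checking the trunk-cable case first reflects the idea that the outage requirement dominates scheduling; a real inventory might instead record more than one category per switch.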
Long Outage Switches
• Contact users and group leaders to see whether they can take the outage in normal work hours or get an account for overtime (could be $5K); schedule the outage
• ESA (21 hosts): Tyler Adams (11), Nicholas Arias (2), Rafael Gomez (5), Zen Szalata (3)
• Cryo (7 hosts): Agustin Burgos (5), Tom Galeto (2)
• IR12 (4 hosts): Tala Cadorna (1), Raymond Lo (3)
• www.slac.stanford.edu/grp/scs/net/racks/slaconly/switches/ gives details of hosts on switches
Experience with Moves
• Moves are easy:
  • Each building switch has two fibre pairs (for redundancy) to two old core routers on B050 floor 2
  • Prepare a port in each of the 2 temporary (probably ~1 year) switches in the seismically retrofitted area
  • Identify pairs and prepare jumpers
  • Move backup pair to backup temporary switch
  • Move primary pair to primary temporary switch
  • Two ~5 second outages, users unlikely to notice
• No need for detailed coordination with OU admins or users; can do it whenever we get to it
• Could publish a schedule in future to all OU admins, but that would require more effort and scheduling; easier to notify when done, or 5 minutes before doing it
Migrations
• Require re-addressing & close coordination
• Identify admins (can be many), switch ports etc.; create a web page documenting what has to be done and the addresses; set up tracking tickets etc.
• Email admins asking them to validate CANDO info and read the web page:
  • Three types of hosts: printers, SLAC only, open access to world
• Meet with admins, explain, schedule time
• Install replacement switch when appropriate, configure
• With each admin, a network tech and a network engineer move cables one by one from the old switch to the replacement, re-address the host, check things work etc.
• During or shortly after migration, the network engineer will update CANDO with the new IP address
• To date, have been migrating all of one OU Admin’s machines at a time
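The per-admin bookkeeping described above can be sketched as a small tracking structure. This is a hedged illustration only: the `Host` record, its field names, and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and nothing here reflects how CANDO or the tracking tickets actually store this information.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class HostType(Enum):
    # The three host types named on the slide.
    PRINTER = "printer"
    SLAC_ONLY = "SLAC only"
    OPEN = "open access to world"


@dataclass
class Host:
    # Hypothetical record; field names are illustrative only.
    name: str
    admin: str
    host_type: HostType
    old_ip: str
    new_ip: str
    migrated: bool = False


def batches_by_admin(hosts):
    """Group hosts so that one admin's machines are migrated together."""
    batches = defaultdict(list)
    for host in hosts:
        batches[host.admin].append(host)
    return dict(batches)


def pending(hosts):
    """Hosts whose cable move and re-addressing are not yet done."""
    return [h for h in hosts if not h.migrated]


if __name__ == "__main__":
    hosts = [
        Host("pc-a", "admin1", HostType.SLAC_ONLY, "192.0.2.10", "192.0.2.130"),
        Host("prn-b", "admin1", HostType.PRINTER, "192.0.2.11", "192.0.2.131"),
        Host("web-c", "admin2", HostType.OPEN, "192.0.2.12", "192.0.2.132"),
    ]
    for admin, batch in batches_by_admin(hosts).items():
        print(admin, [h.name for h in batch])
```

Keeping both the old and new address on each record makes it easy to see what still has to be updated once a host is marked as migrated.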
Migration Experience
• Two switches almost done (CGB1, TL1), elapsed > 1 week
• Difficult, labor intensive, requires lots of coordination and availability, impacts users
• Problems with devices not being in the documented place, patch panel labeling being wrong, patch cables not being long enough
• Be wary of old, non-standard devices
• Devices that have been turned off do not show up on our spreadsheets
• Takes time to get print queues changed on Windows, but can be requested in advance
• Will be setting a hard deadline depending on number of devices etc.
Lessons learned
• New networks require a different subnet mask and default gateway; make sure this is clear.
• Make sure all devices have an IP assigned in advance to reduce confusion.
• Confirm which devices should be SLAC Only (IFZ) vs Public in advance.
• When replacing the switch, it can take up to 15 minutes per device (walk to machine, log in, change IP, request cable change, test), so be prepared and patient.
• Use ipconfig /registerdns on Windows computers to make sure Windows DNS gets updated, then test and inform windows-admin if the IP is still wrong.
• Still working on developing automation to change Windows system IPs.
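Since each migrated host gets a new address, subnet mask and default gateway, a quick consistency check before the cable move may catch typos. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and the sample addresses (from the 192.0.2.0/24 documentation range) are assumptions for illustration, not SLAC's actual subnets or tooling. The second helper simply multiplies out the "up to 15 minutes per device" figure, assuming devices are handled one after another.

```python
import ipaddress


def check_addressing(ip, netmask, gateway):
    """Return a list of problems with a host's new settings (empty if none found).

    Note: the network/broadcast check ignores /31 and /32 special cases.
    """
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    host = ipaddress.ip_address(ip)
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if host == gw:
        problems.append("host address equals the gateway address")
    if host in (network.network_address, network.broadcast_address):
        problems.append(f"{host} is the network or broadcast address of {network}")
    return problems


def worst_case_minutes(n_devices, minutes_per_device=15):
    """Upper bound on hands-on time, assuming devices are done one at a time."""
    return n_devices * minutes_per_device


if __name__ == "__main__":
    # A gateway left over from the old subnet is flagged as outside the new one.
    print(check_addressing("192.0.2.10", "255.255.255.192", "192.0.2.129"))
    print(worst_case_minutes(8), "minutes for 8 devices at most")
```

An empty list from `check_addressing` only means the three values are mutually consistent; it does not confirm that the address is the one actually assigned to the host.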
Progress
• Temporary switches
• Sep 08: CORE3OLD in seismically retrofitted area; need reconfig and connect up & CORE4OLD
• Oct 08: CORE4OLD in seismically retrofitted area; CORE4OLD in place too
Documentation
• See “Seismic Retrofitting Rack Move 2008” site
  • https://confluence.slac.stanford.edu/display/NetMan/Seismic+Retrofitting+Rack+Move+2008
  • Contains background information, overview of procedures, milestones, and drill-down to lots more details (tickets, spreadsheets, subnet allocations, hosts on individual switches etc.)
• This is where to go to get detailed information. It is very dynamic.
• If you need more, let us know and we will add as appropriate
  • Email to core-neteng
• There is an FAQ at https://confluence.slac.stanford.edu/display/NetMan/Frequently+Asked+Questions
New Area
• New area circa Aug 21 ’08
• New area circa Oct 15 ’08
• New area circa Sep 18 ’08
Central Routers
• Sep 08: CORE3OLD in seismically retrofitted area; need reconfig and connect up & CORE4OLD
• SWH-CORE1&2-OLD in old racks