WP1 WMS release 2: open issues
Massimo Sgaravatto
INFN Padova
Open issues/missing functionalities
• Memory leaks in NS (bug #2104)
    • Has the problem been investigated?
• Logging by WM, JC and LM fails on SSL problems when using the user proxy (bug #2016)
    • Already addressed by WM, JC and LM
    • Still to be done by NS (it always uses the host proxy)
• Clearer error messages when no resources are found with edg-job-list-match (bug #1997)
    • As already done with edg-job-submit
    • Not completely done
Open issues/missing functionalities
• Problem with resubmission: CEs already “used” are not considered anymore (bug #1103)
    • LCG is now also requiring a fix
• Do not abort a job immediately in case of problems (RLS or II down), but retry for a while (bug #1812)
    • Matchmaking should be retried until a certain TimeLimit = Min(TimeLimitJDL, TimeLimitConf)
    • To be addressed in rel. 3 (DAGMan)
• Documentation: Gangmatching note missing
    • First version produced
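The retry rule proposed for bug #1812 can be sketched as follows. This is only an illustration under stated assumptions, not the WMS implementation: `match_with_retry`, `find_resources` and `retry_interval` are hypothetical names, the time limits are assumed to be in seconds, and a service outage (RLS or II down) is assumed to surface as a `ConnectionError`.

```python
import time


def effective_time_limit(time_limit_jdl, time_limit_conf):
    """TimeLimit = Min(TimeLimitJDL, TimeLimitConf), as stated on the slide."""
    return min(time_limit_jdl, time_limit_conf)


def match_with_retry(find_resources, time_limit_jdl, time_limit_conf,
                     retry_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Retry matchmaking until it succeeds or the effective limit expires.

    find_resources is a hypothetical callable that returns the list of
    matching CEs, or raises ConnectionError when a service is down.
    Returns the list of CEs, or None once the time limit has been reached.
    """
    deadline = clock() + effective_time_limit(time_limit_jdl, time_limit_conf)
    while True:
        try:
            return find_resources()
        except ConnectionError:
            # Service unavailable: give up only once the deadline has passed,
            # otherwise wait and try the matchmaking again.
            if clock() >= deadline:
                return None
            sleep(retry_interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; a caller would abort the job only when the function returns `None`.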
Open issues/missing functionalities
• Job stays in waiting status when there is an authorization problem while submitting to the CE (bug #2439)
    • To be addressed by LM
• Matchmaking is performed even if not all files specified as InputData exist (i.e. even if not all of them are registered in the RLS) (bug #2434)
    • Waiting for feedback from users (LCG)
• ‘Resubmitted’ flag is not set in the output of edg-job-status (bug #2269)
• edg-job-get-output does not create the output directory (bug #2099)
    • Which mode and ownership?
• Default output directory is /tmp/joboutput, which gets deleted by tmpwatch (bug #2456)
    • Let’s use ~/joboutput (as suggested in the Iteam)?
    • Also needs the fix for #2099
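A possible shape for the combined fix of #2099 and #2456 is sketched below. The slide leaves mode and ownership open, so the choices here are assumptions: the directory is created under the invoking user's home as ~/joboutput with mode 0o700 (private to the owner). `ensure_output_dir` is a hypothetical helper name, not part of the UI commands.

```python
import os


def ensure_output_dir(path=None, mode=0o700):
    """Create the job output directory if it is missing and return its path.

    Assumptions (not settled on the slide): the default location is
    ~/joboutput and the directory is readable only by its owner.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), "joboutput")
    # exist_ok=True leaves an existing directory, and its current mode, as is.
    os.makedirs(path, mode=mode, exist_ok=True)
    return path
```

Because the directory is created by the calling process, it is owned by the user who retrieves the output, which would sidestep the ownership question for the default case.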
Open issues/missing functionalities
• Job refusal at NS reported incorrectly (bug #2357)
    • Transfer-OK (by UI) + Refused (by NS)
    • Should be: Transfer-OK + Abort or Transfer-fail + Refused
    • Double-counts in edg-job-status --all
    • To be addressed by UI?
• edglog.log is always produced (bug #2440)
• Problems with MPI jobs for LCG CEs (bug #2455)
    • This is because the LRMSType is taken from the GlueCEUniqueId and not from the GlueCELRMSType attribute
    • GlueCEUniqueId for EDG: lxde01.pd.infn.it:2119/jobmanager-pbs-short
    • GlueCEUniqueId for LCG: ce1.cern.ch:2119/jobmanager-lcgpbs-lng
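The MPI problem in bug #2455 can be illustrated with a short sketch. It assumes the LRMS type is guessed by splitting the jobmanager name in the GlueCEUniqueId, which is a plausible reading of the slide but not confirmed code; `lrms_from_unique_id` and `lrms_type` are hypothetical names, and the fallback to parsing when the attribute is absent is a design assumption of this example.

```python
def lrms_from_unique_id(ce_unique_id):
    """Guess the LRMS type from a GlueCEUniqueId (the fragile approach).

    Expects the form 'host:port/jobmanager-<lrms>-<queue>'.
    """
    jobmanager = ce_unique_id.split("/", 1)[1]  # e.g. 'jobmanager-pbs-short'
    return jobmanager.split("-")[1]             # e.g. 'pbs'


def lrms_type(ce_attributes):
    """Prefer the published GlueCELRMSType attribute over parsing the id.

    ce_attributes is a plain dict of Glue attribute names to values.
    """
    published = ce_attributes.get("GlueCELRMSType")
    if published:
        return published
    # Assumed fallback: only parse the id when the attribute is missing.
    return lrms_from_unique_id(ce_attributes["GlueCEUniqueId"])
```

With the two identifiers from the slide, the parsing approach yields `pbs` for the EDG CE but `lcgpbs` for the LCG CE, so any check that expects a known LRMS name fails for LCG unless the GlueCELRMSType attribute is consulted.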
Open issues/missing functionalities
• Job matchmaking with group/role
    • Requirement coming from LCG to have more fine-grained matchmaking than VO level
    • VO, groups, roles and capabilities are packed into a simple string in the VOMS proxy
        • /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]
    • In the IS, groups, roles, etc. can be published
        • GlueCEAccessControlBaseRule: VO:/cms
        • GlueCEAccessControlBaseRule: VO:/lhcb/production
    • LCG would like this for the January LCG-2 upgrade
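A minimal sketch of how the VOMS string and the published rules might be compared is given below. The parsing follows the /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap] pattern from the slide; the matching semantics (a rule allows a user whose VO/group path starts with the rule's path) are an assumption of this example, as are the names `parse_fqan` and `rule_allows`.

```python
import re

# /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]
FQAN_RE = re.compile(
    r"^/(?P<vo>[^/]+)"
    r"(?P<groups>(?:/(?!Role=|Capability=)[^/]+)*)"
    r"(?:/Role=(?P<role>[^/]+))?"
    r"(?:/Capability=(?P<cap>[^/]+))?$"
)


def parse_fqan(fqan):
    """Split a VOMS attribute string into VO, groups, role and capability."""
    m = FQAN_RE.match(fqan)
    if m is None:
        raise ValueError("not a valid VOMS attribute string: %r" % fqan)
    groups = [g for g in m.group("groups").split("/") if g]
    return {
        "vo": m.group("vo"),
        "groups": groups,
        "role": m.group("role"),
        "capability": m.group("cap"),
    }


def rule_allows(rule, fqan):
    """Check a GlueCEAccessControlBaseRule value such as 'VO:/lhcb/production'.

    Assumed semantics: the user's VO/group path must start with the rule path.
    """
    prefix = "VO:"
    if not rule.startswith(prefix):
        return False
    rule_path = [p for p in rule[len(prefix):].split("/") if p]
    user = parse_fqan(fqan)
    user_path = [user["vo"]] + user["groups"]
    return bool(rule_path) and user_path[:len(rule_path)] == rule_path
```

Under these assumptions, a CE publishing `VO:/lhcb/production` would match a proxy carrying /lhcb/production (with or without a role) but not one carrying only /lhcb, which is the finer-than-VO granularity the requirement asks for.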