Information System Evolution. Enabling Grids for E-sciencE.
The information system is a mission-critical component of the EGEE production infrastructure. It provides the detailed information about Grid services that is required to discover, select and use them during Grid activities such as job and data management. Information system components are found throughout the infrastructure and are especially sensitive to information volume and query rate. As such, it must be ensured that the current components can meet the scalability requirements arising from the growth of the infrastructure. An improved Berkeley Database Information Index (BDII) [1] architecture is presented that has the potential to meet these future requirements.

Overview

The new architecture for the BDII consists of a standard LDAP database which is updated by an external process. The update process obtains LDIF from a number of sources and merges them. It then compares the merged LDIF with the contents of the database and creates an LDIF file of the differences, which is then used to update the database. The aim of this approach is to reduce complexity within the BDII and speed up the update cycle, thereby enabling more data to be handled in a given time period. The increased efficiency can be seen directly in the graph below, which shows the one-minute load average before and after upgrading from BDII v4 to BDII v5.

[Figure: BDII v5 architecture. Labels: Query, 2170, LDAP, Update, LDIF, Merge, New LDIF, DIFF, LDAP_ADD, LDAP_MODIFY, Provider, Plugin.]

[Figure: One minute load average before and after upgrading. Annotation: Improved Performance!]

[Figure: The growth of the number of sites, cores and jobs per day. Annotation: Log Scale!]

The Glue [2] information model version 2.0 is an official recommendation from the Open Grid Forum [3]. It consolidates over 4 years of production experience with the Glue 1.x series.
GLUE 2.0

A common information model is required to facilitate interoperation between Grid infrastructures, and the definition of version 2.0 in an open forum will increase its adoption by other infrastructures. Migrating the EGEE information system from Glue 1.3 to 2.0 will occur in three stages. First, the information system will be updated to support both versions. Second, the information providers will be updated to produce both 1.3 and 2.0 information. Finally, applications can start migrating from version 1.3 to 2.0. Glue 1.3 information will only be removed once applications have migrated to version 2.0.

[Figure: GLUE 2.0 model diagram. Entities: User Domain, Admin Domain, Service, Manager, End Point, Share, Resource, Access Policy, Mapping Policy, Activity. Relationship labels: Negotiates Share with, Provides, Contacts, Manages, Maps User to, Defined on, Runs, Has.]

Infrastructure Growth

The graph above shows that the growth in the number of sites joining the infrastructure is slowing, whereas the growth in the number of cores and jobs per day is accelerating. Assuming a growth rate of 50 sites per year, there could be 550 sites by 2015. Each new site would contribute more fundamental services, users and resources. Assuming exponential growth in the number of cores and computing activities (jobs), by 2015 the number of cores in the EGEE infrastructure could reach 500,000 and the number of jobs per day could reach 2 million.

The changes in the information system were monitored by recording the modified entries during each BDII update. Over a period of 9 days the changes for 1932 update cycles were recorded, which corresponds to approximately one update cycle every 7 minutes. A graph of the number of changes per cycle can be seen above. The average number of entries modified per update cycle was 12771, which corresponds to 21.8% of the total number of entries.
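Recording the modified entries per update cycle amounts to comparing two snapshots of the directory, which is also the compare step that the BDII v5 update process performs before applying changes. The following is a minimal Python sketch of that idea, not the BDII implementation: the dictionary layout, the function names `diff_snapshots` and `attribute_change_counts`, and the sample entries are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical snapshot layout (an assumption, not the BDII's own format):
# {distinguished_name: {attribute_name: [values, ...]}}


def diff_snapshots(old, new):
    """Classify entries as added, deleted or modified between two snapshots."""
    added = sorted(dn for dn in new if dn not in old)
    deleted = sorted(dn for dn in old if dn not in new)
    modified = {}
    for dn in old.keys() & new.keys():
        before, after = old[dn], new[dn]
        # An attribute counts as changed if its set of values differs,
        # including attributes that appear or disappear.
        changed = sorted(
            attr
            for attr in before.keys() | after.keys()
            if sorted(before.get(attr, [])) != sorted(after.get(attr, []))
        )
        if changed:
            modified[dn] = changed
    return added, deleted, modified


def attribute_change_counts(modified):
    """Count how often each attribute type appears among the modified entries."""
    return Counter(attr for attrs in modified.values() for attr in attrs)


# Made-up sample data for illustration only.
SUFFIX = "mds-vo-name=site-a,o=grid"
old = {
    f"GlueCEUniqueID=ce01.example.org,{SUFFIX}": {
        "GlueCEStateWaitingJobs": ["12"],
        "GlueCEStateRunningJobs": ["40"],
        "GlueCEInfoTotalCPUs": ["64"],
    },
    f"GlueCEUniqueID=ce02.example.org,{SUFFIX}": {
        "GlueCEStateWaitingJobs": ["0"],
        "GlueCEInfoTotalCPUs": ["32"],
    },
    f"GlueSEUniqueID=se01.example.org,{SUFFIX}": {
        "GlueSEName": ["site-a-se"],
    },
}
new = {
    f"GlueCEUniqueID=ce01.example.org,{SUFFIX}": {
        "GlueCEStateWaitingJobs": ["7"],
        "GlueCEStateRunningJobs": ["45"],
        "GlueCEInfoTotalCPUs": ["64"],
    },
    f"GlueCEUniqueID=ce02.example.org,{SUFFIX}": {
        "GlueCEStateWaitingJobs": ["0"],
        "GlueCEInfoTotalCPUs": ["32"],
    },
    f"GlueCEUniqueID=ce03.example.org,{SUFFIX}": {
        "GlueCEInfoTotalCPUs": ["16"],
    },
}

added, deleted, modified = diff_snapshots(old, new)
modified_fraction = len(modified) / len(old)
counts = attribute_change_counts(modified)

print(f"added: {len(added)}, deleted: {len(deleted)}, modified: {len(modified)}")
print(f"fraction of existing entries modified: {modified_fraction:.1%}")
print(f"changes per attribute: {dict(counts)}")
```

Only the entries in `added`, `deleted` and `modified` would need to be written to the database, and the per-attribute counts correspond to the kind of tally used to see which attribute types change most often.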
Investigation into the frequency of changes

A further investigation was conducted to find out how often each attribute type was changed; the results can be found in the table above. 97.8% of the changes are confined to 14 attributes, which is only 4% of the total attributes used. In the current implementation all the entries are transported and updated during each cycle, which is inefficient.

Future Directions

With the information being inserted into the resource BDIIs as modifications to the database, a number of possibilities open up. One is to use LDAP replication mechanisms to propagate these changes automatically to the higher levels in the system. This would be possible for the site-level BDIIs and would reduce the latency between the update of the resource BDII and the site-level BDII. Due to the use of the Freedom of Choice for Resources (FCR) [4] mechanism, it may not be possible to use LDAP replication technologies. To improve efficiency in this case, a compressed content exchange mechanism could be employed, or the FCR mechanism may need to be re-evaluated.

References:
[1] http://twiki.cern.ch/twiki//bin/view/EGEE/BDII
[2] http://forge.gridforum.org/sf/projects/glue-wg
[3] http://www.ogf.org
[4] https://lcg-fcr.cern.ch:8443/fcr/fcr.cgi

Authors: M. W. Schulz and L. Field, CERN-IT
EGEE-III INFSO-RI-222667
Laurence.Field@cern.ch