Learn how Google and other search technology leaders are working to make government information more accessible and user-friendly. Gain new insights on embracing technologies to increase your agency's web presence.
A Needle in a Haystack: What Web Users Are Searching For: The Federal Sitemaps Initiative Brand Niemann, US EPA & Co-Chair SICoP Excellence in Government Conference Washington Convention Center Breakout Session II: April 4, 11:15 am to 12:15 pm Google: Federal Sitemaps Google: SICoP
Prospectus • "A Needle in a Haystack: What Web Users Are Searching For." • The federal government is both the world's largest information source and the inventor of the Internet. So why is it so hard for federal employees and citizens to find the information they need? Hear how Google and other leaders in Internet search technology make government more open, transparent and customer-focused. Gain new perspectives on how to embrace technologies to increase your agency's presence on the Web.
Agenda • Moderator: Jon Desenberg • PerformanceWeb.Org • Google: JL Needham • Sitemaps at FOSE 2007 and the need for agencies to balance their investment in web and site search (see next slide). • Science.Gov: Walt Warnick • OSTI's specific (and ongoing) experience with implementing sitemaps to make deep web information accessible to researchers using search engines. • State Department: Luigi Canali • Managing its web publishing centrally and, in particular, implementing sitemaps to ensure automatic communication of newly added content to Google. • Federal CIO Council's Sitemaps Initiative: Brand Niemann • Broader policy context and the value of the Federal government as a whole embracing the Sitemap protocol and similar standards.
Web search vs. site search Supporting the two levels of search
Federal Government Context • An estimated 80% of government information is unstructured, and an estimated 90% of the structured information is invisible to search engine crawlers and users. • In addition: • (1) the UK government recently announced that hundreds of its websites are being consolidated or shut down to make access to information easier for people, and • (2) the recent SICoP Special Conference on Building DRM 3.0 and Web 3.0 was held in support of the Federal CIO Council Strategic Plan for FY 2007-2009 Goal 2 (Information securely, rapidly, and reliably delivered to our stakeholders) to provide implementation strategies, best practices, and success stories. • It therefore seems appropriate to pilot a process that deals with all of these issues at the same time.
EPA Context Total: 27 Sample list of EPA sites with uncrawlable elements: http://spreadsheets.google.com/pub?key=pUb62ZKHnzgqEoGF4LFf3Gw
EPA Webmaster Experience • “Sitemaps as a method for discovering database content is something that I heartily endorse. It makes sense, and it's good to have a data standard for doing it. Google, et al. are to be commended for that. Too bad it's such a minimalist protocol! As we work to expose database contents to our internal search engine, we will keep in mind the need to express that content in a Sitemap protocol as well. EIMS is our first target database, hopefully tackling it this spring.” Source: John Shirey, Notes on Federal Sitemaps Discussion, January 10, 2007.
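Illustration • Expressing database content in the Sitemap protocol, as the quote above describes, can be sketched roughly as follows. This is a minimal sketch and not EPA's implementation: the record list, the example.gov URLs, and the function name build_sitemap are hypothetical, and only the urlset, url, loc, and lastmod elements of the Sitemap protocol (sitemaps.org, version 0.9) are used.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Return Sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is a W3C date string such as "2007-03-15", or None
    when the date is unknown (the lastmod element is optional).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in records:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical rows, standing in for the result of a database query
# that lists each record's public URL and last update date.
records = [
    ("https://www.example.gov/records?id=101", "2007-03-15"),
    ("https://www.example.gov/records?id=102", None),
]

sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

The output would typically be saved as a file on the web server and submitted to the search engines. Listing one URL per database record is what lets a crawler reach pages that are otherwise available only through a query form.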
EPA Pilot • March 15th, EPA Web Workgroup Presentation: • Objectives: • Structure unstructured EPA information. • Make EPA databases visible to search engine crawlers and users. • Consolidate EPA information to make it easier to use. • Provide semantic metadata and linking in support of DRM 3.0 and Web 3.0 applications. • Pilot Content: • The new EPA Strategic Plan, Report on the Environment, Enterprise Architecture, and Performance Results were used to illustrate the “long tail” of search (being successful with obscure queries). See http://colab.cim3.net/file/work/SICoP/2007-03-15/SICoPEPAWWG03152007.ppt
Policy Context • The CIO Council's XML Community of Practice (xml.gov) and the Semantic Interoperability Community of Practice (SICoP) encourage adoption and implementation of the Sitemap protocol by federal agencies because it: • Supports the E-Government Act of 2002 (Pub. L. No. 107-347). • Supports the Federal Enterprise Architecture's Data Reference Model 2.0. • Supports the SICoP DRM 2.0 Implementation - Knowledge Reference Model. • Supports the new CIOC Strategic Plan FY 2007-2009.
From Search to Knowing Source: Figure 10 in SICoP White Paper Series Module 2: Semantic Wave 2006 - Executive Guide to the Business Value of Semantic Technologies, May 15, 2006, Principal Author Mills Davis, Project10X.
From Search to Knowing • From bottom to top, the amount, kinds, and complexity of metadata, modeling, context, and knowledge representation increase. • From left to right, reasoning capabilities advance from (a) information recovery based on linguistic and statistical methods, to (b) discovery of unexpected relevant information and associations through mining, to (c) intelligence based on correlation of data sources, connecting the dots, and putting information into context, to (d) question answering ranging from simple factoids to complex decision support, and (e) smart behaviors including robust adaptive and autonomous action.
From Search to Knowing • Moving from lower left to upper right, the diagram depicts a spectrum of progressively more capable categories of knowledge representation together with standards and formalisms used to express metadata, associations, models, contexts, and modes of reasoning. • As the amount and expressive power of the semantics and knowledge increase, so does the value of the reasoning capacity they enable.
Upcoming Events • April 25, 2007, SICoP Special Conference 2: Building Knowledgebases for Cross-Domain Semantic Interoperability • Google: DRM 3.0 and Web 3.0 • May 6-8, 2007, The 22nd Semi-Annual Spring Government CIO Summit Government by Wiki: New Tools for Collaboration, Information-Sharing, and Decision-Making. Web 2.0 Essentials for Government: Tying It All Together in a Service System.