
WEB MINING


  1. WEB MINING • Nuri Kayaoglu • Humboldt University, Master's Program in Economics and Management Sci. • SEMANTIC WEB-MINING • Web-Mining WS 01-02

  2. Overview • Current Web • Semantic Web • Structure and components of the SW • Some key concepts • Design facts • Conclusion

  3. Current Web • The Web was pretty revolutionary, right? • Before the Web: systems like HyperCard • But the Web was worldwide • Origin and goals of the Web: • human communication through shared knowledge • working together: social efficiency, understanding and scaling • exploitation of computing power in real life

  4. Current Web • Anyone with a server could • publish documents for the rest of the world, • hyperlink any document to any other document. • No matter where the servers were: if you could browse to a page, you could link to it. • Those early days were exciting indeed.

  5. Current Web • Hyperlinking to everything in the universe is cool, but it has become rather boring. • Now that we have all of these documents linked together, the question is: • isn't there something more we can do with them?

  6. Current Web • The Web is a source of resources and links; • to a user, this has become an exciting world. • Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully. • To a machine, however, very little machine-understandable information is available.

  7. Semantic Web (SW) • The Web is huge but not very smart. • Computer scientists are beginning to build a “Semantic Web” that understands the meanings underlying the tangle of information. • Idea: weave a Web that not only links documents to each other but also recognizes the meaning of the information in those documents. • This is a task people do quite well but a tall order for computers; e.g., what do “head” and “cook” mean?

  8. Semantic Web vs. Current Web • “The Semantic Web is really data that is processable by machine.” (Tim Berners-Lee, director of the W3C and father of the Web) • Adding semantics will radically change the nature of the Web: • from a place where information is merely displayed to one where it is interpreted, exchanged and processed. • Semantics-enabled search agents will be able to collect machine-readable data from diverse resources, process it and infer new facts.

  9. SW: Extension of the Current Web • Ultimate goal of the Semantic Web: • give users near omniscience over the vast resources of the Internet, turning the millions of existing database islands into a single gigantic database. • To a user, this will be an even more exciting world. • Realizing the full potential of the Web.

  10. Semantic Web: an example • Gabriel, Aicha and Mom • Mom needs to see a specialist and then has to have a series of physical therapy sessions, perhaps biweekly. • So, set up an appointment! • Instruct the Semantic Web agent! • Within a few minutes, the agents* provide a plan. • *Agent: a piece of software that runs without direct human control or constant supervision to accomplish goals provided by a user. • This is thanks not to today's WWW but to the Semantic Web it will evolve into tomorrow.

  11. Semantic Web: Some features • The Web has developed most rapidly as a medium of documents for people rather than of data and information that can be processed automatically. • The Semantic Web aims to make up for this. • The Semantic Web will be as decentralized as possible (like the Internet).

  12. Components of SW • eXtensible Markup Language (XML): • a markup language like HTML that lets individuals define and use their own tags, • i.e., hidden labels such as <zipcode> that annotate Web pages or sections of text on a page, • but it has no built-in mechanism to convey the meaning of a user’s new tags to other users.
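
A minimal Python sketch of the point above, using the standard-library `xml.etree.ElementTree` parser. The `<address>`, `<street>` and `<zipcode>` tags and their values are hypothetical. The sketch shows that any well-formed custom tag parses without complaint, yet the meaning of `zipcode` lives only in the programmer's head.

```python
import xml.etree.ElementTree as ET

# Hypothetical, self-invented tags: XML places no restriction on their names.
doc = """
<address>
  <street>Example Street 1</street>
  <zipcode>12345</zipcode>
</address>
""".strip()

root = ET.fromstring(doc)

# The parser exposes tag names and text without complaint...
for child in root:
    print(child.tag, "=", child.text)

# ...but nothing in the document says what "zipcode" means. A program can
# only use it because the programmer hard-coded that this tag holds a postal code.
zip_code = root.findtext("zipcode")
print("postal code:", zip_code)
```

Another author could just as well write `<postcode>`; XML alone gives no way to tell a program that the two tags mean the same thing, which is the gap RDF and ontologies are meant to close.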

  13. Components of SW • Resource Description Framework (RDF): • a scheme for describing information on the Web; • provides the technology for expressing the meaning of terms and concepts in a form that computers can readily process. • RDF encodes meaning in sets of triples, • each triple being rather like the subject, verb and object of an elementary sentence; • these triples can be written using XML tags.
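
To illustrate that triples can be written using XML tags, here is a hedged Python sketch that serializes (subject, verb, object) tuples into RDF/XML with `xml.etree.ElementTree`. The `http://example.org/...` URIs and the `isSisterOf` property are hypothetical; only the `rdf:` namespace URI is the real one from the W3C RDF specification. It assumes every property URI ends in `#localName`, a simplification that real RDF/XML writers do not need.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"   # the standard RDF namespace
EX = "http://example.org/terms#"                        # hypothetical vocabulary
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)

# Each statement: (subject, verb/property, object), all identified by URIs.
triples = [
    ("http://example.org/people#Aicha", EX + "isSisterOf",
     "http://example.org/people#Gabriel"),
]

def to_rdf_xml(triples):
    """Write each triple as an rdf:Description holding one property element."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for subj, prop, obj in triples:
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subj})
        # Split the property URI into namespace and local name (assumes a '#').
        ns, local = prop.rsplit("#", 1)
        ET.SubElement(desc, f"{{{ns}#}}{local}", {f"{{{RDF}}}resource": obj})
    return ET.tostring(root, encoding="unicode")

print(to_rdf_xml(triples))
```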

  14. Components of SW • RDF (cont.) • In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of” or “is the author of”) with certain values (another person, another Web page).
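
Assertions like these can be stored as plain tuples and queried by pattern. The Python sketch below is an illustration, not any particular RDF library's API: the `ex:` names, the page URLs and the `match` helper are hypothetical, and `None` stands for "any value" in a pattern.

```python
# Each assertion: (thing, property, value). The value may be another person
# or another Web page, as on the slide. All names are hypothetical.
assertions = {
    ("ex:Aicha", "ex:isSisterOf", "ex:Gabriel"),
    ("ex:Gabriel", "ex:isAuthorOf", "http://example.org/plan.html"),
    ("ex:Aicha", "ex:isAuthorOf", "http://example.org/notes.html"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Who is the author of what?
for who, _, page in match(assertions, p="ex:isAuthorOf"):
    print(who, "wrote", page)
```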

  15. Components of SW • RDF (cont.) • Subjects and objects are each identified by a Uniform Resource Identifier (URI), just like those used in a link on a Web page. • A URI defines or specifies an entity, not necessarily by naming its location on the Web. • URLs (Uniform Resource Locators) are the most common type of URI. • Verbs are also identified by URIs.
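
A small Python sketch of URIs in this role, using only `urllib.parse` from the standard library. The page address, the `#Aicha` fragment, the property URI and the `urn:example:mom` identifier are all hypothetical. The `names_a_location` helper is a deliberately rough test that treats only `http`/`https` URIs as URLs.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical page on which the RDF statements appear.
base = "http://example.org/people/index.html"

# Subject, verb and object are all URIs; relative references resolve
# against the page exactly as the targets of ordinary links do.
triple = (
    urljoin(base, "#Aicha"),
    "http://example.org/terms#isSisterOf",
    urljoin(base, "gabriel.html"),
)

def names_a_location(uri):
    """Rough test: http(s) URIs are URLs, i.e. they also say where to fetch something."""
    return urlparse(uri).scheme in ("http", "https")

for part in triple:
    print(part, names_a_location(part))

# A URN-style URI identifies an entity without naming any location on the Web.
print("urn:example:mom", names_a_location("urn:example:mom"))
```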

  16. Components of SW • RDF (cont.) • The WWW was originally built for human consumption: • although everything on it is machine-readable, this data is not machine-understandable. • Solution: use metadata to describe the data contained on the Web! • Metadata: data about data (e.g., a library catalog describing publications)
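
Metadata can itself be written as triples about a document. In this sketch the property URIs come from the Dublin Core element set (`http://purl.org/dc/elements/1.1/`), a real metadata vocabulary that the slide does not mention. The documents, titles and creators are hypothetical. The search function consults only the catalog, just as a library catalog is searched instead of the books themselves.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set

# Data about data: each triple describes a document, not its content.
catalog = {
    ("http://example.org/papers/a.html", DC + "title", "Example Paper A"),
    ("http://example.org/papers/a.html", DC + "creator", "A. Author"),
    ("http://example.org/papers/b.html", DC + "title", "Example Paper B"),
    ("http://example.org/papers/b.html", DC + "creator", "B. Writer"),
}

def documents_by(catalog, creator):
    """Find documents by consulting their metadata, never their text."""
    return sorted(s for (s, p, o) in catalog if p == DC + "creator" and o == creator)

def title_of(catalog, doc):
    """Look up a document's title in the catalog (None if it has none)."""
    return next((o for (s, p, o) in catalog if s == doc and p == DC + "title"), None)

for doc in documents_by(catalog, "A. Author"):
    print(doc, "->", title_of(catalog, doc))
```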

  17. Components of SW • RDF (cont.) • RDF is a foundation for processing metadata; • it provides interoperability between applications that exchange machine-understandable information on the Web. • RDF with digital signatures will be key to building the “Web of Trust” for electronic commerce, collaboration, etc.

  18. Components of SW • Ontologies • Collections of information: • a document or file that formally defines the relations among terms; • a collection of statements, written in a language such as RDF, that define the relations between concepts and specify logical rules for reasoning about them. • Computers will “understand” the meaning of semantic data on a Web page by following links to specified ontologies.
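
The phrase "logical rules for reasoning about them" can be made concrete with a small forward-chaining sketch. It hard-codes two rules modelled on RDF Schema's `rdfs:subClassOf` semantics: subclass transitivity and type inheritance. The class names and `ex:alice` are hypothetical, and real reasoners are far more general and efficient.

```python
SUB, TYPE = "rdfs:subClassOf", "rdf:type"

def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new facts appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if p2 != SUB or o != s2:
                    continue
                if p == SUB:    # (A sub B) and (B sub C)  =>  (A sub C)
                    new.add((s, SUB, o2))
                if p == TYPE:   # (x type B) and (B sub C) =>  (x type C)
                    new.add((s, TYPE, o2))
        if new <= facts:        # fixed point: nothing left to derive
            return facts
        facts |= new

# Ontology: relations between concepts (hypothetical classes).
ontology = {
    ("ex:Professor", SUB, "ex:FacultyMember"),
    ("ex:FacultyMember", SUB, "ex:UniversityMember"),
}
# Data from a Web page that links to the ontology.
page = {("ex:alice", TYPE, "ex:Professor")}

facts = infer(ontology | page)
print(("ex:alice", TYPE, "ex:UniversityMember") in facts)
```

The page itself never says that `ex:alice` is a university member; a program learns it only by following the page's link to the ontology and applying the rules.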

  19. Components of SW • Ontologies (cont.) • No SW without metadata, but metadata alone won't suffice. • The metadata in Web pages will have to be linked to special documents that define metadata terms and the relationships between these terms. • These sets of shared concepts and their interconnections are ontologies. • Examples: members of faculty, condors • Problem: political and cultural bias will creep into ontologies, e.g., Chinese government vs. Taiwan
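
One way shared ontologies pay off: two sites annotate their pages with differently named terms, and an ontology both link to declares the terms equivalent, so an agent can merge the data. The sketch borrows the name `owl:equivalentProperty` from the later W3C OWL language purely as a label. The site prefixes, shops and postal codes are hypothetical, and only direct one-step equivalences are followed.

```python
EQUIV = "owl:equivalentProperty"

# Two sites use their own, differently named metadata terms.
site_a = {("ex:shop1", "siteA:zipcode", "12345")}
site_b = {("ex:shop2", "siteB:postalCode", "54321")}

# A shared ontology that both sites link to relates the two terms.
ontology = {("siteB:postalCode", EQUIV, "siteA:zipcode")}

def normalize(triples, ontology):
    """Rewrite each property to the term it is declared equivalent to (one step only)."""
    alias = {a: b for (a, p, b) in ontology if p == EQUIV}
    return {(s, alias.get(p, p), o) for (s, p, o) in triples}

merged = normalize(site_a | site_b, ontology)
codes = sorted(o for (s, p, o) in merged if p == "siteA:zipcode")
print(codes)
```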

  20. SW and ERM • Question: is RDF an entity-relationship model (ERM)? • Answer: yes and no! • It is great as a basis for ER modelling, but because RDF is used for other things as well, it is more general. • RDF is a model of entities (nodes) and relationships. • If you are used to the ER modelling system for data, then the RDF model is basically an opening up of the ER model to work on the Web.
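
To make the "opening up" concrete, this sketch turns a row of an ER-style table into triples: the row becomes a node, ordinary columns become properties with literal values, and a foreign key becomes a relationship to another node. The table, the column names and the `http://example.org/` URI scheme are hypothetical, and the mapping is a simplification rather than any standard's exact rules.

```python
BASE = "http://example.org/"   # hypothetical namespace for the database

def row_to_triples(table, row, key="id", foreign_keys=None):
    """Map one table row to (subject, predicate, object) triples."""
    foreign_keys = foreign_keys or {}
    subject = f"{BASE}{table}/{row[key]}"          # the entity becomes a node
    triples = []
    for col, val in row.items():
        if col == key:
            continue
        predicate = f"{BASE}{table}#{col}"
        if col in foreign_keys:
            # A foreign key is a relationship: the object is another entity (node).
            triples.append((subject, predicate, f"{BASE}{foreign_keys[col]}/{val}"))
        else:
            # An ordinary attribute: the object is a literal value.
            triples.append((subject, predicate, val))
    return triples

employee = {"id": 1, "name": "Gabriel", "dept": 10}
triples = row_to_triples("employee", employee, foreign_keys={"dept": "department"})
for t in triples:
    print(t)
```

Once rows are triples with global URIs, they can be merged with triples that anyone else publishes on the Web, which a closed ER database does not allow.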

  21. Real Power of SW • Agents • The real power of the SW will be realized when people create many programs that • collect Web content from diverse resources, • process the information, • exchange the results with other programs. • The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services become available.
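
The collect, process and exchange loop can be sketched as follows, loosely following the appointment example from slide 10. The "sites" are in-memory stand-ins for documents an agent would fetch over the Web. Every provider, insurance plan and time slot is hypothetical, and a real agent would also need ontologies to reconcile the sites' vocabularies.

```python
# Stand-ins for machine-readable data published by different sites (hypothetical).
provider_site = {
    ("ex:drA", "ex:offers", "ex:PhysicalTherapy"),
    ("ex:drB", "ex:offers", "ex:PhysicalTherapy"),
    ("ex:drA", "ex:acceptsInsurance", "ex:planX"),
}
calendar_site = {
    ("ex:drA", "ex:freeSlot", "Tue 10:00"),
    ("ex:drA", "ex:freeSlot", "Thu 14:00"),
    ("ex:drB", "ex:freeSlot", "Thu 14:00"),
}
user_profile = {
    ("ex:mom", "ex:insurance", "ex:planX"),
    ("ex:mom", "ex:available", "Thu 14:00"),
}

def objects(kb, subject, prop):
    """All values of one property of one subject."""
    return {o for (s, p, o) in kb if s == subject and p == prop}

def plan(*sources):
    kb = set().union(*sources)                      # 1. collect from diverse resources
    insurance = objects(kb, "ex:mom", "ex:insurance")
    free = objects(kb, "ex:mom", "ex:available")
    proposals = []
    for (provider, p, o) in sorted(kb):             # 2. process the information
        if p == "ex:offers" and o == "ex:PhysicalTherapy":
            if objects(kb, provider, "ex:acceptsInsurance") & insurance:
                for slot in sorted(objects(kb, provider, "ex:freeSlot") & free):
                    proposals.append((provider, slot))
    return proposals                                # 3. results to hand to other programs

print(plan(provider_site, calendar_site, user_profile))
```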

  22. Evolution of Knowledge • The SW is not merely a tool for conducting individual tasks. • If properly designed, the SW can assist the evolution of human knowledge as a whole. • A small group can • innovate rapidly and efficiently, • but this produces a subculture whose concepts are not understood by others. • Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication.

  23. Evolution of Knowledge (cont.) • The world works across the spectrum between these extremes, with a tendency to start small, from the personal idea, and move toward a wider understanding over time. • An essential process is the joining together of subcultures when a wider language is needed. • The SW must allow the independent work of diverse communities to be combined effectively.

  24. Some design facts • Inconsistency • Surely, once you have one statement somewhere on the Web saying A and another saying not A, doesn’t the whole system fall apart? • This fear is quite valid. • Solution: digital signatures let an agent verify who made each statement, adding a notion of security to the whole process. • Key concept: trust
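
A minimal sketch of how trust can contain contradictions: each statement is tagged with the source that made it, conflicting statements are detected, and only statements from trusted sources are believed. It assumes each source's identity has already been established (for example, by checking a digital signature, which the sketch does not implement). The sources and statements are hypothetical.

```python
# Each statement: (source, triple, truth value). True means "A", False "not A".
statements = [
    ("http://example.org/siteA", ("ex:orgX", "ex:memberOf", "ex:W3C"), True),
    ("http://example.org/siteB", ("ex:orgX", "ex:memberOf", "ex:W3C"), False),
    ("http://example.org/siteB", ("ex:orgY", "ex:memberOf", "ex:W3C"), True),
]

def contradictions(stmts):
    """Triples that some source asserts and another source denies."""
    seen = {}
    for _, triple, value in stmts:
        seen.setdefault(triple, set()).add(value)
    return {t for t, values in seen.items() if len(values) > 1}

def believe(stmts, trusted):
    """Keep only statements from trusted sources; the rest is ignored, not fatal."""
    beliefs = {}
    for source, triple, value in stmts:
        if source not in trusted:
            continue
        if beliefs.get(triple, value) != value:
            raise ValueError(f"trusted sources disagree about {triple}")
        beliefs[triple] = value
    return beliefs

print(contradictions(statements))
print(believe(statements, trusted={"http://example.org/siteA"}))
```

The design choice is that a contradiction only matters among sources an agent has chosen to trust; anywhere else on the Web it is filtered out, so the whole system does not fall apart.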

  25. Some design facts • Expiry • Gabriel: What is the time, Michael? • Michael: Five past ten, my friend. • [They chat for a minute.] • Gabriel: What is the time, Michael? • Michael: Six minutes past ten, Mr. Gabriel. • Gabriel: But Michael, you told me just a minute ago that it was five past ten. How can I ever believe you again?

  26. Some design facts • Expiry (cont.) • Problem and question: • time-varying information is one cause of apparent contradiction; • people and documents change status. • How does one base inference on information that may be out of date?

  27. Some design facts • Expiry (cont.) • A proposed solution: • put explicit or implicit expiry dates on everything. • Whenever a server sends a resource to an HTTP client, it can give an expiry date. • The client can track this and ensure that all deductions from that document are cancelled when the date arrives, unless a more recent copy can be obtained.
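
The expiry proposal can be sketched in Python: the client records the expiry date each document arrived with, remembers which documents every deduction was drawn from, and cancels those deductions once a document expires unless a fresh copy can be fetched. The class, the URLs, the fact and the `refetch` callback are hypothetical; `email.utils.parsedate_to_datetime` is the standard-library parser for the date format used in HTTP's `Expires` header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

class ExpiringKnowledgeBase:
    """Tracks expiry dates and the documents each deduction depends on."""

    def __init__(self):
        self.expires = {}   # document URL -> expiry time (aware datetime)
        self.derived = {}   # deduced fact -> set of document URLs it relies on

    def receive(self, url, expires_header):
        # expires_header: an HTTP date string, e.g. a server's Expires header.
        self.expires[url] = parsedate_to_datetime(expires_header)

    def deduce(self, fact, sources):
        self.derived[fact] = set(sources)

    def purge(self, now, refetch=lambda url: None):
        """Cancel deductions from expired documents unless a fresher copy arrives.

        refetch(url) returns a new Expires header, or None if no copy is available.
        """
        for url, expiry in list(self.expires.items()):
            if expiry > now:
                continue
            header = refetch(url)                 # try to obtain a more recent copy
            if header is not None:
                self.receive(url, header)
                if self.expires[url] > now:
                    continue
            # Still expired: cancel every deduction that relied on this document.
            self.derived = {f: s for f, s in self.derived.items() if url not in s}
            del self.expires[url]

URL = "http://example.org/members.rdf"
FACT = ("ex:orgX", "ex:memberOf", "ex:W3C")
kb = ExpiringKnowledgeBase()
kb.receive(URL, "Mon, 31 Dec 2001 23:59:59 GMT")
kb.deduce(FACT, [URL])
kb.purge(datetime(2001, 12, 1, tzinfo=timezone.utc))   # not expired yet: fact kept
kb.purge(datetime(2002, 1, 2, tzinfo=timezone.utc))    # expired, no fresh copy: fact cancelled
print(FACT in kb.derived)
```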

  28. Some design facts • Expiry (cont.) • Another technique: • make any looseness that exists in the real system visible. Instead of saying • “Any employee of any member organization of W3C may register.” • you say formally to the registration engine • “Any person who was sometime in the last 2 months an employee of an organization which was sometime in the last 2 months a W3C member may register.” • In other words, if an organization were to drop its membership, the system doesn’t have to propagate that information instantly.
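
The looser registration rule can be checked without instant propagation of membership changes. In this sketch "the last 2 months" is approximated as 60 days, and the people, organizations and dates are hypothetical. Each interval is a (start, end) pair, with `end=None` meaning still ongoing.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=60)   # "the last 2 months", approximated as 60 days

def held_within(intervals, now, window=WINDOW):
    """True if some (start, end) interval overlaps [now - window, now]."""
    earliest = now - window
    return any(start <= now and (end is None or end >= earliest)
               for start, end in intervals)

def may_register(person, now, employment, membership):
    """Employee sometime in the window of an org that was a member sometime in the window."""
    return any(held_within(jobs, now) and held_within(membership.get(org, []), now)
               for org, jobs in employment.get(person, {}).items())

employment = {"alice": {"ex:orgX": [(date(2000, 1, 1), None)]}}
membership = {"ex:orgX": [(date(1999, 1, 1), date(2001, 11, 15))]}  # orgX leaves W3C

print(may_register("alice", date(2001, 12, 20), employment, membership))  # within 2 months
print(may_register("alice", date(2002, 2, 1), employment, membership))    # window has passed
```

When orgX drops its membership, nobody has to update the registration engine at once: the rule itself tolerates up to two months of staleness.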

  29. A sampling of companies developing tools and applications for the SW

  30. Conclusion • The WWW is an information resource with virtually unlimited potential. • However, this potential remains relatively untapped because it is difficult for machines to process and integrate this information meaningfully. • Solution: the Semantic Web • Human-understandable content is structured in such a way as to make it machine-processable. • Key components: XML, RDF, ontologies, agents

  31. Further information • World Wide Web Consortium (W3C): www.w3.org • W3C Semantic Web Activity: www.w3.org/2001/sw • An introduction to ontologies: www.SemanticWeb.org/knowmarkup.html • Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ): www.cs.umd.edu/projects/plus/SHOE/faq.html • DARPA Agent Markup Language (DAML) home page: www.daml.org

  32. HAPPY NEW YEAR
