information retrieval

information retrieval. Mon Jan 26 2015. data… framework for today’s lecture… STRUCTURED vs unstructured data: easy to envision structured data in terms of “tables” (Employee, Manager, Salary: Smith, Jones, 50000; Chang, Smith, 60000; Ivy, Smith, 50000).

ellisonj

Presentation Transcript


  1. information retrieval mon jan 26 2015 data…

  2. framework for today’s lecture…

  3. STRUCTURED vs unstructured data
     easy to envision structured data in terms of “tables”

     Employee   Manager   Salary
     Smith      Jones     50000
     Chang      Smith     60000
     Ivy        Smith     50000

     Typically allows numerical range and exact match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.
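
The query on this slide can be sketched in a few lines of Python. This is only an illustration: the list-of-dicts layout and the `structured_query` helper are assumptions, not part of the lecture, and a real relational database would express the same thing in SQL.

```python
# Hypothetical in-memory copy of the slide's Employee table.
employees = [
    {"employee": "Smith", "manager": "Jones", "salary": 50000},
    {"employee": "Chang", "manager": "Smith", "salary": 60000},
    {"employee": "Ivy", "manager": "Smith", "salary": 50000},
]


def structured_query(rows, max_salary, manager):
    """Numerical range test AND exact text match, as on the slide."""
    return [
        row for row in rows
        if row["salary"] < max_salary and row["manager"] == manager
    ]


# Salary < 60000 AND Manager = Smith
matches = structured_query(employees, 60000, "Smith")
print([row["employee"] for row in matches])
```

Chang is excluded because 60000 is not strictly less than 60000, and Smith is excluded because Smith's manager is Jones.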

  4. tables in an MS Access relational database – each table defines one part of the data behind a social networking site

  5. Data entry form in an MS Access relational database – used to create each record

  6. structured vs UNSTRUCTURED data
     • typically refers to free text
     • email is a good example of unstructured data. it's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured
     • other examples of unstructured data include books, documents, medical records, and social media posts
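
The email example can be made concrete with Python's standard `email` package. This is a minimal sketch: the message text and addresses below are invented for illustration. The header fields come back as named values, while the body is one undifferentiated string.

```python
from email import message_from_string

# Hypothetical message; the addresses and wording are made up.
raw = """From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday
Date: Mon, 26 Jan 2015 09:30:00 -0500

Hi Bob, are you free for lunch on Friday? I found a new Thai place.
"""

msg = message_from_string(raw)

# Structured part: named fields that support exact-match lookups.
structured = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# Unstructured part: free text with no fields inside it.
body = msg.get_payload()

print(structured)
print(body)
```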

  7. magazine article is an example of unstructured data

  8. [Diagram] Document collection (corpus) → Representation function → Index (CATEGORIES, SUBJECT HEADINGS); Query → Representation function; Matching function compares the query representation with the Index → Results
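
One way to read this diagram is as a tiny inverted-index search. The sketch below is an assumption about how the boxes fit together, not the lecture's own implementation: `represent`, `build_index`, and `match` are hypothetical names, the three-document corpus is made up, and the matching rule (every query term must appear) is just one simple choice.

```python
import re
from collections import defaultdict


def represent(text):
    """Representation function: lowercase the text and keep its distinct terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(corpus):
    """Index: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in represent(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Matching function: ids of documents containing every query term."""
    terms = represent(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical document collection (corpus).
corpus = {
    1: "Metadata is data about data",
    2: "Facets are a kind of metadata",
    3: "Email bodies are unstructured data",
}
index = build_index(corpus)
results = match(index, "metadata data")
print(results)
```

The same representation function is applied to documents and to the query, which is why the diagram shows it twice.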

  9. KWIC Key word in context

  10. KWIC Key word in context
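
A KWIC display can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and a fixed window of neighboring words; the `kwic` function and the sample sentence are hypothetical.

```python
def kwic(text, keyword, width=2):
    """Return (left context, key word, right context) for each occurrence."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines


text = "Metadata is data about data. Good metadata makes retrieval easier."
for left, word, right in kwic(text, "metadata"):
    # Right-align the left context so the key words line up in a column.
    print(f"{left:>12} | {word} | {right}")
```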

  11. metadata

  12. What is Metadata?
      • Classic definition: data about data
      • Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. (NISO)
      • 3 primary “types”:
        • Descriptive
        • Structural
        • Administrative (rights management, preservation)

  13. digital forensics

  14. The article was about a court case in which a judge ruled that the NSA's collection of metadata related to Americans' phone calls (their length, who they were to/from, how often they occurred) could very well be unconstitutional, despite the argument of the defendants of the NSA program--that collecting metadata was not akin to recording phone calls. But the judge's ruling clearly demonstrates that, today, metadata can tell us worlds of information. What's even more interesting to think about, and what the article also addresses, is that the power of metadata has increased partly because of the increasing extent to which technology is incorporated into our lives. Our use of technology leaves a sort of digital footprint, and as technology has become even more prevalent for us, there is more and more metadata about how we are using technology. That metadata, in turn, can tell others a great deal about ourselves. -Emma

  15. Google has millions and millions of web crawlers, robots that “crawl” around the “web” and gather new sites and archive them in the Google sphere. The way they do this is by collecting and creating metadata about the sites they visit. Google proceeds to use this data in its ranking algorithms, defining what gets precedence even in the most basic searches. We know Google accepts money from companies to place their sites in the advertised content above your relevance search results. Even though the ads are slightly divided from the other results, they’re still coming up in the same search. Google changes its algorithms relatively regularly, and who’s to say that they’re not weighting money givers higher in the rankings? -Ryan

  16. More Metadata: A Cataloging Record http://search.lib.unc.edu/search?R=UNCb7097376

  17. The Idea of Facets
      • Facets are a way of labeling data
        • A kind of Metadata (data about data)
        • Can be thought of as properties of items
      • Facets vs. Categories
        • Items are placed INTO a category system
        • Multiple facet labels are ASSIGNED TO items

  18. Facets
      Epicurious example http://www.epicurious.com/
      • Create INDEPENDENT categories (facets)
        • Each facet has labels (sometimes arranged in a hierarchy)
      • Assign labels from the facets to every item
      • Example: recipe collection
        Facets: Ingredient, Cooking Method, Course, Cuisine
        Labels shown: Chicken, Bell Pepper, Stir-fry, Curry, Main Course, Thai

  19. The Idea of Facets
      • Break out all the important concepts into their own facets
      • Sometimes the facets are hierarchical
        • Assign labels to items from any level of the hierarchy

      Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
      Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
      Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

  20. Using Facets
      • Now there are multiple ways to get to each item

      Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
      Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
      Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

      Paths shown:
      Fruit > Pineapple
      Dessert > Cake
      Preparation > Bake
      Dessert > Dairy > Sherbet
      Fruit > Berries > Strawberries
      Preparation > Freeze
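
The idea that there are "multiple ways to get to each item" can be sketched with a small faceted lookup. Everything here beyond the label paths is an assumption: the two recipe names are invented, grouping the six paths into two items is a guess at what the slide illustrates, and `has_label` and `browse` are hypothetical helpers.

```python
# Hypothetical items, each ASSIGNED several facet labels (paths from the slide).
items = {
    "Pineapple upside-down cake": {
        "Fruit > Pineapple",
        "Dessert > Cake",
        "Preparation > Bake",
    },
    "Strawberry sherbet": {
        "Dessert > Dairy > Sherbet",
        "Fruit > Berries > Strawberries",
        "Preparation > Freeze",
    },
}


def has_label(labels, wanted):
    """True if an item carries the wanted label or any label beneath it."""
    return any(
        label == wanted or label.startswith(wanted + " > ")
        for label in labels
    )


def browse(items, *wanted):
    """Names of items matching every requested facet label."""
    return sorted(
        name for name, labels in items.items()
        if all(has_label(labels, w) for w in wanted)
    )


print(browse(items, "Fruit"))
print(browse(items, "Preparation > Freeze"))
print(browse(items, "Fruit > Berries", "Dessert > Dairy"))
```

Because the facets are independent, the sherbet can be reached by starting from Fruit, from Dessert, or from Preparation, in any order.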

  21. labor intensive? expensive?

  22. UNC Libraries Online Catalog http://www.lib.unc.edu/ e.g. personal crisis

  23. caveat: semi-structured data
      • in fact almost no data is absolutely “unstructured”
      • e.g., this slide has distinctly identified zones such as the title and bullets
      • facilitates “semi-structured” search such as
        • title contains data and bullets contain structure
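
The zone query on this slide can be sketched as below. This is an illustration under stated assumptions: slides are modeled as dicts with `title` and `bullets` zones, `zone_search` is a hypothetical helper, and matching is a plain case-insensitive substring test (so "unstructured" counts as containing "structure").

```python
# Hypothetical zone representation of two slides from this deck.
slides = [
    {
        "title": "caveat: semi-structured data",
        "bullets": [
            "in fact almost no data is absolutely unstructured",
            "this slide has distinctly identified zones",
        ],
    },
    {
        "title": "KWIC",
        "bullets": ["Key word in context"],
    },
]


def zone_search(slides, title_term, bullet_term):
    """Titles of slides whose title contains one term AND whose bullets contain another."""
    hits = []
    for slide in slides:
        in_title = title_term.lower() in slide["title"].lower()
        in_bullets = any(
            bullet_term.lower() in bullet.lower() for bullet in slide["bullets"]
        )
        if in_title and in_bullets:
            hits.append(slide["title"])
    return hits


# title contains "data" AND bullets contain "structure"
print(zone_search(slides, "data", "structure"))
```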

  24. Let’s look at a database of magazine & journal articles… Academic Search Complete
      >> UNC Libraries Homepage: http://www.lib.unc.edu/
      >> E-Research Tools
      >> Frequently Used
      >> Academic Search Complete [off-campus log in with onyen/password]

  25. Organization / Search
      • We organize to enable retrieval
      • The more effort we put into organizing information, the more effectively it can be retrieved
      • The more effort we put into retrieving information, the less it needs to be organized first
      • We need to think in terms of investment, allocation of costs and benefits between the organizer and retriever
      • The allocation differs according to the relationship between them; who does the work and who gets the benefit?
