
Information integration, life-cycle and visualization

Information integration, life-cycle and visualization. Peter Fox Xinformatics – Week 8, March 20, 2012. Contents. Review of last class, reading Information integration Information life-cycle Information visualization Projects? Next classes. Information integration.


Presentation Transcript


  1. Information integration, life-cycle and visualization Peter Fox Xinformatics – Week 8, March 20, 2012

  2. Contents • Review of last class, reading • Information integration • Information life-cycle • Information visualization • Projects? • Next classes

  3. Information integration • Involves combining information residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations both commercial (e.g. when two similar companies need to merge their databases) and scientific (e.g. combining research results from different bioinformatics repositories). • Integration appears with increasing frequency as the volume and the need to share existing information explodes.

  4. Information integration • It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. • In management circles, people frequently refer to data integration as "Enterprise Information Integration" (EII) (source: Wikipedia) • Is this an information management challenge? (rhetorical question)

  5. An example - Geospatial • Much of the work on information integration has focused on the dynamic integration of structured data sources, such as databases or XML data. • With the more complex geospatial data types, such as imagery, maps, and vector data, researchers have focused on the integration of specific types of information, such as placing points or vectors on maps, but much of this integration is only partially automated. • The challenge is that the dynamic integration of online data and geospatial data is beyond the state of the art of existing integration systems.

  6. Geospatial • The conflation process divides into the following tasks: (1) find a set of conjugate point pairs, termed "control point pairs", in both vector and image datasets, (2) filter control point pairs, and (3) utilize algorithms, such as triangulation and rubber-sheeting, to align the rest of the points and lines in the two datasets using the control point pairs. • Typically, human input has been essential to find and/or filter control point pairs
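The three conflation tasks above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any particular system: the control point pairs are hard-coded rather than detected, filtering uses a simple median-displacement test, and alignment uses inverse-distance weighting as a simplified stand-in for triangulation-based rubber-sheeting. The function names, the tolerance and the coordinates are all hypothetical.

```python
from statistics import median

def filter_pairs(pairs, tol=2.0):
    """Task 2: drop control point pairs whose displacement is far from the median."""
    dxs = [ix - vx for (vx, vy), (ix, iy) in pairs]
    dys = [iy - vy for (vx, vy), (ix, iy) in pairs]
    mdx, mdy = median(dxs), median(dys)
    return [p for p, dx, dy in zip(pairs, dxs, dys)
            if abs(dx - mdx) <= tol and abs(dy - mdy) <= tol]

def align_point(pt, pairs, power=2):
    """Task 3: shift a vector point by the inverse-distance-weighted
    displacement of the control point pairs (a simplified rubber-sheeting)."""
    x, y = pt
    wsum = sx = sy = 0.0
    for (vx, vy), (ix, iy) in pairs:
        d2 = (x - vx) ** 2 + (y - vy) ** 2
        if d2 == 0:
            return (ix, iy)  # the point is itself a control point
        w = 1.0 / d2 ** (power / 2)
        wsum += w
        sx += w * (ix - vx)
        sy += w * (iy - vy)
    return (x + sx / wsum, y + sy / wsum)

# Task 1 is assumed done: (vector point, image point) pairs, made up for this sketch.
control_pairs = [
    ((0.0, 0.0), (1.0, 1.0)),
    ((10.0, 0.0), (11.0, 1.0)),
    ((0.0, 10.0), (1.0, 11.0)),
    ((10.0, 10.0), (30.0, 40.0)),  # a bad match that the filter should reject
]
good_pairs = filter_pairs(control_pairs)
aligned = align_point((5.0, 5.0), good_pairs)
```

In this toy case the filter stands in for the human reviewer: it rejects the one pair whose displacement disagrees with the others, and the remaining pairs shift every other point by roughly one unit in each direction.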

  7. Vectors on maps

  8. Recall elements/ forms of information • Structured/ un-structured, content, context • Presentation and organization • Syntax-semantics-pragmatics • Managed, designed and architected. • Integration poses an important challenge here • Two forms presented/ organized differently • Different structure, semantics… • Information back to data back to information

  9. Aiding integration • Usually an integration capability is HIGHLY curated or left entirely to the end user • If left to the user, the result is a new product which must also be managed and shared • “I can’t integrate what I don’t understand” • Key idea: provide for integratability !!! • Standards – formats for sure but also • Metadata • Semantics
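To make "provide for integratability" concrete, here is a minimal sketch in which two sources report the same quantity under different field names and units, and a small metadata record per source is what makes a unified view possible. The source names, field names and values are invented for illustration.

```python
# Hypothetical records from two sources that describe stations differently.
source_a = [{"station": "S1", "temp_c": 20.0}, {"station": "S2", "temp_c": 25.0}]
source_b = [{"site_id": "S3", "temperature_f": 212.0}]

# Metadata for each source: which field holds what, and in which unit.
metadata = {
    "a": {"id_field": "station", "value_field": "temp_c", "unit": "degC"},
    "b": {"id_field": "site_id", "value_field": "temperature_f", "unit": "degF"},
}

def to_celsius(value, unit):
    """Convert a temperature to degrees Celsius based on its declared unit."""
    if unit == "degC":
        return value
    if unit == "degF":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"cannot integrate unknown unit: {unit}")

def unify(records, meta, source_name):
    """Map one source's records onto the shared structure, keeping the origin."""
    return [
        {
            "id": r[meta["id_field"]],
            "temperature_degC": to_celsius(r[meta["value_field"]], meta["unit"]),
            "source": source_name,
        }
        for r in records
    ]

unified = unify(source_a, metadata["a"], "a") + unify(source_b, metadata["b"], "b")
```

Without the unit entry in the metadata, the value 212.0 could not be combined safely with the others, which is the point of the quote above. The unified list is also a new product that carries its own `source` field so it can be managed and shared.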

  10. Different contexts? • Relies especially on structural/ use metadata • Provide different means/mode for integration • E.g. geospatial, uses … well ‘space’, really surfaces (latitude, longitude) • Geological data integration uses time and feature (of interest) – why? Yes, things move • Atmospheric science, e.g. chemistry or structure of the atmosphere may use ‘layers’ or pressure as an indicator for position • Comparing in-situ with remotely sensed information in many fields, e.g. medicine
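As a sketch of integrating on a context-specific coordinate, the example below matches in-situ measurements to the nearest layer of a layered product, using pressure as the indicator of position. The pressure levels, the measurement values and the function name are assumptions made for illustration.

```python
from bisect import bisect_left

def nearest_index(sorted_levels, value):
    """Return the index of the level closest to value (levels sorted ascending)."""
    i = bisect_left(sorted_levels, value)
    if i == 0:
        return 0
    if i == len(sorted_levels):
        return len(sorted_levels) - 1
    before, after = sorted_levels[i - 1], sorted_levels[i]
    return i if after - value < value - before else i - 1

# Hypothetical layer pressures (hPa) of a remotely sensed product.
layer_pressures = [100.0, 300.0, 500.0, 850.0, 1000.0]

# Hypothetical in-situ samples: (pressure in hPa, measured value).
in_situ = [(480.0, 0.05), (910.0, 0.03)]

# Pair each in-situ sample with the pressure of its nearest layer.
matched = [
    (p, v, layer_pressures[nearest_index(layer_pressures, p)])
    for p, v in in_situ
]
```

The same pattern applies with time or latitude/longitude as the shared coordinate; what changes between fields is which coordinate is meaningful to match on.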

  11. Informatics considerations • Be aware of which means of integration are, and can be, used • More often than not, this is what leads to new findings, and abductive reasoning… one of our goals • Exercise – how does integration occur for the other aspects of information?

  12. Life cycle - definitions • Life-cycle elements • Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction) • Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future (http://www.dcc.ac.uk/FAQs/data-curator) • Preservation: Process of retaining usability of data in some source form for intended and unintended use • Stewardship: Process of maintaining integrity across acquisition, curation and preservation

  13. Definitions ctd. • Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

  14. The nature of the challenge • To architect information systems today • You may play many roles • You may not get all the metadata or information you need even if you get the data • You will need skills that you were not taught • To work with end-users today • You may have lots of technical experience • You will need new skills in addressing the changing use of data and information • One ‘size’ does not fit all

  15. Many views of the Information life-cycle

  16. Acquisition • Learn / read what you can about the developer of the means of acquisition • Documents may not be easy to find • Remember bias!!! • Document things as you go • Have a checklist (the Management list) and review it often

  17. Curation • From producer to consumer! • Consider the organization and presentation of the data • Design factors to reduce uncertainty • Making use of semiotics • Document what has been (and not) done • Address the provenance to date, you are now THE next person • Technology-neutrality!! • Look to add metainformation
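One way to "document what has been (and not) done" is a small provenance log kept next to the data in a plain-text, technology-neutral form. The sketch below assumes a simple list of steps serialized as JSON; the field names and the example steps are hypothetical and do not follow any particular provenance standard.

```python
import json
from datetime import datetime, timezone

def add_step(provenance, actor, action, not_done=None):
    """Append one provenance step, including what was deliberately not done."""
    provenance.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "not_done": not_done or [],
    })
    return provenance

provenance = []
add_step(provenance, "producer", "recorded raw measurements")
add_step(provenance, "curator", "converted units to SI",
         not_done=["no gap filling applied"])

# Plain-text JSON that can travel with the data to the next person.
provenance_json = json.dumps(provenance, indent=2)
```

Each new curator appends a step rather than rewriting earlier ones, so the chain from producer to consumer stays visible.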

  18. Preservation • Archiving is a component • Intent is that ‘you can open it any time in the future’ and that ‘it will be there’ • This involves steps that may not be conventionally thought of • Think 10, 20, 50, 200 years…. looking historically gives some guide to future considerations

  19. Remember • The life cycle applies within and before and after your use case… • Stewardship is the act of preservation • So, let’s look in a little more detail

  20. How the information is created • Systemic • Environmental • Trial-and-error (or ad-hoc)

  21. How is information delivered? • One-to-many presentation • White paper (a document) • Web site FAQ • Web site informational • Web site directed (link sent with e-mail, and so on) to a specific Web site • Application-based delivery via managed expert system • One-to-one presentation: • Word of mouth • Ad-hoc communication

  22. How the information is managed • Complexity of the information • Complexity of the creation process • Complexity of the management system • Financial impact of IP/IC creation

  23. Type of information created • Tacit (created and stored informally): • Human memory • Localized, e.g. hard drive of the computer • Movement of tacit information into a formalized structure • Explicit (created and stored formally): • Network shared • Network Web site/intranet • Informal knowledge-management system • Document-management system • Formal KM system

  24. For information creation: • Consider the • Value of the source • Age of the information • Proximity of the information to the consumer • Source of the information, and previous interactions with that specific source • Re-creation??

  25. Value of the source • Age of the information • Proximity of the information to the consumer • Source of the information, and previous interactions with that specific source

  26. Mostly Technical Issues • Data Preservation • Bit-level integrity • Data readability • Documentation • Metadata • Semantics • Persistent Identifiers • Virtual Data Products • Lineage Persistence • Required ancillary data • Applicable standards
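Bit-level integrity is commonly checked by recording a checksum when data is archived and recomputing it later. The sketch below uses SHA-256 from Python's standard `hashlib`; the sample bytes are made up, and a real archive would read files and store the checksum alongside them.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical archived content and the checksum recorded at archive time.
original = b"temperature,20.0\n"
recorded = checksum(original)

# Later: recompute and compare to confirm nothing changed.
intact = checksum(original) == recorded

# Simulate a single flipped bit and confirm that the check notices it.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
damage_detected = checksum(corrupted) != recorded
```

A matching checksum shows only that the bits are unchanged. Readability, documentation and the other items on this slide still have to be handled separately.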

  27. Mostly Non-Technical Issues • Policy (constrained by money…) • Front end of the lifecycle • Long-term planning, data formats, documentation... • Governance and policy • Legal requirements • Archive to archive transitions • Money (intertwined with policy) • Cost-benefit trades • Long-term needs of programs • User input • Identifying likely users • Levels of service • Funding source and mechanism

  28. Life cycle is a complex issue • Must be managed • Documented • As part of the use case, but also outside it

  29. Information Visualization • Defn: "to form a mental vision, image, or picture of (something not visible or present to sight, or of an abstraction); to make visible to the mind or imagination" [The Oxford English Dictionary, 1989] • Questions to keep in mind • What is the improvement in the understanding as compared to the situation without visualization? • Which visualization techniques are suitable for one's information?

  30. Why visualization? • Reducing amount of data, quantization • Patterns • Features • Events • Trends • Irregularities • Exit points for analysis • Leading to presentation of data • Recall – cognitive science and the mental representation??!!??
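As a small illustration of "reducing amount of data, quantization", the sketch below maps continuous values onto a handful of discrete levels, much as a color scale with a few classes would. The function name, the sample values and the choice of equal-width bins are assumptions for this example.

```python
def quantize(values, n_bins):
    """Map each value to an equal-width bin index in the range 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        b = int((v - lo) / width) if width else 0
        bins.append(min(b, n_bins - 1))  # the maximum falls in the last bin
    return bins

samples = [0.0, 1.0, 2.5, 5.0, 7.5, 10.0]
levels = quantize(samples, 4)
# levels == [0, 0, 1, 2, 3, 3]
```

Six distinct values become four levels, which is easier to scan for patterns or irregularities but hides the difference between, say, 0.0 and 1.0. That trade-off is worth stating whenever a quantized view is presented.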

  31. Types of visualization • Color coding (including false color) • Classification of techniques is based on • Dimensionality • Information being sought, i.e. purpose • Line plots • Contours • Surface rendering techniques • Volume rendering techniques • Animation techniques • Non-realistic, including ‘cartoon/ artist’ style

  32. Visualization formats • Many – vector, raster (image), animation, multi-dimensional,

  33. However, information cf. data.. • Think back to your presentations on semiotics and the visual representations of information systems – both good and bad • Not just a matter of the ‘producer’ view… consider the ‘consumer’ view, i.e. what is the goal of the visualization? • This is a time when • Experience helps a lot • But so does listening and gaining external feedback

  34. Remember - metadata • Many of these formats already contain metadata or fields for metadata, use them!
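For example, SVG defines `title` and `desc` elements for describing a graphic. The sketch below writes a tiny SVG with both fields filled in and then reads them back with Python's standard `xml.etree.ElementTree`; the title, the description text and the drawing itself are invented for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without a namespace prefix

# Build a minimal graphic that carries its own description.
svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "100", "height": "50"})
ET.SubElement(svg, f"{{{SVG_NS}}}title").text = "Hypothetical temperature trend"
ET.SubElement(svg, f"{{{SVG_NS}}}desc").text = (
    "Source: example data; purpose: class demonstration"
)
ET.SubElement(svg, f"{{{SVG_NS}}}line",
              {"x1": "0", "y1": "40", "x2": "100", "y2": "10", "stroke": "black"})

svg_text = ET.tostring(svg, encoding="unicode")

# A program, not only a person, can recover the description later.
parsed = ET.fromstring(svg_text)
title = parsed.find(f"{{{SVG_NS}}}title").text
desc = parsed.find(f"{{{SVG_NS}}}desc").text
```

Because the description travels inside the file, the origin and purpose of the graphic survive copying and renaming.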

  35. New modes • http://www.visualizing.org/ • http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/ • Many modes: • http://www.siggraph.org/education/materials/HyperVis/domik/folien.html

  36. visualizing.org

  37. Visualization

  38. Managing visualization products • The importance of a ‘self-describing’ product • Visualization products are not just consumed by people • How many images, graphics files do you have on your computer for which the origin, purpose, use is still known? • How are these logically organized?

  39. Discovery of visualizations • We covered some of this in the last class…. • When represented as images: • Image-based type free text search? • Referred to in publications (articles, books, web pages) • Vector graphics: • Postscript or PDF • SVG • Others?

  40. Discussion • About integration? • About information life-cycle in general? • About visualization? • Degree to which these topics are part of your projects?

  41. Reading for this week • Is retrospective and covers the four topic areas • Information Integration • Information Life Cycle • Information Visualization • Information model development and visualization

  42. What is next • Week 9: • Information Management, Workflow, and Discovery • Project definition check-in/ discussion
