Introduction to DDI 3.0
Sanda Ionescu, ICPSR
CESSDA Expert Seminar, September 2007
DDI Version 3.0
• Radically different.
• More complex… (…but certainly doable!)
• Brings important benefits.

Workshop Schedule
14:30 – 15:10  Overview (40 min)
15:10 – 15:35  Structure and Technical Mechanisms (25 min)
15:35 – 15:45  Break (10 min)
15:45 – 16:10  Study Unit – Modules Content (25 min)
16:10 – 16:30  Variable Markup Example (20 min)
16:30 – 16:40  Break (10 min)
16:40 – 17:10  Grouping – Modules Content and Examples (30 min)
17:10 – 17:30  Getting Started (20 min)
DDI 3.0 Overview
DDI Background: Development History
• 1995 – A grant-funded project initiated and organized by ICPSR proposes a new standard for documenting social science data, to replace OSIRIS tagged codebooks.
• First drafts used SGML; they were later converted to Web-friendly XML.
• 2000 – DDI Version 1.0 published as a mainly document- and codebook-centric standard.
• 2003 – DDI Version 2.0 published with extended scope:
  - Aggregate data coverage (based on matrix structure)
  - Additional geographic representation to assist geographic search systems and GIS users
• Versions 1.0 through 2.1 (latest published) are backwards compatible and based on the same structure.
• February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification. http://www.ddialliance.org/

DDI Background: Development History of Version 3.0
• 2004-2006: Planning and Development
• November 2006: Internal Review
• February 2007: Public Review
• July 2007: Candidate Draft Release
http://www.ddialliance.org/ddi3/index.html
Benefits of using DDI as an XML-based standard
• Interoperability: enables seamless exchange and reuse by other systems.
• Repurposing: provides a core document from which different types of outputs can be generated.
• Value-added documentation: tagging carries “intelligence” in the document by describing content.
• Enhanced data discovery: increases precision and granularity of searches.
• Support for data analysis: variable descriptions are accepted as input by online analysis systems.
• Multiple presentation formats: ASCII text, PDF, HTML, RTF.
• Preservation-friendly: non-proprietary format.

Why DDI 3.0?
DDI 3.0 presents new features in response to:
• Perceived needs of:
  - Data users
  - Data producers
  - Data archivists / librarians
• Developments in documenting and archiving data
• Advances in XML technology

DDI 3.0 and the Data Life Cycle Model
DDI Versions 1/2 were codebook-centric:
• Closely followed the structure of traditional print codebooks.
• Captured data documentation at a single, “frozen” point in time: archiving.
Version 3.0 is life cycle oriented:
• Designed to cover all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use.
Life Cycle Coverage in DDI 3.0 • Planning for the Study: Proposal / Design Study Purpose / Outline Concepts Study Population Author(s) Funding Sources Version 3.1 Survey / Sample Design Pre-testing
Life Cycle Coverage in DDI 3.0: Proposal becomes reality…
• Data collection methodology: sampling, time, etc.
• Instrument characteristics
• Questionnaire
• Data cleaning, weighting, coding, etc.

Life Cycle Coverage in DDI 3.0: Publishing the data…
• Physical representation: data format, record structure, statistics.
• Intellectual content: variables, categories, codes.

Life Cycle Coverage in DDI 3.0: Archiving / (re)distributing the data collection…
• Processing checks
• Holdings, availability and access conditions

Life Cycle Coverage in DDI 3.0: DDI becomes “visible” to the outside world…
DDI Instance:
• Pulls together all life cycle stages
• Acquires its own identity as an object
• Becomes a tool for data discovery and analysis

Life Cycle Coverage in DDI 3.0: Secondary use of data – new conceptual framework…
New DDI Instance:
• New purpose
• New logical product
• New physical description of data
DDI 3.0 and the Data Life Cycle Model
Advantages of life cycle orientation:
• Allows capture and preservation of metadata generated by different agents at different points in time.
• Facilitates tracking changes and updates in both data and documentation.
• Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability.
• Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.
New / Extended Functionalities in DDI 3.0: Questionnaire
Versions 1/2:
• No instrument coverage.
• Question text only as part of variable description.
• No documentation for question flow / conditions.
Version 3.0:
• Full description of instrument as a separate entity.
• Documents specific use of questions: flow, conditions, loops.
• Compatible with Computer Assisted Interviewing software.

New / Extended Functionalities in DDI 3.0: Complex Data
Versions 1/2:
• Inadequate representation of complex / hierarchical data
Version 3.0:
• Detailed documentation for complex / hierarchical data
  - Logical structure of records: record types and relationships; relevant variables (key-link, case identification, record type locator)
  - Physical layout of records: single “hierarchical” file for all records, multiple rectangular files, relational database, etc.
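The record-level notions named above (record types, a record type locator, and a key-link variable) can be illustrated with a small Python sketch. The file layout, column positions, and field names below are invented for illustration; they are not taken from the DDI 3.0 specification or from any real data file.

```python
from collections import defaultdict

# Hypothetical single "hierarchical" file: household (H) and person (P) records
# are interleaved. Column 0 is the record type locator; columns 1-3 hold the
# household ID, which serves as the key-link variable between record types.
RAW = """\
H001 urban
P001 01 34
P001 02 31
H002 rural
P002 01 58
"""


def split_hierarchical(text):
    """Group person records under their household using the key-link variable."""
    households = {}
    persons = defaultdict(list)
    for line in text.splitlines():
        record_type, household_id, rest = line[0], line[1:4], line[4:].split()
        if record_type == "H":
            households[household_id] = {"area": rest[0]}
        elif record_type == "P":
            persons[household_id].append({"person_no": rest[0], "age": int(rest[1])})
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    # Attach each household's person records (an empty list if it has none).
    return {hid: {**info, "persons": persons[hid]} for hid, info in households.items()}


data = split_hierarchical(RAW)
for household_id, household in data.items():
    print(household_id, household["area"], len(household["persons"]))
```

A program can only do this split if the documentation states where the record type locator and the key-link variable sit, which is the kind of information the Version 3.0 bullets describe.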
New / Extended Functionalities in DDI 3.0: Aggregate Data
Versions 1/2:
• Initially designed for microdata only
• Aggregate data section added in V 2.1 to support limited representation (Census-type data, delimited files)
Version 3.0:
• Adds support for tabular, spreadsheet-type representation of aggregate data
• Aggregate data transport option: cell content may be included in-line with the data item description

New / Extended Functionalities in DDI 3.0: Data Transport
Versions 1/2:
- None
Version 3.0:
- In-line inclusion enabled for both aggregate data and microdata

New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability
Versions 1/2:
- None
Version 3.0:
- Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability
New / Extended Functionalities in DDI 3.0: Increased Multilingual Support
Versions 1/2:
• Limited: <anytag xml:lang="">
Version 3.0:
• Support for multiple language use and translations:
  <InternationalStringType xml:lang="" translated="" translatable="">
• Example:
  <Variable>
    <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
    <Label xml:lang="eng" translated="true">Year of Birth</Label>
  </Variable>
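As a rough illustration of how software might use these attributes, the Python sketch below parses the Variable fragment from this slide with the standard library and picks a label by language. The helper names label_for and source_label are hypothetical, and the fragment omits the namespaces a full DDI 3.0 instance would declare.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Fragment copied from the slide, without the namespaces of a real DDI instance.
FRAGMENT = """
<Variable>
  <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
  <Label xml:lang="eng" translated="true">Year of Birth</Label>
</Variable>
"""


def label_for(variable, lang, fallback=None):
    """Return the label text in `lang`, else in `fallback`, else None."""
    labels = {el.get(XML_LANG): (el.text or "").strip() for el in variable.findall("Label")}
    if lang in labels:
        return labels[lang]
    return labels.get(fallback)


def source_label(variable):
    """Return the text of the first label marked as not translated (the original wording)."""
    for el in variable.findall("Label"):
        if el.get("translated") == "false":
            return (el.text or "").strip()
    return None


variable = ET.fromstring(FRAGMENT)
print(label_for(variable, "eng"))
print(label_for(variable, "fre", fallback="ger"))
print(source_label(variable))
```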
DDI 3.0 Specification: Schema-based
Versions 1/2:
• DTD-based
Version 3.0:
• Schema-based:
  - Data typing supports machine actionability
  - Use of namespaces supports modularity, extensibility and reuse, and alignment with / use of other standards

DDI 3.0 Specification: Machine-actionable
Versions 1/2:
• Machine-readable
Version 3.0:
• Machine-actionable:
  1. Data typing: increased use of controlled vocabularies and standard codes
  2. Larger set of required elements
• Predictable content = a more consistent base for programming
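To make "predictable content" concrete, here is a minimal Python sketch of the checks that typed values, controlled vocabularies, and required elements make possible. The field names and the vocabulary terms are assumptions chosen for illustration; they are not the actual DDI 3.0 element names or code lists.

```python
from datetime import date

# Hypothetical controlled vocabulary; the real DDI 3.0 lists may differ.
TIME_METHODS = {"CrossSection", "Longitudinal", "TimeSeries"}

# Hypothetical set of required elements for a study description.
REQUIRED = ("title", "time_method", "start_date")


def validate_study(record):
    """Return a list of problems; an empty list means a program can rely on the record."""
    problems = []
    for name in REQUIRED:
        if name not in record:
            problems.append(f"missing required element: {name}")
    if "time_method" in record and record["time_method"] not in TIME_METHODS:
        problems.append(f"time_method not in controlled vocabulary: {record['time_method']}")
    if "start_date" in record:
        try:
            date.fromisoformat(record["start_date"])  # typed value: ISO 8601 date
        except ValueError:
            problems.append(f"start_date is not an ISO date: {record['start_date']}")
    return problems


good = {"title": "Example Study", "time_method": "Longitudinal", "start_date": "2004-11-02"}
bad = {"title": "Example Study", "time_method": "panel-ish", "start_date": "Nov 2004"}
print(validate_study(good))
print(validate_study(bad))
```

Free-text values such as those in the second record are still machine-readable, but a program cannot act on them without guessing, which is the distinction the slide draws.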
DDI 3.0: Modular Structure
Versions 1/2:
• Single file, hierarchical design
Version 3.0:
• Modular design:
  - Facilitates reuse
  - Facilitates versioning and maintenance
  - Supports life cycle model
  - Allows flexibility in organizing the DDI Instance
  - Supports grouping and comparing studies
  - Supports creation of metadata registries

DDI 3.0: Alignment with other metadata standards
Versions 1/2:
• MARC, Dublin Core (bibliographic standards)
Version 3.0:
• MARC, DC, but also…
• SDMX (Statistical Data and Metadata Exchange)
• ISO 11179 (Metadata Registries)
• FGDC (Digital Geospatial Metadata)
• ISO 19115 (Geographic Information Metadata)
DDI 1/2 or DDI 3.0?
• DDI 3.0 will not supersede DDI 2.1.
• Both versions will:
  - coexist
  - continue to be maintained
  - be used according to specific needs.
• Not all DDI 1/2 markup will have to be migrated to Version 3.0.
DDI 3.0 Structure and Mechanisms
DDI 3.0 – Modular Structure
Building blocks of DDI 3.0:
• Modules
  - Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.)
  - Can live independently (they have their own schemas) or connected to one another within a hierarchical structure.
• Schemes
  - Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.
  - Can live semi-independently (they need a higher-level wrapper, as they do not have their own schemas) or in-line within a Study Unit or Group module.
DDI 3.0 – Modular Structure
DDI 3.0 model = a multi-branched hierarchy
[Diagram, module level. Nodes in source order: DDI Instance; Study Unit, Group, Resource Package; Conceptual Components, Data Collection, Archive; Study Unit, Subgroup; Study Unit, (Sub)group; Organizations; Study Unit, Subgroup.]
[Diagram, within modules. Nodes in source order: Data Collection; Methodology, Question Scheme, Processing; Sampling, Time Method; Question Item, Question Item; Weighting, Coding.]

DDI 3.0 – Modular Structure
Relationships are established through:
• In-line inclusion (relational order is explicit)
• Referencing (relational order is implicit)
  - Internal
  - External
DDI 3.0 – Structural mechanisms
Enable modular design and help actualize its benefits.
• Inheritance
• Referencing
• Identification

DDI 3.0: Inheritance
• Inheritance is based on the hierarchical structure of the model.
• In DDI 3.0 a number of elements are reused at different levels of the hierarchy.
• When the same element is present at multiple levels, lower levels inherit content from the upper levels, and only need to specify differences (= local overrides).

DDI 3.0: Inheritance Example
• Instance: Coverage: Spatial: 50 US states
  - Study Unit A: no Spatial Coverage defined = will be inherited from Instance
  - Study Unit B: Coverage: Spatial: 48 coterminous states = supersedes definition in Instance
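The lookup rule in this example can be sketched in a few lines of Python: walk up the hierarchy until a level defines the value. The Node class and the effective_coverage function are hypothetical names for illustration, not part of any DDI tool or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy (an Instance or a Study Unit in this sketch)."""
    name: str
    spatial_coverage: Optional[str] = None  # None means "not defined at this level"
    parent: Optional["Node"] = None


def effective_coverage(node):
    """Return the nearest spatial coverage defined at this level or above it."""
    current = node
    while current is not None:
        if current.spatial_coverage is not None:
            return current.spatial_coverage  # local definition wins over inherited ones
        current = current.parent
    return None


instance = Node("Instance", spatial_coverage="50 US states")
study_a = Node("Study Unit A", parent=instance)
study_b = Node("Study Unit B", spatial_coverage="48 coterminous states", parent=instance)

print(effective_coverage(study_a))
print(effective_coverage(study_b))
```

Study Unit A resolves to the Instance value, while Study Unit B keeps its local override.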
DDI 3.0: Referencing
• DDI 3.0 modular structure is dependent upon creating relationships by reference.
• Referencing implies bringing up the content of a DDI object within, or in association with, another object, by specifying its Unique Identifier.
• Identifiers are the key links between DDI objects.

DDI 3.0: Referencing Example
Data Collection Module:
  Question Scheme:
    Question:
      ID: "Q1"
      Text: "How many days in the past week did you watch the national network news on TV?"
Conceptual Components Module:
  Concept Scheme:
    Concept:
      ID: "C1"
      Description: "Exposure to national TV news"
Logical Product Module:
  Variable Scheme:
    Variable:
      ID: "V1"
      Name: V043014
      Label: Days past week watch natl news on TV
      Question Reference: ID: "Q1"
      Concept Reference: ID: "C1"
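A minimal Python sketch of resolving these references follows. It uses plain dictionaries keyed by identifier in place of real DDI modules; the structure and the resolve_variable helper are assumptions made for illustration, while the IDs and texts come from the example above.

```python
# Each scheme is modelled as a dictionary keyed by identifier (illustrative only).
question_scheme = {
    "Q1": "How many days in the past week did you watch the national network news on TV?",
}
concept_scheme = {
    "C1": "Exposure to national TV news",
}
variable = {
    "id": "V1",
    "name": "V043014",
    "label": "Days past week watch natl news on TV",
    "question_ref": "Q1",  # reference by ID, not a copy of the question text
    "concept_ref": "C1",
}


def resolve_variable(var, questions, concepts):
    """Return a copy of the variable with its references replaced by the referenced content."""
    for key, scheme in (("question_ref", questions), ("concept_ref", concepts)):
        if var[key] not in scheme:
            raise KeyError(f"dangling reference {var[key]!r} in {key}")
    resolved = {k: v for k, v in var.items() if not k.endswith("_ref")}
    resolved["question"] = questions[var["question_ref"]]
    resolved["concept"] = concepts[var["concept_ref"]]
    return resolved


resolved = resolve_variable(variable, question_scheme, concept_scheme)
print(resolved["concept"])
```

Because the variable stores only identifiers, the same question or concept can be reused by other variables without duplicating its text, and a broken identifier is detected as a dangling reference.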
DDI 3.0: Identification
Consistency in building and using identifiers is needed for:
• Proper functioning of reference systems, enabling a smooth exchange and reuse of existing metadata.
• Machine-actionability of DDI instances, allowing them to serve as a basis for running programs and processes.

DDI 3.0: Identification
Element types used in the Identification system:
• Non-identified elements
• Identifiables
• Versionables
• Maintainables

DDI 3.0: Identification Element Types
Non-identified elements:
• Require context, which is provided by containing parents. Example: codes within code schemes
• Are not reusable. Example: variable and category statistics
Identifiables:
• Carry their own ID
• May be referenced / reused
• Cannot be versioned or maintained, except as part of a complex parent element (Example: Variable – a change implies a new version of the entire scheme).
Versionables:
• Carry their own ID
• Carry their own Version: content changes are important to note (Example: Concept – may be independently versioned within a scheme).
Maintainables:
• Are higher level DDI objects
• Are both identifiable and versionable
• Can also be published and maintained as separate entities (Example: all modules, schemes, comparison maps)
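One way to picture the three identified element types is as a small class hierarchy. The Python sketch below is only an analogy: the class names mirror the slide, but the integer version counter, the agency attribute, and the relabel method are assumptions introduced for illustration and do not reproduce the DDI 3.0 schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Identifiable:
    id: str  # carries its own ID, so it can be referenced and reused


@dataclass
class Versionable(Identifiable):
    version: int = 1  # assumed: a simple counter stands in for a version string


@dataclass
class Maintainable(Versionable):
    agency: str = "example.org"  # assumed attribute naming the maintaining organization


@dataclass
class Variable(Identifiable):
    label: str = ""  # identifiable only: it has no version of its own


@dataclass
class VariableScheme(Maintainable):
    variables: Dict[str, Variable] = field(default_factory=dict)

    def relabel(self, variable_id, new_label):
        """Changing a variable implies a new version of the entire scheme."""
        self.variables[variable_id].label = new_label
        self.version += 1


scheme = VariableScheme(id="VS1")
scheme.variables["V1"] = Variable(id="V1", label="Days watch news")
scheme.relabel("V1", "Days past week watch natl news on TV")
print(scheme.version)
```

The design choice shown here follows the slide's Variable example: the variable can be referenced by its ID, but a change to it is recorded on the scheme that contains it.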