
DSpace Batch Ingest

DSpace Batch Ingest. Prepared by Sarah Kim and Lorrie Dong. Edited by Patricia Galloway. INF 392K, Problems in the Permanent Retention of Electronic Records, Spring 2008, School of Information, University of Texas at Austin.


Presentation Transcript


  1. DSpace Batch Ingest
     Prepared by Sarah Kim and Lorrie Dong
     Edited by Patricia Galloway
     INF 392K, Problems in the Permanent Retention of Electronic Records, Spring 2008
     School of Information, University of Texas at Austin

  2. What is Batch Ingest?
     • General DSpace submission process: single-item submission through a user-friendly web interface
     • What do you do if you have 5,000 items to submit? Batch ingest allows you to ingest as many items as you want, in groups.
     How?
     • Access DSpace through UNIX. Only authorized DSpace administrators can access the School of Information DSpace file store and database directly through UNIX. (Contact: Shane Williams and Sam Burns)
     • Use Linux command lines
     *** Please follow the DSpace batch ingest format rules. See the DSpace simple archive format section in DSpace System Documentation: Application Layer, http://www.dspace.org/

  3. Batch Ingest Process Overview
     Folder layout:
       Collection 1
         Item_001 (Folder)
           contents
           dublin_core.xml
           file1
           file2
       Collection 2
         Item_001 (Folder)
           contents
           dublin_core.xml
           file1
           file2
     Process:
     • Off-line: Prepare item folders and a command line for each collection
     • Upload item folders into DSpace via Linux
     • On-line: Run the Linux command line for each collection
     • Done

  4. Step 1: Creating a structure in DSpace and your file-preparation workspace
     • Structure the community and collections in DSpace; you need collection IDs for the command lines, and DSpace generates them when a collection is created
     • On the computer where you are preparing materials for ingest, organize your files by giving each collection a specifically named directory (or folder)
     • Each item in a collection will be represented as a subdirectory (or folder) in which a specific set of files should be placed (see next slide).

  5. Step 2: Prepare item folders
     Create and organize item folders for each collection. Each item (folder) should contain:
     • contents: a text file listing the names of the files included in the item folder. Do not list the “dublin_core.xml” and “contents” file names themselves. Although this is a plain-text file that you will make with Notepad or another text editor, it must not have the .txt extension; if your editor adds one, rename the file so that it has no extension.
     • dublin_core.xml: a Qualified Dublin Core metadata file that pertains to the entire item
     • file1: original bitstream (there can be several in an item; an item can contain, for example, all the bitstreams that make up a website)
     • file2: any access copy or copies that may be needed to provide access
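The item-folder layout described in this slide can be sketched in shell. This is only an illustration under stated assumptions: the workspace is a temporary directory, and the names `my_collection`, `report.doc`, and `report.pdf` are hypothetical placeholders, not files from the original project. It also assumes one file name per line in `contents`.

```shell
#!/bin/sh
# Sketch: build one item folder and generate its "contents" file.
# All directory and file names below are hypothetical placeholders.
set -eu

workspace=$(mktemp -d)                       # stand-in for your preparation workspace
item_dir="$workspace/my_collection/item_001"
mkdir -p "$item_dir"

# Placeholder bitstreams; in practice these are your original and access copies.
printf 'original\n'    > "$item_dir/report.doc"
printf 'access copy\n' > "$item_dir/report.pdf"

# Placeholder metadata file; real metadata goes between the dublin_core tags.
printf '<dublin_core>\n</dublin_core>\n' > "$item_dir/dublin_core.xml"

# Create "contents" with no extension, then list every other file in the
# folder except dublin_core.xml and contents itself.
: > "$item_dir/contents"
for f in "$item_dir"/*; do
    name=$(basename "$f")
    case "$name" in
        contents|dublin_core.xml) continue ;;
    esac
    printf '%s\n' "$name" >> "$item_dir/contents"
done

cat "$item_dir/contents"
```

Generating `contents` from the folder itself, rather than typing it by hand, avoids a mismatch between the list and the files actually present.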

  6. Example: item folder
     *** Item folders in EACH collection should follow the same naming sequence: item_001, item_002, item_003, and so forth.
     Example: contents file

  7. Example: dublin_core.xml
     <dublin_core>
       <dcvalue element="title" qualifier="none">DSpace Batch Ingest</dcvalue>
       <dcvalue element="contributor" qualifier="author">Kim, Sarah</dcvalue>
       <dcvalue element="description" qualifier="abstract">This is a Power Point about how to prepare items for DSpace batch ingest created for Dr. Galloway's class, Problems in Permanent Retention of Electronic Records, Spring 2008.</dcvalue>
       <dcvalue element="subject" qualifier="none">Digital Preservation</dcvalue>
       <dcvalue element="date" qualifier="created">2008-03-21</dcvalue>
       <dcvalue element="language" qualifier="iso">en_US</dcvalue>
     </dublin_core>
     *** Attention! All letters within <…> should be LOWER case.
     *** dublin_core.xml can be created and edited with an XML editor or with Notepad.
     *** Reference: See Dublin Core Elements with Qualifiers supported by DSpace, http://www.dspace.org/index.php?option=com_content&task=view&id=141
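The lower-case rule for everything inside <…> can be checked before upload. The following is a rough sketch, not a DSpace tool: the function name `check_lowercase_tags` and the two sample files are hypothetical. It only looks for the letters A-Z between angle brackets and does not validate the XML in any other way.

```shell
#!/bin/sh
# Sketch: flag upper-case letters inside <...> in a dublin_core.xml file.
# check_lowercase_tags is a hypothetical helper, not part of DSpace.
set -eu

check_lowercase_tags() {
    # Skip an XML declaration line if present, then search for A-Z inside any tag.
    # LC_ALL=C keeps the A-Z range limited to ASCII upper-case letters.
    if grep -v '^[[:space:]]*<?xml' "$1" | LC_ALL=C grep -n '<[^>]*[A-Z][^>]*>'; then
        return 1    # grep has printed the offending lines
    fi
    return 0
}

tmp=$(mktemp -d)

# A file that follows the rule (text between tags may still use capitals).
cat > "$tmp/good.xml" <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">DSpace Batch Ingest</dcvalue>
</dublin_core>
EOF

# A file that breaks the rule: element="Title" has a capital inside the tag.
cat > "$tmp/bad.xml" <<'EOF'
<dublin_core>
  <dcvalue element="Title" qualifier="none">DSpace Batch Ingest</dcvalue>
</dublin_core>
EOF

if check_lowercase_tags "$tmp/good.xml"; then good_result=ok; else good_result=uppercase; fi
if check_lowercase_tags "$tmp/bad.xml" >/dev/null; then bad_result=ok; else bad_result=uppercase; fi

echo "good.xml: $good_result"
echo "bad.xml: $bad_result"
```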

  8. Step 3: Prepare Linux command lines
     General format of a single command line:
       /opt/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=[eperson] --collection=[collection] --source=[source] --mapfile=[name of mapfile]
     Example:
       /opt/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=srhkim@gmail.com --collection=2081/2254 --source=Wesker-2007_April --mapfile=20070406.ingest.map
     In the example, --eperson is the e-person’s e-mail address, --collection is the collection ID#, and --source is the source location.
     *** Each collection must have its own command line with a unique collection ID.
     *** There is no particular rule for naming the source and map files. However, DSpace administrators usually use “[date].ingest.map” for the map file name. The map file can be used to remove the materials added through the batch ingest if something goes wrong. The source location should be the name of the collection directory that contains the item folders.
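One way to prepare these command lines is to assemble them from variables, so that only the per-collection values change. The sketch below reuses the values from the slide's example and only prints the command; it does not run `dsrun`, which exists only on the DSpace server. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: assemble an ItemImport command line from its parts and print it.
# Values are taken from the example on this slide; variable names are illustrative.
set -eu

dsrun=/opt/dspace/bin/dsrun
eperson='srhkim@gmail.com'          # e-person's e-mail address
collection='2081/2254'              # collection ID generated by DSpace
source_dir='Wesker-2007_April'      # collection directory holding the item folders
mapfile='20070406.ingest.map'       # "[date].ingest.map" naming habit

cmd="$dsrun org.dspace.app.itemimport.ItemImport --add --eperson=$eperson --collection=$collection --source=$source_dir --mapfile=$mapfile"

# Print the command for review instead of executing it.
printf '%s\n' "$cmd"
```

For a second collection, only `collection`, `source_dir`, and `mapfile` would need to change.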

  9. Step 4: Conduct Batch Ingest
     • Set up an appointment with the authorized iSchool DSpace administrator, Shane Williams or Sam Burns.
     • If you have not been working on an iSchool server for file preparation, create a source directory for the collection and upload items to an iSchool server workspace. ***
     • Conduct a test with a small number of item folders by running a test command line. *** (For a test ingest, add “ --test” at the end of each command line.)
     • Fix errors if there are any. (The DSpace administrator will inform you of detected errors. Unqualified DC elements, capitalization in DC elements, and unrecognizable symbols can cause errors.)
     • Run the prepared command lines for the actual batch ingest. *** (During the actual ingest, DSpace may reject individual items if they have errors. If any are rejected, the ingest process can be stopped, the errors can be fixed, and the process can be resumed: you don’t have to start over again.)
     • Quality assurance: after ingesting, visit the collection in DSpace to ensure the ingest completed successfully.
     *** The items marked *** (the second, third, and fifth bullets) need to be conducted by the iSchool DSpace administrator.
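The test variants of the prepared command lines can be produced mechanically by appending “ --test” to each one. This sketch assumes the command lines are kept one per line in a text file; the file names `ingest_commands.txt` and `test_commands.txt` are hypothetical. It only writes the test commands out for the administrator and does not run them.

```shell
#!/bin/sh
# Sketch: derive test command lines by appending " --test" to each prepared line.
# File names are hypothetical; nothing is executed against DSpace here.
set -eu

tmp=$(mktemp -d)

# Prepared command lines, one per collection (example line from the previous slide).
cat > "$tmp/ingest_commands.txt" <<'EOF'
/opt/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=srhkim@gmail.com --collection=2081/2254 --source=Wesker-2007_April --mapfile=20070406.ingest.map

EOF

# Drop blank lines, then replace any trailing whitespace with " --test".
sed '/^[[:space:]]*$/d; s/[[:space:]]*$/ --test/' \
    "$tmp/ingest_commands.txt" > "$tmp/test_commands.txt"

cat "$tmp/test_commands.txt"
```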

  10. Questions or assistance:
     Lorrie Dong, lorrie.d@gmail.com
     Sarah Kim, srhkim@gmail.com
     Batch Ingest Process Guide:
     https://pacer.ischool.utexas.edu/handle/2081/9226
     https://pacer.ischool.utexas.edu/handle/2081/8870
