
Digital Asset Management and Publication with LadyBird

Eric James, programmer/analyst, Library IT, Yale University Library (eric.james@yale.edu). 12 July 2013.


Presentation Transcript


  1. Digital Asset Management and Publication with LadyBird Eric James programmer/analyst library IT Yale University Library eric.james@yale.edu 12 July 2013

  2. What is LadyBird? • Bebop song by Tadd Dameron • First Lady, Lyndon B. Johnson presidency • Old dog from King of the Hill • Digital asset management tool

  3. LadyBird - Digital Asset Management Tool LadyBird was designed from the start as a system that processes metadata and temporarily houses digital assets awaiting publication. It provides a configurable system for migrating digital objects and collections, normalizing metadata, and preserving and publishing content. It was initially written in C# on Microsoft .NET, hosted on Windows 2008 with Microsoft SQL Server 2008. Some work has been done on Java modules (for import). Wish list: migrate to JRuby/Rails.

  4. LadyBird components • Web interface • Job processing engine - imports • Export processing engine – exports • Bag creation • Heartbeat monitor • Application cleanup system • This presentation will focus on the workflow and concepts involved in publication of digital objects w/ metadata to fedora

  5. LadyBird concepts I • The core of the application is the object table • Collection – departments within the library and Yale (this comes into play later when discussing the c# tables) • Project – projects specific to a collection • An object belongs to a project and a project belongs to a collection • Currently 16 collections with 34 projects and 1.53 million objects • We call objects “oids”; technically “oid” is the object id column of the object table, but we tend to use it to describe the whole ball of wax • User table – each cataloger is registered, and role and permission settings are used throughout the app

  6. LadyBird concepts II • Processing objects is all about the spreadsheet • Each row is an object • Each column represents either a function or a metadata field • Function examples – {F1} is the object, identified by oid (primary key of the object table); if left blank, that is the signal to create a new oid • {F4} is the parent oid (for complex objects) • {F40} can have the value PUBLISH, telling LadyBird to auto-publish this object • Metadata examples – {FDID=58} call number, {FDID=262} Host, creator, etc. • The cataloger can take advantage of Excel functionality (like repeating fields) to quickly create a spreadsheet for batch import.
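The column conventions above can be illustrated with a short Ruby sketch. It is not LadyBird's actual parser: `parse_header` and `parse_rows` are hypothetical helpers, the input is assumed to be tab-delimited text, and the call numbers are made-up sample values.

```ruby
# Hypothetical helper: classify a column header as a function column
# ({F1}, {F4}, {F40}) or a metadata column ({FDID=58}).
def parse_header(header)
  case header
  when /\A\{F(\d+)\}\z/     then { kind: :function, id: Regexp.last_match(1).to_i }
  when /\A\{FDID=(\d+)\}\z/ then { kind: :metadata, id: Regexp.last_match(1).to_i }
  else raise ArgumentError, "unrecognized column header: #{header}"
  end
end

# Turn tab-delimited text into one hash per object row.
def parse_rows(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  headers = lines.shift.split("\t").map { |h| parse_header(h) }
  lines.map do |line|
    cells = line.split("\t", -1) # -1 keeps trailing blank cells
    row = { functions: {}, metadata: {} }
    headers.each_with_index do |h, i|
      key = h[:kind] == :function ? :functions : :metadata
      row[key][h[:id]] = cells[i].to_s
    end
    row
  end
end

# Sample sheet (assumed values): one new object, one existing object.
SAMPLE = [
  "{F1}\t{F4}\t{F40}\t{FDID=58}",
  "\t\tPUBLISH\tMS 123",   # blank {F1}: signal to create a new oid
  "10684347\t\t\tMS 124"   # existing oid, not auto-published
].join("\n")

parse_rows(SAMPLE).each do |row|
  oid = row[:functions][1]
  action = oid.empty? ? "create new oid" : "update oid #{oid}"
  publish = row[:functions][40] == "PUBLISH"
  puts "#{action}, publish=#{publish}, call number=#{row[:metadata][58]}"
end
```

Keeping function and metadata columns in separate hashes mirrors the slide's distinction; a real importer would also need to handle repeated columns for the same fdid.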

  7. LadyBird concepts III field_definition (fdid) table (230 metadata fields)
       fdid  name
       51    Cataloger
       52    Record source
       53    Record date
       54    Record modified date
       55    Record ID
       56    Local record ID
       57    Local record ID, other
       58    Call number
       59    Accession number
       60    Box
     The values are either strings or acid values (more on acids later).

  8. LadyBird concepts IV
     • Import tables – all about the spreadsheets, though you can also import MARC or EAD records by bibid, barcode, or handle; in that case the records are deserialized into fdids, and any spreadsheet data overrides the records
       im_job (1 master row for spreadsheet)
       Im_job_exHead (column headers from spreadsheet)
       im_job_contents (values)
       Im_files (for files)
       import_checksum (for files)
       im_job_contents_history
     • Job tracking (overall tracking that associates an imported oid with a specific job)
       trk_project
       trk_job
       trk_job_contents
       trk_oid

  9. LadyBird concepts V
     • The C# tables – c for “current”, # for the number of each collection
     • The “metadata home” – data imported to the im tables is finally transferred here
     • There is a set of tables for each collection. Ex: # = 13 (collection: Hydra, project: Hydra Test)
       c13 – master list of oids
       c13_strings
       c13_longstrings
       c13_acid
     Each row basically contains an oid/fdid/value, so given an oid one can get all metadata fields for that object as rows from these tables. Each row also has a favid for additional values associated with the fdid. There are also corresponding p# tables, p for “past”, that keep an audit trail of any updates to specific oids. The c# tables were designed for high volume. Exploring better options, such as hashing.
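As a rough illustration of the oid/fdid/value layout, the Ruby sketch below gathers all metadata for one oid from an in-memory stand-in for a c13_strings-style table. The column names and the sample values are assumptions; the real tables live in SQL Server and also carry a favid.

```ruby
# In-memory stand-in for rows of a c13_strings-style table.
# Column names (oid, fdid, value) are assumed from the slide's description.
Row = Struct.new(:oid, :fdid, :value)

C13_STRINGS = [
  Row.new(10684347, 58, "MS 123"), # fdid 58 = Call number (sample value)
  Row.new(10684347, 60, "Box 4"),  # fdid 60 = Box (sample value)
  Row.new(10684348, 58, "MS 124")
].freeze

# Collect every metadata value for one oid, grouped by fdid.
# A field can repeat, so each fdid maps to an array of values.
def metadata_for(rows, oid)
  rows.select { |r| r.oid == oid }
      .group_by(&:fdid)
      .transform_values { |matches| matches.map(&:value) }
end

p metadata_for(C13_STRINGS, 10684347)
```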

  10. LadyBird concepts VI
     • Acid – authority control – a system for using controlled vocabulary for metadata fields
     Fdid 62 = Host, Creator
       acid    fdid  value
       126434  62    Luhan, Mabel Dodge, 1879-1962
       126626  62    Dobbs, Arthur, 1689-1765
       126628  62    Filson, John, ca. 1747-1788
       126630  62    Thomson, Charles, 1729-1824
       126632  62    Hutchins, Thomas, 1730-1789
       126635  62    Adair, James, ca. 1709-1783
     So if, for an oid row in the spreadsheet, the fdid 62 column is given the value 126635, that field resolves to “Adair, James, ca. 1709-1783”. Currently 155,415 values. Potential for more sophisticated uses with linked data.
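The acid lookup described above amounts to resolving a key against an authority table. A minimal Ruby sketch follows; `resolve` is a hypothetical helper, and the rule that a cell which is not a known acid for that field falls back to a plain string is an assumption, since the slides do not show how LadyBird tells the two apart.

```ruby
# Authority values keyed by acid; entries copied from the slide's table.
ACIDS = {
  126434 => { fdid: 62, value: "Luhan, Mabel Dodge, 1879-1962" },
  126635 => { fdid: 62, value: "Adair, James, ca. 1709-1783" }
}.freeze

# Hypothetical resolver: returns the controlled value when the cell holds
# a known acid for that field; otherwise returns the cell unchanged
# (assumed fallback, not confirmed by the slides).
def resolve(fdid, cell, acids = ACIDS)
  entry = acids[Integer(cell, 10, exception: false)]
  entry && entry[:fdid] == fdid ? entry[:value] : cell
end

puts resolve(62, "126635")
puts resolve(58, "MS 123")
```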

  11. LadyBird sample workflow start
     • Workstation is mounted with a job folder for both import and export
       Windows: \\birdcage.library.yale.edu\project25\import\
       Mac: SMB://birdcage.library.yale.edu/project25//import//
       Windows: \\birdcage.library.yale.edu\project25\export\
       Mac: SMB://birdcage.library.yale.edu/project25//export//
     • Project25 corresponds to an entry in the project table
     • Create a folder in the import directory and drag files into folders or subfolders
     • LadyBird then detects that folder and creates a job for it under the “Dashboard” menu selection

  12. LadyBird dashboard

  13. add digital object to folder

  14. Go to dashboard and process this folder

  15. Receive email confirmation
     Subject: LadyBird Import Complete job: test_open_rep
     Your import has been processed.
     test_open_rep
     Visit your dashboard in Ladybird for your most recent jobs.
     http://ladybird.library.yale.edu/user_jobs.aspx
     View job: http://ladybird.library.yale.edu/user_jobs.aspx?qa=query&qid=12307
     * A jobcomplete.txt file with the time is added to the import folder so the app knows that directory is complete

  16. View job

  17. View set

  18. New object->Metadata (form)

  19. Or From View Set, “Export as Job”

  20. Receive export email confirmation Subject: LadyBird Export Ready Your export is ready. \\birdcage\project25\export\ermadmix_46371_06262013_165116.xls

  21. Spreadsheet – fill in and save as tab-delimited text file

  22. Import

  23. Import Email Confirmation
     Subject: LadyBird Import Complete job: ermadmix_import_062613_171134
     Your import has been processed.
     ermadmix_import_062613_171134
     Visit your dashboard in Ladybird for your most recent jobs.
     http://ladybird.library.yale.edu/user_jobs.aspx
     View job: http://ladybird.library.yale.edu/user_jobs.aspx?qa=query&qid=12313

  24. Publish • Publishes automatically if {F40}=publish • Or can use interface to check file and metadata and explicitly click the publish button

  25. Publish (behind the scenes)
     • Oid is added to the hydra table with date (when added) and date published (when processing complete) timestamps
       id     oid       date                     date published
       …      …         …                        …
       39176  10684347  2013-06-26 16:01:11.043  2013-06-26 17:14:05.900
       39177  10684348  2013-06-26 16:01:11.043  2013-06-26 17:14:07.457
       39178  10684349  2013-06-26 16:01:11.043  2013-06-26 17:14:09.017
       39179  10684350  2013-06-26 16:01:11.043  2013-06-26 17:14:10.577
       39180  10684351  2013-06-26 16:01:11.043  2013-06-26 17:14:12.137
       39181  10684352  2013-06-26 16:01:11.043  2013-06-26 17:14:13.697
       …      …         …                        …

  26. oid added to hydra_publish table
     Key fields:
       hpid: 23703
       hcmid: 2
       cid: 9
       Pid: 27
       Oid: 10681633
       _oid: 0
       zindex: 0
       hydraID: null
       dateReady: 2013-06-26 16:01:55.430
       dateHydraStart: null

  27. Rows for oid added to hydra_publish_path table
     Key fields w/ example:
       hppid: 139004
       Hpid: 26340
       Type: jp2
       pathHTTP: http://lbfiles.library.yale.edu/10684274.jp2
       pathUNC: \\storage.yale.edu\home\ladybird-801001-yul\ladybird\project27\publish\dl\10684274\1758.02.00.00_page1.jp2
       Md5: 35433b00ca9de2cdaed275c455339090
       controlGroup: M
       mimeType: image/jp2
       Dsid: jp2
       ingestMethod: filepath
       oidPointer: null

  28. Hydra_publish_path – typical files
       xml rights (hydra rights)
       Xml metadata (MODS descMetadata)
       Xml access (home grown granular rights)
       pdf (transcript YIPP)
       pdf2 (annotated transcript YIPP)
       jp2 (derivative)
       jpg (derivatives)
       tif (master)

  29. descMetadata - creation There is a service (a C# class and its methods) that is called on hydra publish. It iterates through all the fdids for an oid and uses the XML DOM to create a MODS file. This is basically a mapping of field definitions to the MODS schema. There is the potential to map the fdids to any metadata format.
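The fdid-to-MODS idea can be sketched in Ruby with REXML, a DOM-style XML library. The real service is a C# class whose mapping is not shown in the slides, so the `FDID_TO_MODS` table, the `build_mods` helper, and the sample values below are all assumptions made for illustration.

```ruby
require "rexml/document"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical mapping from fdid to a MODS element path.
# The actual LadyBird mapping is not given in the slides.
FDID_TO_MODS = {
  58 => %w[classification], # Call number
  62 => %w[name namePart]   # Host, Creator
}.freeze

# Build a MODS document from { fdid => [values] }.
# Fields without a mapping are skipped.
def build_mods(fields)
  doc = REXML::Document.new
  root = doc.add_element("mods", "xmlns" => MODS_NS)
  fields.each do |fdid, values|
    path = FDID_TO_MODS[fdid]
    next unless path
    values.each do |value|
      # Create one nested element per path segment, then set the text.
      leaf = path.reduce(root) { |parent, name| parent.add_element(name) }
      leaf.text = value
    end
  end
  doc
end

puts build_mods({ 58 => ["MS 123"], 62 => ["Adair, James, ca. 1709-1783"] })
```

Because the mapping is plain data, swapping in a different target format would mean replacing the table rather than the traversal, which is the flexibility the slide alludes to.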

  30. accessMetadata

  31. Rights metadata

  32. Transition into fedora hydra world
     select * from hydra_content_model
       id  date                     uid  contentModel
       1   2013-04-25 08:50:20.043  1    simple
       2   2013-04-25 08:50:26.350  1    complexParent
       3   2013-04-25 08:50:30.420  1    complexChild
     ContentModel maps to ActiveFedora model

  33. Transition into fedora hydra world II
     select * from hydra_content_model_ds
       id  date                     uid  hcmid  dsid            ingMethod  required
       1   2013-04-25 08:56:11.670  1    1      accessMetadata  pullHTTP   y
       2   2013-04-25 08:56:11.670  1    1      descMetadata    pullHTTP   y
       3   2013-04-25 08:56:11.670  1    1      rightsMetadata  pullHTTP   y
       4   2013-04-25 08:56:11.670  1    1      tif             filepath   y
       5   2013-04-25 08:56:11.670  1    1      jp2             filepath   y
       6   2013-04-25 08:56:11.670  1    1      jpg             filepath   y
       7   2013-04-25 08:56:11.670  1    2      accessMetadata  pullHTTP   y
       8   2013-04-25 08:56:11.670  1    2      descMetadata    pullHTTP   y
       9   2013-04-25 08:56:11.670  1    2      rightsMetadata  pullHTTP   y
       10  2013-04-25 08:56:11.670  1    2      tif             filepath   n
       11  2013-04-25 08:56:11.673  1    2      jp2             filepath   n
       12  2013-04-25 08:56:11.673  1    2      jpg             filepath   n
       13  2013-04-25 08:56:11.673  1    3      accessMetadata  pullHTTP   y
       14  2013-04-25 08:56:11.673  1    3      descMetadata    pullHTTP   y
       15  2013-04-25 08:56:11.673  1    3      rightsMetadata  pullHTTP   y
       16  2013-04-25 08:56:11.673  1    3      tif             filepath   y
       17  2013-04-25 08:56:11.673  1    3      jp2             filepath   y
       18  2013-04-25 08:56:11.673  1    3      jpg             filepath   y
       19  2013-05-31 10:48:25.620  1    2      oidPointer      pointer    n
       20  2013-06-07 11:03:24.537  1    2      pdf             filepath   n
       21  2013-06-07 11:03:52.933  1    2      pdf2            filepath   n
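One way to read the `required` column is as a per-content-model checklist. The Ruby sketch below encodes a subset of the rows above and reports which required datastreams are missing for an object. How the ingest task actually uses the flag is not shown in the slides, so `missing_required` is a hypothetical helper.

```ruby
# One rule per row of hydra_content_model_ds (subset of the rows above).
DsRule = Struct.new(:hcmid, :dsid, :ing_method, :required)

RULES = [
  DsRule.new(1, "accessMetadata", "pullHTTP", true),
  DsRule.new(1, "descMetadata",   "pullHTTP", true),
  DsRule.new(1, "rightsMetadata", "pullHTTP", true),
  DsRule.new(1, "tif",            "filepath", true),
  DsRule.new(1, "jp2",            "filepath", true),
  DsRule.new(1, "jpg",            "filepath", true),
  DsRule.new(2, "accessMetadata", "pullHTTP", true),
  DsRule.new(2, "descMetadata",   "pullHTTP", true),
  DsRule.new(2, "rightsMetadata", "pullHTTP", true),
  DsRule.new(2, "tif",            "filepath", false),
  DsRule.new(2, "pdf",            "filepath", false)
].freeze

# Hypothetical check: required dsids for a content model (hcmid)
# that are absent from the list of staged dsids.
def missing_required(hcmid, staged_dsids, rules = RULES)
  required = rules.select { |r| r.hcmid == hcmid && r.required }.map(&:dsid)
  required - staged_dsids
end

staged = %w[accessMetadata descMetadata rightsMetadata]
p missing_required(1, staged) # simple: image files are required
p missing_required(2, staged) # complexParent: image files are optional
```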

  34. Example - simple content model
       require "active-fedora"
       class Simple < ActiveFedora::Base
         belongs_to :collection, :property => :is_member_of
         has_metadata :name => 'descMetadata', :type => Hydra::Datastream::SimpleMods
         has_metadata :name => 'accessMetadata', :type => Hydra::Datastream::AccessConditions
         has_metadata :name => 'rightsMetadata', :type => Hydra::Datastream::Rights
         has_metadata :name => 'propertyMetadata', :type => Hydra::Datastream::Properties
         delegate :oid, :to => "propertyMetadata", :unique => true
         delegate :projid, :to => "propertyMetadata", :unique => true
         delegate :cid, :to => "propertyMetadata", :unique => true
         delegate :zindex, :to => "propertyMetadata", :unique => true
         delegate :parentoid, :to => "propertyMetadata", :unique => true
       end

  35. Example – Properties Datastream
       require 'active_fedora'
       module Hydra
         module Datastream
           class Properties < ActiveFedora::OmDatastream
             # ERJ note: ladybird pid = projid, ladybird _oid = parentoid
             set_terminology do |t|
               t.root(:path=>"root")
               t.oid(:path=>"oid")
               t.cid(:path=>"cid")
               t.projid(:path=>"projid")
               t.zindex(:path=>"zindex")
               t.parentoid(:path=>"parentoid")
               t.ztotal(:path=>"ztotal")
               t.oidpointer(:path=>"oidpointer")
             end
             def to_solr(solr_doc=Hash.new)
               super(solr_doc)
               solr_doc['oid_isi'] = oid
               solr_doc['cid_isi'] = cid
               solr_doc['projid_isi'] = projid
               solr_doc['zindex_isi'] = zindex
               solr_doc['parentoid_isi'] = parentoid
               solr_doc['ztotal_isi'] = ztotal
               solr_doc['oidpointer_isi'] = oidpointer
               solr_doc
             end
           end
         end
       end

  36. Workflow review
     (1) Add folder with files to import folder.
     (2) Process folder. This creates the records in the database (oids, job tracking, c# instances, and file derivatives).
     (3) Export spreadsheet. This creates a spreadsheet template for the folder of files in (1).
     (4) Fill in metadata in spreadsheet – the main cataloging task.
     (5) Import spreadsheet. This ultimately populates the c# tables with metadata from the oid rows of the spreadsheet.
     (6) Publish to hydra. This populates the hydra tables, serializes the metadata files (MODS, access rights), and stages files in storage for ingest.

  37. Ingest task • Set up within a hydra project • The ‘tiny_tds’ gem connects to the ladybird SQL Server database

  38. app/models (objects) • collection.rb – maps to pid (project) in ladybird, parent to simple.rb and complex_parent.rb • simple.rb – 1 image w/derivatives, no hierarchy • complex_parent.rb – parent to a set of images (like a book or image set) • complex_child.rb – 1 image w/derivatives (like a page) • These relate to the hydra_content_model table

  39. app/models (datastreams) • coll_properties.rb • properties.rb • rights.rb • access_conditions.rb • simple_mods.rb

  40. simple_mods.rb - indexing

  41. rake yulhy4:ingest I
     Properties:
     • SQL Server connection config
     • Mount of ladybird storage
     Uses the hydra_publish table as a queue (driven by this query until done):
       select top 1 a.hpid,a.oid,a.cid,a.pid,b.contentModel,a._oid from dbo.hydra_publish a, dbo.hydra_content_model b where a.dateHydraStart is null and a.dateReady is not null and a._oid=0 and a.hcmid is not null and a.hcmid=b.hcmid and a.action='insert' order by a.dateReady
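The queue behaviour of that query can be mimicked in plain Ruby. The sketch below applies the same conditions as the WHERE clause to in-memory rows and loops until nothing is left. It assumes the task stamps dateHydraStart when it picks a row up, which the slides imply but do not state; `next_ready`, `drain`, and all rows except hpid 23703 are hypothetical.

```ruby
PublishRow = Struct.new(:hpid, :oid, :hcmid, :parent_oid, :date_ready,
                        :date_hydra_start, :action, keyword_init: true)

# Mirrors the query: not started, ready, top-level (_oid = 0),
# has a content model, action = 'insert', oldest dateReady first.
def next_ready(rows)
  rows.select do |r|
    r.date_hydra_start.nil? && !r.date_ready.nil? && r.parent_oid.zero? &&
      !r.hcmid.nil? && r.action == "insert"
  end.min_by(&:date_ready)
end

# Process the queue until no eligible row remains; returns hpids in order.
def drain(rows)
  processed = []
  while (row = next_ready(rows))
    row.date_hydra_start = Time.now # assumed: stamping removes it from the queue
    processed << row.hpid
  end
  processed
end

def sample_queue
  [
    PublishRow.new(hpid: 23703, oid: 10681633, hcmid: 2, parent_oid: 0,
                   date_ready: "2013-06-26 16:01:55.430", action: "insert"),
    # Hypothetical rows below, added only to exercise the filter.
    PublishRow.new(hpid: 23704, oid: 10681634, hcmid: 3, parent_oid: 10681633,
                   date_ready: "2013-06-26 16:01:56.000", action: "insert"),
    PublishRow.new(hpid: 23702, oid: 10681632, hcmid: 1, parent_oid: 0,
                   date_ready: "2013-06-26 16:01:50.000", action: "insert"),
    PublishRow.new(hpid: 23705, oid: 10681635, hcmid: 1, parent_oid: 0,
                   date_ready: nil, action: "insert")
  ]
end

p drain(sample_queue)
```

Child rows (non-zero _oid) are skipped here just as in the query.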

  42. rake yulhy4:ingest II
     ActiveFedora ingest: create a new object based on the content model
       obj = Simple.new
       obj = ComplexParent.new
       obj = ComplexChild.new

  43. rake yulhy4:ingest III
     Iterate through all datastreams for the content model:
       select hcmds.dsid as dsid,hcmds.ingestMethod as ingestMethod, hcmds.required as required from dbo.hydra_content_model hcm, dbo.hydra_content_model_ds hcmds where hcm.contentModel = '#{contentModel}' and hcm.hcmid = hcmds.hcmid
     For each row from the query above, get the datastream info for the oid:
       select type,pathHTTP,pathUNC,md5,controlGroup,mimeType,dsid,OIDpointer from dbo.hydra_publish_path where hpid=#{i["hpid"]} and dsid='#{dsid}'
     Verify checksums and use ActiveFedora to ingest datastreams
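The checksum step can be sketched with Ruby's standard Digest library. This is an assumed shape for the check, not the task's actual code: `checksum_ok?` and `verify_before_ingest` are hypothetical helpers, and a temporary file stands in for a staged datastream.

```ruby
require "digest/md5"
require "tempfile"

# True when the file's MD5 matches the expected hex digest
# (compared case-insensitively).
def checksum_ok?(path, expected_md5)
  Digest::MD5.file(path).hexdigest.casecmp?(expected_md5)
end

# Hypothetical gate: refuse to ingest a datastream whose staged file
# does not match the md5 recorded in hydra_publish_path.
def verify_before_ingest(row)
  unless checksum_ok?(row[:pathUNC], row[:md5])
    raise "checksum mismatch for datastream #{row[:dsid]}"
  end
  row[:dsid]
end

# Demo with a temporary file in place of a staged jp2.
Tempfile.create("datastream") do |f|
  f.write("hello")
  f.flush
  row = { dsid: "jp2", pathUNC: f.path, md5: "5d41402abc4b2a76b9719d911017c592" }
  puts "verified #{verify_before_ingest(row)}"
end
```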

  44. rake yulhy4:ingest IV Add ladybird specific info to properties datastream • oid • cid • pid • zindex • _oid Add hierarchical info to RELS-EXT • Simple and complex_parent – is_member_of a collection • Complex_child – is member of a complex_parent Some discussion about adding more linked data.

  45. rake yulhy4:ingest V

  46. rake yulhy4:ingest VI

  47. Blacklight

  48. review

  49. future
     Hydra_publish – revise already ingested content
     • action=‘update’
     • action=‘insert’
     Archivematica (by Artefactual)
     • Replace the ingest task with a custom workflow
     • GUI interface
     • Human decision points and manual processing
     • Technical metadata generation (FITS)
     • Provenance (jhove)
     • Issues – how to employ OAIS packages (SIP, AIP, DIP) for objects without a natural package structure?

  50. Contributors • Eric James • Lakeisha Robinson • Kalee Sprague • Osman Din • Jay Terray • Rebekeh Irwin • Mike Friscia
