Scripting EPrints

This talk introduces the EPrints data model, significant objects, their relationships, and common methods. Learn how to find and extract documentation using perldoc, and explore scripting your archive.

Presentation Transcript


  1. Scripting EPrints

  2. About This Talk • Light on syntax • object->function(arg1, arg2) • Incomplete • Designed to • give you a feel for the EPrints data model • introduce you to the most significant objects • how they relate to one another • their most common methods • act as a jumping off point for exploring

  3. Finding Documentation • EPrints modules have embedded documentation • Extract it using perldoc • perldoc perl_lib/EPrints/EPrint.pm

  4. EPrints 3.0 • This talk is based on the EPrints 2.3 series • 3.0 API still being finalised • tidies up object hierarchy • resolves some of 2.3’s naming clashes • lots of extra functionality • but core data model remains the same • EPrints 3.0 is fully backward-compatible • 2.3 scripts will work with EPrints 3.0

  5. Roadmap • Data • EPrints, Users, Documents, Subjects, Subscriptions • Data collections • DataSets, MetaFields • Searching your data • SearchExpressions • Scripting your archive • Archives, Session

  6. Part 1: Data • EPrints, Users, Documents, Subjects, Subscriptions

  7-14. Data Model Sketch (diagram, built up over slides 7 to 14) • an EPrint links to all of its Documents (e.g. a PDF document, an HTML document made of several files) • an EPrint links to its owner, a User • a User links to the eprints it owns and to its Subscriptions • an EPrint is associated with Subjects • a Subject links to its parent and child Subjects and to its posted eprints

  15. EPrint • An EPrint object represents a single deposit in your EPrints archive • has some metadata fields • has one or more documents • is owned by a user

  16. Creating EPrints • new(session, id) • create an EPrint object for an existing deposit • create(session, dataset, data) • create a new EPrint object • More on sessions and datasets later!

  17. Introducing DataObj • EPrint is a subclass of DataObj • DataObj provides common methods for • accessing metadata • rendering XHTML output

  18. Inherited from DataObj • get_id() • get the unique ID of the object • get_url(staff) • get the URL of an EPrint • e.g. URL to the abstract page of an eprint in the archive • if staff is true then returns the URL to the staff view, which shows more detail • get_type() • get the EPrint type • e.g. article, book, thesis, conference paper...

  19. Inherited from DataObj • get_value(fieldname) • get the value of the named field • set_value(fieldname, value) • set the value of the named field • Remember to call commit() to write the changes to the database! • is_set(fieldname) • true if the named field has a value
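
As a rough illustration of the metadata methods on this slide, the Perl sketch below fills in a field only when it is empty and then commits the change. It assumes the object passed in is an EPrint (or any other DataObj) obtained elsewhere, for example via new(session, id). The helper name `set_if_empty` is hypothetical and not part of EPrints; only is_set, set_value and commit come from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): give $fieldname a value only
# when the object does not already have one. It uses just the methods
# named on the slide: is_set, set_value and commit.
sub set_if_empty
{
    my( $dataobj, $fieldname, $value ) = @_;

    # Leave existing metadata alone.
    return 0 if $dataobj->is_set( $fieldname );

    $dataobj->set_value( $fieldname, $value );

    # Without commit() the new value is never written to the database.
    $dataobj->commit();

    return 1;
}
```

A call such as `set_if_empty( $eprint, "title", "Untitled" )` would return 1 only if the field was empty beforehand. The field name "title" is an assumption here, since the metadata fields of an archive are configurable.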

  20. EPrint Methods • remove() • erase the eprint and any associated records/files from the database and filesystem • this should only be called on EPrints in the "inbox" or "buffer" datasets • commit() • write any changes made to the eprint to the database • datestamp() • set the last modified date to today

  21. Moving EPrints Around • move_to_deletion() • transfer the eprint to the deletion dataset • should only be called on eprints in the archive dataset • See also: • move_to_inbox() • move_to_buffer() • move_to_archive()

  22. Rendering EPrints • generate_static() • generate the static abstract page for the eprint • in a multi-language archive this will generate a page in each language

  23. Rendering - Inherited from DataObj • render_citation(style) • create an XHTML citation for the EPrint • if style is set then use the named citation style • defined in citations-en.xml • render_citation_link(style) • as above, but citation is linked to the EPrint’s abstract page

  24. Rendering - Inherited from DataObj • render_value(fieldname, showall) • get an XHTML fragment containing the rendered version of the value of the named field • in the current language • if showall is true then all languages are rendered • usually used for staff viewing (checking) data

  25. Rendering Tips • Most rendering methods return XHTML • but not a string! • XML Node objects • DocumentFragment, Element, TextNode... • In your scripts, build a document tree from these nodes • e.g. node1->appendChild(node2) • then flatten it to a string • Why? It’s easier to manipulate a tree than to manipulate a large string

  26. More Rendering Tips • XML Node objects are not part of EPrints • XML::DOM or XML::GDOME libraries • explore these libraries using perldoc • XHTML is good for building Web pages • but not so good for command line output! • use tree_to_utf8() • extracts a string from the result of any rendering method • tree_to_utf8(eprint->render_citation())
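
The tree-building advice on the two rendering slides can be tried out without EPrints at all. The sketch below uses XML::DOM, one of the two libraries named above, and assumes that CPAN module is installed. In a real script the nodes would come from rendering methods such as render_citation rather than being created by hand; the element names and text here are purely illustrative.

```perl
use strict;
use warnings;
use XML::DOM;    # CPAN module, assumed to be installed

# Nodes are created by a document object and then linked into a tree.
my $doc  = XML::DOM::Document->new;
my $para = $doc->createElement( "p" );
my $em   = $doc->createElement( "em" );

$em->appendChild( $doc->createTextNode( "Scripting EPrints" ) );
$para->appendChild( $doc->createTextNode( "Title: " ) );
$para->appendChild( $em );    # the node1->appendChild(node2) step

# Flatten the finished tree to a string only at the very end.
my $xhtml = $para->toString;
print $xhtml, "\n";
```

Because the emphasis element is a node rather than a substring, it could be moved, wrapped or removed before flattening without any string surgery.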

  27. Navigating to Related Objects • get_user() • get a User object representing the user to whom the EPrint belongs • get_all_documents() • get a list of all the Document objects associated with the EPrint • We will look at these objects next...

  28. User • A User object represents a single registered user • Also a subclass of DataObj • inherits metadata access methods • get_url get_type get_value set_value is_set • inherits rendering methods • render_citation render_citation_link render_value • Also has commit and remove • inherited from DataObj in 3.0

  29. Creating Users • new(session, id) • create a User object from an existing user record • user_with_email(session, email) • user_with_username(session, username) • create_user(session, access_level) • create a new User

  30. User Accessors • get_editable_eprints() • get a list of EPrints that the user can edit • get_owned_eprints(dataset) • get a list of EPrints owned by the user in the dataset • is_owner(eprint) • true if the user is the owner of the EPrint • get_subscriptions() • get a list of Subscriptions associated with the user
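
The user accessors above might be combined along the following lines. This is a sketch under assumptions: the helper names `owned_by` and `owned_ids` are hypothetical, and get_owned_eprints is assumed to return a plain Perl list, as the slide's wording ("get a list of EPrints") suggests. The User, EPrint and dataset objects are taken to have been obtained elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: from a list of EPrint objects, keep only those
# owned by $user, using the is_owner(eprint) method from the slide.
sub owned_by
{
    my( $user, @eprints ) = @_;

    return grep { $user->is_owner( $_ ) } @eprints;
}

# Hypothetical helper: the ids of everything $user owns in $dataset.
# Assumes get_owned_eprints(dataset) returns a list of EPrint objects.
sub owned_ids
{
    my( $user, $dataset ) = @_;

    return map { $_->get_id() } $user->get_owned_eprints( $dataset );
}
```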

  31. Document • A single document associated with an eprint • may actually contain one or more physical files • PDF = 1 file • HTML + images = many files • Another subclass of DataObj

  32. Creating a Document Object • new(session, docid) • create a Document object from an existing record • create(session, eprint) • create a new Document object for the given EPrint

  33. Document Accessors • get_eprint() • get the EPrint object the document is associated with • local_path() • get the full path of the directory where the document is stored in the filesystem • files() • get a list of (filename, file size) pairs

  34. Main File and Format • get_main() • set_main(main_file) • get/set the ‘main’ file for the document • e.g. if the document is multipage HTML with images, the main file needs to be set to the top index.html file • when rendering document links, EPrints always links to the main file in the document • set_format(format) • sets the document format

  35. Adding Files to Documents • upload(filehandle, filename) • uploads the contents of the given file handle • adds the file to the document (using the given filename) • add_file(file, filename) • adds a file to the document (using the given filename) • file is the full path to the file

  36. Adding Files to Documents • upload_url(url) • grab file(s) from given URL • in the case of HTML, only relative links will be followed • add_archive(file, format) • add files from a .zip or .tar.gz archive • remove_file(filename) • remove the named file from the Document
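
To show how the navigation methods (get_user, get_all_documents) and the Document accessors fit together, here is a minimal sketch that walks from one EPrint to its owner and its documents. The helper name `describe_eprint` and the report layout are hypothetical. It assumes get_all_documents returns a Perl list and that the (filename, file size) pairs from files() can be read into a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: build a plain-text report about one eprint, its
# owner and its documents, using only methods named on the slides.
sub describe_eprint
{
    my( $eprint ) = @_;

    my @lines = ( "EPrint " . $eprint->get_id() . " (" . $eprint->get_type() . ")" );

    # get_user() returns the User object that owns the eprint.
    my $user = $eprint->get_user();
    push @lines, "  owner: user " . $user->get_id() if defined $user;

    # get_all_documents() returns the Document objects for the eprint.
    foreach my $doc ( $eprint->get_all_documents() )
    {
        my $main = $doc->get_main();
        $main = "(no main file set)" unless defined $main;

        # Assumption: files() yields filename => size pairs.
        my %files = $doc->files();
        my $count = scalar keys %files;

        push @lines, "  document " . $doc->get_id() . ": main file $main, $count file(s)";
    }

    return join( "\n", @lines ) . "\n";
}
```

Given an EPrint object, `print describe_eprint( $eprint );` would write one line for the eprint, one for its owner and one per document.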

  37. Subject • A single subject from the subject hierarchy • Another subclass of DataObj

  38. Creating Subjects • new(session, subjectid) • create a Subject object from an existing subject • create(session, id, name, parent, depositable) • create a new Subject • depositable specifies whether or not users can deposit eprints in the subject

  39. Subject Accessors • children() • get a list of Subjects which are the children of the subject • get_parents() • get a list of Subjects which are the parents of the subject • subject_label(session, subject_tag) • get the full label of a subject, including parents

  40. Subject Accessors • count_eprints(dataset) • get the number of eprints associated with the subject • posted_eprints(dataset) • get a list of EPrints associated with the subject

  41. Rendering Subjects • render_with_path(session, topsubjid) • get a DocumentFragment containing the subject path • example of a subject path: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management

  42. Subscription • A stored search which is performed every day/week/month on behalf of a user • get_user() • get the User who owns the subscription • Another subclass of DataObj

  43. Creating Subscriptions • new(session, id) • create a Subscription object from an existing subscription • create(session, userid) • create a new Subscription object for the given user

  44. Processing Subscriptions • send_out_subscription() • search for new items matching the subscription settings • email them to the user owning the subscription

  45. DataObj Hierarchy (diagram) • EPrint, User, Document, Subject and Subscription are all subclasses of DataObj

  46. So Far... • We’ve looked at individual data objects • but an EPrints archive holds many eprints and documents, has many registered users etc. • how do we access them collectively? • We’ve seen the get_value and set_value methods for metadata • but an archive’s metadata is configurable • so how do we know what metadata fields an EPrint, User etc. has? • how do we access properties of the fields?

  47. Part 2: Data Collections • DataSets and MetaFields

  48. Dataset • A collection of data items • Tells us all the possible types in the collection • e.g. EPrints may be article, thesis • Tells us the fields in each type • e.g. article has title, authors, publication... • e.g. conference_item has title, authors, event_title, event_date... • Can also tell us all the fields that apply to a dataset • title, authors, publication, event_title...

  49. Dataset Configuration • ArchiveMetadataFieldsConfig.pm • fields in each dataset • additional system fields defined in EPrint.pm, User.pm etc. • metadata-types.xml • types in each dataset • fields that apply to each type

  50. Datasets in EPrints • archive • EPrints that are live in the main archive • buffer • EPrints that have been submitted for editorial approval • deletion • EPrints that have been deleted from the archive • inbox • EPrints which users are still working on • eprint • All EPrints from archive, buffer, deletion and inbox
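
Each of the four concrete datasets above has a matching move_to_* method on EPrint (see the "Moving EPrints Around" slide). The sketch below pairs them up in a lookup table so a script can move an eprint by dataset name. The helper `move_eprint` and the table are hypothetical conveniences, not part of EPrints, and the EPrint object is assumed to come from elsewhere.

```perl
use strict;
use warnings;

# The move_to_* methods from the earlier slide, keyed by target dataset.
my %MOVE_METHOD = (
    inbox    => "move_to_inbox",
    buffer   => "move_to_buffer",
    archive  => "move_to_archive",
    deletion => "move_to_deletion",
);

# Hypothetical helper: move $eprint into the dataset named $target.
sub move_eprint
{
    my( $eprint, $target ) = @_;

    my $method = $MOVE_METHOD{$target};
    die "Unknown dataset '$target'\n" unless defined $method;

    # Call the method whose name was looked up, e.g. $eprint->move_to_buffer()
    return $eprint->$method();
}
```

There is deliberately no entry for the combined "eprint" dataset: it is a view over the other four, so nothing can be moved into it.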