
IST 511 Information Management: Information and Technology Digital Humanities and Research Methods. Dr. C. Lee Giles David Reese Professor, College of Information Sciences and Technology Professor of Computer Science and Engineering Professor of Supply Chain and Information Systems


Presentation Transcript


  1. IST 511 Information Management: Information and Technology Digital Humanities and Research Methods Dr. C. Lee Giles David Reese Professor, College of Information Sciences and Technology Professor of Computer Science and Engineering Professor of Supply Chain and Information Systems The Pennsylvania State University, University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Special thanks to V. Ryabov,

  2. Today • What are the digital humanities? • What are research methods? • Qualitative • Quantitative • Computational • Last time: • Digital libraries • Scientometrics and bibliometrics

  3. Tomorrow • Your research presentations

  4. Digital Humanities • Other names for the digital humanities • Computational humanities • Computational archaeology • Computational history • etc • Cultural informatics

  5. Digital Humanities

  6. Humanities • What are the humanities? • Wikipedia • Stanford • National Endowment for the Humanities (NEH)

  7. History of the Digital Humanities • Not that old – the 1940s marked the start of digitization

  8. “Digitus Dei est hic!” http://www.corpusthomisticum.org/it/index.age

  9. Hockey’s Consolidation • 1970s – mid-1980s

  10. New Developments • Mid-1980s – early 1990s http://www.tei-c.org/index.xml

  11. http://www.perseus.tufts.edu/hopper/

  12. Web/Humanities 1.0: from the few to the many • Web/Humanities 2.0: from the many to the many

  13. http://vos.ucsb.edu/ http://nines.org/

  14. Film and Media and Communication Studies Film Media Communication Cultural Feminist STS http://www.manovich.net/

  15. Explosion of new groups, communities, subjects

  16. The Disciplines Is this all?

  17. Archeology http://www.cast.uark.edu/other/nps/nadb/ http://www.u.arizona.edu/~mlittler/ http://www.cast.uark.edu/

  18. Art History and The Arts http://www.vraweb.org/ http://users.ecs.soton.ac.uk/km/projs/vasari/ http://www.getty.edu

  19. Classical Studies • Obsolescence and Preservation http://scriptorium.lib.duke.edu/papyrus/ Problematics http://nolli.uoregon.edu/rioni.html http://www.romereborn.virginia.edu/

  20. History http://valley.vcdh.virginia.edu/ http://ashp.cuny.edu Accessibility

  21. Teaching and Learning • New Learning Environments • New Subjects • New Pedagogies • Digital Disconnect

  22. Literary Studies What happens to lit and “literary” in the age of digital tech? http://www.rossettiarchive.org/ http://www.emilydickinson.org/ http://ted.streamguys.net/ted_rives_mockingbirds_2006.mp3

  23. Thematic of Textuality vs. Visuality • Jerome McGann: writings on digital technology and literary studies, written (and published) between 1993 and 2001; episodes in the history of McGann's engagement with the intellectual opportunities offered by the interaction between computer power, digital technology, and literary studies. • Richard Mayer: for hundreds of years, verbal messages have been the primary means of explaining ideas to learners. Although verbal learning offers a powerful tool for humans, this book explores ways of going beyond the purely verbal.

  24. Teaching and Learning: Digital Disconnect • Electracy • Oral • Print • Electronic

  25. Examples http://www.vectorsjournal.org/issues/index.php?issue=5 http://vectors.usc.edu/issues/05_issue/bluevelvet/ http://www.vectorsjournal.org/index.php?page=7&projectId=86

  26. Wayne State Digital http://www.lib.wayne.edu/resources/digital_library/index.php

  27. Social Networks

  28. From Remediation to Convergence and Intermediation

  29. HASTAC http://www.hastac.org/

  30. Digital Antiquity - Mission • Organization devoted to enhancing preservation of, and access to, digital records of archaeological investigations • to permit scholars to more effectively create and communicate knowledge of the long-term human past; • to enhance the management of archaeological resources; and • to provide for the long-term preservation of irreplaceable records of archaeological investigations.

  31. We’re Losing the Archaeological Record • Explosion of Digital Information • >50,000 field projects/year, 1000s of databases • Primary archaeological data is now “born digital” • Absence of Trusted Repositories • Few institutions capable of long-term data curation • Media on which data resides is treated as an artifact • Standard work flows do not move digital data into trusted repositories • Fragility of Digital Data • Media degradation & software obsolescence • Loss of data semantics (metadata) •  We need a trusted digital repository for archaeological documents and data

  32. Digital Antiquity’s Repository: tDAR – the Digital Archaeological Record • On-line, trusted digital repository for archaeological data and documents that is • financially and socially sustainable, • committed to long-term preservation of data & metadata, and • provides on-line discovery and access for data and documents produced by archaeological projects • Web ingest interface: acquires metadata and supports user upload of data • Scope • targets digital products of ongoing research & legacy data • focuses on archival data (not continuously updated databases such as site files) • work of scholars in the US and the Americas more broadly

  33. Digital Antiquity Builds on the ADS Model • The Archaeology Data Service (ADS) in the UK has a 10-year track record of success • ADS is heavily staffed (ca. 10 FTE) and provides a high level of curation and a high-quality archive • ADS provides a refined presentation layer for its projects • ADS processes a relatively small number of projects (ca. 200) each year at a high unit cost

  34. Digital Antiquity Diverges from ADS in Order to Scale to the US Situation • 50,000 federally mandated cultural resource field projects are conducted each year in the US • tDAR aspires to capture the digital data and documents from a substantial fraction of these • Implies a different business model • Demands much heavier reliance on users to provide metadata that make their data meaningful • Requires a user-friendly ingest interface for metadata acquisition and data upload

  35. Prototype Ingest Interface

  36. Preservation and Access Requirements • To maintain the utility of data, we must preserve the data (bits) on sustainable media, in a sustainable format, along with their semantics • Existing coding keys and manuals are inadequate • Cannot require universal coding schemes • We must employ ontologies to allow naive users to locate relevant resources • We must plan for integration of data that employ different systematics • We must collect detailed database metadata (e.g., at the table, column, and value levels) • Need persistent URIs, DOIs

  37. Metadata & Database Semantics • Standardization of original data on deposit is unacceptable • We must capture, not transform, original semantics • Digital coding sheets at dataset registration time • Our representation is not highly abstract but structured by archaeological practice • On registration, the dataset creator • associates database codes with dataset labels through a coding sheet • and maps coding sheet labels to default (and possible alternate) ontologies created by material class experts
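The capture-not-transform mapping described above can be sketched as two lookup steps: a coding sheet that resolves raw database codes to labels, and a mapping from labels to ontology terms. The codes, labels, and "fauna ontology" below are invented examples for illustration, not tDAR's actual schema.

```python
# Step 1: the coding sheet maps raw database codes to the labels
# the dataset creator used (hypothetical example values).
coding_sheet = {"1": "Deer", "2": "Bison", "3": "Rabbit"}

# Step 2: coding-sheet labels map to terms in a default ontology
# built by material-class experts (again hypothetical).
fauna_ontology = {
    "Deer": "Artiodactyla/Cervidae",
    "Bison": "Artiodactyla/Bovidae",
    "Rabbit": "Lagomorpha/Leporidae",
}

def resolve(code: str) -> str:
    """Translate a raw code to an ontology term; the original data
    and its semantics are preserved, not rewritten."""
    label = coding_sheet[code]
    return fauna_ontology[label]

print(resolve("2"))  # Artiodactyla/Bovidae
```

Keeping the two tables separate is the point: the deposited data stay untouched, and alternate ontologies can be swapped in by replacing only the second mapping.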

  38. Modeling Global Societal Evolution Over a Half-Century: Petascale Humanities Computing • Institute for Computing in the Humanities, Arts, and Social Science at the University of Illinois; Center Affiliate of the National Center for Supercomputing Applications

  39. Research Directions • Forecast global stability • Model social group interactions • Gain a better understanding of the underpinnings of global unrest and how society functions • Quantify the flow of information across the world and how human societies produce and consume real-time information • Gain new understanding of the evolution of the civil war discourse

  40. The Digital Humanities • A very large field encompassing tremendous variation in applications • Focus here is on the textually driven humanities, such as history, journalism, etc.

  41. Quantitative Qualitative Computation • Digital humanities requires “quantitative qualitative computation”: finding ways to convert the “latent” aspects of language into computable numeric indicators • Historically, the field focused on facts and discarded the rest as “uncomputable” • More recently, dimensions such as “tone” have become booming industries (brand mining)
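How a latent quality like "tone" becomes a numeric indicator can be illustrated with a toy lexicon-based scorer. This is a minimal sketch under strong assumptions: the word lists are invented for the example, not drawn from any real sentiment lexicon, and real tone mining is far more sophisticated.

```python
# Toy lexicon-based tone scorer: maps free text to a numeric
# indicator in [-1, 1] by comparing positive vs. negative word counts.
POSITIVE = {"progress", "victory", "support", "growth", "hope"}
NEGATIVE = {"crisis", "scandal", "failure", "unrest", "decline"}

def tone_score(text: str) -> float:
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # 0.0 when no lexicon words appear; otherwise net polarity.
    return 0.0 if total == 0 else (pos - neg) / total

print(tone_score("hope and progress despite the crisis"))  # (2 - 1) / 3
```

Even this crude indicator turns "uncomputable" prose into a number that can be aggregated over millions of documents, which is where the computational cost discussed on the next slide comes from.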

  42. Quantitative Qualitative Computation • VERY computationally expensive • It is easy to take the Google Ngram dataset and plot the frequency of “democrat” vs. “republican” over time to see which gets more book coverage each year • Gauging which one gets the most POSITIVE coverage, however, and WHERE that coverage comes from, requires a LOT of computation
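The "easy" half of this comparison can be sketched in a few lines. The rows below are invented for illustration, laid out in the Ngram dataset's tab-separated shape (ngram, year, match_count, volume_count); a real analysis would stream the downloaded shard files instead.

```python
from collections import defaultdict

# Invented sample rows in the Ngram tab-separated layout:
# ngram <TAB> year <TAB> match_count <TAB> volume_count
sample = (
    "democrat\t1960\t1200\t300\n"
    "republican\t1960\t1500\t320\n"
    "democrat\t1961\t1600\t310\n"
    "republican\t1961\t1400\t315"
)

def counts_by_year(tsv: str, term: str) -> dict:
    """Sum match_count per year for one term."""
    out = defaultdict(int)
    for line in tsv.splitlines():
        ngram, year, match_count, _ = line.split("\t")
        if ngram == term:
            out[int(year)] += int(match_count)
    return dict(out)

dem = counts_by_year(sample, "democrat")
rep = counts_by_year(sample, "republican")
# The cheap question: who gets more book coverage each year?
leader = {y: ("republican" if rep[y] > dem[y] else "democrat") for y in sorted(dem)}
print(leader)  # {1960: 'republican', 1961: 'democrat'}
```

Counting is a single pass over the data; judging tone and tracing where coverage comes from requires analyzing the surrounding text of every occurrence, which is the expensive part.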

  43. Building a Global Map • The map at the start of this presentation visualizes a geographic cross-section through a much larger dataset: a petascale network • What does a digital humanities pipeline look like?

  44. Petascale Networks • Start with a petascale network • 10 billion actors connected by over 100 trillion relationships, just from a single dataset covering only 30 years • Assuming a simple tuple structure (ID, WEIGHT, ID): 8 bytes × 3 = 24 bytes × 100 trillion rows = 2.4 PB • Need this all memory-resident for random access across the ENTIRE dataset • This is just a small pilot dataset • Data is on XD
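The storage arithmetic on this slide can be checked directly, assuming the stated (ID, WEIGHT, ID) tuple of three 8-byte fields per edge:

```python
# Back-of-envelope sizing for the petascale network described above.
BYTES_PER_FIELD = 8           # each of ID, WEIGHT, ID as an 8-byte value
FIELDS_PER_EDGE = 3
EDGES = 100 * 10**12          # 100 trillion relationships

bytes_per_edge = BYTES_PER_FIELD * FIELDS_PER_EDGE   # 24 bytes
total_bytes = bytes_per_edge * EDGES
petabytes = total_bytes / 10**15
print(petabytes)  # 2.4
```

At 2.4 PB for the raw tuples alone, before indexes or working space, the memory-residency requirement explains why this exceeds any single current system.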

  45. From “Big Data” to “Really Big Data”

  46. Is XD really “Big Data?” • Total disk of all current production XD systems combined: 12.1 PB (Gordon is 1/3 of the entire XD) • If we add all XD tape silos, we get 34.1 PB • The entire national allocated research infrastructure is just 12 PB of disk and 22 PB of tape! • Microsoft’s Bing search engine uses 150 PB of spinning disk • The biggest scientific projects generate only 10–20 TB/day of data, while Twitter alone produces 28 GB of new data a day and Bing processes 2 PB/day

  47. “Really Big Data” • Traditional sciences are “small data” compared with the information world of news and social media • 200 MILLION new tweets a day • 1 BILLION new Facebook items a day: the average person adds 3 items to Facebook every single day

  48. “Really Big becomes REALLY Big” • Social media in particular is vastly outpacing traditional information sources • Entire New York Times 1945-2005 = 18M articles = 2.9 billion words • 5 BILLION words added to Twitter each DAY (almost twice the total volume of the Times in the last 60 years)

  49. And Even Bigger • HathiTrust includes Google Books and contains 4% of all books ever printed = 9.4 million digitized works = 3.3 billion pages = 2 trillion words • An estimated 49.5 trillion words have ever been printed in books over the last 600 years • Twitter alone will reach that size in just 27 years with zero additional growth. With its current rate of tripling post volume each year, it will take just three years
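The growth arithmetic can be checked with a few lines, using the slide's figures of 5 billion words/day and 49.5 trillion printed words. At flat volume the printed record is reached in about 27 years; with annual tripling, the cumulative total crosses it during the fourth year:

```python
TARGET = 49.5e12   # estimated words ever printed in books (slide's figure)
DAILY = 5e9        # words added to Twitter per day (slide's figure)

# Flat growth: years to match the entire printed record.
flat_years = TARGET / (DAILY * 365)

# Tripling annual volume: count whole years until the
# cumulative word total passes TARGET.
cumulative, volume, years = 0.0, DAILY * 365, 0
while cumulative < TARGET:
    cumulative += volume
    volume *= 3
    years += 1
print(round(flat_years, 1), years)  # 27.1 4
```

The flat-growth figure matches the slide's 27 years; the tripling scenario crosses the threshold part-way through year four, in line with the slide's "just three years" order of magnitude.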
