1 / 53

Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations. ENUG Conference Cheng Library , William Paterson University, Wayne , New Jersey, Thursday , October 21, 2010 Ray Schwartz, Systems Specialist Librarian

skylar
Download Presentation

Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations ENUG Conference Cheng Library, William Paterson University, Wayne, New Jersey, Thursday, October 21, 2010 Ray Schwartz, Systems Specialist Librarian Cheng Library, William Paterson University, Wayne, New Jersey, USA schwartzr2 @ wpunj.edu

  2. Outline • What is Data Mining and Data Warehousing and Why Do We Do It? • Our Library and University • Patron Statistical Categories • Application Server • Reporting

  3. Collecting Transactional Data • ILSs collect transactional data for circulation and allocation of collection funds. • ILL and Document Delivery services supply general transactional data. • Reports from vendor services • Bibliographic utilities • Subscription agents • Book jobbers

  4. Collecting Transactional Data cont. • Most ILSs have search and web server logs • Most (if not all) Databases have usage reports • Link Resolver logs • Proxy Server logs • Many other ways of collecting transactional data. • Gate counts • Reference transaction counts • Reshelving counts

  5. What would we like to see? • Breakdowns by department and majors. • Combined usage by department/majors of more than one library service.

  6. What is Data Mining and Data Warehousing • Extracting data from legacy systems and other resources; • cleaning, scrubbing and preparing data for decision support; • maintaining data in appropriate data stores; • accessing and analysing data using a variety of end user tools; • and mining data for significant relationships. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

  7. The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

  8. Our University • 9000 undergraduates • 1000 graduates (mostly education majors)‏ • 400 faculty • 800 adjuncts • 1000 staff

  9. Our Library • 19 librarians and 26 library staff • 350,000 volumes • 18,000 audiovisual items • 47,000 print and electronic periodicals • 124 general and subject specific databases • $1,100,000 Non-Salary Allocations

  10. Our Transactions • 600,000 Database Searches • 413,000 Gate Counts • 40,000 Library Materials Circulation • 34,000 Equipment Circulation • 19,000 Reference Queries • 3,000 Interlibrary Loans • 5,000 Documents Delivered

  11. Our Systems • Voyager ILS • Clio ILL Software • EZProxy Server • Banner – University ERP • University Networked Drive K: • University Email Server • University Web Server

  12. Vendor Services • Serials Solutions • A to Z list • MARC Record Service • Link Resolver • OCLC – Bibliographic Utility • Worldcat Collection Analysis • Coutts (was Blackwell)– Book Jobber • Ebsco – Subscription Agent • Marcive – Authority Control • Database Vendors

  13. Email Reports from the ILS

  14. Voyager Overdue and Fine Notices - Daily

  15. Quarterly Extract for Serials Solutions AtoZ Service

  16. Which categories of patrons are accessing which services?

  17. First Step – Patron Statistical Categories

  18. Voyager Patron Database allows a maximum of 10 statistical categories per patron record. • Decide which statistical categories are needed for each patron group defined. • Work with your University Information Systems Department to extract the relevant data from the relevant sources.

  19. Circulation Books Media Reserve By Fund Code Location ILL / Document Delivery Databases Library Web Pages Subject Area Resource Guides Reference Requests Catalog Other Vendor Services Serials Solutions Groups and Services • Major • Status • Undergrad or Grad • Faculty, Adjunct Faculty or Staff • Department • College • Degree • No. of Credits • Year of Study • Campus Location

  20. PATRON STATUS BOOK CIRC MEDIA CIRC EQUIP CIRC TOTAL CIRC MEMBERS BORROWERS % BORROW ING CIRC/ MEMBER CIRC/ BORROWER UNDERGRADUATE STUDENTS 2,715 250 698 3,663 238 186 78% 15.39 19.69 GRADUATE STUDENTS 419 13 76 508 14 13 93% 36.29 39.08 ADJUNCT FACULTY 100 65 20 185 32 20 63% 5.78 9.25 FULL-TIME FACULTY 159 115 194 468 24 23 96% 19.50 20.35 HISTORY TOTALS 3,393 443 988 4,824 308 242 79% 15.66 19.93 LIBRARY TOTALS 23,370 8,713 20,703 52,756 7,418 4,981 67% 7.11 10.59 DEFINITIONS: BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge MEDIA CIRCULATION = audio & video materials, including media reserves EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc. MEMBER = declared major or department member BORROWER = any member who borrowed materials Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers History Department - 12 months - Feb. 2008

  21. Communications Majors FY08/09

  22. Challenges with combining data from variousservices • Little to no linkage of data • Multiple user IDs for authentication

  23. Second Step – Setup an Application Server

  24. What is an Application Server? • A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database.From http://www.webtools.ca.gov/help/Glossary.asp • Web Server Software (Apache or IIS)‏ • Database Management System – DBMS (MySQL, Oracle, MS SQL Server)‏ • Scripting Language (Perl, PHP, ColdFusion, ASP)‏

  25. Why an Application Server? • Relevant data in logfiles need to be in a database to be analyze. • Need your own DBMS to create new tables and queries.

  26. Decide how you will use the Application Server. • Decide on the best and most plausible configuration.

  27. Authentication of ILL and other forms are routed through the EZProxy server

  28. Daily and Weekly Email Reports from the Application Server • Circ Fines Audit Daily Report - Daily at 6:05 AM. • Dupe Patron Record Report - Daily at 5:56 AM. • Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM. • Media Service Scheduling Rooms Report - Daily at 6:02 AM. • Media Services Equipment Pickup Summary - Daily at 7:00 AM. • Received Title Alert - Daily at 6:59 AM. • Reserves Overdues - Daily at 5:59 AM. • Scheduled LIS Tasks - Daily at 6:00 AM. • ILL Borrowing Overdues Report - Weekly at 5:59 AM. • ILL Lending Reports - Weekly at 6:15 AM.

  29. Monthly Email Reports from the Application Server • Circ Fines Audit - Monthly at 6:10 AM. • Circulation by Location and Item Type - Monthly at 6:21 AM. • Circulation Lost and Paid - Monthly at 6:25 AM. • Circulation Online Renewal Count - Monthly at 6:30 AM. • Media Circulation - Monthly at 6:35 AM. • Reserve Circulation - Monthly at 6:40 AM.

  30. On Demand Reports

  31. Lending Services Reports • Lists of patrons with finesbetween $10 and $19.99  • Student and Alumni fines list - Sorted by either Name, Amount or Notice Date. • PALS and Courtesy Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name.    • Lists of patrons with finesover $19.99  • Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes. • PALS and Courtesy Patron fines list - Sorted by Name. • VALE Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name.    • Lists of patrons with overduesolder than 30 days • Student and Alumni overdues list - Sorted by either Name, IID or Notes. • PALS and Courtesy Patron overdues list - Sorted by Name. • All other Patron overdues list except VALE - Sorted by Name.

  32. Lending Services Reports, cont. • Lists of VALE patrons with overduesolder than 6 months • VALE patron overdues list - Sorted by Name. • Miscellaneous Reports • Patrons with the word "Collection Agency" or "CA" in their notes. • Patrons with the word "FINE" in one of their notes. • Patrons with the word "SOILS" in their notes. • Patrons with the word "FALL07 SOILS" in their notes. • Patrons with the word "HOLD" in their notes. • Combined list of HOLD, FINE, and CA. • Circulation Reports by Item Type from 2003 to the present • All Staff. • All Colleges • Undergraduates by Major. • Graduates by Major • Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009

  33. One of Our Projects • Mining EZProxylogfiles and linking to patron statistical categories from the Voyager Patron Database • What majors and departments are accessing which database services? • What majors and departments are accessing the ILL services?

  34. ILL request form authentications by major

  35. Which Databases are accessed by Majors and Departments?

  36. By Major and Host

  37. By Dept and Host

  38. By Dept and Service

  39. IP Address Location = 149.151.VlanID.*

  40. FY08/09 On Campus Hits to Databases by Class C IP Address

  41. Patron Privacy and Standards

  42. Using Voyager as the model for Patron Privacy

  43. Active Circ transactions are stored in a table with patron ID and statistical categories. • Completed Circ transactions are stored in a table without the patron ID, but still with the patron statistical categories. • The Patron Table contains the total counts of transactions for each patron, but no link to which transactions they are.

  44. EZProxy transactions would be stored in one table with patron statistical categories, but without the user ID. • User ID s would be stored in another table with counts for each service divided by academic year. • Logs are collected monthly and loaded and deleted monthly.

  45. Example of EZProxy log entry • Ip address • (Not used) • user id • date/time • Method • page retrieved • Version • response code • no. of bytes • Referring URL • User agent • nj.dhcp.embarqhsd.net • - • theuser • 1/1/2008 4:25:15 AM • GET • http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr • HTTP/1.1 • 302 • 537 • http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)‏

  46. Perl Script for loading ezproxy log into MySQL use strict; my %month=(Jan=>'01',Feb=>'02',Mar=>'03',Apr=>'04',May=>'05',Jun=>'06',Jul=>'07',Aug=>'08',Sep=>'09',Oct=>'10',Nov=>'11',Dec=>'12'); while (<>){ my $pattern = '^(\S*) (\S*) (\S*) (\S*) '. '\[(..)\/(...)\/(....):(..):(..):(..) .....\]'. ' "(\S*) (\S*) (\S*)" '. '(\d*) (-|\d*) "([^"]*)" "([^"]*)"'; if (m/$pattern/){ my ($tgt,$ref,$agt) = (esc($12),esc($16),esc($17)); my $byt = $15 eq '_'?'NULL':$15; print "INSERT INTO ezproxylogs VALUES ('$1','$2','$3',". " TIMESTAMP '$7/$month{$6}/$5 $8:$9:$10','$11','$tgt',". "'$13',$14,$byt,'$ref','$agt');\r."; }else{ print "--Skipped line $.\n"; } } sub esc{ my ($p) = @_; $p =~ s/'/''/g; return $p; }

  47. Created table to assist the linking SELECT PATRON_ADDRESS.ADDRESS_TYPE, Left([ADDRESS_LINE1],InStr([ADDRESS_LINE1],"@")-1) AS usr, PATRON_ADDRESS.PATRON_ID, PATRON_ADDRESS.ADDRESS_STATUS, PATRON_ADDRESS.EFFECT_DATE, PATRON_ADDRESS.EXPIRE_DATE, PATRON_ADDRESS.MODIFY_DATE, PATRON_ADDRESS.MODIFY_OPERATOR_ID INTO emailprefix FROM PATRON_ADDRESS WHERE (((PATRON_ADDRESS.ADDRESS_TYPE)="3"));

  48. Reporting and Standards • Reporting • Emailed periodically - e.g., daily dossiers, and other event triggered reports. • On demand, via email, web pages or a printer. • Standards • Share data for comparative research. • Groups of libraries and consortia

More Related