
Gordon Bell research.microsoft/~gbell mylifebits

MyLifeBits: Personal archive issues Archiving persons & things, past & future in cyberspace not cardboard boxes Imaging Science & Technology Conference on Archiving San Antonio, TX. Gordon Bell http://research.microsoft.com/~gbell http://www.mylifebits.com Microsoft Bay Area Research Center


Presentation Transcript


  1. MyLifeBits: Personal archive issues. Archiving persons & things, past & future, in cyberspace not cardboard boxes. Imaging Science & Technology Conference on Archiving, San Antonio, TX. Gordon Bell http://research.microsoft.com/~gbell http://www.mylifebits.com Microsoft Bay Area Research Center, 22 April 2004

  2. Overview • “Just digitize”-- have “meta-data” sans the morass • Archive.org: “Access to all human knowledge” • Challenge: Archiving Corporate & Personal Lives • California Finder; e.g. Apple Collection • Einstein, Allen Newell, Joshua Lederberg • What small fraction of their lives? • ChM: Collecting companies, computers, & people • MyLifeBits: Realizing Bush’s Memex o(1TB/life) • Dear “appy” and other problems

  3. Some aspects…bottom line • Storage is free: Just move “it*” there with meta-data; Many others are doing it. But will anyone ever find “it”? • Projects: archive.org, million book project, prof. orgs. • Born Digital; LofC & Google; library & institutional capture • When will “born digital” help archiving? What is needed? • Distributed scanning & meta-data creation at ChM • Finding aids; authored web sites for boxes of paper. Moving beyond computerizing “card catalogs” • How do you segment & D.coreize a paper archive? • Value beyond year, title, author, genre? Algorithms needed! • Automatic Dublin Coreization is critical to scale! • Many issues: IP, longevity aka “dear appy”, privacy, …because of the ubiquity of the technology & we can *Our cyber content

  4. The “dear appy” problem Dear Appy, How committed are you? Please come back to me. Forever yours truly, Lost and forgotten data • Who’s responsible? • Media: the 8 track cassette, 8” floppy problem • Platform, file, and maybe a database • Encodings: evolving, incompatible format standards for legacy data that disregard ancestors • App: evolving and/or disappearing apps

  5. By Gordon Bell http://research.microsoft.com/~gbell Dear Appy, How committed are you? Signed, Lost and Forgotten Data Dear Appy, I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that they, as our creators, should be responsible for eternal support. But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? If things continue on their current path, it seems I will be completely gone and un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...

  6. LofC book library o(18Mbooks) in o(50TBytes) or on 150 drives Our 2004 home media centers are 8 TBytes!
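The order-of-magnitude figures on this slide can be checked with a few lines of arithmetic. The sketch below takes the slide's numbers (18M books, 50 TBytes, 150 drives) and assumes decimal units (1 TB = 10^12 bytes); the variable names are illustrative only.

```python
# Slide figures: o(18M books) in o(50 TBytes) on 150 drives.
BOOKS = 18_000_000
TOTAL_BYTES = 50 * 10**12   # assumption: decimal terabytes
DRIVES = 150

bytes_per_book = TOTAL_BYTES / BOOKS     # implied average size of one book
bytes_per_drive = TOTAL_BYTES / DRIVES   # implied capacity of one drive

print(f"{bytes_per_book / 1e6:.2f} MB per book")
print(f"{bytes_per_drive / 1e9:.0f} GB per drive")
```

Under these assumptions the slide implies roughly 2.8 MB per book and drives of about a third of a terabyte each.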

  7. Book picture Capturing content from the physical world

  8. Why preserve an “original” reprint? “Xerox” copy? Or laser printer output? At $25/cu.ft./yr. for 2500 pages ($0.01/page per year)

  9. Archive.org • “Universal access to all human knowledge” • Started as archiving the internet. • Includes 20 WW TV channels • Book scanning as part of million book project • Bookmobile “print anything on demand” • 100s of rock n’ roll bands • 20K accessible movies & video lectures

  10. India: 2 in 2003

  11. Archive.org $2K/terabyte/year $0.0004/page/year Archive.org

  12. Computers (lib of alex)

  13. Million book project US/English:1M, France:200 K, Japan:300 K $10 to scan, $100 to buy, scan, match with a catalog, and endow for future format changes Scanner picture Courtesy Brewster Kahle Archive.org

  14. Rent’s cheap, but it’s hard to get there • Good news: it costs very little to live in cyberspace • Cyberspace: $0.0004/page/yr. You can spend 10-100x! • Physical: $0.01/page/yr. $25/cu.ft./yr. for 2500 pages, provided you don’t access it! • Bad news: it costs too much to move to cyberspace • Books at $10/book or <$0.10/page • Docs $0.10/page scan + $0.10 meta-data (manual); • $250-500/ft. • Cross-over… x years depending on interest rate, etc. • Obvious solutions: • Capture all new material before it gets into physical space • Automatically create meta-data (time, title, genre, author) • Just “Google”
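The "cross-over… x years" point can be made concrete with a small break-even calculation. This is a minimal sketch, not the speaker's own model: it uses the per-page rents from the slide, treats the one-time move cost as an up-front payment, and discounts the yearly saving at a constant interest rate. The function name and the 5% rate in the usage lines are assumptions made for illustration.

```python
import math

# Per-page yearly rents taken from the slide.
PHYSICAL_PER_PAGE_YEAR = 0.01    # $25/cu.ft./yr for 2500 pages
CYBER_PER_PAGE_YEAR = 0.0004

def break_even_years(move_cost_per_page, interest_rate=0.0):
    """Years until the yearly rent saving repays the one-time move cost.

    With a positive interest rate the savings are discounted as an
    annuity; if they can never repay the move cost, math.inf is returned.
    """
    saving = PHYSICAL_PER_PAGE_YEAR - CYBER_PER_PAGE_YEAR
    if interest_rate == 0.0:
        return move_cost_per_page / saving
    ratio = move_cost_per_page * interest_rate / saving
    if ratio >= 1.0:
        return math.inf  # interest on the move cost exceeds the yearly saving
    return -math.log(1.0 - ratio) / math.log(1.0 + interest_rate)

# Books at $0.10/page; documents at $0.10 scan + $0.10 manual meta-data.
print(break_even_years(0.10))         # no interest
print(break_even_years(0.10, 0.05))   # assumed 5% interest
print(break_even_years(0.20, 0.05))   # documents, assumed 5% interest
```

With no interest, books break even after roughly ten years. At the assumed 5% rate, manually catalogued documents at $0.20/page never break even, which is one way to read the slide's case for capturing material before it reaches physical space and creating meta-data automatically.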

  15. Archiving persons and things… • www.oac.cdlib.org for o(1K) corporations, people, places, things. • List of finders, usually -> paper boxes! • E.g. Apple collection at Stanford points to 600’ or say $1K/ft. • www.AlbertEinstein.org Einstein’s papers, etc. • diva.library.cmu.edu/Newell/ for Allen Newell • profiles.nlm.nih.gov/ Nobel Prize winners, Lederberg • www.ComputerHistory.org computing artifacts • www.MyLifeBits.com project to capture entire life

  16. List of finding aids

  17. Or do you put it in a box?

  18. Apple at Stanford

  19. www.alberteinstein.info

  20. Allen Newell page

  21. Lederberg

  22. A giv Number of document segments

  23. :f X/diary.nobel This is a transcript of JL diary note for October 26, 1958 announcement of Nobel Prize I was not keeping a diary in those days but this particular event led me to make notes on it just at the time. Joshua Lederberg. handwritten letter transcribed Mon Oct 5 13:02:49 EDT 1998 Sunday, 26 Oct. (1958) About 11:00 this a.m. I had gone to the lab to clean up the grant applications * I’ve been working on (to the essential exclusion of my lab work in recent weeks!). I’d gotten up rather early, had some coffee for breakfast and left, while Esther was having hers. Last night: an Australian party at home -- the Crawfords from Melbourne (history); Phyllis Rowntree; Maggie Blackwood and the Leslie Osborns (Psychiatry here). I was to work at the lab until about 12:30, then pick up Phyllis and Margaret for lunch and then see Phyllis off to her plane: --> Columbus-->Denver--> SFO-->Sydney. At 11:30 + or there was a call from a Mr. Lindquist of the “Tijding...” newspaper in Stockholm -- the New York correspondent. He explained his call to my astonishment that Beadle, Tatum, and I were to be the co-recipients of the Nobel prize in medicine this year. I was rather incredulous: he insisted the AP was quoting the rumors and he was quite sure it would be announced Thursday. It’s no surprise, of course, that Beadle should be honored this way and it is a perceptive courtesy for Tatum but I am still quite astonished (as I was for the NAS last year) to be added on. I just had the impression that this kind of dignification in biology should go to the venerables and veterans and it is a bit of a shock to be classed that What a mixed list it is! The “distinction” works out to the cash and to the public fuss that somehow has grown up around it. 1908 was Ehrlich Metchnikov; Muller was 1946 and to think of it did NP give him such a fuss!?? ? have to think about scheduling trip to STO in December -- by jet? I suppose just have to concede that all our plans will be upset.
4 lines deleted, family private

  24. Lederberg genre or artifact types: Abstracts Agendas Announcements m Application forms Articles m Autobiographies m Bibliographies m Biographies m Brochures m Certificates m Correspondence m Diaries m Drafts (documents) Drawings m Electronic images m Essays m Eulogies Excerpts Grant proposals Interviews m Invitations Laboratory notebooks m Laboratory notes Lecture notes Lectures m Legal documents m Legislative records Lists Manifestoes Memoirs m Minutes Monographs m Narratives Newsletters Newspaper columns m Notebooks m Notes Obituaries Official reports Oral histories m Petitions Photographic prints m Press releases m Procedures Proceedings m Programs m Proposals m Questionnaires Reminiscences Reports m Resolutions Resumes Reviews m School records Speeches m Summaries Tables (documents) Technical reports m Transcripts m Typescripts Video recordings m

  25. Email as a carrier for many document types Any personal info Calendar, contact Clipping… biographical Correspondence (all) Diary, log, scrapbook Financial, forms, legal Photo, music, video Property Recommendation … Personal library Professional Plan, project, proposal Computer source code Correspondence Org chart Presentation & speeches Ad, announcement, cards (many kinds), certificate, ephemera & memorabilia, instruction, What we don’t know about Lederberg!

  26. More aspects of personal archives will exacerbate content capture • Many new media…besides email • In effect, email is conversation. This adds tremendous noise for retrieval! • Who owns a person’s life? Another person? A company? E.g. VAX Strategy? • Tablets to come will enhance notebook capture • ACM CARPE: Continuous Archival and Retrieval of Personal Experiences

  27. Computer History Museum • 1401 Shoreline, Mountain View

  28. Archiving computing artifacts • Charles Babbage Institute …Smithsonian is similar • 135 collections 8K cu.ft. (20 M pages; 2 TB) • 160 oral histories (30MB/hr =6000 MB) • 150 K photos (@1MB, 150 GB) • Computer History Museum • 6 K physical objects: world’s best artifact collection • 10 K photos • 2 K videos (<1 TB); including recent DV taped interviews • 12 M pages books, manuals, brochures, papers, (1.2 TB) • ?? of executable source & object codes • 200 volunteers & many more world-wide Amateurs versus professionals.
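The storage estimates on this slide rest on a few implied unit sizes. The sketch below backs them out from the slide's own numbers, assuming decimal units (1 MB = 10^6 bytes); the variable names are illustrative, and reading the oral-history total as 6000 MB at 30 MB per hour is an interpretation of the slide's shorthand.

```python
MB, GB, TB = 10**6, 10**9, 10**12   # assumption: decimal units

# Charles Babbage Institute: 20 M pages in 2 TB.
cbi_bytes_per_page = 2 * TB / 20_000_000

# Computer History Museum: 12 M pages in 1.2 TB.
chm_bytes_per_page = 1200 * GB / 12_000_000

# 150 K photos at 1 MB each.
photo_bytes = 150_000 * MB

# Oral histories: 6000 MB at 30 MB/hr, spread over 160 interviews.
oral_history_hours = 6000 * MB / (30 * MB)
hours_per_interview = oral_history_hours / 160

print(cbi_bytes_per_page / 1e3, "KB per page (CBI)")
print(chm_bytes_per_page / 1e3, "KB per page (ChM)")
print(photo_bytes / GB, "GB of photos")
print(oral_history_hours, "hours of oral history,",
      hours_per_interview, "hours per interview")
```

Both page collections work out to the same planning figure of about 100 KB per scanned page, and the photo total matches the 150 GB stated on the slide.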

  29. Artifact (“the machine”) Dormant or operating Hardware or software Project, people, plan Timeline of project Plan, schedule Specification, manuals Design Organization Communication Articles, books Interviews, talks, etc. Business aspects Plan, sales, marketing Ads, brochures, etc. Competitors Use User experience Video about its use Accessibility Raw bits, finding aid Interpreted story Exhibit Computer History Museum Artifact Collecting… the world is bits

  30. ChM Software Acquisition

  31. I am data

  32. Memex: As We May Think, Vannevar Bush, 1945 “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” • Full-text search, text & audio annotations, and hyperlinks

  33. The guinea pig • Gordon Bell is digitizing his life • Has now scanned virtually all: • Books written (and read when possible) • Personal documents (correspondence including memos and email, bills, legal documents, papers written, …) • Photos • Posters, paintings, photo of things (artifacts, …medals, plaques) • Home movies and videos • CD collection • And, of course, all PC files • Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come • Paperless throughout 2002. 12” scanned, 12’ discarded. • Only 30 GB!!!

  34. Capture and encoding

  35. Wearable & interactive jewellery LEDs flash according to sensor type triggered

  36. Potentially useful trivia – but not normally photographed

  37. Kentaro Toyama wwmx.org

  38. gbell wag: 67 yr, 25Kday life

  39. MyLifeBits organization: time and space Archival (time) Working Timeline/ Context(space) Personal (some $s) GB Co.(angel, etc.) Professional ACM, etc., … @Microsoft.com, New co’s.

  40. Radio capture tool Telephone capture tool PocketPC transfer tool PocketRadio player TV capture tool Radio EPG tool TV EPG download tool MAPI interface Legacy email client Browser tool Internet files Legacy applications MyLifeBits Shell IM capture Voice annotation tool Text annotation tool Import files MyLifeBits Software MyLifeBits store database

  41. I mean everything

  42. Timeline view tells a story

  43. Value of media depends on annotations • “It’s just bits until it is annotated”
