Case Study: RFA Migration

How I migrated 208,566 news stories from Bricolage to Plone. Alex Clark • http://aclark.net • March 12, 2008 • Plone Symposium East.


Presentation Transcript


  1. Case Study: RFA Migration How I migrated 208,566 news stories from Bricolage to Plone. Alex Clark • http://aclark.net March 12, 2008 • Plone Symposium East

  2. Who Am I? • Plone Consultant • Non-profits in DC • Foundation Member • Zope/Python Users Group of DC (ZPUGDC) Events Organizer • “UNIX guy”, sysadmin, Bachelor of Science in Computer Science, not really a programmer.

  3. What is this? • An example of a “successful” migration, YMMV (your mileage may vary). • Inspiration-a-palooza! If I can do it, anyone can. • An opportunity to learn from my mistakes. • Analyses at the end. • XXX: News ‘story’ not ‘news item’ ;-) • i.e. rfasite product ‘story’ content type, not Plone default content type ‘news item’. • Medium to large size migration

  4. What this is not • Plone vs. Bricolage. • How to: <your migration>. • Best practice (OK, maybe some best practice.)

  5. Radio Free Asia • RFA is a private, nonprofit corporation that broadcasts news and information in nine native Asian languages to listeners who do not have access to full and free news media. The purpose of RFA is to provide a forum for a variety of opinions and voices from within these Asian countries. • Our Web site adds a global dimension to this objective. If you have comments, questions or suggestions, please contact us…

  6. Before

  7. After • Not yet! ;-)

  8. Pre-migration decisions, i.e. how to get the data out of the old site? • Relational database “content”? • No one understood the Bricolage data model. • HTTP? • I didn’t want to crawl the website. • “Baked” content on the file system. • Provided the clearest migration path. • find /var/www/rfa -name index.html

  9. Zopectl run, then what? • Need a way to structure the migration of 10 different language services • e.g. zopectl run mandarin.py. • Need to ‘walk’ the file system. • i.e. how do we find the stories. • Need a way to parse the html on the file system, • i.e. we can’t shove the entire index.html into the body via setText() • Need to do Unicode conversions. • E.g. from Big5, euc_kr, gb2312, ascii to Unicode.
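
The Unicode conversion step can be sketched as follows. This is a minimal illustration in Python 3, not the code from the talk; the helper name `to_unicode` is hypothetical, and it assumes each language service knows the encoding of its own files (one of Big5, euc_kr, gb2312 or ASCII, as listed above).

```python
def to_unicode(raw, encoding):
    """Decode raw bytes read from disk using the service's known encoding.

    Hypothetical helper: a failed decode raises ValueError instead of
    being silently ignored, so bad files show up during the migration.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(f"could not decode as {encoding}: {exc}") from exc


if __name__ == "__main__":
    # Simulate bytes as they might sit on the file system.
    raw = "新聞".encode("big5")
    print(to_unicode(raw, "big5"))
```

Decoding strictly, rather than with `errors="ignore"`, is a deliberate choice here: a story that cannot be decoded is reported rather than imported with missing characters.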

  10. Zopectl run, then what? • Use Framework for performing asynchronous tasks, http://www.simplistix.co.uk/software/zope/stepper • Use os.walk, http://docs.python.org/lib/os-file-dir.html (in particular cb2_examples/cb2_2_16_sol_1.py) • Use HTML parsing, http://docs.python.org/lib/module-sgmllib.html (in particular diveintopython-5.4/py/BaseHTMLProcessor.py) • Use Unicode conversions, http://docs.python.org/lib/standard-encodings.html
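
Walking the file system and parsing the HTML might look roughly like this. It is a sketch under assumptions, not the talk's code: it is written for Python 3, where `sgmllib` no longer exists, so the standard `html.parser` module is swapped in; the names `find_stories`, `TitleExtractor` and `extract_title` are hypothetical, and pulling the title out of `<title>` is only an example of extracting one field.

```python
import os
from html.parser import HTMLParser


def find_stories(root):
    """Yield the path of every index.html found below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "index.html" in filenames:
            yield os.path.join(dirpath, "index.html")


class TitleExtractor(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the stripped title text, or an empty string if none is found."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()
```

The same pattern (a flag set in `handle_starttag`, text collected in `handle_data`) extends to the story body or byline, provided the baked pages mark those regions with recognizable tags.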

  11. Stepper Basics • Allows you to break your migration into pieces. • Commits transactions for you. • zopectl run run.py site-object steps-or-chains

  12. Stepper config.py

  13. Basic Results • The ‘create’ step creates the site structure based on a list of categories defined in categories.py • The ‘migrate’ step walks the file system looking for index.html files, then • Extracts the contents • Invokes the Factory on the new object in the context of the category. • Calls mutators to insert content into fields, • E.g. obj.setTitle(title_extracted)

  14. Intermediate Results (How to: Promise Too Much) • Slug-i-fication: Turning • /english/news/symposium_talks_rfa/2008/03/12/index.html into • /english/news/20080312-symposium_talks_rfa.html • Change “category” names, e.g. from • /english/news to • /english/exciting_news. • Import audio and image files from file system • insert into story fields and/or story folders (stories are folderish). • Featured audio or image, vs. inline audio or image.
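
The slug-i-fication and category rename can be sketched as a pure path transformation. This is an illustration in Python 3, not the talk's implementation; the function name `slugify_path` and the `renames` mapping are hypothetical, and the sketch assumes every story path has the layout /language/category.../slug/YYYY/MM/DD/index.html shown above.

```python
def slugify_path(old_path, renames=None):
    """Turn /lang/cat/slug/YYYY/MM/DD/index.html into /lang/cat/YYYYMMDD-slug.html.

    renames optionally maps an old category path to a new one,
    e.g. {"/english/news": "/english/exciting_news"}.
    """
    parts = old_path.strip("/").split("/")
    if len(parts) < 6 or parts[-1] != "index.html":
        raise ValueError(f"not a story path: {old_path!r}")
    *category, slug, year, month, day, _ = parts
    if not (year.isdigit() and month.isdigit() and day.isdigit()):
        raise ValueError(f"no date found in: {old_path!r}")
    category_path = "/" + "/".join(category)
    if renames:
        category_path = renames.get(category_path, category_path)
    return f"{category_path}/{year}{month}{day}-{slug}.html"


if __name__ == "__main__":
    old = "/english/news/symposium_talks_rfa/2008/03/12/index.html"
    print(slugify_path(old))
    print(slugify_path(old, {"/english/news": "/english/exciting_news"}))
```

Keeping the transformation free of any Plone calls means it can be tested against a list of real paths before a single object is created.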

  15. Advanced Results (How to: Really Promise Too Much) • Related Links • At the bottom of each story are related links. • Slug-i-fy then insert them inline. • Slug-i-fy, change the category, then insert them inline.

  16. No, Really… • I promised too much.

  17. The RFA Migration Story • 10 Language Services • 208,566 stories • 5 Different encodings • 70GB of content on the file system • Hundreds of categories

  18. The RFA Migration - E! True Hollywood Story • Images everywhere • /english/category/story/2008/01/01/index.html has image • /english/category/story/2008/01/01/foo.jpg and • /english/images/foo.jpg • Audio everywhere • Duplicate stories everywhere • Stories published as • /english/category/story/2008/01/01/index.html were also published as • /english/category2/story/2008/01/01/index.html.
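
One way to spot the duplicate stories before importing them is to group paths by everything except the category. This is a sketch under a stated assumption, not the fix used in the talk: it treats two paths as the same story when language, slug and date match, as in the category/category2 example above. The name `group_duplicates` is hypothetical.

```python
from collections import defaultdict


def group_duplicates(paths):
    """Group story paths that differ only in their category.

    Returns a dict mapping (language, slug, year, month, day) to the list
    of paths sharing that key, keeping only keys with more than one path.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) < 6 or parts[-1] != "index.html":
            # Not in the assumed story layout (e.g. a section front page).
            continue
        language = parts[0]
        slug, year, month, day = parts[-5:-1]
        groups[(language, slug, year, month, day)].append(path)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Matching on slug and date alone can produce false positives if two unrelated stories share both, so the groups are better treated as candidates for review than as a list to delete.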

  19. Sidebar: Buildout vs. Buildit • Shortly after this project began, Buildout became the de facto standard for deploying a Plone site. • Deploy migration code and sample data with your buildout. • e.g. bin/buildout -c migration.cfg • where migration.cfg installs your migration code and sample data • Even better: bin/migrate

  20. And now the moment you have all been waiting for! • Run buildout • Add site • Configure migration • Run migration

  21. Run buildout and add site

  22. Configure migration; run migration

  23. Runme.py

  24. Site wide results

  25. Individual story results

  26. Showcase of all language services

  27. Wrap up • Unexpected results • Avoidable problems • General wrap up

  28. Unexpected results • Missing content • Wrong content • Silent failures

  29. Quick Fix for date!

  30. Quick Fix for duplicates!

  31. Quick Fix for broken content!

  32. Avoidable problems • Don’t promise too much • Don’t write bad code (read: bare try/excepts, etc.) • Don’t write slow code (use string methods over regular expressions, etc.)
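
The two code-quality points can be illustrated briefly. This is a hypothetical sketch in Python 3 (the names `read_story` and `is_story_path` are not from the talk): it catches only the exceptions it expects and logs them instead of using a bare `except:`, and it uses a plain string method where a regular expression is not needed.

```python
import logging

log = logging.getLogger("migration")


def read_story(path, encoding):
    """Return the decoded file contents, or None if the file is unusable.

    Only the expected failures are caught, and each one is logged,
    so a broken story is visible instead of failing silently.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read().decode(encoding)
    except (OSError, UnicodeDecodeError) as exc:
        log.error("skipping %s: %s", path, exc)
        return None


def is_story_path(path):
    """Check the file name with a string method rather than a regex."""
    return path.endswith("/index.html")
```

Any other exception (a typo, a wrong attribute name) still propagates, which is the point: a bare `except:` would have hidden it along with the failures worth skipping.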

  33. General Wrap-up • Client is happy • May actually launch soon • Huge rewards • Great learning experience • This talk • Help others • Things I would do differently? • unrestrictedTraverse instead of app.rfa['english']['news']['20080101-slug.html']

  34. Questions/Comments? • Email me: aclark@aclark.net • http://aclark.net • ACLARK.NET, LLC
