
Automation and Customization of Rendered Web Pages

Michael Bolin, Greg Little, Marcos Ojeda, Matt Webber, Philip Rha, Tom Wilson, Rob Miller (MIT CSAIL). http://uid.csail.mit.edu/chickenfoot Supported by NSF IIS-0447800.


Presentation Transcript


  1. Automation and Customization of Rendered Web Pages • Michael Bolin, Greg Little, Marcos Ojeda, Matt Webber, Philip Rha, Tom Wilson, Rob Miller • MIT CSAIL • http://uid.csail.mit.edu/chickenfoot • Supported by NSF IIS-0447800

  2. Web Applications • The Web has become a major application platform

  3. Automating Repetitive Operations • Bookmark my latest bank statement • Download many links at once • Fill in defaults for forms

  4. Transforming Appearance • Change color scheme for better contrast • Concatenate multiple pages

  5. Integrating Multiple Web Sites • Bookstore has links for New Books, Used Books, Auction… but not for my local library • Realtor has lots of data about houses for sale… but not length of my commute

  6. Web Apps Are Wonderfully Open • Web apps have automatic hooks for scripting • Display: machine-readable HTML • Commands: generic HTTP requests • Presentation: editable HTML, stylesheets • Web “screen scraping” is already common, mainly behind the scenes (e.g., pricescan.com) • But most users don’t do it

  7. Problem: Many Web Apps Require a Browser • Many web apps depend on the rich browser environment: cookies, authentication, SSL, session IDs, plugins, user-agents, client-side scripting, proxies • Perl/Python scripts run outside the browser, so they can’t easily access these web apps • Solution: do customization in the browser, e.g., Greasemonkey for Firefox, User Javascript for Opera

  8. Problem: Web Apps Are Scary Under the Hood • HTML source of most sites is complex • This complexity is a real barrier to automation & customization

  9. Solution: Use Rendered View • Chickenfoot: user shouldn’t have to look at HTML source to customize the Web

  10. Outline • Demo • Language • Commands • Keyword patterns • Implementation • Pattern matching algorithm • Evaluation

  11. Chickenfoot Language • Chickenscratch = Javascript + runtime library • Javascript syntax • Standard browser objects: document.links[], window.open() • Document Object Model (DOM): Node, Element, Text, Range • Chickenfoot-specific objects and commands

  12. Commands • Page navigation: go(url), openTab(url), fetch(url) • Clicking and form manipulation: click(button-or-link), check(checkbox-or-radio), enter([textbox], value), pick([listbox], choice) • Pattern matching: find(pattern) • Page modification: insert(pattern, html), replace(pattern, html), remove(pattern) • Widgets & input handling: new Link(html, action), onClick(pattern, action)
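
A minimal sketch of how the navigation and form commands above might compose into a script. The real commands exist only inside the Chickenfoot browser environment, so the three functions below are hypothetical stubs that merely record each call; the URL and the pattern strings are placeholders, not taken from the talk.

```javascript
// Hypothetical stand-ins for the Chickenfoot runtime so the script can run
// outside the browser; each stub only records what it was asked to do.
const callLog = [];
const go = (url) => callLog.push(`go ${url}`);
const enter = (textbox, value) => callLog.push(`enter "${value}" into ${textbox}`);
const click = (target) => callLog.push(`click ${target}`);

// The script itself: open a page, fill in a search box, press the button.
go("http://www.example.com/books"); // placeholder URL (assumption)
enter("search textbox", "design patterns"); // enter([textbox], value)
click("search button"); // click(button-or-link)

console.log(callLog.join("\n"));
```

Note that the script names page elements by visible keywords rather than by HTML attributes, which is the point of the rendered-view approach.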

  13. Keyword Patterns • Keywords + component type, e.g., “feeling lucky button”, “depart textbox”, “search web form” • Component type is optional for click(), enter(), check(), pick() • Nested pattern matching: find("start address form").find("city textbox")
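
One way to read "keywords + component type" is that a trailing word naming a component type is split off from the keywords. The sketch below illustrates that reading only; `parsePattern` and the `COMPONENT_TYPES` list are hypothetical, and treating a lone word as a keyword rather than a type is an assumption.

```javascript
// Assumed list of component type names, drawn from the commands and examples
// in the talk; the real runtime may recognise a different set.
const COMPONENT_TYPES = new Set([
  "button", "link", "textbox", "checkbox", "radio", "listbox", "form",
]);

// Split a keyword pattern into its keywords and an optional component type.
function parsePattern(pattern) {
  const words = pattern.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const last = words[words.length - 1];
  // Only treat the last word as a type when at least one keyword precedes it.
  if (words.length > 1 && COMPONENT_TYPES.has(last)) {
    return { keywords: words.slice(0, -1), type: last };
  }
  return { keywords: words, type: null };
}

console.log(parsePattern("feeling lucky button"));
console.log(parsePattern("Google Search"));
```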

  14. Keyword Patterns vs. Other Names • Keyword pattern: “all words textbox” • Javascript: document.f.as_q • XPath: //body/form/table[1]/tbody/tr/td/table/tbody/tr[0]/td/table/tbody/tr/td[1]/table/tbody/tr[0]/td[1]/input • HTML source: …<td>with <b>all</b> of the words</font></td> <td><input value="" name="as_q" size="25" type="text">…

  15. Pattern Matching Algorithm • Find labels matching the keywords • Find components matching each label • Rank & choose best • [Figure: a pattern (“google search button”) and a web page are fed to the matcher, which outputs a ranked list of components with scores 1.0, 0.5, 0.5]

  16. Step 1: Find Labels Matching Keywords • Label = visible chunk of text: text nodes; button labels, listbox items; ALT attributes on images • Tolerant matching: capitalization, word ordering, punctuation, typos • Example label text spanning markup: with <b>all</b> of the words
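
The tolerance to capitalization, word ordering, punctuation, and typos described above can be sketched as follows. This is an illustration, not the algorithm Chickenfoot actually uses: the scoring formula (fraction of keywords found), the use of Levenshtein distance, and the one-edit typo threshold for words of four or more letters are all assumptions, and the function names are hypothetical.

```javascript
// Lower-case the text, drop punctuation, and split into words.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter(Boolean);
}

// Levenshtein edit distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance for prefixes (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for prefixes (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Exact match, or a single edit for words long enough to make that meaningful.
function wordMatches(keyword, word) {
  if (keyword === word) return true;
  return Math.min(keyword.length, word.length) >= 4 && editDistance(keyword, word) <= 1;
}

// Fraction of keywords that appear somewhere in the label, in any order.
function labelScore(keywords, labelText) {
  const wanted = normalize(keywords);
  const labelWords = normalize(labelText);
  if (wanted.length === 0) return 0;
  const found = wanted.filter((k) => labelWords.some((w) => wordMatches(k, w)));
  return found.length / wanted.length;
}

console.log(labelScore("Words, ALL", "with all of the words"));
console.log(labelScore("serch", "Search"));
```

Here the label text is assumed to be the visible text with markup already stripped, so "with <b>all</b> of the words" is compared as "with all of the words".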

  17. Step 2: Find Component Matching Label • Search in rendered view • Component must be aligned with label • Degree of match given by: pixel distance, relative position, HTML path length
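
The three factors named above (pixel distance, relative position, HTML path length) could be combined into a single score along these lines. The talk does not give a formula, so everything here is an assumption made for illustration: the box representation, the alignment test, the 100-pixel distance scale, the 0.5 penalty, and the multiplicative combination. All names are hypothetical.

```javascript
// Boxes are {left, top, right, bottom} in rendered-page pixels.

// Shortest gap between two boxes (0 if they touch or overlap).
function gapBetween(a, b) {
  const dx = Math.max(0, a.left - b.right, b.left - a.right);
  const dy = Math.max(0, a.top - b.bottom, b.top - a.bottom);
  return Math.hypot(dx, dy);
}

// Treat two boxes as aligned when they share part of a row or of a column.
function isAligned(label, component) {
  const overlapY = Math.min(label.bottom, component.bottom) - Math.max(label.top, component.top);
  const overlapX = Math.min(label.right, component.right) - Math.max(label.left, component.left);
  return overlapY > 0 || overlapX > 0;
}

// Score in [0, 1]; higher means the label more plausibly names the component.
function componentScore(label, component, htmlPathLength) {
  if (!isAligned(label, component)) return 0;
  const distanceTerm = 1 / (1 + gapBetween(label, component) / 100);
  // Assumed preference: a label to the left of or above its component.
  const labelComesFirst = label.right <= component.left || label.bottom <= component.top;
  const positionTerm = labelComesFirst ? 1 : 0.5;
  const pathTerm = 1 / (1 + htmlPathLength);
  return distanceTerm * positionTerm * pathTerm;
}

const label = { left: 0, top: 0, right: 50, bottom: 20 };
const nearBox = { left: 60, top: 0, right: 160, bottom: 20 };
const farBox = { left: 400, top: 0, right: 500, bottom: 20 };
console.log(componentScore(label, nearBox, 2) > componentScore(label, farBox, 2));
```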

  18. Step 3: Rank the Matching Components • Rank score for each <label,component> pair is computed from the match between keywords and label and the match between label and component • Highest-ranked component is returned • If there’s a tie, find() returns the ambiguous matches, but click/enter/pick/check() throw an error
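
The ranking and tie behaviour described above can be sketched like this. Multiplying the two match scores is an assumption (the talk only says the rank is computed from both), and `findSketch` and `clickSketch` are hypothetical stand-ins for the real find() and click(), which operate on live pages rather than on plain objects.

```javascript
// candidates: [{ component, keywordScore, componentScore }]
// Returns the candidates with a combined score, best first.
function rank(candidates) {
  return candidates
    .map((c) => ({ component: c.component, score: c.keywordScore * c.componentScore }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score);
}

// find()-like behaviour: return every component tied for the top score.
function findSketch(candidates) {
  const ranked = rank(candidates);
  if (ranked.length === 0) return [];
  return ranked.filter((c) => c.score === ranked[0].score).map((c) => c.component);
}

// click()-like behaviour: insist on exactly one best match.
function clickSketch(candidates) {
  const best = findSketch(candidates);
  if (best.length !== 1) {
    throw new Error(`expected exactly one match, found ${best.length}`);
  }
  return best[0];
}

const clearWinner = [
  { component: "Google Search", keywordScore: 1.0, componentScore: 1.0 },
  { component: "I'm Feeling Lucky", keywordScore: 0.5, componentScore: 1.0 },
];
const tied = [
  { component: "first textbox", keywordScore: 0.5, componentScore: 1.0 },
  { component: "second textbox", keywordScore: 1.0, componentScore: 0.5 },
];
console.log(clickSketch(clearWinner));
console.log(findSketch(tied));
```

Separating the tolerant query from the strict actions means a script can inspect ambiguous matches with the find-style call before deciding how to refine the pattern.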

  19. Evaluation • Web-based survey of textbox naming • 40 respondents (24 programmers, rest not) • Comprehension: which textbox on the page is identified by this pattern? • Generation: how would you identify this textbox uniquely using only words visible on the page?

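  20. Results of Generation Task • Chart legend: patterns for which algorithm found right match, wrong match, multiple matches • Counts per textbox as extracted from the chart (each row sums to the 40 respondents): 40 0 0; 40 0 0; 38 2 0; 40 0 0; 37 2 1; 0 26 14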

  21. Disambiguation Strategies • Keywords from section heading: “above person not available Mi” • Counting: “second mi” (for components with the same caption)

  22. Future Work • More component types for patterns (e.g., box, table, image) • Programming by demonstration • Pointing at page to generate patterns • Clicking & form filling to generate scripts • Javascript syntax extensions

  23. Conclusion • Chickenfoot automates and customizes web applications without looking under the hood • Simple language • Keyword patterns • Development environment in web browser • http://uid.csail.mit.edu/chickenfoot
