
See, Hear, Do: Language and Robots

Far Reaching Research (FRR) Project. See, Hear, Do: Language and Robots. Jonathan Connell (Exploratory Computer Vision Group), Etienne Marcheret (Speech Algorithms & Engines Group), Sharath Pankanti (ECVG), Josef Vopicka (Speech).


Presentation Transcript


  1. Far Reaching Research (FRR) Project
     See, Hear, Do: Language and Robots
     Jonathan Connell, Exploratory Computer Vision Group
     Etienne Marcheret, Speech Algorithms & Engines Group
     Sharath Pankanti (ECVG)
     Josef Vopicka (Speech)
     (Title slide)

  2. Challenge = Multi-modal instructional dialogs
     • Use speech, language, and vision to learn objects & actions
     • Innate perception abilities (objects / properties)
     • Innate action capabilities (navigation / grasping)
     • Easily acquire terms not knowable a priori
     Example dialog:
       "Round up my mug."
       "I don’t know how to “round up” your mug."
       "Walk around the house and look for it. When you find it bring it back to me."
       "I don’t know what your “mug” looks like."
       "It is like this <shows another mug> but sort of orange-ish."
       "OK … I could not find your mug."
       "Try looking on the table in the living room."
       "OK … Here it is!"
     Dialog annotations: command following, verb learning, noun learning, advice taking
     Language Learning & Understanding is an AAAI Grand Challenge
     http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language

  3. Eldercare as an application
     • Example tasks:
       • Pick up dropped phone
       • Get blanket from another room
       • Bring me the book I was reading yesterday
     • Large potential market
       Many affluent societies have a demographic imbalance (Japan, EU, US)
       Institutional care can be very expensive (to person, insurance, state)
     • A little help can go a long way
       Can be supplied immediately (no waiting list for admission)
       Allows person to stay at home longer (generally easier & less expensive)
       Boosts independence and feeling of control (psychological advantage)
     • Note: We are not attempting to address the whole problem
       X Aggressive production cost containment
       X Robust self-recharging and stairs traversal
       X Bathing and bathroom care, patient transfer, cooking
       X OSHA, ADA, FDA, FCC, UL or CE certification

  4. State of the art
     • Indoor navigation: Minerva from CMU, Jose from Univ. British Columbia
     • Perception & manipulation: Herb from CMU / Intel (Kanade), PR2 from Willow Garage
     • Language learning: Ripley from MIT (Deb Roy), HAM from KTH in Sweden
     • Dialog and speech: Honda system from IBM, call center handling from IBM
     • No object perception
     • No manipulation capability
     • Off-line object model generation
     • No natural language interface
     • Either fetch or carry
     • No procedural learning
     • No physical presence or action
     • No visual perception of objects

  5. Business Model
     [Diagram: OEM, IBM, Third Party, customers. Labels: buy hardware; add software and services; $70B / year]

  6. Costs & revenue potential
     • Eldercare market in US (x3 if EU and AP also): 3 million
       Total US population: 300 million
       Ages 75-85: 10%
       Suitable (ability level, desire, finances): 10%
     • Resell robot + value-added software + field service: $24B / yr
     • Manufacturing business ($2000 / robot yr): $6B / yr
     • Services business ($3000 / robot yr): $9B / yr
     • OEM sales price for hardware: $6000
       • Electromechanical parts: $1300
       • Onboard computer: $500
       • Assembly (15 hrs x $80 / hr): $1200
       • + 30% Sales & distribution + 20% profit: $3000
     • Value-added wholesale price (w/ software): $15,000
       • 10% Continued R&D: $1500
       • 30% Sales & distribution: $4500
       • 20% Profit: $3000
       Price = Less than a new car
     • Total cost of ownership: $8000 / yr
       • Lifetime = 3 years: $5000 / yr
       • Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
       • Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
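The arithmetic on slide 6 can be checked with a short script. This is a minimal sketch, not part of the presentation: all dollar amounts and percentages are copied from the slide, while the variable names and the assumed derivations (for example, that $2000 / robot yr is the $6000 hardware price spread over the 3-year lifetime, and that $24B / yr is 3 million robots times the $8000 / yr cost of ownership) are interpretations the slide does not state explicitly.

```python
# All dollar figures and percentages are copied from slide 6.
# Variable names and the derivations marked "assumed" are illustrative only.

# Market size: US population -> ages 75-85 -> suitable candidates.
us_population = 300_000_000
aged_75_85 = us_population * 10 // 100        # 10% of the population
robots = aged_75_85 * 10 // 100               # 10% suitable

# OEM hardware price: build cost plus 30% sales/distribution and 20% profit,
# both taken as a share of the $6000 sales price.
oem_price = 6000
build_cost = 1300 + 500 + 15 * 80             # parts + computer + assembly
oem_margin = oem_price * (30 + 20) // 100
assert build_cost + oem_margin == oem_price

# Value-added wholesale price: hardware plus 10% R&D, 30% sales, 20% profit.
wholesale_price = 15_000
value_added = wholesale_price * (10 + 30 + 20) // 100
assert oem_price + value_added == wholesale_price

# Total cost of ownership per robot-year.
lifetime_years = 3
purchase_per_year = wholesale_price // lifetime_years
service_per_year = 15 * 50 * 4                # hrs/quarter x $/hr x quarters
tco_per_year = purchase_per_year + service_per_year
effective_wage = tco_per_year / (40 * 50)     # dollars per hour of availability

# Market totals per year (assumed: per-robot figure times number of robots).
manufacturing_market = robots * (oem_price // lifetime_years)
services_market = robots * service_per_year
total_market = robots * tco_per_year

print(f"robots: {robots:,}")
print(f"total cost of ownership: ${tco_per_year:,} / yr")
print(f"effective wage: ${effective_wage:.2f} / hr")
print(f"manufacturing: ${manufacturing_market / 1e9:.0f}B / yr")
print(f"services: ${services_market / 1e9:.0f}B / yr")
print(f"total: ${total_market / 1e9:.0f}B / yr")
```

Under these assumptions every subtotal agrees with the slide: the hardware and wholesale breakdowns sum to $6000 and $15,000, and the three market figures come out to $6B, $9B, and $24B per year.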

  7. Sample business case
     • Home eldercare now (employer costs): $25,000 / yr
       • 1 aide from 8am to 6pm = 10 hrs
       • 50 wks x 5 days / wk x 10 hrs / day = 2500 hrs / yr
       • Federal min. wage = $7.25 / hr
       • +38% overhead (FICA + 401K + medical) = $10 / hr
       • Aide’s activities:
         • Help with clothes, hygiene, meals
         • Odd tasks such as fetching objects
         • Sitting around watching TV
     • Alternative: Half-time aide + robot: $20,500 / yr
       Human still helps with clothes, hygiene, meals
       Robot potentially available after hours and on weekends
       No problem with robot Training, Turnover, and Trust (stealing)
     • Value proposition (to client): 30% more hours @ 10% less cost
       Split savings with customer ($50,000 → $45,000 per client)
       Human 5 hrs + robot 8 hrs = 13 hrs / day during week
       10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
       Bill at $20,000 - $3000 service = $17,000 / yr revenue → 10.6 months payback on $15,000 purchase
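The business-case numbers on slide 7 can be reproduced the same way. The sketch below uses only figures quoted on the slides; the variable names are hypothetical, and reading "Half-time aide + robot" as half of the $25,000 aide cost plus the $8000 / yr robot cost of ownership from slide 6 is an assumption that happens to match the slide's $20,500.

```python
# Figures are copied from slides 6 and 7; variable names are illustrative.

# Current cost of a full-time aide.
aide_hours_per_year = 50 * 5 * 10             # weeks x days x hours per day
loaded_wage = 7.25 * 1.38                     # about $10.005; slide rounds to $10
aide_cost = aide_hours_per_year * 10          # using the slide's rounded $10 / hr

# Alternative: half-time aide plus robot (assumed: half the aide cost
# plus the $8000 / yr total cost of ownership from slide 6).
robot_tco = 8000
half_time_plus_robot = aide_cost // 2 + robot_tco

# Value proposition to the client.
hours_now = 10
hours_with_robot = 5 + 8                      # human hours + robot hours
extra_hours_pct = (hours_with_robot - hours_now) * 100 // hours_now
client_saving_pct = (50_000 - 45_000) * 100 // 50_000

# Payback on the $15,000 robot purchase.
robot_revenue = 20_000 - 3_000                # billing minus service cost
payback_months = 15_000 / robot_revenue * 12

print(f"aide cost now: ${aide_cost:,} / yr")
print(f"half-time aide + robot: ${half_time_plus_robot:,} / yr")
print(f"extra hours: {extra_hours_pct}%, client saving: {client_saving_pct}%")
print(f"payback: {payback_months:.1f} months")
```

The slide's 22% profit increase and $6.6B / yr figure are not recomputed here, because the slide does not give the baseline profit they are measured against.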

  8. What’s different and important
     • Speech-driven interface
       • No headset required (far field), can learn new nouns and verbs
     • Multi-modal dialog
       • Responds to gestures, exploits synergies between modalities
     • Manipulation as well as mobility
       • Not just a walking telephone, can do useful physical work also
     • One-shot learning
       • No turntable scanning, not 100’s of examples, no trial-and-error experiments
     • Cost containment
       • Vision instead of special-purpose sensors and precise mechanicals
