Project 9 Automatic Fingersign to Speech Translator

Project 9 Automatic Fingersign to Speech Translator. Final Presentation. The group. Lale Akarun. Oya Aran. Alp Kindiroglu. Alexey Karpov. Milos Zeleny. Marek Hruz. Hasim Sak. Pavel Campr. Erinc Dikici. Daniel Schorno. Zdenek Krnoul. Alexander Ronzhin. Objectives & System Flowchart.

Presentation Transcript


  1. Project 9 Automatic Fingersign to Speech Translator. Final Presentation

  2. The group • Lale Akarun • Oya Aran • Alp Kindiroglu • Alexey Karpov • Milos Zeleny • Marek Hruz • Hasim Sak • Pavel Campr • Erinc Dikici • Daniel Schorno • Zdenek Krnoul • Alexander Ronzhin

  3. Objectives & System Flowchart • Finger spelling <-> Speech (F2S & S2F) • Translation between Russian, English, Czech, Turkish

  4. Finger Spelling Recognition • Multilingual fingersign alphabet database • Turkish alphabet (5 subjects) • Czech alphabet (4 subjects) • Russian alphabet (2 subjects) • Numbers and special stop signs

  5. Finger Spelling Recognition • Semi-automatic annotation module: • 11 videos, each 15-30 minutes • Annotation steps: Filter Images -> Crop Sign-Space -> Select Keyframes -> Segment Hand Locations

  6. Finger Spelling Recognition • Skin color based hand detection • Initialization of model by movement of hands • Flowchart: Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)
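A minimal sketch of a histogram-based skin color model, assuming the model is a normalized hue histogram built from pixels sampled on the moving hands. The slides do not say which color space or model was used, so the bin count, the hue-only representation, and the names `hue_bin`, `build_hue_histogram`, and `backproject` are illustrative assumptions.

```python
import colorsys

BINS = 16  # assumed histogram resolution, not taken from the slides


def hue_bin(rgb):
    """Map an (R, G, B) pixel with 0-255 channels to a hue histogram bin."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return min(int(hue * BINS), BINS - 1)


def build_hue_histogram(sample_pixels):
    """Build a hue histogram scaled so that the most frequent bin equals 1.0."""
    hist = [0.0] * BINS
    for rgb in sample_pixels:
        hist[hue_bin(rgb)] += 1.0
    peak = max(hist) or 1.0  # avoid division by zero on empty input
    return [count / peak for count in hist]


def backproject(image, hist):
    """Replace each pixel by the histogram value of its hue (skin likelihood)."""
    return [[hist[hue_bin(pixel)] for pixel in row] for row in image]
```

Thresholding the backprojected image gives a skin mask; the same likelihood image is the usual input to a Camshift-style tracker.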

  7. Finger Spelling Recognition • Tracking of the hands by Camshift • Hierarchical hand and face redetection • Hand segmentation • Backprojection • Double Differencing • Flowchart: Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)
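Double differencing can be sketched as follows, assuming grayscale frames stored as nested lists of intensities. A pixel is marked as moving only if it differs from both the previous and the next frame, which localizes the moving object at its position in the current frame. The threshold value and function names are assumptions for illustration.

```python
def abs_diff(frame_a, frame_b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [
        [abs(pa - pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def double_difference(prev_frame, curr_frame, next_frame, threshold=20):
    """Return a binary mask of pixels that changed in both frame pairs."""
    d_prev = abs_diff(curr_frame, prev_frame)
    d_next = abs_diff(next_frame, curr_frame)
    return [
        [1 if (a > threshold and b > threshold) else 0 for a, b in zip(ra, rb)]
        for ra, rb in zip(d_prev, d_next)
    ]
```

Combining this motion mask with a skin likelihood mask is one plausible way to separate a moving hand from a static skin-colored background.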

  8. Finger Spelling Recognition • Two tier classification: • Keyframe Selection • Gesture Recognition • Detection of Keyframes: • Motion of Hands • Displacement of tracked hand centers • Changes in hand external contour • Image Blur • Strength of gradient trace around hand contours • Flowchart: Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)
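A minimal sketch of keyframe selection from the displacement of tracked hand centers, assuming a keyframe is the middle frame of a run in which the hand is nearly still. The slides also list contour change and image blur as cues; those are omitted here, and the thresholds `max_step` and `min_run` are hypothetical.

```python
import math


def select_keyframes(centers, max_step=2.0, min_run=3):
    """Return indices of frames in the middle of low-motion runs.

    centers: list of (x, y) hand centers, one per frame.
    max_step: largest per-frame displacement still counted as "still".
    min_run: minimum number of consecutive still frames for a keyframe.
    """
    still = [False]  # frame 0 has no predecessor to compare against
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        still.append(math.hypot(x1 - x0, y1 - y0) <= max_step)

    keyframes, start = [], None
    for i, flag in enumerate(still + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                keyframes.append((start + i - 1) // 2)
            start = None
    return keyframes
```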

  9. Finger Spelling Recognition • Hand gesture descriptors: • Radial Distance Functions • Elliptic Fourier Descriptors • Local Binary Patterns • Hu Moments • Classification of each feature is done by KNN. • Classified results for each feature are fused by voting. • Optional word level fusion with Levenshtein Distance. • Flowchart: Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)
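The fusion steps can be sketched as below, assuming each descriptor's KNN classifier has already produced one letter label per keyframe. Majority voting picks the letter, and the optional word-level step snaps the spelled word to the closest vocabulary entry by Levenshtein distance. The function names and the tie-breaking rule (first label seen wins) are assumptions, not details from the slides.

```python
from collections import Counter


def fuse_by_voting(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def levenshtein(a, b):
    """Edit distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def correct_word(word, vocabulary):
    """Return the vocabulary entry closest to the recognized word."""
    return min(vocabulary, key=lambda entry: levenshtein(word, entry))
```

For example, per-descriptor labels `["A", "B", "A", "C"]` fuse to `"A"`, and a misrecognized `"AMSTERDEM"` maps back to `"AMSTERDAM"` when that word is in the vocabulary.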

  10. Speech Recognition • Continuous speech recognition: • A weighted finite-state transducer based speech decoder • 3-gram language model • 100K vocabulary size • News portal based • 10843 tri-phone HMM states • 11 Gaussians for acoustic model • 188 hours broadcast news speech data

  11. Speech Recognition • Voice Activity Detection (VAD) • Preprocessing step for continuous ASR • Identifies false voice triggers • Employed methods: • Rabiner's method: energy level and zero-crossing rates of the acoustic waveform • Supervised learning: energy level of the signal modeled using GMMs
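A simplified sketch of an energy and zero-crossing-rate decision for a single audio frame, in the spirit of the first method. This is not the full endpoint-detection procedure: the thresholds are fixed hypothetical values rather than being estimated from background noise, and samples are assumed to be floats in [-1, 1].

```python
def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Flag a frame as speech.

    High energy alone counts as speech (voiced sounds). A weaker frame
    still counts if its zero-crossing rate is high, which is typical of
    unvoiced fricatives.
    """
    energy = short_time_energy(frame)
    if energy > energy_threshold:
        return True
    return energy > energy_threshold / 10 and zero_crossing_rate(frame) > zcr_threshold
```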

  12. Speech Recognition • Isolated speech recognition: • Phoneme based speech recognition • Represented by HMMs using GMMs • Used for out-of-vocabulary words • Speech Commands allow module control

  13. Server • Python based web service • Handles input/output from multiple modules • Users communicate using sessions • All messages in UTF-8 encoding or transcribed form • Translation of sentences handled by Google Translate • Message types: • Letter • Word • Sentence
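One way to picture the messages exchanged with the server is a small typed envelope serialized as UTF-8 JSON. The slides give only the three message types and the encoding, so the field names (`session`, `kind`, `language`, `text`) and the JSON wire format are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

MESSAGE_TYPES = {"letter", "word", "sentence"}  # the three types on the slide


@dataclass
class Message:
    session: str   # hypothetical session identifier
    kind: str      # one of MESSAGE_TYPES
    language: str  # hypothetical language tag, e.g. "tr" or "cs"
    text: str

    def __post_init__(self):
        if self.kind not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.kind!r}")


def encode(message):
    """Serialize a message to UTF-8 bytes, keeping non-ASCII letters as-is."""
    return json.dumps(asdict(message), ensure_ascii=False).encode("utf-8")


def decode(raw):
    """Parse UTF-8 bytes back into a validated Message."""
    return Message(**json.loads(raw.decode("utf-8")))
```

Keeping `ensure_ascii=False` means Turkish, Czech and Russian letters travel as real UTF-8 characters rather than escape sequences.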

  14. Speech Synthesis • Computer speech synthesis given an arbitrary input text • Two TTS systems are applied: • MARY TTS developed by DFKI (Germany) • TTS engine developed by UIIP (Belarus) and SPIIRAS (Russia). • Web-based service • Polls for messages from the web-server.

  15. Finger Spelling Synthesis • Visual fingersign output provided through a 3D avatar • Available for two languages: • Czech Sign Alphabet • American Sign Alphabet • Module composed of: • 3D animation model • 38 joints and segments (16 for hand) • Trajectory generator • Rotations of body parts handled with Inverse Kinematics • Head and lip motion provided by talking head system • Inputs and outputs words.
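To illustrate what inverse kinematics computes, here is the textbook analytic solution for a planar two-segment arm: given a target position for the end of the arm, it returns the two joint angles. This is a generic example only; the avatar's 38-joint model and its actual solver are not described in the slides.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Joint angles (shoulder, elbow) in radians that reach (x, y).

    l1 and l2 are the segment lengths. Returns one of the two mirror
    solutions; raises ValueError if the target is out of reach.
    """
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here to check the IK result."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y
```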

  16. Finger Spelling Synthesis

  17. Integrated System Scenarios • City names game • Module Design: • Fingerspell -> Amsterdam, Speech -> Madrid • Fingerspell -> Doha, Speech -> Alta • Fingerspell -> Athens, Speech -> Sukre • Fingerspell -> Eton, Speech -> Nairobi • Flowchart blocks: Visual Input (Turkish); Finger Spelling Recognition; Finger Spelling Synthesis; Visual Output (Czech); Server (Translator); Audio Letter Input (Russian); Isolated Speech Recognition; Speech Synthesis; Audio Output (English)

  18. Integrated System Scenarios • City names game • Fingerspell -> Amsterdam, Speech -> Madrid • Fingerspell -> Doha, Speech -> Alta • Fingerspell -> Athens, Speech -> Sukre • Fingerspell -> Eton, Speech -> Nairobi
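The turns listed above appear to follow the familiar rule that each city must begin with the last letter of the previous one. The slides do not state the rule, so the check below is an inference from the example, and the function names are illustrative.

```python
def follows(previous_city, candidate_city):
    """True if the candidate starts with the previous city's last letter."""
    return previous_city[-1].lower() == candidate_city[0].lower()


def valid_chain(cities):
    """True if every consecutive pair of cities obeys the rule."""
    return all(follows(a, b) for a, b in zip(cities, cities[1:]))
```

Applied to the sequence from the slide (Amsterdam, Madrid, Doha, Alta, Athens, Sukre, Eton, Nairobi), `valid_chain` accepts the whole chain.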

  19. Integrated System Scenarios • Casual Continuous Conversation • Flowchart blocks: Finger Spelling Synthesis; Visual Output (Czech); Audio Sentence Input (Turkish); Isolated Speech Recognition; Server (Translator); Speech Synthesis; Audio Output (English)

  20. Future Work... • Automated language detection for fingerspelling • Further testing • Increasing overall system speed • Addition of missing languages to underlying modules

  21. Questions

  22. Highlights
