
Progressive Filtering and Its Application for Query-by-Singing/Humming

J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan. http://www.cs.nthu.edu.tw/~jang


Presentation Transcript


  1. Progressive Filtering and Its Application for Query-by-Singing/Humming J.-S. Roger Jang (張智星) Multimedia Information Retrieval Lab CS Dept., Tsing Hua Univ., Taiwan http://www.cs.nthu.edu.tw/~jang

  2. Recent Publications • Journals • Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008. • J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008. • Conferences • Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008. • Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of the 125th AES Convention, San Francisco, USA, Oct. 2008. • Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of the 125th AES Convention, San Francisco, USA, Oct. 2008.

  3. Outline • Problem definition of QBSH • Methods for QBSH • Progressive Filtering • Conclusions

  4. Introduction to QBSH • QBSH: Query by Singing/Humming • Input: Singing or humming from a microphone • Output: A ranking list retrieved from the song database • Overview • First paper: around 1994 • Extensive studies since 2001 • State of the art: QBSH tasks at ISMIR/MIREX

  5. Challenges in QBSH Systems • Reliable pitch tracking for acoustic input • Input from mobile devices • Input in noisy karaoke boxes • Song database preparation • Audio music vs. MIDI files • Efficient/effective retrieval • Karaoke machine: ~10,000 songs • Internet music search engine: ~500,000,000 songs

  6. Goal and Approach • Goal: To retrieve songs effectively within a given response time, say 5 seconds or so • Our strategy • Multi-stage progressive filtering • Data-driven design methodology based on DP

  7. Approaches to QBSH • Pitch Tracking • Methods for QBSH

  8. A Quick Demo of QBSH • Demo page of MIR lab: • http://mirlab.org/mir_main/demo.htm • Demo of QBSH • http://mirlab.org/Demo/MusicSearch/index.htm

  9. Progressive Filtering • Multi-stage representation • Each stage is a method for QBSH • [Figure: cascade of stage 1, stage 2, …, stage i, …] • s_i: survival rate for stage i • d_i: delay for stage i • n_{i-1}: no. of input songs to stage i
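
The cascade on slide 9 can be sketched as a loop in which each stage ranks the surviving songs and passes only the top fraction s_i to the next stage. This is a minimal illustration, not the authors' implementation: the function name `progressive_filter`, the `(distance_fn, survival_rate)` stage format, and the use of a distance (lower is better) are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage signature: a distance between a query and a song's
# pitch vector, where a smaller value means a better match.
Distance = Callable[[Sequence[float], Sequence[float]], float]

def progressive_filter(
    query: Sequence[float],
    songs: List[Tuple[str, Sequence[float]]],
    stages: List[Tuple[Distance, float]],
) -> List[str]:
    """Run the query through every stage; return surviving song ids, best first."""
    candidates = list(songs)
    for distance, survival_rate in stages:
        # Rank the n_{i-1} songs entering this stage by the stage's distance.
        ranked = sorted(candidates, key=lambda item: distance(query, item[1]))
        # Keep the top s_i fraction (at least one song) for the next stage.
        keep = max(1, round(len(ranked) * survival_rate))
        candidates = ranked[:keep]
    return [song_id for song_id, _ in candidates]
```

Cheap stages placed first shrink the candidate set, so the expensive stages at the end only run on a small number of songs.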

  10. Stage Characteristics for Effectiveness • RS curve for stage i: recog. rate = r_i(s) • [Figure: recognition rate (%) vs. survival rate s (%), from (0, 0) to (100, 100); curves for a more effective method, a less effective method, and random guess] • Example: Top-10% recog. rate is 65%
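
One way to estimate an RS curve from a design set is to record, for each query, the rank that the stage assigns to the correct song, and then count how many queries would keep the correct song at each survival rate. The sketch below assumes exactly that procedure; the function name `rs_curve` and its arguments are hypothetical, and the slides do not state how the authors computed their curves.

```python
def rs_curve(true_ranks, n_songs, survival_rates):
    """Estimate r(s) for each survival rate s.

    true_ranks: 1-based rank of the correct song for each design-set query.
    n_songs: number of songs the stage ranked.
    survival_rates: values of s in (0, 1] at which to evaluate the curve.
    """
    curve = []
    for s in survival_rates:
        cutoff = s * n_songs  # number of songs that survive the stage
        hits = sum(1 for rank in true_ranks if rank <= cutoff)
        curve.append(hits / len(true_ranks))
    return curve
```

With this definition, "top-10% recog. rate is 65%" means `rs_curve(ranks, n, [0.10])` returns `[0.65]`.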

  11. Stage Characteristics for Efficiency • TS curve for stage i: average time = t_i(s) • [Figure: average time (ms) vs. survival rate s (%), with marked points (0, 0) and (100, 0); curves for a less efficient method and a more efficient method] • Example: When s = 10%, the average one-to-one comparison time is 5 ms

  12. Formulation as an Optim. Problem • Maximize the overall recog. rate subject to constraints on the response time and on the size of the ranking list • n (= n_0): size of the song database • T_max: maximum allowable response time, say, 5 sec. • 10: the size of the retrieved ranking list
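
The objective and constraints on slide 12 can be written out from the symbols defined on slides 9 to 11. The following is a hedged reconstruction, not the authors' exact formula. It assumes that the overall recognition rate is the product of the per-stage rates r_i(s_i), that stage i costs n_{i-1} t_i(s_i) + d_i, and that m denotes the number of stages.

```latex
\begin{aligned}
\max_{s_1,\dots,s_m}\quad & \prod_{i=1}^{m} r_i(s_i) \\
\text{subject to}\quad & \sum_{i=1}^{m}\bigl(n_{i-1}\,t_i(s_i) + d_i\bigr) \le T_{\max}, \\
& n_m = n \prod_{i=1}^{m} s_i \le 10, \qquad n_i = n_{i-1}\,s_i,\quad n_0 = n .
\end{aligned}
```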

  13. DP-based Approach • The original optimization task can be cast into DP: • Optimum-value function R_i(s, t) is the optimum recog. rate at stage i, with a cumulated survival rate s and a cumulated computation time t. • A recurrence formula for R_i(s, t) can be derived by varying the survival rate of stage i.

  14. Recurrence formula for R_i(s, t) • [Figure: cascade of stage 1, …, stage i-1, stage i] • d_i: delay of stage i

  15. DP-based Approach • Boundary conditions for R_i(s, t) • Optimum recog. rate • We can then backtrack to find the optimum s_1, s_2, …, s_m.
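
The DP over R_i(s, t) can be sketched as a forward table keyed by (cumulated survival rate, cumulated time). This is an illustration under stated assumptions rather than the authors' recurrence: survival rates are restricted to a finite `grid`, time is rounded up to multiples of `time_step`, the overall recognition rate is taken as the product of per-stage rates, and the name `design_stages` and its arguments are hypothetical.

```python
import math

def design_stages(stages, n_songs, t_max, list_size, grid, time_step=1.0):
    """Choose one survival rate per stage to maximize the overall recog. rate.

    stages: list of (r, t, d), where r(s) is the stage's RS curve, t(s) its
        TS curve (time per one-to-one comparison), and d its fixed delay.
    Returns (best rate, [s_1, ..., s_m]), or (0.0, None) if nothing is feasible.
    """
    # table[(s_cum, t_cum)] = (best rate reaching this state, chosen rates)
    table = {(1.0, 0): (1.0, [])}
    for r, t, d in stages:
        new_table = {}
        for (s_cum, t_cum), (rate, path) in table.items():
            n_in = n_songs * s_cum  # n_{i-1}: songs entering this stage
            for s in grid:
                cost = n_in * t(s) + d
                # Round up so the quantized plan never exceeds the real budget.
                t_new = t_cum + math.ceil(cost / time_step)
                if t_new * time_step > t_max:
                    continue
                key = (round(s_cum * s, 9), t_new)
                cand = (rate * r(s), path + [s])
                if key not in new_table or cand[0] > new_table[key][0]:
                    new_table[key] = cand
        table = new_table
    # Only states whose final candidate count fits the ranking list are valid.
    feasible = [v for (s_cum, _), v in table.items()
                if n_songs * s_cum <= list_size + 1e-9]
    return max(feasible, key=lambda v: v[0]) if feasible else (0.0, None)
```

Because each table entry stores the survival rates chosen so far, no separate backtracking pass is needed in this sketch; a table of back-pointers would serve the same purpose with less memory.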

  16. Five Stages for Our Study • We chose 5 stages for the DP-based design method: • Range comparison • Modified edit distance • Linear scaling (LS) • DTW with down-sampled inputs • DTW
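
The last two stages differ only in the length of the pitch vectors they compare. The sketch below shows textbook DTW with an absolute-difference local cost and a simple decimation helper. The slides do not specify the authors' local path constraints, key-transposition handling, or down-sampling scheme, so `dtw_distance` and `downsample` are illustrative assumptions only.

```python
def dtw_distance(query, target):
    """Textbook DTW distance between two pitch vectors (both ends anchored)."""
    inf = float("inf")
    m, n = len(query), len(target)
    # cost[i][j]: best accumulated cost aligning query[:i] with target[:j]
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = abs(query[i - 1] - target[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query advances, target repeats
                cost[i][j - 1],      # target advances, query repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[m][n]

def downsample(pitch, factor):
    """Keep every `factor`-th pitch sample to shorten the vector."""
    return pitch[::factor]
```

Since DTW costs O(mn) per comparison, down-sampling both vectors by a factor k cuts the work by roughly k squared, which is why the down-sampled variant can serve as a cheaper stage ahead of full DTW.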

  17. Corpora • QBSH corpus • 2797 8-second recordings (8 kHz, 8 bits) of 48 children's songs, by 118 subjects • 500 for design set, the others for test • Song database • 13320 songs • Comparison mode • Anchored beginning

  18. RS curves

  19. TS Curves

  20. Optimum RR wrt Response Time

  21. Survival Rates wrt Response Time

  22. Conclusions & Future Work • Conclusions • Advantages: • A scalable meta-method • Feasible for optimizing QBSH systems • Applicable (?) to other multimedia retrieval systems • Disadvantages: • Derivation of RS and TS curves is time-consuming • Future work • More effective/efficient methods for each stage
