Locating Cover Songs and Alternate Performances in Databases of Raw Audio

Robert Turetsky (rjt72@columbia.edu), Advent Workshop, May 17, 2002


Presentation Transcript


  1. Locating Cover Songs and Alternate Performances in Databases of Raw Audio
     Robert Turetsky, rjt72@columbia.edu
     Advent Workshop, May 17, 2002

  2. Technology enables “liquid music”
     • Production
     • Distribution
     • Consumption

  3. Content-Based Analysis: Motivation
     • Search on file-sharing systems (e.g. KaZaA) involves meta-data
     • Meta-data prone to errors, omission, distortion
     • Only works if user already knows what to look for
     • Musical Content Analysis means:
       • Query by humming
       • Query by segment/prototype
       • Recommendation engines and artist discovery
       • Machine feedback/collaboration in composition
     • Locating cover songs is a first step

  4. Locating Cover Songs: Prior Work
     • Query By Humming
       • Mature field (kiosks, applets), but limited to monophonic music or manually transcribed polyphonic music
     • Jonathan Foote (FX Palo Alto)
       • ARTHUR (2000): aligns RMS energy. Works only on orchestral music; pop music has less dynamic range.
       • Content-Based Retrieval of Music and Audio (1997): measures acoustic similarity, not equivalence.
     • Cheng Yang (Stanford)
       • Music Database Retrieval Based on Spectral Similarity (2001): aligns MFCCs at points of high energy using DTW.
       • MACS (2001): aligns estimates of pitch likelihood. Indexing. “Bad” alignments discarded after linearity filter.

  5. Why is locating cover songs so difficult?
     • Alternate performances can vary:
       • Studio vs. live
       • Tempo (non-linear time shifting)
       • Pitch transposition
       • Production technique, acoustic character
       • Additions (e.g. audience interaction)
       • Alternate lyrics (e.g. Don’t Cry versions I and II)
     • Cover versions, artist re-interpretations
       • Vocalist, instrumentation, ornamentation
       • Entire character changes (e.g. Layla, dance remixes)
     • Yet we still know these songs are the same!

  6. System Overview
     • Preprocessing: Locate Section Breaks, Identify Summary Sections, Pitch Extraction
     • Query: Tonic Estimation, Alignment

  7. Phase 1: Locate Section Breaks
     • Employ Foote’s Similarity Matrix
     • Theory: Windows of the same section will have similar features. Windows of different sections will have dissimilar features.
     • Similarity Matrix: Cosine distance between every pair of fixed-width windows of the song
     • Novelty Score, a measure of ‘newness’: correlation of the Similarity Matrix with a checkerboard matrix along its diagonal
     • Section breaks are peaks in the Novelty Score.
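
The section-break idea on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's implementation: the function names, the `half_width` kernel size, the peak threshold, and the toy two-dimensional feature frames are all assumptions made for the example, and it scores windows with cosine similarity (the complement of the cosine distance named on the slide).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """Compare every window (feature frame) of the song with every other window."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]

def novelty_score(S, half_width):
    """Correlate a checkerboard kernel with S along its main diagonal.

    At boundary candidate t, the 'past' windows are t-half_width..t-1 and the
    'future' windows are t..t+half_width-1. Past/past and future/future cells
    count +1, past/future cells count -1. Positions too close to the edges
    for a full kernel keep a score of 0.0.
    """
    n = len(S)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                total += sign * S[t + di][t + dj]
        scores[t] = total
    return scores

def pick_peaks(scores, threshold=0.0):
    """Indices of local maxima of the novelty score that exceed threshold."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] > scores[t - 1]
            and scores[t] >= scores[t + 1]]

# Toy song: four windows of one "section" followed by four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
S = similarity_matrix(frames)
scores = novelty_score(S, half_width=2)
breaks = pick_peaks(scores)
print(breaks)
```

On the toy input the only detected break is the window index where the second section starts. Real audio would need spectral features per window and a tuned kernel width and threshold.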

  8. Phase 2: Summary Segments
     • Motivation: Only transcribe and align salient segments
     • Measure of salience: Repetition
     • Method: Search for largest off-diagonal line in Similarity Matrix for each segment to measure extent of repetition (“score”)
     • Summary segment is most repeated section. Prune rows/columns of similar sections in score matrix. Repeat until 45-75 sec of audio is kept
     [Figure: section-by-section score matrix, axes labeled Sec 1 to Sec 4]
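
One way to read the “largest off-diagonal line” score is as the longest diagonal run of high-similarity cells inside the block of the Similarity Matrix that pairs two sections. The sketch below follows that reading. The function name, the `threshold` value, the half-open `(start, end)` section ranges, and the letter-labelled toy song are assumptions for illustration, not details from the talk.

```python
def longest_diagonal_run(S, rows, cols, threshold=0.9):
    """Longest run of cells >= threshold along any diagonal of a block of S.

    rows and cols are half-open (start, end) window ranges of two sections.
    A long diagonal run means the two sections contain the same material
    played in the same order, i.e. one repeats the other.
    """
    r0, r1 = rows
    c0, c1 = cols
    width = c1 - c0
    best = 0
    prev = [0] * width              # run lengths ending in the previous row
    for i in range(r0, r1):
        cur = [0] * width
        for j in range(c0, c1):
            if S[i][j] >= threshold:
                # Extend the run coming from the upper-left neighbour.
                cur[j - c0] = (prev[j - c0 - 1] if j > c0 else 0) + 1
                best = max(best, cur[j - c0])
        prev = cur
    return best

# Toy song: each letter is one window; "ABC" is played twice, then "D".
labels = "ABCABCD"
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
sections = [(0, 3), (3, 6), (6, 7)]

repeat_score = longest_diagonal_run(S, sections[0], sections[1])
unrelated_score = longest_diagonal_run(S, sections[0], sections[2])
print(repeat_score, unrelated_score)
```

Filling a section-by-section matrix with these scores gives a score matrix like the one the slide describes, from which the most repeated section can be picked and similar sections pruned.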

  9. Phase 3: Pitch Extraction
     • Multi-pitch extraction algorithm based on Klapuri et al., 2001.
     • Works well, except in presence of drums.
     [Figure: block diagram with stages Noise Suppression, Predominant Pitch Estimation, Estimate Pitched Sound Characteristics, Estimate # Voices and Iterate, Remove Found Sound from Mixture; plot with axes Time and Pitch]

  10. Phase 3: MPE Details
     • Noise reduction: RASTA-style filter
     • Predominant pitch estimation: “fuzzy search” for harmonic peaks
     • Spectral smoothing to estimate sound parameters
     • Resynthesis
     • Repeat on mixture after removal

  11. Phase 4-5: Query-time alignment
     • Exhaustively align summary segments
     • Two alignments needed: Pitch and Time
     • Pitch Alignment: Tonic Estimation
       • Align two piano rolls at point of maximum cross-correlation between note histograms
     • Temporal Alignment: Dynamic Programming (Dynamic Time Warping)
       • Currently investigating different weights for rewarding note matches, penalizing mismatches
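
The two alignments on this slide can be sketched as follows. This is a simplified illustration under stated assumptions: notes are given as MIDI note numbers, the note histograms are folded into 12 pitch classes, the cross-correlation is circular over semitone shifts, and the DTW uses a flat mismatch cost of 1.0. None of these choices, nor the function names, come from the talk, which notes that the match and mismatch weights were still under investigation.

```python
def pitch_class_histogram(notes):
    """Count how often each of the 12 pitch classes occurs (notes are MIDI numbers)."""
    hist = [0] * 12
    for n in notes:
        hist[n % 12] += 1
    return hist

def best_transposition(query, reference):
    """Semitone shift (0..11) to add to query so its histogram best matches reference.

    The shift is the argmax of the circular cross-correlation between the
    two pitch-class histograms.
    """
    hq = pitch_class_histogram(query)
    hr = pitch_class_histogram(reference)

    def corr(shift):
        return sum(hq[k] * hr[(k + shift) % 12] for k in range(12))

    return max(range(12), key=corr)

def dtw_cost(a, b, mismatch=1.0):
    """Dynamic time warping cost between two note sequences.

    Matching notes cost 0 and differing notes cost `mismatch`; each step may
    advance in a, in b, or in both, which absorbs tempo differences.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# Toy example: the query is the reference two semitones lower, with uneven tempo.
reference = [60, 60, 62, 64, 64, 65, 67]
query = [58, 60, 60, 62, 62, 62, 63, 65, 65]

shift = best_transposition(query, reference)
aligned_cost = dtw_cost([n + shift for n in query], reference)
unaligned_cost = dtw_cost(query, reference)
print(shift, aligned_cost, unaligned_cost)
```

Estimating the transposition first keeps the temporal alignment simple: once the pitches agree, the warp only has to absorb timing differences.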

  12. Locating Cover Songs: Future Work
     • Indexing scheme and other alignment techniques to improve query speed
     • Thematic extraction to find only melody or harmony lines
     • Include beat tracking as part of score
     • Investigate harmonic analysis (identifying chord structure) for better features
     • Speech recognition on lyrics???
