Automatic Genre Classification Using Large High-Level Musical Feature Sets. Cory McKay and Ichiro Fujinaga Dept. of Music Theory Music Technology Area McGill University Montreal, Canada. Topics. Introduction Existing research Taxonomy Features Classification methodology Results
Automatic Genre Classification Using Large High-Level Musical Feature Sets Cory McKay and Ichiro Fujinaga Dept. of Music Theory Music Technology Area McGill University Montreal, Canada
Topics • Introduction • Existing research • Taxonomy • Features • Classification methodology • Results • Conclusions 2/27
Introduction • GOAL: Automatically classify symbolic recordings into pre-defined genre taxonomies • This is the first stage of a larger project: • General music classification system • Classifies audio • Simple interface 3/27
Why symbolic recordings? • Valuable high-level features can be used which cannot currently be extracted from audio recordings • Research provides groundwork that can immediately be taken advantage of as transcription techniques improve • Can classify music for which only scores exist (using OMR) • Can aid musicological and psychological research into how humans deal with the notion of musical genre • Chose MIDI because of diverse recordings available • Can convert to MusicXML, Humdrum, GUIDO, etc. relatively easily 4/27
Existing research • Automatic audio genre classification is becoming a well-researched field • Pioneering work: Tzanetakis, Essl & Cook • Audio results: • Fewer than 10 categories • Success rates generally below 80% for more than 5 categories • Less research done with symbolic recordings: • 84% for 2-way classifications (Shan & Kuo) • 63% for 3-way classifications (Chai & Vercoe) • Relatively little applied musicological work on general feature extraction. Two standouts: • Lomax 1968 (ethnomusicology) • Tagg 1982 (popular musicology) 5/27
Taxonomies used • Used hierarchical taxonomy • A recording can belong to more than one category • A category can be a child of multiple parents in the taxonomical hierarchy • Chose two taxonomies: • Small (9 leaf categories): • Used to loosely compare system to existing research • Large (38 leaf categories): • Used to test system under realistic conditions 6/27
Small taxonomy • Jazz: Bebop, Jazz Soul, Swing • Popular: Rap, Punk, Country • Western Classical: Baroque, Modern Classical, Romantic 7/27
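One way to hold such a taxonomy in code is a child-to-parents mapping, sketched below. The grouping of the nine leaves under three roots is an assumption read off the flattened slide, and the names `PARENTS`, `leaves` and `roots_of` are hypothetical. Storing a list of parents per category is what would allow a category to sit under more than one parent.

```python
# Assumed grouping of the nine leaf categories under three root categories.
PARENTS = {
    "Jazz": [], "Popular": [], "Western Classical": [],
    "Bebop": ["Jazz"], "Jazz Soul": ["Jazz"], "Swing": ["Jazz"],
    "Rap": ["Popular"], "Punk": ["Popular"], "Country": ["Popular"],
    "Baroque": ["Western Classical"],
    "Modern Classical": ["Western Classical"],
    "Romantic": ["Western Classical"],
}

def leaves(parents):
    """Return the categories that are not a parent of any other category."""
    all_parents = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in all_parents)

def roots_of(category, parents):
    """Walk up every parent link and collect the root categories reached."""
    ups = parents.get(category, [])
    if not ups:
        return {category}
    found = set()
    for parent in ups:
        found |= roots_of(parent, parents)
    return found

print(len(leaves(PARENTS)))
print(roots_of("Bebop", PARENTS))
```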
Large taxonomy 8/27
Training and test data • 950 MIDI files • 5 fold cross-validation • 80% training, 20% testing 9/27
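The split above can be sketched as follows. This is a minimal illustration that assumes a plain shuffled partition; the function name `k_fold_splits` and the seed are hypothetical, and the slides do not say how the folds were actually drawn (for instance, whether they were balanced by genre).

```python
import random

def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: plain shuffle, no stratification
    # Fold i takes every k-th shuffled index starting at i, so folds are disjoint.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(950, k=5))
for train, test in splits:
    print(len(train), len(test))  # 760 training and 190 test files per fold
```

With 950 files and 5 folds, each fold holds out 190 files (20%) and trains on the remaining 760 (80%).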
Features • 111 high-level features implemented: • Instrumentation • e.g. whether modern instruments are present • Musical Texture • e.g. standard deviation of the average melodic leap of different lines • Rhythm • e.g. standard deviation of note durations • Dynamics • e.g. average note-to-note change in loudness • Pitch Statistics • e.g. fraction of notes in the bass register • Melody • e.g. fraction of melodic intervals comprising a tritone • Chords • e.g. prevalence of most common vertical interval • More information available in Cory McKay’s master’s thesis (2004) 10/27
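Two of the example features listed above could be computed roughly as follows. This is only a sketch: the note representation (MIDI pitch, duration in beats), the function names, and in particular the bass-register cutoff are assumptions, not the definitions used in the thesis.

```python
from statistics import pstdev

# Hypothetical note representation: (MIDI pitch number, duration in beats).
notes = [(40, 1.0), (43, 1.0), (60, 0.5), (64, 0.5), (67, 2.0)]

BASS_CUTOFF = 55  # assumed upper boundary of the bass register (MIDI pitch)

def bass_fraction(notes, cutoff=BASS_CUTOFF):
    """Pitch statistics: fraction of notes whose pitch lies below the cutoff."""
    if not notes:
        return 0.0
    return sum(1 for pitch, _ in notes if pitch < cutoff) / len(notes)

def duration_std(notes):
    """Rhythm: population standard deviation of note durations."""
    if not notes:
        return 0.0
    return pstdev([duration for _, duration in notes])

print(bass_fraction(notes))  # 0.4 (2 of the 5 notes are below the cutoff)
print(duration_std(notes))
```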
A “classifier ensemble” 12/27
Feature types • One-dimensional features • Consist of a single number that represents an aspect of a recording in isolation • e.g. an average or a standard deviation • Multi-dimensional features • Consist of vectors of closely coupled statistics • Individual values may have limited significance taken alone, but together may reveal meaningful patterns • e.g. bins of a histogram, instruments present 13/27
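The distinction can be illustrated with one feature of each kind. The specific choices here (mean pitch as the single number, a 12-bin pitch-class histogram as the vector) and the function names are illustrative assumptions rather than features confirmed by the slides.

```python
def mean_pitch(pitches):
    """One-dimensional feature: a single summary number for the recording."""
    return sum(pitches) / len(pitches) if pitches else 0.0

def pitch_class_histogram(pitches):
    """Multi-dimensional feature: 12 normalised bins, meaningful as a group."""
    counts = [0] * 12
    for pitch in pitches:
        counts[pitch % 12] += 1  # fold every octave onto the same 12 classes
    if not pitches:
        return [0.0] * 12
    return [count / len(pitches) for count in counts]

pitches = [60, 62, 64, 60, 67, 72]  # hypothetical MIDI pitches
histogram = pitch_class_histogram(pitches)
print(mean_pitch(pitches))
print(histogram)
```

A single bin of the histogram says little on its own, while the shape of all twelve bins together can hint at things like tonality, which is the motivation for giving each such vector its own classifier.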
Classifiers used • K-nearest neighbour (KNN) • Fast • One for all one-dimensional features • Feedforward neural networks • Can learn complex interrelationships between features • One for each multi-dimensional feature 14/27
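A k-nearest-neighbour classifier over the one-dimensional features might look like the sketch below. The Euclidean distance, the value of k, the training data and the convention of returning the fraction of neighbour votes as a per-category score are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_scores(train, query, k=3):
    """Score categories by the share of the k nearest training examples.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}

# Hypothetical two-feature training set.
train = [
    ((0.10, 0.20), "Jazz"),
    ((0.20, 0.10), "Jazz"),
    ((0.15, 0.25), "Jazz"),
    ((0.90, 0.80), "Punk"),
    ((0.80, 0.90), "Punk"),
]
scores = knn_scores(train, (0.2, 0.2), k=3)
print(scores)  # {'Jazz': 1.0}: the three nearest examples are all Jazz
```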
A “classifier ensemble” • Consists of one KNN classifier and multiple neural nets • An ensemble with n candidate categories classifies a recording into 0 to n categories • Input: • All available feature values • Output: • A score for each candidate category based on a weighted average of KNN and neural net output scores 16/27
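The score combination described above can be sketched as a weighted average over the classifiers' outputs. The weights, the example scores and the acceptance threshold are hypothetical; the slides do not state how the final 0 to n categories are chosen from the scores, so the threshold rule is an assumption.

```python
def ensemble_scores(classifier_outputs, weights):
    """Weighted average of per-category scores from several classifiers.

    classifier_outputs: list of {category: score} dicts, one per classifier.
    weights: one non-negative weight per classifier.
    """
    total = sum(weights)
    categories = set().union(*classifier_outputs)
    return {
        category: sum(
            weight * output.get(category, 0.0)
            for output, weight in zip(classifier_outputs, weights)
        ) / total
        for category in categories
    }

def accepted(scores, threshold=0.5):
    """Assumed decision rule: keep every category scoring at least threshold."""
    return sorted(c for c, score in scores.items() if score >= threshold)

knn_out = {"Jazz": 0.8, "Popular": 0.2}    # hypothetical KNN scores
net_a_out = {"Jazz": 0.6, "Popular": 0.5}  # hypothetical neural net scores
net_b_out = {"Jazz": 0.4, "Popular": 0.7}
combined = ensemble_scores([knn_out, net_a_out, net_b_out], [2.0, 1.0, 1.0])
print(combined)
print(accepted(combined))
```

Because each category is thresholded independently, the result can contain anywhere from zero to all n candidate categories.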
Feature and classifier selection/weighting • Some features more useful than others • Context dependent • e.g. best features for distinguishing between Baroque and Romantic differ from those for comparing Punk and Heavy Metal • Hierarchical and round-robin classifiers only trained on recordings belonging to candidate categories • Feature selection allows specialization to improve performance • Used genetic algorithms to perform: • Feature selection (fast) followed by • Feature weighting of survivors 19/27
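Genetic feature selection of the kind mentioned above can be sketched with boolean masks, where each bit says whether a feature is kept. Everything concrete here is an assumption: the population size, truncation selection, one-point crossover, mutation rate, elitism and the toy fitness function are illustrative choices, not the settings used in this work. In practice the fitness would be a classifier's success rate using only the selected features, and the second stage (weighting the surviving features) is not shown.

```python
import random

def select_features(fitness, n_features, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Evolve a boolean feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]   # truncation selection (assumed)
        children = [ranked[0]]                # elitism: best mask kept as is
        while len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

USEFUL = {0, 2, 5}  # hypothetical: only these features help classification

def toy_fitness(mask):
    """Stand-in for a success rate: reward useful features, penalise size."""
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

best = select_features(toy_fitness, n_features=8)
print(best, toy_fitness(best))
```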
Complete classifier 21/27
Exploration of taxonomy space • Three kinds of classification performed: • Parent (hierarchical) • 1 ensemble for each category with children • Only promising branch(es) of taxonomy explored • Field initially narrowed using relatively easy broad classifications before proceeding to more difficult specialized classifications • Flat • 1 ensemble classifying amongst all leaf categories • Round-robin • 1 ensemble for each pair of leaf categories • Final results arrived at through averaging 22/27
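The round-robin scheme needs one ensemble per unordered pair of leaf categories, which grows quickly with taxonomy size. The sketch below counts those pairs and shows one possible way of averaging the three kinds of result; the unweighted mean and the example scores are assumptions, since the slides only say that final results come from averaging.

```python
from itertools import combinations

def round_robin_pairs(leaf_categories):
    """One ensemble is trained for each unordered pair of leaf categories."""
    return list(combinations(leaf_categories, 2))

def average_scores(score_dicts):
    """Assumed combination: unweighted mean of each category's scores."""
    categories = set().union(*score_dicts)
    return {c: sum(d.get(c, 0.0) for d in score_dicts) / len(score_dicts)
            for c in categories}

print(len(round_robin_pairs(range(9))))   # 36 pairwise ensembles
print(len(round_robin_pairs(range(38))))  # 703 pairwise ensembles

# Hypothetical scores from the three kinds of classification.
parent_scores = {"Bebop": 0.7, "Swing": 0.2}
flat_scores = {"Bebop": 0.5, "Swing": 0.4}
round_robin_scores = {"Bebop": 0.6, "Swing": 0.3}
final = average_scores([parent_scores, flat_scores, round_robin_scores])
print(final)
```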
Complete classifier 23/27
Overall average success rates across all folds • 9 Category Taxonomy • Leaf: 86% • Root: 96% • 38 Category Taxonomy • Leaf: 57% • Root: 75% 24/27
Importance of number of candidate features • Examined effect on success rate of only providing subsets of available features to feature selection system: 25/27
Conclusions • Success rates better than previous research with symbolic recordings and on the upper end of research involving audio recordings • True comparisons impossible to make without standardized testing • Effectiveness of high-level features clearly demonstrated • Large feature library combined with feature selection improves results • The system cannot yet deal effectively with large realistic taxonomies, but is approaching that point 26/27