Matt Prockup

Music, Machine Learning, Interactive Systems

I am currently a scientist at Pandora working on methods and tools for Music Information Retrieval at scale. I received my Ph.D. in Electrical Engineering from Drexel University. My research interests span a wide range of topics, including audio signal processing, machine learning, and human-computer interaction. I am also an avid percussionist and composer, having performed in and composed for various ensembles large and small. I have also studied floral design and wheel-thrown ceramics.

Modeling Genre with Musical Attributes

Overview

Genre provides one of the most convenient groupings of music, but it is often regarded as poorly defined and largely subjective.

  • Can musical genres be modeled objectively via a combination of musical attributes? 
  • Can audio features mimic the behavior of these musical attributes? 

In this work, evaluation is performed using labels from Pandora’s Music Genome Project® (MGP) across more than 1.2 million examples.

Methods

Figure 1. Classify genre using musical attributes (1). Predict attributes and genre using audio features (2).

Human-Labeled Attributes (1): The first method uses only a selection of human-labeled attributes from Pandora’s MGP as features to classify the presence of a genre.

Audio Features (2a): The second method uses only audio features directly to classify the presence of a genre. 

Predicted Attributes (2b): The third method uses audio features to predict the presence of the human-labeled attributes from (1). These attribute predictions are then used to classify the presence of genre.

Hybrid Method (2c): The hybrid method uses audio features to predict the presence of the human-labeled attributes from (1). These attribute predictions are used along with the raw audio features to classify the presence of genre.

Model Setup: Each genre task is formulated as an individual binary classification using logistic regression as the learning model. Each attribute learning task is formulated as either a binary-valued classification using logistic regression or a continuous-valued regression using linear regression. Tasks that incorporate audio use the following features (dimensionality in parentheses): MFCCs (460) [1], Mellin Scale Transform (230) [2], Beat Profiles (108) [2], and Tempogram Ratios (39) [2].
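To make the setup concrete, here is a minimal scikit-learn sketch of the two direct classification paths, (1) attributes-to-genre and (2a) audio-to-genre. All of the arrays (X_attributes, X_audio, y_genre) are random placeholders standing in for the MGP-derived data, and the split and solver choices are assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Random placeholders, one row per song.
# X_attributes: 48 human-labeled MGP attributes.
# X_audio: 837 audio features (460 MFCC + 230 Mellin + 108 beat profile
#          + 39 tempogram ratio).
# y_genre: binary indicator for one genre (e.g., "Jazz").
rng = np.random.default_rng(0)
n_songs = 1000
X_attributes = rng.random((n_songs, 48))
X_audio = rng.random((n_songs, 837))
y_genre = rng.integers(0, 2, n_songs)

def evaluate(X, y):
    """Train one binary logistic-regression genre model; report ROC-AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print("attributes (1):", evaluate(X_attributes, y_genre))
print("audio (2a):    ", evaluate(X_audio, y_genre))
```

In the full experiments this per-genre binary setup is simply repeated for each of the 12 basic genres and 47 sub-genres.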

The Music Genome Project

In this work, attribute and genre labels are derived from Pandora’s Music Genome Project® (MGP). The MGP contains 500+ expert labels across 1.2M+ songs. We use a subset of the MGP containing 48 musical attributes, 12 “Basic” genres, and 47 sub-genres.

Attribute Examples: Male Vocals, Female Vocals, Distorted Guitar, Triple Meter, Syncopation, Live Recording, etc.
Basic Genre Examples: Rock, Rap, Latin, Jazz, etc.
Sub-genre Examples: Light Rock, Hard Rock, Punk Rock, Bebop Jazz, Afro-Cuban Jazz, etc.

Results

To take a closer look at musical attributes and their relation to genre, we'll explore some components of Jazz. The results in Figure 2 show how well each individual attribute performs as a single-dimensional feature when classifying sub-genres within Jazz. ROC-AUC classification values are shown for each of the attributes on the left, across the sub-genres listed below the figure.

Figure 2. Important attributes within the Jazz sub-genre group are shown. ROC-AUC values are shown for classifying each sub-genre with each attribute.

One of the more obvious correlations is with the Swing attribute. Notice how the presence of swing is a good predictor of both "Swing Jazz" and "Boogie." However, swing isn't always an important attribute of jazz. In "Afro-Cuban Jazz," swing is not a good predictor, but Syncopation is, due to the syncopated, straight-time clave rhythms present. "Afro-Cuban Jazz" also features distinctive instrumentation: the presence of auxiliary percussion instruments beyond the standard drumset (e.g., congas, claves) is a defining factor. More interestingly, the presence of a backbeat is a good predictor of "Free Jazz." This is because "Free Jazz" is one of the few styles of music without a backbeat, making that negative correlation a powerful predictive attribute.
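As a concrete illustration, this single-attribute evaluation reduces to computing ROC-AUC with a one-dimensional feature; no classifier needs to be trained. Below is a minimal sketch; the swing_attribute and is_swing_jazz arrays are toy placeholders, not MGP data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy per-song values: how strongly "Swing" is labeled (0..1),
# and whether each song carries the "Swing Jazz" sub-genre label.
swing_attribute = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.0])
is_swing_jazz   = np.array([1,   0,   1,   0,   1,   0])

# With a one-dimensional feature, ROC-AUC is computed directly from
# the raw attribute values.
print(roc_auc_score(is_swing_jazz, swing_attribute))  # 1.0 on this toy data

# A negatively correlated attribute (e.g., Backbeat vs. "Free Jazz")
# yields an AUC well below 0.5, which is just as predictive: flipping
# the sign of the feature gives 1 - AUC.
```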

Classification results for all four model types from Figure 1 are shown in Figure 3. The top plot shows the results for the Jazz sub-genre group; the bottom plot shows the average ROC-AUC across all genre groups. In all cases, the 48 human-labeled musical attributes are the best representation for classifying genre. This shows that this low-dimensional representation (1) is powerful and contains important correlations with genre.

Figure 3. ROC-AUC results for classifying all the individual Jazz sub-genres (top) and average results for all genre groups (bottom) are shown.

The audio features alone (2a) also do reasonably well, but with audio features alone there is no way to know what aspects of genre they are capturing. That insight is gained by learning the musical attributes from audio and using the estimated attributes to classify genre (2b). While this does not work as well as the direct audio features, we gain insight into what the features are capturing, as well as a significant and meaningful reduction in dimensionality. There is also plenty of room for improvement here: better models of each individual attribute would greatly improve the musical-attribute layer of this model, and therefore improve genre classification overall. Lastly, the hybrid model (2c) is second only to the human-labeled attributes alone. This suggests that the audio features and the attribute models contain complementary information, with each making up for shortcomings in the other.
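To make the attribute layer of (2b) and the stacking of (2c) concrete, here is a minimal sketch under the same placeholder assumptions as before: one logistic-regression model per attribute is trained on audio features, the predicted probabilities form a compact 48-dimensional representation, and the hybrid model simply concatenates those predictions with the raw audio features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random placeholders standing in for the MGP-derived matrices.
rng = np.random.default_rng(1)
n_songs, n_audio, n_attrs = 1000, 837, 48
X_audio = rng.random((n_songs, n_audio))          # audio features per song
Y_attrs = rng.integers(0, 2, (n_songs, n_attrs))  # binary attribute labels
y_genre = rng.integers(0, 2, n_songs)             # binary genre label

# (2b) Attribute layer: one binary model per attribute, trained on audio.
# In practice these would be fit on a split disjoint from the genre
# classifier's training data, to avoid leaking labels downstream.
attr_models = [
    LogisticRegression(max_iter=1000).fit(X_audio, Y_attrs[:, j])
    for j in range(n_attrs)
]

# Predicted attribute probabilities: a compact 48-d representation.
X_pred = np.column_stack([m.predict_proba(X_audio)[:, 1]
                          for m in attr_models])

# (2b): genre from predicted attributes alone.
genre_2b = LogisticRegression(max_iter=1000).fit(X_pred, y_genre)

# (2c): hybrid, stacking predicted attributes with the raw audio features.
genre_2c = LogisticRegression(max_iter=1000).fit(
    np.hstack([X_pred, X_audio]), y_genre)
```

The stacking step is where the complementary information shows up: the genre model can lean on a predicted attribute where it is reliable and fall back on the raw audio features where it is not.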

Conclusions

  • Genre is largely defined through musical attributes.
  • Audio-based models can predict the presence of these attributes and use them to predict genre.

References

[1] Klaus Seyerlehner et al. “A refined block-level feature set for classification, similarity and tag prediction.” MIREX, 2011.
[2] Matthew Prockup et al. “Modeling musical rhythm at scale using the music genome project.” IEEE WASPAA, 2015.

Cite This Work

Prockup, M., Ehmann, A., Gouyon, F., Schmidt, E., Celma, O., Kim, Y., "Modeling Genre with the Music Genome Project: Comparing Human-Labeled Attributes and Audio Features."  International Society for Music Information Retrieval Conference, Malaga, Spain, 2015. [PDF]