This module continues and deepens the approaches to analyzing recordings introduced in the Audio Basics module. The focus is on the automated determination of

  • sound characteristics,
  • the tonal content or chords,
  • onsets, basic beat and tempo

in recordings, using algorithms from music information retrieval and the so-called Vamp Plugins, which can be run in the Sonic Visualiser; the results can also be displayed there graphically. These procedures are presented in three tutorials using music examples.

A further tutorial introduces corpus analysis, i.e., the examination and comparison of several pieces, with the Sonic Annotator.

First, however, here is some general information about Music Information Retrieval:

Against the background of the digitization of music recordings since the 1980s, the development of compressed data formats such as MP3 in the 1990s, and the resulting easier mass distribution of music files on the Internet, a new field of music informatics emerged around the turn of the millennium: Music Information Retrieval (MIR). In Music Information Retrieval, algorithms, programs, and tools are developed to automatically search for specific information about music in ever-growing music databases worldwide. MIR research is usually application-oriented (driven by concrete use cases) and is advanced not only by research institutions but also by software developers in commercial companies (Apple, Google, Spotify, etc.).

In 2000, the International Society for Music Information Retrieval (ISMIR) was founded and the first ISMIR conference was held in Plymouth, USA. Since then, conferences with online proceedings have been held annually. Since 2005, the ISMIR conferences have also included competitions posing specific tasks for newly developed algorithms (the Music Information Retrieval Evaluation eXchange, MIREX), including transcription algorithms. Today there are a number of MIR research centers worldwide (cf. projects).

Within MIR, a rough distinction is made between two approaches:

  • Algorithms based on metadata, i.e., information about the piece of music: manual annotations, e.g., tags from social networks, but also data traces of usage and purchasing behavior as collected by commercial music platforms. On this basis, quite successful music recommendation systems have been developed, for example.
  • Content-based methods (content-based MIR), in which the music itself is analyzed, either as notes (so-called symbolic data) or, for the most part, as digital audio files.

In content-based MIR, algorithms are used to extract certain features from the audio file. A distinction is made between Low Level Features and High Level Features. Low Level Features can be extracted directly from the audio file, e.g., values for intensity, the spectral energy distribution, or the frequency of zero crossings of the waveform.

  • The input consists of the 44,100 numerical values (samples) per second of a digitized audio signal.
  • A temporal window function groups these values into short frames for extraction and analysis.
  • Most methods are additionally based on a Fourier transform (STFT or FFT), i.e., a spectral analysis.
  • Further transformations operate on the resulting spectrum, condensing it, among other things, into MFCCs (mel-frequency cepstral coefficients) or into pitch classes, so-called chroma features, or they aim at identifying note beginnings (onset detection), from which beat, tempo, and meter are derived.
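
To make these steps concrete, here is a minimal Python sketch (using only numpy) of two classic low-level features: the zero-crossing rate and spectral flux, a simple onset detection function. The frame length, hop size, and the placeholder signal are illustrative assumptions, not values prescribed by the Sonic Visualiser or any particular plugin.

    import numpy as np

    def frame_signal(x, frame_len=2048, hop=512):
        # Split a mono signal into overlapping frames (the "temporal window function").
        n_frames = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

    def zero_crossing_rate(frames):
        # Fraction of sign changes per frame -- a classic low-level feature.
        return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    def spectral_flux(frames):
        # STFT magnitudes, then the positive spectral difference between
        # consecutive frames -- a simple onset detection function.
        window = np.hanning(frames.shape[1])
        mags = np.abs(np.fft.rfft(frames * window, axis=1))
        return np.sum(np.maximum(np.diff(mags, axis=0), 0.0), axis=1)

    # Placeholder: one second of noise at 44,100 Hz; in practice, load a mono
    # audio file here instead (e.g., with the soundfile library).
    x = np.random.randn(44100)
    frames = frame_signal(x)
    zcr = zero_crossing_rate(frames)
    flux = spectral_flux(frames)

Peaks in the spectral flux curve are candidate onsets, from which beat and tempo estimates can then be derived.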

These Low Level Features often serve as the starting point for the calculation of High Level Features, which correspond more closely to human perception: properties such as pitch, chords, meter, timbre, etc. However, bridging the gap between the rather abstract Low Level Features, i.e., a measurement-oriented description, and the High Level Features, i.e., music as perceived, remains difficult. A common example of such a bridge, chord estimation from chroma features, is sketched below.
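
The following Python sketch illustrates one standard way of deriving a high-level feature (a chord label) from a low-level feature (a chroma vector) via template matching. The binary triad templates are a deliberate simplification for illustration; actual Vamp plugins for chord estimation use more refined models.

    import numpy as np

    NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def triad_templates():
        # Normalized binary chroma templates for the 12 major and 12 minor triads.
        templates, labels = [], []
        for root in range(12):
            for name, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
                t = np.zeros(12)
                t[[(root + i) % 12 for i in intervals]] = 1.0
                templates.append(t / np.linalg.norm(t))
                labels.append(NOTE_NAMES[root] + ":" + name)
        return np.array(templates), labels

    def estimate_chord(chroma):
        # Return the triad whose template correlates best with the chroma vector.
        templates, labels = triad_templates()
        chroma = chroma / (np.linalg.norm(chroma) + 1e-9)
        return labels[int(np.argmax(templates @ chroma))]

    # A chroma vector with energy on C, E, and G should yield "C:maj".
    print(estimate_chord(np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)))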

In the three advanced tutorials of the Audio Advanced module, the insights gained in Tutorial: Spectral Visualization with the Sonic Visualiser and Tutorial: Spectral Visualization of Vocal Recordings are supplemented and continued. They cover automated methods for analyzing and visualizing sound, harmony, and rhythm, which can be computed from the spectral data of the audio file with the help of so-called Vamp Plugins and displayed in the Sonic Visualiser. The three tutorials are located on their own subpages.

Installation and functionality of the Vamp Plugins

The Vamp Plugins form an audio processing plugin system for which MIR researchers develop and provide plugin extensions for the Sonic Visualiser. Each plugin extracts different information from the audio file and displays it in a new visualization layer within the Sonic Visualiser.

A list of the Vamp Plugins developed so far contains short descriptions and links to the corresponding documentation and download pages. After downloading, the plugin files have to be placed in a folder called "Vamp Plugins", which you must create yourself and which should be in the same folder as the Sonic Visualiser program folder, e.g. in C:\Programs or C:\Program Files.

The following plugin packages are required for the tutorials:

Tip: You can also download a comprehensive Vamp plugins package with its own installer here. Choose this option if the manual download does not work (which is sometimes the case on Mac computers).

After restarting the Sonic Visualiser, the plugins are available via the Transform menu and can be applied to the open audio file. The individual Vamp Plugins are organized by category, name, and developer ('Maker'); the most recently used plugins are listed under Recent Transforms. For most Vamp plugins, a window opens first in which further fine-tuning can be done. These settings are documented on the developers' pages and partly explained in the respective tutorials. After execution, the results are displayed as a new layer whose visualization settings can be adjusted and changed on the tab (top right). The layer can be switched on and off with the Show button (bottom right) and deleted with Ctrl+D.

The Sonic Annotator can be used to conveniently evaluate many (even hundreds of) audio files with Vamp plugins in one go. The program runs without installation. It is operated from a command window (e.g. PowerShell), into which commands are typed.
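
As an illustration, a session might look like the following two commands; the transform id assumes that the QM Vamp Plugins are installed, and the folder path is only an example.

    sonic-annotator -l
    sonic-annotator -d vamp:qm-vamp-plugins:qm-tempotracker:beats -w csv -r C:\audio

The first command lists the transform ids of all installed plugins; the second applies the beat tracker with its default settings (-d) to every audio file found in the folder (-r for recursive) and writes the results as one CSV file per audio file (-w csv).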

The tutorial on corpus studies with the Sonic Annotator can be found here.

For a comprehensive and programmatic overview of music information retrieval research goals and approaches, see the Roadmap for Music Information ReSearch published in 2013 by the MIReS Consortium, an association of several European research institutions.

Practical, application-based introductions to numerous areas of music information retrieval are provided by the FMP Notebooks (Python notebooks on the Fundamentals of Music Processing) of AudioLabs Erlangen.
