Besides visualizing a recording as a spectrogram, there are other ways to examine its sound. These usually capture properties of the overall sound of a recording. However, in passages where the various voices and instruments do not overlap too much, the sonic characteristics of individual (instrumental) voices or single sound events can also be visualized.
These measures are so-called low-level features, i.e., characteristics that are relatively close to the acoustic measurement and relatively far from auditory perception (e.g., the perception of a certain timbre). Taken on their own, they are therefore usually not very informative. Nevertheless, it is worthwhile to compare different sections of a recording, or different recordings, with respect to these metrics. In this way, one can get a sense of which sonic aspects they relate to.

The spectral centroid is a measure of the center of gravity of a frequency spectrum. It corresponds to an important dimension of timbre perception: the higher the spectral centroid, the brighter and more radiant the corresponding sound.
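
The idea of a "center of gravity" can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions: it uses the common textbook definition (the magnitude-weighted mean frequency of one analysis window) and a naive DFT, and it is not the code of the Aubio plugin. The function names and the decision to return None for a silent window are choices made for this example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (only suitable for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (textbook definition)."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total < 1e-12:
        # Assumption of this sketch: a silent frame has no defined centroid.
        return None
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

# Two test tones: the higher tone yields the higher ("brighter") centroid.
sr = 8000
n = 64
low = [math.sin(2 * math.pi * 500 * i / sr) for i in range(n)]
high = [math.sin(2 * math.pi * 2000 * i / sr) for i in range(n)]
print(spectral_centroid(low, sr), spectral_centroid(high, sr))
```

For a pure sine tone the centroid simply sits at the frequency of the tone; for real music it moves up whenever the high partials gain weight.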

 Start Sonic Visualiser.
 Load Audio01.mp3 and start the Vamp plugin via the menu command:
 'Transform' - 'Analysis by maker' - 'Paul Brossier' - 'Aubio Spectral Descriptor'.

A window opens in which you can choose among different descriptor types. In addition, under Advanced you can set the size of the analysis windows (Audio frames per block) and the step by which they advance, which determines their overlap (Window increment).

 Under 'Spectral Descriptor Type' select the setting 'Spectral Centroid'.
 Press OK. 

A new Time Value Layer opens. Under Plot Type, select Discrete Curve to get a clearly visible line.
(Note: If there is no signal, i.e. during silence within or at the end of a recording, the spectral centroid automatically rises to a relatively high average value.)

 What can you see? 
 How does the Spectral Centroid change with the different sounds of the recording?

Spectral Flux is a measure of how quickly and strongly the spectrum of a signal changes from one analysis window to the next. Low values indicate steady, regular sounds (e.g. with constant pitch and constant timbre). High values indicate that the timbre changes very quickly, or that the signal contains sounds with a very high, chaotic noise component, e.g. percussive sounds without a perceptible pitch.
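
A minimal sketch of this window-to-window comparison, assuming one common definition of spectral flux (the Euclidean distance between the magnitude spectra of two successive analysis windows). The Aubio plugin may normalize or weight the spectra differently; the function names and the toy spectra are made up for this example.

```python
import math

def spectral_flux(prev_mags, curr_mags):
    """Euclidean distance between two successive magnitude spectra."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_mags, curr_mags)))

def flux_curve(frames):
    """One flux value per pair of neighbouring analysis windows."""
    return [spectral_flux(a, b) for a, b in zip(frames, frames[1:])]

# Toy magnitude spectra (4 frequency bins per analysis window).
steady = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]    # same spectrum each time
changing = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]  # energy jumps between bins

print(flux_curve(steady))
print(flux_curve(changing))
```

The steady sequence gives a flux of zero throughout, while the sequence whose energy jumps between bins gives clearly positive values.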

 Under 'Spectral Descriptor Type', select the setting 'Spectral Flux'.
 Press OK. 

A new Time Value Layer opens. Under Plot Type, select Discrete Curve to get a clearly visible line.

Now compare the curves of Spectral Centroid and Spectral Flux. Where do they agree? Where do major differences show up? (Pay special attention to the passages starting at 0:53 and at 1:01 with their very distinctive sounds.)

There are several approaches to making spectral representations match human auditory perception more closely. One possibility is to scale the frequencies (vertical axis) not linearly but logarithmically (Log), because our perception of pitch is approximately logarithmic: doubling the frequency is perceived as one octave higher, four times the frequency as two octaves higher, eight times the frequency as three octaves higher, etc.
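
The relationship between frequency ratio and perceived interval can be checked with a few lines of Python. This is only an illustrative sketch; the function name and the starting frequency of 110 Hz are arbitrary choices for the example.

```python
import math

def octaves_between(f_low, f_high):
    """Interval between two frequencies, measured in octaves."""
    return math.log2(f_high / f_low)

base = 110.0  # arbitrary starting frequency in Hz
for factor in (2, 4, 8):
    print(factor, octaves_between(base, base * factor))
```

Each doubling of the frequency adds exactly one octave, which is why equal distances on a logarithmic frequency axis correspond to equal musical intervals.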

With Constant-Q spectrograms, the ratio of center frequency to frequency resolution remains constant across all analyzed and displayed frequency bands. The analysis can therefore be configured so that, for example, each frequency band corresponds to one chromatic note. This greatly facilitates the tonal interpretation of a spectrogram.
In addition, the minimum and maximum pitches of the display range can be set conveniently.
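
The constant ratio can be sketched as follows, assuming the usual textbook formulation in which the center frequencies are spaced geometrically and Q = 1 / (2^(1/b) - 1) for b bins per octave. This is not the code of the Queen Mary plugin; the function name and the starting frequency are assumptions for the example.

```python
def constant_q_bins(f_min, bins_per_octave, n_bins):
    """Return Q and a list of (center frequency, bandwidth) pairs in Hz."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        center = f_min * 2 ** (k / bins_per_octave)
        bins.append((center, center / q))  # bandwidth grows with the center
    return q, bins

# 12 bins per octave: one bin per semitone, starting (arbitrarily) at 110 Hz.
q, bins = constant_q_bins(110.0, 12, 25)
for center, bandwidth in bins[::12]:
    print(round(center, 2), round(bandwidth, 2), round(center / bandwidth, 3))
```

The bands become wider in Hz as the frequency rises, but the ratio of center frequency to bandwidth stays the same, so every semitone gets exactly one band.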

 Launch Sonic Visualiser.
 Please load the audio file of Ray Charles' "Comeback Baby" (Audio01.mp3).
 In the menu, select 'Transform' - 'Analysis by maker' - 'Queen Mary, University of London' - 'Constant Q Spectrogram'.

In the menu window, the displayed pitch range can be set in MIDI pitches; middle C (c' = C4) has the MIDI value 60, c'' = C5 = 72, etc.
Under Bins per Octave you can set how finely the octave is divided. With a value of 12, the octave is divided into 12 equal steps (semitones). With a value of 48, on the other hand, each semitone step is divided into four equal steps.
Also, test different possibilities for scaling (on the layer tab).
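
To relate the MIDI values and the Bins per Octave setting to frequencies, here is a small sketch. It assumes twelve-tone equal temperament with the usual reference A4 = MIDI 69 = 440 Hz, which the tutorial does not state explicitly; the function names are made up for the example.

```python
def midi_to_hz(midi_pitch, a4_hz=440.0):
    """Frequency of a MIDI pitch in equal temperament (A4 = MIDI 69)."""
    return a4_hz * 2 ** ((midi_pitch - 69) / 12)

def cents_per_bin(bins_per_octave):
    """Width of one frequency band in cents (1200 cents = one octave)."""
    return 1200 / bins_per_octave

print(round(midi_to_hz(60), 2))  # C4
print(round(midi_to_hz(72), 2))  # C5, one octave above C4
print(cents_per_bin(12), cents_per_bin(48))
```

With 12 bins per octave each band spans a full semitone (100 cents); with 48 bins each band spans a quarter of a semitone (25 cents).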

With the plugins Constant Q Spectrogram (MIDI pitch range) or Constant Q Spectrogram (Hz range) the MIDI pitches or the Hz range are displayed in addition to the pitches.

Mel Frequency Cepstral Coefficients (MFCC) were developed for automatic speech recognition, but can also be applied to the tonal properties of music. In particular, they are used for the identification of musical pieces.

MFCCs provide a compact representation of the spectral properties of an audio signal that captures not the pitches but the timbral-spectral properties. In terms of speech recognition: a periodic excitation signal (vocal folds) is spectrally shaped by a linear filter (vocal tract: mouth, tongue, nasal cavities). For speech recognition with MFCCs, what matters is primarily the filter (i.e., the shape of the vocal tract), not the fundamental pitch at which something is spoken or sung.

The "Mel" in the name refers to the mel scale, a scale of perceived pitch. Finally, coefficients are computed from the different frequency bands (per analysis window); the number of MFCCs can be set.
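
The two ingredients mentioned here, the mel scale and the final coefficients, can be sketched in Python. This is a simplified illustration under stated assumptions: it uses one widely used mel formula, mel = 2595 * log10(1 + f / 700), and a plain DCT-II of the logarithmic band energies. The Queen Mary plugin may use a different mel formula, filter shape, or normalization, and all function names are made up for the example.

```python
import math

def hz_to_mel(freq_hz):
    """One common mel formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(f_min, f_max, n_bands):
    """Band centers in Hz that are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]

def cepstral_coefficients(band_energies, n_coeffs):
    """DCT-II of the log band energies of one analysis window."""
    n = len(band_energies)
    logs = [math.log(e + 1e-12) for e in band_energies]
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(logs))
        for k in range(n_coeffs)
    ]

centers = mel_band_centers(0.0, 8000.0, 10)
print([round(c) for c in centers])
print(cepstral_coefficients([2.0] * 8, 4))
```

The band centers lie close together at low frequencies and further apart at high frequencies. A completely flat spectrum produces only a nonzero first coefficient, which reflects the overall level, while the higher coefficients describe the shape of the spectral envelope.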

 Please load the file Audio02.mp3 ("Comeback Baby" by Ray Charles).
 In the menu, select 'Transform' - 'Analysis by maker' - 'Queen Mary, University of London' - 'Mel Frequency Cepstral Coefficients'.


In the menu window you can set the number of coefficients. Usually 20 coefficients are used, but you can also set a higher number for a finer resolution.

The Vamp plugins can be used to determine other metrics related to the sound and sonic perception of an audio signal. Here are the most important parameters in alphabetical order:

  • Harmonic Ratio: Proportional amount of harmonic components within a signal.
  • Signal to Noise Ratio (SNR): Ratio of signal to (background) noise.
  • Spectral Crest: Ratio of the maximum value of a spectral distribution to the arithmetic mean; indicator of the degree of tonality of a signal.
  • Spectral Entropy: Measure for the degree of order and redundancy of a signal. White noise has a low degree of order and therefore a high entropy value.
  • Spectral Flatness: Another measure for the uniformity (cf. Spectral Entropy).
  • Spectral Roll-Off Point: 85% of the energy of a signal is located below its Roll-Off-Point frequency.
  • Spectral Skewness: Measure for the asymmetry of a spectrum around its mean; a high value implies a tendency (skewness) to high frequency components, a low value implies a tendency to low frequency components.
  • Spectral Slope: A measure for energy decay in the high frequency range; provides clues to the color of noise or the dominance of (high) partials in the spectrum.
  • Spectral Spread: Measure for the degree of spread of a spectrum around the spectral centroid; used to distinguish between periodic signals and noise.
  • Zero Crossing Rate: Rate of zero crossings per time; a high rate indicates a noisy signal or noise.
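
Several of the metrics listed above can be written down in a few lines each. The following Python sketch uses common textbook definitions and operates on a list of samples or on the magnitude spectrum of a single analysis window. The actual Vamp plugins may differ in normalization and detail, and all function names and test values are chosen for this example only.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of neighbouring sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

def spectral_crest(mags):
    """Maximum of the spectrum divided by its arithmetic mean."""
    return max(mags) / (sum(mags) / len(mags))

def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean (close to 1 = flat)."""
    log_mean = sum(math.log(m + eps) for m in mags) / len(mags)
    return math.exp(log_mean) / (sum(mags) / len(mags) + eps)

def spectral_entropy(mags):
    """Entropy of the normalized spectrum, scaled to the range 0..1."""
    total = sum(mags)
    probs = [m / total for m in mags if m > 0]
    return -sum(p * math.log2(p) for p in probs) / math.log2(len(mags))

def spectral_rolloff(mags, bin_hz, fraction=0.85):
    """Lowest frequency below which `fraction` of the energy lies."""
    energy = [m * m for m in mags]
    threshold = fraction * sum(energy)
    running = 0.0
    for k, e in enumerate(energy):
        running += e
        if running >= threshold:
            return k * bin_hz
    return (len(mags) - 1) * bin_hz

flat = [1.0] * 8                                   # noise-like spectrum
peaky = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # single tonal peak
for name, mags in (("flat", flat), ("peaky", peaky)):
    print(name, spectral_crest(mags), spectral_flatness(mags),
          spectral_entropy(mags), spectral_rolloff(mags, 100.0))
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
```

The flat spectrum stands in for a noise-like signal (low crest, high flatness and entropy), the single peak for a tonal signal (high crest, low flatness and entropy). This matches the interpretations given in the list above.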

The Vamp plugin Aubio Spectral Descriptor offers more spectral descriptors to choose from. Test these with Audio02.mp3 and with music examples of your own choice.

  • Last modified: 2022/02/22 10:36
  • by martin