Tutorial Basics (Sheet Music), Part 2: Statistics

(version 2022_02_06)

Table of contents:

1. Query for simple statistical data

2. Frequencies

3. Two-dimensional frequency distributions

4. Application of bar and voice filters

5. Tasks

Using music examples, this tutorial introduces simple computer-assisted statistical queries with CAMAT (Computer-Assisted Music Analysis Tool).

Working through the tutorial should enable you to examine your own sheet music files using computer-assisted methods.

Each Jupyter Notebook session begins by importing the Python libraries required for the analysis:

Next, load the sheet music file you want to examine (from the internet or from your hard disk) and run the XML parser. This creates a new dataframe ('m_df') from the XML file, which forms the basis for the following statistical queries (cf. https://analyse.hfm-weimar.de/jupyter/CAMAT_Basics_Part1_Einfuehrung.html).

As the music example for this tutorial, we use the first movement of the String Quartet in E-flat major, K. 171, by Wolfgang Amadeus Mozart (see Basics Part 1).

1. Query for simple statistical data

We start with simple statistical queries: the number of voices, the length in measures, the number of notes (in total and per voice), and the ambitus of each voice.

Please open the sheet music file in your score editor (e.g. MuseScore) alongside the notebook.

Note: The first evaluation command reads in the data for the first time, so its execution may take relatively long (up to several minutes, depending on your computer and the file size). All subsequent commands run much faster.

Length of the piece in measures:
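The notebook command for this query is not reproduced here. As a rough, hedged sketch of what such a query computes, the following assumes a pandas DataFrame with one row per note and a hypothetical column `measure`; CAMAT's actual dataframe ('m_df') may use different column names.

```python
import pandas as pd

# Toy stand-in for a note table: one row per note.
# The column names 'voice' and 'measure' are hypothetical.
notes = pd.DataFrame({
    "voice":   [1, 1, 1, 2, 2, 4],
    "measure": [1, 1, 2, 2, 3, 3],
})


def piece_length_in_measures(df: pd.DataFrame) -> int:
    """Count the distinct measure numbers that contain at least one note."""
    return int(df["measure"].nunique())


print(piece_length_in_measures(notes))  # 3 for this toy table
```

Counting distinct measure numbers skips measures that contain only rests, and pickup measures or repeated measure numbers can also distort the result, so it is worth checking the value against the score.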

Number of notes per voice, with each chain of tied notes counted as a single note:
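A minimal sketch of counting notes per voice with ties merged, assuming hypothetical columns `voice` and `tie` whose values follow the common start/continue/stop scheme for tie chains; this is an illustration, not CAMAT's own implementation.

```python
import pandas as pd

# Toy note table; 'voice' and 'tie' are hypothetical column names.
# A tied note is split into several rows: 'start', optionally 'continue', then 'stop'.
notes = pd.DataFrame({
    "voice": [1, 1, 1, 1, 4, 4],
    "tie":   [None, "start", "stop", None, "start", "stop"],
})


def notes_per_voice(df: pd.DataFrame) -> pd.Series:
    """Count notes per voice, treating a chain of tied notes as one note."""
    # Keep untied notes and the first segment of each tie chain;
    # drop the continuation and end segments.
    first_segments = df[~df["tie"].isin(["continue", "stop"])]
    return first_segments.groupby("voice").size()


counts = notes_per_voice(notes)
print(counts)
print("total:", counts.sum())
```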

The ambitus of each voice is given in semitones, i.e. as the difference between its lowest (min) and highest (max) note. Pitches are given as MIDI numbers, with c' = C4 = 60, c'' = C5 = 72, and so on:
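The ambitus computation can be sketched in pandas as follows, again with hypothetical column names (`voice`, `midi`) and toy values rather than data from the quartet.

```python
import pandas as pd

# Toy note table with hypothetical columns: voice number and MIDI pitch (C4 = 60).
notes = pd.DataFrame({
    "voice": [1, 1, 1, 4, 4, 4],
    "midi":  [67, 79, 72, 36, 51, 43],
})

# Lowest and highest MIDI pitch per voice; the ambitus is their difference in semitones.
ambitus = notes.groupby("voice")["midi"].agg(["min", "max"])
ambitus["ambitus"] = ambitus["max"] - ambitus["min"]
print(ambitus)
```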

2. Frequencies

To characterize individual compositions and to compare different pieces of music, it can be useful to determine how often certain elements (pitches, note durations, etc.) occur. For such questions, frequency tables and graphical representations, so-called histograms, can be created.

2.1 Pitch

Which pitches occur, and how often? How diatonic is the pitch material of a composition, and how many additional chromatic notes occur?

What can we observe?

Mozart evidently uses mainly notes of the E-flat major scale in this composition (E-flat = D#, A-flat = G#, B-flat = A#, etc.) and hardly any chromatic notes.

Tip: The chart can also be displayed in an external Matplotlib pop-up window, where it can be enlarged, reformatted, saved, and processed further. To do this, precede the code with the command '%matplotlib'. Afterwards, switch back to inline display with '%matplotlib inline'; otherwise all subsequent charts will also be displayed externally.

We now want to know exactly how often the individual pitches appear!

The table also shows the accidentals written in the score, which can be useful, for example, when a piece modulates into distant keys. Here -1 stands for a flat (b), 1 for a sharp (#), 0 for a natural sign, -2 for a double flat (bb), and so on. This means:
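The coding scheme just described can be expressed as a small lookup table. This is only an illustration of the scheme, not CAMAT code; the entry for 2 (double sharp) is an assumption extending the "and so on" in the text.

```python
# Accidental codes as described in the text: negative = flats, positive = sharps,
# 0 = an explicitly written natural sign. The entry for 2 extends the pattern (assumption).
ACCIDENTALS = {
    -2: "double flat (bb)",
    -1: "flat (b)",
    0: "natural sign",
    1: "sharp (#)",
    2: "double sharp (x)",
}


def describe_accidental(code: int) -> str:
    """Return a readable name for an accidental code."""
    return ACCIDENTALS.get(code, f"unknown code {code}")


print(describe_accidental(-1))  # flat (b)
```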

Note the entry for A3: MIDI pitch 57 appears in two rows, because A occurs 35 times with a natural sign in the score but twice without one (presumably later in a measure in which the natural sign had already been written).

If this level of detail is not needed, it can be switched off with the additional parameter 'enharmonic=False'. E-flat then becomes D-sharp (D#), A-flat becomes G-sharp (G#), and so on. In addition, the column names of the table have to be adjusted, because pitch and octave are now merged into a single column:

A3 now appears in only one row, with a frequency of 37 (35 + 2).
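Merging the two A3 rows amounts to summing the counts per MIDI pitch. A hedged pandas sketch, using the A3 counts from the text plus an illustrative second row with a made-up count; the column names are hypothetical:

```python
import pandas as pd

# A3 (MIDI 57) is split into two rows because it is written with and without a natural sign.
# The E-flat row and its count are illustrative only; column names are hypothetical.
freq = pd.DataFrame({
    "midi":       [57, 57, 63],
    "pitch":      ["A3", "A3", "Eb4"],
    "accidental": [0, None, -1],
    "count":      [35, 2, 10],
})

# Ignoring the notation (accidentals) means summing the counts per sounding pitch.
merged = freq.groupby("midi", as_index=False)["count"].sum()
print(merged)
```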

The following command exports the list of pitch frequencies as a CSV file (CSV = comma-separated values). The file is saved in the export folder and can be opened and processed with a text editor or a spreadsheet program (e.g. Excel); such exports can be used to build tables for comparisons and corpus analyses.
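For readers who want to export their own tables outside CAMAT, a generic pandas sketch might look like this. The folder name follows the text; the file name, column names, and helper function are hypothetical, and only the A3 value is taken from the example in section 2.1.

```python
from pathlib import Path

import pandas as pd

# Toy pitch-frequency table (hypothetical column names); A3 = 37 as in the text.
freq = pd.DataFrame({"pitch": ["A3"], "midi": [57], "count": [37]})


def export_frequencies(df: pd.DataFrame, filename: str, folder: str = "export") -> Path:
    """Write a frequency table as CSV into the given folder and return the file path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it does not exist
    path = out_dir / filename
    df.to_csv(path, index=False)
    return path


csv_path = export_frequencies(freq, "k171_pitch_frequencies.csv")
print(csv_path.read_text())
```

If a spreadsheet program in a locale that uses the comma as decimal separator does not split the columns, passing `sep=";"` to `to_csv` is a common workaround.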

There are two ways to make the chart clearer: the display can be restricted to a certain pitch range, or it can be limited to the pitches that actually occur.
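Both options can be sketched on a pandas Series of counts indexed by MIDI pitch. Apart from A3 (MIDI 57) = 37, the counts are illustrative placeholders:

```python
import pandas as pd

# Toy counts per MIDI pitch, including pitches that never occur (count 0).
counts = pd.Series({55: 4, 56: 0, 57: 37, 58: 0, 60: 9, 84: 1}).sort_index()

# Option 1: restrict the display to a pitch range (here MIDI 55 to 72, inclusive).
in_range = counts.loc[55:72]

# Option 2: keep only the pitches that actually occur.
occurring = counts[counts > 0]

print(in_range)
print(occurring)
```

Either result could then be plotted, e.g. with `occurring.plot.bar()` (which requires Matplotlib).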

2.2 Pitch classes

For harmonic analyses, it is much clearer not to count the individual pitches separately but to group them into pitch classes, i.e. to combine all octave positions of a note (such as Eb3, Eb4, Eb5) into one class.
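Grouping into pitch classes can be sketched by taking the MIDI number modulo 12, so that all octaves of a note fall into one class. The pitches below are toy values, and the sharp-based names mirror the labels mentioned in section 2.1:

```python
import pandas as pd

# Toy list of MIDI pitches. Pitch class = MIDI number modulo 12,
# so all octaves of a note fall into the same class (0 = C, 3 = D#/E-flat, 9 = A).
midi = pd.Series([51, 63, 75, 58, 70, 57, 69, 67])

PC_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

pitch_classes = (midi % 12).map(lambda pc: PC_NAMES[pc])
pc_counts = pitch_classes.value_counts()
print(pc_counts)
```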

Now it is clear at a glance: Mozart uses almost exclusively the notes of the E-flat major scale, with one interesting exception: the note A, a tritone above the tonic E-flat, appears relatively often!

What could be the reason? To answer this question, you need to look at the score and check where the note A is used. Could it be related to the use of the double dominant (F major), whose third is A natural?

As for pitches (see section 2.1), frequency tables can also be displayed for pitch classes. The table is output when the plot parameter is set to 'do_plot=None' in the command:

2.3 Intervals in monophonic note sequences

How often does each interval occur in the individual voices? Do all voices show a similar interval profile, or are there, for example, more leaps in the lower voices and more steps in the melody voice?

First, let's look at the interval distribution in the first violin:

The first violin progresses primarily in seconds, thirds, and fourths, with descending steps being more common than ascending ones. Larger intervals also occur, but are much rarer.

Now, what about the cello part?

To do this, copy the entire command into a new code cell and adjust the voice selection by replacing the '1' with a '4' at the appropriate place:

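Ascending fourths (5), ascending seconds (2), and descending fifths (-7) are noticeably frequent (interval sizes are given in semitones). Since an ascending fourth and a descending fifth lead to the same pitch class, this may indicate that the cello often plays the roots of the harmonies, moving in fifth relationships.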
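The interval counts discussed in this section can be sketched as differences between consecutive MIDI pitches within each voice. The two toy melodies and the column names are hypothetical; rests, chords, and ties are ignored here, and how CAMAT treats them is not shown:

```python
import pandas as pd

# Toy melodic lines (hypothetical columns): notes in score order per voice.
notes = pd.DataFrame({
    "voice": [1, 1, 1, 1, 4, 4, 4],
    "midi":  [70, 72, 70, 67, 51, 56, 49],
})

# Melodic interval = difference to the previous note in the same voice, in semitones
# (positive = ascending, negative = descending). The first note of each voice has none.
notes["interval"] = notes.groupby("voice")["midi"].diff()

intervals_v1 = notes.loc[notes["voice"] == 1, "interval"].dropna().astype(int)
print(intervals_v1.value_counts().sort_index())
```

Selecting `voice == 4` instead yields the intervals of the lower toy line (+5 and -7), analogous to adjusting the voice selection in the notebook command.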

2.4 Tone durations

Now let's turn to rhythmic shaping: what duration values are used in the composition, and how often do they occur in each case?

In the following evaluation, the quarter note has the value 1. Longer and shorter note values are expressed as multiples or fractions of 1 (e.g. half note = 2, eighth note = 0.5).

As expected, Mozart mainly uses quarter notes (1) and shorter note values (<1), but there are also a few longer notes. To see the exact count of each duration value, including the range below 1, display the frequency table again:

To interpret the duration values: they are multiples or fractions of a quarter note (= 1). Thus:
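A small lookup table may help with reading the values. The selection of note values is illustrative; fractions are kept exact with Python's `fractions` module, whereas the tool may display values such as 1/3 as rounded decimals (an assumption).

```python
from fractions import Fraction

# Note values expressed relative to the quarter note (= 1).
DURATIONS = {
    "whole note": Fraction(4),
    "half note": Fraction(2),
    "dotted quarter note": Fraction(3, 2),
    "quarter note": Fraction(1),
    "eighth note": Fraction(1, 2),
    "eighth-note triplet": Fraction(1, 3),
    "sixteenth note": Fraction(1, 4),
    "thirty-second note": Fraction(1, 8),
}

for name, value in DURATIONS.items():
    print(f"{name:22s} {float(value):.3f}")
```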

2.5 Metrical profile

How clearly is the meter articulated in the voices of a composition, i.e. by placing notes on the beginnings of measures or on metrically important positions within the measure (e.g. the middle of the measure or the quarter-note beats)? To examine this, the frequencies of notes at the various metrical positions can be listed.

Of course, such a profile presupposes that the examined piece is in a single meter and has no meter changes. This can be checked with the following command:
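One way to check for meter changes, sketched with a hypothetical per-measure table (`measure`, `time_signature`) and toy values rather than CAMAT's own command:

```python
import pandas as pd

# Toy measure table (hypothetical columns): one row per measure with its time signature.
measures = pd.DataFrame({
    "measure": range(1, 9),
    "time_signature": ["4/4"] * 3 + ["3/4"] * 5,
})

ts_counts = measures["time_signature"].value_counts()
has_meter_change = measures["time_signature"].nunique() > 1
print(ts_counts)
print("meter changes:", has_meter_change)
```

If the time signature is stored per note rather than per measure, the count should be made over distinct measures, e.g. after `drop_duplicates("measure")`.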

Our Mozart movement contains both 4/4 and 3/4 measures; although the piece begins in 4/4 time, the 3/4 measures actually predominate. The following command therefore creates two separate metrical profiles, one for the 4/4 measures and one for the 3/4 measures.
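Computing one metrical profile per meter can be sketched by grouping note onsets by time signature and counting their positions within the measure. The column `beat_position` (onset in quarter notes from the barline, 0 = downbeat) and all values are hypothetical:

```python
import pandas as pd

# Toy note table (hypothetical columns): time signature of the note's measure and
# the onset position within the measure, in quarter notes from the barline.
notes = pd.DataFrame({
    "time_signature": ["4/4", "4/4", "4/4", "4/4", "3/4", "3/4", "3/4", "3/4", "3/4"],
    "beat_position":  [0.0, 1.0, 2.0, 2.5, 0.0, 0.0, 1.0, 2.0, 0.5],
})

# One metrical profile per meter: number of note onsets at each position.
profiles = {
    ts: group["beat_position"].value_counts().sort_index()
    for ts, group in notes.groupby("time_signature")
}

for ts, profile in profiles.items():
    print(ts)
    print(profile)
```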

3. Two-dimensional frequency distributions

We have already looked at the frequencies of pitches and pitch classes. One could object, however, that longer notes naturally carry more weight than short notes or notes between the beats. This idea can be pursued further with combined, 'double' or 'bivariate' frequency distributions: for example, the frequencies of pitches for the different duration values, or the frequencies of pitch classes at the different metrical positions. The following two examples illustrate this.

3.1 Duration values of the pitch classes

The following command creates a so-called 3D graphic showing the frequencies of the duration values for each pitch class. Both the height and the color of the columns represent the respective frequency (from blue = very rare via green and yellow to red = very frequent):
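The numbers behind such a graphic form a two-dimensional frequency table (a cross-tabulation). A minimal pandas sketch with toy data and hypothetical column names:

```python
import pandas as pd

# Toy note table (hypothetical columns): pitch class name and duration in quarter notes.
notes = pd.DataFrame({
    "pitch_class": ["D#", "D#", "A#", "G", "D#", "A#", "A"],
    "duration":    [1.0, 0.5, 1.0, 0.5, 2.0, 0.5, 0.5],
})

# Bivariate frequency table: rows = pitch classes, columns = duration values.
# Each cell holds the number of notes with that combination.
table = pd.crosstab(notes["pitch_class"], notes["duration"])
print(table)
```

A 3D bar chart like the one described could then be drawn from `table`, for instance with Matplotlib's `bar3d`; this sketch does not reproduce CAMAT's plotting code.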