Tutorial: Sheet Music Basics, Part 2: Statistics

(version 2022_02_06)

Table of contents:

1. Querying simple statistical data

2. Frequencies

3. Two-dimensional frequency distributions

4. Application of bar and voice filters

5. Tasks

Using music examples, this tutorial introduces the possibilities for simple computer-assisted statistical queries with CAMAT (Computer-Assisted Music Analysis Tool).

Working through the tutorial should enable you to examine your own sheet music files using computer-assisted methods.

Each session with a Jupyter Notebook begins with the import of a set of Python libraries required for the analysis:

Then load the sheet music file you want to examine (from the internet or from your hard disk) and run the XML parser. This creates a new dataframe ('m_df') from the XML file, which forms the basis for the following statistical queries (cf. https://analyse.hfm-weimar.de/jupyter/CAMAT_Basics_Part1_Einfuehrung.html).
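CAMAT's own parser is not reproduced here. As a rough illustration of what "parsing" a MusicXML file into a note table means, the following sketch uses Python's standard `xml.etree.ElementTree` to read a tiny, made-up MusicXML fragment, producing one row per note. The field names (`part`, `measure`, `midi`, `duration`) are assumptions for illustration and need not match the columns of 'm_df'.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the seven natural note names above C
STEP_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# A tiny, made-up MusicXML fragment: one part, one measure (E-flat, G, B-flat, rest)
XML = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>2</divisions></attributes>
      <note><pitch><step>E</step><alter>-1</alter><octave>4</octave></pitch><duration>2</duration></note>
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>B</step><alter>-1</alter><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""

def parse_notes(xml_text):
    """Return one dict per pitched note: part, measure, MIDI pitch, duration in quarters."""
    root = ET.fromstring(xml_text)
    rows = []
    for part in root.iter("part"):
        divisions = 1  # MusicXML divisions per quarter note, updated by <attributes>
        for measure in part.iter("measure"):
            div = measure.find("attributes/divisions")
            if div is not None:
                divisions = int(div.text)
            for note in measure.iter("note"):
                pitch = note.find("pitch")
                # Skip rests/unpitched notes and grace notes (which have no <duration>)
                if pitch is None or note.find("grace") is not None:
                    continue
                step = pitch.findtext("step")
                alter = int(float(pitch.findtext("alter", default="0")))
                octave = int(pitch.findtext("octave"))
                midi = 12 * (octave + 1) + STEP_TO_PC[step] + alter  # C4 = 60
                duration = int(note.findtext("duration")) / divisions
                rows.append({"part": part.get("id"), "measure": int(measure.get("number")),
                             "midi": midi, "duration": duration})
    return rows

notes = parse_notes(XML)
for row in notes:
    print(row)
```

Real score files contain much more (ties, chords, several parts, namespaces in some exports), which a full parser such as CAMAT's has to handle.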

As the music example for this tutorial, we use the first movement of the String Quartet K. 171 by Wolfgang Amadeus Mozart (see Basics Part 1).

1. Querying simple statistical data

We start with simple statistical queries of the number of voices, the length in measures, the number of notes (total and per voice), and the ambitus of each voice.

Please open the sheet music file in parallel in your score editor (e.g. MuseScore).

Note: The first evaluation command reads in the data for the first time, so its execution may take a relatively long time (up to several minutes, depending on your computer and the file size). All following commands, however, will run very quickly!

Query of the piece length in measures:
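The logic behind such a query can be sketched with a made-up note table stored as plain tuples (this is not CAMAT's data structure): the number of voices is the number of distinct part IDs, and the length is taken from the highest measure number.

```python
# Hypothetical note table, one tuple per note: (part ID, measure number, MIDI pitch, duration)
notes = [
    ("1", 1, 63, 1.0), ("1", 1, 67, 1.0), ("1", 2, 70, 2.0),
    ("4", 1, 51, 2.0), ("4", 2, 46, 2.0), ("4", 3, 39, 4.0),
]

voices = sorted({part for part, _measure, _midi, _dur in notes})
# Assumes measures are numbered consecutively from 1 (no pickup measure 0). Measures that
# contain only rests are absent from a notes-only table, so the highest number is used
# rather than a count of distinct measure numbers.
length_in_measures = max(measure for _part, measure, _midi, _dur in notes)

print("number of voices:", len(voices))
print("length in measures:", length_in_measures)
```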

Query of the number of tones per voice, with each chain of tied tones counted as one tone:
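Counting tied tones as one tone means counting only the first note of each tie chain. A minimal sketch, assuming each row carries a tie status encoded as 'start', 'continue' or 'stop' (an assumed encoding, similar to the one music21 uses; the data are made up):

```python
from collections import Counter

# Hypothetical note rows: (part ID, MIDI pitch, tie status or None)
notes = [
    ("1", 63, None), ("1", 67, "start"), ("1", 67, "stop"), ("1", 70, None),
    ("4", 39, "start"), ("4", 39, "continue"), ("4", 39, "stop"), ("4", 46, None),
]

def count_tones(rows):
    """Count tones per part, treating a chain of tied notes as a single tone."""
    counts = Counter()
    for part, _midi, tie in rows:
        # Only an untied note or the first note of a tie chain starts a new tone
        if tie not in ("continue", "stop"):
            counts[part] += 1
    return counts

print(count_tones(notes))
```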

The ambitus of each voice is given in semitones, i.e. as the difference between the lowest (min) and the highest (max) note, with pitches given as MIDI values (c' = C4 = 60, c'' = C5 = 72, and so on):
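The calculation itself is simple: per voice, subtract the lowest MIDI pitch from the highest. A sketch with made-up data (not the Mozart movement):

```python
# Hypothetical note rows: (part ID, MIDI pitch)
notes = [("1", 63), ("1", 79), ("1", 70), ("4", 39), ("4", 51), ("4", 46)]

def ambitus(rows):
    """Return {part: (lowest, highest, range in semitones)} from MIDI pitches (C4 = 60)."""
    lowest, highest = {}, {}
    for part, midi in rows:
        lowest[part] = min(midi, lowest.get(part, midi))
        highest[part] = max(midi, highest.get(part, midi))
    return {p: (lowest[p], highest[p], highest[p] - lowest[p]) for p in lowest}

for part, (low, high, span) in ambitus(notes).items():
    print(f"voice {part}: min {low}, max {high}, ambitus {span} semitones")
```

For example, a range from 63 (E-flat 4) to 79 (G5) is 16 semitones, i.e. an octave plus a major third.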

2. Frequencies

To characterize individual compositions and to compare different pieces of music, it can be useful to determine how often certain elements (pitches, note durations, etc.) occur. For such questions, frequency tables and graphical representations (so-called histograms) can be created.

2.1 Pitch

Which pitches appear how often? How diatonic is the tonal resource of a composition, how many additional chromatic notes appear?
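At its core, a pitch histogram is a count of how often each MIDI pitch occurs. A minimal sketch with made-up pitches, printing a text histogram instead of a graphic:

```python
from collections import Counter

# Hypothetical MIDI pitches of one piece (not the Mozart data)
pitches = [63, 67, 70, 63, 65, 67, 63, 70, 69, 58, 63]

freq = Counter(pitches)
for midi in sorted(freq):
    # One '#' per occurrence: a minimal text histogram
    print(f"{midi:3d} {'#' * freq[midi]}")
```

With Matplotlib installed, `plt.bar(list(freq), list(freq.values()))` would draw the same counts as a bar chart.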

What can we recognize?

Mozart apparently uses mainly the notes of the E-flat major scale (E-flat = D#, A-flat = G#, B-flat = A#, etc.) and hardly any chromatic notes.

Tip: The graphic can also be displayed in an external pop-up window of Matplotlib, where it can be enlarged, reformatted, saved and processed further. To do this, precede the code with the command '%matplotlib'. Afterwards, switch back to inline display with the command '%matplotlib inline'; otherwise all following graphics will also be displayed externally.

We now want to know exactly how often the individual pitches appear!

The table also shows the accidentals written in the score. This can be useful, for example, when the music modulates into distant keys. Here -1 stands for a flat sign, 1 for a sharp sign, 0 for a natural sign, -2 for a double flat, and so on. This means:

Please note A3: MIDI pitch 57 appears in two rows, since the A occurs 35 times with a natural sign in the score, but twice without one (presumably later in a measure in which the natural sign had already been written).
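Why one pitch can occupy two rows is easy to reproduce: if the table counts (pitch, accidental) pairs, notes with and without a written natural sign end up in different rows. A sketch with made-up data; the keys of the mapping are MusicXML `<accidental>` values, and the code 2 for a double sharp is assumed by analogy:

```python
from collections import Counter

# Codes as described in the text: -1 = flat, 1 = sharp, 0 = natural, -2 = double flat.
# 2 for a double sharp is assumed by analogy; None means no accidental is written.
ACCIDENTAL_CODE = {"flat": -1, "sharp": 1, "natural": 0, "flat-flat": -2, "double-sharp": 2}

# Hypothetical rows: (MIDI pitch, accidental written in the score, or None)
notes = [(57, "natural"), (57, "natural"), (57, None), (63, "flat"), (63, None)]

# Detailed table: one row per (pitch, accidental) combination
by_spelling = Counter((midi, ACCIDENTAL_CODE.get(sign)) for midi, sign in notes)
# Merged table: one row per pitch
by_pitch = Counter(midi for midi, _sign in notes)

print(by_spelling)
print(by_pitch)
```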

If this level of detail is not needed, it can be switched off with the additional parameter 'enharmonic=False'. E-flat then becomes D-sharp (D#), A-flat becomes G-sharp (G#), and so on. In addition, the columns of the table have to be renamed, because pitch and octave are now merged into a single column:

Now A3 appears in only one row, with a frequency of 37 (35 + 2).
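The merged names can be derived from MIDI numbers alone: the pitch class is `midi % 12`, spelled with sharps as in the text (E-flat becomes D#), and the octave is `midi // 12 - 1`, so that 60 = C4. A sketch with made-up pitches:

```python
from collections import Counter

# Sharp spellings, matching the text (E-flat -> D#, A-flat -> G#, B-flat -> A#)
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(midi):
    """Name a MIDI pitch with a sharp spelling and octave number (60 -> 'C4')."""
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"

pitches = [57, 57, 57, 63, 70, 68]  # hypothetical data
print(Counter(midi_to_name(m) for m in pitches))
```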

The following command exports the list of pitch frequencies as a csv file (csv = comma-separated values). The export can be used to build tables for comparisons and corpus analyses. The csv file is saved in the export folder and can be opened and processed with a text editor or a spreadsheet program (e.g. Excel).
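What such an export contains can be sketched with Python's standard `csv` module: a header row followed by one (value, count) row per pitch. The column names and the file path in the comment are hypothetical; the sketch writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io
from collections import Counter

pitches = [63, 67, 70, 63, 65, 63]  # hypothetical data
freq = Counter(pitches)

def write_frequencies(counter, fh):
    """Write a two-column frequency table (value, count) as CSV to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["pitch", "count"])
    for value in sorted(counter):
        writer.writerow([value, counter[value]])

# In practice, e.g.: open("export/pitch_frequencies.csv", "w", newline="")  (hypothetical path)
buffer = io.StringIO()
write_frequencies(freq, buffer)
print(buffer.getvalue())
```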

There are two ways to make the graphical representation a little clearer: first, the display can be restricted to a certain pitch range; second, only the pitches that actually occur can be shown.
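Both options amount to choosing which keys of the frequency table are plotted. A sketch with made-up data; the range C4 to C6 is an arbitrary example:

```python
from collections import Counter

pitches = [39, 46, 51, 63, 63, 67, 70, 79]  # hypothetical data
freq = Counter(pitches)

# Option 1: restrict to a pitch range (here C4 = 60 to C6 = 84), keeping empty pitches as 0
low, high = 60, 84
in_range = {m: freq[m] for m in range(low, high + 1)}

# Option 2: keep only the pitches that actually occur (no empty bars)
occurring = {m: freq[m] for m in sorted(freq)}

print(in_range)
print(occurring)
```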

2.2 Pitch classes

For harmonic analyses, it is often much clearer not to count the individual pitches separately but to group them into pitch classes (all octaves of a note together).
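Grouping into pitch classes simply discards the octave, i.e. takes the MIDI pitch modulo 12. A sketch with made-up pitches, using the sharp spelling of the tables above:

```python
from collections import Counter

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
pitches = [39, 51, 63, 58, 70, 67, 55, 69, 75]  # hypothetical data

# A pitch class ignores the octave: MIDI pitch modulo 12
pc_freq = Counter(NAMES[m % 12] for m in pitches)
print(pc_freq)
```

Here E-flat in three different octaves (39, 51, 63) and a fourth time at 75 all count towards the single pitch class D#.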

Now it can be seen at a glance: Mozart uses almost exclusively the notes of the E-flat major scale, with one interesting exception: the note A, a tritone above the tonic E-flat, appears relatively often!

What could this be related to? To answer this question, you of course need to look at the score and check where the note A is used. Could it have to do with the use of double dominants (F major)?

As with pitches (see section 2.1), frequency tables can also be displayed for pitch classes. The frequency table is output automatically as soon as the plot parameter 'do_plot=None' is set in the command:

2.3 Intervals in monophonic note sequences

How often does a certain interval step occur in the individual voices? Do all voices have a similar interval progression - or are there more leaps in the lower voices, for example, and more steps in the melody voice?

First, let's look at the interval distribution in the first violin:
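A melodic interval is the signed difference in semitones between two successive notes of the same voice (positive = ascending, negative = descending). A sketch with made-up data, assuming the notes are in score order and each voice is monophonic (no double stops):

```python
from collections import Counter

# Hypothetical rows in score order: (part ID, MIDI pitch)
notes = [("1", 70), ("1", 68), ("1", 67), ("1", 72), ("1", 70),
         ("4", 39), ("4", 44), ("4", 46), ("4", 39)]

def melodic_intervals(rows, part):
    """Signed semitone steps between successive notes of one voice."""
    line = [midi for p, midi in rows if p == part]
    return [b - a for a, b in zip(line, line[1:])]

print(Counter(melodic_intervals(notes, "1")))  # first violin
print(Counter(melodic_intervals(notes, "4")))  # cello
```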

The first violin progresses primarily in seconds, thirds, and fourths, with descending steps being more common than ascending ones. Larger intervals also occur, but are much rarer.

Now, what about the cello part?

To do this, simply copy the entire command into a new code cell and replace the '1' with a '4' in the voice selection:

Noticeably, fourths up (5), seconds up (2), and fifths down (-7) occur quite frequently. Could this indicate that the cello often plays the roots of the harmonies? A fourth up and a fifth down both lead to the same pitch class, as in the root progression from dominant to tonic.
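To read the semitone numbers as interval names, a small lookup table helps. Note that semitone counts cannot distinguish enharmonic intervals (e.g. an augmented second and a minor third are both 3), so the names below are the usual default readings, not spellings taken from the score:

```python
# Default names for simple intervals; 6 is ambiguous (augmented fourth / diminished fifth)
INTERVAL_NAMES = {0: "unison", 1: "minor second", 2: "major second", 3: "minor third",
                  4: "major third", 5: "perfect fourth", 6: "tritone", 7: "perfect fifth",
                  8: "minor sixth", 9: "major sixth", 10: "minor seventh",
                  11: "major seventh", 12: "octave"}

def interval_name(semitones):
    """Describe a signed semitone step, e.g. 5 -> 'perfect fourth up'."""
    size = abs(semitones)
    name = INTERVAL_NAMES.get(size, f"{size} semitones")
    if semitones == 0:
        return name
    return f"{name} {'up' if semitones > 0 else 'down'}"

for step in (5, 2, -7):
    print(step, interval_name(step))

# A fourth up and a fifth down reach the same pitch class
print(5 % 12 == -7 % 12)
```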

2.4 Tone durations

Now let's turn to rhythmic shaping: what duration values are used in the composition, and how often do they occur in each case?

In the following evaluation, the quarter note has the value 1; shorter and longer note values are expressed as fractions or multiples of 1.

As expected, Mozart uses mainly quarter notes (1) and smaller note values (<1). However, there are also a few longer notes. If we want to know the exact number of duration values and are also interested in the <1 range, we have to display the frequency table again:
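A duration frequency table counts the quarter-note values. One design choice worth making explicit: durations such as triplets (1/3) are not exact as floating-point numbers, so the sketch below converts each value to an exact `Fraction` before counting. The data are made up:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical durations in quarter notes (1 = quarter); 1/3 = eighth-note triplet
durations = [1, 0.5, 0.5, 1, 0.25, 0.25, 2, 1.5, 1 / 3, 1 / 3, 1 / 3]

# limit_denominator() turns float noise such as 0.3333... into the exact value 1/3
freq = Counter(Fraction(d).limit_denominator(64) for d in durations)
for value in sorted(freq):
    print(f"{str(value):>5}  {freq[value]}")
```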

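To explain the duration values: they are multiples or fractions of a quarter note (= 1). Thus: 4 = whole note, 2 = half note, 1.5 = dotted quarter note, 1 = quarter note, 0.75 = dotted eighth note, 0.5 = eighth note, 0.25 = sixteenth note, and so on.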

2.5 Metrical profile

How clearly is the meter articulated in the voices of a composition, i.e. by placing tones at the beginnings of measures or at metrically important positions within the measure (e.g. the middle of the measure or the quarter-note beats)? To find out, a list of how often tones occur at the various metrical positions can be displayed.
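A metric profile counts how many tones start at each position within the measure. A sketch with made-up onsets, assuming each onset is given as an offset in quarter notes from the start of its measure (0.0 = first beat):

```python
from collections import Counter

# Hypothetical onsets in 4/4: (measure, offset from the start of the measure in quarters)
onsets = [(1, 0.0), (1, 1.0), (1, 2.0), (1, 2.5), (1, 3.0),
          (2, 0.0), (2, 2.0), (2, 3.5),
          (3, 0.0), (3, 1.0), (3, 2.0), (3, 3.0)]

# Metric profile: number of tone onsets per position within the measure
profile = Counter(offset for _measure, offset in onsets)
for position in sorted(profile):
    # Beat numbers in 4/4: offset 0.0 = beat 1, 2.0 = beat 3 (middle of the measure)
    print(f"beat {position + 1:<4} {'#' * profile[position]}")
```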

Of course, such a profile presupposes that the examined piece is in a single meter and has no meter changes. This can be checked with the following command:

In our Mozart movement, both 4/4 and 3/4 measures appear, with the 3/4 measures actually in the majority, although the piece begins in 4/4 time. The following command therefore creates two separate metric profiles, one for the 4/4 measures and one for the 3/4 measures.
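The idea behind separate profiles can be sketched as follows: first count how many measures carry each time signature, then group the onsets by the time signature of their measure and build one profile per group. The per-measure time-signature mapping and the onsets are made-up assumptions:

```python
from collections import Counter, defaultdict

# Hypothetical data: time signature of each measure, and onsets (measure, offset in quarters)
time_signature = {1: "4/4", 2: "4/4", 3: "3/4", 4: "3/4", 5: "3/4"}
onsets = [(1, 0.0), (1, 2.0), (2, 0.0), (2, 1.0), (3, 0.0), (3, 1.0), (3, 2.0),
          (4, 0.0), (5, 0.0), (5, 2.0)]

# Step 1: which meters occur, and in how many measures?
meter_counts = Counter(time_signature.values())
print(meter_counts)

# Step 2: one metric profile per meter
profiles = defaultdict(Counter)
for measure, offset in onsets:
    profiles[time_signature[measure]][offset] += 1

for meter, profile in profiles.items():
    print(meter, dict(sorted(profile.items())))
```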

3. Two-dimensional frequency distributions

We have already looked at the frequencies of pitches and pitch classes. Now one could argue that longer tones naturally carry more weight than short tones, and tones on the beat more than tones between beats. We can pursue this idea further by looking at combined ('double' or 'bivariate') frequency distributions: for example, the frequencies of the pitches for the different duration values, or the frequencies of the pitch classes at the different metrical positions. In the following, we deal with this using two examples.

3.1 Duration values of the pitch classes

The following command creates a so-called 3D graphic, where the frequencies of duration values per pitch class are displayed. Both the height and the color of the columns stand for the respective frequency (from blue=very rare via green and yellow to red=very frequent):

Since it is a bit difficult to assign the columns of the graphic to the note values (the axis numbers refer to the fields that follow them), we use the following command to display the corresponding frequency table:
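A bivariate frequency table counts pairs instead of single values: here each note contributes one (pitch class, duration) pair. A sketch with made-up data, printing the table with one row per occurring pitch class and one column per duration value:

```python
from collections import Counter

PC_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical rows: (MIDI pitch, duration in quarter notes)
notes = [(63, 1.0), (63, 0.5), (51, 1.0), (70, 0.5), (70, 0.5), (67, 1.0), (69, 0.25)]

# Joint (bivariate) frequencies: count each (pitch class, duration) pair
joint = Counter((PC_NAMES[midi % 12], dur) for midi, dur in notes)

durations = sorted({dur for _, dur in notes})
print("PC   " + "".join(f"{d:>6}" for d in durations))
for pc in PC_NAMES:
    row = [joint[(pc, d)] for d in durations]
    if any(row):  # skip pitch classes that never occur
        print(f"{pc:<5}" + "".join(f"{n:>6}" for n in row))
```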

It is not very surprising that the root E-flat (=D#) and the fifth B-flat (=A#) occur mainly as eighth notes (0.5) and quarter notes (1.0). After all, these are the most frequent duration values!

The plot can be switched from pitch classes to pitches by setting the parameter plot_with= to 'Pitch' (in single quotes). However, the 3D graphic then becomes a bit more confusing...

3.2 Metric positions of the pitch classes

Now to the question: at which positions in the measure do the twelve pitch classes occur? The following command generates the corresponding 3D graphic:

The same works with pitches. For this we simply have to replace 'PitchClass' with 'Pitch' in the 'plot_with' parameter:
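Switching between pitch classes and pitches only changes the label under which each note is counted. A sketch with made-up onsets; the flag `by_pitch_class` is a hypothetical stand-in for the choice the text makes via 'plot_with', not CAMAT's parameter:

```python
from collections import Counter

PC_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical onsets: (MIDI pitch, offset within the measure in quarter notes)
notes = [(63, 0.0), (51, 0.0), (70, 1.0), (67, 2.0), (63, 2.0), (69, 3.5), (70, 0.0)]

def joint_counts(rows, by_pitch_class=True):
    """Count (pitch class or MIDI pitch, metric position) pairs."""
    def label(midi):
        return PC_NAMES[midi % 12] if by_pitch_class else midi
    return Counter((label(midi), offset) for midi, offset in rows)

pc_table = joint_counts(notes)                             # grouped by pitch class
pitch_table = joint_counts(notes, by_pitch_class=False)    # separate MIDI pitches
print(pc_table.most_common(3))
print(pitch_table.most_common(3))
```

In the pitch-class table, E-flat 3 (51) and E-flat 4 (63) on the downbeat merge into one cell; in the pitch table they stay apart.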

Again, it may be useful to look at the graph in Matplotlib's external pop-up window.

However, we have now made a mistake: the Mozart movement does change time signature! Yet our evaluation did not distinguish between 4/4 time and 3/4 time. We therefore have to repeat all commands after first separating the two time signatures.

4. Application of bar and voice filters

All statistical queries can be applied to any sections and voices with the help of an easy-to-use filter function (cf. Tutorial Part 1, Section 5). All you have to do is enter the corresponding measures and voices (part IDs).

Here are two examples:

  1. With the following commands, a combined list of the interval frequencies of the two violin parts ('PartID':'1-2') for the first five measures ('Measure':'1-5') can be displayed, exported or shown as a histogram.

This selection can be changed at will!

  2. With the following commands, a list of the pitch classes of the cello part ('PartID':'4') for the first ten measures ('Measure':'1-10') can be displayed, exported or shown as a histogram.

This selection can also be changed as desired!
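How a filter such as {'PartID':'1-2', 'Measure':'1-5'} selects rows can be sketched as follows. The helpers `parse_range` and `apply_filter` are hypothetical and only mimic the range notation shown above (single numbers and 'from-to' ranges); they are not CAMAT's implementation:

```python
def parse_range(spec):
    """Turn '1-5' into {1, 2, 3, 4, 5} and '4' into {4} (hypothetical helper)."""
    if "-" in spec:
        start, end = (int(x) for x in spec.split("-"))
        return set(range(start, end + 1))
    return {int(spec)}

def apply_filter(rows, flt):
    """Keep only the rows whose values lie in every range given in the filter dict."""
    allowed = {key: parse_range(spec) for key, spec in flt.items()}
    return [row for row in rows if all(row[key] in allowed[key] for key in allowed)]

# Hypothetical note rows: 4 parts x 11 measures, one note each
rows = [{"PartID": p, "Measure": m, "midi": 60 + p + m}
        for p in range(1, 5) for m in range(1, 12)]

violins_1_5 = apply_filter(rows, {"PartID": "1-2", "Measure": "1-5"})
cello_1_10 = apply_filter(rows, {"PartID": "4", "Measure": "1-10"})
print(len(violins_1_5), len(cello_1_10))
```

The filtered rows can then be passed to any of the frequency counts sketched above.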

5. Tasks

So far, we have only looked at the results for a single piece. But what happens when we compare several pieces, e.g. several or all movements of a composition, with each other and with other pieces? Are there stylistic regularities, or do the differences predominate?

Load compositions of your choice (different genres, composers, and eras) and compare these pieces with each other in terms of frequencies of pitches, pitch classes, note values, and intervals. Interpret the results in each case with a look at the sheet music!
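When comparing pieces of different lengths, absolute counts are misleading; converting them to percentages makes the distributions comparable. A sketch with two made-up pitch-class lists (0 = C, 3 = E-flat, 7 = G, and so on):

```python
from collections import Counter

def relative_frequencies(values):
    """Percentage of each value, so that pieces of different length become comparable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}

# Hypothetical pitch-class data of two pieces
piece_a = [3, 3, 7, 10, 3, 5, 7, 3]
piece_b = [0, 4, 7, 0, 7, 2, 0, 4, 11, 0]

rel_a, rel_b = relative_frequencies(piece_a), relative_frequencies(piece_b)
for pc in range(12):
    print(f"{pc:2d}  {rel_a.get(pc, 0):5.1f} %  {rel_b.get(pc, 0):5.1f} %")
```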