====== Corpora Work Progression ======

This page tracks the current state of corpus work, known issues, recurring MEI encoding patterns, and links to general documentation.

===== General documentation =====

==== Git and GitHub ====

  * [[git|Introduction to Git and GitHub]]

==== MEI and XML ====

  * [[mei|Introduction to MEI and XML schema]]

===== Current corpus status =====

[[https://github.com/egorpol/DdT_1_vol_11#current-progress]]

===== Current corpus assignments =====

[[https://nextcloud.uni-weimar.de/f/42181526]]

===== Open bugs =====

  * mei-friend may create new branches if it cannot properly save work on the **main** branch.
  * **Always check which branch you are saving your work on.**

==== Requested features / improvements ====

tba

===== Quick reference: recurring MEI topics =====

==== Rests ====

In MEI, ordinary rests and full-bar rests are not the same thing.

  * ''<rest>'' = a non-sounding event with a specific written duration
  * ''<mRest>'' = a complete measure rest, independent of meter
  * ''<multiRest>'' = multiple consecutive complete-measure rests compressed into one symbol, typically in parts
  * ''<mSpace>'' = an explicitly empty measure, where no musical content is encoded but nothing is considered missing

=== When do we use ''<rest>''? ===

Use ''<rest>'' when the source shows a rest with a specific notated duration inside the rhythmic flow of the layer.

Typical cases:

  * quarter rest inside a 4/4 measure
  * half rest followed by notes
  * a rest that is part of ongoing voice activity
  * silence in one voice while another voice continues

Example:
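A minimal sketch of ordinary rests inside the rhythmic flow, assuming a 4/4 measure. The measure number, staff number, and pitches are illustrative and not taken from the corpus.

<code xml>
<!-- Illustrative 4/4 measure: quarter rest, two quarter notes, quarter rest -->
<measure n="1">
  <staff n="1">
    <layer n="1">
      <rest dur="4"/>
      <note pname="c" oct="5" dur="4"/>
      <note pname="d" oct="5" dur="4"/>
      <rest dur="4"/>
    </layer>
  </staff>
</measure>
</code>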

=== When do we use ''<mRest>''? ===

Use ''<mRest>'' when the layer is silent for the entire measure and the source expresses that as a full-bar rest.

This is the preferred encoding for a complete measure rest because it does not depend on the current meter.

Example:
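A minimal sketch of a full-bar rest, assuming MEI's measure-rest element ''mRest'' as the only content of the layer. The layer number is illustrative.

<code xml>
<!-- The whole measure is silent in this layer; no duration is given -->
<layer n="1">
  <mRest/>
</layer>
</code>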

=== Local recommendation ===

For our corpus work:

  * use ''<rest>'' for ordinary rests with explicit duration inside a measure
  * use ''<mRest>'' when a whole measure is silent in that layer
  * do **not** replace a full-bar rest with a duration-based ''<rest>'' just because the meter happens to allow it
  * prefer ''<mRest>'' when the notation is semantically a “whole measure rest”

=== Important restriction ===

A layer containing ''<mRest>'' should not also contain notes or ordinary rests. Control events such as fermatas may still occur alongside it.

=== Full-bar silence in different meters ===

A full-bar rest should usually still be encoded as ''<mRest/>''.
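As a sketch of a full-bar rest in context, assume a 3/8 measure in which staff 1 is silent while staff 2 sounds. The measure number, staff numbers, and pitch are illustrative.

<code xml>
<!-- Illustrative 3/8 measure: staff 1 has a full-bar rest, staff 2 a dotted quarter -->
<measure n="5">
  <staff n="1">
    <layer n="1">
      <mRest/>
    </layer>
  </staff>
  <staff n="2">
    <layer n="1">
      <note pname="g" oct="3" dur="4" dots="1"/>
    </layer>
  </staff>
</measure>
</code>

The silent staff carries no duration value, so the same encoding would apply unchanged in another meter.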

This remains true in different meters:

  * 2/4
  * 3/4
  * 4/4
  * 3/8
  * 6/8
  * etc.

The point of ''<mRest>'' is that it means “this whole measure is silent”, regardless of how many beats the measure contains.

=== What about multi-measure rests? ===

Use ''<multiRest>'' when the source compresses several complete silent measures into a single multiple-rest symbol.

Example:
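A minimal sketch of a compressed multi-measure rest, assuming the source part shows a single symbol labelled with eight measures. The count in ''num'' is illustrative.

<code xml>
<!-- One symbol standing for eight silent measures (illustrative count) -->
<layer n="1">
  <multiRest num="8"/>
</layer>
</code>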

=== Local recommendation for multi-measure rests ===

For our project:

  * use ''<multiRest>'' only when the source actually presents a compressed multi-measure rest
  * avoid ''<multiRest>'' in score-like encodings
  * if consecutive silent measures are shown individually in the source, encode them as separate measures with separate ''<mRest>'' elements

=== Empty measure vs. rest ===

Do not confuse:

  * ''<rest>'' = actual notated rest event
  * ''<mRest>'' = actual complete-measure rest
  * ''<mSpace>'' = explicitly empty measure with no encoded content

Use ''<mSpace>'' only when the layer is intentionally empty and this emptiness is itself what needs to be represented.

=== Practical examples ===

==== Example 1: ordinary rest ====
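A sketch of an ordinary rest followed by notes, assuming a 4/4 measure. The pitches are illustrative.

<code xml>
<!-- Half rest followed by two quarter notes (illustrative 4/4 content) -->
<layer n="1">
  <rest dur="2"/>
  <note pname="e" oct="4" dur="4"/>
  <note pname="f" oct="4" dur="4"/>
</layer>
</code>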

==== Example 2: full-bar rest ====
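A sketch of two consecutive silent measures that the source shows individually, each encoded with its own measure rest. The measure and staff numbers are illustrative.

<code xml>
<!-- Two separately notated silent measures, one measure rest each -->
<measure n="12">
  <staff n="1">
    <layer n="1">
      <mRest/>
    </layer>
  </staff>
</measure>
<measure n="13">
  <staff n="1">
    <layer n="1">
      <mRest/>
    </layer>
  </staff>
</measure>
</code>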

==== Example 3: multiple-rest in a part ====
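A sketch of a compressed rest in a part, assuming the source prints one multiple-rest symbol covering twelve measures. The measure number and the count are illustrative.

<code xml>
<!-- One measure element carrying a twelve-measure rest symbol (illustrative) -->
<measure n="20">
  <staff n="1">
    <layer n="1">
      <multiRest num="12"/>
    </layer>
  </staff>
</measure>
</code>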

=== Project rule of thumb ===

Ask:

  * is this a rest with its own written duration inside the measure? → use ''<rest>''
  * is the whole measure silent? → use ''<mRest>''
  * are several full silent measures compressed into one sign? → use ''<multiRest>''

==== Voice handling ====

In MEI, a ''<layer>'' is an independent stream of events on a staff. A staff may contain more than one layer in order to represent multiple voices.

=== What is a layer? ===

A layer is best understood as a single rhythmic and event stream within one staff.

For beginners:

  * one staff may have one layer
  * one staff may also have two or more layers
  * multiple layers are usually used when distinct voices must be represented independently

=== When does one staff need more than one ''<layer>''? ===

Use more than one layer when the notation clearly contains multiple independent voices on the same staff.

Typical cases:

  * soprano and alto on one staff
  * tenor and bass on one staff
  * independent rhythms in upper and lower voices
  * overlapping note values that cannot be represented cleanly in a single stream
  * voice-specific ties, rests, slurs, or cues that belong to different voices

Do **not** create extra layers just because stems point in different directions once or twice if the passage is still best understood as one continuous voice.

=== How do we distinguish voices clearly? ===

At minimum:

  * give each layer its own ''n'' value
  * keep one voice consistently in the same layer as far as possible
  * avoid switching the same musical voice back and forth between layers without a good reason

Example:
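A minimal sketch of two voices on one staff, assuming a 4/4 measure with an upper voice in half notes and a lower voice in quarter notes. The pitches and staff number are illustrative.

<code xml>
<!-- Two independent voices on one staff, each in its own numbered layer -->
<staff n="1">
  <layer n="1">
    <!-- upper voice -->
    <note pname="e" oct="5" dur="2"/>
    <note pname="d" oct="5" dur="2"/>
  </layer>
  <layer n="2">
    <!-- lower voice -->
    <note pname="c" oct="5" dur="4"/>
    <note pname="b" oct="4" dur="4"/>
    <note pname="a" oct="4" dur="4"/>
    <note pname="b" oct="4" dur="4"/>
  </layer>
</staff>
</code>

Each layer fills the measure on its own, which keeps the rhythmic logic of each voice checkable separately.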

=== Local project policy for layer numbering ===

Recommended local convention:

  * ''layer n="1"'' = upper voice or primary voice
  * ''layer n="2"'' = lower voice or secondary voice
  * keep this convention stable across the corpus
  * only add ''n="3"'', ''n="4"'', etc. when genuinely required

=== Shared stems and polyphonic overlap ===

Shared stems and polyphonic overlap are often visually complex, but the encoding priority should be:

  * represent the musical voices clearly
  * keep each independent voice in its own layer when needed
  * avoid forcing polyphonic notation into one layer if that makes duration or tie logic unclear

A useful practical rule is:

  * if two events behave like separate voices, encode them in separate layers
  * if the notation is only visually compressed but musically still one stream, keep one layer

=== Why this matters for ties and other relations ===

This matters because some relations are layer-sensitive. For example:

  * a tie that starts in one layer should also end in that same layer

So unstable or inconsistent layer assignment can create problems later.

=== Recommended workflow ===

When deciding whether to split into layers, ask:

  - are there two independent rhythmic streams?
  - are there overlapping note values that imply separate voices?
  - are rests voice-specific?
  - do ties or slurs belong to separate voices?
  - would one-layer encoding become confusing?

If the answer to several of these is yes, use multiple layers.

==== Generalbass ====

MEI supports figured bass through harmonic indication markup.

The key elements are:

  * ''<harm>'' = the harmonic indication, which serves as the attachment point
  * ''<fb>'' = the figured-bass container
  * ''<f>'' = one individual figure or component inside the figured bass sign

=== Which elements do we use? ===

For our corpus, the default pattern should be:
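A minimal sketch of the pattern, assuming the MEI elements ''harm'', ''fb'', and ''f'' with a single figure. The ''staff'' and ''tstamp'' values are placeholders.

<code xml>
<!-- One figured-bass sign with a single figure (placeholder staff and tstamp) -->
<harm staff="2" tstamp="1">
  <fb>
    <f>6</f>
  </fb>
</harm>
</code>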

This means:

  * ''harm'' provides the attachment point
  * ''fb'' says this is figured bass / Generalbass
  * ''f'' holds the visible figure component
  * ''staff'' identifies the staff the figured bass belongs to
  * ''tstamp'' identifies the rhythmic position in the measure

=== How do we align figured bass with notes or harmonic events? ===

''harm'' must define a point of attachment using one of these attributes:

  * ''startid''
  * ''tstamp''
  * ''tstamp.ges''
  * ''tstamp.real''

The most common attachment methods are ''startid'' and ''tstamp''.

For practical work in this corpus, I recommend:

  * use ''tstamp'' + ''staff'' as the default encoding for figured bass
  * use fractional timestamps when a figure changes in the middle of a beat, for example ''tstamp="4.5"'' for the second eighth of beat 4 in a quarter-note meter
  * avoid mixing ''startid'' and ''tstamp'' in the same local figured-bass passage, because Verovio may render the figures on different vertical baselines
  * use ''startid'' only when a figure must be linked to a specific encoded event and no clean timestamp solution is available

=== Example with ''tstamp'' ===
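A sketch of timestamp-based attachment, assuming a 4/4 measure with two half notes in the bass on staff 2. The measure number, pitches, and figures are illustrative.

<code xml>
<!-- Figures attached by staff + tstamp; harm elements follow the staff content -->
<measure n="3">
  <staff n="2">
    <layer n="1">
      <note pname="e" oct="3" dur="2"/>
      <note pname="f" oct="3" dur="2"/>
    </layer>
  </staff>
  <!-- figure on beat 1 -->
  <harm staff="2" tstamp="1">
    <fb>
      <f>6</f>
    </fb>
  </harm>
  <!-- figure on beat 3 -->
  <harm staff="2" tstamp="3">
    <fb>
      <f>5</f>
    </fb>
  </harm>
</measure>
</code>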

=== Example with sequential figures under one bass note ===
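A sketch of two successive figures under one written bass note, assuming a 4/4 measure whose last quarter note carries a figure on beat 4 and another on the second eighth of that beat. The measure number, pitches, and figures are illustrative.

<code xml>
<!-- One quarter note on beat 4, two figured-bass signs at tstamp 4 and 4.5 -->
<measure n="7">
  <staff n="2">
    <layer n="1">
      <note pname="c" oct="3" dur="2"/>
      <note pname="d" oct="3" dur="4"/>
      <note pname="g" oct="2" dur="4"/>
    </layer>
  </staff>
  <harm staff="2" tstamp="4">
    <fb>
      <f>6</f>
    </fb>
  </harm>
  <harm staff="2" tstamp="4.5">
    <fb>
      <f>5</f>
    </fb>
  </harm>
</measure>
</code>

No additional event is encoded in the layer; the fractional timestamp alone positions the second figure.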

Use this pattern when the source shows two separate figured-bass signs during one written bass note, for example a quarter note whose harmony moves like two eighths.

Do **not** create an extra event only to anchor the second figure. The timestamp is the anchor.

=== Ordering of figures ===

The order of ''f'' elements is significant. Figures inside the same ''<fb>'' are simultaneous and should be encoded in the order they appear, usually top to bottom on the page.

So this:

<code xml>
<fb>
  <f>6</f>
  <f>4</f>
</fb>
</code>

means a stacked figure, not a sequence over time. It is not the same as ''6'' followed later by ''4''.

For a time sequence, use separate ''<harm>'' elements with separate timestamps:
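A sketch of the time-sequence case, assuming the figure 6 falls on beat 1 and the figure 4 on beat 2. The ''staff'' and ''tstamp'' values are placeholders.

<code xml>
<!-- 6 followed later by 4: two harm elements, two timestamps -->
<harm staff="2" tstamp="1">
  <fb>
    <f>6</f>
  </fb>
</harm>
<harm staff="2" tstamp="2">
  <fb>
    <f>4</f>
  </fb>
</harm>
</code>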

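=== Accidentals in figured bass ===

Accidentals can be encoded directly in the figure content.

Example (the ''staff'' and ''tstamp'' values are placeholders):

<code xml>
<harm staff="2" tstamp="1">
  <fb>
    <f>7♭</f>
  </fb>
</harm>
</code>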
=== Recommended local policy ===

  * use ''harm'' + ''fb'' + ''f'' as the default structure
  * prefer ''staff'' + ''tstamp'' for all figured-bass entries
  * use fractional timestamps for mid-note figure changes
  * use multiple ''<f>'' elements only for simultaneous stacked figures
  * use multiple ''<harm>'' elements for successive figured-bass signs
  * avoid local ''vo'' fixes unless there is no cleaner encoding solution
  * avoid artificial events used only as figured-bass anchors
  * preserve figure order as written
  * keep editorial additions explicitly distinguishable from source readings

==== Ties ====

A tie connects two notes of the same pitch so that the first note sounds for the combined duration of both notes.

=== Basic principle ===

Use ties only when:

  * the connected notes have the same pitch
  * the notation indicates a tie rather than a slur
  * the sounding duration is continued across noteheads

=== How are ties encoded? ===

The simplest MEI method uses the ''tie'' attribute.

Allowed values:

  * ''i'' = initial
  * ''m'' = medial
  * ''t'' = terminal

Example:
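A minimal sketch of the ''tie'' attribute on three notes of the same pitch within one layer. The pitch and durations are illustrative.

<code xml>
<!-- Same pitch throughout: initial, medial, and terminal tie values -->
<layer n="1">
  <note pname="g" oct="4" dur="4" tie="i"/>
  <note pname="g" oct="4" dur="4" tie="m"/>
  <note pname="g" oct="4" dur="2" tie="t"/>
</layer>
</code>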

=== Ties across barlines ===

A tie may continue into the following measure.

Example:
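A sketch of a tie crossing a barline, assuming the tied note ends one measure and begins the next in the same staff and layer. Measure numbers and pitch are illustrative, and the remaining events of each measure are omitted.

<code xml>
<!-- Tie starts at the end of measure 8 and ends at the start of measure 9 -->
<measure n="8">
  <staff n="1">
    <layer n="1">
      <!-- earlier events of the measure omitted -->
      <note pname="a" oct="4" dur="2" tie="i"/>
    </layer>
  </staff>
</measure>
<measure n="9">
  <staff n="1">
    <layer n="1">
      <note pname="a" oct="4" dur="4" tie="t"/>
      <!-- later events of the measure omitted -->
    </layer>
  </staff>
</measure>
</code>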

=== Ties and layers ===

This point is crucial for corpus consistency:

  * a tie that starts in one layer must end in the same layer

So for local practice:

  * never begin a tie in layer 1 and end it in layer 2
  * if the voice continues, keep it in the same layer
  * check layer assignment before debugging a “broken” tie

=== Ties on chords ===

The ''tie'' attribute can also be used on ''chord''. When used on a chord, it acts as shorthand for multiple ties on all unchanged pitches in the chord.

Example:
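A sketch of the chord-level shorthand, assuming both chords contain exactly the same pitches so that every note is tied. The pitches and durations are illustrative.

<code xml>
<!-- All pitches are unchanged between the two chords, so tie sits on chord -->
<layer n="1">
  <chord dur="2" tie="i">
    <note pname="c" oct="4"/>
    <note pname="e" oct="4"/>
    <note pname="g" oct="4"/>
  </chord>
  <chord dur="2" tie="t">
    <note pname="c" oct="4"/>
    <note pname="e" oct="4"/>
    <note pname="g" oct="4"/>
  </chord>
</layer>
</code>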

=== Local recommendation ===

For our project:

  * use ''tie'' on ''note'' for simple cases
  * use ''tie'' on ''chord'' only when the shorthand is genuinely clear
  * if only some notes of a chord are tied, encode ties on the individual ''note'' elements instead
  * check pitch identity carefully before calling something a tie

=== Tie vs slur ===

Do not confuse:

  * tie = same pitch, sustained duration
  * slur = phrasing or articulation grouping, usually across different pitches

This distinction matters both musically and computationally.

=== Minimal checklist ===

Before encoding a tie, ask:

  * are the connected notes the same pitch?
  * is this really a tie, not a slur?
  * does the tied continuation stay in the same layer?
  * if the notes are in a chord, are all pitches tied or only some?

==== Barline ====

When barlines should run through some staves, then break, and then continue through a lower group, encode this with nested ''staffGrp'' elements.

Local recommendation:

  * use one outer ''staffGrp'' for the full system
  * create one child ''staffGrp bar.thru="true"'' for each continuous barline span
  * put the lower brace or bracket on the child group it actually belongs to
  * do not rely on two top-level sibling groups if the system belongs together visually

Example: barline through staves 1-3, break, then through staves 4-5:
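A sketch of the nested grouping, assuming five five-line staves and a brace on the lower pair. The ''symbol'' value and the ''lines'' values are illustrative; clefs and other ''staffDef'' attributes are omitted.

<code xml>
<!-- Outer group = whole system; each child group = one continuous barline span -->
<staffGrp>
  <staffGrp bar.thru="true">
    <staffDef n="1" lines="5"/>
    <staffDef n="2" lines="5"/>
    <staffDef n="3" lines="5"/>
  </staffGrp>
  <staffGrp bar.thru="true" symbol="brace">
    <staffDef n="4" lines="5"/>
    <staffDef n="5" lines="5"/>
  </staffGrp>
</staffGrp>
</code>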

Minimal checklist:

  * where should the barline continue without interruption?
  * where should it stop?
  * does each continuous span have its own child ''staffGrp''?
  * is the brace or bracket attached to the correct subgroup?

==== Staff and page breaks ====

For the current editorial task, also check the position of ''<sb>'' and ''<pb>'' carefully.

This is important because misplaced system or page breaks can change the number of bars per system or page and create mismatches with the facsimile layout.

Local recommendation:

  * always compare encoded ''<sb>'' and ''<pb>'' positions with the facsimile
  * check whether the number of measures per system matches the source
  * check whether the number of measures per page matches the source
  * if the bar count looks wrong, inspect break positions before changing musical content

Minimal checklist:

  * does each encoded system begin where the facsimile begins?
  * does each encoded page begin where the facsimile page begins?
  * do the measures per system and per page still align with the source image?

==== Facsimile zones ====

When a measure still has no facsimile mapping, add a new ''<zone>'' in the relevant ''<surface>'' and then link that zone to the measure with the ''facs'' attribute.

Local recommendation:

  * create the new ''<zone>'' inside the correct ''<surface>''
  * give it a unique ''xml:id''
  * set the bounding box with ''ulx'', ''uly'', ''lrx'', and ''lry''
  * use ''type="measure"'' for measure zones
  * add or update ''facs="#zone_id"'' on the matching ''<measure>''

Example:
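A sketch of the facsimile side, assuming a page surface with the hypothetical id ''surface_p12'' and a new measure zone with the hypothetical id ''zone_m45''. The coordinates are invented for illustration.

<code xml>
<!-- New measure zone inside the surface of the page it appears on -->
<facsimile>
  <surface xml:id="surface_p12">
    <zone xml:id="zone_m45" type="measure" ulx="120" uly="340" lrx="410" lry="620"/>
  </surface>
</facsimile>
</code>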

...
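A sketch of the measure side of the link, assuming a zone with the hypothetical id ''zone_m45'' already exists in the matching surface. The measure number is illustrative.

<code xml>
<!-- facs points to the zone's xml:id, prefixed with # -->
<measure n="45" facs="#zone_m45">
  <!-- staff content unchanged -->
</measure>
</code>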

Minimal checklist:

  * is the new zone on the correct page surface?
  * does the bounding box match the visible measure in the facsimile?
  * does the ''facs'' value point to the correct zone id?
  * does each measure point to exactly the intended measure zone?

==== Duration encoding: dur vs. dur.ppq ====

Please encode written musical duration with ''dur'' and, where needed, ''dots''.

''dur'' records the notated value, for example ''dur="4"'' for a quarter note or ''dur="8"'' for an eighth note.

''dur.ppq'' records a calculated playback/timing value in pulses per quarter note. It is not the editorial duration and does not represent the visual notation directly.

Local directive:

  * use ''dur'' and ''dots'' as the authoritative duration encoding
  * do not add ''dur.ppq'' to notes, rests, chords, or spaces
  * do not add ''ppq'' to ''staffDef'' or ''scoreDef''
  * remove existing ''dur.ppq'' and ''ppq'' values from imported files during cleanup

Reason:

  * the edition is based on the facsimile and should preserve written notation, not generated timing data
  * ''dur.ppq'' is usually import or playback residue and can disagree with the written ''dur''/''dots''
  * rhythmic proofreading should check whether the written durations fill the measure

Minimal checklist:

  * does every timed event have a clear written ''dur'' where required?
  * are dotted values encoded with ''dots''?
  * have ''dur.ppq'' and ''ppq'' been removed?

==== Cross-staff notation ====

Encode a cross-staff chord as one logical ''<chord>'' in a single ''<layer>'', and place the low (or high) note on the other staff with a note-level ''staff'' attribute, for example ''staff="5"'' (use the number of the corresponding staff). The lower (or higher) staff's time is then filled with ''<space>'' so that the rhythmic alignment remains complete.

The MEI guidelines explicitly give this as an alternative for cross-staff chords. Some software handles this method better than separate cross-staff notes connected only by ''stem.with''.

Example:

...
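The single-chord method can be sketched as follows, assuming a chord that logically belongs to staff 4 with its lower note displayed on staff 5. Staff numbers, pitches, and the duration are illustrative, and the remaining events of the measure are omitted.

<code xml>
<!-- One logical chord in staff 4; the lower note is placed on staff 5 -->
<measure n="1">
  <staff n="4">
    <layer n="1">
      <chord dur="4">
        <note pname="c" oct="4"/>
        <note pname="e" oct="3" staff="5"/>
      </chord>
      <!-- further events omitted -->
    </layer>
  </staff>
  <staff n="5">
    <layer n="1">
      <!-- space fills the time taken by the cross-staff chord -->
      <space dur="4"/>
      <!-- further events omitted -->
    </layer>
  </staff>
</measure>
</code>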

For mei-friend / Verovio-style workflows, the single-chord encoding is often the more robust semantic encoding because it represents one chord rather than two separate events that merely share a stem.

Local recommendation:

  * use a single ''<chord>'' with note-level ''staff'' when the source clearly shows one chord split across staves
  * fill the destination staff's elapsed time with ''<space>'' when that staff has no independent sounding event at that moment
  * keep the written duration on the ''<chord>'' with ''dur''