K-ToBI (KOREAN ToBI) Labeling Conventions

(version 2.1, revised November 1996)

Mary E. Beckman (OSU) and Sun-Ah Jun (UCLA)

[Please note that figures and sound files are being prepared to be uploaded]
    1. Synopsis
    2. Word Tier
    3. Break-Index Tier
    4. Tone Tier
    5. Miscellaneous tier
    6. Online data files and future versions
    7. References
    8. Appendix
1. Synopsis

K-ToBI is a prosodic transcription convention for standard (Seoul) Korean.  It is based on the design principles of the original English ToBI (see Silverman et al., 1992; Beckman & Hirschberg, 1994; Pitrelli et al., 1994), and more directly on J_ToBI, the Japanese ToBI system devised by Jennifer Venditti (see Venditti, 1995; Campbell & Venditti, 1995).  Like the other ToBI systems, therefore, K-ToBI assumes an intonational phonology with a close relationship to a hierarchical model of prosodic constituents.  The intonational analysis and attendant prosodic model of Seoul Korean were developed at the Ohio State University by Sook-hyang Lee, Sun-Ah Jun, and Kenneth de Jong (see Lee, 1989; Jun, 1990, 1993, 1995; de Jong, 1994).  A first version of K-ToBI was developed at ATR Interpreting Telecommunication Systems in late 1994.  The present version is an updated one, modified in accordance with the discussion of the Japanese/Korean working group at the Prosody Transcription Workshop held just before ICPhS in Stockholm, August 1995.

A K-ToBI transcription for an utterance consists minimally of a recording of the speech, an associated record of the fundamental frequency contour, and (the transcription proper) symbolic labels for events on the following four parallel tiers:

1. a word tier
2. a tone tier
3. a break-index tier
4. a miscellaneous tier
The original English ToBI allows the free proliferation of site-specific extra tiers, and so do J_ToBI and K-ToBI.  Sites with an aligner for English, for example, have generally added a phones tier for phonetic segmentation, and J_ToBI users have agreed to add an obligatory "finality" tier where intonational phrases that sound "final" to a turn are minimally marked as such (until we can develop a more complete model of discourse finality to govern a hierarchy of labels for this tier).  In accordance with this general design principle, K-ToBI users are encouraged to add their own customized tiers to label events of site-specific interest, and to keep records of why each particular tier was added and how it is used.  By comparing extra tiers across labeling sites we, too, may find that we all agree on the desirability of some generally used tiers specific to Korean labeling which can be made obligatory for K-ToBI.
2. The word tier

The word tier in K-ToBI (and J_ToBI) corresponds to the "orthographic tier" in English ToBI.  In this tier, words may be labeled using either Hangul orthography or some conventional romanization, depending on what is more convenient for the user's labeling platform or on what is most appropriate for exporting to relevant applications.  Since what constitutes a "word" in Korean is a matter of some debate, we cannot be very specific here about how frequently to place word labels.  For example, the intended applications at one site might require that a word label be placed for each morpheme string that has its own separate entry in some on-line dictionary.  At the other end of the scale, another site may need only that there be as many labels as there are spaces in a standard Hangul transcript of the text.  We anticipate that different sites may find that the intended applications pose specific needs as to how finely an utterance should be broken up into words, and that eventually a consensus will emerge from these needs.  In this version, we treat a ‘word’ as a sequence of segments delimited by spaces.

If the labeling platform is xwaves and xlabel (or any similar labeling platform that works in terms of time flags), the word label should be placed at the end of the final segment in the word, as determined by the labeler from the waveform or spectrogram record.  That is, each word should be marked at its right edge.  Filled pauses and the like should also be marked, using some site-specific convention for the Hangul or romanized spelling.  A romanization convention used at the UCLA and ATR sites is given in Appendix A.
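As an illustration only, the space-delimited definition of a word and the right-edge placement rule might be sketched as follows in Python.  The function name `word_tier`, the (time, label) tuple layout, and the times in the usage line are assumptions made for this sketch; they are not part of xwaves/xlabel or of K-ToBI itself.

```python
def word_tier(transcript, end_times):
    """Pair each space-delimited word with the time of its right edge.

    transcript: romanized (or Hangul) text, words separated by spaces.
    end_times:  one time in seconds per word, marking the end of the
                word's final segment as judged by the labeler.
    """
    words = transcript.split()
    if len(words) != len(end_times):
        raise ValueError("need exactly one right-edge time per word")
    if any(later <= earlier for earlier, later in zip(end_times, end_times[1:])):
        raise ValueError("right-edge times must increase")
    return list(zip(end_times, words))


# Example sentence 1; the times are invented for illustration.
tier = word_tier("azumEninga ENze maNdIrEjo?", [0.62, 0.95, 1.48])
```

Filled pauses would simply appear as further space-delimited items in the transcript, spelled according to the site's own convention.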

3. The break index tier

Break indices represent a rating for the degree of juncture perceived between each pair of words and between the final word and the silence at the end of the utterance.  They are to be marked after all words that have been transcribed in the word tier.  All junctures -- including those after fragments and filled pauses -- must be assigned an explicit break index value; there is no default juncture type.
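Because there is no default juncture type, a site may find it useful to check mechanically that no juncture has been left unlabeled.  The following Python sketch shows one way this could be done; the function name `check_break_indices` and the use of `None` or an empty string for an unmarked juncture are assumptions of the sketch, not K-ToBI conventions.

```python
def check_break_indices(words, break_indices):
    """Return the words whose following juncture has no break index.

    break_indices holds one entry per word; None or "" means the
    labeler left the juncture unmarked, which K-ToBI does not allow.
    """
    if len(words) != len(break_indices):
        raise ValueError("break-index tier and word tier differ in length")
    return [w for w, b in zip(words, break_indices) if b in (None, "")]


words = ["azumEninga", "ENze", "maNdIrEjo?"]
missing = check_break_indices(words, ["2", None, "3"])
```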

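Values for the break index are chosen from the set 0, 1, 2, and 3, where a higher value indicates a stronger perceived disjuncture; 2 and 3 are the values that typically go with accentual phrase and intonational phrase edges, respectively.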

Example sentences: (sound files and f0 tracks will be available soon)
1. <<l8c3>>
                   azumEninga  ENze  maNdIrEjo?
                            2     1           3
                   ‘madam         when make’
                -> ‘When is Madam making (it)?’

2. <<t1p2s10>>
                    igEsIN  uridIR  maIMU  segjeedo  hAdaQdweNda
                         2       2      2         2            3
                    ‘This      our     mind      world too   apply to’
                 -> ‘This applies to inside our mind’

3. <<t1p2s6>>
                    zIG, saNhonIN  saraiSImjE  aMsEgIN  zugEiNnIn  gEsida
                       3        2           3        2          1       3
                   ‘That is, coral-TOP  alive   and   rock-TOP  dead-progressive rel.marker to be’
                   -> ‘That is, coral is alive and rock is dead’

(For an example of a ‘0’ break, see example sentence 6 below.)

Note that while the accentual phrase and intonational phrase are defined in the prosodic model by tonal markings, the break index value indicates the labeler’s subjective sense of disjuncture and not simply the juncture that typifies the apparent tones.  Thus, the break index tier markings are not completely redundant with the tone tier markings for break index levels 2 and 3.  In cases of mismatch, the number should follow the perceived juncture rather than the tones, although it should be flagged with the diacritic "m", as in:

2m a medium strength disjuncture that typically would be marked by the  tonal pattern of the accentual phrase, but without any tonal  markings, or with the tonal markings of an intonational phrase edge.
3m a stronger disjuncture that typically would be marked by the tonal  pattern of an intonation phrase, but with the tones of an accentual  phrase rather than a boundary tone.

Note that low-ending phrases marked with the 2m label in theory allow two corresponding analyses on the tone tier:  (1) no tonal markings for the end of the accentual phrase, or (2) a L% boundary tone.  Since the latter is a more complex analysis, we prescribe the first.
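The mismatch rule can be restated as a small decision procedure.  The Python sketch below is only one reading of the definitions of "2m" and "3m" above; the function name `break_index_label` and the convention that `None` stands for "no tonal marking at this edge" are assumptions made for the illustration.

```python
def break_index_label(perceived, tone_at_edge):
    """Combine a perceived break level with the tone found at that edge.

    perceived:    2 (accentual-phrase-like) or 3 (intonation-phrase-like)
                  degree of disjuncture, as judged by the labeler.
    tone_at_edge: "LHa", a boundary tone ending in "%", or None when the
                  tone tier has no marking at this word edge.
    """
    if perceived not in (2, 3):
        raise ValueError("the 'm' diacritic is described only for 2 and 3")
    if perceived == 2:
        # The expected marking is LHa; no tone at all, or a boundary
        # tone, is a mismatch.
        return "2" if tone_at_edge == "LHa" else "2m"
    if tone_at_edge is None:
        # The text describes 3m only for edges carrying accentual phrase
        # tones, so this case is left undecided here.
        raise ValueError("a level-3 break with no tones is not covered")
    # The expected marking is a boundary tone; anything else is a mismatch.
    return "3" if tone_at_edge.endswith("%") else "3m"
```

For a low-ending phrase heard as a level-2 break, the prescription above corresponds to calling `break_index_label(2, None)`, which yields "2m" with nothing written on the tone tier.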

In an xwaves/xlabel type system, the break index label should be associated with a point in time at the end of each word, as indicated in the word tier.  It should be located exactly at, or slightly to the right of, this word marker, so that break indices can be unambiguously associated with other tiers.  Transcriber uncertainty about break-index strength is to be indicated with a minus ("-") diacritic affixed directly to the right of the break index -- e.g.  "1-" to indicate uncertainty between "0" and "1"; "2-" to indicate uncertainty between "2" and "1"; and so on.

Note that since the "m" diacritic suggests certainty about the break index analysis in the face of conflicting tonal evidence, the "-" diacritic should not be used together with "m".  That is, for example, in the case of a break with the sense of disjuncture usually associated with an accentual phrase but no corresponding rise indicative of LHa, either the labeler is unsure that there is such a strong sense of disjuncture (i.e. "2-") or the labeler is sure of the disjuncture and this is a mismatch (i.e. "2m").
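The diacritic rules lend themselves to a simple well-formedness check.  The Python sketch below assumes that break-index values run from 0 to 3 (the values used in the example sentences), that "m" applies only to 2 and 3, and that "0-" is meaningless because there is no lower value; these restrictions, like the name `parse_break_label`, are assumptions of the sketch rather than stated conventions.

```python
import re

# Assumed label shape: a digit 0-3, optionally followed by exactly one
# diacritic, either "m" (mismatch) or "-" (uncertainty).
_BREAK_LABEL = re.compile(r"([0-3])(m|-)?")


def parse_break_label(label):
    """Split a break-index label into (value, mismatch, uncertain)."""
    match = _BREAK_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid break-index label: {label!r}")
    value, diacritic = int(match.group(1)), match.group(2)
    if diacritic == "m" and value not in (2, 3):
        raise ValueError("'m' is described only for break indices 2 and 3")
    if diacritic == "-" and value == 0:
        raise ValueError("'0-' has no lower value to be uncertain about")
    return value, diacritic == "m", diacritic == "-"
```

Since the pattern allows at most one diacritic, a label such as "2m-" is rejected, in line with the rule that "-" and "m" are not combined.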

Example sentences (sound files and f0 tracks will be available soon)

4. <<t1p1s2>>
    doQgi  bujEU  du  hjEQtA  zuQesE  iRbaNzEgiN  kEsIn  waNzEnhwa,
        2      2   2-      1       2m          1      3           3
    ‘motivation giving-poss two method among general-rel thing-top completeness’
    -> ‘Among two kinds of providing motivation, the general thing is completeness’

5. <<t1p2s5>>
    gIrEna,  gatIN  hjENmigjEQe  sanho  zogagIR  noko  bomjEN
          3-     2            3      1        2-    1       3
    ‘but, same microscope-loc. coral piece-acc. to put and see if’
     -> ‘But, if you see a piece of coral under a microscope,...’

6. <<t1p2s5>>
    sanhoga  sEQzaQhamjENsE  bjENhwahago  iDTanIN  gEsIR  aR  SuiDTa.
          2m              2            2-       1      2   0        3
    ‘coral-nom. growing change-prog.-rel. thing-acc.  to see’
    -> ‘We can see that coral is growing and changing’

4. The tone tier

The tone tier assigns tone labels for the accentual phrases and boundary tones.  Although the current K-ToBI works in terms of the Seoul Korean tone system, it could readily be adapted to other dialects by expanding the inventory here.  There are two classes of tones: those associated with the accentual phrase, and those associated with the right edge of the intonation phrase.  The accentual phrase tones are:

LHa the accentual phrase edge tone.  It marks the right edge of an intonation-phrase-medial accentual phrase.
H- the phrasal H-.  It marks the early peak seen around the first or second syllable of some accentual phrases.  Such a peak occurs very often in the last accentual phrase of an intonational phrase that ends in L%.  (In fact here it is the typical case unless the phrase is so very short that there cannot be any rise at all).  The peak also can readily occur in phrases with final rises (i.e. medial accentual phrases, which end in LHa, or final accentual phrases before H%, LH%, etc.) in cases where the phrase is long enough to realize another smaller peak at the beginning.  The timing of the peak depends on the initial segment in the phrase.  When the phrase begins with an aspirated or tense obstruent, the peak will be on the first syllable, but if the initial segment is anything other than an aspirated or tense obstruent, the peak for the H- will occur later, and there will be a noticeable rise to it from a L tone on the first syllable.  The H- early phrasal tone should be placed at the corresponding peak observed in the F0 contour.

The LHa tone should be placed at or just before the corresponding break index marker regardless of the actual location of the peak.  If the peak is not at the phrase boundary as observed in the waveform, then the corresponding F0 peak should be marked by one of the following: ">" if the peak occurs before the LHa label, or "<" if it occurs after it.

Note that while here we include both logical possibilities, in reality, we have yet to observe an instance of "<".  That is, typically the peak is slightly before the accentual phrase boundary.
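The placement rule for LHa and its peak marker might be sketched in Python as follows.  The sketch assumes, as this note and example sentence 7 suggest, that ">" marks a peak lying before the tone label and "<" a peak lying after it.  The function name `lha_labels`, the sample layout, and the `tolerance` threshold are hypothetical, and the F0 samples passed in are assumed to cover only the stretch near the phrase edge, so that an early H- peak is not mistaken for the LHa peak.

```python
def lha_labels(f0_near_edge, boundary_time, tolerance=0.01):
    """Return tone-tier labels for a phrase-medial accentual phrase edge.

    f0_near_edge:  list of (time, f0_hz) samples around the phrase edge.
    boundary_time: time of the break-index marker for this edge.
    tolerance:     hypothetical threshold in seconds below which the
                   peak counts as being at the boundary.
    """
    peak_time, _ = max(f0_near_edge, key=lambda sample: sample[1])
    labels = [(boundary_time, "LHa")]  # LHa always goes at the boundary
    if boundary_time - peak_time > tolerance:
        labels.insert(0, (peak_time, ">"))  # peak precedes the label
    elif peak_time - boundary_time > tolerance:
        labels.append((peak_time, "<"))  # peak follows the label
    return labels


# Invented F0 values: the peak falls 0.1 s before the boundary.
labels = lha_labels([(0.80, 180.0), (0.90, 210.0), (1.00, 195.0)], 1.00)
```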

Example sentences (sound files and f0 tracks will be available soon):

7. <<t1p2s8>>
    sEQzaQhago  iNnIN  gEsi  saraiNnIN  gEsida
       H-                      > LHa      H-                  L%

    ‘to grow-prog. is              to live-prog.’
    -> ‘Growing means that it is alive’

8. <<t1p2s6>>
    zIG,  saNhonIN     saraiSImjE aMsEgIN zugEiNnIn gEsida
    H- L%    H-    LHa          L%          LHa      H-                L%

    ‘i.e., coral-TOP  alive   and   rock-TOP  dead-progressive rel.marker to be’
    -> ‘That is, coral is alive and rock is dead’

The boundary tones for intonational phrases include L%, H%, LH%, HL%, LHL%, and HLH%.
All intonation-phrase final boundary tones are placed at or just before the break index mark, regardless of the F0 contour.  When a word is final to an accentual phrase and final to an intonational phrase, only the intonational phrase boundary tone is written at or just before the end of the word.  The actual peak in the F0 contour corresponding to H%, LH%, HL%, LHL%, etc., should be marked by ">" or "<" (unless the peak for an H% or LH% is actually at the phrase edge).
For any of the complex boundary tones with more than one H tone, the peak that is marked in this way should be the highest one.  In this way, K-ToBI (like J_ToBI) provides for pitch range estimators without a separate HiF0 label.  Again, note that while we include both logical possibilities for boundary tones ending in a H, in reality, we have yet to see a case of a rising boundary tone with "<", and of course "<" is logically impossible for rising-falling tones HL%, LHL%, etc.
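The rule that a word final to both an accentual phrase and an intonational phrase carries only the boundary tone can be stated compactly.  The Python sketch below is an illustration under the assumption that phrase membership is already known for each word; the function name `edge_tone` and its parameters are hypothetical.

```python
def edge_tone(ap_final, ip_final, boundary_tone=None):
    """Pick the tone label to write at the end of a word.

    ap_final:      the word ends an accentual phrase.
    ip_final:      the word ends an intonational phrase.
    boundary_tone: the boundary tone heard there (e.g. "L%", "LH%");
                   required when ip_final is true.
    """
    if ip_final:
        if boundary_tone is None:
            raise ValueError("an intonational phrase edge needs a boundary tone")
        # The boundary tone is written alone; LHa is not added beside it.
        return boundary_tone
    if ap_final:
        return "LHa"
    return None  # phrase-internal word: no edge tone is written
```

For instance, `edge_tone(True, True, "L%")` gives "L%" alone, while `edge_tone(True, False)` gives "LHa".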

Example sentences (sound files and f0 tracks will be available soon):

9. <<4boundary>>  gIrASEjo
                                  H-    H%
                            ‘Is that so?’

10. <<4boundary>> gIrASEjo
                                H-       LH%
                           ‘Is that so?’

11. <<4boundary>> gIrASEjo
                           ‘Is that so?’

12. <<4boundary>> gIrASEjo
                                 H-     LHL%
                            ‘Is that so?’

13. <<J3A2>> onIR zEnyEge nuga mEgEyo
                                      LHa     H-          HLH%

                           ‘Today night     who    eat?’
                           -> ‘Who is eating tonight?’

5. The miscellaneous tier

The miscellaneous tier will be used for any comments or markings (e.g., silence, audible breaths, laughter, disfluencies, and so on) desired by particular transcription groups.  The only conventions K-ToBI specifies for this tier are that events that cover some clearly specifiable interval (such as breaths or laughter) be labeled at both their temporal beginnings and ends, using label pairs of the sort:
  • laughter <   beginning of an interval of laughter
  • laughter >   end of an interval of laughter
so that the interval is delimited by the < ... > pair.
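A site that exports the miscellaneous tier may wish to turn such label pairs back into intervals.  The following Python sketch shows one possible way, assuming that each label is a (time, text) tuple and that the marker is separated from the event name by a space, as in "laughter <"; the function name `misc_intervals` and this layout are assumptions of the sketch.

```python
def misc_intervals(labels):
    """Pair '<' and '>' labels on the miscellaneous tier into intervals.

    labels: time-ordered (time, text) tuples such as (1.2, "laughter <").
    Returns (event, start, end) tuples; raises if a pair is incomplete.
    """
    open_events = {}  # event name -> start time
    intervals = []
    for time, text in labels:
        event, _, marker = text.rpartition(" ")
        if marker == "<":
            if event in open_events:
                raise ValueError(f"{event!r} opened twice")
            open_events[event] = time
        elif marker == ">":
            if event not in open_events:
                raise ValueError(f"{event!r} closed before it was opened")
            intervals.append((event, open_events.pop(event), time))
        # Any other text is treated as a point comment and ignored here.
    if open_events:
        raise ValueError(f"unclosed intervals: {sorted(open_events)}")
    return intervals


# Invented times; intervals of different events may overlap.
found = misc_intervals([
    (1.2, "laughter <"),
    (1.5, "breath <"),
    (1.9, "laughter >"),
    (2.0, "breath >"),
])
```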

6. Online data files and future versions

All examples (sound file, f0 track, and labels) shown in this manual can be accessed by ftp. (To get the user id and password, send e-mail to jun@humnet.ucla.edu.) All examples are under a subdirectory called K-ToBI.  This directory includes more examples, some labeled and some not, for labelers to practice transcribing with the K-ToBI system.  As more speech data become available, these labeling guidelines may be refined further.

7. References

    Beckman, Mary; & Hirschberg, Julia (1994) The ToBI Annotation Conventions. Manuscript, Ohio State University. [For information on obtaining by ftp, send e-mail to tobi@ling.ohio-state.edu.]

    De Jong, Kenneth (1994) "Initial tones and prominence in Seoul Korean," Ohio State University Working Papers in Linguistics, No. 43, pp. 1-14.

    Jun, Sun-Ah (1990) "The prosodic structure of Korean -- in terms of voicing," In E-J. Baek, ed., Proceedings of the 7th International Conference on Korean Linguistics, pp. 87-104.  University of Toronto Press.

    Jun, Sun-Ah (1993) The Phonetics and Phonology of Korean Prosody. Doctoral Dissertation, Linguistics, Ohio State University.  [For information on ordering, send e-mail to osdl@ling.ohio-state.edu.]

    Jun, Sun-Ah (1995) "Asymmetrical prosodic effects on the laryngeal gesture in Korean," In Bruce Connell and Amalia Arvaniti, eds., Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV, pp. 235-253.  Cambridge University Press.

    Lee, Sook-hyang (1989) "Intonational domains of the Seoul dialect of Korean," Journal of the Acoustical Society of America, vol. 85, suppl. 1, p. S99.

    Pitrelli, John; Beckman, Mary; & Hirschberg, Julia (1994) "Evaluation of prosodic transcription labeling reliability in the ToBI framework," Proceedings of the 1994 International Conference on Spoken Language Processing, vol. 1, pp. 123-126.

    Silverman, Kim; Beckman, Mary; Pitrelli, John; Ostendorf, Mari; Wightman, Colin; Price, Patti; Pierrehumbert, Janet; & Hirschberg, Julia (1992) "ToBI: a standard for labeling English prosody," Proceedings of the 1992 International Conference on Spoken Language Processing, vol. 2, pp. 867-870.

    Venditti, Jennifer (1995) Japanese ToBI Labeling Guidelines. Manuscript with examples, Ohio State University.  [For information on obtaining by ftp, send e-mail to venditti@ling.ohio-state.edu.]

    Campbell, Nick; & Venditti, Jennifer (1995) "J-ToBI: an intonational labeling system for Japanese," Paper presented at the Autumn, 1995, Meeting of the Acoustical Society of Japan.

8. Appendix [in preparation]
