Morphological glossing

ETCSRI's Glossing

In ETCSRI every word form is represented on three levels:

  1. in orthographic representation, or transliteration;
  2. in morphemic representation (M1); and
  3. in morphemic glossing representation (M2).

All three representations of a given word form are stored in the project's glossary under the heading of the lemma to which the word form is assigned. Homographic word forms (i.e. word forms that have the same orthographic representation [= OR] but differ in their morphemic representations [= M1]) and homomorphemic word forms (i.e. word forms whose morphemic representations are identical but which receive different morphemic glossing [= M2]) are, however, distinguished by disambiguators [http://oracc.museum.upenn.edu/doc/builder/linganno/SUX/#Disambiguation] in the lemmatized transliteration.

Morphemic representation (M1)

The morphemic representation of each word form is presented in the #-part (= MORPH [http://oracc.museum.upenn.edu/doc/builder/linganno/#Replacing_X's]) of the @form lines in the glossary. ETCSRI uses special conventions (detailed below) to encode an analysis of Sumerian that follows a modified version of the analysis outlined in Zólyomi 2007b. This approach describes Sumerian using the model of so-called template morphology (see, e.g., Stump 1998), which arranges all morphemes into structural slots. The grammatical morphemes are listed with their morphemic glossing correspondences on the page "Morphological parsing" of this portal. Ex. (1) below shows the M1 of the word form transliterated as lugal-a-ni:

(1) @form lugal-a-ni /lugal #N1=lugal.N3=ani.N5=ra

Each morpheme is represented by a complex symbol that consists of a position tag (= p-tag), an equals sign, and the morpheme itself. The symbols are separated by periods. A position tag gives the morpheme's position within the nominal, verbal, or non-finite verbal template and has a form like N1, which means "the first slot of the nominal template" (for the nominal, verbal, and non-finite verbal templates, cf. the document entitled "ETCSRI's Morphological Parsing"). In M1, stems are represented by the Citation Form [http://oracc.museum.upenn.edu/doc/builder/linganno/#Replacing_X's] (= CF) used by ePSD [http://psd.museum.upenn.edu/epsd/index.html]. If the stem is fully or partially reduplicated, it is represented by its reconstructed form in M1 (for example, the PF form of ŋar will be ŋaŋa, the plural form of dim₂ will be dimdim). In the case of verbs using suppletive stems, the stem is represented in M1 by the Citation Form of the suppletive stem. In ex. (2) below, for example, the stem is represented by "lah", although this verbal form is listed in the glossary file (sux.glo) under the lemma de[bring], and the verbal form is lemmatized as de[bring] in the text file.

(2) @form mu-na-lah₅ /lah₅ #V4=mu.V6=nn.V7=a.V11=n.V12=lah.V14=ø
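The symbol structure described above (p-tag, equals sign, morpheme, with periods between symbols) can be split mechanically. The following is a minimal sketch in Python, not part of the ETCSRI or Oracc tooling; the function name parse_m1 is hypothetical, and the sketch assumes that neither p-tags nor morphemes contain a period.

```python
from typing import List, Tuple


def parse_m1(m1: str) -> List[Tuple[str, str]]:
    """Split an M1 string into (p-tag, morpheme) pairs.

    Assumption (not stated by the project): symbols never contain a
    period, so splitting on '.' is safe.
    """
    m1 = m1.lstrip("#")  # tolerate the leading '#' of the @form line
    pairs = []
    for symbol in m1.split("."):
        ptag, sep, morpheme = symbol.partition("=")
        if not sep:
            raise ValueError(f"symbol without '=': {symbol!r}")
        pairs.append((ptag, morpheme))
    return pairs


# M1 of ex. (1) and ex. (2)
print(parse_m1("#N1=lugal.N3=ani.N5=ra"))
print(parse_m1("#V4=mu.V6=nn.V7=a.V11=n.V12=lah.V14=ø"))
```

For ex. (1) this yields the pairs N1/lugal, N3/ani and N5/ra, i.e. one pair per filled slot of the nominal template.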

The Sumerian NP contains no suffixes, only enclitics accumulated at its right end. Consequently, word forms listed in the glossary may contain no grammatical morphemes at all, and grammatical morphemes relating to the head may occur in the glossary entry of the modifier or the possessor in NPs that also include a modifier and/or a non-pronominal possessor, as shown in ex. (3) below:

(3) dumu ki aĝ₂ {d}nin-ĝir₂-su-ka-ke₄

The NP in ex. (3) above contains four lexemes (the last one, the divine name, is itself complex, but its inner structure is disregarded here). Parts of this NP will be listed under five different glossary entries:

@form dumu #N1=dumu

@form ki\abs #N1=ki.N5=ø

@form ki\abs_aĝ₂

@form aĝ₂ #NV2=aĝ

@form {d}nin-ĝir₂-su-ka-ke₄ #N1=ninĝirsuk.N5=ak.N5=e

The head of the NP is "dumu", but the ergative case-marker in N5 indicating the syntactic function of the NP will occur in the glossary entry of "ninĝirsuk", the possessor of "dumu". An M1 like "#N1=dumu" indicates that "dumu"
(a) is the head of a NP that contains either a modifier and/or a non-pronominal possessor, or
(b) occurs in a sequence of NPs among which only the last one is case-marked.
The form "#N1=dumu" stands in contrast with a form like, e.g., "#N1=dumu.N5=ra" which is the head of a NP that consists only of a head.

The M1 "N1=ninĝirsuk.N5=ak.N5=e" indicates, by having two morphemes with an N5 position tag, that it functions as the possessor of a NP. In contrast, an M1 like "N1=ninĝirsuk.N5=ak" may indicate that it is either
(a) a left-dislocated (i.e., anticipated) possessor, or
(b) the possessor of a NP that occurs in a sequence of NPs among which only the last one is case-marked.
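The reading rules above depend only on how many N5 symbols an M1 contains and on whether the first of them is the genitive /ak/. The sketch below restates those rules as a Python function; the function name and the wording of the returned labels are hypothetical, and the sketch covers only nominal M1s of the kinds discussed in this section.

```python
def describe_nominal_m1(m1: str) -> str:
    """Restate the reading rules of this section for a nominal M1.

    Illustrative only: the labels are paraphrases of the prose above,
    not categories used by ETCSRI.
    """
    symbols = [s.partition("=")[::2] for s in m1.lstrip("#").split(".")]
    cases = [morpheme for ptag, morpheme in symbols if ptag == "N5"]
    if not cases:
        return "no case-marker: head of a complex NP, or non-final NP of a sequence"
    if cases == ["ak"]:
        return "genitive only: left-dislocated possessor, or possessor in a non-final NP of a sequence"
    if len(cases) == 2 and cases[0] == "ak":
        return "genitive plus case-marker: possessor carrying the case of the whole NP"
    return "case-marked word form"


for m1 in ("#N1=dumu", "#N1=dumu.N5=ra",
           "N1=ninĝirsuk.N5=ak.N5=e", "N1=ninĝirsuk.N5=ak"):
    print(m1, "->", describe_nominal_m1(m1))
```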

Unidentifiable morphemes are marked with an "x" in M1. In the verbal form in ex. (4) below, the morpheme in V11 cannot be specified because the context is obscure (Statue B [E3/1.1.7], ix 25).

(4) @form he₂-mi-ĝal₂ /ĝal₂ #V1=ha.V2=i.V4=m.V5=b.V10=i.V11=x.V12=ŋal.V14=ø ##V1=MOD1.V2=FIN.V4=VEN.V5=3-NH.V10=L2.V11=X.V12=STEM.V14=X

Morphemic glossing (M2)

The morphemic glossing (henceforth, M2) of each word form is presented in the ##-part of the @form lines in the glossary. The morphemic glossing used by ETCSRI attempts to follow the guidelines of Lehmann 2004. The grammatical morphemes are listed with their morphemic glossing correspondences on the page "Morphological parsing" of this portal. Ex. (5) below shows the morphemic representation of the word form transliterated as lugal-a-ni with its morphemic glossing:

(5) @form lugal-a-ni /lugal #N1=lugal.N3=ani.N5=ra ##N1=STEM.N3=3-SG-H-POSS.N5=DAT-H
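Since the #-part and the ##-part of an @form line use the same p-tags, the two levels can be aligned slot by slot. The following Python sketch does this for a single @form line. It is an illustration rather than project code: the function names are hypothetical, and it assumes that M1 and M2 always list the same slots in the same order, as they do in the examples on this page.

```python
from typing import List, Tuple


def split_symbols(part: str) -> List[Tuple[str, str]]:
    """Turn '#N1=lugal.N5=ra' or '##N1=STEM.N5=DAT-H' into (p-tag, value) pairs."""
    return [s.partition("=")[::2] for s in part.lstrip("#").split(".")]


def align_form_line(line: str) -> List[Tuple[str, str, str]]:
    """Return (p-tag, morpheme, gloss) triples for one @form line."""
    fields = line.split()
    m1 = next(f for f in fields if f.startswith("#") and not f.startswith("##"))
    m2 = next(f for f in fields if f.startswith("##"))
    s1, s2 = split_symbols(m1), split_symbols(m2)
    # Assumption: both levels name the same slots in the same order.
    if [p for p, _ in s1] != [p for p, _ in s2]:
        raise ValueError("M1 and M2 do not list the same slots in the same order")
    return [(p, morpheme, gloss) for (p, morpheme), (_, gloss) in zip(s1, s2)]


line = ("@form lugal-a-ni /lugal #N1=lugal.N3=ani.N5=ra "
        "##N1=STEM.N3=3-SG-H-POSS.N5=DAT-H")
for ptag, morpheme, gloss in align_form_line(line):
    print(ptag, morpheme, gloss)
```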

In M2, in the glossary file (sux.glo), stems are represented by the gloss describing the kind of stem, e.g. "STEM", or "STEM-RDP" etc., while in the proper name file (qpn.glo) stems are referred to as "NAME" as in ex. (6) below.

Unidentifiable morphemes are marked with an "X" in morphemic glossing. In ex. (6) below, which contains a word form from Gudea 53 [E3/1.01.07.053], 4, the morpheme /e/ cannot be identified on the basis of the context; consequently it is glossed as "X".

(6) @form ĝir₂-nun-na-ke₄ %sux /ĝir₂-nun-na #N1=Ŋirnuna.N5=ak.N5=e ##N1=NAME.N5=GEN.N5=X
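Because unidentifiable morphemes are marked uniformly ("x" in M1, "X" in M2), they can be located by a simple scan. The Python sketch below reports each such symbol with its level, its position in the string, and its p-tag; the function name is hypothetical. The position is reported because a p-tag such as N5 may occur twice in one word form.

```python
from typing import List, Tuple


def unidentified(m1: str, m2: str) -> List[Tuple[str, int, str]]:
    """List (level, index, p-tag) for every 'x' in M1 and every 'X' in M2."""
    hits = []
    for level, part, mark in (("M1", m1, "x"), ("M2", m2, "X")):
        for index, symbol in enumerate(part.lstrip("#").split(".")):
            ptag, _, value = symbol.partition("=")
            if value == mark:
                hits.append((level, index, ptag))
    return hits


# ex. (6): only the second N5 is unidentified, and only in M2
print(unidentified("#N1=Ŋirnuna.N5=ak.N5=e", "##N1=NAME.N5=GEN.N5=X"))
```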

Disambiguators

Consider the entry of the Sumerian word a₂ "arm" in the glossary of the RIME 3/1-texts:

@entry a [arm] N
@bases a₂
@form a₂ /a₂ #N1=a ##N1=STEM
@form a₂\abs /a₂ #N1=a.N5=ø ##N1=STEM.N5=ABS
@sense N arm
@end entry

The spelling a₂ corresponds to two different morphemic representations. In case of the first one, the stem stands alone without any case-marker, as in Gudea Statue D [E3/1.01.07], iv 2:

2. a₂ {d}nanše-ta
#lem a[arm]; Nanše[1]DN
M1: N1=a N1=Nanše.N5=ak.N5=ta
M2: N1=STEM N1=STEM.N5=GEN.N5=ABL

In the second one, the stem is followed by an absolutive case-marker in N5, as in Urbau 5 [E3/1.01.06.05], i 10:

10. a₂ šum₂-ma {d#}nin-ĝir-su-ka-ke₄
#lem a[arm]\abs; šum[give]; Ninĝirsuk[1]DN
M1: N1=a.N5=ø NV2=šum.NV4='a N1=Ninĝirsuk.N5=ak.N5=e
M2: N1=STEM.N5=ABS NV2=STEM.NV4=SUB N1=STEM.N5=GEN.N5=ERG

Homographic word forms (i.e. word forms that have the same spelling but differ in their morphemic representations [M1]) will be distinguished by the use of disambiguators [http://oracc.museum.upenn.edu/doc/builder/linganno/SUX/#Disambiguation] in the lemmatized transliteration. So here, in the Ur-Bau text, a₂ receives the disambiguator \abs indicating that this word form should be analyzed as having an absolutive case-marker /ø/ in its N5. As a consequence, this occurrence of a₂ will enter the glossary as a form different from its occurrence in Gudea Statue D, and a₂ will have two forms in the glossary:

@form a₂
@form a₂\abs

The two forms are homographic on the level of the OR, but different on the level of M1, resulting in different morphemic glossing. Homomorphemic word forms (i.e. morphemic representations that are identical but receive different morphemic glossing [M2]) will be distinguished similarly in the lemmatized transliteration.
The full list of disambiguators cannot be compiled in advance. One can only foresee the disambiguators that distinguish homomorphemic word forms, as they are the result of ETCSRI's morphological analysis. The number and kind of disambiguators that distinguish homographic words will only be clear during the preparation of the corpus. (Hopefully their number will not be enormous!)
The general principle followed by ETCSRI is that the less frequent form (or the form thought to be less frequent) always gets the disambiguator.
Note that our use of disambiguators makes Augmentation [http://oracc.museum.upenn.edu/doc/builder/linganno/SUX/#Augmentation], as described in the ORACC manual, superfluous. Disambiguators that distinguish homographic word forms may function as augmentation. So in the case of ex. (5) discussed above,

(5) @form lugal-a-ni\dat /lugal #N1=lugal.N3=ani.N5=ra ##N1=STEM.N3=3-SG-H-POSS.N5=DAT-H

the disambiguator \dat will indicate that this occurrence of lugal-a-ni is different from the one whose M1 is #N1=lugal.N3=ani.N5=ø. A word form may have more than one disambiguator; their order then follows the order of slots in the template, as in the example below (Gudea Statue B ix 9):

(7) a₂ huš-na he₂-dab₉
#lem a[arm]; husz[reddish]\l1; dab[seize]\v10l1\v14sg3s
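A #lem line of the kind shown in ex. (7) can be broken into lemma, guide word, and the ordered list of disambiguators. The Python sketch below is an illustration only: the regular expression and the function name are assumptions based on the examples on this page, not a full implementation of the Oracc lemmatization syntax.

```python
import re
from typing import List, Tuple

# cf[gw], an optional tag such as DN, then zero or more \disambiguators
LEM_TOKEN = re.compile(
    r"^(?P<cf>[^\[\]]+)\[(?P<gw>[^\]]*)\](?P<pos>[^\\]*)(?P<dis>(?:\\[^\\]+)*)$"
)


def parse_lem(line: str) -> List[Tuple[str, str, str, List[str]]]:
    """Return (citation form, guide word, tag, disambiguators) per word."""
    body = line[len("#lem"):] if line.startswith("#lem") else line
    words = []
    for token in body.split(";"):
        token = token.strip()
        match = LEM_TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised lemmatization: {token!r}")
        disambiguators = match.group("dis").split("\\")[1:]
        words.append((match.group("cf"), match.group("gw"),
                      match.group("pos"), disambiguators))
    return words


for word in parse_lem(r"#lem a[arm]; husz[reddish]\l1; dab[seize]\v10l1\v14sg3s"):
    print(word)
```

For the last word of ex. (7) the disambiguators come out in the order v10l1, v14sg3s, which mirrors the order of the slots V10 and V14 in the verbal template.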

In some cases transliteration contributes to disambiguation. In particular, the transliteration of the 3rd ps. sg. possessive enclitics will distinguish between the word form in the absolutive/human dative and the word form in the ergative/non-human dative/non-human locative3: -(Ca/a)-ni vs. -(Ca/a)-ne₂, -bi vs. -be₂.

The interim list of disambiguators
Disambiguator | Meaning | Relevance
abl | word form is in the ablative (= ends in M1 as .N5=ta) | M1
abs | word form is in the absolutive (= ends in M1 as .N5=ø) | M1
adv | word form ends with the adverbiative (= ends in M1 as .N5=eš) | M1
com | word form is in the comitative (= ends in M1 as .N5=da) | M1
cop | word form ends with a copula (= ends in M1 as N6=COP-) | M1
dat | word form is in the human dative (= last N5 in M1 is /ra/) | M1
dat | verbal form contains a dative prefix |
dem1 | /e/ in N2 of M1 is the demonstrative pronoun (= is DEM1 in M2) | M1
erg | word form is in the ergative (= last N5 in M1 is /e/) | M1
gen | word form is in the genitive (= ends in M1 as .N5=ak) | M1
gen\abs | word form ends with a sequence of a genitive and an absolutive case-marker (= ends in M1 as .N5=ak.N5=ø) | M1
l1 | word form is in the locative1 (= last N5 in M1 is /'a/) | M1
l2 | word form is in the locative2 (= last N5 in M1 is /'a/) |
l3 | word form is in the locative3 (= last N5 in M1 is /e/) | M1
dem2 | /bi/ in N3 of M1 is the demonstrative pronoun (= is DEM2 in M2) | M2
n5al2 | /'a/ in the last N5 of M1 is locative2 (= is L2-NH in M2) |
n5edat | /e/ in the last N5 of M1 is dative (= is DAT-NH in M2) | M2
n5el3 | /e/ in the last N5 of M1 is locative3 (= is L3-NH in M2) | M2
n5ral2 | /ra/ in the last N5 of M1 is locative2 (= is L2-H in M2) | M2
n5ral3 | /ra/ in the last N5 of M1 is locative3 (= is L3-H in M2) | M2
n6sg2 | /men/ in N6 of M1 is 2nd ps sg (= is COP-2-SG in M2) | M2
term | word form is in the terminative (= last N5 in M1 is /še/) | M1
v6sg1 | the initial pronominal prefix in V6 is 1st ps sg | M2
v10l1 | there is a locative1 prefix in V10 (= /ni/ or /n/ in M1) | M1
v10l3 | /i/ in V10 of M1 is the locative3 prefix (= is L3 in M2) | M2
v11bl3 | /b/ in V11 of M1 is the 3rd ps sg non-human FPP construed with a participant in L3 (= is 3-SG-NH-L3) | M2
v11nh3a | /b/ in V11 of M1 is the 3rd ps sg non-human FPP construed with an agent (= is 3-SG-NH-A) | M2
v11nl3 | /n/ in V11 of M1 is the 3rd ps sg human FPP construed with a participant in L3 (= is 3-SG-H-L3) | M2
v11sg1a | in V11 there is a /?/ in M1 (= the verbal form has a 1-SG-A in M2) | M1
v11sg2a | in V11 there is a /y/ in M1 (= the verbal form has a 2-SG-A in M2) | M1
v11sg3p | /n/ in V11 of M1 is the 3rd ps sg human FPP construed with a patient (= is 3-SG-H-P) | M2
v14pl1 | /enden/ in V14 of M1 is 1st ps pl suffix in plural transitive preterite verbal forms (= is 1-PL in M2) | M2
v14pl1a | /enden/ in V14 of M1 is 1st ps pl agent suffix (= is 1-PL-A in M2) | M2
v14pl2 | /enzen/ in V14 of M1 is 2nd ps pl suffix in plural transitive preterite verbal forms (= is 2-PL in M2) | M2
v14pl2a | /enzen/ in V14 of M1 is 2nd ps pl agent suffix (= is 2-PL-A in M2) | M2
v14pl3p | /eš/ in V14 of M1 is 3rd ps pl patient suffix (= is 3-PL-P in M2) | M2
v14pl3s | /eš/ in V14 of M1 is 3rd ps pl subject suffix (= is 3-PL-S in M2) | M2
v14sg1p | /en/ in V14 of M1 is 1st ps sg patient suffix (= is 1-SG-P in M2) | M2
v14sg1s | /en/ in V14 of M1 is 1st ps sg subject suffix (= is 1-SG-S in M2) | M2
v14sg2a | /en/ in V14 of M1 is 2nd ps sg agent suffix (= is 2-SG-A in M2) | M2
v14sg2p | /en/ in V14 of M1 is 2nd ps sg patient suffix (= is 2-SG-P in M2) | M2
v14sg2s | /en/ in V14 of M1 is 2nd ps sg subject suffix (= is 2-SG-S in M2) | M2
v14sg3s | /ø/ in V14 of M1 is 3rd ps sg subject suffix (= is 3-SG-S in M2) | M2
v2l1 | /ii/ in V2 of M1 is the lengthened finite-marker signaling the syncopation of the vowel of the L1 prefix (= is FIN-L1 in M2) | M2
v5nh3 | /ba/ in V5 of M1 is 3rd ps sg non-human IPP (= is 3-NH in M2) | M2
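The nominal case disambiguators in the list are all defined by the last N5 symbol of M1, so the candidates for a given M1 can be looked up. The Python sketch below encodes only those rows of the list; the dictionary and function names are hypothetical. Where the list assigns the same morpheme to two cases (/e/ for ergative and locative3, /'a/ for locative1 and locative2), the function returns both candidates and leaves the choice to the annotator.

```python
from typing import Dict, List

# Last N5 morpheme -> disambiguators named for it in the interim list.
CASE_DISAMBIGUATORS: Dict[str, List[str]] = {
    "ta": ["abl"],
    "ø": ["abs"],
    "eš": ["adv"],
    "da": ["com"],
    "ra": ["dat"],
    "e": ["erg", "l3"],
    "ak": ["gen"],
    "'a": ["l1", "l2"],
    "še": ["term"],
}


def candidate_disambiguators(m1: str) -> List[str]:
    """Return the case disambiguators compatible with the last N5 of M1."""
    cases = [s.partition("=")[2]
             for s in m1.lstrip("#").split(".") if s.startswith("N5=")]
    if not cases:
        return []
    if cases[-2:] == ["ak", "ø"]:
        return ["gen\\abs"]  # genitive followed by absolutive
    return list(CASE_DISAMBIGUATORS.get(cases[-1], []))


print(candidate_disambiguators("#N1=lugal.N3=ani.N5=ra"))
print(candidate_disambiguators("#N1=Ninĝirsuk.N5=ak.N5=e"))
```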

List of actually used disambiguators

Practical remarks

1. Appositives
In case of appositives it is assumed that only the last member of the sequence of appositives receives a case-marker, unless there is positive evidence for the opposite. This has the consequence that when a sequence of appositives is in the absolutive, we never mark its members preceding the last one as absolutives, since there is never positive evidence for a /ø/.

2. Vocatives
In Sumerian the vocative is assumed to be expressed by the absolutive; vocatives are marked accordingly.

3. Lists
In lists all items listed are assumed to be in the absolutive.

4. Allomorphy
"If the L1 representation to be glossed corresponds to standard orthography, the analyst has no decisions to make in its regard. Otherwise, a good option for the representation (as well as for any writing system) is a morphophonemic representation which steers a middle course as far as allomorphy is concerned: Phonologically conditioned allomorphy is resolved (ignored), morphologically conditioned allomorphy is not resolved (is rendered)." (Lehmann 2004: 1841)

5. Numerals
Just like non-finite verbal forms, cardinal numbers may be used both as modifiers and as nouns; they will thus be inserted in the non-finite verbal template.

6. Seal inscriptions ending with the phrase "arad₂-zu"
Seal inscriptions ending with the phrase "arad₂-zu" are considered to be made up of three parts: a) Name of the dedicatee; b) Name of the dedicator; c) The phrase "arad₂-zu". In ETCSRI all three parts are considered to be in the absolutive, and are analyzed and glossed accordingly.

Changes in the glossary

1. di[speak] was made one of the bases of dug[speak]. All occurrences in the texts were changed.
2. e[speak] was made one of the bases of dug[speak]. All occurrences in the texts were changed.
3. tum[bring] was made one of the bases of de[bring]. All occurrences in the texts were changed.
4. lah[bring] was made one of the bases of de[bring]. All occurrences in the texts were changed.
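The four changes above amount to a fixed mapping from former lemmata to the lemmata they were merged into. As an illustration of how such a change could be applied to #lem lines, here is a minimal Python sketch. It is not the procedure actually used by the project; the names MERGED and update_lem are hypothetical, and the sketch assumes that words are separated by semicolons and that disambiguators follow the lemma after a backslash.

```python
from typing import Dict

# Former lemma -> lemma it was merged into (from the list above).
MERGED: Dict[str, str] = {
    "di[speak]": "dug[speak]",
    "e[speak]": "dug[speak]",
    "tum[bring]": "de[bring]",
    "lah[bring]": "de[bring]",
}


def update_lem(line: str) -> str:
    """Rewrite merged lemmata in one #lem line, keeping disambiguators."""
    body = line[len("#lem"):] if line.startswith("#lem") else line
    updated = []
    for token in body.split(";"):
        # Compare whole lemmata only, so 'e[speak]' never matches inside 'de[...]'.
        lemma, sep, rest = token.strip().partition("\\")
        updated.append(MERGED.get(lemma, lemma) + sep + rest)
    return "#lem " + "; ".join(updated)


print(update_lem(r"#lem lah[bring]\v14sg3s"))
```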

List of words not in template

Non-regular STEM-PFs or STEM-RDPs in M1

STEM STEM-PF STEM-RDP
e₃ e[leave] ed
ĝar ĝar[place] ĝaĝa
gi₄ gi[turn] gigi
kur₉ kur[enter] kuku
taka₄ taka[abandon] dada
tuku tuku[acquire] dudu

The date of last modification: 05 Jul 2012

 
 
The ETCSRI Project at Oracc.org / Content released under a CC BY-SA 3.0 licence, 2013-14
http://oracc.museum.upenn.edu/etcsri/glossing/