Lemmatising ATF Files
Lemmatisation consists of labelling written words, which may be inflected, with the base word (or dictionary headword) of which the written form is an instance. Oracc's lemmatiser, L2, helps you generate glossaries from your Oracc corpus.
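As a rough illustration of the idea, the sketch below pairs each written form with a headword and then groups the attested forms under that headword, which is the basic shape of a glossary. The word list, the function names `lemmatise` and `build_glossary`, and the lookup-table approach are all assumptions made for illustration. This is not L2's algorithm, its data, or its file format.

```python
from collections import defaultdict

# Hypothetical toy data: written form -> headword.
# This is NOT Oracc data and NOT the format L2 uses.
FORM_TO_HEADWORD = {
    "king": "king",
    "kings": "king",
    "went": "go",
    "goes": "go",
}


def lemmatise(words):
    """Pair each written word with its headword, or None if it is unknown."""
    return [(w, FORM_TO_HEADWORD.get(w.lower())) for w in words]


def build_glossary(lemmatised):
    """Group the attested written forms under their headword."""
    glossary = defaultdict(set)
    for form, headword in lemmatised:
        if headword is not None:  # unknown words are left out of the glossary
            glossary[headword].add(form.lower())
    return {hw: sorted(forms) for hw, forms in sorted(glossary.items())}


pairs = lemmatise(["Kings", "went", "goes", "xyzzy"])
glossary = build_glossary(pairs)
print(glossary)
```

In this toy version an unknown word simply gets no headword. A real lemmatiser also has to deal with ambiguity and with new words, which the pages listed below cover.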
Learning and Using L2
- Lemmatisation Primer
- This page provides an introduction to the linguistic annotation facilities used in Oracc, especially lemmatisation.
- Lemmatising: How to Use L2
- This page summarises the steps required to use L2, the lemmatiser used by Oracc. First we describe what you need to know about editing ATF files, then glossary management, then rebuilding the whole project. This page is designed as a refresher for those already familiar with lemmatisation.
- More details on lemmatising Akkadian, Sumerian and other ancient languages, and on lemmatising proper nouns.
- COFs: Compound Orthographic Forms
- This page describes how the lemmatiser handles Compound Orthographic Forms, or written words which contain more than one lemma.
- PSUs: Phrasal Semantic Units
- This page describes how L2 handles Phrasal Semantic Units, or glossary entries which consist of more than one word.
- BFFs: Byforms
- BFFs are the Oracc glossary mechanism for word byforms of various kinds. The mechanism allows each byform to have its own entry in the glossary and to be lemmatised separately, while leaving you free to keep the byforms separate or group them together when the glossaries are rendered.
- Rebuild Errors
- This page gives some help with fixing rebuild error messages that arise through lemmatisation or glossary management problems.
More technical descriptions
- L2: How It Works
- This document provides an overview of how the lemmatiser, L2, works. It is intended to help Oracc builders understand which files are used for validation and how to control lemmatisation.
- L2: Signature/Lemmatization Syntax
- This document describes extant and planned elements of the syntax of signatures and the lemmatization specifications that use them.
23 Jul 2014
Eleanor Robson, 'Lemmatising ATF Files', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2014 [http://oracc.museum.upenn.edu/doc/help/lemmatising/]