Lemmatising ATF Files

Lemmatisation consists of labelling written words, which may be inflected, with the base word (or dictionary headword) of which the written form is an instance. Oracc's lemmatiser, L2, helps you generate glossaries from your Oracc corpus.

Learning and Using L2

Lemmatisation Primer
This page introduces the linguistic annotation facilities used in Oracc, especially lemmatisation.
Lemmatising: How to Use L2
This page summarises the steps required to use L2, the lemmatiser used by Oracc. First we describe what you need to know about editing ATF files, then glossary management, then rebuilding the whole project. This page is designed as a refresher for those already familiar with lemmatisation.
Further pages give more details on lemmatising Akkadian, Sumerian, other ancient languages, and proper nouns.

Further details

COFs: Compound Orthographic Forms
This page describes how the lemmatiser handles Compound Orthographic Forms, or written words which contain more than one lemma.
PSUs: Phrasal Semantic Units
This page describes how L2 handles Phrasal Semantic Units, or glossary entries which consist of more than one word.
BFFs: Byforms
BFFs are the Oracc glossary mechanism for word byforms of various kinds. The mechanism allows individual byforms to be treated under their own entries in the glossary and during lemmatisation, and then either kept separate or grouped together when the glossaries are rendered.
Rebuild Errors
This page helps you fix rebuild errors that arise from lemmatisation or glossary management problems.

More technical descriptions

L2: How It Works
This document provides an overview of how the lemmatiser, L2, works, so that Oracc builders can understand which files are used for validation and how to control lemmatisation.
L2: Signature/Lemmatization Syntax
This document describes existing and planned elements of the syntax of signatures and of the lemmatisation specifications that use them.
