Contributing to Oracc: How and Why

At a minimum, an Oracc project consists of an educational 'portal' website; a corpus of cuneiform texts is optional. Projects come in all shapes and sizes, from small student projects to large international research collaborations, and we welcome them all. Here we outline what is involved in creating an Oracc project and what the benefits are.


Why and how to build an Oracc 'portal' site

Here are some reasons why you should consider using Oracc to create an educational online resource about the ancient cuneiform world, whether or not you are also building an Oracc corpus.

How building an Oracc corpus works

Our basic model for corpus and tool development is that text corpora are edited and annotated at the source (or manuscript or tablet) level. We also know that it is often desirable to add new texts, joins, and fragments to the corpus; and to improve or update existing transliterations, translations, and annotations. Oracc thus works by merging manuscript files with lists of varying complexity to produce tools for describing and exploring the corpora in many ways. Whenever an editor or project manager edits, adds or updates the texts or data lists, the tools are rebuilt programmatically from scratch, so that the latest improvements to the annotated texts and the lists of data are automatically incorporated throughout the project.
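As a rough illustration of the rebuild-from-scratch model described above, the following Python sketch regenerates a word index from a set of transliterated texts and a catalogue list every time it is called. It is not Oracc's actual build code: the function name `rebuild_index`, the data layout, the text identifiers, and the sample lines are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rebuild_index(texts, catalogue):
    """Build a word index from scratch; nothing is carried over from earlier runs.

    texts:     hypothetical mapping of text ID -> list of transliterated lines
    catalogue: hypothetical mapping of text ID -> metadata (here only a name)
    """
    index = defaultdict(list)
    for text_id in sorted(texts):
        # Fall back to the bare ID when the catalogue has no entry for a text.
        name = catalogue.get(text_id, {}).get("name", text_id)
        for line_no, line in enumerate(texts[text_id], start=1):
            for word in line.split():
                index[word].append((name, line_no))
    return dict(index)


# Illustrative data only: the IDs, name and lines are invented for this sketch.
texts = {"X000001": ["a-na be-li2-ia", "qi2-bi2-ma"]}
catalogue = {"X000001": {"name": "Sample letter"}}
index = rebuild_index(texts, catalogue)

# After an edit, the whole index is regenerated rather than patched in place.
texts["X000002"] = ["a-na szar-ri"]
index = rebuild_index(texts, catalogue)
```

Regenerating everything on each change costs more computation than patching, but it means the derived index can never drift out of step with the texts and lists it was built from.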

The core Oracc standard for entering textual data is known as ATF, the ASCII Transliteration Format. ATF can support multiple translations, in any language.

Lemmatization is the process of annotating instances of forms of words according to their dictionary headword. Oracc uses interlinear lemmatization in the ATF transliterations to enable lemmatization data to remain synchronized with textual changes. Even for completely new projects, the lemmatizer can be set up to draw on relevant glossaries from existing Oracc projects, thus automating much of the process.
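To make the idea of interlinear lemmatization concrete, here is a minimal Python sketch that pairs each word of a transliteration line with the entry beneath it. It assumes a deliberately simplified layout: a numbered transliteration line followed by a `#lem:` line holding one entry per word, separated by semicolons. Real ATF covers many more cases than this handles, and the function name `pair_lemmata` and the sample lines are hypothetical.

```python
def pair_lemmata(text_line, lem_line):
    """Pair each word of a transliteration line with its lemmatization entry."""
    prefix = "#lem:"
    if not lem_line.startswith(prefix):
        raise ValueError("expected a line starting with '#lem:'")
    # Drop the leading line label (e.g. "1.") and split the rest into words.
    _label, _, content = text_line.partition(" ")
    words = content.split()
    # Assumed layout: one entry per word, separated by semicolons.
    lemmata = [entry.strip() for entry in lem_line[len(prefix):].split(";")]
    if len(words) != len(lemmata):
        raise ValueError(f"{len(words)} words but {len(lemmata)} lemmata")
    return list(zip(words, lemmata))


# Invented sample lines, used only to exercise the sketch.
pairs = pair_lemmata(
    "1. a-na szar-ri be-li2-ia",
    "#lem: ana[to]PRP; šarru[king]N; bēlu[lord]N",
)
```

Because each entry sits directly beneath the line it annotates, an edit that changes the number of words in the line shows up at once as a count mismatch. This is one way of picturing how interlinear annotation helps lemmatization stay synchronized with textual changes.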

Why build a corpus using the Oracc tools?

Text Cataloguing

The CDLI catalogue provides a global repository of unique identifiers for inscribed objects:

Data Entry

You can use old data and enter new data easily:

Data Consistency

The Oracc tools help you get data into a well-defined and highly consistent format and keep it that way:

Data Backup and Version History

The Oracc server can look after your data:

Data Development

Once texts are entered they can be enhanced in various ways:


The same transliterations and translations can be presented in several ways:

Enhanced Usefulness

Corpora prepared with these tools are reusable and more useful:

For more information on how to manage a project, go to the Manager section of the Oracc documentation.

18 Dec 2019 osc at oracc dot org

Steve Tinney & Eleanor Robson

Steve Tinney & Eleanor Robson, 'Contributing to Oracc: How and Why', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019


Released under a Creative Commons Attribution Share-Alike license 3.0, 2014.