Assyrian-Babylonian scholarly literacies: identifying individual spelling habits

In order to study exactly how individuals, families and literate communities used cuneiform script to express themselves, the Assyrian-Babylonian Scholarly Literacies (ABSL) project is developing computational, statistical methods to investigate spelling habits in cuneiform.

The ABSL team comprises Eleanor Robson and Greta Van Buylaere at the University of Cambridge, advised by Steve Tinney of the University of Pennsylvania and Niek Veldhuis of UC Berkeley. The project was funded by The Leverhulme Trust from July 2011 to December 2012.


Cuneiform was written on clay tablets and on more perishable materials such as parchment and leather. Here two scribes take dictation from a superior, as shown on a low-relief wall carving from the palace of king Tiglath-Pileser III in the Assyrian city of Kalhu, northern Iraq, c.730 BC. Detail of British Museum ME 118882, photograph by Greta Van Buylaere. Go to British Museum website for more information.

Cuneiform script is one of the world's oldest writing systems and apparently also one of its most complex. It comprises about 1000 different wedge-shaped signs, each of which can represent one or more words or syllables. There may be several legitimate spellings of even very common words.

But much of this apparent complexity comes from cuneiform's long period of use, across a wide geographical area, for many different languages and purposes. We already know that no one individual writer ever used the full capacities of cuneiform, but it is difficult to explore the subtleties of personal writing habits using current methods.

Undertaken effectively, spelling analysis could radically transform our understanding of cuneiform literacy. It would not only refine our understanding of how different social groups used writing but also enable us to explore individual literacies within particular scribal communities, in a culture often assumed to be static and anonymous.

Our case studies will focus on Assyrian and Babylonian scholarly writings of the first millennium BC, in order to explore how knowledge travelled from person to person and place to place. But we will design the query tools so that any interested researcher can use them on any Oracc texts. In this way we hope to foster a new, bottom-up approach to cuneiform literacies that will fundamentally enrich our understanding of personal writing, thinking and communication in the ancient Middle East.


It is already well understood that levels of cuneiform literacy correlate with scribal profession: those employed to write administrative documents used fewer signs than scribes writing letters, who, in turn, had a poorer sign repertoire than scholars. We plan to subject this intuitive understanding of cuneiform to statistical analysis. To do so we will focus on three interrelated and consistently edited online corpora from the first millennium BC.
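As a rough illustration of the kind of count involved, the following Python sketch tallies the number of distinct signs used by each scribal group. The data, the group labels and the function names are all hypothetical, and the sketch assumes that signs within a spelling are separated by hyphens, as in conventional transliteration. It is not the project's own tooling.

```python
from collections import defaultdict

# Hypothetical toy data: (scribal group, transliterated spelling).
# Signs within a spelling are assumed to be separated by hyphens.
SAMPLE_WORDS = [
    ("administrator", "LUGAL"),
    ("administrator", "LUGAL"),
    ("letter-writer", "LUGAL"),
    ("letter-writer", "šar-ru"),
    ("scholar", "LUGAL"),
    ("scholar", "MAN"),
    ("scholar", "šar-ru"),
    ("scholar", "šar-ri"),
]


def split_signs(spelling):
    """Split a transliterated spelling into its individual signs."""
    return [sign for sign in spelling.split("-") if sign]


def repertoire_by_group(words):
    """Collect the set of distinct signs attested for each group."""
    repertoires = defaultdict(set)
    for group, spelling in words:
        repertoires[group].update(split_signs(spelling))
    return dict(repertoires)


def repertoire_sizes(words):
    """Return the number of distinct signs attested for each group."""
    return {group: len(signs)
            for group, signs in repertoire_by_group(words).items()}


if __name__ == "__main__":
    for group, size in repertoire_sizes(SAMPLE_WORDS).items():
        print(group, size)
```

A raw count of this kind grows with the amount of text preserved for each group, so any real comparison would need to control for sample size.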

Our analysis will help address such questions as:

Identifying distinct spelling habits may also help us attribute unsigned or fragmentarily preserved tablets to a particular scribe or scribal community, and discover new 'joins', or fragments belonging to the same tablet.
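One simple way to picture such an attribution is to compare the sign frequencies of an unsigned tablet with those of tablets by known scribes. The Python sketch below uses cosine similarity for this purpose. That measure is chosen here purely for illustration and is an assumption, not a method stated by the project. The scribe names and spellings are invented.

```python
import math
from collections import Counter

# Hypothetical spellings from tablets by two known scribes.
KNOWN = {
    "scribe_A": ["LUGAL", "LUGAL", "šar-ru", "LUGAL"],
    "scribe_B": ["MAN", "MAN", "šar-ri", "MAN"],
}
# Hypothetical spellings from an unsigned fragment.
UNSIGNED = ["MAN", "šar-ri", "MAN"]


def sign_counts(spellings):
    """Count how often each sign occurs across a list of spellings."""
    counts = Counter()
    for spelling in spellings:
        counts.update(sign for sign in spelling.split("-") if sign)
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sign-frequency counters (0.0 to 1.0)."""
    dot = sum(a[sign] * b[sign] for sign in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(unsigned, known):
    """Rank known scribes by similarity to the unsigned text, best first."""
    target = sign_counts(unsigned)
    scores = {name: cosine_similarity(target, sign_counts(spellings))
              for name, spellings in known.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_candidates(UNSIGNED, KNOWN):
        print(f"{name}: {score:.3f}")
```

With so little data the ranking is only suggestive. A high score indicates shared habits, which may reflect a common community or training as much as a single hand.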


Professional scribes underwent a formal training in cuneiform. This tablet, written in the Babylonian city of Nippur, southern Iraq, in c.1750 BC, contains a vocabulary in Sumerian in the left column, with Akkadian (Babylonian) translations on the right. Penn Museum CBS 10466 obverse, photograph by Penn Museum staff. View edition of tablet on the Digital Corpus of Cuneiform Lexical Texts.

The complexity of cuneiform script allows us to construct detailed profiles, scholarly 'signatures', of each scribe's writing conventions. We can thus explore individuals' contributions to the transmission and creation of scholarship, the interrelations between different production environments, and the paths by which knowledge travelled between them.
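A profile of this kind can be thought of as a table recording, for each word, how often a scribe chose each possible spelling. The Python sketch below builds such a table from invented attestations of the Akkadian word šarru 'king', which can be written with the logograms LUGAL or MAN or spelled out syllabically. The scribe names, counts and function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical attestations: (scribe, dictionary word, spelling).
ATTESTATIONS = [
    ("scribe_A", "šarru", "LUGAL"),
    ("scribe_A", "šarru", "LUGAL"),
    ("scribe_A", "šarru", "šar-ru"),
    ("scribe_B", "šarru", "MAN"),
    ("scribe_B", "šarru", "MAN"),
    ("scribe_B", "šarru", "MAN"),
    ("scribe_B", "šarru", "LUGAL"),
]


def spelling_profiles(attestations):
    """Return, per scribe and word, the relative frequency of each spelling."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for scribe, word, spelling in attestations:
        counts[scribe][word][spelling] += 1

    profiles = {}
    for scribe, words in counts.items():
        profiles[scribe] = {}
        for word, spellings in words.items():
            total = sum(spellings.values())
            profiles[scribe][word] = {
                spelling: n / total for spelling, n in spellings.items()
            }
    return profiles


if __name__ == "__main__":
    for scribe, words in spelling_profiles(ATTESTATIONS).items():
        for word, spellings in words.items():
            print(scribe, word, spellings)
```

In this toy data scribe_A prefers LUGAL while scribe_B prefers MAN, which is the sort of contrast a scholarly 'signature' would capture across many words at once.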

The ABSL project is developing and testing analytic tools and methodologies to obtain and present statistical data on sign use on three complementary Oracc corpora:

Oracc texts and metadata can be combined into richly annotated XML in TEI format (the Text Encoding Initiative, the international standard for humanities computing). This data can be manipulated with eXist (an XML database) and XQuery (a language for querying XML databases). The query tools we develop will thus be immediately applicable across all existing and future Oracc corpora. Oracc's implementations of TEI and of eXist [/doc/developer/exist] are already extensively documented.
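To give a sense of what querying such data involves, the sketch below counts the spellings of one word in a small TEI-like fragment. It uses Python's standard xml.etree.ElementTree module in place of eXist and XQuery, purely for illustration. The fragment itself is invented: the use of w elements with a lemma attribute is an assumption about the encoding, and Oracc's actual TEI output may be structured differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI namespace; the element and attribute layout below is hypothetical.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab>
      <w lemma="šarru">LUGAL</w>
      <w lemma="šarru">šar-ru</w>
      <w lemma="šarru">LUGAL</w>
      <w lemma="bītu">E₂</w>
    </ab>
  </body></text>
</TEI>"""


def spellings_of(xml_text, lemma):
    """Count the spellings of every word tagged with the given lemma."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Visit every <w> element in the TEI namespace, at any depth.
    for word in root.iterfind(".//tei:w", TEI_NS):
        if word.get("lemma") == lemma:
            counts[(word.text or "").strip()] += 1
    return counts


if __name__ == "__main__":
    print(spellings_of(SAMPLE, "šarru"))
```

The same selection could be expressed as an XQuery path over a database collection, so that one query runs across every corpus encoded in the same way.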

The output of the query tools will be highly adaptable to individual research interests thanks to Oracc's rich metadata, including information on the author or copyist of a composition (if known), the text's date, language, findspot, etc. In this way, it will be possible to reconstruct not only the writing habits of individual scribes but, eventually, a holistic picture of the use of cuneiform across time, space and social class.


The issues tackled here have rarely been addressed by linguists. They cannot be answered by traditional methods, since there is simply too much data to handle manually. However, computational approaches allow such data to be processed rigorously and quickly, enabling us to pose and answer previously intractable questions.

New insights will not be limited to this project. Rather, they will grow more compelling with continued application to the ever-expanding Oracc super-corpus, adding an important new dimension to the study of the world's oldest writing. Familiar intuitions about cuneiform will be fine-tuned or even proven wrong by quantitative analysis, thus enhancing our understanding of scribal literacies and the production of scholarship in the first millennium BC.

27 Dec 2019

Eleanor Robson & Greta Van Buylaere

Eleanor Robson & Greta Van Buylaere, 'Assyrian-Babylonian scholarly literacies: identifying individual spelling habits', The Geography of Knowledge, The GKAB Project, 2019

The GKAB Project at / Content released under a CC BY-SA 3.0 licence, 2007-14