The basic unit in the Oracc system is the project. This document introduces how projects are organized and how to create and work with them.
From the user's perspective a project comprises one or two components:
A project minimally contains a portal but may also contain a corpus that is related to it.
From the system perspective, there are actually two kinds of projects: main
projects (which we call simply `projects') and subprojects.
Subprojects are a way of dividing up projects for various reasons.
For instance, the CAMS subproject Ludlul is found in the
Projects are the organizational core of the Oracc server.
Before we begin, it is useful to explain the fundamentals which are available to all projects.
A project has at least one portal page, which may contain links to the corpus. The portal website may be hosted on the same server as the corpus data, or may be located elsewhere (for instance if required by a funding body).
Files containing editable portal content live in the
directory. Files containing static content, such as images and downloads, live in the
The link to the portal for a project `cams' is:
Help on setting up a portal is given here.
Most projects relate in some way to a text corpus. The texts are entered or converted to the ATF format and may have translations. The project management software takes care of turning the ATF sources into the various formats used for web display and other purposes.
Transliterations are the core of a corpus.
Translations can be integrated into the corpus.
ATF files containing transliterations and translations are kept in a project's
00atf directory on the Oracc server.
The pager is the web interface that enables users to interact with the corpus. The pager understands how to present long lists of results in pages, and also how to assemble metadata, texts and translations into pages displaying individual texts.
In your web browser you can jump directly to the corpus pager by using the keyword `corpus' after the project name in the URL. Compare http://oracc.museum.upenn.edu/saao and http://oracc.museum.upenn.edu/saao/corpus.
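The URL pattern just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the base address is the server mentioned in this document, which may change.

```python
# Server address as given in this document; treat it as an assumption.
BASE_URL = "http://oracc.museum.upenn.edu"


def portal_url(project: str) -> str:
    """Return the portal address for a project name such as 'saao'."""
    return f"{BASE_URL}/{project.strip('/')}"


def corpus_url(project: str) -> str:
    """Return the pager address: the project URL followed by 'corpus'."""
    return f"{portal_url(project)}/corpus"


print(portal_url("saao"))
print(corpus_url("saao"))
```

The two printed addresses correspond to the pair of SAAo links compared above.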
While it may not be obvious, the most fundamental part of any corpus is the catalogue. It provides the text metadata--at the very least the CDLI ID and a human-readable designation--that forms the organizational basis for all other components of the project.
The easiest way to provide a catalogue for a corpus is to derive it dynamically from the CDLI catalogue. However, some projects have special needs and in those cases it is possible to tailor the catalogue processing software to the required metadata fields and values.
P-numbers are unique identifiers required by the tools. To get P-numbers for your tablets:
If a project has its own catalogue, that is kept in the
The ATF format supports lemmatization, which is the process of adding references to dictionary headwords into the texts. If a corpus is lemmatized, it can be used to generate glossaries directly from the texts with no glossary-editing at all. Normally, however, the glossary and text corpus are used together: the glossary is maintained and may be edited or augmented with bibliography, and the corpus is synchronized with the glossary so that all of the instances of terms are instantly reachable from the glossary articles.
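The idea that a glossary can be generated from a lemmatized corpus, with every instance reachable from its headword, can be illustrated conceptually. The sketch below does not read ATF and is not how the Oracc tools work internally; it assumes the lemmatization has already been parsed into hypothetical (text ID, line label, headword) tuples, and the IDs and words are invented placeholders.

```python
from collections import defaultdict

# Hypothetical, already-parsed lemmatization data: (text ID, line label, headword).
instances = [
    ("P000001", "o 1", "šarru"),
    ("P000001", "o 2", "bītu"),
    ("P000002", "r 3", "šarru"),
]


def build_glossary(instances):
    """Group every lemmatized instance under its dictionary headword."""
    glossary = defaultdict(list)
    for text_id, line, headword in instances:
        glossary[headword].append((text_id, line))
    return dict(glossary)


glossary = build_glossary(instances)
for headword in sorted(glossary):
    print(headword, glossary[headword])
```

Each glossary entry then lists the places in the corpus where its headword occurs, which is the link between glossary articles and text instances described above.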
Linguistic annotation makes a corpus more useful.
+.in the appropriate places
Glossaries are generated from the ATF files when a project is rebuilt. They live in the
Lists of texts can be handled in either of two ways: as list files or as URLs.
List files are simply files containing P, Q or X IDs. They must be
placed in the directory
cams/00lib/lists/. The rebuild process
installs the lists in the proper place. You can then refer to your
list by name.
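Before uploading a list file it can be useful to check that every line looks like a text ID. The following is a minimal sketch, assuming one ID per line and assuming that an ID is the letter P, Q or X followed by six digits; the function name and the sample IDs are hypothetical.

```python
import re

# Assumption: an ID is the letter P, Q or X followed by six digits.
ID_PATTERN = re.compile(r"[PQX]\d{6}")


def parse_list(text: str) -> tuple[list[str], list[str]]:
    """Split list-file content into (valid IDs, rejected lines)."""
    valid, rejected = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if ID_PATTERN.fullmatch(line):
            valid.append(line)
        else:
            rejected.append(line)
    return valid, rejected


sample = "P000001\nQ000002\n\nX000003\nnot-an-id\n"
ids, bad = parse_list(sample)
print(ids)
print(bad)
```

Any rejected lines can be corrected locally before the list is placed in the project's lists directory.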
After creating a list file in the CAMS project with the name
00lib/lists/ritual-drawings and the content:
You can then refer to
For small numbers of texts, it is convenient to give the P, Q or X IDs in a comma-separated list after the project name:
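A minimal sketch of that URL pattern in Python, assuming the IDs are appended as a single comma-separated path segment directly after the project name, on the server named elsewhere in this document. The function name and the sample IDs are hypothetical.

```python
# Server address as given in this document; treat it as an assumption.
BASE_URL = "http://oracc.museum.upenn.edu"


def text_list_url(project: str, ids: list[str]) -> str:
    """Build a URL naming a few texts directly after the project name."""
    if not ids:
        raise ValueError("at least one text ID is required")
    return f"{BASE_URL}/{project}/{','.join(ids)}"


print(text_list_url("cams", ["P000001", "Q000002"]))
```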
The project organization is intended for use with multi-user systems. At the operating system level, each project is a user with a password and a home directory.
Projects can also own subprojects, which means that regular users on a system can have their own personal projects.
The files used by a project live in several different folders (aka directories). The most important of these are:
Project management tasks are carried out by logging on to the Oracc server with a terminal programme and typing some simple Unix commands. Images and files can also be uploaded by drag-and-drop. For more detailed information, see the page on Project Management with Unix.
Once logged in as the project-user on the server, most tasks are
accomplished via the program
oracc, which is fully documented on another page.
Project files are stored on the Oracc server, currently http://oracc.museum.upenn.edu. A stable version of the project is publicly viewable on one or more web servers, currently also http://oracc.museum.upenn.edu.
You can build your portal and corpus independently from one another: rebuilding one does not entail rebuilding the other.
For fuller instructions, see the pages on Project Management with Unix.
Write your portal pages in ESP and upload them to
00web/. Place any static content for your portal in
Run the oracc command to update the portal.
If you are using the CDLI catalogue then no action is required. If
you are using a custom or local catalogue, the project must be correctly
configured, then the catalogue updates must be placed in the
00cat folder with the file name(s) the project has
been configured to use.
There is a separate page about setting up your own project catalogue.
Transliterations should be placed in the
folder. There can be one big file, one file per text, or something in
between; the rebuild process uses all the relevant files in
When new texts are added, simply run the oracc command to update the website, indexes, etc.
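Because texts may be spread over one file or many, it can help to see which file holds which text before rebuilding. The sketch below is a local convenience only, not part of the oracc command. It assumes that the sources use an `.atf` extension and that each text begins with a line of the form `&P000001 = designation`; the file names and IDs in the demonstration are invented.

```python
import re
import tempfile
from pathlib import Path

# Assumption: each text starts with a line such as "&P000001 = designation".
AMP_LINE = re.compile(r"^&([PQX]\d{6})\b")


def index_texts(atf_dir: Path) -> dict[str, Path]:
    """Map each text ID to the file that contains it."""
    index = {}
    for path in sorted(atf_dir.glob("*.atf")):
        for line in path.read_text(encoding="utf-8").splitlines():
            match = AMP_LINE.match(line)
            if match:
                index[match.group(1)] = path
    return index


# Demonstration with a throwaway directory holding two invented files.
with tempfile.TemporaryDirectory() as tmp:
    atf_dir = Path(tmp) / "00atf"
    atf_dir.mkdir()
    (atf_dir / "many.atf").write_text(
        "&P000001 = Text A\n1. a\n\n&P000002 = Text B\n1. b\n", encoding="utf-8"
    )
    (atf_dir / "single.atf").write_text(
        "&Q000003 = Text C\n1. c\n", encoding="utf-8"
    )
    index = index_texts(atf_dir)
    found = {text_id: path.name for text_id, path in index.items()}

print(found)
```

The same function works whether a project keeps one big file, one file per text, or a mixture of the two.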
The recommended workflow for glossary building is:
oracc merge [LANGUAGE] (this automatically redoes the harvest).
*.glo file from the 'backups' directory--multiple
oracc merge [LANGUAGE] commands on the same day overwrite the same file.
Steve Tinney & Eleanor Robson
Steve Tinney & Eleanor Robson, 'An overview of Oracc projects', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2014 [http://oracc.museum.upenn.edu/doc/help/managingprojects/projects/]