Project files

The files belonging to a project are organized according to a specific set of conventions which are described here.

Background

The files belonging to a project are strictly organized: no files may be present in the project's home directory except system configuration files (these generally begin with a period (.) and so are not normally visible when listing the project home directory). Several groups of directories contain static and computer-generated project content, and all of these begin with digits. Any directory whose name does not begin with digits is by definition a subproject.
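The naming rule above can be sketched as a small classifier. This is only an illustration of the convention as described here, not part of the Oracc tools; the function name classify_entry and the sample entries .emacs and royal are hypothetical.

```python
def classify_entry(name: str) -> str:
    """Classify one entry of a project home directory by its name alone."""
    if not name:
        raise ValueError("empty directory entry name")
    if name.startswith("."):
        # Hidden system configuration file, the only kind of plain file allowed.
        return "system configuration"
    if name[0] in "0123456789":
        # Static or computer-generated project content (00..., 01..., 02...).
        return "project content"
    # Anything else is, by definition, a subproject.
    return "subproject"


# Sample entries; ".emacs" and "royal" are made-up names for illustration.
for entry in [".emacs", "00atf", "01bld", "02www", "royal"]:
    print(f"{entry}: {classify_entry(entry)}")
```

The classifier looks only at the first character, which is all the convention requires; it does not check that an entry actually exists or is a directory.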

Project Home

Most project workers will access project files from the project home directory. Thus, when we refer to a directory 00atf, that would be accessed via Emacs using tramp as: /, where PROJECT would be replaced by the actual project name.

The programs behind Oracc, however, access project files from the top-level Oracc installation directory. This matters to you because when a program refers to, e.g., the 00atf directory in an error message, it will probably call it something like /usr/local/oracc/PROJECT/00atf.
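To relate a path seen in an error message back to the directory you know from the project home, a conversion along the following lines may help. It is a minimal sketch that assumes the installation root is /usr/local/oracc, as in the example above; the function names and the project name myproject are hypothetical.

```python
from pathlib import PurePosixPath

# Installation root as it appears in the example path in the text (an assumption
# for any particular server).
ORACC_ROOT = PurePosixPath("/usr/local/oracc")


def install_path(project: str, directory: str) -> PurePosixPath:
    """Build the path a program would report for a project directory."""
    return ORACC_ROOT / project / directory


def split_install_path(path: str) -> tuple[str, str]:
    """Split a reported path into (project, path relative to the project home).

    Raises ValueError if the path is not below ORACC_ROOT.
    """
    relative = PurePosixPath(path).relative_to(ORACC_ROOT)
    project, *rest = relative.parts
    return project, "/".join(rest)


print(install_path("myproject", "00atf"))
print(split_install_path("/usr/local/oracc/myproject/00atf"))
```

The sketch treats the first path component after the root as the project name; a subproject would appear as further components in the second element of the result.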

This access route is the reason why project names must all be four or more characters in length: Oracc reserves names of three or fewer letters for system files and directories.

This is also the reason why project names must be in all lowercase: for simplicity and predictability all Oracc directory names consist only of lowercase letters, in some cases prefixed by a sequence of digits.
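The two naming constraints can be checked mechanically. The sketch below takes a strict reading of the text (lowercase ASCII letters only, at least four of them); the real Oracc tools may apply different or additional rules, and the function name is hypothetical.

```python
import re

# Strict reading of the rules above: only lowercase letters a-z, four or more.
_PROJECT_NAME = re.compile(r"[a-z]{4,}")


def is_valid_project_name(name: str) -> bool:
    """Return True if name satisfies the length and lowercase rules."""
    return _PROJECT_NAME.fullmatch(name) is not None


# Made-up candidate names for illustration.
for candidate in ["demo", "abc", "Demo", "corpus"]:
    print(candidate, is_valid_project_name(candidate))
```

Names of three or fewer letters fail because they are reserved for system files and directories; mixed-case names fail because all Oracc directory names are lowercase.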

Editing Data

All project content which can be edited by hand is contained in one of the directories prefixed with 00. The current set of these directories is:

00any: conventional space where arbitrary project materials may be kept; by convention a subdirectory '00any/portal' contains a copy of any project portal, which may be a compressed archive. This copy of the portal is not used in any way, but is simply a convenient way to ensure that the portal and the project can be archived together.
00atf: source texts in ATF format; this is versioned, and ideally each text is in its own file named by its P/Q/X ID
project backups can be stored here, and when the glossary-merger updates a glossary a copy is made in this directory
project bibliography can be kept here, but it is not yet used by the Oracc build mechanism
executable versions of any programs specific to the project should be kept here (sources should be kept in 00any/src)
00cat: if a project maintains its own catalogues, the installable sources live here; note that projects which use the CDLI main catalogue directly have their metadata created dynamically when the project is built, and do not have any catalogue files in the 00-tree
00lib: project configuration data, glossaries and signlists live here; text lists live in 00lib/lists
00map: to support the transition to Oracc's new architecture, ATF sources may be accessed through alternate names via this directory, which contains links to the files in 00atf in a more user-friendly format
00res: static content for the project's ESP portal website should be placed here and is migrated into the HTML version of the project along with the automatically generated HTML version of project data; images and downloads have separate subdirectories, 00res/images and 00res/downloads respectively
additional TEI data can be placed here and is migrated into the XML version of the project along with the automatically generated TEI version of project data.
editable project content for the ESP portal website should be placed here and is migrated into the HTML version of the project along with the automatically generated HTML version of project data; images and non-editable downloads live in 00res/images and 00res/downloads respectively
additional project data in XML format should be placed here and is migrated into the XML version of the project

Generating Data

During the build process, a number of XML versions of the text-based sources are generated, and these are placed in the directory 01bld. Each text has its own directory in 01bld, named according to its ID. When one project borrows data from another via proxying, it is this ID-named directory from which the proxied data is taken.

Unmerged glossary data harvested from ATF files is stored in 01bld/new/[LANGUAGE].new.
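The build locations just described can be written out as path helpers. This is a sketch of the layout exactly as stated in the text, not a description of how the build programs compute paths; the helper names, the text ID P123456 and the language code sux are illustrative assumptions.

```python
from pathlib import PurePosixPath


def text_build_dir(project_home: str, text_id: str) -> PurePosixPath:
    """Directory holding the generated XML versions of one text."""
    return PurePosixPath(project_home) / "01bld" / text_id


def new_glossary_file(project_home: str, language: str) -> PurePosixPath:
    """File holding unmerged glossary data harvested from ATF for one language."""
    return PurePosixPath(project_home) / "01bld" / "new" / f"{language}.new"


# PROJECT is a placeholder for the project home, as elsewhere in this page.
print(text_build_dir("PROJECT", "P123456"))
print(new_glossary_file("PROJECT", "sux"))
```

When one project proxies a text from another, it is the directory returned by the first helper, evaluated for the lending project, from which the data would be taken.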

There is also a working directory named 01tmp.

Both of these directories actually live in the variable part of the filesystem, by convention in /var/local/oracc. The entries in the project home directory are links.
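Whether an entry in the project home really is a link into the variable part of the filesystem can be checked with a few lines of code. The sketch below assumes a POSIX system with symbolic links; the function name is hypothetical, and the demonstration builds a throwaway directory tree whose internal layout is made up rather than copied from a real installation.

```python
import tempfile
from pathlib import Path


def is_link_into(entry: Path, target_root: Path) -> bool:
    """Return True if entry is a symlink whose target resolves below target_root."""
    if not entry.is_symlink():
        return False
    resolved = entry.resolve()
    root = target_root.resolve()
    return root in resolved.parents


# Demonstration in a temporary tree standing in for a real installation.
with tempfile.TemporaryDirectory() as tmp:
    var_root = Path(tmp) / "var"   # stands in for /var/local/oracc
    home = Path(tmp) / "home"      # stands in for the project home
    (var_root / "PROJECT" / "01bld").mkdir(parents=True)
    (home / "00atf").mkdir(parents=True)
    (home / "01bld").symlink_to(var_root / "PROJECT" / "01bld")

    print(is_link_into(home / "01bld", var_root))  # a link into the variable tree
    print(is_link_into(home / "00atf", var_root))  # an ordinary directory
```

On a real server the equivalent call would be is_link_into(Path("01bld"), Path("/var/local/oracc")), run from the project home directory.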

Public Data

There are several different kinds of public data besides the raw build results. Like 01bld, these directories are links into the Oracc directory /var/local/oracc.

public files which don't belong in 02www or 02xml are put here; these files are public but are not visible online
02www: this is the live project web site
02xml: this is the XML form of the project's data, which is what is loaded into the XML database; there is an [ID].xml for each text in the project, including those which are proxied from other projects, and the XML versions of glossaries etc., are included with 00 prefixes so that they occur first in the project data in a simple sort
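The effect of the 00 prefix on ordering can be seen with an ordinary string sort. The file names below are invented for illustration (in particular 00gloss.xml is not a real Oracc file name); the point is only that names beginning with digits sort before names beginning with P, Q or X.

```python
def simple_sort(names: list[str]) -> list[str]:
    """Sort file names by plain character order, as a simple sort would."""
    return sorted(names)


# Hypothetical contents of an XML directory: three texts and one glossary.
names = ["P000001.xml", "X000003.xml", "00gloss.xml", "Q000002.xml"]
print(simple_sort(names))
# ['00gloss.xml', 'P000001.xml', 'Q000002.xml', 'X000003.xml']
```

Because the digit 0 precedes every uppercase letter in character order, no special sorting logic is needed to bring glossaries and similar files to the front.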


Subprojects

Subprojects work exactly like projects; the only difference is that a subproject's home directory is located inside a project or subproject directory, rather than being the home directory of a top-level project.

23 Jul 2014 osc at oracc dot org

Steve Tinney & Eleanor Robson

Steve Tinney & Eleanor Robson, 'Project files', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2014

Released under a Creative Commons Attribution Share-Alike license 3.0, 2014.