The @collo tag

The @collo tag gives access to the Ngrammer. It is normally used with some convenient shorthands that are defined only for use in glossaries, where the shorthands can be expanded from context. A typical Ngram entry consists of a Left-Hand Side (LHS) and a Right-Hand Side (RHS); in @collo entries it is most common to give only the LHS and let the @collo system generate the RHS for you.

When the glossary processor reads 00lib/sux.glo it expands the @collo rules and puts them in a file, 02pub/coll-sux.ngm. The lemmatizer reads these rules and applies them early in the disambiguation sequence.

When we talk of 'expanding' here, we mean that the shorthand given in @collo is supplemented with contextual data to create a valid Ngrammer entry in the output file 02pub/coll-sux.ngm.


A simple hyphen indicates the current word: it is expanded to CF[GW]POS.
A hyphen followed immediately, with no whitespace, by a form indicates the current word but only in that specific form: it is expanded to :FORM=CF[GW]POS.
A pair of square brackets around a sense indicates that the specified sense of the current word should be selected: it is expanded to CF[GW]POS => CF[GW//SENSE].
Form and sense can be combined to mean "select this sense of the current word when this form is used".
A simple part of speech (e.g., PN) indicates that any word with that POS should be considered a match: it is expanded to itself. (This is in fact a standard part of the Ngrammer system, but it is mentioned here because it is so useful in @collo.)
Any signature other than those described above is passed through unchanged to the Ngram rule that is being built from the @collo.
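
The shorthand expansions listed above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the glossary processor's actual code: the names Word and expand_token are hypothetical, the example word lugal[king]N and the sense "owner" are made up for the demonstration, and the leading colon in :FORM= is copied literally from the description. The combined form-plus-sense shorthand is left out because its exact syntax is not given in this section.

```python
from typing import NamedTuple, Optional, Tuple


class Word(NamedTuple):
    """The glossary entry that owns the @collo (the 'current word').

    Hypothetical container used only for this sketch.
    """
    cf: str   # citation form
    gw: str   # guide word
    pos: str  # part of speech


def expand_token(token: str, word: Word) -> Tuple[str, Optional[str]]:
    """Expand one @collo shorthand token into (lhs, rhs).

    rhs is None when the shorthand does not call for a right-hand side.
    """
    base = f"{word.cf}[{word.gw}]{word.pos}"

    # A simple hyphen: the current word.
    if token == "-":
        return base, None

    # A hyphen followed directly by a form: the current word in that form.
    if token.startswith("-") and len(token) > 1:
        return f":{token[1:]}={base}", None

    # Square brackets around a sense: select that sense of the current word.
    if token.startswith("[") and token.endswith("]") and len(token) > 2:
        sense = token[1:-1]
        return base, f"{word.cf}[{word.gw}//{sense}]"

    # A bare POS or any other signature is passed through unchanged.
    return token, None


if __name__ == "__main__":
    current = Word(cf="lugal", gw="king", pos="N")
    for tok in ["-", "-lugal-e", "[owner]", "PN"]:
        print(tok, "->", expand_token(tok, current))
```

Note that a bare POS and "any other signature" take the same branch here, since both are described as passing through to the rule unchanged.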

The Ngrammer

Work-in-progress documentation for the Ngrammer is available here. It is sometimes necessary to give full Ngrammer rules in a @collo entry, but because the RHS (the part after the =>) is often unnecessary in the Ngrammer, it is rarely necessary to give it in @collo. It is never necessary to give an RHS consisting entirely of * elements; in fact, the @collo system discards an RHS that consists entirely of asterisks.
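
The rule about all-asterisk right-hand sides can be illustrated with a small Python sketch. The function names drop_trivial_rhs and format_rule are hypothetical, and joining elements with single spaces around => is an assumption made for the example rather than a statement about the exact layout of the .ngm file.

```python
from typing import List, Optional


def drop_trivial_rhs(rhs: List[str]) -> Optional[List[str]]:
    """Return None for an empty RHS or one made up entirely of '*' elements."""
    if not rhs or all(element == "*" for element in rhs):
        return None
    return rhs


def format_rule(lhs: List[str], rhs: List[str]) -> str:
    """Build a rule line, omitting the RHS when it carries no information.

    The single-space separators are an assumption for this sketch.
    """
    left = " ".join(lhs)
    kept = drop_trivial_rhs(rhs)
    if kept is None:
        return left
    return f"{left} => {' '.join(kept)}"


if __name__ == "__main__":
    # An RHS of only asterisks is discarded, leaving just the LHS.
    print(format_rule(["lugal[king]N", "PN"], ["*", "*"]))
    # An RHS that selects a sense is kept.
    print(format_rule(["lugal[king]N", "PN"], ["lugal[king//owner]", "*"]))
```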


Released under a Creative Commons Attribution Share-Alike license 3.0, 2014.