rules

Each annotation rule is defined by the fields name, scope, priority, match and set. The supported scopes are token and morpheme. Matching supports regular expressions, lists, lexicons, extractors, and require/forbid conditions.

Grammar

rules:
  - name: <string>
    scope: token | morpheme
    priority: <int>

    match:
      gloss: ...
      in_list: <string | [string, ...]>       # kept for compatibility with the older format
      in_lexicon: <lexicon-name>
      regex: <regex>
      require: <string | [string, ...]>
      forbid: <string | [string, ...]>
      extract:
        - type: scan_agreement
          extractor: <extractor-name>

    set:
      upos: <UPOS>
      feats:
        <FeatName>: <FeatValue>
      feats_template:
        <FeatName>: <template>
      extract:
        - type: scan_agreement
          extractor: <extractor-name>
          into: <context-key>
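As a rough illustration of how the grammar above might be loaded, the Python sketch below normalizes the <string | [string, ...]> fields (in_list, require, forbid) into lists and rejects scopes other than token and morpheme. The Rule class, the load_rule and load_rules helpers, the default priority of 0 and the "higher priority runs first" ordering are all hypothetical; this page does not say how priority is ordered or what happens when it is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

SUPPORTED_SCOPES = {"token", "morpheme"}
LIST_FIELDS = ("in_list", "require", "forbid")  # fields typed <string | [string, ...]>


def as_list(value: Any) -> list[str]:
    """Normalize a <string | [string, ...]> value into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


@dataclass
class Rule:
    # Hypothetical in-memory form of one entry under rules:
    name: str
    scope: str
    priority: int = 0  # assumption: 0 when the key is omitted
    match: dict = field(default_factory=dict)
    set_: dict = field(default_factory=dict)  # holds the YAML "set" block


def load_rule(raw: dict) -> Rule:
    scope = raw.get("scope")
    if scope not in SUPPORTED_SCOPES:
        raise ValueError(f"rule {raw.get('name')!r}: unsupported scope {scope!r}")
    match = dict(raw.get("match") or {})
    for key in LIST_FIELDS:
        if key in match:
            match[key] = as_list(match[key])
    return Rule(
        name=raw["name"],
        scope=scope,
        priority=int(raw.get("priority", 0)),
        match=match,
        set_=dict(raw.get("set") or {}),
    )


def load_rules(raw_rules: list[dict]) -> list[Rule]:
    # Assumption: higher priority runs first; sorted() is stable, so ties keep file order.
    return sorted((load_rule(r) for r in raw_rules), key=lambda rule: -rule.priority)
```

Normalizing the string-or-list fields once at load time means the matching code only ever has to handle lists.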

Semantics

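  • match.gloss can be either a plain string or a map (for example one containing in_lexicon, as in the examples below).
  • in_list is kept for compatibility with the older format.
  • in_lexicon refers by name to a lexicon loaded at the top level of the configuration.
  • regex is compiled into a Pattern.
  • require and forbid test context paths: require lists paths that must hold, forbid lists paths that must not.
  • match.extract and set.extract run the named extractors; in set.extract, into names the context key that receives the result.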
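The rules above can be sketched as a small matcher. This is a hypothetical Python illustration, not the engine's code: it assumes a unit (token or morpheme) is a dict with a "gloss" key, the context is a nested dict addressed by dotted paths, lexicons are sets of strings keyed by name, a plain-string gloss means exact equality, regex must match the whole gloss, and require/forbid check only whether a path is present. Because this page does not say what the match-level in_list, in_lexicon and regex are tested against, the sketch applies them only inside a gloss map.

```python
from __future__ import annotations

import re
from typing import Any


def as_list(value: Any) -> list:
    """Accept either a single string or a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def resolve_path(context: dict, path: str) -> Any:
    """Follow a dotted context path such as 'ab.A.person'; None if a step is missing."""
    node: Any = context
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def text_matches(spec: Any, text: str, lexicons: dict[str, set[str]]) -> bool:
    """Check one value (here, a gloss) against a plain-string or map condition."""
    if isinstance(spec, str):
        return text == spec  # assumption: a plain string means exact equality
    if "in_list" in spec and text not in as_list(spec["in_list"]):
        return False
    # Raises KeyError if the named lexicon was never loaded.
    if "in_lexicon" in spec and text not in lexicons[spec["in_lexicon"]]:
        return False
    if "regex" in spec:
        pattern = re.compile(spec["regex"])  # the re module caches compiled patterns
        if pattern.fullmatch(text) is None:  # assumption: whole-string match
            return False
    return True


def rule_matches(match: dict, unit: dict, context: dict, lexicons: dict[str, set[str]]) -> bool:
    """All conditions must hold: the gloss test, every require path, and no forbid path."""
    if "gloss" in match and not text_matches(match["gloss"], unit.get("gloss", ""), lexicons):
        return False
    if any(resolve_path(context, p) is None for p in as_list(match.get("require"))):
        return False
    if any(resolve_path(context, p) is not None for p in as_list(match.get("forbid"))):
        return False
    return True
```

Checking the cheap gloss test before the context paths is just one possible ordering; since all conditions are combined with "and", the result does not depend on it.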

Examples

Minimal Example

- name: identify verbs from gloss lexicon
  scope: morpheme
  match:
    gloss:
      in_lexicon: spanish_verbs
  set:
    upos: VERB
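To show what scope changes, here is a hypothetical Python sketch of applying a rule shaped like the minimal example. It assumes a sentence is a list of token dicts, each with an optional "morphemes" list, and that every unit carries a "gloss"; the function names units_in_scope and apply_gloss_lexicon_rule are made up for illustration. With scope: morpheme the rule visits the morphemes inside each token; with scope: token it visits the tokens themselves.

```python
from __future__ import annotations

from typing import Iterator


def units_in_scope(tokens: list[dict], scope: str) -> Iterator[dict]:
    """Yield the units a rule looks at: whole tokens, or the morphemes inside them."""
    if scope == "token":
        yield from tokens
    elif scope == "morpheme":
        for token in tokens:
            yield from token.get("morphemes", [])
    else:
        raise ValueError(f"unsupported scope: {scope!r}")


def apply_gloss_lexicon_rule(rule: dict, tokens: list[dict], lexicons: dict[str, set[str]]) -> int:
    """Apply a rule shaped like the minimal example; return how many units were annotated."""
    lexicon = lexicons[rule["match"]["gloss"]["in_lexicon"]]
    actions = rule.get("set", {})
    annotated = 0
    for unit in units_in_scope(tokens, rule["scope"]):
        if unit.get("gloss") not in lexicon:
            continue
        if "upos" in actions:
            unit["upos"] = actions["upos"]
        if "feats" in actions:
            # assumption: feats set by a rule are merged into existing ones
            unit.setdefault("feats", {}).update(actions["feats"])
        annotated += 1
    return annotated
```

For instance, a token whose morphemes are glossed "comer" and "3SG.PST" would get upos VERB on its first morpheme only, provided "comer" is in spanish_verbs.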

Example with templates

- name: scan agreement on verb tokens
  scope: token
  match:
    gloss:
      in_lexicon: spanish_verbs
  set:
    extract:
      - type: scan_agreement
        extractor: agreement_verbs
        into: ab
    feats_template:
      Pers[subj]: "{ab.A.person}"
      Number[subj]: "{ab.A.number}"
      Pers[obj]: "{ab.B.person}"
      Number[obj]: "{ab.B.number}"

This rule appears exactly as shown in the test YAML.
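The feats_template values in the second example read from the context key named by into: ab. The Python sketch below is one hypothetical way to expand placeholders such as {ab.A.person} by following the dotted path into the context. What the agreement_verbs extractor actually stores under ab is not documented here, so the context in the usage check is invented; dropping a feature whose placeholder cannot be resolved is also an assumption.

```python
from __future__ import annotations

import re
from typing import Any, Optional

# Matches placeholders such as {ab.A.person}
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def lookup(context: dict, path: str) -> Any:
    """Follow a dotted path through nested dicts; None if a step is missing."""
    node: Any = context
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def render_template(template: str, context: dict) -> Optional[str]:
    """Fill every {dotted.path}; return None if any path is missing (assumption)."""
    missing = False

    def substitute(m: re.Match) -> str:
        nonlocal missing
        value = lookup(context, m.group(1))
        if value is None:
            missing = True
            return ""
        return str(value)

    rendered = PLACEHOLDER.sub(substitute, template)
    return None if missing else rendered


def render_feats(feats_template: dict[str, str], context: dict) -> dict[str, str]:
    """Expand every template; features that do not resolve are left out."""
    feats = {}
    for feat_name, template in feats_template.items():
        value = render_template(template, context)
        if value:  # assumption: unresolved or empty features are dropped
            feats[feat_name] = value
    return feats
```

With a context where ab.A has person and number but ab.B has only person, the sketch yields Pers[subj], Number[subj] and Pers[obj], and omits Number[obj].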