8 ANNIS framework

Work through the documentation at https://corpus-tools.org/annis/, install ANNIS on your system, and import the zipped ANNIS SES corpus that you find in the HU-box.

> folder: sketch engine Work, naming scheme of the latest zip: [datestamp]_SES_annis_tagged_corpus.zip
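
If several zips have accumulated in the folder, the latest one can be picked by its datestamp prefix. The following is a minimal Python sketch, not part of the course material. It assumes the datestamp is an 8-digit YYYYMMDD prefix (as in the date 20230904 used elsewhere in this documentation); the function name `latest_corpus_zip` is hypothetical.

```python
import re

# Assumption: the datestamp is written as YYYYMMDD, so plain string
# comparison orders the file names chronologically.
PATTERN = re.compile(r"^(\d{8})_SES_annis_tagged_corpus\.zip$")


def latest_corpus_zip(names):
    """Return the matching file name with the highest datestamp, or None."""
    dated = []
    for name in names:
        match = PATTERN.match(name)
        if match:
            dated.append((match.group(1), name))
    # max() compares the datestamp first because it is the first tuple item.
    return max(dated)[1] if dated else None
```

Called with the file names of the downloaded folder, the function returns the name of the newest corpus zip, or `None` if no file follows the naming scheme.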

8.1 SES sample procedure to create ANNIS corpus

The following only documents the process; you won't have to follow these steps. Just follow the instructions above to install ANNIS on your system and import the zipped corpus.

  • upload the files from the HU-box folder "version without header for SketchEngine upload" to SketchEngine > create new corpus
  • expert compiler settings > adapt docscheme to > sesCPT
    • with that done, you can already explore the SES corpus in the SketchEngine GUI using the built-in CQL (Corpus Query Language) commands.
  • download corpus (vertical)
    • the corpus is now a database of token, PoS, and lemma, tagged according to the GermanRF tagset¹ used by SketchEngine
  • process database in: conc-essai.R
    • splits the PoS tag (scheme: x.x.x.x.x) into separate columns defining classes of PoS tags
    • writes one .xlsx file per child into a folder
    • ANNIS preprocessing:
      • Pepper: xls > treetagger format, from the .xlsx files folder (parameter file)
      • Pepper: treetagger > annis graph format, from the treetagger files folder (parameter file)
      • zip annis graph files
  • upload annis.zip to ANNIS localhost server
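
The tag-splitting step can be illustrated with a short sketch. This is not the code of conc-essai.R (which is an R script and is not reproduced here) but a minimal Python illustration under stated assumptions: the downloaded vertical file has one token per line with tab-separated token, PoS, and lemma columns; structural markup lines start with `<`; and the tag has up to five dot-separated parts. The column names `pos1` to `pos5` and the sample tags are placeholders, not real GermanRF values.

```python
# Hypothetical column names for the five tag parts; the names actually used
# in conc-essai.R are not given in this documentation.
TAG_COLUMNS = ["pos1", "pos2", "pos3", "pos4", "pos5"]


def split_tag(tag, width=len(TAG_COLUMNS)):
    """Split a dot-separated tag into exactly `width` parts, padding with ''."""
    parts = tag.split(".")
    return parts[:width] + [""] * (width - len(parts))


def parse_vertical(text):
    """Turn vertical text (token<TAB>PoS<TAB>lemma per line) into row dicts."""
    rows = []
    for line in text.splitlines():
        # Skip empty lines and structural markup such as <doc ...> or </doc>.
        # Simplification: a token that itself starts with "<" is skipped too.
        if not line.strip() or line.startswith("<"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a complete token/PoS/lemma line
        token, tag, lemma = fields[:3]
        row = {"token": token, "tag": tag, "lemma": lemma}
        row.update(zip(TAG_COLUMNS, split_tag(tag)))
        rows.append(row)
    return rows


# Placeholder data: the tags below are invented to show the x.x.x.x.x shape.
sample = '<doc id="demo">\nwordA\tA.B.C.D.E\tlemmaA\nwordB\tX.Y\tlemmaB\n</doc>\n'
rows = parse_vertical(sample)
for row in rows:
    print(row)
```

Padding short tags with empty strings keeps every row at the same width, which makes the rows straightforward to write out as one table per child afterwards.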

8.2 ANNIS ready-to-use installation

Please find here (link follows) an ANNIS server installation with the SES corpus ready to use. (! 20230904: the link is not yet freely available; use the link shared in Moodle if you don't want to use your own local installation !)