Syntactic dependency parsing
This page provides the MSTParser models used in an experiment on dependency parsing of Croatian. Two models are available for download:
- full MSD features, MTE v5 reduced tagset, second order, non-projective (model.mte5.defnpout.tar.gz)
- only POS features, second order, non-projective (model.mte5.pos.tar.gz)
The models are based on a new, simplified dependency-syntactic formalism for Croatian, as implemented in the SETimes Dependency Treebank of Croatian (SETimes.HR Treebank). Overall parsing accuracy is 75-78% in terms of labeled attachment score (LAS), and the MSD model outperforms the POS model. Both models were trained on CoNLL-X data from the SETimes.HR Treebank, i.e., 2,500 sentences with lemmas and morphosyntactic tags as features (the CoNLL-X columns LEMMA, CPOSTAG and POSTAG were used). The data is provided under the CC-BY-SA-3.0 license.
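To make the reported metric concrete, here is a minimal sketch of how labeled attachment score can be computed from gold and predicted CoNLL-X files. It is not the evaluation script used in the experiment. The helper names (`read_conllx`, `attachment_scores`, `to_conllx`), the toy sentence, and its tags and dependency labels are all invented for illustration and are not taken from the SETimes.HR Treebank. The sketch counts every token, whereas published evaluations may follow other conventions (for example, regarding punctuation).

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str
    lemma: str
    cpostag: str
    postag: str
    head: int
    deprel: str


def read_conllx(text: str) -> List[List[Token]]:
    """Parse CoNLL-X text into sentences; blank lines separate sentences."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        # CoNLL-X columns:
        # ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        cols = line.split("\t")
        current.append(
            Token(cols[1], cols[2], cols[3], cols[4], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(
    gold: List[List[Token]], pred: List[List[Token]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens, with no tokens excluded."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = correct_heads = correct_labeled = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentence length mismatch")
        for g, p in zip(g_sent, p_sent):
            total += 1
            if g.head == p.head:
                correct_heads += 1
                # LAS requires both the head and the label to match.
                if g.deprel == p.deprel:
                    correct_labeled += 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct_heads / total, correct_labeled / total


def to_conllx(rows) -> str:
    """Join column tuples into tab-separated CoNLL-X lines."""
    return "\n".join("\t".join(r) for r in rows) + "\n"


# Toy data: tags and labels below are hypothetical placeholders.
GOLD = to_conllx([
    ("1", "Vlada", "vlada", "N", "N", "_", "3", "Sb", "_", "_"),
    ("2", "je", "biti", "V", "V", "_", "3", "Aux", "_", "_"),
    ("3", "usvojila", "usvojiti", "V", "V", "_", "0", "Pred", "_", "_"),
    ("4", "proračun", "proračun", "N", "N", "_", "3", "Obj", "_", "_"),
])
PRED = to_conllx([
    ("1", "Vlada", "vlada", "N", "N", "_", "3", "Sb", "_", "_"),
    ("2", "je", "biti", "V", "V", "_", "1", "Aux", "_", "_"),      # wrong head
    ("3", "usvojila", "usvojiti", "V", "V", "_", "0", "Pred", "_", "_"),
    ("4", "proračun", "proračun", "N", "N", "_", "3", "Sb", "_", "_"),  # wrong label
])

uas, las = attachment_scores(read_conllx(GOLD), read_conllx(PRED))
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In the toy data, one token has a wrong head and another has a correct head but a wrong label, so three of four heads are correct while only two of four tokens are correct in both head and label.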
Please cite this paper when using the models:
Agić, Ž.; Merkler, D. (2013). Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian. In Text, Speech and Dialogue. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, pp. 560–567.
For a more detailed description of the new formalism, see also this paper:
Merkler, D.; Agić, Ž.; Agić, A. (2013). Babel Treebank of Public Messages in Croatian. Procedia - Social and Behavioral Sciences, 95, pp. 490–497.