Crowdsourcing speakers of Croatian for improving basic language tools

If you speak Croatian, feel free to join our crowdsourcing effort, through which we aim to produce more annotated data and improve our existing models for basic Croatian language tools.

The dataset is compiled from a dump of the Croatian Wikipedia. Each worker task presents a token, its assumed morphosyntactic description, and its context. The tokens are those on which the HunPos and TreeTagger tools, both trained on the same dataset, disagree. Through crowdsourcing we hope to filter out the tokens that are already correctly annotated, so that they can be excluded from the final check of the tagger disagreements, which will be performed by a language professional.
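To illustrate how such tasks could be selected, here is a minimal sketch in Python. It assumes the output of both taggers has already been parsed into parallel lists of tags for the same token sequence. It does not call HunPos or TreeTagger. The function name `find_disagreements`, the `Disagreement` record, the context window size, and the example tags are all hypothetical and chosen only for illustration; this is not our actual pipeline.

```python
from typing import List, NamedTuple


class Disagreement(NamedTuple):
    """One candidate worker task: a token the two taggers label differently."""
    index: int
    token: str
    tag_a: str    # tag proposed by the first tagger
    tag_b: str    # tag proposed by the second tagger
    context: str  # neighbouring tokens shown to the worker


def find_disagreements(tokens: List[str], tags_a: List[str],
                       tags_b: List[str], window: int = 2) -> List[Disagreement]:
    """Return every token on which the two tag sequences differ."""
    if not (len(tokens) == len(tags_a) == len(tags_b)):
        raise ValueError("tokens and both tag sequences must have equal length")

    found = []
    for i, (token, a, b) in enumerate(zip(tokens, tags_a, tags_b)):
        if a != b:
            # Keep up to `window` tokens on each side as context.
            left = max(0, i - window)
            context = " ".join(tokens[left:i + window + 1])
            found.append(Disagreement(i, token, a, b, context))
    return found


# Illustrative input only: the tags below are made-up examples.
tokens = ["Zagreb", "je", "glavni", "grad", "Hrvatske", "."]
tags_a = ["Npmsn", "Vcr3s", "Agpmsny", "Ncmsn", "Npfsg", "Z"]
tags_b = ["Npmsn", "Vcr3s", "Agpmsny", "Ncmsan", "Npfsg", "Z"]

disagreements = find_disagreements(tokens, tags_a, tags_b)
for d in disagreements:
    print(d.index, d.token, d.tag_a, d.tag_b, "|", d.context)
```

Tokens on which both taggers agree never become tasks, so workers only see the cases where at least one tagger must be wrong.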

You will need a Google account to register with the tool.

The results of the crowdsourcing effort will be freely available under the CC-BY-SA license, like all our other data.