Named entity recognition
We have recently published an API with significantly improved accuracy. The API documentation is available here, and a web application for testing the API is available here.
This page provides NER models for Croatian and Slovene, trained with StanfordNER v1.2.4 and using distributional similarity features calculated on large unannotated corpora. Download the models:
- Croatian 3-class model (hr.3-class.distsim.ser.gz), measured F1 is 0.899
- Croatian 4-class model (hr.4-class.distsim.ser.gz), trained on the small subset of the available data that is annotated with all four classes, measured F1 is 0.636
- Slovene 4-class model (sl.distsim.ser.gz), measured F1 is 0.7
The models are provided under the CC-BY-SA-3.0 license, but please review the StanfordNER license before using the tool.
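The models are loaded with StanfordNER's command-line classifier, for example something like `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier hr.3-class.distsim.ser.gz -textFile input.txt`, which by default prints each token as `word/TAG`. The sketch below is a minimal, hypothetical post-processing helper (the function names are illustrative, not part of StanfordNER) that turns such a line into entity spans. It assumes plain, non-BIO tags with `O` for non-entities, so two adjacent entities of the same class would be merged; the tag names in the example are for illustration only.

```python
from typing import List, Tuple


def parse_slash_tags(line: str) -> List[Tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs.

    Splits on the LAST '/' so tokens that themselves contain '/' stay intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No tag attached: treat the token as outside any entity (assumption).
            word, tag = token, "O"
        pairs.append((word, tag))
    return pairs


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens sharing the same non-'O' tag into (text, tag) entities."""
    entities = []
    current_words: List[str] = []
    current_tag = "O"
    for word, tag in pairs:
        if tag == current_tag and tag != "O":
            # Same entity class continues: extend the current span.
            current_words.append(word)
            continue
        if current_tag != "O":
            # The previous entity span ends here.
            entities.append((" ".join(current_words), current_tag))
        current_words = [word]
        current_tag = tag
    if current_tag != "O":
        # Flush an entity that runs to the end of the line.
        entities.append((" ".join(current_words), current_tag))
    return entities


pairs = parse_slash_tags("Ana/PER Kovač/PER radi/O u/O Zagrebu/LOC ./O")
print(group_entities(pairs))
```

Splitting on the last `/` rather than the first keeps tokens such as dates or URLs that contain slashes intact.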
Please cite this paper when using the models:
Ljubešić, N.; Stupar, M.; Jurić, T.; Agić, Ž. (2013) Combining Available Datasets for Building Named Entity Recognition Models of Croatian and Slovene. In Slovenščina 2.0: empirical, applied and interdisciplinary research, in press.