NER | Natural Language Processing group

Named entity recognition

We have recently published an API with significantly improved processing accuracy. The API documentation is available here, and a web application for testing the API is available here.

We provide NER models for Croatian and Slovene, built with StanfordNER v1.2.4 and distributional similarity calculated on large unannotated corpora. Download the models:

Croatian 3-class model (hr.3-class.distsim.ser.gz); measured F1 is 0.899
Croatian 4-class model (hr.4-class.distsim.ser.gz), trained on a small subset of the available data annotated with all four classes; measured F1 is 0.636
Slovene 4-class model (sl.distsim.ser.gz); measured F1 is 0.7

The models are provided under the CC-BY-SA-3.0 license. Note that the StanfordNER tool has its own license, which you should review before using it.
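A downloaded model can be applied with the StanfordNER command-line classifier (edu.stanford.nlp.ie.crf.CRFClassifier with the -loadClassifier and -textFile options). The following is a minimal sketch of one way to do this from Python, not an official usage recipe. The helper names (build_ner_command, parse_slash_tags, group_entities), the jar location, the heap size, and the input file name are all hypothetical. It also assumes the tool's inline token/LABEL output format. The label names in the sample line (PERSON, LOCATION, O) are assumptions for illustration, and the sample line is hand-written rather than real model output.

```python
from typing import List, Tuple


def build_ner_command(model_path: str, text_path: str,
                      jar_path: str = "stanford-ner.jar",  # assumed jar location
                      max_heap: str = "1g") -> List[str]:   # assumed heap size
    """Return the argument list for running CRFClassifier on a text file."""
    return [
        "java", f"-mx{max_heap}",
        "-cp", jar_path,
        "edu.stanford.nlp.ie.crf.CRFClassifier",
        "-loadClassifier", model_path,
        "-textFile", text_path,
    ]


def parse_slash_tags(line: str) -> List[Tuple[str, str]]:
    """Split one line of inline 'token/LABEL' output into (token, label) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last slash so tokens that contain '/' stay intact.
        token, _, label = item.rpartition("/")
        pairs.append((token, label))
    return pairs


def group_entities(pairs: List[Tuple[str, str]],
                   outside: str = "O") -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens that share a non-outside label."""
    entities = []
    current_tokens: List[str] = []
    current_label = outside
    for token, label in pairs:
        # A label change closes the entity collected so far, if any.
        if label != current_label and current_tokens:
            entities.append((" ".join(current_tokens), current_label))
            current_tokens = []
        current_label = label
        if label != outside:
            current_tokens.append(token)
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Hypothetical input file name; the model file name is taken from the list above.
command = build_ner_command("hr.3-class.distsim.ser.gz", "input.txt")

# Hand-written sample in the assumed output format, not real model output.
sample = "Ivo/PERSON Josipović/PERSON je/O u/O Zagrebu/LOCATION ./O"
entities = group_entities(parse_slash_tags(sample))
```

To run the tagger itself, the command list can be passed to subprocess.run(command, capture_output=True, text=True), and each line of the captured standard output can then be fed to parse_slash_tags. This requires Java and the StanfordNER jar to be available locally.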

Please cite this paper when using the models:

Ljubešić, N.; Stupar, M.; Jurić, T.; Agić, Ž. (2013). Combining Available Datasets for Building Named Entity Recognition Models of Croatian and Slovene. In Slovenščina 2.0: empirical, applied and interdisciplinary research, in press.