Nikola Ljubešić

Positions:
Postdoctoral researcher
Department of Knowledge Technologies
Jožef Stefan Institute
Jamova cesta 39
SI-1000 Ljubljana
Slovenia

Assistant professor
Department of Information and Communication Sciences
Faculty of Humanities and Social Sciences
University of Zagreb
Ivana Lučića 3
HR-10000 Zagreb
Croatia

Active projects: ReLDI, JANES, Abu-MaTran

Contact:
E-mail: nikola dot ljubesic at ffzg dot hr
Phone: +385 1 600 23 23
Fax: +385 1 600 24 38
Office: E-315
Office hours: appointment via e-mail


Publications:

2015

Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis, Antonio Toral. Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling. Proceedings of the Tenth Workshop on Statistical Machine Translation. Lisbon, Portugal. pdf bibtex

Željko Agić and Nikola Ljubešić. Universal Dependencies for Croatian (that Work for Serbian, too). Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing. Hissar, Bulgaria. pdf bibtex

Nikola Ljubešić, Miquel Esplà-Gomis, Filip Klubička and Nives Mikelić Preradović. Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons. Proceedings of Recent Advances in Natural Language Processing. Hissar, Bulgaria. pdf bibtex

Nikola Ljubešić, Darja Fišer, Tomaž Erjavec, Jaka Čibej, Dafne Marko, Senja Pollak and Iza Škrjanec. Predicting the Level of Text Standardness in User-generated Content. Proceedings of Recent Advances in Natural Language Processing. Hissar, Bulgaria. pdf bibtex

Antonio Toral, Tommi Pirinen, Andy Way, Raphael Rubino, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Víctor Sánchez-Cartagena, Jorge Ferrández-Tordera, Mikel Forcada, Miquel Esplà-Gomis, Nikola Ljubešić, Filip Klubička, Prokopis Prokopidis and Vassilis Papavassiliou. Automatic Acquisition of Machine Translation Resources in the Abu-MaTran Project. Procesamiento del Lenguaje Natural 55 (1), 2015. pdf bibtex

Nikola Ljubešić, Kaja Dobrovoljc and Darja Fišer. *MWELex — MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora. Informatica 39 (3), 2015. pdf bibtex

Nikola Ljubešić and Mario Peronja. Predicting corpus example quality via supervised machine learning. Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. pdf bibtex

Petra Bago and Nikola Ljubešić. Using machine learning for language and structure annotation in an 18th century dictionary. Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. pdf bibtex

Tomaž Erjavec, Nikola Ljubešić and Nataša Logar. The slWaC Corpus of the Slovene Web. Informatica 39 (1), 2015. pdf bibtex

Nikola Ljubešić and Denis Kranjčić. Discriminating between Closely Related Languages on Twitter. Informatica 39 (1), 2015. pdf bibtex

2014

Maja Popović and Nikola Ljubešić. Exploring cross-language statistical machine translation for closely related South Slavic languages. Language Technology for Closely Related Languages and Language Variants (LT4CloseLang), EMNLP 2014. Doha, Qatar. pdf bibtex

Nikola Ljubešić, Kaja Dobrovoljc, Simon Krek, Marina Peršurić Antonić and Darja Fišer. hrMWELex – A MWE lexicon of Croatian extracted from a parsed gigacorpus. Language technologies: Proceedings of the 17th International Multiconference Information Society IS2014. Ljubljana, Slovenia. pdf bibtex

Tomaž Erjavec and Nikola Ljubešić. The slWaC 2.0 Corpus of the Slovene Web. Language technologies: Proceedings of the 17th International Multiconference Information Society IS2014. Ljubljana, Slovenia. pdf bibtex

Darja Fišer, Tomaž Erjavec, Ana Zwitter Vitez and Nikola Ljubešić. Janes se predstavi: metode, orodja in viri za nestandardno pisno spletno slovenščino. Language technologies: Proceedings of the 17th International Multiconference Information Society IS2014. Ljubljana. pdf bibtex

Filip Klubička and Nikola Ljubešić. Using crowdsourcing in building a morphosyntactically annotated and lemmatized silver standard corpus of Croatian. Language technologies: Proceedings of the 17th International Multiconference Information Society IS2014. Ljubljana. pdf bibtex

Nikola Ljubešić and Denis Kranjčić. Discriminating between VERY similar languages among Twitter users. Language technologies: Proceedings of the 17th International Multiconference Information Society IS2014. Ljubljana. pdf bibtex

Željko Agić and Nikola Ljubešić. The SETimes.HR Linguistically Annotated Corpus of Croatian. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik. pdf bibtex

Nikola Ljubešić, Darja Fišer and Tomaž Erjavec. TweetCaT: a tool for building Twitter corpora of smaller languages. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik. pdf bibtex

Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou and Prokopis Prokopidis. Comparing two acquisition systems for automatically building an English–Croatian parallel corpus from multilingual websites. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik. pdf bibtex

Nikola Ljubešić and Antonio Toral. caWaC – a Web Corpus of Catalan and its Application to Language Modeling and Machine Translation. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik. pdf bibtex

Raphael Rubino, Antonio Toral, Nikola Ljubešić and Gema Ramirez Sanchez. Quality Estimation for Synthetic Parallel Data Generation. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik. pdf bibtex

Nikola Ljubešić and Filip Klubička: {bs,hr,sr}WaC — Web corpora of Bosnian, Croatian and Serbian. Proceedings of the 9th Web as Corpus Workshop (WaC-9). Gothenburg, Sweden. pdf bibtex

Nikola Ljubešić, Tomaž Erjavec and Darja Fišer: Standardizing Tweets with Character-level Machine Translation. Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, Springer. pdf bibtex

2013

Marianna Apidianaki, Nikola Ljubešić and Darja Fišer. Vector Disambiguation for Translation Extraction from Comparable Corpora. Informatica 37 (2), 2013. pdf bibtex

Nikola Ljubešić, Marija Stupar, Tereza Jurić and Željko Agić. Combining Available Datasets for Building Named Entity Recognition Models of Croatian and Slovene. Slovenščina 2.0. 1, 2013. pdf bibtex

Nataša Logar and Nikola Ljubešić. Gigafida in slWaC: tematska primerjava. Slovenščina 2.0. 1, 2013. pdf bibtex

Darja Fišer and Nikola Ljubešić. Best friends or just faking it?: corpus-based extraction of Slovene-Croatian translation equivalents and false friends. Slovenščina 2.0. 1, 2013. pdf bibtex

Željko Agić, Nikola Ljubešić and Danijela Merkler. Lemmatization and Morphosyntactic Tagging of Croatian and Serbian. Proceedings of BSNLP 2013. Sofia, Bulgaria. pdf bibtex

Nikola Ljubešić and Darja Fišer. Identifying false friends between closely related languages. Proceedings of BSNLP 2013. Sofia, Bulgaria. pdf bibtex

Marianna Apidianaki, Nikola Ljubešić and Darja Fišer. Cross-lingual WSD for Translation Extraction from Comparable Corpora. Proceedings of BUCC 2013. Sofia, Bulgaria. pdf bibtex

2012

Jörg Tiedemann and Nikola Ljubešić. Efficient Discrimination Between Closely Related Languages. Proceedings of the 24th International Conference on Computational Linguistics (COLING’12). Mumbai, India. pdf bibtex

Nikola Ljubešić, Marija Stupar and Tereza Jurić: Building Named Entity Recognition Models For Croatian And Slovene. Jezikovne tehnologije, 2012, Ljubljana, Slovenia. pdf bibtex

Marianna Apidianaki, Nikola Ljubešić and Darja Fišer: Disambiguating vectors for bilingual lexicon extraction from comparable corpora. Jezikovne tehnologije, 2012, Ljubljana, Slovenia. pdf bibtex

Nikola Ljubešić, Špela Vintar and Darja Fišer: Multi-word term extraction from comparable corpora by combining contextual and constituent clues. Workshop on Building and Using Comparable Corpora (BUCC’12), Istanbul, Turkey. pdf bibtex

Darja Fišer, Nikola Ljubešić and Ozren Kubelka: Addressing polysemy in bilingual lexicon extraction from comparable corpora. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. pdf bibtex

Mārcis Pinnis, Nikola Ljubešić, Dan Ştefănescu, Inguna Skadiņa, Marko Tadić and Tatiana Gornostay: Term Extraction and Mapping Tools for Under-Resourced Languages. Terminology and Knowledge Engineering 2012, Madrid, Spain. pdf bibtex

2011

Nikola Ljubešić and Tomaž Erjavec: hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene. Text, Speech and Dialogue 2011. Lecture Notes in Computer Science, Springer. pdf bibtex

Nikola Ljubešić and Darja Fišer, Bootstrapping bilingual lexicons from comparable corpora for closely related languages. Text, Speech and Dialogue 2011. Lecture Notes in Computer Science, Springer. pdf bibtex

Darja Fišer and Nikola Ljubešić, Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages. Recent Advances in Natural Language Processing 2011. Hissar, Bulgaria. pdf bibtex

Darja Fišer, Nikola Ljubešić, Špela Vintar and Senja Pollak, Building and using comparable corpora for domain-specific bilingual lexicon extraction. 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web. Portland, USA. pdf bibtex

Nikola Ljubešić, Darja Fišer, Špela Vintar and Senja Pollak, Bilingual lexicon extraction from comparable corpora: A comparative study. First International Workshop on Lexical Resources, Ljubljana, Slovenia. pdf bibtex

2010

Nikola Ljubešić: Event detection in Newspaper Texts, Presented at the series of talks in language technology – JOTA (Jezikovnotehnološki abonma), Ljubljana, Slovenia. pdf bibtex

Nikola Ljubešić, Petra Bago, and Damir Boras, Statistical machine translation of Croatian weather forecast: How much data do we need?, Proceedings of the ITI 2010 32nd International Conference on Information Technology Interfaces pdf bibtex

Željko Agić, Nikola Ljubešić, and Marko Tadić, Towards sentiment analysis of financial texts in Croatian, Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10) pdf bibtex

Nikola Ljubešić, Tomislava Lauc and Damir Boras, Building a gold standard for event detection in Croatian, Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10) pdf bibtex

2009

Nikola Ljubešić, Pronalaženje događaja u višestrukim izvorima informacija [Event detection in parallel information sources], Ph.D. thesis, University of Zagreb pdf bibtex

2008

Nikola Ljubešić, Željko Agić, and Nikola Bakarić. Document representation methods for news event detection in Croatian, Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages pdf bibtex

Nikola Ljubešić, Tomislava Lauc, and Damir Boras. Generating a morphological lexicon of organization entity names, Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) pdf bibtex

Nikola Ljubešić, Damir Boras, Nikola Bakarić, and Jasmina Njavro. Comparing measures of semantic similarity, Proceedings of the 30th International Conference on Information Technology Interfaces pdf bibtex

2007

Nikola Bakarić, Jasmina Njavro, and Nikola Ljubešić. What makes sense? Searching for strong WSD predictors in Croatian, Digital Information and Heritage, InFuture 2007. Zagreb, Croatia. pdf bibtex

Nikola Ljubešić, Damir Boras, and Ozren Kubelka. Retrieving information in Croatian: Building a simple and efficient rule-based stemmer, Digital Information and Heritage, InFuture 2007. Zagreb, Croatia. pdf bibtex

Nikola Ljubešić, Nives Mikelić, and Damir Boras. Language identification: How to distinguish similar languages, Proceedings of the 29th International Conference on Information Technology Interfaces. Cavtat, Croatia. pdf bibtex