Monolingual corpora
- SETimes.HR corpus and dependency treebank of Croatian
- hrWaC, the Croatian web corpus
- slWaC, the Slovene web corpus
- srWaC, the Serbian web corpus
- bsWaC, the Bosnian web corpus
- CroLTeC, the CROatian Learner TExt Corpus
Multilingual corpora
- SETimes corpus, a 10-language corpus built from the setimes.com domain
- hrenWaC corpus, a Croatian–English parallel corpus built from the hrWaC web corpus data
- TED talks parallel corpus, 86,348 Croatian–English sentence pairs