The corpora of BCMS (Bosnian, Croatian, Montenegrin and Serbian) and Slovene are built with the TweetCaT tool. Collection started in June 2013 and is still under way.
Work on discriminating between the BCMS languages is ongoing, and separate corpora for each of those languages will follow.
Nikola Ljubešić, Darja Fišer and Tomaž Erjavec: TweetCaT: a tool for building Twitter corpora of smaller languages. In: LREC 2014.
|corpus||# of users||# of tweets||# of words|