The slWaC corpus contains texts extracted from crawled HTML pages in Slovene (mostly)
from the .si domain.
This corpus is an extended version of the corpus described in:
Nikola Ljubešić and Tomaž Erjavec:
hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene.
Text, Speech and Dialogue 2011. Lecture Notes in Computer Science vol. 9743, 395-402.
Springer.