TWEET

TEI Header

§file description
§title statement
§title

type = main
TWEET corpus
§title

Slovene tweets
§statement of responsibility
§name Matija Rijavec
§responsibility Provision of the messages from the sitweet.com server.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to TEI P5, inclusion in the concordancers.
§extent 367,510 tweets
§publication statement
§distributor nl.ijs.si
§availability

The corpus is available via the concordancers at nl.ijs.si.

§source description

The Tweet-sl corpus contains Slovene-language tweets posted between 2009-08-30 and 2011-02-16. The tweets were collected from the aggregator sitweet.com.

§encoding description
§project description

The corpus was compiled to enable corpus-based studies of Slovene as used on social networks.

§tagging declaration
§namespace

name = http://www.tei-c.org/ns/1.0
§tag usage

gi = corpus occurs = 1
corpus
§tag usage

gi = u occurs = 367510
utterance
§tag usage

gi = w occurs = 5021853
word
§tag usage

gi = pc occurs = 1269967
punctuation character
§tag usage

gi = c occurs = 5158460
character
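
The element counts declared above can be checked against the corpus file itself. The following is a minimal sketch, not the procedure used to produce this header: it assumes the corpus is a single XML file whose elements are in the TEI namespace given above, and the function name count_elements and the inline sample are hypothetical, introduced only for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace as stated in the tagging declaration.
TEI_NS = "http://www.tei-c.org/ns/1.0"

def count_elements(source, names=("corpus", "u", "w", "pc", "c")):
    """Stream through an XML file and count the named TEI elements.

    `source` is a filename or a binary file object. The function name and
    default element list are illustrative assumptions.
    """
    wanted = {f"{{{TEI_NS}}}{name}": name for name in names}
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = wanted.get(elem.tag)
        if name is not None:
            counts[name] += 1
        # Drop text and children already seen to limit memory use.
        elem.clear()
    return counts

# Tiny made-up sample; the real corpus file is not reproduced here.
sample = (
    f'<corpus xmlns="{TEI_NS}">'
    "<u><w>Dober</w><c> </c><w>dan</w><pc>!</pc></u>"
    "</corpus>"
)
counts = count_elements(io.BytesIO(sample.encode("utf-8")))
print(dict(counts))
```

Run on the full corpus, the resulting counts could be compared with the occurs values listed above (for example 367510 for u).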
§application information
§application

ident = ToTrTaLe
§label

Linguistic analysis: s, w, pc, c
§revision description
§change
§date 2012-12-21
§name Tomaž Erjavec
§label Minor fixes in header.
§change
§date 2012-08-13
§name Tomaž Erjavec
§label Conversion of TWEET SQL database dump to TEI P5 and automatic linguistic analysis.


Date: 2012-12-21

Copyright for the text of this edition is governed by the Creative Commons Attribution 3.0 license.