JTDH 2018 Proceedings

Proceedings of the
Conference on Language Technologies & Digital Humanities 2018


Invited lectures

Malvina Nissim:
Too good to be true: Current approaches to author profiling
[PDF] [PPT]

Martijn Kleppe:
Bringing Digital Humanities to the wider public: libraries as incubator for DH research results
[PDF] [PPT]


Papers


Mihael Arčan:
A comparison of Statistical and Neural Machine Translation for Slovene, Serbian and Croatian
[PDF] [PPT]

Vuk Batanović, Nikola Ljubešić and Tanja Samardžić:
SETimes.SR – A Reference Training Corpus of Serbian
[PDF] [PPT]

Narvika Bovcon and Aleš Vaupotič:
Artistic Visualizations and Beyond: A Study of Materializations of a Digital Database
[PDF]

Petar Božović, Tomaž Erjavec, Jörg Tiedemann, Nikola Ljubešić and Vojko Gorjanc:
Opus-MontenegrinSubs 1.0: First electronic corpus of the Montenegrin language
[PDF] [PPT]

Nina Ditmajer, Matija Ogrin and Tomaž Erjavec:
Zapis in prikaz starejših pesniških besedil ter njihovih variant v TEI
(Encoding and rendering early modern Slovenian poetry texts and their variants in TEI)
[PDF] [PPT]

Helena Dobrovoljc and Urška Vranjek Ošlak:
Zakaj ne z eno poizvedbo po različnih korpusih? (Troje korpusnih preverb pod primerjalnim drobnogledom)
(One query to search different corpora? (A comparative scrutiny of three corpora checks))
[PDF] [PPT]

Kaja Dobrovoljc:
N-gram Frequency Lists for Reference Corpora of Slovenian Language
[PDF] [PPT]

Maja Dolinar, Janez Štebe and Sonja Bezjak:
Razvoj smernic za predajo in arhiviranje kvalitativnih podatkov v Arhivu družboslovnih podatkov
(Development of the guidelines for ingest and archiving of qualitative data in the Social Science Data Archives)
[PDF] [PPT]

Gregor Donaj and Mirjam S. Maučec:
From statistical machine translation to translation with neural networks for the Slovene-English language pair
[PDF] [PPT]

Darja Fišer and Monika Kalin Golob:
A corpus analysis of tweets of Slovene corporate users
[PDF] [PPT]

Darja Fišer, Jakob Lenardič and Tomaž Erjavec:
Citiranje jezikoslovnih podatkov v slovenskih znanstvenih objavah: stanje in priporočila
(Linguistic data citation in Slovene scientific publications: analysis and recommendations)
[PDF] [PPT]

Polona Gantar, Špela Arhar Holdt, Jaka Čibej, Taja Kuzman and Teja Kavčič:
Glagolske večbesedne enote v učnem korpusu ssj500k 2.1
(Verbal multi-word expressions in the Slovene training corpus ssj500k 2.1)
[PDF] [PPT]

Polona Gantar, Kristina Štrkalj Despot, Simon Krek and Nikola Ljubešić:
Towards Semantic Role Labeling in Slovene and Croatian
[PDF] [PPT]

Peter Holozan:
Zbirka primerov rabe vejice Vejica 1.3
(Corpus of comma usage Vejica 1.3)
[PDF] [PPT]

Lana Hudeček and Milica Mihaljević:
Croatian web dictionary Mrežnik: One year later – What is different?
[PDF] [PPT]

Maria Jose Bocorny Finatto, Paulo Quaresma and Maria Filomena Gonçalves:
Portuguese Corpora of the 18th century: old Medicine texts for teaching and research activities
[PDF] [PPT]

Alenka Kavčič, Ivan Lovrić and Vera Smole:
Interaktivna karta slovenskih narečnih besedil
(Interactive map of Slovenian dialect texts)
[PDF] [PPT]

Aleksander Ključevšek, Simon Krek and Marko Robnik-Šikonja:
Efficient calculation of frequency statistics for Slovene language corpora
[PDF] [PPT]

Iztok Kosem, Simon Krek, Polona Gantar, Špela Arhar Holdt, Jaka Čibej and Cyprian Laskowski:
Kolokacijski slovar sodobne slovenščine
(Collocation dictionary of modern Slovene)
[PDF] [PPT]

Aniko Kovač and Maja Marković:
A Rule-Based Syllabifier for Serbian
[PDF] [PPT]

Milan van Lange and Ralf Futselaar:
Debating Evil: Using Word Embeddings to Analyze Parliamentary Debates on War Criminals in The Netherlands
[PDF] [PPT]

Nikola Ljubešić, Željko Agić, Filip Klubička, Vuk Batanović and Tomaž Erjavec:
hr500k – A Reference Training Corpus of Croatian
[PDF] [PPT]

Nikola Ljubešić, Darja Fišer, Tomaž Erjavec and Filip Dobranić:
The Parlameter corpus of contemporary Slovene parliamentary proceedings
[PDF]

Nikola Ljubešić, Tomaž Erjavec and Darja Fišer:
KAS-term and KAS-biterm: Datasets and baselines for monolingual and bilingual terminology extraction from academic writing
[PDF]

Nataša Logar and Tomaž Erjavec:
Strokovno-znanstvena slovenščina: besednovrstne in oblikoskladenjske značilnosti
(Academic Slovene: part-of-speech and morphosyntactic characteristics)
[PDF] [PPT]

Tatjana Marvin, Jure Derganc, Samo Beguš and Saba Battelino:
Word Selection in the Slovenian Sentence Matrix Test for Speech Audiometry
[PDF] [PPT]

Eneja Osrajnik, Darja Fišer and Vojko Gorjanc:
Korpusna analiza nestandardne vejice po uvajalnih prislovnih zvezah v slovenskih formalnih in neformalnih besedilih
(Corpus analysis of non-standard comma usage after introductory adverbial phrases in Slovene formal and informal texts)
[PDF] [PPT]

Andrej Pančur:
Trajnost digitalnih izdaj: Uporaba statičnih spletnih strani na portalu Zgodovina Slovenije – SIstory
(Sustainability of digital editions: Use of static web pages at the History of Slovenia – SIstory portal)
[PDF] [PPT]

Andrej Pančur, Alenka Pirman and Maruša Kocjančič:
Spregledana kulturna dediščina in uporaba digitalne raziskovalne infrastrukture za humanistiko v raziskavi Odlivanje smrti
(Overlooked cultural heritage and the use of digital research infrastructure for humanities in the research action Casting of Death)
[PDF] [PPT]

Miha Pavlovič and Rena Ito:
Analiza slovničnih napak v korpusu spisov učencev japonščine na osnovni ravni
(Analysis of grammatical errors in a Japanese beginner learner corpus)
[PDF] [PPT]

Benedikt Perak and Filip Rodik:
Building a corpus of the Croatian parliamentary debates using ReLDI API for tokenization, lemmatization, syntactic parsing and Neo4j graph database for creation of social ontology model, text classification and extraction of semantic information
[PDF]

Dan Podjed and Ajda Pretnar:
Self-Promotion on Instagram: A Case of President’s Profile
[PDF] [PPT]

Ajda Pretnar and Dan Podjed:
Data Mining Workspace Sensors: A New Approach to Anthropology
[PDF] [PPT]

Ivanka Rajh and Siniša Runjaić:
Crowdsourcing terminology: harnessing the potential of translator’s glossaries
[PDF] [PPT]

Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt and Marko Robnik-Šikonja:
Evaluation of Statistical Readability Measures on Slovene texts
[PDF] [PPT]

Tobias Weber and Jeremy Bradley:
Exploring Finno-Ugric linguistics through solving IT problems
[PDF] [PPT]


Abstracts


Katja Mihurko Poniž, Amelia Sanz, Marie Nedregotten Sørbø, Suzan van Dijk, Viola Parente-Čapková, Narvika Bovcon and Aleš Vaupotič:
Teaching women writers with NEWW Virtual Research Environment
[PDF]

Damjan Popič and Darja Fišer:
Odnosi do jezika v slovenski, hrvaški in srbski računalniško posredovani komunikaciji
(Attitudes towards language in Slovene, Croatian, and Serbian computer-mediated communication)
[PDF]

Sara Ries:
Online database in Research of Correspondence of Franjo Ksaver Kuhač (1834-1911)
[PDF]

Christof Schöch, Maciej Eder, Carolin Odebrecht, Mike Kestemont, Antonija Primorac, Justin Tonra, Katja Mihurko Poniž and Catherine Kanellopoulou:
Distant Reading for European Literary History. A COST Action
[PDF]

Darinka Verdonik:
Corpus and database GOS Videolectures
[PDF]


Student Papers


Urška Bratoš:
Gradnja korpusa tvitov slovenskih politikov Janes-TwePo
(Compiling the Janes-TwePo corpus of tweets of Slovene politicians)
[PDF] [PPT]

Isolde Van Dorst:
You, thou and thee: A statistical analysis of Shakespeare’s use of pronominal address terms
[PDF] [PPT]

Klara Eva Kukovičič:
Uporabnost luščilnikov terminologije Sketch Engine in CollTerm z vidika (študenta) prevajalca
(Usability of the Sketch Engine and CollTerm term extractors from the perspective of a (student) translator)
[PDF] [PPT]


Student Abstracts


Gabi Rolih:
K-means Clustering for POS Tagger Improvement
[PDF] [PPT]