Zbornik
JEZIKOVNE TEHNOLOGIJE ZA SLOVENSKI JEZIK

6.-7. oktober 1998
Cankarjev dom
Ljubljana

Konferenčni prispevki so izšli v:

T. Erjavec, J. Gros (ur.):
JEZIKOVNE TEHNOLOGIJE ZA SLOVENSKI JEZIK: zbornik konference
Institut Jožef Stefan, Ljubljana
ISBN 961-6303-00-7, 133 str.

Tiskani zbornik je žal pošel, zato pa so članki dostopni preko mreže:

Na tej strani najdete kazalo zbornika, povzetke in seznam avtorjev. Povzetki so v slovenskem in angleškem jeziku, vsebujejo pa tudi naslove avtorjev. Kazalci v oglatih oklepajih ustrezajo številkam strani v zborniku. Vsak povzetek ima skozi ime povezavo na prispevek iz zbornika; članki so zaenkrat dostopni v formatu PDF (za branje/tiskanje potrebujete Acrobat Reader).


Language Technologies for the Slovene Language
On-line Proceedings

6-7 October 1998
Cankarjev dom
Ljubljana

The conference contributions were published in:

T. Erjavec, J. Gros (eds.):
JEZIKOVNE TEHNOLOGIJE ZA SLOVENSKI JEZIK: zbornik konference
Language Technologies for the Slovene Language: proceedings of the conference
Jožef Stefan Institute, Ljubljana
ISBN 961-6303-00-7, 133 pp.

The printed proceedings are unfortunately out of print; the papers, however, are available on the WWW:

This page contains the table of contents, the abstracts and the list of authors. The abstracts are in Slovene and English and include the addresses of the authors. Pointers in square brackets correspond to page numbers in the published proceedings. The title of each abstract links to the corresponding paper from the proceedings; the articles are currently available in PDF format, so you need Acrobat Reader to view and print them.

Kazalo / Table of Contents

Uvodni del: naslovnica, predgovori, odbori, kazalo, seznam avtorjev
Front matter: title page, prefaces, committees, table of contents, author index
  1. Vabljena predavanja / Invited Lectures
  2. Uvodna predavanja / Introductory Lectures
  3. Iskanje informacij / Information Retrieval
  4. Govor / Speech
  5. Prevajanje / Translation
  6. Korpusi in standardizacija / Corpora and Standardisation

Knjižica povzetkov / Booklet of Abstracts


Language Resources, TELRI, and Multilingual Lexical Semantics

Wolfgang Teubert

Institut für deutsche Sprache,
Postfach 10 16 21, D-68016 Mannheim
Tel/Fax +0049-621-1581-415
wolfgang.teubert@ids-mannheim.de

Abstract

In the emerging European civil society, all citizens must be able to communicate freely and easily with each other and with the public institutions serving them, without being restricted by language barriers. There must be more and better language instruction, and there must be a multilingual language technology helping everyone to retrieve information regardless of the source language, to translate and to write texts in foreign languages. Most classical machine translation systems and other translation aids use a concept-based approach developed in AI research on a cognitive linguistics foundation. This approach works for controlled, that is, quasi-formal languages, but not for general language. There we need an abundance of language data extracted from monolingual and multilingual resources and processed for translation platforms. Instead of language-independent concepts, these platforms work with translation units and their equivalents, as they can be found in parallel corpora. TELRI brings together focal research centers in Europe and promotes multilingual research, creation of such resources and of corpus-derived linguistic knowledge.

Language Technology and Multilinguality - the European Dimension

Poul Andersen

European Commission
Address: EUFO 1197 -- rue Alcide de Gasperi -- L-2920 Luxembourg
Tel. +352-4301-34324, fax +352-4301-34655
poul.andersen@lux.dg13.cec.be

Abstract

The European Commission supports a multilingual Europe, where each citizen can use his native language. With the ongoing integration process in Europe, and the advance of the Information Society, we must create the necessary tools to facilitate communication across language barriers, and make these tools easily available to professionals in the translation sector and information-related industries, as well as to ordinary people.

The Commission pursues these goals, both through promotional action in the MLIS programme, and through support for Research and Technological Development in the Framework Programmes.

The Commission is itself an important user of Language Technology, with more than 1300 translators working in the 11 official languages of the European Union -- and the expected extension with 11 new member states (10 CEECs + Cyprus) will increase the number of official languages up to 21.

DISCLAIMER: This article is written as an overview conference paper, to provide general information about Commission-supported activities in an informal way, and does not constitute an official policy statement.


Natural Language Processing at the Xerox Research Centre Europe

Jean-Pierre Chanod

Xerox Research Centre Europe
6, chemin de Maupertuis, 38240 Meylan, France
Tel.: +33 (0)4 76 61 50 75
chanod@xrce.xerox.com

Abstract

The Xerox Research Centre Europe (XRCE, see http://www.xrce.xerox.com for more information) pursues a vision of document technology where language, physical location and medium - electronic, paper or other - impose no barrier to effective use.

Our primary activity is research. Our second activity is a Program of Advanced Technology Development, to create new document services based on our own research and that of the wider Xerox community. We also participate actively in exchange programs with European partners.

Language issues cover important aspects in the production and use of documents. As such, language is a central theme of our research activities. More particularly, our Centre focuses on multilingual aspects of Natural Language Processing (NLP). Our current developments cover more than fifteen European languages and some non-European languages such as Arabic, Turkish or Chinese. Some of these developments are conducted through direct collaboration with academic institutions all over Europe.

The present article is an introduction to our basic linguistic components and to some of their multilingual applications.



On the Vowel System in Present-Day Standard Slovene

Tatjana Srebot Rejec

Department of Comparative and General Linguistics
Faculty of Arts of the University in Ljubljana
Aškerčeva 2, 1000 Ljubljana, Slovenia
tel: +386 61 1769200; fax: +386 61 1259337

POVZETEK

Današnji slovenski samoglasniški sestav sestoji iz osmih fonemov, ki so vsi lahko naglašeni, medtem ko jih je v nenaglašenem položaju lahko le šest, ker sta široka e in o lahko samo naglašena. V nenaglašenem položaju je razlika med sprednjima srednjima samoglasnikoma e in o in zadnjima srednjima e in o odpravljena ter izgovarjamo v tem položaju nevtralni /E/ in /O/. Ni več minimalnih parov, kjer bi bila dolžina besedno razlikovalna. Tako imamo v današnji standardni slovenščini samo dve skupini samoglasnikov, naglašene in nenaglašene, ker dolžina ni več fonološko relevantna.

Abstract

The present-day vowel system of Slovene consists of eight vowel phonemes, all of which can be stressed, while only six of them can also appear in unstressed position, because open e and o can only be stressed. In unstressed position the difference between the close and open mid vowels e and o is neutralised, with the result that a neutral /E/ and /O/ are used. There are no longer any minimal pairs distinguished by length. Standard Slovene thus has only two groups of vowels, stressed and unstressed, and length is no longer distinctive.


VERBMOBIL: A Speech-to-Speech Translation System

Damir Ćavar, Wolfgang Menzel

Universität Hamburg, FB Informatik, AB NatS
Vogt-Kölln-Str. 30, D-22527 Hamburg
tel: +49-40-5494 2522, fax: +49-40-5494 2515
cavar@informatik.uni-hamburg.de

Abstract

Verbmobil is a speech-to-speech translation project that involves about 29 partners in 3 countries and combines research in continuous speech recognition and machine translation. It started in 1993 and entered its second phase at the beginning of 1997. This phase will last until the end of the year 2000. The main goal of the project is to develop a translation system for spontaneous speech that allows people who speak different languages (i.e. German, English or Japanese) to arrange appointments, make hotel reservations, or get travel information. This paper describes the basic goals of the Verbmobil project, the architecture of the system, and the evaluation efforts made.


AMEBIS IN JEZIKOVNE TEHNOLOGIJE

Miro Romih

Amebis d. o. o.
p. p. 69, 1241 Kamnik
miro.romih@amebis.si

POVZETEK

Podjetje Amebis d.o.o. se z jezikovnimi tehnologijami ukvarja že od leta 1990, pri čemer še posebno pozornost posveča slovenskemu jeziku. Članek opisuje aktivnosti podjetja Amebis d.o.o. na tem področju znanosti. Posebej so izpostavljeni črkovalnik, delilnik, besedna analiza, tezaver, korpus, elektronski slovarji, strojno podprto prevajanje in sinteza govora.

Abstract

The company Amebis d.o.o. has been involved in language technologies since the beginning of the 1990s, concentrating in particular on the Slovene language. The paper introduces the activities of Amebis d.o.o., with special attention given to spell-checking, hyphenation, syntax checking, thesaurus, corpus, electronic dictionaries, machine translation and speech synthesis.


GNUsl: PROSTO PROGRAMJE IN SLOVENŠČINA

Aleš Košir,1 Primož Peterlin,2 Tomaž Erjavec3

1 Hermes SoftLab,
Litijska 51, 1000 Ljubljana,
ales.kosir@hermes.si

2 Inštitut za biofiziko MF, Univerza v Ljubljani
Lipičeva 2, 1000 Ljubljana
primoz.peterlin@biofiz.mf.uni-lj.si

3 Odsek za inteligentne sisteme, Institut Jožef Stefan
Jamova 39, 1000 Ljubljana
tomaz.erjavec@ijs.si

POVZETEK

GNUsl na http://nl.ijs.si/GNUsl/ je ime za zbirko po internetu prosto dostopnih virov in programov, ki izkoriščajo znanje o slovenskem jeziku, državno uzakonjeni standardizaciji in kulturnih značilnostih, s čimer omogočajo rabo ali olajšujejo uporabo računalnikov v slovenskem kulturnem prostoru.

Zbirko programov spremljajo navodila, ki vsebujejo nasvete s področja internacionalizacije in lokalizacije. Nasveti s primeri se praviloma nanašajo na prosto programje in na bolj razširjene komercialno dostopne izdelke, pri čemer so rešitve sistematično ponujene predvsem za različne operacijske sisteme in za bolj razširjene urejevalnike besedil. Poleg tega navodila vsebujejo tudi informacije, specifične za slovenski kulturni prostor, na primer o pravilnem pisanju številk z decimalno vejico in o tem, kako so predpisani prazniki v Republiki Sloveniji. Navodila so pripravljena tako v obliki dokumentov, dostopnih na internetu, in v obliki elektronskih brošur, primernih za natis.

V ponudbi GNUsl je tudi Trubar, internetni servis za preverjanje pravilnosti črkovanja v mnogih jezikih, med katerimi je tudi slovenski. Uporabljeni črkovalnik GNU ispell je v duhu GNUsl prosto dostopen z izvorno kodo in s seznami uporabljenih besednih oblik vred.

Predstavljeno gradivo na GNUsl skuša sistematično zajeti prosto dostopno znanje s področja jezikovnih tehnologij in ga predstaviti na način, ki bo vsakomur razumljiv. Navodila so opremljena z značilnimi primeri.

Abstract

GNUsl, hosted on http://nl.ijs.si/GNUsl/, is the name for a collection of free documentation and software that uses information about the Slovene language, Slovene standards, and cultural specifics and helps people to use computers in the Slovene cultural environment.

The software collection is accompanied by guides containing instructions on internationalization and localization. The instructions focus on free software and on popular commercial software. Solutions are systematically provided for various operating systems and for widely used text editors. The guidelines include notes that are specific to the Slovene cultural environment, such as the rule of using a comma as the decimal separator and the legislation on holidays in Slovenia. The instructions are available via the Internet as hyperlinked documents or in ready-to-print form.

GNUsl includes Trubar, an Internet spelling checker supporting many languages, including Slovene. Trubar is built on top of GNU ispell, a free spelling checker for which both program sources and dictionaries are freely available.

Materials presented on GNUsl compile most freely available knowledge from the field of language technologies for the Slovene language. They are presented in the form of understandable instructions, which are illustrated with examples.



ISKALNIK ZA SLOVENSKE IN ANGLEŠKE DOKUMENTE NA SVETOVNEM SPLETU

Jure Dimec1, Sašo Džeroski2, Ljupčo Todorovski1, Dimitar Hristovski1

1 Inštitut za biomedicinsko informatiko Medicinske fakultete,
Vrazov trg 2, 1105 Ljubljana,
tel. 31 32 33, fax. 311 540,
{jure.dimec| ljupco.todorovski}@mf.uni-lj.si, hristovski@ibmi.mf.uni-lj.si

2 Inštitut Jožef Stefan,
Jamova 39, 1000 Ljubljana
saso.dzeroski@ijs.si

POVZETEK

Predstavljamo razvoj informacijskih orodij za organiziranje in iskanje slovenskih in angleških medicinskih dokumentov, dostopnih na Svetovnem spletu. Orodja, zaenkrat še v testni fazi, omogočajo avtomatsko opisovanje vsebine dokumentov, iskanje z iskalnimi zahtevami v naravnem jeziku in rangiranje zadetkov po izračunani relevantnosti. Iskalnik se zaveda stanja poizvedbe, zato lahko iskalec z iskanjem s povratno zanko postopno izboljšuje kvaliteto iskanja.

Abstract

The development of information tools for organizing and searching Slovene and English medical documents available on the World Wide Web is presented. The tools, currently in the testing phase, provide automatic subject description of documents, searching with natural language queries and ranking of search hits according to their computed relevance. The search engine is stateful, allowing the searcher to use relevance feedback to improve search quality incrementally.
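The abstract does not say how ranking or relevance feedback is implemented. As a purely illustrative sketch, the following Python code shows one common way to combine them: TF-IDF vectors, cosine ranking, and a simplified Rocchio-style query update held in a session object. The class `SearchSession`, the sample `DOCS`, and the weight `beta` are all hypothetical names and values chosen for this example, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical, already tokenised sample documents (not from the paper).
DOCS = [
    ["heart", "disease", "treatment"],
    ["heart", "surgery", "recovery"],
    ["skin", "disease", "cream"],
    ["surgery", "recovery", "exercise"],
]

def tf_idf_vectors(docs):
    """Return (sparse TF-IDF vectors, idf table) for tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SearchSession:
    """Keeps the current query vector so feedback can refine it step by step."""

    def __init__(self, docs):
        self.doc_vecs, self.idf = tf_idf_vectors(docs)
        self.query = {}

    def search(self, text):
        """Start a new query from free text and return document ids by rank."""
        counts = Counter(text.lower().split())
        self.query = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        return self._rank()

    def feedback(self, relevant_ids, beta=0.75):
        """Move the query towards documents the searcher marked as relevant."""
        for i in relevant_ids:
            for t, w in self.doc_vecs[i].items():
                self.query[t] = self.query.get(t, 0.0) + beta * w / len(relevant_ids)
        return self._rank()

    def _rank(self):
        scored = [(cosine(self.query, d), i) for i, d in enumerate(self.doc_vecs)]
        # Highest score first; ties are broken by document id.
        return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

In this toy setting, a search for "heart" followed by marking document 1 as relevant pulls the related document 3 (surgery, recovery) ahead of the unrelated document 2, which is the kind of incremental improvement the abstract describes.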


PRIPOMOČKI ZA ISKANJE PO POLNEM BESEDILU V SLOVENŠČINI

spec. Anton Tomažič, dipl. pravnik

IUS SOFTWARE, d.o.o. Ljubljana
Brnčičeva 31
1231 Ljubljana, SLO
telefon, fax: 373 903
http://www.ius-software.si/slo/podatki/tone.htm
tone.tomazic@ius-software.si

POVZETEK

Sodobne informacijske rešitve upravljanja z znanjem (predvsem iskanja po polnem besedilu), uporabe naravnega jezika in prepoznavanja glasu zahtevajo tudi uporabo zahtevnih jezikovnih modulov (virov), katerih pa za slovenski jezik še ni. Ker gre tako za splošen, kot za komercialen interes, bi veljalo združiti sredstva in znanje za izdelavo splošnih in vsem dostopnih modulov (tezavri besed, slovarji, seznami izrazov, fraz, katalogi znanja, itd.). Kot najbolj primerna oblika sodelovanja med državo in gospodarskimi družbami se kaže poseben konzorcij, katerega bi sklicalo Ministrstvo za raziskovanje in razvoj.

Abstract

The latest information solutions for knowledge management (mostly full-text search and retrieval), natural language use and voice recognition require sophisticated language modules (resources) that do not yet exist for the Slovenian language. Because the interest is both general and commercial, it would be worth pooling material and human resources to develop general, publicly available modules (thesauri, glossaries, lists of expressions, phrases, knowledge catalogues, etc.). The most appropriate form of such cooperation between the administration and commercial companies seems to be a dedicated consortium, initiated by the Ministry of Science and Technology.



AKUSTIČNA SPEKTRALNA FFT ANALIZA SAMOGLASNIŠKEGA SISTEMA SLOVENSKEGA JEZIKA

Prof.def. Martina Ozbič,
asistent stažist, diplomirani defektolog za telesno, duševno, gibalno motene in za govorno in slušno motene

Oddelek za defektologijo, Pedagoška fakulteta,
Kardeljeva ploščad 16, 1000 Ljubljana
tel.: 061/1892200
martina.ozbic@uni-lj.si, maozbic@tin.it

POVZETEK

Članek opisuje formante slovenskih samoglasniških fonemov naglašenih i-ja, ozkega in širokega e-ja, a-ja, ozkega in širokega o-ja in u-ja, polglasnika in nenaglašenih i-ja, e-ja, a-ja, o-ja in u-ja. Na osnovi FFT analize s spektralnim analizatorjem Bruel&Kjaer 2148 je avtorica analizirala izgovor vokalov enajstih oseb ženskega spola na osnovi prebrane predložene liste besed: poleg določitve višine posameznih formantov prihaja do izraza pomen akustične in ne grafične klasifikacije vokalov.

Abstract

This paper presents formant values, obtained with the Bruel&Kjaer 2148 FFT analyzer, for the Slovene vowels: stressed i, close stressed e, open stressed e, stressed a, open stressed o, close stressed o, stressed u, the schwa, and unstressed i, e, a, o and u, as pronounced by 11 Slovene female speakers. The paper stresses the importance of an acoustic-phonetic rather than a graphic classification of vowels.


SLOVENSKI GOVOR NA INTERNETU

Tomaž Šef, Aleš Dobnikar, Matjaž Gams, Marko Grobelnik

Odsek za inteligentne sisteme
Institut Jožef Stefan
Jamova 39, 1000 Ljubljana, Slovenija
Tel: +386 61 1773419, fax: +386 61 1251038
tomaz.sef@ijs.si

POVZETEK

Predstavljamo sintetizator slovenskega govora, ki je sposoben samodejnega pretvarjanja poljubnih slovenskih besedil v govor. Sistem temelji na združevanju osnovnih govornih enot s pomočjo algoritma TD-PSOLA, ki smo ga dopolnili z linearno interpolacijo s spremenljivim številom interpoliranih period. Zasnovan je modularno, kar omogoča enostavno popravljanje in spreminjanje posameznih delov sistema. Sintetizator smo uporabili v zaposlovalnem agentu EMA in sicer za govorno posredovanje obvestil o prostih delovnih mestih na internetu.

Abstract

This paper presents a text-to-speech (TTS) system capable of synthesising continuous Slovenian speech. The system is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique, improved with linear interpolation over a variable number of interpolated periods. Input text is processed by a series of independent modules, which makes it easy to improve individual parts of the system. Our system is used in the employment agent EMA, which provides employment information through the Internet. It is the most frequently visited and used intelligent system in Slovenia.
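The abstract gives no details of the interpolation step, so the following Python sketch only illustrates the general idea of linear interpolation with a variable number of periods: given the last pitch period of one unit and the first pitch period of the next, it generates `n` intermediate periods that blend gradually from one to the other. The function name `interpolate_periods` and the assumption that both periods have already been brought to the same length are hypothetical; this is not the authors' implementation and it does not reproduce TD-PSOLA itself.

```python
def interpolate_periods(left, right, n):
    """Return n periods that move linearly from `left` towards `right`.

    `left` and `right` are pitch periods given as equal-length lists of
    samples (an assumption of this sketch). The end periods themselves
    are not included in the result.
    """
    if len(left) != len(right):
        raise ValueError("periods must have the same number of samples")
    if n < 0:
        raise ValueError("n must be non-negative")
    periods = []
    for k in range(1, n + 1):
        w = k / (n + 1)  # weight of the right-hand period, 0 < w < 1
        periods.append([(1 - w) * a + w * b for a, b in zip(left, right)])
    return periods
```

A larger `n` spreads the transition over more periods, which could be useful where the two units differ more strongly at the joint; how the count is chosen in the actual system is not stated in the abstract.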


GOVOREČI RAČUNALNIK

Jerneja Gros, France Mihelič, Nikola Pavešić

Fakulteta za elektrotehniko
Tržaška cesta 25
tel. (061) 1768 316, fax (061) 1264 900
nejka@fe.uni-lj.si

POVZETEK

V članku opisujemo različne postopke in meritve, ki smo jih udejanili oz. izvedli pri razvoju sintetizatorja govora za slovenski jezik. Sintetizator govora smo uporabili za podajanje odgovorov v samodejnem odzivniku za poizvedovanje o letalskih informacijah. Sintetizator govora lahko pretvori poljubno slovensko besedilo v razumljiv računalniški govor. Sprva se vhodno besedilo pretvori v zaporedje fonetičnih simbolov. Nato za vsak glas napovemo njegovo trajanje, za zveneči glas pa še višino osnovnega tona, na kateri naj bo glas izgovorjen. Sledi preoblikovanje in povezovanje kratkih, vnaprej posnetih delov govornega signala, pri čemer se upoštevata želeno trajanje in višina (zvenečih) glasov.

Abstract

A text-to-speech system, capable of synthesising continuous Slovenian speech from an arbitrary input text is presented. The input text is transformed into its spoken equivalent by a series of modules, which we describe in detail. Further, a special approach to prosody modelling is presented. F0 modelling is based primarily on predicting the appropriate tonemic accent. Phone durations are predicted by a two-level approach, taking into account how acceleration or slowing down apply to the durations of individual phones. The TTS system is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique.


UPORABA GOVORNE TEHNOLOGIJE PRI AVTOMATIZACIJI DALJINSKIH TELEFONSKIH STORITEV

Zdravko Kačič, Bogomir Horvat, Bojan Imperl, Andrej Miksič, Janez Kaiser, Matej Rojc, Mirjam Sepešy-Maučec

Fakulteta za elektrotehniko, računalništvo in informatiko
Smetanova 17, 2000 Maribor
Tel.: 062 220 7 220, fax: 062 211 178
kacic@uni-mb.si

POVZETEK

V članku podajamo pregled možnosti uporabe govorne tehnologije pri avtomatizaciji daljinskih telefonskih storitev. Z razvojem ISDN in mobilne telefonije se je močno razmahnilo trženje daljinskih telefonskih storitev. Uvedba govorne tehnologije omogoča pocenitev obstoječih storitev in zaradi majhnih stroškov delovanja tudi razvoj množice novih. Na koncu članka predstavljamo eksperimentalna sistema HOBIS in VEDAMA, demonstracijski sistem O-tel in aplikacijo InfoDesk. Vsi so bili razviti na Fakulteti za elektrotehniko, računalništvo in informatiko v Mariboru, Laboratoriju za digitalno procesiranje signalov.

Abstract

This paper gives an overview of the possibilities for developing automatic teleservices. With the development of ISDN and mobile telephone networks, a new market for automatic teleservices is growing very fast. The introduction of speech technology can contribute to even faster growth of the market, largely by improving existing teleservices and by introducing new ones. Two experimental systems, HOBIS and VEDAMA, the demonstration system O-tel and the application InfoDesk are then described. They were all developed at the Faculty of Electrical Engineering and Computer Science Maribor, Laboratory for Digital Signal Processing.


GOVORNA POIZVEDOVANJA V ČEŠČINI, NEMŠČINI, SLOVAŠČINI IN SLOVENŠČINI

France Mihelič, Ivo Ipšić, Jerneja Gros, Karmen Pepelnjak, Simon Dobrišek, Nikola Pavešić

Laboratorij za umetno zaznavanje
Fakulteta za elektrotehniko
Tržaška 25, 1000 Ljubljana
Tel. + 386 61 1768 313, fax + 386 61 1264 631
mihelicf@fe.uni-lj.si

POVZETEK

V članku predstavljamo mednarodni projekt SQEL (Spoken Queries in European Languages) in posebej opisujemo dosežke raziskovalne skupine na Fakulteti za elektrotehniko v okviru tega projekta. To so: posebne podatkovne zbirke slovenskega govora in besedil, udejanjenje razpoznavalnika tekočega slovenskega govora, pomenska analiza govornih sporočil in sistem za samodejno tvorjenje slovenskega govora.

Abstract

In the paper we present the SQEL (Spoken Queries in European Languages) project and outline the work done by the Speech Recognition Group at the Faculty of Electrical Engineering, especially on Slovenian speech corpora, realisation of a Slovenian continuous speech recogniser, semantic analysis of spoken messages and Slovenian text-to-speech synthesis.


ZUNAJJEZIKOVNE OKOLIŠČINE NEIDEALNEGA GOVORA

Primož Vitez

Filozofska fakulteta
Aškerčeva 2, 1000 Ljubljana
tel.: 00386 61 1769 333, fax: 00386 61 1259 337
primoz.vitez@uni-lj.si

POVZETEK

Razkrivanje pravil, ki uravnavajo notranjo organiziranost govorne verige in medglasovnih prehodov, je ključni raziskovalni cilj sodobne eksperimentalne fonetike, ki se ukvarja z univerzalnim vidikom členjenja kot bistvenega govornega problema. V smislu ponazoritve prožnih postopkov pri zaznavi neidealno tvorjenih glasov in njihovih sklopov bomo spregovorili o akustični stabilnosti glasovnih jeder v govorni verigi. Za natančnejše razumevanje stvarnosti v človeškem govoru je namreč bistveno vprašanje, kakšna je narava te stabilnosti.

Abstract

The analysis of the rules governing the inherent organisation of the speech chain and of sound transitions is one of the key aims of experimental phonetics when it deals with the universal aspect of segmentation as an essential speech problem. The elasticity of procedures in the perception of non-ideal speech production prompts a reflection on the acoustic stability of sound nuclei. To understand the reality of human speech it is essential to try to reveal the nature of this stability.



RAČUNALNIKI V PREVAJANJU: KAKO RAZVRSTITI PREDNOSTI?

Jaro Lajovic

Topniška 45, 1000 Ljubljana
Tel/faks: 061-1375-284
jaro.lajovic@mf.uni-lj.si

POVZETEK

Povečevanje informacijskega pretoka povečuje potrebo po prevajanju in računalniških orodjih zanj. Kljub medsebojni integraciji so ta orodja po zasnovi zelo različna, zato je njihovo zgradbo včasih smiselno obravnavati kot dva modula: programskega in jezikovnega. Pri razvrščanju razvojnih prioritet velja upoštevati, da so programski moduli sorazmerno jezikovno neodvisni, da je njihov razvoj dolgotrajen in da je bilo vanje vloženega že veliko dela drugod. Po drugi strani so jezikovni viri specifični in hkrati pomemben temelj za razvoj prevajalskih programov. Čeprav moramo slednjim (tudi razvojno) posvečati pozornost, je na področju jezikovnih tehnologij dolgoročno treba dati poudarek ustvarjanju jezikovnih zbirk: oblikovanju slovenskega nacionalnega korpusa, dvo- oz. večjezičnih korpusov s slovenščino in sorodnih baz (npr. splošnih in specialnih slovarjev).

Abstract

Increased information flow increases needs for translation and, consequently, for the computerised translation tools (e. g. translation memories, machine translation programs). In spite of their integration, the concepts of these tools differ widely among different groups. All can, however, be regarded as consisting of a program module and a linguistic one. When setting the development priorities it should be considered that program modules are relatively language-independent, that their building is time-consuming and that much work has already been invested in them in several countries. On the other hand linguistic resources are language-specific and are at the same time an important basis for the development of translation tools. Although the former should not be neglected in nations with lesser-used languages (e. g. Slovenian), the long-term priority in the field of language technologies should be given to the latter, specifically to linguistic corpora. In Slovenia this would, inter alia, require the establishment of the Slovenian National Corpus, building of bi- and multilingual corpora including Slovenian, and creation of corresponding databases (e. g. general and specialised dictionaries).


PROGRAMI S POMNILNIKOM PREVODOV S STALIŠČA MOREBITNEGA UPORABNIKA

Špela Vintar

Filozofska fakulteta
Oddelek za prevajanje in tolmačenje
Borštnikov trg 3, Ljubljana
Tel./Fax: +386 61 221 310
vintar@net.zaslon.si

POVZETEK

Med številnimi računalniškimi tehnologijami in orodji, ki jih imajo prevajalci danes na voljo, so programi s pomnilnikom prevodov (Translation Memory Software) zagotovo eden najpomembnejših premikov zadnjega desetletja, za slovenski prostor pa so zaradi svoje jezikovno neodvisne zasnove še posebej zanimivi. Prispevek tako predstavlja nekaj vidikov pri ugotavljanju njihove uporabnosti pri prevajanju v slovenščino ali iz nje. Tu se s stališča potencialnega uporabnika zastavlja vrsta vprašanj, na katera skušamo vsaj delno odgovoriti: Pri katerih vrstah besedil si lahko ob uporabi pomnilnika prevodov obetamo večjo učinkovitost? Za katere profile uporabnikov so ta orodja najbolj primerna? Kakšne spremembe vnaša njihova uporaba v prevajalski proces? Prispevek omenja tudi nekaj težav tehnične in konceptualne narave, ki jih sodobne prevajalske tehnologije prinašajo, in v sklepnem poglavju nakaže vizijo prihodnjega razvoja na tem področju.

Abstract

Among the various translation tools and aids available today, Translation Memory Systems are regarded as one of the most significant achievements of the past decade and their language independent design makes them all the more interesting for small languages like Slovene. The paper presents some aspects of their applicability in translations to and from Slovene from the potential user's perspective addressing the following issues: What text types can be handled by these translation tools successfully and by what user profiles? How does the implementation of TM-based tools affect the translation process and text flow? The concluding sections point out some technical and conceptual problems and present a vision of future development and needs in the field of translation technology.


PROBLEMATIKA PREVAJANJA ZAKONODAJE EVROPSKE UNIJE

Adriana Krstič

Služba Vlade RS za evropske zadeve
Slovenska c. 27, Ljubljana
Tel.: (061) 178 2511 faks: 178 2537
adriana.krstic@uvez.sigov.si

POVZETEK

Prevajanje zakonodaje Evropske unije v slovenščino ter slovenske zakonodaje v enega izmed jezikov EU (glede na dosedanjo prakso je to angleščina) pomeni enega največjih prevajalskih projektov zadnjih let v Sloveniji. Skupaj približno 160 000 strani besedil bi moralo biti prevedeno do leta 2003, ko naj bi Slovenija predvidoma postala polnopravna članica Evropske unije. Obvladovanje takšne količine prevajanja zahteva veliko število usposobljenih prevajalcev, premišljeno organizacijo dela, koordinacijo med prevajalci, uporabo enotnega izrazja, spremljanje prevoda od naročila do končnega izdelka, večkratno pregledovanje prevoda, lektoriranje in strokovno redakcijo. To delo je prevzel Prevajalski oddelek Službe Vlade RS za evropske zadeve. Prevajalcem je lahko prevajanje olajšano z dobrimi računalniškimi programi, ki so že na voljo. Mednje spadajo različni slovarji v elektronski obliki, črkovalniki, tezavri, CD-romi, program za pomoč pri popravljanju besedil, programi za urejanje terminologije in programi s pomnilnikom prevodov. Nekateri od njih so že integrirani v urejevalnike. V nadaljevanju bodo podrobneje predstavljeni problematika prevajanja in urejanja omenjenih besedil, tipi dokumentov EU in možnosti uporabe računalniških programov namenjenih prevajanju takih dokumentov.

Abstract

The translation of the European Union's legislation into Slovene, and of Slovene legislation into one of the official languages of the EU, most probably English, is one of the largest translation projects undertaken in Slovenia. Approximately 160 000 pages of different types of legal documents should be translated by the year 2003, when Slovenia plans to become a full member of the European Union. To manage such a quantity of translation work it is necessary to employ numerous qualified translators, coordinate them, use consistent terminology, monitor each document from the translation request through to the final version, and provide expert revision and proof-reading of the translation. The Translation Unit within the Government Office for European Affairs is responsible for this task. Translation work can be assisted by computer software, including various electronic dictionaries, spelling checkers, thesauri, CD-ROMs, tools for text revision, terminology management tools and tools based on translation memory, some of them already integrated into word processors. In the following chapters, various issues of translation management within the Translation Unit will be presented: workflow management, types of documents to be translated and the scope of possible usage of translation tools.


VIRTUALNA UČILNICA: UPORABA INTERNETA PRI POUČEVANJU TUJIH JEZIKOV

Agnes Pisanski

Oddelek za prevajanje in tolmačenje
Filozofska fakulteta
Aškerčeva 2
tel: 121 32 42
agnes.pisanski@guest.arnes.si

POVZETEK

V članku so predstavljeni najpogostejši načini, s katerimi internet vključujemo v pouk tujih jezikov: elektronska pošta, virtualna šola, gradiva, namenjena učenju tujega jezika, programi za izdelavo učnih gradiv, medosebna interakcija, podatki v zvezi z učenjem tujega jezika, učni pripomočki in avtentična gradiva. Izpostavljene so glavne prednosti in pomanjkljivosti uporabe interneta v učni situaciji.

Abstract

This paper presents the most frequently used ways of introducing the Internet into foreign language teaching: electronic mail, virtual classrooms, materials designed for language learners, language software, person-to-person interaction, information on language schools, study aids and authentic materials. The main advantages and disadvantages of using the Internet as a pedagogical tool are identified.



IZGRADNJA INFRASTRUKTURE POTREBNE ZA RAZVOJ GOVORNE TEHNOLOGIJE ZA SLOVENSKI JEZIK

Zdravko Kačič, Bogomir Horvat

Fakulteta za elektrotehniko, računalništvo in informatiko
Center za jezikovne tehnologije
Smetanova 17, 2000 Maribor
Tel.: 062 220 7 220, fax: 062 211 178
kacic@uni-mb.si

POVZETEK

V članku podajamo pregled aktivnosti pri izgradnji infrastrukture, potrebne za razvoj govorne tehnologije, ki jih vodi Center za jezikovne tehnologije na Fakulteti za elektrotehniko, računalništvo in informatiko Univerze v Mariboru. Naloga centra je skrb za načrtno izgradnjo infrastrukture, potrebne za razvoj jezikovnih tehnologij za slovenski jezik. Trenutno so vse aktivnosti usmerjene k zagotavljanju infrastrukture za razvoj govorne tehnologije.

Infrastrukturo za slovenski jezik, s katero razpolaga center, sestavljajo: baza izgovarjav SNABI (vsebuje govor 80 govorcev), baza izgovarjav SpeechDat II (vsebuje govor 1000 govorcev), fonetični slovar z 282 000 fonetičnimi transkripcijami slovenskih lastnih imen in korpus besed, ki vsebuje 750 000 besed. Center razpolaga še z osmimi drugimi bazami izgovarjav, predvsem za nemški in angleški jezik, fonetičnimi leksikoni za domala vse evropske jezike in s korpusi besed. Slednji vsebujejo skupaj več kot 39 milijonov besed.

Abstract

In this paper we describe the activities in setting up the resources needed for the development of speech technology. The activities are carried out by the Centre for Language Technology at the Faculty of Electrical Engineering and Computer Science, University of Maribor. The role of the Centre is to conduct the long-term development of the language resources needed for the development of language technology for the Slovenian language. Currently, all the activities are concentrated on setting up the resources needed for the development of speech technology. The current speech resources for the Slovenian language available at the Centre are: the speech database SNABI (80 speakers), the speech database SpeechDat II (1000 speakers), a phonetic lexicon with 282 000 phonetic transcriptions of Slovenian proper names and a word corpus containing 750 000 words. The Centre also has eight other speech databases (mainly for German and English), phonetic lexica for almost all European languages and word corpora. The word corpora together contain more than 39 million words (for German and English).


GOPOLIS: SLOVENSKA PODATKOVNA ZBIRKA GOVORJENIH POIZVEDOVANJ

Simon Dobrišek, Jerneja Gros, Ivo Ipšić, Karmen Pepelnjak, France Mihelič in Nikola Pavešić

Univerza v Ljubljani, Fakulteta za elektrotehniko, Laboratorij za umetno zaznavanje
Tržaška 25, SI-1000 Ljubljana
tel: 061 1768 467, fax: 061 1264 630
(simond, nejka, ivoi, mihelicf, nikolap)@fe.uni-lj.si

Zahvala: Delo je bilo sofinancirano s strani komisije Evropske skupnosti v okviru projekta COP-94 in pogodbe št. 01634 (SQEL)

POVZETEK

Ustrezne govorne podatkovne zbirke so nepogrešljive pri razvoju in gradnji sistemov za samodejno razpoznavanje govorjenega jezika. Tovrstni sistemi so pomemben del sodobnih jezikovnih tehnologij. V članku predstavljamo slovensko podatkovno zbirko govorjenih poizvedovanj, ki smo jo pripravili v našem laboratoriju. Zbirka vsebuje preko 8000 označenih posnetkov stavkov, ki jih je izgovorilo 50 govorcev, in vrsto dodatnih uporabnih podatkov. Predstavljeno zbirko smo uporabili kot učni material pri gradnji samodejnega sistema za vodenje govorjenega dialoga z uporabnikom, ki poizveduje po letalskih informacijah. Del zbirke je tudi izbor računalniških simbolov posebne fonetične abecede. Ta izbor ponujamo kot predlog za standardni računalniški fonetični zapis slovenskega govora na področju jezikovnih tehnologij.

Abstract

Spoken language data are essential for developing and building automatic speech recognition systems. This paper presents a Slovenian speech database of spoken queries. GOPOLIS is a large multi-speaker database, derived from real-situation dialogues concerning an airline timetable information service. The database consists of recordings of more than 8000 sentences, spoken by 50 speakers. It was used as the Slovenian database within the SQEL project for building a multilingual speech recognition and understanding dialog system. On the basis of the Speech Assessment Methods Phonetic Alphabet, a selection of machine-readable phonetic symbols appropriate for the Slovenian spoken language is also presented.


GOVORJENA BESEDILA IN KORPUS SLOVENSKEGA JEZIKA

Simona Kranjc

Filozofska fakulteta v Ljubljani
Aškerčeva 2
1000 Ljubljana
tel.: 061-1769-232, fax: 061-1259-337
simona.kranjc@guest.arnes.si

POVZETEK

Govorjena besedila so v slovenskem prostoru malo raziskana, kar je predvsem posledica značilnosti te vrste besedil. Na podlagi pilotne analize otroškega govora bomo ob obravnavi osebnih deiksisov, to je izrazov s kazalno vlogo, skušali pokazati razliko med govorjenimi in zapisanimi besedili in s tem utemeljiti potrebo po načrtovanem zbiranju govorjenih besedil in njihovo vključitev v korpus slovenskega jezika.

Abstract

So far, spoken texts have been rather poorly researched in Slovenia, mainly because of the very nature of such texts. On the basis of a pilot analysis of child speech, which takes into particular consideration personal deictics (i.e. expressions having a demonstrative function), the paper attempts to show differences between spoken and written texts and argues that the systematic collection of spoken texts and their inclusion in the Slovene language corpus is necessary.


KORPUSI V PREVODOSLOVJU

Nataša Hirci

Oddelek za prevajanje in tolmačenje
Filozofska fakulteta
Aškerčeva 2
Ljubljana
tel: 061/121 32 40
natasa.hirci@guest.arnes.si

POVZETEK

Tehnologija postaja neizogiben del teoretičnega in uporabnega jezikoslovja. Članek predstavlja tipologijo računalniško berljivih besedilnih korpusov in njihovo uporabnost v jezikoslovju, še posebej v prevodoslovju. Predstavljena so tudi orodja za delo z njimi. Prevodoslovje se je znašlo pred vprašanjem, ali je besedilne korpuse mogoče koristno uporabljati tudi pri prevajanju in kako. Vse več prevodoslovcev se zavzema, da bi jih vpeljali v prevodoslovne študije, saj nam lahko pomagajo ugotavljati prevodno ustreznost, uporabni pa so tudi kot pripomoček pri poučevanju prevajanja.

Abstract

Technology is becoming an inevitable part of theoretical and applied linguistics. This article presents a typology of computer-readable textual corpora and their applicability to linguistics, and in particular to translation studies. Tools for working with corpora are also presented. Translation studies are faced with the question of whether it is possible to successfully apply corpora to translation and how to do so. More and more translation experts and theoreticians favour the introduction of textual corpora in translation studies, as they can be of great use in the search for translation equivalence. They can also be useful as an aid in translator training.


STANDARDIZACIJA ZAPISA JEZIKOVNIH PODATKOV

Tomaž Erjavec

Odsek za inteligentne sisteme
Institut Jožef Stefan
Jamova 39, Ljubljana
tomaz.erjavec@ijs.si

POVZETEK

Standardizirani računalniški zapis jezikovnih podatkov poveča njihovo uporabnost, saj spodbudi večnamenskost in izmenljivost podatkov ter poveča njihovo trajnost. V članku argumentiram koristnost standardizacije, nato pa se osredotočim na standard ISO SGML (Standard Generalised Markup Language) ter z njim povezane standarde in pobude. Obravnavam standarde za prenos jezikovnih podatkov po omrežju (HTML/XML), za zapis jezikovnih podatkov v znanstvene namene (TEI) in za zapis terminoloških podatkov (MARTIF, TMX). Te standarde in pobude predstavim na primerih, podam njihovo uporabo na zapisih slovenskega jezika ter pokažem na možne uporabe pri nas.

Abstract

Standardised digital encoding of language data increases its utility, as it facilitates multiple uses and interchange of the data and increases its longevity. The paper outlines the benefits of standardisation, and then focuses on the ISO standard SGML (Standard Generalised Markup Language) and the standards and initiatives connected with it. It describes those for transferring language data over the Internet (HTML/XML), for encoding language data for scholarly purposes (TEI), and for encoding terminological data (MARTIF, TMX). The discussion of these standards and initiatives is accompanied by examples and by possible applications to Slovene language data.


KORPUS FIDA

Tomaž Erjavec,1 Vojko Gorjanc,2 Marko Stabej2

1 Odsek za inteligentne sisteme, Institut Jožef Stefan
Jamova 39, Ljubljana
tomaz.erjavec@ijs.si

2 Oddelek za slovanske jezike in književnosti, Filozofska fakulteta, Univerza v Ljubljani
Aškerčeva 2, Ljubljana
vojko.gorjanc@guest.arnes.si, marko.stabej@guest.arnes.si

POVZETEK

V okviru projekta FIDA je v izdelavi referenčni korpus slovenskega jezika. Članek predstavi projekt in opiše zvrstnost ter zgradbo korpusa FIDA. Posebej opiše način vključevanja besedil v korpus in digitalni zapis korpusa.

Abstract

The FIDA project is compiling a reference corpus of the Slovene language. The paper introduces the project and describes the characteristics and structure of the FIDA corpus. The methodology of incorporating texts into the corpus and the digital coding of the corpus are also discussed.


DIGITALNI ZAPIS SLOVENSKIH ZNAKOV

Primož Peterlin,1 Aleš Košir,2 Tomaž Erjavec3

1 Inštitut za biofiziko MF, Univerza v Ljubljani
Lipičeva 2, 1000 Ljubljana
peterlin@biofiz.mf.uni-lj.si

2 Hermes SoftLab,
Litijska 51, 1000 Ljubljana,
ales.kosir@hermes.si

3 Odsek za inteligentne sisteme, Institut Jožef Stefan
Jamova 39, 1000 Ljubljana
tomaz.erjavec@ijs.si

POVZETEK

Obdelana je tematika zapisa slovenskih znakov v računalništvu in informatiki. Predstavljen je pregled obstoječih zakonskih norm in uporabljenih praktičnih rešitev. Navedemo nekaj primerov slabih rešitev, jih komentiramo, in predlagamo boljše. Kot problem, ki še čaka na rešitev, pa so izpostavljeni problemi standardiziranega kodiranja pismenk, ki so jih uvedli slovenski slovničarji s prve polovice 19. stoletja.

Abstract

The paper addresses the digital coding of Slovene characters in computing. An overview of the existing regulating norms is presented, along with solutions found in practice. Some examples of inferior solutions are quoted and commented on, with suggestions for improvements. The standardised coding of the characters introduced by Slovene grammarians in the first half of the 19th century is presented as a problem still waiting to be solved.

Kazalo po avtorjih / Author Index


Stran http://nl.ijs.si/isjt98/zbornik.html, zadnja sprememba 2001-05-06, Tomaž Erjavec