

 COP project 106 MULTEXT-East Deliverable D2.3 F ``1984'', Hungarian

        <h.title>Multext-East cesAna: Nineteen Eighty-Four, Hungarian</h.title>
          <respname>Csaba Oravecz</respname>
          <resptype>Overall Responsibility</resptype>
          <respname>Vladim&iacute;r Petkevi&ccaron;</respname>
          <resptype>Conversion to cesAna DTD </resptype>
      <editionstmt version="1.0">MTE Final Release</editionstmt>
        <byteCount units="MB">18.4</byteCount>
        <extnote>wordCount represents the number of TOK TYPE=WORD
          elements in the text. byteCount is in megabytes.</extnote>
           Research Institute for Linguistics, Hungarian Academy of Sciences
        <pubaddress> Budapest, Sz&iacute;nh&aacute;z u. 5-9.</pubaddress>
        <eaddress type="email"></eaddress>
        <eaddress type="www"></eaddress>
        <availability status="restricted">
          Available for research purposes upon receipt of signed agreement
        <pubDate value="1998-01-01">January 1st, 1998</pubDate>
           <>George Orwell</>
           <publisher>Eur&oacute;pa K&ouml;nyvkiad&oacute;</publisher>
        Multilingual Text Tools and Corpora for Central and Eastern
        European Languages.
        EU Copernicus Project COP106
          In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and
          HEAD, POEM, LIST elements have been omitted. cesDoc P
          elements are encoded as PAR, and S as S.
          cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
          Q and QUOTE tags from the cesDoc source are not retained.
          S segmentation same as in cesDoc source (hand-validated).
          TOK segmentation performed with mtseg and manually corrected,
        <tagusage gi=chunkList occurs=1>
          Element corresponds to TEXT of the cesDoc source
        <tagusage gi=chunk occurs=1>
          Element corresponds to BODY of the cesDoc source
        <tagusage gi=par occurs=1303>
          Elements correspond to P elements of the cesDoc source.
          The FROM attribute gives the reference to the ID of the
          corresponding cesDoc P element.
        <tagusage gi=s occurs=6768>
          Elements correspond to S elements of the cesDoc source.
          The FROM attribute gives the reference to the ID of the
          corresponding cesDoc S element.
        <tagusage gi=tok occurs=98426>
          Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
          giving the mtseg class of the token.
        <tagusage gi=orth   occurs=98426>
          Contains the orthography of the token, as found in the
          cesDoc source.
        <tagusage gi=disamb occurs=80705>
          Contains disambiguated lexical information.
        <tagusage gi=lex    occurs=111945>
          Contains undisambiguated lexical information.
        <tagusage gi=base   occurs=192650>
          Base or lemma of a token.
        <tagusage gi=msd    occurs=192650>
          Morphosyntactic description of a token.
        <tagusage gi=ctag   occurs=98426>
          Corpus tag.
      <creation date="1997-11-04">
        <![ %ONECOMPONENT [ &ISOlang; ]]>
        <language id=ns-hu iso639=hu>Newspeak Hungarian</language>
        <respname>Csaba Oravecz, RIL</respname>
        <h.item>Initial header</h.item>
        <respname>Tomaz Erjavec, IJS</respname>
         <h.item>Converted from ISO Latin-2 to SGML entities</h.item>
         <h.item>Changed ... to &hellip;</h.item>
         <h.item>Modified EDITIONSTMT, BYTECOUNT</h.item>
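
The occurrence counts in the tagusage entries above can be cross-checked with simple arithmetic. The sketch below copies the counts from the header and tests a few numeric relations that happen to hold: orth and ctag each occur as often as tok, and base and msd each occur as often as disamb and lex combined. Reading these as "one orth per token" or "one base and one msd per disamb or lex element" is an assumption suggested by the numbers, not something the header states. The function name `check_counts` is hypothetical.

```python
# Occurrence counts copied from the tagusage entries of the header.
counts = {
    "chunkList": 1, "chunk": 1, "par": 1303, "s": 6768,
    "tok": 98426, "orth": 98426, "disamb": 80705, "lex": 111945,
    "base": 192650, "msd": 192650, "ctag": 98426,
}


def check_counts(c):
    """Return (description, holds) pairs for simple numeric sanity checks.

    The relations are assumptions inferred from the counts themselves,
    not rules taken from the cesAna DTD.
    """
    return [
        ("orth == tok", c["orth"] == c["tok"]),
        ("ctag == tok", c["ctag"] == c["tok"]),
        ("base == disamb + lex", c["base"] == c["disamb"] + c["lex"]),
        ("msd == base", c["msd"] == c["base"]),
    ]


if __name__ == "__main__":
    for description, holds in check_counts(counts):
        print(f"{description}: {'ok' if holds else 'MISMATCH'}")
```

A mismatch in a regenerated header would suggest that the counts were not updated together after a change to the annotated text.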