DOCX to TEI to HTML Conversion

After submitting your Word file, you can either receive the TEI P5 and HTML output (together with the .docx file) at a permanent, anonymous URL or download a .zip file containing these results. Programs can also use the service through ordinary HTTP POST requests, as long as the server is not overloaded. By using this service, you agree that the submitted files may be used for its further development.
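
For programmatic use, the POST request mentioned above could be assembled roughly as follows. This is only a sketch: the endpoint URL and the form field names (`file`, `profile`, `result`) are placeholders assumed for illustration, not the service's documented interface, and the real values have to be read from the form's HTML source.

```python
import uuid
import urllib.request

# Placeholder endpoint: NOT the real service address (assumption).
SERVICE_URL = "https://example.org/docx-to-tei"

DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document"
)


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields and one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(head.encode("utf-8"))
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {DOCX_MIME}\r\n\r\n"
    )
    parts.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(url, filename, docx_bytes, profile, result):
    """Prepare (but do not send) the POST request.

    The field names "file", "profile" and "result" are hypothetical.
    """
    body, content_type = build_multipart(
        {"profile": profile, "result": result},
        "file", filename, docx_bytes,
    )
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending the prepared request is then a matter of passing it to `urllib.request.urlopen` and saving the response body. Since the service asks callers not to overload the server, a script of this kind would sensibly submit files one at a time.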

Form fields: input Word file; profile; return type (url or zip); HTML localisation language.


This form provides an OxGarage-like service that supports only one conversion path: up-conversion, of as high a quality as possible, from Office Open XML (.docx files) to TEI P5, and from there to HTML for validation. The focus is on books from the social sciences and humanities; for the humanities, especially as the first stage in producing TEI-encoded text-critical editions of manuscripts and older prints.

This interface runs a recent version of the TEI Stylesheets with various local profiles, which can be chosen above. Most of the work went into the JSI profile, which fixes some bugs in the TEI docx conversion and introduces various extensions. However, it was written a number of years ago and has not kept up with the development of the Stylesheets, so some of these changes might no longer be necessary (or even desirable, since the profile redefines various existing templates).

The template file of the JSI profile contains a short tutorial and also serves as a test case for the conversion. The template uses standard Word styles and also defines many new ones, which start with the "tei:" prefix and are converted to appropriate TEI elements. The JSI profile template is available as source Word, derived TEI and HTML. The HTML is styled to look as much as possible like the source Word document, so that the two can be compared side by side to check that the conversion to TEI was correct, and the Word file fixed if necessary.
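
Before submitting a document, it can be useful to see which "tei:" styles it defines. The sketch below relies only on the general structure of .docx files (a zip archive whose `word/styles.xml` lists the style definitions); the function name `tei_style_names` is introduced here for illustration, and the code is not part of the service or of the JSI profile.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/styles.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tei_style_names(docx_file, prefix="tei:"):
    """Return the sorted names of styles whose name starts with `prefix`.

    `docx_file` is a path or a binary file-like object. Only style
    definitions are inspected, not whether the styles are applied.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    names = []
    for style in root.iter(f"{{{W_NS}}}style"):
        name_el = style.find(f"{{{W_NS}}}name")
        if name_el is None:
            continue
        name = name_el.get(f"{{{W_NS}}}val", "")
        if name.startswith(prefix):
            names.append(name)
    return sorted(names)
```

Note that this reports the styles a document defines, so a file created from the template will list all of the template's "tei:" styles even if only a few of them are applied in the text.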

The work on this service is carried out in the scope of the CLARIN.SI and DARIAH-SI research infrastructures and has been used for preparing the eZB digital editions in cooperation between the Institute of Slovenian Literature and Literary Studies ZRC SAZU and the Knowledge Technologies Department JSI.


Last update 2021-12-24, et