Copyright (c) Robin Cover 1994-97. Last modified June 30, 1997.
This document http://www.sil.org/sgml/publicSW.html is part of the SGML Web Page. Support for development and maintenance of the SGML Web Page is provided in part by SoftQuad, Inc. and by the Summer Institute of Linguistics, to whom gratitude is acknowledged.
[CR: 19961029]
Priority is given to "public" SGML software in this document database since the scope of interest is mainly the Internet, where the ethic of public gift is highly esteemed. The wealth of SGML software made freely available for public use is evidence of that ethos. As a supplement to the links and information provided on public SGML software below, readers should consult the hypertext version of Steve Pepper's "SGML Tools and Vendors" in the Whirlwind Guide, where a code for "public" identifies these same software offerings. The FTP location for a text version is: ftp://ftp.falch.no/pub/sgmltool/sgmltool.txt. See the main bibliographic entry for the Whirlwind Guide for a document abstract and detailed information about its contents. See also the detailed software summary for 89 products extracted from the technical report of Kuikka and Nikunen: (a) the full bibliographic entry, or (b) the overview in the "Commercial SGML Software" page. NICE Technologies [October 1996] also has an online database of SGML vendors and products.
The primary sections in this document are organized by software category, however infelicitous that taxonomy may be. See the Contents listing to link directly to a particular description.
[CR: 19970221]
James Clark's new SP parser toolkit is the successor to his SGMLS parser. The current version is SP 1.1.1 (July 30, 1996). SP is a "free, object-oriented toolkit for SGML parsing and entity management." SP is written in C++, supports the LINK feature, and is reentrant (a single process can use multiple parsers at the same time). It is command-line compatible with SGMLS and includes an application [nsgmls] that generates sgmls-style output and an application [rast] that generates RAST output (like SGMLS) as defined in ISO/IEC 13673. Other parser tools include [sgmlnorm], a simple SGML tag normalizer, and [spent], which prints an SGML entity on standard output. SP supports any concrete syntax allowed by ISO 8879 and supports large character sets: it can be compiled to use 16-bit characters internally, and supported encodings include UTF-8, Unicode/UCS-2, UJIS/EUC, and Shift-JIS. It is said to be fast for large documents. In addition to the C++ source code, binaries [nsgmls and rast] are available for MS-DOS (SP version 0.2) and several UNIX systems. The MS-DOS binaries use a 32-bit DOS extender (included in the distribution), so the MS-DOS 640K conventional memory barrier should not limit the use of SP.
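The sgmls-style output that nsgmls writes is a line-oriented rendering of the ESIS: each line begins with a one-character command, such as "(" for a start tag, ")" for an end tag, "-" for data, "A" for an attribute of the next element, and a final "C" if the document was conforming. The Python sketch below reads that output into a small element tree. It is a minimal illustration under stated assumptions, not part of SP: it handles only those five commands, unescapes only \n and \\ in data lines, and the Element class and read_esis function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element objects or data strings


def unescape(data):
    """Simplified: handles only the \\n and \\\\ escapes (nsgmls emits others too)."""
    out, i = [], 0
    while i < len(data):
        c = data[i]
        if c == "\\" and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt == "n":
                out.append("\n")
            elif nxt == "\\":
                out.append("\\")
            else:
                out.append("\\" + nxt)  # unknown escape: keep it verbatim
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


def read_esis(lines):
    """Build a tree from nsgmls-style ESIS lines; return (root, conforming)."""
    stack, pending_attrs = [], {}
    root, conforming = None, False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            # "Aname TYPE [value]": applies to the next start tag; IMPLIED has no value.
            parts = rest.split(" ", 2)
            name, atype = parts[0], parts[1]
            pending_attrs[name] = None if atype == "IMPLIED" else (parts[2] if len(parts) > 2 else "")
        elif cmd == "(":
            elem = Element(rest, pending_attrs)
            pending_attrs = {}
            if stack:
                stack[-1].children.append(elem)
            else:
                root = elem
            stack.append(elem)
        elif cmd == ")":
            stack.pop()
        elif cmd == "-" and stack:
            stack[-1].children.append(unescape(rest))
        elif cmd == "C":
            conforming = True
        # Other commands (?, &, N, s, ...) are ignored in this sketch.
    return root, conforming
```

Iterating over standard input (for example, read_esis(sys.stdin) with nsgmls output piped in) would return the root element together with a flag recording whether nsgmls reported the document as conforming.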
In the most recent releases of SP, James Clark has also issued some very useful tools that handle entities and "normalize" SGML documents in various ways, as specified by command-line options. For example, SPAM (SP Add Markup) produces canonical SGML from source that uses SHORTTAG and OMITTAG minimization; the exact form of the output is determined by the options the user specifies, so SPAM serves as a markup stream editor. See the documentation from the official site for complete details. Version 1.1 also supports Architectural Form Processing, for which see the "toy example".
Commercial support for SP is provided by TechnoTeacher, Inc. (although James Clark himself has no commercial connection with TechnoTeacher, Inc.). See the support announcement (September 1996).
Pointers to an earlier released version of the SP parser (version 1.0.1: October 21, 1995) and its description:
[CR: 19970617]
PSGML is a GNU Emacs Major Mode for editing SGML coded documents. Version 0.4 requires GNU Emacs 19.19 or higher, Lucid Emacs 19.9, or OEmacs. "PSGML contains a simple SGML parser and can work with any DTD. Functions provided includes menus and commands for inserting tags with only the contextually valid tags, identification of structural errors, editing of attribute values in a separate window with information about types and defaults, and structure based editing."
[CR: 19970527]
SoftQuad Panorama is a free version of SoftQuad Panorama PRO. It supports browsing (and searching?) of fully compliant SGML documents on the WWW.
HoTMetaL is an unsupported version of the commercial product HoTMetaL Pro. It provides an editor/browser for (extended) HTML documents. HoTMetaL is available on a number of platforms (UNIX, MS-Windows, etc.). A tutorial for HoTMetaL Pro teaches HTML basics, supported by an HTML Quick Reference guide. The most recent [March 1995] Windows version of HoTMetaL supports some of the Netscape extensions (e.g., <CENTER>, <BLINK>), displays graphics inline, uses a stylesheet configured to look like a standard HTML browser, and supports a filter for loading plain text files and invalid HTML documents. See the posted public announcement or the fuller description on the SoftQuad server, including FTP location. Try the FTP directory ftp://ftp.ncsa.uiuc.edu/Web/html/hotmetal/Windows, and specifically the binary file ftp://ftp.ncsa.uiuc.edu/Web/html/hotmetal/Windows/hotm1new.exe.
[CR: 19961012]
perlSGML is a collection of Perl programs and libraries written by Earl Hood for processing SGML documents. The following software is available in the perlSGML distribution: dtd.pl (a Perl library to parse SGML DTDs), dtd2html (an SGML DTD documentation/navigation tool), dtddiff (a utility to list changes in a DTD), dtdtree (a tool to generate content hierarchy trees of SGML elements), dtdview (a tool to query a DTD interactively), sgml.pl (a Perl library to parse SGML instances), and stripsgml (a utility to remove SGML markup).
The 'dtd2html' tool is widely used. "What is dtd2html: dtd2html is part of the perlSGML package. dtd2html is a program that generates an HTML document (composed of several files) that documents and allows hypertext navigation of an SGML DTD."
[May 1996] "NORMDTD is a DOS (yes!) program that reads a valid SGML DTD, even a TEI-like one that uses marked sections and multiple input files, and generates a single file containing a normalized version of that DTD. The element content models in this normalized DTD will not contain any references to elements that are not declared, and so it can be used by highly-strung SGML packages such as RulesBuilder that refuse to process TEI applications (in particular) for this reason. In fact, having a normalized DTD in a single file can be helpful for a number of reasons, to a variety of SGML applications."
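The property that NORMDTD guarantees (no content model refers to an element that is not declared) can at least be checked, though not repaired, with a short script. The Python sketch below is hypothetical and deliberately naive: it assumes a single DTD file with no parameter entities, marked sections, or comments inside declarations, which are precisely the complications NORMDTD itself resolves. It only reports offending references; it does not normalize anything.

```python
import re

# Matches simple declarations such as "<!ELEMENT p - O (#PCDATA|em)*>",
# including name groups like "<!ELEMENT (to|from) - O (#PCDATA)>".
# Assumption: no parameter entities, marked sections, or comments in the DTD.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(?:[-O]\s+[-O]\s+)?([^>]*)>", re.IGNORECASE)
TOKEN = re.compile(r"#?[A-Za-z][A-Za-z0-9.\-]*")
KEYWORDS = {"CDATA", "RCDATA", "EMPTY", "ANY"}  # declared-content keywords


def undeclared_references(dtd_text):
    """Return {element: sorted undeclared names} for content models that refer
    to elements having no <!ELEMENT> declaration of their own."""
    declared = {}
    for names, model in ELEMENT_DECL.findall(dtd_text):
        for name in TOKEN.findall(names):
            declared[name.upper()] = model  # SGML names are case-insensitive by default
    problems = {}
    for name, model in declared.items():
        # Skip reserved names such as #PCDATA; keep element references.
        refs = {t.upper() for t in TOKEN.findall(model) if not t.startswith("#")}
        missing = sorted(refs - KEYWORDS - set(declared))
        if missing:
            problems[name] = missing
    return problems
```

A DTD in which every referenced element is declared yields an empty dictionary, which is the condition an application such as RulesBuilder would need.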
NORMDTD is written in Borland Pascal and runs only under DOS.
The SARA system. SARA (SGML-Aware Retrieval Application) is a client/server software tool allowing a central database of texts with SGML mark-up to be queried by remote clients. The system was developed at Oxford University Computing Services, with funding from the British Library Research and Development Department (1993-4) and the British Academy. The original motivation for its development was the need to provide a robust low-cost search-engine for use with the 100 million word British National Corpus, and several features of the system design necessarily reflect this.
The SARA system has four key parts:
Links:
[CR: 19970225]
[CR: 19970524]
DESCRIPTION: "'sgrep' (structured grep) is a tool for searching text files and filtering text streams using structural criteria. The authors are Pekka Kilpeläinen and Jani Jaakkola of Helsinki University, and they have distributed sgrep under GNU General Public License. The data model of sgrep is based on regions, which are nonempty substrings of text. Regions are typically occurrences of constant strings or meaningful text elements, which are recognizable through some delimiting strings. Regions can be arbitrarily long, arbitrarily overlapping, and arbitrarily nested. . . Like grep, sgrep can be used for any kind of text files. However it is most useful for text files containing some kind of structured text. A file containing structured text could be defined as a file, which obeys some syntax. Examples of structured text files are SGML, HTML, C, TeX and mail files."
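The region data model described above can be illustrated in a few lines of Python. In this sketch (hypothetical code, not sgrep's implementation or query language) a region is a pair of character offsets; occurrences of a constant string yield regions, a pair of delimiting strings yields regions that may nest or overlap, and two containment filters stand in for sgrep's structural operators. The pairing rule used here (each start string matched to the first following end string) is a simplification chosen for illustration.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) character offsets, end exclusive


def occurrences(text: str, s: str) -> List[Region]:
    """All regions where the constant string s occurs (overlaps allowed)."""
    regions, i = [], text.find(s)
    while i != -1:
        regions.append((i, i + len(s)))
        i = text.find(s, i + 1)
    return regions


def delimited(text: str, start: str, end: str) -> List[Region]:
    """Regions from each start delimiter to the first end delimiter after it
    (a simplified pairing rule, not sgrep's own semantics)."""
    regions = []
    for s0, s1 in occurrences(text, start):
        j = text.find(end, s1)
        if j != -1:
            regions.append((s0, j + len(end)))
    return regions


def _within(inner: Region, outer: Region) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Regions of `outer` that contain at least one region of `inner`."""
    return [o for o in outer if any(_within(i, o) for i in inner)]


def inside(inner: List[Region], outer: List[Region]) -> List[Region]:
    """Regions of `inner` that lie within at least one region of `outer`."""
    return [i for i in inner if any(_within(i, o) for o in outer)]
```

For example, inside(occurrences(text, "SGML"), delimited(text, "<title>", "</title>")) keeps only those occurrences of "SGML" that fall within a title, which is the kind of structural query sgrep is designed for.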
"Sgtool is tcl/tk based X frontend to sgrep. Sgtool supports easy creation of sgrep queries and macro libraries. Sgtool requires tk version 4.0, and comes bundled in with the sgrep distribution."
Links (November 1996):
"MU is a perl-based program that builds fill-out forms for SGML editing, based on simple templates. It supports lock files (for networked workgroups), and it is distributed with a TEI-lite template. Demonstrations, source code, help files, and an email list for bug reports and developers are available. . .Features: (1) Helps to automate the SGML markup process; (2) Quite general - works on various types of DTD templates; (3) Version 1.1 deals quite nicely with attributes; (4) Allows for multi-user editorial communication through the use of remarks; (5) Supports internet workgroups via lockfiles."
Several companies have collaborated on the design of an SGML interchange language for word-processing formats. Rainbow makers produce SGML from the supported word-processing formats, preserving as much information about document structure as can be deduced reliably. The Rainbow SGML format can then be used as input to other applications. See further explanation on EBT's server or on the mirrors in the file 'rainbow.why'. Rainbow makers are now available (free) for FrameMaker/FrameBuilder MIF, RTF, Interleaf, and (possibly) Ventura. Authoritative files for the Rainbow distribution are located on EBT's FTP server (SGML Rainbow via ftp.ebt.com/pub/nv/dtd/rainbow/).
Other sources for Rainbow makers include:
The ICA (Release 1.6, February 1994) is a toolset for generating data translators. In particular, the toolset can be used to generate translators to and from a constrained subset of instances of SGML Document Type Definitions (DTDs). There are several example translators included in the distribution. The first is a book DTD and includes specific translators for the LaTeX book documentstyle and a specific troff macro package. The second is a bibliographic DTD and includes specific translators for BibTeX and refer bibliographic database formats. Please note that the ICA is for developing translators and not providing translators. The ICA runs in the Unix environment, using the X Window System for the basis of the graphical user interfaces.
A new user's manual for ICA is also available. Published by Prentice Hall, the book is entitled The Integrated Chameleon Architecture: Translating Documents with Style, by Sandra Mamrak, Conleth S. O'Connell, and Julie Barnes. ISBN 0-13-056418-4. This book contains much new and revised material over the previously available online documentation, including a chapter on the ICA and SGML. See also description in excerpts from the release notes.
See further description in the ICA toolkit announcement, and see network addresses for supporting mailing list. The sources for ICA on the Internet are:
"What is CoST? CoST (Copenhagen SGML Tool) is a structure-controlled SGML application programming tool. It is built on top of a public domain SGML tool: the SGMLS parser made by James Clark. With CoST you can write translation specifications for SGML document instances. CoST is purely structure driven, i.e. it gives you access to the structure of the SGML document instance. It won't, however, let you access the lexical and syntactical details in the SGML entities that represent the document instance in storage. You can write CoST programs that will translate SGML document instances or perform other processing in response to SGML documents. You program CoST using TCL - Tool Command Language." [from the Manual Introduction, March 1995]
CoST was written by Klaus Harbo (Klaus.Harbo@euromath.dk) and is currently [October 1995] maintained by Joe English (joe@trystero.art.com).
Links:
Costwish is a graphical interface (SGML postprocessor and renderer) for Joe English's CoST-2 tool. From the README: "costwish is a generic graphical interface to Joe English's CoST SGML/ESIS post-processing tool. It is aimed at those who wish to: (1) run sgmls (or other ESIS-based parser) under a graphical interface; (2) browse their documents graphically; (3) customise their postprocessing easily, powerfully and flexibly; (4) construct powerful searches of SGML-based documents; (5) and manage the results interactively; (6) develop interfaces to helper applications (e.g. graphical renderers)." [from the README, April 1996]
Links:
Written by David Megginson, ". . .SGMLSpm is a free perl5 object-oriented postprocessor for James Clark's SGMLS and NSGMLS parsers. The main part of this release is a library, SGMLS.pm, which repackages the ESIS output of (N)SGMLS into perl5 objects. On top of this, I have built a script, sgmls.pl, for formatting or processing SGML documents quickly using event patterns. Like CoST (which is several times slower), and unlike QWERTZ (etc.), SGMLSpm is a general-purpose package which can be used with any DTD. It even includes a script, skel.pl, which will write a skeleton conversion script for your document automatically!"
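The event-pattern style described here, in which the user states what should happen at the start and end of each element, can be sketched in Python against the same nsgmls-style ESIS lines. The EventDispatcher class and its on_start/on_end methods are hypothetical and do not mirror the SGMLS.pm or sgmls.pl interfaces; escape sequences in data lines are passed through unchanged, and commands other than A, (, ), and - are ignored.

```python
class EventDispatcher:
    """Route ESIS events to per-element handlers (hypothetical API)."""

    def __init__(self):
        self.start_handlers = {}  # GI -> fn(attrs, out)
        self.end_handlers = {}    # GI -> fn(out)

    def on_start(self, gi, fn):
        # nsgmls normally reports GIs in upper case, so normalize the keys.
        self.start_handlers[gi.upper()] = fn

    def on_end(self, gi, fn):
        self.end_handlers[gi.upper()] = fn

    def run(self, esis_lines):
        """Process ESIS lines and return the concatenated output string."""
        out = []
        attrs = {}  # attributes collected for the next start tag
        for line in esis_lines:
            line = line.rstrip("\n")
            if not line:
                continue
            cmd, rest = line[0], line[1:]
            if cmd == "A":
                # "Aname TYPE [value]"; IMPLIED attributes carry no value.
                name, _type, *value = rest.split(" ", 2)
                attrs[name] = value[0] if value else None
            elif cmd == "(":
                handler = self.start_handlers.get(rest.upper())
                if handler:
                    handler(attrs, out)
                attrs = {}
            elif cmd == ")":
                handler = self.end_handlers.get(rest.upper())
                if handler:
                    handler(out)
            elif cmd == "-":
                out.append(rest)  # escapes such as \n are left as-is here
        return "".join(out)
```

Registering a start handler for TITLE that emits <h1> and a matching end handler that emits </h1> is enough to turn titles into HTML headings; elements without handlers simply pass their data through.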
"sgmlspl is a sample application distributed with the SGMLS.pm perl5 class library -- you can use it to convert SGML documents to other formats by providing a specification file detailing exactly how you want to handle each element, external data entity, subdocument entity, CDATA string, record end, SDATA string, and processing instruction. sgmlspl also uses the Output.pm library (included in this distribution) to allow you to redirect or capture output."
[CR: 19970128]
From the Language Technology Group, Human Communication Research Centre, University of Edinburgh: the "Normalised SGML Library (NSL version 2.0) . . .consists of a set of C programs for manipulating SGML files and a C application program interface (API) designed to ease the writing of C programs which manipulate SGML documents."
"LT NSL is a development environment for SGML-based corpus and document processing, with support for multiple versions and multiple levels of annotation. It consists of a C-based API for accessing and manipulating SGML documents and an integrated set of SGML tools. The LT NSL initial parsing module incorporates v1.1.1 of James Clark's SP software, arguably the best SGML parser available. The basic architecture is one in which an arbitrary SGML document is parsed once, yielding two results: (1) An optimised representation of the information contained in the document's DTD; (2) A normalised version of the document instance, which can be piped through any tools built using our API for augmentation, extraction, etc."
Links:
[CR: 19970107]
"RATFINK, a library of Tcl utilities for generating RTF files, including a CoST script for converting SGML to RTF, is now available." From Joe English.
Links:
[CR: 19970422]
SGML-Tools "is a text-formatting package based on SGML (Standard Generalized Markup Language), which allows you to produce LaTeX, HTML, GNU info, LyX, RTF, and plain ASCII (via groff) from a single source; due to the flexible nature of SGML many other target formats are possible. This system is tailored for writing technical software documentation, an example of which are the Linux HOWTO documents. However, there is nothing Linux-specific about this package; it can be used for many other types of documentation on many other systems. It should be useful for all kinds of printed and online documentation. The package was formerly called Linuxdoc-SGML because it originates from the Linux Documentation Project (LDP). The name has been changed into SGML-Tools to make it clearer that there is no Linux-specific stuff included in this package." Currently [April 1997] maintained by Cees de Groot.
Links:
[CR: 19970630]
See the main DSSSL entry for fuller information about sample application profiles, stylesheets, etc. In particular, see the DSSSL stylesheet for HTML 3.2 printouts, submitted by Jon Bosak, and the TEI-Lite DSSSL stylesheet from Richard Light.
[CR: 19970528]
Jade is a DSSSL engine that combines the style engine with the spgrove grove interface and four backends: "(a) a backend that generates an SGML representation of the flow object tree; (b) a backend that generates RTF (tested with Microsoft Word 97); (c) a backend that generates TeX; (d) a backend that generates SGML. This is used in conjunction with non-standard flow object classes to generate SGML, thus allowing Jade to be used for SGML transformations."
[CR: 19970603]
"This tool, which embeds a full R4RS Scheme interpreter in James Clark's SP parser, is designed both to provide an online syntax checker for all DSSSL expression, style and transformation language programs, and to serve as a preprocessor for any Scheme-embedded DSSSL implementation." [from version 1.0 announcement] "Version 2.0, providing a much richer implementation framework, including the ocre query language, is scheduled for 2Q97."
[CR: 19970602]
The announcement from R. Alexander Milowski (Copernican Solutions Incorporated) describes the DSSSL Developer's Toolkit (DSSSLTK) version 1.0, available as a downloadable distribution. The toolkit "is similar in nature to the applet or servlet architectures developed by Sun Microsystems/JavaSoft. . . a set of abstract interfaces written in Java to allow application developers to work with different Java-based DSSSL environments. . .[it] serves as an interface between different DSSSL components. It represents an architecture for building DSSSL-oriented systems using the Java programming language. . .[it] provides a means for different DSSSL implementations in Java to share components such as parsers, transformation engines and flow object semantics. The toolkit contains three Java packages: dsssl.engine, dsssl.grove, and dsssl.flowobject. . . Developed as part of the Seng DSSSL Environment from Copernican Solutions, the DSSSL Developer's Toolkit contains: (1) Full source code to the interfaces and classes; (2) Javadoc for the API reference; (3) Configuration and makefile utilities for building the distribution; (4) A prebuilt zip file containing all the classes."
Links:
[CR: 19970307]
"This program generates skeleton DSSSL specifications for DTDs from within PSGML. Emacs and PSGML are required."
[CR: 19970609]
"This file [psgml-jade.el] is an add-on to the psgml package for editing SGML files with Emacs which is intended to make menu-driven processing SGML files with jade and jadetex possible."
[CR: 19970624]
"Jadetex package, an implementation of the TeX skeleton produced by 'jade -t tex' . . . built on top of LaTeX." From Sebastian Rahtz (s.rahtz@elsevier.co.uk) and David Megginson.