Scribe to HTML: How the Conversion and Structuring Works

Producing HTML Documents with Scribe


12. How the Conversion and Structuring Works

The material in this section is of interest only if you want to modify the HTML library and need to understand how it works.

12.1. The HTML Device Type

The HTML device type definition turns out to be relatively simple, with just a few tricks.

The HTML raw font maps characters normally, except that the four special characters &, <, >, and " are redefined to emit the HTML entity codes &amp;, &lt;, &gt;, and &quot;.

To emit the literal characters &, <, >, and ", a second raw font, HTMLsymbols, is defined that maps these characters directly. For most uses, the commands in Section 8 should be used instead. If you are desperate, you can use the @H(char) environment, which emits its contents as raw text.
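As a rough illustration of the two mappings, the Python sketch below models each font as a per-character substitution. The names ENTITY_MAP, map_normal_font, and map_html_symbols are hypothetical; the real mapping lives in the Scribe device definition, not in code like this.

```python
# Hypothetical model: entity codes the HTML raw font emits for the four
# special characters. All other characters pass through unchanged.
ENTITY_MAP = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def map_normal_font(text: str) -> str:
    """Model of the raw font: special characters become entity codes."""
    # Per-character lookup, so an & produced by a substitution is never re-escaped.
    return "".join(ENTITY_MAP.get(ch, ch) for ch in text)


def map_html_symbols(text: str) -> str:
    """Model of the HTMLsymbols font: every character passes through unchanged."""
    return text


print(map_normal_font('if a < b && c > "d"'))
# if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
print(map_html_symbols("<B>"))
# <B>
```

The split into two fonts means ordinary document text can never accidentally produce markup, while the HTML library itself still has a way to emit real tags.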

Environments that map directly to HTML constructs, such as itemize, enumerate, and example, are handled by specifying the BeforeEntry and AfterExit attributes to insert the opening and closing tags. Per-paragraph tags, such as <LI> and <DT>, are inserted using the Numbered environment attribute.

Headings and chapter title environments are handled the same way, with the <H1>, <H2>, etc. tags.

The <P> separator between paragraphs is inserted by giving the text environment a Numbered attribute whose string is <P>. Since both the text and list-making environments use the Numbered attribute to insert tags, @multiple cannot be used to group several paragraphs into one list item.
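The following Python sketch models how BeforeEntry, AfterExit, and the Numbered attribute combine to produce HTML. The Environment class, the render function, and the one-part-per-line output are hypothetical simplifications of Scribe's environment machinery, not its actual behavior in every detail.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical stand-in for a Scribe environment definition."""
    before_entry: str = ""  # plays the role of BeforeEntry
    after_exit: str = ""    # plays the role of AfterExit
    numbered: str = ""      # plays the role of Numbered: prefix for every paragraph


def render(env: Environment, paragraphs: list[str]) -> str:
    """Wrap the paragraphs in the environment's tags, prefixing each paragraph."""
    parts = [env.before_entry]
    for para in paragraphs:
        parts.append(env.numbered + para)
    parts.append(env.after_exit)
    # Drop empty strings, e.g. when an environment has no BeforeEntry.
    return "\n".join(p for p in parts if p)


ITEMIZE = Environment(before_entry="<UL>", after_exit="</UL>", numbered="<LI>")
TEXT = Environment(numbered="<P>")
HEADING = Environment(before_entry="<H2>", after_exit="</H2>")

print(render(ITEMIZE, ["first item", "second item"]))
# <UL>
# <LI>first item
# <LI>second item
# </UL>
print(render(TEXT, ["one", "two"]))
# <P>one
# <P>two
```

Headings follow the same pattern as ITEMIZE, only without a per-paragraph prefix.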

12.2. HTML tags vs. Scribe Character Counting

Scribe counts each character output to the .html file as a significant character: it thinks that the string "<B>bold</B>" is 11 characters long, even though the user sees only 4 characters. This can severely disrupt the alignment of text in formatted environments. Anchors are worse, since dozens of characters may be "invisible", and line breaks may shift enough to make the formatting intolerable.
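A quick way to see the mismatch is to compare the two counts directly. This Python sketch, with hypothetical helper names, treats every character as significant, as Scribe does, and approximates what the reader sees by stripping anything between < and >. It ignores entities such as &amp;, which also occupy several characters in the file but display as one.

```python
import re

# Anything from "<" to the next ">" is treated as an invisible tag.
TAG_RE = re.compile(r"<[^>]*>")


def raw_width(s: str) -> int:
    """Width as Scribe counts it: every character sent to the .html file."""
    return len(s)


def visible_width(s: str) -> int:
    """Approximate width the reader sees once the browser consumes the tags."""
    return len(TAG_RE.sub("", s))


for s in ["<B>bold</B>", '<A HREF="device.html#tags">tags</A>']:
    print(raw_width(s), visible_width(s))
# prints "11 4", then "35 4"
```

The anchor example (with a hypothetical file name) shows why links are the worst case: 31 of its 35 characters are invisible.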

The solution used in the HTML device definition is to output all such tags with tag, which makes Scribe output the tag followed by the same number of backspace characters, so that it treats the string as having zero width. The awk script then removes the backspace characters.
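The backspace trick can be modeled as follows. This is a minimal sketch assuming Scribe moves back one column per backspace; zero_width_tag, scribe_width, and strip_backspaces are hypothetical names, and strip_backspaces merely stands in for the awk script's post-processing step.

```python
BS = "\b"  # backspace character


def zero_width_tag(tag: str) -> str:
    """Emit a tag followed by as many backspaces as it has characters."""
    return tag + BS * len(tag)


def scribe_width(s: str) -> int:
    """Assumed column counting: each character advances, each backspace retreats."""
    width = 0
    for ch in s:
        width = width - 1 if ch == BS else width + 1
    return width


def strip_backspaces(s: str) -> str:
    """Stand-in for the awk post-processing that deletes the backspaces."""
    return s.replace(BS, "")


line = zero_width_tag("<B>") + "bold" + zero_width_tag("</B>")
print(scribe_width(line))
# 4
print(strip_backspaces(line))
# <B>bold</B>
```

Each tag contributes its length forward and the same length backward, so its net width is zero, and the final HTML is left unchanged once the backspaces are stripped.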

A better solution would have been to define a font with zero-width characters for printing HTML tags. Unfortunately, Scribe did not interpret such a font properly when we tried it.

12.3. Document Structuring and Links

The Scribe facility for generated portions is used heavily. The most important generated portion is tags, which contains @string definitions for cross-references. A combination Scribe/HTML tag is generated for each:

This combination tag consists of the following components:

For chapters and sections, the Scribe tag and the HTML anchor are placed slightly differently: The Scribe tag is located after the @Section or @Chapter command, so that it will have the correct number associated with it. The HTML anchor is placed immediately before the @Section or @Chapter command, so that the heading will be visible when the link is referenced.

The following strings are defined in the tags generated portion:

tag-file
HTML file that contains the tag.
tag
Name of the Chapter or Section (Only for chapter and section tags.)
tag-next
Next HTML file after this one. (Only for chapter/section tags that start a new file.)
tag-nextc
HTML file for the chapter after this one. (Only for chapter tags that start a new file.)
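To make the relationships among these strings concrete, the Python sketch below derives them from a list of document units in order. It reads tag as a placeholder for the actual tag name, and it assumes "next HTML file" means the file begun by the next unit that starts one. The Unit fields and file names are hypothetical, and the sketch builds a dictionary rather than @string definitions.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical record for one tagged place in the document, in order."""
    tag: str           # Scribe tag name, e.g. "intro"
    kind: str          # "chapter", "section", or anything else
    title: str         # heading text
    file: str          # HTML file that contains this unit
    starts_file: bool  # True if this unit begins a new HTML file


def tag_strings(units: list[Unit]) -> dict[str, str]:
    """Compute the strings a tags portion would define for each unit."""
    strings: dict[str, str] = {}
    for i, u in enumerate(units):
        strings[f"{u.tag}-file"] = u.file  # every tag gets a -file string
        if u.kind not in ("chapter", "section"):
            continue
        strings[u.tag] = u.title  # chapter/section name
        if not u.starts_file:
            continue
        later = units[i + 1:]
        # -next: the next file begun after this one, if any.
        next_file = next((v.file for v in later if v.starts_file), None)
        if next_file is not None:
            strings[f"{u.tag}-next"] = next_file
        # -nextc: the file of the following chapter (chapter tags only).
        if u.kind == "chapter":
            next_chap = next((v.file for v in later if v.kind == "chapter"), None)
            if next_chap is not None:
                strings[f"{u.tag}-nextc"] = next_chap
    return strings


units = [
    Unit("intro", "chapter", "Introduction", "intro.html", True),
    Unit("usage", "section", "Usage", "intro.html", False),
    Unit("conv", "chapter", "How the Conversion Works", "conv.html", True),
    Unit("device", "section", "The HTML Device Type", "device.html", True),
]
s = tag_strings(units)
print(s["intro-next"], s["intro-nextc"], s["conv-next"])
# conv.html conv.html device.html
```

Note that usage gets no -next string because it does not start a file, and conv gets no -nextc string because no chapter follows it.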

For each chapter (inline or not), a generated portion is used to contain the hypertext links to the sections in that chapter. The @HTMLcontents textform does a @backplace of the portion to insert the chapter contents at that point.
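The contents portion for one chapter might be assembled along these lines. This Python sketch assumes each link targets an anchor named after the section's tag in that section's file, and that the links are emitted as a <UL> list; neither detail is specified above, and the Section class and chapter_contents function are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Section:
    """Hypothetical record for one section of a chapter."""
    tag: str    # Scribe tag, assumed to double as the HTML anchor name
    title: str  # section heading text
    file: str   # HTML file containing the section


def chapter_contents(sections: list[Section]) -> str:
    """Build the link list that would be placed at the @HTMLcontents point."""
    lines = ["<UL>"]
    for sec in sections:
        href = f"{sec.file}#{sec.tag}"
        # escape() turns &, <, >, and " into entities, as the raw font does.
        lines.append(f'<LI><A HREF="{escape(href)}">{escape(sec.title)}</A>')
    lines.append("</UL>")
    return "\n".join(lines)


print(chapter_contents([
    Section("device", "The HTML Device Type", "device.html"),
    Section("counting", "HTML tags vs. Scribe Character Counting", "counting.html"),
]))
```

Collecting the links in a separate portion per chapter is what lets the contents appear at the top of the chapter even though the sections are only known after the whole chapter has been processed.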


Copyright (C) 1993, 1994 Digital Equipment Corporation
Glenn Trewitt, trewitt@pa.dec.com