Producing HTML Documents with Scribe
This section is of interest only if you want to modify the HTML library and
need to understand how it works.
The mechanism turns out to be relatively simple, with just a few tricks.
The HTML raw font is normal, except that the four special characters &, <,
>, and " are redefined to emit the HTML entity codes &amp;, &lt;, &gt;,
and &quot;.
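As a rough illustration of this character mapping, here is a minimal Python
sketch. It is not part of the Scribe library; the names ENTITY_MAP and
emit_html_font are hypothetical and only model what the raw font does to each
character.

```python
# Hypothetical model of the HTML raw font: four characters are remapped
# to entity codes, every other character passes through unchanged.
ENTITY_MAP = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
}


def emit_html_font(text: str) -> str:
    """Return text with the four special characters replaced by entities."""
    return "".join(ENTITY_MAP.get(ch, ch) for ch in text)


print(emit_html_font('if a < b & c > "d"'))
```

Mapping one character at a time, rather than chaining string replacements,
avoids re-escaping the & that the entity codes themselves contain.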
To emit the raw characters &, <, >, and " themselves, a second raw font,
HTMLsymbols, is defined that maps all of these characters directly. For most
uses, the commands in Section 8 should be used. If you are desperate, you can
use the @H(char) environment, which will emit the raw text.
Environments that map directly to HTML constructs, such as itemize, enumerate,
example, etc., are handled by specifying BeforeEntry and AfterExit to insert
the tags. Per-paragraph tags, such as <LI> and <DT>, are inserted using the
Numbered environment attribute.
Headings and chapter title environments are handled the same way, with the
<H1>, <H2>, etc. tags.
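The effect of these three attributes can be modelled with a short Python
sketch. This is only an analogy, not Scribe code: render_environment is a
hypothetical function, and the use of <UL> as the tag pair for itemize is an
assumption made for the example.

```python
def render_environment(before_entry, after_exit, numbered, paragraphs):
    """Model an environment whose attributes insert HTML tags.

    before_entry / after_exit: strings emitted once around the environment.
    numbered: string emitted at the start of every paragraph.
    """
    lines = [before_entry]
    for paragraph in paragraphs:
        lines.append(numbered + paragraph)
    lines.append(after_exit)
    return "\n".join(lines)


# Assumed tag choice for an itemize-like environment.
print(render_environment("<UL>", "</UL>", "<LI>", ["first item", "second item"]))
```

Because the per-paragraph string is attached to every paragraph, each
paragraph inside such an environment becomes its own item.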
The <P> separator between paragraphs is inserted by making the text
environment numbered, with <P> as the numbering string. Because both the text
and list-making environments use the numbered attribute to insert tags,
@multiple cannot be used to group several paragraphs into one list item.
Scribe counts each character output to the .html file as a significant
character -- it thinks that the string "<B>bold</B>" is 11 characters long,
even though it will be seen by the user as only 4 characters. This can
severely disrupt the alignment of text in formatted environments. When anchors
are involved, where dozens of characters may be "invisible", line breaks may be
affected enough to make the formatting intolerable.
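The mismatch is easy to reproduce outside Scribe. The following Python sketch
compares the number of characters written with the number a reader would see;
the function names and the simple tag pattern are assumptions for
illustration, and the pattern does not attempt to handle entity codes.

```python
import re

# Simplified pattern: anything between < and > is treated as a tag.
TAG_PATTERN = re.compile(r"<[^>]*>")


def counted_width(text: str) -> int:
    """Width as a formatter that counts every output character sees it."""
    return len(text)


def visible_width(text: str) -> int:
    """Width as the reader sees it once the tags are interpreted."""
    return len(TAG_PATTERN.sub("", text))


for sample in ("<B>bold</B>", '<A HREF="#sec">link</A>'):
    print(sample, counted_width(sample), visible_width(sample))
```

The anchor example shows how quickly the gap grows: both samples display four
characters, but the second occupies far more of the formatter's line.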
The solution used in the HTML device definition is that all such tags are
output with tag, which makes Scribe output the tag, then the same number of
backspace characters, so that it treats the string as having zero width. The
awk script removes the backspace characters.
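The backspace trick can be sketched in Python as follows. This is a model
under stated assumptions, not the actual device definition or the awk script:
it assumes the formatter counts a backspace as moving one column left, and
zero_width_tag, formatter_width, and strip_backspaces are hypothetical names.

```python
BACKSPACE = "\b"


def zero_width_tag(tag: str) -> str:
    """Emit the tag followed by one backspace per tag character."""
    return tag + BACKSPACE * len(tag)


def formatter_width(text: str) -> int:
    """Assumed width model: each backspace cancels one counted character."""
    return sum(-1 if ch == BACKSPACE else 1 for ch in text)


def strip_backspaces(text: str) -> str:
    """Post-processing step: drop the backspaces, leaving clean HTML."""
    return text.replace(BACKSPACE, "")


line = zero_width_tag("<B>") + "bold" + zero_width_tag("</B>")
print(formatter_width(line))
print(strip_backspaces(line))
```

Under this model the formatter sees only the four characters of the word,
while the post-processed output still contains the complete tags.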
A better solution would have been to define a font with zero-width characters
for printing HTML tags. Unfortunately, Scribe did not interpret this properly
when we tried it.
The Scribe facility for generated portions is heavily used. The most
important generated portion is tags, which contains @string definitions for
cross references. A combination Scribe/HTML tag is generated for each:
This combination tag consists of the following components:
For chapters and sections, the Scribe tag and the HTML anchor are placed
slightly differently: The Scribe tag is located after the @Section or @Chapter
command, so that it will have the correct number associated with it. The HTML
anchor is placed immediately before the @Section or @Chapter command, so that
the heading will be visible when the link is referenced.
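The resulting order in the HTML output can be pictured with a small Python
sketch. The function name, the anchor name, and the use of an empty
<A NAME=...> element are all assumptions for illustration; the source does
not show the exact markup that is generated.

```python
def heading_with_anchor(name: str, level: int, title: str) -> str:
    """Place the anchor before the heading so a link lands above the title."""
    anchor = f'<A NAME="{name}"></A>'
    heading = f"<H{level}>{title}</H{level}>"
    return anchor + "\n" + heading


# Hypothetical anchor name for a section heading.
print(heading_with_anchor("links", 2, "Document Structuring and Links"))
```

If the anchor followed the heading instead, a browser jumping to it could
scroll the heading itself out of view.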
The following strings are defined in the tags generated portion:
For each chapter (inline or not), a generated portion is used to contain the
hypertext links to the sections in that chapter. The @HTMLcontents textform
does a @backplace of the portion to insert the chapter contents at that point.
12. How the Conversion and Structuring Works
12.1. The HTML Device Type
12.2. HTML tags vs. Scribe Character Counting
12.3. Document Structuring and Links
Copyright (C) 1993, 1994 Digital Equipment Corporation
Glenn Trewitt, trewitt@pa.dec.com