                           --==| UNHTML v1.3 |==--

         (C)opyright 1996 by Jawed Karim <kari0022@gold.tc.umn.edu>



What's New
==========

UNHTML 1.3 has several improvements over 1.0:


        o The output file contains fewer empty lines, which
          reduces its size.

        o An ELF executable for Linux is included.

        o An editor can be launched after completion to
          manually edit the output file.

        o UNHTML counts how many HTML tags were removed.

        o The special characters '&' and ';' no longer cause
          trouble when they appear between '<' and '>'.


Instructions
============

XXXX unhtml v1.3 -- Removes HTML code from ascii files.
(C)opyright 1996 by Jawed Karim <kari0022@gold.tc.umn.edu>

syntax: unhtml <inputfile> <outputfile>

        
        <inputfile> : The file that contains HTML code.

        <outputfile>: After removing the HTML code, the text
                      will be written to this file.


        EXAMPLE: unhtml index.html index.txt

        This removes any HTML code from index.html and writes the
        plain text to the file index.txt.

        After completion, the following message will be displayed:

---------- Done. Removed 110 HTML tags ----------

edit index.txt manually [y] ?

        If you would like to edit the output file manually with a text
        editor, press 'y' at this point. If not, just hit enter. If you
        press 'y', UNHTML will run an editor command that depends on
        which system you are using:

        under Linux: command 'pico'  will be executed
        under MSDOS: command 'edit'  will be executed
        under OS/2 : command 'tedit' will be executed

        Should you get an error message under MSDOS or OS/2, create a
        batch file that points to an editor. For example, a DOS batch
        file could look like this:

        ---CUT HERE---
        c:\dos\edit %1
        ---CUT HERE---         
        
        Save this file as 'EDIT.BAT' in the same directory as UNHTML, or
        in any directory listed in your PATH variable.

        Similarly, the OS/2 batch file would look like this:

        ---CUT HERE---
        c:\os2\tedit.exe %1
        ---CUT HERE---
        
        Save this file as 'TEDIT.CMD' in the same directory as UNHTML, or
        in any directory listed in your PATH variable.

        Under Linux, if you get an error message, make a symbolic link
        that points to whichever editor you use. Name the link 'pico'.
        For more help, see: man ln
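
        The editor launch described above could be sketched in C roughly
        as follows. This is only an illustration, not the actual UNHTML
        source: the function names build_editor_command and
        launch_editor are hypothetical, and the use of the __EMX__ and
        __MSDOS__ compiler macros to pick the editor is an assumption.
        The editor names are the ones listed in this file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Pick the editor command named in this README for each system.
   Assumption: __EMX__ marks the OS/2 EMX build and __MSDOS__ the
   djgpp build; everything else is treated like Linux. */
#if defined(__EMX__)
#define EDITOR_CMD "tedit"
#elif defined(__MSDOS__)
#define EDITOR_CMD "edit"
#else
#define EDITOR_CMD "pico"
#endif

/* Hypothetical helper: writes "<editor> <file>" into buf.
   Returns 0 on success, -1 if the command does not fit.
   Note: file names containing spaces are not quoted here. */
int build_editor_command(char *buf, size_t size, const char *file)
{
    int n = snprintf(buf, size, "%s %s", EDITOR_CMD, file);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: runs the editor through the system's command
   interpreter, which is why an EDIT.BAT, TEDIT.CMD, or a symbolic
   link named 'pico' found via PATH can stand in for the real editor. */
int launch_editor(const char *file)
{
    char cmd[512];

    if (build_editor_command(cmd, sizeof cmd, file) != 0)
        return -1;
    return system(cmd);
}
```

        With this sketch, launch_editor("index.txt") would hand the
        command line to the command interpreter after the user answers
        'y' at the prompt.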


OS/2 Warp
=========

Compiler used: OS/2 EMX GCC v2.7.2

This executable requires the EMX Runtime v0.9b or higher. It is
available at:

ftp://hobbes.nmsu.edu/os2/unix/emx09b/emxrt.zip

This is worth getting since you will be able to use long filenames with
UNHTML for OS/2.


Linux
=====

Compiler used: GNU GCC v2.7.0

This ELF executable has been tested under Linux 1.2.13.


MSDOS
=====

Compiler used: djgpp GCC v2.6.3

Unless you are running UNHTML for MSDOS in an OS/2 or Windows (95/3.1/NT)
DOS window, you need to have the file CWSDPMI.EXE in a directory listed in
your PATH variable, or in the same directory as UNHTML.


Known Problems
==============

Right now, UNHTML assumes that HTML code follows any '&' or '<' character
and is terminated by ';' or '>'. The one exception is that '&' and ';' are
ignored when they appear between '<' and '>'. Therefore, any of these
characters that is not part of HTML code may cause problems. For example,
a bare '&' in ordinary text such as "AT&T" may be taken as the start of
HTML code, so the text after it can be lost.
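
The rule described above can be sketched as a small state machine in C.
This is a minimal illustration written for this file, not the actual UNHTML
source: the function name strip_html is hypothetical, and it is an
assumption that only '<'...'>' sequences are counted as tags.

```c
/* Hypothetical sketch: copies `in` to `out`, skipping everything
   between '<' and '>' and between '&' and ';'. Returns the number
   of '<'...'>' tags removed. `out` must have room for at least
   strlen(in) + 1 characters. */
int strip_html(const char *in, char *out)
{
    int in_tag = 0;     /* currently between '<' and '>' */
    int in_entity = 0;  /* currently between '&' and ';' */
    int tags = 0;

    for (; *in != '\0'; in++) {
        char c = *in;

        if (in_tag) {
            /* '&' and ';' are ignored here; only '>' ends the tag. */
            if (c == '>') {
                in_tag = 0;
                tags++;
            }
        } else if (in_entity) {
            if (c == ';')
                in_entity = 0;
        } else if (c == '<') {
            in_tag = 1;
        } else if (c == '&') {
            in_entity = 1;
        } else {
            *out++ = c;  /* ordinary text is copied unchanged */
        }
    }
    *out = '\0';
    return tags;
}
```

In this sketch, the input "AT&T rocks" yields only "AT", because the bare
'&' is never closed by a ';'. That is the kind of problem described in
this section.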


Where to find updates
=====================

New UNHTML versions will be posted on:

http://umn.edu/~kari0022

or search for "Jawed Karim" on Yahoo! (http://www.yahoo.com)

or email Jawed Karim at:

Jawed.Karim-1@umn.edu
kari0022@gold.tc.umn.edu

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.2

mQBtAzAEEsYAAAEDAKkXRZuRhuJ919uqvT4jzBRNw5Xi6+N5uH3QIoyPR1qeA3NW
60ji+3Yo2lOewzKrw0z8Aon5KsCfR/dAYJKpWIbQCI9WEedArFRxP48ClsHneWB9
VYmMQnpu4PUi2KOHDQAFEbQmSmF3ZWQgS2FyaW0gPGthcmkwMDIyQGdvbGQudGMu
dW1uLmVkdT4=
=O8+H
-----END PGP PUBLIC KEY BLOCK-----

