                                     8                                         WASD HTTP Server9                                         -"Nuts and Bolts"       %                    5th September 1998   A                    Minor revision for v5.2 (still needs much more I                    work . . . ever see Michael Keaton in "Multiplicity"?)                       Supercedes:  *                      1st March 1998 (v5.0)-                      2nd November 1997 (v4.5)                       Abstract   A                    This document is a brief technical overview of C                    the design and coding for the WASD VMS HyperText H                    Transport Daemon. (New for version 5.0 release (MarchD                    1998) . . . actually unbundled from the Technical8                    Overview into a document of its own.)  H                    Also see "WASD Technical Environment" for informationG                    on server configuration and "WASD Hypertext Environ- H                    ment" for information on using the WASD VMS Hypertext                    Services.  H                    It is strongly suggested those using printed versionsI                    of this document also access the Hypertext version. It @                    provides online access to some examples, etc.                      Author   !                    Mark G. Daniel 8                    Senior Information Technology Officer2                    Wide Area Surveillance Division>                    Defence Science and Technology Organisation  2                    Mark.Daniel@dsto.defence.gov.au                 )                    +61 (8) 82596031 (bus) )                    +61 (8) 82596673 (fax)                       PO Box 1500                    Salisbury'                    South Australia 5108                       Printed Copy   F                    This book is available for printing to a PostScriptH                    printer. 
   Use a browser to access a print menu in this same location.

   Some of the online demonstrations may not work due to the local
   organisation of the Web environment differing from WASD where it was
   originally written.

WASD VMS Hypertext Services

   Copyright 1996-1998 Mark G. Daniel.

   This package is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; version 2 of the License, or any later
   version.

   This package is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this package; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Eric A. Young

   This package can include cryptographic software (SSLeay) copyright
   by Eric Young (eric@CryptSoft.com):

      This library is free for commercial and non-commercial use
      provided ... Eric Young should be given attribution as the
      author ... copyright notice is retained

MadGoat Software

   For supporting non-Digital-TCP/IP (UCX) this software uses the
   NETLIB package by Matthew Madison:

      permission is granted to copy and redistribute ... for no
      commercial gain

Ohio State University

   This package contains software provided with the OSU (DECthreads)
   HTTP server package, authored by David Jones:

      Copyright 1994,1997 The Ohio State University.
      The Ohio State University will not assert copyright with respect
      to reproduction, distribution, performance and/or modification
      of this program by any person or entity that ensures that all
      copies made, controlled or distributed by or for him or it bear
      appropriate acknowlegement of the developers of this program.

RSA Data Security

   This software contains code derived in part from RSA Data Security,
   Inc:

      permission granted to make and use derivative works provided that
      such works are identified as "derived from the RSA Data Security,
      Inc. MD5 Message-Digest Algorithm" in all material mentioning or
      referencing the derived work.

Bailey Brown Jr.

   LZW compression is implemented using code derived in part from the
   PBM suite. This code is copyright by the original author:

      * GIF Image compression - LZW algorithm implemented with Tree type
      *                         structure.
      *                         Written by Bailey Brown, Jr.
      *                         last change May 24, 1990
      *                         file: compgif.c
      *
      *  You may use or modify this code as you wish, as long as you
      *  mention my name in your documentation.

Other

   OpenVMS, Digital TCP/IP Services for OpenVMS, VAX C, DEC C, VAX and
   AXP are registered trademarks of Digital Equipment Corporation.

   MultiNet is a registered trademark of Cisco Systems, Inc.

   Pathway is a registered trademark of Attachmate, Inc.

   TCPware is a registered trademark of Process Software Corporation.


Contents

   Chapter 1  Brief Introduction to HTTPd Code

   Chapter 2  General Design

      2.1    Server Behaviour
      2.2    Multi-Threaded
      2.3    ASTs
      2.4    Tasks
      2.5    Memory Management
      2.6    Output Buffering
      2.7    Rule-Mapping
      2.8    Auto-Scripting
      2.9    Internal Directives and "Scripts"
      2.10   Server Security and Privileges

   Chapter 3  HTTPd Modules

      3.1    ADMIN.C
      3.2    AUTH.C
      3.3    BASIC.C
      3.4    CACHE.C
      3.5    CGI.C
      3.6    DCL.C
      3.7    DECNET.C
      3.8    DESCR.C
      3.9    DIGEST.C
      3.10   DIR.C
      3.11   FILE.C
      3.12   HTADMIN.C
      3.13   HTTPD.C
      3.14   ISMAP.C
      3.15   LOGGING.C
      3.16   MENU.C
      3.17   MSG.C
      3.18   NET.C
      3.19   PUT.C
      3.20   REQUEST.C
      3.21   SESOLA.C
      3.22   SSI.C
      3.23   STMLF.C
      3.24   SUPPORT.C
      3.25   UPD.C
      3.26   VM.C


Chapter 1 ______________________________________________________________

Brief Introduction to HTTPd Code

   This document is designed to be only a broad overview of the basic
   functionality of the HTTP server. It does not cover the full suite
   of WASD VMS Hypertext Services software.

   1997 NOTE: Some code descriptions have become very brief in the
   transition from version 3 to version 4 of the HTTPd, during which
   some significant changes occurred to data structures and processing.
   Also I have not been rigorous in my revision of the descriptions, so
   they may be misleading (though not grossly). Apologies!
   I'll blame time constraints, but more probably this is a
   rationalization of procrastination :^)

   1998 NOTE: Although updated for the version 5 server this document
   is still far from complete or satisfactory . . . but I'll have to
   stick to my 1997 excuses I'm afraid.

   The source code should also be consulted.


Chapter 2 ______________________________________________________________

General Design

2.1 Server Behaviour

   The HTTPd executes permanently on the server host, listening for
   client connection requests on TCP/IP port 80 (by default). It
   provides concurrent services for a (technically) unlimited number of
   clients (constrained only by the server resources). When a client
   connects the server performs the following tasks:

   1. creates a thread for this request (this term does not denote the
      use of DECthreads or other specific thread library, just a thread
      of execution, see Section 2.2)

   2. reads and analyzes the HTTP request sent, depending on the nature
      of the request . . .

      o  initiates I/O-driven transfer of the requested file, either
         from the file system or from the file cache

      o  initiates I/O-driven processing of an SSI file

      o  initiates I/O-driven interpretation of a menu file

      o  initiates I/O-driven directory listing

      o  initiates I/O-driven processing of a clickable-image mapping
         file

      o  initiates I/O-driven file/directory create/update

      o  initiates I/O-driven server administration

      o  initiates I/O-driven web file-system update

      o  spawns a subprocess to execute a CGI script with:

         -  SYS$COMMAND and SYS$OUTPUT assigned to intermediate
            mailboxes (essentially pipes)

         -  SYS$INPUT logical name providing a mailbox allowing the
            script to read the raw HTTP header and body (if any)

         -  CGIPLUSIN logical name providing a mailbox allowing a
            CGIplus script to read CGI variables

         -  CGI-compliant symbols representing the important CGI
            variables of the request

         -  for the life of the subprocess HTTPd:

            o  controls the essential behaviour of the subprocess via
               its SYS$COMMAND mailbox

            o  receives data written by the subprocess to its
               SYS$OUTPUT via the associated mailbox, writing this to
               the client

      o  connects to a DECnet object to execute a CGI or OSU script

   3. closes the connection to the client and disposes of the thread
      data structures

   For I/O intensive activities like file transfer and directory
   listing, the AST-driven code provides an efficient, multi-threaded
   environment for the concurrent serving of multiple clients.

2.2 Multi-Threaded

   The WASD HTTPd is written to exploit VMS operating system
   characteristics allowing the straightforward implementation of
   event-driven, multi-threaded code. Asynchronous System Traps (ASTs),
   or software interrupts, at the conclusion of an I/O (or other) event
   allow functions to be activated to post-process the event. The event
   traps are automatically queued on a FIFO basis, allowing a series of
   events to be sequentially processed. When not responding to an event
   the process is quiescent, or otherwise occupied, effectively
   interleaving I/O and processing, and allowing sophisticated client
   multi-threading.

   Multi-threaded code is inherently more complex than single-threaded
   code, and there are issues involved in the synchronization of some
   activities in such an environment. Fortunately VMS handles many of
   these issues internally. After connection acceptance, all of the
   processing done within the server is at USER mode AST delivery
   level, and for all intents and purposes the processing done therein
   is atomic, implicitly handling its own synchronization issues.

   The HTTPd is written to make longer duration activities, such as the
   transfer of a file's contents, event-driven.
   Other, shorter duration activities, such as accepting a client
   connection request, are handled synchronously.

   It is worth noting that with asynchronous, AST-driven output, the
   data being written must be guaranteed to exist without modification
   for the duration of the write (its end indicated by completion AST
   delivery). This means data written must be static or in buffers that
   persist with the thread. Function-local (automatic) storage cannot
   be used. The server allocates dynamic storage for general (e.g.
   output buffering) or specific (e.g. response headers) uses.

2.3 ASTs

   With server functions having AST capability, in particular $QIO,
   the server is designed to rely on the AST routine to report any
   error, including both those that occur during the IO operation and
   any that occur when initiating the IO (which would normally prevent
   it being queued), even if that requires directly setting the IO
   status block with the offending status and explicitly declaring the
   AST. This eliminates any ambiguity about the conditions under which
   ASTs are delivered . . . ASTs are always delivered.

   If a call to a server function with AST capability does not supply
   an AST routine then the caller must check the return status to
   determine whether it can continue processing. If it supplies an AST
   routine address then it must not act on any error status returned;
   it must allow the AST routine to process according to the IO status
   block status.
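
   The "ASTs are always delivered" convention of Section 2.3 can be
   sketched in portable C. This is only an illustration under stated
   assumptions, not the server's code: the names IoStatusBlock, Request,
   QueueWrite(), NetWrite() and WriteAst() and the two status values
   are hypothetical, no VMS system service is called, and the
   "completion" is delivered by a direct function call rather than by
   the operating system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values for this sketch (not VMS condition codes). */
enum { IO_SUCCESS = 1, IO_FAILED = 2 };

/* Stand-in for an IO status block. */
typedef struct {
    int status;     /* final status of the operation */
    size_t count;   /* bytes transferred */
} IoStatusBlock;

typedef void (*AstRoutine)(void *param);

/* Per-request data; it must outlive the asynchronous operation. */
typedef struct {
    IoStatusBlock iosb;
    int ast_count;    /* number of completion ASTs delivered */
    int last_status;  /* status seen by the most recent AST */
} Request;

/* Completion routine: the single place where the outcome is examined. */
void WriteAst(void *param)
{
    Request *rq = param;
    rq->ast_count++;
    rq->last_status = rq->iosb.status;
}

/*
 * Stand-in for the queuing layer. It fails at initiation (nothing is
 * queued, no AST delivered) when given no data. On success the
 * completion is delivered immediately to keep the sketch runnable; a
 * real system would deliver it later.
 */
int QueueWrite(IoStatusBlock *iosb, const char *data, size_t len,
               AstRoutine ast, void *param)
{
    if (data == NULL)
        return IO_FAILED;
    iosb->status = IO_SUCCESS;
    iosb->count = len;
    if (ast != NULL)
        ast(param);
    return IO_SUCCESS;
}

/*
 * Wrapper following the convention: when an AST routine is supplied
 * and the IO cannot even be queued, the status block is set by hand
 * and the AST is declared explicitly, so the AST always runs.
 */
int NetWrite(Request *rq, const char *data, size_t len, AstRoutine ast)
{
    int status = QueueWrite(&rq->iosb, data, len, ast, rq);
    if (status != IO_SUCCESS && ast != NULL) {
        rq->iosb.status = status;
        rq->iosb.count = 0;
        ast(rq);
    }
    return status;
}
```

   A caller that supplies WriteAst ignores the value returned by
   NetWrite() and lets the AST routine act on the status block; a
   caller that passes NULL must check the returned status itself.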

2.4 Tasks

   Each request can have one or more tasks executed sequentially to
   fulfil the request. This occurs most obviously with Server-Side
   Includes (SSI, the HTML pre-processor) but also, to a more limited
   extent, with directory listing and its read-me file inclusion. A
   task is more-or-less defined as one of:

   o  transfer file from file-system

   o  transfer file from cache

   o  directory listing

   o  SSI interpretation

   o  menu interpretation

   o  DCL execution or script processing

   o  DECnet script processing

   o  POST/PUT processing

   o  update facility processing

   Each one of the associated modules executes relatively
   independently. Before commencing a task, a next-task pointer can be
   set to the function required to execute at the conclusion of that
   task. At that conclusion, the next-task functionality checks for a
   specified task to start or continue. If one has been specified,
   control is passed to that next-task function via an AST.

   Some tasks can only be called once per request, for example image
   mapping, file transfer using cache, file upload and menu
   interpretation.

   Other tasks have the possibility of being called within other tasks
   or multiple times serially during a request.
   An example is the transfer file task (non-cache), which can be used
   within directory listings to insert read-me files, and when using
   <!--#include to bring multiple files into an SSI document.

   Two tasks, the directory listing and SSI interpretation tasks, can
   be called multiple times and can also have concurrent instances
   running. For example, an SSI file can <!--#include another SSI file,
   nesting the SSI execution. The same SSI document can have an
   embedded directory listing that contains an SSI read-me file with
   another directory listing. It can get quite convoluted! The tasks
   are implemented using a linked-list FILO stack allowing this
   nesting. SSI documents have a maximum depth for nesting, preventing
   recursive document inclusion.

2.5 Memory Management

   Memory management is exclusively done using VMS system library
   virtual memory routines. Using these rather than generic C library
   routines is a deliberate design decision, made with the following
   considerations.

   o  The library routines allow more precise integrity checking and
      error reporting for both the allocation and freeing of dynamic
      memory chunks.

   o  Separate zones provide some measure of isolation between threads
      of usage and in this way assist in isolating any errors in memory
      usage.

   o  Separate zones may be created with characteristics tailored to
      specific memory request profiles, reducing overhead and improving
      performance.

   o  A separate zone may be used for each request thread, improving
      deallocation performance at request disposal.

   o  Memory behaviour for the various aspects of server usage is more
      easily monitored where separate zones represent distinct usages.

   Per-request memory is managed in three distinct portions.

   1. A fixed-size structure of dynamic memory is used to contain the
      core request thread data. This is allocated from a specific
      virtual memory zone tailored for fixed-size management, and is
      released at thread disposal.

   2. A heap of dynamically allocated memory is maintained during the
      life of a thread structure.

      When a dynamic structure is required during request processing it
      is allocated from a request-thread-specific zone of virtual
      memory. This zone is released in one operation at thread
      disposal, making it quite efficient. Maintaining a
      thread-specific heap of virtual memory also makes it easier to
      avoid memory leakage.

   3. Per-task data structures are allocated using the above heap.

      These structures are used to store task-specific data.
      If a task is used multiple times within the one request (see
      above) the previously allocated and now finished-with (but not
      deallocated) task structures can be reused, reducing overhead.

2.6 Output Buffering

   To reduce the number of individual network writes, and thus provide
   significant improvements in efficiency, generated output can be
   buffered into larger packets before sending to the client. Not all
   modules use this (e.g. FILE.C) and not all modules use it all of the
   time, but all modules work to implement a seamless integration of
   output via this mechanism (best seen in the SSI.C module).

   The output buffer functionality underwent a complete redesign for
   v5.0. It is now based on a list of one or more buffers that can be
   used in two modes.

   1. When both an AST address and data to be buffered are supplied,
      the buffering function operates to fill one entire buffer,
      overflowing into a second linked into the list. When that
      overflow occurs the first is written to the network
      asynchronously (calling the supplied AST when complete) and the
      second is moved to the head of the list, effectively to the front
      of the buffer, and so on.

   2. When no AST address is supplied with the data to be buffered, it
      keeps on filling buffers and adding others to the tail of the
      list as required, creating a virtual buffer with no fixed length.

   The first mode is used for general buffering (e.g. SSI and directory
   listings), streaming data to the client in a sequence of larger
   aggregates. The second mode is useful for functions that must block
   (e.g. those reporting on data structures such as the file cache),
   write a lot of output for a report, and do not want to block general
   server activity for a long-ish period due to network throughput
   (e.g. again the caching reports). In these cases the entire report
   can be written to buffer, then simply asynchronously output,
   unblocking any resource it may have held.

2.7 Rule-Mapping

   A fundamental aspect of any HTTPd implementation is the rule mapping
   used to create a logical structure for the Web file system. The
   HTTPd mapping function is designed to be flexible enough that script
   programs can also use it. As a result it is text-file based, and
   opened and read when mapping. This method of mapping provides a good
   deal of flexibility, coupled with acceptable performance. The
   function has received a high level of attention in an effort to
   optimize it.

2.8 Auto-Scripting

   The WASD VMS HTTP server has the facility to automatically invoke a
   script to process a non-HTML document (file). This facility is based
   on detecting the MIME content data type (via the file's extension)
   and causing a transparent, local redirection, invoking the script as
   if it was specified in the original request.

2.9 Internal Directives and "Scripts"

   The HTTPd server detects certain paths and query strings as
   directives about its behaviour.
   Certain paths are interpreted as pseudo, or internal, scripts,
   handled internally by the server. Other directives are passed in the
   query string component of the request and, as reserved sequences,
   cannot occur in normal requests (an unlikely combination of
   characters has been selected).

2.10 Server Security and Privileges

   As a major security design criterion the WASD environment has
   specified the use of a non-privileged, non-SYSTEM, non-system-group
   server account. In this way it begins with a fairly restricted and
   safe base, resources limited to those world-accessible or explicitly
   allowed to the server account. For access to selected, essential
   resources (such as privileged IP ports, for example 80) selected
   privileges are enabled only on an as-required basis, then disabled
   as soon as the need for that privilege has passed. Hence, the
   executable is installed with the minimum required extended
   privileges, which are enabled and used only as required during the
   course of processing. The server program is almost always executing
   with only NETMBX and TMPMBX enabled . . . in other words as a
   completely average VMS user!

   Extended privileges are required for the purposes listed below:

   o  ALTPRI -  Allows the server account to raise its priority above 4
      if enabled by the /PRIORITY= qualifier.

   o  SYSPRV -  Used for various purposes, including creating sockets
      within the privileged port range (1-1023) which includes port 80
      of course.
      SYSPRV is used when accessing configuration files (which can be
      protected from world access), and to ensure the server can
      stream-LF convert a file. It is also extensively used to enable
      AUTHORIZED write access to the file system. If the authorization
      configuration is set up to allow write access to selected
      portions of the Web-space (by default it is not; this is up to
      the local site to configure) SYSPRV is enabled just before a file
      is sys$create()ed and then immediately disabled. If SYSUAF
      authentication is enabled (by default it is not) then SYSPRV is
      enabled just before sys$getuai() is used to check a user's
      password, then immediately disabled.

   o  SYSNAM -  Allows the server to write into the LNM$SYSTEM logical
      name table (for HTTPDMON-required logicals).

   o  PRMMBX -  Used by the subprocess scripting module to create
      permanent mailboxes (much more efficient than creating a new set
      with each script subprocess).

   o  PSWAPM -  Allows the server process to prevent itself from being
      swapped out if specified by the /[NO]SWAP qualifier.

   Not that the author doesn't have at least some confidence in his
   code ;^) but he has also placed a sanity checker which, when the
   server becomes quiescent, establishes that only the NETMBX and
   TMPMBX privileges are enabled. The server will exit with an error
   message if any extended privileges are enabled at the time of the
   check.
   (During development in 1997 this check discovered an instance where
   an EnableSysPrv() call had inadvertently been coded instead of a
   DisableSysPrv() call :^( so it does work in real life :^)

   The capacity for the server to write into the file system is a major
   concern, and a lot of care has been taken to make it as secure as
   possible. Of course there is always the chance of a problem :^( The
   main defence against a system design or programming problem allowing
   write access to the file system is having the server account as a
   separate user and group (and definitely non-SYSTEM). In this way a
   part of the file system must explicitly have write access granted to
   the server account for it to be able to write into the file system
   (or for it to have world write access ... but then what is the
   problem with server access if the world has access?)
This is recommended to be done using an ACE (see the Technical
Overview).


Chapter 3

HTTPd Modules

The HTTPd server comprises several main modules, implementing the
obvious functionality of the server, and other, smaller, support
modules.

o  ADMIN.C (Section 3.1)

o  AUTH.C (Section 3.2)

o  BASIC.C (Section 3.3)

o  CACHE.C (Section 3.4)

o  CGI.C (Section 3.5)

o  DCL.C (Section 3.6)

o  DECNET.C (Section 3.7)

o  DESCR.C (Section 3.8)

o  DIGEST.C (Section 3.9)

o  DIR.C (Section 3.10)

o  FILE.C (Section 3.11)

o  HTADMIN.C (Section 3.12)

o  HTTPD.C (Section 3.13)

o  ISMAP.C (Section 3.14)

o  LOGGING.C (Section 3.15)

o  MENU.C (Section 3.16)

o  MSG.C (Section 3.17)

o  NET.C (Section 3.18)

o  PUT.C (Section 3.19)

o  REQUEST.C (Section 3.20)

o  SESOLA.C (Section 3.21)

o  SSI.C (Section 3.22)

o  STMLF.C (Section 3.23)

o  SUPPORT.C (Section 3.24)

o  UPD.C (Section 3.25)

o  VM.C (Section 3.26)

o  WASD.H  The main header file containing all data structures, etc.

3.1 ADMIN.C

INCOMPLETE AS YET!

This module provides the on-line server administration menu and
functionality. Some administration pages are provided by the Upd.c
module, "piggy-backed" into normal editing dialogues.

3.2 AUTH.C

INCOMPLETE AS YET!

Sorry, it's a fairly complex module so I'll have to plead being too
busy!

The HTAdmin.c module helps administer the authentication databases.

The authorization module handles user authentication and path method
authorization for all requests received by the server. Authenticated
username/password information is cached in a balanced binary tree,
improving performance compared to on-disk checking each time a request
is received.

HTTPd-specific authentication databases are binary files with
fixed-length 512-byte records. Within the record is provision for:

o  username

o  VMS-hashed Basic password

o  MD5-generated Digest passwords

o  capabilities (can the user read, write, etc.)

o  contact details

o  significant dates

o  significant counters

Server-host SYSUAF authentication is also provided.

3.3 BASIC.C

This module provides authentication functionality for the BASIC
method.

3.4 CACHE.C

This module implements a file data and revision time cache.
It is different to most other modules in that it doesn't have a "task"
structure. The small amount of storage required is integrated into the
request structure. It is designed to provide an efficient static
document request (file transfer) mechanism. Unlike the file module,
which may interleave its activities within those of other modules
(e.g. the directory module using it to provide read-me information), it
can only be used once, stand-alone, per request.

Cache data is loaded by the file module while concurrently transferring
that data to the original requesting client, using buffer space
supplied by the cache module, space that can then be retained for reuse
as cache. Hence the cache load adds no significant overhead to the
actual reading and initial transfer of the file.

Space for a file's data is dynamically allocated, and reallocated if
necessary as cache entries are reused. It is allocated in
user-specifiable chunks. It is expected this mechanism provides some
efficiencies when reusing cache entries. Memory may be reclaimed from
entries towards the end of the list (least used) if this is required
for the file data of one requiring loading. The process is termed
memory scavenging.

Cache entries are maintained in a linked list with the most recent and
most frequently hit entries towards the head of the list. A
case-insensitive hash index into the list entries is maintained.

The search is based on three factors.
A simple, efficiently generated, case-insensitive hash value providing
a rapid but inconclusive index into the cached paths. Secondarily, the
length of the two paths. Finally a conclusive, case-insensitive string
comparison if the previous two tests were matches. When the paths do
not match a collision list allows rapid subsequent searching.

The linked-list organisation also allows a simple implementation of a
least-recently-used (LRU) algorithm for selecting an entry when a new
request demands an entry and space for cache loading. The linked list
is naturally ordered from most recently and most frequently accessed at
the head, to the least recently and least frequently accessed at the
tail. Hence an infrequently accessed entry is selected from the tail
end of the list, its data invalidated and given to the new request for
cache load. Invalidated data cache entries are also immediately placed
at the tail of the list for reuse/reloading.

When a new entry is initially loaded it is placed at the top of the
list. Hits on other entries result in a check being made against the
number of hits of the head entry in the list. If the entry being hit
has a higher hit count it is placed at the head of the list, pushing
the previous head entry "down". If not then it is again checked against
the entry immediately before it in the list.
If higher then the two are swapped. This results in the most recently
loaded entries and the more frequently hit being nearest and migrating
towards the start of the search.

To help prevent the cache thrashing with floods of requests for not
currently loaded files, any entry that has a suitably high number of
hits over the recent past (suitably high ... how many is that, and
recent past ... how long is that?) is not reused until no hits have
occurred within that period. Hopefully this prevents lots of
unnecessary loads of one-offs at the expense of genuinely frequently
accessed files.

To prevent multiple loads of the same path/file, for instance if a
subsequent request demands the same file while a previous request is
still loading it, any subsequent request will merely transfer the file,
not concurrently load it into the cache.

Contents Validation

The cache will automatically revalidate the file data after a specified
number of seconds by comparing the original file revision time to the
current revision time. If different the file contents have changed and
the cache contents are declared invalid. If found invalid the file
transfer then continues outside of the cache with the new contents
being concurrently reloaded into the cache. Cache validation is also
always performed if the request uses "Pragma: no-cache" (i.e. as with
the Netscape Navigator reload function). Hence there is no need for any
explicit flushing of the cache under normal operation.
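The validation rule just described can be illustrated with a minimal
sketch. The structure and function names (CacheEntry,
cache_entry_usable()) and the use of time_t for revision times are
assumptions made for illustration; they are not taken from the WASD
source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache entry holding only the fields validation needs. */
typedef struct {
    time_t file_revised;    /* revision time recorded at cache load   */
    time_t last_validated;  /* when the entry was last checked        */
    bool   valid;           /* false once contents are declared stale */
} CacheEntry;

/*
 * Decide whether a cached entry may be served.
 * The revision times are compared only when the validation period has
 * expired or the client sent "Pragma: no-cache". Returns true to serve
 * from cache, false to transfer from the file (the caller would then
 * reload the cache concurrently).
 * Simplification: a real server would fetch current_revised only when
 * a check is due; passing it in keeps the sketch self-contained.
 */
bool cache_entry_usable(CacheEntry *entry, time_t now,
                        time_t current_revised, long validate_seconds,
                        bool pragma_no_cache)
{
    assert(entry != NULL);
    if (!entry->valid)
        return false;

    bool check_due = pragma_no_cache ||
        difftime(now, entry->last_validated) >= (double)validate_seconds;
    if (!check_due)
        return true;            /* still inside the validation period */

    entry->last_validated = now;
    if (current_revised != entry->file_revised) {
        entry->valid = false;   /* file changed: declare entry invalid */
        return false;
    }
    return true;
}
```

Because the comparison is skipped inside the validation period, a
changed file can be served from cache until the period expires. That
is the behaviour the text describes, and the reason a browser reload
is offered as the way to force a check.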
If a document does not immediately reflect any changes made to it
(i.e. validation time has not been reached) validation (and consequent
reload) can be "forced" with a browser reload. The entire cache may be
purged of cached data either from the server administration menu or
using command line server control.

3.5 CGI.C

The CGI module provides the CGI scripting support functions used by the
DCL.C and DECNET.C modules. These functions provide the following.

o  Using a buffer for storage, generate the DCL commands or CGIplus
   records required to provide the script with its CGI variables. For
   standard CGI this requires DCL commands to create DCL symbols
   containing the CGI variable information. DCL symbols with values up
   to 1024 characters are created using a series of assignments if
   necessary. For CGIplus scripts this involves a series of
   "name=value" pairs.

   When the buffer is returned to the calling routine, it is scanned
   from start to finish. For standard CGI each DCL command is parsed
   from the buffer and passed to the subprocess' SYS$INPUT command
   stream. For CGIplus each "name=value" pair is passed to the CGIplus
   data stream.

o  When the first record is output by a script the first line is
   examined for CGI compatibility. Depending on the contents of that
   first line (essentially the response header) subsequent server
   behaviour can vary.
   In particular, depending on the script-stated or server-determined
   content-type, output from the script may have its carriage control
   modified.

o  To provide correct carriage-control to the browser (each line
   terminated by a newline character) records output by the script may
   be examined for this trailing newline. If a text document/stream, a
   newline will be added to the end of a record (line) if not present.

3.6 DCL.C

MAJOR REVISION FOR v4.2!

The DCL execution functionality must interface and coordinate with an
external subprocess. It too is asynchronously driven by I/O once the
subprocess has been created and is executing independently.
Communication with the subprocess (IPC) is via mailboxes.

Process creation by the VMS operating system is notoriously slow and
expensive. This is an inescapable factor when scripting using child
processes within the environment. An obvious strategy is to avoid, at
least as much as possible, the creation of subprocesses. The only way
to do this is to share subprocesses between multiple scripts/requests.
The obvious complication becomes isolating the potential interactions
due to changes made by any script to the subprocess' environment. For
VMS these changes are basically symbol and logical name creation, and
files opened at the DCL level.
In reality few scripts need to make logical name changes and symbols
are easily removed between uses. DCL-opened files are a little more
problematic, but again, in reality most scripts doing file manipulation
will be images.

The conclusion arrived at is that for almost all environments scripts
can quite safely share subprocesses with great benefit to response
latency and system impact. If the local environment requires absolute
script isolation for some reason then this subprocess-persistence may
easily be disabled with a consequent trade-off on performance.

The term zombie is used affectionately to describe these subprocesses
when persisting between uses (the reason should be obvious, they are
neither "alive" (processing a request) nor are they "dead" (deleted)
:^)

The DCL facility is used in three modes:

1. To execute independent DCL commands.
   This is used to provide DCL command output for SSI (pre-processed
   HTML). Subprocesses will exist over multiple commands (if zombies
   are enabled).

2. To execute standard CGI scripts.
   Subprocesses will exist over multiple requests (if zombies are
   enabled).

3. To execute CGIplus scripts.
   Subprocesses will exist over multiple requests. Technically the
   CGIplus script only executes once and then remains blocking until a
   request is provided to it. It then processes the request, provides
   output, then blocks again.

The DCL module creates a data structure that allows subprocesses to be
managed independently of any request.
This allows both CGIplus and standard CGI subprocesses (in the form of
zombies) to persist across multiple requests. There is a fixed number
of subprocesses that can exist for all purposes. This is set by the
subprocess hard-limit configuration parameter. Each of these structures
is created and populated on an as-required basis, linked into a list
growing from zero to maximum based on demand and the life of the
server. Subprocesses can come and go depending on requirements. CGIplus
script subprocesses and any zombies are semi-permanent. In summary:

o  Four mailboxes and a SYS$OUTPUT data buffer are permanently
   associated with each DCL structure.

o  With CGIplus a subprocess is semi-permanently associated with the
   DCL structure.

o  Other activities, standard CGI scripts and DCL commands, can
   maintain similar semi-permanent subprocesses as zombies if
   configured, or they create and delete a subprocess for each use if
   zombie use is disabled.

o  For a CGIplus script request the list of data structures is searched
   for an idle subprocess providing the requested script. If one is
   found the request is allocated that structure. If all existing are
   busy or none exists at that time an idle zombie (if enabled) is
   searched for and used if found. Failing that an empty structure is
   searched for, a subprocess created and the request allocated to
   that.

o  For standard CGI script and DCL command execution an idle zombie (if
   enabled) is used.
   Failing that an empty structure (no associated subprocess) is
   searched for, a subprocess created and the request allocated to
   that.

o  Structures may be added to the list until a hard-limit is reached.
   However when a soft-limit threshold is reached the list is searched
   for idle CGIplus structures (i.e. subprocess executing but no
   associated request). The least used of any of these idle structures
   has the subprocess deleted, freeing that slot for reuse. This is
   termed purging.

The four mailboxes serve as the subprocess' IPC:

1. SYS$COMMAND
   This stream controls the subprocess execution, providing DCL
   commands to the subprocess' CLI. For DCL commands and standard CGI
   it creates DCL symbols representing the CGI variables then the
   command or script is executed.

2. SYS$OUTPUT
   The subprocess simply writes output to SYS$OUTPUT (<stdout>). Due to
   buffering in the C RTL binary-mode streams are more efficient and
   faster than record-mode. See any of the CGI applications in this
   package for example code for changing script <stdout> to
   binary-mode. CGIplus scripts must indicate the end of a single
   request's output using a special EOF string which is specifically
   detected by the output functions.
   As this mailbox persists between requests it is essential to ensure
   no output from a previous request lingers in the mailbox due to
   request cancellation or abnormal subprocess termination.

3. SYS$INPUT
   For CGI script execution, available for subprocess access to the
   HTTP data stream as <stdin>. A synonym exists for backward
   compatibility, the logical name HTTP$INPUT, a stream which can be
   explicitly opened.

   NOTE: Versions of the server prior to 4.3 supplied the full request
   (header then body) to the script. This was not fully CGI-compliant.
   Versions 4.3 and following supply only the body, although the
   previous behaviour may be explicitly selected by enabling a
   configuration parameter.

4. CGIPLUSIN
   For CGIplus this mailbox provides access to a request's CGI
   variables. The first line of any request can always be discarded
   (for synchronization) and the end of a request's variables is
   indicated by an empty record (blank line). For standard CGI and DCL
   commands this mailbox is not used.

DCL Module Processing

1. The primary DCL function ensures any required file specification
   exists (e.g. script procedure). It then allocates a slot to the
   request.
   Slot allocation is a very fundamental activity:

   A function writes to the CGIPLUSIN, creating a number of logical
   names, CGI-compliant symbol names and executing the command or
   invoking the execution of a DCL procedure, etc., and supplies the
   CGIplus variable stream if appropriate. If the use of zombies is
   enabled then DCL to clean up the environment as much as possible is
   provided first.

2. When the subprocess writes to the SYS$OUTPUT stream the I/O
   completion AST routine associated with reading that mailbox is
   called.

   If CGIplus script execution the I/O is always examined for the
   CGIplus end-of-output signature bits. It must always be at the start
   of the record and if detected the request is concluded.

   If CGI script execution, the first I/O from this stream is analyzed
   for CGI-compliance. It is determined whether a raw HTTP data stream
   will be supplied by the script, or whether the script will be
   CGI-compliant (requiring the addition of HTTP header, etc.) and
   whether HTTP carriage-control needs to be checked/added for each
   record.

   A CGI local redirection header (partial URL) is a special case. When
   this is received all output from the subprocess is suppressed until
   the script processing is ready to be concluded.
   At that time the "Location:" information of the header is used to
   reinitiate the request, using the same thread data structure.

   When normal SYS$OUTPUT processing is complete the record received
   can be handled in one of two ways. If it is raw HTTP it is
   asynchronously written to the network. The AST completion routine
   specified with the network write will queue another read from the
   subprocess' SYS$OUTPUT. If it is record-oriented I/O (e.g. from DCL
   output), it has its carriage-control checked for HTTP compliance
   before asynchronously writing the record. Hence a script supplying
   its own raw, HTTP-compliant data stream is much more efficient than
   line-by-line.

   The SYS$OUTPUT stream is a little problematic. For standard CGI and
   DCL command execution at subprocess exit there may be one or more
   records waiting in the mailbox for reading and subsequent writing to
   the client over the network, delaying processing conclusion.
   Detection of completion is accomplished by making each QIO sensitive
   to mailbox status via the SS$_NOWRITER status, which indicates there
   is no channel assigned to the mailbox, and the mailbox buffer is
   empty. It then becomes safe to dispose of the client thread without
   loss of data.

3. If CGI-script execution is for a POST or PUT method, the HTTP data
   stream made available is also AST driven.
   If the subprocess opens the stream and reads from it, the I/O
   completion routine called queues another asynchronous read from the
   buffered request header and/or body.

3.7 DECNET.C

The DECnet module provides scripting based on process management using
DECnet. Both standard WASD CGI scripting and an emulation of OSU
(DECthreads) scripting are supported. Both function by activating
specific DECnet object DCL procedures on the target node using
transparent task-to-task communication. These procedures act to set up
and control the scripting environment and script activation. With both
standard WASD CGI scripting and an emulation of OSU scripting being
supported, separate functions, and associated object procedures, are
provided to support the dialogs associated with each of these.

The DECnet node and task specification string is determined by
examination of the script specification. Connection to the object on
the node is made asynchronously. When established one of two dialogs is
maintained.

As of version 5.2 WASD provides reuse of DECnet connections for both
CGI and OSU scripting, in line with OSU v3.3 which provided reuse for
OSU scripts. This means multiple script requests can be made for the
cost of a single DECnet connection establishment and task object
activation. This functionality provides substantial performance
improvements.
It is implemented by maintaining a request-independent list of
connection information. The DECnet object procedures both have code
explicitly supporting reuse.

CGI

Using successive execution states the CGI dialog function handles
interaction with the fairly simple CGIWASD.COM procedure, used as the
DECnet object, and the script output.

The procedure may be viewed here: HT_ROOT:[SRC.OSU]CGIWASD.COM

This procedure is executed within the NETSERVER process on the remote
node. Its function is very simple. Processing in a loop, it receives a
record from the HTTP server and then executes it by substitution on the
command line. In this way DCL commands to set up the CGI environment
can be sent to the network process where they are executed, creating a
CGI environment in much the same way as is done with subprocess-based
CGI scripts. Once set up, the server sends the DCL command "GOTO DOIT"
which causes the procedure to branch out of the loop and to read one
last record from the server . . .
the actual DCL command to activate the script. After the script
finishes the procedure writes the end-of-output indicator to the server
which then concludes the script.

Once the CGI environment is set up and the script activation DCL is
sent the function assumes the role of accepting output from the script
over the network link, processing that as necessary for CGI compliance,
etc., and then writing the data to the client.

OSU

The behaviour for the OSU dialog has been determined from
reverse-engineering the OSU v3.1 'script_execute.c' module, and a
certain measure of trial-and-error.

Using successive execution states the OSU dialog function handles
interaction with the standard OSU WWWEXEC.COM procedure, used as the
DECnet object, and the script output.

The procedure may be viewed here: HT_ROOT:[SRC.OSU]WWWEXEC.COM

OSU scripts operate in two distinct and successive phases.

1. Dialog -  During part of this phase the script has not been
   activated and the link is in the process of setting up the script
   execution environment. The network object (WWWEXEC.COM) can request
   the HTTP server to supply specific data which it does by writing one
   or more records to the network link. The object then searches for
   and activates the script.

   The dialog phase is not yet complete for the script may now request
   the server to supply more data.
   The dialog phase ends when the script indicates to the server that
   it is ready to supply output. The script then enters the output
   phase.

2. Output -  During the output phase the server is responsible for
   ensuring the CGI compliance, or at least HTTP compliance, of that
   output. End-of-output is indicated by writing a special tag that the
   server detects.

   Output may be made in one of a number of modes. Basically these are
   raw, where the script is totally responsible for HTTP compliance,
   record, where each record must have correct carriage-control for a
   line enforced, and CGI, where CGI compliance is enforced.

   The output phase may be entered before any script is activated, such
   as when the DECnet object needs to report errors, for example the
   script file not being found.

3.8 DESCR.C

INCOMPLETE AS YET!

The Descr.c module generates a file description by searching HTML
files for the first occurrence of <TITLE>...</TITLE> or <Hn>...</Hn>
tags, using the description provided therein. It is primarily used by
the directory listing module, but can also be used by the menu module.

It does this search asynchronously (surprise-surprise!)

To asynchronously locate a description in an HTML file, the file is
opened and then each record asynchronously read and examined for the
<TITLE> element.
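Examining a single record for a title might look like the following
minimal sketch. The function names (find_nocase(), record_title()) are
hypothetical, the asynchronous record reads are left out, and for
simplicity both tags are assumed to lie within the same record. None
of this is taken from Descr.c itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle in hay; NULL if not present. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/*
 * Examine one record for <TITLE>...</TITLE> and copy the enclosed text
 * into out (at most out_size - 1 characters, always terminated).
 * Returns true if a complete title was found in this record.
 */
bool record_title(const char *record, char *out, size_t out_size)
{
    assert(record != NULL && out != NULL && out_size > 0);

    const char *start_tag = find_nocase(record, "<title>");
    if (start_tag == NULL)
        return false;

    const char *text = start_tag + strlen("<title>");
    const char *end_tag = find_nocase(text, "</title>");
    if (end_tag == NULL)
        return false;

    size_t len = (size_t)(end_tag - text);
    if (len >= out_size)
        len = out_size - 1;     /* truncate to fit the caller's buffer */
    memcpy(out, text, len);
    out[len] = '\0';
    return true;
}
```

A caller would apply record_title() to each record as it arrives and
stop reading at the first record for which it returns true.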
Once obtained a synchronous call is made to a function to list the
file details. After the file details are listed another asynchronous
search call is made, with the file search function specified for AST
completion. The function then immediately completes.

3.9 DIGEST.C

This module provides authentication functionality for the DIGEST
method.

This module uses code derived in part from RSA Data Security, Inc.,
under licence:

   granted to make and use derivative works provided that such works
   are identified as "derived from the RSA Data Security, Inc. MD5
   Message-Digest Algorithm" in all material mentioning or referencing
   the derived work.

3.10 DIR.C

There is some fairly complex and convoluted behaviour in this code!

This module implements the HTTPd directory listing functionality.
Directories are listed first, then files. File detail format is
customizable, with the default resembling the default CERN and NCSA
server layout. Output from this module is buffered to reduce network
writes, improving I/O efficiency. HTML files have the <TITLE></TITLE>
element extracted as a "Description" item.

Essential behaviour:

1. The primary function obtains the file specification from the
   request data structure. Server directives, controlling some features
   of the directory listing behaviour, are checked for and parsed out
   if present. The directory listing layout is initialized.
                  The directory specification (path information) is parsed
                  to obtain the directory and file name/type components.
                  After successfully parsing the specification it generates
                  an HTTP response header if required.

               2. Column headings and (possibly) a parent directory item are
                  buffered in an asynchronous function call. An RMS structure
                  is initialized to allow the asynchronous search for all
                  files in the specified directory ending in ".DIR".

               3. For each directory file found the directory search AST
                  completion function is called. Status is checked for
                  success or otherwise. If an error, the status is reported
                  to the client and the request processing concluded. If the
                  directory contained no directory files, or the directory
                  files are exhausted, a call to a function to begin a
                  listing of non-directory files is made and the function
                  then completes.

                  If a directory file was returned a synchronous call to list
                  the details of that directory is made and then another
                  asynchronous search call made with an AST completion
                  function again back to this function.

               4. When the directory files are exhausted the RMS structure is
                  reinitialized to allow the search for all
                  specification-matching, non-directory files in the
                  directory. An asynchronous search call is made.

               5. For each matching file found the file search AST completion
                  function is called. Status is checked for success or
                  otherwise.
                  If an error, the status is reported to the client and the
                  processing concluded. If the directory contained no
                  matching files, or the files are exhausted, the processing
                  is concluded and the function immediately completes.

                  If a file was returned a call is made to the Descr.c module
                  to check whether a file description can be obtained (HTML
                  files only). If it can then this module is used to generate
                  it and the function completes. If no description can be
                  obtained a synchronous call is made to a function to list
                  the file details. After the file details are listed another
                  asynchronous search call is made, with the same function
                  specified for AST completion. The function then immediately
                  completes.

               3.11 FILE.C

               <online hypertext link>

               This module implements the file transfer functionality.
               It obtains the file specification and MIME content type
               information from the request data structure. It handles
               VARIABLE or VFC files differently from STREAM, STREAM_LF,
               STREAM_CR, FIXED and UNDEFINED. With STREAM(_*), FIXED and
               UNDEFINED files the assumption is that HTTP carriage-control
               is within the file itself (i.e. at least the newline (LF), all
               that is required by most browsers), so these files do not
               require additional processing.
               With VARIABLE, etc., files the carriage-control is implied
               and therefore each record requires additional processing by
               the server to supply it. Record-oriented files will have
               multiple records buffered before writing them collectively to
               the network (improving efficiency). Stream and binary file
               reads are by Virtual Block, and are written to the network
               immediately, making the transfer of these very efficient
               indeed! The essential behaviour however is much the same.

               If file caching is enabled, and this file is to be cached, a
               pointer provides a cache entry. Storage for the file contents
               is provided by the cache structure. Instead of loading the
               file into temporary storage before writing to the network it
               is loaded into cache storage and retained at the end of the
               request. In this way a cache load adds insignificant overhead
               to a generic file transfer.

               If conversion to STREAM_LF files is enabled this module
               will, upon encountering a VARIABLE or VFC file, initiate its
               conversion to STREAM_LF record format. This is done using the
               StmLf.c <online hypertext link> module.

               (Versions prior to 3.2 used a configuration directive for the
               MIME content-type to determine whether a file was transferred
               record-by-record or in binary. This is no longer required.)

               1. The primary function allocates a task structure. This
                  function then gets some file information using ACP I/O.
                  If the file does not exist it immediately returns the error
                  status to the calling routine for further action (this
                  behaviour is used to try each of multiple home pages by
                  detecting file-not-found, for example). If it does exist
                  the ACP information provides modification date/time, size
                  and record-format. If the record format is VARIABLE and
                  STREAM-LF conversion is enabled, conversion is initiated.
                  If the request specified an "If-Modified-Since:" header
                  line the modification date is checked and a possible "304
                  Not Modified" response generated.

               2. After successfully opening the file it generates an HTTP
                  response header if required. It then calls one of two
                  functions to queue the first read from the file, one for
                  variable-record files (record-oriented transfer), another
                  for stream (STREAM-LF and stream record formats) text and
                  binary files (block-oriented transfer). After the read is
                  queued it returns with a success status code.

               3. When the asynchronous file read completes one of two AST
                  completion functions (one for record, the other for block)
                  is called to post-process the I/O. Status is checked for
                  success or otherwise. If an error, the status is reported
                  to the client, the file closed, and the request thread
                  concluded.

                  If end-of-file, the file is closed and, for record-oriented
                  files, the buffer checked and if necessary flushed.
                  If an end task function was specified control is now passed
                  to that, otherwise the thread is concluded.

                  If not end-of-file, for record files multiple records may
                  be buffered before writing to the network. If the buffer is
                  full (the read was unsuccessful due to insufficient space)
                  the contents are asynchronously written to the network,
                  with the network write completion routine specifying a
                  function to re-read the file record that just failed.
                  If there is still space in the buffer another asynchronous
                  read of the file is queued in an attempt to append the
                  next record into the buffer. After the read is queued the
                  function completes.

                  If not end-of-file, for stream and binary files a
                  successful read results in a call to the network write
                  function to send this to the client. This call contains the
                  address of the function to read the next blocks from the
                  file as an AST completion routine. After the asynchronous
                  network write is queued the function completes.

               For text files the contents can be encapsulated as plain text.
               This involves prefixing the file send with a <PRE> HTML tag
               and postfixing it with a </PRE> tag. The buffer is filled as
               per normal but when ready to output a function is called that
               escapes all HTML-forbidden characters first (e.g. "<", ">",
               "&", etc.)
               This is used by the SSI.C module.

               3.12 HTADMIN.C

               <online hypertext link>

               INCOMPLETE AS YET!

               The HTAdmin.c module allows on-line administration of the
               HTTPd-specific authentication databases.

               3.13 HTTPD.C

               <online hypertext link>

               This is the main() module of the server. It performs
               server startup and shutdown, along with other miscellaneous
               functionality.

               3.14 ISMAP.C

               <online hypertext link>

               The clickable-image support module provides this functionality
               as an integrated part of the server. It supports image
               configuration file directives in either of NCSA or CERN
               formats. Extensive configuration specification error reporting
               has been implemented.

                                       Acknowledgement:

                     Three coordinate mapping functions have been
                     plagiarized from the NCSA IMAGEMAP.C script
                     program. These have been inserted unaltered in
                     the module and an infrastructure built around
                     the essential processing they provide. Due
                     acknowledgement to the original authors and
                     maintainers of that application. Any copyright over
                     portions of that code is also acknowledged:

                       ** mapper 1.2
                       ** 7/26/93 Kevin Hughes, kevinh@pulua.hcc.hawaii.edu
                       ** "macmartinized" polygon code copyright 1992 by Eric Haines, erich@eye.com

               Essential behaviour:

               1. The primary function allocates a task structure and
                  then attempts to open the map configuration file. If
                  unsuccessful it generates an error report and concludes
                  processing.

               2. After successfully opening the configuration file it
                  extracts the client-supplied coordinate from the query
                  string. A call is then made to asynchronously read a record
                  (line) from the configuration file. Configuration file
                  processing is asynchronous from that point.

               3. The record (line) read AST function checks for end-of-file,
                  when it will return the default URL (if supplied).
                  After end-of-file the file is closed and the processing is
                  concluded.

                  If not end-of-file, a function is called to parse the
                  record for an image mapping directive. When the components
                  have been parsed the NCSA IMAGEMAP.C routines are used to
                  determine if the click coordinates are within the specified
                  region coordinates.

                  If it is within the region the click has been mapped
                  and the URL is placed in heap memory and the thread's
                  redirection location pointer set to it. The file is
                  closed and the processing conclusion function called. This
                  function detects the redirection location and, if a local
                  URL, instead of disposing of the thread generates a new,
                  internal request from the redirection information.
                  For a non-local URL the client is sent a redirection
                  response and then the thread concluded.

                  If not within the region a call is made to asynchronously
                  read the next record from the configuration file.

               3.15 LOGGING.C

               <online hypertext link>

               The logging module provides an access log (server logs,
               including error messages, are generated by the detached HTTPd
               process).

               The access log format can be that of the Web-standard,
               "common"-format, "common+server"-format or "combined"-format,
               along with user-definable formats, allowing processing by most
               log-analysis tools.

               The "common"-format entries (record, line) comprise:

                  client_host r_ident auth_user [time] "request" response_status bytes_sent

               where:

               o  client_host is from where the request originated

               o  r_ident is the user identified by the client host's
                  authentication daemon (RFC931), this is not available and
                  is always a hyphen ("-")

               o  auth_user the authenticated user-name associated with the
                  request, or a hyphen ("-")

               o  time the following format: dd/mmm/yyyy:hh:mm:ss +/-GMT
                  (e.g. "16/Dec/1995:21:15:34 +10:30")

               o  request the method, a space, then the path and any query
                  string

               o  response_status the three digit response status code (e.g.
                  200, 302)

               o  bytes_sent the number of bytes sent to the client, or a
                  hyphen ("-")

               The "common+server"-format entry appends the server name to
               the common-format entry (for multi-homed services).

               The "combined"-format entry appends, quote-delimited, the
               referer and then the user-agent to the common-format entry.

               In addition to legitimate request entries the server adds
               bogus entries to the "common"-format log for time-stamping
               server startup, shutdown, and the log being explicitly opened
               or closed. These entries are correctly formatted so as to
               be processed by a log analysis tool, and are recognisable as
               being "POST" method and coming from user "HTTPd". The request
               path contains the event and a hexadecimal VMS status code,
               that represents a valid exit status only in "END" entries.

               Clickable-image requests are logged as "302" entries, and the
               resulting, redirected request entry logged as well.

               When a log entry is required the file is opened if closed.
               The file is again closed one minute after the initial request.
               This flushes the contents of the write-behind buffers.

               3.16 MENU.C

               <online hypertext link>

               This module implements the WASD menu interpretation
               functionality. It obtains the file specification from the
               request data structure.
               Output from this module is buffered to reduce
               network writes, improving I/O efficiency.

               Essential behaviour:

               1. The primary function allocates a task structure and then
                  attempts to open the file. If unsuccessful it immediately
                  returns the error status to the calling routine for further
                  action (this behaviour is used to try multiple home pages,
                  for example). No checking of modification date/times is
                  done as menu documents are considered dynamic in a similar
                  way to SSI documents.

               2. After successfully opening the file it generates an HTTP
                  response header if required. A call is then made to
                  asynchronously read a record from the file opened. After
                  the asynchronous file read is queued the function returns
                  with a success status code.

                  When the asynchronous file read completes the AST
                  completion function is called to interpret the line,
                  dependent on the section number it occurs in. Status is
                  checked for success or otherwise. If an error, the status
                  is reported to the client, the file closed, and the request
                  concluded. If end-of-file, the file is closed and the
                  processing concluded. For a successful record read the line
                  can be a title, a description or a menu item. When the line
                  is interpreted and written to the network another read is
                  queued, with an AST completion routine again specifying the
                  contents interpretation function. The function then
                  completes.
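
               The read, interpret and re-queue cycle described above can be
               sketched in C as below. This is an illustrative simulation
               only, not code from the WASD source. The names MenuTask,
               QueueRead(), MenuNextContents(), RunPending() and MenuBegin()
               are hypothetical, a single pending-function slot stands in for
               VMS AST delivery, an array of strings stands in for the menu
               file, and the rule that a blank line starts the next section
               (title, then description, then menu items) is an assumption
               made purely for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical task structure; all names here are illustrative only. */
typedef struct MenuTask MenuTask;
typedef void (*AstFunction)(MenuTask *);

struct MenuTask {
    const char **Lines;      /* stands in for the opened menu file */
    size_t LineCount;
    size_t NextLine;
    const char *Record;      /* record from the last read, NULL at end-of-file */
    int Section;             /* assumption: a blank line starts a new section */
    AstFunction PendingAst;  /* stands in for a queued AST completion routine */
    char Output[512];        /* stands in for the buffered network output */
    int Concluded;           /* set once processing has been concluded */
};

/* "Queue" a read: note the completion function and return immediately. */
static void QueueRead(MenuTask *t, AstFunction ast)
{
    t->PendingAst = ast;
}

/* Completion function: interpret one record, then queue the next read. */
static void MenuNextContents(MenuTask *t)
{
    const char *kind;
    char line[128];

    if (t->Record == NULL) {     /* end-of-file: conclude the processing */
        t->Concluded = 1;
        return;
    }
    if (t->Record[0] == '\0') {  /* blank line: move to the next section */
        t->Section++;
    } else {
        kind = t->Section == 0 ? "title"
             : t->Section == 1 ? "description" : "item";
        snprintf(line, sizeof line, "%s: %s\n", kind, t->Record);
        strncat(t->Output, line, sizeof t->Output - strlen(t->Output) - 1);
    }
    /* The same function is again specified for completion. */
    QueueRead(t, MenuNextContents);
}

/* Stand-in for AST delivery: complete each queued read, call its routine. */
static void RunPending(MenuTask *t)
{
    while (t->PendingAst != NULL) {
        AstFunction ast = t->PendingAst;
        t->PendingAst = NULL;
        t->Record = t->NextLine < t->LineCount ? t->Lines[t->NextLine++]
                                               : NULL;
        ast(t);
    }
}

/* Primary function: set up the task and queue the first read. */
static void MenuBegin(MenuTask *t, const char **lines, size_t count)
{
    memset(t, 0, sizeof *t);
    t->Lines = lines;
    t->LineCount = count;
    QueueRead(t, MenuNextContents);
}
```

               Calling MenuBegin() and then RunPending() drives the chain
               until end-of-file sets the Concluded flag. In the real server
               no such loop exists; each completion routine would be invoked
               by the operating system when its I/O finishes.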
               3.17 MSG.C

               <online hypertext link>

               The message database for the server is maintained by this
               module. Some structures are fixed in size at compilation,
               but the actual messages themselves are stored using allocated
               memory so each and all may be of greatly variable size.

               There are three main functions in the module.

               1. MsgLoad() Loads a message database. Is called at server
                  startup and whenever a report is to be generated from the
                  message file.

               2. MsgFor() Called each time the server needs to provide a
                  message that originates from the database. The request
                  pointer (if available) and the message number (from the
                  defined-by-macro number in msg.h) are supplied. If no
                  request pointer the preferred language is used (lowest
                  number). If there is a request pointer it is checked for
                  a preferred language before getting that message pointer
                  from whichever language array is to be used.

               3. MsgReport() This is called via the server administration
                  menu to provide an HTML-formatted listing of messages in
                  the server's volatile database or from the on-disk file.
                  For the on-disk file it calls MsgInit() with a local
                  message structure, completely loading a new instance of
                  all messages from the file, displays the report and then
                  disposes of them.

               3.18 NET.C

               <online hypertext link>

               This module handles all TCP/IP network activities, from
               creating the server socket and listening on the port, to
               reading and writing network I/O. It manages request initiation
               and rundown, and controls connection persistence. The network
               read and write functions have provision for specifying I/O
               completion AST function addresses. If these are provided then
               the function is called upon completion of the network I/O.
               If not provided then the I/O completes without calling an AST
               routine.

               As of v4.3 this module supports the MadGoat NETLIB network
               programming package. This excellent freeware tool provides a
               generic, asynchronous interface to a number of underlying
               TCP/IP packages. It behaves in much the same manner as the
               $QIO interface and so dove-tails perfectly into this server.
               It took less than eight hours to build support into the
               original UCX version for NETLIB!
               To avoid complete dependence on, and the slight extra
               overhead of, NETLIB, both a UCX version and a NETLIB version
               are maintained via conditional compilation using C macros.

               The server begins by creating a network socket and then
               binding that to the HTTP port. The server then enters an
               infinite loop, waiting for IP connections.

               When a connection request is received the remote host is
               checked as an allowed connection. If allowed, a request data
               structure is created from dynamic memory, and an asynchronous
               read is queued from the network client. The pointer to this
               dynamic data structure becomes the request thread, and is
               passed from function to function, AST routine to AST routine.
               The AST completion routine of the network request read(s)
               specifies a request analysis function. The function then
               returns to the connection acceptance loop.

               When the network read(s) complete an AST completion function
               in the Request() module is called to process the HTTP
               request.

               This module also contains the code for the NetWriteBuffered()
               function described above, in Section 2.6.

               With the introduction of SSL (see Section 3.21) with v5.0 the
               NetWrite() and NetRead() functions no longer had the role of
               the lowest-level network interface functions, but now assume
               the role of delivering data via a raw network interface or
               via an SSL-encrypted one depending on a particular request's
               requirement.
               They no longer read from or write directly to the network;
               this functionality was devolved to NetReadRaw() and
               NetWriteRaw(). The SSL routines also use NetReadRaw() and
               NetWriteRaw() when receiving the encrypted data stream from,
               or transmitting it to, the client.

               3.19 PUT.C

               <online hypertext link>

               INCOMPLETE AS YET!

               The PUT module allows files to be uploaded to, and stored
               by, the server. It also allows the deletion of files, and the
               creation and deletion of directories. This same module handles
               PUT, POST and DELETE methods appropriately. It requires
               authorization to be enabled on the server. Created files have
               a three-version limit applied.

               The Request.c module controls the size of any request POSTed
               or PUTed via a configurable parameter limiting the size in
               Kilobytes. The request is completely read by that same module
               before being parsed and handed over to the Put.c module (or
               DCL.c if a script). Hence the Put.c module has a complete
               request body in memory that it can process.

               POSTed or PUTed requests are processed differently according
               to the MIME content-type:

               o  application/x-www-form-urlencoded

                  The server specially processes
                  "application/x-www-form-urlencoded" POSTs (i.e. those
                  generated by <FORM>...</FORM>), allowing files to be
                  created directly from HTML forms. The processing eliminates
                  any field names from the URL-encoded data stream, placing
                  only field values into the file.
                  This capability can be quite useful and is demonstrated in
                  the Update HTTPd module <online hypertext link> form.

               o  multipart/form-data

                  This server can process a request body according to RFC
                  1867, "Form-based File Upload in HTML". As yet it is not a
                  full implementation. It will not process "multipart/mixed"
                  subsections. The implementation is basic, providing a
                  facility to allow the upload of a file into the server
                  administered file system. The ACTION= parameter of the
                  <FORM> tag must specify the directory (as a URL path) in
                  which the uploaded file will be created.

                  The following example HTML illustrates how a form may be
                  used to upload a file from the browser host file system:

                   <FORM METHOD=POST ACTION="/web/directory/" ENCTYPE="multipart/form-data">
                   <INPUT TYPE=submit VALUE=" Upload document ... ">
                   <INPUT TYPE=file SIZE=50 NAME=uploadfile>
                   </FORM>

                  <online hypertext link>

                  NOTE: This capability has only been tested against Netscape
                  Navigator versions 2 and 3. VMS Netscape Navigator 3.0b5
                  hangs if an upload of a variable-record format file is
                  attempted. Stick to STREAM-LF or fixed, or convert the file
                  to STREAM-LF. Windows-based Navigator will hang if the file
                  is open by another application (e.g. Microsoft Word).

               o  any/other

                  Any other content type has a file created according to the
                  path.
                  If it is a text file the VMS record type is STREAM-LF
                  (e.g. "text/plain", "text/html"). Any other type is
                  considered binary and the created file is made an UNDEFINED
                  record type (e.g. "image/gif", "application/octet-stream").

               The parent directory of any file/directory operation is always
               checked for permission to modify its contents. This permission
               is usually granted to the HTTPd account via an ACL. Files
               are written using Virtual Block I/O. This can make them very
               efficient. They are handled asynchronously, not disturbing the
               multi-threading of the server.

               3.20 REQUEST.C

               <online hypertext link>

               This module reads the request header from the client, parses
               this, and then calls the appropriate task function to execute
               the request (i.e. send a file, SSI an HTML file, generate a
               directory listing, execute a script, etc.)

               The request header is contained in the network read buffer.
               If it cannot be completely read in the first chunk, the read
               buffer is dynamically expanded so as to be read in multiple
               chunks. The request header is addressed by a specific pointer
               that allows the parse-and-execute function to process either
               a genuine, initial client request header, or a pseudo-header
               generated to effect a redirection.

               The method, path information and query string are parsed from
               the first line of the header.
               Other, specific request header fields are also parsed out and
               stored for later reference. Once this has been done the header
               is not further used.

               Once the relevant information is obtained from the request
               header, processing commences on implementing the request.
               This comprises the rule-mapping of any path information,
               the RMS parsing of any resulting VMS file specification and
               decision-making on how to execute the request.

               o  If an internal directive, that is executed.

               o  If the content-type of a supplied file specification is
                  auto-scripting, an automatic redirection is generated.

               o  If a file specification and no wildcards, the file is sent
                  to the client.

               o  If a wildcarded file specification, and no query string, a
                  directory listing is generated.

               o  If a directory specification (no file name), one of
                  multiple, possible, home pages are attempted to be sent.
                  If no home page is found in the directory then a directory
                  listing is generated.

               o  If a script specification prefixed the path information
                  that script is executed.

               o  If a query string is supplied (and it is not a server
                  directive), and no script name was included in the path
                  information, the server query script is automatically
                  activated.

               This functionality is used to parse and execute both the
               initial client request and any pseudo-request generated
               internally to effect a redirection.

               If a POST/PUT method is used the entire request body (using
               the "Content-Length:" header line to determine length) is also
               read into dynamic memory before passing to the PUT.C or DCL.C
               modules for processing.

               3.21 SESOLA.C

               <online hypertext link>

               This module provides the optional Secure Sockets Layer (SSL)
               encrypted communication link functionality for WASD. This
               section will not discuss how SSL works, or even the SSLeay
               package (see below), it is merely a thumb-nail sketch of some
               quite complex functionality, much of it hidden by the SSLeay
               package.

               WASD implements SSL using a freely available software library
               known as "SSLeay" (pronounced "S-S-L-E-A-Y", i.e. all letters
               spelt), version 0.8.1, authored and copyright by Eric Young
               and Tim Hudson.
               It is not a GNU-licensed package, but does make unrestricted
               commercial and non-commercial use available. The FAQ for
               SSLeay may be found at

               http://www.psy.uq.oz.au/~ftp/Crypto/

               This should be consulted for all information on the SSL
               technology employed by WASD.

               If the SSLeay component of WASD is installed it can be found
               at

               HT_ROOT:[SRC.SSLEAY-0_8_1]

               It has been necessary to make minor modifications to the
               v0.8.1 distribution to support VMS (there was rudimentary
               support that looked like a hang-over from a previous
               distribution). These changes are very minor and designed to
               address the differences in VMS and DECC versions. All changes
               to source can be found by searching for "MGD" in [...]*.C,
               [...]*.H and [...]*.COM files.

               These changes have been made only to support WASD's use
               of the package, they are not proposed as general SSLeay
               modifications, i.e. they were purely pragmatic!

               SeSoLa

               This module is named "SeSoLa" to avoid any confusion and/or
               conflict with SSLeay routines. SSLeay and WASD support SSL v2
               and v3 protocols. The module has two distinct roles controlled
               by the SESOLA define. If this is defined the SeSoLa module is
               compiled into an interface with SSLeay. If it is not defined
               it just provides the function stubs required by, but not used
               by, the other modules in the server.
In this way only the SeSoLa module needs to be recompiled and the
server relinked to provide the SSL functionality; all other modules
stay the same.

Non-Blocking I/O

SSL I/O is implemented as an SSLeay BIO_METHOD, named
"sesola_method". It provides NON-BLOCKING SSL input/output. All
routines that are part of this functionality are named "sesola_..."
and are grouped towards the end of the module.

SSLeay supports non-blocking I/O by requiring the BIO (Basic
Input/Output) routine to indicate (using a -1 return) when the I/O is
not available but will be later. It then expects the same routine to
be called with the same parameters when it is, completing that part
of the processing. WASD utilizes this with the sesola_read() and
sesola_write() functions, and their respective AST functions. A state
variable tracks where in a session a particular read or write is
occurring and the AST re-calls the appropriate function to complete
the processing.

The main functions directly used when processing a request are
SeSoLaAccept(), which establishes the SSL session with the client;
SeSoLaRead(), which is the equivalent of NetRead() and is in fact
called from it; and SeSoLaWrite(), the equivalent of NetWrite() and
again called from it.
SeSoLaRead() accepts encrypted data from the network, decrypts it and
returns plain data to the AST routine. SeSoLaWrite() accepts plain
data from the calling routine, encrypts it and writes that encrypted
data to the network.


3.22 SSI.C

The Server Side Includes (HTML pre-processor) module provides this
functionality as an integrated part of the server. Output from this
module is buffered to reduce network writes, improving I/O
efficiency.

Essential behaviour:

1. The primary function attempts to open the file. If unsuccessful
   it immediately returns the error status to the calling routine for
   further action (this behaviour is used to try multiple home pages,
   for example). SSI documents can be dynamic in undetectable ways so
   no checking of file modification date/time is done.

2. After successfully opening the file it generates an HTTP response
   header if required. A call is then made to asynchronously read a
   record from the file opened. The record read AST function scans
   the record (line) looking for pre-processor directives embedded in
   HTML comment directives. If no directive is found the record is
   output buffered and another queued to be read.

3. If a directive is detected any part of the line up to the
   directive is output buffered and a function called to parse the
   directive. This function reports an error if the directive
   specified is not supported (unknown, etc.).
   If the directive is supported, a specific function is called
   according to the directive specified. These functions provide the
   pre-processor information in one of four ways:

   1. Internally

      Information such as the system time, current document
      information, etc., can be provided from information contained
      in the request data, etc., or in the case of specified
      document/file information, obtained via the file system. These
      directives have the relevant information buffered and then the
      function returns to the directive parsing function.

   2. Via DCL Execution

      Information that must be obtained through DCL execution is
      obtained using an asynchronous call to the Dcl() module. The
      next-task function is specified as the line parsing function.
      When the DCL module has finished executing the required command,
      control is passed back to this function.

   3. Sending a File

      If a file is #included this is provided with an asynchronous
      call to the File() module. The next-task function is specified
      as the line parsing function. When the File() module has
      finished transferring the included file, control is passed back
      to this function.

   4. Directory Listing

      If a directory listing is requested this is provided via an
      asynchronous call to the Dir() module.
      The next-task function is specified as the line parsing
      function. When the Dir() module has finished generating the
      listing, control is passed back to this function.

4. Directives continue to be parsed, and executed, asynchronously if
   necessary (as just described), from within a line until the
   end-of-line is reached. Any remaining characters are output
   buffered. Lines continue to be read from the file using the AST
   mechanism until end-of-file.


3.23 STMLF.C

The stmLF.c module converts VARIABLE format records to STREAM-LF. It
is only called from the File.c module and only then when STREAM-LF
conversion is enabled within the server configuration.

The conversion is done asynchronously (naturally), concurrently with
any reads being performed by the File.c module in transferring the
file to the client. After successful conversion the original file is
purged.


3.24 SUPPORT.C

The support module provides a number of miscellaneous support
functions for the HTTPd (well go-o-o-lee!).


3.25 UPD.C

INCOMPLETE AS YET!

The Upd.c module implements the on-line web directory administration
and file editing and upload facility. It requires authorization to be
enabled on the server. File and directory modification are still
performed by the Put.c module.
Upd.c is an overly large body of code generating all of the dialogues
and editing pages. It also provides additional functionality for
server administration, adding admin-specific portions of dialogues as
required.


3.26 VM.C

The virtual memory management module provides dynamic memory
allocation and deallocation functions. These functions use the VMS
system library virtual memory routines. Also see general comments in
Section 2.5.

Separate virtual memory zones are created and used for specific
dynamic memory requirements within the server. These are:

o  General - This zone is used for all general-purpose dynamic
   memory requirements. These include configuration, rule and message
   file interpretation, history records, activity data, server
   configuration, etc. Generally memory allocated from this pool is
   fairly static, seldom released or subsequently requested during
   the lifetime of the server.

o  Cache - This zone provides memory for the file cache. It can be
   quite dynamic as cache entries are flushed or newly loaded.

o  Request - This zone provides a pool of essentially fixed sized
   blocks used for the basic request processing structure.

o  Request Thread - A zone of virtual memory is created for each
   request that commences processing. The entire zone is released at
   request conclusion.
   This is a very efficient method of handling the dynamic memory
   requirements during request processing. No explicit list or other
   chunk tracking is required; all is performed by the zone
   management routines.

If a request for dynamic memory, made for any of the purposes listed
above, fails, the server exits reporting the problem (e.g.
insufficient dynamic memory). Memory availability is considered so
crucial that any problem with it affects the server severely enough
for it to be reported immediately. This also makes the code
requesting the memory simpler; it is not necessary to check the
success of the call as it is part of the design that only successful
requests ever return!
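
The request-thread zone and the "only successful requests ever
return" rule can be illustrated with a small portable sketch. This is
not the WASD code: the server uses the VMS system library virtual
memory routines, whereas the sketch below substitutes malloc-family
calls so that it compiles anywhere. The names RequestZone, zone_get()
and zone_release() are hypothetical, invented for the illustration.
Note too that the sketch has to keep its own chunk list, which is
exactly the bookkeeping the VMS zone routines make unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every allocation.  The union
   with max_align_t keeps the caller's memory suitably aligned. */
typedef union ZoneChunk {
    union ZoneChunk *next;   /* chunks form a singly linked list */
    max_align_t      align;
} ZoneChunk;

/* One zone per request (assumed layout, not the WASD structure). */
typedef struct {
    ZoneChunk *chunks;       /* every allocation made from this zone */
    size_t     count;        /* number of live allocations */
} RequestZone;

/* Allocate zeroed memory from the zone.  Never returns NULL: on
   failure the process exits, so callers need not check the result. */
static void *zone_get(RequestZone *zone, size_t size)
{
    ZoneChunk *chunk = NULL;

    assert(zone != NULL);
    if (size <= SIZE_MAX - sizeof(ZoneChunk))
        chunk = calloc(1, sizeof(ZoneChunk) + size);
    if (chunk == NULL) {
        fprintf(stderr, "insufficient dynamic memory\n");
        exit(EXIT_FAILURE);
    }
    chunk->next = zone->chunks;
    zone->chunks = chunk;
    zone->count++;
    return (void *)(chunk + 1);   /* caller's memory follows the header */
}

/* Release the entire zone in one call, as at request conclusion.
   Returns the number of allocations that were freed. */
static size_t zone_release(RequestZone *zone)
{
    size_t freed = 0;

    assert(zone != NULL);
    while (zone->chunks != NULL) {
        ZoneChunk *next = zone->chunks->next;
        free(zone->chunks);
        zone->chunks = next;
        freed++;
    }
    zone->count = 0;
    return freed;
}
```

A request would create a RequestZone when processing commences, call
zone_get() wherever it needs memory without testing the result, and
call zone_release() once at its conclusion.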