J VMS_SHARE, UTILITIES, Pack multiple files into a form suitable for mailing   			  A B S T R A C T    F   VMS_SHARE is designed to package a series of files into a multi-partH   share file suitable for mailing across a network. Files are encoded to?   be resistant to the corruption that many mailers and networks G   generate.  When all parts of the share file are combined and run as a D   command procedure, the packaged directory tree is recreated in its   original format.      @   This software is copyright (C) of the author and comes with noH   warranties either expressed or implied.  It may be distributed free ofF   charge to anyone who may require a copy, provided that all copyrightG   notices remain intact. Any problems arising from its use are entirely !   the responsibility of the user.     
  Andy Harper
  Systems Manager
  Computing Centre
  Kings College London
  The Strand
  London WC2R 2LS
  England

  Tel:  +44 (0) 71 873 2347
  E-mail:   UDAA055 @ UK.AC.KCL.CC.OAK (JANET)
            UDAA055%OAK.CC.KCL.AC.UK @ NSFNET-RELAY.AC.UK (INTERNET)


                TECHNICAL INFORMATION ABOUT VMS_SHARE

                             Version 8.4

                              May 1993


1. INTRODUCTION

VMS_SHARE is designed to package a series of files into a form that can be
easily mailed across many different networks. Difficulties arise with doing
this because of the many and varied possibilities for corruption of data in
transit: for example, line wrapping, case folding, and transposition of key
characters.

VMS_SHARE encodes files before transmission so that these things may be kept
under control and proper restoration effected at the receiving end.

For a given series of files to be packaged, VMS_SHARE combines them into a
single large 'text archive' file that can be unpacked into its component files
simply by running it as a command procedure at the receiving end. For
convenience, VMS_SHARE will optionally split the result into multiple parts
that can be individually mailed and recombined at the receiving end.

NOTE - VMS_SHARE is designed for Digital VAX or Alpha systems running the VMS
operating system. It will not work on other operating systems/hardware, and
minimum operating system versions must be observed.


2. WHAT VMS_SHARE DOES NOT DO

Because VMS_SHARE relies on electronic mail to ship the files, there are no
protocols that can be used to check the accuracy of the received file(s).
There is a reliance on the underlying mail system to get everything there in
one piece and unchanged.
VMS_SHARE is unable to ask for retransmission of
missing or damaged pieces.

VMS_SHARE should therefore be used to send files only via essentially reliable
mail systems that can deliver files intact, provided that their characters
fall within certain bounds.

VMS_SHARE is intended for sequential files only. Other file formats can be
packaged into a backup saveset, whose file format is supported by VMS_SHARE.


3. LIMITATIONS OF MAILERS AND HOW VMS_SHARE GETS AROUND THEM

Various mail systems have different limitations within them. For instance,
they will wrap or truncate lines that are too long, they may limit the size of
an individual mail message, they may transpose characters incorrectly if the
underlying character set is different from the transmitter (ASCII/EBCDIC is a
good example of this).

VMS_SHARE encodes the files in different ways to get around the problems.
Please note however, that the encoding techniques are NOT foolproof. We have
merely tried to anticipate all possible corruptions and devise an encoding
scheme which ensures that the conditions under which corruption occurs do not
arise. If a form of corruption that has not been anticipated occurs, corruption
to the transmitted files will be irreparable except through manual editing.


3.1 Maximum Size of a Mail Message

Many mail systems cannot cope with single mail messages larger than a fixed
number of bytes and will truncate messages or maybe even fail to deliver them
altogether.

This is a real problem if a large software package is being sent.  VMS_SHARE
tries to overcome this by splitting the packaged files into several parts,
each part being smaller than some fixed size. By default, a part size of 30
blocks is chosen; this can be overridden by defining a logical name
(SHARE_PART_SIZE) or by a qualifier on the command line (/PART_SIZE=nn). For
example, we might send a total of 300 blocks of code as 10 parts each of 30
blocks or less.
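
The arithmetic behind that example can be checked with a short Python sketch.
This is an illustration only, not part of VMS_SHARE (which is written in DCL
and TPU); the function name parts_needed is hypothetical, and the sketch
assumes the total is the size of the already-packaged data in 512-byte blocks.

```python
def parts_needed(total_blocks: int, part_size: int = 30) -> int:
    """Return how many parts are needed so that no part exceeds part_size.

    Hypothetical helper for illustration; 30 blocks is the default part
    size quoted in the text.
    """
    if total_blocks < 0 or part_size <= 0:
        raise ValueError("total_blocks must be >= 0 and part_size must be > 0")
    # Ceiling division using integers only.
    return -(-total_blocks // part_size)


if __name__ == "__main__":
    print(parts_needed(300))      # the example from the text
    print(parts_needed(301))      # one extra block forces one extra part
    print(parts_needed(300, 50))  # a larger part size, as with /PART_SIZE=50
```
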
VMS_SHARE will automatically split at the 30 block boundary.

It should be noted that mail headers added en route can account for several
blocks worth of extra space, so this should be allowed for when setting the
maximum part size.


3.2 Maximum Line Length

Many mail systems do not like lines longer than some fixed maximum length; a
maximum of 80 characters is typical. This results in longer lines being
wrapped or truncated at seemingly arbitrary positions.

VMS_SHARE tries to cope with this by wrapping long lines itself and inserting
markers to allow them to be rejoined at the receiving end. What VMS_SHARE does
is to prefix each line with a flag character. This flag character says EITHER
'this is the first part of a line' OR 'this line is a continuation of the
previous line'.

The maximum line size is configured into the code as a global value and can be
easily changed if required. It is not intended that this value should be
altered by the average user however.


3.3 Trailing Blanks

Some mailers interfere with blanks at the start and end of lines. VMS_SHARE
encodes blanks (and tabs) as if they were troublesome characters (see below)
to get around this. During unpacking of an encoded file, any blank characters
are ignored.


3.4 Escaped Characters

Undoubtedly the biggest problem is that a mail message moving through many
different systems en route to the destination may undergo character
conversions (for example - ASCII to EBCDIC if moving from VAX to an IBM).
Unfortunately, not all systems keep similar translation tables and characters
can get translated into something unexpected at the remote end. Culprits are
caret (^), tilde (~), square and curly brackets ( [ ] { } ) and a few others.

VMS_SHARE deals with this problem by replacing each of the troublesome
characters - the ones mentioned above plus any non-printing character - by an
escape sequence.
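
A minimal Python sketch of this kind of escaping follows. It assumes the
escape form this section describes (a ` flag followed by the 2-digit
hexadecimal ASCII code) and 8-bit input text. It is an illustration only, not
the TPU code that VMS_SHARE actually uses; the names TROUBLESOME, escape and
unescape are hypothetical, and the character set shown covers only the
culprits named in the text plus blanks, tabs and the flag character.

```python
# Characters treated as troublesome in this sketch (an assumption: the text
# names these and says there are "a few others").
TROUBLESOME = set("^~[]{}` \t")


def escape(text: str) -> str:
    """Replace troublesome and non-printing characters by `xx sequences."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("sketch handles 8-bit characters only")
        if ch in TROUBLESOME or code < 0x20 or code > 0x7E:
            out.append("`%02X" % code)  # flag character plus two hex digits
        else:
            out.append(ch)
    return "".join(out)


def unescape(encoded: str) -> str:
    """Translate `xx sequences back to the original characters."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "`":
            out.append(chr(int(encoded[i + 1:i + 3], 16)))
            i += 3  # skip the flag and both hex digits
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = "x[1] = ~y\t`z"
    packed = escape(sample)
    print(packed)
    print(unescape(packed) == sample)
```

Because the flag character itself is always escaped, a ` in the encoded text
can only ever mark the start of an escape sequence, which is what makes the
decoding step unambiguous.
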
The escape sequence is recognized at the receiving end and is
translated back to the original character.  Obviously, to work correctly, the
escape sequence itself must be immune from translation problems.

The escape technique used is to replace each character by a string of the form
`xx   where the ` symbol flags the start of an escape sequence and 'xx' is a
2-digit string which is the hexadecimal form of the ASCII code for the
character. Naturally, the ` character itself must be escaped in this form to
avoid confusion. For example, a space would be replaced by  `20   and a tab
by  `09.


3.5 Additional Compression Techniques

Two additional forms of character encoding can be optionally selected by the
user to reduce the size of the packaged data - either run-length encoding or a
modified form of Lempel-Ziv compression.

A file compressed with one of these options will be automatically decompressed
when unpacked. It is not necessary for the recipient to use external
decompression tools.


3.5.1 Run-Length Encoding

A form of run length encoding is used to encode sequences of the same character
into a 5 character sequence. In this instance, the generated sequence is:

     &nnXX

where & is the run length sequence flag, nn is the count (in hex) of
characters, and XX is the hex code of the ASCII character.  For example, a run
of 15 spaces would be replaced by  &0F20 (`0F' = 15, `20' = hex code for
space).

The use of run length encoding dramatically increases the time spent on
encoding the files. In many cases, it will be of no benefit. Because of this it
is not active by default.


3.5.2 Lempel-Ziv Compression

The Lempel-Ziv algorithm scans for common substrings in a file and replaces
them by a pointer back to a previous occurrence within the file.
For this
implementation, a number of changes have been made to the basic idea to fit the
restrictions of the TPU utility, and the line wrapping and quoting schemes used
for long lines and non-printable characters.

The file is scanned for the longest previously occurring substring, which is
replaced by an escape sequence of the form:

     \bbll

where \ is the flag to indicate an LZ encoded string, bb is a 2 digit hex
encoded backwards count to the start of the original string, and ll is a two
digit hex encoded length. Because of the 2 digit hex encoding, the maximum
backwards search distance is 255 bytes and the maximum length is also 255
bytes. Therefore up to 255 bytes can be compressed to a 5 char sequence in the
optimal case. In practice, compression ratios are nothing like as dramatic.

This form of compression is very slow in operation due to the repeated
searching for substrings that have previously occurred. Some optimisations have
been made to the searching but it should still be selected only if it is
certain to be of some benefit.


3.6 Detecting Damaged Files with Checksums

In cases where some corruption occurs despite the encodings used by VMS_SHARE,
detection of damage (BUT NOT REPAIR!) should be possible because each file is
checked for accuracy using a checksum once it has been unpacked.

VMS_SHARE uses the currently undocumented CHECKSUM command to produce a
checksum value for the source file. This checksum is carried across in the
packed share file and checked when the file is restored. A failed match
produces a message, and the receiver can take action to try to locate and
repair the damage.

The DCL command:

     $ CHECKSUM filename

writes the checksum value into a DCL symbol called CHECKSUM$CHECKSUM.

The CHECKSUM command does not work with files that have certain types of
records (specifically, those with an MRS value of 0 and records exceeding 2048
bytes).
Therefore, VMS_SHARE cannot verify such files. Unfortunately, for the
same reason, VMS_SHARE is unable to package such files at all, so an error
message is issued and the file is skipped.


4. VMS_SHARE IMPLEMENTATION

VMS_SHARE is provided as a combination of DCL and TPU code in order to ensure
that it will run on any VMS system.  A specific program would be faster of
course but then portability is not guaranteed.

The DCL part of the software is used merely to pick up parameters and
qualifiers, and parse filenames, passing them to the TPU code in a scratch
file.

The TPU code does the hard work of packaging the files, wrapping lines,
escaping characters, compressing if requested, and generating multiple parts.

As distributed, the DCL and TPU code are bundled into a single large procedure
but there is no reason why the TPU code could not be extracted and made into a
section file for enhanced speed. The modifications required are quite
straightforward.


4.1 Long Lines

Because the code is based upon TPU, some limitations are imposed upon
VMS_SHARE. In particular, early versions of TPU (pre-VMS 5.4 on VAX) do not
allow records longer than 960 bytes so it is impossible to package them.
Versions of TPU at VMS 5.4 and beyond (VAX) or any OpenVMS (Alpha) allow
records up to 65535 bytes, so the problem virtually disappears. For
compatibility, VMS_SHARE still uses the old record length unless requested by
the user with the /LONGLINES qualifier. Use of this requires a minimum VMS of
5.4 (VAX), or any OpenVMS (Alpha) and the generated share file will unpack only
on VMS 5.4 or greater (VAX), or any OpenVMS (Alpha).

TPU file handling is limited. Files can only be written with variable length
records and CR carriage control. To allow other formats to be packaged,
VMS_SHARE encodes selected file record attributes into the share file and uses
the CONVERT utility to restore those attributes during the unpacking phase.
In
principle, this allows VMS_SHARE to package files of most types, including
.EXE, .OBJ and .BCK files. In the case of .BCK files, this is subject to the
BACKUP block length being compatible with the maximum record length selected by
the user (960 or 65535 as appropriate). Allowing BACKUP savesets to be packaged
allows files of all other types to be packaged, provided they are first stored
in a saveset. BACKUP requires a minimum block length of 2048 bytes, so the long
line support is a pre-requisite for this.


4.2 Part Size Determination

The size of a part is conceptually simple. Find the size of a buffer in bytes
and divide by 512 to get the number of blocks it will occupy. However, this is
complicated by several things.

First, TPU does not count line ends when returning the `LENGTH' of a buffer.
Second, when a buffer is written to disk, there is a 2 byte overhead on each
record giving the length of the record. Finally, within a disk block, a record
always starts on a word boundary so that some records may be padded with an
additional null byte.

To accurately determine how much disk space a buffer would occupy would involve
some complex computations. However, since we know that each record has either a
2 byte or a 3 byte overhead we can get a reasonably accurate approximation by
taking the LENGTH of the buffer and adding 3 bytes for each record. We use 3
bytes to allow for the worst case and ensure that the part, when written to
disk, never exceeds the specified part size. In practice, this means that parts
will sometimes be less than the part size - the discrepancy grows as the part
size is increased.


5. USING VMS_SHARE

As distributed, VMS_SHARE is run as a command procedure (usually via a
suitable symbol set up to point to it) thus:-

     $ @VMS_SHARE filespecs sharefile

where 'filespecs' is a comma separated list of wildcarded filenames to be
packaged, and 'sharefile' is the name to be given to the packaged files. Each
part of the sharefile will be suffixed by a part number in the form:

     nnn-OF-mmm

where nnn is the part number and mmm is the total number of parts.

There are some restrictions on the filenames that can be used:

     - Subdirectories may be used provided that they are beneath the current
       directory. It is not permitted to package files in other directories.

     - At least one valid file must be given in 'filespecs' or no sharefile
       will be produced.


6. UNPACKING A VMS_SHARE FILE

In general, a package delivered using the VMS_SHARE software will arrive in a
number of parts, from 1 up to 'n'.  All parts should be concatenated together
in order. It is NOT necessary to remove superfluous mail headers from any part
other than part 1 prior to concatenation.

The resulting combined file should then be executed as a command procedure in
order to unpack the resulting files.


6.1 Typical Unpack Sequence

A typical sequence of events goes like this:

 - Set your default directory to a scratch directory which is empty.

 - Go into MAIL and select the folder which contains the parts of the
   package.

 - Extract part 1 into a file, using the command 'EXTRACT/NOHEADER        file'
   Extract part 2 into a file, using the command 'EXTRACT/NOHEADER/APPEND file'
   ...
   ...
   Extract part n into a file, using the command 'EXTRACT/NOHEADER/APPEND file'

 - Read warning below BEFORE proceeding!!!

 - Execute as a command procedure, using the following command:
       $ @file


6.2 Warning

It is strongly suggested that the generated command procedure ('file' in
the above example) be carefully checked before execution. It is possible that
unscrupulous persons might tamper with the source before sending it and
introduce a virus into the VMS_SHARE'd code. There is nothing that VMS_SHARE
can do about this automatically. However, since all the files should be human
readable it should be possible to detect fraudulent code by manual checking.
Certainly the lines starting with '$' symbols, and the TPU code near the
start, should be checked carefully as these are most likely to be troublesome.


7. DECLARATION AND DISCLAIMER

This software is in the public domain and may be freely distributed without
charge as required. However, all copyright notices and references to the
author in the source must be left intact.

Third party modifications may be made to the source but any errors arising
from their use are entirely the responsibility of the modifier.

The author accepts no responsibility for the suitability of this software for
any specific purpose. Any errors arising from its use are entirely the
responsibility of the user.


Andy Harper
Kings College London UK