This is the README file for the MOLPHY (PROTML) distribution,  version 2.2.
Copyright (c) 1992-1994, Jun Adachi & Masami Hasegawa; All rights reserved.

        MOLPHY is a program package for MOLecular PHYlogenetics.

PROTML is a main program in MOLPHY for inferring evolutionary trees from
PROTein (amino acid) sequences by using the Maximum Likelihood method.

Programs (C language)
  PROTML: Maximum Likelihood Inference of Protein Phylogeny
  NUCML:  Maximum Likelihood Inference of Nucleic Acid Phylogeny
  PROTST: Basic Statistics of Protein Sequences
  NUCST:  Basic Statistics of Nucleic Acid Sequences
  NJDIST: Neighbor Joining Phylogeny from Distance Matrix

//NT compilers note:  I haven't compiled these utilities
Utilities (Perl)
  mollist:  get identifiers list        molrev:   reverse DNA sequences
  molcat:   concatenate sequences       molcut:   get partial sequences
  molmerge: merge sequences             nuc2ptn:  DNA -> Amino acid
  rminsdel: remove INS/DEL sites        molcodon: get 3rd(1st,2nd) codons
  molinfo:  get (non)information sites  mol2mol:  MOLPHY format beautifier
  inl2mol:  Interleaved -> MOLPHY       mol2inl:  MOLPHY -> Interleaved
  mol2phy:  MOLPHY -> Sequential        phy2mol:  Sequential -> MOLPHY
  must2mol: MUST -> MOLPHY

MOLPHY is free software; you can use and redistribute it.
The programs are written in a standard subset of C for UNIX-like operating
systems.
The utilities are written in Perl (Ver. 4.035) for UNIX-like operating systems.
MOLPHY has been tested on SUN4's (cc & gcc with SUN-OS 4.1.3) and
HP9000/700 (cc, c89 & gcc with HP-UX 9.03).
MOLPHY has NOT been tested on VAX, IBM-PC, or Macintosh.

NETWORK DISTRIBUTION ONLY: The latest version of MOLPHY is always available
by anonymous ftp at sunmh.ism.ac.jp (133.58.12.20): /pub/molphy*.

The following are the user manuals for PROTML and the other programs contained
in MOLPHY.

Note!   The 'F' option of PROTML (2.1.*) was incorrect.  It has now been
        corrected.  Some options have changed.

//Ignore this part for the NT binaries
                              INSTALLATION
To build MOLPHY, UNIX users should be able to type "make" in the molphy-2.2/src
directory.  (Edit molphy-2.2/src/Makefile if you need to customize it.)
    % cat molphy-2.2.tar.Z | uncompress | tar xvf -
    % cd molphy-2.2/src
    % make
    % make install

                                 TEST
    % cd ..
    % njdist.exe > njdist.out
    % diff NJDIST.EXA njdist.out
    % protml.exe > protml.out
    % diff PROTML.EXA protml.out
    % nucml.exe > nucml.out
    % diff NUCML.EXA nucml.out

-------------------------------------------------------
Jun Adachi  adachi@ism.ac.jp
  Department of Statistical Science,
  The Graduate University for Advanced Study
  4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan

Masami Hasegawa  hasegawa@ism.ac.jp
  The Institute of Statistical Mathematics
  4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan
======================================

ProtML 2.2(Jun 30 1994) Maximum Likelihood Inference of Protein Phylogeny
Copyright (C) 1992-1994 J. Adachi & M. Hasegawa; All rights reserved.
Usage: protml [switches] sequence_file [topology_file]
sequence_file = MOLPHY_format | Sequential(-S) | Interleaved(-I)
topology_file = users_trees(-u) | constrained_tree(-e)
Model:
-j  JTT (default)   -jf  JTT-F         Jones, Taylor & Thornton(1992)
-d  Dayhoff         -df  Dayhoff-F     Dayhoff et al.(1978)
-p  Poisson         -pf  Proportional
-r  users RTF       -rf  users RTF-F   (Relative Transition Frequencies)
-f  with data Frequencies
Search strategy or Mode:
-u  Users trees (need users_trees file)
-e  Exhaustive search (with/without constrained_tree file)
-s  Star decomposition search (may not be the ML tree)
-q  Quick add OTUs search (may not be the ML tree)
-D  maximum likelihood Distance matrix --> NJDIST
Others:
-n num  retained top ranking trees in Approx.likelihood(default -e:100,-q:50)
-b  no Bootstrap probabilities (Users trees)
-S  Sequential format   -I  Interleaved format
-v  verbose to stderr   -i, -w  output some information

NucML 2.2(Jun 30 1994) Maximum Likelihood Inference of Nucleic Acid Phylogeny
Copyright (C) 1992-1994 J. Adachi & M. Hasegawa; All rights reserved.
Usage: nucml [switches] sequence_file [topology_file]
sequence_file = MOLPHY_format | Sequential(-S) | Interleaved(-I)
topology_file = users_trees(-u) | constrained_tree(-e)
Model:
-t n1     n1: Alpha/Beta ratio    (default:4.0)  Hasegawa, Kishino & Yano(1985)
-t n1,n2  n2: AlphaY/AlphaR ratio (default:1.0)  Tamura & Nei(1993)
-p  Proportional    -pf  Poisson
-r  users RTF-F     -rf  users RTF     (Relative Transition Frequencies)
-f  withOUT data Frequencies
Search strategy or Mode:
-u  Users trees (need users_trees file)
-e  Exhaustive search (with/without constrained_tree file)
-s  Star decomposition search (may not be the ML tree)
-q  Quick add OTUs search (may not be the ML tree)
-D  maximum likelihood Distance matrix --> NJDIST
Others:
-n num  retained top ranking trees in Approx.likelihood(default -e:100,-q:50)
-b  no Bootstrap probabilities (Users trees)
-S  Sequential format   -I  Interleaved format
-v  verbose to stderr   -i, -w  output some information

ProtST 1.1.1 (Jun 30 1994) Basic Statistics of Protein Sequences
Copyright (C) 1993, 1994 J. Adachi & M. Hasegawa; All rights reserved.
Usage: protst [switches] sequence_file
Switches:
-w  Alignments viewer
-S  Sequential input format (PHYLIP)
-I  Interleaved input format (other packages)
NucST 1.1.1 (Jun 30 1994) Basic Statistics of Nucleic Acid Sequences
Copyright (C) 1993, 1994 J. Adachi & M. Hasegawa; All rights reserved.
Usage: nucst [switches] sequence_file
Switches:
-w  Alignments viewer
-S  Sequential input format (PHYLIP)
-I  Interleaved input format (other packages)

NJDist 1.2.1 (Jun 30 1994) Neighbor Joining Phylogeny from Distance Matrix
Copyright (C) 1993, 1994 J. Adachi & M. Hasegawa; All rights reserved.
Ref: N. Saitou & M. Nei 1987. Molecular Biology and Evolution 4:406-425
Usage: njdist [switches] distance_matrix_file
Switches:
-w      branch length
-l      Least squares
-S      Sequential input format (PHYLIP)
-O num  branch number of Out group 
-T str  output Tree file name

======================================

                 Format of Input SEQUENCES File


	standard MOLPHY input sequence data format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 90
Data1
MTAILERRESESLWGRFCNWITSTENRLYIGWFGVLMKPTLLTATSVFIIAFIHAPPVDK
DGHREPVSGSGRVINTWADIINRANLGMEV
Data2
MTTALRQRESANAWEQFCQWIASTENRLYVGWFGVIMKPTLLTATICFIIAFIHAPPVDK
DGHREPVAGSGRVISTWADILNRANLGFEV
Data3
MTTALQRRESASLWQQFCEWVTSTDNRLYVGWFGVLMKPTLLTATICFIVAFIHAPPVDK
DGHREPVAGSGRVINTWADVLNRANLGMEV
Data4
MTTTLQQRSRASVWDRFCEWITSTENRIYIGWFGVLMKPTLLAATACFVIAFIHAPPVDK
DGHREPVAGSGRVIATWADVINRANLGMEV
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: "~~~" is a file separator.

An input file has two parts: SIZE and SEQUENCES.

1. SIZE
The first line of the file contains the number of species(OTUs) and the length
of amino acid sequences, in free format, separated by blanks(space or tab).
You can write a comment on the data after the two numbers,
separated from them by blanks.

2. SEQUENCES
The following lines give a species name and amino acid sequence data for each
species.
Names are made up of letters and digits; the first character must be a letter.
The underscore "_" is regarded as a letter. Upper case and lower case letters
are distinct, so "spc_1", "Spc_1" and "SPC_1" are three different names.
Names can NOT include blanks.
You must put the amino acid sequence AFTER a NEWLINE, in free format.
The sequence may be broken up by whitespace (space, tab or newline).
The amino acids must be specified by the one letter codes adopted by
IUPAC-IUB Commission on Biochemical Nomenclature (1968).

      CODE   Amino acid Description
        A    Ala    Alanine
        R    Arg    Arginine
        N    Asn    Asparagine
        D    Asp    Aspartic acid
        C    Cys    Cysteine
        Q    Gln    Glutamine
        E    Glu    Glutamic acid
        G    Gly    Glycine
        H    His    Histidine
        I    Ile    Isoleucine
        L    Leu    Leucine
        K    Lys    Lysine
        M    Met    Methionine
        F    Phe    Phenylalanine
        P    Pro    Proline
        S    Ser    Serine
        T    Thr    Threonine
        W    Trp    Tryptophan
        Y    Tyr    Tyrosine
        V    Val    Valine
        B    Asx    Aspartic acid or Asparagine
        Z    Glx    Glutamine or Glutamic acid
        X    Xaa    Any amino acid
        -    gap    Ins/Del
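
The rules above can be illustrated with a small reader.  This is only a sketch
in Python, not code from the MOLPHY distribution: the function name
read_molphy, the toy data, and the decision to accept only the upper case
codes listed in the table are assumptions made for the example.

```python
import re

VALID_CODES = set("ARNDCQEGHILKMFPSTWYVBZX-")     # one-letter codes from the table
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")  # letter first; "_" counts as a letter


def read_molphy(text):
    """Return {name: sequence} parsed from MOLPHY-format text (hypothetical helper)."""
    lines = iter(text.splitlines())
    header = next(lines).split()
    n_otus, n_sites = int(header[0]), int(header[1])  # the rest of the line is a comment
    records = {}
    for _ in range(n_otus):
        name = ""
        while not name:                       # skip blank lines before a name
            name = next(lines).strip()
        if not NAME_RE.match(name):           # also rejects names containing blanks
            raise ValueError(f"bad species name: {name!r}")
        residues = ""
        while len(residues) < n_sites:        # a sequence may span several lines
            residues += "".join(next(lines).split())
        if len(residues) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(residues)}")
        bad = set(residues) - VALID_CODES
        if bad:
            raise ValueError(f"{name}: unknown codes {sorted(bad)}")
        records[name] = residues
    return records


# Toy data for illustration only (not one of the distributed example files).
example = """2 12 toy data
spc_1
MTAILE RRES
ES
Spc_1
MTTALRQRESAN
"""
print(read_molphy(example))
```

The reader uses the sequence length from the SIZE line to decide where one
sequence ends and the next name begins, which is why the name must sit on a
line of its own.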


	Felsenstein's PHYLIP "SEQUENTIAL" format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    4   90
Data1     MTAILERRESESLWGRFCNWITSTENRLYIGWFGVLMIPTLLTATSVFII
AFIAAPPVDIDGIREPVSGSGRVINTWADIINRANLGMEV
Data2     MTTALRQRESANAWEQFCQWIASTENRLYVGWFGVIMIPTLLTATICFII
AFIAAPPVDIDGIREPVAGSGRVISTWADILNRANLGFEV
Data3     MTTALQRRESASLWQQFCEWVTSTDNRLYVGWFGVLMIPTLLTATICFIV
AFIAAPPVDIDGIREPVAGSGRVINTWADVLNRANLGMEV
Data4     MTTTLQQRSRASVWDRFCEWITSTENRIYIGWFGVLMIPTLLAATACFVI
AFIAAPPVDIDGIREPVAGSGRVIATWADVINRANLGMEV
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The information for each species follows, starting with a TEN-CHARACTER
species name (which CAN include punctuation marks and blanks).
You must use SEQUENTIAL_FILE with the "-S" switch, as follows:

	protml -S SEQUENTIAL_FILE
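
As a sketch of how the ten-character name field can be handled, the following
Python function reads sequential data.  It is not part of MOLPHY; the name
read_sequential and the toy data are assumptions, and stripping the padding
blanks from the name is a choice made for this example only.

```python
def read_sequential(text):
    """Return [(name, sequence), ...] from PHYLIP sequential text (hypothetical helper)."""
    lines = iter(text.splitlines())
    n_otus, n_sites = (int(x) for x in next(lines).split()[:2])
    records = []
    for _ in range(n_otus):
        line = next(lines)
        while not line.strip():               # skip blank lines between species
            line = next(lines)
        name = line[:10].strip()              # fixed ten-character field, blanks allowed
        residues = "".join(line[10:].split())
        while len(residues) < n_sites:        # continuation lines hold sequence only
            residues += "".join(next(lines).split())
        if len(residues) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(residues)}")
        records.append((name, residues))
    return records


# Toy data for illustration only; note the blank inside the first name.
example = """    2   12
Data 1    MTAILERR
ESES
Data2     MTTALRQR
ESAN
"""
for name, seq in read_sequential(example):
    print(repr(name), seq)
```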


	MOLPHY and PHYLIP common format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    4   90$
Data1     $
MTAILERRESESLWGRFCNWITSTENRLYIGWFGVLMIPTLLTATSVFIIAFIAAPPVDI$
DGIREPVSGSGRVINTWADIINRANLGMEV$
Data2     $
MTTALRQRESANAWEQFCQWIASTENRLYVGWFGVIMIPTLLTATICFIIAFIAAPPVDI$
DGIREPVAGSGRVISTWADILNRANLGFEV$
Data3     $
MTTALQRRESASLWQQFCEWVTSTDNRLYVGWFGVLMIPTLLTATICFIVAFIAAPPVDI$
DGIREPVAGSGRVINTWADVLNRANLGMEV$
Data4     $
MTTTLQQRSRASVWDRFCEWITSTENRIYIGWFGVLMIPTLLAATACFVIAFIAAPPVDI$
DGIREPVAGSGRVIATWADVINRANLGMEV$
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: '$' marks the newline (return) code.


	PHYLIP and other packages "INTERLEAVED" format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    4   90
Data1     MTAILERRESESLWGRFCNWITSTENRLYIGWFGVLMIPTLLTATSVFII
Data2     MTTALRQRESANAWEQFCQWIASTENRLYVGWFGVIMIPTLLTATICFII
Data3     MTTALQRRESASLWQQFCEWVTSTDNRLYVGWFGVLMIPTLLTATICFIV
Data4     MTTTLQQRSRASVWDRFCEWITSTENRIYIGWFGVLMIPTLLAATACFVI

AFIAAPPVDIDGIREPVSGSGRVINTWADIINRANLGMEV
AFIAAPPVDIDGIREPVAGSGRVISTWADILNRANLGFEV
AFIAAPPVDIDGIREPVAGSGRVINTWADVLNRANLGMEV
AFIAAPPVDIDGIREPVAGSGRVIATWADVINRANLGMEV
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You must use INTERLEAVED_FILE with the "-I" switch, as follows:

	protml -I INTERLEAVED_FILE
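
The interleaved layout can be read block by block.  The Python sketch below is
not MOLPHY code; the name read_interleaved and the toy data are assumptions.
It takes the first block to carry the ten-character names and every later
block to list the species in the same order.

```python
def read_interleaved(text):
    """Return [(name, sequence), ...] from interleaved text (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]   # drop blank separators
    n_otus, n_sites = (int(x) for x in lines[0].split()[:2])
    first, rest = lines[1:1 + n_otus], lines[1 + n_otus:]
    names = [ln[:10].strip() for ln in first]                # names only in block one
    seqs = ["".join(ln[10:].split()) for ln in first]
    for i, ln in enumerate(rest):                            # later blocks: same order
        seqs[i % n_otus] += "".join(ln.split())
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
    return list(zip(names, seqs))


# Toy data for illustration only.
example = """    2   12
Data1     MTAILERR
Data2     MTTALRQR

ESES
ESAN
"""
for name, seq in read_interleaved(example):
    print(name, seq)
```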





                  Format of USERS TREES File

	standard USERS TREES file format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3
(((HUMAN,(CHIMP,PYGMY)),GORIL),ORANG,SIAMA);
((HUMAN,((CHIMP,PYGMY),GORIL)),ORANG,SIAMA);
(((HUMAN,GORIL),(CHIMP,PYGMY)),ORANG,SIAMA);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: "~~~" is a file separator.

An input file has two parts: SIZE and MACHINE READABLE TREES.

1. SIZE
The first line of the file contains the number of machine readable trees.
You can write a comment on the trees after the number,
separated from it by blanks (space or tab).

2. MACHINE READABLE TREES
The following lines give the (user-defined) machine readable trees.
The tree is specified by the nested pairs of parentheses, enclosing names
and separated by commas.  Semicolon ";" is tree terminator.
The pattern of the parentheses represents the tree topology by having
each pair of parentheses enclose all the members of a monophyletic group.
You must put the next machine readable tree AFTER a NEWLINE, in free format;
a tree may be broken up by whitespace (space, tab or newline).
For example,

	(((HUMAN,(CHIMP,PYGMY)),GORIL),ORANG,SIAMA);

	(
		(
			(
				HUMAN,
				(
					CHIMP,
					PYGMY
				)
			),
			GORIL
		),
		ORANG,
		SIAMA
	);

The above two machine readable trees are the same.
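
That whitespace carries no meaning can be checked with a small parser.  The
Python sketch below is not MOLPHY code; the names parse_tree and
read_users_trees are assumptions, and trees are represented as nested tuples
purely for the example.

```python
def parse_tree(text):
    """Parse one parenthesised tree (without ';') into nested tuples of names."""
    compact = "".join(text.split())           # whitespace carries no meaning
    pos = 0

    def peek():
        return compact[pos] if pos < len(compact) else ""

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1                          # consume "("
            children = [node()]
            while peek() == ",":
                pos += 1                      # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                          # consume ")"
            return tuple(children)
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("missing name")
        return compact[start:pos]

    tree = node()
    if pos != len(compact):
        raise ValueError("unexpected text after tree")
    return tree


def read_users_trees(text):
    """Read the SIZE line, then the trees terminated by ';'."""
    first, _, rest = text.partition("\n")
    n_trees = int(first.split()[0])           # the rest of the line is a comment
    trees = [parse_tree(chunk) for chunk in rest.split(";") if chunk.strip()]
    if len(trees) != n_trees:
        raise ValueError(f"expected {n_trees} trees, found {len(trees)}")
    return trees


one_line = "(((HUMAN,(CHIMP,PYGMY)),GORIL),ORANG,SIAMA)"
spread_out = """(
    ( ( HUMAN, ( CHIMP, PYGMY ) ),
      GORIL ),
    ORANG,
    SIAMA
)"""
print(parse_tree(one_line) == parse_tree(spread_out))
```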

Note that the machine readable tree is an UNROOTED one, and therefore its base
must be a multifurcation with a multiplicity greater than or equal to three.

    Unrooted tree (PROTML & NJDIST)        Rooted tree (not allowed)
        variable rate                          constant rate

    ( subtree1, subtree2, subtree3 );      ( subtree1, subtree2 );

        :-----subtree1
        :                                      :-----subtree1
        :-----subtree2                         :
        :                                      :-----subtree2
        :-----subtree3
                                               ^root
        ^provisional root
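
The multiplicity of the base can be checked before a tree is handed to the
programs.  The Python sketch below is not MOLPHY code; the function names
base_multiplicity and is_unrooted are assumptions.  It counts the commas that
sit directly inside the outermost pair of parentheses.

```python
def base_multiplicity(tree_text):
    """Count the subtrees attached to the base (outermost parentheses)."""
    compact = "".join(tree_text.split()).rstrip(";")
    depth = 0
    subtrees = 1
    for ch in compact:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:        # comma directly under the base
            subtrees += 1
    return subtrees


def is_unrooted(tree_text):
    """True when the base is a multifurcation of three or more subtrees."""
    return base_multiplicity(tree_text) >= 3


print(is_unrooted("( subtree1, subtree2, subtree3 );"))
print(is_unrooted("( subtree1, subtree2 );"))
```

The first call describes the accepted unrooted form and the second the rooted
form that is not allowed.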





                  Format of CONSTRAINED TREE File


	standard CONSTRAINED TREE file format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
( { HUMAN,CHIMP,PYGMY,GORIL }, ORANG, SIAMA );
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: "~~~" is a file separator.

A CONSTRAINED TREE file holds a constrained machine readable tree.
A pair of PARENTHESES indicates a FIXED tree structure, whereas a pair of
BRACES indicates all COMBINATIONS of tree structures within a monophyletic
group.

Giving the above CONSTRAINED TREE to PROTML with the "-e" switch

	protml -e sequence_file constrained_tree

automatically generates all possible trees:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15
(((HUMAN,(CHIMP,PYGMY)),GORIL),ORANG,SIAMA);
((HUMAN,((CHIMP,PYGMY),GORIL)),ORANG,SIAMA);
(((HUMAN,GORIL),(CHIMP,PYGMY)),ORANG,SIAMA);
((((HUMAN,PYGMY),CHIMP),GORIL),ORANG,SIAMA);
((((HUMAN,CHIMP),PYGMY),GORIL),ORANG,SIAMA);
((HUMAN,(CHIMP,(PYGMY,GORIL))),ORANG,SIAMA);
((HUMAN,((CHIMP,GORIL),PYGMY)),ORANG,SIAMA);
((((HUMAN,GORIL),PYGMY),CHIMP),ORANG,SIAMA);
((((HUMAN,CHIMP),GORIL),PYGMY),ORANG,SIAMA);
(((HUMAN,CHIMP),(PYGMY,GORIL)),ORANG,SIAMA);
((((HUMAN,GORIL),CHIMP),PYGMY),ORANG,SIAMA);
(((HUMAN,(PYGMY,GORIL)),CHIMP),ORANG,SIAMA);
(((HUMAN,(CHIMP,GORIL)),PYGMY),ORANG,SIAMA);
(((HUMAN,PYGMY),(CHIMP,GORIL)),ORANG,SIAMA);
((((HUMAN,PYGMY),GORIL),CHIMP),ORANG,SIAMA);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
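
The count of 15 follows from the number of rooted binary arrangements of the
four names inside the braces, 1 x 3 x 5 = 15.  The Python sketch below
enumerates them by adding one name at a time to every branch.  It is an
illustration only and is not claimed to be the procedure PROTML uses; the
function names are assumptions, and the order of the trees differs from the
listing above.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)                        # attach on the branch above this node
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insert_leaf(left, leaf):
            yield (sub, right)
        for sub in insert_leaf(right, leaf):
            yield (left, sub)


def all_resolutions(names):
    """All rooted binary trees (nested pairs) on the given names."""
    trees = [names[0]]
    for leaf in names[1:]:
        trees = [new for t in trees for new in insert_leaf(t, leaf)]
    return trees


def to_text(tree):
    """Write a nested pair structure in parenthesis notation."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_text(child) for child in tree) + ")"
    return tree


group = ["HUMAN", "CHIMP", "PYGMY", "GORIL"]  # the names inside the braces
trees = all_resolutions(group)
print(len(trees))                             # 15
for t in trees:
    print(f"({to_text(t)},ORANG,SIAMA);")
```

With five names inside the braces the same enumeration gives 105 trees, so
constraining the search quickly becomes important as the group grows.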

======================================
Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992)
The rapid generation of mutation data matrices from protein sequences.
Computer Applications in the Biosciences, 8:275-282.

Olsen, G.J., Matsuda, H., Hagstrom, R., and Overbeek, R. (1994)
fastDNAml: A tool for construction of phylogenetic trees of DNA sequences
using maximum likelihood.
Computer Applications in the Biosciences, 10:41-48.

Tamura, K. and Nei, M. (1993)
Estimation of the number of nucleotide substitutions in the control region
of mitochondrial DNA in humans and chimpanzees.
Mol. Biol. Evol. 10:512-526.

Philippe, H. (1993)
MUST, a computer package of management utilities for sequences and trees.
Nucleic Acids Research, 21:5264-5272.

======================================


