                                   ffitch



Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Fitch-Margoliash and Least-Squares Distance Methods

Description

   Estimates phylogenies from distance matrix data under the "additive
   tree model" according to which the distances are expected to equal the
   sums of branch lengths between the species. Uses the Fitch-Margoliash
   criterion and some related least squares criteria, or the Minimum
   Evolution distance matrix method. Does not assume an evolutionary
   clock. This program will be useful with distances computed from
   molecular sequences, restriction sites or fragments distances, with DNA
   hybridization measurements, and with genetic distances computed from
   gene frequencies.

Algorithm

   The programs FITCH, KITSCH, and NEIGHBOR are for dealing with data
   which comes in the form of a matrix of pairwise distances between all
   pairs of taxa, such as distances based on molecular sequence data, gene
   frequency genetic distances, amounts of DNA hybridization, or
   immunological distances. In analyzing these data, distance matrix
   programs implicitly assume that:
     * Each distance is measured independently from the others: no item of
       data contributes to more than one distance.
     * The distance between each pair of taxa is drawn from a distribution
       with an expectation which is the sum of values (in effect amounts
       of evolution) along the tree from one tip to the other. The
       variance of the distribution is proportional to a power p of the
       expectation.

   These assumptions can be traced in the least squares methods of
   programs FITCH and KITSCH but it is not quite so easy to see them in
   operation in the Neighbor-Joining method of NEIGHBOR, where the
   independence assumptions is less obvious.

   THESE TWO ASSUMPTIONS ARE DUBIOUS IN MOST CASES: independence will not
   be expected to be true in most kinds of data, such as genetic distances
   from gene frequency data. For genetic distance data in which pure
   genetic drift without mutation can be assumed to be the mechanism of
   change CONTML may be more appropriate. However, FITCH, KITSCH, and
   NEIGHBOR will not give positively misleading results (they will not
   make a statistically inconsistent estimate) provided that additivity
   holds, which it will if the distance is computed from the original data
   by a method which corrects for reversals and parallelisms in evolution.
   If additivity is not expected to hold, problems are more severe. A
   short discussion of these matters will be found in a review article of
   mine (1984a). For detailed, if sometimes irrelevant, controversy see
   the papers by Farris (1981, 1985, 1986) and myself (1986, 1988b).

   For genetic distances from gene frequencies, FITCH, KITSCH, and
   NEIGHBOR may be appropriate if a neutral mutation model can be assumed
   and Nei's genetic distance is used, or if pure drift can be assumed and
   either Cavalli-Sforza's chord measure or Reynolds, Weir, and
   Cockerham's (1983) genetic distance is used. However, in the latter
   case (pure drift) CONTML should be better.

   Restriction site and restriction fragment data can be treated by
   distance matrix methods if a distance such as that of Nei and Li (1979)
   is used. Distances of this sort can be computed in PHYLIp by the
   program RESTDIST.

   For nucleic acid sequences, the distances computed in DNADIST allow
   correction for multiple hits (in different ways) and should allow one
   to analyse the data under the presumption of additivity. In all of
   these cases independence will not be expected to hold. DNA
   hybridization and immunological distances may be additive and
   independent if transformed properly and if (and only if) the standards
   against which each value is measured are independent. (This is rarely
   exactly true).

   FITCH and the Neighbor-Joining option of NEIGHBOR fit a tree which has
   the branch lengths unconstrained. KITSCH and the UPGMA option of
   NEIGHBOR, by contrast, assume that an "evolutionary clock" is valid,
   according to which the true branch lengths from the root of the tree to
   each tip are the same: the expected amount of evolution in any lineage
   is proportional to elapsed time.

Usage

   Here is a sample session with ffitch


% ffitch
Fitch-Margoliash and Least-Squares Distance Methods
Phylip distance matrix file: fitch.dat
Phylip tree file (optional):
Phylip fitch program output file [fitch.ffitch]:

Adding species:
   1. Bovine
   2. Mouse
   3. Gibbon
   4. Orang
   5. Gorilla
   6. Chimp
   7. Human

Output written to file "fitch.ffitch"

Tree also written onto file "fitch.treefile"

Done.



   Go to the input files for this example
   Go to the output files for this example

Command line arguments

Fitch-Margoliash and Least-Squares Distance Methods
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-datafile]          distances  File containing one or more distance
                                  matrices
  [-intreefile]        tree       Phylip tree file (optional)
  [-outfile]           outfile    [*.ffitch] Phylip fitch program output file

   Additional (Optional) qualifiers (* if not always prompted):
   -matrixtype         menu       [s] Type of input data matrix (Values: s
                                  (Square); u (Upper triangular); l (Lower
                                  triangular))
   -minev              boolean    [N] Minimum evolution
*  -njumble            integer    [0] Number of times to randomise (Integer 0
                                  or more)
*  -seed               integer    [1] Random number seed between 1 and 32767
                                  (must be odd) (Integer from 1 to 32767)
   -outgrno            integer    [0] Species number to use as outgroup
                                  (Integer 0 or more)
   -power              float      [2.0] Power (Any numeric value)
*  -lengths            boolean    [N] Use branch lengths from user trees
*  -negallowed         boolean    [N] Negative branch lengths allowed
*  -global             boolean    [N] Global rearrangements
   -replicates         boolean    [N] Subreplicates
   -[no]trout          toggle     [Y] Write out trees to tree file
*  -outtreefile        outfile    [*.ffitch] Phylip tree output file
                                  (optional)
   -printdata          boolean    [N] Print data at start of run
   -[no]progress       boolean    [Y] Print indications of progress of run
   -[no]treeprint      boolean    [Y] Print out tree

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   "-outtreefile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit


Input file format

   ffitch reads any normal sequence USAs.

  Input files for usage example

  File: fitch.dat

    7
Bovine      0.0000  1.6866  1.7198  1.6606  1.5243  1.6043  1.5905
Mouse       1.6866  0.0000  1.5232  1.4841  1.4465  1.4389  1.4629
Gibbon      1.7198  1.5232  0.0000  0.7115  0.5958  0.6179  0.5583
Orang       1.6606  1.4841  0.7115  0.0000  0.4631  0.5061  0.4710
Gorilla     1.5243  1.4465  0.5958  0.4631  0.0000  0.3484  0.3083
Chimp       1.6043  1.4389  0.6179  0.5061  0.3484  0.0000  0.2692
Human       1.5905  1.4629  0.5583  0.4710  0.3083  0.2692  0.0000

Output file format

   ffitch output consists of an unrooted tree and the lengths of the
   interior segments. The sum of squares is printed out, and if P = 2.0
   Fitch and Margoliash's "average percent standard deviation" is also
   computed and printed out. This is the sum of squares, divided by N-2,
   and then square-rooted and then multiplied by 100 (n is the number of
   species on the tree):

     APSD = ( SSQ / (N-2) )1/2 x 100.

   where N is the total number of off-diagonal distance measurements that
   are in the (square) distance matrix. If the S (subreplication) option
   is in force it is instead the sum of the numbers of replicates in all
   the non-diagonal cells of the distance matrix. But if the L or R option
   is also in effect, so that the distance matrix read in is lower- or
   upper-triangular, then the sum of replicates is only over those cells
   actually read in. If S is not in force, the number of replicates in
   each cell is assumed to be 1, so that N is n(n-1), where n is the
   number of species. The APSD gives an indication of the average
   percentage error. The number of trees examined is also printed out.

  Output files for usage example

  File: fitch.ffitch


   7 Populations

Fitch-Margoliash method version 3.69

                  __ __             2
                  \  \   (Obs - Exp)
Sum of squares =  /_ /_  ------------
                                2
                   i  j      Obs

Negative branch lengths not allowed


  +---------------------------------------------Mouse
  !
  !                                +------Human
  !                             +--5
  !                           +-4  +--------Chimp
  !                           ! !
  !                        +--3 +---------Gorilla
  !                        !  !
  1------------------------2  +-----------------Orang
  !                        !
  !                        +---------------------Gibbon
  !
  +------------------------------------------------------Bovine


remember: this is an unrooted tree!

Sum of squares =     0.01375

Average percent standard deviation =     1.85418

Between        And            Length
-------        ---            ------
   1          Mouse             0.76985
   1             2              0.41983
   2             3              0.04986
   3             4              0.02121
   4             5              0.03695
   5          Human             0.11449
   5          Chimp             0.15471
   4          Gorilla           0.15680
   3          Orang             0.29209
   2          Gibbon            0.35537
   1          Bovine            0.91675



  File: fitch.treefile

(Mouse:0.76985,((((Human:0.11449,Chimp:0.15471):0.03695,
Gorilla:0.15680):0.02121,Orang:0.29209):0.04986,Gibbon:0.35537):0.41983,Bovine:0
.91675);

Data files

   None

Notes

   None.

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   None.

See also

                    Program name                       Description
                    efitch       Fitch-Margoliash and Least-Squares Distance Methods
                    ekitsch      Fitch-Margoliash method with contemporary tips
                    eneighbor    Phylogenies from distance matrix by N-J or UPGMA method
                    fkitsch      Fitch-Margoliash method with contemporary tips
                    fneighbor    Phylogenies from distance matrix by N-J or UPGMA method

Author(s)

                    This program is an EMBOSS conversion of a program written by Joe
                    Felsenstein as part of his PHYLIP package.

                    Please report all bugs to the EMBOSS bug team
                    (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

                    Written (2004) - Joe Felsenstein, University of Washington.

                    Converted (August 2004) to an EMBASSY program by the EMBOSS team.

Target users

                    This program is intended to be used by everyone and everything, from
                    naive users to embedded scripts.
