CAFCA - Manual


III. Group Compatibility - Primary Analysis



Introduction

CAFCA employs the methods of group-compatibility (Zandee and Geesink, 1987) and component-compatibility (Zandee and Roos, 1987) to run its cladistic analyses. In contrast to, say, standard parsimony methods, CAFCA does not find cladograms by searching for those that optimize a criterion expressed in terms of the characters used, such as cladogram length. Instead, CAFCA explores a search space looking for cladograms with maximum resolution given the components available. The components in their turn can be defined in terms of character states in various ways. The collection of cladograms found can be delimited further by applying different optimality criteria.

This manual introduces you to these methods by taking as a starting point the results that can be obtained by applying them. In this way it is possible to guide you one step at a time, from a straightforward type of analysis to the more complicated ones.

CAFCA output has 6 parts.

  1. a Header, including CAFCA parameter settings
  2. the Data Matrix.
  3. a list of Building blocks for cladograms (clada, components), and their corresponding character states.
  4. a list of Character states on the root of selected cladogram(s).
  5. a list of Selection criteria for cladograms.
  6. Diagrams of selected cladograms, plus their lists of apomorphies and character state changes.

You will get a guided tour along these items, using a simple example of a data matrix with 5 taxa, Aus, Bus, Cus, Dus, and Eus, and 10 characters. The data matrix is available in the examples folder on your distribution disk.


Example

The first analysis will be very straightforward, using the program's defaults for the major parameters. The data matrix used is complex enough to introduce the several possibilities of CAFCA as to coding of characters in a binary matrix, but on the other hand also simple enough from the point of phylogenetic structure as not to allow many competing best cladograms. The same data matrix will be used in the next examples as well, to illustrate the effect of changing some of the parameters.

Tutorial

Discussion of results

What follows is a discussion of the results obtained in this first primary analysis, and at the same time an explanation of the concepts, jargon, and peculiarities of CAFCA.

  CAFCA - Mac   version 1.3i  (c) M.Z. 1987, 1995
      Date : 1 AUG 1995     Time : 9H30M28S
           CAFCA Parameter Settings

Type of analysis ...................... Primary
Cladon option ......................... 1: Partial  Monothetic Sets (PMS)
Cladogram Selection Criterion ......... Minimum Length
Taxon on outgroup-node ................
Ancestral zero's in character ......... 4 5 10
Ordered characters .................... 10
Maximum Number of Cladograms .......... 2

Table 3.1: Header of CAFCA output of first example.

Header

The header section (table 3.1) recapitulates the values of the major CAFCA parameters, as set before the analysis, either by you or by the program (defaults).

Type of analysis.
Type of analysis indicates your choice for a primary analysis, a secondary analysis, a biogeographic analysis, or a user-tree evaluation.

Primary analyses are run with the complete character data matrix as a source of building-blocks (clada) for cladograms. These building blocks are defined as monothetic sets (Beckner, 1959), either partial or strict. Their number can be increased, optionally, by including all additive binary codings for each multi-state character. By using this option you check every possible sequence of states as a hypothesis of homology. You may, as an option, also include all groups from valid three-taxon-statement permutations to increase the number of building blocks for cladograms. A three-taxon-statement is considered valid if each cladon (group of terminal taxa) within the statement is supported by its own independent (local) synapomorphy.

Cladograms for taxa are derived as general patterns of internested (hierarchical) groups of taxa, emerging from the combination of the particular (independent) pattern in each separate character.

Secondary analyses serve to resolve polytomies in cladograms resulting from a previous primary or biogeographical analysis or user-tree evaluation where the set of building blocks for cladograms contained insufficient information for complete dichotomous resolutions (see chapter 4).

Biogeographic analyses are run to explore the relations between the phylogeny of a group of taxa, or different phylogenies of different (unrelated) groups, and the geographical distribution of the taxa involved (see chapter 6).

This type of analysis can also be used to explore the historical relationships between parasites and hosts, by considering hosts as areas of endemism for parasites. In fact any co-evolutionary pattern can be studied for its historical implications by this type of analysis. You may even consider taxa as areas of endemism for genes (character state expressions). The phylogeny of the taxa is seen as the general pattern emerging from the separate phylogenies of independent genes (character carriers), just as a general pattern for the historical relations among areas of endemism emerges from the separate phylogenies of independent taxa. That's the reason why in CAFCA primary, secondary, and biogeographic analyses are identical as to the method employed.

User-tree evaluation takes place when you have entered a data matrix plus one or more cladograms that must be evaluated against this data matrix. The cladograms usually come from the literature, or are based on intuition, but are as a rule not directly derived from the data matrix itself (see also chapter 5).

User-trees need not be completely resolved. If they are unresolved (i.e., contain polytomies) they can be subjected to a secondary analysis after evaluation.

Another possible use of user-tree evaluation may result from running a primary analysis on a data matrix containing the 'better' characters that, however, do not give a completely resolved cladogram. After saving, this cladogram can be entered as a user-tree and evaluated against another data matrix containing the 'weaker' characters, and consequently subjected to a secondary analysis on the basis of the 'weaker' characters.

User-tree evaluation is also applied in co-evolutionary studies. In those cases an independent estimate of the host phylogeny may be available. This host phylogeny is evaluated against the cladogram(s) for hosts found from the data matrix based on the parasite phylogeny and the distribution of parasites over hosts.

Cladon option.

The cladon option refers to the way the building blocks for cladograms (clada) are defined. In the first example these clada are defined following the partial definition for monothetic sets.

Cladogram selection.

In CAFCA you can choose from six different cladogram selection criteria. In the first example the default option, cladogram length, is chosen.

Ancestors and outgroups.
CAFCA can be run without an outgroup being indicated in the data matrix. On the other hand an outgroup can be declared optionally and interactively by you, or the program may deduce an outgroup from the presence of a full-zero row in the data matrix.

In a multi-state character a zero entry is interpreted as an indication of a (putative) ancestral state (see the paragraphs on 'assumptions regarding zero's in the data matrix').

Ordering of (multi-state) characters.
All characters are treated as unordered as well as unpolarized (unless ancestral zero is present and indicated as such) by default, with the exception of characters with (incomplete) additive binary coded states in binary data matrices, and multi-state characters for which you implemented a linear ordering upon request by the program.

In case a multi-state data matrix is used as input, the program will ask if these characters (none, some, all; if some then which) should be treated as ordered, that is, seen as an a priori polarised and ordered sequence of states. CAFCA can order multi-state characters only linearly in a sorted sequence (0 -> 1 -> 2 -> 3, etc...). Thus if you want the states ordered like 2 -> 1 -> 3 -> 0 you should first renumber them to 0, 1, 2, and 3, respectively.
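
The renumbering step can be prepared outside CAFCA. The following is a minimal Python sketch, not part of the program; the function name renumber_states and the sample column are hypothetical. It maps each state code to its position in the desired sequence, so that the linear order 0 -> 1 -> 2 -> 3 enforced by CAFCA reproduces the intended one.

```python
def renumber_states(column, desired_order):
    """Recode one multi-state column so the desired sequence becomes 0, 1, 2, ...

    column        -- state codes for one character, one entry per taxon
    desired_order -- the states in the order they should be read, e.g. [2, 1, 3, 0]
    Negative codes (missing values) are left untouched.
    """
    new_code = {old: new for new, old in enumerate(desired_order)}
    return [s if s < 0 else new_code[s] for s in column]


# Hypothetical column whose states should be ordered 2 -> 1 -> 3 -> 0
print(renumber_states([2, 1, 1, 3, 0, -1], [2, 1, 3, 0]))
```

Keep in mind that after renumbering the first state of the sequence carries the code 0, which CAFCA reads as a putative ancestral state.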

Binary characters (0/1) are seen as characters with only one state (1), as CAFCA groups taxa as a result of presence (= 1) of states only (see also the assumptions regarding zero's in the data matrix). This state should best indicate a putative apomorphy if true phylogenetic results are required. If such putative polarisation is impossible or unwarranted you should transform the binary character (0/1) to a multi-state one (1/2), or you should click No for 'Ancestral state indicated by zero' in the CAFCA parameter dialog. In the latter case groups of terminal taxa will also be based on the distribution of zero's as these zero's may now represent an apomorphic state.

So, if you want to implement an a priori polarisation plus ordering of character states you should either apply binary matrices with (incomplete) additive binary coding, or enter a multi-state data matrix and enforce a linear ordering for all or some of the characters when the program prompts you to do so.

In the present example character 10 shows a linear ordering of its states in the binary image of the data matrix (table 3.2).

Number of cladograms.

In the CAFCA parameter dialog box you can declare what the maximum number of cladograms (MNC) should be for which results will be retained in memory. In the header this parameter is represented by its declared value. Note that this number is different from the maximum number of cliques of components that CAFCA stores during its clique search (a built-in maximum of 5000).

Data Matrix.

Types of characters, and the partitioning of columns.
CAFCA accepts binary (0/1), multi-state (0/1/2/3/etc.), and mixed binary/multi-state matrices as input (table 3.2). In the latter case the binary characters are restricted to two states only (0/1). You cannot mix the multi-state (0/1/2/3/etc.) representation and the binary representation of multi-state characters in one and the same data matrix.

Character states must be represented by digits (integers). Other symbols, like items from the alphabet, are not allowed. Missing values are allowed and must be indicated by a negative integer or a question mark.

In the analysis a binary representation of the data matrix is used for almost all computations (the cladogram optimisation algorithm uses a multi-state representation). This implies that if you define a multi-state matrix as input a copy of this matrix will be converted into a binary image (with the postfix ∆B added to its name).

The elements of the column partitioning vector (CPV) indicate how the columns (character states) of the binary data matrix should be taken together blockwise. Each block of columns corresponds to 1 character, i.e., a transformation series (= 1 column in the multi-state data matrix); each column in a block represents one character state.

This procedure of treating the binary representation of multi-state characters as blocks of interdependent states avoids the errors that are introduced when each state of a multi-state character is treated as a separate nominal variable (Pimentel and Riggins, 1987).

If you define a binary data matrix as input, you will be prompted to provide a column partitioning vector to let the program know how the character states (columns) should be grouped together, successively, to derive a multi-state matrix. If you enter a multi-state data matrix as input, the program can derive a column partitioning vector for the binary image.

The data matrix of our first example (table 3.2) was copied from an ASCII file as a binary data matrix (PLANTB.INP from the Xmpls folder). There are 3 characters with only 1 state (# 1, 7, and 8), 4 characters with two states (# 2, 3, 4, and 9), and 3 characters with 3 states (# 5, 6, and 10).

Data Matrix (binary) : PLANT (Columns represent character states)

       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    +------------------------------------------------------------
Aus |  1  1  0  0  1  0  0  0  0  0  0  0  1  0  1  0  1  0  0  0
Bus |  1  0  1  1  0  1  0  1  0  0  1  0  0  0  1  0  1  1  0  0
Cus |  1  0  1  1  0  0  1  0  1  0  0  1  0  0  1  0  1  1  1  0
Dus |  1  0  1  1  0  0  1  0  1  0  0  1  0  1  0  1  1  1  1  1
Eus |  1  0  1  1  0  0  1  0  0  1  0  1  0  1  0  1  0  1  1  1

Column Partitioning Vector : 1 2 2 2 3 3 1 1 2 3

Data Matrix (multi-state) : PLANT  (Columns represent characters)

       1  2  3  4  5  6  7  8  9 10
    +------------------------------
Aus |  1  1  2  0  0  3  0  1  2  0
Bus |  1  2  1  1  1  1  0  1  2  1
Cus |  1  2  1  2  2  2  0  1  2  2
Dus |  1  2  1  2  2  2  1  0  3  3
Eus |  1  2  1  2  3  2  1  0  1  3

Table 3.2: Data matrix for a cladistic character analysis.

Apparently there is a contradiction between some elements in the CPV (# 4, 5, 9) and the actual number of states in the characters. However, a zero in a multi-state character (like in # 4 and 5) is not treated as a separate state but as an indication of a (putative) ancestral condition (see assumptions regarding zero's). In character 9 state 3 in the multi-state matrix actually reflects a polytypism (see below) for state 1 and 2 in taxon 4 (columns 16 and 17 in binary image).

Note character 10 which in its binary image is additively coded (= a priori polarised and ordered). If you want to implement a priori polarisation plus ordering of character states in a particular character you should apply a binary matrix with (incomplete) additive coding in the block for that character, or enter a multi-state data matrix and enforce a linear ordering for all or some of the characters when the program asks you to do so.
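
The relation between the two images in table 3.2 can be checked with a short Python sketch. It is an illustration only, not CAFCA's own code: the function name binary_to_multistate and the ordered argument are assumptions, as is the rule that a polytypism gets the code "block size + 1" (which reproduces the code 3 for character 9 here, but allows only one polytypic code per character). Missing values are not handled.

```python
def binary_to_multistate(row, cpv, ordered=()):
    """Collapse one binary row into multi-state codes, block by block.

    row     -- 0/1 entries for one taxon (columns = character states)
    cpv     -- column partitioning vector: block size per character
    ordered -- 1-based numbers of characters with additive binary coding
    """
    states, start = [], 0
    for char_no, size in enumerate(cpv, start=1):
        block = row[start:start + size]
        start += size
        ones = [i + 1 for i, bit in enumerate(block) if bit == 1]
        if not ones:
            states.append(0)            # all-zero block: putative ancestral condition
        elif char_no in ordered:
            states.append(max(ones))    # additive coding: last state reached
        elif len(ones) == 1:
            states.append(ones[0])      # ordinary single state
        else:
            states.append(size + 1)     # polytypism: assumed extra code
    return states


CPV = [1, 2, 2, 2, 3, 3, 1, 1, 2, 3]
PLANT_BINARY = {
    "Aus": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    "Bus": [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
    "Cus": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0],
    "Dus": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1],
    "Eus": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
}

for taxon, row in PLANT_BINARY.items():
    print(taxon, binary_to_multistate(row, CPV, ordered={10}))
```

With character 10 declared as ordered, the printed rows can be compared directly with the multi-state image in table 3.2.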

In the case of characters with only one state (characters 1, 7, and 8) this state should best indicate a putative apomorphy if true phylogenetic results are required as CAFCA groups taxa as a result of presence of states only. If such putative polarisation is unwarranted you should transform the binary character (0/1) to a multi-state one (1/2), or you should click No for 'Ancestral state indicated by zero' in the CAFCA parameter dialog. In the latter case, groups of terminal taxa will also be based on the distribution of zero's as these zero's may now represent an apomorphic state. In all other cases characters are treated as unordered and unpolarized (characters 2 - 6, and 9), unless all states in a block are (incompletely) additive binary coded (character 10; columns 18, 19, 20 in binary image).

Polytypism.
Polytypism for character states in one or more taxa can be coded in different ways. In a multi-state data matrix one can enter a polytypic state with its own code of which, like all other codes, you know what it represents. In a binary data matrix polytypism can be indicated by simply entering a 1 for all the states (within a block of homologous states) present in a taxon. The program then offers you the opportunity either to let the program insert separate (new) columns for each polytypism, or to leave the matrix as it is. In the latter case polytypisms will show up in the multi-state data matrix by their own code (as they will in the state change list), although in the binary image of the data no separate columns for polytypism are present. CAFCA does not accept codes like {123} or (1,3~5), enclosed either in curly braces or parentheses, as used in PAUP's or MacClade's (both version 3.0) NEXUS file format, to indicate possible assignments ('uncertainty' vs 'polymorphism') of character states.

In the present example taxon 4 shows polytypism for character 9. No separate column for this state is present in the binary data matrix. The multi-state image of the binary matrix, however, shows a distinct code (3) for this polytypism that can be traced as such in the state change list (see page 45). Note, however, that CAFCA has no provision to deal with polytypism in internal nodes (hypothetical ancestral taxa) of the cladogram. It simply makes a most parsimonious assignment of a character state to an internal node, according to accelerated transformation (ACCTRAN).

Assumptions regarding zero's in the data matrix.
  1. In a data matrix with multi-state characters, zero's will, as a default, be interpreted as indications of putative ancestral states, except in full-zero columns (full-zero columns are neglected). In the cladogram optimisation process these putative ancestral conditions will be forced to be present at the root, even at the cost of extra steps.
    If you want a more liberal (and sometimes more parsimonious) attitude towards such putative ancestral states during cladogram optimisation, these states should be given either a question mark or their own code (any number but zero) in the data matrix, or you should click No for 'Ancestral state indicated by zero' in the CAFCA parameter dialog.
  2. In a binary data matrix where columns represent character states and blocksizes for columns indicate the number of homologues represented by contiguous columns, zero's are parameters of a character state indicating a Pi=0 for finding that state in taxon i, unless all entries in a row within a block are zero; then a putative ancestral condition is indicated (= neither one of the states in the block is present in the ancestor). The zero condition is forced to be present at the root in the cladogram optimisation process, unless you clicked No for 'Ancestral state indicated by zero' in the CAFCA parameter dialog.
  3. In true binary characters (with a blocksize equal to one!) the number of homologues is actually 1 under the condition that the ancestral state is indicated by zero, and 2 when it is not. In the latter case the zero's (0) in such a true binary character will, like the one's (1), be treated as indicating a group of terminal taxa (cladon), as they may represent an apomorphic state.

In the present example characters 4, 5, and 10 show the first assumption in the multi-state image, and the second assumption in the binary image. The second assumption is also demonstrated by characters 1, 7, and 8, although a full-zero row is not present here. For the other characters 2, 3, 6, and 9 (without zero's) these assumptions do not apply.

Full-zero columns.
Full-zero columns in either a multi-state or a binary data matrix will pass the analysis as dummies, i.e., they do not influence any of the results in any way. As a weight option is not yet implemented in CAFCA, this characteristic gives you a provisional opportunity to run different analyses on different selections of characters by substituting zero's for some of the characters, without the need of changing the size of the data matrix and thereby the index numbers of the characters. When you use the Clip data matrix option in the Utilities menu or during the preparation of a (primary) analysis, this is what happens with columns that are marked for deletion by you.

Missing values.
Missing values are allowed and must be indicated by a negative integer. For a binary data matrix this implies that missing values are represented by -1. In a multi-state data matrix either -1 or any other negative integer can be used. In the binary image of a multi-state data matrix negative integers show up as -1 in the appropriate column in the respective block. In a multi-state data matrix it is allowed to indicate a missing value for one taxon by, say, -2, and another missing value in the same character but for another taxon by, say -4, suggesting that these states are not alike, although unknown.

For identical indicators of missing values for several taxa, say, three taxa all showing a -2, all possible combinations of these taxa with the taxa showing known states with value 2 will be used in the derivation of building-blocks for cladograms. This is likewise true for taxa showing -1 with those showing 1 as a known state, those showing -3 with those showing 3, etc. This procedure implies that a data matrix with one column indicating identical missing values (e.g., -1) for all taxa will result in all cladograms possible given the number of taxa (the default is 6 taxa, implying 945 cladograms; you can indicate otherwise in the CAFCA parameter dialog). The maximum number possible is 12, although using it is quite absurd when you realise the number of cladograms (13.749.310.575) that are implied by this number. You may tie up CAFCA for years.
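
The way missing values multiply the candidate building blocks can be sketched as follows. This is an illustration in Python under stated assumptions, not CAFCA's implementation: the function name clada_with_missing is hypothetical, and the rule that a group needs at least two taxa is an assumption made here to skip trivial sets.

```python
from itertools import combinations


def clada_with_missing(known, missing):
    """Candidate clada for one character state.

    known   -- taxa that show the state (e.g. value 2)
    missing -- taxa that show the matching missing-value code (e.g. -2)
    Every subset of the 'missing' taxa is joined with the 'known' taxa.
    """
    known = frozenset(known)
    candidates = set()
    for k in range(len(missing) + 1):
        for extra in combinations(sorted(missing), k):
            group = known | frozenset(extra)
            if len(group) >= 2:  # assumption: a cladon has at least two taxa
                candidates.add(group)
    return candidates


# Hypothetical case: A and B show state 2; C, D and E all show -2
groups = clada_with_missing(known={"A", "B"}, missing={"C", "D", "E"})
print(sorted("".join(sorted(g)) for g in groups))
```

Three taxa with the same missing-value code already turn one cladon (AB) into eight candidates, which is why a column of identical missing values for all taxa leaves every grouping open.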

Thus if you know nothing, i.e., you have no data on your taxa, all possible outcomes are equally likely and will be presented (within limits).
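
The two counts quoted above agree with the standard number of rooted, fully dichotomous cladograms for n terminal taxa, (2n-3)!! = 3 x 5 x ... x (2n-3). A small Python check (the function name is hypothetical; this is not how CAFCA itself enumerates cladograms):

```python
def rooted_binary_cladograms(n_taxa):
    """Number of rooted, fully dichotomous cladograms for n_taxa terminal taxa."""
    count = 1
    for factor in range(3, 2 * n_taxa - 2, 2):  # 3, 5, ..., 2n-3
        count *= factor
    return count


for n in (6, 12):
    print(n, rooted_binary_cladograms(n))
```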

In ASCII files representing data matrices you can also use a question mark as an indicator of a missing value. When CAFCA imports these files the question marks are translated to -1.

Clada

Building-blocks for cladograms, or clada (singular: cladon), are derived from the binary representation of the data matrix by defining either partial or strict monothetic sets (Beckner, 1959) of terminal taxa, and their corresponding sets of character states (table 3.3a, table 3.8), or by adding to the partial sets the clada resulting from all additive binary codings of all multi-state characters (table 3.11), or by adding to the partial sets the clada resulting from all valid three-taxon-statements obtained from all three-taxon-statement permutations (table 3.15).

Although using monothetic sets and variations thereof in the recognition of building blocks for cladograms, the group- and component compatibility method should not be confused with the so-called monothetic group method as discussed by Farris, Kluge and Mickevich (1982). In contrast to the latter method CAFCA does not depend on a priori specification of transformation series and polarities of characters.

Option 1: Partial monothetic sets (PMS)
Partial monothetic sets of terminal taxa are defined by sets of unique character states (= partial application of the definition for monothetic sets according to Beckner, 1959). Partial monothetic defined clada reflect, as it were, your strong belief in your conjectures of homology, e.g., the branched hairs underneath the leaves are 'the same' in all taxa observed (compare de Pinna's [1991] primary homologies).

Given a multi-state and a binary character for the taxa A to H, like for instance

A  1 1
B  1 0
C  2 1
D  3 0
E  3 0
F  2 1
G  1 1
H  2 1

the unordered representation in the binary data matrix will be the following blocks of character states:

   11 12 13  21

A  1  0  0   1
B  1  0  0   0
C  0  1  0   1
D  0  0  1   0
E  0  0  1   0
F  0  1  0   1
G  1  0  0   1
H  0  1  0   1

from which, assuming that zero in the binary character implies an ancestral condition, the following list of clada is derived:

ABG, CFH, DE, and ACFGH;

from character states 11, 12, 13, and 21, respectively.

If the apomorphy decision for 1 or zero in binary character #2 is still undecided, the list of clada is supplemented by set {BDE} based on character state 20.
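
The derivation of partial monothetic sets from the binary image can be written down in a few lines of Python. This is a sketch for the example above, not CAFCA's code; the function name and the undecided argument (which maps a state label to the label used for its zero set) are assumptions.

```python
def partial_monothetic_sets(taxa, states, undecided=None):
    """Return {state label: cladon}, one cladon per character-state column.

    taxa      -- taxon names in row order
    states    -- {column label: list of 0/1 entries, one per taxon}
    undecided -- {label: zero label} for true binary characters whose
                 zero's may also be apomorphic; their zero set is added too
    """
    undecided = undecided or {}
    clada = {}
    for label, column in states.items():
        clada[label] = "".join(t for t, bit in zip(taxa, column) if bit == 1)
        if label in undecided:
            clada[undecided[label]] = "".join(
                t for t, bit in zip(taxa, column) if bit == 0
            )
    return clada


TAXA = "ABCDEFGH"
STATES = {
    "11": [1, 1, 0, 0, 0, 0, 1, 0],
    "12": [0, 0, 1, 0, 0, 1, 0, 1],
    "13": [0, 0, 0, 1, 1, 0, 0, 0],
    "21": [1, 0, 1, 0, 0, 1, 1, 1],
}

print(partial_monothetic_sets(TAXA, STATES))
print(partial_monothetic_sets(TAXA, STATES, undecided={"21": "20"}))
```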

Option 2: Strict monothetic sets (SMS)
Strict monothetic sets are defined by unique combinations of character states (none of the separate states needs to be unique = strict application of the definition of monothetic sets; see also Sharrock and Felsenstein, 1975; Farris, 1978; Farris, Kluge and Mickevich, 1982).

Strict monothetic defined clada reflect the first signs of doubt as to the homology conjectures implied by partial monothetic sets. Strict sets say, as it were, that if the initial conjectures of homology are doubtful then a first hint of how these homologies may be broken down is given by the distribution (over taxa) of other states from other characters (congruence).

Given binary characters for the taxa A to H, like for instance

   1 2 3 4 (characters)
A  1 1 1 0
B  1 0 1 0
C  1 1 0 0
D  0 0 1 1
E  1 0 0 0
F  0 1 0 0
G  1 1 0 1
H  0 1 0 1

the following list of clada is generated under option 1 (PMS):

ABD, DGH, ABCEG, and ACFGH;

from character 3, 4, 1, and 2, respectively (zero's assumed to indicate the ancestral condition).

Using option 2 (SMS) the following clada are generated as well:

AB, GH, and ACG.

AB results from the interaction among characters 1 and 3 (if ABD is not based on a character state, homologous over taxa, then maybe AB is), GH from characters 2 and 4, and ACG from characters 1 and 2. In this way we are still limiting the number of homoplasies to account for if character state distributions are not fully congruent. For instance, the set GH is not broken down in G and H separately unless there is evidence supplied by other characters that we should do so.
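
For this example the additional clada can be reproduced by intersecting the partial monothetic sets. The Python sketch below rests on that reading of the example; it assumes that only groups of two or more taxa count, and it is not taken from CAFCA itself. The function name is hypothetical.

```python
from itertools import combinations


def strict_monothetic_sets(pms):
    """Groups of two or more taxa sharing a combination of states,
    found as intersections of two or more PMS clada."""
    sets = [frozenset(s) for s in pms]
    found = set()
    for k in range(2, len(sets) + 1):
        for combo in combinations(sets, k):
            common = frozenset.intersection(*combo)
            if len(common) >= 2 and common not in sets:
                found.add(common)
    return found


PMS = ["ABCEG", "ACFGH", "ABD", "DGH"]  # characters 1, 2, 3 and 4
print(sorted("".join(sorted(s)) for s in strict_monothetic_sets(PMS)))
```

Intersections that leave a single taxon (for instance G from characters 1 and 4) are dropped, which matches the remark that GH is not broken down into G and H.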

As we should not burden our analysis with hypotheses of homoplasy beyond necessity (Hennig's auxiliary principle), we usually take partial monothetic sets for clada (option 1) as first approximations in our attempts to achieve a fully resolved and parsimonious explanation of our data in terms of a cladogram. This approximation can be made better, if need be, by using strict monothetic sets (option 2) in another attempt to achieve fully resolved most parsimonious cladograms.

Wilkinson (1995) describes a strategy, safe taxonomic reduction, to cope with abundant missing entries in a data matrix. Through this strategy only taxa that can have no effect upon the inferred relationships of other taxa included in the analysis are excluded prior to analysis. According to Wilkinson a minimum requirement for the inclusion of any terminal taxon to alter relationships among the other terminal taxa is that it must have unique combinations of phylogenetically informative characters. Taxa that have the same combination of character states, so-called taxonomic equivalents, can be safely removed from the data matrix and are potential candidates for elimination prior to analysis. As described above, unique combinations of character states can be found through the application of the definition of (strict) monothetic sets.

We could extend the notion of what is considered a minimal requirement for terminal taxa to apply to internal nodes of a cladogram as well. Thus any component that is a strict monothetic set will affect relationships among the other components. As a corollary one may contemplate how to judge and what to do with components from a most parsimonious cladogram (MPC) that appear to have identical sets of character state combinations.

Option 3: PMS + all complementary codes
In defining sets of taxa by unique character states only, one may miss MPC's for which it is necessary to assume reversal(s) in order to fit character state distribution(s) to a cladogram. Using PMS + all complementary codes may serve as a first approximation to repair this anomaly, if necessary.

Instead of breaking down clada into subsets as indicated by overlapping character states as SMS does, this option finds new clada by iteratively joining the distributions of pairwise overlapping character states. For instance the binary data used in the example above (option 2) to derive strict monothetic sets result in the following sets after the first iteration.

{ABCEFGH}  1 + 2
{ABCDEG}   1 + 3
{ABCDEGH}  1 + 4
{ABCDFGH}  2 + 3
{ACDFGH}   2 + 4
{ABDGH}    3 + 4

In the second iteration these joint distributions are combined among each other as well as with the distributions of the original character states, e.g., {1 + 2} will be combined with {3} and with {4}, as well as with {3 + 4}, etc. The iterations continue until no new combinations of sets are found.
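
The iterative joining can be sketched in Python as below. The function names are hypothetical and the sketch only follows the description given here (unions of pairwise overlapping sets, repeated until nothing new appears); CAFCA's own bookkeeping may differ.

```python
from itertools import combinations


def join_overlapping(sets):
    """One iteration: the union of every pair of overlapping sets."""
    return {a | b for a, b in combinations(sets, 2) if a & b}


def complementary_closure(start):
    """Repeat the joining step until no new sets turn up."""
    known = set(start)
    while True:
        new = join_overlapping(known) - known
        if not new:
            return known
        known |= new


PMS = [frozenset(s) for s in ("ABCEG", "ACFGH", "ABD", "DGH")]

first = join_overlapping(PMS)
print(sorted("".join(sorted(s)) for s in first))
print(len(complementary_closure(PMS)))
```

The first print statement lists the six joint distributions of the first iteration; the closure continues from there and ends with the set of all taxa among its members.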

This option # 3 can also be applied to strict monothetic sets. If you chose option 2 (SMS) right from the beginning, you will be prompted by CAFCA whether you want to add the complementary codes to the SMS's as well.

Option 4: PMS + all additive binary codes (PMS + ABC).
This option only applies to multi-state characters in a data matrix. For these characters the following algorithm is used to derive clada in addition to those obtained by PMS (option 1).

Given a multistate character for the taxa A to H, like for instance

A 1
B 1
C 2
D 3
E 3
F 2
G 1
H 2
the unordered representation in the binary data matrix will be the following block of character states:

   11 12 13

A  1  0  0
B  1  0  0
C  0  1  0
D  0  0  1
E  0  0  1
F  0  1  0
G  1  0  0
H  0  1  0

from which the following permutations of additive binary codings (transformation series) are derived

   1 2 3   1 3 2   2 1 3   2 3 1   3 2 1   3 1 2 (series)
   1 2 3   1 2 3   1 2 3   1 2 3   1 2 3   1 2 3 (states)
A  1 0 0   1 0 0   1 1 0   1 1 1   1 1 1   1 0 1
B  1 0 0   1 0 0   1 1 0   1 1 1   1 1 1   1 0 1
C  1 1 0   1 1 1   0 1 0   0 1 0   0 1 1   1 1 1
D  1 1 1   1 0 1   1 1 1   0 1 1   0 0 1   0 0 1
E  1 1 1   1 0 1   1 1 1   0 1 1   0 0 1   0 0 1
F  1 1 0   1 1 1   0 1 0   0 1 0   0 1 1   1 1 1
G  1 0 0   1 0 0   1 1 0   1 1 1   1 1 1   1 0 1
H  1 1 0   1 1 1   0 1 0   0 1 0   0 1 1   1 1 1

as well as the codes for the branched varieties.

These binary codes are used to derive partial monothetic sets, as shown under option 1 above, in addition to the sets already obtained from the original binary codes of the character states in the data matrix.
Thus the list already obtained in option 1, ABG, CFH, and DE, is supplemented with the clada ABCDEFGH, CDEFH, ABDEG, and ABCFGH.
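
The linear series in the table above can be generated mechanically. In the Python sketch below (function name hypothetical, branched series not covered, zero's and missing values ignored) the column for a state in an additive coding contains every taxon whose state is that state or a later one in the series.

```python
from itertools import permutations


def additive_clada(taxa, column):
    """Clada implied by all linear additive binary codings of one character."""
    states = sorted(set(column))
    clada = set()
    for series in permutations(states):
        for pos in range(len(series)):
            reached = set(series[pos:])  # this state and all later ones
            clada.add("".join(t for t, s in zip(taxa, column) if s in reached))
    return clada


TAXA = "ABCDEFGH"
CHARACTER = [1, 1, 2, 3, 3, 2, 1, 2]

all_clada = additive_clada(TAXA, CHARACTER)
print(sorted(all_clada - {"ABG", "CFH", "DE"}))
```

For this character the sets obtained besides ABG, CFH, and DE are ABCDEFGH, CDEFH, ABDEG, and ABCFGH.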

Option 5: PMS + Three-Taxon-Statement permutations (PMS + TTSP).
This option applies to binary as well as to multi-state characters. For these characters the following algorithm is used to derive clada in addition to those obtained by PMS (option 1). It is based on an unpublished MS (M. Zandee - Three-taxon statement permutations + outgroup comparison = cladistic analysis. Paper read at the 4th meeting of the Willi Hennig Society, 1984).

  1. Take the binary representation of a character state, e.g.
     A 0
     B 1
     C 1 
     D 1
     E 0
     F 0
     G 1
     H 0
    If this representation contains six 1's or fewer (6 is the default; you can indicate more in the CAFCA parameter dialog), make all groupings of taxa (duos, trios, quartets, etc.) based on the distribution of the state-present indication (1), e.g.,

    BC, BD, BG, CD, CG, DG, BCD, BCG, BDG, CDG, and BCDG

  2. Make all possible three-taxon statements, based on these groupings, e.g., (BC)D, (BD)C, (BG)C, B(CD), (BC)G, B(GC), (BD)G, (BG)D, B(GD), (CD)G, (CG)D, C(GD), (BCD)G, (BCG)D, (BDG)C, and (CDG)B

  3. Check for each three-taxon statement whether its constituent parts have any independent character-state support.

    Thus, for instance, the groups BC, D, and BCD in (BC)D must have independent (= not identical) supporting character states for the three-taxon statement to be considered valid (all characters in the data matrix are used to this end). But the same must be true for BCD, G, and BCDG in (BCD)G, etc...

  4. Collect the valid three-taxon-statements.

  5. Extract their constituent sets.

  6. Do this for the binary representation(s) of each character in the data matrix.

  7. Join the resulting list of constituent sets of valid three-taxon statements with the list of partial monothetic sets, now representing the collection of clada under option 5.
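
Steps 1 and 2 of this procedure can be sketched in Python as follows. The function names are hypothetical, taxon names are assumed to be single letters, and the statements are formed by setting one taxon apart from each grouping of three or more, which reproduces the sixteen statements of the example (with B(CD) written as (CD)B). The validity check of step 3, which needs the support from all other characters, is not included.

```python
from itertools import combinations


def groupings(taxa, column, max_ones=6):
    """Step 1: all groupings of two or more taxa showing the state (1)."""
    present = [t for t, bit in zip(taxa, column) if bit == 1]
    if len(present) > max_ones:
        return []
    return [
        "".join(combo)
        for k in range(2, len(present) + 1)
        for combo in combinations(present, k)
    ]


def three_taxon_statements(groups):
    """Step 2: set one taxon apart from each grouping of three or more."""
    statements = []
    for group in groups:
        if len(group) >= 3:
            for out in group:
                statements.append(f"({group.replace(out, '')}){out}")
    return statements


TAXA = "ABCDEFGH"
STATE = [0, 1, 1, 1, 0, 0, 1, 0]

found = groupings(TAXA, STATE)
print(found)
print(three_taxon_statements(found))
```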

By generating three-taxon statements we can, within practical limits, explore the situation where, according to Wilkinson (1991), parsimony analysis will not be misled if "... for any pair of sister taxa A and B there is more reliable evidence of their membership in a series of nested holophyletic groups to the exclusion of any unrelated taxon C, than there is misleading counterevidence for the inclusion of either A or B in an alternative set of nested groups to the exclusion of the other."

Nelson and Platnick (1991) suggest another implementation of three-taxon statements. They only consider all pairs of taxa, and disregard groupings of higher order, that can be derived from a list of taxa sharing the same state of a character. To form three-taxon statements these pairs are united with all other taxa not sharing this state (i.e. having a zero), one at a time. This is repeated for each separate character. In this way a new data matrix is built, composed of the three-taxon statements implied by the characters in the original data matrix. Note that my implementation of three-taxon statements does not replace the original data matrix but only serves to provide additional building blocks for cladograms. Nelson and Platnick's definition of three-taxon statements is also treated in this way by CAFCA. Only if you export the data matrix in NEXUS, PAUP, or HENNIG86 format will the N&P three-taxon codes replace the original data matrix. If you opt for three-taxon-statements in the CAFCA parameters dialog you will be offered a choice between Nelson and Platnick's implementation and CAFCA's.



© M. Zandee 1996.