Biology | Theoretical Biology | Rino's Home Page
CAFCA: a Collection of APL Functions for Cladistic Analysis
 
A short history of CAFCA
 

CAFCA is an acronym that stands for Collection of APL Functions for Cladistic Analysis. As the acronym indicates, all algorithms implemented in CAFCA are written in APL. APL was developed by K.E. Iverson, originally as a notational tool for mathematics, but was soon implemented on IBM mainframes as an interpreted computer language. Because APL code is interpreted, programs (or functions, in APL jargon) written in APL can hardly compete in speed with programs written in languages such as C or Pascal (which are, in general, compiled). This is especially the case for algorithms that contain loops. On the other hand, APL contains very powerful operators that enable the programmer to achieve in one line of code the equivalent of dozens of lines of C or Pascal.

CAFCA was originally developed for use on mainframe computers. These versions were never distributed. When APL interpreters for personal computers became available, CAFCA was ported to the PC in 1985. In 1986 a menu-driven user interface was programmed and CAFCA could, eventually, be distributed as an integrated APL workspace. The main obstacle to its distribution was that users had to acquire the APL interpreter separately in order to run CAFCA. Only after an APL interpreter with a free runtime license became available for Atari and Macintosh computers could the constraints on CAFCA's distribution be lifted, and CAFCA was ported to the Atari (1988) and to the Macintosh (1989). Soon further development was focused on the Macintosh only, including a native version for the PowerPC chip (1994). A Mac OS X [Carbon] version is in the works, although the development of a new user interface is a major hiccup. When it is finished a Windows version [sharing the same code] will be available as well. The old DOS version, as well as the Mac System 6 and 7 versions, are no longer distributed.

Description
 

CAFCA employs the methods of group-compatibility and component-compatibility to run a cladistic analysis of a data matrix. The data matrix may be binary, multi-state, or mixed binary/multi-state. Missing values are allowed and should be indicated by a negative integer or a question mark. Multi-state characters may be expressed in binary form.
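As an illustration of the missing-value convention, the following Python sketch reads one row of a data matrix. The whitespace-separated layout and the names `parse_row` and `is_binary` are assumptions made for this example; they do not describe CAFCA's actual ASCII file parser.

```python
from typing import List, Optional

Row = List[Optional[int]]


def parse_row(line: str) -> Row:
    """Parse one row of character states.

    A question mark or a negative integer marks a missing value,
    which is stored as None. The whitespace-separated layout is an
    assumption of this sketch.
    """
    states: Row = []
    for token in line.split():
        if token == "?":
            states.append(None)
            continue
        value = int(token)
        states.append(None if value < 0 else value)
    return states


def is_binary(matrix: List[Row]) -> bool:
    """True when every non-missing state in the matrix is 0 or 1."""
    return all(s in (0, 1) for row in matrix for s in row if s is not None)
```

With this representation a mixed binary/multi-state matrix is simply one for which `is_binary` returns False.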

CAFCA uses a so-called column partitioning vector to indicate which columns belong together as a block. Each block of binary states represents one multi-state character, thus avoiding the errors that are introduced when each state of a multi-state character is treated as a separate nominal variable. Polymorphism can be indicated, but only in the binary expression of multi-state characters. Characters may be polarized by indicating the (putative) ancestral state (zero), polarized and ordered by means of a (partial) additive binary coding, kept neutral (no polarity, no order), or polarized and linearly ordered upon request (multi-state characters only). CAFCA has no options for step matrices.

Taxa may all belong to the ingroup, or the outgroup(s) may be included and either indicated (interactively) by the user or deduced from the data matrix (full zero row). The data matrix must be available as an ASCII file, be present in the OutputFile system, or be entered with CAFCA's built-in editor.

The group- and component-compatibility method is based on the idea that each cladogram has components as its building blocks, i.e. sets of terminal taxa corresponding with the nodes in a cladogram. Any two components share one of four possible relations: exclusion, inclusion, overlap, or replication. Components are compatible when they include, exclude, or replicate each other. All components (taxon subsets) of a cladogram are mutually compatible. Components of a cladogram can be seen as nodes of a graph, connected by lines depicting the compatibility relation. Thus the cladogram corresponds to a clique, a maximally connected [sub]graph.
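The four relations between components can be made concrete with a small Python sketch. Components are modelled here as frozensets of taxon names; the function names `relation`, `compatible` and `mutually_compatible` are chosen for this illustration and are not taken from CAFCA's APL code.

```python
from itertools import combinations
from typing import FrozenSet, Iterable

Component = FrozenSet[str]


def relation(a: Component, b: Component) -> str:
    """Classify the relation between two components (sets of taxa)."""
    if a == b:
        return "replication"
    if a.isdisjoint(b):
        return "exclusion"
    if a < b or b < a:          # one is a proper subset of the other
        return "inclusion"
    return "overlap"            # shared taxa, but neither contains the other


def compatible(a: Component, b: Component) -> bool:
    """Two components are compatible unless they overlap."""
    return relation(a, b) != "overlap"


def mutually_compatible(components: Iterable[Component]) -> bool:
    """True when every pair of components is compatible."""
    return all(compatible(a, b) for a, b in combinations(list(components), 2))
```

For example, `{A, B}` and `{A, B, C}` stand in the inclusion relation and may occur in the same cladogram, whereas `{A, B}` and `{B, C}` overlap and may not.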

CAFCA starts by extracting components from character data. Components can be defined in terms of character states in several ways. They may be defined by unique character states, by unique combinations of character states (none of the separate states needs to be unique; the combination, however, is), by using all possible transformation series (additive binary codings) in multi-state characters, by three-taxon statement permutations, or by polythetic sets of character states. Components thus defined are seen as nodes in a graph, connected when they are compatible. The graph is searched for maximally connected subgraphs (cliques) by a branch-and-bound algorithm. Each clique corresponds to a cladogram. The characters from the data matrix are optimized on each of the cladograms found (parsimony mapping), and the most parsimonious cladograms are selected as the most likely representation of the cladistic relationships of the taxa involved.
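The first two steps, deriving components and searching the compatibility graph for cliques, can be sketched in Python under some simplifying assumptions. The sketch handles a binary matrix without missing values, takes the set of taxa sharing state 1 in a column as one component (a simplified stand-in for the "unique character states" option), and uses the standard Bron-Kerbosch algorithm with pivoting in place of CAFCA's own branch-and-bound search, whose details are not given here. All function names are hypothetical, and the parsimony mapping step is not covered.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

Component = FrozenSet[str]


def components_from_matrix(taxa: Sequence[str],
                           matrix: Sequence[Sequence[int]]) -> List[Component]:
    """One component per column: the taxa that share state 1.

    Singletons, the full taxon set and duplicates are skipped
    (a choice made for this sketch).
    """
    found: List[Component] = []
    for col in range(len(matrix[0])):
        group = frozenset(t for t, row in zip(taxa, matrix) if row[col] == 1)
        if 1 < len(group) < len(taxa) and group not in found:
            found.append(group)
    return found


def compatible(a: Component, b: Component) -> bool:
    """Compatible means exclusion, inclusion or replication (no overlap)."""
    return a.isdisjoint(b) or a <= b or b <= a


def maximal_cliques(components: Sequence[Component]) -> List[List[Component]]:
    """All maximal sets of mutually compatible components (Bron-Kerbosch)."""
    n = len(components)
    adj: Dict[int, Set[int]] = {
        i: {j for j in range(n)
            if j != i and compatible(components[i], components[j])}
        for i in range(n)
    }
    cliques: List[List[Component]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append([components[i] for i in sorted(r)])
            return
        # Pivot on the vertex with most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return cliques


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    matrix = [[1, 1, 0, 0],   # A
              [1, 1, 1, 0],   # B
              [0, 1, 1, 1],   # C
              [0, 0, 0, 1]]   # D
    for clique in maximal_cliques(components_from_matrix(taxa, matrix)):
        print([sorted(c) for c in clique])
```

For this small matrix the four columns yield the components {A,B}, {A,B,C}, {B,C} and {C,D}, and the search finds three cliques of two components each, i.e. three candidate cladograms that would then be compared by parsimony mapping.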

Nota Bene: component compatibility is, in general, not the same as character compatibility. In component compatibility, groups of taxa may be based on partial agreement of characters, something that is forbidden in character compatibility. As a consequence, more, better resolved, and more parsimonious cladograms can be found by the component compatibility method.

In general, CAFCA can be used to analyse any pattern generated by historically associated lineages, be it genes (characters) in taxa (taxa as areas for genes), parasites on hosts (hosts as areas for parasites), or taxa and their areas of endemism or biotas. In all these instances the input data are generated from a cladogram describing the cladogenetic relationships of the genes, taxa, or parasites involved, and a binary matrix describing the distributions of these entities over their associates, i.e., taxa, areas, and hosts, respectively. The method employed in all these cases, i.e., component compatibility, is identical to the one described above for character data.

Distribution
 

Your copy of the CAFCA archive contains one item: an executable file (program) with the name CAFCAppc (plus a version indication).
Additionally, you can download the archive with Help files and Examples. This archive contains two folders, one called Help and the other Xmpls. The first contains Help files; the latter contains examples. The Xmpls folder holds several TEXT files that are identical to the example data used in the manual. They give you an idea of what CAFCA input files should look like, and you can use them to run the examples.
How to get the program...
 

If you have bug reports, comments or suggestions, do not hesitate to give Feedback or send me E-mail.
 

How to get the manual...
 

The CAFCA manual is available in two formats: OnLine (WWW pages) and Portable Document Format (PDF).

CAFCA Manual OnLine (work in progress!)
 
CAFCA Manual in PDF format
 

Portable Document Format requires a particular viewer, Acrobat Reader, which can be obtained for free from Adobe.

Installation
 
  1. Make a new folder on your hard disk (the boot partition, preferably) and name it CAFCA.
  2. You need a copy of Stuffit Expander to extract the CAFCA program from the archive. Double click the archived file you downloaded to start Stuffit and extract it. Put the CAFCA program in the CAFCA folder you just made in step 1.
  3. Double click the archive with Help and Example files. When prompted by the extracting program indicate the CAFCA folder you made in step 1.
  4. That's all.
Copyright notice
 

Copyright © 1988 M. Zandee.

Permission to use and distribute this software and its documentation for educational and research purposes is hereby granted without fee, provided the above copyright notice, author statement and this permission notice appear in all copies of this software and related documentation.

THE SOFTWARE IS PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR, THE INSTITUTE OF BIOLOGY, LEIDEN UNIVERSITY, BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Or, in other words: I assume no responsibility for your computer, your data, your hardware, your software, you, or your mental well-being. By using this product, you agree not to hold me liable for anything. If you do not agree to these terms, then stay away from CAFCA and trash the downloaded file and forget you ever had it.

The Macintosh and PowerPC version of CAFCA was developed using APL.68000, a proprietary product of MicroAPL Ltd., which has given permission for a runtime version of APL.68000 to be distributed with the software, or to be packaged into the software as one executable. Copyright and all intellectual property rights in APL.68000 remain vested in MicroAPL Ltd.

Future releases
  Version 2.0: sometime in the future
  • When I succeed in writing and embedding some compiled C code for the cladogram character optimisation algorithms in the otherwise interpreted APL code, in order to obtain a major speed-up for the main performance bottleneck of CAFCA.
Version 1.6:
  • Will run natively on Mac OS X (Carbon) as well as Windows.
  • Adaptation of ASCII file parser for matrices with molecular data (ACGT/U codes).
  • Adaptation of ASCII file parser to NEXUS format.
Release history
  Version 1.5.12: 12 October 2002
  • When a data matrix in BPA format is used in a biogeographical analysis (i.e. with missing values for missing areas, in contrast to CCA, which uses zeros), CAFCA will now ignore the missing values (and treat them as zeros) when the "missing value threshold" is set to zero. When it is set to 1, CAFCA will derive components as if the missing values indicate a zero, as well as additional components as if the missing values indicate a 1. Up to version 1.5k CAFCA used to do that anyway, irrespective of the threshold value. When the "missing value threshold" is set to a value higher than 1 (default 6, maximum 10), all possible combinations will be used (unchanged, as described in the manual).
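The three threshold regimes can be illustrated for a single binary column with a small Python sketch. It is only an interpretation of the description above: the function name `candidate_components` is hypothetical, missing values are represented as None, and the precise role of the numeric threshold above 1 (which the manual describes) is not modelled beyond "all combinations".

```python
from itertools import product
from typing import FrozenSet, Optional, Sequence, Set


def candidate_components(taxa: Sequence[str],
                         column: Sequence[Optional[int]],
                         threshold: int) -> Set[FrozenSet[str]]:
    """Taxon sets derived from one binary column with missing values.

    threshold 0: missing values are treated as zeros.
    threshold 1: missing values are read once as all zeros, once as all ones.
    threshold >1: every 0/1 combination of the missing values is used.
    """
    present = {t for t, s in zip(taxa, column) if s == 1}
    missing = [t for t, s in zip(taxa, column) if s is None]

    if threshold == 0 or not missing:
        extras = [[]]
    elif threshold == 1:
        extras = [[], missing]
    else:
        extras = [[t for t, keep in zip(missing, mask) if keep]
                  for mask in product((False, True), repeat=len(missing))]

    return {frozenset(present | set(extra)) for extra in extras}
```

For a column with one scored presence and two missing values, the sketch returns one candidate set at threshold 0, two at threshold 1, and four at any higher threshold.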
Version 1.5k: 11 February 2001
  • Minor bug fixes.
  • Counter intervals changed (from units to tens) due to write-to-screen bottleneck in APL on ever faster Macs.
Version 1.5j: 18 October 1999
  • Minor bug fixes.
Version 1.5i: 28 September 1998
  • What to do if you want to run BPA under assumption 1 or 2?
    1. For a PC:
      Click Export data matrix from the Utilities menu, choose Hennig86 format and take the option to include the assumptions in the data matrix. Run Hennig86 with this matrix.
    2. For a Mac:
      Click Export data matrix from the Utilities menu, choose NEXUS format and take the option to include the assumptions in the data matrix. Run PAUP with this matrix.
Version 1.5h: 27 April 1997
  • Minor bugs repaired: (with thanks to Brain R. Warren and Marco van Veller).
    1. Export data matrix in NEXUS format for option 5 (three taxon statements) according to Nelson & Platnick's definition now renders the proper matrix.
    2. Missing colon (:) in data matrix NEXUS format repaired.
    3. File format error repaired in cladogram optimization for biogeographic analysis when due to widespread taxa (etc...) some columns in the data matrix representing cladogram structure are identical.
  • Separate line added in analysis log screen for the computation of state change probabilities (RQ-loop)
  • You can now break (interrupt) the clique search routine and continue further analysis, but based on an incomplete set of cladograms.
Version 1.5g: 27 August 1996
  • Bug repairs: mostly minor + 1 rather serious in the editor.
  • In case of binary data with more than 1 full-zero taxon as potential outgroups a dialog is added to select all or some of these taxa on the outgroup node.
  • Outgroup handling in three taxon statements for binary data adapted to Nelson's (1996) "Nullius in Verba"
Version 1.5f: 22 March 1996
  • NEXUS format for Spectrum program (by Michael Charleston) added in [Utilities | Export data matrix...] menu. CAFCA will not write the data matrix itself but instead use a list of components, as a binary matrix, derived from the data matrix. When the PMS option is applied these components correspond with the non-zero frequency partitions used in spectral analysis.
  • In [Print | Diagrams...] menu the option Print to file now offers a choice between a tree-file in SimpleText format and a tree-file in NEXUS format, to be used by the TreeView program (by Rod Page). The branch length indications for the cladograms correspond with state change probabilities according to the Redundancy Quotient.
  • In [ASCII | Write...] menu the option Cladogram parentheses notation now offers a choice between a tree-file in CAFCA format and a tree-file in NEXUS format. Both these options offer a choice as to whether labels for internal nodes as well as branch length indications should be included. The branch length indications for the cladograms correspond with state change probabilities according to the Redundancy Quotient. Tree-files in NEXUS format with labels and branch length indications can be viewed by means of the program TreeView (by Rod Page).
  • Minor bug repairs
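To give an idea of the parentheses notation mentioned in this release, the Python sketch below writes a set of mutually compatible components as a nested parenthesised string. It is not CAFCA's export routine: the function name `to_parentheses` is hypothetical, internal node labels and branch lengths (which would require the Redundancy Quotient computation) are left out, and the surrounding NEXUS file structure is not produced.

```python
from typing import FrozenSet, Iterable, List, Sequence


def to_parentheses(taxa: Sequence[str],
                   components: Iterable[Iterable[str]]) -> str:
    """Write mutually compatible components in parentheses notation.

    The components are assumed to be mutually compatible (no overlap);
    this is not checked here.
    """
    comps = {frozenset(c) for c in components}
    comps = [c for c in comps if 1 < len(c) < len(taxa)]

    def render(group: FrozenSet[str], inner: List[FrozenSet[str]]) -> str:
        # Children are the largest components nested inside this group.
        children = [c for c in inner if not any(c < d for d in inner)]
        covered = set().union(*children) if children else set()
        parts = [render(c, [d for d in inner if d < c])
                 for c in sorted(children, key=sorted)]
        parts += sorted(group - covered)   # taxa attached directly to this node
        return "(" + ",".join(parts) + ")"

    return render(frozenset(taxa), comps) + ";"
```

Called with taxa A to D and the components {A,B} and {A,B,C}, the function returns `(((A,B),C),D);`.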
Version 1.5e: December 1995
  • First release to be available on the Internet from this WWW site.

© M. Zandee 1995.

